Women in Data Science (WiDS)

Dealing with Missing Data

9/15/2021

 
We live in an era of big data, with data sets that require computational analysis to yield insights and knowledge. The volume of data has been increasing steadily and will only continue to climb. Since we started the WiDS initiative in 2015, Statista estimates that the volume of data has increased from 15.5 to 74 zettabytes, and it forecasts that data volume will double again by 2024.
Yet with all of this data, one of the biggest challenges that data scientists and researchers face is dealing with missing data. In some cases, the missing data is due to not readily having access to the data sets that are required to perform the analysis, while other cases involve data sets that are incomplete and not uniformly populated.
This year, during the Women in Data Science (WiDS) Worldwide conference, Professor Fatima Abu Salem from the American University of Beirut (AUB) delivered a technical talk, “Doing Data Science in Data Deserts”. In the case studies that she described, her biggest hurdle was gaining access to the data sets required to do her analysis. As Fatima says, “...we have logistical and financial hurdles against the ability to collect data and when we do it, it has low temporal or spatial resolution”.
Fatima’s talk is also available in Arabic. You can get to know Fatima better in her WiDS Worldwide Meet the Speaker session, moderated by WiDS ambassador Lama Moussawi, Associate Dean for Research and Faculty Development at AUB. In addition, you can hear more about Fatima’s work and her journey from high school teacher to theoretical mathematician to data scientist and professor in this episode of the WiDS podcast.
Also during the WiDS Worldwide conference, Megan Price and Maria Gargiulo from the Human Rights Data Analysis Group (HRDAG) delivered a workshop, “Data Processing & Statistical Models to Impute Missing Perpetrator Information”. The datasets that Megan, Maria, and their colleagues at HRDAG receive from their partners are missing data on the perpetrators, as well as on other variables. In this workshop, Maria describes her process and statistical methods for imputing the other missing variables, which then helped her to impute the missing perpetrator data.
Megan Price also delivered a fascinating technical talk during the WiDS conference at Stanford in 2017, talking about how she and her colleagues used statistics and computer science methods to quantify how many people were killed in the conflict in Syria. You can learn more about how Megan uses data science to fight for human rights in this episode of the WiDS podcast.
Madeleine Udell, Assistant Professor at Cornell University and Stanford ICME alumna, delivered a talk titled “Filling in Missing Data with Low Rank Models”. Madeleine talks about how to use low rank models to analyze big, messy data sets, introducing the mathematics behind these models along the way. During the talk, she cites several examples, including a 300-million-row data set with non-numeric and missing data that she wrangled during Obama’s 2012 campaign. Madeleine also delivered an excellent workshop at the WiDS Worldwide conference in 2021 on Automating Machine Learning.
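To give a feel for the idea of filling in missing entries with a low rank model, here is a minimal sketch in Python. It is not the method from Madeleine's talk. It is a simple iterative truncated-SVD imputation, chosen only as an illustration and handling numeric data only. The function name `low_rank_impute`, its parameters, and the toy matrix are all hypothetical, and the sketch assumes that every column has at least one observed value.

```python
import numpy as np


def low_rank_impute(X, rank=1, n_iters=200, tol=1e-8):
    """Fill NaN entries of X using a rank-`rank` approximation.

    Illustrative sketch only: repeatedly fit a truncated SVD to the
    current filled-in matrix, then overwrite the missing entries with
    the low rank reconstruction. Observed entries are never changed.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iters):
        # Best rank-`rank` approximation of the current matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Keep observed values; update only the missing ones.
        new_filled = np.where(missing, approx, X)
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol:
            break

    return filled


# Toy example: a rank-1 matrix with three entries hidden.
truth = np.outer([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5]).astype(float)
observed = truth.copy()
observed[0, 0] = np.nan
observed[2, 3] = np.nan
observed[5, 1] = np.nan

completed = low_rank_impute(observed, rank=1)
print(np.round(completed, 3))
```

Because the toy matrix really is rank 1, the hidden entries can be recovered from the observed ones. Real data sets are only approximately low rank, so the choice of `rank` becomes a modeling decision and the filled-in values are estimates rather than exact recoveries.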
As the volume of big, messy datasets continues to grow, the challenge of missing data will grow, too. Whether the problem is lack of access to the data sets that you need, or large swaths of missing data, there are data science methods for solving the problem. Thanks to Fatima, Megan, Maria, and Madeleine, you now have some strategies and approaches to dealing with missing data.

To get more great content like this, you’ll want to save the date for the next WiDS Worldwide conference happening at Stanford University and online on March 7, 2022. We hope you can join us.

Related Articles:
  • Responsible Data Science
  • Women in Data Science at the Forefront of COVID Research
  • WiDS Honors Juneteenth


© 2022 Women in Data Science. Women in Data Science is a registered trademark of Stanford University.
