Big Graph Data Science
One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, and noisy interlinked data. We need data science techniques that can represent and reason effectively with this form of rich, multi-relational graph data. In this talk, I will describe some common inference patterns needed for graph data, including collective classification (predicting missing labels for nodes), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will then outline some key capabilities required to solve these problems and, finally, present a highly scalable open-source probabilistic programming language being developed within my group to address these challenges.
Lise Getoor is a professor in the Computer Science Department at the University of California, Santa Cruz. Her research areas include machine learning, data integration, and reasoning under uncertainty, with an emphasis on graph and network data. She is a AAAI Fellow, serves on the boards of the Computing Research Association and the International Machine Learning Society, was co-chair of ICML 2011, and is a recipient of an NSF CAREER Award and nine best paper and best student paper awards. She was recently recognized as one of the top ten emerging research leaders in data mining and data science based on citation and impact, according to KDnuggets. She received her PhD from Stanford University, her MS from the University of California, Berkeley, and her BS from the University of California, Santa Barbara, and was a professor at the University of Maryland, College Park from 2001 to 2013.