Time: Lunch: 12:30pm; Talk: 1pm

Location: Science Center Hall E, 1 Oxford Street, Cambridge, MA 02138

Speaker: Jeff Bilmes, Professor of Electrical Engineering at the University of Washington

Title: Summarizing Large Data Sets

Abstract: The recent growth of available data is both a blessing and a curse for the field of data science. While large data sets can lead to improved predictive accuracy and can motivate research in parallel computing, they can also be plagued with redundancy, leading to wasted computation. In this talk we will discuss a class of approaches to data summarization and subset selection based on submodular functions. We will see how a form of "combinatorial dependence" over data sets can be naturally induced via submodular functions, and how resulting submodular programs (that often have approximation guarantees) can yield practical and high-quality data summarization strategies. The effectiveness of this approach will be demonstrated based on results from a wide range of applications, including document summarization, machine learning training data subset selection (for speech recognition, machine translation, and handwritten digit recognition), image summarization, and assay selection in functional genomics.

Speaker Bio: Jeffrey A. Bilmes is a professor in the Department of Electrical Engineering at the University of Washington, Seattle, and an adjunct professor in the Department of Computer Science and Engineering and the Department of Linguistics. He received his Ph.D. in Computer Science from the University of California, Berkeley. He is a 2001 NSF CAREER award winner, a 2002 CRA Digital Government Fellow, a 2008 NAE Gilbreth Lectureship award recipient, and a 2012/2013 ISCA Distinguished Lecturer. Prof. Bilmes has been working on submodularity in machine learning for more than twelve years. He received the best paper award at ICML 2013 and a best paper award at NIPS 2013 for work in this area. Prof. Bilmes is also a recipient of a 25-year paper award from the International Conference on Supercomputing for his 1997 paper on high-performance matrix optimization. Prof. Bilmes is the author of the Graphical Models Toolkit (GMTK), a dynamic graphical-model-based software system that is widely used in speech and language processing, bioinformatics, and human-activity recognition.

Free and open to the public. No registration required.

***********************

UPCOMING SEMINARS

4/10 Budhendra Bhaduri (Oak Ridge National Laboratory, Geographic Information Science and Technology) ON "Big Data, Geospatial Computing, and My 2 Cents in an Open Data Economy"

4/24 Christian Rudder (OkCupid) ON "DATA: A Love Story"
