Location: Science Ctr. Hall E, 1 Oxford Street, Cambridge MA 02138
Speaker: Jeff Bilmes, Professor of Electrical Engineering at the University of Washington
Title: Summarizing Large Data Sets
Abstract: The recent growth of available data is both a blessing and a curse for the field of data science. While large data sets can lead to improved predictive accuracy and can motivate research in parallel computing, they can also be plagued with redundancy, leading to wasted computation. In this talk, we will discuss a class of approaches to data summarization and subset selection based on submodular functions. We will see how a form of "combinatorial dependence" over data sets can be naturally induced via submodular functions, and how the resulting submodular programs (which often come with approximation guarantees) can yield practical, high-quality data summarization strategies. The effectiveness of this approach will be demonstrated with results from a wide range of applications, including document summarization, machine learning training data subset selection (for speech recognition, machine translation, and handwritten digit recognition), image summarization, and assay selection in functional genomics.
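For readers unfamiliar with submodular subset selection, here is a minimal sketch of one common instance. It is an illustration only and is not taken from the talk. It assumes a facility-location objective, f(S) = sum over all items i of the largest similarity between i and any selected item, together with a user-supplied nonnegative, symmetric similarity matrix. Under those assumptions f is monotone submodular, and the simple greedy algorithm, which repeatedly adds the item with the largest marginal gain until k items are chosen, is known to achieve at least a (1 - 1/e) fraction of the optimal value under a cardinality constraint. The function names `facility_location` and `greedy_summary` are hypothetical.

```python
from typing import List, Optional, Sequence


def facility_location(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Assumed objective: each item is 'covered' by its most similar selected item."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(sim: List[List[float]], k: int) -> List[int]:
    """Plain (non-lazy) greedy: add the item with the largest marginal gain, k times."""
    n = len(sim)
    chosen: List[int] = []
    current = 0.0  # f(chosen) so far
    for _ in range(min(k, n)):
        best_j: Optional[int] = None
        best_gain = float("-inf")
        for j in range(n):
            if j in chosen:
                continue
            gain = facility_location(chosen + [j], sim) - current
            if gain > best_gain:
                best_j, best_gain = j, gain
        assert best_j is not None
        chosen.append(best_j)
        current += best_gain
    return chosen


if __name__ == "__main__":
    # Hypothetical toy data: items 0-2 form one cluster, items 3-4 another.
    sim = [
        [1.0, 0.9, 0.8, 0.1, 0.0],
        [0.9, 1.0, 0.9, 0.1, 0.1],
        [0.8, 0.9, 1.0, 0.2, 0.0],
        [0.1, 0.1, 0.2, 1.0, 0.9],
        [0.0, 0.1, 0.0, 0.9, 1.0],
    ]
    print(greedy_summary(sim, 2))
```

With k = 2 on the toy matrix, the greedy picks the most central item of the larger cluster first and then one item from the second cluster: a redundant second pick from the first cluster adds almost nothing to f. This diminishing-returns behavior is what lets a submodular objective discourage redundancy in a summary. The naive loop re-evaluates f from scratch at every step; practical implementations typically cache per-item coverage or use lazy evaluation of marginal gains.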
Free and open to the public. No registration required.
***********************
UPCOMING SEMINARS
4/10 Budhendra Bhaduri (Oak Ridge National Laboratory, Geographic Information Science and Technology) ON "Big Data, Geospatial Computing, and My 2 Cents in an Open Data Economy"
4/24 Christian Rudder (OkCupid) ON "DATA: A Love Story"