One of the central tasks in scientific computing is to accurately
approximate unknown target functions. This is typically done with the help
of data, that is, samples of the unknown functions. In statistics, this
falls into the realm of regression and machine learning; in mathematics,
it is the
central theme of approximation theory. The emergence of Big Data presents
both opportunities and challenges. On the one hand, big data provides more
information about the unknowns and, in principle, allows us to create more
accurate models. On the other hand, data storage and processing become
highly challenging. Moreover, data often contain corruption errors in
addition to standard random noise. In this talk, we present some new
developments regarding certain aspects of big data approximation. More
specifically, we present numerical algorithms that address two issues: (1)
how to automatically eliminate corruption/bias errors in the data; and (2)
how to create accurate approximation models in very high-dimensional spaces
using streaming/live data, without the need to store the entire data set.
We present both the numerical algorithms, which are easy to implement, and
a rigorous analysis of their theoretical foundations.
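
The abstract does not spell out the algorithms themselves, so the following
is only an illustrative sketch of issue (1): a least-absolute-deviations
fit, a standard robust-regression technique, automatically rejects a small
fraction of grossly corrupted samples because the l1 loss grows only
linearly in each residual. The function name lad_fit, the iteratively
reweighted least-squares implementation, and all parameters below are our
own illustration, not necessarily the method presented in the talk.

    import numpy as np

    def lad_fit(A, b, iters=50, eps=1e-8):
        # Minimize ||A x - b||_1 by iteratively reweighted least squares:
        # each pass solves a weighted least-squares problem whose weights
        # shrink the influence of samples with large residuals.
        x = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares warm start
        for _ in range(iters):
            w = 1.0 / np.sqrt(np.maximum(np.abs(A @ x - b), eps))
            x = np.linalg.lstsq(w[:, None] * A, w * b, rcond=None)[0]
        return x

    # Toy check: corrupt 5% of the samples with a large bias.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 5))
    x_true = rng.standard_normal(5)
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    b[:10] += 50.0
    print(np.linalg.norm(np.linalg.lstsq(A, b, rcond=None)[0] - x_true))
    print(np.linalg.norm(lad_fit(A, b) - x_true))   # far closer to x_true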
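
For issue (2), one classical illustration of the streaming idea is a
Kaczmarz-type sequential update: each incoming sample nudges the current
coefficient vector by an O(d) projection onto the hyperplane it defines,
and is then discarded, so the stream never needs to be stored. The sketch
below, including the name kaczmarz_step and the Gaussian toy stream, is
again our own illustration rather than the talk's specific algorithm.

    import numpy as np

    def kaczmarz_step(x, a, y):
        # Project the current estimate onto the hyperplane a . x = y.
        # Cost is O(d) per sample; the sample is not kept afterwards.
        return x + (y - a @ x) / (a @ a) * a

    rng = np.random.default_rng(1)
    d = 100
    x_true = rng.standard_normal(d)
    x = np.zeros(d)                    # running coefficient estimate
    for _ in range(20000):             # samples arrive one at a time
        a = rng.standard_normal(d)     # random sampling point
        x = kaczmarz_step(x, a, a @ x_true)
    print(np.linalg.norm(x - x_true))  # shrinks as the stream continues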