Risa Wechsler: Cosmologists use data science to explain the origins of the universe

A new generation of powerful telescopes will generate vast amounts of data, but crunching those numbers poses difficult challenges for data scientists.

Cosmology, the study of the origin of the universe, is usually thought of as the realm of astronomers armed with giant telescopes and sophisticated satellites. But it is also a data science, “and a very interesting one,” says Risa Wechsler, an associate professor of physics at Stanford University.
Explaining that relationship was the theme of Wechsler’s talk at this year’s Women in Data Science conference at Stanford. “I have the pleasure to talk to you about trying to understand the data of literally the entire universe. So that’s definitely a big data problem.”
Cosmology, she says, strives to answer fundamental questions about the universe: How did it begin? What is it made of? What is accelerating its expansion? How did galaxies form?
The links between cosmology and data science are especially pertinent now as researchers prepare to analyze a flood of data from two new and extremely powerful instruments: the Large Synoptic Survey Telescope, or LSST, and Hyper Suprime-Cam, a camera mounted on the Subaru Telescope. When completed, the LSST will contain a 3,200-megapixel camera, the largest ever built.
Cosmology, Wechsler says, has undergone a revolution in the last 20 years, driven by the vast increase in data available to analyze. The LSST, for example, will take a picture of the entire sky every three days, producing about 20 terabytes of data a night.
Simulations of the universe have grown exponentially over the years. Wechsler demonstrated that by showing a simulation made in 1985, which contained 32,000 particles. A similar simulation performed in 2014 contained a trillion particles. Building it took several tens of millions of CPU hours on one of the largest computers in the world, according to Wechsler.
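To put that growth in perspective, the two figures quoted above can be turned into an implied doubling time. The short Python sketch below assumes steady exponential growth between the 1985 and 2014 simulations, which is a simplification made here for illustration and not something Wechsler stated; the function and variable names are likewise illustrative.

```python
import math


def doubling_time_years(n_start, n_end, year_start, year_end):
    """Doubling time implied by two data points, assuming steady exponential growth."""
    doublings = math.log2(n_end / n_start)
    return (year_end - year_start) / doublings


# Particle counts quoted in the talk.
particles_1985 = 32_000
particles_2014 = 1_000_000_000_000  # one trillion

growth_factor = particles_2014 / particles_1985
doublings = math.log2(growth_factor)
t_double = doubling_time_years(particles_1985, particles_2014, 1985, 2014)

print(f"growth factor: {growth_factor:,.0f}x")
print(f"doublings:     {doublings:.1f}")
print(f"doubling time: {t_double:.2f} years")
```

Under that assumption, the particle count grew by a factor of about 31 million, or roughly 25 doublings in 29 years, which works out to a doubling time of a little over a year.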
Building the models involves combining data from parts of the visible universe with data derived using machine learning and other types of statistical inference. Doing so allows scientists to predict what the universe as a whole looks like and to infer the parameters that determine how it behaves.
Dealing with such a massive amount of data poses challenges. Often the available computational resources aren’t sufficient, “so we have to be smart about picking which pieces of that problem we solve and in what combination,” Wechsler says.
Accuracy is important to all researchers, of course, but Wechsler says she and her colleagues feel a special responsibility to be careful. “We’re really trying to tell you what is the physics of the universe, and we don’t want to get it wrong.”