EarthCube Brings Big Data Sets to Diverse Researchers
An avalanche of big data has descended on the field of Earth sciences, and researchers are increasingly turning to novel ways of integrating, assimilating and interpreting those data to better understand the field. EarthCube, described as a cyber-infrastructure for the next generation of geoscience, is trying to do just that while taking into consideration the needs of the science communities it serves.
EarthCube was built for the future of big and heterogeneous data science. During the past few years, the National Science Foundation (NSF) has realized that each time it creates a large program, it has to create a new stovepipe for the information.
EarthCube was born in 2011, to accelerate the convergence process for the cyber-infrastructure needed in the next generation of projects, create a scalable system, and take advantage of emerging technologies. EarthCube aims to transform the conduct of research through the development of community-guided cyber-infrastructure to integrate information and data across the geosciences, says Eva Zanzerkia, National Science Foundation program director, Division of Earth.
EarthCube is not just about big science, either. “Big data is part of science, but there’s also a whole set of small-data science out there that can be integrated together,” says Zanzerkia. While accessing and integrating information from diverse research communities has been a challenge in the past, EarthCube will try to bring different data sets together.
From Years to Days
One of the pilot projects supported by EarthCube focused on understanding the terrestrial biosphere. The project aimed to seamlessly integrate knowledge from multiple sources to look at the Earth in a broad way, using models of the terrestrial biosphere.
“We can zoom in on different areas, look at inter-annual variability,” explains Robert Cook, a scientist at Oak Ridge National Laboratory. “As you can imagine, we have a tremendous amount of data – say, 100 years of modeling results over the whole globe. That’s an enormous amount of interesting information.”
As part of this pilot, EarthCube supports, among other projects, an experimental “brokering” technique to find information in different sciences and bring data together in a useful way. For example, the broker compiles information on soil types and weather patterns, and uses Web services to acquire data from different sources, then puts them in a format that the models can use. “Eventually we’ll use the brokering technique to pull out the information we need and use tools to visualize the results, doing a sort of advanced visualization,” adds Cook.
Cook explains that before the brokering tool existed, the same results would be available, but it would take around two years to manually acquire the data and process it in a proper way for the models. Using a unique brokering system to do the processing could yield the same results in just a few days, says Cook – as long as it’s set up in the proper way.
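The brokering idea Cook describes can be illustrated with a minimal Python sketch. It is not EarthCube's actual broker: the `Broker` class, the adapter functions, the field names, and the sample values are all hypothetical, and the two "sources" are in-memory stand-ins for what would really be Web service calls. The sketch only shows the general pattern of one mediator pulling records from differently formatted sources and converting them into a single layout a model could read.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical mediator: one adapter pair per source, one common output."""
    adapters: dict = field(default_factory=dict)

    def register(self, name, fetch, normalize):
        # fetch() returns raw records; normalize(raw) returns (key, values).
        self.adapters[name] = (fetch, normalize)

    def collect(self):
        # Merge every source's normalized values under a shared location key.
        merged = {}
        for fetch, normalize in self.adapters.values():
            for raw in fetch():
                key, values = normalize(raw)
                merged.setdefault(key, {}).update(values)
        return merged


def fetch_soil():
    # Stand-in for a Web service that returns JSON-like records (made-up data).
    return [
        {"lat": 35.9, "lon": -84.3, "soil": "loam"},
        {"lat": 40.0, "lon": -105.3, "soil": "sandy"},
    ]


def normalize_soil(rec):
    key = (round(rec["lat"], 1), round(rec["lon"], 1))
    return key, {"soil_type": rec["soil"]}


# Stand-in for a second service that answers in CSV and in Fahrenheit.
WEATHER_CSV = "latitude,longitude,temp_f\n35.9,-84.3,68.0\n40.0,-105.3,50.0\n"


def fetch_weather():
    return list(csv.DictReader(io.StringIO(WEATHER_CSV)))


def normalize_weather(row):
    # Convert to the units and key layout the (hypothetical) model expects.
    temp_c = (float(row["temp_f"]) - 32.0) * 5.0 / 9.0
    key = (round(float(row["latitude"]), 1), round(float(row["longitude"]), 1))
    return key, {"temp_c": round(temp_c, 1)}


def build_model_inputs():
    broker = Broker()
    broker.register("soil", fetch_soil, normalize_soil)
    broker.register("weather", fetch_weather, normalize_weather)
    return broker.collect()


if __name__ == "__main__":
    for location, values in build_model_inputs().items():
        print(location, values)
```

In this sketch, adding a new data source means writing one more fetch and normalize pair rather than reworking the model's input code, which is the kind of manual effort the brokering approach is meant to remove.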
EarthCube is trying to make its tools accessible and user-friendly. “We believe that the wave of the future is to make cyber-infrastructure a seamless thing, so as a geoscientist in the field you don’t even know you’re using it,” explains NSF’s Zanzerkia.
One example of a solution comes in another EarthCube pilot project. Siri Jodha Singh Khalsa, a research scientist at the National Snow and Ice Data Center in Boulder, Colorado, is working on a project that could relieve researchers who consume or publish data from the burden of implementing data management solutions. One solution might be to create “middleware,” which is independent of any given repository, Khalsa explains. “In this EarthCube pilot project, the goal is to have a broker or brokers that are … an infrastructure service, like the Web is a service everyone utilizes.”
There are challenges to creating such a project – and while some are technical, others are social. Types of data gathered in different research communities can be diverse. Some communities have isolated data sets, hard-won by individual researchers. Those data may be very ad hoc and difficult to search, says Khalsa.
On the other hand, some communities are on top of cyber-infrastructure, have linked repositories, and are up to speed. “So how do you bridge those two?” Khalsa asks. “And how to convince a community that it’s to their advantage to build infrastructure that benefits the rest of science? There are very different cultures, perceptions, and desires out there.”
Part of the EarthCube program aims to tackle those core sociological issues in addition to the technological challenges. Joel Cutcher-Gershenfeld, a professor in the School of Labor and Employment Relations at the University of Illinois at Urbana-Champaign, has examined stakeholder issues in scientific communities for years.
In 2011, Cutcher-Gershenfeld began a study on stakeholder alignment in Earth science. “It was clear to everyone that the success depends not just on the technical architecture, but also on the associated social systems by which that happens,” he explains. He conducted an international stakeholder survey, the results of which were presented in June 2012. The idea behind the presentation was that it was important not only to understand the stakeholders’ views, but also for the stakeholders to understand each other’s views.
He found that across all fields and disciplines, there’s a broad consensus that having access to other data, tools, models, and software is important. More than 90 percent of geoscientists report that having access to data other than their own is important. And more than 80 percent of geoscientists report that it’s important to have access to tools even outside of their field or disciplines. The vast majority of researchers said it was difficult to get the necessary data, tools and software.
Cutcher-Gershenfeld also learned that people don’t perceive a lot of support from their home organizations for getting involved in collaborations like EarthCube. While the reasons aren’t entirely clear, Cutcher-Gershenfeld says the data are a reflection of the broader challenges of interdisciplinary and collaborative science, which are only beginning to be addressed with respect to promotion, tenure, competitive review, and other arrangements. He says that his research is not a one-time event and will continue as a loop, offering feedback to future parts of the project.
Jay Pearlman, an IEEE fellow, says that EarthCube’s benefits will reach far into the future of science, stemming from the discovery of new information and the ability to access it. When scientists can translate each other’s data sets into terms, timescales and coordinates that fit into their systems, the work of all is improved.
Pearlman points to another example of a problem in the geosciences that EarthCube could help with: carbon sequestration requires remote sensing data, and those data are not always a consistent set. EarthCube could be a useful tool to solve that problem, bringing together diverse science for the betterment of all.