A deep "data lake" for coronavirus information
An AI software provider has created a sprawling new "data lake" of information about the COVID-19 pandemic for researchers around the world.
Why it matters: In just a few months, researchers have generated an astounding amount of data about COVID-19. Gathering much of that information into a single, easily readable source will enable researchers and policymakers to get the most out of big data.
How it works: For all the rich data being produced about COVID-19, much of it is being compiled in separate silos by government, academia and business, often in formats that are hard to read. Without an integrated data set, there's no easy way to produce the AI models used to analyze the many facets of the pandemic.
- C3.ai has produced a data lake that draws from scores of different sources. Researchers can explore areas that may be of interest — like diagnosis or preexisting conditions — as they build out models based on that data.
- "As a data scientist, you don't have to spend all your time connecting all of these sets," Tom Siebel, the CEO of C3.ai, tells Axios. "This enables scientists to perform very advanced research using AI, accurately predict the spread of the disease, and evaluate the efficacy of social mitigation."
- The C3.ai data lake joins other big data sets on the pandemic, such as CORD-19 and the 2019 Novel Coronavirus Research Compendium.
The bottom line: We live in the age of big data, and the lightning-quick research around COVID-19 demonstrates our ability to produce ever more information. But data can't be meaningful unless it is accessible.