While many datasets are of direct interest only to a few specialists, many others touch on questions that matter deeply to a variety of individuals and interest groups. What can we do to promote critical engagement with this data across a wide spectrum of the public? We’ll need to leverage existing communities and create new ones; collaborate with educational platforms; and build new analytic tools while exposing existing ones to a wider public. Along the way, we should measure and display how this engagement grows.
Many of us know that centralizing data (scientific data about the natural world, government data about how we are governed, personal data about ourselves) poses serious risks. But we have few examples of practical alternatives. Data Together models how we can use decentralized infrastructure to store public data and to coordinate its access, discovery, verification, and preservation.
Our model for data storage should outperform centralized alternatives, and it should provide methods for discovering and using data that improve on existing practices.
There are hundreds (thousands?) of Open Data publishers on the centralized web. Any such data store is just an API away from being included in the distributed web. Let’s build bridges with citizen groups, libraries, archives, scientists, journals, and government organizations to pull all of this data into our richer, more robust, more durable decentralized platform.
The world’s data sources are vast; NOAA, for example, estimates that it produces petabytes of scientific data every week. Storing data at this scale is a significant burden for any single node, and storing multiple copies across a network is likely to be even harder. Our vision will require us to imagine and implement better ways for nodes to communicate which data they intend to keep and what responsibility they accept for it. For instance, within a given IPFS “swarm”, a library node might commit to keeping a copy of every piece of data that is stored on fewer than N other trusted nodes. A high-performance computing facility might commit to keeping the last N weeks of data from a given source on a fast connection, while archiving older data on a low-speed storage medium. We will need to make it easy for community members to:
- Monitor the health of a swarm
- Commit to support a swarm at a level commensurate with their available resources
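The storage commitments and health check described above can be sketched as a few small policy functions. This is a minimal illustration under stated assumptions, not a Data Together or IPFS implementation: it uses a hypothetical in-memory model of a swarm (`Item`, node IDs as plain strings, a known set of trusted nodes), placeholder identifiers such as `"cid-1"` instead of real IPFS CIDs, and hypothetical function names. `commit_within_budget` is one possible reading of committing "at a level commensurate with available resources": greedily take the least-replicated items that still fit in the node's byte budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Item:
    """One piece of data in a swarm (hypothetical model, not an IPFS API)."""
    cid: str                  # placeholder content identifier, not a real IPFS CID
    source: str               # publisher the data came from, e.g. "noaa"
    published: datetime       # when the source published it
    size: int                 # size in bytes
    holders: set[str] = field(default_factory=set)  # IDs of nodes currently storing it


def replicas(item: Item, trusted: set[str], exclude: str = "") -> int:
    """Count trusted nodes (other than `exclude`) that hold this item."""
    return len((item.holders & trusted) - {exclude})


def library_policy(items, trusted, n, self_id):
    """Library-style commitment: keep every item stored on fewer than n other trusted nodes."""
    return {i.cid for i in items if replicas(i, trusted, self_id) < n}


def hpc_policy(items, source, weeks, now):
    """HPC-style commitment: split one source's items into recent (fast) and older (archive) sets."""
    cutoff = now - timedelta(weeks=weeks)
    mine = [i for i in items if i.source == source]
    fast = {i.cid for i in mine if i.published >= cutoff}
    archive = {i.cid for i in mine if i.published < cutoff}
    return fast, archive


def commit_within_budget(items, trusted, budget, self_id):
    """Commit to the least-replicated items first, skipping any that would exceed the byte budget."""
    chosen, used = [], 0
    for i in sorted(items, key=lambda i: replicas(i, trusted, self_id)):
        if used + i.size <= budget:
            chosen.append(i.cid)
            used += i.size
    return chosen


def swarm_health(items, trusted, n):
    """Fraction of items stored on at least n trusted nodes (1.0 for an empty swarm)."""
    if not items:
        return 1.0
    return sum(replicas(i, trusted) >= n for i in items) / len(items)


# Usage with made-up nodes and items.
now = datetime(2024, 1, 29)
trusted = {"lib", "a", "b", "c"}
items = [
    Item("cid-1", "noaa", now - timedelta(weeks=1), 500, {"a", "b", "c"}),
    Item("cid-2", "noaa", now - timedelta(weeks=6), 300, {"a"}),
    Item("cid-3", "usgs", now - timedelta(weeks=2), 200),
]
print(sorted(library_policy(items, trusted, n=2, self_id="lib")))  # ['cid-2', 'cid-3']
print(hpc_policy(items, "noaa", weeks=4, now=now))                 # ({'cid-1'}, {'cid-2'})
print(commit_within_budget(items, trusted, budget=600, self_id="lib"))  # ['cid-3', 'cid-2']
print(round(swarm_health(items, trusted, n=2), 2))                 # 0.33
```

Prioritizing the least-replicated items means a node with limited space spends it where a copy adds the most resilience. A real swarm would also need a way to publish and verify these commitments between nodes, which this sketch does not attempt.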