Strategy

Spark engagement with datasets, collections, and archives

While many datasets are of direct interest only to a few specialists, many others touch on questions that matter deeply to a variety of individuals and interest groups. What can we do to promote critical engagement with this data across a wide spectrum? We’ll need to leverage existing communities and create new ones; collaborate with educational platforms; build new analytic tools and expose existing ones to a wider public. Along the way, we should measure and display the growth of this improved engagement.

Demonstrate it is possible for communities to decentralize data they care about

Many of us know that the centralization of data (scientific data about the natural world, government data about how we are governed, personal data about ourselves) poses many risks. But we lack examples of practical alternatives. Data Together models how we can use decentralized infrastructure to store public data and to coordinate its access, discovery, verification, and preservation.

Make it easier to use decentralized data than centralized stores

Our model for data storage will outperform centralized stores: it will provide methods for discovering and using data that are superior to existing practices.

Ingest all the world’s open data onto the Distributed Web

There are hundreds, perhaps thousands, of Open Data publishers on the centralized web. Any such data store is just an API away from being included in the distributed web. Let’s build bridges with citizen groups, libraries, archives, scientists, journals, and government organizations to pull all of this data into our richer, more robust, more durable decentralized platform.

Share the work of storing data dynamically

The world’s data sources are vast; NOAA, for example, estimates that it produces petabytes of scientific data every week. Storing data at this scale is a significant burden for any single node, and storing multiple copies across a network is likely to be even harder. Our vision will require us to imagine and implement better ways for nodes to communicate which data they intend to keep and are responsible for keeping. For instance, within a given IPFS “swarm”, a library node might commit to keeping a copy of every piece of data that is stored on fewer than N other trusted nodes. A high-performance computing facility might commit to keeping the last N weeks of data from a given source on a fast connection, while archiving older data on a low-speed storage medium. We will need to make it easy for community members to:

Monitor the health of a swarm
Commit to support a swarm at a level commensurate with their available resources
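The commitments described above can be expressed as simple, checkable policies. The following is a minimal sketch in Python, assuming a swarm can be summarized as a map from content identifiers to the set of nodes holding each item, plus each item's publication time and size. All names here (`Swarm`, `library_should_keep`, `hpc_tier`, `under_replicated`, `plan_commitment`), the byte budget standing in for "available resources", and the sample node names are hypothetical illustrations, not part of IPFS or any Data Together tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Swarm:
    """Hypothetical in-memory view of one swarm (illustrative only, not an IPFS API)."""
    holders: dict[str, set[str]] = field(default_factory=dict)    # content ID -> nodes holding a copy
    published: dict[str, datetime] = field(default_factory=dict)  # content ID -> publication time
    sizes: dict[str, int] = field(default_factory=dict)           # content ID -> size in bytes

    def add(self, cid: str, published: datetime, size: int, nodes: set[str]) -> None:
        self.holders[cid] = set(nodes)
        self.published[cid] = published
        self.sizes[cid] = size


def library_should_keep(swarm: Swarm, cid: str, me: str, trusted: set[str], n: int) -> bool:
    """Library policy: keep any item held by fewer than n *other* trusted nodes."""
    others = (swarm.holders.get(cid, set()) & trusted) - {me}
    return len(others) < n


def hpc_tier(swarm: Swarm, cid: str, now: datetime, weeks: int) -> str:
    """HPC policy: the last `weeks` weeks of data stay on fast storage; older data is archived."""
    age = now - swarm.published[cid]
    return "fast" if age <= timedelta(weeks=weeks) else "archive"


def under_replicated(swarm: Swarm, trusted: set[str], n: int) -> list[str]:
    """Health check: items held by fewer than n trusted nodes, least-replicated first."""
    counts = {cid: len(nodes & trusted) for cid, nodes in swarm.holders.items()}
    return sorted((cid for cid, c in counts.items() if c < n), key=lambda cid: (counts[cid], cid))


def plan_commitment(swarm: Swarm, me: str, trusted: set[str], n: int, budget: int) -> list[str]:
    """Greedily commit to the least-replicated items this node lacks, within a byte budget."""
    plan: list[str] = []
    used = 0
    for cid in under_replicated(swarm, trusted, n):
        size = swarm.sizes[cid]
        if me not in swarm.holders[cid] and used + size <= budget:
            plan.append(cid)
            used += size
    return plan


if __name__ == "__main__":
    now = datetime(2024, 1, 29)
    trusted = {"library", "university", "museum"}
    swarm = Swarm()
    swarm.add("cid-recent", now - timedelta(weeks=1), 500, {"library", "university", "museum"})
    swarm.add("cid-old", now - timedelta(weeks=10), 2_000, {"university"})
    swarm.add("cid-orphan", now - timedelta(weeks=3), 800, set())

    print(library_should_keep(swarm, "cid-old", "library", trusted, n=2))  # True
    print(hpc_tier(swarm, "cid-old", now, weeks=4))                        # archive
    print(under_replicated(swarm, trusted, n=2))                           # ['cid-orphan', 'cid-old']
    print(plan_commitment(swarm, "library", trusted, n=2, budget=1_000))   # ['cid-orphan']
```

Choosing the least-replicated items first is only one way to match a commitment to a node's resources; a real deployment would also need nodes to verify each other's claims about what they store, which this sketch does not attempt.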
