# Open Grant Proposal: Onboarding and Indexing World’s Largest Open Datasets for Enhanced Filecoin Utility #1657

Closed
herrehesse opened this issue Sep 29, 2023 · 7 comments

@herrehesse

herrehesse commented Sep 29, 2023

Project Name: Onboarding and Indexing World’s Largest Open Datasets for Enhanced Filecoin Utility

Proposal Category: Applications

Individual or Entity Name: DCENT

Proposer: cryptowhizzard

(Optional) Filecoin ecosystem affiliations: DCENT

(Optional) Technical Sponsor: N/A

Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes

Project Summary

We aim to flawlessly store all 191 datasets listed on OpenPanda, ensuring data quality, indexing, and retrieval while showcasing Filecoin's capability to preserve humanity's essential information. This project serves as a beacon amid debates, showcasing best practices in data validity and retrievability.

Impact

Data authenticity, retrieval, and utility are areas with room for improvement within Filecoin and the broader data storage ecosystem. By onboarding the world’s largest open datasets, we can amplify Filecoin's value, attracting more users and potentially boosting token demand. Embracing this initiative ensures we uphold trust, foster user engagement, and enhance network value. Success translates to robust, reliable, and readily accessible datasets on the Filecoin network, serving diverse compute projects.

Outcomes

  • Successful storage of all 191 datasets with guaranteed retrieval functionality.
  • Targeting 5 to 10 copies worldwide, and prioritising rapid access for all datasets through HTTP, Graphsync and Bitswap.
  • Collaboration with trusted storage providers and tooling (Singularity & Boost V2) for quality assurance.
  • Collaboration with lending entities and FVM pools on SP collateral requirements.
  • Measurement of success: 95%+ retrievability and indexing of the on-boarded datasets, enhanced network value, and increased user engagement.
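
The 95%+ retrievability target could be checked along the following lines. This is only a minimal sketch under stated assumptions: the proposal does not define how retrievability is measured, so the rule used here (a dataset counts as retrievable if at least one retrieval attempt over any protocol succeeds) and the names `RetrievalCheck`, `retrievability_rate`, and `meets_target` are hypothetical illustrations, not part of any Filecoin tooling.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCheck:
    """One retrieval attempt against one dataset (hypothetical record shape)."""
    dataset_id: int
    protocol: str   # e.g. "http", "graphsync", "bitswap"
    succeeded: bool


def retrievability_rate(checks):
    """Share of attempted datasets with at least one successful retrieval.

    Assumption: one success over any protocol makes a dataset 'retrievable'.
    """
    attempted = {c.dataset_id for c in checks}
    if not attempted:
        return 0.0
    retrievable = {c.dataset_id for c in checks if c.succeeded}
    return len(retrievable) / len(attempted)


def meets_target(checks, target=0.95):
    """True if the measured rate reaches the 95% target named in the proposal."""
    return retrievability_rate(checks) >= target


# Example: 20 datasets probed over HTTP, 19 of them retrievable.
sample = [RetrievalCheck(i, "http", i != 7) for i in range(1, 21)]
rate = retrievability_rate(sample)
ok = meets_target(sample)
```

A stricter rule, such as requiring success over every listed protocol, would change only the set comprehension that builds `retrievable`.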

Adoption, Reach, and Growth Strategies

Target audience comprises developers, researchers, and organizations seeking reliable and accessible datasets. We are engaging with them through OpenPanda, Filecoin forums, and direct outreach. Our strategy involves tutorials, workshops, and demonstrations for initial user onboarding.

Datacap

One significant aspect of the process involves the application and utilisation of "datacap." Given the favorable disposition of storage providers towards sectors with datacap over regular deals, it's anticipated that the entirety of the data will be stored with datacap.

In the initial phase, we intend to navigate through the basic structure of the current LDN system to secure our datacap requirements. However, in a parallel effort, we will engage the Filecoin Plus community with a proposal to recognize us as a potential allocator of datacap, branching out from the traditional LDN approach.

Such an arrangement serves dual purposes:

  1. Experimentation & Evolution: Operating as a datacap allocator would equip us with a unique vantage point. This allows us to actively experiment with diverse models of datacap distribution, and in turn, contribute insights that aid the community in refining and redefining allocation strategies.

  2. Community Development: As active participants, we aim to offer recommendations and insights into various dimensions including retrieval standards, bot automations, and the broader framework that the community might adopt in future allocation mechanisms.

By embedding ourselves in this process, we aim not only to secure our data storage needs but also to actively shape and refine the Filecoin ecosystem's approach to datacap management.

Development Roadmap

  1. Milestone 1: Setup & High-Utility Onboarding

    • Set up infrastructure, gather all dataset information, and initial onboarding of the most frequently used datasets (1 - 46), ensuring their availability and utility from the get-go.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months
  2. Milestone 2: Intermediate Dataset Integration

    • Concentrate on integrating moderately used datasets (47 - 92), further expanding the platform's diversity and range.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months
  3. Milestone 3: Intermediate Dataset Integration

    • Concentrate on integrating less frequently used datasets (93 - 138), further expanding the platform's diversity and range.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months
  4. Milestone 4: Final Dataset Integration

    • Concentrate on integrating remaining datasets (139 - 191), completing the full range of 191 available sets on OpenPanda.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months

*We will later include a detailed list of datasets along with their corresponding milestones.

Total Budget Requested

| Milestone # | Description | Deliverables | Completion Date | Funding |
| --- | --- | --- | --- | --- |
| 1 | Setup & High-Utility Onboarding | 25% Datasets Onboarded | Q2 24 | $50,000 |
| 2 | Intermediate Dataset Onboarding | 50% Datasets Onboarded | Q3 24 | $50,000 |
| 3 | Intermediate Dataset Onboarding | 75% Datasets Onboarded | Q4 24 | $50,000 |
| 4 | Final Dataset Onboarding | 100% Datasets Onboarded | Q4 24 | $50,000 |
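
The milestone figures can be cross-checked with a short calculation. The sketch below takes the dataset ranges and funding amounts from the roadmap above; the names `MILESTONES` and `summarize` are hypothetical. With these ranges the cumulative shares come to roughly 24%, 48%, 72%, and 100% of the 191 datasets, which the table rounds to 25%, 50%, 75%, and 100%, and the four milestones sum to $200,000.

```python
TOTAL_DATASETS = 191

# (milestone number, first dataset, last dataset, funding in USD),
# taken from the Development Roadmap section.
MILESTONES = [
    (1, 1, 46, 50_000),
    (2, 47, 92, 50_000),
    (3, 93, 138, 50_000),
    (4, 139, 191, 50_000),
]


def summarize(milestones, total):
    """Return (milestone, cumulative datasets, cumulative %, cumulative funding)."""
    rows = []
    onboarded = 0
    spent = 0
    for number, first, last, funding in milestones:
        onboarded += last - first + 1   # ranges are inclusive
        spent += funding
        rows.append((number, onboarded, round(100 * onboarded / total, 1), spent))
    return rows


summary = summarize(MILESTONES, TOTAL_DATASETS)
for number, onboarded, percent, spent in summary:
    print(f"Milestone {number}: {onboarded} datasets ({percent}%), ${spent:,} cumulative")
```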

Maintenance and Upgrade Plans

Post-project, we plan to continually monitor data integrity, ensure data remains indexed and retrievable, and work with the Filecoin community for improvements. Maintenance will be sustained via community contributions and potential future grants.

Team

Team Members

  • Hidde Hoogland
  • Wijnand Schouten
  • Ben Oostland

Team Member LinkedIn Profiles

Team Website

www.dcent.nl

Relevant Experience

Our data preparation business under the DCENT name has already onboarded over 100 PiB in volume globally. After the Slingshot 2.6 program, we moved into genuine data onboarding and refined our techniques. We have developed tools to automate processes and maintain a track record of performance, positioning us uniquely for this task.

Team code repositories

GitHub.com/cryptowhizzard
OpenPanda GitHub repo

Additional Information

We learned about the Open Grants Program through our continued involvement in the Filecoin community and from the Filecoin Foundation's outreach.

@orvn

orvn commented Oct 3, 2023

If this proposal is accepted, happy to support getting the onboarded data accessible on Open Panda (which I was a core contributor to).

@herrehesse
Author

Hello @orvn,

Big thanks for your supportive comment!

Over the past 8 weeks since the initiation of this proposal, we have been exploring multiple pathways of getting the proposal approved. We kicked things off by talking with the "Data Program" teams to get feedback and share our goals. Later, we chatted with Deep, Mara, Porter, Stefaan, and Clara to fine-tune our ideas into a solid plan, with Porter being a huge help in drafting our proposal.

A week ago at the Iceland DEV Summit, @protocolin, @momack2, and I had a great talk about the cool things that could happen if this proposal takes off. We’re excited about the benefits but know that figuring out funding, especially with how the market is now, is a big hurdle.

We want this project to be something everyone supports and benefits from as we work towards our goal: making super important information easy for anyone to access and use on the Filecoin network.

We're fully committed to working through the challenges and keeping the communication clear and constructive. Everyone's support, feedback, and willingness to work together mean a lot as we push forward, aiming to make real progress.

@DSS-AL

DSS-AL commented Oct 5, 2023

This is a fantastic initiative developed by one of the most active SPs in the community. DSS has resources to support this in a meaningful way upon implementation.

@xmcai2016

I support this idea. It'd be great to have Open Panda showcase Filecoin's data retrievability end to end.

@ErinOCon
Collaborator

Hi @herrehesse, thank you for your patience with our review. Unfortunately, due to a shift in funding priorities in this current climate, we will not be moving forward with a grant at this time. If you have any questions for our team, please feel welcome to be in touch at [email protected].

Wishing you the best with your building progress!

@herrehesse
Author

herrehesse commented Mar 29, 2024

@ErinOCon @xmcai2016 @orvn I am trying to get this grant moving through: filecoin-project/community#695 (comment)

@herrehesse
Author

Adding another funding option through RETROPGF this round.
