Common Crawl Foundation and Constellation Network announce partnership to bridge blockchain and AI

The Common Crawl Foundation, a nonprofit organization founded in 2007 dedicated to providing the public with a copy of the Internet, and Constellation Network, a Web3 blockchain ecosystem with a proven track record delivering solutions to the U.S. Department of Defense, today announced a strategic partnership aimed at democratizing and improving the accessibility and utility of web crawl data on blockchain technology for artificial intelligence (AI) and data applications.

This collaboration will explore potential ways to improve the large language models that power AI. It will start with Common Crawl’s massive dataset, which is used by 80% of large language models, spans over 250 billion web pages crawled to date (19 billion in 2024 alone), and comprises nearly nine petabytes of archived crawl data. By leveraging Constellation’s Hypergraph decentralized network, which ensures data immutability, provenance, and auditability, the partnership aims to offer joint solutions for responsible and transparent AI.

With AI set to be a $3 trillion industry by 2030, there is a growing need for secure solutions for sharing datasets used to train large language models, improved storage of queried and cleaned data, opportunities to monetize data, and improved transparency into the source of the data. With Constellation’s unique approach to providing tools for converging existing infrastructure with distributed and decentralized networks, and Common Crawl’s history with data and the growth of data usage, this partnership is poised to further democratize data.

“This partnership represents a significant step forward in securing the trusted distribution of Common Crawl,” said Rich Skrenta, executive director of the Common Crawl Foundation. “By combining our comprehensive web archive with Constellation’s proven blockchain technology, researchers and developers around the world can trust what they receive from Common Crawl and have a model for authenticating large open datasets, such as those used for AI training.”

Ben Jorgensen, CEO of Constellation Network, explains: “The partnership between Constellation Network and Common Crawl underscores the broad adoption of web3 solutions outside the echo chambers of the crypto economy. This alignment continues Constellation’s mission to leverage our zero-trust network as a public good for a data-centric future.” Jorgensen continues: “Our goal is to attract new developers by showcasing capabilities such as integrating immutability into digital workflows, further differentiating ourselves from previous generations of blockchain technology.”

The two organizations will roll out this initiative in stages, starting with a customizable subnet called a Metagraph that incorporates a subset of Common Crawl’s data. This subnet is currently live on their testnet and will soon be incorporated into Hypergraph, Constellation’s public network. More details on the live Metagraph will be announced in the coming weeks, as well as information on how organizations and developers can participate.

SOURCE: PRNewsWire