A new way to archive the web

One might think that platforms such as Archive.org have done a decent job of providing an open archive of the web. A team of researchers believes that more could and should be done, however, and proposes an innovative solution in a newly published paper.

The researchers propose an open source, collaborative platform called Cobweb, which would let the archiving community build a comprehensive web archive by coordinating its existing efforts. They reason that sharing this responsibility across a number of institutions would produce an archive that is updated more frequently, built faster, and costs less.

Better archiving

The researchers use the Arab Spring to illustrate their point. Those rapidly unfolding events played out across blogs, official media and social media at once, which posed a major challenge for archiving efforts.

“Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant web sites,” they say. “Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any curator or institution.”

“Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities.”

Cobweb relies heavily on collaboration among its members. This sets it apart from existing efforts, which involve some collaboration but remain primarily individual endeavors.

“As a centralized catalog of aggregated collection and seed-level descriptive metadata, Cobweb will enable a range of desirable collaborative, coordinated, and complementary collecting activities,” the team say. “Cobweb will leverage existing tools and sources of archival information, exploiting, for example, the APIs being developed for Archive-It to retrieve holdings information for over 3,500 collections from 350 institutions.”

If the project comes to fruition, it will be hosted at the California Digital Library, with initial data drawn from metadata collected from partners and stakeholders. The project is expected to take a year, with a public release planned for the IIPC General Assembly in April 2017 so that the community can give feedback.

The Horizons Tracker
