Call for Participation: Archives Unleashed 4.0

Web Archive Datathon

The British Library

June 11 – 13, 2017

Travel grants will be available for US-based graduate students.

Applications for all attendees are due 17 April 2017.

Call for Participation
This event is the fourth workshop in the Archives Unleashed series. Each event is a standalone datathon aimed at building the Web Archiving community and providing a forum for interdisciplinary collaboration.
* * *

The World Wide Web has a profound impact on how we research and understand the past. The sheer amount of cultural information that is generated and, crucially, preserved every day in electronic form presents exciting new opportunities for researchers. Much of this information is captured within web archives.

Web archives often contain hundreds of billions of web pages, ranging from individual homepages and social media posts to institutional websites. These archives offer tremendous potential for social scientists and humanists, and the questions researchers may pose stretch across a multitude of fields. Scholars broaching topics dating back to the mid-1990s will find their projects enhanced by web data. Moreover, scholars hoping to study the evolution of cultural and societal phenomena will find a treasure trove of data in web archives. In short, web archives offer the ability to reconstruct large-scale traces of the relatively recent past.

While there has been considerable discussion about web archive tools and datasets, few forums or mechanisms for coordinated, mutually informing development efforts have been created. Our series of datathons presents an opportunity to collaboratively unleash our web collections, exploring cutting-edge research tools while fostering a broad-based consensus on future directions in web archive analysis.

This event will bring together a small group of 35 – 45 participants to collaboratively develop new open-source tools and approaches to web archives, and to kick off collaboratively inspired research projects. Researchers should be comfortable with command line interactions, and knowledge of a scripting language (such as but not limited to Python) is strongly desired. By bringing together a group of like-minded scholars and programmers, we hope to begin building a unified analytic production effort and to continue coalescing this nascent research community.

At this event, we hope to continue to converge on a shared vision of future directions in the use of web archives for inquiry in the humanities and social sciences in order to build a community of practice around various web archive analytics platforms and tools.

The event is sponsored by the British Library, Rutgers University, University of Waterloo, the National Science Foundation and the International Internet Preservation Consortium. Thanks to generous support from the British Library, lunch and refreshments during the event will be covered. In addition, there will be a reception the first night and a dinner the second night, supported by funding from the NetLab at Aarhus University, the Social Sciences and Humanities Research Council, and Rutgers University.

We are also providing sample datasets for participants to work on during the datathon, although participants are welcome to bring their own. Included datasets are:

  • The .gov web archive covering the American government domain
  • The End of Term Web Archives (.gov/.mil), from 2008, 2012, and 2016
  • Social media collections from the 2016 archive
  • Canadian Political Parties and Political Interest Groups collection and other datasets to be announced
  • UK Government Web Archive – 2010 UK General Election Collection: Data from crawls of UK Ministerial Departments, delivery channels, and a few other selected UK Central Government sites launched one month before the 2010 UK General Election and on the day after the new coalition government was formed (12th May 2010).
  • UK Government Web Archive – Public Inquiries, Inquests, Royal Commissions, Reviews and Investigations: Data from crawls of around 70 UK Government-sponsored public inquiries, inquests, royal commissions, reviews and investigations. The crawl dates range from 2000 to 2016 but cover inquiries which reported as long ago as 1996.

Those interested in participating should send a 250-word expression of interest and a CV to Ian Milligan by 17 April 2017 with “Archives Unleashed” in the subject line. This expression of interest should address the scholarly questions that you will be bringing to the datathon, and what datasets you might be interested in either working with or bringing to the event. Applicants will be notified by 22 April 2017.

We expect to be able to offer a limited number of travel grants to US-based doctoral students; preference will be given to those who have not participated in the Archives Unleashed program in the past, although we welcome returning participants. These grants can cover up to $1,000 USD in expenses. If you are eligible, please indicate in your expression of interest that you would like to be considered for a travel grant.

On behalf of the organizers,

Matthew Weber (Rutgers University), Ian Milligan (University of Waterloo), Jimmy Lin (University of Waterloo), and Olga Holownia (British Library).