By Tessa Fallon
The web has given human rights organizations unprecedented access to global audiences. However, a website will last only as long as funds are available for maintenance and hosting. Beyond the practical challenges that every website faces, in many places there is also the possibility that opponents, whether religious, ideological, governmental, or otherwise, will sabotage or attempt to remove a human rights-related website. Examples include denial-of-service attacks or, in the most extreme case, the cutoff of all Internet service providers (the Internet “kill” switch).
In 2009, Columbia University Libraries received a grant from the Mellon Foundation to explore web archiving program development. The collection at the center of our web archiving program is the Human Rights Web Archive. The initial and prevailing focus of this collection is the websites of human rights NGOs. As the project has progressed, we have also included national human rights institutions, truth commissions, tribunals, and blogs related to human rights.
Web archiving may be defined as the selection, harvesting, and preservation of websites. An archived website is more than a snapshot of a website: it is a working copy of the website as it existed at the time of capture. Sites are captured by a web crawler (such as Heritrix) and then displayed using specialized software (such as the Wayback Machine). We currently crawl the sites of the Human Rights Web Archive four times a year. For active sites, there may be 12 captures of a single site, taken at different points in time.
Why are we doing this? In brief, to preserve online resources for future researchers and activists. Archiving the sites of human rights organizations ensures, to a certain degree, that the website content will be preserved in the context of the original site and will remain accessible even if the original site becomes unavailable. As with all things digital, long-term preservation is still very much an open question, but web archiving is a step towards an additional degree of protection. In our case, our data will persist as long as the Internet Archive does, stored in WARC (Web ARChive) files, a format that is an international standard.
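For readers curious about what a WARC file actually contains, the following is a minimal Python sketch of a single WARC "response" record: a short block of named header fields, a blank line, and then the captured HTTP response. It is an illustration only, not the way our crawls or the Internet Archive's are produced. The function name `build_warc_response_record` and the example.org address are hypothetical, and crawlers such as Heritrix write additional fields and records that are omitted here.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Assemble one simplified WARC/1.0 'response' record (illustrative only).

    Assumes captured_at is already expressed in UTC.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        # Every record carries a globally unique identifier.
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        # The capture time is what lets an archive tell captures apart.
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLF pairs after the content block.
    return header_block + http_response + b"\r\n\r\n"


# Hypothetical capture of a made-up page.
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
record = build_warc_response_record(
    "http://example.org/",
    payload,
    datetime(2011, 9, 1, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

Because the original address and the capture date travel with the content itself, a WARC file can be read by other tools without depending on the software that created it.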
In September 2011, we crawled over 300 GB of data, 4 million URLs, and 800,000 files. While the live sites are currently available to everyone, what are the chances of being able to find the same particular pages, documents, and video in five years? “Link rot,” or the disappearance or changing of URLs, makes bookmarking pages or storing URLs unreliable. Downloading every reference or page of interest is an untenable solution. However, an archived URL will not change (although modes of access can change). If you want to see the URL you bookmarked in 2009, you can search for a 2009 capture of the URL in a web archive.
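As a rough illustration of looking up a dated capture, the sketch below builds an address for the Internet Archive's public Wayback Machine, which accepts URLs of the form `https://web.archive.org/web/<timestamp>/<original URL>` and redirects to the capture closest to the requested time. The helper name `wayback_url` and the example.org address are hypothetical. Other web archives, including institution-specific collections, may use different address patterns.

```python
from datetime import datetime

# Base address of the Internet Archive's public Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for a URL near a given date."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# A made-up bookmark from mid-2009.
print(wayback_url("http://example.org/report.pdf", datetime(2009, 6, 1)))
```

The archived address embeds both the original URL and the date, which is why it stays stable even after the live page moves or disappears.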
Sierra Leone TRC website, as captured by the Internet Archive on November 15, 2005.
Temporary bodies or organizations, such as truth commissions or tribunals, are particularly vulnerable to the ephemerality of the web. After the group ceases operations, it is not always clear who has responsibility for the website. Sites vanish (Sierra Leone TRC), are maintained by dedicated individuals (Greensboro TRC), or simply exist as static sites until the domain expires. In the last case, site functionality usually begins to fail (link rot, site corruption/spam, pages not found, etc.) long before the domain expires (Peru TRC).
Tracking Global Movements and Trends
In addition to ensuring enduring access, the Human Rights Web Archive also provides a sense of context and temporality for the state of human rights in the world from 2009 to the present. Because of the interconnectedness of the global human rights community, news and documents proliferate across sites, and events occurring in one region are often documented by organizations located in other regions (this is especially true for globally focused organizations like Amnesty International or Human Rights Watch). The capture of hundreds of human rights websites allows a web archive user to track reporting of a single event or series of events across organizations at a specific point in time.
During the Arab Spring, a number of web archiving initiatives (Columbia, American University in Cairo, Internet Memory Foundation, Internet Archive) collected websites reporting on the protests in the Middle East and North Africa. These collections document not only the organizations’ sites, but also the momentum of a movement that swept over a dozen countries. Reflected in these archived sites are the multitudes of voices, cultures, and ideologies that were part of the Arab Spring, as it was occurring. The archiving of many of these websites continues, capturing the ongoing struggle and unrest in the Arab world and creating a living archive.
As the Human Rights Web Archive and other web archives continue to grow, it is also possible to study trends in human rights activism, such as changing tactics of human rights defenders and government responses to human rights violations. The archives also reflect how human rights organizations are communicating via social media. After our June 2011 crawl of 369 sites, reviewers found that at least 115 sites had Facebook pages/profiles, 98 had Twitter accounts, 77 published material on YouTube or had their own YouTube channel, and 19 had Flickr accounts.
Spotlight on Local Organizations
Another objective of the Human Rights Web Archive is to preserve the websites of smaller NGOs working on a local or national scale in the Global South, and to provide enhanced discovery of these organizations and their online resources. In any commercial search engine, results for human rights-related searches tend to give more weight to information and publications from the major international organizations. This makes sense for the purposes of search engines, but it tends to obscure the work of smaller organizations, which may be far more relevant for the searcher.
Gays and Lesbians of Zimbabwe (GALZ) website
For example, we recently received permission to archive the website of Gays and Lesbians of Zimbabwe (GALZ), a non-governmental organization based in Harare, Zimbabwe. In a basic Google search for “gay rights Zimbabwe”, GALZ did not appear until page seven of the search results. Seven of the top ten results were from American or British media. When I restricted the search to the exact phrase “gay rights Zimbabwe”, the GALZ website did not appear at all. The only search that returned the organization in the top ten was an exact search for the organization’s name.
This example illustrates the need to increase discovery for smaller human rights organizations, as well as the importance of metadata. HuriSearch, a search engine created by the organization HURIDOCS, provides focused searching of live human rights websites. For archived human rights sites, Columbia catalogers create a full bibliographic record for our online catalog (CLIO), a record in the global catalog WorldCat, and a record for our Archive-It collection page.
Help Us Grow the Human Rights Web Archive
For more information or to nominate a site to be included in the Archive, please visit our website. For more about web archiving, please visit the website of the International Internet Preservation Consortium (IIPC).
The University of Texas at Austin is also archiving human rights websites as part of its Human Rights Documentation Initiative. Do you know of other preservation and archiving projects aimed at human rights content on the Internet? Please share them below in the comments section.
The WITNESS website, as captured by the Internet Archive on December 22, 1996.
Tessa Fallon is a Web Collection Curator at Columbia University Libraries. Her work is currently focused on the development of the Human Rights Web Archive. Tessa is co-director of the International Council on Archives’ Human Rights Archives Directory Project and co-chair of the Society of American Archivists Human Rights Roundtable.