In the WITNESS Media Archive, we rely on many open-source and openly documented (i.e., with published specifications) resources to manage our collection of human rights videos created by our partners all over the world. This is the first in a series of posts in which I'll explore some of the tools and standards we use in the archive, in the spirit of information sharing and with the hope of sparking some discussion.
Today I'll look at the metadata standards that we use to describe and manage the videos in our collection. Metadata is structured information that describes an information resource. The publisher information and ISBN on the inside cover of a book, for example, are metadata; so is the bitrate and codec information embedded in a digital video file. A metadata standard is a set of rules for interpreting and expressing metadata that has been widely adopted by a community. Metadata and metadata standards matter because they make it possible to find, sort, interpret, aggregate, and exchange information.
The WITNESS Media Archive contains around 4,000 hours of footage (and growing!), so it's important that we describe the videos in our catalog consistently so that we can find them later. To do this, we draw on openly documented metadata standards such as the Public Broadcasting Metadata Dictionary Project, otherwise known as PBCore. The fields in the WITNESS Catalog mostly align with the hierarchies, elements, and meanings specified in PBCore, which lets us create PBCore-compliant, human- and machine-readable exports that others can exchange and use. Using a recognized standard means that our catalog records can be understood outside of WITNESS, and it lets us take advantage of tools that others have built around the standard.
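To make the idea of a PBCore export more concrete, here is a minimal sketch that turns a catalog record into a PBCore XML description document using Python's standard library. The record layout (a plain dict with `id`, `id_source`, `title`, `subjects`, `description` keys), the helper names, and the sample values are all hypothetical and are not how the WITNESS Catalog actually stores or exports data; the output is also not validated against the PBCore schema.

```python
import xml.etree.ElementTree as ET

# PBCore 2.x namespace URI, serialized here as the default namespace.
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("", PBCORE_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified PBCore tag name."""
    return f"{{{PBCORE_NS}}}{tag}"


def record_to_pbcore(record: dict) -> ET.Element:
    """Map a catalog record (hypothetical dict layout) to a PBCore description document.

    The result is NOT validated against the PBCore XSD.
    """
    doc = ET.Element(q("pbcoreDescriptionDocument"))
    # pbcoreIdentifier carries a "source" attribute naming who issued the ID.
    ident = ET.SubElement(doc, q("pbcoreIdentifier"), source=record["id_source"])
    ident.text = record["id"]
    title = ET.SubElement(doc, q("pbcoreTitle"), titleType="Main")
    title.text = record["title"]
    # One pbcoreSubject element per topic term.
    for term in record.get("subjects", []):
        ET.SubElement(doc, q("pbcoreSubject")).text = term
    ET.SubElement(doc, q("pbcoreDescription")).text = record["description"]
    return doc


# Invented sample record, for illustration only.
record = {
    "id": "WIT-0001",
    "id_source": "WITNESS Catalog",
    "title": "Community meeting on land rights",
    "subjects": ["Land rights", "Housing"],
    "description": "Raw footage of a community meeting.",
}
xml_text = ET.tostring(record_to_pbcore(record), encoding="unicode")
print(xml_text)
```

Because the element names come from the published standard rather than from an internal schema, any tool that reads PBCore can parse a file like this without knowing anything about how the catalog is built.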
Of course, a single standard cannot always meet every need of an archive. PBCore focuses primarily on descriptive and technical metadata, so to document other kinds of information, such as preservation activities, we turned to PREMIS, a standard designed specifically for preservation metadata. Most of our technical metadata (e.g., bitrate, frame rate, codec) is translated into PBCore from a MediaInfo report. (MediaInfo is not a metadata standard but a widely adopted open-source tool for extracting metadata from digital media files; it uses its own data dictionary and structure.)
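The translation from a MediaInfo report into PBCore is essentially a crosswalk: a table that maps each field in one vocabulary to its counterpart in the other. The sketch below illustrates that idea for a single video track. It assumes a report shaped like MediaInfo's JSON output (a `media` object holding a list of `track` objects tagged with `@type`) and assumes the field names `Format`, `BitRate`, `FrameRate`, `Width`, and `Height`; the mapping table, function name, and sample values are hypothetical, not WITNESS's actual crosswalk, and the result is not validated against the PBCore schema.

```python
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("", PBCORE_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified PBCore tag name."""
    return f"{{{PBCORE_NS}}}{tag}"


# Hypothetical crosswalk: MediaInfo video-track field -> PBCore essence-track element.
# MediaInfo field names are assumed from its JSON output.
CROSSWALK = {
    "Format": "essenceTrackEncoding",
    "BitRate": "essenceTrackDataRate",  # units are not recorded in this sketch
    "FrameRate": "essenceTrackFrameRate",
}


def video_track_to_pbcore(report: dict) -> ET.Element:
    """Build an instantiationEssenceTrack from the first Video track in a report."""
    tracks = report["media"]["track"]
    video = next(t for t in tracks if t.get("@type") == "Video")

    essence = ET.Element(q("instantiationEssenceTrack"))
    ET.SubElement(essence, q("essenceTrackType")).text = "Video"
    for mi_field, pb_element in CROSSWALK.items():
        if mi_field in video:  # skip fields MediaInfo did not report
            ET.SubElement(essence, q(pb_element)).text = video[mi_field]
    # Width and height are combined into one value, written here as "WIDTHxHEIGHT".
    if "Width" in video and "Height" in video:
        ET.SubElement(essence, q("essenceTrackFrameSize")).text = (
            f"{video['Width']}x{video['Height']}"
        )
    return essence


# Invented sample report, for illustration only.
report = {
    "media": {
        "@ref": "interview.mov",
        "track": [
            {"@type": "General", "Format": "MPEG-4"},
            {"@type": "Video", "Format": "AVC", "BitRate": "8000000",
             "FrameRate": "29.970", "Width": "1920", "Height": "1080"},
            {"@type": "Audio", "Format": "AAC"},
        ],
    }
}
essence = video_track_to_pbcore(report)
print(ET.tostring(essence, encoding="unicode"))
```

Keeping the mapping in a single table makes the crosswalk easy to review and extend: supporting another field means adding one entry rather than another branch of code.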
Sometimes there is no existing standard for the information we want to track. For example, we do not rely on a standard for our extensive rights and security information, such as whether we can show the name or face of a person depicted in a video. There is also no generally accepted controlled vocabulary for human rights topic terms. In cases like these, we have developed our own internal standards, which we document carefully and have made available to other organizations (look out for our Subjects Thesaurus, which we plan to publish here in the future!).