[ on digital matters and a dissertation ]

Dublin Core (and Omeka): a project prequel

The phrase Dublin Core might well conjure a folksonomy of James Joyce, Sweet Molly Malone, The Pogues (okay, not strictly Dubliners), the Chester Beatty, Jameson and Guinness.

[Image: Guinness]

And if it does, this one’s for you.

Actually, the referenced Dublin is Dublin, Ohio, site of the first exploratory Core workshop; the Core itself is a set of 15 basic descriptors to classify, identify, and categorize just about any digital resource on the web. The idea for Dublin Core, now maintained by what is officially known as the Dublin Core Metadata Initiative (DCMI), originated in 1995 when self-described “freaks…geeks, and the people with sensible shoes” admitted frustration, in those pre-search-engine days, over the challenges of finding materials online. They looked for a solution suited to ordering and locating digital resources, just as library classification systems (think Dewey Decimal) sort and relate multivariate materials.

The answer was Dublin Core, a metadata schematic with standardized categories to structure descriptions of digital resources across disciplines and among diverse kinds of materials and projects. These standards are developed, defined, and continuously refined via DCMI’s international, cross-disciplinary community.

WTF…?

[Image: Dublin Core metadata element definitions]

Dublin Core is extensively documented. Even the most digitally inclined, however, have been known (anecdotally) to mutter some iteration of “What the…?” on first look at the definitions and descriptions explaining the metadata set (right).

And perhaps no wonder.

The terms themselves (title, description, subject, and so on) seem self-evident, but DCMI’s explanations of them obfuscate. What does it mean that “Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme”? Which vocabulary? Whose scheme? Or how about “…reference the resource by means of a string or number conforming to a formal identification system”? And again, what formal identification system? String? Number?

The how-to of Dublin Core assumes basic familiarity with the structure of archival and library metadata. But those of us who use archives and surf databases with scholarly abandon haven’t always created them or worked behind the scenes. We don’t have to understand the thinking behind their organization in order to use them. Therein lies a disjuncture perhaps comparable to the gap between reading a foreign language and actually speaking it.

My own confrontation with Dublin Core occurs primarily through developing projects powered by Omeka, the web-publishing platform for library, museum, archives and scholarly collections and exhibits developed at the Roy Rosenzweig Center for History and New Media (CHNM). What follows is a basic stab at clarifying questions that flash through my head in nanoseconds during work on various projects. The answers take longer and come out of conversations with Omeka developers at CHNM and with other project managers, from reading Omeka forums, and from mind probes of other Dublin Core beginners to identify their points of greatest confusion.

So where do you go after “What the…?” Here are three basic questions and answers, prequels, to jumpstart the use of Dublin Core and an Omeka project.

A string conforming to a formal identification system.

1. Why mess with Dublin Core in the first place? What does it do for my Omeka project?

It’s helpful to have an overview of exactly what Dublin Core allows us to do and why we need to do it. Individual items entered into a database (e.g., documents, still images, oral histories, persons, moving images) are the foundation of Omeka projects. Whether a database contains ten items or several thousand, whether it’s an individual project or a massive institutional archive, one purpose of that (or any) database is efficient information retrieval, a process which requires organizing and identifying the content of the database around consistent criteria or metadata.

As The Rutgers University Repository Metadata Guidelines documentation explains,

Metadata allows people to discover, view, locate, and use…items. It provides the framework and the vocabulary for collection owners to document collections. It instructs computers in how to display images or texts.

And from Tooling Up, a workshop series from Stanford University,

…databases connect files in networks. With this model, a single item can be linked to many different entities instead of a single parent entity. …databases allow a single item to possess multiple connections to other files and folders.

Dublin Core Metadata Elements structure the items in the Omeka digital database from three points of view (sometimes variously named): descriptive (title, creator, subject, description), administrative (publisher, contributor, date, source), and technical (type, language, coverage, relation).

These metadata sets comprise a generic set of categories that document, authenticate, source, and qualify all varieties of items in the database. Essentially, each item is parsed in terms of its content, of its format, and of administrative details surrounding stewardship such as rights and preservation. These categories identify the common elements among items enabling cross-references and relationships among them. Simultaneously, these categories include information that is unique to each item and essential to cataloging, referencing and retrieving it. Items, then, are identified according to their relationships within the collective and according to their individual characteristics.
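For readers who think in code, here is a minimal Python sketch of that idea. It is not how Omeka stores items internally; the helper names `make_item` and `group_by` and the two sample items are invented for illustration. The only thing taken from Dublin Core itself is the list of 15 element names, each of which is optional and repeatable.

```python
from collections import defaultdict

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_item(**fields):
    """Build an item record, rejecting names that are not Dublin Core elements."""
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    # Elements are repeatable, so every value is stored as a list.
    return {
        name: list(value) if isinstance(value, (list, tuple)) else [value]
        for name, value in fields.items()
    }


def group_by(items, element):
    """Map each value of one element to the titles of the items sharing it.

    Assumes every item has at least one title.
    """
    groups = defaultdict(list)
    for item in items:
        for value in item.get(element, []):
            groups[value].append(item["title"][0])
    return dict(groups)


# Hypothetical items, invented for illustration.
letter = make_item(
    title="Letter home",
    creator="A. Soldier",
    date="1943-06-12",
    type="Text",
    subject=["World War II", "Correspondence"],
)
photo = make_item(
    title="Unit photograph",
    date="1944",
    type="Still Image",
    subject=["World War II"],
)

print(group_by([letter, photo], "subject"))
# {'World War II': ['Letter home', 'Unit photograph'], 'Correspondence': ['Letter home']}
```

The shared "World War II" value is what relates the two items, while the title and date fields carry what is unique to each.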

2. Even so, the explanations of some metadata elements don’t make sense. There’s a lot of jargon and what seems like repetition and overlap. I’m confused.

The explanations of Dublin Core Metadata Elements reflect an innate tension between structure and flexibility and between the abstract and the concrete. The Dublin Core Metadata Element set is intended to come as close as possible to being all things to all items, and abstract definitions and descriptions offer the widest latitude for universal use. The trick is defining how the specific items–images, documents, and objects–of a project intersect with those abstract explanations.

Dublin Core Metadata and Omeka encourage thinking about items as parts of a larger structure, as networked components of a narrative. The description and definition of the metadata elements emphasize that a coherent narrative is supported with consistent metadata, or controlled vocabulary–that is, consistent terminology, consistent descriptive parameters, consistent stylistic conventions. Ideally, too, the metadata associated with each item not only creates networks and narratives within a database, but offers the possibility of linking to other databases.

However, the guidelines or explanations for each element of the Dublin Core set are also simply suggestions about how to create a project infrastructure. The content for each field and the form of that content for each of the items in the database–even the inclusion, exclusion, or expansion of metadata fields–is the reflection of project planning, of mediating the database. So, too, is the design or selection of controlled vocabulary, the decision to use only controlled vocabulary or a mix-and-match with open tags and descriptors, and a variety of other parameters. That’s the flexibility of Dublin Core. The content of metadata fields–perhaps particularly the descriptive fields–is responsive to the kinds of items in a project and to the organizational needs.
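One way to picture the mix-and-match of controlled vocabulary and open tags is the small Python sketch below. The vocabulary, the field choices, and the function name `check_item` are all assumptions made up for this example; a real project would define its own list, or borrow an established one.

```python
# Hypothetical controlled vocabulary for one project; the terms are invented.
CONTROLLED = {
    "coverage": {"France", "Italy", "Philippines"},
}


def check_item(item, controlled=CONTROLLED):
    """Return (field, value) pairs that fall outside the controlled vocabulary.

    Fields with no entry in `controlled` are treated as open and never flagged.
    """
    problems = []
    for field, allowed in controlled.items():
        for value in item.get(field, []):
            if value not in allowed:
                problems.append((field, value))
    return problems


item = {
    "title": ["Photograph of a field hospital"],
    "coverage": ["france"],          # lowercase, so inconsistent with the list
    "subject": ["nurses", "tents"],  # open tags, deliberately unchecked
}

print(check_item(item))
# [('coverage', 'france')]
```

Which fields are controlled and which stay open is exactly the kind of project-planning decision described above; the code only enforces whatever was decided.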

3. What are the first steps, then, the prequel to working with Dublin Core in Omeka?

With the caveat: there’s no one way!

Since Omeka is item-based, it makes sense to take a look at the items the database will contain and how they will relate to each other, to start asking questions of and about each item type. What kinds of items (item types) are there? What descriptive, administrative, and technical information is necessary to ensure that each item is fully described, inter-related, contextualized, authenticated, and easily accessed?

Create a visualization: draw a picture; make a chart; define critical data for each item type; link relationships; check for hierarchies. The partial OmniGraffle diagram below served as a starting point for a project containing personal and official materials of World War II veterans and helped to define the contents of the database.

[Image: partial OmniGraffle diagram]

Second and third steps for that same project involved creating a metadata dictionary, that is, defining the content of each metadata field (below) and deciding what metadata about each item type is critical to the integrity and vitality of its representation. Concomitantly, creating the equivalent of a style sheet for the content of the metadata fields clarified where controlled vocabulary was important (geographic terms, for example) and where open terminology boosted item and information retrieval.

[Image: metadata dictionary]
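A metadata dictionary can also be sketched as a small data structure. The Python below is only an illustration under stated assumptions: the item type, the field definitions, the vocabulary, and the names `FieldRule` and `review` are hypothetical and are not taken from the project pictured above or from Omeka.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    definition: str                           # what the field means for this item type
    required: bool = False                    # is the field critical for this item type?
    vocabulary: Optional[frozenset] = None    # None means open terminology


# Hypothetical dictionary entry for a single item type.
METADATA_DICTIONARY = {
    "Oral History": {
        "title": FieldRule("Interviewee name and interview date", required=True),
        "creator": FieldRule("Interviewer", required=True),
        "coverage": FieldRule(
            "Place discussed", vocabulary=frozenset({"France", "Italy"})
        ),
        "subject": FieldRule("Open descriptive tags"),
    },
}


def review(item_type, item):
    """List the ways an item departs from the dictionary for its item type."""
    notes = []
    for name, rule in METADATA_DICTIONARY[item_type].items():
        values = item.get(name, [])
        if rule.required and not values:
            notes.append(f"{name}: required but missing")
        if rule.vocabulary is not None:
            for value in values:
                if value not in rule.vocabulary:
                    notes.append(f"{name}: '{value}' is not in the controlled vocabulary")
    return notes


draft = {
    "title": ["Interview with a veteran, 1998"],
    "coverage": ["Normandy"],
}

for note in review("Oral History", draft):
    print(note)
# creator: required but missing
# coverage: 'Normandy' is not in the controlled vocabulary
```

The `definition` strings play the role of the style sheet: they record, in one place, what each field is supposed to hold for a given item type.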

With those simple steps, the infrastructure of a project begins to take shape and gather cohesion. And once those basic steps are in place, the abstraction of Dublin Core also takes form in the mist, and it’s time to move on, exploring these and similar articles: