
Data Management and the Humanities

Humanities Data: Document fragments from church archives in Cuba


This morning my colleagues Kristin Partlo and Sarah Calhoun presented with me on data management plans (DMPs). The session was mostly a conversation about some example DMPs from the collection of successful DMPs recently released by the National Endowment for the Humanities, followed by conversations about how we might talk to faculty (using this DMP template as food for thought). But from this conversation a few themes crystallized for me.

Why would humanists care?

I think we often hear that “sharing the stuff of my research with the world” isn’t a huge motivator in the humanities. I’m not sure how broadly true that is, but it’s true enough with enough people that it bears thinking about. And I have heard skepticism, usually in the form of “nobody else will care about this stuff I’ve collected,” so how do we address that?

Sometimes I think maybe it’s a matter of vocabulary, so talking about creating “an archive” or “a digital collection” might capture imaginations where “manage/share your data” won’t. Even talking about how a bibliography is a described dataset can be useful, because humanists are very familiar with practices of collecting, organizing, and sharing bibliographic information, usually at the ends of articles, books, and syllabi, but sometimes independently. When you think about bibliographies, humanists have actually been doing data management since forever. In this context, we care about versions (editions), standardized and encoded “data fields” (so that other researchers know if you’re referring to the title of a chapter, article, or book, for example), and durable URLs whenever possible. Bibliographies are rich with descriptive and preservationist practices that can help inform management and sharing of files and information more broadly.

I think another aspect of why humanists might care involves thinking about sharing with the sympathetic collaborator that is your future self. Your future self will forget the ins and outs of where you put your image files or how you named them, or what the columns on your spreadsheet actually mean. Our computers are chock-full of the stuff of our research: PDFs, draft versions, images, audio, video. Knowing where all of those things are requires managing all of that data just so that you can find things again later.

Learn through analogy with the known

Kristin talks about how the monograph is largely self-describing. It has title, author, and publisher information in predictable spaces. There’s the table of contents, often an index and bibliography, and things like introductions and conclusions that describe the book for you.

Then there are style guides and formal or informal glossaries that people adopt, and these serve to help make your data (“data” writ large) understandable and consistent for other readers.

These are familiar things, so it’s easier to point to them and remind ourselves that we’ve already seen the usefulness of self-describing units of scholarship and of somewhat standardized best practices. Now we just need to apply the same thinking to the digital stuff we’re working with more and more these days.

And in the past, libraries and archives were the main places that managed the sharing of shareable humanities data (primary and secondary sources), but the sharing involved researchers traveling to those collections (humanities “datasets”) to use them. Now individual researchers can create collections, or use collections without physically traveling from dataset to dataset. But this also means that researchers now have more responsibility for the description and standardization that libraries, archives, and publishers used to do much of. So yes, the work feels different, but it’s built on the same principles that humanists already value.

Formal vs informal data management

One of the interesting themes of our session and all the other data-related sessions I attended is that people talk a lot about the data management requirements of grant-funded projects. Meanwhile, I haven’t supported that work at all, but I have helped quite a few people manage their own individual or collaborative non-grant-funded projects. And I really think that data management becomes much more alive and broadly useful if I think about how the best practices identified and codified by grant funding agencies help us think about best practices for regular, everyday digital life, right down to the daily action of naming a file and putting it into a folder on my computer. For me and for the people I’ve worked with so far, formal DMPs are alien things that don’t intersect with my life and research. But the spirit and practices reflected in those DMPs? THOSE I care deeply about. So perhaps shifting language from compliance to best practices, and shifting focus from grants to everyday organizational practices, can help make data management less of an alien object that only scientists and social scientists ever touch.
