Since last I wrote, I’ve been able to learn more about a question that had plagued me: what of IPUMS, that unparalleled resource for census micro-data? For one thing, I was sure they must be thinking about privacy already – micro-data must be handled with care. For another, the 2020 Census “Privacy Budget” work is likely to make IPUMS’s work pretty complicated or even impossible.
My roots are about as humanistic/artsy as they come. I majored in English and minored in Art. Then I got a master’s in literary studies. Then I got my master’s in library and information science. That last degree was the first time the word “science” was part of my life in any way more substantial than checking the box next to required credits for graduation. (It took me a long time, actually, to figure out why “library science” was a pair of words that go together, but be that as it may, my official degree is Master of Library and Information Science, which sounds very scientific to me.) From there I became the Librarian for Languages and Literature here at Carleton. All my previous passions neatly packaged in a single job.
This week I’ve been thinking a lot about an almost-decade-old paper by danah boyd and Kate Crawford. “Six Provocations for Big Data” made a big impression on me back in 2011 purely because epistemology and ways of knowing are my stock in trade, and this paper felt somehow Very True to me at the time. (Plus I’d just heard danah boyd talk at a library conference and was pretty much determined to listen to anything she ever said from then on.) This week I pulled it out of my Zotero library for a refresher, and it felt even more Very True to me now.
Part of my current re-fascination with this piece is that my liaison departments have taken a decided turn toward data. It has become increasingly clear to me that a major role I can play for my new liaison department (Computer Science — hey look! a second science in my life!) is to become a Data Librarian Lite(TM). Not a huge surprise there, and something I’m greatly enjoying learning. But they’re definitely not the only department I serve that’s turning to data. I’ve gotten more and more requests for linguistics corpora (spawning a new page on my Linguistics Research Guide just this weekend). And multiple faculty in the English department are working with digital textual analysis and literature corpora.
So yes, the phrases that stood out to me 8 years ago from boyd and Crawford’s piece stood out again: “Big Data is no longer just the domain of actuaries and scientists. … Big Data creates a radical shift in how we think about research … about the constitution of knowledge, the process of research, how we should engage with information, and the nature and categorization of reality” (pages 2-3).
But then there was this gem: “Claims to objectivity and accuracy are misleading” (boyd and Crawford, page 4). That meant one thing to me in 2011, but a lot has happened since then that has made me understand this statement in new ways. First there’s all the research into bias in algorithms (e.g. Safiya Noble). Then last week there was a talk presented in the Computer Science department here about “differential privacy” and (tangentially in the talk, but centrally to my world) the 2020 Census’s plan to add deliberate small inaccuracies to reported results in order to protect respondents’ privacy. So not only are claims to objectivity and accuracy misleading, but too much accuracy has become harmful enough that we’re backing away from it in key areas.
Meanwhile, as epistemologies shift and the world of research continues to remake itself, I’ll be over here learning to be a librarian who navigates the various worlds of data in addition to being a librarian who absolutely values close reading and minute observation. And I’m loving it.