Primary source searching is hard. It has always been hard. First there was the problem of extremely limited access (unless you had travel funding and archives access). Then, after the digitization boom, there’s the new problem of helping students understand that they can’t search for topics or ideas; they have to search for concrete things from the source description or from the text already in the source. “Postcard” will likely be in the metadata about a postcard, but “depicting domesticity in the 18th century” is just not part of the metadata as a general rule. I tell students that they have to search for people, places, or things, not topics. And even then it won’t be comprehensive. And there’s literally no way to search for “paintings by women” or “novels by Black people.” That’s just not how the systems are set up, I say, over and over and over. You have to literally type in the letters-in-a-row that the original authors typed, or you have to know the name of the creator, I explain, over and over and over. If you want to find out how x group is referenced in newspapers, you have to OR together all the names and words that might have been associated with that group, I instruct, over and over and over.
And therein lies the rub. I am no longer willing to inflict on my students the trauma – the violence – of ORing together all the epithets that have been used in newspapers and legislation and editorial cartoons and broadsides to refer to minority groups. It’s one thing to be presented with these terms once you’ve gained access to a historical document. It’s quite another to have to use your imagination, creativity, and research skills to come up with these terms. And then after all that you have to actively recreate these epithets by typing these terms into a search box?? All neatly strung together with your fancy boolean operators?? No. Doing that myself is painful. Requiring students to do that in order to gain access to the historical record is horrible.
Going through and improving the metadata in our digital collections is going to be hard, expensive, and time-consuming. The historical record is quite large, after all. But we and our vendors must do this work. It’s our ethical, moral, and social responsibility, and the technology exists to make it possible. We’ve been applying subject metadata to secondary and tertiary sources for years, even decades. And especially now that curricula have shifted toward teaching from primary sources more and more, we can’t hide behind the convenient excuse that “this is just the price you pay for studying history.” No. This is not a price we should have to pay. This is certainly not the price that my Black, Indigenous, LGBTQ, Latinx, and other historically marginalized students should have to pay in order to study history and culture.
After yesterday’s post I had a fascinating discussion with someone who codes for a living about whether patents were a viable research resource in CS. First off, they’re extremely hard to understand. And yes, I definitely agree, and it’s a good reminder that when I talk about this with students I also talk explicitly about what I expect they’ll be able to learn from the exercise.
If you find a patent that you think is related to your topic, look at other similarly classified patents to see what problems people are tackling in the field and who is tackling them.
As you look through similarly classified patents, collect vocabulary that you can use in future searches. After all, most search systems simply match letters in a row rather than semantics, so if people are talking about the same thing but using different words to do so, you won’t find that whole side of the conversation.
While reading in order to understand the patented process is probably not feasible for most people, reading instrumentally has been super useful for me when exploring CS topics.
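To make the vocabulary-collecting step concrete, here is a minimal Python sketch of what "letters in a row" matching means and how collected terms might be ORed together. The term list and the sample abstract are hypothetical, and the exact OR and phrase-quoting syntax varies from one database to the next, so treat this as an illustration rather than any particular system's query language.

```python
def build_or_query(terms):
    """Join collected terms into one OR query, quoting multi-word phrases."""
    seen = set()
    parts = []
    for term in terms:
        cleaned = " ".join(term.split())  # normalize stray whitespace
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        parts.append(f'"{cleaned}"' if " " in cleaned else cleaned)
    return " OR ".join(parts)


def literal_match(query_term, text):
    """Mimic a letters-in-a-row search: case-insensitive substring match only."""
    return query_term.lower() in text.lower()


# Hypothetical vocabulary gathered while browsing similarly classified patents.
collected = ["non-volatile memory", "NVM", "persistent memory", "nvm", "bubble memory"]
query = build_or_query(collected)
print(query)

# Hypothetical abstract that uses an older term for a related idea.
abstract = "A controller for magnetic bubble memory devices."
print(literal_match("non-volatile memory", abstract))  # no shared wording, no hit
print(literal_match("bubble memory", abstract))        # the older term does match
```

The point of the sketch is the second-to-last line: a document about a closely related idea is invisible to the search until the older vocabulary is part of the query.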
So far so good, but what really set me thinking was this industry coder’s take on the disadvantages of reading patents. Apparently he’s told not to read patents because knowingly infringing on someone else’s IP brings worse penalties than unknowingly infringing. In order to mitigate penalties, he and his colleagues don’t look at patents. So now I’m wondering how to guide students as they prepare for a world in which, at least some of the time, lack of information has value. And how do I square that with the idea of the very real costs involved in having a bunch of people reinventing wheels and falling into the same pitfalls, all so that if they get sued it won’t be quite so bad? And how do I square that with how this upends the progress narrative of the sciences in general, a set of disciplines which so carefully finds gaps in knowledge and then fills them, or finds the limits of current knowledge and then pushes those limits back bit by bit?
I wonder if it matters what sector you’re in, or even what specific companies you’re working for. And I wonder how liberal arts students might engage with this conundrum in a way that prepares them for life after graduation, whether that life involves CS careers or not.
For 14 years, I’ve been a librarian for a pretty cohesive set of language and literature departments. My BA and MA are both in literary criticism, and I studied a few languages (not fluent in any of them any more, sadly), so my core departments have felt very much like home to me.
As you probably know, I also love computer stuff. I’ve never been formally trained in any of it, but I’m a huge fan and an intrepid experimenter. Plus the CS faculty here are awesome and many of them were friends of mine already, so when the chance came for me to be their liaison I said YES. Besides, I could draw parallels from some of the strategies of language research to the strategies of CS research.
But there’s also a lot that’s very very new to me, starting with exactly how information literacy works in CS… You know, just a small thing. Where does information literacy fit into a curriculum that’s full of coding and not a whole lot of traditional literature searching?
Thankfully the faculty here and the absolutely outstanding CS and STEM librarians at the Library Society of the World have been great partners and resources for me in my first year of being the CS librarian. I’ve also made a point of attending as many presentations and functions in that department as I can, listening for how information literacy works in CS. Here’s what I’ve found so far.
Information literacy in CS – Early observations
You’re going to need a good, well-evaluated corpus to train your AI. You kind of have to know what gets included in a corpus, and how, and where that stuff originated from in order to understand what your AI can or should do with the stuff, or to interpret what it spits out. Misunderstanding your corpus can result in wonky AI results. Luckily, librarians happen to have a long history of working with the kinds of things that get included in large text or metadata corpus-type-thingies — finding, evaluating, and using them!
You’re going to need good data to develop your visualizations. I’m learning a lot from our data librarian here. The one thing I found most interesting this past year is that CS students here have high confidence that they can knit datasets together to get what they want, but they have low levels of experience in determining if the datasets in question are built on compatible methodologies and variables. Next year I’ll spend a lot more time emphasizing that I’m not cautioning against combining datasets because the combining is hard — I’m cautioning against it because the thing you create might be the worst kind of chimera.
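One way to picture the "chimera" check is to compare the data dictionaries of two datasets before joining anything. The sketch below is only an illustration under assumptions: the two surveys, their variable names, units, and definitions are all invented, and a real check would also look at sampling methods and collection dates, which this does not cover.

```python
def compatibility_report(vars_a, vars_b):
    """Compare two data dictionaries of the form {variable: {"unit": ..., "definition": ...}}."""
    shared = sorted(set(vars_a) & set(vars_b))
    conflicts = []
    for name in shared:
        for field in ("unit", "definition"):
            a_val = vars_a[name].get(field)
            b_val = vars_b[name].get(field)
            if a_val != b_val:
                # Same variable name, different meaning: a merge would mislead.
                conflicts.append((name, field, a_val, b_val))
    return {
        "shared": shared,
        "only_a": sorted(set(vars_a) - set(vars_b)),
        "only_b": sorted(set(vars_b) - set(vars_a)),
        "conflicts": conflicts,
    }


# Hypothetical data dictionaries for two survey datasets.
survey_a = {
    "income": {"unit": "USD per year", "definition": "household income"},
    "age": {"unit": "years", "definition": "age at time of survey"},
}
survey_b = {
    "income": {"unit": "USD per year", "definition": "individual income"},
    "region": {"unit": None, "definition": "census region"},
}

report = compatibility_report(survey_a, survey_b)
for name, field, a_val, b_val in report["conflicts"]:
    print(f"{name}: {field} differs ({a_val!r} vs {b_val!r})")
```

Here the two "income" columns would join without any error, which is exactly the problem: the code runs, and the result quietly mixes household and individual figures.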
You’re going to need to think about license agreements and copyright if you’re using stuff that other people built, including APIs. Luckily, librarians have a long history of working with intellectual property topics!
You’re probably going to need to find libraries (the code kind, not the institution kind) or algorithms or code bases to work with. I haven’t really dipped my toes into this water yet, but what I have noticed is that students talk about this process differently than faculty do. Students talk about “looking online” and evaluating for speed, memory needs, and functions. Faculty talk about finding something that will be stable over time, with good documentation and a track record. There are undertones of publisher/author credibility, reliability, and stability threaded throughout. Definitely something for me to think about.
If you want to build something new, you’ll have to know the state of the art, past and present. This is where I’m learning more… and it needs more than a sentence or two, so I’ll give it a couple whole sections.
Finding the current state of the art
How do you know that what you’re building is new? And how do you make sure you’re building constructively on what’s already known? Translated into library-speak: What’s the conversation on this topic, and how does this project move that conversation forward? The information need is familiar to me, but the places to find that information are … not. CS has traditional scholarly publication venues, sure, but unlike my other fields, CS draws heavily on conference papers, research and technical reports, and patents. Not only that, but a bunch of stuff is proprietary — decidedly not the case for the latest interpretations of Hamlet.
So I’ve been trying to build up my skills in the grey literature area. Current strategies include using more familiar library databases to find out the names of people, associations, or institutions that are active in an area, and taking that knowledge over to Google for some advanced googling. I’m curious to see if Inspec Analytics turns out to be helpful with this, too, to help me figure out which institutions are active in an area and might have repositories of research and technical reports.
Patents are playing a larger and larger role in my work because that’s one of the only ways I’ve found of peeking into the proprietary research. That’s where company secrets come right up against the desire to protect IP for future profit. So I’ve been exploring ways of navigating patents and analyzing publication and citation patterns to help me figure out the past and present of a process or topic. Are there key people or companies at play in a particular area? Do those people or companies have other reports available to the public?
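As a rough illustration of the "key people or companies" question, a small tally over exported patent records can surface the names that keep recurring. Everything in this sketch is an assumption: the record layout, the field names, and the placeholder names are invented, and real exports from a patent database will be shaped differently and need cleanup (name variants, subsidiaries) first.

```python
from collections import Counter


def tally_players(records, field):
    """Count how many patents each name appears on for the given field."""
    counts = Counter()
    for record in records:
        # Use a set so a name listed twice on one patent is counted once.
        for name in set(record.get(field, [])):
            counts[name] += 1
    # Sort by count (descending), then by name for a stable, readable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical records, e.g. patents sharing one classification code.
records = [
    {"id": "P1", "assignees": ["Example Corp"], "inventors": ["A. Inventor", "B. Inventor"]},
    {"id": "P2", "assignees": ["Example Corp"], "inventors": ["A. Inventor"]},
    {"id": "P3", "assignees": ["Sample University"], "inventors": ["C. Inventor"]},
]

for name, count in tally_players(records, "assignees"):
    print(f"{count}  {name}")
for name, count in tally_players(records, "inventors"):
    print(f"{count}  {name}")
```

The names at the top of each list are the ones worth taking back to a general search engine to look for publicly available reports.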
Delving into the past to improve the future
There was a fascinating talk here last spring by an engineer working on Non-Volatile Memory. One of her many useful insights during the talk was that back in the 1960s people were working on Mmap, and in the 1980s “Bubble Memory” was set to be the memory of the future. It didn’t become the memory of the future, so most people now don’t know the term or remember the concept, but there are a lot of things about Bubble Memory that are the same as NVM. There’s also a nearly 40-year conversation about developing persistent languages (apparently called “persistent foo,” which is awesome) vs persistent databases. One of the speaker’s points was that finding out these kinds of histories can save people from reinventing wheels, falling into the old pitfalls, and basically repeating history in the worst way.
Of course this set me to wondering how a librarian could coach students in a research strategy to find things that are similar but not necessarily the same, and that don’t share a lot of keywords. And how would you map out and synthesize what you find in meaningful ways, but as efficiently as possible? So next I think I’ll explore the literature around persistent memory, starting with the specifics this speaker mentioned in her talk, and see which search tools give students a good way to discover this kind of overlap with historical avenues of research. Strategy suggestions welcome!
So much more to learn
Soon we’ll launch into my second school year as the CS liaison, and I have a long way to go before I’ll feel like I really know how information works in this field. What do YOU think I should know in order to be the best librarian I can be for this field?
It occurred to me recently that we in higher education talk about the curriculum and the co-curriculum, and there are departments and offices and structures involved in making those parallel structures work (hopefully) to the benefit of our students. But there’s another curriculum at play as well — a curriculum that is every bit as fundamental to our institutional learning outcomes as the formal curriculum but that isn’t “owned” by any department and isn’t administered in any systematic way across the institution. I’ve started thinking of this as the “interstitial curriculum.”
The interstitial curriculum is where students learn the intellectual habits and skills that cut across curricular and co-curricular lines. It doesn’t have a home in the formal curriculum, and it can’t happen exclusively in the co-curriculum, either. Instead, it lives in the multiple and cumulative experiences that individual students have as they live out their college experiences through, among, and between the intertwinings of the curriculum and the co-curriculum. Depending on the institution, these are probably things like writing (never something that any single department can teach fully), metacognition, project management, time management, interpersonal “soft” skills, and yes, information literacy.
These are things that might even be named in mission statements or in institutional and departmental learning objectives, or that tons of faculty say are critical … but there’s often no course or formal home for them in the institutional structures that ensure other learning objectives. Everyone relies on students building these intellectual muscles by working with someone else somewhere else in the institution. They may not be sure who or where or when this work happens or should happen, but they really hope that it does happen because otherwise their own goals for students in their courses or majors can’t happen, or can’t happen well.
In my own work, I live in the tension between the deeply rewarding, mission-critical work that I get to do with students every day, and the dismissal of some who assume that the work I want to do with their students has surely already been done by someone else at some other time — probably in their first year seminar. I live in a liminal space, where literally dozens of departments on campus list learning outcomes directly related to information literacy, the campus mission and learning goals invoke information literacy, and yet no department has a formal plan to ensure that their students get intentional, scaffolded practice with the intellectual habits of information literacy. And I’m not saying that this is a bad place to be! There are many good reasons at play in this state of affairs. But it does mean that my entire existence feels similar to the work of the fascia in the human body: necessary, often invisible, existing between the better-known structures of the body, not well understood, but instrumental in encouraging and even allowing the intellectual work of the disciplines. I live in the spaces between.
It’s a very, very interesting space to inhabit. Not easy, but interesting.