For 14 years, I’ve been a librarian for a pretty cohesive set of language and literature departments. My BA and MA are both in literary criticism, and I studied a few languages (not fluent in any of them any more, sadly), so my core departments have felt very much like home to me.
As you probably know, I also love computer stuff. I’ve never been formally trained in any of it, but I’m a huge fan and an intrepid experimenter. Plus the CS faculty here are awesome and many of them were friends of mine already, so when the chance came for me to be their liaison I said YES. Besides, I could draw parallels from some of the strategies of language research to the strategies of CS research.
But there’s also a lot that’s very very new to me, starting with exactly how information literacy works in CS… You know, just a small thing. Where does information literacy fit into a curriculum that’s full of coding and not a whole lot of traditional literature searching?
Thankfully the faculty here and the absolutely outstanding CS and STEM librarians at the Library Society of the World have been great partners and resources for me in my first year of being the CS librarian. I’ve also made a point of attending as many presentations and functions in that department as I can, listening for how information literacy works in CS. Here’s what I’ve found so far.
Information literacy in CS – Early observations
- You’re going to need a good, well-evaluated corpus to train your AI.
You kind of have to know what gets included in a corpus, and how, and where that stuff originated in order to understand what your AI can or should do with the stuff, or to interpret what it spits out. Misunderstanding your corpus can result in wonky AI results. Luckily, librarians happen to have a long history of working with the kinds of things that get included in large text or metadata corpus-type-thingies: finding, evaluating, and using them!
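As a toy illustration of what "knowing your corpus" can look like in practice (the mini-corpus and source names below are entirely made up), even a quick provenance profile can surface surprises before any training happens:

```python
from collections import Counter

# Hypothetical mini-corpus: each record is (document_text, source).
# In a real project these would come from the corpus's metadata.
corpus = [
    ("To be, or not to be...", "project-gutenberg"),
    ("RT @user: lol this is wild", "twitter-scrape"),
    ("RT @user: no way!!", "twitter-scrape"),
    ("Patent claims: 1. A method for...", "uspto"),
]

# Profile provenance before training: a corpus dominated by one
# source will teach a model that source's quirks.
by_source = Counter(source for _, source in corpus)
for source, n in by_source.most_common():
    print(f"{source}: {n} docs ({n / len(corpus):.0%})")
```

Nothing fancy, but it is exactly the finding-and-evaluating work librarians already know how to teach.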
- You’re going to need good data to develop your visualizations.
I’m learning a lot from our data librarian here. The one thing I found most interesting this past year is that CS students here have high confidence that they can knit datasets together to get what they want, but low levels of experience in determining whether the datasets in question are built on compatible methodologies and variables. Next year I’ll spend a lot more time emphasizing that I’m not cautioning against combining datasets because the combining is hard; I’m cautioning against it because the thing you create might be the worst kind of chimera.
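A minimal sketch of the kind of chimera I mean, using made-up data and pandas: the merge itself succeeds without complaint, which is precisely the trap.

```python
import pandas as pd

# Two hypothetical datasets that look joinable but were built on
# different methodologies: one records income per person, the other
# per household, and their region coverage only partly overlaps.
per_person = pd.DataFrame(
    {"region": ["A", "B", "C"], "income": [31_000, 29_500, 33_200]}
)
per_household = pd.DataFrame(
    {"region": ["B", "C", "D"], "income": [61_000, 58_400, 64_100]}
)

# indicator=True flags rows that exist in only one dataset, a first
# clue that the two sources don't describe the same population.
merged = per_person.merge(
    per_household, on="region", how="outer",
    suffixes=("_person", "_household"), indicator=True,
)
print(merged["_merge"].value_counts())

# Even the rows present in both aren't comparable: the two income
# columns measure different units of analysis, so averaging or
# differencing them would manufacture exactly the chimera to avoid.
```

The merge runs cleanly either way; only a deliberate check of coverage and variable definitions reveals the problem.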
- You’re going to need to think about license agreements and copyright if you’re using stuff that other people built, including APIs.
Luckily, librarians have a long history of working with intellectual property topics!
- You’re probably going to need to find libraries (the code kind, not the institution kind) or algorithms or code bases to work with.
I haven’t really dipped my toes into this water yet, but what I have noticed is that students talk about this process differently than faculty do. Students talk about “looking online” and evaluating for speed, memory needs, and functions. Faculty talk about finding something that will be stable over time, with good documentation and a track record. There are undertones of publisher/author credibility, reliability, and stability threaded throughout. Definitely something for me to think about.
- If you want to build something new, you’ll have to know the state of the art, past and present.
This is where I’m learning more… and it needs more than a sentence or two, so I’ll give it a couple whole sections.
Finding The Current State of the Art
How do you know that what you’re building is new? And how do you make sure you’re building constructively on what’s already known? Translated into library-speak: What’s the conversation on this topic, and how does this project move that conversation forward? The information need is familiar to me, but the places to find that information are … not. CS has traditional scholarly publication venues, sure, but unlike my other fields, CS draws heavily on conference papers, research and technical reports, and patents. Not only that, but a bunch of stuff is proprietary — decidedly not the case for the latest interpretations of Hamlet.
So I’ve been trying to build up my skills in the grey literature area. Current strategies include using more familiar library databases to find out the names of people, associations, or institutions that are active in an area, and taking that knowledge over to Google for some advanced googling. I’m curious to see if Inspec Analytics turns out to be helpful with this, too, to help me figure out which institutions are active in an area and might have repositories of research and technical reports.
Patents are playing a larger and larger role in my work because they’re one of the only ways I’ve found of peeking into proprietary research. That’s where company secrecy comes right up against the desire to protect IP for future profit. So I’ve been exploring ways of navigating patents and analyzing publication and citation patterns to help me figure out the past and present of a process or topic. Are there key people or companies at play in a particular area? Do those people or companies have other reports available to the public?
Delving into the past to improve the future
There was a fascinating talk here last spring by an engineer working on Non-Volatile Memory. One of her many useful insights during the talk was that back in the 1960s people were working on mmap, and in the 1980s “Bubble Memory” was set to be the memory of the future. It didn’t become the memory of the future, so most people now don’t know the term or remember the concept, but there are a lot of things about Bubble Memory that are the same as NVM. There’s also a nearly 40-year conversation about developing persistent languages (apparently called “persistent foo,” which is awesome) vs persistent databases. One of the speaker’s points was that finding out these kinds of histories can save people from reinventing wheels, falling into the old pitfalls, and basically repeating history in the worst way.
Of course this set me to wondering how a librarian could coach students in a research strategy to find things that are similar but not necessarily the same, and that don’t share a lot of keywords. And how would you map out and synthesize what you find in meaningful ways, but as efficiently as possible? So next I think I’ll explore the literature around persistent memory, starting with the specifics this speaker mentioned in her talk, and see which search tools give students a good way to discover this kind of overlap with historical avenues of research. Strategy suggestions welcome!
So much more to learn
Soon we’ll launch into my second school year as the CS liaison, and I have a long way to go before I’ll feel like I really know how information works in this field. What do YOU think I should know in order to be the best librarian I can be for this field?