After yesterday’s post I had a fascinating discussion with someone who codes for a living about whether patents are a viable research resource in CS. His first objection was that they’re extremely hard to understand. And yes, I definitely agree, and it’s a good reminder that when I talk about this with students I also talk explicitly about what I expect they’ll be able to learn from the exercise.
If you find a patent that you think is related to your topic, look at other similarly classified patents to see what problems people are tackling in the field and who is tackling them.
As you look through similarly classified patents, collect vocabulary that you can use in future searches. After all, most search systems simply match letters in a row rather than semantics, so if people are talking about the same thing but using different words to do so, you won’t find that whole side of the conversation.
While reading in order to understand the patented process is probably not feasible for most people, reading instrumentally has been super useful for me when exploring CS topics.
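To make the vocabulary point concrete, here is a minimal sketch in Python. It assumes a search tool that does nothing smarter than exact substring matching, which is a simplification of what real systems do. The patent titles and the list of collected terms are invented for illustration.

```python
# Toy illustration: literal keyword matching versus searching with collected vocabulary.
# The titles and the vocabulary list below are made up for this example.

TITLES = [
    "Magnetic bubble memory device with improved retention",
    "Non-volatile memory controller and method",
    "Persistent storage class memory allocation",
    "Volatile cache eviction policy",
]


def literal_search(query, titles):
    """Return titles containing the query as an exact (case-insensitive) substring."""
    q = query.lower()
    return [t for t in titles if q in t.lower()]


def expanded_search(terms, titles):
    """Return titles containing at least one of the collected terms."""
    lowered = [term.lower() for term in terms]
    return [t for t in titles if any(term in t.lower() for term in lowered)]


# Hypothetical vocabulary gathered while browsing similarly classified patents.
VOCABULARY = ["non-volatile memory", "bubble memory", "persistent storage"]

single_term_hits = literal_search("non-volatile memory", TITLES)
expanded_hits = expanded_search(VOCABULARY, TITLES)

print(len(single_term_hits), "hit(s) for one term")
print(len(expanded_hits), "hit(s) for the collected vocabulary")
```

With these made-up titles, the single term finds one record while the collected vocabulary finds three. The two extra records describe related work in different words, which is exactly the side of the conversation a single search term misses.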
So far so good, but what really set me thinking was this industry coder’s take on the disadvantages of reading patents. Apparently he’s told not to read patents because knowingly infringing on someone else’s IP brings worse penalties than unknowingly infringing. In order to mitigate penalties, he and his colleagues don’t look at patents. So now I’m wondering how to guide students as they prepare for a world in which, at least some of the time, lack of information has value. And how do I square that with the idea of the very real costs involved in having a bunch of people reinventing wheels and falling into the same pitfalls, all so that if they get sued it won’t be quite so bad? And how do I square that with how this upends the progress narrative of the sciences in general, a set of disciplines which so carefully finds gaps in knowledge and then fills them, or finds the limits of current knowledge and then pushes those limits back bit by bit?
I wonder if it matters what sector you’re in, or even what specific companies you’re working for. And I wonder how liberal arts students might engage with this conundrum in a way that prepares them for life after graduation, whether that life involves CS careers or not.
For 14 years, I’ve been a librarian for a pretty cohesive set of language and literature departments. My BA and MA are both in literary criticism, and I studied a few languages (not fluent in any of them any more, sadly), so my core departments have felt very much like home to me.
As you probably know, I also love computer stuff. I’ve never been formally trained in any of it, but I’m a huge fan and an intrepid experimenter. Plus the CS faculty here are awesome and many of them were friends of mine already, so when the chance came for me to be their liaison I said YES. Besides, I could draw parallels from some of the strategies of language research to the strategies of CS research.
But there’s also a lot that’s very very new to me, starting with exactly how information literacy works in CS… You know, just a small thing. Where does information literacy fit into a curriculum that’s full of coding and not a whole lot of traditional literature searching?
Thankfully the faculty here and the absolutely outstanding CS and STEM librarians at the Library Society of the World have been great partners and resources for me in my first year of being the CS librarian. I’ve also made a point of attending as many presentations and functions in that department as I can, listening for how information literacy works in CS. Here’s what I’ve found so far.
Information literacy in CS – Early observations
You’re going to need a good, well-evaluated corpus to train your AI. You kind of have to know what gets included in a corpus, and how, and where that stuff originated from in order to understand what your AI can or should do with the stuff, or to interpret what it spits out. Misunderstanding your corpus can result in wonky AI results. Luckily, librarians happen to have a long history of working with the kinds of things that get included in large text or metadata corpus-type-thingies — finding, evaluating, and using them!
You’re going to need good data to develop your visualizations. I’m learning a lot from our data librarian here. The one thing I found most interesting this past year is that CS students here have high confidence that they can knit datasets together to get what they want, but they have low levels of experience in determining if the datasets in question are built on compatible methodologies and variables. Next year I’ll spend a lot more time emphasizing that I’m not cautioning against combining datasets because the combining is hard — I’m cautioning against it because the thing you create might be the worst kind of chimera.
You’re going to need to think about license agreements and copyright if you’re using stuff that other people built, including APIs. Luckily, librarians have a long history of working with intellectual property topics!
You’re probably going to need to find libraries (the code kind, not the institution kind) or algorithms or code bases to work with. I haven’t really dipped my toes into this water yet, but what I have noticed is that students talk about this process differently than faculty do. Students talk about “looking online” and evaluating for speed, memory needs, and functions. Faculty talk about finding something that will be stable over time, with good documentation and a track record. There are undertones of publisher/author credibility, reliability, and stability threaded throughout. Definitely something for me to think about.
If you want to build something new, you’ll have to know the state of the art, past and present. This is where I’m learning more… and it needs more than a sentence or two, so I’ll give it a couple whole sections.
Finding The Current State of the Art
How do you know that what you’re building is new? And how do you make sure you’re building constructively on what’s already known? Translated into library-speak: What’s the conversation on this topic, and how does this project move that conversation forward? The information need is familiar to me, but the places to find that information are … not. CS has traditional scholarly publication venues, sure, but unlike my other fields, CS draws heavily on conference papers, research and technical reports, and patents. Not only that, but a bunch of stuff is proprietary — decidedly not the case for the latest interpretations of Hamlet.
So I’ve been trying to build up my skills in the grey literature area. Current strategies include using more familiar library databases to find out the names of people, associations, or institutions that are active in an area, and taking that knowledge over to Google for some advanced googling. I’m curious to see if Inspec Analytics turns out to be helpful with this, too, to help me figure out which institutions are active in an area and might have repositories of research and technical reports.
Patents are playing a larger and larger role in my work because that’s one of the only ways I’ve found of peeking into the proprietary research. That’s where company secrets come right up against the desire to protect IP for future profit. So I’ve been exploring ways of navigating patents and analyzing publication and citation patterns to help me figure out the past and present of a process or topic. Are there key people or companies at play in a particular area? Do those people or companies have other reports available to the public?
Delving into the past to improve the future
There was a fascinating talk here last spring by an engineer working on Non-Volatile Memory. One of her many useful insights during the talk was that back in the 1960s people were working on Mmap, and in the 1980s “Bubble Memory” was set to be the memory of the future. It didn’t become the memory of the future, so most people now don’t know the term or remember the concept, but there are a lot of things about Bubble Memory that are the same as NVM. There’s also a nearly 40-year conversation about developing persistent languages (apparently called “persistent foo,” which is awesome) vs persistent databases. One of the speaker’s points was that finding out these kinds of histories can save people from reinventing wheels, falling into the old pitfalls, and basically repeating history in the worst way.
Of course this set me to wondering how a librarian could coach students in a research strategy to find things that are similar but not necessarily the same, and that don’t share a lot of keywords. And how would you map out and synthesize what you find in meaningful ways, but as efficiently as possible? So next I think I’ll explore the literature around persistent memory, starting with the specifics this speaker mentioned in her talk, and see which search tools give students a good way to discover this kind of overlap with historical avenues of research. Strategy suggestions welcome!
So much more to learn
Soon we’ll launch into my second school year as the CS liaison, and I have a long way to go before I’ll feel like I really know how information works in this field. What do YOU think I should know in order to be the best librarian I can be for this field?
It occurred to me recently that we in higher education talk about the curriculum and the co-curriculum, and there are departments and offices and structures involved in making those parallel structures work (hopefully) to the benefit of our students. But there’s another curriculum at play as well — a curriculum that is every bit as fundamental to our institutional learning outcomes as the formal curriculum but that isn’t “owned” by any department and isn’t administered in any systematic way across the institution. I’ve started thinking of this as the “interstitial curriculum.”
The interstitial curriculum is where students learn the intellectual habits and skills that cut across curricular and co-curricular lines. It doesn’t have a home in the formal curriculum, and it can’t happen exclusively in the co-curriculum, either. Instead, it lives in the multiple and cumulative experiences that individual students have as they live out their college experiences through, among, and between the intertwinings of the curriculum and the co-curriculum. Depending on the institution, these are probably things like writing (never something that any single department can teach fully), metacognition, project management, time management, interpersonal “soft” skills, and yes, information literacy.
These are things that might even be named in mission statements or in institutional and departmental learning objectives, or that tons of faculty say are critical … but there’s often no course or formal home for them in the institutional structures that ensure other learning objectives. Everyone relies on students building these intellectual muscles by working with someone else somewhere else in the institution. They may not be sure who or where or when this work happens or should happen, but they really hope that it does happen because otherwise their own goals for students in their courses or majors can’t happen, or can’t happen well.
In my own work, I live in the tension between the deeply rewarding, mission-critical work that I get to do with students every day, and the dismissal of some who assume that the work I want to do with their students has surely already been done by someone else at some other time — probably in their first year seminar. I live in a liminal space, where literally dozens of departments on campus list learning outcomes directly related to information literacy, the campus mission and learning goals invoke information literacy, and yet no department has a formal plan to ensure that their students get intentional, scaffolded practice with the intellectual habits of information literacy. And I’m not saying that this is a bad place to be! There are many good reasons at play in this state of affairs. But it does mean that my entire existence feels similar to the work of the fascia in the human body: necessary, often invisible, existing between the better-known structures of the body, not well understood, but instrumental in encouraging and even allowing the intellectual work of the disciplines. I live in the spaces between.
It’s a very, very interesting space to inhabit. Not easy, but interesting.
A student recently spent some time tearing his hair out in my office because he was really really worried about the possibility that he might inadvertently plagiarize his sources and therefore fail his course. Meanwhile, a group of faculty in a recent workshop worried that their students routinely failed to quote, summarize, synthesize, and cite in proper measure. And every year our Information Literacy in Student Writing assessment project reveals that our sophomores do a good but not great job of knowing when their readers could use some more clues about their sources.
And on the one hand, everyone kind of nods and says, “Yeah, students these days.” But I think that we’re at least as much of the problem.
Lots of academic librarians (and disciplinary faculty) deal heavily in the world of plagiarism detection and anti-plagiarism instruction, but I think long habit and acculturation have made a lot of us feel like plagiarism is an objective thing with rules that everyone agrees on. You don’t use passages in your writing that other people have already used; you don’t pass off other people’s ideas as your own. If you do these things you are either ignorant or bad, or both. Done.
While college policies make it sound like a concrete thing called plagiarism is forbidden, in real life there are a whole lot of circumstances where those rules just don’t seem to fit very well, or where they’re applied one way in one classroom and another way in another classroom. What do you do about writing that results from thoughts developed over the course of class discussion or long conversations? What about the world outside of academic writing, where sharing and re-purposing may be the rule rather than the crime? What about all the genres of writing assigned in classrooms that mimic genres outside of academia and that therefore don’t have the same norms built around them?
Similarly, librarians and faculty either “care about citation” or “don’t care about citation,” but we also tend to link plagiarism and citation as almost one-to-one topics when really citation is about so very much more than just the presence or absence of plagiarism. The complex interplay between what counts as evidence and what counts as proper community participation leads to different norms even within academia. For example, in Lit studies, individual words are the evidence, so quoting individual words is every bit like reporting the response rates on a survey in the social sciences. In other disciplines this kind of quoting is frowned on as “over-citation” and a lack of synthesis. In computer science, there’ll be different rules for your classroom work and your industry work because using other people’s code functions fundamentally differently in an individual competence environment than it does in a development team. Anthropologists tend not to cite each other very much, and historians tend to value rich tapestries of citation functioning as almost a parallel narrative to the author’s work. These rules are not self-evident or consistent, but they are vitally important in their own contexts. Is it any wonder that students are confused?
So my job as a Disciplinary Discourse Mediator is to figure out what the rules are in my departments, why they are that way, and then to help students see the norms that they will be expected to conform to. More than that, my job is to alert students to the importance of figuring out what the rules are for any community they participate in, because the community will value these norms deeply but may express them vaguely (if at all), and when they do talk about them they’ll probably do so as if these are self-evident practices or even a matter of basic human decency.
Right now my strategy has been pretty subtle for the most part, framing any conversations about citation or academic honesty in terms of community norms and (when the situation seems to call for it) explaining some aspect of the connection between the norm and its community. For the rare people who are actually interested, I point them toward the work of Ken Hyland and all the various studies he did on disciplinary practices of quotation and attribution. But I always wonder if there’s something better I could do to usefully problematize the monolithic definition of plagiarism and its stranglehold on the topic of citation while at the same time actually improving students’ abilities to detect and mimic the practices of the communities they participate in.