Close Reading Data Visualizations

As is often the case with such TED talks, I watched Gary Flake’s demonstration of Pivot with a mixture of awe and jealousy. I want that kind of thing for the deep web as well as the free web!

Go watch it, then come back.


No, really. I’m about to reference a specific visualization, so you should see it first. If you get bored, just watch the first sequence (until the flying people-graphs are done).

Ok. Wasn’t that cool?

The instructor in me, though, noticed an implicit message in the visualizations that I think would reinforce incorrect assumptions that my students make all the time. My students are constantly looking at census data, for example, and hoping that they can talk about how many of the people in the chart in their hands (the one describing educational levels) show up among the dead in that other chart on accidental deaths. They want to track individuals rather than talk about probabilities and percentages. And the initial example that Flake uses to talk about mortality and age absolutely reinforces that faulty understanding of the data. Icon-people fly from one column to the next as he filters for different characteristics, making it seem like if you just concentrated enough, you’d know everything there was to know about that one blue guy who started off 3rd from the right of the 4th row.

I really wish the visualizations had figured out a way to make each one appear to be exactly what it was: a snapshot of a sample. Right now it looks like they’re drawing on incredibly detailed longitudinal data.
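For the programmers in the audience, here is a minimal Python sketch of why the tracking doesn’t work. Everything in it is invented for illustration: the two tiny populations, the category names, and the marginal_tables helper are my assumptions, not real census or mortality data. It shows two different sets of individuals that produce exactly the same pair of summary charts, which is why the charts alone can’t tell you who ended up where.

```python
from collections import Counter

# Two hypothetical populations of (education level, died in an accident) pairs.
# These are made-up individuals, not real data.
pop_a = [
    ("college", True),
    ("college", False),
    ("high school", False),
    ("high school", False),
]
pop_b = [
    ("college", False),
    ("college", False),
    ("high school", True),
    ("high school", False),
]


def marginal_tables(population):
    """Return the two separate 'charts': counts by education and counts by death."""
    education = Counter(level for level, _ in population)
    deaths = Counter(died for _, died in population)
    return education, deaths


# Both populations produce identical summary charts...
same_charts = marginal_tables(pop_a) == marginal_tables(pop_b)

# ...even though the individuals behind them differ.
same_individuals = Counter(pop_a) == Counter(pop_b)

print("Same charts:", same_charts)
print("Same individuals:", same_individuals)
```

In population A the person who died went to college; in population B that person did not. The two charts look the same either way, so no amount of staring at them will recover the individual story.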

How Big Is My Library?

Image by the CCAC North Library: http://www.flickr.com/photos/ccacnorthlib/3554627894/

I’ve been mulling over Steve’s latest post about some of the ways in which knowing the number of books in your library is either impossible or not very meaningful. And I imagine that for most of the parents on these college tours this number really isn’t very meaningful at all. For it to be meaningful you need to know how that number compares to other libraries, and what the collection’s strengths are. I freely admit that I really haven’t a clue how many “books” we have in our library. I think of it as a medium-sized college library. I know that we have one of the strongest collections of “big name” critical editions of Renaissance scores in the state. I know we have almost nothing in our collection about topics that aren’t actively taught on campus.

“Number of volumes” is one of those standard measures that libraries use to describe themselves, and I started wondering what was useful and what wasn’t about that measure. Like Carol in Steve’s comments (and actually like Steve says in his last non-bulleted paragraph), I think that there’s more to having more books than simply having more books. It makes lots of kinds of things possible that simply aren’t possible with smaller collections.

On the other hand, when that’s the number that we give to people who are, in effect, asking “how good is your library,” I think we’re missing the boat. And when the parents of prospective students ask “how many books do you have” they are actually asking you “how good is your library.” It’s a classic compromised question (for those of you familiar with the reference interview). They’ve already decided on a specific measure that they hope will help them figure out the answer to the larger question, not realizing that there are probably better ways to get answers to their real question. And they’re asking for this measure because back in the day, back when information was hard to come by, having a lot of it in your library was a huge deal. Period. Now the library’s actual holdings are not only hard to count, but they’re really only a portion of the information that’s available to our communities. The free web is bursting at the seams with fantastic sources of all kinds, and I make it my business to help my students navigate those as well as what’s actually in my library.

And so now, no matter how useful knowing the number of volumes in my library may be in some circumstances, I think that the worth of the library is measured in the people who work here and the relationships we have with our campus community. I think that “we have 35 employees on a campus of under 2000 students,” “we conduct about 1200 individual appointments with students each year,” “we have the most popular computer lab on campus, this one printer does a quarter of all printing for the entire campus, and 10% of all students log into one of these 20 computers every day,” “we have 8 subject specialist librarians and one is assigned to each one of your classes,” all of these are more meaningful measures of the library’s value than a count of volumes. Now that getting your hands on information isn’t the driving problem, now that learning to filter and evaluate the information you find is the primary struggle, now it’s the people who work here that are the key, now it’s the ways in which those people can help you not only find but also evaluate information that seems to be the most relevant measure of worth.

OAIster

For those of you who don’t know OAIster, if you have any reason to search for digitized primary sources, you should check it out. It’s a union catalog of digital library holdings. Its chief asset is wonderfully descriptive metadata. And as with other collections of collections, I recommend searching OAIster to find which digital collections contain the kinds of things you’re interested in, and then searching or browsing those collections individually.

For those of you who know OAIster, you know that it recently stopped being its own unique entity and started being an OCLC-hosted entity. It’s now available on the FirstSearch interface and the WorldCat.org interface. (Here’s more on the history of the catalog.)

Enter the oddness. My co-worker ran some identical searches on both interfaces and came up with startlingly different numbers of results for most of her searches. Confused, I contacted OAIster and have just heard back from them about why this is so. Apparently, the “keyword” search in the FirstSearch interface searches only the Source, Subject, Title, and Notes indexes. The keyword search on the WorldCat.org interface searches all available fields and all indexes.
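Here is a toy Python sketch of how the same keyword can return different counts depending on which fields get searched. This is not OCLC’s actual implementation. The records, the “description” field, and the keyword_search function are all hypothetical, and only the four index names come from OAIster’s reply.

```python
# Hypothetical catalog records. The "description" field stands in for any
# field that a narrower keyword search would not cover.
RECORDS = [
    {
        "title": "Letters from the front",
        "subject": "World War, 1914-1918",
        "source": "Example Digital Library",
        "notes": "",
        "description": "Scanned correspondence between soldiers and families",
    },
    {
        "title": "Family correspondence, 1850-1870",
        "subject": "Frontier life",
        "source": "Sample State Archive",
        "notes": "",
        "description": "Handwritten letters",
    },
    {
        "title": "Harbor photographs",
        "subject": "Ships",
        "source": "Example Digital Library",
        "notes": "Glass plate negatives",
        "description": "Views of the harbor",
    },
]

# The four indexes named in OAIster's reply about the FirstSearch keyword search.
NARROW_FIELDS = ("source", "subject", "title", "notes")


def keyword_search(records, term, fields=None):
    """Return records containing term in the given fields (all fields if None)."""
    term = term.lower()
    hits = []
    for record in records:
        searched = fields if fields is not None else record.keys()
        if any(term in str(record.get(field, "")).lower() for field in searched):
            hits.append(record)
    return hits


narrow_hits = keyword_search(RECORDS, "correspondence", NARROW_FIELDS)
broad_hits = keyword_search(RECORDS, "correspondence")

print("Four-index search:", len(narrow_hits))
print("All-fields search:", len(broad_hits))
```

The first record mentions “correspondence” only in its description, so the four-index search misses it while the all-fields search finds it. Same catalog, same keyword, different totals.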

So now we know.

Budgets, Databases, and Trust

It’s been an interesting week. We learned that the Bibliography of the History of Art will die at the end of the month due to lack of funding (Carleton’s news item on the topic), and we learned that back in January, CSA/ProQuest stopped providing Biology Digest because it was a free database and therefore not profitable to continue providing. They didn’t tell us they were going to do this — they just explained why our links didn’t work after the fact. But that’s not actually the point. The point is: the Great Recession Strikes Again. Also, Don’t Put Your Faith in Online Subscriptions. So that’s two points.

ANYWAY, all this made me rail briefly against the Big Vendors for not saving these great resources when the little institutions started drowning, and then I railed for a while against the Not For Profits for not doing more to ensure the continuation of things we rely on, and then I moved back to railing against money in general, which I don’t like and have never really understood. And now I wonder, who is best suited to run these projects? Is it the smaller and/or private institutions that seem to have more invested in having highly curated collections? Or is it the massive institutions/corporations that promise more stable hosting and possibly wider distribution and therefore clout? Who would I trust more with something like the Bibliography of the History of Art?

And reading over those questions, I wonder if they’re even the right questions, or built off of the right assumptions. Basically, I’d like my databases to stop disappearing. Thanks.

Seeing Search Boxes

We’ve all heard that single search boxes are the only way to go when it comes to building search interfaces. We’ve probably also seen students who will bypass all relevant information or links on a page and zero in on whatever looks like a search box. But I never put these two pieces of knowledge together before. Not, that is, until just this morning as I was driving in to work. This morning I had a revelation:

Every page with a search box is a “single search box” page.

We may gripe about clutter. We may grouse about not having enough guidance surrounding our search boxes. It doesn’t matter. People who are primed to search will only see the search box anyway. The other stuff may as well not be there. (For those of you getting hot under the collar like I would be if I were reading this right now? Hang on, I’ve got something for you in a minute.)

Here’s my Parable With Two Screenshots. We have several lists of electronic resources on our library’s website, each of which has a “Search” and “Browse” function at the top.

We’ve gently corrected students who started entering their topic keywords into the “search” box, but haven’t been able to get rid of the box entirely. “Oh those kids,” we thought. “Desperately seeking search boxes again.”

Then last week I had a professor call me in consternation that the library systems were telling her there was nothing on her topic. Turns out… she had entered her topic terms into that search box.

Ok, ok, so that one’s legitimately confusing. We’ve realized that while the existence of the box is out of our control, the wording next to it isn’t. Soon it’ll say something more like “find a database.” Still, this is only the first half of the parable, and probably not the most relevant half at that. It’s mostly relevant in that its proximity to the second half made everything come together in my head. And so, on to the second half…

For the past two weeks, I’ve also had students from a lit class coming to see me, all of whom want “something, anything from the last ten years written about [insert famous theme] and [insert famous piece of literature here].” Granted, searching for themes is hard. Even something standard like “performative identity” requires thinking up all kinds of synonyms (body, fashion, display, etc.). But what struck me is that as soon as I set the date limiter on MLA International Bibliography, each student gasped in shock and surprise. This limiter is not hidden. It is three lines, or 1 inch, below the search boxes. And yet it had been totally invisible to my students as they focused all their energies on those tantalizing search boxes.

So now back to my revelation (and those of you who’ve been thinking “But we simply can’t do away with advanced search pages! Single search boxes aren’t always the way to go!!” can tune back in now). Here’s what I now think: We can feel free to have advanced search pages on any interface that we think functions better with all of those options laid out. It doesn’t matter. People who only want a single search box will only see that search box anyway. People who want the options will see and appreciate the options. Everybody happy.