Spreadsheets: my not-so-secret love affair

I’ve gotta admit. I love a good spreadsheet now and again. Maybe it’s because they aren’t the only thing I work with day in and day out, but I’m very much a fan of computers doing my grunt work for me, and spreadsheets are very much a part of that picture for me (right along with most of the other light-weight coding I’ve dabbled in).

I’m NOT a fan of using a spreadsheet just because it has columns and rows. Call me a software snob, but if all you need is a table in a document to keep a bunch of text in order, that’s not a job for a spreadsheet. No sense using an extra tool. But if a spreadsheet can automate processes or do your error-checking for you, that, my friends, is workplace GOLD.

I’ve used spreadsheets to take conference session information as submitted by presenters, wrap each bit up in HTML, string the HTML together, and then paste the result into a web page (hundreds of sessions processed in minimal time with minimal human error). I’ve used them to reconfigure 14 years of reference, instruction, and consultation statistics so that the data from our various past systems matches the needs of our new system: flipping people’s names, combining, splitting, reshuffling, and reformatting. The list goes on. If you can teach a computer a pattern, you can probably teach it to do your fiddly work for you.

Case in point: Libguides asset management.

Most of us librarians at this point use Libguides. One of its great strengths is that you can reuse “assets” (links, books, etc.) from one guide to another. But over time, duplicate assets multiply like rabbits, and old unused assets clutter up your search results for that one asset you really want to reuse. So it becomes easier to make a new asset than to check whether that asset already exists in the system. And then you end up in a vicious cycle that spirals your assets out of control and makes one of the great features of Libguides functionally useless.

So every summer I do a big ol’ asset clean-up project. I ask Springshare to delete our unused assets for me (we mere mortals can’t do bulk deletions), and then I work to knit back together all the unnecessarily duplicated assets that have spawned in the system when the librarians either make a new one that already exists or copy boxes or guides to new guides (which duplicates all the assets in the box or guide — asset management hell).

Screenshot of spreadsheet showing examples of normalized titles for alphabetization.

This is where the massive spreadsheet comes in. I need to find all the assets that are actually the same thing, and then map them back together so that they ARE the same thing. And the first part of this is to sort them by name. So I download a spreadsheet of all assets, plunk it into Google Sheets (easier to work on from multiple computers or share with others in the department), and alphabetize by title. But as you may know, neither Libguides nor any spreadsheet software I’m aware of is smart enough to alphabetize by the first “real” word, or to know that “US,” “USA,” “U.S.,” “U. S.,” and “United States” are all the same thing. And anything with a quotation mark in front of it will go up into the non-alphabet part of the sort. And the list goes on. Alphabetizing just doesn’t cut it.

So in my Google Sheet, I add a column for Sort Title, and in the first cell of that column I use a formula to teach it all of the patterns that I know will be a problem with the titles in my asset list. Then I drag that formula down through all 7,000 to 9,000 asset records, and Ta-Da! Alphabetizable titles!

I’m a little nervous about sharing my formula for this because I’m a rank amateur and probably used a million IF statements where a simpler solution is possible. But hey, I’m a rank amateur, so if you’ve never done this you can join me and then improve on what I’ve found! So… if you want to try this out, the basic pattern is “If at the left of the string in the title column you see x string, substitute x string with y string.” The other basic pattern is “If at the left of the string in the title column you see x string, delete it.” If your Title column is column E and you’re starting on row 2 (because row 1 is your header row), my current formula, which accounts for the patterns I’m currently seeing in my title list, looks like this:

=IF(LEFT(E2,2)="A ",RIGHT(E2,LEN(E2)-2),IF(LEFT(E2,3)="An ",RIGHT(E2,LEN(E2)-3),IF(LEFT(E2,4)="The ",RIGHT(E2,LEN(E2)-4),IF(LEFT(E2,6)="&quot;",SUBSTITUTE(E2,"&quot;",""),IF(LEFT(E2,5)="U.S. ",SUBSTITUTE(E2,"U.S. ","US "),IF(LEFT(E2,14)="United States ",SUBSTITUTE(E2,"United States ","US "),IF(LEFT(E2,4)="USA ",SUBSTITUTE(E2,"USA ","US "),IF(LEFT(E2,6)="U. S. ",SUBSTITUTE(E2,"U. S. ","US "),IF(LEFT(E2,11)="University ",SUBSTITUTE(E2,"University ","U "),IF(LEFT(E2,5)="U.K. ",SUBSTITUTE(E2,"U.K. ","UK "),IF(LEFT(E2,5)="U.N. ",SUBSTITUTE(E2,"U.N. ","UN "),IF(LEFT(E2,15)="United Nations ",SUBSTITUTE(E2,"United Nations ","UN "),IF(LEFT(E2,15)="United Kingdom ",SUBSTITUTE(E2,"United Kingdom ","UK "),E2)))))))))))))

So, it’s IF(logical statement, result if true, result if false), and the “result if false” slot is where I put the next IF statement. Because each IF only gets looked at when all the ones before it came up false, only the first pattern that matches is applied to any one title. The very last “result if false” is just “show me the full title from the title column” because by that time the title is probably just fine without alteration. Each logical statement here tests the characters at the left of the title column, since that’s what I’ll be alphabetizing on. Then each “result if true” section tells it how to transform those left-hand characters so that they’ll alphabetize properly.
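
For anyone who would rather read the logic as a list than as a stack of nested IFs, here is a rough Python sketch of the same idea. It is not what I actually run; the names PREFIX_RULES and sort_title are made up for illustration, and the "&quot;" rule assumes the export encodes quotation marks as that six-character string (which is what the length of 6 in the formula tests for). Two differences from the spreadsheet version worth knowing: this sketch only rewrites the match at the very start of the title, while SUBSTITUTE replaces every occurrence in the cell, and Python's comparison here is case-sensitive while the spreadsheet's "=" is not.

```python
# Hypothetical rule table mirroring the nested IFs, in the same order:
# (prefix to look for at the left of the title, what to put in its place).
# An empty replacement means "delete the prefix".
PREFIX_RULES = [
    ("A ", ""),
    ("An ", ""),
    ("The ", ""),
    ("&quot;", ""),  # assumption: quotation marks arrive HTML-encoded
    ("U.S. ", "US "),
    ("United States ", "US "),
    ("USA ", "US "),
    ("U. S. ", "US "),
    ("University ", "U "),
    ("U.K. ", "UK "),
    ("U.N. ", "UN "),
    ("United Nations ", "UN "),
    ("United Kingdom ", "UK "),
]


def sort_title(title: str) -> str:
    """Apply the first matching prefix rule, or return the title unchanged."""
    for prefix, replacement in PREFIX_RULES:
        if title.startswith(prefix):
            # Swap only the leading prefix; the rest of the title is untouched.
            return replacement + title[len(prefix):]
    return title  # same as the final "result if false" in the formula


# Example: alphabetize by the normalized title, ignoring case.
titles = [
    "United States Code",
    "Atlas of the World",
    "U.S. Census Bureau",
    "The Atlantic",
]
alphabetized = sorted(titles, key=lambda t: sort_title(t).lower())
```

Adding a new pattern then means adding one line to the list instead of threading another IF (and another closing parenthesis) into the formula.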

So that’s my spreadsheet love story of the week! I can’t believe that last year I sat there and used successive “find/replace” searches and thought I was being efficient. Live and learn!
