The Zeitgeist Vocabulary

Found myself in a conversation the other day about search, and about carefully watching what people actually search for in our interfaces. Seems like if you had an easy, convenient way to focus a library's staff on which searches occur most often, which ones are rising and falling, etc., you might have a way to connect more people to resources, more resources to other resources, and more people to other people.

This makes me think of two commercial efforts that do this and, because of it, deliver a compelling service. One is a medical information product that took a new approach ten years ago -- it might be old hat and common by now, but it was interesting when it started. They saw that the TV show ER was popular. Their doctors told them that every Friday morning patients would call up asking about whatever disease or cure was on ER the night before and whether they had it or should get it. So they realized: we need a "what was on ER last night" section. Apparently, the clinical staff who used their product found it very helpful.

The other is Mahalo, the "human-powered search" folks. Unfortunately their nginx proxy is b04k3d right now, but if it weren't I'd have a bunch of links to how they follow the latest news and popular topics into their hand-curated lists and links and bios and stuff. Maybe it's always just about what's happening right now, but it's not like their stuff goes away -- as they build it up over time, it adds up to an interesting repository of pop knowledge that seems to integrate bits and bobs found elsewhere in a helpful way.

So I was thinking about whether it would make sense for libraries to publish feeds or stats of the most common searches in their most prominent search interfaces. And if they did, what we could do with such feeds and summaries if we threw 'em together a bit to see how they add up.

Okay, I hear you, stop right there -- I'm not listening to your privacy rant. This data is important, and we can be thoughtful about how to use it to help our users without selling them out at the same time, so I won't let you sidetrack me. :)

I want a site where I can log in and see what people are looking for at a variety of places. Sorta like what you can see at Digg Labs, but, y'know, with search data. And on another screen, I want to see current summary data like Google Trends and historical summaries like Google Zeitgeist.
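Here's a rough sketch of the kind of summary I mean, assuming each library can hand over a plain list of query strings per time period. The function names (`normalize`, `top_queries`, `movers`) and the `min_count` cutoff are made up for illustration, and the cutoff is only a crude nod toward not publishing one-off queries; it is not a real privacy scheme.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivially different forms count together."""
    return " ".join(query.lower().split())


def top_queries(queries, n=10):
    """Return the n most common normalized queries as (query, count) pairs."""
    counts = Counter(normalize(q) for q in queries)
    return counts.most_common(n)


def movers(previous, current, min_count=2):
    """Compare two periods and return (query, change) pairs, biggest risers first.

    Queries seen fewer than min_count times across both periods are dropped
    (hypothetical threshold, so rare one-off searches are not republished).
    """
    prev = Counter(normalize(q) for q in previous)
    curr = Counter(normalize(q) for q in current)
    changes = {}
    for q in set(prev) | set(curr):
        if prev[q] + curr[q] >= min_count:
            changes[q] = curr[q] - prev[q]
    # Sort by change descending, then alphabetically for a stable order.
    return sorted(changes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Invented sample data, not real search logs.
    last_week = ["tax forms", "Tax  Forms", "genealogy", "knitting"]
    this_week = ["tax forms", "genealogy", "genealogy", "genealogy", "genealogy"]
    print(top_queries(this_week, 2))
    print(movers(last_week, this_week))
```

Throw the per-library lists together before calling these and you get the cross-site view; keep them apart and you get the local one.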

And then I want a screen where I can pick a topic that those tools tell me users are interested in, one for which I know my library has some cool stuff. Maybe that cool stuff has a whiz-bang site already, or maybe it's buried in some stacks somewhere without so much as a collection-level record, but we have it, and I want people to know about it. And when I pick that topic, I want to be able to see: for this topic, what tags are people using for it on delicious and digg? Which form of the query is most popular in my stats? Which terms from my local data's vocabulary are relevant for this if I have good metadata for it -- from LCSH, or DDC, or LCC, or MeSH, or AGRICOLA, or even just identifiers like ISBNs or IMDB links, whatever -- what access points do I already have for this?

Then, knowing that those terms from our side of the world are inevitably going to look arcane and unfathomable to the normal people out there, I want to be able to *connect* those terms from our world directly to what it is that normal people talk about when they talk about those things. I want to connect tags and terms, to associate topics and tokens, and to build up those associations over time. Then I want to use those connections on my own sites. I want to republish the mappings between the user tags and tokens and our topics and terms, and use those to draw the bots and crawlers and spiders into our stuff from what it is that other people are looking for. There are any number of ways we could do this, but the magic would be to share the work of making the connections, and then to bring the connections home into our local collections.
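To make the tag-to-term idea a little more concrete, here's a minimal sketch of a shared mapping that gets stronger each time somebody makes the same connection. Everything in it is an assumption for illustration: the `TagTermMap` class, its method names, and the flat row format that `export` produces are all hypothetical, and the sample pairings are just examples of the kind of link I have in mind.

```python
from collections import defaultdict


class TagTermMap:
    """Associates everyday tags with controlled-vocabulary terms, counting votes."""

    def __init__(self):
        # tag -> {(vocabulary, term): number of times the link was asserted}
        self._links = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _key(tag):
        return " ".join(tag.lower().split())

    def connect(self, tag, vocabulary, term):
        """Record one more assertion that this tag corresponds to this term."""
        self._links[self._key(tag)][(vocabulary, term)] += 1

    def terms_for(self, tag):
        """Return (vocabulary, term) pairs for a tag, most-asserted first."""
        links = self._links.get(self._key(tag), {})
        return sorted(links, key=lambda vt: (-links[vt], vt))

    def export(self):
        """Flatten to (tag, vocabulary, term, count) rows for republishing."""
        rows = []
        for tag in sorted(self._links):
            for vocab, term in self.terms_for(tag):
                rows.append((tag, vocab, term, self._links[tag][(vocab, term)]))
        return rows


if __name__ == "__main__":
    m = TagTermMap()
    # Illustrative pairings only.
    m.connect("heart attack", "MeSH", "Myocardial Infarction")
    m.connect("Heart Attack", "MeSH", "Myocardial Infarction")
    m.connect("recipes", "LCSH", "Cooking")
    for row in m.export():
        print(row)
```

The counting is the point: if many libraries push their connections into the same map, the popular pairings float to the top, and each library can pull the rows back down to decorate its own pages.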

The next time Mahalo goes to update their page about that popular topic, they'll redo all their searches in the popular non-human-powered search engines and they'll find our stuff. Because the machine searchers will have found our stuff, because we will have connected the two. As far as I can tell, the smart librarians have long since found ways to do this, just like the folks at Mahalo.

We could call it the Zeitgeist Vocabulary.

Just a thought.

Jonathan Rochkind on April 15th 2009

That's a great idea. I like the idea of seeing popular searches, and then picking some resources of value for those topics and highlighting them on front pages etc.

I'm not sure my local reference librarians would have the time or interest to do that, though, even if the stats were provided. But it's a great idea.

Dale A on April 21st 2009

... with what you wrote. I rant all the time about libraries finally banding together to create the massive data sets that are necessary to make reliable and useful inferences about users' behavior and desires. You will not hear any privacy rant from me, either, because I take the same approach; we can sanitize the data, and when it is used in mass quantities, anonymity is practically a given.

This is perhaps a naive question, but is what we need something like a (gasp) standard, where all of our various OPAC logs and whatnot can be ported to some simple and pliable XML schema for further use and application? Even within our own library we have issues with data compatibility, let alone on a larger scale. There are the OPAC log files, our search appliance's quirky Google XML format, and the massive data our very detailed reference librarians record about face-to-face transactions, which should be a gold mine, if only ...

I think this is what you meant when you said "publish" our feeds and stats.
