A Clean, Well-Linked 'Base (or, Solving the "Appropriate Resolver" Problem with the OCLC Resolver Registry)

Along with several colleagues I've come to the opinion that the current OpenURL implementation workflow and user experience are wholly insufficient. There's too much maintenance overhead on both sides of the library/publisher equation. There's too much work for small publishers to be able to participate. There's almost zero possibility that a tiny publisher (like a single weblogger somewhere) will be able to put useful OpenURL links on their own little site.

What we need next if we really want to spread OpenURL-based services more widely is a no-configuration, no-overhead, inexpensive solution that works for the widest possible range of libraries, publisher/vendors, and users. (The usability of the prevailing OpenURL resolution click-flow is a wholly separate matter with its own insufficiencies, but we can't solve everything at once.)

COinS, as cool as it is, remains inadequate for meeting this need statement because it requires every user to twiddle bits on their desktop, although it pushes us a step closer by *allowing* users to benefit by twiddling bits, which wasn't possible before.

What to do? Use registries.

The OCLC OpenURL Resolver Registry comprises records for roughly 1000 OpenURL resolvers at various institutions, mostly but not solely in North America. It also provides a simple web service that takes an IP address as a parameter and returns zero-to-many resolver records for every resolver that serves users coming from that IP address.

What does that mean? If you're like me, and you work for a small service like the Canary Database, you used to be essentially unable to provide user-appropriate OpenURL linking without having to configure many many ranges of IP addresses after many many conversations with librarians. "Used to be," that is.

Are you on a campus (or using a campus proxy) for an institution with an OpenURL resolver? If yes, visit the Canary here and tell me what you see.

What you *should* see (only just turned this on, mind you, so please report broken stuff!) is working links to your own institution's OpenURL resolver. Easy, right?

Here's how to implement it on your site:

  • read about the OCLC OpenURL Registry Gateway service and find the details of the query service in the Word doc on that page.
  • implement code in your database that queries the gateway with the IP address of your webapp's incoming users (REMOTE_ADDR in CGI-land)
  • parse the response and if there's a resolver in there, formulate and render links and link buttons to that user for all the references on your site
  • watch users' eyes light up when they see links
  • get excited
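
For the "formulate and render links" step, here's a minimal sketch in Python. It assumes you've already pulled a base_url and link_text out of the Registry response (the real response format is described in OCLC's documentation, so the parsing isn't shown), and it assumes OpenURL 0.1-style citation keys like genre, issn, volume, spage, and date. The function names, the "example:site" sid, and the fallback label are all placeholders of mine, not anything OCLC defines.

```python
from html import escape
from urllib.parse import urlencode


def build_openurl(base_url, citation, sid):
    """Return an OpenURL-style link for one citation (keys are assumptions)."""
    params = {"sid": sid}
    # Keep only the citation fields that actually have a value.
    params.update({key: value for key, value in citation.items() if value})
    # Some resolver base URLs already carry a query string.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def render_link(resolver, citation, sid="example:site"):
    """Render an HTML anchor, or an empty string when no resolver is known."""
    if not resolver:
        return ""
    href = build_openurl(resolver["base_url"], citation, sid)
    label = resolver.get("link_text") or "Find full text"
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(label))
```

Returning an empty string when there's no resolver means your templates can call render_link for every reference without special-casing users who have no resolver.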

Well, it's a bit more complicated than that. You're not quite done:

  • since you don't want to hit OCLC's service multiple times for the same user, build a little caching system into your application. It could be as simple as a single table with a UNIQUE constraint on users' IP addresses that stores the raw XML Registry responses and parsed values for at least base_url, icon_url, and link_text
  • instead of hitting OCLC every time, first check your db, and only query OCLC if you don't already have a record.
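
The cache can be sketched roughly like this, using SQLite from Python's standard library. The table and column names follow the description above; the fetch argument is a hypothetical callable of your own that queries the OCLC gateway for one IP address and returns a dict (or None when there's no resolver), so nothing here depends on the actual gateway format.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS resolver_cache (
    ip        TEXT NOT NULL UNIQUE,
    raw_xml   TEXT,
    base_url  TEXT,
    icon_url  TEXT,
    link_text TEXT
)
"""


def get_resolver(conn, ip, fetch):
    """Return cached resolver info for ip, calling fetch(ip) only on a miss."""
    row = conn.execute(
        "SELECT base_url, icon_url, link_text FROM resolver_cache WHERE ip = ?",
        (ip,),
    ).fetchone()
    if row is None:
        # Cache miss: ask the registry once and remember the answer,
        # including "no resolver" (stored as NULL columns).
        info = fetch(ip) or {}
        row = (info.get("base_url"), info.get("icon_url"), info.get("link_text"))
        conn.execute(
            "INSERT OR IGNORE INTO resolver_cache "
            "(ip, raw_xml, base_url, icon_url, link_text) VALUES (?, ?, ?, ?, ?)",
            (ip, info.get("raw_xml"), *row),
        )
        conn.commit()
    base_url, icon_url, link_text = row
    if base_url is None:
        return None
    return {"base_url": base_url, "icon_url": icon_url, "link_text": link_text}
```

Caching the "no resolver" answer too is a design choice: it keeps you from re-querying OCLC on every page view for users who simply don't have one. A real deployment would probably also want some expiry so stale records get refreshed.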

Better, right? Actually, the best thing to do would be to also parse out the per-institution IP address block information and do local queries against the *ranges*, not specific IP addresses. PostgreSQL has a built-in type that supports this really well.
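
If you'd rather do the range matching in application code, here's a minimal sketch with Python's standard ipaddress module. It assumes you've already converted each institution's address blocks to CIDR notation (the Registry may express them differently), and the blocks list and resolver values are made-up examples. When blocks overlap, it picks the most specific one.

```python
import ipaddress


def find_resolver(ip, blocks):
    """Return the resolver for the most specific block containing ip, or None.

    blocks is an iterable of (cidr_string, resolver) pairs.
    """
    address = ipaddress.ip_address(ip)
    best_prefix = -1
    best_resolver = None
    for cidr, resolver in blocks:
        network = ipaddress.ip_network(cidr)
        # Never compare an IPv4 address against an IPv6 block or vice versa.
        if network.version != address.version:
            continue
        if address in network and network.prefixlen > best_prefix:
            best_prefix = network.prefixlen
            best_resolver = resolver
    return best_resolver
```

With blocks cached locally, a single Registry response can cover every user from that institution, not just the one IP address that triggered the query.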

Any competent web geek should be able to implement this in a few hours. Call me if you have questions.

So, to review, here's what happens:

  • You implement a function in your webapp that queries OCLC and caches the resolver information locally and then renders appropriate resolver links to all your users
  • Your webapp users follow the links as if they were always there because it looks just like what they're used to seeing in Fancy Expensive Resources and they'll Just Know (tm) what to do

Pretty cool, eh? It's not without some important problems, though.

  • It's still not good enough. It doesn't solve the "but I'm off-campus" problem. The good news is, though, that if your campus, like ours, uses a web proxy for remote access, your remote users will probably be using the proxy anyway if they're already doing research, so this should work for them in a lot of cases.
  • There's no good fallback if there's no resolver for you. Ideally something like OCLC's Find in a Library function could work here too, but that's fraught with difficulties. Trust me, if you've ever been to New Haven, you'll know... if you live on the wrong side of the street across from Ivy U., you won't necessarily get Ivy U. access even if you beg and plead. But, that's a much bigger problem of which this is just yet another instance.
  • This simple function isn't as smart as the work Google Scholar does to attempt to check against each institution's holdings before showing links. But then again, Google Scholar isn't a last mile service, so they don't want to have to deal with the untidy problem of finding something that might not be online. But we librarians do!
  • It isn't clear whether the OCLC Registry coordinates queries across the pond to the UK Registry, or any other registry, or whether OCLC's registry comprises their remote data, but it would be good to be able to fire off just one query and be reasonably assured that users *anywhere* will find their resolvers.
  • There's a fundamental flaw in the whole approach of using a web service to query IP- and DNS-related information.

The flaw with the IP/DNS query bit is that a massive, distributed, caching system for queries about membership of IP addresses in IP blocks already exists, and it's all connected to user- and application-queriable layers through DNS, too. And a protocol which uses these existing layers for just these kinds of purposes already exists along with a related DNS Service Discovery piece. Check the name of the first protocol: Zero Configuration Networking.

That's exactly what we're doing here - providing a zero configuration experience to users. But we're not doing it with ZeroConf, though we probably should be. Or at least we need to make a concerted effort to try it before we dismiss it.

Still, this seems to work. And from multiple, repeated usability tests on the Canary, the first thing everybody always says *still* has been "I want full-text links." Now they can have 'em.

A quick aside: the folks behind the Registry and Gateway at OCLC are supportive of this approach and want to see more people using it, so don't be shy (but cache your queries so as not to be rude, either :). If your institution's resolver isn't in the Registry, or your resolver's record there is out of date, use the form they provide to enter or update information about your service.

Have at it!

Comments

fly in the ointment

Dan,

There are two problems with the canary database implementation of this service that I have observed from the perspective of Cornell.

We use the WebBridge link resolver product from III. WebBridge requires two pieces of information:

1. a sid (service id) in the OpenURL that is sent to us

2. knowledge that an "origin" record for the canary database needs to be set up.

The first problem is easy, and is really just good practice. Please add a sid to the OpenURL. The second problem is harder, but doable. I was talking to Scott Schultz at OCLC last week about this and he thinks it might be possible to broadcast an email to III libraries based on information in the OCLC registry, informing them to create an origin for the canary database.

- Adam Chandler

sid to be fixed

Hi Adam, thanks for writing in!

You're right about the first one, and you're not even the first person to write me about it - I just haven't had a chance to fix it yet, but will soon.

Thanks too for checking in with Scott about the second bit, and, yes, that would be an improvement. But does it really make sense for small resource providers like us to have to get this information disseminated through formal product knowledgebases?

It seems to me that a rule forcing a recognizable origin in incoming OpenURLs is antithetical to the goal of this original post - minimizing the configuration requirement at every node in the chain (publisher, library, user). Seems like it would be cheaper and easier all around to remove that rule from the product. Am I foolish for thinking that way?

in theory I agree, in practice at this point we need a workaround

Dan,

Is the sid requirement with WebBridge a feature or bug? From the perspective of III, it is a feature -- it's pretty tightly integrated into the software. It allows profiles to be established based on what kind of options a library expects from a given origin, plus their reporting system is based on this same notion. I think a compromise might be for III to permit the library to set up a default action when a sid is not included in an incoming OpenURL. I'll talk to Ted Fons at III about that.

I understand and agree with your point about minimizing the burden on the small publisher. I do think there is something important missing in this model though. It is very helpful to the people at the library who are responsible for the quality of the resolver service to know that a link to their resolver has been activated, but I'm not sure what the solution looks like exactly. The idea that Scott suggested, of broadcasting an email, is certainly possible with the contact data that will be available in the OCLC registry. However, I agree it is inelegant and an additional burden on the small publisher. I guess it also points out the potential for spam abuse in what OCLC is planning with the registry, doesn't it?

- Adam

Your suggestion that "a compromise might be for III to permit the library to set up a default action when a sid is not included in an incoming OpenURL" sounds right and really helpful... with the caveat that an unrecognized sid should be allowed to be made acceptable as well.

I'm not sure I follow your point about knowing that "a link to their resolver has been activated." Do you mean followed by a user through a resolver? Because, if so, the library should know that, and can find it in the context of usage analysis, right?

And I'm not sure I understand how that relates to the emailing-out about another-new-sid issue. Do you mean that in terms of analysis again, or just the workflow about a new item for the knowledgebase?

Since I haven't had nearly the experience you've had dealing with usage reports and analysis, maybe I'm missing a huge piece of the model here. Sorry if so. :\