As seen at code4lib 2006: OPA-0.1
Attached is OPA version 0.1. OPA (OPA Proxies APIs) is a simple Python application that demonstrates the unAPI specification. By default, it proxies all records at PubMed, all photos at Flickr, and metadata from Amazon for books with ISBNs. It will also proxy any arbitrary OAI-PMH repository that uses fairly consistent object identifier patterns.
It ships with almost everything you need to run it. If you want to use it to proxy Flickr or Amazon, you'll need to get your own API keys and configure them in your config.ini. It also expects to find ElementTree in your python2.4 installation, so install that first if you need to.
To test it, download and unwrap it. Then, try this:
$ python2.4 opa.py info:pmid/12345678
[('pubmed', 'application/xml'), ('text', 'text/plain'), ('asn1', 'text/plain')]
This is a Python list of the formats available through OPA for the PubMed record 12345678. To see one up close, do:
$ python2.4 opa.py info:pmid/12345678 text
...text response here...
Really, try it. It works! Try the other formats too.
Then (get and) configure your Amazon key in config.ini, then search for a book like this:
$ python2.4 opa.py urn:isbn:013937681X
[('dc', 'text/plain'), ('mods', 'application/xml'), ('amazon', 'application/xml')]
Just for fun, get the MODS record for that title.
$ python2.4 opa.py urn:isbn:013937681X mods
...mods record here...
MODS from Amazon.com? Who knew! Now you do.
Add an OAI target, like, say, Citeseer, into your config.ini.
[oai.citeseer]
base_url = http://cs1.ist.psu.edu/cgi-bin/oai.cgi?
id_regex = oai:CiteSeerPSU:([0-9]+)
Then go get a Citeseer record:
$ python2.4 opa.py oai:CiteSeerPSU:123
[('oai_dc', 'application/xml'), ('oai_citeseer', 'application/xml')]
...etc., etc. Pick a random repository from the UIUC OAI Registry. Call ListMetadataFormats on it and then ListIdentifiers (with a specific metadataPrefix from the ListMetadataFormats response). Look at the identifier pattern and write a regex for it, add that along with the repo's OAI base URL to your config.ini, and try querying for a record there. Really, it's fun!
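To make the role of base_url and id_regex concrete, here is a minimal sketch of how an identifier could be matched against the configured patterns and turned into an OAI-PMH request URL. This is not OPA's actual code: the function names load_oai_targets and oai_url are made up for illustration, and it is written for a current Python rather than python2.4. The only parts taken from elsewhere are the config keys from the Citeseer example and the standard OAI-PMH verb, identifier, and metadataPrefix request parameters.

```python
import configparser
import re
from urllib.parse import urlencode

# Config fragment mirroring the Citeseer example from this post.
CONFIG_TEXT = """
[oai.citeseer]
base_url = http://cs1.ist.psu.edu/cgi-bin/oai.cgi?
id_regex = oai:CiteSeerPSU:([0-9]+)
"""


def load_oai_targets(text):
    """Return {section name: (base_url, compiled id_regex)} for each [oai.*] section.

    Hypothetical helper, not part of OPA.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    targets = {}
    for section in parser.sections():
        if section.startswith("oai."):
            targets[section] = (
                parser.get(section, "base_url"),
                re.compile(parser.get(section, "id_regex")),
            )
    return targets


def oai_url(targets, identifier, verb, metadata_prefix=None):
    """Build a request URL for the first target whose id_regex matches the identifier.

    Returns None when no configured repository claims the identifier.
    Assumes base_url already ends with '?', as in the example config.
    """
    for base_url, pattern in targets.values():
        if pattern.fullmatch(identifier):
            params = [("verb", verb), ("identifier", identifier)]
            if metadata_prefix:
                params.append(("metadataPrefix", metadata_prefix))
            return base_url + urlencode(params)
    return None


targets = load_oai_targets(CONFIG_TEXT)
print(oai_url(targets, "oai:CiteSeerPSU:123", "ListMetadataFormats"))
print(oai_url(targets, "oai:CiteSeerPSU:123", "GetRecord", "oai_dc"))
```

The regex does double duty in this sketch: it decides which repository an identifier belongs to, which is why a repository with inconsistent identifier patterns is hard to proxy this way.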
But it gets really fun when you run the unAPI web service, which now speaks unAPI revision 1 (what I showed at code4lib 2006 last week in Corvallis was version 0 with RFC2295 Alternates header content list responses; that's all gone, now :).
$ python2.4 opa.py
Launching server: http://0.0.0.0:8080/
Visit your localhost here. You should see an empty unAPI formats response (empty because OPA *proxies* APIs, it doesn't have any data of its own, and hence no formats to list). Try some unAPI calls over the same data you were looking at before:
...or any of the other URI/format combos you tried earlier.
One caveat: this is a *demo* application. It's not designed for real web-scale services. In particular, it does not attempt to cache what it finds out about the formats available for a given repository, so the unAPI formats-list and format-fetch functions will each dumbly repeat their calls to the remote server. If you want it to scale up, fix that first before throwing it onto your production rack. :)
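If you do want to fix the caching, one possible starting point is a small in-memory cache with a time-to-live, keyed by repository. The sketch below is only an illustration under stated assumptions: FormatsCache and fake_fetch are hypothetical names, nothing like them ships in OPA 0.1, and the code targets a current Python rather than python2.4.

```python
import time


class FormatsCache:
    """Tiny in-memory cache with a time-to-live, keyed by repository name.

    Hypothetical helper, not part of OPA.
    """

    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self._fetch = fetch      # callable: repository name -> list of formats
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # repository name -> (expiry time, formats)

    def get(self, repository):
        now = self._clock()
        entry = self._entries.get(repository)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no remote call
        formats = self._fetch(repository)
        self._entries[repository] = (now + self._ttl, formats)
        return formats


# Stand-in for the remote ListMetadataFormats call; records each invocation.
calls = []


def fake_fetch(repository):
    calls.append(repository)
    return [("oai_dc", "application/xml")]


cache = FormatsCache(fake_fetch, ttl_seconds=60)
cache.get("oai.citeseer")
cache.get("oai.citeseer")
print(len(calls))  # the second lookup is served from the cache
```

A per-process dictionary like this is lost on restart and is not shared between server processes, so it only helps a single long-running instance.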
If you don't feel like running it yourself you can try the links above on my dev server (if it's up... ping me if it's not) at opa.onebiglibrary.net.
OPA!
| Attachment | Size |
|---|---|
| opa-0.1.tar.gz | 30.47 KB |