I've written before about taking some of the ideas from MPEG21 DIDL and LANL's xmltape model and implementing these using brain-dead simple techniques. I've started working on this and progress so far is very good.
Here's the plan: first, I'm not building a long-term preservation archive today, so I can play fast and loose with the full-on requirements of the OAIS Reference Model. Instead, I'm building a quick-n-dirty standardized repository structure for use in prototyping search interface usability. If we get this funded, then I'll worry about the longer-term issues and stuff like updates, but they aren't a requirement today.
Second, instead of MPEG21 DIDLs in XML à la aDORe, I'm using a simplified version of the DID model in JSON. Here's the object model:
(Figure: Suki AIP object model)
(Note: I don't know UML; rather, I have Visio. :)
The SukiAIP id is a package identifier, the SukiItem id is a content identifier, and the SukiResource id is a resource (datastream) identifier. Oh, and this codebase is codenamed "suki", as in the Japanese for "like", "love", or even "adore". :) It's pronounced much more like "ski" than "sue-key".
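To make the three-level model concrete, here is a minimal Python sketch of how a SukiAIP might serialize to JSON. Only the three identifiers come from the model above; the `mime_type` and `content` fields on the resource, and all the example id values, are hypothetical placeholders rather than the actual Suki schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SukiResource:
    id: str          # resource (datastream) identifier
    mime_type: str   # hypothetical field
    content: str     # hypothetical field: the datastream payload


@dataclass
class SukiItem:
    id: str          # content identifier
    resources: List[SukiResource] = field(default_factory=list)


@dataclass
class SukiAIP:
    id: str          # package identifier
    items: List[SukiItem] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses into plain dicts/lists
        return json.dumps(asdict(self), indent=2)


aip = SukiAIP(
    id="pkg-0001",
    items=[
        SukiItem(
            id="item-0001",
            resources=[SukiResource(id="res-0001", mime_type="text/xml", content="<dc>...</dc>")],
        )
    ],
)
print(aip.to_json())
```

Nesting items inside the package and resources inside items mirrors the DID container/item/resource hierarchy, with everything else stripped away.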
Third, instead of XMLTapes, I'm using plain old zipfiles, as supported in the Python standard library. *Zero* additional coding required: no separate indexing step, nothing. Just stuff the JSON SukiAIPs into the zipfile using their package identifiers as filenames, and both near-instant retrieval by filename and compression come for free, fully debugged.
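A minimal sketch of the zipfile approach using only the standard `zipfile` and `json` modules. The function names `store_aip` and `fetch_aip` are hypothetical, and the sketch assumes each AIP is a dict with an `"id"` key holding its package identifier. Lookup by member name goes through the zip's central directory, which is why no separate index is needed.

```python
import json
import os
import tempfile
import zipfile


def store_aip(zip_path, aip):
    # Mode "a" creates the archive if it doesn't exist, otherwise appends to it.
    # Note: writing a name that already exists adds a duplicate member (with a warning).
    with zipfile.ZipFile(zip_path, mode="a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(aip["id"], json.dumps(aip))


def fetch_aip(zip_path, package_id):
    # The central directory maps member names to offsets, so this is a direct lookup.
    with zipfile.ZipFile(zip_path) as zf:
        return json.loads(zf.read(package_id))


path = os.path.join(tempfile.mkdtemp(), "suki-aips.zip")
store_aip(path, {"id": "pkg-0001", "items": []})
store_aip(path, {"id": "pkg-0002", "items": []})
print(fetch_aip(path, "pkg-0001"))  # {'id': 'pkg-0001', 'items': []}
```

Because updates aren't a requirement yet, the append-only behavior is acceptable here; replacing an AIP in place would need more than this.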
More on the application architecture soon... suffice it to say for now that I'll be managing SIP handlers and ID cross-referencing in an RDBMS with a Django admin front-end, and using Django templates for OAI-PMH and unAPI responses.
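As a rough illustration of the ID cross-referencing piece, here is a sketch that uses the standard `sqlite3` module as a stand-in for the Django-managed RDBMS. The table name `id_xref`, its columns, and the helper functions are all hypothetical, and the external identifier shown is just an example of an incoming source id mapped to Suki package and content identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE id_xref (
        external_id TEXT PRIMARY KEY,  -- identifier from the incoming SIP source (hypothetical)
        package_id  TEXT NOT NULL,     -- SukiAIP package identifier
        item_id     TEXT NOT NULL      -- SukiItem content identifier
    );
    """
)


def register(conn, external_id, package_id, item_id):
    # Insert or overwrite the mapping for one external identifier.
    conn.execute(
        "INSERT OR REPLACE INTO id_xref VALUES (?, ?, ?)",
        (external_id, package_id, item_id),
    )
    conn.commit()


def lookup_package(conn, external_id):
    # Return the package identifier for an external id, or None if unknown.
    row = conn.execute(
        "SELECT package_id FROM id_xref WHERE external_id = ?", (external_id,)
    ).fetchone()
    return row[0] if row else None


register(conn, "oai:example.org:1234", "pkg-0001", "item-0001")
print(lookup_package(conn, "oai:example.org:1234"))  # pkg-0001
```

In the Django version this table would presumably be a model registered with the admin, but the mapping itself is the same.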
Don't get me wrong, I *love* aDORe, and think MPEG21 DIDLs are a fine idea, as is XMLTape, and I'd advise anyone to go that route. But I don't need all that just now; instead, I just need this to be up, and running, and ingesting millions of items, like, yesterday, because I have, like, no time and no funding to build this repository. Hence the oversimplification; I think I can get this whole thing done and sucking up mass quantities of data in a matter of days.
SPARQL JSON results format?
Dan -- have you looked at the new JSON version of the SPARQL results format? Might be relevant ...
Good to know about
Hey Bruce - thanks for the pointer, this is good to see.
I don't know if this is directly relevant... the examples in the post and the OPA wrap output are really about object wrapping, not so much about expressing query result sets. But, to the extent that a result set is an object, we can stuff it in a jsondidl. :)
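To show what "stuffing a result set in a jsondidl" could look like, here is a sketch that embeds a result in the SPARQL JSON results shape (`head.vars` plus `results.bindings`) as a resource inside a package. The package/items/resources layout, the `mime_type` and `content` field names, the helper `wrap_result_set`, and all identifiers are hypothetical; `application/sparql-results+json` is the media type registered for the SPARQL JSON results format.

```python
import json

# A tiny result set in the SPARQL JSON results shape.
sparql_result = {
    "head": {"vars": ["book", "title"]},
    "results": {
        "bindings": [
            {
                "book": {"type": "uri", "value": "http://example.org/book/1"},
                "title": {"type": "literal", "value": "An Example"},
            }
        ]
    },
}


def wrap_result_set(package_id, item_id, resource_id, result_set):
    # Wrap the result set as one resource of one item in a hypothetical jsondidl package.
    return {
        "id": package_id,
        "items": [
            {
                "id": item_id,
                "resources": [
                    {
                        "id": resource_id,
                        "mime_type": "application/sparql-results+json",
                        "content": result_set,  # embedded as a JSON object, not a string
                    }
                ],
            }
        ],
    }


package = wrap_result_set("pkg-q1", "item-q1", "res-q1", sparql_result)
print(json.dumps(package, indent=2))
```

Embedding the result as a native JSON object rather than an escaped string keeps it directly navigable by any JSON consumer of the package.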
I'm definitely encouraged to see more use of JSON.
Hmm... this is making me wonder: has anybody implemented an implicit web-page-as-RDF-store, in something like a Greasemonkey script? I.e., for any arbitrary web page, automatically treat it as a knowledge structure. Then you could run SPARQL queries against it, and if the page uses microformats or something like them, or references external protocols like feeds or unAPI or whatever, you could pull in data from those very explicitly and mix and match as needed. Hmm...