0. This Cranky Librarian is tired of the promulgation of library standards that attempt to define things in ways that don't fit the way everybody else does things on the web. I would much rather see as few as possible new concepts and protocols introduced as part of new library standards development. Much like the reductionist philosophy of the Microformats project, the more we make what we want to make work happen in a way that fits naturally into what people already do on The Nets, the better chance we will have at gaining adoption and wide use for our new specs. I'm particularly Cranky when I have to find or to write my own software to handle arcane or obscure library concepts. I uncrankificate when I find that some part of a library specification actually just leans on some already widely known protocol or spec and the amount of original work necessary to put the new spec to use becomes simply a matter of developing a lightweight new application using some software toolkit I already know. Because of this, and because their goals are important, I'm excited about the OAI-ORE specification, whose authors purport to want to do things this uncrankish way. I attended their "here's what we've got so far" meeting this week, and these are my Cranky thoughts.
1. OAI-ORE sets out to enable us to identify and describe aggregations of web resources. Not just "sites", or "pages", but those funky sets of multiple things that the web architecture doesn't explicitly speak to: "resources made up of other resources". This is an excellent pair of objectives - identifying and describing aggregations - because sometimes these aggregations need to be cited as a whole, or moved as a whole, or versioned or chopped up or crawled or otherwise manipulated in some way that acknowledges that "yeah, these separate bits of things actually fit together as a molecular unit in this way with this name at some point and based on that I can do stuff with it more usefully." Wonderful.
2. OAI-ORE is being developed by a stellar crew of Smart People. Many of whom have PhDs. All of whom have developed well-known standards before, among other well-known work in their fields of expertise. Some of whom I've had the good fortune to meet, get to know, and work with over the years, and of whom I can privately ask a blunt question in sincere expectation of getting an honest answer. Terrific. Though some of the specs these same folks wrote before have been the kind that make me Cranky. And although they aspire to keep the development of OAI-ORE open for community discussion, their public discussion archives are clearly not where the Real Work has been done on this spec to date. Frustrating. Herein my Cranky rant.
3. OAI-ORE defines what they mean by "aggregation", and its description in a "resource map", in an abstract data model. This is the first and most important OAI-ORE document you can read, since it lays out all the ideas. So we should read it very carefully to understand it well. This document repeats a whole chunk of the Introduction from the TOC in its own Introduction, which is confusing. For some reason they also try to explain the web architecture, the semantic web, rdf, and named graphs as part of the next section, "Architecture Foundations". These concepts are defined elsewhere. This attempt to restate their definitions here expands the scope of this document, not to mention that they've mixed core, proven architectural generalizations based on years of experience at scale (about the web architecture) with provisional generalizations about the semantic web as if the knowledge of the semantic web is driven by years of experience at scale (which nobody has). To me, this is where OAI-ORE starts to veer off the tracks and make me Cranky. I would rather see the Introduction merely say "we are building on the well-understood notions of Resource, URI, Representation, and Link as defined in the [web architecture docs], and the developing notions of named graphs and the semantic web as understood to date in [semweb docs]." This would be more honest, recognizing that they are attempting to leverage both things we all know and other things some of us think we know all at once, and together. And the abstract data model document would grow much shorter. Both of which would make me less Cranky.
4. Now it gets meaty. Sections 3 and 4 of the 0.2 version of the abstract data model are the meat, potatoes, and apple pie of OAI-ORE, where they define what they mean by an aggregation and a resource map. Slow down. Read these three times. Draw a picture or two. Read them again. This is the important part. It's only one-and-a-half screenfuls in my browser, which is nice. To me, this is precisely the place where OAI-ORE fully falls off-track. In my advanced state of Crankiness, I don't want to have to think this hard. Largely because I've learned over the years that as a librarian I will wade through this kind of thing patiently and repeatedly until I Think I Get It, but that that's a perverse instinct born of insecurity and geek power that most Normal People don't share. If I have to work hard to understand it, somebody else won't want to work so hard, and both of us will be left not understanding, and fewer people will use the spec. This is my standard for standards. Can I understand it readily? Will other people? If the answer to either question is "no", I've learned, it's best to Just Move On because People Don't Read Stuff Anymore. Blame market forces. Blame some despot leader of a failed state. Blame HBO's new series In Treatment featuring Gabriel Byrne, new episodes every weeknight! But people don't read stuff. So we need to give them less to read, and make what we give them readily understandable.
5. Here's how I think I'd do that. The goals of OAI-ORE are (a) identifying and (b) describing resource aggregations. To do that with the 0.2 specs you have to wrap your head around ReM, URI-R, URI-A - three new acronyms. They expand to four new concepts: aggregation, aggregated resource, resource map, and resource map document. These are thoughtfully marked in BOLD TEXT so you don't miss them. The entire rest of this document (section 5) goes into great detail about how these new concepts interrelate, and it gets confusing quickly. If you want to jump straight to the confusing part, see section 5.8, which also introduces URI-S, URI-P, and URI-O, which are acronyms for RDF triple roles, and a table of the cardinality of relationships between all of these concepts. I understand all of these concepts on a basic level but am still struggling to understand how they really fit together and why this abstract data model needs to be talking about anything other than aggregations of resources as described by resource maps and identified by URIs. And why concepts as core to the semweb such as How You Connect a Graph of Ss, Ps, and Os where some of those things are really ORE things need to be spelled out here in this abstract data model. I would prefer to see all of this stuff removed from this document.
6. Version 0.Cranky.3 of ORE's abstract data model document has two sections - the introduction and the definitions of an aggregation and resource map. That's all we're doing here, right? So let's just do that. "In ORE we describe aggregations with a resource map, which may be specified in several compatible ways, and which must be discoverable via an explicit URI. For humans, this provides a bookmarkable or citable reference; for machines, this provides access to a named graph for further automated processing." Describe the internal logic of the graph in another document, and then only people who want to further process this stuff automatically have to read that much further, but everybody gets the core concepts: aggregations are described in resource maps which are discoverable via URIs and rendered in several compatible ways. That's the abstract abstract data model, right? So let's stop that document right after we say that, and then point people at separate docs explaining the core resource map concepts (the vocabulary), the internal structure of graph-based processing of resource maps (section 5 of the current abstract data model), the recommended discovery techniques, and other serialization examples.
(By now I'm violating my suggested reader crankiness rules, so I'll be briefer. The above is my main point, the secondary main point follows in 8 and 9.)
7. The vocabulary document is easier to understand, and should be the second thing people read after the data model. My beefs with it are simple: (a) why consider a resource map document a separate entity? What we really need is compatible understandings of the conceptual makeup of resource maps across diverse software implementations working with equivalent resource maps expressed in different serializations. The semantic web is all about triples adding up to a graph, right? Let us work hard to implement test suites and sample data that allow developers to build diverse but equally reliable digital libraries that understand resource maps compatibly. The document is incidental to a resource map's discoverability and its internal model. (b) why define "aggregates" and "isAggregatedBy" when you can instead define "type resourcemap" and lean on "hasPart" and "isPartOf"? (c) I think Rob S. is onto something useful with his notion of a "resource in context", where a context can be defined with a date-of-access timestamp and that context can be used down (up?) the "scholarly value chain" by others to refer to what they read when they read it. But this concept isn't in the current specs.
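To make beef (b) concrete, here is a rough sketch of what "type resourcemap" plus "hasPart" and "isPartOf" could look like as plain triples. It is only one reading of the suggestion, not anything from the ORE specs. I am assuming the Dublin Core terms are the "hasPart" and "isPartOf" in question; the resource map type URI, the function name, and the example.org addresses are all made up for illustration.

```python
# Sketch only: a resource map as a set of (subject, predicate, object) triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: the Dublin Core terms are the hasPart/isPartOf being leaned on.
DCTERMS_HAS_PART = "http://purl.org/dc/terms/hasPart"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
# Hypothetical type URI; no such term is defined by ORE or anyone else.
RESOURCE_MAP_TYPE = "http://example.org/vocab#ResourceMap"


def describe_aggregation(map_uri, part_uris):
    """Return the triples saying 'this resource map has these parts'."""
    triples = {(map_uri, RDF_TYPE, RESOURCE_MAP_TYPE)}
    for part in part_uris:
        triples.add((map_uri, DCTERMS_HAS_PART, part))
        triples.add((part, DCTERMS_IS_PART_OF, map_uri))
    return triples


article = describe_aggregation(
    "http://example.org/article/42",
    ["http://example.org/article/42.html", "http://example.org/article/42.pdf"],
)
for triple in sorted(article):
    print(triple)
```

One typed thing, two already-familiar predicates, and nothing new to learn beyond the type itself.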
8. If ORE is to be successful at web scale (why else follow principles of the web architecture?), which I would like it to be, then it has to be useful to normal people using ordinary browsers to save and share bookmarks and read text in HTML. Having worked a medical library reference desk for a few years I'm guessing the most important, most common, and simplest use case for ORE is to describe a "journal article" as an aggregation of "this html page, that pdf, these references, and those images", which is *what online journals already do*. This consistent expression of the molecular nature of an aggregation - in this case "all that journal article comprises" - can and should be equally manifest via microformat-style semantic html, rdfa-style embedded description, autodiscovery-style links to alternate expressions, and direct access to atom or rdf serializations. I'm ordering that list that way because I think that's the ordering most likely to be actually experienced by people, because that acknowledges what's already being actually experienced by people: first and foremost, through little blocks of html in the top corner of a journal article page. Let's give publishers a way to keep doing what they're already doing but also newly define those blocks consistently with an html pattern that allows processing environments to find those blocks and treat their contents like any other resource map. Let's do that first, because that's pretty much how everybody already does this. Like I wrote before, the "resource map document" is incidental - what matters is the resource map, and that should be made able to be discovered and understood unambiguously through a microformat pattern, rdfa, atom, or rdf.
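For the "little blocks of html" case, a processing environment could do something as small as the following. This is a sketch under loud assumptions: the "resource-map" class name is hypothetical (no microformat by that name exists that I know of), the sample markup is invented, and a real pattern would want to say more about each link than its href. It uses only Python's standard html.parser.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ResourceMapBlockParser(HTMLParser):
    """Collect hrefs of links nested inside an element marked class="resource-map".

    The class name is a hypothetical convention, not part of any spec.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside the marked block
        self.parts = []  # hrefs found inside the block, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_TAGS:
            if self.depth:
                self.depth += 1
            elif "resource-map" in (attrs.get("class") or "").split():
                self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            self.parts.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_parts(html):
    parser = ResourceMapBlockParser()
    parser.feed(html)
    return parser.parts


SAMPLE = """
<div class="article-tools resource-map">
  <ul>
    <li><a href="/article/42.html">Full text</a></li>
    <li><a href="/article/42.pdf">PDF</a></li>
  </ul>
</div>
<p><a href="/about">About this journal</a></p>
"""

print(extract_parts(SAMPLE))
```

The publisher keeps the block they already have and adds one class attribute; the link outside the block is ignored.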
9. If that makes sense, then the discovery document becomes much more important. And a section like its 5 (on "Methods Not Recommended for ReM Discovery"), which basically says "this well-known web publishing pattern over here, which seems so logical and useful that popular modern web frameworks are baking it in to their cores (note: link broken at time of writing, but I think that's the right link), yeah, that's the one, well, that, and yknow, SIMPLE HTML LINKS, you can't do those here. Nope, they're forbidden", can just go away. Lovely. Much less Cranky, I am.
10. Forgive me for repeating myself, but here's what I think ORE should say: "ORE specifies how to describe aggregations with a resource map, which may be rendered in several compatible ways, and which must be discoverable via an explicit URI. For humans, this provides a bookmarkable or citable reference; for machines, this provides access to a named graph for further automated processing." Then it should specify the core vocabulary, the discovery mechanisms, and sample renderings as microformattish html, rdfa, atom, and rdf/*, and provide a suite of web-accessible named test documents in all renderings to allow developers of ORE processing tools to know (a) what they have to do to make their code understand ORE resource maps compatibly and (b) how to communicate with other developers about their implementations and (c) when their code is done.
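A suite of named test documents could be used roughly like this. Everything here is an assumption made for illustration: the one-triple-per-line toy serialization, the document name "article-42", and the function names. The idea is only that each named document comes with the triples a compatible parser must produce, so "done" means "no failing names".

```python
def parse_lines(text):
    """Toy parser: one whitespace-separated 'subject predicate object' per line."""
    triples = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            triples.add(tuple(fields))
    return frozenset(triples)


def failing_cases(parser, suite):
    """Return the names of test documents the parser does not read as expected."""
    return sorted(
        name
        for name, (document, expected) in suite.items()
        if parser(document) != expected
    )


# Hypothetical suite: name -> (document text, expected triples).
SUITE = {
    "article-42": (
        "ex:article/42 dcterms:hasPart ex:article/42.html\n"
        "ex:article/42 dcterms:hasPart ex:article/42.pdf\n",
        frozenset({
            ("ex:article/42", "dcterms:hasPart", "ex:article/42.html"),
            ("ex:article/42", "dcterms:hasPart", "ex:article/42.pdf"),
        }),
    ),
}

print(failing_cases(parse_lines, SUITE))
```

Two developers with different parsers can then compare notes by name ("I fail article-42") instead of by prose.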
11. This rant does not Crank to 11.