Will I need to understand the Semantic Web in 2008?

Lately I've been thinking a lot about alternate metadata universes where things might look rather different from our libraries' one item => one record world. The thing is, every time I reach some intermediate conclusion about it, the only people I can find who are thinking the same way seem to be Semantic Web People, or at least people whose blogs and projects I follow: an overlapping set that includes people from our own profession who care about these things and have already drunk the SemWeb punch to some degree. They tend to call things by names different from the ones my brain wants to assign them, but no matter, so long as the URIs don't change, I suppose.

On the other hand, if somebody asks, today, in January 2008, "where is the Semantic Web?", I, at least, as a neophyte, have no idea how to answer. Except maybe to suggest that the Web was around for years before People (capital 'P' as in world-scale counts of "people") did interesting things with it and it appeared on a machine near me, and maybe we're now in a similar intermediate phase, so stay tuned, eh.

Which leads me to wonder: is now the time for all good library hackers to come to grips with the state of the SemWeb art? Have we crossed some tipping point?

I know which software libraries to try out, I have some data to muck around with, and there's plenty of interesting work on linked data, so there's something to start with. And it feels like it's time to start. So maybe now's the time?


interesting times for libraries/semantic web

Maybe it's just me, but I DO think things have changed re: the semantic web over the last year or so, and it's not just that folks are buying into the promise of linked data. For one thing, the semantic web, by way of GRDDL and RDFa, is already coming to the web we all know and love. For another, microformats and tagging are emerging as something of a small 's' semantic web. Thirdly, a renewed appreciation of data-first approaches makes RDF a natural fit. Even ArtStor is moving away from cumbersome top-down metadata hierarchies to simple key-value pairs (if anything, RDF and OWL allow us to apply semantics when/if it matters and to avoid the premature optimizations of top-down metadata schemas). I think the idea that RDF is simply triples, subject-predicate-object, i.e. a key-value (predicate-object) pair attached to a resource (subject), rather than an obtuse XML format, helps a lot, too.
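To make the "triples as key-value pairs attached to a resource" idea concrete, here is a minimal Python sketch using plain tuples rather than any RDF library. The example.org URIs and the short predicate names are hypothetical, made up for illustration. Grouping the triples by subject recovers the simple key-value view of one resource.

```python
from collections import defaultdict

# Hypothetical triples: each one is (subject, predicate, object).
# The example.org URIs and short predicate names are made up for illustration.
triples = [
    ("http://example.org/image/42", "title", "Mona Lisa"),
    ("http://example.org/image/42", "creator", "Leonardo da Vinci"),
    ("http://example.org/image/42", "subject", "portraits"),
    ("http://example.org/image/43", "title", "The Starry Night"),
]

def describe(subject, triples):
    """Collect the predicate-object (key-value) pairs attached to one subject.

    Values are lists because RDF lets a resource have several objects
    for the same predicate (two creators, for instance).
    """
    pairs = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            pairs[p].append(o)
    return dict(pairs)

print(describe("http://example.org/image/42", triples))
# {'title': ['Mona Lisa'], 'creator': ['Leonardo da Vinci'], 'subject': ['portraits']}
```

Nothing here imposes a schema up front: adding a new kind of statement is just appending another tuple.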

The main thing, though, is that the semantic web is here already, perhaps not in the original Scientific American article way, but in a more subtle, pervasive & profound way as a natural maturation of the web itself. So we as librarians ignore it at our own peril. It's kind of like 1994 all over again... ;-).

--peter keane


It's definitely not just you. :) On the other hand, if you have to first learn to think about RDF and OWL and GRDDL and RDFa before you can do anything (Jonathan's point, I think), we're definitely not close to 1994 yet.


I guess what I meant to say is that technologies like RDFa, OWL, GRDDL, etc. are getting to the point where we DON'T need to think about them (I'd put the REST architectural principles in there as well); rather, the tools we use (social bookmarking, blogging engines, IRs (!)) will see to it that when you type in an author's name, that data is accessible to the outside world in a properly semantic encoding:

Peter Keane -> identified by -> http://blogs.law.harvard.edu/pkeane/
Peter Keane -> has email -> pkeane@mail.utexas.edu
http://onebiglibrary.net/comment/reply/220/5371 -> authored by -> Peter Keane
(Also included would be triples that define "identified by", "has email", and "authored by", each by pointing to a URI that defines that relationship.)
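For illustration, the three statements above can be written out in N-Triples, a one-triple-per-line RDF syntax, once each relationship is replaced by a URI. This is a minimal sketch under stated assumptions: mapping "identified by" to FOAF's weblog term, "has email" to foaf:mbox, and "authored by" to Dublin Core's dcterms:creator is my own choice, and the example.org URI standing in for the person is hypothetical.

```python
# Vocabulary namespaces (FOAF and DCMI terms). Which terms stand in for
# "identified by", "has email" and "authored by" is an illustrative assumption.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

PERSON = "http://example.org/people/pkeane#me"  # hypothetical URI for the person

triples = [
    (PERSON, FOAF + "name", "Peter Keane"),
    (PERSON, FOAF + "weblog", "http://blogs.law.harvard.edu/pkeane/"),
    (PERSON, FOAF + "mbox", "mailto:pkeane@mail.utexas.edu"),
    ("http://onebiglibrary.net/comment/reply/220/5371", DCTERMS + "creator", PERSON),
]

def term(value):
    """Write URIs in angle brackets; anything else as an escaped string literal."""
    if value.startswith(("http://", "https://", "mailto:")):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

print(to_ntriples(triples))
```

The predicates are now URIs anyone can dereference to find out what the relationship means, which is the "controlled vocabulary" angle.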

So when I go and post to another blog with the same identifying information, a semantic search service can understand both as the same person and aggregate them as "blog postings by Peter Keane". In 1994 there were a bunch of things going on around HTTP at NCSA, the University of Illinois, etc. that we didn't all need to understand but that made it all work. God knows, if we DID all need to understand OWL, the semantic web would be dead in the water (note: OWL makes my head hurt). The real key is the underlying principle that all these resources can be described by simple triples (and that the stated relationship itself can be described the same way; think controlled vocabulary), and that resources are identified by persistent, network-addressable URIs. That (it seems to me) is all there is to it. Frankly, the semantic web needs librarians (since we've been doing just this for many years) more than we need the capital 'S' Semantic Web (too top-heavy with impenetrable concepts!).
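Here is a minimal sketch of that aggregation step, assuming two hypothetical blogs (the blog-a/blog-b example.org URIs are made up) that each mint their own local URI for the author but publish the same email address via foaf:mbox. FOAF declares foaf:mbox inverse-functional, so two nodes sharing a mailbox can be treated as the same person (often called "smushing"). A real aggregator would more likely load the data into an RDF store and query it; plain lists of tuples keep the idea visible.

```python
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
EMAIL = "mailto:pkeane@mail.utexas.edu"

# Two hypothetical blogs, each minting its own local URI for the same author.
blog_a = [
    ("http://blog-a.example.org/people/1", FOAF_MBOX, EMAIL),
    ("http://blog-a.example.org/posts/17", DCTERMS_CREATOR, "http://blog-a.example.org/people/1"),
]
blog_b = [
    ("http://blog-b.example.org/users/pk", FOAF_MBOX, EMAIL),
    ("http://blog-b.example.org/entries/9", DCTERMS_CREATOR, "http://blog-b.example.org/users/pk"),
]

def posts_by_mbox(graph, mbox):
    """Return posts whose creator has the given foaf:mbox.

    Because foaf:mbox is inverse-functional, every node carrying this
    mailbox is taken to denote the same person.
    """
    people = {s for s, p, o in graph if p == FOAF_MBOX and o == mbox}
    return sorted(s for s, p, o in graph if p == DCTERMS_CREATOR and o in people)

# With URI-only triples like these, merging two graphs is just concatenation.
merged = blog_a + blog_b
print(posts_by_mbox(merged, EMAIL))
# ['http://blog-a.example.org/posts/17', 'http://blog-b.example.org/entries/9']
```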

The first step is to get our controlled vocabularies all accessible by way of REST-based web services!
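One way to read "controlled vocabularies accessible by way of REST-based web services": every term gets its own URL, and a plain HTTP GET on that URL returns the term. A minimal sketch using Python's standard WSGI interface; the /terms/ URL layout, the two sample terms, and the JSON shape are all hypothetical.

```python
import json

# A tiny, hypothetical vocabulary: term id -> label plus broader-term ids.
VOCAB = {
    "t1": {"label": "Libraries", "broader": []},
    "t2": {"label": "Academic libraries", "broader": ["t1"]},
}

def vocab_app(environ, start_response):
    """WSGI app: GET /terms/<id> returns that term as JSON; otherwise 404/405."""
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        start_response("405 Method Not Allowed", [("Allow", "GET")])
        return [b""]
    path = environ.get("PATH_INFO", "")
    prefix = "/terms/"
    term_id = path[len(prefix):] if path.startswith(prefix) else None
    if term_id not in VOCAB:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown term"]
    term = VOCAB[term_id]
    # Broader terms are linked by URL, not label, so clients can follow them.
    body = {
        "uri": prefix + term_id,
        "label": term["label"],
        "broader": [prefix + b for b in term["broader"]],
    }
    data = json.dumps(body).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(data)))])
    return [data]

# To try it locally:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, vocab_app).serve_forever()
```

Linking broader terms by URL rather than by label keeps the vocabulary's own relationships navigable over plain HTTP.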


Yes; it's time

I think you put your finger on it: there's a valuable shift in perspective (one that can in turn have a big practical payoff) in moving away from the monolithic "data record" to the notion of a linked web of data.

I know Ed's been doing some interesting things with linked data. You could always get together with him and brainstorm some ideas/play with some code? I'd be interested in seeing what you come up with.


A key benefit of my current job is that I sit in the cube next to Ed. :) He's one of the main "overlappers" (in the sense meant by my original post), and we talk about this stuff all the time. You're another one (assuming you're the Bruce I think you are)!


I came to a similar conclusion: that the semantic web people are the ones thinking about the _problems_ that I think we need to solve. They've identified the metadata problems we face in what seems to me the 'right' way, and they are a community working on solutions. Do they have the 'right' solutions? I'm not sure. I don't know enough about what they're doing, and I don't understand it well enough. I too have felt that I should resolve to learn more about it, but I've had trouble finding an entry point. So much of what they're doing seems like weird, esoteric knowledge: unless you've been in the community for the past ten years, along for the ride as they developed it all, it's hard to get on the train at this point.

Somebody in the know needs to spend more time writing documentation and introductions for newcomers.

There's no O'Reilly book on the semantic web, is there?

I knew I should have taken that one somewhat relevant class from Stuart Sutton in library school.


That "Semantic Web: The Missing Manual" point is a good one and I've been thinking about that too. Until there's a "Semantic Web: In A Nutshell" or "Semantic Web Cookbook" or even a "Linked Data Hacks" we'll just have to keep digging through the older titles that embody the same spirit (e.g. Spidering Hacks and the like).

Or, maybe we should write one.

I kind of like: ....

All of these are accessible, and written by smart people doing practical work with RDF (notably, the first two are from contributors to the linked data movement, and the last from Ian Davis at Talis).

Thanks for the links

Thanks for the links, Bruce. The first one in particular is one that I keep coming back around to, and each time I read it I get something different out of it. This latest reading, prompted by your comment, came after I'd read RESTful Web Services, which gave a whole new meaning to "linked data" served up in a REST architectural style.

To Dan's original question, a tipping point does seem to have been reached, at least among those at the leading edge of the library profession. (I shudder to think about those still coming to terms with the fact that MARC is not the end-all, be-all of descriptive formats.) I, for one, seem to be in about the same position as Dan in trying to make sense of it all.

as good a time as any

It's nice to see you blogging about this, Dan. One of the essential things I took away from being part of the unAPI effort was the importance of understanding HTTP status codes, content negotiation, URIs, REST, and web architecture in general. At the end of the day, I think understanding how people generally think about the architecture of the web, and how the various pieces of the standards jigsaw puzzle fit together, is really important. So I'd recommend reading AWWW (the W3C's Architecture of the World Wide Web, Volume One) to anyone who is interested in learning about semantic web technologies but doesn't know where to start.
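Content negotiation is one of those jigsaw pieces: the same URI can answer with HTML for a browser and RDF/XML for a linked-data client, depending on the request's Accept header. Below is a simplified Python sketch, assuming a server that offers only text/html and application/rdf+xml. Unlike the full HTTP rules, it does not let a more specific media range override a wildcard, so treat it as an illustration rather than a conforming implementation.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first.

    sorted() is stable, so ranges with equal q keep the client's order.
    """
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)

def matches(media_range, offered):
    """True if a media range such as */*, text/* or text/html covers offered."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return offered.split("/")[0] == media_range[:-2]
    return media_range == offered

def negotiate(accept_header, offered):
    """Return the offered type the client prefers, or None (answer 406)."""
    if not accept_header:
        return offered[0]  # no Accept header: the client takes anything
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for candidate in offered:
            if matches(media_range, candidate):
                return candidate
    return None
```

For example, `negotiate("application/rdf+xml, text/html;q=0.5", ["text/html", "application/rdf+xml"])` picks the RDF representation, while a plain browser request for `text/html` gets HTML. A `None` result is where a server would answer 406 Not Acceptable.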

I think the links Bruce posted for understanding RDF are good ones. A few months ago there was some discussion about establishing a discussion list for semantic web technologies in libraries (a counterpart to xml4lib, if you will), but in the end nobody did anything. I think the inaction was the result of wanting to see things percolate a bit longer. Perhaps now is a better time? I'm kind of undecided. One benefit of a little niche discussion list is that it helps library folks get their feet wet without having to dive into the big, sometimes overwhelming discussions going on elsewhere. Also, the issues that come up are rooted in the library domain, so they are more understandable to people who are new to a set of technologies. The downside is that participants don't engage with the wider discussion about information technologies in general: there isn't enough cross-fertilization of ideas, wheels are reinvented, there isn't enough exposure, etc.

As for whether it's important to understand semweb technologies: the code4libcon 2008 lineup and the OAI-ORE effort are good indicators that it is. code4libcon presentations are chosen by anyone who cares to vote, and as such are one of the better barometers we have for measuring what's important to library technologists. I count at least 4 out of 22 talks that are explicitly about semantic web technologies, and I suspect that at least one other, and perhaps a keynote or two, will mention them.

OAI-ORE, on the other hand, represents the work of a tight-knit cabal of digital library technology standards leaders. The recently released data model, vocabulary and Atom profile are deeply informed by semantic web technologies such as RDF, RDFS and OWL, and by web architecture concepts such as URIs, conneg (content negotiation), autodiscovery, etc. So, based on these two points, I think it's possible to triangulate that at least an understanding of the basics of RDF and web architecture is a valuable thing to have.

That being said, I think you have it already :-) I prefer to look at semantic web technologies as a fine wine that's been taking some time to age rather than the Kool-Aid that you gotta drink this year.

You're not alone!

I was in an email exchange recently with one of the librarians here at University of Mary Washington about similar things. She asked:

Ten years ago, I wanted to know enough about cataloging to be a better searcher (and to teach students to be better searchers) but I didn't want to make a career as a cataloger. Now, I want to understand the concepts behind, say, RDF, for the same reasons. . . but I'll leave it to better men than I am, Gunga Din, to employ RDF day in and day out. From those of you who already grasp the SemWeb and its tools, I would be grateful for an opinion: Is that a workable stance for a "new librarian?"

My response was a resounding "yes!" followed by many paragraphs, the gist of which is that eventually (soon?) a little knowledge of RDF will help you understand the inner workings of the tools you'll use, but in general all that RDF (and head-hurting OWL!) will be happily hidden from the user.