Ongoing questions about linked data and the semantic web
I'm getting a bit further along in trying to understand how the pieces are supposed to fit together. I don't have immediate answers but I've found one particular use case that I think is a big win for linked data. If I can assemble a useful working implementation I'll write about it here.
In the meantime, I keep stumbling over the following sticking points. I'm searching for answers (even for the ones not framed as questions), and would gladly take any advice or suggestions.
- I love the *idea* of linked data but I'm not sure I can buy into the current state of the art in how to best link data. In particular, it seems like "sameAs" claims should be jumping off points for human judgement, rather than being presumed to be automated declarations of equivalence. Let's automate bridging the human judgement pieces... that'd be interesting.
- I have never understood FOAF. It seems like a fine way to serialize a cult-of-personality network (e.g. "see? i'm only two steps from timbl himself!!") Similarly I don't get the whole "social graph" buzz either. I'm not a marketer looking to harvest customer data. I'm not doing any affinity indexing just now. What other use is there for saying who my friends are, besides those two?
- Does the linked data movement really depend upon RDF? It doesn't seem like it has to. Maybe it could grow faster if it didn't.
- The info resource / non-info resource dichotomy doesn't fit my brain. (Wherein everything is always a representation, and sometimes I can only share description, but that description is as important as any other representation, because surrogation is really important too.) It's been pointed out to me that this is still controversial. I can understand why.
- If blank nodes are bad (end of the section), how do I represent sets of literals that mean the same thing but are expressed in different languages? I need to do that right now and I can't figure out how without blank nodes.
- I'm still mainly interested in Description (talking about things) and am completely disinterested in modeling knowledge (what things are and mean) and seem to keep finding examples where arguments about best practices hinge on notions of essential truths ("is it a resource that can be dereferenced on the web? a dog is not, so it's a non-information resource") that simply never matter in the work I do (I'm a librarian and I want to improve ways to organize and provide access to stuff). I care about systems that help people come to their own judgements about what things are and what they mean, and in particular I care about systems that allow a wide range of people to come to a wide range of these judgements. If I have to start a system about things that aren't on the web by accepting truth-based categorizations about web availability, I'm shoehorning my system into an oddly-structured container from day zero. I think I don't want to do that. Granted, we've had to do that for centuries in libraries (see Books, Oversized, or the "basement full of gifts the President received while in office" in any presidential library/archives for good examples), but we make do for weird examples and still put "most stuff" in "standard, appropriate containers" (e.g. ordinary shelves and file boxes), rather than building whole systems around odd catch-all structures (basements full of stuff).
- er, that last one was over-long, so I'll try it this way instead. I think I'm interested in Linked Description, not Linked Data.
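On the blank-node question in the list above, one possibility is language-tagged literals: RDF allows several literals with different language tags to hang off the same subject and property, so no blank node is needed as long as the thing being labeled already has a URI. Here is a minimal sketch in plain Python. The example.org subject and the choice of rdfs:label are assumptions for illustration only, and the tiny serializer does no escaping.

```python
from collections import namedtuple

# A literal with a language tag, like "chat"@fr in RDF syntaxes.
Literal = namedtuple("Literal", ["value", "lang"])

# Hypothetical subject URI; rdfs:label is assumed as the labeling property.
SUBJECT = "http://example.org/concept/cat"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Three statements about one named resource, no blank node involved.
triples = [
    (SUBJECT, LABEL, Literal("cat", "en")),
    (SUBJECT, LABEL, Literal("chat", "fr")),
    (SUBJECT, LABEL, Literal("Katze", "de")),
]


def labels_by_language(triples, subject, predicate):
    """Collect the language-tagged literals attached to one subject."""
    return {
        obj.lang: obj.value
        for s, p, obj in triples
        if s == subject and p == predicate and isinstance(obj, Literal)
    }


def to_ntriples(triples):
    """Serialize to N-Triples-style lines (no escaping, illustration only)."""
    return [
        '<%s> <%s> "%s"@%s .' % (s, p, obj.value, obj.lang)
        for s, p, obj in triples
    ]


labels = labels_by_language(triples, SUBJECT, LABEL)
lines = to_ntriples(triples)
```

This only covers the case where the labeled thing can be given a URI; if the set of literals itself needs to be grouped and described, some kind of node (blank or named) is still required.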
A colleague I trust and always learn from tells me that I should stop saying snarky-seeming things like the above in private channels and should instead say them publicly. It's weird - when some friends and correspondents hear me going on like I do in this post, they seem to presume that I'm saying I hate the semantic web or am bad-mouthing RDF. I don't intend to do either of those things. I'm seriously just trying to understand their place and whether they can help me and my colleagues in the work we're doing. After ten years of keeping an eye on them, and having real problems in front of us to solve right now, I need to know whether this path is going to help me right now. It seems to have a lot of potential, but I don't see many public examples of current implementations that solve real problems for people. I've read about a few private ones that do sound promising, but I can't see them for myself, so I have to dismiss them. There are cool linked data sites that offer up good data, but I haven't found many that really require the RDF/OWL/etc. flavor they're offering (i.e. they could just offer up their data in other formats and might be just as effective for data linking over time).
So I'm not trying to be dismissive or insulting, though I admit that sometimes I just am, and jerkily so. I'm trying to avoid that now by posting more thoughtfully here. I still need help understanding why my thinking might be wrong on the issues I listed above, but if the helpers take a defensive stand without backing up claims with working sites, I'm going to keep questioning the state-of-the-received wisdom. Please don't take it personally.
alf (not verified) on February 08th 2008
#3: "Does the linked data movement really depend upon RDF?"
I think linked data depends on identifiers, and metadata attached to those identifiers, which is essentially what RDF is.
You could just store all the metadata in a relational database if you wanted to though, so it's not a dependency as such.
#4 The "information/real object" problem is probably the most broken part of the "semantic web" as it exists at the moment, just because you can never be sure that if you ask for RDF/XML about a dog you're actually going to get that, and not HTML, or an actual dog.
dchud on February 08th 2008
Instinctively, I want to think that linked data depends on people. Identifiers are useful, but not always practical. To bridge the gap, we need humans.
I think. :)
Kevin S. Clarke (not verified) on February 08th 2008
Linked Description... that's catchy. I like it. It keeps the people in the process.
Ross (not verified) on February 08th 2008
I think you may be taking FOAF a little too literally. Rather than thinking of it as a vanity plate, it might be better to look at it as a vocabulary to describe a person (or other agent) and how they exist on the web and the relationships they have with other people, agents and services.
I had a hard time wrapping my head around the utility of FOAF until I was able to look at it in this context.
Also, I'm not entirely sure why identifiers and humans are mutually exclusive. It seems like a human (say, a cataloger) could just as easily create:
<http://www.loc.gov/lccn/2001017848> frbr:subject <http://www.loc.gov/lcsh/sh2002000569>
<http://www.loc.gov/lccn/2001017848> dc:creator <http://www.loc.gov/naf/n00003285>
As
010 |a 2001017848
100 |a Hjelm, Johan.
650 |a Semantic Web.
And it'd be less ambiguous regarding the 100 field. (Although this particular example fits more into the Yngwie J. Malmsteen category)
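The correspondence between the MARC fields and the two statements above can be sketched in a few lines of Python. The URIs are copied from the comment; the two lookup tables that resolve a heading string to an authority URI are hypothetical stand-ins for whatever a cataloger or authority file would actually supply.

```python
# MARC-like record from the example above: tag -> subfield $a value.
record = {
    "010": "2001017848",
    "100": "Hjelm, Johan.",
    "650": "Semantic Web.",
}

# Hypothetical lookup tables: heading string -> authority URI.
NAME_AUTHORITY = {"Hjelm, Johan.": "http://www.loc.gov/naf/n00003285"}
SUBJECT_AUTHORITY = {"Semantic Web.": "http://www.loc.gov/lcsh/sh2002000569"}


def record_to_triples(record):
    """Turn the 010/100/650 fields into (subject, predicate, object) triples."""
    subject = "http://www.loc.gov/lccn/" + record["010"].strip()
    triples = []

    # 100 (main entry, personal name) becomes dc:creator if the name resolves.
    creator = NAME_AUTHORITY.get(record.get("100"))
    if creator is not None:
        triples.append((subject, "dc:creator", creator))

    # 650 (topical subject) becomes frbr:subject if the heading resolves.
    topic = SUBJECT_AUTHORITY.get(record.get("650"))
    if topic is not None:
        triples.append((subject, "frbr:subject", topic))

    return triples


triples = record_to_triples(record)
```

The lookup step is where the ambiguity lives: a heading that matches no entry, or more than one, produces no triple here and would need a person to decide.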
dchud on February 08th 2008
...er, that's what I thought it was. I see that as only useful for affinity indexing and mining customer data for marketing purposes, which are all fine and well, but I don't need to do those things. So why would I want other people to mine my own data and that of my friends? The reason I note people as "friends" in last.fm and flickr is that I like to see what my friends are listening to and their photos. I don't use other social networks because none offer me services as useful as flickr and last.fm, and otherwise I'm just hanging data out there for others to mine. So I don't.
If FOAF is primarily useful in this context for experimenting with and learning semantic web stuff, especially because semantic web stuff enables data mining and affinity indexing and similar automated analysis techniques, well, sure, then, I get that. But that doesn't interest me, because I'd rather learn this stuff with the descriptive data I want (and need) to link, rather than with FOAF.
Bruce (not verified) on February 08th 2008
Dan, just today Dan Brickley has a post that talks about the intentions behind FOAF (in the first paragraph). He doesn't go into an intense amount of detail, but the idea of enhancing the possibility for "serendipity" is kind of suggestive at least.
Ed Summers (not verified) on February 08th 2008
I don't remember anyone suggesting that people weren't an essential part of the mechanisms of linked data!!!
Specifically, I think deciding how to link up data with owl:sameAs (just one of the many ways to do it) is exactly the spot where domain-experts are essential.
Linking data requires identifying appropriate methods and algorithms for connecting up the URIs, and for deciding on how various vocabularies are related to each other.
As How to Publish Linked Data on the Web points out, Record Linkage is an area of study unto itself, and there are people in the semweb community who are really interested in approaches to equivalence mining. It feels like pretty fertile ground for collaboration between library / compsci people.
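To make the "domain experts in the loop" idea concrete, here is a rough sketch in which an algorithm only proposes owl:sameAs candidates and a person confirms or rejects them. It uses difflib from the Python standard library as a stand-in for a real record linkage method; the example URIs, the name data, and the 0.85 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and trim a heading before comparing it."""
    return name.lower().strip()


def candidate_links(left, right, threshold=0.85):
    """Propose (uri_a, uri_b, score) pairs for human review.

    left and right map URIs to name strings. Nothing is asserted as
    owl:sameAs here; the output is only a ranked list of suggestions.
    """
    candidates = []
    for uri_a, name_a in left.items():
        for uri_b, name_b in right.items():
            score = SequenceMatcher(
                None, normalize(name_a), normalize(name_b)
            ).ratio()
            if score >= threshold:
                candidates.append((uri_a, uri_b, score))
    # Strongest suggestions first, so reviewers see the likeliest matches.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical data sets describing people under different URIs.
library_a = {"http://example.org/a/1": "Hjelm, Johan"}
library_b = {
    "http://example.net/b/9": "hjelm, johan ",
    "http://example.net/b/10": "Malmsteen, Yngwie J.",
}

suggestions = candidate_links(library_a, library_b)
```

Only the pairs a reviewer accepts would then be published as owl:sameAs statements; everything else stays a suggestion.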
Ross (not verified) on February 08th 2008
I'm not sure where you're getting this marketing angle for FOAF, still.
Most simply, it's a model for describing a person or group especially in the context of where they fit in the web.
Do you feel like vCard's sole purpose is marketing?
There aren't many metadata schemas available to describe a person; FOAF and vCard do a pretty good job (neither is perfect alone: FOAF doesn't address the 'physical world', so there's no 'address' property, for example, and vCard doesn't deal with online identity very gracefully).
However, it's a good way to associate various bits and pieces on the web. Take, for instance:
<http://www.ariadne.ac.uk/issue43/chudnov/> dc:creator <http://onebiglibrary.net/bio>
<http://onebiglibrary.net/bio> foaf:first_name Daniel
<http://onebiglibrary.net/bio> foaf:surname Chudnov
<http://onebiglibrary.net/bio> foaf:homepage http://onebiglibrary.net/
<http://onebiglibrary.net/bio> foaf:holdsOnlineAccount http://unalog.com/person/dchud/
<http://onebiglibrary.net/bio> frbr:creatorOf http://www.ariadne.ac.uk/issue43/chudnov/
<http://onebiglibrary.net/bio> frbr:creatorOf http://dx.doi.org/10.1016/S0098-7913(00)00100-3
<http://onebiglibrary.net/bio> frbr:creatorOf http://dx.doi.org/10.1007/s10393-004-0151-1
<http://onebiglibrary.net/bio> frbr:creatorOf http://dx.doi.org/10.1145/544220.544319
So, from the Opening up OpenURLs with Autodiscovery article, we could potentially bring in other articles that you (or I, Jeremy, Raymond or Richard) had written; see the things that currently interest you via what you've stacked on Unalog; and see what you're currently thinking about on your blog.
Would this require FOAF? No, but FOAF would make it considerably easier.
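The "bring in other articles" step described above amounts to following two links: from the article to its creator, then from the creator to everything else they created. A minimal sketch over an in-memory list of triples, using the URIs from the example; the list-of-tuples store and the function name are assumptions for illustration, not any particular RDF library's API.

```python
BIO = "http://onebiglibrary.net/bio"
ARTICLE = "http://www.ariadne.ac.uk/issue43/chudnov/"

# Triples taken from the example above, as (subject, predicate, object).
triples = [
    (ARTICLE, "dc:creator", BIO),
    (BIO, "foaf:homepage", "http://onebiglibrary.net/"),
    (BIO, "foaf:holdsOnlineAccount", "http://unalog.com/person/dchud/"),
    (BIO, "frbr:creatorOf", ARTICLE),
    (BIO, "frbr:creatorOf", "http://dx.doi.org/10.1016/S0098-7913(00)00100-3"),
    (BIO, "frbr:creatorOf", "http://dx.doi.org/10.1007/s10393-004-0151-1"),
    (BIO, "frbr:creatorOf", "http://dx.doi.org/10.1145/544220.544319"),
]


def objects(triples, subject, predicate):
    """Return every object of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def other_works(triples, article):
    """Follow article -> creator -> creatorOf, skipping the article itself."""
    works = []
    for creator in objects(triples, article, "dc:creator"):
        for work in objects(triples, creator, "frbr:creatorOf"):
            if work != article and work not in works:
                works.append(work)
    return works


related = other_works(triples, ARTICLE)
accounts = objects(triples, BIO, "foaf:holdsOnlineAccount")
```

The same `objects` lookup with `foaf:holdsOnlineAccount` or `foaf:homepage` gives the Unalog and blog links mentioned above.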
Tony Hammond (not verified) on February 09th 2008
"If blank nodes are bad ..."
I don't think the authors are actually saying that blank nodes are bad. Indeed blank nodes serve a very useful organizing principle in RDF. Also, as anyone who is interested in persistent identifiers already knows, assigning and maintaining names is a time-consuming business. Obviously key resources should and will be named so that they can be externally referenced. There is, however, a danger of going too far in this direction and seeking to name *everything*. That would be a mistake. Names don't come for free.
Courtney (not verified) on February 09th 2008
The only social networking site I've seen using FOAF data is LiveJournal. Are other sites making this data publicly visible? That is my biggest question with all these metadata standards, who is using this stuff and where would it be most helpful to adhere to one standard.
LeBard (not verified) on February 10th 2008
Here is an article from TechCrunch on the release of Google's Social Graph API that uses FOAF and XFN. I haven't had a chance to fully investigate the implementation yet.