I've been doing some homework, trying to learn what RDF is all about, starting from suggested reading linked from the comments on my earlier post. So far I have a few basic questions and comments:
- I get that things "started with" RDF/XML. But why do we need anything more than N3? It's not the simplest grammar in the world, but it's readable, compact, featureful, and it's what people seem to use when they actually talk about RDF. On top of that, it's just text. These seem like winning features.
- I am befuddled by the sheer number of RDF examples that attempt to "model knowledge". What if I really just want to "describe resources"? Aren't those completely different activities? I think I want a framework for talking about stuff - not for representing stuff itself. Instinctively I'd guess this reflects the whole AI/KR "I Know What You Mean" heritage of hype, but we're not talking about the "Knowledge Representation Framework" here, right?
- Somebody please hire Joe Celko (or a sufficiently advanced AI thereof) to write us a "SPARQL for Smarties".
And another thing, hard to form as a question. This nearly-machismistic talk about "how many triples? HOW MANY?" feels like a distraction. Every bit of recent experience I have building data-backed apps with a lot of data coming in the door has taught me to (a) keep the data just like you got it, (b) build a way to extract enough from it to run your app, and (c) optimize the extracted bits to meet user requirements. Keeping steps (a) through (c) separate means you can always throw away (b) and (c) and rebuild them from the original data in (a), which keeps the sourced data safe over time. That matters all the more because (b) and (c) just get easier and easier over time (faster machines, more RAM, better web frameworks, etc.). If this is common understanding (is it? it's what I know, now, at least), then what's the point of having a live environment for 2,000,000,000 triples? Is that really the only useful way you can query that kind of pile of data in all its semantic glory?
Instinctively I don't want to believe that that's true. I get that some applications are really about accumulating more and more data, and that that can get pretty big pretty quickly. But the triple model seems inherently optimized for flexibility, and as apps/data get bigger and bigger, you want to optimize for efficiency along a few known paths, and I can only imagine you'd again want to (b) extract enough from the source data to run your app. Which, I presume, would mean something very different from 2,000,000,000 triples.
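To make the (a)/(b)/(c) split concrete, here is a rough sketch in Python. It's only an illustration of the idea, not how any real triple store works: the triples, the `ex:` and `dc:` prefixed names, and the function names `extract_records` and `index_by_creator` are all made up for the example.

```python
from collections import defaultdict

# (a) Keep the source data exactly as received: a plain list of
# (subject, predicate, object) tuples. All values here are invented.
RAW_TRIPLES = [
    ("ex:book1", "dc:title", "Moby-Dick"),
    ("ex:book1", "dc:creator", "Herman Melville"),
    ("ex:book1", "dc:date", "1851"),
    ("ex:book2", "dc:title", "Walden"),
    ("ex:book2", "dc:creator", "Henry David Thoreau"),
    ("ex:book2", "ex:shelfmark", "PS3048 .A1"),
]

# (b) Extract only what the app needs: map a few predicates to field names.
WANTED = {"dc:title": "title", "dc:creator": "creator"}


def extract_records(triples, wanted=WANTED):
    """Build one flat record per subject, ignoring unwanted predicates.

    Simplification: a repeated predicate overwrites the earlier value.
    """
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        field = wanted.get(predicate)
        if field is not None:
            records[subject][field] = obj
    return dict(records)


# (c) Optimize the extracted bits for one known access path.
def index_by_creator(records):
    """Map each creator to the list of subjects they created."""
    index = defaultdict(list)
    for subject, record in records.items():
        creator = record.get("creator")
        if creator is not None:
            index[creator].append(subject)
    return dict(index)


records = extract_records(RAW_TRIPLES)
by_creator = index_by_creator(records)
print(by_creator["Herman Melville"])
```

The point of the sketch is that `RAW_TRIPLES` is never modified: if the app's needs change, `WANTED` and the index can be thrown away and rebuilt from the same source, and the app itself never has to query the full pile of triples.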
Does that make any sense? Maybe I'm not stating my concern clearly... or just don't yet get something fundamental about all this (very likely!).