THATCamp 2009
Another THATCamp has come and gone and it was, again, a lot of fun. I've grown used to the dynamics of an unconference in the past five years or so because that's the kind of event I attend most of the time, now. JCDL 2009 was the first academic conference I'd attended in years, and though I enjoyed it as well and met a lot of interesting people and learned some useful stuff, it was missing the energy the mix of people at a good unconference can generate. And, though I feel like a self-important prig as I write this, I hated that though I'd made the effort to attend, there was no chance for me to get up and show off some stuff I'd worked on in front of the group. I use software that lets a user become a committer; I value friendships that let a student become a teacher; I attend conferences that let an attendee become a presenter. Take out that dynamic and it's nowhere near as compelling.
Because it features this principle, as any good unconference does, the best part of THATCamp is the people. Both years I've met so many fascinating people and learned about so much amazing work that it's taken the whole week following for my brain to settle back down and follow up on all the threads left dangling on Sunday afternoon like so many thesis topics. There's talk of franchised THATCamps to be staged in Austin and London among other places, and that's exciting. There's a #thatcamp channel on freenode that threatens to become a regular hangout. I've got about 50 more people I'm following on Twitter, all of whom already fill my screen with fascinating stuff to read and look at all day, and some of them are even following me, too. What more could you ask for?
Well, there are a few things. I think there are a few tweaks to the formula that could improve the event a bit. I offer these only in the hopes of making THATCamp even better, not to complain or kill anybody's leftover buzz.
- Shorter sessions. This year the sessions were 1:15 long; for intense topics that engage everybody in the room that's what you need to give everybody a chance to go deep. But for open-ended discussions where there's as much airing of concerns about how "this needs to happen" and "we have to do that", 1:15 is about 25 minutes too long. It might have just been the sessions I chose this year, but it seemed like I was in more of the latter type sessions than the former, and that was a bit of a let down. Also, there were as many as five or six sessions running concurrently in several slots on the first day, any three or four of which I would've liked to sit in on. Tightening the schedule could allow for more time blocks and cut down on the number of simultaneous tracks.
- More hacking. When you go from having Bill Turkel teaching people how to fire code into an Arduino and the Omeka developers teaching how to write plugins and even me doing a simple tutorial on how to make little colorful balls dance around on screen with Processing one year to basically none of that the next year, it's a bit of a drop off to somebody like me who likes to learn by doing, especially in realtime at a moment when I'm jazzed up by all the amazing people and ideas in the air.
We talked about this a bit in #thatcamp on IRC last night - maybe if the sessions were a bit shorter and there were fewer concurrent tracks, one of the extra rooms could be a "hackin' room" or some such. Sorta like the chillout room at a rave with plenty of water and comfy couches where people can take a breather but, er, well, the exact opposite of that.
It might just be that I'm a little bit disappointed in myself for not prepping a hackier topic myself. I put a lot of time into hacks just for THATCamp last year and it was great fun pulling them off. I'd like to think that it was fun for the people in the room with me, too, and either way I learned a lot from the experience and I hope that was mutual. This year I was burned out on conference travel and work and didn't have the extra cycles to put something fun and new together, and I'm sorry I didn't. If I get to go again, I promise to do whatever I can to bring the hackin' back in!
- Let us do our own scheduling. This is probably the biggest one. At the Foo Camp I went to, the intro evening session ended with everybody mingling around big schedule boards where times, topics, and rooms get worked out among the attendees in realtime. It's messy and takes a while but it ends with drinks and everybody's just happy to be bumping into all the other fascinating people around them anyway so it serves as a nice icebreaker, too. At THATCamp, CHNM staff instead comb through ideas posted in advance to the blog and group and sort and lump and split topics into sessions with titles that don't necessarily match what the idea-posters had in mind. I wanted to talk about improving web sites with linked data but where do I go to talk about that in this schedule? "Standards"? "Publishing"? "Software Development"? "Libraries and Web 2.0"? (That's where I went, and did a bit of the talk, but I'm not sure my topic was what everybody else there had in mind, and I know I wasn't alone in this mode of confusion.)
By cutting out this dynamic let-the-people-do-it-themselves step you minimize opportunities for catchy titles to draw people in, for people to negotiate whether or not they should merge their own topics, and for people to simply get to know each other and decide which other people they want to be sure to hear from and hang out with right off the bat. And imho you maximize confusion about which sessions to go to and where you can find the people you want to hear from.
I'd advocate for filling out a big whiteboard with a schedule with people putting the names of their talks and their names with it and leaving a good 60-90 minutes to work it all out. On a real board or on paper (vs. online), so we'd have to occupy the same physical space. With drinks nearby.
I know Jeremy put a ton of work into scheduling because I caught him in the act when I arrived late so I know it was no trivial feat. I just think opening it up would be easier on @clioweb and @digitalhumanist and better for the rest of us too.
- Three word intros. Another nice thing they did at Foo was *very* brief intros of everybody in the room: your name, your affiliation, and *just* three words about who you are or what you're into. Mine would be: "Dan Chudnov, Library of Congress, One Big Library". It's a chance to put names to faces, it's another friendly icebreaker, and it's a chance for all of us 140-charsmiths to be clever.
- The schedule. It might help to have an evening meeting the night before for the welcoming session, the scheduling, and maybe one or two lead talks to kick things off. Then everybody can go get dinner or drinks and talk and think about what's coming the next morning and maybe work on their slides or demos or whatever overnight. You'd know when your slot is the next day, and which sessions you want to be sure to get to.
I don't want to be all "they do it better at Foo Camp" but these last few points really do reflect things that Foo Camp does a little better that I think THATCamp could adopt to make it just that much better.
And not to repeat myself, but I offer all this up with the hope of leading folks to think about various ways to make a great event even greater. I ain't complaining - the organizers do a great job making a lot of people with diverse backgrounds comfortable in a terrific space with plenty of coffee and wifi and surprisingly good food and nicely designed t-shirts and as long as they'll have me, I'll keep applying to attend again. It's just that I'm a bit of a hacker at heart and I'm always thinking about little optimizations, so take this as nothing more than that.
I hope to see y'all again next year, or even sooner - and next time you're in DC please stop by LC to say hi if you like.
TCDL 2009 talk: Better living through linking
Wednesday I spoke at the TCDL 2009 conference about why I think Linked Data is important for libraries. I've given talks about this twice before, once at the code4lib 2009 pre-conference on linked data, and a variation on that talk at the TCDL 2009 developers forum pre-conference Tuesday.
This was the first time I spoke about this in a room not entirely filled with hackers, though, so I couldn't just start talking about conneg and RDF models. It needed more context. As far as I can tell, the context that matters most is that we've been building a web for fifteen years, now, and we've continually changed how we build the web as we've changed how we use the web. So I spent most of the talk stressing how adhering to the four rules of Linked Data can help us make our libraries' stuff more relevant, more connected, and more likely to be found and used by improving how we link things together.
First, though, a comment about the contents of the slides - I work for the Library of Congress, but I wasn't representing the library at this talk, which I traveled to and gave off work hours. So that second slide is for real - the opinions are my own. You'll see a lot of LC examples, there, though, for two reasons. One is that I see these sites and think about them a lot, much like the rest of you, just more so because I'm there. When I can show an example from an LC site, it's likely something most people in a room have seen before and understand. The other reason is that LC has a long history of doing digital library stuff, so long that a lot of what's up there looks prehistoric in some ways, but at the same time, there are a lot of cool new things happening there, not all of which get a lot of attention, like LCCN Permalink. I don't work directly on any of the systems which have screenshots in these slides, so when you see images of those systems, you're not seeing my work. I know a few scattered details about the systems and am lucky to get to interact with many of the people who work on building them, but when I spoke about them at TCDL I had no intention of representing their work, and said so. My comments probably seemed more critical than promotional, but I meant them to illustrate situations we all find ourselves in at all our institutions, that we all know well about already, so it's not news to anybody that we all need to improve how we do things.
So, right, disclaimer doubly disclaimed. On with the slides:
I really enjoy events like TCDL - a single track, a healthy mix of public services, technical services, IT, managers, and administrators, and a tech focus but with a broad perspective necessary to talk tech in a roomful of diverse skills and interests. It really focuses my attention on the one or two issues that are at the core of the changes in technology coming at us. It seemed like people received the talk well, as I heard several comments from non-coders and coders alike about how it made sense that we should move in this direction.
Unfortunately I had to leave early but I'd encourage you to look at the abstracts and learn about all the great work being done in the Lone Star state.
Rot, rot, rot for the Natinals
Fanatic sports fan that I am, it's been a pleasure to have become a pro hockey and pro baseball season ticket holder after moving to DC. I didn't particularly care for either the Nationals or the Capitals when I got here, but I went to see the Caps because hey, that's Ovechkin down there, and I went to see the Nats because hey, pro baseball in my own neighborhood.
Over the past two years I've come to enjoy rooting for both of these teams, and though my heart remains in Detroit (go Lions! etc.) I have no qualms about rooting for these two new teams, both of which were pretty bad when I first got here. The Caps are quite good, now, and though they have a ways to go (like, adding a few more fully-D-minded skaters), they're always exciting to watch. But you knew that already.
For my money, if the Nats were a stock, I'd be buying in big round lots right now. Their offense is loaded with all kinds of good hitters -- a healthy Nick Johnson should scare anybody, Dunn's a far more disciplined hitter than I ever realized, there's nothing to say about Zimmerman right now that the numbers don't say for themselves, Dukes still has more upside, and if Guzman could learn to take a few pitches, look out. Most of their starting pitchers keep looking better and better each time out, and they might actually get this guy in the draft, and at least the bullpen isn't losing *every* game for them anymore.
So, yeah, mark my words - look out for the Nats. Maybe if the Caps bow out early (not a done deal yet, I wouldn't count them out even going into Yannisburgh for a game six down 3-2), they could lend a few Os to their southeast neighbors.
Quick hack: re-rendering newspaper pages from OCR data
This is a page of want ads, based on an image of a 110-year-old newspaper page currently available here. Visit that link and you can see the "Text", which isn't particularly formatted like it was originally, and you can also grab a pdf or jp2 or zoom in to the jp2 online.
Or you can see my re-rendering of the same data using the OCR data with its text positioning info.
I'm rendering this based on a dumb reading of the OCR'd text coordinate data, implemented using Processing. The code is trivial and dumb, so it's not even worth posting. Even so, what's good about it is that you can spot the right number of columns and pick out some words that stand out from the page image in the right places. What's bad about it is that I'm not re-rendering font sizes correctly and there's all that weird business along the left side and the top. That and I'm certainly not understanding all the info the ALTO data is offering me, but that's easily remedied.
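For anybody who wants to try the same trick, here's a minimal sketch of the idea in Python rather than Processing. It is not the code behind the rendering above. It assumes the OCR file is ALTO XML whose `String` elements carry `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes, it ignores the file's measurement unit entirely, and the function names `alto_words` and `words_to_svg` are made up for illustration. Using each word's box height as its font size is a crude guess, not a real fix for the font-size problem.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def local_name(tag):
    """Strip any XML namespace, so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_words(alto_xml):
    """Return one dict per ALTO String element: its text and its box."""
    root = ET.fromstring(alto_xml)
    words = []
    for el in root.iter():
        if local_name(el.tag) != "String":
            continue
        words.append({
            "text": el.get("CONTENT", ""),
            "x": float(el.get("HPOS", 0)),
            "y": float(el.get("VPOS", 0)),
            "w": float(el.get("WIDTH", 0)),
            "h": float(el.get("HEIGHT", 0)),
        })
    return words


def words_to_svg(words, page_width, page_height):
    """Place each word at its OCR position in a bare-bones SVG page."""
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 %g %g">'
             % (page_width, page_height)]
    for w in words:
        # ALTO gives the top-left corner of the box; SVG text is anchored
        # at the baseline, so approximate it as top + box height.
        parts.append('<text x="%g" y="%g" font-size="%g">%s</text>'
                     % (w["x"], w["y"] + w["h"], w["h"], escape(w["text"])))
    parts.append("</svg>")
    return "\n".join(parts)
```

Feed `alto_words` the contents of one page's OCR file, pass the result to `words_to_svg` along with the page dimensions, and open the output in a browser to see roughly where the columns fall.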
In other words, with raw data comes raw responsibility!
Stay tuned, lots and lots of this data coming along very soon... can't wait to see what the rest of you do with it.
The Zeitgeist Vocabulary
Found myself in a conversation the other day about searches and carefully watching what it is that people actually search for in your interfaces. Seems like if you could have an easy, convenient way to focus the staff of a library on which searches are occurring most often, which ones are rising and falling, etc., you might have a way to connect more people to resources, more resources to other resources, and more people to other people.
This makes me think of two commercial efforts that do this and, because of it, they deliver a compelling service. One is a medical information product that took a new approach ten years ago -- it might be old hat and common by now, but it was interesting when it started. They saw that ER was popular, their doctors told them that every Friday morning their patients would call up asking about whatever disease or cure was on ER the night before and whether they had it or should get it, and they realized: we need a "what was on ER last night" section. Apparently, the clinical staff who used their product found it very helpful.
The other is Mahalo, the "human-powered search" folks. Unfortunately their nginx proxy is b04k3d right now, but if it weren't I'd have a bunch of links to how they follow the latest news and popular topics into their hand-curated lists and links and bios and stuff. Maybe it's always just about what's happening right now, but it's not like their stuff goes away -- as they build it up over time, it adds up to an interesting repository of pop knowledge that seems to integrate bits and bobs found elsewhere in a helpful way.
So I was thinking about whether it would make sense for libraries to publish feeds or stats of the most common searches in their most prominent search interfaces. And if they did, what we could do with such feeds and summaries if we threw 'em together a bit to see how they add up.
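To make that a little more concrete, here's a rough sketch of the kind of summary such a feed could carry: the most common searches in a period, and which ones are rising or falling compared to the period before. Everything here is an assumption for illustration, including the function names, the idea that a search log boils down to a plain list of query strings, and the very naive normalization (lowercase, collapse whitespace). Only aggregate counts come out the other end, never individual searches.

```python
from collections import Counter


def normalize(query):
    """Fold case and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def tally(queries):
    """Count normalized queries, skipping blank ones."""
    return Counter(normalize(q) for q in queries if q.strip())


def top_searches(queries, n=10):
    """Return the n most common searches as (query, count) pairs."""
    return tally(queries).most_common(n)


def movers(previous, current):
    """Return (query, change) pairs, biggest risers first, fallers last."""
    prev, curr = tally(previous), tally(current)
    # A Counter returns 0 for a missing key, so new and vanished queries
    # need no special handling.
    changes = {q: curr[q] - prev[q] for q in set(prev) | set(curr)}
    return sorted(changes.items(), key=lambda item: (-item[1], item[0]))
```

Run `top_searches` over this week's queries for the "most common" list, and `movers` over last week's and this week's for the rising and falling view.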
Okay, I hear you, stop right there -- I'm not listening to your privacy rant. This data is important, and we can be thoughtful about how to use it to help our users without selling them out at the same time, so I won't let you sidetrack me. :)
I want a site where I can log in and see what people are looking for at a variety of places. Sorta like what you can see at Digg Labs, but, yknow, with search data. And on another screen, I want to see current summary data like Google Trends and historical summaries like Google Zeitgeist.
And then I want a screen where I can pick a topic that those tools tell me users are interested in, one for which I know my library has some cool stuff. Maybe that cool stuff has a whiz-bang site already, or maybe it's buried in some stacks somewhere without so much as a collection-level record, but we have it, and I want people to know about it. And when I pick that topic, I want to be able to see: for this topic, what tags are people using for it on delicious and digg? Which form of the query is most popular in my stats? Which terms from my local data's vocabulary are relevant for this if I have good metadata for it -- from LCSH, or DDC, or LCC, or MeSH, or AGRICOLA, or even just identifiers like ISBNs or IMDB links, whatever -- what access points do I already have for this?
Then, knowing that those terms from our side of the world are inevitably going to look arcane and unfathomable to the normal people out there, I want to be able to *connect* those terms from our world directly to what it is that normal people talk about when they talk about those things. I want to connect tags and terms, to associate topics and tokens, and to build up those associations over time. Then I want to use those connections on my own sites. I want to republish the mappings between the user tags and tokens and our topics and terms and use those to draw the bots and crawlers and spiderers into our stuff from what it is that other people are looking for. There are any number of ways we could do this, but the magic would be to share the work of making the connections, and then to bring the connections local into our collections.
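As a sketch of what "connecting tags and terms" might look like as data, here's a tiny in-memory structure that records associations between what users type and terms from a controlled vocabulary, and lets the associations pile up over time. The class name `TermMap`, its methods, and the idea of counting each connection as a vote are all hypothetical, just one of the any number of ways this could go. A shared version would need persistence and some way to republish the mappings, which this leaves out.

```python
from collections import Counter, defaultdict


def _key(tag):
    """Normalize a user tag: lowercase and collapse whitespace."""
    return " ".join(tag.lower().split())


class TermMap:
    """Two-way association between user tags and (scheme, term) pairs."""

    def __init__(self):
        self._terms_by_tag = defaultdict(Counter)
        self._tags_by_term = defaultdict(Counter)

    def connect(self, tag, scheme, term, weight=1):
        """Record that a user tag goes with a vocabulary term.

        scheme is a label such as "LCSH" or "MeSH"; calling this again
        for the same pair strengthens the association.
        """
        tag = _key(tag)
        self._terms_by_tag[tag][(scheme, term)] += weight
        self._tags_by_term[(scheme, term)][tag] += weight

    def terms_for(self, tag):
        """Vocabulary terms linked to a user tag, strongest first."""
        return self._terms_by_tag.get(_key(tag), Counter()).most_common()

    def tags_for(self, scheme, term):
        """User tags linked to a vocabulary term, strongest first."""
        return self._tags_by_term.get((scheme, term), Counter()).most_common()
```

Going from tag to terms answers "what access points do I already have for this?", and going from term to tags gives the everyday words to put on a collection page so the crawlers can find it.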
The next time Mahalo goes to update their page about that popular topic, they'll redo all their searches in the popular non-human-powered search engines and they'll find our stuff. Because the machine searchers will have found our stuff, because we will have connected the two. As far as I can tell, the smart librarians have long since found ways to do this, just like the folks at Mahalo.
We could call it the Zeitgeist Vocabulary.
Just a thought.

