A simple, old design for widespread blog mirroring

(Remembered this after last week's interview, and thought I would re-post it in case it makes any sense to anyone, assuming a few more interested people might be watching just now.)

Back at the Access 2004 Hackfest I worked with a few folks to sketch out a design for widespread copying and mirroring of blog content for distributed-copy and "preservation" purposes. I think we came up with something that could definitely work, at low cost, on a wide scale. Rethinking it today, I'd substitute Atom for RSS, and maybe reconsider using METS (perhaps just using Atom for both purposes instead).

It's a pretty simple idea: you extend an aggregator system to "archive" the entries posted each day into bundles distributed as BitTorrent files, and then build a secondary system to turn the data distributed over BitTorrent back into browseable "blog" mirrors if/when you need to. The best part is that you don't really need any new technology to do it.
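To make the first half of that a bit more concrete, here is a minimal Python sketch of the "archive each day's entries" step. It is only an illustration under assumptions, not what we built at the Hackfest: the function names (`group_entries_by_day`, `build_daily_bundle`), the dict-shaped entries, and the `urn:example:` identifiers are all hypothetical, and it uses Atom for the bundle format, per the rethink above. Turning each bundle into a torrent would be a separate step handled by an existing BitTorrent tool.

```python
from collections import defaultdict
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def group_entries_by_day(entries):
    """Group aggregated entries by the UTC date of their 'updated' time.

    Assumes each entry is a dict with 'id', 'title', 'content', and a
    timezone-aware datetime under 'updated' (a hypothetical shape).
    """
    by_day = defaultdict(list)
    for entry in entries:
        day = entry["updated"].astimezone(timezone.utc).date().isoformat()
        by_day[day].append(entry)
    return dict(by_day)


def build_daily_bundle(day, entries):
    """Serialize one day's entries as a single Atom feed document (a string)."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = f"Archive bundle for {day}"
    # Hypothetical identifier scheme, for illustration only.
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = f"urn:example:bundle:{day}"
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = f"{day}T23:59:59Z"
    for entry in sorted(entries, key=lambda e: e["updated"]):
        node = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(node, f"{{{ATOM_NS}}}id").text = entry["id"]
        ET.SubElement(node, f"{{{ATOM_NS}}}title").text = entry["title"]
        ET.SubElement(node, f"{{{ATOM_NS}}}updated").text = (
            entry["updated"].astimezone(timezone.utc).isoformat()
        )
        content = ET.SubElement(node, f"{{{ATOM_NS}}}content", {"type": "html"})
        content.text = entry["content"]
    return ET.tostring(feed, encoding="unicode")
```

Each string returned by `build_daily_bundle` would be written to a file, and that file (or a directory of them) is what gets seeded. The mirror-rebuilding side is then just the reverse: parse the bundles and render the entries back out as pages.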

Weblog mirroring system diagram

The main drawback is that you're dependent on the quality and completeness of what you get in the source feeds to begin with, which isn't always good enough (many feeds carry only excerpts rather than full entries, for example). But I still think it could work.

Kevin S. Clarke (not verified) on February 19th 2007

I keep hearing people mention the idea of replacing METS with Atom. I wish someone had the time to tinker with it and come up with a first draft (I know you probably don't, Dan)... it is just such a tantalizing idea that I'd like to know if it would really work.

dchud on February 19th 2007

In this case, I think it'd just be a question of whether or not you need the wrapper capabilities of METS. Would we need per-entry checksums, or technical metadata blocks, or filespecs? It's possible you would want all that, so METS would still be a useful shim, even if you have METS wrapping Atom, and then an Atom feed of the METS torrent/bundles (so you could subscribe to the torrent/bundle feed).

But maybe that's silly, because if you have Atom on the outside and Atom on the inside, it might be logically and programmatically simpler to just integrate those layers into a single, flat feed.

I'm not sure which level of specificity is more important - we'd have to build it and test it widely to see, I think.

Graham (not verified) on February 20th 2007

Missing from Part 3 of your diagram is a picture of a station wagon loaded with backup tapes. ;-)

dchud on February 20th 2007

That was supposed to be implied. :)
