Saturday, January 31, 2009

Black swans and Google failure {Ramble}

I've been re-reading this article from last year in the Times (via George Dvorsky and BoingBoing) concerning Nassim Nicholas Taleb and black swans.

As I write, Google seems to be going through a bit of a crisis: it's flagging every site its searches return with "This site may harm your computer" (the outage even spawned a Twitter hashtag, #googmayharm) and redirecting you to an interstitial warning page.

Anyway, it occurs to me that Google going bankrupt, or suffering some huge failure that wipes out all the data stored in, say, Blogger, would be a black swan event: high impact and widely unpredicted.

This is exactly why I've taken to making local backups of my blog using Blogger Backup. In the event of Google getting fubared, I can be back up and running within two shakes of a WordPress template.

I've also been doing something similar with my Delicious account, using this website to create a local XML copy of all my Delicious bookmarks.

Incidentally, Delicious is now becoming really useful: it's got to the stage (with 2,096 tags and 2,009 saved URLs) where it acts as a sort of private search engine for stuff I know I'll already be interested in.

But my obsession with long-term data storage (and I mean for Long Now values of long term) has since been piqued by this article by Charles Stross. Stross talks about data formats being essentially forgotten after a few decades, and the data stored in those formats becoming inaccessible.

But the value of something like my Delicious XML backup may change over time: it only records the URLs, and the websites behind them might drop off the web altogether.

So I'm currently looking for some software that will save all the pages associated with the URLs in my Delicious account as local HTML files.

LATER: Well, Google is working properly again, and I've found something like what I just described (a means to acquire local copies of all my Delicious sites): the wget tool on UNIX-based systems.

wget actually looks really awesome.

I know I should bite the bullet and switch to either Apple or Linux but I haven't got round to it yet, so in the meantime I'm looking for something similar to wget but for Windows...


Alex said...

If you do Python, the urllib module has a function called urllib.urlretrieve(url, filename) which loads the URL you pass it and writes the page to the specified file (unlike urlopen, which returns the page to the caller as a file-like object). You could write a script to iterate through your links and urlretrieve them all.
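Alex's suggestion might be sketched like this — a minimal example assuming the Delicious export lists bookmarks as <post href="..."/> elements under a <posts> root, and that the backup file is called delicious.xml (both are illustrative assumptions, not details from the post). Note that urllib.urlretrieve is the Python 2 name; Python 3 moved it to urllib.request.urlretrieve:

```python
# Sketch: save a local copy of every page in a Delicious XML backup.
# "delicious.xml" and the export shape are assumptions for illustration.
import re
import xml.etree.ElementTree as ET
from urllib.request import urlretrieve  # urllib.urlretrieve in Python 2

def bookmark_urls(xml_text):
    """Yield the href of every <post> element in a Delicious export."""
    for post in ET.fromstring(xml_text).iter("post"):
        href = post.get("href")
        if href:
            yield href

def local_name(url):
    """Flatten a URL into a filesystem-safe .html filename."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", url) + ".html"

if __name__ == "__main__":
    with open("delicious.xml") as f:
        for url in bookmark_urls(f.read()):
            urlretrieve(url, local_name(url))  # one HTTP request per bookmark
```

Flattening each URL into a single filename is a crude but simple way to avoid recreating directory trees on disk: http://example.com/a would be saved as http_example.com_a.html.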

TJ said...

Thanks Alex. I'm learning Python at the moment using "Dive Into Python."

I'll certainly have a go at that.