Found via the Washington University DLS blog, here’s an article from the NYT on the challenges faced by the National Archives when it comes to the massive data dump the Bush administration is about to unload.
The archives invoked its emergency plan to deal with problems in transferring two types of electronic files: a huge collection of digital photographs and the “records management system,” which provides an index to most of the textual records generated by Mr. Bush and his staff members in the last eight years. … If the electronic records of the Bush White House total 100 terabytes of information, as archives officials estimate, that would be about 50 times the volume of electronic records left behind by the Clinton White House in 2001 and some five times the contents of all 20 million catalogued books in the Library of Congress … [the] agency was expecting to receive 20 to 24 terabytes of e-mail alone from the Bush White House.
That’s an incredible amount of material for the archives to ingest. Since the administration, the Vice President’s office in particular, is not providing details on the types of materials being handed over, what this amounts to is the National Archives trying to implement an OAIS solution with a hostile data provider. The provider is hostile because the data has political implications in the short run, while the archives needs to preserve the material for the long run. In the next year or so, I hope the archives will write and present a case study on what they eventually ended up doing.