Before the next museum fire, make 4K video of all your documents

There are special machines for this but it's easy to make your own setup.

Many of you will have read of the tragic fire which destroyed the National Museum of Brazil. Many of the artifacts and documents in the museum were not photographed or backed up, and so are destroyed forever.

This includes things like language research notes -- the only remaining documents on now extinct human languages. Gone.

I hope this means that museums and collections around the world are now scrambling to make sure they have digital backups. This leads me to post a reminder of my article on Digitizing your papers for the future with 4K video. The idea there is simple. Get your papers and quickly go through them while being recorded by a 4K video camera. There is no software today to turn that video into a document. But there will be. And you can manually pull out any page by going through the video.

If you know people who work at a museum that doesn't have digital images of its archives, pass this note along!

The goal is to make it as fast and easy as possible so that there is no excuse. Museums like the one in Brazil don't digitize their collections because they don't have any funding for it, and it's hard to get past the idea of making scans you can't immediately use.

As my article details, you just need a 4K camera (almost any nice point-and-shoot or high-end digital camera will do) on a mount looking at your table. Set it up with the right exposure and light the table as brightly as you can. The bright light lets you set a short exposure, which makes the images much sharper. This is equipment that almost every person, and certainly every museum, already has.

This means you can just lay pages down in a pile at high speed. You might even be able to just flick through a book, if you can be sure to turn only one page at a time.

You'll get the data. You can check the video to be sure it worked. Then, of course, store it off-site! If the worst happens, not all will be lost.

Some day, people will write software to extract the pages from these videos and OCR them or turn them into a PDF. They just need a motive.

Today, there are some tools that could be written to help with this. For example, while full extraction is hard, one could probably build a tool that detects pages and tries to find page numbers. That tool could tell you if you missed any pages going through a book, and which pages to go back and make sure get into the video. Another tool could check whether any pages never had a sharp image and warn you about that. As you flip through pages, it's possible a page never sits still for the 1/30th of a second needed to get a video frame of it. If you use a very fast shutter speed, with very bright light and a small aperture for a large depth of field, it should be easy to get a sharp image of every page.
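A minimal sketch of both checks, in Python. It assumes a hypothetical earlier step has already found the page in each frame and read its printed page number (or given up, giving `None`); that step is the hard part and is not shown. Sharpness is scored with the variance of a Laplacian filter, a common blur heuristic chosen here for illustration, not something the post prescribes. `laplacian_variance`, `check_capture` and the threshold value are made-up names and numbers.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Sharpness score for a grayscale image given as a list of pixel rows.

    Applies a 4-neighbour Laplacian at every interior pixel and returns the
    variance of the responses. Blurred images have weak edges and score low.
    The image must be at least 3x3 pixels.
    """
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(gray[y - 1][x] + gray[y + 1][x]
                             + gray[y][x - 1] + gray[y][x + 1]
                             - 4 * gray[y][x])
    return pvariance(responses)


def check_capture(frames, first_page, last_page, sharp_threshold):
    """Return (pages never seen, pages seen but never sharp).

    frames: iterable of (page_number_or_None, gray_image) pairs, assumed to
    come from a hypothetical page-detection and page-number-reading step.
    """
    best = {}  # page number -> sharpest score seen for that page
    for page, gray in frames:
        if page is None:  # no page number could be read in this frame
            continue
        best[page] = max(laplacian_variance(gray), best.get(page, 0))
    expected = range(first_page, last_page + 1)
    missing = [p for p in expected if p not in best]
    blurry = [p for p in expected if p in best and best[p] < sharp_threshold]
    return missing, blurry


# Toy frames: a high-contrast checkerboard stands in for a crisp page,
# a flat grey patch for a blurred one.
sharp = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]
flat = [[128] * 5 for _ in range(5)]
frames = [(1, sharp), (2, flat), (2, flat), (None, sharp), (4, sharp)]
print(check_capture(frames, 1, 4, sharp_threshold=100))  # ([3], [2])
```

Keeping only the best score per page means a page passes as long as at least one frame caught it sharply, which matches the goal of getting one usable image of every page.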

It could even be possible to "fan" through a bound volume. This will miss a lot of pages, but software might be able to provide a list of what was missed after two fans through the pages. The unique documents at a museum will not be bound books (unless super rare) but rather will be notebooks and loosely bound volumes.
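Here is one way the "what was missed after two fans" report might work, as a hedged sketch. It assumes page-detection software (which does not exist yet) has reduced each fan-through pass to the set of page numbers it captured legibly. It reports what no pass caught, plus how many new pages each pass added. The function names are hypothetical.

```python
def missed_after_fans(passes, first_page, last_page):
    """Pages in first_page..last_page that no fan-through pass captured.

    passes: list of sets of page numbers captured in each pass (assumed
    output of a hypothetical page-detection step).
    """
    seen = set().union(*passes)
    return sorted(set(range(first_page, last_page + 1)) - seen)


def gain_per_pass(passes):
    """How many previously unseen pages each successive pass added."""
    seen, gains = set(), []
    for captured in passes:
        gains.append(len(captured - seen))
        seen |= captured
    return gains


first_fan = {1, 2, 4, 7, 8}
second_fan = {2, 3, 4, 5, 8}
print(missed_after_fans([first_fan, second_fan], 1, 8))  # [6]
print(gain_per_pass([first_fan, second_fan]))  # [5, 2]
```

When the per-pass gain drops toward zero, another fan is unlikely to help, and the remaining pages are the ones to turn by hand.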

Museums could also do this with artifacts, though there, still photography is probably best: just a photographer with a nice bright flash rig shooting every artifact as fast as they can. Just in case it burns.

For each artifact, take pictures around it in a circle -- again, with bright light, such as off-camera flash. Or, as long as it doesn't risk breaking the artifact, put it on a small turntable and spin it, taking as many pictures as you can, say 36 in a circle. Software already exists to make a very good 3-D model from that. However, if it's easier to just take a 4K video as you walk around it or spin it, do that. What matters is whatever gets you there fastest, until you have the budget to do it all more slowly.
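If you take the video route with a turntable, pulling out the equivalent of 36 stills is simple arithmetic: 360° / 36 = 10° per view. The sketch below assumes the turntable spins at a constant speed, that frame 0 is the starting angle, and that you know how long one revolution takes. The function name and parameters are illustrative.

```python
def view_frame_indices(seconds_per_rev, fps, views=36):
    """Frame numbers giving `views` evenly spaced angles over one revolution.

    Assumes constant rotation speed and that frame 0 is the starting angle.
    """
    frames_per_rev = seconds_per_rev * fps
    return [round(i * frames_per_rev / views) for i in range(views)]


indices = view_frame_indices(seconds_per_rev=12, fps=30)
print(360 / 36, indices[:4], indices[-1])  # 10.0 [0, 10, 20, 30] 350
```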

Comments

As you're probably aware, the obstacles here are not technological, they're bureaucratic. Having worked with many of the top museums in the UK, the issue is not the money or the technology, it's the organisational reluctance to do anything new or different. I sympathise with this to some extent as their responsibility is to make sure the objects aren't damaged through improper handling or imaging, and obviously somewhere like the British Museum has much better fire safety protocols - and yet it is still very sad that they're so reluctant to try this, given that imaging (let alone 3D/4K imaging) is so useful for so many purposes, even more so than the usual labour-intensive metadata generation that museums tend to prioritise.

There are usually some smart and tech-savvy people at top museums but the managers tend to have little or no technological expertise, and even worse, they don't know who to approach. Often they'll have some 'digital' people on the board of trustees, typically C-team middle-management from big names like Microsoft and IBM; not anyone with hands-on experience of getting things done. So they end up making terrible deals with the likes of Google, signing away their IP. It's a real shame and I have tried to change minds but it's like banging your head against a brick wall. Nothing will change until the trustees change.

This "just get something & count on software later" strategy is a great idea, but involves a leap-of-faith that may not be typical of the staid management of these institutions. The concept probably needs a snappy name – memorable but a little self-deprecating – to help it gather mindshare among the relevant audience. "Shoestring Vidgitization"?

An interesting policy I first encountered at the UK National Archives is that whenever a document is requested, if it hasn't already been digitized, it's digitized just in time before being handed to the requesting researcher. With a large backlog, that seems a nice way to prioritize & protect against any careless/malicious researcher harming originals. Institutions that can't do high-quality digitization could possibly do this cheapo vidgitization whenever anything's requested – and ideally whenever new holdings arrive.
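The just-in-time policy combines naturally with working through a backlog. Here is a minimal sketch, assuming a single digitization station and a made-up `DigitizationQueue` class (nothing here describes how the UK National Archives actually does it). Requested items jump ahead of the backlog, and items already captured are skipped.

```python
import heapq
from itertools import count

REQUESTED, BACKLOG = 0, 1  # lower number = served first


class DigitizationQueue:
    """Hypothetical scheduler: researcher requests jump the backlog."""

    def __init__(self, backlog):
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within a priority
        self.done = set()
        for item in backlog:
            self._push(BACKLOG, item)

    def _push(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._order), item))

    def request(self, item):
        """A researcher asked for item: queue it ahead of the backlog."""
        if item not in self.done:
            self._push(REQUESTED, item)

    def next_item(self):
        """Next item for the station, marked done on the assumption it gets
        captured; returns None when nothing is left."""
        while self._heap:
            _, _, item = heapq.heappop(self._heap)
            if item not in self.done:  # skip stale duplicate entries
                self.done.add(item)
                return item
        return None


q = DigitizationQueue(["box 1", "box 2", "box 3"])
q.request("box 3")
print([q.next_item() for _ in range(4)])  # ['box 3', 'box 1', 'box 2', None]
```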

For large backlogs, I could imagine a head-mounted unit: a GoPro-like camera & bright light. Send trustworthy but otherwise low-skilled vidgitization gargoyles through the stacks. Maybe even two cameras, on both sides of the head, to better catch depth info or rapidly-fanned pages from multiple angles. Give bonuses based on whether random probes of known holdings/pages turn up usable frames (even before full eval/reconstruction software is available).
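The "random probes" check could be scored with a standard sample-proportion estimate. This sketch assumes you have a list of known holdings/pages. It draws a random sample to look for in the video by hand, then turns the pass/fail results into an estimated usable fraction with a Wilson score interval. The Wilson interval is a standard interval for proportions, chosen here for illustration. The function names and the example results are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pick_probes(known_pages, k, seed=None):
    """Randomly choose k known pages to look for in the captured video."""
    return random.Random(seed).sample(list(known_pages), k)


def usable_fraction(results, confidence=0.95):
    """Return (estimate, low, high) for the fraction of usable captures.

    results: list of booleans, True if a usable frame was found for a probe.
    The interval is a Wilson score interval.
    """
    n = len(results)
    p = sum(results) / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


probes = pick_probes(range(1, 5001), k=20, seed=1)
# Illustrative only: pretend a checker found usable frames for 18 of 20 probes.
results = [True] * 18 + [False] * 2
print(usable_fraction(results))
```

The interval matters for a bonus scheme. With only 20 probes, even a perfect score leaves a lower bound well below 100%, so larger samples are fairer to the person doing the capture.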

While the software to turn a video into a clean scan of a document has yet to be written, you can look at the videos with ordinary tools and see that they captured the pages, and in fact, while it is more work than having nice searchable documents, you can actually go into the videos by hand and pull out frames and manually clean them up with no extra technology. What doesn't exist yet is the software to turn a video into what an archivist would want for research.

The key is the bright light. With bright light, you can use a narrow aperture and a short shutter speed, so you avoid motion blur and everything is in focus. In fact, you should even bump up the ISO to be sure of this, since noise is easier to remove than blur. You could sit by a sunny window to do this, though shadows are an issue, so it's nicer if you can just set up several bright lights around the table to avoid shadows. Yes, two cameras are better for dealing with glare, but generally in a video the pages are moving, so glare (or shadow) does not obscure the same region in every frame.
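Shutter speed matters this much because the blur is roughly how far the page moves during the exposure, converted to pixels. The numbers below are illustrative assumptions, not measurements: a 4K frame 3840 pixels wide covering a 400 mm wide patch of table, and a page sliding at 0.5 m/s as it is laid down. Note that at 30 fps the frame interval is 1/30 s, but the exposure of each frame can be much shorter than that.

```python
def blur_pixels(speed_mm_per_s, exposure_s, frame_px, field_mm):
    """Approximate motion blur, in pixels, for an object moving across the frame."""
    px_per_mm = frame_px / field_mm
    return speed_mm_per_s * exposure_s * px_per_mm


def max_exposure(speed_mm_per_s, max_blur_px, frame_px, field_mm):
    """Longest exposure (seconds) that keeps blur at or under max_blur_px."""
    return max_blur_px / (speed_mm_per_s * frame_px / field_mm)


for shutter in (1 / 30, 1 / 250, 1 / 1000):
    print(f"1/{round(1 / shutter)} s: {blur_pixels(500, shutter, 3840, 400):.1f} px")
print(f"for <= 2 px of blur: 1/{round(1 / max_exposure(500, 2, 3840, 400))} s or faster")
# 1/30 s: 160.0 px
# 1/250 s: 19.2 px
# 1/1000 s: 4.8 px
# for <= 2 px of blur: 1/2400 s or faster
```

Exposures that short are only practical with a lot of light, or with a higher ISO and the noise that comes with it, which is the trade-off described above.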

Later, once software does turn the video into a document, it will also identify the pages you should go back and re-shoot if you want an archival version for OCR. And yes, every archive should work to produce good-quality digital versions of all their unique items. But they also should not wait for that. With bright enough light, many phones today can shoot excellent 4K video, and folios could be scanned at 100 pages per minute by slapping them down on the table.
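The 100 pages per minute figure also tells you how many frames each page gets and roughly how much storage to plan for off-site. This is a rough sketch. 30 fps is a common 4K frame rate, and the 100 Mbit/s bitrate is an assumption, so check what your own camera or phone actually records.

```python
def capture_budget(pages_per_minute, fps, bitrate_mbit_s):
    """Seconds, frames and megabytes of video per page at a given pace."""
    seconds_per_page = 60 / pages_per_minute
    frames_per_page = seconds_per_page * fps
    mb_per_page = bitrate_mbit_s * seconds_per_page / 8  # megabits -> megabytes
    return seconds_per_page, frames_per_page, mb_per_page


secs, frames, mb = capture_budget(pages_per_minute=100, fps=30, bitrate_mbit_s=100)
print(f"{secs:.1f} s, {frames:.0f} frames, {mb:.1f} MB per page")
# 0.6 s, 18 frames, 7.5 MB per page
print(f"10,000 pages: about {mb * 10_000 / 1000:.0f} GB")  # about 75 GB
```

Eighteen frames per page is plenty of chances for one of them to be sharp, as long as the page actually stops moving for a moment.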

I know the point of this is to save money, but I keep thinking about a robot that turns pages with the camera running.

People have built such robots. Aside from cost, there is the issue of convincing the archive to trust unique artifacts to such a machine. It's one thing to trust a book, even a rare book, because there are other copies out there.

This plan is for unique documents, which are not usually bound into books. These are folios of loose paper or perhaps bound notebooks. They don't fit well with such machines. Indeed, if a printed bound book is unique in the world, they don't usually trust it to a robot either.

I don't think the price or the speed of the scanner is the limiting factor for museums and archives. Rather, it is the physical objects themselves and how they are kept. I once researched a court file from 1870 in the Maryland state archives. It came as a block about the size of a brick, of papers all trifolded, some stapled or otherwise physically attached to one another, with a cloth ribbon tied around the block. I very much doubt that anyone had looked at it since it was shipped to the archives from the court clerk's office. I spent the better part of a day carefully taking everything apart, photographing each page, and reassembling it. Better equipment would certainly have sped things up. I was using a small hand-held camera, with pens and coins to keep each page reasonably flat. So let's be optimistic and say I could have finished the job in a third of the time, given better equipment. That is still an infinitesimal fraction of the total collection in the archives.

I am acquainted with the head librarian of the research library at the Baseball Hall of Fame. He tells me he gets asked a lot about scanning the collection. They would love to do it, but the funding simply isn't there.

I suspect you could do it if you were willing to trust volunteers. So you would have to train and vet the volunteers, and only give them things to scan which they are both unlikely to damage and which can survive some damage, because mistakes will be made. Mistakes will even be made by skilled archivists, who would be used on the most valuable and difficult items. The choice is whether it is better to risk a small amount of damage or to risk total loss in the event of fire/flood etc. Get the risk low enough and it could be done.

Perhaps, but I suspect that getting and keeping volunteers is a pretty hard sell. Sure, it sounds like something a bookish person with some spare time would jump at, but the reality would be endless hours of fiddling with paper bits, the vast majority of which are utterly uninteresting, and which you don't have time to sit down and read anyway. Also, in the case of the Baseball Hall of Fame, it is located out in the middle of nowhere: a very pretty and quaint middle of nowhere, but not an area with a large surplus of bookish people with spare time, eager to volunteer their services fiddling with paper bits.

Yes, there may not be many. But any scanning is better than none, if there is risk of fire/flood.
