Processing archival photos

I took about 100 GB of photos on my two most recent archival trips (not even counting the ones from pre-dissertation research). If you read my first post on Research Tools and Technology (part I), you may remember that I use a DSLR with a 35mm f/2.4 AL lens that is well-suited to photographing text in low light. I also sometimes use a wide-angle lens, depending on the shape of the document. The benefit of doing it this way is that, equipped with an extra SD card or two, I can take as many photos as I want, unload them onto my computer in batches, and avoid exorbitant copying fees. I do still have especially important things copied, though. With most of my data in digital image files, however, there comes the daunting task of organizing everything so that it is usable. The archives' own organization systems help with this, to an extent. In my case, though, I visited two unprocessed personal collections and so needed to create my own structure.

Before even leaving for the archive, I had to decide how to process what I would find. My DEVONthink database is great once everything is loaded into it, but useless if I have not yet OCRed the files or put them into some kind of order. I read Miriam Posner’s excellent blog post, “Batch-processing photos from your archive trip,” and came up with a process that so far is working well for me.

Miriam’s process is to sort image files by archive, attach metadata, and then put everything into a PDF. She does this using Hazel and Automator. You can read more about those in her blog post. Since my sources, even within a single archive, were so varied, I added a few steps.


First, I create a folder for “incoming photos,” which you can see here from my trip to the National Museum of American History’s Archives Center:


Within the folder, I dump all of the pictures as they come off my camera (for this, I got a little SD card reader stick on Amazon for like $5 so that I could do it in the archive while taking photos on a new card). It looks like this:


Then, later when I get a chance, I sort out the files by collection, like this:


Beyond that, I file images according to how I found them in the collection. The spatial hierarchy of this system appeals to me, but it may not work for you. I also want to be able to tell people exactly where I found stuff, even though citing the box and folder numbers is not customary. Here is an example from an archive at North Carolina State University:



Within those box folders, I do file folders:



Within those folders, there are usually not that many images, so it is easy to open them in Preview to look at them. However, some of my sources are long texts, and it is tedious to open that many files, even with the navigation pane in Preview. Miriam turns hers into PDFs. Turning the files from a whole collection into PDFs would probably crash my computer, but it works for these smaller batches. For example, I found the senior honors thesis of one of the thinkers I am working on, and compiled the photos into a big PDF that looks like this:



After that, you can OCR the text. Like Miriam, I use Adobe Acrobat Professional. She uses Automator, which is already on your Mac if you have one, to turn image files into PDFs using a script.

This is where I eliminated a step: Acrobat Pro will do this for you if you select all of the photos and open them with the application. It asks first, and it can take a few minutes, but it does a good job. It also makes it super easy to alter the PDF in whatever way you need to (like rotating the pages so they all go the same way).

The disadvantage of doing things my way is that you do have to look at your photos and decide where they go. This is made easier by my way of taking photos: I photographed every box and folder, even sometimes when I didn’t use anything within them. Viewed as icons in the Finder, those box and folder shots make it easy to tell when one folder ends and another begins. That eliminated the need to look at every single image.
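If you would rather not drag files around by hand, the box-and-folder shots can also drive a small script. What follows is only a minimal Python sketch of that idea, not a description of the process above. It assumes that your camera numbers files sequentially (so sorting by filename matches shooting order) and that you have already noted which photos are the box or folder shots. The function names and the `markers` dictionary are hypothetical, made up for this illustration.

```python
from pathlib import Path
import shutil


def group_by_markers(photo_names, markers):
    """Assign each photo to the folder named by the most recent marker shot.

    photo_names: filenames in shooting order.
    markers: dict mapping a marker photo's filename (the picture of the
             box or folder itself) to a folder label such as
             "Box 01/Folder 03". Both are supplied by you, by eye.
    Photos taken before the first marker go under "unsorted".
    """
    groups = {}
    current = "unsorted"
    for name in photo_names:
        if name in markers:
            current = markers[name]
        # The marker photo itself is filed with the folder it introduces.
        groups.setdefault(current, []).append(name)
    return groups


def sort_incoming(incoming, destination, markers):
    """Move photos from `incoming` into labeled subfolders of `destination`."""
    incoming, destination = Path(incoming), Path(destination)
    # Assumption: filename order equals shooting order. Hidden files
    # such as .DS_Store are skipped.
    names = sorted(
        p.name for p in incoming.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    for label, members in group_by_markers(names, markers).items():
        target = destination / label
        target.mkdir(parents=True, exist_ok=True)
        for name in members:
            shutil.move(str(incoming / name), str(target / name))
```

You would still identify the marker photos yourself in the Finder; the script only saves the dragging. Try it on a copy of your incoming folder first, since it moves files rather than copying them.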

The advantage of doing it this way is that going through your materials a few weeks or a month after you collected them will help you remember what is there. Also, since I took notes in DEVONthink in the order in which the files came and always wrote down the citation, it is easier to match up notes to images this way. It will be even faster when I put everything in DT, because the side bar will show the image when I look at the notes file, since both cite the same source and thus have very specific and similar text.

One last thing from Miriam’s blog that I will probably do is to dump the PDFs into Zotero. It never hurts to have a backup. It may also be helpful to have the finding aid organizational structure (mimicked in my file structure) in my citation software.

To briefly summarize:

1. Dump images into a semi-structured folder system. You can even make these folders ahead of time using the finding aid and then quickly sort into them.

2. View and sort files. The more sub-categorizing the better, if you have time.

3. Turn into PDFs and OCR. Put into Zotero.

4. Back everything up! (which I am doing as we speak, as you may be able to tell from the Time Machine icon in all of the above Finder windows).
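
For step 1, making the folders ahead of time from the finding aid could be scripted as well. This is a rough Python sketch under assumptions of my own: the finding aid has been typed up by hand as a nested dictionary of collection, box, and folder names, and `make_skeleton` is a hypothetical helper name. The collection and folder names in the usage comment are placeholders, not a real collection.

```python
from pathlib import Path


def make_skeleton(root, finding_aid):
    """Create root/collection/box/folder directories.

    finding_aid: dict of the form
        {collection name: {box name: [folder names]}}
    Returns the list of folder paths, in the order given.
    Directories that already exist are left untouched.
    """
    created = []
    for collection, boxes in finding_aid.items():
        for box, folders in boxes.items():
            for folder in folders:
                path = Path(root) / collection / box / folder
                path.mkdir(parents=True, exist_ok=True)
                created.append(path)
    return created


# Example usage (placeholder names):
# make_skeleton("Archive Photos", {
#     "Example Collection": {
#         "Box 01": ["Folder 01", "Folder 02"],
#         "Box 02": ["Folder 01"],
#     },
# })
```

Because existing directories are skipped rather than overwritten, it should be safe to run again after adding boxes to the dictionary.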


Hope this is helpful to someone and minimizes the amount of work you have to do.
