Migrating from iPhoto to digikam

Posted on: Wed, 28 Nov 2018 19:22 By: patrick

This is a longish story about how I migrated my photo library from Apple's discontinued iPhoto application to the open source program digikam (version 5.9.0 beta from March 2018). If you follow a similar migration path and make the same decisions as I did on what data to keep and what data to discard, then I hope you will find a reusable solution in this story. These are the pieces:

  • An AppleScript for the initial iPhoto export
  • A shell script for post-processing and preparing the exported data for the digikam import
  • Instructions on how to configure digikam to make the import work
  • Optional cleanup work and sync'ing back of metadata to image files

A warning before you dig in: Although I have tried to explain everything in as much detail as I could, you should prepare yourselves for surprises and an arduous migration - it certainly was for me.

Background story

I've been a long-time user of Apple's iPhoto. I still think iPhoto is a great photo organizer, but alas Apple discontinued the program a few years back (as Wikipedia tells me, the announcement was in 2014), thus effectively turning iPhoto into a data prison for my images. When iPhoto's designated successor, the Photos app, was released it was reportedly buggy and didn't have much to offer feature-wise, so I didn't make the switch immediately. Instead I started to think about a migration to an open and platform-independent photo organizer in order to escape the data prison I had built for myself with Apple's help. Having heard of digikam I investigated it, but at the time no ready-to-use version was available for the Mac, although it was basically possible to compile it yourself via MacPorts.

The migration project stalled and I continued to use iPhoto because, after all, it still worked perfectly fine for all my purposes. As the years went by I occasionally had a peek at digikam's status and was happy to see that the software was alive and kicking, or at least not slowly dying off as some open source software projects do. In late 2018 I received a jolt when suddenly I found myself unable to import images into iPhoto from my wife's iPhone. As it quickly turned out, the reason was the new .heic image format introduced by Apple with iOS 11. I now knew that iPhoto's time had finally come.

I was very pleased when I saw that the digikam website now had a ready-to-use macOS installer package available for download. A few clicks later I had a working digikam on my system, and this got me so worked up that within two weeks my migration plan was ready.

Other migration solutions

An initial bit of research to oppose the NIH syndrome did not turn up any iPhoto-to-digikam migration solutions that I could use for my own purposes. The first solution I found is a pair of Python scripts titled photokam, but I didn't look at this too closely because it seemed very old (2009) and targets an ancient digikam version (0.9.3). The second solution I found is slightly newer (2012) but still targets an outdated digikam version (2.1.1). Moreover, it is based on "photokam", so I discounted that solution, too.

What to keep and what to discard

With this initial research out of the way, I was free to develop my own solution. First I needed a way to liberate the data that was stuck inside iPhoto. I made a list of what I wanted to preserve during the migration:

  • The original image as well as any modifications I had made in iPhoto. For those that do not know iPhoto: Whenever you make a change to an image, iPhoto keeps the latest version of the photo, plus it preserves the original image file in case you want to revert. Intermediate versions are not supported.
    • Heads-up: Later in the migration process I decided that if a modified image version exists I want to keep only that modified version. You can find the reason for this decision later in the story.
  • The photo title, if one is present
  • Any comments I might have written
  • The rating (1-5 stars)
  • Any keywords that I had tagged my photos with
  • The event structure, which is my primary method to organize my photos. For those that do not know iPhoto: iPhoto completely decouples how image files are stored in the file system from how you organize your photos. You can have as many albums as you want, either in a flat list or organized hierarchically as you see fit. A photo can be present in any number of albums. In addition iPhoto organizes photos into a flat list of so-called "events". A photo can be present in only one event, but you can decide in which one. I had never used the albums feature much, so it didn't hurt me to abandon the few albums I had created over time. But events were crucial for me. Obviously I had events for vacations and other time-based occasions, but I had also created a few "artificial" events that acted as theme-based containers for single images that had accumulated over a long stretch of time (e.g. "Family", "Friends", "Pictures of Switzerland", etc.)

Migration step 1: Exporting the data from iPhoto

I then started to write an AppleScript program that would export from iPhoto the things from my list above. Why AppleScript? Well, it's quite simple: Why dig around in iPhoto's internal data structure (AlbumData.xml, SQLite databases, file system organization) when I can instead use an official and documented API? A solution that I had also considered for some time was to try and write an iPhoto plugin, but this would have been much harder to achieve than a simple scripting solution, so I immediately gave up this plan as soon as I became aware of iPhoto's AppleScript API.

As it turned out, the AppleScript way wasn't without its hardships. First of all I had to teach myself how to program in AppleScript - an interesting experience because of AppleScript's "natural language" philosophy, but in the end I spent rather more time than I would have liked on simple tasks such as copying a file. Besides struggling with the programming language and library, I also encountered a number of additional problems:

  • I found that iPhoto doesn't expose events in its AppleScript API, so I had to work around this by manually creating a smart album in iPhoto for every event in the iPhoto library. I configured each smart album so that it would only contain photos from its corresponding event. Via this kludge I was then able to write the AppleScript program so that it iterates over the smart albums instead of trying to get hold of events.
  • The longer the AppleScript runs the more resources it gobbles up, until it either freezes or dies. In the end I had to resort to dividing my smart albums into chunks that comprised approximately 1500-2000 photos. On my MacBook Pro with 16 GB RAM this seemed to be a safe limit that the AppleScript could handle. EDIT: I'm not so sure any more that this was actually a general problem with the script. It might also have been an effect of my MacBook going to sleep at the wrong moment while the script was running unsupervised.
  • iPhoto's internal database appears to contain garbage file references for many (but not all!) images in older events before 2011, and the AppleScript API accesses this garbage data. For an affected image, the AppleScript API returns two different file references for the original and the modified image file (thus implying that there is both an original and a modified version of the image), but only the reference to the modified image file is valid and points to an existing file - the reference to the original image file is invalid and points to a file that does not exist! I looked at a random sample of the problematic images in the iPhoto UI: There it appears that the images do not have a modified version at all! Selecting File > Show in Finder > Original correctly locates the image file, and it is the same file that the AppleScript API claims to be the modified image version. To make a long story short, I had to rewrite the AppleScript so that it was capable of dealing with invalid file references: Instead of copying the image file it now creates a dummy file with the extension .missing, which indicates to the post-processing step (discussed later in this story) what happened.

You can find the final AppleScript in my tools Git repository. The script is fairly well documented and if you follow a similar migration path as I did you might even find it useful. Please make sure, though, to read the blurb and caveats at the top of the script.

Export result

So at this point I had all the data I wanted in a folder structure that looked like this:

# /path/to/export-folder
#  +-- album-name1
#  |    +-- originals
#  |    |   +-- photo1.jpg
#  |    |   +-- photo1.jpg.metadata
#  |    |   +-- photo2.jpg
#  |    |   +-- photo2_1.png
#  |    |   +-- photo2_2.png
#  |    |   +-- photo3.png
#  |    |   +-- [...]
#  |    +-- modified
#  |    |   +-- photo2_1.png
#  |    |   +-- photo3.png
#  |    |   +-- [...]
#  +-- album-name2
#  |    +-- [...]
#  +-- [...]
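
Before moving on it can be worth checking that an export with this layout is complete. The following is only a minimal sketch, not part of the author's scripts: the function name summarize_export is made up, and it assumes that the sub-folders are called originals and modified, that metadata files end in .metadata, and that the .missing dummy files live somewhere below the album folder.

```shell
# summarize_export DIR: print one line per album with simple file counts.
# Hypothetical helper; assumes the folder layout shown above.
summarize_export() {
    export_dir=$1
    for album in "$export_dir"/*/; do
        [ -d "$album" ] || continue
        name=$(basename "$album")
        # Image files in "originals": everything except metadata and dummy files.
        originals=$(find "$album/originals" -type f \
            ! -name '*.metadata' ! -name '*.missing' 2>/dev/null | wc -l | tr -d ' ')
        # Image files in "modified".
        modified=$(find "$album/modified" -type f \
            ! -name '*.missing' 2>/dev/null | wc -l | tr -d ' ')
        # Dummy files that mark an invalid file reference.
        missing=$(find "$album" -type f -name '*.missing' 2>/dev/null | wc -l | tr -d ' ')
        printf '%s: %s originals, %s modified, %s missing\n' \
            "$name" "$originals" "$modified" "$missing"
    done
}
```

Calling summarize_export /path/to/export-folder prints one summary line per album, which makes it easy to compare the counts with what iPhoto shows for the corresponding event.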

digikam research

I then started experimenting with digikam to find out how to import the liberated ex-iPhoto data into digikam's database. These were my findings:

  • The import is done via Import > Add Folders...
  • The simplest way to get the iPhoto metadata (title, comments, rating, keywords) into digikam is via a sidecar .xmp file. For this to work you have to configure digikam to read sidecar files under Preferences > Metadata > Sidecars > Read from sidecar files.
  • digikam does not link image versions when it imports image files, even if those image files are named according to digikam's versioning file name scheme (e.g. suffix _v1 for version 1). I therefore abandoned my plan to preserve the original image during this migration.
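
To give an idea of what such a sidecar can contain, here is a minimal sketch that writes title, comment, rating and keywords into an .xmp file. It is not the author's script. The function names are made up, the sidecar name (image file name plus .xmp) is an assumption about what digikam expects, and the fields used are the standard Dublin Core and XMP properties (dc:title, dc:description, dc:subject, xmp:Rating). Which properties the real script writes is not stated in this story.

```shell
# xml_escape TEXT: escape the characters that are special in XML text.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# write_sidecar IMAGE TITLE COMMENT RATING [KEYWORD...]
# Writes IMAGE.xmp next to the image (hypothetical helper, naming is an assumption).
write_sidecar() {
    image=$1
    title=$(xml_escape "$2")
    comment=$(xml_escape "$3")
    rating=$4
    shift 4

    # One <rdf:li> element per keyword.
    keywords=
    for kw in "$@"; do
        keywords="$keywords<rdf:li>$(xml_escape "$kw")</rdf:li>"
    done

    cat > "$image.xmp" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">$title</rdf:li></rdf:Alt></dc:title>
   <dc:description><rdf:Alt><rdf:li xml:lang="x-default">$comment</rdf:li></rdf:Alt></dc:description>
   <xmp:Rating>$rating</xmp:Rating>
   <dc:subject><rdf:Bag>$keywords</rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
EOF
}
```

For example, write_sidecar photo1.jpg "Matterhorn" "Taken from Zermatt" 4 Family "Pictures of Switzerland" creates photo1.jpg.xmp in the current folder.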

Migration step 2: Post-processing and preparing the exported data for the digikam import

The digikam research showed me that I would have to perform post-processing on the exported data to prepare it for the final digikam import. Instead of post-processing I could have modified the AppleScript program from step 1 to directly generate the data in the necessary format, but I didn't want to spend more time fiddling with a programming language and library that I neither know nor like very much, and I also didn't want to re-run the tedious and time-consuming iPhoto export.

Because I am comfortable with shell script programming, I ended up creating a combined shell/AWK script that performs the following post-processing:

  • Convert the metadata in the .metadata file created by the AppleScript into an .xmp sidecar file.
  • Convert the iPhoto export folder structure, which contains both original and modified image versions, into a new straightforward folder structure that contains only the latest version of each image.
  • Add a file extension to those image files that have none. That there are such files is something that I discovered only after I examined the iPhoto export folder structure. An image file's extension may be missing but the image would still be usable by iPhoto because of the file's CREATOR/TYPE metadata that is present in the HFS filesystem. For digikam, however, a file extension is required, especially if the digikam photo library is managed on a system that does not run macOS.
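
The folder conversion from the list above can be sketched roughly as follows. This is a simplified illustration and not the script from the tools repository. The function name flatten_album is made up, and the sketch assumes that a modified image has the same file name as its original, as the export tree suggests.

```shell
# flatten_album ALBUM_DIR DEST_DIR
# Copy the latest version of every image of one exported album into DEST_DIR.
# "originals" is processed first and "modified" second, so a modified version
# overwrites its original, and an image whose original is missing still
# ends up in DEST_DIR.
flatten_album() {
    album=$1
    dest=$2
    mkdir -p "$dest"
    for dir in originals modified; do
        for file in "$album/$dir"/*; do
            [ -f "$file" ] || continue
            case $(basename "$file") in
                *.metadata|*.missing) continue ;;   # not image files
            esac
            cp -p "$file" "$dest/"
        done
    done
}
```

A call such as flatten_album /path/to/export-folder/album-name1 /path/to/digikam-folder/album-name1 leaves the export folder untouched and only writes to the destination.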

You can find the shell script in my tools Git repository. Obviously the script can only operate on a folder structure that was generated by the AppleScript program in step 1.
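
For readers who only need the missing-extension fix, one possible approach is to let the file utility look at the content of each extension-less file. This is a sketch under assumptions and not necessarily how the script in the repository does it: the function names are made up, and the list of MIME types is a guess at what an iPhoto library typically contains.

```shell
# ext_for_mime MIME: map a MIME type to a file extension (empty if unknown).
ext_for_mime() {
    case $1 in
        image/jpeg) echo jpg ;;
        image/png)  echo png ;;
        image/gif)  echo gif ;;
        image/tiff) echo tif ;;
        *)          echo "" ;;
    esac
}

# add_missing_extensions DIR: rename files in DIR that have no extension.
add_missing_extensions() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        case $(basename "$f") in
            *.*) continue ;;   # already has an extension
        esac
        # "file -b --mime-type" prints only the detected MIME type.
        ext=$(ext_for_mime "$(file -b --mime-type "$f")")
        if [ -z "$ext" ]; then
            echo "unknown type, left alone: $f" >&2
        elif [ -e "$f.$ext" ]; then
            echo "target exists, left alone: $f" >&2
        else
            mv "$f" "$f.$ext"
        fi
    done
}
```

Files of an unknown type are reported on stderr and left untouched so that they can be inspected by hand.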

Migration step 3: Importing the post-processed data into digikam

As mentioned above, before the import can begin digikam must be configured to read .xmp sidecar files under Preferences > Metadata > Sidecars > Read from sidecar files.

The actual migration is then started via Import > Add Folders.... Select the top-level folder created by the shell script in step 2 and wait until the import is done.

Cleanup overview

Here is a list of cleanup work that I performed after the migration was done:

  • Delete .xmp sidecar files
  • Delete useless technical JPEG comments
  • Fix encoding issues in titles and comments

Some of the steps require direct access to the digikam SQLite database file digikam4.db. I am using the application "DB Browser for SQLite" for that purpose. It can be downloaded from sqlitebrowser.org and is available for all platforms. Important: Do not access the SQLite database file while digikam is still running!

Cleanup step 1: Delete .xmp sidecar files

During the import digikam not only imported the image files, it also imported the .xmp sidecar files. These are no longer necessary. To get rid of the clutter they can be safely removed with this shell command:

find /path/to/digikam-folder -name \*.xmp -print0 | xargs -0 rm
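
If you prefer to see what is going to be deleted first, a slightly more cautious variant could look like the sketch below. The function names are made up. The -type f test restricts the match to regular files, and -exec ... {} + does not run rm at all when nothing matches.

```shell
# count_sidecars DIR: print the number of .xmp files below DIR.
count_sidecars() {
    find "$1" -type f -name '*.xmp' | wc -l | tr -d ' '
}

# remove_sidecars DIR: delete all .xmp files below DIR.
remove_sidecars() {
    find "$1" -type f -name '*.xmp' -exec rm -- {} +
}
```

Run count_sidecars /path/to/digikam-folder before and after remove_sidecars /path/to/digikam-folder to confirm that the number drops to zero.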

Cleanup step 2: Delete useless technical JPEG comments

I found that a large number of image files contain useless technical JPEG comments which needlessly clutter the digikam database. The following SQL query reveals those comments which occur most often:

select
    comment,
    length(comment),
    count(comment) as numberOfItems
from
    ImageComments
group by
    comment
order by
    numberOfItems desc

As an example, in my case the top offenders were:

  • AppleMark (3594 images)
  • KONICA MINOLTA DIGITAL CAMERA (2281 images)
  • Empty spaces (1552 images)
  • LEAD Technologies Inc. V1.01 (415 images)
  • OLYMPUS DIGITAL CAMERA (57 images)
  • Created with GIMP on a Mac (34 images)

This SQL query deletes all occurrences of one comment:

delete from
    ImageComments
where
    comment = '<the comment to delete>'
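
With several comments to delete, the same query can also be run from the shell. The following is a sketch that assumes the sqlite3 command line tool is installed. The function name delete_comments is made up, while the table and column names are taken from the query above. It copies the database file before touching it, and as with DB Browser, digikam must not be running.

```shell
# delete_comments DB COMMENT...
# Delete every ImageComments row whose text equals one of the given comments.
# A copy of the database is written to DB.bak first.
delete_comments() {
    db=$1
    shift
    cp -p "$db" "$db.bak"
    for c in "$@"; do
        # Double any single quote so the text is a valid SQL string literal.
        quoted=$(printf '%s' "$c" | sed "s/'/''/g")
        sqlite3 "$db" "delete from ImageComments where comment = '$quoted';"
    done
}
```

For example: delete_comments /path/to/digikam4.db "AppleMark" "OLYMPUS DIGITAL CAMERA".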

Cleanup step 3: Fix encoding issues in titles and comments

The import from iPhoto did not handle special non-ASCII characters well, such as German umlaut characters (äöü) - apparently iPhoto uses a different encoding than digikam for such characters. There may be more elegant and faster solutions, but since the number of problematic titles and comments was quite manageable in my case I decided to resort to a manual approach:

  • Create a list of titles and comments that contain non-ASCII characters
  • For each title and comment use the digikam UI to manually search for images with that title and/or comment
  • Use the digikam UI to manually edit and fix the image title and/or comment

The following SQL query lists all titles and comments in the digikam database:

select
    id, comment
from
    ImageComments
group by
    comment

I copy&pasted the output of the SQL query to a text file and then ran an AWK one-liner script over the text file to find any titles and comments which contain anything other than ASCII characters. The script's input is expected to be a text file with two fields separated by a tab character, each field beginning and ending with a double quote character. Example:

"12345"<tab>"comment"

And here's the AWK script. Note that the list of ASCII characters in the script is incomplete, which means that the script will result in a few false positives, i.e. it will find some titles and comments although they contain only ASCII characters. A notable example is the single quote character (').

awk 'BEGIN { FS="\t" } { comment=$2; gsub(/^"/, "", comment); gsub(/"$/, "", comment); gsub(/[a-zA-Z0-9 ,():\.\"\-\+\!\?\/]/, "", comment); if (length(comment) > 0){ print $0}}' /path/to/digikam-titles-and-comments.txt

In my case this resulted in something over 300 entries. Note that a given entry can be present in more than one image.
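
The false positives can probably be avoided by inverting the test: instead of listing the allowed characters, match any byte outside the printable ASCII range. The sketch below assumes the same tab-separated input file. The function name find_non_ascii is made up, and LC_ALL=C is set so that awk compares single bytes rather than multi-byte characters.

```shell
# find_non_ascii FILE: print the lines whose second tab-separated field
# contains at least one byte outside printable ASCII (space through tilde).
find_non_ascii() {
    LC_ALL=C awk -F '\t' '$2 ~ /[^ -~]/' "$1"
}
```

An entry such as "it's fine" is no longer reported, whereas an entry with an umlaut still is.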

To find the images for an entry, copy&paste the entry into the search box in the digikam UI (Browse > Search). To fix the entry for an image, first select the image, then call up the "Captions" pane by selecting the tab labelled Captions on the right-hand side of the main window, then write the correct character into the title or caption field, and finally hit the "Apply" button. Note that you can select multiple images to perform the change on all of them - this is very useful when several images contain the same title and/or comment.

Synchronize digikam database and image metadata

All the metadata that digikam imported from .xmp sidecar files, possibly with edits to fix encoding issues, currently resides in digikam's SQLite database only. I decided to complete the migration by writing back all the metadata into the image files themselves. This process also causes the useless technical JPEG comments that I deleted from the digikam database to be deleted from the image files as well.

First digikam has to be configured to write metadata back to image files. This is done in digikam's preferences dialog under Metadata > Behavior. Select the checkboxes for "Image tags", "Captions and title" and "Rating". In the same place I also like to disable the setting "Update file timestamps when files are modified".

To write back the image metadata, select the menu entry Tools > Maintenance. This pops up a dialog which at the bottom contains the maintenance tool named "Sync Metadata and Database". When you select this tool you can decide in which direction you want to sync: Obviously you want to select "Database > Image metadata".
