Still fighting with audiobooks

I’m fighting with audiobooks again.

I’ve mostly got my process down, but there continue to be new levels of dumb to deal with. As it stands, the process goes something like:

  1. Buy an audiobook. Generally, this is on CD because I trust Audible about as much as I trust Ubisoft, and for the exact same reason.
  2. Rip the discs to high-quality MP3. On OS X I use Max, and on Windows I built a custom version of CDex to support disc numbers.
  3. Queue up all of the individual tracks and play the first few seconds of each, looking for chapter breaks. For the breaks, title the track something smart like 4: The Leaky Cauldron.
  4. Use the audiobookify Perl script I wrote to bundle tracks into chapters and convert from MP3 to M4B format so that I get nice chapter markers, cover art, etc. (And because pretty much every device plays M4B at this point.)
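
The bundling half of step 4 boils down to something like the following. This is a rough Python sketch of the idea, not the actual audiobookify script (which is Perl). The group_tracks name and the "Untitled opening" label are made up for illustration, and it assumes chapter-opening tracks are titled "number: name" as in step 3, with every other track belonging to the chapter before it.

```python
import re

# Assumed convention from step 3: a chapter-opening track is titled "<number>: <name>".
CHAPTER_TITLE = re.compile(r"^(\d+):\s*(.+)$")


def group_tracks(titles):
    """Bundle an ordered list of track titles into (chapter_title, [track indexes])."""
    chapters = []
    for index, title in enumerate(titles):
        is_chapter_start = CHAPTER_TITLE.match(title) is not None
        if is_chapter_start or not chapters:
            # Tracks before the first tagged chapter get a placeholder (hypothetical label).
            name = title if is_chapter_start else "Untitled opening"
            chapters.append((name, [index]))
        else:
            # Untagged track: it continues the current chapter.
            chapters[-1][1].append(index)
    return chapters
```

Each resulting group would become one chapter marker in the M4B, with the group's tracks concatenated in order.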

It’s a real PITA most of the time, but manageable. Sometimes, if I’m lucky, the CDDB titles for the tracks will be something like Chapter 4c so I don’t have to manually scan and tag.

But sometimes it’s just profoundly dumb.

Some audiobook producers (Blackstone, for one) think it’s amusing to disregard the actual spoken breaks in the book and instead use some fixed amount, such as tracks every 3 minutes. Some producers find it necessary to use all 99 available tracks on a CD. Some include intro and outro music on each disc. Some include a minute or two of the end of the previous disc as the start of the next disc (and then don’t put the overlap into its own track). Some audiobooks are only available as giant disc-length MP3 files with no breaks at all.

Why am I so hung up on chapter breaks? I spend most of my time listening to audiobooks while running. Trying to manage iPod volume and sport accoutrements while running involves a whole lot of fumbling. Occasionally, I’ll accidentally hit the track forward or back buttons. This is compounded by the length of audiobooks, which are often packaged into 5½ hour chunks, owing to a longstanding iPod firmware issue that does hinky things with longer audiobooks. Without proper breaks, scanning through a 5½ hour file is a nontrivial task.

When I am confronted with such lunacy, the easiest solution I’ve found is:

  1. Join together all of the track files into one giant file.
  2. Use mp3splt to find likely chapter breaks based on silence. But since I can’t under-split, as it would offend my delicate sensibilities to have 18 of 20 chapters marked, I have to over-split. And mp3splt is a bit of a hammer, so you have to really over-split. I find that a 20:1 ratio is what generally works: for every 1 chapter in the audiobook, split the book into 20 parts.
  3. Pick up on step 3 above.
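
The over-splitting idea in step 2 can be sketched in a few lines of Python. To be clear, this is not mp3splt's code or its command line, just the logic I have in mind: given a list of detected silences as (start, end) pairs in seconds, keep the longest ones until there are 20 parts per chapter. The pick_split_points name and the input format are assumptions for the sake of the example.

```python
def pick_split_points(silences, chapter_count, ratio=20):
    """Return sorted cut times (seconds) taken from the longest silences.

    silences: list of (start_sec, end_sec) pairs, assumed to come from
    some silence detector. ratio is parts per chapter (20:1 by default).
    """
    wanted_cuts = chapter_count * ratio - 1  # N parts need N - 1 cuts
    longest = sorted(
        silences, key=lambda gap: gap[1] - gap[0], reverse=True
    )[:wanted_cuts]
    # Cut in the middle of each silence so neither side gets clipped.
    return sorted((start + end) / 2 for start, end in longest)
```

If the detector finds fewer silences than requested, you simply get fewer cuts, which is the cue to loosen the silence threshold and try again.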

I occasionally run into audiobooks that are just a bit too long to fit neatly into 5½ hour chunks. In such a case, SoX has functionality to let you change tempo without changing pitch, which works well on audiobooks. I’ve found that speeding up most audiobooks by up to 20% has an almost negligible impact on comprehension. This is also useful for narrators who are just a bit too slow for my liking.
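
For the curious, the arithmetic behind the tempo trick looks roughly like this. It is a sketch under my own assumptions: the function names are invented, the 20% ceiling is just my comfort limit from above, and the file names are placeholders. The only SoX feature it leans on is the tempo effect, which takes a speed factor.

```python
import math


def tempo_factor(total_hours, chunk_hours=5.5, max_speedup=1.20):
    """Return (chunks, factor) so the sped-up book fills whole chunks.

    If the required speedup exceeds max_speedup, give up on squeezing
    and use one more chunk at normal speed instead.
    """
    chunks = max(1, math.floor(total_hours / chunk_hours))
    factor = total_hours / (chunks * chunk_hours)
    if factor <= 1.0:
        return chunks, 1.0  # already fits, no speedup needed
    if factor > max_speedup:
        return chunks + 1, 1.0
    return chunks, factor


def sox_tempo_command(src, dst, factor):
    """Build (but do not run) a SoX command that applies the tempo effect."""
    return ["sox", src, dst, "tempo", f"{factor:.3f}"]
```

A 12-hour book, for example, needs a factor of about 1.09 to fit into two 5½ hour chunks. The example uses WAV file names because whether SoX can read and write MP3 depends on how it was built.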

I’ve been looking into speech recognition software to help me out. You would think that it would be trivial: obtain an electronic text of the book, give the audio and the text to some magical program that parses the speech and associates timecodes with the text, and then do a phoneme search for words like chapter, book, and part.

Unfortunately, it looks like that magical software doesn’t exist yet. The closest thing might be Google Audio Indexing, the technology that powers Google Voice. But as near as I can tell, it’s not for public use yet. Something might be built atop CMU Sphinx, but it would take a significant effort.

Oh, and on a side note: M4B files will upload to Google Music (with some limitations on file size), but are automagically converted to MP3 format on their servers, thus removing any chapter breaks. Amazon’s Cloud Player does not yet support M4B files.

Side note the second: a 5-hour M4B file of decent quality weighs in at around 100MB. Even really long audiobooks, Neal Stephenson’s Anathem for example, don’t generally run more than 30-35 hours. You could fit even that entire book as a series of 7 M4B files on a single CD. Up that to a 1GB flash drive and get ridiculous audio quality instead of just decent. And again, pretty much every device supports M4B files these days. Or, you know, you could go on selling a 2lb stack of 28 CDs.
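
If you want to check my math on that, here is the back-of-the-envelope version. It assumes decimal megabytes (1MB = 1,000,000 bytes) and a nominal 700MB CD, and the function names are just for illustration.

```python
import math

MB_PER_HOUR = 100 / 5  # from the figure above: a 5-hour M4B at around 100MB


def book_size_mb(hours):
    """Estimated size of a book at the 'decent quality' rate."""
    return hours * MB_PER_HOUR


def bitrate_kbps(size_mb, hours):
    """Average bitrate implied by squeezing `hours` of audio into `size_mb`."""
    return size_mb * 8 * 1000 / (hours * 3600)


def file_count(hours, hours_per_file=5):
    """Number of M4B files if each one holds about 5 hours."""
    return math.ceil(hours / hours_per_file)
```

At those numbers a 35-hour book comes to about 700MB in 7 files, which works out to roughly 44kbps. Spreading the same 35 hours over a 1GB flash drive allows about 63kbps.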


By Rick Osborne

I am a web geek who has been doing this sort of thing entirely too long. I rant, I muse, I whine. That is, I am not at all atypical for my breed.