Podcast processor configuration files

If you’re in a hurry, see the examples below :). They should be reasonably self-explanatory.

The configuration file comprises one or more stream specifications that tell the script how to chop up and combine the various podcast input sources. Timestamps are whitespace-delimited (space or newline), but whitespace is otherwise insignificant. Use # for comments.

Stream specifications can appear in any order (except for ^ frame inputs, see below), but mixing them up makes the file harder to understand. We recommend grouping all stream specifications of the same type together, in chronological order. Each stream specification starts with an input specification, followed by zero or more segment specifications.

Input specifications

These are of the form: [type:filename:num]

type is mandatory, and must be one of a/audio, v/video, or f/frame. Audio and video inputs are self-explanatory. Frame inputs are a special type of video input based on a single still image, which can come from a JPEG image (preferred), or be automatically extracted from a video source or PDF. For video and PDF sources, you also specify which frame or page to extract (see num below).

filename is mandatory for frame inputs. The special filename ^ indicates that the frame is to be extracted from the immediately preceding segment, so segment order within the configuration file is important for these. If you’ve specified default input files using --audio or --video on the command line, you can omit filename for audio and video inputs (as appropriate), and the script will use the default inputs. If there are no default input files specified, or you want to input from a file other than the default, then filename is required.

num is optional and represents either the ffmpeg stream number for audio and video inputs, or the frame/page number for frame inputs. It is zero-indexed (so page 1 of a PDF input is specified as 0). It defaults to 0 if omitted. For frame inputs, you can also specify -1 or last to indicate that it should use the last frame of the input file (negative values other than -1 are currently not supported).
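
The rules above can be sketched in Python as follows. This is only an illustration, not the script's actual parser: the names InputSpec and parse_input_spec are made up for the example, and it assumes that filenames contain no colons and that the short and long type names are interchangeable.

```python
import re
from typing import NamedTuple, Optional

# Short and long type names are treated as equivalent (an assumption).
TYPE_ALIASES = {
    "a": "audio", "audio": "audio",
    "v": "video", "video": "video",
    "f": "frame", "frame": "frame",
}

# [type], [type:filename] or [type:filename:num]; filenames must not contain ':'.
SPEC_PATTERN = re.compile(r"\[([^:\]]+)(?::([^:\]]*))?(?::([^:\]]*))?\]")


class InputSpec(NamedTuple):  # hypothetical container, not part of the script
    type: str
    filename: Optional[str]  # None means "use the command line default"
    num: int                 # stream number, or frame/page number (-1 = last)


def parse_input_spec(text: str) -> InputSpec:
    match = SPEC_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an input specification: {text!r}")
    raw_type, filename, raw_num = match.groups()

    kind = TYPE_ALIASES.get(raw_type)
    if kind is None:
        raise ValueError(f"unknown input type: {raw_type!r}")

    filename = filename or None
    if kind == "frame" and filename is None:
        raise ValueError("filename is mandatory for frame inputs")

    if not raw_num:
        num = 0  # default when num is omitted
    elif raw_num == "last":
        num = -1
    else:
        num = int(raw_num)  # raises ValueError if not an integer
    if num < -1 or (num == -1 and kind != "frame"):
        raise ValueError(f"unsupported num for {kind} input: {raw_num!r}")

    return InputSpec(kind, filename, num)
```

For example, parse_input_spec("[f:^:last]") gives a frame input with filename ^ and num -1, while parse_input_spec("[a]") gives an audio input with no filename and num 0.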

Segment specifications

These are normally just a sequence of timestamps, representing punch in and punch out times. Timestamps are in HH:MM:SS[.ccc] format. HH and MM are currently mandatory, but this will change soon to allow more flexibility. The punch out time for a segment must be later than the punch in time, but otherwise you can list timestamps in any order. However, we recommend listing them chronologically for the sake of sanity!
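
As a rough illustration of the timestamp format, here is a small Python sketch that converts a timestamp to whole milliseconds. The function name timestamp_to_ms is invented for this example, and it assumes two-digit HH, MM and SS fields and one to three fractional digits; the script itself may be stricter or more lenient.

```python
import re

# HH:MM:SS with an optional fractional part of 1 to 3 digits (an assumption).
TIMESTAMP_PATTERN = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?")


def timestamp_to_ms(timestamp: str) -> int:
    """Convert an HH:MM:SS[.ccc] timestamp to milliseconds."""
    match = TIMESTAMP_PATTERN.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"not an HH:MM:SS[.ccc] timestamp: {timestamp!r}")
    hours, minutes, seconds, fraction = match.groups()
    if int(minutes) > 59 or int(seconds) > 59:
        raise ValueError(f"minutes or seconds out of range: {timestamp!r}")
    # Pad the fraction on the right so that ".5" means 500 ms, not 5 ms.
    millis = int((fraction or "").ljust(3, "0"))
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + millis
```

Working in integer milliseconds avoids floating point rounding when you later subtract punch in from punch out times.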

If you provide no timestamps, then the script will generate one segment punching in at 0 and punching out at the end of the input. If you provide an even number of timestamps, each pair of timestamps (t₁, t₂) will generate a new segment punching in at t₁ and punching out at t₂. If you provide an odd number of timestamps, each pair of timestamps will generate segments as above, plus one final segment punching in at the last timestamp and punching out at the end of the input.
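
The pairing rules above can be sketched in Python like this. It is an illustration only: build_segments is a made-up name, times are assumed to be already converted to milliseconds, and None stands for "end of the input".

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, Optional[int]]  # (punch in, punch out); None = end of input


def build_segments(times_ms: List[int]) -> List[Segment]:
    """Turn a flat list of timestamps into (punch in, punch out) pairs."""
    if not times_ms:
        # No timestamps: one segment covering the whole input.
        return [(0, None)]

    segments: List[Segment] = []
    # Walk through the complete pairs (t1, t2), (t3, t4), ...
    for i in range(0, len(times_ms) - 1, 2):
        punch_in, punch_out = times_ms[i], times_ms[i + 1]
        if punch_out <= punch_in:
            raise ValueError("punch out must be later than punch in")
        segments.append((punch_in, punch_out))

    if len(times_ms) % 2 == 1:
        # Odd count: the last timestamp punches in and runs to the end.
        segments.append((times_ms[-1], None))
    return segments
```

So three timestamps such as 95000, 1500000 and 1800000 produce one closed segment plus one final segment that runs from 1800000 to the end of the input.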

Because a frame input is a still image, the punch in and punch out timestamps effectively determine the duration of the generated frame (i.e., punch out minus punch in). It’d be unusual to specify more than two timestamps for a single frame, but if you need to generate multiple versions of the same frame with different durations, this should work as expected.

You can also use the special sequence @filename to generate a punch out point based on the duration (not the content!) of the specified file. This should either be the only entry (implying punch in at 0, punch out after file duration), or preceded by exactly one timestamp (implying punch in at timestamp, punch out at timestamp + file duration). This is handy if you want to do something like insert a filler frame that matches the duration of an audio file.
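
To make the @filename arithmetic concrete, here is a minimal Python sketch. Both function names are invented for the example. It reads the duration of a WAV file using Python's standard wave module, which is an assumption made purely for illustration; the script itself may determine durations differently and will accept other file types.

```python
import wave
from typing import Optional, Tuple


def wav_duration_ms(path: str) -> int:
    """Duration of a WAV file in milliseconds (the content is never inspected)."""
    with wave.open(path, "rb") as wav:
        return round(wav.getnframes() * 1000 / wav.getframerate())


def at_file_segment(duration_ms: int,
                    punch_in_ms: Optional[int] = None) -> Tuple[int, int]:
    """Resolve '@filename' (punch in at 0) or 'timestamp @filename'."""
    start = 0 if punch_in_ms is None else punch_in_ms
    return (start, start + duration_ms)
```

With a 30 second filler file, @filler.wav alone resolves to (0, 30000), and 00:23:37.000 @filler.wav resolves to (1417000, 1447000). For a frame input both give a 30 second frame.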

The punch in/out times for corresponding segments in the audio and video streams don’t have to match (they normally only will when the audio and video inputs come from the same file), but the total duration of the audio stream should match that of the video stream if both are included.
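
If you want to check the totals by hand before running the script, a few lines of Python will do. This is a hypothetical helper, not part of the script; it assumes every segment has an explicit punch out time in milliseconds. The figures are the audio and video segments from the second example below.

```python
from typing import List, Tuple


def total_duration_ms(segments: List[Tuple[int, int]]) -> int:
    """Sum of (punch out - punch in) over all segments of one stream."""
    return sum(punch_out - punch_in for punch_in, punch_out in segments)


# [a] 00:01:35.000 00:25:00.000 00:30:00.000 00:54:27.000
audio = [(95_000, 1_500_000), (1_800_000, 3_267_000)]
# [v] 00:00:17.000 00:23:42.000 00:28:42.000 00:53:09.000
video = [(17_000, 1_422_000), (1_722_000, 3_189_000)]

# The individual punch in/out times differ, but both streams total 47:52.
assert total_duration_ms(audio) == total_duration_ms(video) == 2_872_000
```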

Examples

These illustrate some common use cases. Some have been tested, but not all!

# Read the entire video and audio input (defaults).
[a]
[v]

# Segment the video and audio, skipping irrelevant parts at the start and
# end, and five minutes in the middle.
[a] 00:01:35.000 00:25:00.000 00:30:00.000 00:54:27.000
[v] 00:00:17.000 00:23:42.000 00:28:42.000 00:53:09.000

# Split into two segments, separated by filler audio (filler.wav), and a
# filler frame generated by repeating the last frame of the first video
# segment to the same duration as the filler audio.
[a] 00:01:53.000 00:23:15.000
[a:filler.wav]
[a] 00:49:42.000 00:50:25.000

[v] 00:02:15.000 00:23:37.000
[f:^:last] @filler.wav
[v] 00:50:04.000 00:50:47.000

# Stitch a collection of individual JPEG slide images together with the
# recorded audio. Notice that the punch in and punch out times for the
# audio correspond to the punch in time of the first frame, and the
# punch out time of the last frame, respectively.
[a:audio.wav] 00:07:59.000 00:53:27.000
[f:slide-000.jpg] 00:07:59.000 00:08:46.000
[f:slide-001.jpg] 00:08:46.000 00:10:28.000
[f:slide-002.jpg] 00:10:28.000 00:12:19.000
[f:slide-003.jpg] 00:12:19.000 00:13:53.000
[f:slide-004.jpg] 00:13:53.000 00:14:26.000
[f:slide-005.jpg] 00:14:26.000 00:16:22.000
[f:slide-006.jpg] 00:16:22.000 00:20:16.000
[f:slide-007.jpg] 00:20:16.000 00:20:50.000
[f:slide-008.jpg] 00:20:50.000 00:22:32.000
[f:slide-009.jpg] 00:22:32.000 00:22:49.000
[f:slide-010.jpg] 00:22:49.000 00:25:59.000
[f:slide-011.jpg] 00:25:59.000 00:26:25.000
[f:slide-012.jpg] 00:26:25.000 00:53:00.000
[f:slide-013.jpg] 00:53:00.000 00:53:27.000

# Extract slide images from a PDF and merge with the recorded audio.
[a] 00:00:04.000 00:26:33.000
[f:slides.pdf:0] 00:00:04.000 00:03:39.000
[f:slides.pdf:1] 00:03:39.000 00:09:14.000
[f:slides.pdf:2] 00:09:14.000 00:13:58.000
[f:slides.pdf:3] 00:13:58.000 00:17:05.000
[f:slides.pdf:4] 00:17:05.000 00:17:13.000
[f:slides.pdf:5] 00:17:13.000 00:17:22.000
[f:slides.pdf:6] 00:17:22.000 00:17:28.000
[f:slides.pdf:7] 00:17:28.000 00:21:25.000
[f:slides.pdf:8] 00:21:25.000 00:24:04.000
[f:slides.pdf:9] 00:24:04.000 00:24:07.000
[f:slides.pdf:10] 00:24:07.000 00:26:17.000
[f:slides.pdf:11] 00:26:17.000 00:26:33.000