The flood of file formats in today’s audio industry can be a headache for those who haven’t kept up with the developments. If you’re looking for a one-stop reference for what the major formats are, what all the terms mean, and when you should use each, then you’ve come to the right place. Let’s get started by clarifying some terminology.
Codecs, Formats, and other Funtimes
When we’re talking about digital audio, a file is going to fall into one of three categories:
- Uncompressed Audio
- Compressed Audio (Lossless)
- Compressed Audio (Lossy)
These categories exist as a result of the perpetual toss-up between audio fidelity and file size. In an ideal world, we’d have infinite storage space and be able to use totally uncompressed audio in every scenario for maximum quality, but the reality is that it’s not practical to do so in most cases.
Files in any of the three categories will be saved in various specific formats, the majority of which we’ll briefly look at below, but before we do it’s important to make a distinction between a file format and a codec. A codec is an algorithm designed to encode and decode digital media — it creates the data itself. A file format, on the other hand, is more like a container in which the data is stored. There are many more formats than codecs, and certain formats are capable of storing more than one kind of media data.
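To make the container idea a little more concrete, here is a minimal Python sketch that guesses a file's container from its first few bytes. The function name and return labels are made up for illustration; the byte signatures themselves are the well-known ones for each format. Note that it only tells you which wrapper you have, not which codec produced the data inside.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first 12 bytes of a file.

    Hypothetical helper for illustration only: it identifies the wrapper,
    not the codec that produced the audio data inside it.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"  # could hold Vorbis, Speex, or other codecs
    if header[4:8] == b"ftyp":
        return "mp4"  # the MP4 family: .mp4, .m4a, .3gp, ...
    if header[:3] == b"ID3":
        return "mp3-id3v2"  # an MP3 that starts with an ID3v2 tag
    return "unknown"


# Usage: read the first 12 bytes of a real file and pass them in, e.g.
#   with open("cue_01.wav", "rb") as f:
#       print(sniff_container(f.read(12)))
```

An Ogg file is a good example of why the distinction matters: the same container signature can sit in front of music encoded with one codec or speech encoded with another.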
One other term that is often associated with digital audio (and other files, in fact) is metadata. Metadata can be considered all the “extra” information that is included with a file. In the audio world, it usually consists of album art, artist name, album, composer, etc. Many formats allow this information to be encoded directly into the file for ease of use.
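As a rough illustration of metadata living inside the file itself, the sketch below reads an ID3v1 tag, the old fixed-size block that some MP3 files carry in their last 128 bytes. The function name is hypothetical, and ID3v1 is only one (legacy) tagging scheme: it has no room for album art, and newer files typically use ID3v2 or a format-specific scheme instead.

```python
def parse_id3v1(data: bytes):
    """Return the basic ID3v1 fields as a dict, or None if no tag is found."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }


# Usage with a real file (the path is a placeholder):
#   with open("song.mp3", "rb") as f:
#       print(parse_id3v1(f.read()))
```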
Essentially, you’re only going to be worrying about the file format in most cases, since the codecs are fairly standardized and not generally toyed with too much for audio delivery purposes.
Uncompressed Audio
The first of the major digital audio categories is the easiest to explain. Uncompressed audio is exactly what it sounds like: raw, full fidelity data with no missing or altered information. Consequently, it takes up the most hard drive space: roughly 10MB per minute at CD quality (16-bit, 44.1kHz stereo), and more at higher bit depths and sample rates. Typically, uncompressed audio formats are used for large-scale theatrical and television broadcasts, and for storing or archiving audio.
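If you want to check the storage figure for yourself, the arithmetic is simple: bytes per second is the sample rate times the bytes per sample times the number of channels. Here is a small Python sketch; the function name and its defaults (CD-quality stereo) are just illustrative choices.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of raw PCM audio in bytes, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


one_minute_cd = pcm_size_bytes(60)                   # 10,584,000 bytes
one_minute_hd = pcm_size_bytes(60, 48_000, 24, 2)    # 17,280,000 bytes
print(f"{one_minute_cd / 1_000_000:.1f} MB per minute at 16-bit/44.1kHz")
print(f"{one_minute_hd / 1_000_000:.1f} MB per minute at 24-bit/48kHz")
```

A three-minute cue at 24-bit/48kHz therefore lands at a little over 50MB before any compression.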
Major formats in this category include:
- WAV: a common and flexible format that can store PCM audio (the major uncompressed codec) at several different bit depths and sample rates
- AIFF: the Mac-centric format designed to contain PCM data
- BWF: standing for Broadcast Wave Format, this is a more complex successor to WAV that allows for embedded metadata, which makes it a favourite among professionals for film delivery since timecode information can be placed right into the file
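For a hands-on look at what a WAV file holds, here is a short sketch using the wave module from Python's standard library. It writes one second of a sine tone as 16-bit mono PCM into an in-memory buffer, then reads the header back. The tone frequency, amplitude, and function name are arbitrary choices made for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second


def make_sine_wav(freq_hz=441.0, seconds=1.0) -> bytes:
    """Return the bytes of a 16-bit mono PCM WAV file holding a sine tone."""
    n = int(SAMPLE_RATE * seconds)
    pcm = b"".join(
        struct.pack(
            "<h",  # little-endian signed 16-bit sample
            int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)),
        )
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16-bit
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()


wav_bytes = make_sine_wav()
with wave.open(io.BytesIO(wav_bytes), "rb") as r:
    info = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())
print(info, len(wav_bytes), "bytes")
```

The file is essentially a small header followed by the raw samples, which is why uncompressed sizes are so easy to predict.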
Compressed Audio (Lossless)
This category has been the subject of many developments in recent years, surging from obscurity to mainstream very quickly. Audio that falls into this category is compressed; that is, its original data is made smaller by use of algorithms that can intelligently shrink the file size.
Lossless compression is a newer method of reducing file size. It requires more computer power to play, since the data is reconstructed in real time as you play the file, but the result is an exact, bit-for-bit (hence ‘lossless’) reproduction of the original uncompressed data.
The advantage is that you maintain audio fidelity and use up much less space, but the disadvantage is that it’s not efficient when you’re trying to minimize resource usage as in, for example, a game engine, which must play and combine files during gameplay.
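The defining property of lossless compression is that decoding gives back exactly the bytes you started with. The sketch below demonstrates that round trip in Python, with one big caveat: it uses zlib, a general-purpose compressor from the standard library, purely as a stand-in. Real lossless audio codecs use audio-specific techniques and behave very differently on real music than zlib does on this artificially regular test tone.

```python
import math
import struct
import zlib


def sine_pcm(freq_hz=441.0, seconds=1.0, rate=44_100) -> bytes:
    """Raw 16-bit mono PCM samples of a sine tone (test signal only)."""
    n = int(rate * seconds)
    return b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )


original = sine_pcm()
packed = zlib.compress(original, 9)   # stand-in for a lossless audio encoder
restored = zlib.decompress(packed)    # the "reconstruction" done at playback

print("identical after round trip:", restored == original)
print("compressed size as a fraction of original:", len(packed) / len(original))
```

The decompress step is the work the player has to do every time the file is played, which is the cost being weighed against the saved disk space.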
Here, then, are the major players in the current world of lossless compression formats:
- FLAC: the Free Lossless Audio Codec is a popular and open-source format that uses the codec of the same name to produce files that are typically 30-50% smaller than the original with no loss of audio fidelity. FLAC remains unsupported on many portable audio players, but its support for artwork and other metadata ensures it will make its way further into the mainstream presently
- APE: known as Monkey’s Audio codec, this system boasts marginally better compression than its competitors, at the expense of being less resource-efficient. It is not entirely open-source and has not seen wide adoption outside of Windows platforms and extremely fast portable players
- ALAC (M4A): the Apple Lossless Audio Codec is stored in the MP4 container format as .M4A files, but it is not derived from Apple’s lossy AAC format; rather, it is a similar algorithm to FLAC and APE. ALAC’s advantage is that it is quite easy to encode/decode, requiring minimal computer resources, which makes it ideal for portable players. It supports metadata tagging and is also the only one of the formats officially supported on the iPod
- WMA Lossless: this Microsoft format has drawn mixed responses, with some arguing that it cannot compete with the other formats, which may explain its relatively modest spread. It nevertheless is a solid format allowing for high quality variable bitrate compression of audio for archival purposes
- WV: WavPack is another open-source codec that provides extremely good compression, especially for music with a lot of dynamic range (like classical). It is not widely used, but one feature that sets it apart is a ‘hybrid’ encoding mode that produces both a lossy compressed file that can be independently used, and a ‘correction’ file that can be combined with the lossy one to restore the lossless source
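The ‘hybrid’ idea mentioned for WavPack can be pictured with a toy Python example: keep a coarse, smaller version of each sample as the lossy part, and store what was thrown away as the correction. This is only a conceptual sketch with made-up function names; it throws away low-order bits, which is not how WavPack actually encodes audio.

```python
def split_hybrid(samples, drop_bits=6):
    """Split samples into a coarse 'lossy' list and a 'correction' list."""
    step = 1 << drop_bits
    lossy = [(s // step) * step for s in samples]        # usable on its own
    correction = [s - l for s, l in zip(samples, lossy)]  # what was discarded
    return lossy, correction


def restore(lossy, correction):
    """Combine both parts to get the original samples back exactly."""
    return [l + c for l, c in zip(lossy, correction)]


samples = [0, 1000, -1000, 32767, -32768, 12345]  # 16-bit sample values
lossy, correction = split_hybrid(samples)
print("lossy part:     ", lossy)
print("correction part:", correction)
print("restored equals original:", restore(lossy, correction) == samples)
```

The lossy list alone is an approximation of the source, and adding the correction values restores every sample exactly.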
Compressed Audio (Lossy)
Lossy compression is the most frequently encountered and is the most relevant for media composers because we are typically called upon to deliver our materials in one of these formats.
Originally, file compression methods (before lossless came around) all worked by actually removing non-essential data in much the same way as image compression algorithms work. Now known as ‘lossy’ formats, these workhorses offer the smallest file sizes, but also result in quality degradation that becomes significant at higher compression ratios (that is, at lower bitrates).
Lossy files still have to be decoded when you play them back, but the decoders are mature and heavily optimized, and the much smaller files are cheaper to store, load, and stream. This has made them the mainstay of game audio and other mediums where both size and efficiency are significant considerations.
Let’s take a look at the major lossy compression formats:
- MP3: MPEG Audio Layer 3 is a proprietary format that has become the standard for digital music distribution owing to its incredibly small file sizes. It is based on perceptual coding, which discards or reduces the quality of elements that are considered beyond the range of typical human auditory perception. Several bitrates are available, with only the higher ones (that is, the least compression) being truly viable for music given the high quality of today’s playback systems
- AAC (MP4, M4A/B/P): designed to replace the aging MP3 format, Advanced Audio Coding is another popular format that achieves better audio fidelity at similar bitrates and file sizes. It is part of the MPEG-4 standard and has been widely adopted
- 3GP/3G2: this format is actually a subset of the MPEG-4 formats above, but it is specifically designed for mobile phone usage and therefore is primarily of interest only to those delivering to such platforms
- OGG: the open-source Ogg container, usually paired with the Vorbis codec, has taken the gaming world by storm, becoming a quick standard for encoding video game audio assets. It offers much better audio fidelity than MP3 but produces slightly larger files, and it stores metadata as plain-text tags known as Vorbis comments
- SPX: known as Speex, this is a project similar to the Vorbis one that is designed specifically for use in compressing speech for podcasts, Voice-Over-IP, and other similar applications. It can also be placed within an OGG container file
- RA/RAM: the aging RealAudio format is seldom encountered nowadays except in the deepest wilds of the ancient internet, but it was a standard delivery format for many huge names including the BBC as recently as 2009. The format was one of the first to be designed for internet streaming, but no longer competes with the others in terms of audio fidelity
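To put the size savings of the lossy formats above into numbers, you can estimate a file's size directly from its bitrate. The Python sketch below assumes a constant bitrate and ignores metadata and container overhead; the function names are illustrative.

```python
def encoded_size_bytes(seconds, bitrate_kbps):
    """Approximate size of a constant-bitrate file (1 kbps = 1000 bits/s)."""
    return int(seconds * bitrate_kbps * 1000 / 8)


def pcm_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of the equivalent uncompressed PCM audio, for comparison."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


minute_pcm = pcm_size_bytes(60)
for kbps in (128, 192, 320):
    size = encoded_size_bytes(60, kbps)
    print(f"{kbps} kbps: {size / 1_000_000:.2f} MB per minute, "
          f"about {minute_pcm / size:.1f}x smaller than CD-quality PCM")
```

Keep in mind that the bitrate only tells you the size. How good the result sounds at that size depends on the codec, which is where the newer formats earn their keep.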
When To Use What
Okay, so now that we’ve taken a look at the formats you’re likely to encounter, we ought to take a moment to discuss when you’re likely to use them.
Whenever it comes to delivering for film, television, or other properties that are likely to be mixed professionally and displayed on very good sound systems like the ones in theaters, you will want to deliver in uncompressed formats. Which specific format will depend on the preferences of the audio engineer that will be using your files, but typically you can rely on WAV as your workhorse delivery format. If you can, consider delivering BWF because the ability to embed timecode data right in the file makes it that much easier for the mixing engineer to place your cues exactly where they’re meant to go. Uncompressed audio is also the preference for archiving your music files in the best possible quality, of course.
Games are possibly the most inconsistent medium when it comes to what format to deliver in. Sometimes, you’ll be delivering the uncompressed audio and letting the audio engineer figure out what format to convert it to, but in many cases you represent the entire audio team so you need to be able to make good decisions. Nowadays, music assets are frequently encoded in the OGG format because of its good balance between quality and size, but games for mobile platforms and Flash games will often still rely on MP3 to keep the file sizes as small as possible for more compact delivery.
Losslessly compressed audio formats tend not to be used in media properties because playback requires real-time decompression, which wastes precious processing resources. On the other hand, they are quickly establishing themselves as a favorable alternative to the older lossy algorithms for delivering music for listening purposes and even for compact archiving.
One last thing to consider is that even though you’ll be delivering a lot of lossy files, you should make sure to always make a fully lossless audio rendering of your work. This archival version can be yours to keep in case the project files get damaged or lost, and it allows you to easily go back and re-encode in a different format without having to re-open the project to do so if necessary.
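Here is one way the re-encoding step might look in practice, sketched in Python. It assumes you have the ffmpeg command-line tool installed (this article does not otherwise depend on it) and that your build includes the libvorbis and libmp3lame encoders. The preset table, quality values, and file names are hypothetical; the sketch only builds the command and leaves running it to you.

```python
from pathlib import Path

# Hypothetical delivery presets. The encoder names and flags are ffmpeg's,
# but the quality and bitrate values are only illustrative choices.
PRESETS = {
    "ogg": ["-c:a", "libvorbis", "-q:a", "6"],
    "mp3": ["-c:a", "libmp3lame", "-b:a", "192k"],
    "flac": ["-c:a", "flac"],
}


def ffmpeg_command(master, target):
    """Build an ffmpeg command that re-encodes a lossless master file."""
    out = Path(master).with_suffix("." + target)
    return ["ffmpeg", "-i", master, *PRESETS[target], str(out)]


for fmt in PRESETS:
    print(" ".join(ffmpeg_command("cue_01.wav", fmt)))

# To actually run one of them:
#   import subprocess
#   subprocess.run(ffmpeg_command("cue_01.wav", "ogg"), check=True)
```

Because every delivery file is generated from the same lossless master, switching formats later is a matter of changing one preset rather than re-opening the project.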
Obviously, in most cases the format you’ll be using will be the one your producer asks for, but if you ever get a say in the matter you will now be equipped to give an intelligent recommendation!