Tutorial on the Creation of Sufficient Audio Contrast

Introduction

WCAG 2.0 Success Criterion 1.4.7 is not about absolute volume but about the relative volume of the foreground and the background. In other words, contrast compares the volume of the foreground speech with the volume of the background.

The relative difference between foreground and background audio can be measured with a sound editor. Sound files are usually created with a sound production tool such as Cubase, Cool Edit Pro, Sound Forge, Pro Tools, or Sonar, which is used to mix foreground and background audio. Programs like these range in price from under $100 to over $60,000 and give the user full control over the foreground and background. They all measure volume in basically the same way, and with any of them it is fairly easy to separate the foreground from the background and measure each.

RMS volume measurement

Most editors can take an RMS (root mean square) volume measurement of a selection. Mathematically, RMS is the square root of the average of the squares of a group of numbers. RMS power is different from peak power, and is a more representative rating of power than peak. A discussion of the difference between "power," "volume" and "perceived volume" can be found at:

http://www.eramp.com/david/audio_contrast.htm
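As a rough illustration of what an editor's RMS readout computes, here is a minimal Python sketch. The function names and the sample values are hypothetical, and it assumes samples are floating-point values where 1.0 is full scale. Editors may use a different decibel reference, so absolute readings can differ from these, but the difference between two readings taken in the same editor is what matters for contrast.

```python
import math


def rms(samples):
    """Square each sample, average the squares, then take the square root."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def peak(samples):
    """Largest absolute sample value in the selection."""
    return max(abs(s) for s in samples)


def to_db(amplitude, full_scale=1.0):
    """Amplitude expressed in decibels relative to full_scale (an assumed reference)."""
    if amplitude == 0:
        return float("-inf")  # digital silence has no finite decibel value
    return 20 * math.log10(amplitude / full_scale)


# Hypothetical selection: one loud click surrounded by silence.
click = [0.0, 0.0, 1.0, 0.0]
print(peak(click), rms(click))       # peak is 1.0, RMS is 0.5
print(round(to_db(rms(click)), 2))   # about -6.02 dB
```

The click shows why peak and RMS differ: a single loud sample sets the peak, while RMS reflects the level across the whole selection.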

Testing for audio contrast

In order to determine the difference in volume between the foreground and the background, we must compare two audio samples. Here is the process to determine audio contrast:

  1. Sample a section of the audio that contains only background.
  2. Sample a section of audio that contains foreground and background (where the background is similar to the sample that contains only background).
  3. Measure the RMS average volume, in decibels (dB), of each sample.
  4. Compare the samples.

The foreground/background sample should be at least 20 dB louder than the sample that contains only background. If this is the case, the audio is ready to be mixed and posted on the Web site.
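The comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the samples are floating-point values with 1.0 as full scale, and the two lists stand in for selections that would normally be exported from a sound editor.

```python
import math

REQUIRED_CONTRAST_DB = 20.0  # threshold stated in the text above


def rms_db(samples):
    """RMS level of a selection in decibels relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)  # equivalent to 20 * log10(rms)


def contrast_db(foreground_and_background, background_only):
    """How many decibels louder the mixed selection is than the background."""
    return rms_db(foreground_and_background) - rms_db(background_only)


def has_sufficient_contrast(foreground_and_background, background_only,
                            required_db=REQUIRED_CONTRAST_DB):
    return contrast_db(foreground_and_background, background_only) >= required_db


# Synthetic stand-ins for real selections (square waves of fixed amplitude).
background = [0.01, -0.01] * 500
loud_mix = [0.2, -0.2] * 500     # about 26 dB above the background
quiet_mix = [0.02, -0.02] * 500  # about 6 dB above the background

print(has_sufficient_contrast(loud_mix, background))   # True
print(has_sufficient_contrast(quiet_mix, background))  # False
```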

Note: Broadcast media, such as TV and radio, often add compression/expansion to audio, which can skew the results of this test. In that case, the background sample should be taken within 20-100 ms after a section where the foreground is present. The reason for this is that it usually takes about 20-100 ms for the compressor/expander to activate, leaving a thin window of time to get a clean background sample that has the same compression settings as the sample containing foreground and background. If you are unsure about the presence of compression/expansion, use the audio examples below as references to determine if the audio in question has sufficient contrast.
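To locate that window in a sound file, the time range can be converted to sample positions. The sketch below is a hypothetical helper, not a feature of any particular editor. It assumes you already know the time at which the foreground stops and the sample rate of the file, and it defaults to the most conservative window length mentioned above (20 ms).

```python
def background_window(foreground_end_s, sample_rate, window_ms=20):
    """Return (start, stop) sample indices for the stretch right after the foreground ends.

    foreground_end_s: time in seconds at which the foreground stops (assumed known)
    sample_rate:      samples per second, as an integer (for example 44100)
    window_ms:        window length in milliseconds; the text suggests 20 to 100
    """
    start = round(foreground_end_s * sample_rate)
    stop = start + (window_ms * sample_rate) // 1000
    return start, stop


# Hypothetical case: the voice stops at 2.5 seconds in a 44.1 kHz file.
start, stop = background_window(2.5, 44100)
print(start, stop)  # 110250 111132, a window of 882 samples
```

The background-only measurement would then be taken over the samples from start up to stop.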

It is also possible to "earball" audio contrast, for audio that is already mixed or if an audio editor is not available. Listen to the audio and compare the contrast of the voice and background in your recording to the good example below. This may be necessary for sound that is recorded live with the background already in it, such as an interview on a busy street corner. In that case you cannot separate out the foreground and the background, but can determine contrast, roughly, with the "earballing" method.

Examples

In the examples below, I had one control for the voice (foreground) and a separate control for the music (background). I mixed the two tracks and then exported them. In the exported file all the sounds are combined into a single output (this is called "mixing"). Authors who have the source can still open it with the tracks separate, adjust the audio, and remix it. Of the two mixed examples below, one has sufficient contrast and one has insufficient contrast.

Example 1 - Good audio contrast

This example demonstrates a voice with music in the background in which the voice is the appropriate 20 dB above the background. The voice (foreground) is recorded at -17.52 decibels (average RMS) and the music (background) is at -37.52 decibels, which makes the foreground 20 decibels louder than the background.
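The arithmetic behind these figures can be checked with a few lines of Python. The function names are hypothetical. The conversion from decibels to an amplitude ratio uses the standard relation that every 20 dB corresponds to a factor of 10 in amplitude.

```python
def contrast_db(foreground_db, background_db):
    """Difference between two RMS readings, in decibels."""
    return foreground_db - background_db


def amplitude_ratio(db_difference):
    """Amplitude ratio that corresponds to a decibel difference."""
    return 10 ** (db_difference / 20)


example_1 = contrast_db(-17.52, -37.52)
print(round(example_1, 2))                   # 20.0
print(round(amplitude_ratio(example_1), 2))  # 10.0
```

In other words, a 20 decibel difference means the voice has about ten times the RMS amplitude of the music.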

Audio Example: Foreground is 20 decibels above the background (mp3).

Transcript of example 1 (good contrast): 
"Usually the foreground refers to a voice that is speaking and should be understood. My speaking voice right now is 20 decibels above the background which is the music. This is an example of how it should be done."

Visual example: This audio example is represented below in a snapshot of the file in an audio editor. A section is highlighted that contains foreground and background. Its wave is much larger than that of the section that contains only background.

Audio good contrast example

Example 2 - Insufficient audio contrast

This example demonstrates a voice with music in the background in which the voice is not 20 dB above the background. The voice (foreground) is at -18 decibels and the music (background) is at about -16 decibels, so the two levels are only about 2 decibels apart.

Audio Example: Foreground is less than 20 decibels above the background (mp3)

Transcript of example 2 (bad contrast):
"This is an example of a voice that is not loud enough against the background. The voice which is the foreground is only about 2 decibels above the background. Therefore is difficult to understand for a person who is hard of hearing. It is hard to discern one word from the next. This is an example of what not to do."

Visual example: The highlighted section contains foreground and background. The wave is almost the same size as the section that contains only background.

Audio bad example

Background music (c)1995 D. MacDonald