Audio
Article by
on November 25, 2013
This is a high-level article on all things digital audio.
Recording
There is a lot of knowledge required to make good recordings. But from what I have gathered, it takes only minimal skill to get onto the first rung of the ladder of recording audio well.
Recording Software
First, let me start by giving a quick overview of some software.
- Audacity: this free cross-platform app will enable you to do a lot very easily. It has a great UI, community support, and documentation.
- ffmpeg: this is also free and, if you know what you are doing, you can install it on any platform. It is built as a set of libraries, so there are command line tools as well as GUIs built on it. Just search for ffmpeg and you will find both. The command line utilities are enormously powerful and let you do almost anything.
- SoX: this command line tool is a recent discovery of mine and is now my favorite tool, even more than Audacity. Simply read the man page and you will see what I mean.
Terminology
I recommend becoming at least familiar with the numbers and key words behind some terms before recording. The SoX man page is a great place to start.
Sample Rate
Think of this as the number of times per second your computer actually "listens" to the audio. (Remember, computers are digital, not analog.) In other words, it's the number of samples recorded each second: 44.1 kHz means 44,100 samples are recorded every second.
- 8 kHz is used in telephony
- 32 kHz is common in audio files
- 44.1 kHz is used on CDs
- 96 kHz is used in professional recordings
Sample Size
Think of this as the amount of information (bits) gathered per sample; it is also called bit depth. To compare with a photograph, this would be the resolution; having a low sample size is like having a pixelated image. A 16-bit sample size means each sample contains 16 bits of information.
- 16-bit is common
- 24-bit is common in professional recordings
Bit Rate
Think of this as the amount of information (bits) gathered per second. For uncompressed audio the formula is: bit rate = sample rate x bit depth x channels.
- 64 kbps is used in telephony
- 128-192 kbps is common in MP3 music files
- 550-760 kbps is common in FLAC music
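To make the formula concrete, here is a minimal shell sketch. The function name bit_rate_kbps is hypothetical, and the 8-bit sample size for telephony is an assumption (the list above only gives the 64 kbps result). Keep in mind that the formula describes uncompressed audio; MP3 and FLAC files come out lower because they are compressed.

```shell
# Uncompressed bit rate in kbps (hypothetical helper, integer math):
#   sample rate (Hz) x bit depth (bits) x channels / 1000
bit_rate_kbps() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

bit_rate_kbps 8000 8 1      # telephony, assuming 8-bit mono: prints 64
bit_rate_kbps 44100 16 2    # CD audio, 16-bit stereo: prints 1411
```

The CD figure shows why uncompressed files get large so quickly compared with a 128 kbps MP3.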
File Size
It helps to know roughly how large files are for a given bit rate: multiply the bit rate by the duration in seconds and divide by 8 to convert bits to bytes.
- 64 kbps = about 0.5 MB per minute
- 96 kbps = about 0.75 MB per minute
- 128 kbps = about 1.0 MB per minute
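The arithmetic behind these figures can be sketched in a few lines of shell. The function name kb_per_minute is hypothetical, and the sketch assumes 1 KB = 1000 bytes, so the results are slightly below the rounded numbers in the list.

```shell
# Approximate size in KB of one minute of audio at a given bit rate (kbps):
#   kbps x 60 seconds / 8 bits per byte (assumes 1 KB = 1000 bytes)
kb_per_minute() {
    echo $(( $1 * 60 / 8 ))
}

for rate in 64 96 128; do
    echo "$rate kbps = $(kb_per_minute "$rate") KB per minute"
done
# prints:
#   64 kbps = 480 KB per minute
#   96 kbps = 720 KB per minute
#   128 kbps = 960 KB per minute
```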
How to Record With SoX
This section will be a basic tutorial on how to record audio with SoX.
1. Record
First, select the input device you want to use. There are SoX options to select the input source, but SoX uses the system's default input unless told otherwise, and since I'm on a Mac it was easier to go to System Preferences=>Sound=>Input.
Next, you need to decide what quality of recording you want. If you don't have a strong preference, here is an easy example:
$ rec -c 1 -r 44100 -C 192.2 output.mp3
Note: use -c 2 if audio is stereo and -c 1 if audio is mono.
Now, if you want a lossless recording, you could do this:
$ rec -c 1 -r 96000 output.wav
With a lossless recording you can down-size later. Keep in mind that WAV files are uncompressed and have a very high bit rate, i.e. they will produce very large files.
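As a rough sketch of what down-sizing later could look like, the function below converts a lossless recording to MP3. The function name downsize and the target settings (44.1 kHz, 192 kbps at quality 2) are assumptions, not something I tested for this article, and it only works if your SoX build includes MP3 encoding support.

```shell
# Convert a lossless recording to a smaller MP3 (hypothetical helper).
# Usage: downsize INPUT.wav OUTPUT.mp3
downsize() {
    # Options placed before the output file apply to the output:
    #   -r 44100  resample to 44.1 kHz
    #   -C 192.2  MP3 at 192 kbps with encoder quality 2
    sox "$1" -r 44100 -C 192.2 "$2"
}

# Example: downsize output.wav output.mp3
```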
2. Test Your Editing Options
Next, you may want to edit your recording. I usually want to remove noise and silence. Because these are highly configurable and highly specific to your recording, you may want to test your settings before running them live.
Read steps 3 and 4 to find what configurations you want, and then you can test your settings like this:
$ play output.mp3 noisered noise.prof 0.21 silence 1 0.2 0.5% -1 0.2 0.5%
If you want to test 4.b. Option 2, which uses voice activity detection (vad) to trim silence, use this:
$ play output.mp3 noisered noise.prof 0.21 vad reverse vad reverse
Keep in mind that SoX has to process the file before playing, so there will be a delay. You can track the progress by looking at the bottom left of the output where it will say something like "In:21.7%".
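Since the delay grows with the length of the file, one way to speed up testing is to audition only a short excerpt. This is a sketch rather than one of the steps I ran: the function name preview and the 30 second length are assumptions, while trim is a standard SoX effect. It expects the noise.prof file from step 3.a to exist already.

```shell
# Audition the noise and silence settings on the first 30 seconds only
# (hypothetical helper).
# Usage: preview INPUT
preview() {
    # trim 0 30 keeps 30 seconds from the start, so the later effects
    # have far less audio to process before playback begins.
    play "$1" trim 0 30 noisered noise.prof 0.21 silence 1 0.2 0.5% -1 0.2 0.5%
}

# Example: preview output.mp3
```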
3. Noise Reduction
a. Create a Noise Profile
First, you need to get a profile of the background noise, taken from a silent stretch of the recording. On my audio, I had 8 seconds of silence at the beginning, so I ran this:
$ sox output.mp3 -n trim 00:00:00 00:00:08 noiseprof noise.prof
The 00:00:00 is the start of the silence and 00:00:08 is its length, which here is also where it ends because the start is zero.
b. Clean Up the Noise
Next, clean up the noise by using the noise.prof file we created:
$ sox output.mp3 output-clean.mp3 noisered noise.prof 0.21
From what I read, it is best to choose a reduction amount between 0.20 and 0.30. The lower the number, the less noise it removes; the higher the number, the more it removes.
Note: this command took 1 min 8 sec on a 1 hr 8 min (31 MB) file.
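If you are unsure which amount to pick, a small loop can write one file per setting so you can compare them by ear. This is only a sketch: the function name try_reductions, the three amounts, and the output file names are assumptions. Expect each pass to take about as long as the single run noted above.

```shell
# Write one cleaned file per reduction amount (hypothetical helper).
# Usage: try_reductions INPUT NOISE_PROFILE
try_reductions() {
    for amount in 0.20 0.25 0.30; do
        # Produces output-clean-0.20.mp3, output-clean-0.25.mp3, ...
        sox "$1" "output-clean-$amount.mp3" noisered "$2" "$amount"
    done
}

# Example: try_reductions output.mp3 noise.prof
```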
4. Silence Removal
a. Option 1: Trim All Silence
The first option is to trim all the silence everywhere. I ran across a comment by Shawn Dowler, who had optimized his settings for talk radio from internet streams. To use his settings, run the following command:
$ sox output-clean.mp3 output-clean-and-shortened.mp3 silence 1 0.2 0.5% -1 0.2 0.5%
Unfortunately, I found the silence in my recording to be very important and didn't take the time to tweak these settings.
b. Option 2: Trim Beginning and Ending Silence
The second option is just to trim silence at the beginning and end.
$ sox output-clean.mp3 output-clean-and-shortened.mp3 vad reverse vad reverse
Unfortunately, there was some silence in the middle of my audio so I had to switch over to Audacity to trim it.
Note: Option 2 took 41 sec on a 1 hr 8 min (31 MB) file.
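Steps 3 and 4.b can also be chained in a single SoX command, the same way the play test in step 2 chains them. Doing it in one pass means the MP3 is only re-encoded once instead of once per step. The sketch below assumes the silence sits at the very beginning of the recording; the function name clean_recording is hypothetical.

```shell
# Profile the noise, then denoise and trim leading/trailing silence
# in one pass (hypothetical helper).
# Usage: clean_recording INPUT OUTPUT SILENCE_SECONDS
clean_recording() {
    # Step 3.a: build noise.prof from the first SILENCE_SECONDS seconds.
    sox "$1" -n trim 0 "$3" noiseprof noise.prof &&
    # Steps 3.b and 4.b: noise reduction followed by vad on both ends.
    sox "$1" "$2" noisered noise.prof 0.21 vad reverse vad reverse
}

# Example: clean_recording output.mp3 output-clean-and-shortened.mp3 8
```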
5. Add Meta Tags
Finally, you may want to add meta tags to your audio file. I found a number of utilities to tag audio files, but the only one I found that supported setting the disc number was ffmpeg. Jon Hall has a great article on this; here is an example:
$ ffmpeg -i output-clean-and-shortened.mp3 -metadata title="Title of the Song" -metadata artist="John Doe" -metadata album="Album Name" -metadata date="2013" -metadata track="1/12" -metadata disc="1/4" output-clean-and-shortened-and-tagged.mp3
When imported in iTunes the file will be recognized as "Title of the Song" by "John Doe" track 1 of 12 on "Album Name" (2013) disc 1 of 4.
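If you would rather check the tags from the command line before importing, ffprobe (which ships with ffmpeg) can print them. This is a sketch and not part of what I originally ran; the function name show_tags is hypothetical.

```shell
# Print container-level information, including the metadata tags
# (hypothetical helper).
# Usage: show_tags FILE
show_tags() {
    # -v error hides the banner; -show_format prints the format section,
    # where tags appear as lines beginning with "TAG:".
    ffprobe -v error -show_format "$1"
}

# Example: show_tags output-clean-and-shortened-and-tagged.mp3
```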
Further Reading and References
- http://ffmpeg.gusari.org/viewtopic.php?f=16&t=593
- http://ffmpeg.org/ffmpeg.html
- http://forum.audacityteam.org/viewtopic.php?f=18&t=50426&start=20
- http://jonhall.info/how_to/create_id3_tags_using_ffmpeg
- http://manpages.ubuntu.com/manpages/oneiric/man1/id3tool.1.html
- http://mmanoba.wordpress.com/2011/08/29/howto-capture-record-live-streaming-audio-on-mac-os-x/
- http://sox.10957.n7.nabble.com/Truncating-silence-in-the-middle-of-a-file-td431.html
- http://sox.10957.n7.nabble.com/quot-can-t-set-sample-rate-48000-quot-on-MacOS-td2209.html
- http://sox.sourceforge.net/Docs/Documentation
- http://superuser.com/questions/186077/command-line-id3-tag-editor-that-handles-all-tags
- http://ubuntuforums.org/showthread.php?t=1585928
- http://www.catswhocode.com/blog/19-ffmpeg-commands-for-all-needs
- http://www.codinghorror.com/blog/2005/12/variable-bit-rate-getting-the-best-bang-for-your-byte.html
- http://www.commandlinefu.com/commands/view/5230/capture-screen-and-mic-input-using-ffmpeg-and-alsa
- http://www.pa-software.com/id3editor/
- http://www.portaudio.com/
- http://www.sysop.ca/?p=89
- http://www.tldp.org/HOWTO/MP3-HOWTO-13.html
- https://developer.apple.com/library/mac/documentation/MusicAudio/Conceptual/CoreAudioOverview/WhatsinCoreAudio/WhatsinCoreAudio.html
- http://avp.stackexchange.com/a/3256
- https://learn.sparkfun.com/tutorials/analog-vs-digital
- http://en.wikipedia.org/wiki/Comparison_of_analog_and_digital_recording
- http://news.cnet.com/8301-13645_3-20055650-47.html
- http://youtu.be/4MqjPA5eOU4?t=2m28s
- http://documentation.apple.com/en/finalcutpro/usermanual/index.html#chapter=54%26section=2%26tasks=true
- http://wiki.audacityteam.org/wiki/Sanitising_Speech_Recordings_Taken_with_portable_audio_player-recorders
- http://aaron.birenboim.com/unix/audioRecording.html
- http://billposer.org/Linguistics/Computation/SoxTutorial.html
- http://www.thegeekstuff.com/2009/05/sound-exchange-sox-15-examples-to-manipulate-audio-files/