Wave RIFF format Header and Data description
by Nathan Davidson
---------------------------------------------------------------

I've had a lot of people asking how to load in a .WAV file and/or what
the format of a .WAV file looks like. So here's my attempt at explaining
exactly what you've got in a wave file.

The wave file format is Microsoft's application of the RIFF (Resource
Interchange File Format) standard. The format was originally described
with C structures, but it can easily be ported to other languages
(like Pascal).

RIFF type files are separated into blocks of data called chunks.
The chunks specifically look like this:

    #include <stdint.h>

    typedef uint32_t      DWORD;   // 32 bits
    typedef unsigned char BYTE;    // 8 bits
    typedef DWORD         FOURCC;  // four character code

    typedef struct {
        FOURCC ckID;      // chunk identifier, four characters
        DWORD  ckSize;    // number of bytes in ckData
        BYTE   ckData[];  // ckSize bytes of chunk data follow
    } CK;

FOURCC stands for four character code, which means you'll have four
identifying characters. The ckSize specifies how long the chunk's data
is, not counting the 8 bytes taken up by ckID and ckSize themselves.
Then you have the actual data in the chunk.

I. The First Chunk

The first chunk in a wave file starts with 4 bytes - the FOURCC ckID of
'RIFF'. Always check that the first four characters in a wave file are
'RIFF'; if the first four bytes in a file aren't equal to this then you
don't have a wave file.

Then we have the file size (a 32-bit value), which is the size of the
entire file not including the 8 bytes used for the 'RIFF' identifier
and for the size value itself.

II. The 'WAVE' RIFF id

The following four bytes should say 'WAVE', which specifies what type of
RIFF this is. This is another good thing to check for.

III. The Format Chunk

Now we start the format chunk. Since we're starting a new chunk we have
another 4 byte FOURCC ckID: 'fmt ' (notice the intentional space at the
end of 'fmt ' and the lowercase letters!!), which lets us know the format
chunk is coming up. The format chunk holds info about the wave file, like
how many channels it's using, whether it's 16 or 8 bit, the sample rate
in kHz, etc.
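Before digging into the format chunk, here's a minimal sketch in C of the
two checks from sections I and II. It assumes the first 12 bytes of the
file have already been read into a buffer. The names check_wave_header and
read_le32 are made up for illustration, and the size is put together byte
by byte because wave files store it low byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a 32-bit value from four bytes stored low byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Check the first 12 bytes of a file: 'RIFF', a 32-bit size, then 'WAVE'.
 * Returns 1 and stores the size field in *riff_size when both IDs match,
 * otherwise returns 0. (Hypothetical helper, not from any library.)
 */
static int check_wave_header(const unsigned char hdr[12], uint32_t *riff_size)
{
    assert(hdr != NULL && riff_size != NULL);

    if (memcmp(hdr, "RIFF", 4) != 0)
        return 0;                       /* not a RIFF file at all */
    if (memcmp(hdr + 8, "WAVE", 4) != 0)
        return 0;                       /* RIFF, but not a wave file */

    *riff_size = read_le32(hdr + 4);    /* file size minus 8 bytes */
    return 1;
}
```

If the function returns 1, the next thing in the file should be a chunk
ID, normally 'fmt '.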
Immediately after 'fmt ' is a 32-bit number that is our format chunk
length. For a PCM wave file we use 16 bytes to describe the format, so
the format chunk length should be 16.

IIIb. Format Chunk

First up in the data of the format chunk is the format tag, which is 2
bytes long and tells us what type of format the data is in. The most
common format is PCM (Pulse Code Modulation) and the value to represent
this is a 1.

Next we have 2 more bytes specifying the number of channels: a 1
represents mono and a 2 represents stereo.

Following this is a 32-bit number that represents the sample rate, such
as 44100, 22050, 11025, or 8000.

Next is another 32-bit number that represents the average bytes
p/second. This is not really a necessary number for most people and is
usually just discarded, but it is nice to know if you're getting nitty
gritty. To find the average bytes p/second you use this formula:

    SampleRate * Channels * (Bits / 8)

So if we had a 16-bit stereo 44.1 kHz WAVE file, the avg. bytes p/sec
would be:

    44100 (sample rate) * 2 (stereo) * (16 (bits) / 8) = 176400

This lets you know that your computer is gonna be crunching on 176,400
bytes every second it's playing a CD quality wave file!!

Next up in our wave file we have a 2 byte number used for block
alignment. This is another number that's usually discarded (it tells
you the number of bytes used for each sample, counting all channels),
but if needed you use this formula to calculate it:

    (Bits / 8) * Channels

So for an 8-bit mono wave you'd get 1 (meaning 1 byte for each sample);
for a 16-bit stereo wave you'd get 4 (meaning 4 bytes for each sample,
2 per channel).

Ok, next up is another 2 byte number that stores the number of bits used
for each sample. This is gonna be 8 or 16.

That finishes up our 16 bytes used in the format chunk data.

IV. Data Chunk

Now we get to the actual data chunk. To start off the data chunk we, of
course, have 4 bytes (FOURCC ckID) that store the word 'data'.
Immediately after the 'data' ID is another 32-bit number (ckSize) that
says how many bytes of sample data follow.

IVb. Actual Raw Sample Data

Now we've finally come to the actual data that makes up the wave file.
The way the data is packed in PCM is determined by the bits (8 or 16)
and the channels (mono or stereo).

An 8-bit mono file is simple and looks like:

 byte           ,byte           ,byte           ,byte
|__1sample______|__1sample______|__1sample______|__

With a 16-bit stereo file you'll have data as follows:

[channel 0 (2bytes)],[channel 1(2bytes)],[channel 0 (2bytes)],[chan..etc.]
|______1 sample________________________|________1 sample_________________|

It alternates between channel zero and one, using 16 bits for each
channel's value.

8-bit samples are unsigned and have values between 0 and 255, meaning
the midpoint of an 8-bit wave (volume at 0) is 128.

16-bit samples are signed and have values between -32768 and 32767, and
the midpoint is at 0.

That's why 16-bit sound is sooo much better than 8 bit: you get 65,536
volume levels per sample with a 16-bit sound and a measly 256 with
8 bit.

The raw data runs for the number of bytes given by the data length.
That usually takes you right to the end of the file; however, you should
know that some programs like to attach some notes or comments after the
data, but these can be ignored.

Now let me try and represent all this with some pretty ASCII graphics.

|<-----------------------------32 bits---------------------------->|
|<--------------16 bits----------->|<-------------16 bits--------->|
|<----8 bits---->|<-----8 bits---->|<----8 bits--->|<----8 bits--->|

File starts:
--------------------------------------------------------------------
|      'R'       |       'I'       |      'F'      |      'F'      |
--------------------------------------------------------------------
|                        RIFF Chunk Length                         |
--------------------------------------------------------------------
|      'W'       |       'A'       |      'V'      |      'E'      |
--------------------------------------------------------------------
|      'f'       |       'm'       |      't'      |      ' '      |
--------------------------------------------------------------------
|                     FORMAT Chunk Length (16)                     |
--------------------------------------------------------------------
|        Format Tag (1=PCM)        |  Channels (1=Mono 2=Stereo)   |
--------------------------------------------------------------------
|            Sample Rate (44100, 22050, 11025, or 8000)            |
--------------------------------------------------------------------
|   Average # of Bytes P/Second (Sample Rate*Channels*(Bits/8))    |
--------------------------------------------------------------------
| Block Align ((Bits/8)*Channels)  |   Bits per Sample (8 or 16)   |
--------------------------------------------------------------------
|      'd'       |       'a'       |      't'      |      'a'      |
--------------------------------------------------------------------
|             Data Length (actual length of raw data)              |
--------------------------------------------------------------------
|                                                                  |
|                                                                  |
|                             raw data                             |
|                                                                  |
|                                                                  |
----------------------------------EOF-------------------------------

Well, there you have it. If you're programming this on a system other
than a PC (some Macs and Unix machines, for example) then you need to do
some reading about little and big endian. Wave files store every
multi-byte value low byte first (little endian), which matches the PC.
Big endian computers store numbers the other way around, so on those
machines the bytes of each 16-bit or 32-bit value have to be swapped
(MSB and LSB are switched) when you pull the data up. Unfortunately this
is a pain to explain and I'm already tired.
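
For what it's worth, one way to dodge the byte order problem is to never
read a whole 16-bit value at once, and instead build it from its two
bytes. Here's a minimal sketch in C along those lines. It's only an
illustration: the function names are made up, and it assumes the raw data
from the data chunk is already in memory. It gives the same result on
little and big endian machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-bit sample: low byte first, signed result (-32768..32767). */
static int16_t decode_sample16(const unsigned char *p)
{
    int v = p[0] | (p[1] << 8);     /* 0..65535 */
    if (v >= 32768)
        v -= 65536;                 /* map the upper half to negatives */
    return (int16_t)v;
}

/* 8-bit samples are unsigned with the midpoint at 128; re-center on 0. */
static int center_sample8(unsigned char b)
{
    return (int)b - 128;
}

/*
 * Split interleaved 16-bit stereo data into one array per channel.
 * "count" is the number of samples, each 4 bytes: channel 0 then
 * channel 1, as in the layout shown in IVb.
 */
static void split_stereo16(const unsigned char *data, size_t count,
                           int16_t *ch0, int16_t *ch1)
{
    assert(data != NULL && ch0 != NULL && ch1 != NULL);
    for (size_t i = 0; i < count; i++) {
        ch0[i] = decode_sample16(data + 4 * i);
        ch1[i] = decode_sample16(data + 4 * i + 2);
    }
}
```

For a 16-bit stereo file, count would be the data length divided by the
block align value (4).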

A full treatment of byte order is a topic for another day =)

(look for my .WAV C source code and other sound/game programming stuff
at my web page)

questions/comments/complaints/donations send to:
npawn@geocities.com (try this address first)
or npawn@juno.com

and my web page is currently at:
http://www.geocities.com/SiliconValley/Pines/4223/

-------------------------------------------------------------------------
Copyright 1996,1997 Nathan Davidson
Feel free to distribute this file wherever you want - as long as it's for
non-profit and the contents of this file aren't changed.
-------------------------------------------------------------------------