A DefineSound tag declares a set of samples for a sound effect or a piece of music.
The sound samples can be compressed or uncompressed, mono or stereo, and 8 or 16 bits wide. Not all of these modes are available in version 2; newer versions use the same tag with additional capabilities.
The f_sound_is_16bits is always set to 1 (16 bits samples) if the samples are compressed (neither Raw nor Uncompressed).
The f_sound_rate represents the rate at which the samples are defined. The rate at which it will be played on the target computers may differ. The following equation can be used to determine the rate:
rate = 5512.5 * 2 ** f_sound_rate
It yields the following values (the rate of 5512.5 is rounded down to 5512):
f_sound_rate 0: 5512 Hz
f_sound_rate 1: 11025 Hz
f_sound_rate 2: 22050 Hz
f_sound_rate 3: 44100 Hz
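The rate equation can also be evaluated with integer arithmetic only. The following is a minimal sketch, assuming that f_sound_rate is a 2 bits field (values 0 to 3); the function name swf_sound_rate_hz is hypothetical and not part of any library.

```c
#include <assert.h>

/* Sample rate in Hz for an f_sound_rate value (assumed range: 0 to 3).
 * 5512.5 * 2^n is the same as 44100 / 2^(3 - n); the integer shift
 * drops the fraction, which rounds 5512.5 down to 5512. */
unsigned swf_sound_rate_hz(unsigned f_sound_rate)
{
    assert(f_sound_rate <= 3);
    return 44100u >> (3u - f_sound_rate);
}
```

Starting from 44100 avoids floating point entirely, and the rounding down of the lowest rate falls out of the shift.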
The f_sound_samples_count value is the exact number of samples, not the size of the data in bytes. Thus, in stereo, it represents the number of pairs. To know the byte size, use the total size of the tag minus the header (11 or 13 depending on whether the size of the tag is larger than 62; it is more than likely that it will be 13).
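For data that is not compressed (Raw or Uncompressed), the relation between the sample count and the data size can be sketched as follows. This is an illustration only: the function name swf_pcm_data_size and its parameters are hypothetical, and the computation does not apply to compressed formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of uncompressed sound data.
 * samples_count counts one entry per mono sample or per stereo pair. */
size_t swf_pcm_data_size(unsigned long samples_count, int is_stereo, int is_16bits)
{
    size_t channels = is_stereo ? 2 : 1;
    size_t bytes_per_sample = is_16bits ? 2 : 1;

    /* guard against overflow of the multiplication below */
    assert(samples_count <= SIZE_MAX / 4);
    return (size_t)samples_count * channels * bytes_per_sample;
}
```

A reader could compare this result with the size derived from the tag as a consistency check.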
The f_sound_format can be one of the following values:
|
The f_sound_data depends on the sound format. The following describes the different formats as used in the DefineSound and the SoundStreamBlock tags.
8 bits data is saved in an array of signed char. The value 0 represents silence. The samples can otherwise have values between -128 and +127.
16 bits data is saved in an array of signed short. The value 0 represents silence. The samples can otherwise have values between -32768 and +32767. By default, the data will be encoded in little endian. However, the RAW format doesn't specify the endianness of the data saved in that case. You should avoid using RAW 16 bits data. Use Uncompressed data instead, or convert it to one of the other available formats (including RAW 8 bits data). A player may wish to avoid playing any sound saved in RAW 16 bits to avoid any problem.
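Reading a 16 bits little endian sample byte by byte gives the same result whatever the endianness of the machine running the code. The following is a minimal sketch; the function name swf_read_sample16_le is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read sample number `index` from a buffer of 16 bits little endian
 * samples. The low byte comes first, then the high byte. */
int16_t swf_read_sample16_le(const unsigned char *data, size_t index)
{
    assert(data != NULL);
    unsigned raw = data[2 * index] | ((unsigned)data[2 * index + 1] << 8);

    /* convert the 0..65535 value to a signed sample without relying
     * on implementation-defined narrowing */
    return raw < 0x8000u ? (int16_t)raw : (int16_t)((int)raw - 0x10000);
}
```

The caller is responsible for making sure that the buffer holds at least 2 * (index + 1) bytes.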
Mono sound saves only one channel of sound. It will be played back on both output (left and right) channels. This is often enough for most sound effects and voice.
For better quality music and sound effects, you can save the data in stereo. In this case, the samples for each channel (left and right) are interleaved, with the data for the left channel first. Thus, you will have: LRLRLRLRLR... In 8 bit, you get one byte for the left channel, then one byte for the right, one for the left, one for the right, etc. In 16 bit, you get two bytes for the left then two for the right channel, etc.
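The interleaving described above can be undone with a simple loop. The following is a minimal sketch for 8 bits stereo data; the function name swf_split_stereo8 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Split `pairs` interleaved 8 bits stereo samples (L R L R ...)
 * into separate left and right buffers of `pairs` samples each. */
void swf_split_stereo8(const signed char *interleaved, size_t pairs,
                       signed char *left, signed char *right)
{
    assert(interleaved != NULL && left != NULL && right != NULL);
    for (size_t i = 0; i < pairs; ++i) {
        left[i]  = interleaved[2 * i];      /* left channel comes first */
        right[i] = interleaved[2 * i + 1];
    }
}
```

The same loop works for 16 bits data once the samples have been read into an array of 16 bits values.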
The RAW encoding is an uncompressed encoding with unspecified endianness. You can use this format to safely save small sound effects made of 8 bits samples. For 16 bits sound effects, some systems may not swap the data before playing it, although it is likely that the buffer is expected to be in little endian.
Adaptive differential pulse code modulation (ADPCM) compression scheme. It gives fairly good compression for sound effects.
The ADPCM tables used by the SWF players are as follows:
int swf_adpcm_2bits[ 2] = { -1, 2 };
int swf_adpcm_3bits[ 4] = { -1, -1, 2, 4 };
int swf_adpcm_4bits[ 8] = { -1, -1, -1, -1, 2, 4, 6, 8 };
int swf_adpcm_5bits[16] = { -1, -1, -1, -1, -1, -1, -1, -1, 1, 2, 4, 6, 8, 10, 13, 16 };
The ADPCM data is composed of a 2 bits encoding size (2 to 5 bits) and an array of 4096 left (mono) or left and right (stereo) samples.
struct swf_adpcm_header {
	unsigned f_encoding : 2;
};
The number of bits for the compression is f_encoding + 2.
struct swf_adpcm_mono {
	unsigned short f_first_sample;
	unsigned f_first_index : 6;
	unsigned f_data[4096] : f_encoding + 2;
};
struct swf_adpcm_stereo {
	unsigned short f_first_sample_left;
	unsigned f_first_index_left : 6;
	unsigned short f_first_sample_right;
	unsigned f_first_index_right : 6;
	unsigned f_data[8192] : f_encoding + 2;
};
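Since these fields are not byte aligned, they have to be extracted with a bit reader. The following is a minimal sketch that reads the encoding size and the start of a mono block. It assumes that bit fields are stored most significant bit first, as is usual in SWF files. All the names (bit_reader, read_bits, adpcm_mono_start, read_adpcm_mono_start) are hypothetical, and decoding the samples themselves is not shown.

```c
#include <assert.h>
#include <stddef.h>

struct bit_reader {
    const unsigned char *data;
    size_t size;      /* buffer size in bytes */
    size_t bit_pos;   /* next bit to read, counted from the start */
};

/* Read `count` bits (1 to 32), most significant bit first. */
unsigned long read_bits(struct bit_reader *r, unsigned count)
{
    unsigned long value = 0;

    assert(count >= 1 && count <= 32);
    while (count-- > 0) {
        assert(r->bit_pos / 8 < r->size);
        unsigned bit = (r->data[r->bit_pos / 8] >> (7 - r->bit_pos % 8)) & 1u;
        value = (value << 1) | bit;
        r->bit_pos++;
    }
    return value;
}

struct adpcm_mono_start {
    unsigned bits_per_code;  /* f_encoding + 2, thus 2 to 5 */
    unsigned first_sample;   /* f_first_sample, as stored */
    unsigned first_index;    /* f_first_index, 0 to 63 */
};

/* Read the 2 bits header followed by the start of a mono block. */
struct adpcm_mono_start read_adpcm_mono_start(struct bit_reader *r)
{
    struct adpcm_mono_start s;

    s.bits_per_code = (unsigned)read_bits(r, 2) + 2;
    s.first_sample  = (unsigned)read_bits(r, 16);
    s.first_index   = (unsigned)read_bits(r, 6);
    return s;
}
```

After this call, the reader is positioned on the first entry of f_data, where each entry is bits_per_code bits wide.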
IMPORTANT LICENSING NOTES: please, see The entire SSWF project license above for information about the Audio MPEG licensing rights.
The SWF players which support movie v4.x and better will also support MPEG1 audio compression. This is a good quality high compression scheme. The players need to support constant and variable bit rates, and MPEG1 Layer 3, v2 and v2.5. For more information about MPEG you probably want to check out this web site: http://www.mp3-tech.org/.
In SWF movies, you need to save a seeking point (position of the data to play in a given frame) before the MP3 frames themselves. It is also called the initial latency. I will make this clearer once I understand better what it means.
An MP3 frame is described below. This is exactly what you will find in any music file.
struct swf_mp3_header {
	unsigned f_sync_word : 11;
	unsigned f_version : 2;
	unsigned f_layer : 2;
	unsigned f_no_protection : 1;
	unsigned f_bit_rate : 4;
	unsigned f_sample_rate : 2;
	unsigned f_padding : 1;
	unsigned f_reserved : 1;
	unsigned f_channel_mode : 2;
	unsigned f_mode_extension : 2;
	unsigned f_copyright : 1;
	unsigned f_original : 1;
	unsigned f_emphasis : 2;
	if(f_no_protection == 0) {
		unsigned short f_check_sum;
	}
	unsigned char f_data[variable size];
};
The f_sync_word is made of 11 bits which are all set to 1. This can be used to synchronize to the next frame without knowing the exact size of the previous frame.
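Searching for the next frame can be sketched as a scan for a byte equal to 0xFF followed by a byte with its top 3 bits set. The function name mp3_find_sync is hypothetical. Note that the same bit pattern may also appear by chance inside the audio data, so a match is only a candidate frame start.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first 11 bits sync word found at or after
 * `start`, or `size` when there is none. */
size_t mp3_find_sync(const unsigned char *data, size_t size, size_t start)
{
    assert(data != NULL);
    for (size_t i = start; i + 1 < size; ++i) {
        /* 8 bits in the first byte + top 3 bits of the second byte */
        if (data[i] == 0xFF && (data[i + 1] & 0xE0) == 0xE0)
            return i;
    }
    return size;
}
```

A stricter reader would also validate the other header fields before accepting the match.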
The f_version can be one of the following:
Note: if the MPEG version 2.5 isn't used, then the f_sync_word can be viewed as 12 bits and the f_version as 1 bit.
In SWF movies, the f_layer must be set to III (which is 1). The valid MPEG layers are as follows:
The f_no_protection determines whether a checksum is defined right after the 32 bits header. If there is a checksum, it is a 16 bit value which represents the total of all the words in the frame data.
The f_bit_rate determines the bit rate of the data that follows. The version and layer also affect how the f_bit_rate value maps to an actual rate. Since SWF only accepts Layer III data, only a limited set of rates can be used, as follows. MP3 players (and thus SWF players) must support variable bit rates. Thus, each frame may use a different value for the f_bit_rate field.
|
(1) free — means any (variable) bit rate
(2) bad — means you can't properly use this value
The f_sample_rate defines the rate at which the encoded samples will be played. This rate may vary and be equal to or smaller than the rate indicated in the DefineSound header. The rate definition depends on the MPEG version as follows:
|
The f_padding will be set to 1 if the stream includes pads (one extra slot - 8 bits of data). This is used to ensure that the sound is exactly the right size. Useful only if your sound is very long and synchronized with the images.
The f_reserved isn't used and must be set to zero in SWF files.
The f_channel_mode determines the mode used to compress stereophonic audio. Note that the Dual Channel mode is viewed as a stereo stream by SWF. It can be one of the following:
The f_mode_extension determines whether the intensity stereo (L+R, bit 5) and middle side stereo (L-R, bit 4) are used (set bit to 1) or not (set bit to 0) in joint stereo. f_mode_extension is usually set to 3.
The f_copyright field is a boolean value which specifies whether the corresponding audio is copyrighted or not. The default is to set it to 1 (copyrighted).
The f_original field is a boolean value which specifies whether the corresponding audio is a copy or the actual original sound track. It's usually set to 0 (a copy) in SWF movies.
The f_emphasis field can be one of the following values. It is rarely used. It tells the decoder to re-equalize the sounds.
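The fields of the 32 bits header can be extracted with shifts and masks once the four bytes are assembled in big endian order. The following is a minimal sketch; the names mp3_header_fields and mp3_unpack_header are hypothetical, and the optional checksum and the frame data are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mp3_header_fields {
    unsigned sync_word, version, layer, no_protection;
    unsigned bit_rate, sample_rate, padding, reserved;
    unsigned channel_mode, mode_extension;
    unsigned copyright, original, emphasis;
};

/* Unpack the 4 header bytes. Returns 1 when the sync word is all ones
 * and the layer is III (value 1) as required in SWF, 0 otherwise. */
int mp3_unpack_header(const unsigned char bytes[4], struct mp3_header_fields *h)
{
    assert(bytes != NULL && h != NULL);
    uint32_t w = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16)
               | ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];

    h->sync_word      = (w >> 21) & 0x7FF;  /* 11 bits */
    h->version        = (w >> 19) & 0x3;
    h->layer          = (w >> 17) & 0x3;
    h->no_protection  = (w >> 16) & 0x1;
    h->bit_rate       = (w >> 12) & 0xF;
    h->sample_rate    = (w >> 10) & 0x3;
    h->padding        = (w >>  9) & 0x1;
    h->reserved       = (w >>  8) & 0x1;
    h->channel_mode   = (w >>  6) & 0x3;
    h->mode_extension = (w >>  4) & 0x3;    /* bits 5 and 4 */
    h->copyright      = (w >>  3) & 0x1;
    h->original       = (w >>  2) & 0x1;
    h->emphasis       =  w        & 0x3;

    return h->sync_word == 0x7FF && h->layer == 1;
}
```

Shifts and masks are used instead of C bit fields because the layout of bit fields in memory depends on the compiler.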
This is a newly supported scheme to encode speech (and audio) of either better quality or smaller bit rate. Thus you can either put more sound in your files resulting in a similar file size or make the entire file smaller so it downloads faster.
Somehow, the Nellymoser encoding and decoding patents used by Flash have been released. You may want to look at the mpeg project for information about the format. Feel free to check out the http://www.nellymoser.com web site for more info about this compression scheme.