For background on the AAC audio format, see Wikipedia: http://en.wikipedia.org/wiki/Advanced_Audio_Coding
AAC Audio Format Analysis
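AAC audio comes in two formats, ADIF and ADTS:
ADIF: Audio Data Interchange Format. In this format the start of the audio data can always be located unambiguously, but decoding cannot begin in the middle of the stream; it must start at the clearly defined beginning. This is why ADIF is commonly used for files on disk.
ADTS: Audio Data Transport Stream. This format is a bitstream containing sync words, so decoding can begin at any point in the stream. In this respect it resembles the MP3 stream format.
In short, ADTS can be decoded starting from any frame because every frame carries its own header, whereas ADIF has only a single global header, so the decoder needs the data from the very beginning before it can decode. The two formats also use different header layouts. Today, audio streams produced by encoders or extracted from containers are generally in ADTS format.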
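To make the difference concrete, here is a minimal Python sketch that tells the two formats apart by their first bytes and shows how a reader can join an ADTS stream midway by scanning for the 12-bit syncword. The function names are hypothetical, not from any library. It assumes the ADIF header opens with the 4-byte identifier "ADIF" (as defined in the AAC specification) and that ADTS headers are byte-aligned. A syncword match is only a candidate; a robust demuxer would also confirm that the next header appears where Frame Length says it should.

```python
ADIF_ID = b"ADIF"  # 4-byte identifier that opens an ADIF header

def is_adts_sync(data: bytes, i: int) -> bool:
    """True if data[i:] starts with the ADTS syncword 0xFFF and Layer == 0."""
    return (
        i + 1 < len(data)
        and data[i] == 0xFF
        # mask keeps the low 4 syncword bits and the 2 Layer bits
        and (data[i + 1] & 0xF6) == 0xF0
    )

def find_adts_sync(data: bytes, start: int = 0) -> int:
    """Offset of the first candidate ADTS header at or after start, or -1."""
    for i in range(start, len(data) - 1):
        if is_adts_sync(data, i):
            return i
    return -1

def detect_aac_format(data: bytes) -> str:
    """Guess the format from the first bytes of a stream."""
    if data[:4] == ADIF_ID:
        return "ADIF"
    if is_adts_sync(data, 0):
        return "ADTS"
    return "unknown"
```

An ADIF reader has no equivalent of `find_adts_sync`: if it misses the header at offset 0, there is nothing later in the stream to resynchronize on.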
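Voice systems have strict real-time requirements, and the pipeline is basically: capture audio, encode locally, upload, process on the server, send the result back down, and decode locally.
Because ADTS is a sequence of self-contained frames, it is inherently stream-oriented and better suited to transmitting and processing audio streams.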
ADTS frame structure:
header | body |
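Since each header records the total length of its frame, a stream can be cut into header/body pairs without decoding any audio. A minimal sketch, assuming the whole stream is already in memory and starts on a frame boundary; `adts_frames` is a hypothetical helper name. The bit positions come from the Frame Length and Protection Absent fields in the header table.

```python
from typing import Iterator, Tuple

def adts_frames(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (header, body) for each ADTS frame in a contiguous buffer."""
    pos = 0
    while pos + 7 <= len(data):  # trailing bytes shorter than a header are ignored
        if data[pos] != 0xFF or (data[pos + 1] & 0xF6) != 0xF0:
            raise ValueError(f"lost sync at offset {pos}")
        protection_absent = data[pos + 1] & 0x01
        header_len = 7 if protection_absent else 9  # 2 extra bytes of CRC
        # 13-bit Frame Length (0-based bytes): low 2 bits of byte 3,
        # all of byte 4, top 3 bits of byte 5
        frame_len = (((data[pos + 3] & 0x03) << 11)
                     | (data[pos + 4] << 3)
                     | (data[pos + 5] >> 5))
        if frame_len < header_len or pos + frame_len > len(data):
            raise ValueError(f"invalid or truncated frame at offset {pos}")
        yield data[pos:pos + header_len], data[pos + header_len:pos + frame_len]
        pos += frame_len
```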
ADTS frame header structure:
No. | Field | Length (bits) | Description |
1 | Syncword | 12 | all bits must be 1 |
2 | MPEG version | 1 | 0 for MPEG-4, 1 for MPEG-2 |
3 | Layer | 2 | always 0 |
4 | Protection Absent | 1 | set to 1 if there is no CRC and 0 if there is CRC |
5 | Profile | 2 | the MPEG-4 Audio Object Type minus 1 |
6 | MPEG-4 Sampling Frequency Index | 4 | MPEG-4 Sampling Frequency Index (15 is forbidden) |
7 | Private Stream | 1 | set to 0 when encoding, ignore when decoding |
8 | MPEG-4 Channel Configuration | 3 | MPEG-4 Channel Configuration (in the case of 0, the channel configuration is sent via an inband PCE) |
9 | Originality | 1 | set to 0 when encoding, ignore when decoding |
10 | Home | 1 | set to 0 when encoding, ignore when decoding |
11 | Copyrighted Stream | 1 | set to 0 when encoding, ignore when decoding |
12 | Copyrighted Start | 1 | set to 0 when encoding, ignore when decoding |
13 | Frame Length | 13 | this value must include 7 or 9 bytes of header length: FrameLength = (ProtectionAbsent == 1 ? 7 : 9) + size(AACFrame) |
14 | Buffer Fullness | 11 | buffer fullness |
15 | Number of AAC Frames | 2 | number of AAC frames (RDBs) in ADTS frame minus 1; for maximum compatibility always use 1 AAC frame per ADTS frame |
16 | CRC | 16 | CRC if Protection Absent is 0 |
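The table maps directly onto a bit reader: the 56 header bits are read MSB-first in table order, followed by the optional 16-bit CRC. A minimal Python sketch; `AdtsHeader` and `parse_adts_header` are illustrative names, not a library API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdtsHeader:
    mpeg_version: int              # 0 = MPEG-4, 1 = MPEG-2
    layer: int                     # always 0
    protection_absent: int         # 1 = no CRC
    profile: int                   # MPEG-4 Audio Object Type minus 1
    sampling_frequency_index: int
    private_stream: int
    channel_configuration: int
    originality: int
    home: int
    copyrighted_stream: int
    copyrighted_start: int
    frame_length: int              # includes the 7- or 9-byte header
    buffer_fullness: int
    num_aac_frames_minus_1: int
    crc: Optional[int]             # present only when protection_absent == 0

def parse_adts_header(buf: bytes) -> AdtsHeader:
    if len(buf) < 7:
        raise ValueError("need at least 7 bytes")
    bits = int.from_bytes(buf[:7], "big")  # the 56-bit fixed part as one integer
    pos = 56

    def take(n: int) -> int:
        """Consume the next n bits, most significant first."""
        nonlocal pos
        pos -= n
        return (bits >> pos) & ((1 << n) - 1)

    if take(12) != 0xFFF:
        raise ValueError("missing syncword")
    # widths of fields 2..15 from the table, in order
    fields = [take(n) for n in (1, 2, 1, 2, 4, 1, 3, 1, 1, 1, 1, 13, 11, 2)]
    crc = None
    if fields[2] == 0:  # Protection Absent == 0, so a CRC follows
        if len(buf) < 9:
            raise ValueError("CRC expected but header truncated")
        crc = int.from_bytes(buf[7:9], "big")
    return AdtsHeader(*fields, crc)
```

For the frequently seen header bytes `FF F1 50 80 01 5F FC`, this yields MPEG-4, profile 1 (AAC LC), frequency index 4 (44100 Hz), channel configuration 2, and a frame length of 10 bytes.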
MPEG-4 Audio
- Company: ISO
- Samples: http://samples.mplayerhq.hu/MPEG-4/
- Samples: http://samples.mplayerhq.hu/A-codecs/AAC/
- Samples: sample repo at standards.iso.org
Specification links:
- MPEG-4 Audio: ISO/IEC 14496-3:2009
- Conformance: ISO/IEC 14496-26:2010
MPEG-4 Audio
MPEG-4 includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type (AOT) to represent it. The global header format shared by all AOTs is called the Audio Specific Config.
Subparts
- Subpart 0: Overview
- Subpart 1: Main (Systems Interaction)
- Subpart 2: Speech coding - HVXC
- Subpart 3: Speech coding - CELP
- Subpart 4: General Audio coding (GA) - AAC, TwinVQ, BSAC
- Subpart 5: Structured Audio (SA)
- Subpart 6: Text To Speech Interface (TTSI)
- Subpart 7: Parametric Audio Coding - HILN
- Subpart 8: Parametric coding for high quality audio - SSC (and Parametric Stereo)
- Subpart 9: MPEG-1/2 Audio in MPEG-4
- Subpart 10: Lossless coding of oversampled audio - DST
- Subpart 11: Audio lossless coding - ALS
- Subpart 12: Scalable lossless coding - SLS
Audio Specific Config
The Audio Specific Config is the global header for MPEG-4 Audio:
- 5 bits: object type
- if (object type == 31): 6 bits + 32: object type
- 4 bits: frequency index
- if (frequency index == 15): 24 bits: frequency
- 4 bits: channel configuration
- var bits: AOT Specific Config
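A sketch of reading these fields in Python, covering both escape cases (object type 31 and frequency index 15). `BitReader` and `parse_audio_specific_config` are hypothetical names, and the trailing AOT Specific Config is deliberately left unparsed.

```python
from typing import Tuple

# Sampling Frequency Index 0..12 -> Hz
SAMPLING_FREQUENCIES = (96000, 88200, 64000, 48000, 44100, 32000,
                        24000, 22050, 16000, 12000, 11025, 8000, 7350)

class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

def parse_audio_specific_config(data: bytes) -> Tuple[int, int, int]:
    """Return (object_type, sampling_frequency_hz, channel_configuration)."""
    r = BitReader(data)
    object_type = r.read(5)
    if object_type == 31:              # escape value: extended object type
        object_type = 32 + r.read(6)
    freq_index = r.read(4)
    if freq_index == 15:               # frequency is written explicitly
        frequency = r.read(24)
    elif freq_index < len(SAMPLING_FREQUENCIES):
        frequency = SAMPLING_FREQUENCIES[freq_index]
    else:
        raise ValueError(f"reserved frequency index {freq_index}")
    channel_configuration = r.read(4)
    # The AOT Specific Config that follows is not parsed in this sketch.
    return object_type, frequency, channel_configuration
```

For example, the two-byte config `12 10` that MP4 files commonly carry for AAC LC decodes to object type 2, 44100 Hz, channel configuration 2.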
Audio Object Types
MPEG-4 Audio Object Types:
- 0: Null
- 1: AAC Main
- 2: AAC LC (Low Complexity)
- 3: AAC SSR (Scalable Sample Rate)
- 4: AAC LTP (Long Term Prediction)
- 5: SBR (Spectral Band Replication)
- 6: AAC Scalable
- 7: TwinVQ
- 8: CELP (Code Excited Linear Prediction)
- 9: HVXC (Harmonic Vector eXcitation Coding)
- 10: Reserved
- 11: Reserved
- 12: TTSI (Text-To-Speech Interface)
- 13: Main Synthesis
- 14: Wavetable Synthesis
- 15: General MIDI
- 16: Algorithmic Synthesis and Audio Effects
- 17: ER (Error Resilient) AAC LC
- 18: Reserved
- 19: ER AAC LTP
- 20: ER AAC Scalable
- 21: ER TwinVQ
- 22: ER BSAC (Bit-Sliced Arithmetic Coding)
- 23: ER AAC LD (Low Delay)
- 24: ER CELP
- 25: ER HVXC
- 26: ER HILN (Harmonic and Individual Lines plus Noise)
- 27: ER Parametric
- 28: SSC (SinuSoidal Coding)
- 29: PS (Parametric Stereo)
- 30: MPEG Surround
- 31: (Escape value)
- 32: Layer-1
- 33: Layer-2
- 34: Layer-3
- 35: DST (Direct Stream Transfer)
- 36: ALS (Audio Lossless)
- 37: SLS (Scalable LosslesS)
- 38: SLS non-core
- 39: ER AAC ELD (Enhanced Low Delay)
- 40: SMR (Symbolic Music Representation) Simple
- 41: SMR Main
- 42: USAC (Unified Speech and Audio Coding) (no SBR)
- 43: SAOC (Spatial Audio Object Coding)
- 44: LD MPEG Surround
- 45: USAC
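The ADTS Profile field stores the Audio Object Type minus 1 in only 2 bits, so only object types 1 to 4 from this list can be written into it directly. A small sketch of that mapping; the helper names are hypothetical.

```python
# The only Audio Object Types that fit the 2-bit ADTS Profile field.
ADTS_OBJECT_TYPES = {1: "AAC Main", 2: "AAC LC", 3: "AAC SSR", 4: "AAC LTP"}

def aot_to_adts_profile(aot: int) -> int:
    """ADTS Profile = Audio Object Type - 1, which must fit in 2 bits."""
    if aot not in ADTS_OBJECT_TYPES:
        raise ValueError(f"AOT {aot} cannot be written to the 2-bit ADTS Profile field")
    return aot - 1

def adts_profile_to_aot(profile: int) -> int:
    """Inverse mapping: read the 2-bit Profile field back as an AOT."""
    if not 0 <= profile <= 3:
        raise ValueError("the ADTS Profile field is only 2 bits wide")
    return profile + 1
```

Object type 5 (SBR) or 29 (PS), for instance, raises `ValueError` here because it has no 2-bit encoding.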
Sampling Frequencies
There are 13 supported frequencies:
- 0: 96000 Hz
- 1: 88200 Hz
- 2: 64000 Hz
- 3: 48000 Hz
- 4: 44100 Hz
- 5: 32000 Hz
- 6: 24000 Hz
- 7: 22050 Hz
- 8: 16000 Hz
- 9: 12000 Hz
- 10: 11025 Hz
- 11: 8000 Hz
- 12: 7350 Hz
- 13: Reserved
- 14: Reserved
- 15: frequency is written explicitly
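Lookups in both directions are simple table operations. This sketch returns the escape index 15 when a rate is not in the table. Note that 15 is valid in the Audio Specific Config but forbidden in an ADTS header. The function names are hypothetical.

```python
# MPEG-4 sampling frequency table, indexed by Sampling Frequency Index 0..12.
SAMPLING_FREQUENCIES = (96000, 88200, 64000, 48000, 44100, 32000,
                        24000, 22050, 16000, 12000, 11025, 8000, 7350)
ESCAPE_INDEX = 15  # "frequency is written explicitly" (forbidden in ADTS)

def frequency_from_index(index: int) -> int:
    """Map a 4-bit Sampling Frequency Index to Hz."""
    if 0 <= index < len(SAMPLING_FREQUENCIES):
        return SAMPLING_FREQUENCIES[index]
    if index == ESCAPE_INDEX:
        raise ValueError("index 15: the frequency must be read from the bitstream")
    raise ValueError(f"index {index} is reserved or out of range")

def index_from_frequency(hz: int) -> int:
    """Return the table index for hz, or the escape index 15 if hz is not listed."""
    if hz in SAMPLING_FREQUENCIES:
        return SAMPLING_FREQUENCIES.index(hz)
    return ESCAPE_INDEX
```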
Channel Configurations
These are the channel configurations:
- 0: Defined in AOT Specific Config
- 1: 1 channel: front-center
- 2: 2 channels: front-left, front-right
- 3: 3 channels: front-center, front-left, front-right
- 4: 4 channels: front-center, front-left, front-right, back-center
- 5: 5 channels: front-center, front-left, front-right, back-left, back-right
- 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
- 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
- 8-15: Reserved
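Combining the tables, an encoder that outputs raw AAC frames can prepend a 7-byte ADTS header (no CRC) to each frame before uploading it, which fits the real-time voice pipeline described earlier. This minimal sketch makes several assumptions: MPEG-4 signalling, one AAC frame per ADTS frame, and the buffer fullness value 0x7FF, which encoders commonly write for variable-bitrate streams. `build_adts_header` and `CHANNEL_COUNTS` are hypothetical names.

```python
# Channel configuration -> number of channels (0 = PCE, 8-15 reserved).
CHANNEL_COUNTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 8}

SAMPLING_FREQUENCIES = (96000, 88200, 64000, 48000, 44100, 32000,
                        24000, 22050, 16000, 12000, 11025, 8000, 7350)

def build_adts_header(aot: int, sample_rate: int, channel_config: int,
                      payload_len: int) -> bytes:
    """Build a 7-byte ADTS header (no CRC) for one raw AAC frame."""
    profile = aot - 1
    if not 0 <= profile <= 3:
        raise ValueError("ADTS can only signal AOT 1-4")
    if sample_rate not in SAMPLING_FREQUENCIES:
        raise ValueError("rate not in the table (index 15 is forbidden in ADTS)")
    freq_index = SAMPLING_FREQUENCIES.index(sample_rate)
    if channel_config not in CHANNEL_COUNTS:
        raise ValueError("config 0 needs an in-band PCE; 8-15 are reserved")
    frame_length = 7 + payload_len  # Frame Length includes the header itself
    if payload_len < 0 or frame_length >= 1 << 13:
        raise ValueError("frame does not fit the 13-bit Frame Length field")
    fields = [  # (value, width in bits), in header table order
        (0xFFF, 12), (0, 1), (0, 2), (1, 1),          # sync, MPEG-4, layer, no CRC
        (profile, 2), (freq_index, 4), (0, 1), (channel_config, 3),
        (0, 1), (0, 1), (0, 1), (0, 1),               # originality .. copyrighted start
        (frame_length, 13), (0x7FF, 11), (0, 2),      # length, fullness, 1 AAC frame
    ]
    bits = 0
    for value, width in fields:
        bits = (bits << width) | value
    return bits.to_bytes(7, "big")
```

With `aot=2`, `sample_rate=44100`, `channel_config=2` and a 3-byte payload, it produces `FF F1 50 80 01 5F FC`. The `CHANNEL_COUNTS` table also records that configuration 7 means 8 channels, not 7.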