Experimenting with AV1 SVC Layered Encoding Using the libaom Encoder

SVC Encoding

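SVC (Scalable Video Coding) is an extension of H.264/MPEG-4 AVC that makes encoding more flexible. It offers three kinds of scalability: temporal scalability, spatial scalability, and SNR (quality) scalability. The encoder splits the video sequence into layers, and the receiver decodes only the layers that suit its conditions, so a single stream can adapt to different network bandwidths and decoder capabilities.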

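The goal of SVC is to standardize the encoding of a high-quality video bitstream that can be split into one or more sub-bitstreams, each decodable on its own. Each sub-bitstream represents the video at a lower spatial resolution, a lower temporal resolution (frame rate), or a lower quality. SVC is widely applicable to surveillance, video conferencing, and streaming/IPTV. In networks with packet loss in particular, the stream can adapt to the network by dropping some of its temporal layers.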
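As a rough illustration of adapting by dropping temporal layers, the sketch below assumes a common three-layer dyadic pattern in which a frame's temporal layer depends on its position within a group of four frames. The pattern, the function names, and the 60 fps source rate are assumptions chosen for illustration and are not taken from any particular encoder.

```python
# Assumed dyadic pattern over a period of 4 frames: T0, T2, T1, T2.
PATTERN = [0, 2, 1, 2]


def temporal_id(frame_index: int) -> int:
    """Return the temporal layer of a frame under the assumed pattern."""
    return PATTERN[frame_index % len(PATTERN)]


def keep_layers(frame_indices, max_temporal_id: int):
    """Keep only the frames whose temporal layer is <= max_temporal_id."""
    return [i for i in frame_indices if temporal_id(i) <= max_temporal_id]


frames = list(range(8))
full_rate = 60.0  # hypothetical source frame rate

for max_tid in (2, 1, 0):
    kept = keep_layers(frames, max_tid)
    rate = full_rate * len(kept) / len(frames)
    print(f"max temporal id {max_tid}: frames {kept}, {rate:g} fps")
```

Under this pattern, each temporal layer that is dropped halves the frame rate, and the remaining frames still form a decodable stream because lower layers never reference higher ones.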

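The advantages of SVC are its flexibility and network adaptability: a single encode produces several video streams of different quality, which reduces the encoding and decoding load on the server. However, decoding SVC is more complex than decoding a single-layer stream, and under the same conditions a layered bitstream compresses about 10% less efficiently than a single-layer one. In addition, because SVC became an official standard later than AVC, it is less compatible and less general, and it is not as widely deployed as AVC.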

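SVC also extends the H.264 syntax. For example, it extends the NAL (Network Abstraction Layer) unit header to describe the layering information of the bitstream, and it uses the previously reserved NAL unit types 14 and 20 to carry the enhancement-layer data.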
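To make the header extension concrete, here is a minimal parsing sketch. It assumes the bit layout of the three-byte SVC extension as I understand it from H.264 Annex G (it should be checked against the specification before being relied on); the class and function names are hypothetical, and the sample bytes are hand-built rather than taken from a real stream.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SvcExtension:
    idr_flag: int
    priority_id: int
    no_inter_layer_pred_flag: int
    dependency_id: int  # spatial / coarse-grain layer
    quality_id: int     # SNR (quality) layer
    temporal_id: int    # temporal layer
    use_ref_base_pic_flag: int
    discardable_flag: int
    output_flag: int


def parse_nal_header(data: bytes) -> Tuple[int, int, Optional[SvcExtension]]:
    """Parse the 1-byte NAL header and, for types 14/20, the assumed 3-byte SVC extension."""
    b0 = data[0]
    nal_ref_idc = (b0 >> 5) & 0x3
    nal_unit_type = b0 & 0x1F
    ext = None
    if nal_unit_type in (14, 20) and len(data) >= 4:
        b1, b2, b3 = data[1], data[2], data[3]
        if b1 >> 7:  # svc_extension_flag
            ext = SvcExtension(
                idr_flag=(b1 >> 6) & 1,
                priority_id=b1 & 0x3F,
                no_inter_layer_pred_flag=b2 >> 7,
                dependency_id=(b2 >> 4) & 0x7,
                quality_id=b2 & 0xF,
                temporal_id=b3 >> 5,
                use_ref_base_pic_flag=(b3 >> 4) & 1,
                discardable_flag=(b3 >> 3) & 1,
                output_flag=(b3 >> 2) & 1,
            )
    return nal_ref_idc, nal_unit_type, ext


# Hand-built example: type 20, dependency_id 1, temporal_id 2, output_flag 1.
sample = bytes([0x74, 0x80, 0x10, 0x47])
ref_idc, nal_type, ext = parse_nal_header(sample)
print(ref_idc, nal_type, ext)
```

A relay that wants to forward only some layers can read `dependency_id`, `quality_id`, and `temporal_id` from this header without decoding the payload.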

AV1

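AV1 is an open, royalty-free video coding format developed by the Alliance for Open Media (AOMedia) with the aim of compressing video more efficiently than existing standards. It is the successor to VP9 and combines a range of techniques, offering more coding options to suit different types of input video.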

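AV1's main goal is a substantial compression gain over existing codecs while keeping decoding complexity manageable and hardware implementation practical. Its key coding tools include: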

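Inter-prediction motion compensation: AV1 allows a richer pool of reference frames and motion vectors. It extends the number of reference frames and strengthens compound prediction with highly adaptive weighting algorithms and sources.
Dynamic spatial and temporal motion vector referencing: AV1 searches spatial and temporal candidates to obtain better motion vector references, and generates the temporal candidates through a motion field estimation process.
Overlapped block motion compensation (OBMC): reduces prediction error near block edges by smoothly blending the predictions created from neighboring motion vectors.
Transform block partitioning and extended transform kernels: AV1 supports transform units of many sizes and a richer set of transform kernels, including DCT, ADST, flipADST, and IDTX.
Entropy coding: AV1 uses multi-symbol entropy coding and level-map coefficient coding, which improve compression efficiency and simplify encoder design.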

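FFmpeg supports AV1 encoding through several encoders, including libaom (libaom-av1), SVT-AV1 (libsvtav1), and rav1e (librav1e). These encoders provide different rate control modes, such as constant quality (CRF) and constrained quality, to suit different encoding needs.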

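NVIDIA GeForce RTX 30 series GPUs support AV1 decoding. AV1 is reported to be up to 50% more efficient than H.264, and it supports 10-bit encoding and HDR video, giving users higher resolutions and frame rates.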

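Overall, as a newer open video coding format, AV1 combines efficient compression with flexible coding options and is well placed to play an important role in video delivery and streaming services.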

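AV1 is the first codec to support SVC by default, which is a notable advantage for use over the public internet. In WebRTC (Web Real-Time Communications) applications, for example, AV1's SVC support provides better network adaptability and resilience while delivering higher video quality at lower bandwidth.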

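AV1's SVC implementation also includes optimizations for screen content coding, an important use case in conferencing and video calls. The screen content coding tools are built into the base codec rather than added as an extension, which makes screen sharing more efficient to encode.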

SVC Encoding Experiment with libaom

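Download the code: git clone https://aomedia.googlesource.com/aom
Install the dependencies: CMake, Git, a compiler (gcc 6+, clang 7+, Microsoft Visual Studio 2019+, or the latest version of MinGW-w64 with the clang64 or ucrt toolchains), Perl, yasm/nasm, doxygen, and EMSDK. As far as I recall from the libaom README, doxygen is only needed to build the documentation and EMSDK only for Emscripten builds.
Build: follow README.md and build with CMake.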

    $ cmake path/to/aom
    $ make

Inspect the build output: the build directory contains the libraries and executables. Its examples directory holds sample programs that show how to use libaom:

.
├── aom_cx_set_ref
├── decode_to_md5
├── decode_with_drops
├── lightfield_bitstream_parsing
├── lightfield_decoder
├── lightfield_encoder
├── lightfield_tile_list_decoder
├── lossless_encoder
├── noise_model
├── photon_noise_table
├── scalable_decoder
├── scalable_encoder
├── set_maps
├── simple_decoder
├── simple_encoder
├── svc_encoder_rtc
├── twopass_encoder

Run the svc_encoder_rtc executable in a terminal: ./svc_encoder_rtc

Usage: ./svc_encoder_rtc <options> input_filename -o output_filename
Options:
-f <arg>, --frames=<arg>              Number of frames to encode
-o <arg>, --output=<arg>              Output filename
-w <arg>, --width=<arg>               Source width
-h <arg>, --height=<arg>              Source height
-t <arg>, --timebase=<arg>            Timebase (num/den)
-b <arg>, --target-bitrate=<arg>      Encoding bitrate, in kilobits per second
-sl <arg>, --spatial-layers=<arg>     Number of spatial SVC layers
-k <arg>, --kf-dist=<arg>             Number of frames between keyframes
-r <arg>, --scale-factors=<arg>       Scale factors (lowest to highest layer)
--min-q=<arg>                         Minimum quantizer
--max-q=<arg>                         Maximum quantizer
-tl <arg>, --temporal-layers=<arg>    Number of temporal SVC layers
-lm <arg>, --layering-mode=<arg>      Temporal layering scheme.
-th <arg>, --threads=<arg>            Number of threads to use
-aq <arg>, --aqmode=<arg>             AQ mode off/on
-d <arg>, --bit-depth=<arg>           Bit depth for codec 8 or 10.
                                      8, 10
-sp <arg>, --speed=<arg>              Speed configuration
-bl <arg>, --bitrates=<arg>           Bitrates[spatial_layer * num_temporal_layer + temporal_layer]
--drop-frame=<arg>                    Temporal resampling threshold (buf %)
--error-resilient=<arg>               Error resilient flag
--output-obu=<arg>                    Write OBUs when set to 1. Otherwise write IVF files.
--test-decode=<arg>                   Attempt to test decoding the output when set to 1. Default is 1.
--tune-content=<arg>                  Tune content type
                                      default, screen, film
--psnr=<arg>                          Show PSNR in status line.
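The --bitrates option is indexed as spatial_layer * num_temporal_layer + temporal_layer, so the values are listed spatial layer by spatial layer, with the temporal layers of each spatial layer grouped together. The sketch below shows that indexing in Python; the helper names are hypothetical and this is not code from libaom. The sample value 200,300,500 with 3 spatial layers and 1 temporal layer matches the command used in the experiment below.

```python
def layer_index(spatial_layer: int, temporal_layer: int, num_temporal_layers: int) -> int:
    """Flat index into the --bitrates list, as described in the usage text."""
    return spatial_layer * num_temporal_layers + temporal_layer


def parse_bitrates(arg: str, num_spatial: int, num_temporal: int) -> dict:
    """Map (spatial_layer, temporal_layer) to its bitrate in kbps (hypothetical helper)."""
    values = [int(v) for v in arg.split(",")]
    expected = num_spatial * num_temporal
    if len(values) != expected:
        raise ValueError(f"expected {expected} bitrates, got {len(values)}")
    return {
        (s, t): values[layer_index(s, t, num_temporal)]
        for s in range(num_spatial)
        for t in range(num_temporal)
    }


table = parse_bitrates("200,300,500", num_spatial=3, num_temporal=1)
for (s, t), kbps in sorted(table.items()):
    print(f"spatial {s}, temporal {t}: {kbps} kbps")
```

With 3 spatial and 3 temporal layers, the same rule would expect nine values, and spatial layer 1, temporal layer 2 would be the sixth entry (index 5).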

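Encode with 3 spatial SVC layers from the command line: ./svc_encoder_rtc -w 1280 -h 720 -k 30 -sl 3 -lm 6 -b 1000 --bitrates=200,300,500 vidyo4_720p_60.yuv -o o.ivf
- Note: the number of layers must match the layering mode. Here, judging by the output below, mode 6 gives 3 spatial layers with a single temporal layer.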

Codec AOMedia Project AV1 Encoder v3.8.3
layers: 3
width 1280, height: 720
num: 1, den: 30, bitrate: 1000
gop size: 30
Total number of processed frames: 600
Rate control layer stats for 1 layer(s):
For layer#: 0 0
Bitrate (target vs actual): 200 202.397200
Average frame size (target vs actual): 6666.666667 6357.117241
Average rate_mismatch: 38.170552
Number of input frames, encoded (non-key) frames, and perc dropped frames: 600 580 3.166667
For layer#: 1 0
Bitrate (target vs actual): 300 302.790800
Average frame size (target vs actual): 10000.000000 10441.062069
Average rate_mismatch: 59.582759
Number of input frames, encoded (non-key) frames, and perc dropped frames: 600 580 3.166667
For layer#: 2 0
Bitrate (target vs actual): 500 505.510000
Average frame size (target vs actual): 16666.666667 17431.379310
Average rate_mismatch: 58.197545
Number of input frames, encoded (non-key) frames, and perc dropped frames: 600 580 3.166667
Short-time stats, for window of 15 frames:
Average, rms-variance, and percent-fluct: 512.739200 184.254718 35.935368
Per layer encoding time/FPS stats for encoder: 0 0 601 1.340471 746.006507
Per layer encoding time/FPS stats for encoder: 1 0 601 2.559216 390.744619
Per layer encoding time/FPS stats for encoder: 2 0 601 7.417110 134.823405
Frame cnt and encoding time/FPS stats for encoding: 601 11.316797 88.364225
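Some of the figures in this output can be cross-checked with simple arithmetic. The sketch below assumes that the target "average frame size" is the layer bitrate divided by the frame rate (in bits), and that the number before each FPS value is the encoding time per frame in milliseconds. Both readings are inferred from the numbers themselves, not from libaom documentation, and the function names are hypothetical.

```python
FPS = 30  # from "num: 1, den: 30" in the output


def target_frame_bits(bitrate_kbps: float, fps: float) -> float:
    """Target average frame size in bits, assuming bitrate / frame rate."""
    return bitrate_kbps * 1000.0 / fps


def fps_from_ms_per_frame(ms_per_frame: float) -> float:
    """Frames per second, assuming the logged time is milliseconds per frame."""
    return 1000.0 / ms_per_frame


for kbps in (200, 300, 500):
    print(f"{kbps} kbps -> {target_frame_bits(kbps, FPS):.6f} bits per frame")

for layer, ms in enumerate((1.340471, 2.559216, 7.417110)):
    print(f"layer {layer}: {ms} ms/frame -> {fps_from_ms_per_frame(ms):.3f} fps")

print(f"total: 11.316797 ms/frame -> {fps_from_ms_per_frame(11.316797):.3f} fps")
```

Under these assumptions the computed values agree with the logged targets (6666.67, 10000, and 16666.67) and with the logged FPS figures to within rounding.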

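Inspect the encoded streams: four IVF files are generated in the directory. Three of them correspond to spatial layers 0, 1, and 2, and the fourth, o.ivf, is the same as the layer 2 stream.
Play the SVC streams: play each file with ffplay. Layers 0, 1, and 2 have resolutions of 320x180, 640x360, and 1280x720, that is, 1/4, 1/2, and the full size of the 1280x720 source.
Inspect the stream with Elecard's stream analysis tool: the AV1 stream is laid out as IVF Start Header | IVF Frame Header | OBU Header | OBU Sequence Header | OBU Frame | …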
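The same layout can be read with a few lines of Python. The sketch below assumes the usual IVF layout (a 32-byte little-endian file header starting with "DKIF", then a 12-byte header before each frame) and the AV1 OBU header bit layout, where an optional extension byte carries temporal_id and spatial_id. The function names are hypothetical, and the stream is a tiny synthetic one built in memory rather than the o.ivf file from the experiment.

```python
import struct

# signature, version, header length, fourcc, width, height,
# timebase denominator, timebase numerator, frame count, unused
IVF_HEADER = struct.Struct("<4sHH4sHHIIII")  # 32 bytes
IVF_FRAME_HEADER = struct.Struct("<IQ")      # 12 bytes: frame size, timestamp

OBU_TYPES = {1: "SEQUENCE_HEADER", 2: "TEMPORAL_DELIMITER", 3: "FRAME_HEADER",
             4: "TILE_GROUP", 5: "METADATA", 6: "FRAME"}


def parse_ivf_header(buf: bytes) -> dict:
    """Parse the 32-byte IVF file header."""
    sig, version, hdr_len, fourcc, width, height, den, num, frames, _ = \
        IVF_HEADER.unpack_from(buf, 0)
    if sig != b"DKIF":
        raise ValueError("not an IVF file")
    return {"version": version, "header_length": hdr_len, "fourcc": fourcc,
            "width": width, "height": height,
            "timebase": (num, den), "frame_count": frames}


def parse_obu_header(buf: bytes, pos: int = 0) -> dict:
    """Parse the OBU header byte and the optional extension byte."""
    b0 = buf[pos]
    hdr = {"type": (b0 >> 3) & 0xF, "has_extension": (b0 >> 2) & 1,
           "has_size_field": (b0 >> 1) & 1, "temporal_id": 0, "spatial_id": 0}
    if hdr["has_extension"]:
        b1 = buf[pos + 1]
        hdr["temporal_id"] = b1 >> 5
        hdr["spatial_id"] = (b1 >> 3) & 0x3
    return hdr


# Synthetic one-frame stream: an OBU_FRAME header with extension
# (temporal_id 0, spatial_id 2) followed by a zero size byte.
payload = bytes([0x36, 0x10, 0x00])
stream = (IVF_HEADER.pack(b"DKIF", 0, 32, b"AV01", 1280, 720, 30, 1, 1, 0)
          + IVF_FRAME_HEADER.pack(len(payload), 0) + payload)

header = parse_ivf_header(stream)
frame_size, pts = IVF_FRAME_HEADER.unpack_from(stream, header["header_length"])
obu = parse_obu_header(stream, header["header_length"] + IVF_FRAME_HEADER.size)
print(header)
print(frame_size, pts, OBU_TYPES.get(obu["type"]), obu)
```

To try it on a real file, the bytes could be read with open("o.ivf", "rb").read() in place of the synthetic stream. The spatial_id in the OBU extension is what lets a receiver or relay tell the spatial layers apart.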