Series Table of Contents
Table of Contents
- Series Table of Contents
- To Be Completed
- PCIe Retimer Overview Document Outline
- Switch Dimensions
- Broadcom
- Microchip
- ASMedia
- Cyan
- Others
To Be Completed
- Functional block diagram
- Key Features and Benefits
- Fabric links
- Multi-root
PCIe Retimer Overview Document Outline
KB90xx (Regli) PCIe Retimer Overview Document Outline
Introduction
- Overview of KB90xx (Regli) PCIe Retimer family
- Purpose and target applications
Kandou’s Unique Chiplet Approach
- Single-silicon development for multiple products (x16, x8, x4 retimers)
- Integration of Glasswing interface for low-power, low-latency chiplet communication
KB900x Product Overview
- Key Features:
- Compliance with PCIe Gen5/CXL 2.0 standards
- Low latency (~10ns), insertion loss compensation (up to 36dB@16GHz)
- Dynamic lane skew compensation, automatic offset calibration
- Support for L1PM substates, on-chip diagnostics (eye scope, BER monitors, logic analyzer)
- Voltage Flexibility:
- PWR_1 (VDD_IO): 1.8V
- PWR_2 (VDD_CORE): 0.9V
- PWR_12 (VDD_PHY): 1.8V (Regular) or 1.2V/1.5V (Power Saving Mode)
- Packaging Options:
- KB9003 (x16): 354-ball BGA (8.9mm×22.8mm)
- KB9002 (x8): 332-ball BGA (8.5mm×13.4mm)
- KB9001 (x4): 146-ball BGA (5.5mm×10mm)
KB900x Product Family Comparison (NDA Required)
| Feature | KB9003 | KB9002 | KB9001 |
| --- | --- | --- | --- |
| PCIe Lanes | 16 (Bidir) | 8 (Bidir) | 4 (Bidir) |
| CXL Support | CXL 1.0/2.0 | CXL 1.0/2.0 | CXL 1.0/2.0 |
| Insertion Loss Comp. | Rx: 36dB@16GHz | Same as KB9003 | Same as KB9003 |
| Power Consumption | 14.7W | 7.4W | 3.7W |
| Availability | ES: Now | CS: May 2024 | CS: Q1 2025 |
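As a quick sanity check on the comparison figures above, the listed power can be divided by the lane count to see how evenly power scales across the family. This is only a sketch: the lane counts and wattages are copied from the comparison, while the `FAMILY` dictionary and the `power_per_lane` helper are names made up for illustration.

```python
# Lane counts and power figures copied from the KB900x comparison above.
# FAMILY and power_per_lane are illustrative names, not vendor software.
FAMILY = {
    "KB9003": {"lanes": 16, "power_w": 14.7},
    "KB9002": {"lanes": 8, "power_w": 7.4},
    "KB9001": {"lanes": 4, "power_w": 3.7},
}


def power_per_lane(part: str) -> float:
    """Return the listed power divided by the lane count, in watts per lane."""
    entry = FAMILY[part]
    return entry["power_w"] / entry["lanes"]


if __name__ == "__main__":
    for part in FAMILY:
        print(f"{part}: {power_per_lane(part):.3f} W/lane")
```

All three parts land at roughly 0.92 W per lane, which is what one would expect if the x8 and x4 parts reuse the same silicon as the x16 part.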
KB900x Functional Block Diagram
- Integration of AC coupling capacitors (220nF)
- MCU with EEPROM/SPI Flash boot options
- 100MHz HCSL clock source for PCIe reference clocks
KB900x Key Features & Benefits
- Co-design compatibility with Astera Labs
- Dynamic channel loss compensation (up to 36dB)
- Secure platform boot support
- Integrated logic analyzer for real-time debugging
- Power-saving modes (1.2V/1.5V supply)
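To put the 36 dB loss-compensation figure in perspective, the short sketch below converts it to a voltage ratio and relates the 16 GHz frequency to the PCIe Gen5 line rate. It assumes the dB value is an amplitude (20·log10) loss, as is usual for insertion loss, and that the link uses NRZ signalling at 32 GT/s; both function names are illustrative.

```python
def db_to_voltage_ratio(loss_db: float) -> float:
    """Convert an insertion loss in dB to the remaining voltage ratio.

    Assumes the dB figure is an amplitude loss (20 * log10 convention).
    """
    return 10 ** (-loss_db / 20)


def nyquist_ghz(rate_gt_s: float) -> float:
    """Nyquist frequency in GHz of an NRZ link running at rate_gt_s GT/s."""
    return rate_gt_s / 2


if __name__ == "__main__":
    ratio = db_to_voltage_ratio(36)
    print(f"36 dB loss leaves about {ratio * 100:.2f}% of the signal amplitude")
    print(f"Nyquist frequency at 32 GT/s: {nyquist_ghz(32)} GHz")
```

Under these assumptions, a 36 dB channel delivers under 2% of the transmitted amplitude at the receiver, which is why equalization in the retimer matters at Gen5 speeds.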
KB900x Software Overview
- Besso GUI for advanced debugging features
- Remote diagnostics capabilities
KB900x Advanced Debug Features
- Eye Scope & BER Monitors: Analyze signal integrity and bit error rates
- RTSSM Analyzer: Track state transitions across all lanes
- Logic Analyzer: Trigger on signals (rising/falling edges) for upstream/downstream debug
- Link Training Widget: Visualize PCIe Gen1-Gen5 link speeds and states
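The rising/falling edge trigger mentioned for the logic analyzer can be illustrated with a few lines of Python. This is a generic sketch of edge detection on a sampled 1-bit signal, not the actual Besso or KB900x implementation; `find_edges` and the sample list are hypothetical.

```python
from typing import List


def find_edges(samples: List[int], edge: str = "rising") -> List[int]:
    """Return the sample indices at which the requested edge occurs.

    A rising edge is a 0 -> 1 transition, a falling edge a 1 -> 0 transition.
    The index reported is that of the sample after the transition.
    """
    if edge not in ("rising", "falling"):
        raise ValueError("edge must be 'rising' or 'falling'")
    hits = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if edge == "rising" and prev == 0 and cur == 1:
            hits.append(i)
        elif edge == "falling" and prev == 1 and cur == 0:
            hits.append(i)
    return hits


if __name__ == "__main__":
    signal = [0, 0, 1, 1, 0, 1]  # made-up 1-bit capture
    print(find_edges(signal, "rising"))
    print(find_edges(signal, "falling"))
```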
KB900x Link Training States
- States: INACTIVE (gray), FAILED (red), ACTIVE (blue), PASSED (green)
- Components: Detect, Polling, Configuration, Recovery, Loopback, L0 (Operational)
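The status colors and link-training components listed above map naturally onto a small lookup structure. The sketch below shows one way a widget might track them; the status names, colors, and component names come from the outline, while the `Status` enum and `summarize` function are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    """Display status of a link-training component, with its widget color."""
    INACTIVE = "gray"
    FAILED = "red"
    ACTIVE = "blue"
    PASSED = "green"


# Components as listed in the outline, in the order given there.
COMPONENTS = ["Detect", "Polling", "Configuration", "Recovery", "Loopback", "L0"]


def summarize(progress: Dict[str, Status]) -> List[str]:
    """Return one 'component: STATUS (color)' line per component.

    Components missing from `progress` are reported as INACTIVE.
    """
    lines = []
    for name in COMPONENTS:
        status = progress.get(name, Status.INACTIVE)
        lines.append(f"{name}: {status.name} ({status.value})")
    return lines


if __name__ == "__main__":
    for line in summarize({"Detect": Status.PASSED, "Polling": Status.ACTIVE}):
        print(line)
```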
KB9003 CEM (x16 Lane) Riser Card
- Evaluation board for KB9003
- USB-connected PC control via Besso app
Retimer Use Cases
- Genoa-based interoperability testing
- Ethernet SmartNIC (ConnectX) integration
- MCIO AEC test configurations
PCIe LTSSM Block Diagram
- Illustrates link training states (LTSSM) and protocol awareness
KB900x Debugging Tools
- Firmware update and version display
- Temperature sensor monitoring
- Register dump and soft/hard MCU reset
- Logic analyzer trigger conditions (e.g., pl_ltssm = 0x10 for L0 state)
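A value-match trigger such as `pl_ltssm = 0x10` can be pictured as a scan over captured samples for the first one in which the signal equals the trigger value. The sketch below is a generic illustration, not the tool's real logic: only the `0x10` code for L0 comes from the outline, and the other sample values in the usage example are arbitrary placeholders rather than real LTSSM encodings.

```python
from typing import Dict, List, Optional

L0_CODE = 0x10  # from the outline: pl_ltssm = 0x10 corresponds to L0


def first_trigger(
    trace: List[Dict[str, int]],
    signal: str = "pl_ltssm",
    value: int = L0_CODE,
) -> Optional[int]:
    """Return the index of the first sample where `signal` equals `value`.

    Returns None if the condition never fires in the captured trace.
    """
    for i, sample in enumerate(trace):
        if sample.get(signal) == value:
            return i
    return None


if __name__ == "__main__":
    # Placeholder capture: 0x00 and 0x01 are NOT real state encodings.
    capture = [{"pl_ltssm": 0x00}, {"pl_ltssm": 0x01}, {"pl_ltssm": 0x10}]
    print(first_trigger(capture))
```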
Conclusion & Support
- Global technical assistance from Kandou’s AE/FAE teams
- Accelerated time-to-market through co-design and prototyping support
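Appendix: Figure Descriptions
- KB900x Functional Block Diagram: shows the chip's functional blocks (AC coupling capacitors, MCU, clock source).
- KB900x RTSSM Analyzer: screenshot of the real-time state-transition monitoring interface.
- KB900x Logic Analyzer: example of the trigger-condition configuration and signal-sampling interface.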
Switch Dimensions
- Part Number
- PCI-SIG Base Spec: PCI-SIG base specification version
- Lanes: lane count
- Port Count
- Product Brief
- ACS/ARI: Access Control Services / Alternative Routing-ID Interpretation
- DMA: direct memory access
- Dual/Multi Cast: dual cast / multicast
- Latency
- Multi-Root/Multi-Host
- Non-Transparency
- Packaging Size
- Power Typ.: typical power consumption
- Read Pacing: read rate control
- Virtual Channels
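When collecting these dimensions for several vendors, it can help to keep one record per part and track which fields are still unfilled. The following is a minimal sketch; the `SwitchRecord` class, its field names and types, and the `EXAMPLE-SWITCH` part are all hypothetical and do not describe any real product.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SwitchRecord:
    """One row of a PCIe switch comparison; None means 'not yet filled in'."""
    part_number: str
    pcie_base_spec: Optional[str] = None
    lanes: Optional[int] = None
    port_count: Optional[int] = None
    product_brief: Optional[str] = None
    acs_ari: Optional[bool] = None
    dma: Optional[bool] = None
    dual_multi_cast: Optional[bool] = None
    latency_ns: Optional[float] = None
    multi_root: Optional[bool] = None
    non_transparent: Optional[bool] = None
    package_size_mm: Optional[str] = None
    power_typ_w: Optional[float] = None
    read_pacing: Optional[bool] = None
    virtual_channels: Optional[int] = None


def missing_dimensions(record: SwitchRecord) -> List[str]:
    """List the names of the dimensions that have not been filled in yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


if __name__ == "__main__":
    # Hypothetical part with a made-up lane count, purely for illustration.
    row = SwitchRecord(part_number="EXAMPLE-SWITCH", lanes=48)
    print(missing_dimensions(row))
```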
Broadcom
ExpressFabric Switch and Retimer Solutions
Broadcom PCIe Switch: Study Notes
Microchip
Switchtec PCIe Switches
ASMedia
PCIe Switch
Cyan
Others
NVMe All Flash Array (AFA) systems
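According to 青芯 (Cyan), their PCIe 4.0 switches are currently used most in China in two products: storage RAID cards (several x4 downstream ports, with an x8 or x16 upstream port) and dual-GPU cards (one x16 upstream port and two x16 downstream ports).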
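The port widths quoted above imply an oversubscription ratio between downstream and upstream bandwidth. The sketch below computes it under the assumption that every port runs at the same PCIe generation, so lane counts alone determine the ratio. The function name and the choice of eight downstream ports for the RAID card are illustrative, since the source only says "several".

```python
def oversubscription(downstream_ports: int, downstream_width: int,
                     upstream_width: int) -> float:
    """Ratio of total downstream lanes to upstream lanes.

    Assumes all ports run at the same per-lane speed.
    """
    if upstream_width <= 0:
        raise ValueError("upstream_width must be positive")
    return downstream_ports * downstream_width / upstream_width


if __name__ == "__main__":
    # RAID card with a hypothetical eight x4 downstream ports.
    print(oversubscription(8, 4, 8))    # x8 upstream
    print(oversubscription(8, 4, 16))   # x16 upstream
    # Dual-GPU card: two x16 downstream ports behind one x16 upstream port.
    print(oversubscription(2, 16, 16))
```

A ratio above 1 means the downstream devices cannot all reach the host at full rate simultaneously, though device-to-device traffic through the switch is not limited by the upstream port.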
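The core global vendors of PCIe switch chips include Broadcom, Microchip, and Texas Instruments; the top three vendors hold about 80% of the global market. Asia-Pacific is the largest market, with a share of about 75%. By product type, PCIe 3.0 is the largest segment, with a share of about 47%. By downstream application, enterprise is the largest segment, with a share of about 45%.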
AI Industry Notes (2): Compute-in-Memory and PCIe Switch Chips
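- Balance topology: each CPU connects to one PCIe switch chip, and each PCIe switch chip can connect up to 5 GPUs. P2P communication with a remote GPU is limited by the UPI bottleneck between the CPUs. It suits VDI, public cloud, and AI training scenarios, and is currently the mainstream topology.
- Common topology: CPU0 connects to two PCIe switch chips, and each PCIe switch chip connects 4 GPUs. Communication with a remote GPU does not cross CPUs, so GPU P2P throughput is high. It suits P2P-communication-intensive training models in which the CPU handles a larger share of the work, such as ResNet-101/50.
- Cascade topology: CPU0 connects directly to one PCIe switch chip, which is linked to a second PCIe switch chip; each PCIe switch chip connects 4 GPUs. The switch-to-switch link gives the strongest GPU P2P communication, but CPU-to-GPU throughput is low. It suits parameter-intensive P2P training models in which the CPU handles a smaller share of the work, such as VGG-16.
- Dual-uplink topology: each CPU connects to one PCIe switch chip, and each PCIe switch chip connects 4 GPUs. It maximizes CPU utilization and provides the largest uplink bandwidth (two x16 links), but P2P communication with a remote GPU is limited by the UPI bottleneck between the CPUs. It suits VDI, public cloud, and AI training scenarios.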
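The difference between these topologies comes down to which path GPU-to-GPU traffic takes between the two switches. The sketch below models each topology as a parent map and classifies that path. It is a simplified illustration under stated assumptions: the node names (`SW0`, `SW1`, `CPU0`, `CPU1`) and both functions are made up, and uplink width is ignored, so the dual-uplink case has the same tree shape as Balance here.

```python
from typing import Dict, List

# Each topology maps a switch to its parent node; CPUs have no parent.
TOPOLOGIES: Dict[str, Dict[str, str]] = {
    "balance": {"SW0": "CPU0", "SW1": "CPU1"},
    "common": {"SW0": "CPU0", "SW1": "CPU0"},
    "cascade": {"SW0": "CPU0", "SW1": "SW0"},
    # Same tree shape as "balance"; only the uplink width differs.
    "dual_uplink": {"SW0": "CPU0", "SW1": "CPU1"},
}


def path_to_root(parents: Dict[str, str], node: str) -> List[str]:
    """Return the chain of nodes from `node` up to its root CPU."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def p2p_route(topology: str, sw_a: str, sw_b: str) -> str:
    """Classify how traffic between GPUs under two switches is routed."""
    parents = TOPOLOGIES[topology]
    if sw_a == sw_b:
        return "same switch"
    path_a = path_to_root(parents, sw_a)
    path_b = path_to_root(parents, sw_b)
    if sw_a in path_b or sw_b in path_a:
        return "switch-to-switch link"
    if path_a[-1] == path_b[-1]:
        return "through one CPU root complex"
    return "across CPUs over UPI"


if __name__ == "__main__":
    for name in TOPOLOGIES:
        print(f"{name}: {p2p_route(name, 'SW0', 'SW1')}")
```

Only the Balance and dual-uplink cases route remote-GPU traffic over UPI, which matches the bottleneck called out in their descriptions.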