Branch direction predictor
分支方向預測器
A branch direction predictor generates taken/not-taken predictions of the direction of conditional branch instructions. It sits near the front of the processor pipeline, and is responsible for directing instruction fetch down the (hopefully) correct program execution path. A branch direction predictor is usually used with a branch target buffer (BTB), where the BTB predicts the target addresses and the direction predictor chooses whether to branch to the target or keep fetching along the fall-through path.
分支方向預測器會預測條件分支指令方向的執行/不執行情況。它位于處理器流水線前端附近,負責引導指令提取到(希望)正確的程序執行路徑。分支方向預測器通常與分支目標緩沖區 (BTB) 一起使用,其中 BTB 預測目標地址,方向預測器選擇是分支到目標地址還是繼續沿著 fall-through 路徑提取指令。
Sometime later in the pipeline (typically at branch execution or retire), the results of executed branch instructions are sent back to the branch predictor to train it to predict more accurately in the future by observing past branch behaviour. There can also be pipeline flushes when there is a mispredicted branch.
在流水線的某個稍后階段(通常是在分支執行或退出時),已執行分支指令的結果會被發送回分支預測器,以便通過觀察過去的分支行為來訓練它,使其能夠更準確地預測未來的分支。當出現分支預測錯誤時,也可能會觸發流水線刷新。
Branch direction predictor located in the Fetch stage. The branch predictor makes a prediction using the current pc and history register, with the result of the prediction affecting the next pc value. Training and misprediction requests come from later in the pipeline.
分支方向預測器位于提取階段。分支預測器使用當前 pc 和歷史寄存器進行預測,預測結果會影響下一個 pc 值。訓練和錯誤預測請求來自流水線的后期。
For this exercise, the branch direction predictor is assumed to sit in the fetch stage of a hypothetical processor pipeline shown in the diagram on the right. This exercise builds only the branch direction predictor, indicated by the blue dashed rectangle in the diagram.
在本練習中,假設分支方向預測器位于右圖所示的假設處理器流水線的提取階段。本練習僅構建分支方向預測器,圖中藍色虛線矩形表示。
The branch direction prediction is a combinational path: The?pc
?register is used to compute the taken/not-taken prediction, which affects the next-pc multiplexer to determine the value of?pc
?in the next cycle.
分支方向預測是一條組合路徑:?pc
?寄存器用于計算是否被采取的預測,這會影響下一個 pc 多路復用器以確定下一個周期中?pc
?的值。
Conversely, updates to the pattern history table (PHT) and branch history register take effect at the next positive clock edge, as would be expected for state stored in flip-flops.
相反,模式歷史表 (PHT) 和分支歷史寄存器的更新將在下一個正時鐘邊沿生效,正如觸發器中存儲的狀態所預期的那樣。
Gshare predictor??Gshare 預測器
Branch direction predictors are often structured as tables of counters indexed by the program counter and branch history. The table index is a hash of the branch address and history, and tries to give each branch and history combination its own table entry (or at least, reduce the number of collisions). Each table entry contains a two-bit saturating counter to remember the branch direction when the same branch and history pattern executed in the past.
分支方向預測器通常構造為由程序計數器和分支歷史記錄索引的計數器表。表索引是分支地址和歷史記錄的哈希值,并嘗試為每個分支和歷史記錄組合賦予其自己的表條目(或至少減少沖突次數)。每個表條目包含一個兩位飽和計數器,用于記住過去執行相同分支和歷史記錄模式時的分支方向。
One example of this style of predictor is the gshare predictor[1]. In the gshare algorithm, the branch address (pc
) and history bits "share" the table index bits. The basic gshare algorithm computes an N-bit PHT table index by xoring N branch address bits and N global branch history bits together.
這種預測器的一個例子是 gshare 預測器?[1]?。在 gshare 算法中,分支地址 (?pc
?) 和歷史位“共享”表索引位。基本的 gshare 算法通過對 N 個分支地址位和 N 個全局分支歷史位進行異或運算來計算 N 位 PHT 表索引。
The N-bit index is then used to access one entry of a 2N-entry table of two-bit saturating counters. The value of this counter provides the prediction (0 or 1 = not taken, 2 or 3 = taken).
然后使用 N 位索引訪問 2?N 個兩位飽和計數器表中的一項。該計數器的值用于預測(0 或 1 = 未執行,2 或 3 = 執行)。
Training indexes the table in a similar way. The training?pc
?and history are used to compute the table index. Then, the two-bit counter at that index is incremented or decremented depending on the actual outcome of the branch.
訓練以類似的方式對表進行索引。訓練?pc
?和歷史記錄用于計算表索引。然后,根據分支的實際結果,該索引處的兩位計數器會遞增或遞減。
References??參考
- ?S. McFarling, "Combining Branch Predictors",?WRL Technical Note TN-36, Jun. 1993
?S. McFarling,“?合并分支預測器?”,?WRL 技術說明 TN-36,1993?年 6 月
Description??描述
Build a gshare branch predictor with 7-bit?pc
?and 7-bit global history, hashed (using xor) into a 7-bit index. This index accesses a 128-entry table of two-bit saturating counters (similar to?cs450/counter_2bc). The branch predictor should contain a 7-bit global branch history register (similar to?cs450/history_shift).
構建一個 gshare 分支預測器,其 7 位?pc
?和 7 位全局歷史記錄,并通過異或操作將其散列到一個 7 位索引中。該索引訪問一個包含 128 個條目的兩位飽和計數器表(類似于?cs450/counter_2bc)?。?)。分支預測器應包含一個 7 位全局分支歷史寄存器(類似于?cs450/history_shift?)。
The branch predictor has two sets of interfaces: One for doing predictions and one for doing training. The prediction interface is used in the processor's Fetch stage to ask the branch predictor for branch direction predictions for the instructions being fetched. Once these branches proceed down the pipeline and are executed, the true outcomes of the branches become known. The branch predictor is then trained using the actual branch direction outcomes.
分支預測器有兩組接口:一組用于預測,一組用于訓練。預測接口用于處理器的提取階段,用于請求分支預測器對正在提取的指令進行分支方向預測。一旦這些分支指令沿著流水線向下執行,就會知道分支的真實結果。然后,使用實際的分支方向結果對分支預測器進行訓練。
When a branch prediction is requested (predict_valid
?= 1) for a given?pc
, the branch predictor produces the predicted branch direction and state of the branch history register used to make the prediction. The branch history register is then updated (at the next positive clock edge) for the predicted branch.
當針對給定?pc
?請求分支預測(?predict_valid
?= 1)時,分支預測器會生成預測的分支方向以及用于進行預測的分支歷史寄存器的狀態。然后,分支歷史寄存器會在下一個正時鐘沿根據預測的分支進行更新。
When training for a branch is requested (train_valid
?= 1), the branch predictor is told the?pc
?and branch history register value for the branch that is being trained, as well as the actual branch outcome and whether the branch was a misprediction (needing a pipeline flush). Update the pattern history table (PHT) to train the branch predictor to predict this branch more accurately next time. In addition, if the branch being trained is mispredicted, also recover the branch history register to the state immediately after the mispredicting branch completes execution.
當請求訓練分支時 (?train_valid
?= 1),分支預測器會被告知正在訓練的分支的?pc
?和分支歷史寄存器值,以及實際分支結果以及該分支是否為錯誤預測(需要流水線刷新)。更新模式歷史表 (PHT) 以訓練分支預測器,使其下次能夠更準確地預測該分支。此外,如果正在訓練的分支預測錯誤,還會將分支歷史寄存器恢復到錯誤預測分支執行完成后的狀態。
If training for a misprediction and a prediction (for a different, younger instruction) occurs in the same cycle, both operations will want to modify the branch history register. When this happens, training takes precedence, because the branch being predicted will be discarded anyway. If training and prediction of the same PHT entry happen at the same time, the prediction sees the PHT state before training because training only modifies the PHT at the next positive clock edge. The following timing diagram shows the timing when training and predicting PHT entry 0 at the same time. The training request at cycle 4 changes the PHT entry state in cycle 5, but the prediction request in cycle 4 outputs the PHT state at cycle 4, without considering the effect of the training request in cycle 4.
如果在同一周期內發生針對錯誤預測和預測(針對不同的、較新的指令)的訓練,則這兩個操作都需要修改分支歷史寄存器。當這種情況發生時,訓練優先,因為被預測的分支無論如何都會被丟棄。如果對同一 PHT 條目的訓練和預測同時發生,則預測會在訓練之前看到 PHT 狀態,因為訓練只會在下一個時鐘上升沿修改 PHT。下圖顯示了同時訓練和預測 PHT 條目 0 的時序。周期 4 的訓練請求會改變周期 5 的 PHT 條目狀態,但周期 4 的預測請求會輸出周期 4 的 PHT 狀態,而沒有考慮周期 4 訓練請求的影響。
areset
?is an asynchronous reset that clears the entire PHT to 2b'01 (weakly not-taken). It also clears the global history register to 0.
areset
?是一個異步復位,它會將整個 PHT 清零為 2b'01(弱不執行)。它還會將全局歷史寄存器清零。
module top_module#(parameter N = 128,parameter M = 7
)(input clk,input areset,input predict_valid,input [M-1:0] predict_pc,output predict_taken,output [M-1:0] predict_history,input train_valid,input train_taken,input train_mispredicted,input [M-1:0] train_history,input [M-1:0] train_pc
);reg [1:0] PHT [N-1:0];reg [M-1:0] GHR;assign predict_history = GHR;assign predict_taken = PHT[predict_history ^ predict_pc] > 1;integer i;always@(posedge clk or posedge areset) beginif(areset) beginfor (i=0; i<N; i=i+1) PHT[i] <= 1;GHR <= 0;endelse begin//處理GHR,回滾優先if(train_valid & train_mispredicted)beginGHR <= {train_history[M-2:0],train_taken};endelse if(predict_valid) beginGHR <= {GHR[M-2:0],predict_taken}; end//處理PHTif(train_valid) beginif(train_taken) beginPHT[train_history ^ train_pc] <= PHT[train_history ^ train_pc] == 3 ? 3 : PHT[train_history ^ train_pc] + 1;endelse beginPHT[train_history ^ train_pc] <= PHT[train_history ^ train_pc] == 0 ? 0 : PHT[train_history ^ train_pc] - 1;endend endend
endmodule
還想用之前的模塊,但發現越改越麻煩,還是重寫吧。
那么HDLBits就堂堂完結了,希望秋招能找個好工作