Vortex GPGPU: Getting the GitHub Flow Running and Exploring Functional-Module Waveforms (Part 3)

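Contents

  • Preface
  • 1. File structure under ./build/ci
  • 2. Files involved in driver-based simulation
    • 2.1 The blackbox.sh file
    • 2.2 The demo files
    • 2.3 A side dependency: ramulator
      • 2.3.1 A brief introduction to ramulator
      • 2.3.2 How to use ramulator
      • 2.3.3 ramulator output
      • 2.3.4 Reproducing the ramulator results
        • 2.3.4.1 Debugging and verification (Section 4.1)
        • 2.3.4.2 Comparison against other simulators (Section 4.2)
        • 2.3.4.3 Cross-sectional study of DRAM standards (Section 4.3)
      • 2.3.5 Power estimation
      • 2.3.6 Running simple Ramulator commands and reading the input config and output files
    • 2.4 A side dependency: GEM5
      • 2.4.1 A brief introduction to GEM5
      • 2.4.2 The GEM5 source tree
      • 2.4.3 Trying gem5-driven ramulator
  • Summary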


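Preface

The following commands were used earlier to produce the VCD waveform file and the per-instruction CSV file:

# Export the VCD waveform file
./ci/blackbox.sh --driver=rtlsim --app=demo --debug=1
# Export the per-instruction CSV file
./ci/trace_csv.py -trtlsim run.log -otrace_rtlsim.csv

Before looking at the waveform, though, it is worth confirming what blackbox.sh and trace_csv.py actually do: which source files they drive, and which options lead to the waveform file.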


1. File structure under ./build/ci

The files under ./build/ci are:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/ci$ tree
.
├── blackbox.sh
├── datagen.py
├── install_dependencies.sh
├── regression.sh
├── toolchain_env.sh
├── toolchain_install.sh
├── toolchain_prebuilt.sh
├── trace_csv.py
└── travis_run.py

0 directories, 9 files

2. Files involved in driver-based simulation

2.1 The blackbox.sh file

Let's walk through blackbox.sh piece by piece. First come the show_usage() and show_help() functions:

show_usage()
{
    echo "Vortex BlackBox Test Driver v1.0"
    echo "Usage: $0 [[--clusters=#n] [--cores=#n] [--warps=#n] [--threads=#n] [--l2cache] [--l3cache] [[--driver=#name] [--app=#app] [--args=#args] [--debug=#level] [--scope] [--perf=#class] [--rebuild=#n] [--log=logfile] [--help]]"
}

show_help()
{
    show_usage
    echo "  where"
    echo "--driver: gpu, simx, rtlsim, oape, xrt"
    echo "--app: any subfolder test under regression or opencl"
    echo "--class: 0=disable, 1=pipeline, 2=memsys"
    echo "--rebuild: 0=disable, 1=force, 2=auto, 3=temp"
}

Both print the script's basic usage; show_help() calls show_usage() and then lists the accepted option values. A quick test:

# Test show_help()
dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ./ci/blackbox.sh --help
Vortex BlackBox Test Driver v1.0
Usage: ./ci/blackbox.sh [[--clusters=#n] [--cores=#n] [--warps=#n] [--threads=#n] [--l2cache] [--l3cache] [[--driver=#name] [--app=#app] [--args=#args] [--debug=#level] [--scope] [--perf=#class] [--rebuild=#n] [--log=logfile] [--help]]
  where
--driver: gpu, simx, rtlsim, oape, xrt
--app: any subfolder test under regression or opencl
--class: 0=disable, 1=pipeline, 2=memsys
--rebuild: 0=disable, 1=force, 2=auto, 3=temp

# Test show_usage()
dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ./ci/blackbox.sh --invalid-option
Vortex BlackBox Test Driver v1.0
Usage: ./ci/blackbox.sh [[--clusters=#n] [--cores=#n] [--warps=#n] [--threads=#n] [--l2cache] [--l3cache] [[--driver=#name] [--app=#app] [--args=#args] [--debug=#level] [--scope] [--perf=#class] [--rebuild=#n] [--log=logfile] [--help]]

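Next, the variables are initialized:

SCRIPT_DIR=$(dirname "$0")  # directory containing the script
ROOT_DIR=$SCRIPT_DIR/..     # root directory is the parent of the script directory

# The variables below correspond to the options listed by show_help()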
DRIVER=simx
APP=sgemm
CLUSTERS=1
CORES=1
WARPS=4
THREADS=4
L2=
L3=
DEBUG=0
DEBUG_LEVEL=0
SCOPE=0
HAS_ARGS=0
PERF_CLASS=0
REBUILD=2
TEMPBUILD=0
LOGFILE=run.log

Then the arguments that follow ./ci/blackbox.sh are parsed:

for i in "$@"
do
case $i in
    --driver=*)
        DRIVER=${i#*=}
        shift
        ;;
    --app=*)
        APP=${i#*=}
        shift
        ;;
    --clusters=*)
        CLUSTERS=${i#*=}
        shift
        ;;
    --cores=*)
        CORES=${i#*=}
        shift
        ;;
    --warps=*)
        WARPS=${i#*=}
        shift
        ;;
    --threads=*)
        THREADS=${i#*=}
        shift
        ;;
    --l2cache)
        L2=-DL2_ENABLE
        shift
        ;;
    --l3cache)
        L3=-DL3_ENABLE
        shift
        ;;
    --debug=*)
        DEBUG_LEVEL=${i#*=}
        DEBUG=1
        shift
        ;;
    --scope)
        SCOPE=1
        CORES=1
        shift
        ;;
    --perf=*)
        PERF_FLAG=-DPERF_ENABLE
        PERF_CLASS=${i#*=}
        shift
        ;;
    --args=*)
        ARGS=${i#*=}
        HAS_ARGS=1
        shift
        ;;
    --rebuild=*)
        REBUILD=${i#*=}
        shift
        ;;
    --log=*)
        LOGFILE=${i#*=}
        shift
        ;;
    --help)
        show_help
        exit 0
        ;;
    *)
        show_usage
        exit -1
        ;;
esac
done
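Every --key=value branch above relies on the shell parameter expansion ${i#*=}, which removes the shortest prefix matching *= and so leaves whatever follows the first = sign. A minimal sketch of that behaviour in isolation (the helper name opt_value is made up for illustration and is not part of blackbox.sh):

```shell
# opt_value is a hypothetical helper, not part of blackbox.sh.
# It prints the text after the first '=' of a --key=value argument.
opt_value() {
    local arg="$1"
    printf '%s\n' "${arg#*=}"
}

opt_value "--driver=rtlsim"      # prints: rtlsim
opt_value "--args=-n64 -k=3"     # prints: -n64 -k=3 (only the first '=' splits)
```

Because the list in for i in "$@" is expanded once before the loop starts, the shift calls inside the case branches do not change which arguments the loop visits.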

In the parsing above, the --help and * branches correspond to the show_help() and show_usage() tests shown earlier.
Next, the driver path is selected:

case $DRIVER in
    gpu)
        DRIVER_PATH=
        ;;
    simx)
        DRIVER_PATH=$ROOT_DIR/runtime/simx
        ;;
    rtlsim)
        DRIVER_PATH=$ROOT_DIR/runtime/rtlsim
        ;;
    opae)
        DRIVER_PATH=$ROOT_DIR/runtime/opae
        ;;
    xrt)
        DRIVER_PATH=$ROOT_DIR/runtime/xrt
        ;;
    *)
        echo "invalid driver: $DRIVER"
        exit -1
        ;;
esac

The files under ./runtime are:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/runtime$ ls
common.mk                 librtlsim.so          libvortex-opae.so    libvortex.so      libxrtsim.so.obj_dir  rtlsim  vortex_afu.h
libopae-c-sim.so          librtlsim.so.obj_dir  libvortex-rtlsim.so  libvortex-xrt.so  Makefile              simx    xrt
libopae-c-sim.so.obj_dir  libsimx.so            libvortex-simx.so    libxrtsim.so      opae                  stub

Apart from the gpu driver option, the options map neatly onto these directories.
Next, the application path is configured:

if [ -d "$ROOT_DIR/tests/opencl/$APP" ];
then
    APP_PATH=$ROOT_DIR/tests/opencl/$APP
elif [ -d "$ROOT_DIR/tests/regression/$APP" ];
then
    APP_PATH=$ROOT_DIR/tests/regression/$APP
else
    echo "Application folder not found: $APP"
    exit -1
fi

So the test cases are restricted to ./tests/regression and ./tests/opencl. Here are the other folders under ./tests:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests$ ls
kernel  Makefile  opencl  regression  riscv  unittest

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests/regression$ ls
basic  common.mk  conv3x  demo  diverge  dogfood  fence  io_addr  Makefile  matmul  mstress  printf  sgemm2x  sgemmx  sort  stencil3d  vecaddx

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests/opencl$ ls
bfs           common.mk  dotproduct  kmeans  Makefile  oclprintf  psum   sfilter  sgemm2  spmv     transpose
blackscholes  conv3      guassian    lbm     nearn     psort      saxpy  sgemm    sgemm3  stencil  vecadd

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests/kernel$ ls
common.mk  conform  fibonacci  hello  Makefile

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests/unittest$ ls
common.mk  Makefile  vx_malloc

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests/riscv$ ls
benchmarks_32  benchmarks_64  common.mk  isa  Makefile  riscv-vector-tests

Judging by the test names under kernel and riscv, the test cases in those two folders can probably be run as well. Let's compare ./opencl/Makefile with ./regression/Makefile:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests$ diff ./opencl/Makefile ./regression/Makefile 
5,24c5,18
< 	$(MAKE) -C vecadd
< 	$(MAKE) -C sgemm
...  # partially omitted
< 	$(MAKE) -C blackscholes
< 	$(MAKE) -C bfs
---
> 	$(MAKE) -C basic
> 	$(MAKE) -C demo
... # partially omitted
> 	$(MAKE) -C sgemm2x
> 	$(MAKE) -C stencil3d
27,45c21,34
< 	$(MAKE) -C vecadd run-simx
< 	$(MAKE) -C sgemm run-simx
... # partially omitted
< 	$(MAKE) -C blackscholes run-simx
< 	$(MAKE) -C bfs run-simx
---
> 	$(MAKE) -C basic run-simx
> 	$(MAKE) -C demo run-simx
... # partially omitted
> 	$(MAKE) -C sgemm2x run-simx
> 	$(MAKE) -C stencil3d run-simx
48,66c37,50
< 	$(MAKE) -C vecadd run-rtlsim
< 	$(MAKE) -C sgemm run-rtlsim
... # partially omitted
< 	$(MAKE) -C blackscholes run-rtlsim
< 	$(MAKE) -C bfs run-rtlsim
---
> 	$(MAKE) -C basic run-rtlsim
> 	$(MAKE) -C demo run-rtlsim
... # partially omitted
> 	$(MAKE) -C sgemm2x run-rtlsim
> 	$(MAKE) -C stencil3d run-rtlsim
69,88c53,66
< 	$(MAKE) -C vecadd clean
< 	$(MAKE) -C sgemm clean
... # partially omitted
< 	$(MAKE) -C blackscholes clean
< 	$(MAKE) -C bfs clean
\ No newline at end of file
---
> 	$(MAKE) -C basic clean
> 	$(MAKE) -C demo clean
... # partially omitted
> 	$(MAKE) -C sgemm2x clean
> 	$(MAKE) -C stencil3d clean
\ No newline at end of file
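The diff only compares the two top-level suite Makefiles. To check which run-<driver> targets any given Makefile mentions, a small filter is enough. The sketch below is illustrative only: list_run_targets is a made-up helper, and the demo input is a throwaway fragment rather than a real Vortex Makefile. The error messages later in this post point at ../common.mk, so the per-application targets presumably live there, which is worth checking with the same filter.

```shell
# list_run_targets is a hypothetical helper: it prints the distinct
# run-<driver> target names mentioned anywhere in a Makefile.
list_run_targets() {
    grep -oE 'run-[a-z0-9]+' "$1" | sort -u
}

# Self-contained demo on a throwaway Makefile fragment (not the real Vortex file).
demo_mk=$(mktemp)
printf 'run-simx:\n\t$(MAKE) -C vecadd run-simx\nrun-rtlsim:\n\t$(MAKE) -C vecadd run-rtlsim\n' > "$demo_mk"
list_run_targets "$demo_mk"
rm -f "$demo_mk"
```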

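The diff shows that ./opencl/Makefile and ./regression/Makefile only provide the simx and rtlsim modes. ./kernel/Makefile and ./riscv/Makefile likewise only cover simx and rtlsim. My guess was that the opae and xrt options could still be used, and indeed ./ci/blackbox.sh --driver=opae and ./ci/blackbox.sh --driver=xrt run directly and produce results: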

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ./ci/blackbox.sh --driver=opae
Running: make -C ./ci/../runtime/opae > /dev/null
Running: make -C ./ci/../tests/opencl/sgemm run-opae
make: Entering directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'
SCOPE_JSON_PATH=/home/dention/Desktop/vortex/vortex/build/runtime/scope.json OPAE_DRV_PATHS=libopae-c-sim.so LD_LIBRARY_PATH=/home/dention/tools/pocl/lib:/home/dention/Desktop/vortex/vortex/build/runtime:/home/dention/tools/llvm-vortex/lib:/lib/x86_64-linux-gnu/: POCL_VORTEX_XLEN=32 LLVM_PREFIX=/home/dention/tools/llvm-vortex POCL_VORTEX_BINTOOL="OBJCOPY=/home/dention/tools/llvm-vortex/bin/llvm-objcopy /home/dention/Desktop/vortex/vortex/kernel/scripts/vxbin.py" POCL_VORTEX_CFLAGS="-march=rv32imaf -mabi=ilp32f -O3 -mcmodel=medany --sysroot=/home/dention/tools/riscv32-gnu-toolchain/riscv32-unknown-elf --gcc-toolchain=/home/dention/tools/riscv32-gnu-toolchain -fno-rtti -fno-exceptions -nostartfiles -nostdlib -fdata-sections -ffunction-sections -I/home/dention/Desktop/vortex/vortex/build/hw -I/home/dention/Desktop/vortex/vortex/kernel/include -DXLEN_32 -DNDEBUG -Xclang -target-feature -Xclang +vortex -Xclang -target-feature -Xclang +zicond -mllvm -disable-loop-idiom-all	" POCL_VORTEX_LDFLAGS="-Wl,-Bstatic,--gc-sections,-T/home/dention/Desktop/vortex/vortex/kernel/scripts/link32.ld,--defsym=STARTUP_ADDR=0x80000000 /home/dention/Desktop/vortex/vortex/build/kernel/libvortex.a -L/home/dention/tools/libc32/lib -lm -lc /home/dention/tools/libcrt32/lib/baremetal/libclang_rt.builtins-riscv32.a" VORTEX_DRIVER=opae ./sgemm -n32
Workload size=32
CONFIGS: num_threads=4, num_warps=4, num_cores=1, num_clusters=1, socket_size=1, local_mem_base=0xffff0000, num_barriers=2
Create context
Create program from kernel source
Upload source buffers
Execute the kernel
Elapsed time: 14155 ms
Download destination buffer
Verify result
PASSED!
PERF: instrs=289393, cycles=159694, IPC=1.812172
make: Leaving directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ./ci/blackbox.sh --driver=xrt
Running: make -C ./ci/../runtime/xrt > /dev/null
Running: make -C ./ci/../tests/opencl/sgemm run-xrt
make: Entering directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'
SCOPE_JSON_PATH=/home/dention/Desktop/vortex/vortex/build/runtime/scope.json LD_LIBRARY_PATH=/lib:/home/dention/tools/pocl/lib:/home/dention/Desktop/vortex/vortex/build/runtime:/home/dention/tools/llvm-vortex/lib:/lib/x86_64-linux-gnu/: POCL_VORTEX_XLEN=32 LLVM_PREFIX=/home/dention/tools/llvm-vortex POCL_VORTEX_BINTOOL="OBJCOPY=/home/dention/tools/llvm-vortex/bin/llvm-objcopy /home/dention/Desktop/vortex/vortex/kernel/scripts/vxbin.py" POCL_VORTEX_CFLAGS="-march=rv32imaf -mabi=ilp32f -O3 -mcmodel=medany --sysroot=/home/dention/tools/riscv32-gnu-toolchain/riscv32-unknown-elf --gcc-toolchain=/home/dention/tools/riscv32-gnu-toolchain -fno-rtti -fno-exceptions -nostartfiles -nostdlib -fdata-sections -ffunction-sections -I/home/dention/Desktop/vortex/vortex/build/hw -I/home/dention/Desktop/vortex/vortex/kernel/include -DXLEN_32 -DNDEBUG -Xclang -target-feature -Xclang +vortex -Xclang -target-feature -Xclang +zicond -mllvm -disable-loop-idiom-all	" POCL_VORTEX_LDFLAGS="-Wl,-Bstatic,--gc-sections,-T/home/dention/Desktop/vortex/vortex/kernel/scripts/link32.ld,--defsym=STARTUP_ADDR=0x80000000 /home/dention/Desktop/vortex/vortex/build/kernel/libvortex.a -L/home/dention/tools/libc32/lib -lm -lc /home/dention/tools/libcrt32/lib/baremetal/libclang_rt.builtins-riscv32.a" VORTEX_DRIVER=xrt ./sgemm -n32
Workload size=32
CONFIGS: num_threads=4, num_warps=4, num_cores=1, num_clusters=1, socket_size=1, local_mem_base=0xffff0000, num_barriers=2
info: device name=vortex_xrtsim, memory_capacity=0x100000000 bytes, memory_banks=2.
Create context
Create program from kernel source
Upload source buffers
allocating bank0...
reusing bank0...
Execute the kernel
reusing bank0...
reusing bank0...
allocating bank1...
Elapsed time: 12817 ms
Download destination buffer
Verify result
PASSED!
freeing bank0...
freeing bank1...
allocating bank0...
PERF: instrs=289393, cycles=159485, IPC=1.814547
make: Leaving directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'

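(Interestingly, the two runs report different cycle counts, 159694 for opae versus 159485 for xrt, with the same instruction count. One more reason to dig deeper later.)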
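To compare such runs without reading the logs by eye, the PERF summary line can be picked apart with sed. This is only a sketch: perf_field is a hypothetical helper, and it assumes the "PERF: instrs=..., cycles=..., IPC=..." layout seen in the two logs above.

```shell
# perf_field is a hypothetical helper: it reads a log on stdin and prints the
# value of one field (instrs, cycles or IPC) from the "PERF:" summary line.
perf_field() {
    sed -n "s/^PERF:.*[ ,]$1=\([0-9.]*\).*/\1/p"
}

# The two summary lines copied from the opae and xrt runs above.
opae_line='PERF: instrs=289393, cycles=159694, IPC=1.812172'
xrt_line='PERF: instrs=289393, cycles=159485, IPC=1.814547'

opae_cycles=$(printf '%s\n' "$opae_line" | perf_field cycles)
xrt_cycles=$(printf '%s\n' "$xrt_line" | perf_field cycles)
echo "cycle difference: $((opae_cycles - xrt_cycles))"   # prints: cycle difference: 209
```

With real runs, the same helper can read the saved log instead, for example perf_field cycles < run.log.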

In addition, the following block could be extended:

if [ -d "$ROOT_DIR/tests/opencl/$APP" ];
then
    APP_PATH=$ROOT_DIR/tests/opencl/$APP
elif [ -d "$ROOT_DIR/tests/regression/$APP" ];
then
    APP_PATH=$ROOT_DIR/tests/regression/$APP
else
    echo "Application folder not found: $APP"
    exit -1
fi
  ||
  ||
  \/
## Add kernel and riscv alongside opencl and regression
# APPs available under kernel: conform hello fibonacci
# APPs available under riscv: benchmarks_${XLEN}
# APPs available under opencl: bfs dotproduct kmeans oclprintf psum sfilter sgemm2 spmv transpose blackscholes conv3 guassian lbm nearn psort saxpy sgemm sgemm3 stencil vecadd
# APPs available under regression: basic conv3x demo diverge dogfood fence io_addr matmul mstress printf sgemm2x sgemmx sort stencil3d vecaddx

# Default settings: DRIVER=simx  APP=sgemm
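One way the extension suggested above might look, as a sketch only: loop over the candidate suites instead of chaining if/elif. The function name find_app_path is made up, and whether the kernel and riscv Makefiles expose run-$DRIVER targets that blackbox.sh can drive has not been verified here.

```shell
# find_app_path is a hypothetical replacement for the if/elif chain; the extra
# "kernel" and "riscv" entries are the untested extension suggested above.
# Usage: find_app_path ROOT_DIR APP
find_app_path() {
    local root="$1" app="$2" suite
    for suite in opencl regression kernel riscv; do
        if [ -d "$root/tests/$suite/$app" ]; then
            printf '%s\n' "$root/tests/$suite/$app"
            return 0
        fi
    done
    echo "Application folder not found: $app" >&2
    return 1
}

# Demo against a throwaway directory tree.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/tests/kernel/hello"
find_app_path "$demo_root" hello
find_app_path "$demo_root" nosuch 2>/dev/null || echo "lookup failed as expected"
rm -rf "$demo_root"
```

In the script itself this would be used as APP_PATH=$(find_app_path "$ROOT_DIR" "$APP") || exit 1, keeping the original search order for opencl and regression.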

Next comes running the application. The gpu driver is handled first and exits early:

if [ "$DRIVER" = "gpu" ];
then
    # running application
    if [ $HAS_ARGS -eq 1 ]
    then
        echo "running: OPTS=$ARGS make -C $APP_PATH run-$DRIVER"
        OPTS=$ARGS make -C $APP_PATH run-$DRIVER
        status=$?
    else
        echo "running: make -C $APP_PATH run-$DRIVER"
        make -C $APP_PATH run-$DRIVER
        status=$?
    fi
    exit $status
fi

A quick test along the way: my laptop has no NVIDIA GPU, but I tried it anyway, and unsurprisingly it fell over:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ./ci/blackbox.sh --driver=gpu 
Running: make -C ./ci/../tests/opencl/sgemm run-gpu
make: Entering directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'
g++ -std=c++17 -Wall -Wextra -Wfatal-errors -Wno-deprecated-declarations -Wno-unused-parameter -Wno-narrowing -pthread -I/home/dention/tools/pocl/include -O2 -DNDEBUG main.cc.o -Wl,-rpath,/home/dention/tools/llvm-vortex/lib -lOpenCL -o sgemm.host
/usr/bin/ld: cannot find -lOpenCL: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [../common.mk:88: sgemm.host] Error 1
make: Leaving directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'

The error says the linker cannot find the library for -lOpenCL, so let's check:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ldconfig -p | grep OpenCL
libOpenCL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libOpenCL.so.1

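So a runtime library exists but the link still fails. Note that only the versioned libOpenCL.so.1 is listed, while -lOpenCL needs an unversioned libOpenCL.so (or a static archive), which normally comes from a development package. I also suspected that pkg-config was not set up for OpenCL: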

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ pkg-config --libs OpenCL
Package OpenCL was not found in the pkg-config search path.
Perhaps you should add the directory containing `OpenCL.pc'
to the PKG_CONFIG_PATH environment variable
No package 'OpenCL' found

So the next step is:

sudo apt-get install pkg-config opencl-headers ocl-icd-opencl-dev
export LD_LIBRARY_PATH=/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
export LDFLAGS="-L/lib/x86_64-linux-gnu/ -lOpenCL"
export CPPFLAGS="-I/home/dention/tools/pocl/include"

Then:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build$ ./ci/blackbox.sh --driver=gpu 
Running: make -C ./ci/../tests/opencl/sgemm run-gpu
make: Entering directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'
./sgemm.host -n32
Workload size=32
OpenCL Error: 'clGetPlatformIDs(1, &platform_id, NULL)' returned -1001!
make: *** [../common.mk:91: run-gpu] Error 255
make: Leaving directory '/home/dention/Desktop/vortex/vortex/build/tests/opencl/sgemm'

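The program now gets as far as printing Workload size=32, but clGetPlatformIDs returns -1001 (CL_PLATFORM_NOT_FOUND_KHR), meaning the OpenCL loader found no platform at all. Most likely there is no GPU OpenCL driver on this machine, hence the error. If you have suitable hardware, give it a try.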
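A quick way to see why the loader reports no platform is to look for ICD registration files. The sketch below assumes an ocl-icd style installation that registers vendors through .icd files in /etc/OpenCL/vendors; that path is an assumption about this machine, not something the log confirms, and count_icds is a made-up helper.

```shell
# count_icds is a hypothetical helper: it counts the *.icd registration files
# in a vendors directory (default path is an assumption, see the text above).
count_icds() {
    local dir="${1:-/etc/OpenCL/vendors}" n=0 f
    for f in "$dir"/*.icd; do
        # The unmatched glob stays literal when the directory is empty or missing.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "ICD files found: $(count_icds)"
```

A count of zero would be consistent with the -1001 return code; the clinfo utility, where installed, gives a more detailed view.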

Next come configuration and build:

CONFIGS="-DNUM_CLUSTERS=$CLUSTERS -DNUM_CORES=$CORES -DNUM_WARPS=$WARPS -DNUM_THREADS=$THREADS $L2 $L3 $PERF_FLAG $CONFIGS"
echo "CONFIGS=$CONFIGS"

if [ $REBUILD -ne 0 ]
then
    BLACKBOX_CACHE=blackbox.$DRIVER.cache
    if [ -f "$BLACKBOX_CACHE" ]
    then
        LAST_CONFIGS=`cat $BLACKBOX_CACHE`
    fi

    if [ $REBUILD -eq 1 ] || [ "$CONFIGS+$DEBUG+$SCOPE" != "$LAST_CONFIGS" ];
    then
        make -C $DRIVER_PATH clean-driver > /dev/null
        echo "$CONFIGS+$DEBUG+$SCOPE" > $BLACKBOX_CACHE
    fi
fi

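Based on the user-specified configuration (number of clusters, cores, and so on), the script builds the CONFIGS string. If REBUILD is non-zero, it checks whether the driver needs rebuilding: the string CONFIGS+DEBUG+SCOPE is compared with the one cached in blackbox.$DRIVER.cache, and if it has changed, or the user forces a rebuild (REBUILD=1), the script runs clean-driver and updates the cache so that the driver is rebuilt from scratch in the next step.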
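The cache check can be tried out in isolation. The sketch below mirrors the logic just described with a made-up helper, needs_clean; it does not call make, and the key strings are illustrative values in the same CONFIGS+DEBUG+SCOPE layout.

```shell
# needs_clean is a hypothetical helper mirroring the cache check above.
# Usage: needs_clean CACHE_FILE KEY FORCE
# Returns 0 (and updates the cache) when a clean is required, 1 otherwise.
needs_clean() {
    local cache="$1" key="$2" force="$3" last=""
    if [ -f "$cache" ]; then
        last=$(cat "$cache")
    fi
    if [ "$force" -eq 1 ] || [ "$key" != "$last" ]; then
        printf '%s\n' "$key" > "$cache"
        return 0
    fi
    return 1
}

demo_cache=$(mktemp)
demo_key="-DNUM_CORES=1 -DNUM_WARPS=4+0+0"
needs_clean "$demo_cache" "$demo_key" 0 && echo "first run: clean"
needs_clean "$demo_cache" "$demo_key" 0 || echo "same config: reuse build"
needs_clean "$demo_cache" "-DNUM_CORES=2 -DNUM_WARPS=4+0+0" 0 && echo "config changed: clean"
needs_clean "$demo_cache" "-DNUM_CORES=2 -DNUM_WARPS=4+0+0" 1 && echo "forced: clean"
rm -f "$demo_cache"
```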

Next comes the driver build and the application run:

if [ $DEBUG -ne 0 ]
then
    # running application
    if [ $TEMPBUILD -eq 1 ]
    then
        # setup temp directory
        TEMPDIR=$(mktemp -d)
        mkdir -p "$TEMPDIR/$DRIVER"

        # driver initialization
        if [ $SCOPE -eq 1 ]
        then
            echo "running: DESTDIR=$TEMPDIR/$DRIVER DEBUG=$DEBUG_LEVEL SCOPE=1 CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            DESTDIR="$TEMPDIR/$DRIVER" DEBUG=$DEBUG_LEVEL SCOPE=1 CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        else
            echo "running: DESTDIR=$TEMPDIR/$DRIVER DEBUG=$DEBUG_LEVEL CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            DESTDIR="$TEMPDIR/$DRIVER" DEBUG=$DEBUG_LEVEL CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        fi

        # running application
        if [ $HAS_ARGS -eq 1 ]
        then
            echo "running: VORTEX_RT_PATH=$TEMPDIR OPTS=$ARGS make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1"
            DEBUG=1 VORTEX_RT_PATH=$TEMPDIR OPTS=$ARGS make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1
            status=$?
        else
            echo "running: VORTEX_RT_PATH=$TEMPDIR make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1"
            DEBUG=1 VORTEX_RT_PATH=$TEMPDIR make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1
            status=$?
        fi

        # cleanup temp directory
        trap "rm -rf $TEMPDIR" EXIT
    else
        # driver initialization
        if [ $SCOPE -eq 1 ]
        then
            echo "running: DEBUG=$DEBUG_LEVEL SCOPE=1 CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            DEBUG=$DEBUG_LEVEL SCOPE=1 CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        else
            echo "running: DEBUG=$DEBUG_LEVEL CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            DEBUG=$DEBUG_LEVEL CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        fi

        # running application
        if [ $HAS_ARGS -eq 1 ]
        then
            echo "running: OPTS=$ARGS make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1"
            DEBUG=1 OPTS=$ARGS make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1
            status=$?
        else
            echo "running: make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1"
            DEBUG=1 make -C $APP_PATH run-$DRIVER > $LOGFILE 2>&1
            status=$?
        fi
    fi

    if [ -f "$APP_PATH/trace.vcd" ]
    then
        mv -f $APP_PATH/trace.vcd .
    fi
else
    if [ $TEMPBUILD -eq 1 ]
    then
        # setup temp directory
        TEMPDIR=$(mktemp -d)
        mkdir -p "$TEMPDIR/$DRIVER"

        # driver initialization
        if [ $SCOPE -eq 1 ]
        then
            echo "running: DESTDIR=$TEMPDIR/$DRIVER SCOPE=1 CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            DESTDIR="$TEMPDIR/$DRIVER" SCOPE=1 CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        else
            echo "running: DESTDIR=$TEMPDIR/$DRIVER CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            DESTDIR="$TEMPDIR/$DRIVER" CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        fi

        # running application
        if [ $HAS_ARGS -eq 1 ]
        then
            echo "running: VORTEX_RT_PATH=$TEMPDIR OPTS=$ARGS make -C $APP_PATH run-$DRIVER"
            VORTEX_RT_PATH=$TEMPDIR OPTS=$ARGS make -C $APP_PATH run-$DRIVER
            status=$?
        else
            echo "running: VORTEX_RT_PATH=$TEMPDIR make -C $APP_PATH run-$DRIVER"
            VORTEX_RT_PATH=$TEMPDIR make -C $APP_PATH run-$DRIVER
            status=$?
        fi

        # cleanup temp directory
        trap "rm -rf $TEMPDIR" EXIT
    else
        # driver initialization
        if [ $SCOPE -eq 1 ]
        then
            echo "running: SCOPE=1 CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            SCOPE=1 CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        else
            echo "running: CONFIGS=$CONFIGS make -C $DRIVER_PATH"
            CONFIGS="$CONFIGS" make -C $DRIVER_PATH > /dev/null
        fi

        # running application
        if [ $HAS_ARGS -eq 1 ]
        then
            echo "running: OPTS=$ARGS make -C $APP_PATH run-$DRIVER"
            OPTS=$ARGS make -C $APP_PATH run-$DRIVER
            status=$?
        else
            echo "running: make -C $APP_PATH run-$DRIVER"
            make -C $APP_PATH run-$DRIVER
            status=$?
        fi
    fi
fi

exit $status

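Its behaviour boils down to three points:

1. With debug mode enabled (DEBUG=1), the script builds the driver with the chosen debug level, runs the application, saves the output to the log file, and moves any trace.vcd into the current directory.
2. With a temporary build enabled (TEMPBUILD=1), the script builds and runs from a temporary directory and removes that directory when it exits.
3. With debug mode disabled, the script simply runs the application and prints to the terminal.

With that, blackbox.sh has turned from a black box into a white box, so let's wrap up this part.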

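2.2 The demo files

These files are tied directly to the waveform, or rather to the instructions, so there is no way around them if we want to understand the RTL code.
Before compilation, ./demo contains: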

dention@dention-virtual-machine:~/Desktop/vortex/vortex/tests/regression/demo$ tree
.
├── common.h
├── kernel.cpp
├── main.cpp
└── Makefile

0 directories, 4 files

After compilation, ./demo contains:

dention@dention-virtual-machine:~/Desktop/vortex/vortex/build/tests/regression/demo$ tree
.
├── demo
├── kernel.dump
├── kernel.elf
├── kernel.vxbin
├── Makefile
├── ramulator.stats.log
└── trace
    ├── ramulator.log.ch0
    └── ramulator.log.ch1

1 directory, 8 files

Judging by its name, and by item 1 of Section 2.3.3, ramulator.stats.log is the log file written by ramulator. Its content is:

Frontend:
  impl: GEM5

MemorySystem:
  impl: GenericDRAM
  total_num_other_requests: 0
  total_num_write_requests: 7372
  total_num_read_requests: 1480
  memory_system_cycles: 22034
  DRAM:
    impl: HBM2
  AddrMapper:
    impl: RoBaRaCoCh
  Controller:
    impl: Generic
    id: Channel 0
    avg_read_latency_0: 60.5959473
    read_queue_len_avg_0: 1.16075158
    write_queue_len_0: 340590
    queue_len_0: 366166
    num_other_reqs_0: 0
    num_write_reqs_0: 5508
    read_latency_0: 44841
    priority_queue_len_avg_0: 0
    row_hits_0: 4367
    priority_queue_len_0: 0
    row_misses_0: 21
    row_conflicts_0: 36
    read_row_misses_0: 4
    queue_len_avg_0: 16.618227
    read_row_conflicts_core_0: 22
    read_row_hits_0: 712
    write_queue_len_avg_0: 15.4574747
    read_row_conflicts_0: 22
    write_row_misses_0: 17
    write_row_conflicts_0: 14
    read_queue_len_0: 25576
    write_row_hits_0: 3655
    read_row_hits_core_0: 712
    read_row_misses_core_0: 4
    num_read_reqs_0: 740
    Scheduler:
      impl: FRFCFS
    RefreshManager:
      impl: AllBank
    RowPolicy:
      impl: OpenRowPolicy
    ControllerPlugin:
      impl: TraceRecorder
  Controller:
    impl: Generic
    id: Channel 1
    read_queue_len_avg_1: 0.92838341
    priority_queue_len_1: 0
    write_queue_len_1: 336405
    num_write_reqs_1: 6379
    row_hits_1: 4335
    row_misses_1: 33
    avg_read_latency_1: 53.7567558
    queue_len_avg_1: 16.1959248
    read_queue_len_1: 20456
    read_row_misses_1: 2
    priority_queue_len_avg_1: 0
    read_row_hits_1: 703
    queue_len_1: 356861
    read_row_conflicts_1: 33
    num_read_reqs_1: 740
    num_other_reqs_1: 0
    row_conflicts_1: 56
    write_row_hits_1: 3632
    write_queue_len_avg_1: 15.2675409
    write_row_misses_1: 31
    write_row_conflicts_1: 23
    read_row_hits_core_0: 703
    read_row_misses_core_0: 2
    read_latency_1: 39780
    read_row_conflicts_core_0: 33
    Scheduler:
      impl: FRFCFS
    RefreshManager:
      impl: AllBank
    RowPolicy:
      impl: OpenRowPolicy
    ControllerPlugin:
      impl: TraceRecorder

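Judging by the output alone, this differs from the log output shown in Section 2.3.6, but going by the trace results it is close enough; presumably some script processes it further.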
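The counters in this log are easy to post-process. As a sketch, the helpers below pull one "key: value" entry out of such a file and compute a per-channel row-buffer hit rate as hits / (hits + misses + conflicts), which is one common definition rather than something ramulator itself reports here. The names stat_value and row_hit_rate are made up, and the parser assumes one entry per line.

```shell
# stat_value is a hypothetical helper: print the value of the first
# "key: value" line whose key matches exactly (leading indentation is ignored).
# Usage: stat_value KEY FILE
stat_value() {
    awk -F': *' -v k="$1" '{ key = $1; sub(/^[ \t]+/, "", key) } key == k { print $2; exit }' "$2"
}

# Row-buffer hit rate for one channel: hits / (hits + misses + conflicts).
# Usage: row_hit_rate CHANNEL FILE
row_hit_rate() {
    local h m c
    h=$(stat_value "row_hits_$1" "$2")
    m=$(stat_value "row_misses_$1" "$2")
    c=$(stat_value "row_conflicts_$1" "$2")
    awk -v h="$h" -v m="$m" -v c="$c" 'BEGIN { printf "%.4f\n", h / (h + m + c) }'
}

# Demo on the three channel-0 counters quoted above.
demo_stats=$(mktemp)
printf '    row_hits_0: 4367\n    row_misses_0: 21\n    row_conflicts_0: 36\n' > "$demo_stats"
row_hit_rate 0 "$demo_stats"
rm -f "$demo_stats"
```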

Next, kernel.dump. It is long, so here is just an excerpt:

kernel.elf:	file format elf32-littleriscv

Disassembly of section .init:

80000000 <_start>:
80000000: f3 22 10 fc  	csrr	t0, nw
80000004: 17 03 00 00  	auipc	t1, 0x0
80000008: 13 03 c3 15  	addi	t1, t1, 0x15c
8000000c: 0b 90 62 00  	vx_wspawn	t0, t1
80000010: 93 02 f0 ff  	li	t0, -0x1
80000014: 0b 80 02 00  	vx_tmc	t0
80000018: ef 00 80 11  	jal	0x80000130 <init_regs>

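This should look familiar by now. Part of trace_rtlsim.csv looks like this:
[Figure: excerpt of trace_rtlsim.csv]
The two match up.

Now look at ramulator.log.ch0 and ramulator.log.ch1:
[Figure: contents of ramulator.log.ch0]
[Figure: contents of ramulator.log.ch1]
The traces of the two HBM2 channels differ. Judging by the first number on each line, it looks like ping-pong (alternating) reads, but the exact meaning needs a closer look and is worth digging into.

What remains is the role of the input files listed below. That goes into the next post, since this one is already long.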

├── common.h
├── kernel.cpp
├── main.cpp
└── Makefile

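2.3 A side dependency: ramulator

2.3.1 A brief introduction to ramulator

Here I only try ramulator out, without studying how this black box works internally, and look at the inputs and outputs of its test cases and what they mean: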

# Download ramulator
proxychains4 git clone --recursive https://github.com/CMU-SAFARI/ramulator.git
# Build
cd ramulator
make -j8

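According to the official description (translated more or less verbatim): an updated version of Ramulator, called Ramulator 2.0, was released in August 2023. Ramulator 2.0 is easier to use, extend, and modify, and it supports the DRAM standards that were the latest at the time (e.g., DDR5, LPDDR5, HBM3, GDDR6). Ramulator is a fast and cycle-accurate DRAM simulator that supports a wide range of commercial and academic DRAM standards: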

DDR3 (2007), DDR4 (2012)
LPDDR3 (2012), LPDDR4 (2014)
GDDR5 (2009)
WIO (2011), WIO2 (2014)
HBM (2013)

The initial release of Ramulator is described in the following paper:

Y. Kim, W. Yang, O. Mutlu. "Ramulator: A Fast and Extensible DRAM Simulator". In IEEE Computer Architecture Letters, March 2015.

For information on the new features, and an extensive memory characterization performed with Ramulator, see:

S. Ghose, T. Li, N. Hajinazar, D. Senol Cali, O. Mutlu. "Demystifying Complex Workload–DRAM Interactions: An Experimental Study". In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), June 2019 (slides). In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2019.

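2.3.2 How to use ramulator

Ramulator supports three usage modes, plus one supplementary feature.
1. Memory trace driven: Ramulator reads memory traces directly from a file and simulates only the DRAM subsystem. Each line of the trace file represents one memory request: a hexadecimal address followed by 'R' or 'W' for read or write.
The official example: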

$ cd ramulator
$ make -j
$ ./ramulator configs/DDR3-config.cfg --mode=dram dram.trace
Simulation done. Statistics written to DDR3.stats
# NOTE: dram.trace is a very short trace file provided only as an example.
$ ./ramulator configs/DDR3-config.cfg --mode=dram --stats my_output.txt dram.trace
Simulation done. Statistics written to my_output.txt
# NOTE: optional --stats flag changes the statistics output filename

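2. CPU trace driven: Ramulator reads instruction traces directly from a file and simulates a simplified "core" model that generates memory requests to the DRAM subsystem. Each line of the trace file represents one memory request and takes one of the following two formats.

<num-cpuinst> <addr-read>: for lines with two tokens, the first token is the number of CPU (i.e. non-memory) instructions before the memory request, and the second token is the decimal address of the read.
<num-cpuinst> <addr-read> <addr-writeback>: for lines with three tokens, the third token is the decimal address of a writeback request, i.e. the dirty cache-line eviction caused by the read request before it.

The official example: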

$ cd ramulator
$ make -j
$ ./ramulator configs/DDR3-config.cfg --mode=cpu cpu.trace
Simulation done. Statistics written to DDR3.stats
# NOTE: cpu.trace is a very short trace file provided only as an example.
$ ./ramulator configs/DDR3-config.cfg --mode=cpu --stats my_output.txt cpu.trace
Simulation done. Statistics written to my_output.txt
# NOTE: optional --stats flag changes the statistics output filename

3. gem5 driven: Ramulator runs as part of a full-system simulator (gem5), from which it receives the generated memory requests.
The official example:

$ hg clone http://repo.gem5.org/gem5-stable
$ cd gem5-stable
$ hg update -c 10231  # Revert to stable version from 5/31/2014 (10231:0e86fac7254c)
$ patch -Np1 --ignore-whitespace < /path/to/ramulator/gem5-0e86fac7254c-ramulator.patch
$ cd ext/ramulator
$ mkdir Ramulator
$ cp -r /path/to/ramulator/src Ramulator
# Compile gem5
# Run gem5 with `--mem-type=ramulator` and `--ramulator-config=configs/DDR3-config.cfg`

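By default, gem5 uses the atomic CPU with atomic memory accesses, which means a detailed memory model such as Ramulator is not actually used. To run gem5 in timing mode, specify a CPU type with the --cpu-type command-line option, for example --cpu-type=timing.

4. For some DRAM standards, Ramulator can also report power consumption by relying on VAMPIRE or DRAMPower as a backend.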

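2.3.3 ramulator output

Ramulator reports a series of statistics for every run, and these are written to a file. Statistics.h provides a set of gem5-compatible statistics classes.

1. Memory trace / CPU trace driven: in these two modes Ramulator writes the statistics to a file. By default the filename is <standard_name>.stats (e.g., DDR3.stats). To write the statistics to a different file, add --stats <filename> on the command line after the --mode option.

Note the default filename rule in this item, <standard_name>.stats (e.g., DDR3.stats): it is a key clue.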

2. gem5 driven: Ramulator automatically integrates its statistics into gem5. They are written directly into gem5's statistics file, and each statistic name is prefixed with ramulator.
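The default naming rule from item 1 can be mimicked with a few lines of shell. This sketch only follows the documented rule (<standard_name>.stats) by reading the standard key from a config file in the key = value style of DDR3-config.cfg; it does not reproduce how ramulator builds the name internally, and default_stats_name is a made-up helper.

```shell
# default_stats_name is a hypothetical helper: it reads "standard = <name>"
# from a ramulator-style config file and prints "<name>.stats".
default_stats_name() {
    awk -F'=' '/^[ \t]*standard[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 ".stats"; exit }' "$1"
}

# Demo on a throwaway config fragment.
demo_cfg=$(mktemp)
printf '# Example config file\n standard = DDR3\n channels = 1\n' > "$demo_cfg"
default_stats_name "$demo_cfg"     # prints: DDR3.stats
rm -f "$demo_cfg"
```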

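2.3.4 Reproducing the ramulator results

2.3.4.1 Debugging and verification (Section 4.1)

For debugging and verification, Ramulator can print a trace of every DRAM command it issues, together with the address and timing information. To do so, enable the print_cmd_trace variable in the configuration file.

2.3.4.2 Comparison against other simulators (Section 4.2)

To compare Ramulator against other DRAM simulators, a script is provided that automates the process: test_ddr3.py. Before running it, however, you must specify the locations of their executables and configuration files at the indicated lines of the script's source: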

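Ramulator
DRAMSim2 (https://wiki.umd.edu/DRAMSim2): test_ddr3.py lines 39-40
USIMM (http://www.cs.utah.edu/~rajeev/jwac12): test_ddr3.py lines 54-55
DrSim (http://lph.ece.utexas.edu/public/Main/DrSim): test_ddr3.py lines 66-67
NVMain (http://wiki.nvmain.org): test_ddr3.py lines 78-79

All five simulators were configured with the same parameters:

DDR3-1600K (11-11-11), 1 channel, 1 rank, 2Gb x8 chips
FR-FCFS scheduling
Open-row policy
32/32-entry read/write queues
High/low watermarks for the write queue: 28/16

Finally, run test_ddr3.py <num-requests> to start the simulation. Make sure no other processes are active during the simulation, so that the memory-usage and CPU-time measurements are accurate.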

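2.3.4.3 Cross-sectional study of DRAM standards (Section 4.3)

Use the CPU traces (SPEC 2006) provided in the cputraces folder to run CPU trace driven simulations.

2.3.5 Power estimation

To estimate power consumption, Ramulator can record a trace of every DRAM command it issues to a file in DRAMPower format. To do so, enable the record_cmd_trace variable in the configuration file. The resulting DRAM command trace (e.g., cmd-trace-chan-N-rank-M.cmdtrace) should be fed into a compatible DRAM power simulator (such as VAMPIRE or DRAMPower) with the correct configuration (standard/speed/organization) to estimate the energy/power of a single rank (a current limitation of both VAMPIRE and DRAMPower).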
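Switching record_cmd_trace (or print_cmd_trace) on means editing one key = value line in the config file. As a sketch, the made-up helper set_cfg below prints a modified copy of a config with one key changed; it assumes the key = value layout of DDR3-config.cfg and leaves comment lines that merely mention the key untouched.

```shell
# set_cfg is a hypothetical helper: print FILE with "KEY = ..." replaced by
# "KEY = VALUE". Commented-out lines are not matched because of the ^ anchor.
# Usage: set_cfg KEY VALUE FILE
set_cfg() {
    sed -E "s/^([[:space:]]*$1[[:space:]]*=[[:space:]]*).*/\1$2/" "$3"
}

# Demo on a throwaway config fragment.
demo_conf=$(mktemp)
printf '# record_cmd_trace: (default is off): on, off\n record_cmd_trace = off\n' > "$demo_conf"
set_cfg record_cmd_trace on "$demo_conf"
rm -f "$demo_conf"
```

Redirecting the output to a new file (for example set_cfg record_cmd_trace on configs/DDR3-config.cfg > my-config.cfg) keeps the shipped config intact.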

2.3.6 Running simple Ramulator commands and reading the input config and output files

The commands I ran:

# Memory trace driven mode
dention@dention-virtual-machine:~/Desktop/ramulator$ ./ramulator configs/DDR3-config.cfg --mode=dram --stats my_output_DDR3_mode_dram.txt dram.trace
Simulation done. Statistics written to my_output_DDR3_mode_dram.txt

# CPU trace driven mode
dention@dention-virtual-machine:~/Desktop/ramulator$ ./ramulator configs/DDR3-config.cfg --mode=cpu --stats my_output_DDR3_mode_cpu.txt cpu.trace
tracenum: 1
trace_list[0]: cpu.trace
Warmup complete! Resetting stats...
Starting the simulation...
CPU heartbeat, cycles: 50000000 
CPU heartbeat, cycles: 100000000 
CPU heartbeat, cycles: 150000000 
CPU heartbeat, cycles: 200000000 
CPU heartbeat, cycles: 250000000 
CPU heartbeat, cycles: 300000000 
CPU heartbeat, cycles: 350000000 
CPU heartbeat, cycles: 400000000 
CPU heartbeat, cycles: 450000000 
CPU heartbeat, cycles: 500000000 
CPU heartbeat, cycles: 550000000 
CPU heartbeat, cycles: 600000000 
CPU heartbeat, cycles: 650000000 
CPU heartbeat, cycles: 700000000 
CPU heartbeat, cycles: 750000000 
Simulation done. Statistics written to my_output_DDR3_mode_cpu.txt

The input configuration file DDR3-config.cfg is:

########################
# Example config file
# Comments start with #
# There are restrictions for valid channel/rank numbers
 standard = DDR3
 channels = 1
 ranks = 1
 speed = DDR3_1600K
 org = DDR3_2Gb_x8
# record_cmd_trace: (default is off): on, off
 record_cmd_trace = off
# print_cmd_trace: (default is off): on, off
 print_cmd_trace = off

### Below are parameters only for CPU trace
 cpu_tick = 4
 mem_tick = 1
### Below are parameters only for multicore mode
# When early_exit is on, all cores will be terminated when the earliest one finishes.
 early_exit = on
# early_exit = on, off (default value is on)
# If expected_limit_insts is set, some per-core statistics will be recorded when this limit (or the end of the whole trace if it's shorter than specified limit) is reached. The simulation won't stop and will roll back automatically until the last one reaches the limit.
 expected_limit_insts = 200000000
# warmup_insts = 100000000
 warmup_insts = 0
 cache = no
# cache = no, L1L2, L3, all (default value is no)
 translation = None
# translation = None, Random (default value is None)
#
########################

A brief explanation of this configuration. First, the DRAM standard and organization:

 standard = DDR3      # Use the DDR3 DRAM standard.
 channels = 1         # Number of channels: 1.
 ranks = 1            # Number of ranks per channel: 1.
 speed = DDR3_1600K   # DRAM speed grade: DDR3_1600K.
 org = DDR3_2Gb_x8    # DRAM organization: DDR3_2Gb_x8, i.e. 2Gb chips with an 8-bit data width.

Then the CPU-trace parameters:

cpu_tick = 4          # CPU tick value of 4; relative to mem_tick this appears to set the CPU-to-memory clock ratio.
mem_tick = 1          # Memory tick value of 1.

Finally, the multicore-mode parameters:

### Below are parameters only for multicore mode
# When early_exit is on, all cores will be terminated when the earliest one finishes.
early_exit = on      # In multicore mode, when on, all cores are terminated as soon as the earliest one finishes.
# early_exit = on, off (default value is on)
# If expected_limit_insts is set, some per-core statistics will be recorded when this limit (or the end of the whole trace if it's shorter than specified limit) is reached. The simulation won't stop and will roll back automatically until the last one reaches the limit.
expected_limit_insts = 200000000   # Per-core instruction limit. When a core reaches it (or the end of its trace, if shorter), some per-core statistics are recorded. The simulation does not stop and rolls back automatically until the last core reaches the limit.
# warmup_insts = 100000000
warmup_insts = 0    # Number of warm-up instructions; 0 means no warm-up.
cache = no          # Whether caches are modeled; no disables them.
# cache = no, L1L2, L3, all (default value is no)
translation = None  # Address translation scheme; None means no translation.
# translation = None, Random (default value is None)

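There are many similar configuration files:
[Figure: list of configuration files]
The core source files appear to be:
[Figure: list of core source files]
I will not go through them for now. As for the inputs, dram.trace contains: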

0x12345680 R
0x4cbd56c0 W
0x35d46f00 R
0x696fed40 W
0x7876af80 R
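Each line here follows the memory-trace format from Section 2.3.2: a hexadecimal address followed by R or W. A small sketch for sanity-checking such a file; trace_rw_counts is a made-up helper, and the demo reuses the first three lines of the listing above.

```shell
# trace_rw_counts is a hypothetical helper: it counts read and write requests
# in a memory trace ("<hex address> <R|W>" per line) and flags malformed lines.
trace_rw_counts() {
    awk '
        $1 ~ /^0x[0-9a-fA-F]+$/ && $2 == "R" { r++; next }
        $1 ~ /^0x[0-9a-fA-F]+$/ && $2 == "W" { w++; next }
        NF > 0 { bad++ }
        END { printf "reads=%d writes=%d malformed=%d\n", r, w, bad }
    ' "$1"
}

demo_trace=$(mktemp)
printf '0x12345680 R\n0x4cbd56c0 W\n0x35d46f00 R\n' > "$demo_trace"
trace_rw_counts "$demo_trace"     # prints: reads=2 writes=1 malformed=0
rm -f "$demo_trace"
```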

And cpu.trace contains:

3 20734016
1 20846400
6 20734208
8 20841280 20841280
0 20734144
2 20918976 20734016

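At first glance I could not tell what these two files encode, so I am skipping a detailed reading for now; their line layouts do match the two trace formats described in Section 2.3.2.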
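Going by the CPU-trace format quoted in Section 2.3.2 (<num-cpuinst> <addr-read> [<addr-writeback>], with decimal addresses), the lines of cpu.trace can be decoded roughly as follows. This is a reading aid only: decode_cpu_trace is a made-up helper, and the hexadecimal conversion is just for easier comparison with dram.trace.

```shell
# decode_cpu_trace is a hypothetical helper: for each line of a CPU trace it
# prints the number of preceding non-memory instructions, the read address in
# hex and, when present, the writeback address in hex.
decode_cpu_trace() {
    local bubbles rd wb
    while read -r bubbles rd wb; do
        [ -n "$bubbles" ] || continue          # skip blank lines
        if [ -n "$wb" ]; then
            printf 'non-mem insts=%s read=0x%x writeback=0x%x\n' "$bubbles" "$rd" "$wb"
        else
            printf 'non-mem insts=%s read=0x%x\n' "$bubbles" "$rd"
        fi
    done < "$1"
}

# Demo on two lines taken from the cpu.trace listing above.
demo_cpu=$(mktemp)
printf '3 20734016\n2 20918976 20734016\n' > "$demo_cpu"
decode_cpu_trace "$demo_cpu"
rm -f "$demo_cpu"
```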

The outputs are my_output_DDR3_mode_dram.txt and my_output_DDR3_mode_cpu.txt. The content of my_output_DDR3_mode_dram.txt is as follows:

ramulator.active_cycles_0                      57         # Total active cycles for level _0
ramulator.busy_cycles_0                        57         # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0
ramulator.serving_requests_0                   148        # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.average_serving_requests_0           2.551724   # The average of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.active_cycles_0_0                    57         # Total active cycles for level _0_0
ramulator.busy_cycles_0_0                      57         # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0
ramulator.serving_requests_0_0                 148        # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0
ramulator.average_serving_requests_0_0         2.551724   # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0
ramulator.active_cycles_0_0_0                  0          # Total active cycles for level _0_0_0
ramulator.busy_cycles_0_0_0                    0          # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_0
ramulator.serving_requests_0_0_0               0          # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0
ramulator.average_serving_requests_0_0_0       0.000000   # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0
ramulator.active_cycles_0_0_1                  0          # Total active cycles for level _0_0_1
ramulator.busy_cycles_0_0_1                    0          # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_1
ramulator.serving_requests_0_0_1               0          # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_1
ramulator.average_serving_requests_0_0_1       0.000000   # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_1
ramulator.active_cycles_0_0_2                  49         # Total active cycles for level _0_0_2
ramulator.busy_cycles_0_0_2                    49         # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_2
ramulator.serving_requests_0_0_2               49         # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_2
ramulator.average_serving_requests_0_0_2       0.844828   # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_2
ramulator.active_cycles_0_0_3                  43         # Total active cycles for level _0_0_3
ramulator.busy_cycles_0_0_3                    43         # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_3
ramulator.serving_requests_0_0_3               43         # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_3
ramulator.average_serving_requests_0_0_3            0.741379                                      # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_3ramulator.active_cycles_0_0_4                   0                                      # Total active cycles for level _0_0_4ramulator.busy_cycles_0_0_4                   0                                      # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_4ramulator.serving_requests_0_0_4                   0                                      # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_4
ramulator.average_serving_requests_0_0_4            0.000000                                      # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_4ramulator.active_cycles_0_0_5                  41                                      # Total active cycles for level _0_0_5ramulator.busy_cycles_0_0_5                  41                                      # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_5ramulator.serving_requests_0_0_5                  41                                      # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_5
ramulator.average_serving_requests_0_0_5            0.706897                                      # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_5ramulator.active_cycles_0_0_6                   0                                      # Total active cycles for level _0_0_6ramulator.busy_cycles_0_0_6                   0                                      # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_6ramulator.serving_requests_0_0_6                   0                                      # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_6
ramulator.average_serving_requests_0_0_6            0.000000                                      # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_6ramulator.active_cycles_0_0_7                  15                                      # Total active cycles for level _0_0_7ramulator.busy_cycles_0_0_7                  15                                      # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_7ramulator.serving_requests_0_0_7                  15                                      # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_7
ramulator.average_serving_requests_0_0_7            0.258621                                      # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_7ramulator.read_transaction_bytes_0                 192                                      # The total byte of read transaction per channelramulator.write_transaction_bytes_0                 128                                      # The total byte of write transaction per channelramulator.row_hits_channel_0_core                   0                                      # Number of row hits per channel per coreramulator.row_misses_channel_0_core                   4                                      # Number of row misses per channel per coreramulator.row_conflicts_channel_0_core                   1                                      # Number of row conflicts per channel per coreramulator.read_row_hits_channel_0_core                   0                                      # Number of row hits for read requests per channel per core[0]                 0.0                                      # 
ramulator.read_row_misses_channel_0_core                   3                                      # Number of row misses for read requests per channel per core[0]                 3.0                                      # 
ramulator.read_row_conflicts_channel_0_core                   0                                      # Number of row conflicts for read requests per channel per core[0]                 0.0                                      # ramulator.write_row_hits_channel_0_core                   0                                      # Number of row hits for write requests per channel per core[0]                 0.0                                      # 
ramulator.write_row_misses_channel_0_core                   1                                      # Number of row misses for write requests per channel per core[0]                 1.0                                      # 
ramulator.write_row_conflicts_channel_0_core                   1                                      # Number of row conflicts for write requests per channel per core[0]                 1.0                                      # ramulator.useless_activates_0_core                   0                                      # Number of useless activations. E.g, ACT -> PRE w/o RD or WRramulator.read_latency_avg_0           44.333333                                      # The average memory latency cycles (in memory time domain) per request for all read requests in this channelramulator.read_latency_sum_0                 133                                      # The memory latency cycles (in memory time domain) sum for all read requests in this channelramulator.req_queue_length_avg_0            1.896552                                      # Average of read and write queue length per memory cycle per channel.ramulator.req_queue_length_sum_0                 110                                      # Sum of read and write queue length per memory cycle per channel.ramulator.read_req_queue_length_avg_0            1.172414                                      # Read queue length average per memory cycle per channel.ramulator.read_req_queue_length_sum_0                  68                                      # Read queue length sum per memory cycle per channel.ramulator.write_req_queue_length_avg_0            0.724138                                      # Write queue length average per memory cycle per channel.ramulator.write_req_queue_length_sum_0                  42                                      # Write queue length sum per memory cycle per channel.ramulator.record_read_hits                 0.0                                      # record read hit count for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.record_read_misses                 0.0                                      # 
record_read_miss count for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.record_read_conflicts                 0.0                                      # record read conflict count for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.record_write_hits                 0.0                                      # record write hit count for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.record_write_misses                 0.0                                      # record write miss count for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.record_write_conflicts                 0.0                                      # record write conflict for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.dram_capacity          2147483648                                      # Number of bytes in simulated DRAMramulator.dram_cycles                  58                                      # Number of DRAM cycles simulatedramulator.incoming_requests                   5                                      # Number of incoming requests to DRAMramulator.read_requests                   3                                      # Number of incoming read requests to DRAM per core[0]                 3.0                                      # ramulator.write_requests                   2                                      # Number of incoming write requests to DRAM per core[0]                 2.0                                      # ramulator.ramulator_active_cycles                  57                                      # The total number of cycles that the DRAM part is active (serving 
R/W)ramulator.incoming_requests_per_channel                 5.0                                      # Number of incoming requests to each DRAM channel[0]                 5.0                                      # 
ramulator.incoming_read_reqs_per_channel                 3.0                                      # Number of incoming read requests to each DRAM channel[0]                 3.0                                      # ramulator.physical_page_replacement                   0                                      # The number of times that physical page replacement happens.ramulator.maximum_bandwidth         12800000000                                      # The theoretical maximum bandwidth (Bps)ramulator.in_queue_req_num_sum                 110                                      # Sum of read/write queue lengthramulator.in_queue_read_req_num_sum                  68                                      # Sum of read queue lengthramulator.in_queue_write_req_num_sum                  42                                      # Sum of write queue lengthramulator.in_queue_req_num_avg            1.896552                                      # Average of read/write queue length per memory cycleramulator.in_queue_read_req_num_avg            1.172414                                      # Average of read queue length per memory cycleramulator.in_queue_write_req_num_avg            0.724138                                      # Average of write queue length per memory cycleramulator.record_read_requests                 0.0                                      # record read requests for this core when it reaches request limit or to the end[0]                 0.0                                      # ramulator.record_write_requests                 0.0                                      # record write requests for this core when it reaches request limit or to the end[0]                 0.0                                      # 

A brief summary of what the comments say:

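| Category | Statistic | Value | Description |
| --- | --- | --- | --- |
| DRAM active and busy cycles | ramulator.active_cycles_0 | 57 | Total active cycles for level _0 |
| DRAM active and busy cycles | ramulator.busy_cycles_0 | 57 | Cycles in which level _0 is active or under refresh (refresh time is counted only at rank level) |
| DRAM requests served | ramulator.serving_requests_0 | 148 | Read and write requests being served at level _0, summed over all memory cycles |
| DRAM requests served | ramulator.average_serving_requests_0 | 2.551724 | Average number of read and write requests being served at level _0 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0 | 57 | Total active cycles for level _0_0 |
| Per-level statistics | ramulator.busy_cycles_0_0 | 57 | Cycles in which level _0_0 is active or under refresh (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0 | 148 | Read and write requests being served at level _0_0, summed over all memory cycles |
| Per-level statistics | ramulator.average_serving_requests_0_0 | 2.551724 | Average number of read and write requests being served at level _0_0 per memory cycle |
| Read/write transaction bytes | ramulator.read_transaction_bytes_0 | 192 | Total bytes of read transactions per channel |
| Read/write transaction bytes | ramulator.write_transaction_bytes_0 | 128 | Total bytes of write transactions per channel |
| Row hits, misses, and conflicts | ramulator.row_hits_channel_0_core | 0 | Row hits per channel per core |
| Row hits, misses, and conflicts | ramulator.row_misses_channel_0_core | 4 | Row misses per channel per core |
| Row hits, misses, and conflicts | ramulator.row_conflicts_channel_0_core | 1 | Row conflicts per channel per core |
| Read/write queue length | ramulator.req_queue_length_avg_0 | 1.896552 | Average read and write queue length per memory cycle, per channel |
| Read/write queue length | ramulator.req_queue_length_sum_0 | 110 | Read and write queue length summed over all memory cycles, per channel |
| DRAM capacity and cycles | ramulator.dram_capacity | 2147483648 | Capacity of the simulated DRAM in bytes |
| DRAM capacity and cycles | ramulator.dram_cycles | 58 | Number of DRAM cycles simulated |
| Other statistics | ramulator.incoming_requests | 5 | Requests arriving at DRAM |
| Other statistics | ramulator.read_requests | 3 | Read requests arriving at DRAM, per core |
| Other statistics | ramulator.write_requests | 2 | Write requests arriving at DRAM, per core |
| Queue length totals | ramulator.in_queue_req_num_sum | 110 | Sum of read/write queue length |
| Queue length totals | ramulator.in_queue_read_req_num_sum | 68 | Sum of read queue length |
| Queue length totals | ramulator.in_queue_write_req_num_sum | 42 | Sum of write queue length |
| Average queue length | ramulator.in_queue_req_num_avg | 1.896552 | Average read/write queue length per memory cycle |
| Average queue length | ramulator.in_queue_read_req_num_avg | 1.172414 | Average read queue length per memory cycle |
| Average queue length | ramulator.in_queue_write_req_num_avg | 0.724138 | Average write queue length per memory cycle |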
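Each record in the statistics output follows the pattern "ramulator.<name> <value> # <description>", so it can be loaded into a dictionary with a few lines of Python. The following is a minimal sketch, not part of Ramulator: parse_stats and STAT_RE are hypothetical names, and the sample is a short excerpt of the output shown above. It also checks a relationship that the numbers in this output satisfy: each average equals the matching sum divided by dram_cycles (148 / 58 and 110 / 58).

```python
import re

# Assumed record layout: "ramulator.<name>  <value>  # <description>".
# The pattern relies only on this name/value/# order, not on line breaks.
STAT_RE = re.compile(r"ramulator\.(\w+)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text):
    """Map each statistic name (without the 'ramulator.' prefix) to a number."""
    stats = {}
    for name, value in STAT_RE.findall(text):
        stats[name] = float(value) if "." in value else int(value)
    return stats


# Excerpt of the statistics output shown above.
sample = """\
ramulator.serving_requests_0  148  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.average_serving_requests_0  2.551724  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.req_queue_length_avg_0  1.896552  # Average of read and write queue length per memory cycle per channel.
ramulator.req_queue_length_sum_0  110  # Sum of read and write queue length per memory cycle per channel.
ramulator.dram_cycles  58  # Number of DRAM cycles simulated
"""

stats = parse_stats(sample)
cycles = stats["dram_cycles"]

# Recompute the averages from the sums and the simulated cycle count.
avg_serving = stats["serving_requests_0"] / cycles
avg_queue = stats["req_queue_length_sum_0"] / cycles

print(f"serving requests per cycle: {avg_serving:.6f}")
print(f"queue length per cycle:     {avg_queue:.6f}")
```

Because the pattern keys on the "ramulator." prefix and the "#" that follows the value, the per-core "[0]" sub-entries are skipped and only the top-level counters end up in the dictionary.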

The contents of my_output_DDR3_mode_cpu.txt are as follows:

ramulator.active_cycles_0  122445958  # Total active cycles for level _0
ramulator.busy_cycles_0  122445958  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0
ramulator.serving_requests_0  429759559  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.average_serving_requests_0  2.273115  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0
ramulator.active_cycles_0_0  122445958  # Total active cycles for level _0_0
ramulator.busy_cycles_0_0  126324102  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0
ramulator.serving_requests_0_0  429759559  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0
ramulator.average_serving_requests_0_0  2.273115  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0
ramulator.active_cycles_0_0_0  106259048  # Total active cycles for level _0_0_0
ramulator.busy_cycles_0_0_0  106259048  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_0
ramulator.serving_requests_0_0_0  107522760  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0
ramulator.average_serving_requests_0_0_0  0.568717  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_0
ramulator.active_cycles_0_0_1  104663261  # Total active cycles for level _0_0_1
ramulator.busy_cycles_0_0_1  104663261  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_1
ramulator.serving_requests_0_0_1  107556805  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_1
ramulator.average_serving_requests_0_0_1  0.568897  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_1
ramulator.active_cycles_0_0_2  0  # Total active cycles for level _0_0_2
ramulator.busy_cycles_0_0_2  0  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_2
ramulator.serving_requests_0_0_2  0  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_2
ramulator.average_serving_requests_0_0_2  0.000000  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_2
ramulator.active_cycles_0_0_3  115429984  # Total active cycles for level _0_0_3
ramulator.busy_cycles_0_0_3  115429984  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_3
ramulator.serving_requests_0_0_3  214679974  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_3
ramulator.average_serving_requests_0_0_3  1.135501  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_3
ramulator.active_cycles_0_0_4  0  # Total active cycles for level _0_0_4
ramulator.busy_cycles_0_0_4  0  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_4
ramulator.serving_requests_0_0_4  0  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_4
ramulator.average_serving_requests_0_0_4  0.000000  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_4
ramulator.active_cycles_0_0_5  0  # Total active cycles for level _0_0_5
ramulator.busy_cycles_0_0_5  0  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_5
ramulator.serving_requests_0_0_5  0  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_5
ramulator.average_serving_requests_0_0_5  0.000000  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_5
ramulator.active_cycles_0_0_6  0  # Total active cycles for level _0_0_6
ramulator.busy_cycles_0_0_6  0  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_6
ramulator.serving_requests_0_0_6  0  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_6
ramulator.average_serving_requests_0_0_6  0.000000  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_6
ramulator.active_cycles_0_0_7  0  # Total active cycles for level _0_0_7
ramulator.busy_cycles_0_0_7  0  # (All-bank refresh only. busy cycles only include refresh time in rank level) The sum of cycles that the DRAM part is active or under refresh for level _0_0_7
ramulator.serving_requests_0_0_7  0  # The sum of read and write requests that are served in this DRAM element per memory cycle for level _0_0_7
ramulator.average_serving_requests_0_0_7  0.000000  # The average of read and write requests that are served in this DRAM element per memory cycle for level _0_0_7
ramulator.read_transaction_bytes_0  1828569600  # The total byte of read transaction per channel
ramulator.write_transaction_bytes_0  914284416  # The total byte of write transaction per channel
ramulator.row_hits_channel_0_core  42766197  # Number of row hits per channel per core
ramulator.row_misses_channel_0_core  90897  # Number of row misses per channel per core
ramulator.row_conflicts_channel_0_core  0  # Number of row conflicts per channel per core
ramulator.read_row_hits_channel_0_core  28491232  # Number of row hits for read requests per channel per core
[0]  28491232.0  #
ramulator.read_row_misses_channel_0_core  80168  # Number of row misses for read requests per channel per core
[0]  80168.0  #
ramulator.read_row_conflicts_channel_0_core  0  # Number of row conflicts for read requests per channel per core
[0]  0.0  #
ramulator.write_row_hits_channel_0_core  14274965  # Number of row hits for write requests per channel per core
[0]  14274965.0  #
ramulator.write_row_misses_channel_0_core  10729  # Number of row misses for write requests per channel per core
[0]  10729.0  #
ramulator.write_row_conflicts_channel_0_core  0  # Number of row conflicts for write requests per channel per core
[0]  0.0  #
ramulator.useless_activates_0_core  0  # Number of useless activations. E.g, ACT -> PRE w/o RD or WR
ramulator.read_latency_avg_0  151.034782  # The average memory latency cycles (in memory time domain) per request for all read requests in this channel
ramulator.read_latency_sum_0  6472919250  # The memory latency cycles (in memory time domain) sum for all read requests in this channel
ramulator.req_queue_length_avg_0  50.407613  # Average of read and write queue length per memory cycle per channel.
ramulator.req_queue_length_sum_0  9530162547  # Sum of read and write queue length per memory cycle per channel.
ramulator.read_req_queue_length_avg_0  35.435893  # Read queue length average per memory cycle per channel.
ramulator.read_req_queue_length_sum_0  6699579776  # Read queue length sum per memory cycle per channel.
ramulator.write_req_queue_length_avg_0  14.971719  # Write queue length average per memory cycle per channel.
ramulator.write_req_queue_length_sum_0  2830582771  # Write queue length sum per memory cycle per channel.
ramulator.record_read_hits  28491232.0  # record read hit count for this core when it reaches request limit or to the end
[0]  28491232.0  #
ramulator.record_read_misses  80168.0  # record_read_miss count for this core when it reaches request limit or to the end
[0]  80168.0  #
ramulator.record_read_conflicts  0.0  # record read conflict count for this core when it reaches request limit or to the end
[0]  0.0  #
ramulator.record_write_hits  14274965.0  # record write hit count for this core when it reaches request limit or to the end
[0]  14274965.0  #
ramulator.record_write_misses  10729.0  # record write miss count for this core when it reaches request limit or to the end
[0]  10729.0  #
ramulator.record_write_conflicts  0.0  # record write conflict for this core when it reaches request limit or to the end
[0]  0.0  #
ramulator.dram_capacity  2147483648  # Number of bytes in simulated DRAM
ramulator.dram_cycles  189061970  # Number of DRAM cycles simulated
ramulator.incoming_requests  57142857  # Number of incoming requests to DRAM
ramulator.read_requests  42857143  # Number of incoming read requests to DRAM per core
[0]  42857143.0  #
ramulator.write_requests  14285714  # Number of incoming write requests to DRAM per core
[0]  14285714.0  #
ramulator.ramulator_active_cycles  122445958  # The total number of cycles that the DRAM part is active (serving R/W)
ramulator.incoming_requests_per_channel  57142857.0  # Number of incoming requests to each DRAM channel
[0]  57142857.0  #
ramulator.incoming_read_reqs_per_channel  42857143.0  # Number of incoming read requests to each DRAM channel
[0]  42857143.0  #
ramulator.physical_page_replacement  0  # The number of times that physical page replacement happens.
ramulator.maximum_bandwidth  12800000000  # The theoretical maximum bandwidth (Bps)
ramulator.in_queue_req_num_sum  9530162547  # Sum of read/write queue length
ramulator.in_queue_read_req_num_sum  6699579776  # Sum of read queue length
ramulator.in_queue_write_req_num_sum  2830582771  # Sum of write queue length
ramulator.in_queue_req_num_avg  50.407613  # Average of read/write queue length per memory cycle
ramulator.in_queue_read_req_num_avg  35.435893  # Average of read queue length per memory cycle
ramulator.in_queue_write_req_num_avg  14.971719  # Average of write queue length per memory cycle
ramulator.record_read_requests  42857143.0  # record read requests for this core when it reaches request limit or to the end
[0]  42857143.0  #
ramulator.record_write_requests  14285714.0  # record write requests for this core when it reaches request limit or to the end
[0]  14285714.0  #
ramulator.L3_cache_read_miss  0  # cache read miss count
ramulator.L3_cache_write_miss  0  # cache write miss count
ramulator.L3_cache_total_miss  0  # cache total miss count
ramulator.L3_cache_eviction  0  # number of evict from this level to lower level
ramulator.L3_cache_read_access  0  # cache read access count
ramulator.L3_cache_write_access  0  # cache write access count
ramulator.L3_cache_total_access  0  # cache total access count
ramulator.L3_cache_mshr_hit  0  # cache mshr hit count
ramulator.L3_cache_mshr_unavailable  0  # cache mshr not available count
ramulator.L3_cache_set_unavailable  0  # cache set not available
ramulator.cpu_cycles  756247878  # cpu cycle number
ramulator.record_cycs_core_0  756247878  # Record cycle number for calculating weighted speedup. (Only valid when expected limit instruction number is non zero in config file.)
ramulator.record_insts_core_0  200000000  # Retired instruction number when record cycle number. (Only valid when expected limit instruction number is non zero in config file.)
ramulator.memory_access_cycles_core_0  184376713  # memory access cycles in memory time domain
ramulator.cpu_instructions_core_0  200000000  # cpu instruction number
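A few derived figures make the CPU-mode output above easier to read. The sketch below hard-codes counters copied from that output and computes a row-buffer hit rate, the instructions retired per CPU cycle, and the CPU-to-DRAM clock ratio. The formulas are common conventions chosen here for illustration, not values that Ramulator prints itself, and the variable names are made up for this example.

```python
# Counters copied from my_output_DDR3_mode_cpu.txt (the output above).
row_hits = 42766197           # ramulator.row_hits_channel_0_core
row_misses = 90897            # ramulator.row_misses_channel_0_core
row_conflicts = 0             # ramulator.row_conflicts_channel_0_core
dram_cycles = 189061970       # ramulator.dram_cycles
cpu_cycles = 756247878        # ramulator.cpu_cycles
cpu_instructions = 200000000  # ramulator.cpu_instructions_core_0

# Assumed definition: hits divided by all row-buffer outcomes.
row_accesses = row_hits + row_misses + row_conflicts
row_hit_rate = row_hits / row_accesses

# Instructions retired per CPU cycle on core 0.
ipc = cpu_instructions / cpu_cycles

# Number of CPU cycles that elapse per simulated DRAM cycle.
clock_ratio = cpu_cycles / dram_cycles

print(f"row-buffer hit rate:  {row_hit_rate:.4%}")
print(f"IPC (core 0):         {ipc:.4f}")
print(f"CPU/DRAM clock ratio: {clock_ratio:.3f}")
```

With these numbers the clock ratio comes out at almost exactly 4, and nearly all row-buffer accesses are hits.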

Applying the same treatment as before, the statistics can be organized as follows:

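| Category | Statistic | Value | Description |
|---|---|---|---|
| DRAM active and busy cycles | ramulator.active_cycles_0 | 122445958 | Total active cycles at level _0 |
| DRAM active and busy cycles | ramulator.busy_cycles_0 | 122445958 | Total busy cycles at level _0 (refresh time is counted only at rank level) |
| DRAM serving requests | ramulator.serving_requests_0 | 429759559 | Sum of read/write requests being served at level _0, accumulated per memory cycle |
| DRAM serving requests | ramulator.average_serving_requests_0 | 2.273115 | Average number of read/write requests being served at level _0 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0 | 122445958 | Total active cycles at level _0_0 |
| Per-level statistics | ramulator.busy_cycles_0_0 | 126324102 | Total busy cycles at level _0_0 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0 | 429759559 | Sum of read/write requests being served at level _0_0, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0 | 2.273115 | Average number of read/write requests being served at level _0_0 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_0 | 106259048 | Total active cycles at level _0_0_0 |
| Per-level statistics | ramulator.busy_cycles_0_0_0 | 106259048 | Total busy cycles at level _0_0_0 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_0 | 107522760 | Sum of read/write requests being served at level _0_0_0, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_0 | 0.568717 | Average number of read/write requests being served at level _0_0_0 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_1 | 104663261 | Total active cycles at level _0_0_1 |
| Per-level statistics | ramulator.busy_cycles_0_0_1 | 104663261 | Total busy cycles at level _0_0_1 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_1 | 107556805 | Sum of read/write requests being served at level _0_0_1, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_1 | 0.568897 | Average number of read/write requests being served at level _0_0_1 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_2 | 0 | Total active cycles at level _0_0_2 |
| Per-level statistics | ramulator.busy_cycles_0_0_2 | 0 | Total busy cycles at level _0_0_2 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_2 | 0 | Sum of read/write requests being served at level _0_0_2, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_2 | 0.000000 | Average number of read/write requests being served at level _0_0_2 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_3 | 115429984 | Total active cycles at level _0_0_3 |
| Per-level statistics | ramulator.busy_cycles_0_0_3 | 115429984 | Total busy cycles at level _0_0_3 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_3 | 214679974 | Sum of read/write requests being served at level _0_0_3, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_3 | 1.135501 | Average number of read/write requests being served at level _0_0_3 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_4 | 0 | Total active cycles at level _0_0_4 |
| Per-level statistics | ramulator.busy_cycles_0_0_4 | 0 | Total busy cycles at level _0_0_4 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_4 | 0 | Sum of read/write requests being served at level _0_0_4, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_4 | 0.000000 | Average number of read/write requests being served at level _0_0_4 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_5 | 0 | Total active cycles at level _0_0_5 |
| Per-level statistics | ramulator.busy_cycles_0_0_5 | 0 | Total busy cycles at level _0_0_5 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_5 | 0 | Sum of read/write requests being served at level _0_0_5, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_5 | 0.000000 | Average number of read/write requests being served at level _0_0_5 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_6 | 0 | Total active cycles at level _0_0_6 |
| Per-level statistics | ramulator.busy_cycles_0_0_6 | 0 | Total busy cycles at level _0_0_6 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_6 | 0 | Sum of read/write requests being served at level _0_0_6, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_6 | 0.000000 | Average number of read/write requests being served at level _0_0_6 per memory cycle |
| Per-level statistics | ramulator.active_cycles_0_0_7 | 0 | Total active cycles at level _0_0_7 |
| Per-level statistics | ramulator.busy_cycles_0_0_7 | 0 | Total busy cycles at level _0_0_7 (refresh time is counted only at rank level) |
| Per-level statistics | ramulator.serving_requests_0_0_7 | 0 | Sum of read/write requests being served at level _0_0_7, accumulated per memory cycle |
| Per-level statistics | ramulator.average_serving_requests_0_0_7 | 0.000000 | Average number of read/write requests being served at level _0_0_7 per memory cycle |
| Read/write transaction bytes | ramulator.read_transaction_bytes_0 | 1828569600 | Total bytes of read transactions, per channel |
| Read/write transaction bytes | ramulator.write_transaction_bytes_0 | 914284416 | Total bytes of write transactions, per channel |
| Row hits, misses, and conflicts | ramulator.row_hits_channel_0_core | 42766197 | Row hits, per channel per core |
| Row hits, misses, and conflicts | ramulator.row_misses_channel_0_core | 90897 | Row misses, per channel per core |
| Row hits, misses, and conflicts | ramulator.row_conflicts_channel_0_core | 0 | Row conflicts, per channel per core |
| Row hits, misses, and conflicts | ramulator.read_row_hits_channel_0_core | 28491232 | Row hits for reads, per channel per core |
| Row hits, misses, and conflicts | ramulator.read_row_misses_channel_0_core | 80168 | Row misses for reads, per channel per core |
| Row hits, misses, and conflicts | ramulator.read_row_conflicts_channel_0_core | 0 | Row conflicts for reads, per channel per core |
| Row hits, misses, and conflicts | ramulator.write_row_hits_channel_0_core | 14274965 | Row hits for writes, per channel per core |
| Row hits, misses, and conflicts | ramulator.write_row_misses_channel_0_core | 10729 | Row misses for writes, per channel per core |
| Row hits, misses, and conflicts | ramulator.write_row_conflicts_channel_0_core | 0 | Row conflicts for writes, per channel per core |
| Other statistics | ramulator.useless_activates_0_core | 0 | Number of useless activates |
| Other statistics | ramulator.read_latency_avg_0 | 151.034782 | Average read latency per channel (memory time domain) |
| Other statistics | ramulator.read_latency_sum_0 | 6472919250 | Sum of read latencies per channel (memory time domain) |
| Read/write queue length | ramulator.req_queue_length_avg_0 | 50.407613 | Average read/write queue length per memory cycle, per channel |
| Read/write queue length | ramulator.req_queue_length_sum_0 | 9530162547 | Sum of read/write queue length over all memory cycles, per channel |
| Read/write queue length | ramulator.read_req_queue_length_avg_0 | 35.435893 | Average read queue length per memory cycle, per channel |
| Read/write queue length | ramulator.read_req_queue_length_sum_0 | 6699579776 | Sum of read queue length over all memory cycles, per channel |
| Read/write queue length | ramulator.write_req_queue_length_avg_0 | 14.971719 | Average write queue length per memory cycle, per channel |
| Read/write queue length | ramulator.write_req_queue_length_sum_0 | 2830582771 | Sum of write queue length over all memory cycles, per channel |
| DRAM capacity and cycles | ramulator.dram_capacity | 2147483648 | Simulated DRAM capacity (bytes) |
| DRAM capacity and cycles | ramulator.dram_cycles | 189061970 | Number of simulated DRAM cycles |
| Other statistics | ramulator.incoming_requests | 57142857 | Number of requests arriving at DRAM |
| Other statistics | ramulator.read_requests | 42857143 | Read requests arriving at DRAM, per core |
| Other statistics | ramulator.write_requests | 14285714 | Write requests arriving at DRAM, per core |
| Queue length totals | ramulator.in_queue_req_num_sum | 9530162547 | Sum of read/write queue length |
| Queue length totals | ramulator.in_queue_read_req_num_sum | 6699579776 | Sum of read queue length |
| Queue length totals | ramulator.in_queue_write_req_num_sum | 2830582771 | Sum of write queue length |
| Average queue length | ramulator.in_queue_req_num_avg | 50.407613 | Average read/write queue length per memory cycle |
| Average queue length | ramulator.in_queue_read_req_num_avg | 35.435893 | Average read queue length per memory cycle |
| Average queue length | ramulator.in_queue_write_req_num_avg | 14.971719 | Average write queue length per memory cycle |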
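Several entries in the table can be cross-checked against each other. The numbers suggest that each "average" is the matching "sum" divided by `dram_cycles`, and that the average read latency is the latency sum divided by the number of read requests. The sketch below recomputes these values in Python, together with a row-buffer hit rate. The hit-rate formula (hits over hits plus misses plus conflicts) and the helper name `ratio` are my own assumptions for illustration, not something Ramulator prints.

```python
# Values copied from the table above (channel 0, single core).
dram_cycles = 189_061_970
row_hits, row_misses, row_conflicts = 42_766_197, 90_897, 0
read_requests = 42_857_143
read_latency_sum = 6_472_919_250
queue_len_sum = 9_530_162_547
serving_requests = 429_759_559


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


row_accesses = row_hits + row_misses + row_conflicts
derived = {
    # Assumed definition: fraction of row accesses that hit an open row.
    "row_buffer_hit_rate": ratio(row_hits, row_accesses),
    # Compare with ramulator.read_latency_avg_0.
    "avg_read_latency_cycles": ratio(read_latency_sum, read_requests),
    # Compare with ramulator.req_queue_length_avg_0.
    "avg_queue_length": ratio(queue_len_sum, dram_cycles),
    # Compare with ramulator.average_serving_requests_0.
    "avg_serving_requests": ratio(serving_requests, dram_cycles),
}

for name, value in derived.items():
    print(f"{name:26s} {value:.6f}")
```

The recomputed averages land on the reported 151.034782, 50.407613, and 2.273115 to within rounding, and the hit rate comes out close to 99.8%, which is consistent with zero row conflicts being reported for this trace.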

That concludes the introduction to Ramulator for now.

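2.4 GEM5, another dependency pulled in

2.4.1 A brief introduction to GEM5

The GEM5 simulator is a modular platform for computer-system architecture research, covering both system-level architecture and processor microarchitecture. It is mainly used to evaluate new hardware designs, changes to system software, and compile-time and run-time system optimizations.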

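2.4.2 The GEM5 source tree

The main source tree contains the following subdirectories:

  • build_opts: preset default configurations for gem5
  • build_tools: tools used internally by the gem5 build process
  • configs: example simulation configuration scripts
  • ext: less common external packages needed to build gem5
  • include: header files for use by other programs
  • site_scons: modular components of the build system
  • src: source code of the gem5 simulator. The C++ sources, the Python wrappers, and the Python standard library all live in this directory.
  • system: source code for some optional system software for the simulated systems
  • tests: regression tests
  • util: useful utility programs and files

GEM5 is a genuinely complex simulator, so I will not go deep into it for now. The goal is only to get the GEM5 setup that relates to Ramulator, as mentioned earlier, up and running.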

2.4.3 Trying Ramulator driven by gem5

This one is a bit tricky. I ran into quite a few bugs that still need fixing, so it is deferred to the next post.


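Summary

This post lays some groundwork for analyzing the RTL code: it explores the inputs and outputs of blackbox.sh and of ramulator, and gives a brief introduction to GEM5.

Several questions were opened but left unanswered:
1. The xrt and opae driver modes report different cycle counts.
2. The trace behavior of the two HBM2 channels.

Left for the next post:
1. A walkthrough of trace_csv.py.
2. A walkthrough of the four files under demo.
3. Trying Ramulator driven by gem5.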

