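This post compares how several CPUs perform on the same test program, using compute instances on Oracle's public cloud (OCI). Each instance was allocated 1 OCPU and the default amount of memory; memory size does not affect the result of this test program.
This post only lists the results and makes no evaluation. In the table below, the last column is the mean elapsed time over 5 runs of the test program.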
OCI shape | CPU model | Base frequency (GHz) | Mean run time of the test program (s) |
---|---|---|---|
VM.Standard3.Flex | Intel Xeon Platinum 8358 | 2.6 | 135.084 |
VM.Optimized3 | Intel Xeon Gold 6354 | 3.0 | 123.65 |
VM.Standard.E4.Flex | AMD EPYC 7J13 | 2.55 | 62.766 |
VM.Standard.E5.Flex | AMD EPYC 9J14 | 2.4 | 53.22 |
VM.Standard.A1.Flex | Ampere Altra Q80-30 | 3.0 | 107.206 |
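Test program:
#include <stdio.h>
#include <math.h>

int main(void)
{
    double r = 0.0;  /* initialized; the original left r uninitialized */
    int i, j;

    /* 10^10 evaluations of sqrt(sqrt(i)); r is accumulated but never printed */
    for (i = 0; i < 100000; i++)
        for (j = 0; j < 100000; j++)
            r = r + sqrt(sqrt(i));
    return 0;
}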
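Compile (no optimization flags; -lm is placed after the source file so the math library is linked regardless of linker defaults):
cc a.c -lm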
test.sh runs a.out 5 times:
for i in 1 2 3 4 5; do
    time -p ./a.out
done
To compute the mean, save the output above to a temporary file, for example /tmp/1, and then run:
cat /tmp/1|grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
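The averaging step can also be written so that it does not depend on the hard-coded 5. The following is only a minimal sketch: the function name `mean_real` is hypothetical and not part of the original setup, and it assumes the timing lines were captured with something like `./test.sh 2> /tmp/1`, since `time -p` writes its report to standard error.

```shell
#!/bin/sh
# mean_real [FILE...]: print the mean of the "real" values found in
# `time -p` output. Hypothetical helper for illustration only.
# It divides by the number of "real" lines seen instead of a fixed 5,
# and reads standard input when no file is given.
mean_real() {
    awk '$1 == "real" { s += $2; n++ }
         END { if (n) print s / n }' "$@"
}

# Example: the first two runs from the Platinum 8358 section.
printf 'real 135.08\nuser 134.69\nsys 0.00\nreal 135.04\nuser 134.67\nsys 0.00\n' | mean_real
```

With a captured file, the call would be `mean_real /tmp/1`. Matching on the first field also avoids picking up unrelated lines that merely contain the word "real".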
Intel Xeon Platinum 8358
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Stepping: 6
CPU MHz: 2594.024
BogoMIPS: 5188.04
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid md_clear arch_capabilities
Test results:
$ ./test.sh
real 135.08
user 134.69
sys 0.00
real 135.04
user 134.67
sys 0.00
real 135.14
user 134.67
sys 0.02
real 135.10
user 134.68
sys 0.00
real 135.06
user 134.69
sys 0.00
Filtering the output through grep real|sed 's/real //' gives all the real-time values:
135.08
135.04
135.14
135.10
135.06
The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 135.084.
Intel Xeon Gold 6354
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
Stepping: 6
CPU MHz: 2993.064
BogoMIPS: 5986.12
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
Test results:
$ ./test.sh
real 123.69
user 123.40
sys 0.00
real 123.66
user 123.37
sys 0.00
real 123.65
user 123.38
sys 0.00
real 123.62
user 123.38
sys 0.00
real 123.63
user 123.38
sys 0.01
Filtering the output through grep real|sed 's/real //' gives all the real-time values:
123.69
123.66
123.65
123.62
123.63
The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 123.65.
AMD EPYC 7J13
Test results:
$ ./test.sh
real 60.26
user 60.25
sys 0.00
real 60.51
user 60.50
sys 0.00
real 64.45
user 64.44
sys 0.00
real 67.76
user 66.29
sys 0.13
real 60.85
user 60.80
sys 0.00
Filtering the output through grep real|sed 's/real //' gives all the real-time values:
60.26
60.51
64.45
67.76
60.85
The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 62.766.
AMD EPYC 9J14
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9J14 96-Core Processor
Stepping: 1
CPU MHz: 2596.100
BogoMIPS: 5192.20
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good avx512_bf16 clzero xsaveerptr wbnoinvd arat npt nrip_save avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid arch_capabilities
Test results:
$ ./test.sh
real 52.63
user 52.62
sys 0.00
real 53.29
user 53.19
sys 0.00
real 52.13
user 52.12
sys 0.00
real 52.28
user 52.27
sys 0.00
real 55.77
user 54.79
sys 0.01
Filtering the output through grep real|sed 's/real //' gives all the real-time values:
52.63
53.29
52.13
52.28
55.77
The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 53.22.
Ampere Altra Q80-30
$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: ARM
Model: 1
Model name: Neoverse-N1
Stepping: r3p1
BogoMIPS: 50.00
L1d cache: unknown size
L1i cache: unknown size
L2 cache: unknown size
NUMA node0 CPU(s): 0
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
Test results:
$ ./test.sh
real 113.46
user 103.23
sys 0.23
real 103.77
user 103.02
sys 0.03
real 109.15
user 103.01
sys 0.14
real 105.11
user 103.29
sys 0.02
real 104.54
user 103.06
sys 0.02
Filtering the output through grep real|sed 's/real //' gives all the real-time values:
113.46
103.77
109.15
105.11
104.54
The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 107.206.
References
- https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#vm-standard