PC and Pi Performance Comparisons
The following results are for the original Classic Benchmarks, comprising the Livermore Loops, Linpack 100 and Whetstone programs, on PCs from 1991 onwards and on the Pi 5. Each was generally produced using the latest compiler version available at the time. These probably represent best-case Pi 5 comparative performance, mostly better than the Core i5 CPU on a per MHz basis.
To be fair, the later MP-MFLOPS results, included below, reflect the other extreme, SIMD vector performance. However, my present compiling procedures might be confusing for a newbie. For the Pi 5, the compile parameters used for all programs were -O3 and -march=armv8-a, that is, optimisation level 3 targeting the armv8-a architecture. For Intel, the method I adopted requires compile directives to be included for the likes of SSE, AVX, AVX2 or AVX512.
For those who only consider maximum performance, the Intel-based PC MP-MFLOPS speeds are indicated as being far superior. But on an MFLOPS per MHz basis, the Pi 5 results fell between the Intel SSE and AVX measurements. Considering these and repeated runs, the Core i5 CPU (in a laptop in this case) appears to run at a lower MHz when using 4 or more threads.
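The MFLOPS per MHz comparison can be reproduced with a few lines of Python. This is only a sketch: the figures are copied from the MP-MFLOPS table in this post, and the names RESULTS, mflops_per_mhz and scaling_1_to_4 are made up for illustration. It divides each MFLOPS result by the nominal CPU MHz and also shows how much each system gains going from 1 to 4 threads.

```python
# MP-MFLOPS figures copied from the table in this post:
# name -> (nominal MHz, MFLOPS at 1, 2, 4 and 8 threads)
RESULTS = {
    "Core i5 SSE": (4150, [33273, 64727, 86194, 119426]),
    "Core i5 AVX": (4150, [64946, 128515, 153955, 225265]),
    "Pi 5": (2400, [21519, 42488, 80947, 85086]),
}


def mflops_per_mhz(mhz, mflops_list):
    """Divide each MFLOPS figure by the nominal MHz, to one decimal place."""
    return [round(m / mhz, 1) for m in mflops_list]


def scaling_1_to_4(mflops_list):
    """Speed gain from 1 thread to 4 threads (index 2 holds the 4-thread result)."""
    return mflops_list[2] / mflops_list[0]


per_mhz = {name: mflops_per_mhz(mhz, vals) for name, (mhz, vals) in RESULTS.items()}
scaling = {name: scaling_1_to_4(vals) for name, (mhz, vals) in RESULTS.items()}

for name in RESULTS:
    row = " ".join(f"{v:6.1f}" for v in per_mhz[name])
    print(f"{name:12s} {row}   1 to 4 threads: x{scaling[name]:.2f}")
```

The 1 to 4 thread gain of about 3.8 times on the Pi 5, against roughly 2.4 to 2.6 times on the Core i5, is consistent with the laptop CPU clocking down under multi-threaded load, though the sketch cannot prove that on its own.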
Given an application that mainly runs 4-core vector MP-MFLOPS type code, with a much smaller part executing the slow Whetstone scalar MFLOPS type functions, the Pi 5 can appear to be faster than that Core i5 PC. This is shown in the (tongue-in-cheek) example performance calculations below.
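The arithmetic behind the performance calculations is just work divided by speed, summed over the scalar and vector parts. A minimal sketch, assuming 5000 million scalar operations at the Whetstone MFLOPS rate and 50000 million vector operations at the 4-thread MP-MFLOPS rate, as in the table in this post; the names used (SYSTEMS, run_seconds) are illustrative only.

```python
SCALAR_MOPS = 5000    # million scalar operations (assumed workload)
VECTOR_MOPS = 50000   # million vector operations (assumed workload)

# name -> (Whetstone MFLOPS, 4-thread MP-MFLOPS), copied from the tables in this post
SYSTEMS = {
    "i5 SSE": (1077, 86194),
    "i5 AVX": (1077, 153955),
    "Pi 5": (1206, 80947),
}


def run_seconds(scalar_mflops, vector_mflops):
    """Return (scalar secs, vector secs, total secs).

    Each part is rounded to two decimals before summing, which is what the
    totals in the table appear to do.
    """
    scalar_secs = round(SCALAR_MOPS / scalar_mflops, 2)
    vector_secs = round(VECTOR_MOPS / vector_mflops, 2)
    return scalar_secs, vector_secs, round(scalar_secs + vector_secs, 2)


totals = {}
for name, (scalar, vector) in SYSTEMS.items():
    s, v, total = run_seconds(scalar, vector)
    totals[name] = total
    print(f"{name:7s} scalar {s:5.2f}  vector {v:5.2f}  total {total:5.2f} secs")
```

With these assumed workloads the totals come out at 5.22, 4.96 and 4.77 seconds, so the Pi 5 edges ahead because its faster scalar speed outweighs its slower vector part. A different scalar to vector split can change the ranking.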
Note the Pi 5 / Cray 1 comparisons, particularly the Livermore Loops results, this being the benchmark originally run to validate the required performance of the first Cray 1 system. Here, Gmean (geometric mean) MFLOPS was the official average, and on that measure the Raspberry Pi 5 is indicated as being 194 times faster.
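The Pi 5 / Cray 1 ratios are simple divisions of the two rows of results. A short sketch, using figures copied from the table in this post, with column labels and variable names (COLUMNS, CRAY_1, PI_5, ratios) chosen here purely for illustration:

```python
# Column order: MHz, Livermore Loops Max / Gmean / Min, Linpack, Whetstone MWIPS / MFLOPS
COLUMNS = ["MHz", "LL Max", "LL Gmean", "LL Min", "Linpack", "MWIPS", "Whets MFLOPS"]
CRAY_1 = [80, 82.1, 11.9, 1.2, 27, 16.2, 6.0]
PI_5 = [2400, 10577, 2308, 734, 4136, 5843, 1206]

# Ratio of Pi 5 to Cray 1 for each column, rounded to the nearest whole number
ratios = {col: round(pi / cray) for col, pi, cray in zip(COLUMNS, PI_5, CRAY_1)}

# The same Gmean comparison after allowing for the difference in clock speed
gmean_per_mhz_gain = (PI_5[2] / PI_5[0]) / (CRAY_1[2] / CRAY_1[0])

for col, ratio in ratios.items():
    print(f"{col:13s} {ratio:4d}")
print(f"Gmean MFLOPS per MHz gain: {gmean_per_mhz_gain:.1f}")
```

The Gmean ratio comes out at 194, but the Pi 5 also runs at 30 times the clock speed, so on a per MHz basis the gain is about 6.5 times.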
Code:
Classic Benchmarks

                          LLLOOPS LLLOOPS LLLOOPS                                   Gmean
                           MFLOPS  MFLOPS  MFLOPS  MFLOPS   MWIPS  MFLOPS  Device  MFLOPS
CPU                   MHz     Max   Gmean     Min Linpack   Whets   Whets    Year per MHz
Main Columns                            V               V       V                       V

Cray 1                 80    82.1    11.9     1.2      27    16.2     6.0    1978    0.15

Windows or Linux PCs
AMD 80386              40     1.2     0.6     0.2     0.5     5.7     0.8    1991    0.02
80486 DX2              66     4.9     2.7     0.7     2.6      15     3.3    1992    0.04
Pentium                75      24     7.7     1.3     7.6      48      11    1994    0.10
Pentium               100      34      12     2.1      12      66      16    1994    0.12
Pentium               200      66      22     3.8             132      31    1996    0.11
AMD K6                200      68      22     2.7      23     124      26    1997    0.11
Pentium Pro           200     121      34     3.6      49     161      41    1995    0.17
Pentium II            300     177      51     5.5      48     245      61    1997    0.17
AMD K62               500     172      55     6.0      46     309      67    1999    0.11
Pentium III           450     267      77     8.3      62     368      92    1999    0.17
Pentium 4            1700    1043     187      19     382     603     146    2002    0.11
Athlon Tbird         1000    1124     201      23     373     769     161    2000    0.20
Core 2               1830    1650     413      40     998    1557     374    2007    0.23
Core i5              2300    2326     438      35    1065    1813     428    2009    0.19
Athlon 64            2150    2484     447      48     812    1720     355    2005    0.21
Phenom II            3000    3894     644      64    1413    2145     424    2009    0.21
Core i7 930          3066    2751     732      68    1765    2496     576    2010    0.24
Core i7 4820K        3900    5508    1108      88    2680    3114     716    2013    0.28
Core i5 1135G7       4150    7505    1387      92    3541    3293     802    2021    0.33

Linux PCs AVX New Compiler
Core i7 4820K        3900   12878    2615     597    5098    5887    1174    2013    0.67
Core i5 1135G7       4150   19794    3568     943    6998    6477    1077    2021    0.86

Raspberry Pi          700     140      55      17      42     271      94    2013    0.08
Raspberry Pi 2B       900     248     115      42     120     525     244    2015    0.13
Raspberry Pi 3B      1200     436     184      56     180     725     324    2016    0.15
Raspberry Pi 4B      1500    1861     679     180     957    1883     415    2019    0.35
Raspberry Pi 4B 64b  1500    2491     730     212    1060    2269     476    2019    0.35
Raspberry Pi 5 64b   2400   10577    2308     734    4136    5843    1206    2023    0.96

Core i5 / Pi 5       1.73    1.87    1.55    1.28    1.69    1.11    0.89            0.90
Pi 5 / Cray 1          30     129     194     612     153     361     201

#########################################################################################

MP-MFLOPS             ------------MFLOPS-------------  --------MFLOPS/MHz--------
Threads           MHz       1       2       4       8      1      2      4      8

Core i7 SSE      3900   23355   46883   88776  119313    6.0   12.0   22.8   30.6
Core i7 AVX      3900   45459   91277  172443  184765   11.7   23.4   44.2   47.4
Core i5 SSE      4150   33273   64727   86194  119426    8.0   15.6   20.8   28.8
Core i5 AVX      4150   64946  128515  153955  225265   15.6   31.0   37.1   54.3
Core i5 AVX512   4150   94417  185785  324870  325915   22.8   44.8   78.3   78.5
Pi 5             2400   21519   42488   80947   85086    9.0   17.7   33.7   35.5

#########################################################################################

Performance Calculations

                  i5 SSE          i5 AVX            Pi 5
    MOPS  MFLOPS    secs  MFLOPS    secs  MFLOPS    secs

    5000    1077    4.64    1077    4.64    1206    4.15
   50000   86194    0.58                   80947    0.62
   50000                  153955    0.32

   Total            5.22            4.96            4.77
Statistics: Posted by RoyLongbottom — Wed Jan 17, 2024 5:05 pm