An in-depth analysis of the Cortex-A15 architecture: where are its strengths?

This year's trend in new mobile phones is simply an across-the-board move to quad-core, but not all quad-core chips are alike, and actual performance varies widely. For example, quad-core phones for the entry-level and mainstream market generally use Cortex-A7 and Cortex-A9 CPU cores. These cores have lower performance, cost and heat output, which is why they are popular in the entry-level market.

High-end smartphones have seen some new changes. In addition to the quad-core chips based on Qualcomm's Krait architecture that emerged last year, ARM's own Cortex-A15 has also arrived in quad-core phones, in chips such as Samsung's Exynos 5 Octa and NVIDIA's Tegra 4.

The Cortex-A15 is the most powerful CPU core architecture in the ARM Cortex-A family and was announced in 2010. Texas Instruments was the first (in 2011) to announce a processor based on this architecture, the OMAP 5.

Compared with ARM's Cortex-A7, Cortex-A9 and other microarchitectures, the Cortex-A15 is very different.

Both the A15 and the A9 support out-of-order execution, but the Cortex-A15 has twice the instruction issue ports and execution resources, 50% more instruction decode capability, stronger dynamic branch prediction (it uses a multi-level branch target buffer), and wider instruction fetch bandwidth (128 bits vs 64 bits), all of which make the A15's pipeline more efficient. In addition, the A15 uses a VFPv4 floating-point unit, which can execute fused multiply-add (FMA) instructions, and the core also supports hardware divide instructions. The A9's peak vector floating-point performance is only about half that of the A15.

In practice, however, the A15's real rival is Krait, Qualcomm's own ARMv7-A compatible processor architecture. Qualcomm has not disclosed many of Krait's architectural details. Roughly, it has three instruction decode ports (the same as the A15), seven execution ports (eight on the A15), four issue ports (eight on the A15), and a 4 KB + 4 KB L0 cache with single-cycle latency.

If you use the old Dhrystone DMIPS/MHz figure as a performance measure, Krait scores 3.3, the A9 scores 2.5, and the A15 scores 3.5. On paper, Krait really does look like a fitting rival for the A15.
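To put these per-clock figures side by side, the following minimal Python sketch expresses each core as a multiple of the A9. The dictionary and function names are illustrative only; the numbers are just the DMIPS/MHz values quoted above.

```python
# DMIPS/MHz figures as quoted in the text (not independently measured).
DMIPS_PER_MHZ = {"Cortex-A9": 2.5, "Krait": 3.3, "Cortex-A15": 3.5}


def relative_to(baseline, scores):
    """Return each core's per-clock score as a multiple of the baseline core."""
    base = scores[baseline]
    return {core: value / base for core, value in scores.items()}


if __name__ == "__main__":
    for core, ratio in relative_to("Cortex-A9", DMIPS_PER_MHZ).items():
        print(f"{core}: {ratio:.2f}x the A9 per clock")
```

On these numbers Krait comes out at about 1.32 times the A9 per clock and the A15 at about 1.40 times, so the two differ by only around 6% on this metric.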

However, Dhrystone's shortcomings are obvious. It fits entirely within the CPU's L1 cache, which means it cannot give a meaningful assessment of how many architectural differences affect actual performance. These include the L2 cache (the A15 uses an integrated design while Krait uses a separate design, and the integrated design can reduce the large delays caused by memory swapping), the hardware efficiency and complexity of out-of-order execution, and the memory subsystem (the A15's memory unit can execute a load instruction ahead of time under certain conditions, and it is not clear whether Krait has such a capability).

Of course, the DMIPS figure ARM uses is actually not the Dhrystone of 28 years ago but comes from EEMBC's CoreMark (in effect an improved version of the former, mainly with less room for pre-optimization and stricter run rules). But CoreMark can also fit within the L1 cache of most processors today, so Dhrystone's failure to reflect the real applications of today's mobile devices still applies here.

As application environments grow more complex, properly evaluating the performance of a mobile processor is becoming harder. Today's mobile workloads, such as web browsing, 3D games, audio and video, and artificial intelligence, involve large amounts of data processing and cannot fit entirely in the L1 cache.

At this point, the experience and testing methods developed for desktop performance evaluation can be applied to mobile devices. For CPU testing, the most reasonable method is to take the source code of real applications covering a range of computation scales, compile it to native code, and run it. Under these conditions both the compute units and the memory subsystem of the mobile device are fully exercised, and the test results are the most informative.

The CPU test formally recognized by the industry (both the computer industry and academic research) is SPEC CPU from SPEC.org. It is distributed as source code, which testers compile to native code for testing. Many processors use SPEC CPU as their most important performance metric from the very beginning of development.

The latest version of SPEC CPU is CPU2006, but CPU2006 targets the application environment of current desktop, workstation and server processors. Its requirements for memory capacity (CPU2006 supports multi-threaded testing, so the memory needed is quite high, and 16 GB is already a bit tight for an 8-thread processor) and for storage space (several GB before compilation and more than 10 GB after) are high, so running CPU2006 on current mobile devices is not realistic.

SPEC CPU is updated every few years. The version before CPU2006 was CPU2000, whose integer speed tests can run on a mobile device with 1 GB of memory. In the past, some CPU2000 tests were even ported to GPUs for accelerated performance testing.

The ARM camp rarely publishes SPEC CPU results, and there is a reason for this. For a long time ARM devices had only a few hundred megabytes of memory, and even less was left for programs once the operating system was loaded. In addition, because power saving came first, the performance of ARM processors was actually not very good.

Interestingly, NVIDIA, a member of the ARM camp, announced CPU2000 integer results this year when it released the Tegra 4: on NVIDIA's reference platform clocked at 1.9 GHz, the Tegra 4 scores 1168 in SPEC CPU2000 int_base. This result is comparable to that of the 2 GHz AMD K8 Sledgehammer published on SPEC.org in the fourth quarter of 2003.

NVIDIA also ran the CPU2000 test on the Xiaomi Phone 2 (which uses the Qualcomm Snapdragon S4 Pro APQ8064 at 1.7 GHz), and estimated CPU2000 results for the S600 and S800 from their changes in IPC (instructions per cycle) and frequency relative to the S4 Pro:

Judging from the chart, the S600's CPUINT2000_base result is less than half that of the Tegra 4, which largely reflects the real-application gap between the Cortex-A15 and Krait processors.
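The kind of estimate described above, scaling a measured score by relative IPC and clock frequency, can be sketched roughly as follows. This is only an illustration under the assumption that performance is proportional to IPC times frequency; it is not NVIDIA's actual method, the function and parameter names are hypothetical, and the inputs in the usage section are placeholders rather than figures from the chart.

```python
def estimate_score(measured_score, ipc_ratio, freq_ratio):
    """Scale a measured benchmark score by relative IPC and clock frequency.

    Assumes score is proportional to IPC x frequency, which ignores
    memory-bound behaviour and thermal throttling. Names are illustrative.
    """
    if measured_score <= 0 or ipc_ratio <= 0 or freq_ratio <= 0:
        raise ValueError("all inputs must be positive")
    return measured_score * ipc_ratio * freq_ratio


if __name__ == "__main__":
    # Hypothetical placeholder inputs, NOT values from NVIDIA's chart.
    baseline = 100.0   # score measured on the reference chip
    ipc_gain = 1.10    # newer core assumed to do 10% more work per cycle
    clock_gain = 1.20  # newer chip assumed to run at a 20% higher clock
    print(round(estimate_score(baseline, ipc_gain, clock_gain), 1))
```

An estimate like this is only as good as its proportionality assumption, which is one reason a directly measured result is preferable.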

It should be pointed out that the test platforms on each side also have some influence. For example, NVIDIA does not say whether the Xiaomi Phone 2's CPU frequency was throttled during this test.

In general, when the APQ8064 runs all four cores at full speed for a period of time, its frequency drops from the 1.7 GHz maximum because of overheating. That said, the CPU2000 integer results NVIDIA announced here are in speed mode, which is a single-threaded test that uses only one CPU core.

Unfortunately, Qualcomm has not raised any objection to this result (Qualcomm is said not to care much about how its processors rank on performance, and the running joke is that it sells the baseband and throws in the CPU for free), and configuring CPU2000 is quite complicated for the average person, so for now this result has not been corroborated by third parties testing on the same platforms.

VIA Technologies published a document when it released the Nano X2 processor, which also used CPU2000 to test the Nano X2 1.2+ GHz and the Atom D525. Their CPU2000 INT scores with the gcc compiler were 799 and 582 respectively, and with the Intel compiler they were 955 and 725 respectively.
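As a rough illustration of how much the compiler alone moves these scores, the short Python sketch below computes the percentage gain of the Intel compiler build over the gcc build for each chip. The data layout and function name are illustrative; the scores are the ones quoted above.

```python
# SPEC CPU2000 INT scores as quoted in the text from the VIA document.
SCORES = {
    "Nano X2 1.2+GHz": {"gcc": 799, "intel": 955},
    "Atom D525": {"gcc": 582, "intel": 725},
}


def compiler_uplift(scores):
    """Percentage gain of the Intel compiler build over the gcc build."""
    return {
        chip: (result["intel"] / result["gcc"] - 1.0) * 100.0
        for chip, result in scores.items()
    }


if __name__ == "__main__":
    for chip, gain in compiler_uplift(SCORES).items():
        print(f"{chip}: +{gain:.1f}% with the Intel compiler")
```

On these figures the compiler choice shifts the score by roughly 20% for the Nano X2 and roughly 25% for the Atom D525, which is why the compiler used for any published CPU2000 result matters.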

The Tegra 4's CPU implements the ARMv7-A instruction set, so the compiler was probably armcc or gcc. PGI, which NVIDIA recently acquired, is a veteran compiler vendor and might be able to provide NVIDIA with an internal beta, but PGI has never released an ARM compiler before.
