A quad-core processor is detected as dual-core. Which is better: more cores or higher frequency? How to increase laptop battery life

  • Tutorial

In this article I will try to describe the terminology used for systems capable of executing several programs in parallel: multi-processor, multi-core and multi-threaded. Different kinds of parallelism appeared in the IA-32 CPU at different times and in a somewhat inconsistent order. It is quite easy to get confused by all this, especially since operating systems carefully hide the details from less sophisticated application programs.

The purpose of the article is to show that with all the variety of possible configurations of multi-processor, multi-core and multi-threaded systems, programs running on them are given opportunities both for abstraction (ignoring the differences) and for exploiting the specifics (the ability to discover the configuration programmatically).

Warning about ®, ™ signs in the article

A note of mine explains why company employees should use trademark notices in public communications. In this article I had to use them quite often.

CPU

Of course, the oldest, most often used and controversial term is “processor”.

In the modern world, a processor is what we buy in a beautiful retail box or a not-so-nice OEM package: an indivisible entity inserted into a socket on the motherboard. Even if there is no socket and it cannot be removed, that is, if it is soldered on, it is one chip.

Mobile systems (phones, tablets, laptops) and most desktops have a single processor. Workstations and servers sometimes boast two or more processors on a single motherboard.

Supporting multiple CPUs in a single system requires numerous design changes. At a minimum, it is necessary to provide a physical connection for them (multiple sockets on the motherboard), to resolve issues of processor identification (see later in this article, as well as my note), of coordinating memory accesses and of interrupt delivery (the interrupt controller must be able to route interrupts to multiple processors), and, of course, to have support from the operating system. Unfortunately, I could not find a documented mention of the first multiprocessor system built on Intel processors, but Wikipedia claims that Sequent Computer Systems was already shipping them in 1987, using Intel 80386 processors. Support for multiple chips in one system became widespread starting with the Intel® Pentium.

If there are several processors, each of them has its own socket on the board. Each has its own complete, independent copies of all resources: registers, execution units, caches. What they share is the common memory - RAM. Memory can be connected to them in various and rather non-trivial ways, but that is a separate story beyond the scope of this article. What matters is that in any case, executable programs must be given the illusion of homogeneous shared memory accessible from all processors in the system.


Ready for takeoff! Intel® Desktop Board D5400XS

Core

Historically, multi-core IA-32 processors appeared later than Intel® HyperThreading, but in the logical hierarchy the core comes next.

It would seem that the more processors a system has, the higher its performance (on tasks that can use all the resources). However, if the cost of communication between them is too high, all the gains from parallelism are killed by the long delays in transferring shared data. This is exactly what is observed in multiprocessor systems: both physically and logically the processors are very far apart. Effective communication under such conditions requires specialized buses such as the Intel® QuickPath Interconnect. None of this, of course, reduces the energy consumption, size or price of the final solution. High integration of components should come to the rescue: the circuits executing the parts of a parallel program must be brought closer to each other, preferably onto one chip. In other words, several cores should be organized within one processor, identical to each other in everything but working independently.

The first multi-core IA-32 processors from Intel were introduced in 2005. Since then, the average number of cores in server, desktop and now mobile platforms has been growing steadily.

Unlike two single-core processors in the same system that share only memory, two cores can also share caches and other memory-related resources. Most often, first-level caches remain private (each core has its own), while the second and third levels can be either shared or separate. This organization reduces data-delivery delays between neighboring cores, especially if they are working on a common task.


Micrograph of a quad-core Intel processor codenamed Nehalem. There are separate cores, a common third-level cache, as well as QPI links to other processors and a common memory controller.

Hyperthread

Until about 2002, the only way to get an IA-32 system capable of running two or more programs in parallel was to use a multiprocessor system. The Intel® Pentium® 4, as well as the Xeon line codenamed Foster (NetBurst), introduced a new technology - hyperthreads, Intel® HyperThreading (hereafter HT).

There is nothing new under the sun. HT is a special case of what the literature calls simultaneous multithreading (SMT). Unlike "real" cores, which are complete and independent copies, with HT only a part of the internal nodes is duplicated within one processor - primarily those responsible for storing the architectural state, the registers. The execution units responsible for organizing and processing data are not duplicated, and at any given moment are used by at most one of the threads. Like cores, hyperthreads share caches, but from which level onward depends on the specific system.

I won't try to explain all the pros and cons of SMT designs in general and HT designs in particular. The interested reader can find quite detailed discussions of the technology in many sources and, of course, on Wikipedia. However, I will note the following important point, which explains the current limits on the number of hyperthreads in real products.

Thread restrictions
In what cases is the presence of "dishonest" multi-core in the form of HT justified? When one application thread is not able to load all the execution units inside the core, they can be "lent" to another thread. This is typical for applications whose bottleneck is not computation but data access - applications that frequently generate cache misses and have to wait for data to be delivered from memory. During this time, a core without HT is forced to idle. HT makes it possible to quickly switch the free execution units to the other architectural state (since the state is duplicated) and execute its instructions. This is a special case of a technique called latency hiding, where one long operation, during which useful resources would sit idle, is masked by the parallel execution of other tasks. If the application already makes heavy use of the core's resources, the presence of hyperthreads will not provide any speedup - "honest" cores are needed here.

Typical desktop and server applications, designed for general-purpose machine architectures, do have the potential for the kind of parallelism HT enables. However, this potential is quickly exhausted. Perhaps for this reason, on almost all IA-32 processors the number of hardware hyperthreads does not exceed two. In typical scenarios the gain from three or more hyperthreads would be small, while the losses in die size, power consumption and cost are significant.

A different situation is observed in the typical tasks performed on video accelerators, which is why those architectures use SMT with a larger number of threads. Since Intel® Xeon Phi coprocessors (introduced in 2010) are ideologically and genealogically quite close to video cards, they may have four hyperthreads on each core - a configuration unique to IA-32.

Logical processor

Of the three described "levels" of parallelism (processors, cores, hyperthreads), some or even all may be missing in a particular system. This is affected by BIOS settings (multi-core and multi-threading are disabled independently), by microarchitecture features (for example, HT was absent from the Intel® Core™ Duo but was brought back with the release of Nehalem), and by system events (multi-processor servers can shut down failed processors when faults are detected and continue to "fly" on the remaining ones). How is this multi-level zoo of concurrency visible to the operating system and, ultimately, to application programs?

Further, for convenience, we denote the number of processors, cores and threads in a given system by the triple (x, y, z), where x is the number of processors, y is the number of cores in each processor, and z is the number of hyperthreads in each core. From now on I will call this triple a topology - an established term that has little to do with the branch of mathematics. The product p = xyz defines the number of entities called the logical processors of the system. It defines the total number of independent application process contexts in a shared-memory system executing in parallel that the operating system is forced to take into account. I say "forced" because it cannot control the execution order of two processes on different logical processors. This also applies to hyperthreads: although they run "sequentially" on the same core, the specific order is dictated by the hardware and cannot be observed or controlled by programs.

Most often, the operating system hides the physical topology of the system it is running on from end applications. For example, the following three topologies: (2, 1, 1), (1, 2, 1) and (1, 1, 2) - will all be presented by the OS as two logical processors, although the first of them has two processors, the second two cores, and the third just two threads.
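
As a quick illustration of this notation (my own sketch, not from the original article), the product p = xyz can be computed directly; note that all three topologies above give the same answer:

    #include <stdio.h>

    /* Topology triple (x, y, z): processors, cores per processor,
     * hyperthreads per core. Logical processors: p = x * y * z. */
    struct topology { int x, y, z; };

    static int logical_processors(struct topology t) {
        return t.x * t.y * t.z;
    }

    int main(void) {
        struct topology examples[] = { {2, 1, 1}, {1, 2, 1}, {1, 1, 2} };
        for (int i = 0; i < 3; i++)
            printf("(%d, %d, %d) -> %d logical processors\n",
                   examples[i].x, examples[i].y, examples[i].z,
                   logical_processors(examples[i]));
        return 0;  /* prints 2 for every example */
    }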


Windows Task Manager shows 8 logical processors; but how many processors, cores and hyperthreads is that?


Linux top shows 4 logical processors.

This is quite convenient for application creators - they do not have to deal with hardware features that are often unimportant for them.

Software definition of topology

Of course, abstracting the topology away into a single number of logical processors in some cases creates enough ground for confusion and misunderstanding (in heated Internet disputes). Computing applications that want to squeeze maximum performance out of the hardware require detailed control over where their threads will be placed: closer to each other, on adjacent hyperthreads, or on the contrary further apart, on different processors. The speed of communication between logical processors within the same core or processor is much higher than the speed of data transfer between processors. The possibility of non-uniform organization of working memory complicates the picture further.

Information about the topology of the system as a whole, as well as about the position of each logical processor in it, is available on IA-32 through the CPUID instruction. Since the advent of the first multiprocessor systems, the logical-processor identification scheme has been extended several times. To date, parts of it are contained in leaves 1, 4 and 11 (0xB) of CPUID. Which leaf to look at can be determined from a flowchart given in the original article.

I will not bore you here with all the details of the individual parts of this algorithm; if there is interest, the next part of this article can be devoted to them, and I will refer the interested reader to a source that examines this question in as much detail as possible. Here I will first briefly describe what the APIC is and how it relates to topology. Then we will look at working with leaf 0xB (eleven in decimal), which is currently the last word in "APIC building".

APIC ID
The local APIC (advanced programmable interrupt controller) is a device (now part of the processor) responsible for handling the interrupts that arrive at a specific logical processor. Each logical processor has its own APIC, and each APIC in the system must have a unique APIC ID value. This number is used by interrupt controllers for addressing when delivering messages, and by everyone else (for example, the operating system) to identify logical processors. The specification of this interrupt controller has evolved from the Intel 8259 PIC through Dual PIC, APIC and xAPIC to x2APIC.

Currently, the number stored in the APIC ID has reached a full 32 bits of width, although in the past it was limited to 16, and even earlier to just 8 bits. Today remnants of the old days are scattered throughout CPUID, but CPUID.0xB.EDX returns all 32 bits of the APIC ID. Each logical processor that independently executes the CPUID instruction will see a different value returned.
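
Here is a minimal sketch of such a query (my own, not from the article; it assumes a recent GCC or Clang, whose cpuid.h header wraps the CPUID instruction, and it reports whichever logical processor the thread happens to be running on - pinning the thread to a CPU is omitted):

    #include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
    #include <stdio.h>

    int main(void) {
        unsigned eax, ebx, ecx, edx;

        /* Leaf 0xB, subleaf 0: EDX holds the full 32-bit x2APIC ID of the
         * logical processor executing the instruction. */
        if (!__get_cpuid_count(0x0B, 0, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 0xB is not supported here\n");
            return 1;
        }
        printf("x2APIC ID: 0x%08x\n", edx);
        return 0;
    }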

Clarification of family ties
The APIC ID value by itself tells you nothing about the topology. To find out which two logical processors sit inside one core (i.e. are "sibling" hyperthreads), which two sit inside the same processor, and which are in entirely different processors, you need to compare their APIC ID values. Depending on the degree of kinship, some of their bits will coincide. This information is contained in the subleaves of CPUID.0xB, selected by an operand in ECX. Each subleaf describes in EAX the position of the bit field of one of the topology levels (more precisely, the number of bits by which the APIC ID must be shifted right to strip off the lower topology levels), and in ECX the type of that level - hyperthread, core or processor.

For logical processors located inside the same core, all APIC ID bits will match except those belonging to the SMT field. For logical processors located in the same processor, all bits will match except the Core and SMT fields. Since the number of subleaves of CPUID.0xB may grow, this scheme will allow topologies with more levels to be described should the need arise in the future; moreover, intermediate levels could be introduced between the existing ones.
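
And a sketch of the comparison machinery (same GCC/Clang assumption as above): walking the subleaves to obtain the per-level shift values that are then applied to the APIC IDs:

    #include <cpuid.h>
    #include <stdio.h>

    /* Walk CPUID.0xB subleaves: EAX[4:0] gives the number of bits to shift
     * the APIC ID right to strip that level, ECX[15:8] gives the level type
     * (1 = SMT/hyperthread, 2 = core). */
    int main(void) {
        for (unsigned sub = 0; ; sub++) {
            unsigned eax, ebx, ecx, edx;
            if (!__get_cpuid_count(0x0B, sub, &eax, &ebx, &ecx, &edx))
                break;
            unsigned type = (ecx >> 8) & 0xFF;
            if (type == 0)                /* level type 0: no more levels */
                break;
            unsigned shift = eax & 0x1F;
            printf("subleaf %u: %s level, shift %u\n", sub,
                   type == 1 ? "SMT" : type == 2 ? "core" : "other", shift);
            /* Two logical processors are hyperthread siblings if their
             * APIC IDs are equal after >> by the SMT-level shift; they
             * share a package if equal after >> by the core-level shift. */
        }
        return 0;
    }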

An important consequence of this scheme is that the set of all APIC IDs of all logical processors in a system may contain "holes", i.e. the IDs will not be sequential. For example, on a multi-core processor with HT disabled, all APIC IDs may turn out to be even, since the least significant bit, which encodes the hyperthread number, will always be zero.

I note that CPUID.0xB is not the only source of information about logical processors available to the operating system. The list of all processors available to it, along with their APIC ID values, is encoded in the MADT ACPI table.

Operating systems and topology

Operating systems provide information about the topology of logical processors to applications using their own interfaces.

On Linux, the topology information is contained in the /proc/cpuinfo pseudo-file, as well as in the output of the dmidecode command. In the example below, I filter the contents of cpuinfo on a system with four logical processors, leaving only the entries related to topology:


    ggg@shadowbox:~$ cat /proc/cpuinfo |grep "processor\|physical\ id\|siblings\|core\|cores\|apicid"
    processor       : 0
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 2
    apicid          : 0
    initial apicid  : 0
    processor       : 1
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 2
    apicid          : 1
    initial apicid  : 1
    processor       : 2
    physical id     : 0
    siblings        : 4
    core id         : 1
    cpu cores       : 2
    apicid          : 2
    initial apicid  : 2
    processor       : 3
    physical id     : 0
    siblings        : 4
    core id         : 1
    cpu cores       : 2
    apicid          : 3
    initial apicid  : 3

On FreeBSD, the topology is reported via the sysctl mechanism in the kern.sched.topology_spec variable as XML:


    user@host:~$ sysctl kern.sched.topology_spec
    kern.sched.topology_spec: <groups>
     <group level="1">
      <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
      <children>
       <group level="2">
        <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
        <children>
         <group level="3">
          <cpu count="2" mask="0x3">0, 1</cpu>
          <flags>
           <flag name="THREAD">THREAD group</flag>
           <flag name="SMT">SMT group</flag>
          </flags>
         </group>
         <group level="3">
          <cpu count="2" mask="0xc">2, 3</cpu>
          <flags>
           <flag name="THREAD">THREAD group</flag>
           <flag name="SMT">SMT group</flag>
          </flags>
         </group>
         <group level="3">
          <cpu count="2" mask="0x30">4, 5</cpu>
          <flags>
           <flag name="THREAD">THREAD group</flag>
           <flag name="SMT">SMT group</flag>
          </flags>
         </group>
         <group level="3">
          <cpu count="2" mask="0xc0">6, 7</cpu>
          <flags>
           <flag name="THREAD">THREAD group</flag>
           <flag name="SMT">SMT group</flag>
          </flags>
         </group>
        </children>
       </group>
      </children>
     </group>
    </groups>
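
The same XML can be fetched programmatically; a minimal sketch of my own (FreeBSD-specific, using the standard sysctlbyname interface):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Query kern.sched.topology_spec: first ask for the size, then fetch. */
    int main(void) {
        size_t len = 0;
        if (sysctlbyname("kern.sched.topology_spec", NULL, &len, NULL, 0) != 0)
            return 1;
        char *buf = malloc(len);
        if (buf == NULL ||
            sysctlbyname("kern.sched.topology_spec", buf, &len, NULL, 0) != 0)
            return 1;
        printf("%s\n", buf);
        free(buf);
        return 0;
    }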

In MS Windows 8, topology information can be seen in the Task Manager.
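
Programmatically, Windows exposes the same information through the GetLogicalProcessorInformation API. A minimal sketch of mine (not from the article) that counts packages, cores and logical processors:

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Count packages, cores and logical processors via
     * GetLogicalProcessorInformation. */
    static int mask_bits(ULONG_PTR m) {
        int n = 0;
        for (; m; m >>= 1)
            n += (int)(m & 1);
        return n;
    }

    int main(void) {
        DWORD len = 0;
        GetLogicalProcessorInformation(NULL, &len);  /* ask for needed size */
        SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info = malloc(len);
        if (info == NULL || !GetLogicalProcessorInformation(info, &len))
            return 1;

        int packages = 0, cores = 0, logical = 0;
        for (DWORD i = 0; i < len / sizeof(*info); i++) {
            if (info[i].Relationship == RelationProcessorPackage)
                packages++;
            else if (info[i].Relationship == RelationProcessorCore) {
                cores++;
                logical += mask_bits(info[i].ProcessorMask);
            }
        }
        printf("packages: %d, cores: %d, logical processors: %d\n",
               packages, cores, logical);
        free(info);
        return 0;
    }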

Hi all! Sometimes a game or program does not run at full power because not all the cores are involved. In this article we will look at how to use all the cores of your processor.

But don't expect a magic wand: if a game or program does not support multiple cores, then nothing can be done short of rewriting the application from scratch.

How to run all processor cores?

There are several ways to do this, so I'll show the first one.

Go to Start - Run, or press the Win+R keys, and type msconfig. In the window that opens, go to the Boot tab and click Advanced options.

Tick Number of processors and select your maximum number of processors.

  • Go to the Task Manager: Ctrl+Shift+Esc.
  • Or Ctrl+Alt+Del, then Task Manager.
  • Or right-click on the taskbar and select Task Manager.

Go to the Processes tab and find the game's process; right-click on it. The game must be running, by the way - you can minimize it with Win+D or Alt+Tab.

Select Set affinity.

Select all processors and click OK.

To see whether all the cores are working, go to the Performance tab in the Task Manager.

You will see a load graph for each logical processor there.

If they are not all active, use Set affinity again: leave only CPU 0 checked and click OK. Close the Task Manager, open it again, repeat the same steps, select all processors and click OK.
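
The same effect can be achieved from code. A minimal Win32 sketch of mine (not from the article) that mirrors the Set affinity dialog:

    #include <windows.h>
    #include <stdio.h>

    /* What "Set affinity" does, from code: allow the current process to run
     * on all logical processors reported by the system mask. */
    int main(void) {
        DWORD_PTR process_mask, system_mask;
        if (!GetProcessAffinityMask(GetCurrentProcess(),
                                    &process_mask, &system_mask))
            return 1;
        printf("current mask: 0x%llx, system mask: 0x%llx\n",
               (unsigned long long)process_mask,
               (unsigned long long)system_mask);
        /* One bit per logical processor; setting every system bit is the
         * equivalent of ticking every CPU checkbox in the dialog. */
        if (!SetProcessAffinityMask(GetCurrentProcess(), system_mask))
            return 1;
        return 0;
    }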

In laptops, power saving is sometimes configured in such a way that the settings do not allow all cores to be used.

  • Win7: go to Control Panel - Power Options - Change plan settings - Change advanced power settings - Processor power management - Minimum processor state.
  • Win8, 10: Settings - System - Power & sleep - Additional power settings - Change plan settings - Change advanced power settings - Processor power management - Minimum processor state.

For all cores to be fully used, it should be set to 100%.

How to check how many cores are running?

Launch the Task Manager and look at the number of active cores on the Performance tab.

Do not confuse this parameter with the number of virtual processors, which is displayed to the right.

What does the number of processor cores affect?

Many people confuse the number of cores with processor frequency. If we compare it to a person, the brain is the processor and the neurons are the cores. Cores do not work in all games and applications. If, say, a game runs 2 processes - one draws the forest, the other the city - and the game is multi-core-aware, then only 2 cores are needed to load that picture. And if the game has more processes, then all the cores are used.

It can also be the other way around: a game or application may be written so that only one core performs one action, and in that situation the processor with the higher frequency and the better-built architecture will win (usually for this reason).

What are the differences between quad-core and octa-core smartphone processors? The explanation is quite simple: eight-core chips have twice as many processor cores as quad-core ones. At first glance, an eight-core processor seems twice as powerful, right? In reality, nothing of the sort happens. To understand why an eight-core processor does not double a smartphone's performance, some explanation is required. The era of eight-core processors, which only recently could only be dreamed of, has already arrived, and they are becoming increasingly widespread. But it turns out their purpose is not to increase the device's performance.

Quad- and eight-core processors. Performance

The terms "octa-core" and "quad-core" themselves reflect the number of CPU cores.

But the key difference between these two types of processors - at least as of 2015 - lies in how the processor cores are arranged.

With a quad-core processor, all cores can work simultaneously to enable fast and flexible multitasking, smoother 3D gaming, faster camera performance, and more.

Modern eight-core chips, in turn, simply consist of two quad-core sets that distribute tasks between themselves depending on the type of task. Most often, an eight-core chip contains one set of four cores with a lower clock speed than the second set. When a demanding task comes along, it is, of course, taken over by the faster set.


A more accurate term than "octa-core" would be "dual quad-core". But that doesn't sound as nice and isn't suitable for marketing purposes. That's why these processors are called eight-core.

Why do we need two sets of processor cores?

What is the point of combining two sets of processor cores that hand tasks over to one another in a single device? To ensure energy efficiency.

A more powerful CPU consumes more power, and the battery has to be charged more often. And batteries are a much weaker link in a smartphone than processors. As a result, the more powerful a smartphone's processor, the more capacious a battery it needs.

However, most smartphone tasks do not need the kind of computing performance a modern processor can provide. Navigating between home screens, checking messages and even web browsing are relatively light tasks for the processor.

But HD video, games and photo editing are exactly such tasks. So eight-core processors are quite practical, although the solution can hardly be called elegant. The weaker set of cores handles the less resource-intensive tasks; the more powerful one handles the more demanding ones. As a result, overall power consumption is reduced compared to a situation where a single high-clocked processor handled all tasks. Thus, the dual quad-core design primarily solves the problem of energy efficiency, not performance.

Technological features

All modern eight-core processors are based on ARM's so-called big.LITTLE architecture.

This eight-core big.LITTLE architecture was announced in October 2011 and allowed four low-power Cortex-A7 cores to work in conjunction with four high-performance Cortex-A15 cores. ARM has repeated this approach every year since, offering more capable chips for both sets of cores on the eight-core chip.

Some of the major mobile chip makers have focused their efforts on this "octa-core" big.LITTLE model. One of the first and most notable was Samsung's own chip, the famous Exynos. Its eight-core version has been used since the Samsung Galaxy S4, at least in some versions of the company's devices.

More recently, Qualcomm also started using big.LITTLE in its eight-core Snapdragon 810 chips. It is this processor that powers such well-known newcomers to the smartphone market as the LG G Flex 2.

At the beginning of 2015, NVIDIA introduced the Tegra X1, a new high-performance mobile processor that the company is aiming at automotive computers. The X1's main feature is its console-challenging GPU, but its CPU is also based on the big.LITTLE architecture - that is, it too is eight-core.

Is there a big difference for the regular user?

Is there a big difference between a quad-core and an eight-core smartphone processor for the average user? No; in fact it is very small, says Jon Mundy.

The term "octa-core" is somewhat confusing; it really just means a duplicated quad-core processor. The result is two independently operating quad-core sets combined on one chip for better energy efficiency.

Does every modern smartphone need an eight-core processor? There is no such need, Jon Mundy believes, citing the example of Apple, which achieves decent energy efficiency in its iPhones with only a dual-core processor.

Thus, the eight-core ARM big.LITTLE architecture is one possible solution to one of the most pressing smartphone issues: battery life. According to Jon Mundy, as soon as another solution to this problem is found, the trend of putting two quad-core sets on one chip, and similar designs, will stop.

Do you know other advantages of octa-core smartphone processors?

The article is constantly updated. Last updated 10.10.2013.

At the moment, the processor market is developing so dynamically that it is simply impossible to keep track of all the new products and keep up with progress.
But we don’t really need this.
In order to buy a processor, it is enough for us to know what the computer will be needed for, what tasks it will perform, and how much money we are willing to spend.

Today the acknowledged leaders of the processor market are the two largest companies, Intel and AMD.
They offer the widest choice of models in every price category - wide enough to make your head spin.
We will try to help you figure it out, so that you can choose and buy a productive processor for reasonable money.

Let's start with the fact that the main performance indicators of the processor are:

1) Processor architecture: a new architecture will always be more productive than the previous one (even at the same frequency).
2) Operating frequency: the higher the processor frequency, the more productive it is.
3) The size of the second- and third-level cache memory (L2 and L3).

And the secondary indicators:
4) number of cores;
5) manufacturing process;
6) instruction set;
etc.

Although nowadays resourceful in-store consultants try to focus on the number of cores, directly linking the number of cores to data-processing speed and to the performance of the computer itself.

Number of Cores?

Today there are eight-, six-, quad-, dual- and single-core processors on sale from AMD, as well as six-, quad-, dual- and single-core ones from Intel.
But for today's programs and the needs of a home gamer, a dual- or quad-core processor running at a high frequency is quite enough.
A processor with a larger number of cores (6-8) is only needed for programs that encode video and audio content, render images, or archive data.

At the moment, optimization in the gaming industry mostly targets dual-core processors; only the newest software and games are being developed for multi-threaded computing. So if you are buying a processor for gaming, a high-frequency dual-core processor will be faster than a low-frequency three- or four-core one.



So for now, gamers can pick a modern dual-core processor, choosing the solution with the best performance-to-price ratio.
It is worth bearing in mind that Intel chips also have HyperThreading technology, which allows two tasks to execute in parallel on each core: the operating system sees 2-core processors as quad-core, and 4-core ones as eight-core.
Processors with a large number of cores are in demand mainly in professional applications and video encoding.
No game is yet capable of fully loading six or eight cores.

Let's summarize a little about the cores.

For an office computer, a dual-core processor in the lower price range will be enough.
Something like a Pentium or Celeron from Intel, or an A4 or Athlon II X2 from AMD.

For a home gaming computer, you can buy a higher-frequency dual-core processor from Intel or a quad-core processor from AMD.
Something like a Core i3 or Core i5 at 3 GHz from Intel, or an A8, A10 or Phenom™ II X4 at 3 GHz from AMD.

And a "loaded" workstation or a hi-end gaming system will need a good new-generation quad-core processor.
Something like a Core i5 or Core i7 from Intel, since AMD processors are very rarely used in high-performance machines.

Core i3, Core i5 and Core i7 processors are covered in a separate article.

CPU performance?

As stated above, an important parameter is the architecture on which the processor is based. The newer the architecture, the faster the processor performs in applications and games, since any successive architecture, whether from Intel or AMD, is always more productive than the previous one.
At the moment, the current families are Haswell (4th generation) and Ivy Bridge (3rd generation) from Intel, as well as the Piledriver architecture of AMD's Richland and Trinity families.

Processor performance also depends on its operating frequency: the higher the frequency, the more productive the processor. Typical core frequencies at the moment are 3 GHz and higher.
But comparing AMD and Intel processors at the same clock frequency does not mean they are equal in performance.
Architectural features allow Intel processors to show higher performance even at lower frequencies than their competitors.

Note: you cannot simply add up the frequencies of two cores. A dual-core processor is specified as two cores at XX GHz each, not as one core at twice XX GHz.

Another performance parameter is the size of the ultra-fast second- and third-level cache memory (L2 and L3).
This is fast-access memory designed to speed up access to the data the processor is working on.
The larger the cache memory, the higher the performance.

Note: Core 2 Duo and Core 2 Quad have only L2; Core i5 and Core i7 have L2+L3. AMD Athlon™ II X2 processors have only L2; Phenom™ II X4 have L2+L3.

For the earlier Core 2 family, another indicator was the processor's FSB frequency - the frequency of the bus through which the processor communicates with the RAM.
The higher the FSB frequency, the higher the processor's performance.

Note: Core i3, Core i5 and Core i7 processors from Intel, like the latest AMD processors, have no FSB system bus - data is transferred between memory and processor directly.
This data-transfer method significantly increased performance.
Processors of the Core i7 LGA1366 family also have no FSB bus, but use the high-speed QPI bus instead.

The manufacturing process (the processor's design norm) primarily determines the size of the elements that make up the processor.
In particular, the heat dissipation and power consumption of modern processors depend on the manufacturing process.
The smaller this value, the less heat the processor generates and the less energy it consumes.
Earlier Core 2 processors were made on 45-65 nm processes. The newer fourth- and third-generation Haswell and Ivy Bridge Core i3/i5/i7 use 22 nm, while the second-generation Sandy Bridge Core i3/i5/i7 from Intel and Bulldozer from AMD are made on a 32 nm process.

The instruction set is the set of control codes and data-addressing methods the processor accepts; such a command system is tied to a specific type of processor.
The wider the processor's instruction set, the better and faster data is processed.
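
As a side note (my own sketch, not from the article), on x86 you can check at run time which instruction-set extensions a processor supports; GCC and Clang provide the __builtin_cpu_supports builtin for exactly this:

    #include <stdio.h>

    /* Query CPUID-reported instruction-set extensions at run time. */
    int main(void) {
        __builtin_cpu_init();  /* required before __builtin_cpu_supports */
        printf("SSE4.2: %s\n", __builtin_cpu_supports("sse4.2") ? "yes" : "no");
        printf("AVX:    %s\n", __builtin_cpu_supports("avx")    ? "yes" : "no");
        printf("AVX2:   %s\n", __builtin_cpu_supports("avx2")   ? "yes" : "no");
        return 0;
    }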

Box configuration (BOX) or tray (Tray/OEM)?

A boxed (BOX) processor is a kit containing:
- the processor itself;
- a cooler with thermal paste applied (heatsink + fan);
- instructions and documentation.

A distinctive feature of the BOX package is the extended warranty on the processor: 3 years.
BOX processors are better for office and home multimedia systems where there are no plans to replace the cooling with something more efficient.
But BOX processors are a little more expensive than the same TRAY ones.

A tray processor (Tray/OEM) is just the processor itself - no cooler or documents.

Unlike BOX, the warranty on a Tray processor is only 1 year.
Tray/OEM processors are used by companies that assemble ready-made branded computers, and also by enthusiast gamer-overclockers, for whom the warranty (overclocking voids it anyway) and the stock cooling do not matter - a more efficient cooler is installed on the processor right away.
Tray processors are slightly cheaper.

Intel or AMD?

There has always been fierce debate on this topic at forums and conferences. In general, this topic is eternal. Intel supporters will argue that these processors are better than the competition in every way. And vice versa. I myself am a supporter of Intel.

If we compare processors from these two companies with the same frequency and number of cores, then Intel processors will be more productive. However, in the price range, AMD has the advantage.

If you are assembling a budget system for yourself with minimal finances, then AMD processors are your choice. If you have a gaming or productivity computing system, then the choice should be made in favor of Intel.

There is one more point: motherboards for Intel processors are also more expensive, while the AMD platform is correspondingly cheaper. When choosing a processor for your PC, you need to decide on your priorities: assemble an inexpensive system on AMD, or a more productive but more expensive one based on Intel.

Each company has many processor models in its assortment, ranging from budget ones, for example, Celeron from Intel and Sempron/Duron from AMD, to top-end Core i7 from Intel, A10 from AMD.

In different applications the results differ quite a bit, so in some AMD processors win and in others Intel, and the choice is always up to the user.

AMD simply has one undeniable advantage - the price. And one drawback: AMD processors are not as structurally reliable and run a little hotter.

Intel's advantage is that its processors are more reliable and stable, and also run cooler. The disadvantage: the price is higher than the competitor's.

Judging by current tests, the gaming performance of Intel and AMD processors looks like this:




Let's summarize:

This means that to buy the most productive gaming processor for a computer, you need to choose a processor with:
1) the newest architecture;
2) the maximum core frequency (preferably 3 GHz and higher);
3) the maximum L2/L3 cache size;
4) the largest set of available instructions;
5) the smallest manufacturing process.

After reading this article, I think everyone will be able to decide which processor to buy for their computer.
You can always buy processors for a lot of money, but if only everyday tasks that do not require a lot of computing power are performed on the computer, the money will be wasted.

I have already told you why the growth of processor frequencies has stalled at a few gigahertz. Now let's talk about why the growth of core counts in consumer processors is also extremely slow: the first honest dual-core x86 processor (with both cores on one die) appeared back in 2006, 12 years ago - this was the Intel Core Duo line. Since then, 2-core processors have not left the stage; on the contrary, they are actively developed: just the other day, for example, a Lenovo laptop appeared with a processor built on the newest (for the x86 architecture) 10 nm process. And yes, as you may have guessed, that processor has exactly 2 cores.

For consumer processors, the number of cores has been stuck at 6 since 2010, with the release of the AMD Phenom II X6 line - yes, the AMD FX chips were not honest 8-core processors (they had 4 modules), just as Ryzen 7 is two blocks of 4 cores sitting side by side on the die. The question, of course, arises - why is this so? After all, video cards, which were essentially "single-headed" in 1995-96 (that is, had 1 shader), have managed to grow to several thousand shaders by now - the Nvidia Titan V already has 5120 of them! Meanwhile, over a much longer period of x86 development, consumer processors have settled on an honest 6 cores per die, and CPUs for high-performance PCs on 18 - a couple of orders of magnitude fewer than video cards. Why? We'll talk about that below.

CPU architecture

Initially, all Intel x86 processors were built on the CISC architecture (Complex Instruction Set Computing, processors with a full instruction set) - that is, they implemented the maximum number of instructions "for all occasions". On the one hand, that is great: in the 90s, for example, the CPU was responsible for rendering the picture and even for sound (there was a life hack - if a game was slow, turning off its sound could help). Even now the processor is a jack of all trades that can do everything - and that is also a problem: parallelizing an arbitrary task across several cores is not trivial. With two cores it can still be done simply: we "hang" the system and all background tasks on one core, and only the application on the other. That will always work, but the performance gain will be far from double, since background processes usually need significantly fewer resources than the current heavy task.

Left: a diagram of the Nvidia GTX 980 Ti GPU, where you can see 2816 CUDA cores grouped into clusters. Right: a photo of an AMD Ryzen processor die, where 4 large cores are visible.

Now let's imagine that we have not two but 4 or even 8 cores. Yes, in archiving and other computational tasks parallelization works well (which is why server processors can have several dozen cores). But what if the task has a random outcome - which, alas, is the majority - say, a game? Here every new action depends entirely on the player, so "spreading" such a load across several cores is not easy, and developers often "hand-write" what the cores do: one, say, may be occupied only with the artificial-intelligence computations, another only with surround sound, and so on. It is almost impossible to load even an 8-core processor this way, which is what we see in practice.

With video cards everything is simpler: the GPU essentially deals with computations and only them, and the number of kinds of computation is limited and small. Therefore, first, the computing cores themselves (Nvidia calls them CUDA cores) can be optimized specifically for the required tasks; second, since all the possible tasks are known, parallelizing them causes no difficulty; and third, control is exercised not over individual shaders but over computing modules of 64-192 shaders each, so a large number of shaders is not a problem.

Energy consumption

One of the reasons for abandoning the further frequency race was the sharp growth of power consumption. As I explained in the article on why CPU frequency growth has stalled, the heat dissipation of a processor is proportional to the cube of its frequency. In other words, if at 2 GHz a processor emits 100 W of heat, which can still be removed without problems by an air cooler, then at 4 GHz it would be 800 W, which at best can be removed by an evaporation chamber with liquid nitrogen (bear in mind that the formula is approximate and a processor contains more than just compute cores, but it gives the right order of magnitude).

Going wide was therefore an excellent solution: roughly speaking, a dual-core 2 GHz processor will consume 200 W, while a single-core 3 GHz one will consume almost 340 W - a gain in heat dissipation of more than 50%, while in tasks well optimized for multithreading the low-frequency dual-core CPU will still be faster than the high-frequency single-core one.


An example of an evaporation chamber with liquid nitrogen for cooling extremely overclocked CPUs.

It would seem a bonanza - let's quickly make a 10-core processor at 1 GHz, which will emit only 25% more heat than a single-core 2 GHz CPU (if a 2 GHz processor emits 100 W of heat, then a 1 GHz one emits only 12.5 W, and 10 such cores about 125 W). But here we quickly run into the fact that not all tasks parallelize well, so in practice it often turns out that a much cheaper single-core 2 GHz CPU is significantly faster than a much more expensive 10-core 1 GHz one. Still, such processors exist - in the server segment, where parallelizing tasks is not a problem, a 40-60-core CPU at 1.5 GHz often turns out to be many times faster than 8-10-core processors at 4 GHz while emitting a comparable amount of heat.
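
The arithmetic behind these numbers is easy to reproduce. A toy calculation of mine (the cubic rule and the 100 W at 2 GHz baseline come from the text; the rest follows from them):

    #include <stdio.h>
    #include <math.h>

    /* Toy model from the text: heat ~ frequency^3, normalized so that
     * one core at 2 GHz dissipates 100 W. */
    static double heat_watts(double freq_ghz, int cores) {
        const double watts_per_ghz3 = 100.0 / (2.0 * 2.0 * 2.0);
        return cores * watts_per_ghz3 * pow(freq_ghz, 3.0);
    }

    int main(void) {
        printf("1 core   @ 2 GHz: %6.1f W\n", heat_watts(2.0, 1));  /* 100.0 */
        printf("1 core   @ 3 GHz: %6.1f W\n", heat_watts(3.0, 1));  /* 337.5 */
        printf("1 core   @ 4 GHz: %6.1f W\n", heat_watts(4.0, 1));  /* 800.0 */
        printf("2 cores  @ 2 GHz: %6.1f W\n", heat_watts(2.0, 2));  /* 200.0 */
        printf("10 cores @ 1 GHz: %6.1f W\n", heat_watts(1.0, 10)); /* 125.0 */
        return 0;
    }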

Therefore, CPU manufacturers have to make sure single-threaded performance does not suffer as core counts grow. And given that the heat-dissipation limit of a typical home PC was "found" long ago (roughly 60-100 W), there are only two ways to increase the number of cores while keeping single-core performance and heat dissipation the same: either optimize the processor architecture itself, raising its performance per clock, or shrink the manufacturing process. Alas, both are advancing ever more slowly: over the 30-plus years of x86 processors, almost everything that could be "polished" already has been, so the gains are at best 5% per generation; and shrinking the process is getting ever harder because of fundamental problems in making correctly functioning transistors (at dimensions of tens of nanometers quantum effects start to interfere, producing a suitable laser is difficult, and so on). That is why, alas, growing the number of cores is increasingly difficult.

Die size

If we look at the area of processor dies 15 years ago, we will see it was only about 100-150 mm². About 5-7 years ago the chips "grew" to 300-400 mm² and... the process practically stopped. Why? It's simple: first, giant dies are very hard to manufacture, the number of defects rises sharply, and with it the final cost of the CPU.

Second, fragility increases: a large die can crack very easily, and its different edges can heat up differently, which again can cause physical damage.


Comparison of Intel Pentium 3 and Core i9 dies.

And third, the speed of light imposes its own limit: yes, it is high, but not infinite, and on large dies this can introduce delays, or even make the processor's operation impossible.

In the end the maximum die size has settled at around 500 mm² and is unlikely to grow further - so to increase the number of cores, the cores must be shrunk. It would seem Nvidia and AMD managed this, and their GPUs have thousands of shaders. But one must understand that shaders are not full-fledged cores: they have no cache of their own, only a shared one, and "sharpening" them for particular tasks made it possible to "throw out" everything unnecessary, which again reduced their size. A CPU, by contrast, carries not only full-fledged cores with their own caches, but often also graphics and various controllers on the same die - so in the end, with the die size fixed, almost the only ways to increase the number of cores are, again, the same optimization and the same process shrinks, and those, as I wrote, are going slowly.

Operation optimization

Let's imagine a team of people doing various tasks, some of which require several people at once. If there are two of them, they can agree between themselves and work effectively. Four is harder, but the work will still be fairly effective. What if there are 10 or even 20? Here some means of communication between them is needed, otherwise there will be "skews" in the work, with someone left idle. In Intel processors this means of communication is the ring bus, which connects all the cores and lets them exchange information.

But even that does not help: for example, at the same frequencies, 10-core and 18-core processors of Intel's Skylake-X generation differ in performance by only 25-30%, although in theory the difference should be as much as 80%. The reason is precisely the bus: however good it is, there will still be delays and idle time, and the more cores, the worse the picture. Why, then, do video cards not have this problem? It's simple: if processor cores can be thought of as people able to do various tasks, then the computing units of video cards are more like robots on an assembly line that can only execute certain instructions. They essentially don't need to "negotiate" - so as their number grows, efficiency falls more slowly: for example, the difference in CUDA cores between the 1080 (2560 units) and the 1080 Ti (3584 units) is 40%, and in practice the performance difference is about 25-35%, so the losses are significantly smaller.
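
To see how coordination overhead eats up the gains, here is a toy model of my own (not the author's data; the 10% per-core overhead is an assumption chosen so that the 10- vs 18-core comparison lands near the 25-30% mentioned above):

    #include <stdio.h>

    /* Toy model: every extra core adds a fixed coordination cost, so the
     * effective speedup grows ever more slowly as cores are added. */
    static double speedup(int cores, double overhead_per_core) {
        return cores / (1.0 + overhead_per_core * (cores - 1));
    }

    int main(void) {
        const double overhead = 0.10;  /* assumed 10% cost per extra core */
        int counts[] = { 1, 2, 4, 10, 18 };
        for (int i = 0; i < 5; i++)
            printf("%2d cores -> speedup %.2f\n",
                   counts[i], speedup(counts[i], overhead));
        /* 10 cores -> 5.26, 18 cores -> 6.67: only ~27% faster,
         * not the 80% that linear scaling would give. */
        return 0;
    }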


The more cores, the worse they work together, up to zero performance gain as the number of cores increases.

So there is no particular point in increasing the number of cores: the gain from each new core will be lower and lower. And this problem is quite hard to solve - you would need a bus that could transfer data between any two cores with the same delay. A star topology would suit best, with all cores connected to a hub, but in reality no one has built such an implementation yet.

So in the end, as we can see, raising the frequency and raising the number of cores are both quite difficult tasks, and the game is often not worth the candle. And in the near future nothing is likely to change seriously, since nothing better than silicon dies has yet been invented.