Chinese bookmarks: a true story about virtualization, security and spies
Hardware "bookmarks"
[Figure: hardware bookmark on the data bus]
Many information systems operate on the assumption that hardware poses no threat. In that case, neither initial nor regular checks are carried out. A bookmark (implant) is a logical or hardware device that implements undocumented functions, usually to the detriment of the user of the information system. A bookmark can store or transmit system data.
Not every enterprise has specialists able to identify hardware problems. For initial confidence that new equipment contains no such bookmarks, you have to rely on the certificates provided by the supplier. To increase confidence, you can invite specialists to perform a complete or random inspection of the equipment.
Regular checks can be implemented in different ways; the choice depends on the requirements for a particular class of information and on the methods of working with it. Verification methods include:
- Periodic inspection of equipment by invited specialists from certain authorized structures
- Analysis of equipment serial numbers
- Automated inventory of hardware components in the information sphere of the enterprise
- Sealing of various parts of equipment housings with regular integrity checks
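The serial-number analysis and automated inventory from the list above could be sketched roughly as follows. This is a minimal illustration, not a ready-made tool: the `Component` record, its fields and the slot names are hypothetical, and it assumes that serial numbers and firmware images have already been collected by some other means.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    """One inventoried hardware item (all fields are illustrative)."""
    slot: str             # where the part sits, e.g. "srv1/bios"
    serial: str           # serial number reported by the device
    firmware_sha256: str  # hash of the firmware image read back from it


def firmware_hash(image: bytes) -> str:
    """Hash a firmware image so it can be compared with a trusted baseline."""
    return hashlib.sha256(image).hexdigest()


def audit(baseline: List[Component], current: List[Component]) -> List[str]:
    """Compare the current inventory with the baseline and list deviations."""
    expected: Dict[str, Component] = {c.slot: c for c in baseline}
    seen: Dict[str, Component] = {c.slot: c for c in current}
    findings = []
    for slot in sorted(expected.keys() | seen.keys()):
        old, new = expected.get(slot), seen.get(slot)
        if old is None:
            findings.append(f"{slot}: unexpected component {new.serial}")
        elif new is None:
            findings.append(f"{slot}: component missing")
        else:
            if old.serial != new.serial:
                findings.append(f"{slot}: serial changed {old.serial} -> {new.serial}")
            if old.firmware_sha256 != new.firmware_sha256:
                findings.append(f"{slot}: firmware differs from baseline")
    return findings


if __name__ == "__main__":
    baseline = [Component("srv1/bios", "SN-001", firmware_hash(b"vendor image"))]
    current = [Component("srv1/bios", "SN-001", firmware_hash(b"altered image"))]
    for line in audit(baseline, current):
        print(line)
```

A check like this is only as trustworthy as the tool that reads the serial numbers and firmware images: if the reading itself can be tampered with, a matching hash proves little.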
In critical areas, equipment can be installed to continuously monitor or suppress signals crossing the enterprise perimeter. The channels concerned include radio, wired data networks, power lines, the infrared range and background sound.
Here, an important part of the security mechanism is user training: users must notify the enterprise security service of any suspicion. Such methods of working with users are called.
I am not a professional in the field of information security; my area of interest is high-performance computing systems. I came to the topic of information security completely by accident, and that is what the rest of this story is about. I think this true story will highlight the problems associated with virtualization hardware much better than a dry statement of facts.
Even before the official announcement of new Intel processors with support for hardware virtualization (at the beginning of 2007), I decided to use these chips to build a single computing system out of several servers, which would appear to the OS and application programs as a single SMP machine. This required writing a compact hypervisor with non-standard functionality: its main feature would be not dividing the resources of one machine between different operating systems but, on the contrary, combining the resources of several computers into a single complex managed by one operating system. The OS was not even supposed to realize that it was dealing with several servers rather than a single system. Virtualization hardware made this possible, although it was not originally intended for such tasks. In fact, a system that uses virtualization hardware for high-performance computing has still not been created, and at that time I was a pioneer in this area.
The hypervisor for this task was, of course, written from scratch. It was fundamentally important to launch the OS on an already virtualized platform, so that everything ran in a virtual environment from the very first instructions of the OS loader. To do this, it was necessary to virtualize real mode and all the other processor operating modes, and to start virtualization immediately after platform initialization, before the OS was loaded.
Since the virtualization system for this purpose turned out to be non-standard and looked like a completely autonomous, compact software module (no more than 40–60 KB of code), I could not bring myself to call it a hypervisor and began to use the term “hyperdriver”, since it conveyed the functional purpose of the system more accurately. There was no production equipment with virtualization hardware at that time, but thanks to cooperation with Kraftway I had access to pre-production samples of processors and motherboards with virtualization support that had not yet been officially released (the so-called samples that Intel kindly provides to its business partners). So work got going in earnest on this “sample” equipment.
The prototype was assembled, the hyperdriver was written, and everything worked as intended. It must be said that the virtualization hardware was very raw at that time and more than once refused to work as documented. I had to fight over literally every assembly instruction, and the virtualization instructions themselves had to be written in machine code, since no compilers supported them yet.
I was proud of the results and felt almost like the ruler of virtual worlds... but my euphoria lasted only a month. By then I had assembled a prototype based on servers with virtualization hardware, the first production samples of which had just appeared, but the prototype did not work. I started looking into it and realized that my system hung when executing hardware virtualization instructions. It seemed that they either did not work at all or worked in some non-standard way. The hang occurred only when the virtualization hardware was running in real mode; if my system was started from protected mode, after the OS had loaded, everything was fine.
Professionals know that in its first revisions Intel virtualization hardware did not support processor operation in real mode. This required an additional, fairly large layer to emulate a virtual x86. Since the hyperdriver was launched before the operating system loaded, so that the OS would fully believe in the new virtual configuration, a small part of the OS boot code was executed in real processor mode. The system died precisely in the real-mode emulation handlers of the hyperdriver. At first I thought I had made a mistake somewhere, misunderstood something or forgotten something. I checked my code down to the last bit, found no errors and began to blame not myself but my colleagues overseas.
The first thing I did was replace the processors, but that did not help. On motherboards of that time, the only component involved with virtualization hardware was the BIOS, which initialized it when the server was turned on, so I started comparing the BIOSes on the motherboards (boards of the same type as the samples): everything matched, down to the byte and the BIOS version number. I fell into a stupor and, no longer knowing what to do, resorted to the last option, trial and error. I tried everything, no longer thinking but simply trying combinations, and in the end I simply downloaded the BIOS images from the official Intel website and flashed them into the motherboards again, after which everything worked...
My surprise knew no bounds: the BIOS version number was the same and the BIOS images matched byte for byte, but for some reason the production motherboards worked only after I flashed them with the same BIOS taken from the Intel website. So the cause was in the motherboards after all? But their only difference was in the markings: the samples were marked Assembled Canada, and the production boards Assembled China. It became clear that the boards from China contained additional software modules flashed into the BIOS, which standard analysis programs did not see.
Apparently these modules also worked with the virtualization hardware and were therefore able to hide the true contents of the BIOS. The reason my hyperdriver hung on these Chinese boards also became clear: two software systems were working with the same virtualization hardware at the same time, and it did not allow its resources to be shared. I wanted to get to the bottom of this malicious BIOS, without any thought of “bookmarks”, “backdoors” or “undocumented capabilities”; it was simply academic interest, nothing more.
It must be said that, in parallel with the introduction of virtualization hardware, Intel radically updated its chipset. This chipset, numbered 5000x, is still produced in several modifications. Its south bridge, the 631xESB/632xESB I/O Controller Hub, to which the flash chips with the BIOS are connected, has been produced almost unchanged since 2007 and is used as the base chip for almost all two-socket servers. I downloaded the datasheet for the south bridge, read the description and was simply stunned. It turns out that three flash memory chips are connected to this new south bridge: the first holds the standard BIOS, the second is dedicated to the programs of the network controller processor, and the third is intended for the BMC unit integrated into the south bridge.
The BMC (baseboard management controller) is a means of remote control and monitoring of a computing installation. It is indispensable for large server rooms, where it is simply impossible to stay for long because of the noise, temperature and drafts. The fact that BMC units have their own processor and, accordingly, flash memory for its programs is of course not news, but until now this processor and memory were placed on a separate board that plugged into the motherboard: install it if you want, leave it out if you don't.
Now Intel has integrated these components into the south bridge; moreover, it connected this unit to the system bus and, instead of using a dedicated network channel for the service network (as provided for by the IPMI standard, which describes the functions of the BMC unit), tunneled all service network traffic through the main network adapters. Next, I learned from the documentation that the programs on the flash chip of the BMC unit are encrypted, and that a special hardware cryptographic module, also integrated into the south bridge, is used to unpack them. I had never come across such BMC units before. So as not to make unfounded claims, here is an excerpt from the documentation for this south bridge:
- ARC4 processor working at 62.5 MHz speed.
- Interface to both LAN ports of Intel® 631xESB/632xESB I/O Controller Hub allowing direct connection to the net and access to all LAN registers.
- Cryptographic module, supporting AES and RC4 encryption algorithms and SHA1 and MD5 authentication algorithms.
- Secured mechanism for loadable Regulated FW.
So, here is what I had established:
- Intel's new production server boards based on the 5000 chipset have programs embedded in the flash memory of the BMC unit and executed on the central processor, and these programs run using the CPU virtualization hardware.
- The flash memory images from the official Intel website do not contain such software modules; therefore, the modules that interfered with my work were illegally flashed into the motherboards at the production stage.
- The flash memory of the BMC unit contains encrypted software modules that cannot be assembled and loaded into the flash memory without knowing the encryption keys; therefore, whoever inserted these illegal modules knew the encryption keys, that is, had access to what is effectively secret information.
![](https://i2.wp.com/xakep.ru/wp-content/uploads/post/58104/3.png)
![](https://i2.wp.com/xakep.ru/wp-content/uploads/post/58104/4.png)
Convenient remote control tools save system administrators a lot of effort, and at the same time pose a huge security risk when they cannot be disabled in hardware with a jumper or switch on the motherboard. The Intel Management Engine (IME) 11 unit in modern Intel platforms poses just such a danger: it cannot be disabled by default and, moreover, some processor initialization and operation mechanisms are tied to it, so crudely deactivating it can leave the system completely inoperable. The vulnerability lies in Intel Active Management Technology (AMT) and, if successfully exploited, gives an attacker complete control over the system, as was described back in May of this year. But researchers from Positive Technologies.
The IME processor itself is part of the Platform Controller Hub (PCH) chip. With the exception of the PCI Express slots wired directly to the processor, all communication between the system and the outside world goes through the PCH, which means that the IME has access to almost all data. Prior to version 11, an attack along this vector was unlikely: the IME processor used a proprietary architecture with the ARC instruction set, which was little known to third-party developers. But in version 11 the technology was dealt a bad hand: it was moved to the x86 architecture, with a modified MINIX as its OS, which made third-party analysis of the binary code much easier, since both the architecture and the OS are well documented. Russian researchers Dmitry Sklyarov, Mark Ermolov and Maxim Goryachiy managed to decrypt the executable modules of IME version 11 and begin studying them thoroughly.
The Intel AMT vulnerability has a severity score of 9.8 out of 10. Unfortunately, completely disabling the IME on modern platforms is not possible for the reason described above: the subsystem is closely tied to CPU initialization and startup, as well as to power management. But everything unnecessary can be removed from a flash memory image containing IME modules, although this is very difficult to do, especially in version 11. The me_cleaner project, a utility that removes the common part of the image and leaves only the vital components, is under active development. A small comparison: in IME versions before 11 (before Skylake), the utility deleted almost everything, leaving about 90 KB of code, whereas now about 650 KB of code has to be kept, and in some cases the system may shut down after half an hour because the IME unit enters recovery mode.
However, there is progress. The above-mentioned group of researchers managed to use the development kit, which is provided by Intel itself and includes the Flash Image Tool for configuring IME parameters and the Flash Programming Tool, which works through the built-in SPI controller. Intel does not make these programs publicly available, but finding them online is not particularly difficult.
The XML files obtained with this kit were analyzed (they contain the structure of the IME firmware and a description of the PCH strap mechanism). One bit, called "reserve_hap", looked suspicious because of its description, "High Assurance Platform (HAP) enable". An online search revealed that this is the name of a high-assurance platform program associated with the US NSA. Setting this bit showed that the system entered Alt Disable Mode: the IME unit did not respond to commands or to any action from the operating system. There are a number of subtler nuances, which can be found in the article on Habrahabr.ru, but the new version of me_cleaner can already remove most of the dangerous modules without setting the HAP bit, which puts the IME engine in the “TemporaryDisable” state.
The latest modification of me_cleaner leaves only the RBE, KERNEL, SYSLIB and BUP modules even in IME version 11; no code that could enable the IME system itself was found in them. In addition, to be completely sure, you can set the HAP bit, which the utility can also do. Intel has reviewed the research and confirmed that a number of IME settings do address the security needs of government agencies. These settings were introduced at the request of US government customers; they have undergone only limited testing, and such configurations are not officially supported by Intel. The company also denies introducing so-called backdoors into its products.
There is concern that a sufficiently technically capable adversary could covertly modify any chip. The modified chip would work in critical nodes, and the introduced “Trojan horse” or hardware bookmark would go unnoticed, undermining a country's defense capability at the most fundamental level. For a long time this threat remained hypothetical, but an international group of researchers was recently able to realize it at the physical level.
Georg T. Becker from the University of Massachusetts, together with colleagues from Switzerland and Germany, created two versions of a “hardware-level Trojan” as a proof of concept; it disrupts the operation of the (pseudo)random number generator (PRNG) in the cryptographic unit of Intel Ivy Bridge processors. Cryptographic keys created with the modified PRNG, for any encryption system, are easily predictable.
The presence of the hardware bookmark cannot be detected either by the built-in tests designed specifically for this purpose or by external inspection of the processor. How could this happen? To answer that question, we need to go back to the history of the hardware PRNG and look at the basic principles of its operation.
When creating cryptographic systems, it is necessary to rule out the possibility of quickly guessing keys. Key length and degree of unpredictability directly affect the number of candidates the attacker would have to try. Length can be set directly, but making the possible keys unique and equally probable is much harder. For this purpose, random numbers are used during key creation.
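How strongly key length drives the attacker's workload can be shown with simple arithmetic. The sketch below assumes that every key is equally likely and that the attacker tests a fixed number of keys per second; the rate of one billion guesses per second is an arbitrary illustrative figure, not a measurement.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys of the given length, if all are possible."""
    return 2 ** bits


def average_search_seconds(bits: int, guesses_per_second: float) -> float:
    """Expected exhaustive-search time: on average half the keys are tried."""
    return keyspace(bits) / 2 / guesses_per_second


if __name__ == "__main__":
    RATE = 1e9  # assumed attacker speed, guesses per second
    for bits in (32, 56, 128):
        seconds = average_search_seconds(bits, RATE)
        print(f"{bits:3d} bits: {keyspace(bits):.3e} keys, about {seconds:.3e} s on average")
```

Each extra bit doubles the work, but only if the bits are truly unpredictable; a long key whose bits are mostly known to the attacker behaves like a short one.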
Currently, it is generally accepted that using only software algorithms it is impossible to obtain a truly random stream of numbers with their uniform chaotic distribution throughout the entire specified set. They will always have a high frequency of occurrence in some part of the range and remain somewhat predictable. Therefore, most number generators used in practice should be perceived as pseudo-random. They are rarely strong enough in a cryptographic sense.
To reduce predictability, any number generator needs a reliable source of initial randomness, a random seed. Usually the seed is taken from measurements of some chaotic physical process, for example fluctuations in light intensity or recorded radio-frequency noise. It would be technically convenient to have such a source of randomness (and the entire hardware PRNG) in a compact form, and ideally built in.
Intel has been building (pseudo)random number generators into its chips since the late nineties. Initially they were analog: random output values were obtained from physical processes that are difficult to predict, such as thermal noise and electromagnetic interference. Analog generators were relatively easy to implement as separate blocks but difficult to integrate into new circuits. As the manufacturing process shrank, new and time-consuming calibration steps were required. In addition, the natural decrease in supply voltage worsened the signal-to-noise ratio in such systems. These PRNGs ran constantly and consumed a significant amount of energy, and their speed left much to be desired. These shortcomings limited their possible areas of application.
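Why the seed matters can be seen with any software generator. The sketch below uses Python's standard `random` module (a Mersenne Twister, which is not cryptographically secure and serves here only as an illustration): the same seed always reproduces the same "random" stream, so whoever knows or can guess the seed knows every number that follows. Taking the seed from `os.urandom` is an assumption standing in for a physical noise source.

```python
import os
import random
from typing import List


def stream(seed: int, count: int = 5) -> List[int]:
    """Return the first `count` 32-bit outputs of a PRNG started from `seed`."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]


if __name__ == "__main__":
    # A fixed seed makes the whole sequence reproducible, hence predictable.
    print(stream(1234) == stream(1234))

    # Seeding from the operating system's entropy pool removes that shortcut.
    unpredictable_seed = int.from_bytes(os.urandom(32), "big")
    print(stream(unpredictable_seed))
```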
The idea of a (pseudo)random number generator with a completely digital nature has long seemed strange, if not absurd. After all, the state of any digital circuit is always strictly determined and predictable. How to introduce the necessary element of randomness into it if there are no analog components?
Intel engineers had been trying to obtain the desired chaos from digital elements alone since 2008, and succeeded after a couple of years of research. The work was presented in 2010 at the summer VLSI symposium in Honolulu and caused a small revolution in modern cryptography. For the first time, a fully digital, fast and energy-efficient PRNG was implemented in mass-produced general-purpose processors.
Its first working name was Bull Mountain; it was later renamed Secure Key. This cryptographic block consists of three basic modules. The first generates a stream of random bits at a relatively slow 3 Gbps. The second evaluates their variance and combines them into 256-bit blocks, which serve as random seeds. In the third module, after a series of mathematical procedures, a stream of 128-bit random numbers is generated at a higher speed. From these, the new RdRand instruction produces, on request, random numbers of the required length (16, 32 or 64 bits) and places them in a designated register, from which they are finally passed to the program that requested them.
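The three-stage structure described above can be mimicked in a few lines. This is only a structural sketch, not Intel's design: `os.urandom` stands in for the hardware bit source, and SHA-256 is swapped in for the "mathematical procedures" that the text does not specify. The names `condition`, `Drbg` and `rdrand_like` are invented for the illustration.

```python
import hashlib
import os


def condition(raw: bytes) -> bytes:
    """Stage 2: compress raw, possibly biased bits into a 256-bit seed."""
    return hashlib.sha256(raw).digest()  # 32 bytes = 256 bits


class Drbg:
    """Stage 3: deterministic generator that expands a seed into 128-bit blocks."""

    def __init__(self, seed: bytes) -> None:
        self._seed = seed
        self._counter = 0

    def next_block(self) -> bytes:
        data = self._seed + self._counter.to_bytes(8, "big")
        self._counter += 1
        return hashlib.sha256(data).digest()[:16]  # 16 bytes = 128 bits


def rdrand_like(drbg: Drbg, width_bits: int) -> int:
    """Return a 16-, 32- or 64-bit integer cut from the next 128-bit block."""
    if width_bits not in (16, 32, 64):
        raise ValueError("width must be 16, 32 or 64 bits")
    block = drbg.next_block()
    return int.from_bytes(block[: width_bits // 8], "big")


if __name__ == "__main__":
    raw_bits = os.urandom(64)             # stage 1 stand-in: raw entropy
    generator = Drbg(condition(raw_bits))
    print(hex(rdrand_like(generator, 64)))
```

The sketch also shows where an attack on the third stage hurts: everything the program receives is a deterministic function of the generator's internal state, so if most of that state is fixed, the outputs become guessable however good the first two stages are.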
Errors in (pseudo)random number generators and their malicious modifications cause a loss of confidence in popular cryptographic products and their certification procedure itself.
Due to the exceptional importance of PRNG for any cryptographic system, Secure Key has built-in tests to verify the quality of the generated random numbers, and leading expert groups have been involved in certification. The entire unit meets the criteria of ANSI X9.82 and NIST SP 800-90 standards. In addition, it is certified to Level 2 according to NIST FIPS 140-2 requirements.
Until now, most of the work on hardware Trojans has been hypothetical. Researchers have proposed additional designs of small logic circuits that should somehow be added to existing chips. For example, Samuel Talmadge King and his co-authors presented at the LEET-08 conference a variant of a hardware Trojan for the central processor that would provide complete control over the system to a remote attacker. Simply by sending a UDP packet configured in a certain way, one could make any changes on such a computer and gain unlimited access to its memory. However, additional logic circuits are relatively easy to identify using microscopy, not to mention specialized methods for searching for such modifications. Becker's group took a different route:
“Instead of adding additional circuitry to the chip, we implemented our hardware-level features by simply changing the operation of some of the microtransistors already on it. After a number of attempts, we were able to selectively change the polarity of the dopant and make the desired modifications to the operation of the entire cryptographic unit. Therefore, our family of Trojans turned out to be resistant to most detection methods, including scanning microscopy and comparison with reference chips.”
As a result, instead of unique 128-bit numbers, the third Secure Key module began to produce sequences in which only 32 bits differed. Cryptographic keys created from such pseudo-random numbers are highly predictable and can be cracked within a few minutes on an ordinary home computer.
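The effect of such a reduction can be reproduced with a toy model. The sketch below is not the Trojan itself: it simply assumes a 128-bit value whose upper bits are a constant known to the attacker (the constant `KNOWN_HIGH_BITS` is hypothetical) and whose lower bits are the only part that varies. A SHA-256 fingerprint stands in for any check that lets an attacker confirm a guess, such as a trial decryption. The demo uses 16 unknown bits so that it finishes instantly; with 32 bits the same loop needs about four billion iterations.

```python
import hashlib
from typing import Optional

# Hypothetical constant: the 96 high bits that never change and are known.
KNOWN_HIGH_BITS = 0x0123456789ABCDEF01234567 << 32


def weakened_output(variable: int, unknown_bits: int = 32) -> bytes:
    """Build a 128-bit value in which only the low `unknown_bits` bits vary."""
    mask = (1 << unknown_bits) - 1
    return (KNOWN_HIGH_BITS | (variable & mask)).to_bytes(16, "big")


def fingerprint(key: bytes) -> str:
    """Stand-in for any test that confirms whether a guessed key is right."""
    return hashlib.sha256(key).hexdigest()


def recover(target: str, unknown_bits: int) -> Optional[bytes]:
    """Enumerate every possible value of the varying bits until one matches."""
    for guess in range(1 << unknown_bits):
        candidate = weakened_output(guess, unknown_bits)
        if fingerprint(candidate) == target:
            return candidate
    return None


if __name__ == "__main__":
    secret = weakened_output(0xBEEF, unknown_bits=16)
    found = recover(fingerprint(secret), unknown_bits=16)
    print(found == secret)
```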
The selective change in electrical conductivity underlying the hardware bookmark was implemented in two variants:
- digital post-processing of signals from Intel Secure Key;
- a side channel in an implementation of table-based bit substitution (a substitution box, or S-box).
The latter method is more universal and can be used with minor modifications on other chips.
The ability to use the built-in PRNG through the RdRand instruction first appeared in Intel Ivy Bridge architecture processors. Intel has written detailed guides for programmers. They talk about methods for optimal implementation of cryptographic algorithms and provide a link to a description of the principles of operation of Secure Key. For a long time, the efforts of security experts were aimed at finding vulnerabilities in the software. Perhaps for the first time, hidden interference at the hardware level turned out to be a much more dangerous and completely feasible technology.