How are negative numbers stored in computer memory? Floating point format

Real numbers in mathematical calculations have no restrictions on the range and precision of their representation. In a computer, however, numbers are stored in registers and memory locations with a limited number of digits. That is why the precision of real numbers representable in a machine is finite and their range is limited.

When writing real numbers in programs, it is customary to use a dot instead of the usual comma. Any real number can be written in exponential form, as a mantissa multiplied by an integer power of the radix of the number system.

Example 4.4. The decimal number 1.756 can be written in exponential form (as a mantissa times a power of the radix) in many ways:

$1.756 \cdot 10^{0} = 0.1756 \cdot 10^{1} = 0.01756 \cdot 10^{2} = \ldots$

$= 17.56 \cdot 10^{-1} = 175.6 \cdot 10^{-2} = 1756.0 \cdot 10^{-3} = \ldots$

The floating-point representation of a number N in a number system with base q is

$N = m \cdot q^{p}$,

where m is a multiplier containing all the digits of the number (the mantissa) and p is an integer called the order.

If the “floating” point in the mantissa is placed immediately before the first significant digit, then, for a fixed number of digits allocated to the mantissa, the maximum number of significant digits is recorded, i.e. the number is represented in the machine with maximum accuracy.

If the first digit of the mantissa after the point is non-zero, the number is called normalized.

The mantissa and the order of a q-ary number are customarily written in base q, and the base itself in decimal.

Example 4.5. Here are examples of a normalized representation of a number in the decimal system:

$2178.01 = 0.217801 \cdot 10^{4}$

$0.0045 = 0.45 \cdot 10^{-2}$

Examples in binary:

$10110.01_2 = 0.1011001_2 \cdot 2^{101_2}$ (order $101_2 = 5_{10}$)
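The binary normalization can be checked mechanically. The sketch below is a minimal illustration that assumes Python's standard `math.frexp`, which returns a mantissa m with $0.5 \le |m| < 1$ and an integer order p such that $x = m \cdot 2^{p}$. This is exactly the normalized form used here (the first binary digit after the point is 1). The helper name `normalize2` is hypothetical.

```python
import math


def normalize2(x: float) -> tuple[str, int]:
    """Return the normalized binary mantissa (as a string) and the order of x."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, p = math.frexp(x)  # x == m * 2**p with 0.5 <= |m| < 1
    sign = "-" if m < 0 else ""
    m = abs(m)
    digits = []
    while m:  # terminates: a finite float has finitely many binary digits
        m *= 2
        bit = int(m)  # the integer part is the next binary digit
        digits.append(str(bit))
        m -= bit
    return sign + "0." + "".join(digits), p


print(normalize2(22.25))  # ('0.1011001', 5), i.e. 10110.01 in binary
```

The same call gives `('0.11001', 3)` for 6.25 and `('-0.1', -2)` for -0.125.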

Modern computers support several international standard formats for storing floating-point real numbers. They differ in precision but share the same structure: a real number is stored in three parts, namely the sign of the mantissa, the biased order and the mantissa.

The characteristic (biased order) of an n-bit normalized number is computed as follows: if k bits are allocated to the order, a bias equal to $2^{k-1} - 1$ is added to the true value of the order, which is represented in two's complement code.

For k = 8 the bias is 127, so an order taking values from -127 to +128 is converted into a biased order from 0 to 255. The biased order is stored as an unsigned number, which simplifies comparing, adding and subtracting orders, and also simplifies comparing the normalized numbers themselves.
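The bias rule fits in a few lines. This is a minimal sketch assuming k order bits and a bias of $2^{k-1} - 1$ as described above; the function name `biased_order` is hypothetical.

```python
def biased_order(p: int, k: int = 8) -> int:
    """Add the bias 2**(k-1) - 1 to the true order p (k bits for the order)."""
    bias = (1 << (k - 1)) - 1
    biased = p + bias
    if not 0 <= biased < (1 << k):
        raise ValueError("order out of range for k bits")
    return biased


print(format(biased_order(6), "08b"))  # 10000101, the order used in Example 4.6
```

Because the result is never negative, two biased orders can be compared as plain unsigned numbers: `biased_order(-3) < biased_order(2)` holds without any sign handling.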

The number of bits allocated to the order determines the range from the smallest non-zero number to the largest number representable in the given format. The more bits are allocated to the mantissa, the higher the precision of the representation. Because the most significant bit of the mantissa of a normalized binary number is always 1, this bit is not stored in memory.

Any binary integer with at most m digits, where m is the number of mantissa bits (including the unstored leading bit), can be converted to floating-point format without distortion.
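One way to see this claim is to round-trip integers through a 32-bit float. The sketch below assumes the IEEE 754 single format, where m = 24 (23 stored bits plus the hidden bit). That value of m is an assumption, since Table 4.3 is not reproduced here. It uses only Python's standard `struct` module; `to_float32` is a hypothetical helper name.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value and convert back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for n in (2**24 - 1, 2**24, 2**24 + 1):
    # True while n fits in 24 binary digits; 2**24 + 1 needs 25 and is rounded
    print(n, to_float32(float(n)) == n)
```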

Table 4.3. Standard formats for representing real numbers

Example 4.6. Representation of a normalized number in the single-precision format.

Let us illustrate how the number $37.16_{10}$ is stored. Its conversion to binary is not exact: $37.16_{10} = 100101.(00101000111101011100)_2$, where the fractional part in parentheses repeats periodically.
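The periodic fraction comes from the usual multiply-by-2 method for the fractional part. The following is a minimal sketch that assumes Python's exact `fractions.Fraction`, so that no rounding creeps in. The function name `to_binary` is hypothetical. The function records each remainder, and when a remainder repeats, the digits produced since its first appearance form the period.

```python
from fractions import Fraction


def to_binary(number: str) -> str:
    """Convert a non-negative decimal string to binary, putting the period in parentheses."""
    x = Fraction(number)          # exact value, e.g. 37.16 == 929/25
    integer = int(x)
    frac = x - integer
    digits = []
    seen = {}                     # remainder -> index of the digit it produced
    while frac and frac not in seen:
        seen[frac] = len(digits)
        frac *= 2
        bit = int(frac)           # the integer part is the next binary digit
        digits.append(str(bit))
        frac -= bit
    head = format(integer, "b")
    if not frac:                  # the fraction terminated
        return head + ("." + "".join(digits) if digits else "")
    start = seen[frac]            # the period starts where this remainder first appeared
    return f"{head}.{''.join(digits[:start])}({''.join(digits[start:])})"


print(to_binary("37.16"))  # 100101.(00101000111101011100)
```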

We convert the number to normalized form: $0.100101(00101000111101011100)_2 \cdot 2^{110_2}$.

Let's represent a real number in 32-bit format:

1. The sign of the number is “+”, so we enter 0 in the sign bit (31);

2. Eight bits are allocated to the order. To the true value of the order, represented in two's complement code, we add the bias $2^{7} - 1 = 127$.

Since the order is positive, its direct code coincides with its two's complement code. The biased order is: $00000110 + 01111111 = 10000101$.

We write the resulting biased order into the order field.

3. We write the mantissa, dropping its most significant digit (which is always 1).

sign	biased order	mantissa
0	10000101	00101001010001111010111

In this example, only 24 significant bits of the mantissa could be kept (23 stored bits plus the hidden one). The remaining bits were lost, reducing the precision of the representation.
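To compare the example with what real hardware stores, the sketch below reads the bit fields of a 32-bit float with Python's standard `struct` module, assuming the IEEE 754 single format. One caveat applies. IEEE 754 normalizes to $1.f \cdot 2^{e}$ rather than $0.1f \cdot 2^{p}$, so for 37.16 the stored exponent field is 5 + 127 = 132 (10000100), one less than the 133 computed above. The stored mantissa bits coincide with the example's, because the first dropped bit is 0 and rounding to nearest therefore gives the same result as truncation. The function name `float32_fields` is hypothetical.

```python
import struct


def float32_fields(x: float) -> tuple[int, str, str]:
    """Split the IEEE 754 single-precision encoding of x into sign, exponent and fraction bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # biased exponent field
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits (hidden 1 omitted)
    return sign, format(exponent, "08b"), format(fraction, "023b")


print(float32_fields(37.16))  # (0, '10000100', '00101001010001111010111')
# The value actually stored is close to, but not exactly, 37.16:
print(struct.unpack(">f", struct.pack(">f", 37.16))[0])
```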

If we could look into the contents of computer memory, we would see only sequences of zeros and ones. This reflects Rule #1:

Data (and programs) are stored in computer memory in binary form, i.e. as chains of zeros and ones.

Rule #2: data are represented in a computer discretely.

What is discreteness?

Note: A discrete set consists of elements separated from each other. For example, sand is discrete because it is made up of individual grains, whereas water or oil is continuous (at least at the scale of our senses, since we cannot perceive individual molecules).

For example, an image is constructed as a collection of points, i.e. discretely.

Rule #3: the set of values representable in memory is limited and finite.

Representing numbers on a computer.

Integers in the computer. (Fixed point format)

Any computing device (computer, calculator) can work only with a limited set of integers. Look at a calculator display: it holds 10 characters. The largest positive number that fits on the display:

The largest negative number in absolute value:

The situation is similar in the computer.

For example, if a 16-bit memory cell is allocated for an integer, the largest positive number is 0111111111111111.

In the decimal number system it is equal to:

$2^{15} - 1 = 32767$

Here the first bit is the sign bit: 0 denotes a positive number. The negative number with the largest absolute value is -32768.

How to get its internal representation:

1) Convert 32768 to binary: 1000000000000000. This is the direct code.

2) Invert this binary code, i.e. replace zeros with ones and ones with zeros. This gives the inverse code:

0111111111111111

3) Add one to this binary number. The result is:

1000000000000000

A one in the first bit denotes a minus sign.

(Do not mistake this code for "minus zero": it represents the number -32768.)
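The three steps translate directly into code. This is a minimal sketch operating on bit strings so that each step stays visible; the function name `twos_complement` is hypothetical, and the 16-bit default matches the example.

```python
def twos_complement(value: int, bits: int = 16) -> str:
    """Two's complement code of a negative integer, built by the invert-and-add-one steps."""
    if not -(1 << (bits - 1)) <= value < 0:
        raise ValueError("value must be negative and fit in the given width")
    direct = format(-value, f"0{bits}b")                          # step 1: direct code of |value|
    inverse = "".join("1" if b == "0" else "0" for b in direct)   # step 2: invert every bit
    code = (int(inverse, 2) + 1) % (1 << bits)                    # step 3: add one, keep `bits` bits
    return format(code, f"0{bits}b")


print(twos_complement(-32768))  # 1000000000000000
print(twos_complement(-1))      # 1111111111111111
```

As a cross-check, the result always equals `format(value & 0xFFFF, "016b")`, the bit pattern Python produces by masking a negative integer to 16 bits.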

These are the rules for the machine representation of integers. This internal representation of a number is called two's complement code (also called additional code).

If N bits are allocated for an integer in computer memory, the range of integer values is $[-2^{N-1}, 2^{N-1} - 1]$.

We looked at the format for representing signed integers, i.e. positive and negative. There are times when you only need to work with positive integers. In this case, the format for representing unsigned integers is used.

In this format, the smallest number is zero, and the largest number for a 16-bit cell is 1111111111111111.

In decimal notation this is $2^{16} - 1 = 65535$, about twice the largest value in the signed format (32767).
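Both ranges follow from the bit count alone. A minimal sketch, assuming two's complement for the signed case; the function names are hypothetical.

```python
def signed_range(n: int) -> tuple[int, int]:
    """Range of an n-bit two's complement integer: [-2**(n-1), 2**(n-1) - 1]."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


def unsigned_range(n: int) -> tuple[int, int]:
    """Range of an n-bit unsigned integer: [0, 2**n - 1]."""
    return 0, (1 << n) - 1


for n in (8, 16, 32):
    print(n, signed_range(n), unsigned_range(n))
```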

Numbers in the computer. (Floating point format)

The largest number differs from calculator to calculator. On the simplest calculator it is 999999999; adding one more unit to it makes the calculator display an error message. On a “smarter” calculator, adding one gives the following result:

This display entry is read as $1 \times 10^{9}$.

This number format is called floating point format.

1e+09, where 1 is the mantissa and +09 is the order of the number.

On a computer, numbers can be represented in both fixed point and floating point formats.

In computer technology, real numbers (as opposed to integers) are numbers that have a fractional part.

When writing them, a period is customarily used instead of a comma. For example, the number 5 is an integer, while 5.1 and 5.0 are real numbers.

To display conveniently numbers that take values from a wide range (both very small and very large), the exponential form is used, in which a number is written as a mantissa multiplied by a power of the base of the number system. For example, the decimal number 1.25 can be written in this form as follows:

$1.25 \cdot 10^{0} = 0.125 \cdot 10^{1} = 0.0125 \cdot 10^{2} = \ldots$,
or like this:
$12.5 \cdot 10^{-1} = 125.0 \cdot 10^{-2} = 1250.0 \cdot 10^{-3} = \ldots$

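The representation in which the mantissa is a proper fraction whose first digit after the point is non-zero (here $0.125 \cdot 10^{1}$) is the most convenient for a computer and is called normalized.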

The mantissa and the order of a q-ary number are usually written in the system with the base q, and the base itself is written in the decimal system.

Examples of normalized representation:

Decimal system:

$753.15 = 0.75315 \cdot 10^{3}$; $0.000034 = 0.34 \cdot 10^{-4}$

Binary system:

$-101.01_2 = -0.10101_2 \cdot 2^{11_2}$ (order $11_2 = 3_{10}$); $-0.000011_2 = -0.11_2 \cdot 2^{-100_2}$ (order $-100_2 = -4_{10}$)

Real numbers are represented differently on different types of computers. As a rule, the computer lets the programmer choose, from several number formats, the one most suitable for a given task: four, six, eight or ten bytes.

As an example, here are the characteristics of the real number formats used by IBM-compatible personal computers:

Real number format	Size in bytes	Approximate range of absolute values	Number of significant decimal digits
Single	4	$10^{-45} \ldots 10^{38}$	7 or 8
Real	6	$10^{-39} \ldots 10^{38}$	11 or 12
Double	8	$10^{-324} \ldots 10^{308}$	15 or 16
Extended	10	$10^{-4932} \ldots 10^{4932}$	19 or 20

This table shows that the floating-point representation allows numbers to be written with high precision and over a very wide range.
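The Double row of the table can be checked against Python's `float`, which on common platforms is an IEEE 754 double (an assumption about the platform). `sys.float_info` reports its limits. The 6-byte Real and 10-byte Extended formats have no standard Python counterpart and are not shown.

```python
import sys

info = sys.float_info
print(info.max)        # largest finite double, about 1.8e308
print(info.min)        # smallest positive normalized double, about 2.2e-308
print(5e-324)          # smallest positive (subnormal) double, hence the 10**-324 in the table
print(info.dig)        # decimal digits that always survive a round trip
print(info.max * 2)    # overflow gives inf
print(5e-324 / 2)      # underflow gives 0.0
```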

When floating-point numbers are stored, bits are allocated for the mantissa, the exponent, the sign of the number and the sign of the exponent.

Let us show with examples how some numbers are written in normalized form in a four-byte format with seven bits for the order.

1. Number $6.25_{10} = 110.01_2 = 0.11001_2 \cdot 2^{11_2}$.

2. Number $-0.125_{10} = -0.001_2 = -0.1_2 \cdot 2^{-10_2}$ (the negative order is written in two's complement code).

The maximum value of a non-negative integer is reached when all bits contain ones. For an n-bit representation it equals $2^{n} - 1$.

For example, consider 8-bit non-negative integers. The minimum number corresponds to eight zeros stored in the eight bits of the memory cell and is equal to zero. The maximum number corresponds to eight ones and is equal to

$A = 1 \cdot 2^{7} + 1 \cdot 2^{6} + 1 \cdot 2^{5} + 1 \cdot 2^{4} + 1 \cdot 2^{3} + 1 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} = 2^{8} - 1 = 255_{10}$.

Range of non-negative integers: from 0 to 255.

To store signed integers, two memory cells (16 bits) are allocated, and the most significant (leftmost) bit holds the sign of the number: 0 if the number is positive, 1 if it is negative.

The representation of positive numbers in a computer in sign-magnitude format is called the direct code. For example, the number $2002_{10} = 11111010010_2$ is represented in 16-bit format as 0000011111010010.

The maximum positive number for signed integers in an n-bit representation (one bit being allocated to the sign) is $2^{n-1} - 1$.

Negative numbers are represented in two's complement (additional) code. Two's complement code makes it possible to replace subtraction with addition, which significantly simplifies the processor and increases its performance.

The two's complement code of a negative number A stored in n bits is $2^{n} - |A|$.

The two's complement code is the complement of the modulus of A up to 0: in n-bit computer arithmetic

$2^{n} - |A| + |A| = 0$,

because $2^{n} = 0$ in n-bit arithmetic. Indeed, the binary representation of $2^{n}$ consists of a one followed by n zeros, and only the n low-order digits, i.e. the n zeros, fit into an n-bit cell.
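The identity $2^{n} - |A| + |A| = 0$ is what lets a processor subtract by adding. This is a minimal sketch for n = 16, assuming non-negative operands below $2^{15}$. The names `complement`, `to_signed` and `subtract` are hypothetical.

```python
N = 16
MOD = 1 << N  # 2**N, which behaves as 0 in N-bit arithmetic


def complement(a: int) -> int:
    """N-bit two's complement code of -a, i.e. 2**N - a (reduced mod 2**N)."""
    return (MOD - a) % MOD


def to_signed(code: int) -> int:
    """Interpret an N-bit code as a signed number."""
    return code - MOD if code >= MOD // 2 else code


def subtract(a: int, b: int) -> int:
    """Compute a - b using only addition, dropping the carry out of the top bit."""
    return to_signed((a + complement(b)) % MOD)


print((complement(2002) + 2002) % MOD)     # 0: the carry out of 16 bits is lost
print(subtract(5000, 2002), subtract(2002, 5000))  # 2998 -2998
```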

To obtain the complementary code of a negative number, you can use a fairly simple algorithm:

1. Write the modulus of the number in direct code in n binary digits.

2. Obtain the inverse code of the number by inverting all bits (replace all ones with zeros and all zeros with ones).

3. Add one to the resulting reverse code.

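Let us write the two's complement code of the negative number -2002 for the 16-bit representation: the direct code of the modulus is 0000011111010010, the inverse code is 1111100000101101, and the two's complement code is 1111100000101110.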

When a negative number A is represented in n-bit two's complement code, the most significant bit stores the sign of the number (one). The remaining n - 1 bits are read as an ordinary positive binary number, equal to $2^{n-1} - |A|$.

For a negative number A to be representable in this way, the following condition must hold:

$|A| \le 2^{n-1}$.

Therefore, the maximum value of the modulus of A in the n-digit representation is $2^{n-1}$.

Then the minimum negative number is $-2^{n-1}$.

Let us determine the range of numbers that can be stored in RAM in the long signed integer format (four memory cells, i.e. 32 bits, are allocated for such numbers).

The maximum positive integer (taking into account the allocation of one bit to the sign) is:

$A = 2^{31} - 1 = 2\,147\,483\,647_{10}$.

The minimum negative integer is:

$A = -2^{31} = -2\,147\,483\,648_{10}$.

The advantages of the fixed-point format are the simplicity and clarity of number representation and the simplicity of the algorithms that implement arithmetic operations.

The disadvantage of the fixed-point format is its small range of representable values, which is insufficient for mathematical, physical, economic and other problems involving both very small and very large numbers.

Representation of numbers in floating point format. Real numbers are stored and processed in a computer in floating-point format, in which the position of the point in the number may change.

The floating-point format is based on exponential notation, in which any number can be represented. Thus a number A can be written as:

$A = m \cdot q^{n}$ (2.3)

where m is the mantissa of the number;
q is the base of the number system;
n is the order of the number.

For a uniform representation of floating-point numbers, a normalized form is used, in which the mantissa satisfies the condition:

$1/q \le |m| < 1$.

This means that the mantissa must be a proper fraction with a non-zero digit immediately after the point.

Let's convert the decimal number 555.55, written in natural form, into exponential form with a normalized mantissa:

$555.55 = 0.55555 \cdot 10^{3}$.

Here the normalized mantissa: m = 0.55555, order: n = 3.

A floating-point number occupies 4 bytes (single-precision number) or 8 bytes (double-precision number). Bits are allocated to store the sign of the mantissa, the sign of the order, the order and the mantissa.

The range of numbers is determined by the number of bits allocated to store the order of the number, and the precision (the number of significant digits) is determined by the number of bits allocated to store the mantissa.

Let us determine the maximum number and its precision for the single-precision format, if 8 bits are allocated to store the order and its sign, and 24 bits to store the mantissa and its sign:

sign and order	sign and mantissa
0 1111111	0 11111111111111111111111

The maximum value of the order is $1111111_2 = 127_{10}$, and therefore the maximum value of the number is approximately:

$2^{127} \approx 1.7014118346046923 \cdot 10^{38}$.

The maximum value of a positive mantissa is:

$2^{23} - 1 \approx 2^{23} = 2^{10 \cdot 2.3} \approx 1000^{2.3} = 10^{3 \cdot 2.3} \approx 10^{7}$.

Thus the maximum single-precision number, taking into account the achievable precision, is $1.701411 \cdot 10^{38}$ (the number of significant decimal digits is limited to 7 in this case).
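The estimate can be reproduced numerically. This is a minimal sketch that uses the same simplifications as the text. One bit of each field holds the sign, the maximum number is approximated by $2^{\text{max order}}$, and the number of decimal digits is estimated as $(\text{mantissa bits} - 1)\log_{10} 2$. The function name is hypothetical; the same call can be used to check Task 1.28.

```python
import math


def max_and_digits(order_bits: int, mantissa_bits: int) -> tuple[float, float]:
    """Approximate largest number and decimal precision; each field includes one sign bit."""
    max_order = 2 ** (order_bits - 1) - 1          # e.g. 1111111 (binary) = 127 for 8 bits
    max_value = 2.0 ** max_order                   # mantissa taken as ~1, as in the text
    digits = (mantissa_bits - 1) * math.log10(2)   # 2**23 is about 10**6.9
    return max_value, digits


value, digits = max_and_digits(8, 24)
print(f"{value:.6e}, about {digits:.1f} decimal digits")
```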

Tasks

1.26. Fill out the table by writing negative decimal numbers in forward, reverse and complement codes in 16-bit notation:

1.27. Determine the range of signed integers (2 bytes of memory allocated) in fixed-point format.

1.28. Determine the maximum number and its precision for the double-precision format, if 11 bits are allocated to store the order and its sign, and 53 bits to store the mantissa and its sign.

Numerical data is processed in a computer using the binary number system. Numbers are stored in computer memory in binary code, that is, as a sequence of zeros and ones, and can be represented in fixed or floating point format.

Integers are stored in memory in fixed-point format. In this format, a memory register of eight memory cells (8 bits) is allocated for storing a non-negative integer. Each bit of the register always corresponds to the same digit of the number, and the point is located to the right of the least significant digit, outside the digit grid. For example, the number $11001101_2$ would be stored in a memory register as follows:

Table 4

The maximum non-negative integer that can be stored in a register in fixed-point format is given by the formula $2^{n} - 1$, where n is the number of digits. For n = 8 the maximum number is $2^{8} - 1 = 255_{10} = 11111111_2$ and the minimum is $0_{10} = 00000000_2$. Thus the range of non-negative integers is from 0 to $255_{10}$.

Unlike decimal notation, the computer representation of a binary number has no symbols for the sign of the number (+ or -). Therefore, two formats are used to represent signed integers in binary: the sign-magnitude format and the two's complement format. In the first case, two memory registers (16 bits) are allocated for a signed integer, and the most significant (leftmost) bit is used as the sign of the number: 0 if the number is positive, 1 if it is negative. For example, the number $536_{10} = 0000001000011000_2$ is represented in the memory registers as follows:

Table 5

and the negative number $-536_{10} = 1000001000011000_2$ as follows:

Table 6

The maximum positive number in sign-magnitude format (one bit being allocated to the sign) is $2^{n-1} - 1 = 2^{16-1} - 1 = 2^{15} - 1 = 32767_{10} = 111111111111111_2$, and the minimum negative number is its negation, so the range is from $-32767_{10}$ to $32767_{10}$.

Most often, signed integers are represented in binary in the two's complement format, which allows the computer to replace subtraction with addition; this significantly simplifies the structure of the microprocessor and increases its performance.

In this format negative integers are represented in two's complement code, which is the complement of the modulus of the negative number up to zero. A negative integer is converted to two's complement code as follows:

1) write the modulus of the number in direct code in n (n = 16) binary digits;

2) get the inverse code of the number (invert all its digits, i.e. replace all ones with zeros and all zeros with ones);

3) add one to the least significant digit of the resulting inverse code.

For example, for the number $-536_{10}$ the modulus in direct code is $0000001000011000_2$, the inverse code is 1111110111100111, and the two's complement code is 1111110111101000.
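The three codes can be produced side by side, which is also a convenient way to check the table in Task 1.26. This is a minimal sketch with a hypothetical function name, assuming a 16-bit width by default. The sign-magnitude and inverse forms cannot hold $-2^{15}$, so the function rejects it, although two's complement can.

```python
def codes(value: int, bits: int = 16) -> tuple[str, str, str]:
    """Direct (sign-magnitude), inverse and two's complement codes of an integer."""
    if value >= 0:
        s = format(value, f"0{bits}b")
        return s, s, s                     # all three codes coincide for non-negative numbers
    if -value >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit in sign-magnitude form")
    magnitude = format(-value, f"0{bits - 1}b")
    direct = "1" + magnitude               # sign bit 1, then |value|
    inverse = "1" + "".join("1" if b == "0" else "0" for b in magnitude)
    complement = format(value & ((1 << bits) - 1), f"0{bits}b")  # equals 2**bits - |value|
    return direct, inverse, complement


for v in (-536, -2002):
    print(v, *codes(v))
```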

It must be remembered that the complement of a positive number is the number itself.

To store signed integers, besides the 16-bit representation using two memory registers (also called the short signed integer format), the medium and long signed integer formats are used. The medium format uses four registers (4 × 8 = 32 bits), and the long format uses eight registers (8 × 8 = 64 bits). In two's complement code their ranges are $-2^{31} \ldots 2^{31} - 1$ and $-2^{63} \ldots 2^{63} - 1$ respectively (in sign-magnitude format, $-(2^{31} - 1) \ldots 2^{31} - 1$ and $-(2^{63} - 1) \ldots 2^{63} - 1$).

Computer representation of numbers in fixed-point format has its advantages and disadvantages. The advantages include the simplicity of representing numbers and of the algorithms that implement arithmetic operations; the disadvantage is the finite range of representable numbers, which may be insufficient for many practical problems (mathematical, economic, physical, etc.).

Real numbers (finite and infinite decimals) are processed and stored in a computer in floating-point format. In this format the position of the point in the number may change. Any real number K in floating-point format can be represented as:

$K = A \cdot h^{p}$ (2.7)

where A is the mantissa of the number, h is the base of the number system, and p is the order of the number.

For the decimal number system, expression (2.7) takes the form $K = A \cdot 10^{p}$;

for binary, $K = A \cdot 2^{p}$;

for octal, $K = A \cdot 8^{p}$;

for hexadecimal, $K = A \cdot 16^{p}$.

This form of number representation is also called normal. When the order changes, the point in the number shifts, i.e. it seems to float to the left or to the right; that is why the normal form is called floating-point form. For example, the decimal number 15.5 can be represented in floating-point form as $0.155 \cdot 10^{2}$, $1.55 \cdot 10^{1}$, $15.5 \cdot 10^{0}$, $155.0 \cdot 10^{-1}$, $1550.0 \cdot 10^{-2}$, etc. This notation is not used when writing computer programs and entering them into a computer, because computer input devices accept only linear (single-line) notation. Therefore, for entering decimal numbers into a computer, expression (2.7) is converted to the form

$K = A\,\mathrm{E}\,P$

where P is the order of the number,

i.e. instead of the base 10 the letter E is written, a dot is used instead of a comma, and the multiplication sign is omitted. Thus the number 15.5 in floating-point linear format (computer representation) is written as 0.155E2, 1.55E1, 15.5E0, 155.0E-1, 1550.0E-2, etc.
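Most programming languages accept exactly this linear E notation. The following short illustration uses Python (an assumption about the reader's tooling, not part of the original text) and shows that every spelling of 15.5 above parses to the same value. Note that Python's own `E` output format normalizes the mantissa to $1 \le |m| < 10$ rather than $0.1 \le |m| < 1$.

```python
spellings = ["0.155E2", "1.55E1", "15.5E0", "155.0E-1", "1550.0E-2"]
values = [float(s) for s in spellings]
print(values)                 # every entry is 15.5
print(format(15.5, ".2E"))    # 1.55E+01: mantissa printed as d.dd
```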

Regardless of the number system, any number can be written in floating-point form in infinitely many ways. Such a form is called non-normalized. For an unambiguous representation of floating-point numbers, the normalized form is used, in which the mantissa must satisfy the condition

$h^{-1} \le |A| < 1$ (2.9)

where |A| is the absolute value of the mantissa.

Condition (2.9) means that the mantissa must be a proper fraction with a non-zero digit immediately after the point; a number whose mantissa has this property is called normalized. Thus the number 15.5 in normalized floating-point form is $0.155 \cdot 10^{2}$, i.e. the normalized mantissa is A = 0.155 and the order is P = 2, or 0.155E2 in computer notation.
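Decimal normalization can be done exactly with Python's standard `decimal` module. This is a minimal sketch: `Decimal.adjusted()` gives the exponent of the leading digit, so the order P is one more than that, and `Decimal.scaleb` moves the point. The function name `normalize10` is hypothetical.

```python
from decimal import Decimal


def normalize10(text: str) -> tuple[Decimal, int]:
    """Return (A, P) with text == A * 10**P and 0.1 <= |A| < 1."""
    x = Decimal(text)
    if x == 0:
        raise ValueError("zero has no normalized form")
    p = x.adjusted() + 1        # adjusted() is the exponent of the leading digit
    return x.scaleb(-p), p      # shift the point; exact for inputs under 28 digits


for t in ("15.5", "555.55", "2178.01", "0.0045"):
    a, p = normalize10(t)
    print(f"{t} = {a}E{p}")     # e.g. 15.5 = 0.155E2
```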

Floating-point numbers have a fixed format and occupy four bytes (32 bits) or eight bytes (64 bits) of computer memory. A 32-bit number is a single-precision number; a 64-bit number is a double-precision number. Bits are allocated to store the sign of the mantissa, the sign of the order, the mantissa and the order. The number of bits allocated to the order determines the range of representable numbers, and the number of bits allocated to the mantissa determines the precision with which a number is specified.

When performing arithmetic operations (addition and subtraction) on numbers presented in floating point format, the following procedure (algorithm) is implemented:

1) the orders of the operands are aligned: the order of the number with the smaller order is increased to that of the other number, and its mantissa is shifted right (decreased) by the same number of digits;

2) arithmetic operations are performed on the mantissas of numbers;

3) the result obtained is normalized.
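The three steps can be traced on a toy decimal format. This is a minimal sketch under stated assumptions: a hypothetical 5-digit decimal mantissa stored as an integer (so `(15500, 2)` means $0.15500 \cdot 10^{2}$), truncation instead of rounding, and base 10 instead of the binary used by real hardware. All names are hypothetical.

```python
DIGITS = 5                      # hypothetical mantissa length: 0.d1d2d3d4d5
LOW, HIGH = 10 ** (DIGITS - 1), 10 ** DIGITS


def trunc_div(x: int, d: int) -> int:
    """Integer division that truncates toward zero (drops the shifted-out digits)."""
    return -((-x) // d) if x < 0 else x // d


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two numbers (m, p) meaning m / 10**DIGITS * 10**p, with normalized mantissas."""
    (ma, pa), (mb, pb) = a, b
    if pa < pb:                                  # make `a` the operand with the larger order
        (ma, pa), (mb, pb) = (mb, pb), (ma, pa)
    mb = trunc_div(mb, 10 ** (pa - pb))          # 1) align orders: shift the mantissa right
    m, p = ma + mb, pa                           # 2) add the mantissas
    if m == 0:
        return 0, 0
    while abs(m) >= HIGH:                        # 3) normalize: too many digits
        m, p = trunc_div(m, 10), p + 1
    while abs(m) < LOW:                          #    or a zero right after the point
        m, p = m * 10, p - 1
    return m, p


print(fp_add((15500, 2), (37500, 0)))    # 15.5 + 0.375 -> (15875, 2)
print(fp_add((15500, 2), (-15400, 2)))   # 15.5 - 15.4 -> (10000, 0), i.e. 0.1
```

Aligning a much smaller operand can shift all of its digits out: `fp_add((12345, 5), (12345, 0))` returns `(12345, 5)` unchanged, a small instance of the precision loss described earlier.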

Practical part