What is a character code? ASCII encoding (American Standard Code for Information Interchange): the basic text encoding for the Latin alphabet

To use ASCII correctly, it helps to understand what the encoding is and what it can do.

What is it?

ASCII is a table that assigns numeric codes to the printable characters typed on a computer keyboard (see screenshot No. 1), along with some control codes. In other words, letters, decimal digits and other signs are each mapped to a code that represents them when information is stored or transmitted.

ASCII was developed in the United States, so the standard character set covers the English alphabet, digits, punctuation and control codes, 128 codes in total. A fair question then arises: what should be done when a national alphabet has to be encoded?

Other versions of the ASCII table were developed to address this. For languages with a different script, the letters of the English alphabet were either replaced or supplemented with the characters of the national alphabet. Thus an ASCII-based encoding may contain Russian letters for national use (see screenshot No. 2).

Where is the ASCII coding system used?

This coding system is needed not only for typing text on the keyboard. It is also used in graphics. For example, in the ASCII Art Maker program, images in various file formats are rendered as arrangements of ASCII characters (see screenshot No. 3).

As a rule, such programs fall into two groups: graphic editors that convert an image into text, and tools that convert an image into ASCII graphics. The well-known emoticon (a "smiling human face" built from characters) is another example of using encoded characters this way.

This encoding can also be used when writing an HTML document. For example, you enter a specific character code in the markup, and when the page is viewed, the symbol corresponding to that code is displayed on the screen.

Among other things, this kind of encoding is needed when building a multilingual website, because characters missing from a given national table have to be replaced with their character codes. Readers who work directly with information and communication technologies (ICT) will find it useful to be familiar with the following topics:
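As a minimal sketch of this idea, Python's standard `html` module can resolve numeric character references of the form `&#NN;`, roughly as a browser does when it renders a page. The helper name `to_reference` and the sample strings are illustrative assumptions, not part of any HTML tooling.

```python
import html

def to_reference(ch: str) -> str:
    """Build the decimal character reference for a single character."""
    return f"&#{ord(ch)};"

# html.unescape replaces each reference with the character it stands for.
decoded = html.unescape("&#72;&#105;&#33;")  # codes 72, 105, 33

print(decoded)                    # Hi!
print(to_reference("A"))          # &#65;
print(html.unescape("&#x41;"))    # A (hexadecimal form of the same reference)
```

Writing the reference instead of the character itself keeps the page source within plain ASCII, whatever encoding the page is served in.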

  1. Portable character set;
  2. Control characters;
  3. EBCDIC;
  4. VISCII;
  5. YUSCII;
  6. Unicode;
  7. ASCII art;
  8. KOI-8.

ASCII Table Properties

Like any systematic scheme, ASCII has its own characteristic properties. For example, the codes of the decimal digits 0 to 9 are arranged so that the low four bits of each code are the digit's value in binary: the digit 8 has code 56, binary 0111000, whose last four bits, 1000, equal 8.
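A short sketch of the digit property, assuming standard ASCII digit codes 48 to 57. The function name `digit_value` is hypothetical.

```python
def digit_value(ch: str) -> int:
    """Return the numeric value of an ASCII digit by keeping its low 4 bits."""
    if not ("0" <= ch <= "9"):
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return ord(ch) & 0x0F

# Print each digit, its 7-bit code, and the value recovered from the code.
for ch in "0589":
    print(ch, format(ord(ch), "07b"), digit_value(ch))
```

Because of this layout, converting a digit character to a number needs only a bit mask (or a subtraction of 48) rather than a lookup table.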

Uppercase and lowercase letters differ from each other in only one bit, which greatly simplifies checking and changing the case.

With all these properties, ASCII is in practice handled as an eight-bit encoding, although it was originally designed as a seven-bit one.
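The one-bit difference can be sketched as follows. The bit in question is the one with value 32 (hex 20); `CASE_BIT` and `toggle_case` are illustrative names.

```python
CASE_BIT = 0x20  # the single bit that separates 'A' (65) from 'a' (97)

def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by flipping one bit; leave other characters alone."""
    if ("A" <= ch <= "Z") or ("a" <= ch <= "z"):
        return chr(ord(ch) ^ CASE_BIT)
    return ch

print(format(ord("A"), "07b"))  # 1000001
print(format(ord("a"), "07b"))  # 1100001
print("".join(toggle_case(c) for c in "Hello, World"))  # hELLO, wORLD
```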

Use of ASCII in Microsoft Office programs:

If necessary, this way of encoding information can be used in Microsoft Notepad and Microsoft Office Word. In these applications a document can be saved in ASCII format (as plain text), but some features will then be unavailable when typing.

In particular, text formatting such as bold type will not be available, because the encoding preserves only the content of the typed text, not its appearance. Such codes can also be entered into a document in the following applications:

  • Microsoft Excel;
  • Microsoft FrontPage;
  • Microsoft InfoPath;
  • Microsoft OneNote;
  • Microsoft Outlook;
  • Microsoft PowerPoint;
  • Microsoft Project.

Note that to type an ASCII code in these applications you must hold down the ALT key while entering the code on the numeric keypad.

Of course, all the necessary codes require a longer and more detailed study, but this is beyond the scope of our article today. I hope that you found it really useful.

See you again!

The set of characters with which text is written is called an alphabet.

The number of characters in the alphabet is its size (cardinality).

Formula for determining the amount of information: N = 2^b,

where N is the size of the alphabet (number of characters),

b is the number of bits (the information weight of one character).

An alphabet of 256 characters can accommodate almost all the necessary characters. Such an alphabet is called sufficient.

Since 256 = 2^8, the weight of one character is 8 bits.

A group of 8 bits is called a byte:

1 byte = 8 bits.

The binary code of each character in computer text takes up 1 byte of memory.
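The relation N = 2^b can be turned around to find how many bits one character needs. The sketch below assumes fixed-width codes; `bits_per_symbol` and `text_size_bytes` are hypothetical helper names.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest b such that 2**b >= alphabet_size."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    return (alphabet_size - 1).bit_length()

def text_size_bytes(text: str, alphabet_size: int = 256) -> int:
    """Memory needed for a text with fixed-width codes, rounded up to whole bytes."""
    total_bits = len(text) * bits_per_symbol(alphabet_size)
    return (total_bits + 7) // 8

print(bits_per_symbol(256))         # 8
print(bits_per_symbol(128))         # 7
print(text_size_bytes("computer"))  # 8
```

With a 256-character alphabet each character weighs exactly one byte, so the size of a text in bytes equals its length in characters.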

How is text information represented in computer memory?

The convenience of byte-by-byte character encoding is obvious because a byte is the smallest addressable part of memory and, therefore, the processor can access each character separately when processing text. On the other hand, 256 characters is quite a sufficient number to represent a wide variety of symbolic information.

The question now is which eight-bit binary code to assign to each character.

Clearly this is a matter of convention; many encoding schemes are possible.

All characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the serial number of the character in the binary number system.
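A brief sketch of the correspondence between a serial number and its eight-bit code. The function names are illustrative.

```python
def code_to_bits(code: int) -> str:
    """Write a serial number 0..255 as an eight-bit binary string."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in one byte")
    return format(code, "08b")

def bits_to_code(bits: str) -> int:
    """Read an eight-bit binary string back as a serial number."""
    return int(bits, 2)

print(code_to_bits(0))           # 00000000
print(code_to_bits(65))          # 01000001
print(code_to_bits(255))         # 11111111
print(bits_to_code("01111111"))  # 127
```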

A table in which all characters of the computer alphabet are assigned serial numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII table (pronounced "ask-ee"; American Standard Code for Information Interchange) has become the international standard for PCs.

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. the characters numbered from 0 (00000000) to 127 (01111111).

ASCII encoding table structure

Serial number | Code | Symbol

0 - 31 | 00000000 - 00011111 | Characters with numbers from 0 to 31 are usually called control characters. Their function is to control the process of displaying text on the screen or printing it, sounding a beep, marking up text, etc.

32 - 127 | 00100000 - 01111111 | Standard part of the table (English). It includes the lowercase and uppercase letters of the Latin alphabet, decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is a space, i.e. an empty position in the text. All the others are displayed as specific signs (except 127, the control character DEL).

128 - 255 | 10000000 - 11111111 | Alternative part of the table (Russian). The second half of the ASCII code table, called the code page (128 codes, from 10000000 to 11111111), exists in different variants, and each variant has its own number. The code page is used primarily to accommodate national alphabets other than Latin. In Russian national encodings, the characters of the Russian alphabet are placed in this part of the table.

First half of the ASCII code table


Please note that in the encoding table, letters (uppercase and lowercase) are arranged in alphabetical order, and numbers are ordered in ascending order. This observance of lexicographic order in the arrangement of symbols is called the principle of sequential coding of the alphabet.

For letters of the Russian alphabet, the principle of sequential coding is also observed.
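Sequential coding is what makes arithmetic on character codes meaningful. The sketch below uses illustrative names, and the choice of CP1251 for the Russian check is an assumption made only for the example.

```python
def letter_position(ch: str) -> int:
    """1-based position of a lowercase Latin letter in the alphabet."""
    if not ("a" <= ch <= "z"):
        raise ValueError(f"not a lowercase Latin letter: {ch!r}")
    return ord(ch) - ord("a") + 1

print(letter_position("a"), letter_position("z"))  # 1 26

# Comparing codes yields alphabetical order for words in a single case.
print(sorted(["pear", "apple", "fig"]))  # ['apple', 'fig', 'pear']

# Neighbouring Russian letters also get neighbouring codes (checked in CP1251).
code_a = "а".encode("cp1251")[0]
code_b = "б".encode("cp1251")[0]
print(code_b - code_a)  # 1
```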

Second half of the ASCII code table


Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise when Russian text is transferred from one computer to another or from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Code for Information Exchange, 8-bit"). It was used back in the 70s on the ES EVM series of computers, and from the mid-80s it was used in the first Russian-localized versions of the UNIX operating system.

The CP866 encoding ("CP" stands for "Code Page") dates from the early 90s, when the MS-DOS operating system was dominant.

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Organization for Standardization (ISO) has approved another encoding, ISO 8859-5, as a standard for the Russian language.

The most common encoding currently in use is the Microsoft Windows encoding, abbreviated CP1251.
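To see how the five encodings disagree, the sketch below encodes one word in each of them. The codec names are those of Python's standard codecs; the sample word and the name `encode_all` are illustrative.

```python
WORD = "Привет"

# Python codec names for the five Cyrillic encodings named above.
CODECS = ["koi8_r", "cp1251", "cp866", "mac_cyrillic", "iso8859_5"]

def encode_all(text: str) -> dict[str, bytes]:
    """Encode the same text with every codec in CODECS."""
    return {name: text.encode(name) for name in CODECS}

encoded = encode_all(WORD)
for name, raw in encoded.items():
    print(f"{name:13s} {raw.hex(' ')}")
```

Each encoding stores the word in six bytes, but the byte values differ from one encoding to the next, so a program has to know which table was used before it can display the text.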

Since the late 90s, the problem of standardizing character encoding has been addressed by a new international standard called Unicode. It was originally designed as a 16-bit encoding, i.e. it allocated 2 bytes of memory for each character. Of course, this doubles the amount of memory occupied compared with a one-byte encoding, but such a code table can hold up to 65536 characters. The code space has since been extended beyond 16 bits, to more than a million code points, which are stored using encodings such as UTF-8 and UTF-16. The complete specification of the Unicode standard includes existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.
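The doubling of memory can be checked with a short sketch. It assumes the two-byte form is UTF-16 (the text does not name one) and uses only characters that fit in 16 bits; the sample strings and the name `size_in` are illustrative.

```python
def size_in(text: str, codec: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(codec))

latin = "ASCII"
russian = "Привет"

print(size_in(latin, "ascii"), size_in(latin, "utf-16-le"))       # 5 10
print(size_in(russian, "cp1251"), size_in(russian, "utf-16-le"))  # 6 12
```

The `utf-16-le` codec is used rather than `utf-16` so that no byte-order mark is counted in the size.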

Let's try using an ASCII table to imagine what words will look like in the computer's memory.

Internal representation of words in computer memory
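As a sketch of such a representation, the snippet below lists the byte stored for each letter of a word, assuming plain ASCII text. The sample words and the name `memory_image` are illustrative.

```python
def memory_image(word: str) -> list[str]:
    """Eight-bit binary code of every character, in the order stored in memory."""
    return [format(byte, "08b") for byte in word.encode("ascii")]

for word in ["file", "disk"]:
    print(word, " ".join(memory_image(word)))
# file 01100110 01101001 01101100 01100101
# disk 01100100 01101001 01110011 01101011
```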

Sometimes a text in Russian received from another computer cannot be read: some kind of gibberish appears on the monitor screen. This happens because computers use different character encodings for the Russian language.

Each computer implements a particular set of characters. At a minimum this set contains the 26 uppercase and 26 lowercase Latin letters, digits and special characters (period, space, etc.). The integers assigned to these characters are called codes. Standards were developed so that different computers would use the same sets of codes.

ASCII standard

ASCII (American Standard Code for Information Interchange) is the American standard code for information exchange. Each ASCII character occupies 7 bits, so the maximum number of characters is 128 (Table 1). Codes 0 through 1F (hexadecimal) are control characters that are not printed. Many of these non-printable characters were intended for data transmission. For example, a message may consist of the start-of-heading character SOH, the header itself, the start-of-text character STX, the text itself, the end-of-text character ETX, and the end-of-transmission character EOT. Today, however, data is sent over networks in packets, which themselves mark the beginning and end of a transmission, so these non-printable characters are almost never used.
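The message layout described above can be sketched as follows. This is only an illustration of the character sequence SOH, header, STX, text, ETX, EOT; the functions `frame` and `unframe` are hypothetical and do not implement any particular transmission protocol.

```python
# Control characters used as delimiters (codes 01 to 04 in Table 1).
SOH, STX, ETX, EOT = b"\x01", b"\x02", b"\x03", b"\x04"

def frame(header: str, text: str) -> bytes:
    """Wrap a header and a text in SOH ... STX ... ETX EOT."""
    return SOH + header.encode("ascii") + STX + text.encode("ascii") + ETX + EOT

def unframe(message: bytes) -> tuple[str, str]:
    """Split a framed message back into (header, text)."""
    if not (message.startswith(SOH) and message.endswith(ETX + EOT)):
        raise ValueError("malformed frame")
    header, separator, text = message[1:-2].partition(STX)
    if not separator:
        raise ValueError("missing start-of-text character")
    return header.decode("ascii"), text.decode("ascii")

packet = frame("to:B", "hello")
print(packet)           # b'\x01to:B\x02hello\x03\x04'
print(unframe(packet))  # ('to:B', 'hello')
```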

Table 1 - ASCII code table

Control characters (codes in hexadecimal)

Code  Name  Meaning                Code  Name  Meaning
0     NUL   Null character         10    DLE   Data link escape
1     SOH   Start of heading       11    DC1   Device control 1
2     STX   Start of text          12    DC2   Device control 2
3     ETX   End of text            13    DC3   Device control 3
4     EOT   End of transmission    14    DC4   Device control 4
5     ENQ   Enquiry                15    NAK   Negative acknowledgement
6     ACK   Acknowledgement        16    SYN   Synchronous idle
7     BEL   Bell                   17    ETB   End of transmission block
8     BS    Backspace              18    CAN   Cancel
9     HT    Horizontal tab         19    EM    End of medium
A     LF    Line feed              1A    SUB   Substitute
B     VT    Vertical tab           1B    ESC   Escape
C     FF    Form feed              1C    FS    File separator
D     CR    Carriage return        1D    GS    Group separator
E     SO    Shift out              1E    RS    Record separator
F     SI    Shift in               1F    US    Unit separator

Printable characters (codes in hexadecimal)

Code  Symbol  Code  Symbol  Code  Symbol  Code  Symbol  Code  Symbol  Code  Symbol
20    space   30    0       40    @       50    P       60    `       70    p
21    !       31    1       41    A       51    Q       61    a       71    q
22    "       32    2       42    B       52    R       62    b       72    r
23    #       33    3       43    C       53    S       63    c       73    s
24    $       34    4       44    D       54    T       64    d       74    t
25    %       35    5       45    E       55    U       65    e       75    u
26    &       36    6       46    F       56    V       66    f       76    v
27    '       37    7       47    G       57    W       67    g       77    w
28    (       38    8       48    H       58    X       68    h       78    x
29    )       39    9       49    I       59    Y       69    i       79    y
2A    *       3A    :       4A    J       5A    Z       6A    j       7A    z
2B    +       3B    ;       4B    K       5B    [       6B    k       7B    {
2C    ,       3C    <       4C    L       5C    \       6C    l       7C    |
2D    -       3D    =       4D    M       5D    ]       6D    m       7D    }
2E    .       3E    >       4E    N       5E    ^       6E    n       7E    ~
2F    /       3F    ?       4F    O       5F    _       6F    o       7F    DEL

Unicode standard

The previous encoding is fine for English, but it is inconvenient for other languages. German, for example, has umlauts, and French has accents. Some languages have completely different alphabets. The first attempt at adapting ASCII was ISO 646, which defined national variants of the 7-bit code. The next attempt was ISO 8859, which extended the code to 8 bits, adding another 128 characters, and introduced the concept of a code page; its first part, Latin-1, added Latin letters with accents and diacritical marks. Other extensions were attempted as well, but none was universal. For that reason the Unicode encoding (ISO 10646) was created. The idea behind it is to assign each character a single fixed 16-bit value, called a code point. In total there are 65536 code points. Latin-1 is used for code points 0-255, which makes converting ASCII to Unicode easy. This standard solved many problems, but not all. New characters keep arriving: for Japanese alone, for example, about 20 thousand more characters are needed. Braille also has to be included.
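The reuse of Latin-1 for the first 256 code points can be checked directly. The sketch below relies on Python's built-in `latin-1` codec; the variable names are illustrative.

```python
# Decode every possible byte value as Latin-1 and compare with Unicode code points.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
matches = [ord(ch) for ch in as_text] == list(range(256))
print(matches)  # True

# The same holds for a single accented letter: its Latin-1 byte equals its code point.
e_acute = "é"
print(e_acute.encode("latin-1")[0] == ord(e_acute))  # True
```

Since ASCII occupies the first 128 positions of Latin-1, any ASCII text is also valid Unicode text with the same numeric codes.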

As you know, a computer stores information in binary form, representing it as a sequence of ones and zeros. To translate information into a form convenient for human perception, each unique sequence of numbers is replaced by its corresponding symbol when displayed.

One of the systems for correlating binary codes with printed and control characters is ASCII.

At the current level of development of computer technology, the user is not required to know the code of each specific character. However, a general understanding of how coding is carried out is extremely useful, and for some categories of specialists, even necessary.

Creating ASCII

The encoding was originally developed in 1963 and then updated twice over the course of 25 years.

In the original version, the ASCII character table included 128 characters; later an extended version appeared, where the first 128 characters were saved, and previously missing characters were assigned to codes with the eighth bit involved.

For many years this encoding was the most popular in the world. In 2006 the Latin Windows-1252 encoding took the lead, and from the end of 2007 to the present Unicode has firmly held the leading position.

Computer representation of ASCII

Each ASCII character has its own code, stored as 8 binary digits, each a zero or a one. The smallest number in this representation is zero (eight zeros in the binary system), which is the code of the first element in the table.

Two codes in the table were reserved for switching between standard US-ASCII and its national variant.

After ASCII began to include not 128, but 256 characters, an encoding variant became widespread, in which the original version of the table was stored in the first 128 codes with the 8th bit zero. National written characters were stored in the upper half of the table (positions 128-255).
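Because the original table always has the 8th bit set to zero, a program can tell the two halves apart with a single bit test. The sketch below assumes KOI8-R for the upper half purely as an example; the function names are hypothetical.

```python
HIGH_BIT = 0x80  # the 8th bit: 0 for codes 0-127, 1 for codes 128-255

def is_standard_ascii(data: bytes) -> bool:
    """True if every byte belongs to the original 128-code table."""
    return all((byte & HIGH_BIT) == 0 for byte in data)

def count_halves(data: bytes) -> tuple[int, int]:
    """Return (bytes from the lower half, bytes from the upper half)."""
    upper = sum(1 for byte in data if byte & HIGH_BIT)
    return len(data) - upper, upper

mixed = "ASCII и КОИ-8".encode("koi8_r")
print(is_standard_ascii(b"ASCII"))  # True
print(is_standard_ascii(mixed))     # False
print(count_halves(mixed))          # (9, 4)
```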

The user does not need to know the ASCII character codes directly. A software developer usually only needs to know a character's number in the table, from which its binary code can be calculated if necessary.

Russian language

In the early 70s, after encodings had been developed for the Scandinavian languages, Chinese, Korean, Greek and others, the Soviet Union began creating its own version. Soon an 8-bit encoding called KOI8 was developed, which preserved the first 128 ASCII character codes and allocated the same number of positions to the letters of the national alphabet and additional characters.

Before the introduction of Unicode, KOI8 dominated the Russian segment of the Internet. There were variants of the encoding for both the Russian and the Ukrainian alphabet.

ASCII problems

Since even the extended table held no more than 256 elements, several different scripts could not be accommodated in one encoding. In the 90s the "krakozyabry" (garbled text) problem appeared on the Runet: texts typed in Russian were displayed incorrectly.

The problem was that the different extended ASCII tables were incompatible with each other. Recall that positions 128-255 could hold different characters, so when text written in one Cyrillic encoding was read in another, every letter was replaced by whatever character had the same number in the other encoding.
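The effect is easy to reproduce. In the sketch below a word is written in CP1251 and read back as KOI8-R; the pair of encodings, the sample word and the name `misread` are chosen only for illustration.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one table, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)

garbled = misread("Привет", "cp1251", "koi8_r")
print(garbled)  # оПХБЕР

# Latin letters survive, because codes 0-127 are the same in both tables.
print(misread("Hello", "cp1251", "koi8_r"))  # Hello

# Reversing the mistake restores the text, since no bytes were lost.
restored = misread(garbled, "koi8_r", "cp1251")
print(restored)  # Привет
```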

Current Status

With the advent of Unicode, the popularity of ASCII began to decline sharply.

The reason is that the new encoding made it possible to accommodate characters from almost all written languages. At the same time, the first 128 ASCII characters correspond to the same characters in Unicode.

In 2000, ASCII was the most popular encoding on the Internet and was used on 60% of web pages indexed by Google. By 2012, the share of such pages had dropped to 17%, and Unicode (UTF-8) took the place of the most popular encoding.

Thus, ASCII is an important part of the history of information technology, but its use in the future seems unpromising.