Structure of software components. System for automatically creating signatures of executable files

2.1 Files

Information storage requirements:

    ability to store large amounts of data

    information must persist after the process that uses it terminates

    multiple processes must be able to access the information at the same time

2.1.1 File Naming

The maximum length of a file name depends on the OS: from 8 characters (MS-DOS, plus an extension of up to 3 characters) to 255 characters (Windows, Linux).

Some operating systems distinguish between uppercase and lowercase letters in file names and some do not. For example, WINDOWS and windows are the same name for MS-DOS, but for UNIX they are two different files.

In many operating systems a file name consists of two parts separated by a period, for example windows.exe. The part after the period is called the file extension; some systems use it to determine the file type.

In MS-DOS the extension is up to 3 characters long. The system uses it to determine the file type, including whether the file can be executed.

In UNIX the length of an extension is limited only by the maximum file name length of 255 characters, and a file can have several extensions (for example, archive.tar.gz). However, extensions are used mostly by application programs, not by the OS: UNIX cannot tell from the extension whether a file is executable.
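Because extensions are interpreted mostly by programs rather than by the OS, a program has to split names itself. The following is a minimal Python sketch (the function names `split_name` and `same_file_name` are hypothetical) that splits a name into a base and a list of extensions and compares two names the way a case-sensitive system (UNIX) or a case-insensitive one (MS-DOS) would. Treating a leading dot as part of the name follows the UNIX hidden-file convention.

```python
from typing import List, Tuple


def split_name(name: str) -> Tuple[str, List[str]]:
    """Split a file name into its base and the list of extensions that follow it."""
    # On UNIX a leading dot marks a hidden file (".profile"); it belongs to the name.
    prefix, rest = (".", name[1:]) if name.startswith(".") else ("", name)
    base, *extensions = rest.split(".")
    return prefix + base, extensions


def same_file_name(a: str, b: str, case_sensitive: bool) -> bool:
    """Compare names like a case-sensitive (UNIX) or case-insensitive (MS-DOS) system."""
    return a == b if case_sensitive else a.casefold() == b.casefold()
```

For example, `split_name("archive.tar.gz")` gives the base `archive` and the extensions `tar` and `gz`; which of them matters is up to the program that opens the file.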

2.1.2 File structure

Three main file structures:

    Byte sequence - the OS does not care about the contents of the file; it sees only bytes. The main advantage of this structure is its flexibility. Used on Windows and UNIX.

    Record sequence - the file is a sequence of fixed-length records (for example, punched cards) that are read sequentially. No longer used in general-purpose systems.

    Record tree - each record has a key, and records are looked up by key. The main advantage of this structure is search speed. Still used on some mainframes.

Three types of file structures.
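To make the difference between the structures concrete, here is a toy Python sketch. It assumes the data is already available as a byte sequence; `as_records` views those bytes as fixed-length records, and the hypothetical `KeyedFile` class finds records by key. A real record tree would use a structure such as a B-tree; this sketch substitutes a sorted list with binary search, which gives the same logarithmic lookup but is only a stand-in.

```python
from bisect import bisect_left
from typing import List, Optional


def as_records(data: bytes, record_len: int) -> List[bytes]:
    """Record sequence: view a byte sequence as fixed-length records read in order."""
    if record_len <= 0 or len(data) % record_len:
        raise ValueError("data is not a whole number of records")
    return [data[i:i + record_len] for i in range(0, len(data), record_len)]


class KeyedFile:
    """Keyed records kept sorted by key; lookup is a binary search.
    A simplified stand-in for the tree used by real systems."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: List[bytes] = []

    def put(self, key: str, value: bytes) -> None:
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value        # replace an existing record
        else:
            self._keys.insert(i, key)      # insert, keeping keys sorted
            self._values.insert(i, value)

    def get(self, key: str) -> Optional[bytes]:
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None
```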

2.1.3 File types

Main file types:

    Regular files - contain user information. Used on Windows and UNIX.

    Directories - system files that maintain the structure of the file system. Used on Windows and UNIX.

    Character special files - used to model serial input/output devices. Used only on UNIX.

    Block special files - used to model disks. Used only on UNIX.

Main types of regular files:

    ASCII files - consist of lines of text. On UNIX each line ends with a line feed (LF); on MS-DOS and Windows, with a carriage return followed by a line feed (CR LF); some other systems use a carriage return alone. Therefore, if a text file written on UNIX is opened in a Windows program that expects CR LF, all the lines may run together into one long line (this is a fairly common situation). Main advantages of ASCII files:
    - they can be displayed on the screen and sent to the printer without conversion
    - they can be edited with almost any text editor

    Binary files - all other (non-ASCII) files. As a rule, they have an internal structure.
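The line-ending mismatch described for ASCII files can be reproduced and fixed in a few lines. This minimal Python sketch works on raw bytes; the function names are hypothetical, and `lines_seen_by_crlf_reader` only imitates a reader that recognizes CR LF and nothing else.

```python
from typing import List


def to_unix(text: bytes) -> bytes:
    """Convert CR LF (MS-DOS/Windows) and lone CR line endings to LF (UNIX)."""
    return text.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos(text: bytes) -> bytes:
    """Convert any line endings to CR LF, as used by MS-DOS and Windows."""
    return to_unix(text).replace(b"\n", b"\r\n")


def lines_seen_by_crlf_reader(text: bytes) -> List[bytes]:
    """A reader that only understands CR LF sees a UNIX file as one long line."""
    return text.split(b"\r\n")
```

Normalizing to LF first inside `to_dos` keeps the conversion idempotent: a file that already uses CR LF is not turned into CR CR LF.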

Main types of binary files:

    Executable - programs. Although they are stored as a sequence of bytes, the operating system itself can interpret their structure and run them.

    Non-executable - all other binary files.

Examples of executable and non-executable files

"Magic number" - a special value at the beginning of a file that identifies it as executable.

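Checking a magic number amounts to comparing the first bytes of a file with known signatures. The Python sketch below assumes a small, illustrative table (ELF, the MZ header of MS-DOS/Windows executables, and the `#!` line of UNIX scripts); tools such as the UNIX `file` command know many more. A match only says which format the file claims to have: an ELF file, for instance, may also be an object file or a shared library rather than a runnable program.

```python
from typing import Optional

# Illustrative subset of well-known signatures (magic numbers).
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF (UNIX-like systems)"),
    (b"MZ", "MZ/PE (MS-DOS and Windows)"),
    (b"#!", "script with an interpreter line (UNIX)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the format whose magic number starts the header, or None."""
    for magic, kind in MAGIC_NUMBERS:
        if header.startswith(magic):
            return kind
    return None


def identify_file(path: str) -> Optional[str]:
    with open(path, "rb") as f:
        return identify(f.read(4))  # 4 bytes cover the longest signature above
```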
2.1.4 File access

Main types of file access:

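    Sequential - bytes or records are read in order, starting from the beginning of the file. This was the natural mode when data was stored on magnetic tape.

    Random - bytes or records can be read in any order. This became possible with disks.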
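The difference between sequential access and random access (the mode that the random access flag among the file attributes refers to) shows up directly in code. In this Python sketch, `io.BytesIO` serves as an in-memory stand-in for a file on disk, and the function names are hypothetical; with a real file opened in binary mode the same calls apply.

```python
import io
from typing import BinaryIO, List


def read_sequentially(f: BinaryIO, chunk: int) -> List[bytes]:
    """Sequential access: start at the beginning and read chunk after chunk, in order."""
    f.seek(0)  # "rewind", as with a tape
    chunks = []
    while True:
        block = f.read(chunk)
        if not block:  # empty result means end of file
            break
        chunks.append(block)
    return chunks


def read_at(f: BinaryIO, offset: int, size: int) -> bytes:
    """Random access: move the file position straight to `offset`, then read."""
    f.seek(offset)
    return f.read(size)


if __name__ == "__main__":
    data = io.BytesIO(b"0123456789")
    print(read_sequentially(data, 4))
    print(read_at(data, 7, 2))
```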

2.1.5 File attributes

Main file attributes:

    Protection - who can access the file and how (users, groups, read/write). Used on Windows and UNIX.

    Password - the password required to access the file

    Creator - who created the file

    Owner - the current owner of the file

    Read-only flag - 0 - read/write, 1 - read-only. Used on Windows.

    "Hidden" flag - 0 - visible, 1 - not shown in directory listings (by default). Used on Windows.

    "System" flag - 0 - normal file, 1 - system file. Used on Windows.

    "Archive" flag - 0 - has been backed up, 1 - needs to be backed up (not to be confused with compression). Used on Windows.

    "Compressed" flag - the file is compressed (similar to zip archives). Used on Windows.

    "Encrypted" flag - the file is encrypted, so a user who has no permission to read it cannot read its contents. Used on Windows.

    ASCII/binary flag - 0 - ASCII, 1 - binary

    Random access flag - 0 - sequential access only, 1 - random access

    "Temporary" flag - 0 - normal, 1 - delete the file when the process exits

    Lock flags - lock the file against access by other processes, for example while it is being edited.

    Creation time - date and time the file was created. Used on Windows; traditional UNIX file systems do not record it.

    Last access time - date and time of the last access

    Last change time - date and time of the last change. Used on Windows and UNIX.

    Current size - the size of the file. Used on Windows and UNIX.
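Several of the attributes listed above can be read with the portable `os.stat()` call. The Python sketch below is a minimal illustration; the dictionary keys are hypothetical names chosen to match the list, and only attributes that `os.stat()` reports on both Windows and UNIX are included.

```python
import os
import stat
import time


def attributes(path: str) -> dict:
    """Collect a few of the file attributes described above."""
    st = os.stat(path)
    return {
        "size": st.st_size,                            # current size, in bytes
        "protection": stat.filemode(st.st_mode),       # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,                        # owner's user ID (not meaningful on Windows)
        "read_only": not (st.st_mode & stat.S_IWUSR),  # on Windows mirrors the read-only flag
        "last_access": time.ctime(st.st_atime),        # time of last access
        "last_change": time.ctime(st.st_mtime),        # time of last modification
    }
```

The check uses the permission bits rather than `os.access()`, so the result describes the file itself and does not depend on the privileges of the user running the code.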

2.1.6 File Operations

Basic system calls to work with files:

    Create - creating a file without data.

    Delete - deleting a file.

    Open - open a file.

    Close - closing the file.

    Read - reading from a file, from the current file position.

    Write - writing to a file, to the current file position.

    Append - adding to the end of the file.

    Seek - sets the file pointer to a specific position in the file.

    Get attributes - getting file attributes.

    Set attributes - set file attributes.

    Rename - rename the file.
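Most of these calls have direct counterparts in Python's low-level `os` module, which wraps the operating system's own calls. In the following sketch, the function name and the file names `notes.txt` and `renamed.txt` are hypothetical; each step is annotated with the operation from the list that it corresponds to.

```python
import os

BINARY = getattr(os, "O_BINARY", 0)  # Windows needs O_BINARY; UNIX does not define it


def file_operations_demo(directory: str):
    path = os.path.join(directory, "notes.txt")

    # Create + Open: a new, empty file opened for writing.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC | BINARY, 0o644)
    os.write(fd, b"hello ")          # Write at the current position (the start)
    os.close(fd)                     # Close

    fd = os.open(path, os.O_WRONLY | os.O_APPEND | BINARY)
    os.write(fd, b"world")           # Append: every write goes to the end of the file
    os.close(fd)

    fd = os.open(path, os.O_RDONLY | BINARY)
    os.lseek(fd, 6, os.SEEK_SET)     # Seek: move the file pointer to byte 6
    tail = os.read(fd, 5)            # Read from the current position
    os.close(fd)

    size = os.stat(path).st_size     # Get attributes
    os.chmod(path, 0o600)            # Set attributes (permission bits)

    new_path = os.path.join(directory, "renamed.txt")
    os.rename(path, new_path)        # Rename
    os.remove(new_path)              # Delete
    return tail, size
```

Run against an empty directory, the function returns the 5 bytes read after the seek together with the final file size, and leaves the directory empty again.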

2.1.7 Files mapped to memory address space

Sometimes it is convenient to map a file into the address space of a process: the program then works with the file as with ordinary memory, without using I/O system calls, and the modified file is written back to disk afterwards.

With paged memory organization, the entire file is not loaded; only the pages that are actually needed are loaded.

With segmented memory organization, the file is mapped into a separate segment.

An example of copying a file via memory mapping.

Algorithm:

    A segment is created for file 1

    File 1 is mapped into this segment

    A segment is created for file 2

    Segment 1 is copied to segment 2

    Segment 2 is saved to disk

Disadvantages of this method:

    It is hard to determine the length of the output file in advance

    If one process has mapped a file into memory and modified it, but the changes have not yet been written back, a second process that opens the same file will work with an outdated version of it.

    The file may be larger than a segment, or even larger than the whole virtual address space.
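The copy algorithm above can be sketched with Python's `mmap` module, where each mapping plays the role of a "segment". This is an illustration under stated assumptions, not the method of any particular OS: the function name is hypothetical, and for a plain copy the output length is simply the input length, so the output file is sized with `truncate()` before it is mapped. An empty file cannot be mapped, so that case is handled separately.

```python
import mmap
import os


def copy_via_mmap(src: str, dst: str) -> None:
    """Copy src to dst by mapping both files into memory."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "w+b") as fout:
        if size == 0:
            return                  # an empty file cannot be mapped; dst is already empty
        fout.truncate(size)         # fix the output length before mapping it
        # "Segment" for file 1: read-only mapping of the whole input file.
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as seg1:
            # "Segment" for file 2: writable mapping of the output file.
            with mmap.mmap(fout.fileno(), size, access=mmap.ACCESS_WRITE) as seg2:
                seg2[:] = seg1[:]   # copy segment 1 into segment 2
                seg2.flush()        # write segment 2 back to disk
```

Note that `seg1[:]` still creates an in-memory copy of the whole file, so the sketch does not address the disadvantage that a file may be larger than the available address space.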

2.2 Directories

2.2.1 Single-level directory systems

In this system, all files are contained in one directory.

A single-level directory system containing four files, two of them named A but belonging to different owners

System advantages:

    Simplicity

    Files can be found quickly; there is no need to search through several directories

Disadvantages of the system:

    If different users create files with the same name, the files conflict.

2.2.2 Two-level directory systems

Each user has his own directory.

Two-level directory system

When a user logs in, he is placed in his own directory and works only with it. This makes it difficult to use shared system files.

This problem can be solved by creating a system directory that all users can access.

A user with many files may still need several files with the same name, which a single directory per user does not allow.

2.2.3 Hierarchical directory systems

Each user can create as many directories as he needs.

Hierarchical directory system

Almost all modern general-purpose operating systems are organized this way. Specialized operating systems may not need it.

2.2.4 Path name

To work with a directory tree, there must be some way to specify a file.

Two main methods for specifying a file:

    absolute path name - the path from the root directory, for example:
    - for Windows \usr\ast\mailbox
    - for UNIX /usr/ast/mailbox
    - for MULTICS >usr>ast>mailbox

    relative path name - the path from the current (working) directory, for example:
    - if the current directory is /usr/, the absolute path /usr/ast/mailbox can be written as ast/mailbox
    - if the current directory is /usr/ast/, the absolute path /usr/ast/mailbox can be written as mailbox
    - if the current directory is /var/log/, the absolute path /usr/ast/mailbox can be written as ../../usr/ast/mailbox

./ - means current directory

../ - means parent directory
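The conversions between absolute and relative UNIX path names in the examples above can be checked with Python's `posixpath` module. The two wrapper names are hypothetical. `normpath` folds `.` and `..` purely textually and does not resolve symbolic links, which a real OS does when it follows the path.

```python
import posixpath


def to_relative(absolute: str, working_dir: str) -> str:
    """Rewrite an absolute UNIX path relative to the working directory."""
    return posixpath.relpath(absolute, working_dir)


def to_absolute(relative: str, working_dir: str) -> str:
    """Resolve a relative path against the working directory, folding '.' and '..'."""
    return posixpath.normpath(posixpath.join(working_dir, relative))
```

For instance, `to_relative("/usr/ast/mailbox", "/var/log/")` yields `../../usr/ast/mailbox`, matching the last example, and `to_absolute` turns it back into the absolute path.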

2.2.5 Operations with directories

Basic system calls for working with directories:

    Create - create a directory

    Delete - delete a directory

    OpenDir - open a directory

    CloseDir - close a directory

    Rename - rename a directory
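The directory calls listed above map onto Python's `os` module as follows. In this sketch the function name and the directory names `projects` and `archive` are hypothetical. `os.scandir` opens the directory and yields its entries, and the directory is closed when the `with` block ends, which covers OpenDir and CloseDir.

```python
import os
from typing import List


def directory_demo(base: str) -> List[str]:
    d = os.path.join(base, "projects")
    os.mkdir(d)                                  # Create
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(d, name), "w") as f:
            f.write(name)
    with os.scandir(d) as entries:               # OpenDir; closed (CloseDir) on leaving the block
        names = sorted(e.name for e in entries)  # read the directory entries
    renamed = os.path.join(base, "archive")
    os.rename(d, renamed)                        # Rename
    for name in names:
        os.remove(os.path.join(renamed, name))
    os.rmdir(renamed)                            # Delete: only an empty directory can be removed
    return names
```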

Debuggers

A debugger is a module of a development environment, or a separate application, designed to find errors in programs. A debugger lets you trace a program step by step; monitor, set, or change the values of variables while the program runs; set and remove breakpoints or stop conditions; and so on.

Obfuscators

Obfuscation (from the Latin obfuscare, "to obscure, darken", and the English obfuscate, "to make unclear, confusing"), or code obfuscation, is the transformation of a program's source text or executable code into a form that preserves its functionality but makes it difficult to analyze, to understand the algorithms it uses, and to modify it after decompilation.

Obfuscation ("entangling") of code can be performed at the level of the algorithm, the source text, and/or the assembly text. To produce obfuscated assembly text, specialized compilers can be used that exploit non-obvious or undocumented features of the program's execution environment. There are also special programs that perform obfuscation, called obfuscators.
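One of the simplest source-level obfuscations is hiding string literals so that the plain text never appears in the stored program. The Python sketch below assumes a single-byte XOR key chosen by a hypothetical obfuscator. It only illustrates the idea that behavior is preserved while the text becomes harder to read. XOR with a fixed key is trivially reversible and is not a real protection.

```python
KEY = 0x5A  # hypothetical one-byte key chosen by the obfuscator


def encode(text: str) -> bytes:
    """Run by the obfuscator at build time: turn a string literal into XOR-ed bytes."""
    return bytes(b ^ KEY for b in text.encode("utf-8"))


def decode(data: bytes) -> str:
    """Emitted into the obfuscated program: recover the string only at run time."""
    return bytes(b ^ KEY for b in data).decode("utf-8")


# Original program:   def greeting(): return "Hello, world"
# Obfuscated program: the literal is replaced by its encoded bytes.
def greeting() -> str:
    return decode(b"\x12\x3f\x36\x36\x35\x76\x7a\x2d\x35\x28\x36\x3e")
```

Both versions of `greeting` return the same string, but a search of the obfuscated program's bytes for "Hello" finds nothing, which is also why this kind of transformation matters for signature-based recognition of executable files.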

Executable module, or executable file - a file containing a program in a form in which it can be executed by a computer (after being loaded into memory and relocated).

Most often it contains a binary representation of machine instructions for a specific processor (which is why programmers often call it simply a "binary"), but it may also contain instructions in an interpreted programming language, whose execution requires an interpreter. Files of the latter kind are often called scripts.

Binary files are executed by machines implemented in hardware or in software. The former include processors, for example the x86 or SPARC families; the latter are virtual machines, for example the Java virtual machine or the .NET Framework. The format of a binary file is determined by the architecture of the machine that executes it. Some machines have both hardware and software implementations: x86 code, for example, runs both on x86 family processors and in the VMware virtual machine.

Whether a file is executable is most often determined by convention. In some operating systems executable files are recognized by a file naming convention (for example, the extension .exe or .bin), while in others executable files carry specific metadata (for example, the execute permission bit on UNIX-like operating systems).
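The two conventions can be contrasted in a short Python sketch. The extension set is an assumption built from the examples in the text (plus .com); a real system keeps its own list. The metadata check tests the UNIX execute permission bits of a regular file. On Windows, `os.stat()` itself derives those bits from the extension, so the second check is only meaningful on UNIX-like systems.

```python
import os
import stat

# Assumed list, taken from the examples in the text; a real system keeps its own.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bin"}


def executable_by_name(name: str) -> bool:
    """Naming convention (MS-DOS/Windows style): the extension decides."""
    return os.path.splitext(name)[1].lower() in EXECUTABLE_EXTENSIONS


def executable_by_metadata(path: str) -> bool:
    """Metadata convention (UNIX style): a regular file with an execute bit set."""
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode) and bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```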

In modern computer architectures, executable files contain a large amount of data that is not program code: a description of the software environment in which the program can run, debugging data, constants, data the operating system may need to start a process (for example, the recommended heap size), and even descriptions of the window structures used by the program's graphical interface.

Executable files often contain calls to library functions, such as operating system functions. Thus, in addition to depending on the processor (any binary executable that contains machine code is machine-dependent), executable files may depend on the version of the operating system and its components.

Whether the computer is switched on or not, all data and programs are stored in its long-term (external) memory as files, from which they are loaded for execution or processing.

A file is a set of codes representing a certain amount of information of a particular type or purpose; it is assigned a unique name and stored in long-term memory.

Program source texts, ready-to-run programs, documents, graphic images, and any other data can be stored as files. By organization and content, files are divided into two categories: text and binary. Text files store strings of characters that are interpreted as text. Executable files contain program code ready for execution.

Unique names make it possible to organize files and make them accessible to the operating system and other programs. A file name consists of two parts separated by a dot: the name proper and the extension, which defines the file type (program, data, etc.). The name is assigned by the user (sometimes by the system, by default). The file type is usually set automatically by the program that creates the file, which in most cases makes it possible to launch the right program automatically. For example, .com and .exe are executable files (programs); .txt, .rtf, and .doc are text documents; .pas is the source text of a program written in Pascal.

To organize the placement of files on disks, their names are registered in special files called directories (in modern operating systems they are also called folders). A directory is a file containing a table (stored on the same disk as the files) with the file names, their sizes, the time of the last update, file attributes (properties), and so on. If a directory stores the name of a file, the file is often said to be "located" in that directory. In reality the file is stored in some area of the disk, often in several fragments on different tracks and platters of the disk pack (wherever there was free space on the medium). The corresponding information is kept in the directory.

Each disk can hold many directories; their number is determined by practical need and limited only by the disk capacity. The same applies to the number of files in a directory. All modern disk operating systems create a file system that organizes data storage and provides access to it. The file system is organized on a tabular principle. The surface of a hard disk is treated as a three-dimensional matrix whose dimensions are the surface, cylinder, and sector numbers. A cylinder is the set of all tracks on different surfaces that lie at the same distance from the axis of rotation. In the FAT file system, for example, data about where on the disk each file is recorded is kept in the system area of the disk in special file allocation tables (FAT tables).
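The "three-dimensional matrix" of surfaces (heads), cylinders, and sectors can be numbered linearly, which is how sector addresses are usually converted. The Python sketch below uses the common CHS-to-LBA convention (logical block addressing, which the text itself does not describe): cylinders and heads are numbered from 0, sectors from 1. The geometry in the example (16 heads, 63 sectors per track) is an arbitrary assumption.

```python
from typing import Tuple


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Number the cells of the (cylinder, head, sector) matrix linearly."""
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> Tuple[int, int, int]:
    """Inverse mapping: recover (cylinder, head, sector) from a linear sector number."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1
```

With 16 heads and 63 sectors per track, one cylinder holds 16 × 63 = 1008 sectors, so the first sector of cylinder 1 is linear sector 1008.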

The order in which files are stored on disk is determined by the organization of the file system (the organization of directories and the way in which the placement and attributes of files are described in them).

Hundreds of thousands of files are stored on disks, so for ease of searching, files are organized in the form of a multi-level file system, which has the structure shown in the figure.

The root directory contains first-level subdirectories; each of them, in turn, contains second-level subdirectories, and so on. Each directory has a name (without an extension) and can be registered in another, parent directory. Note that directories at every level can contain files as well as other directories.

Although file location data is actually stored in tabular form, for user convenience it is presented as hierarchical tree structures, and all necessary connections are provided by the operating system.

File system maintenance functions include the following operations performed under the control of the operating system:

    creating and naming files;

    creating and naming directories;

    renaming files and directories;

    copying and moving files between computer drives and between directories on the same drive;

    deleting files and directories;

    navigation through the file structure to access a given file or directory;

    file attributes management.

Executable file formats

The virtual memory of a process consists of several segments, or memory regions. The size, contents, and location of the segments in memory are determined both by the program itself (for example, the libraries it uses and the sizes of its code and data) and by the format of its executable file. Most modern UNIX operating systems use one of two standard executable file formats: COFF (Common Object File Format) and ELF (Executable and Linking Format).

A description of executable file formats may seem out of place, but understanding them is necessary to describe the basic functionality of the operating system kernel. In particular, the information stored in COFF and ELF executable files answers a number of questions that are very important for the operation of the application and of the system as a whole:

What parts of the program need to be loaded into memory?

How is an area for uninitialized data created?

Which parts of the process must be saved to the disk swap area (a special area of disk space for temporarily storing fragments of the process address space), for example when pages are replaced, and which can be re-read from the file when needed and therefore need not be saved?

Where are program instructions and data located in memory?

What libraries are needed to run the program?

How are the executable file on disk, the program image in memory, and the disk swap area related?

Fig. 2.3 shows the basic memory layout of processes loaded from COFF and ELF executable files, respectively. Although the layout of the segments differs between the two formats, the basic components are the same: both processes have code (text), data, and stack segments. As the figure shows, the sizes of the data and stack segments can change, and the direction of this change is determined by the executable file format. The stack size is changed automatically by the operating system, while the size of the data segment is controlled by the application itself. These issues are discussed in detail in the "Memory Allocation" section later in this chapter.

Fig. 2.3. Executable program images in COFF and ELF formats

The data segment includes initialized data, which is copied into memory from the corresponding sections of the executable file, and uninitialized data, which is filled with zeros before the process begins execution. Uninitialized data is often called a BSS segment.
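In ELF, the questions about what to load and how the zero-filled area is created are answered by the program header table (not described in the text above). Each loadable (PT_LOAD) segment records how many bytes come from the file (p_filesz) and how large the segment is in memory (p_memsz). The difference is filled with zeros, and that is where BSS comes from. The following Python sketch handles only 64-bit little-endian ELF and uses field offsets from the published ELF64 header and program header layout. It does not cover COFF. The function name is hypothetical.

```python
import struct
from typing import List, Tuple

PT_LOAD = 1  # program header type of a loadable segment


def load_segments(image: bytes) -> List[Tuple[int, int, int, int]]:
    """Return (vaddr, filesz, memsz, zero_filled) for each PT_LOAD segment."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:  # EI_CLASS = 64-bit, EI_DATA = little-endian
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)  # entry size, entry count
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        (p_type, _flags, _offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from("<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            # Bytes beyond p_filesz up to p_memsz are zero-filled (uninitialized data, BSS).
            segments.append((p_vaddr, p_filesz, p_memsz, p_memsz - p_filesz))
    return segments
```

On a 64-bit Linux system the function can be applied to the bytes of an installed program (for example, a file read from /bin); typically the data segment is the one whose memory size exceeds its file size.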
