Abstract: Distributed information systems. Architecture of distributed information systems and Web applications

INTRODUCTION

1. THE CONCEPT OF DISTRIBUTED IS

1.1. Prerequisites for creating distributed IS

1.2. Concept of distributed information systems

1.3. Tools for working with distributed data

2. DISTRIBUTED DATABASES

2.1. Basic principles

2.2. Types of distributed databases

2.3. Purpose and principle of operation of a distributed database

3. EXAMPLES OF DISTRIBUTED SYSTEMS

CONCLUSION

LITERATURE

INTRODUCTION

This topic is relevant because the world economy is undergoing globalization and information integration. These processes have also affected our country, which, because of its geographical location and size, has to use distributed information systems (IS). Distributed information systems work with data located on different servers and on various hardware and software platforms, and stored in various formats. They are easy to expand, are based on open standards and protocols, integrate their resources with other information systems, and give users simple interfaces.

The world contains a huge amount of ready-to-use information and computing resources. They were created at different times, using different development approaches. When developing a new information system, one can almost always find ready-made components with suitable functions. The problem is that compatibility requirements were not taken into account when they were created: these components do not understand each other and cannot work together. A mechanism, or a set of mechanisms, is needed to make such independently developed information and computing resources interoperable.

This paper presents basic information about distributed information systems: it describes the prerequisites for their development and the tools for working with data, and introduces the concept of a distributed database along with its types and basic principles. The third chapter presents examples of distributed information systems, such as Informix On-Line from Informix Software, Ingres Intelligent Database from Ingres Corp, Oracle (version 7) from Oracle Corp, and Sybase System 10 from Sybase Inc.

The purpose of this work is to study the theoretical foundations of distributed information systems and to build an understanding of the principles by which they operate.

Distributing data makes it possible, for example, to store at a network node the data that is used most often at that node. This makes work with that data easier and faster while still leaving the rest of the database accessible.

1. THE CONCEPT OF DISTRIBUTED IS

1.1. Prerequisites for creating distributed information systems

From the very beginning of the development of computer technology, two main directions of its use emerged. The first is the use of computers to perform numerical calculations that take too long or are impossible to perform manually. This direction stimulated the development of methods for the numerical solution of complex mathematical problems, the development of a class of programming languages oriented toward convenient notation of numerical algorithms, and the establishment of feedback with the developers of new computer architectures.

The second direction is the use of computers in automatic or automated information systems. The volumes of information such systems deal with are typically quite large, and the information itself has a rather complex structure. Natural requirements for such systems are an acceptable average speed of operations and the safe keeping of information.

Because information systems require complex data structures, the additional, individually built data management facilities were an essential part of each information system and were practically repeated from one system to another. The desire to isolate and generalize the common part of information systems responsible for managing complexly structured data was, apparently, the first motivation for creating various data management systems.

It soon became clear that a common library of programs implementing more complex data storage methods on top of the standard basic file system, for example storing information in multiple files, was not enough. All of this contributed to the creation of distributed information systems.

In fact, if an information system supports consistent storage of information in several files, we can say that it supports a database. If some auxiliary data management system makes it possible to work with multiple files while ensuring their consistency, we can call it a database management system. The requirement of maintaining data consistency across multiple files alone means that a library of functions is not enough: such a system must have some data of its own (metadata) and even knowledge that determines the integrity of the data.

There is a huge amount of ready-to-use information and computing resources in the world. They were created at different times and different approaches were used to develop them. Almost always, when developing a new information system, you can find ready-made components that are suitable for their functions.

1.2. The concept of distributed information systems

Typically, a system in which more than one database server operates is considered distributed. Multiple servers are used to reduce the load on any single server and to support geographically remote departments. Differences in the complexity of creation, modification, maintenance, and integration with other systems make it possible to divide information systems into classes of small, medium, and large distributed systems. Small information systems have a short life cycle, are oriented toward mass use, are inexpensive, cannot be modified without the participation of the developers, mainly use desktop database management systems (DBMS), and run on homogeneous hardware and software without security features. Large corporate information systems, federal-level systems, and the like are characterized by a long life cycle, migration of legacy systems, diverse hardware and software, the scale and complexity of the tasks being solved, the intersection of many subject areas, analytical data processing, and territorial distribution of components.

The functions of such information systems include, first of all, working with distributed data located on different physical servers and on various hardware and software platforms, and stored in various internal formats. The system must provide full information about itself and all its resources, be easy to expand, be based on open standards and protocols, and make it possible to integrate its resources with those of other information systems. For users, the system should offer different levels of privileges and simple interfaces for accessing information.

Data from heterogeneous systems is usually combined into logical groups to which queries are addressed. An abstract query system assumes that the system operates not with a specific query syntax, but with its logical essence based on abstract attributes.
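
The idea of addressing a query by its logical essence rather than by a specific syntax can be sketched as a mapping from abstract attributes to the field names each source actually uses. This is only an illustration: the two sources, their field names, and the `run_abstract_query` function are hypothetical and do not come from any particular product.

```python
# Each source stores the same logical attributes under its own field names.
# Both sources and all field names are invented for this example.
SOURCES = {
    "hr_db": {
        "mapping": {"name": "full_name", "city": "town"},
        "rows": [{"full_name": "Ivanov", "town": "Omsk"}],
    },
    "crm_db": {
        "mapping": {"name": "client", "city": "location"},
        "rows": [
            {"client": "Petrov", "location": "Omsk"},
            {"client": "Sidorov", "location": "Tomsk"},
        ],
    },
}


def run_abstract_query(conditions):
    """Return rows from all sources that match conditions on abstract attributes."""
    results = []
    for source in SOURCES.values():
        mapping = source["mapping"]
        for row in source["rows"]:
            if all(row[mapping[attr]] == value
                   for attr, value in conditions.items()):
                # Translate the row back to abstract attribute names.
                results.append({attr: row[field]
                                for attr, field in mapping.items()})
    return results


print(run_abstract_query({"city": "Omsk"}))
```

The caller names only the abstract attribute `city`; the translation to `town` or `location` stays inside the query layer.
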
When building distributed information systems, two basic architectures are used as a rule: client/server and Internet/Intranet.
An enterprise IS built on a client/server architecture provides clients with a wide range of applications and development tools focused on making maximum use of the computing capabilities of client workstations. Server resources are used mainly for storing and exchanging documents and for access to the external environment. This architecture makes it possible to better protect the server side of applications, while allowing applications either to address other server applications directly or to route requests to them. However, frequent client calls to the server reduce network performance. Secure operation on the network also has to be addressed, because applications and data are distributed among different clients. The distributed nature of the system makes it difficult to configure and maintain.

An IS based on Internet/Intranet follows the principle of "open architecture". Its software is implemented as applets or servlets (programs in the Java language) or as CGI modules (programs in Perl or C). Systems with this architecture include Web nodes implemented using CORBA, Enterprise JavaBeans, and ActiveX/DCOM technologies; multi-tier applications based on Java and XML; and the .NET concept with XML, in which exchange between various servers (data warehouses, business applications, servers for mobile clients, and more) is carried out using architecture-neutral XML.

A distributed information base is an unlimited number of databases that are remote from each other and share a number of common characteristics:

They operate according to uniform rules defined centrally for all databases included in the distributed information base;

Data exchange is carried out according to rules that are also defined centrally.

A distributed database is needed by companies engaged in various types of activities if their daily work requires solving the following problems:

The need to quickly obtain information from the databases of remotely located units (or branches);

The need to consolidate, in a single database, information from the databases of the legal entities that make up the company, for subsequent data analysis and reporting from one database, both for the company as a whole and for each legal entity separately.

A distributed database (DDB) is a set of logically interconnected databases distributed over a computer network.

A DDB consists of a set of nodes connected by a communication network, in which:

each node is a full-fledged DBMS in its own right;

nodes interact with each other in such a way that a user of any of them can access any data on the network as if it were on the user's own node.

Each node is itself a database system. Any user can perform operations on data on his local node in the same way as if this node was not part of the distributed system at all. A distributed database system can be thought of as a partnership between separate local DBMSs on separate local nodes.
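
The property that a user of any node can reach any data as if it were local can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the `Node` class and the shared `catalog` dictionary are hypothetical, and the network is simulated with in-memory objects.

```python
class Node:
    """A node with its own local store (a stand-in for a full local DBMS)."""

    def __init__(self, name, catalog):
        self.name = name
        self.local = {}          # data physically stored on this node
        self.catalog = catalog   # mapping: key -> node that owns the key

    def put(self, key, value):
        self.local[key] = value
        self.catalog[key] = self

    def get(self, key):
        # The caller never says where the data lives; the node resolves it.
        owner = self.catalog[key]
        if owner is self:
            return self.local[key]
        return owner.local[key]  # in a real system this is a network request


catalog = {}
node_a = Node("a", catalog)
node_b = Node("b", catalog)
node_a.put("order:1", {"total": 100})
node_b.put("order:2", {"total": 250})

# Each node answers for data stored on the other node.
print(node_b.get("order:1"))
print(node_a.get("order:2"))
```

A single shared catalog is used only for brevity. A real system would replicate or partition the catalog so that no node becomes a central point of dependence.
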

Fundamental principle for creating distributed databases (“Rule 0”): To the user, a distributed system should look the same as a non-distributed system.

The fundamental principle entails a set of additional rules, or goals. There are twelve such goals:

Local independence. Nodes in a distributed system must be independent, or autonomous. Local independence means that all operations on a node are controlled by that node.

No reliance on a central node. Local independence implies that all nodes in a distributed system should be treated as equals. Therefore, there should be no calls to a "central" or "master" node to obtain some centralized service.

Continuous operation. Distributed systems should provide a higher degree of reliability and availability.

Location independence. Users should not need to know where exactly the data is physically stored and should be able to act as if all the data were stored on their own local node.

Fragmentation independence. A system supports data fragmentation if a given relation variable can be divided into parts, or fragments, when organizing its physical storage. Data can then be stored in the place where it is most often used, which localizes most operations and reduces network traffic. Fragmentation independence means that users can work as if the data were not fragmented at all.

Replication independence. The system supports data replication if a given stored relation variable or, more generally, a given fragment of it can be represented by several separate copies, or replicas, stored on several separate nodes.

Distributed query processing. A query may need to contact multiple nodes. In such a system there can be many possible ways of moving data between nodes to complete the query.

Distributed transaction management. There are two main aspects of transaction management: recovery management and concurrency management. With regard to recovery, to ensure the atomicity of a transaction in a distributed environment, the system must ensure that the entire set of agents related to a given transaction (an agent is a process that runs for a given transaction on a separate node) has either committed its results or performed a rollback. As for concurrency control, in most distributed systems it is based on a locking mechanism, just as in non-distributed systems.

Hardware independence. It is desirable to be able to run the same DBMS on different hardware platforms and, moreover, to ensure that different machines participate in the operation of a distributed system as equal partners.

Operating system independence. The ability to run the DBMS under various operating systems.

Network independence. The ability to support many fundamentally different nodes, differing in hardware and operating systems, as well as a number of different types of communication networks.

Independence from the type of DBMS. It is necessary that the DBMS instances on different nodes all support the same interface, and it is not at all necessary that these are copies of the same version of the DBMS.
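
The atomicity requirement in the distributed transaction management goal above (every agent either commits or rolls back) is commonly met with a two-phase commit protocol. The source does not name a specific protocol, so the following is only a sketch of that widely used technique. The `Agent` class and `two_phase_commit` function are hypothetical, and a failing node is simulated with a flag.

```python
class Agent:
    """Process acting for one transaction on one node (simulated)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: simulates a local failure
        self.state = "active"

    def prepare(self):
        # Phase 1: vote "yes" only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(agents):
    """Return True if all agents committed, False if all rolled back."""
    votes = [agent.prepare() for agent in agents]  # phase 1: voting
    if all(votes):
        for agent in agents:                       # phase 2: commit everywhere
            agent.commit()
        return True
    for agent in agents:                           # phase 2: roll back everywhere
        agent.rollback()
    return False


healthy = [Agent("n1"), Agent("n2")]
faulty = [Agent("n1"), Agent("n2", can_commit=False)]
print(two_phase_commit(healthy), [a.state for a in healthy])
print(two_phase_commit(faulty), [a.state for a in faulty])
```

The sketch leaves out coordinator failure and timeouts, which a real implementation has to handle.
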

Types of distributed databases

Distributed Databases

Multidatabases with a global schema. A multidatabase system is a distributed system that serves as an external interface for access to multiple local DBMSs, or is structured as a global layer above the local DBMSs.

Federated databases. Unlike multidatabases, they have no global schema that all applications access. Instead, a local schema for data import and export is maintained. Each node maintains a partial global schema describing the information from those remote sources whose data it needs in order to operate.

Multidatabases with a common access language: distributed management environments based on client-server technology.

Distributed information systems

A distributed information system is a set of databases that are remotely located from each other and have a number of general parameters. They function according to general rules, which are defined centrally simultaneously for all databases included in the information system. Information is exchanged according to rules that are also centrally determined.

A distributed information system is needed by enterprises engaged in various types of activities when information must be obtained quickly from the databases of remotely located units. Such a system may also be needed when the information contained in the databases of the legal entities that make up the enterprise has to be consolidated in a common database. This is done for further data analysis and for generating reports from one database, both for the enterprise as a whole and for each legal entity separately.

Such an information system is also implemented when changes to the structure and configuration of the database, and to the operating rules of all remote departments and legal entities, have to be made centrally. The ability to change certain rules directly from remote units may then be prohibited.

A distributed information system is also implemented when changes to data in remotely located departments of the organization must be controlled.

Organizing a distributed information system involves two stages. At the first stage, preparatory work is carried out: the structure of the information system, the rules for migrating information between the databases that make up the distributed information system, and the rules for restricting changes in those databases are defined.

The second stage is the preparation of the distributed information system itself. At this stage, the most suitable software is selected, with which the distributed information base will be organized to work according to the rules defined during the preparatory work. The selected software is also configured at this stage so that the distributed information system can be organized and managed effectively.

As an example, let's consider a corporate information system - the Regional Distributed Education Information System (RDIS).

Tasks of RDIS (Fig. 5.1):

1. Maintaining a centralized database to ensure system management.
2. Integration of heterogeneous databases of pedagogical and management information.
3. Providing a unified user interface and generating standard documents.
4. Creation of a centralized electronic library and support for the work of students and teachers with peripheral electronic libraries.
5. Support for distance learning and independent testing.
6. Sharing of computing resources and equipment.
7. Automatic exchange of electronic information between educational institutions, and automation of the processes of creating, processing, and storing information.
8. Protection of information posted in RDIS and of the copyright of developers of databases, electronic educational materials, and applications.
9. Support for group work in the preparation of electronic educational materials, training, and scientific research.
10. Integration with similar information systems of foreign and domestic computer networks.

Fig. 5.1.

The automation object (Fig. 5.2) has a geographically distributed structure. It consists of the regional education department, municipal education authorities, district education authorities, and educational institutions, all dispersed over the large territory of the region. They interact with the administrations of the region, cities, and districts, with students and their parents, and with the public.

Fig. 5.2.

The purpose of the information system is monitoring in the field of education (Fig. 5.3).

Fig. 5.3.

The regional distributed information system has a hierarchical organization (Fig. 5.4).

The hierarchical structure of the system follows from the several levels of education management: the regional level (the Department of Education and the structural divisions of the regional administration); the level of large municipalities (education authorities and divisions of the city administration of the regional center); the level of regional and urban districts; and the level of individual educational institutions of various types and kinds, as well as other departments, institutions, and organizations that provide social services and protect the rights of children and adolescents.

Automating information exchange ensures the consistency of the data used at the various levels of the information system and increases its reliability.

Fig. 5.4.

The interaction between education authorities and educational institutions, and the information flows between them, are determined by the regulations of the regional education department. The IS must have an architecture that matches the structure of the automation object. The system being developed must include subsystems belonging to several levels of the hierarchy:

· Level of educational institutions. The components of this level differ in the set of functions they implement, depending on the type of educational institution. Their main purpose in the system is to collect primary information about the activities of an educational institution and to generate reports (information about the activities of specific educational institutions of various types) for education management bodies and state statistics bodies, as well as to support the management functions of the educational institution and the organization of the educational process in it. Combining these functions in one application is dictated by the requirement to minimize manual processing, re-entry, and duplication of information, which are a source of errors in the operation of information systems.
· Level of municipal education authorities of the region's districts. The main purpose of these subsystems is to obtain primary information from educational institutions, integrate it, and transfer it to a higher level; to generate reports (information on the activities of the educational institutions of a district or city) for higher education authorities and state statistics bodies; and to support the management of the educational institutions of the relevant territories.
· Level of the Department of Education of the region. The main purpose of the components of this level is to analyze information received from lower-level subsystems, support education management functions, produce state statistical reporting, and maintain an integrated system of monitoring in the field of education.

Subsystems of each level ensure the maintenance of primary information and documentation support for the activities of educational institutions and education management bodies, the generation of primary and summary reports, information exchange with other subsystems, and information protection.

Fig. 5.5.

The architecture of the IS corresponds to the multi-level structure of the region's education system. The system includes subsystems of several levels (Fig. 5.5):

· Information systems of educational institutions of various types and kinds.
· Information systems of municipal (territorial, district) education authorities.
· Information system of educational authorities at the regional level.

A regional-scale system must support the possibility of distributed storage and distributed data processing.

Each subsystem works with its own local database, but all of them use a single data model. The data is fragmented. A data replication component is used to transfer data between the databases of the subsystems.
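
How a replication component might move changes between local databases that share one data model can be sketched as follows. The source does not describe the actual mechanism used in this system, so the `LocalDB` class, its change log, and the `replicate` function are hypothetical illustrations of log-based replication in general.

```python
class LocalDB:
    """Local database of one subsystem; records every change in a log."""

    def __init__(self, name):
        self.name = name
        self.rows = {}   # key -> value
        self.log = []    # ordered list of (key, value) changes

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


def replicate(source, target, already_sent=0):
    """Apply source changes made after position `already_sent` to target.

    Returns the new log position so the next call sends only new changes.
    """
    for key, value in source.log[already_sent:]:
        target.rows[key] = value
    return len(source.log)


school = LocalDB("school")
district = LocalDB("district")

school.write("students_total", 412)
position = replicate(school, district)

school.write("students_total", 415)
position = replicate(school, district, position)

print(district.rows)
```

Only the changes made since the previous call are sent, which keeps the volume of transferred data proportional to the number of updates rather than to the size of the database.
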

All changes made to the data model when it has to be extended or adapted to new information needs are propagated to the subsystems whose operation the updates affect.

Integration subsystems are implemented on the basis of BizTalk Server technology.

Software technology platform: Microsoft .NET.

Fig. 5.6.

The software of the IS (Fig. 5.6) is flexibly configured during installation: it is set up to perform the functions of a subsystem of the appropriate level and to work in educational institutions of various types and kinds under various operating conditions.

Users can search for, select, and view documents (through the document management components).

The system supports automation of standard operations, office work, and document flow (through the business process management components). Changes to the database are made only by executing the appropriate operations, during which the primary data is changed and documents are created.

Operations are performed and documents are handled in accordance with user rights, which are determined by the user's category and job responsibilities.

Architecture of distributed information systems and Web applications

A distributed system is a set of independent computers that appears to its users as a single unified system. Although all the computers are autonomous, users perceive them as one system.

The main characteristics of distributed systems:

1. The differences between computers and the methods of communication between them are hidden from users. The same applies to the internal organization of distributed systems.

2. Users and applications can interact with a distributed system in a consistent and uniform way, regardless of where and when the interaction takes place.

Distributed systems should also be relatively easy to expand, or scale. This characteristic is a direct consequence of having independent computers, but at the same time does not indicate how these computers are actually combined into a single system.

To maintain a unified view of the system, distributed systems often include an additional layer of software that sits between the upper level, where users and applications reside, and the lower level, consisting of operating systems (Figure 1.11).

Accordingly, such a distributed system is usually called a middleware system. Note that the middleware layer is distributed among many computers.

Features of the functioning of distributed systems include:

· the presence of a large number of objects;

· request execution delays (for example, if local calls require about a couple of hundred nanoseconds, then requests to an object in distributed systems require from 0.1 to 10 ms);

· some objects may not be used for a long time;

· distributed components are executed in parallel, which leads to the need to coordinate execution;

· requests in distributed systems have a high probability of failure;

· increased safety requirements.

Because of the increased delays, interfaces in a distributed system should be designed to reduce request execution time. This can be achieved by reducing the frequency of access and by enlarging the functions performed per request.
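
The effect of reducing access frequency can be shown with simple arithmetic. The sketch below assumes a fixed delay of 1 ms per request (a value inside the 0.1 to 10 ms range quoted above) and compares fetching 100 records one at a time with fetching them in one enlarged request. The `RemoteStore` class is hypothetical, and the delay is only counted, not actually waited for.

```python
REQUEST_DELAY_MS = 1.0  # assumed per-request delay


class RemoteStore:
    """Simulated remote object that adds up the delay of each request."""

    def __init__(self, records):
        self.records = records
        self.elapsed_ms = 0.0

    def get_one(self, key):
        self.elapsed_ms += REQUEST_DELAY_MS
        return self.records[key]

    def get_many(self, keys):
        # One enlarged request returns all records in a single round trip.
        self.elapsed_ms += REQUEST_DELAY_MS
        return [self.records[k] for k in keys]


keys = list(range(100))

chatty = RemoteStore({k: k * k for k in keys})
for k in keys:
    chatty.get_one(k)

batched = RemoteStore({k: k * k for k in keys})
batched.get_many(keys)

print(chatty.elapsed_ms, batched.elapsed_ms)
```

Under these assumptions the fine-grained interface accumulates 100 ms of delay and the enlarged one 1 ms. The model ignores the extra time needed to transfer a larger reply, so the real gain is smaller but usually still substantial.
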

To combat failures, clients are required to check whether requests are being executed by the server. Security in distributed applications can be improved by monitoring communication sessions (authentication, authorization, data encryption).
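
One simple way for a client to check that a request was executed is to wait for the reply and repeat the request when none arrives. The following is a minimal sketch of that approach, not a description of any particular middleware: `FlakyServer` and `call_with_retry` are hypothetical, and a lost reply is simulated by raising `ConnectionError`.

```python
class FlakyServer:
    """Simulated server that fails a fixed number of times, then answers."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def handle(self, request):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("no reply")
        return f"done: {request}"


def call_with_retry(server, request, attempts=3):
    """Send the request up to `attempts` times; re-raise the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return server.handle(request)
        except ConnectionError as error:
            last_error = error
    raise last_error


server = FlakyServer(failures=2)
print(call_with_retry(server, "save report"))
```

Repeating a request is safe only when executing it twice has the same effect as executing it once; otherwise the server has to detect duplicates.
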

The architecture of Web applications (Web services) is widely used today. A Web service is an application accessible via the Internet. It provides services whose form does not depend on the service provider, since it uses a universal operating platform and a universal data format (XML). Web services are based on standards that define the formats and language of requests, as well as protocols for discovering these services on the Internet. The scheme for accessing a database via the Internet is shown in Fig. 1.12.

Figure 1.12 – Scheme of access to the DBMS server via the Internet
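
The role of XML as a provider-independent data format can be illustrated with a short sketch that uses Python's standard `xml.etree.ElementTree` module. The element names (`request`, `query`, `param`) and both functions are invented for this example and are not taken from any Web service standard.

```python
import xml.etree.ElementTree as ET


def build_request(table, params):
    """Client side: encode a query as an XML document (bytes)."""
    root = ET.Element("request")
    query = ET.SubElement(root, "query", {"table": table})
    for name, value in params.items():
        param = ET.SubElement(query, "param", {"name": name})
        param.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def parse_request(xml_bytes):
    """Server side: decode the XML back into a table name and parameters."""
    root = ET.fromstring(xml_bytes)
    query = root.find("query")
    params = {p.get("name"): p.text for p in query.findall("param")}
    return query.get("table"), params


message = build_request("schools", {"district": "north", "year": 2005})
print(parse_request(message))
```

Because the message is plain XML, the receiving side needs no knowledge of the sender's platform or language. Note that all parameter values arrive as strings and have to be converted by the receiver.
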

There are currently three different technologies that support the concept of distributed object systems: EJB, DCOM, and CORBA.

The main idea behind EJB (Enterprise JavaBeans) technology is to create an infrastructure for components so that they can easily be added to and removed from servers, thereby increasing or decreasing the functionality of the server. EJB components are Java classes and can run on any EJB-compatible server, even without recompilation. The main goals of EJB technology are:

1. Make it easier for developers to create applications by relieving them of the need to implement services such as transactions, threads, and loading from scratch. Developers can concentrate on describing the logic of their applications, leaving the tasks of storing, transferring, and securing data to the EJB system.

2. Describe the main structures of the EJB system and the interfaces for interaction between its components.

3. Free the developer from implementing EJB objects due to the presence of a special code generator.

Thanks to the Java model it uses, EJB is a relatively simple and fast way to create distributed systems.

DCOM (Distributed Component Object Model) technology is a software architecture developed by Microsoft for distributing applications across multiple computers on a network. A software component on one computer can use DCOM to pass a message to a component on another computer. DCOM automatically establishes the connection, transmits the message, and returns the response from the remote component. DCOM's ability to interconnect components allowed Microsoft to give Windows a number of additional features, in particular to implement Microsoft Transaction Server, which is responsible for executing database transactions over the Internet.