Brief description of the HTTP protocol

Hello, blog reader! Let's continue getting acquainted with the HTTP protocol in the Servers and protocols section and its HTTP protocol subsection. This entry is the final one in the series of notes on the HTTP protocol; after it I will prepare navigation, and perhaps there will be some posts that touch on HTTP without being directly about it. This post will help you understand how the HTTP protocol works, and if you need details, follow the links that I have placed throughout the article for you.

What is the HTTP protocol?

Let's define what the HTTP protocol is, but before we define the term, let's understand the word protocol. The word protocol comes from the Greek words for first and glue. In ancient times, it was a sheet glued to a scroll on which the author wrote his name, the date of writing, and other unnecessary, or rather official, information. Why do I call it unnecessary? Because the average person is more interested in the content of the scroll itself than in who wrote it. It is the same with the HTTP protocol: the average user is not at all interested in how he receives site pages, he simply opens his browser. Another definition of the word protocol is an algorithm, or a sequence of actions. A protocol is a set of rules and regulations that govern a particular event. A data transfer protocol is a standard that describes the rules for interaction between functional blocks when transferring data.

So, we have decided that HTTP is a data transfer protocol, but what does the abbreviation HTTP mean? HTTP or HyperText Transfer Protocol is a hypertext transfer protocol. And now I will give the most interesting definition of the HTTP protocol that I have ever come across.

The HTTP protocol is the traffic rules of the Internet, with one difference: in real life people can break traffic rules and nothing may happen to them, whereas failure to comply with the rules of the HTTP protocol means the user will not be able to work on the Internet.

The HTTP protocol is a data transfer protocol of the seventh layer of the OSI model, operating on the basis of client-server technology.

The HTTP protocol is an abstraction over the third and fourth layers of the reference model, expanding the possibilities of communication between people.

The HTTP protocol is an initially simple hypertext transfer protocol, which can now be used to transmit anything.

The HTTP protocol is a transport for other protocols and data formats, such as JSON-based APIs.

The HTTP protocol is a technology that any web developer should understand.

Well, I think we have figured out what the HTTP protocol is, and we can now see where it is used.

What is the HTTP protocol used for?

I'll tell you straight: the HTTP protocol is the basis of the Internet, or rather, it is the basis of what the end consumer, the website visitor, sees. Therefore, the HTTP protocol is everywhere on the Internet. The phrase sounds strange, but I couldn't think of another one. When reading news on a site, you use the HTTP protocol. When listening to music on VKontakte, you use the HTTP protocol. When you watch a video on YouTube, you are using the HTTP protocol. When you play a browser game, you also use the HTTP protocol. That's why I write that the HTTP protocol is used everywhere on the Internet. Without it, you would not be able to read this text. To summarize: the HTTP protocol is used to transfer data on the Internet; it was originally used to transfer HTML documents, but now it allows you to transfer all kinds of content.

Characteristics of the HTTP protocol

Let's list the technical characteristics of the HTTP protocol:

  1. The HTTP protocol works using client-server technology.
  2. The HTTP protocol belongs to the seventh (application) layer of the OSI model.
  3. The HTTP protocol belongs to the TCP/IP protocol family.
  4. For data transmission via the HTTP protocol, TCP port 80 is used by default; port 8080 is a common alternative.
  5. The protocol specification is RFC 2616.
  6. To identify a resource, the HTTP protocol uses a URI.
  7. The HTTP protocol does not have intermediate states between the request and the response; of course, the client can receive a response with code 100, but this is already a response, and not an intermediate state.
  8. The HTTP protocol is synchronous, but it allows the client to send several requests in a row without waiting for a response from the server, provided that the server responds to the requests in the order in which they arrived.

This is only part of the protocol's technical characteristics, but in my opinion these are the most important ones for understanding its essence.

The HTTP protocol works on the client-server principle

Yes, the HTTP protocol works on the client-server principle. The simplest example that comes to my mind to explain the essence of client-server interaction is a buyer and a seller in a store. The buyer comes to the store and says to the seller: Hello! If the seller is rude, he replies: paint the fence! Then the buyer smiles, stands, looks at the display window and chooses what to buy. Meanwhile, the seller stands and silently waits for the client to choose. The client has made a choice and says to the seller: give me that brown crap that is on the top shelf in the far corner. The seller says: right now. After which he takes a stool, places it in the far corner, removes the brown crap from the shelf and brings it to the buyer. The buyer takes the brown crap, gives the money and leaves. And the seller, having received the money, puts it in the cash register.

The point of this story is to show the client-server interaction. The client (in this case the buyer) completely controls the development of events, while the server (in our example, the seller) never establishes contact himself: he patiently waits for the client's actions and reacts to them in some way. I gave the simplest example. But it can be complicated: for example, the buyer gives one hundred rubles and the brown crap costs 90, in which case the seller will give the client change. The seller could have responded to the client's Hello! in some other way. Or the brown stuff might not be for sale at all, or be for sale only to special clients. What I mean by this is that the HTTP protocol is a data transfer protocol based on client-server interaction, and it describes quite fully the action algorithms for both the client and the server in various situations.

History of HTTP: HTTP protocol standards

Let's now look at the history of the HTTP protocol and its standards.

  1. HTTP/0.9 was developed in 1991 at CERN by Tim Berners-Lee. Tim developed the HTTP protocol to facilitate access and navigation using hypertext. The HTTP/0.9 standard contains the basic syntax and semantics of the HTTP protocol.
  2. In 1996, RFC 1945 (the HTTP/1.0 standard) was released.
  3. In 1997, HTTP/1.1 was released: the standard was developed and described in RFC 2068. In 1999, the HTTP/1.1 standard was finalized (namely, in RFC 2616). At the moment most applications use HTTP protocol version 1.1. By the way, HTTP applications send information about themselves in message headers.
  4. In 2015, the final version of the HTTP/2 draft was approved, and later that year it was published as RFC 7540; it shows us where the development of the Internet is heading.

HTTP protocol clients

The most common example of an HTTP protocol client is a browser. Here are the most popular HTTP protocol clients:

  • Google Chrome;
  • Mozilla Firefox;
  • Opera;
  • Internet Explorer;
  • Yandex browser;
  • Safari.

Often, instead of the term client, you may hear user agent; be aware that the HTTP protocol makes no distinction between the terms client and user agent.

HTTP protocol servers

The status line is separated from the headers by a CRLF sequence at the end of that line (CRLF is the line break you get in Windows by pressing the Enter key), and the HTTP headers are separated from the message body by a line that contains nothing but CRLF.

Requests and responses share general headers that can be used both in HTTP requests and in HTTP server responses. I also want to note that there is a group of headers related to the entity (message body); they can all be used both in requests and in responses, with the exception of the Allow header field, which is used only in server responses. An HTTP message has a length, which is measured in bytes; if your HTTP message has a body, then the following rules apply, in order, to determine the length of the message:

  1. Any HTTP server response that must not include a message body always ends with the empty line after the headers.
  2. If the HTTP message headers contain the Transfer-Encoding field and it has the value chunked, then the length of the HTTP message is determined by the chunked encoding.
  3. If the HTTP message headers contain a Content-Length field, then its value is the length of the message body, measured in bytes.
  4. If an HTTP message uses the "multipart/byteranges" media type, which is self-delimiting, then that media type determines the length.
  5. Otherwise, the length of the HTTP message is determined by the server closing the connection.
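The order of these rules can be sketched in a few lines of Python. This is only an illustration: the function name body_length_rule and its return labels are my own, and the set of "no body" responses (1xx, 204, 304 and replies to HEAD) is taken from the HTTP/1.1 specification rather than from the list above.

```python
def body_length_rule(status, headers, is_head_response=False):
    """Return a label naming the rule that decides where the body ends."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}

    # Rule 1: responses that must not carry a body end at the empty line.
    if is_head_response or 100 <= status < 200 or status in (204, 304):
        return "no-body"
    # Rule 2: chunked transfer encoding describes its own length.
    if "chunked" in h.get("transfer-encoding", "").lower():
        return "chunked"
    # Rule 3: Content-Length gives the body size in bytes.
    if "content-length" in h:
        return "content-length:" + h["content-length"]
    # Rule 4: multipart/byteranges is self-delimiting.
    if h.get("content-type", "").lower().startswith("multipart/byteranges"):
        return "multipart/byteranges"
    # Rule 5: otherwise the server closing the connection marks the end.
    return "connection-close"
```

Note that the checks are deliberately written in the same order as the list, so an earlier rule always wins over a later one.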

For clarity, let's look at example messages in the HTTP protocol and the first thing we will look at is an example request in the HTTP protocol:

POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: ru-ru
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

licenseID=string&content=string&/paramsXML=string
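To make the structure of such a request more tangible, here is a rough Python sketch that splits a raw request into its request line, headers and body. The function parse_request and the sample bytes are my own illustration (a shortened version of the request above with a concrete body), not part of any library.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    # The empty line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The first line holds three space-separated parts.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


# Hypothetical sample request; Content-Length is computed from the body.
BODY = b"licenseID=string&content=string"
RAW = (
    b"POST /cgi-bin/process.cgi HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n"
    b"\r\n" + BODY
)

if __name__ == "__main__":
    print(parse_request(RAW))
```

A real server also has to handle repeated header names and malformed input; this sketch ignores both.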

Status code classes in the HTTP protocol and their descriptions:

  1. HTTP status codes 1xx: informational. The server sends this status code when a request has been received but not yet processed.
  2. HTTP status codes 2xx: successful. The server sends this code when it has successfully received and processed the client's HTTP message.
  3. HTTP status codes 3xx: redirect. If you received a status code starting with a three from the server, additional actions are needed to complete the processing of the HTTP request.
  4. HTTP status codes 4xx: client error. If you see a status code that starts with a four, an error occurred through the fault of the client.
  5. HTTP status codes 5xx: server error. A status code starting with a five indicates that an error occurred on the server side.
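Since the class is determined by the first digit alone, it can be computed with integer division. A tiny sketch (the function name status_class is hypothetical):

```python
def status_class(code: int) -> str:
    """Map a three-digit status code to the name of its class."""
    classes = {
        1: "informational",
        2: "successful",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    # The first digit of the code selects the class.
    return classes.get(code // 100, "unknown")
```

For example, status_class(404) gives "client error" and status_class(200) gives "successful".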

HTTP Message Header Fields

In the HTTP protocol there are header fields that let you configure the interaction between the client and the server, as well as how and in what form useful information will be received by the end user. The general syntax of header fields is quite simple: fieldname: value1, value2

The header fields are separated by CRLF. The HTTP protocol divides header fields into four groups:

  1. General header fields. Such headers can be used in any message transmitted over the HTTP protocol.
  2. Request header fields. These fields can only be sent in HTTP requests.
  3. Response header fields. As the name suggests, these fields are only used in HTTP responses.
  4. Entity (message body) header fields. These fields describe how and in what form the information transmitted via HTTP will be presented to the end user.
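As a rough illustration of both the syntax and the four groups, the following Python sketch splits a header line into its name and values and looks the name up in a small table. The table is only a sample of the fields RFC 2616 assigns to each group, and the names HEADER_GROUPS and describe_header are my own.

```python
# A small sample of header fields per group (not the full RFC 2616 lists).
HEADER_GROUPS = {
    "cache-control": "general",
    "connection": "general",
    "date": "general",
    "accept-encoding": "request",
    "host": "request",
    "user-agent": "request",
    "etag": "response",
    "location": "response",
    "server": "response",
    "allow": "entity",
    "content-length": "entity",
    "content-type": "entity",
}


def describe_header(line: str):
    """Return (name, list of values, group) for one header line."""
    name, _, raw = line.partition(":")
    # Splitting on commas suits list-valued fields; it is wrong for fields
    # whose single value contains a comma, such as dates.
    values = [v.strip() for v in raw.split(",") if v.strip()]
    group = HEADER_GROUPS.get(name.strip().lower(), "unknown")
    return name.strip(), values, group
```

Field names are compared in lowercase because HTTP header names are case-insensitive.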

HTTP protocol caching

To reduce network load and improve the efficiency of the HTTP protocol, a caching mechanism was implemented. It often happens that the user does not even realize that the page opened in his browser was loaded not from the site he accessed, but from a cache. We will not go into the internal caching mechanisms of a particular server or client, but will just look at what the HTTP protocol itself provides to manage caching. And, as you probably already guessed, in the HTTP protocol caching is controlled by header fields and directives, that is, the values of these very fields.

I note that the HTTP protocol is designed so that these directives must be followed by all participants in the chain between the client and the origin server. The directives can be roughly divided into client and server ones. Let's look at the HTTP protocol directives designed to control caching from the client side.

Cache-Control header field directives for the client and their descriptions:

  1. no-cache. The no-cache directive tells the server that, for a subsequent request, the response should not be sent from the cache without checking it against the contents of the origin server.
  2. no-store. The no-store directive tells the server that neither the client request nor the server response should be cached. This is for security reasons.
  3. max-age = seconds. The max-age directive tells the server that the client will accept a cached response only if its age is no greater than the time specified in seconds.
  4. max-stale [ = seconds ]. The max-stale directive tells the server that the client will accept a cached response that has already expired, optionally only if it expired no more than the specified number of seconds ago.
  5. min-fresh = seconds. The min-fresh directive tells the server that the client will accept a cached response only if it will remain fresh for at least the specified number of seconds.
  6. no-transform. The no-transform directive tells the server that no transformations should be applied to the requested resource.
  7. only-if-cached. The only-if-cached directive tells the server that the client will only accept a cached response; if a suitable response is not in the cache, the origin server should not be contacted (the cache answers with a 504 status instead).
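A Cache-Control value is just a comma-separated list of directives, some of which carry an argument. Here is a minimal parsing sketch in Python; the function name parse_cache_control is hypothetical, and the sketch does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        if sep:
            arg = arg.strip().strip('"')
            # Numeric arguments (seconds) become integers.
            directives[name] = int(arg) if arg.isdigit() else arg
        else:
            # Directives without an argument are stored as flags.
            directives[name] = True
    return directives
```

For instance, parse_cache_control("only-if-cached, min-fresh=60") returns {"only-if-cached": True, "min-fresh": 60}.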

Now let's take a look at the directives that allow the server to control caching.

Cache-Control header field directives for the server and their descriptions:

  1. public. The public directive states that the server response can be stored in any cache.
  2. private. The private directive indicates that the server response is intended for a single user and may be stored only in a private cache, such as that user's browser cache.
  3. no-cache. The no-cache directive states that a cached response must not be sent to the client without first validating it with the origin server.
  4. no-store. The no-store directive indicates that the server response cannot be stored in any cache.
  5. no-transform. The no-transform directive indicates that no transformations should be applied to the server's response by any node in the chain.
  6. must-revalidate. The must-revalidate directive says that once the cached response becomes stale, it must be revalidated with the origin server before being reused.
  7. proxy-revalidate. The proxy-revalidate directive says the same as the previous directive, but only for intermediate servers.
  8. max-age = seconds. The max-age directive indicates how long a cache may consider the response fresh.
  9. s-maxage = seconds. The s-maxage directive says the same as the max-age directive, but for shared caches such as proxies and CDN servers.
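To show how a cache might combine these response directives, here is a simplified Python sketch. It is my own illustration, not the full algorithm from the specification: real caches also consult Expires and heuristics, which the hypothetical function freshness_lifetime ignores.

```python
def freshness_lifetime(cache_control: str, shared_cache: bool):
    """Seconds a response may be reused without revalidation.

    Returns None when this cache must not store the response at all.
    """
    directives = {}
    for part in cache_control.lower().split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.strip()] = arg.strip()

    if "no-store" in directives:
        return None                      # nobody may store it
    if shared_cache and "private" in directives:
        return None                      # only the user's own cache may store it
    if "no-cache" in directives:
        return 0                         # storable, but revalidate on every use
    if shared_cache and directives.get("s-maxage", "").isdigit():
        return int(directives["s-maxage"])   # shared caches prefer s-maxage
    if directives.get("max-age", "").isdigit():
        return int(directives["max-age"])
    return 0  # simplification: no explicit lifetime means "revalidate"
```

With the header "public, max-age=600, s-maxage=3600", a proxy (shared_cache=True) gets 3600 seconds, while a browser cache (shared_cache=False) gets 600.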

Both client and server HTTP applications must be able to validate cached data, so that unnecessary traffic is not sent through the network while the end user still receives up-to-date responses to his requests. To this end, the HTTP protocol introduced a special Last-Modified field and conditional requests with conditional header fields. The Last-Modified field indicates the date and time the resource was last changed; the value stored with the cached copy can be compared with the date and time of the last update of the original resource, and if the values match, the data comes to the client from the cache.

If the client makes a repeated request for the same resource, the browser can include a conditional header field in the request. The server, having received such a field, compares the current state of the resource with what it sent previously, and if they are equivalent, it returns a 304 (Not Modified) response to the browser, after which the browser shows the user the page contents from its cache.

The HTTP protocol also allows you to assign a tag to each HTTP object in the ETag header field. In practice this is often a hash of the object itself, and it differs for each distinct version of the object, so the HTTP caching mechanism actively uses this field to check whether the data stored in the cache is still current.
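The server-side decision between a full response and 304 can be sketched as follows. This is a simplified illustration: the function name conditional_status is my own, the conditional fields used are If-None-Match and If-Modified-Since from the HTTP/1.1 specification, and weak ETags (the W/ prefix) are not treated specially.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200."""
    h = {k.lower(): v for k, v in request_headers.items()}

    # If-None-Match (ETag comparison) takes precedence over dates.
    if "if-none-match" in h:
        tags = [t.strip() for t in h["if-none-match"].split(",")]
        return 304 if etag in tags or "*" in tags else 200

    # If-Modified-Since: unchanged since that moment means 304.
    if "if-modified-since" in h:
        since = parsedate_to_datetime(h["if-modified-since"])
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200

    # No conditional field: send the full response.
    return 200
```

A browser that cached a page with ETag "abc" would send If-None-Match: "abc" and, as long as the resource keeps that tag, receive 304 with no body.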

HTTP protocol security

The HTTP protocol is designed for data transfer and nothing more. There are no encryption or protection mechanisms in the HTTP protocol itself, since the various encoding mechanisms of the HTTP protocol can hardly be called data protection; for example, it transmits the user's login and password in unencrypted form.

But the HTTP protocol has an extension, HTTPS. Please note that HTTPS is not a separate protocol but an extension of the HTTP protocol that uses TCP port 443. This extension is a combination of two protocols: HTTP and SSL, or HTTP and TLS (TLS is the successor to SSL, so they serve essentially the same purpose).
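In practice the scheme of the URL decides which default port a client connects to: 80 for http and 443 for https, unless the URL names a port explicitly. A short Python sketch of that rule (the names DEFAULT_PORTS and effective_port are hypothetical; urlsplit is from the standard library):

```python
from urllib.parse import urlsplit

# Conventional default ports for the two schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly.
    return parts.port or DEFAULT_PORTS[parts.scheme]
```

So effective_port("https://www.example.com/") is 443, while effective_port("http://www.example.com:8080/") is 8080.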


HTTP (HyperText Transfer Protocol) was developed as the basis of the World Wide Web.

The HTTP protocol works as follows: the client program establishes a TCP connection with the server (the standard port number is 80) and issues an HTTP request to it. The server processes this request and issues an HTTP response to the client.
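This exchange can be reproduced on one machine. The Python sketch below starts a throwaway server from the standard library on a free local port (not port 80, to avoid needing privileges), opens a TCP connection to it, sends a request by hand and reads the response until the server closes the connection. The names HelloHandler and fetch_once are my own.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny demo server: answers every GET with the text 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_once() -> bytes:
    """Run one request/response exchange and return the raw response."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Step 1: the client establishes a TCP connection.
        with socket.create_connection(server.server_address) as sock:
            # Step 2: the client issues the HTTP request.
            sock.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
            # Step 3: read the response until the server closes the connection.
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once().decode("iso-8859-1"))
```

The raw bytes returned begin with the status line and end with the body, which makes the message layout easy to inspect.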

HTTP request structure

An HTTP request consists of a request header and a request body, separated by an empty line. The request body may be missing.

The request header consists of the main (first) line of the request and subsequent lines that clarify the request in the main line. Subsequent lines may also be missing.

The main line of the request consists of three parts, separated by spaces:

Method (in other words, the HTTP command):

GET - document request. The most commonly used method; in HTTP/0.9, they say, it was the only one.

HEAD - document header request. It differs from GET in that only the response header with information about the document is returned. The document itself is not sent.

POST - this method is used to transfer data to CGI scripts. The data itself is carried in the request body in the form of parameters.

PUT - place the document on the server. As far as I know, it is rarely used. A request with this method has a body in which the document itself is transmitted.

Resource - the path to a specific file on the server that the client wants to receive (or place, for the PUT method). If the resource is simply some file to be read, the server must return it in the response body for this request. If this is the path to a CGI script, then the server runs the script and returns the result of its execution. By the way, thanks to this unification of resources, the client is practically indifferent to what the resource actually is on the server.

Protocol version - the version of the HTTP protocol with which the client program works.

So a simple HTTP request might look like this:

GET / HTTP/1.0

This requests the root document from the web server's root directory.

Lines after the main line of the request have the following format:

Parameter: value.

This is how the request parameters are set. They are optional; all lines after the main request line may be missing, in which case the server uses default values or values based on the previous request (when working in Keep-Alive mode).
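Putting the main line and the parameter lines together, a request can be assembled mechanically. A small Python sketch (build_request is a hypothetical helper, not a library function):

```python
def build_request(method, resource, version="HTTP/1.1", headers=None, body=b""):
    """Assemble a raw HTTP request from its parts."""
    # Main line: method, resource and protocol version separated by spaces.
    lines = [f"{method} {resource} {version}"]
    # Optional "Parameter: value" lines.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; an empty line closes the header.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Calling build_request("GET", "/", "HTTP/1.0") yields the shortest possible request: the main line followed by an empty line.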

I will list some of the most commonly used HTTP request parameters:

Connection - can take the values Keep-Alive and close. Keep-Alive means that after this document is sent, the connection to the server is not closed, and more requests can be issued. Most browsers work in Keep-Alive mode, since it allows them to download an HTML page and its images over one connection to the server. Once set, Keep-Alive mode is maintained until the first error or until the next request explicitly specifies Connection: close.
close - the connection is closed after responding to this request.

User-Agent - the value is the browser "code", for example:

Mozilla/4.0 (compatible; MSIE 5.0; Windows 95; DigExt)

Accept - a list of content types supported by the browser in order of preference, for example for my IE5:

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*

This is obviously necessary for the case when the server can output the same document in different formats.

The value of this parameter is used mainly by CGI scripts to generate a response tailored for a given browser.

Referer - the URL from which you came to this resource.

Host - the name of the host from which the resource is requested. Useful if the server hosts several virtual servers under the same IP address. In this case the virtual server is selected by this field.

Accept-Language - the supported language. Significant for a server that can serve the same document in different language versions.

HTTP response format

The response format is very similar to the request format: it also has a header and body separated by an empty line.

The header also consists of a main line and parameter lines, but the format of the main line is different from that of the request header.

The main line of the response consists of 3 fields separated by spaces:

Protocol version - similar to the corresponding request parameter.

Status code - a numeric code indicating the "success" of the request. Code 200 means "everything is normal" (OK).

Reason phrase - a verbal "decoding" of the previous code. For example, for 200 it is OK, for 500 it is Internal Server Error.
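Because the verbal description may itself contain spaces, the main line of the response should be split at most twice. A minimal Python sketch (parse_status_line is a hypothetical name):

```python
def parse_status_line(line: str):
    """Split the first response line into (version, code, description)."""
    # Split on the first two spaces only: the description may contain spaces.
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    description = parts[2] if len(parts) > 2 else ""
    return version, code, description
```

For "HTTP/1.1 500 Internal Server Error" this returns the version, the integer 500 and the full three-word description.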

The most common HTTP response parameters:

Connection - similar to the corresponding request parameter.
If the server does not support Keep-Alive (there are some), then the Connection value in the response is always close.

Therefore, in my opinion, the correct browser tactic is the following:
1. send Connection: Keep-Alive in the request;
2. judge the connection status by the Connection field in the response.

Content-Type ("content type") - contains a designation of the content type of the response.

Depending on the Content-Type value, the browser interprets the response as an HTML page, a GIF or JPEG picture, a file to be saved to disk, or something else, and takes appropriate action. The Content-Type value means to the browser what the file extension means to Windows.

Some content types:

text/html - text in HTML format (web page);
text/plain - plain text (similar to Notepad);
image/jpeg - picture in JPEG format;
image/gif - the same, in GIF format;
application/octet-stream - a stream of "octets" (i.e. just bytes) to write to disk.

There are actually many more types of content.
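On the server side the mapping usually goes the other way: from a file extension to a Content-Type value. Python's standard mimetypes module can illustrate this; the wrapper content_type_for and its fallback to application/octet-stream are my own choices for the sketch.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a Content-Type value from a file name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions fall back to a plain stream of bytes.
    return guessed or "application/octet-stream"
```

The exact table depends on the system the code runs on, but common extensions such as .html and .gif are mapped consistently.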

Content-Length ("content length") - the length of the response content in bytes.

Last-Modified ("last modified") - the date the document was last changed.

HTTP is the foundation of data exchange for the World Wide Web. Hypertext is structured text that uses logical connections (hyperlinks) between nodes containing specific data. HTTP is thus a way of exchanging or transmitting hypertext.

The HTTP protocol operates as a request-response protocol in the client-server computing model. So, the web browser acts as the client, and the application running on the computer hosting the website is the server. The client sends an HTTP request message to the server, which provides resources (such as HTML files and other content) and returns a response message. The response contains status information about the request, and may also contain the requested content in the body of the message.

The browser is a basic example of a user agent (client). Other types of user agents include software used for indexing by search providers, mobile applications and other resources that use or display web content.

The HTTP protocol is designed to allow intermediate network elements to improve or enable communication between clients and servers. High-traffic sites often benefit from web cache servers that serve content on behalf of upstream servers, reducing response times. Web browser caches reduce network traffic for the user. HTTP proxy servers at the boundary of a local network can provide connectivity for clients that do not have a globally routable address by relaying messages to and from external servers.

An HTTP session is a sequence of requests and responses. The client initiates a request by making a TCP connection to a specific port on the server, and the server listens on that port and waits for the request message. When it is received, the server sends a response message. The body of this message typically contains the requested resource, although an error message or other information may be returned instead.

When considering the purpose of the HTTP protocol, it should be noted that it defines methods that specify the action to be performed on the identified resource. What that resource represents (pre-existing data or dynamically generated data) depends on the server implementation. Often such a resource corresponds to a file or a script located on the host.

Some methods of the HTTP protocol are intended only for retrieving information and should not change the state of the server. In other words, they have no side effects beyond relatively harmless ones such as caching or incrementing visit statistics.

On the other hand, the HTTP protocol also has methods intended for actions that can affect either the server or external resources, for example triggering financial transactions or sending email. Occasionally, such methods are invoked by web robots or by some sites that make requests without regard to context or consequences.
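The text does not name the methods, but the HTTP/1.1 specification classifies GET, HEAD, OPTIONS and TRACE as safe (retrieval only). A crawler could use that list to decide what it may send without risk; the following Python sketch is only an illustration and the names SAFE_METHODS and is_safe are my own.

```python
# Methods defined as safe (read-only) by the HTTP/1.1 specification.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def is_safe(method: str) -> bool:
    """True if the method is meant only to retrieve information."""
    return method.upper() in SAFE_METHODS
```

Everything outside this set, such as POST, PUT or DELETE, may change state on the server and should not be replayed blindly.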

HTTP is a protocol that allows you to fetch various resources, such as HTML documents. The HTTP protocol underlies the exchange of data on the Web. HTTP is a client-server protocol, which means requests to the server are initiated by the recipient itself, usually a web browser. The resulting document is reconstructed from the various sub-documents fetched, for example, text, a description of the document layout, images, video files, scripts and much more.

Clients and servers communicate by exchanging individual messages (rather than a stream of data). Messages sent by the client, usually a web browser, are called requests, and messages sent by the server are called responses.

Although HTTP was developed in the early 1990s, it has been continually improved thanks to its extensibility. HTTP is an application-layer protocol that most often uses TCP (or TLS, that is, TCP secured with encryption) to carry its messages, although any other reliable transport protocol could theoretically be used. Thanks to its extensibility, it is used not only to fetch hypertext documents, images and videos, but also to send content to servers, for example, using HTML forms. HTTP can also be used to fetch only parts of a document in order to update a web page on demand.

Components of HTTP-based systems

HTTP is a client-server protocol, that is, requests are sent by one party: the user agent (or a proxy acting on its behalf). Most often, a web browser acts as the user agent, but it can be anything, for example, a robot crawling the Web to populate and update the page index of a search engine.

Each individual request is sent to the server, which processes it and returns a response. Between the client and the server there are numerous intermediaries called proxies, which perform various operations and work as gateways or caches, for example.

In reality, between the browser and the server there are many more intermediary devices that play some role in processing the request: routers, modems, and so on. Because the network is built as a system of layers, these intermediaries are "hidden" at the network and transport layers. In this layered system, HTTP occupies the topmost layer, which is called the application layer. Knowledge of the lower layers, such as presentation, session, transport, network, link and physical, is important for understanding how networks operate and for diagnosing possible problems, but it is not required to describe and understand HTTP.

Client: user agent

A user agent is any tool or device that acts on behalf of a user. This role is primarily played by the web browser; in some cases, user agents are programs used by engineers and web developers to debug their applications.

The browser is always the entity that initiates the request. The server never does this (although over the years mechanisms have been created that can simulate server-initiated messages).

To display a web page, the browser sends an initial request to fetch the HTML document of that page. After this, the browser parses the document and requests the additional files needed to display the content of the web page (executable scripts, page layout information in CSS style sheets, and additional resources such as images and video files). Next, the browser combines all these resources to display them to the user as a single document, the web page. Scripts executed by the browser can fetch additional resources over the network at later stages, and the browser updates the page accordingly.

A web page is a hypertext document. This means that some parts of the displayed text are links that can be activated (usually by clicking a mouse button) to fetch and display a new web page. This allows the user to direct their user agent when navigating the Web. The browser translates these directions into HTTP requests and then presents the HTTP responses to the user in a readable form.

Web server

On the other side of the communication channel is the server, which serves the user by providing documents upon request. From the end user's point of view, the server is always a single virtual machine that generates the document in whole or in part, although in fact it can be a group of servers sharing the load, that is, with requests from different users distributed among them, or a complex piece of software that queries other computers (such as caching servers, database servers, e-commerce application servers and others).

A server is not necessarily a single machine, and vice versa: several servers can be hosted on the same machine. With HTTP/1.1 and the Host header, they can even share the same IP address.

Proxy

Between the web browser and the server there are a large number of network nodes relaying HTTP messages. Because of the layered structure of the network stack, most of them operate at the transport, network or physical layers, so they are transparent at the HTTP layer while potentially reducing performance. Those that operate at the application layer are called proxies. They can be transparent (forwarding requests without modifying them) or non-transparent (modifying requests before passing them on), and can perform many functions:

  • caching (the cache can be public or private, like the browser cache)
  • filtering (such as antivirus scanning or parental control)
  • load balancing (allowing multiple servers to serve different requests)
  • authentication (controlling access to different resources)
  • logging (allowing transaction history to be stored)

Basic Aspects of HTTP

HTTP is simple

Even with the greater complexity introduced in HTTP/2 by encapsulating HTTP messages in frames, HTTP is generally simple and human-readable. HTTP messages can be read and understood by humans, providing easier testing for developers and reduced complexity for new users.

HTTP is extensible

The HTTP headers introduced in HTTP/1.0 made the protocol easy to extend and experiment with. New functionality can even be introduced by a simple agreement between client and server on the semantics of the new header.

HTTP is stateless but has a session

HTTP is stateless: there is no link between two requests carried out one after another over the same connection. This immediately creates a problem for users who need to interact with particular pages coherently, for example when using a shopping cart in an online store. But while the core of HTTP is stateless, cookies make stateful sessions possible. Using header extensibility, cookies are added to the workflow, allowing each HTTP request to share the same context, or state.
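
Here is a minimal sketch of the shopping cart case, using Python's standard http.cookies module. The cookie name session_id, the CARTS dictionary and the helper functions are assumptions made for the example; a real site would generate unpredictable session ids and keep the state in a proper store.

```python
from http.cookies import SimpleCookie

# Server-side state keyed by session id (a plain dict stands in for a real store).
CARTS = {}


def read_session_id(cookie_header):
    """Extract the session id from a request's Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("session_id")
    return morsel.value if morsel else None


def add_to_cart(cookie_header, item, new_session_id):
    """Add an item to the caller's cart; returns (cart, Set-Cookie value or None)."""
    session_id = read_session_id(cookie_header)
    set_cookie = None
    if session_id is None:
        # First visit: issue a session id that the browser will send back later.
        session_id = new_session_id
        cookie = SimpleCookie()
        cookie["session_id"] = session_id
        set_cookie = cookie["session_id"].OutputString()
    cart = CARTS.setdefault(session_id, [])
    cart.append(item)
    return cart, set_cookie


# First request carries no cookie, so the server issues one.
cart, set_cookie = add_to_cart(None, "book", new_session_id="abc123")
# The second request sends the cookie back, so the server finds the same cart.
cart, _ = add_to_cart("session_id=abc123", "pen", new_session_id="unused")
print(cart)
```

The two requests are independent at the HTTP level; only the cookie value ties them to the same cart.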

HTTP and connections

The connection is managed at the transport layer and therefore falls fundamentally outside the scope of HTTP. HTTP does not require the underlying transport protocol to be connection-based; it only requires it to be reliable, that is, not to lose messages (or at least to report an error when it does). Of the two most common Internet transport protocols, TCP is reliable and UDP is not. HTTP therefore relies on TCP, which is connection-based, even though a connection is not strictly required.

HTTP/1.0 opened a separate TCP connection for each request/response exchange. This has an important drawback: opening a connection takes several message round trips and is therefore slow. A connection becomes more efficient when several messages are sent over it, or when messages are sent regularly: warm connections are more efficient than cold ones.

To mitigate these shortcomings, HTTP/1.1 introduced pipelining (which proved difficult to implement) and persistent connections: the underlying TCP connection can be partially controlled through the Connection header. HTTP/2 went a step further by multiplexing messages over a single connection, which helps keep the connection warm and more efficient.
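
The role of the Connection header can be sketched as a simple decision: HTTP/1.1 connections stay open unless one side asks to close, while HTTP/1.0 connections close unless keep-alive is requested. The function name keeps_connection_open and the dictionary representation of headers are assumptions for this illustration.

```python
def keeps_connection_open(http_version, headers):
    """Decide whether the TCP connection may be reused after this message."""
    # Header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    tokens = {
        token.strip().lower()
        for token in normalised.get("connection", "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True  # persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 must opt in


print(keeps_connection_open("HTTP/1.1", {}))
print(keeps_connection_open("HTTP/1.1", {"Connection": "close"}))
print(keeps_connection_open("HTTP/1.0", {"Connection": "keep-alive"}))
```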

Experiments are underway to design a transport protocol better suited to HTTP. For example, Google is experimenting with QUIC, which is built on UDP, to provide a more reliable and efficient transport protocol.

What can be controlled via HTTP

The natural extensibility of HTTP has, over time, allowed more control over the Web and more functionality. Caching and authentication methods were handled early in HTTP's history. The ability to relax the origin constraint, by contrast, was only added in the 2010s.

Listed below are common features that can be controlled with HTTP.


  • Cache
    The server can instruct proxies and clients what to cache and for how long. The client can instruct intermediate cache proxies to ignore stored documents.
  • Relaxing the origin constraint
    To prevent snooping and other privacy invasions, the web browser enforces strict separation between websites. Only pages from the same origin can access all the information of a web page. Although this constraint is a burden for the server, HTTP headers can relax the strict separation on the server side, allowing a document to be assembled from information sourced from different domains (there may even be security-related reasons to do so).
  • Authentication
    Some pages are available only to specific users. Basic authentication can be provided via HTTP, either through the WWW-Authenticate and similar headers, or by setting up a specific session using cookies.
  • Proxy and tunneling
    Servers and/or clients are often located on an intranet and hide their true IP addresses from others. HTTP requests then go through a proxy to cross this network barrier. Not all proxies are HTTP proxies. The SOCKS protocol, for example, operates at a lower level. Other protocols, such as FTP, can be handled by these proxies.
  • Sessions
    Using an HTTP cookie allows you to associate a request with a state on the server. This creates a session, even though HTTP is a stateless protocol at its core. This is useful not only for shopping carts in online stores, but also for any site that lets the user customize the output.
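
To illustrate the authentication item, here is a small sketch of how the credentials for HTTP Basic authentication are encoded: the user name and password are joined with a colon and Base64-encoded. The function names are invented for the example, and the sample credentials are the well-known ones used in the Basic authentication specification. Note that Base64 is an encoding, not encryption, so this scheme is only reasonable over an encrypted connection.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(value):
    """Recover (user, password) from an Authorization header value, or None."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        return None
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the fields; the password may contain colons.
    user, _, password = decoded.partition(":")
    return user, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(parse_basic_auth(header))
```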

HTTP flow

When a client wants to communicate with a server, whether it is a final server or an intermediate proxy, it follows these steps:

  1. Opens a TCP connection: the TCP connection is used to send a request, or several, and to receive the response. The client can open a new connection, reuse an existing one, or open several TCP connections to the server.
  2. Sends an HTTP message: HTTP messages (before HTTP/2) are human-readable. Since HTTP/2, these simple messages are encapsulated in frames, which makes them impossible to read directly, but the principle remains the same. For example:

     GET / HTTP/1.1
     Host: site
     Accept-Language: fr

  3. Reads the response sent by the server, for example:

     HTTP/1.1 200 OK
     Date: Sat, 09 Oct 2010 14:28:02 GMT
     Server: Apache
     Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
     ETag: "51142bc1-7449-479b075b2891b"
     Accept-Ranges: bytes
     Content-Length: 29769
     Content-Type: text/html

  4. Closes or reuses the connection for further requests.
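
Because the response is plain text, reading it is mostly a matter of splitting lines. The sketch below parses the status line and headers of the example response from the steps above; the function name parse_response_head is an assumption, and the parser is intentionally naive (it ignores repeated headers and other details).

```python
def parse_response_head(raw):
    """Split a raw HTTP/1.x response into (version, status, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Only the first colon separates name and value (dates contain colons too).
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers


RAW_RESPONSE = "\r\n".join([
    "HTTP/1.1 200 OK",
    "Date: Sat, 09 Oct 2010 14:28:02 GMT",
    "Server: Apache",
    "Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT",
    'ETag: "51142bc1-7449-479b075b2891b"',
    "Accept-Ranges: bytes",
    "Content-Length: 29769",
    "Content-Type: text/html",
    "",
    "",
])

version, status, reason, headers = parse_response_head(RAW_RESPONSE)
print(status, reason, headers["content-type"])
```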

If HTTP pipelining is enabled, several requests can be sent without waiting for the first response to be received in full. HTTP pipelining has proved difficult to deploy in existing networks, where old pieces of software coexist with modern versions. In HTTP/2 it was superseded by more robust multiplexing of requests within frames.

HTTP messages

HTTP/1.1 and earlier HTTP messages are human-readable. In HTTP/2, these messages are embedded in a new binary structure, the frame, which allows optimizations such as header compression and multiplexing. Even if only part of the original HTTP message is sent in this version of HTTP, the semantics of each message are unchanged and the client reconstitutes (virtually) the original HTTP request. It is therefore useful to think about HTTP/2 messages in the HTTP/1.1 format.
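
To give a feeling for what "binary frame" means, here is a sketch of the fixed 9-byte header that precedes every HTTP/2 frame as laid out in the HTTP/2 specification: a 24-bit payload length, an 8-bit type, 8 bits of flags, and one reserved bit followed by a 31-bit stream identifier. The function names are invented for the example, and the sketch covers only this header, not header compression or the payload.

```python
import struct

FRAME_TYPE_HEADERS = 0x1   # frame type carrying (compressed) header fields
FLAG_END_HEADERS = 0x4     # flag: this frame completes the header block


def pack_frame_header(length, frame_type, flags, stream_id):
    """Encode the 9-byte HTTP/2 frame header."""
    length_bytes = struct.pack(">I", length)[1:]  # keep the low 24 bits
    # The top bit of the last field is reserved and sent as zero.
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length_bytes + rest


def unpack_frame_header(data):
    """Decode a 9-byte frame header into (length, type, flags, stream_id)."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream = struct.unpack(">BBI", data[3:9])
    return length, frame_type, flags, stream & 0x7FFFFFFF


header = pack_frame_header(16, FRAME_TYPE_HEADERS, FLAG_END_HEADERS, 1)
print(header.hex())
print(unpack_frame_header(header))
```

The stream identifier is what makes multiplexing possible: frames belonging to different requests can be interleaved on one connection and sorted out again by this number.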

We present to you a description of the main aspects of the HTTP protocol, a network protocol that, from the early 90s to this day, allows your browser to load web pages. This article was written for those who are just starting to work with computer networks and develop network applications, and who still find it difficult to read the official specifications on their own.

HTTP is a widely used data transfer protocol, originally intended for the transfer of hypertext documents (that is, documents that may contain links allowing navigation to other documents). The abbreviation HTTP stands for "hypertext transfer protocol". In terms of the OSI model, HTTP is an application (upper, 7th) layer protocol. Version 1.1 of the protocol, HTTP 1.1, is described in the RFC 2616 specification (since superseded by RFC 9110 and RFC 9112).

The HTTP protocol involves the use of a client-server data transfer structure. The client application generates a request and sends it to the server, after which the server software processes the request, generates a response and sends it back to the client. The client application can then continue to send other requests, which will be processed in the same way.

A task that is traditionally solved using the HTTP protocol is the exchange of data between a user application that accesses web resources (usually a web browser) and a web server. At the moment, it is thanks to the HTTP protocol that the World Wide Web operates.

HTTP is also often used as a transport protocol for other application layer protocols such as SOAP, XML-RPC and WebDAV. In this case, the HTTP protocol is said to be used as a “transport”.

The API of many software products also implies the use of HTTP for data transfer - the data itself can be in any format, for example, XML or JSON.

Typically, HTTP data is transferred over TCP/IP connections. Server software usually listens on TCP port 80 (and client software uses port 80 by default when no port is specified explicitly), although any other port can be used.
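
The default-port rule is easy to show with Python's standard urllib.parse module. The helper host_and_port and the example.org addresses are assumptions made for this sketch, which only deals with plain http URLs.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80


def host_and_port(url):
    """Return the (host, port) a client would connect to for an http URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    return parts.hostname, port


print(host_and_port("http://example.org/"))
print(host_and_port("http://example.org:8080/index.html"))
```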

How to send an HTTP request?

The easiest way to understand the HTTP protocol is to try to access some web resource manually. Imagine that you are a browser and you have a user who really wants to read articles by Anatoly Alizar.

Let's say he entered the following in the address bar:

http://alizar.site/

Accordingly, you, as a web browser, now need to connect to the web server at alizar.site.

To do this, you can use any suitable command line utility. For example, telnet:

telnet alizar.site 80

Let me clarify right away that if you suddenly change your mind, press Ctrl + “]” and then Enter: this will let you close the connection. Besides telnet, you can try nc (or ncat), depending on your taste.

After you connect to the server, you need to send an HTTP request. This, by the way, is very easy - HTTP requests can consist of just two lines.

To form an HTTP request, you need to compose a starting line and set at least one header: the Host header, which is mandatory and must be present in every request. The reason is that the domain name is resolved to an IP address on the client side, so when you open a TCP connection the remote server has no information about which name was used to reach it: it could be, for example, the address alizar..ru or m.. In reality, however, the network connection is in all cases opened to the node 212.24.43.44, and even if a domain name rather than this IP address was given when the connection was opened, the server is not told about it in any way. That is why this name must be passed in the Host header.
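
If you would rather not type the request by hand, the same thing can be sketched with a raw socket in Python. The functions build_request and fetch are illustrative names, not part of any library. Besides the starting line and the Host header, the sketch adds a Connection: close header, an extra that asks the server to close the connection so that the read loop knows when the response has ended.

```python
import socket


def build_request(host, path="/"):
    """Compose a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",   # starting line
        f"Host: {host}",          # the one mandatory header
        "Connection: close",      # extra: ask the server to close when done
    ]
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host, port=80, path="/"):
    """Open a TCP connection, send the request and read the whole response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_request("alizar.site").decode("ascii"))
```

Calling fetch("alizar.site") would perform the same exchange as the telnet session, provided the host is reachable from your machine.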

The starting (initial) request line for HTTP 1.1 is composed according to the following scheme:

Method URI HTTP/Version

For example (such a starting line may indicate that the main page of the site is being requested):

GET / HTTP/1.1

And, of course, don’t forget that any technology becomes much simpler and clearer when you actually start using it.

Good luck and fruitful learning!
