THE HTTP PROTOCOL
“HTTP” stands for “Hypertext Transfer Protocol”. It was developed by Tim Berners-Lee at CERN (Switzerland) together with the other concepts that formed the foundation of the World Wide Web: HTML and URI. While HTML (Hypertext Markup Language) defines the structure of a web page, HTTP governs how that page is transferred from the server to the client. The third concept, the URI (Uniform Resource Identifier), whose most familiar form is the URL (Uniform Resource Locator), defines how a resource (such as a web page) is addressed on the web. “Hypertext”, the term that appears in both HTTP and HTML, refers to a concept we are all familiar with: linking documents. A web page contains hyperlinks, or links, that lead to other pages. If you enter an Internet address in the browser and a web page is displayed shortly afterwards, the browser has communicated with the web server via HTTP. Metaphorically speaking, HTTP is the language the browser uses to talk to the web server and tell it what it is requesting.
HOW HTTP WORKS
The simplest way to explain how HTTP works is to open a website:
- We open “http://www.rifugioalbasini.it/contatti.html/”, for example by searching for rifugioalbasini.it in Google. The browser marks the site as not secure because it uses plain HTTP rather than HTTPS.
- The browser sends a request, the HTTP request, to the web server responsible for the rifugioalbasini.it domain.
- Usually the request means “Please send me the file”. Alternatively, the client can simply ask “Do you have this file?”.
- The web server receives the HTTP request, looks for the requested file (in the example, the contatti.html page of rifugioalbasini.it) and first sends the status line and headers, whose status code tells the client the outcome of the request.
- If the file is found and the client actually wants it (and did not just want to know whether it exists), the server sends the message body after the headers, that is, the actual content. In our example, this is the contatti.html file.
- The browser receives the file and displays it as a web page.
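The exchange described in these steps can be sketched as a toy model in Python. It is not a real web server: the `FILES` dictionary and the `handle()` function are hypothetical, and the server is reduced to an in-memory lookup. It shows the two kinds of request from the steps (“Please send me the file”, which corresponds to the GET method, and “Do you have this file?”, which corresponds to HEAD) and the fact that the status line and headers always come before the body.

```python
# Hypothetical document root: path -> file contents (not the real site).
FILES = {"/contatti.html": b"<h1>Contatti</h1>"}


def handle(method: str, path: str) -> bytes:
    """Answer a request the way the steps describe (any non-HEAD method is treated like GET)."""
    body = FILES.get(path)
    if body is None:
        status, body = "404 Not Found", b""
    else:
        status = "200 OK"
    # Status line and headers always come first; the status code reports the outcome.
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    if method == "HEAD":
        return head          # "Do you have this file?": headers only, no body
    return head + body       # "Please send me the file": headers, then the content


if __name__ == "__main__":
    print(handle("HEAD", "/contatti.html").decode())
    print(handle("GET", "/contatti.html").decode())
```

Note that the HEAD answer still reports the `Content-Length` the body would have, which is what lets a client check a resource without downloading it.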
VERSIONS OF HTTP
The original versions: HTTP/0.9 and HTTP/1.0
The history of HTTP began in 1989, when Tim Berners-Lee and his team began developing the World Wide Web at CERN in Geneva. The original version of HTTP was version 0.9 and was called a “one-line protocol”. All it could do was get an HTML file from a server.
GET /dummy.html
The server simply sent back the requested file, so this version of the protocol could only handle HTML files. In 1996 the Internet Engineering Task Force (IETF) described HTTP/1.0 in the memo RFC 1945, though only as an informational document rather than a binding standard. HTTP/1.0 introduced headers that describe both the client request and the server response in more detail. Among other things, the “Content-Type” header field made it possible to transmit files other than HTML. In short, this version of HTTP had the following characteristics:
- Connectionless: the client connects to the server and sends the request, the server replies, and the connection is then closed. For the next request, the client must establish a new connection. This is inefficient because a web page usually consists of multiple files, each of which must be retrieved with a separate request.
- Stateless: the two parties, client and server, immediately “forget” each other. The next time the client connects, the server does not know that this client has contacted it before.
- Media independent: Any type of file can be transmitted via HTTP as long as both parties know how to handle the respective file type.
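The jump from the one-line protocol to HTTP/1.0 is easiest to see by putting the raw bytes side by side. This is a minimal sketch: `FILES`, `http09_exchange()` and `http10_exchange()` are hypothetical names, and the JPEG content is a three-byte placeholder, not a real image.

```python
# Hypothetical files on the server; the JPEG bytes are only a placeholder.
FILES = {"/dummy.html": b"<p>dummy</p>", "/logo.jpg": b"\xff\xd8\xff"}


def http09_exchange(path):
    """HTTP/0.9: a one-line request, answered with the bare file contents."""
    request = f"GET {path}\r\n".encode("ascii")   # no version, no headers
    response = FILES.get(path, b"")               # no status line, no headers
    return request, response


def http10_exchange(path, content_type):
    """HTTP/1.0: the request line carries a version, and both sides send headers."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        "User-Agent: demo-client\r\n"
        "\r\n"
    ).encode("ascii")
    body = FILES[path]
    response = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"        # lets the client handle non-HTML files
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii") + body
    # In HTTP/1.0 the connection would now be closed; the next request needs a new one.
    return request, response


if __name__ == "__main__":
    print(http09_exchange("/dummy.html"))
    print(http10_exchange("/logo.jpg", "image/jpeg"))
```

Without a Content-Type header, an HTTP/0.9 client has no way to tell that the bytes it received are an image rather than HTML.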
The first official standard: HTTP/1.1
In 1997 HTTP/1.1 appeared, described in the memo RFC 2068. It was the first “official” standard and is still in use today. It introduced important innovations over HTTP/1.0:
- Keepalive: the client can keep the connection open across several requests (persistent connection) instead of opening a new one for each request.
- Pipelining: the client can send a subsequent request before receiving the response to the previous one.
- Live updates: in chats, for example, the server can keep refreshing the content of the browser window using the MIME type multipart/x-mixed-replace.
- Uploads: data can also be transferred from the client to the server, for example with the PUT method.
- TRACE: the newly introduced TRACE method makes it possible to follow the path of a request from the client to the web server.
- Cache: new mechanisms for caching content.
- Host: thanks to the Host header field, an HTTP request works even when several different domains are hosted under a single IP address, as is the case with most websites today (shared web hosting).
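The Host header is what makes shared hosting work: one server at one IP address reads it to decide which site a request is for. A minimal sketch, assuming two hypothetical sites on the reserved example.com and example.org domains; `SITES` and `route()` are invented names, and real servers such as Apache or nginx do this through their virtual-host configuration instead.

```python
# Hypothetical mapping of host names to sites served from the same IP address.
SITES = {"www.example.com": "site A", "www.example.org": "site B"}


def route(raw_request: bytes) -> str:
    """Pick the site for a request using its Host header (name-based virtual hosting)."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:            # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":         # header names are case-insensitive
            # Drop an optional ":port" suffix (IPv6 literals are not handled here).
            host = value.strip().lower().split(":")[0]
            return SITES.get(host, "default site")
    raise ValueError("HTTP/1.1 requests must carry a Host header")


if __name__ == "__main__":
    print(route(b"GET / HTTP/1.1\r\nHost: www.example.org\r\n\r\n"))
```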
An urgently needed renewal: HTTP/2
Over the years, websites have become larger and more complex. To load a modern website, the browser has to request several megabytes of data and send up to a hundred individual HTTP requests. Since HTTP/1.1 processes the requests within a connection one after the other, the more complex a website is, the longer the page takes to load. Google therefore developed an experimental protocol called SPDY (pronounced “speedy”), which attracted great interest from the developer community and eventually led to the release of HTTP/2 in 2015. Among other things, the new standard brings the following innovations, all aimed at loading websites faster:
- Binary: the protocol transmits data in binary frames rather than as text.
- Multiplexing: client and server can send and process multiple HTTP requests in parallel over a single connection.
- Compression: headers are compressed. Since they are often nearly identical across many HTTP requests, compression eliminates unnecessary redundancy.
- Server Push: if the server can predict which data the client will need next, it can send it to the client's cache without waiting for an HTTP request.
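To make “binary” and “multiplexing” concrete, here is a sketch of the HTTP/2 frame layout from RFC 7540: every frame starts with a 9-byte header holding a 24-bit payload length, an 8-bit type, 8 bits of flags and a 31-bit stream identifier. The helper names are hypothetical, the payloads are plain bytes (a real HEADERS frame carries compressed headers), and a real connection also needs the connection preface and SETTINGS exchange, which are omitted here.

```python
import struct

DATA, HEADERS = 0x0, 0x1      # frame type codes from RFC 7540
END_STREAM = 0x1              # flag: last frame of this stream


def frame(frame_type, flags, stream_id, payload=b""):
    """Serialize one HTTP/2 frame: 9-byte header followed by the payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    length = len(payload).to_bytes(3, "big")                                # 24-bit length
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)  # reserved bit cleared
    return length + rest + payload


def parse_frames(data):
    """Split a byte stream back into (stream_id, type, flags, payload) tuples."""
    frames, offset = [], 0
    while offset + 9 <= len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        ftype, flags, sid = struct.unpack(">BBI", data[offset + 3:offset + 9])
        payload = data[offset + 9:offset + 9 + length]
        frames.append((sid & 0x7FFFFFFF, ftype, flags, payload))
        offset += 9 + length
    return frames


def reassemble(data):
    """Multiplexing: frames of different streams interleave on one connection."""
    streams = {}
    for sid, ftype, flags, payload in parse_frames(data):
        if ftype == DATA:
            streams[sid] = streams.get(sid, b"") + payload
    return streams


if __name__ == "__main__":
    # Two responses (client-initiated streams use odd IDs) interleaved on the wire.
    wire = (frame(DATA, 0, 1, b"<html>A") + frame(DATA, 0, 3, b"body{}")
            + frame(DATA, END_STREAM, 1, b"</html>") + frame(DATA, END_STREAM, 3, b"/*css*/"))
    print(reassemble(wire))
```

Because each frame names its stream, the receiver can rebuild both responses even though their pieces arrive mixed together, which HTTP/1.1's one-response-at-a-time text format cannot do on a single connection.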
HTTP/2 established itself quickly, especially among high-traffic websites. As of January 2020, according to W3Techs, around 42 percent of websites use HTTP/2.
HTTP/1.1 OPERATION IN DEPTH
HTTP works with a client/server architecture: the client sends a request and the server returns a response. In common use, the client is the browser and the server is the machine on which the website resides. There are therefore two types of HTTP messages: request messages and response messages. HTTP differs from other application-layer (layer 7) protocols such as FTP in that connections are generally closed once a particular request (or a series of related requests) has been satisfied. This behavior suits the World Wide Web, where pages very often link to pages hosted on other servers: keeping only the connections that are actually needed reduces load and resource usage on both the client and the server. However, it sometimes poses problems for web developers, because the stateless nature of the browsing session forces them to use other methods, typically cookies, to maintain the user's state.
REQUEST MESSAGE
The request message consists of four parts:
- request line;
- header section (additional information);
- blank line (CRLF: the 2 characters carriage return and line feed);
- message body (optional).
REQUEST LINE
The request line consists of the method, the URI and the protocol version. For version 1.1, the request method can be one of the following:
GET
POST
HEAD
PUT
DELETE
PATCH
TRACE
OPTIONS
CONNECT
The URI (Uniform Resource Identifier) indicates the subject of the request, for example the web page to retrieve. The most common HTTP methods are GET, HEAD and POST. The GET method retrieves the content of the resource identified by the URI (such as the content of an HTML page). HEAD is similar to GET but returns only the header fields, for example to check the modification date of a file; neither a HEAD request nor its response carries a body. The POST method is typically used to send information to the server, such as form data. In this case, the URI identifies the resource that will process the data, and the body contains the data itself.
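The four parts of a request message, and the difference between a GET and a POST, can be seen by assembling the raw bytes by hand. This is a sketch: `build_request()` is a hypothetical helper, and `/send-message` is an invented endpoint used only to show a POST body. In real code a library such as Python's `http.client` builds these messages for you.

```python
from urllib.parse import urlencode


def build_request(method, target, host, body=b"", extra_headers=None):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    headers = {"Host": host}
    headers.update(extra_headers or {})
    if body:
        headers["Content-Length"] = str(len(body))   # tells the server where the body ends
    head = f"{method} {target} HTTP/1.1\r\n"         # request line: method, URI, version
    head += "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    head += "\r\n"                                    # blank line (CRLF) ends the header section
    return head.encode("ascii") + body


if __name__ == "__main__":
    # GET carries no body: the URI alone says what is wanted.
    print(build_request("GET", "/contatti.html", "www.rifugioalbasini.it"))
    # POST sends form data in the body; /send-message is a hypothetical endpoint.
    form = urlencode({"name": "Mario", "msg": "Hi"}).encode("ascii")
    print(build_request("POST", "/send-message", "www.rifugioalbasini.it", form,
                        {"Content-Type": "application/x-www-form-urlencoded"}))
```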
THE HEADERS OF THE REQUEST
The most common request headers are:
- Host: name of the server to which the URL refers. It is required in HTTP/1.1 requests because it enables name-based virtual hosting.
- User-Agent: identifies the client: browser type, manufacturer, version, etc.
- Cookie: sends back the cookies previously stored on the client. Web applications use cookies to store and retrieve long-term client-side information, often an authentication token or data used to track user activity.
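On the server side these headers have to be read back out of the raw request. A minimal sketch, assuming a well-formed request; `parse_request_head()` is a hypothetical helper, while `http.cookies.SimpleCookie` is Python's standard parser for cookie strings.

```python
from http.cookies import SimpleCookie


def parse_request_head(raw):
    """Split a raw request into its request line, headers and cookies."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Names are case-insensitive; a repeated header overwrites the earlier one here.
        headers[name.strip().lower()] = value.strip()
    jar = SimpleCookie()
    jar.load(headers.get("cookie", ""))                 # "a=1; b=2" -> two cookies
    cookies = {name: morsel.value for name, morsel in jar.items()}
    return method, target, version, headers, cookies


if __name__ == "__main__":
    raw = (b"GET /contatti.html HTTP/1.1\r\n"
           b"Host: www.rifugioalbasini.it\r\n"
           b"User-Agent: Mozilla/5.0\r\n"
           b"Cookie: session=abc123; theme=dark\r\n\r\n")
    print(parse_request_head(raw))
```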
RESPONSE MESSAGE
The response message is text-based and consists of four parts:
- status line;
- header section;
- blank line (CRLF: the 2 characters carriage return and line feed);
- body (response content).
The status line contains a three-digit status code, grouped into classes as follows:
- 1xx: Informational (informational messages)
- 2xx: Successful (the request was satisfied)
- 3xx: Redirection (there is no immediate response: the request is valid, but the client must take further action, such as requesting another URI)
- 4xx: Client error (the request cannot be satisfied because it contains an error)
- 5xx: Server error (the request cannot be satisfied due to an internal server problem)
The most common response codes are:
- 200 OK. The server successfully provided the content in the body section.
- 301 Moved Permanently. The requested resource has been moved permanently to the URI given in the Location header.
- 302 Found. The resource is reachable with another URI indicated in the Location header. Browsers typically make the request to the specified URI automatically without user interaction.
- 400 Bad Request. The server cannot understand the request, for example because its syntax is malformed.
- 404 Not Found. The requested resource was not found and its location is unknown. This usually occurs when the URI has been entered incorrectly or the content has been removed from the server.
- 500 Internal Server Error. The server is unable to respond to the request due to an internal problem.
- 502 Bad Gateway. The web server acting as the reverse proxy did not get a valid response from the upstream server.
- 505 HTTP Version Not Supported. The server does not support the HTTP version used in the request.
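Python's standard library already knows the registered codes and their reason phrases, so the classification above can be reproduced in a few lines. `CLASSES` and `describe()` are hypothetical names; `http.HTTPStatus` is the standard enum of status codes.

```python
from http import HTTPStatus

# Classes by first digit, as in the list of code ranges.
CLASSES = {1: "Informational", 2: "Successful", 3: "Redirection",
           4: "Client error", 5: "Server error"}


def describe(code):
    """Return 'code reason-phrase (class)' using Python's registry of status codes."""
    status = HTTPStatus(code)        # raises ValueError for unregistered codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


if __name__ == "__main__":
    for code in (200, 301, 302, 400, 404, 500, 502, 505):
        print(describe(code))
```

Integer division by 100 is all it takes to find the class, which is why clients can react sensibly even to a code they do not specifically recognize.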
THE HEADERS OF THE RESPONSE
The most common response headers are:
- Server: indicates the type and version of the server. It can be seen as the counterpart of the User-Agent request header.
- Content-Type: indicates the type of the returned content. These types, called media types, are registered with IANA (Internet Assigned Numbers Authority) and are also known as MIME types (Multipurpose Internet Mail Extensions), as described in RFC 1521. Some common MIME types in HTTP responses are:
- text/html: HTML document
- text/plain: unformatted text document
- text/xml: XML document
- image/jpeg: JPEG image
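A client decides what to do with a response body by reading its Content-Type, which may carry parameters such as a charset. A sketch using the standard `email.message.Message` class, which parses MIME headers (Python's `http.client` response headers are a subclass of it); `media_type()` and `describe_body()` are hypothetical helpers, and falling back to UTF-8 when no charset is given is an assumption of this sketch.

```python
from email.message import Message


def media_type(header_value):
    """Split a Content-Type value into (type/subtype, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def describe_body(header_value, body):
    """Decode text bodies; summarize binary ones instead of printing raw bytes."""
    mtype, charset = media_type(header_value)
    if mtype.startswith("text/"):
        # Assumption: fall back to UTF-8 when the server names no charset.
        return body.decode(charset or "utf-8")
    return f"<{len(body)} bytes of {mtype}>"


if __name__ == "__main__":
    print(media_type("text/html; charset=UTF-8"))
    print(describe_body("image/jpeg", b"\xff\xd8\xff"))
```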
TYPE OF CONNECTION
In the request message, the client can ask the server to use one of two types of connection.
Not persistent
For each request and its response, a dedicated TCP connection is established.
Persistent
Each request and its response are transferred over the same TCP connection. This is the default behavior of HTTP/1.1.
On the one hand, non-persistent connections add a latency of at least 3 round-trip times (RTT) compared to persistent ones, because after each response from the server the following are needed:
- 1.5 or 2 RTTs to close the current connection, with its final three- or four-way FIN/ACK handshake;
- 1.5 RTTs to open the new connection, with the three-way SYN, SYN-ACK, ACK handshake.
On the other hand, a single persistent connection prevents parallelism: a client with several requests for the same server must send them one after the other. For this reason, browsers combine the strengths of both approaches: they usually open several TCP connections to each server in parallel and use each of them persistently.
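The trade-off can be put into numbers with a deliberately crude model that reuses the figures above: at least 3 RTTs lost per reconnection and 1.5 RTTs to open a connection, plus the added assumption of one RTT per request/response pair, ignoring transfer time and TCP slow start. The function names are hypothetical.

```python
import math


def reconnect_penalty_ms(requests, rtt_ms, rtts_per_reconnect=3.0):
    """Extra delay of non-persistent connections: close + reopen before each later request."""
    if requests < 1:
        raise ValueError("need at least one request")
    # The first request needs a connection either way, so only the others pay the penalty.
    return (requests - 1) * rtts_per_reconnect * rtt_ms


def persistent_time_ms(requests, rtt_ms, connections=1, setup_rtts=1.5):
    """Crude estimate for persistent connections opened in parallel.

    Each connection is set up once (all in parallel), then serves its share of
    the requests one after the other at one RTT per request/response pair.
    """
    per_connection = math.ceil(requests / connections)
    return setup_rtts * rtt_ms + per_connection * rtt_ms


if __name__ == "__main__":
    print(reconnect_penalty_ms(10, 50))
    print(persistent_time_ms(100, 50, connections=1))
    print(persistent_time_ms(100, 50, connections=6))
```

With a 50 ms RTT, ten sequential requests over non-persistent connections lose at least 1.35 s, while spreading 100 requests over six persistent connections instead of one cuts the estimate from about 5 s to under 1 s.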
EXAMPLES OF HTTP MESSAGES
- Open a web page in the browser.
- Open the browser's developer tools, for example by right-clicking on the page and choosing Inspect.
- In the Network tab you can see the various messages exchanged between client and server.
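The same kind of exchange that the Network tab shows can be reproduced from a script without any Internet access, by running a tiny server on the loopback interface and sending it one request. A sketch using only the standard library: `HelloHandler` and `fetch_once()` are hypothetical names, and the page content is invented. `http.client` adds the Host header automatically.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello</h1>"
        self.send_response(200)                           # status line (+ Server, Date)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                                # blank line ends the headers
        self.wfile.write(body)                            # message body

    def log_message(self, format, *args):
        pass                                              # keep the demo output quiet


def fetch_once(path="/"):
    """Run one request/response exchange on 127.0.0.1 and return what came back."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: any free port
    server.timeout = 5                                    # don't wait forever for a client
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path, headers={"User-Agent": "demo-client"})
        resp = conn.getresponse()
        result = (resp.status, resp.reason, dict(resp.getheaders()), resp.read())
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return result


if __name__ == "__main__":
    status, reason, headers, body = fetch_once("/contatti.html")
    print(status, reason)
    for name, value in headers.items():
        print(f"{name}: {value}")
    print(body.decode())
```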
MORE ON HTTP/2
HTTP/2 (originally called HTTP/2.0) is the new version of the HTTP network protocol used by the World Wide Web. It is based on SPDY. HTTP/2 was developed by the Hypertext Transfer Protocol working group (httpbis) of the Internet Engineering Task Force. It is the first new version of HTTP since HTTP/1.1, which was standardized as RFC 2616 in 1999. The working group submitted HTTP/2 to the IESG as a proposed standard in December 2014. The standardization effort was a response to SPDY, an HTTP-compatible protocol developed by Google and supported in Chrome, Opera, Firefox, Internet Explorer 11, Safari, and Amazon Silk.
DIFFERENCES FROM HTTP/1.1
The proposed changes do not require any changes to the way existing web applications work, but new applications can take advantage of the new features to increase speed. HTTP/2 keeps most of the high-level syntax of HTTP/1.1, such as methods, status codes, header fields and URIs. What changes is how the data exchanged between client and server is structured and transported. Efficient websites minimize the number of requests needed to render a page by “minifying” resources such as images and scripts, that is, reducing the size of the code and packing small pieces of code into larger units without affecting their functionality. However, minification is not always cost-effective or efficient, and may still require separate HTTP connections to fetch the page and the minified resources. HTTP/2 allows the server to “push” more data than the client has requested. This lets the server provide data that it knows the browser will need to complete the page, without waiting for the browser to examine the first response and without the overhead of an additional request cycle.
DIFFERENCES FROM SPDY
SPDY (pronounced “speedy”) is a research project carried out by Google. The protocol derived from it, which bears the same name, transports information and other content on the web with the main goal of reducing latency. SPDY still runs over a TCP connection but changes the way requests and responses are sent over it to achieve this reduction. The basic changes made to HTTP/1.1 to create SPDY include “true request pipelining without FIFO restrictions, a message framing mechanism to simplify client and server development, mandatory compression (including headers), priority management, and also bi-directional communications.”

The httpbis working group considered Google's SPDY protocol, Microsoft's HTTP Speed+Mobility proposal (based on SPDY) and Network-Friendly HTTP Upgrade. In July 2012, Facebook provided feedback on each of the proposals and recommended that HTTP/2 be based on SPDY. The initial draft of HTTP/2 was published in November 2012 and was based directly on a copy of SPDY. The main difference between HTTP/1.1 and SPDY is that in SPDY every user action is assigned a “stream ID”, so that a single TCP channel connects the user to the server. SPDY divides requests into control and data frames, using a “simple binary protocol parsing with two types of frames”. SPDY showed a noticeable improvement over HTTP, with page load speed increases ranging from 11.81% to 47.7%.

HTTP/2 uses SPDY as a starting point. In HTTP/2, however, headers are compressed with HPACK, a scheme based on Huffman coding, instead of the stream-based dynamic (DEFLATE) compression used in SPDY. This reduces the risk of compression-based attacks on the protocol, such as CRIME. On February 9, 2015, Google announced that it would remove support for SPDY from Chrome by early 2016 in favor of HTTP/2, starting with Chrome 40.
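The header compression mentioned above can be illustrated with HTTP/2's HPACK scheme (RFC 7541). The sketch implements only its simplest case: a header that appears in HPACK's predefined static table is sent as a single byte, namely its table index with the high bit set. The dynamic table, literal encodings and Huffman coding are deliberately left out; `STATIC_TABLE` reproduces just a handful of the table's 61 entries, and `encode_indexed()` is a hypothetical name.

```python
# A few entries of the HPACK static table (RFC 7541, Appendix A).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":path", "/index.html"): 5,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}


def encode_indexed(headers):
    """Encode headers found in the static table as 1-byte indexed fields.

    Indexed Header Field representation: a '1' bit followed by a 7-bit index.
    Headers missing from this table would need a literal representation
    (optionally Huffman-coded), which this sketch does not implement.
    """
    out = bytearray()
    for name, value in headers:
        index = STATIC_TABLE.get((name, value))
        if index is None:
            raise NotImplementedError(f"literal encoding not sketched: {name}")
        out.append(0x80 | index)
    return bytes(out)


if __name__ == "__main__":
    print(encode_indexed([(":method", "GET"), (":scheme", "https"), (":path", "/")]))
```

Those three bytes stand in for the text of a request line such as `GET / HTTP/1.1`. Real encoders also add entries to a dynamic table, so that headers repeated across requests, such as a long User-Agent, shrink to a byte or two after their first occurrence.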
Although HTTP/2 was designed to work both over plain HTTP and over HTTPS, all implementations in the major browsers (Firefox, Chrome, Safari, Opera, IE, Edge) support HTTP/2 only over TLS, making encryption a de facto requirement.
AI DEEP DIVE
1. HTTP: HyperText Transfer Protocol
HTTP (HyperText Transfer Protocol) is the communication protocol used to transfer information on the Internet, in particular between a client (usually a web browser) and a web server. It is the protocol on which the World Wide Web is based and allows the display of web pages.
1.1 How HTTP works
HTTP is a stateless protocol, which means that each request between the client and the server is independent and does not maintain information about the state of previous communications. Each time a user requests a web page, the browser sends a request to the server, which responds with the requested resources (such as HTML pages, images, videos, etc.).
The HTTP communication process occurs through a request and response mechanism:
• Client: the browser sends an HTTP request to the server.
• Server: the server processes the request and returns a response (web content or an error message).
1.2 HTTP methods
There are various HTTP methods, which tell the server what action to perform in relation to the resource. The main methods are:
• GET: Requests data from the server. It is used to retrieve information or load a web page.
• POST: Sends data to the server, such as a contact or registration form.
• PUT: Uploads a resource to the server or replaces an existing one.
• DELETE: Removes a resource from the server.
• HEAD: Similar to GET, but returns only the headers, without the body of the document.
• OPTIONS: Asks the server which request methods are supported for a specific resource.
1.3 HTTP Port and Security
By default, HTTP uses port 80. One of its main disadvantages is the lack of encryption: data is sent between the client and the server in clear text and can be intercepted and read by malicious third parties.
2. HTTPS: HyperText Transfer Protocol Secure
HTTPS (HyperText Transfer Protocol Secure) is the secure version of HTTP. It combines HTTP with SSL/TLS (Secure Sockets Layer/Transport Layer Security) to encrypt the communication between the client and the server, so that transmitted data cannot be read or altered by third parties.
2.1 How HTTPS Works
HTTPS adds a layer of security through SSL/TLS encryption. When a user accesses an HTTPS site:
1. Handshaking: The browser and the server establish a secure connection through a process called handshake. In this process, cryptographic keys are negotiated that will be used to secure the communication.
2. Encryption: Once the secure connection is established, all information exchanged between the browser and the server is encrypted.
3. Authentication: The HTTPS server uses a digital certificate (issued by a certificate authority, such as Let’s Encrypt or Symantec) to prove its identity and authenticity.
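On the client side, Python's standard `ssl` module carries out these three steps. The sketch below reflects assumed reasonable settings, not how any particular browser is configured: `make_client_context()` requires certificate verification and host-name checking (authentication) and refuses protocol versions older than TLS 1.2, while `inspect_tls()`, a hypothetical helper, runs the handshake and reports the negotiated version, cipher and certificate issuer.

```python
import socket
import ssl


def make_client_context():
    """TLS client settings: verify the certificate chain and the host name."""
    ctx = ssl.create_default_context()            # loads the system's trusted CA certificates
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def inspect_tls(host, port=443, timeout=5.0):
    """Connect, perform the handshake, and report what was negotiated.

    Needs network access; raises ssl.SSLCertVerificationError if the
    certificate does not chain to a trusted CA or does not match `host`.
    """
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:   # handshake happens here
            cert = tls.getpeercert()
            issuer = dict(pair for rdn in cert["issuer"] for pair in rdn)
            return tls.version(), tls.cipher()[0], issuer.get("organizationName")
```

For example, `inspect_tls("example.com")` returns a tuple such as the protocol version string, the cipher suite name and the issuing organization; the exact values depend on the server and on the local OpenSSL build. Everything sent after `wrap_socket()` succeeds travels encrypted.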
2.2 Benefits of HTTPS
• Encryption: HTTPS encrypts the data transmitted between the client and the server, making it unreadable to third parties.
• Data Integrity: HTTPS ensures that data cannot be altered during transmission without the change being detected.
• Authentication: HTTPS authenticates the server, assuring users that they are communicating with the legitimate server and not a fraudulent one.
• SEO: Google uses HTTPS as a ranking signal, so switching to HTTPS can give a site a small boost in search rankings.
2.3 HTTPS Port and Security
By default, HTTPS uses port 443. Thanks to SSL/TLS, HTTPS protects sensitive data such as credit card numbers, login credentials and personal information.
3. How the Transition from HTTP to HTTPS Happens
The transition from HTTP to HTTPS has become essential for the security of online communications. The transition process includes:
• Obtaining an SSL/TLS certificate: Website administrators must buy a certificate, or obtain one for free, from a certificate authority (CA).
• Server configuration: The certificate must be installed and configured correctly on the server.
• HTTP to HTTPS redirection: Once HTTPS is configured, all HTTP requests should be redirected to the HTTPS version of the site.
• Updating internal links: Any internal links that point to an HTTP resource should be updated to use HTTPS.
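The last two steps can be sketched as small helpers, assuming the site is served under a single host name. `https_redirect()` and `upgrade_internal_links()` are hypothetical names; in practice the redirect is often configured in the web server itself rather than written by hand.

```python
import re


def https_redirect(host_header, target):
    """Status and Location header that send a plain-HTTP request to HTTPS."""
    host = host_header.split(":")[0]       # drop ":80"; HTTPS uses its own default port 443
    return "301 Moved Permanently", f"https://{host}{target}"


def upgrade_internal_links(html, own_host):
    """Rewrite http:// links that point at our own host to https://.

    The negative lookahead stops matches such as own_host + ".evil.net";
    links with an explicit ":port" are left untouched in this sketch.
    """
    pattern = re.compile(r"http://" + re.escape(own_host) + r"(?![\w.:-])", re.IGNORECASE)
    return pattern.sub(lambda m: "https://" + own_host, html)


if __name__ == "__main__":
    print(https_redirect("www.example.com:80", "/contatti.html?x=1"))
    page = '<a href="http://www.example.com/a">x</a> <a href="http://other.org/">y</a>'
    print(upgrade_internal_links(page, "www.example.com"))
```

A permanent (301) redirect is the usual choice here because it tells browsers and search engines that the HTTPS address is the one to remember.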
4. Conclusion
HTTP and HTTPS are both protocols for client-server communication, but they differ significantly in terms of security. HTTPS is now the standard and is essential for secure and trustworthy online communication. Using HTTPS is especially important for sites that handle sensitive information such as passwords and financial data, while plain HTTP is now considered obsolete.