HTTP - the HyperText Transfer Protocol


HTTP Strict Transport Security (HSTS)

Definition (source : Mozilla Developer Network, Wikipedia, RFC 6797)

If a website accepts a connection through HTTP and redirects to HTTPS, visitors may initially communicate with the non-encrypted version of the site before being redirected. This creates an opportunity for a man-in-the-middle attack. The redirect could be exploited to direct visitors to a malicious site instead of the secure version of the original site.

The HSTS header informs the browser that it should never load a site using HTTP and should automatically convert all attempts to access the site using HTTP to HTTPS requests instead.

How it works

This is over-simplified, read the docs for full details !
  1. a web client contacts a web server over HTTP (not HTTPS)
  2. the server replies with a permanent redirection (HTTP 301) to the HTTPS version of the website
    This redirect should already be in place, even if you have not considered HSTS yet.
  3. the client follows the redirection and connects to the website over HTTPS
  4. the server replies normally and adds an extra Strict-Transport-Security header, which carries a max-age value (in seconds)
  5. the client
    1. remembers to open next connections via HTTPS directly, even though the user typed/clicked an HTTP URL
    2. will abide by this behaviour for max-age duration
  6. later, the client
    • _may_ retry connecting to the server via HTTP (not HTTPS) when max-age is elapsed
    • but when receiving the Strict-Transport-Security header again, the max-age value is reset, preventing the timeout from expiring
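
The client-side behaviour described in steps 4 to 6 can be sketched as follows. This is a minimal illustration, not how any real browser implements it: HstsStore, note_response and upgrade are hypothetical names, the includeSubDomains directive is parsed but not applied, and explicit port numbers are ignored.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def parse_hsts(header_value):
    """Parse a Strict-Transport-Security value into (max_age, include_subdomains)."""
    max_age = None
    include_subdomains = False
    for directive in header_value.split(";"):
        directive = directive.strip()
        if not directive:
            continue  # tolerate a trailing ';'
        name, _, value = directive.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            max_age = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            include_subdomains = True
    return max_age, include_subdomains


class HstsStore:
    """Hypothetical in-memory list of hosts that asked to be contacted over HTTPS only."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._expiry = {}  # host -> timestamp after which the policy lapses

    def note_response(self, url, headers):
        """Record the policy; the header is only honoured when received over HTTPS."""
        parts = urlsplit(url)
        value = headers.get("Strict-Transport-Security")
        if parts.scheme != "https" or value is None:
            return
        max_age, _ = parse_hsts(value)
        if max_age is None:
            return
        if max_age == 0:
            self._expiry.pop(parts.hostname, None)  # max-age=0 removes the policy
        else:
            # Every new header resets the timer (step 6 above).
            self._expiry[parts.hostname] = self._clock() + max_age

    def upgrade(self, url):
        """Rewrite http:// into https:// while the host's policy is still valid."""
        parts = urlsplit(url)
        expiry = self._expiry.get(parts.hostname)
        if parts.scheme == "http" and expiry is not None and self._clock() < expiry:
            return urlunsplit(parts._replace(scheme="https"))
        return url
```

Calling note_response() after each HTTPS response and upgrade() before each request reproduces the flow: the first plain-HTTP visit goes through unchanged, later ones are rewritten until max-age elapses.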

What's the difference / benefit between an HTTP 301 only and HTTP 301 + HSTS ?

With HTTP 301 only

  1. you connect to http://www.mybank.com, either by typing the URL or via a bookmark
  2. your browser is told to go "elsewhere" and goes there, as instructed
  3. if this is not the HTTPS version of your bank website, and if you don't realize it : you're screwed !
  4. if it's the genuine https://www.mybank.com, you'll be sending your bank credentials to your bank website over HTTPS, which is fine
  5. the next time you visit your bank website, depending on your browser settings and on the caches between you and your bank website, you re-play this whole scenario, including the part where you blindly follow a redirect sent over HTTP

With HTTP 301 + HSTS

  1. as above, you connect to http://www.mybank.com, either by typing the URL or via a bookmark
  2. your browser is told to go "elsewhere" and goes there, as instructed
  3. if this is not the HTTPS version of your bank website, and if you don't realize it : you're screwed !
  4. if it's the genuine https://www.mybank.com :
    1. not only will you be communicating with your bank website over HTTPS, which is fine
    2. but also the website will instruct your browser : "Next time, contact me directly via HTTPS !"
  5. the next time you visit your bank website, even though you type or click an HTTP URL, your browser will
    1. translate it into HTTPS
    2. then open https://www.mybank.com

Summary : the difference

  • for both : if the 1st time you visit http://www.mybank.com you're redirected to a malicious website : you're screwed
  • with HTTP 301 only : you're taking the same risk again at each visit
  • with HTTP 301 + HSTS : as long as max-age has not elapsed, your browser will not visit the HTTP version of the website again, which is what increases security

How to enable it on Apache (source) ?

  1. check the headers module is loaded
    httpd -M | grep headers
    headers_module (shared)
  2. If it's not loaded yet, you'll need to add this somewhere :
    LoadModule headers_module modules/mod_headers.so
    or : a2enmod ...
  3. add to the configuration of your HTTPS website :
    <VirtualHost 12.34.56.78:443>
    	
    	Header always set Strict-Transport-Security "max-age=31536000; includeSubdomains;"
    	
    </VirtualHost>
    31536000 is the number of seconds in a year. Start with smaller values while experimenting.
  4. check the configuration
  5. restart / reload the configuration
  6. make sure you receive the Strict-Transport-Security header as expected :
    curl -I https://my.web.site/
    HTTP/1.1 200 OK
    Date: Mon, 09 Oct 2023 09:39:04 GMT
    Server: Apache/2
    Strict-Transport-Security: max-age=600; includeSubdomains;
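
The check in step 6 can also be scripted. Here is a small sketch that reads the raw text printed by curl -I and extracts the max-age value; hsts_max_age is a hypothetical helper name, and the sample text is the output shown above.

```python
def hsts_max_age(raw_headers):
    """Return max-age (in seconds) from the raw text of `curl -I`, or None if absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Skip the status line and any header that is not HSTS.
        if not sep or name.strip().lower() != "strict-transport-security":
            continue
        for directive in value.split(";"):
            key, _, number = directive.strip().partition("=")
            if key.strip().lower() == "max-age":
                return int(number.strip().strip('"'))
    return None


sample = """HTTP/1.1 200 OK
Date: Mon, 09 Oct 2023 09:39:04 GMT
Server: Apache/2
Strict-Transport-Security: max-age=600; includeSubdomains;
"""

print(hsts_max_age(sample))  # 600
```

A result of None means the header is missing, which usually points at the module not being loaded or the directive sitting in the wrong virtual host.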
    

HTTP/2

Presentation / details

Support in existing software

Lighttpd

HTTP/2 was NOT supported in Lighttpd when this note was written (sources : 1, 2). Support was added later, in lighttpd 1.4.56.

Varnish


HTTP caching pragma headers

TCP_HIT
The object was fresh in cache and served from disk cache.
TCP_MISS
The object was not in cache, server fetched object from origin.
TCP_REFRESH_HIT
The object was stale in cache. The If-Modified-Since request to the origin returned a HTTP 304 - Not Modified
TCP_REFRESH_MISS
The object was stale in cache. The If-Modified-Since request to the origin returned the new content.
TCP_REFRESH_FAIL_HIT
Object was stale in cache and we failed on refresh (couldn't reach origin) so we served the stale object.
TCP_IMS_HIT
The client sent an If-Modified-Since request; the object was fresh in cache and was served from it.
TCP_NEGATIVE_HIT
Object previously returned a "not found" (or any other negatively cacheable response) and that cached response was a hit for this new request.
TCP_MEM_HIT
A valid copy of the requested object was both on disk and in the memory cache. The server served it from memory, thus avoiding disk accesses.
TCP_DENIED
Denied access to the client for whatever reason.
TCP_COOKIE_DENY
Denied access on cookie authentication (if centralized or decentralized authorization feature is being used in configuration).
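
As an illustration of how these result codes are typically used, the sketch below computes a cache hit ratio from a list of codes taken from a proxy log. The function name and the choice to leave denied requests out of the ratio are assumptions made for this example.

```python
from collections import Counter


def hit_ratio(result_codes):
    """Share of cacheable lookups answered from cache (codes ending in _HIT)."""
    counts = Counter(result_codes)
    served = sum(n for code, n in counts.items() if code.endswith("_HIT"))
    # Denied requests (TCP_DENIED, TCP_COOKIE_DENY) are neither hits nor misses.
    considered = sum(
        n for code, n in counts.items() if code.endswith(("_HIT", "_MISS"))
    )
    return served / considered if considered else 0.0


codes = ["TCP_HIT", "TCP_MISS", "TCP_MEM_HIT", "TCP_DENIED"]
print(round(hit_ratio(codes), 2))  # 0.67
```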

HTTP headers : Expires vs ETag

Cache-Control and Expires headers
  • give freshness information : they tell the browser and the proxies in between up to what time, or for how long, they may keep the resource in their cache
  • are strong caching headers
Last-Modified and ETag headers
  • are validators : they help the browser understand if a resource has changed, even if it preserves the same name
  • are weak caching headers

On the client side :

  1. the browser checks Cache-Control / Expires to determine whether or not to make a request to the server
  2. if so, the browser sends the validators it stored with its cached copy : the Last-Modified value in an If-Modified-Since header and / or the ETag value in an If-None-Match header
  3. if both client + server ETag values match, the server replies with an HTTP 304 code instead of an HTTP 200, and no content. Then the browser loads the contents from its cache.
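
The server's half of step 3 can be sketched like this. It is a simplified illustration only: respond is a hypothetical function, the If-None-Match value is split on commas, and weak validators (W/ prefix) are not handled.

```python
def respond(request_headers, body, etag):
    """Return (status, headers, body) for a GET on a resource with the given ETag."""
    sent = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in sent.split(",")]
    if etag in candidates or "*" in candidates:
        # The client's copy is still valid: no body is sent back.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


etag = '"686897696a7c876b7e"'
print(respond({}, b"hello", etag)[0])                       # 200
print(respond({"If-None-Match": etag}, b"hello", etag)[0])  # 304
```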

On the server side :

When instructed to use ETags, by default Apache and Lighttpd generate ETag values based on attributes of the file, its inode number and timestamps among them.
So, when using a load-balanced server setup with multiple machines :
  • static resources (e.g. /var/www/html/path/to/image.jpg) will have distinct inode numbers on each web server. But since the inode numbers are used as part of the ETag hash algorithm, the same resource will have a distinct ETag value depending on which web server handles the request. This might be a reason for not using ETags.
  • You can configure Apache and Lighttpd to not use inode numbers as part of the calculation, but then you have to make sure the files' timestamps are consistent across web servers so that identical ETags are generated.
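
To make the load-balancing problem concrete, here is a sketch that derives an ETag from file attributes. The hexadecimal, dash-separated layout is an assumption chosen for the example and is not claimed to match the exact output of Apache or Lighttpd; file_etag and etag_for_path are hypothetical names.

```python
import os


def file_etag(inode, mtime, size, use_inode=True):
    """Build an ETag string from file attributes, with or without the inode number."""
    parts = ([inode] if use_inode else []) + [mtime, size]
    return '"' + "-".join(format(part, "x") for part in parts) + '"'


def etag_for_path(path, use_inode=True):
    """Same as file_etag(), reading the attributes from the local filesystem."""
    st = os.stat(path)
    return file_etag(st.st_ino, int(st.st_mtime), st.st_size, use_inode)


# Same file deployed on two servers: same timestamp and size, different inodes.
server_a = dict(inode=1001, mtime=1700000000, size=2048)
server_b = dict(inode=2002, mtime=1700000000, size=2048)

print(file_etag(**server_a) == file_etag(**server_b))  # False
print(file_etag(**server_a, use_inode=False)
      == file_etag(**server_b, use_inode=False))       # True
```

With the inode included, a client that cached the resource from one server never gets a 304 from the other. Without it, the ETags agree only if the deployment preserves timestamps.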

HTTP Headers

Header 1.0 1.1 Request Response Usage Example
Accept-Encoding Declares which content-codings are acceptable in the response. (Details for HTTP/1.0, HTTP/1.1) Accept-Encoding: gzip, deflate
Cache-Control Specify directives that must be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response. These directives typically override the default caching algorithms. Cache directives are unidirectional in that the presence of a directive in a request does not imply that the same directive is to be given in the response.
Cache-Control: max-age
  • Request : Indicates that the client is willing to accept a response whose age is no greater than the specified time in seconds. A request having Cache-Control: max-age=0 forces any intermediate caches to validate their copies directly with the origin server. (details)
  • Response : Tells all caching mechanisms from server to client for how long, in seconds, they may consider this object fresh. Overrides Expires if both are given.
Cache-Control: max-age=3600
Cache-Control: no-cache
  • Request : forces any intermediate caches to obtain a new copy from the origin server
  • Response : a cache must not use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests. (details)
Cache-Control: no-store
no-store is used to prevent the inadvertent release or retention of sensitive information (for example, on backup tapes)
  • Request : a cache must not store any part of either this request or any response to it
  • Response : a cache must not store any part of either this response or the request that elicited it.
"must not store" in this context means that the cache must not intentionally store the information in non-volatile storage, and MUST make a best-effort attempt to remove the information from volatile storage as promptly as possible after forwarding it.
Connection Allows the sender to specify options that are desired for that particular connection and must not be communicated by proxies over further connections. (Details for HTTP/1.1, both). The Keep-Alive header is optional; it is wise not to force its usage in the context of mixed HTTP/1.0 and HTTP/1.1 clients.
In HTTP 1.1, all connections are considered persistent unless declared otherwise. HTTP/1.1 applications that do not support persistent connections must include the close connection option in every message.
  • Connection: keep-alive
  • Connection: close
Content-Encoding This field is used as a modifier to the media-type. When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type. (Details for HTTP/1.0, HTTP/1.1) Content-Encoding: gzip
Content-Length Indicates the size of the Entity-Body, in decimal number of octets, sent to the recipient.
In the case of a HEAD request, this is the size of the Entity-Body that would have been sent had the request been a GET.
Details for HTTP/1.0, HTTP/1.1
Content-Type Indicates the media type of the Entity-Body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
Details for HTTP/1.0, HTTP/1.1
Content-Type: text/html
ETag ETag stands for Entity Tag. It is a fingerprint (hash) of the resource content. Computing the ETag value is up to the web server as long as it is globally unique (collision-free). (Details from Wikipedia, HTTP/1.1).
Browsers may decide to cache the resource along with its ETag, so that, on the next request for the same resource, they will attach the ETag value to the request (If-None-Match: "686897696a7c876b7e") and may receive a mere 304 (not modified) response if the cached version is still valid, or a full response if the resource was updated.
ETag: "686897696a7c876b7e"
If-Modified-Since
  • If the requested resource has been modified since the time specified in this field, the response is exactly the same as for a normal GET.
  • If the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server. Instead, a 304 (not modified) response will be returned without any Entity-Body.
Details for HTTP/1.0, HTTP/1.1
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
Expires Gives the date/time after which the response is considered stale. Expires: Wed, 13 Nov 2013 12:58:48 GMT
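
How a cache might combine the Cache-Control and Expires rows above can be sketched as follows. This is a simplified illustration under stated assumptions: parse_cache_control and freshness_lifetime are hypothetical names, only max-age, no-cache and no-store are considered, and the Expires value is compared with the current time rather than with the Date header.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Turn 'public, max-age=3600' into {'public': None, 'max-age': '3600'}."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def freshness_lifetime(headers, now):
    """Seconds during which a cache may reuse the response without revalidating it."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0
    if cc.get("max-age") is not None:
        return int(cc["max-age"])  # max-age overrides Expires
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - now).total_seconds()))
    return 0


now = datetime(2013, 11, 13, 11, 58, 48, tzinfo=timezone.utc)
print(freshness_lifetime({"Expires": "Wed, 13 Nov 2013 12:58:48 GMT"}, now))  # 3600
```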

Some definitions :

Entity-Body
The Entity-Body is : entity-body := Content-Encoding( Content-Type( data ) ) (source). Its length (shown in Content-Length) is the size of the compressed data (source).
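
The definition above can be checked with a short sketch: the data is compressed first, and Content-Length is computed on the compressed bytes. build_entity and read_entity are hypothetical helpers, and gzip is assumed as the content coding.

```python
import gzip


def build_entity(data, content_type="text/html"):
    """Apply the content coding, then describe the resulting entity-body."""
    body = gzip.compress(data)
    headers = {
        "Content-Type": content_type,
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),  # size of the compressed bytes
    }
    return headers, body


def read_entity(headers, body):
    """Undo the content coding to get back the data of the declared media type."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


data = b"<p>hello</p>" * 100
headers, body = build_entity(data)
print(int(headers["Content-Length"]) < len(data))  # True
print(read_entity(headers, body) == data)          # True
```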

HTTP status codes

These status codes are common to HTTP/1.1 and HTTP/2.
Code Message Details
1xx Informational Request received, continuing process
2xx Successful This class of status code indicates that the client's request was successfully received, understood, and accepted.
200 OK The request has succeeded. The information returned with the response is dependent on the method used in the request (GET, HEAD, POST, TRACE)
201 Created The request has been fulfilled and resulted in a new resource being created, whose URL may be found in the response. The origin server MUST create the resource before returning the 201 status code. If the action cannot be carried out immediately, the server should respond with a 202 (Accepted) response instead.
202 Accepted The request has been accepted for processing, but the processing has not been completed. This status doesn't even indicate that processing actually started, and gives no guarantee that it will ever start. The entity returned with this response should include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
204 No content The server processed the request but has no content to return. The response may include new or updated HTTP headers but must not include a message-body, and thus is always terminated by the first empty line after the header fields.
3xx Redirection This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.
301 Moved Permanently The requested resource has been assigned a new permanent URI and any future references to this resource should use one of the returned URIs. This response is cacheable unless indicated otherwise (details).
302 Found The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client should continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.
Browser developers misinterpreted the HTTP/1.0 standard and evolutions to HTTP/1.1 made things even more unclear (source). For temporary redirects, use 307 rather than 302.
303 See Other The response to the request can be found under a different URI and should be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource.
304 Not Modified Is given as a reply to a conditional GET : indicates the resource has not been modified since the datetime provided with the If-Modified-Since header. When receiving this code, the HTTP client is instructed to use data from its cache.
If the object was modified, the HTTP server will return an HTTP 200, and the content will be downloaded again.
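307 Temporary Redirect The requested resource resides temporarily under a different URI. Unlike with a 302, the client must repeat the request with the same method (a POST stays a POST). See : What's the difference between a 302 and a 307 redirect?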
4xx Client Error The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.
If the client is sending data, a server implementation using TCP should be careful to ensure that the client acknowledges receipt of the packet(s) containing the response, before the server closes the input connection. If the client continues sending data to the server after the close, the server's TCP stack will send a reset packet to the client, which may erase the client's unacknowledged input buffers before they can be read and interpreted by the HTTP application.
400 Bad Request The request could not be understood by the server due to malformed syntax. The client should not repeat the request without modifications.
401 Unauthorized
  • The request requires user authentication.
  • The client MAY repeat the request with a suitable Authorization header field.
  • If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials.
  • Details : HTTP Authentication: Basic and Digest Access Authentication
403 Forbidden The server understood the request, but is refusing to fulfill it. Authorization will not help and the request should not be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it should describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
404 Not Found The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code should be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
405 Method Not Allowed The method specified in the Request-Line is not allowed for the resource identified by the Request-URI. The response must include an Allow header containing a list of valid methods for the requested resource.
407 Proxy Authentication Required This code is similar to 401 (Unauthorized), but indicates that the client must first authenticate itself with the proxy. The proxy MUST return a Proxy-Authenticate header field containing a challenge applicable to the proxy for the requested resource. The client MAY repeat the request with a suitable Proxy-Authorization header field.
410 Gone The resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed and the resource should be purged. Clients such as search engines should remove the resource from their indices.
411 Length Required The server refuses to accept the request without a defined Content-Length. The client MAY repeat the request if it adds a valid Content-Length header field containing the length of the message-body in the request message.
412 Precondition Failed The precondition given in one or more of the request-header fields evaluated to false when it was tested on the server. (i.e. The server does not meet one of the preconditions that the requester put on the request.)
414 Request-URI Too Long The server is refusing to service the request because the Request-URI is longer than the server is willing to interpret. This rare condition is only likely to occur when a client has improperly converted a POST request to a GET request with long query information, when the client has descended into a URI "black hole" of redirection (e.g., a redirected URI prefix that points to a suffix of itself), or when the server is under attack by a client attempting to exploit security holes present in some servers using fixed-length buffers for reading or manipulating the Request-URI.
5xx Server Error The server failed to fulfill an apparently valid request.
5xx codes indicate cases in which the server is aware that it has encountered an error or is otherwise incapable of performing the request. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and indicate whether it is a temporary or permanent condition. Likewise, user agents should display any included entity to the user. These response codes are applicable to any request method.
501 Not Implemented The server does not support the functionality required to fulfill the request. This is the appropriate response when the server does not recognize the request method and is not capable of supporting it for any resource.
503 Service Unavailable The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state.
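
Since the first digit of a status code gives its class, a client can handle unknown codes by class. A minimal sketch, using the class names from the table above (status_class is a hypothetical helper):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Map a status code to its class name, based on its first digit."""
    try:
        return STATUS_CLASSES[code // 100]
    except KeyError:
        raise ValueError(f"not an HTTP status code: {code}") from None


print(status_class(304))  # Redirection
print(status_class(503))  # Server Error
```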