HTTP - the HyperText Transfer Protocol

HTTP/2

Presentation / details

Support in existing software

Lighttpd
HTTP/2 was NOT supported in Lighttpd before version 1.4.56, which added it (sources : 1, 2).
Varnish

HTTP caching pragma headers

TCP_HIT
The object was fresh in cache and served from disk cache.
TCP_MISS
The object was not in cache, server fetched object from origin.
TCP_REFRESH_HIT
The object was stale in cache. The If-Modified-Since request to the origin returned an HTTP 304 - Not Modified.
TCP_REFRESH_MISS
The object was stale in cache. The If-Modified-Since request to the origin returned the new content.
TCP_REFRESH_FAIL_HIT
Object was stale in cache and we failed on refresh (couldn't reach origin) so we served the stale object.
TCP_IMS_HIT
If-Modified-Since request from the client; the object was fresh in cache and was served.
TCP_NEGATIVE_HIT
Object previously returned a "not found" (or any other negatively cacheable response) and that cached response was a hit for this new request.
TCP_MEM_HIT
A valid copy of the requested object was on disk and also in the memory cache, so the server served it from memory without hitting the disk.
TCP_DENIED
Denied access to the client for whatever reason.
TCP_COOKIE_DENY
Denied access on cookie authentication (if centralized or decentralized authorization feature is being used in configuration).
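As an illustration of how these result codes are typically used, here is a minimal sketch that computes a cache hit ratio from a list of codes. Which codes count as a "hit" is an assumption made here (anything served from the cache, including revalidated and stale copies); the names classify and hit_ratio are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Assumption: every code that ends up serving the cached object counts as a hit.
HIT_CODES = {
    "TCP_HIT",
    "TCP_MEM_HIT",
    "TCP_IMS_HIT",
    "TCP_NEGATIVE_HIT",
    "TCP_REFRESH_HIT",
    "TCP_REFRESH_FAIL_HIT",
}
DENIED_CODES = {"TCP_DENIED", "TCP_COOKIE_DENY"}


def classify(code: str) -> str:
    """Return 'hit', 'denied' or 'miss' for a cache result code."""
    if code in HIT_CODES:
        return "hit"
    if code in DENIED_CODES:
        return "denied"
    # TCP_MISS, TCP_REFRESH_MISS and anything unknown: content came from origin.
    return "miss"


def hit_ratio(codes: Iterable[str]) -> float:
    """Share of non-denied requests that were answered from the cache."""
    counts = Counter(classify(c) for c in codes)
    served = counts["hit"] + counts["miss"]
    return counts["hit"] / served if served else 0.0
```

Denied requests are left out of the ratio because they say nothing about cache efficiency.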

HTTP ETag

ETag vs header Expires :

ETag and Last-Modified headers are validators : they help the browser find out whether a resource has changed, even if it keeps the same name.
Expires and Cache-Control give freshness information : they tell the browser, and the proxies between it and the server, until what time or for how long they may keep the resource in their cache.

  • Expires and Cache-Control headers are strong caching headers.
  • Last-Modified and ETag are weak caching headers.
First the browser checks Expires/Cache-Control to determine whether or not to make a request to the server. If it does make one, it sends the validators it stored with the cached copy: the Last-Modified date in an If-Modified-Since header and the ETag in an If-None-Match header. If they still match the current version of the document, the server sends a 304 code instead of 200, with no content, and the browser loads the contents from its cache.
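The two-stage check described above can be sketched as follows. This is a simplified model, not the logic of any real browser: CachedEntry, next_step and server_status are hypothetical names, and freshness is reduced to a single expiry timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    body: bytes
    etag: Optional[str]           # value of the ETag response header, if any
    last_modified: Optional[str]  # value of the Last-Modified header, if any
    expires_at: float             # epoch seconds, derived from Expires/Cache-Control


def next_step(entry: Optional[CachedEntry], now: float) -> dict:
    """Decide what the client does for a new request to the same URL."""
    if entry is None:
        return {"action": "fetch", "headers": {}}
    if now < entry.expires_at:
        # Still fresh: no request is sent at all.
        return {"action": "use_cache", "headers": {}}
    # Stale: ask the server, passing along the stored validators.
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return {"action": "revalidate", "headers": headers}


def server_status(current_etag: str, request_headers: dict) -> int:
    """Server side: 304 when the client's ETag still matches, else 200."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304
    return 200
```

The server side only compares ETags here; a real server would also handle If-Modified-Since and lists of several ETags.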

By default, Lighttpd and Apache up to 2.2 (not checked about others, they may do the same; Apache 2.4 dropped the inode from its default) generate an ETag based on the file's :

  • inode number
  • last-modified date
  • size
So if you are using a load-balanced setup with multiple machines, you will probably want to turn off ETag generation : inodes are part of the ETag calculation and differ between servers, so the same file gets a different ETag on each machine. You can configure Apache and Lighttpd not to use inodes in the calculation, but then you must make sure the file timestamps are exactly the same on all servers so that they all generate the same ETag.
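To illustrate why the inode breaks ETags across servers, here is a minimal sketch of an ETag derived from file metadata. The scheme (a truncated SHA-1 over the joined values) is an assumption for illustration only and is not the exact format used by Apache or Lighttpd; file_etag is a hypothetical helper.

```python
import hashlib
import os


def file_etag(path: str, use_inode: bool = False) -> str:
    """Build a quoted ETag from file metadata (illustrative scheme only)."""
    st = os.stat(path)
    parts = [str(st.st_mtime_ns), str(st.st_size)]
    if use_inode:
        # The inode differs between machines (and between copies of a file),
        # so including it makes the ETag server-specific.
        parts.insert(0, str(st.st_ino))
    digest = hashlib.sha1("-".join(parts).encode("ascii")).hexdigest()[:16]
    return f'"{digest}"'
```

Two copies of a file with identical size and modification time get the same ETag with use_inode=False, and different ETags with use_inode=True.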

HTTP Headers

Header 1.0 1.1 Request Response Usage Example
Accept-Encoding Declares which content-codings are acceptable in the response. (Details for HTTP/1.0, HTTP/1.1) Accept-Encoding: gzip, deflate
Cache-Control Specifies directives that must be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response. These directives typically override the default caching algorithms. Cache directives are unidirectional in that the presence of a directive in a request does not imply that the same directive is to be given in the response.
Cache-Control: max-age
  • Request : Indicates that the client is willing to accept a response whose age is no greater than the specified time in seconds. A request having Cache-Control: max-age=0 forces any intermediate caches to validate their copies with the origin server. (details)
  • Response : Tells all caching mechanisms from server to client for how long, in seconds, they may consider this object fresh. Overrides Expires if both are given.
Cache-Control: max-age=3600
Cache-Control: no-cache
  • Request : forces any intermediate caches to revalidate their copy with the origin server instead of answering directly from cache
  • Response : a cache must not use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests. (details)
Cache-Control: no-store
no-store is used to prevent the inadvertent release or retention of sensitive information (for example, on backup tapes).
  • Request : a cache must not store any part of either this request or any response to it
  • Response : a cache must not store any part of either this response or the request that elicited it.
"must not store" in this context means that the cache must not intentionally store the information in non-volatile storage, and MUST make a best-effort attempt to remove the information from volatile storage as promptly as possible after forwarding it.
Connection Allows the sender to specify options that are desired for that particular connection and must not be communicated by proxies over further connections. (Details for HTTP/1.1, both). The Keep-Alive header is optional; it is wise not to force its usage in the context of mixed HTTP/1.0 and HTTP/1.1 clients.
In HTTP 1.1, all connections are considered persistent unless declared otherwise. HTTP/1.1 applications that do not support persistent connections must include the close connection option in every message.
  • Connection: keep-alive
  • Connection: close
Content-Encoding This field is used as a modifier to the media-type. When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type. (Details for HTTP/1.0, HTTP/1.1) Content-Encoding: gzip
Content-Length Indicates the size of the Entity-Body, in decimal number of octets, sent to the recipient.
In the case of a HEAD request, this is the size of the Entity-Body that would have been sent had the request been a GET.
Details for HTTP/1.0, HTTP/1.1
Content-Type Indicates the media type of the Entity-Body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
Details for HTTP/1.0, HTTP/1.1
Content-Type: text/html
ETag ETag stands for Entity Tag. It is a fingerprint (typically a hash) of the resource content. How the ETag value is computed is up to the web server, as long as two different versions of the same resource never get the same value. (Details from Wikipedia, HTTP/1.1).
Browsers may cache the resource along with its ETag so that, on the next request for the same resource, they attach the ETag value to the request (If-None-Match: "686897696a7c876b7e"). They then receive a mere 304 (not modified) response if the cached version is still valid, or a full response if the resource was updated.
ETag: "686897696a7c876b7e"
If-Modified-Since
  • If the requested resource has been modified since the time specified in this field, the response is exactly the same as for a normal GET.
  • If the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server. Instead, a 304 (not modified) response will be returned without any Entity-Body.
Details for HTTP/1.0, HTTP/1.1
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
Expires Gives the date/time after which the response is considered stale. Expires: Wed, 13 Nov 2013 12:58:48 GMT
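The Cache-Control directives from the table above (max-age, no-cache, no-store) can be read out of a header value with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the parser splits on commas only, so it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(directives: dict) -> bool:
    """no-store forbids keeping any part of the request or response."""
    return "no-store" not in directives


def must_revalidate_before_use(directives: dict) -> bool:
    """no-cache and max-age=0 both require a check with the origin server."""
    return "no-cache" in directives or directives.get("max-age") == "0"
```

For example, parse_cache_control("max-age=3600") yields a single max-age directive with the argument "3600", which neither forbids storing nor forces revalidation.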

Some definitions :

Entity-Body
The Entity-Body is : entity-body := Content-Encoding( Content-Type( data ) ) (source). Its length (shown in Content-Length) is the size of the compressed data (source).
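The relation between Content-Encoding and Content-Length stated above can be checked with a short sketch. The sample HTML payload and the headers dictionary are made up for illustration; only the standard gzip module is used.

```python
import gzip

# Original data, as described by Content-Type.
data = b"<html><body>" + b"hello world " * 200 + b"</body></html>"

# Entity-Body = Content-Encoding(Content-Type(data)): here, the gzipped HTML.
entity_body = gzip.compress(data)

headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    # Content-Length counts the bytes actually sent, i.e. the compressed size.
    "Content-Length": str(len(entity_body)),
}
```

The recipient reverses the steps: it reads Content-Length bytes, undoes the gzip encoding, and only then interprets the result as text/html.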

HTTP status codes

Code Message Details
1xx Informational Request received, continuing process
2xx Successful This class of status code indicates that the client's request was successfully received, understood, and accepted.
200 OK The request has succeeded. The information returned with the response is dependent on the method used in the request (GET, HEAD, POST, TRACE)
201 Created The request has been fulfilled and resulted in a new resource being created, whose URL may be found in the response. The origin server MUST create the resource before returning the 201 status code. If the action cannot be carried out immediately, the server should respond with a 202 (Accepted) response instead.
202 Accepted The request has been accepted for processing, but the processing has not been completed. This status doesn't even indicate that processing actually started, and gives no guarantee that it will ever start. The entity returned with this response should include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
204 No Content The server processed the request but has no content to return. The response may include new or updated HTTP headers but must not include a message-body, and thus is always terminated by the first empty line after the header fields.
3xx Redirection This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.
301 Moved Permanently The requested resource has been assigned a new permanent URI and any future references to this resource should use one of the returned URIs. This response is cacheable unless indicated otherwise.
302 Found The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client should continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.
Browser developers misinterpreted the HTTP/1.0 standard and evolutions to HTTP/1.1 made things even more unclear (source). For temporary redirects, use 307 rather than 302.
303 See Other The response to the request can be found under a different URI and should be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource.
304 Not Modified Is given as a reply to a conditional GET : indicates the resource has not been modified since the datetime provided with the If-Modified-Since header (or still matches the ETag provided with If-None-Match). When receiving this code, the HTTP client is instructed to use data from its cache.
If the object was modified, the HTTP server will return an HTTP 200, and the content will be downloaded again.
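307 Temporary Redirect The requested resource resides temporarily under a different URI. Unlike with 302, the user agent must not change the request method when it follows the redirect (see: What's the difference between a 302 and a 307 redirect?).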
4xx Client Error The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.
If the client is sending data, a server implementation using TCP should be careful to ensure that the client acknowledges receipt of the packet(s) containing the response, before the server closes the input connection. If the client continues sending data to the server after the close, the server's TCP stack will send a reset packet to the client, which may erase the client's unacknowledged input buffers before they can be read and interpreted by the HTTP application.
400 Bad Request The request could not be understood by the server due to malformed syntax. The client should not repeat the request without modifications.
401 Unauthorized
  • The request requires user authentication.
  • The client MAY repeat the request with a suitable Authorization header field.
  • If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials.
  • Details : HTTP Authentication: Basic and Digest Access Authentication
403 Forbidden The server understood the request, but is refusing to fulfill it. Authorization will not help and the request should not be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it should describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
404 Not Found The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code should be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
405 Method Not Allowed The method specified in the Request-Line is not allowed for the resource identified by the Request-URI. The response must include an Allow header containing a list of valid methods for the requested resource.
407 Proxy Authentication Required This code is similar to 401 (Unauthorized), but indicates that the client must first authenticate itself with the proxy. The proxy MUST return a Proxy-Authenticate header field containing a challenge applicable to the proxy for the requested resource. The client MAY repeat the request with a suitable Proxy-Authorization header field.
410 Gone The resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed and the resource should be purged. Clients such as search engines should remove the resource from their indices.
411 Length Required The server refuses to accept the request without a defined Content-Length. The client MAY repeat the request if it adds a valid Content-Length header field containing the length of the message-body in the request message.
412 Precondition Failed The precondition given in one or more of the request-header fields evaluated to false when it was tested on the server (i.e. the server does not meet one of the preconditions that the requester put on the request).
414 Request-URI Too Long The server is refusing to service the request because the Request-URI is longer than the server is willing to interpret. This rare condition is only likely to occur when a client has improperly converted a POST request to a GET request with long query information, when the client has descended into a URI "black hole" of redirection (e.g., a redirected URI prefix that points to a suffix of itself), or when the server is under attack by a client attempting to exploit security holes present in some servers using fixed-length buffers for reading or manipulating the Request-URI.
5xx Server Error The server failed to fulfill an apparently valid request.
5xx codes indicate cases in which the server is aware that it has encountered an error or is otherwise incapable of performing the request. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and indicate whether it is a temporary or permanent condition. Likewise, user agents should display any included entity to the user. These response codes are applicable to any request method.
501 Not Implemented The server does not support the functionality required to fulfill the request. This is the appropriate response when the server does not recognize the request method and is not capable of supporting it for any resource.
503 Service Unavailable The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state.
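The difference between the redirect codes in the table above (301, 302, 303, 307) mostly comes down to which method the client uses for the follow-up request. The sketch below models common browser behaviour, where a POST answered with 301 or 302 is retried as a GET; this is an assumption about typical clients rather than a statement of what the standard originally intended, and redirect_method is a hypothetical name.

```python
def redirect_method(status: int, method: str) -> str:
    """Method a typical browser uses for the request that follows a redirect."""
    method = method.upper()
    if status == 303:
        # 303 explicitly asks for the new URI to be retrieved with GET.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Historical browser behaviour: POST is rewritten to GET.
        return "GET" if method == "POST" else method
    if status == 307:
        # 307 guarantees the method is preserved.
        return method
    raise ValueError(f"not a redirect status handled here: {status}")
```

This is why 307 is the safer choice for a temporary redirect of a form submission: the POST and its body are sent again to the new URI.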