Web - HTTP, HTML, web servers, tools, ...


/etc/hosts is ignored by my web browser

Situation

Details

Here are solutions suggested for Firefox that did not fix my problem :

Solution

The problem and its solution are fairly basic : /etc/hosts must be world-readable :
chmod 644 /etc/hosts
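
To check the permission before (or after) changing it, something like the following could be used. This is only a minimal sketch : is_world_readable is a made-up helper name, not part of any standard tool.

```shell
# Succeeds if the "others" read bit is set on the given file.
# is_world_readable is a hypothetical helper name used for this example only.
is_world_readable() {
	[ -n "$(find "$1" -prune -perm -004)" ]
}

if is_world_readable /etc/hosts; then
	echo "/etc/hosts is world-readable"
else
	echo "/etc/hosts is NOT world-readable : run chmod 644 /etc/hosts"
fi
```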

Javascript : How to conditionally include a file ?

if (someCondition) {	// someCondition : placeholder for your own test
	var myJavascriptSnippet = document.createElement("script");
	myJavascriptSnippet.setAttribute("type", "text/javascript");
	myJavascriptSnippet.setAttribute("src", "someFile.js");
	// appending the <script> element makes the browser fetch and run the file
	document.body.appendChild(myJavascriptSnippet);
	}
else {
	// deal with it ...
	}

What's the difference between Basic and Digest authentication ?

How does it work ?
Basic :
  1. client makes a request for information, sending username and password to server. They are only base64-encoded, which is equivalent to plain text. The Authorization HTTP header contains :
    Basic base64(username:password)
  2. server responds with the desired information or an HTTP 401 error
Digest :
  1. client sends a request to server
  2. server responds with an HTTP 401 carrying a one-time value (called a nonce) and a string naming the protected area (the realm), and asks client to authenticate
  3. client responds with this nonce and a hash (MD5 by default) computed from the username, password, realm, nonce, HTTP method and requested URI
  4. server responds with the requested information if client hash matches the hash it computes on its side, or an HTTP 401 error if not
Pros
Basic :
  • simple to implement
  • unlike Digest, lets the server store passwords with whatever hashing method you like, such as bcrypt, making the stored passwords more secure
  • only 1 (request + reply) to get the information, making this slightly faster than more complex authentication methods might be
Digest :
  • The password is never sent in plaintext, making a non-SSL connection more secure than an HTTP Basic request that isn't sent over SSL. This means SSL isn't strictly required, which makes each call slightly faster
Cons
Basic :
  • offers close to no security if not backed up by SSL / TLS, which adds some complexity to the setup and ultimately makes authentication slower
Digest :
  • 2 (request + reply) are necessary to get the information, making the process slightly slower than HTTP Basic
  • vulnerable to man-in-the-middle attacks
  • prevents use of strong password hashing methods on the server (it must store the plaintext password or an MD5-based hash of it), meaning the passwords stored on server could be hacked
Which one is safer ?
  • without SSL / TLS : Digest (slightly)
  • with SSL / TLS : Basic
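
To make the difference concrete, here is a rough sketch of what the client computes in each case. It assumes the MD5 flavour of Digest with qop=auth as described in RFC 2617, and GNU coreutils (base64, md5sum). The function names are made up for this example, and the sample credentials are the ones used in the RFC examples.

```shell
# Basic : the credentials are merely base64-encoded (trivially reversible).
basic_auth_header() {	# args : username password
	printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

md5_hex() {	# MD5 of the string given as argument, as lowercase hex
	printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Digest (RFC 2617, MD5, qop=auth) : only a hash travels on the wire.
digest_response() {	# args : username realm password method uri nonce nc cnonce qop
	ha1=$(md5_hex "$1:$2:$3")	# secret part : user, realm, password
	ha2=$(md5_hex "$4:$5")		# request part : HTTP method and URI
	md5_hex "$ha1:$6:$7:$8:$9:$ha2"
}

basic_auth_header 'Aladdin' 'open sesame'
digest_response 'Mufasa' 'testrealm@host.com' 'Circle Of Life' GET /dir/index.html \
	dcd98b7102dd2f0e8b11d0f600bfb0c093 00000001 0a4f113b auth
```

Anyone capturing the Basic header can decode it with base64 -d, whereas the Digest response cannot be reversed into the password directly. Note that the server needs the password (or the first MD5 hash) to recompute the response, which is why bcrypt-style storage is not an option with Digest.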

Log analysis with GoAccess

Install

apt install goaccess

Configure

config file :
  • /etc/goaccess.conf
  • the "Get started" doc mentions ~/.goaccessrc but this one looks completely ignored (or I missed something)
https://goaccess.io/


Generate HTML report :
goaccess -a -o foo > /path/to/web/directory/report.html

Should I cache linked images or embed base64 images ?

Situation

What's "better" between :

Details

Let's consider the following elements :
type | name / reference | size [KiB]
HTML page with linked CSS file | A | A
HTML page with linked CSS file | B | B
CSS file | C | C
image | i | i
base64-encoded image | b | b = i*133% = 4i/3
And the following scenarios :
# | description | caching | requests order | nb. of requests | transferred size | computed transferred size
1 | page A (with linked image) then page B (with linked image) | none | A C i B C i | 6 | A+C+i+B+C+i | 2 (C + i) = 2α
2 | same as #1 with browser caching | browser | A C i B | 4 | A+C+i+B | C + i = α
3 | same as #1 with server-side caching | server-side | A C i B (C) (i) | 6 | A+C+i+B | C + i = α
4 | page A (with base64-embedded image) then page B (with base64-embedded image) | none | [A+b] C [B+b] C | 4 | A+b+C+B+b+C | 2 (C + b) = 2 (C + 4i/3) = 2 (C + i + i/3) = 2 (C + i) + 2i/3 = 2α + 2i/3
5 | same as #4 with browser caching | browser | [A+b] C [B+b] | 3 | A+b+C+B+b | C + 2b = C + 8i/3 = C + i + 5i/3 = α + 5i/3
6 | same as #4 with server-side caching | server-side | [A+b] C [B+b] (C) | 4 | A+b+C+B+b | C + 2b = C + 8i/3 = C + i + 5i/3 = α + 5i/3

Notes :

browser caching :
  • no request is sent for items still valid in cache. We consider everything in the cache is still valid, otherwise we fall back to a "no cache" scenario
  • there is no such thing as "browser only" / "server only" caching. Read more in the HTTP ETag article
server-side caching :
  • this is usually performed by proxies such as Squid, Varnish, or CDN such as Akamai
  • an If-Modified-Since HTTP header is added to each request. We consider the cache server always returns an HTTP 304 status code, meaning the cached object is still valid (otherwise we'd fall back into a "no cache" scenario)
computed transferred size :
  • A and B are always transferred, so they are left out of the comparison
  • let's consider : α = C + i
number of requests :
  • fewer requests keep the latency overhead to a minimum
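
The "computed transferred size" formulas of the scenarios can be checked with a few lines of shell arithmetic. This is only a sketch : the function names and the sample sizes (C = 30 KiB, i = 12 KiB) are made up, and shell arithmetic is integer-only, so i is chosen as a multiple of 3.

```shell
# Arguments : C (CSS size) and i (image size), in KiB. Pages A and B are
# left out, as in the comparison above. b = 4i/3 is the base64-encoded image.
linked_no_cache()   { echo $(( 2 * ($1 + $2) )); }		# scenario 1 : 2α
linked_cached()     { echo $(( $1 + $2 )); }			# scenarios 2, 3 : α
embedded_no_cache() { echo $(( 2 * ($1 + 4 * $2 / 3) )); }	# scenario 4 : 2 (C + b)
embedded_cached()   { echo $(( $1 + 2 * 4 * $2 / 3 )); }	# scenarios 5, 6 : C + 2b

C=30; i=12	# made-up sample sizes
echo "linked, no cache   : $(linked_no_cache $C $i) KiB"
echo "linked, cached     : $(linked_cached $C $i) KiB"
echo "embedded, no cache : $(embedded_no_cache $C $i) KiB"
echo "embedded, cached   : $(embedded_cached $C $i) KiB"
```

With these sample values, α = 42 KiB : the linked image costs 84 KiB without cache and 42 KiB with it, the embedded image 92 KiB and 62 KiB.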

Solution

There is no "perfect" solution to this question : it depends.

The typical answer is : small files should be inlined, large files should be served separately.

link + cache
Pros :
  • lighter content (no base64 overhead) : saves bandwidth (GB are $$$ on the server side) + faster on the client side
  • download + cache an image once, reuse it (from the cache) as often as you like : saves even more bandwidth
Cons :
  • more GETs
  • only worth it for content that is repeated throughout the site
base64
Pros :
  • fewer GETs (interesting on connections with high latency : mobile, satellite)
  • for tiny resources, downloading the base64 overhead is quicker than an extra GET
Cons :
  • makes pages heavier (base64 images are 33% bigger)
  • higher CPU load (server side : trying to compress pages, client side : to render images)
  • the resource is not cached, so it'll be downloaded with every page it appears on (client : more stuff to download, server : global bandwidth usage). This is ill-suited to images shown on every page, such as a company logo
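
The 33% overhead is easy to measure on a real file. A minimal sketch, assuming the base64 tool from coreutils; raw_size and base64_size are made-up helper names, and the sample file is 3000 zero bytes generated on the spot.

```shell
raw_size() {	# size of the file in bytes
	wc -c < "$1" | tr -d ' '
}

base64_size() {	# size of the base64-encoded file, line breaks excluded
	base64 < "$1" | tr -d '\n' | wc -c | tr -d ' '
}

sample=$(mktemp)
head -c 3000 /dev/zero > "$sample"	# stand-in for an image file
echo "raw    : $(raw_size "$sample") bytes"
echo "base64 : $(base64_size "$sample") bytes"
rm -f "$sample"
```

Every group of 3 input bytes becomes 4 output characters, hence 3000 bytes turn into 4000.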

How to disable the server signature ?

The server signature is the short text displaying the web server name, its version, as well as details on the operating system itself on servers using the default HTTP 404 page :

It can also be a dedicated HTTP header : Server: lighttpd/1.4.25, or :

Such signatures disclose valuable information to attackers, which is why it is wise to hide them.

On Apache (source) :

  1. Add to /etc/apache2/apache2.conf :
    ServerSignature Off
    ServerTokens Prod
  2. Then restart Apache

On Lighttpd :

  1. Add to /etc/lighttpd/lighttpd.conf :
    server.tag="any string you like"
  2. Then restart Lighttpd

On Varnish (source) :

  1. Add to /etc/varnish/default.vcl :
    sub vcl_deliver {
    	
    	unset resp.http.Via;
    	unset resp.http.X-Varnish;
    	unset resp.http.Age;
    	unset resp.http.X-Powered-By;
    	
    	}
    • resp.http.Age is not "dangerous" per se, but it reveals that a cache server sits in front of the site (source 1, 2)
    • unsetting resp.http.X-Powered-By actually removes the PHP signature
  2. Then restart Varnish
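
To verify the result, the response headers can be filtered for the usual suspects. This is a small sketch : leaky_headers is a made-up helper name, the list of headers is the one discussed above (not an exhaustive one), and www.example.com is a placeholder.

```shell
# Reads HTTP response headers on stdin and prints the ones that
# disclose software names / versions. Prints nothing if all is clean.
leaky_headers() {
	grep -i -E '^(Server|X-Powered-By|Via|X-Varnish):'
}

# Typical use against a live server (placeholder URL) :
#   curl -sI http://www.example.com/ | leaky_headers

# Offline demonstration with canned headers :
printf 'HTTP/1.1 200 OK\r\nServer: lighttpd/1.4.25\r\nContent-Type: text/html\r\n\r\n' | leaky_headers
```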

What is X-Content-Type-Options: nosniff ?

This HTTP response header prevents browsers from doing MIME-sniffing. It was first implemented by Internet Explorer and Chrome/Chromium (source); other major browsers have since added support.

You may then ask "What exactly is MIME-sniffing ?" :
Content-sniffing aka MIME-sniffing is inspecting the content of a byte stream in order to deduce the file format of the data (source).


Google's mod_pagespeed

mod_pagespeed is an open-source webserver module developed by Google to automatically apply web performance best practices to pages and their assets (CSS, JS, images) without requiring changes to the existing content or workflow. It is only available for Apache and Nginx so far.

Setup (source) :

  1. enter work directory : cd /tmp
  2. get the package matching your platform/architecture : wget https://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-stable_current_amd64.deb
  3. install : dpkg -i mod-pagespeed-stable_current_amd64.deb

mod_pagespeed can do MANY things to improve a website's performance. In the steps below, I'll consider only its "optimize for bandwidth" mode.

By default, mod_pagespeed is enabled for ALL virtualhosts (source) :

  • If a virtualhost has no mod_pagespeed-specific configuration : it uses the defaults
  • If a virtualhost has mod_pagespeed-specific configuration :
    • InheritVHostConfig = on : the virtualhost inherits global configuration and can override it
    • InheritVHostConfig = off : mod_pagespeed is disabled for this virtualhost

By default, mod_pagespeed serves all HTML with Cache-Control: no-cache, max-age=0 because the transformations made to the page may not be cacheable for extended periods of time. To force mod_pagespeed to leave the HTML caching headers unaltered, add to the conf :
ModPagespeedModifyCachingHeaders off
(source)

Enable mod_pagespeed on a specific virtualhost of an Apache webserver :

  1. Disable mod_pagespeed globally; it will be re-enabled later on a per-virtualhost basis : in /etc/apache2/mods-enabled/pagespeed.conf, change ModPagespeed on into ModPagespeed off
  2. Add to the virtualhost configuration :
    <IfModule pagespeed_module>
    	ModPagespeed on
    	ModPagespeedRewriteLevel OptimizeForBandwidth
    
    	AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
    	AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/css
    
    	# This directory must exist and be writable by the apache user (as specified by the User directive).
    	ModPagespeedFileCachePath "/path/to/mod_pagespeed/cache/"
    </IfModule>
  3. Restart Apache. Enjoy.

Comments & results of experiments :

  • Looks like after enabling mod_pagespeed, the first page generation feeds its cache (the page is served un-optimized), then, starting from the 2nd page request, it's served with optimizations (source 1, 2). This seems to be due to the IPRO feature.
  • Once a resource (e.g. an image) has been optimized, it's ALWAYS served from the cache. If it changes, the cache needs to be flushed, then re-generated (image served unoptimized), then the new image will be served optimized.
  • Using / commenting the AddOutputFilterByType makes no real difference so far (?)
  • On some resources (CSS, JPG, ... ?), mod_pagespeed adds an extra HTTP header : X-Content-Type-Options

About mod_pagespeed's internal cache :

On the server side, resources are stored in a filesystem-based cache (memcached may be used as a scalable network-accessible cache in addition to the file cache). This cache has its own LRU / threshold cleaning methods. Items live in this cache according to their Cache-Control: max-age header (source 1, 2).

Configuration options

Option | Usage
ModPagespeedFileCachePath | Defaults to /var/cache/mod_pagespeed/. No caching can be done until this directory is writable by the Apache user (generally www-data)
ModPagespeedGeneratedFilePrefix | This directive appears in many examples, but it is reported as deprecated at Apache restart (source)

Flushing the cache (source) :

  • Can be done by touching a file into the directory defined by ModPagespeedFileCachePath :
    touch /var/cache/mod_pagespeed/cache.flush
  • Invalidates cache content, but does not delete files
  • Restarting Apache doesn't flush the cache

About mod_pagespeed's optimize for bandwidth mode :

In this mode, mod_pagespeed :
  • minifies JS and CSS
  • recompresses images :
    • JPG : into lower quality JPG
    • PNG : converted into JPG
    • BMP : no effect
  • does not alter HTML at all
While still in "optimize for bandwidth" mode, it is possible to enable additional filters. Some of them MAY alter HTML.

Downstream caching :

Read Configuring Downstream Caches.

How to put a website under maintenance ?

Serving a maintenance page with an HTTP 503 sounds like a smart idea, but reverse proxies (such as Akamai) can be configured not to cache such pages and to serve stale content instead, so visitors may never see the maintenance page. In that case, an HTTP 302 is the right option.

With Apache

In the Virtualhost configuration file :

  1. Create a maintenance.html web page to redirect visitors to, and store it in the documentRoot of the website.
  2. Add to the Virtualhost configuration :
    RewriteEngine on
    RewriteCond %{REQUEST_URI} !/maintenance.html
    RewriteCond %{REMOTE_ADDR} !192\.168\.105\.81
    RewriteRule ^.*$ /maintenance.html [R=302,L]
    This only works with HTTP redirection codes (3xx).
    The IP address 192.168.105.81, given as example, mimics my public IP so that everybody except me sees the maintenance page while I'm testing the website.
    Or (source) :
    RewriteEngine On
    RewriteCond %{ENV:REDIRECT_STATUS} !=503
    RewriteRule ^(.*)$ /$1 [R=503,L]
    ErrorDocument 503 /maintenance.html
    The path to the maintenance page is relative to the documentRoot.
  3. Reload Apache, you're done !

With a .htaccess (source) :

  1. Create a maintenance.html web page to redirect visitors to, and store it in the documentRoot of the website.
  2. Create a .htaccess file in the documentRoot of the website and edit it :
    RewriteEngine on
    RewriteCond %{REQUEST_URI} !/maintenance.html$
    RewriteCond %{REMOTE_ADDR} !192\.168\.105\.81
    RewriteRule $ http://www.example.com/maintenance.html [R=302,L]
    If the rewrite destination were just /maintenance.html :
    • http://www.example.com would be redirected to http://www.example.com/maintenance.html
    • and http://www.example.com/other to http://www.example.com/other/maintenance.html (==> 404)

    This only works with HTTP redirection codes (3xx).

    Or :
    RewriteEngine On
    RewriteCond %{ENV:REDIRECT_STATUS} !=503
    RewriteCond %{REMOTE_ADDR} !192\.168\.105\.81
    RewriteRule ^(.*)$ /$1 [R=503,L]
    ErrorDocument 503 /maintenance.html
    The path to the maintenance page is relative to the documentRoot.
  3. No reload is needed : Apache reads .htaccess files on every request. You're done !

Syntax of Netscape cookie files

A cookie file looks like :
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

admin.web.example.com	FALSE	/	FALSE	0		PHPSESSID		97nvft0hm505i0r9jk3c9sds37
.web.example.com	TRUE	/	FALSE	1385999491	example_com_auth_token	529c9e71d22c25.49344365
.web.example.com	TRUE	/	FALSE	1385999491	is_logged_in		true
Fields are :
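
Each cookie line holds seven tab-separated fields, in this order : domain, "include subdomains" flag (TRUE / FALSE), path, "secure only" flag (TRUE / FALSE), expiry date as a Unix timestamp (0 for a session cookie), cookie name, cookie value. The following sketch labels them with awk. parse_cookies is a made-up helper name, and the sample line is taken from the file above, with single tabs between fields.

```shell
# Reads a Netscape cookie file on stdin and prints one labeled line per cookie.
# Comment lines and blank lines are skipped. Beware : curl stores HttpOnly
# cookies on lines prefixed with "#HttpOnly_", which this sketch skips too.
parse_cookies() {
	awk -F'\t' '
		/^#/ || NF < 7 { next }
		{ printf "domain=%s subdomains=%s path=%s secure=%s expires=%s name=%s value=%s\n",
			$1, $2, $3, $4, $5, $6, $7 }
	'
}

printf '# Netscape HTTP Cookie File\n\n.web.example.com\tTRUE\t/\tFALSE\t1385999491\tis_logged_in\ttrue\n' | parse_cookies
```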

load / reload / force-reload / refresh in Firefox vs Cache-Control

Action : load
Triggers :
  • hit Enter in the address bar
  • click on hyperlink
Effect : no request happens until the cached resource expires

Action : reload, refresh
Triggers :
  • F5
  • CTRL + R
  • toolbar's Refresh button
  • Rclick | Reload
Effect : the request contains the If-Modified-Since and Cache-Control: max-age=0 headers that allow the server to respond with 304 Not Modified if applicable

Action : hard reload, super refresh
Triggers :
  • CTRL + F5
  • CTRL + SHIFT + R
Effect : the request contains the Pragma: no-cache and Cache-Control: no-cache headers and will bypass the cache

Example

load a page :

wget -U "IE Hate" -S -O /dev/null http://www.akamai.com
HTTP/1.1 200 OK

reload / refresh a page :

IMS=$(date --date "now -1 hours" +"%a, %d %b %Y %H:%M:%S GMT"); wget -U "IE Hate" -S -O /dev/null --header "Cache-Control: max-age=0" --header "If-Modified-Since: $IMS" http://www.akamai.com
HTTP/1.1 304 Not Modified
IMS=$(date --date "now -100 days" +"%a, %d %b %Y %H:%M:%S GMT"); wget -U "IE Hate" -S -O /dev/null --header "Cache-Control: max-age=0" --header "If-Modified-Since: $IMS" http://www.akamai.com
HTTP/1.1 200 OK

hard-reload / super-refresh a page :

wget -U "IE Hate" -S -O /dev/null --header "Cache-Control: no-cache" --header "Pragma: no-cache" http://www.akamai.com
HTTP/1.1 200 OK

Detect whether a website is served by Akamai

The easy method :

Just ask WolframAlpha

The method "from bare metal" :

  1. nslookup
  2. set type=cname
  3. www.bmw.com
  4. www.bmw.com canonical name = cn-www.bmw.com.edgesuite.net.
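
The check on the CNAME can be scripted. A minimal sketch : is_akamai_cname is a made-up helper name, and the list of domain suffixes is an assumption (edgesuite.net comes from the example above, the others are further Akamai-owned domains I know of; the list is not exhaustive).

```shell
# Succeeds if the given canonical name ends with a known Akamai domain.
is_akamai_cname() {
	case ${1%.} in	# ${1%.} drops the trailing dot of a fully qualified name
		*.edgesuite.net|*.edgekey.net|*.akamaiedge.net|*.akamai.net) return 0 ;;
		*) return 1 ;;
	esac
}

if is_akamai_cname 'cn-www.bmw.com.edgesuite.net.'; then
	echo "served by Akamai"
fi
```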

Find the Akamai origin server :

curl -i -s -o /dev/null -D /dev/stdout -H "Pragma: akamai-x-get-cache-key, akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-get-true-cache-key, akamai-x-get-extracted-values, akamai-x-check-cacheable, akamai-x-get-request-id, akamai-x-serial-no, akamai-x-get-ssl-client-session-id" http://www.bmw.com | grep "Cache-Key"
X-Cache-Key: /L/1550/14006/15m/www.bmw.com.origin.bmw.com/ cid=__
X-True-Cache-Key: /L/www.bmw.com.origin.bmw.com/ cid=__
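
In the output above, the origin hostname is the 5th path component of X-Cache-Key. Assuming this layout holds (it is inferred from this single example, not from any Akamai documentation), it can be extracted as follows. origin_from_cache_key is a made-up helper name.

```shell
# Reads response headers on stdin, prints the origin hostname found in X-Cache-Key.
# Assumed layout : /<type>/<number>/<number>/<TTL>/<origin hostname>/<path>
origin_from_cache_key() {
	sed -n 's/^X-Cache-Key: //p' | cut -d/ -f6	# field 1 is the empty string before the first "/"
}

printf 'X-Cache-Key: /L/1550/14006/15m/www.bmw.com.origin.bmw.com/ cid=__\n' | origin_from_cache_key
```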

Debug/decode Akamai cache headers

X-Cache: TCP_MEM_HIT from a69-31-112-141 (AkamaiGHost/6.2.2-6802992) (-)
X-Cache-Key: /=/1924/123456/4h/httpOriginHostName/path/to/resource
X-Check-Cacheable: YES

Flags

Flag | Description
X-Cache | Result of the cache request made to the Edge server. See list of HTTP caching statuses. Akamai may have some specific statuses.
X-Cache-Remote | Result of the cache request made by the Edge server to a parent Edge server, when using Tiered distribution. (source)
X-Check-Cacheable | YES / NO : whether Akamai is / is not configured to cache the requested object

Encoding special characters in URL

Urlencoding a character means replacing it with :
  1. the % prefix
  2. followed by the two-digit hexadecimal value of its ASCII code (examples), e.g. a space (0x20) becomes %20. See the ASCII table :

urlencode text with Bash (source) :

echo -ne 'my text' | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g'

urldecode text with Bash (source) :

urlEncodedString='http%3A%2F%2Fstackoverflow.com%2Fsearch%3Fq%3Durldecode%2Bbash'; echo -e ${urlEncodedString//%/\\x}
http://stackoverflow.com/search?q=urldecode+bash
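
The one-liner above encodes every single byte, letters included. Here is a sketch of a variant that leaves the "unreserved" characters of RFC 3986 (letters, digits, - . _ ~) untouched. It assumes ASCII input, uses od instead of xxd, and urlencode is just a function name chosen for this example.

```shell
urlencode() {
	# od prints each input byte as two lowercase hex digits
	for hex in $(printf '%s' "$1" | od -An -v -tx1); do
		case $hex in
			# - . _ ~ 0-9 A-Z a-z : copied as-is (hex converted back to the character)
			2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
				printf "\\$(printf '%o' "0x$hex")" ;;
			# anything else : % followed by the uppercase hex value
			*)
				printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
		esac
	done
	echo
}

urlencode 'my text'
urlencode 'http://a.b/?q=1'
```

The two calls print my%20text and http%3A%2F%2Fa.b%2F%3Fq%3D1.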

How to fight SQL injections ?

Here are some tips to minimize the risk of SQL injections :