Varnish - How to ... ?

mail

How to troubleshoot the Varnish Guru Meditation error ?

Situation

Full error message :
Error 503 Backend fetch failed

Backend fetch failed
Guru Meditation:

XID: 98441

Varnish cache server

Details

The HTTP 503 error means that the backend server (i.e. web server) trying to be reached is unavailable. This could be because it is :

Solution

step 1 — identify your backend :

Sounds funny, but on complex web stacks, you'd better not lose time investigating the wrong server or application while the production is down. So make sure you know who's talking to who :
  1. grep -A4 backend /etc/varnish/default.vcl
    backend default {
    	.host = "127.0.0.1";
    	.port = "8080";
    	}
    • method 1 :
      ss -punta | grep 127.0.0.1:8080
      tcp    LISTEN     0      0      127.0.0.1:8080                  *:*                   users:(("lighttpd",pid=4243,fd=4))
      tcp    TIME-WAIT  0      0      127.0.0.1:8080               127.0.0.1:40892
    • alternate method :
      lsof -i 4tcp@127.0.0.1:8080
      COMMAND   PID     USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
      lighttpd 4243 www-data    4u  IPv4 1855533916      0t0  TCP localhost.localdomain:http-alt (LISTEN)

step 2 — make sure the backend is not overloaded :

On my side, the backend was clearly not overloaded, so nothing much to investigate. Here are a few starting points :
  • check CPU + RAM + swap usage with htop
  • make sure no disk is full or has 100% of inodes used

step 3 — make sure the backend is not down for maintenance :

  • Check with colleagues / internal messages / ...
  • If it's a one-person project (you), you should already be aware of services stopped for maintenance (otherwise, consider cutting on caffeine ). Anyway, it could also be stopped for another reason, so just check it :
    1. is there a startup error message ?
      systemctl restart lighttpd
    2. anything in systemd logs ?
      journalctl -ru lighttpd
    3. other logs :
      • /var/log/lighttpd/error.log
      • PHP error log, if any (find its name with : grep -E '^error_log' /etc/php/7.0/cgi/php.ini)

step 4 — investigate the Guru Meditation itself :

  1. Have a look at Varnish logs :
    varnishlog -q 'RespStatus == 503' -g request
  2. Refresh the page causing the error (F5), you should get some data. In my case, it was :
    --  BogoHeader     Too many headers: Set-Cookie: myWebSite_
    --  HttpGarbage    "HTTP/1.1%00"
    --  BerespStatus   503
    --  BerespReason   Service Unavailable
    --  FetchError     http format error
  3. The website myWebSite sends too many cookies, named after myWebSite_.
    We have a culprit, the debug of the Guru Meditation error is now over .
mail

How to purge Varnish cache ?

Method 1 — restart the daemon :

systemctl restart varnish
This works fine but has nuclear weapon-like sense of details .

Method 2 — ban content with varnishadm :

This allows to precisely target what to remove from the cache.
So far, I've found no way of listing cache contents (except by searching HIT and MISS in the logs).

in non-interactive mode :

varnishadm ban req.http.host == example.com '&&' req.url '~' '\\.png$'

in interactive mode :

(todo ;-)
mail

How to defeat hotlinking with Varnish ?

Edit /etc/varnish/default.vcl :
  1. sub detectHotlinking {
    	if(req.http.host == "my.website.tld" && req.http.referer ~ "(BADDOMAIN\.com|IMAGESUCKERDOMAIN\.com)" && req.url ~ "^/path/to/whatever.jpg$") {
    		return (synth (444, ""));
    		}
    	}
    • the req.http.host == "" test is there because, in my setup, Varnish serves several distinct virtualhosts but I want these hotlinking rules to apply to THIS virtualhost only.
    • the 444 code has no special meaning. You can choose any number you like, as long as what is output by detectHotlinking matches what vcl_synth expects as input (see below).
  2. sub vcl_recv {
    	
    	call detectHotlinking;
    	
    	}
  3. sub vcl_synth {
    	# "Hotlinking is BAD¹⁰⁰⁰⁰"
    	if (resp.status == 444) {
    		set resp.status = 302;
    		set resp.http.location = "/pictures/hotlinking.png";
    		}
    	}
    • this is where the 444 discussed above is matched. This function generally ends as a giant switch / case block handling distinct responses to different behaviors detected by one or more subs.
    • in the specific case of fighting hotlinking, the HTTP 302 code is interesting because the response is not cached. This means ALL requests to http://my.website.tld/path/to/whatever.jpg will be re-evaluated individually and served accordingly. Indeed, the web cache matches a request with a response without considering the referrer header, possibly leading to over-blocking.
mail

How to declare removed resources ?

Situation

Your Varnish-served website offers resources (such as PDF files) to visitors. Should you decide to stop serving those files, you can simply remove them, which will result in a 404 error when requesting any removed file. In such situation, you could :

Solution

It all takes place in /etc/varnish/default.vcl :
  1. add a function to declare removed resource and define what to do when receiving a request for such resource :
    sub declareRemovedResources {
    	if(req.http.host == "my.website.tld" && req.url ~ "path/to/.*[dD]ocument.*\.pdf") {
    	    return (synth (410, ""));		this could be any arbitrary code. 410 chosen for consistency
    	    }
    	}
  2. run this function upon receiving requests :
    sub vcl_recv {
    	
    	call declareRemovedResources;
    	
    	}
  3. apply the defined behavior :
    sub vcl_synth {
    	
    	if (resp.status == 410) {		this matches the arbitrary code described above
    	    set resp.status = 410;		this is the actual HTTP response code that will be returned
    	    }
    	
    
    	return(deliver);
    	}
  4. check the config file
  5. restart Varnish :
    systemctl restart varnish.service
  6. check + restart :
    configFile='/etc/varnish/default.vcl'; varnishd -C -f "$configFile" 2>/dev/null && systemctl restart varnish.service || echo "error in '$configFile'"
mail

How to split varnishncsa logs per virtualhost ?

Situation

Solution

commands that may help in the process :
  • list unit files :
    ls -l /etc/systemd/system/varnishncsa.*.service
  • list current varnishncsa processes :
    ps aux | grep [n]csa
  • "stop" all varnishncsa processes (this is dirty!) :
    killall -15 varnishncsa
Now the procedure :
  1. Let's start by having a look at the current varnishncsa service unit file :
    cat /etc/systemd/system/varnishncsa.service
    [Unit]
    Description=Varnish Cache HTTP accelerator NCSA logging daemon
    After=varnish.service
    
    [Service]
    RuntimeDirectory=varnishncsa
    Type=forking
    PIDFile=/run/varnishncsa/varnishncsa.pid
    User=varnishlog
    Group=varnish
    ExecStart=/usr/bin/varnishncsa -a -w /var/log/varnish/varnishncsa.log -D -P /run/varnishncsa/varnishncsa.pid
    ExecReload=/bin/kill -HUP $MAINPID
    
    [Install]
    WantedBy=multi-user.target
  2. stop the "old" configuration :
    systemctl stop varnishncsa.service
  3. /etc/systemd/system/varnishncsa.service must not stay in place, otherwise it'll interfere with the system unit files we're about to create. Move it somewhere else or (my advice) : create a local Git repository, commit it as the "before" situation then use it to build the first specific service unit file.
  4. Create the 3 specific service unit files (they must be named after whatever.service, details on custom service unit files) :
    • /etc/systemd/system/varnishncsa.A.service
      [Unit]
      Description=Varnish Cache HTTP accelerator NCSA logging daemon (A)
      After=varnish.service
      
      [Service]
      RuntimeDirectory=varnishncsa
      Type=forking
      PIDFile=/run/varnishncsa/varnishncsa_A.example.com.pid		looks redundant with -P below but seems necessary 
      User=varnishlog
      Group=varnish
      ExecStart=/usr/bin/varnishncsa -q "ReqHeader:Host eq 'A.example.com'" -D -a -w /var/log/varnish/A.example.com.log -P /run/varnishncsa/varnishncsa_A.example.com.pid
      ExecReload=/bin/kill -HUP $MAINPID
      
      [Install]
      WantedBy=multi-user.target
    • /etc/systemd/system/varnishncsa.B.service
      [Unit]
      Description=Varnish Cache HTTP accelerator NCSA logging daemon (B)
      After=varnish.service
      
      [Service]
      RuntimeDirectory=varnishncsa
      Type=forking
      PIDFile=/run/varnishncsa/varnishncsa_B.example.com.pid
      User=varnishlog
      Group=varnish
      ExecStart=/usr/bin/varnishncsa -q "ReqHeader:Host eq 'B.example.com'" -D -a -w /var/log/varnish/B.example.com.log -P /run/varnishncsa/varnishncsa_B.example.com.pid
      ExecReload=/bin/kill -HUP $MAINPID
      
      [Install]
      WantedBy=multi-user.target
    • /etc/systemd/system/varnishncsa.other.service
      [Unit]
      Description=Varnish Cache HTTP accelerator NCSA logging daemon (other: not 'A' and not 'B')
      After=varnish.service
      
      [Service]
      RuntimeDirectory=varnishncsa
      Type=forking
      PIDFile=/run/varnishncsa/varnishncsa_other.example.com.pid
      User=varnishlog
      Group=varnish
      ExecStart=/usr/bin/varnishncsa -q "ReqHeader:Host !~ '(A|B).example.com'" -D -a -w /var/log/varnish/other.example.com.log -P /run/varnishncsa/varnishncsa_other.example.com.pid
      ExecReload=/bin/kill -HUP $MAINPID
      
      [Install]
      WantedBy=multi-user.target
  5. Start it all :
    chmod 664 /etc/systemd/system/varnishncsa.*.service; systemctl daemon-reload && systemctl start varnishncsa.A.service; systemctl start varnishncsa.B.service; systemctl start varnishncsa.other.service
  6. If everything works fine (and provided there is traffic coming to the websites), you should see some logfiles :
    ll /var/log/varnish/*.example.com.log
  7. Don't forget to logrotate your new log files.
https://unix.stackexchange.com/questions/463321/how-to-split-varnishncsa-logs-into-separate-under-systemd-ubuntu-16-04#answer-463446
So for each unit file, 1 per vhost, you'd add your exec line like so:
unit file #1

ExecStart=/usr/bin/varnishncsa -q "ReqHeader ~ '^Host: somedomain1.com'" -D -a -w /var/log/varnish/somedomain1.log -P /run/varnishncsa/varnishncsa_vhost1.pid -F '%%{X-Forwarded-For}i %%l %%u %%t "%%r" %%s %%b "%%{Referer}i" "%%{User-agent}i"'

unit file #2

ExecStart=/usr/bin/varnishncsa -q "ReqHeader ~ '^Host: somedomain2.com'" -D -a -w /var/log/varnish/somedomain2.log -P /run/varnishncsa/varnishncsa_vhost2.pid -F '%%{X-Forwarded-For}i %%l %%u %%t "%%r" %%s %%b "%%{Referer}i" "%%{User-agent}i"'
mail

How to test the configuration file ?

configFile='/etc/varnish/default.vcl'; varnishd -C -f "$configFile" 2>/dev/null && echo "$configFile : OK" || echo "$configFile : KO"

mail

How to filter requests based on HTTP headers ?

Important notes before we start :

  • This article focuses on Varnish 4.0. The syntax is somewhat different for versions 3.x, and snippets below won't work.
  • In the example below, I chose the referer as the filtering criteria. A real-life application would be to prevent hotlinking, but in such case, using a white list (rather than a black list) should be more efficient.

About the HTTP referer header itself :

  • It can be disabled using the Web Developer Firefox extension.
  • It contains the full URL of the referring page, so it must be tested using RegExp rather than the == operator.

Configuration :

The global idea is to analyze requests when they arrive, which is the purpose of the built-in vcl_recv function. From there, we'll try to match the referer HTTP header, and if matched, we'll call another function, vcl_synth, with an "id", describing which actions are to be taken (see diagram of VCL workflow).
All of this takes place in /etc/varnish/default.VCL :
To be added into vcl_recv :
sub vcl_recv {
	
	call detectHotLinking;
	
Then :
sub detectHotLinking {
	if(req.http.host == "my.site.tld" && req.http.referer ~ "evil\.hotlinker\.com") {
		return (synth (750, ""));
		}
	}

sub vcl_synth {
	if (resp.status == 750) {
		set resp.status = 301;
		set resp.http.Location = "http://images.google.com";
		}
	return(deliver);
	}

Regexp matching in VCL files follow the ERE syntax.

Test :

Regular request :
wget -q -S http://my.site.tld
Should return HTTP 200
A request as the evil hotlinker :
wget -q -S --header='Referer: http://evil.hotlinker.com' http://my.site.tld
Should return the configured HTTP redirection

More filters :

To filter on ... Use the VCL object :
referer req.http.referer (actually : all req.http.HTTP header objects are available)
URL req.url
mail

How to list backends ?

  1. telnet varnishHost 6082
  2. if required, please authenticate
  3. backend.list
    200 1427
    Backend name			Refs	Admin	Probe
    host9(10.0.16.19,,80)	12	probe	Healthy 5/5
    host10(10.0.16.22,,80)	10	probe	Healthy 5/5
    host11(10.0.16.23,,80)	6	probe	Healthy 5/5
    host12(10.0.16.24,,80)	10	probe	Healthy 5/5
    host21(10.0.16.117,,80)	7	probe	Healthy 5/5
    host134(10.0.16.134,,80)	6	probe	Healthy 5/5
    host167(10.0.16.167,,80)	2	probe	Sick 0/5
mail

How to add an HTTP Header showing HIT / MISS status ?

Update /etc/varnish/default.vcl
sub vcl_deliver {
	if (obj.hits > 0) {
		set resp.http.X-Cache = "HIT";
		}
	else {
		set resp.http.X-Cache = "MISS";
		}
	}
mail

How to authenticate to the Telnet CLI ?

  1. First, you should have a /etc/varnish/secret file containing a secret key.
  2. Then, upon opening a telnet session, you are presented with a challenge :
    Connected to localhost.localdomain.
    Escape character is '^]'.
    107 59
    uzaiqheubccbpimwyyevwqfedtxuqdwm
    
    Authentication required.
    
    
    The CLI 107 status code means that authentication is requested.
  3. To authenticate (assuming the secret key is foo), you'll have to compute the response with :
    challenge='uzaiqheubccbpimwyyevwqfedtxuqdwm'; secret=$(cat /etc/varnish/secret); echo -e "${challenge}\n${secret}\n${challenge}" | sha256sum
    2bcd7855d7d63870aecaa7f7c5eeeae3166581644f8216698558d78f19c3bdc2

    The documentation states that the string to be sha256sum'd is made of : challenge + newline + secret + challenge + newline, while the command above adds a newline after the secret, and no newline after the final challenge. This is because the secret is read by cat (which adds no newline), and the whole string is output by echo (that DOES add a trailing newline).

  4. Back to the telnet session, enter :
    auth 2bcd7855d7d63870aecaa7f7c5eeeae3166581644f8216698558d78f19c3bdc2
    Then you get :
    200 239
    -----------------------------
    Varnish Cache CLI 1.0
    -----------------------------
    Linux,2.6.32-042stab106.4,x86_64,-smalloc,-smalloc,-hcritbit
    varnish-4.0.3 revision b8c4a34
    
    Type 'help' for command list.
    Type 'quit' to close CLI session.
    
    
    The CLI 200 status code means OK.