Varnish - Httqm's Docs

How to troubleshoot the Varnish Guru Meditation error ?

Situation

Full error message :

Error 503 Backend fetch failed

Backend fetch failed
Guru Meditation:

XID: 98441

Varnish cache server

Details

The HTTP 503 error means that the backend server (i.e. the web server Varnish is trying to reach) is unavailable. This could be because it is :

overloaded
down for maintenance
not fully functional for another reason

Solution

step 1 — identify your backend :

Sounds funny, but on complex web stacks, you'd better not waste time investigating the wrong server or application while production is down. So make sure you know who's talking to whom :

grep -A4 backend /etc/varnish/default.vcl

backend default {
	.host = "127.0.0.1";
	.port = "8080";
	}

method 1 :

ss -punta | grep 127.0.0.1:8080

tcp    LISTEN     0      0      127.0.0.1:8080                  *:*                   users:(("lighttpd",pid=4243,fd=4))
tcp    TIME-WAIT  0      0      127.0.0.1:8080               127.0.0.1:40892

alternate method :

lsof -i 4tcp@127.0.0.1:8080

COMMAND   PID     USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
lighttpd 4243 www-data    4u  IPv4 1855533916      0t0  TCP localhost.localdomain:http-alt (LISTEN)
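If default.vcl declares several backends, it can help to list them all before going through the commands above. The sketch below assumes each backend attribute is on its own line, as in the excerpt above. vclBackends is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name host:port" for each backend declared in a
# VCL file. Assumes one attribute per line, as in the default.vcl excerpt above.
vclBackends() {
	local vclFile=${1:-/etc/varnish/default.vcl}
	awk '
		# "backend default {" starts a new backend declaration
		$1 == "backend" { name = $2; host = ""; port = "" }
		# keep the value found between double quotes
		$1 == ".host" { split($0, parts, "\""); host = parts[2] }
		$1 == ".port" { split($0, parts, "\""); port = parts[2] }
		# print once both values are known, then forget this backend
		name != "" && host != "" && port != "" {
			print name, host ":" port
			name = ""
		}
	' "$vclFile"
}
```

With the configuration above, `vclBackends /etc/varnish/default.vcl` should print `default 127.0.0.1:8080`. This sketch does not handle declarations written on a single line, such as `backend x { .host = "..."; .port = "..."; }`.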

step 2 — make sure the backend is not overloaded :

On my side, the backend was clearly not overloaded, so there was not much to investigate. Here are a few starting points anyway :

check CPU + RAM + swap usage with htop
make sure no disk is full or has 100% of inodes used
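For the disk checks, df reports block usage with `df -P` and inode usage with `df -Pi`. The small filter below assumes mount points contain no spaces. fullFilesystems and its default 90% threshold are hypothetical:

```shell
#!/usr/bin/env bash
# Print "<mount point> <usage>" for every filesystem whose usage is at or
# above a threshold (default 90%). Reads the POSIX output of `df -P`
# (block usage) or `df -Pi` (inode usage) on stdin.
# Assumption: mount points contain no spaces.
fullFilesystems() {
	local threshold=${1:-90}
	awk -v t="$threshold" '
		NR > 1 {
			pct = $5
			sub(/%$/, "", pct)
			# some filesystems report "-" for inodes: skip them
			if (pct != "-" && pct + 0 >= t + 0) print $6, $5
		}'
}
```

Example: `df -P | fullFilesystems 90; df -Pi | fullFilesystems 90`. No output means no filesystem is above the threshold.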

step 3 — make sure the backend is not down for maintenance :

Check with colleagues / internal messages / ...
If it's a one-person project (you), you should already be aware of services stopped for maintenance (otherwise, consider cutting down on caffeine). Anyway, the service could also be stopped for another reason, so just check it :
1. is there a startup error message ?
  systemctl restart lighttpd
2. anything in systemd logs ?
  journalctl -ru lighttpd
3. other logs :
  - /var/log/lighttpd/error.log
  - PHP error log, if any (find its name with : grep -E '^error_log' /etc/php/7.0/cgi/php.ini)
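The grep above shows the whole error_log line. To get only the path (for example, to tail it), you could use a sketch like the one below. phpErrorLog is a hypothetical helper. Its default php.ini path is the one used above, which depends on the PHP version and SAPI:

```shell
#!/usr/bin/env bash
# Print the value of the active (uncommented) error_log directive of a
# php.ini file. The default path matches the one used above.
phpErrorLog() {
	local phpIni=${1:-/etc/php/7.0/cgi/php.ini}
	awk '
		/^[ \t]*error_log[ \t]*=/ {
			value = $0
			sub(/^[^=]*=[ \t]*/, "", value)   # drop "error_log ="
			sub(/[ \t]+$/, "", value)         # drop trailing blanks
			gsub(/"/, "", value)              # drop optional double quotes
			found = 1
		}
		# if the directive appears several times, keep the last occurrence
		END { if (found) print value }' "$phpIni"
}
```

Example: `tail -n 50 "$(phpErrorLog)"`. An empty output means there is no active error_log directive. In that case PHP sends its errors to the SAPI's own error logger (for lighttpd, most likely the error log listed above).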

step 4 — investigate the Guru Meditation itself :

Have a look at Varnish logs :
varnishlog -q 'RespStatus == 503' -g request

Refresh the page causing the error (F5): you should get some data. In my case, it was :

--  BogoHeader     Too many headers: Set-Cookie: myWebSite_
--  HttpGarbage    "HTTP/1.1%00"
--  BerespStatus   503
--  BerespReason   Service Unavailable
--  FetchError     http format error

The website myWebSite sends too many cookies (all named myWebSite_something), hence too many headers.
We have our culprit: debugging the Guru Meditation error is now over.
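When many requests fail, the varnishlog output gets long. The sketch below counts the distinct FetchError / BogoHeader messages instead. fetchErrorSummary is a hypothetical helper name, and it assumes the field layout shown above (dashes, tag, message):

```shell
#!/usr/bin/env bash
# Count the distinct FetchError / BogoHeader messages found in varnishlog
# output read on stdin, most frequent first.
fetchErrorSummary() {
	awk '$2 == "FetchError" || $2 == "BogoHeader" {
		tag = $2
		line = $0
		# drop the leading "--  Tag   " part, keep the message itself
		sub(/^[^A-Za-z]*[A-Za-z]+[ \t]+/, "", line)
		print tag ": " line
	}' | sort | uniq -c | sort -rn
}
```

Example: `varnishlog -d -q 'RespStatus == 503' -g request | fetchErrorSummary`. With `-d`, varnishlog processes what is already in the shared memory log and exits instead of waiting for new requests.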

How to purge Varnish cache ?

Method 1 — restart the daemon :

systemctl restart varnish

This works fine, but it has the finesse of a nuclear weapon: the whole cache is lost.

Method 2 — ban content with varnishadm :

This allows you to precisely target what to remove from the cache.

So far, I've found no way of listing cache contents (except by searching HIT and MISS in the logs).

in non-interactive mode :

varnishadm ban req.http.host == example.com '&&' req.url '~' '\\.png$'

in interactive mode :

(todo ;-)
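The non-interactive command can be wrapped in a small function that takes the host and the URL regex as parameters. It then runs ban.list to check that the ban was registered. banByHostAndUrl and the VARNISHADM variable are hypothetical names; VARNISHADM is useful for a dry run:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the non-interactive ban command above.
# VARNISHADM can point to another command, e.g. for a dry run.
VARNISHADM=${VARNISHADM:-varnishadm}

# banByHostAndUrl <host> <URL regex>
banByHostAndUrl() {
	local host=$1 urlRegex=$2
	# '&&' and '~' are quoted so that the shell passes them unchanged
	# (a bare ~ would be expanded to $HOME)
	"$VARNISHADM" ban req.http.host == "$host" '&&' req.url '~' "$urlRegex" || return 1
	# list the active bans, to check that the new one is there
	"$VARNISHADM" ban.list
}

# Example (backslashes doubled, as in the command above):
# banByHostAndUrl example.com '\\.png$'
```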

How to defeat hotlinking with Varnish ?

Edit /etc/varnish/default.vcl :

```
sub detectHotlinking {
	if(req.http.host == "my.website.tld" && req.http.referer ~ "(BADDOMAIN\.com|IMAGESUCKERDOMAIN\.com)" && req.url ~ "^/path/to/whatever\.jpg$") {
		return (synth (444, ""));
		}
	}
```
- the req.http.host == "my.website.tld" test is there because, in my setup, Varnish serves several distinct virtualhosts, but I want these hotlinking rules to apply to THIS virtualhost only.
- the 444 code has no special meaning. You can choose any number you like, as long as what is output by detectHotlinking matches what vcl_synth expects as input (see below).

sub vcl_recv {
	
	call detectHotlinking;
	
	}

```
sub vcl_synth {
	# "Hotlinking is BAD¹⁰⁰⁰⁰"
	if (resp.status == 444) {
		set resp.status = 302;
		set resp.http.location = "/pictures/hotlinking.png";
		}
	}
```
- this is where the 444 discussed above is matched. This subroutine generally ends up as a giant switch / case block handling distinct responses to the different behaviors detected by one or more subs.
- in the specific case of fighting hotlinking, the HTTP 302 code is interesting because, unlike a 301, such a response is not cacheable by default (neither by browsers nor by intermediate caches). This means ALL requests to http://my.website.tld/path/to/whatever.jpg will be re-evaluated individually and served accordingly. This matters because a cache matches a request with a response without considering the Referer header: a cached redirection could also be served to legitimate visitors, leading to over-blocking.
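To check the rule from the command line, you can send the offending Referer header with curl and look at the response status and Location header. This sketch assumes curl is installed. statusAndLocation is a hypothetical helper name, and the referer in the example only needs to match the regex of detectHotlinking:

```shell
#!/usr/bin/env bash
# Print "<status> <Location>" from HTTP response headers read on stdin
# (e.g. the output of `curl -sI`). Header names are matched
# case-insensitively and the trailing CR of each header line is removed.
statusAndLocation() {
	tr -d '\r' | awk '
		/^HTTP\// { status = $2 }
		tolower($1) == "location:" { location = $2 }
		END { print status, location }'
}

# Example: should print "302 /pictures/hotlinking.png" if the rule matches
# curl -sI -H 'Referer: http://BADDOMAIN.com/' http://my.website.tld/path/to/whatever.jpg | statusAndLocation
```

curl does not follow redirections unless asked to (`-L`), so the 302 itself is what gets printed.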

How to declare removed resources ?

Situation

Your Varnish-served website offers resources (such as PDF files) to visitors. Should you decide to stop serving those files, you can simply remove them, which will result in a 404 error when requesting any removed file. In such a situation, you could :

- leave things as-is. Technically, this is OK. But if you take your website seriously, it is not ideal :
  - nobody likes dead links
  - search engines may lower your site's ranking if it has too many 404s
- or let everybody know (especially search engines, actually) that the corresponding resource is gone for good and should be removed from the indexed resources of your website. This is exactly what the 410 (Gone) status code is about.

Solution

It all takes place in /etc/varnish/default.vcl :

add a function to declare removed resources and define what to do when receiving a request for such a resource :

sub declareRemovedResources {
	if(req.http.host == "my.website.tld" && req.url ~ "path/to/.*[dD]ocument.*\.pdf") {
		# this could be any arbitrary code. 410 chosen for consistency
		return (synth (410, ""));
		}
	}

run this function upon receiving requests :

sub vcl_recv {
	
	call declareRemovedResources;
	
	}

apply the defined behavior :

sub vcl_synth {
	
	# this matches the arbitrary code described above
	if (resp.status == 410) {
		# this is the actual HTTP response code that will be returned
		set resp.status = 410;
		}
	

	return(deliver);
	}

check the config file
restart Varnish :
systemctl restart varnish.service
or do both at once (the restart only happens if the check succeeds) :
configFile='/etc/varnish/default.vcl'; if varnishd -C -f "$configFile" >/dev/null 2>&1; then systemctl restart varnish.service; else echo "error in '$configFile'"; fi
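The URL regex is not anchored, so it may match more than intended. Here is a quick offline check on sample URLs. It assumes grep -E behaves like Varnish's PCRE engine for such a simple pattern, and it leaves out the host test. isDeclaredRemoved is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Same pattern as in declareRemovedResources above.
removedPattern='path/to/.*[dD]ocument.*\.pdf'

# Succeeds if the given URL would be answered with a 410.
isDeclaredRemoved() {
	printf '%s\n' "$1" | grep -qE "$removedPattern"
}

# Example: print the verdict for a few sample URLs
# for url in /path/to/myDocument.pdf /archive/path/to/oldDocument.pdf /path/to/picture.png; do
# 	isDeclaredRemoved "$url" && echo "410 : $url" || echo "kept : $url"
# done
```

Note that /archive/path/to/oldDocument.pdf would get a 410 too. Starting the regex with `^/path/to/` restricts it to URLs beginning with /path/to/.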

How to split varnishncsa logs per virtualhost ?

Situation

Varnish is set up to serve HTML contents on several virtualhosts :
- A.example.com
- B.example.com
- ... and some others
In the current setup, all logs are written to a single logfile by varnishncsa : /var/log/varnish/varnishncsa.log
A.example.com and B.example.com are websites I'd like to get statistics about. The other virtualhosts are there for technical reasons or tests, and I don't need stats about them.
To list these virtualhosts :
1. grep 'example.com' /var/log/varnish/varnishncsa.log | grep -Ev '(A|B)\.example\.com'
2. add each newly found virtualhost to the list in the 2nd grep (e.g. '(A|B|C)\.example\.com'), until you've caught them all.
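You can also count all the host names in a single pass. This sketch assumes the virtualhosts are all subdomains of example.com and that host names appear in the log lines. That should be the case with the default varnishncsa format, whose request field contains the full URL. Host names found in Referer values are counted too. countVhosts is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Count the *.example.com host names found in a varnishncsa log file,
# most frequent first. Host names in Referer values are counted too.
countVhosts() {
	local logFile=${1:-/var/log/varnish/varnishncsa.log}
	grep -oE '[[:alnum:]-]+(\.[[:alnum:]-]+)*\.example\.com' "$logFile" |
		sort | uniq -c | sort -rn
}
```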
The stats software I'll use (AWStats) expects distinct logfiles per virtualhost to do its job. So I'll need 3 distinct logfiles :
- the 1st for A.example.com
- the 2nd for B.example.com
- and a 3rd for everything else (which won't be analyzed)
Solution

varnishncsa itself is controlled by systemd. To write 3 distinct logfiles, the solution is to run 3 varnishncsa instances (sources : 1, 2, 3), started by 3 custom service unit files.

commands that may help in the process :

list unit files :
ls -l /etc/systemd/system/varnishncsa.*.service
list current varnishncsa processes :
ps aux | grep '[n]csa'
"stop" all varnishncsa processes (this is dirty!) :
killall -15 varnishncsa

Now the procedure :

Let's start by having a look at the current varnishncsa service unit file :

cat /etc/systemd/system/varnishncsa.service

[Unit]
Description=Varnish Cache HTTP accelerator NCSA logging daemon
After=varnish.service

[Service]
RuntimeDirectory=varnishncsa
Type=forking
PIDFile=/run/varnishncsa/varnishncsa.pid
User=varnishlog
Group=varnish
ExecStart=/usr/bin/varnishncsa -a -w /var/log/varnish/varnishncsa.log -D -P /run/varnishncsa/varnishncsa.pid
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

stop the "old" configuration :
systemctl stop varnishncsa.service
/etc/systemd/system/varnishncsa.service must not stay in place, otherwise it'll interfere with the service unit files we're about to create. Move it somewhere else or (my advice) : create a local Git repository, commit it as the "before" situation, then use it to build the first specific service unit file.

Create the 3 specific service unit files (their names must end with .service; see details on custom service unit files) :

/etc/systemd/system/varnishncsa.A.service

[Unit]
Description=Varnish Cache HTTP accelerator NCSA logging daemon (A)
After=varnish.service

[Service]
RuntimeDirectory=varnishncsa
Type=forking
# looks redundant with -P below, but systemd reads it to identify the main process of a Type=forking service
PIDFile=/run/varnishncsa/varnishncsa_A.example.com.pid
User=varnishlog
Group=varnish
ExecStart=/usr/bin/varnishncsa -q "ReqHeader:Host eq 'A.example.com'" -D -a -w /var/log/varnish/A.example.com.log -P /run/varnishncsa/varnishncsa_A.example.com.pid
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

/etc/systemd/system/varnishncsa.B.service

[Unit]
Description=Varnish Cache HTTP accelerator NCSA logging daemon (B)
After=varnish.service

[Service]
RuntimeDirectory=varnishncsa
Type=forking
PIDFile=/run/varnishncsa/varnishncsa_B.example.com.pid
User=varnishlog
Group=varnish
ExecStart=/usr/bin/varnishncsa -q "ReqHeader:Host eq 'B.example.com'" -D -a -w /var/log/varnish/B.example.com.log -P /run/varnishncsa/varnishncsa_B.example.com.pid
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

/etc/systemd/system/varnishncsa.other.service

[Unit]
Description=Varnish Cache HTTP accelerator NCSA logging daemon (other: not 'A' and not 'B')
After=varnish.service

[Service]
RuntimeDirectory=varnishncsa
Type=forking
PIDFile=/run/varnishncsa/varnishncsa_other.example.com.pid
User=varnishlog
Group=varnish
ExecStart=/usr/bin/varnishncsa -q "ReqHeader:Host !~ '(A|B).example.com'" -D -a -w /var/log/varnish/other.example.com.log -P /run/varnishncsa/varnishncsa_other.example.com.pid
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
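The 3 unit files differ only by a few values, so they can be generated. The sketch below reproduces the files above. writeNcsaUnit, writeAllNcsaUnits and UNIT_DIR are hypothetical names; set UNIT_DIR to a temporary directory for a dry run:

```shell
#!/usr/bin/env bash
# Hypothetical generator for the per-virtualhost unit files shown above.
# UNIT_DIR defaults to /etc/systemd/system; point it elsewhere for a dry run.
UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}

# writeNcsaUnit <unit name> <VSL query> <log name>
writeNcsaUnit() {
	local name=$1 query=$2 logName=$3
	local pidFile="/run/varnishncsa/varnishncsa_${logName}.pid"
	# unquoted EOF: variables are expanded, except the escaped \$MAINPID
	cat > "$UNIT_DIR/varnishncsa.${name}.service" <<EOF
[Unit]
Description=Varnish Cache HTTP accelerator NCSA logging daemon (${name})
After=varnish.service

[Service]
RuntimeDirectory=varnishncsa
Type=forking
PIDFile=${pidFile}
User=varnishlog
Group=varnish
ExecStart=/usr/bin/varnishncsa -q "${query}" -D -a -w /var/log/varnish/${logName}.log -P ${pidFile}
ExecReload=/bin/kill -HUP \$MAINPID

[Install]
WantedBy=multi-user.target
EOF
}

# writeAllNcsaUnits <domain> <name>... : one unit per <name>.<domain>,
# plus an "other" unit catching every other virtualhost
writeAllNcsaUnits() {
	local domain=$1 name alternation negMatch='!~'
	shift
	for name in "$@"; do
		writeNcsaUnit "$name" "ReqHeader:Host eq '${name}.${domain}'" "${name}.${domain}"
	done
	# join the names with '|' (e.g. "A|B") for the negated regex
	alternation=$(IFS='|'; printf '%s' "$*")
	writeNcsaUnit other "ReqHeader:Host ${negMatch} '(${alternation}).${domain}'" "other.${domain}"
}

# Example: writeAllNcsaUnits example.com A B && systemctl daemon-reload
```

Adding a virtualhost then only means adding its name to the `writeAllNcsaUnits` command line. The `!~` operator is stored in a single-quoted variable so an interactive bash doesn't try history expansion on it.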

Start it all :
chmod 664 /etc/systemd/system/varnishncsa.*.service; systemctl daemon-reload && systemctl start varnishncsa.A.service; systemctl start varnishncsa.B.service; systemctl start varnishncsa.other.service
If everything works fine (and provided there is traffic coming to the websites), you should see some logfiles :
ls -l /var/log/varnish/*.example.com.log
Don't forget to logrotate your new log files.
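Below is a possible logrotate configuration for the new files, written by a small shell function. The values are arbitrary examples (daily rotation, 14 files kept). It relies on varnishncsa reopening its log file on SIGHUP, which is what the ExecReload line of the units above sends. LOGROTATE_FILE and writeLogrotateConfig are hypothetical names:

```shell
#!/usr/bin/env bash
# Write a logrotate configuration for the per-virtualhost varnishncsa logs.
# LOGROTATE_FILE is a hypothetical default path; override it for a dry run.
LOGROTATE_FILE=${LOGROTATE_FILE:-/etc/logrotate.d/varnishncsa-vhosts}

writeLogrotateConfig() {
	# quoted 'EOF': the content is written as-is (no variable expansion)
	cat > "$LOGROTATE_FILE" <<'EOF'
/var/log/varnish/*.example.com.log {
	daily
	rotate 14
	missingok
	notifempty
	compress
	delaycompress
	sharedscripts
	postrotate
		# SIGHUP (ExecReload) makes varnishncsa reopen its log file
		for unit in varnishncsa.A.service varnishncsa.B.service varnishncsa.other.service; do
			systemctl -q is-active "$unit" && systemctl reload "$unit"
		done
		true
	endscript
}
EOF
}
```

Your distribution may already ship a logrotate file for Varnish. If so, make sure its patterns don't also match these files, because logrotate rejects duplicate log entries.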

https://unix.stackexchange.com/questions/463321/how-to-split-varnishncsa-logs-into-separate-under-systemd-ubuntu-16-04#answer-463446
So for each unit file, 1 per vhost, you'd add your exec line like so:
unit file #1

ExecStart=/usr/bin/varnishncsa -q "ReqHeader ~ '^Host: somedomain1.com'" -D -a -w /var/log/varnish/somedomain1.log -P /run/varnishncsa/varnishncsa_vhost1.pid -F '%%{X-Forwarded-For}i %%l %%u %%t "%%r" %%s %%b "%%{Referer}i" "%%{User-agent}i"'

unit file #2

ExecStart=/usr/bin/varnishncsa -q "ReqHeader ~ '^Host: somedomain2.com'" -D -a -w /var/log/varnish/somedomain2.log -P /run/varnishncsa/varnishncsa_vhost2.pid -F '%%{X-Forwarded-For}i %%l %%u %%t "%%r" %%s %%b "%%{Referer}i" "%%{User-agent}i"'

How to test the configuration file ?

configFile='/etc/varnish/default.vcl'; varnishd -C -f "$configFile" 2>/dev/null && echo "$configFile : OK" || echo "$configFile : KO"
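The one-liner above discards varnishd's messages, so a KO doesn't tell you what is wrong. The variant below keeps them. checkVcl and the VARNISHD variable are hypothetical names:

```shell
#!/usr/bin/env bash
# Check a VCL file, and show varnishd's messages only when the check fails.
# VARNISHD can point to another binary (hypothetical variable).
VARNISHD=${VARNISHD:-varnishd}

checkVcl() {
	local configFile=${1:-/etc/varnish/default.vcl} output
	# capture stdout + stderr: the compiled C code on success, errors otherwise
	if output=$("$VARNISHD" -C -f "$configFile" 2>&1); then
		echo "$configFile : OK"
	else
		echo "$configFile : KO"
		printf '%s\n' "$output" >&2
		return 1
	fi
}
```

Example: `checkVcl && systemctl restart varnish.service`. The restart only happens if the check succeeds.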

How to filter requests based on HTTP headers ?

Important notes before we start :

This article focuses on Varnish 4.0. The syntax is somewhat different in versions 3.x, where the snippets below won't work.
In the example below, I chose the referer as the filtering criterion. A real-life application would be to prevent hotlinking, but in such a case, using a white list (rather than a black list) should be more efficient.

About the HTTP referer header itself :

It can be disabled using the Web Developer Firefox extension.
It contains the full URL of the referring page, so it must be tested with a regular expression (the ~ operator) rather than with the == operator.

Configuration :

The general idea is to analyze requests when they arrive, which is the purpose of the built-in vcl_recv subroutine. From there, we'll try to match the referer HTTP header and, if it matches, hand over to another built-in subroutine, vcl_synth, with an "id" (a custom status code) describing which actions are to be taken (see diagram of VCL workflow).

All of this takes place in /etc/varnish/default.vcl :

To be added into vcl_recv :

sub vcl_recv {
	
	call detectHotLinking;
	
	}

Then :

sub detectHotLinking {
	if(req.http.host == "my.site.tld" && req.http.referer ~ "evil\.hotlinker\.com") {
		return (synth (750, ""));
		}
	}

sub vcl_synth {
	if (resp.status == 750) {
		set resp.status = 301;
		set resp.http.Location = "http://images.google.com";
		}
	return(deliver);
	}

Regex matching in VCL uses PCRE (Perl-compatible regular expressions). For simple patterns like the one above, this is the same as the ERE syntax.

Test :

Regular request :: wget -q -S http://my.site.tld
Should return HTTP 200
A request as the evil hotlinker :: wget -q -S --header='Referer: http://evil.hotlinker.com' http://my.site.tld
Should return the configured HTTP redirection
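By default, wget follows redirections, so both the redirection and the final response appear in the -S output. The sketch below keeps only the first status code, and `--max-redirect=0` stops wget from following the redirection at all. firstStatus is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the first HTTP status code found in `wget -S` output read on stdin
# (wget prints the response headers on stderr, indented by 2 spaces).
firstStatus() {
	awk '$1 ~ /^HTTP\// { print $2; exit }'
}

# Example: should print the configured redirection code (301 here)
# wget -q -S -O /dev/null --max-redirect=0 --header='Referer: http://evil.hotlinker.com' http://my.site.tld 2>&1 | firstStatus
```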

More filters :

| To filter on ... | Use the VCL object : |
| --- | --- |
| referer | `req.http.referer` (actually : any HTTP header is available as `req.http.<header name>`) |
| URL | `req.url` |

How to list backends ?

telnet varnishHost 6082
if required, please authenticate

backend.list

200 1427
Backend name			Refs	Admin	Probe
host9(10.0.16.19,,80)	12	probe	Healthy 5/5
host10(10.0.16.22,,80)	10	probe	Healthy 5/5
host11(10.0.16.23,,80)	6	probe	Healthy 5/5
host12(10.0.16.24,,80)	10	probe	Healthy 5/5
host21(10.0.16.117,,80)	7	probe	Healthy 5/5
host134(10.0.16.134,,80)	6	probe	Healthy 5/5
host167(10.0.16.167,,80)	2	probe	Sick 0/5
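To spot unhealthy backends in a long list, you can filter the output. This sketch assumes the column layout shown above, with the probe result in the last 2 fields. sickBackends is a hypothetical helper name. `varnishadm backend.list` should give the same listing without opening a telnet session:

```shell
#!/usr/bin/env bash
# Print the name of each backend whose probe reports "Sick", reading the
# output of backend.list (telnet session or varnishadm) on stdin.
# Assumption: the last 2 columns are the probe status and the "ok/total" count.
sickBackends() {
	awk '$NF ~ /^[0-9]+\/[0-9]+$/ && $(NF - 1) == "Sick" { print $1 }'
}

# Example: varnishadm backend.list | sickBackends
```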

How to add an HTTP Header showing HIT / MISS status ?

Update /etc/varnish/default.vcl :

sub vcl_deliver {
	if (obj.hits > 0) {
		set resp.http.X-Cache = "HIT";
		}
	else {
		set resp.http.X-Cache = "MISS";
		}
	}
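To check the new header, request the same URL twice. If the object is cacheable, the first response should say MISS and the second HIT. This sketch uses curl; xCache is a hypothetical helper name and my.site.tld is a placeholder:

```shell
#!/usr/bin/env bash
# Print the value of the X-Cache header from HTTP response headers read on
# stdin. Header names are matched case-insensitively and the trailing CR
# of each line is removed.
xCache() {
	tr -d '\r' | awk 'tolower($1) == "x-cache:" { print $2 }'
}

# Example: -D - dumps the response headers, -o /dev/null discards the body
# for i in 1 2; do curl -s -D - -o /dev/null http://my.site.tld/ | xCache; done
```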

How to authenticate to the Telnet CLI ?

First, you should have a /etc/varnish/secret file containing a secret key.

Then, upon opening a telnet session, you are presented with a challenge :

Connected to localhost.localdomain.
Escape character is '^]'.
107 59
uzaiqheubccbpimwyyevwqfedtxuqdwm

Authentication required.

The CLI 107 status code means that authentication is requested.

To authenticate (assuming the secret key is foo), you'll have to compute the response with :
challenge='uzaiqheubccbpimwyyevwqfedtxuqdwm'; secret=$(cat /etc/varnish/secret); echo -e "${challenge}\n${secret}\n${challenge}" | sha256sum
```
2bcd7855d7d63870aecaa7f7c5eeeae3166581644f8216698558d78f19c3bdc2
```
The documentation states that the string to be sha256sum'd is made of : challenge + newline + secret file contents + challenge + newline. The command above gets there because $(cat ...) strips the trailing newline of the secret file, the \n after ${secret} puts it back, and echo adds the final newline. So it only gives the right answer if the secret file ends with exactly one newline (and contains no NUL bytes, which bash variables can't hold).
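The variant below follows the documented formula literally. It streams the secret file instead of going through a variable, so it works whatever the file ends with. cliAuthResponse is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Compute the Varnish CLI authentication response for a challenge:
# sha256(challenge + newline + secret file contents + challenge + newline).
# The secret file is streamed as-is, so its exact bytes are hashed.
cliAuthResponse() {
	local challenge=$1 secretFile=${2:-/etc/varnish/secret}
	{
		printf '%s\n' "$challenge"
		cat "$secretFile"
		printf '%s\n' "$challenge"
	} | sha256sum | awk '{ print $1 }'
}

# Example: prints the line to type in the telnet session
# echo "auth $(cliAuthResponse uzaiqheubccbpimwyyevwqfedtxuqdwm)"
```

If the secret file ends with a single newline, this gives the same result as the command above.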

Back to the telnet session, enter :

auth 2bcd7855d7d63870aecaa7f7c5eeeae3166581644f8216698558d78f19c3bdc2

Then you get :

200 239
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,2.6.32-042stab106.4,x86_64,-smalloc,-smalloc,-hcritbit
varnish-4.0.3 revision b8c4a34

Type 'help' for command list.
Type 'quit' to close CLI session.

The CLI 200 status code means OK.
