Beginning To See The Nginx
Hello, the title is a play on that thing. It's just a song I suddenly remembered.
The reason for the unusually high number of article updates lately is just that I'm in that phase where once I start caring about something, I can't stop.
Nginx
Up until now I had set it up as a reverse proxy and cache server without much thought, but while adding IPv6 support I started caring about it again, so I reviewed the settings and did some refactoring along the way.
File Structure
Some parts are hidden, but it looks like this:
├── http.d
│   ├── bot_rate_limit.conf
│   ├── gzip.conf
│   ├── proxy_cache_zones.conf
│   └── proxy_common.conf
├── mime.types
├── mime.types-dist
├── nginx.conf
├── scgi_params
├── selfsigned.crt
├── selfsigned.key
├── sites-enabled
│   ├── 1btc.love.conf
│   ├── btclol.xyz.conf
│   ├── damepo.jp.conf
│   ├── git.soulminingrig.com.conf
│   ├── soulminingrig.com.conf
│   ├── starlink.soulminingrig.com.conf
│   └── stg.api.1btc.love.conf
├── snippets
│   ├── common_error_pages.conf
│   ├── proxy_headers.conf
│   └── ssl_common.conf
├── uwsgi_params
└── win-utf
I was a bit unsure about which folder name to use for managing conf files loaded per location directive, but after consulting with ChatGPT, this is what I ended up with. Since http.d contains files included from the http directive in nginx.conf, it made a lot of sense to me.
However, I felt that naming it snippets was a bit of a toss-up, but it'll do.
Let's take a look at each configuration item.
http.d/bot_rate_limit.conf
Meta's crawlers were getting out of hand, so I decided to apply limits not just with fail2ban but also per UA (User-Agent).
As for feeds/RSS, I once received a polite message about them, and rate-limiting them doesn't make much sense anyway: they're served almost entirely from cache, so they contribute very little load on the Origin. I made them an exception.
# Apply rate limits only to bots and link-expansion crawlers
map $http_user_agent $is_bot {
    default 0;
    ~*bot 1;
    ~*crawler 1;
    ~*spider 1;
    ~*facebookexternalhit 1;
    ~*slackbot 1;
    ~*discordbot 1;
    ~*twitterbot 1;
    ~*linkedinbot 1;
    ~*embedly 1;
    ~*quora 1;
    ~*skypeuripreview 1;
    ~*whatsapp 1;
    ~*telegrambot 1;
    ~*applebot 1;
    ~*pingdom 1;
    ~*uptimerobot 1;
}
# Since stg.api.1btc.love is for verification purposes, exclude it from rate limits even for bots
# An empty key means the request is not counted by limit_req_zone
map $server_name $bot_limit_host_key {
    stg.api.1btc.love "";
    default $binary_remote_addr;
}
# Exclude feed.xml / feed.json from rate limits even for bots
map $uri $is_feed_path {
    default 0;
    ~*feed\.(xml|json)$ 1;
}
# Use the per-IP key only when it's a bot and not a feed.xml / feed.json path
map "$is_bot:$is_feed_path" $bot_limit_key {
    default "";
    "1:0" $bot_limit_host_key;
}
limit_req_zone $bot_limit_key zone=bot:10m rate=1r/s;
limit_req_status 429;
limit_req zone=bot burst=5 nodelay;
Since this file is included at the http level, it applies to basically everything, but I've at least made it possible to carve out exceptions. In my judgment, limits like these are something that should ideally be on by default.
If someone attempts an outright DoS while spoofing their UA to look like a browser, fail2ban blocks them and drops their traffic for a while, so this works as a two-tier defense.
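The map chain above can be sketched in plain Python (an illustration of the logic, not nginx itself): a request is only counted by limit_req_zone when the resulting key is non-empty, which is how the staging host and the feed paths escape the limit.

```python
import re

# Abbreviated pattern list; the real config matches many more crawlers.
BOT_PATTERNS = [r"bot", r"crawler", r"spider", r"facebookexternalhit"]

def bot_limit_key(user_agent: str, server_name: str, uri: str, client_ip: str) -> str:
    """Mimics the $is_bot / $bot_limit_host_key / $bot_limit_key map chain."""
    is_bot = any(re.search(p, user_agent, re.IGNORECASE) for p in BOT_PATTERNS)
    is_feed = re.search(r"feed\.(xml|json)$", uri, re.IGNORECASE) is not None
    # map $server_name $bot_limit_host_key: the staging host gets an empty key
    host_key = "" if server_name == "stg.api.1btc.love" else client_ip
    # map "$is_bot:$is_feed_path" $bot_limit_key: only "1:0" uses the per-IP key
    return host_key if (is_bot and not is_feed) else ""
```

So a bot fetching a normal page is keyed by its IP, while the same bot fetching /feed.xml, or any client on the staging host, yields an empty key and is never rate-limited.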
http.d/gzip.conf
I used to support brotli in the past, but having to build it separately makes version upgrades a hassle, so I stopped. The benefit of being able to update via pkg/apt is significant.
gzip on;
gzip_vary off;
gzip_proxied any;
gzip_min_length 1024;
gzip_comp_level 7;
gzip_http_version 1.1;
gzip_types text/plain
    text/xml
    text/css
    text/javascript
    image/gif
    image/png
    image/svg+xml
    application/javascript
    application/json
    application/xml
    application/x-javascript
    application/font-woff
    application/font-woff2
    application/font-ttf
    application/octet-stream;
Not much to say here, except gzip_min_length: I went without it for years, then started setting it properly a while back, figuring that inefficiently compressing tiny responses is just a waste.
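A quick plain-Python illustration of that point: gzip adds roughly 20 bytes of header/trailer overhead, so compressing very small responses can actually make them bigger, which is exactly what gzip_min_length guards against.

```python
import gzip

tiny = b"ok"
big = b"lorem ipsum dolor sit amet " * 100

# Tiny payload: the gzip container overhead outweighs any savings.
print(len(gzip.compress(tiny)), "vs", len(tiny))
# Large, repetitive payload: compression wins by a wide margin.
print(len(gzip.compress(big)), "vs", len(big))
```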
http.d/proxy_cache_zones.conf
You might think the indentation is slightly off, but that's a constraint of the formatter.
I've deleted unnecessary items, so the zone numbering starts from 4 and is a bit inconsistent, but well...
Regarding inactive, entries expire after 7 days without a hit. As for use_temp_path=off, it makes nginx write cache files directly into the cache path: even with something like proxy_temp_path /tmp/nginx; configured, responses are cached without passing through the temp path, which is faster.
proxy_cache_path /tmp/nginx/zone4 levels=1:2 keys_zone=zone4:10m
    inactive=7d
    max_size=3g
    use_temp_path=off;
proxy_cache_path /tmp/nginx/posts levels=1:2 keys_zone=posts:10m
    inactive=7d
    max_size=2g
    use_temp_path=off;
proxy_cache_path /tmp/nginx/git levels=1:2 keys_zone=git:10m
    inactive=7d
    max_size=2g
    use_temp_path=off;
proxy_cache_path /tmp/nginx/static levels=1:2 keys_zone=static_cache:10m
    inactive=7d
    max_size=1g
    use_temp_path=off;
proxy_cache_path /tmp/nginx/1btc_cache levels=1:2 keys_zone=1btc_cache:10m
    inactive=7d
    max_size=512m
    use_temp_path=off;
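For reference, nginx names each cached file after the MD5 of its proxy_cache_key, and with levels=1:2 the subdirectories come from the trailing characters of that hash (1 character, then 2). A small Python sketch of the layout:

```python
import hashlib

def cache_file_path(cache_path: str, cache_key: str) -> str:
    """Derive the on-disk location nginx uses for a levels=1:2 cache."""
    h = hashlib.md5(cache_key.encode()).hexdigest()
    # last char -> first level dir, the two chars before it -> second level dir
    return f"{cache_path}/{h[-1]}/{h[-3:-1]}/{h}"

# With the common key format "$scheme$request_method$host$request_uri":
print(cache_file_path("/tmp/nginx/zone4", "httpsGETsoulminingrig.com/"))
```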
http.d/proxy_common.conf
proxy_cache_valid is set here as a common baseline, and individual location directives override the rest as needed.
This way, caching works even in locations without explicit cache settings.
Setting proxy_cache_bypass $http_cookie prevents requests that carry cookies, for example after logging in, from being served a cached pre-login page.
Also, caching redirects, errors, and everything else via the any rule is a countermeasure against attacks: at worst a cached response is returned, so abnormal traffic doesn't reach the Origin.
While proxy_temp_path should ideally point to a persistent path, I use /tmp because my use case doesn't run into the downside. On a site with significant traffic this would be risky: the cache itself also lives under /tmp, so a server restart would likely wipe it all, pushing the load back onto the Origin and risking an outage.
proxy_buffering on;
proxy_cache_bypass $http_cookie;
proxy_cache_background_update on;
proxy_cache_key "$scheme$request_method$host$request_uri";
proxy_cache_revalidate on;
proxy_cache_use_stale updating;
proxy_connect_timeout 60;
proxy_no_cache $http_cookie;
proxy_read_timeout 90;
proxy_send_timeout 60;
proxy_temp_path /tmp/nginx;
proxy_cache_valid 200 201 60s;
proxy_cache_valid 301 1d;
proxy_cache_valid 302 3h;
proxy_cache_valid 304 1d;
proxy_cache_valid 404 1m;
proxy_cache_valid any 5s;
proxy_cache_lock on;
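The way these rules resolve can be sketched in plain Python (an illustration, not nginx internals): a rule for the specific status wins, and the any rule is the fallback TTL for everything else.

```python
# TTLs in seconds, mirroring the proxy_cache_valid lines above.
CACHE_VALID = {200: 60, 201: 60, 301: 86400, 302: 10800, 304: 86400, 404: 60}
ANY_TTL = 5

def cache_ttl(status: int) -> int:
    """Pick the cache TTL for an upstream status: specific rule first, then `any`."""
    return CACHE_VALID.get(status, ANY_TTL)
```

So a 200 is cached for a minute, while an unexpected 500 still gets the 5-second any TTL, which is what keeps a burst of errors from hammering the Origin.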
nginx.conf
There's not much to say here, other than setting it up to return a specific error if the A record's IP is accessed directly.
I have multi_accept set to on because I'm running this on a low-spec, modest instance as a reverse proxy and cache server.
Readability has improved significantly now that the settings previously written directly inside the http directive are split into purpose-based includes.
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 65535;
events {
    multi_accept on;
    worker_connections 65535;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    server_tokens off;
    types_hash_max_size 4096;
    client_max_body_size 16M;
    # MINE
    include mime.types;
    default_type application/octet-stream;
    include ./http.d/bot_rate_limit.conf;
    include ./http.d/proxy_common.conf;
    include ./http.d/proxy_cache_zones.conf;
    include ./http.d/gzip.conf;
    server {
        listen 80;
        listen [::]:80;
        server_name 163.44.113.145 91.98.169.80 2400:8500:2002:3317:163:44:113:145;
        include snippets/common_error_pages.conf;
        return 444;
    }
    server {
        listen 443 ssl;
        listen [::]:443 ssl;
        server_name 163.44.113.145 91.98.169.80 2400:8500:2002:3317:163:44:113:145;
        ssl_certificate ./selfsigned.crt;
        ssl_certificate_key ./selfsigned.key;
        include snippets/common_error_pages.conf;
        # Deny access by IP address
        return 444;
    }
    ### Damepo.jp
    include ./sites-enabled/damepo.jp.conf;
    ### Soulminingrig My Blog
    include ./sites-enabled/soulminingrig.com.conf;
    # ~~~ snip ~~~~
}
sites-enabled/soulminingrig.com.conf
I'll introduce just one example from sites-enabled. This is the configuration for this site.
Also, the reason I don't use symbolic links is simply because I want to be able to delete unnecessary items quickly.
Please note that some parts are still hardcoded as I am currently in the middle of refining them.
For the record, I recently changed it to www.soulminingrig.com, but since I was previously serving it from the root domain without www, I'm currently still accepting the root domain without redirecting.
During this verification period, I've enabled header responses that make it easy to see whether the server cache or the client cache responded.
As for the cache rules for images, fonts, CSS, and the like, I keep thinking there must be something better I could do, but I'm stuck.
I've configured upstream so that I can quickly respond if the number of backends increases. Well, there's only one at the moment, but having it at the top makes it easy to see which backend the configuration is pointing to.
upstream backend_sm {
    server 10.1.0.228:8888 max_fails=3 fail_timeout=3s;
    keepalive 16;
    keepalive_timeout 30s;
}
map $uri $static_cache {
    ~\.(jpg|jpeg|png|webp|gif|mp4|css|js|ico|woff2)(\?.*)?$ "public, max-age=604800";
    ~\.html$ "public, max-age=600";
    default "public, max-age=600";
}
map $upstream_cache_status $server_cache_status {
    default $upstream_cache_status;
    "" "NONE";
}
map "$http_if_none_match:$http_if_modified_since" $client_cache_request {
    default "MISS";
    "~.+:.+" "REVALIDATE";
    "~.+:" "REVALIDATE";
    "~:.+" "REVALIDATE";
}
server {
    listen 80;
    listen [::]:80;
    server_name soulminingrig.com www.soulminingrig.com;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl reuseport backlog=65535 rcvbuf=256k sndbuf=256k fastopen=256 so_keepalive=on;
    listen [::]:443 ssl reuseport backlog=65535 rcvbuf=256k sndbuf=256k fastopen=256 so_keepalive=on ipv6only=on;
    listen 443 quic reuseport;
    listen [::]:443 quic reuseport;
    http2 on;
    http3 on;
    server_name soulminingrig.com www.soulminingrig.com;
    client_max_body_size 50M;
    location ~* \.(jpg|jpeg|png|webp|gif|ico|mp4|js|css|woff2)(\?.*)?$ {
        proxy_pass http://backend_sm;
        include snippets/proxy_headers.conf;
        proxy_http_version 1.1;
        proxy_redirect off;
        proxy_cache static_cache;
        proxy_cache_valid 200 301 302 7d;
        proxy_cache_valid 404 1m;
        proxy_cache_revalidate on;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        proxy_connect_timeout 3s;
        proxy_read_timeout 15s;
        expires 7d;
        add_header X-Cache-Status $upstream_cache_status always;
        add_header X-Server-Cache-Status $server_cache_status always;
        add_header X-Client-Cache-Request $client_cache_request always;
        add_header X-Client-Cache-Policy "public, max-age=604800" always;
        add_header Cache-Control "public, max-age=604800" always;
    }
    location / {
        proxy_pass http://backend_sm/;
        include snippets/proxy_headers.conf;
        proxy_http_version 1.1;
        proxy_redirect off;
        proxy_cache posts;
        proxy_cache_key $scheme$host$request_uri;
        proxy_cache_valid 200 10m;
        proxy_cache_valid 301 1h;
        proxy_cache_valid 404 1m;
        proxy_cache_revalidate on;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504 http_403 http_404;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        proxy_connect_timeout 3s;
        proxy_read_timeout 15s;
        expires $static_cache;
        add_header X-Cache-Status $upstream_cache_status always;
        add_header X-Server-Cache-Status $server_cache_status always;
        add_header X-Client-Cache-Request $client_cache_request always;
        add_header X-Client-Cache-Policy $static_cache always;
        add_header Cache-Control $static_cache always;
    }
    include snippets/common_error_pages.conf;
    include snippets/ssl_common.conf;
    ssl_certificate /hoge/fullchain.pem; # managed by Certbot
    ssl_certificate_key /hoge/soulminingrig.com/privkey.pem; # managed by Certbot
}
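The $client_cache_request map near the top of this file boils down to a simple rule that can be sketched in Python: the request counts as a revalidation if the client sent If-None-Match, If-Modified-Since, or both, and a plain miss otherwise.

```python
def client_cache_request(if_none_match: str, if_modified_since: str) -> str:
    """Mirrors the map on "$http_if_none_match:$http_if_modified_since"."""
    # Any non-empty conditional header means the client is revalidating its cache.
    return "REVALIDATE" if (if_none_match or if_modified_since) else "MISS"
```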
Bonus: nginxfmt.py
This is really great.
Of the nginx formatters I've tried, it's the highest quality.
It's available in the AUR.
yay -S nginx-config-formatter
Usage: nginxfmt.py example.conf formats a single file, so
find . -name "*.conf" | xargs -I{} nginxfmt.py {}
formats everything in bulk.
Until recently, I was using nginxbeautifier, but I switched because it broke the syntax around redirects.