Server Stats Script — server-stats.sh
A standalone bash script that parses /var/log/nginx/access.log to produce a traffic-like dashboard of visitor stats, song plays, top pages, referrers, and status codes.
Location
/usr/local/bin/server-stats (symlinked from config/server-stats.sh in the repo)
Usage
sudo server-stats # last 7 days (default)
sudo server-stats --today # last 24 hours
sudo server-stats --week # last 7 days (explicit)
sudo server-stats --all # entire log
The script auto-elevates to sudo if run as a regular user, since the log is root-owned.
What it shows
| Section | Description |
|---|---|
| Visitor Stats | Unique IPs, total requests, busiest single day |
| Song Plays | Per-song play count (from MP3 requests in /static/songs/lofi/) |
| Top Pages | Most requested URL paths (excluding static assets) |
| Top Referrers | Where traffic came from (direct, social media, etc.) |
| Status Codes | Breakdown of HTTP response codes (200, 404, 444, etc.) |
How it works
Log format
The script expects nginx's Combined Log Format (default):
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
Example:
81.103.246.154 - - [04/Jul/2026:13:14:33 +0100] "GET /music/lofi HTTP/2.0" 200 1234 "https://twitter.com/..." "Mozilla/5.0..."
Default location: /var/log/nginx/access.log
Parsing approach
The script runs 5 separate awk invocations, each processing the full log file in a single pass. This is fast — the file is read from disk cache after the first pass.
All awk programs share a common prolog (AWK_PROLOG shell variable) that:
- Maps 3-letter month abbreviations to numbers (
Jan→1,Feb→2, etc.) - Parses the
$4field ([04/Jul/2026:14:30:00) usingsplit($4, d, "[/:[]")to extract day, month, and year - Converts to a Unix timestamp using
mktime()and compares against the cutoff for the requested time window - Skips lines with
NF < 5or malformed dates (added to prevent theattempt to use array 'd' in a scalar contexterror from malformed log lines)
Date filtering by time window
The cutoff is calculated in bash before any awk runs:
CUTOFF=$((NOW - 86400)) # --today (1 day)
CUTOFF=$((NOW - 86400 * 7)) # --week (7 days, default)
CUTOFF=0 # --all
This is passed to each awk invocation as the -v cutoff=... variable. Lines with a timestamp older than cutoff are skipped immediately, so only recent log entries contribute to the counts.
Request parsing with quote splitting
The Combined Log Format uses quoted fields (the request string, referrer, and user agent). Splitting by whitespace alone doesn't work because the request contains spaces:
"GET /music/lofi HTTP/2.0"
The script uses split($0, q, "\"") to split each log line by the " character. This neatly separates the quoted fields:
| Element | Content |
|---|---|
q[1] |
Everything before the first " (IP, date, etc.) |
q[2] |
The request: GET /path HTTP/2.0 |
q[3] |
Between request and referrer: 200 1234 |
q[4] |
The referrer |
q[5] |
Between referrer and user agent: |
q[6] |
The user agent |
This is more reliable than field-position-based extraction.
Song play counting
Only GET requests for .mp3 files in /static/songs/lofi/ are counted. The URL-decoded filename (with %20, %2C, %3F converted back to spaces, commas, and question marks) is used as the song name, with .mp3 stripped.
Referrer grouping
Referrers are extracted from q[4]. The https:// prefix is stripped and www. removed. IP-based referrers (common from port scanners) are folded into the "direct" count. Empty referrers and - are also counted as direct traffic.
Status code extraction
The status code is extracted from q[3] (the space between the request and referrer quotes) by splitting on whitespace and taking the first field. This avoids the fragility of relying on $N field positions.
Edge cases handled
| Edge case | How it's handled |
|---|---|
| Malformed log lines (missing date) | if (NF < 5) next and if (length(d) < 4) next skip them |
| IP-based referrers (port scanners) | Counted as "direct" traffic |
| URL-encoded characters in song names | %20 → space, %2C → comma, %3F → question mark |
Variable collision (array d vs scalar d) |
Loop variable renamed to day to avoid conflicting with the d array from split() |
\u unicode escapes in gawk |
Replaced with plain ASCII - and = since gawk's printf doesn't interpret them |
Dependencies
- gawk (GNU awk) — for
mktime(),asorti(), andPROCINFO["sorted_in"] - AlmaLinux ships with gawk by default (
awkis symlinked togawk)
Color coding
| Color | Meaning |
|---|---|
| Green | Normal / expected values |
| Yellow | Warnings (e.g., 4xx status codes) |
| Red | Critical (e.g., 5xx status codes) |
Colours are omitted when output is piped (e.g., sudo server-stats | less).
Deployment
sudo scp config/server-stats.sh user@server:~/
sudo mv ~/server-stats.sh /usr/local/bin/server-stats
sudo chmod +x /usr/local/bin/server-stats
Companion script
server-status.sh in the same directory provides system-level monitoring (memory, disk, services, fail2ban) — complimentary but separate.