Server Stats Script — server-stats.sh

A standalone bash script that parses /var/log/nginx/access.log to produce a traffic-like dashboard of visitor stats, song plays, top pages, referrers, and status codes.

Location

/usr/local/bin/server-stats (symlinked from config/server-stats.sh in the repo)

Usage

sudo server-stats          # last 7 days (default)
sudo server-stats --today  # last 24 hours
sudo server-stats --week   # last 7 days (explicit)
sudo server-stats --all    # entire log

The script auto-elevates to sudo if run as a regular user, since the log is root-owned.

What it shows

Section Description
Visitor Stats Unique IPs, total requests, busiest single day
Song Plays Per-song play count (from MP3 requests in /static/songs/lofi/)
Top Pages Most requested URL paths (excluding static assets)
Top Referrers Where traffic came from (direct, social media, etc.)
Status Codes Breakdown of HTTP response codes (200, 404, 444, etc.)

How it works

Log format

The script expects nginx's Combined Log Format (default):

$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"

Example:

81.103.246.154 - - [04/Jul/2026:13:14:33 +0100] "GET /music/lofi HTTP/2.0" 200 1234 "https://twitter.com/..." "Mozilla/5.0..."

Default location: /var/log/nginx/access.log

Parsing approach

The script runs 5 separate awk invocations, each processing the full log file in a single pass. This is fast — the file is read from disk cache after the first pass.

All awk programs share a common prolog (AWK_PROLOG shell variable) that:

  1. Maps 3-letter month abbreviations to numbers (Jan1, Feb2, etc.)
  2. Parses the $4 field ([04/Jul/2026:14:30:00) using split($4, d, "[/:[]") to extract day, month, and year
  3. Converts to a Unix timestamp using mktime() and compares against the cutoff for the requested time window
  4. Skips lines with NF < 5 or malformed dates (added to prevent the attempt to use array 'd' in a scalar context error from malformed log lines)

Date filtering by time window

The cutoff is calculated in bash before any awk runs:

CUTOFF=$((NOW - 86400))        # --today (1 day)
CUTOFF=$((NOW - 86400 * 7))    # --week (7 days, default)
CUTOFF=0                       # --all

This is passed to each awk invocation as the -v cutoff=... variable. Lines with a timestamp older than cutoff are skipped immediately, so only recent log entries contribute to the counts.

Request parsing with quote splitting

The Combined Log Format uses quoted fields (the request string, referrer, and user agent). Splitting by whitespace alone doesn't work because the request contains spaces:

"GET /music/lofi HTTP/2.0"

The script uses split($0, q, "\"") to split each log line by the " character. This neatly separates the quoted fields:

Element Content
q[1] Everything before the first " (IP, date, etc.)
q[2] The request: GET /path HTTP/2.0
q[3] Between request and referrer: 200 1234
q[4] The referrer
q[5] Between referrer and user agent:
q[6] The user agent

This is more reliable than field-position-based extraction.

Song play counting

Only GET requests for .mp3 files in /static/songs/lofi/ are counted. The URL-decoded filename (with %20, %2C, %3F converted back to spaces, commas, and question marks) is used as the song name, with .mp3 stripped.

Referrer grouping

Referrers are extracted from q[4]. The https:// prefix is stripped and www. removed. IP-based referrers (common from port scanners) are folded into the "direct" count. Empty referrers and - are also counted as direct traffic.

Status code extraction

The status code is extracted from q[3] (the space between the request and referrer quotes) by splitting on whitespace and taking the first field. This avoids the fragility of relying on $N field positions.

Edge cases handled

Edge case How it's handled
Malformed log lines (missing date) if (NF < 5) next and if (length(d) < 4) next skip them
IP-based referrers (port scanners) Counted as "direct" traffic
URL-encoded characters in song names %20 → space, %2C → comma, %3F → question mark
Variable collision (array d vs scalar d) Loop variable renamed to day to avoid conflicting with the d array from split()
\u unicode escapes in gawk Replaced with plain ASCII - and = since gawk's printf doesn't interpret them

Dependencies

  • gawk (GNU awk) — for mktime(), asorti(), and PROCINFO["sorted_in"]
  • AlmaLinux ships with gawk by default (awk is symlinked to gawk)

Color coding

Color Meaning
Green Normal / expected values
Yellow Warnings (e.g., 4xx status codes)
Red Critical (e.g., 5xx status codes)

Colours are omitted when output is piped (e.g., sudo server-stats | less).

Deployment

sudo scp config/server-stats.sh user@server:~/
sudo mv ~/server-stats.sh /usr/local/bin/server-stats
sudo chmod +x /usr/local/bin/server-stats

Companion script

server-status.sh in the same directory provides system-level monitoring (memory, disk, services, fail2ban) — complimentary but separate.