Running a homelab firewall like production

This is the technical writeup for my firewall management lab. The artifacts it references (the complete ruleset, the change-request template, and a worked incident report) are published as-is.

1. Goals

This lab is meant to prove one thing: that I can operate a firewall the way a production SOC does, with documented policy, a change management gate in front of every rule modification, and the ability to investigate a blocked-traffic alert end to end.

It is deliberately not a red-team or attack lab. The value here is the defensive process, not exploit development. The firewall protects nothing whose compromise would matter: the traffic is simulated or runs between virtual machines. The point is the workflow and the documentation habit.

Specific things this lab is designed to practice:

Writing and maintaining pf firewall rules in a way that is auditable (labeled, commented, CR-referenced)
Using Suricata for IDS alerting without tuning it into silence, knowing which alerts matter
Following a change management process even when working alone, because the habit is the skill
Investigating a blocked or alerted flow from alert through log correlation through root-cause determination
Writing incident reports that a real team could act on

What it doesn’t claim: production experience, enterprise scale, or a hardened threat model. The lab runs on repurposed hardware with simulated traffic. The discipline is real. The blast radius is not.

2. Topology

                    [WAN · 203.0.113.0/30]
                           |
                           | em0
                    [pfSense Firewall]
                   /                  \
               em1                   em2 / em3 / em4
              /                             |
    [DMZ · 172.16.0.0/24]         [Core Switch (L3)]
     172.16.0.10  reverse proxy      /      |       \
     172.16.0.20  DNS resolver      /       |        \
                                   /        |         \
                       [VLAN 10]  [VLAN 20]  [VLAN 30]
                    192.168.10.0  192.168.20.0  10.0.30.0
                       /24           /24            /24
                     Users         Servers        Mgmt

Addressing plan

Segment	Interface	Subnet	Purpose
WAN	em0	203.0.113.0/30	Upstream ISP (RFC 5737 doc range in lab)
DMZ	em1	172.16.0.0/24	Perimeter services
VLAN 10	em2	192.168.10.0/24	Workstations
VLAN 20	em3	192.168.20.0/24	Internal servers
VLAN 30	em4	10.0.30.0/24	Out-of-band management
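
When reading logs it helps to translate an address back to its segment without looking at the table. The following is a minimal sketch built only from the subnets above. The function name zone_of and the short zone names are hypothetical helpers, not part of the published artifacts, and the function assumes a well-formed dotted-quad IPv4 address as input.

```shell
# zone_of IP: map an IPv4 address to the lab segment that owns it.
# Subnets come from the addressing plan; the zone names are illustrative.
# Assumes IP is a well-formed dotted quad (no validation is done).
zone_of() {
    case "$1" in
        203.0.113.[0-3]) echo "wan" ;;      # 203.0.113.0/30
        172.16.0.*)      echo "dmz" ;;      # 172.16.0.0/24
        192.168.10.*)    echo "users" ;;    # VLAN 10
        192.168.20.*)    echo "servers" ;;  # VLAN 20
        10.0.30.*)       echo "mgmt" ;;     # VLAN 30
        *)               echo "unknown" ;;
    esac
}
```

For example, zone_of 192.168.10.11 prints users, which is the segment ws02 lives in.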

Trust model

Three zones, ordered by trust level:

Untrusted (WAN): No unsolicited inbound traffic is permitted. Anti-spoofing rules block RFC 1918 and other bogon source addresses arriving on the WAN interface. Outbound NAT masquerades all internal subnets behind the WAN IP.

DMZ (semi-trusted): The DMZ is the only zone that may accept connections from the WAN, and only for published services. No public services are exposed at present, so no such permits exist. DMZ hosts may initiate outbound HTTP/HTTPS for updates and may send syslog to the log collector. Apart from that syslog flow, they may not reach any internal LAN segment.

Internal LAN (trusted, segmented): Three VLANs with explicit-deny between them. Users can reach the internet and the file server. Users cannot reach servers directly on most ports. Management can reach everything via SSH but only from the management VLAN. No segment can reach the management VLAN except the firewall itself.

3. Default policy stance

Every interface has block in log all as its last rule. There is no implicit “allow established” on the inside interfaces. pfSense handles stateful return traffic, but new connections from any direction require an explicit permit.

Why explicit-deny and not a stateful-allow-out?

Many small firewalls are configured with an implicit “allow outbound, deny inbound” policy. That stops drive-by attacks from the internet but does nothing if a host inside the network is compromised or misconfigured. The explicit-deny model means that any new flow (user-to-server, server-to-internet, DMZ-to-internal) requires a written rule. That rule has a change-request ID in its label. This creates an audit trail and forces a conscious decision for every access path.

What “change-managed” means operationally:

Every time I add, modify, or remove a rule, I fill out the change request template before touching the firewall config. The template forces me to articulate the justification, assess the risk, write a test plan, and think through rollback. Working alone, this feels unnecessary. That’s the point. The habit of documenting decisions is not for the current moment. It’s for the version of me that comes back six months later and has no idea why a rule exists.

Logging defaults:

Every block rule logs. The log keyword in pf copies matching packets to the pflog interface, and pfSense records them in /var/log/filter.log, which syslog-ng picks up and forwards to the log collector. Passes are not logged by default (too noisy), but specific high-interest passes, particularly anything touching the management VLAN, carry an explicit log keyword on the pass rule.

4. Ruleset structure

Rules are organized by interface (pfSense processes rules per-interface, inbound). Within each interface block, the order is:

Anti-spoofing blocks (WAN only)
Specific permits, each labeled with a CR reference
Explicit named blocks for high-interest flows (belt-and-suspenders over the default deny)
Default deny with log

Object tables are defined at the top: <user_net>, <server_net>, <dmz_hosts>, and so on. This means adding a new host to the user VLAN doesn’t require editing every rule that references the user network, only the table definition.

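Naming convention for labels: each label starts with the source zone, followed by a short description of the service or policy and, where useful, the direction or target. Examples: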

users-web-egress: Users VLAN, outbound to internet, HTTP/HTTPS
dmz-no-lateral: DMZ hosts blocked from reaching LAN
mgmt-ssh-fw: Management SSH to the firewall

Labels appear in firewall log entries and in pfctl -s rules, which makes log correlation trivial: grep the label, get the rule.
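
Two conventions from this writeup, every rule labeled and every block rule logged, can be checked mechanically before a change is applied. The sketch below assumes the ruleset file (for instance baseline.rules) holds one pf rule per line in plain pf syntax. The function name lint_rules is hypothetical, and the check is a rough text match rather than a real pf parser.

```shell
# lint_rules FILE: report pass/block rules that break two conventions:
#   1. every rule carries a label "..." clause
#   2. every block rule uses the log keyword
# Assumes one pf rule per line; comment and table lines are ignored.
lint_rules() {
    awk '
        $1 == "pass" || $1 == "block" {
            if ($0 !~ /label "[^"]+"/) {
                printf "%d: missing label: %s\n", NR, $0
            }
            if ($1 == "block") {
                logged = 0
                for (i = 2; i <= NF; i++) {
                    # stop at the first quoted string so label text is
                    # never mistaken for a keyword
                    if ($i ~ /^"/) break
                    if ($i == "log") logged = 1
                }
                if (!logged) printf "%d: block without log: %s\n", NR, $0
            }
        }
    ' "$1"
}
```

An empty result means both conventions hold. Looking a rule up by label stays a plain grep, for example grep -F 'label "users-web-egress"' baseline.rules.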

5. Change management

Every rule modification (add, edit, or delete) goes through the same four-step process:

Request → Review → Implement → Verify

Request: Fill out the change request template. This includes scope (which interface, which hosts, which ports), justification, risk assessment, and rollback plan. The CR gets a sequential ID: CR-YYYY-NNN.
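
Sequential IDs are easy to get wrong by hand. A minimal sketch of deriving the next ID follows, assuming the existing CRs are listed in a plain-text log file that mentions each CR-YYYY-NNN at least once. The function name next_cr_id and that log file are hypothetical. The three-digit NNN field caps the sequence at 999 per year.

```shell
# next_cr_id YEAR FILE: print the next free CR-YYYY-NNN for YEAR, based on
# every ID of that year already mentioned in FILE (a hypothetical CR log).
next_cr_id() {
    year=$1
    { grep -o "CR-${year}-[0-9][0-9][0-9]" "$2" 2>/dev/null || true; } |
        awk -F- -v y="$year" '
            # $3 + 0 forces a decimal number, so 016 is read as 16
            $3 + 0 > max { max = $3 + 0 }
            END { printf "CR-%s-%03d\n", y, max + 1 }
        '
}
```

If the log's highest 2026 entry is CR-2026-016, next_cr_id 2026 on that log prints CR-2026-017. A year with no entries starts at 001.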

Review: In a real environment, a second set of eyes. In this lab, I sit on the request for at least an hour and re-read it. The goal is to catch the thing you don’t see when you’re in the middle of configuring.

Implement: Apply the change in the pfSense GUI or via pf.conf edit. Back up the config before and after. Update baseline.rules to reflect the new state.

Verify: Test the specific flow (e.g., nc -zv destination port from the correct source host). Check the firewall log to confirm the label appears. Check that adjacent flows that should remain blocked are still blocked. Mark the CR closed.

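Rollback: pfSense keeps config backups in /cf/conf/backup/. Reverting to the pre-change config takes about 90 seconds via the GUI. At the CLI, pfctl -f /etc/pf.conf reloads whatever that file currently contains, so it is a rollback only if the file still holds the pre-change rules.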
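
The Verify step checks two things: the new flow works, and adjacent flows stay blocked. A small wrapper around nc can record both as pass/fail lines. This is a sketch under stated assumptions: check_flow and the PROBE override are hypothetical names, the default probe is nc -z with a 3-second timeout, and a failed probe is reported as blocked even though nc alone cannot tell a firewall drop from a closed port. The firewall log remains the authority on which rule matched.

```shell
# check_flow HOST PORT EXPECT: probe a TCP port and compare the result with
# the expected outcome ("open" or "blocked"). Returns 0 when they match.
# PROBE is a hypothetical override hook; the default probe is nc -z -w 3.
# Note: "blocked" here means the connect failed for any reason.
check_flow() {
    host=$1 port=$2 expect=$3
    if ${PROBE:-nc -z -w 3} "$host" "$port" >/dev/null 2>&1; then
        actual=open
    else
        actual=blocked
    fi
    if [ "$actual" = "$expect" ]; then
        echo "PASS $host:$port is $actual"
    else
        echo "FAIL $host:$port is $actual, expected $expect"
        return 1
    fi
}
```

Run from the correct source host, check_flow 192.168.20.30 22 blocked should print a PASS line for a flow that policy denies. The output can be pasted straight into the CR before it is marked closed.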

6. Traffic analysis

What gets logged:

All drops (every interface, every direction): syslog-ng at 192.168.20.30
Suricata alerts (inline mode via pfSense package): same syslog collector
DHCP leases: pfSense DHCP log, also forwarded to syslog-ng

Toolchain for investigation:

For a blocked flow: Start with the pfSense filter log. Filter by source IP or destination port. The label on the blocking rule tells you which policy denied it.

# On the syslog server: find all drops from a specific host
# -F treats the dots literally; -w stops 192.168.10.11 from matching 192.168.10.110
grep -Fw "192.168.10.11" /var/log/syslog/filter.log | grep -w "block"
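
To see the scope of the attempts rather than raw lines, the drops can be summarized per destination. The sketch below assumes pfSense's raw comma-separated filterlog layout for IPv4 TCP/UDP entries, where field 7 is the action, 19 the source address, 20 the destination address, and 22 the destination port. Those positions are an assumption about the log as stored on the collector and should be checked against a real line first. The function name drops_by_target is hypothetical.

```shell
# drops_by_target SRC FILE: count blocked packets from SRC per
# destination address and port, most frequent first.
# Field positions assume the pfSense filterlog CSV layout for IPv4 TCP/UDP:
#   7 = action, 19 = source, 20 = destination, 22 = destination port
drops_by_target() {
    awk -F, -v src="$1" '
        $7 == "block" && $19 == src { n[$20 ":" $22]++ }
        END { for (k in n) printf "%d\t%s\n", n[k], k }
    ' "$2" | sort -rn
}
```

Calling drops_by_target 192.168.10.11 /var/log/syslog/filter.log prints one line per target, with the count first. Because the source is compared as a whole field, 192.168.10.110 is never counted as 192.168.10.11.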

For a Suricata alert: Suricata writes alerts to /var/log/suricata/fast.log and to syslog. Cross-reference the signature name against the Emerging Threats ruleset documentation to understand what traffic pattern triggered it.

grep "ET SCAN" /var/log/suricata/fast.log | tail -20
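
Knowing which alerts matter starts with knowing which signatures fire most. The sketch below counts alerts per signature, assuming Suricata's fast.log line layout, in which the message sits between two [**] markers after the [gid:sid:rev] tuple. The function name alerts_by_signature is hypothetical.

```shell
# alerts_by_signature FILE: count fast.log alerts per signature message,
# most frequent first. Assumes lines of the form:
#   <timestamp>  [**] [gid:sid:rev] <message> [**] [Classification: ...] ...
alerts_by_signature() {
    sed -n 's/.*\[\*\*\] \[[0-9:]*\] \(.*\) \[\*\*\].*/\1/p' "$1" |
        awk '{ n[$0]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' |
        sort -rn
}
```

Running alerts_by_signature /var/log/suricata/fast.log gives a quick ranking. A signature that dominates the list is a candidate for investigation or a documented suppression, not silent removal.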

For an unknown flow: tcpdump on the relevant pfSense interface to capture raw packets, then Wireshark.

# Capture TCP on the Users interface for 60 seconds
tcpdump -i em2 -w /tmp/users-cap.pcap -G 60 -W 1 tcp

For log correlation: syslog-ng writes structured logs per host under /var/log/syslog/hosts/<ip>/current. Cross-referencing endpoint process logs with firewall block timestamps is how you go from “what was blocked” to “what was the source process.”
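
The correlation step usually comes down to pulling a host's log lines from a few minutes around the block timestamp. A minimal sketch follows, assuming each line in the per-host file starts with an ISO 8601 timestamp in a single timezone, so plain string comparison orders the lines correctly. That timestamp format is an assumption about the syslog-ng template, and log_window is a hypothetical name.

```shell
# log_window FROM TO FILE: print lines whose first field is a timestamp
# between FROM and TO, inclusive. Assumes every line starts with an
# ISO 8601 timestamp in the same timezone, so string order is time order.
log_window() {
    awk -v from="$1" -v to="$2" '$1 >= from && $1 <= to' "$3"
}
```

Given an alert at 02:11, a window from 02:06 to 02:12 on the source host's log is usually enough to show what started just before the blocked attempts.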

7. Incident response workflow

The firewall is involved in all five IR phases. Here’s how I practice each:

Triage: Alert fires (Suricata or a firewall log anomaly). First question: is the traffic being blocked or permitted? If it is blocked, the immediate threat is mitigated, and severity can drop from Critical to Low once the containment check confirms that no state was established. Pull the filter log for the source IP to see the full scope of attempts.

Containment: If traffic is blocked, containment is the firewall doing its job. Verify that no state was established (pfctl -s state | grep <source-ip>) before downgrading severity. If traffic was permitted (false negative), add a temporary block rule immediately, then investigate.

Eradication: Identify the root cause: compromised host, misconfigured application, or test traffic that wasn’t coordinated. Remediate at the source, not just at the firewall. A block rule without a root-cause fix is just a bandage.

Recovery: Remove temporary rules. If a permanent rule change is needed, file a CR and go through the normal change process, even if the incident is “closed.”

Lessons learned: Every IR gets a written report. The value is not the report itself but the habit of asking “what would have caught this earlier?” and “does the detection need tuning?”
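
The state check in the containment step is worth making exact, because a plain grep for 192.168.10.11 also matches 192.168.10.110. The sketch below reads state-table text on standard input and succeeds only if some address in it equals the given IP. It assumes IPv4 entries written as address:port, with NAT entries wrapped in parentheses. The function name has_state_for is hypothetical.

```shell
# has_state_for IP: read state-table text on stdin and exit 0 if any
# address in it is exactly IP, else exit 1.
# Assumes IPv4 entries written as addr:port, optionally in parentheses.
has_state_for() {
    awk -v ip="$1" '
        {
            for (i = 1; i <= NF; i++) {
                f = $i
                gsub(/[()]/, "", f)      # NAT entries wrap addresses in parentheses
                sub(/:[0-9]+$/, "", f)   # drop the trailing port
                if (f == ip) found = 1
            }
        }
        END { exit !found }
    '
}
```

On the firewall this would be used as pfctl -s states | has_state_for 192.168.10.11. A zero exit status means a state exists and severity should not be downgraded yet.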

8. Worked example: IR-2026-003

This is a real example from the lab. The full report is published.

Alert: Suricata fires ET SCAN Potential SSH Scan OUTBOUND at 02:11 UTC. Source: 192.168.10.11 (ws02). Targets: three hosts in VLAN 20, port 22. Traffic is blocked by users-default-deny.

Triage: Filter log confirms all attempts blocked, no state established. Severity: Low.

Investigation: syslog-ng shows the backup agent on ws02 starting a job at 02:08. The agent’s config points to all three server-VLAN hosts and was reconfigured by a software update to try SSH transport instead of the application port (9443). The agent was iterating its configured host list, probing each on port 22, which Suricata correctly identified as a scan pattern.

Outcome: No breach. Reconfigured the agent, scoped its target list to only the backup server, corrected the transport port. Filed CR-2026-016 to explicitly permit that specific flow. Added an explicit users-no-ssh-to-servers block rule so future SSH-to-servers blocks are clearly labeled in audit logs rather than falling through to the generic default-deny.

Time to resolve: 65 minutes. Impact: Zero.

9. What’s next

Honest gaps and the next milestones:

Gap	Plan
No centralized log search; grepping flat files doesn’t scale	Stand up Graylog or Elastic on a separate VM in VLAN 20
Suricata rules are mostly defaults, with a high false-positive rate	Work through the ET ruleset methodically, suppress known-good, write one custom rule
No IDS coverage on inter-VLAN traffic	Enable Suricata on em2 and em3 and compare alert volume
Change process is documented but not version-controlled	Move baseline.rules and the CR log into git so each CR becomes a commit
No traffic baseline, although anomaly detection requires knowing “normal”	Capture a week of baselines before tuning Suricata
Single firewall, with no redundancy or HA testing	Add a second pfSense in CARP/HA mode and practice failover

The goal is not to finish; it’s to keep the process honest. Each gap is a planned lab exercise, not a shortcoming.