This is the technical writeup for my firewall management lab. The artifacts it references (the complete ruleset, the change-request template, and a worked incident report) are published as-is.
1. Goals
This lab is meant to prove one thing: that I can operate a firewall the way a production SOC does, with documented policy, a change management gate in front of every rule modification, and the ability to investigate a blocked-traffic alert end to end.
It is deliberately not a red-team or attack lab. The value here is the defensive process, not exploit development. The firewall is not connected to anything I care about compromising. The traffic is simulated or between virtual machines. The point is the workflow and the documentation habit.
Specific things this lab is designed to practice:
- Writing and maintaining pf firewall rules in a way that is auditable (labeled, commented, CR-referenced)
- Using Suricata for IDS alerting without tuning it into silence, and learning which alerts matter
- Following a change management process even when working alone, because the habit is the skill
- Investigating a blocked or alerted flow from alert through log correlation through root-cause determination
- Writing incident reports that a real team could act on
What it doesn’t claim: production experience, enterprise scale, or a hardened threat model. The lab runs on repurposed hardware with simulated traffic. The discipline is real. The blast radius is not.
2. Topology
                   [WAN · 203.0.113.0/30]
                              |
                              | em0
                     [pfSense Firewall]
                      /               \
                   em1               em2 / em3 / em4
                  /                         |
    [DMZ · 172.16.0.0/24]          [Core Switch (L3)]
    172.16.0.10 reverse proxy         /     |     \
    172.16.0.20 DNS resolver       /        |        \
                                /           |           \
                          [VLAN 10]     [VLAN 20]     [VLAN 30]
                        192.168.10.0  192.168.20.0    10.0.30.0
                             /24           /24           /24
                            Users        Servers        Mgmt
Addressing plan
| Segment | Interface | Subnet | Purpose |
|---|---|---|---|
| WAN | em0 | 203.0.113.0/30 | Upstream ISP (RFC 5737 doc range in lab) |
| DMZ | em1 | 172.16.0.0/24 | Perimeter services |
| VLAN 10 | em2 | 192.168.10.0/24 | Workstations |
| VLAN 20 | em3 | 192.168.20.0/24 | Internal servers |
| VLAN 30 | em4 | 10.0.30.0/24 | Out-of-band management |
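When reading logs it helps to turn an address back into its segment quickly. The following is a minimal sketch that hard-codes the prefixes from the table above; `zone_for_ip` is a hypothetical helper name, not something shipped with pfSense, and it does no validation of its input.

```shell
# Map an IPv4 address to its lab segment using the prefixes in the
# addressing plan. zone_for_ip is a hypothetical helper, not a pfSense tool.
zone_for_ip() {
  case "$1" in
    203.0.113.[0-3]) echo "WAN" ;;               # 203.0.113.0/30
    172.16.0.*)      echo "DMZ" ;;               # 172.16.0.0/24
    192.168.10.*)    echo "VLAN 10 (Users)" ;;   # 192.168.10.0/24
    192.168.20.*)    echo "VLAN 20 (Servers)" ;; # 192.168.20.0/24
    10.0.30.*)       echo "VLAN 30 (Mgmt)" ;;    # 10.0.30.0/24
    *)               echo "unknown" ;;
  esac
}

zone_for_ip 192.168.10.11   # prints: VLAN 10 (Users)
```

Shell glob patterns are enough here only because every internal segment is a /24; a plan with other prefix lengths would need real CIDR arithmetic.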
Trust model
Three zones, ordered by trust level:
Untrusted (WAN): No unsolicited inbound traffic is permitted. Anti-spoofing rules block RFC 1918 and other bogons arriving on the WAN interface. Outbound NAT masquerades all internal subnets behind the WAN IP.
DMZ (semi-trusted): The DMZ is reachable from the WAN only for services that need to be exposed. Currently there are none, because no public services are published. DMZ hosts may initiate outbound HTTP/HTTPS for updates and may send syslog to the log collector. Apart from that syslog flow, they may not reach any internal LAN segment.
Internal LAN (trusted, segmented): Three VLANs with explicit-deny between them. Users can reach the internet and the file server, and are blocked from the server VLAN on most other ports. Management can reach every segment over SSH, and SSH is permitted only when it originates from the management VLAN. No segment can initiate connections into the management VLAN; only the firewall itself can.
3. Default policy stance
Every interface has block in log all as its last rule. There is no separate “allow established” rule on the inside interfaces, and none is needed: pf is stateful, so return traffic for a permitted connection matches its existing state entry. New connections from any direction require an explicit permit.
Why explicit-deny and not a stateful-allow-out?
Many small firewalls are configured with an implicit “allow outbound, deny inbound” policy. That stops drive-by attacks from the internet but does nothing if a host inside the network is compromised or misconfigured. The explicit-deny model means that any new flow (user-to-server, server-to-internet, DMZ-to-internal) requires a written rule. That rule has a change-request ID in its label. This creates an audit trail and forces a conscious decision for every access path.
What “change-managed” means operationally:
Every time I add, modify, or remove a rule, I fill out the change request template before touching the firewall config. The template forces me to articulate the justification, assess the risk, write a test plan, and think through rollback. Working alone, this feels unnecessary. That’s the point. The habit of documenting decisions is not for the current moment. It’s for the version of me that comes back six months later and has no idea why a rule exists.
Logging defaults:
Every block rule logs. The log keyword in pf sends dropped packets to /var/log/filter.log, which syslog-ng picks up and forwards to the log collector. Passes are not logged by default (too noisy), but specific high-interest passes, particularly anything touching the management VLAN, have explicit log on the pass rule.
4. Ruleset structure
Rules are organized by interface, because pfSense evaluates rules per interface on inbound traffic, top-down, with the first match winning. Within each interface block, the order is:
- Anti-spoofing blocks (WAN only)
- Specific permits, each labeled with a CR reference
- Explicit named blocks for high-interest flows (belt-and-suspenders over the default deny)
- Default deny with log
Object tables are defined at the top: <user_net>, <server_net>, <dmz_hosts>, and so on. This means adding a new host to the user VLAN doesn’t require editing every rule that references the user network, only the table definition.
Naming convention for labels: <zone>-<direction>-<service>. Examples:
- users-web-egress: Users VLAN, outbound to internet, HTTP/HTTPS
- dmz-no-lateral: DMZ hosts blocked from reaching LAN
- mgmt-ssh-fw: Management SSH to the firewall
Labels appear in firewall log entries and in pfctl -s rules, which makes log correlation trivial: grep the label, get the rule.
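Because the audit trail depends on every rule carrying a label, it is worth checking for rules that slipped through without one. This is a minimal sketch that reads rule text on stdin in the general shape printed by pfctl -s rules; `unlabeled_rules` is a hypothetical helper name, and the sample rules are illustrative rather than copied from the lab's ruleset.

```shell
# Print every pass/block rule that carries no label.
# Reads rule text on stdin; unlabeled_rules is a hypothetical helper.
unlabeled_rules() {
  # "|| true" keeps the exit status at 0 when nothing is unlabeled.
  grep -E '^(pass|block) ' | grep -v ' label "' || true
}

# Illustrative input: the second rule has no label and is the only one printed.
printf '%s\n' \
  'pass in quick on em2 inet proto tcp from <user_net> to any port = 443 keep state label "users-web-egress"' \
  'pass in quick on em2 inet proto udp from <user_net> to any port = 53 keep state' \
  'block drop in log on em2 all label "users-default-deny"' \
  | unlabeled_rules
```

On the firewall itself the same check would be run as pfctl -s rules | unlabeled_rules. Empty output means every rule is labeled.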
5. Change management
Every rule modification (add, edit, or delete) goes through the same four-step process:
Request → Review → Implement → Verify
Request: Fill out the change request template. This includes scope (which interface, which hosts, which ports), justification, risk assessment, and rollback plan. The CR gets a sequential ID: CR-YYYY-NNN.
Review: In a real environment, a second set of eyes. In this lab, I sit on the request for at least an hour and re-read it. The goal is to catch the thing you don’t see when you’re in the middle of configuring.
Implement: Apply the change in the pfSense GUI or via pf.conf edit. Back up the config before and after. Update baseline.rules to reflect the new state.
Verify: Test the specific flow (e.g., nc -zv destination port from the correct source host). Check the firewall log to confirm the label appears. Check that adjacent flows that should remain blocked are still blocked. Mark the CR closed.
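Rollback: pfSense keeps config backups in /cf/conf/backup/. Reverting to the pre-change config takes about 90 seconds via the GUI. At the CLI, pfctl -f /etc/pf.conf reloads the ruleset from that file, so it only rolls back the change if the file still holds the pre-change rules.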
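The CR-YYYY-NNN numbering is easy to get wrong by hand. A minimal sketch of generating the next ID from a list of existing ones follows; `next_cr_id` is a hypothetical helper, and restarting the counter at 001 each year is an assumption, since the process above only says the IDs are sequential.

```shell
# Print the next CR ID for the given year.
# Reads existing IDs on stdin, one per line, in the form CR-YYYY-NNN.
# Assumption: numbering restarts at 001 for a year with no entries yet.
next_cr_id() {
  awk -v y="$1" -F '-' '
    $1 == "CR" && $2 == y && $3 + 0 > max { max = $3 + 0 }
    END { printf "CR-%s-%03d\n", y, max + 1 }
  '
}

printf '%s\n' CR-2025-041 CR-2026-015 CR-2026-016 | next_cr_id 2026   # prints: CR-2026-017
```

Deriving the ID from the log rather than typing it keeps IDs unique, which matters because the ID ends up in the rule label.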
6. Traffic analysis
What gets logged:
- All drops (every interface, every direction): forwarded to syslog-ng at 192.168.20.30
- Suricata alerts (inline mode via pfSense package): same syslog collector
- DHCP leases: pfSense DHCP log, also forwarded to syslog-ng
Toolchain for investigation:
For a blocked flow: Start with the pfSense filter log. Filter by source IP or destination port. The label on the blocking rule tells you which policy denied it.
# On the syslog server: find all drops from a specific host
# -F treats the dots literally; -w stops 192.168.10.11 from also matching 192.168.10.110
grep -Fw "192.168.10.11" /var/log/syslog/filter.log | grep -w "block"
For a Suricata alert: Suricata writes alerts to /var/log/suricata/fast.log and to syslog. Cross-reference the signature name against the Emerging Threats ruleset documentation to understand what traffic pattern triggered it.
grep "ET SCAN" /var/log/suricata/fast.log | tail -20
For an unknown flow: Run tcpdump on the relevant pfSense interface to capture raw packets, then open the capture in Wireshark.
# Capture TCP on the Users interface for 60 seconds
tcpdump -i em2 -w /tmp/users-cap.pcap -G 60 -W 1 tcp
For log correlation: syslog-ng writes structured logs per host under /var/log/syslog/hosts/<ip>/current. Cross-referencing endpoint process logs with firewall block timestamps is how you go from “what was blocked” to “what was the source process.”
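The correlation step usually comes down to pulling the endpoint's log lines from a short window around the block timestamp. This is a minimal sketch, assuming each log line begins with an ISO 8601 UTC timestamp (a formatting choice on the collector, not something the flat files guarantee); `lines_between` is a hypothetical helper, and the sample lines are invented for illustration.

```shell
# Print log lines whose first field is a timestamp within [start, end].
# Assumes ISO 8601 UTC timestamps in one consistent format, which sort
# lexicographically, so a plain string comparison is sufficient.
lines_between() {
  awk -v s="$1" -v e="$2" '$1 >= s && $1 <= e'
}

# Invented sample: only the 02:08:00Z line falls inside the window.
printf '%s\n' \
  '2026-01-01T02:04:59Z ws02 cron: session closed' \
  '2026-01-01T02:08:00Z ws02 backup-agent: job started' \
  '2026-01-01T02:11:30Z ws02 backup-agent: job failed' \
  | lines_between 2026-01-01T02:05:00Z 2026-01-01T02:11:00Z
```

In practice the input would be the host's file under /var/log/syslog/hosts/<ip>/current, with the window set a few minutes either side of the first block entry.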
7. Incident response workflow
The firewall is involved in every IR phase. Here’s how I practice each:
Triage: Alert fires (Suricata or a firewall log anomaly). First question: is the traffic being blocked or permitted? If blocked, the immediate threat is already mitigated, and severity can drop from Critical to Low once containment confirms that no state was established. Pull the filter log for the source IP to see the full scope of attempts.
Containment: If traffic is blocked, containment is the firewall doing its job. Verify that no state was established (pfctl -s state | grep <source-ip>) before downgrading severity. If traffic was permitted (false negative), add a temporary block rule immediately, then investigate.
Eradication: Identify the root cause: compromised host, misconfigured application, or test traffic that wasn’t coordinated. Remediate at the source, not just at the firewall. A block rule without a root-cause fix is just a bandage.
Recovery: Remove temporary rules. If a permanent rule change is needed, file a CR and go through the normal change process, even if the incident is “closed.”
Lessons learned: Every IR gets a written report. The value is not the report itself but the habit of asking “what would have caught this earlier?” and “does the detection need tuning?”
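The state check in the containment step is where a plain grep can mislead, because 192.168.10.11 is also a substring of 192.168.10.110. A minimal sketch of an exact-match version follows; `has_state` is a hypothetical helper that reads state lines on stdin, and the sample line only approximates the shape of pfctl -s state output.

```shell
# Exit 0 if any state line on stdin involves exactly this IPv4 address.
# Each whitespace-separated field has its :port suffix stripped before
# comparison. NAT entries shown in parentheses are not handled.
has_state() {
  awk -v ip="$1" '
    {
      for (i = 1; i <= NF; i++) {
        addr = $i
        sub(/:[0-9]+$/, "", addr)
        if (addr == ip) { found = 1; exit }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}

# Illustrative state line for a different host (.110), so no match is reported.
if printf '%s\n' 'all tcp 192.168.10.110:50000 -> 192.168.20.5:22 ESTABLISHED:ESTABLISHED' \
     | has_state 192.168.10.11; then
  echo "state found: do not downgrade"
else
  echo "no state for 192.168.10.11"
fi
```

On the firewall the input would come from pfctl -s state. A zero exit status means a connection did get through and severity should not be downgraded yet.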
8. Worked example: IR-2026-003
This is a real example from the lab. The full report is published.
Alert: Suricata fires ET SCAN Potential SSH Scan OUTBOUND at 02:11 UTC. Source: 192.168.10.11 (ws02). Targets: three hosts in VLAN 20, port 22. Traffic is blocked by users-default-deny.
Triage: Filter log confirms all attempts blocked, no state established. Severity: Low.
Investigation: syslog-ng shows the backup agent on ws02 starting a job at 02:08. The agent’s config points to all three server-VLAN hosts and was reconfigured by a software update to try SSH transport instead of the application port (9443). The agent was iterating its configured host list, probing each on port 22, which Suricata correctly identified as a scan pattern.
Outcome: No breach. Reconfigured the agent, scoped its target list to only the backup server, corrected the transport port. Filed CR-2026-016 to explicitly permit the corrected backup flow (ws02 to the backup server on the application port). Added an explicit users-no-ssh-to-servers block rule so future SSH-to-servers blocks are clearly labeled in audit logs rather than falling through to the generic default-deny.
Time to resolve: 65 minutes. Impact: Zero.
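The pattern that tripped the alert, one source touching several hosts on the same port, is easy to confirm from the block log. The following is a minimal sketch that counts distinct targets per source; it assumes the log has already been reduced to "src dst dport" triples, which is a simplified, hypothetical format and not the raw pfSense filter log. `targets_per_source` is a hypothetical helper and the addresses below are illustrative.

```shell
# Count distinct destination hosts contacted on one port, per source.
# Input on stdin: whitespace-separated "src dst dport" triples
# (a simplified, hypothetical format).
targets_per_source() {
  awk -v port="$1" '
    $3 == port && !seen[$1 SUBSEP $2]++ { count[$1]++ }
    END { for (src in count) print src, count[src] }
  '
}

# Illustrative input: three distinct targets on port 22, one repeated attempt,
# and one unrelated port that is ignored.
printf '%s\n' \
  '192.168.10.11 192.168.20.10 22' \
  '192.168.10.11 192.168.20.20 22' \
  '192.168.10.11 192.168.20.40 22' \
  '192.168.10.11 192.168.20.10 22' \
  '192.168.10.11 192.168.20.10 443' \
  | targets_per_source 22   # prints: 192.168.10.11 3
```

A count that matches the size of the agent's configured host list supports the "misconfigured client iterating a list" explanation over a genuine scan.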
9. What’s next
Honest gaps and the next milestones:
| Gap | Plan |
|---|---|
| No centralized log search, and grepping flat files doesn’t scale | Stand up Graylog or Elastic on a separate VM in VLAN 20 |
| Suricata rules are mostly defaults, with a high false-positive rate | Work through the ET ruleset methodically, suppress known-good, write one custom rule |
| No IDS coverage on inter-VLAN traffic | Enable Suricata on em2 and em3 and compare alert volume |
| Change process is documented but not version-controlled | Move baseline.rules and the CR log into git so each CR becomes a commit |
| No traffic baseline, and anomaly detection requires knowing “normal” | Capture a week of baselines before tuning Suricata |
| Single firewall, with no redundancy or HA testing | Add a second pfSense in CARP/HA mode and practice failover |
The goal is not to finish; it is to keep the process honest. Each gap is a planned lab exercise, not a shortcoming.