Servers
7/7
SKY · RAIN · SUN · WIND · OAK · MAPLE · CEDAR
CIS Compliance
100%
193/193 WDC + GCP
Cloud VPN
ESTABLISHED
130.211.194.72 ↔ 38.140.146.68
DNS Serial
2026031002
DNSSEC signed · 134 records
DHCP Hosts
112
Reservations active
Failover
NORMAL
SKY primary · RAIN secondary
All Infrastructure — WDC + GCP
DNS Queries (7 days)
Server Uptime % — All 7
DHCP Leases Active
On-Prem — WDC Servers
SKY — 192.168.120.1
Primary DNS/DHCP · sky.wdc.us.gl3
Online
CIS
47/47
SELinux
Enforcing
Auditd
Immutable
named ✓ · dhcpd ✓ · fail2ban ✓ · auditd ✓ · rsyslog ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓ · DNSSEC ✓
RAIN — 192.168.120.2
Secondary DNS/DHCP · rain.wdc.us.gl3
Online
CIS
47/47
SELinux
Enforcing
Zone Sync
Current
named ✓ · dhcpd ✓ · fail2ban ✓ · auditd ✓ · rsyslog ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓
SUN — 192.168.120.3
Monitoring · Prometheus + Grafana
Online
CIS
48/48
Targets
5
Alerts
0
prometheus ✓ · grafana ✓ · node_exporter ✓ · auditd ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓
WIND — 192.168.120.4
Logging · ELK Stack
Online
CIS
51/51
Indices
5
Retention
90d
elasticsearch ✓ · logstash ✓ · kibana ✓ · auditd ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓
Cloud — GCP (us-central1)
VPN Tunnel ESTABLISHED
WDC 38.140.146.68 ↔ GCP 130.211.194.72 · IKEv2 · AES-256
192.168.120.0/23 + 192.168.124.0/24 ↔ 172.16.0.0/24
GCP VMs — cloud.us (us-central1-a)
OAK — 172.16.0.10
Security Scanner · OpenVAS/Greenbone · oak.cloud.us
Online
docker ✓ · fail2ban ✓ · auditd ✓ · SELinux ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓
MAPLE — 172.16.0.12
Cloud Monitoring · Prometheus + Grafana + Wazuh · maple.cloud.us
Online
prometheus ✓ · grafana ✓ · wazuh-mgr ✓ · fail2ban ✓ · auditd ✓ · SELinux ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓
CEDAR — 172.16.0.13
Cloud Logging · ELK Stack + Wazuh Indexer · cedar.cloud.us
Online
elasticsearch ✓ · logstash ✓ · kibana ✓ · fail2ban ✓ · auditd ✓ · SELinux ✓ · firewalld ✓ · chronyd ✓ · AIDE ✓
Cloud Run Services
| Service | Cloud Run Name | Image | URL | Status |
|---|---|---|---|---|
| MkDocs Portal | gpus-mkdocs-portal | mkdocs:latest | infra.greenpeace.us | ✅ Running |
| Status Site | gpus-status-site | status-site:latest | status.greenpeace.us | ✅ Running |
| Status Backend | gpus-status-backend | status-backend:latest | (internal) | ✅ Running |
| Security Site | gpus-security-site | security-site:latest | security.greenpeace.us | ✅ Running |
| Security Backend | gpus-security-backend | security-backend:latest | (internal) | ✅ Running |
| SOC Site | gpus-soc-site | soc-site:latest | soc.greenpeace.us | ✅ Running |
| SOC Backend | gpus-soc-backend | soc-backend:latest | (internal) | ✅ Running |
| Forms Portal Backend | gpus-forms-backend | gpus-forms-backend:latest | forms.greenpeace.us | ✅ Running |
All 8 services scale to zero when idle. TLS is Google-managed. Images in Artifact Registry at us-central1-docker.pkg.dev/gpus-infra/gpus-images.
98
Security Posture Score
CIS compliance · monitoring coverage · backup health · threat activity
CIS Compliance
100%
193/193 WDC + GCP
Uptime (30d)
99.9%
All services
Open Incidents
0
Last: none
Assets Monitored
136
of 138 (98.6%)
Backups
OK
Daily + GCS offsite
Threats
0
Active
CIS Compliance — Per Server
SKY
47/47
100% ✓
RAIN
47/47
100% ✓
SUN
48/48
100% ✓
WIND
51/51
100% ✓
GCP VMs — CIS Compliance
OAK
47/47
100% ✓
MAPLE
47/47
100% ✓
CEDAR
47/47
100% ✓
GCP Cloud Controls
🔐
Data Encryption
VPN AES-256 · GCS encryption at rest · Cloud Run HTTPS
CIS 3.11 PCI 4.1
🔥
VPC Firewall
Default deny-all · VPN + internal rules only
CIS 4.4 PCI 1.2.2
📊
Audit Logging
VPC Flow Logs · Cloud Audit Logs automatic
CIS 8.3 NIST AU-6
🔄
Data Recovery
GCS Nearline · 90-day retention · versioning
CIS 11.1 NIST CP-9
🔀
Network Segmentation
VPC 172.16.0.0/24 · VPN-only from on-prem
CIS 12.4 NIST SC-7
🛡
Transmission Security
IKEv2 · AES-256 · SHA-256 · DH14
NIST SC-8 PCI 1.5.1
Risk Register
Disaster Recovery Plan not documented
IT
Incident Response Plan not documented
IT
SSO not implemented — 42 apps need Okta
IT
Backup pipeline to GCS not configured
IT
Data classification not started — 7 payment + 42 supporter-data apps
IT/Legal
Servers Backed Up
—
of 7 total
GCS Bucket
gpus-infra-backups-wdc
us-central1 · Nearline
Retention
90d
Daily backups
Last Checked
—
Live from backend
Backup Status — All Servers
| Server | Last Backup | Size | Age | GCS | NAS |
|---|---|---|---|---|---|
| Loading backup data... | | | | | |
Backup Schedule
| Server | Frequency | Target | Retention |
|---|---|---|---|
| SKY / RAIN | Daily 02:00 | NAS (vmstorage) + GCS | 90 days |
| SUN / WIND | Daily 02:00 | NAS (vmstorage) + GCS | 90 days |
| OAK / MAPLE / CEDAR | Daily 02:00 | GCS (instance SA) | 90 days |
| Portal sites | Daily 02:30 | GCS (mkdocs/status/security) | 30 days |
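The 90-day retention in the schedule implies some pruning step on the local target. A minimal sketch of how expired archives could be listed is shown below; the function name, the `*.tar.gz` naming pattern, and the use of a local directory are assumptions for illustration, and GCS retention would normally be handled by bucket lifecycle rules instead.

```shell
#!/bin/sh
# List backup archives older than the retention window.
# $1 = backup directory, $2 = retention in days (90 in the schedule above).
# Note: find's "-mtime +N" matches files last modified more than N whole days ago.
expired_backups() {
    find "$1" -type f -name '*.tar.gz' -mtime +"$2" -print
}

# Example (hypothetical path): review the list before deleting anything.
# expired_backups /backup 90
```

Printing the candidates first, rather than deleting directly, keeps the step reviewable.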
Portal Site Backups — GCS
| Site | URL | Last Backup | Size | Age | GCS |
|---|---|---|---|---|---|
| Loading portal backup data... | | | | | |
CIS Controls v8
100%
193/193 WDC + GCP
PCI-DSS v4.0
94%
47/50 requirements met
NIST CSF
96%
Identify · Protect · Detect · Respond · Recover
NIST SP 800-53
92%
Key controls mapped
Last Audit
2026-03-10
All servers verified
Gaps
3
DRP · IRP · Backup pipeline
CIS Controls v8 — Implementation Status
On-Premises Infrastructure — SKY / RAIN / SUN / WIND
| CIS # | Control | SKY | RAIN | SUN | WIND | Implementation |
|---|---|---|---|---|---|---|
| 1.1 | Asset Inventory | ✓ | ✓ | ✓ | ✓ | DHCP lease tracking, DNS records, Kibana dhcp-leases-* index |
| 1.2 | Software Inventory | ✓ | ✓ | ✓ | ✓ | Minimal RPM install, dnf history tracked |
| 2.2 | Authorized Software | ✓ | ✓ | ✓ | ✓ | Server base only, no GUI, no unnecessary packages |
| 3.11 | Data Encryption | ✓ | ✓ | ✓ | ✓ | DNSSEC, Webmin TLS, SSH key auth, VPN AES-256 |
| 3.14 | Sensitive Data | ✓ | ✓ | ✓ | ✓ | DNSSEC keys chmod 600, ES on dedicated partition |
| 4.1 | Secure Configuration | ✓ | ✓ | ✓ | ✓ | CIS Benchmark Rocky Linux 8 Level 2 applied |
| 4.4 | Firewall | ✓ | ✓ | ✓ | ✓ | firewalld default drop zone, explicit rich-rules only |
| 5.1 | Account Inventory | ✓ | ✓ | ✓ | ✓ | dnsadmin / monitadmin only, service accounts nologin |
| 5.2 | Privileged Access | ✓ | ✓ | ✓ | ✓ | sudo with logging, SSH no root, key-only |
| 5.4 | Password Policy | ✓ | ✓ | ✓ | ✓ | 14-char min, 90-day max, lockout after 5 |
| 6.1 | Access Control | ✓ | ✓ | ✓ | ✓ | SELinux enforcing, BIND chroot, MAC filtering |
| 7.1 | Vulnerability Mgmt | ✓ | ✓ | ✓ | ✓ | dnf-automatic security updates enabled |
| 8.2 | Audit Log Mgmt | ✓ | ✓ | ✓ | ✓ | auditd immutable (-e 2), DNS/DHCP/auth rules |
| 8.3 | Log Storage | ✓ | ✓ | ✓ | ✓ | Dedicated /var/log + /var/log/audit on sdb |
| 8.5 | Log Analysis | ✓ | ✓ | ✓ | ✓ | Kibana dashboards, Grafana panels |
| 8.9 | Centralized Logging | ✓ | ✓ | — | ✓ | rsyslog → CEDAR:5140 (GCP) + WIND:5140 (WDC) → Logstash → ES → Kibana |
| 10.1 | Malware Defenses | ✓ | ✓ | ✓ | ✓ | AIDE daily file integrity monitoring |
| 11.1 | Data Recovery | ✓ | ✓ | ✓ | ✓ | Daily cron backups to /backup + GCS (planned) |
| 12.1 | Network Security | ✓ | ✓ | ✓ | ✓ | Prod/mgmt separation, firewalld drop, IPv6 disabled |
| 12.4 | Network Segmentation | ✓ | ✓ | ✓ | ✓ | 192.168.120.0/23 prod, 192.168.124.0/24 mgmt, 172.16.0.0/24 GCP |
| 13.1 | Threat Detection | ✓ | ✓ | ✓ | ✓ | Fail2ban, AIDE alerts, Prometheus alerting |
GCP Cloud Infrastructure — gpus-infra
| CIS # | Control | Status | Implementation |
|---|---|---|---|
| 3.11 | Data Encryption | ✓ | VPN AES-256, GCS encryption at rest, Cloud Run HTTPS |
| 4.4 | Firewall | ✓ | VPC deny-all default, explicit VPN + internal rules |
| 8.3 | Log Storage | ✓ | VPC Flow Logs, Cloud Audit Logs automatic |
| 11.1 | Data Recovery | ✓ | GCS backups Nearline, 90-day retention, versioning |
| 12.4 | Network Segmentation | ✓ | Separate VPC 172.16.0.0/24, VPN-only from on-prem |
PCI-DSS v4.0 — Compliance Matrix
Payment Card Industry Data Security Standard
| Req | Sub | Description | Status | Implementation |
|---|---|---|---|---|
| 1 | 1.1.1 | Network security controls defined | ✓ | firewalld default-drop zone all servers |
| 1 | 1.2.1 | Inbound/outbound restricted | ✓ | Rich rules per service/source |
| 1 | 1.2.2 | All other traffic denied | ✓ | Zone=drop, no implicit permits |
| 1 | 1.3.1 | Inbound to CDE restricted | ✓ | DNS/DHCP/SSH from internal only |
| 1 | 1.4.1 | NSC between zones | ✓ | DHCP failover (TCP 647) restricted to SKY↔RAIN |
| 1 | 1.5.1 | Remote access secured | ✓ | SSH key auth, no root, AllowUsers, VPN |
| 2 | 2.2.1 | Securely configured | ✓ | CIS Benchmark Level 2 applied |
| 2 | 2.2.2 | Vendor defaults changed | ✓ | Root locked, all defaults changed |
| 2 | 2.2.3 | Unnecessary services removed | ✓ | telnet/ftp/rsh/avahi/cups masked |
| 2 | 2.2.4 | Insecure protocols disabled | ✓ | SSHv2 only, no FTP/Telnet/rsh |
| 4 | 4.1 | Strong cryptography for transmission | ✓ | DNSSEC, TLS Webmin, VPN AES-256 |
| 5 | 5.2 | Anti-malware mechanisms | ✓ | AIDE file integrity daily scan |
| 7 | 7.1 | Access limited to need | ✓ | Dedicated admin accounts, nologin service accounts |
| 8 | 8.3.6 | Password complexity | ✓ | 14-char min, 90-day max, lockout after 5 |
| 10 | 10.2 | Audit trails | ✓ | auditd immutable mode, DNS/DHCP/auth rules |
| 10 | 10.3 | Audit trail protection | ✓ | Centralized to WIND (WDC) + CEDAR (GCP), 90-day retention |
| 10 | 10.7 | Log retention | ✓ | Dedicated log partitions on sdb |
| 11 | 11.5.1 | File integrity monitoring | ✓ | AIDE daily scans on all 7 servers |
| 12 | 12.1.1 | Security policy established | ◐ | Policy drafted in status site — formal sign-off pending |
| 12 | 12.5.1 | Asset inventory | ✓ | IAR: 136 assets tracked in wdchostregistry.csv (129 workstations + 7 servers) |
| 12 | 12.10.1 | IR plan | ◐ | IRP drafted in status site — formal sign-off pending |
NIST Cybersecurity Framework — Function Coverage
Identify
100%
Asset mgmt · Risk assessment · Governance
Protect
100%
Access ctrl · Encryption · Hardening
Detect
100%
AIDE · auditd · ELK · Prometheus
Respond
85%
IRP drafted · formal sign-off pending
Recover
85%
DRP drafted · GCS pipeline pending
NIST SP 800-53 — Key Control Families
| Family | Control | Status | Implementation |
|---|---|---|---|
| AC-3 | Access Enforcement | ✓ | SELinux enforcing, BIND chroot, MAC DHCP filtering |
| AU-2 | Auditable Events | ✓ | auditd custom rules: DNS/DHCP changes, auth, privilege escalation |
| AU-6 | Audit Review | ✓ | Kibana dashboards (WIND + CEDAR), Grafana panels (SUN + MAPLE), centralized logging |
| CM-2 | Baseline Configuration | ✓ | CIS Benchmark L2, Terraform IaC for GCP |
| CM-7 | Least Functionality | ✓ | Minimal install, unnecessary services masked, IPv6 disabled |
| CP-9 | System Backup | ✓ | Daily cron backups, GCS offsite (pipeline pending) |
| IA-2 | Identification & Auth | ✓ | SSH key-only, no passwords, AllowUsers directive |
| IA-5 | Authenticator Mgmt | ✓ | 14-char min, 90-day rotation, faillock after 5 |
| SC-7 | Boundary Protection | ✓ | Prod/mgmt/GCP zone separation, VPN encrypted tunnel |
| SC-8 | Transmission Confidentiality | ✓ | IKEv2 AES-256, DNSSEC, TLS on Webmin |
| SC-20 | Secure Name Resolution | ✓ | DNSSEC zone signing + validation |
| SC-28 | Protection of Info at Rest | ✓ | Dedicated partitions, GCS encryption, key chmod 600 |
| SI-4 | System Monitoring | ✓ | Prometheus (15s scrape interval), Fail2ban, AIDE, Kibana dashboards |
| SI-7 | Software Integrity | ✓ | AIDE daily file integrity scan on all 7 servers |
Compliance Gaps & Remediation
PCI 12.1.1 — Security Policy: Policy drafted in Governance tab. Requires formal review and sign-off by management.
Target: Q2 2026
PCI 12.10.1 — IR Plan: IRP drafted in Governance tab. Requires formal review, tabletop exercise, and sign-off.
Target: Q2 2026
NIST CP-9 — Offsite Backup: GCS backup pipeline active — all 4 WDC servers backing up daily to NAS + GCS. GCP VMs backup cron pending.
WDC: ✓ Done
SSO Integration: 42 applications identified for Okta SSO. Not yet started.
Target: Q3 2026
Data Classification: 7 payment + 42 supporter-data apps identified. Classification program not started.
Target: Q3 2026
Estimated Monthly
$362
Apr 2026 · 7 servers + 8 Cloud Run services
VPN (Fixed)
$43
Tunnel + Static IP · largest fixed cost
GCP VMs (3x)
$207
OAK (n2-std-2) + MAPLE (e2-std-2) + CEDAR (e2-std-4)
Budget Alert
$400
GCP Console → Billing → Budgets
Spend Forecast — Apr/May 2026
Cost Breakdown by Service
OAK (n2-standard-2)
$62.00
MAPLE (e2-standard-2)
$48.00
CEDAR (e2-standard-4)
$97.00
SSD Disks (6 × 50GB)
$51.00
Cloud NAT
$15.00
Cloud Run (8 services)
$8.00
Cloud Storage
$5.00
Networking + Egress
$25.00
VPN Tunnel
$36.00
Static IP
$7.00
Artifact Registry
$2.00
Other
$6.00
Cost Details
| Resource | SKU | Unit | Qty | Rate | Monthly |
|---|---|---|---|---|---|
| OAK — n2-standard-2 | oak (us-central1-a) | hr | 730 | $0.085 | $62.00 |
| MAPLE — e2-standard-2 | maple (us-central1-a) | hr | 730 | $0.067 | $48.00 |
| CEDAR — e2-standard-4 | cedar (us-central1-a) | hr | 730 | $0.134 | $97.00 |
| SSD Disks (6 × 50GB) | pd-ssd | GB | 300 | $0.170 | $51.00 |
| Cloud NAT | gpus-nat-gateway | hr | 730 | $0.020 | $15.00 |
| Cloud VPN Tunnel | gpus-vpn-tunnel-wdc | hr | 730 | $0.049 | $36.00 |
| Static IP | gpus-vpn-ip | hr | 730 | $0.010 | $7.00 |
| Networking + Egress | inter-region + internet | GB | ~150 | varies | $25.00 |
| Cloud Run (8 services) | mkdocs + status + security + soc + forms | req | ~2000 | $0.40/M | $8.00 |
| Cloud Storage | gpus-infra-backups-wdc + tf-state | GB | ~300 | $0.02 | $5.00 |
| Artifact Registry | gpus-images | GB | ~10 | $0.10 | $2.00 |
| Other (Logging, DNS, VPC) | misc | — | — | — | $6.00 |
| Total Estimated Monthly | | | | | $362.00 |
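The monthly total and the VM share quoted in the optimization notes can be cross-checked from the line items. The sketch below copies the Monthly column from the table above; the shortened labels and function names are hypothetical.

```shell
#!/bin/sh
# Monthly cost line items copied from the Cost Details table (USD).
# Format: "label amount"; labels are shortened for illustration.
cost_items() {
cat <<'EOF'
oak 62
maple 48
cedar 97
ssd_disks 51
cloud_nat 15
vpn_tunnel 36
static_ip 7
networking 25
cloud_run 8
cloud_storage 5
artifact_registry 2
other 6
EOF
}

# Sum every line item.
total_monthly() {
    cost_items | awk '{ sum += $2 } END { print sum }'
}

# Sum the three VMs plus their disks and print "<amount> <percent of total>".
vm_share_percent() {
    cost_items | awk '
        { total += $2 }
        $1 == "oak" || $1 == "maple" || $1 == "cedar" || $1 == "ssd_disks" { vm += $2 }
        END { printf "%d %.0f\n", vm, vm * 100 / total }'
}

total_monthly
vm_share_percent
```

With the figures as listed, the items sum to 362 and the VMs plus disks come to 258, or about 71% of the total.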
Cost Optimization Notes
✓ Cloud Run scales to zero — 8 services cost only ~$8/mo total
✓ Nearline storage — 50% cheaper than Standard for backup data
⚠ GCP VMs are the biggest cost — OAK (n2-std-2) + MAPLE (e2-std-2) + CEDAR (e2-std-4) + disks = $258/mo (71% of total)
✓ Single VPN tunnel — upgrade to HA VPN ($72/mo) if uptime SLA needed
✓ SOC dashboard live — soc.greenpeace.us deployed with Wazuh/OpenVAS/Lynis/AIDE/Fail2ban/Prometheus feeds
⚠ Budget alert set to $400/mo — review in GCP Console → Billing → Budgets
Yearly Total
0.018
tCO₂e · Dec 2024 – Mar 2026
Latest Month
0.008
▲ +167% MoM (Mar 2026)
Scope 2 (Market)
0.006
Purchased electricity
Scope 3
0.012
Value-chain indirect
Top Region
us-central1
0.014 tCO₂e (78%)
Monthly Emissions Trend (tCO₂e)
Emissions by Scope
Scope 1
0.001
Direct emissions from owned/controlled sources (e.g., on-site generators). Near zero for cloud-only infrastructure.
Scope 2 (Market)
0.006
Indirect emissions from purchased electricity powering GCP data centers. Reduced by Google's renewable energy purchases.
Scope 3
0.012
All other indirect emissions in the value chain — hardware manufacturing, cooling, network infrastructure, employee commuting to data centers.
Emissions by Region
Contextual Comparisons
Total yearly emissions (0.018 tCO₂e ≈ 18 kg CO₂e) are roughly equivalent to:
🚗
≈ 45
miles driven in an average gasoline car
📱
≈ 2,200
smartphone charges
🌳
< 1 tree
needed to absorb yearly emissions
💰
$0.27
to fully offset via carbon credits
Data source: GCP Carbon Footprint · Market-based emissions · Updated monthly
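The comparison figures imply a set of conversion factors that can be recovered by simple division. This sketch only back-calculates from the numbers shown above; it does not state which official factors the dashboard uses, and the function name is hypothetical.

```shell
#!/bin/sh
# Derive the conversion factors implied by the comparison figures.
# $1 = yearly emissions in tCO2e, $2 = miles driven,
# $3 = smartphone charges, $4 = offset cost in USD
implied_factors() {
    awk -v t="$1" -v miles="$2" -v charges="$3" -v usd="$4" 'BEGIN {
        kg = t * 1000
        printf "kg_total=%.0f\n", kg
        printf "kg_per_mile=%.2f\n", kg / miles
        printf "g_per_charge=%.1f\n", kg * 1000 / charges
        printf "usd_per_tonne=%.2f\n", usd / t
    }'
}

implied_factors 0.018 45 2200 0.27
```

For the values shown, this works out to roughly 0.40 kg CO₂e per mile, 8.2 g per smartphone charge, and an offset price of about $15 per tonne.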
Information Security Policy
GPUS-POL-001 · v1.0 · Effective: 2026-03-10 · Owner: IT Department · Classification: INTERNAL
1. Access Control
All access to infrastructure systems follows the principle of least privilege. Administrative access is restricted to named accounts over the management network (192.168.124.0/24) using SSH key-based authentication only. Root login is disabled on all servers. Service accounts are set to nologin.
| Server | Admin Account | Auth Method | Network |
|---|---|---|---|
| SKY / RAIN | dnsadmin | SSH key-only | 192.168.124.0/24 |
| SUN / WIND | monitadmin | SSH key-only | 192.168.124.0/24 |
| OAK / MAPLE / CEDAR | cloudadmin | SSH key-only | VPN (10.8.0.0/28) |
| GCP | rajesh.chhetry@greenpeace.us | OAuth + IAM | IAM roles |
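The key-only, no-root requirements can be spot-checked against an sshd configuration file. The sketch below is an assumption about how such a check might look: the function names are hypothetical, and it reads a single file only, so it ignores `Include` directives and `Match` blocks.

```shell
#!/bin/sh
# Return the first value set for a keyword in an sshd_config-style file.
# sshd uses the first value it finds for each keyword; keywords are case-insensitive.
sshd_setting() {
    awk -v key="$2" '
        /^[[:space:]]*#/ { next }
        tolower($1) == tolower(key) { print tolower($2); exit }' "$1"
}

# Succeeds only when root login and password authentication are both
# explicitly disabled in the given file.
ssh_key_only() {
    [ "$(sshd_setting "$1" PermitRootLogin)" = "no" ] &&
    [ "$(sshd_setting "$1" PasswordAuthentication)" = "no" ]
}

# Example: ssh_key_only /etc/ssh/sshd_config && echo "key-only access confirmed"
```

A missing `PasswordAuthentication` line is treated as non-compliant, which is the conservative reading because the sshd default permits passwords.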
2. Change Management
All configuration changes require: (1) backup of affected files, (2) validation before deployment, (3) AIDE baseline update after change, (4) entry in /var/log/asset-inventory.log, (5) DNSSEC re-signing if zone files changed. GCP changes must go through Terraform — no manual console changes.
3. Availability & Redundancy
DNS and DHCP services run in primary/secondary failover (SKY/RAIN). DHCP failover is automatic. DNS zone transfers via AXFR. Monitoring (SUN) and logging (WIND) are single-instance with daily backups. GCP services use Cloud Run with auto-scaling.
4. Logging & Monitoring
All WDC servers forward logs to WIND (on-prem) and CEDAR (GCP) via rsyslog (TCP:5140). All GCP VMs forward logs to CEDAR. Elasticsearch retains logs for 90 days with daily index rotation. Prometheus on SUN and MAPLE scrapes metrics every 15 seconds across all 7 servers. AIDE runs daily integrity scans on all 7 servers. auditd runs in immutable mode. VPC Flow Logs enabled in GCP.
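The forwarding described above could be expressed as rsyslog rules along these lines. This is a sketch that generates legacy-syntax rules; the function name is hypothetical, and the collector addresses are taken from the WIND and CEDAR server cards on the assumption that Logstash listens on those interfaces.

```shell
#!/bin/sh
# Print one legacy-syntax rsyslog forwarding rule per collector.
# "@@" selects TCP; a single "@" would select UDP.
# $1 = port, remaining args = collector addresses.
forwarding_rules() {
    port=$1
    shift
    for host in "$@"; do
        printf '*.* @@%s:%s\n' "$host" "$port"
    done
}

# WDC servers forward to both WIND (on-prem) and CEDAR (GCP).
forwarding_rules 5140 192.168.120.4 172.16.0.13
```

The output could be placed in a drop-in file under `/etc/rsyslog.d/` and followed by an rsyslog restart, then recorded per the change management steps.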
5. Password Policy
| Parameter | Value | CIS Control |
|---|---|---|
| Minimum length | 14 characters | CIS 5.4 |
| Maximum age | 90 days | CIS 5.4 |
| Lockout threshold | 5 failed attempts | CIS 5.4 |
| Lockout duration | 15 minutes | CIS 5.4 |
| Password history | 5 remembered | CIS 5.4 |
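The maximum-age row can be checked against `/etc/login.defs`, where `PASS_MAX_DAYS` sets the default for new accounts. The sketch below covers only that one parameter; the function names are hypothetical, and existing accounts would still need checking per user (for example with `chage -l`).

```shell
#!/bin/sh
# Read the value of a "KEY value" setting from a login.defs-style file.
login_defs_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Succeeds when PASS_MAX_DAYS is present and no greater than the policy maximum.
# $1 = file to inspect, $2 = maximum age in days (90 in the table above).
max_age_compliant() {
    days=$(login_defs_value "$1" PASS_MAX_DAYS)
    [ -n "$days" ] && [ "$days" -le "$2" ]
}

# Example: max_age_compliant /etc/login.defs 90 || echo "PASS_MAX_DAYS exceeds policy"
```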
Incident Response Plan
GPUS-IRP-001 · v1.0 · Effective: 2026-03-10 · Owner: IT Department · Classification: INTERNAL
1. Incident Classification
| Severity | Description | Response Time | Escalation | Examples |
|---|---|---|---|---|
| P1 Critical | Service outage or active breach | 15 min | IT Manager → CISO | Both DNS down, ransomware, data exfil |
| P2 High | Degraded service or confirmed intrusion attempt | 1 hr | IT Manager | Single DNS down, AIDE alert, Fail2ban flood |
| P3 Medium | Anomaly requiring investigation | 4 hr | IT Team | Unusual audit events, DNS query spike |
| P4 Low | Minor issue, no impact | 24 hr | IT Team | Config drift, routine Fail2ban bans |
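The response-time column lends itself to a simple lookup for alerting or reporting scripts. The sketch below encodes the table as written; the function names and the epoch-seconds interface are assumptions for illustration.

```shell
#!/bin/sh
# Map a severity label from the classification table to its response time in minutes.
response_minutes() {
    case "$1" in
        P1) echo 15 ;;
        P2) echo 60 ;;
        P3) echo 240 ;;
        P4) echo 1440 ;;
        *)  echo "unknown severity: $1" >&2; return 1 ;;
    esac
}

# Succeeds when an incident of severity $1, opened at $2 (epoch seconds),
# has exceeded its response window at time $3 (epoch seconds).
response_breached() {
    limit=$(response_minutes "$1") || return 2
    elapsed=$(( ($3 - $2) / 60 ))
    [ "$elapsed" -gt "$limit" ]
}

# Example: response_breached P1 "$opened_at" "$(date +%s)" && echo "escalate"
```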
2. Phase 1 — Detection
Detection sources: AIDE file integrity alerts, Fail2ban ban events, auditd rule triggers, Prometheus alert rules, Kibana dashboards, GCP Cloud Audit Logs.
## Check all detection sources
# AIDE
sudo aide --check
# Fail2ban
sudo fail2ban-client status sshd
# auditd — recent security events (one key per query)
sudo ausearch -ts recent -k dns-zone-change
sudo ausearch -ts recent -k dhcp-config
# Prometheus alerts
curl -s http://192.168.120.3:9090/api/v1/alerts | python3 -m json.tool
# Kibana — auth failures
# Open http://192.168.124.4:5601 → auth-logs-* index
3. Phase 2 — Containment
## Isolate compromised server (example: SKY)
# Option A: Block all traffic except failover
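# firewall-cmd has no bulk-removal flag, so remove the runtime rich rules one by one
sudo firewall-cmd --zone=drop --list-rich-rules | while IFS= read -r rule; do
  [ -n "$rule" ] && sudo firewall-cmd --zone=drop --remove-rich-rule="$rule"
done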
sudo firewall-cmd --zone=drop --add-rich-rule='rule family="ipv4" source address="192.168.120.2" accept'
# Option B: Shut down (RAIN takes over DNS/DHCP automatically)
sudo shutdown -h now
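# Preserve evidence BEFORE remediation (and before any shutdown)
sudo mkdir -p /var/log/incident/$(date +%F)
sudo cp /var/named/wdc.us.gl3.db* /var/log/incident/$(date +%F)/
# Pipe through sudo tee: a plain '>' redirect would run as the unprivileged user
sudo ausearch -ts today | sudo tee /var/log/incident/$(date +%F)/audit.txt > /dev/null
sudo aide --check 2>&1 | sudo tee /var/log/incident/$(date +%F)/aide.txt > /dev/null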
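Once evidence has been copied, recording checksums makes later tampering or corruption detectable. This is a minimal sketch, assuming GNU coreutils `sha256sum`; the function names and the `MANIFEST.sha256` file name are hypothetical and not part of the documented procedure.

```shell
#!/bin/sh
# Write a SHA-256 manifest for every file in an evidence directory.
# $1 = evidence directory (for example /var/log/incident/2026-03-10)
seal_evidence() {
    ( cd "$1" &&
      find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256 )
}

# Re-check the manifest; returns non-zero if any listed file has changed.
verify_evidence() {
    ( cd "$1" && sha256sum --quiet -c MANIFEST.sha256 )
}

# Example (run as root, since the directory is root-owned):
# seal_evidence "/var/log/incident/$(date +%F)"
```

Copying the manifest to a second host would add protection against someone altering both the evidence and its checksums.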
4. Phase 3 — Eradication
Identify root cause from logs. Remove malicious artifacts. Restore from known-good backup if files were modified. Re-apply CIS hardening if configuration was altered.
5. Phase 4 — Recovery
## Restore from backup
BACKUP_DATE="YYYY-MM-DD"
tar xzf /backup/dns-dhcp/dns-backup-${BACKUP_DATE}.tar.gz -C /tmp
named-checkzone wdc.us.gl3 /tmp/zones/wdc.us.gl3.db
sudo cp /tmp/zones/* /var/named/
sudo cp /tmp/dhcpd.conf /etc/dhcp/dhcpd.conf
## Re-sign DNSSEC
cd /var/named
sudo dnssec-signzone -A -3 $(head -c 500 /dev/urandom | sha1sum | cut -b 1-16) \
-N INCREMENT -o wdc.us.gl3 -t wdc.us.gl3.db
sudo rndc reload
sudo systemctl restart dhcpd
## Re-baseline AIDE
# aide --update exits non-zero when it detects changes, so do not chain with &&
sudo aide --update
sudo mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
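Before copying restored files into place, it can help to confirm that the archive is readable and contains what the restore steps expect. The sketch below assumes the archive stores members as `zones/...` and `dhcpd.conf` without a leading `./`, inferred from the `/tmp/zones` and `/tmp/dhcpd.conf` paths above; the function name is hypothetical.

```shell
#!/bin/sh
# Confirm a backup archive is readable and contains the expected members.
# $1 = archive path; remaining args = member paths that must be present.
archive_has_members() {
    archive=$1
    shift
    listing=$(tar tzf "$archive") || return 1
    for member in "$@"; do
        printf '%s\n' "$listing" | grep -Fxq "$member" || {
            echo "missing from archive: $member" >&2
            return 1
        }
    done
}

# Example:
# archive_has_members "/backup/dns-dhcp/dns-backup-${BACKUP_DATE}.tar.gz" \
#     zones/wdc.us.gl3.db dhcpd.conf
```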
6. Phase 5 — Post-Incident
Post-incident report due within 72 hours. The report covers root cause analysis, timeline, affected systems, remediation actions, lessons learned, and process improvements. All evidence is preserved in /var/log/incident/.
Contacts
| Role | Contact | Escalation |
|---|---|---|
| IT Infrastructure Lead | Rajesh Chhetry | First responder for all incidents |
| IT Manager | — | P1/P2 escalation within 15min/1hr |
| CISO | — | P1 escalation, breach notification |
Disaster Recovery Plan
GPUS-DRP-001 · v1.0 · Effective: 2026-03-10 · Owner: IT Department · Classification: INTERNAL
1. Recovery Objectives
| System | RTO | RPO | Recovery Method |
|---|---|---|---|
| DNS (SKY/RAIN) | 5 min | 0 (real-time failover) | Automatic — RAIN takes over |
| DHCP (SKY/RAIN) | 30 sec | 0 (real-time failover) | Automatic — failover peer |
| Monitoring (SUN) | 1 hr | 15 sec (scrape interval) | ESXi snapshot restore |
| Logging (WIND) | 1 hr | 24 hr (daily backup) | ESXi snapshot + backup restore |
| Cloud VPN | 15 min | N/A | Terraform redeploy |
| Cloud Run | 5 min | N/A | Auto-healing by GCP |
2. Disaster Scenarios
Scenario 1: Single server failure (SKY or RAIN)
Impact: Minimal — failover is automatic. RAIN serves DNS/DHCP if SKY is down and vice versa. Restore failed server from ESXi snapshot within 1 hour.
Scenario 2: Both DNS/DHCP servers down
## Emergency: Deploy from backup on any Rocky Linux 8 box
tar xzf /backup/dns-dhcp/dns-backup-LATEST.tar.gz -C /tmp
dnf install -y bind dhcp-server
cp /tmp/zones/* /var/named/
cp /tmp/dhcpd.conf /etc/dhcp/
cp /tmp/named.conf /etc/
systemctl start named dhcpd
Scenario 3: ESXi host failure
All 4 WDC VMs lost. GCP VMs (OAK/MAPLE/CEDAR) remain operational. Rebuild WDC from backups on replacement ESXi host. Total rebuild time: ~4 hours following the deployment guides (sky-rain + sun-wind docs).
Scenario 4: WDC site loss (fire, flood)
GCP services remain operational. Backups in GCS bucket. DNS can be redirected at Hover. Rebuild on-prem at DR site using GCS backups + Terraform + deployment guides.
Scenario 5: Cloud VPN tunnel down
## Check tunnel status
gcloud compute vpn-tunnels describe gpus-vpn-tunnel-wdc --region=us-central1
## If ESTABLISHED lost — check Meraki side first
# Meraki Dashboard → Security & SD-WAN → VPN Status
## Redeploy VPN via Terraform if needed
cd ~/terraform/gpus-infra/terraform
terraform apply -target=google_compute_vpn_tunnel.wdc_tunnel
3. Backup Schedule
| Data | Frequency | Location | Retention |
|---|---|---|---|
| DNS zone files | Daily cron | /backup + GCS (planned) | 90 days |
| DHCP config + leases | Daily cron | /backup + GCS (planned) | 90 days |
| ES snapshots | Daily | /backup + GCS (planned) | 90 days |
| Prometheus TSDB | Daily | /backup | 90 days |
| ESXi VM snapshots | Weekly | Local datastore | 4 snapshots |
| Terraform state | Every apply | GCS (gpus-infra-tf-state) | 5 versions |
4. DR Testing Schedule
Quarterly DR tests: Q1 (DNS failover), Q2 (full server restore from backup), Q3 (site failover simulation), Q4 (full tabletop exercise). Results documented and reviewed by IT Manager.
Change Log
2026-04-07
soc.greenpeace.us deployed — SOC dashboard live with Wazuh/OpenVAS/Lynis/AIDE/Fail2ban/Prometheus feeds [GCP]
2026-03-10 13:51
chronyd fix: denyall → deny all on SUN + WIND · AIDE re-baselined [CONFIG]
2026-03-10 13:38
SUN + WIND rebooted — auditd immutable active · CIS 48/48 + 51/51 [CIS]
2026-03-10 11:15
Cloud VPN ESTABLISHED — 130.211.194.72 ↔ 38.140.146.68 [GCP]
2026-03-10 11:00
GCP infra deployed — VPC, VPN, Cloud Run ×2, GCS ×2, Artifact Registry (19 resources) [GCP]
2026-03-10 10:55
RAIN DHCP updated — 112 reservations · failover normal · AIDE re-baselined [DHCP]
2026-03-10 10:49
SKY DNS/DHCP bulk update — 112 workstations · serial 2026031002 · DNSSEC signed [DNS] [DHCP]
2026-03-10 10:30
GCP project gpus-infra created · billing linked · APIs enabled [GCP]
AIDE Baselines
| Server | Baseline | Reason | Status |
|---|---|---|---|
| SKY | 2026-03-10 10:49 | DNS/DHCP bulk update | ✓ |
| RAIN | 2026-03-10 10:55 | DHCP update | ✓ |
| SUN | 2026-03-10 13:51 | chronyd fix + reboot | ✓ |
| WIND | 2026-03-10 13:52 | chronyd fix + reboot | ✓ |
DNSSEC History
| Date | Serial | Sigs | KSK | ZSK |
|---|---|---|---|---|
| 2026-03-10 | 2026031002 | 280 | +008+37075 | +008+06660 |
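The serial 2026031002 appears to follow the common YYYYMMDDNN convention (date plus a two-digit revision). Assuming that convention holds, the next serial could be computed as sketched below; the function name is hypothetical, and the sketch does not handle more than 99 revisions in one day.

```shell
#!/bin/sh
# Compute the next zone serial in YYYYMMDDNN form.
# $1 = current serial, $2 = today's date as YYYYMMDD.
next_serial() {
    current=$1
    today=$2
    if [ "${current%??}" = "$today" ]; then
        # Same day: bump the two-digit revision counter.
        echo $(( current + 1 ))
    else
        # Different day: start again at revision 01.
        echo "${today}01"
    fi
}

# Example: next_serial 2026031002 "$(date +%Y%m%d)"
```

Note that `dnssec-signzone -N INCREMENT` simply adds one to the existing serial, so the two approaches agree only for changes made on the same day.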
Terraform History
| Date | Action | Resources | Project |
|---|---|---|---|
| 2026-03-10 | Initial deploy | 19 created | gpus-infra |
Document Versions
| Document | Version | Updated |
|---|---|---|
| sky-rain-dns-dhcp-infrastructure.md | v2.2 | 2026-03-10 |
| sun-wind-monitoring-logging.md | v1.1 | 2026-03-10 |
| wdc-infrastructure-architecture-overview.md | v1.2 | 2026-03-10 |
| wdchostregistry.csv (IAR) | v2.0 | 2026-03-10 |
| gpus-it-architecture.html | v2.1 | 2026-03-10 |
| gcp-cloud-infrastructure.md | v1.5 | 2026-03-10 |