aXion1337.Chat – Task List & Milestones
Status overview: [✅ 6 Completed] [🔄 1 In Progress] [📋 15+ Pending] [🔒 10 Security]
✅ Completed Tasks (Chronological)
Phase 1: Base Setup
- Set up K3S cluster – single node on Hetzner Cloud (49.13.132.245)
  - Commit: `initial-setup` (pre-project)
  - Status: ✅ Running
- Flux CD installation
  - SOPS + age encryption
  - Configure GitOps repository
  - Commit: `setup-flux` (pre-project)
  - Status: ✅ Running
- Element Server Suite v26.4.0 deployment
  - Synapse homeserver (`matrix.axion1337.chat`)
  - Matrix Authentication Service (`account.axion1337.chat`)
  - Element Web (`axion1337.chat`)
  - Element Admin (`admin.axion1337.chat`)
  - MatrixRTC/Element Call (`mrtc.axion1337.chat`)
  - Commit: `deploy-ess-matrix-stack`
  - Status: ✅ Running
Phase 2: Core Features
- 7 custom Element Web themes
  - aXion1337 Dark, Deep Purple, Discord Dark, Electric Blue, Everforest, Gruvbox, Wal
  - Sorted alphabetically
  - Commit: `add-custom-element-themes`
  - Status: ✅ Deployed
- Element Desktop setup scripts (Windows/macOS/Linux)
  - Automatic download, install, and configuration
  - Hosted at `axion1337.chat/docs/setup/`
  - Commits: `add-element-desktop-setup-scripts`, `fix-element-setup-script-hosting`
  - Status: ✅ Deployed
- Room policies
  - Message retention (allowed lifetime 1d–1y)
  - Room publication rules (allow all)
  - Auto-join rooms for onboarding
  - Commit: `add-synapse-retention-publication-autojoin`
  - Status: ✅ Deployed
Phase 3: WebRTC & Media Transport
- TURN server (coturn) for video calls
  - Domain: `turn.axion1337.chat`
  - HMAC authentication with a shared secret
  - Ports: 3478/udp, 3478/tcp, 5349/tcp, 49152-65535/udp
  - Commit: `implement-turn-server-coturn-for-webrtc-video-calls`
  - Status: ✅ Deployed
  - Manual: create the DNS A record and open the firewall ports (still required)
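HMAC authentication works like this. coturn (`use-auth-secret` + `static-auth-secret`) and Synapse (`turn_shared_secret`) share a single secret. Synapse gives each client short-lived credentials derived from that secret. The Python sketch below shows this TURN REST API scheme, assuming the standard HMAC-SHA1/base64 derivation. The function names, the user ID and the secret are illustrative only.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def _sign(shared_secret: str, username: str) -> str:
    # password = base64(HMAC-SHA1(shared_secret, username))
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def turn_credentials(shared_secret: str, user_id: str, lifetime_s: int = 86400,
                     now: Optional[int] = None) -> Tuple[str, str]:
    """Issue time-limited credentials: username = "<expiry unix ts>:<user>"."""
    issued_at = int(time.time()) if now is None else now
    username = f"{issued_at + lifetime_s}:{user_id}"
    return username, _sign(shared_secret, username)


def verify(shared_secret: str, username: str, password: str, now: Optional[int] = None) -> bool:
    """What the TURN server checks: expiry not passed and HMAC matches."""
    current = int(time.time()) if now is None else now
    expiry = int(username.split(":", 1)[0])  # Matrix IDs contain ':' too, so split once
    if expiry < current:
        return False
    return hmac.compare_digest(_sign(shared_secret, username), password)
```

Because the expiry is embedded in the signed username, coturn needs no user database. Rotating the shared secret invalidates every outstanding credential at once.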
Phase 4: Monitoring & Observability
- Monitoring stack integration
  - Grafana Alloy (successor to Grafana Agent) as collector
  - Remote write to Selendis (Prometheus at 10.0.0.3:9090, Loki at 10.0.0.3:3100)
  - kube-state-metrics, node-exporter DaemonSet
  - Commits: `integrate-monitoring-alloy-prometheus-loki`, `fix-prometheus-remote-write-docker`
  - Status: ✅ Deployed
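When log shipping seems broken, it can help to bypass Alloy and push a single line to Loki directly. The sketch below makes several assumptions:
- Loki's JSON push endpoint `/loki/api/v1/push` on Selendis is reachable from the node.
- The endpoint accepts unauthenticated requests.
- The label `job=remote-write-smoke-test` is made up for this example.

```python
import json
import time
import urllib.request
from typing import Dict, Optional

# Assumed endpoint: Loki on Selendis (address and port from this document).
LOKI_URL = "http://10.0.0.3:3100/loki/api/v1/push"


def build_push_payload(labels: Dict[str, str], line: str, ts_ns: Optional[int] = None) -> bytes:
    """Encode one log line in Loki's JSON push format (timestamp: ns since epoch, as a string)."""
    ts = time.time_ns() if ts_ns is None else ts_ns
    body = {"streams": [{"stream": labels, "values": [[str(ts), line]]}]}
    return json.dumps(body).encode("utf-8")


def push_test_line(url: str = LOKI_URL) -> int:
    """Send one smoke-test line; Loki typically answers 204 No Content on success."""
    payload = build_push_payload({"job": "remote-write-smoke-test"}, "hello from k3s")
    req = urllib.request.Request(url, data=payload, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:  # raises HTTPError on 4xx/5xx
        return resp.status
```

If `{job="remote-write-smoke-test"}` then shows up in Grafana, the network path and Loki are fine, and the problem is on the Alloy side.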
Phase 5: Identity Provider (Authentik)
- Authentik Stage 1 deployment
  - HelmRelease v2026.x in the `authentik` namespace
  - Embedded PostgreSQL, Alloy-compatible
  - cert-manager for TLS
  - Commit: `deploy-authentik-as-identity-provider-for-matrix-stage-1`
  - Status: ✅ Deployed
  - Manual: set the admin password and create the OIDC provider (required)
- 🔄 [IN PROGRESS] Authentik Stage 2 – MAS integration
  - MAS upstream OIDC configuration
  - Copy the client ID/secret from the Authentik Admin UI
  - Insert `upstream_oauth2_config` into `mas-secret.yaml`
  - Set `passwords: enabled: false`
  - Commit: (pending)
  - Status: ⏳ Waiting for manual Authentik configuration
Phase 6: Documentation
- Create deployment guides
  - 5 Markdown files in `docs/deployment-guides/`
  - Ordered chronologically
  - Troubleshooting + best practices
  - Commit: `add-comprehensive-deployment-configuration-documentation`
  - Status: ✅ Deployed
🔄 In Progress / Blocked
Authentik Stage 2 – MAS Integration (⏳ Depends on Manual Config)
Description: The Authentik OIDC provider must be configured manually in the Authentik Admin UI before Stage 2 can be deployed.
Steps:
- ✅ Authentik Stage 1 deployment (done)
- ⏳ Authentik Admin UI: create the OIDC provider (MANUAL - user action)
- ⏳ Authentik Admin UI: create an application with slug `matrix` (MANUAL - user action)
- ⏳ Authentik Admin UI: set up an enrollment flow with an invitation stage (MANUAL - user action)
- ⏳ Authentik Admin UI: copy the client ID + secret (MANUAL - user action)
- 📋 Update MAS `upstream_oauth2_config` with the client credentials
- 📋 Set `passwords: enabled: false`
- 📋 Commit + push
Blocker: manual Authentik configuration (waiting on user)
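Once the client ID and secret exist, the Stage 2 change is mostly a config fragment. It is sketched here in Python under these assumptions:
- The keys follow MAS's `upstream_oauth2.providers` and `passwords` configuration sections.
- MAS requires the provider `id` to be a ULID.
- The issuer follows Authentik's `/application/o/<slug>/` pattern with the `matrix` slug.
- `auth.axion1337.chat` is a hypothetical hostname.

How the fragment is embedded into `upstream_oauth2_config` in `mas-secret.yaml` depends on the repo and is not shown.

```python
import json
import secrets
import time
from typing import Any, Dict, Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # ULID / Crockford base32 alphabet


def new_ulid(ts_ms: Optional[int] = None) -> str:
    """48-bit millisecond timestamp + 80 random bits, as 26 Crockford base32 chars."""
    ts = time.time_ns() // 1_000_000 if ts_ms is None else ts_ms
    value = (ts << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def mas_upstream_config(authentik_host: str, slug: str, client_id: str,
                        client_secret: str, provider_id: Optional[str] = None) -> Dict[str, Any]:
    """Assumed MAS config shape: one Authentik upstream provider, local passwords off."""
    return {
        "upstream_oauth2": {"providers": [{
            "id": provider_id or new_ulid(),  # keep this stable once users have logged in
            "issuer": f"https://{authentik_host}/application/o/{slug}/",
            "client_id": client_id,
            "client_secret": client_secret,
            "token_endpoint_auth_method": "client_secret_basic",
            "scope": "openid profile email",
        }]},
        "passwords": {"enabled": False},
    }


if __name__ == "__main__":
    # JSON is valid YAML; placeholders must be replaced with the values from the Authentik UI.
    cfg = mas_upstream_config("auth.axion1337.chat", "matrix", "<client-id>", "<client-secret>")
    print(json.dumps(cfg, indent=2))
```

The provider ID should be generated once and committed, not regenerated on every render. MAS links upstream identities to it.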
📋 Backlog (Further Tasks)
Authentik Completion
- Finish Authentik Stage 2 – MAS integration
  - Prerequisites: Authentik OIDC provider fully configured
  - Task: update `mas-secret.yaml` and disable password login
  - Commit: `enable-authentik-oidc-integration-in-mas`
  - Est. Effort: 30 min (manual + scripted)
- Test end-to-end login flow
  - Element Web login → MAS → Authentik → Matrix user creation
  - Create test users via Authentik
  - Verify the password reset flow
  - Commit: (implicit in Stage 2)
  - Est. Effort: 20 min
- Create invite links for new users
  - Authentik Admin UI → Invitations → Create
  - Set expiry dates (7d) + use limits
  - Document the procedure
  - Est. Effort: 15 min
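If invites are needed in bulk, the UI steps can also be scripted against Authentik's REST API. This is a sketch under these assumptions:
- The invitation endpoint is `/api/v3/stages/invitation/invitations/`.
- It accepts `name`, `expires`, `single_use` and `fixed_data` and returns the new invitation's `pk`.
- Links use the `?itoken=` query parameter.

The hostname and the flow slug `matrix-enrollment` are hypothetical. `single_use` is used here as the "use limit". Check whether your Authentik version offers anything finer.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

AUTHENTIK = "https://auth.axion1337.chat"  # hypothetical hostname


def invitation_payload(name: str, days_valid: int = 7, single_use: bool = True,
                       now: Optional[datetime] = None) -> Dict[str, Any]:
    """Request body for a new invitation that expires after `days_valid` days."""
    issued = now or datetime.now(timezone.utc)
    return {"name": name, "expires": (issued + timedelta(days=days_valid)).isoformat(),
            "single_use": single_use, "fixed_data": {}}


def create_invitation(api_token: str, payload: Dict[str, Any], base: str = AUTHENTIK) -> str:
    """POST the invitation (assumed endpoint) and return its primary key."""
    req = urllib.request.Request(
        f"{base}/api/v3/stages/invitation/invitations/",
        data=json.dumps(payload).encode("utf-8"), method="POST",
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["pk"]


def invite_link(flow_slug: str, invitation_pk: str, base: str = AUTHENTIK) -> str:
    """Enrollment URL handed to the user (assumed ?itoken= parameter)."""
    return f"{base}/if/flow/{flow_slug}/?itoken={invitation_pk}"
```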
Element Call Enhancement
- Element Call fork for custom constraints
  - Repository: fork `element-hq/element-call`
  - Feature: video/audio constraints parameter in the config
  - Include: bandwidth limiting, resolution limits, frame rate control
  - Integration with Synapse well-known
  - Est. Effort: 2–3 days (fork + feature + test)
  - Priority: HIGH (user feature)
Database Hardening
- External/dedicated PostgreSQL deployment
  - Option 1: CloudNativePG operator (open source, runs on K3S)
  - Option 2: managed Hetzner Postgres
  - Separate from the embedded Postgres of the ESS matrix-stack
  - HA + replication
  - Est. Effort: 1–2 days
  - Priority: HIGH (reliability)
- Database backup strategy
  - Daily automated backups (pgBackRest or Velero)
  - Off-site backup storage (S3 / Hetzner Storage Box)
  - Monthly verified restores (test restore → verify data integrity)
  - Backup + restore documentation
  - Est. Effort: 2–3 days
  - Priority: CRITICAL (disaster recovery)
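Backups only help if they are recent and actually restorable. Below is a tool-agnostic sketch of an alerting check for the two rules above (daily backups, monthly restore tests). Where the timestamps come from is left open: pgBackRest info, the Velero backup list, or a restore log. The 26-hour slack is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_BACKUP_AGE = timedelta(hours=26)       # daily schedule + 2h slack (assumption)
MAX_RESTORE_TEST_AGE = timedelta(days=31)  # "monthly verified restores"


def backup_policy_violations(backup_times: List[datetime],
                             last_restore_test: Optional[datetime],
                             now: datetime) -> List[str]:
    """Return human-readable problems; an empty list means the policy is met."""
    problems = []
    if not backup_times:
        problems.append("no backups found")
    elif now - max(backup_times) > MAX_BACKUP_AGE:
        problems.append(f"latest backup is {now - max(backup_times)} old")
    if last_restore_test is None:
        problems.append("restore has never been tested")
    elif now - last_restore_test > MAX_RESTORE_TEST_AGE:
        problems.append("restore test overdue")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(backup_policy_violations([now - timedelta(hours=3)], now - timedelta(days=10), now))
```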
- Synapse media PVC backups
  - Separate backup pipeline for the `/data/media_store` PVC
  - Reason: media is often >100 GB and should not be part of the DB backup
  - Velero + Restic for file-level backup of the volume
  - Est. Effort: 1 day
  - Priority: HIGH (data preservation)
Network Security
- NetworkPolicies – K8s-layer segmentation
  - Default-deny ingress for the `matrix` namespace
  - Allow rules:
    - Ingress → MAS:443
    - Ingress → ElementWeb:443
    - MAS ↔ Synapse:8008
    - Synapse ↔ Postgres:5432
    - Authentik → Postgres:5432
    - Authentik → Loki:3100 (monitoring)
  - Egress: Matrix-specific (federation, etc.)
  - Est. Effort: 1 day
  - Priority: MEDIUM (compliance, least privilege)
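- Pod Security Admission (restricted)
  - Apply to the `matrix` and `authentik` namespaces
  - Enforces: non-root, no privileged containers, no privilege escalation, all capabilities dropped, seccomp `RuntimeDefault`
  - A read-only root filesystem is not part of the restricted profile; set `readOnlyRootFilesystem: true` per workload
  - Test: ensure no chart breakage
  - Est. Effort: 1 day
  - Priority: MEDIUM (hardening)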
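PSA is enabled per namespace through labels such as `pod-security.kubernetes.io/enforce: restricted`. Setting the `warn` and `audit` labels first shows what would break. To pre-check rendered chart output offline, here is a rough sketch of a subset of the restricted checks. It is not the full Pod Security Standards: volume types and ephemeral containers are ignored.

```python
from typing import Any, Dict, List

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def restricted_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """Approximate a subset of the PSS 'restricted' profile for one pod spec."""
    problems = []
    pod_sc = pod_spec.get("securityContext", {})
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag):
            problems.append(f"pod: {flag} is true")
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged")
        if sc.get("allowPrivilegeEscalation") is not False:  # must be explicitly false
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # Container-level settings override pod-level ones.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("runAsUser", pod_sc.get("runAsUser")) == 0:
            problems.append(f"{name}: runAsUser must not be 0")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {})).get("type")
        if seccomp not in ALLOWED_SECCOMP:
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
        caps = sc.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f"{name}: capabilities must drop ALL")
        if set(caps.get("add", [])) - {"NET_BIND_SERVICE"}:
            problems.append(f"{name}: only NET_BIND_SERVICE may be added")
    return problems
```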
Federation & Access Control
- Federation allowlist or closed federation
  - Decision: which servers to federate with?
  - If allowlist: explicit `federation_domain_whitelist`
  - If closed: `allow_public_rooms_without_join_rules: false`
  - Synapse config in `synapse-values.yaml`
  - Est. Effort: 4 hours
  - Priority: MEDIUM (security policy)
Moderation & Anti-Abuse
- Mjolnir/Draupnir bot deployment
  - Open-source moderation bot for Matrix
  - Reason: registration is invitation-based, but federation can still bring in spam
  - Auto-ban known bad servers/users
  - Spam-detection rules
  - Helm chart or custom Deployment
  - Est. Effort: 1–2 days
  - Priority: MEDIUM (ops safety)
- Content scanner for media
  - matrix-content-scanner + ClamAV antivirus
  - Scan uploaded media for malware
  - Block suspicious files
  - Est. Effort: 1–2 days
  - Priority: LOW–MEDIUM (optional but good practice)
Secrets Management
- External Secrets Operator or SOPS for Flux
  - Current: SOPS with age encryption
  - Consideration: External Secrets for cloud-native backends (AWS Secrets Manager, Hetzner Vault, etc.)
  - Or: improve the SOPS rotation strategy
  - Decision needed: keep SOPS or upgrade?
  - Est. Effort: 2–3 days (if switching)
  - Priority: LOW (current SOPS setup works)
Image & Dependency Management
- Renovate / Dependabot setup
  - Auto-update Helm chart versions
  - Auto-update container image tags
  - Monitor for security patches
  - Est. Effort: 4 hours
  - Priority: MEDIUM (maintenance)
- Trivy image scanning
  - Scan images in Flux HelmReleases for CVEs
  - Block deployment if a critical CVE is found
  - CI/CD hook in the Git workflow
  - Est. Effort: 8 hours
  - Priority: LOW–MEDIUM (security posture)
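Trivy can gate on its own: `trivy image --exit-code 1 --severity CRITICAL <image>` exits non-zero when critical findings exist. A custom gate only becomes useful when the JSON report (`--format json`) is post-processed, for example to add per-image exceptions later. Below is a minimal sketch of that gate over the report's `Results[].Vulnerabilities[]` entries.

```python
import json
import sys
from typing import Any, Dict, List, Sequence


def critical_findings(report: Dict[str, Any], severities: Sequence[str] = ("CRITICAL",)) -> List[str]:
    """List 'target: CVE (package)' for every vulnerability with a blocking severity."""
    found = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:  # key may be missing or null
            if vuln.get("Severity") in severities:
                found.append(f'{result.get("Target")}: {vuln.get("VulnerabilityID")} ({vuln.get("PkgName")})')
    return found


def main(path: str) -> int:
    """Exit code 1 blocks the pipeline step when critical findings exist."""
    with open(path, encoding="utf-8") as fh:
        findings = critical_findings(json.load(fh))
    for line in findings:
        print(line)
    return 1 if findings else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```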
- Monitor ESS & Element security advisories
  - Subscribe to the `element-hq` security mailing list
  - Monitor `#matrix-community` security channels
  - Auto-alerts on new CVEs/patches
  - Est. Effort: ongoing (low maintenance)
  - Priority: MEDIUM (security awareness)
Container Security
- Disable `automountServiceAccountToken` everywhere
  - Audit all Deployments/StatefulSets
  - Disable for: Synapse, ElementWeb, MAS, Postgres, Authentik (where not needed)
  - Add `automountServiceAccountToken: false` to `spec.template.spec`
  - Test: ensure no breakage
  - Est. Effort: 4 hours
  - Priority: MEDIUM (least privilege)
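For auditing rendered manifests (for example parsed `helm template` output), the change boils down to one field on the pod template. Below is a minimal sketch. Because Flux reconciles from Git, the real change should land as a Kustomize patch or Helm value in the repo. A live `kubectl patch` would be reverted.

```python
import copy
from typing import Any, Dict

# Kinds with a pod template at spec.template.spec.
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet"}


def disable_token_automount(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with spec.template.spec.automountServiceAccountToken = false.

    Non-workload objects are returned unchanged; the input is never mutated.
    """
    if manifest.get("kind") not in WORKLOAD_KINDS:
        return manifest
    patched = copy.deepcopy(manifest)
    pod_spec = patched.setdefault("spec", {}).setdefault("template", {}).setdefault("spec", {})
    pod_spec["automountServiceAccountToken"] = False
    return patched
```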
🔒 Security Hardening (Host & Cluster Level)
Host OS Layer (Ubuntu/Debian)
- Hetzner Cloud Firewall
  - Default-deny inbound
  - Allow: 80/443 (HTTP/HTTPS)
  - Allow: 22 (SSH) from your IP only (or via WireGuard/Tailscale)
  - Allow: TURN ports from Phase 3 (3478/udp, 3478/tcp, 5349/tcp, 49152-65535/udp)
  - Status: ✅ Can be done in the Hetzner UI
  - Est. Effort: 30 min
  - Priority: CRITICAL (immediate, zero config cost)
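Below is a sketch of the rule set as a Hetzner Cloud API payload, assuming the firewall endpoints use the `direction`/`protocol`/`port`/`source_ips` rule format. Once the firewall is attached, inbound traffic that matches no rule is dropped, which provides the default-deny. The TURN rows come from Phase 3. The admin IP is a documentation placeholder.

```python
import json
from typing import Dict, List

ANY = ["0.0.0.0/0", "::/0"]
ADMIN_IPS = ["203.0.113.10/32"]  # placeholder (documentation range): replace with your IP


def rule(protocol: str, port: str, sources: List[str], description: str) -> Dict[str, object]:
    """One inbound rule; port is a string so ranges like '49152-65535' work."""
    return {"direction": "in", "protocol": protocol, "port": port,
            "source_ips": sources, "description": description}


RULES = [
    rule("tcp", "80", ANY, "HTTP"),
    rule("tcp", "443", ANY, "HTTPS"),
    rule("tcp", "22", ADMIN_IPS, "SSH (admin only)"),
    rule("udp", "3478", ANY, "TURN"),
    rule("tcp", "3478", ANY, "TURN"),
    rule("tcp", "5349", ANY, "TURN over TLS"),
    rule("udp", "49152-65535", ANY, "TURN relay range"),
]

if __name__ == "__main__":
    # Assumed body for the Hetzner Cloud API "set rules" action on an existing firewall.
    print(json.dumps({"rules": RULES}, indent=2))
```

Port 6443 is deliberately absent, which keeps the K3S API unreachable from the internet.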
- SSH hardening
  - Disable password auth (key-only): `PasswordAuthentication no`
  - Disable root login: `PermitRootLogin no`
  - Limit login attempts: `MaxAuthTries 3`
  - Optional: change the SSH port (cosmetic, reduces log noise)
  - Optional: put SSH behind WireGuard/Tailscale (removes the need for fail2ban on SSH)
  - Est. Effort: 2 hours
  - Priority: HIGH (immediate)
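On the host, `sshd -T` prints the effective configuration and is the authoritative check. For linting a config file before rollout, here is a small sketch that applies sshd's "first value wins" rule to global settings. It stops at the first `Match` block and does not follow `Include` files. On recent Ubuntu, `/etc/ssh/sshd_config.d/*.conf` is included first and can silently re-enable password auth.

```python
from typing import Dict

REQUIRED = {"permitrootlogin": "no", "passwordauthentication": "no", "maxauthtries": "3"}


def effective_settings(text: str) -> Dict[str, str]:
    """Global sshd_config settings; keywords are case-insensitive and the first value wins."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key == "match":  # everything after this is conditional
            break
        settings.setdefault(key, parts[1].strip() if len(parts) > 1 else "")
    return settings


def audit(text: str) -> Dict[str, str]:
    """Return {option: actual value} for each required option not set as expected."""
    current = effective_settings(text)
    return {k: current.get(k, "<unset>") for k, v in REQUIRED.items()
            if current.get(k, "").lower() != v}
```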
- unattended-upgrades
  - Enable automatic security updates
  - Configure (typically in `/etc/apt/apt.conf.d/20auto-upgrades`):
    - `APT::Periodic::Update-Package-Lists "1";`
    - `APT::Periodic::Unattended-Upgrade "1";`
    - `APT::Periodic::AutocleanInterval "7";`
  - Est. Effort: 30 min
  - Priority: HIGH (set & forget)
- K3S API security
  - Current: the K3S API listens on :6443 on all interfaces (default)
  - Hardening:
    - Option 1: restrict :6443 to localhost via the firewall
    - Option 2: set K3S `--bind-address` + `--advertise-address` to the WireGuard IP
    - Option 3: kubectl access only via a jump host/bastion
  - Est. Effort: 2 hours
  - Priority: HIGH (the API is a high-value target)
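Whichever option is chosen, the result is easy to verify from a machine outside the cluster network: the API port should no longer accept TCP connections. Below is a minimal sketch. It only tests reachability, not authentication. The address to probe would be the server's public IP from Phase 1.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds.

    A firewall DROP shows up as a timeout and a closed port as 'connection refused';
    both count as not reachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes timeouts and ConnectionRefusedError
        return False


if __name__ == "__main__":
    # Run from outside the server; expected to print False after hardening.
    print(tcp_port_open("49.13.132.245", 6443))
```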
- auditd for file integrity & syscall auditing
  - Monitor: /etc, ~/.kube, /var/lib/rancher/k3s (audit rules need absolute paths, so `~/.kube` must be written out, e.g. `/root/.kube`)
  - Audit rules for sensitive file changes
  - Low overhead, good signal-to-noise ratio
  - Output to syslog / centralized logging
  - Est. Effort: 2 hours
  - Priority: MEDIUM (forensics + compliance)
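auditd file watches use the form `-w <path> -p <perms> -k <key>`, where `wa` records writes and attribute changes. Below is a sketch that renders such rules. It makes two assumptions:
- kubectl runs as root, so `~/.kube` becomes `/root/.kube`.
- The K3S watch is narrowed to `/var/lib/rancher/k3s/server`, because the agent side also holds container image data and would flood the log.

```python
from typing import Dict

# Path -> audit key (searchable later with `ausearch -k <key>`). Paths are assumptions.
WATCHES: Dict[str, str] = {
    "/etc": "etc_changes",
    "/root/.kube": "kubeconfig_changes",          # ~ is not expanded in audit rules
    "/var/lib/rancher/k3s/server": "k3s_server_changes",
}


def watch_rules(watches: Dict[str, str], perms: str = "wa") -> str:
    """Render one '-w <path> -p <perms> -k <key>' line per watched path."""
    return "".join(f"-w {path} -p {perms} -k {key}\n" for path, key in watches.items())


if __name__ == "__main__":
    # e.g. save as /etc/audit/rules.d/50-k3s-hardening.rules, then load with `augenrules --load`
    print(watch_rules(WATCHES), end="")
```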
- Kernel hardening (sysctl)
  - Apply hardening recommendations from Lynis
  - Key settings:
    - kernel.kptr_restrict=2 (hide kernel pointers)
    - kernel.dmesg_restrict=1 (restrict dmesg)
    - net.ipv4.tcp_syncookies=1 (SYN flood protection)
    - net.ipv4.conf.all.rp_filter=1 (reverse path filtering)
    - net.ipv4.conf.all.send_redirects=0
    - net.ipv6.conf.all.disable_ipv6=0 (or =1 if no IPv6 is needed)
  - Persist via /etc/sysctl.d/99-hardening.conf
  - Est. Effort: 2 hours
  - Priority: MEDIUM (defense in depth)
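The settings above can be kept as a single mapping that produces both the persistent file and a drift check against the running kernel. Each sysctl key maps to a file under `/proc/sys`, with dots replaced by slashes. After writing the file, `sysctl --system` loads all `sysctl.d` files. Below is a minimal sketch.

```python
from pathlib import Path
from typing import Dict

HARDENING: Dict[str, str] = {
    "kernel.kptr_restrict": "2",
    "kernel.dmesg_restrict": "1",
    "net.ipv4.tcp_syncookies": "1",
    "net.ipv4.conf.all.rp_filter": "1",
    "net.ipv4.conf.all.send_redirects": "0",
    "net.ipv6.conf.all.disable_ipv6": "0",
}


def render_conf(settings: Dict[str, str]) -> str:
    """Content for /etc/sysctl.d/99-hardening.conf, one 'key = value' per line."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def proc_path(key: str) -> Path:
    """sysctl key -> /proc/sys file (dots become path separators)."""
    return Path("/proc/sys") / key.replace(".", "/")


def drift(settings: Dict[str, str]) -> Dict[str, str]:
    """Return {key: current value} for every setting the running kernel does not match."""
    mismatches = {}
    for key, wanted in settings.items():
        path = proc_path(key)
        current = path.read_text().strip() if path.exists() else "<missing>"
        if current != wanted:
            mismatches[key] = current
    return mismatches
```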
- Lynis security baseline
  - Run `lynis audit system`
  - Review recommendations
  - Implement high-priority findings
  - Aim for a hardening index >80
  - Re-run quarterly
  - Est. Effort: 4 hours (initial) + 1 hour quarterly
  - Priority: MEDIUM (baseline verification)
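To track the ">80" target across quarterly runs, the hardening index can be read from the machine-readable report Lynis writes instead of from the terminal output. This sketch assumes the report's default location `/var/log/lynis-report.dat` and its `key=value` line format, where list entries such as `suggestion[]` repeat.

```python
from pathlib import Path
from typing import Dict, List, Optional

REPORT = Path("/var/log/lynis-report.dat")  # assumed default report location
TARGET = 80


def parse_report(text: str) -> Dict[str, List[str]]:
    """Collect key=value lines; repeated keys (e.g. suggestion[]) keep all values."""
    data: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        data.setdefault(key, []).append(value)
    return data


def hardening_index(text: str) -> Optional[int]:
    """The hardening_index value, or None if the report has none."""
    values = parse_report(text).get("hardening_index")
    return int(values[0]) if values else None


if __name__ == "__main__" and REPORT.exists():
    idx = hardening_index(REPORT.read_text())
    print(f"hardening index {idx} (target >{TARGET})")
```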
Cluster Layer (K3S / Kubernetes)
- CrowdSec integration
  - Install the CrowdSec agent on the host
  - Enroll in the CrowdSec Console (commercial platform, free tier available); detection scenarios come from the CrowdSec Hub
  - Feed auth.log, syslog → CrowdSec for attack detection
  - Auto-block IPs via a bouncer on the local firewall, or via the Hetzner Firewall API
  - Est. Effort: 4 hours
  - Priority: MEDIUM (proactive threat response)
- Falco runtime monitoring
  - Install the Falco DaemonSet in K3S
  - Monitor: shell spawning in containers, suspicious syscalls, privilege escalation
  - Output to Loki / syslog
  - Alert on anomalies
  - Est. Effort: 1 day
  - Priority: MEDIUM (runtime detection)
🎯 Milestones
| Milestone | Description | Status | ETA |
|---|---|---|---|
| M1: Basis-Setup | K3S + Flux + ESS deployed | ✅ Done | - |
| M2: Core Matrix | Themes, Scripts, Policies | ✅ Done | - |
| M3: WebRTC & Monitoring | TURN + Alloy/Prometheus/Loki | ✅ Done | - |
| M4: Identity Provider | Authentik Stage 1+2 (pending Stage 2) | 🔄 In Progress | ~1–2 days |
| M5: Production-Ready | DB Backups, NetworkPolicies, Security Hardening | 📋 Backlog | ~2–3 weeks |
| M6: Advanced Features | Element Call Fork, Content Scanner, Mjolnir | 📋 Backlog | ~4+ weeks |
| M7: Enterprise-Ready | Full compliance (DSGVO), HA setup, Disaster Recovery | 🎯 Future | ~8+ weeks |
📊 Priority Categories
🔴 CRITICAL (do immediately)
- Hetzner Cloud Firewall setup
- Database backup strategy
- SSH hardening
🟠 HIGH (do within 1–2 weeks)
- Authentik Stage 2 completion
- External PostgreSQL migration
- NetworkPolicies
- Element Call fork
🟡 MEDIUM (do within 1 month)
- CrowdSec + Falco
- Mjolnir bot
- Renovate/Trivy
- PSA restricted mode
- Kernel hardening
🟢 LOW (nice-to-have, do if time allows)
- Content scanner (ClamAV)
- External-Secrets upgrade
- SSH port relocation
- Advanced federation rules
📝 Notes & Decision Points
Authentik Stage 2 Blocker
⏳ Waiting for: the user to configure the Authentik OIDC provider manually in the Authentik Admin UI.
- Once done, provide Client ID + Secret
- Then: Commit Stage 2 MAS config
Database: CloudNativePG vs. Hetzner Postgres
- CloudNativePG: Open-source, runs on K3S, full control
- Hetzner Postgres: Managed, backups included, less ops overhead
- Decision: recommend CloudNativePG for now (cost-effective); migrate to Hetzner later if the operational overhead becomes too high
Federation: Allowlist vs. Closed?
- Open (current default): federates with all public servers; exposed to spam and attacks
- Allowlist/closed: only federate with trusted servers (higher security, lower interoperability)
- Decision: depends on user intent. For now: allow all, add Mjolnir for abuse protection
Security Framework
- Layers: Perimeter (Firewall) → Host (SSH, auditd, hardening) → Cluster (NetworkPolicies, PSA, Falco) → App (Rate-limits, Mjolnir)
- Approach: Implement incrementally, test after each layer
🔗 Related Documentation
- `docs/deployment-guides/README.md` – Overview
- `docs/deployment-guides/01-turn-server-setup.md` – TURN
- `docs/deployment-guides/02-authentik-identity-provider.md` – Authentik (Stage 1 + Stage 2 plan)
- `docs/deployment-guides/03-monitoring-integration.md` – Monitoring
- `docs/deployment-guides/04-element-customization.md` – Themes, Desktop
- `docs/deployment-guides/05-room-policies.md` – Policies
Last Updated: 2026-05-14
Next Review: 2026-05-21