Scrublord MacBad 4cf6702f85 Add comprehensive task list and VSCode todo-tree integration
Add docs/TASKS.md:
- Completed tasks (6): K3S, Flux, ESS, Themes, Desktop Scripts, Monitoring, TURN
- In Progress: Authentik Stage 2 (pending manual config)
- Backlog (15+): Element Call Fork, PostgreSQL migration, NetworkPolicies, etc.
- Security hardening: Host/Cluster/App layer recommendations
- Milestones: Track progress from M1 (Basic) to M7 (Enterprise)

Enhance devcontainer.json:
- Add gruntfuggly.todo-tree extension for task tree view
- Add ms-vscode.makefile-tools for build automation
- Add GitHub.copilot for development assistance
- Configure todo-tree to highlight TASKS.md and deployment guides

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 23:05:33 +02:00

# aXion1337.Chat - Task List & Milestones

**Status overview**: [✅ 6 Completed] [🔄 1 In Progress] [📋 15+ Pending] [🔒 10 Security]

---
## ✅ Completed Tasks (Chronological)
### Phase 1: Base Setup
- [x] **Set up K3S cluster** - single node on Hetzner Cloud (49.13.132.245)
  - Commit: `initial-setup` (predates the project)
  - Status: ✅ Running
- [x] **Flux CD Installation**
  - SOPS + age encryption
  - Configure GitOps repository
  - Commit: `setup-flux` (predates the project)
  - Status: ✅ Running
- [x] **Element Server Suite v26.4.0 Deployment**
  - Synapse Homeserver (`matrix.axion1337.chat`)
  - Matrix Authentication Service (`account.axion1337.chat`)
  - Element Web (`axion1337.chat`)
  - Element Admin (`admin.axion1337.chat`)
  - MatrixRTC/Element Call (`mrtc.axion1337.chat`)
  - Commit: `deploy-ess-matrix-stack`
  - Status: ✅ Running
### Phase 2: Core Features
- [x] **7 Custom Element Web Themes**
  - aXion1337 Dark, Deep Purple, Discord Dark, Electric Blue, Everforest, Gruvbox, Wal
  - Sorted alphabetically
  - Commit: `add-custom-element-themes`
  - Status: ✅ Deployed
- [x] **Element Desktop Setup Scripts** (Windows/macOS/Linux)
  - Auto-Download + Install + Config
  - Hosted at `axion1337.chat/docs/setup/`
  - Commits: `add-element-desktop-setup-scripts`, `fix-element-setup-script-hosting`
  - Status: ✅ Deployed
- [x] **Room Policies**
  - Message Retention (1d-1y lifecycle)
  - Room Publication Rules (allow all)
  - Auto-Join Rooms for onboarding
  - Commit: `add-synapse-retention-publication-autojoin`
  - Status: ✅ Deployed
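
The three room policies above correspond to a handful of Synapse `homeserver.yaml` options. The following is a minimal sketch, not the deployed configuration: it assumes plain Synapse config keys (how they are injected through the ESS Helm values is not shown), a hypothetical `#welcome` room alias, and an illustrative purge interval.

```yaml
# Sketch of Synapse options for retention, publication and auto-join.
# Values marked "assumed" are illustrative, not taken from the repository.
retention:
  enabled: true
  default_policy:
    min_lifetime: 1d        # matches the 1d-1y lifecycle from the task list
    max_lifetime: 1y
  allowed_lifetime_min: 1d  # rooms may not configure anything shorter
  allowed_lifetime_max: 1y  # ...or anything longer
  purge_jobs:
    - interval: 1d          # assumed: run the purge job once a day

# "Allow all" publication: any user may publish any room to the directory.
room_list_publication_rules:
  - user_id: "*"
    alias: "*"
    room_id: "*"
    action: allow

# Rooms every new account joins automatically (onboarding).
auto_join_rooms:
  - "#welcome:axion1337.chat"   # assumed alias and server name
```

Retention only deletes events once a purge job runs, so the `purge_jobs` interval determines how promptly expired messages actually disappear.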
### Phase 3: WebRTC & Media Transport
- [x] **TURN Server (coturn) for Video Calls**
  - Domain: `turn.axion1337.chat`
  - HMAC auth with shared secret
  - Ports: 3478/udp, 3478/tcp, 5349/tcp, 49152-65535/udp
  - Commit: `implement-turn-server-coturn-for-webrtc-video-calls`
  - Status: ✅ Deployed
  - Manual: Create DNS A record + open firewall ports (still required)
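
On the homeserver side, HMAC auth with a shared secret means Synapse hands clients short-lived TURN credentials derived from the same secret that coturn holds (coturn's `use-auth-secret` / `static-auth-secret` settings). A minimal sketch of the matching Synapse options, assuming plain `homeserver.yaml` keys and a placeholder secret; the credential lifetime is an assumption:

```yaml
# Sketch: Synapse TURN settings matching the coturn deployment above.
turn_uris:
  - "turn:turn.axion1337.chat:3478?transport=udp"
  - "turn:turn.axion1337.chat:3478?transport=tcp"
  - "turns:turn.axion1337.chat:5349?transport=tcp"
# Must be identical to coturn's static-auth-secret (placeholder here;
# the real value belongs in a SOPS-encrypted secret, not in plain text).
turn_shared_secret: "REPLACE_WITH_COTURN_STATIC_AUTH_SECRET"
turn_user_lifetime: 1h      # assumed lifetime of issued credentials
turn_allow_guests: false
```

The relay range 49152-65535/udp does not appear in the Synapse config; it only needs to be reachable on the coturn host, which is why the firewall step is still listed as manual.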
### Phase 4: Monitoring & Observability
- [x] **Monitoring Stack Integration**
  - Grafana Alloy (successor to Grafana Agent) as collector
  - Remote write to Selendis (10.0.0.3:9090 Prometheus, :3100 Loki)
  - kube-state-metrics, node-exporter DaemonSet
  - Commits: `integrate-monitoring-alloy-prometheus-loki`, `fix-prometheus-remote-write-docker`
  - Status: ✅ Deployed
### Phase 5: Identity Provider (Authentik)
- [x] **Authentik Stage 1 Deployment**
  - HelmRelease v2026.x in `authentik` namespace
  - Embedded PostgreSQL + Alloy-compatible
  - cert-manager for TLS
  - Commit: `deploy-authentik-as-identity-provider-for-matrix-stage-1`
  - Status: ✅ Deployed
  - Manual: Set admin password + create OIDC provider (required)

🔄 **[IN PROGRESS] Authentik Stage 2 - MAS Integration**

- [ ] **MAS Upstream OIDC Configuration**
  - Copy Client ID/Secret from the Authentik Admin UI
  - Insert `upstream_oauth2_config` into `mas-secret.yaml`
  - Set `passwords: enabled: false`
  - Commit: (pending)
  - Status: ⏳ Waiting for manual Authentik configuration
### Phase 6: Documentation
- [x] **Create Deployment Guides**
  - 5 Markdown files in `docs/deployment-guides/`
  - Ordered chronologically
  - Troubleshooting + Best Practices
  - Commit: `add-comprehensive-deployment-configuration-documentation`
  - Status: ✅ Deployed
---
## 🔄 In Progress / Blocked
### Authentik Stage 2 - MAS Integration (⏳ Depends on Manual Config)
**Description**: The Authentik OIDC provider must be configured manually in the Authentik Admin UI before the Stage 2 deployment is possible.

**Blocker**: Manual Authentik configuration (waiting on the user)

**Steps**:
1. ✅ Authentik Stage 1 Deployment (done)
2. ⏳ Authentik Admin UI: Create OIDC provider (MANUAL - user action)
3. ⏳ Authentik Admin UI: Create application with slug `matrix` (MANUAL - user action)
4. ⏳ Authentik Admin UI: Enrollment flow with invitation stage (MANUAL - user action)
5. ⏳ Authentik Admin UI: Copy Client ID + Secret (MANUAL - user action)
6. 📋 Update MAS `upstream_oauth2_config` with the client credentials
7. 📋 Set `passwords: enabled: false`
8. 📋 Commit + Push
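
Steps 6 and 7 could end up looking roughly like the fragment below. This is a sketch under several assumptions: MAS's own configuration reference names the section `upstream_oauth2`, so the `upstream_oauth2_config` name used in this task list is taken to be a repo-local key; the Authentik hostname `auth.example.com`, the provider `id` and both credentials are placeholders; and the claim mappings are one plausible choice, not a decided policy.

```yaml
# Sketch: MAS upstream OIDC provider pointing at Authentik (placeholders throughout).
upstream_oauth2:
  providers:
    - id: 01H8PKNWKKRPCBW4YGH1RWV279     # placeholder; must be a valid ULID
      # Authentik issues per-application issuers under /application/o/<slug>/
      issuer: "https://auth.example.com/application/o/matrix/"   # assumed hostname
      client_id: "REPLACE_WITH_AUTHENTIK_CLIENT_ID"
      client_secret: "REPLACE_WITH_AUTHENTIK_CLIENT_SECRET"
      token_endpoint_auth_method: client_secret_basic
      scope: "openid profile email"
      claims_imports:
        localpart:
          action: require
          template: "{{ user.preferred_username }}"
        displayname:
          action: suggest
          template: "{{ user.name }}"
        email:
          action: suggest
          template: "{{ user.email }}"

# Step 7: turn off local password login once the upstream flow is verified.
passwords:
  enabled: false
```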
---
## 📋 Backlog (Further Tasks)
### Authentik Completion
- [ ] **Finish Authentik Stage 2 - MAS Integration**
  - Prerequisites: Authentik OIDC provider fully configured
  - Task: Update `mas-secret.yaml`, disable password login
  - Commit: `enable-authentik-oidc-integration-in-mas`
  - Est. Effort: 30 min (manual + scripted)
- [ ] **Test End-to-End Login Flow**
  - Element Web login → MAS → Authentik → Matrix User Creation
  - Create test users via Authentik
  - Verify password reset flow
  - Commit: (implicit in Stage 2)
  - Est. Effort: 20 min
- [ ] **Create Invite Links for New Users**
  - Authentik Admin UI → Invitations → Create
  - Set expiry dates (7d) + use limits
  - Document procedure
  - Est. Effort: 15 min
### Element Call Enhancement
- [ ] **Element Call Fork for Custom Constraints**
  - Repository: Fork `element-hq/element-call`
  - Feature: Video/audio constraints parameter in the config
  - Include: Bandwidth limiting, resolution limits, frame rate control
  - Integration with Synapse well-known
  - Est. Effort: 2-3 days (fork + feature + test)
  - Priority: **HIGH** (user feature)
### Database Hardening
- [ ] **External/Dedicated PostgreSQL Deployment**
  - Option 1: CloudNativePG Operator (open-source, on K3S)
  - Option 2: Managed Hetzner Postgres
  - Separate from the Postgres embedded in the ESS matrix-stack
  - HA + Replication
  - Est. Effort: 1-2 days
  - Priority: **HIGH** (reliability)
- [ ] **Database Backup Strategy**
  - Daily automated backups (pgBackRest or Velero)
  - Off-site backup storage (S3 / Hetzner Storage Box)
  - Monthly verified restores (test restore → verify data integrity)
  - Backup + restore documentation
  - Est. Effort: 2-3 days
  - Priority: **CRITICAL** (disaster recovery)
- [ ] **Synapse Media PVC Backups**
  - Separate backup pipeline for the `/data/media_store` PVC
  - Reason: Media often exceeds 100 GB and should not be part of the DB backup
  - Velero + Restic for file-level backup of the volume
  - Est. Effort: 1 day
  - Priority: **HIGH** (data preservation)
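
For the media PVC pipeline, a Velero `Schedule` with file-system backup enabled is one way to get a daily, separately retained copy. The sketch below assumes Velero is installed in a `velero` namespace with a node agent and a default backup storage location; the schedule name, cron time and retention are illustrative.

```yaml
# Sketch: daily Velero backup of the matrix namespace, volumes copied file by file.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: synapse-media-daily       # assumed name
  namespace: velero               # assumed Velero install namespace
spec:
  schedule: "0 3 * * *"           # assumed: every day at 03:00
  template:
    includedNamespaces:
      - matrix
    # Back up pod volumes through the node agent (Restic/Kopia) instead of
    # relying on CSI or provider snapshots.
    defaultVolumesToFsBackup: true
    ttl: 720h0m0s                 # assumed: keep each backup for 30 days
```

Because this captures the whole namespace, narrowing it with a label selector to the Synapse pod would keep the media pipeline cleanly separate from the database backups.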
### Network Security
- [ ] **NetworkPolicies - K8s-Layer Segmentation**
  - Default-deny ingress for the `matrix` namespace
  - Allow rules:
    - Ingress → MAS:443
    - Ingress → ElementWeb:443
    - MAS ↔ Synapse:8008
    - Synapse ↔ Postgres:5432
    - Authentik → Postgres:5432
    - Authentik → Loki:3100 (monitoring)
  - Egress: Matrix-specific (federation, etc.)
  - Est. Effort: 1 day
  - Priority: **MEDIUM** (compliance, least-privilege)
- [ ] **Pod Security Admission (Restricted)**
  - Apply to `matrix` & `authentik` namespaces
  - Enforce: non-root, no privileged containers, no privilege escalation
  - Note: a read-only root filesystem is not part of the Restricted profile; set `readOnlyRootFilesystem: true` per container where the chart allows it
  - Test: Ensure no chart breakage
  - Est. Effort: 1 day
  - Priority: **MEDIUM** (hardening)
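
Both network security items come down to a few small manifests. The sketch below shows the Pod Security Admission labels, a default-deny ingress policy and one of the allow rules (MAS to Synapse on 8008). The pod labels `app.kubernetes.io/name: synapse` and `matrix-authentication-service` are assumptions; the labels the ESS chart actually sets need to be checked with `kubectl get pods --show-labels` first.

```yaml
# Pod Security Admission: enforce the Restricted profile on the namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: matrix
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
---
# Default-deny: selects every pod in the namespace and allows no ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matrix
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow rule: MAS pods may reach Synapse on TCP 8008.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-mas-to-synapse
  namespace: matrix
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: synapse                          # assumed label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: matrix-authentication-service   # assumed label
      ports:
        - protocol: TCP
          port: 8008
```

Setting only the `warn` and `audit` labels first, and adding `enforce` once no violations are reported, is a lower-risk way to cover the "ensure no chart breakage" step.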
### Federation & Access Control
- [ ] **Federation Allowlist or Closed Federation**
  - Decision: Which servers to federate with?
  - If allowlist: explicit `federation_domain_whitelist`
  - If closed: empty `federation_domain_whitelist: []`, and keep `allow_public_rooms_over_federation: false`
  - Synapse config in `synapse-values.yaml`
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (security policy)
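
As a sketch of the allowlist variant, assuming plain Synapse config keys inside `synapse-values.yaml` and purely illustrative domains:

```yaml
# Sketch: federate only with explicitly trusted servers.
federation_domain_whitelist:
  - matrix.org              # illustrative entry
  - trusted.example.org     # hypothetical partner server
# Closed federation would use an empty list instead:
# federation_domain_whitelist: []

# Do not expose the public room directory to other homeservers.
allow_public_rooms_over_federation: false
```

The allowlist is an application-layer restriction, so it works best combined with the firewall items in the security hardening section rather than as a replacement for them.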
### Moderation & Anti-Abuse
- [ ] **Mjolnir/Draupnir Bot Deployment**
  - Open-source moderation bot for Matrix
  - Reason: Registration is invitation-based, but federation can still bring in spam
  - Auto-ban known bad servers/users
  - Spam-detection rules
  - Helm chart or custom Deployment
  - Est. Effort: 1-2 days
  - Priority: **MEDIUM** (ops safety)
- [ ] **Content Scanner for Media**
  - matrix-content-scanner + ClamAV antivirus
  - Scan uploaded media for malware
  - Block suspicious files
  - Est. Effort: 1-2 days
  - Priority: **LOW-MEDIUM** (optional but good practice)
### Secrets Management
- [ ] **External-Secrets Operator or SOPS for Flux**
  - Current: SOPS with age encryption
  - Consideration: External-Secrets for an external secret store (AWS Secrets Manager, HashiCorp Vault, etc.)
  - OR: Improve SOPS rotation strategy
  - Decision needed: Keep SOPS or upgrade?
  - Est. Effort: 2-3 days (if switching)
  - Priority: **LOW** (current SOPS setup working)
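
If SOPS stays, rotation is easier when the encryption rules live in a single `.sops.yaml` at the repository root. A minimal sketch, assuming secrets follow a `*secret*.yaml` naming pattern (as `mas-secret.yaml` does) and using a placeholder age recipient:

```yaml
# Sketch: .sops.yaml creation rules for Kubernetes Secret manifests.
creation_rules:
  - path_regex: .*secret.*\.yaml$          # assumed naming convention
    # Encrypt only the payload fields so metadata stays readable in diffs.
    encrypted_regex: ^(data|stringData)$
    age: age1REPLACE_WITH_PUBLIC_KEY       # placeholder age recipient
```

Rotating then means listing the new recipient here, re-encrypting the files with `sops updatekeys`, and replacing the private key in the decryption secret that Flux uses.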
### Image & Dependency Management
- [ ] **Renovate / Dependabot Setup**
  - Auto-update Helm chart versions
  - Auto-update container image tags
  - Monitor for security patches
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (maintenance)
- [ ] **Trivy Image Scanning**
  - Scan images in Flux HelmReleases for CVEs
  - Block deployment if critical CVE found
  - CI/CD hook in git workflow
  - Est. Effort: 8 hours
  - Priority: **LOW-MEDIUM** (security posture)
- [ ] **Monitor ESS & Element Security Advisories**
  - Subscribe to `element-hq` security mailing list
  - Monitor `#matrix-community` security channels
  - Auto-alerts on new CVEs/patches
  - Est. Effort: Ongoing (low maintenance)
  - Priority: **MEDIUM** (security awareness)
### Container Security
- [ ] **Disable automountServiceAccountToken Everywhere**
  - Audit all Deployments/StatefulSets
  - Disable for: Synapse, ElementWeb, MAS, Postgres, Authentik (where not needed)
  - Add `automountServiceAccountToken: false` to `spec.template.spec`
  - Test: Ensure no breakage
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (least-privilege)
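
The field sits on the pod spec, not on the container. The sketch below uses a hypothetical stand-alone Deployment to show the placement; for the chart-managed workloads listed above, the same field would have to be set through Helm values or a Flux post-render patch, depending on what each chart exposes.

```yaml
# Sketch: a workload that never talks to the Kubernetes API.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app               # hypothetical workload
  namespace: matrix
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-app
    spec:
      # No API token is mounted under /var/run/secrets/kubernetes.io/serviceaccount.
      automountServiceAccountToken: false
      containers:
        - name: app
          image: registry.example.com/example-app:1.0.0   # placeholder image
```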
---
## 🔒 Security Hardening (Host & Cluster Level)
### Host OS Layer (Ubuntu/Debian)
- [ ] **Hetzner Cloud Firewall**
  - Default-deny inbound
  - Allow: 80/443 (HTTP/HTTPS)
  - Allow: 22 (SSH) from your IP only (or via WireGuard/Tailscale)
  - Status: ✅ Can be done in Hetzner UI
  - Est. Effort: 30 min
  - Priority: **CRITICAL** (immediate, zero config cost)
- [ ] **SSH Hardening**
  - Disable password auth (key-only): `PasswordAuthentication no`
  - Disable root login: `PermitRootLogin no`
  - Limit attempts per connection: `MaxAuthTries 3`
  - Optional: Change SSH port (cosmetic, reduces log noise)
  - Optional: Put SSH behind WireGuard/Tailscale (removes the need for fail2ban on SSH)
  - Est. Effort: 2 hours
  - Priority: **HIGH** (immediate)
- [ ] **unattended-upgrades**
  - Enable automatic security updates
  - Configure: `APT::Periodic::Update-Package-Lists "1";`
  - Configure: `APT::Periodic::Unattended-Upgrade "1";`
  - Configure: `APT::Periodic::AutocleanInterval "7";`
  - Est. Effort: 30 min
  - Priority: **HIGH** (set & forget)
- [ ] **K3S API Security**
  - Current: K3S API listening on :6443 on all interfaces (default)
  - Hardening:
    - Option 1: Firewall: restrict :6443 to localhost only
    - Option 2: K3S `--bind-address` + `--advertise-address` set to the WireGuard IP
    - Option 3: kubectl access only via jumphost/bastion
  - Est. Effort: 2 hours
  - Priority: **HIGH** (API is high-value target)
- [ ] **auditd for File Integrity & Syscall Audit**
  - Monitor: `/etc`, `~/.kube`, `/var/lib/rancher/k3s`
  - Audit rules for sensitive file changes
  - Low overhead, good signal/noise ratio
  - Output to syslog / centralized logging
  - Est. Effort: 2 hours
  - Priority: **MEDIUM** (forensics + compliance)
- [ ] **Kernel Hardening (sysctl)**
  - Apply hardening recommendations from Lynis
  - Key settings:
    - `kernel.kptr_restrict=2` (hide kernel pointers)
    - `kernel.dmesg_restrict=1` (restrict dmesg)
    - `net.ipv4.tcp_syncookies=1` (SYN flood protection)
    - `net.ipv4.conf.all.rp_filter=1` (reverse path filtering)
    - `net.ipv4.conf.all.send_redirects=0`
    - `net.ipv6.conf.all.disable_ipv6=0` (or `=1` if no IPv6 needed)
  - Persist via `/etc/sysctl.d/99-hardening.conf`
  - Est. Effort: 2 hours
  - Priority: **MEDIUM** (defense in depth)
- [ ] **Lynis Security Baseline**
  - Run `lynis audit system`
  - Review recommendations
  - Implement high-priority findings
  - Aim for score >80
  - Re-run quarterly
  - Est. Effort: 4 hours (initial) + 1 hour quarterly
  - Priority: **MEDIUM** (baseline verification)
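
For the K3S API item, Option 2 can be expressed in the K3S config file instead of CLI flags, since the keys in `/etc/rancher/k3s/config.yaml` mirror the flag names. A minimal sketch, assuming a WireGuard interface already exists on the node with the hypothetical address `10.8.0.1`:

```yaml
# /etc/rancher/k3s/config.yaml
# Sketch: serve and advertise the API on the WireGuard address only.
bind-address: 10.8.0.1        # hypothetical WireGuard IP of this node
advertise-address: 10.8.0.1
tls-san:
  - 10.8.0.1                  # include the address in the API server certificate
```

K3S needs a restart to pick this up, and every kubeconfig has to point at the new address afterwards, so keeping console access available during the change is advisable.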
### Cluster Layer (K3S / Kubernetes)
- [ ] **CrowdSec Integration**
  - Install CrowdSec agent on host
  - Install parsers/scenarios from the CrowdSec Hub; optionally enroll in the CrowdSec Console (hosted platform, free tier available)
  - Feed auth.log, syslog → CrowdSec for attack detection
  - Auto-block IPs via local firewall or Hetzner Firewall API
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (proactive threat response)
- [ ] **Falco Runtime Monitoring**
  - Install Falco DaemonSet in K3S
  - Monitor: Shell spawning in containers, suspicious syscalls, privilege escalation
  - Output to Loki / syslog
  - Alert on anomalies
  - Est. Effort: 1 day
  - Priority: **MEDIUM** (runtime detection)
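
Falco's default ruleset already covers interactive shells in containers; a namespace-scoped custom rule is one way to tighten this for the chat workloads. The sketch below assumes the `spawned_process` and `container` macros from Falco's default rules are loaded and that Kubernetes metadata enrichment is enabled; the rule name and shell list are illustrative.

```yaml
# Sketch: custom Falco rule, loaded in addition to the default ruleset.
- rule: Shell Spawned in Chat Namespaces      # hypothetical rule name
  desc: Detect a shell process starting inside a pod of the matrix or authentik namespace
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, ash)
    and k8s.ns.name in (matrix, authentik)
  output: >
    Shell started in container
    (user=%user.name ns=%k8s.ns.name pod=%k8s.pod.name
    container=%container.name cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```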
---
## 🎯 Milestones
| Milestone | Description | Status | ETA |
|------------|-------------|--------|-----|
| **M1: Base Setup** | K3S + Flux + ESS deployed | ✅ Done | - |
| **M2: Core Matrix** | Themes, Scripts, Policies | ✅ Done | - |
| **M3: WebRTC & Monitoring** | TURN + Alloy/Prometheus/Loki | ✅ Done | - |
| **M4: Identity Provider** | Authentik Stage 1+2 (Stage 2 pending) | 🔄 In Progress | ~1-2 days |
| **M5: Production-Ready** | DB Backups, NetworkPolicies, Security Hardening | 📋 Backlog | ~2-3 weeks |
| **M6: Advanced Features** | Element Call Fork, Content Scanner, Mjolnir | 📋 Backlog | ~4+ weeks |
| **M7: Enterprise-Ready** | Full compliance (GDPR/DSGVO), HA setup, Disaster Recovery | 🎯 Future | ~8+ weeks |
---
## 📊 Priority Categories
### 🔴 CRITICAL (do immediately)
- Hetzner Cloud Firewall setup
- Database backup strategy
- SSH hardening
### 🟠 HIGH (do within 1-2 weeks)
- Authentik Stage 2 completion
- External PostgreSQL migration
- NetworkPolicies
- Element Call fork
### 🟡 MEDIUM (do within 1 month)
- CrowdSec + Falco
- Mjolnir bot
- Renovate/Trivy
- PSA restricted mode
- Kernel hardening
### 🟢 LOW (nice-to-have, do if time allows)
- Content scanner (ClamAV)
- External-Secrets upgrade
- SSH port relocation
- Advanced federation rules
---
## 📝 Notes & Decision Points
### Authentik Stage 2 Blocker
**Waiting for**: User to manually configure Authentik OIDC Provider in Authentik Admin UI.
- Once done, provide Client ID + Secret
- Then: Commit Stage 2 MAS config
### Database: CloudNativePG vs. Hetzner Postgres
- **CloudNativePG**: Open-source, runs on K3S, full control
- **Hetzner Postgres**: Managed, backups included, less ops overhead
- **Decision**: Recommend CloudNativePG for now (cost-effective); migrate to Hetzner later if the operational overhead becomes too high
### Federation: Allowlist vs. Closed?
- **Open (current default)**: Federation with all public servers; largest attack surface
- **Allowlist / Closed**: Only federate with explicitly trusted servers, or with none at all (higher security, lower interop)
- **Decision**: Depends on user intent. For now: allow all, add Mjolnir for abuse protection
### Security Framework
- **Layers**: Perimeter (Firewall) → Host (SSH, auditd, hardening) → Cluster (NetworkPolicies, PSA, Falco) → App (Rate-limits, Mjolnir)
- **Approach**: Implement incrementally, test after each layer
---
## 🔗 Related Documentation
- `docs/deployment-guides/README.md` - Overview
- `docs/deployment-guides/01-turn-server-setup.md` - TURN
- `docs/deployment-guides/02-authentik-identity-provider.md` - Authentik (Stage 1 + Stage 2 plan)
- `docs/deployment-guides/03-monitoring-integration.md` - Monitoring
- `docs/deployment-guides/04-element-customization.md` - Themes, Desktop
- `docs/deployment-guides/05-room-policies.md` - Policies
---
**Last Updated**: 2026-05-14

**Next Review**: 2026-05-21