**File**: `00-TASKS.md` (new file)

# aXion1337.Chat – Task List & Milestones

**Last Updated**: 2026-05-14
**Status overview**: [✅ 10 Completed] [🔄 1 In Progress] [📋 15+ Pending] [🔒 9 Security]

---

## 📊 Status Summary (Quick View)

| Category | Count | Status | Details |
|----------|-------|--------|---------|
| **Completed** | 10 | ✅ Done | K3S, Flux, ESS, themes, desktop scripts, room policies, TURN, monitoring, Authentik Stage 1, docs |
| **In Progress** | 1 | 🔄 Blocked | Authentik Stage 2 (awaiting manual config) |
| **Backlog** | 15+ | 📋 Pending | Element Call fork, DB backups, NetworkPolicies, etc. |
| **Security Tasks** | 9 | 🔒 Pending | Firewall, SSH, unattended-upgrades, K3S API, auditd, kernel hardening, Lynis, CrowdSec, Falco |

### Priority Distribution

| Priority | Count | Timeline |
|----------|-------|----------|
| 🔴 **CRITICAL** | 3 | This week |
| 🟠 **HIGH** | 4 | 1–2 weeks |
| 🟡 **MEDIUM** | 8 | ~1 month |
| 🟢 **LOW** | 4+ | Nice-to-have |

---

## 🎯 Next Steps (Prioritized)

### 🔴 **THIS WEEK – CRITICAL**
1. **Complete Authentik Stage 2**
   - Manual: create OIDC provider + application in the Authentik UI
   - Code: add `upstream_oauth2_config` to `mas-secret.yaml`
   - Code: set `passwords: enabled: false` (only after the OIDC login has been tested)
   - Commit: `enable-authentik-oidc-integration-in-mas`
   - Est. Time: 1–2 hours
   - Blocker: manual Authentik config (user action)

2. **Hetzner Cloud Firewall – Default-Deny Setup**
   - Ingress: allow 80/443, plus the TURN ports (3478/udp, 3478/tcp, 5349/tcp, 49152–65535/udp) and any media ports the MatrixRTC backend uses
   - Allow SSH only from your IP or via WireGuard/Tailscale
   - Est. Time: 30 min
   - Cost: free
   - Impact: blocks most internet background noise (port scans, SSH brute force)

3. **SSH Hardening**
   - Disable password auth (key-only)
   - Disable root login
   - `MaxAuthTries 3`
   - Est. Time: 1–2 hours
   - Priority: HIGH

4. **Database Backup Strategy – Decision & First Backup**
   - Decision: CloudNativePG (on K3S) or Hetzner Postgres (managed)?
   - Setup: daily automated backups
   - Setup: off-site storage (S3 / Hetzner Storage Box)
   - Setup: monthly verified restores
   - Est. Time: 2–3 days
   - Priority: CRITICAL (disaster recovery)

### 🟠 **NEXT 1–2 WEEKS – HIGH**
1. **Authentik End-to-End Test**
   - Test: login flow Element → MAS → Authentik → Matrix user
   - Test: password reset
   - Create: test invite links
   - Est. Time: 2 hours

2. **Element Call Fork**
   - Fork: `element-hq/element-call`
   - Feature: video/audio constraint parameters
   - Integration: Synapse well-known config
   - Est. Time: 2–3 days

3. **External PostgreSQL Migration**
   - Decision: CloudNativePG vs. Hetzner Postgres
   - Setup: HA + replication
   - Migration: move data out of the ESS embedded Postgres
   - Testing: verify all services still work
   - Est. Time: 1–2 days

4. **NetworkPolicies Deployment**
   - Create: default-deny for the `matrix` namespace
   - Create: allow rules (Synapse↔Postgres, MAS↔Postgres, Ingress→Web, etc.)
   - Test: make sure no service breaks
   - Est. Time: 1 day

---

## ✅ Completed Tasks (Chronological)

### Phase 1: Base Setup
- [x] **Set up K3S cluster** – single node on Hetzner Cloud (49.13.132.245)
  - Commit: `initial-setup` (before this project)
  - Status: ✅ Running

- [x] **Flux CD Installation**
  - SOPS + age encryption
  - Configure the GitOps repository
  - Commit: `setup-flux` (before this project)
  - Status: ✅ Running

- [x] **Element Server Suite v26.4.0 Deployment**
  - Synapse homeserver (`matrix.axion1337.chat`)
  - Matrix Authentication Service (`account.axion1337.chat`)
  - Element Web (`axion1337.chat`)
  - Element Admin (`admin.axion1337.chat`)
  - MatrixRTC/Element Call (`mrtc.axion1337.chat`)
  - Commit: `deploy-ess-matrix-stack`
  - Status: ✅ Running

### Phase 2: Core Features
- [x] **7 Custom Element Web Themes**
  - aXion1337 Dark, Deep Purple, Discord Dark, Electric Blue, Everforest, Gruvbox, Wal
  - Sorted alphabetically
  - Commit: `add-custom-element-themes`
  - Status: ✅ Deployed

- [x] **Element Desktop Setup Scripts** (Windows/macOS/Linux)
  - Auto-download + install + config
  - Hosted at `axion1337.chat/docs/setup/`
  - Commits: `add-element-desktop-setup-scripts`, `fix-element-setup-script-hosting`
  - Status: ✅ Deployed

- [x] **Room Policies**
  - Message retention (1d–1y lifecycle)
  - Room publication rules (allow all)
  - Auto-join rooms for onboarding
  - Commit: `add-synapse-retention-publication-autojoin`
  - Status: ✅ Deployed

### Phase 3: WebRTC & Media Transport
- [x] **TURN Server (coturn) for Video Calls**
  - Domain: `turn.axion1337.chat`
  - HMAC auth with a shared secret
  - Ports: 3478/udp, 3478/tcp, 5349/tcp, 49152-65535/udp
  - Commit: `implement-turn-server-coturn-for-webrtc-video-calls`
  - Status: ✅ Deployed
  - Manual: DNS A record + open firewall ports (still required)

### Phase 4: Monitoring & Observability
- [x] **Monitoring Stack Integration**
  - Grafana Alloy (successor to Grafana Agent) as collector
  - Remote write to Selendis (10.0.0.3:9090 Prometheus, :3100 Loki)
  - kube-state-metrics, node-exporter DaemonSet
  - Commits: `integrate-monitoring-alloy-prometheus-loki`, `fix-prometheus-remote-write-docker`
  - Status: ✅ Deployed

### Phase 5: Identity Provider (Authentik)
- [x] **Authentik Stage 1 Deployment**
  - HelmRelease v2026.x in the `authentik` namespace
  - Embedded PostgreSQL + Alloy-compatible
  - cert-manager for TLS
  - Commit: `deploy-authentik-as-identity-provider-for-matrix-stage-1`
  - Status: ✅ Deployed
  - Manual: set admin password + create OIDC provider (required)

🔄 **[IN PROGRESS] Authentik Stage 2 – MAS Integration**
- [ ] **MAS Upstream OIDC Configuration**
  - Copy client ID/secret from the Authentik admin UI
  - Add `upstream_oauth2_config` to `mas-secret.yaml`
  - `passwords: enabled: false`
  - Commit: (pending)
  - Status: ⏳ Waiting for manual Authentik configuration

### Phase 6: Documentation
- [x] **Write deployment guides**
  - 5 guides + README in `docs/deployment-guides/`
  - Ordered chronologically
  - Troubleshooting + best practices
  - Commit: `add-comprehensive-deployment-configuration-documentation`
  - Status: ✅ Deployed

---

## 🔄 In Progress / Blocked

### Authentik Stage 2 – MAS Integration (⏳ Depends on Manual Config)
**Description**: The Authentik OIDC provider must be configured manually in the Authentik admin UI before Stage 2 can be deployed.

**Steps**:
1. ✅ Authentik Stage 1 deployment (done)
2. ⏳ Authentik admin UI: create OIDC provider (MANUAL – user action)
3. ⏳ Authentik admin UI: create application with slug `matrix` (MANUAL – user action)
4. ⏳ Authentik admin UI: enrollment flow with invitation stage (MANUAL – user action)
5. ⏳ Authentik admin UI: copy client ID + secret (MANUAL – user action)
6. 📋 Update MAS `upstream_oauth2_config` with the client credentials
7. 📋 Enable `passwords: enabled: false`
8. 📋 Commit + push

**Blocker**: manual Authentik configuration (waiting on user)

---

## 📋 Backlog (Further Tasks)

### Authentik Completion
- [ ] **Finish Authentik Stage 2 – MAS Integration**
  - Prerequisites: Authentik OIDC provider fully configured
  - Task: update `mas-secret.yaml`, disable password login (`passwords: enabled: false`)
  - Commit: `enable-authentik-oidc-integration-in-mas`
  - Est. Effort: 30 min (manual + scripted)

- [ ] **Test End-to-End Login Flow**
  - Element Web login → MAS → Authentik → Matrix user creation
  - Create test users via Authentik
  - Verify the password reset flow
  - Commit: (implicit in Stage 2)
  - Est. Effort: 20 min

- [ ] **Create Invite Links for New Users**
  - Authentik admin UI → Invitations → Create
  - Set expiry dates (7d) + single-use flag
  - Document the procedure
  - Est. Effort: 15 min

### Element Call Enhancement
- [ ] **Element Call Fork for Custom Constraints**
  - Repository: fork `element-hq/element-call`
  - Feature: video/audio constraint parameters in the config
  - Include: bandwidth limiting, resolution limits, frame-rate control
  - Integration with Synapse well-known
  - Est. Effort: 2–3 days (fork + feature + test)
  - Priority: **HIGH** (user feature)

### Database Hardening
- [ ] **External/Dedicated PostgreSQL Deployment**
  - Option 1: CloudNativePG operator (open source, on K3S)
  - Option 2: managed Hetzner Postgres
  - Separate it from the ESS matrix-stack embedded Postgres
  - HA + replication
  - Est. Effort: 1–2 days
  - Priority: **HIGH** (reliability)

- [ ] **Database Backup Strategy**
  - Daily automated backups (pgBackRest or Velero)
  - Off-site backup storage (S3 / Hetzner Storage Box)
  - Monthly verified restores (test restore → verify data integrity)
  - Backup + restore documentation
  - Est. Effort: 2–3 days
  - Priority: **CRITICAL** (disaster recovery)

- [ ] **Synapse Media PVC Backups**
  - Separate backup pipeline for the `/data/media_store` PVC
  - Reason: media is often >100 GB and should not be part of the DB backup
  - Velero + Restic (file-level backup)
  - Est. Effort: 1 day
  - Priority: **HIGH** (data preservation)

### Network Security
- [ ] **NetworkPolicies – K8s-Layer Segmentation**
  - Default-deny ingress for the `matrix` namespace
  - Allow rules:
    - Ingress → MAS:443
    - Ingress → ElementWeb:443
    - MAS ↔ Synapse:8008
    - Synapse ↔ Postgres:5432
    - Authentik → Postgres:5432
    - Authentik → Loki:3100 (monitoring)
  - Egress: Matrix-specific (federation, etc.)
  - Est. Effort: 1 day
  - Priority: **MEDIUM** (compliance, least privilege)

- [ ] **Pod Security Admission (Restricted)**
  - Apply to the `matrix` & `authentik` namespaces (label `pod-security.kubernetes.io/enforce=restricted`)
  - Enforces: non-root, no privileged containers, no privilege escalation, all capabilities dropped, seccomp `RuntimeDefault`. A read-only root filesystem is not part of the profile and must be set per workload (`readOnlyRootFilesystem`).
  - Test: make sure no chart breaks (start in `warn`/`audit` mode)
  - Est. Effort: 1 day
  - Priority: **MEDIUM** (hardening)

### Federation & Access Control
- [ ] **Federation Allowlist or Closed Federation**
  - Decision: which servers should we federate with?
  - If allowlist: explicit `federation_domain_whitelist`
  - If closed: disable federation (e.g., an empty `federation_domain_whitelist: []`)
  - Synapse config in `synapse-values.yaml`
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (security policy)

### Moderation & Anti-Abuse
- [ ] **Mjolnir/Draupnir Bot Deployment**
  - Open-source moderation bot for Matrix
  - Reason: registration is invitation-based, but federation can still bring in spam
  - Auto-ban known bad servers/users
  - Spam-detection rules
  - Helm chart or custom Deployment
  - Est. Effort: 1–2 days
  - Priority: **MEDIUM** (ops safety)

- [ ] **Content Scanner for Media**
  - matrix-content-scanner + ClamAV antivirus
  - Scan uploaded media for malware
  - Block suspicious files
  - Est. Effort: 1–2 days
  - Priority: **LOW–MEDIUM** (optional but good practice)

### Secrets Management
- [ ] **External Secrets Operator or SOPS for Flux**
  - Current: SOPS with age encryption
  - Consideration: External Secrets Operator for cloud-native backends (AWS Secrets Manager, HashiCorp Vault, etc.)
  - OR: improve the SOPS key-rotation strategy
  - Decision needed: keep SOPS or switch?
  - Est. Effort: 2–3 days (if switching)
  - Priority: **LOW** (current SOPS setup works)

### Image & Dependency Management
- [ ] **Renovate / Dependabot Setup**
  - Auto-update Helm chart versions
  - Auto-update container image tags
  - Monitor for security patches
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (maintenance)

- [ ] **Trivy Image Scanning**
  - Scan images referenced in Flux HelmReleases for CVEs
  - Block deployment if a critical CVE is found
  - CI/CD hook in the Git workflow
  - Est. Effort: 8 hours
  - Priority: **LOW–MEDIUM** (security posture)

- [ ] **Monitor ESS & Element Security Advisories**
  - Watch the `element-hq` GitHub repositories (releases + security advisories)
  - Follow Matrix community security announcements
  - Auto-alerts on new CVEs/patches
  - Est. Effort: ongoing (low maintenance)
  - Priority: **MEDIUM** (security awareness)

### Container Security
- [ ] **Disable automountServiceAccountToken Everywhere**
  - Audit all Deployments/StatefulSets
  - Disable for: Synapse, ElementWeb, MAS, Postgres, Authentik (where not needed)
  - Add `automountServiceAccountToken: false` to `spec.template.spec`
  - Test: make sure nothing breaks
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (least privilege)

---

## 🔒 Security Hardening (Host & Cluster Level)

### Host OS Layer (Ubuntu/Debian)
- [ ] **Hetzner Cloud Firewall**
  - Default-deny inbound
  - Allow: 80/443 (HTTP/HTTPS)
  - Allow: TURN ports 3478/udp, 3478/tcp, 5349/tcp, 49152–65535/udp (otherwise coturn is unreachable)
  - Allow: 22 (SSH) from your IP only (or via WireGuard/Tailscale)
  - Status: can be done in the Hetzner Cloud Console
  - Est. Effort: 30 min
  - Priority: **CRITICAL** (immediate, no configuration cost)

- [ ] **SSH Hardening** (`/etc/ssh/sshd_config`)
  - Key-only auth: `PasswordAuthentication no`
  - Disable root login: `PermitRootLogin no`
  - `MaxAuthTries 3`
  - Validate with `sshd -t` before reloading the service
  - Optional: change the SSH port (cosmetic, reduces log noise)
  - Optional: SSH behind WireGuard/Tailscale (makes fail2ban for SSH unnecessary)
  - Est. Effort: 2 hours
  - Priority: **HIGH** (immediate)

- [ ] **unattended-upgrades**
  - Enable automatic security updates
  - Configure (`/etc/apt/apt.conf.d/20auto-upgrades`):
    - `APT::Periodic::Update-Package-Lists "1";`
    - `APT::Periodic::Unattended-Upgrade "1";`
    - `APT::Periodic::AutocleanInterval "7";`
  - Est. Effort: 30 min
  - Priority: **HIGH** (set & forget)

- [ ] **K3S API Security**
  - Current: the K3S API listens on :6443 on all interfaces (default)
  - Hardening:
    - Option 1: firewall :6443 so only localhost (or your admin IP) can reach it
    - Option 2: K3S `--bind-address` + `--advertise-address` on the WireGuard IP
    - Option 3: kubectl access only via a jump host/bastion
  - Note: remote `kubectl` access as set up in `01-Installation.md` needs :6443 reachable from your workstation, so choose an option that keeps that path open (e.g., via WireGuard)
  - Est. Effort: 2 hours
  - Priority: **HIGH** (the API is a high-value target)

- [ ] **auditd for File Integrity & Syscall Audit**
  - Monitor: `/etc`, `~/.kube`, `/var/lib/rancher/k3s`
  - Audit rules for sensitive file changes
  - Low overhead, good signal-to-noise ratio
  - Output to syslog / centralized logging
  - Est. Effort: 2 hours
  - Priority: **MEDIUM** (forensics + compliance)

- [ ] **Kernel Hardening (sysctl)**
  - Apply hardening recommendations from Lynis
  - Key settings:
    - `kernel.kptr_restrict=2` (hide kernel pointers)
    - `kernel.dmesg_restrict=1` (restrict dmesg)
    - `net.ipv4.tcp_syncookies=1` (SYN flood protection)
    - `net.ipv4.conf.all.rp_filter=1` (reverse path filtering)
    - `net.ipv4.conf.all.send_redirects=0`
    - `net.ipv6.conf.all.disable_ipv6=0` (or `=1` if IPv6 is not needed)
  - Persist via `/etc/sysctl.d/99-hardening.conf`, apply with `sysctl --system`
  - Est. Effort: 2 hours
  - Priority: **MEDIUM** (defense in depth)

- [ ] **Lynis Security Baseline**
  - Run `lynis audit system`
  - Review recommendations
  - Implement high-priority findings
  - Aim for a hardening index >80
  - Re-run quarterly
  - Est. Effort: 4 hours (initial) + 1 hour per quarter
  - Priority: **MEDIUM** (baseline verification)

### Cluster Layer (K3S / Kubernetes)
- [ ] **CrowdSec Integration**
  - Install the CrowdSec agent on the host
  - Optionally enroll in the CrowdSec Console (commercial platform, free tier available)
  - Feed `auth.log` and syslog into CrowdSec for attack detection
  - Auto-block IPs via a bouncer (local firewall) or the Hetzner Firewall API
  - Est. Effort: 4 hours
  - Priority: **MEDIUM** (proactive threat response)

- [ ] **Falco Runtime Monitoring**
  - Install Falco as a DaemonSet in K3S
  - Monitor: shell spawning in containers, suspicious syscalls, privilege escalation
  - Output to Loki / syslog
  - Alert on anomalies
  - Est. Effort: 1 day
  - Priority: **MEDIUM** (runtime detection)

---

## 🎯 Milestones

| Milestone | Description | Status | ETA |
|-----------|-------------|--------|-----|
| **M1: Base Setup** | K3S + Flux + ESS deployed | ✅ Done | - |
| **M2: Core Matrix** | Themes, scripts, policies | ✅ Done | - |
| **M3: WebRTC & Monitoring** | TURN + Alloy/Prometheus/Loki | ✅ Done | - |
| **M4: Identity Provider** | Authentik Stage 1 (done) + Stage 2 (pending) | 🔄 In Progress | ~1–2 days |
| **M5: Production-Ready** | DB backups, NetworkPolicies, security hardening | 📋 Backlog | ~2–3 weeks |
| **M6: Advanced Features** | Element Call fork, content scanner, Mjolnir | 📋 Backlog | ~4+ weeks |
| **M7: Enterprise-Ready** | Full compliance (GDPR), HA setup, disaster recovery | 🎯 Future | ~8+ weeks |

---

## 📊 Priority Categories

### 🔴 CRITICAL (do immediately)
- Hetzner Cloud Firewall setup
- Database backup strategy
- SSH hardening

### 🟠 HIGH (do within 1–2 weeks)
- Authentik Stage 2 completion
- External PostgreSQL migration
- NetworkPolicies
- Element Call fork

### 🟡 MEDIUM (do within 1 month)
- CrowdSec + Falco
- Mjolnir bot
- Renovate/Trivy
- PSA restricted mode
- Kernel hardening

### 🟢 LOW (nice-to-have, do if time allows)
- Content scanner (ClamAV)
- External Secrets upgrade
- SSH port relocation
- Advanced federation rules

---

## 📝 Notes & Decision Points

### Authentik Stage 2 Blocker
⏳ **Waiting for**: the user to configure the Authentik OIDC provider manually in the Authentik admin UI.
- Once done, provide the client ID + secret
- Then: commit the Stage 2 MAS config

### Database: CloudNativePG vs. Hetzner Postgres
- **CloudNativePG**: open source, runs on K3S, full control
- **Hetzner Postgres**: managed, backups included, less ops overhead
- **Decision**: recommend CloudNativePG for now (cost-effective); migrate to Hetzner later if the operational overhead becomes too high

### Federation: Open, Allowlist, or Closed?
- **Open (Synapse default)**: federates with all public servers; largest attack surface
- **Allowlist**: federates only with trusted servers listed in `federation_domain_whitelist` (higher security, lower interoperability)
- **Closed**: no federation at all
- **Decision**: depends on user intent. For now: stay open and add Mjolnir for abuse protection

### Security Framework
- **Layers**: Perimeter (firewall) → Host (SSH, auditd, hardening) → Cluster (NetworkPolicies, PSA, Falco) → App (rate limits, Mjolnir)
- **Approach**: implement incrementally and test after each layer

---

## 🔗 Related Documentation

- `docs/deployment-guides/README.md` – Overview
- `docs/deployment-guides/01-turn-server-setup.md` – TURN
- `docs/deployment-guides/02-authentik-identity-provider.md` – Authentik (Stage 1 + Stage 2 plan)
- `docs/deployment-guides/03-monitoring-integration.md` – Monitoring
- `docs/deployment-guides/04-element-customization.md` – Themes, Desktop
- `docs/deployment-guides/05-room-policies.md` – Policies

---

**Last Updated**: 2026-05-14
**Next Review**: 2026-05-21

**File**: `01-Installation.md` (new file)

## 🚀 Step-by-Step Installation Guide (From Scratch)

This guide assumes you have a fresh server (Ubuntu/Debian) with a public IP, and that your local tools (Flux CLI, kubectl, sops, age) are installed.

### Step 1: Install Kubernetes (K3s) on the server
Log in to your server via SSH and install a fresh K3s. We use K3s with the default Traefik ingress.

```bash
# Run on the server:
curl -sfL https://get.k3s.io | sh -

# Copy the kubeconfig and fix ownership (for local access)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown "$USER" ~/.kube/config
```
*Copy the contents of `~/.kube/config` to your local machine so your local `kubectl` can control the server. Don't forget to replace the address `127.0.0.1` in the file with your server's public IP; port 6443 must be reachable from your machine.*

### Step 2: Generate the SOPS encryption key
Flux needs a private key to read your encrypted passwords (SMTP, database) inside the cluster. We use `age`.

```bash
# Run on your local machine:
# 1. Generate the key (keep age.agekey out of Git and back it up securely)
age-keygen -o age.agekey

# 2. Add the public key (listed in the file as "# public key: age1...") to the repo's .sops.yaml!

# 3. Load the private key into the cluster as a secret in the flux-system namespace.
#    The namespace only exists after the Flux bootstrap, so create it first.
kubectl create namespace flux-system
cat age.agekey | kubectl create secret generic sops-age \
  --namespace=flux-system \
  --from-file=age.agekey=/dev/stdin
```
The Flux Kustomizations that apply SOPS-encrypted manifests must reference this secret via `spec.decryption` (`provider: sops`, `secretRef.name: sops-age`).

### Step 3: Prepare the Git repository
Make sure your GitOps structure is pushed and that you have a personal access token (PAT) for your Git repository (GitHub/GitLab).
* The token needs read and write access to the repository, because the bootstrap commits the Flux components into it.

### Step 4: Flux bootstrap (the starting signal)
This is the key command. It installs the Flux controllers in your cluster and connects them to your repository. From this moment on, Flux is in charge.

**For GitHub:**
```bash
export GITHUB_TOKEN="your-personal-access-token"
export GITHUB_USER="your-github-username"

flux bootstrap github \
  --owner="$GITHUB_USER" \
  --repository=your-repo-name \
  --branch=main \
  --path=prod/gitops/clusters/matrix \
  --personal
```

**For Gitea/GitLab/generic Git (as your setup appears to use):**
```bash
flux bootstrap git \
  --url=https://rohana.axion1337.de/sorb/axion1337.chat-gitops.git \
  --branch=main \
  --path=prod/gitops/clusters/matrix \
  --username=your-git-user \
  --password=your-git-token
```

### Step 5: Sit back and watch
Flux now clones your repo, reads the Kustomizations and applies them in the right order (`infra-apps` → `production-apps`).
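Flux derives this order from the `spec.dependsOn` field of each Kustomization. A Kustomization is reconciled only after everything it depends on is ready. A useful way to think about the result (this is not how Flux is implemented) is as a topological sort of the dependency graph. The coreutils tool `tsort` can reproduce that sort locally.

The sketch below takes only the `infra-apps` → `production-apps` edge from this guide. The `authentik` edge is a hypothetical example.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the apply order of Flux Kustomizations from their
# dependsOn edges. Each input line reads "<dependency> <dependent>".
# infra-apps -> production-apps comes from the guide; the authentik
# edge is a hypothetical example.
flux_apply_order() {
  tsort <<'EOF'
infra-apps production-apps
infra-apps authentik
EOF
}

flux_apply_order
```

`infra-apps` always comes first. The relative order of independent Kustomizations (here `production-apps` and `authentik`) is not fixed, and Flux may reconcile them in parallel.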

You can follow the progress live:
```bash
# GitOps sync status:
flux get kustomizations --watch

# Helm deployment of the Element Server Suite:
flux get helmreleases -n matrix --watch

# Pods starting up:
kubectl get pods -n matrix -w
```
Once all pods are `Running` and the certificates have been validated via Let's Encrypt (`kubectl get certificate -n matrix`), your Matrix stack is reachable at `https://axion1337.chat`.

**File**: `deployment-guides/01-turn-server-setup.md` (new file)

# TURN Server (coturn) for WebRTC Video Calls

**Status**: ✅ Fully deployed
**Domain**: `turn.axion1337.chat`
**Public IP**: `49.13.132.245`

## Problem & Solution

Video calls fail with a DTLS timeout for clients behind NAT/firewalls. **Solution**: coturn as a TURN relay.

## Architecture

Client A ──→ coturn (turn.axion1337.chat) ──→ Client B

- **Ports**: 3478/udp, 3478/tcp, 5349/tcp (TLS), 49152-65535/udp (relay range)
- **Auth**: HMAC-based, with a shared secret between coturn and Synapse (Synapse hands time-limited credentials to clients)
- **Deployment**: K3S Deployment with `hostNetwork: true`

## Files (in `apps/production/`)

| File | Contents |
|------|----------|
| `coturn.yaml` | ConfigMap + Deployment + Service |
| `coturn-secret.yaml` | SOPS secret: `TURN_SECRET` |
| `custom-configs/synapse-values.yaml` | TURN URIs + shared secret |
| `matrix-certificates.yaml` | cert-manager certificate for `turn.axion1337.chat` |

## DNS & Firewall (manual)

```
DNS A record: turn.axion1337.chat → 49.13.132.245

Firewall (K3S host):
ufw allow 3478/udp
ufw allow 3478/tcp
ufw allow 5349/tcp
ufw allow 49152:65535/udp
```

If the Hetzner Cloud Firewall is enabled, allow the same ports there as well.

## Verification

```bash
# Is the pod running?
kubectl get pods -n matrix -l app=coturn
```
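The sketch below assumes coturn runs with `use-auth-secret` (the TURN REST API scheme). In that mode coturn does not accept static users. It expects:

- a username of the form `<expiry unix timestamp>:<name>`
- a password equal to the Base64-encoded HMAC-SHA1 of that username, keyed with the shared secret

Synapse hands out credentials of this kind to clients. The sketch generates such a pair for the external test. It assumes that `openssl` and `base64` are installed and that `TURN_SECRET` holds the value from `coturn-secret.yaml`. The name `probe` is arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: time-limited TURN credentials for coturn's use-auth-secret mode.
#   username = "<expiry unix timestamp>:<name>"
#   password = base64(HMAC-SHA1(key = shared secret, message = username))
# Needs openssl and base64 on the PATH.
turn_credentials() {
  local secret="$1" name="$2" ttl="${3:-3600}"   # ttl in seconds
  local expiry username password
  expiry=$(( $(date +%s) + ttl ))
  username="${expiry}:${name}"
  password=$(printf '%s' "$username" \
    | openssl dgst -sha1 -hmac "$secret" -binary \
    | base64)
  printf '%s %s\n' "$username" "$password"
}

# Usage: read -r TURN_USER TURN_PASS < <(turn_credentials "$TURN_SECRET" probe)
```

The credentials expire at the embedded timestamp. A short `ttl` is therefore enough for a one-off connectivity test.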
```bash
# Is the certificate ready?
kubectl get certificate -n matrix | grep turn

# Test externally. coturn uses HMAC auth, so static test/test credentials are
# rejected; pass a time-limited username/password derived from TURN_SECRET.
docker run --rm instrumentisto/coturn \
  turnutils_uclient -v -T -u "$TURN_USER" -w "$TURN_PASS" turn.axion1337.chat
```

**More details**: see the full project documentation.

**File**: `deployment-guides/02-authentik-identity-provider.md` (new file)

# Authentik as Identity Provider for Matrix

**Status**: ✅ Stage 1 deployed (Authentik is running)
**Pending**: Stage 2 (MAS integration)
**Domain**: `auth.axion1337.chat`

## Overview

Authentik acts as the upstream OIDC provider for MAS → central login + invitation-based registration.

## Stage 1: Authentik Deployment

**Files** (in `apps/authentik/`):
- `namespace.yaml`, `helm-repo.yaml`, `authentik-secret.yaml` (SOPS)
- `authentik.yaml` (HelmRelease v2026.x + embedded Postgres)
- `certificate.yaml`, `ingress.yaml`

**Flux Kustomization**: `clusters/matrix/flux-system/authentik-sync.yaml`

## Deployment Steps

1. **DNS A record**: `auth.axion1337.chat → 49.13.132.245`
2. **Watch the pods come up**: `kubectl get pods -n authentik -w`
3. **Authentik UI**: `https://auth.axion1337.chat/if/flow/initial-setup/` → set the admin password
4. **OIDC provider**: admin UI → create an OAuth2/OpenID provider
5. **Application**: slug `matrix` (important: the slug is part of the issuer URL, `https://auth.axion1337.chat/application/o/matrix/`)
6. **Redirect URIs**:
   - `https://account.axion1337.chat/upstream/callback/01KQDJTR1ZVTG8JQ220F5BNBFZ`
   - Post-logout: `https://axion1337.chat`
7. **Copy the client ID + secret**

## Stage 2: MAS Integration

1. Decrypt: `sops --decrypt --in-place apps/production/custom-configs/mas-secret.yaml`
2. Add the `upstream_oauth2_config` and `passwords-config` blocks
3. Encrypt: `sops --encrypt --in-place ...`
4. Commit & push
5. **IMPORTANT**: set `passwords: enabled: false` only after the OIDC login has been tested!

Tip: `sops <file>` opens the decrypted file in `$EDITOR` and re-encrypts it on save. This avoids leaving a decrypted copy in the working tree between steps 1 and 3.
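Two URLs must match exactly between Authentik and MAS:

- the issuer that the MAS upstream provider points at
- the callback that Authentik allows as a redirect URI

The sketch below derives both. It assumes Authentik's usual issuer layout `https://<host>/application/o/<slug>/` and the MAS `/upstream/callback/<provider id>` route seen in the redirect URI above. The provider id is the ULID configured for the upstream provider in MAS.

```bash
#!/usr/bin/env bash
# Sketch: the two URLs that must line up between Authentik and MAS.
# Issuer layout and callback route are assumptions taken from the guide.
AUTHENTIK_HOST="auth.axion1337.chat"
MAS_HOST="account.axion1337.chat"

# $1 = Authentik application slug
issuer_url() {
  printf 'https://%s/application/o/%s/\n' "$AUTHENTIK_HOST" "$1"
}

# ULID: 26 chars of Crockford base32 (no I, L, O, U), first char 0-7
is_ulid() {
  [[ "$1" =~ ^[0-7][0-9A-HJKMNP-TV-Z]{25}$ ]]
}

# $1 = ULID of the MAS upstream provider
callback_url() {
  is_ulid "$1" || { echo "not a ULID: $1" >&2; return 1; }
  printf 'https://%s/upstream/callback/%s\n' "$MAS_HOST" "$1"
}

issuer_url matrix
callback_url 01KQDJTR1ZVTG8JQ220F5BNBFZ
```

OIDC discovery requires the configured issuer to match the `issuer` value that Authentik publishes, trailing slash included. Copy it exactly.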

## Invitation Links

Authentik Admin → Flows & Stages → Invitations → Create

---
**More details**: see chapter 2 of this project.

**File**: `deployment-guides/03-monitoring-integration.md` (new file)

# Monitoring: Alloy → Prometheus/Loki on Selendis

**Status**: ✅ Fully deployed
**Remote Storage**: `10.0.0.3:9090` (Prometheus), `10.0.0.3:3100` (Loki)

## Overview

Grafana Alloy collects metrics and logs from the K3S cluster and ships them to Prometheus/Loki on Selendis.

## Components

| Component | Role |
|-----------|------|
| **Alloy** | Metrics & logs collector (successor to Grafana Agent) |
| **kube-state-metrics** | Kubernetes object status |
| **node-exporter** | Host metrics (CPU, memory, disk) |
| **Prometheus** (Selendis) | Metrics ingestion (remote-write receiver) |
| **Loki** (Selendis) | Log ingestion |

## Files (in `apps/monitoring/`)

- `namespace.yaml`
- `helm-repos.yaml` (prometheus-community, grafana)
- `kube-state-metrics.yaml`, `node-exporter.yaml`
- `alloy-config.yaml` (Alloy configuration in River syntax, with scrape targets + remote write)
- `alloy.yaml` (HelmRelease)

## Scrape Targets

Alloy scrapes:
- **Flux controllers** (`flux-system` ns, port 8080)
- **kube-state-metrics** (`monitoring:8080`)
- **node-exporter** (`monitoring:9100`)
- **Synapse** (`matrix.axion1337.chat:9000`)

Metrics are remote-written to Prometheus at `http://10.0.0.3:9090/api/v1/write`. Logs are pushed to Loki at `http://10.0.0.3:3100/loki/api/v1/push`. Prometheus must run with `--web.enable-remote-write-receiver` to accept remote write.

## Troubleshooting

```bash
# Check Alloy logs
kubectl logs -n monitoring -l app.kubernetes.io/name=alloy

# Check Prometheus (one "up" series per scrape target)
curl -s 'http://10.0.0.3:9090/api/v1/query?query=up' | jq .

# Loki test: list known labels (an empty {} selector is rejected by Loki)
curl -s http://10.0.0.3:3100/loki/api/v1/labels | jq .
```

---
**More details**: see chapter 3.
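The `up` query in the troubleshooting section returns every target. To list only the targets that Alloy failed to scrape, ask Prometheus for `up == 0` and print their `job` and `instance` labels.

The sketch assumes `curl` and `jq` (already used above) and the Prometheus address from this guide. `PROM_URL` is an illustrative variable name.

```bash
#!/usr/bin/env bash
# Sketch: list scrape targets whose "up" metric is 0 (scrape failed),
# via the Prometheus HTTP API on Selendis. Requires curl and jq.
PROM_URL="${PROM_URL:-http://10.0.0.3:9090}"

# Reads a /api/v1/query JSON response on stdin and prints
# "<job> <instance>" for every series in the result vector.
print_series() {
  jq -r '.data.result[] | "\(.metric.job) \(.metric.instance)"'
}

down_targets() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode 'query=up == 0' \
    | print_series
}

# Usage: down_targets   (no output means every target is up)
```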
**File**: `deployment-guides/04-element-customization.md` (new file)

# Element Web Customization: Themes, Desktop Apps, Admin

**Status**: ✅ Fully deployed
**Domains**: `axion1337.chat` (web), `/docs/setup` (scripts)

## 1. Custom Themes (7)

| Theme | Primary colour |
|-------|----------------|
| aXion1337 Dark | `#1a1a1a` |
| Deep Purple | `#6a4c93` |
| Discord Dark | `#2c2f33` |
| Electric Blue | `#0066ff` |
| Everforest Dark Hard | `#1e2326` |
| Gruvbox Dark | `#282828` |
| Wal | `#1e1e1e` |

**Configuration**: `apps/production/custom-configs/element-values.yaml`

**Applying (user)**: Settings → Appearance → Colour theme

## 2. Desktop Setup Scripts

| OS | File |
|----|------|
| Windows | `element-setup-windows.cmd` (double-click) |
| macOS | `element-setup-macos.command` (double-click) |
| Linux | `element-setup-linux.sh` (bash) |

**What the scripts do**:
1. Create `config.json` with `configUrl: "https://axion1337.chat/config.json"`
2. Install Element (WinGet / Homebrew / apt/dnf/pacman)
3. Start Element (the config is loaded automatically)

**Download**: `https://axion1337.chat/docs/setup/`

## 3. Element Admin Panel

**URL**: `https://admin.axion1337.chat`

- Manage users
- Browse rooms
- Server statistics

**Configuration**: `apps/production/element-server-suite.yaml` (ESS chart)

## Files

| Component | File |
|-----------|------|
| Custom themes | `element-values.yaml` ConfigMap |
| Setup scripts | `element-web-docs-configmap.yaml` |
| Docs server | `element-web-docs-server.yaml` (nginx) |
| Ingress | `apex-ingress.yaml` (`/docs/setup/` route) |

---
**More details**: see chapter 4.
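Element Web's custom themes live in the `setting_defaults.custom_themes` list of its `config.json`. Each entry carries an `is_dark` flag alongside its colours. Whether a colour counts as dark can be checked with the WCAG relative-luminance formula.

This is a sketch under two assumptions: the table's primary colour is treated as the theme background, and 0.179 is used as an illustrative dark/light cut-off.

```bash
#!/usr/bin/env bash
# Sketch: WCAG relative luminance of a "#rrggbb" colour, used to decide
# the is_dark flag of a custom theme. The 0.179 cut-off is an assumption.
relative_luminance() {
  local hex="${1#\#}"
  awk -v r=$((16#${hex:0:2})) -v g=$((16#${hex:2:2})) -v b=$((16#${hex:4:2})) '
    # sRGB channel (0-255) -> linear light (0-1)
    function lin(c) {
      c = c / 255
      return (c <= 0.03928) ? c / 12.92 : ((c + 0.055) / 1.055) ^ 2.4
    }
    BEGIN { printf "%.4f\n", 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b) }'
}

# Exit status 0 if the colour should be treated as dark.
is_dark() {
  awk -v l="$(relative_luminance "$1")" 'BEGIN { exit !(l < 0.179) }'
}

for c in '#1a1a1a' '#6a4c93' '#0066ff'; do
  if is_dark "$c"; then echo "$c dark"; else echo "$c light"; fi
done
```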
**File**: `deployment-guides/05-room-policies.md` (new file)

# Room Policies: Retention, Publication, Auto-Join

**Status**: ✅ Fully deployed
**Configuration**: `apps/production/custom-configs/synapse-values.yaml`

## 1. Message Retention

Delete old messages automatically (saves storage, supports GDPR data minimization).

```yaml
retention:
  enabled: true
  default_policy:
    min_lifetime: 1d   # messages are kept for at least 1 day
    max_lifetime: 1y   # messages are deleted after 1 year

media_retention:
  local_media_lifetime: 365d    # local uploads deleted after 1 year without access
  remote_media_lifetime: 90d    # cached remote media deleted after 90 days without access

redaction_retention_period: 7d  # redacted events stay unredacted in the DB for 7d (not visible to users)
```

## 2. Room Publication Rules

Control which rooms may be published in the public room directory. Rules are evaluated in order and the first matching rule decides; a request that matches no rule is denied.

```yaml
room_list_publication_rules:
  - user_id: "*"     # all users
    action: allow    # may publish their rooms
```

**Alternative (restrictive)**: only admins may publish
```yaml
room_list_publication_rules:
  - user_id: "@admin:axion1337.chat"
    action: allow
  - user_id: "*"
    action: deny
```

## 3. Auto-Join Rooms

Automatically add new users to specific rooms (onboarding).
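Synapse's documentation describes `auto_join_rooms` entries as room aliases of the form `#localpart:server`; Space aliases work too.

The sketch below is a rough pre-commit shape check. It validates only the syntax. It does not check whether the room exists or whether it is publicly joinable.

```bash
#!/usr/bin/env bash
# Sketch: syntax-only check for auto_join_rooms entries
# ("#localpart:server"); it does not contact the homeserver.
is_room_alias() {
  [[ "$1" =~ ^#[^:[:space:]]+:[^[:space:]]+$ ]]
}

for entry in "#announcements:axion1337.chat" "#rules:axion1337.chat"; do
  if is_room_alias "$entry"; then echo "ok:      $entry"; else echo "invalid: $entry"; fi
done
```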
```yaml
auto_join_rooms:
  - "#announcements:axion1337.chat"
  - "#rules:axion1337.chat"
auto_join_rooms_for_guests: false   # registered users only
```

**Finding the room address**: Room settings → General → Published/Local addresses.

Synapse's documentation lists these entries as room aliases (`#…`). Rooms that do not exist yet are created as public rooms when the first user registers (`autocreate_auto_join_rooms`, enabled by default). Rooms that already exist must be publicly joinable.

## Deployment

```bash
# Edit synapse-values.yaml, then commit and push; Flux applies the change.
# (A manual `kubectl apply` is reverted on the next Flux reconcile.)
git commit -am "synapse: update room policies" && git push
flux reconcile kustomization production-apps --with-source

# Restart Synapse
kubectl rollout restart deployment -n matrix matrix-stack-synapse

# Verify
kubectl logs -n matrix -l app.kubernetes.io/name=synapse | grep -i "retention\|publication"
```

## Best Practices

**Private server**:
- `max_lifetime: 1y` (generous)
- `action: allow` (everyone may publish)
- `auto_join_rooms`: announcements + rules

**Public server (GDPR)**:
- `max_lifetime: 90d` (short)
- `action: deny` (admins only)
- `auto_join_rooms: []` (no mandatory rooms)

---
**More details**: see chapter 5.

**File**: `deployment-guides/README.md` (new file)

# aXion1337.Chat – Deployment & Configuration Documentation

This documentation describes how the Matrix homeserver for **axion1337.chat** is set up and configured. It uses Element Server Suite (ESS) v26.4.0 on K3S with Flux CD GitOps.

## 📋 Deployment Order Overview

The components were implemented in the order below. New setups should follow the same sequence:

| # | Title | File | Status | Target domain |
|---|-------|------|--------|---------------|
| 1 | TURN server for WebRTC video calls | `01-turn-server-setup.md` | ✅ Deployed | `turn.axion1337.chat` |
| 2 | Authentik as identity provider | `02-authentik-identity-provider.md` | ✅ Stage 1 deployed | `auth.axion1337.chat` |
| 3 | Monitoring with Alloy/Prometheus/Loki | `03-monitoring-integration.md` | ✅ Deployed | internal (10.0.0.3) |
| 4 | Element Web customization & desktop apps | `04-element-customization.md` | ✅ Deployed | `axion1337.chat` |
| 5 | Room policies (retention, publication, auto-join) | `05-room-policies.md` | ✅ Deployed | Matrix Synapse |

---

## 🚀 Quick Start for New Deployments

Start with the base installation (`01-Installation.md` in the repository root). Then follow the individual guides for detailed instructions.

---

## 🏗️ Architecture Overview

```
                         ┌──────────────────────────────┐
                         │      Element Web (apex)      │
                         │  axion1337.chat (HTTPS/TLS)  │
                         └───────────────┬──────────────┘
                                         │
             ┌───────────────────────────┼───────────────────────────┐
             │                           │                           │
┌────────────▼───────────┐  ┌────────────▼───────────┐  ┌────────────▼───────────┐
│          MAS           │  │       Well-Known       │  │       Docs/Setup       │
│ account.axion1337.chat │  │ /.well-known/matrix/*  │  │      /docs/setup/      │
└────────────┬───────────┘  └────────────────────────┘  └────────────────────────┘
             │
             ├───────────────────────────┐
             │ upstream OIDC             │ auth delegation
             │                           │
┌────────────▼───────────┐  ┌────────────▼───────────┐
│  Authentik (OIDC IdP)  │  │  Synapse (Homeserver)  │
│  auth.axion1337.chat   │  │ matrix.axion1337.chat  │
└────────────────────────┘  └────────────────────────┘
```

Synapse delegates authentication to MAS, and MAS delegates the login itself to Authentik via upstream OIDC. Authentik does not sit in front of Synapse.

---

## 🔑 Critical Values & Configuration

### Domains
- **Apex**: `axion1337.chat` (Element Web)
- **Matrix Synapse**: `matrix.axion1337.chat`
- **MAS**: `account.axion1337.chat`
- **Authentik**: `auth.axion1337.chat`
- **TURN Server**: `turn.axion1337.chat`

### External Services
- **K3S Host IP**: `49.13.132.245`
- **Monitoring Host**: `10.0.0.3` (Selendis)

---

## 📚 Documents in Detail

### [01-turn-server-setup.md](01-turn-server-setup.md)
STUN/TURN server for WebRTC media relay (video calls).

### [02-authentik-identity-provider.md](02-authentik-identity-provider.md)
Authentik as OIDC provider for Matrix. Registration via invitation links.

### [03-monitoring-integration.md](03-monitoring-integration.md)
Alloy → Prometheus/Loki monitoring integration.

### [04-element-customization.md](04-element-customization.md)
Custom themes, desktop setup scripts, Element Admin.

### [05-room-policies.md](05-room-policies.md)
Message retention, room publication, auto-join policies.

---

## 🛠️ Maintenance & Troubleshooting

Several guides include verification and troubleshooting commands:
- `01`: TURN verification
- `03`: monitoring troubleshooting
- `05`: deployment and log checks for room policies