
# Architecture Overview
This document describes the technical architecture of the GitOps infrastructure: its components, data flows, security layers, and operational setup.
## System Architecture
### High-Level Design
```
┌─────────────────────────────────────────────────────────────────┐
│                         Git Repository                          │
│                    (This GitOps Repository)                     │
│  - Infrastructure as Code                                       │
│  - Application Manifests                                        │
│  - Configuration & Secrets                                      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                  ┌────────────┴────────────┐
                  ▼                         ▼
         ┌──────────────────┐      ┌──────────────────┐
         │    Kubernetes    │      │ Webhook Triggers │
         │    API Server    │      │  (GitHub/Gitea)  │
         └────────┬─────────┘      └──────────────────┘
                  │
     ┌────────────┼────────────────┐
     │            │                │
     ▼            ▼                ▼
┌─────────┐  ┌──────────┐   ┌──────────────┐
│ Argo CD │  │  Kargo   │   │ Cert-Manager │
│ Server  │  │Controller│   │              │
└────┬────┘  └──────────┘   └──────────────┘
     │
     │ Monitors & Syncs
     ▼
┌──────────────────────────────────────────┐
│ Kubernetes Cluster Resources             │
│ - Deployments                            │
│ - Services                               │
│ - ConfigMaps & Secrets                   │
│ - Ingresses                              │
│ - Custom Resources (Kargo, Rollouts)     │
└──────────────────────────────────────────┘
```
## Component Architecture
### 1. Argo CD
**Purpose**: GitOps continuous deployment orchestration
**Components**:
- **API Server**: RESTful API for CLI and UI
- **Repository Server**: Handles Git operations and manifests
- **Application Controller**: Reconciles desired vs. actual state
- **Redis**: Caching and session storage
- **Webhook Receiver**: Listens for Git push events (served by the API Server at `/api/webhook`)
**Data Flow**:
1. Repository Server pulls latest manifests from Git
2. Application Controller compares desired vs actual state
3. Controller creates/updates/deletes Kubernetes resources
4. API Server provides status and management interface
**Default Configuration**:
- Single replica for development
- Redis for in-memory caching
- Kustomize with Helm support
- Automated pruning and self-healing enabled
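
As an illustration of the defaults above, an Argo CD `Application` with automated pruning and self-healing might look like the following sketch. The application name, repository URL, path, and namespaces are placeholders, not values taken from this repository.

```yaml
# Sketch only: names, repoURL, and path are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app                  # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@example.com:org/gitops.git   # placeholder repository URL
    targetRevision: main
    path: apps/example-app/overlays/dev       # hypothetical Kustomize overlay
  destination:
    server: https://kubernetes.default.svc    # the cluster Argo CD runs in
    namespace: example-app
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
```

With `selfHeal` enabled, manual `kubectl edit` changes to managed resources are reverted on the next reconciliation, so Git stays the only place where changes are made.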
### 2. Cert-Manager
**Purpose**: Automated certificate lifecycle management
**Components**:
- **Webhook**: Validates Certificate and Issuer resources
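- **Controller**: Core reconciliation logic; watches Certificate resources and requests certificates from issuers
- **Issuers**: Define how certificates are obtained (self-signed, Let's Encrypt, etc.)
- **CA Injector (cainjector)**: Injects CA bundles into webhook configurations and CRDs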
**Certificate Flow**:
1. A Certificate resource (an instance of the Certificate CRD) is created in Kubernetes
2. Cert-Manager controller watches for changes
3. Controller contacts issuer to obtain certificate
4. Certificate and private key stored in Kubernetes Secret
5. Controller monitors expiry and auto-renews
**Issuers in Use**:
- Self-signed CA: For internal cluster certificates
- Can be extended with Let's Encrypt (ACME) for public certificates
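
A self-signed CA is commonly bootstrapped in three steps: a self-signed issuer, a CA certificate signed by it, and a CA issuer that signs workload certificates. The sketch below follows that pattern; all resource names are assumptions, and the CA Secret is placed in the `cert-manager` namespace because that is where a `ClusterIssuer` looks for it by default.

```yaml
# Sketch only: resource names are hypothetical.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}                     # used only to sign the root CA below
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca-key-pair   # Secret that will hold the CA key pair
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca-issuer           # reference this from workload Certificates
spec:
  ca:
    secretName: internal-ca-key-pair
```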
### 3. Kargo
**Purpose**: Progressive delivery and multi-stage promotion
**Components**:
- **API Server**: Provides REST API and gRPC endpoints
- **Controller**: Reconciles Kargo resources (Promotions, Stages, Warehouses)
- **Webhooks Server**: Internal validating webhooks
- **External Webhooks Server**: Handles external event triggers
- **Management Controller**: Manages Projects and Namespaces
- **Garbage Collector**: Cleans up old Promotions and Freight
**Key Resources**:
- **Warehouse**: Source of deployable artifacts (containers, Helm charts)
- **Freight**: Represents a deployment candidate with specific versions
- **Stage**: Deployment target with promotion rules
- **Promotion**: Represents moving Freight from one Stage to another
**Promotion Flow**:
```
Warehouse (Source)
    ↓
Freight (Versions)
    ↓
Stage 1 (Dev)
    ↓
Promotion to Stage 2 (Staging) → Approval/Analysis
    ↓
Stage 2 (Staging)
    ↓
Promotion to Stage 3 (Production) → Analysis/Verification
    ↓
Stage 3 (Production)
```
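
The first links of this chain can be sketched with Kargo resources as follows. This is a minimal example assuming a recent Kargo release that supports `requestedFreight`; the project namespace, image repository, and resource names are hypothetical, and the promotion steps (which differ between Kargo versions) are left out.

```yaml
# Sketch only: namespace, names, and image repository are hypothetical.
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: example-warehouse
  namespace: example-project         # Kargo project namespace
spec:
  subscriptions:
    - image:
        repoURL: registry.example.com/org/example-app
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: dev
  namespace: example-project
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: example-warehouse
      sources:
        direct: true                 # dev accepts Freight straight from the Warehouse
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: staging
  namespace: example-project
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: example-warehouse
      sources:
        stages:
          - dev                      # staging accepts only Freight verified in dev
```

A production Stage would follow the same shape, listing `staging` under `sources.stages`.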
### 4. Argo Rollouts
**Purpose**: Progressive deployment strategies (Canary, Blue-Green)
**Components**:
- **Rollouts Controller**: Manages Rollout resources
- **Analysis Engine**: Evaluates deployment health via metrics
- **Progressive Deployment**: Gradually shifts traffic to new version
**Deployment Strategies**:
- **Canary**: Gradually shift traffic (e.g., 5% → 50% → 100%)
- **Blue-Green**: Maintain two active environments, switch traffic
- **Traffic Shifting**: Use service mesh integration (Istio/SMI)
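
A canary strategy along the lines of the 5% → 50% → 100% example could be declared as in the sketch below. The workload name, image, and pause durations are assumptions. Without a traffic router such as Istio, Argo Rollouts approximates each weight through replica counts, so small percentages are only coarse with few replicas.

```yaml
# Sketch only: name, image, and durations are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-app
  namespace: example-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: registry.example.com/org/example-app:1.0.0
          ports:
            - containerPort: 8080
  strategy:
    canary:
      steps:
        - setWeight: 5               # send roughly 5% to the new version
        - pause: {duration: 5m}      # wait before widening exposure
        - setWeight: 50
        - pause: {}                  # wait indefinitely for manual promotion
      # after the last step the new version receives 100% of traffic
```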
## Data Flow Diagrams
### GitOps Sync Flow
```
┌──────────────┐
│  Git Commit  │
└──────┬───────┘
       │
       ├─→ GitHub Webhook
       └─→ Argo CD Webhook Receiver
           ├─→ Repository Server: Fetch Latest Manifests
           ├─→ Parse & Validate (Kustomize/Helm)
           └─→ Application Controller
               ├─→ Compare: Git State vs. Cluster State
               ├─→ Generate Diff
               └─→ Apply Changes to Cluster
                   └─→ Update Application Status
```
### Kargo Promotion Flow
```
┌─────────────────┐
│  New Artifact   │
│    Published    │
└────────┬────────┘
         │
         └─→ Webhook Event
             └─→ Kargo API
                 ├─→ Create Freight
                 └─→ Check Stage Promotions
                     ├─→ Auto-Promotion Enabled?
                     │   ├─ Yes → Create Promotion
                     │   └─ No → Wait for Manual Approval
                     └─→ Kargo Controller Reconciles
                         ├─→ Update Argo CD Applications
                         ├─→ Monitor Health
                         ├─→ Run Analysis (via Argo Rollouts)
                         └─→ Approve/Reject Next Promotion
```
## Security Architecture
### Multi-Layer Security
```
┌─────────────────────────────────────────────────────────┐
│ 1. Git Repository Security                              │
│    - SSH key authentication                             │
│    - Branch protection rules                            │
│    - Code review requirements                           │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 2. Secrets Encryption                                   │
│    - Sealed Secrets (bitnami-labs)                      │
│    - Encrypted at rest in Git                           │
│    - Decrypted only in cluster                          │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 3. RBAC (Role-Based Access Control)                     │
│    - Argo CD projects limit access                      │
│    - Kargo OIDC integration                             │
│    - Kubernetes RBAC policies                           │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 4. Network Security                                     │
│    - Namespace isolation                                │
│    - Network policies                                   │
│    - TLS for all communications                         │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ 5. Pod Security                                         │
│    - Non-root users                                     │
│    - Read-only filesystems                              │
│    - Security contexts                                  │
└─────────────────────────────────────────────────────────┘
```
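
Layers 2, 4, and 5 map directly onto Kubernetes resources. The manifests below are a hedged sketch of one resource per layer; the namespace, names, image, and user ID are hypothetical, and the `encryptedData` value is a placeholder for ciphertext that only `kubeseal` can produce.

```yaml
# Sketch only: names, namespace, image, and UID are hypothetical.
# Layer 2: a SealedSecret is safe to commit; only the in-cluster controller can decrypt it.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: example-credentials
  namespace: example-app
spec:
  encryptedData:
    password: <ciphertext produced by kubeseal>   # placeholder, not real data
---
# Layer 4: deny all ingress to the namespace by default; allow rules are added separately.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example-app
spec:
  podSelector: {}                    # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
---
# Layer 5: non-root user, read-only root filesystem, no extra capabilities.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
  namespace: example-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/org/example-app:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```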
## State Management
### What State is Stored Where
```
┌────────────────────────────────────────┐
│ Git Repository                         │
├────────────────────────────────────────┤
│ ✓ Infrastructure manifests             │
│ ✓ Application configurations           │
│ ✓ Sealed secrets                       │
│ ✓ Kustomize overlays                   │
│ ✓ Helm values                          │
│ ✗ Cluster runtime state                │
│ ✗ User credentials (plaintext)         │
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ Kubernetes Cluster (etcd)              │
├────────────────────────────────────────┤
│ ✓ Applied manifests                    │
│ ✓ Running resource state               │
│ ✓ Sealed secrets (encrypted)           │
│ ✓ Argo CD applications                 │
│ ✓ Kargo Promotions/Freight             │
│ ✗ Git history                          │
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ External Storage                       │
├────────────────────────────────────────┤
│ ✓ Container registries                 │
│ ✓ Helm repositories                    │
│ ✓ Git repository                       │
│ ✓ Certificate authority keys           │
│ ✗ Sensitive credentials (plaintext)    │
└────────────────────────────────────────┘
```
## Scalability Considerations
### Horizontal Scaling
**Argo CD**:
- Multiple application-controller replicas for sharding
- Multiple server replicas for load distribution
- Shared Redis for session management
**Kargo**:
- Multiple controller replicas for resource sharding
- Multiple API server replicas behind load balancer
- Webhook servers scale independently
### Vertical Scaling
**Resource Limits by Component**:
```
Argo CD Controller:
- Requests: 250m CPU, 256Mi Memory
- Limits: 500m CPU, 512Mi Memory
Argo CD Server:
- Requests: 125m CPU, 128Mi Memory
- Limits: 250m CPU, 256Mi Memory
Cert-Manager:
- Requests: 100m CPU, 64Mi Memory
- Limits: 200m CPU, 128Mi Memory
Kargo API:
- Requests: 100m CPU, 128Mi Memory
- Limits: 500m CPU, 512Mi Memory
```
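
If Argo CD is installed through the community `argo-cd` Helm chart (an assumption; this document does not state the installation method), the first two entries above would translate to values along these lines:

```yaml
# Sketch only: assumes the community argo-cd Helm chart's values layout.
controller:
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
server:
  resources:
    requests:
      cpu: 125m
      memory: 128Mi
    limits:
      cpu: 250m
      memory: 256Mi
```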
## High Availability Setup
### Production Configuration
```
┌─────────────────────────────────────────────────────────┐
│              Multi-Zone Kubernetes Cluster              │
│                 (3+ availability zones)                 │
│                                                         │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐        │
│   │  Zone A  │     │  Zone B  │     │  Zone C  │        │
│   │  Master  │     │  Master  │     │  Master  │        │
│   │  Worker  │     │  Worker  │     │  Worker  │        │
│   └──────────┘     └──────────┘     └──────────┘        │
│                                                         │
│ Distributed Storage:                                    │
│ - etcd replicated across zones                          │
│ - PVC/PV with cross-zone replication                    │
└─────────────────────────────────────────────────────────┘
```
### Component Redundancy
- Argo CD: 2-3 replicas of each component
- Cert-Manager: 2-3 controller replicas
- Kargo: 2-3 API server replicas, 2-3 controller replicas
- Redis: Redis-HA with 3 sentinels
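
Replica counts only help if the replicas land in different zones and are not evicted together. A minimal sketch for the Argo CD server, assuming the default `app.kubernetes.io/name: argocd-server` pod label and the `argocd` namespace, combines a Deployment patch with a PodDisruptionBudget:

```yaml
# Sketch only: a Kustomize-style patch plus a PDB; the replica count is an example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
spec:
  replicas: 2
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread replicas across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: argocd-server
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: argocd-server
  namespace: argocd
spec:
  minAvailable: 1                    # keep one replica up during node drains
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server
```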
## Disaster Recovery
### Backup Strategy
```
Daily Backups:
┌─────────────────────────────────────────────────┐
│ Git Repository Commits                          │
│ (Automatically backed up by Git hosting)        │
├─────────────────────────────────────────────────┤
│ Kubernetes etcd                                 │
│ (Velero or native etcd backup)                  │
├─────────────────────────────────────────────────┤
│ Sealing Keys for Sealed Secrets                 │
│ (Secured storage, NOT in Git)                   │
└─────────────────────────────────────────────────┘
```
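
If Velero is the chosen tool, the daily cluster backup could be expressed as a `Schedule` resource. This is a sketch under assumptions: the schedule time, retention, and namespace list are examples, and the `kargo` namespace name is a guess. Velero backs up Kubernetes API objects rather than raw etcd data, and the Sealed Secrets key still needs its own export to secured storage.

```yaml
# Sketch only: time, retention, and namespaces are hypothetical.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-gitops-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"              # every day at 02:00
  template:
    includedNamespaces:
      - argocd
      - cert-manager
      - kargo                        # assumed namespace name
    ttl: 720h0m0s                    # keep each backup for 30 days
```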
### Recovery Procedures
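1. **Git Corruption**: Restore from any clone (each clone holds the full history) or from hosting-provider backups
2. **etcd Corruption**: Restore from the latest etcd snapshot or Velero backup
3. **Sealing Key Loss**: Sealed Secrets stored in Git can no longer be decrypted; restore the key from its secured backup, or re-seal every secret from its original value with a new key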
4. **Application State**: Redeploy from Git (source of truth)
## Monitoring & Observability
### Key Metrics to Monitor
```
Argo CD Metrics:
- Application sync status
- Reconciliation lag
- Git repository fetch rate
- API server response times
Cert-Manager Metrics:
- Certificate renewal status
- Certificate expiry tracking
- Issuer availability
Kargo Metrics:
- Promotion success rate
- Stage health
- Freight warehouse size
- Webhook latency
System Metrics:
- Pod CPU/Memory usage
- Node capacity
- PVC utilization
- Network I/O
```
### Integration Points
- **Prometheus**: Scrape metrics from `/metrics` endpoints
- **Grafana**: Visualize metrics and dashboards
- **AlertManager**: Send alerts for critical issues
- **Logs**: Aggregate logs from all components
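
When Prometheus is run by the Prometheus Operator (an assumption), scraping is usually configured with a `ServiceMonitor` per metrics Service. The sketch below targets the Argo CD application controller metrics Service; the `release` label is hypothetical and must match whatever selector your Prometheus instance uses.

```yaml
# Sketch only: requires the Prometheus Operator CRDs; the release label is hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    release: prometheus              # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics                  # named port on the argocd-metrics Service
```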
## Integration with External Systems
### Git Integration
```
Supported Git Providers:
├─ GitHub (via SSH)
├─ GitLab (via SSH)
├─ Gitea (via SSH)
└─ Self-hosted Git
Authentication:
├─ SSH keys (primary)
├─ HTTPS with personal tokens
└─ SSH agent forwarding
```
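
For SSH key authentication, Argo CD reads repository credentials from Secrets labeled `argocd.argoproj.io/secret-type: repository`. A minimal sketch follows; the Secret name, URL, and key are placeholders. In this setup the Secret would be committed as a Sealed Secret rather than in plaintext.

```yaml
# Sketch only: name, URL, and key material are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: gitops-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: git@example.com:org/gitops.git          # placeholder repository URL
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <private key material, never committed in plaintext>
    -----END OPENSSH PRIVATE KEY-----
```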
### CI/CD Pipeline Integration
```
Build Pipeline → Container Registry → Webhook → Kargo
                                                  ↓
                                                Create Freight
                                                  ↓
                                                Promote to Stages
                                                  ↓
                                                Update Argo CD Applications
```
## Network Architecture
### Kubernetes Network Design
```
┌────────────────────────────────────────────────────┐
│ Cluster Network                                    │
│                                                    │
│  ┌──────────────┐    ┌──────────────┐              │
│  │ Namespace    │    │ Namespace    │              │
│  │ argocd       │    │ cert-manager │              │
│  │              │    │              │              │
│  │ Service:     │    │ Service:     │              │
│  │ 10.0.0.0/24  │    │ 10.0.1.0/24  │              │
│  └──────────────┘    └──────────────┘              │
│                                                    │
│ Pod CIDR: 10.1.0.0/16                              │
│ Service CIDR: 10.0.0.0/16                          │
│                                                    │
│ DNS: CoreDNS for internal resolution               │
│ Ingress: Optional external access                  │
└────────────────────────────────────────────────────┘
```
---
**Last Updated**: 2026-04-16
**Version**: 1.0