Files
GitOps/ARCHITECTURE.md
T
Janis 20d8146e8c feat: add comprehensive documentation and deployment scripts
- Created new documentation files including TROUBLESHOOTING.md, CONFIGURATION.md, and CONTRIBUTING.md.
- Added deployment guide with automated and manual instructions.
- Introduced new YAML configurations for Kubernetes resources and Helm charts.
- Implemented a deployment script for setting up the GitOps infrastructure.
2026-04-16 19:07:08 +02:00

19 KiB

Architecture Overview

This document provides a detailed technical architecture of the GitOps infrastructure.

System Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                        Git Repository                           │
│                   (This GitOps Repository)                      │
│  - Infrastructure as Code                                       │
│  - Application Manifests                                        │
│  - Configuration & Secrets                                      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                  ┌────────────┴────────────┐
                  ▼                         ▼
        ┌──────────────────┐      ┌──────────────────┐
        │   Kubernetes     │      │  Webhook Triggers│
        │   API Server     │      │   (GitHub/Gitea) │
        └────────┬─────────┘      └──────────────────┘
                 │
    ┌────────────┼────────────────┐
    │            │                │
    ▼            ▼                ▼
┌─────────┐ ┌──────────┐  ┌──────────────┐
│ ArgoCD  │ │ Kargo    │  │ Cert-Manager │
│ Server  │ │ Controller
 │  └─────────┘ └──────────┘  └──────────────┘
    │
    │ Monitors & Syncs
    │
    ▼
┌──────────────────────────────────────────┐
│       Kubernetes Cluster Resources       │
│  - Deployments                           │
│  - Services                              │
│  - ConfigMaps & Secrets                  │
│  - Ingresses                             │
│  - Custom Resources (Kargo, Rollouts)    │
└──────────────────────────────────────────┘

Component Architecture

1. Argo CD

Purpose: GitOps continuous deployment orchestration

Components:

  • API Server: RESTful API for CLI and UI
  • Repository Server: Handles Git operations and manifests
  • Application Controller: Reconciles desired vs. actual state
  • Redis: Caching and session storage
  • Webhook Receiver: Listens for Git push events

Data Flow:

  1. Repository Server pulls latest manifests from Git
  2. Application Controller compares desired vs actual state
  3. Controller creates/updates/deletes Kubernetes resources
  4. API Server provides status and management interface

Default Configuration:

  • Single replica for development
  • Redis for in-memory caching
  • Kustomize with Helm support
  • Automated pruning and self-healing enabled

2. Cert-Manager

Purpose: Automated certificate lifecycle management

Components:

  • Webhook: Validates Certificate and Issuer resources
  • Controller: Watches for certificate requests
  • Issuers: Define how certificates are obtained (self-signed, Let's Encrypt, etc.)
  • Cert-Manager: Core reconciliation logic

Certificate Flow:

  1. Certificate CRD is created in Kubernetes
  2. Cert-Manager controller watches for changes
  3. Controller contacts issuer to obtain certificate
  4. Certificate and private key stored in Kubernetes Secret
  5. Controller monitors expiry and auto-renews

Issuers in Use:

  • Self-signed CA: For internal cluster certificates
  • Can be extended with Let's Encrypt (ACME) for public certificates

3. Kargo

Purpose: Progressive delivery and multi-stage promotion

Components:

  • API Server: Provides REST API and gRPC endpoints
  • Controller: Reconciles Kargo resources (Promotions, Stages, Warehouses)
  • Webhooks Server: Internal validating webhooks
  • External Webhooks Server: Handles external event triggers
  • Management Controller: Manages Projects and Namespaces
  • Garbage Collector: Cleans up old Promotions and Freight

Key Resources:

  • Warehouse: Source of deployable artifacts (containers, Helm charts)
  • Freight: Represents a deployment candidate with specific versions
  • Stage: Deployment target with promotion rules
  • Promotion: Represents moving Freight from one Stage to another

Promotion Flow:

Warehouse (Source)
    ↓
Freight (Versions)
    ↓
Stage 1 (Dev)
    ↓
Promotion to Stage 2 (Staging) → Approval/Analysis
    ↓
Stage 2 (Staging)
    ↓
Promotion to Stage 3 (Production) → Analysis/Verification
    ↓
Stage 3 (Production)

4. Argo Rollouts

Purpose: Progressive deployment strategies (Canary, Blue-Green)

Components:

  • Rollouts Controller: Manages Rollout resources
  • Analysis Engine: Evaluates deployment health via metrics
  • Progressive Deployment: Gradually shifts traffic to new version

Deployment Strategies:

  • Canary: Gradually shift traffic (e.g., 5% → 50% → 100%)
  • Blue-Green: Maintain two active environments, switch traffic
  • Traffic Shifting: Use service mesh integration (Istio/SMI)

Data Flow Diagrams

GitOps Sync Flow

┌──────────────┐
│  Git Commit  │
└──────┬───────┘
       │
       ├─→ GitHub Webhook
       │
       └─→ Argo CD Webhook Receiver
           │
           ├─→ Repository Server: Fetch Latest Manifests
           │
           ├─→ Parse & Validate (Kustomize/Helm)
           │
           └─→ Application Controller
               │
               ├─→ Compare: Git State vs. Cluster State
               │
               ├─→ Generate Diff
               │
               └─→ Apply Changes to Cluster
                   │
                   └─→ Update Application Status

Kargo Promotion Flow

┌─────────────────┐
│ New Artifact    │
│ Published       │
└────────┬────────┘
         │
         └─→ Webhook Event
            │
            └─→ Kargo API
               │
               ├─→ Create Freight
               │
               └─→ Check Stage Promotions
                   │
                   ├─→ Auto-Promotion Enabled?
                   │   ├─ Yes → Create Promotion
                   │   └─ No → Wait for Manual Approval
                   │
                   └─→ Kargo Controller Reconciles
                       │
                       ├─→ Update Argo CD Applications
                       │
                       ├─→ Monitor Health
                       │
                       ├─→ Run Analysis (via Argo Rollouts)
                       │
                       └─→ Approve/Reject Next Promotion

Security Architecture

Multi-Layer Security

┌─────────────────────────────────────────────────────────┐
│ 1. Git Repository Security                              │
│    - SSH key authentication                              │
│    - Branch protection rules                             │
│    - Code review requirements                            │
└─────────────────────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────────────────────┐
│ 2. Secrets Encryption                                   │
│    - Sealed Secrets (bitnami-labs)                       │
│    - Encrypted at rest in Git                            │
│    - Decrypted only in cluster                           │
└─────────────────────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────────────────────┐
│ 3. RBAC (Role-Based Access Control)                     │
│    - Argo CD projects limit access                       │
│    - Kargo OIDC integration                              │
│    - Kubernetes RBAC policies                            │
└─────────────────────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────────────────────┐
│ 4. Network Security                                     │
│    - Namespace isolation                                │
│    - Network policies                                   │
│    - TLS for all communications                          │
└─────────────────────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────────────────────┐
│ 5. Pod Security                                         │
│    - Non-root users                                     │
│    - Read-only filesystems                              │
│    - Security contexts                                  │
└─────────────────────────────────────────────────────────┘

State Management

What State is Stored Where

┌────────────────────────────────────────┐
│ Git Repository                         │
├────────────────────────────────────────┤
│ ✓ Infrastructure manifests             │
│ ✓ Application configurations           │
│ ✓ Sealed secrets                       │
│ ✓ Kustomize overlays                   │
│ ✓ Helm values                          │
│ ✗ Cluster runtime state                │
│ ✗ User credentials (plaintext)         │
└────────────────────────────────────────┘

┌────────────────────────────────────────┐
│ Kubernetes Cluster (etcd)              │
├────────────────────────────────────────┤
│ ✓ Applied manifests                    │
│ ✓ Running resource state                │
│ ✓ Sealed secrets (encrypted)           │
│ ✓ Argo CD applications                 │
│ ✓ Kargo Promotions/Freight             │
│ ✗ Git history                          │
└────────────────────────────────────────┘

┌────────────────────────────────────────┐
│ External Storage                       │
├────────────────────────────────────────┤
│ ✓ Container registries                 │
│ ✓ Helm repositories                    │
│ ✓ Git repository                       │
│ ✓ Certificate authority keys           │
│ ✗ Sensitive credentials (plaintext)    │
└────────────────────────────────────────┘

Scalability Considerations

Horizontal Scaling

Argo CD:

  • Multiple application-controller replicas for sharding
  • Multiple server replicas for load distribution
  • Shared Redis for session management

Kargo:

  • Multiple controller replicas for resource sharding
  • Multiple API server replicas behind load balancer
  • Webhook servers scale independently

Vertical Scaling

Resource Limits by Component:

Argo CD Controller:
  - Requests: 250m CPU, 256Mi Memory
  - Limits: 500m CPU, 512Mi Memory

Argo CD Server:
  - Requests: 125m CPU, 128Mi Memory
  - Limits: 250m CPU, 256Mi Memory

Cert-Manager:
  - Requests: 100m CPU, 64Mi Memory
  - Limits: 200m CPU, 128Mi Memory

Kargo API:
  - Requests: 100m CPU, 128Mi Memory
  - Limits: 500m CPU, 512Mi Memory

High Availability Setup

Production Configuration

┌─────────────────────────────────────────────────────────┐
│ Multi-Zone Kubernetes Cluster                           │
│ (3+ availability zones)                                 │
│                                                         │
│ ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│ │ Zone A   │    │ Zone B   │    │ Zone C   │          │
│ │ Master   │    │ Master   │    │ Master   │          │
│ │ Worker   │    │ Worker   │    │ Worker   │          │
│ └──────────┘    └──────────┘    └──────────┘          │
│                                                         │
│ Distributed Storage:                                    │
│ - etcd replicated across zones                         │
│ - PVC/PV with cross-zone replication                   │
└─────────────────────────────────────────────────────────┘

Component Redundancy

  • Argo CD: 2-3 replicas of each component
  • Cert-Manager: 2-3 controller replicas
  • Kargo: 2-3 API server replicas, 2-3 controller replicas
  • Redis: Redis-HA with 3 sentinels

Disaster Recovery

Backup Strategy

Daily Backups:
┌─────────────────────────────────────────────────┐
│ Git Repository Commits                          │
│ (Automatically backed up by Git hosting)         │
├─────────────────────────────────────────────────┤
│ Kubernetes etcd                                 │
│ (velero or native etcd backup)                  │
├─────────────────────────────────────────────────┤
│ Sealing Keys for Sealed Secrets                 │
│ (Secured storage, NOT in Git)                   │
└─────────────────────────────────────────────────┘

Recovery Procedures

  1. Git Corruption: Use distributed copies, restore from backups
  2. etcd Corruption: Restore from latest backup
  3. Secrets Key Loss: Complete cluster recreation needed
  4. Application State: Redeploy from Git (source of truth)

Monitoring & Observability

Key Metrics to Monitor

Argo CD Metrics:
- Application sync status
- Reconciliation lag
- Git repository fetch rate
- API server response times

Cert-Manager Metrics:
- Certificate renewal status
- Certificate expiry tracking
- Issuer availability

Kargo Metrics:
- Promotion success rate
- Stage health
- Freight warehouse size
- Webhook latency

System Metrics:
- Pod CPU/Memory usage
- Node capacity
- PVC utilization
- Network I/O

Integration Points

  • Prometheus: Scrape metrics from /metrics endpoints
  • Grafana: Visualize metrics and dashboards
  • AlertManager: Send alerts for critical issues
  • Logs: Aggregate logs from all components

Integration with External Systems

Git Integration

Supported Git Providers:
├─ GitHub (via SSH)
├─ GitLab (via SSH)
├─ Gitea (via SSH)
└─ Self-hosted Git

Authentication:
├─ SSH keys (primary)
├─ HTTPS with personal tokens
└─ SSH agent forwarding

CI/CD Pipeline Integration

Build Pipeline → Container Registry → Webhook → Kargo
                                           ↓
                                    Create Freight
                                           ↓
                                    Promote to Stages
                                           ↓
                                    Update Argo CD Applications

Network Architecture

Kubernetes Network Design

┌────────────────────────────────────────────────────┐
│ Cluster Network                                    │
│                                                   │
│ ┌──────────────┐  ┌──────────────┐               │
│ │ Namespace    │  │ Namespace    │               │
│ │ argocd       │  │ cert-manager │               │
│ │              │  │              │               │
│ │ Service:     │  │ Service:     │               │
│ │ 10.0.0.0/24  │  │ 10.0.1.0/24  │               │
│ └──────────────┘  └──────────────┘               │
│                                                   │
│ Pod CIDR: 10.1.0.0/16                            │
│ Service CIDR: 10.0.0.0/12                        │
│                                                   │
│ DNS: CoreDNS for internal resolution              │
│ Ingress: Optional external access                 │
└────────────────────────────────────────────────────┘

Last Updated: 2026-04-16 Version: 1.0