Architecture Overview
This document describes the technical architecture of the GitOps infrastructure in detail.
System Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│ Git Repository │
│ (This GitOps Repository) │
│ - Infrastructure as Code │
│ - Application Manifests │
│ - Configuration & Secrets │
└──────────────────────────────┬──────────────────────────────────┘
│
┌────────────┴────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Kubernetes │ │ Webhook Triggers│
│ API Server │ │ (GitHub/Gitea) │
└────────┬─────────┘ └──────────────────┘
│
┌────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌──────────────┐
│ Argo CD │ │ Kargo │ │ Cert-Manager │
│ Server │ │Controller│ │ │
└─────────┘ └──────────┘ └──────────────┘
│
│ Monitors & Syncs
│
▼
┌──────────────────────────────────────────┐
│ Kubernetes Cluster Resources │
│ - Deployments │
│ - Services │
│ - ConfigMaps & Secrets │
│ - Ingresses │
│ - Custom Resources (Kargo, Rollouts) │
└──────────────────────────────────────────┘
Component Architecture
1. Argo CD
Purpose: GitOps continuous deployment orchestration
Components:
- API Server: RESTful API for CLI and UI
- Repository Server: Handles Git operations and manifests
- Application Controller: Reconciles desired vs. actual state
- Redis: Caching and session storage
- Webhook Receiver: Endpoint on the API Server that listens for Git push events
Data Flow:
- Repository Server pulls latest manifests from Git
- Application Controller compares desired vs actual state
- Controller creates/updates/deletes Kubernetes resources
- API Server provides status and management interface
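The compare-and-apply step of this flow can be sketched in a few lines. This is a minimal illustration, not Argo CD's actual implementation: `plan_sync`, the resource keys, and the dictionary-shaped manifests are hypothetical names chosen for the example. It shows how a desired state from Git and an observed cluster state yield create, update, and (with pruning enabled) delete actions.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, object]


def plan_sync(
    desired: Dict[str, Manifest],
    actual: Dict[str, Manifest],
    prune: bool = True,
) -> List[Tuple[str, str]]:
    """Return (action, resource_key) pairs that would make `actual` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in actual:
            actions.append(("create", key))
        elif desired[key] != actual[key]:
            actions.append(("update", key))
    if prune:
        # Resources that exist in the cluster but are no longer in Git.
        for key in sorted(actual):
            if key not in desired:
                actions.append(("delete", key))
    return actions


# Hypothetical state: Git wants 3 replicas and a Service; the cluster has
# 2 replicas and a leftover ConfigMap.
desired = {
    "Deployment/web": {"replicas": 3, "image": "web:1.2"},
    "Service/web": {"port": 80},
}
actual = {
    "Deployment/web": {"replicas": 2, "image": "web:1.2"},
    "ConfigMap/old": {"data": {}},
}
plan = plan_sync(desired, actual)
print(plan)
```

With `prune=False` the leftover ConfigMap would stay in place, which mirrors the effect of turning automated pruning off.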
Default Configuration:
- Single replica for development
- Redis for in-memory caching
- Kustomize with Helm support
- Automated pruning and self-healing enabled
2. Cert-Manager
Purpose: Automated certificate lifecycle management
Components:
- Webhook: Validates Certificate and Issuer resources
- Controller: Watches Certificate resources and requests certificates from issuers
- Issuers: Issuer and ClusterIssuer resources that define how certificates are obtained (self-signed, Let's Encrypt, etc.)
- Cert-Manager: Core reconciliation logic
Certificate Flow:
- A Certificate resource is created in Kubernetes
- Cert-Manager controller watches for changes
- Controller contacts issuer to obtain certificate
- Certificate and private key stored in Kubernetes Secret
- Controller monitors expiry and auto-renews
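The expiry-monitoring step comes down to a date calculation. The sketch below assumes the commonly documented cert-manager default of renewing once two thirds of the certificate lifetime has elapsed (a `renewBefore` of one third of the duration) unless `renewBefore` is set explicitly; this should be checked against the documentation for the deployed version. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(
    not_before: datetime,
    not_after: datetime,
    renew_before: Optional[timedelta] = None,
) -> datetime:
    """Return the moment at which renewal should start."""
    if renew_before is None:
        # Assumed default: renew when one third of the lifetime remains.
        renew_before = (not_after - not_before) / 3
    return not_after - renew_before


def needs_renewal(
    now: datetime,
    not_before: datetime,
    not_after: datetime,
    renew_before: Optional[timedelta] = None,
) -> bool:
    """True once `now` has reached the renewal time."""
    return now >= renewal_time(not_before, not_after, renew_before)


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)

print(renewal_time(issued, expires))
print(needs_renewal(issued + timedelta(days=45), issued, expires))
print(needs_renewal(issued + timedelta(days=61), issued, expires))
```

For a 90-day certificate this places the renewal point 60 days after issuance, leaving 30 days to retry if the issuer is temporarily unavailable.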
Issuers in Use:
- Self-signed CA: For internal cluster certificates
- Can be extended with Let's Encrypt (ACME) for public certificates
3. Kargo
Purpose: Progressive delivery and multi-stage promotion
Components:
- API Server: Provides REST API and gRPC endpoints
- Controller: Reconciles Kargo resources (Promotions, Stages, Warehouses)
- Webhooks Server: Internal validating webhooks
- External Webhooks Server: Handles external event triggers
- Management Controller: Manages Projects and Namespaces
- Garbage Collector: Cleans up old Promotions and Freight
Key Resources:
- Warehouse: Source of deployable artifacts (containers, Helm charts)
- Freight: Represents a deployment candidate with specific versions
- Stage: Deployment target with promotion rules
- Promotion: Represents moving Freight from one Stage to another
Promotion Flow:
Warehouse (Source)
↓
Freight (Versions)
↓
Stage 1 (Dev)
↓
Promotion to Stage 2 (Staging) → Approval/Analysis
↓
Stage 2 (Staging)
↓
Promotion to Stage 3 (Production) → Analysis/Verification
↓
Stage 3 (Production)
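The gating in this flow can be modelled as a small state machine. The sketch below is an illustration only and does not use the Kargo API: the `Stage` and `Pipeline` classes, the `verify` step, and the Freight ID are hypothetical. It assumes a simple rule that Freight may enter a stage only after it has been verified in the stage directly upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Stage:
    name: str
    upstream: Optional[str] = None          # None: Freight comes straight from the Warehouse
    current_freight: Optional[str] = None
    verified: Set[str] = field(default_factory=set)


class Pipeline:
    def __init__(self, stages: List[Stage]) -> None:
        self.stages: Dict[str, Stage] = {s.name: s for s in stages}

    def can_promote(self, freight: str, target: str) -> bool:
        """Freight is eligible if the target has no upstream or the upstream verified it."""
        upstream = self.stages[target].upstream
        return upstream is None or freight in self.stages[upstream].verified

    def promote(self, freight: str, target: str) -> None:
        if not self.can_promote(freight, target):
            raise ValueError(f"{freight} is not verified upstream of {target}")
        self.stages[target].current_freight = freight

    def verify(self, target: str) -> None:
        """Mark the Freight currently in `target` as verified (analysis passed)."""
        stage = self.stages[target]
        if stage.current_freight is None:
            raise ValueError(f"{target} has no Freight to verify")
        stage.verified.add(stage.current_freight)


pipeline = Pipeline([
    Stage("dev"),
    Stage("staging", upstream="dev"),
    Stage("production", upstream="staging"),
])
pipeline.promote("freight-abc123", "dev")
blocked = pipeline.can_promote("freight-abc123", "staging")  # not yet verified in dev
pipeline.verify("dev")
pipeline.promote("freight-abc123", "staging")
```

At this point the Freight is in staging but still cannot reach production, because staging has not verified it yet.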
4. Argo Rollouts
Purpose: Progressive deployment strategies (Canary, Blue-Green)
Components:
- Rollouts Controller: Manages Rollout resources
- Analysis Engine: Evaluates deployment health via metrics
- Progressive Deployment: Gradually shifts traffic to new version
Deployment Strategies:
- Canary: Gradually shift traffic (e.g., 5% → 50% → 100%)
- Blue-Green: Maintain two active environments, switch traffic
- Traffic Shifting: Use service mesh integration (Istio/SMI)
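The canary strategy is essentially a loop over traffic weights with a health gate after each step. The sketch below illustrates that control flow only; `run_canary` and the `analysis_passes` callback are hypothetical stand-ins for what the Rollouts controller and its analysis runs do, and the weights follow the 5% → 50% → 100% example.

```python
from typing import Callable, List


def run_canary(weights: List[int], analysis_passes: Callable[[int], bool]) -> List[int]:
    """Return the sequence of canary traffic weights that were applied."""
    applied: List[int] = []
    for weight in weights:
        applied.append(weight)            # shift this share of traffic to the new version
        if not analysis_passes(weight):   # e.g. error rate above a threshold
            applied.append(0)             # abort: all traffic back to the stable version
            break
    return applied


full_rollout = run_canary([5, 50, 100], lambda weight: True)
aborted = run_canary([5, 50, 100], lambda weight: weight < 50)
print(full_rollout)
print(aborted)
```

In the second call the analysis fails at the 50% step, so the weight drops to 0 and the 100% step is never reached.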
Data Flow Diagrams
GitOps Sync Flow
┌──────────────┐
│ Git Commit │
└──────┬───────┘
│
├─→ GitHub Webhook
│
└─→ Argo CD Webhook Receiver
│
├─→ Repository Server: Fetch Latest Manifests
│
├─→ Parse & Validate (Kustomize/Helm)
│
└─→ Application Controller
│
├─→ Compare: Git State vs. Cluster State
│
├─→ Generate Diff
│
└─→ Apply Changes to Cluster
│
└─→ Update Application Status
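A webhook receiver is usually reachable from outside the cluster, so the payload should be authenticated before it triggers a refresh. GitHub signs each delivery with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows that check in isolation; the secret and payload are made-up values, and it is not a description of how Argo CD implements the check internally.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the signature header value for a request body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    return hmac.compare_digest(sign(secret, body), header)


secret = b"example-webhook-secret"        # hypothetical shared secret
body = b'{"ref": "refs/heads/main"}'      # hypothetical push payload
header = sign(secret, body)

print(is_valid_signature(secret, body, header))
print(is_valid_signature(secret, body + b" ", header))
```

Any change to the body or the secret produces a different digest, so a forged or altered push event is rejected before a sync is attempted.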
Kargo Promotion Flow
┌─────────────────┐
│ New Artifact │
│ Published │
└────────┬────────┘
│
└─→ Webhook Event
│
└─→ Kargo API
│
├─→ Create Freight
│
└─→ Check Stage Promotions
│
├─→ Auto-Promotion Enabled?
│ ├─ Yes → Create Promotion
│ └─ No → Wait for Manual Approval
│
└─→ Kargo Controller Reconciles
│
├─→ Update Argo CD Applications
│
├─→ Monitor Health
│
├─→ Run Analysis (via Argo Rollouts)
│
└─→ Approve/Reject Next Promotion
Security Architecture
Multi-Layer Security
┌─────────────────────────────────────────────────────────┐
│ 1. Git Repository Security │
│ - SSH key authentication │
│ - Branch protection rules │
│ - Code review requirements │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ 2. Secrets Encryption │
│ - Sealed Secrets (bitnami-labs) │
│ - Encrypted at rest in Git │
│ - Decrypted only in cluster │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ 3. RBAC (Role-Based Access Control) │
│ - Argo CD projects limit access │
│ - Kargo OIDC integration │
│ - Kubernetes RBAC policies │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ 4. Network Security │
│ - Namespace isolation │
│ - Network policies │
│ - TLS for all communications │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ 5. Pod Security │
│ - Non-root users │
│ - Read-only filesystems │
│ - Security contexts │
└─────────────────────────────────────────────────────────┘
State Management
What State is Stored Where
┌────────────────────────────────────────┐
│ Git Repository │
├────────────────────────────────────────┤
│ ✓ Infrastructure manifests │
│ ✓ Application configurations │
│ ✓ Sealed secrets │
│ ✓ Kustomize overlays │
│ ✓ Helm values │
│ ✗ Cluster runtime state │
│ ✗ User credentials (plaintext) │
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ Kubernetes Cluster (etcd) │
├────────────────────────────────────────┤
│ ✓ Applied manifests │
│ ✓ Running resource state │
│ ✓ Sealed secrets (encrypted) │
│ ✓ Argo CD applications │
│ ✓ Kargo Promotions/Freight │
│ ✗ Git history │
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ External Storage │
├────────────────────────────────────────┤
│ ✓ Container registries │
│ ✓ Helm repositories │
│ ✓ Git repository │
│ ✓ Certificate authority keys │
│ ✗ Sensitive credentials (plaintext) │
└────────────────────────────────────────┘
Scalability Considerations
Horizontal Scaling
Argo CD:
- Multiple application-controller replicas for sharding
- Multiple server replicas for load distribution
- Shared Redis for session management
Kargo:
- Multiple controller replicas for resource sharding
- Multiple API server replicas behind load balancer
- Webhook servers scale independently
Vertical Scaling
Resource Limits by Component:
Argo CD Controller:
- Requests: 250m CPU, 256Mi Memory
- Limits: 500m CPU, 512Mi Memory
Argo CD Server:
- Requests: 125m CPU, 128Mi Memory
- Limits: 250m CPU, 256Mi Memory
Cert-Manager:
- Requests: 100m CPU, 64Mi Memory
- Limits: 200m CPU, 128Mi Memory
Kargo API:
- Requests: 100m CPU, 128Mi Memory
- Limits: 500m CPU, 512Mi Memory
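Adding these figures up gives the footprint of one replica of each component, which helps when sizing nodes. The sketch below does that arithmetic; the dictionary keys are labels chosen for this example, and the parsers handle only the quantity formats that appear in the list above.

```python
from typing import Dict, Tuple

# (CPU, memory) per component, copied from the list above.
COMPONENTS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "argocd-controller": {"requests": ("250m", "256Mi"), "limits": ("500m", "512Mi")},
    "argocd-server": {"requests": ("125m", "128Mi"), "limits": ("250m", "256Mi")},
    "cert-manager": {"requests": ("100m", "64Mi"), "limits": ("200m", "128Mi")},
    "kargo-api": {"requests": ("100m", "128Mi"), "limits": ("500m", "512Mi")},
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '250m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """Convert a memory quantity in Mi or Gi to MiB; other suffixes are rejected."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def total(kind: str) -> Tuple[int, int]:
    """Sum CPU (millicores) and memory (MiB) for 'requests' or 'limits'."""
    cpu = sum(cpu_millicores(c[kind][0]) for c in COMPONENTS.values())
    mem = sum(memory_mib(c[kind][1]) for c in COMPONENTS.values())
    return cpu, mem


print("requests:", total("requests"))
print("limits:", total("limits"))
```

Multiplying these totals by the replica counts used in a high-availability setup gives a rough lower bound for cluster capacity.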
High Availability Setup
Production Configuration
┌─────────────────────────────────────────────────────────┐
│ Multi-Zone Kubernetes Cluster │
│ (3+ availability zones) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Zone A │ │ Zone B │ │ Zone C │ │
│ │ Master │ │ Master │ │ Master │ │
│ │ Worker │ │ Worker │ │ Worker │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Distributed Storage: │
│ - etcd replicated across zones │
│ - PVC/PV with cross-zone replication │
└─────────────────────────────────────────────────────────┘
Component Redundancy
- Argo CD: 2-3 replicas of each component
- Cert-Manager: 2-3 controller replicas
- Kargo: 2-3 API server replicas, 2-3 controller replicas
- Redis: Redis-HA with 3 sentinels
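For the quorum-based parts of this setup (etcd members, Redis sentinels), the number of replicas determines how many failures can be survived, because a majority must stay reachable. The arithmetic is sketched below; it applies only to majority-based systems, not to stateless replicas such as the Argo CD server, and the function names are hypothetical.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - majority(members)


for n in (2, 3, 4, 5):
    print(n, "members -> majority", majority(n), "-> tolerates", tolerated_failures(n))
```

Two members tolerate no failures and four tolerate no more than three do, which is why odd counts such as 3 sentinels or 3 zones are the usual choice.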
Disaster Recovery
Backup Strategy
Daily Backups:
┌─────────────────────────────────────────────────┐
│ Git Repository Commits │
│ (Automatically backed up by Git hosting) │
├─────────────────────────────────────────────────┤
│ Kubernetes etcd │
│ (velero or native etcd backup) │
├─────────────────────────────────────────────────┤
│ Sealing Keys for Sealed Secrets │
│ (Secured storage, NOT in Git) │
└─────────────────────────────────────────────────┘
Recovery Procedures
- Git Corruption: Use distributed copies, restore from backups
- etcd Corruption: Restore from latest backup
- Sealing Key Loss: Existing Sealed Secrets can no longer be decrypted; restore the key from backup, or re-seal every secret with a new key
- Application State: Redeploy from Git (source of truth)
Monitoring & Observability
Key Metrics to Monitor
Argo CD Metrics:
- Application sync status
- Reconciliation lag
- Git repository fetch rate
- API server response times
Cert-Manager Metrics:
- Certificate renewal status
- Certificate expiry tracking
- Issuer availability
Kargo Metrics:
- Promotion success rate
- Stage health
- Freight warehouse size
- Webhook latency
System Metrics:
- Pod CPU/Memory usage
- Node capacity
- PVC utilization
- Network I/O
Integration Points
- Prometheus: Scrape metrics from /metrics endpoints
- Grafana: Visualize metrics and dashboards
- AlertManager: Send alerts for critical issues
- Logs: Aggregate logs from all components
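What a scrape of a `/metrics` endpoint returns is plain text, one sample per line. The sketch below parses a small hand-written sample in that format and lists applications that are not in sync. The sample assumes Argo CD's `argocd_app_info` gauge with `name`, `sync_status`, and `health_status` labels; the application names are invented, and the parser is a simplification that ignores escaping and timestamps.

```python
import re
from typing import Dict, Iterator, Tuple

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for each sample line."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP and TYPE lines
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


SAMPLE = """
# HELP argocd_app_info Information about application.
# TYPE argocd_app_info gauge
argocd_app_info{name="web",sync_status="Synced",health_status="Healthy"} 1
argocd_app_info{name="api",sync_status="OutOfSync",health_status="Degraded"} 1
"""

out_of_sync = [
    labels["name"]
    for name, labels, _ in parse_metrics(SAMPLE)
    if name == "argocd_app_info" and labels.get("sync_status") != "Synced"
]
print(out_of_sync)
```

In practice the same condition would be written as a Prometheus alerting rule rather than parsed by hand; the sketch only shows what the scraped data looks like.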
Integration with External Systems
Git Integration
Supported Git Providers:
├─ GitHub (via SSH)
├─ GitLab (via SSH)
├─ Gitea (via SSH)
└─ Self-hosted Git
Authentication:
├─ SSH keys (primary)
├─ HTTPS with personal access tokens
└─ SSH agent forwarding
CI/CD Pipeline Integration
Build Pipeline → Container Registry → Webhook → Kargo
↓
Create Freight
↓
Promote to Stages
↓
Update Argo CD Applications
Network Architecture
Kubernetes Network Design
┌────────────────────────────────────────────────────┐
│ Cluster Network │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Namespace │ │ Namespace │ │
│ │ argocd │ │ cert-manager │ │
│ │ │ │ │ │
│ │ Service: │ │ Service: │ │
│ │ 10.0.0.0/24 │ │ 10.0.1.0/24 │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ Pod CIDR: 10.1.0.0/16 │
│ Service CIDR: 10.0.0.0/16 │
│ │
│ DNS: CoreDNS for internal resolution │
│ Ingress: Optional external access │
└────────────────────────────────────────────────────┘
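Pod and Service ranges must not overlap, and a wide prefix can swallow a neighbouring range without this being obvious from the notation. A small check with Python's standard `ipaddress` module makes it visible. The range names below are labels chosen for the example, and the second call deliberately uses a wider /12 prefix to show a clash.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named ranges whose address space overlaps."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


separate = find_overlaps({"pods": "10.1.0.0/16", "services": "10.0.0.0/16"})
clashing = find_overlaps({"pods": "10.1.0.0/16", "services": "10.0.0.0/12"})
print(separate)
print(clashing)
```

A /12 starting at 10.0.0.0 spans 10.0.0.0 through 10.15.255.255 and therefore contains 10.1.0.0/16, whereas two adjacent /16 ranges stay disjoint.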
Last Updated: 2026-04-16
Version: 1.0