20d8146e8c
- Created new documentation files including TROUBLESHOOTING.md, CONFIGURATION.md, and CONTRIBUTING.md. - Added deployment guide with automated and manual instructions. - Introduced new YAML configurations for Kubernetes resources and Helm charts. - Implemented a deployment script for setting up the GitOps infrastructure.
463 lines
19 KiB
Markdown
463 lines
19 KiB
Markdown
# Architecture Overview
|
|
|
|
This document provides a detailed technical architecture of the GitOps infrastructure.
|
|
|
|
## System Architecture
|
|
|
|
### High-Level Design
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ Git Repository │
|
|
│ (This GitOps Repository) │
|
|
│ - Infrastructure as Code │
|
|
│ - Application Manifests │
|
|
│ - Configuration & Secrets │
|
|
└──────────────────────────────┬──────────────────────────────────┘
|
|
│
|
|
┌────────────┴────────────┐
|
|
▼ ▼
|
|
┌──────────────────┐ ┌──────────────────┐
|
|
│ Kubernetes │ │ Webhook Triggers│
|
|
│ API Server │ │ (GitHub/Gitea) │
|
|
└────────┬─────────┘ └──────────────────┘
|
|
│
|
|
┌────────────┼────────────────┐
|
|
│ │ │
|
|
▼ ▼ ▼
|
|
┌─────────┐ ┌──────────┐ ┌──────────────┐
|
|
│ ArgoCD │ │ Kargo │ │ Cert-Manager │
|
|
│ Server │ │ Controller
|
|
│ └─────────┘ └──────────┘ └──────────────┘
|
|
│
|
|
│ Monitors & Syncs
|
|
│
|
|
▼
|
|
┌──────────────────────────────────────────┐
|
|
│ Kubernetes Cluster Resources │
|
|
│ - Deployments │
|
|
│ - Services │
|
|
│ - ConfigMaps & Secrets │
|
|
│ - Ingresses │
|
|
│ - Custom Resources (Kargo, Rollouts) │
|
|
└──────────────────────────────────────────┘
|
|
```
|
|
|
|
## Component Architecture
|
|
|
|
### 1. Argo CD
|
|
|
|
**Purpose**: GitOps continuous deployment orchestration
|
|
|
|
**Components**:
|
|
- **API Server**: RESTful API for CLI and UI
|
|
- **Repository Server**: Handles Git operations and manifests
|
|
- **Application Controller**: Reconciles desired vs. actual state
|
|
- **Redis**: Caching and session storage
|
|
- **Webhook Receiver**: Listens for Git push events
|
|
|
|
**Data Flow**:
|
|
1. Repository Server pulls latest manifests from Git
|
|
2. Application Controller compares desired vs actual state
|
|
3. Controller creates/updates/deletes Kubernetes resources
|
|
4. API Server provides status and management interface
|
|
|
|
**Default Configuration**:
|
|
- Single replica for development
|
|
- Redis for in-memory caching
|
|
- Kustomize with Helm support
|
|
- Automated pruning and self-healing enabled
|
|
|
|
### 2. Cert-Manager
|
|
|
|
**Purpose**: Automated certificate lifecycle management
|
|
|
|
**Components**:
|
|
- **Webhook**: Validates Certificate and Issuer resources
|
|
- **Controller**: Watches for certificate requests
|
|
- **Issuers**: Define how certificates are obtained (self-signed, Let's Encrypt, etc.)
|
|
- **Cert-Manager**: Core reconciliation logic
|
|
|
|
**Certificate Flow**:
|
|
1. Certificate CRD is created in Kubernetes
|
|
2. Cert-Manager controller watches for changes
|
|
3. Controller contacts issuer to obtain certificate
|
|
4. Certificate and private key stored in Kubernetes Secret
|
|
5. Controller monitors expiry and auto-renews
|
|
|
|
**Issuers in Use**:
|
|
- Self-signed CA: For internal cluster certificates
|
|
- Can be extended with Let's Encrypt (ACME) for public certificates
|
|
|
|
### 3. Kargo
|
|
|
|
**Purpose**: Progressive delivery and multi-stage promotion
|
|
|
|
**Components**:
|
|
- **API Server**: Provides REST API and gRPC endpoints
|
|
- **Controller**: Reconciles Kargo resources (Promotions, Stages, Warehouses)
|
|
- **Webhooks Server**: Internal validating webhooks
|
|
- **External Webhooks Server**: Handles external event triggers
|
|
- **Management Controller**: Manages Projects and Namespaces
|
|
- **Garbage Collector**: Cleans up old Promotions and Freight
|
|
|
|
**Key Resources**:
|
|
- **Warehouse**: Source of deployable artifacts (containers, Helm charts)
|
|
- **Freight**: Represents a deployment candidate with specific versions
|
|
- **Stage**: Deployment target with promotion rules
|
|
- **Promotion**: Represents moving Freight from one Stage to another
|
|
|
|
**Promotion Flow**:
|
|
```
|
|
Warehouse (Source)
|
|
↓
|
|
Freight (Versions)
|
|
↓
|
|
Stage 1 (Dev)
|
|
↓
|
|
Promotion to Stage 2 (Staging) → Approval/Analysis
|
|
↓
|
|
Stage 2 (Staging)
|
|
↓
|
|
Promotion to Stage 3 (Production) → Analysis/Verification
|
|
↓
|
|
Stage 3 (Production)
|
|
```
|
|
|
|
### 4. Argo Rollouts
|
|
|
|
**Purpose**: Progressive deployment strategies (Canary, Blue-Green)
|
|
|
|
**Components**:
|
|
- **Rollouts Controller**: Manages Rollout resources
|
|
- **Analysis Engine**: Evaluates deployment health via metrics
|
|
- **Progressive Deployment**: Gradually shifts traffic to new version
|
|
|
|
**Deployment Strategies**:
|
|
- **Canary**: Gradually shift traffic (e.g., 5% → 50% → 100%)
|
|
- **Blue-Green**: Maintain two active environments, switch traffic
|
|
- **Traffic Shifting**: Use service mesh integration (Istio/SMI)
|
|
|
|
## Data Flow Diagrams
|
|
|
|
### GitOps Sync Flow
|
|
|
|
```
|
|
┌──────────────┐
|
|
│ Git Commit │
|
|
└──────┬───────┘
|
|
│
|
|
├─→ GitHub Webhook
|
|
│
|
|
└─→ Argo CD Webhook Receiver
|
|
│
|
|
├─→ Repository Server: Fetch Latest Manifests
|
|
│
|
|
├─→ Parse & Validate (Kustomize/Helm)
|
|
│
|
|
└─→ Application Controller
|
|
│
|
|
├─→ Compare: Git State vs. Cluster State
|
|
│
|
|
├─→ Generate Diff
|
|
│
|
|
└─→ Apply Changes to Cluster
|
|
│
|
|
└─→ Update Application Status
|
|
```
|
|
|
|
### Kargo Promotion Flow
|
|
|
|
```
|
|
┌─────────────────┐
|
|
│ New Artifact │
|
|
│ Published │
|
|
└────────┬────────┘
|
|
│
|
|
└─→ Webhook Event
|
|
│
|
|
└─→ Kargo API
|
|
│
|
|
├─→ Create Freight
|
|
│
|
|
└─→ Check Stage Promotions
|
|
│
|
|
├─→ Auto-Promotion Enabled?
|
|
│ ├─ Yes → Create Promotion
|
|
│ └─ No → Wait for Manual Approval
|
|
│
|
|
└─→ Kargo Controller Reconciles
|
|
│
|
|
├─→ Update Argo CD Applications
|
|
│
|
|
├─→ Monitor Health
|
|
│
|
|
├─→ Run Analysis (via Argo Rollouts)
|
|
│
|
|
└─→ Approve/Reject Next Promotion
|
|
```
|
|
|
|
## Security Architecture
|
|
|
|
### Multi-Layer Security
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ 1. Git Repository Security │
|
|
│ - SSH key authentication │
|
|
│ - Branch protection rules │
|
|
│ - Code review requirements │
|
|
└─────────────────────────────────────────────────────────┘
|
|
↓
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ 2. Secrets Encryption │
|
|
│ - Sealed Secrets (bitnami-labs) │
|
|
│ - Encrypted at rest in Git │
|
|
│ - Decrypted only in cluster │
|
|
└─────────────────────────────────────────────────────────┘
|
|
↓
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ 3. RBAC (Role-Based Access Control) │
|
|
│ - Argo CD projects limit access │
|
|
│ - Kargo OIDC integration │
|
|
│ - Kubernetes RBAC policies │
|
|
└─────────────────────────────────────────────────────────┘
|
|
↓
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ 4. Network Security │
|
|
│ - Namespace isolation │
|
|
│ - Network policies │
|
|
│ - TLS for all communications │
|
|
└─────────────────────────────────────────────────────────┘
|
|
↓
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ 5. Pod Security │
|
|
│ - Non-root users │
|
|
│ - Read-only filesystems │
|
|
│ - Security contexts │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
## State Management
|
|
|
|
### What State is Stored Where
|
|
|
|
```
|
|
┌────────────────────────────────────────┐
|
|
│ Git Repository │
|
|
├────────────────────────────────────────┤
|
|
│ ✓ Infrastructure manifests │
|
|
│ ✓ Application configurations │
|
|
│ ✓ Sealed secrets │
|
|
│ ✓ Kustomize overlays │
|
|
│ ✓ Helm values │
|
|
│ ✗ Cluster runtime state │
|
|
│ ✗ User credentials (plaintext) │
|
|
└────────────────────────────────────────┘
|
|
|
|
┌────────────────────────────────────────┐
|
|
│ Kubernetes Cluster (etcd) │
|
|
├────────────────────────────────────────┤
|
|
│ ✓ Applied manifests │
|
|
│ ✓ Running resource state │
|
|
│ ✓ Sealed secrets (encrypted) │
|
|
│ ✓ Argo CD applications │
|
|
│ ✓ Kargo Promotions/Freight │
|
|
│ ✗ Git history │
|
|
└────────────────────────────────────────┘
|
|
|
|
┌────────────────────────────────────────┐
|
|
│ External Storage │
|
|
├────────────────────────────────────────┤
|
|
│ ✓ Container registries │
|
|
│ ✓ Helm repositories │
|
|
│ ✓ Git repository │
|
|
│ ✓ Certificate authority keys │
|
|
│ ✗ Sensitive credentials (plaintext) │
|
|
└────────────────────────────────────────┘
|
|
```
|
|
|
|
## Scalability Considerations
|
|
|
|
### Horizontal Scaling
|
|
|
|
**Argo CD**:
|
|
- Multiple application-controller replicas for sharding
|
|
- Multiple server replicas for load distribution
|
|
- Shared Redis for session management
|
|
|
|
**Kargo**:
|
|
- Multiple controller replicas for resource sharding
|
|
- Multiple API server replicas behind load balancer
|
|
- Webhook servers scale independently
|
|
|
|
### Vertical Scaling
|
|
|
|
**Resource Limits by Component**:
|
|
|
|
```
|
|
Argo CD Controller:
|
|
- Requests: 250m CPU, 256Mi Memory
|
|
- Limits: 500m CPU, 512Mi Memory
|
|
|
|
Argo CD Server:
|
|
- Requests: 125m CPU, 128Mi Memory
|
|
- Limits: 250m CPU, 256Mi Memory
|
|
|
|
Cert-Manager:
|
|
- Requests: 100m CPU, 64Mi Memory
|
|
- Limits: 200m CPU, 128Mi Memory
|
|
|
|
Kargo API:
|
|
- Requests: 100m CPU, 128Mi Memory
|
|
- Limits: 500m CPU, 512Mi Memory
|
|
```
|
|
|
|
## High Availability Setup
|
|
|
|
### Production Configuration
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ Multi-Zone Kubernetes Cluster │
|
|
│ (3+ availability zones) │
|
|
│ │
|
|
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
|
│ │ Zone A │ │ Zone B │ │ Zone C │ │
|
|
│ │ Master │ │ Master │ │ Master │ │
|
|
│ │ Worker │ │ Worker │ │ Worker │ │
|
|
│ └──────────┘ └──────────┘ └──────────┘ │
|
|
│ │
|
|
│ Distributed Storage: │
|
|
│ - etcd replicated across zones │
|
|
│ - PVC/PV with cross-zone replication │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Component Redundancy
|
|
|
|
- Argo CD: 2-3 replicas of each component
|
|
- Cert-Manager: 2-3 controller replicas
|
|
- Kargo: 2-3 API server replicas, 2-3 controller replicas
|
|
- Redis: Redis-HA with 3 sentinels
|
|
|
|
## Disaster Recovery
|
|
|
|
### Backup Strategy
|
|
|
|
```
|
|
Daily Backups:
|
|
┌─────────────────────────────────────────────────┐
|
|
│ Git Repository Commits │
|
|
│ (Automatically backed up by Git hosting) │
|
|
├─────────────────────────────────────────────────┤
|
|
│ Kubernetes etcd │
|
|
│ (velero or native etcd backup) │
|
|
├─────────────────────────────────────────────────┤
|
|
│ Sealing Keys for Sealed Secrets │
|
|
│ (Secured storage, NOT in Git) │
|
|
└─────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Recovery Procedures
|
|
|
|
1. **Git Corruption**: Use distributed copies, restore from backups
|
|
2. **etcd Corruption**: Restore from latest backup
|
|
3. **Secrets Key Loss**: Complete cluster recreation needed
|
|
4. **Application State**: Redeploy from Git (source of truth)
|
|
|
|
## Monitoring & Observability
|
|
|
|
### Key Metrics to Monitor
|
|
|
|
```
|
|
Argo CD Metrics:
|
|
- Application sync status
|
|
- Reconciliation lag
|
|
- Git repository fetch rate
|
|
- API server response times
|
|
|
|
Cert-Manager Metrics:
|
|
- Certificate renewal status
|
|
- Certificate expiry tracking
|
|
- Issuer availability
|
|
|
|
Kargo Metrics:
|
|
- Promotion success rate
|
|
- Stage health
|
|
- Freight warehouse size
|
|
- Webhook latency
|
|
|
|
System Metrics:
|
|
- Pod CPU/Memory usage
|
|
- Node capacity
|
|
- PVC utilization
|
|
- Network I/O
|
|
```
|
|
|
|
### Integration Points
|
|
|
|
- **Prometheus**: Scrape metrics from `/metrics` endpoints
|
|
- **Grafana**: Visualize metrics and dashboards
|
|
- **AlertManager**: Send alerts for critical issues
|
|
- **Logs**: Aggregate logs from all components
|
|
|
|
## Integration with External Systems
|
|
|
|
### Git Integration
|
|
|
|
```
|
|
Supported Git Providers:
|
|
├─ GitHub (via SSH)
|
|
├─ GitLab (via SSH)
|
|
├─ Gitea (via SSH)
|
|
└─ Self-hosted Git
|
|
|
|
Authentication:
|
|
├─ SSH keys (primary)
|
|
├─ HTTPS with personal tokens
|
|
└─ SSH agent forwarding
|
|
```
|
|
|
|
### CI/CD Pipeline Integration
|
|
|
|
```
|
|
Build Pipeline → Container Registry → Webhook → Kargo
|
|
↓
|
|
Create Freight
|
|
↓
|
|
Promote to Stages
|
|
↓
|
|
Update Argo CD Applications
|
|
```
|
|
|
|
## Network Architecture
|
|
|
|
### Kubernetes Network Design
|
|
|
|
```
|
|
┌────────────────────────────────────────────────────┐
|
|
│ Cluster Network │
|
|
│ │
|
|
│ ┌──────────────┐ ┌──────────────┐ │
|
|
│ │ Namespace │ │ Namespace │ │
|
|
│ │ argocd │ │ cert-manager │ │
|
|
│ │ │ │ │ │
|
|
│ │ Service: │ │ Service: │ │
|
|
│ │ 10.0.0.0/24 │ │ 10.0.1.0/24 │ │
|
|
│ └──────────────┘ └──────────────┘ │
|
|
│ │
|
|
│ Pod CIDR: 10.1.0.0/16 │
|
|
│ Service CIDR: 10.0.0.0/12 │
|
|
│ │
|
|
│ DNS: CoreDNS for internal resolution │
|
|
│ Ingress: Optional external access │
|
|
└────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
**Last Updated**: 2026-04-16
|
|
**Version**: 1.0
|
|
|