fix: coordinator auto-scaling, cache eviction, rebalancing, and grpc timeouts
Build & Test (NowChessSystems) TeamCity build finished
Critical fixes:
- Enable auto-scaling (was disabled in config)
- Add periodic cache eviction (5m interval); CacheEvictionManager never ran
- Add periodic rebalance check (30s) for proactive load balancing
- Add 5s timeout to all gRPC calls (batchResubscribe, unsubscribe, evict)
- Use Option instead of null checks (scalafix compliance)

These gaps left the coordinator unable to:
1. Scale up when instances were overloaded (scaling was disabled)
2. Clean up idle games from memory (no scheduled eviction)
3. Rebalance load proactively (only on scale-up)
4. Handle hung instances (no RPC timeouts, so operations could hang forever)

Combined with prior fixes for instance metadata parsing and heartbeat TTL,
the coordinator now handles overload scenarios correctly.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
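
The timeout and Option items above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this commit: `RpcTimeoutSketch`, `callWithTimeout`, and `RpcTimeout` are hypothetical names, and the blocking call stands in for a gRPC stub invocation. With grpc-java stubs, the deadline would more likely be set on the stub itself via `withDeadlineAfter(5, TimeUnit.SECONDS)`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object RpcTimeoutSketch {
  // Assumption: mirrors the 5s timeout named in the commit message.
  val RpcTimeout: FiniteDuration = 5.seconds

  /** Runs `call` on another thread and stops waiting after `timeout`.
    * Returns None on timeout, on failure, or when the call yields null,
    * so callers pattern-match on Option instead of checking for null.
    * Note: this only bounds the wait; it does not cancel the underlying call.
    */
  def callWithTimeout[A](timeout: FiniteDuration = RpcTimeout)(call: => A): Option[A] =
    Try(Await.result(Future(call), timeout)).toOption.flatMap(Option(_))
}
```

A caller would wrap each remote operation, for example `RpcTimeoutSketch.callWithTimeout() { client.evict(gameId) }`, where `client.evict` is a placeholder for the real RPC, and treat `None` as "instance did not answer in time".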
@@ -37,7 +37,7 @@ nowchess:
 stream-heartbeat-interval: PT0.2S
 cache-eviction-interval: 10m
 game-idle-threshold: 45m
-auto-scale-enabled: false
+auto-scale-enabled: true
 scale-up-threshold: 0.8
 scale-down-threshold: 0.3
 scale-min-replicas: 2
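
To make the effect of the flipped flag concrete, here is a hedged sketch of how a coordinator might combine these four settings. The diff does not show how the thresholds are consumed, so everything here is an assumption: the names `AutoScaleSketch`, `ScalingConfig`, and `decide` are hypothetical, and the thresholds are read as fractions of average instance load.

```scala
object AutoScaleSketch {
  // Hypothetical mirror of the config keys shown in the diff.
  final case class ScalingConfig(
      enabled: Boolean,           // auto-scale-enabled
      scaleUpThreshold: Double,   // scale-up-threshold
      scaleDownThreshold: Double, // scale-down-threshold
      minReplicas: Int            // scale-min-replicas
  )

  sealed trait Decision
  case object ScaleUp extends Decision
  case object ScaleDown extends Decision
  case object Hold extends Decision

  /** Decides on a scaling action from the average load (0.0 to 1.0)
    * and the current replica count. With `enabled = false` the result
    * is always Hold, which matches the gap described in the commit message.
    */
  def decide(cfg: ScalingConfig, avgLoad: Double, replicas: Int): Decision =
    if (!cfg.enabled) Hold
    else if (avgLoad >= cfg.scaleUpThreshold) ScaleUp
    else if (avgLoad <= cfg.scaleDownThreshold && replicas > cfg.minReplicas) ScaleDown
    else Hold
}
```

Under these assumptions, a load of 0.9 with the old setting `auto-scale-enabled: false` yields `Hold`, while the new setting yields `ScaleUp`; a load of 0.2 never drops the replica count below `scale-min-replicas`.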