fix: coordinator auto-scaling, cache eviction, rebalancing, and grpc timeouts
Build & Test (NowChessSystems) TeamCity build finished

Critical fixes:
- Enable auto-scaling (was disabled in config)
- Add periodic cache eviction (5m interval) — CacheEvictionManager never ran
- Add periodic rebalance check (30s) — proactive load balancing
- Add 5s timeout to all gRPC calls (batchResubscribe, unsubscribe, evict)
- Use Option instead of null checks (scalafix compliance)

These gaps left the coordinator unable to:
1. Scale up when instances overloaded (scaling was disabled)
2. Clean up idle games from memory (no scheduled eviction)
3. Rebalance load proactively (only on scale-up)
4. Handle hung instances (no RPC timeouts, operations could hang forever)

Combined with prior fixes for instance metadata parsing and heartbeat TTL,
the coordinator now handles overload scenarios correctly.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-05-13 22:20:25 +02:00
parent 3f12f695f1
commit d0c71693bb
7 changed files with 26 additions and 9 deletions
+1
View File
@@ -256,6 +256,7 @@
- `modules/coordinator/src/main/scala/de/nowchess/coordinator/service/AutoScaler.scala`
- class AutoScaler
- function initMetrics
- function periodicScaleCheck
- function checkAndScale
- function scaleUp
- function scaleDown
+1
View File
@@ -200,6 +200,7 @@
- `modules/coordinator/src/main/scala/de/nowchess/coordinator/service/AutoScaler.scala`
- class AutoScaler
- function initMetrics
- function periodicScaleCheck
- function checkAndScale
- function scaleUp
- function scaleDown