fix: resolve 6 coordinator bugs (cache eviction, rebalance race, pod matching, lookup inefficiency)

- Add lastUpdatedMs timestamp to GameCacheDto to track actual game updates instead of heartbeat time. Fix cache eviction incorrectly marking correspondence games as idle.
- Use atomic SPOP in LoadBalancer.getGamesToMove() to prevent concurrent rebalance calls from selecting the same games for migration.
- Add game→instance reverse mapping (nowchess:game:$gameId:instance) to eliminate O(instances) linear scan during cache eviction.
- Fix HealthMonitor pod matching from loose contains() to reliable endsWith() to prevent matching unintended pods with similar names.
- Update FailoverService to maintain game→instance mappings when migrating games during failover.
- Update CacheEvictionManager to use game→instance mapping for O(1) lookup instead of O(n) instance scan.
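
Two of the fixes above are pure logic and can be illustrated in isolation. The sketch below is not the project's code: the object name, function names, pod names, and threshold are hypothetical, and it assumes that pod matching compares a pod name against an instance identifier and that idleness is judged from lastUpdatedMs.

```scala
// Illustrative sketch only; all names here are hypothetical.
object CoordinatorSketch:

  /** Loose match: true whenever the instance id appears anywhere in the pod name. */
  def matchesLoose(podName: String, instanceId: String): Boolean =
    podName.contains(instanceId)

  /** Strict match: true only when the pod name ends with the instance id. */
  def matchesStrict(podName: String, instanceId: String): Boolean =
    podName.endsWith(instanceId)

  /** A game counts as idle when its last real update is older than the threshold. */
  def isIdle(lastUpdatedMs: Long, nowMs: Long, idleThresholdMs: Long): Boolean =
    nowMs - lastUpdatedMs > idleThresholdMs

@main def demo(): Unit =
  val pods = List("nowchess-core-1", "nowchess-core-10") // assumed pod names
  // contains() also picks up "nowchess-core-10" when looking for "core-1".
  println(pods.filter(CoordinatorSketch.matchesLoose(_, "core-1")))
  // endsWith() keeps only the pod whose name really ends in "core-1".
  println(pods.filter(CoordinatorSketch.matchesStrict(_, "core-1")))
```

With a suffix match, "core-1" no longer collides with "core-10". The idle check depends only on the stored update time, so a heartbeat that leaves lastUpdatedMs alone cannot change its result.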

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-17 17:07:29 +02:00
parent 2d76c001fe
commit 5619c8223a
7 changed files with 159 additions and 16 deletions
@@ -22,4 +22,5 @@ case class GameCacheDto(
 pendingDrawOffer: Option[String],
 redoStack: List[String] = Nil,
 pendingTakebackRequest: Option[String] = None,
+lastUpdatedMs: Long = System.currentTimeMillis(),
 )
@@ -143,6 +143,7 @@ class RedisGameRegistry extends GameRegistry:
 clockMoveDeadline = Option(record.clockMoveDeadline).map(_.longValue),
 clockActiveColor = Option(record.clockActiveColor),
 pendingDrawOffer = Option(record.pendingDrawOffer),
+lastUpdatedMs = System.currentTimeMillis(),
 )
 (dto, reconstruct(dto))
 } match