Critical fixes:
- Enable auto-scaling (was disabled in config)
- Add periodic cache eviction (5m interval) — CacheEvictionManager never ran
- Add periodic rebalance check (30s) — proactive load balancing
- Add 5s timeout to all gRPC calls (batchResubscribe, unsubscribe, evict)
- Use Option instead of null checks (scalafix compliance)
These gaps left the coordinator unable to:
1. Scale up when instances overloaded (scaling was disabled)
2. Clean up idle games from memory (no scheduled eviction)
3. Rebalance load proactively (only on scale-up)
4. Handle hung instances (no RPC timeouts, operations could hang forever)
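The timeout fix in point 4 can be sketched with Mutiny's `ifNoItem()` operator; the stub and message types below are illustrative stand-ins, not the coordinator's actual gRPC classes:

```scala
import java.time.Duration
import io.smallrye.mutiny.Uni

// Hypothetical Mutiny-based gRPC stub; `stub` and `request` stand in for the
// real generated types. ifNoItem().after(...).fail() completes the Uni with a
// TimeoutException if the instance never replies, so a hung instance can no
// longer block the coordinator forever.
def unsubscribeWithTimeout(stub: GameInstanceStub,
                           request: UnsubscribeRequest): Uni[UnsubscribeReply] =
  stub.unsubscribe(request)
    .ifNoItem().after(Duration.ofSeconds(5))
    .fail()
```

The same wrapper pattern applies to batchResubscribe and evict calls.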
Combined with prior fixes for instance metadata parsing and heartbeat TTL,
the coordinator now handles overload scenarios correctly.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add the quarkus-scheduler dependency and schedule the health check every 10 seconds.
Dead instances (marked with state="DEAD") are now automatically evicted instead of
accumulating indefinitely.
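With the quarkus-scheduler extension on the classpath, the periodic check looks roughly like this; the class shape and registry API are assumptions for illustration, not the actual coordinator code:

```scala
import jakarta.enterprise.context.ApplicationScoped
import io.quarkus.scheduler.Scheduled

@ApplicationScoped
class HealthMonitor(registry: InstanceRegistry) {

  // Fires every 10 seconds once quarkus-scheduler is a dependency.
  @Scheduled(every = "10s")
  def checkInstances(): Unit =
    registry.allInstances().foreach { instance =>
      // Evict instances that marked themselves dead, in addition to
      // the existing heartbeat-age eviction.
      if (instance.state == "DEAD") registry.evict(instance.id)
    }
}
```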
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Problem: Dead instances pile up indefinitely. Failed metadata parsing leaves stale data in the registry. No cleanup mechanism exists.
Changes:
1. Remove instance from registry on parse failure (corrupted metadata = unrecoverable)
2. Evict instances with state="DEAD" on next health check (was only evicting by heartbeat age)
This prevents:
- Memory leak from accumulating dead/corrupted instances
- Stale data persisting after parse failures
- Dead instances blocking resources indefinitely
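A minimal sketch of changes 1 and 2, assuming hypothetical `Instance`, `parseMetadata`, and registry helpers (the real names may differ):

```scala
// Refresh one registry entry from its raw Redis metadata. Both failure modes
// described above end in eviction rather than leaving stale state behind.
def refresh(instanceId: String, rawMetadata: String): Option[Instance] =
  parseMetadata(rawMetadata) match {
    case Some(instance) if instance.state == "DEAD" =>
      registry.evict(instanceId) // change 2: DEAD state evicts on health check
      None
    case Some(instance) =>
      Some(instance)
    case None =>
      // change 1: corrupted metadata is unrecoverable, so drop the entry
      // instead of keeping stale data in the registry.
      registry.evict(instanceId)
      None
  }
```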
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Use Option instead of null checks in HealthMonitor and InstanceRegistry,
per the Scalafix DisableSyntax rule.
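The pattern this rule pushes toward can be sketched as follows; `redis.get` stands in for whichever nullable Java API the registry wraps:

```scala
// Option(x) lifts a possibly-null Java result into Some/None, which is the
// replacement DisableSyntax expects for explicit `!= null` checks.
def findInstance(id: String): Option[InstanceRecord] =
  Option(redis.get(id)).map(parseRecord)

// Call sites pattern-match instead of comparing against null:
findInstance("game-7") match {
  case Some(record) => useRecord(record)
  case None         => handleMissing(id)
}
```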
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
On startup, load all known instances from Redis and wait 15s for them to
reconnect via gRPC. Evict instances that don't reconnect within the timeout
and delete their K8s pods.
Replace one-shot pod status check with real fabric8 Watch. On pod Terminating
event, mark instance dead. On pod Deleted event, trigger failover. Failover
now waits reactively for at least one healthy instance before distributing
orphaned games, up to a 30s timeout.
- Add startupValidationTimeout and failoverWaitTimeout config (15s, 30s)
- CoordinatorGrpcServer tracks active gRPC streams
- InstanceRegistry.loadAllFromRedis() scans and loads instances on startup
- HealthMonitor startup observer validates instances and starts K8s watch
- FailoverService.onInstanceStreamDropped returns Uni[Unit] for reactive wait
- All failover service callers updated to subscribe to Uni result
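The reactive wait in onInstanceStreamDropped can be sketched with Mutiny as below; `awaitHealthyInstance`, `registry.firstHealthy`, and `redistributeOrphanedGames` are illustrative names under the assumptions above, not the actual method signatures:

```scala
import java.time.Duration
import io.smallrye.mutiny.Uni

// Poll the registry until at least one healthy instance appears,
// re-checking every 500ms.
def awaitHealthyInstance(): Uni[Instance] =
  Uni.createFrom().item(() => registry.firstHealthy())
    .onItem().transformToUni {
      case Some(inst) => Uni.createFrom().item(inst)
      case None =>
        Uni.createFrom().item(())
          .onItem().delayIt().by(Duration.ofMillis(500))
          .flatMap(_ => awaitHealthyInstance())
    }

// Returns Uni[Unit] so callers can subscribe and observe completion/failure.
// The outer timeout bounds the whole wait at 30s, matching failoverWaitTimeout.
def onInstanceStreamDropped(deadInstanceId: String): Uni[Unit] =
  awaitHealthyInstance()
    .ifNoItem().after(Duration.ofSeconds(30)).fail()
    .flatMap(healthy => redistributeOrphanedGames(deadInstanceId, healthy))
```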
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
fabric8 disallows client.resources(classOf[GenericKubernetesResource]) — throws
KubernetesClientException at runtime. Switch to genericKubernetesResources(apiVersion, kind)
which is the correct API for CRDs.
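The corrected fabric8 usage looks roughly like this; the apiVersion, kind, and namespace values are illustrative, not the project's actual CRD coordinates:

```scala
import io.fabric8.kubernetes.client.KubernetesClientBuilder

val client = new KubernetesClientBuilder().build()

// genericKubernetesResources(apiVersion, kind) is fabric8's supported entry
// point for CRDs without a typed model class, unlike
// resources(classOf[GenericKubernetesResource]), which throws
// KubernetesClientException at runtime.
val games = client
  .genericKubernetesResources("example.com/v1", "GameInstance")
  .inNamespace("default")
  .list()
```

The same handle also supports `watch(...)`, which is what the pod/CR watch described above builds on.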
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Reviewed-on: #43
Co-authored-by: Janis <janis.e.20@gmx.de>
Co-committed-by: Janis <janis.e.20@gmx.de>