Compare commits

4 Commits

Author SHA1 Message Date
TeamCity 0060229ee9 ci: bump version with Build-81 2026-05-13 12:59:28 +00:00
Janis d5c8da20f8 fix: update grpcServer variable to use Instance wrapper and add optional access method 2026-05-13 14:42:12 +02:00
Build & Test (NowChessSystems): TeamCity build finished
Janis ad9495afa3 fix: clean up code formatting and improve error handling in gRPC server and failover service 2026-05-13 13:16:22 +02:00
Build & Test (NowChessSystems): TeamCity build failed
Janis 2b04d7fa71 fix: replace null checks with Option in coordinator 2026-05-13 12:44:34 +02:00
Build & Test (NowChessSystems): TeamCity build failed
Use Option instead of null checks in HealthMonitor and InstanceRegistry per Scalafix DisableSyntax rule.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
8 changed files with 69 additions and 963 deletions
-313
@@ -1,313 +0,0 @@
# NowChess Tournament API
Swiss-system bot tournaments. Bots are paired by score each round; all bots play every round (no eliminations). Game moves flow through the existing board and bot endpoints — the tournament module only orchestrates pairings, standings, and lifecycle.
---
## Base path
```
/api/tournament
```
Routing: `/api/tournament` → `nowchess-tournament-active:8086`
---
## Authentication
All endpoints require a valid JWT (`Authorization: Bearer <token>`).
Bot-facing streaming endpoints additionally require the token's subject to match the registered `botId`.
---
## Data models
### Tournament
```json
{
  "id": "t7kXq2",
  "name": "Friday Night Bots",
  "status": "created | started | finished",
  "rounds": 5,
  "currentRound": 2,
  "timeControl": {
    "limitSeconds": 300,
    "incrementSeconds": 3
  },
  "createdBy": "userId",
  "createdAt": "2026-05-13T18:00:00Z",
  "startedAt": "2026-05-13T18:05:00Z",
  "finishedAt": null
}
```
### Standing
```json
{
  "rank": 1,
  "botId": "bot_abc",
  "botName": "StockfishClone",
  "points": 3.5,
  "wins": 3,
  "draws": 1,
  "losses": 0,
  "buchholz": 9.0
}
```
Tiebreaker: Buchholz score (sum of opponents' points).
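As a rough illustration of the tiebreaker, the sketch below derives points and Buchholz scores from a list of finished games. `GameResult` and `Buchholz` are hypothetical names that are not part of the tournament module, and the scoring (win = 1, draw = 0.5, loss = 0) is an assumption inferred from the `Standing` example.

```scala
// Hypothetical model of one finished game; whitePoints is 1.0, 0.5, or 0.0.
final case class GameResult(white: String, black: String, whitePoints: Double)

object Buchholz:
  /** Total points per bot, assuming win = 1, draw = 0.5, loss = 0. */
  def points(games: Seq[GameResult]): Map[String, Double] =
    games
      .flatMap(g => Seq(g.white -> g.whitePoints, g.black -> (1.0 - g.whitePoints)))
      .groupMapReduce(_._1)(_._2)(_ + _)

  /** Buchholz score: sum of the points of every opponent a bot has faced. */
  def buchholz(games: Seq[GameResult]): Map[String, Double] =
    val pts = points(games)
    games
      .flatMap(g => Seq(g.white -> g.black, g.black -> g.white))
      .groupMapReduce(_._1)(pair => pts.getOrElse(pair._2, 0.0))(_ + _)
```

Sorting standings by `points` first and by this value second would reproduce the ordering described above, under the stated assumptions.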
### Pairing
```json
{
  "round": 2,
  "whiteBot": "bot_abc",
  "blackBot": "bot_xyz",
  "gameId": "j0nPtcjl",
  "result": "white | black | draw | ongoing"
}
```
### TournamentEvent (SSE)
```json
{ "type": "tournamentStarted", "tournamentId": "t7kXq2" }
{ "type": "roundStarted", "tournamentId": "t7kXq2", "round": 2 }
{ "type": "pairingReady", "tournamentId": "t7kXq2", "round": 2, "gameId": "j0nPtcjl", "color": "white" }
{ "type": "roundFinished", "tournamentId": "t7kXq2", "round": 2 }
{ "type": "tournamentFinished", "tournamentId": "t7kXq2" }
```
---
## Endpoints
### Tournament lifecycle
#### Create tournament
```
POST /api/tournament
```
Body:
```json
{
  "name": "Friday Night Bots",
  "rounds": 5,
  "timeControl": {
    "limitSeconds": 300,
    "incrementSeconds": 3
  }
}
```
Response `201 Created`:
```json
{ "id": "t7kXq2" }
```
The creator becomes the tournament director. Only the director can start and delete the tournament.
---
#### Get tournament
```
GET /api/tournament/{tournamentId}
```
Response `200 OK`: `Tournament` object.
---
#### List tournaments
```
GET /api/tournament
```
Query params:
| Param | Type | Default |
|----------|---------------------------------|-----------|
| `status` | `created\|started\|finished` | (all) |
| `limit` | integer (max 50) | 20 |
| `offset` | integer | 0 |
Response `200 OK`:
```json
{
  "tournaments": [ /* Tournament[] */ ],
  "total": 42
}
```
---
#### Start tournament
```
POST /api/tournament/{tournamentId}/start
```
Requires at least 2 registered bots. Computes round 1 pairings (random for round 1; score-based from round 2). Creates one game per pairing via `POST /api/board/game`.
Response `200 OK`: updated `Tournament` object.
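The pairing step can be sketched roughly as follows. This is not the module's actual algorithm: `SwissPairing` and its methods are hypothetical, the id-based tie-break is an assumption, and a real Swiss pairing would also avoid rematches and balance colours. How an odd number of bots is handled is not specified here, so the sketch simply reports the unpaired bot.

```scala
import scala.util.Random

object SwissPairing:
  type Pair = (String, String)

  // Pairs neighbours in an already ordered list; with an odd count the last bot stays unpaired.
  private def pairAdjacent(ordered: Seq[String]): (Seq[Pair], Option[String]) =
    val pairs = ordered.grouped(2).collect { case Seq(a, b) => (a, b) }.toSeq
    val unpaired = if ordered.size % 2 == 1 then ordered.lastOption else None
    (pairs, unpaired)

  // Round 1: pair bots in random order.
  def firstRound(botIds: Seq[String], rng: Random): (Seq[Pair], Option[String]) =
    pairAdjacent(rng.shuffle(botIds))

  // Round 2 onwards: highest score first, bot id as a deterministic tie-break (an assumption).
  def laterRound(points: Map[String, Double]): (Seq[Pair], Option[String]) =
    val ordered = points.toSeq
      .sortWith { case ((idA, pA), (idB, pB)) => pA > pB || (pA == pB && idA < idB) }
      .map(_._1)
    pairAdjacent(ordered)
```

Each resulting pair would then correspond to one game created through `POST /api/board/game`.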
---
#### Delete tournament
```
DELETE /api/tournament/{tournamentId}
```
Only allowed while `status == "created"`. Response `204 No Content`.
---
### Bot registration
#### Register bot
```
POST /api/tournament/{tournamentId}/bots
```
Registers a bot for the tournament. Must be called before the tournament starts.
The token subject must match the bot being registered.
Body:
```json
{ "botId": "bot_abc" }
```
Response `200 OK`:
```json
{ "botId": "bot_abc", "tournamentId": "t7kXq2" }
```
---
#### Unregister bot
```
DELETE /api/tournament/{tournamentId}/bots/{botId}
```
Only allowed while `status == "created"`. Response `204 No Content`.
---
#### List registered bots
```
GET /api/tournament/{tournamentId}/bots
```
Response `200 OK`:
```json
{
  "bots": [
    { "botId": "bot_abc", "botName": "StockfishClone" }
  ]
}
```
---
### Standings and pairings
#### Get standings
```
GET /api/tournament/{tournamentId}/standings
```
Response `200 OK`:
```json
{ "standings": [ /* Standing[] */ ] }
```
---
#### Get pairings for a round
```
GET /api/tournament/{tournamentId}/rounds/{round}/pairings
```
Response `200 OK`:
```json
{ "pairings": [ /* Pairing[] */ ] }
```
---
### Bot streaming
#### Stream tournament events
```
GET /api/tournament/{tournamentId}/stream
```
Headers: `Accept: text/event-stream`
Server-Sent Events stream scoped to this tournament. The bot receives `pairingReady` events when it is assigned a game, at which point it should connect to the existing bot game stream:
```
GET /bot/stream/game/{gameId} (existing endpoint)
POST /bot/game/{gameId}/move/{uci} (existing endpoint)
```
The tournament module never sends moves — bots do that themselves through the existing bot endpoints.
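On the bot side, handling the stream could look roughly like the sketch below. It assumes each event arrives as standard SSE `data:` lines terminated by a blank line and carries one of the JSON objects listed under `TournamentEvent (SSE)`. `TournamentStream` and its methods are hypothetical, and the regex-based field extraction is a deliberate simplification that a real bot would replace with a JSON parser.

```scala
object TournamentStream:
  // Matches "key": "value" pairs with string values only (simplified, no escape handling).
  private val StringField = "\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"".r

  /** Minimal SSE framing: gather `data:` lines and emit one payload per blank line. */
  def payloads(lines: Iterator[String]): List[String] =
    val (events, _) = lines.foldLeft((List.empty[String], List.empty[String])) {
      case ((out, buf), "") =>
        if buf.isEmpty then (out, buf) else (buf.reverse.mkString("\n") :: out, Nil)
      case ((out, buf), line) if line.startsWith("data:") =>
        (out, line.stripPrefix("data:").stripPrefix(" ") :: buf)
      case (acc, _) => acc // comments and other SSE fields are ignored in this sketch
    }
    events.reverse

  /** Extracts the string-valued fields of a flat JSON object. */
  def stringFields(json: String): Map[String, String] =
    StringField.findAllMatchIn(json).map(m => m.group(1) -> m.group(2)).toMap

  /** Returns (gameId, color) when the event is a `pairingReady` event. */
  def gameToJoin(json: String): Option[(String, String)] =
    val fields = stringFields(json)
    if fields.get("type").contains("pairingReady") then
      for
        gameId <- fields.get("gameId")
        color <- fields.get("color")
      yield (gameId, color)
    else None
```

A bot would feed the lines of the HTTP response into `payloads`, call `gameToJoin` on each payload, and open `GET /bot/stream/game/{gameId}` for every match.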
---
## Typical bot flow
```
1. POST /api/tournament # director creates tournament
2. POST /api/tournament/{id}/bots # each bot registers
3. POST /api/tournament/{id}/start # director starts
4. GET /api/tournament/{id}/stream (SSE) # each bot opens stream
-- per round --
5. receive: pairingReady { gameId, color }
6. GET /bot/stream/game/{gameId} # existing endpoint
7. POST /bot/game/{gameId}/move/{uci} # existing endpoint, repeated
-- game ends --
8. receive: roundFinished
9. GET /api/tournament/{id}/standings # optional, inspect scores
-- repeat 5-9 for each round --
10. receive: tournamentFinished
11. GET /api/tournament/{id}/standings # final ranking
```
---
## Error responses
| Status | Meaning |
|--------|------------------------------------------------------|
| 400 | Invalid request body or parameters |
| 401 | Missing or invalid JWT |
| 403 | Action not allowed (wrong director, wrong bot, etc.) |
| 404 | Tournament or bot not found |
| 409 | Tournament already started / bot already registered |
-623
@@ -1,623 +0,0 @@
openapi: 3.0.3
info:
  title: NowChess Tournament API
  description: |
    Swiss-system bot tournaments, modelled after the Lichess API style.
    Game moves flow through the existing board and bot endpoints — this module
    handles pairings, standings, and lifecycle only.

    ## Streaming
    Endpoints marked **NDJSON** return newline-delimited JSON objects
    (`application/x-ndjson`). Each line is one complete JSON object. The
    connection stays open until the tournament or round ends.

    ## Bot flow
    ```
    POST /api/tournament # create
    POST /api/tournament/{id}/join # each bot joins
    POST /api/tournament/{id}/start # director starts
    GET /api/tournament/{id}/stream (NDJSON) # open before start
    -- per round --
    receive gameStart { gameId, color }
    GET /bot/stream/game/{gameId} (existing, NDJSON)
    POST /bot/game/{gameId}/move/{uci} (existing)
    -- repeat --
    GET /api/tournament/{id}/results (NDJSON) # final standings
    ```
  version: 1.0.0
servers:
  - url: https://st.nowchess.janis-eccarius.de
    description: Staging
  - url: https://nowchess.janis-eccarius.de
    description: Production
  - url: http://localhost:8086
    description: Local
security:
  - bearerAuth: []
tags:
  - name: Tournament
    description: Tournament lifecycle
  - name: Participation
    description: Join and withdraw
  - name: Results
    description: Standings, pairings, and game export
  - name: Stream
    description: NDJSON event streams
paths:
  /api/tournament:
    get:
      tags: [Tournament]
      summary: Get current tournaments
      description: Returns tournaments grouped by status. No auth required.
      security: []
      responses:
        "200":
          description: Tournaments by status
          content:
            application/json:
              schema:
                type: object
                properties:
                  created:
                    type: array
                    items:
                      $ref: "#/components/schemas/TournamentInfo"
                  started:
                    type: array
                    items:
                      $ref: "#/components/schemas/TournamentInfo"
                  finished:
                    type: array
                    items:
                      $ref: "#/components/schemas/TournamentInfo"
    post:
      tags: [Tournament]
      summary: Create a new tournament
      description: The authenticated user becomes the tournament director.
      requestBody:
        required: true
        content:
          application/x-www-form-urlencoded:
            schema:
              $ref: "#/components/schemas/CreateTournamentForm"
      responses:
        "201":
          description: Tournament created
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Tournament"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
  /api/tournament/{id}:
    parameters:
      - $ref: "#/components/parameters/id"
    get:
      tags: [Tournament]
      summary: Get a tournament
      description: Includes the first page of standings in the `standing` field.
      security: []
      responses:
        "200":
          description: Tournament with embedded standings
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Tournament"
        "404":
          $ref: "#/components/responses/NotFound"
    delete:
      tags: [Tournament]
      summary: Terminate a tournament
      description: Only the director may terminate. Only allowed while status is `created`.
      responses:
        "204":
          description: Terminated
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "409":
          $ref: "#/components/responses/Conflict"
  /api/tournament/{id}/start:
    parameters:
      - $ref: "#/components/parameters/id"
    post:
      tags: [Tournament]
      summary: Start the tournament
      description: |
        Only the director may start. Requires at least 2 joined bots.
        Computes round 1 pairings and creates games via `POST /api/board/game`.
      responses:
        "200":
          description: Tournament started
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Tournament"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "409":
          $ref: "#/components/responses/Conflict"
  /api/tournament/{id}/join:
    parameters:
      - $ref: "#/components/parameters/id"
    post:
      tags: [Participation]
      summary: Join a tournament
      description: |
        Register the authenticated bot for the tournament. Only allowed while
        status is `created`. The token subject must be a bot account.
      responses:
        "200":
          description: Ok
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Ok"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "409":
          $ref: "#/components/responses/Conflict"
  /api/tournament/{id}/withdraw:
    parameters:
      - $ref: "#/components/parameters/id"
    post:
      tags: [Participation]
      summary: Withdraw from a tournament
      description: Only allowed while status is `created`.
      responses:
        "200":
          description: Ok
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Ok"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "409":
          $ref: "#/components/responses/Conflict"
  /api/tournament/{id}/results:
    parameters:
      - $ref: "#/components/parameters/id"
    get:
      tags: [Results]
      summary: Get results as NDJSON stream
      description: |
        Streams one `Result` object per line, sorted by rank ascending.
        Available at any point during or after the tournament.
      security: []
      parameters:
        - name: nb
          in: query
          description: Max number of results to stream (default all)
          schema:
            type: integer
            minimum: 1
      responses:
        "200":
          description: NDJSON stream of results
          content:
            application/x-ndjson:
              schema:
                $ref: "#/components/schemas/Result"
        "404":
          $ref: "#/components/responses/NotFound"
  /api/tournament/{id}/round/{round}:
    parameters:
      - $ref: "#/components/parameters/id"
      - name: round
        in: path
        required: true
        schema:
          type: integer
          minimum: 1
    get:
      tags: [Results]
      summary: Get pairings for a round
      security: []
      responses:
        "200":
          description: Pairings for the specified round
          content:
            application/json:
              schema:
                type: object
                properties:
                  round:
                    type: integer
                    example: 2
                  pairings:
                    type: array
                    items:
                      $ref: "#/components/schemas/Pairing"
        "404":
          $ref: "#/components/responses/NotFound"
  /api/tournament/{id}/export/games:
    parameters:
      - $ref: "#/components/parameters/id"
    get:
      tags: [Results]
      summary: Export all games
      description: |
        Returns all games of the tournament. Accepts both PGN and NDJSON via
        the `Accept` header.
      security: []
      parameters:
        - name: Accept
          in: header
          schema:
            type: string
            enum:
              - application/x-chess-pgn
              - application/x-ndjson
            default: application/x-chess-pgn
      responses:
        "200":
          description: Games in the requested format
          content:
            application/x-chess-pgn:
              schema:
                type: string
                description: Standard PGN, one game per block
            application/x-ndjson:
              schema:
                $ref: "#/components/schemas/GameExport"
        "404":
          $ref: "#/components/responses/NotFound"
  /api/tournament/{id}/stream:
    parameters:
      - $ref: "#/components/parameters/id"
    get:
      tags: [Stream]
      summary: Stream tournament events
      description: |
        NDJSON stream scoped to one tournament. Keep this connection open for
        the full tournament lifetime.

        On `gameStart` the bot connects to the existing bot endpoints:
        - `GET /bot/stream/game/{gameId}` — stream game state (existing)
        - `POST /bot/game/{gameId}/move/{uci}` — submit moves (existing)
      responses:
        "200":
          description: NDJSON event stream
          content:
            application/x-ndjson:
              schema:
                $ref: "#/components/schemas/TournamentEvent"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "404":
          $ref: "#/components/responses/NotFound"
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
  parameters:
    id:
      name: id
      in: path
      required: true
      schema:
        type: string
        example: t7kXq2
  schemas:
    Clock:
      type: object
      required: [limit, increment]
      properties:
        limit:
          type: integer
          description: Base time in seconds
          example: 300
        increment:
          type: integer
          description: Increment per move in seconds
          example: 3
    Variant:
      type: object
      properties:
        key:
          type: string
          example: standard
        name:
          type: string
          example: Standard
    BotRef:
      type: object
      properties:
        id:
          type: string
          example: bot_abc
        name:
          type: string
          example: StockfishClone
    Standing:
      type: object
      properties:
        page:
          type: integer
          example: 1
        players:
          type: array
          items:
            $ref: "#/components/schemas/Result"
    TournamentInfo:
      description: Lightweight tournament summary used in list responses.
      type: object
      properties:
        id:
          type: string
          example: t7kXq2
        fullName:
          type: string
          example: Friday Night Bots Swiss
        clock:
          $ref: "#/components/schemas/Clock"
        variant:
          $ref: "#/components/schemas/Variant"
        rated:
          type: boolean
          example: true
        nbPlayers:
          type: integer
          example: 8
        nbRounds:
          type: integer
          example: 5
        createdBy:
          type: string
          example: userId
        startsAt:
          type: string
          format: date-time
    Tournament:
      allOf:
        - $ref: "#/components/schemas/TournamentInfo"
        - type: object
          properties:
            status:
              type: string
              enum: [created, started, finished]
              example: started
            round:
              type: integer
              description: Current round number
              example: 2
            standing:
              $ref: "#/components/schemas/Standing"
            winner:
              description: Present only when status is `finished`
              allOf:
                - $ref: "#/components/schemas/BotRef"
              nullable: true
    CreateTournamentForm:
      type: object
      required: [name, nbRounds, clockLimit, clockIncrement]
      properties:
        name:
          type: string
          example: Friday Night Bots
        nbRounds:
          type: integer
          minimum: 1
          example: 5
        clockLimit:
          type: integer
          description: Base time in seconds
          example: 300
        clockIncrement:
          type: integer
          description: Increment per move in seconds
          example: 3
        rated:
          type: boolean
          default: true
    Result:
      type: object
      properties:
        rank:
          type: integer
          example: 1
        points:
          type: number
          format: double
          example: 3.5
        tieBreak:
          type: number
          format: double
          description: Buchholz score (sum of opponents' points)
          example: 9.0
        bot:
          $ref: "#/components/schemas/BotRef"
        nbGames:
          type: integer
          example: 4
        wins:
          type: integer
          example: 3
        draws:
          type: integer
          example: 1
        losses:
          type: integer
          example: 0
    Pairing:
      type: object
      properties:
        round:
          type: integer
          example: 2
        white:
          $ref: "#/components/schemas/BotRef"
        black:
          $ref: "#/components/schemas/BotRef"
        gameId:
          type: string
          example: j0nPtcjl
        winner:
          type: string
          enum: [white, black, draw]
          nullable: true
          description: Null while the game is ongoing
    GameExport:
      description: One game object per NDJSON line.
      type: object
      properties:
        id:
          type: string
          example: j0nPtcjl
        round:
          type: integer
          example: 2
        white:
          $ref: "#/components/schemas/BotRef"
        black:
          $ref: "#/components/schemas/BotRef"
        winner:
          type: string
          enum: [white, black, draw]
          nullable: true
        moves:
          type: string
          description: Space-separated UCI moves
          example: e2e4 e7e5 g1f3
    TournamentEvent:
      description: |
        One JSON object per NDJSON line. Discriminate on `type`.

        | type | extra fields |
        |------|-------------|
        | `tournamentStarted` | — |
        | `roundStarted` | `round` |
        | `gameStart` | `round`, `gameId`, `color` |
        | `roundFinished` | `round` |
        | `tournamentFinished` | `winner` |
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum:
            - tournamentStarted
            - roundStarted
            - gameStart
            - roundFinished
            - tournamentFinished
        round:
          type: integer
          example: 2
        gameId:
          type: string
          example: j0nPtcjl
        color:
          type: string
          enum: [white, black]
        winner:
          $ref: "#/components/schemas/BotRef"
    Ok:
      type: object
      properties:
        ok:
          type: boolean
          example: true
    Error:
      type: object
      properties:
        error:
          type: string
          example: tournament already started
  responses:
    BadRequest:
      description: Invalid request body or parameters
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
    Unauthorized:
      description: Missing or invalid JWT
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
    Forbidden:
      description: Action not permitted for this user or bot
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
    NotFound:
      description: Tournament not found
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
    Conflict:
      description: Conflicting state (e.g. already started, bot already joined)
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
+32
@@ -334,3 +334,35 @@
* **middleware:** update paths for bot generation and stockfish configuration ([2dd0501](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/2dd0501687db08dcd242359f6837125baf8a2fdc))
* **redis:** update Redis configuration with max pool size and waiting parameters ([5baf6a7](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/5baf6a7cdbea484fc49c02e2b5a1c3919b7fa2c4))
* update HealthMonitor to evict instances without associated pods ([0f41f13](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/0f41f13ce68b76846684bab67241a122250dfaf9))
## (2026-05-13)
### Features
* add coordinator startup validation and K8s pod watch ([81b045d](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/81b045d01bb054a4bc9dc9e02fc30f814e756205))
* add initialization metrics for various services ([d438e97](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d438e97f32bdde0bfc63c1b4a8cc810cdd093166))
* add OpenTelemetry trace configuration with parentbased sampler ([3904d5a](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/3904d5ad8ad4930ddee65287a7bfab785a6148f5))
* **config:** update application.yml for PostgreSQL and remove staging/production configurations ([2404e61](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/2404e6164c3b50ffccbea5238d636060d6abe4d6))
* **config:** update application.yml for staging and production environments ([6113432](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/6113432a14c476a3a0dfc0d449e17d023697f2ba))
* configure logging and add OpenTelemetry support ([#49](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/49)) ([d57c488](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d57c4886612d1d92da0e1b79209fc83e6ef537a1))
* **docker:** add .dockerignore and .gitignore files for build exclusions ([c987d8e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/c987d8e258c0e6c4cfbdaa8381c64c410d7a2b83))
* **docker:** add Dockerfiles for building Quarkus application in native and JVM modes ([3f2d2bb](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/3f2d2bb4c97fa8cddba66e1da4427c54236dfeed))
* **docker:** add Dockerfiles for Quarkus application in JVM and native modes ([34b9933](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/34b993304670cf2aa62cd2f6460cee7b9864b08e))
* **logging:** add DEBUG/INFO/WARN logging across services (NCS-72) ([#41](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/41)) ([804a4bf](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/804a4bf179e3dfb19e2be4390e7e543caf5237c6))
* NCS-78 Add Traceability to the Applications ([#46](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/46)) ([649566e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/649566eb3fcf38f91c8896a739f74ea318af312d))
* NCS-78 Add Traceability to the Applications ([#47](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/47)) ([87dfc6c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/87dfc6c2bcce7f7d58fc641bd8d468a2e584c108))
* true-microservices ([#40](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/40)) ([5909242](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/590924254e8a2754de661a57a03e43f89ceb6299))
### Bug Fixes
* add instance-dead-timeout configuration and update HealthMonitor to use it for stale instance eviction ([be0b710](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/be0b710543b542da5c301efef7d2d587d0ba758a))
* clean up code formatting and improve error handling in gRPC server and failover service ([ad9495a](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/ad9495afa3e93593b57154a187346c9b01393911))
* **coordinator:** refine type casting in rolloutSpec method ([#45](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/45)) ([d522f7f](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d522f7f6edf9c985f03dd16816439d4184f1a589))
* **coordinator:** use genericKubernetesResources API for Argo Rollout scaling ([#43](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/43)) ([fa3c6b2](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/fa3c6b2886dc59c14c5dad834acc9b41e42023bb))
* **coordinator:** use genericKubernetesResources API for Argo Rollout scaling ([#44](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/44)) ([82d0b75](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/82d0b754be1075084944b466858672d944f9f7d8))
* **dependencies:** replace Fabric8 Kubernetes client with Quarkus Kubernetes client ([5f44570](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/5f44570b357277d09f33b7296860c421e2e70ce0))
* enhance AutoScaler and InstanceRegistry for replica management and stale instance eviction ([b4920d3](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/b4920d3817e58bda94d7764e608b856ce9a909f7))
* **middleware:** update paths for bot generation and stockfish configuration ([2dd0501](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/2dd0501687db08dcd242359f6837125baf8a2fdc))
* **redis:** update Redis configuration with max pool size and waiting parameters ([5baf6a7](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/5baf6a7cdbea484fc49c02e2b5a1c3919b7fa2c4))
* replace null checks with Option in coordinator ([2b04d7f](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/2b04d7fa713e06662bff5afe3fb3f9d04541ce51))
* update grpcServer variable to use Instance wrapper and add optional access method ([d5c8da2](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d5c8da20f8805199e920ea5afbd9cdb39a078e40))
* update HealthMonitor to evict instances without associated pods ([0f41f13](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/0f41f13ce68b76846684bab67241a122250dfaf9))
@@ -127,7 +127,8 @@ class CoordinatorGrpcServer extends CoordinatorServiceGrpc.CoordinatorServiceImp
       _ =>
         val response = DrainInstanceResponse.newBuilder().setGamesMigrated(gamesBefore).build()
         responseObserver.onNext(response)
-        responseObserver.onCompleted(),
+        responseObserver.onCompleted()
+      ,
       ex =>
         log.warnf(ex, "Drain failed for %s", instanceId)
         responseObserver.onError(ex),
@@ -122,14 +122,17 @@ class FailoverService:
     log.infof("Cleaned up games set for instance %s", instanceId)
   private def waitForHealthyInstanceAsync(): Uni[InstanceMetadata] =
-    Uni.createFrom().deferred(() =>
-      instanceRegistry.getAllInstances
-        .filter(_.state == "HEALTHY")
-        .sortBy(_.subscriptionCount)
-        .headOption match
+    Uni
+      .createFrom()
+      .deferred(() =>
+        instanceRegistry.getAllInstances
+          .filter(_.state == "HEALTHY")
+          .sortBy(_.subscriptionCount)
+          .headOption match
           case Some(inst) => Uni.createFrom().item(inst)
-          case None => Uni.createFrom().failure(new RuntimeException("no healthy instance"))
-    ).onFailure()
+          case None => Uni.createFrom().failure(new RuntimeException("no healthy instance")),
+      )
+      .onFailure()
       .retry()
       .withBackOff(Duration.ofMillis(500))
       .expireIn(config.failoverWaitTimeout.toMillis)
@@ -39,7 +39,7 @@ class HealthMonitor:
   private var meterRegistry: MeterRegistry = uninitialized
   @Inject
-  private var grpcServer: CoordinatorGrpcServer = uninitialized
+  private var grpcServerInstance: Instance[CoordinatorGrpcServer] = uninitialized
   @Inject
   private var failoverService: FailoverService = uninitialized
@@ -52,6 +52,10 @@ class HealthMonitor:
     if kubeClientInstance.isUnsatisfied then None
     else Some(kubeClientInstance.get())
+  private def grpcServerOpt: Option[CoordinatorGrpcServer] =
+    if grpcServerInstance.isUnsatisfied then None
+    else Some(grpcServerInstance.get())
   def setRedisPrefix(prefix: String): Unit =
     redisPrefix = prefix
@@ -133,19 +137,18 @@ class HealthMonitor:
             action match
               case Watcher.Action.DELETED =>
                 handlePodGone(pod)
-              case Watcher.Action.MODIFIED
-                  if Option(pod.getMetadata.getDeletionTimestamp).isDefined =>
+              case Watcher.Action.MODIFIED if Option(pod.getMetadata.getDeletionTimestamp).isDefined =>
                 handlePodTerminating(pod)
               case _ => ()
           override def onClose(cause: WatcherException): Unit =
-            if cause != null then
-              log.warnf(cause, "Pod watch closed, restarting")
+            Option(cause).foreach { ex =>
+              log.warnf(ex, "Pod watch closed, restarting")
               startPodWatch()
+            },
         )
       log.info("Pod watch started")
-    catch
-      case ex: Exception => log.warnf(ex, "Failed to start pod watch")
+    catch case ex: Exception => log.warnf(ex, "Failed to start pod watch")
   private def isPodReady(pod: Pod): Boolean =
     Option(pod.getStatus)
@@ -180,15 +183,17 @@ class HealthMonitor:
   private def validateStartupInstances(timeoutMs: Long): Unit =
     Thread.sleep(timeoutMs)
-    instanceRegistry.getAllInstances.foreach { inst =>
-      if !grpcServer.hasActiveStream(inst.instanceId) then
-        log.warnf(
-          "Startup: instance %s did not reconnect within %dms — evicting",
-          inst.instanceId,
-          timeoutMs,
-        )
-        instanceRegistry.removeInstance(inst.instanceId)
-        deleteK8sPod(inst.instanceId)
+    grpcServerOpt.foreach { grpcServer =>
+      instanceRegistry.getAllInstances.foreach { inst =>
+        if !grpcServer.hasActiveStream(inst.instanceId) then
+          log.warnf(
+            "Startup: instance %s did not reconnect within %dms — evicting",
+            inst.instanceId,
+            timeoutMs,
+          )
+          instanceRegistry.removeInstance(inst.instanceId)
+          deleteK8sPod(inst.instanceId)
+      }
     }
   private def handlePodTerminating(pod: Pod): Unit =
@@ -51,14 +51,15 @@ class InstanceRegistry:
     keys.asScala.foreach { key =>
       val instanceId = key.stripPrefix(s"$redisPrefix:instances:")
       val json = syncRedis.value(classOf[String]).get(key)
-      if json != null then
+      Option(json).foreach { jsonStr =>
         try
-          val metadata = mapper.readValue(json, classOf[InstanceMetadata])
+          val metadata = mapper.readValue(jsonStr, classOf[InstanceMetadata])
           instances.put(instanceId, metadata)
           log.infof("Startup: loaded instance %s from Redis", instanceId)
         catch
           case ex: Exception =>
             log.warnf(ex, "Startup: failed to parse instance %s", instanceId)
+      }
     }
   def getInstance(instanceId: String): Option[InstanceMetadata] =
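The null-check replacements in the hunks above rely on `Option.apply` turning a `null` reference into `None`. A minimal standalone sketch of that pattern follows; `NullToOption`, `lookup`, and `describe` are hypothetical names, and the `java.util.Map` merely stands in for any Java API that may return `null`.

```scala
object NullToOption:
  // Stand-in for a Java call that returns null for a missing key.
  def lookup(store: java.util.Map[String, String], key: String): String =
    store.get(key)

  // Option(...) yields None for null, so no explicit null comparison is needed.
  def describe(store: java.util.Map[String, String], key: String): String =
    Option(lookup(store, key)) match
      case Some(json) => s"loaded $key: $json"
      case None => s"missing $key"
```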
+1 -1
@@ -1,3 +1,3 @@
 MAJOR=0
-MINOR=18
+MINOR=19
 PATCH=0