Compare commits
2 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 3bda9466b5 | |||
| 81b045d01b |
@@ -0,0 +1,313 @@
|
|||||||
|
# NowChess Tournament API
|
||||||
|
|
||||||
|
Swiss-system bot tournaments. Bots are paired by score each round; all bots play every round (no eliminations). Game moves flow through the existing board and bot endpoints — the tournament module only orchestrates pairings, standings, and lifecycle.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Base path
|
||||||
|
|
||||||
|
```
|
||||||
|
/api/tournament
|
||||||
|
```
|
||||||
|
|
||||||
|
Routing: `/api/tournament` → `nowchess-tournament-active:8086`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Authentication
|
||||||
|
|
||||||
|
All endpoints require a valid JWT (`Authorization: Bearer <token>`).
|
||||||
|
Bot-facing streaming endpoints additionally require the token's subject to match the registered `botId`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Data models
|
||||||
|
|
||||||
|
### Tournament
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"id": "t7kXq2",
|
||||||
|
"name": "Friday Night Bots",
|
||||||
|
"status": "created | started | finished",
|
||||||
|
"rounds": 5,
|
||||||
|
"currentRound": 2,
|
||||||
|
"timeControl": {
|
||||||
|
"limitSeconds": 300,
|
||||||
|
"incrementSeconds": 3
|
||||||
|
},
|
||||||
|
"createdBy": "userId",
|
||||||
|
"createdAt": "2026-05-13T18:00:00Z",
|
||||||
|
"startedAt": "2026-05-13T18:05:00Z",
|
||||||
|
"finishedAt": null
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Standing
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"rank": 1,
|
||||||
|
"botId": "bot_abc",
|
||||||
|
"botName": "StockfishClone",
|
||||||
|
"points": 3.5,
|
||||||
|
"wins": 3,
|
||||||
|
"draws": 1,
|
||||||
|
"losses": 0,
|
||||||
|
"buchholz": 9.0
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Tiebreaker: Buchholz score (sum of opponents' points).
|
||||||
|
|
||||||
|
### Pairing
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"round": 2,
|
||||||
|
"whiteBot": "bot_abc",
|
||||||
|
"blackBot": "bot_xyz",
|
||||||
|
"gameId": "j0nPtcjl",
|
||||||
|
"result": "white | black | draw | ongoing"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### TournamentEvent (SSE)
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "type": "tournamentStarted", "tournamentId": "t7kXq2" }
|
||||||
|
{ "type": "roundStarted", "tournamentId": "t7kXq2", "round": 2 }
|
||||||
|
{ "type": "pairingReady", "tournamentId": "t7kXq2", "round": 2, "gameId": "j0nPtcjl", "color": "white" }
|
||||||
|
{ "type": "roundFinished", "tournamentId": "t7kXq2", "round": 2 }
|
||||||
|
{ "type": "tournamentFinished","tournamentId": "t7kXq2" }
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Endpoints
|
||||||
|
|
||||||
|
### Tournament lifecycle
|
||||||
|
|
||||||
|
#### Create tournament
|
||||||
|
|
||||||
|
```
|
||||||
|
POST /api/tournament
|
||||||
|
```
|
||||||
|
|
||||||
|
Body:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"name": "Friday Night Bots",
|
||||||
|
"rounds": 5,
|
||||||
|
"timeControl": {
|
||||||
|
"limitSeconds": 300,
|
||||||
|
"incrementSeconds": 3
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Response `201 Created`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "id": "t7kXq2" }
|
||||||
|
```
|
||||||
|
|
||||||
|
The creator becomes the tournament director. Only the director can start and delete the tournament.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Get tournament
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /api/tournament/{tournamentId}
|
||||||
|
```
|
||||||
|
|
||||||
|
Response `200 OK`: `Tournament` object.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### List tournaments
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /api/tournament
|
||||||
|
```
|
||||||
|
|
||||||
|
Query params:
|
||||||
|
|
||||||
|
| Param | Type | Default |
|
||||||
|
|----------|---------------------------------|-----------|
|
||||||
|
| `status` | `created\|started\|finished` | (all) |
|
||||||
|
| `limit` | integer (max 50) | 20 |
|
||||||
|
| `offset` | integer | 0 |
|
||||||
|
|
||||||
|
Response `200 OK`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"tournaments": [ /* Tournament[] */ ],
|
||||||
|
"total": 42
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Start tournament
|
||||||
|
|
||||||
|
```
|
||||||
|
POST /api/tournament/{tournamentId}/start
|
||||||
|
```
|
||||||
|
|
||||||
|
Requires at least 2 registered bots. Computes round 1 pairings (random for round 1; score-based from round 2). Creates one game per pairing via `POST /api/board/game`.
|
||||||
|
|
||||||
|
Response `200 OK`: updated `Tournament` object.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Delete tournament
|
||||||
|
|
||||||
|
```
|
||||||
|
DELETE /api/tournament/{tournamentId}
|
||||||
|
```
|
||||||
|
|
||||||
|
Only allowed while `status == "created"`. Response `204 No Content`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Bot registration
|
||||||
|
|
||||||
|
#### Register bot
|
||||||
|
|
||||||
|
```
|
||||||
|
POST /api/tournament/{tournamentId}/bots
|
||||||
|
```
|
||||||
|
|
||||||
|
Registers a bot for the tournament. Must be called before the tournament starts.
|
||||||
|
The token subject must match the bot being registered.
|
||||||
|
|
||||||
|
Body:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "botId": "bot_abc" }
|
||||||
|
```
|
||||||
|
|
||||||
|
Response `200 OK`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "botId": "bot_abc", "tournamentId": "t7kXq2" }
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Unregister bot
|
||||||
|
|
||||||
|
```
|
||||||
|
DELETE /api/tournament/{tournamentId}/bots/{botId}
|
||||||
|
```
|
||||||
|
|
||||||
|
Only allowed while `status == "created"`. Response `204 No Content`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### List registered bots
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /api/tournament/{tournamentId}/bots
|
||||||
|
```
|
||||||
|
|
||||||
|
Response `200 OK`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"bots": [
|
||||||
|
{ "botId": "bot_abc", "botName": "StockfishClone" }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Standings and pairings
|
||||||
|
|
||||||
|
#### Get standings
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /api/tournament/{tournamentId}/standings
|
||||||
|
```
|
||||||
|
|
||||||
|
Response `200 OK`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "standings": [ /* Standing[] */ ] }
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Get pairings for a round
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /api/tournament/{tournamentId}/rounds/{round}/pairings
|
||||||
|
```
|
||||||
|
|
||||||
|
Response `200 OK`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "pairings": [ /* Pairing[] */ ] }
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Bot streaming
|
||||||
|
|
||||||
|
#### Stream tournament events
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /api/tournament/{tournamentId}/stream
|
||||||
|
```
|
||||||
|
|
||||||
|
Headers: `Accept: text/event-stream`
|
||||||
|
|
||||||
|
Server-Sent Events stream scoped to this tournament. The bot receives `pairingReady` events when it is assigned a game, at which point it should connect to the existing bot game stream:
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /bot/stream/game/{gameId} (existing endpoint)
|
||||||
|
POST /bot/game/{gameId}/move/{uci} (existing endpoint)
|
||||||
|
```
|
||||||
|
|
||||||
|
The tournament module never sends moves — bots do that themselves through the existing bot endpoints.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Typical bot flow
|
||||||
|
|
||||||
|
```
|
||||||
|
1. POST /api/tournament # director creates tournament
|
||||||
|
2. POST /api/tournament/{id}/bots # each bot registers
|
||||||
|
3. POST /api/tournament/{id}/start # director starts
|
||||||
|
4. GET /api/tournament/{id}/stream (SSE) # each bot opens stream
|
||||||
|
|
||||||
|
-- per round --
|
||||||
|
5. receive: pairingReady { gameId, color }
|
||||||
|
6. GET /bot/stream/game/{gameId} # existing endpoint
|
||||||
|
7. POST /bot/game/{gameId}/move/{uci} # existing endpoint, repeated
|
||||||
|
-- game ends --
|
||||||
|
|
||||||
|
8. receive: roundFinished
|
||||||
|
9. GET /api/tournament/{id}/standings # optional, inspect scores
|
||||||
|
-- repeat 5–9 for each round --
|
||||||
|
|
||||||
|
10. receive: tournamentFinished
|
||||||
|
11. GET /api/tournament/{id}/standings # final ranking
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error responses
|
||||||
|
|
||||||
|
| Status | Meaning |
|
||||||
|
|--------|------------------------------------------------------|
|
||||||
|
| 400 | Invalid request body or parameters |
|
||||||
|
| 401 | Missing or invalid JWT |
|
||||||
|
| 403 | Action not allowed (wrong director, wrong bot, etc.) |
|
||||||
|
| 404 | Tournament or bot not found |
|
||||||
|
| 409 | Tournament already started / bot already registered |
|
||||||
@@ -0,0 +1,623 @@
|
|||||||
|
openapi: 3.0.3
|
||||||
|
info:
|
||||||
|
title: NowChess Tournament API
|
||||||
|
description: |
|
||||||
|
Swiss-system bot tournaments, modelled after the Lichess API style.
|
||||||
|
|
||||||
|
Game moves flow through the existing board and bot endpoints — this module
|
||||||
|
handles pairings, standings, and lifecycle only.
|
||||||
|
|
||||||
|
## Streaming
|
||||||
|
Endpoints marked **NDJSON** return newline-delimited JSON objects
|
||||||
|
(`application/x-ndjson`). Each line is one complete JSON object. The
|
||||||
|
connection stays open until the tournament or round ends.
|
||||||
|
|
||||||
|
## Bot flow
|
||||||
|
```
|
||||||
|
POST /api/tournament # create
|
||||||
|
POST /api/tournament/{id}/join # each bot joins
|
||||||
|
POST /api/tournament/{id}/start # director starts
|
||||||
|
|
||||||
|
GET /api/tournament/{id}/stream (NDJSON) # open before start
|
||||||
|
|
||||||
|
-- per round --
|
||||||
|
receive gameStart { gameId, color }
|
||||||
|
GET /bot/stream/game/{gameId} (existing, NDJSON)
|
||||||
|
POST /bot/game/{gameId}/move/{uci} (existing)
|
||||||
|
-- repeat --
|
||||||
|
|
||||||
|
GET /api/tournament/{id}/results (NDJSON) # final standings
|
||||||
|
```
|
||||||
|
version: 1.0.0
|
||||||
|
|
||||||
|
servers:
|
||||||
|
- url: https://st.nowchess.janis-eccarius.de
|
||||||
|
description: Staging
|
||||||
|
- url: https://nowchess.janis-eccarius.de
|
||||||
|
description: Production
|
||||||
|
- url: http://localhost:8086
|
||||||
|
description: Local
|
||||||
|
|
||||||
|
security:
|
||||||
|
- bearerAuth: []
|
||||||
|
|
||||||
|
tags:
|
||||||
|
- name: Tournament
|
||||||
|
description: Tournament lifecycle
|
||||||
|
- name: Participation
|
||||||
|
description: Join and withdraw
|
||||||
|
- name: Results
|
||||||
|
description: Standings, pairings, and game export
|
||||||
|
- name: Stream
|
||||||
|
description: NDJSON event streams
|
||||||
|
|
||||||
|
paths:
|
||||||
|
|
||||||
|
/api/tournament:
|
||||||
|
get:
|
||||||
|
tags: [Tournament]
|
||||||
|
summary: Get current tournaments
|
||||||
|
description: Returns tournaments grouped by status. No auth required.
|
||||||
|
security: []
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Tournaments by status
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
created:
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
$ref: "#/components/schemas/TournamentInfo"
|
||||||
|
started:
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
$ref: "#/components/schemas/TournamentInfo"
|
||||||
|
finished:
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
$ref: "#/components/schemas/TournamentInfo"
|
||||||
|
|
||||||
|
post:
|
||||||
|
tags: [Tournament]
|
||||||
|
summary: Create a new tournament
|
||||||
|
description: The authenticated user becomes the tournament director.
|
||||||
|
requestBody:
|
||||||
|
required: true
|
||||||
|
content:
|
||||||
|
application/x-www-form-urlencoded:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/CreateTournamentForm"
|
||||||
|
responses:
|
||||||
|
"201":
|
||||||
|
description: Tournament created
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Tournament"
|
||||||
|
"400":
|
||||||
|
$ref: "#/components/responses/BadRequest"
|
||||||
|
"401":
|
||||||
|
$ref: "#/components/responses/Unauthorized"
|
||||||
|
|
||||||
|
/api/tournament/{id}:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
get:
|
||||||
|
tags: [Tournament]
|
||||||
|
summary: Get a tournament
|
||||||
|
description: Includes the first page of standings in the `standing` field.
|
||||||
|
security: []
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Tournament with embedded standings
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Tournament"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
|
||||||
|
delete:
|
||||||
|
tags: [Tournament]
|
||||||
|
summary: Terminate a tournament
|
||||||
|
description: Only the director may terminate. Only allowed while status is `created`.
|
||||||
|
responses:
|
||||||
|
"204":
|
||||||
|
description: Terminated
|
||||||
|
"401":
|
||||||
|
$ref: "#/components/responses/Unauthorized"
|
||||||
|
"403":
|
||||||
|
$ref: "#/components/responses/Forbidden"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
"409":
|
||||||
|
$ref: "#/components/responses/Conflict"
|
||||||
|
|
||||||
|
/api/tournament/{id}/start:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
post:
|
||||||
|
tags: [Tournament]
|
||||||
|
summary: Start the tournament
|
||||||
|
description: |
|
||||||
|
Only the director may start. Requires at least 2 joined bots.
|
||||||
|
Computes round 1 pairings and creates games via `POST /api/board/game`.
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Tournament started
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Tournament"
|
||||||
|
"401":
|
||||||
|
$ref: "#/components/responses/Unauthorized"
|
||||||
|
"403":
|
||||||
|
$ref: "#/components/responses/Forbidden"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
"409":
|
||||||
|
$ref: "#/components/responses/Conflict"
|
||||||
|
|
||||||
|
/api/tournament/{id}/join:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
post:
|
||||||
|
tags: [Participation]
|
||||||
|
summary: Join a tournament
|
||||||
|
description: |
|
||||||
|
Register the authenticated bot for the tournament. Only allowed while
|
||||||
|
status is `created`. The token subject must be a bot account.
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Ok
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Ok"
|
||||||
|
"401":
|
||||||
|
$ref: "#/components/responses/Unauthorized"
|
||||||
|
"403":
|
||||||
|
$ref: "#/components/responses/Forbidden"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
"409":
|
||||||
|
$ref: "#/components/responses/Conflict"
|
||||||
|
|
||||||
|
/api/tournament/{id}/withdraw:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
post:
|
||||||
|
tags: [Participation]
|
||||||
|
summary: Withdraw from a tournament
|
||||||
|
description: Only allowed while status is `created`.
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Ok
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Ok"
|
||||||
|
"401":
|
||||||
|
$ref: "#/components/responses/Unauthorized"
|
||||||
|
"403":
|
||||||
|
$ref: "#/components/responses/Forbidden"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
"409":
|
||||||
|
$ref: "#/components/responses/Conflict"
|
||||||
|
|
||||||
|
/api/tournament/{id}/results:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
get:
|
||||||
|
tags: [Results]
|
||||||
|
summary: Get results as NDJSON stream
|
||||||
|
description: |
|
||||||
|
Streams one `Result` object per line, sorted by rank ascending.
|
||||||
|
Available at any point during or after the tournament.
|
||||||
|
security: []
|
||||||
|
parameters:
|
||||||
|
- name: nb
|
||||||
|
in: query
|
||||||
|
description: Max number of results to stream (default all)
|
||||||
|
schema:
|
||||||
|
type: integer
|
||||||
|
minimum: 1
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: NDJSON stream of results
|
||||||
|
content:
|
||||||
|
application/x-ndjson:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Result"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
|
||||||
|
/api/tournament/{id}/round/{round}:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
- name: round
|
||||||
|
in: path
|
||||||
|
required: true
|
||||||
|
schema:
|
||||||
|
type: integer
|
||||||
|
minimum: 1
|
||||||
|
|
||||||
|
get:
|
||||||
|
tags: [Results]
|
||||||
|
summary: Get pairings for a round
|
||||||
|
security: []
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Pairings for the specified round
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
round:
|
||||||
|
type: integer
|
||||||
|
example: 2
|
||||||
|
pairings:
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
$ref: "#/components/schemas/Pairing"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
|
||||||
|
/api/tournament/{id}/export/games:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
get:
|
||||||
|
tags: [Results]
|
||||||
|
summary: Export all games
|
||||||
|
description: |
|
||||||
|
Returns all games of the tournament. Accepts both PGN and NDJSON via
|
||||||
|
the `Accept` header.
|
||||||
|
security: []
|
||||||
|
parameters:
|
||||||
|
- name: Accept
|
||||||
|
in: header
|
||||||
|
schema:
|
||||||
|
type: string
|
||||||
|
enum:
|
||||||
|
- application/x-chess-pgn
|
||||||
|
- application/x-ndjson
|
||||||
|
default: application/x-chess-pgn
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: Games in the requested format
|
||||||
|
content:
|
||||||
|
application/x-chess-pgn:
|
||||||
|
schema:
|
||||||
|
type: string
|
||||||
|
description: Standard PGN, one game per block
|
||||||
|
application/x-ndjson:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/GameExport"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
|
||||||
|
/api/tournament/{id}/stream:
|
||||||
|
parameters:
|
||||||
|
- $ref: "#/components/parameters/id"
|
||||||
|
|
||||||
|
get:
|
||||||
|
tags: [Stream]
|
||||||
|
summary: Stream tournament events
|
||||||
|
description: |
|
||||||
|
NDJSON stream scoped to one tournament. Keep this connection open for
|
||||||
|
the full tournament lifetime.
|
||||||
|
|
||||||
|
On `gameStart` the bot connects to the existing bot endpoints:
|
||||||
|
- `GET /bot/stream/game/{gameId}` — stream game state (existing)
|
||||||
|
- `POST /bot/game/{gameId}/move/{uci}` — submit moves (existing)
|
||||||
|
responses:
|
||||||
|
"200":
|
||||||
|
description: NDJSON event stream
|
||||||
|
content:
|
||||||
|
application/x-ndjson:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/TournamentEvent"
|
||||||
|
"401":
|
||||||
|
$ref: "#/components/responses/Unauthorized"
|
||||||
|
"404":
|
||||||
|
$ref: "#/components/responses/NotFound"
|
||||||
|
|
||||||
|
components:
|
||||||
|
|
||||||
|
securitySchemes:
|
||||||
|
bearerAuth:
|
||||||
|
type: http
|
||||||
|
scheme: bearer
|
||||||
|
bearerFormat: JWT
|
||||||
|
|
||||||
|
parameters:
|
||||||
|
id:
|
||||||
|
name: id
|
||||||
|
in: path
|
||||||
|
required: true
|
||||||
|
schema:
|
||||||
|
type: string
|
||||||
|
example: t7kXq2
|
||||||
|
|
||||||
|
schemas:
|
||||||
|
|
||||||
|
Clock:
|
||||||
|
type: object
|
||||||
|
required: [limit, increment]
|
||||||
|
properties:
|
||||||
|
limit:
|
||||||
|
type: integer
|
||||||
|
description: Base time in seconds
|
||||||
|
example: 300
|
||||||
|
increment:
|
||||||
|
type: integer
|
||||||
|
description: Increment per move in seconds
|
||||||
|
example: 3
|
||||||
|
|
||||||
|
Variant:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
key:
|
||||||
|
type: string
|
||||||
|
example: standard
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
example: Standard
|
||||||
|
|
||||||
|
BotRef:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
id:
|
||||||
|
type: string
|
||||||
|
example: bot_abc
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
example: StockfishClone
|
||||||
|
|
||||||
|
Standing:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
page:
|
||||||
|
type: integer
|
||||||
|
example: 1
|
||||||
|
players:
|
||||||
|
type: array
|
||||||
|
items:
|
||||||
|
$ref: "#/components/schemas/Result"
|
||||||
|
|
||||||
|
TournamentInfo:
|
||||||
|
description: Lightweight tournament summary used in list responses.
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
id:
|
||||||
|
type: string
|
||||||
|
example: t7kXq2
|
||||||
|
fullName:
|
||||||
|
type: string
|
||||||
|
example: Friday Night Bots Swiss
|
||||||
|
clock:
|
||||||
|
$ref: "#/components/schemas/Clock"
|
||||||
|
variant:
|
||||||
|
$ref: "#/components/schemas/Variant"
|
||||||
|
rated:
|
||||||
|
type: boolean
|
||||||
|
example: true
|
||||||
|
nbPlayers:
|
||||||
|
type: integer
|
||||||
|
example: 8
|
||||||
|
nbRounds:
|
||||||
|
type: integer
|
||||||
|
example: 5
|
||||||
|
createdBy:
|
||||||
|
type: string
|
||||||
|
example: userId
|
||||||
|
startsAt:
|
||||||
|
type: string
|
||||||
|
format: date-time
|
||||||
|
|
||||||
|
Tournament:
|
||||||
|
allOf:
|
||||||
|
- $ref: "#/components/schemas/TournamentInfo"
|
||||||
|
- type: object
|
||||||
|
properties:
|
||||||
|
status:
|
||||||
|
type: string
|
||||||
|
enum: [created, started, finished]
|
||||||
|
example: started
|
||||||
|
round:
|
||||||
|
type: integer
|
||||||
|
description: Current round number
|
||||||
|
example: 2
|
||||||
|
standing:
|
||||||
|
$ref: "#/components/schemas/Standing"
|
||||||
|
winner:
|
||||||
|
description: Present only when status is `finished`
|
||||||
|
allOf:
|
||||||
|
- $ref: "#/components/schemas/BotRef"
|
||||||
|
nullable: true
|
||||||
|
|
||||||
|
CreateTournamentForm:
|
||||||
|
type: object
|
||||||
|
required: [name, nbRounds, clockLimit, clockIncrement]
|
||||||
|
properties:
|
||||||
|
name:
|
||||||
|
type: string
|
||||||
|
example: Friday Night Bots
|
||||||
|
nbRounds:
|
||||||
|
type: integer
|
||||||
|
minimum: 1
|
||||||
|
example: 5
|
||||||
|
clockLimit:
|
||||||
|
type: integer
|
||||||
|
description: Base time in seconds
|
||||||
|
example: 300
|
||||||
|
clockIncrement:
|
||||||
|
type: integer
|
||||||
|
description: Increment per move in seconds
|
||||||
|
example: 3
|
||||||
|
rated:
|
||||||
|
type: boolean
|
||||||
|
default: true
|
||||||
|
|
||||||
|
Result:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
rank:
|
||||||
|
type: integer
|
||||||
|
example: 1
|
||||||
|
points:
|
||||||
|
type: number
|
||||||
|
format: double
|
||||||
|
example: 3.5
|
||||||
|
tieBreak:
|
||||||
|
type: number
|
||||||
|
format: double
|
||||||
|
description: Buchholz score (sum of opponents' points)
|
||||||
|
example: 9.0
|
||||||
|
bot:
|
||||||
|
$ref: "#/components/schemas/BotRef"
|
||||||
|
nbGames:
|
||||||
|
type: integer
|
||||||
|
example: 4
|
||||||
|
wins:
|
||||||
|
type: integer
|
||||||
|
example: 3
|
||||||
|
draws:
|
||||||
|
type: integer
|
||||||
|
example: 1
|
||||||
|
losses:
|
||||||
|
type: integer
|
||||||
|
example: 0
|
||||||
|
|
||||||
|
Pairing:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
round:
|
||||||
|
type: integer
|
||||||
|
example: 2
|
||||||
|
white:
|
||||||
|
$ref: "#/components/schemas/BotRef"
|
||||||
|
black:
|
||||||
|
$ref: "#/components/schemas/BotRef"
|
||||||
|
gameId:
|
||||||
|
type: string
|
||||||
|
example: j0nPtcjl
|
||||||
|
winner:
|
||||||
|
type: string
|
||||||
|
enum: [white, black, draw]
|
||||||
|
nullable: true
|
||||||
|
description: Null while the game is ongoing
|
||||||
|
|
||||||
|
GameExport:
|
||||||
|
description: One game object per NDJSON line.
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
id:
|
||||||
|
type: string
|
||||||
|
example: j0nPtcjl
|
||||||
|
round:
|
||||||
|
type: integer
|
||||||
|
example: 2
|
||||||
|
white:
|
||||||
|
$ref: "#/components/schemas/BotRef"
|
||||||
|
black:
|
||||||
|
$ref: "#/components/schemas/BotRef"
|
||||||
|
winner:
|
||||||
|
type: string
|
||||||
|
enum: [white, black, draw]
|
||||||
|
nullable: true
|
||||||
|
moves:
|
||||||
|
type: string
|
||||||
|
description: Space-separated UCI moves
|
||||||
|
example: e2e4 e7e5 g1f3
|
||||||
|
|
||||||
|
TournamentEvent:
|
||||||
|
description: |
|
||||||
|
One JSON object per NDJSON line. Discriminate on `type`.
|
||||||
|
|
||||||
|
| type | extra fields |
|
||||||
|
|------|-------------|
|
||||||
|
| `tournamentStarted` | — |
|
||||||
|
| `roundStarted` | `round` |
|
||||||
|
| `gameStart` | `round`, `gameId`, `color` |
|
||||||
|
| `roundFinished` | `round` |
|
||||||
|
| `tournamentFinished` | `winner` |
|
||||||
|
type: object
|
||||||
|
required: [type]
|
||||||
|
properties:
|
||||||
|
type:
|
||||||
|
type: string
|
||||||
|
enum:
|
||||||
|
- tournamentStarted
|
||||||
|
- roundStarted
|
||||||
|
- gameStart
|
||||||
|
- roundFinished
|
||||||
|
- tournamentFinished
|
||||||
|
round:
|
||||||
|
type: integer
|
||||||
|
example: 2
|
||||||
|
gameId:
|
||||||
|
type: string
|
||||||
|
example: j0nPtcjl
|
||||||
|
color:
|
||||||
|
type: string
|
||||||
|
enum: [white, black]
|
||||||
|
winner:
|
||||||
|
$ref: "#/components/schemas/BotRef"
|
||||||
|
|
||||||
|
Ok:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
ok:
|
||||||
|
type: boolean
|
||||||
|
example: true
|
||||||
|
|
||||||
|
Error:
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
error:
|
||||||
|
type: string
|
||||||
|
example: tournament already started
|
||||||
|
|
||||||
|
responses:
|
||||||
|
BadRequest:
|
||||||
|
description: Invalid request body or parameters
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Error"
|
||||||
|
Unauthorized:
|
||||||
|
description: Missing or invalid JWT
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Error"
|
||||||
|
Forbidden:
|
||||||
|
description: Action not permitted for this user or bot
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Error"
|
||||||
|
NotFound:
|
||||||
|
description: Tournament not found
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Error"
|
||||||
|
Conflict:
|
||||||
|
description: Conflicting state (e.g. already started, bot already joined)
|
||||||
|
content:
|
||||||
|
application/json:
|
||||||
|
schema:
|
||||||
|
$ref: "#/components/schemas/Error"
|
||||||
@@ -45,6 +45,8 @@ nowchess:
|
|||||||
k8s-namespace: default
|
k8s-namespace: default
|
||||||
k8s-rollout-name: nowchess-core
|
k8s-rollout-name: nowchess-core
|
||||||
k8s-rollout-label-selector: "app=nowchess-core"
|
k8s-rollout-label-selector: "app=nowchess-core"
|
||||||
|
startup-validation-timeout: 15s
|
||||||
|
failover-wait-timeout: 30s
|
||||||
|
|
||||||
---
|
---
|
||||||
# dev profile
|
# dev profile
|
||||||
|
|||||||
+6
@@ -56,3 +56,9 @@ trait CoordinatorConfig:
|
|||||||
|
|
||||||
@WithName("k8s-rollout-label-selector")
|
@WithName("k8s-rollout-label-selector")
|
||||||
def k8sRolloutLabelSelector: String
|
def k8sRolloutLabelSelector: String
|
||||||
|
|
||||||
|
@WithName("startup-validation-timeout")
|
||||||
|
def startupValidationTimeout: Duration
|
||||||
|
|
||||||
|
@WithName("failover-wait-timeout")
|
||||||
|
def failoverWaitTimeout: Duration
|
||||||
|
|||||||
+30
-7
@@ -9,6 +9,7 @@ import de.nowchess.coordinator.proto.{CoordinatorServiceGrpc, *}
|
|||||||
import io.grpc.stub.StreamObserver
|
import io.grpc.stub.StreamObserver
|
||||||
import com.fasterxml.jackson.databind.ObjectMapper
|
import com.fasterxml.jackson.databind.ObjectMapper
|
||||||
import org.jboss.logging.Logger
|
import org.jboss.logging.Logger
|
||||||
|
import java.util.concurrent.ConcurrentHashMap
|
||||||
|
|
||||||
@GrpcService
|
@GrpcService
|
||||||
@Singleton
|
@Singleton
|
||||||
@@ -21,8 +22,9 @@ class CoordinatorGrpcServer extends CoordinatorServiceGrpc.CoordinatorServiceImp
|
|||||||
private var failoverService: FailoverService = uninitialized
|
private var failoverService: FailoverService = uninitialized
|
||||||
// scalafix:on DisableSyntax.var
|
// scalafix:on DisableSyntax.var
|
||||||
|
|
||||||
private val mapper = ObjectMapper()
|
private val mapper = ObjectMapper()
|
||||||
private val log = Logger.getLogger(classOf[CoordinatorGrpcServer])
|
private val log = Logger.getLogger(classOf[CoordinatorGrpcServer])
|
||||||
|
private val activeStreams = ConcurrentHashMap.newKeySet[String]()
|
||||||
|
|
||||||
override def heartbeatStream(
|
override def heartbeatStream(
|
||||||
responseObserver: StreamObserver[CoordinatorCommand],
|
responseObserver: StreamObserver[CoordinatorCommand],
|
||||||
@@ -38,6 +40,7 @@ class CoordinatorGrpcServer extends CoordinatorServiceGrpc.CoordinatorServiceImp
|
|||||||
lastInstanceId = frame.getInstanceId
|
lastInstanceId = frame.getInstanceId
|
||||||
if !firstFrameSeen then
|
if !firstFrameSeen then
|
||||||
firstFrameSeen = true
|
firstFrameSeen = true
|
||||||
|
activeStreams.add(frame.getInstanceId)
|
||||||
log.infof(
|
log.infof(
|
||||||
"First heartbeat from instance %s (host=%s http=%d grpc=%d)",
|
"First heartbeat from instance %s (host=%s http=%d grpc=%d)",
|
||||||
frame.getInstanceId,
|
frame.getInstanceId,
|
||||||
@@ -60,10 +63,19 @@ class CoordinatorGrpcServer extends CoordinatorServiceGrpc.CoordinatorServiceImp
|
|||||||
|
|
||||||
override def onError(t: Throwable): Unit =
|
override def onError(t: Throwable): Unit =
|
||||||
log.warnf(t, "Heartbeat stream error for instance %s", lastInstanceId)
|
log.warnf(t, "Heartbeat stream error for instance %s", lastInstanceId)
|
||||||
if lastInstanceId.nonEmpty then failoverService.onInstanceStreamDropped(lastInstanceId)
|
if lastInstanceId.nonEmpty then
|
||||||
|
activeStreams.remove(lastInstanceId)
|
||||||
|
failoverService
|
||||||
|
.onInstanceStreamDropped(lastInstanceId)
|
||||||
|
.subscribe()
|
||||||
|
.`with`(
|
||||||
|
_ => (),
|
||||||
|
ex => log.warnf(ex, "Failover for %s failed", lastInstanceId),
|
||||||
|
)
|
||||||
|
|
||||||
override def onCompleted: Unit =
|
override def onCompleted: Unit =
|
||||||
log.infof("Heartbeat stream completed for instance %s", lastInstanceId)
|
log.infof("Heartbeat stream completed for instance %s", lastInstanceId)
|
||||||
|
activeStreams.remove(lastInstanceId)
|
||||||
|
|
||||||
override def batchResubscribeGames(
|
override def batchResubscribeGames(
|
||||||
request: BatchResubscribeRequest,
|
request: BatchResubscribeRequest,
|
||||||
@@ -108,7 +120,18 @@ class CoordinatorGrpcServer extends CoordinatorServiceGrpc.CoordinatorServiceImp
|
|||||||
val instanceId = request.getInstanceId
|
val instanceId = request.getInstanceId
|
||||||
log.infof("Drain request for instance %s", instanceId)
|
log.infof("Drain request for instance %s", instanceId)
|
||||||
val gamesBefore = instanceRegistry.getInstance(instanceId).map(_.subscriptionCount).getOrElse(0)
|
val gamesBefore = instanceRegistry.getInstance(instanceId).map(_.subscriptionCount).getOrElse(0)
|
||||||
failoverService.onInstanceStreamDropped(instanceId)
|
failoverService
|
||||||
val response = DrainInstanceResponse.newBuilder().setGamesMigrated(gamesBefore).build()
|
.onInstanceStreamDropped(instanceId)
|
||||||
responseObserver.onNext(response)
|
.subscribe()
|
||||||
responseObserver.onCompleted()
|
.`with`(
|
||||||
|
_ =>
|
||||||
|
val response = DrainInstanceResponse.newBuilder().setGamesMigrated(gamesBefore).build()
|
||||||
|
responseObserver.onNext(response)
|
||||||
|
responseObserver.onCompleted(),
|
||||||
|
ex =>
|
||||||
|
log.warnf(ex, "Drain failed for %s", instanceId)
|
||||||
|
responseObserver.onError(ex),
|
||||||
|
)
|
||||||
|
|
||||||
|
def hasActiveStream(instanceId: String): Boolean =
|
||||||
|
activeStreams.contains(instanceId)
|
||||||
|
|||||||
+7
-1
@@ -70,7 +70,13 @@ class CoordinatorResource:
|
|||||||
@Produces(Array(MediaType.APPLICATION_JSON))
|
@Produces(Array(MediaType.APPLICATION_JSON))
|
||||||
def triggerFailover(@PathParam("instanceId") instanceId: String): scala.collection.Map[String, String] =
|
def triggerFailover(@PathParam("instanceId") instanceId: String): scala.collection.Map[String, String] =
|
||||||
log.infof("Manual failover triggered for instance %s", instanceId)
|
log.infof("Manual failover triggered for instance %s", instanceId)
|
||||||
failoverService.onInstanceStreamDropped(instanceId)
|
failoverService
|
||||||
|
.onInstanceStreamDropped(instanceId)
|
||||||
|
.subscribe()
|
||||||
|
.`with`(
|
||||||
|
_ => (),
|
||||||
|
ex => log.warnf(ex, "Manual failover for %s failed", instanceId),
|
||||||
|
)
|
||||||
Map("status" -> "failover_started", "instanceId" -> instanceId)
|
Map("status" -> "failover_started", "instanceId" -> instanceId)
|
||||||
|
|
||||||
@POST
|
@POST
|
||||||
|
|||||||
+45
-13
@@ -8,6 +8,9 @@ import scala.compiletime.uninitialized
|
|||||||
import org.jboss.logging.Logger
|
import org.jboss.logging.Logger
|
||||||
import de.nowchess.coordinator.dto.InstanceMetadata
|
import de.nowchess.coordinator.dto.InstanceMetadata
|
||||||
import de.nowchess.coordinator.grpc.CoreGrpcClient
|
import de.nowchess.coordinator.grpc.CoreGrpcClient
|
||||||
|
import de.nowchess.coordinator.config.CoordinatorConfig
|
||||||
|
import io.smallrye.mutiny.Uni
|
||||||
|
import java.time.Duration
|
||||||
|
|
||||||
@ApplicationScoped
|
@ApplicationScoped
|
||||||
class FailoverService:
|
class FailoverService:
|
||||||
@@ -21,6 +24,9 @@ class FailoverService:
|
|||||||
@Inject
|
@Inject
|
||||||
private var coreGrpcClient: CoreGrpcClient = uninitialized
|
private var coreGrpcClient: CoreGrpcClient = uninitialized
|
||||||
|
|
||||||
|
@Inject
|
||||||
|
private var config: CoordinatorConfig = uninitialized
|
||||||
|
|
||||||
private val log = Logger.getLogger(classOf[FailoverService])
|
private val log = Logger.getLogger(classOf[FailoverService])
|
||||||
private var redisPrefix = "nowchess"
|
private var redisPrefix = "nowchess"
|
||||||
// scalafix:on DisableSyntax.var
|
// scalafix:on DisableSyntax.var
|
||||||
@@ -28,7 +34,7 @@ class FailoverService:
|
|||||||
def setRedisPrefix(prefix: String): Unit =
|
def setRedisPrefix(prefix: String): Unit =
|
||||||
redisPrefix = prefix
|
redisPrefix = prefix
|
||||||
|
|
||||||
def onInstanceStreamDropped(instanceId: String): Unit =
|
def onInstanceStreamDropped(instanceId: String): Uni[Unit] =
|
||||||
log.infof("Instance %s stream dropped, triggering failover", instanceId)
|
log.infof("Instance %s stream dropped, triggering failover", instanceId)
|
||||||
|
|
||||||
val startTime = System.currentTimeMillis()
|
val startTime = System.currentTimeMillis()
|
||||||
@@ -37,19 +43,32 @@ class FailoverService:
|
|||||||
val gameIds = getOrphanedGames(instanceId)
|
val gameIds = getOrphanedGames(instanceId)
|
||||||
log.infof("Found %d orphaned games for instance %s", gameIds.size, instanceId)
|
log.infof("Found %d orphaned games for instance %s", gameIds.size, instanceId)
|
||||||
|
|
||||||
if gameIds.nonEmpty then
|
if gameIds.isEmpty then
|
||||||
val healthyInstances = instanceRegistry.getAllInstances
|
cleanupDeadInstance(instanceId)
|
||||||
.filter(_.state == "HEALTHY")
|
Uni.createFrom().item(())
|
||||||
.sortBy(_.subscriptionCount)
|
else
|
||||||
|
waitForHealthyInstanceAsync()
|
||||||
|
.onItem()
|
||||||
|
.transform { _ =>
|
||||||
|
val healthyInstances = instanceRegistry.getAllInstances
|
||||||
|
.filter(_.state == "HEALTHY")
|
||||||
|
.sortBy(_.subscriptionCount)
|
||||||
|
distributeGames(gameIds, healthyInstances, instanceId)
|
||||||
|
|
||||||
if healthyInstances.nonEmpty then
|
val elapsed = System.currentTimeMillis() - startTime
|
||||||
distributeGames(gameIds, healthyInstances, instanceId)
|
log.infof("Failover completed in %dms for instance %s", elapsed, instanceId)
|
||||||
|
cleanupDeadInstance(instanceId)
|
||||||
val elapsed = System.currentTimeMillis() - startTime
|
()
|
||||||
log.infof("Failover completed in %dms for instance %s", elapsed, instanceId)
|
}
|
||||||
else log.warnf("No healthy instances available for failover of %s", instanceId)
|
.onFailure()
|
||||||
|
.recoverWithItem { _ =>
|
||||||
cleanupDeadInstance(instanceId)
|
log.errorf(
|
||||||
|
"No healthy instance appeared within %s — games orphaned for %s",
|
||||||
|
config.failoverWaitTimeout,
|
||||||
|
instanceId,
|
||||||
|
)
|
||||||
|
()
|
||||||
|
}
|
||||||
|
|
||||||
private def getOrphanedGames(instanceId: String): List[String] =
|
private def getOrphanedGames(instanceId: String): List[String] =
|
||||||
val setKey = s"$redisPrefix:instance:$instanceId:games"
|
val setKey = s"$redisPrefix:instance:$instanceId:games"
|
||||||
@@ -101,3 +120,16 @@ class FailoverService:
|
|||||||
val setKey = s"$redisPrefix:instance:$instanceId:games"
|
val setKey = s"$redisPrefix:instance:$instanceId:games"
|
||||||
redis.key(classOf[String]).del(setKey)
|
redis.key(classOf[String]).del(setKey)
|
||||||
log.infof("Cleaned up games set for instance %s", instanceId)
|
log.infof("Cleaned up games set for instance %s", instanceId)
|
||||||
|
|
||||||
|
private def waitForHealthyInstanceAsync(): Uni[InstanceMetadata] =
|
||||||
|
Uni.createFrom().deferred(() =>
|
||||||
|
instanceRegistry.getAllInstances
|
||||||
|
.filter(_.state == "HEALTHY")
|
||||||
|
.sortBy(_.subscriptionCount)
|
||||||
|
.headOption match
|
||||||
|
case Some(inst) => Uni.createFrom().item(inst)
|
||||||
|
case None => Uni.createFrom().failure(new RuntimeException("no healthy instance"))
|
||||||
|
).onFailure()
|
||||||
|
.retry()
|
||||||
|
.withBackOff(Duration.ofMillis(500))
|
||||||
|
.expireIn(config.failoverWaitTimeout.toMillis)
|
||||||
|
|||||||
+85
-27
@@ -2,17 +2,23 @@ package de.nowchess.coordinator.service
|
|||||||
|
|
||||||
import jakarta.annotation.PostConstruct
|
import jakarta.annotation.PostConstruct
|
||||||
import jakarta.enterprise.context.ApplicationScoped
|
import jakarta.enterprise.context.ApplicationScoped
|
||||||
|
import jakarta.enterprise.event.Observes
|
||||||
import jakarta.enterprise.inject.Instance
|
import jakarta.enterprise.inject.Instance
|
||||||
import jakarta.inject.Inject
|
import jakarta.inject.Inject
|
||||||
import de.nowchess.coordinator.config.CoordinatorConfig
|
import de.nowchess.coordinator.config.CoordinatorConfig
|
||||||
import io.fabric8.kubernetes.client.KubernetesClient
|
import io.fabric8.kubernetes.client.KubernetesClient
|
||||||
import io.fabric8.kubernetes.api.model.Pod
|
import io.fabric8.kubernetes.api.model.Pod
|
||||||
|
import io.fabric8.kubernetes.client.Watcher
|
||||||
|
import io.fabric8.kubernetes.client.WatcherException
|
||||||
import io.micrometer.core.instrument.MeterRegistry
|
import io.micrometer.core.instrument.MeterRegistry
|
||||||
import io.quarkus.redis.datasource.RedisDataSource
|
import io.quarkus.redis.datasource.RedisDataSource
|
||||||
|
import io.quarkus.runtime.StartupEvent
|
||||||
import scala.jdk.CollectionConverters.*
|
import scala.jdk.CollectionConverters.*
|
||||||
import org.jboss.logging.Logger
|
import org.jboss.logging.Logger
|
||||||
import scala.compiletime.uninitialized
|
import scala.compiletime.uninitialized
|
||||||
import java.time.Instant
|
import java.time.Instant
|
||||||
|
import de.nowchess.coordinator.grpc.CoordinatorGrpcServer
|
||||||
|
import de.nowchess.coordinator.dto.InstanceMetadata
|
||||||
|
|
||||||
@ApplicationScoped
|
@ApplicationScoped
|
||||||
class HealthMonitor:
|
class HealthMonitor:
|
||||||
@@ -32,6 +38,12 @@ class HealthMonitor:
|
|||||||
@Inject
|
@Inject
|
||||||
private var meterRegistry: MeterRegistry = uninitialized
|
private var meterRegistry: MeterRegistry = uninitialized
|
||||||
|
|
||||||
|
@Inject
|
||||||
|
private var grpcServer: CoordinatorGrpcServer = uninitialized
|
||||||
|
|
||||||
|
@Inject
|
||||||
|
private var failoverService: FailoverService = uninitialized
|
||||||
|
|
||||||
private val log = Logger.getLogger(classOf[HealthMonitor])
|
private val log = Logger.getLogger(classOf[HealthMonitor])
|
||||||
private var redisPrefix = "nowchess"
|
private var redisPrefix = "nowchess"
|
||||||
// scalafix:on DisableSyntax.var
|
// scalafix:on DisableSyntax.var
|
||||||
@@ -48,6 +60,15 @@ class HealthMonitor:
|
|||||||
meterRegistry.counter("nowchess.coordinator.health.checks").increment(0)
|
meterRegistry.counter("nowchess.coordinator.health.checks").increment(0)
|
||||||
meterRegistry.counter("nowchess.coordinator.pods.unhealthy").increment(0)
|
meterRegistry.counter("nowchess.coordinator.pods.unhealthy").increment(0)
|
||||||
|
|
||||||
|
def onStartup(@Observes ev: StartupEvent): Unit =
|
||||||
|
instanceRegistry.loadAllFromRedis()
|
||||||
|
val loaded = instanceRegistry.getAllInstances
|
||||||
|
log.infof("Startup: loaded %d instances from Redis", loaded.size)
|
||||||
|
if loaded.nonEmpty then
|
||||||
|
val timeoutMs = config.startupValidationTimeout.toMillis
|
||||||
|
Thread.ofVirtual().start(() => validateStartupInstances(timeoutMs))
|
||||||
|
startPodWatch()
|
||||||
|
|
||||||
def checkInstanceHealth: Unit =
|
def checkInstanceHealth: Unit =
|
||||||
meterRegistry.counter("nowchess.coordinator.health.checks").increment()
|
meterRegistry.counter("nowchess.coordinator.health.checks").increment()
|
||||||
val evicted = instanceRegistry.evictStaleInstances(config.instanceDeadTimeout)
|
val evicted = instanceRegistry.evictStaleInstances(config.instanceDeadTimeout)
|
||||||
@@ -98,41 +119,33 @@ class HealthMonitor:
|
|||||||
true
|
true
|
||||||
}
|
}
|
||||||
|
|
||||||
def watchK8sPods: Unit =
|
private def startPodWatch(): Unit =
|
||||||
kubeClientOpt match
|
kubeClientOpt match
|
||||||
case None =>
|
case None => log.debug("K8s client unavailable, skipping pod watch")
|
||||||
log.debug("Kubernetes client not available for pod watch")
|
|
||||||
case Some(kube) =>
|
case Some(kube) =>
|
||||||
try
|
try
|
||||||
val pods = kube
|
kube
|
||||||
.pods()
|
.pods()
|
||||||
.inNamespace(config.k8sNamespace)
|
.inNamespace(config.k8sNamespace)
|
||||||
.withLabel(config.k8sRolloutLabelSelector)
|
.withLabel(config.k8sRolloutLabelSelector)
|
||||||
.list()
|
.watch(new Watcher[Pod]:
|
||||||
.getItems
|
override def eventReceived(action: Watcher.Action, pod: Pod): Unit =
|
||||||
.asScala
|
action match
|
||||||
|
case Watcher.Action.DELETED =>
|
||||||
|
handlePodGone(pod)
|
||||||
|
case Watcher.Action.MODIFIED
|
||||||
|
if Option(pod.getMetadata.getDeletionTimestamp).isDefined =>
|
||||||
|
handlePodTerminating(pod)
|
||||||
|
case _ => ()
|
||||||
|
|
||||||
val instances = instanceRegistry.getAllInstances
|
override def onClose(cause: WatcherException): Unit =
|
||||||
instances.foreach { inst =>
|
if cause != null then
|
||||||
val matchingPod = pods.find { pod =>
|
log.warnf(cause, "Pod watch closed, restarting")
|
||||||
pod.getMetadata.getName.contains(inst.instanceId)
|
startPodWatch()
|
||||||
}
|
)
|
||||||
|
log.info("Pod watch started")
|
||||||
matchingPod match
|
|
||||||
case Some(pod) =>
|
|
||||||
val isReady = isPodReady(pod)
|
|
||||||
if !isReady && inst.state == "HEALTHY" then
|
|
||||||
meterRegistry.counter("nowchess.coordinator.pods.unhealthy").increment()
|
|
||||||
log.warnf("Pod %s not ready, marking instance %s dead", pod.getMetadata.getName, inst.instanceId)
|
|
||||||
instanceRegistry.markInstanceDead(inst.instanceId)
|
|
||||||
deleteK8sPod(inst.instanceId)
|
|
||||||
case None =>
|
|
||||||
log.warnf("No pod found for instance %s, evicting from registry", inst.instanceId)
|
|
||||||
instanceRegistry.removeInstance(inst.instanceId)
|
|
||||||
}
|
|
||||||
catch
|
catch
|
||||||
case ex: Exception =>
|
case ex: Exception => log.warnf(ex, "Failed to start pod watch")
|
||||||
log.warnf(ex, "Failed to watch k8s pods")
|
|
||||||
|
|
||||||
private def isPodReady(pod: Pod): Boolean =
|
private def isPodReady(pod: Pod): Boolean =
|
||||||
Option(pod.getStatus)
|
Option(pod.getStatus)
|
||||||
@@ -164,3 +177,48 @@ class HealthMonitor:
|
|||||||
catch
|
catch
|
||||||
case ex: Exception =>
|
case ex: Exception =>
|
||||||
log.warnf(ex, "Failed to delete pod for instance %s", instanceId)
|
log.warnf(ex, "Failed to delete pod for instance %s", instanceId)
|
||||||
|
|
||||||
|
private def validateStartupInstances(timeoutMs: Long): Unit =
|
||||||
|
Thread.sleep(timeoutMs)
|
||||||
|
instanceRegistry.getAllInstances.foreach { inst =>
|
||||||
|
if !grpcServer.hasActiveStream(inst.instanceId) then
|
||||||
|
log.warnf(
|
||||||
|
"Startup: instance %s did not reconnect within %dms — evicting",
|
||||||
|
inst.instanceId,
|
||||||
|
timeoutMs,
|
||||||
|
)
|
||||||
|
instanceRegistry.removeInstance(inst.instanceId)
|
||||||
|
deleteK8sPod(inst.instanceId)
|
||||||
|
}
|
||||||
|
|
||||||
|
private def handlePodTerminating(pod: Pod): Unit =
|
||||||
|
findRegisteredInstance(pod).foreach { inst =>
|
||||||
|
if inst.state == "HEALTHY" then
|
||||||
|
meterRegistry.counter("nowchess.coordinator.pods.unhealthy").increment()
|
||||||
|
log.warnf(
|
||||||
|
"Pod %s terminating — marking instance %s dead",
|
||||||
|
pod.getMetadata.getName,
|
||||||
|
inst.instanceId,
|
||||||
|
)
|
||||||
|
instanceRegistry.markInstanceDead(inst.instanceId)
|
||||||
|
}
|
||||||
|
|
||||||
|
private def handlePodGone(pod: Pod): Unit =
|
||||||
|
findRegisteredInstance(pod).foreach { inst =>
|
||||||
|
log.warnf(
|
||||||
|
"Pod %s deleted — triggering failover for %s",
|
||||||
|
pod.getMetadata.getName,
|
||||||
|
inst.instanceId,
|
||||||
|
)
|
||||||
|
failoverService
|
||||||
|
.onInstanceStreamDropped(inst.instanceId)
|
||||||
|
.subscribe()
|
||||||
|
.`with`(
|
||||||
|
_ => (),
|
||||||
|
ex => log.warnf(ex, "Failover for %s failed", inst.instanceId),
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
private def findRegisteredInstance(pod: Pod): Option[InstanceMetadata] =
|
||||||
|
val podName = pod.getMetadata.getName
|
||||||
|
instanceRegistry.getAllInstances.find(inst => podName.contains(inst.instanceId))
|
||||||
|
|||||||
+20
-1
@@ -4,6 +4,7 @@ import jakarta.annotation.PostConstruct
|
|||||||
import jakarta.enterprise.context.ApplicationScoped
|
import jakarta.enterprise.context.ApplicationScoped
|
||||||
import jakarta.inject.Inject
|
import jakarta.inject.Inject
|
||||||
import io.quarkus.redis.datasource.ReactiveRedisDataSource
|
import io.quarkus.redis.datasource.ReactiveRedisDataSource
|
||||||
|
import io.quarkus.redis.datasource.RedisDataSource
|
||||||
import scala.jdk.CollectionConverters.*
|
import scala.jdk.CollectionConverters.*
|
||||||
import scala.compiletime.uninitialized
|
import scala.compiletime.uninitialized
|
||||||
import com.fasterxml.jackson.databind.ObjectMapper
|
import com.fasterxml.jackson.databind.ObjectMapper
|
||||||
@@ -19,7 +20,10 @@ class InstanceRegistry:
|
|||||||
// scalafix:off DisableSyntax.var
|
// scalafix:off DisableSyntax.var
|
||||||
@Inject
|
@Inject
|
||||||
private var redis: ReactiveRedisDataSource = uninitialized
|
private var redis: ReactiveRedisDataSource = uninitialized
|
||||||
private var redisPrefix = "nowchess"
|
|
||||||
|
@Inject
|
||||||
|
private var syncRedis: RedisDataSource = uninitialized
|
||||||
|
private var redisPrefix = "nowchess"
|
||||||
|
|
||||||
@Inject
|
@Inject
|
||||||
private var meterRegistry: MeterRegistry = uninitialized
|
private var meterRegistry: MeterRegistry = uninitialized
|
||||||
@@ -42,6 +46,21 @@ class InstanceRegistry:
|
|||||||
def setRedisPrefix(prefix: String): Unit =
|
def setRedisPrefix(prefix: String): Unit =
|
||||||
redisPrefix = prefix
|
redisPrefix = prefix
|
||||||
|
|
||||||
|
def loadAllFromRedis(): Unit =
|
||||||
|
val keys = syncRedis.key(classOf[String]).keys(s"$redisPrefix:instances:*")
|
||||||
|
keys.asScala.foreach { key =>
|
||||||
|
val instanceId = key.stripPrefix(s"$redisPrefix:instances:")
|
||||||
|
val json = syncRedis.value(classOf[String]).get(key)
|
||||||
|
if json != null then
|
||||||
|
try
|
||||||
|
val metadata = mapper.readValue(json, classOf[InstanceMetadata])
|
||||||
|
instances.put(instanceId, metadata)
|
||||||
|
log.infof("Startup: loaded instance %s from Redis", instanceId)
|
||||||
|
catch
|
||||||
|
case ex: Exception =>
|
||||||
|
log.warnf(ex, "Startup: failed to parse instance %s", instanceId)
|
||||||
|
}
|
||||||
|
|
||||||
def getInstance(instanceId: String): Option[InstanceMetadata] =
|
def getInstance(instanceId: String): Option[InstanceMetadata] =
|
||||||
Option(instances.get(instanceId))
|
Option(instances.get(instanceId))
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user