Commit Graph

340 Commits

Author SHA1 Message Date
Janis 6e37a7d209 fix(bot): drop game-stream play, poll with low delay
Build & Test (NowChessSystems) TeamCity build finished
The native JAX-RS client buffers the NDJSON game stream, so the read
blocked forever and the bot never moved. Remove streamGameLoop and play
purely via polling at a 150ms interval; clock-aware budgets and the
server-clock read are unchanged.
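
The poll-only loop described above could look roughly like the following. This is a minimal Python illustration, not the project's Scala code; the state shape and the `fetch_state`, `choose_move` and `submit_move` callables are hypothetical stand-ins, and only the 150ms interval comes from the commit message.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 0.150  # the 150ms interval named in the commit message


def play_by_polling(
    fetch_state: Callable[[], dict],
    choose_move: Callable[[dict], str],
    submit_move: Callable[[str], None],
    my_color: str,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the game state until the game ends; return the number of moves sent.

    The state shape {"finished": bool, "turn": "white" | "black"} is a
    hypothetical stand-in for the real game-state response.
    """
    submitted = 0
    while True:
        state = fetch_state()
        if state["finished"]:
            return submitted
        if state["turn"] == my_color:
            # Our turn: compute a move and send it, then keep polling.
            submit_move(choose_move(state))
            submitted += 1
        sleep(POLL_INTERVAL_S)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.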

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 00:56:32 +02:00
TeamCity eeae4f01b4 ci: bump version with Build-161 2026-06-30 20:29:13 +00:00
Janis 45b5719d63 feat(bot): clock-aware time management and stream-driven tournament play
Tournament bots flagged in 5+3 classical: budgets were fixed and
clock-blind, HybridBot's veto re-search double-spent (up to 4s/move),
and the game loop polled every 1s, burning our clock waiting on the
opponent.

- Bot is now a trait taking a TimeControl (remaining + increment);
  apply(ctx) defaults to Unlimited so local/self-play/tests keep their
  fixed budgets.
- TimeControl.budget derives a per-move budget from the real clock with
  an overhead reserve, a panic mode under 20s, and a hard ceiling, so a
  bot can no longer flag from thinking.
- HybridBot splits one budget across main (0.7) and veto (0.3) searches
  instead of running two full searches.
- TournamentBotGamePlayer reads the server clock (seconds -> ms) and
  plays stream-driven via GET /game/{id}/stream (NDJSON, heartbeat-kept),
  so the opponent's move arrives instantly; polling stays as a fallback.
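
A rough Python sketch of the budgeting idea in the list above (the project itself is Scala). Only the 20s panic threshold and the 0.7/0.3 split come from the commit message; the overhead reserve, the moves-to-go divisors, the ceiling and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

PANIC_BELOW_MS = 20_000   # from the commit message: panic mode under 20s
OVERHEAD_MS = 200         # assumed reserve for network and bookkeeping
MOVES_TO_GO = 30          # assumed planning horizon in normal mode
PANIC_MOVES_TO_GO = 60    # assumed, more conservative horizon in panic mode
CEILING_MS = 10_000       # assumed hard ceiling per move


@dataclass(frozen=True)
class TimeControl:
    remaining_ms: int
    increment_ms: int

    def budget_ms(self) -> int:
        """Per-move budget derived from the real clock."""
        usable = max(0, self.remaining_ms - OVERHEAD_MS)
        in_panic = self.remaining_ms < PANIC_BELOW_MS
        horizon = PANIC_MOVES_TO_GO if in_panic else MOVES_TO_GO
        raw = usable // horizon + self.increment_ms // 2
        # Never exceed the ceiling or the usable clock.
        return min(raw, CEILING_MS, usable)


def split_budget(total_ms: int) -> Tuple[int, int]:
    """Split one budget 0.7 / 0.3 between the main and the veto search."""
    main = total_ms * 7 // 10
    return main, total_ms - main
```

Because the result is capped by `usable`, thinking time alone can never use up the clock in this sketch.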

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 22:12:55 +02:00
Janis b2683a7f5a feat(bot): implement bot-vs-bot harness for NNUE evaluation 2026-06-30 22:12:53 +02:00
Janis 3437dab49b feat(bot): add Lazy SMP parallel search for the NNUE bot
Adds optional multithreaded search behind a thread count that defaults to
1, so the live bot's play is unchanged until explicitly configured.

- ParallelSearch runs N AlphaBetaSearch workers over one shared,
  already-lock-protected TranspositionTable. Each worker has its own NNUE
  evaluator (independent accumulator) and ordering state; helpers only
  deepen the shared TT, the main worker's move is returned.
- AlphaBetaSearch gains bestMoveWithTimeSharedTt: the coordinator clears
  the shared TT once before launching workers, so helpers must not clear.
- EvaluationNNUE.freshEvaluator builds independent evaluators sharing the
  immutable weights (one per thread); the singleton still backs the
  default single-instance path.
- NNUEBot uses ParallelSearch with NNUE_SEARCH_THREADS (default 1).

numThreads <= 1 takes the single-worker clearing path, identical to the
previous sequential search. Strength can be validated by self-play
(threads N vs 1) before promoting the default.
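
The coordinator/helper structure described above can be illustrated in Python as follows. This shows the control flow only (Python threads will not give a real search speedup), and `SharedTable`, `parallel_search` and the `search` callable are hypothetical names, not the project's Scala API.

```python
import threading
from typing import Callable, Dict, Optional, Tuple


class SharedTable:
    """Lock-protected table shared by all workers (stand-in for the TT)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[int, Tuple[int, str]] = {}

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()

    def store(self, key: int, depth: int, move: str) -> None:
        with self._lock:
            old = self._entries.get(key)
            if old is None or depth >= old[0]:  # keep the deeper result
                self._entries[key] = (depth, move)

    def probe(self, key: int) -> Optional[Tuple[int, str]]:
        with self._lock:
            return self._entries.get(key)


# (table, worker_id) -> best move found by that worker
SearchFn = Callable[[SharedTable, int], str]


def parallel_search(table: SharedTable, search: SearchFn, num_threads: int = 1) -> str:
    table.clear()  # the coordinator clears once; workers must not clear
    helpers = [
        threading.Thread(target=search, args=(table, wid), daemon=True)
        for wid in range(1, max(1, num_threads))
    ]
    for helper in helpers:
        helper.start()
    best = search(table, 0)  # only the main worker's move is returned
    for helper in helpers:
        helper.join()
    return best
```

With `num_threads <= 1` no helper is started, so the call reduces to a single sequential search.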

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 22:12:52 +02:00
Janis b72e8ec017 refactor(bot): split NNUE into shared weights and per-thread evaluator
Prerequisite for parallel search. NNUE held all state on one instance:
the immutable transposed L1 weight matrix alongside the mutable
accumulator stack, scratch buffers and eval cache. That made concurrent
eval calls corrupt shared buffers.

Extract the read-only parameters into NNUEWeights (heavy to build, safe
to share). NNUE now owns only per-instance mutable buffers and references
the shared weights, so many evaluators can run in parallel over one weight
matrix without duplicating it. Single-instance behaviour is unchanged —
EvaluationNNUE still uses one evaluator, so play is identical.

Also applies scalafmt alignment to the MopUp files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 22:12:50 +02:00
TeamCity 7136803c7e ci: bump version with Build-160 2026-06-30 10:21:06 +00:00
Janis faf7eb38ea fix(bot): seed search with game history, add contempt and NNUE mop-up
Repetition: alpha-beta seeded the repetition map with only the root
position, so search was blind to positions already reached in the real
game and would happily shuffle into draws when ahead. Reconstruct the
full game-history position hashes by replaying moves and seed the search
state with them; treat a twofold occurrence at non-root nodes as a draw.
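
As an illustration of the seeding idea (Python, not the project's Scala code; the function names and the counter representation are assumptions):

```python
from collections import Counter
from typing import Iterable


def seed_repetitions(history_hashes: Iterable[int]) -> Counter:
    """Count every position hash already reached in the real game."""
    return Counter(history_hashes)


def is_repetition_draw(seen: Counter, position_hash: int, ply: int) -> bool:
    """True if this node repeats an earlier position.

    `seen` holds earlier occurrences only (game history plus the current
    search path), so one earlier occurrence makes the node a twofold one.
    The root (ply 0) is never scored as a draw here.
    """
    if ply == 0:
        return False
    return seen[position_hash] >= 1
```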

Contempt: draws are now scored CONTEMPT (25cp) away from zero, signed by
ply parity, so the bot avoids dead-equal repetitions instead of settling.
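
One way to read "signed by ply parity", sketched in Python under the assumption of a negamax search where each score is from the side to move's point of view and the bot moves at even plies. Only the 25cp value comes from the commit message.

```python
CONTEMPT_CP = 25  # from the commit message


def draw_score(ply: int) -> int:
    """Score of a draw for the side to move at `ply` (assumed negamax convention).

    Even ply: the bot is to move and sees the draw as slightly bad.
    Odd ply: the opponent is to move and sees it as slightly good, which
    negates to the same small penalty for the bot one ply up.
    """
    return -CONTEMPT_CP if ply % 2 == 0 else CONTEMPT_CP
```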

Endgame: pure NNUE lacks mating knowledge and stalls KX-vs-K conversions.
Add a MopUp correction (edge-driving + king-proximity) applied only in
lone-king endgames with sufficient mating material; zero elsewhere so
middlegame NNUE output is untouched.
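
A small Python sketch of an edge-driving plus king-proximity term of the kind described above. The weights, the 0..63 square numbering (a1 = 0, h8 = 63) and the function names are assumptions; the real MopUp code may differ.

```python
EDGE_WEIGHT = 10       # hypothetical weight for driving the lone king outward
PROXIMITY_WEIGHT = 4   # hypothetical weight for bringing the kings together


def center_distance(square: int) -> int:
    """Manhattan distance to the central squares: 0 in the centre, 6 in a corner."""
    file, rank = square % 8, square // 8
    return max(3 - file, file - 4) + max(3 - rank, rank - 4)


def king_distance(a: int, b: int) -> int:
    """Manhattan distance between two squares."""
    return abs(a % 8 - b % 8) + abs(a // 8 - b // 8)


def mop_up(winner_king: int, loser_king: int, lone_king_endgame: bool) -> int:
    """Bonus for the winning side; zero outside lone-king endgames."""
    if not lone_king_endgame:
        return 0
    edge = EDGE_WEIGHT * center_distance(loser_king)
    proximity = PROXIMITY_WEIGHT * (14 - king_distance(winner_king, loser_king))
    return edge + proximity
```

The bonus grows as the lone king is pushed toward a corner and as the winning king approaches, which is the gradient a pure evaluator may lack in these endgames.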

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 12:04:27 +02:00
TeamCity 344bed6935 ci: bump version with Build-159 2026-06-29 17:34:14 +00:00
Janis Eccarius 4938560014 fix(bot): include quiet promotions in quiescence search
Quiescence tactical filter only flagged capture-promotions, so a quiet
queening on an empty back-rank square was treated as non-tactical and
skipped at the search horizon. A bot could therefore miss a winning
promotion sitting exactly at the horizon and play another move. All bots
(Classical/NNUE/Hybrid) share AlphaBetaSearch and were affected.

Treat every promotion as tactical so quiescence always expands it.
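
The change can be pictured with a small Python sketch. The `Move` type and both predicates are hypothetical; the "before" predicate is an assumed shape of the old filter, inferred from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Move:
    uci: str
    is_capture: bool
    promotion: Optional[str] = None  # e.g. "q" for queening


def is_tactical_before(move: Move) -> bool:
    # Assumed old behaviour: a promotion only counted when it also captured.
    return move.is_capture


def is_tactical_after(move: Move) -> bool:
    # Every promotion is tactical, so quiescence always expands it.
    return move.is_capture or move.promotion is not None
```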

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 19:18:31 +02:00
TeamCity 4a397eed7f ci: bump version with Build-158 2026-06-24 20:41:25 +00:00
Janis Eccarius 9d656624d8 fix(official-bots): stream NNUE features as sparse indices to stop host OOM
Densifying the 98304-dim HalfKP vector per item filled host RAM and crashed the
Colab runtime even at small batch sizes. The dataset now yields only the ~64
active feature indices; a custom collate carries (row, col) pairs and the
training loop scatters them into a dense [B, INPUT_SIZE] tensor on the GPU. Host
RAM stays tiny; GPU holds one dense batch transiently.

- NNUEDataset.__getitem__ returns indices via new fen_to_indices.
- fen_to_features now derives from fen_to_indices (kept for external callers).
- _collate_sparse builds row/col index batches; loaders use it.
- train/val loops scatter to a GPU dense batch; loss weighting uses batch size.
- Notebook: BATCH_SIZE 4096 -> 8192 (host no longer the limit; GPU is).
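
The collate-then-scatter idea can be sketched without any tensor library. This is an illustration with plain lists, not the code in train.py; in the real pipeline the dense batch would be a tensor built on the GPU, and the function names below are assumptions.

```python
from typing import List, Sequence, Tuple

INPUT_SIZE = 98304  # HalfKP input width from the commit message


def collate_sparse(batch: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Turn per-item active indices into parallel (row, col) index lists."""
    rows: List[int] = []
    cols: List[int] = []
    for row, indices in enumerate(batch):
        rows.extend([row] * len(indices))
        cols.extend(indices)
    return rows, cols


def scatter_dense(
    rows: Sequence[int],
    cols: Sequence[int],
    batch_size: int,
    input_size: int = INPUT_SIZE,
) -> List[List[float]]:
    """Materialise one dense [batch_size, input_size] batch from the pairs."""
    dense = [[0.0] * input_size for _ in range(batch_size)]
    for r, c in zip(rows, cols):
        dense[r][c] = 1.0
    return dense
```

Only the index lists travel through the data loader; the dense batch exists briefly at training time.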

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 22:28:53 +02:00
Janis Eccarius e2b4342f60 fix(official-bots): prevent Colab OOM in NNUE training
Dense 98304-dim HalfKP features at batch_size=16384 cost ~6.4 GB/batch on the
host; with 8 hardcoded DataLoader workers and prefetch this OOM-killed the Colab
runtime.

- train.py: adaptive DataLoader workers (min(4, cpu_count), Colab free tier = 2),
  overridable via NNUE_LOADER_WORKERS; persistent_workers only when > 0.
- NNUETraining.ipynb: lower BATCH_SIZE 16384 -> 4096 with a memory-cost note.
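
The ~6.4 GB figure is consistent with 4-byte (float32) features, which is an assumption here: 98304 x 16384 x 4 bytes = 6,442,450,944 bytes. A short sketch of that arithmetic and of the worker selection described above; the function names are hypothetical, and only `NNUE_LOADER_WORKERS` and `min(4, cpu_count)` come from the commit message.

```python
import os
from typing import Mapping, Optional


def dense_batch_bytes(batch_size: int, input_size: int = 98304, bytes_per_value: int = 4) -> int:
    """Host memory for one dense batch, assuming float32 values."""
    return batch_size * input_size * bytes_per_value


def loader_workers(env: Optional[Mapping[str, str]] = None, cpu_count: Optional[int] = None) -> int:
    """Pick the DataLoader worker count: env override first, else min(4, CPUs)."""
    env = os.environ if env is None else env
    override = env.get("NNUE_LOADER_WORKERS")
    if override is not None:
        return max(0, int(override))
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(4, cpus)
```

At the lowered batch size of 4096 the same formula gives a quarter of the memory per batch.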

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 22:18:18 +02:00
TeamCity 9d56446c65 ci: bump version with Build-156 2026-06-24 20:17:17 +00:00
Janis Eccarius 1c80abdb8a feat(official-bots): standalone self-play + one-shot dataset builder for NNUE training
Add an easy local data pipeline feeding GPU training on Colab.

- SelfPlayMain: standalone NNUEBot self-play (no microservices) writing FENs
  for labeling; randomised openings for game diversity, sequential due to the
  shared EvaluationNNUE accumulator. Exposed via the `selfPlay` Gradle task and
  selfplay.sh.
- NNUEBot: optional fixedMoveTimeMs so self-play runs fast (default unchanged).
- NbaiLoader: honor `-Dnnue.weights=<path>` to load weights from a file before
  falling back to the bundled resource.
- build_dataset.py / dataset.sh: one command builds the entire dataset
  (Lichess eval-DB backbone + self-play + tactical + random filler), dedups,
  balances the eval histogram, writes append-only zstd shards + manifest, and
  rclone-pushes to Drive.
- train.py: NNUEDataset reads a directory of .jsonl.zst shards (streaming) in
  addition to a single file.
- NNUETraining.ipynb: clone to ephemeral /content, sync shards from Drive
  (cache-aware), train on the shards dir; removed Colab generation/upload steps.
- Concept + implementation plan docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 22:04:22 +02:00
TeamCity c8cbcdca3b ci: bump version with Build-155 2026-06-24 18:21:11 +00:00
Janis e4fee85134 feat(ncs-110): feed NNUE root-move scores into search move ordering (#83)
Pre-evaluated NNUE scores from NNUEBot.batchEvaluateRoot are now passed
as root hints into AlphaBetaSearch, improving move ordering at ply 0 before
the TT is populated. Hints are threaded immutably through SearchParams to
satisfy the no-var constraint.
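
Ordering root moves by pre-computed hints could be sketched like this in Python (the project is Scala; the function name and the move-to-score mapping are assumptions). The input is not mutated, in the spirit of the immutable threading mentioned above.

```python
from typing import List, Mapping, Sequence


def order_root_moves(moves: Sequence[str], hints: Mapping[str, int]) -> List[str]:
    """Return a new list with hinted moves first, best score first.

    sorted() is stable, so moves without a hint keep their original
    relative order behind the hinted ones.
    """
    return sorted(moves, key=lambda m: hints.get(m, float("-inf")), reverse=True)
```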

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #83
2026-06-24 20:09:28 +02:00
TeamCity b4709b4a33 ci: bump version with Build-154 2026-06-24 17:55:44 +00:00
Janis Eccarius 9f9140cb58 fix: modified training pipeline
2026-06-24 19:37:26 +02:00
Janis fa10852bc9 feat(official-bots): add Google Colab notebook for NNUE training (NCS-111) (#81)
Adds python/NNUETraining.ipynb with five sections:
- Setup: mount Drive, clone/update repo, install deps + Stockfish
- Data: Option A (generate + label) or Option B (upload existing labeled.jsonl)
- Train: standard epoch loop or burst mode (recommended for Colab free tier)
- Export: convert best .pt checkpoint to .nbai via export.py
- Download: pull .nbai and .pt to local machine via files.download

Checkpoints and datasets are persisted to Google Drive so training
survives session disconnects and can be resumed automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #81
2026-06-24 19:33:24 +02:00
Janis 44f376f032 feat(official-bots): implement king-relative (HalfKP) encoding in NNUE (NCS-109) (#80)
Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #80
2026-06-24 19:33:12 +02:00
TeamCity 7372867a82 ci: bump version with Build-152 2026-06-23 22:30:53 +00:00
Janis Eccarius c3e7b82ae8 feat(analytics): add accuracy and blunder analysis job for Lichess data
2026-06-24 00:21:40 +02:00
TeamCity e88b081947 ci: bump version with Build-151 2026-06-23 21:54:06 +00:00
Janis Eccarius 1b30c3be39 fix(official-bots): use ThreadLocalRandom in PolyglotBook for native image
A stored java.util.Random field is reachable from BotController's static
openingBook, so GraalVM baked it into the image heap and aborted the
native build (Random in image heap has a cached seed). Use
ThreadLocalRandom.current() at call time instead — no stored instance,
nothing in the image heap, still thread-safe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:42:15 +02:00
Janis Eccarius f8ca95af3c refactor(official-bots): use java.util.Random in PolyglotBook
scala.util.Random delegates to a shared global java.util.Random, a
contention point across concurrent bot games. Use a per-book
java.util.Random instance instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:34:38 +02:00
TeamCity 4a50db0721 ci: bump version with Build-150 2026-06-23 21:27:19 +00:00
Janis Eccarius 260db25803 feat(official-bots): activate opening book in expert bot (native-safe)
Load the Polyglot opening book as a classpath resource and wire it into
the expert HybridBot. Previously the bot supported Option[PolyglotBook]
but BotController passed None, so the book was never used.

PolyglotBook.fromResource reads via getResourceAsStream so the book is
embedded in the GraalVM native image instead of read from the filesystem
(FileInputStream) — no file needs mounting into the pod. The filesystem
apply(path) factory is kept for tests. Moved codekiddy.bin into
resources as opening_book.bin. Dropped the per-probe debug println.

NCS-43

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:17:52 +02:00
TeamCity 80e1cc258b ci: bump version with Build-149 2026-06-23 21:08:35 +00:00
Janis bfc46723e6 fix(official-bots): derive tournament game color from game endpoint (#79)
Tournament-server reports wrong color in pairings (everyone white), so
auto-joined games could play with an inverted color and never move on
their real turn. The game endpoint white/black ids are correct, so the
poll loop now derives our color from it, falling back to the passed-in
color. Both stream and auto-join entry paths are now immune to the bug.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #79
2026-06-23 22:58:09 +02:00
TeamCity bace029a8a ci: bump version with Build-148 2026-06-23 20:31:27 +00:00
Janis Eccarius 7664042193 fix(tournament): mirror bot join onto native twin
The UI reads participant/standings fields from the native-server twin
(nativeOverlay), but bot join only wrote the NowChess participant list,
so bots never appeared in replicated/native-published tournaments. On
join, register the bot on the native server by name and join the twin
as that bot. Also run this for the AlreadyJoined case so bots stuck in
the NowChess list (but missing on native) get reconciled, and return
200 instead of 409 for it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 22:20:56 +02:00
TeamCity a604b4ad42 ci: bump version with Build-147 2026-06-23 19:33:36 +00:00
Janis Eccarius fdf4c94811 fix(official-bots): resolve per-difficulty bot token on tournament join
joinTournament only ever had a token for the startup difficulty
(default medium); other difficulties fell back to the single shared
TOURNAMENT_BOT_TOKEN, which our tournament server rejects (401),
surfacing as 400 "Failed to join tournament" in the UI. Resolve and
cache a token for the requested difficulty instead.

Prefer the account-service token over anonymous register in
resolveToken so the bot joins as its canonical identity rather than a
throwaway account (medium joined but never appeared as a participant).

Add NativeReflectionConfig for JoinTournamentRequest/Response so the
success path serializes in native image instead of returning an empty
200 body.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:23:41 +02:00
TeamCity d9f30f0bfe ci: bump version with Build-146 2026-06-23 18:45:05 +00:00
Janis 1f4e9c8498 fix(tournament): sync native-server participants and route start (#78)
Bots joining a published tournament directly on the native server were not
reflected in NowChess (0 players) and the tournament could not be started,
because create() kept a local copy plus a separate native copy whose id was
discarded — leaving the two records disconnected.

- Capture the native tournament id: createNative/publishNative now return the
  id instead of Boolean; persist it on Tournament.nativeTournamentId.
- Reverse-sync on read: get()/list() overlay nbPlayers/standing/status/round/
  winner from the native twin (with a fullName backfill for tournaments created
  before the id was captured).
- start(): proxy to the native twin (director token via authFor) so the native
  participants are used; mirror the started status locally.
- Skip the native server in the replicate loop (it has no /replicate endpoint),
  removing the per-create "Failed to replicate" warning.
- Isolate native integration in tournament unit tests (native-server-url no
  longer defaults to the live server).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #78
2026-06-23 20:34:30 +02:00
TeamCity e2b13c0c8f ci: bump version with Build-145 2026-06-23 13:18:03 +00:00
Janis bfb15c7299 fix(official-bots): play games by polling state instead of NDJSON stream
In the native image the JAX-RS client buffers streaming responses, so reading
the NDJSON game stream blocks forever — the bot discovered its game ("Playing
game …") but never saw its turn and never moved, with no error. Replace the
game-stream consumer with a poll loop over plain GET game-state calls (which
work natively): when it is our turn, compute and submit. Drop the now-unused
stream consumer, move helper, and game-stream opener. Auto-join no longer
spawns per-tournament event-stream threads; polling handles discovery + play.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:09:39 +02:00
TeamCity 627f017cdc ci: bump version with Build-144 2026-06-23 12:49:17 +00:00
Janis 10113fd057 fix(official-bots): discover tournament games by polling, not just the stream
The tournament-server does not replay gameStart to late subscribers — a
subscriber that connects after a game activates receives only heartbeats.
The bot relied solely on live gameStart events, so any reconnect or restart
after activation left it blind and it never played (games recorded with no
moves, losing on both colors).

Now each scan polls every joined tournament's current-round pairings, finds
the bot's own non-finished game and color, and starts playing it. The game
stream still drives moves once a game is discovered. Verified end-to-end
against the live server.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:40:50 +02:00
TeamCity b57e5827df ci: bump version with Build-143 2026-06-23 11:55:27 +00:00
Janis b98bdd2a64 fix(tournament): use HS256 director token for native tournament-server calls
The tournament-server only accepts HS256 tokens it issued; forwarding a
NowChessSystems RS256 user token caused "unsupported algorithm". Proxied
calls (start/join/withdraw/stream) targeting the native server now swap in
the director token registered on that server.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 13:47:13 +02:00
Janis 285b73efbd fix(official-bots): resume tournaments already joined after restart
A 409 on join means the bot is already a participant (in-memory join set is
empty after a pod restart). Treat 409 as success and start playing instead of
dropping the tournament and spamming errors every scan.
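
A minimal Python sketch of the status handling, with a hypothetical function name and an in-memory set standing in for the bot's join set:

```python
from typing import Set


def handle_join_response(tournament_id: str, status: int, joined: Set[str]) -> bool:
    """Record the tournament as joined on any 2xx or on 409 (already a participant).

    Returns True when the bot should start (or resume) playing.
    """
    if 200 <= status < 300 or status == 409:
        joined.add(tournament_id)
        return True
    return False
```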

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 13:46:57 +02:00
TeamCity 06f2adfeb6 ci: bump version with Build-142 2026-06-23 08:52:12 +00:00
Janis 4651bb796f fix(official-bots): play only own tournament games with correct color
The tournament stream broadcasts a gameStart per color for every pairing in
the round, without a player id. The bot latched the first color it saw and
played games it was not part of, submitting moves for the wrong color that
the server rejected. Now it fetches game detail and matches its botId against
white/black to resolve its real color, skipping games it is not in.
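
The matching step could look like the following Python sketch. The `white` and `black` field names are assumptions about the game-detail response, and the function name is hypothetical.

```python
from typing import Mapping, Optional


def resolve_color(game: Mapping[str, str], bot_id: str) -> Optional[str]:
    """Return the bot's color in this game, or None if it is not a player."""
    if game.get("white") == bot_id:
        return "white"
    if game.get("black") == bot_id:
        return "black"
    return None  # not our game: skip it
```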

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 10:37:28 +02:00
Janis 1df29cf3a6 feat(official-bots): make HybridBot veto actionable and use it for expert
When classical and NNUE evals diverge above the veto threshold, HybridBot
now re-searches excluding the suspect move and switches to NNUE's preferred
alternative instead of merely logging. BotController maps the expert bot to
HybridBot so tournament auto-join uses it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 10:30:55 +02:00
TeamCity ff492e1dc8 ci: bump version with Build-140 2026-06-23 08:10:24 +00:00
Janis 9978b7ea78 feat(tournament): auto-join external tournaments and publish created ones (#77)
Official bots now poll the external tournament server and auto-join every
created tournament with the hardest bot (expert). Tournaments created in
NowChessSystems are forwarded to the native tournament server so the bots
can see and join them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Reviewed-on: #77
2026-06-23 10:01:35 +02:00
TeamCity 9a9784673f ci: bump version with Build-139 2026-06-22 20:44:36 +00:00
Janis Eccarius 83dd2d4335 fix(official-bots): prioritize Redis token over stale env var in joinTournament
The env var TOURNAMENT_BOT_TOKEN was checked before Redis, so a stale
token set in the k8s secret always won over the freshly-registered token
stored in Redis at startup. Swap order: request param → Redis → env var.
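
The new precedence can be sketched as a first-non-empty lookup. This is a Python illustration with a hypothetical function name; the three sources are passed in as already-fetched values.

```python
from typing import Optional


def resolve_token(
    request_token: Optional[str],
    redis_token: Optional[str],
    env_token: Optional[str],
) -> Optional[str]:
    """Pick the first non-empty token: request param, then Redis, then env var."""
    for candidate in (request_token, redis_token, env_token):
        if candidate:
            return candidate
    return None
```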

Also add WARN-level logging when registerWithServer fails (non-2xx or
exception), making the failure visible in the log stream since INFO is
filtered in production.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-22 22:22:28 +02:00