NowChessSystems

Author	SHA1	Message	Date
Janis Eccarius	9d656624d8	fix(official-bots): stream NNUE features as sparse indices to stop host OOM Densifying the 98304-dim HalfKP vector per item filled host RAM and crashed the Colab runtime even at small batch sizes. The dataset now yields only the ~64 active feature indices; a custom collate carries (row, col) pairs and the training loop scatters them into a dense [B, INPUT_SIZE] tensor on the GPU. Host RAM stays tiny; GPU holds one dense batch transiently. - NNUEDataset.__getitem__ returns indices via new fen_to_indices. - fen_to_features now derives from fen_to_indices (kept for external callers). - _collate_sparse builds row/col index batches; loaders use it. - train/val loops scatter to a GPU dense batch; loss weighting uses batch size. - Notebook: BATCH_SIZE 4096 -> 8192 (host no longer the limit; GPU is). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 22:28:53 +02:00
Janis Eccarius	e2b4342f60	fix(official-bots): prevent Colab OOM in NNUE training Dense 98304-dim HalfKP features at batch_size=16384 cost ~6.4 GB/batch on the host; with 8 hardcoded DataLoader workers and prefetch this OOM-killed the Colab runtime. - train.py: adaptive DataLoader workers (min(4, cpu_count), Colab free tier = 2), overridable via NNUE_LOADER_WORKERS; persistent_workers only when > 0. - NNUETraining.ipynb: lower BATCH_SIZE 16384 -> 4096 with a memory-cost note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 22:18:18 +02:00
Janis Eccarius	1c80abdb8a	feat(official-bots): standalone self-play + one-shot dataset builder for NNUE training Add an easy local data pipeline feeding GPU training on Colab. - SelfPlayMain: standalone NNUEBot self-play (no microservices) writing FENs for labeling; randomised openings for game diversity, sequential due to the shared EvaluationNNUE accumulator. Exposed via the `selfPlay` Gradle task and selfplay.sh. - NNUEBot: optional fixedMoveTimeMs so self-play runs fast (default unchanged). - NbaiLoader: honor `-Dnnue.weights=<path>` to load weights from a file before falling back to the bundled resource. - build_dataset.py / dataset.sh: one command builds the entire dataset (Lichess eval-DB backbone + self-play + tactical + random filler), dedups, balances the eval histogram, writes append-only zstd shards + manifest, and rclone-pushes to Drive. - train.py: NNUEDataset reads a directory of .jsonl.zst shards (streaming) in addition to a single file. - NNUETraining.ipynb: clone to ephemeral /content, sync shards from Drive (cache-aware), train on the shards dir; removed Colab generation/upload steps. - Concept + implementation plan docs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 22:04:22 +02:00
Janis Eccarius	9f9140cb58	fix: modified training pipeline	2026-06-24 19:37:26 +02:00
Janis	fa10852bc9	feat(official-bots): add Google Colab notebook for NNUE training (NCS-111) (#81) Adds python/NNUETraining.ipynb with five sections: - Setup: mount Drive, clone/update repo, install deps + Stockfish - Data: Option A (generate + label) or Option B (upload existing labeled.jsonl) - Train: standard epoch loop or burst mode (recommended for Colab free tier) - Export: convert best .pt checkpoint to .nbai via export.py - Download: pull .nbai and .pt to local machine via files.download Checkpoints and datasets are persisted to Google Drive so training survives session disconnects and can be resumed automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com> Reviewed-on: #81	2026-06-24 19:33:24 +02:00
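
The sparse-index approach described in commit 9d656624d8 can be sketched roughly as follows. This is not the repository's code: `collate_sparse` and `scatter_dense` are hypothetical names, the sample layout `(indices, target)` is an assumption, and plain Python lists stand in for tensors so the sketch runs without PyTorch. It only illustrates the idea of carrying (row, col) pairs through collation and densifying one batch at a time.

```python
from typing import List, Sequence, Tuple

INPUT_SIZE = 98304  # HalfKP width quoted in the commit message


def collate_sparse(batch: Sequence[Tuple[List[int], float]]):
    """Flatten per-position active indices into parallel row/col lists.

    Each sample is assumed to be (active_feature_indices, target).
    """
    rows: List[int] = []
    cols: List[int] = []
    targets: List[float] = []
    for row, (indices, target) in enumerate(batch):
        rows.extend([row] * len(indices))  # which sample each index belongs to
        cols.extend(indices)               # which feature is active
        targets.append(target)
    return rows, cols, targets


def scatter_dense(rows, cols, batch_size, input_size=INPUT_SIZE):
    """Build the dense [batch_size, input_size] 0/1 matrix from (row, col) pairs.

    In a training loop this step would run on the GPU, so the dense
    batch never has to exist in host memory.
    """
    dense = [[0.0] * input_size for _ in range(batch_size)]
    for r, c in zip(rows, cols):
        dense[r][c] = 1.0
    return dense


# Tiny usage example with an 8-wide feature space instead of 98304.
batch = [([3, 7], 0.25), ([0], -1.0)]
rows, cols, targets = collate_sparse(batch)
dense = scatter_dense(rows, cols, len(batch), input_size=8)
```

With roughly 64 active indices per position, the collated batch holds on the order of 64 × B integers rather than 98304 × B floats, which is why the host side stays small.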

5 Commits
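
The adaptive worker selection described in commit e2b4342f60 might look something like the sketch below. The function names `loader_workers` and `loader_kwargs` are hypothetical, and the handling of an invalid or negative override is an assumption. Only the `min(4, cpu_count)` rule, the `NNUE_LOADER_WORKERS` override, and the "persistent_workers only when > 0" condition come from the commit message.

```python
import os
from typing import Mapping, Optional


def loader_workers(env: Optional[Mapping[str, str]] = None,
                   cpu_count: Optional[int] = None) -> int:
    """Pick a DataLoader worker count: env override first, else min(4, CPUs)."""
    env = os.environ if env is None else env
    override = env.get("NNUE_LOADER_WORKERS")
    if override is not None:
        # Assumption: negative values are clamped to 0 (load in the main process).
        return max(0, int(override))
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(4, cpus)


def loader_kwargs(workers: int) -> dict:
    """Keyword arguments for a DataLoader; persistent_workers only if workers > 0."""
    kwargs = {"num_workers": workers}
    if workers > 0:
        kwargs["persistent_workers"] = True
    return kwargs


# Usage: a 2-CPU machine (the Colab free tier figure from the commit) gets 2 workers.
colab_kwargs = loader_kwargs(loader_workers(env={}, cpu_count=2))
```

Leaving `persistent_workers` out when the count is zero matters because PyTorch's `DataLoader` rejects `persistent_workers=True` together with `num_workers=0`.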