Add a one-command local data pipeline that feeds GPU training on Colab.
- SelfPlayMain: standalone NNUEBot self-play (no microservices) that writes FENs
for labeling. Openings are randomised for game diversity; games run sequentially
because the EvaluationNNUE accumulator is shared. Exposed via the `selfPlay`
Gradle task and selfplay.sh.
- NNUEBot: optional fixedMoveTimeMs so self-play runs fast (default unchanged).
- NbaiLoader: honor `-Dnnue.weights=<path>` to load weights from a file before
falling back to the bundled resource.
- build_dataset.py / dataset.sh: one command builds the entire dataset
(Lichess eval-DB backbone + self-play + tactical + random filler), dedups,
balances the eval histogram, writes append-only zstd shards + manifest, and
rclone-pushes to Drive.
- train.py: NNUEDataset now streams a directory of .jsonl.zst shards as well as
reading a single file.
- NNUETraining.ipynb: clone to ephemeral /content, sync shards from Drive
(cache-aware), train on the shards dir; removed Colab generation/upload steps.
- Concept + implementation plan docs.
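The dedup and eval-histogram balancing step in build_dataset.py can be pictured roughly as below. This is a minimal sketch, not the script's actual code: the record shape (a `fen` string plus a centipawn `eval`), the FEN-prefix dedup key, the bucket width and cap, and the names `fen_key`, `dedup` and `balance` are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set


def fen_key(fen: str) -> str:
    # Keep placement, side to move, castling and en-passant fields only, so the
    # same position reached at a different move number counts as a duplicate.
    return " ".join(fen.split()[:4])


def dedup(records: Iterable[dict]) -> Iterator[dict]:
    """Yield each position once, keeping its first occurrence."""
    seen: Set[str] = set()
    for rec in records:
        key = fen_key(rec["fen"])
        if key not in seen:
            seen.add(key)
            yield rec


def balance(records: Iterable[dict], bin_cp: int = 100, clamp_cp: int = 1000,
            max_per_bin: int = 1000, seed: int = 0) -> List[dict]:
    """Cap the number of positions per eval bucket to flatten the histogram."""
    bins: Dict[int, List[dict]] = defaultdict(list)
    for rec in records:
        # Evals beyond +/-clamp_cp are folded into the outermost buckets.
        ev = max(-clamp_cp, min(clamp_cp, rec["eval"]))
        bins[ev // bin_cp].append(rec)
    rng = random.Random(seed)
    out: List[dict] = []
    for key in sorted(bins):
        bucket = bins[key]
        rng.shuffle(bucket)  # random subset when a bucket is over the cap
        out.extend(bucket[:max_per_bin])
    return out


rows = [
    {"fen": "8/8/8/8/8/8/8/K6k w - - 0 1", "eval": 0},
    {"fen": "8/8/8/8/8/8/8/K6k w - - 4 9", "eval": 0},  # same position, later move
    {"fen": "8/8/8/8/8/8/Q7/K6k w - - 0 1", "eval": 900},
]
kept = balance(dedup(rows), max_per_bin=1)
print([r["eval"] for r in kept])  # [0, 900]
```

Capping each bucket (rather than resampling) keeps the pass deterministic for a given seed; the real script may choose bins or weights differently.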
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds python/NNUETraining.ipynb with five sections:
- Setup: mount Drive, clone/update repo, install deps + Stockfish
- Data: Option A (generate + label) or Option B (upload existing labeled.jsonl)
- Train: standard epoch loop or burst mode (recommended for Colab free tier)
- Export: convert the best .pt checkpoint to .nbai via export.py
- Download: pull the .nbai and .pt files to the local machine via files.download

Checkpoints and datasets are persisted to Google Drive so training
survives session disconnects and can be resumed automatically.
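Automatic resumption could work along these lines: on startup, scan the Drive checkpoint folder and continue from the highest-numbered epoch. This is a minimal sketch under assumed names: the `epoch_NNNN.pt` naming scheme and the helpers `latest_checkpoint` and `resume_epoch` are hypothetical, and loading the weights themselves (e.g. with `torch.load`) is left out.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical naming scheme: one file per finished epoch, e.g. epoch_0012.pt.
CKPT_RE = re.compile(r"^epoch_(\d+)\.pt$")


def latest_checkpoint(ckpt_dir: Path) -> Optional[Tuple[Path, int]]:
    """Return (path, epoch) of the highest-numbered checkpoint, or None."""
    best: Optional[Tuple[Path, int]] = None
    for path in ckpt_dir.glob("*.pt"):
        match = CKPT_RE.match(path.name)
        if match is None:
            continue  # ignore files such as best.pt
        epoch = int(match.group(1))
        if best is None or epoch > best[1]:
            best = (path, epoch)
    return best


def resume_epoch(ckpt_dir: Path) -> int:
    """First epoch to run: 0 on a fresh start, else one past the last checkpoint."""
    found = latest_checkpoint(ckpt_dir)
    return 0 if found is None else found[1] + 1
```

Parsing the epoch from the file name, rather than trusting modification times, avoids surprises when Drive sync rewrites timestamps.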
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #81