Add an easy local data pipeline feeding GPU training on Colab

- SelfPlayMain: standalone NNUEBot self-play (no microservices) that writes FENs
  for labeling. Openings are randomised for game diversity; games run
  sequentially because the EvaluationNNUE accumulator is shared. Exposed via
  the `selfPlay` Gradle task and selfplay.sh.
- NNUEBot: optional fixedMoveTimeMs so self-play runs fast (default unchanged).
- NbaiLoader: honor `-Dnnue.weights=<path>` to load weights from a file before
falling back to the bundled resource.
- build_dataset.py / dataset.sh: one command builds the entire dataset
  (Lichess eval-DB backbone + self-play + tactical + random filler),
  deduplicates it, balances the eval histogram, writes append-only zstd shards
  plus a manifest, and pushes them to Drive with rclone.
- train.py: NNUEDataset can stream a directory of .jsonl.zst shards in
  addition to reading a single file.
- NNUETraining.ipynb: clone the repo into ephemeral /content, sync shards from
  Drive (cache-aware), and train on the shards directory; the Colab-side
  generation/upload steps are removed.
- Add concept and implementation-plan docs.
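
A minimal sketch of what the dedup and eval-histogram balancing steps in build_dataset.py could look like, assuming each record is a dict with a `fen` string and a centipawn `eval`. The dedup key (FEN without move counters), the 100 cp bucket width, the ±1000 cp clamp, the per-bucket cap and all function names are illustrative assumptions, not necessarily what build_dataset.py actually does.

```python
import random
from collections import defaultdict
from typing import Iterable

# Hypothetical helpers; the record schema {"fen": str, "eval": int (cp)} is assumed.


def fen_key(fen: str) -> str:
    # Keep placement, side to move, castling rights and en-passant square;
    # drop the halfmove/fullmove counters so transpositions collapse together.
    return " ".join(fen.split()[:4])


def dedup(records: Iterable[dict]) -> list[dict]:
    """Keep only the first record seen for each position."""
    seen: set[str] = set()
    out: list[dict] = []
    for rec in records:
        key = fen_key(rec["fen"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out


def eval_bucket(cp: int, width: int = 100, clamp: int = 1000) -> int:
    # Clamp extreme evals (e.g. mates, won endgames) into the outermost buckets.
    cp = max(-clamp, min(clamp, cp))
    return cp // width


def balance(records: list[dict], max_per_bucket: int, seed: int = 0) -> list[dict]:
    """Cap every eval bucket at max_per_bucket, sampling uniformly within it."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for rec in records:
        buckets[eval_bucket(rec["eval"])].append(rec)
    rng = random.Random(seed)  # fixed seed keeps rebuilds reproducible
    out: list[dict] = []
    for key in sorted(buckets):
        group = buckets[key]
        if len(group) > max_per_bucket:
            group = rng.sample(group, max_per_bucket)
        out.extend(group)
    rng.shuffle(out)  # avoid writing shards sorted by eval
    return out
```

Capping buckets, rather than storing per-sample weights, is one simple way to keep near-zero evals (which dominate real games) from swamping the rarer decisive positions while leaving the shards a plain list of positions.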

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>