Densifying the 98304-dim HalfKP vector per item filled host RAM and crashed the
Colab runtime even at small batch sizes. The dataset now yields only the ~64
active feature indices; a custom collate carries (row, col) pairs and the
training loop scatters them into a dense [B, INPUT_SIZE] tensor on the GPU. Host
RAM usage stays small; the GPU holds only one dense batch at a time.
- NNUEDataset.__getitem__ returns indices via new fen_to_indices.
- fen_to_features now derives from fen_to_indices (kept for external callers).
- _collate_sparse builds row/col index batches; loaders use it.
- train/val loops scatter to a GPU dense batch; loss weighting uses batch size.
- Notebook: BATCH_SIZE 4096 -> 8192 (host no longer the limit; GPU is).
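
The collate-then-scatter flow described above can be sketched roughly as follows.
This is a minimal illustration, not the code in train.py: the item layout
(a list of active indices plus a float target) and the names `collate_sparse`
and `densify` are assumptions made for the example.

```python
import torch

INPUT_SIZE = 98304  # HalfKP feature dimension quoted in this message


def collate_sparse(items):
    """Flatten a batch of (indices, target) items into (row, col) index tensors.

    Assumed item layout: `indices` is a list of active feature indices for one
    position, `target` is a float label.
    """
    rows, cols, targets = [], [], []
    for row, (indices, target) in enumerate(items):
        rows.extend([row] * len(indices))  # one row id per active feature
        cols.extend(indices)
        targets.append(target)
    return (
        torch.tensor(rows, dtype=torch.long),
        torch.tensor(cols, dtype=torch.long),
        torch.tensor(targets, dtype=torch.float32),
    )


def densify(rows, cols, batch_size, input_size=INPUT_SIZE, device="cpu"):
    """Write 1.0 at every (row, col) pair of a zeroed [batch_size, input_size] tensor."""
    dense = torch.zeros(batch_size, input_size, dtype=torch.float32, device=device)
    dense[rows.to(device), cols.to(device)] = 1.0
    return dense
```

In a training loop one would pass `device="cuda"` and take the batch size from
the target tensor (`targets.shape[0]`), since the dense tensor no longer exists
on the host side.
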
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Dense 98304-dim HalfKP features at batch_size=16384 cost ~6.4 GB/batch on the
host; with 8 hardcoded DataLoader workers and prefetch this OOM-killed the Colab
runtime.
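
The ~6.4 GB figure can be checked with a short calculation. The sketch below
assumes float32 features (4 bytes each) and treats "workers x prefetch_factor"
as a rough upper bound on batches held in host memory at once; the
prefetch_factor of 2 is PyTorch's default when workers are enabled. The helper
names are illustrative only.

```python
INPUT_SIZE = 98304       # HalfKP feature dimension
BYTES_PER_FLOAT32 = 4    # assumption: features are stored as float32


def dense_batch_bytes(batch_size, input_size=INPUT_SIZE, itemsize=BYTES_PER_FLOAT32):
    """Bytes needed for one dense [batch_size, input_size] batch."""
    return batch_size * input_size * itemsize


def in_flight_bytes(batch_size, num_workers, prefetch_factor=2):
    """Rough upper bound on host memory held by prefetched dense batches."""
    return dense_batch_bytes(batch_size) * num_workers * prefetch_factor


for bs in (16384, 4096):
    per_batch_gb = dense_batch_bytes(bs) / 1e9
    queued_gb = in_flight_bytes(bs, num_workers=8) / 1e9
    print(f"batch_size={bs}: {per_batch_gb:.2f} GB/batch, "
          f"up to {queued_gb:.1f} GB queued with 8 workers")
```
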
- train.py: adaptive DataLoader workers (min(4, cpu_count), Colab free tier = 2),
overridable via NNUE_LOADER_WORKERS; persistent_workers only when > 0.
- NNUETraining.ipynb: lower BATCH_SIZE 16384 -> 4096 with a memory-cost note.
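
A minimal sketch of the worker selection described above, assuming the override
is read as a plain integer from the environment. The function names are
hypothetical; only NNUE_LOADER_WORKERS, the cap of 4, and the persistent_workers
condition come from this change.

```python
import os


def resolve_loader_workers(cap=4):
    """Pick a DataLoader worker count: env override first, else min(cap, CPUs)."""
    override = os.environ.get("NNUE_LOADER_WORKERS")
    if override is not None:
        return max(0, int(override))
    # os.cpu_count() can return None, so fall back to a single worker.
    return min(cap, os.cpu_count() or 1)


def loader_kwargs(num_workers):
    """Build DataLoader keyword arguments for the chosen worker count."""
    kwargs = {"num_workers": num_workers}
    if num_workers > 0:
        # DataLoader rejects persistent_workers=True when num_workers == 0.
        kwargs["persistent_workers"] = True
    return kwargs
```

The result would be passed along as
`DataLoader(dataset, batch_size=..., **loader_kwargs(resolve_loader_workers()))`.
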
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add an easy local data pipeline feeding GPU training on Colab.
- SelfPlayMain: standalone NNUEBot self-play (no microservices) that writes FENs
for labeling. Openings are randomised for game diversity; games run
sequentially because the EvaluationNNUE accumulator is shared. Exposed via the
`selfPlay` Gradle task and selfplay.sh.
- NNUEBot: optional fixedMoveTimeMs so self-play runs fast (default unchanged).
- NbaiLoader: honor `-Dnnue.weights=<path>` to load weights from a file before
falling back to the bundled resource.
- build_dataset.py / dataset.sh: one command builds the entire dataset
(Lichess eval-DB backbone + self-play + tactical + random filler), deduplicates,
balances the eval histogram, writes append-only zstd shards + manifest, and
rclone-pushes to Drive.
- train.py: NNUEDataset reads a directory of .jsonl.zst shards (streaming) in
addition to a single file.
- NNUETraining.ipynb: clone to ephemeral /content, sync shards from Drive
(cache-aware), train on the shards dir; removed Colab generation/upload steps.
- Concept + implementation plan docs.
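
The dedup and histogram-balancing steps of the dataset build could look roughly
like the sketch below. It is not the build_dataset.py implementation: the record
fields `fen` and `eval` (centipawns), the bucket width, and the per-bucket cap
are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def dedup_by_fen(records):
    """Keep the first record seen for each FEN string."""
    seen = set()
    unique = []
    for rec in records:
        if rec["fen"] not in seen:
            seen.add(rec["fen"])
            unique.append(rec)
    return unique


def balance_by_eval(records, bucket_width=100, cap_per_bucket=2, seed=0):
    """Cap how many records fall into each eval bucket to flatten the histogram."""
    buckets = defaultdict(list)
    for rec in records:
        # Floor division groups evals into fixed-width centipawn buckets.
        buckets[rec["eval"] // bucket_width].append(rec)
    rng = random.Random(seed)  # seeded so repeated builds pick the same sample
    balanced = []
    for key in sorted(buckets):
        group = buckets[key]
        if len(group) > cap_per_bucket:
            group = rng.sample(group, cap_per_bucket)
        balanced.extend(group)
    return balanced
```
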
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds python/NNUETraining.ipynb with five sections:
- Setup: mount Drive, clone/update repo, install deps + Stockfish
- Data: Option A (generate + label) or Option B (upload existing labeled.jsonl)
- Train: standard epoch loop or burst mode (recommended for Colab free tier)
- Export: convert best .pt checkpoint to .nbai via export.py
- Download: pull .nbai and .pt to local machine via files.download
Checkpoints and datasets are persisted to Google Drive so training
survives session disconnects and can be resumed automatically.
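
One way automatic resume can work is to pick the newest checkpoint in the Drive
folder at startup. The sketch below assumes a hypothetical `epoch_<N>.pt` naming
scheme, which is not taken from the notebook; the actual file names and resume
logic may differ.

```python
import re
from pathlib import Path

# Hypothetical naming scheme for this sketch: epoch_1.pt, epoch_2.pt, ...
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.pt$")


def pick_latest(names):
    """Return (epoch, name) for the highest-numbered checkpoint name, or None."""
    best = None
    for name in names:
        match = CKPT_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, name)
    return best


def latest_checkpoint(ckpt_dir):
    """Return the path of the newest checkpoint in ckpt_dir, or None if absent."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    best = pick_latest(p.name for p in directory.iterdir())
    return None if best is None else directory / best[1]
```

A training cell would call `latest_checkpoint(...)` on the Drive checkpoint
folder and start from epoch 0 when it returns None.
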
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Janis Eccarius <eccariusjanis@gmail.com>
Reviewed-on: #81