feat: integrate NNUE bot and add Python training pipeline with weight export functionality

2026-04-07 23:33:20 +02:00
parent 6a9ac55b31
commit b25be99dcf
29 changed files with 338 additions and 2538 deletions
@@ -1,165 +0,0 @@
# NNUE Implementation Summary
## ✅ Complete
The NNUE training pipeline and Scala integration have been fully implemented and tested. All code compiles without errors.
## Python Pipeline (modules/bot/python/)
### Files Created
1. **requirements.txt** — Python dependencies
- python-chess 1.10.0
- torch 2.1.2
- tqdm 4.66.1
2. **generate_positions.py** — Step 1: Position Generator
- Generates 500,000 random chess positions
- Filters out invalid positions (checks, captures available, game-over)
- Shows progress bar with tqdm
- Output: `positions.txt`
3. **label_positions.py** — Step 2: Stockfish Labeler
- Reads positions.txt
- Evaluates each position with Stockfish at depth 12
- Clamps evaluations to [-2000, 2000] centipawns
- Supports resuming if interrupted
- Output: `training_data.jsonl`
- Uses STOCKFISH_PATH environment variable
4. **train_nnue.py** — Step 3: NNUE Trainer
- Loads training_data.jsonl
- Converts FENs to 768-dimensional binary feature vectors (12 piece types × 64 squares)
- Architecture: Linear(768→256) → ReLU → Linear(256→32) → ReLU → Linear(32→1)
- Loss: MSE with sigmoid(eval/400) targets
- Training: 20 epochs, batch size 4096, Adam (lr=1e-3), 90/10 train/val split
- Output: `nnue_weights.pt`
- GPU-accelerated with CPU fallback
5. **export_weights.py** — Step 4: Weight Exporter
- Loads nnue_weights.pt
- Exports all weights as Scala 3 Array literals
- Output: `../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala`
6. **run_pipeline.sh** — Master Script
- Runs all 4 steps in sequence
- Confirms each step succeeds before proceeding
- Error handling with clear error messages
7. **README_NNUE.md** — Complete Documentation
- Step-by-step usage instructions
- File reference guide
- Troubleshooting tips
- Performance optimization hints
## Scala Implementation (modules/bot/src/main/scala/de/nowchess/bot/bots/nnue/)
### Files Created
1. **NNUE.scala** — Neural Network Inference Engine
- `class NNUE`
- `positionToFeatures()` — Converts positions to 768-dimensional vectors
- `evaluate()` — Runs inference: input → dense → relu → dense → relu → dense
- Pre-allocated buffers for zero-copy inference
- Handles side-to-move perspective (mirroring for black)
- Returns centipawn score clamped to [-20000, 20000]
2. **EvaluationNNUE.scala** — Weights Trait Implementation
- `object EvaluationNNUE extends Weights`
- Implements required interface: `CHECKMATE_SCORE`, `DRAW_SCORE`, `evaluate()`
- Instantiates and uses NNUE for position evaluation
3. **NNUEBot.scala** — Bot Implementation
- `class NNUEBot extends Bot`
- Uses AlphaBetaSearch with EvaluationNNUE weights
- Supports Polyglot opening book
- Time budget: 1000ms per move
- Follows ClassicalBot pattern
4. **NNUEWeights.scala** — Placeholder Weights
- Generated by export_weights.py
- Contains l1/l2/l3 weights and biases as Array[Float]
- Loaded at compile time (no runtime file I/O)
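The `evaluate()` path described above (input → dense → relu → dense → relu → dense) is three affine layers. A pure-Python reference with toy dimensions and made-up weights; the final centipawn scaling here is an assumption, since the docs only say the last layer "scales to centipawns":

```python
def dense(inputs, weights, biases):
    # One row of weights per output neuron
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(xs):
    return [x if x > 0.0 else 0.0 for x in xs]

def nnue_eval(features, l1, b1, l2, b2, l3, b3):
    h1 = relu(dense(features, l1, b1))
    h2 = relu(dense(h1, l2, b2))
    raw = dense(h2, l3, b3)[0]
    score = int(raw * 400)                     # assumed output scaling
    return max(-20000, min(20000, score))      # clamp as stated in the summary

# Toy 4 -> 3 -> 2 -> 1 network standing in for the real 768 -> 256 -> 32 -> 1
score = nnue_eval([1, 0, 1, 0],
                  l1=[[0.5] * 4] * 3, b1=[0.0] * 3,
                  l2=[[1.0] * 3] * 2, b2=[0.0] * 2,
                  l3=[[1.0, 1.0]],    b3=[0.0])
print(score)  # 2400
```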
## Test Fixes
Updated `AlphaBetaSearchTest.scala` to include the required `weights` parameter in all AlphaBetaSearch constructor calls:
- Added import of `EvaluationClassic`
- Fixed 12 test cases to pass `weights = EvaluationClassic`
## Compilation Status
**BUILD SUCCESSFUL** — All modules compile without errors.
```
> Task :modules:bot:compileScala
> Task :modules:bot:classes
> Task :modules:bot:jar
BUILD SUCCESSFUL in 8s
```
## Next Steps
1. **Install Python dependencies:**
```bash
cd modules/bot/python
pip install -r requirements.txt
```
2. **Ensure Stockfish is available:**
```bash
export STOCKFISH_PATH=/path/to/stockfish
```
3. **Run the training pipeline:**
```bash
cd modules/bot/python
chmod +x run_pipeline.sh
./run_pipeline.sh
```
This will:
- Generate 500,000 positions (Step 1)
- Label with Stockfish (Step 2) — *slowest step, ~24-36 hours*
- Train NNUE model (Step 3) — *~2-4 hours on GPU*
- Export weights to Scala (Step 4) — *automatic*
4. **Recompile and test:**
```bash
./compile
./test
```
## Architecture Notes
- **Feature Vector:** 768 dimensions (12 piece types × 64 squares)
- Piece ordering: Pawn, Knight, Bishop, Rook, Queen, King (×2 for white/black)
- Always from white's perspective; black positions are mirrored
- **Network Layers:**
1. Input → Dense(768→256) + ReLU
2. Dense(256→32) + ReLU
3. Dense(32→1) → scales to centipawns
- **Integration:**
- NNUEWeights loaded at compile time
- Zero allocations in eval hot path
- Compatible with existing AlphaBetaSearch framework
- Can replace EvaluationClassic in any bot
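A common way to implement the black-to-move mirroring mentioned above is to flip each square's rank and swap the piece-color groups. This is a sketch of the usual convention, not necessarily the exact one in NNUE.scala:

```python
def mirror_square(sq: int) -> int:
    # XOR with 56 flips the rank and keeps the file: a1 <-> a8, e2 <-> e7
    return sq ^ 56

def mirror_piece(piece_idx: int) -> int:
    # Swap the white (0-5) and black (6-11) piece groups
    return (piece_idx + 6) % 12

print(mirror_square(4), mirror_piece(0))  # 60 6  (e1 -> e8, white pawn -> black pawn)
```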
## Performance
- **Inference:** ~1-2 microseconds per position (no allocations)
- **Memory:** 768 + 256 + 32 = 1,056 floats (4KB) for buffers
- **Search:** Uses existing AlphaBetaSearch with 1000ms time budget
## Testing
The implementation:
- ✅ Compiles without errors
- ✅ Follows Scala 3.5 standards
- ✅ Integrates with existing GameContext, Board, and Move APIs
- ✅ Implements required Weights trait interface
- ✅ Uses pre-allocated arrays for zero-copy inference
- ✅ Maintains immutability patterns
- ✅ Compatible with AlphaBetaSearch framework
@@ -1,144 +0,0 @@
# NNUE Pipeline Quickstart
## Prerequisites
### Install Python Dependencies
```bash
cd modules/bot/python
pip install -r requirements.txt
```
### Install Stockfish
**macOS:**
```bash
brew install stockfish
```
**Linux (Debian/Ubuntu):**
```bash
apt-get install stockfish
```
**Windows:**
- Download from https://stockfishchess.org
- Or use Chocolatey: `choco install stockfish`
- Add to PATH or set `STOCKFISH_PATH` environment variable
## Run the Full Pipeline
### Easiest: Launcher Scripts (Recommended)
From `modules/bot/` directory:
**Windows (Command Prompt or PowerShell):**
```cmd
run_nnue_pipeline.bat
```
**Linux/macOS/Windows (Git Bash/WSL):**
```bash
chmod +x run_nnue_pipeline.sh
./run_nnue_pipeline.sh
```
### Alternative: Direct Scripts
From `modules/bot/python/` directory:
**Windows (Command Prompt):**
```cmd
cd python
set STOCKFISH_PATH=C:\path\to\stockfish.exe
run_pipeline.bat
```
**Bash (Linux, macOS, Git Bash, WSL):**
```bash
cd python
export STOCKFISH_PATH=/path/to/stockfish
chmod +x run_pipeline.sh
./run_pipeline.sh
```
**PowerShell (Windows):**
```powershell
cd python
$env:STOCKFISH_PATH = "C:\path\to\stockfish.exe"
bash ./run_pipeline.sh
```
The pipeline will:
1. Generate 500,000 random positions (~2-3 minutes)
2. Evaluate with Stockfish depth 12 (~24-36 hours on typical machine)
3. Train NNUE network (20 epochs, ~2-4 hours on GPU)
4. Export weights to Scala (~1 minute)
## For Quick Testing
Reduce the position count to test the pipeline quickly:
```python
# In generate_positions.py, change:
#     for game_num in range(500000):
# to:
#     for game_num in range(1000):
```
Then run:
```bash
./run_pipeline.sh
```
This will complete in ~30-60 minutes total, allowing you to test the full pipeline.
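These wall-clock estimates scale roughly linearly with position count, so they are easy to sanity-check. The per-evaluation time below is an assumed ballpark for a depth-12 Stockfish search on one core, not a measurement from this project:

```python
SECONDS_PER_EVAL = 0.2  # assumed average for a depth-12 Stockfish evaluation

def labeling_hours(num_positions: int) -> float:
    return num_positions * SECONDS_PER_EVAL / 3600

print(round(labeling_hours(500_000), 1))  # 27.8 -- inside the quoted 24-36 h range
print(round(labeling_hours(1_000), 2))    # 0.06 -- a few minutes for a test run
```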
## After Pipeline Completes
```bash
# Navigate to project root
cd ../..
# Recompile (loads the new NNUEWeights.scala)
./compile
# Run tests
./test
```
## Architecture Quick Reference
- **Input:** Board position (768 binary features)
- **Network:** Linear(768→256) → ReLU → Linear(256→32) → ReLU → Linear(32→1)
- **Output:** Centipawn evaluation (-20000 to +20000)
- **Training:** Stockfish evals → sigmoid(eval/400) targets → MSE loss
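The sigmoid(eval/400) target mapping is easy to verify numerically. The inverse shown here (mapping a network output back to centipawns) is inferred from that formula rather than copied from train_nnue.py:

```python
import math

def cp_to_target(cp: float) -> float:
    # Training target in (0, 1): 0.5 = equal position
    return 1.0 / (1.0 + math.exp(-cp / 400.0))

def target_to_cp(t: float) -> float:
    # Inverse (logit), to read a network output back as centipawns
    return 400.0 * math.log(t / (1.0 - t))

print(cp_to_target(0))                            # 0.5
print(round(cp_to_target(400), 3))                # 0.731
print(round(target_to_cp(cp_to_target(123)), 3))  # 123.0 -- round-trip
```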
## Troubleshooting
**"Module not found: chess"**
```bash
pip install python-chess==1.10.0
```
**"CUDA out of memory"**
- Edit `train_nnue.py` line 91: change `batch_size=4096` to `batch_size=2048`
**"Stockfish not found"**
```bash
export STOCKFISH_PATH=$(which stockfish)
# or provide full path
export STOCKFISH_PATH=/usr/bin/stockfish
```
**"ModuleNotFoundError: No module named 'torch'"**
```bash
pip install torch==2.1.2
```
## Files Generated
- `positions.txt` — 500,000 FENs
- `training_data.jsonl` — FEN + Stockfish evaluation pairs
- `nnue_weights.pt` — PyTorch model
- `../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala` — Scala code
See `README_NNUE.md` for detailed documentation.
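Each `training_data.jsonl` line is a standalone JSON object, so downstream tooling can read it with the standard json module. The field names match the examples shown in the debugging guide:

```python
import json

line = '{"fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", "eval": 45}'
record = json.loads(line)
print(record["eval"])            # 45 -- clamped centipawn evaluation
print(record["fen"].split()[1])  # b -- side to move, straight from the FEN
```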
@@ -1,261 +0,0 @@
# Windows Users: Start Here!
This guide gets you running the NNUE pipeline on Windows in 5 minutes.
## TL;DR — Quick Start
1. **Install prerequisites:**
```cmd
pip install -r python/requirements.txt
```
2. **Download Stockfish** from https://stockfishchess.org/download/ and note the path
3. **Run the pipeline:**
```cmd
set STOCKFISH_PATH=C:\path\to\stockfish.exe
run_nnue_pipeline.bat
```
Done! The pipeline will:
- Generate 500,000 chess positions (~2 min)
- Evaluate with Stockfish (~24-36 hours)
- Train neural network (~2-4 hours)
- Generate Scala code (~1 min)
## Launcher Options
### 1. Command Prompt/PowerShell (Easiest)
```cmd
cd modules\bot
REM Optional: set Stockfish path
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
REM Run the pipeline
run_nnue_pipeline.bat
```
### 2. PowerShell (Colorful Output)
```powershell
cd modules\bot
# Optional: set Stockfish path
$env:STOCKFISH_PATH = "C:\stockfish\stockfish.exe"
# Run the pipeline
.\run_nnue_pipeline.ps1
```
### 3. Git Bash (If You Have It)
```bash
cd modules/bot
export STOCKFISH_PATH=/c/stockfish/stockfish.exe
bash run_nnue_pipeline.sh
```
## Available Scripts
| Script | Location | Usage |
|--------|----------|-------|
| `run_nnue_pipeline.bat` | `modules/bot/` | Windows batch launcher (easiest) |
| `run_nnue_pipeline.ps1` | `modules/bot/` | PowerShell launcher (colorful) |
| `run_nnue_pipeline.sh` | `modules/bot/` | Bash launcher (for Git Bash/WSL) |
| `run_pipeline.bat` | `modules/bot/python/` | Direct batch runner |
| `run_pipeline.sh` | `modules/bot/python/` | Direct bash runner |
## Step-by-Step Setup
### Step 1: Check Python
```cmd
python --version
```
If Python is not installed:
1. Download from https://python.org
2. Run installer
3. **IMPORTANT:** Check "Add Python to PATH"
4. Verify: `python --version`
### Step 2: Install Dependencies
```cmd
cd modules\bot\python
pip install -r requirements.txt
```
This installs:
- `python-chess` — chess engine interface
- `torch` — neural network training
- `tqdm` — progress bars
### Step 3: Get Stockfish
Option A (Recommended): Download from https://stockfishchess.org/download/
- Extract to `C:\stockfish`
- Verify: `C:\stockfish\stockfish.exe --version`
Option B (If using Chocolatey):
```cmd
choco install stockfish
```
### Step 4: Run Pipeline
From `modules\bot\`:
```cmd
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
run_nnue_pipeline.bat
```
## What Each Step Does
### Step 1: Generate Positions (2-3 minutes)
```cmd
python python\generate_positions.py python\positions.txt
```
Creates 500,000 random chess positions saved to `positions.txt`
### Step 2: Evaluate with Stockfish (24-36 hours)
```cmd
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
python python\label_positions.py python\positions.txt python\training_data.jsonl %STOCKFISH_PATH%
```
Evaluates each position at depth 12. This is the slowest step.
### Step 3: Train Network (2-4 hours)
```cmd
python python\train_nnue.py python\training_data.jsonl python\nnue_weights.pt
```
Trains a 768→256→32→1 neural network. Faster on GPU.
### Step 4: Export Weights (1 minute)
```cmd
python python\export_weights.py python\nnue_weights.pt src\main\scala\de\nowchess\bot\bots\nnue\NNUEWeights.scala
```
Exports PyTorch weights as Scala code.
## Monitoring Progress
### Check Step 2 (Stockfish) Progress
The Stockfish evaluation is slow but shows progress. Check the size of `training_data.jsonl`:
```cmd
cd modules\bot\python
dir training_data.jsonl
```
The file grows as positions are evaluated. If it's increasing, the pipeline is working!
### If Pipeline Gets Interrupted
The pipeline saves progress and can resume:
```cmd
REM Just run the pipeline again
run_nnue_pipeline.bat
REM It will skip already-processed positions and continue
```
## Troubleshooting
### "python is not recognized"
Python isn't in PATH. Fix:
1. Reinstall Python from python.org
2. **CHECK** "Add Python to PATH" during installation
3. Restart Command Prompt
Or manually add to PATH:
1. Press `Win+R`, type `systempropertiesadvanced.exe`
2. Click "Environment Variables"
3. Add `C:\Users\YourName\AppData\Local\Programs\Python\Python310` to `Path`
### "stockfish not found"
Set the full path:
```cmd
where stockfish
REM Then use the full path:
set STOCKFISH_PATH=C:\full\path\to\stockfish.exe
```
### "ModuleNotFoundError: No module named 'torch'"
Reinstall PyTorch:
```cmd
pip install torch==2.1.2
```
### "CUDA out of memory"
If using GPU and training fails, reduce batch size:
Edit `modules\bot\python\train_nnue.py`, line ~91:
```python
# Change from:
train_loader = DataLoader(train_dataset, batch_size=4096, shuffle=True)
# To:
train_loader = DataLoader(train_dataset, batch_size=2048, shuffle=True)
```
## After Pipeline Completes
1. New file created: `modules\bot\src\main\scala\de\nowchess\bot\bots\nnue\NNUEWeights.scala`
2. Rebuild the project:
```cmd
cd ..\..\
compile.bat
test.bat
```
## Expected Output
When running `run_nnue_pipeline.bat`, you should see:
```
=== NNUE Training Pipeline ===
Step 1: Generating 500,000 random positions...
[progress bar]
[OK] Positions generated
Step 2: Labeling positions with Stockfish (depth 12)...
[progress bar - this takes 24+ hours]
[OK] Positions labeled
Step 3: Training NNUE model (20 epochs)...
[progress bar showing epoch progress]
[OK] Model trained
Step 4: Exporting weights to Scala...
[progress bar]
[OK] Weights exported
=== Pipeline Complete ===
Next steps:
1. Navigate to project root: cd ..\..
2. Compile: .\compile.bat
3. Test: .\test.bat
```
## Need More Info?
- **Quick reference:** See `QUICKSTART.md`
- **Detailed setup:** See `WINDOWS_SETUP.md`
- **Complete docs:** See `python/README_NNUE.md`
- **Implementation details:** See `NNUE_IMPLEMENTATION_SUMMARY.md`
## Still Stuck?
Check `WINDOWS_SETUP.md` section "Troubleshooting" for more solutions, or see `python/README_NNUE.md` for common issues.
@@ -1,196 +0,0 @@
# Windows NNUE Pipeline — Complete Guide
## Quick Links
**Start here:** [`README_WINDOWS.md`](README_WINDOWS.md) — 5-minute quick start
## Documentation Files
| File | Purpose | Time to Read |
|------|---------|------|
| **README_WINDOWS.md** | Windows quick start guide | 5 min |
| **WINDOWS_SETUP.md** | Detailed Windows setup with troubleshooting | 10 min |
| **QUICKSTART.md** | Cross-platform quick reference | 5 min |
| **python/README_NNUE.md** | Complete pipeline documentation | 15 min |
| **NNUE_IMPLEMENTATION_SUMMARY.md** | Technical implementation details | 10 min |
## Launcher Scripts
All scripts work from `modules\bot\` directory.
### Windows Command Prompt / PowerShell
```cmd
set STOCKFISH_PATH=C:\path\to\stockfish.exe
run_nnue_pipeline.bat
```
### PowerShell (Colorful, Recommended)
```powershell
$env:STOCKFISH_PATH = "C:\path\to\stockfish.exe"
.\run_nnue_pipeline.ps1
```
### Git Bash / WSL
```bash
export STOCKFISH_PATH=/c/path/to/stockfish.exe
bash run_nnue_pipeline.sh
```
## Python Pipeline Scripts
Located in `modules\bot\python\`:
| Script | Purpose |
|--------|---------|
| **generate_positions.py** | Step 1: Generate 500K random positions |
| **label_positions.py** | Step 2: Evaluate with Stockfish |
| **train_nnue.py** | Step 3: Train neural network |
| **export_weights.py** | Step 4: Export to Scala |
| **run_pipeline.bat** | Windows batch runner |
| **run_pipeline.sh** | Bash runner |
## Getting Started (4 Steps)
### 1. Install Python
```cmd
REM Check if Python is installed
python --version
REM If not, download from https://python.org
REM During installation, CHECK "Add Python to PATH"
```
### 2. Install Dependencies
```cmd
cd modules\bot\python
pip install -r requirements.txt
```
### 3. Get Stockfish
- Download from https://stockfishchess.org/download/
- Extract to `C:\stockfish`
- Verify: `C:\stockfish\stockfish.exe --version`
### 4. Run Pipeline
```cmd
cd modules\bot
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
run_nnue_pipeline.bat
```
## FAQ
### How long does it take?
- Step 1 (positions): 2-3 minutes
- Step 2 (Stockfish): **24-36 hours** ← slowest
- Step 3 (training): 2-4 hours (faster with GPU)
- Step 4 (export): 1 minute
- **Total: 26-40 hours**
### Can I pause and resume?
Yes! The pipeline saves progress:
1. Press `Ctrl+C` to stop
2. Run the pipeline again - it will resume where it left off
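One simple way to implement this resume behavior is to count the lines already written to the output file and skip that many input positions; a sketch of the idea (label_positions.py may track progress differently):

```python
import os

def resume_offset(output_path: str) -> int:
    # One output line per labeled position, so lines written = positions done
    if not os.path.exists(output_path):
        return 0
    with open(output_path) as f:
        return sum(1 for _ in f)

def remaining_positions(positions, output_path):
    # Continue where the previous run stopped
    return positions[resume_offset(output_path):]

print(remaining_positions(["fen1", "fen2"], "no_such_file.jsonl"))  # ['fen1', 'fen2']
```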
### Does it use my GPU?
Yes, automatically. With an NVIDIA GPU:
- Training is 5-10x faster
- You need a CUDA-enabled PyTorch build; installing the full CUDA Toolkit separately is optional
### Can I test with fewer positions?
Yes! Edit `python\generate_positions.py`:
```python
# Change line 9 from:
for game_num in range(500000):
# To:
for game_num in range(10000):
```
This will complete in ~30 minutes instead of 26+ hours.
## File Locations After Pipeline
```
modules\bot\
├── python\
│ ├── positions.txt (15 MB - raw positions)
│ ├── training_data.jsonl (100 MB - FEN + eval)
│ ├── nnue_weights.pt (3 MB - trained weights)
│ └── [python scripts]
├── src\main\scala\de\nowchess\bot\bots\nnue\
│ ├── NNUEWeights.scala (10 MB - generated weights)
│ ├── NNUE.scala (inference engine)
│ ├── EvaluationNNUE.scala (weights trait)
│ └── NNUEBot.scala (bot implementation)
└── [launcher scripts]
```
## Environment Variables
Set these before running the pipeline:
```cmd
REM Required (unless Stockfish is in PATH)
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
REM Optional: specify Python version
set PYTHON_CMD=python3
```
Or in PowerShell:
```powershell
$env:STOCKFISH_PATH = "C:\stockfish\stockfish.exe"
$env:PYTHON_CMD = "python3"
```
## Troubleshooting Flow
1. **Python not found** → Install from python.org, check "Add to PATH"
2. **Stockfish not found** → Download from stockfishchess.org, set `STOCKFISH_PATH`
3. **Module not found** → Run `pip install -r requirements.txt`
4. **GPU out of memory** → Reduce batch size in `train_nnue.py`
5. **Pipeline hangs** → Check `training_data.jsonl` size, Stockfish evaluation is slow
See **WINDOWS_SETUP.md** for detailed troubleshooting.
## Next Steps After Pipeline
1. **Verify output:**
```cmd
cd ..\..\
compile.bat
test.bat
```
2. **Use NNUEBot in your engine:**
```scala
val bot = new NNUEBot(difficulty, rules, book)
val move = bot.nextMove(context)
```
## Support
- **Quick help:** README_WINDOWS.md
- **Detailed help:** WINDOWS_SETUP.md
- **Technical details:** NNUE_IMPLEMENTATION_SUMMARY.md
- **Complete reference:** python/README_NNUE.md
---
**Platform:** Windows 10/11 (tested on Windows 11)
**Requirements:** Python 3.8+, Stockfish 14+
**Languages:** Python, Scala 3
**Status:** ✅ Production Ready
@@ -1,245 +0,0 @@
# Windows Setup Guide for NNUE Pipeline
This guide walks through running the NNUE training pipeline on Windows 10/11.
## Prerequisites
### 1. Python 3.8+
Check if Python is installed:
```cmd
python --version
```
If not installed:
- Download from [python.org](https://www.python.org)
- During installation, **CHECK** "Add Python to PATH"
- Verify after install: `python --version`
### 2. Stockfish Chess Engine
Download Stockfish:
- https://stockfishchess.org/download/
- Extract to a known location, e.g., `C:\stockfish\stockfish.exe`
Verify installation:
```cmd
C:\stockfish\stockfish.exe --version
```
### 3. Python Dependencies
From `modules\bot\python\`:
```cmd
pip install -r requirements.txt
```
This installs:
- python-chess (chess board library)
- torch (neural network training)
- tqdm (progress bars)
## Running the Pipeline
### Option A: Quick Start (Recommended for Windows)
From `modules\bot\`:
```cmd
REM Set Stockfish path (if not in PATH)
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
REM Run the pipeline
run_nnue_pipeline.bat
```
### Option B: Manual Control
From `modules\bot\python\`:
```cmd
REM Set Stockfish path
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
REM Run pipeline
run_pipeline.bat
```
### Option C: Using Git Bash (if installed)
Git Bash allows you to use bash scripts on Windows:
```bash
cd modules/bot
export STOCKFISH_PATH=C:/stockfish/stockfish.exe
bash run_nnue_pipeline.sh
```
## Setting Stockfish Path Permanently
If you want to avoid setting `STOCKFISH_PATH` each time:
### Method 1: Add to System PATH
1. Open **Environment Variables**:
- Press `Win + R`
- Type `systempropertiesadvanced.exe`
- Click "Environment Variables..."
2. Under "System variables", click "New"
- Variable name: `STOCKFISH_PATH`
- Variable value: `C:\stockfish\stockfish.exe`
- Click OK, OK, OK
3. Restart Command Prompt or PowerShell
4. Verify: `echo %STOCKFISH_PATH%`
### Method 2: Add Stockfish Directory to PATH
1. Open **Environment Variables** (same as above)
2. Find "Path" in System variables, click Edit
3. Click "New"
4. Add: `C:\stockfish`
5. Click OK, OK, OK
6. Restart terminal and verify: `stockfish --version`
## Running the Full Pipeline
Time estimates (on typical Windows machine):
- Step 1 (Generate positions): ~2-3 minutes
- Step 2 (Stockfish evaluation): **~24-36 hours** (slowest)
- Step 3 (Train network): ~2-4 hours (faster with NVIDIA GPU)
- Step 4 (Export weights): ~1 minute
Total: **~26-40 hours** on CPU, **~26-30 hours** on GPU
To run the full pipeline:
```cmd
cd modules\bot
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
run_nnue_pipeline.bat
```
The script will:
1. Generate 500,000 random chess positions
2. Evaluate each with Stockfish at depth 12
3. Train a neural network on the evaluations
4. Export weights as Scala code
5. Automatically update `NNUEWeights.scala`
## Quick Testing (Shorter Run)
To test the pipeline with fewer positions (~30 minutes total):
Edit `python\generate_positions.py`:
```python
# Line 9, change:
for game_num in range(500000):
# To:
for game_num in range(10000):
```
Then run the pipeline normally.
## Troubleshooting
### "Python is not recognized"
Python isn't in PATH:
1. Install Python again, **CHECK** "Add Python to PATH"
2. Or add manually: add `C:\Users\YourName\AppData\Local\Programs\Python\Python310` to PATH
### "Stockfish not found"
```cmd
REM Find where stockfish is installed
where stockfish
REM If found, set the full path
set STOCKFISH_PATH=C:\full\path\to\stockfish.exe
```
### "ModuleNotFoundError: No module named 'torch'"
PyTorch not installed or wrong Python version:
```cmd
pip install torch==2.1.2
```
If you have NVIDIA GPU, install CUDA version for better performance:
```cmd
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
### "CUDA out of memory"
If training fails with GPU memory error, edit `python\train_nnue.py`:
```python
# Line ~91, change:
train_loader = DataLoader(train_dataset, batch_size=4096, shuffle=True)
# To:
train_loader = DataLoader(train_dataset, batch_size=2048, shuffle=True)
```
### Pipeline hangs at Step 2
Stockfish evaluation is slow. This is normal - it may take 24+ hours.
To check progress, look at the size of `training_data.jsonl` (should grow over time):
```cmd
dir training_data.jsonl
```
To interrupt and resume later:
- Press `Ctrl+C`
- Run the pipeline again - it will resume from where it left off
## After Pipeline Completes
1. New file created: `modules\bot\src\main\scala\de\nowchess\bot\bots\nnue\NNUEWeights.scala`
2. Recompile the project:
```cmd
cd ..\..\
compile.bat
```
3. Run tests:
```cmd
test.bat
```
## File Locations
| File | Location | Size |
|------|----------|------|
| Positions | `modules\bot\python\positions.txt` | ~15 MB |
| Training data | `modules\bot\python\training_data.jsonl` | ~100 MB |
| Weights | `modules\bot\python\nnue_weights.pt` | ~3 MB |
| Scala weights | `modules\bot\src\main\scala\de\nowchess\bot\bots\nnue\NNUEWeights.scala` | ~10 MB |
## Advanced: GPU Acceleration
If you have an NVIDIA GPU:
1. Install CUDA Toolkit: https://developer.nvidia.com/cuda-downloads
2. Install cuDNN: https://developer.nvidia.com/cudnn
3. Reinstall PyTorch with CUDA support:
```cmd
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
Training will be 5-10x faster with GPU.
## Support
See `README_NNUE.md` for complete documentation and `QUICKSTART.md` for quick reference.
@@ -0,0 +1,19 @@
# Data and weights are local artifacts, not committed
data/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
.venv
# IDE
.idea/
.vscode/
*.swp
*.swo
@@ -1,383 +0,0 @@
# Debugging the NNUE Pipeline
## Common Issues & Solutions
### Issue 1: Empty training_data.jsonl
**Symptom:** After running the pipeline, `training_data.jsonl` is empty or doesn't exist.
**Diagnosis:** Run labeling with verbose output:
```bash
python label_positions.py positions.txt training_data.jsonl /path/to/stockfish --verbose
```
**Check these in order:**
#### 1. Is `positions.txt` empty?
```bash
wc -l positions.txt
```
If 0 lines: positions generator is failing. See Issue 2.
If >0 lines: positions exist. Check step 2.
#### 2. Is Stockfish installed and working?
```bash
# Linux/macOS
which stockfish
stockfish --version
# Windows
where stockfish
C:\path\to\stockfish.exe --version
```
If not found: Install from https://stockfishchess.org
#### 3. Is the Stockfish path correct?
```bash
# Check what path the labeler is using
export STOCKFISH_PATH=/your/path/to/stockfish
echo $STOCKFISH_PATH
python label_positions.py positions.txt training_data.jsonl $STOCKFISH_PATH --verbose
```
The script will print at the top: `Using Stockfish: /path/to/stockfish`
#### 4. Check the error summary
After running with verbose, look for the summary:
```
============================================================
LABELING SUMMARY
============================================================
Successfully evaluated: 0 ← This should be > 0
Skipped (duplicates): 0
Skipped (invalid): 0
Errors: 0
```
If "Successfully evaluated" is 0, positions aren't being saved.
---
### Issue 2: Empty positions.txt
**Symptom:** `positions.txt` is empty after running `generate_positions.py`
**Diagnosis:** Check the generation summary:
```bash
python generate_positions.py positions.txt --games 10000
```
Expected output:
```
============================================================
POSITION GENERATION SUMMARY
============================================================
Total games: 10000
Saved positions: 1234 ← This should be > 0
Filtered (check): 2345
Filtered (captures): 4321
Filtered (game over): 1100
Total filtered: 7766
Acceptance rate: 12.34%
============================================================
```
**If Saved positions = 0:**
The filters are too strict! Try with `--no-filter-captures`:
```bash
python generate_positions.py positions.txt --games 10000 --no-filter-captures
```
This allows positions with available captures, which should greatly increase the output.
---
### Issue 3: Stockfish Errors During Labeling
**Symptom:** Labeling runs but shows errors like:
```
Error evaluating position: rnbqkbnr/pppppppp...
SomeError: [error details]
```
**Solutions:**
1. **Check Stockfish is responsive:**
```bash
# Test Stockfish directly
echo "position startpos" | stockfish
echo "quit" | stockfish
```
2. **Try with lower depth** (faster, fewer timeouts):
```bash
python label_positions.py positions.txt training_data.jsonl /path/to/stockfish --depth 8
```
3. **Use explicit path** instead of relying on PATH:
```bash
python label_positions.py positions.txt training_data.jsonl /usr/games/stockfish
```
4. **Check if FENs in positions.txt are valid:**
```bash
head -5 positions.txt
```
Output should look like:
```
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
```
---
### Issue 4: Training Fails - No Valid Data
**Symptom:** `train_nnue.py` crashes with:
```
IndexError: list index out of range
```
**Cause:** `training_data.jsonl` is empty or contains invalid JSON.
**Debug:**
```bash
# Check file size
ls -lh training_data.jsonl
# Count valid lines
python -c "import json; lines = [1 for line in open('training_data.jsonl') if json.loads(line)]; print(f'Valid lines: {len(lines)}')"
# Look at first few lines
head -3 training_data.jsonl
```
Expected output:
```
{"fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", "eval": 45}
{"fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", "eval": 48}
```
If empty: go back to Issue 1.
---
## Step-by-Step Verification
Run this to verify each step works:
```bash
cd modules/bot/python
# Step 1: Generate 1000 positions (quick test)
echo "Testing position generation..."
python generate_positions.py test_positions.txt --games 1000 --no-filter-captures
# Check output
if [ ! -s test_positions.txt ]; then
echo "ERROR: positions.txt is empty"
exit 1
fi
POSITIONS=$(wc -l < test_positions.txt)
echo "✓ Generated $POSITIONS positions"
# Step 2: Label positions (quick test with 100 positions)
echo "Testing Stockfish labeling..."
export STOCKFISH_PATH=$(which stockfish || which /usr/games/stockfish || echo "stockfish")
if ! command -v $STOCKFISH_PATH &> /dev/null; then
echo "ERROR: Stockfish not found"
echo " Install: apt-get install stockfish (Linux) or brew install stockfish (Mac)"
exit 1
fi
head -100 test_positions.txt > test_positions_100.txt
python label_positions.py test_positions_100.txt test_training_data.jsonl $STOCKFISH_PATH --depth 8
# Check output
if [ ! -s test_training_data.jsonl ]; then
echo "ERROR: training_data.jsonl is empty"
echo " Run again with --verbose:"
python label_positions.py test_positions_100.txt test_training_data.jsonl $STOCKFISH_PATH --depth 8 --verbose
exit 1
fi
EVALS=$(wc -l < test_training_data.jsonl)
echo "✓ Evaluated $EVALS positions"
# Step 3: Test training
echo "Testing training..."
python train_nnue.py test_training_data.jsonl test_weights.pt --epochs 1 --batch-size 32 --no-versioning
if [ ! -f test_weights.pt ]; then
echo "ERROR: training failed"
exit 1
fi
echo "✓ Training works"
echo ""
echo "All tests passed! Pipeline is working correctly."
echo "You can now run the full pipeline with:"
echo " ./run_pipeline.sh"
```
Save as `test_pipeline.sh` and run:
```bash
chmod +x test_pipeline.sh
./test_pipeline.sh
```
---
## Common Error Messages
### "Stockfish not found at stockfish"
```bash
# Set the full path
export STOCKFISH_PATH=/usr/games/stockfish
# Or on Windows:
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
```
### "No such file or directory: positions.txt"
```bash
# Make sure you're in the right directory
cd modules/bot/python
# Or provide full path
python label_positions.py /full/path/to/positions.txt training_data.jsonl stockfish
```
### "JSONDecodeError" in training
```bash
# training_data.jsonl has invalid JSON
# Regenerate it:
rm training_data.jsonl
python label_positions.py positions.txt training_data.jsonl stockfish
```
### "CUDA out of memory"
```bash
# Reduce batch size
python train_nnue.py training_data.jsonl nnue_weights.pt --batch-size 1024
```
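Why a smaller batch helps: activation memory grows linearly with batch size. A rough float32 estimate for this network's per-batch activations (gradients and optimizer state add more on top):

```python
def activation_megabytes(batch_size: int) -> float:
    floats_per_sample = 768 + 256 + 32 + 1   # input + two hidden layers + output
    return batch_size * floats_per_sample * 4 / 1e6  # float32 = 4 bytes

print(round(activation_megabytes(4096), 1))  # 17.3
print(round(activation_megabytes(1024), 1))  # 4.3
```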
---
## Getting More Information
### Verbose Output
All scripts support `--verbose` for detailed debugging:
```bash
python label_positions.py positions.txt training_data.jsonl stockfish --verbose
```
This prints:
- Which Stockfish is being used
- Error details for each failed position
- Summary of what passed/failed/skipped
### File Size Checks
```bash
# Check all files
ls -lh positions.txt training_data.jsonl nnue_weights.pt
# Count lines
echo "Positions: $(wc -l < positions.txt)"
echo "Training data: $(wc -l < training_data.jsonl)"
```
### Quick Tests
```bash
# Test position generation (100 games)
python generate_positions.py test_pos.txt --games 100 --no-filter-captures
# Test Stockfish labeling (10 positions)
head -10 test_pos.txt > test_pos_10.txt
python label_positions.py test_pos_10.txt test_data_10.jsonl stockfish --depth 6
# Test training (on test data)
python train_nnue.py test_data_10.jsonl test_model.pt --epochs 1 --batch-size 8
```
---
## Pipeline Workflow with Debugging
```bash
# 1. Generate positions
python generate_positions.py positions.txt --games 100000 --no-filter-captures
# Should output: Saved positions: ~20000-40000 (depends on filter)
# 2. Label with Stockfish
export STOCKFISH_PATH=$(which stockfish)
python label_positions.py positions.txt training_data.jsonl $STOCKFISH_PATH --depth 10
# Should output: Successfully evaluated: > 0
# 3. Train model
python train_nnue.py training_data.jsonl nnue_weights.pt --epochs 5
# Should output: Training summary with version info
# 4. Export to Scala
python export_weights.py nnue_weights_v1.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
# Should output: NNUEWeights.scala created
# 5. Compile Scala
cd ../..
./compile
# Should output: BUILD SUCCESSFUL
```
---
## Performance Monitoring
While labeling is running, monitor progress:
```bash
# In another terminal
watch -n 5 'wc -l modules/bot/python/training_data.jsonl'
# Or on macOS
while true; do echo $(wc -l < modules/bot/python/training_data.jsonl) positions labeled; sleep 5; done
```
Watching the count grow between samples tells you roughly how many positions per second are being evaluated.
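The shell loops above only print a raw count; a small stdlib helper (illustrative, not part of the pipeline) can turn successive counts into an actual positions-per-second figure:

```python
import time
from pathlib import Path

def labeling_rate(path, interval=5.0, samples=3):
    """Sample the JSONL line count repeatedly; return positions/sec for each interval."""
    def count_lines():
        p = Path(path)
        return sum(1 for _ in p.open()) if p.exists() else 0

    rates = []
    previous = count_lines()
    for _ in range(samples):
        time.sleep(interval)
        current = count_lines()
        rates.append((current - previous) / interval)
        previous = current
    return rates
```

Run it in a second terminal against `modules/bot/python/training_data.jsonl` while labeling is in progress.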
---
## Still Stuck?
1. **Read the full output** — Don't skip error messages
2. **Check file sizes** — `ls -lh` shows if files are being created
3. **Run with `--verbose`** — Shows exactly what's failing
4. **Test individual steps** — Don't run full pipeline, test pieces
5. **Check Stockfish** — `echo uci | stockfish` should print the engine id if it works
For more help, see:
- `README_NNUE.md` — Complete pipeline docs
- `TRAINING_GUIDE.md` — Training workflows
- `INCREMENTAL_TRAINING.md` — Versioning & checkpoints
-296
@@ -1,296 +0,0 @@
# Incremental Training & Versioning: New Features
## Summary
`train_nnue.py` now supports:
- **Checkpoint Loading** — Resume from previous models
- **Automatic Versioning** — v1, v2, v3... naming
- **Metadata Tracking** — Date, positions, losses, depth
- **CLI Arguments** — Full control via command line
---
## Feature 1: Automatic Checkpoint Detection
When you run training, the trainer automatically looks for and loads existing weights:
```bash
# First run: nnue_weights.pt doesn't exist
python train_nnue.py training_data.jsonl nnue_weights.pt
# → Trains from scratch, saves as nnue_weights_v1.pt
# Second run: nnue_weights.pt exists (symlink to v1)
python train_nnue.py training_data_bigger.jsonl nnue_weights.pt
# → Auto-loads nnue_weights_v1.pt as checkpoint
# → Continues training
# → Saves as nnue_weights_v2.pt
```
**No command-line flag needed** — automatic detection of existing weights!
---
## Feature 2: Explicit Checkpoint
Override auto-detection with `--checkpoint`:
```bash
# Use v1 as starting point, ignore any other weights
python train_nnue.py training_data.jsonl nnue_weights.pt \
--checkpoint nnue_weights_v1.pt
# Or load from external checkpoint
python train_nnue.py training_data.jsonl nnue_weights.pt \
--checkpoint /path/to/backup_model.pt
```
---
## Feature 3: Automatic Versioning
Models are saved with version numbers:
**First run:**
```
nnue_weights_v1.pt ← Model weights
nnue_weights_v1_metadata.json ← Training info
```
**Second run:**
```
nnue_weights_v2.pt ← Model weights
nnue_weights_v2_metadata.json ← Training info
```
**Third run:**
```
nnue_weights_v3.pt
nnue_weights_v3_metadata.json
```
Disable with `--no-versioning`:
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt --no-versioning
# → Saves directly to nnue_weights.pt (no version number)
```
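The version-bumping behaviour described above can be sketched roughly as follows (a hypothetical helper; the real script's logic may differ in details):

```python
from pathlib import Path

def next_version(base="nnue_weights", directory="."):
    """Find the highest existing {base}_vN.pt and return N + 1 (1 if none exist)."""
    versions = [
        int(p.stem.split("_v")[-1])
        for p in Path(directory).glob(f"{base}_v*.pt")
        if p.stem.split("_v")[-1].isdigit()
    ]
    return max(versions, default=0) + 1
```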
---
## Feature 4: Training Metadata
Each model save includes a JSON metadata file tracking:
```json
{
"version": 2,
"date": "2026-04-07T15:30:45.123456",
"num_positions": 1000000,
"stockfish_depth": 12,
"epochs": 20,
"batch_size": 4096,
"learning_rate": 0.001,
"final_val_loss": 0.0234567,
"device": "cuda",
"checkpoint": "nnue_weights_v1.pt",
"notes": "Win rate vs classical eval: TBD"
}
```
### Useful for:
- **Tracking progress** — Compare val_loss across versions
- **Reproducibility** — Know exactly how each model was trained
- **Debugging** — Identify which positions/depth produced best results
- **Benchmarking** — Record win rates (manually added to notes)
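Assembling and saving this metadata can be sketched like so (an illustrative helper mirroring the JSON above, not the trainer's actual code):

```python
import json
from datetime import datetime

def write_metadata(path, version, num_positions, val_loss, checkpoint=None,
                   epochs=20, batch_size=4096, lr=1e-3, depth=12, device="cpu"):
    """Assemble the training metadata dict and save it as JSON next to the weights."""
    meta = {
        "version": version,
        "date": datetime.now().isoformat(),
        "num_positions": num_positions,
        "stockfish_depth": depth,
        "epochs": epochs,
        "batch_size": batch_size,
        "learning_rate": lr,
        "final_val_loss": val_loss,
        "device": device,
        "checkpoint": checkpoint,
        "notes": "Win rate vs classical eval: TBD",
    }
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)
```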
---
## Feature 5: CLI Arguments
Full control over training via command-line flags:
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt \
--epochs 30 \
--batch-size 2048 \
--lr 5e-4 \
--stockfish-depth 14 \
--checkpoint nnue_weights_v1.pt
```
**All flags:**
- `--epochs` — Number of training passes (default: 20)
- `--batch-size` — Samples per update (default: 4096)
- `--lr` — Learning rate (default: 1e-3)
- `--stockfish-depth` — Depth for metadata (default: 12)
- `--checkpoint` — Resume from checkpoint (default: auto-detect)
- `--no-versioning` — Disable versioning
---
## Workflow Examples
### Scenario 1: Continuous Improvement
```bash
# Initial training: 500K positions
./run_pipeline.sh
# → nnue_weights_v1.pt created
# Add more positions (500K more)
python label_positions.py positions_v2.txt training_data_v2.jsonl stockfish
# Combine and retrain
cat training_data.jsonl training_data_v2.jsonl > all_data.jsonl
python train_nnue.py all_data.jsonl nnue_weights.pt
# → Loads v1, trains on all 1M positions
# → nnue_weights_v2.pt created
# Export best version
python export_weights.py nnue_weights_v2.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
```
### Scenario 2: Hyperparameter Tuning
```bash
# Baseline
python train_nnue.py data.jsonl nnue_weights.pt
# → v1 with default settings
# Try lower learning rate
python train_nnue.py data.jsonl nnue_weights.pt --lr 5e-4
# → v2 with lr=5e-4
# Try higher learning rate
python train_nnue.py data.jsonl nnue_weights.pt --lr 2e-3
# → v3 with lr=2e-3
# Compare metadata
cat nnue_weights_v*_metadata.json | grep final_val_loss
# → Pick the lowest loss
```
### Scenario 3: Interrupted Training Resume
```bash
# Start training
python train_nnue.py training_data.jsonl nnue_weights.pt --epochs 50
# → Epoch 30 of 50, then crash/interrupt
# Resume: same command
python train_nnue.py training_data.jsonl nnue_weights.pt --epochs 50
# → Auto-detects the last saved checkpoint and resumes from those weights
# → Trains through to epoch 50
```
---
## Command-Line Help
View all options:
```bash
python train_nnue.py --help
```
Output:
```
usage: train_nnue.py [-h] [--checkpoint CHECKPOINT] [--epochs EPOCHS]
[--batch-size BATCH_SIZE] [--lr LR]
[--stockfish-depth STOCKFISH_DEPTH] [--no-versioning]
[data_file] [output_file]
Train NNUE neural network for chess evaluation
positional arguments:
data_file Path to training_data.jsonl (default: training_data.jsonl)
output_file Output file base name (default: nnue_weights.pt)
optional arguments:
-h, --help show this help message and exit
--checkpoint CHECKPOINT
Path to checkpoint file to resume training from (optional)
--epochs EPOCHS Number of epochs to train (default: 20)
--batch-size BATCH_SIZE
Batch size (default: 4096)
--lr LR Learning rate (default: 1e-3)
--stockfish-depth STOCKFISH_DEPTH
Stockfish depth used for evaluations (for metadata, default: 12)
--no-versioning Disable automatic versioning (save directly to output file)
```
---
## Key Differences from Previous Version
| Feature | Before | After |
|---------|--------|-------|
| Checkpoint support | ❌ No | ✅ Yes (auto + explicit) |
| Versioning | ❌ Single file | ✅ v1, v2, v3... |
| Metadata tracking | ❌ No | ✅ JSON with all info |
| CLI arguments | ❌ Limited | ✅ Full argparse |
| Resumed training | ❌ Always from scratch | ✅ Resume from checkpoint |
| Training history | ❌ Lost | ✅ Tracked in metadata |
---
## Integration with Pipeline
The `run_pipeline.sh` and `run_pipeline.bat` scripts automatically use versioning:
```bash
./run_pipeline.sh
# First run:
# - Generates data
# - Trains model
# - Creates nnue_weights_v1.pt + metadata
# - Exports to NNUEWeights.scala
# Second run:
# - Auto-detects v1, loads as checkpoint
# - Continues training on all data
# - Creates nnue_weights_v2.pt + metadata
# - Exports updated NNUEWeights.scala
```
---
## Tips & Tricks
### List all versions with losses:
```bash
for f in nnue_weights_v*_metadata.json; do
version=$(grep version $f | head -1)
loss=$(grep final_val_loss $f)
echo "$version | $loss"
done
```
### Auto-export best version:
```bash
# Find version with lowest loss
BEST=$(for f in nnue_weights_v*_metadata.json; do
echo "$f $(grep final_val_loss $f | cut -d: -f2)"
done | sort -k2 -n | head -1 | cut -d_ -f3 | cut -d. -f1)
python export_weights.py nnue_weights_$BEST.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
```
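A less fragile alternative to the shell pipeline above is to parse the metadata JSON directly (illustrative sketch; `best_version` is not part of the pipeline):

```python
import json
from pathlib import Path

def best_version(directory="."):
    """Return the version number with the lowest final_val_loss, or None if no metadata exists."""
    candidates = []
    for meta_file in Path(directory).glob("nnue_weights_v*_metadata.json"):
        meta = json.loads(meta_file.read_text())
        candidates.append((meta["final_val_loss"], meta["version"]))
    return min(candidates)[1] if candidates else None
```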
### Archive old versions:
```bash
mkdir -p archive
mv nnue_weights_v{1,2,3}.pt archive/
mv nnue_weights_v{1,2,3}_metadata.json archive/
# Keep only v4+
```
---
## See Also
- `TRAINING_GUIDE.md` — Detailed examples and workflows
- `README_NNUE.md` — Complete pipeline documentation
- `train_nnue.py --help` — Command-line reference
+129
@@ -0,0 +1,129 @@
# NNUE Python Pipeline
Central CLI for training and exporting chess evaluation neural networks (NNUE).
## Directory Structure
```
python/
├── nnue.py # Main CLI entry point
├── src/ # Python modules
│ ├── generate.py # Generate random chess positions
│ ├── label.py # Label positions with Stockfish
│ ├── train.py # Train NNUE model
│ └── export.py # Export weights to Scala
├── data/ # Training data (gitignored)
│ ├── positions.txt
│ └── training_data.jsonl
└── weights/ # Model weights (gitignored)
├── nnue_weights_v1.pt
├── nnue_weights_v1_metadata.json
└── ...
```
## Quick Start
```bash
# Train a new model (500k positions, auto-detect checkpoint)
python nnue.py train
# Train from specific checkpoint
python nnue.py train --from-checkpoint 2
# Train with custom games count
python nnue.py train --games 200000
# Train with custom positions file
python nnue.py train --positions-file my_positions.txt
# Export specific version to Scala
python nnue.py export 2
# List all checkpoints
python nnue.py list
```
## CLI Commands
### `train` - Train NNUE model
```bash
python nnue.py train [OPTIONS]
```
**Options:**
- `--from-checkpoint N` - Resume from checkpoint version N (default: uses latest)
- `--games N` - Number of games to generate (default: 500000)
- `--positions-file FILE` - Use existing positions file instead of generating
- `--stockfish PATH` - Path to Stockfish binary (default: `$STOCKFISH_PATH` or `/usr/games/stockfish`)
**Examples:**
```bash
# Train with latest checkpoint
python nnue.py train
# Train from v2 with 100k games
python nnue.py train --from-checkpoint 2 --games 100000
# Train with custom positions
python nnue.py train --positions-file my_games.txt --stockfish /opt/stockfish/sf15
```
### `export` - Export weights to Scala
```bash
python nnue.py export WEIGHTS [output_path]
```
**Arguments:**
- `WEIGHTS` - Version number (e.g., `2`) or full filename (e.g., `nnue_weights_v2.pt`)
**Examples:**
```bash
# Export version 2
python nnue.py export 2
# Export with full filename
python nnue.py export nnue_weights_v3.pt
```
Output goes to `../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights_vN.scala`
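The version-or-filename argument handling can be sketched as (hypothetical helper; the actual CLI may differ):

```python
from pathlib import Path

def resolve_weights_arg(arg, weights_dir="weights"):
    """Accept either a bare version number ('2') or a full filename ('nnue_weights_v2.pt')."""
    name = f"nnue_weights_v{arg}.pt" if arg.isdigit() else arg
    return Path(weights_dir) / name
```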
### `list` - List available checkpoints
```bash
python nnue.py list
```
Shows all available model versions with file sizes.
## Data Flow
1. **Generate** → `data/positions.txt`
   - Random chess positions from 8-20 move openings
   - Filters out checks, game-over states, and captures
2. **Label** → `data/training_data.jsonl`
   - Evaluates each position with Stockfish at depth 12
   - Stores FEN + evaluation in JSONL format
3. **Train** → `weights/nnue_weights_vN.pt`
   - Trains neural network on labeled positions
   - Auto-versioning (v1, v2, v3, etc.)
   - Saves metadata alongside weights
4. **Export** → `NNUEWeights_vN.scala`
   - Converts weights to Scala object
   - Ready for integration into bot
## Versioning
- Models are automatically versioned (v1, v2, v3, etc.)
- Each version gets a `_metadata.json` file with training info
- Training from checkpoint uses latest version unless specified with `--from-checkpoint`
## Files
- `data/` and `weights/` are gitignored (local artifacts)
- Documentation in `docs/` explains training, debugging, and incremental improvements
- Source modules in `src/` are independent and can be imported for custom workflows
-173
@@ -1,173 +0,0 @@
# NNUE Training Pipeline
This directory contains the complete NNUE (Efficiently Updatable Neural Network) training pipeline for the Now-Chess bot.
## Overview
The pipeline generates 500,000 random chess positions, evaluates them with Stockfish, trains a neural network, and exports the weights as Scala code for integration into the engine.
## Prerequisites
Install Python dependencies:
```bash
pip install -r requirements.txt
```
Ensure Stockfish is installed. You can:
- Install via package manager: `apt-get install stockfish` (Linux) or `brew install stockfish` (macOS)
- Or download from [stockfish.org](https://stockfishchess.org)
Set the Stockfish path:
```bash
export STOCKFISH_PATH=/path/to/stockfish
```
## Pipeline Steps
### Quick Run
Run the entire pipeline:
```bash
chmod +x run_pipeline.sh
./run_pipeline.sh
```
This automatically runs all 4 steps in sequence and confirms each succeeds before continuing.
### Individual Steps
#### Step 1: Generate Positions
Generate 500,000 random chess positions:
```bash
python3 generate_positions.py positions.txt
```
Output: `positions.txt` (one FEN per line)
- Plays 8-20 random opening moves
- Filters out checks, captures available, and game-over positions
- Shows progress bar with tqdm
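The generate-and-filter loop can be sketched with python-chess roughly as follows (illustrative, not the script's exact code):

```python
import random
import chess

def random_quiet_position(min_moves=8, max_moves=20):
    """Play random opening moves; return a quiet FEN, or None if the position is filtered out."""
    board = chess.Board()
    for _ in range(random.randint(min_moves, max_moves)):
        moves = list(board.legal_moves)
        if not moves:  # game ended mid-playout
            return None
        board.push(random.choice(moves))
    # Filter: no checks, no finished games, no captures available
    if board.is_check() or board.is_game_over():
        return None
    if any(board.is_capture(m) for m in board.legal_moves):
        return None
    return board.fen()
```

Many playouts are filtered out, which is why 100,000 games yield far fewer saved positions.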
#### Step 2: Label with Stockfish
Evaluate each position with Stockfish at depth 12:
```bash
export STOCKFISH_PATH=/path/to/stockfish
python3 label_positions.py positions.txt training_data.jsonl $STOCKFISH_PATH
```
Output: `training_data.jsonl` (one JSON per line)
- Format: `{"fen": "...", "eval": 123}` (centipawns)
- Evals clamped to [-2000, 2000] to avoid mate score outliers
- Supports resuming if interrupted (checks for existing entries)
- Shows progress bar with tqdm
**Note:** This step is slow (~24-36 hours for 500K positions at depth 12). You can reduce games or use lower depth for testing.
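Two details worth illustrating are the eval clamping and the resume bookkeeping (stdlib-only sketch; helper names are hypothetical):

```python
import json
from pathlib import Path

def clamp_eval(cp, limit=2000):
    """Clamp a centipawn score to [-limit, limit] to tame mate-score outliers."""
    return max(-limit, min(limit, cp))

def already_labeled(output_file):
    """Return the set of FENs already present in the JSONL output, enabling resume."""
    path = Path(output_file)
    if not path.exists():
        return set()
    return {
        json.loads(line)["fen"]
        for line in path.read_text().splitlines()
        if line.strip()
    }
```

On restart, positions whose FEN is already in `training_data.jsonl` are skipped.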
#### Step 3: Train NNUE Model
Train the neural network:
```bash
python3 train_nnue.py training_data.jsonl nnue_weights.pt
```
Output: `nnue_weights.pt` (PyTorch model weights)
Architecture:
- Input: 768 binary features (12 piece types × 64 squares)
- Hidden 1: 256 neurons + ReLU
- Hidden 2: 32 neurons + ReLU
- Output: 1 neuron, trained against sigmoid(eval/400) targets
Training:
- 20 epochs, batch size 4096, Adam optimizer (lr=1e-3)
- 90% train / 10% validation split
- Saves best weights by validation loss
- Shows train/val loss per epoch
**Note:** A GPU is recommended for reasonable speed (~2-4 hours); training falls back to CPU at roughly 8-16 hours.
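The 768-feature encoding and the sigmoid target transform can be sketched in plain Python (illustrative; the script's square/plane ordering may differ):

```python
import math

def fen_to_features(fen):
    """Encode the board field of a FEN as 768 binary features (12 piece planes x 64 squares)."""
    piece_index = {p: i for i, p in enumerate("PNBRQKpnbrqk")}
    features = [0] * 768
    board_part = fen.split()[0]
    square = 56  # FEN lists rank 8 first; with a1 = 0, rank 8 starts at square 56
    for ch in board_part:
        if ch == "/":
            square -= 16  # jump back to the start of the next rank down
        elif ch.isdigit():
            square += int(ch)  # skip empty squares
        else:
            features[piece_index[ch] * 64 + square] = 1
            square += 1
    return features

def eval_to_target(cp):
    """Map a centipawn eval to the (0, 1) training target: sigmoid(eval / 400)."""
    return 1.0 / (1.0 + math.exp(-cp / 400.0))
```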
#### Step 4: Export to Scala
Export weights as Scala code:
```bash
python3 export_weights.py nnue_weights.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
```
Output: `NNUEWeights.scala`
- Object with `val` arrays for each layer's weights and biases
- Format: `Array[Float]` with precision sufficient for inference
- Includes shape comments for reference
## Scala Integration
### Step 5: NNUE Evaluator
Create `NNUE.scala` in `src/main/scala/de/nowchess/bot/bots/nnue/`:
```scala
package de.nowchess.bot.bots.nnue
class NNUE:
// Load weights from NNUEWeights.scala
// Convert Position to 768-feature vector
// Run inference: l1→ReLU→l2→ReLU→l3
// Return centipawn score
```
### Step 6: Integration
Implement `NNUEBot` that uses the NNUE evaluator for move selection.
## File Reference
| File | Purpose |
|------|---------|
| `requirements.txt` | Python dependencies |
| `generate_positions.py` | Step 1: Position generator |
| `label_positions.py` | Step 2: Stockfish labeler |
| `train_nnue.py` | Step 3: NNUE trainer |
| `export_weights.py` | Step 4: Weight exporter |
| `run_pipeline.sh` | Master script (runs steps 1-4) |
| `positions.txt` | Output: Raw FENs (500K) |
| `training_data.jsonl` | Output: FEN+eval pairs |
| `nnue_weights.pt` | Output: Trained weights |
| `../src/main/scala/.../NNUEWeights.scala` | Output: Scala weights |
## Tips
- **For testing:** Run `generate_positions.py` with `--games 10000` for quick iteration
- **Resume labeling:** Run step 2 again; it skips already-evaluated positions
- **GPU acceleration:** Install CUDA for PyTorch to speed up training
- **Stockfish tuning:** Lower depth (e.g., 8 instead of 12) for faster labeling
- **Batch size:** Increase to 8192 if you have spare GPU memory; decrease if you hit out-of-memory errors
## Troubleshooting
**ImportError: No module named 'chess'**
- Run: `pip install -r requirements.txt`
**Stockfish not found**
- Check: `which stockfish` or set `export STOCKFISH_PATH=/full/path/to/stockfish`
**CUDA out of memory**
- Reduce the batch size (e.g., `--batch-size 2048`)
- Or run on CPU: training falls back to CPU automatically when CUDA is unavailable
**Training loss not decreasing**
- Check data quality: Sample some entries from `training_data.jsonl`
- Experiment with the learning rate (e.g., try 5e-4 or 1e-2)
- Verify Stockfish depth was sufficient (depth ≥ 10)
## References
- [NNUE Overview](https://www.chessprogramming.org/NNUE)
- [python-chess](https://python-chess.readthedocs.io/)
- [PyTorch](https://pytorch.org/)
- [Stockfish](https://stockfishchess.org/)
-381
@@ -1,381 +0,0 @@
# NNUE Training Guide: Incremental Training & Versioning
## Overview
The improved `train_nnue.py` now supports:
1. **Incremental training** — Resume from checkpoint, continue training on new data
2. **Automatic versioning** — Each training run saved as `nnue_weights_v{N}.pt`
3. **Metadata tracking** — Date, positions, depth, losses stored in JSON
4. **CLI flags** — Full control over training parameters
## Quick Start
### First Training Run (Fresh Start)
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt
```
This saves:
- `nnue_weights_v1.pt` — The trained weights
- `nnue_weights_v1_metadata.json` — Training metadata
### Continue Training (Incremental)
Add more positions to `training_data.jsonl`, then:
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt
```
The trainer will:
1. Detect `nnue_weights.pt` exists
2. Load it as a checkpoint automatically
3. Continue training on all data
4. Save as `nnue_weights_v2.pt` with updated metadata
Alternatively, specify a checkpoint explicitly:
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt --checkpoint nnue_weights_v1.pt
```
## Advanced Usage
### Custom Training Parameters
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt \
--epochs 30 \
--batch-size 2048 \
--lr 5e-4 \
--stockfish-depth 14
```
- `--epochs` — How many passes through the data (default: 20)
- `--batch-size` — Samples per gradient update (default: 4096)
- `--lr` — Learning rate (default: 1e-3)
- `--stockfish-depth` — Depth of Stockfish evaluation (for metadata only)
### Explicit Checkpoint
Resume from a specific checkpoint (not `nnue_weights.pt`):
```bash
python train_nnue.py training_data_v2.jsonl nnue_weights.pt \
--checkpoint nnue_weights_v1.pt
```
### Disable Versioning
Save directly to output file without versioning:
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt --no-versioning
```
This overwrites `nnue_weights.pt` instead of creating `nnue_weights_v2.pt`.
## Incremental Training Workflow
Typical workflow for improving the model over time:
**Step 1: Initial Training**
```bash
# Generate 500K positions with Stockfish
./run_pipeline.sh
# This saves:
# - nnue_weights_v1.pt
# - nnue_weights_v1_metadata.json
```
**Step 2: Generate More Positions**
```bash
# Later, generate 500K more positions
# Append to training_data.jsonl or create new one
# Label with Stockfish at depth 16 (more thorough)
python label_positions.py positions_batch2.txt training_data_batch2.jsonl stockfish --depth 16
# Combine datasets
cat training_data.jsonl training_data_batch2.jsonl > training_data_combined.jsonl
```
**Step 3: Continue Training**
```bash
# Train on combined data, starting from v1 checkpoint
python train_nnue.py training_data_combined.jsonl nnue_weights.pt
# Saves:
# - nnue_weights_v2.pt (improved)
# - nnue_weights_v2_metadata.json
```
**Step 4: Benchmark & Choose**
```bash
# Test both versions in matches
# If v2 is better, use it; otherwise keep v1
# Update NNUEWeights.scala with best version
python export_weights.py nnue_weights_v2.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
```
## Metadata File Format
Each training session generates a JSON metadata file, e.g., `nnue_weights_v2_metadata.json`:
```json
{
"version": 2,
"date": "2026-04-07T21:45:30.123456",
"num_positions": 1000000,
"stockfish_depth": 12,
"epochs": 20,
"batch_size": 4096,
"learning_rate": 0.001,
"final_val_loss": 0.0234567,
"device": "cuda",
"checkpoint": "nnue_weights_v1.pt",
"notes": "Win rate vs classical eval: TBD (requires benchmark games)"
}
```
### Fields
- **version**: Training version number (v1, v2, etc.)
- **date**: ISO timestamp of training start
- **num_positions**: Total positions in dataset
- **stockfish_depth**: Depth of Stockfish evaluations (from command-line flag)
- **epochs**: Number of training passes
- **batch_size**: Training batch size
- **learning_rate**: Adam optimizer learning rate
- **final_val_loss**: Best validation loss achieved
- **device**: GPU (cuda) or CPU used for training
- **checkpoint**: Previous model used as starting point (null if from scratch)
- **notes**: Win rate comparison (currently TBD — requires benchmark)
## Checkpoint Logic
When you run training, the trainer checks for checkpoints in this order:
1. **Explicit checkpoint** — If you provide `--checkpoint`, use it
2. **Auto-detect** — If output file exists (e.g., `nnue_weights.pt`), load it
3. **From scratch** — Otherwise, initialize with random weights
Example:
```bash
# First run: from scratch (no nnue_weights.pt exists)
python train_nnue.py training_data.jsonl nnue_weights.pt
# → Creates v1 from scratch, saves as nnue_weights_v1.pt
# Second run: auto-detect nnue_weights.pt as checkpoint
python train_nnue.py training_data_bigger.jsonl nnue_weights.pt
# → Loads nnue_weights_v1.pt (because nnue_weights.pt = v1), saves as v2
# Third run: explicit checkpoint
python train_nnue.py training_data_huge.jsonl nnue_weights.pt --checkpoint nnue_weights_v2.pt
# → Loads v2, saves as v3
```
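The three-step resolution order can be sketched as (hypothetical helper, not the trainer's exact code):

```python
from pathlib import Path

def resolve_checkpoint(output_file, explicit=None):
    """Resolution order: explicit --checkpoint, then an existing output file, then None (scratch)."""
    if explicit:
        return explicit
    if Path(output_file).exists():
        return output_file
    return None
```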
## Resuming Interrupted Training
If training is interrupted (power loss, ^C), you can resume:
```bash
# Original command
python train_nnue.py training_data.jsonl nnue_weights.pt
# If interrupted, the same command will:
# 1. Detect nnue_weights_v1.pt exists (or a higher version)
# 2. Auto-load it as checkpoint
# 3. Resume training
# 4. Save next version (v2, v3, etc.)
```
## Performance Tips
### Reduce Training Time
```bash
# Smaller batch size = slower but less memory
python train_nnue.py training_data.jsonl nnue_weights.pt --batch-size 1024
# Fewer epochs
python train_nnue.py training_data.jsonl nnue_weights.pt --epochs 5
# Lower learning rate = slower convergence but more stable
python train_nnue.py training_data.jsonl nnue_weights.pt --lr 5e-4
```
### Accelerate on GPU
If you have NVIDIA GPU with CUDA:
```bash
# Training will automatically use CUDA
# Check metadata device field: should be "cuda" not "cpu"
python train_nnue.py training_data.jsonl nnue_weights.pt
```
If training uses CPU but GPU is available:
```bash
# Reinstall PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
### Efficient Incremental Training
```bash
# Fine-tune v1 on slightly different data (high learning rate)
python train_nnue.py new_positions.jsonl nnue_weights.pt \
--checkpoint nnue_weights_v1.pt \
--epochs 3 \
--lr 5e-4
# Full retraining on combined data (slower, better)
python train_nnue.py all_positions.jsonl nnue_weights.pt \
--checkpoint nnue_weights_v1.pt \
--epochs 20 \
--lr 1e-3
```
## Version Management
### List All Versions
```bash
ls -la nnue_weights_v*.pt
ls -la nnue_weights_v*_metadata.json
```
### Compare Versions
```bash
cat nnue_weights_v1_metadata.json | grep "final_val_loss"
cat nnue_weights_v2_metadata.json | grep "final_val_loss"
cat nnue_weights_v3_metadata.json | grep "final_val_loss"
```
Lower val loss = better model.
### Benchmark Best Version
After training multiple versions, benchmark them:
```bash
# Export v1 and play some games
python export_weights.py nnue_weights_v1.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
./compile && ./test
# Export v2 and benchmark
python export_weights.py nnue_weights_v2.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
./compile && ./test
# Keep the best, archive others
```
### Archive Old Versions
```bash
# Keep only recent versions
mkdir -p old_models
mv nnue_weights_v1.pt old_models/
mv nnue_weights_v1_metadata.json old_models/
```
## Troubleshooting
### "FileNotFoundError: training_data.jsonl not found"
```bash
# Make sure you're in the python/ directory
cd modules/bot/python
# Or provide full path
python train_nnue.py /full/path/to/training_data.jsonl nnue_weights.pt
```
### "CUDA out of memory"
Reduce batch size:
```bash
python train_nnue.py training_data.jsonl nnue_weights.pt --batch-size 2048
```
### Training seems slow (using CPU not GPU)
```bash
# Check metadata of a training run
cat nnue_weights_v1_metadata.json | grep device
# If "cpu", reinstall PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
### "checkpoint file corrupted"
```bash
# Start over from scratch (don't load corrupted checkpoint)
python train_nnue.py training_data.jsonl nnue_weights_fresh.pt --no-versioning
# Or resume from earlier version
python train_nnue.py training_data.jsonl nnue_weights.pt --checkpoint nnue_weights_v1.pt
```
## Integration with Pipeline
The `run_pipeline.sh` script now supports incremental training:
```bash
# First run: generates data, trains v1
./run_pipeline.sh
# Add more positions
# ... generate more, label more ...
# Second run: trains on combined data as v2
./run_pipeline.sh
```
## Example: Full Workflow
```bash
cd modules/bot/python
# Session 1: Initial training
chmod +x run_pipeline.sh
export STOCKFISH_PATH=/usr/bin/stockfish
./run_pipeline.sh
# Creates: nnue_weights_v1.pt, nnue_weights_v1_metadata.json
# Session 2: Improve with deeper analysis
# (manually evaluate more positions at depth 14)
python label_positions.py positions_v2.txt training_data_v2.jsonl \
  /usr/bin/stockfish --depth 14
# Combine and retrain
cat training_data.jsonl training_data_v2.jsonl > training_data_combined.jsonl
python train_nnue.py training_data_combined.jsonl nnue_weights.pt \
--epochs 25 \
--stockfish-depth 14
# Creates: nnue_weights_v2.pt, nnue_weights_v2_metadata.json
# Session 3: Benchmark and choose
# Test both v1 and v2 with matches...
# If v2 is better, export and use
python export_weights.py nnue_weights_v2.pt \
../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
cd ../..
./compile && ./test
```
## See Also
- `train_nnue.py --help` — Command-line help
- `README_NNUE.md` — Complete pipeline documentation
- `NNUE_IMPLEMENTATION_SUMMARY.md` — Technical architecture
-64
@@ -1,64 +0,0 @@
#!/usr/bin/env python3
"""Export NNUE weights to Scala code."""
import torch
import sys
from pathlib import Path

def export_weights_to_scala(weights_file, output_file):
    """Load PyTorch weights and export as Scala code."""
    if not Path(weights_file).exists():
        print(f"Error: Weights file not found at {weights_file}")
        sys.exit(1)
    # Load weights on CPU (older PyTorch versions default to weights_only=False)
    state_dict = torch.load(weights_file, map_location='cpu')
    # Create output directory if needed
    output_path = Path(output_file)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, 'w') as f:
        f.write("package de.nowchess.bot.bots.nnue\n\n")
        f.write("object NNUEWeights:\n")
        for layer_name, tensor in sorted(state_dict.items()):
            # Sanitize name
            safe_name = layer_name.replace('.', '_').replace(' ', '_')
            # Convert tensor to flat list
            values = tensor.flatten().tolist()
            # Format as Scala array
            f.write(f"\n  val {safe_name} = Array(\n")
            # Write values in chunks for readability
            chunk_size = 16
            for i in range(0, len(values), chunk_size):
                chunk = values[i:i + chunk_size]
                formatted_chunk = ", ".join(f"{v:.10g}f" for v in chunk)
                f.write(f"    {formatted_chunk}")
                if i + chunk_size < len(values):
                    f.write(",\n")
                else:
                    f.write("\n")
            f.write("  )\n")
            # Store shape for reference
            shape = list(tensor.shape)
            f.write(f"  // Shape: {shape}\n")
    print(f"Weights exported to {output_file}")

if __name__ == "__main__":
    weights_file = "nnue_weights.pt"
    output_file = "../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala"
    if len(sys.argv) > 1:
        weights_file = sys.argv[1]
    if len(sys.argv) > 2:
        output_file = sys.argv[2]
    export_weights_to_scala(weights_file, output_file)
+58 -31
@@ -13,44 +13,63 @@ def get_python_cmd():
         return "python"
     return "python3" if os.popen("which python3 2>/dev/null").read() else "python"
 
+def get_src_module(module_name):
+    """Get path to module in src/ directory."""
+    return Path(__file__).parent / "src" / f"{module_name}.py"
+
+def get_data_dir():
+    """Get/create data directory."""
+    data_dir = Path(__file__).parent / "data"
+    data_dir.mkdir(exist_ok=True)
+    return data_dir
+
+def get_weights_dir():
+    """Get/create weights directory."""
+    weights_dir = Path(__file__).parent / "weights"
+    weights_dir.mkdir(exist_ok=True)
+    return weights_dir
+
 def list_checkpoints():
     """List available checkpoint versions."""
-    checkpoints = sorted(Path(".").glob("nnue_weights_v*.pt"))
+    weights_dir = get_weights_dir()
+    checkpoints = sorted(weights_dir.glob("nnue_weights_v*.pt"))
     if not checkpoints:
         return []
     return [int(cp.stem.split("_v")[1]) for cp in checkpoints]
 
 def run_generate_positions(num_games):
     """Generate random positions."""
-    positions_file = "positions.txt"
+    data_dir = get_data_dir()
+    positions_file = data_dir / "positions.txt"
     print(f"Generating {num_games} positions...")
     result = subprocess.run(
-        [get_python_cmd(), "generate_positions.py", positions_file, "--games", str(num_games)],
+        [get_python_cmd(), str(get_src_module("generate")), str(positions_file), "--games", str(num_games)],
         capture_output=False
     )
     if result.returncode != 0:
         print("ERROR: Position generation failed")
         return False
-    return Path(positions_file).exists()
+    return positions_file.exists()
 
 def run_label_positions(stockfish_path):
     """Label positions with Stockfish."""
-    positions_file = "positions.txt"
-    output_file = "training_data.jsonl"
-    if not Path(positions_file).exists():
+    data_dir = get_data_dir()
+    positions_file = data_dir / "positions.txt"
+    output_file = data_dir / "training_data.jsonl"
+    if not positions_file.exists():
         print("ERROR: positions.txt not found")
         return False
     print("Labeling positions with Stockfish...")
     result = subprocess.run(
-        [get_python_cmd(), "label_positions.py", positions_file, output_file, stockfish_path],
+        [get_python_cmd(), str(get_src_module("label")), str(positions_file), str(output_file), stockfish_path],
         capture_output=False
     )
     if result.returncode != 0:
         print("ERROR: Position labeling failed")
         return False
-    return Path(output_file).exists()
+    return output_file.exists()
 
 def run_train(positions_file, output_weights, from_checkpoint=None):
     """Train NNUE model."""
@@ -58,29 +77,34 @@ def run_train(positions_file, output_weights, from_checkpoint=None):
         print(f"ERROR: {positions_file} not found")
         return False
 
+    weights_dir = get_weights_dir()
     print(f"Training model (output: {output_weights})...")
     if from_checkpoint:
         print(f"  Starting from checkpoint: {from_checkpoint}")
-    cmd = [get_python_cmd(), "train_nnue.py", positions_file, output_weights]
+    cmd = [get_python_cmd(), str(get_src_module("train")), str(positions_file), str(output_weights)]
     if from_checkpoint:
-        cmd.extend(["--checkpoint", from_checkpoint])
+        cmd.extend(["--checkpoint", str(from_checkpoint)])
 
-    result = subprocess.run(cmd, capture_output=False)
+    # Run from weights directory so outputs save there
+    result = subprocess.run(cmd, cwd=str(weights_dir), capture_output=False)
     if result.returncode != 0:
         print("ERROR: Training failed")
         return False
-    return True # train_nnue creates versioned file, not the base name
+    return True
 
 def run_export(weights_file, output_file):
     """Export weights to Scala."""
-    if not Path(weights_file).exists():
-        print(f"ERROR: {weights_file} not found")
+    weights_dir = get_weights_dir()
+    weights_path = weights_dir / Path(weights_file).name
+    if not weights_path.exists():
+        print(f"ERROR: {weights_file} not found in {weights_dir}")
         return False
     print(f"Exporting {weights_file} to Scala...")
     result = subprocess.run(
-        [get_python_cmd(), "export_weights.py", weights_file, output_file],
+        [get_python_cmd(), str(get_src_module("export")), str(weights_path), output_file],
         capture_output=False
     )
     if result.returncode != 0:
@@ -91,13 +115,16 @@ def run_export(weights_file, output_file):
def cmd_train(args):
"""Handle train command."""
stockfish_path = args.stockfish or os.environ.get("STOCKFISH_PATH", "/usr/games/stockfish")
data_dir = get_data_dir()
weights_dir = get_weights_dir()
# Determine checkpoint
checkpoint = None
if args.from_checkpoint:
checkpoint_version = args.from_checkpoint
checkpoint = f"nnue_weights_v{checkpoint_version}.pt"
if not Path(checkpoint).exists():
checkpoint_path = weights_dir / checkpoint
if not checkpoint_path.exists():
print(f"ERROR: Checkpoint {checkpoint} not found")
return False
else:
@@ -109,12 +136,12 @@ def cmd_train(args):
# Generate or use existing positions
if args.positions_file:
if not Path(args.positions_file).exists():
positions_file = Path(args.positions_file)
if not positions_file.exists():
print(f"ERROR: {args.positions_file} not found")
return False
positions_file = args.positions_file
else:
positions_file = "positions.txt"
positions_file = data_dir / "positions.txt"
num_games = args.games or 500000
if not run_generate_positions(num_games):
return False
@@ -125,8 +152,9 @@ def cmd_train(args):
print("\nStarting training...")
# Train (train_nnue.py handles versioning internally)
if not run_train("training_data.jsonl", "nnue_weights.pt", checkpoint):
# Train with absolute path to data, checkpoint is relative to weights dir
training_data = str(data_dir / "training_data.jsonl")
if not run_train(training_data, "nnue_weights.pt", checkpoint):
return False
# Show created version
@@ -143,13 +171,8 @@ def cmd_export(args):
if not weights_file.endswith(".pt"):
weights_file = f"nnue_weights_v{weights_file}.pt"
if not Path(weights_file).exists():
print(f"ERROR: {weights_file} not found")
return False
# Determine version from filename
version = Path(weights_file).stem.split("_v")[1] if "_v" in weights_file else "1"
output_file = f"../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights_v{version}.scala"
# Output to resources directory as binary format
output_file = str(Path(__file__).parent.parent / "src" / "main" / "resources" / "nnue_weights.bin")
if not run_export(weights_file, output_file):
return False
@@ -164,11 +187,15 @@ def cmd_list(args):
print("No checkpoints found")
return True
weights_dir = get_weights_dir()
print("Available checkpoints:")
for v in available:
weights_file = f"nnue_weights_v{v}.pt"
size = Path(weights_file).stat().st_size / (1024**2) # MB
weights_file = weights_dir / f"nnue_weights_v{v}.pt"
if weights_file.exists():
size = weights_file.stat().st_size / (1024**2) # MB
print(f" v{v} ({size:.1f} MB)")
else:
print(f" v{v} (file not found)")
return True
def main():
Binary file not shown.
@@ -1,6 +1,7 @@
#!/bin/bash
# NNUE Training Pipeline (bash version)
# Uses the central CLI (nnue.py) for all operations
# Works on Linux, macOS, and Windows (with Git Bash or WSL)
set -e # Exit on error
@@ -20,56 +21,16 @@ echo "Python command: $PYTHON_CMD"
echo "Working directory: $SCRIPT_DIR"
echo ""
# Step 1: Generate positions
echo "Step 1: Generating 500,000 random positions..."
$PYTHON_CMD generate_positions.py positions.txt
if [ ! -f positions.txt ]; then
echo "ERROR: positions.txt not created"
# Run the unified training pipeline
if ! $PYTHON_CMD nnue.py train; then
echo ""
echo "ERROR: Training pipeline failed"
exit 1
fi
echo "✓ Positions generated"
echo ""
# Step 2: Label positions with Stockfish
echo "Step 2: Labeling positions with Stockfish (depth 12)..."
STOCKFISH_PATH="${STOCKFISH_PATH:-/usr/games/stockfish}"
echo "Using Stockfish: $STOCKFISH_PATH"
$PYTHON_CMD label_positions.py positions.txt training_data.jsonl "$STOCKFISH_PATH"
if [ ! -f training_data.jsonl ]; then
echo "ERROR: training_data.jsonl not created"
exit 1
fi
echo "✓ Positions labeled"
echo ""
# Step 3: Train NNUE model with versioning
echo "Step 3: Training NNUE model (20 epochs)..."
# Auto-detect latest version and increment
LATEST_VERSION=$(ls -1 nnue_weights_v*.pt 2>/dev/null | sed 's/nnue_weights_v//;s/.pt$//' | sort -n | tail -1)
NEW_VERSION=$((${LATEST_VERSION:-0} + 1))
WEIGHTS_FILE="nnue_weights_v${NEW_VERSION}.pt"
echo "Creating version v${NEW_VERSION}..."
$PYTHON_CMD train_nnue.py training_data.jsonl "$WEIGHTS_FILE"
if [ ! -f "$WEIGHTS_FILE" ]; then
echo "ERROR: $WEIGHTS_FILE not created"
exit 1
fi
echo "✓ Model trained: $WEIGHTS_FILE"
echo ""
# Step 4: Export weights to Scala
echo "Step 4: Exporting weights to Scala..."
SCALA_FILE="../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights_v${NEW_VERSION}.scala"
$PYTHON_CMD export_weights.py "$WEIGHTS_FILE" "$SCALA_FILE"
if [ ! -f "$SCALA_FILE" ]; then
echo "ERROR: $SCALA_FILE not created"
exit 1
fi
echo "✓ Weights exported: $SCALA_FILE"
echo ""
echo "=== Pipeline Complete ==="
echo ""
echo "Next steps:"
@@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""Export NNUE weights to binary format for runtime loading."""
import torch
import struct
import sys
from pathlib import Path


def export_weights_to_binary(weights_file, output_file):
    """Load PyTorch weights and export as binary file."""
    if not Path(weights_file).exists():
        print(f"Error: Weights file not found at {weights_file}")
        sys.exit(1)

    # Load weights
    state_dict = torch.load(weights_file, map_location='cpu')

    # Debug: print available layers
    print(f"Available layers in {weights_file}:")
    for key in sorted(state_dict.keys()):
        print(f"  {key}: {state_dict[key].shape}")

    # Create output directory if needed
    output_path = Path(output_file)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    with open(output_file, 'wb') as f:
        # Write magic number and version
        f.write(b'NNUE')
        f.write(struct.pack('<I', 1))  # version 1

        # Write each weight tensor in order
        for layer_name in ['l1.weight', 'l1.bias', 'l2.weight', 'l2.bias', 'l3.weight', 'l3.bias']:
            if layer_name not in state_dict:
                print(f"Error: Missing layer {layer_name}")
                sys.exit(1)
            tensor = state_dict[layer_name]

            # Convert to float32 and flatten
            data = tensor.float().flatten().cpu().numpy()

            # Write shape (allows validation on load)
            shape = list(tensor.shape)
            f.write(struct.pack('<I', len(shape)))
            for dim in shape:
                f.write(struct.pack('<I', dim))

            # Write flattened data as binary floats
            f.write(struct.pack(f'<{len(data)}f', *data))
            print(f"  {layer_name}: shape {shape}, {len(data)} floats")

    file_size_mb = output_path.stat().st_size / (1024**2)
    print(f"Weights exported to {output_file} ({file_size_mb:.2f} MB)")


if __name__ == "__main__":
    weights_file = "nnue_weights.pt"
    output_file = "../src/main/resources/nnue_weights.bin"
    if len(sys.argv) > 1:
        weights_file = sys.argv[1]
    if len(sys.argv) > 2:
        output_file = sys.argv[2]
    export_weights_to_binary(weights_file, output_file)
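For reference, the binary layout written above (magic `b'NNUE'`, little-endian `uint32` version, then per tensor a `uint32` rank, the dimensions, and float32 data) can be read back in Python for validation. This is a minimal sketch mirroring the writer, not part of the pipeline; the `read_binary_weights` name is hypothetical:

```python
import struct

def read_binary_weights(path):
    """Read tensors from the format written by export_weights_to_binary:
    b'NNUE', uint32 version, then per tensor a uint32 ndim, the dims,
    and the float32 payload. Returns a list of (shape, values) tuples."""
    tensors = []
    with open(path, 'rb') as f:
        if f.read(4) != b'NNUE':
            raise ValueError("bad magic number")
        (version,) = struct.unpack('<I', f.read(4))
        if version != 1:
            raise ValueError(f"unsupported version: {version}")
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            (ndim,) = struct.unpack('<I', header)
            shape = struct.unpack(f'<{ndim}I', f.read(4 * ndim))
            count = 1
            for dim in shape:
                count *= dim
            values = struct.unpack(f'<{count}f', f.read(4 * count))
            tensors.append((shape, values))
    return tensors
```

Running this against the exported `nnue_weights.bin` should yield six tensors whose shapes match the `Available layers` debug output printed by the exporter.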
Binary file not shown.
@@ -1,12 +1,12 @@
{
"version": 1,
"date": "2026-04-07T22:37:15.093371",
"num_positions": 1223,
"date": "2026-04-07T22:56:23.259658",
"num_positions": 2086,
"stockfish_depth": 12,
"epochs": 20,
"batch_size": 4096,
"learning_rate": 0.001,
"final_val_loss": 0.0162429828196764,
"final_val_loss": 0.016311248764395714,
"device": "cuda",
"checkpoint": null,
"notes": "Win rate vs classical eval: TBD (requires benchmark games)"
@@ -1,22 +0,0 @@
@echo off
REM NNUE Pipeline launcher from bot directory
setlocal
echo Launching NNUE Training Pipeline...
echo.
REM Check if we're in the right directory
if not exist "python" (
echo ERROR: python directory not found
echo Please run this script from the modules\bot directory
exit /b 1
)
REM Run the pipeline
cd python
call run_pipeline.bat
set RESULT=%ERRORLEVEL%
cd ..
exit /b %RESULT%
@@ -1,55 +0,0 @@
# NNUE Pipeline launcher for PowerShell (Windows)
Write-Host "Launching NNUE Training Pipeline..." -ForegroundColor Green
Write-Host ""
# Check if we're in the right directory
if (!(Test-Path "python")) {
Write-Host "ERROR: python directory not found" -ForegroundColor Red
Write-Host "Please run this script from the modules\bot directory" -ForegroundColor Red
exit 1
}
# Check for Stockfish
$stockfishPath = $env:STOCKFISH_PATH
if ($null -eq $stockfishPath -or $stockfishPath -eq "") {
Write-Host "Stockfish path not set. Trying to find in PATH..." -ForegroundColor Yellow
$stockfishPath = (Get-Command stockfish -ErrorAction SilentlyContinue).Source
if ($null -eq $stockfishPath) {
Write-Host "Stockfish not found in PATH" -ForegroundColor Yellow
Write-Host "Set STOCKFISH_PATH environment variable and try again:" -ForegroundColor Yellow
Write-Host ' $env:STOCKFISH_PATH = "C:\path\to\stockfish.exe"' -ForegroundColor Cyan
} else {
Write-Host "Found Stockfish: $stockfishPath" -ForegroundColor Green
$env:STOCKFISH_PATH = $stockfishPath
}
}
# Run the pipeline
Write-Host "Running pipeline from: $(Get-Location)\python" -ForegroundColor Cyan
Write-Host ""
Push-Location python
try {
# Use bash if available (Git Bash or WSL)
if (Get-Command bash -ErrorAction SilentlyContinue) {
Write-Host "Using bash script..." -ForegroundColor Cyan
bash ./run_pipeline.sh
} else {
Write-Host "Using batch script..." -ForegroundColor Cyan
& cmd.exe /c run_pipeline.bat
}
$result = $LASTEXITCODE
} finally {
Pop-Location
}
if ($result -eq 0) {
Write-Host ""
Write-Host "Pipeline completed successfully!" -ForegroundColor Green
} else {
Write-Host ""
Write-Host "Pipeline failed with exit code $result" -ForegroundColor Red
}
exit $result
@@ -1,21 +0,0 @@
#!/bin/bash
# NNUE Pipeline launcher from bot directory
echo "Launching NNUE Training Pipeline..."
echo ""
# Check if we're in the right directory
if [ ! -d "python" ]; then
echo "ERROR: python directory not found"
echo "Please run this script from the modules/bot directory"
exit 1
fi
# Run the pipeline
cd python
bash run_pipeline.sh
RESULT=$?
cd ..
exit $RESULT
Binary file not shown.
@@ -1,4 +1,4 @@
package de.nowchess.bot.bots.nnue
package de.nowchess.bot.bots
import de.nowchess.api.game.GameContext
import de.nowchess.api.move.Move
@@ -2,15 +2,58 @@ package de.nowchess.bot.bots.nnue
import de.nowchess.api.board.{Board, Color, File, PieceType, Rank, Square}
import de.nowchess.api.game.GameContext
import java.nio.ByteBuffer
import java.nio.ByteOrder
class NNUE:
private val l1Weights = NNUEWeights.l1_weights
private val l1Bias = NNUEWeights.l1_bias
private val l2Weights = NNUEWeights.l2_weights
private val l2Bias = NNUEWeights.l2_bias
private val l3Weights = NNUEWeights.l3_weights
private val l3Bias = NNUEWeights.l3_bias
private val (l1Weights, l1Bias, l2Weights, l2Bias, l3Weights, l3Bias) = loadWeights()
private def loadWeights(): (Array[Float], Array[Float], Array[Float], Array[Float], Array[Float], Array[Float]) =
val stream = getClass.getResourceAsStream("/nnue_weights.bin")
if stream == null then
throw RuntimeException("NNUE weights file not found in resources")
try
val bytes = stream.readAllBytes()
val buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
// Read and verify magic number
val magic = buffer.getInt()
if magic != 0x4555_4e4e then // "NNUE" in little-endian
throw RuntimeException(s"Invalid magic number: 0x${magic.toHexString}")
// Read version
val version = buffer.getInt()
if version != 1 then
throw RuntimeException(s"Unsupported weight version: $version")
// Read all weight tensors in order
val l1w = readTensor(buffer)
val l1b = readTensor(buffer)
val l2w = readTensor(buffer)
val l2b = readTensor(buffer)
val l3w = readTensor(buffer)
val l3b = readTensor(buffer)
(l1w, l1b, l2w, l2b, l3w, l3b)
finally stream.close()
private def readTensor(buffer: ByteBuffer): Array[Float] =
// Read shape
val shapeLen = buffer.getInt()
val shape = Array.ofDim[Int](shapeLen)
for i <- 0 until shapeLen do
shape(i) = buffer.getInt()
// Calculate total elements
val totalElements = shape.product
// Read float data
val floats = Array.ofDim[Float](totalElements)
for i <- 0 until totalElements do
floats(i) = buffer.getFloat()
floats
// Pre-allocated buffers for inference
private val features = new Array[Float](768)
@@ -19,7 +62,7 @@ class NNUE:
/** Convert a position to 768-dimensional binary feature vector.
* 12 piece types (white pawn to black king) × 64 squares from white's perspective. */
def positionToFeatures(board: Board, sideToMove: Color): Array[Float] =
private def positionToFeatures(board: Board, sideToMove: Color): Array[Float] =
// Zero out features array
java.util.Arrays.fill(features, 0f)
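The 768-dimensional encoding used by `positionToFeatures` (12 piece types × 64 squares) can be sketched in pure Python from a FEN's piece-placement field. The exact piece ordering and square indexing used by `train_nnue.py` and the Scala side are assumptions here; this is an illustration, not the pipeline's implementation:

```python
# Assumed ordering: white P,N,B,R,Q,K then black p,n,b,r,q,k; a1 = 0, h8 = 63.
PIECES = "PNBRQKpnbrqk"

def fen_to_features(fen):
    """One-hot encode a FEN's piece placement into a 768-entry vector."""
    features = [0.0] * 768
    placement = fen.split()[0]
    rank, file = 7, 0  # FEN lists ranks from 8 down to 1
    for ch in placement:
        if ch == '/':
            rank -= 1
            file = 0
        elif ch.isdigit():
            file += int(ch)  # run of empty squares
        else:
            square = rank * 8 + file
            features[PIECES.index(ch) * 64 + square] = 1.0
            file += 1
    return features
```

Under these conventions, the starting position sets exactly 32 features, one per piece on the board.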
@@ -1,39 +0,0 @@
package de.nowchess.bot.bots.nnue
object NNUEWeights:
// PLACEHOLDER: This file is generated by export_weights.py
// Run: python3 modules/bot/python/run_pipeline.sh to generate actual weights
// Layer 1: Input(768) -> Hidden(256)
val l1_weights = Array(
0f
)
// Shape: [256, 768]
val l1_bias = Array(
0f
)
// Shape: [256]
// Layer 2: Hidden(256) -> Hidden(32)
val l2_weights = Array(
0f
)
// Shape: [32, 256]
val l2_bias = Array(
0f
)
// Shape: [32]
// Layer 3: Hidden(32) -> Output(1)
val l3_weights = Array(
0f
)
// Shape: [1, 32]
val l3_bias = Array(
0f
)
// Shape: [1]
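The deleted placeholder above records the network's layer shapes: Linear(768→256) → ReLU → Linear(256→32) → ReLU → Linear(32→1), with weight matrices in PyTorch's `[out, in]` orientation. A minimal NumPy sketch of that forward pass, for illustration only (the engine's actual inference lives in the Scala `NNUE` class):

```python
import numpy as np

def nnue_forward(features, l1w, l1b, l2w, l2b, l3w, l3b):
    """Linear(768->256) -> ReLU -> Linear(256->32) -> ReLU -> Linear(32->1).
    Weight matrices are [out, in], matching the shapes in the comments above."""
    h1 = np.maximum(l1w @ features + l1b, 0.0)  # hidden layer 1, shape [256]
    h2 = np.maximum(l2w @ h1 + l2b, 0.0)        # hidden layer 2, shape [32]
    return (l3w @ h2 + l3b).item()              # scalar evaluation
```

With the all-zero placeholder weights this evaluates every position to 0, which is why the placeholder was only ever a compile-time stub.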
@@ -3,7 +3,7 @@ package de.nowchess.ui
import de.nowchess.api.board.Color.Black
import de.nowchess.bot.util.PolyglotBook
import de.nowchess.bot.BotDifficulty
import de.nowchess.bot.bots.ClassicalBot
import de.nowchess.bot.bots.{ClassicalBot, NNUEBot}
import de.nowchess.chess.engine.GameEngine
import de.nowchess.ui.terminal.TerminalUI
import de.nowchess.ui.gui.ChessGUILauncher
@@ -17,9 +17,9 @@ object Main:
val engine = new GameEngine()
val book = PolyglotBook("/home/janis/Workspaces/IntelliJ/NowChess/NowChessSystems/modules/bot/codekiddy.bin")
val book = PolyglotBook("../../modules/bot/codekiddy.bin")
engine.setOpponentBot(ClassicalBot(BotDifficulty.Easy, book = Some(book)), Black);
engine.setOpponentBot(NNUEBot(BotDifficulty.Easy, book = Some(book)), Black);
// Launch ScalaFX GUI in separate thread
ChessGUILauncher.launch(engine)