
# NNUE Training Pipeline

This directory contains the complete NNUE (Efficiently Updatable Neural Network) training pipeline for the Now-Chess bot.

## Overview

The pipeline generates 500,000 random chess positions, evaluates them with Stockfish, trains a neural network, and exports the weights as Scala code for integration into the engine.

## Prerequisites

Install Python dependencies:

```bash
pip install -r requirements.txt
```

Ensure Stockfish is installed. You can:
- Install via package manager: `apt-get install stockfish` (Linux) or `brew install stockfish` (macOS)
- Or download from [stockfishchess.org](https://stockfishchess.org)

Set the Stockfish path:
```bash
export STOCKFISH_PATH=/path/to/stockfish
```
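
If a script needs to locate the binary itself, the lookup could be sketched as below. This is a minimal sketch, not the pipeline's actual code: `resolve_stockfish_path` is a hypothetical helper, and falling back to a `stockfish` binary on `PATH` is an assumption.

```python
import os
import shutil


def resolve_stockfish_path(env=None):
    """Return the Stockfish binary path, or None if it cannot be found.

    Hypothetical helper: the pipeline scripts may resolve the path differently.
    """
    env = os.environ if env is None else env
    # Prefer the explicit STOCKFISH_PATH variable described above.
    explicit = env.get("STOCKFISH_PATH")
    if explicit:
        return explicit
    # Assumed fallback: whatever `stockfish` is on PATH (None if absent).
    return shutil.which("stockfish")
```

A caller would typically abort with a clear message when the function returns `None`.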

## Pipeline Steps

### Quick Run

Run the entire pipeline:

```bash
chmod +x run_pipeline.sh
./run_pipeline.sh
```

This automatically runs all 4 steps in sequence and confirms each succeeds before continuing.

### Individual Steps

#### Step 1: Generate Positions

Generate 500,000 random chess positions:

```bash
python3 generate_positions.py positions.txt
```

Output: `positions.txt` (one FEN per line)
- Plays 8-20 random opening moves
- Filters out positions that are in check, have a capture available, or are game over
- Shows progress bar with tqdm

#### Step 2: Label with Stockfish

Evaluate each position with Stockfish at depth 12:

```bash
export STOCKFISH_PATH=/path/to/stockfish
python3 label_positions.py positions.txt training_data.jsonl $STOCKFISH_PATH
```

Output: `training_data.jsonl` (one JSON per line)
- Format: `{"fen": "...", "eval": 123}` (centipawns)
- Evals clamped to [-2000, 2000] to avoid mate score outliers
- Supports resuming if interrupted (checks for existing entries)
- Shows progress bar with tqdm
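
The clamping and resume behavior listed above can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of `label_positions.py`: the helper names are hypothetical, and resuming is assumed to work by skipping FENs already present in the output file.

```python
import json
import os

EVAL_LIMIT = 2000  # centipawns, matching the clamp range above


def clamp_eval(cp):
    """Clamp a centipawn score to [-EVAL_LIMIT, EVAL_LIMIT]."""
    return max(-EVAL_LIMIT, min(EVAL_LIMIT, cp))


def load_labeled_fens(path):
    """Collect FENs already present in the output file (empty set if none)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["fen"] for line in f if line.strip()}


def append_label(path, fen, cp):
    """Append one {"fen": ..., "eval": ...} record as a JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"fen": fen, "eval": clamp_eval(cp)}) + "\n")
```

On restart, a labeler built this way would call `load_labeled_fens` once and skip every input FEN found in the returned set.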

**Note:** This step is slow (~24-36 hours for 500K positions at depth 12). For testing, reduce the number of positions or use a lower depth.

#### Step 3: Train NNUE Model

Train the neural network:

```bash
python3 train_nnue.py training_data.jsonl nnue_weights.pt
```

Output: `nnue_weights.pt` (PyTorch model weights)

Architecture:
- Input: 768 binary features (12 piece types × 64 squares)
- Hidden 1: 256 neurons + ReLU
- Hidden 2: 32 neurons + ReLU
- Output: 1 neuron (sigmoid applied to eval/400)
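
To make the 768-feature input concrete, here is one possible encoding of a FEN string. The index layout (`piece_index * 64 + square`, white pieces before black, a1 = 0) is an assumption for illustration; the layout actually used by `train_nnue.py` may differ, and the Scala side must match whatever the trainer uses.

```python
PIECES = "PNBRQKpnbrqk"  # assumed ordering: white pieces first, then black


def fen_to_feature_indices(fen):
    """Return the sorted indices of the active features for a FEN string."""
    placement = fen.split()[0]  # only the piece-placement field is needed
    indices = []
    for rank_offset, row in enumerate(placement.split("/")):
        rank = 7 - rank_offset  # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # a digit means that many empty squares
            else:
                square = rank * 8 + file  # a1 = 0, h8 = 63
                indices.append(PIECES.index(ch) * 64 + square)
                file += 1
    return sorted(indices)


def fen_to_features(fen):
    """Return the dense 768-element binary feature vector."""
    vec = [0.0] * 768
    for i in fen_to_feature_indices(fen):
        vec[i] = 1.0
    return vec
```

Under this layout the starting position activates 32 features, one per piece on the board.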

Training:
- 20 epochs, batch size 4096, Adam optimizer (lr=1e-3)
- 90% train / 10% validation split
- Saves best weights by validation loss
- Shows train/val loss per epoch
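
The target scaling and the data split can be sketched as follows. This assumes one reading of "sigmoid applied to eval/400", namely that centipawn labels are mapped into (0, 1) before the loss is computed; the function names and the fixed shuffle seed are hypothetical.

```python
import math
import random

SCALE = 400.0  # divisor from the architecture notes above


def eval_to_target(cp):
    """Map a centipawn eval to (0, 1) via sigmoid(cp / SCALE)."""
    return 1.0 / (1.0 + math.exp(-cp / SCALE))


def target_to_eval(p):
    """Inverse mapping: recover centipawns from a value in (0, 1)."""
    return SCALE * math.log(p / (1.0 - p))


def train_val_split(rows, val_fraction=0.1, seed=0):
    """Shuffle deterministically, then split into (train, validation)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]
```

If the network is trained on such targets, inference has to apply the inverse mapping to get back to centipawns.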

**Note:** A GPU is recommended for reasonable speed (~2-4 hours). Training on CPU takes ~8-16 hours.

#### Step 4: Export to Scala

Export weights as Scala code:

```bash
python3 export_weights.py nnue_weights.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
```

Output: `NNUEWeights.scala`
- Object with `val` arrays for each layer's weights and biases
- Format: `Array[Float]` with precision sufficient for inference
- Includes shape comments for reference
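
The output format described above could be produced along these lines. This is a sketch under assumptions: the `val` names, the Scala 3 indentation style, and the 9-significant-digit formatting are illustrative and not taken from `export_weights.py`.

```python
def to_scala_array(name, values, shape):
    """Render a flat list of floats as a Scala `val` with a shape comment."""
    # 9 significant digits are enough to round-trip a 32-bit float.
    body = ", ".join(f"{float(v):.9g}f" for v in values)
    dims = " x ".join(str(d) for d in shape)
    return f"  // shape: {dims}\n  val {name}: Array[Float] = Array({body})"


def to_scala_object(object_name, package, arrays):
    """Render {name: (flat_values, shape)} as one Scala object."""
    lines = [f"package {package}", "", f"object {object_name}:"]
    for name, (values, shape) in arrays.items():
        lines.append(to_scala_array(name, values, shape))
    return "\n".join(lines) + "\n"
```

For example, `to_scala_object("NNUEWeights", "de.nowchess.bot.bots.nnue", {"l3Bias": ([0.5], (1,))})` yields an object containing a single one-element array.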

## Scala Integration

### Step 5: NNUE Evaluator

Create `NNUE.scala` in `src/main/scala/de/nowchess/bot/bots/nnue/`:

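```scala
package de.nowchess.bot.bots.nnue

class NNUE:
  // TODO: load weights from NNUEWeights.scala
  // TODO: convert Position to a 768-feature vector
  // TODO: run inference: l1 → ReLU → l2 → ReLU → l3
  // Placeholder with a hypothetical signature so the skeleton compiles;
  // the real method should return a centipawn score.
  def evaluate(features: Array[Float]): Int = ???
```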
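
When porting inference to Scala, a small reference implementation helps to cross-check outputs. The sketch below shows the l1→ReLU→l2→ReLU→l3 pass in plain Python. It assumes each weight matrix is stored as one row per output neuron, which may not match the exported layout, and it returns the raw network output without any conversion to centipawns.

```python
def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output neuron."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def forward(layers, x):
    """layers = [(W1, b1), (W2, b2), (W3, b3)]; returns the raw output."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(linear(w1, b1, x))
    h2 = relu(linear(w2, b2, h1))
    return linear(w3, b3, h2)[0]
```

Feeding the same feature vector to this function and to the Scala evaluator should give matching values up to floating-point rounding.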

### Step 6: Integration

Implement `NNUEBot` that uses the NNUE evaluator for move selection.

## File Reference

| File | Purpose |
|------|---------|
| `requirements.txt` | Python dependencies |
| `generate_positions.py` | Step 1: Position generator |
| `label_positions.py` | Step 2: Stockfish labeler |
| `train_nnue.py` | Step 3: NNUE trainer |
| `export_weights.py` | Step 4: Weight exporter |
| `run_pipeline.sh` | Master script (runs steps 1-4) |
| `positions.txt` | Output: Raw FENs (500K) |
| `training_data.jsonl` | Output: FEN+eval pairs |
| `nnue_weights.pt` | Output: Trained weights |
| `../src/main/scala/.../NNUEWeights.scala` | Output: Scala weights |

## Tips

- **For testing:** Reduce the position count in `generate_positions.py` to 10,000 for quick iteration
- **Resume labeling:** Run step 2 again; it skips already-evaluated positions
- **GPU acceleration:** Install CUDA for PyTorch to speed up training
- **Stockfish tuning:** Lower depth (e.g., 8 instead of 12) for faster labeling
- **Batch size:** Increase to 8192 if memory allows; decrease if you run out of memory

## Troubleshooting

**ImportError: No module named 'chess'**
- Run: `pip install -r requirements.txt`

**Stockfish not found**
- Check: `which stockfish` or set `export STOCKFISH_PATH=/full/path/to/stockfish`

**CUDA out of memory**
- Reduce batch size in `train_nnue.py` (e.g., 2048)
- Or train on CPU: bypass the CUDA check in `train_nnue.py` and set the device to CPU

**Training loss not decreasing**
- Check data quality: Sample some entries from `training_data.jsonl`
- Experiment with the learning rate: try a higher value (1e-2) or a lower one (5e-4)
- Verify Stockfish depth was sufficient (depth ≥ 10)
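
A quick way to check data quality is to summarize the label distribution. The helper below is a hypothetical sketch that reads the JSONL format from step 2; a large clamped fraction or a tiny eval range would suggest a labeling problem.

```python
import json


def summarize_evals(path, limit=2000):
    """Return basic statistics over the "eval" field of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        evals = [json.loads(line)["eval"] for line in f if line.strip()]
    if not evals:
        return {"count": 0}
    n = len(evals)
    return {
        "count": n,
        "min": min(evals),
        "max": max(evals),
        "mean": sum(evals) / n,
        # Share of labels sitting exactly on the clamp boundary.
        "clamped_fraction": sum(abs(e) >= limit for e in evals) / n,
    }
```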

## References

- [NNUE Overview](https://www.chessprogramming.org/NNUE)
- [python-chess](https://python-chess.readthedocs.io/)
- [PyTorch](https://pytorch.org/)
- [Stockfish](https://stockfishchess.org/)