# Debugging the NNUE Pipeline

## Common Issues & Solutions

### Issue 1: Empty training_data.jsonl

**Symptom:** After running the pipeline, `training_data.jsonl` is empty or doesn't exist.

**Diagnosis:** Run labeling with verbose output:

```bash
python label_positions.py positions.txt training_data.jsonl /path/to/stockfish --verbose
```

**Check these in order:**

#### 1. Is `positions.txt` empty?

```bash
wc -l positions.txt
```

If 0 lines: the position generator is failing. See Issue 2.

If >0 lines: positions exist. Continue with step 2.

#### 2. Is Stockfish installed and working?

```bash
# Linux/macOS
which stockfish
stockfish --version

# Windows
where stockfish
C:\path\to\stockfish.exe --version
```

If not found: install it from https://stockfishchess.org

#### 3. Is the Stockfish path correct?

```bash
# Set the path explicitly and confirm its value
export STOCKFISH_PATH=/your/path/to/stockfish
echo "$STOCKFISH_PATH"

python label_positions.py positions.txt training_data.jsonl "$STOCKFISH_PATH" --verbose
```

The script prints the path it is using at the top of its output: `Using Stockfish: /path/to/stockfish`

#### 4. Check the error summary

After running with `--verbose`, look for the summary:

```
============================================================
LABELING SUMMARY
============================================================
Successfully evaluated: 0 ← This should be > 0
Skipped (duplicates): 0
Skipped (invalid): 0
Errors: 0
```

If "Successfully evaluated" is 0, no positions are being written to `training_data.jsonl`. The skipped and error counts show where they were lost. If every counter is 0, the labeler probably read no positions at all, so recheck step 1.

---

### Issue 2: Empty positions.txt

**Symptom:** `positions.txt` is empty after running `generate_positions.py`.

**Diagnosis:** Check the generation summary:

```bash
python generate_positions.py positions.txt --games 10000
```

Expected output:

```
============================================================
POSITION GENERATION SUMMARY
============================================================
Total games: 10000
Saved positions: 1234 ← This should be > 0
Filtered (check): 2345
Filtered (captures): 4321
Filtered (game over): 1100
Total filtered: 7766
Acceptance rate: 12.34%
============================================================
```

**If Saved positions = 0:**

The filters are probably too strict. Retry with `--no-filter-captures`:

```bash
python generate_positions.py positions.txt --games 10000 --no-filter-captures
```

This keeps positions in which a capture is available, which should greatly increase the output.

---

### Issue 3: Stockfish Errors During Labeling

**Symptom:** Labeling runs but shows errors like:

```
Error evaluating position: rnbqkbnr/pppppppp...
SomeError: [error details]
```

**Solutions:**

1. **Check Stockfish is responsive:**
   ```bash
   # Test Stockfish directly with a UCI handshake;
   # a working engine answers "uciok" and "readyok"
   printf 'uci\nisready\nquit\n' | stockfish
   ```

2. **Try with lower depth** (faster, fewer timeouts):
   ```bash
   python label_positions.py positions.txt training_data.jsonl /path/to/stockfish --depth 8
   ```

3. **Use explicit path** instead of relying on PATH:
   ```bash
   python label_positions.py positions.txt training_data.jsonl /usr/games/stockfish
   ```

4. **Check if FENs in positions.txt are valid:**
   ```bash
   head -5 positions.txt
   ```

   Output should look like:
   ```
   rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
   rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
   ```
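
Eyeballing the first five lines will not catch a malformed FEN further down the file. The following is a minimal sketch of a structural check, assuming `positions.txt` holds one standard six-field FEN per line. The names `is_plausible_fen` and `report_invalid` are hypothetical and not part of the pipeline. The check does not test whether a position is legal, and it rejects Chess960-style castling fields.

```python
import re

PIECES = set("pnbrqkPNBRQK")


def is_plausible_fen(fen: str) -> bool:
    """Structural check only: six fields, 8 ranks of 8 squares, sane flags."""
    fields = fen.split()
    if len(fields) != 6:
        return False
    board, side, castling, en_passant, halfmove, fullmove = fields

    ranks = board.split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        squares = 0
        for ch in rank:
            if ch in "12345678":
                squares += int(ch)  # a digit stands for that many empty squares
            elif ch in PIECES:
                squares += 1
            else:
                return False
        if squares != 8:
            return False

    if side not in ("w", "b"):
        return False
    if not re.fullmatch(r"-|K?Q?k?q?", castling):
        return False
    if not re.fullmatch(r"-|[a-h][36]", en_passant):
        return False
    if not re.fullmatch(r"[0-9]+", halfmove):
        return False
    return bool(re.fullmatch(r"[0-9]+", fullmove)) and int(fullmove) >= 1


def report_invalid(path: str) -> int:
    """Print every non-blank line that fails the check; return how many failed."""
    bad = 0
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            fen = line.strip()
            if fen and not is_plausible_fen(fen):
                bad += 1
                print(f"line {number}: {fen!r}")
    return bad


# Example call (assumes the file exists in the current directory):
# report_invalid("positions.txt")
```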

---

### Issue 4: Training Fails - No Valid Data

**Symptom:** `train_nnue.py` crashes with:

```
IndexError: list index out of range
```

**Cause:** `training_data.jsonl` is empty or contains invalid JSON.

**Debug:**

```bash
# Check file size
ls -lh training_data.jsonl

# Count parseable lines (stops with a JSONDecodeError at the first invalid or blank line)
python -c "import json; lines = [1 for line in open('training_data.jsonl') if json.loads(line)]; print(f'Valid lines: {len(lines)}')"

# Look at first few lines
head -3 training_data.jsonl
```

Expected output of `head` (one JSON object per line):

```
{"fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", "eval": 45}
{"fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1", "eval": 48}
```

If the file is empty, go back to Issue 1.

---

## Step-by-Step Verification

Run this to verify each step works:

```bash
#!/usr/bin/env bash
cd modules/bot/python || exit 1

# Step 1: Generate positions from 1000 games (quick test)
echo "Testing position generation..."
python generate_positions.py test_positions.txt --games 1000 --no-filter-captures

# Check output
if [ ! -s test_positions.txt ]; then
    echo "ERROR: test_positions.txt is empty"
    exit 1
fi
POSITIONS=$(wc -l < test_positions.txt)
echo "✓ Generated $POSITIONS positions"

# Step 2: Label positions (quick test with 100 positions)
echo "Testing Stockfish labeling..."
export STOCKFISH_PATH=$(which stockfish || which /usr/games/stockfish || echo "stockfish")
if ! command -v "$STOCKFISH_PATH" &> /dev/null; then
    echo "ERROR: Stockfish not found"
    echo "  Install: apt-get install stockfish (Linux) or brew install stockfish (Mac)"
    exit 1
fi

head -100 test_positions.txt > test_positions_100.txt
python label_positions.py test_positions_100.txt test_training_data.jsonl "$STOCKFISH_PATH" --depth 8

# Check output
if [ ! -s test_training_data.jsonl ]; then
    echo "ERROR: test_training_data.jsonl is empty"
    echo "  Running again with --verbose:"
    python label_positions.py test_positions_100.txt test_training_data.jsonl "$STOCKFISH_PATH" --depth 8 --verbose
    exit 1
fi
EVALS=$(wc -l < test_training_data.jsonl)
echo "✓ Evaluated $EVALS positions"

# Step 3: Test training
echo "Testing training..."
python train_nnue.py test_training_data.jsonl test_weights.pt --epochs 1 --batch-size 32 --no-versioning

if [ ! -f test_weights.pt ]; then
    echo "ERROR: training failed"
    exit 1
fi
echo "✓ Training works"

echo ""
echo "All tests passed! Pipeline is working correctly."
echo "You can now run the full pipeline with:"
echo "  ./run_pipeline.sh"
```

Save as `test_pipeline.sh` in the directory that contains `modules/` (the script changes into `modules/bot/python` itself) and run:

```bash
chmod +x test_pipeline.sh
./test_pipeline.sh
```

---

## Common Error Messages

### "Stockfish not found at stockfish"

```bash
# Set the full path
export STOCKFISH_PATH=/usr/games/stockfish
# Or on Windows:
set STOCKFISH_PATH=C:\stockfish\stockfish.exe
```
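
If it is unclear which binary would be picked up, the lookup can be reproduced in Python. This is a sketch, not the labeler's own logic: `find_stockfish` is a hypothetical helper, and it assumes the preference order is the `STOCKFISH_PATH` environment variable, then any explicit candidates, then a plain `stockfish` on `PATH`. Whether `label_positions.py` itself reads `STOCKFISH_PATH` is not confirmed here.

```python
import os
import shutil
from typing import Iterable, Optional


def find_stockfish(candidates: Iterable[str] = ()) -> Optional[str]:
    """Return the first candidate that resolves to an executable, else None."""
    ordered = [os.environ.get("STOCKFISH_PATH", ""), *candidates, "stockfish"]
    for candidate in ordered:
        if not candidate:
            continue  # skip an unset or empty environment variable
        # shutil.which accepts both bare command names and full paths
        resolved = shutil.which(candidate)
        if resolved:
            return resolved
    return None


if __name__ == "__main__":
    path = find_stockfish(["/usr/games/stockfish"])
    print(path if path else "Stockfish not found")
```

The printed path can then be passed as the third argument to `label_positions.py`.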

### "No such file or directory: positions.txt"

```bash
# Make sure you're in the right directory
cd modules/bot/python

# Or provide full path
python label_positions.py /full/path/to/positions.txt training_data.jsonl stockfish
```

### "JSONDecodeError" in training

```bash
# training_data.jsonl has invalid JSON
# Regenerate it:
rm training_data.jsonl
python label_positions.py positions.txt training_data.jsonl stockfish
```
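
Before deleting the file, it can be useful to see which lines are broken, since a single truncated record at the end usually points to an interrupted run rather than a labeling bug. The sketch below assumes each record is a JSON object with `fen` and `eval` keys, as in the sample output under Issue 4. `scan_jsonl` is a hypothetical helper, not part of the pipeline.

```python
import json
from typing import List, Tuple


def scan_jsonl(path: str) -> Tuple[int, List[Tuple[int, str]]]:
    """Return (number of valid records, [(line number, problem), ...])."""
    valid = 0
    problems: List[Tuple[int, str]] = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored rather than reported
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict) or "fen" not in record or "eval" not in record:
                problems.append((number, "missing 'fen' or 'eval'"))
                continue
            valid += 1
    return valid, problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        count, issues = scan_jsonl(sys.argv[1])
        print(f"Valid records: {count}")
        for line_number, problem in issues:
            print(f"line {line_number}: {problem}")
```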

### "CUDA out of memory"

```bash
# Reduce batch size
python train_nnue.py training_data.jsonl nnue_weights.pt --batch-size 1024
```

---

## Getting More Information

### Verbose Output

All scripts support `--verbose` for detailed debugging:

```bash
python label_positions.py positions.txt training_data.jsonl stockfish --verbose
```

This prints:
- Which Stockfish is being used
- Error details for each failed position
- Summary of what passed/failed/skipped

### File Size Checks

```bash
# Check all files
ls -lh positions.txt training_data.jsonl nnue_weights.pt

# Count lines
echo "Positions: $(wc -l < positions.txt)"
echo "Training data: $(wc -l < training_data.jsonl)"
```
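
The same check can be done from Python, for instance on a system without `wc`. This is a rough sketch with a hypothetical helper, `file_overview`. It reports a missing file as `None` instead of failing, and its line count is only meaningful for the text files, not for the binary weights file.

```python
import os
from typing import Dict, Iterable, Optional, Tuple


def file_overview(paths: Iterable[str]) -> Dict[str, Optional[Tuple[int, int]]]:
    """Map each path to (size in bytes, line count), or None if it is missing."""
    overview: Dict[str, Optional[Tuple[int, int]]] = {}
    for path in paths:
        if not os.path.isfile(path):
            overview[path] = None
            continue
        # Read in binary mode so no decoding errors can occur
        with open(path, "rb") as handle:
            lines = sum(1 for _ in handle)
        overview[path] = (os.path.getsize(path), lines)
    return overview


if __name__ == "__main__":
    for name, info in file_overview(["positions.txt", "training_data.jsonl"]).items():
        if info is None:
            print(f"{name}: missing")
        else:
            print(f"{name}: {info[0]} bytes, {info[1]} lines")
```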

### Quick Tests

```bash
# Test position generation (100 games)
python generate_positions.py test_pos.txt --games 100 --no-filter-captures

# Test Stockfish labeling (10 positions)
head -10 test_pos.txt > test_pos_10.txt
python label_positions.py test_pos_10.txt test_data_10.jsonl stockfish --depth 6

# Test training (on test data)
python train_nnue.py test_data_10.jsonl test_model.pt --epochs 1 --batch-size 8
```
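
The labeling test depends on the engine answering UCI commands, so it can help to test that in isolation first. The sketch below sends the standard UCI commands `uci`, `isready` and `quit` to a process and looks for the `uciok` and `readyok` replies. `engine_responds` is a hypothetical helper, and the 10-second timeout is an arbitrary choice.

```python
import subprocess
from typing import Sequence


def engine_responds(command: Sequence[str], timeout: float = 10.0) -> bool:
    """Send uci/isready/quit on stdin and look for uciok and readyok."""
    try:
        result = subprocess.run(
            list(command),
            input="uci\nisready\nquit\n",
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing or non-executable binary
        return False
    replies = {line.strip() for line in result.stdout.splitlines()}
    return "uciok" in replies and "readyok" in replies


# Example call (assumes a binary named "stockfish" is on PATH):
# print(engine_responds(["stockfish"]))
```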

---

## Pipeline Workflow with Debugging

```bash
# 1. Generate positions
python generate_positions.py positions.txt --games 100000 --no-filter-captures
# Should output: Saved positions: ~20000-40000 (depends on filter)

# 2. Label with Stockfish
export STOCKFISH_PATH=$(which stockfish)
python label_positions.py positions.txt training_data.jsonl $STOCKFISH_PATH --depth 10
# Should output: Successfully evaluated: > 0

# 3. Train model
python train_nnue.py training_data.jsonl nnue_weights.pt --epochs 5
# Should output: Training summary with version info

# 4. Export to Scala
python export_weights.py nnue_weights_v1.pt ../src/main/scala/de/nowchess/bot/bots/nnue/NNUEWeights.scala
# Should output: NNUEWeights.scala created

# 5. Compile Scala
cd ../..
./compile
# Should output: BUILD SUCCESSFUL
```

---

## Performance Monitoring

While labeling is running, monitor progress from another terminal. Both commands below print the running total of labeled positions every 5 seconds; the increase between two consecutive readings, divided by 5, is the rate in positions per second.

```bash
# In another terminal
watch -n 5 'wc -l modules/bot/python/training_data.jsonl'

# Or on macOS
while true; do echo "$(wc -l < modules/bot/python/training_data.jsonl) positions labeled"; sleep 5; done
```
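
To have the rate computed rather than read off by eye, the line count can be sampled at a fixed interval. This is a minimal sketch under the assumption that the labeler appends one record per line to `training_data.jsonl` while it runs. `count_lines` and `labeling_rate` are hypothetical helpers, and output buffering in the labeler may make the measured rate look uneven.

```python
import time
from typing import Iterator, Tuple


def count_lines(path: str) -> int:
    """Count lines in a file; a file that does not exist yet counts as 0."""
    try:
        with open(path, "rb") as handle:
            return sum(1 for _ in handle)
    except FileNotFoundError:
        return 0


def labeling_rate(path: str, interval: float = 5.0, samples: int = 3) -> Iterator[Tuple[int, float]]:
    """Yield (total lines, lines per second) once per interval."""
    previous = count_lines(path)
    for _ in range(samples):
        time.sleep(interval)
        current = count_lines(path)
        # Rate = new lines since the last sample / seconds between samples
        yield current, (current - previous) / interval
        previous = current


if __name__ == "__main__":
    for total, rate in labeling_rate("modules/bot/python/training_data.jsonl"):
        print(f"{total} positions labeled ({rate:.1f} per second)")
```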

---

## Still Stuck?

1. **Read the full output**: don't skip error messages
2. **Check file sizes**: `ls -lh` shows whether files are being created
3. **Run with `--verbose`**: shows exactly what's failing
4. **Test individual steps**: don't run the full pipeline, test the pieces
5. **Check Stockfish**: `stockfish --version` confirms it works

For more help, see:
- `README_NNUE.md`: complete pipeline docs
- `TRAINING_GUIDE.md`: training workflows
- `INCREMENTAL_TRAINING.md`: versioning & checkpoints