Add GameSource, which normalises game records into a shared schema and
selects the backend via NOWCHESS_PGN_PATH. Unset: PostgreSQL game_records
(behaviour unchanged). Set: a Lichess PGN dump (local file or http(s) URL).
- Parse Lichess PGN with Spark SQL string functions only (no UDFs).
- URLs are fetched once via SparkContext.addFile and distributed to executors.
- .pgn.zst is decompressed in-process via zstd-jni; the resulting plain .pgn
is then redistributed.
- All four batch jobs read through GameSource and skip JDBC write-back
in PGN mode (Parquet/CSV output only).
Enables driving the analytics demo straight from
https://database.lichess.org standard dumps.
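A minimal sketch of the backend selection described above, for reviewers.
Backend, selectBackend and writesBackToJdbc are hypothetical names, not the
actual GameSource API, and treating a blank value as unset is an assumption.

```scala
// Illustrative only: these names are not taken from the GameSource code.
enum Backend:
  case Postgres                // read the game_records table over JDBC
  case PgnFile(path: String)   // Lichess PGN dump on the local filesystem
  case PgnUrl(url: String)     // Lichess PGN dump behind an http(s) URL

// Pick the backend from NOWCHESS_PGN_PATH.
// Assumption: an empty or whitespace-only value counts as unset.
def selectBackend(env: Map[String, String] = sys.env): Backend =
  env.get("NOWCHESS_PGN_PATH").map(_.trim).filter(_.nonEmpty) match
    case None => Backend.Postgres
    case Some(p) if p.startsWith("http://") || p.startsWith("https://") =>
      Backend.PgnUrl(p)
    case Some(p) => Backend.PgnFile(p)

// In PGN mode the jobs skip the JDBC write-back (Parquet/CSV only).
def writesBackToJdbc(backend: Backend): Boolean =
  backend == Backend.Postgres
```

Taking the environment as a parameter keeps the selection testable without
mutating the process environment.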
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
apache/spark:3.5.4-scala2.13-java17-ubuntu does not exist on Docker Hub;
the oldest available scala2.13 image is 4.0.3. Bump the compileOnly Spark
deps and the Dockerfile base image to 4.0.3 to match.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each batch job now writes its results to a Postgres table in addition to
the existing Parquet/CSV output. OpeningBookJob → analytics_opening_stats,
PlayerStatsJob → analytics_player_stats, PlayerClusteringJob →
analytics_player_clusters + analytics_cluster_archetypes, PlayerGraphJob
→ analytics_player_graph. MLlib Vector columns are excluded from the JDBC
write by reusing the already-selected scalar DataFrame in
PlayerClusteringJob.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pin jar output to analytics.jar (no version suffix) so Dockerfile COPY is stable
- Add Dockerfile based on apache/spark:3.5.4-scala2.13-java17-ubuntu
- Add versions.env (0.1.0) matching GitOps overlay image tag
- Add analytics-image.yml CI workflow following native-image.yml conventions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New standalone modules:analytics submodule with two Spark jobs:
- OpeningBookJob: reads game_records.pgn, extracts first N plies using
pure Catalyst SQL expressions (no UDFs), aggregates win/draw/loss rates
per opening sequence, writes Parquet + CSV top-1000 summary.
- PlayerStatsJob: unions each game into a player-centric view, aggregates
total_games/wins/losses/draws/avg_move_count/win_rate per player_id,
writes Parquet.
Module uses Scala 3 and calls spark-sql_2.13 through Scala 3's binary
compatibility with Scala 2.13 artifacts (DataFrame API only; no
spark.implicits._ / typed Datasets). Spark is compileOnly; the fat jar
bundles only scala3-library + postgresql driver.
Submit via spark-submit; see build.gradle.kts header for invocation.
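The player-centric union in PlayerStatsJob could be sketched roughly as
below, using only the untyped DataFrame API. The input column names
(white_id, black_id, result, move_count) and the result encoding are
assumptions for illustration; the real job derives them from game_records.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, count, lit, sum, when}

// Assumed (hypothetical) input columns: white_id, black_id,
// result ("1-0" | "0-1" | "1/2-1/2"), move_count.
def playerStats(games: DataFrame): DataFrame = {
  // One row per game, seen from the perspective of the player in idCol.
  def side(idCol: String, winResult: String, lossResult: String): DataFrame =
    games.select(
      col(idCol).as("player_id"),
      when(col("result") === winResult, 1).otherwise(0).as("win"),
      when(col("result") === lossResult, 1).otherwise(0).as("loss"),
      when(col("result") === "1/2-1/2", 1).otherwise(0).as("draw"),
      col("move_count")
    )

  // Union the white and black views, then aggregate per player.
  side("white_id", "1-0", "0-1")
    .unionByName(side("black_id", "0-1", "1-0"))
    .groupBy("player_id")
    .agg(
      count(lit(1)).as("total_games"),
      sum("win").as("wins"),
      sum("loss").as("losses"),
      sum("draw").as("draws"),
      avg("move_count").as("avg_move_count")
    )
    .withColumn("win_rate", col("wins") / col("total_games"))
}
```

Each game contributes two rows, so total_games counts games played by a
player regardless of colour.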
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>