NowChessSystems

Author	SHA1	Message	Date
Janis	6351a19b67	feat(analytics): feed Lichess PGN dumps into Spark batch jobs Add GameSource: normalises game records into a shared schema and selects backend via NOWCHESS_PGN_PATH. Unset = PostgreSQL game_records (unchanged); set = a Lichess PGN dump (file or http(s) URL). - Parse Lichess PGN with Spark SQL string functions only (no UDFs). - URLs fetched once via SparkContext.addFile, distributed to executors. - .pgn.zst decompressed in-process via zstd-jni, plain .pgn redistributed. - All four batch jobs read through GameSource and skip JDBC write-back in PGN mode (Parquet/CSV output only). Enables driving the analytics demo straight from https://database.lichess.org standard dumps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 10:40:29 +02:00
TeamCity	9e800ecb59	ci: bump version with Build-124	2026-06-16 19:41:52 +00:00
Janis Eccarius	46af1154de	fix(analytics): upgrade Spark to 4.0.3 — 3.5.x has no official Docker image apache/spark:3.5.4-scala2.13-java17-ubuntu does not exist on Docker Hub. Oldest available scala2.13 image is 4.0.3. Bump compileOnly deps and Dockerfile base image to match. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-16 20:08:29 +02:00
Janis Eccarius	0e0ea4c989	feat(analytics): add PostgreSQL JDBC write-back to all four batch jobs Each batch job now writes its results to a Postgres table in addition to the existing Parquet/CSV output. OpeningBookJob → analytics_opening_stats, PlayerStatsJob → analytics_player_stats, PlayerClusteringJob → analytics_player_clusters + analytics_cluster_archetypes, PlayerGraphJob → analytics_player_graph. MLlib Vector columns are excluded from the JDBC write by reusing the already-selected scalar DataFrame in PlayerClusteringJob. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-15 22:35:30 +02:00
Janis Eccarius	95215b6a42	feat(analytics): add Dockerfile, CI workflow, and stable jar name for K8s deployment - Pin jar output to analytics.jar (no version suffix) so Dockerfile COPY is stable - Add Dockerfile based on apache/spark:3.5.4-scala2.13-java17-ubuntu - Add versions.env (0.1.0) matching GitOps overlay image tag - Add analytics-image.yml CI workflow following native-image.yml conventions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-15 22:30:31 +02:00
Janis Eccarius	e1d80b9331	feat(analytics): add Structured Streaming, MLlib clustering, GraphX jobs Three new Spark jobs demonstrating complementary Spark pillars: LiveDashboardJob (Structured Streaming): - Simulates NowChess game-over event stream via rate source - Watermarking (45 s late-data tolerance) - Tumbling 1-min windows → append-mode Parquet output - Sliding 5-min/1-min windows → update-mode console output - Checkpointing for exactly-once fault tolerance - Production wiring comments show Kafka / spark-redis swap-in PlayerClusteringJob (MLlib): - Derives 4 player features from game_records via JDBC - VectorAssembler + StandardScaler + KMeans inside a Pipeline - ClusteringEvaluator (silhouette score) to measure quality - Per-cluster archetype averages show what each tier represents PlayerGraphJob (GraphX): - Builds directed player graph (vertices=players, edges=games) - PageRank — identifies most influential/active players - ConnectedComponents — finds isolated player communities - Bridges GraphX RDD results back to DataFrames via explicit schema (avoids spark.implicits._ which breaks Scala 3 → Spark 2.13 interop) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-15 22:15:24 +02:00
Janis Eccarius	259b3bbb24	feat(analytics): add Spark batch analytics module New standalone modules:analytics submodule with two Spark jobs: - OpeningBookJob: reads game_records.pgn, extracts first N plies using pure Catalyst SQL expressions (no UDFs), aggregates win/draw/loss rates per opening sequence, writes Parquet + CSV top-1000 summary. - PlayerStatsJob: unions each game into a player-centric view, aggregates total_games/wins/losses/draws/avg_move_count/win_rate per player_id, writes Parquet. Module uses Scala 3 calling spark-sql_2.13 via JVM binary compatibility (DataFrame API only; no spark.implicits._ / typed Datasets). Spark is compileOnly; the fat jar bundles only scala3-library + postgresql driver. Submit via spark-submit; see build.gradle.kts header for invocation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-15 21:58:05 +02:00

7 Commits