feat(analytics): add Spark batch analytics module #70

Merged
Janis merged 1 commit from feat/tournament-external-servers-bot-bridge into main 2026-06-16 20:38:16 +02:00
No description provided.
Janis added 5 commits 2026-06-16 12:53:04 +02:00
New standalone modules:analytics submodule with two Spark jobs:

- OpeningBookJob: reads game_records.pgn, extracts first N plies using
  pure Catalyst SQL expressions (no UDFs), aggregates win/draw/loss rates
  per opening sequence, writes Parquet + CSV top-1000 summary.

- PlayerStatsJob: unions each game into a player-centric view, aggregates
  total_games/wins/losses/draws/avg_move_count/win_rate per player_id,
  writes Parquet.
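
The player-centric union described for PlayerStatsJob can be sketched without Spark, using plain Scala collections. This is only an illustration: the `Game` shape, the field names, and the PGN-style result strings are assumptions, since the actual game_records schema is not shown in this PR.

```scala
// Hypothetical game shape; the real game_records columns may differ.
final case class Game(whiteId: String, blackId: String, result: String, moveCount: Int)

// One row per (player, game): the "player-centric view".
final case class PlayerGame(playerId: String, outcome: String, moveCount: Int)

final case class PlayerStats(
    totalGames: Int,
    wins: Int,
    losses: Int,
    draws: Int,
    avgMoveCount: Double,
    winRate: Double
)

// Assumes PGN-style results: "1-0", "0-1", anything else counts as a draw.
def playerView(g: Game): Seq[PlayerGame] =
  val (white, black) = g.result match
    case "1-0" => ("win", "loss")
    case "0-1" => ("loss", "win")
    case _     => ("draw", "draw")
  Seq(
    PlayerGame(g.whiteId, white, g.moveCount),
    PlayerGame(g.blackId, black, g.moveCount)
  )

// Equivalent of the union followed by a groupBy(player_id) aggregation.
def playerStats(games: Seq[Game]): Map[String, PlayerStats] =
  games.flatMap(playerView).groupBy(_.playerId).map { (id, rows) =>
    val total = rows.size
    val wins  = rows.count(_.outcome == "win")
    id -> PlayerStats(
      totalGames   = total,
      wins         = wins,
      losses       = rows.count(_.outcome == "loss"),
      draws        = rows.count(_.outcome == "draw"),
      avgMoveCount = rows.map(_.moveCount).sum.toDouble / total,
      winRate      = wins.toDouble / total
    )
  }
```

In the Spark job the same shape would presumably be expressed as two projections of the games DataFrame combined with a union and then aggregated per player_id.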

The module is written in Scala 3 and calls spark-sql_2.13 through Scala 3's
ability to consume Scala 2.13 artifacts (DataFrame API only; no
spark.implicits._ / typed Datasets). Spark is compileOnly; the fat jar
bundles only scala3-library + postgresql driver.
Submit via spark-submit; see build.gradle.kts header for invocation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three new Spark jobs demonstrating complementary Spark pillars:

LiveDashboardJob (Structured Streaming):
- Simulates NowChess game-over event stream via rate source
- Watermarking (45 s late-data tolerance)
- Tumbling 1-min windows → append-mode Parquet output
- Sliding 5-min/1-min windows → update-mode console output
- Checkpointing for fault tolerance (exactly-once for the Parquet file sink; the console sink is debug-only)
- Production wiring comments show Kafka / spark-redis swap-in
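
To make the difference between the tumbling and sliding windows above concrete, here is a minimal sketch of how an event timestamp maps to windows. It is plain Scala rather than Structured Streaming code; the `Window` type and function names are made up for illustration, and times are assumed to be epoch seconds.

```scala
// Half-open interval [start, end), in epoch seconds.
final case class Window(start: Long, end: Long)

// Tumbling windows do not overlap: each event falls into exactly one.
def tumblingWindow(ts: Long, sizeSec: Long = 60): Window =
  val start = Math.floorDiv(ts, sizeSec) * sizeSec
  Window(start, start + sizeSec)

// Sliding windows overlap: with a 5-min size and a 1-min slide,
// each event falls into size / slide = 5 windows.
def slidingWindows(ts: Long, sizeSec: Long = 300, slideSec: Long = 60): Seq[Window] =
  val lastStart = Math.floorDiv(ts, slideSec) * slideSec
  // Walk back one slide at a time while the window still covers ts.
  Iterator
    .iterate(lastStart)(_ - slideSec)
    .takeWhile(start => start + sizeSec > ts)
    .map(start => Window(start, start + sizeSec))
    .toSeq
    .reverse
```

For example, an event at t = 1000 s lands in the single tumbling window [960, 1020) and in the five sliding windows starting at 720, 780, 840, 900 and 960. This overlap is why the sliding aggregation suits update-mode output, where each window's row is revised as more events arrive.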

PlayerClusteringJob (MLlib):
- Derives 4 player features from game_records via JDBC
- VectorAssembler + StandardScaler + KMeans inside a Pipeline
- ClusteringEvaluator (silhouette score) to measure quality
- Per-cluster archetype averages show what each tier represents

PlayerGraphJob (GraphX):
- Builds directed player graph (vertices=players, edges=games)
- PageRank — identifies most influential/active players
- ConnectedComponents — finds isolated player communities
- Bridges GraphX RDD results back to DataFrames via explicit schema
  (avoids spark.implicits._, which breaks interop between Scala 3 and Spark's Scala 2.13 build)
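
As an illustration of what the ConnectedComponents step computes, the following sketch labels every player with the smallest vertex id in its component, which is the labelling GraphX uses. It is a plain union-find in Scala, not the GraphX implementation, and the function name and input shapes are assumptions.

```scala
import scala.collection.mutable

// Edges are (playerA, playerB) pairs, one per game. Direction is ignored,
// because connected components treat the graph as undirected.
// Every edge endpoint is assumed to be present in `vertices`.
def connectedComponents(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] =
  val parent = mutable.Map.from(vertices.map(v => v -> v))

  // Find the root of v, compressing the path along the way.
  def find(v: Long): Long =
    val p = parent(v)
    if p == v then v
    else
      val root = find(p)
      parent(v) = root
      root

  for (a, b) <- edges do
    val ra = find(a)
    val rb = find(b)
    // Keep the smaller id as root so each component's label is its minimum id.
    if ra != rb then parent(math.max(ra, rb)) = math.min(ra, rb)

  vertices.map(v => v -> find(v)).toMap
```

Players who never met, directly or through a chain of opponents, end up with different labels; grouping by label yields the isolated communities mentioned above.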

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pin jar output to analytics.jar (no version suffix) so Dockerfile COPY is stable
- Add Dockerfile based on apache/spark:3.5.4-scala2.13-java17-ubuntu
- Add versions.env (0.1.0) matching GitOps overlay image tag
- Add analytics-image.yml CI workflow following native-image.yml conventions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each batch job now writes its results to a Postgres table in addition to
the existing Parquet/CSV output. OpeningBookJob → analytics_opening_stats,
PlayerStatsJob → analytics_player_stats, PlayerClusteringJob →
analytics_player_clusters + analytics_cluster_archetypes, PlayerGraphJob
→ analytics_player_graph. MLlib Vector columns are excluded from the JDBC
write by reusing the already-selected scalar DataFrame in
PlayerClusteringJob.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(tournament): add external server registry and official-bot bridge
Build & Test (NowChessSystems) TeamCity build finished
3718f7669a
TournamentServerRegistry holds multiple external tournament servers in
memory; ExternalTournamentClient proxies all read/write/stream calls to
them. TournamentResource fans out list() across registered servers and
routes per-tournament calls to the owning server via a tournamentId cache.
TournamentServerResource exposes GET/POST/DELETE /api/tournament/servers.
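
The fan-out and tournamentId-cache routing could look roughly like the sketch below. The class and method names here (`ServerClient`, `Registry`, `listAll`, `ownerOf`) are hypothetical stand-ins, not the actual TournamentServerRegistry or ExternalTournamentClient API, and the remote call is replaced by an in-memory list.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a client of one external tournament server.
final case class ServerClient(baseUrl: String, tournaments: Seq[String]):
  def list(): Seq[String] = tournaments

final class Registry:
  private val servers = TrieMap.empty[String, ServerClient] // serverId -> client
  private val owners  = TrieMap.empty[String, String]       // tournamentId -> serverId

  def register(serverId: String, client: ServerClient): Unit =
    servers.update(serverId, client)

  def unregister(serverId: String): Unit =
    servers.remove(serverId)
    // Drop cache entries that pointed at the removed server.
    owners.filterInPlace((_, owner) => owner != serverId)

  // Fan out across every registered server and remember who owns what.
  def listAll(): Seq[String] =
    servers.toSeq.flatMap { (serverId, client) =>
      val ids = client.list()
      ids.foreach(id => owners.update(id, serverId))
      ids
    }

  // Route a per-tournament call to the owning server, if it is known.
  def ownerOf(tournamentId: String): Option[ServerClient] =
    owners.get(tournamentId).flatMap(servers.get)
```

Because the registry lives in memory, both maps are lost on restart; in this sketch the cache is only repopulated by the next `listAll()` call.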

Official-bot bridge: TournamentBotGamePlayer gains a runtime joinTournament
API (registers fresh bot identity, starts background stream loop).
TournamentJoinResource exposes POST /api/bots/official/join-tournament so
the UI can add official bots without a server restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Janis merged commit 39f1657e1d into main 2026-06-16 20:38:16 +02:00
Janis deleted branch feat/tournament-external-servers-bot-bridge 2026-06-16 20:38:16 +02:00
No Reviewers
No Label
1 Participant
Reference: NowChess/NowChessSystems#70