NowChessSystems

NowChess/NowChessSystems


Commit Graph


Author / SHA1 / Message / Date

Janis Eccarius

0e0ea4c989

feat(analytics): add PostgreSQL JDBC write-back to all four batch jobs

Each batch job now writes its results to a Postgres table in addition to
the existing Parquet/CSV output:

- OpeningBookJob → analytics_opening_stats
- PlayerStatsJob → analytics_player_stats
- PlayerClusteringJob → analytics_player_clusters + analytics_cluster_archetypes
- PlayerGraphJob → analytics_player_graph

MLlib Vector columns, which have no JDBC column type, are excluded from the
write by reusing the already-selected scalar DataFrame in
PlayerClusteringJob.
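The dual write can be sketched as below, assuming each job ends with a DataFrame and the PostgreSQL JDBC driver is on the classpath. Only the table name comes from the commit; the environment-variable names, default URL, and column names are hypothetical. SaveMode.Overwrite combined with truncate=true empties the existing table instead of dropping and recreating it, so hand-written DDL (indexes, grants) survives each rerun.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

// Sketch only: env-var names, default URL and column names are hypothetical.
object JdbcWriteBack:

  val jdbcUrl: String =
    sys.env.getOrElse("ANALYTICS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")

  def jdbcProps(): Properties =
    val props = new Properties()
    props.setProperty("user", sys.env.getOrElse("ANALYTICS_DB_USER", "nowchess"))
    props.setProperty("password", sys.env.getOrElse("ANALYTICS_DB_PASSWORD", ""))
    props.setProperty("driver", "org.postgresql.Driver")
    props

  /** Keeps the existing Parquet output and mirrors the same rows into a Postgres table. */
  def persist(df: DataFrame, parquetPath: String, table: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(parquetPath)
    df.write
      .mode(SaveMode.Overwrite)
      .option("truncate", "true") // TRUNCATE instead of DROP, so the table's DDL survives
      .jdbc(jdbcUrl, table, jdbcProps())

  /** Vector columns (e.g. "features") have no JDBC type, so only scalar columns are written. */
  def persistClusters(clustered: DataFrame, parquetPath: String): Unit =
    val scalar = clustered.select(col("player_id"), col("cluster"), col("games_played"), col("win_rate"))
    persist(scalar, parquetPath, "analytics_player_clusters")
```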

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-15 22:35:30 +02:00

Janis Eccarius

e1d80b9331

feat(analytics): add Structured Streaming, MLlib clustering, GraphX jobs

Three new Spark jobs, each exercising a different Spark module:

LiveDashboardJob (Structured Streaming):
- Simulates NowChess game-over event stream via rate source
- Watermarking (45 s late-data tolerance)
- Tumbling 1-min windows → append-mode Parquet output
- Sliding 5-min windows with a 1-min slide → update-mode console output
- Checkpointing for fault tolerance (exactly-once with the Parquet file sink)
- Production wiring comments show Kafka / spark-redis swap-in
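A condensed sketch of the LiveDashboardJob pattern, using Spark's built-in rate source in place of a real event feed. The rows-per-second setting, the mapping from the rate counter to a game result, and the paths are hypothetical. The Parquet file sink only supports append mode, so it relies on the 45 s watermark: a 1-minute window is written once the watermark passes the window's end, while the console query in update mode reprints sliding-window counts as they change.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, expr, window}
import org.apache.spark.sql.streaming.{OutputMode, StreamingQuery}

// Sketch only: rate, result mapping and paths are hypothetical.
object LiveDashboardSketch:

  val WatermarkDelay = "45 seconds"

  /** Simulated game-over events built from the rate source's (timestamp, value) rows. */
  def gameOverEvents(spark: SparkSession): DataFrame =
    spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .select(
        col("timestamp").alias("event_time"),
        expr("CASE value % 3 WHEN 0 THEN 'white' WHEN 1 THEN 'black' ELSE 'draw' END").alias("result")
      )
      .withWatermark("event_time", WatermarkDelay)

  /** 1-min tumbling windows; append mode emits a window once the watermark passes its end. */
  def tumblingToParquet(events: DataFrame, outDir: String, checkpointDir: String): StreamingQuery =
    events
      .groupBy(window(col("event_time"), "1 minute"), col("result"))
      .agg(count("*").alias("games"))
      .writeStream
      .outputMode(OutputMode.Append())
      .format("parquet")
      .option("path", outDir)
      .option("checkpointLocation", checkpointDir) // offsets + aggregation state for recovery
      .start()

  /** 5-min windows sliding every minute; update mode prints only rows changed this trigger. */
  def slidingToConsole(events: DataFrame, checkpointDir: String): StreamingQuery =
    events
      .groupBy(window(col("event_time"), "5 minutes", "1 minute"))
      .agg(count("*").alias("games"))
      .writeStream
      .outputMode(OutputMode.Update())
      .format("console")
      .option("truncate", "false")
      .option("checkpointLocation", checkpointDir)
      .start()
```

Both queries can be started from the same events DataFrame (each query reads the source independently); calling spark.streams.awaitAnyTermination() afterwards keeps the driver running.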

PlayerClusteringJob (MLlib):
- Derives 4 player features from game_records via JDBC
- VectorAssembler + StandardScaler + KMeans inside a Pipeline
- ClusteringEvaluator (silhouette score) to measure cluster quality
- Per-cluster archetype averages show what each tier represents
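A sketch of the clustering pipeline, assuming the four features already sit as numeric columns on a one-row-per-player DataFrame. The feature names, seed, and output column names are hypothetical. The scaler only rescales to unit variance; centering would shift every point equally and so would not change the Euclidean distances KMeans minimizes. The silhouette is computed on the same scaled vectors that KMeans clustered.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, count}

// Sketch only: feature names, k and seed are hypothetical.
object PlayerClusteringSketch:

  val FeatureCols: Array[String] = Array("games_played", "win_rate", "avg_moves", "draw_rate")

  def pipeline(k: Int): Pipeline =
    val assembler = new VectorAssembler()
      .setInputCols(FeatureCols)
      .setOutputCol("features")
    // Unit variance only: centering would not change the Euclidean distances KMeans uses.
    val scaler = new StandardScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")
      .setWithStd(true)
    val kmeans = new KMeans()
      .setK(k)
      .setSeed(42L)
      .setFeaturesCol("scaledFeatures")
      .setPredictionCol("cluster")
    new Pipeline().setStages(Array[PipelineStage](assembler, scaler, kmeans))

  /** Fits the pipeline; silhouette is computed in the same scaled space KMeans clustered in. */
  def fitAndScore(players: DataFrame, k: Int): (PipelineModel, DataFrame, Double) =
    val model = pipeline(k).fit(players)
    val clustered = model.transform(players)
    val silhouette = new ClusteringEvaluator()
      .setFeaturesCol("scaledFeatures")
      .setPredictionCol("cluster")
      .evaluate(clustered)
    (model, clustered, silhouette)

  /** Per-cluster feature averages: what a typical player in each tier looks like. */
  def archetypes(clustered: DataFrame): DataFrame =
    val avgs = FeatureCols.toSeq.map(c => avg(col(c)).alias(s"avg_$c"))
    clustered
      .groupBy(col("cluster"))
      .agg(count("*").alias("players"), avgs*)
      .orderBy(col("cluster"))
```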

PlayerGraphJob (GraphX):
- Builds directed player graph (vertices=players, edges=games)
- PageRank — identifies most influential/active players
- ConnectedComponents — finds isolated player communities
- Bridges GraphX RDD results back to DataFrames via an explicit schema
  (avoids spark.implicits._, which breaks Scala 3 → Scala 2.13 Spark interop)
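A sketch of the GraphX step, assuming numeric player ids (GraphX vertex ids are Long) and one directed edge per game from the white player to the black player; that edge orientation, the PageRank tolerance, and the column names are hypothetical. It calls GraphOps through graph.ops rather than through the implicit Graph-to-GraphOps conversion. GraphX's connectedComponents ignores edge direction and labels each vertex with the smallest vertex id in its component. Results are turned into Row objects and paired with an explicit StructType, so no Encoder from spark.implicits._ is involved.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, LongType, StructField, StructType}

// Sketch only: column names, edge orientation and tolerance are hypothetical.
object PlayerGraphSketch:

  // Explicit schema: rows are plain Row objects, so no Encoder / spark.implicits._ is needed.
  val ResultSchema: StructType = StructType(Seq(
    StructField("player_id", LongType, nullable = false),
    StructField("pagerank", DoubleType, nullable = false),
    StructField("component", LongType, nullable = false)
  ))

  /** games: one (whiteId, blackId) pair per finished game. */
  def analyse(spark: SparkSession, games: RDD[(Long, Long)]): DataFrame =
    // One directed edge per game; vertices are created from the edge endpoints.
    val edges: RDD[Edge[Int]] = games.map { case (white, black) => Edge(white, black, 1) }
    val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)

    // graph.ops is the Graph's GraphOps instance, called directly.
    val ranks: VertexRDD[Double] = graph.ops.pageRank(tol = 0.0001).vertices
    // Component label = smallest vertex id in the (weakly) connected component.
    val components: VertexRDD[VertexId] = graph.ops.connectedComponents().vertices

    val joined: VertexRDD[(Double, VertexId)] =
      ranks.innerJoin(components)((id, rank, comp) => (rank, comp))
    val rows: RDD[Row] = joined.map { case (id, (rank, comp)) => Row(id, rank, comp) }
    spark.createDataFrame(rows, ResultSchema)
```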

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-15 22:15:24 +02:00

2 Commits