Compare commits

5 Commits

Author SHA1 Message Date
TeamCity 71cb2cc56c ci: bump version with Build-132 2026-06-21 14:10:10 +00:00
Janis Eccarius f43d1930d8 fix(official-bots): make botToken optional, fall back to env, fix 502 status
Build & Test (NowChessSystems) TeamCity build finished
botToken in JoinTournamentRequest is now Option[String]. When absent the
service resolves it from TOURNAMENT_BOT_TOKEN env var so official-bot
join requests no longer need a token in the body.

Response status on join failure changed from BAD_GATEWAY (502) to
BAD_REQUEST (400).
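
A minimal sketch of the token fallback described above, assuming a simplified `JoinTournamentRequest` with a single field. The `resolveBotToken` helper, its injectable `env` parameter, and the error text are hypothetical and are not taken from the service code.

```scala
// Simplified stand-in for the real request type; only `botToken: Option[String]`
// is confirmed by the commit message.
final case class JoinTournamentRequest(botToken: Option[String])

// Hypothetical helper: prefer the token from the request body, otherwise fall
// back to the TOURNAMENT_BOT_TOKEN environment variable. `env` is a parameter
// so the lookup can be exercised without touching the real environment.
def resolveBotToken(
    req: JoinTournamentRequest,
    env: Map[String, String] = sys.env,
): Either[String, String] =
  req.botToken
    .filter(_.nonEmpty)
    .orElse(env.get("TOURNAMENT_BOT_TOKEN").filter(_.nonEmpty))
    .toRight("botToken missing from request body and TOURNAMENT_BOT_TOKEN is unset")
```

A caller could map the `Left` case to a BAD_REQUEST (400) response, in line with the status change in this commit.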

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-21 15:40:09 +02:00
Janis Eccarius da0e6d1ee2 feat(analytics): always write results to PostgreSQL regardless of input source
Build & Test (NowChessSystems) TeamCity build failed
Remove isPgnMode JDBC guard from all 4 original jobs so staging (Lichess PGN mode)
and production (game_records JDBC mode) both persist analytics results to the DB.

Add JDBC write-back to all 7 new jobs:
- GameLengthJob → analytics_game_length_distribution + analytics_game_length_by_result
- ColorAdvantageJob → analytics_color_advantage
- EloDistributionJob → analytics_elo_distribution
- TimeControlJob → analytics_time_control_stats
- DailyActivityJob → analytics_hourly_activity + analytics_weekly_activity
- RatingMismatchJob → analytics_rating_mismatch
- TerminationStatsJob → analytics_termination_stats

Add analytics_component_sizes JDBC write to PlayerGraphJob.
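
Every job in this change repeats the same five JDBC writer options. As a sketch only, they could be collected in one place. `jdbcWriteOptions` is a hypothetical helper that does not exist in the repository; the option keys and the driver class are the ones the jobs already use.

```scala
// Hypothetical helper: the JDBC writer options that each job sets by hand.
def jdbcWriteOptions(
    jdbcUrl: String,
    table: String,
    dbUser: String,
    dbPass: String,
): Map[String, String] =
  Map(
    "url"      -> jdbcUrl,
    "dbtable"  -> table,
    "user"     -> dbUser,
    "password" -> dbPass,
    "driver"   -> "org.postgresql.Driver",
  )

// With Spark on the classpath, a job could then write:
//   stats.write
//     .mode("overwrite")
//     .format("jdbc")
//     .options(jdbcWriteOptions(jdbcUrl, "analytics_color_advantage", dbUser, dbPass))
//     .save()
```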

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-21 15:36:07 +02:00
TeamCity a6c600d6ce ci: bump version with Build-131 2026-06-21 13:28:40 +00:00
Janis Eccarius 8e17c14dff feat(analytics): add 7 new Spark analytics jobs and extend GameSource
Build & Test (NowChessSystems) TeamCity build finished
Adds GameLengthJob, ColorAdvantageJob, EloDistributionJob, TimeControlJob,
DailyActivityJob, RatingMismatchJob, and TerminationStatsJob bringing total
batch pipelines to 11 (+ 1 streaming).

Extends GameSource with loadExtended() / fromLichessPgnExtended() extracting
WhiteElo, BlackElo, TimeControl, UTCDate, UTCTime, Termination, ECO from PGN
headers; JDBC path returns nulls for extended columns, keeping all existing
jobs unaffected.
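
The header extraction can be illustrated without Spark. The sketch below mirrors the pattern `Tag "([^"]*)"` used in `fromLichessPgnExtended` and treats an empty or missing value as absent. `pgnTag` and `pgnElo` are hypothetical names, and turning a non-numeric Elo into `None` is an assumption of this sketch rather than documented job behaviour.

```scala
import java.util.regex.Pattern

// Value of a PGN header tag, or None when the tag is absent or empty.
def pgnTag(record: String, tag: String): Option[String] =
  val matcher = Pattern.compile(Pattern.quote(tag) + " \"([^\"]*)\"").matcher(record)
  if matcher.find() then Some(matcher.group(1)).filter(_.nonEmpty) else None

// Elo headers parsed as integers; anything non-numeric becomes None here.
def pgnElo(record: String, tag: String): Option[Int] =
  pgnTag(record, tag).flatMap(_.toIntOption)
```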

PlayerStatsJob gains a CSV output alongside the existing Parquet write so
the analytics webview can display player statistics without pyarrow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-21 15:03:07 +02:00
19 changed files with 731 additions and 62 deletions
+29
@@ -36,3 +36,32 @@
### Bug Fixes
* **analytics:** upgrade Spark to 4.0.3 — 3.5.x has no official Docker image ([46af115](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/46af1154de34a8596cb6cb28c6fad7aba90f597c))
## (2026-06-21)
### Features
* **analytics:** add 7 new Spark analytics jobs and extend GameSource ([8e17c14](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/8e17c14dff740cd115011dfbf17de35083b8fe46))
* **analytics:** add Dockerfile, CI workflow, and stable jar name for K8s deployment ([95215b6](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/95215b6a420fd526df1aa395f9b087556c8ad03b))
* **analytics:** add PostgreSQL JDBC write-back to all four batch jobs ([0e0ea4c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/0e0ea4c9893c6efed52e633e55d05ab3ed004502))
* **analytics:** add Spark batch analytics module ([259b3bb](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/259b3bbb24c0f23326269b93f4b3c84012f727cd))
* **analytics:** add Structured Streaming, MLlib clustering, GraphX jobs ([e1d80b9](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/e1d80b9331666feea191b1fd08aa762f3581c918))
* **official-bots:** park expert bot on tournament server at startup ([#76](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/76)) ([751a58b](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/751a58b6061f7434115e229a7661894c76768bc2))
### Bug Fixes
* **analytics:** upgrade Spark to 4.0.3 — 3.5.x has no official Docker image ([46af115](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/46af1154de34a8596cb6cb28c6fad7aba90f597c))
## (2026-06-21)
### Features
* **analytics:** add 7 new Spark analytics jobs and extend GameSource ([8e17c14](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/8e17c14dff740cd115011dfbf17de35083b8fe46))
* **analytics:** add Dockerfile, CI workflow, and stable jar name for K8s deployment ([95215b6](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/95215b6a420fd526df1aa395f9b087556c8ad03b))
* **analytics:** add PostgreSQL JDBC write-back to all four batch jobs ([0e0ea4c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/0e0ea4c9893c6efed52e633e55d05ab3ed004502))
* **analytics:** add Spark batch analytics module ([259b3bb](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/259b3bbb24c0f23326269b93f4b3c84012f727cd))
* **analytics:** add Structured Streaming, MLlib clustering, GraphX jobs ([e1d80b9](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/e1d80b9331666feea191b1fd08aa762f3581c918))
* **analytics:** always write results to PostgreSQL regardless of input source ([da0e6d1](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/da0e6d1ee2d391ecb6291396f82471eb51b1b25e))
* **official-bots:** park expert bot on tournament server at startup ([#76](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/76)) ([751a58b](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/751a58b6061f7434115e229a7661894c76768bc2))
### Bug Fixes
* **analytics:** upgrade Spark to 4.0.3 — 3.5.x has no official Docker image ([46af115](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/46af1154de34a8596cb6cb28c6fad7aba90f597c))
@@ -0,0 +1,72 @@
package de.nowchess.analytics

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F
import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StructType
import scala.jdk.CollectionConverters.*

object ColorAdvantageJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-color-advantage"

    val spark = SparkSession
      .builder()
      .appName("NowChess Color Advantage")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .load(spark, jdbcUrl, dbUser, dbPass)
      .select("result")
      .filter(F.col("result").isNotNull)

    val totalGames = games.count()
    val whiteWins = games.filter(F.col("result") === "white").count()
    val blackWins = games.filter(F.col("result") === "black").count()
    val draws = games.filter(F.col("result") === "draw").count()

    val schema = StructType(
      Seq(
        StructField("color", DataTypes.StringType, false),
        StructField("total_games", DataTypes.LongType, false),
        StructField("wins", DataTypes.LongType, false),
        StructField("losses", DataTypes.LongType, false),
        StructField("draws", DataTypes.LongType, false),
      ),
    )

    val rows = List(
      Row("white", totalGames, whiteWins, blackWins, draws),
      Row("black", totalGames, blackWins, whiteWins, draws),
    )

    val stats = spark
      .createDataFrame(rows.asJava, schema)
      .withColumn("win_rate", F.round(F.col("wins") / F.col("total_games").cast("double"), 3))
      .orderBy(F.asc("color"))

    stats.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/color_advantage")

    stats.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_color_advantage")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
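
As a plain-Scala illustration of the two rows the job above produces, the sketch below derives each colour's wins, losses and win rate from three counts. `ColorStats` and `colorAdvantage` are hypothetical names. Unlike the job, which counts all rows with a non-null result, this sketch assumes the total is exactly the sum of the three counts.

```scala
final case class ColorStats(color: String, totalGames: Long, wins: Long, losses: Long, draws: Long):
  // Win rate rounded to three decimals; None stands in for an undefined rate.
  def winRate: Option[Double] =
    if totalGames == 0 then None
    else
      val rate = BigDecimal(wins.toDouble / totalGames)
      Some(rate.setScale(3, BigDecimal.RoundingMode.HALF_UP).toDouble)

// One row per colour, ordered by colour name; a loss for one colour is a win for the other.
def colorAdvantage(whiteWins: Long, blackWins: Long, draws: Long): List[ColorStats] =
  val total = whiteWins + blackWins + draws
  List(
    ColorStats("black", total, blackWins, whiteWins, draws),
    ColorStats("white", total, whiteWins, blackWins, draws),
  )
```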
@@ -0,0 +1,99 @@
package de.nowchess.analytics

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F

object DailyActivityJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-daily-activity"

    val spark = SparkSession
      .builder()
      .appName("NowChess Daily Activity")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("result", "utc_date", "utc_time")
      .filter(F.col("utc_time").isNotNull.and(F.col("utc_date").isNotNull))

    val hourOfDay = F.regexp_extract(F.col("utc_time"), "^(\\d{2})", 1).cast("int")
    val dow = F.dayofweek(F.to_date(F.col("utc_date"), "yyyy.MM.dd"))

    val tagged = games
      .withColumn("hour_of_day", hourOfDay)
      .withColumn("dow", dow)

    val hourly = tagged
      .groupBy("hour_of_day")
      .agg(
        F.count("*").as("total_games"),
        F.sum(F.when(F.col("result") === "white", 1).otherwise(0)).as("white_wins"),
        F.sum(F.when(F.col("result") === "black", 1).otherwise(0)).as("black_wins"),
        F.sum(F.when(F.col("result") === "draw", 1).otherwise(0)).as("draws"),
      )
      .withColumn("white_win_rate", F.round(F.col("white_wins") / F.col("total_games").cast("double"), 3))
      .orderBy(F.asc("hour_of_day"))
      .select("hour_of_day", "total_games", "white_wins", "black_wins", "draws", "white_win_rate")

    hourly.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/hourly_activity")

    hourly.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_hourly_activity")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()

    val dayName = F
      .when(F.col("dow") === 1, "Sunday")
      .when(F.col("dow") === 2, "Monday")
      .when(F.col("dow") === 3, "Tuesday")
      .when(F.col("dow") === 4, "Wednesday")
      .when(F.col("dow") === 5, "Thursday")
      .when(F.col("dow") === 6, "Friday")
      .otherwise("Saturday")

    val weekly = tagged
      .withColumn("day_of_week", dayName)
      .withColumn("day_order", F.col("dow"))
      .groupBy("day_of_week", "day_order")
      .agg(
        F.count("*").as("total_games"),
        F.sum(F.when(F.col("result") === "white", 1).otherwise(0)).as("white_wins"),
        F.sum(F.when(F.col("result") === "black", 1).otherwise(0)).as("black_wins"),
        F.sum(F.when(F.col("result") === "draw", 1).otherwise(0)).as("draws"),
      )
      .withColumn("white_win_rate", F.round(F.col("white_wins") / F.col("total_games").cast("double"), 3))
      .orderBy(F.asc("day_order"))
      .drop("day_order")
      .select("day_of_week", "total_games", "white_wins", "black_wins", "draws", "white_win_rate")

    weekly.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/weekly_activity")

    weekly.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_weekly_activity")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
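
The hour and weekday derivation above can be checked in plain Scala with `java.time`. This is a sketch under the assumption that `utc_date` uses the `yyyy.MM.dd` layout the job parses. `hourOfDay`, `sparkDayOfWeek` and `dayName` are hypothetical helpers that mirror the Spark expressions, including Spark's `dayofweek` numbering, which starts at 1 for Sunday.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

val pgnDate = DateTimeFormatter.ofPattern("yyyy.MM.dd")
val HourPrefix = "^(\\d{2})".r.unanchored

// First two digits of a value such as "14:10:10", or None if they are missing.
def hourOfDay(utcTime: String): Option[Int] =
  utcTime match
    case HourPrefix(h) => Some(h.toInt)
    case _             => None

// java.time numbers Monday = 1 ... Sunday = 7; Spark's dayofweek uses
// Sunday = 1 ... Saturday = 7, hence the shift.
def sparkDayOfWeek(utcDate: String): Int =
  LocalDate.parse(utcDate, pgnDate).getDayOfWeek.getValue % 7 + 1

val dayNames = Vector("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

def dayName(utcDate: String): String = dayNames(sparkDayOfWeek(utcDate) - 1)
```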
@@ -0,0 +1,58 @@
package de.nowchess.analytics

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F

object EloDistributionJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-elo-distribution"

    val spark = SparkSession
      .builder()
      .appName("NowChess Elo Distribution")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .filter(F.col("white_elo").isNotNull)

    val whiteElo = games.select(F.col("white_elo").as("elo"))
    val blackElo = games.select(F.col("black_elo").as("elo"))
    val allElo = whiteElo.union(blackElo).filter(F.col("elo").isNotNull)

    val bucketMin = (F.floor(F.col("elo") / 200) * 200).cast("int")
    val bucketLabel = F.when(
      F.col("elo") >= 2800,
      F.lit("2800+"),
    ).otherwise(F.concat(bucketMin.cast("string"), F.lit("-"), (bucketMin + 199).cast("string")))

    val distribution = allElo
      .withColumn("elo_bucket", bucketLabel)
      .withColumn("bucket_order", F.when(F.col("elo") >= 2800, 2800).otherwise(bucketMin))
      .groupBy("elo_bucket", "bucket_order")
      .agg(F.count("*").as("player_count"))
      .orderBy(F.asc("bucket_order"))
      .select("elo_bucket", "player_count")

    distribution.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/elo_distribution")

    distribution.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_elo_distribution")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
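
The bucket labelling above groups ratings into 200-point bands with an open-ended top band. A plain-Scala sketch of the same arithmetic follows. `eloBucket` is a hypothetical helper and assumes non-negative ratings.

```scala
// 200-point bands labelled "min-max"; everything from 2800 upward shares one label.
def eloBucket(elo: Int): String =
  if elo >= 2800 then "2800+"
  else
    val bucketMin = Math.floorDiv(elo, 200) * 200
    s"$bucketMin-${bucketMin + 199}"
```

For example, a rating of 1534 lands in the band that starts at 1400, so its label is `1400-1599`.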
@@ -0,0 +1,111 @@
package de.nowchess.analytics

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F

object GameLengthJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-game-length"

    val spark = SparkSession
      .builder()
      .appName("NowChess Game Length")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .load(spark, jdbcUrl, dbUser, dbPass)
      .select("result", "move_count")
      .filter(F.col("result").isNotNull.and(F.col("move_count").isNotNull))

    val moves = F.col("move_count")

    val bucket = F
      .when(moves <= 10, "1-10")
      .when(moves <= 20, "11-20")
      .when(moves <= 30, "21-30")
      .when(moves <= 40, "31-40")
      .when(moves <= 60, "41-60")
      .when(moves <= 100, "61-100")
      .otherwise("101+")

    val bucketOrder = F
      .when(moves <= 10, 1)
      .when(moves <= 20, 2)
      .when(moves <= 30, 3)
      .when(moves <= 40, 4)
      .when(moves <= 60, 5)
      .when(moves <= 100, 6)
      .otherwise(7)

    val tagged = games
      .withColumn("move_bucket", bucket)
      .withColumn("bucket_order", bucketOrder)

    val distribution = tagged
      .groupBy("move_bucket", "bucket_order")
      .agg(
        F.count("*").as("total_games"),
        F.sum(F.when(F.col("result") === "white", 1).otherwise(0)).as("white_wins"),
        F.sum(F.when(F.col("result") === "black", 1).otherwise(0)).as("black_wins"),
        F.sum(F.when(F.col("result") === "draw", 1).otherwise(0)).as("draws"),
      )
      .withColumn("white_win_rate", F.round(F.col("white_wins") / F.col("total_games").cast("double"), 3))
      .withColumn("black_win_rate", F.round(F.col("black_wins") / F.col("total_games").cast("double"), 3))
      .withColumn("draw_rate", F.round(F.col("draws") / F.col("total_games").cast("double"), 3))
      .orderBy(F.asc("bucket_order"))
      .drop("bucket_order")
      .select(
        "move_bucket",
        "total_games",
        "white_wins",
        "black_wins",
        "draws",
        "white_win_rate",
        "black_win_rate",
        "draw_rate",
      )

    distribution.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/game_length_distribution")

    distribution.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_game_length_distribution")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()

    val byResult = games
      .groupBy("result")
      .agg(
        F.round(F.avg("move_count"), 1).as("avg_move_count"),
        F.min("move_count").as("min_moves"),
        F.max("move_count").as("max_moves"),
      )
      .orderBy(F.asc("result"))

    byResult.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/game_length_by_result")

    byResult.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_game_length_by_result")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
@@ -33,6 +33,19 @@ object GameSource:
       case Some(path) => fromLichessPgn(spark, path)
       case None => fromJdbc(spark, jdbcUrl, dbUser, dbPass)

+  def loadExtended(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String): DataFrame =
+    sys.env.get(PgnPathEnv) match
+      case Some(path) => fromLichessPgnExtended(spark, path)
+      case None =>
+        fromJdbc(spark, jdbcUrl, dbUser, dbPass)
+          .withColumn("white_elo", F.lit(null).cast("int"))
+          .withColumn("black_elo", F.lit(null).cast("int"))
+          .withColumn("time_control", F.lit(null).cast("string"))
+          .withColumn("utc_date", F.lit(null).cast("string"))
+          .withColumn("utc_time", F.lit(null).cast("string"))
+          .withColumn("termination", F.lit(null).cast("string"))
+          .withColumn("eco", F.lit(null).cast("string"))
+
   def fromJdbc(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String): DataFrame =
     spark.read
       .format("jdbc")
@@ -89,6 +102,49 @@ object GameSource:
       )
       .filter((F.col("white_id") =!= "").and(F.col("black_id") =!= ""))

+  private def fromLichessPgnExtended(spark: SparkSession, path: String): DataFrame =
+    val resolved = resolvePath(spark, path)
+    val record = F.col("value")
+
+    val resultTag = F.regexp_extract(record, "Result \"([^\"]*)\"", 1)
+    val result = F
+      .when(resultTag === "1-0", "white")
+      .when(resultTag === "0-1", "black")
+      .when(resultTag === "1/2-1/2", "draw")
+      .otherwise(F.lit(null).cast("string"))
+
+    val moveText = F.coalesce(F.split(record, "\n\n").getItem(1), F.lit(""))
+    val noComment = F.regexp_replace(moveText, "\\{[^}]*\\}", "")
+    val noResult = F.regexp_replace(noComment, "(1-0|0-1|1/2-1/2|\\*)", "")
+    val noNumbers = F.regexp_replace(noResult, "\\d+\\.+", " ")
+    val plies = F.size(F.filter(F.split(F.trim(noNumbers), "\\s+"), tok => F.length(tok) > 0))
+
+    def nullable(extracted: org.apache.spark.sql.Column): org.apache.spark.sql.Column =
+      F.when(F.length(extracted) > 0, extracted).otherwise(F.lit(null).cast("string"))
+
+    val whiteElo = nullable(F.regexp_extract(record, "WhiteElo \"([^\"]*)\"", 1)).cast("int")
+    val blackElo = nullable(F.regexp_extract(record, "BlackElo \"([^\"]*)\"", 1)).cast("int")
+
+    spark.read
+      .option("lineSep", "[Event ")
+      .text(resolved)
+      .filter(F.length(F.trim(record)) > 0)
+      .select(
+        F.regexp_extract(record, "White \"([^\"]*)\"", 1).as("white_id"),
+        F.regexp_extract(record, "Black \"([^\"]*)\"", 1).as("black_id"),
+        result.as("result"),
+        plies.as("move_count"),
+        F.concat(F.lit("[Event "), record).as("pgn"),
+        whiteElo.as("white_elo"),
+        blackElo.as("black_elo"),
+        nullable(F.regexp_extract(record, "TimeControl \"([^\"]*)\"", 1)).as("time_control"),
+        nullable(F.regexp_extract(record, "UTCDate \"([^\"]*)\"", 1)).as("utc_date"),
+        nullable(F.regexp_extract(record, "UTCTime \"([^\"]*)\"", 1)).as("utc_time"),
+        nullable(F.regexp_extract(record, "Termination \"([^\"]*)\"", 1)).as("termination"),
+        nullable(F.regexp_extract(record, "ECO \"([^\"]*)\"", 1)).as("eco"),
+      )
+      .filter((F.col("white_id") =!= "").and(F.col("black_id") =!= ""))
+
   /** Turns an http(s)/ftp URL into a cluster-local path by fetching it once with SparkContext.addFile, which
     * distributes the file to every executor. `.zst` is decompressed in-process and the plain `.pgn` is redistributed.
     * Non-URL paths are returned unchanged.
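
The ply count in `fromLichessPgnExtended` is a chain of regex clean-ups followed by a token count. The same chain can be sketched on a plain string, using the regular expressions from the hunk above. `countPlies` is a hypothetical name and takes only the move text, not the full PGN record.

```scala
// Strip {comments}, the game result and move numbers, then count what is left.
def countPlies(moveText: String): Int =
  moveText
    .replaceAll("\\{[^}]*\\}", "")
    .replaceAll("(1-0|0-1|1/2-1/2|\\*)", "")
    .replaceAll("\\d+\\.+", " ")
    .trim
    .split("\\s+")
    .count(_.nonEmpty)
```

For the move text `1. e4 e5 2. Nf3 { book } Nc6 1-0`, the surviving tokens are `e4`, `e5`, `Nf3` and `Nc6`, so the count is 4.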
@@ -72,16 +72,15 @@ object OpeningBookJob:
       .option("header", "true")
       .csv(s"$outputDir/opening_book_top1000")

-    if !GameSource.isPgnMode then
-      top1000.write
-        .mode("overwrite")
-        .format("jdbc")
-        .option("url", jdbcUrl)
-        .option("dbtable", "analytics_opening_stats")
-        .option("user", dbUser)
-        .option("password", dbPass)
-        .option("driver", "org.postgresql.Driver")
-        .save()
+    top1000.write
+      .mode("overwrite")
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "analytics_opening_stats")
+      .option("user", dbUser)
+      .option("password", dbPass)
+      .option("driver", "org.postgresql.Driver")
+      .save()

   /** Extracts the first `maxPlies` moves from a PGN column as a space-separated string.
     *
@@ -119,26 +119,25 @@ object PlayerClusteringJob:
       .option("header", "true")
       .csv(s"$outputDir/cluster_archetypes")

-    if !GameSource.isPgnMode then
-      clustersDf.write
-        .mode("overwrite")
-        .format("jdbc")
-        .option("url", jdbcUrl)
-        .option("dbtable", "analytics_player_clusters")
-        .option("user", dbUser)
-        .option("password", dbPass)
-        .option("driver", "org.postgresql.Driver")
-        .save()
+    clustersDf.write
+      .mode("overwrite")
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "analytics_player_clusters")
+      .option("user", dbUser)
+      .option("password", dbPass)
+      .option("driver", "org.postgresql.Driver")
+      .save()

-      archetypes.write
-        .mode("overwrite")
-        .format("jdbc")
-        .option("url", jdbcUrl)
-        .option("dbtable", "analytics_cluster_archetypes")
-        .option("user", dbUser)
-        .option("password", dbPass)
-        .option("driver", "org.postgresql.Driver")
-        .save()
+    archetypes.write
+      .mode("overwrite")
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "analytics_cluster_archetypes")
+      .option("user", dbUser)
+      .option("password", dbPass)
+      .option("driver", "org.postgresql.Driver")
+      .save()

   private def buildPlayerStats(games: org.apache.spark.sql.DataFrame): org.apache.spark.sql.DataFrame =
     val asWhite = games.select(
@@ -109,16 +109,15 @@ object PlayerGraphJob:
       .mode("overwrite")
       .parquet(s"$outputDir/player_graph")

-    if !GameSource.isPgnMode then
-      result.write
-        .mode("overwrite")
-        .format("jdbc")
-        .option("url", jdbcUrl)
-        .option("dbtable", "analytics_player_graph")
-        .option("user", dbUser)
-        .option("password", dbPass)
-        .option("driver", "org.postgresql.Driver")
-        .save()
+    result.write
+      .mode("overwrite")
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "analytics_player_graph")
+      .option("user", dbUser)
+      .option("password", dbPass)
+      .option("driver", "org.postgresql.Driver")
+      .save()

     // How many players belong to each connected component?
     // A large dominant component + many singletons is the expected shape.
@@ -135,6 +134,16 @@ object PlayerGraphJob:
       .option("header", "true")
       .csv(s"$outputDir/component_sizes")

+    componentSizes.write
+      .mode("overwrite")
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "analytics_component_sizes")
+      .option("user", dbUser)
+      .option("password", dbPass)
+      .option("driver", "org.postgresql.Driver")
+      .save()
+
   // Build a two-column DataFrame (vertex_id: Long, valueCol: valueType) from an RDD.
   // Used to bridge GraphX RDD results into the DataFrame API without implicits.
   private def rddToFrame[T](
@@ -77,13 +77,17 @@ object PlayerStatsJob:
       .mode("overwrite")
       .parquet(s"$outputDir/player_stats")

-    if !GameSource.isPgnMode then
-      stats.write
-        .mode("overwrite")
-        .format("jdbc")
-        .option("url", jdbcUrl)
-        .option("dbtable", "analytics_player_stats")
-        .option("user", dbUser)
-        .option("password", dbPass)
-        .option("driver", "org.postgresql.Driver")
-        .save()
+    stats.write
+      .mode("overwrite")
+      .option("header", "true")
+      .csv(s"$outputDir/player_stats_csv")
+
+    stats.write
+      .mode("overwrite")
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "analytics_player_stats")
+      .option("user", dbUser)
+      .option("password", dbPass)
+      .option("driver", "org.postgresql.Driver")
+      .save()
@@ -0,0 +1,75 @@
package de.nowchess.analytics

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F

object RatingMismatchJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-rating-mismatch"

    val spark = SparkSession
      .builder()
      .appName("NowChess Rating Mismatch")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("result", "white_elo", "black_elo")
      .filter(F.col("white_elo").isNotNull.and(F.col("black_elo").isNotNull))

    val eloDiff = F.col("white_elo") - F.col("black_elo")

    val bracket = F
      .when(eloDiff < -200, "Black +200")
      .when(eloDiff < -100, "Black +100–200")
      .when(eloDiff < -50, "Black +50–100")
      .when(eloDiff <= 50, "Even (±50)")
      .when(eloDiff <= 100, "White +50–100")
      .when(eloDiff <= 200, "White +100–200")
      .otherwise("White +200")

    val bracketOrder = F
      .when(eloDiff < -200, 1)
      .when(eloDiff < -100, 2)
      .when(eloDiff < -50, 3)
      .when(eloDiff <= 50, 4)
      .when(eloDiff <= 100, 5)
      .when(eloDiff <= 200, 6)
      .otherwise(7)

    val stats = games
      .withColumn("elo_diff", eloDiff)
      .withColumn("bracket", bracket)
      .withColumn("bracket_order", bracketOrder)
      .groupBy("bracket", "bracket_order")
      .agg(
        F.count("*").as("total_games"),
        F.sum(F.when(F.col("result") === "white", 1).otherwise(0)).as("white_wins"),
        F.sum(F.when(F.col("result") === "black", 1).otherwise(0)).as("black_wins"),
        F.sum(F.when(F.col("result") === "draw", 1).otherwise(0)).as("draws"),
      )
      .withColumn("white_win_rate", F.round(F.col("white_wins") / F.col("total_games").cast("double"), 3))
      .orderBy(F.asc("bracket_order"))
      .drop("bracket_order")
      .select("bracket", "total_games", "white_wins", "black_wins", "draws", "white_win_rate")

    stats.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/rating_mismatch")

    stats.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_rating_mismatch")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
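
The bracket boundaries above are easiest to check on concrete rating pairs. The sketch below restates the same thresholds in plain Scala and returns the sort order together with a label. `ratingBracket` is a hypothetical helper, and the range labels are written with an en dash as an illustrative choice.

```scala
// Returns (bracket_order, bracket) for the Elo difference white minus black.
def ratingBracket(whiteElo: Int, blackElo: Int): (Int, String) =
  val diff = whiteElo - blackElo
  if diff < -200 then (1, "Black +200")
  else if diff < -100 then (2, "Black +100–200")
  else if diff < -50 then (3, "Black +50–100")
  else if diff <= 50 then (4, "Even (±50)")
  else if diff <= 100 then (5, "White +50–100")
  else if diff <= 200 then (6, "White +100–200")
  else (7, "White +200")
```

The boundaries are symmetric: a difference of exactly 50 in either direction still counts as even, and a difference of exactly 200 falls in the 100 to 200 bracket for the stronger side.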
@@ -0,0 +1,54 @@
package de.nowchess.analytics

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F

object TerminationStatsJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-termination-stats"

    val spark = SparkSession
      .builder()
      .appName("NowChess Termination Stats")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("result", "termination")
      .filter(F.col("termination").isNotNull.and(F.col("termination") =!= ""))

    val stats = games
      .groupBy("termination")
      .agg(
        F.count("*").as("total_games"),
        F.sum(F.when(F.col("result") === "white", 1).otherwise(0)).as("white_wins"),
        F.sum(F.when(F.col("result") === "black", 1).otherwise(0)).as("black_wins"),
        F.sum(F.when(F.col("result") === "draw", 1).otherwise(0)).as("draws"),
      )
      .withColumn("draw_rate", F.round(F.col("draws") / F.col("total_games").cast("double"), 3))
      .withColumnRenamed("termination", "termination_type")
      .orderBy(F.desc("total_games"))
      .select("termination_type", "total_games", "white_wins", "black_wins", "draws", "draw_rate")

    stats.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/termination_stats")

    stats.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_termination_stats")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
@@ -0,0 +1,68 @@
package de.nowchess.analytics

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions as F

object TimeControlJob:

  def main(args: Array[String]): Unit =
    val jdbcUrl = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-time-control"

    val spark = SparkSession
      .builder()
      .appName("NowChess Time Control")
      .getOrCreate()

    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()

  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("result", "time_control")
      .filter(
        F.col("time_control").isNotNull
          .and(F.col("time_control") =!= "")
          .and(F.col("time_control") =!= "-"),
      )

    val baseSeconds = F.regexp_extract(F.col("time_control"), "^(?:\\d+/)?(\\d+)", 1).cast("int")

    val category = F
      .when(baseSeconds < 30, "UltraBullet")
      .when(baseSeconds < 180, "Bullet")
      .when(baseSeconds < 480, "Blitz")
      .when(baseSeconds < 1500, "Rapid")
      .when(baseSeconds < 86400, "Classical")
      .otherwise("Correspondence")

    val stats = games
      .withColumn("category", category)
      .groupBy("category")
      .agg(
        F.count("*").as("total_games"),
        F.sum(F.when(F.col("result") === "white", 1).otherwise(0)).as("white_wins"),
        F.sum(F.when(F.col("result") === "black", 1).otherwise(0)).as("black_wins"),
        F.sum(F.when(F.col("result") === "draw", 1).otherwise(0)).as("draws"),
      )
      .withColumn("white_win_rate", F.round(F.col("white_wins") / F.col("total_games").cast("double"), 3))
      .withColumn("draw_rate", F.round(F.col("draws") / F.col("total_games").cast("double"), 3))
      .orderBy(F.desc("total_games"))
      .select("category", "total_games", "white_wins", "black_wins", "draws", "white_win_rate", "draw_rate")

    stats.write
      .mode("overwrite")
      .option("header", "true")
      .csv(s"$outputDir/time_control_stats")

    stats.write
      .mode("overwrite")
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "analytics_time_control_stats")
      .option("user", dbUser)
      .option("password", dbPass)
      .option("driver", "org.postgresql.Driver")
      .save()
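
The category thresholds above depend only on the base time parsed from the `TimeControl` header; any increment is ignored. A plain-Scala sketch of that classification follows. `timeControlCategory` is a hypothetical helper, and returning `None` for values that do not start with a number is a choice of this sketch, not a statement about how the Spark expression treats them.

```scala
// Same pattern as the job: an optional "moves/" prefix, then the base time in seconds.
val BaseSeconds = "^(?:\\d+/)?(\\d+)".r.unanchored

def timeControlCategory(timeControl: String): Option[String] =
  timeControl match
    case BaseSeconds(seconds) =>
      seconds.toIntOption.map { base =>
        if base < 30 then "UltraBullet"
        else if base < 180 then "Bullet"
        else if base < 480 then "Blitz"
        else if base < 1500 then "Rapid"
        else if base < 86400 then "Classical"
        else "Correspondence"
      }
    case _ => None
```

For example, `300+3` has a base time of 300 seconds and is classified as Blitz, while `40/7200:3600` is read as 7200 seconds and classified as Classical.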
+1 -1
@@ -1,3 +1,3 @@
 MAJOR=0
-MINOR=4
+MINOR=6
 PATCH=0
+31
@@ -370,3 +370,34 @@
### Reverts
* Revert "refactor: update metrics paths formatting in application.yml for clarity" ([3870566](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/38705663498d5f47c40dafe2f26198589ede8656))
## (2026-06-21)
### Features
* add initialization metrics for various services ([d438e97](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d438e97f32bdde0bfc63c1b4a8cc810cdd093166))
* add OpenTelemetry trace configuration with parentbased sampler ([3904d5a](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/3904d5ad8ad4930ddee65287a7bfab785a6148f5))
* **analytics:** add Spark batch analytics module ([#70](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/70)) ([39f1657](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/39f1657e1db6e84889af338c43be8cb5c03c3ec3))
* **config:** update application.yml for PostgreSQL and remove staging/production configurations ([2404e61](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/2404e6164c3b50ffccbea5238d636060d6abe4d6))
* **config:** update application.yml for staging and production environments ([6113432](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/6113432a14c476a3a0dfc0d449e17d023697f2ba))
* configure logging and add OpenTelemetry support ([#49](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/49)) ([d57c488](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d57c4886612d1d92da0e1b79209fc83e6ef537a1))
* **docker:** add .dockerignore and .gitignore files for build exclusions ([c987d8e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/c987d8e258c0e6c4cfbdaa8381c64c410d7a2b83))
* **docker:** add Dockerfiles for building Quarkus application in native and JVM modes ([3f2d2bb](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/3f2d2bb4c97fa8cddba66e1da4427c54236dfeed))
* **docker:** add Dockerfiles for Quarkus application in JVM and native modes ([34b9933](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/34b993304670cf2aa62cd2f6460cee7b9864b08e))
* **events:** migrate game-creation and bot flows to Redis Streams NCS-89 ([#62](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/62)) ([a24924c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/a24924c23057db3d700a75dbc4333557789cd991))
* NCS-78 Add Traceability to the Applications ([#46](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/46)) ([649566e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/649566eb3fcf38f91c8896a739f74ea318af312d))
* NCS-78 Add Traceability to the Applications ([#47](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/47)) ([87dfc6c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/87dfc6c2bcce7f7d58fc641bd8d468a2e584c108))
* NCS-82 add Swiss-system tournament module ([#55](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/55)) ([c5661de](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/c5661de4a0ebf4b33211f5a391840dcf744656b7))
* **official-bots:** consume GameOver stream for bot cleanup ([#67](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/67)) ([db9d153](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/db9d1533912f4b41c4d1ca80ccffdde5d23d6ff6))
* **official-bots:** park expert bot on tournament server at startup ([#75](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/75)) ([30295a4](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/30295a4bb95855ee8261c92278bb9ebc80ee12ee))
* true-microservices ([#40](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/40)) ([5909242](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/590924254e8a2754de661a57a03e43f89ceb6299))
### Bug Fixes
* enable official bots to connect to external tournament server ([#71](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/71)) ([688d30e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/688d30e2b10026923372be5fca3c63eaaee2de2a))
* **official-bots:** configure JWT verification ([#72](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/72)) ([98c64fc](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/98c64fc0d56dc542beb31c75f4b9056d91de03cd))
* **official-bots:** make botToken optional, fall back to env, fix 502 status ([f43d193](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/f43d1930d80670d810c57b54eaa3789854fa082c))
* **official-bots:** NCS-70-auto-register official bots with account service ([#59](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/59)) ([7117a93](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/7117a93376272094d0b1a6abf2121254ce396684))
### Reverts
* Revert "refactor: update metrics paths formatting in application.yml for clarity" ([3870566](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/38705663498d5f47c40dafe2f26198589ede8656))
@@ -2,7 +2,7 @@ package de.nowchess.bot.resource
 case class JoinTournamentRequest(
   tournamentId: String,
-  botToken: String,
+  botToken: Option[String],
   difficulty: String,
   serverUrl: Option[String],
 )
@@ -39,6 +39,6 @@ class TournamentJoinResource:
         Response.ok(resp).build()
       case Left(err) =>
         Response
-          .status(Response.Status.BAD_GATEWAY)
+          .status(Response.Status.BAD_REQUEST)
           .entity(s"""{"error":"$err"}""")
           .build()
@@ -82,18 +82,23 @@ class TournamentBotGamePlayer:
   def joinTournament(
       tournamentId: String,
-      botToken: String,
+      botToken: Option[String],
       difficulty: String,
       serverUrl: String,
   ): Either[String, String] =
-    TournamentBotConfig.jwtSubject(botToken) match
-      case None => Left("Invalid bot token — could not extract subject")
-      case Some(botId) =>
-        val cfg = TournamentBotConfig(serverUrl, tournamentId, botToken, botId, difficulty)
-        if join(cfg) then
-          startAsync(cfg)
-          Right(botId)
-        else Left("Failed to join tournament")
+    val resolvedToken = botToken.filter(_.nonEmpty)
+      .orElse(System.getenv().asScala.get("TOURNAMENT_BOT_TOKEN").filter(_.nonEmpty))
+    resolvedToken match
+      case None => Left("No bot token provided and TOURNAMENT_BOT_TOKEN not configured")
+      case Some(token) =>
+        TournamentBotConfig.jwtSubject(token) match
+          case None => Left("Invalid bot token — could not extract subject")
+          case Some(botId) =>
+            val cfg = TournamentBotConfig(serverUrl, tournamentId, token, botId, difficulty)
+            if join(cfg) then
+              startAsync(cfg)
+              Right(botId)
+            else Left("Failed to join tournament")
 
   private def startAsync(cfg: TournamentBotConfig): Unit =
     val thread = new Thread(() => streamLoop(cfg), s"TournamentBot-${cfg.tournamentId}")
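
The token fallback in the hunk above (request body first, then the `TOURNAMENT_BOT_TOKEN` environment variable, with empty strings treated as absent) can be sketched in isolation. This is only an illustrative sketch, not code from the commit: `TokenResolution`, `resolveToken`, and the injectable `env` parameter are hypothetical names introduced here so the logic can be exercised without touching the real process environment.

```scala
// Hypothetical standalone sketch of the fallback order; not part of the commit.
object TokenResolution:
  // Environment variable name taken from the commit message.
  val EnvVar = "TOURNAMENT_BOT_TOKEN"

  // `env` defaults to the process environment but can be replaced in tests
  // (assumption: the real service reads System.getenv() directly).
  def resolveToken(
      requestToken: Option[String],
      env: Map[String, String] = sys.env,
  ): Either[String, String] =
    requestToken
      .filter(_.nonEmpty)                         // an empty body token counts as missing
      .orElse(env.get(EnvVar).filter(_.nonEmpty)) // fall back to the environment
      .toRight(s"No bot token provided and $EnvVar not configured")
```

Under these assumptions, `TokenResolution.resolveToken(Some(""), Map("TOURNAMENT_BOT_TOKEN" -> "xyz"))` yields `Right("xyz")`, while a call with `None` and an empty map yields a `Left` carrying the error message. Passing the environment as a parameter is a design choice of this sketch only; it keeps the function pure and easy to test.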
+1 -1
@@ -1,3 +1,3 @@
 MAJOR=0
-MINOR=21
+MINOR=22
 PATCH=0