ci: bump version with Build-152

feat(analytics): add accuracy and blunder analysis job for Lichess data
ci: bump version with Build-151
8 changed files with 614 additions and 4 deletions
@@ -81,3 +81,20 @@
 * **analytics:** upgrade Spark to 4.0.3 — 3.5.x has no official Docker image ([46af115](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/46af1154de34a8596cb6cb28c6fad7aba90f597c))
 * **analytics:** write decompressed PGN to shared PVC path for executor access ([a268a9a](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/a268a9acb7ba190c76e996ccf3ea3bd00e5cec92))
 ##  (2026-06-23)
 ### Features
 * **analytics:** add 7 new Spark analytics jobs and extend GameSource ([8e17c14](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/8e17c14dff740cd115011dfbf17de35083b8fe46))
 * **analytics:** add accuracy and blunder analysis job for Lichess data ([c3e7b82](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/c3e7b82ae806adf5713ce4d267c1155e73a40ff5))
 * **analytics:** add Dockerfile, CI workflow, and stable jar name for K8s deployment ([95215b6](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/95215b6a420fd526df1aa395f9b087556c8ad03b))
 * **analytics:** add PostgreSQL JDBC write-back to all four batch jobs ([0e0ea4c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/0e0ea4c9893c6efed52e633e55d05ab3ed004502))
 * **analytics:** add Spark batch analytics module ([259b3bb](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/259b3bbb24c0f23326269b93f4b3c84012f727cd))
 * **analytics:** add Structured Streaming, MLlib clustering, GraphX jobs ([e1d80b9](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/e1d80b9331666feea191b1fd08aa762f3581c918))
 * **analytics:** always write results to PostgreSQL regardless of input source ([da0e6d1](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/da0e6d1ee2d391ecb6291396f82471eb51b1b25e))
 * **official-bots:** park expert bot on tournament server at startup ([#76](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/76)) ([751a58b](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/751a58b6061f7434115e229a7661894c76768bc2))
 ### Bug Fixes
 * **analytics:** upgrade Spark to 4.0.3 — 3.5.x has no official Docker image ([46af115](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/46af1154de34a8596cb6cb28c6fad7aba90f597c))
 * **analytics:** write decompressed PGN to shared PVC path for executor access ([a268a9a](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/a268a9acb7ba190c76e996ccf3ea3bd00e5cec92))
@@ -0,0 +1,191 @@
 package de.nowchess.analytics
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions as F
 /** Per-move accuracy & blunder analysis mined from Lichess `[%eval ...]` move annotations.
  *
  * Unlike the flat single-`groupBy` summaries (opening rates, colour advantage), this job reconstructs the *quality of
  * every move* from the engine evaluations Lichess embeds in the movetext (`{ [%eval 0.24] }`, mate scores `[%eval
  * #-3]`) and turns them into the same accuracy signals lichess.org surfaces: average centipawn loss (ACPL), and counts
  * of inaccuracies / mistakes / blunders.
  *
  * Pipeline (all Spark SQL string/array functions + window funcs — no UDFs, Catalyst-friendly):
  *   1. Keep only games carrying `[%eval` comments.
  *   2. `regexp_extract_all` pulls every eval in ply order; mate scores collapse to ±10 pawns, normal evals are clamped
  *      to ±10 so a single huge swing cannot dominate the mean. All evals are White-POV pawns.
  *   3. `posexplode` → one row per ply; a per-game window `lag` gives the eval *before* the move.
  *   4. Centipawn loss for the side that moved = how much the eval moved against them (white wants it up, black down),
  *      floored at 0 and scaled to centipawns.
  *   5. Roll up to (game, side): ACPL + inaccuracy(≥50cp) / mistake(≥100cp) / blunder(≥200cp) counts, tagged with that
  *      side's Elo and whether they won.
  *
  * Outputs (Parquet + CSV + JDBC):
  *   - `accuracy_by_rating` — ACPL, avg blunders/mistakes/inaccuracies per game and win-rate, per Elo band. Shows how
  *     move quality scales with rating.
  *   - `blunder_outcome` — win-rate bucketed by number of blunders in the game. Quantifies "one blunder costs you the
  *     game".
  *
  * Requires the eval-annotated Lichess dump (`NOWCHESS_PGN_PATH` → an evals dump); JDBC games carry no per-move evals.
  */
 object AccuracyBlunderJob:
  def main(args: Array[String]): Unit =
    val jdbcUrl   = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser    = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass    = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-accuracy"
    val spark = SparkSession
      .builder()
      .appName("NowChess Accuracy & Blunders")
      .getOrCreate()
    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()
  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("pgn", "result", "white_elo", "black_elo")
      .filter(F.col("result").isNotNull.and(F.col("pgn").contains("[%eval")))
      .withColumn("game_id", F.monotonically_increasing_id())
    // White-POV pawn evals in ply order; mate → ±10, normal evals clamped to ±10.
    val evalStrs = F.expr("""regexp_extract_all(pgn, '\\[%eval ([^\\]]+)\\]', 1)""")
    val evalCps = F.expr(
      "transform(eval_strs, x -> CASE " +
        "WHEN x LIKE '#-%' THEN -10.0 " +
        "WHEN x LIKE '#%' THEN 10.0 " +
        "ELSE greatest(-10.0, least(10.0, cast(x as double))) END)",
    )
    val withEvals = games
      .withColumn("eval_strs", evalStrs)
      .withColumn("eval_cp", evalCps)
      .filter(F.size(F.col("eval_cp")) >= 2)
    val plies = withEvals.select(
      F.col("game_id"),
      F.col("result"),
      F.col("white_elo"),
      F.col("black_elo"),
      F.posexplode(F.col("eval_cp")).as(Seq("ply", "eval_after")),
    )
    val byGame     = Window.partitionBy("game_id").orderBy("ply")
    val mover      = F.when(F.col("ply") % 2 === 0, "white").otherwise("black")
    val evalBefore = F.coalesce(F.lag("eval_after", 1).over(byGame), F.lit(0.15))
    val cpl = F.greatest(
      F.lit(0.0),
      F.when(F.col("mover") === "white", evalBefore - F.col("eval_after"))
        .otherwise(F.col("eval_after") - evalBefore),
    ) * 100
    val moves = plies
      .withColumn("mover", mover)
      .withColumn("cpl", cpl)
    val perSide = moves
      .groupBy("game_id", "mover", "result", "white_elo", "black_elo")
      .agg(
        F.round(F.avg("cpl"), 1).as("acpl"),
        F.sum(F.when(F.col("cpl") >= 200, 1).otherwise(0)).as("blunders"),
        F.sum(F.when(F.col("cpl") >= 100 && F.col("cpl") < 200, 1).otherwise(0)).as("mistakes"),
        F.sum(F.when(F.col("cpl") >= 50 && F.col("cpl") < 100, 1).otherwise(0)).as("inaccuracies"),
      )
      .withColumn(
        "self_elo",
        F.when(F.col("mover") === "white", F.col("white_elo")).otherwise(F.col("black_elo")),
      )
      .withColumn("won", F.when(F.col("mover") === F.col("result"), 1).otherwise(0))
    writeAccuracyByRating(perSide, jdbcUrl, dbUser, dbPass, outputDir)
    writeBlunderOutcome(perSide, jdbcUrl, dbUser, dbPass, outputDir)
  private def writeAccuracyByRating(
      perSide: org.apache.spark.sql.DataFrame,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      outputDir: String,
  ): Unit =
    val elo = F.col("self_elo")
    val band = F
      .when(elo < 1200, "<1200")
      .when(elo < 1500, "1200–1499")
      .when(elo < 1800, "1500–1799")
      .when(elo < 2100, "1800–2099")
      .otherwise("2100+")
    val bandOrder = F
      .when(elo < 1200, 1)
      .when(elo < 1500, 2)
      .when(elo < 1800, 3)
      .when(elo < 2100, 4)
      .otherwise(5)
    val stats = perSide
      .filter(elo.isNotNull)
      .withColumn("band", band)
      .withColumn("band_order", bandOrder)
      .groupBy("band", "band_order")
      .agg(
        F.count("*").as("player_games"),
        F.round(F.avg("acpl"), 1).as("avg_acpl"),
        F.round(F.avg("blunders"), 2).as("avg_blunders"),
        F.round(F.avg("mistakes"), 2).as("avg_mistakes"),
        F.round(F.avg("inaccuracies"), 2).as("avg_inaccuracies"),
        F.round(F.avg("won"), 3).as("win_rate"),
      )
      .orderBy(F.asc("band_order"))
      .drop("band_order")
    write(stats, outputDir, "accuracy_by_rating", jdbcUrl, dbUser, dbPass, "analytics_accuracy_by_rating")
  private def writeBlunderOutcome(
      perSide: org.apache.spark.sql.DataFrame,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      outputDir: String,
  ): Unit =
    val b      = F.col("blunders")
    val bucket = F.when(b === 0, "0").when(b === 1, "1").when(b === 2, "2").otherwise("3+")
    val order  = F.when(b === 0, 0).when(b === 1, 1).when(b === 2, 2).otherwise(3)
    val stats = perSide
      .withColumn("blunder_bucket", bucket)
      .withColumn("bucket_order", order)
      .groupBy("blunder_bucket", "bucket_order")
      .agg(
        F.count("*").as("player_games"),
        F.round(F.avg("won"), 3).as("win_rate"),
        F.round(F.avg("acpl"), 1).as("avg_acpl"),
      )
      .orderBy(F.asc("bucket_order"))
      .drop("bucket_order")
    write(stats, outputDir, "blunder_outcome", jdbcUrl, dbUser, dbPass, "analytics_blunder_outcome")
  private def write(
      df: org.apache.spark.sql.DataFrame,
      outputDir: String,
      name: String,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      table: String,
  ): Unit =
    df.write.mode("overwrite").parquet(s"$outputDir/$name")
    df.write.mode("overwrite").option("header", "true").csv(s"$outputDir/${name}_csv")
    if !GameSource.isPgnMode then
      df.write
        .mode("overwrite")
        .format("jdbc")
        .option("url", jdbcUrl)
        .option("dbtable", table)
        .option("user", dbUser)
        .option("password", dbPass)
        .option("driver", "org.postgresql.Driver")
        .save()
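Review note on the accuracy job: the centipawn-loss rule from steps 2 to 5 of the doc comment can be checked by hand without a Spark session. The following is a minimal plain-Scala sketch, not the job's code. `CplSketch`, `parseEval`, `cpls` and `classify` are hypothetical names introduced for illustration; the ±10 clamp, the 0.15 starting eval and the 50/100/200 cp thresholds are copied from the job.

```scala
// Plain-Scala sketch of the per-move centipawn-loss rule (no Spark required).
// All names here are illustrative assumptions, not part of the analytics module.
object CplSketch:
  // Parse one "[%eval ...]" payload into White-POV pawns, clamped to [-10, 10].
  // Mate scores ("#3", "#-3") collapse to +10 / -10, as in the job.
  def parseEval(s: String): Double =
    if s.startsWith("#-") then -10.0
    else if s.startsWith("#") then 10.0
    else math.max(-10.0, math.min(10.0, s.toDouble))

  // Centipawn loss per ply. Even plies are White's moves, odd plies Black's.
  // The eval before ply 0 is taken as startEval (0.15 mirrors the job's fallback).
  def cpls(evals: Seq[Double], startEval: Double = 0.15): Seq[Double] =
    val before = startEval +: evals.dropRight(1)
    evals.zip(before).zipWithIndex.map { case ((after, prev), ply) =>
      // White loses when the eval drops, Black loses when it rises.
      val swing = if ply % 2 == 0 then prev - after else after - prev
      math.max(0.0, swing) * 100
    }

  // Same bands as the job: blunder >= 200 cp, mistake 100-199 cp, inaccuracy 50-99 cp.
  def classify(cpl: Double): String =
    if cpl >= 200 then "blunder"
    else if cpl >= 100 then "mistake"
    else if cpl >= 50 then "inaccuracy"
    else "ok"
```

For the eval sequence `0.24, 0.30, -1.90, #-3`, the third ply (White) drops the eval by about 2.2 pawns, so `classify` labels it a blunder. The mate score on the fourth ply favours Black, the side that moved, so its loss is floored at 0.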
@@ -0,0 +1,199 @@
 package de.nowchess.analytics
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions as F
 /** Time-management & clock-pressure analysis mined from Lichess `[%clk ...]` move annotations.
  *
  * Lichess records each player's remaining clock after every move (`{ [%clk 0:02:31] }`). This job reconstructs
  * per-move thinking time and remaining-time from those stamps to answer questions the existing time-control summary
  * cannot: how long do players actually think, how often do they fall into time scrambles (<10 s left), how often do
  * they flag (lose on time), and does burning the clock correlate with winning?
  *
  * Pipeline (Spark SQL string/array funcs + window funcs — no UDFs):
  *   1. `regexp_extract_all` pulls every `h:mm:ss` clock in ply order, converted to seconds.
  *   2. `posexplode` → one row per ply; even plies are White's clock, odd plies Black's.
  *   3. A per-(game,side) window `lag` gives the same side's previous clock; the difference is that move's thinking time.
  *      Remaining clock <10 s marks a time-scramble move.
  *   4. Roll up to (game, side): avg move time, scramble fraction, min clock, Elo, win flag, and whether the side lost on
  *      time (`Termination "Time forfeit"`).
  *
  * Outputs (Parquet + CSV + JDBC):
  *   - `clock_by_rating` — avg move time, scramble fraction, flag-loss rate and win-rate per Elo band.
  *   - `scramble_outcome` — win-rate bucketed by how much of the game was played in time-scramble. Quantifies the cost of
  *     time trouble.
  *
  * Requires a clock-annotated Lichess dump (`NOWCHESS_PGN_PATH`).
  */
 object ClockPressureJob:
  def main(args: Array[String]): Unit =
    val jdbcUrl   = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser    = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass    = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-clock-pressure"
    val spark = SparkSession
      .builder()
      .appName("NowChess Clock Pressure")
      .getOrCreate()
    run(spark, jdbcUrl, dbUser, dbPass, outputDir)
    spark.stop()
  def run(spark: SparkSession, jdbcUrl: String, dbUser: String, dbPass: String, outputDir: String): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("pgn", "result", "white_elo", "black_elo", "termination")
      .filter(F.col("result").isNotNull.and(F.col("pgn").contains("[%clk")))
      .withColumn("game_id", F.monotonically_increasing_id())
    val clkStrs = F.expr("""regexp_extract_all(pgn, '\\[%clk ([^\\]]+)\\]', 1)""")
    // "h:mm:ss" → seconds.
    val clkSecs = F.expr(
      "transform(clk_strs, x -> " +
        "cast(split(x, ':')[0] as double) * 3600 + " +
        "cast(split(x, ':')[1] as double) * 60 + " +
        "cast(split(x, ':')[2] as double))",
    )
    val withClk = games
      .withColumn("clk_strs", clkStrs)
      .withColumn("clk_sec", clkSecs)
      .filter(F.size(F.col("clk_sec")) >= 4)
    val plies = withClk.select(
      F.col("game_id"),
      F.col("result"),
      F.col("white_elo"),
      F.col("black_elo"),
      F.col("termination"),
      F.posexplode(F.col("clk_sec")).as(Seq("ply", "clk_after")),
    )
    val mover    = F.when(F.col("ply") % 2 === 0, "white").otherwise("black")
    val bySide   = Window.partitionBy("game_id", "mover").orderBy("ply")
    val moveTime = F.lag("clk_after", 1).over(bySide) - F.col("clk_after")
    val moves = plies
      .withColumn("mover", mover)
      .withColumn("move_time", moveTime)
    val perSide = moves
      .groupBy("game_id", "mover", "result", "white_elo", "black_elo", "termination")
      .agg(
        F.round(F.avg("move_time"), 1).as("avg_move_time"),
        F.count("*").as("moves"),
        F.round(F.min("clk_after"), 1).as("min_clk"),
        F.sum(F.when(F.col("clk_after") < 10, 1).otherwise(0)).as("scramble_moves"),
      )
      .withColumn("scramble_fraction", F.round(F.col("scramble_moves") / F.col("moves"), 3))
      .withColumn(
        "self_elo",
        F.when(F.col("mover") === "white", F.col("white_elo")).otherwise(F.col("black_elo")),
      )
      .withColumn("won", F.when(F.col("mover") === F.col("result"), 1).otherwise(0))
      .withColumn(
        "flag_loss",
        F.when(
          F.coalesce(F.col("termination"), F.lit("")).contains("Time forfeit") && F.col("won") === 0,
          1,
        ).otherwise(0),
      )
    writeClockByRating(perSide, jdbcUrl, dbUser, dbPass, outputDir)
    writeScrambleOutcome(perSide, jdbcUrl, dbUser, dbPass, outputDir)
  private def writeClockByRating(
      perSide: org.apache.spark.sql.DataFrame,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      outputDir: String,
  ): Unit =
    val elo = F.col("self_elo")
    val band = F
      .when(elo < 1200, "<1200")
      .when(elo < 1500, "1200–1499")
      .when(elo < 1800, "1500–1799")
      .when(elo < 2100, "1800–2099")
      .otherwise("2100+")
    val bandOrder = F
      .when(elo < 1200, 1)
      .when(elo < 1500, 2)
      .when(elo < 1800, 3)
      .when(elo < 2100, 4)
      .otherwise(5)
    val stats = perSide
      .filter(elo.isNotNull)
      .withColumn("band", band)
      .withColumn("band_order", bandOrder)
      .groupBy("band", "band_order")
      .agg(
        F.count("*").as("player_games"),
        F.round(F.avg("avg_move_time"), 1).as("avg_move_time_s"),
        F.round(F.avg("scramble_fraction"), 3).as("avg_scramble_fraction"),
        F.round(F.avg("flag_loss"), 3).as("flag_loss_rate"),
        F.round(F.avg("won"), 3).as("win_rate"),
      )
      .orderBy(F.asc("band_order"))
      .drop("band_order")
    write(stats, outputDir, "clock_by_rating", jdbcUrl, dbUser, dbPass, "analytics_clock_by_rating")
  private def writeScrambleOutcome(
      perSide: org.apache.spark.sql.DataFrame,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      outputDir: String,
  ): Unit =
    val sf = F.col("scramble_fraction")
    val bucket = F
      .when(sf === 0, "none")
      .when(sf < 0.05, "<5%")
      .when(sf < 0.20, "5–20%")
      .otherwise(">20%")
    val order = F
      .when(sf === 0, 0)
      .when(sf < 0.05, 1)
      .when(sf < 0.20, 2)
      .otherwise(3)
    val stats = perSide
      .withColumn("scramble_bucket", bucket)
      .withColumn("bucket_order", order)
      .groupBy("scramble_bucket", "bucket_order")
      .agg(
        F.count("*").as("player_games"),
        F.round(F.avg("won"), 3).as("win_rate"),
        F.round(F.avg("flag_loss"), 3).as("flag_loss_rate"),
      )
      .orderBy(F.asc("bucket_order"))
      .drop("bucket_order")
    write(stats, outputDir, "scramble_outcome", jdbcUrl, dbUser, dbPass, "analytics_scramble_outcome")
  private def write(
      df: org.apache.spark.sql.DataFrame,
      outputDir: String,
      name: String,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      table: String,
  ): Unit =
    df.write.mode("overwrite").parquet(s"$outputDir/$name")
    df.write.mode("overwrite").option("header", "true").csv(s"$outputDir/${name}_csv")
    if !GameSource.isPgnMode then
      df.write
        .mode("overwrite")
        .format("jdbc")
        .option("url", jdbcUrl)
        .option("dbtable", table)
        .option("user", dbUser)
        .option("password", dbPass)
        .option("driver", "org.postgresql.Driver")
        .save()
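Review note on the clock job: the `h:mm:ss` conversion and the per-side `lag` can be reproduced on a small list to sanity-check the window logic. This is a sketch under assumptions, not the job's implementation. `ClockSketch` and its methods are hypothetical names; the 10-second scramble threshold and the even/odd ply convention come from the job.

```scala
// Plain-Scala sketch of the clock arithmetic (no Spark required).
// Names are illustrative assumptions, not part of the analytics module.
object ClockSketch:
  // "h:mm:ss" -> seconds, the same arithmetic as the job's transform(...) expression.
  def toSeconds(clk: String): Double =
    val parts = clk.split(':')
    parts(0).toDouble * 3600 + parts(1).toDouble * 60 + parts(2).toDouble

  // Thinking time per ply: the SAME side's previous clock minus its current clock.
  // A side's first move has no predecessor, so it yields None (Spark's lag gives null).
  def moveTimes(clocks: Seq[Double]): Seq[(String, Option[Double])] =
    clocks.zipWithIndex.map { case (clk, ply) =>
      val side = if ply % 2 == 0 then "white" else "black"
      val prev = if ply >= 2 then Some(clocks(ply - 2)) else None
      (side, prev.map(_ - clk))
    }

  // Fraction of one side's moves made with under 10 seconds left.
  def scrambleFraction(clocks: Seq[Double], side: String): Double =
    val offset = if side == "white" then 0 else 1
    val own    = clocks.zipWithIndex.collect { case (clk, ply) if ply % 2 == offset => clk }
    if own.isEmpty then 0.0 else own.count(_ < 10).toDouble / own.size
```

One caveat worth checking against real data: if the recorded clock includes an increment, it can rise between two of a side's moves, and the difference computed here (and by the job's `lag` expression) becomes negative for that move.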
@@ -0,0 +1,154 @@
 package de.nowchess.analytics
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions as F
 /** Smurf / sandbagging anomaly detection via population z-scores.
  *
  * Smurfs (strong players on fresh accounts) and sandbaggers leave a statistical signature: a win-rate, an upset-rate
  * (beating higher-rated opponents) and a self-Elo climb that sit far above the population norm. This job builds those
  * three features per player, standardises each against the whole player base, and flags the players whose combined
  * deviation is extreme.
  *
  * Features per player (from each game's own/opponent Elo):
  *   - win_rate — fraction of decisive results won
  *   - upset_rate — wins vs higher-rated opponents / games vs higher-rated opponents
  *   - elo_climb — max self-Elo − min self-Elo across their games (rapid rating gain)
  *
  * Standardisation uses a single unbounded window (`Window.partitionBy()`), i.e. mean/stddev over every qualifying
  * player, so z = (x − μ) / σ. The composite anomaly score sums the three z-scores. No UDFs — pure SQL aggregates +
  * window functions, so Catalyst plans the whole job.
  *
  * Outputs (Parquet + CSV + JDBC):
  *   - `anomaly_scores` — every qualifying player with features, z-scores and composite, ranked most-anomalous first.
  *   - `flagged_smurfs` — the suspicious subset (high composite, or the classic high-winrate / few-games / steep-climb
  *     profile).
  *
  * Meaningful only when Elo is present (Lichess dump); requires `minGames` (arg 1, default 15) to avoid small-sample
  * noise.
  */
 object SmurfAnomalyJob:
  def main(args: Array[String]): Unit =
    val jdbcUrl   = sys.env.getOrElse("NOWCHESS_JDBC_URL", "jdbc:postgresql://localhost:5432/nowchess")
    val dbUser    = sys.env.getOrElse("NOWCHESS_DB_USER", "nowchess")
    val dbPass    = sys.env.getOrElse("NOWCHESS_DB_PASS", "nowchess")
    val outputDir = if args.length > 0 then args(0) else "/tmp/nowchess-smurf-anomaly"
    val minGames  = if args.length > 1 then args(1).toInt else 15
    val spark = SparkSession
      .builder()
      .appName("NowChess Smurf Anomaly Detection")
      .getOrCreate()
    run(spark, jdbcUrl, dbUser, dbPass, outputDir, minGames)
    spark.stop()
  def run(
      spark: SparkSession,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      outputDir: String,
      minGames: Int,
  ): Unit =
    val games = GameSource
      .loadExtended(spark, jdbcUrl, dbUser, dbPass)
      .select("white_id", "black_id", "result", "white_elo", "black_elo")
      .filter(F.col("result").isNotNull)
    val asWhite = games.select(
      F.col("white_id").as("player_id"),
      F.col("white_elo").as("self_elo"),
      F.col("black_elo").as("opp_elo"),
      F.when(F.col("result") === "white", 1).otherwise(0).as("won"),
    )
    val asBlack = games.select(
      F.col("black_id").as("player_id"),
      F.col("black_elo").as("self_elo"),
      F.col("white_elo").as("opp_elo"),
      F.when(F.col("result") === "black", 1).otherwise(0).as("won"),
    )
    val playerGames = asWhite
      .union(asBlack)
      .filter(F.col("self_elo").isNotNull.and(F.col("opp_elo").isNotNull))
    val higher = F.col("opp_elo") > F.col("self_elo")
    val features = playerGames
      .groupBy("player_id")
      .agg(
        F.count("*").as("total_games"),
        F.round(F.avg("won"), 3).as("win_rate"),
        F.round(F.avg("self_elo"), 0).as("avg_self_elo"),
        (F.max("self_elo") - F.min("self_elo")).as("elo_climb"),
        F.sum(F.when(higher, 1).otherwise(0)).as("vs_higher"),
        F.sum(F.when(higher && F.col("won") === 1, 1).otherwise(0)).as("upsets"),
      )
      .filter(F.col("total_games") >= minGames)
      .withColumn("upset_rate", F.round(F.col("upsets") / F.greatest(F.col("vs_higher"), F.lit(1)), 3))
    val all = Window.partitionBy()
    def z(col: String): org.apache.spark.sql.Column =
      val mean = F.avg(col).over(all)
      val std  = F.stddev(col).over(all)
      F.round((F.col(col) - mean) / F.when(std === 0 || std.isNull, F.lit(1.0)).otherwise(std), 2)
    val scored = features
      .withColumn("z_win_rate", z("win_rate"))
      .withColumn("z_upset_rate", z("upset_rate"))
      .withColumn("z_elo_climb", z("elo_climb"))
      .withColumn(
        "anomaly_score",
        F.round(F.col("z_win_rate") + F.col("z_upset_rate") + F.col("z_elo_climb"), 2),
      )
      .withColumn(
        "flagged",
        (F.col("anomaly_score") >= 4.0)
          .or(F.col("win_rate") >= 0.8 && F.col("total_games") < 50 && F.col("elo_climb") >= 300),
      )
    val ordered = scored
      .select(
        "player_id",
        "total_games",
        "win_rate",
        "avg_self_elo",
        "elo_climb",
        "upset_rate",
        "z_win_rate",
        "z_upset_rate",
        "z_elo_climb",
        "anomaly_score",
        "flagged",
      )
      .orderBy(F.desc("anomaly_score"))
    write(ordered, outputDir, "anomaly_scores", jdbcUrl, dbUser, dbPass, "analytics_smurf_anomaly")
    val flagged = ordered.filter(F.col("flagged") === true)
    write(flagged, outputDir, "flagged_smurfs", jdbcUrl, dbUser, dbPass, "analytics_flagged_smurfs")
  private def write(
      df: org.apache.spark.sql.DataFrame,
      outputDir: String,
      name: String,
      jdbcUrl: String,
      dbUser: String,
      dbPass: String,
      table: String,
  ): Unit =
    df.write.mode("overwrite").parquet(s"$outputDir/$name")
    df.write.mode("overwrite").option("header", "true").csv(s"$outputDir/${name}_csv")
    if !GameSource.isPgnMode then
      df.write
        .mode("overwrite")
        .format("jdbc")
        .option("url", jdbcUrl)
        .option("dbtable", table)
        .option("user", dbUser)
        .option("password", dbPass)
        .option("driver", "org.postgresql.Driver")
        .save()
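Review note on the smurf job: the standardisation step is easy to verify on a handful of values. The sketch below assumes the three feature columns are already available as equally long sequences; `ZScoreSketch` and its methods are hypothetical names. It uses the sample standard deviation because Spark's `stddev` is an alias of `stddev_samp`, and it omits the job's rounding to two decimals.

```scala
// Plain-Scala sketch of the population z-score and composite score (no Spark required).
// Names are illustrative assumptions, not part of the analytics module.
object ZScoreSketch:
  // Sample standard deviation (n - 1 denominator); undefined (NaN) for fewer than two values.
  def sampleStd(xs: Seq[Double]): Double =
    if xs.size < 2 then Double.NaN
    else
      val mean = xs.sum / xs.size
      math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / (xs.size - 1))

  // z = (x - mean) / std, with a divisor of 1 when std is zero or undefined,
  // which mirrors the job's guard against a zero or null stddev.
  def zScores(xs: Seq[Double]): Seq[Double] =
    if xs.isEmpty then Seq.empty
    else
      val mean  = xs.sum / xs.size
      val std   = sampleStd(xs)
      val denom = if std == 0.0 || std.isNaN then 1.0 else std
      xs.map(x => (x - mean) / denom)

  // Composite anomaly score per player: the plain sum of the three z-scores.
  // Assumes all three inputs have the same length (one entry per player).
  def composite(winRate: Seq[Double], upsetRate: Seq[Double], eloClimb: Seq[Double]): Seq[Double] =
    Seq(zScores(winRate), zScores(upsetRate), zScores(eloClimb)).transpose.map(_.sum)
```

Because the composite is an unweighted sum, a player who is extreme on one feature and average on the other two scores the same as a player who is moderately high on all three. Whether that is the intended behaviour for the `anomaly_score >= 4.0` cut-off is worth confirming.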
@@ -1,3 +1,3 @@
 MAJOR=0
-MINOR=7
+MINOR=8
 PATCH=0
@@ -938,3 +938,52 @@
 ### Reverts
 * Revert "refactor: update metrics paths formatting in application.yml for clarity" ([3870566](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/38705663498d5f47c40dafe2f26198589ede8656))
 ##  (2026-06-23)
 ### Features
 * add initialization metrics for various services ([d438e97](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d438e97f32bdde0bfc63c1b4a8cc810cdd093166))
 * add OpenTelemetry trace configuration with parentbased sampler ([3904d5a](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/3904d5ad8ad4930ddee65287a7bfab785a6148f5))
 * **analytics:** add Spark batch analytics module ([#70](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/70)) ([39f1657](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/39f1657e1db6e84889af338c43be8cb5c03c3ec3))
 * **config:** update application.yml for PostgreSQL and remove staging/production configurations ([2404e61](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/2404e6164c3b50ffccbea5238d636060d6abe4d6))
 * **config:** update application.yml for staging and production environments ([6113432](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/6113432a14c476a3a0dfc0d449e17d023697f2ba))
 * configure logging and add OpenTelemetry support ([#49](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/49)) ([d57c488](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/d57c4886612d1d92da0e1b79209fc83e6ef537a1))
 * **docker:** add .dockerignore and .gitignore files for build exclusions ([c987d8e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/c987d8e258c0e6c4cfbdaa8381c64c410d7a2b83))
 * **docker:** add Dockerfiles for building Quarkus application in native and JVM modes ([3f2d2bb](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/3f2d2bb4c97fa8cddba66e1da4427c54236dfeed))
 * **docker:** add Dockerfiles for Quarkus application in JVM and native modes ([34b9933](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/34b993304670cf2aa62cd2f6460cee7b9864b08e))
 * **events:** migrate game-creation and bot flows to Redis Streams NCS-89 ([#62](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/62)) ([a24924c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/a24924c23057db3d700a75dbc4333557789cd991))
 * NCS-78 Add Traceability to the Applications ([#46](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/46)) ([649566e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/649566eb3fcf38f91c8896a739f74ea318af312d))
 * NCS-78 Add Traceability to the Applications ([#47](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/47)) ([87dfc6c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/87dfc6c2bcce7f7d58fc641bd8d468a2e584c108))
 * NCS-82 add Swiss-system tournament module ([#55](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/55)) ([c5661de](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/c5661de4a0ebf4b33211f5a391840dcf744656b7))
 * **official-bots:** activate opening book in expert bot (native-safe) ([260db25](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/260db25803ec55ce99e55782791eabdc190dfed4))
 * **official-bots:** consume GameOver stream for bot cleanup ([#67](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/67)) ([db9d153](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/db9d1533912f4b41c4d1ca80ccffdde5d23d6ff6))
 * **official-bots:** make HybridBot veto actionable and use it for expert ([1df29cf](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/1df29cf3a6e21af3f396b2b7a6da67d978f941ae))
 * **official-bots:** park expert bot on tournament server at startup ([#75](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/75)) ([30295a4](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/30295a4bb95855ee8261c92278bb9ebc80ee12ee))
 * **official-bots:** resolve tournament bot token from Redis and account service ([386ddc5](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/386ddc5c19f8f893b16c6422aa5393b54c872e45))
 * **tournament:** auto-join external tournaments and publish created ones ([#77](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/77)) ([9978b7e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/9978b7ea78eb658a225a461b9cd339386c0c14f3))
 * **tournament:** federate tournaments across clusters with DB replication ([5b000a6](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/5b000a6e5f04ea6770d1c7ab6bfdaded77a99172))
 * **tournament:** seed external server registry from env var on startup ([845dc9c](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/845dc9c2935c8bc1be42541dfaf31c9a861d3272))
 * true-microservices ([#40](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/40)) ([5909242](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/590924254e8a2754de661a57a03e43f89ceb6299))
 ### Bug Fixes
 * enable official bots to connect to external tournament server ([#71](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/71)) ([688d30e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/688d30e2b10026923372be5fca3c63eaaee2de2a))
 * **official-bots:** configure JWT verification ([#72](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/72)) ([98c64fc](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/98c64fc0d56dc542beb31c75f4b9056d91de03cd))
 * **official-bots:** correct parkOn path from /api/bots to /api/account/bots ([1be9949](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/1be9949c0b5c6a1db535696620d77735050d6c93))
 * **official-bots:** derive tournament game color from game endpoint ([#79](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/79)) ([bfc4672](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/bfc46723e615bb9b65f7f9bba5f53877c4f079a7))
 * **official-bots:** discover tournament games by polling, not just the stream ([10113fd](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/10113fd0579b614d15870798d933bc9c495d2049))
 * **official-bots:** make botToken optional, fall back to env, fix 502 status ([f43d193](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/f43d1930d80670d810c57b54eaa3789854fa082c))
 * **official-bots:** NCS-70-auto-register official bots with account service ([#59](https://git.janis-eccarius.de/NowChess/NowChessSystems/issues/59)) ([7117a93](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/7117a93376272094d0b1a6abf2121254ce396684))
 * **official-bots:** park on external tournament servers using correct endpoint and token ([3188241](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/31882417377468b41bbe3ff94506aa4928024450))
 * **official-bots:** play games by polling state instead of NDJSON stream ([bfb15c7](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/bfb15c7299bd471d5e064a577ed10af98e2ea90a))
 * **official-bots:** play only own tournament games with correct color ([4651bb7](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/4651bb796f07a21bd013d9521b2dfe2e1078cebb))
 * **official-bots:** prioritize Redis token over stale env var in joinTournament ([83dd2d4](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/83dd2d4335ca48eb3e5aa234a75367574276ba63))
 * **official-bots:** register with tournament server directly to get correct token ([64b5d55](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/64b5d5567f110c2fe152558c7de275a1e0b30e21))
 * **official-bots:** resolve per-difficulty bot token on tournament join ([fdf4c94](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/fdf4c94811d086996447bb4657fac1d9bd6e5a93))
 * **official-bots:** resume tournaments already joined after restart ([285b73e](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/285b73efbd6dd98cec410ade9eead9881d693a8f))
 * **official-bots:** sync bots before token fetch on first startup after DB wipe ([b0ddb27](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/b0ddb274d23bca8b1b3f691ce0d643f33e0b54cd))
 * **official-bots:** use ThreadLocalRandom in PolyglotBook for native image ([1b30c3b](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/1b30c3be393d25712c8743d3d9057207f8bbb67c))
 ### Reverts
 * Revert "refactor: update metrics paths formatting in application.yml for clarity" ([3870566](https://git.janis-eccarius.de/NowChess/NowChessSystems/commit/38705663498d5f47c40dafe2f26198589ede8656))
@@ -5,8 +5,8 @@ import de.nowchess.api.game.GameContext
 import de.nowchess.api.move.{Move, MoveType, PromotionPiece}
 import java.io.{DataInputStream, FileInputStream, InputStream}
+import java.util.concurrent.ThreadLocalRandom
 import scala.collection.mutable
-import scala.util.Random
 /** Reads a Polyglot opening book (.bin file) and probes it for moves.
  *
@@ -93,7 +93,7 @@ final class PolyglotBook private (entries: Map[Long, Vector[BookEntry]]):
    if entries.length == 1 then entries.head
    else
      val totalWeight = entries.map(_.weight).sum
-      val pick        = Random.nextInt(totalWeight.max(1)) // NOSONAR
+      val pick        = ThreadLocalRandom.current().nextInt(totalWeight.max(1)) // NOSONAR
      @scala.annotation.tailrec
      def select(remaining: Int, idx: Int): BookEntry =
@@ -1,3 +1,3 @@
 MAJOR=0
-MINOR=35
+MINOR=36
 PATCH=0
Author	SHA1	Message	Date
TeamCity	7372867a82	ci: bump version with Build-152	2026-06-23 22:30:53 +00:00
Janis Eccarius	c3e7b82ae8	feat(analytics): add accuracy and blunder analysis job for Lichess data	2026-06-24 00:21:40 +02:00
TeamCity	e88b081947	ci: bump version with Build-151	2026-06-23 21:54:06 +00:00
Janis Eccarius	1b30c3be39	fix(official-bots): use ThreadLocalRandom in PolyglotBook for native image. A stored java.util.Random field is reachable from BotController's static openingBook, so GraalVM baked it into the image heap and aborted the native build (Random in image heap has a cached seed). Use ThreadLocalRandom.current() at call time instead: no stored instance, nothing in the image heap, still thread-safe. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 23:42:15 +02:00
Janis Eccarius	f8ca95af3c	refactor(official-bots): use java.util.Random in PolyglotBook. scala.util.Random delegates to a shared global java.util.Random, a contention point across concurrent bot games. Use a per-book java.util.Random instance instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 23:34:38 +02:00
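
Review note on the PolyglotBook fix: the pattern described in the commit message can be sketched in isolation. `Entry` and `WeightedPick` below are hypothetical stand-ins, and the `select` loop is an assumed reconstruction since the diff shows only its signature. The point is that `ThreadLocalRandom.current()` is looked up at call time, so no `Random` instance is ever held in a field.

```scala
import java.util.concurrent.ThreadLocalRandom

// Hypothetical stand-in for the book's entry type; only `weight` matters here.
final case class Entry(move: String, weight: Int)

object WeightedPick:
  // Walk the entries, subtracting weights until `pick` falls inside one of them.
  // Falls back to the last entry if the weights are exhausted.
  def select(entries: Vector[Entry], pick: Int): Entry =
    require(entries.nonEmpty, "entries must not be empty")
    @scala.annotation.tailrec
    def loop(remaining: Int, idx: Int): Entry =
      if idx >= entries.length - 1 || remaining < entries(idx).weight then entries(idx)
      else loop(remaining - entries(idx).weight, idx + 1)
    loop(pick, 0)

  // Weighted random choice. The generator is fetched per call, so nothing
  // random-related is stored in a field or captured in a static initializer.
  def pickRandom(entries: Vector[Entry]): Entry =
    val total = entries.map(_.weight).sum
    // nextInt requires a positive bound, hence max(1) for all-zero weights.
    select(entries, ThreadLocalRandom.current().nextInt(total.max(1)))
```

Keeping `select` deterministic and passing the random draw in as `pick` also makes the weighting testable without mocking the generator.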