feat: NCS-82 add Swiss-system tournament module #55
Reference in New Issue
Block a user
Delete Branch "feat/NCS-82"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
round progression, finish) as a standalone Quarkus module
docs/tournament-openapi.yaml) are coveredgameStartevents carryingthe correct
colorfieldGameResultStreamListener)Known gaps (deferred)
GET /resultsnbparam defaults to 100 instead of allPairingDtoexposes an internalidfield not in the specGameExport.movesemits PGN instead of UCI (upstreamGameWritebackEventDtodoes not carry UCI moves)
Pairing.whitecan benullfor bye rounds (spec has no bye concept)Test plan
TournamentResourceTestintegration tests (H2, mocked core client) — all passSwissPairingServiceTestunit tests — all passOverall the feature direction is solid — the Swiss pairing algorithm, NDJSON streaming design, and the shift from polling to reactive Redis
subscriptions are all the right calls. The OpenAPI spec is clean and complete. That said, there are several issues that should be addressed before
this moves out of Draft.
Blockers:
The @RolesAllowed("Admin") → @RolesAllowed("**") change on the official bot creation endpoint is a security regression. Any authenticated user can
now create official bots. If the goal is to let tournament bots self-register, that should go through a separate endpoint or entity type, not the
official bot path.
The CI build is failing on TeamCity. The PR should not be marked ready until the pipeline is green — the failures may be hiding regressions in
existing modules.
Design concerns:
The wildcard Redis subscription (bot:*:events) means every replica of the official-bots service receives every bot event across all games. With
multiple replicas running, both will compute and publish a move — the second one wins non-deterministically. A Redis SET NX lock per (gameId,
turn) is needed before computing a move.
The two-persist pattern for OfficialBotAccount (persist to get the ID, then set the token and persist again) is fragile: if the second write
fails, the entity exists without a usable token. Assign the UUID before persisting and generate the token in one step.
TournamentBotGamePlayer runs on a raw daemon thread with a flat 5-second sleep. It won't participate in Quarkus graceful shutdown and will hammer
the server if the endpoint is persistently down. Replace with a Vert.x timer or @Scheduled with exponential backoff.
Non-expiring bot tokens (expiresAt(Long.MaxValue)) leave no recovery path if a token is leaked. Use a long but finite expiry and add a rotation
endpoint.
Acknowledged items that need tickets, not just PR notes:
The pagination default of 100 (spec says unlimited), UCI vs PGN export, and the null white player in bye rounds are spec non-conformances. Each
should be a tracked issue so they don't get lost when this PR merges.
@@ -51,0 +52,4 @@modules/account/src/main/resources/keys/dev-private.pemmodules/account/src/main/resources/keys/dev-public.pemmodules/core/src/main/resources/keys/dev-public.pemjava_pid2736.hprofThis is a heap dump filename with a specific PID, meaning a heap dump was taken during development. Confirm the dump file itself is not tracked in the repository (git ls-files | grep hprof). The gitignore entry should use a wildcard (*.hprof) to catch any future dump rather than a single named file.
@@ -75,4 +75,7 @@ class OfficialBotAccount extends PanacheEntityBase:var rating: Int = 1500var createdAt: Instant = uninitializedMissing nullable = false. A just-constructed OfficialBotAccount starts with token = uninitialized, and AccountService does two separate persist() calls to work around this. Add nullable = false and generate the token before the first persist to collapse it to one round-trip and make the constraint explicit at the DB level.
@@ -196,3 +196,3 @@@POST@Path("/official-bots")@RolesAllowed(Array("Admin"))@RolesAllowed(Array("**"))Is it okay to leave it like that?
@@ -219,3 +219,12 @@ class AccountResource:rating = bot.rating,createdAt = bot.createdAt.toString,)bot.token can be null if the Hibernate entity was just persisted (the token is not nullable = false on OfficialBotAccount). Wrap with
▎ Option(bot.token) rather than Some(bot.token) to avoid a NPE at runtime.
@@ -154,3 +154,3 @@bot.name = botNamebot.owner = ownerbot.token = generateBotToken(bot.id)bot.token = UUID.randomUUID().toStringThe token is set to a random UUID, persisted, then immediately overwritten with the real JWT and persisted again. This is two DB writes and leaves a window where the entity carries a meaningless token. Generate the ID first, build the JWT, set it once, then persist.
@@ -204,6 +205,8 @@ class AccountService:bot.name = botNamebot.createdAt = Instant.now()officialBotAccountRepository.persist(bot)bot.token = generateBotToken(bot.id, bot.name)Same two-persist pattern. The root cause is that bot.id is not available before the first persist(). Either use a pre-assigned UUID (@Id with UUID.randomUUID() before persist) or derive the token from a field that doesn't require a DB round-trip. Two flushes per creation is a correctness risk under failure: if the second persist fails, the entity exists without a valid token.
@@ -232,4 +237,4 @@Jwt.issuer("nowchess").subject(botId.toString).expiresAt(Long.MaxValue)Non-expiring tokens are a security liability. If a bot token is leaked there is no recovery short of deleting the account. At minimum use a multi-year expiry, and surface a token-rotation endpoint so operators have a recovery path.
@@ -96,3 +100,3 @@heartbeatServiceOpt.foreach(_.addGameSubscription(gameId))val handler: Consumer[String] = msg => handleC2sMessage(gameId, msg)val executor = c2sExecutors.computeIfAbsent(gameId, _ => Executors.newSingleThreadExecutor())newSingleThreadExecutor() uses an unbounded queue. Under a slow game handler, tasks will queue indefinitely. Use new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(1000), r -> ..., new DiscardOldestPolicy()) to bound memory.
@@ -99,0 +105,4 @@def run(): Unit =try handleC2sMessage(gameId, msg)catch case ex: Exception => log.warnf(ex, "Error handling c2s message for game %s", gameId)Try(executor.execute(task))Swallowing a rejected RejectedExecutionException silently drops moves. At minimum log the rejection so it shows up in monitoring.
@@ -109,0 +120,4 @@// Notify the official-bots service to start playing a side of a game. Mirrors// the event the tournament service publishes; official-bots subscribes to// "<prefix>:bot:*:events".def publishBotGameStart(gameId: String, botId: String, playingAs: String): Unit =Building JSON by string interpolation is fragile — any gameId or botId containing " or \ will produce invalid JSON and silently break the downstream OfficialBotService. Use objectMapper.writeValueAsString(Map(...)) or a structured DTO.
@@ -113,6 +134,7 @@ class GameRedisSubscriberManager:Option(s2cObservers.remove(gameId)).foreach { obs =>registry.get(gameId).foreach(_.engine.unsubscribe(obs))}Option(c2sExecutors.remove(gameId)).foreach(_.shutdownNow())shutdownNow() interrupts running tasks but doesn't wait for them to finish. Tasks already submitted to the executor may still be mid-handleC2sMessage when the game is considered gone. Consider awaitTermination(200, MILLISECONDS) after shutdownNow() to give in-flight tasks a chance to complete cleanly.
@@ -182,0 +185,4 @@// and core notifies the official-bots service to play the bot side.@POST@Path("/vs-bot")@PermitAllA @PermitAll endpoint that creates a game and triggers bot activity has no rate-limiting or identity attached to the player. Any unauthenticated caller can drive unlimited bot load. Either require authentication (@RolesAllowed("**")) or add explicit rate-limiting before exposing this in production.
@@ -182,0 +201,4 @@created(GameDtoMapper.toGameFullDto(entry, ioClient))private def notifyBotSide(entry: GameEntry): Unit =if entry.black.id.value.startsWith("bot-") thenString prefix "bot-" as an ID convention is fragile — nothing enforces it in the domain model. This will silently fail (not notify the bot) if the prefix ever changes or if a regular user's ID happens to start with "bot-". Introduce a typed BotId/PlayerId sum type or a boolean flag on PlayerInfo instead.
@@ -56,3 +57,1 @@private def subscribeToEventChannel(botName: String): Unit =val handler: Consumer[String] = msg => handleBotEvent(botName, msg)redis.pubsub(classOf[String]).subscribe(s"${redisConfig.prefix}:bot:$botName:events", handler)val pattern = s"${redisConfig.prefix}:bot:*:events"Every replica of the official-bots service will receive every bot event. As bot and game counts grow, this is a hot fan-out path with no deduplication. If two replicas are running, both will try to compute a move and publish it — the second publish wins non-deterministically. Add a distributed lock (e.g., Redis SET NX) per (gameId, turn) before computing a move.
@@ -81,0 +80,4 @@private def registerColor(gameId: String, playingAs: String, playerId: String): Unit =val fresh = new ConcurrentHashMap[String, String]()val existing = watchedGames.putIfAbsent(gameId, fresh)val colors = if existing == null then fresh else existingThere is a TOCTOU race: two gameStart events for the same game (one white, one black) can arrive concurrently. Both may see existing == null and call subscribeAndConnect twice, resulting in duplicate subscriptions and duplicate moves. The putIfAbsent + null-check pattern is correct in principle, but subscribeAndConnect must be inside the same atomic block, or a separate AtomicBoolean guard should be used.
@@ -122,1 +98,4 @@subscribers.put(gameId, subscriber)sendConnected(pubsub, gameId)}pubsub.subscribe(s2c, handler).subscribe().`with`(onSubscribed, logFailure(s"subscribe to game $gameId"))The comment above says "subscribe must be issued non-blocking from this context". If onSubscribed calls sendConnected (which publishes to Redis), and this all happens on the Vert.x event loop thread, the publish in sendConnected could block the loop if the Redis client drains its write buffer. Ensure sendConnected is also non-blocking (it appears to use .subscribe().with(...) which is fine, but verify the pubsub publish pipeline).
@@ -123,0 +120,4 @@if terminalStatuses.contains(status) then stopWatching(gameId)elseval colors = watchedGames.get(gameId)val playerId = if colors == null then null else colors.get(turn)null used as a sentinel. Per project style, this should be Option(watchedGames.get(gameId)).flatMap(m => Option(m.get(turn))). Using null here is inconsistent with the Option/Either convention and risks an NPE if the code changes. is inconsistent with the Option/Either convention and risks an NPE if the code changes.
@@ -123,0 +164,4 @@private def logFailure(what: String): Consumer[Throwable] =err => log.errorf(err, "Failed to %s", what)private def engineName: String = "classical"The engine name is hardcoded but the engine instance (ClassicalBot) is also hardcoded at class init. If another difficulty is injected in the future this will be invisible. Either derive the name from the Bot instance (if Bot exposes a name), or at least make it a constant rather than a method.
@@ -0,0 +1,219 @@package de.nowchess.bot.serviceThe entire file operates on a daemon thread with Thread.sleep(5000) reconnect. This bypasses Quarkus CDI lifecycle: it won't receive the shutdown signal, won't drain in-flight work on @PreDestroy, and the flat 5-second backoff will hammer the server if the tournament endpoint is persistently unavailable. Use @Scheduled or a Vert.x timer with exponential backoff, and track the thread reference so it can be interrupted on graceful shutdown.
The activeGames set that prevents duplicate game processing is never trimmed when a game ends. In a long-lived process across many rounds the set grows without bound. Remove game IDs from activeGames when a terminal game state is received.
@@ -0,0 +1,31 @@package de.nowchess.tournament.domainThe PR description acknowledges a null white player for bye rounds. Per project convention, this must be Option[PlayerId] (or a Bye case in a sum type), not a nullable field. A null propagating through the SwissPairingService and TournamentResource will eventually reach a serializer or comparison that throws rather than producing a well-formed API response.
9f9afd3094toc9cf92266c- Restore @RolesAllowed("Admin") on official bot creation (security regression) - Pre-assign UUID before first persist in createOfficialBotAccount/syncOfficialBots to eliminate two-persist fragility (token-less entity on second-write failure) - Add nullable = false to OfficialBotAccount.token column - Replace JSON string interpolation in publishBotGameStart with objectMapper - Replace specific hprof PID filename in .gitignore with *.hprof wildcard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>