Halo Write

Halo Write

Halo “保持登录”机制突然失效的问题排查

98
2024-07-14

背景

在 Halo 2.17 中,我们实现了持久化的“记住登录”机制,具体实现请参考:https://github.com/halo-dev/halo/pull/6131。但是近期有社区用户在 https://github.com/halo-dev/halo/issues/6290#issue-2393508995 中反馈“保持登录”会突然失效,且该问题并不是稳定复现,所以排查起来极为棘手。

分析

不过好在该社区用户提供了详细的错误日志,可以有效地定位到问题点。关键日志如下所示:

2024-07-06T15:34:06.258+08:00 ERROR 7 --- [reactor-tcp-epoll-1] r.h.a.s.a.r.TokenBasedRememberMeServices : Cookie theft detected

org.springframework.security.web.authentication.rememberme.CookieTheftException: Invalid remember-me token (Series/token) mismatch. Implies previous cookie theft attack.
	at run.halo.app.security.authentication.rememberme.PersistentTokenBasedRememberMeServices.lambda$processAutoLoginCookie$0(PersistentTokenBasedRememberMeServices.java:98) ~[classes/:2.17.1]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:122) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Operators$BaseFluxToMonoOperator.completePossiblyEmpty(Operators.java:2097) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:118) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onComplete(FluxPeekFuseable.java:277) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onComplete(FluxMapFuseable.java:152) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drainAsync(FluxFlattenIterable.java:371) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drain(FluxFlattenIterable.java:724) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.onComplete(FluxFlattenIterable.java:273) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onComplete(FluxPeekFuseable.java:277) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Operators$BaseFluxToMonoOperator.completePossiblyEmpty(Operators.java:2098) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:118) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxUsingWhen$UsingWhenSubscriber.deferredComplete(FluxUsingWhen.java:397) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxUsingWhen$CommitInner.onComplete(FluxUsingWhen.java:532) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Operators.complete(Operators.java:137) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:46) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4568) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxUsingWhen$UsingWhenSubscriber.onComplete(FluxUsingWhen.java:389) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxConcatMapNoPrefetch$FluxConcatMapNoPrefetchSubscriber.onComplete(FluxConcatMapNoPrefetch.java:241) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onComplete(Operators.java:2231) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxUsingWhen$UsingWhenSubscriber.deferredComplete(FluxUsingWhen.java:397) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxUsingWhen$CommitInner.onComplete(FluxUsingWhen.java:532) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onComplete(Operators.java:2231) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onComplete(Operators.java:2231) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:210) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:210) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.pool.SimpleDequePool.maybeRecycleAndDrain(SimpleDequePool.java:540) ~[reactor-pool-1.0.6.jar:1.0.6]
	at reactor.pool.SimpleDequePool$QueuePoolRecyclerInner.onComplete(SimpleDequePool.java:770) ~[reactor-pool-1.0.6.jar:1.0.6]
	at reactor.core.publisher.Operators.complete(Operators.java:137) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:46) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4568) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.pool.SimpleDequePool$QueuePoolRecyclerMono.subscribe(SimpleDequePool.java:882) ~[reactor-pool-1.0.6.jar:1.0.6]
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:241) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:204) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:89) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onComplete(MonoFlatMapMany.java:261) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onComplete(FluxPeekFuseable.java:277) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onComplete(FluxContextWrite.java:126) ~[reactor-core-3.6.7.jar:3.6.7]
	at io.asyncer.r2dbc.mysql.internal.util.DiscardOnCancelSubscriber.onComplete(DiscardOnCancelSubscriber.java:84) ~[r2dbc-mysql-1.1.3.jar:1.1.3]
	at reactor.core.publisher.FluxPeekFuseable$PeekConditionalSubscriber.onComplete(FluxPeekFuseable.java:940) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxHandle$HandleConditionalSubscriber.onNext(FluxHandle.java:359) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.FluxPeekFuseable$PeekConditionalSubscriber.onNext(FluxPeekFuseable.java:854) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.SinkManyEmitterProcessor.drain(SinkManyEmitterProcessor.java:476) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.SinkManyEmitterProcessor.tryEmitNext(SinkManyEmitterProcessor.java:273) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.SinkManySerialized.tryEmitNext(SinkManySerialized.java:100) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:27) ~[reactor-core-3.6.7.jar:3.6.7]
	at io.asyncer.r2dbc.mysql.client.ReactorNettyClient$ResponseSink.next(ReactorNettyClient.java:394) ~[r2dbc-mysql-1.1.3.jar:1.1.3]
	at io.asyncer.r2dbc.mysql.client.ReactorNettyClient.lambda$new$0(ReactorNettyClient.java:119) ~[r2dbc-mysql-1.1.3.jar:1.1.3]
	at reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:185) ~[reactor-core-3.6.7.jar:3.6.7]
	at reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:294) ~[reactor-netty-core-1.1.20.jar:1.1.20]
	at reactor.netty.channel.FluxReceive.onInboundNext(FluxReceive.java:403) ~[reactor-netty-core-1.1.20.jar:1.1.20]
	at reactor.netty.channel.ChannelOperations.onInboundNext(ChannelOperations.java:426) ~[reactor-netty-core-1.1.20.jar:1.1.20]
	at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:114) ~[reactor-netty-core-1.1.20.jar:1.1.20]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1473) ~[netty-handler-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1336) ~[netty-handler-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1385) ~[netty-handler-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) ~[netty-transport-classes-epoll-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[netty-transport-classes-epoll-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[netty-transport-classes-epoll-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]

通过以上错误栈我们可以找到对应的源码如下所示:

    @Override
    protected Mono<UserDetails> processAutoLoginCookie(String[] cookieTokens,
        ServerWebExchange exchange) {
        if (cookieTokens.length != 2) {
            throw new InvalidCookieException(
                "Cookie token did not contain " + 2 + " tokens, but contained '"
                    + Arrays.asList(cookieTokens) + "'");
        }
        String presentedSeries = cookieTokens[0];
        String presentedToken = cookieTokens[1];
        return this.tokenRepository.getTokenForSeries(presentedSeries)
            // No series match, so we can't authenticate using this cookie
            .switchIfEmpty(Mono.error(new RememberMeAuthenticationException(
                "No persistent token found for series id: " + presentedSeries))
            )
            .flatMap(token -> {
                // We have a match for this user/series combination
                if (!presentedToken.equals(token.getTokenValue())) {
                    // Token doesn't match series value. Delete all logins for this user and throw
                    // an exception to warn them.
                    return this.tokenRepository.removeUserTokens(token.getUsername())
                        .then(Mono.error(new CookieTheftException(
                            "Invalid remember-me token (Series/token) mismatch. Implies previous "
                                + "cookie theft"
                                + " attack.")));
                }

                if (isTokenExpired(token)) {
                    return Mono.error(
                        new RememberMeAuthenticationException("Remember-me login has expired"));
                }

                // Token also matches, so login is valid. Update the token value, keeping the
                // *same* series number.
                log.debug("Refreshing persistent login token for user '{}', series '{}'",
                    token.getUsername(), token.getSeries());
                var newToken = new PersistentRememberMeToken(token.getUsername(), token.getSeries(),
                    generateTokenData(), new Date());
                return Mono.just(newToken);
            })
            .flatMap(newToken -> updateToken(newToken)
                .doOnSuccess(unused -> addCookie(newToken, exchange))
                .onErrorMap(ex -> {
                    log.error("Failed to update token: ", ex);
                    return new RememberMeAuthenticationException(
                        "Autologin failed due to data access problem");
                })
                .then(getUserDetailsService().findByUsername(newToken.getUsername()))
            );
    }
https://github.com/halo-dev/halo/blob/1089061265ca2ec1d71808429ecad285ed49d888/application/src/main/java/run/halo/app/security/authentication/rememberme/PersistentTokenBasedRememberMeServices.java#L79-L127

通过源码分析后发现,如果 RememberMeToken 没有被盗的情况下,只有可能是通过 this.tokenRepository#getTokenForSeries 获取到的 token 数据已经被更改过。然而更新 token 的操作是在 cookieTokens 验证通过后执行的。所以我们可以大胆地猜测,导致该问题的原因可能有:

  1. 并发请求

  2. 在响应之前因为网络的问题或者用户手动关闭浏览器导致无法在浏览器侧处理 Set-Cookie

复现步骤

  1. 登录时选择“记住我”选项

  2. .doOnSuccess(unused -> addCookie(newToken, exchange)) 处打断点

  3. 打开浏览器开发者模式,并删除掉名为 SESSION 的 Cookie

  4. 刷新浏览器

  5. 完全关闭浏览器

  6. 恢复程序执行并取消断点

  7. 重新打开浏览器并访问 http://localhost:8090

  8. 查看 Halo 日志

解决方案

经过搜索错误 CookieTheftException 发现,该问题很早就在 Spring Security 中https://github.com/spring-projects/spring-security/issues/3079过,解决该问题并不是一件容易的事情。

可能的解决方案如下:

  1. 对 processAutoLoginCookie 加锁,保证每个用户访问该方法时都是顺序的。仍然无法适用用户意外关闭浏览器或者断网的场景

  2. 更新 token 的时候缓存旧的 token,并保证 30s 内有效。仍然无法适用用户意外关闭浏览器或者断网的场景

  3. token 验证通过后不进行更新 token 操作。可以解决以上问题,但是会损失一定安全性,无法解决用户 Cookie 被盗后禁止访问的问题

  4. 前端请求时保证顺序请求。缺点在于前端很难控制

总结

如果我们可以接受用户 Cookie 被盗后仍然可正常访问的风险,我更倾向选择第 3 种解决方案。