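Spring Cloud Gateway burned me! A 404 problem that dragged on for a year?

Hi everyone, I'm Xiaofu~

A colleague recently asked me to help track down a "weird" bug that had bothered their team for more than a year without a fix. After taking it over, I spent some time locating the root cause. This post walks through the investigation and the solution.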

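Problem description

My colleague runs Spring Cloud Gateway 3.0.1 on JDK 8, integrated with Nacos for dynamic route configuration. The symptom: every time the route configuration in Nacos is modified, API requests through the gateway start returning 404, and restarting the gateway brings everything back to normal.

My first reaction on hearing this was that some cached data in the gateway was not being refreshed after the Nacos configuration changed. With that guess in mind, I started digging.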

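Environment setup

I first prepared three backend service instances on ports 8103, 12040, and 12041, and configured the matching gateway routes in Nacos: xiaofu-8103, xiaofu-12040, and xiaofu-12041. All three routes belong to the same weight group, xiaofu-group, so that traffic is load-balanced by weight.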

- id: xiaofu-8103
  uri: http://127.0.0.1:8103/
  predicates:
    - Weight=xiaofu-group, 2
    - Path=/test/version1/**
  filters:
    - RewritePath=/test/version1/(?<segment>.*),/$\{segment}
- id: xiaofu-12040
  uri: http://127.0.0.1:12040/
  predicates:
    - Weight=xiaofu-group, 1
    - Path=/test/version1/**
  filters:
    - RewritePath=/test/version1/(?<segment>.*),/$\{segment}
- id: xiaofu-12041
  uri: http://127.0.0.1:12041/
  predicates:
    - Weight=xiaofu-group, 2
    - Path=/test/version1/**
  filters:
    - RewritePath=/test/version1/(?<segment>.*),/$\{segment}
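As a quick sanity check on what this configuration should produce, here is a minimal sketch that turns the three weights into expected traffic shares. It assumes each route's share is simply its weight divided by the sum of weights in the group (2:1:2, so roughly 40%, 20%, and 40%). The class and method names are made up for illustration and are not part of Spring Cloud Gateway.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: not a Spring Cloud Gateway class.
final class WeightShareDemo {

    /** Returns each route's expected share of traffic: weight / sum of weights in the group. */
    static Map<String, Double> expectedShares(Map<String, Integer> weights) {
        int total = 0;
        for (int w : weights.values()) {
            total += w;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // An empty or all-zero group gets a share of 0 instead of dividing by zero.
            shares.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Weights copied from the xiaofu-group route configuration above.
        Map<String, Integer> group = new LinkedHashMap<>();
        group.put("xiaofu-8103", 2);
        group.put("xiaofu-12040", 1);
        group.put("xiaofu-12041", 2);
        expectedShares(group).forEach((id, share) ->
                System.out.printf("%s -> %.0f%%%n", id, share * 100));
    }
}
```

If the JMeter log counts per instance are far from these proportions, something other than the configured weights is driving the selection.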

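I used JMeter to send a continuous stream of requests, and added a random number to each request's parameters to make log tracing easier.

With everything ready, I started the JMeter loop and saw log output on all three instances, which shows that the gateway's load balancing was working.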

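Troubleshooting

To get more detailed logs, I set the gateway's log level to TRACE.

With JMeter running, I changed route properties of the three instances at random (uri, port, predicates, filters). No request failed, and the gateway console showed the updated route properties, so changes made in Nacos were reaching the gateway.

Next I removed one instance, xiaofu-12041. JMeter requests immediately started returning 404. The problem was reproduced!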

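Looking at the gateway console log, I was surprised to see that the route for the deleted instance xiaofu-12041 was still there, and was even being chosen to handle requests. That is the root of the problem: although the route was deleted in Nacos, the gateway still uses the old route data when it actually does weighted load balancing.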

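Digging further, I found that the stale route also remained in the route weight information (Weights attr). At this point the diagnosis was essentially settled: when computing instance weights and balancing load, the gateway works from stale cached data.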
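To make the failure mode concrete, here is a simplified model of it, not the gateway's actual code. It assumes the weighted pick walks cumulative weight ranges over the cached weights, and that a request whose chosen route id no longer exists in the live route table ends in a 404. All class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Simplified, hypothetical model of weighted selection over a stale weight cache.
final class StaleWeightDemo {

    /** Picks a route id by walking cumulative weight ranges; r must be in [0, 1). */
    static String pick(Map<String, Integer> weights, double r) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        double upper = 0.0;
        String last = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            upper += (double) e.getValue() / total;
            last = e.getKey();
            if (r < upper) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the top end
    }

    /** Counts how many of n picks land on a route id that is no longer live. */
    static int countMisses(Map<String, Integer> cachedWeights, Set<String> liveRoutes, int n, long seed) {
        Random random = new Random(seed);
        int misses = 0;
        for (int i = 0; i < n; i++) {
            if (!liveRoutes.contains(pick(cachedWeights, random.nextDouble()))) {
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        // The cache still holds xiaofu-12041 even though the route was deleted.
        Map<String, Integer> cached = new LinkedHashMap<>();
        cached.put("xiaofu-8103", 2);
        cached.put("xiaofu-12040", 1);
        cached.put("xiaofu-12041", 2);

        Set<String> live = new LinkedHashSet<>();
        live.add("xiaofu-8103");
        live.add("xiaofu-12040");

        System.out.println("picks with no matching route: " + countMisses(cached, live, 1000, 42L) + " of 1000");
    }
}
```

Under this model the stale entry keeps its full share of the picks, which would explain why the 404s are intermittent rather than total.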

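Source code analysis

Reading the source, I found a filter dedicated to computing weights: WeightCalculatorWebFilter. Internally it keeps a groupWeights variable that stores the route weight information. When a configuration change event arrives, it calls addWeightConfig(WeightConfig weightConfig) to update the weight configuration.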

@Override
public void onApplicationEvent(ApplicationEvent event) {
    if (event instanceof PredicateArgsEvent) {
        handle((PredicateArgsEvent) event);
    }
    else if (event instanceof WeightDefinedEvent) {
        addWeightConfig(((WeightDefinedEvent) event).getWeightConfig());
    }
    else if (event instanceof RefreshRoutesEvent && routeLocator != null) {
        if (routeLocatorInitialized.compareAndSet(false, true)) {
            routeLocator.ifAvailable(locator -> locator.getRoutes().blockLast());
        }
        else {
            routeLocator.ifAvailable(locator -> locator.getRoutes().subscribe());
        }
    }
}

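The comment inside addWeightConfig says it plainly: the method only creates a new GroupWeightConfig rather than modifying the existing one. When the group already exists, the new object is copy-constructed from the old one, so every previously registered route carries over. In effect the method can only add or overwrite route weights; it has no way to clean up the weights of routes that have been deleted.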

void addWeightConfig(WeightConfig weightConfig) {
    String group = weightConfig.getGroup();
    GroupWeightConfig config;
    // only create new GroupWeightConfig rather than modify
    // and put at end of calculations. This avoids concurency problems
    // later during filter execution.
    if (groupWeights.containsKey(group)) {
        config = new GroupWeightConfig(groupWeights.get(group));
    }
    else {
        config = new GroupWeightConfig(group);
    }
    final AtomicInteger index = new AtomicInteger(0);
    // ....omitted.....
    if (log.isTraceEnabled()) {
        log.trace("Recalculated group weight config " + config);
    }
    // only update after all calculations
    groupWeights.put(group, config);
}
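The consequence of this copy-then-put pattern can be shown with a small stand-alone model. It is a sketch under the assumption that a refresh only re-registers the routes that still exist and never signals a removal. AddOnlyWeightCache and its methods are hypothetical names, not gateway classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the groupWeights cache; mimics only the add-or-overwrite behaviour.
final class AddOnlyWeightCache {

    private final Map<String, Map<String, Integer>> groupWeights = new ConcurrentHashMap<>();

    /** Copies the group's current weights, adds or overwrites one route, then swaps the copy in. */
    void addWeight(String group, String routeId, int weight) {
        Map<String, Integer> current = groupWeights.get(group);
        Map<String, Integer> copy = new LinkedHashMap<>();
        if (current != null) {
            copy.putAll(current); // old entries carry over, including routes that no longer exist
        }
        copy.put(routeId, weight);
        groupWeights.put(group, copy);
    }

    /** Returns a snapshot of the weights currently cached for the group. */
    Map<String, Integer> weightsOf(String group) {
        Map<String, Integer> current = groupWeights.get(group);
        return current == null ? new LinkedHashMap<>() : new LinkedHashMap<>(current);
    }

    public static void main(String[] args) {
        AddOnlyWeightCache cache = new AddOnlyWeightCache();
        // Initial load: three routes.
        cache.addWeight("xiaofu-group", "xiaofu-8103", 2);
        cache.addWeight("xiaofu-group", "xiaofu-12040", 1);
        cache.addWeight("xiaofu-group", "xiaofu-12041", 2);
        // Refresh after xiaofu-12041 was deleted: only the two remaining routes are re-registered.
        cache.addWeight("xiaofu-group", "xiaofu-8103", 2);
        cache.addWeight("xiaofu-group", "xiaofu-12040", 1);
        System.out.println(cache.weightsOf("xiaofu-group").keySet());
    }
}
```

Because nothing in this flow ever removes a key, the deleted route stays in the map until the process restarts, which matches the "restart fixes it" symptom.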

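Solution

Once the root cause was known, the direction of the fix became clear.

At first I suspected a Spring Cloud Gateway version issue and upgraded to 4.1.0, but the problem was still there.

So the cache has to be updated by hand: listen for route refresh events triggered by Nacos configuration changes, read the latest route configuration, and update the weight data in groupWeights accordingly.

Here is the code for the solution: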

import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.gateway.event.RefreshRoutesEvent;
import org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter;
import org.springframework.cloud.gateway.route.RouteDefinitionLocator;
import org.springframework.cloud.gateway.support.WeightConfig;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;

@Slf4j
@Configuration
public class WeightCacheRefresher {

    @Autowired
    private WeightCalculatorWebFilter weightCalculatorWebFilter;

    @Autowired
    private RouteDefinitionLocator routeDefinitionLocator;

    @Autowired
    private ApplicationEventPublisher publisher;

    /**
     * Listens for route refresh events and synchronizes the weight cache.
     */
    @EventListener(RefreshRoutesEvent.class)
    public void onRefreshRoutes() {
        log.info("Route refresh event detected, preparing to sync the weight cache");
        syncWeightCache();
    }

    /**
     * Synchronizes the weight cache with the current route configuration.
     */
    public void syncWeightCache() {
        try {
            // Access the private groupWeights field via reflection
            Field groupWeightsField = WeightCalculatorWebFilter.class.getDeclaredField("groupWeights");
            groupWeightsField.setAccessible(true);

            // Read the current groupWeights value
            @SuppressWarnings("unchecked")
            Map<String, Object> groupWeights = (Map<String, Object>) groupWeightsField.get(weightCalculatorWebFilter);
            if (groupWeights == null) {
                log.warn("groupWeights cache not found");
                return;
            }
            log.info("Current groupWeights cache: {}", groupWeights.keySet());

            // Collect the weight groups and route IDs of all current routes
            final Set<String> currentRouteIds = new HashSet<>();
            final Map<String, Map<String, Integer>> currentGroupRouteWeights = new HashMap<>();

            routeDefinitionLocator.getRouteDefinitions()
                    .collectList()
                    .subscribe(definitions -> {
                        definitions.forEach(def -> {
                            currentRouteIds.add(def.getId());
                            def.getPredicates().stream()
                                    .filter(predicate -> "Weight".equals(predicate.getName()))
                                    .forEach(predicate -> {
                                        Map<String, String> args = predicate.getArgs();
                                        String group = args.getOrDefault("_genkey_0", "unknown");
                                        int weight = Integer.parseInt(args.getOrDefault("_genkey_1", "0"));
                                        // Record each route that currently exists in the group, with its weight
                                        currentGroupRouteWeights.computeIfAbsent(group, k -> new HashMap<>())
                                                .put(def.getId(), weight);
                                    });
                        });
                        log.info("Route IDs in the current route configuration: {}", currentRouteIds);
                        log.info("Weight groups in the current route configuration: {}", currentGroupRouteWeights);

                        // Check every cached weight group: drop routes that no longer exist,
                        // and mark groups whose routes or weights changed
                        Set<String> groupsToRemove = new HashSet<>();
                        Set<String> groupsToUpdate = new HashSet<>();
                        for (String group : groupWeights.keySet()) {
                            if (!currentGroupRouteWeights.containsKey(group)) {
                                // The whole weight group no longer exists
                                groupsToRemove.add(group);
                                log.info("Weight group [{}] no longer exists in the route configuration and will be removed", group);
                                continue;
                            }
                            // Route IDs and weights currently configured for this group
                            Map<String, Integer> configuredRouteWeights = currentGroupRouteWeights.get(group);
                            // Cached weight configuration for this group
                            Object groupWeightConfig = groupWeights.get(group);
                            try {
                                // Access the private weights field via reflection
                                Field weightsField = groupWeightConfig.getClass().getDeclaredField("weights");
                                weightsField.setAccessible(true);
                                @SuppressWarnings("unchecked")
                                LinkedHashMap<String, Integer> weights = (LinkedHashMap<String, Integer>) weightsField.get(groupWeightConfig);

                                // Route IDs that must be removed
                                Set<String> routesToRemove = weights.keySet().stream()
                                        .filter(routeId -> !configuredRouteWeights.containsKey(routeId))
                                        .collect(Collectors.toSet());

                                // Route IDs whose weight changed
                                Set<String> routesWithWeightChange = new HashSet<>();
                                for (Map.Entry<String, Integer> entry : weights.entrySet()) {
                                    String routeId = entry.getKey();
                                    Integer cachedWeight = entry.getValue();
                                    if (configuredRouteWeights.containsKey(routeId)) {
                                        Integer configuredWeight = configuredRouteWeights.get(routeId);
                                        if (!cachedWeight.equals(configuredWeight)) {
                                            routesWithWeightChange.add(routeId);
                                            log.info("Weight of route [{}] changed from {} to {}", routeId, cachedWeight, configuredWeight);
                                        }
                                    }
                                }

                                // Newly added route IDs
                                Set<String> newRoutes = configuredRouteWeights.keySet().stream()
                                        .filter(routeId -> !weights.containsKey(routeId))
                                        .collect(Collectors.toSet());

                                if (!routesToRemove.isEmpty() || !routesWithWeightChange.isEmpty() || !newRoutes.isEmpty()) {
                                    log.info("Weight group [{}] changed: removed {}, weight changed {}, added {}",
                                            group, routesToRemove, routesWithWeightChange, newRoutes);
                                    // On any change, recalculate the weights of the whole group
                                    groupsToUpdate.add(group);
                                }

                                // First, remove the routes that were deleted
                                for (String routeId : routesToRemove) {
                                    weights.remove(routeId);
                                }

                                // If no routes remain in the group, remove the whole group
                                if (weights.isEmpty()) {
                                    groupsToRemove.add(group);
                                    log.info("Weight group [{}] has no routes left; the whole group will be removed", group);
                                }
                            } catch (Exception e) {
                                log.error("Error while processing weight group [{}]", group, e);
                            }
                        }

                        // Remove weight groups that are no longer needed
                        for (String group : groupsToRemove) {
                            groupWeights.remove(group);
                            log.info("Removed weight group: {}", group);
                        }

                        // Rebuild the weight groups that need recalculation
                        for (String group : groupsToUpdate) {
                            try {
                                // Route IDs and weights currently configured for this group
                                Map<String, Integer> configuredRouteWeights = currentGroupRouteWeights.get(group);

                                // Drop the old group configuration
                                groupWeights.remove(group);
                                log.info("Removed weight group [{}] so it can be recalculated", group);

                                // Create a WeightConfig per route and call addWeightConfig
                                Method addWeightConfigMethod = WeightCalculatorWebFilter.class.getDeclaredMethod("addWeightConfig", WeightConfig.class);
                                addWeightConfigMethod.setAccessible(true);
                                for (Map.Entry<String, Integer> entry : configuredRouteWeights.entrySet()) {
                                    String routeId = entry.getKey();
                                    Integer weight = entry.getValue();
                                    WeightConfig weightConfig = new WeightConfig(routeId);
                                    weightConfig.setGroup(group);
                                    weightConfig.setWeight(weight);
                                    addWeightConfigMethod.invoke(weightCalculatorWebFilter, weightConfig);
                                    log.info("Added weight config for route [{}]: group [{}], weight {}", routeId, group, weight);
                                }
                            } catch (Exception e) {
                                log.error("Error while recalculating weight group [{}]", group, e);
                            }
                        }
                        log.info("Weight cache sync finished, cached weight groups: {}", groupWeights.keySet());
                    });
        } catch (Exception e) {
            log.error("Failed to sync the weight cache", e);
        }
    }
}
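The core of the code above is a three-way comparison between the cached weights and the configured weights of one group. Pulled out of the reflection and reactive plumbing, that step might look like the sketch below, which can be unit-tested without starting a gateway. WeightDiff is a hypothetical helper, and xiaofu-9000 is an invented route id used only to show the "added" case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: isolates the removed / changed / added comparison for one weight group.
final class WeightDiff {

    final Set<String> removed = new TreeSet<>(); // cached but no longer configured
    final Set<String> changed = new TreeSet<>(); // present in both, with a different weight
    final Set<String> added = new TreeSet<>();   // configured but not yet cached

    static WeightDiff between(Map<String, Integer> cached, Map<String, Integer> configured) {
        WeightDiff diff = new WeightDiff();
        for (Map.Entry<String, Integer> e : cached.entrySet()) {
            Integer now = configured.get(e.getKey());
            if (now == null) {
                diff.removed.add(e.getKey());
            } else if (!now.equals(e.getValue())) {
                diff.changed.add(e.getKey());
            }
        }
        for (String routeId : configured.keySet()) {
            if (!cached.containsKey(routeId)) {
                diff.added.add(routeId);
            }
        }
        return diff;
    }

    /** True when the cache already matches the configuration and no rebuild is needed. */
    boolean isEmpty() {
        return removed.isEmpty() && changed.isEmpty() && added.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, Integer> cached = new HashMap<>();
        cached.put("xiaofu-8103", 2);
        cached.put("xiaofu-12040", 1);
        cached.put("xiaofu-12041", 2);

        Map<String, Integer> configured = new HashMap<>();
        configured.put("xiaofu-8103", 2);
        configured.put("xiaofu-12040", 3);
        configured.put("xiaofu-9000", 1); // invented route id, illustration only

        WeightDiff diff = between(cached, configured);
        System.out.println("removed=" + diff.removed + " changed=" + diff.changed + " added=" + diff.added);
    }
}
```

A group would be rebuilt only when isEmpty() returns false, which keeps the reflective calls off the common path where nothing changed.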

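I searched around online and found no official fix. Maybe the way we are using the gateway is wrong; otherwise, surely someone would have fixed such an obvious bug long ago!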
