Welcome to the website of the editorial office of the Journal of Systems Engineering (《系统工程学报》)!

Dynamic scheduling of O2O pick-up and delivery based on hierarchical deep reinforcement learning

Submitted: 2023-01-04 Revised: 2025-04-07

Keywords: hierarchical deep reinforcement learning; O2O instant delivery; dynamic pickup and delivery problem; traffic simulation

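Funding: National Natural Science Foundation of China (72293563; 72442025); Natural Science Foundation of Liaoning Province (2024-MS-175); Basic Scientific Research Project of the Education Department of Liaoning Province (JYTZD2023050); Dalian Science and Technology Talent Innovation Support Program (2022RG17); Postgraduate Research Innovation Project of the Education Department of Liaoning Province (DUFEYJS24035); Liaoning Province Key Research and Development Program (2024JH2/102400020).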

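Author	Affiliation	Postal code
Gao Ming (高明)^*	Dongbei University of Finance and Economics	116025
Chen Minghao (陈明浩)	Dongbei University of Finance and Economics
Tang Jiafu (唐加福)	Dongbei University of Finance and Economics
Zou Guangyu (邹广宇)	Dalian University of Technology
Xu Xin (许欣)	Dongbei University of Finance and Economics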

Abstract

To address demand fluctuation, uncertain road conditions, and real-time requirements in O2O instant delivery scheduling, this study proposes a hierarchical deep reinforcement learning method. The upper-level agent continuously learns from dynamically changing historical order and road-condition information to assign tasks to riders; the lower-level agent focuses on route optimization for each rider after orders are bundled. A global reward function passes global optimization signals vertically between the two levels of agents and coordinates the long-term average objective horizontally across multiple rolling scheduling intervals. Multi-scenario real-time scheduling experiments on real and simulated orders from a food delivery platform in Dalian were conducted on a simulation platform. The results verify the method's advantage in accounting for long-term objectives in rolling scheduling and the efficiency of hierarchical solving, providing a scheduling solution for instant delivery services that balances cost-effectiveness and service quality.
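
The two-level decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: both learned agents are replaced by simple greedy heuristics so that the control flow is visible. All names (`Order`, `Rider`, `upper_level_assign`, `lower_level_route`, `global_reward`, `run_rolling_horizon`) are hypothetical, distances are assumed Euclidean, the reward is assumed to be negative total travel distance, and every assigned order is assumed to be completed within its scheduling interval.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Order:
    pickup: tuple   # (x, y) of the merchant
    dropoff: tuple  # (x, y) of the customer


@dataclass
class Rider:
    position: tuple
    orders: list = field(default_factory=list)


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def stop_xy(rider, stop):
    kind, i = stop
    order = rider.orders[i]
    return order.pickup if kind == "pickup" else order.dropoff


def lower_level_route(rider):
    """Greedy stand-in for the lower-level agent: visit the nearest open stop;
    a dropoff only becomes open after its pickup has been visited."""
    pos, route, length = rider.position, [], 0.0
    open_stops = [("pickup", i) for i in range(len(rider.orders))]
    while open_stops:
        nxt = min(open_stops, key=lambda s: dist(pos, stop_xy(rider, s)))
        open_stops.remove(nxt)
        if nxt[0] == "pickup":
            open_stops.append(("dropoff", nxt[1]))
        length += dist(pos, stop_xy(rider, nxt))
        pos = stop_xy(rider, nxt)
        route.append(nxt)
    return route, length, pos


def upper_level_assign(order, riders):
    """Greedy stand-in for the upper-level agent: pick the rider whose
    route length grows least when the order is added."""
    def added_length(r):
        trial = Rider(r.position, r.orders + [order])
        return lower_level_route(trial)[1] - lower_level_route(r)[1]
    best = min(riders, key=added_length)
    best.orders.append(order)
    return best


def global_reward(riders):
    """Assumed global signal shared by both levels: negative total distance."""
    return -sum(lower_level_route(r)[1] for r in riders)


def run_rolling_horizon(batches, riders):
    """Process one batch of orders per interval; return the average reward."""
    rewards = []
    for batch in batches:
        for order in batch:
            upper_level_assign(order, riders)
        rewards.append(global_reward(riders))
        for r in riders:  # simplification: all routes finish within the interval
            r.position = lower_level_route(r)[2]
            r.orders = []
    return sum(rewards) / len(rewards) if rewards else 0.0
```

In a learned version, the two greedy functions would be replaced by trained policies, and the per-interval value of `global_reward` would serve as the training signal for both levels, with its average over intervals playing the role of the long-term objective.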
