基于分层深度强化学习的O2O取送货动态调度
Dynamic scheduling of O2O pick-up and delivery based on hierarchical deep reinforcement learning
Submitted: 2023-01-04  Revised: 2025-04-07
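Chinese keywords  分层深度强化学习; O2O即时配送; 动态取送货问题; 交通仿真
English keywords  hierarchical deep reinforcement learning; O2O instant delivery; dynamic pickup and delivery problem; traffic simulation
Funding  National Natural Science Foundation of China (72293563; 72442025); Natural Science Foundation of Liaoning Province (2024-MS-175); Basic Scientific Research Project of the Education Department of Liaoning Province (JYTZD2023050); Dalian Science and Technology Talent Innovation Support Program (2022RG17); Postgraduate Research Innovation Special Project of the Education Department of Liaoning Province (DUFEYJS24035); Key Research and Development Program of Liaoning Province (2024JH2/102400020).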
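Author  Affiliation  Postal code
Gao Ming (高明)*  Dongbei University of Finance and Economics  116025
Chen Minghao (陈明浩)  Dongbei University of Finance and Economics
Tang Jiafu (唐加福)  Dongbei University of Finance and Economics
Zou Guangyu (邹广宇)  Dalian University of Technology
Xu Xin (许欣)  Dongbei University of Finance and Economics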
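Abstract (translated from the Chinese)
      To address fluctuating demand, uncertain road conditions, and real-time requirements in O2O instant delivery scheduling, a hierarchical deep reinforcement learning method is proposed. The upper-level agent continuously learns from dynamically changing historical order and road-condition information and assigns tasks to riders; the lower level focuses on route optimization for each rider after orders are batched. A global reward function passes global optimization signals vertically between the hierarchical agents and coordinates the long-term average objective horizontally across multiple rolling scheduling intervals. Multi-scenario real-time scheduling experiments were conducted on a simulation platform using real and simulated orders from a food delivery platform in Dalian. The experiments verify the method's advantage in accounting for long-term objectives in rolling scheduling and the efficiency of the hierarchical solution approach, providing an optimized scheduling solution for instant delivery services that balances cost-effectiveness and service quality.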
English abstract
      To address challenges in O2O instant delivery scheduling such as fluctuating demand, uncertain road conditions, and real-time requirements, this study proposes a two-layer hierarchical deep reinforcement learning method. The upper-layer agent assigns orders to riders by learning from dynamically changing historical order and road-condition information. The lower-layer agent optimizes the delivery route of each rider after orders are batched. A global reward function links the two layers, passing global optimization signals between them and coordinating the long-term average objective across multiple rolling scheduling periods. Multi-scenario real-time dispatching experiments were run on a simulation platform using real and simulated orders from a food delivery platform in Dalian. The results verify the method's advantage in accounting for long-term goals in rolling scheduling and the efficiency of the hierarchical solution approach, providing an optimized dispatching solution for instant delivery services that balances cost-effectiveness and service quality.
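      The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the learned upper-layer agent is replaced by a greedy nearest-rider rule, the lower-layer agent by a nearest-neighbour route heuristic, and the global reward is assumed to be the negative total travel distance per rolling interval. All function names, the order fields (`id`, `pickup`, `dropoff`), and the Euclidean distance model are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points (assumed travel-cost model)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def assign_orders(orders, riders):
    """Upper level (stand-in for the learned agent): give each order to the
    rider whose current position is closest to the order's pickup point."""
    batches = {name: [] for name in riders}
    for order in orders:
        nearest = min(riders, key=lambda name: dist(riders[name], order["pickup"]))
        batches[nearest].append(order)
    return batches

def plan_route(start, batch):
    """Lower level (stand-in): nearest-neighbour route over one rider's batch,
    visiting each order's pickup before its drop-off."""
    pos, route, length = start, [], 0.0
    # Only pickups are candidates at first; a drop-off unlocks after its pickup.
    pending = [("pickup", order) for order in batch]
    while pending:
        kind, order = min(pending, key=lambda stop: dist(pos, stop[1][stop[0]]))
        pending.remove((kind, order))
        length += dist(pos, order[kind])
        pos = order[kind]
        route.append((kind, order["id"]))
        if kind == "pickup":
            pending.append(("dropoff", order))
    return route, length, pos

def rolling_schedule(intervals, riders):
    """Run one assignment-plus-routing step per rolling interval and return
    the per-interval global rewards (negative total travel distance)."""
    riders = dict(riders)  # copy so the caller's rider positions are untouched
    rewards = []
    for orders in intervals:
        batches = assign_orders(orders, riders)
        total = 0.0
        for name, batch in batches.items():
            _, length, end = plan_route(riders[name], batch)
            riders[name] = end  # rider position carries over to the next interval
            total += length
        rewards.append(-total)  # one global signal shared by both levels
    return rewards

riders = {"r1": (0, 0), "r2": (10, 0)}
intervals = [
    [{"id": 1, "pickup": (1, 0), "dropoff": (2, 0)},
     {"id": 2, "pickup": (9, 0), "dropoff": (8, 0)}],
    [{"id": 3, "pickup": (3, 0), "dropoff": (3, 4)}],
]
print(rolling_schedule(intervals, riders))  # prints [-4.0, -5.0]
```

      In a learning-based version, the per-interval rewards would be fed back to both levels, and their average over many intervals would stand in for the long-term objective mentioned in the abstract.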

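Copyright © 2007 《系统工程学报》 (Journal of Systems Engineering)
Address: Room 908, Area A, Teaching Building 25, Tianjin University, 92 Weijin Road, Tianjin  Postal code: 300072
Tel/Fax: 022-27403197  Email: jse@tju.edu.cn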