基于分层深度强化学习的O2O取送货动态调度 |
Dynamic scheduling of O2O pick-up and delivery based on hierarchical deep reinforcement learning |
Submitted: 2023-01-04 Revised: 2025-04-07 |
Keywords: hierarchical deep reinforcement learning; O2O instant delivery; dynamic pickup and delivery problem; traffic simulation |
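Funding: National Natural Science Foundation of China (72293563; 72442025); Natural Science Foundation of Liaoning Province (2024-MS-175); Basic Scientific Research Project of the Education Department of Liaoning Province (JYTZD2023050); Dalian Science and Technology Talent Innovation Support Program (2022RG17); Postgraduate Research Innovation Project of the Education Department of Liaoning Province (DUFEYJS24035); Liaoning Province Key Research and Development Program (2024JH2/102400020). |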
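Author | Affiliation | Postal code |
高明* (Gao Ming) | Dongbei University of Finance and Economics | 116025 |
陈明浩 (Chen Minghao) | Dongbei University of Finance and Economics | |
唐加福 (Tang Jiafu) | Dongbei University of Finance and Economics | |
邹广宇 (Zou Guangyu) | Dalian University of Technology | |
许欣 (Xu Xin) | Dongbei University of Finance and Economics | |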
Abstract |
To address fluctuating demand, uncertain road conditions, and real-time requirements in O2O instant delivery scheduling, this study proposes a hierarchical deep reinforcement learning method. The upper-layer agent continually learns from dynamically changing historical orders and road conditions to assign orders to riders; the lower-layer agent optimizes each rider's route after orders are batched. A global reward function passes the global optimization signal vertically between the two layers and coordinates the long-term average objective horizontally across multiple rolling scheduling intervals. Multi-scenario real-time scheduling experiments were run on a simulation platform using real and simulated orders from a food delivery platform in Dalian. The results verify that the method better accounts for long-term objectives in rolling scheduling and that hierarchical solving is efficient, offering instant delivery services a scheduling solution that balances cost-effectiveness and service quality. |
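The two-layer structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no details of the networks, state encoding, or reward terms, so both learned agents are replaced here by simple hand-written heuristics, and the global reward is assumed to be the negative total travel distance per scheduling interval. All names (`Order`, `assign_orders`, `route_rider`, `run_rolling_schedule`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    oid: int
    pickup: tuple   # (x, y) of the merchant
    dropoff: tuple  # (x, y) of the customer


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def assign_orders(orders, rider_pos):
    """Upper level (placeholder heuristic, not a trained agent): give each
    order to the rider currently closest to its pickup point."""
    plan = {r: [] for r in rider_pos}
    for o in orders:
        r = min(rider_pos, key=lambda k: dist(rider_pos[k], o.pickup))
        plan[r].append(o)
    return plan


def route_rider(start, orders):
    """Lower level (placeholder heuristic): nearest-neighbour routing in
    which a dropoff becomes available only after its pickup is visited."""
    pos, length, route = start, 0.0, []
    open_stops = {("P", o.oid): o.pickup for o in orders}
    dropoffs = {o.oid: o.dropoff for o in orders}
    while open_stops:
        key = min(open_stops, key=lambda k: dist(pos, open_stops[k]))
        nxt = open_stops.pop(key)
        length += dist(pos, nxt)
        pos = nxt
        route.append(key)
        if key[0] == "P":
            open_stops[("D", key[1])] = dropoffs[key[1]]
    return route, length, pos


def run_rolling_schedule(batches, rider_pos):
    """Handle one batch of orders per scheduling interval. Returns the
    per-interval global rewards (assumed: negative total distance), the
    single signal that both levels would share during training."""
    rider_pos = dict(rider_pos)
    rewards = []
    for orders in batches:
        plan = assign_orders(orders, rider_pos)
        total = 0.0
        for r, assigned in plan.items():
            _, length, end = route_rider(rider_pos[r], assigned)
            rider_pos[r] = end  # riders carry their position into the next interval
            total += length
        rewards.append(-total)
    return rewards
```

Averaging the returned rewards over the intervals, for example `sum(rewards) / len(rewards)`, gives a long-term average objective of the kind the abstract mentions. In a learned version, the two heuristics would be replaced by policies trained on that shared signal.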