Top-m factor screening and application: reinforcement learning and sequential bifurcation combined
Received: 2024-08-16; Revised: 2025-02-15
Keywords: sequential bifurcation; multi-armed bandit; reinforcement learning; digital twin; screening; simulation experiments
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Author | Affiliation | Postal code
谢翔 | Business School, Central South University | 410083
施文 | Business School, Central South University |
李青仪* | Graduate School at Shenzhen, Tsinghua University | 518055
|
Abstract
As one of the mainstream simulation factor screening methods, sequential bifurcation (SB) has the drawback of relying on the analyst's prior knowledge of the importance threshold parameter. To address this problem, this study proposes an SB-based method, abbreviated TopmSB, which combines SB with the stochastic multi-armed bandit (MAB) framework from reinforcement learning to identify the top-m significant factor effects. In each iterative stage of TopmSB, two budget allocation strategies, uniform allocation (UA) and adaptive allocation (AA), are introduced to identify the optimal group for bifurcation. Monte Carlo simulation experiments show that TopmSB outperforms SB in both experimental efficiency and statistical power, and that the AA strategy is more effective than the UA strategy. A case study based on a simulation of the product recall process finds that decision factors related to government authorities significantly influence recall time, demonstrating the promising real-world applicability of TopmSB.
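To make the per-stage idea concrete, the following is a minimal sketch (not the authors' implementation) of how a fixed replication budget might be split across candidate factor groups under the two strategies named in the abstract: UA spreads replications evenly, while the AA variant shown here uses a UCB-style rule as a stand-in for the paper's adaptive rule. The function simulate_group_effect, its toy effect values, and the selection criterion are illustrative assumptions.

```python
import numpy as np

def simulate_group_effect(group_id, rng):
    """Hypothetical stand-in for one noisy simulation replication that
    estimates the aggregated effect of a factor group (toy values, not
    the paper's simulation model)."""
    true_effects = {0: 0.2, 1: 1.5, 2: 0.4, 3: 0.9}  # assumed toy effects
    return true_effects[group_id] + rng.normal(scale=1.0)

def allocate_and_select(group_ids, budget, strategy="AA", seed=0):
    """Spend a fixed replication budget across candidate groups and return
    the group with the largest estimated effect (the one to bifurcate next).

    strategy="UA": uniform allocation, equal replications per group.
    strategy="AA": adaptive allocation, sketched here as a UCB-style rule
    that favours groups with large sample means or high uncertainty
    (an assumption, not necessarily the paper's AA rule).
    """
    rng = np.random.default_rng(seed)
    counts = {g: 0 for g in group_ids}
    sums = {g: 0.0 for g in group_ids}

    def mean(g):
        return sums[g] / counts[g] if counts[g] else 0.0

    if strategy == "UA":
        # Round-robin: every group gets the same number of replications.
        for t in range(budget):
            g = group_ids[t % len(group_ids)]
            sums[g] += simulate_group_effect(g, rng)
            counts[g] += 1
    else:
        # One replication per group to initialise, then spend the rest adaptively.
        for g in group_ids:
            sums[g] += simulate_group_effect(g, rng)
            counts[g] += 1
        for _ in range(budget - len(group_ids)):
            total = sum(counts.values())
            ucb = {g: mean(g) + np.sqrt(2 * np.log(total) / counts[g])
                   for g in group_ids}
            g = max(ucb, key=ucb.get)
            sums[g] += simulate_group_effect(g, rng)
            counts[g] += 1

    return max(group_ids, key=mean)

# Example: with 4 candidate groups and a budget of 40 replications, both
# strategies should usually point to group 1, which has the largest toy effect.
print(allocate_and_select([0, 1, 2, 3], budget=40, strategy="UA"))
print(allocate_and_select([0, 1, 2, 3], budget=40, strategy="AA"))
```

In both cases the group with the largest estimated effect is selected for bifurcation; the adaptive variant concentrates later replications on the groups that currently look most important, which is the intuition behind the higher effectiveness reported for AA over UA in the abstract.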