Deep Reinforcement Learning Model in Heuristic Coaching Scenario (original Chinese title: 引导式教学场景下深度强化学习的模型研究) |
Submitted: 2018-04-10 Revised: 2018-12-31 |
Keywords: recommendation algorithm; deep reinforcement learning; stretch zone; complex networks |
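Funding: Guangdong Province Special Fund Project for Applied Science and Technology Research and Development (2016B010124008); National Natural Science Foundation of China (71771104); Guangzhou Major Industrial Special Project (201802010034) |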
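Author | Affiliation | Postal code |
Tang Yin (汤胤) | School of Management, Jinan University | 510632 |
Wang Wen (王雯) | School of Management, Jinan University | |
Huang Shuqiang (黄书强) | College of Science and Engineering, Jinan University | |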
|
Abstract |
Heuristic coaching, a typical guiding scenario, requires "forward recommendation". At the same time, because the answering-state input spans a very large state space, an agent built with traditional methods struggles to capture a user's implicit cognitive style and multiple-intelligence traits. In real-world applications the correct-answer rate is therefore unstable, which makes step-by-step learning progress hard to achieve. Focusing on such guiding scenarios and drawing on the concept of the learning zone from psychology, this paper constructs a network graph of the question bank and partitions it with a cut set determined by the behavior of each specific learner. On this basis it proposes a deep reinforcement learning model for heuristic coaching scenarios that, under the control of a recommendation deviation index, recommends the content best suited to the learner. A comparative experiment demonstrates that, compared with the control group, the model gives reasonable "forward recommendations" and effectively addresses the instability of the learner's correct-answer rate. The model imitates the way an experienced teacher decides which questions to set, extracts useful implicit information from historical answering data, and recommends the best exercises for the learner. It can also be applied widely in similar guiding scenarios. |
|
|
|
|
|