| Financial Fraud Detection Method Based on Clique Embedding and Adaptive Nearest Neighbor Optimization |
| Submitted: 2025-01-20 Revised: 2025-11-19 |
| Keywords: Financial Fraud Detection; Dynamic Network Embedding; Deep Autoencoder; Adaptive Potential Nearest Neighbor; Voting Mechanism |
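| Funding: National Natural Science Foundation of China (No. 61772196); Natural Science Foundation of Hunan Province (No. 2020JJ4249); Key Project of the Hunan Provincial Social Science Achievement Review Committee (No. XSP19ZD1005); Key Scientific Research Project of the Hunan Provincial Department of Education (21A0374); Hunan University of Technology and Business 2024 University-Level Postgraduate Research and Innovation Project (CX2024YB008) |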
| Submission topic: anomaly detection, financial fraud detection |
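| Author | Affiliation | Postal code |
| 蒋伟进 | Hunan University of Technology and Business | 410000 |
| 刘茜* | Hunan University of Technology and Business | 410000 |
| 聂彩燕 | Hunan University of Technology and Business | |
| 杨璇 | Hunan University of Technology and Business | |
| 杜熙晨 | Hunan University of Technology and Business | |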
|
| Abstract |
| As the volume of financial transactions surges, financial fraud detection faces challenges such as imbalanced data distributions, excessive noise, and sparse connections among fraud classes. To improve the accuracy and generalization of fraud detection, this paper proposes a method based on clique embedding and adaptive nearest neighbor optimization. First, to alleviate the high dimensionality and class imbalance of the dataset, the paper introduces a jointly optimized data preprocessing strategy: an improved random under-sampling technique with class-sensitivity weights adjusts the ratio of benign to fraudulent samples, and an efficient network representation learning algorithm based on deep neural network embedding and reservoir sampling encodes evolving network objects quickly and accurately and learns low-dimensional representations of the data. Second, on the expanded, balanced dataset, the paper proposes a random forest classification framework that integrates an adaptive nearest neighbor algorithm. Compared with a traditional random forest, this framework adaptively adjusts the splitting criteria of its decision trees, dynamically selecting the best features and split points according to the local characteristics of the data. Finally, to account for the information lost through out-of-bag samples, the paper proposes a new voting mechanism based on potential nearest neighbors to replace traditional majority voting. Experimental results show that the method achieves high classification accuracy and low variance, which supports the effectiveness of the proposed method and model. |
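To make the rebalancing step in the abstract concrete, the following is a minimal sketch of plain random under-sampling of the benign class down to a target benign-to-fraud ratio. It does not reproduce the paper's improved, class-sensitivity-weighted variant, whose details the abstract does not give. The function name `undersample_majority` and the parameters `ratio` and `fraud_label` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def undersample_majority(
    samples: Sequence,
    labels: Sequence[int],
    ratio: float = 1.0,
    fraud_label: int = 1,
    seed: Optional[int] = None,
) -> Tuple[List, List[int]]:
    """Randomly drop benign samples until benign:fraud is at most `ratio`.

    Assumption: plain uniform under-sampling, not the paper's weighted variant.
    """
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == fraud_label]
    benign_idx = [i for i, y in enumerate(labels) if y != fraud_label]

    # Number of benign samples to keep, capped by how many exist.
    n_keep = min(len(benign_idx), int(round(ratio * len(fraud_idx))))
    kept_benign = rng.sample(benign_idx, n_keep)

    # Preserve the original ordering of the retained samples.
    keep = sorted(fraud_idx + kept_benign)
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

For example, with 5 fraudulent and 95 benign samples and `ratio=2.0`, the result keeps all 5 fraudulent samples and 10 randomly chosen benign ones.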
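The abstract names reservoir sampling as one ingredient of the network representation learning algorithm. As a rough illustration of that general technique only, here is the classic Algorithm R, which keeps a uniform random sample of `k` items from a stream whose length is not known in advance. How the paper combines it with the deep neural network embedding is not stated in the abstract, so nothing here should be read as the authors' implementation; the name `reservoir_sample` is an assumption.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

The sample uses O(k) memory regardless of stream length, which is why the technique suits evolving networks where edges or nodes arrive continuously. If the stream holds fewer than `k` items, all of them are returned.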