Page 42 - CCFA Journal - Seventh Issue
机器学习 Machine Learning 加中金融
For data generation, three candidate sets of data can be generated: A, B and C. Set A includes 5 million options, each involving
1 million simulation paths; Set B includes 50 million options, each involving 100 thousand simulation paths; Set C includes
500 million options, each involving 10 thousand simulation paths. Which set is best for training the neural network will be
tested later.
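
The arithmetic behind the three sets is worth spelling out: each one spends the same total number of simulation paths, so the choice is between fewer options with cleaner values and more options with noisier values. The sketch below uses only the figures quoted above. The noise estimate assumes each option value is a plain Monte Carlo average, whose standard error shrinks like one over the square root of the path count; the names `datasets`, `total_paths` and `relative_mc_noise` are illustrative and not from the article.

```python
import math

# Option counts and paths per option, as quoted in the text.
datasets = {
    "A": {"options": 5_000_000, "paths_per_option": 1_000_000},
    "B": {"options": 50_000_000, "paths_per_option": 100_000},
    "C": {"options": 500_000_000, "paths_per_option": 10_000},
}

def total_paths(spec):
    """Total simulation paths needed to build one data set."""
    return spec["options"] * spec["paths_per_option"]

def relative_mc_noise(paths):
    """Assumption: a plain Monte Carlo average, whose standard error
    scales like 1 / sqrt(number of paths)."""
    return 1.0 / math.sqrt(paths)

budgets = {name: total_paths(spec) for name, spec in datasets.items()}
noise = {name: relative_mc_noise(spec["paths_per_option"])
         for name, spec in datasets.items()}

for name in datasets:
    print(f"Set {name}: {budgets[name]:.1e} paths in total, "
          f"relative label noise ~ {noise[name]:.4f}")
```

Under that assumption, Set C offers 100 times as many training samples as Set A, but each sample is roughly 10 times noisier.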
For hyperparameter tuning, loss curves can be used to determine which set of hyperparameters is optimal. In the graph, the y-axis
represents the loss (i.e. the value of the loss function, which measures the difference between the predicted value and the actual
value) and the x-axis represents the training epoch (i.e. how many times the full data set has passed through the neural network).
Each set of hyperparameters comes with a test error curve and a training error curve. Usually the curves converge, and a lower
loss curve indicates a better set. Early stopping can be applied when the test error curve stops decreasing and starts to turn back
up. It ensures that the neural network does not train further and so avoids the overfitting problem.
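
The early stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the article: the `patience` and `min_delta` parameters are a common convention assumed here, and the loss values are made up to show a curve that falls and then turns back up.

```python
def early_stopping_epoch(test_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices into test_losses.

    Training stops once the test loss has failed to improve on its best
    value by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(test_losses):
        if loss < best_loss - min_delta:
            # New best test loss: remember it and keep training.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # The curve has turned: stop and fall back to the best epoch.
            return epoch, best_epoch
    # The curve never turned within the recorded epochs.
    return len(test_losses) - 1, best_epoch

# Hypothetical test-loss curve, for illustration only.
test_losses = [1.00, 0.60, 0.40, 0.31, 0.28, 0.29, 0.30, 0.33, 0.37]
stop_epoch, best_epoch = early_stopping_epoch(test_losses, patience=3)
print(f"stop at epoch {stop_epoch}, keep the weights from epoch {best_epoch}")
```

In practice the weights saved at `best_epoch` are the ones kept, since the later epochs only fit the training data more closely while the test error gets worse.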
Lastly, for performance measurement, it is a good idea to look at the distribution of the loss terms using a histogram. The best
result is a symmetric distribution with most loss terms concentrated around zero. This can be achieved by training on the right
data with well-tuned hyperparameters.
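
A minimal sketch of this check follows. It assumes the "loss terms" are signed errors (predicted value minus actual value), since only signed quantities can be symmetric around zero. The error values are invented for illustration, and the helper names `error_histogram` and `skewness` are not from the article.

```python
import math
import statistics

def error_histogram(errors, bin_width):
    """Count errors per bin. Bin k covers [(k - 0.5) * w, (k + 0.5) * w),
    so bin 0 is centred on zero."""
    counts = {}
    for e in errors:
        k = math.floor(e / bin_width + 0.5)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))

def skewness(errors):
    """Population skewness: close to 0 for a symmetric sample."""
    mean = statistics.fmean(errors)
    sd = statistics.pstdev(errors)
    return sum(((e - mean) / sd) ** 3 for e in errors) / len(errors)

# Hypothetical signed errors (predicted minus actual), for illustration only.
errors = [-0.02, -0.01, -0.01, 0.0, 0.0, 0.0, 0.01, 0.01, 0.02]
bin_width = 0.01
hist = error_histogram(errors, bin_width)

# Text histogram: one row per bin, labelled by the bin centre.
for k, count in hist.items():
    print(f"{k * bin_width:+.2f} {'#' * count}")
print("mean error:", statistics.fmean(errors))
print("skewness:", skewness(errors))
```

A mean far from zero would point to a systematic bias in the predictions, while a skewness far from zero would point to a lopsided error distribution.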
在数据生成方面,可以生成甲、乙、丙三组数据。甲组包含五百万个期权价值,每个期权价值涉及一百万条模拟路径;
乙组包含五千万个选项,每个选项涉及十万条模拟路径; 丙组包含五亿个选项,每个选项涉及一万条模拟路径。最后,
我们会测试哪组数据最适合神经网络作训练。
对于超参数调整,损失曲线可用于确定哪一组超参数是最好的。在损失曲线图中,y 轴表示损失(即损失函数的值,预测
值与实际值之间的差异),x 轴表示训练期(即每筆数据经过神经网络的次数)。在图中,每组超参数都带有一条测试误
差曲线和一条训练误差曲线。通常,曲线会收敛,较低的损耗曲线表示那组超参数更好。当测试误差曲线停止收敛并开始
回升时,可以应用提前停止方法,有助于确保神经网络不会进一步去训练并避免过度拟合的问题。
最后,在神经网络性能测量方面,使用直方图查看损失项的分布是一个好方法。 最好的结果是对称分布,其中大多数损
失项以零为中心。 这可以通过使用正确数据和最佳超参数调整的训练来实现。
The Chinese translation of this article was provided by CCFA volunteer Yue Wu.
Application of Machine Learning in Fraud Detection and Anti-Money Laundering
机器学习在欺诈侦察与反洗钱中的应用
Fei Ye, Senior Manager, BMO Model Validation, 蒙特利尔银行模型检验高级经理
Disclaimer
The opinions expressed in this article are those of the author. They do not purport to reflect the
opinions or views of the author's employer.
Artificial intelligence (AI) and machine learning (ML) are buzzwords of our day. AI is a branch of
computer science that attempts to simulate human intelligence in machines so that they can
perform tasks as humans do. ML is one approach to achieving AI: it is the study of computer
algorithms that “can automatically detect patterns in data, and then use the uncovered patterns to
predict future data, or to perform other kinds of decision making under uncertainty” [1]. Thanks to advances in computing and
the abundance of data, ML has developed tremendously over the past decades and found successful applications in many
industries. It is an indispensable tool for the financial industry, especially in fraud detection and anti-money laundering (AML).
Fraud and money laundering are two criminal activities that financial institutions (FIs) frequently encounter in daily operations.
Fraud is deliberate and wrongful deception intended to achieve financial or personal gain (e.g., credit card fraud, check fraud) [2].
Money laundering is “the practice of integrating proceeds of crime into the legitimate mainstream of the financial community” by
hiding the funds’ illicit origins [3]. Fraud and money laundering activities are carefully planned, concealed, and evolve over time.
They are rare events compared to legitimate activities. An FI’s failure to mitigate them may cause financial losses, reputational
damage, security breaches, or regulatory penalties.
人工智能和机器学习是当下热门词汇。人工智能是计算机科学的一个分支,它试图在机器中模拟人类智能,以便它们可
以像人类一样执行任务。机器学习是实现人工智能的一种方法——它是对计算机算法的研究,“可以自动检测数据中的模
式,然后使用发现的模式来预测未来的数据,或者在不确定的情况下执行其他类型的决策”[1]。由于计算技术的进步和
数据的丰富,机器学习在过去的几十年中取得了巨大的发展,并成功应用于许多行业领域中。它是金融行业不可或缺的工
具,尤其是在欺诈检测和反洗钱 (AML) 领域。
欺诈和洗钱是金融机构 (FI) 在日常运营中经常遇到的两种犯罪活动。欺诈是旨在实现经济或个人利益的故意和错误的欺骗
(例如,信用卡欺诈、支票欺诈)[2]。洗钱是“通过隐藏资金的非法来源,将犯罪所得转入合法传统金融领域的做法”
[3]。欺诈和洗钱活动是经过缜密思考的、隐蔽的和随着时间不断演变进化的。与合法活动相比,它们是罕见的事件。 金
融机构如未能防止欺诈和洗钱可能会导致财务损失、声誉受损、安全漏洞或监管处罚。
CCFA JOURNAL OF FINANCE May 2022