
Scalable and Imbalance-Resistant Machine Learning Models for Anti-money Laundering: A Two-Layered Approach

Conference paper in: Enterprise Applications, Markets and Services in the Finance Industry (FinanceCom 2020)

Abstract

In this paper, we address the problem of detecting potentially illicit behavior in the context of Anti-Money Laundering (AML). We specifically address two requirements that arise when training machine learning models for AML: scalability and imbalance-resistance. By scalability we mean the ability to train the models on very large transaction datasets. By imbalance-resistance we mean the ability of the model to achieve suitable accuracy despite high class imbalance, i.e. the low number of instances of potentially illicit behavior relative to the number of instances of non-illicit behavior. We propose a two-layered modelling concept. The first layer consists of a Logistic Regression model with simple features, which can be computed with low overhead. These features capture customer profiles as well as global aggregates of transaction histories. This layer filters out a proportion of customers whose activity patterns can be deemed non-illicit with high confidence. In the second layer, a gradient boosting model with complex features is used to classify the remaining customers. We argue that this two-layered approach fulfills the stated requirements. Firstly, feature extraction is more scalable because the more computationally demanding features of the second layer do not need to be extracted for every customer. Secondly, the first layer acts as an undersampling method for the second layer, thus partially addressing the class imbalance. We validate the approach using a real dataset of customer profiles and transaction histories, together with labels provided by AML experts.
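As an illustration of this two-layered concept, the following Python sketch shows one possible way to wire a cheap first-layer logistic regression filter to a CatBoost second layer. It is a minimal sketch, not the authors' pipeline: the feature matrices, the 0.05 filtering threshold, and the helper names are hypothetical.

# Hypothetical sketch of the two-layered cascade described in the abstract.
# Assumes scikit-learn and catboost; inputs are numpy arrays, and the
# threshold value and variable names are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from catboost import CatBoostClassifier

def train_two_layer(X_simple, X_complex, y, threshold=0.05):
    """Layer 1: cheap logistic regression on simple features filters out
    customers deemed non-illicit with high confidence.
    Layer 2: CatBoost on complex features classifies the remaining customers."""
    layer1 = LogisticRegression(max_iter=1000, class_weight="balanced")
    layer1.fit(X_simple, y)

    # Keep only customers whose estimated probability of being potentially
    # illicit exceeds the threshold; the rest are filtered out as non-illicit.
    p_illicit = layer1.predict_proba(X_simple)[:, 1]
    keep = p_illicit >= threshold

    layer2 = CatBoostClassifier(iterations=500, verbose=False)
    layer2.fit(X_complex[keep], y[keep])
    return layer1, layer2, threshold

def predict_two_layer(layer1, layer2, threshold, X_simple, X_complex):
    p_illicit = layer1.predict_proba(X_simple)[:, 1]
    keep = p_illicit >= threshold
    y_pred = np.zeros(len(X_simple), dtype=int)  # default: non-illicit
    if keep.any():
        y_pred[keep] = layer2.predict(X_complex[keep]).astype(int).ravel()
    return y_pred

Note that, in this sketch, the second-layer (complex) features are assumed to be available for all customers; in practice the point of the cascade is that they would only be extracted for the customers retained by the first layer.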


Notes

  1. We also conducted experiments using another implementation of extreme gradient boosting (XGBoost), but these experiments consistently led to lower accuracy. In the evaluation reported below, we only include results obtained using CatBoost.

References

  1. Tsui, E., Gao, S., Xu, D., Wang, H., Green, P.: Knowledge-based anti-money laundering: a software agent bank application. J. Knowl. Manage. (2009)

  2. Breslow, S., Hagstroem, M., Mikkelsen, D., Robu, K.: The new frontier in anti-money laundering. McKinsey Insights, November 2017. https://www.mckinsey.com/business-functions/risk/our-insights/the-new-frontier-in-anti-money-laundering

  3. Kotsiantis, S., Koumanakos, E., Tzelepis, D., Tampakas, V.: Forecasting fraudulent financial statements using data mining. Int. J. Comput. Intell. 3(2), 104–110 (2006)

  4. Jayasree, V., Siva Balan, R.V.: Money laundering regulatory risk evaluation using bitmap index-based decision tree. J. Assoc. Arab Univ. Basic Appl. Sci. 23(1), 96–102 (2017)

  5. Nielsen, D.: Tree boosting with XGBoost - why does XGBoost win “every” machine learning competition? Master’s Thesis, NTNU (2016)

  6. Palshikar, G.K., Apte, M.: Financial security against money laundering: a survey. In: Emerging Trends in ICT Security, pp. 577–590. Morgan Kaufmann (2014)

  7. Senator, T.E., et al.: Financial crimes enforcement network AI system (FAIS) identifying potential money laundering from reports of large cash transactions. AI Mag. 16(4), 21 (1995)

  8. Chen, Z., Teoh, E.N., Nazir, A., Karuppiah, E.K., Lam, K.S.: Machine learning techniques for anti-money laundering (AML) solutions in potentially suspicious transaction detection: a review. Knowl. Inf. Syst. 57(2), 245–285 (2018)

  9. Helmy, T.H., Zaki, M., Salah, T., Badran, K.: Design of a monitor for detecting money laundering and terrorist financing. J. Theoret. Appl. Inf. Technol. 85(3), 425 (2016)

  10. Chen, Y.T., Mathe, J.: Fuzzy computing applications for anti-money laundering and distributed storage system load monitoring (2011)

  11. Cortinas, R., et al.: Secure failure detection and consensus in TrustedPals. IEEE Trans. Dependable Secure Comput. 9(4), 610–625 (2012)

  12. Phua, C., Smith-Miles, K., Lee, V., Gayler, R.: Resilient identity crime detection. IEEE Trans. Knowl. Data Eng. 24(3), 533–546 (2010)

  13. Liou, F.M.: Fraudulent financial reporting detection and business failure prediction models: a comparison. Manage. Audit. J. (2008)

  14. tej.com.tw

  15. Lopez-Rojas, E.A., Axelsson, S.: Money laundering detection using synthetic data. In: The 27th Annual Workshop of the Swedish Artificial Intelligence Society (SAIS), Örebro, Sweden, 14–15 May 2012, no. 071, pp. 33–40. Linköping University Electronic Press, May 2012

  16. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems, pp. 6638–6648 (2018)

  17. Leontjeva, A., Goldszmidt, M., Xie, Y., Yu, F., Abadi, M.: Early security classification of Skype users via machine learning. In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pp. 35–44. ACM, November 2013

  18. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  19. Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 3, 408–421 (1972)

  20. Kursa, M.B., Jankowski, A., Rudnicki, W.R.: Boruta - a system for feature selection. Fundamenta Informaticae 101(4), 271–285 (2010)


Acknowledgements

This research was partly funded by the European Regional Development Funds via Archimedes Foundation (NUTIKAS programme).

Author information


Corresponding author

Correspondence to Pavlo Tertychnyi.


A Appendix

Sequence-Based Features Calculation. We assume that the sequences of a customer's transactions are not random but follow some hidden structure. We therefore encode this information for the model via so-called generative log-odds features [17]: we estimate transition probabilities between transaction states separately for potentially illicit and non-illicit customers and then compare them. This approach allows us to capture the dynamics of the transaction history for our classification task while introducing less overhead than methods based on neural networks (e.g. Boltzmann machines) or deep-learning auto-encoders. In the log-odds feature extraction method, we want to generate features based on sequential probabilities. We are interested in the following probability:

$$\begin{aligned} P(X)= P(x_1,x_2,\ldots ,x_n ) \end{aligned}$$
(1)

where \(x_1,x_2,\ldots ,x_n\) are some discrete properties of transactions (e.g. the transaction direction). One particular way to estimate this probability is to use the chain rule:

$$\begin{aligned} P(X)=P(x_1,\ldots ,x_n)=p(x_1)p(x_2 \mid x_1)\ldots p(x_n \mid x_1,\ldots ,x_{n-1}) \end{aligned}$$
(2)

Computing this full factorization is often practically infeasible, so we simplify it using the Markov property:

$$\begin{aligned} P(X_n=x_n \mid X_{n-1}=x_{n-1},\ldots ,X_0=x_0)=P(X_n=x_n \mid X_{n-1}=x_{n-1}) \end{aligned}$$
(3)
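
For instance, for a sequence of transaction directions (in, out, out), the Markov assumption reduces the chain rule to

$$\begin{aligned} P(in,out,out)\approx P(in)\,P(out \mid in)\,P(out \mid out) \end{aligned}$$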

For our task, however, we are more interested in determining whether a particular sequence of transactions is more likely to be potentially illicit than non-illicit. Mathematically, we want to estimate:

$$\begin{aligned} \mathop {{{\,\mathrm{argmax}\,}}}\limits _{y\,\in \,\{potentially\,illicit,\,non-illicit\}} P(Y=y \mid X) \end{aligned}$$
(4)

One way to calculate this probability is to use Bayes' theorem:

$$\begin{aligned} \mathop {{{\,\mathrm{argmax}\,}}}\limits _y P(Y=y \mid X) = \mathop {{{\,\mathrm{argmax}\,}}}\limits _y P(X \mid Y=y)P(Y=y) \end{aligned}$$
(5)

It remains to estimate \( P(X \mid Y=y)\) and \( P(Y=y)\). \(P(X \mid Y=y)\) is estimated from the training set by computing the transition probabilities separately for the potentially illicit class and the non-illicit class. For example, if there are only two states in a transaction sequence, namely in and out, then estimating a transition probability amounts to calculating

$$\begin{aligned} P(in \mid out) = \frac{(count(out \rightarrow in))}{(count(out))} \end{aligned}$$
(6)

The same is done for the other combinations of in and out. \( P(Y=y)\) is the prior probability of being potentially illicit, which is simply the proportion of potentially illicit customers in the full training set. Finally, instead of outputting a binary label 1/0 (potentially illicit sequence or not), we plug this in as a feature of the classifier along with the other features. Rather than a binary feature, we use the so-called log-odds ratio, defined as:

$$\begin{aligned} \log \frac{P(Y=potentially\,illicit \mid X)}{P(Y=non-illicit \mid X)} \end{aligned}$$
(7)
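
The following Python sketch illustrates how such a log-odds feature could be computed from Eqs. (1)-(7). It is an illustration rather than the authors' implementation: sequences are assumed to be lists of discrete states such as "in"/"out" (following the example above), the initial-state term is dropped, and the add-one smoothing of transition counts is our own assumption to avoid zero probabilities.

# Illustrative sketch of the generative log-odds feature; assumptions as
# stated in the lead-in (state alphabet, smoothing, dropped initial term).
from collections import defaultdict
from math import log

def fit_transitions(sequences, states=("in", "out"), alpha=1.0):
    """Estimate first-order transition probabilities P(next | prev),
    as in Eq. (6), with add-one (alpha) smoothing."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1.0
            totals[prev] += 1.0
    return {(p, n): (counts[(p, n)] + alpha) / (totals[p] + alpha * len(states))
            for p in states for n in states}

def sequence_log_prob(seq, trans):
    """Markov approximation of log P(X), Eq. (3), up to the initial-state term."""
    return sum(log(trans[(prev, nxt)]) for prev, nxt in zip(seq, seq[1:]))

def log_odds_feature(seq, trans_illicit, trans_licit, prior_illicit):
    """log-odds ratio of Eq. (7), expanded via Bayes' theorem as in Eq. (5)."""
    return (sequence_log_prob(seq, trans_illicit) + log(prior_illicit)
            - sequence_log_prob(seq, trans_licit) - log(1.0 - prior_illicit))

# Usage: fit one transition table per class on the training sequences, then
# feed the resulting log-odds value to the classifier as an extra feature.
# trans_illicit = fit_transitions(illicit_seqs)
# trans_licit = fit_transitions(licit_seqs)
# feature = log_odds_feature(customer_seq, trans_illicit, trans_licit, prior)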


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tertychnyi, P., Slobozhan, I., Ollikainen, M., Dumas, M. (2020). Scalable and Imbalance-Resistant Machine Learning Models for Anti-money Laundering: A Two-Layered Approach. In: Clapham, B., Koch, JA. (eds) Enterprise Applications, Markets and Services in the Finance Industry. FinanceCom 2020. Lecture Notes in Business Information Processing, vol 401. Springer, Cham. https://doi.org/10.1007/978-3-030-64466-6_3


  • DOI: https://doi.org/10.1007/978-3-030-64466-6_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64465-9

  • Online ISBN: 978-3-030-64466-6

  • eBook Packages: Computer Science (R0)
