A new collaboration to defeat financial crime

Collaboration Bleckwen x LIP6

Introduction

LIP6 and Bleckwen have decided to join forces to develop an innovative approach to better fight financial crime. Matthieu Latapy (LIP6) and Leonardo Noleto (Bleckwen) present this new collaboration here.

 

1. Can you introduce yourself? 

Leonardo Noleto (LN): Hello, I’m Head of Data Science at Bleckwen, a software vendor specialized in fraud and financial crime detection. Bleckwen has developed a real-time detection engine, combining explainable AI, behavioural analytics, rules and human-in-the-loop feedback to deliver unparalleled performance.

Before joining Bleckwen, I worked in the field of anomaly detection and set up a tailored fraud detection system for one of the leading European cloud suppliers. I am convinced that the use of Machine Learning methods, combined with the professional expertise of analysts, is a game-changer when it comes to combating fraud.

 


Matthieu Latapy (ML): I am a CNRS senior researcher at LIP6, a computer science laboratory. My role is to push scientific knowledge forward to a state-of-the-art level, whilst managing the transfer of this knowledge towards real business applications. I am in contact with several companies in a two-way relationship: our scientific knowledge can help industrial use cases, and applications raise questions that can feed our scientific research. Within the LIP6 laboratory, we created the Complex Networks team in 2008, dedicated to the study of real-world relational data (such as mobility, traffic, phone call and transaction traces) using graph theory.

 

2. Detecting fraud in an open and digital world is challenging: how can science help?

LN: From a data science perspective, fraud is a rare event hidden in a large amount of data. To detect fraudulent activities, Bleckwen analyses all the data related to a transaction using different techniques: ML models and behavioural analytics. We also use graph theory to study the structure of relations and identify the links between entities across transactions. Thanks to graph analytics, we can analyse multiple data sets across multiple transactions and aggregate information not only at the account or user level, but also along new variables such as destination, IP address, geographical location, etc. We create links between entities to get a bigger picture of payment activities and detect suspicious behaviour, as the sketch below illustrates.
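As a rough illustration of the idea (not Bleckwen's actual engine), here is a minimal Java sketch that links accounts sharing an attribute, such as an IP address, across transactions; all names and data are made up:

```java
import java.util.*;

// Illustrative sketch: linking accounts that share an attribute
// (here an IP address) observed across transactions.
public class EntityLinks {
    public static void main(String[] args) {
        // (account, ip) pairs extracted from transactions; values are made up
        String[][] transactions = {
                {"acc-1", "10.0.0.5"}, {"acc-2", "10.0.0.5"}, {"acc-3", "10.0.0.9"}};

        // Group accounts by shared IP
        Map<String, Set<String>> accountsByIp = new HashMap<>();
        for (String[] tx : transactions) {
            accountsByIp.computeIfAbsent(tx[1], ip -> new HashSet<>()).add(tx[0]);
        }

        // Any IP used by more than one account creates links worth inspecting
        accountsByIp.forEach((ip, accounts) -> {
            if (accounts.size() > 1) {
                System.out.println("accounts linked via " + ip + ": " + accounts);
            }
        });
    }
}
```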

Graph theory is useful, but it has limits: the traditional approach poorly captures the combined temporal and structural nature of interactions. Fraud is a dynamic phenomenon and fraud patterns never stop evolving, so traditional algorithms cannot keep pace.

Our challenge was to use graph theory in a dynamic way, and that is why we turned to LIP6, who have developed a technique for studying link streams that takes the temporal structure of interactions into account.

 

[Figure: graphs and link streams]

 

ML: Indeed, transaction data are in fact sequences of links over time. So far, we could use two distinct approaches to analyse such data:

  • graph theory, to study data as networks. In this case, as Leonardo mentioned, we miss the time dimension in our analysis.
  • temporal analysis, for a dynamic study of data over time but without taking the interactions between entities into consideration.

To properly analyse link sequences, we have developed a new dedicated formalism to cope with interactions over time, which we call stream graphs and link streams. It captures both the structural and the temporal information of interactions in a consistent way. This new approach is important for the research community as it merges two important scientific domains that have so far been quite independent. We strongly believe that the outcomes of our research on stream graphs and link streams will greatly benefit many industrial applications, such as fraud detection.
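To make the formalism concrete, here is a minimal Java sketch of a link stream as a sequence of timestamped links, with a time-windowed degree query; this is an illustration of the idea, not LIP6's reference implementation:

```java
import java.util.*;

// Minimal sketch of a link stream: a sequence of timestamped links (t, u, v).
// Names and data are illustrative.
public class LinkStream {
    record Link(long time, String u, String v) {}

    private final List<Link> links = new ArrayList<>();

    public void add(long time, String u, String v) {
        links.add(new Link(time, u, v));
    }

    /** Degree of a node restricted to a time window: distinct neighbours
     *  seen between 'from' and 'to'. A plain graph would lose this dimension. */
    public long degreeInWindow(String node, long from, long to) {
        return links.stream()
                .filter(l -> l.time() >= from && l.time() <= to)
                .filter(l -> l.u().equals(node) || l.v().equals(node))
                .map(l -> l.u().equals(node) ? l.v() : l.u())
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        LinkStream s = new LinkStream();
        s.add(100, "a", "b");
        s.add(200, "a", "c");
        s.add(900, "a", "b");
        System.out.println(s.degreeInWindow("a", 0, 300)); // 2
    }
}
```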

LN: Yes, thanks to this technique, we will be able to identify links between entities while taking time and velocity into consideration, uncovering signals that we may previously have missed. This technique gives us time as a new dimension for closer detection of the most sophisticated patterns used in fraud and in AML/CFT.

 

3. Why was this collaboration started?

ML: In 2017-2018, I published a reference article in the international journal Social Network Analysis and Mining (SNAM) entitled “Stream Graphs and Link Streams for the Modelling of Interactions over Time”. This paper presents the foundations of our approach. Bleckwen was one of the first readers of this paper, and they contacted me after its release to discuss it. I was excited to present this topic to a fintech specialized in fraud detection.

LN: Yes, I remember these passionate exchanges! We quickly realized that our ambitions were aligned. Bleckwen wants to improve the detection capabilities of its fraud solution by integrating the latest cutting-edge technologies. LIP6 wants to move science forward by applying its research to industrial use cases.

Our common objective is to lead the scientific and technological advances required to make significant progress in anomaly detection in financial transactions.

 

4. How do you collaborate?

ML: We have launched a joint laboratory, FiT (http://fit.complexnetworks.fr), supported by the French National Agency for Research (ANR). This long-term collaboration means that we are working hand in hand to model and analyse financial transactions as link streams. We are currently developing and implementing the formalisms and algorithms that enable the proper exploitation of this data. At LIP6, we are interested in applying our research to real financial data and benefiting from the fraud expertise of Bleckwen’s team (e.g. business understanding, knowledge of fraud patterns, data interpretation).

LN: On our side, we are looking for the high-level expertise of the LIP6 team, especially in the new approach to dynamic graph modelling they have invented. The complementarity with our internal research is obvious: we adapt their feature engineering to our ML models to bring a new time-based dimension to the analyses and reach greater detection accuracy, along the lines of the sketch below.
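As a hedged illustration of what such time-based feature engineering can look like (the field names and the two features are our own illustrative choices, not Bleckwen's actual feature set):

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch: turning timestamped transactions into
// time-based features for an ML model.
public class TimeFeatures {
    record Tx(long time, String account, String counterparty, double amount) {}

    /** Two simple time-based features for one account at time 'now':
     *  transaction velocity and counterparty variety over the last hour. */
    public static double[] features(List<Tx> history, String account, long now) {
        long hourAgo = now - 3_600_000L;
        List<Tx> recent = history.stream()
                .filter(t -> t.account().equals(account) && t.time() >= hourAgo)
                .collect(Collectors.toList());
        double txPerHour = recent.size();
        double distinctCounterparties = recent.stream()
                .map(Tx::counterparty).distinct().count();
        return new double[]{txPerHour, distinctCounterparties};
    }

    public static void main(String[] args) {
        List<Tx> history = List.of(
                new Tx(1_000, "acc-1", "acc-7", 50.0),
                new Tx(2_000, "acc-1", "acc-8", 75.0));
        System.out.println(Arrays.toString(features(history, "acc-1", 3_000)));
        // -> [2.0, 2.0]
    }
}
```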

This collaboration enhances our research into identifying sophisticated fraud patterns, especially in AML/CFT fraudulent activities.

 

5. What will be the benefits of this collaboration?

LN: Our customers will directly benefit from this collaboration through improved fraud detection quality, thanks to an accurate and robust scientific approach. The pace of fraud and money laundering continues to accelerate, along with the sophistication of attacks. Criminals are creative, collaborative, well-funded and technologically advanced.

As Bleckwen’s mission is to defeat financial crime and contribute to making the world a safer place, we need to collaborate!

ML: This collaboration is a great example of research-industry complementarity: a virtuous loop between fundamental research and real-life application with interesting issues to solve on both sides.

We expect our approach to become a game changer in the fight against fraud, which would greatly help to promote our scientific research: we are confident that our collaboration will have a strong impact on both science and financial crime detection!

 

Want to know more about our new research topic? Feel free to contact us.


Bleckwen contributing to the Open Source Community

Bleckwen ML algorithm optimization

Introduction

In this blog post, we want to share with you the story behind our recent publication on GitHub.

Bleckwen is an award-winning French behavioural analytics fintech, dedicated to helping banks and financial institutions defeat fraud and financial crime and make the world just a little bit safer. Our risk engine is built to score transactions using a combination of explainable machine learning models, rules and human-in-the-loop decisioning.

 

1. Fighting fraud in an open banking world

Detecting fraud in an open and digital world, where everyone wants an immediate experience, means analysing large amounts of data at very low latency to spot suspicious activities in real time.

Operating at the core of a bank's payment architecture comes with many technical constraints, and we need to ensure that we meet their exacting service-level requirements. As our clients work in highly regulated markets, black-box AI is not an option; being able to make explainable decisions in real time, around the clock, with no downtime, at scale, is therefore a big challenge.

To meet these requirements, we built our fraud detection engine, from data ingestion to decision, on the following building blocks (a minimal pipeline sketch follows the list):

  • An open source streaming middleware, Confluent Kafka, to process high-throughput, low-latency real-time data feeds
  • An open source processing framework, Apache Flink, to distribute the load across servers and scale threads on demand, with guarantees of high availability, fault tolerance and exactly-once processing integrity
  • A competition-winning open source machine learning algorithm, XGBoost, designed for distributed frameworks and able to handle high volumes of data (which, in fraud, is inherently highly unbalanced), with explainable AI
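As a minimal sketch of what such a pipeline looks like in Java (the Kafka topic name and the scoring function are illustrative placeholders, not our production code):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ScoringPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "fraud-scoring");

        // Consume raw transactions from a Kafka topic (topic name is illustrative)
        DataStream<String> transactions = env.addSource(
                new FlinkKafkaConsumer<>("transactions", new SimpleStringSchema(), props));

        // Score each transaction as it arrives
        transactions
                .map(ScoringPipeline::scoreTransaction)
                .print();

        env.execute("real-time-fraud-scoring");
    }

    // Placeholder: in the real engine this would invoke the ML model
    private static String scoreTransaction(String rawTransaction) {
        double score = 0.0; // model inference would go here
        return rawTransaction + " -> score=" + score;
    }
}
```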

2. Taking on a new challenge

The payment landscape is changing: new regulations are coming into force (PSD2, real-time payments, etc.), and customers and businesses are adopting new and rapidly changing behaviours driven by COVID and digitisation. Banks work continuously to manage evolving risks whilst also trying to reduce friction, in a market made ever more competitive by the emergence of neo and challenger banks. There is therefore a growing need for increased resilience, performance and control, whilst still containing costs.

In a current engagement, a global Tier 1 bank we are working with asked us to help them address a new challenge: how to guarantee a sustained throughput of thousands of transactions per second within a specific latency window (below 200 milliseconds) whilst ensuring:

  • full explainability of decisions,
  • highly accurate predictions,
  • low false positives,
  • at a reasonable cost.

We realized that our solution could not meet this challenge unless we scaled up the infrastructure, which in turn led to an unacceptable total cost of ownership. After investigation, we determined that the bottleneck resided in the core of the machine learning processing, specifically within the explainable AI computation. Because XGBoost is open source, we were also able to determine that its Java implementation was making calls to native system libraries, adding significant CPU and latency overhead.

Being the curious engineers we are, we looked at various options, focusing on pure Java implementations of XGBoost from the open source community, such as the one maintained by Yelp, xgboost-predictor-java; but even this one did not implement explainable AI.

So we needed to design and develop our own implementation of XGBoost. 

 

3. Meeting the customer’s requirements

The solution had to respect several technical constraints and requirements (a sketch of the resulting contract follows the list):

  • a pure JVM implementation (no use of system libraries),
  • interoperability with all JVM languages (Java, Scala, Kotlin),
  • compatibility with XGBoost models from versions 0.8 to 1.0, with the ability to support future versions,
  • as well as an implementation of explainability using the SHAP algorithm.
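In rough terms, the contract such a predictor has to satisfy looks like the following interface; this is a hypothetical sketch, and the names are not the published library’s actual API:

```java
// Hypothetical sketch of a pure-JVM predictor contract; names are illustrative.
public interface Predictor {
    /** Score for one dense feature vector, with no JNI or native libraries. */
    double predict(double[] features);

    /** SHAP feature contributions: one value per feature plus a final bias term;
     *  their sum equals the raw model output, which is what makes scores explainable. */
    double[] predictContrib(double[] features);
}
```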

We also redesigned the decision trees for Java, instead of keeping a C++-style data layer like the one used in the XGBoost core; a sketch of this kind of layout follows.
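Here is a minimal, self-contained sketch of what an array-based decision tree can look like in Java; the layout and names are illustrative, not our actual implementation:

```java
// Minimal sketch of an array-based ("structure of arrays") decision tree,
// a Java-friendly layout that avoids a C++-style data layer. Values are illustrative.
public class FlatTree {
    // Parallel arrays: one entry per node, children referenced by index.
    private final int[] featureIndex;   // feature tested at each internal node
    private final double[] threshold;   // split threshold
    private final int[] leftChild;      // index of left child (-1 for a leaf)
    private final int[] rightChild;     // index of right child
    private final double[] leafValue;   // prediction stored at leaves

    public FlatTree(int[] featureIndex, double[] threshold,
                    int[] leftChild, int[] rightChild, double[] leafValue) {
        this.featureIndex = featureIndex;
        this.threshold = threshold;
        this.leftChild = leftChild;
        this.rightChild = rightChild;
        this.leafValue = leafValue;
    }

    /** Walks from the root to a leaf; no object allocation on the hot path. */
    public double predict(double[] features) {
        int node = 0;
        while (leftChild[node] != -1) {
            node = features[featureIndex[node]] < threshold[node]
                    ? leftChild[node]
                    : rightChild[node];
        }
        return leafValue[node];
    }

    public static void main(String[] args) {
        // A single split on feature 0 at threshold 100.0,
        // with leaf values 0.1 (left) and 0.9 (right).
        FlatTree tree = new FlatTree(
                new int[]{0, 0, 0}, new double[]{100.0, 0, 0},
                new int[]{1, -1, -1}, new int[]{2, -1, -1},
                new double[]{0, 0.1, 0.9});
        System.out.println(tree.predict(new double[]{120.5})); // 0.9
    }
}
```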

The overall performance gain was significant when implemented within the full stack of our solution framework (Kafka, Flink): we achieved six times the throughput on the same hardware and halved the end-to-end latency, which delivered the desired performance at the right cost in a deterministic fashion.

On micro-benchmarks (done with JMH), the gain is less dramatic, but they show that prediction-scoring throughput alone was doubled.
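For reference, a micro-benchmark of this kind can be sketched as follows (assumes JMH on the classpath and reuses the illustrative FlatTree above; this is not our actual benchmark suite):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

// Minimal JMH throughput benchmark for a predictor.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class PredictorBenchmark {
    private FlatTree tree;
    private double[] features;

    @Setup
    public void setUp() {
        tree = new FlatTree(
                new int[]{0, 0, 0}, new double[]{100.0, 0, 0},
                new int[]{1, -1, -1}, new int[]{2, -1, -1},
                new double[]{0, 0.1, 0.9});
        features = new double[]{120.5};
    }

    @Benchmark
    public double scoreOneTransaction() {
        return tree.predict(features); // JMH reports predictions per second
    }
}
```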

[Figure: models comparison]

Conclusion

This experience was a good example of meeting business requirements at the right cost and hitting our client’s objectives. As part of this exercise, it would have been easy to focus only on model training times or model accuracy without paying attention to our client’s real-world inference latency and throughput requirements, but at Bleckwen we always go further: great model accuracy is worth nothing if it doesn’t solve real-world problems.

At Bleckwen, we love working with the latest tech, building new models and modelling approaches, but most of all we love listening to our clients’ needs, objectives and constraints to ensure we deliver the optimal solution.

So a big shout-out to the Bleckwen engineering team, and a big thanks to the Yelp engineering team for some great inspiration.

We are more than happy to share this code on GitHub so that the community can benefit from our work, as we strongly believe that collaboration is key to improving software development. Any comments or feedback are more than welcome!

Please email us: engineering@bleckwen.ai  

Bleckwen tech team