Bleckwen ML algorithm optimization

Introduction

In this blog post, we want to share with you the story behind our recent publication on GitHub.

Bleckwen is an award-winning French behavioural analytics fintech, dedicated to helping banks and financial institutions defeat fraud and financial crime and make the world just a little bit safer. Our risk engine is built to score transactions using a combination of explainable machine learning models, rules and human-in-the-loop decisioning.

 

1. Fighting fraud in an open banking world

Detecting fraud in an open and digital world, where everyone expects an immediate experience, means analysing large amounts of data at very low latency to spot suspicious activity in real time.

Operating at the core of a bank's payment architecture imposes many technical constraints, and we need to ensure that we meet their exacting service-level requirements. Because our clients work in highly regulated markets, black-box AI is not an option; being able to make explainable decisions in real time, around the clock, with no downtime, at scale, is a big challenge.

To meet these requirements we built our fraud detection engine, from data ingestion to decision, on the following elements (a minimal sketch of how they fit together follows the list):

  • An open source streaming middleware, Confluent Kafka, to process high-throughput, low-latency real-time data feeds
  • An open source processing framework, Apache Flink, to distribute the load across servers and scale threads on demand, while guaranteeing high availability, fault tolerance and integrity (exactly-once semantics)
  • An open source, competition-winning machine learning algorithm, XGBoost, designed for distributed frameworks and able to handle high volumes of the inherently imbalanced data that fraud produces, with explainable AI
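
To make this concrete, here is a minimal sketch of how these pieces fit together, assuming Flink's DataStream API with its Kafka connector. The topic name, broker address and the scoring logic are illustrative placeholders, not our production code.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FraudScoringPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is what drives Flink's exactly-once guarantees.
        env.enableCheckpointing(10_000);

        // Transactions arrive on a Kafka topic; a real pipeline would use
        // a typed deserializer rather than raw strings.
        KafkaSource<String> transactions = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("transactions")
                .setGroupId("fraud-scoring")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(transactions, WatermarkStrategy.noWatermarks(), "transactions")
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String tx) {
                   // Placeholder: this is where the XGBoost model scores the
                   // transaction and SHAP explains the decision.
                   return tx + " -> score=0.0";
               }
           })
           .print();

        env.execute("fraud-scoring-pipeline");
    }
}

In the real engine, the map step loads the model once per task and scores each transaction in place, so the only per-event cost is the tree traversal itself.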

2. Taking on a new challenge

The payment landscape is changing: new regulations are coming into force (DSP2, real-time payments, etc.), and customers and businesses are adopting new, rapidly changing behaviours driven by COVID and digitisation. Banks work continuously to manage evolving risks whilst also trying to reduce friction, in a market made ever more competitive by the emergence of neo and challenger banks. So there is a growing need for increased resilience, performance and control whilst still containing costs.

In a current engagement, a global Tier 1 bank we are working with asked us to help them address a new challenge: how to guarantee a sustained throughput of thousands of transactions per second within a specific latency window (below 200 milliseconds) whilst ensuring:

  • full explainability of decisions,
  • highly accurate predictions,
  • low false positives,
  • all at a decent cost.

We realised that our solution could not meet this challenge unless we scaled up the infrastructure, which in turn led to an unacceptable total cost of ownership. After investigation, we determined that the bottleneck resided in the core of the machine learning processing, specifically within the explainable AI computation. XGBoost being open source, we were also able to see that its Java implementation delegates to custom native system libraries, adding significant CPU and latency overhead (the snippet below shows the typical call path).
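
For context, here is roughly what scoring looks like through the official xgboost4j bindings; every call crosses the JNI boundary into native code. The model path and feature values are illustrative.

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class NativeScoringExample {
    public static void main(String[] args) throws XGBoostError {
        // Loading the model goes through JNI into the native XGBoost core.
        Booster booster = XGBoost.loadModel("model.bin"); // illustrative path

        // One transaction as a dense single-row matrix.
        float[] features = {120.5f, 3f, 0.0f, 0.87f};
        DMatrix row = new DMatrix(features, 1, features.length);

        // Each predict call marshals data across the Java/native boundary,
        // which is the overhead discussed above.
        float[][] prediction = booster.predict(row);
        System.out.println("score = " + prediction[0][0]);
    }
}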

Being the curious engineers we are, we looked at various options, focusing on pure Java implementations of XGBoost within the open source community, such as Yelp's xgboost-predictor-java; yet even this one did not implement explainable AI.

So we needed to design and develop our own implementation of XGBoost. 

 

3. Meeting the customer's requirements

The solution had to respect several technical constraints and requirements:

  • a pure JVM implementation (no use of system libraries),
  • interoperability with all JVM languages (Java, Scala, Kotlin),
  • compatibility with XGBoost models from versions 0.8 to 1.0, with the ability to support future versions,
  • as well as an implementation of explainability using the SHAP algorithm (recalled below).
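
As a reminder, SHAP explains a prediction by distributing it across the input features using Shapley values: the contribution of feature i is a weighted average, over all feature subsets S, of how much adding i changes the model's output:

\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]

where F is the set of all features and f_S denotes the model evaluated on the subset S alone. Computed naively this is exponential in |F|, but the TreeSHAP algorithm evaluates it exactly for tree ensembles in polynomial time, which is what makes per-transaction explanations feasible at our latencies.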

We also re-designed the decision trees for Java, instead of keeping a C++-style data layer like the one used in the XGBoost core (a sketch of the idea is below).
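
As an illustration of the approach, here is a minimal sketch of an array-encoded regression tree whose traversal stays allocation-free; the field names and the missing-value handling are simplified assumptions, not our exact layout.

// Hedged sketch: one way to encode a regression tree in flat primitive
// arrays so traversal stays JIT-friendly and allocation-free.
final class RegressionTree {
    private final int[] splitFeature;   // feature index tested at each node
    private final float[] splitValue;   // split threshold at each node
    private final int[] leftChild;      // index of left child, -1 for leaves
    private final int[] rightChild;     // index of right child
    private final float[] leafValue;    // output value for leaf nodes

    RegressionTree(int[] splitFeature, float[] splitValue,
                   int[] leftChild, int[] rightChild, float[] leafValue) {
        this.splitFeature = splitFeature;
        this.splitValue = splitValue;
        this.leftChild = leftChild;
        this.rightChild = rightChild;
        this.leafValue = leafValue;
    }

    float predict(float[] features) {
        int node = 0;
        while (leftChild[node] != -1) {             // until we reach a leaf
            float v = features[splitFeature[node]];
            // Missing values default left here; real XGBoost stores a
            // per-node default direction, simplified away in this sketch.
            node = (Float.isNaN(v) || v < splitValue[node])
                 ? leftChild[node] : rightChild[node];
        }
        return leafValue[node];
    }
}

Encoding nodes in flat primitive arrays instead of pointer-linked objects keeps the hot traversal loop cache-friendly and avoids garbage-collection pressure during scoring.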

The overall performance gain was significant when implemented within the full stack of our solution framework (Kafka, Flink): we achieved six times the throughput on the same hardware and halved the end-to-end latency, delivering the desired performance at the right cost in a deterministic fashion.

On micro-benchmarks (done with JMH), the gain is less dramatic, but it shows that prediction throughput alone was doubled (a sketch of such a harness is below).
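
For readers who want to run this kind of measurement themselves, here is a minimal JMH throughput harness; the feature vector and the predict placeholder are hypothetical stand-ins for whichever model implementation is under test.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class PredictionBenchmark {

    private float[] features;

    @Setup
    public void setup() {
        // One fixed random feature vector, reused across invocations.
        Random rnd = new Random(42);
        features = new float[30];
        for (int i = 0; i < features.length; i++) {
            features[i] = rnd.nextFloat();
        }
    }

    @Benchmark
    public float scoreOne() {
        // Swap in the implementation under test, e.g. the native
        // xgboost4j Booster or a pure-Java predictor.
        return predict(features);
    }

    // Placeholder standing in for the real model call.
    private float predict(float[] x) {
        float acc = 0f;
        for (float v : x) acc += v;
        return acc;
    }
}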

Model comparison

Conclusion

This experience was a good example of meeting business requirements at the right cost and hitting our client's objectives. In an exercise like this it would be all too easy to focus only on model training times or model accuracy without paying attention to the real-world inference latency and throughput requirements of our client; but at Bleckwen we always go further, because great model accuracy is worth nothing if it doesn't solve real-world problems.

At Bleckwen, we love working with the latest tech, building new models and modelling approaches – but most of all we love listening to our clients' needs, objectives and constraints to ensure we deliver the optimal solution.

So a big shout-out to the Bleckwen engineering team, and a big thanks to the Yelp engineering team for some great inspiration.

We are more than happy to share this code on GitHub so that the community can benefit from our work, as we strongly believe that collaboration is key to improving software development. Any comments or feedback are more than welcome!

Please email us: engineering@bleckwen.ai  

Bleckwen tech team