A new collaboration to defeat financial crime

collaboration Bleckwen x LIP6


LIP6 and Bleckwen have decided to join forces to develop an innovative approach to fighting financial crime. Matthieu Latapy (LIP6) and Leonardo Noleto (Bleckwen) present this new collaboration.


1. Can you introduce yourself? 

Leonardo Noleto (LN): Hello, I’m Head of Data Science at Bleckwen, a software vendor specialized in fraud and financial crime detection. Bleckwen has developed a real-time detection engine, combining explainable AI, behavioural analytics, rules and human-in-the-loop feedback to deliver unparalleled performance.

Before joining Bleckwen, I worked in the field of anomaly detection and set up a tailored fraud detection system for one of the leading European cloud suppliers. I am convinced that the use of Machine Learning methods, combined with the professional expertise of analysts, is a game-changer when it comes to combating fraud.



Matthieu Latapy (ML): I am a CNRS senior researcher at LIP6, a computer science laboratory. My role is to move scientific knowledge forward to a “state-of-the-art” level, whilst managing the transfer of this knowledge towards real business applications. I am in contact with several companies in a two-way relationship: how our scientific knowledge can help industrial use cases, and which applications raise questions that could feed our scientific research. In 2008, within the LIP6 laboratory, we created the Complex Network team, dedicated to the study of real-world relational data (such as mobility, traffic, phone calls and transaction traces) using graph theory.


2. Detecting fraud in an open and digital world is challenging: how can science help?

LN: From a data science perspective, fraud is a rare event hidden in a large amount of data. To detect fraudulent activities, Bleckwen analyses all the data related to a transaction using different techniques: ML models and Behavioural Analytics. We also use graph theory to study the structure of relations and identify the links between entities across transactions. Thanks to graph analytics, we can analyse multiple data sets across multiple transactions and aggregate information not only at account or user level, but also along new variables such as destination, IP address, geographical location, etc. We create links between entities to get a bigger picture of payment activity and detect suspicious behaviour.
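
To make the idea of linking entities across transactions concrete, here is a minimal sketch in Python. It is an illustration only, not Bleckwen's engine: the transaction fields, the sample values and the function names are all assumptions. It groups accounts that are connected, directly or indirectly, through a shared IP address or destination.

```python
from collections import defaultdict

# Hypothetical transactions: field names and values are illustrative only.
transactions = [
    {"account": "A1", "ip": "10.0.0.1", "destination": "D9"},
    {"account": "A2", "ip": "10.0.0.1", "destination": "D7"},
    {"account": "A3", "ip": "10.0.0.5", "destination": "D7"},
    {"account": "A4", "ip": "10.0.0.8", "destination": "D2"},
]


def find(parent, x):
    # Follow parent pointers to the representative, shortening the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def linked_accounts(txs, keys=("ip", "destination")):
    """Group accounts that are connected through any shared attribute value."""
    parent = {}
    for tx in txs:
        account = ("account", tx["account"])
        parent.setdefault(account, account)
        for key in keys:
            node = (key, tx[key])
            parent.setdefault(node, node)
            # Merge the account's group with the group of the attribute it used.
            parent[find(parent, account)] = find(parent, node)
    groups = defaultdict(set)
    for node in parent:
        if node[0] == "account":
            groups[find(parent, node)].add(node[1])
    return sorted(sorted(group) for group in groups.values())


print(linked_accounts(transactions))
```

Here A1 and A2 share an IP address and A2 and A3 share a destination, so the three accounts end up in the same group even though A1 and A3 have nothing directly in common.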

Graph theory is useful but it has limits: the traditional approach struggles to capture the temporal and the structural nature of interactions together. Fraud, however, is a dynamic phenomenon and fraud patterns never stop evolving, so traditional algorithms cannot keep pace.

Our challenge was to use graph theory in a dynamic way. That is why we turned to LIP6, which has developed a graph technique that studies link streams, including the temporal structure of interactions.


Graph Link streams


ML: Indeed, transaction data are in fact link sequences over time. So far, we could use two distinct approaches to analyse such data:

  • graph theory to study data as networks. In this case, as Leonardo mentioned, we are missing the time dimension in our analysis.
  • temporal analysis for a dynamic study of data over time, but without taking into consideration the interactions between entities.

To properly analyse link sequences, we have developed a new dedicated formalism to cope with interactions over time, which we call stream graphs and link streams. It captures both the structural and the temporal information of interactions in a consistent way. This new approach is important for the research community as it merges two important scientific domains which have so far been quite independent. We strongly believe that the outcomes of our research on stream graphs and link streams will greatly benefit many industrial applications, such as fraud detection.
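
As a rough illustration of why keeping time and structure together matters, the sketch below stores interactions as time-stamped links (t, u, v) and asks who a node interacted with inside a time window. It is a simplified assumption, not the formalism from the paper: the data, the function names and the choice of query are ours.

```python
from bisect import bisect_left, bisect_right

# A toy link stream: time-stamped links (t, u, v). Values are illustrative only.
links = sorted([
    (1.0, "a", "b"),
    (2.0, "a", "c"),
    (2.5, "a", "d"),
    (9.0, "a", "b"),
    (9.5, "b", "c"),
])
times = [t for t, _, _ in links]


def neighbours_in_window(node, start, end):
    """Distinct nodes linked to `node` at some time t with start <= t <= end."""
    lo, hi = bisect_left(times, start), bisect_right(times, end)
    found = set()
    for _, u, v in links[lo:hi]:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    return found


def static_neighbours(node):
    """The same query on the aggregated graph, which ignores time."""
    return neighbours_in_window(node, times[0], times[-1])


print(static_neighbours("a"))               # all counterparties ever seen
print(neighbours_in_window("a", 1.0, 3.0))  # counterparties within a short burst
print(neighbours_in_window("a", 8.0, 10.0))
```

The static view only says that "a" has three counterparties. The windowed view shows that all three were contacted within two time units, which is the kind of signal that is lost when the time dimension is dropped.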

LN: Yes, thanks to this technique, we will be able to identify the links between entities while taking time and speed into consideration, uncovering signals that we may previously have ignored. This technique gives us time as a new dimension, for closer detection of the most sophisticated patterns encountered in fraud and in AML/CTF.


3. Why was this collaboration started?

ML: In 2017-2018, I published a reference article in the international journal Social Network Analysis and Mining (SNAM) entitled “Stream Graphs and Link Streams for the Modelling of Interactions over Time”. This paper presents the foundations of our approach. Bleckwen was one of the first readers of this paper and they contacted me after its release to discuss it. I was excited to present this topic to a fintech specialized in fraud detection.

LN: Yes, I remember these passionate exchanges! We quickly realized that our ambitions were aligned. Bleckwen wants to improve the detection capabilities of its fraud solution by integrating cutting-edge technologies. LIP6 wants to move science forward by applying its research to industrial use cases.

Our common objective is to lead the scientific and technological advances required to make significant progress in anomaly detection in financial transactions.


4. How do you collaborate?

ML: We have launched a joint laboratory, FiT (link: http://fit.complexnetworks.fr), supported by the French National Agency for Research (ANR). This long-term collaboration means that we are working hand-in-hand to model and analyse financial transactions as link streams. We are currently developing and implementing the formalisms and algorithms that enable the proper exploitation of this data. At LIP6 we are interested in applying our research to real financial data and in benefiting from the fraud expertise of Bleckwen’s team (e.g. business understanding, knowledge of fraud patterns, data interpretation).

LN: On our side, we are looking for the high-level expertise of the LIP6 team, especially in the new approach to dynamic graph modelling they have invented. The complementary nature with our internal research is obvious: we adapt their feature engineering to our ML models to bring a new time-based dimension to the analyses and reach greater detection accuracy.

This collaboration enhances our research into identifying sophisticated fraud patterns, especially in AML/CFT fraudulent activities.


5. What will be the benefits of this collaboration?

LN: Our customers will directly benefit from this collaboration with an improved quality in fraud detection thanks to an accurate and robust scientific approach. The pace of fraud and money laundering continues to accelerate along with the sophistication of attacks. Criminals are creative, collaborative, well-funded and technologically advanced.

As Bleckwen’s mission is to defeat financial crime and contribute to making the world a safer place, we need to collaborate!

ML: This collaboration is a great example of research-industry complementarity: a virtuous loop between fundamental research and real-life application with interesting issues to solve on both sides.

We expect our approach to become a game changer in the fight against fraud, which would greatly help to promote our scientific research: we are confident that our collaboration will have a strong impact on both science and financial crime detection!


Want to know more about our new research topic? Feel free to contact us:

Bleckwen contributing to the Open Source Community

Bleckwen ML algorithm optimization


In this blog post, we want to share with you the story behind our recent publication on GitHub.

Bleckwen is an award-winning French Behavioural Analytics fintech, dedicated to helping Banks and Financial Institutions defeat fraud and financial crime and make the world just a little bit safer.  Our risk engine is built to score transactions using a combination of explainable machine learning models, rules and human in the loop decisioning. 


1. Fighting fraud in an open banking world

Detecting fraud in an open and digital world, where everyone wants an immediate experience, means analysing large amounts of data at very low latencies to spot suspicious activities in real time.

Operating at the core of a bank’s payment architecture comes with many technical constraints, and we need to ensure that we meet the bank’s exacting service level requirements. As our clients work in highly regulated markets, black box AI is not an option. Making explainable decisions in real time, around the clock, with no downtime and at scale is therefore a big challenge.

To meet these requirements we built our fraud detection engine from data ingestion to decision based on various elements: 

  • An open source streaming middleware, Confluent Kafka, to be able to process high-throughput and low-latency real-time data feeds
  • An open source processing framework, Apache Flink, to be able to distribute the load across servers, scale threads on-demand, and that guarantees high availability, fault-tolerance and integrity (exactly-once)
  • A leading open source Machine Learning algorithm, XGBoost, designed for distributed frameworks and able to handle high volumes of data (which, with fraud, are inherently highly imbalanced), combined with explainable AI

2. Taking on a new challenge

In a changing payment landscape, with new regulations coming into force (PSD2, real-time payments, etc.) and customers and businesses adopting new and rapidly changing behaviours due to COVID and digitisation, banks work continuously to manage evolving risks whilst also trying to reduce friction, in a market that is becoming ever more competitive with the emergence of neobanks and challenger banks. There is therefore a growing need for increased resilience, performance and control whilst still containing costs.

In a current engagement, a global Tier 1 bank we are working with asked us to help them address a new challenge: how to guarantee a sustained throughput of thousands of transactions per second within a specific latency window (below 200 milliseconds) whilst ensuring:

  • full explainability of decisions,
  • highly accurate predictions,
  • low false positives,
  • a decent cost.

We realized that our solution could not meet this challenge unless we scaled up the infrastructure, which in turn led to an unacceptable total cost of ownership. After investigation, we determined that the bottleneck resided at the core of the Machine Learning processing, specifically within the explainable AI computation. Because XGBoost is open source, we could also determine that its Java implementation was making calls to custom system libraries, adding significant CPU and latency overhead.

Being the curious engineers we are, we looked at various options, focusing on pure Java implementations of XGBoost within the open source community, such as Yelp’s xgboost-predictor-java. Even this one, however, did not implement explainable AI.

So we needed to design and develop our own implementation of XGBoost. 


3. Meeting the customer’s requirements

The solution had to respect several technical constraints and requirements:

  • a pure JVM implementation (no use of system libraries),
  • interoperability with all JVM languages (Java, Scala, Kotlin),
  • compatibility with XGBoost models from versions 0.8 to 1.0, with the ability to support future versions,
  • an implementation of explainability using the SHAP algorithm.

We also re-designed the Decision Trees for Java, instead of keeping a C++ data layer like the one used in the XGBoost core.
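
To give a feel for what a tree layout designed for fast scoring can look like, here is a small sketch. It is not our actual implementation: it is written in Python rather than Java for brevity, the flat-array layout is only one common option, and the tree, array names and values are made up. Each node is an index into parallel arrays, so scoring is a simple loop with no object traversal. Missing values are ignored for simplicity.

```python
# Hypothetical tree stored as parallel arrays (one entry per node).
# Node 0 splits on feature 0 at 0.5; node 2 splits on feature 1 at 3.0.
feature = [0, -1, 1, -1, -1]            # -1 marks a leaf
threshold = [0.5, 0.0, 3.0, 0.0, 0.0]   # unused on leaves
left = [1, -1, 3, -1, -1]               # child taken when x[feature] < threshold
right = [2, -1, 4, -1, -1]              # child taken otherwise
value = [0.0, -0.4, 0.0, 0.1, 0.9]      # leaf scores; unused on internal nodes


def predict_tree(x):
    """Walk from the root to a leaf and return that leaf's score."""
    node = 0
    while feature[node] != -1:
        if x[feature[node]] < threshold[node]:
            node = left[node]
        else:
            node = right[node]
    return value[node]


print(predict_tree([0.2, 10.0]))
print(predict_tree([0.7, 1.0]))
print(predict_tree([0.7, 5.0]))
```

A boosted model would sum such scores over all of its trees. The point of the layout is that the hot loop only touches a few primitive arrays.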

The overall performance gain was significant when deployed in the full stack of our solution (Kafka, Flink): we achieved 6 times the throughput on the same hardware and halved the end-to-end latency, which delivered the desired performance at the right cost in a deterministic fashion.

On micro-benchmarks (done with JMH), the gain is less striking, but prediction scoring throughput alone was still doubled.

Models comparison


This experience was a good example of meeting business requirements at the right cost and hitting our client’s objectives. In an exercise like this, it would be all too easy to focus only on model training times or model accuracy without paying attention to the real-world inference latency and throughput requirements of our client. At Bleckwen we always go further, because great model accuracy counts for nothing if it doesn’t solve real-world problems.

At Bleckwen, we love working with the latest tech, building new models and modelling approaches, but most of all we love listening to our clients’ needs, objectives and constraints to ensure we deliver the optimal solution.

So a big up to the Bleckwen engineering team and a big thanks to the Yelp engineering team for some great inspiration.

We are more than happy to share this code on GitHub so that the community can benefit from our work. We strongly believe that collaboration is key to improving software development, so any comments or feedback are more than welcome!

Please email us: engineering@bleckwen.ai  

Bleckwen tech team

International Women’s Day 2020 - Discover Bleckwen tech team's angels.

On International Women’s Day, Bleckwen wants to spotlight 3 women from the Tech team, fully committed to making Bleckwen the market-leading financial crime detection solution!


Nina, Data Scientist & software engineer at Bleckwen, is currently working on the Machine Learning models training and the platform developments.

1. What can you tell us about your professional background?

After graduating from INSA (Toulouse, France), I started my career as a Data Science consultant, working on a wide range of projects to quickly gain broad experience. At that time, I co-founded the “Paris DataLadies” association, whose meetups promote women in the Data industry by offering them the opportunity to present their projects in front of an audience. Those meetups are a good way to facilitate exchanges and to encourage gender diversity in the working world, especially in technical and scientific fields. After 3 years as a consultant, I realised that I needed to take part in a longer-term project and to invest myself in the development of a technically challenging product. I was especially motivated by joining a project that would include Machine Learning in production. That’s why I joined Bleckwen 2 years ago…

2. What makes you get up each morning?

Every day at Bleckwen is a new challenge: we are moving apace. We face interesting technical challenges, including new technologies and new languages, and we reflect as a team on best development practices and on the lifecycle of a model in production. The Bleckwen team also plays a great role for me, because it is made up of brilliant and passionate people who have created a great working environment.

3. Women in Tech: a good choice?

From the very beginning, I chose a career in tech because I was passionate about Data Science. Looking back, I have no regrets: everything evolves quickly, so you’re always learning when you embrace this sector.

4. The music you listen to when you are working?


Best Of Tarantino Soundtracks 1992-1997




Manar, Data Scientist & software engineer at Bleckwen, is working on Machine Learning models integration to the platform.

1. Why did you become a Data Scientist?

I have been passionate about mathematics since I was young, and then I discovered I also had a fascination for computer science. Data science was an obvious choice for me because it allowed me to marry my two passions. Moreover, it is a rapidly developing sector that covers multiple fields of application: many opportunities are open!

2. Why did you join Bleckwen?

I met Leonardo Noleto (nb: Head of Data Science at Bleckwen) during a recruitment event called DataJob in Paris, two years ago. We started a passionate discussion about the interpretability of Machine Learning models. The next day, I sent my application for an internship, and I was finally offered a position after that.

At Bleckwen, I really appreciate the importance given to research and innovation, especially in the field of Artificial Intelligence. The size of the company and its start-up mindset enable everyone to express his or her opinion and to actively participate in the decision-making process and the design of the solution. It is a very stimulating working environment.

3. Women in Tech: a good choice?

Absolutely! If I had to do it all over again, I wouldn’t change a thing! Contrary to what one might think, science requires a certain form of creativity. The projects I’m working on are constantly evolving, and you have to stay open and tuned in: I’m never bored and it’s very motivating!

4. The music you listen to when you are working?


Wake Up Sister – Parov Stelar




Wafa, front end software engineer, is working on the user interface development.

1. What can you tell us about your professional background?

I have a degree in Telecommunication and Computer Network Engineering and a Master’s Degree in Computer Development. I love Computer Science and during my studies I worked as a Design Engineer at SIMSU-UGA before joining Bleckwen.

2. Why did you join Bleckwen?

I was impressed by the central place given to the customer’s voice in the platform development process. The Bleckwen solution was co-built with a leading Tier One bank (you may have read about it: “Designed for banks with banks”). This is not just a buzzword but a real commitment of the whole team to solve our customers’ problems and to support them in the use of the solution. So my work is meaningful for our customers and this is extremely rewarding and motivating for me! Last but not least, Bleckwen’s team is fantastic: highly talented and motivated colleagues who have created a very positive and dynamic work environment.

3. Women in Tech: a good choice?

Very early in my studies, I was attracted to technology. It’s a dynamic sector that opens up great prospects for evolution to everyone, as long as you love what you are doing! Don’t hesitate to take the leap!

4. The music you listen to when you are working?

Best Relaxing Piano Studio Ghibli Collection


Interested in joining Bleckwen tech team? Drop us a line!



Bleckwen gets the support of the European Regional Development Fund (ERDF)

Production ready Explainable Artificial Intelligence to fight against financial crime with the support of:

Bleckwen combines fraud prevention, Machine Learning algorithms and technical expertise to create a cutting-edge platform that moves Explainable AI from the lab to solve real-world business problems.

  • A complete decision pipeline to reveal the deep truth
  • High volume, low-latency detection
  • Open, modular architecture

This project is supported by the European Regional Development Fund (ERDF)

European AI Night: Why Should AI Be Explainable?

1. Introduction

Interpretable, or Explainable, Artificial Intelligence (“AI”) has become an important topic for software vendors and users working in this space in today’s business world. As AI has an increasing impact on day-to-day activities, trust, transparency, liability and auditability have become prerequisites for any project deployed at large scale.

A workshop was organized on this theme at the 2019 European AI Night in Paris. Four noteworthy French AI players were welcomed by France Digitale and Hub France IA to examine why they are now increasingly focused on Explainable AI (“XAI”): Bleckwen, D-Edge, Craft.ai and Thales.

Three AI use cases, already running today in production, were displayed, demonstrating how explainable AI can be leveraged to make better, more efficient and usable tools for projects within corporations.

2. Presentation

Interpretability is about communication: it’s mandatory to know the end users’ activities and processes in order to adapt the presentation of the results to their needs 

Yannick Martel, Chief Strategist at Bleckwen

Created in 2016, Bleckwen is a French fintech leveraging Behavioural Analytics and Machine Learning to help Banks and Financial Institutions fight fraud. Up until this point, the adoption of Artificial Intelligence in a critical sector such as Financial Services has been limited. Yannick Martel believes that interpretability is a key success factor in AI adoption, as experts, customers and compliance officers all need a better understanding of the results of algorithmic models to establish a trustful collaboration with technology-based solutions such as Bleckwen’s.

A significant area for improvement is ensuring that providers give clients the best explanations, choosing among all mathematically correct explanations those that match their thought processes and activities. This is a key strength of the Bleckwen platform, as Yannick outlined and illustrated during the discussion. Another challenge, as Yannick explained, is to ensure there is a clear explanation for each decision made within the platform, one that highlights the factors leading to the decision and so helps create understanding. In Bleckwen’s case, Yannick illustrated how this thinking guided the design process and how it fostered trust in the ultimate detection processes.

Explanations are mandatory when AI empowers humans to perform complex tasks

Antoine Buhl, CTO @D-Edge

D-Edge offers SaaS solutions for hotels and hotel chains. 11,000 hotels in Europe and Asia use the D-Edge solution to optimize their distribution. D-Edge uses Artificial Intelligence alongside statistical models to improve room pricing and to predict booking cancellations.

Choosing the right price for a room is very complicated and requires combining numerous elements (rooms already sold, competitors’ prices, nearby events, and so on), including external events which can’t be foreseen. Antoine Buhl took the example of the ongoing “Gilets Jaunes” crisis in France, which started at the end of 2018 and brought an unusual and significant rise in the cancellation rate for hotels. What seems to be an “AI bug” can be analysed effectively if the AI lets the revenue manager know that it does not recognise those elements. Additionally, D-Edge faces another challenge: determining, even after the fact, whether a room price was ideal or not is a nearly unachievable objective in an ever-evolving environment.

The D-Edge solution presents recommendations, but the final decision is the Revenue Manager’s job. To make the right choices in this evolving and complex environment, Revenue Managers need explanations of the recommendations. Adoption is key in this cooperation between humans and machines. At D-Edge, they measure how Revenue Managers use the recommended prices to constantly quantify this adoption (reflecting both the quality of the recommendation and the quality of the explanation). Increasingly, they see Revenue Managers letting the AI adjust the price proposal autonomously, according to the explanations and other parameters.

Without interpretability, predictions have no value

Caroline Chopinaud, CCO @ Craft.ai

Craft.ai offers Explainable AI as-a-service to empower product and operational teams to develop and run XAI projects. Craft.ai processes data streams to automate business processes, enable predictive maintenance or boost user engagement. Caroline explained how Dalkia uses Craft.ai to improve the efficiency of its energy managers by providing them with detailed analyses and recommendations. Explainability is a prerequisite; without it, human specialists would need to reinvestigate to understand the results, thus cancelling out the efficiency gain. That is only one illustration among others of why explainability is key for AI deployment, and it is the reason why Craft.ai builds its own white-box Machine Learning models!

When it comes to creating AI for critical systems, trustworthiness and certifiability are mandatory

David Sadek, VP Research, Innovation & Technology @ Thales

David Sadek presented the difficulties faced by Thales as they create AI for complex systems: space, communications, avionics, defence…

A key issue is building trust between machines and the people who collaborate with them. It is critical to consider how explanations are conveyed: for instance, through a conversational interface able to converse in natural language and using explanatory variables that matter to the operators. Another significant field where explainability is key is the certification of autonomous vehicles. While the computations of current algorithmic models are opaque, being able to understand decisions will be critical to certify such systems: why an obstacle was detected, why an identified shape was not treated as an obstacle, etc. To this end, Thales is investigating hybrid solutions that combine effective but unexplainable deep learning techniques with symbolic AI reasoning.

3. Roundtable

The workshop concluded with a discussion between the participants and the panellists on the key issues for interpretable AI.

The main issue raised was the nature of the explanations: the fidelity of explanations and the trust people can have in them. The panellists pointed out that those two aspects are clearly connected.

Yannick Martel explained that because fraud is a complex phenomenon, especially regarding the number of meaningful features that need to be considered, Bleckwen decided to develop a dual methodology: predictions from non-explainable AI models, combined with local explanations based on surrogate models. This approach helps provide efficient insights to the users. While creating the AI, Bleckwen verified that the predictions did not miss genuine frauds and that the explanations made sense to business experts.
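
The idea of a local surrogate can be sketched as follows. This is a generic, LIME-style illustration and not a description of Bleckwen's method: the black-box function, the Gaussian sampling, the kernel width and every name below are assumptions. Around one instance, we sample perturbations, weight them by proximity, and fit a weighted linear model whose coefficients act as the local explanation.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def local_surrogate(predict, instance, n_samples=500, scale=0.1, seed=0):
    """Fit a proximity-weighted linear model to `predict` around `instance`.

    Returns [intercept, coef_1, ..., coef_d]; the coefficients are the
    local explanation (one per feature).
    """
    rng = random.Random(seed)
    d = len(instance)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [xi + oi for xi, oi in zip(instance, offset)]
        # Gaussian kernel: perturbations close to the instance weigh more.
        weight = math.exp(-sum(o * o for o in offset) / (2 * scale ** 2))
        row = [1.0] + offset  # centred features, so the intercept is the local prediction
        y = predict(sample)
        for i in range(d + 1):
            xty[i] += weight * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += weight * row[i] * row[j]
    return solve(xtx, xty)


def black_box(x):
    # Stand-in for an opaque model; purely illustrative.
    return 1.0 / (1.0 + math.exp(-(4.0 * x[0] - 2.0 * x[1] * x[1])))


explanation = local_surrogate(black_box, [0.2, 0.5])
print(explanation)
```

In this toy case the first feature gets a positive local coefficient and the second a negative one, which tells the user in which direction each feature pushes the score near this particular instance. The surrogate is only trustworthy locally, which is why checking explanations with business experts, as described above, matters.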

Caroline Chopinaud described an explainable-by-design approach where the same model is used for prediction and explanation, which means no gap between the prediction and the insights provided to users. To be really insightful, the algorithms have to work on business-meaningful features and combinations of features: not just any combination that “works” for the data scientists, but those which speak to business experts. This is the reason for Craft.ai’s investment in natively interpretable machine learning algorithms. Evaluating whether an explanation is useful and understandable requires feedback from users, as no quantitative assessment is currently available.

A comparable explainable-by-design approach is also used by D-Edge, Antoine Buhl explained, relying on various AI models. Since validating a recommended price is complex, D-Edge focuses its KPI on the trust the revenue managers put in the recommendations and the explanations, by tracking how regularly they approve the suggested prices as they stand.

David Sadek ended the discussion by raising the issue of ethics in AI. For him, AI ought to be evaluated on three dimensions: accuracy, interpretability and morality. For a long time, most AI players have focussed on the first aspect. The other two are critical when it comes to putting AI in production, especially in complex systems. Explainability is mandatory to control and audit the ethics of an AI model, helping to spot bias for instance, yet it isn’t enough to guarantee ethical behaviour.

4. Key Takeaway

Explainable AI may be a concern that has only recently arrived in the spotlight; for certain players in the field, however, it has been key for quite a while. It’s not by chance that those players have been able to run AI projects in production that affect key parts of organizations.

Explainability is not just another feature of those AI projects,

it is a critical factor in the decision to go live!

> Want to know more about explainability, a key  success factor in fighting financial crime?

Contact our experts: contact@bleckwen.ai

Key Takeaways From The 2nd Fraud And Financial Crime Conference In London

Bleckwen were delighted to participate and contribute at what was a fascinating and high energy event in London this week.

During two days filled with presentations and panel discussions, a wide-ranging audience discussed rapid developments across the technology landscape, explored regulatory topics, debated the psychology of crime and, of course, looked into evolving criminal strategies. Participants left with an updated set of relevant market statistics, estimates and predictions.

 Fraud: the 15th largest country on earth!

Businesses everywhere are now affected by the broad wave of financial crime in all its guises: 47% of businesses have been affected by financial crime within the last 12 months. It is estimated that $1.47tn is lost to financial crime globally, which is 5% of global GDP according to one estimate. This criminal community (call it “Gotham City”, just without Batman to oversee things!) would rank as the 15th largest country on earth. Staggering stuff!

5% of global GDP was lost to financial crime during the last 12 months  

At the same time, it was also noted how cultural and reputational issues meant that not all financial crime was disclosed to allow the broader financial community to effectively mitigate or organize against repeat, similar attack profiles or bad actors.

Regulation speeds up

During the conference, there were many engaging discussions around banks’ levels of readiness with respect to the 5th Money Laundering Directive (5MLD). For a number of banks, the pace at which regulatory change was approaching was raising concerns.

Considering that the 4th AMLD is not fully deployed yet, the 5th AMLD will put additional pressure on financial institutions and require new processes to be implemented. As an example, they will have to use the Ultimate Beneficial Owner (“UBO”) registries put in place by local authorities, as well as report any discrepancies they identify between the information gathered from customers and that which is available in the official registry.

Given the pace of regulatory change and the corresponding new requirements to be placed upon financial institutions – the upcoming 5MLD was compared to the recent implementation of SEPA where, as an industry, there were varying degrees of bank preparedness as deadline dates approached and with a corresponding awareness that not everybody would be ready on time to meet the new obligations.

The Arms Race

One recurring theme throughout many discussions was an acknowledgement of the rising level of collaboration amongst criminal fraternities. Against this backdrop, there was a recognition that the evolution of real-time payment platforms and of corporate/bank strategies was changing the nature of the threat. A corresponding response in solution technology approach and architecture is now required. The situation was characterised in one panel session as being akin to an ‘Arms Race’ developing between the criminal and the business/individual when it comes to Fraud, Anti-Money Laundering and broader financial crime.

With views shared by senior experts from, for example, the Metropolitan Police, large banks, technology providers and European regulators, it was recognized that sophistication, speed and threat level are all increasing, and that as an industry and as a community we need to respond quickly.

The most effective counter-strategy will need to be federated and collaborative 


Leveraging technology and AI

David Christie, Bleckwen’s CEO, joined a key panel discussing the “Uses of technology of AI and Machine Learning to automate and increase fraud and AML detection”. In today’s faster moving and more dynamic space, models based on averages or generic rules are no longer to be considered best-in-class: the rules can be quickly subverted by an intelligent, well mobilized and connected criminal fraternity. Broad averages, meanwhile, do not adapt to the development of individual profiles and hence generate too many unnecessary investigations, leading to operational burden, cost and client friction.

AI and ML are now being effectively deployed in the fight against financial crime and particularly fraud detection. Noting the processing power, detection capabilities, intuitive nature of Bleckwen’s technology and enhancements on basic ‘rules based’ platforms, David highlighted how Bleckwen’s solution is geared towards addressing both the authorized and unauthorized fraud processes.

A critical differentiator amongst AI based solution providers is the “interpretability” of the output: humans need to know on what elements the algorithm has based its decision (the contribution of each variable to the results). Interpretability allows them to make an informed decision in an efficient and reliable way. Bleckwen, as a company, believes “Interpretability” to be a key for effectiveness.


To effectively fight fraud, AI based solutions need Interpretability  


Another takeaway from this conference is that regulators are now placing increasing weight and focus on the Interpretability of AI, both to help understand these critical, advancing technologies and to mobilise them further.

Interpretability Of Machine Learning Models – Part 2

In the previous article, we explained why the interpretability of machine learning models is an important factor in the adoption of AI in industries, and more specifically in fraud detection. (https://www.bleckwen.ai/2017/09/06/interpretable-machine-learning-in-fraud-prevention/ ).

In this article, we’re going to explain how LIME works. It’s an intuitive technique that we have tested at Bleckwen.

Before looking at LIME in detail, it is necessary to situate it among other existing techniques. In general, interpretability techniques are categorized along two axes:

  • Applicability: Model-specific versus Model-agnostic
  • Scope: Global versus Local

Model-specific versus Model-agnostic

There are two types of techniques:

  • Specific techniques: these techniques apply to a single type of model because they rely on the internal structure of a machine learning algorithm. Some examples of specific techniques are: Deeplift for Deep Learning, and Treeinterpreter for tree-based models like RandomForest, XGBoost, etc.
  • One of the biggest advantages of model-specific techniques is that they potentially generate more precise explanations, because they depend directly on the model to be interpreted.
  • However, the disadvantage is that the explainability process is tied to the algorithm used by the model, so any switch to another model can become complicated.
  • Agnostic techniques: these techniques don’t take into account the model to which they apply; they only analyze the input data and the decisions that come out. Examples of agnostic techniques are: LIME, SHAP and Influence Functions.
  • The main advantage of agnostic techniques is their flexibility. The data scientist is free to use any type of machine learning model because the explanation process is separate from the algorithm used for the model.
  • The disadvantage is that these techniques are often based on replacement models (surrogate models), which can seriously reduce the quality of the explanations provided.

Global versus Local

The underlying logic of a machine learning model can be explained on two levels:

Global explanation: it’s important to understand the model as a whole before focusing on a specific case (or group of cases). The global explanation provides an overview of the most influential variables in the model, based on the input data and the predicted variable. The most common method for obtaining a global explanation of a model is the computation of feature importance.
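As a rough illustration of what a global explanation measures, the sketch below computes a permutation-style feature importance in plain Python. It is only one possible way to obtain such a ranking, not necessarily the one used in this article; the `black_box_score` function, its coefficients and the sample cases are all hypothetical.

```python
import random

FEATURES = ["age", "income", "amount"]

def black_box_score(case):
    """Hypothetical stand-in for an opaque risk model (assumption, not a real model)."""
    age, income, amount = case
    return 0.5 * amount / 1000 + 0.2 * (60 - age) - 0.1 * income / 1000

def permutation_importance(model, cases, n_repeats=20, seed=0):
    """Mean absolute change in the score when a single column is shuffled."""
    rng = random.Random(seed)
    baseline = [model(c) for c in cases]
    importances = []
    for col in range(len(cases[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle one column across cases, keep the other columns intact.
            shuffled = [c[col] for c in cases]
            rng.shuffle(shuffled)
            for case, value, base in zip(cases, shuffled, baseline):
                perturbed = list(case)
                perturbed[col] = value
                total += abs(model(perturbed) - base)
        importances.append(total / (n_repeats * len(cases)))
    return importances

# Made-up loan applications: (age, yearly income, requested amount).
cases = [
    (25, 30000, 5000),
    (40, 45000, 20000),
    (55, 60000, 1000),
    (33, 28000, 12000),
    (61, 52000, 8000),
]
importances = permutation_importance(black_box_score, cases)
for name, value in sorted(zip(FEATURES, importances), key=lambda p: -p[1]):
    print(f"{name}: {value:.2f}")
```

A variable whose shuffling barely moves the scores ranks low. As with any global measure, this says nothing about how much that variable weighed in one particular case.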

Local explanations identify the specific variables that contributed to an individual decision, a requirement that is increasingly critical for apps using machine learning.

The most important variables in the global explanation of an algorithm don’t necessarily correspond to the most important variables of a local prediction.

When trying to understand why a machine learning algorithm reaches a particular decision, especially when this decision has an impact on an individual with a “right to explanation” (as stated in the service provider obligations under the GDPR), local explanations are generally more relevant.

Case study for banks

Let’s take an illustrative case study to understand MLI (machine learning interpretability) techniques better:

The BankCorp Bank offers its customers a mobile application to lend money instantly. The loan application consists of four pieces of information: age, income, SPC (socio-professional categories) and amount requested. To respond quickly to its customers, BankCorp uses a machine learning model that assigns a risk score (between 0 and 100) for each case in real time. Cases with a score greater than 50 require a manual review by bank risk analysts. The image below illustrates the utilization mechanism of this model:

Scoring of credit applications with a black box machine learning model.

A BankCorp risk analyst believes that the score of Case 3 is strangely high given the characteristics of the application and wants to obtain detailed reasons for the score. Given the financial performance constraints of the market, the BankCorp data science team uses a complex black-box model and can’t provide an explanation for each case. However, the model does make it possible to extract a global explanation of the variables it considers important (figure below):

Global interpretation of the BankCorp black box model.

The global interpretation of the model provides an insight into the logic of the model through the level of importance of each variable. The level of importance of a variable is assigned by the model during the learning process (training), but this doesn’t indicate the absolute contribution of each factor to the final score. In our example, we can see that the requested amount variable, as expected, is the most important variable from the model’s point of view for calculating the score. Income and age variables are slightly less important, while the borrower’s SPC doesn’t seem to affect the score much.

Although this level of interpretation offers a first understanding of the model, it’s not sufficient to explain why Case 3 is rated twice as poorly as Case 1, when both ask for the same amount and have similar incomes and ages. To answer this question, we must use a local and agnostic method (since the model is a black box).

Understanding the decisions made by a machine learning model with LIME

LIME (Local Interpretable Model-Agnostic Explanations) is an interpretation technique, applicable to all types of models (agnostic), that provides an explanation at the individual level (local). It was created in 2016 by three researchers from the University of Washington and remains one of the best-known methods.

The idea of LIME is quite intuitive: instead of explaining the results of a complex model as a whole, LIME creates another model, simple and explainable, applicable only in the vicinity of the case to be explained. By vicinity we mean the cases close to the case that we want to explain (in our example Case 3). LIME’s underlying assumption is that this new model, also known as the “surrogate model” or replacement model, approximates the complex model (black-box) with good precision in a very limited region.

The only prerequisites for using LIME are to have the input data (cases) and to be able to query the black-box model as many times as necessary to obtain the scores. LIME then carries out a kind of “reverse engineering” to reconstruct the model’s internal logic around the specific case.

To do this, LIME creates new examples that differ slightly from the case you want to explain. This consists of changing the information in the original case, one item at a time, and presenting the result to the original model (black-box). This process is repeated a few thousand times, depending on the number of variables to be modified. It is known as “data perturbation” and the modified cases are called “perturbed data”.

Eventually, LIME will have set up a database of “local” labelled data (i.e., case → score) where LIME knows what it has changed from one case to another and the decision issued by the black-box model.

Construction of the training database from the case to be explained by the data perturbation process.

From this database of cases similar to the one that we want to explain, LIME creates a new machine learning model that is simpler and explainable. It is this new “replacement” model that LIME uses to extract the explanations.
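The perturbation and surrogate steps described above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not LIME’s actual implementation: the `black_box` scorer, the Gaussian perturbations applied to all variables at once, the exponential proximity kernel and every parameter value are assumptions made for the example.

```python
import math
import random

def black_box(x):
    """Hypothetical opaque scorer (0-100); only its outputs are used below."""
    age, income_k, amount_k = x
    logit = 0.15 * amount_k - 0.05 * income_k - 0.02 * age + 1.0
    return 100 / (1 + math.exp(-logit))

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def explain_locally(model, case, scales, n_samples=2000,
                    kernel_width=1.0, ridge=1e-3, seed=0):
    """Fit a weighted ridge regression on perturbed copies of `case`."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Data perturbation: shift each variable by a random offset.
        z = [rng.gauss(0.0, 1.0) for _ in case]
        perturbed = [v + s * d for v, s, d in zip(case, scales, z)]
        rows.append([1.0] + z)              # intercept + standardized offsets
        targets.append(model(perturbed))    # label given by the black box
        # Closer perturbed cases get a larger weight in the fit.
        weights.append(math.exp(-sum(d * d for d in z) / kernel_width ** 2))
    k = len(case) + 1
    # Normal equations: (X^T W X + ridge * I) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    for i in range(1, k):                   # do not penalize the intercept
        xtwx[i][i] += ridge
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]

case = (35, 40, 15)     # age, income (k euros), amount (k euros): made-up values
scales = (5, 5, 2)      # size of one perturbation unit per variable (assumption)
intercept, contributions = explain_locally(black_box, case, scales)
for name, coef in zip(("age", "income", "amount"), contributions):
    print(f"{name}: {coef:+.2f} points per perturbation unit")
```

Each coefficient reads as the local change in score when a variable moves by one unit of `scales` around the case. The sign and relative size of the coefficients are what the explanation reports.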

Creation of the replacement machine learning model by LIME

The figure below shows the explanation of the score provided by LIME for Case 3. The SPC variable and the amount requested contribute to a high score (+49 and +29 points respectively). On the other hand, the age and income variables reduce the risk score of the application (-6 and -2 points respectively). This level of interpretation highlights that, for this particular case, the SPC variable is very important, contrary to what one might expect from looking only at the global interpretation of the model.

The risk analyst is therefore now able to understand the particular reasons that led to this case having a poor score (here, an SPC equal to craftspeople). The analyst can then compare this decision with their own experience to judge whether the model responds correctly to the bank’s granting policy or whether it’s biased against a population.

Explanation of the score for Case 3

In its current version, LIME uses a linear regression (Ridge Regression) to build the replacement model. The explanations are therefore derived from the regression coefficients, which can be interpreted directly. It should be noted that some of the concepts explained here differ slightly in LIME’s Python implementation. However, the idea presented here conveys the intuition behind the technique as a whole. This video made by the author of the framework explains the operating details of LIME a little further.

The official implementation of LIME is available in Python. There are also other frameworks that offer LIME in Python (eli5 and Skater). A port in R language is also available here.

Advantages and disadvantages of LIME

At Bleckwen, we were able to test LIME with real data and in different case studies. From our experience, we are able to share with you the following advantages and disadvantages:


Advantages:

  • The use of LIME means that the data scientist doesn’t need to change the way they work or the models deployed to make them interpretable.
  • The official implementation supports structured (tabular), textual, and image (pixel) data.
  • The method is easy to understand and the implementation is well documented and open source.


Disadvantages:

  • Finding the right neighborhood level (close cases): LIME has a parameter to set the neighborhood radius. However, its tuning is empirical and requires a trial and error approach.
  • The discretization of the variables: the continuous variables can be discretized in several ways. We found that the explanations were highly unstable, depending on the parameter used.
  • For rare targets, which are common cases in fraud detection, LIME gives rather unstable results because it’s difficult to perturb new data sufficiently to cover enough fraud cases.
  • Time consuming: LIME is a little slow at computing explanations for the results (a matter of seconds). This prevents us from using it in real time.


The interpretability of machine learning models is a fast-growing field where much remains to be done. Over the past three years, a growing number of new approaches have appeared, and it’s important to be able to classify them along two main axes: applicability (Agnostic vs. Specific methods) and scope of interpretation (Global vs. Local).

In this article, we have introduced LIME, a local and agnostic technique created in 2016. LIME works by creating a local model from the inputs and outputs of the black-box model and then deriving the explanations from a replacement model, which is easier to interpret. This model is only applicable in a region well defined by the vicinity of the case that one wants to explain.

Other techniques like SHAP and Influence Functions are also promising because they are based on strong mathematical theory and will be the subject of a future blog post.

AI Books Summer Reading List

At Bleckwen we love reading books! For your summer break, we’ve compiled this list of our favourite AI books. We are happy to share it with you. Enjoy!


WEAPONS OF MATH DESTRUCTION: how big data increases inequality and threatens democracy

Author: Cathy O’Neil

Cathy O’Neil (a Harvard PhD graduate in mathematics) has worked as a professor, hedge-fund analyst and data scientist. She founded ORCAA, an algorithmic auditing company. In her book, she explores how algorithms can threaten many aspects of our lives if they are used without control.

Algorithms, rule-based processes for solving mathematical and business problems, are being applied to a wide variety of fields. Their decisions directly affect our daily life: which high school can we register at? Which car loan can we get? How much do we have to pay for our health insurance?

In theory, as mathematics is neutral, we would say that this is fine: everyone is judged according to the same rules. But in practice, algorithmic decisions can be biased because the models widely used today are opaque and unregulated. They provide only black box decisions: nobody can explain the logic and the reasons that lead an algorithm to produce its result. Thus they cannot easily be challenged or audited. Can we let models rule parts of our lives and shape our future?

Cathy O’Neil calls on data scientists to take more responsibility for their models and on governments to regulate their use more closely. In the end, it is every citizen who should be savvy about the use of their personal data and the algorithmic models that govern our lives.


Why we love this book: we enjoyed O’Neil’s description of our present-day world and the weaknesses she points out, weaknesses that will grow with the increasing use of AI. At Bleckwen, we believe that Interpretability is necessary to create a trusting collaboration between human and machine.

On the same subject, you can also take a look at our post: Interpretability, The Success Key Factor When Opting For Artificial Intelligence.


Author: Jerry Kaplan

Jerry Kaplan is a Silicon Valley serial entrepreneur and a pioneer in tablet computing. At Stanford University, he teaches ethics and the impact of artificial intelligence at the Computer Science Department.

In his book, Jerry Kaplan looks at the profound transformations Artificial Intelligence technologies are already bringing to our society and their consequences. Kaplan warns about future growth driven more by assets than by labor, as AI, through automation, decreases the value of labor. One possible consequence of the rise of AI would be unemployment and broader income disparity.

Another consequence could be the risk of part of our economy falling under the control of algorithmic systems. This could happen if we decide to create cybernetic persons with the right to sign contracts and own property: this would grant them a high degree of autonomy while leaving us with limited means of control.

Steering clear of techno-optimism, Kaplan sets out the regulatory adaptations our society needs to make for artificial intelligence in order to ensure a prosperous and fair future. It is important to tackle some of the moral, ethical and political issues created by AI before it is too late.


Why we love this book: “Science without conscience is but the ruin of the soul”, and technology without politics could bring ruin to society! This book presents the promise and perils of Artificial Intelligence. AI has many individual and societal benefits but also carries significant risks if no ethical and political reflection takes place. We appreciate Kaplan’s vision of the future and his deep reflections on AI.

SUPERINTELLIGENCE: paths, dangers, strategies

Author: Nick Bostrom

Nick Bostrom is a professor in the Faculty of Philosophy at Oxford University, specializing in foresight, especially concerning the future of humanity. He is also the Director of the Future of Humanity Institute and of the Programme on the Impacts of Future Technology within the Oxford Martin School. He is the author of some 200 publications.

What happens when machines surpass humans in general intelligence?  This new superintelligence could become extremely powerful and possibly beyond our control.

In his book, Nick Bostrom lays the foundation for understanding the future of humanity and intelligent life.

A superintelligent agent could arise from the extension of techniques we already use today, such as Artificial Intelligence, neuron emulation and genetic selection. Nick Bostrom calls on us to engineer initial conditions that make this superintelligence compatible with human survival and well-being. It seems necessary to solve the “control problem” of this intelligence, to ensure that control mechanisms grow in capability in parallel with the intelligence itself. The author distinguishes two broad classes of potential methods for addressing this problem: capability control and motivation selection.


Why we love this book: at Bleckwen, we believe that debates about the future of AI and Machine Learning are very important for society. We love Bostrom’s practical vision of the potential risks entailed by the development of this superintelligence. To keep this superintelligence compatible with humanity, he recommends that research be guided and managed within a strict, transparent and ethical framework. He calls for collective responsibility.


NEUROMANCER

Author: William Gibson

We could not leave out of our list this multi-award-winning book, written in 1984 by the American-Canadian author William Ford Gibson. This well-known science fiction novel spawned the cyberpunk movement, with its rather bleak vision of our future.

The novel tells the near-future story of Case, a washed-up computer hacker hired by a mysterious employer for one last job against a powerful corporation. Case and his cohorts will have to fight against the domination of a corporate-controlled society by breaking through the global computer network’s cyberspace matrix.

When Neuromancer was published in the early 80’s, only around 1% of Americans owned a computer and most people were unaware of the potential of networked computing. Gibson not only conceived a credible evolution of virtual reality, but had already anticipated the kind of hacker culture that would emerge as the dark side of the web.


Why we love this book: we love this book because we love science fiction! It has an astounding predictive power and keeps us on our toes. This novel also challenges our assumptions about our technology and ourselves. Beyond the story, this book raises a lot of ethical, philosophical and legal questions around the theme of control (and how to escape from it).

Using Graphs To Reconstruct Catalan Crisis Events With Tweets.

In 2017, over 330 million people around the world used Twitter each month to comment on and react instantly to events. Over the same year, around 180 billion tweets were sent by users! In autumn 2017, Spain experienced one of its most important social events since the Spanish Civil War: the Catalan independence referendum. Throughout the crisis, the population widely used Twitter to react to events in real time. This major event in the recent history of Spain also received massive media coverage.

At Bleckwen, we believe data and analytics can be used to answer challenging questions in many fields. On a daily basis, we use these techniques to fight fraud and protect our clients. So we decided to apply similar techniques to a completely different field: we asked ourselves whether we could reconstruct the timeline of the Catalonia crisis and correctly identify the main events using only Twitter metadata, i.e. without analyzing the content of tweets.

Our goal: use the power of analytics to answer two questions:

  • Can we identify major events during the crisis from the metadata of the tweets, and then reconstruct the timeline of events?
  • Are we able to detect important events that have not been covered by traditional media?

On the morning of 27th October 2017, the Spanish Senate gives full powers to the head of government, Mariano Rajoy. The latter can now put Catalonia under guardianship. On the same day, in the afternoon, Carles Puigdemont, President of the Generalitat of Catalonia, proclaims the independence of the region, following the results of the referendum. Spain is experiencing a major crisis in its history.

Could we reconstruct the events that marked this crisis using only Twitter metadata?

To study the Catalan crisis, we used Twitter’s Streaming API to collect all tweets sent from October 3rd to November 6th, 2017, written in Spanish, Catalan, Galician and Basque. We kept only tweets containing the words [catalogne, catalunia, catalunya, etc.]. This gave us a data set of 824 influential tweets and their 1 million retweets.

At the same time, we manually listed the dates of the 18 major events of the Catalan crisis (see image below) covered by 8 major traditional media: BBC, The Independent, The Local, Fox News, NBC, Euronews, US News, and Politico.eu.

Tweets are reactions to real life events

People tend to use Twitter in different ways: to share their moments, their ideas or just to post their cat’s photo! During important events like social movements, the World Cup or US elections, tweets are the voice of people reacting to what they see, feel or experience in real life.

Based on this assumption, we collected all tweets of a given population over a delimited period of time. Then we tried to group tweets reacting to the same event into clusters. The result of this categorization is what we call an “event abstraction”. Example of tweet clustering during the Catalonia crisis:

However, in order to group tweets together with analytical methods, we need to measure how close a tweet is to another. So we have to define a similarity metric. One could analyze the content of the tweets and assess if they talk about the same event. But as we like challenges, we tried to do this without content analysis!

Defining similarity of tweets with no content analysis

In order to compute the similarity between tweets we first need to understand two important concepts:

  1. Co-occurrence: tweet A and tweet B were sent close together in time
  2. Co-retweeting: both tweets were retweeted by the same people

We can now state that the similarity between two tweets is defined by the product of these measures as illustrated in the figure below:

In other words, the closer in time two tweets were sent and the more users retweeted both of them, the more similar they are.
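To make the product of the two measures concrete, here is a minimal Python sketch. The article does not give the exact formulas, so both measures are assumptions: co-occurrence is modelled as an exponential decay of the time gap (with a hypothetical one-hour scale `tau_seconds`), and co-retweeting as the Jaccard overlap of the two sets of retweeters. The tweet records are made up.

```python
import math

def co_occurrence(time_a, time_b, tau_seconds=3600.0):
    """Close to 1 when the tweets were sent close in time, close to 0 otherwise."""
    return math.exp(-abs(time_a - time_b) / tau_seconds)

def co_retweeting(retweeters_a, retweeters_b):
    """Share of retweeters the two tweets have in common (Jaccard index)."""
    union = retweeters_a | retweeters_b
    if not union:
        return 0.0
    return len(retweeters_a & retweeters_b) / len(union)

def similarity(tweet_a, tweet_b, tau_seconds=3600.0):
    """Product of the two measures, using metadata only (no tweet content)."""
    return (co_occurrence(tweet_a["time"], tweet_b["time"], tau_seconds)
            * co_retweeting(tweet_a["retweeters"], tweet_b["retweeters"]))

# Made-up metadata: sending time in seconds and the set of retweeting users.
tweet_a = {"time": 0.0, "retweeters": {"u1", "u2"}}
tweet_b = {"time": 3600.0, "retweeters": {"u2", "u3"}}
tweet_c = {"time": 0.0, "retweeters": {"u7"}}

print(similarity(tweet_a, tweet_b))
print(similarity(tweet_a, tweet_c))
```

Because the two measures are multiplied, a pair of tweets needs both temporal closeness and shared retweeters to score high. Tweets sent at the same moment but retweeted by entirely different users get a similarity of zero.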


Clustering tweets with a graph approach

Now that we have a good way to measure how similar one tweet is to another, it is time to group a number of them together and discover the event they are correlated with. For that, we use a data structure called a graph.

Graphs are a powerful concept, widely used in many fields such as chemistry, security, fraud prevention and, best known of all, social networks.

According to Wikipedia’s definition:

A graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense “related”. The objects correspond to mathematical abstractions called nodes and each of the related pairs of nodes is called an edge.

In our case, the nodes of the graph are the tweets we collected, and the relations between them (edges) are their pairwise similarities, computed according to the definition above.

We then applied a community detection algorithm called Walktrap to the graph we built for Catalonia. The assumption behind this approach is that each detected community corresponds to a cluster of homogeneous tweets linked to a specific event that happened during the crisis. That is what we have called “the abstraction of an event”.
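The sketch below shows the general shape of this step in plain Python: tweets become nodes, similarities become weighted edges, and groups of connected tweets become candidate event abstractions. Note that it does not implement Walktrap. As a deliberately simple stand-in, it drops weak edges and takes the connected components of what remains; the `threshold` value and the similarity scores are hypothetical. Graph libraries such as igraph provide a Walktrap implementation that could replace `connected_groups` in a real pipeline.

```python
from collections import defaultdict

def build_graph(similarities, threshold=0.1):
    """Weighted undirected graph: nodes are tweets, edges are similarities."""
    graph = defaultdict(dict)
    for (a, b), weight in similarities.items():
        if weight >= threshold:          # keep only meaningful links
            graph[a][b] = weight
            graph[b][a] = weight
    return graph

def connected_groups(nodes, graph):
    """Stand-in for community detection: connected components of the graph."""
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbour in graph.get(node, {}):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups

# Made-up pairwise similarities between five tweets.
similarities = {
    ("t1", "t2"): 0.8,
    ("t2", "t3"): 0.6,
    ("t3", "t4"): 0.01,   # too weak: dropped by the threshold
    ("t4", "t5"): 0.9,
}
tweets = ["t1", "t2", "t3", "t4", "t5"]
groups = connected_groups(tweets, build_graph(similarities))
print(groups)
```

Connected components only separate tweets that share no strong link at all. A random-walk method such as Walktrap can also split a connected graph into densely linked sub-groups, which matters when events overlap in time.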

Is our model able to identify the events covered by the media?

We applied our approach to tweets sent between October 3 and November 6, 2017 and found 34 event abstractions. Are these 34 abstractions relevant? Do they match events that really happened in the real world?

Remember that we assumed tweets inside the same cluster should be homogeneous because they were sent in reaction to the same event. However, assessing the homogeneity of an event abstraction is quite a subjective task. It involves looking at the content of each tweet and judging whether the majority of the tweets that compose the abstraction relate to the same subject.

We manually reviewed the content of the tweets of each abstraction.

Here are the results:

Here is an example to visualize an “event abstraction”. Abstraction e34: 100% of the 18 tweets that compose this abstraction are linked to the event “Demonstration for the Union on October 8th, 2017”.


Now that we have a reasonable way to assess the relevance of the events found by our model, we can answer our first question. Again, our model uses only tweet metadata to find events. The figure below shows that our model correctly identifies 12 of the 18 events covered by traditional media:

Of the 6 events not found by our model, 3 happened on the same day as a very large event. For example, the model does not identify two small events listed on October 3rd. However, we note that on that same day there was a massive demonstration in Barcelona against the police violence that had occurred the day before.

Partially recovered events refer to events that are not directly identified. For example, on 3rd and 5th November, two events are covered by media:

  • the arrest warrant requested against Carles Puigdemont
  • Carles Puigdemont’s surrender to the Belgian authorities

We noticed that the model combined these two events into a single “three-day” event that could be described as “Puigdemont’s flight”.


Is our model able to find significant events not covered by the media?

Here again, our results are quite interesting: the model identified reactions to 11 events that are mostly less intense compared to those reported by the media. Among the events detected, we find for example:

– the publication of a press article;

– a media debate about the real or supposed indoctrination of children in Catalan schools;

– the agreement in principle between the PP (Partido Popular – People’s party) and the PSOE (Partido Socialista Obrero Español – Spanish Socialist Workers Party) on the organisation of new elections;

– the announcement of a demonstration in Brussels;

– the broadcast of the YouTube video HELP CATALONIA which denounces a “fascist Spanish state”.

The detection of minor events not mentioned by the media allows a more accurate understanding of what happened. In this way, our model complements traditional media coverage.

Our model detects the abstractions of 12 of the 18 events mentioned by the 8 media in the first month of the Catalan crisis. It also detects 11 other “minor” events.



We have shown that it is possible to apply analytical methods to Twitter’s metadata to reconstruct a timeline of events occurring in the real world. Our approach is based on the analysis of tweet metadata (i.e. no content analysis) and graph models.

The presented model allowed us to correctly recover 12 of the 18 main events of the Catalan crisis covered by 8 media. The 6 undetected events are relatively minor.

In addition, our graph-based approach identifies 11 additional events of relatively low intensity, which are therefore more difficult to detect. This adds to the understanding of the chronology of events provided by the media.

As a next step, we would like to develop a real-time version of our model. We look forward to seeing you for our next challenge!

On October 20th, 2017, an agreement in principle was signed between the PP and the PSOE concerning new elections to be held in Catalonia. This event was not listed by any of the 8 media, nor even included in Wikipedia’s Catalan Crisis article. Our model detected this event!

Interpretability, The Key Success Factor When Opting For Artificial Intelligence.

According to Yannick Martel, Managing Director of Bleckwen, a fintech specialized in Artificial Intelligence for fraud prevention, interpretability is today a key issue in ensuring the broad adoption of Artificial Intelligence.

First of all, could you explain what interpretability is?

Interpretability is the ability to explain the logic and the reasons that lead an algorithm to produce its results. It applies at two levels:

  • Global: to understand the major trends, i.e. the most significant factors
  • Local: to precisely analyze the specific factors that contributed to the machine’s decision for a group of closely-related individuals

How does it work?

The interpretation of a model is obtained by applying an algorithm which explains the contribution of each variable to the results.

Imagine that you enjoyed a delicious black forest cake bought in a bakery (= a black box, i.e. a model with opaque internal operations). If you want to make this cake at home, you will gather the ingredients (the data), follow the recipe (the algorithms) and you will get your cake (the model). But why is it not as good as the one from the bakery? Although you have used exactly the same ingredients, you probably lack the chef’s tips explaining why certain ingredients, at certain stages of the recipe, are important and how to combine them!

In this example, interpretability techniques will allow you to discover the chef’s tips.

There are two types of interpretability techniques:

  • the model-agnostic techniques: these techniques do not take into consideration the model to which they are applied and only analyze the input data and the decisions (e.g. LIME: Local Interpretable Model-Agnostic Explanations, SHAP: SHapley Additive exPlanations, etc.)
  • the model-specific techniques: they rely on analyzing the inherent architecture of the model that one wants to explain in order to understand it (e.g. for Deep Learning: Deeplift, for random forests: Ando Saabas, etc.).

Both types of techniques can provide local and global model interpretation.

Could you give a concrete example of the application of interpretability?

At Bleckwen, we apply both types of techniques, depending on what we aim to explain. For example, for a customer’s credit request, we look for the reasons behind its score by combining agnostic and specific techniques at the local level. At another level, global interpretability makes it possible to understand the overall logic of a model and to check the variables deemed important (for example, to ensure that an explanatory variable does not contain “too much” information, which is usually suspect …).

Why is there a need for transparency in machine learning models?

AI is starting to be used to make critical or even vital decisions, for example in medical diagnosis, the fight against fraud or terrorism, and autonomous vehicles. At Bleckwen, we apply AI to sensitive security topics to help our customers make decisions that are often difficult to take (e.g. granting a 40,000 euro loan or validating a 3,000,000 euro transfer to a high-risk country). This makes the challenges of avoiding errors, understanding a decision, and helping an expert to confirm a decision all the more important.

Our systems interact with human beings who ultimately must be in control of their decisions. The human being needs to understand the reasoning followed by our algorithms, to know on what elements the algorithm has based its decision. Interpretability allows them to make an informed decision in an efficient and reliable way.

How does interpretability become a societal and political issue?

Recent events show a growing concern about the use of personal data. The entry into force of the GDPR in May this year is an important step for the data protection of European citizens. It also requires companies to be able to justify algorithmic decision-making. Techniques for understanding algorithms have therefore become critical. In the United States too, many people are questioning what constitutes “fair use” of data by algorithms, with a view to regulating how they are used. This is what Cathy O’Neil, a renowned mathematician and data scientist, suggests on her blog (https://mathbabe.org) and in her book “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy”, inviting us to be careful about the trust we place in Big Data and algorithmic decisions.

The Cambridge Analytica case will certainly help reinforce this trend. Algorithmic decision-making is becoming a major societal, political and ethical subject.

The adoption of AI will not happen without transparency.

At Bleckwen, we have made it a major focus of our offer and our technological developments.



Do you want to better understand the topic of interpretability? Read on: