Social Engineering And Its Consequences.

While the media highlight increasingly sophisticated cyber-attacks, decision-makers could easily forget that humans are one of the main weak links of IT security. According to the latest IBM – Ponemon Institute report published in 2016, 25% of data leaks are due to human error or negligence.

Social engineering is about exploiting human weakness to obtain goods, services or key information.

Social engineering existed before the digital era. For example, during the 2000s, organized scammers used personal information available in alumni directories to impersonate alumni of a prestigious university and extract money from their former classmates.

There is no need today to use malware or ransomware to access personal information: it is readily available on social media such as Facebook and LinkedIn. A white paper published by Alban Jarry in 2016 shows that 43% of people accept strangers on their LinkedIn network[1].

The president of a French bank recently showed us the Facebook profile of an individual allegedly working at the bank and trying to get in touch with clients. The profile and the identity were, of course, fake. In the same way, how do you know who is really behind the LinkedIn profile inviting you to connect?

These “simple” techniques allow fraudsters to deceitfully obtain key information about a payer, a supplier… and subsequently impersonate them to initiate fraudulent wire transfers.

According to Grant Thornton, at least 3 out of 4 companies were targeted by fraud attempts over the past two years. Although 80% of all attempts fail, successful attacks can cause damages upward of $10 million.

According to the FBI, $2.3 billion was stolen from businesses between 2013 and 2016, and the number of victims identified in 2015 increased by 270%[2].

The phenomenon is significant, and companies have begun to build walls to contain it: behavioral measures (e.g. paying attention to corporate data published on personal social media, refraining from clicking on suspicious e-mails originating from unknown parties…) and business processes that improve internal controls. But these measures are not sufficient, even when correctly applied, because they still rely too much on humans. This is why new solutions are emerging, based on machine learning and big data processing. They automate the detection of attacks and fraud more and more effectively, complementing human activities and processes.

You will find out more by reading our next post!



Fighting Fraud: From Big Data To Fast Data.

Credit card fraud is the most visible type of consumer fraud: according to The Nilson Report, global damages caused by credit card fraud reached 21 billion dollars (18.4 billion euros) in 2015. Less known to consumers, wire transfer fraud catches the attention of banks seeking to protect their customers, as a single such attack may siphon off millions. It is important to realize that the system managing wire transfers is critical to the proper operation of a country’s economy. This system cannot be exposed to major breaches.

Traditional protection methods rely on expert rules and manual controls to identify and verify the most suspicious operations, but these controls negatively impact the customer journey.

Machine Learning is a good candidate to improve the level of protection while reducing friction and manual processing during this journey.

During the design phase, creating models requires cold data analysis, in particular to build and choose the variables that will reveal specific fraud patterns. The machine learning model is then trained on historical data. This step uses technologies that are specific to cold (batch) processing.
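To make the design phase more concrete, here is a minimal sketch of batch feature building and training. Everything in it is an assumption made for illustration: the toy history, the two variables (how unusual the amount is, whether the beneficiary is new) and the small hand-written logistic regression, which stands in for whatever learning algorithm a real project would use.

```python
import math
from collections import defaultdict

# Hypothetical historical transfers: (customer, beneficiary, amount, is_fraud).
HISTORY = [
    ("acme", "supplier_a", 1_000.0, 0),
    ("acme", "supplier_a", 1_200.0, 0),
    ("acme", "supplier_b", 900.0, 0),
    ("acme", "unknown_x", 9_500.0, 1),
    ("bolt", "supplier_c", 5_000.0, 0),
    ("bolt", "supplier_c", 5_500.0, 0),
    ("bolt", "unknown_y", 48_000.0, 1),
    ("bolt", "supplier_c", 4_800.0, 0),
]


def build_features(history):
    """Replay the history in order and derive two illustrative variables."""
    totals = defaultdict(float)  # running sum of amounts per customer
    counts = defaultdict(int)    # running number of transfers per customer
    seen = defaultdict(set)      # beneficiaries already paid by each customer
    rows = []
    for customer, beneficiary, amount, label in history:
        if counts[customer]:  # the first transfer has no history to compare with
            average = totals[customer] / counts[customer]
            amount_ratio = math.log(amount / average)
            is_new = 0.0 if beneficiary in seen[customer] else 1.0
            rows.append(([amount_ratio, is_new], label))
        totals[customer] += amount
        counts[customer] += 1
        seen[customer].add(beneficiary)
    return rows


def predict(weights, bias, x):
    """Fraud probability from a logistic model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, epochs=2000, lr=0.1):
    """Plain stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in rows:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


rows = build_features(HISTORY)
weights, bias = train_logistic(rows)
```

The point of the sketch is the split of responsibilities: variables are built by replaying the history, and training happens entirely offline, so none of this cost is paid when a live transfer arrives.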

While this step is essential, it is also necessary to consider very early on how the model will be deployed and used on “hot” data. To be effective, fraud-fighting tools must handle large data streams while minimizing the processing delay for each wire transfer. New legislation related to instant payments further tightens the requirements on processing speed (less than 20 seconds to fully process a wire transfer[1] and a few hundred milliseconds to detect fraud). Fraud detection systems must operate in this context, which requires designing a specific architecture, supported by appropriate technologies.

The main challenge of implementing a fraud detection system is the operational capacity to manage the flow of wire transfers, during peaks in particular. A fraud detection system must therefore meet at least the following requirements:

  • Comply with delay limitations per wire transfer and debit operation
  • In case of failure, switch to a second system (simple rules or automated approval) so as to not disrupt the complete chain
  • Maintain the integrity of the wire transfer chain (no duplicates or missing wire transfers)
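The three requirements above can be sketched in a few lines. This is only an illustration under stated assumptions: the 200 ms budget, the 10,000 threshold in the fallback rule, the `FraudGate` class and the `slow_model` stand-in are all hypothetical names and values, not part of any real payment system.

```python
import concurrent.futures as cf
import time

TIME_BUDGET_S = 0.2  # assumed per-transfer budget ("a few hundred milliseconds")


def fallback_rules(transfer):
    """Second system: a simple expert rule used when the model cannot answer."""
    return "review" if transfer["amount"] > 10_000 else "approve"


def slow_model(transfer):
    """Stand-in for a model call that overruns the time budget."""
    time.sleep(0.5)
    return 0.0


class FraudGate:
    def __init__(self, model_score, threshold=0.5):
        self.model_score = model_score
        self.threshold = threshold
        self.decisions = {}  # transfer id -> decision (keeps processing idempotent)
        self.pool = cf.ThreadPoolExecutor(max_workers=4)

    def decide(self, transfer):
        transfer_id = transfer["id"]
        # Integrity: a transfer delivered twice gets its first decision back.
        if transfer_id in self.decisions:
            return self.decisions[transfer_id]
        future = self.pool.submit(self.model_score, transfer)
        try:
            # Delay limit: wait for the model no longer than the budget.
            score = future.result(timeout=TIME_BUDGET_S)
            decision = "review" if score >= self.threshold else "approve"
        except Exception:
            # Failure or timeout: switch to the second system, never block the chain.
            decision = fallback_rules(transfer)
        # Every transfer leaves with exactly one recorded decision.
        self.decisions[transfer_id] = decision
        return decision
```

With `FraudGate(slow_model)`, a 50,000 transfer is still answered within the budget, by the fallback rule rather than by the model. A production system would persist the decision log and monitor how often the fallback is used; the in-memory dictionary here is only for the sketch.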

The diagram below provides a macro view of the processing chain required for credit scoring.

The processing chain indicated in red must be completed in less than 20 seconds. To ensure this, some of the calculations must be performed offline.

  1. Fetching data history: Variables identifying fraud must be able to distinguish between “legitimate” and fraudulent wire transfers. Based on customer habits, old and recent history, these variables are therefore often queried. Old data history can usually be pre-calculated since it characterizes phenomena observed over long periods of time, with little variation. Recent data history must sometimes be calculated on the fly, depending on the observed time scale.
  2. Querying the pre-trained model: The time required for a prediction is generally negligible compared to the time required to train the model. Training is therefore also performed upstream.
  3. Interpretation: Analysis and decision support are an essential part of an effective fraud detection system. An effective verification call depends on giving the customer precise indications, because the risk of the customer authorizing a detected fraud is real. The identity theft involved in social engineering sometimes places the payer in a situation of trust (usual supplier, request from management), even after the alert is given.
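The three steps can be sketched as follows. All names and numbers are assumptions made for the example: `LONG_TERM` stands for a profile pre-computed offline, `RecentActivity` for the part of the history calculated on the fly, and `WEIGHTS` and `BIAS` for a model trained upstream.

```python
import math
from collections import deque

# Step 1a: long-term profile, assumed to be pre-computed by a batch job.
LONG_TERM = {"acme": {"avg_amount": 1_000.0, "known_beneficiaries": {"supplier_a"}}}

# Step 2: hypothetical parameters of a model trained upstream.
WEIGHTS = {"amount_ratio": 1.5, "new_beneficiary": 2.0, "recent_count": 0.4}
BIAS = -4.0

# Step 3: wording an analyst could use during a verification call.
REASON_TEXT = {
    "amount_ratio": "amount far above the customer's usual level",
    "new_beneficiary": "beneficiary never paid before",
    "recent_count": "several transfers in the last hour",
}


class RecentActivity:
    """Step 1b: sliding window of recent transfers, updated on the fly."""

    def __init__(self, window_s=3600):
        self.window_s = window_s
        self.events = {}  # customer -> deque of timestamps

    def count_and_add(self, customer, now):
        queue = self.events.setdefault(customer, deque())
        while queue and now - queue[0] > self.window_s:
            queue.popleft()
        queue.append(now)
        return len(queue) - 1  # transfers in the window before this one


def score(transfer, recent, now):
    profile = LONG_TERM[transfer["customer"]]
    is_new = transfer["beneficiary"] not in profile["known_beneficiaries"]
    features = {
        "amount_ratio": math.log(transfer["amount"] / profile["avg_amount"]),
        "new_beneficiary": float(is_new),
        "recent_count": float(recent.count_and_add(transfer["customer"], now)),
    }
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    z = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-z))
    # Keep only the variables that push the score up, strongest first.
    pushing_up = [name for name, c in contributions.items() if c > 0]
    pushing_up.sort(key=lambda name: -contributions[name])
    return probability, [REASON_TEXT[name] for name in pushing_up]
```

Only the sliding-window update and a handful of multiplications happen while the transfer is waiting; the profile and the model parameters are simply looked up.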

To implement this processing chain, the requirement for streaming technologies (Fast Data) is added to existing big data requirements. Providing tools that meet the level of reliability required by the banking industry, and that support recent innovations such as instant payments, is a real technological challenge.

Our next blog post will take an in-depth look at these technologies!



Fraud And Interpretability Of Machine Learning Models – Part 1

Interpretability: the missing link in Machine Learning adoption for fraud detection

Machine learning methods are increasingly used, especially in anti-fraud products (developed by Bleckwen and other vendors) to capture weak signals and spot patterns in data that humans would otherwise miss.

While the relevance of these methods for fraud detection is widely recognized, they are still mistrusted in certain industries such as banking, insurance or healthcare, due to their “black box” nature. The decisions made by a predictive model can be difficult for a business analyst to interpret, in part because of the complexity of the calculations and the lack of transparency in the “recipe” used to produce the final output. It is therefore quite understandable that an analyst who has to make an important decision, for example granting a credit application or refusing the reimbursement of healthcare expenses, is reluctant to apply the output of a predictive model automatically without understanding the underlying reasons.

The predictive power of a machine learning model and its interpretability have long been considered to be at odds. But that was before! For the past two or three years, there has been renewed interest from researchers, the industry and more broadly the data science community in making machine learning more transparent, or even “white box”.

Advantages of Machine Learning for fraud detection

Fraud is a complex phenomenon to detect because fraudsters are always a step ahead and constantly adapt their techniques. Rare by definition, fraud comes in many forms (from the simple falsification of an identity card to very sophisticated social engineering techniques) and represents a potentially high financial and reputational risk (money laundering, terrorism financing…). On top of that, fraud is known to be “adversarial”, which means that fraudsters constantly work to subvert the procedures and detection systems in place and exploit the slightest breach.

Most anti-fraud systems currently in place are based on rules determined by a human because the derived results are relatively simple to understand and considered transparent by the industry. As a first step, these systems are easy to set up and prove to be effective. However, they become very difficult to maintain when the number of rules increases. With fraudsters adapting themselves to the rules in place, the system requires additional or updated rules, which makes the system more and more complicated to maintain.

One of the perverse effects is a steady degradation of the anti-fraud defense. The system ends up becoming too intrusive (with rules capturing the specificities of the data) or, conversely, too broad. In both cases, it has a negative impact on good customers, because fraudsters know how to mimic the “average customer” perfectly. It is a well-known fact among risk managers: “The typical fraudster profile? My best customers!”
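To illustrate the maintenance problem, here is a minimal sketch of a rule-based check. The rules, thresholds and field names are hypothetical and chosen only to show the mechanism.

```python
# Hypothetical expert rules: (name, predicate over a transfer dict).
RULES = [
    ("amount_over_10k", lambda t: t["amount"] > 10_000),
    ("new_beneficiary", lambda t: t["beneficiary_is_new"]),
    ("foreign_account", lambda t: t["country"] != t["home_country"]),
]


def triggered_rules(transfer, rules=RULES):
    """Names of the rules the transfer trips; the names double as the explanation."""
    return [name for name, predicate in rules if predicate(transfer)]


def needs_review(transfer, rules=RULES, min_hits=2):
    """Flag the transfer when at least `min_hits` rules fire."""
    return len(triggered_rules(transfer, rules)) >= min_hits


# A fraudster who has learned the threshold stays just below it.
adapted = {
    "amount": 9_900,
    "beneficiary_is_new": True,
    "country": "FR",
    "home_country": "FR",
}

# The usual response is yet another rule, and the list keeps growing.
PATCHED_RULES = RULES + [
    ("amount_just_under_10k", lambda t: 9_000 <= t["amount"] <= 10_000),
]
```

The output is easy to read, since each alert comes with the names of the rules that fired. But `adapted` passes the original rules, and the patch that catches it will also flag every legitimate first payment between 9,000 and 10,000 to a new beneficiary.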

Tracking fraudsters is therefore a difficult task and often causes friction in the customer experience, which generates significant direct and indirect costs.

As a result, building a detection system that is effective, not very intrusive and able to detect the latest fraud techniques is a considerable challenge. Machine learning is proving to be an effective way to meet this challenge.

Moreover, with the latest interpretability techniques, business analysts can be shown the reasons that led the machine learning algorithm to produce one output or another.

Interpretability, why is it important?

More generally, machine learning is becoming ubiquitous in our lives, and the need to understand and collaborate with machines is growing. Yet machines rarely explain the results of their predictions, which can lead to a lack of confidence among end users and ultimately hinder the adoption of these methods.

Obviously, certain machine learning applications do not require explanations. When used in a low-risk environment, such as music recommendation engines or to optimize online advertisements, errors have no significant impact. In contrast, when deciding who will be hired, braking a self-driving car or deciding whether to release someone on bail, the lack of transparency in the decision raises legitimate concerns from users, regulators and, more broadly, society.

In her book published in 2016, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O’Neil, a renowned mathematician and data scientist, calls on society and politicians to be extremely vigilant about what she defines as “the era of blind faith in big data”. Among the flaws she denounces most strongly are the lack of transparency and the discriminatory aspect of the algorithms that govern us. Techniques to understand decisions made by a machine have become a societal issue.

Interpretability means that we are able to understand why an algorithm makes a particular decision. Even if there is no real consensus on its definition, an interpretable model can increase confidence, help meet regulatory requirements (e.g. GDPR and CNIL, the French administrative regulatory body), explain decisions to humans and help improve existing models.
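As a rough illustration of what “understanding why” can mean in practice, the sketch below explains one prediction of an opaque scoring function by replacing each variable in turn with a neutral baseline value and measuring how much the score drops. This is a simple leave-one-out (occlusion) attribution, not LIME or SHAP, and the scoring function, variable names and baseline are all hypothetical.

```python
import math


def black_box_score(x):
    """Stand-in for an opaque model: a nonlinear fraud score between 0 and 1."""
    z = (
        -3.0
        + 2.0 * x["amount_ratio"] * (1.0 + x["new_beneficiary"])
        + 0.5 * x["night_time"]
    )
    return 1.0 / (1.0 + math.exp(-z))


# Assumed "neutral" transfer used as the reference point.
BASELINE = {"amount_ratio": 0.0, "new_beneficiary": 0.0, "night_time": 0.0}


def occlusion_attributions(score_fn, x, baseline):
    """Score drop observed when each variable is reset to its baseline value."""
    full_score = score_fn(x)
    drops = {}
    for name in x:
        occluded = dict(x)
        occluded[name] = baseline[name]
        drops[name] = full_score - score_fn(occluded)
    return full_score, drops


suspicious = {"amount_ratio": 2.0, "new_beneficiary": 1.0, "night_time": 1.0}
full_score, drops = occlusion_attributions(black_box_score, suspicious, BASELINE)
ranked = sorted(drops, key=drops.get, reverse=True)
```

The method only needs to call the model, never to open it, which is why this family of techniques suits “black box” models. Its main limitation is that resetting one variable at a time can misjudge variables that only matter in combination.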

The need for interpretable models is not shared by all leading researchers in the artificial intelligence field. Critics suggest instead a paradigm shift in how we model and interpret the world around us. For example, few people today really worry about the explainability of a computer processor, and most trust the results displayed on screen. This topic is a source of debate even at machine learning conferences like NIPS.


Fraud is a complex phenomenon to detect, and machine learning is a strong ally in fighting it effectively. Interpretability favors its adoption by business analysts. The emergence of a new category of techniques in the last two years has made the interpretability of machine learning more accessible and directly applicable to AI products. With these techniques, we can now obtain very high predictive power without compromising the ability to explain the results to a human. In our next blog post, we will explain how techniques such as LIME, Influence Functions or SHAP are used with machine learning models to make decisions more transparent.

Further reading:

Miller, Tim. 2017. “Explanation in Artificial Intelligence: Insights from the Social Sciences.”

The Business Case for Machine Learning Interpretability

Is there a ‘right to explanation’ for machine learning in the GDPR?