Pitney Bowes Detects Fraudulent Orders through Machine Learning Predictive Models (Part 1)

5 Jun 2022

Even corporations with a long history need to keep pace with abrupt changes in their industries. Do you remember Motorola, once known as a trendsetter and the first company ever to develop a mobile phone?

The nostalgic Motorola flip phone held the highest sales record until the iPhone appeared.

If you have ever set foot in the marketing world, you have surely heard of three well-known cases of industry leaders that failed: ‘Motorola’, ‘Nokia’, and ‘Blackberry’. These cases show that even leading corporations can fall behind mercilessly if they fail to read the trends and currents of ICT (Information and Communications Technology). On the other hand, there are still companies that have run for over a century by keeping up with every change.

Pitney Bowes, a commerce technology corporation in the US

Pitney Bowes, founded in 1920, is a leading e-commerce technology company in the US. Having started out with postage meters, it now provides commercial solutions and business commerce technologies to more than 500 companies. Thanks to its unceasing efforts to adapt to a rapidly changing world, it has stayed successful for a long time.

Pitney Bowes offers a good example of machine learning applied to a real problem: detecting fraudulent orders placed by people posing as customers, using Google’s AutoML service. Let’s dive in and see how Pitney Bowes prevented a massive financial leak by picking out suspicious orders from overseas.

Pitney Bowes provides optimal solutions to distribution and logistics companies, which means a lot of customer data is used in the process.

The top factors in delivery: ‘Precision’ and ‘Speed’

Hurrying runs in our blood; Koreans can’t help but hustle, even when it comes to deliveries

Waiting endlessly for a package to arrive and tracking its status online are familiar scenes in internet shopping (I am currently scouring for any trace of a delivery update on the new pair of sneakers I ordered online last week). Distribution companies face technological limits in offering customers detailed delivery updates, which is exactly why and where Pitney Bowes has stepped in with its solution, hand in hand with Google’s AutoML Tables.

Pitney Bowes is a global technology company that provides e-commerce, delivery, mail, finance, and service solutions. On behalf of distribution companies, it handles everything from the initial sorting of loads in the warehouse to the final step of delivery. The amount of information handled throughout the service grows day by day as overseas shipping expands. As the number of cross-border deliveries increases, the market becomes more vulnerable to fraudulent orders. It is far more challenging to spot 10 abnormal cases out of 10,000 than to detect a single anomaly out of ten. Pitney Bowes’s role was to establish rules for this process and smooth out the delivery system so that products could safely reach their destinations. The project started from the goal of detecting all fraudulent orders from abroad.
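The 10-in-10,000 versus 1-in-10 comparison can be made concrete with a little arithmetic. The sketch below is only an illustration using the numbers from the paragraph above; the function names are made up for this example. It shows why rare fraud is hard: a lazy "model" that calls every order normal already looks 99.9% accurate while catching nothing.

```python
def fraud_share(fraud_cases, total_orders):
    """Fraction of all orders that are fraudulent."""
    return fraud_cases / total_orders


def always_normal_accuracy(fraud_cases, total_orders):
    """Accuracy of a baseline that labels every single order as normal."""
    return (total_orders - fraud_cases) / total_orders


# Numbers taken from the comparison in the text.
rare_share = fraud_share(10, 10_000)                   # 0.1% of orders
common_share = fraud_share(1, 10)                      # 10% of orders
rare_baseline = always_normal_accuracy(10, 10_000)     # looks great, catches no fraud
common_baseline = always_normal_accuracy(1, 10)

print(f"fraud share: {rare_share:.3%} vs {common_share:.0%}")
print(f"'always normal' accuracy: {rare_baseline:.1%} vs {common_baseline:.0%}")
```

This is why accuracy alone says little about a fraud model when fraud is rare.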


Fraud increased enormously between 2016 and 2017.


Google AutoML Tables detects fraudulent orders

Online shopping has grown incredibly. Online markets in America have expanded tremendously in volume, with online sales between the 1st and 23rd of April up 49% compared with the 1st to the 11th of the previous month. Grocery sales in particular shot up by 110%.

In that sense, Pitney Bowes’s attempt to incorporate Google’s machine learning model to streamline the delivery process was a wise one. In fact, Pitney Bowes was able to build the desired delivery model within 2 weeks, which would have taken months without AutoML Tables. Not just expenses but an enormous amount of time was saved. If this isn’t proof of investment value, what is?

The Google AutoML Tables service definitely reminds us of a somewhat similar layout: the DAVinCI LABS settings.

What on earth is ‘Tabular Data’?

Tabular data literally means data in the form of a table.

Google AutoML Tables makes it possible to run machine learning models on this tabular data. Tabular data refers to data in the form of a table, consisting of rows and columns. The most common examples are CSV files and spreadsheets, along with HTML tables, SQL dumps, and so on.
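To make "rows and columns" concrete, here is a minimal sketch that reads a tiny CSV table with Python’s standard csv module. The column names (order_id, amount, country) and the three rows are invented for illustration; they are not taken from Pitney Bowes’s data.

```python
import csv
import io

# A tiny, made-up CSV table: the first line is the header (column names),
# and every following line is one row.
CSV_TEXT = """order_id,amount,country
1001,25.50,US
1002,310.00,KR
1003,12.99,US
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(CSV_TEXT)
columns = list(rows[0].keys())

print("columns:", columns)
for row in rows:
    print(row["order_id"], row["amount"], row["country"])
```

Every value comes back as a string, so a real pipeline would still need to convert columns such as amount to numbers before modeling.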

Ailys provides predictive models built on tabular data. Using Google’s example, we will briefly go over how machine learning is actually used in the distribution industry, then follow up with a DAVinCI LABS demo on open-source data to offer a more elaborate example.

A fraud-detecting machine learning model

Any abnormal card transactions? Here we come

As a large corporation working with hundreds of corporate partners, Pitney Bowes was keen to establish a high-quality delivery system as soon as possible. Its first approach to catching fraud was to develop a model using the machine learning algorithm ‘XGBoost’ (yes, it may sound like nothing we’ve heard before, but knowing that it’s one of various machine learning methods is enough). It was a clever step, upgrading the data game from rather sloppily managed spreadsheets to something orderly and organized that incorporates machine learning. As a result of applying XGBoost, the amount of fraud was reduced to almost half of its initial level, and the workforce required, as well as the cost, decreased by 14%.

The fraud detection procedure carried out by Google AutoML Tables and Pitney Bowes.


The main idea of XGBoost is ‘boosting’: stacking up simple models one after another, with each new model focusing on the errors the previous ones left behind.
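For the curious, the boosting idea can be sketched in a few lines of plain Python. This is a toy version under loud assumptions, not XGBoost itself and not Pitney Bowes’s model: it uses a single made-up feature (order amount), one-split "stumps" as the simple models, and plain squared error on 0/1 labels. Each round fits a stump to whatever error the earlier rounds left over.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put data on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Stack stumps: each one is trained on the errors left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for x, p in zip(xs, predictions)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    boost = sum(stump_predict(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * boost


# Made-up data: small orders are normal (0), very large ones are fraud (1).
amounts = [10, 20, 30, 40, 500, 600, 700, 800]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

one_round = fit_boosted(amounts, labels, rounds=1)
many_rounds = fit_boosted(amounts, labels, rounds=20)
print(predict(one_round, 15), predict(many_rounds, 15))
print(predict(one_round, 650), predict(many_rounds, 650))
```

With one round the scores are still hesitant; after twenty rounds they sit very close to 0 for the small order and 1 for the large one. The real XGBoost library builds on the same stacking idea with deeper trees, regularization, and many engineering optimizations.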


However, models age just as we humans do: the model’s performance updates slowed down significantly and errors crept in, raising skepticism about the model’s accuracy. The errors were mostly ‘false positives’, meaning a perfectly normal order was classified as fraudulent. Pitney Bowes teamed up with Google to get rid of this problem, which is where AutoML Tables comes into play.
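A false positive is easy to count once predictions are lined up against what actually happened. The sketch below uses six invented orders (1 = fraud, 0 = normal) purely for illustration and computes the false positive rate, that is, the share of normal orders wrongly flagged as fraud.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes for binary labels (1 = fraud, 0 = normal)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1   # fraud correctly caught
        elif p == 1 and a == 0:
            counts["fp"] += 1   # normal order wrongly flagged: a false positive
        elif p == 0 and a == 0:
            counts["tn"] += 1   # normal order correctly passed
        else:
            counts["fn"] += 1   # fraud that slipped through
    return counts


def false_positive_rate(actual, predicted):
    """Share of truly normal orders that were flagged as fraud."""
    counts = confusion_counts(actual, predicted)
    normal_orders = counts["fp"] + counts["tn"]
    return counts["fp"] / normal_orders if normal_orders else 0.0


# Made-up example: four normal orders and two fraudulent ones.
actual = [0, 0, 0, 0, 1, 1]
predicted = [0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
fpr = false_positive_rate(actual, predicted)
print(counts, fpr)
```

Here two of the four normal orders were flagged, so half of the honest customers in this toy sample would have been treated as suspects.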

So far, we have only briefly acknowledged that ‘fraudulent orders have been detected by predictive models’. Let’s get into actual examples of how exactly abnormal records get caught.

In short, the transaction records of clients’ credit cards and their address information are the main resources used.

Credit card information consists of ID, time, amount paid, and location of payment, revealing the client’s activity in detail. What if a record shows a client making a large transaction in the US, and within a matter of hours the same person purchases a house in Korea? Something is fishy, without a doubt. In this way, once an abnormal transaction record is spotted, it is classified as a fraudulent order.
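The US-then-Korea scenario can be written down as a simple hand-made rule. To be clear, this is only an illustration of the intuition: the field names, the six-hour window, and the sample records are all assumptions made up for this sketch, and a machine learning model learns such patterns from data rather than from a hard-coded rule like this one.

```python
from datetime import datetime, timedelta


def flag_impossible_travel(transactions, max_gap=timedelta(hours=6)):
    """Flag a transaction when the same card was just used in another country."""
    flagged = []
    last_seen = {}  # card_id -> (time, country) of that card's previous transaction
    for tx in sorted(transactions, key=lambda t: t["time"]):
        previous = last_seen.get(tx["card_id"])
        if previous is not None:
            previous_time, previous_country = previous
            changed_country = tx["country"] != previous_country
            too_soon = tx["time"] - previous_time <= max_gap
            if changed_country and too_soon:
                flagged.append(tx["id"])
        last_seen[tx["card_id"]] = (tx["time"], tx["country"])
    return flagged


# Made-up records for three cards.
transactions = [
    {"id": "t1", "card_id": "A", "time": datetime(2022, 6, 1, 9, 0), "country": "US"},
    {"id": "t2", "card_id": "A", "time": datetime(2022, 6, 1, 12, 0), "country": "KR"},
    {"id": "t3", "card_id": "B", "time": datetime(2022, 6, 1, 9, 0), "country": "US"},
    {"id": "t4", "card_id": "B", "time": datetime(2022, 6, 1, 10, 0), "country": "US"},
    {"id": "t5", "card_id": "C", "time": datetime(2022, 6, 1, 9, 0), "country": "US"},
    {"id": "t6", "card_id": "C", "time": datetime(2022, 6, 3, 9, 0), "country": "KR"},
]

flagged = flag_impossible_travel(transactions)
print(flagged)
```

Only card A’s second purchase is flagged: it jumps countries within three hours, while card C had two full days to make the trip.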

First, Transaction Information.

▶Transaction information keeps track of transaction ID, date and time (DT), transaction amount (Amt), product code, and other related information.

Second, Identity Information.

▶Other than the transaction ID, various indecipherable variables appear under the label ‘id’. They were originally labeled with more precision and detail, but for reasons of company confidentiality all the labels have been erased, leaving only the values behind. (On Kaggle, data scientists and students are still debating what each id_xx could mean…)

The output? The target variable is “isFraud”, indicating whether or not an order is fraudulent

▶These two datasets (transaction and identity data) are combined into one, which then goes through the modeling process to determine whether or not the client has committed fraud.
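Combining the two datasets boils down to joining them on the shared transaction ID. The sketch below does this in plain Python. The column names (TransactionID, TransactionAmt, id_01) and the values are assumptions for illustration, and it also assumes that some transactions may have no identity record, in which case the identity columns are left empty (None).

```python
def merge_on_transaction_id(transactions, identities, key="TransactionID"):
    """Left join: keep every transaction and attach identity columns when available."""
    identity_by_id = {row[key]: row for row in identities}
    identity_columns = sorted(
        {column for row in identities for column in row if column != key}
    )
    merged = []
    for tx in transactions:
        identity = identity_by_id.get(tx[key], {})
        row = dict(tx)
        for column in identity_columns:
            row[column] = identity.get(column)  # None when no identity record exists
        merged.append(row)
    return merged


# Made-up sample rows; the third transaction has no identity record.
transactions = [
    {"TransactionID": 1, "TransactionAmt": 59.0, "isFraud": 0},
    {"TransactionID": 2, "TransactionAmt": 4200.0, "isFraud": 1},
    {"TransactionID": 3, "TransactionAmt": 17.5, "isFraud": 0},
]
identities = [
    {"TransactionID": 1, "id_01": -5.0},
    {"TransactionID": 2, "id_01": 0.0},
]

dataset = merge_on_transaction_id(transactions, identities)
for row in dataset:
    print(row)
```

The merged table keeps one row per transaction, which is the shape a tabular modeling tool expects: feature columns plus the isFraud target.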

The outcome is expressed as the likelihood of a fraudulent order, a value between 0 and 1.

isFraud is our target variable! It’s recorded in the form of a probability from 0 to 1.
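A value between 0 and 1 still has to be turned into a decision at some point. A common way, sketched here with invented order IDs and scores, is to pick a cutoff and flag everything at or above it. The 0.5 default is just an assumption for the example, not a value from Pitney Bowes or Google.

```python
def flag_orders(scores, threshold=0.5):
    """Return the IDs of orders whose fraud score is at or above the cutoff."""
    return [order_id for order_id, score in scores.items() if score >= threshold]


# Made-up fraud scores for three orders.
scores = {"order_A": 0.02, "order_B": 0.91, "order_C": 0.47}

strict = flag_orders(scores)                    # default cutoff of 0.5
cautious = flag_orders(scores, threshold=0.4)   # lower cutoff flags more orders
print(strict, cautious)
```

Lowering the cutoff catches more borderline cases but also risks flagging more perfectly normal orders, so the choice depends on which mistake costs more.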

Why don’t we sum the whole thing up with a picture?

Transaction and identity information are combined and then machine learning predictive modeling is applied to determine frauds.

Transaction and identity information are merged into one dataset, and machine learning is then applied to build the predictive model that detects fraudulent orders. We could sum this up in one line: ‘AutoML models enabled successful detection of foreign fraudulent orders’.

In the next part, we’ll run a DAVinCI LABS demo using this open-source data to go in-depth with fraud detection!