Representative Use Cases of Machine Learning (Supervised Learning, Unsupervised Learning, Time Series Analysis, Clustering)

7 Jun 2022

In the previous segment, we covered the basic concepts of machine learning and how it differs from statistics. We also noted that business leaders need an automated machine learning solution to develop machine learning models smoothly.

Now let's look at how an automated machine learning solution is structured and what kinds of targets it can model, along with some detailed examples.

1. Supervised Learning Predictive Modeling: Clear and Accurate Prediction of the Target


Supervised Learning Predictive Modeling: Product Sales Prediction (Demand Prediction)

The example above is supervised predictive modeling, the most frequently used function in machine learning.

Let’s say a company running a convenience store business wants to predict how much of a product will be sold. In the past, sales were predicted from test sales conducted with small samples, but this method was prone to large errors.

That is why we turned to machine learning to predict sales and compare the predictions with actual sales. The existing sales data included numerous variables such as sales price, product category, sales period, the average number of customers, authorized retail stores, and the weather. We used machine learning to search for patterns in how these variables affected the target, sales.
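As a rough illustration of "learning how a variable affects the target", the sketch below fits a straight line by ordinary least squares. It is a minimal sketch, not the solution's actual method: it uses only one variable (average number of customers), the history values are made up, and the names `fit_line` and `predict` are hypothetical. A real model would use many variables at once.

```python
# Made-up history: (average daily customers, units sold that month).
history = [(80, 170), (100, 205), (120, 250), (150, 310), (170, 345), (200, 410)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance of x and y divided by variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict the target (units sold) for a given customer count."""
    slope, intercept = model
    return slope * x + intercept

model = fit_line(history)
forecast = predict(model, 130)  # expected sales for 130 customers per day
print(f"slope={model[0]:.2f}, forecast={forecast:.0f} units")
```

The target (units sold) is named explicitly and the model is fitted against its known past values, which is what makes this a supervised approach.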

So, what was the result?

As you can see in the figure, for product A, the error between predicted and actual sales decreased from 120/month to 30/month, and for product B it fell as low as 6/month. For product C, the error was also reduced from 60 pieces/month to 10 pieces/month.

Once the sales forecast is accurate, stock can be managed efficiently, which can lead to significant business profits.
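The article does not say how the monthly error is measured. One common choice is the mean absolute error, sketched below under that assumption. All sales figures in the block are hypothetical and were picked only so that the two results echo the magnitudes quoted for product A.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the gap between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly unit sales for one product (not the article's data).
actual_sales = [500, 520, 480, 510]
test_sale_forecast = [620, 400, 600, 390]  # each month is off by 120 units
model_forecast = [530, 490, 510, 480]      # each month is off by 30 units

error_before = mean_absolute_error(actual_sales, test_sale_forecast)
error_after = mean_absolute_error(actual_sales, model_forecast)
print(error_before, error_after)
```

Because the metric is in the same unit as the target (pieces per month), it can be read directly as "how many pieces the forecast misses by on average".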

The unfamiliar term “supervised learning” appears in the example above because the prediction target is defined very clearly: the value to be predicted is named specifically and exactly. This distinguishes it from unsupervised learning, in which no predictive target is specified. We will elaborate on this later.

2. Supervised Learning Clustering: Grouping Samples by Specifying a Target


Supervised learning clustering points out customer groups with high risk of churn

The second example is Supervised Clustering, a special module provided only by Ailys’s solution. The title of the picture is Rule generation, which refers to creating a rule that groups customers into different segments. What kind of rule would this be?

Let's look at this example. Insurance companies implement a variety of strategies to prevent customers from canceling their products or letting them become invalid. To do this, it is important to first identify the customers most likely to cancel or become invalid. So, we analyzed the existing data to find the characteristics of customers who are more likely to churn. The existing data includes various variables such as age, payment method, subscription channel, number of contracts, number of months into the contract, number of failed withdrawals, number of calls, and other marketing information. Then, clustering is carried out by specifying the target variable as invalidation or cancellation/termination.

As a result, the characteristics of the invalidation risk group and of the cancellation/termination risk group were generated. The customer group most likely to be invalidated is the one whose final payment method is real-time transfer, PAYCO, Giro, etc., whose product group is senior, dementia, general, etc., and whose monthly average number of unsuccessful withdrawals is 0.572 or more. Although not shown in the figure, this customer group has a total of xxx people and an xx% chance of being invalidated.

In a way, you can think of it as creating a ‘Rule’ that binds groups of customers. Grouping customers who satisfy a specific target value in this way is called supervised learning clustering in machine learning terms, and Ailys emphasizes that a rule has been created. This is because the concept is similar to classifying clients and assigning credit scores through rules in a credit scoring system. Overall, supervised learning clustering can be used effectively for customized strategies based on customer segmentation.

3. Time Series Analysis: Prediction of the Target Over Time
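To make the idea of a rule that binds a customer group more concrete, here is a deliberately simplified stand-in. It is not Ailys's algorithm: it only searches single-condition rules and reports the group size and the share of the group that hit the target. The customer records, the field names (`payment`, `failed_withdrawals`, `invalidated`), and the functions `candidate_rules` and `best_rule` are all hypothetical.

```python
# Hypothetical customers; "invalidated" is the target we group by.
customers = [
    {"payment": "giro", "failed_withdrawals": 0.9, "invalidated": True},
    {"payment": "giro", "failed_withdrawals": 0.7, "invalidated": True},
    {"payment": "card", "failed_withdrawals": 0.6, "invalidated": True},
    {"payment": "card", "failed_withdrawals": 0.1, "invalidated": False},
    {"payment": "card", "failed_withdrawals": 0.0, "invalidated": False},
    {"payment": "auto_debit", "failed_withdrawals": 0.2, "invalidated": False},
    {"payment": "auto_debit", "failed_withdrawals": 0.0, "invalidated": False},
    {"payment": "giro", "failed_withdrawals": 0.3, "invalidated": False},
]

def candidate_rules(rows):
    """Yield (description, predicate) pairs for simple one-condition rules."""
    for value in sorted({r["payment"] for r in rows}):
        yield f"payment == {value}", lambda r, v=value: r["payment"] == v
    for t in sorted({r["failed_withdrawals"] for r in rows}):
        yield (f"failed_withdrawals >= {t}",
               lambda r, t=t: r["failed_withdrawals"] >= t)

def best_rule(rows, target, min_size=3):
    """Return (rule, group size, target rate) for the highest-rate group."""
    best = None
    for description, matches in candidate_rules(rows):
        group = [r for r in rows if matches(r)]
        if len(group) < min_size:
            continue  # ignore groups too small to act on
        rate = sum(r[target] for r in group) / len(group)
        if best is None or rate > best[2]:
            best = (description, len(group), rate)
    return best

rule, size, rate = best_rule(customers, "invalidated")
print(f"{rule}: {size} customers, {rate:.0%} invalidated")
```

The output has the same shape as the result described above: a condition on the variables, the number of people in the group, and the probability of the target within it. A production system would combine several conditions per rule and validate the rate on held-out data.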


Time series analysis: forecasting product shipment of 10 days

The third example is the Time Series Analysis module, which predicts targets over time. In the first example, supervised learning predictive modeling, we saw that the target is predicted from the variables in the given data. Similarly, in time series analysis, there is also a clear target to predict. In the figure above, the goal is to estimate the appropriate shipment volume in order to optimize the stock level of the product.

However, the major difference between time series analysis and supervised predictive modeling is that shipments are predicted over time. Therefore, date records must be included in the data, and they must be defined separately when modeling. In the first example, supervised predictive modeling, a separate date variable wasn’t required; it was only necessary to predict the final target value. In time series analysis, by contrast, as you can see in the analysis result graph above, you can closely observe how the target value changes over time. It is a more advanced technology than general predictive modeling.
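The role of the date column can be shown with a very small forecasting sketch. It assumes daily shipment records and uses a recursive moving average, which is one of the simplest possible techniques and almost certainly not what the module itself uses. The shipment history is made up, and `forecast_shipments` is a hypothetical name.

```python
from datetime import date, timedelta

def forecast_shipments(history, horizon=10, window=7):
    """Forecast daily shipments from (date, units) records.

    Each forecast is the average of the most recent `window` values,
    and earlier forecasts are fed back in for later days.
    """
    ordered = sorted(history)  # the date defines the order of the samples
    values = [units for _, units in ordered]
    last_day = ordered[-1][0]
    forecast = []
    for step in range(1, horizon + 1):
        recent = values[-window:]
        estimate = sum(recent) / len(recent)
        values.append(estimate)
        forecast.append((last_day + timedelta(days=step), estimate))
    return forecast

# Made-up history: 14 days of shipments with a gentle upward trend.
start = date(2022, 5, 1)
history = [(start + timedelta(days=i), 100 + 2 * i) for i in range(14)]

forecast = forecast_shipments(history)
for day, units in forecast:
    print(day.isoformat(), round(units, 1))
```

Unlike the earlier regression sketch, shuffling the rows here would change the result, because the model depends on which values are the most recent. That dependence on order is why the date variable has to be declared separately.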

4. Unsupervised Clustering: Automatic Grouping Without Specifying a Target

Unsupervised Clustering: Fraud Detection System


The fourth example is the Unsupervised Clustering module, which automatically groups samples together. In our second example, supervised learning clustering, we specified certain criteria for grouping customers: the task was to group insurance customers who had a high probability of churning (cancellation or invalidation). In other words, supervised learning clustering was possible because the target of cancellation or invalidation was specifically defined.

However, clustering includes not only supervised learning but also unsupervised learning. In unsupervised clustering, no target is specified. That is, the user does not separately define criteria or conditions for grouping the clients. So how are the samples grouped? The answer is that the computer groups them on its own: it automatically groups samples with similar characteristics and sorts them so that distinct or unusual clusters stand out.

In the example above, we tried to detect fraudulent activity by analyzing the history of financial transactions. In general, it is fairly challenging to pin down the characteristics of fraudulent activity in advance. It is therefore hard to get meaningful results by specifying fraudulent activity as a target and predicting it through supervised learning. Besides, you never know what new types of fraud may appear in the future. In this case, if unsupervised clustering is applied, the computer goes through the customers’ transaction information and automatically groups it. In particular, clients who are highly likely to commit fraud are placed into separate groups because their pattern differs from that of existing clients. In other words, we get a warning that fraud is likely.
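The grouping step can be sketched with a plain k-means routine. This is an assumption for illustration only: the article does not name the clustering algorithm the module uses. The transactions are made up, each one is reduced to two hypothetical features (amount, transactions per day), and the smallest cluster is treated as the one worth reviewing.

```python
import math
from collections import Counter

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels

# Made-up transactions: (amount, transactions per day). No fraud label is given.
transactions = [(20, 2), (25, 3), (22, 2), (30, 3), (18, 1), (27, 2),
                (950, 40), (990, 38)]

labels = kmeans(transactions, k=2)
sizes = Counter(labels)
rare_cluster = min(sizes, key=sizes.get)  # the smallest group
flagged = [t for t, lab in zip(transactions, labels) if lab == rare_cluster]
print("flagged for review:", flagged)
```

No target column appears anywhere in the code; the unusual group emerges only because its pattern differs from the rest. In practice the features would be scaled first, and a small cluster is a prompt for investigation rather than proof of fraud.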

How supervised learning, unsupervised learning, and clustering are linked


To summarize, the relationship between supervised learning, unsupervised learning, and clustering is shown in the figure above.

First, follow the arrows that point from left to right in the picture. Modeling that specifies a target to predict from the characteristics of the data is called supervised predictive modeling. For example, if you want to check whether a man, 24 years old, married, earning 45 million won, and owning his/her own can repay a loan, you can run predictive modeling. And if you want to observe the target you want to predict over time, you can use time series analysis.

Next, follow the arrow that points from right to left. Tracing back the characteristics of a group that meets a specific target value is called supervised clustering. For example, when only those who are likely to repay a loan are grouped together, information about the group, such as gender, age, family, and salary, is generated as the output: one group contains 500 people, the gender is female, the age is 34-42, the annual salary is between 32 million won and 39 million won, and the resulting probability of repaying the loan is 89%.

Finally, one arrow comes straight out of the data rather than running between the data characteristics and the target. This is unsupervised clustering, which groups samples automatically without specifying a target. For example, when client financial transactions were grouped automatically, a unique group emerged that differed from typical, general transactions. We can take this as a hint and use it for fraud detection.

DAVinCI LABS contains all the modules we’ve covered so far. If you would like to learn more about the solution or its use cases, please reach out through our website.