1 Artificial Intelligence vs Machine Learning vs Deep Learning
2 Algorithm and Model
3 Type: Supervised Learning and Unsupervised Learning
4 ML workflow
5 DAVinCI LABS Workflow
1 Problem Definition
2 Data Preparation
3 Variable Processing
4 Modeling
5 Model Evaluation
6 Result Interpretation
7 Model Deployment and Operations
1 | Why do you need data analysis? |
2 | What is Machine Learning? |
1 | Problem Definition: Train a model on the loan repayment history of existing customers to make better approval decisions. In other words, among the customers whom the existing internal rules would have rejected, approve those who can afford to repay; among the customers whom the rules would have approved, reject those who cannot. In problem definition, it is important to understand your business and data well and to set the task you ultimately want to solve. |
2 | Data Preparation: Data is rarely all in one place. You need to find the data relevant to the problem definition. For example, build a large dataset that characterizes each customer, including variables that reflect the customer's financial information and appraised social credit rating scores. It should also include the target value, that is, the label recording each customer's actual repayment history, which is essential for training. Reaching a valuable conclusion requires correctly labeled, valuable data. |
3 | Model Development: Train various algorithms on the prepared data to develop a model that can determine whether a new loan applicant should be approved. |
4 | Model Evaluation: The developed model is not deployed immediately; it is evaluated before operation to decide whether to use it. The evaluation metric depends largely on the target type (numeric for regression, categorical for classification). Loan approval is a binary classification problem, so the final model is the one that performs best on the corresponding classification metrics. |
5 | Model Deployment: No matter how powerful a predictive model is, it is meaningless if practitioners cannot use it. Therefore, once model evaluation is complete, the model must be deployed for operation. Typically, a trained model is applied to the operating environment through separate software, the prediction server. |
6 | Operation (apply prediction result): During the operation phase, the deployed model indicates in real time whether new applicants are approved. However, such a model cannot be used indefinitely, because models age just as humans do. As time goes by, the data of new applicants becomes historical data that can be used for training, and the trends in the data change. To prevent predictive performance from declining, retrain the model according to an internal standard; if no such standard exists, establish one that defines when to retrain. |
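The evaluation and retraining steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how DAVinCI LABS computes its metrics: the labels are toy values (1 is assumed to mean "repaid"), and the function names, the choice of F1 as the monitored metric, and the 0.05 tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_binary(y_true, y_pred):
    """Compute confusion-matrix metrics for 0/1 labels (1 = positive class)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def needs_retraining(baseline_f1, recent_f1, max_drop=0.05):
    """Hypothetical retraining standard: retrain once F1 on recent data
    has fallen more than `max_drop` below the F1 measured at deployment."""
    return (baseline_f1 - recent_f1) > max_drop


# Toy labels for illustration only (assumed: 1 = repaid, 0 = did not repay).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate_binary(y_true, y_pred)
print(metrics)
print(needs_retraining(baseline_f1=metrics.f1, recent_f1=0.68))
```

Which metric to monitor and how large a drop to tolerate are business decisions; the point of the sketch is only that a retraining standard can be written down as an explicit, testable rule.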
3 | Predictive Model Using DAVinCI LABS |
When defining a supervised learning problem, the user considers two main cases:
For example, on the Dataset page of churn data,
1 | Delete variable: If the provided statistics show that most of a variable's values are missing, and domain knowledge indicates that the variable has no significant impact on predicting the target, delete the variable. |
2 | Ignore variable: The personal identification variables CustomerID, RowNumber, and Surname are excluded before modeling, because the goal is to predict for customers with given characteristics rather than for specific individuals. |
3 | Correlation analysis: Correlation lets you check the relationship between each variable and the target to decide whether to use the variable. |
4 | Change type of variable: Change the type of any variable that requires it before modeling. A useful rule of thumb is to convert variables whose values cannot be meaningfully compared in magnitude into categorical types. In the churn data, manually change the card ownership status (HasCrCard) and customer activity status (IsActiveMember) variables, which are recognized as numeric, to categorical types. Unlike this example, changing a categorical type to a numeric type is the more common case. |
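The four variable-processing actions above can be illustrated with a small, dependency-free Python sketch. It is not the DAVinCI LABS implementation: the rows are made-up toy data, the `PromoCode` column, the target name `Exited`, the `Age` column, and the 0.5 missing-value threshold are assumptions introduced only for the example, while CustomerID, RowNumber, Surname, HasCrCard, and IsActiveMember come from the text.

```python
from math import sqrt

# Toy churn-like rows for illustration only; "PromoCode", "Age", and the
# target name "Exited" are hypothetical and not taken from the source.
rows = [
    {"RowNumber": 1, "CustomerID": 101, "Surname": "A", "Age": 42, "HasCrCard": 1, "IsActiveMember": 1, "PromoCode": None, "Exited": 1},
    {"RowNumber": 2, "CustomerID": 102, "Surname": "B", "Age": 41, "HasCrCard": 0, "IsActiveMember": 1, "PromoCode": None, "Exited": 0},
    {"RowNumber": 3, "CustomerID": 103, "Surname": "C", "Age": 58, "HasCrCard": 1, "IsActiveMember": 0, "PromoCode": "X1", "Exited": 1},
    {"RowNumber": 4, "CustomerID": 104, "Surname": "D", "Age": 29, "HasCrCard": 0, "IsActiveMember": 0, "PromoCode": None, "Exited": 0},
    {"RowNumber": 5, "CustomerID": 105, "Surname": "E", "Age": 35, "HasCrCard": 1, "IsActiveMember": 1, "PromoCode": None, "Exited": 0},
    {"RowNumber": 6, "CustomerID": 106, "Surname": "F", "Age": 50, "HasCrCard": 1, "IsActiveMember": 0, "PromoCode": None, "Exited": 1},
]


def missing_ratio(rows, column):
    """Fraction of rows whose value for `column` is missing (None)."""
    return sum(1 for r in rows if r[column] is None) / len(rows)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def process_variables(rows, ignore, to_category, max_missing=0.5):
    """Ignore ID columns, delete mostly-missing columns, and cast numeric
    flags to categorical (string) values."""
    columns = [c for c in rows[0] if c not in ignore]
    kept = [c for c in columns if missing_ratio(rows, c) <= max_missing]
    processed = []
    for row in rows:
        new_row = {c: row[c] for c in kept}
        for c in to_category:
            if c in new_row and new_row[c] is not None:
                new_row[c] = str(new_row[c])  # 1 -> "1": a label, not a quantity
        processed.append(new_row)
    return processed


processed = process_variables(
    rows,
    ignore={"RowNumber", "CustomerID", "Surname"},
    to_category=["HasCrCard", "IsActiveMember"],
)
age_vs_target = pearson([r["Age"] for r in rows], [r["Exited"] for r in rows])
print(sorted(processed[0]))
print(round(age_vs_target, 3))
```

Casting the 0/1 flags to strings is one simple way to mark them as categories in plain Python; a real tool would typically store the variable type as metadata instead.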
4 | Rule Optimization |
For example,
5 | Summary |