I'm currently learning R, and as practice I'm trying to classify loan customers from a Kaggle dataset.
The first analysis that can be done on the raw data as-is is to predict (classify) one dependent variable with three categories from 10 input variables (independent variables, X).
Here, the dependent variable (target variable, Y) is loan_status, and its three categories are as follows.
PAIDOFF: the loan was fully repaid within the deadline
COLLECTION: the loan was still unpaid when the data were collected
COLLECTION_PAIDOFF: the loan was fully repaid, but only after the deadline had passed
Originally this is a multiclass classification problem over the three categories above, but with a small modification I will treat it as a binary classification: whether the loan was repaid within the deadline (success) or not (failure).
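As a minimal sketch of that modification, the three statuses can be collapsed into two. This assumes that only PAIDOFF counts as success and that COLLECTION_PAIDOFF counts as failure because the deadline was missed. The toy data frame `loan_toy` and the column name `repaid_on_time` are hypothetical and only stand in for the real data.

```r
# Toy stand-in for the Kaggle data; the real `loan` data frame has more columns.
loan_toy <- data.frame(
  loan_status = c("PAIDOFF", "COLLECTION", "COLLECTION_PAIDOFF", "PAIDOFF"),
  stringsAsFactors = FALSE
)

# Assumption: only PAIDOFF means the loan was repaid within the deadline.
# COLLECTION and COLLECTION_PAIDOFF both missed it, so both become "failure".
loan_toy$repaid_on_time <- factor(
  ifelse(loan_toy$loan_status == "PAIDOFF", "success", "failure"),
  levels = c("failure", "success")
)

# Count how many loans fall into each of the two classes.
table(loan_toy$repaid_on_time)
```

The same ifelse() expression could also be used inside mutate() on the real loan data, if a dplyr pipeline is preferred.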
library(dplyr)

# Convert the ID, target, date, and categorical columns to factors
loan <- loan %>%
  mutate(
    Loan_ID = factor(Loan_ID),
    loan_status = factor(loan_status),
    effective_date = factor(effective_date),
    due_date = factor(due_date),
    paid_off_time = factor(paid_off_time),
    education = factor(education),
    Gender = factor(Gender)
  )

# Summarize every column
summary(loan)
Visualization
library(ggplot2)

# Bar plot of how many loans fall into each loan_status category
loan %>%
  ggplot(aes(loan_status)) +
  geom_bar() +
  labs(title = "Bar plot", subtitle = "Success People", caption = "Source: Kaggle Loan data")