Customer Churn Prediction Report
Date: June 2, 2025
Author: Muhammad Ahsan
1. Executive Summary
This report walks through our project to develop a machine learning model that can predict when a customer might leave our fictional subscription-based company. After looking at customer details and how they use our service, we built and tested several models. The one that stood out was our Random Forest Classifier. It performed best, reaching an F1-Score of 0.69 on the churn class and an AUC-ROC of 0.91 on held-out test data it hadn't seen during training.
Our analysis showed that the biggest clues a customer might leave are how long it's been since they last interacted with our service, their monthly bill, and how long they've been a customer. With these insights, we can create smart ways to keep customers around, like reaching out to inactive users or looking at our pricing for high-cost plans. Ultimately, this should help us lower churn and make our customers happier and more loyal.
2. Introduction
Customer churn – when customers stop using a service – is a big deal for any subscription business. It hits both revenue and growth. That's why we started this project: to get ahead of the curve and spot customers who might be thinking of leaving. This way, we can step in with the right actions to encourage them to stay. Our main goals were to:
Dig into the data of 10,000 customers to find out what makes them more likely to churn.
Create new, helpful data points (features) from what we already had.
Build and compare different machine learning models to see which one could predict churn most accurately.
Come up with practical, actionable ideas based on what the best model told us.
3. What We Found in the Data (Exploratory Data Analysis)
Looking through the customer data, a few interesting patterns popped out:
Overall Churn Rate: About 22% of customers in our dataset had churned. That's a noticeable number, confirming that it’s worth building a model to tackle this.
How Long They've Been Customers (Tenure): There was a clear link here – the longer someone was a customer, the less likely they were to leave. Newer customers, especially those with us for less than a year, were at a much higher risk.
What They Pay (Service Charges): The MonthlyCharge was definitely a factor. Customers paying more, particularly those with bills above roughly $70 to $100, were more likely to churn. This might mean we need to look at how much value customers feel they're getting at those higher price points.
How Engaged They Are: Two things really stood out: how long it had been since a customer's LastInteraction with our service, and the number of SupportTickets they'd raised. If a customer hadn't been active recently, or if they'd been contacting support a lot, these were strong signs they might be unhappy or losing interest.
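To make the tenure pattern above concrete, here is a minimal sketch of how a churn rate can be compared across tenure bands. The records, the band boundaries, and the function names are hypothetical and chosen only for illustration; they are not the project's data or code.

```python
from collections import defaultdict

# Hypothetical toy records: (tenure_months, churned). Not the project data.
customers = [
    (3, True), (8, True), (10, False), (14, False),
    (20, True), (30, False), (45, False), (60, False),
]

def tenure_bucket(months):
    """Group tenure into coarse bands (assumed boundaries) for comparison."""
    if months < 12:
        return "under 1 year"
    if months < 36:
        return "1 to 3 years"
    return "3+ years"

def churn_rate_by_bucket(records):
    """Return {bucket: share of customers in that bucket who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for months, did_churn in records:
        bucket = tenure_bucket(months)
        totals[bucket] += 1
        churned[bucket] += int(did_churn)
    return {bucket: churned[bucket] / totals[bucket] for bucket in totals}

rates = churn_rate_by_bucket(customers)
overall = sum(did_churn for _, did_churn in customers) / len(customers)
```

The same grouping idea applies to MonthlyCharge, LastInteraction, or SupportTickets: pick bands, then compare each band's churn rate with the overall rate.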
4. How We Built the Model
4.1. Getting the Data Ready (Preprocessing and Feature Engineering)
To make sure our data was in the best shape for the models, we did a few things:
Creating New Insights (Feature Engineering): We made some new data points that we thought would be helpful, like TenureInYears (how many years they've been a customer), AvgChargePerTenure (their average monthly bill over their time with us), and simple yes/no flags for whether they had HighSupportTickets or had a RecentInteraction.
Organizing the Data (Data Transformation): We converted text-based categories (like Gender and SubscriptionType) into numbers using one-hot encoding. We also scaled all the numerical data (like age and monthly charges) using StandardScaler so that every feature has zero mean and unit variance. This matters most for scale-sensitive models such as Logistic Regression; tree-based models like Random Forest and XGBoost are largely unaffected by it.
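The two preparation steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's pipeline: the sample records, the thresholds for the two flags, the assumption that Tenure is measured in months, and the exact formula for AvgChargePerTenure are all guesses, since the report does not spell them out. In practice a library such as scikit-learn would handle the encoding and scaling.

```python
from statistics import mean, pstdev

# Hypothetical raw records; field names follow the report, values are made up.
raw = [
    {"Tenure": 6, "MonthlyCharge": 90.0, "SupportTickets": 5,
     "LastInteraction": 120, "SubscriptionType": "Basic"},
    {"Tenure": 24, "MonthlyCharge": 50.0, "SupportTickets": 1,
     "LastInteraction": 10, "SubscriptionType": "Premium"},
    {"Tenure": 48, "MonthlyCharge": 70.0, "SupportTickets": 0,
     "LastInteraction": 45, "SubscriptionType": "Standard"},
]

# Assumed cut-offs; the report does not state the thresholds it used.
HIGH_TICKET_THRESHOLD = 3
RECENT_DAYS = 30

def engineer(row):
    """Add the derived features named in the report to one customer record."""
    out = dict(row)
    out["TenureInYears"] = row["Tenure"] / 12  # assumes Tenure is in months
    # Assumed definition: monthly charge divided by tenure (guarding zero tenure).
    out["AvgChargePerTenure"] = row["MonthlyCharge"] / max(row["Tenure"], 1)
    out["HighSupportTickets"] = int(row["SupportTickets"] > HIGH_TICKET_THRESHOLD)
    out["RecentInteraction"] = int(row["LastInteraction"] <= RECENT_DAYS)
    return out

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new[f"{column}_{cat}"] = int(r[column] == cat)
        encoded.append(new)
    return encoded

def standardize(rows, column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[column] = (r[column] - mu) / sigma if sigma else 0.0
    return rows

features = one_hot([engineer(r) for r in raw], "SubscriptionType")
features = standardize(features, "MonthlyCharge")
```

One design point worth keeping in mind: the mean and standard deviation used for scaling should be computed on the training split only and then reused on the test split, so that no information from the test data leaks into training.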
4.2. Dealing with Imbalanced Data
With a 22% churn rate, we had more non-churners than churners in our data. This imbalance can bias a model toward predicting the majority class. To handle this, we used a technique called SMOTE (Synthetic Minority Over-sampling Technique). SMOTE creates synthetic examples of the smaller group (in our case, churners) by interpolating between existing churners and their nearest neighbors, which gives the model more churn examples to learn from. We applied it only to our training data, so the test data stayed separate and untouched for a fair evaluation.
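The core idea behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the implementation the project used (a library such as imbalanced-learn would normally be used instead); the function name `smote_sketch`, the sample points, and the choice of `k=2` are assumptions made for the example.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority-class
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical, already-scaled TRAINING rows (two features each).
train_churners = [(0.9, 0.8), (1.0, 1.1), (1.2, 0.9)]
train_non_churners = [(-1.0, -0.5), (-0.8, -1.1), (-1.2, -0.9),
                      (-0.7, -0.6), (-1.1, -1.0), (-0.9, -0.7)]

# Oversample churners until both classes have the same number of training rows.
needed = len(train_non_churners) - len(train_churners)
balanced_churners = train_churners + smote_sketch(train_churners, needed)
```

Because every synthetic point lies on a line segment between two real churners, the new rows stay inside the region the minority class already occupies. The test split is never passed through this step.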
5. How the Models Performed
We trained and tested three different types of classification models. Here’s how they stacked up when predicting churn on data they hadn't seen before. We picked our final model based on its F1-Score for the churn class, because it balances correctly identifying churners (recall) against not wrongly flagging too many non-churners (precision).
Model               | Accuracy | F1-Score (Churn) | ROC AUC
Random Forest       | 0.858    | 0.689            | 0.907
XGBoost             | 0.856    | 0.678            | 0.906
Logistic Regression | 0.790    | 0.620            | 0.877
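The Random Forest Classifier came out on top. It had the best Accuracy (0.858), F1-Score (0.689), and ROC AUC (0.907), although XGBoost was a close second on all three, so the margin is small. On the F1-Score we used for selection, it struck the best balance in identifying customers who are likely to churn.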
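For readers who want to see what the two headline metrics measure, here is a small sketch that computes them from scratch. The labels and predicted probabilities are hypothetical, and the 0.5 decision threshold is an assumption; the report does not state which threshold was used.

```python
def f1_from_counts(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall for the churn class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen churner is scored above
    a randomly chosen non-churner (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test-set labels (1 = churned) and predicted churn probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 decision threshold

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

f1 = f1_from_counts(tp, fp, fn)
auc = roc_auc(y_true, y_prob)
```

Note that F1 depends on the chosen threshold, while AUC is computed from the ranking of the probabilities alone. That is why a model can show a high AUC alongside a more modest F1-Score.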
6. What Really Drives Churn? (Key Indicators)
Looking at what the Random Forest model learned, these are the top things that influence a customer's decision to leave:
Days Since Last Interaction (LastInteraction): This was the biggest one. If a customer hasn't used the service in a while, there's a high chance they're on their way out.
Monthly Charge (MonthlyCharge): Higher monthly bills are strongly associated with customers leaving.
Tenure (Tenure and TenureInYears): Loyalty really does grow over time. New customers are the riskiest.
Average Charge Per Tenure (AvgChargePerTenure): If the bill feels too high for how long they've been a customer, it can signal poor value.
Number of Support Tickets (SupportTickets): A high number of support tickets often comes before a customer decides to churn.
Subscription Type (SubscriptionType_Premium): Interestingly, customers with a Premium subscription were less likely to churn. This suggests they're more invested in what we offer.
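The report does not say which importance measure produced the ranking above. Random Forests in scikit-learn expose impurity-based scores through the `feature_importances_` attribute; as a model-agnostic alternative, the sketch below illustrates permutation importance, which measures how much accuracy drops when one feature's values are shuffled. The toy model, the data, and the function names are hypothetical stand-ins, not the project's trained model.

```python
import random

def accuracy(model, X, y):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + (v,) + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in for a trained model: predicts churn when the
    customer has been inactive for more than 90 days; ignores the 2nd feature."""
    days_since_last_interaction, _unused_feature = row
    return int(days_since_last_interaction > 90)

# Hypothetical rows: (days since last interaction, an irrelevant feature).
X = [(120, 1), (10, 2), (200, 3), (30, 1), (95, 2), (5, 3)]
y = [1, 0, 1, 0, 1, 0]

importances = permutation_importance(toy_model, X, y)
```

A feature the model never uses gets an importance of zero, while shuffling a feature the model relies on lowers accuracy. Neither kind of importance tells us the direction of the effect or proves causation, so the interpretations above also lean on the patterns seen in the exploratory analysis.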
7. What This Means for the Business (Implications and Recommendations)
Now that we know what the model found, here are some practical, data-backed ideas we can use:
Reach Out to Inactive Users:
Action: Since LastInteraction is so important, let's proactively contact customers who haven't logged in or used the service for, say, 90-180 days. We could send emails highlighting new features, offer a special deal, or simply ask for their feedback.
Look at Pricing and Offer Smart Discounts:
Action: For customers with high MonthlyCharge that our model flags as high-risk, maybe we can offer a one-time discount, a free month, or an option to move to a plan that’s a better fit for their budget.
Make the New Customer Experience Amazing:
Action: New customers (Tenure) are more likely to leave, so let's really focus on their first 1-3 months. A great onboarding program can show them the value of our service quickly and help them build it into their routine.
Be Proactive with Support:
Action: If a customer has a lot of SupportTickets, let's automatically flag them for a follow-up. A senior support agent or a customer success manager could reach out to make sure their problems are solved and see how they're feeling about the service overall.
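As a rough sketch of how these four recommendations could be turned into an automated flagging step, consider the rules below. Only the 90-day inactivity window and the 3-month onboarding window come from the text above; the other thresholds, the `churn_probability` field, and the action labels are placeholders that would need to be agreed with the business.

```python
# Assumed thresholds; only the 90-day inactivity window and the 1-3 month
# onboarding window come from the report. The rest are placeholders.
INACTIVE_DAYS = 90
NEW_CUSTOMER_MONTHS = 3
HIGH_CHARGE = 70.0
HIGH_RISK_PROBABILITY = 0.5
MANY_TICKETS = 3

def retention_actions(customer):
    """Map one customer record to the list of recommended follow-ups."""
    actions = []
    if customer["LastInteraction"] >= INACTIVE_DAYS:
        actions.append("re-engagement email")
    if (customer["MonthlyCharge"] >= HIGH_CHARGE
            and customer["churn_probability"] >= HIGH_RISK_PROBABILITY):
        actions.append("discount or plan review")
    if customer["Tenure"] <= NEW_CUSTOMER_MONTHS:
        actions.append("onboarding check-in")
    if customer["SupportTickets"] > MANY_TICKETS:
        actions.append("support follow-up")
    return actions

# Hypothetical customer; churn_probability would come from the trained model.
example = {"LastInteraction": 120, "MonthlyCharge": 95.0, "Tenure": 2,
           "SupportTickets": 1, "churn_probability": 0.8}
actions = retention_actions(example)
```

Keeping the rules in one small function makes the thresholds easy to review and tune once we see how customers respond to each intervention.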
8. Wrapping Up
This project has given us a powerful Random Forest model that can predict customer churn with an impressive AUC of 0.91. More than just making predictions, this model gives us clear, actionable insights into why customers leave. By putting these recommendations into practice, we believe the business can significantly cut down its churn rate, keep more customers happy, and build a more stable and profitable future.
Project Summary
Developed a machine learning pipeline to predict customer churn in a subscription-based business, identifying high-risk clients to improve retention campaigns.
Conducted Exploratory Data Analysis (EDA) on 10,000 customer records. Resolved class imbalance using SMOTE on training data. Programmed and optimized Random Forest, XGBoost, and Logistic Regression models using Scikit-learn.
Random Forest Classifier emerged as the best model with an F1-score of 0.69 and an AUC-ROC of 0.91. The primary churn drivers identified were days since last interaction, monthly bill charges, and account tenure.