| | Coef. |
|---|---|
| ContractOne year | -2.1077429 |
| ContractTwo year | -3.7239890 |
| MultipleLinesNo phone service | -0.2891042 |
| MultipleLinesYes | -0.3604447 |
| log(scale) | 3.5809741 |
| log(shape) | -0.1757588 |
Understanding Survival Analysis
Foreword
This document demonstrates the use of a number of advanced page layout features to produce an attractive and usable document inspired by the Tufte handout style and the use of Tufte’s styles in RMarkdown documents (Xie, Allaire, and Grolemund 2018). The Tufte handout style is a style that Edward Tufte uses in his books and handouts. Tufte’s style is known for its extensive use of sidenotes, tight integration of graphics with text, and well-set typography. Quarto1 supports most of the layout techniques that are used in the Tufte handout style for both HTML and LaTeX/PDF output.
1 To learn more, read about Quarto or visit Quarto’s GitHub repository.
Introduction to Customer Churn
Customer churn is when a customer cancels their subscription or stops using your service. This can happen for many reasons, from being unhappy with the service to finding a better deal with a competitor.
Understanding why customers leave is crucial for a business because a long-term customer base means a stable, predictable stream of revenue. The longer a customer stays subscribed, the more valuable they are.
While this is easy to see in industries like telecommunications, the subscription model is common in many other sectors:
- Insurance: Customers can choose to let their policy lapse.
- Mortgages: Homeowners might pay off their mortgage early.
- Retail: Shoppers may stop making repeat purchases from a mail-order catalog or online store.
A lower churn increases the Customer Lifetime Value (CLV).
Lowering your average churn rate means customers stay subscribed for longer, which directly leads to higher total revenue over time.
This “trickle-down effect” shows how reducing churn boosts your Customer Lifetime Value (CLV), also known as Lifetime Value (LTV):
\[
\begin{aligned}
&\text{Churn} \rightarrow \text{Duration} \rightarrow \text{Recurring revenues} \\
&\text{Recurring revenues} - \text{Recurring costs} = \text{Gross contribution margin} \\
&\text{Gross contribution margin} - \text{Marketing costs} = \text{Net margin for single event} \\
&\text{Net margin for single event} \times \text{Expected number of purchases in lifetime} = \text{Accumulated margin} \\
&\text{Present value}(\text{Accumulated margin} - \text{Acquisition cost}) = \text{LTV}
\end{aligned}
\]
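The chain above can be walked through numerically. The Python sketch below is illustrative only: every input value is hypothetical, the function name `lifetime_value` is introduced here for the example, and it assumes a constant monthly churn rate (so that expected duration is 1 / churn) and a single lump-sum discount. Neither assumption comes from the dataset.

```python
def lifetime_value(monthly_churn, monthly_revenue, monthly_cost,
                   monthly_marketing_cost, acquisition_cost,
                   annual_discount_rate):
    """Follow the churn -> duration -> margin -> LTV chain (illustrative)."""
    # Assumption: churn is constant, so expected duration is 1 / churn months.
    expected_months = 1.0 / monthly_churn
    gross_contribution_margin = monthly_revenue - monthly_cost
    net_margin_per_month = gross_contribution_margin - monthly_marketing_cost
    accumulated_margin = net_margin_per_month * expected_months
    # Assumption: discount the total once over the expected duration.
    years = expected_months / 12.0
    discount_factor = (1.0 + annual_discount_rate) ** years
    return (accumulated_margin - acquisition_cost) / discount_factor


# Hypothetical inputs: only the churn rate differs between the two calls.
ltv_high_churn = lifetime_value(0.05, 70.0, 30.0, 5.0, 100.0, 0.08)
ltv_low_churn = lifetime_value(0.02, 70.0, 30.0, 5.0, 100.0, 0.08)
print(f"LTV at 5% monthly churn: {ltv_high_churn:.2f}")
print(f"LTV at 2% monthly churn: {ltv_low_churn:.2f}")
```

With these made-up inputs, the lower churn rate yields the higher LTV, which is the direction of the effect the chain describes.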
Survival Analysis to Predict Customer Churn
Survival Analysis is a powerful statistical method for modeling churn. While traditionally used in medicine to track how long patients “survive” before an event like death, it can be applied to business to understand customer behavior.
By using Survival Analysis, we can gain valuable insights into churn, such as:
- Predicting churn likelihood over time: visualize and understand how the probability of a customer churning changes throughout their subscription.
- Identifying key churn drivers: determine which factors or customer characteristics are most likely to lead to churn.
Survival Analysis provides deep insights into customer behavior, allowing us to:
- Estimate customer duration: calculate the average time customers remain subscribed.
- Compare different customer segments: for example, analyze if male and female customers have different average subscription lengths.
- Validate business assumptions: test whether your understanding of the customer lifecycle aligns with real-world data.
Survival regression, a related technique, takes this further by allowing us to model the relationship between churn, time and specific customer characteristics (known as covariates). This helps us identify and quantify the factors that drive churn.
For example, we can use covariates like gender, age and family status to predict the probability that a specific customer segment – such as “female, non-senior subscribers with dependents” – will remain a customer for a certain period, say, 24 months.
A Worked Example
For this analysis, we use a dataset provided by IBM that details a fictional telecommunications company’s subscriber base. The dataset includes information on 7,043 customers, with 20 variables per customer. The majority of these variables are categorical.
Categorical variables: gender, SeniorCitizen, Partner, Dependents, PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod and Churn.
Numeric variables: tenure, MonthlyCharges and TotalCharges.
Our Survival Analysis focuses on the duration of each subscription, or tenure. Customers are categorized based on their churn status: Churn = 1 for those who opted out during the observation period and Churn = 0 for those who remained subscribed.
As shown in Figure 2, the distribution of monthly charges indicates that a large portion of the subscriber base pays approximately $20 per month.
Modeling Churn with Survival Curves
The survival curve is a fundamental tool in Survival Analysis. It illustrates the probability that a customer will remain subscribed over a given period. As time progresses, the probability of survival decreases and the risk of churn increases.
Proportional Hazards Models
tenure (the subscription duration) and Churn (the event indicator) are the two dependent variables for the Survival Analysis.
We begin with a Weibull Proportional Hazards (PH) model, using two independent variables or covariates:
- Subscribers’ Contract type.
- Whether they have MultipleLines.
The Weibull distribution is a common choice for estimating the survival curve, as it is a natural fit for modeling time-to-event data, such as a machine’s time to failure, a patient’s time to death or a customer’s time to churn.
A covariate is an explanatory variable used in a model, and it can be either categorical (a factor) or continuous. Categorical covariates or factors have levels or categories, which can be either nominal (without a natural order, like gender) or ordinal (with a ranked order, like a rating from low to high). A categorical covariate with two levels or categories is called binary or dichotomous. Continuous covariates, such as age or disposable income, are measured on a scale. Age is a classic example of a time-dependent covariate because its value changes due to the passing of time, whereas disposable income can also change over time (time-varying), but not because of time.
As shown in Table 1, each result should be interpreted in isolation, holding all other covariates constant (ceteris paribus). Each result should also be interpreted on a logarithmic scale, so their direct values are not always intuitive.
Keep in mind that these models and results are based on a fictitious dataset and are for illustrative purposes only.
- Coefficient > 0: a positive coefficient indicates that as the value of the covariate increases, the hazard of the event (churn) also increases, holding all other variables constant.
- Coefficient < 0: a negative coefficient means that as the covariate increases, the hazard of the event (churn) decreases.
- A lower (more negative) coefficient is associated with a lower hazard of the event (churn).
- Coefficient = 0: a coefficient of zero suggests the covariate has no effect on the hazard.
As shown in Table 1, a two-year contract is the most effective at reducing churn risk, with a coefficient of -3.724. In contrast, subscribers without phone service (the No phone service level of MultipleLines) show a much weaker reduction in churn risk: -0.289.
The coefficients log(scale) and log(shape) are the parameters of the Weibull distribution that determine the shape of the survival curve. We will explore these concepts in more detail later.
As shown in Table 2, it is often easier to interpret the exponentiated coefficients. These provide a direct, multiplicative measure of the effect on the hazard.
| | exp(Coef.) |
|---|---|
| ContractOne year | 0.1215119 |
| ContractTwo year | 0.0241375 |
| MultipleLinesNo phone service | 0.7489342 |
| MultipleLinesYes | 0.6973662 |
- Hazard Ratio > 1: the covariate increases the hazard of the event. For example, a hazard ratio of 1.50 means the hazard is 50% higher for a one-unit increase in the variable, holding all other covariates constant.
- Hazard Ratio < 1: the covariate decreases the hazard. A hazard ratio of 0.75 indicates a 25% reduction in hazard for a one-unit increase, holding all other covariates constant.
- Hazard Ratio = 1: the covariate has no effect on the hazard.
The hazard ratio is not the same as probability or the odds.
A hazard ratio is a relative measure. If a covariate has a hazard ratio of 1.10, it means that a customer with that characteristic has a (1.10 - 1) = 0.10, or 10%, higher instantaneous risk of churning at any given moment, holding all other covariates constant (ceteris paribus).
As shown in Table 2, a two-year contract, with a hazard ratio of 0.024, lowers the hazard of churn by 97.6% compared to the baseline, while a one-year contract, with a hazard ratio of 0.122, lowers it by 87.8%, holding all other covariates constant (ceteris paribus).
This means the churn rate is lower for subscribers holding a two-year contract, which in turn extends their expected tenure.
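The step from Table 1 to Table 2, and from a hazard ratio to a percentage change in hazard, is plain arithmetic. The following Python sketch reproduces it from the coefficients in Table 1; the function names are introduced here for illustration and are not part of the original analysis.

```python
import math

# Coefficients copied from Table 1 (Weibull PH model, log-hazard scale).
ph_coefficients = {
    "ContractOne year": -2.1077429,
    "ContractTwo year": -3.7239890,
    "MultipleLinesNo phone service": -0.2891042,
    "MultipleLinesYes": -0.3604447,
}


def hazard_ratio(coefficient):
    """Exponentiate a PH coefficient to get the hazard ratio."""
    return math.exp(coefficient)


def percent_change_in_hazard(coefficient):
    """Percentage change in hazard relative to the baseline level."""
    return (math.exp(coefficient) - 1.0) * 100.0


for name, coef in ph_coefficients.items():
    print(f"{name}: hazard ratio = {hazard_ratio(coef):.4f}, "
          f"change in hazard = {percent_change_in_hazard(coef):.1f}%")
```

A negative percentage is a reduction in the hazard of churn relative to the reference level of that covariate.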
The survival curve (Figure 3) is a vital tool for validating whether the analytical model aligns with your business’s understanding of the customer lifecycle.
As time passes, the number of survivors or subscribers steadily decreases.
The Survival Analysis reveals a significant churn rate, with the survival rate dropping to 50% after just 24 months. Due to this substantial early churn, long-term predictions become less reliable. For example, with fewer than 20% of the original subscribers remaining after 70 months, the model’s ability to make precise forecasts is significantly reduced.
For each covariate, the hazard ratio changes the steepness of the baseline survival curve. A hazard ratio less than 1 makes the curve less steep, indicating a lower risk and better survival. For example, a two-year contract has a positive effect, making the curve less steep (a shallower descent) and showing better survival over time.
In Table 1, the log coefficients (log(scale) and log(shape)) are parameters that define the Weibull distribution used in the PH model.
The shape parameter (\(\lambda\)) is the key metric that dictates how the hazard rate changes over time. It determines whether the risk of an event is increasing, decreasing, or constant throughout the observation period.
\(\lambda\) > 1: the hazard is increasing over time. This indicates an accelerating risk, which is common for things that wear out, like mechanical parts.
\(\lambda\) < 1: the hazard is decreasing over time. This indicates a decelerating risk, which is often seen with customers who become more loyal over time.
\(\lambda\) = 1: the hazard is constant over time. This is a special case known as the Exponential model, where the risk does not change.
- log(shape) > 0 corresponds to \(\lambda\) > 1: the hazard function is increasing.
- log(shape) < 0 corresponds to \(\lambda\) < 1: the hazard function is decreasing.
  - In Table 1, log(shape) = -0.176 < 0 and the hazard function is decreasing. In addition, exponentiating log(shape) gives shape = \(\lambda\) = 0.839 < 1.
- log(shape) = 0 corresponds to \(\lambda\) = 1: the hazard function is constant.
The scale parameter (\(\eta\)) is a time-scaling factor. It defines the overall magnitude of the hazard function and helps determine where the baseline survival curve is located on the time axis.
When comparing two Weibull PH models, a larger scale parameter means the hazard is lower at a given time point, which results in a longer average duration or survival time.
The relation with log(scale) is \(\eta = e^{\log(\text{scale})}\). In a Weibull PH model output (Table 1), the log(scale) parameter is provided: 3.581; and the scale can be calculated: \(\eta\) = 35.909.
Based on the analysis, we can confidently say that the hazard of churn is not constant over time.
The log(shape) coefficient, -0.176, in Table 1 confirms that the hazard curve is decreasing. As shown in Figure 4, the hazard of churn is highest in the first 7 to 10 months. After this initial period, the hazard rapidly declines and then flattens out.
This behavior directly explains the shape of the survival curve in Figure 3: the high initial hazard causes the steep drop in the number of subscribers, while the subsequent decline and flattening of the hazard curve is reflected in the survival curve’s deceleration over time.
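The baseline hazard and survival curves can be evaluated directly from the two Weibull parameters in Table 1. The Python sketch below assumes the common Weibull parameterization \(h(t) = \tfrac{\lambda}{\eta}(t/\eta)^{\lambda - 1}\) and \(S(t) = e^{-(t/\eta)^{\lambda}}\); the fitting software may use an equivalent but differently written form, and the values apply to the baseline group only (reference levels of Contract and MultipleLines).

```python
import math

LOG_SCALE = 3.5809741   # log(scale) from Table 1
LOG_SHAPE = -0.1757588  # log(shape) from Table 1

scale = math.exp(LOG_SCALE)  # eta, the time-scaling factor
shape = math.exp(LOG_SHAPE)  # lambda, governs how the hazard evolves


def baseline_hazard(t):
    # Assumed form: h(t) = (shape / scale) * (t / scale) ** (shape - 1)
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def baseline_survival(t):
    # Assumed form: S(t) = exp(-(t / scale) ** shape)
    return math.exp(-((t / scale) ** shape))


print(f"shape = {shape:.3f}, scale = {scale:.3f}")
for months in (1, 6, 12, 24, 48, 72):
    print(f"t = {months:2d} months: "
          f"hazard = {baseline_hazard(months):.4f}, "
          f"survival = {baseline_survival(months):.3f}")
```

Because the shape is below 1, the printed hazard falls as tenure grows, while the survival probability keeps decreasing at a slowing pace.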
Accelerated Failure Time Models
To confirm our findings, we fit a Weibull Accelerated Failure Time (AFT) model using the same set of independent variables or covariates:
- Subscribers’ Contract type.
- Whether they have MultipleLines.
tenure and Churn remain the two dependent variables.
As shown in Table 3, each result should be interpreted in isolation, holding all other covariates constant (ceteris paribus). Each result should also be interpreted on a log scale, so their direct values are not always intuitive.
| | Coef. |
|---|---|
| (Intercept) | 3.5809741 |
| ContractOne year | 2.5127468 |
| ContractTwo year | 4.4395554 |
| MultipleLinesNo phone service | 0.3446557 |
| MultipleLinesYes | 0.4297043 |
- Coefficient > 0: as the value of the covariate increases, the duration or survival time also increases, holding all other variables constant.
- Coefficient < 0: as the covariate increases, the duration or survival time decreases.
- Coefficient = 0: the covariate has no effect on the duration or survival time.
As shown in Table 3, a two-year contract, with a coefficient of 4.44, has a more substantial effect on duration or survival time than a one-year contract, with a coefficient of 2.51.
This finding is consistent with the results from the Weibull PH model. A higher coefficient in the Weibull AFT model (indicating a longer duration or survival time) corresponds directly to a lower hazard of churn in the Weibull PH model.
As shown in Table 4, it is often easier to interpret the exponentiated coefficients. These provide a direct, multiplicative measure of the effect on duration or survival time.
| | exp(Coef.) |
|---|---|
| (Intercept) | 35.908503 |
| ContractOne year | 12.338776 |
| ContractTwo year | 84.737258 |
| MultipleLinesNo phone service | 1.411504 |
| MultipleLinesYes | 1.536803 |
- Time Ratio > 1: the covariate increases the duration or survival time. For example, a time ratio of 2.0 means the duration is 2 times longer for a one-unit increase in the covariate, holding all other covariates constant.
- Time Ratio < 1: the covariate decreases the duration or survival time. A time ratio of 0.5 means the duration is 2 times shorter for a one-unit increase in the covariate, holding all other covariates constant.
- Time Ratio = 1: the covariate has no effect on duration or survival time.
A two-year contract extends the duration by a factor of 84.737 compared to the baseline, holding all other covariates constant (ceteris paribus).
The time ratio is also called the acceleration factor.
The coefficients from these two model types are inversely related. While the AFT coefficients describe a multiplicative change in duration or survival time, the PH coefficients describe a multiplicative change in the hazard rate.
As shown in Table 5, we also compute the percentage change in duration or survival time from the model’s coefficients: \(\text{percentage change} = (e^{\beta} - 1) \times 100\%\).
| | %Change |
|---|---|
| (Intercept) | 3490.85033 |
| ContractOne year | 1133.87760 |
| ContractTwo year | 8373.72579 |
| MultipleLinesNo phone service | 41.15039 |
| MultipleLinesYes | 53.68030 |
- Percentage change > 0: an increase in the duration or survival time.
- Percentage change < 0: a decrease in the duration or survival time.
- Percentage change = 0: no effect on the duration or survival time.
Compared to the baseline group, having a two-year contract extends the subscription duration by 8373.7%, holding all other covariates constant (ceteris paribus). This means the entire survival curve for this group is stretched out, a concept that is far easier for a non-technical audience to understand than a raw coefficient or time ratio.
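Tables 4 and 5 both follow from the AFT coefficients in Table 3. As a minimal Python sketch (the function names are introduced here for illustration), the time ratio is the exponentiated coefficient and the percentage change applies \((e^{\beta} - 1) \times 100\%\):

```python
import math

# Coefficients copied from Table 3 (Weibull AFT model, log-time scale).
aft_coefficients = {
    "ContractOne year": 2.5127468,
    "ContractTwo year": 4.4395554,
    "MultipleLinesNo phone service": 0.3446557,
    "MultipleLinesYes": 0.4297043,
}


def time_ratio(coefficient):
    """Multiplicative effect on duration (the acceleration factor)."""
    return math.exp(coefficient)


def percent_change_in_duration(coefficient):
    """Percentage change in duration: (exp(beta) - 1) * 100."""
    return (math.exp(coefficient) - 1.0) * 100.0


for name, coef in aft_coefficients.items():
    print(f"{name}: time ratio = {time_ratio(coef):.3f}, "
          f"change in duration = {percent_change_in_duration(coef):.1f}%")
```

A positive percentage means the covariate level stretches the expected duration relative to the baseline group.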
PH Models produce coefficients that measure the effect of covariates on churn. These coefficients quantify how a covariate influences the hazard, which in turn impacts subscription duration.
AFT Models produce coefficients that measure the effect of covariates on duration. These coefficients quantify how a covariate influences the survival time, which in turn impacts the churn rate.
An increase in subscription duration is inversely related to the hazard of churn. A longer subscription period is associated with a lower risk of the event (churn).
For example, two-year contracts are associated with an 8373.7% increase in duration, and therefore a lower hazard of churn, an effect that is greater than that of one-year contracts (1133.9%). This finding is consistent with the results from the Weibull PH model.
Just a Reminder
The parameter estimates for Weibull PH and AFT models have fundamentally different interpretations.
The Proportional Hazards (PH) models are based on the assumption that covariates have a multiplicative effect on the hazard (or risk) of an event. This makes them ideal for comparing how different factors increase or decrease the risk of churn.
The Accelerated Failure Time (AFT) models, by contrast, are based on the assumption that covariates have a multiplicative effect on the duration or survival time. This makes them useful for comparing how different factors extend or shorten the subscription period.
The Weibull PH model can also be used to generate visualizations that depict the hazard rate and survival probability over the subscription duration (Figure 5).
While PH models are commonly used to visualize hazard and survival curves, AFT models are focused on measuring the direct impact on time.
The key distinction is that PH models measure the effect of covariates on the churn rate, which in turn affects duration, whereas AFT models directly measure the effect on duration, which in turn affects the churn rate.
Ultimately, the choice of model depends on the research question. However, the results from both models can be compared using their respective metrics (for example, hazard ratios for PH models and time ratios for AFT models) to provide a consistent view of the underlying dynamics.
The relationship between the Weibull PH model and the Weibull AFT model is unique and directly tied to the shape parameter. The Weibull distribution is the only distribution that satisfies both the PH and AFT assumptions, allowing for a direct conversion between the parameters of the two models.
The key to this relationship is that the shape parameter in a Weibull PH model (\(\lambda_{PH}\)) is the reciprocal of the scale parameter in a Weibull AFT model (\(\sigma_{AFT}\)):
\[\lambda_{PH} = \frac{1}{\sigma_{AFT}}\]
- Weibull PH Model shape parameter, \(\lambda_{PH}\) = 0.839.
- Weibull AFT Model scale parameter, \(\sigma_{AFT}\) = 1.192 and the reciprocal \(\frac{1}{\sigma_{AFT}}\) = 0.839.
The Weibull AFT model confirms the Weibull PH model.
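The reciprocal relationship can be checked against the published tables. The Python sketch below also relies on a standard Weibull identity that is not stated in the text above and is used here as an assumption: each PH coefficient equals the negative AFT coefficient divided by the AFT scale, \(\beta_{PH} = -\beta_{AFT} / \sigma_{AFT}\). The function name `aft_to_ph` is hypothetical.

```python
import math

LOG_SHAPE_PH = -0.1757588  # log(shape) from Table 1

# AFT coefficients from Table 3 and PH coefficients from Table 1.
aft_coefficients = {
    "ContractOne year": 2.5127468,
    "ContractTwo year": 4.4395554,
    "MultipleLinesNo phone service": 0.3446557,
    "MultipleLinesYes": 0.4297043,
}
ph_coefficients = {
    "ContractOne year": -2.1077429,
    "ContractTwo year": -3.7239890,
    "MultipleLinesNo phone service": -0.2891042,
    "MultipleLinesYes": -0.3604447,
}

shape_ph = math.exp(LOG_SHAPE_PH)  # lambda_PH
scale_aft = 1.0 / shape_ph         # sigma_AFT, the reciprocal of lambda_PH


def aft_to_ph(aft_coefficient):
    # Assumed Weibull identity: beta_PH = -beta_AFT / sigma_AFT
    return -aft_coefficient / scale_aft


print(f"lambda_PH = {shape_ph:.3f}, sigma_AFT = {scale_aft:.3f}")
for name, aft_coef in aft_coefficients.items():
    print(f"{name}: converted = {aft_to_ph(aft_coef):.7f}, "
          f"Table 1 = {ph_coefficients[name]:.7f}")
```

If the two fits describe the same Weibull model, the converted values should agree with Table 1 up to rounding.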
Comparison of the Parameters
Weibull PH model, log(scale) and scale.
The scale parameter (\(\eta\)) is a time-scaling factor. It defines the overall magnitude of the hazard function and helps determine where the baseline survival curve is located on the time axis.
When comparing two Weibull PH models, a larger scale parameter is associated with a lower hazard and, consequently, a longer duration or survival time.
The relation with log(scale) is \(\eta = e^{\log(\text{scale})}\). In a Weibull PH model output (Table 1), the log(scale) parameter is provided: 3.581; and the scale can be calculated: \(\eta\) = 35.909.
Weibull PH model, log(shape) and shape.
The shape parameter (\(\lambda\)) is the most important parameter for interpreting the Weibull PH model’s baseline hazard. It describes how the hazard rate changes over time.
\(\lambda\) > 1: the hazard is increasing over time. This indicates an accelerating risk, which is common for things that wear out, like mechanical parts.
\(\lambda\) < 1: the hazard is decreasing over time. This indicates a decelerating risk, which is often seen with customers who become more loyal over time.
\(\lambda\) = 1: the hazard is constant over time. This is a special case known as the Exponential model, where the risk does not change.
- log(shape) > 0 corresponds to \(\lambda\) > 1: the hazard function is increasing.
- log(shape) < 0 corresponds to \(\lambda\) < 1: the hazard function is decreasing.
  - In Table 1, log(shape) = -0.176 < 0 and the hazard function is decreasing. In addition, exponentiating log(shape) gives shape = \(\lambda\) = 0.839 < 1.
- log(shape) = 0 corresponds to \(\lambda\) = 1: the hazard function is constant.
Weibull AFT model, log(scale) and scale.
In the AFT model, the scale parameter (\(\sigma\)) defines the hazard’s behavior over time.
- \(\sigma\) < 1: the hazard is increasing over time. This indicates an accelerating risk, which is common for things that wear out, like mechanical parts.
- \(\sigma\) > 1: the hazard is decreasing over time. This indicates a decelerating risk, which is often seen with customers who become more loyal over time.
  - scale = \(\sigma\) = 1.192 > 1 and the hazard function is decreasing. log(scale) = 0.176.
- \(\sigma\) = 1: the hazard is constant over time.
The shape parameter in a Weibull PH model (\(\lambda_{PH}\)) is the reciprocal of the scale parameter in a Weibull AFT model (\(\sigma_{AFT}\)):
\[\lambda_{PH} = \frac{1}{\sigma_{AFT}}\]
- Weibull PH Model shape parameter, \(\lambda_{PH}\) = 0.839.
- Weibull AFT Model scale parameter, \(\sigma_{AFT}\) = 1.192 and the reciprocal \(\frac{1}{\sigma_{AFT}}\) = 0.839.
Takeaways
It is critical to select the right model to accurately represent the data, as a proper understanding of the hazard and survival functions is essential for making sound business decisions.
Beyond the statistics, the churn rate is a serious business issue with a direct impact on customer lifecycle management, revenues and long-term profitability.
Given the model’s results, any initiative to lower churn is likely to increase long-term profits. The analysis of the Weibull model’s shape parameter revealed that the hazard of churn is decreasing over time, particularly after the first 7 to 10 months. This means that customers who have already stayed longer are actually becoming more loyal and have a lower risk of churning. Therefore, a more effective strategy would be to focus retention efforts and incentives on newer customers, who are in the initial high-risk period, rather than on your most loyal, long-term subscribers.
Comparing Survival Curves
We can also compare survival curves across different customer segments to gain deeper insights into churn behavior.
For example, we might want to know if there is a difference in churn rates between genders.
| Churn | Female (count) | Male (count) | Female (% of all customers) | Male (% of all customers) |
|---|---|---|---|---|
| No | 2549 | 2625 | 36.19 | 37.27 |
| Yes | 939 | 930 | 13.33 | 13.20 |
As shown in Table 6 and Figure 6, it appears that the churn rates for males and females are quite similar.
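The percentages in Table 6 are shares of all 7,043 customers. To compare genders directly, it can help to compute the churn rate within each group. The following Python sketch does this from the counts in Table 6; the helper name `churn_rate` is introduced here for illustration.

```python
# Counts copied from Table 6 (rows: Churn = No / Yes).
counts = {
    "Female": {"No": 2549, "Yes": 939},
    "Male": {"No": 2625, "Yes": 930},
}

total_customers = sum(sum(group.values()) for group in counts.values())


def churn_rate(gender):
    """Share of customers of the given gender who churned."""
    group = counts[gender]
    return group["Yes"] / (group["Yes"] + group["No"])


print(f"Total customers: {total_customers}")
for gender in counts:
    print(f"{gender}: churn rate = {churn_rate(gender):.1%}")
```

The two within-group rates differ by less than one percentage point, which agrees with the reading that churn looks similar across genders.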
While a simple count of total churned customers by gender is a good starting point, it does not account for the duration of their subscriptions or the size of each group. Survival Analysis allows us to move beyond these simple counts to see if the risk of churn changes differently for each segment over time.
tenure and Churn are the two dependent variables.
We run Weibull PH model 2, using the same two independent variables or covariates:
- Subscribers’ Contract type.
- Whether they have MultipleLines.
This time, we stratify the results by gender.
We are interested in the log(scale) parameters for Male and Female.
The scale (\(\eta\)) and log(scale) are time-scaling factors. They define the overall magnitude of the hazard function and help determine where the baseline survival curve is located on the time axis.
When comparing two Weibull PH models or two subgroups, larger scale and log(scale) parameters are associated with a lower hazard and, consequently, a longer duration or survival time.
| | Coef. |
|---|---|
| ContractOne year | -2.1075900 |
| ContractTwo year | -3.7242056 |
| MultipleLinesNo phone service | -0.2893374 |
| MultipleLinesYes | -0.3604831 |
| log(scale):1 | 3.5725884 |
| log(shape):1 | -0.1930597 |
| log(scale):2 | 3.5891540 |
| log(shape):2 | -0.1581253 |
As shown in Table 7, the parameters are not very different, suggesting that gender does not have a meaningful impact on churn.
- Female: log(scale):1.
- Male: log(scale):2.
1 is always associated with the first category or level of factor gender in the dataset.
[1] "class of gender: factor"
[1] "levels of gender: Female" "levels of gender: Male"
Female is the first category or level.
As shown in Figure 7, the survival curves for the subgroups are nearly identical and closely resemble the survival curve of the overall population. This suggests that gender does not have a significant impact on duration or survival time.
Next, what is the difference between customers who have Dependents (children) and those who do not? We run Weibull PH model 3.
| | Coef. |
|---|---|
| ContractOne year | -2.0629410 |
| ContractTwo year | -3.6444719 |
| MultipleLinesNo phone service | -0.3111825 |
| MultipleLinesYes | -0.3909394 |
| log(scale):1 | 3.4529736 |
| log(shape):1 | -0.1876260 |
| log(scale):2 | 3.9539983 |
| log(shape):2 | -0.0737134 |
As shown in Table 8, the parameters are different, suggesting that Dependents (children) does have a meaningful impact on churn.
- No: log(scale):1.
- Yes: log(scale):2.
1 is always associated with the first category or level of factor Dependents in the dataset.
[1] "class of Dependents: factor"
[1] "levels of Dependents: No" "levels of Dependents: Yes"
No is the first category or level.
When comparing two Weibull PH models or two subgroups, larger scale and log(scale) parameters are associated with a lower hazard and, consequently, a longer duration or survival time.
As shown in Figure 8, the survival curve for customers with Dependents (children) is noticeably higher than the baseline curve. This visual evidence suggests that customers with dependents are more loyal and have a significantly higher probability of remaining subscribed at any given point in time.
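The gap between the two strata can also be read off the parameters in Table 8. The Python sketch below assumes the common Weibull form \(S(t) = e^{-(t/\eta)^{\lambda}}\) for each stratum and evaluates the baseline curves only, that is, with Contract and MultipleLines at their reference levels. The function name and the chosen time points are illustrative.

```python
import math

# Stratum-specific parameters copied from Table 8 (1 = No, 2 = Yes).
strata = {
    "No": {"log_scale": 3.4529736, "log_shape": -0.1876260},
    "Yes": {"log_scale": 3.9539983, "log_shape": -0.0737134},
}


def baseline_survival(t, log_scale, log_shape):
    # Assumed form: S(t) = exp(-(t / scale) ** shape)
    scale = math.exp(log_scale)
    shape = math.exp(log_shape)
    return math.exp(-((t / scale) ** shape))


for months in (6, 12, 24, 48):
    s_no = baseline_survival(months, **strata["No"])
    s_yes = baseline_survival(months, **strata["Yes"])
    print(f"t = {months:2d} months: without Dependents = {s_no:.3f}, "
          f"with Dependents = {s_yes:.3f}")
```

Under these assumptions, the stratum with the larger log(scale) sits higher at every time point evaluated, in line with the visual impression described above.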
To determine if this difference is statistically significant, we would perform a log-rank test. The test would likely result in a rejection of the null hypothesis, which states that the two survival curves are identical. This would confirm that the survival curves are, in fact, statistically different.
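From the results, the model has a maximum log(likelihood) of -9289.542 with 4 covariates (4 degrees of freedom).
The test statistic used here is \(-\tfrac{\text{loglik}}{\text{df}}\), where \(\text{loglik}\) is the maximum log(likelihood) and \(\text{df}\) is the number of degrees of freedom. Note that a standard log-rank statistic is instead built from the observed and expected numbers of events in each group, so this value is only a rough indicator.
The resulting statistic is 2322.385, giving a p-value of approximately 0, far below 0.05 or 5%. On this basis, the test is statistically significant and we can reject the null hypothesis: the two survival curves are different.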
For any duration or survival time, the survival rate of those with Dependents (children) is statistically higher and the churn is lower.
Takeaway
Based on these findings, the marketing department should prioritize customer segmentation based on whether subscribers have Dependents (children), as this group has a significantly higher survival rate.
Conversely, the analysis revealed no meaningful difference in churn between genders, so focusing marketing efforts on gender composition would be an inefficient use of resources.
Survival Regression to Predict Customer Churn
Understanding customer tenure is valuable, but to truly manage churn, we need to identify the covariates that influence it. For this purpose, we use Cox Proportional Hazards (PH) regression, a statistical model that allows us to analyze how various customer-specific covariates impact the hazard of churn.
The hazard ratio is not the same as probability or the odds.
A hazard ratio is a relative measure. If a covariate has a hazard ratio of 1.10, it means that a customer with that characteristic has a (1.10 - 1) = 0.10, or 10%, higher instantaneous risk of churning at any given moment, holding all other covariates constant (ceteris paribus).
The primary output of a Cox Proportional Hazards model is the regression coefficients. Exponentiating these coefficients gives us the hazard ratios, which quantify the relative risk of an event (in this case, churn).
Let’s say we’ve fitted a Cox regression model with contract as a covariate, coded as 1 for Yes and 0 for No.
If the hazard ratio for contract is 1.10, it means that at any given time, Yes customers have 1.10 times the hazard of churn of No customers, holding all other covariates constant. This translates to a 10% higher instantaneous risk of churn for Yes than for No.
The key to interpreting hazard ratios is a comparison to 1:
- Hazard Ratio > 1: this means the covariate increases the hazard of the event. For example, a hazard ratio of 1.50 means the hazard is 50% higher for a one-unit increase in the covariate, holding all other covariates constant.
- Hazard Ratio < 1: this means the covariate decreases the hazard. A hazard ratio of 0.75 indicates a 25% reduction in hazard for a one-unit increase, holding all other covariates constant.
- Hazard Ratio = 1: this means the covariate has no effect on the hazard.
Proportional Hazards Assumption
Cox models rely on a fundamental assumption known as proportional hazards. This assumption states that the hazard ratio for all covariates remains constant over time.
This concept can be understood by looking at the Weibull model’s shape parameter. Because there is only one \(\lambda\) coefficient, it defines a single hazard shape for the entire baseline. This ensures that the effect of any covariate is a simple multiplicative factor, making the hazard ratios proportional by definition.
For example, consider the covariate Dependents (1 if the customer has children, 0 otherwise). If the proportional hazards assumption holds, the hazard ratio for this covariate would be consistent. This means that at any point in time – whether it has been 6 months or 12 months since the customer signed up – the relative difference in churn risk between those with and without children is the same.
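The constant ratio can be illustrated with the Weibull PH model fitted earlier. Since no coefficient for Dependents is reported at this point, the Python sketch below uses the ContractTwo year coefficient from Table 1 as a stand-in, and it assumes the common Weibull baseline hazard \(h_0(t) = \tfrac{\lambda}{\eta}(t/\eta)^{\lambda - 1}\). The function name `hazard` is introduced for the example.

```python
import math

LOG_SCALE = 3.5809741        # log(scale) from Table 1
LOG_SHAPE = -0.1757588       # log(shape) from Table 1
COEF_TWO_YEAR = -3.7239890   # ContractTwo year coefficient from Table 1

scale = math.exp(LOG_SCALE)
shape = math.exp(LOG_SHAPE)


def hazard(t, linear_predictor=0.0):
    # PH structure: baseline hazard multiplied by exp(linear predictor).
    baseline = (shape / scale) * (t / scale) ** (shape - 1.0)
    return baseline * math.exp(linear_predictor)


ratios = []
for months in (6, 12, 24):
    ratio = hazard(months, COEF_TWO_YEAR) / hazard(months)
    ratios.append(ratio)
    print(f"t = {months:2d} months: baseline hazard = {hazard(months):.4f}, "
          f"hazard ratio = {ratio:.4f}")
```

The baseline hazard changes from month to month, yet the ratio stays equal to the exponentiated coefficient at every time point, which is what proportional hazards means.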
As shown in Figure 9, we are examining two different scenarios for hazard functions.
On the left, the parallel lines represent a proportional hazard. This means that while the hazard for each group may change over time, the ratio of the hazards between the two groups remains constant. For example, the hazard for customers with Dependents (red line) is always a fixed multiple of the hazard for the other group. This is the assumption that both the Weibull PH model and Cox models rely on. With a decreasing hazard function, we can expect a survival curve that decreases and then flattens out over time, similar to what’s seen in Figure 3.
On the right, the lines are not parallel. This represents a scenario where the proportional hazards assumption is violated, meaning the hazard ratio between the two groups changes over time. With an increasing hazard over time, the survival curve would plunge more steeply over time, rather than flattening out. These kinds of plunging survival curves are often seen in health sciences, such as when observing the survival rate of cancer patients over time.
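The two scenarios described above can be reproduced numerically. The following is a minimal sketch, not the model fitted in this document: the Weibull shape and scale values and the hazard ratio are hypothetical, chosen only to show that the ratio between two hazards stays fixed when they share a shape and drifts when they do not.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical values, for illustration only (not estimates from this document).
SHAPE, SCALE = 0.8, 36.0   # shape < 1 gives a decreasing hazard
HAZARD_RATIO = 0.5         # assumed effect of a binary covariate

def ratio_proportional(t):
    """Both groups share one shape; one hazard is a fixed multiple of the other."""
    baseline = weibull_hazard(t, SHAPE, SCALE)
    return (HAZARD_RATIO * baseline) / baseline

def ratio_non_proportional(t):
    """The groups have different shapes, so the hazard ratio depends on t."""
    return weibull_hazard(t, 1.4, SCALE) / weibull_hazard(t, SHAPE, SCALE)

for months in (6, 12, 24):
    print(months,
          round(ratio_proportional(months), 3),
          round(ratio_non_proportional(months), 3))
```

In the first column of ratios the value is the same at 6, 12 and 24 months; in the second it changes with tenure, which is what a violation of the assumption looks like.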
A statistical test can be used to formally check the proportional hazards assumption. In this case, the test revealed that the following covariates satisfy the assumption: PaperlessBilling, SeniorCitizen, Dependents and gender.
For the other covariates that do not satisfy the assumption, there are several methods to incorporate them into the model:
- Time-dependent covariates, whose values change as time passes. Age is an example: people age purely because of the passing of time.
- Time-varying effects (with or without time lags), where the hazard ratio itself is allowed to change over time, often through interactions with time. This is distinct from a time-varying covariate such as disposable income, whose value can change over time, but not because of time.
- Stratified Cox regression (with or without interactions among the covariates), which allows for a different baseline hazard for each stratum of a categorical covariate. Table 7 and Table 8 are cases of one-factor stratified Weibull PH models (without interactions).
- Pseudo-observations, a more advanced technique that can be used to fit models without the proportional hazards assumption.
Cox Regression Results
We fit a time-constant Cox PH model using the same dependent variables: tenure and Churn. The covariates are PaperlessBilling, SeniorCitizen, Dependents and gender.
A Cox PH model cannot be read in the same way as a Weibull model when it comes to time-related parameters, because the two are fundamentally different types of models. Cox models are semi-parametric: they do not assume a specific shape for the baseline hazard function. In contrast, Weibull models are parametric: they assume the baseline hazard follows a specific distribution (the Weibull distribution). This difference in assumptions is why the Weibull model provides interpretable shape and scale parameters, while the Cox model focuses solely on the hazard ratios.
| | Coef. | p-value |
|---|---|---|
| genderMale | -0.0207941 | 6.53e-01 |
| SeniorCitizen1 | 0.2649922 | 1.20e-06 |
| DependentsYes | -0.7820715 | 0.00e+00 |
| PaperlessBillingYes | 0.6120632 | 0.00e+00 |
As shown in Table 9:
- gender is not a statistically significant covariate, since its p-value is above 0.05 (5%), confirming what we saw in Section 2.6 about churn rates between genders.
- SeniorCitizen, Dependents and PaperlessBilling are statistically significant covariates (p-values below 0.05, or 5%), confirming what we saw in Section 2.6 about the difference in churn rates between customers with and without dependents (children).
As shown in Table 10, the covariate effects on the hazard are quantified by the exponentiated coefficients. These provide a direct, multiplicative measure of the effect on the hazard.
| | exp(Coef.) |
|---|---|
| genderMale | 0.9794206 |
| SeniorCitizen1 | 1.3034208 |
| DependentsYes | 0.4574574 |
| PaperlessBillingYes | 1.8442325 |
- Hazard Ratio > 1: the covariate increases the hazard of the event. For example, a hazard ratio of 1.50 means the hazard is 50% higher for a one-unit increase in the variable, holding all other covariates constant.
- Hazard Ratio < 1: the covariate decreases the hazard. A hazard ratio of 0.75 indicates a 25% reduction in hazard for a one-unit increase, holding all other covariates constant.
- Hazard Ratio = 1: the covariate has no effect on the hazard.
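The link between Table 9 and Table 10 is a plain exponentiation, which can be checked by hand. The sketch below copies the coefficients from Table 9 and recomputes the hazard ratios, then applies the percentage-change formula (hazard ratio minus 1, times 100). It is ordinary Python arithmetic on the published numbers, not the fitting code behind the tables; the dictionary names are introduced here for illustration.

```python
import math

# Coefficients copied from Table 9.
coefficients = {
    "genderMale": -0.0207941,
    "SeniorCitizen1": 0.2649922,
    "DependentsYes": -0.7820715,
    "PaperlessBillingYes": 0.6120632,
}

# Hazard ratio = exp(coefficient); percentage change = (hazard ratio - 1) * 100.
hazard_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}
pct_change = {name: (hr - 1.0) * 100.0 for name, hr in hazard_ratios.items()}

for name in coefficients:
    print(f"{name:20s} HR={hazard_ratios[name]:.4f} change={pct_change[name]:+.1f}%")
```

A negative coefficient always yields a hazard ratio below 1 and a negative percentage change; a positive coefficient yields the opposite.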
However, all the statistically significant covariates are categorical:
- SeniorCitizen: 0 and 1 (no or yes).
- Dependents: No or Yes.
- PaperlessBilling: No or Yes.
Interpreting a continuous covariate is different from interpreting a categorical one because it relates to a one-unit increase in the variable, not a comparison between one or more groups and a baseline.
Suppose the exponentiated coefficient (hazard ratio) for a continuous covariate is 1.30. First, the covariate increases the hazard of the event, since the hazard ratio is greater than 1. Second, we can transform the hazard ratio into a percentage change: (1.30 - 1) × 100% = 30%, as we did with the Weibull AFT model (tbl-weibull.aft.model.pch.1). However, a parametric AFT model measures the effect on duration (survival time), whereas parametric PH models and the Cox PH model measure the effect on the hazard (churn). For a Cox PH model, the percentage change means that for every one-unit increase in that continuous covariate, the hazard of churn increases by 30%, holding all other covariates constant. For our categorical covariates, the takeaways are:
- Senior citizens (SeniorCitizen is 1) have a higher hazard of the event (churn), because the exponentiated coefficient is above 1, holding all other covariates constant.
  - The multiplicative effect is 1.303 compared to the baseline and is fixed over time. The hazard of churning is therefore (1.303 - 1) × 100% = 30.3% higher than for non-senior citizens, holding all other covariates constant.
- Subscribers with Dependents have a lower hazard of the event (churn). The exponentiated coefficient is below 1, holding all other covariates constant.
- Subscribers with PaperlessBilling have a higher hazard of the event (churn). The exponentiated coefficient is above 1, holding all other covariates constant.
Cox PH models do not provide an explicit estimate for an intercept term. This is because the intercept is already implicitly incorporated into the baseline hazard function, which is not directly estimated by the model. The model instead focuses on the relative effects of the covariates on that baseline.
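The absence of an intercept follows from the form of the model. As a sketch, writing \(h_0(t)\) for the baseline hazard and \(\beta_1, \dots, \beta_p\) for the covariate coefficients (notation introduced here for illustration, not taken from the earlier sections), the Cox PH model is:

\[
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\]

If an intercept \(\beta_0\) were added, it could be folded into the baseline without changing the model:

\[
h_0(t)\,\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p) = \bigl[h_0(t)\,e^{\beta_0}\bigr]\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\]

Since \(h_0(t)\) is left unspecified, \(h_0(t)\,e^{\beta_0}\) is just another unspecified baseline, so \(\beta_0\) cannot be estimated separately.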
The hazard function (Figure 10) represents the instantaneous risk of the event (churn) over time, given that the event has not occurred yet. The baseline hazard is the hazard for a subscriber in the reference category (where all covariates equal 0). A covariate's hazard ratio (Table 10) is a constant multiplier that scales the baseline hazard. The hazard ratio for the binary covariate SeniorCitizen is 1.303: when SeniorCitizen is 1, the hazard of churn is 1.303 times the baseline, or 30.3% higher. The hazard curve for each binary covariate (when the level is 1 or Yes) is parallel to the baseline hazard curve (when the level is 0 or No), holding all other covariates constant. In the case of SeniorCitizen, since the hazard ratio is greater than 1, the hazard curve lies above the baseline hazard curve. The curves never cross because the ratio between them is constant (the proportional hazards assumption).
The red dots are the observed values. The green line is the predicted survival curve. Figure 10 reads like a Q-Q plot used to assess normality, which compares a variable's distribution (in red) against the Gaussian or normal distribution (in green). The discrepancy, particularly at the extremes, is a normal part of modeling real-world data. It indicates that the model’s fit is not perfect, which is to be expected.
The survivor function (Figure 11) gives the probability of a subscriber remaining (not experiencing the event of churn) over time. This curve is directly derived from the hazard function. The shaded area, or the two bands around the curve, represents the confidence interval, which provides a range of likely values for the true survival curve and a measure of the uncertainty in the model’s estimate.
Unlike the Weibull PH model (Figure 3 and Figure 4), which can show an initial high hazard that drops and flattens out, a Cox PH model whose baseline hazard is constant would have a survival curve that drops at a steady, exponential rate over time. A constant hazard means the risk of churn is the same regardless of how long a customer has been subscribed. The curve would not show a substantial early drop followed by a flattening, as that behavior is typical of a decreasing hazard, a scenario where the risk of churn lessens over time.
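The contrast between a constant and a decreasing hazard can be made concrete with two closed-form survival functions. This is a minimal sketch with hypothetical parameters (the monthly rate, shape and scale below are not estimates from this document): it computes the share of surviving customers who are retained over the following 12 months, starting from different tenures.

```python
import math

RATE = 0.02  # hypothetical constant monthly hazard

def survival_constant_hazard(t, rate=RATE):
    """S(t) = exp(-rate * t) when the hazard is constant."""
    return math.exp(-rate * t)

def survival_decreasing_hazard(t, shape=0.6, scale=60.0):
    """Weibull survival with shape < 1, i.e. a hazard that decreases over time."""
    return math.exp(-((t / scale) ** shape))

def retained_over_year(survival, start):
    """Share of customers still subscribed at `start` who remain 12 months later."""
    return survival(start + 12) / survival(start)

for start in (0, 12, 36):
    print(start,
          round(retained_over_year(survival_constant_hazard, start), 3),
          round(retained_over_year(survival_decreasing_hazard, start), 3))
```

With a constant hazard the 12-month retention is identical at every tenure; with a decreasing hazard it improves as tenure grows, which produces the early drop followed by a flattening.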
The survival curves for different categories visualize the probability of remaining over time. As shown in Table 10, the group with the higher hazard ratio (when PaperlessBilling is Yes) will have a lower survival curve that descends more rapidly. Conversely, the group with the lower hazard ratio (when Dependents is Yes) will have a higher survival curve that descends more gradually, indicating a longer average survival time.
The survival curves for the different categories are not parallel. They will start at 1.0 (100% survival at time 0) and diverge over time, with the lower-hazard group maintaining a higher survival probability. The difference between the curves is most pronounced when the hazard is highest, and they tend to flatten out as the hazard approaches zero.
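Under proportional hazards, a group's survival curve is the baseline survival raised to the power of its hazard ratio, which is why the curves start together at 1.0 and then diverge. The sketch below illustrates this with the hazard ratios from Table 10 (rounded) and a hypothetical Weibull baseline; the Cox model itself leaves the baseline unspecified, so the baseline shape and scale here are assumptions made only to have something to plot.

```python
import math

def baseline_survival(t, shape=0.8, scale=36.0):
    """Hypothetical Weibull baseline: S0(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def group_survival(t, hazard_ratio):
    """Under proportional hazards, S(t | x) = S0(t) ** hazard_ratio."""
    return baseline_survival(t) ** hazard_ratio

# Hazard ratios rounded from Table 10.
HR_PAPERLESS = 1.844
HR_DEPENDENTS = 0.457

for months in (0, 6, 12, 24, 48, 72):
    print(months,
          round(baseline_survival(months), 3),
          round(group_survival(months, HR_PAPERLESS), 3),
          round(group_survival(months, HR_DEPENDENTS), 3))
```

The group with the hazard ratio above 1 always sits below the baseline and the group with the hazard ratio below 1 always sits above it, but the vertical gap between them is not constant.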
Taking These Insights to Action
Based on the model, we identified three significant covariates of churn (Figure 12). To compare their impact on an equal footing, we display their hazard ratios, which are expressed in positive, absolute terms.
While the hazard ratio for SeniorCitizen may be the lowest in magnitude compared to the other significant covariates, it is still a statistically significant covariate influencing churn.
A more strategic approach is to prioritize resources based on the magnitude of each covariate’s impact. Given that the other covariates have a much larger effect on churn, marketing and retention efforts would likely be more effective if focused on those covariates first. The SeniorCitizen covariate, while statistically relevant, may be a lower priority for initiatives aimed at curbing churn.
As shown in Figure 12, the finding that PaperlessBilling is a significant covariate of churn is not surprising. It suggests a correlation between a preference for paperless billing and a customer segment that may be more tech-savvy and responsive to new product offerings. This group might be more prone to churning for a new contract that includes the latest devices and gadgets.
Given that the majority of the customer base falls into this category, this represents a major business risk. The company should investigate the underlying drivers of churn within this segment to develop targeted retention strategies.
Based on the model, customers with Dependents are a highly valuable segment (Figure 14). Although they represent a smaller portion of the customer base compared to those without children, they are significantly more loyal. The hazard ratio (exp(Coef.)) of 0.457 for this group corresponds to a percentage change ((exp(Coef.) - 1) × 100%) of -54.3%, controlling for other covariates. A negative percentage indicates a decrease in the hazard of churn. This finding suggests a strong business case for retaining this segment. The company should prioritize efforts to identify and cater to these customers, perhaps with family-oriented plans or loyalty programs that recognize their value.
Conclusion
Survival analysis is a versatile tool for addressing a variety of business situations:
- Business Planning: it can be used to forecast the monthly number of customer churn events and to monitor current lapse rates.
- Customer Segmentation: the model can predict each customer’s time to the next purchase, which helps in identifying and differentiating ‘active’ versus ‘inactive’ customers.
- Campaign Evaluation: it allows for the precise evaluation of marketing campaigns by monitoring their effect on customer survival rates.
Ultimately, these applications all have a direct impact on the Customer Lifetime Value (CLV, also known as LTV). Furthermore, this analysis can be extended to evaluate the potential of win-back initiatives by estimating the LTV of reactivated customers.
Additional Resources
The survival package documentation.
The eha (event history analysis) package documentation.
For multiple comparisons, ggplot2 does facet layouts.
When fitting a time-varying Cox proportional hazards regression model, tidyr is useful for reshaping wide data.frames into long data.frames.
With the mstate package, we can model multistate and competing risks. LTV and win-back analyses imply multiple states.
Dayne Batten’s blog posts on survival analysis in the business setting.
Barry Leventhal’s presentation with business insights.