This project is an Exploratory Data Analysis (EDA) of a synthetically generated music-streaming dataset. The goal was to analyze customer behavior and subscription data to identify the main drivers of user engagement. The analysis was structured around six questions:
- Listening Habits by Subscription/Demographics: How do the key engagement metrics (listening time and songs per day) vary across subscription types and user demographics such as gender, age, and country?
- Device Usage Impact: Is there a measurable difference in engagement (e.g., total listening time) between users primarily using Mobile, Web, or Desktop devices?
- The Skip Factor: What is the correlation between a user's skip rate and their overall level of engagement? Does a high skip rate suggest content dissatisfaction or merely more active playlist curation?
- Ad Impact on Free Users: How does the frequency of ads listened per week affect the listening behavior (listening time, songs per day) of Free users? Is there an optimal ad load before engagement declines?
- Value of Offline Listening: Does the availability of offline listening significantly correlate with higher overall usage and retention, suggesting it's a valued Premium feature?
- Subscription Demographics: What are the distinct demographic profiles (age, country) of users who opt for specialized plans, such as Student and Family, compared to standard Premium plans?
The exploratory analysis revealed several counter-intuitive findings about user engagement, plan value, and demographic patterns within this dataset.
| Findings | Insight |
|---|---|
| Habits by Subscription/Demographics | All plans (Premium, Free, Family, Student) show near-identical average listening times (approx. 151-155 min/week). Demographics (gender, age, country) also show minimal variation, indicating that engagement is consistent across the board regardless of plan tier or user profile. |
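The per-group averages behind this finding amount to a grouped mean. Below is a minimal sketch in plain Python, assuming each user is a dict with hypothetical column names (`subscription_type`, `gender`, `listening_time`) and using toy rows rather than the project's data. With pandas, the equivalent would be `df.groupby("subscription_type")["listening_time"].mean()`.

```python
from collections import defaultdict


def group_mean(rows, key, value):
    """Return {group: mean of `value`} for each distinct value of `key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


def spread(means):
    """Gap between the highest and lowest group mean (0 means identical groups)."""
    return max(means.values()) - min(means.values())


# Toy rows for illustration only; the column names are assumptions, not the dataset's schema.
rows = [
    {"subscription_type": "Free", "gender": "Female", "listening_time": 150.0},
    {"subscription_type": "Free", "gender": "Male", "listening_time": 160.0},
    {"subscription_type": "Premium", "gender": "Female", "listening_time": 152.0},
    {"subscription_type": "Premium", "gender": "Male", "listening_time": 158.0},
]

by_plan = group_mean(rows, "subscription_type", "listening_time")
by_gender = group_mean(rows, "gender", "listening_time")
print(by_plan, spread(by_plan))
print(by_gender, spread(by_gender))
```

A spread that is small relative to the overall mean is what "minimal variation" looks like numerically. Repeating the call with `country` as the key covers another demographic cut.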

| Findings | Insight |
|---|---|
| Device Usage Impact | Users on Desktop and Web show marginally higher average listening time than Mobile users. The gap is small enough that engagement may depend more on the user's context (e.g., working at a desk) than on the device itself. |
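One way to check whether a "marginal" gap between devices matters is a standardized effect size. The sketch below computes Cohen's d, which is the difference in means divided by the pooled standard deviation. This technique is swapped in for illustration and is not necessarily what the original analysis used. The two input lists are toy per-user listening times for hypothetical device groups.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * stdev(a) ** 2 + (n_b - 1) * stdev(b) ** 2) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Toy listening times (minutes) for two hypothetical device groups.
desktop = [150.0, 160.0, 170.0]
mobile = [149.0, 159.0, 169.0]

d = cohens_d(desktop, mobile)
print(round(d, 3))
```

Under Cohen's common rule of thumb, |d| below about 0.2 counts as a small effect. This gives one way to make "the difference is small" concrete.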
Visualization of Device Usage:

| Findings | Insight |
|---|---|
| The Skip Factor | The correlation between skip rate and both listening time and songs per day is virtually zero (r ≈ -0.01). This does not support the idea that frequent skippers are either frustrated users or highly engaged curators; skip rate appears independent of overall usage. |
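The r value quoted here is a Pearson correlation coefficient. Below is a self-contained sketch of the computation, assuming two equal-length numeric columns such as hypothetical `skip_rate` and `listening_time` columns. Python 3.10+ also ships `statistics.correlation`, and pandas offers `Series.corr`.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Toy values: listening time rises and falls while skip rate climbs steadily,
# so the linear association is essentially zero.
skip_rate = [0.1, 0.2, 0.3, 0.4]
listening_time = [150.0, 160.0, 160.0, 150.0]
print(round(pearson_r(skip_rate, listening_time), 6))
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly linear pair
```

An r near zero rules out only a linear relationship. A nonlinear pattern (for example, engagement peaking at moderate skip rates) would need a scatter plot or binned means to detect.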

| Findings | Insight |
|---|---|
| Ad Impact on Free Users | Engagement metrics (listening time and songs per day) fluctuate widely across the range of 5 to 50 ads per week without a clear trend. There is no observable optimal ad load or drop-off point, suggesting that ad frequency, as currently implemented, does not meaningfully change average Free-user behavior. |
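Looking for an "optimal ad load" usually means bucketing ad exposure and comparing mean engagement per bucket. A sustained drop after some bucket would mark a threshold. The sketch below assumes hypothetical column names (`subscription_type`, `ads_listened_per_week`, `listening_time`) and an arbitrary 10-ad band width, and uses toy rows.

```python
from collections import defaultdict


def mean_by_band(rows, x_key, y_key, width):
    """Mean of `y_key` per fixed-width band of `x_key`, ordered by band."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        low = (row[x_key] // width) * width
        band = (low, low + width - 1)  # inclusive integer band, e.g. (10, 19)
        sums[band] += row[y_key]
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sorted(sums)}


# Toy rows (hypothetical columns); only Free users are kept, since the question concerns that tier.
rows = [
    {"subscription_type": "Free", "ads_listened_per_week": 5, "listening_time": 140.0},
    {"subscription_type": "Free", "ads_listened_per_week": 8, "listening_time": 160.0},
    {"subscription_type": "Free", "ads_listened_per_week": 12, "listening_time": 170.0},
    {"subscription_type": "Free", "ads_listened_per_week": 15, "listening_time": 150.0},
    {"subscription_type": "Free", "ads_listened_per_week": 48, "listening_time": 145.0},
    {"subscription_type": "Premium", "ads_listened_per_week": 0, "listening_time": 155.0},
]
free_users = [r for r in rows if r["subscription_type"] == "Free"]
bands = mean_by_band(free_users, "ads_listened_per_week", "listening_time", width=10)
for (low, high), avg in bands.items():
    print(f"{low}-{high} ads/week: {avg:.1f} min")
```

Bands that contain few users produce noisy means, which is one reason per-bucket engagement can look volatile. Reporting the count per band alongside the mean helps separate noise from a real trend.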

| Findings | Insight |
|---|---|
| Value of Offline Listening | The feature does not correlate with higher usage (listening time is slightly lower) and is associated with a slightly higher churn rate (…). |
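The comparison behind this finding can be reproduced by splitting users on the offline-listening flag and computing the churn rate and mean listening time for each group. The sketch also adds a pooled two-proportion z-test. This is one way to judge whether a "slightly higher" churn rate is more than noise, and it is not necessarily what the original analysis did. The column names (`offline_listening`, `is_churned`, `listening_time`) and the 0/1 encoding are assumptions, and the rows are toy data.

```python
import math


def summarize_by_flag(rows, flag_key, churn_key, time_key):
    """Per value of a 0/1 flag: user count, churn rate, and mean listening time."""
    summary = {}
    for flag in (0, 1):
        group = [r for r in rows if r[flag_key] == flag]
        if not group:
            continue
        summary[flag] = {
            "n": len(group),
            "churn_rate": sum(r[churn_key] for r in group) / len(group),
            "mean_listening_time": sum(r[time_key] for r in group) / len(group),
        }
    return summary


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for p1 - p2 using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("z is undefined when the pooled proportion is 0 or 1")
    return (p1 - p2) / se


# Toy rows with hypothetical columns: (offline_listening, is_churned, listening_time)
raw = [(1, 1, 150.0), (1, 0, 150.0), (1, 0, 140.0), (1, 0, 160.0),
       (0, 0, 155.0), (0, 0, 155.0), (0, 0, 150.0), (0, 0, 160.0)]
rows = [{"offline_listening": f, "is_churned": c, "listening_time": t} for f, c, t in raw]

summary = summarize_by_flag(rows, "offline_listening", "is_churned", "listening_time")
on, off = summary[1], summary[0]
z = two_proportion_z(on["churn_rate"], on["n"], off["churn_rate"], off["n"])
print(summary)
print(f"z = {z:.2f}")
```

When |z| is below roughly 1.96, the difference is not significant at the conventional 5% level (two-sided). The toy numbers here fall in that range.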
Visualization of Offline Listening Impact:

| Findings | Insight |
|---|---|
| Subscription Demographics | Specialized plans (Student, Family) show no distinct demographic profile in terms of age or country. Mean age is uniform across plans (…). |
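A demographic profile per plan reduces to the mean age and the country mix within each subscription type. Below is a minimal sketch with hypothetical column names (`subscription_type`, `age`, `country`) and toy rows.

```python
from collections import Counter, defaultdict


def profile_by_plan(rows, plan_key="subscription_type", age_key="age", country_key="country"):
    """Mean age and share of users per country, for each plan."""
    ages = defaultdict(list)
    countries = defaultdict(Counter)
    for row in rows:
        ages[row[plan_key]].append(row[age_key])
        countries[row[plan_key]][row[country_key]] += 1
    profile = {}
    for plan, plan_ages in ages.items():
        total = len(plan_ages)
        profile[plan] = {
            "mean_age": sum(plan_ages) / total,
            "country_share": {c: n / total for c, n in sorted(countries[plan].items())},
        }
    return profile


# Toy rows for illustration only; column names are assumptions.
rows = [
    {"subscription_type": "Student", "age": 20, "country": "US"},
    {"subscription_type": "Student", "age": 24, "country": "DE"},
    {"subscription_type": "Family", "age": 30, "country": "US"},
    {"subscription_type": "Family", "age": 40, "country": "US"},
    {"subscription_type": "Premium", "age": 33, "country": "DE"},
    {"subscription_type": "Premium", "age": 37, "country": "US"},
]
for plan, stats in profile_by_plan(rows).items():
    print(plan, stats)
```

In practice, "no distinct demographic profile" means each specialized plan's mean age and country shares sit close to Premium's. Comparing full age distributions, rather than only means, would make the check stricter.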
Visualization of Subscription Demographics:

Based on the data, the following recommendations are suggested:
- Investigate Deeper Drivers: Since age, gender, country, and device explain little of the variance in engagement, future analysis should focus on variables such as content genre preference, playlist interaction, or time of day to find the underlying behavioral drivers.
- Rethink Premium Value: The slightly higher churn and the absence of a usage boost associated with the Offline Listening feature should be investigated. Marketing efforts should emphasize other benefits (such as ad-free listening) as the primary value propositions.
- Optimize Ad Strategy: Since engagement shows no clear relationship with ad load, the ad strategy could shift from optimizing engagement to maximizing revenue per user, as there appears to be no behavioral cost to higher ad frequency within the observed range of 5 to 50 ads per week.






