Machine Learning: Customer Churn
Introduction
This article is part of a project for Udacity's “Become a Data Scientist” Nanodegree. The Jupyter Notebook with the code for this project can be downloaded from GitHub.
I will write a series of articles about this project, following the CRISP-DM process. This part covers the business understanding and data understanding steps.
Business Understanding
Let’s imagine for a moment that we are freshly hired data scientists working for a startup called “Sparkify”, which offers a music streaming service through its website and app.
Our first job is to prepare a presentation for the management meeting on business strategy. The meeting starts in a few hours, and we have about 10 minutes for our presentation.
Clearly we want to impress our managers with our machine learning skills, but there is simply no time to clean all the data, let alone run machine learning on the huge 12 GB log covering the last two months of user activity.
We decide to take about 1% of users from the log and prepare some statistical analysis and visualisations to answer the questions we expect our managers to be most interested in, such as:
- Usage patterns
- Business development
- Threats to the business
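Sampling about 1% of users, rather than 1% of log lines, keeps the full history of every sampled user. The following is only a minimal sketch of one way to do that, not the notebook's actual code: it assumes each log record carries a `userId` field (an assumed field name), and `in_sample` is a hypothetical helper. A stable hash is used because Python's built-in `hash()` for strings changes between runs.

```python
import hashlib

def in_sample(user_id: str, percent: int = 1) -> bool:
    """Return True for roughly `percent`% of user IDs, deterministically."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

# Hypothetical log: 1000 users with 5 events each ("userId" is an assumed field name).
events = [{"userId": str(i % 1000), "page": "NextSong"} for i in range(5000)]

# Keep every event that belongs to a sampled user.
sample = [e for e in events if in_sample(e["userId"])]
sampled_users = {e["userId"] for e in sample}
```

Because the decision depends only on the user ID, the same users are selected on every run and in every chunk of the log.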
1. Usage patterns
As a streaming service, we would of course like to know how many songs are played every day:
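A count like this could be produced along the following lines. This is a sketch under assumptions: the field names `page` and `ts` (milliseconds since the epoch), the page value `"NextSong"` for a played song, and the sample records are all illustrative and are not taken from the article.

```python
from collections import Counter
from datetime import datetime, timezone

def to_ms(year, month, day, hour=0):
    """Build a millisecond epoch timestamp (UTC) for the sample data."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)

# Hypothetical log records.
events = [
    {"userId": "1", "page": "NextSong", "ts": to_ms(2018, 10, 1, 8)},
    {"userId": "2", "page": "NextSong", "ts": to_ms(2018, 10, 1, 9)},
    {"userId": "1", "page": "Home", "ts": to_ms(2018, 10, 1, 10)},
    {"userId": "1", "page": "NextSong", "ts": to_ms(2018, 10, 2, 8)},
]

def songs_per_day(events, song_page="NextSong"):
    """Count song-play events per calendar day (UTC)."""
    counts = Counter()
    for e in events:
        if e["page"] == song_page:
            day = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).date()
            counts[day] += 1
    return dict(sorted(counts.items()))

daily = songs_per_day(events)
```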

We can see that only about half as many songs are played around weekends and, unsurprisingly, there is a large spike around Halloween. To get a better feeling for the usage frequency, let’s look at the average number of unique users per weekday:
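The weekday average takes two steps: first count the distinct users on each calendar day, then average those daily counts per weekday. Here is a small sketch with made-up `(date, userId)` pairs; the function name and the data are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical (date, userId) pairs, one per logged event.
activity = [
    (date(2018, 10, 1), "1"), (date(2018, 10, 1), "2"), (date(2018, 10, 1), "1"),  # a Monday
    (date(2018, 10, 8), "1"),  # the following Monday
    (date(2018, 10, 6), "3"),  # a Saturday
]

def avg_unique_users_per_weekday(activity):
    """Average the number of distinct users per day, grouped by weekday name."""
    users_by_day = defaultdict(set)
    for day, user in activity:
        users_by_day[day].add(user)
    counts_by_weekday = defaultdict(list)
    for day, users in users_by_day.items():
        counts_by_weekday[WEEKDAYS[day.weekday()]].append(len(users))
    return {weekday: mean(counts) for weekday, counts in counts_by_weekday.items()}

averages = avg_unique_users_per_weekday(activity)
```

Counting distinct users per day before averaging matters: averaging raw event counts would measure activity volume rather than the number of people.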

Another interesting question is the distribution of user activity throughout the day. Let’s have a look at the average number of songs played by the hour:

And the user activity:

Summary usage statistics
Let’s formulate the key insights from our analysis:
- We have seen that usage statistics follow a weekly pattern, with fewer users using Sparkify on weekends.
- Unsurprisingly, there is a spike in streams around Halloween.
- Throughout the day the number of users remains almost constant, with a slight increase between 1 and 7 p.m.
- The number of songs played per user throughout the day follows a pattern of daily activities: getting up, commuting to work, starting work, lunch break, etc.
More importantly, we need to know what we can do with these insights:
- Knowing how many songs will be played, we can optimise licence costs.
- Based on user activity, we can optimise the number of servers running throughout the day and week to save electricity and networking costs.
- We can time our user communication for the periods when users are most likely to use our service.
2. Business development
The main revenue source for Sparkify is the recurring subscription fee from paying users. We would like to know how many users have actually used the “paid” option and how many have used the “free” option:
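One way to get these numbers is to count the distinct users seen at each account level in each calendar week. This is a sketch only: the field names `userId`, `level` and `date`, the level values and the sample records are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log records; a user who upgrades appears under both levels that week.
events = [
    {"userId": "1", "level": "free", "date": date(2018, 10, 1)},
    {"userId": "1", "level": "paid", "date": date(2018, 10, 3)},
    {"userId": "2", "level": "free", "date": date(2018, 10, 2)},
    {"userId": "2", "level": "free", "date": date(2018, 10, 9)},
]

def users_per_week_and_level(events):
    """Count distinct users per (ISO week number, account level)."""
    users = defaultdict(set)
    for e in events:
        week = e["date"].isocalendar()[1]  # ISO week number
        users[(week, e["level"])].add(e["userId"])
    return {key: len(ids) for key, ids in sorted(users.items())}

weekly_users = users_per_week_and_level(events)
```

Note that a user who switches level within a week is counted once under each level, which matches the question of who has "used" each option.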

Another source of revenue is playing advertising clips to free users. How many clips are played every week?

Let’s also see how many ads on average are displayed to each user:
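The total number of adverts and the average per user can move in opposite directions, so it helps to compute both side by side. In this sketch the average is taken over the users who were shown at least one advert in the week; that choice of denominator, the function name and the data are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical (ISO week, userId) pairs, one per advert shown.
adverts = [
    (40, "1"), (40, "1"), (40, "2"), (40, "2"),
    (41, "2"), (41, "2"), (41, "2"),
]

def adverts_per_user(adverts):
    """Return {week: (total adverts, adverts per user who saw an advert)}."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for week, user in adverts:
        totals[week] += 1
        users[week].add(user)
    return {week: (totals[week], totals[week] / len(users[week])) for week in sorted(totals)}

weekly_adverts = adverts_per_user(adverts)
```

In the made-up data the total falls from 4 to 3 adverts while the per-user average rises from 2.0 to 3.0, because fewer users remain to receive them.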

Summary business development
Let’s formulate the key insights and takeaways for our business.
Key insights
- The number of paying customers is increasing over the observation period.
- The number of adverts is decreasing.
- The number of free customers is decreasing.
Takeaways for business
- The number of paying customers does not change much after the first week. We probably need to motivate people to switch to a paid account with a limited-time offer or a free trial.
- The number of free customers is decreasing at quite a high rate. It seems that the free account is not very attractive. We have to look at the reasons more closely. Are the adverts too frequent? Do free users have limited access to the music titles?
- Although the total number of adverts is falling, the number of adverts per user is increasing. Perhaps we have taken the wrong road here, given that free users are probably choosing to leave the service rather than upgrade their accounts.
3. Threats to the business
Finally, let’s look at the account-level upgrades, downgrades and cancellations:
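Weekly counts of these three event types can be sketched as follows. The page names `"Submit Upgrade"`, `"Submit Downgrade"` and `"Cancellation Confirmation"` and the `page` and `date` fields are assumptions about the log format, and the records are made up.

```python
from collections import Counter
from datetime import date

# Assumed mapping from log page names to the event kinds we want to count.
ACCOUNT_EVENTS = {
    "Submit Upgrade": "upgrade",
    "Submit Downgrade": "downgrade",
    "Cancellation Confirmation": "cancellation",
}

# Hypothetical log records.
events = [
    {"page": "Submit Upgrade", "date": date(2018, 10, 2)},
    {"page": "Submit Upgrade", "date": date(2018, 10, 4)},
    {"page": "NextSong", "date": date(2018, 10, 4)},
    {"page": "Submit Downgrade", "date": date(2018, 10, 10)},
    {"page": "Cancellation Confirmation", "date": date(2018, 10, 16)},
]

def weekly_account_events(events):
    """Count upgrades, downgrades and cancellations per ISO week number."""
    counts = Counter()
    for e in events:
        kind = ACCOUNT_EVENTS.get(e["page"])
        if kind is not None:  # skip pages that are not account changes
            counts[(e["date"].isocalendar()[1], kind)] += 1
    return dict(counts)

weekly_events = weekly_account_events(events)
```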

To get a clearer picture, let’s see which account level users have when they cancel their account:
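If each log record stores the account level in effect at the time of the event, the breakdown is a single count over the cancellation events. In this sketch the field names, the `"Cancellation Confirmation"` page name and the records are assumptions for illustration.

```python
from collections import Counter

# Hypothetical log records; "level" is the account level at the time of the event.
events = [
    {"userId": "1", "page": "NextSong", "level": "paid"},
    {"userId": "1", "page": "Cancellation Confirmation", "level": "paid"},
    {"userId": "2", "page": "Cancellation Confirmation", "level": "free"},
    {"userId": "3", "page": "Cancellation Confirmation", "level": "paid"},
]

def cancellations_by_level(events, cancel_page="Cancellation Confirmation"):
    """Count cancellation events per account level."""
    return Counter(e["level"] for e in events if e["page"] == cancel_page)

by_level = cancellations_by_level(events)
```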

Summary business threats
Let’s formulate the key insights and takeaways for our business.
Key insights
- The number of upgrades spiked in the first week of observation.
- The number of upgrades declines over the period of observation.
- The number of downgrades has a small spike in week 41 and is otherwise almost steady, with a decline near the end.
- The number of cancellations is almost steady, with a small spike around week 42 and a decline near the end.
- Paying users are cancelling their accounts more often than free users.
Takeaways for business
- Whatever we did in week 40, we must keep doing it!
- We need to understand why fewer and fewer customers choose to upgrade their accounts.
- Although the downgrade and cancellation rates are falling, we need to pay more attention to them.
- The fact that paying users are choosing to cancel their accounts rather than downgrade them is alarming. What have we done wrong to make them angry?
Conclusion: can we identify reasons for churn?
The presentation went well. Most of the people in the room did not have a technical background. They were impressed by the comprehensive visualisations and the clearly formulated statements about the current situation.
As a consequence, the management is now worried about churn. They ask us to find the reasons why customers, especially paying ones, are cancelling their accounts.
We will have to run machine learning on our data. It will take some days to find the right techniques on the small subset of data, and then maybe some weeks to run the algorithms on the full dataset.
Using our intuition, we can try to find a quick fix that may help our company at short notice. Let’s look at the statistics of rolling adverts:

It turns out that paying customers may still see or hear an advert. Could this be the reason why they choose to quit? Perhaps our web developers should look into that issue.
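A quick check of this kind could compute, for each account level, the share of users who were shown at least one advert while at that level. The sketch below assumes the field names `userId`, `page` and `level` and the page value `"Roll Advert"`; the records are made up and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical log records; "level" is the account level at the time of the event.
events = [
    {"userId": "1", "page": "NextSong", "level": "paid"},
    {"userId": "1", "page": "Roll Advert", "level": "paid"},
    {"userId": "2", "page": "NextSong", "level": "paid"},
    {"userId": "3", "page": "Roll Advert", "level": "free"},
]

def advert_exposure_by_level(events, advert_page="Roll Advert"):
    """Share of users per account level who saw at least one advert at that level."""
    all_users = defaultdict(set)
    exposed = defaultdict(set)
    for e in events:
        all_users[e["level"]].add(e["userId"])
        if e["page"] == advert_page:
            exposed[e["level"]].add(e["userId"])
    return {level: len(exposed[level]) / len(users) for level, users in all_users.items()}

exposure = advert_exposure_by_level(events)
```

Any non-zero share for the paid level would confirm that adverts reach paying customers, although it would not by itself show that adverts cause cancellations.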
In my next article I will focus on machine learning techniques and how they can be applied to predict churn based on usage statistics.
Translated from: https://medium.com/@viovioviovioviovio/predict-churn-with-machine-learning-ea00b8a42011