Predicting Churn with Machine Learning

Introduction

This article is part of a project for the Udacity “Become a Data Scientist” Nanodegree. The Jupyter Notebook with the code for this project can be downloaded from GitHub.

I will write a series of articles about this project that follows the CRISP-DM process. This part covers the business understanding and data understanding steps.

Business Understanding

Let’s imagine for a moment that we are freshly hired data scientists working for a startup called “Sparkify”, which offers a music streaming service through its website and app.

Our first job is to prepare a presentation for the management meeting on business strategy. The meeting starts in a few hours, and we will have about 10 minutes to present.

Clearly we want to impress our managers with our machine learning skills, but there is simply no time to clean all the data, let alone run machine learning on the huge 12 GB log of the last two months of user activity.

We decide to take about 1% of the users from the log and prepare some statistical analysis and visualisations to answer the questions we expect our managers to be most interested in, such as:

  1. Usage patterns
  2. Business development
  3. Threats to the business
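Taking about 1% of the users, rather than 1% of the log lines, matters because every per-user statistic needs the complete history of each sampled user. The following is a minimal sketch of one way to do this, by hashing the user ID into 100 buckets. The record layout and the field names `user_id` and `page` are assumptions made for illustration; they are not taken from the project notebook.

```python
import hashlib

def in_sample(user_id: str, percent: int = 1) -> bool:
    """Return True if the user falls into one of the first `percent` buckets.

    Hashing the ID (instead of drawing random log lines) keeps every event
    of a sampled user, and the same user is always selected or rejected.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

# Hypothetical log: 1000 users with 5 events each (field names are assumed).
events = [{"user_id": str(i % 1000), "page": "NextSong"} for i in range(5000)]

# Keep only the events that belong to sampled users.
sample = [e for e in events if in_sample(e["user_id"])]
sampled_users = {e["user_id"] for e in sample}
print(len(sampled_users), "users,", len(sample), "events in the sample")
```

The share of users actually selected is only roughly 1%, since it depends on how the hash values happen to fall.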

1. Usage patterns

As a streaming service, we would of course like to know how many songs are played every day:

[Figure: number of songs played per day]
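A daily count like this can be sketched in a few lines. The example below assumes that each log record carries an epoch timestamp in milliseconds (`ts`) and a page name, and that a song play is logged as the page `NextSong`. Both the field names and the sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical events; timestamps are epoch milliseconds.
events = [
    {"ts": 1538352000000, "page": "NextSong"},  # 2018-10-01 00:00 UTC
    {"ts": 1538355600000, "page": "NextSong"},  # 2018-10-01 01:00 UTC
    {"ts": 1538438400000, "page": "NextSong"},  # 2018-10-02 00:00 UTC
    {"ts": 1538442000000, "page": "Home"},      # not a song play
]

def songs_per_day(events):
    """Count song plays per calendar day (UTC)."""
    counts = Counter()
    for e in events:
        if e["page"] == "NextSong":
            day = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).date()
            counts[day] += 1
    return counts

daily = songs_per_day(events)
for day, n in sorted(daily.items()):
    print(day, n)
```

Grouping by `day.weekday()` instead of by `day` would give the weekday view in the same way.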

We can see that only about half as many songs are played around weekends and, unsurprisingly, there is a large spike around Halloween. To get a better feeling for the usage frequency, let’s look at the average number of unique users per weekday:

[Figure: average number of unique users per weekday]

Another interesting question is how user activity is distributed throughout the day. Let’s have a look at the average number of songs played per hour:

[Figure: average number of songs played by hour of day]

And the user activity:

[Figure: user activity by hour of day]
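The two hourly views can be derived from the same pass over the song plays. This is a sketch under assumptions: the input is a list of hypothetical `(user id, epoch seconds)` pairs, hours are taken in UTC, and totals over the whole log are used (dividing by the number of observed days would turn them into daily averages).

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical song plays: (user id, epoch timestamp in seconds).
plays = [
    ("u1", 1538380800),  # 2018-10-01 08:00 UTC
    ("u1", 1538381400),  # 2018-10-01 08:10 UTC
    ("u2", 1538382000),  # 2018-10-01 08:20 UTC
    ("u2", 1538398800),  # 2018-10-01 13:00 UTC
]

def hourly_activity(plays):
    """Return {hour: (songs, unique users, songs per user)}."""
    songs = defaultdict(int)
    users = defaultdict(set)
    for user_id, ts in plays:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        songs[hour] += 1
        users[hour].add(user_id)
    return {
        h: (songs[h], len(users[h]), songs[h] / len(users[h]))
        for h in sorted(songs)
    }

activity = hourly_activity(plays)
for hour, (n_songs, n_users, per_user) in activity.items():
    print(f"{hour:02d}h songs={n_songs} users={n_users} per_user={per_user:.2f}")
```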

Summary usage statistics

Let’s formulate the key insights from our analysis:

  • Usage statistics follow a weekly pattern, with fewer users using Sparkify on weekends.
  • Unsurprisingly, there is a spike in streams around Halloween.
  • Throughout the day the number of users remains almost constant, with a slight increase between 1 and 7 p.m.
  • The number of songs played per user follows the rhythm of daily activities: getting up, commuting to work, starting work, lunch break, etc.

More important is knowing what we can do with these insights:

  • We can optimise licence costs by knowing how many songs will be played.
  • We can adjust the number of servers running throughout the day and week to user activity, saving electricity and networking costs.
  • We can time our user communication for the periods when users are most likely to use our service.

2. Business development

The main revenue source for Sparkify is the periodic subscription fee from paying users. We would like to know how many users actually used the “paid” option and how many used the “free” option:

[Figure: number of users on the paid and free options]

Another source of revenue is playing advertising clips to free users. How many clips are played every week?

[Figure: number of advertising clips played per week]

Let’s also see how many ads on average are shown to each user:

[Figure: average number of adverts per user]
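The average number of adverts per user is the weekly advert total divided by the number of distinct users who saw at least one advert that week. The sketch below assumes hypothetical `(user id, date)` advert records and groups them by ISO week number. Whether the original analysis used ISO weeks is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical advert events: (user id, date the advert was shown).
adverts = [
    ("u1", date(2018, 10, 1)),
    ("u1", date(2018, 10, 3)),
    ("u2", date(2018, 10, 4)),
    ("u1", date(2018, 10, 9)),
]

def adverts_per_user_by_week(adverts):
    """Return {ISO week: adverts shown / distinct users who saw an advert}."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for user_id, day in adverts:
        week = day.isocalendar()[1]  # ISO week number
        totals[week] += 1
        users[week].add(user_id)
    return {w: totals[w] / len(users[w]) for w in sorted(totals)}

per_user = adverts_per_user_by_week(adverts)
print(per_user)
```

Tracking the total and the per-user ratio separately is what lets the two move in opposite directions: the total can fall while the ratio rises if the number of users shrinks faster than the number of adverts.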

Summary business development

Let’s formulate the key insights and takeaways for our business.

Key insights

  • The number of paying customers increases over the observation period.
  • The number of adverts decreases.
  • The number of free customers decreases.

Takeaways for business

  • The number of paying customers does not change much after the first week. We probably need to motivate people to switch to a paid account with a limited-time offer or a free trial.
  • The number of free customers is decreasing at quite a high rate. It seems that the free account is not very attractive. We have to look at the reasons more closely. Are the adverts too frequent? Do free users have limited access to the music titles?
  • Although the total number of adverts is falling, the number of adverts per user is increasing. Perhaps we have taken the wrong road here, given that free users are probably choosing to leave the service rather than upgrade their accounts?

3. Threats to the business

Finally, let’s look at the account-level upgrades, downgrades and cancellations:

[Figure: account upgrades, downgrades and cancellations]

To get a clearer picture, let’s see which account level the users who cancel their accounts have:

[Figure: cancellations by account level]
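Splitting cancellations by account level amounts to counting the cancellation events per level. In the sketch below, the tuple layout and the page name `Cancellation Confirmation` are assumptions made for illustration and may differ from the actual log schema.

```python
from collections import Counter

# Hypothetical events: (user id, account level at that moment, page name).
events = [
    ("u1", "free", "NextSong"),
    ("u1", "paid", "NextSong"),
    ("u1", "paid", "Cancellation Confirmation"),
    ("u2", "free", "Cancellation Confirmation"),
    ("u3", "paid", "Cancellation Confirmation"),
]

def cancellations_by_level(events, cancel_page="Cancellation Confirmation"):
    """Count cancellation events per account level."""
    return Counter(level for _, level, page in events if page == cancel_page)

by_level = cancellations_by_level(events)
print(by_level.most_common())
```

Raw counts only compare fairly if the two groups are of similar size. Dividing each count by the number of users on that level gives a cancellation rate.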

Summary business threats

Let’s formulate the key insights and takeaways for our business.

Key insights

  • The number of upgrades spiked in the first week of observation.
  • The number of upgrades declines over the observation period.
  • The number of downgrades has a small spike in week 41 and is otherwise almost steady, with a decline near the end.
  • The number of cancellations is almost steady, with a small spike around week 42 and a decline near the end.
  • Paying users cancel their accounts more often than free users.

Takeaways for business

  • Whatever we did in week 40, we must keep doing it!
  • We need to understand why fewer and fewer customers choose to upgrade their accounts.
  • Although the downgrade and cancellation rates are falling, we need to pay more attention to them.
  • The fact that paying users choose to cancel their accounts rather than downgrade them is alarming. What have we done wrong to make them angry?

Conclusion: can we identify reasons for churn?

The presentation went well. Most of the people in the room did not have a technical background. They were impressed by the comprehensive visualisations and the clearly formulated statements about the current situation.

As a consequence, the management is now worried about churn. They ask us to find the reasons why customers, especially paying ones, are cancelling their accounts.

We will have to run machine learning on our data. It will take some days to find the right techniques on the small subset of data, and then maybe some weeks to run the algorithms on the full dataset.

Using our intuition, we can try to find a quick fix that may help our company at short notice. Let’s look at the statistics of rolling adverts:

[Figure: statistics of rolling adverts]

It turns out that paying customers may still see or hear an advert. Could this be the reason why they choose to quit? Perhaps our web developers should look into that issue.

In my next article I will focus on machine learning techniques and how they can be applied to predict churn based on usage statistics.

Translated from: https://medium.com/@viovioviovioviovio/predict-churn-with-machine-learning-ea00b8a42011
