Facebook Advertising Analysis
Is our company’s Facebook advertising even worth the effort?
QUESTION:
A company would like to know if their advertising is effective. Before you start: yes, Facebook does have analytics for users who actually utilize its advertising platform. Our customer does not. Their “advertisements” are posts on their feed and are not promoted by Facebook.
DATA:
Data is from the client’s POS system and their Facebook feed.
MODEL:
KISS. A simple linear model will suffice.
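Concretely, "simple linear model" here means the one-regressor OLS specification we end up fitting in R later on:

Sales_i = b0 + b1 * Post_i + e_i

where Post_i is a 0/1 indicator for whether the company posted on Facebook on day i, and b1 is the average dollar effect of a post.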
First, we need to obtain our data. We can use a nice Facebook scraper to pull the most recent posts in a usable format.
#install & load scraper
!pip install facebook_scraper

from facebook_scraper import get_posts
import pandas as pd
Let's first scrape the posts from their Facebook page (the scraper walks up to 200 pages of the feed).
#scrape
post_list = []
for post in get_posts('clients_facebook_page', pages=200):
    post_list.append(post)

#View the data
print(post_list[0].keys())
print("Number of Posts: {}".format(len(post_list)))

## dict_keys(['post_id', 'text', 'post_text', 'shared_text', 'time', 'image', 'video', 'video_thumbnail', 'likes', 'comments', 'shares', 'post_url', 'link'])
## Number of Posts: 38
Let's clean up the list, keeping only Time, Image, Likes, Comments, and Shares.
post_list_cleaned = []
indexes_to_keep = ['time', 'image', 'likes', 'comments', 'shares']
for post in post_list:
    #keep only the fields we care about
    temp = []
    for key in indexes_to_keep:
        temp.append(post[key])
    post_list_cleaned.append(temp)

#Replace the image hyperlink with a 0/1 indicator & recast the timestamp as a date
for post in post_list_cleaned:
    if post[1] is None:
        post[1] = 0
    else:
        post[1] = 1
    post[0] = post[0].date()  # .date() with parens, so we store a date rather than a bound method
We now need to combine the Facebook data with data from the company’s POS system.
#turn into a DataFrame
fb_posts_df = pd.DataFrame(post_list_cleaned)
fb_posts_df.columns = ['Date', 'image', 'likes', 'comments', 'shares']

#import our POS data
daily_sales_df = pd.read_csv('daily_sales.csv')

#merge both sets of data
combined_df = pd.merge(daily_sales_df, fb_posts_df, on='Date', how='outer')
Finally, let's export the data to a CSV. We'll do our modeling in R.
combined_df.to_csv('data.csv')
R Analysis
First, let's import the data we produced in Python. We then need to ensure the variables are cast appropriately (i.e., dates are actual date fields and not just strings). Finally, we are only concerned with data since the start of 2019.
library(readr)
library(ggplot2)
library(gvlma)

data <- read.table("data.csv", header = TRUE, sep = ",")
data <- as.data.frame(data)

#rename 'i..Date' to 'Date'
names(data)[1] <- c("Date")

#set data types
data$Sales <- as.numeric(data$Sales)
data$Date <- as.Date(data$Date, "%m/%d/%Y")
data$Image <- as.factor(data$Image)
data$Post <- as.factor(data$Post)

#create a set of only 2019+ data
data_PY = data[data$Date >= '2019-01-01',]

head(data)
head(data_PY)
summary(data_PY)

##       Date                Sales      Post    Image       Likes
##  Min.   :2019-01-02   Min.   :3181   0:281   0:287   Min.   :  0.000
##  1st Qu.:2019-04-12   1st Qu.:3370   1: 64   1: 58   1st Qu.:  0.000
##  Median :2019-07-24   Median :3456                   Median :  0.000
##  Mean   :2019-07-24   Mean   :3495                   Mean   :  3.983
##  3rd Qu.:2019-11-02   3rd Qu.:3606                   3rd Qu.:  0.000
##  Max.   :2020-02-15   Max.   :4432                   Max.   :115.000
##     Comments          Shares
##  Min.   : 0.0000   Min.   :0
##  1st Qu.: 0.0000   1st Qu.:0
##  Median : 0.0000   Median :0
##  Mean   : 0.3101   Mean   :0
##  3rd Qu.: 0.0000   3rd Qu.:0
##  Max.   :19.0000   Max.   :0
Now that our data’s in, let’s review our summary. We can see our data starts on Jan. 2, 2019 (as we hoped), but we do see one slight problem. When we look at the Post variable, we see it’s highly imbalanced. We have 281 days with no posts and only 64 days with posts. We should re-balance our dataset before doing more analysis to ensure our results aren’t skewed. I’ll rebalance our data by sampling from the larger group (days with no posts), known as undersampling. I’ll also set a random seed so that our numbers are reproducible.
set.seed(15)
zeros = data_PY[data_PY$Post == 0,]
#draw 64 no-post days at random (345 total days - 281 no-post days = 64 post days)
samples = sample(281, size = (345 - 281), replace = FALSE)
zeros = zeros[samples, ]
balanced = rbind(zeros, data_PY[data_PY$Post == 1,])
summary(balanced$Post)

##  0  1 
## 64 64
Perfect: now our data is balanced. We should also do some EDA on our dependent variable (daily sales). It's a good idea to know what our distribution looks like and whether there are outliers we should address.
hist(balanced$Sales)
boxplot(balanced$Sales)
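To put a number on what the plots show, we can also compute the sample skewness by hand (a minimal sketch; packages like e1071 offer the same statistic, but this keeps dependencies down):

#sample skewness as the third standardized moment (using the sample s.d.);
#a positive value confirms a right skew in Sales
s <- balanced$Sales
mean((s - mean(s))^3) / sd(s)^3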
We can see our data is slightly skewed. Sadly, real-world data is never a perfect normal distribution… Luckily, though, we appear to have no outliers in our boxplot. Now we can begin modeling. Since we're interested in understanding the dynamics of the system rather than in classifying or predicting, we'll use a standard regression model.
model1 <- lm(data=balanced, Sales ~ Post)
summary(model1)

## 
## Call:
## lm(formula = Sales ~ Post, data = balanced)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -316.22 -114.73  -29.78  111.17  476.49 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3467.1       20.5 169.095  < 2e-16 ***
## Post1           77.9       29.0   2.687  0.00819 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 164 on 126 degrees of freedom
## Multiple R-squared:  0.05418, Adjusted R-squared:  0.04667 
## F-statistic: 7.218 on 1 and 126 DF,  p-value: 0.008193

gvlma(model1)

## 
## Call:
## lm(formula = Sales ~ Post, data = balanced)
## 
## Coefficients:
## (Intercept)        Post1  
##      3467.1         77.9  
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05 
## 
## Call:
##  gvlma(x = model1) 
## 
##                         Value p-value                   Decision
## Global Stat         8.351e+00 0.07952    Assumptions acceptable.
## Skewness            6.187e+00 0.01287 Assumptions NOT satisfied!
## Kurtosis            7.499e-01 0.38651    Assumptions acceptable.
## Link Function      -1.198e-13 1.00000    Assumptions acceptable.
## Heteroscedasticity  1.414e+00 0.23435    Assumptions acceptable.
Using a standard linear model, we obtain a result that says, on average, a Facebook post increases daily sales by $77.90. Based on the t-statistic and p-value, this result is highly statistically significant. We can use the gvlma package to check that our model passes the underlying OLS assumptions. Here, we pass on all counts except skewness. We already identified earlier that skewness may be a problem with our data. A common correction for skewness is a log transformation. Let's transform our dependent variable and see if it helps. Note that this model (a log-lin model) will produce coefficients with different interpretations than our last model.
model2 <- lm(data=balanced, log(Sales) ~ Post)
summary(model2)

## 
## Call:
## lm(formula = log(Sales) ~ Post, data = balanced)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.092228 -0.032271 -0.007508  0.032085  0.129686 
## 
## Coefficients:
##             Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 8.150154   0.005777 1410.673  < 2e-16 ***
## Post1       0.021925   0.008171    2.683  0.00827 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04622 on 126 degrees of freedom
## Multiple R-squared:  0.05406, Adjusted R-squared:  0.04655 
## F-statistic: 7.201 on 1 and 126 DF,  p-value: 0.008266

gvlma(model2)

## 
## Call:
## lm(formula = log(Sales) ~ Post, data = balanced)
## 
## Coefficients:
## (Intercept)        Post1  
##     8.15015      0.02193  
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05 
## 
## Call:
##  gvlma(x = model2) 
## 
##                         Value p-value                   Decision
## Global Stat         7.101e+00 0.13063    Assumptions acceptable.
## Skewness            4.541e+00 0.03309 Assumptions NOT satisfied!
## Kurtosis            1.215e+00 0.27030    Assumptions acceptable.
## Link Function      -6.240e-14 1.00000    Assumptions acceptable.
## Heteroscedasticity  1.345e+00 0.24614    Assumptions acceptable.

plot(model2)
hist(log(balanced$Sales))
Our second model produces another highly significant coefficient for the Facebook post variable. Here we see that each post is associated with an average 2.19% increase in daily sales. Unfortunately, even the log transformation was unable to fully correct the skewness in our data. We'll have to note this when presenting our findings later. Let's now determine how much that 2.19% is actually worth (since we saw in model 1 that a post was worth $77.90).
mean_sales_no_post <- mean(balanced$Sales[balanced$Post == 0])
mean_sales_with_post <- mean(balanced$Sales[balanced$Post == 1])

mean_sales_no_post * model1$coefficients['Post1']

##    Post1 
## 270086.1
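A quick aside on the arithmetic: the product printed above pairs the no-post mean with model 1's dollar coefficient, which isn't quite the number we need. The $76.02 quoted next comes from applying model 2's proportional effect to the no-post baseline; and because model 2 is log-lin, the exact percentage effect of the dummy is exp(b) - 1, which for a coefficient this small is nearly identical to b itself (a sketch; outputs omitted):

#dollar value implied by model 2's percentage effect:
#roughly 3467.10 * 0.021925, i.e. about $76.02
mean_sales_no_post * model2$coefficients['Post1']

#exact percentage effect of the Post dummy in a log-lin model:
#100 * (exp(0.021925) - 1) is about 2.22%, vs. the 2.19% approximation
100 * (exp(model2$coefficients['Post1']) - 1)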
Very close. Model 2's coefficient equates to $76.02, which is very similar to our $77.90. Let's now run another test to see if we get similar results. In analytics, it's always helpful to arrive at the same conclusion via different means when possible; it helps solidify our results. Here we can run a standard t-test. Yes, yes, for the other analysts reading: a t-test is built on the same statistic used in the OLS output (hence the t-statistic it produces). Here, however, let's run it on the unbalanced dataset to ensure we didn't miss anything in sampling our data (perhaps we happened to sample unusually good or unusually bad days, which would skew our results).
t_test <- t.test(data_PY$Sales[data_PY$Post == 1], data_PY$Sales[data_PY$Post == 0])
t_test

## 
##  Welch Two Sample t-test
## 
## data:  data_PY$Sales[data_PY$Post == 1] and data_PY$Sales[data_PY$Post == 0]
## t = 2.5407, df = 89.593, p-value = 0.01278
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   13.3264 108.9259
## sample estimates:
## mean of x mean of y 
##  3544.970  3483.844

summary(t_test)

##             Length Class  Mode     
## statistic   1      -none- numeric  
## parameter   1      -none- numeric  
## p.value     1      -none- numeric  
## conf.int    2      -none- numeric  
## estimate    2      -none- numeric  
## null.value  1      -none- numeric  
## stderr      1      -none- numeric  
## alternative 1      -none- character
## method      1      -none- character
## data.name   1      -none- character

ggplot(data = data_PY, aes(Post, Sales, color = Post)) +
  geom_boxplot() +
  geom_jitter()
Again, we receive a promising result. On all of the data, our t-statistic was 2.54, meaning we reject the null hypothesis that the difference in means between the two groups is 0. The t-test produces a 95% confidence interval of [13.33, 108.93]. This interval includes our previous estimates, once again giving us some added confidence. So now we know our Facebook posts are actually benefiting the business by generating additional daily sales. However, we also saw earlier that our company hasn't been posting regularly (which is why we had to rebalance the data).
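As a quick check of that agreement, we can place model 1's point estimate next to the Welch interval before turning to the posting gap (a small sketch reusing objects already in the session):

#the Welch 95% interval from the t-test output: 13.33 .. 108.93
t_test$conf.int

#model 1's dollar estimate for a post ($77.90) falls comfortably inside it
model1$coefficients['Post1']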
length(data_PY$Sales[data_PY$Post == 0])

## [1] 281

ggplot(data = data.frame(post = as.factor(c('No Post', 'Post')), m = c(280, 54)),
       aes(x = post, y = m)) +
  geom_bar(stat = 'identity', fill = 'dodgerblue3')
Let's create a function that takes two inputs: 1) the percentage of additional days advertised, and 2) the percentage of advertisements that were effective. The reason for the second argument is that it's likely unrealistic to assume all of our ads are effective. Indeed, we likely face diminishing returns with more posts (as people probably tire of seeing them, or block them in their feed if they become too frequent). Limiting effectiveness gives us a more reasonable estimate of lost revenue. Another benefit of creating a custom function is that we can quickly re-run the calculations if management desires.
#construct a 90% confidence interval around the Post1 coefficient
conf_interval = confint(model2, "Post1", .90)

missed_revenue <- function(pct_addlt_adv, pct_effective){
  #lo/mid/hi revenue estimates from the interval bounds and the point estimate
  lo  = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[1]
  mid = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * model2$coefficients['Post1']
  hi  = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[2]
  print(paste(pct_addlt_adv * 280, "additional days of advertising"))
  sprintf("$%.2f -- $%.2f -- $%.2f", max(lo, 0), mid, hi)
}

#missed_revenue(% of additional days advertised, % of advertisements that were effective)
missed_revenue(.5, .7)

## [1] "140 additional days of advertising"
## [1] "$2849.38 -- $7449.57 -- $12049.75"
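One nice property of wrapping this in a function is that alternative scenarios cost one line each. For instance, a more conservative ask (output omitted, since it depends on the objects fitted above):

#a more conservative scenario: post on a quarter of the missed days,
#with only half of those posts actually effective
missed_revenue(.25, .5)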
So if our company had posted on half of the days they didn't, and only 70% of those ads were effective, they missed out on an average of $7,449.57 in additional revenue.
Originally published at http://lowhangingfruitanalytics.com on August 21, 2020.