Facebook Advertising Analysis
Is our company’s Facebook advertising even worth the effort?
QUESTION:
A company would like to know if their advertising is effective. Before you start: yes, Facebook does have analytics for users who actually utilize its advertising platform. Our customer does not. Their “advertisements” are posts on their feed and are not promoted by Facebook.
DATA:
Data is from the client’s POS system and their Facebook feed.
MODEL:
KISS. A simple linear model will suffice.
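Concretely, "simple linear model" here means the one-regressor OLS specification we end up fitting in R later on:

Sales_i = b0 + b1 * Post_i + e_i

where Post_i is a 0/1 indicator for whether the company posted on Facebook on day i, and b1 is the average dollar effect of a post.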
First, we need to obtain our data. We can use a nice Facebook scraper to pull the most recent posts in a usable format.
#install & load scraper
!pip install facebook_scraper

from facebook_scraper import get_posts
import pandas as pd
Let's first scrape the posts from their Facebook page (the scraper walks up to 200 pages of the feed).
#scrape
post_list = []
for post in get_posts('clients_facebook_page', pages=200):
    post_list.append(post)

#View the data
print(post_list[0].keys())
print("Number of Posts: {}".format(len(post_list)))

## dict_keys(['post_id', 'text', 'post_text', 'shared_text', 'time', 'image', 'video', 'video_thumbnail', 'likes', 'comments', 'shares', 'post_url', 'link'])
## Number of Posts: 38
Let's clean up the list, keeping only Time, Image, Likes, Comments, and Shares.
post_list_cleaned = []
indexes_to_keep = ['time', 'image', 'likes', 'comments', 'shares']
for post in post_list:
    #keep only the fields we care about
    temp = []
    for key in indexes_to_keep:
        temp.append(post[key])
    post_list_cleaned.append(temp)

#Replace the image hyperlink with a 0/1 indicator & recast the timestamp as a date
for post in post_list_cleaned:
    if post[1] is None:
        post[1] = 0
    else:
        post[1] = 1
    post[0] = post[0].date()  # .date() with parens, so we store a date rather than a bound method
We now need to combine the Facebook data with data from the company’s POS system.
#turn into a DataFrame
fb_posts_df = pd.DataFrame(post_list_cleaned)
fb_posts_df.columns = ['Date', 'image', 'likes', 'comments', 'shares']

#import our POS data
daily_sales_df = pd.read_csv('daily_sales.csv')

#merge both sets of data
combined_df = pd.merge(daily_sales_df, fb_posts_df, on='Date', how='outer')
Finally, let's export the data to a CSV. We'll do our modeling in R.
combined_df.to_csv('data.csv')
R Analysis
First, let's import the data we produced in Python. We then need to ensure the variables are cast appropriately (i.e., dates are actual date fields and not just strings). Finally, we are only concerned with data since the start of 2019.
library(readr)
library(ggplot2)
library(gvlma)

data <- read.table("data.csv", header = TRUE, sep = ",")
data <- as.data.frame(data)

#rename 'i..Date' to 'Date'
names(data)[1] <- c("Date")

#set data types
data$Sales <- as.numeric(data$Sales)
data$Date <- as.Date(data$Date, "%m/%d/%Y")
data$Image <- as.factor(data$Image)
data$Post <- as.factor(data$Post)

#create a set of only 2019+ data
data_PY = data[data$Date >= '2019-01-01',]

head(data)
head(data_PY)
summary(data_PY)

##       Date                Sales      Post    Image       Likes
##  Min.   :2019-01-02   Min.   :3181   0:281   0:287   Min.   :  0.000
##  1st Qu.:2019-04-12   1st Qu.:3370   1: 64   1: 58   1st Qu.:  0.000
##  Median :2019-07-24   Median :3456                   Median :  0.000
##  Mean   :2019-07-24   Mean   :3495                   Mean   :  3.983
##  3rd Qu.:2019-11-02   3rd Qu.:3606                   3rd Qu.:  0.000
##  Max.   :2020-02-15   Max.   :4432                   Max.   :115.000
##     Comments          Shares
##  Min.   : 0.0000   Min.   :0
##  1st Qu.: 0.0000   1st Qu.:0
##  Median : 0.0000   Median :0
##  Mean   : 0.3101   Mean   :0
##  3rd Qu.: 0.0000   3rd Qu.:0
##  Max.   :19.0000   Max.   :0
Now that our data’s in, let’s review our summary. We can see our data starts on Jan. 2, 2019 (as we hoped), but we do see one slight problem. When we look at the Post variable, we see it’s highly imbalanced. We have 281 days with no posts and only 64 days with posts. We should re-balance our dataset before doing more analysis to ensure our results aren’t skewed. I’ll rebalance our data by sampling from the larger group (days with no posts), known as undersampling. I’ll also set a random seed so that our numbers are reproducible.
set.seed(15)
zeros = data_PY[data_PY$Post == 0,]
#draw 64 no-post days at random (345 total days - 281 no-post days = 64 post days)
samples = sample(281, size = (345 - 281), replace = FALSE)
zeros = zeros[samples, ]
balanced = rbind(zeros, data_PY[data_PY$Post == 1,])
summary(balanced$Post)

##  0  1 
## 64 64
Perfect: now our data is balanced. We should also do some EDA on our dependent variable (daily sales). It's a good idea to know what our distribution looks like and whether there are outliers we should address.
hist(balanced$Sales)
boxplot(balanced$Sales)
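To put a number on what the plots show, we can also compute the sample skewness by hand (a minimal sketch; packages like e1071 offer the same statistic, but this keeps dependencies down):

#sample skewness as the third standardized moment (using the sample s.d.);
#a positive value confirms a right skew in Sales
s <- balanced$Sales
mean((s - mean(s))^3) / sd(s)^3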
We can see our data is slightly skewed. Sadly, real-world data is never a perfect normal distribution… Luckily, though, we appear to have no outliers in our boxplot. Now we can begin modeling. Since we're interested in understanding the dynamics of the system rather than in classifying or predicting, we'll use a standard regression model.
model1 <- lm(data=balanced, Sales ~ Post)
summary(model1)

## 
## Call:
## lm(formula = Sales ~ Post, data = balanced)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -316.22 -114.73  -29.78  111.17  476.49 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3467.1       20.5 169.095  < 2e-16 ***
## Post1           77.9       29.0   2.687  0.00819 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 164 on 126 degrees of freedom
## Multiple R-squared:  0.05418, Adjusted R-squared:  0.04667 
## F-statistic: 7.218 on 1 and 126 DF,  p-value: 0.008193

gvlma(model1)

## 
## Call:
## lm(formula = Sales ~ Post, data = balanced)
## 
## Coefficients:
## (Intercept)        Post1  
##      3467.1         77.9  
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05 
## 
## Call:
##  gvlma(x = model1) 
## 
##                         Value p-value                   Decision
## Global Stat         8.351e+00 0.07952    Assumptions acceptable.
## Skewness            6.187e+00 0.01287 Assumptions NOT satisfied!
## Kurtosis            7.499e-01 0.38651    Assumptions acceptable.
## Link Function      -1.198e-13 1.00000    Assumptions acceptable.
## Heteroscedasticity  1.414e+00 0.23435    Assumptions acceptable.
Using a standard linear model, we obtain a result that says, on average, a Facebook post increases daily sales by $77.90. Based on the t-statistic and p-value, this result is highly statistically significant. We can use the gvlma package to check that our model passes the underlying OLS assumptions. Here, we pass on all counts except skewness. We already identified earlier that skewness may be a problem with our data. A common correction for skewness is a log transformation. Let's transform our dependent variable and see if it helps. Note that this model (a log-lin model) will produce coefficients with different interpretations than our last model.
model2 <- lm(data=balanced, log(Sales) ~ Post)
summary(model2)

## 
## Call:
## lm(formula = log(Sales) ~ Post, data = balanced)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.092228 -0.032271 -0.007508  0.032085  0.129686 
## 
## Coefficients:
##             Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 8.150154   0.005777 1410.673  < 2e-16 ***
## Post1       0.021925   0.008171    2.683  0.00827 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04622 on 126 degrees of freedom
## Multiple R-squared:  0.05406, Adjusted R-squared:  0.04655 
## F-statistic: 7.201 on 1 and 126 DF,  p-value: 0.008266

gvlma(model2)

## 
## Call:
## lm(formula = log(Sales) ~ Post, data = balanced)
## 
## Coefficients:
## (Intercept)        Post1  
##     8.15015      0.02193  
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05 
## 
## Call:
##  gvlma(x = model2) 
## 
##                         Value p-value                   Decision
## Global Stat         7.101e+00 0.13063    Assumptions acceptable.
## Skewness            4.541e+00 0.03309 Assumptions NOT satisfied!
## Kurtosis            1.215e+00 0.27030    Assumptions acceptable.
## Link Function      -6.240e-14 1.00000    Assumptions acceptable.
## Heteroscedasticity  1.345e+00 0.24614    Assumptions acceptable.

plot(model2)
hist(log(balanced$Sales))
Our second model produces another highly significant coefficient for the Facebook post variable. Here we see that each post is associated with an average 2.19% increase in daily sales. Unfortunately, even the log transformation was unable to fully correct the skewness in our data. We'll have to note this when presenting our findings later. Let's now determine how much that 2.19% is actually worth (since we saw in model 1 that a post was worth $77.90).
mean_sales_no_post <- mean(balanced$Sales[balanced$Post == 0])
mean_sales_with_post <- mean(balanced$Sales[balanced$Post == 1])

mean_sales_no_post * model1$coefficients['Post1']

##    Post1 
## 270086.1
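A quick aside on the arithmetic: the product printed above pairs the no-post mean with model 1's dollar coefficient, which isn't quite the number we need. The $76.02 quoted next comes from applying model 2's proportional effect to the no-post baseline; and because model 2 is log-lin, the exact percentage effect of the dummy is exp(b) - 1, which for a coefficient this small is nearly identical to b itself (a sketch; outputs omitted):

#dollar value implied by model 2's percentage effect:
#roughly 3467.10 * 0.021925, i.e. about $76.02
mean_sales_no_post * model2$coefficients['Post1']

#exact percentage effect of the Post dummy in a log-lin model:
#100 * (exp(0.021925) - 1) is about 2.22%, vs. the 2.19% approximation
100 * (exp(model2$coefficients['Post1']) - 1)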
Very close. Model 2's coefficient equates to $76.02, which is very similar to our $77.90. Let's now run another test to see if we get similar results. In analytics, it's always helpful to arrive at the same conclusion via different means when possible; it helps solidify our results. Here we can run a standard t-test. Yes, yes, for the other analysts reading: a t-test is built on the same statistic used in the OLS output (hence the t-statistic it produces). Here, however, let's run it on the unbalanced dataset to ensure we didn't miss anything in sampling our data (perhaps we happened to sample unusually good or unusually bad days, which would skew our results).
t_test <- t.test(data_PY$Sales[data_PY$Post == 1], data_PY$Sales[data_PY$Post == 0])
t_test

## 
##  Welch Two Sample t-test
## 
## data:  data_PY$Sales[data_PY$Post == 1] and data_PY$Sales[data_PY$Post == 0]
## t = 2.5407, df = 89.593, p-value = 0.01278
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   13.3264 108.9259
## sample estimates:
## mean of x mean of y 
##  3544.970  3483.844

summary(t_test)

##             Length Class  Mode     
## statistic   1      -none- numeric  
## parameter   1      -none- numeric  
## p.value     1      -none- numeric  
## conf.int    2      -none- numeric  
## estimate    2      -none- numeric  
## null.value  1      -none- numeric  
## stderr      1      -none- numeric  
## alternative 1      -none- character
## method      1      -none- character
## data.name   1      -none- character

ggplot(data = data_PY, aes(Post, Sales, color = Post)) +
  geom_boxplot() +
  geom_jitter()
Again, we receive a promising result. On all of the data, our t-statistic was 2.54, meaning we reject the null hypothesis that the difference in means between the two groups is 0. The t-test produces a 95% confidence interval of [13.33, 108.93]. This interval includes our previous estimates, once again giving us some added confidence. So now we know our Facebook posts are actually benefiting the business by generating additional daily sales. However, we also saw earlier that our company hasn't been posting regularly (which is why we had to rebalance the data).
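As a quick check of that agreement, we can place model 1's point estimate next to the Welch interval before turning to the posting gap (a small sketch reusing objects already in the session):

#the Welch 95% interval from the t-test output: 13.33 .. 108.93
t_test$conf.int

#model 1's dollar estimate for a post ($77.90) falls comfortably inside it
model1$coefficients['Post1']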
length(data_PY$Sales[data_PY$Post == 0])

## [1] 281

ggplot(data = data.frame(post = as.factor(c('No Post', 'Post')), m = c(280, 54)),
       aes(x = post, y = m)) +
  geom_bar(stat = 'identity', fill = 'dodgerblue3')
Let's create a function that takes two inputs: 1) the percentage of additional days advertised, and 2) the percentage of advertisements that were effective. The reason for the second argument is that it's likely unrealistic to assume all of our ads are effective. Indeed, we likely face diminishing returns with more posts (as people probably tire of seeing them, or block them in their feed if they become too frequent). Limiting effectiveness gives us a more reasonable estimate of lost revenue. Another benefit of creating a custom function is that we can quickly re-run the calculations if management desires.
#construct a 90% confidence interval around the Post1 coefficient
conf_interval = confint(model2, "Post1", .90)

missed_revenue <- function(pct_addlt_adv, pct_effective){
  #lo/mid/hi revenue estimates from the interval bounds and the point estimate
  lo  = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[1]
  mid = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * model2$coefficients['Post1']
  hi  = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[2]
  print(paste(pct_addlt_adv * 280, "additional days of advertising"))
  sprintf("$%.2f -- $%.2f -- $%.2f", max(lo, 0), mid, hi)
}

#missed_revenue(% of additional days advertised, % of advertisements that were effective)
missed_revenue(.5, .7)

## [1] "140 additional days of advertising"
## [1] "$2849.38 -- $7449.57 -- $12049.75"
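One nice property of wrapping this in a function is that alternative scenarios cost one line each. For instance, a more conservative ask (output omitted, since it depends on the objects fitted above):

#a more conservative scenario: post on a quarter of the missed days,
#with only half of those posts actually effective
missed_revenue(.25, .5)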
So if our company had posted on half of the days they didn't, and only 70% of those ads were effective, they missed out on an average of $7,449.57 in additional revenue.
Originally published at http://lowhangingfruitanalytics.com on August 21, 2020.