Are Stories with Numbers in Headlines More Successful?

Statistics

I have read a few stories on Medium offering writing advice. Some of them suggested, among other tips, that putting numbers in your story’s title increases the number of views: people tend to be drawn to such headlines, so more of them click on your story.

It seems interesting that people are attracted by such headlines. But I don’t like to take things for granted. I want to convince myself that this claim is actually true.

So, what have I been thinking? Let’s use statistics to check whether this claim is actually true. But statistics is useless without data, so I first need to obtain some data about Medium articles and use it for hypothesis testing. To that end, I used Python and Beautiful Soup to scrape data about a random set of 6K+ Medium articles from 7 different publications. This dataset can be found on Kaggle. If you want to see how I scraped this data, I have a separate article about that.

We will now split this dataset into 2 groups (or samples): one with numbers in the headlines and one without. Then, we will run a hypothesis test on the expected number of claps in these 2 groups. We use the number of claps as a measure of how successful a story is, although the number of views would be a more logical variable for our scenario, since it is the one most directly affected by the choice of title. People typically click on a story because of the preview they see (headline and image), and only after reading the story do they decide whether to clap. But because the number of views is not publicly shown on Medium, we use the number of claps, which should be highly correlated with views (the more views a story gets, the more likely it is that someone claps).

If you are not familiar with hypothesis testing, here is an article you can read:

That being said, we will consider the following model:

Sample 1: Articles with numbers in headlines

We will model the number of claps inside this group as n i.i.d. (independent and identically distributed) random variables $X_1, X_2, \dots, X_n$ with expected value $\mu_1$ and variance $\sigma_1^2$, both of which are finite.

Sample 2: Articles without numbers in headlines

We will model the number of claps inside this group as m i.i.d. random variables $Y_1, Y_2, \dots, Y_m$ with expected value $\mu_2$ and variance $\sigma_2^2$, both of which are finite.

We formulate the null hypothesis as “articles with numbers in headlines bring no improvement over articles that have no numbers in headlines”, and the alternative hypothesis as “articles with numbers in headlines are more successful compared to articles without numbers in headlines”.

Mathematically this means:

[Image: the null and alternative hypotheses]
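The image with the formulas did not survive. Judging by the verbal statement of the two hypotheses and by the later remark that the null hypothesis implies a difference of means of at most zero, they are presumably:

$$H_0: \mu_1 - \mu_2 \le 0 \qquad \text{vs.} \qquad H_1: \mu_1 - \mu_2 > 0$$

Here $\mu_1$ is the expected number of claps for stories with numbers in the headline and $\mu_2$ the expected number for stories without.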

We will consider the following test statistic:

[Image: formula for the test statistic Z]
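The formula image is not preserved. Given that the text refers to the two sample averages, the two variances, and the difference of the expected values, the statistic is most likely the standard two-sample Z statistic. This is a reconstruction under that assumption, not a copy of the original image:

$$Z = \frac{\bar{X}_n - \bar{Y}_m - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}}$$

With n and m the sizes of sample 1 and sample 2, the denominator is the standard deviation of the difference between the two sample averages.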

where $\bar{X}_n$ and $\bar{Y}_m$ are the averages of sample 1 and sample 2, respectively.

Because the sample sizes are fairly large, the Central Limit Theorem implies that the probability distribution of our test statistic Z is approximated very well by a standard normal distribution, and the true variances $\sigma_1^2$, $\sigma_2^2$ should be very close to the variances estimated from our data. So, when we compute the test statistic, we can simply substitute the estimated variances for $\sigma_1^2$, $\sigma_2^2$.

But what about $\mu_1 - \mu_2$? Assuming $H_0$ to be true, it follows that $\mu_1 - \mu_2 \le 0$. We choose $\mu_1 - \mu_2 = 0$, as this value is the worst-case scenario for the probability of a type I error (we don’t want to underestimate the error).

Now, let’s run some Python code. We start by importing the required packages and defining a utility function: like(x, pattern). This function is used to match regular expressions in pandas data frames; x is the column, and pattern is a regular expression. I named this function after SQL’s LIKE operator as it is meant to do something similar, but for pandas data frames.

[Image: code for the imports and the like() helper]
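The code screenshot is not preserved, so here is a minimal sketch of what such a helper might look like. It assumes x is a pandas Series of strings and that, as with SQL’s LIKE, the pattern has to match the whole value. The function body is an assumption, not necessarily the original implementation.

```python
import re

import pandas as pd


def like(x, pattern):
    """Return a boolean Series: True where the whole string in x matches pattern.

    Assumed behaviour (the original implementation is not shown): like SQL's
    LIKE, the pattern has to match the entire value, not just a part of it.
    """
    regex = re.compile(pattern)
    # Missing or non-string values never match.
    return x.map(
        lambda value: isinstance(value, str) and regex.fullmatch(value) is not None
    )


# Toy titles, only to illustrate the call.
titles = pd.Series(["7 Tips for Writers", "Why I Write", None])
mask = like(titles, r".*\d.*")
print(mask.tolist())  # [True, False, False]
```

The pattern `.*\d.*` reads as "anything, then a digit, then anything", which plays the role of `LIKE '%<digit>%'` in SQL.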

After that, we read the CSV file into a pandas data frame:

[Image: code reading the CSV file]
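The original snippet is an image that did not survive. Loading the data presumably comes down to a single pandas.read_csv call. The sketch below uses a small in-memory CSV as a stand-in, because the real file name and the full set of columns are not given in the text; only the "title" and "claps" columns are mentioned.

```python
import io

import pandas as pd

# Stand-in for the Kaggle CSV file; the real file name and full column set are
# not given in the text, only that "title" and "claps" columns exist.
sample_csv = io.StringIO(
    "title,claps\n"
    "7 Tips for Writers,120\n"
    "Why I Write,45\n"
)

# With the real dataset, a file path would be passed to read_csv instead.
df = pd.read_csv(sample_csv)
print(df.shape)  # (2, 2)
```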

We make sure we don’t have missing values in the “title” or “claps” columns:

[Image: code checking for missing values]
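How the check was done is not visible any more. One common way, shown here as a sketch on made-up rows, is to count missing values per column with isna() and, if there are any, drop the affected rows with dropna().

```python
import pandas as pd

# Made-up rows with deliberately missing values, for illustration only.
df = pd.DataFrame({
    "title": ["7 Tips for Writers", None, "Why I Write"],
    "claps": [120, 30, None],
})

# Count missing values in the two columns we care about.
missing = df[["title", "claps"]].isna().sum()
print(missing)

# Keep only rows where both the title and the clap count are present.
clean = df.dropna(subset=["title", "claps"])
print(len(clean))  # 1
```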

Then, we create 2 new data frames (numbers/no-numbers) using the like() function defined earlier:

[Image: code creating the numbers / no-numbers data frames]
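The exact regular expression used for the split is not shown. A plausible reading is "the title contains at least one digit", which is what the sketch below assumes. To stay self-contained it uses the built-in Series.str.contains on made-up rows instead of the like() helper.

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "title": ["7 Tips for Writers", "Why I Write", "Top 10 Python Tricks", "On Reading"],
    "claps": [120, 45, 300, 10],
})

# Assumption: "has a number" means the title contains at least one digit.
# Numbers spelled out as words ("Seven Tips") are not detected by this rule.
has_number = df["title"].str.contains(r"\d", regex=True)

numbers = df[has_number]
no_numbers = df[~has_number]
print(len(numbers), len(no_numbers))  # 2 2
```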

These 2 new data frames are shown below:

[Images: previews of the numbers and no-numbers data frames]

After that, we compute the quantities that we need for the test statistic:

[Image: code computing the sample sizes, averages, and variances]
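The quantities in question are presumably the two sample sizes, the two sample averages, and the two estimated variances. The sketch below computes them with Python’s standard library on hypothetical clap counts; the original code most likely worked on the pandas columns directly.

```python
from statistics import mean, variance

# Hypothetical clap counts, only to illustrate the computation.
claps_numbers = [120, 300, 80, 150]
claps_no_numbers = [45, 10, 60, 90, 20]

n = len(claps_numbers)      # size of sample 1
m = len(claps_no_numbers)   # size of sample 2

x_bar = mean(claps_numbers)     # average claps, titles with numbers
y_bar = mean(claps_no_numbers)  # average claps, titles without numbers

# Unbiased sample variances (dividing by n - 1 and m - 1).
var_x = variance(claps_numbers)
var_y = variance(claps_no_numbers)

print(n, m, x_bar, y_bar, var_x, var_y)
```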

Now, we compute the test statistic and the p-value. In our case, because we’re doing a one-sided test, the p-value is the area to the right of our test statistic under a standard Gaussian:

[Image: code computing the test statistic and the p-value]
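As a sketch of this step, the function below computes the Z statistic under the worst case of a zero difference in means, then the right-tail area of the standard normal via the complementary error function. The function name and the input numbers are hypothetical; they are not the values from the Medium dataset.

```python
from math import erfc, sqrt


def one_sided_z_test(x_bar, y_bar, var_x, var_y, n, m):
    """Return (z, p) for H0: mu1 - mu2 <= 0 against H1: mu1 - mu2 > 0."""
    # Under the worst case of H0 the difference of the means is taken as 0.
    z = (x_bar - y_bar) / sqrt(var_x / n + var_y / m)
    # Right-tail area under the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


# Hypothetical summary statistics; samples this small are far too small for the
# normal approximation and only serve to show the arithmetic.
z, p = one_sided_z_test(x_bar=162.5, y_bar=45.0, var_x=9225.0, var_y=1025.0, n=4, m=5)
print(z, p)
```

A larger z pushes the p-value towards 0, while z = 0 gives exactly 0.5, since half of the standard normal lies to the right of zero.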

We got a p-value much smaller than the usual threshold of 0.05. That’s good news: we can reject the null hypothesis very confidently.

For a significance level of α = 0.001, it follows that p ≈ 0.0009 < α, and therefore we reject the null hypothesis and accept the alternative. In plain English, this means: “We are 99.9% confident that stories with numbers in their headlines are expected to have more claps than stories without numbers in headlines”.

You can find the Jupyter notebook on Kaggle.

Thanks for reading!

Translated from: https://medium.com/towards-artificial-intelligence/are-stories-with-numbers-in-headlines-more-successful-b925cae2f6b4
