How to Use Data Science Principles to Improve Your Search Engine Optimisation Efforts

Search Engine Optimisation (SEO) is the discipline of using knowledge of how search engines work to build websites and publish content that can be found on search engines by the right people at the right time.

Some people say that you don’t really need SEO, taking a Field of Dreams ‘build it and they shall come’ approach. Yet the size of the SEO industry is predicted to reach $80 billion by the end of 2020, so there are at least some people who like to hedge their bets.

An often-quoted statistic is that Google’s ranking algorithm contains more than 200 factors for ranking web pages, and SEO is often seen as an ‘arms race’ between its practitioners and the search engines, with people looking for the next ‘big thing’ and sorting themselves into tribes (white hat, black hat and grey hat).

There is a huge amount of data generated by SEO activity and its plethora of tools. For context, the industry-standard crawling tool Screaming Frog has 26 different reports filled with web page metrics on things you wouldn’t even think are important (but are). That is a lot of data to munge and find interesting insights from.

The SEO mindset also lends itself well to the data science ideal of munging data and using statistics and algorithms to derive insights and tell stories. SEO practitioners have been poring over all of this data for two decades, trying to figure out the next best thing to do and to demonstrate value to clients.

Despite access to all of this data, there is still a lot of guesswork in SEO and while some people and agencies test different ideas to see what performs well, a lot of the time it comes down to the opinion of the person with the best track record and overall experience on the team.

I’ve found myself in this position a lot in my career, and it is something I would like to address now that I have acquired some data science skills of my own. In this article, I will point you to some resources that will allow you to take a more data-led approach to your SEO efforts.

SEO Testing

One of the most often asked questions in SEO is ‘We’ve implemented these changes on a client’s website, but did they have an effect?’. This often leads to the idea that if the website traffic went up ‘it worked’, and if the traffic went down it was ‘seasonality’. That is hardly a rigorous approach.

A better approach is to put some maths and statistics behind it and analyse it with a data science approach. A lot of the maths and statistics behind data science concepts can be difficult, but luckily there are a lot of tools out there that can help and I would like to introduce one that was made by Google called Causal Impact.

The Causal Impact package was originally an R package; however, there is a Python version if that is your poison, and that is what I will be going through in this post. To install it in your Python environment using Pipenv, use the command:

pipenv install pycausalimpact

If you want to learn more about Pipenv, see a post I wrote on it here; otherwise, Pip will work just fine too:

pip install pycausalimpact

What is Causal Impact?

Causal Impact is a library that is used to make predictions on time-series data (such as web traffic) in the event of an ‘intervention’, which could be something like campaign activity, a new product launch or an SEO optimisation that has been put in place.

You supply two time series as data to the tool. One time series could be clicks over time for the part of a website that experienced the intervention; the other acts as a control, which in this example would be clicks over time for a part of the website that didn’t experience the intervention.

You also supply the date on which the intervention took place, and the tool then trains a model on the data called a Bayesian structural time series model. This model uses the control group as a baseline to build a prediction of what the intervention group would have looked like if the intervention hadn’t taken place.

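As a sketch of the inputs this expects, here is some hypothetical daily click data shaped the way the tool wants it (the intervention series in the first column, the control as a covariate, and the pre/post periods defined around the intervention date). The figures are made up for illustration, and the final fitting call is shown commented out since it needs pycausalimpact installed:

```python
import numpy as np
import pandas as pd

# Hypothetical daily clicks for the two groups; in practice this would be
# exported from your analytics platform.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=90, freq="D")
control = 1000 + rng.normal(0, 20, size=90).cumsum()   # pages without the change
test = 0.8 * control + rng.normal(0, 10, size=90)      # pages with the change
test[60:] += 150                                       # simulated uplift after it

# Causal Impact expects the intervention series in the first column,
# with the control series as a covariate after it.
data = pd.DataFrame({"test": test, "control": control}, index=dates)

# The intervention went live on day 60 of the window.
pre_period = [dates[0], dates[59]]
post_period = [dates[60], dates[-1]]

# With pycausalimpact installed, the model would then be fitted with:
# from causalimpact import CausalImpact
# ci = CausalImpact(data, pre_period, post_period)
```

From here the summary and plot calls shown in the rest of this post apply unchanged.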
The original paper on the maths behind it is here; however, I recommend watching the video below by a guy at Google, which is far more accessible:

Implementing Causal Impact in Python

After installing the library into your environment as outlined above, using Causal Impact with Python is pretty straightforward, as can be seen in the notebook below by Paul Shapiro:

Causal Impact with Python
After pulling in a CSV with the control group and intervention group data and defining the pre/post periods, you can train the model by calling:

ci = CausalImpact(data[data.columns[1:3]], pre_period, post_period)

This will train the model and run the predictions. If you run the command:

ci.plot()

You will get a chart that looks like this:

Output after training the Causal Impact model

There are three panels here. The first shows the intervention group alongside the prediction of what would have happened without the intervention.

The second panel shows the pointwise effect, which means the difference between what happened and the prediction made by the model.

The final panel shows the cumulative effect of the intervention as predicted by the model.

Another useful command to know is:

print(ci.summary('report'))

This prints out a full, human-readable report that is ideal for summarising and dropping into client slides:

Report output for Causal Impact

Selecting a control group

The best way to build your control group is to pick pages that aren’t affected by the intervention at random, using a method called stratified random sampling.

Etsy has published a post on how they used Causal Impact for SEO split testing, and they recommend this method. Stratified random sampling is, as the name implies, picking from the population at random to build the sample; however, if the population is segmented in some way, we try to maintain the same proportions in the sample as in the population for each segment:

Source: Etsy

An ideal way to segment web pages for stratified sampling is to use sessions as a metric. If you load your page data into Pandas as a data frame, you can use a lambda function to label each page:

df["label"] = df["Sessions"].apply(
    lambda x: "Less than 50" if x <= 50
    else "Less than 100" if x <= 100
    else "Less than 500" if x <= 500
    else "Less than 1000" if x <= 1000
    else "Less than 5000" if x <= 5000
    else "Greater than 5000")
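The nested lambda gets hard to read as the number of buckets grows. A sketch of an equivalent, arguably clearer way to bucket the sessions uses pd.cut (the session values here are hypothetical; the bin edges mirror the thresholds above, and pd.cut's intervals are right-closed, so for example (50, 100] maps to "Less than 100"):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sessions": [10, 75, 300, 2000, 9000]})  # hypothetical pages

bins = [-np.inf, 50, 100, 500, 1000, 5000, np.inf]
labels = ["Less than 50", "Less than 100", "Less than 500",
          "Less than 1000", "Less than 5000", "Greater than 5000"]
df["label"] = pd.cut(df["Sessions"], bins=bins, labels=labels)
```

Either version produces the same labels for stratification.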

From there, you can use train_test_split in sklearn to build your control and test groups:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    selectedPages["URL"], selectedPages["label"],
    test_size=0.01, stratify=selectedPages["label"])

Note that stratify is set. If you already have a list of pages you want to test, your sample size should equal the number of pages you want to test. Also, the more pages you have in your sample, the better the model will be; use too few pages and the model will be less accurate.

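Putting the labelling and splitting steps together, here is a minimal end-to-end sketch on hypothetical page data. The URLs and session counts are invented, and a larger test_size is used than above purely so the per-segment proportions are easy to check; in practice you would size the split to the number of pages under test:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical crawl/analytics export: 2,000 URLs with session counts.
rng = np.random.default_rng(42)
pages = pd.DataFrame({
    "URL": [f"/page-{i}" for i in range(2000)],
    "Sessions": rng.integers(1, 8000, size=2000),
})

# Label each page with its session bucket (same thresholds as earlier).
bins = [-np.inf, 50, 100, 500, 1000, 5000, np.inf]
labels = ["Less than 50", "Less than 100", "Less than 500",
          "Less than 1000", "Less than 5000", "Greater than 5000"]
pages["label"] = pd.cut(pages["Sessions"], bins=bins, labels=labels)

# Stratified split: each bucket's share of the test set mirrors its
# share of the population.
train_urls, test_urls, train_labels, test_labels = train_test_split(
    pages["URL"], pages["label"], test_size=0.2,
    stratify=pages["label"], random_state=0)

pop_share = pages["label"].value_counts(normalize=True)
test_share = test_labels.value_counts(normalize=True)
```

Comparing pop_share and test_share confirms the stratification preserved the segment proportions.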
It is worth noting that JC Chouinard gives a good walkthrough of how to do all of this in Python using a method similar to Etsy’s:

Conclusion

There are a couple of different use cases for this type of testing. The first is to test ongoing improvements using split testing, similar to the approach Etsy uses above.

The second is to test an improvement that was made on-site as part of ongoing work. This is similar to an approach outlined in this post; however, with this approach you need to ensure your sample size is sufficiently large, otherwise your predictions will be very inaccurate. So please do bear that in mind.

Both are valid ways of doing SEO testing, with the former being a type of A/B split test for ongoing optimisation and the latter being a test of something that has already been implemented.

I hope this has given you some insight into how to apply data science principles to your SEO efforts. Do read around these interesting topics and try and come up with other ways to use this library to validate your efforts. If you need background on the Python used in this post I recommend this course.

Translated from: https://towardsdatascience.com/how-to-use-data-science-principles-to-improve-your-search-engine-optimisation-efforts-927712ed0b12

