The Power of Network Analysis

The most common way to store data is in what we call relational form. Most systems get analyzed as collections of independent data points, typically laid out as a table.

Whether you’re a spreadsheet user or a machine learning master, you’re probably used to seeing your data that way: rows and columns representing different categories and metrics.

However, this approach makes it very difficult to capture information about the relationships that are fundamental to so much of our world. When we stop to think about some of the common systems around us — systems which we care about understanding, optimizing and predicting — we start to see how treating these systems as independent data points misses crucial information.
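
To make the contrast concrete, here is a minimal sketch of the same small group of people stored two ways. The names, fields, and the `neighbors` helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: the same four people, stored two ways.

# Relational form: one row per person, each row independent of the others.
people = [
    {"name": "Ana", "age": 34, "city": "Lisbon"},
    {"name": "Ben", "age": 29, "city": "Austin"},
    {"name": "Caro", "age": 41, "city": "Lima"},
    {"name": "Dev", "age": 37, "city": "Pune"},
]

# Network form: one record per relationship between two people.
friendships = [("Ana", "Ben"), ("Ana", "Caro"), ("Ben", "Caro"), ("Caro", "Dev")]


def neighbors(edges):
    """Build an undirected adjacency map: person -> set of friends."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


adjacency = neighbors(friendships)
print(sorted(adjacency["Caro"]))
```

Nothing in the `people` table says who knows whom. That information only exists once the relationships are stored as edges.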

Economies are defined by relationships and transactions, not just by individual players operating independently.

Infrastructure we use every day is highly connected. We have transportation systems linking cities and people, and communication systems linking electronic devices.

In biology, life doesn’t emerge from cells/proteins/genes working separately, but from those components coming together and performing many interactions to make the cell alive. Even our thoughts are hidden and encoded in the connections and wiring between billions of neurons.

And, of course, we have social networks. This has become a term to describe social media platforms, but to be more specific, it is the data underlying these platforms, recording friendships and followers, that is useful to model as a literal network.

We could keep going with endless examples. What do all of these systems have in common? Highly connected data. Essentially, anything that involves humans is highly connected. Our world isn’t just a collection of individuals isolated from everyone else, but a network of billions of members who are constantly interacting with each other. Therefore, data that describes these systems will be useful insofar as it captures those connections.

In short, behind many systems there is an intricate wiring diagram, a network, that defines the connections between the components.

Although the traditional relational data model has served many domains well, highly connected systems can never be fully modeled or used for prediction unless we understand the networks behind them.

Google’s PageRank Algorithm

To further understand the transformative potential of network analysis upon its introduction to a new domain, I think it’d be useful to explain its role on a platform that you likely use every day.

In the late 1990s and early 2000s, there were many search engines on the web. The internet was a vast, ever-evolving terrain whose users desperately needed navigation help. Many understood this need, and the field of search engines and directories was crowded.

Despite being a latecomer, Google managed to surpass the competition only a few years after its founding by Larry Page and Sergey Brin in 1998. What made Google different? It modeled the internet as a network.

The basic problem that search engines faced was: how do you measure the relevance or importance of different pages in order to determine what results to show after a user searches? There was no obvious answer. Most search engines attempted to measure the importance of a website by analyzing the content on that website itself. Much as we would in Excel, these approaches stored their data in rows and columns, where each row was a page and each column was a variable or metric about the content on that page.

However, this is very gameable. If you wanted to look up how to bake cupcakes, for example, and the search engine you’re using determines which results to return based only on website content, I could create a cupcake-baking website in 30 minutes with all the right content to make your search engine deem my page “relevant.” But my site is unlikely to be the most relevant or highest quality for your needs. In the early days of the internet, when everyone was trying to cash in on this new medium, cupcake con-men abounded.

Page and Brin invented a different approach to search. They realized that they could dramatically improve the results they showed users if they first modeled the internet as a network of domains and pages that reference each other. More specifically, they developed an algorithm to detect the “role” or the “importance” of a node in a network, now called the PageRank algorithm. Once understood, the PageRank algorithm seems very simple, but it’s very powerful.

It works like this. Given a collection of web pages, we can keep track of all of the links or references that pages make to each other. In our model, when one page references another, we add that reference as an arrow, or “edge,” pointing from the first page to the second. We can do this across all of the pages in our collection of interest, and Google did it across all of the pages on the internet. What we end up with is a directed graph: pages as nodes, connected by arrows that show which page references which.
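
The construction described above can be sketched in a few lines of Python. The page names and the `to_edges` and `in_degree` helpers are hypothetical, made up for illustration and not taken from any real crawler.

```python
# Hypothetical pages and the pages each one links to.
links = {
    "home.example": ["recipes.example", "blog.example"],
    "blog.example": ["recipes.example"],
    "recipes.example": ["home.example"],
    "spam.example": ["recipes.example"],
}


def to_edges(links):
    """Flatten the link map into directed (source, target) edges."""
    return [(src, dst) for src, targets in links.items() for dst in targets]


def in_degree(edges):
    """Count how many edges point at each page."""
    counts = {}
    for src, dst in edges:
        counts.setdefault(src, 0)  # make sure pages with no inbound links appear
        counts[dst] = counts.get(dst, 0) + 1
    return counts


edges = to_edges(links)
incoming = in_degree(edges)
print(incoming)
```

Counting inbound arrows is only a crude first signal, but it already shows that some pages are referenced far more than others.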

As you can imagine, some web pages are referenced far more often than others. A web page that has authority and is relevant, unlike one that a poser created 30 minutes ago, will be one of those nodes referenced more often. Without going into the mathematical details of the actual algorithm, we can think of PageRank as essentially measuring the “importance” or “influence” of a page based on its role in the network. By crawling the web and recording all of the references that pages make to each other, Google was able to calculate this importance for each page, weed out the irrelevant ones, and subsequently return higher quality search results to its users.
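
For readers who do want a taste of the mechanics, below is a minimal sketch of the simplified textbook formulation of PageRank, computed by repeated averaging (power iteration). It is an illustration under stated assumptions, not Google's production system: the toy `links` data is hypothetical, the damping factor of 0.85 is the value commonly quoted from the original paper, and the iteration count is an arbitrary choice.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank via power iteration over a dict of page -> outbound links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on, split across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its importance evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to spam.example.
links = {
    "home.example": ["recipes.example", "blog.example"],
    "blog.example": ["recipes.example"],
    "recipes.example": ["home.example"],
    "spam.example": ["recipes.example"],
}

rank = pagerank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, the page that nobody references ends up with the lowest score no matter what its own content says, which is the property that makes the approach harder to game.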

What else can we do with networks?

A description of the potential of networks could fill (and has filled) many books, and indeed more recently this modeling approach has garnered much interest in machine learning implementations, particularly deep learning. But all of these fancy applications still depend on the basic advantage of modeling your data as a network, similar to the way that Google grasped it: networks allow you to calculate entirely new metrics to describe and understand your data that you never would have been able to calculate previously. These metrics are many, and they are derived from various algorithms, like Google’s PageRank, that can be run once you model your data as a network.

There are various measures of centrality, similar to PageRank, and these centrality measures correspond to many concepts that we already think about and would be interested in measuring. In a social network, for example, the most central nodes might be the members who have many friends and whose opinions are highly regarded. The role of those authorities or influencers becomes clear once we model the relationships between people as a network.
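
One of the simplest centrality measures is degree centrality: the fraction of other members a given member is directly connected to. The sketch below assumes a hypothetical list of friendships and a helper named `degree_centrality` written for this example.

```python
# Hypothetical friendships in a small social network.
friendships = [
    ("Ana", "Ben"),
    ("Ana", "Caro"),
    ("Ana", "Dev"),
    ("Ben", "Caro"),
    ("Dev", "Eli"),
]


def degree_centrality(edges):
    """Return, for each member, the share of other members they are friends with."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(friends) / (n - 1) for node, friends in adjacency.items()}


centrality = degree_centrality(friendships)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Degree centrality only counts direct friends. Measures such as PageRank go further by also weighing how central those friends are.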

We can also measure directionality or flow within our networks, where the connections between components are essentially arrows. Such measures might allow you to uncover patterns of movement. For example, you might be able to notice confusion or inefficiencies in transportation networks where there are a lot of cycles or zig-zags, and there are many ways to calculate that numerically given the relationships between the nodes in your network.
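
As one small example of a directional calculation, the sketch below checks whether a directed network contains any cycle, using a standard depth-first search. The route maps and the `has_cycle` helper are hypothetical. A real analysis would likely count or rank cycles instead of just detecting one.

```python
def has_cycle(routes):
    """Return True if the directed graph (node -> list of next nodes) contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored

    nodes = set(routes)
    for targets in routes.values():
        nodes.update(targets)
    color = {node: WHITE for node in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in routes.get(node, []):
            if color[nxt] == GRAY:
                # We reached a node that is still on the current path: a cycle.
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in nodes)


# Hypothetical one-way routes between stops.
loop_routes = {"A": ["B"], "B": ["C"], "C": ["A"]}
line_routes = {"A": ["B"], "B": ["C"]}

print(has_cycle(loop_routes), has_cycle(line_routes))
```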

Another aspect that modelers are often interested in quantifying is a network’s connectedness. Again, there are several ways to do this, and the measure is useful in many different applications.

For example, if you were modeling any type of infrastructure (transportation, trade, IT), you could use connectedness as a measure of your infrastructure’s robustness. Or, as an e-commerce vendor, you could use this approach to find communities or clusters of customers.
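
A rough way to probe robustness is to remove one node and check whether the network falls apart into separate pieces. The sketch below does this with a breadth-first search over connected components. The station names, the track list, and both helper functions are hypothetical, and this is only one of several possible connectedness measures.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group the nodes of an undirected graph into connected components."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def components_without(nodes, edges, removed):
    """Recompute the components after taking one node out of service."""
    remaining = [node for node in nodes if node != removed]
    kept = [(a, b) for a, b in edges if removed not in (a, b)]
    return connected_components(remaining, kept)


# Hypothetical rail network.
stations = ["A", "B", "C", "D", "E"]
tracks = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "C")]

print(len(connected_components(stations, tracks)))
print(len(components_without(stations, tracks, "C")))
print(len(components_without(stations, tracks, "B")))
```

Here station C is a single point of failure, while losing station B leaves everything else reachable. The same component-finding routine, applied to customers linked by shared purchases, gives a crude first cut at customer clusters.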

In Conclusion

Networks are an extremely exciting and useful domain of analysis, and one that is increasingly garnering interest from a wide variety of fields. In particular, the possibility of performing network analysis at scale, with datasets of billions of nodes/edges, is seen by many as one of the next big challenges in prediction and machine learning. I plan to write more about all of that in future stories, but hopefully this article gives a helpful brief introduction to the idea of networks and why they can be so powerful in many domains.

Translated from: https://medium.com/@ben.makansi/the-power-of-network-analysis-8a245633a36
