Cloudy with a Chance of Words: Part 1

When working with text data and NLP projects, word frequency is often a useful feature to identify and look into. However, creating good visuals is often difficult because you don’t have a lot of options outside of bar charts. Let’s face it: bar charts get old and boring quickly! This is where word clouds come into play. In this blog, learn how to spice up your visualizations using word clouds on your next project.

Up until my most recent project, I didn’t know a word cloud library existed in Python, but I assure you it does, and it has some amazing features!

The full WordCloud library and documentation can be found here for those interested.

TLDR

Part 1 of this blog will walk you through obtaining the appropriate libraries and the basic parameters and functions of the wordcloud library as well as how to create a generic word cloud. Part 2 will build upon this and walk you through creating custom masks for word clouds and other unique visual options.

Getting Started With WordCloud

Before we can start making visuals, we’ll need to make sure we have the libraries we need to create our word clouds. You’ll need the following libraries:

  • numpy
  • matplotlib
  • PIL (installed as the Pillow package)
  • wordcloud
  • nltk (only needed for this blog, as a source of sample text to create word clouds from)

All of these libraries can be installed with pip if you’re unable to import them. For my project, I used Google Colab, which required a slightly different approach to get wordcloud working. Google Colab users can install wordcloud with the following command:

!pip install git+https://github.com/amueller/word_cloud.git#egg=wordcloud

That last part, #egg=wordcloud, is important for Colab because it identifies and effectively names the library so that it can be properly imported.

Once we have all of our needed libraries installed, we can use the following set of import statements:

We’re now ready to create some word clouds!

Generic Word Clouds

To start with, let’s explore generic word clouds. For those who want to follow along, we’ll use some corpora from the nltk library.

First off, we’ll need to acquire our text. I’ll note here that there are two forms of input that WordCloud can use to generate a visual. The first, and the main one we’ll use, is a string. The second is a dictionary of words and their frequencies as key-value pairs.
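
To make the second form concrete, here is a small standard-library sketch that turns a string into a word-to-frequency dictionary. The sample sentence and the simple `\w+` tokenizer are assumptions for illustration, not what the library does internally:

```python
import re
from collections import Counter

# Hypothetical sample sentence used only for illustration.
sample = "the whale and the ship and the white whale"

# Lowercase the text, split it into words, and count each word.
words = re.findall(r"\w+", sample.lower())
frequencies = dict(Counter(words))

print(frequencies)
```

A plain string goes to `WordCloud().generate(text)`, while a dictionary like this one can be handed to `WordCloud().generate_from_frequencies(frequencies)`.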

If you’re following along, or want to attempt this using other sample text from nltk, you can use the following code to acquire our text samples:

This shows a list of the different authors and texts we have to choose from within nltk’s gutenberg files

Feel free to attempt creating word clouds from any of the above options. The one that we’ll continue with in these examples, however, will be Moby Dick.

To gather our sample text as a single string you can use the following command:

Now that we have our text, let’s take a look at how to turn this into a word cloud. The steps are: instantiate a WordCloud object, use that object to generate a cloud from the text we pass in, and then display the result without the unnecessary x and y axes.

Look at that! We made a word cloud!

Now personally, I’m not a fan of the black background and it seems a little small, so let’s change that with some simple parameters.

Now we’re talking! Although, there seem to be some strange things showing up in our generic word cloud, don’t there?

Parameters and Language Processing

Looking at the cloud above we notice some things. Some words seem to be paired.

  • the whale
  • the ship
  • the sea
  • the captain
  • White Whale

And so on. Our word cloud is still showing word frequencies; however, one of WordCloud’s parameters is ‘collocations’, which defaults to True. With it enabled, the cloud also counts pairs of words and their frequencies. In some cases this can definitely be useful, but here I think we’ll get better results without it.

Notice the difference?

A keen eye may recognize that the word ‘the’ no longer appears in our word cloud. This is because ‘the’ is recognized as a stop-word and excluded from the cloud even though it appears quite frequently in the text.

You may be wondering where stop-words came into play; that is one of the really cool features of the wordcloud library. The library comes with its own list of stop-words that it uses by default. It actually applies quite a few NLP practices by default, which makes creating clouds that much easier while remaining adjustable for the more experienced NLP practitioner. Some of the additional NLP parameters are:

  • regexp: an optional parameter that, if left blank, uses r"\w[\w']+" by default. A custom regex string can be passed in here.
  • normalize_plurals: default = True. For words that appear both with and without a trailing ‘s’, the ‘s’ is removed from the plural and its occurrences are counted toward the singular form.
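
To get a feel for what the default pattern quoted above matches, here is a standard-library sketch. It only applies the pattern with `re.findall` to a made-up sentence and is not the library's internal code:

```python
import re

# The default token pattern quoted in the list above.
DEFAULT_PATTERN = r"\w[\w']+"

# Hypothetical sample sentence.
sentence = "Ahab's ship chased a whale!"

# Tokens need at least two characters, so the lone "a" is skipped,
# and apostrophes inside a word are kept.
tokens = re.findall(DEFAULT_PATTERN, sentence)
print(tokens)  # ["Ahab's", 'ship', 'chased', 'whale']
```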

In our original import statement, we imported STOPWORDS from the wordcloud library. You can print this to see the entire list of words that are excluded by default; it currently contains 192 of the most common stop-words. You can add to this list if you have additional words you want excluded, or supply your own stop-words if you prefer. Note that the stop-words must be passed in as a set and not a list.

What a difference!

One last thing we’ll talk about before moving on to making fun and unique word clouds is “relative scaling”.

Relative scaling (the relative_scaling parameter) determines how strongly a word’s size depends on its frequency. By default it resolves to 0.5 (the library’s 'auto' setting, unless repeat is enabled), which is roughly equivalent to saying that a word that occurs twice as often as another word will be 50% larger.

Relative scaling can be set to any number between 0 and 1. At 0, only the rank of each word is considered rather than its actual frequency; at 1, a word that occurs twice as often will be twice as large. A higher value can make differences in frequency easier to see. However, this doesn’t always look very good and can affect the fit of a word cloud to a mask, which we will talk about later.

In this case, using a relative scaling of 1 actually doesn’t look too bad! We’ll soon see how this translates to using it with an image mask.

Saving Your Word Cloud

Once you have your word cloud the way you want it, you’ll probably want to save it. To do so, you can run the following code which will save the current state of your WordCloud object.

Keep in mind that this saves the image to your current working directory; if you have a specific location in mind, add the appropriate path to the filename.

Other Parameters Worth Playing With

We looked at the key parameters for making word clouds, but there are many more that are worth looking into and toying with. These parameters are fairly self-explanatory and can be used to further tweak your clouds:

  • prefer_horizontal: (float) If set to 1, all words appear horizontally; lower values increase the frequency of vertical words. default = 0.9
  • min_font_size: (int) Smallest font size to be used. default = 4
  • max_words: (int) Maximum number of words in the cloud. default = 200
  • min_word_length: (int) Minimum number of letters a word must have to be included in the cloud. default = 0
  • include_numbers: (bool) Whether to include numbers. default = False
  • repeat: (bool) Whether to repeat words/phrases until max_words or min_font_size is reached. (Can be used to create word clouds from a single word.) default = False

Unique and Custom Word Clouds

Because this blog turned out much longer than I had initially planned, I’ll discuss using image masks to create custom word clouds, how to create your own image masks from any image, and how to apply an image’s color to your cloud in Part 2, which will follow soon.

Translated from: https://medium.com/swlh/cloudy-with-a-chance-of-words-part-1-d34a29739dba
