Multidimensional Spatial Visualization: Using GeoPandas for Spatial Visualization

Recently, I was working on a project where I was trying to build a model that could predict housing prices in King County, Washington — the area that surrounds Seattle. After looking at the features, I wanted a way to determine the houses’ worth based on location.

The dataset included latitude and longitude, so it was easy to google individual houses and look at their neighborhoods, their distance from the water, and so on. But with over 17,000 observations, doing that one by one was a fool's errand. I had to find an easier way.

I had used Geographic Information Systems (GIS) only once before, and never in Python. So I did what I do best: I googled, and ran into this amazing package called GeoPandas. I am going to let the GeoPandas team sum up what they do, because they can say it much better than I can.

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and descartes and matplotlib for plotting. — Description from GeoPandas Website (2020)

This blew my mind, and what I wanted was really just the most basic of the features. I am going to show you how to run this code and do what I did — plotting accurate points on a map.

You are going to need several packages and some files in addition to the basic pandas and matplotlib. They include:

  • geopandas: the package that makes all of this possible

  • shapely: a package for manipulation and analysis of planar geometric objects

  • descartes: provides a nicer integration of Shapely geometry objects with Matplotlib. It's not needed every time, but I import it just to be safe

  • Any .shp file: this is going to be the backdrop of the plot. Mine covers King County, but you should be able to find one from any city's data department. Don't delete any files from the .zip file it comes in; something always breaks.

More information about shapefiles can be found here, but the long and short of it is that these aren’t normal images. They are a vector data storage format that has information linking to locations — coordinates and the rest.
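
The warning about not deleting anything from the .zip makes more sense once you know that a shapefile is really a small family of files sharing one base name. As a minimal sketch (the helper name missing_sidecars is my own, not part of GeoPandas), the check below reports which of the usual companion files are absent next to a .shp file. The .shx and .dbf files are required by the format; .prj is optional but carries the coordinate system, so it is included here as well.

```python
from pathlib import Path

# Companion files that normally travel with a .shp file.
# .shx (index) and .dbf (attribute table) are mandatory parts of the format;
# .prj (coordinate system) is optional but needed to place the shapes correctly.
SIDECAR_SUFFIXES = (".shx", ".dbf", ".prj")


def missing_sidecars(shp_path):
    """Return the suffixes of companion files missing next to shp_path."""
    shp = Path(shp_path)
    return [suffix for suffix in SIDECAR_SUFFIXES
            if not shp.with_suffix(suffix).exists()]
```

Calling missing_sidecars on your own .shp path before gpd.read_file should return an empty list when everything from the .zip is still in place.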

First I imported the basic packages that I needed and then the new packages:

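import pandas as pd  # needed for pd.read_csv below
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import Point, Polygon
import geopandas as gpd
import descartes  # optional: older GeoPandas releases used it to draw polygons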

The Point and Polygon features are what help me match my data to the map I make.
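
To make the role of Point and Polygon concrete, here is a small sketch with made-up coordinates (the square below is a hypothetical stand-in for a district, not real King County geometry). A Point is built as (x, y), which for map data means (longitude, latitude), and a Polygon can be asked whether it contains that point.

```python
from shapely.geometry import Point, Polygon

# Hypothetical square "district"; coordinates are (longitude, latitude)
district = Polygon([(-122.4, 47.5), (-122.2, 47.5), (-122.2, 47.7), (-122.4, 47.7)])

house_inside = Point(-122.3, 47.6)    # longitude first, then latitude
house_outside = Point(-121.9, 47.6)   # same latitude, but east of the square

inside_flag = district.contains(house_inside)
outside_flag = district.contains(house_outside)
print(inside_flag, outside_flag)
```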

Next, I load in my data. This is basic pandas, but for those who are new to it: the string in quotation marks is the name of the file that holds the housing records.

df = pd.read_csv('kc_house_data_train.csv')

With all of the packages imported and the data ready to go, I wanted to take a look at the map I was going to be plotting on. I did this by finding a shape file published on the King County government website. They have done all the hard work of surveying and cataloging the land, and it would be rude not to use their freely offered services. Loading the shape file is easy and comparable to loading a csv file with pandas.

kings_county = gpd.read_file('*file_path_here*/School_Districts_in_King_County___schdst_area.shp')

You can open this up if you want to take a look at the data. The King County shape file was just a dataframe of locations matched with their school districts, geometry coordinates, and area. But the best part is when we plot it and yes, we have to plot it. This isn’t an image you can just call — it will have the coordinates built in so our data can be placed down like a point on a 5th grade (x,y) graph.
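
If you do open it up, a few attributes are worth a look. The sketch below uses a tiny hand-made GeoDataFrame as a stand-in for kings_county (the two rectangles and their names are invented for illustration), since the real file path is machine-specific. The same calls should work on whatever gpd.read_file returns.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical stand-in for the school-district shapefile
demo = gpd.GeoDataFrame(
    {"NAME": ["District A", "District B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (3, 0), (3, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

print(demo.head())               # attribute columns plus the geometry column
print(demo.crs)                  # coordinate reference system of the coordinates
print(demo.geom_type.unique())   # kinds of shapes stored in the layer
print(demo.total_bounds)         # min x, min y, max x, max y of the whole layer
```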

I plotted it with the code below (notice that I styled it the same way I would style any other graph):

fig, ax = plt.subplots(figsize = (15,15))
kings_county.plot(ax=ax)
ax.set_title('King County',fontdict = {'fontsize': 30})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

My output looked like this:

[Image: King County. Graphic by Author]

Before we start adding our housing data we should look at utilizing the shape file to the fullest. Let’s take a look at the file.

    OID  D#   NAME           geometry
0   1    1    Seattle        MULTIPOLYGON (((-122.40324 47.66637...
1   2    210  Federal Way    POLYGON ((-122.29057 47.39374...
2   3    216  Enumclaw       POLYGON ((-121.84898 47.34708...
3   4    400  Mercer Island  POLYGON ((-122.24475 47.59601...
4   5    401  Highline       POLYGON ((-122.35853 47.51553...
(truncated for clarity)

As you can see, the county is divided into school districts, each with a shape that marks its boundary. We will now plot the shape file and annotate the districts using the data provided, like so:

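# Horizontal alignment for each district label, chosen by trial and error
left = ['Riverview', 'Snoqualmie Valley']
center = ['Skykomish', 'Kent', 'Auburn', 'Tahoma', 'Vashon Island', 'Northshore', 'Shoreline', 'Renton', 'Highline', 'Issaquah', 'Enumclaw', 'Seattle', 'Federal Way', 'Bellevue', 'Mercer Island', 'Lake Washington', 'Tukwila']
right = ['Fife']
# The shape file has no 'coords' column, so one is derived here (an assumption about the missing step):
# representative_point() returns a point guaranteed to lie inside each polygon
kings_county['coords'] = kings_county['geometry'].apply(lambda geom: geom.representative_point().coords[0])
kings_county.plot(figsize = (15,15), cmap = 'gist_earth')
for idx, row in kings_county.iterrows():
    # the label text is passed positionally because newer Matplotlib no longer accepts s=
    if row['NAME'] in left:
        plt.annotate(row['NAME'], xy=row['coords'], ha='left', color='red')
    elif row['NAME'] in center:
        plt.annotate(row['NAME'], xy=row['coords'], ha='center', color='red')
    elif row['NAME'] in right:
        plt.annotate(row['NAME'], xy=row['coords'], ha='right', color='red')
plt.title('School Districts in Kings County, WA', fontdict = {'fontsize': 20})
plt.ylabel('Latitude', fontdict = {'fontsize': 20})
plt.xlabel('Longitude', fontdict = {'fontsize': 20})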

The lists — left, right, center — are from trial and error with the placement of the district names. Some overlapped or needed to be manipulated so that they did not stray too far from their actual district.

I changed the color map to gist_earth for clarity. Next, I iterated through each row, using the entry in the NAME series as the label and placing it at a point that was guaranteed to be inside the polygon. I aligned the names based on the lists I had made earlier. This was our output:

[Image: School Districts of King County. Graphic by Author]
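
Why use a point that is definitely in the polygon rather than the centroid? For oddly shaped districts the centroid can land outside the shape. A small sketch with a made-up U-shaped polygon (not a real district) shows the difference. representative_point() is the Shapely method that guarantees a point within the geometry; that the labels here were placed with it is my assumption.

```python
from shapely.geometry import Polygon

# Hypothetical U-shaped polygon: a 3x3 square with a notch cut from the top
u_shape = Polygon([(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)])

centroid = u_shape.centroid                    # geometric center, here it falls in the notch
label_point = u_shape.representative_point()   # guaranteed to lie within the polygon

centroid_inside = u_shape.contains(centroid)
label_inside = u_shape.contains(label_point)
print(centroid_inside, label_inside)
```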

Each of the regions signifies a school district in King County. This matches the data I found about the twenty school districts in the county. I never really thought about the size and shape of a county, so I googled it just to be sure.

[Image: Washington State with King County highlighted. Source: Google Maps]

It seemed like the Google Maps image was the perfect hole for my puzzle piece. From here, it was just a matter of formatting my data to fit the shape file. I did that by defining my coordinate reference system and creating points from the latitude and longitude of my houses.

crs = 'EPSG:4326' # coordinate system (WGS 84 lat/long); the older {'init': 'epsg:4326'} form is deprecated
geometry = [Point(x, y) for x, y in zip(df.long, df.lat)] # creating points as (longitude, latitude)

If you look at an entry in geometry, all you get back is a shapely object. These points need to be attached to our original dataframe. Below, I make a brand new GeoDataFrame that combines the old dataframe, the coordinate system, and the points created from the latitude and longitude of each house.

geo_df = gpd.GeoDataFrame(df,                   # the dataframe
                          crs = crs,            # coordinate system
                          geometry = geometry)  # geometric points
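
An equivalent way to build the same thing, shown here as a sketch on a made-up three-row table (the prices and coordinates are invented), is gpd.points_from_xy, which creates the points without an explicit list comprehension. It also helps to confirm that the houses and the base map share a coordinate system before plotting; to_crs reprojects one layer to match the other if they differ.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical miniature of the housing data
houses = pd.DataFrame({
    "price": [450000, 820000, 615000],
    "lat": [47.51, 47.62, 47.38],
    "long": [-122.26, -122.21, -122.05],
})

# x is longitude and y is latitude
geo_houses = gpd.GeoDataFrame(
    houses,
    geometry=gpd.points_from_xy(houses["long"], houses["lat"]),
    crs="EPSG:4326",
)

# If a base map used another CRS, the points could be reprojected to match it.
# EPSG:3857 (Web Mercator) is used here only as an example target.
reprojected = geo_houses.to_crs("EPSG:3857")
print(geo_houses.crs, reprojected.crs)
```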

That was the last step before we can plot the houses. Now, we put it all together.

fig, ax = plt.subplots(figsize = (15,16))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df.plot(ax = ax , markersize = 2, color = 'blue',marker ='o',label = 'House', aspect = 1)
plt.legend(prop = {'size':10} )
ax.set_title('Houses in Kings County, WA', fontdict = {'fontsize':20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

In the code above, the steps include:

  1. Creating the figure and axes to plot on.

  2. Plotting the King County shape file.

  3. Plotting the data I made that includes the geometry points. This includes choosing the markers, setting the aspect, and adding the label for the legend.

  4. Adding a legend, title, and axis labels.

These steps were done for each of the graphs.

Our output:

[Image: Houses in Kings County, WA]
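
Because the same four steps repeat for every map in this post, they can be wrapped in a small helper. This is only a sketch: the function name plot_layers and its parameters are my own invention, and the demo data is a made-up square with two points rather than the real files.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point, Polygon


def plot_layers(base_map, points, title, label, color="blue", marker="o"):
    """Draw points on top of a base map and return the axes."""
    fig, ax = plt.subplots(figsize=(8, 8))              # 1. figure and axes
    base_map.plot(ax=ax, alpha=0.8, color="black")      # 2. base map
    points.plot(ax=ax, markersize=2, color=color,
                marker=marker, label=label)             # 3. data points
    ax.legend(prop={"size": 10})                        # 4. legend, title, labels
    ax.set_title(title, fontdict={"fontsize": 20})
    ax.set_ylabel("Latitude", fontdict={"fontsize": 20})
    ax.set_xlabel("Longitude", fontdict={"fontsize": 20})
    return ax


# Hypothetical demo data: one square "county" and two "houses"
base = gpd.GeoDataFrame(geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])], crs="EPSG:4326")
pts = gpd.GeoDataFrame(geometry=[Point(0.2, 0.3), Point(0.7, 0.8)], crs="EPSG:4326")
ax = plot_layers(base, pts, "Demo map", "House")
```

With the real data, the call would take kings_county and geo_df (or a filtered slice of geo_df) in place of the demo layers.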

This is a great product, but our goal is to learn something from this visualization. While it gives some information, like the outliers in the far eastern part of the county, it doesn't give much else. We have to play with the parameters. Let's try splitting the data by price. These are the houses that are listed for less than $750,000.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] < 750000].plot(ax = ax , markersize = 2,color = 'red',marker = 's',label = 'Price < 750k',aspect = 1.5)
plt.legend(prop = {'size':15} )
ax.set_title('Houses by Price in Kings County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})
[Image: Houses priced below $750,000. Graphic by Author]

Now we graph the houses priced at or above $750,000.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] >= 750000].plot(ax = ax , markersize = 2,color = 'yellow',marker = 'v',label = 'Price >=750k', aspect = 1.5)
plt.legend(prop = {'size':15})
ax.set_title('Houses by Price in Kings County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})
[Image: Houses priced above $750,000. Graphic by Author]
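
The two filters used above are complementary boolean masks, so every house lands in exactly one group. A quick sketch on made-up prices (the numbers are invented; the column name price matches the one used above) shows how to count each side before comparing the maps.

```python
import pandas as pd

# Hypothetical prices standing in for the housing data
df_demo = pd.DataFrame({"price": [320000, 749999, 750000, 1200000, 540000]})

threshold = 750000
is_cheap = df_demo["price"] < threshold   # boolean mask, one value per row
cheap = df_demo[is_cheap]
expensive = df_demo[~is_cheap]            # ~ negates the mask (price >= threshold when no prices are missing)

print(len(cheap), len(expensive))         # number of houses on each side of the threshold
```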

There is a big difference in terms of both location and quantity. But that is not the end: we can also layer them one on top of the other. We will plot the expensive houses on top of the cheap ones because they are scarcer.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] < 750000].plot(ax = ax , markersize = 1,color = 'red',marker = 's',label = 'Price <750k = Red', aspect = 1.5)
geo_df[geo_df['price'] >= 750000].plot(ax = ax , markersize = 1,color = 'yellow',marker = 'v',label = 'Price>= 750k = Yellow',aspect = 1.5)
plt.legend(prop = {'size':12})
ax.set_title('Houses by Price in Kings County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})
[Image: Side by side comparison. Graphic by Author]

The picture painted by this map is interesting. There is a plethora of housing in King County that falls below the bar we've set. Most of the houses on the lower end of the price scale sit farther inland than the more expensive ones.

If you zoom in, the more expensive houses dot the waterside. They also are more centrally located around the Seattle city center. There are several physical outliers but the trend is clear.

Overall, the visualization has done its job. We have drawn several conclusions from the houses on the map. Pricier houses are clustered around the downtown area and spread along Puget Sound. They are also a minority in the data, which could be telling for predicting housing prices. The cheaper houses are much more numerous and are spread over more varied locations. This will be useful for further EDA.
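
One possible way to carry the "closer to downtown" observation into that further EDA is to turn it into a number. The sketch below is my own suggestion, not part of the original analysis: it computes the great-circle (haversine) distance from a house to downtown Seattle. The downtown coordinates (47.6062, -122.3321) and the sample house are assumptions chosen for illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0                  # mean Earth radius
DOWNTOWN_SEATTLE = (47.6062, -122.3321)   # (lat, long), assumed reference point


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, long) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical house, roughly in Bellevue
house = (47.6101, -122.2015)
distance = haversine_km(*house, *DOWNTOWN_SEATTLE)
print(round(distance, 1))
```

Applied row by row to the lat and long columns, this would give a distance feature that could be compared against price.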

If you want to connect to talk more about this technique, you can find me on LinkedIn. If you would like to check out the code, take a look at my Github.

Sources

  • King County Dataset

  • King County Shape File

  • GeoPandas

  • Shapely

  • Descartes

  • Fiona

Translated from: https://towardsdatascience.com/using-geopandas-for-spatial-visualization-21e78984dc37
