Using Basemap and Geonamescache to Plot K-Means Clusters
This is the third of four stories that aim to address the issue of identifying disease outbreaks by extracting news headlines from popular news sources.
This article aims to present a simple way to visualize the clusters determined in the second article, at both the global and the US scale. First, a list of large cities is gathered and placed, with their corresponding latitudes and longitudes, inside a dataset. Next, a function is written that plots the cluster points on a map, with a different color for each cluster. Lastly, the function is called for the points in the United States, the centers of the clusters in the United States, the points globally, and the centers of the clusters globally.
A detailed explanation of how this is implemented is shown below:
Step 1: Compiling a List of the Largest Cities in the US
First, the city name, latitude, longitude, and population are extracted from ‘largest_us_cities.csv’, a file containing the cities in the US with a population over 30,000. Cities with a population over 200,000 were added to a dictionary, and Anchorage and Honolulu were excluded because they skewed the positioning of the map. Next, the haversine formula, which computes the distance between pairs of coordinates, was used to find cities that lie close to one another (within roughly 80 km); in each such pair, a population heuristic determines which city is kept and the other is dropped.
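For reference, the haversine great-circle distance between two points with latitudes φ1, φ2 and longitudes λ1, λ2 on a sphere of radius r (about 6,371 km for the Earth) is:

d = 2r · asin( √( sin²((φ2 − φ1)/2) + cos(φ1) · cos(φ2) · sin²((λ2 − λ1)/2) ) )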
from math import radians, sin, cos, asin, sqrt
import numpy as np

# Keep US cities with a population of at least 200,000, excluding a few that skew the map's framing.
file2 = open('largest_us_cities.csv', 'r')
large_cities = file2.readlines()
large_city_data = {}
for i in range(1, len(large_cities)):
    large_city_values = large_cities[i].strip().split(';')
    lat_long = large_city_values[-1].split(',')
    if ((int(large_city_values[-2]) >= 200000) and (large_city_values[0] != "Anchorage")
            and (large_city_values[0] != "Honolulu") and (large_city_values[0] != "Greensboro")):
        large_city_data[large_city_values[0]] = [lat_long[0], lat_long[1], large_city_values[-2]]

# Haversine distance in kilometers; point_a and point_b are (latitude, longitude) pairs.
def haversine(point_a, point_b):
    lat1, lon1 = point_a[0], point_a[1]
    lat2, lon2 = point_b[0], point_b[1]
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Earth's radius in kilometers
    return c * r

# For every pair of remaining cities closer than 80 km, blank out the smaller one.
for i in list(large_city_data.keys()):
    for j in list(large_city_data.keys()):
        if ((i != j) and haversine((float(large_city_data[i][0]), float(large_city_data[i][1])),
                                   (float(large_city_data[j][0]), float(large_city_data[j][1]))) < 80.0):
            if (int(large_city_data[j][2]) > int(large_city_data[i][2])):
                large_city_data[i] = [np.nan, np.nan, large_city_data[i][2]]
            else:
                large_city_data[j] = [np.nan, np.nan, large_city_data[j][2]]
large_city_data['Chicago'] = [41.8781136, -87.6297982, 2718782]
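As a quick sanity check of the haversine helper (not part of the original script), the distance between New York City and Philadelphia comes out to roughly 130 km when the function is passed (latitude, longitude) pairs:

# Illustrative check with made-up variable names: New York City vs. Philadelphia.
nyc = (40.7128, -74.0060)
philly = (39.9526, -75.1652)
print(round(haversine(nyc, philly), 1))  # prints roughly 129.6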
Step 2: Plotting K-Means Clusters and Cluster Centers Using Basemap
First, a function is created with seven parameters: df1, num_cluster, typeof, path, size, add_large_city, and figsize. Using the Basemap library, a geographic model of either the US or the world is generated depending on the typeof parameter, and the figsize parameter switches between a larger and a smaller figure. A dictionary is then built whose keys are the cluster labels, subdivided into latitude and longitude entries, and whose values hold the latitude and longitude of every headline assigned to that cluster.
A list of colors is initialized, and a specific color is assigned to each cluster label. The latitude and longitude points are plotted in these colors on the geographic model built above. If the add_large_city parameter is true, the largest cities are also marked on the map. Finally, the figure is saved to a “.png” file at the location given by the path parameter.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def print_k_means(df1, num_cluster, typeof, path, size, add_large_city, figsize):
    # Lambert conformal projection of the US, or a default world projection.
    if (typeof == "US"):
        map_plotter = Basemap(projection='lcc', lon_0=-95, llcrnrlon=-119, llcrnrlat=22,
                              urcrnrlon=-64, urcrnrlat=49, lat_1=33, lat_2=45)
    else:
        map_plotter = Basemap()
    if (figsize):
        fig = plt.figure(figsize=(24, 16))
    else:
        fig = plt.figure(figsize=(12, 8))

    coordinates = []
    for index in df1.index:
        coordinates.append([df1['latitude'][index], df1['longitude'][index], df1['cluster_label'][index]])

    # Group the headline coordinates by cluster label.
    cluster_vals = {}
    for i in range(num_cluster):
        cluster_vals[str(i)+"_long"] = []
        cluster_vals[str(i)+"_lat"] = []
    for index in df1.index:
        cluster_vals[str(df1['cluster_label'][index])+'_long'].append(float(df1['longitude'][index]))
        cluster_vals[str(df1['cluster_label'][index])+'_lat'].append(float(df1['latitude'][index]))

    # One color per cluster; scatter each cluster's points on the map.
    num_list = [i for i in range(num_cluster)]
    color_list = ['rosybrown', 'lightcoral', 'indianred', 'brown', 'maroon', 'red', 'darksalmon',
                  'sienna', 'chocolate', 'sandybrown', 'peru', 'darkorange', 'burlywood', 'orange',
                  'tan', 'darkgoldenrod', 'goldenrod', 'gold', 'darkkhaki', 'olive', 'olivedrab',
                  'yellowgreen', 'darkolivegreen', 'chartreuse', 'darkseagreen', 'forestgreen',
                  'darkgreen', 'mediumseagreen', 'mediumaquamarine', 'turquoise', 'lightseagreen',
                  'darkslategrey', 'darkcyan', 'cadetblue', 'deepskyblue', 'lightskyblue', 'steelblue',
                  'lightslategrey', 'midnightblue', 'mediumblue', 'blue', 'slateblue', 'darkslateblue',
                  'mediumpurple', 'rebeccapurple', 'thistle', 'plum', 'violet', 'purple', 'fuchsia',
                  'orchid', 'mediumvioletred', 'deeppink', 'hotpink', 'palevioletred']
    colors = [color_list[i] for i in range(num_cluster)]
    for target, color in zip(num_list, colors):
        map_plotter.scatter(cluster_vals[str(target)+'_long'], cluster_vals[str(target)+'_lat'],
                            latlon=True, s=size, c=color)
    map_plotter.shadedrelief()

    # Optionally mark the large US cities, skipping the ones blanked out earlier.
    if (add_large_city):
        for index in list(large_city_data.keys()):
            if (large_city_data[index][1] is not np.nan):
                x, y = map_plotter(float(large_city_data[index][1]), float(large_city_data[index][0]))
                plt.plot(x, y, "ok", markersize=4)
                plt.text(x, y, index, fontsize=16)

    plt.show()
    fig.savefig(path)
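As a minimal illustration of the parameters, the function could be called on a small hand-built dataframe; the df_toy values below are made up for demonstration and are not part of the original project:

import pandas as pd

# Hypothetical toy data: four headline coordinates assigned to two clusters.
df_toy = pd.DataFrame({'latitude': [40.71, 34.05, 41.88, 29.76],
                       'longitude': [-74.01, -118.24, -87.63, -95.37],
                       'cluster_label': [0, 1, 0, 1]})

# Two clusters, US projection, marker size 50, no large-city labels, default figure size.
print_k_means(df_toy, 2, "US", "toy_clusters.png", 50, False, False)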
Step 3: Running the Function
The print_k_means function is first run on the df_no_us dataframe to make a scatterplot of the latitudes and longitudes of the headlines pertaining to the US. Next, a geographic center is computed for each cluster and stored in another dataframe called df_center_us. The print_k_means function is then run on the df_center_us dataframe with large cities added, to show which cities lie closest to the disease outbreak centers; the marker size is also increased for easier readability. A similar process is run for df_no_world. Each of the dataframes is stored in a “.csv” file.
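Concretely, for a cluster containing n headline coordinates (φ1, λ1), …, (φn, λn), the center used here is simply the arithmetic mean (φ̄, λ̄) = ( (1/n)·Σ φi, (1/n)·Σ λi ), which is what the loops below compute; this is a reasonable approximation for clusters that do not straddle the antimeridian.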
import pandas as pd

# Plot every US headline point, colored by cluster, and save the dataframe.
print_k_means(df_no_us, us_clusters, "US", "corona_disease_outbreaks_us.png", 50, False, False)
df_no_us.to_csv("corona_disease_outbreaks_us.csv")

# Compute the geographic center of each US cluster that contains at least 20 headlines.
df_center_us = {'latitude': [], 'longitude': [], 'cluster_label': []}
for i in range(us_clusters):
    df_1 = df_no_us.loc[df_no_us['cluster_label'] == i]
    df_1 = df_1.reset_index()
    del df_1['index']
    latitude = []
    longitude = []
    for index in df_1.index:
        latitude.append(float(df_1['latitude'][index]))
        longitude.append(float(df_1['longitude'][index]))
    df_1['latitude'] = latitude
    df_1['longitude'] = longitude
    sum_latitude = df_1['latitude'].sum()
    sum_longitude = df_1['longitude'].sum()
    if (len(df_1['latitude']) >= 20):
        df_center_us['latitude'].append(sum_latitude/(len(df_1['latitude'])))
        df_center_us['cluster_label'].append(i)
        df_center_us['longitude'].append(sum_longitude/(len(df_1['longitude'])))
df_center_us = pd.DataFrame(data=df_center_us)
for index in df_center_us.index:
    df_center_us.loc[index, 'cluster_label'] = index  # relabel the surviving clusters consecutively

# Plot the US cluster centers with the nearby large cities labeled.
print_k_means(df_center_us, len(df_center_us['latitude']), "US",
              "corona_disease_outbreaks_us_centers.png", 500, True, True)
df_center_us.to_csv("corona_disease_outbreaks_us_centers.csv")

# Repeat the same process for the global clusters (threshold of 10 headlines).
df_center_world = {'latitude': [], 'longitude': [], 'cluster_label': []}
for i in range(world_clusters):
    df_1 = df_no_world.loc[df_no_world['cluster_label'] == i]
    df_1 = df_1.reset_index()
    del df_1['index']
    latitude = []
    longitude = []
    for index in df_1.index:
        latitude.append(float(df_1['latitude'][index]))
        longitude.append(float(df_1['longitude'][index]))
    df_1['latitude'] = latitude
    df_1['longitude'] = longitude
    sum_latitude = df_1['latitude'].sum()
    sum_longitude = df_1['longitude'].sum()
    if (len(df_1['latitude']) >= 10):
        df_center_world['latitude'].append(sum_latitude/(len(df_1['latitude'])))
        df_center_world['cluster_label'].append(i)
        df_center_world['longitude'].append(sum_longitude/(len(df_1['longitude'])))
df_center_world = pd.DataFrame(data=df_center_world)
for index in df_center_world.index:
    df_center_world.loc[index, 'cluster_label'] = index

# Plot the world cluster centers and save the dataframe.
print_k_means(df_center_world, len(df_center_world['latitude']), "world",
              "corona_disease_outbreaks_world_centers.png", 500, False, True)
df_center_world.to_csv("corona_disease_outbreaks_world_centers.csv")
Click this link for access to the GitHub repository for a detailed explanation of the code: Github.
Translated from: https://medium.com/@neuralyte/using-basemap-and-geonamescache-to-plot-k-means-clusters-995847513fc2