What do COVID-19 and World Happiness Report data tell us?

For many people, the idea of staying home actually sounded good at first, and the period worked out really well for Netflix and Amazon. But then sad truths awaited us. What wore us down was the number of dead and intubated patients, one after the other. We all know the aftermath well.

In this article, we will examine the relationship between the spread of the Covid-19 virus, which can affect every country in the world, and the country-level indicators published in the World Happiness Report.

Before we start, let’s get to know our datasets:

  • ‘covid19_Confirmed_dataset.csv’ (the data cover 96 days, starting from the first case)
  • ‘worldwide_happiness_report.csv’

And of course the libraries we will use:

import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

First of all, our data needs a little cleaning. We remove the ‘Lat’ and ‘Long’ columns from the data frame:

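corona_dataset_csv = pd.read_csv("covid19_Confirmed_dataset.csv")  # load the confirmed-cases file
corona_dataset_csv.drop(["Lat", "Long"], axis=1, inplace=True)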

Next, we aggregate by country, so that only the country names and the day-by-day case counts remain:

corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum()
Our first aggregated data frame has one row per country and one column per day.
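
To make the cleaning and aggregation steps concrete, here is a minimal sketch on a tiny made-up frame. The rows, dates and numbers are invented for illustration, and I am assuming the real file uses the same wide layout (one column per day, cumulative counts, several rows for countries that report by province).

```python
import pandas as pd

# Made-up rows in the assumed wide layout: one column per day, cumulative counts.
toy = pd.DataFrame({
    "Country/Region": ["China", "China", "Italy"],
    "Lat": [30.9, 40.1, 43.0],
    "Long": [112.2, 116.4, 12.0],
    "1/22/20": [444, 14, 0],
    "1/23/20": [444, 22, 0],
    "1/24/20": [549, 36, 2],
})

# Drop the coordinates, then add up the rows that belong to the same country.
toy_aggregated = toy.drop(columns=["Lat", "Long"]).groupby("Country/Region").sum()
print(toy_aggregated)
```

The two rows for China collapse into one, and the country name becomes the index, which is what lets us write .loc["China"] later on.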

To show three countries in the same plot:

corona_dataset_aggregated.loc["China"].plot()
corona_dataset_aggregated.loc["Italy"].plot()
corona_dataset_aggregated.loc["Spain"].plot()
plt.legend()

To better see which periods stand out and where the trend in infections changes, we will look at the first difference of the cumulative counts, which plays the role of a derivative. For this we use diff():

corona_dataset_aggregated.loc["China"].diff().plot()
Daily new cases for China, obtained with diff(); the highest point is the maximum.
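
As a small illustration of what diff() does to a cumulative series, here is a sketch with invented numbers. The day labels and counts are made up; only the pandas calls are the ones used above.

```python
import pandas as pd

# Invented cumulative case counts for six consecutive days.
cumulative = pd.Series([0, 2, 5, 5, 11, 14],
                       index=["d1", "d2", "d3", "d4", "d5", "d6"])

# diff() subtracts the previous day from each day: the result is daily new cases.
# The first entry is NaN because there is no previous day to subtract.
daily_new = cumulative.diff()

peak_value = daily_new.max()    # largest single-day increase (NaN is skipped)
peak_day = daily_new.idxmax()   # label of the day on which it happened
print(daily_new)
print(peak_day, peak_value)
```

The largest single-day increase is the quantity we call the maximum infection rate in the rest of the article.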

We compute the maximum infection rate for every country, add it as a new column named ‘max_infection_rate’, and keep only that column in a new data frame.

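countries = list(corona_dataset_aggregated.index)
max_infection_rates = []
for c in countries:
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())  # peak daily increase
corona_dataset_aggregated["max_infection_rate"] = max_infection_rates
corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection_rate"])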
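
The same quantity can also be computed without an explicit loop. The following is only a sketch of that alternative on a made-up frame; toy_aggregated and toy_data are hypothetical names, and the numbers are invented.

```python
import pandas as pd

# Made-up aggregated frame: one row per country, one column per day.
toy_aggregated = pd.DataFrame(
    {"d1": [10, 0], "d2": [30, 1], "d3": [35, 9]},
    index=pd.Index(["China", "Italy"], name="Country/Region"),
)

# diff(axis=1) takes day-to-day differences along each row,
# max(axis=1) then picks the largest increase per country.
max_rates = toy_aggregated.diff(axis=1).max(axis=1)

# Keep the result as a one-column data frame indexed by country.
toy_data = max_rates.to_frame(name="max_infection_rate")
print(toy_data)
```

Both versions give the same numbers; the loop is easier to read step by step, while the vectorised form is shorter.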

Meanwhile, we start processing the data from the happiness report. To import it:

happiness_report_csv = pd.read_csv("worldwide_happiness_report.csv")

We drop the columns we do not need: “Overall rank”, “Score”, “Generosity” and “Perceptions of corruption”.

useless_cols = ["Overall rank","Score","Generosity","Perceptions of corruption"]
happiness_report_csv.drop(useless_cols, axis=1, inplace=True)

Now we bring “max_infection_rate” into this frame, matching the two data sets country by country.

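happiness_report_csv.set_index("Country or region", inplace=True)  # assumption: this is the name of the country column
data = corona_data.join(happiness_report_csv,how="inner")
data.head()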
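
Since join() matches rows by index, it helps to see what an inner join keeps. Below is a minimal sketch with invented values; the column name "Country or region" is an assumption about the happiness file, and "Atlantis" is a made-up entry used only to show a row that has no match.

```python
import pandas as pd

# Invented peak values, indexed by country name.
toy_cases = pd.DataFrame(
    {"max_infection_rate": [20.0, 8.0, 3.0]},
    index=["China", "Italy", "Atlantis"],
)

# Invented happiness rows; the country is still an ordinary column here.
toy_happiness = pd.DataFrame({
    "Country or region": ["Italy", "China", "Finland"],
    "GDP per capita": [1.294, 1.029, 1.340],
})

# Both frames must be indexed by country before joining.
toy_happiness = toy_happiness.set_index("Country or region")

# how="inner" keeps only the countries that appear in both frames.
toy_joined = toy_cases.join(toy_happiness, how="inner")
print(toy_joined)
```

"Atlantis" and "Finland" each appear in only one frame, so they are dropped. In the real data this also means that countries spelled differently in the two files silently disappear, which is worth checking.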

We will use the corr() function to compute the correlation matrix:

data.corr()

As you can see, this matrix consists of the correlation coefficients of every pair of columns in our data set.

Take ‘max infection rate’ and ‘GDP per capita’: the cell where their row and column meet holds the correlation coefficient between these two variables. The closer this value is to 1, the stronger the positive correlation between them.
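
To show how a single coefficient is read out of the matrix, here is a sketch on a made-up frame. The numbers are invented and only the column names follow the article.

```python
import pandas as pd

# Invented values for four imaginary countries.
toy = pd.DataFrame({
    "GDP per capita": [0.3, 0.7, 1.0, 1.3],
    "Social support": [0.5, 0.9, 1.2, 1.5],
    "max_infection_rate": [10.0, 40.0, 200.0, 900.0],
})

# corr() returns a square, symmetric matrix of Pearson coefficients.
corr_matrix = toy.corr()

# One cell: the coefficient between two chosen variables.
gdp_vs_rate = corr_matrix.loc["GDP per capita", "max_infection_rate"]

# One column without its own diagonal entry: every factor against the infection rate.
against_rate = corr_matrix["max_infection_rate"].drop("max_infection_rate")
print(gdp_vs_rate)
print(against_rate)
```

The diagonal is always 1, because every column is perfectly correlated with itself, so only the off-diagonal cells carry information.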

If you look at the other life factors, for example social support, healthy life expectancy and freedom to make life choices, we can see positive correlations with all of them as well.

But our work is not done yet. An analysis is not finished until we visualize the results in figures and graphs, so that everyone can understand what we get out of it.

We found that there is a positive correlation between the max infection rate and all of the life factors in our data set.

For this task, I am going to use the seaborn module, which is a very handy tool for visualisation. What we want to do is plot each of these columns against the max infection rate.

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=y)

However, it is not possible to examine this graph in detail. The difference in scale between the X axis and the Y axis means that we cannot see enough detail in our data. To solve this problem, we can use log scaling:

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=np.log(y))

This shows the trend much more clearly. As you can see, the points slope upward: there is a positive correlation.
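
Why the logarithm helps can be seen on a handful of invented numbers. This is only a sketch: the values are made up, and filtering out zeros is my own precaution rather than part of the analysis above (np.log(0) is minus infinity, which cannot be plotted).

```python
import numpy as np
import pandas as pd

# Invented peak values spanning several orders of magnitude, including a zero.
y = pd.Series([0.0, 3.0, 40.0, 900.0, 15000.0])

# Keep only strictly positive values before taking the logarithm.
positive = y[y > 0]

# On the raw scale the largest value is thousands of times the smallest...
raw_ratio = positive.max() / positive.min()

# ...while on the log scale the same points sit within a few units of each other.
log_y = np.log(positive)
log_spread = log_y.max() - log_y.min()
print(raw_ratio, log_spread)
```

On the raw axis the few countries with very large values squeeze all the others against the bottom of the plot; after the transformation every point gets a comparable amount of room.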

sns.regplot(x=x, y=np.log(y))

Very clearly, there is a positive slope between these two variables (“max infection rate” and “GDP per capita”).
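
If we want the slope as a number rather than only as a line on a plot, a straight-line least-squares fit gives it. The sketch below uses np.polyfit on invented data; as far as I know regplot draws the same kind of least-squares line by default, but treat that as an assumption.

```python
import numpy as np

# Invented values: GDP per capita (x) and peak daily cases (y) for five imaginary countries.
x = np.array([0.3, 0.6, 0.9, 1.2, 1.5])
y = np.array([5.0, 20.0, 90.0, 300.0, 1500.0])

# Fit log(y) = slope * x + intercept; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, np.log(y), deg=1)
print(slope, intercept)
```

A positive slope here means that the logarithm of the peak rises with GDP per capita, which is the numerical counterpart of the upward line in the plot.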

Conclusion

We have found a very interesting result in this analysis. It suggests that people living in developed countries are more prone to getting infected with Covid-19 than people in less developed countries. It could be argued that this result is due to a lack of corona test kits in less developed countries. To show that this is not the case, I recommend doing a similar analysis on the data for the cumulative number of deaths.

See here for more: https://github.com/fk-pixel/Coursera-Project-Network/blob/master/Covid19_DataAnalysis%20.ipynb

Translated from: https://medium.com/think-make/what-does-covid-19-and-world-happiness-report-data-tell-us-c76bdd44b7ac
