Envision the Titanic Climax with Matplotlib, Numpy, and Pandas

Did you know that a novel predicted the sinking of the Titanic 14 years before the actual disaster?

In 1898 (14 years before the Titanic sank), American author Morgan Robertson wrote a novel titled ‘The Wreck of the Titan.’

The book was about a fictional ocean liner that sinks after colliding with an iceberg. In the book, the ship is described as “unsinkable” and doesn’t carry enough lifeboats for everyone on board. Sounds familiar? You’re right: it’s the epic story of the Titanic, foretold years in advance.

We cannot say whether the author had any technical basis for his prediction, but as responsible data science enthusiasts we can use the data set to explore the likely outcomes of the disaster, and we can even try to envision alternative versions of the climax.

I am sure all of us know what happened to Rose and Jack in the movie Titanic. We all wished the story had a different ending, didn’t we? Let’s try to make that wish come true by recreating the climax through a simple analysis of the data behind the story.

By the end of the analysis we will have created three climaxes and answered three questions:

• Was there a possibility of Jack staying alive, and what about Rose’s survival?

• Was there a chance for Jack and Rose to narrate their adventurous story to their grandchildren together?

• Did Cal Hockley (Rose’s fiancé) have a higher chance of survival because he belonged to the upper class, and what could have made the villain die?

We carry out our analysis using the ‘Matplotlib’, ‘Numpy’, ‘Pandas’, and ‘Seaborn’ libraries.

Let us see what each library does:

Matplotlib is a Python library for visualizing data sets. It offers a wide range of plot types, including bar plots, line plots, and histograms, to name a few.

Numpy is also a Python library; it provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays.

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Let us start our journey…

Data Exploration:

import pandas as pd
import numpy as np
import random as rnd
# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

We import all the necessary libraries and use the ‘pandas’ library to read the data set containing the statistics of the Titanic disaster.

titanic_df = pd.read_csv('titanic.csv')

Then we display the first five entries using the head command to get a glimpse of the nature of the data and the columns we are going to explore:

titanic_df.head()

Feature Analysis:

Looking at titanic_df.describe(), we gain a lot of useful insights and identify the columns we can ignore:

titanic_df.describe()

• PassengerId: Unique for each passenger, so it has no relation to the survival label and need not be considered in the analysis.

• Survived: A binary value, 0 if the passenger died and 1 if the passenger survived, so this will be the only ‘Y’ variable in our XY plots.

• Pclass: An integer equal to 1, 2, or 3 indicating the class of each passenger (upper, middle, or lower). It is worth analyzing because its three categories may contribute to the survival of passengers.

• Age: A number representing the age of each passenger, though as we can see in titanic_df.tail(), some passengers have NaN for their age. Age is also worth considering, since younger passengers may have been able to act swiftly and escape, so it can contribute to the survival label.

• SibSp: Number of siblings and spouses also on board. We should not ignore this completely, as it may or may not be related to the survival label.

• Parch: Number of parents and children also on board. The same reasoning as for SibSp applies.

• Fare: Amount paid for the ticket by each passenger. This may add to the information in Pclass, since the higher the fare, the higher the class of ticket.
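
The note about NaN ages can be checked directly. Below is a minimal sketch on a tiny made-up frame (the values are invented for illustration and are not rows from titanic.csv; only the column names follow the data set) showing how missing values per column might be counted:

```python
import numpy as np
import pandas as pd

# Tiny made-up sample (NOT the real Titanic data) reusing the column names.
sample_df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Age": [22.0, 38.0, np.nan, 54.0, np.nan],
    "SibSp": [1, 1, 0, 0, 0],
    "Parch": [0, 0, 0, 0, 0],
})

# Count missing values per column; only Age has gaps in this sample.
missing_counts = sample_df.isna().sum()
print(missing_counts)

# Share of rows that have a recorded age.
age_known_share = sample_df["Age"].notna().mean()
print(f"Age recorded for {age_known_share:.0%} of rows")
```

Running the same two calls on titanic_df would show how many passengers lack an age before any age-based conclusions are drawn.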

For a quick comparison, we’ll use NumPy functions to verify the mean, standard deviation, min, and max of the numerical columns.

columns = list(titanic_df[['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']])

def describe_data(data, col):
    # Print summary statistics for one column using NumPy
    print('\n\n', col)
    print('_' * 40)
    print('Mean:', np.mean(data))  # NumPy Mean
    print('STD:', np.std(data))    # NumPy STD
    print('Min:', np.min(data))    # NumPy Min
    print('Max:', np.max(data))    # NumPy Max

for c in columns:
    describe_data(titanic_df[c], c)
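
One thing to keep in mind when comparing these numbers with titanic_df.describe(): pandas reports the sample standard deviation (ddof=1), while np.std defaults to the population standard deviation (ddof=0), so the two STD values can differ slightly. A small sketch with made-up ages (not taken from the data set) illustrates the difference:

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration only.
ages = pd.Series([22.0, 38.0, 26.0, 35.0, 54.0])

pandas_std = ages.std()                             # pandas default: ddof=1 (sample)
numpy_pop_std = np.std(ages.to_numpy())             # NumPy default: ddof=0 (population)
numpy_sample_std = np.std(ages.to_numpy(), ddof=1)  # same convention as pandas

print(pandas_std, numpy_pop_std, numpy_sample_std)
```

Passing ddof=1 to np.std is one way to make the two libraries agree.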

Insights from these are:

• Survived is a categorical label with values 0 or 1.

• Around 38% of the passengers in the sample survived, compared with an actual survival rate of about 32%.

• Most passengers (> 75%) did not travel with parents or children.

• Nearly 30% of the passengers had siblings and/or a spouse aboard.

• Fares varied significantly, with few passengers (<1%) paying as much as $512.

• There were few elderly passengers (<1%) in the 65–80 age range.

Great numbers! Let us move on to realize our dream climaxes…

Climax 1: Jack Lived, Rose Died!!!

This is the reverse case, where Jack narrates his love story to his grandchildren and Rose sadly dies. Is there a possibility of this? Let us examine it, keeping in mind that Jack is male and belongs to the lower class, while Rose is female and belongs to the upper class.

We first need to break the analysis into several parts. First, we will look at the impact sex had on survival by grouping the data frame by sex.

titanic_df.describe(include=['O'])

titanic_df[['Sex', 'Survived']].groupby(['Sex'], as_index=False).mean().sort_values(by='Survived', ascending=False)

This table shows us the proportion of females and the proportion of males that survived. The female survival rate was 74.2%, and the male survival rate was 18.9%. The huge gap between these numbers immediately indicates that the female survival rate on the Titanic was much higher than the male survival rate, and that being a woman did in fact increase Rose’s chances of survival.
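
The reason a plain mean yields a survival rate is that Survived is coded as 0 or 1. A minimal sketch on made-up rows (invented for illustration, not the real data set) shows the mechanics:

```python
import pandas as pd

# Made-up rows (NOT the real data set), just to show the mechanics.
toy_df = pd.DataFrame({
    "Sex": ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})

# Survived is 0/1, so its mean within each group is that group's survival rate.
rate_by_sex = toy_df.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```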

titanic_df['AgeRange'] = pd.cut(titanic_df['Age'], 16)
# Calculate proportion of survivors for each AgeRange
titanic_df[['AgeRange', 'Survived']].groupby(['AgeRange'], as_index=False).mean().sort_values(by='AgeRange', ascending=True)
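
For readers new to pd.cut: passing an integer asks for that many equal-width bins, and rows with a missing age stay unbinned. Here is a small sketch with made-up ages and 4 bins instead of 16 (both choices are assumptions made to keep the output short):

```python
import numpy as np
import pandas as pd

# Made-up ages; np.nan stands in for a passenger with no recorded age.
toy_ages = pd.Series([2, 9, 17, 25, 33, 40, 58, 80, np.nan])

# An integer second argument asks pd.cut for that many equal-width bins.
age_bins = pd.cut(toy_ages, 4)

# Each age is labelled with its interval; the missing age stays NaN.
bin_counts = age_bins.value_counts().sort_index()
print(bin_counts)
print("unbinned rows:", age_bins.isna().sum())
```

Because NaN ages fall into no bin, they drop out of any groupby on the binned column.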

age_sex_hist = sns.FacetGrid(titanic_df, col='Survived', row='Sex', hue='Survived')
age_sex_hist.map(plt.hist, 'Age', bins=20)

This breakdown gives us an interesting and informative view of survival rates for women and children. If we look at the data for children aged 5 and under, we can see that sex didn’t have much of an impact on survival: by and large, most of the children in this sample survived, including the boys. Another interesting insight is just how many of the males on board died (aside from male children under 5); men were more likely to have died than to have survived. When we make the same comparison for females, we see that females in almost every age range were more likely to have survived than to have died. To validate these inferences, we can look at the numbers in a table once again, though a table becomes harder to read as more variables are added. Viewing this type of table also emphasizes how helpful histograms can be for visualizing data.

titanic_df[['Sex', 'AgeRange', 'Survived']].groupby(['Sex', 'AgeRange'], as_index=False).mean().sort_values(by='AgeRange', ascending=True)

Based on these observations and numbers, we can conclude that both women and children had a higher chance of survival, so our Climax 1 is less likely to have happened. We also cannot change factors like sex and age to suit our wish, so there is little chance of Rose dying and Jack surviving. Poor Jack!!!

Climax 2: Jack and Rose Escaped and Lived Happily!!!

Imagine that, just as Rose is about to finish the story of her adventurous trip, Jack joins her and ends it with, “That’s how your grandma and grandpa fell in love with each other and ended up together.”

That’s the most heart-warming climax one could ever want. So in what ways could it have been realized? Let us analyze the passenger class, as it is one of the important labels in the Titanic data set.

Were upper-class passengers more likely to have made it onto a lifeboat than middle- and lower-class passengers? Let us make it interesting by examining this with bar and point plots.

titanic_df['Passenger'] = 'Passenger'
# Create Class column with string values for class
titanic_df['Class'] = titanic_df['Pclass'].map({1: 'Upper', 2: 'Middle', 3: 'Lower'})
# Create PointPlot for Passengers by Class
bp = sns.pointplot(x='Passenger', y='Survived', hue='Class', data=titanic_df, hue_order=['Lower', 'Middle', 'Upper'])
bp.set(ylabel='% Survivors', xlabel='Passenger Class')

This plot shows the average survival rate and its confidence interval for each passenger class. The breakdown of average survival rates shows a correlation between class and survival: lower-class survival fell somewhere between 20% and 30%, middle-class survival between 35% and 55%, and upper-class survival between 55% and 70%.
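
To get a feel for where such ranges come from, here is a rough sketch that computes a survival rate and a 95% interval per class on made-up rows. Note the substitution: seaborn estimates its intervals by bootstrapping by default, whereas this sketch uses the simpler normal approximation, so it is an illustration of the idea and not a reproduction of the plot. The data and column values are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows (NOT the real data set).
toy_df = pd.DataFrame({
    "Class": ["Upper"] * 10 + ["Lower"] * 10,
    "Survived": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Survival rate (mean of the 0/1 column) and group size per class.
summary = toy_df.groupby("Class")["Survived"].agg(rate="mean", n="count")

# Normal-approximation 95% interval: rate +/- 1.96 * sqrt(rate * (1 - rate) / n),
# clipped so the bounds stay inside [0, 1].
half_width = 1.96 * np.sqrt(summary["rate"] * (1 - summary["rate"]) / summary["n"])
summary["ci_low"] = (summary["rate"] - half_width).clip(lower=0)
summary["ci_high"] = (summary["rate"] + half_width).clip(upper=1)
print(summary)
```

With only ten rows per class the intervals are very wide, which is a reminder that the width of the bars in the plot depends on how many passengers each class contains.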

So, to make our Climax 2 come true, perhaps instead of joining Jack for the party in his lower-class quarters, Rose could have taken Jack along with her to the upper class and enjoyed the day there, which might have kept them both alive for a happy ending.

Climax 3: Cal Hockley, Want Him Dead or Alive???

Cal Hockley, the villain who took advantage of Rose’s situation and tricked her into accepting his marriage proposal, also seems to have escaped the sinking. What could be the reason?

Let us use seaborn plots to explore passenger class and sex in more detail, as they contribute most significantly to survival and are the parameters that vary between Jack and Cal.

bps = sns.barplot(x='Sex', y='Survived', hue='Class', data=titanic_df, hue_order=['Lower', 'Middle', 'Upper'])
bps.set(ylabel='% Survivors', xlabel='Passenger Sex by Class')

We can see that middle-class female passengers had almost the same rate of survival as upper-class females, while middle-class men had about the same rate of survival as lower-class men, which further illustrates how much more likely women were to have survived. Overall, we can observe that upper-class passengers did indeed have a higher chance of survival than lower-class passengers, regardless of sex.
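
The bar heights can also be read as a table. The sketch below builds a sex-by-class survival table with pivot_table on made-up rows (invented for illustration; the Sex, Class, and Survived column names mirror the ones used above):

```python
import pandas as pd

# Made-up rows (NOT the real data set).
toy_df = pd.DataFrame({
    "Sex": ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Class": ["Upper", "Upper", "Lower", "Lower", "Upper", "Upper", "Lower", "Lower"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows are sexes, columns are classes, cells are mean survival (between 0 and 1).
rate_table = toy_df.pivot_table(index="Sex", columns="Class", values="Survived", aggfunc="mean")
print(rate_table)
```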

np.corrcoef(x=titanic_df['Pclass'], y=titanic_df['Survived'])

A negative correlation tells us that as the class number increases (1 → 2 → 3), survival decreases. Since the lower class is represented as 3, belonging to the lower class is correlated with lower survival.
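
np.corrcoef returns a 2×2 matrix, and the class-survival correlation is the off-diagonal entry. A minimal sketch with made-up values (not from the data set) shows how the sign can be read off:

```python
import numpy as np

# Made-up values: class numbers and 0/1 survival flags for nine passengers.
pclass = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
survived = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0])

# The diagonal holds each variable's correlation with itself (1.0);
# the off-diagonal entries hold the correlation between the two variables.
corr_matrix = np.corrcoef(pclass, survived)
r = corr_matrix[0, 1]
print(round(r, 3))
```

A negative value of r here means that higher class numbers (that is, lower classes) go with fewer survivors in the toy sample.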

Unfortunately, Cal Hockley had the better chance of survival!

Apart from the assumptions behind our climaxes, there are certain limitations:

As some of these inferences were drawn from correlations, it’s important to remember that correlation does not imply causation.
Since some passengers did not have a recorded age, entries with ‘NaN’ (null) ages were not taken into account when running the age-based numbers.
Conclusions were drawn from descriptive statistics and charts; we opted not to run t-tests on the sample.

Interesting Findings:

What proportion of passengers in the sample survived?

38% of the passengers in the sample survived.

Did women and children have a higher survival rate?

The female survival rate in this sample was 55.3 percentage points higher than the survival rate for males.
Women had a much higher rate of survival than men.
Children under the age of 5, regardless of sex, had a much higher rate of survival.
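
The 55.3 figure is the difference between the two rates quoted earlier (74.2% for females and 18.9% for males), so it is a gap in percentage points and not a relative increase. A quick check of both readings:

```python
# Survival rates quoted earlier in the post.
female_rate = 0.742
male_rate = 0.189

point_gap = (female_rate - male_rate) * 100  # absolute gap, in percentage points
ratio = female_rate / male_rate              # relative comparison (how many times higher)

print(f"{point_gap:.1f} percentage points, ratio {ratio:.2f}")
```

In relative terms, the female rate is close to four times the male rate.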

Did upper-class passengers in the sample have an advantage that translated into a higher survival rate than lower-class passengers?

Passenger class is clearly correlated with survival, with upper-class passengers having a much higher rate of survival than lower-class passengers, regardless of sex and age.
Upper-class passengers were more likely to survive than lower-class passengers.

So, as we approach the climax of our post, let’s quickly summarize: we gained some insights into the Matplotlib, Numpy, pandas, and seaborn libraries, which are essential for data science.

Also, instead of mourning the loss of Jack and the separation of true love, we explored the possibilities for changing the climax, which is exactly the duty of a data scientist: to analyze the data and come up with useful possibilities for attaining desired outcomes.

Now it’s your turn, pals, to create your own customized climaxes and conclusions with this kind of simple analysis of a data set, and to come up with creative and innovative endings for your favorite historical epics. Kudos to the learners!

The End (Happy Ending)!

You can find the code and dataset at:

https://github.com/PradeepaK1/Envision-the-Titatnic-Climax-with-Matplotlib-Numpy-Pandas

Contributors:

Anjana M P: https://anjana21it.wixsite.com/mysite

Pradeepa K: https://ptljkpd.wixsite.com/pradeepa

Translated from: https://medium.com/diva-coders/envision-the-titanic-climax-with-matplotlib-numpy-pandas-d5568cc6d0cc
