Placement Outcomes Data Analysis Using Orange GUI

Objective: To analyse several factors influencing the recruitment of students and to extract information through plots.

Description: The following analysis presents different plots that attempt to link students' placement prospects, made possible through student perceptions of recruiting organisations, to certain academic parameters such as the percentages obtained in secondary school, higher secondary school, the undergraduate degree and the postgraduate degree.

Miscellaneous factors such as the gender of the candidate, the board of education and the stream chosen in secondary and higher secondary school, the undergraduate degree specialisation and the postgraduate degree specialisation have also been taken into account to predict placement status as well as the salary offered.

Several colleges offer employability tests, which help employers evaluate candidates, analyse and judge their skills, and hence recruit the right talent. Thus, the performance of students in such tests conducted by the college, along with their previous work experience, has also been analysed to deduce its relation to recruitment opportunities.

Hypothesis: Students with better scores in secondary education and in their undergraduate degree have better prospects of getting placed.

Understanding the Project :

After going through the analysis, a reader should be able to infer:

  1. How the choice of board of education influences placement prospects.
  2. The relative importance of scores obtained in various degrees and streams in the campus recruitment procedure.
  3. The relation of gender and work experience to the salary offered by companies in campus placements.

Acknowledgements:

My teammate Prafful Chauhan and I, Ruchika Parag Barman, created this notebook/blog as part of the coursework for the “Pandas, bamboolib & Orange workshop” at Suven, under the mentorship of Rocky Jagtiani.

Learned from https://datascience.suvenconsultants.com.

Mentored by Rocky Jagtiani.

Dataset:

This dataset consists of placement data of students at an XYZ campus. It includes secondary and higher secondary school percentages and specialisations. It also includes degree specialisation, degree type, work experience and the salary offered to placed students.

[Figure: the placement dataset]

We have taken 60 observations (number of rows), from which we extract information through exploratory data analysis and visualisation. There are 8 categorical features and 6 numerical features.

Histograms :

[Figure: histogram of placements by gender]

Inference: Male students get more placements than female students; the ratio of males to females among placed students is roughly 2:1.

[Figure: histogram of salary by board of education in high school]

Inference: With respect to high school education, Central board students have a wider range of salaries than students from other boards, but the ratio of placed Central board students to placed students from other boards is less than 1.

[Figure: histogram of salary by board of education in secondary school]

Inference: With respect to secondary education, Central board students have a wider range of salaries than students from other boards.

[Figure: histogram of salary by stream]

Inference: Commerce and Arts students have a wider range of salaries, and more of them are placed, compared with students from Science or other streams.

From the above graphs, one can gather that gender plays quite an important role in whether or not a candidate is hired. A male candidate is more likely to be placed at a company than a female candidate. Similarly, the board of education and the stream chosen also influence the salary offered. Students who opted for Commerce and Management studies have been offered higher pay.

Correlations :

[Figure: correlations table]

The correlations table gives us the following ideas:

  1. Students who have scored well in their secondary education are very likely to perform well in their undergraduate degree as well.
  2. Students who have scored well in their high school education also tend to perform well in their secondary education.
  3. Again, students who have scored well in their high school education are very likely to perform well in their undergraduate degree as well.
  4. Most students who have had a good academic record in their high school education also score high in their MBA degree.
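As a rough illustration of what a correlations table measures, the sketch below computes a Pearson correlation coefficient between two columns of scores in plain Python. The variable names and values are made up for illustration and are not taken from the dataset, and the choice of Pearson's coefficient is an assumption, since the post does not say which measure was selected in Orange.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical percentages for five students (illustrative values only).
secondary_pct = [67.0, 79.3, 65.0, 56.0, 85.8]
degree_pct = [58.0, 77.5, 64.0, 52.0, 73.3]

r = pearson(secondary_pct, degree_pct)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would correspond to statements such as "students who score well in one stage tend to score well in the next"; a value close to 0 would indicate no linear relation.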

Boxplots :

[Figure: boxplot of undergraduate degree percentage by placement status]

Inference: The above boxplot shows the relation between the percentage obtained in the undergraduate degree and placement status. Students who get placed score higher than those who do not. For placed students, the mean score is 68.6925, the standard deviation is 6.189, the median (2nd quartile) is 69.25, the 1st quartile is 64.50 and the 3rd quartile is 72.1150.

By contrast, for students who were not placed, the mean percentage is 60.8670, the standard deviation is 7.045, the median (2nd quartile) is 61.00, the 1st quartile is 56.65 and the 3rd quartile is 64.00.

From this analysis, undergraduate students/freshers can prioritise and prepare for their degree examinations keeping in mind the average score, mentioned above, that companies generally consider worthy of a placement.
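The statistics quoted for each boxplot (mean, standard deviation and the three quartiles) can be reproduced outside Orange. The following is a minimal sketch using Python's standard `statistics` module; the function name `boxplot_summary` and the sample scores are hypothetical, and the "inclusive" quartile method is an assumption, so results may differ slightly from the values Orange reports.

```python
import statistics


def boxplot_summary(values):
    """Mean, sample standard deviation and quartiles of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Hypothetical degree percentages for one group (illustrative values only).
placed_scores = [64.5, 66.0, 69.25, 72.0, 77.5, 61.0, 70.1]

for name, value in boxplot_summary(placed_scores).items():
    print(f"{name}: {value:.2f}")
```

Running the same function on the placed and not placed groups separately gives the two sets of figures that the boxplot compares.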

[Figure: boxplot of salary by gender]

Inference: Male candidates get higher pay than female candidates. For placed male students, the mean salary is 302608.70, the standard deviation is 144726.4, the median (2nd quartile) is 264000, the 1st quartile is 240000 and the 3rd quartile is 300000.

On the other hand, for placed female students, the mean salary is 267571.43, the standard deviation is 41776.1, the median (2nd quartile) is 250000, the 1st quartile is 240000 and the 3rd quartile is 300000.

Thus, we can see that not only is the placement rate of females lower than that of males, but the salary offered to placed female candidates is also relatively lower than that offered to male candidates.

Pivot Table :

[Figure: pivot table of placement status by stream]

Inference: As more students opt for Commerce and Management, the numbers of both placed and unplaced students are much higher in it than in Science and other streams. Even the ratio of placed to unplaced students is higher in Commerce and Management than in Science.

Readers can infer that there are relatively more job opportunities for students who opt for Commerce and Management than for those in other streams.
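The counts and ratios read off the pivot table can be sketched in a few lines of Python. The records below are invented for illustration (they are not the real data), and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical (stream, status) pairs; not the real dataset.
records = [
    ("Comm&Mgmt", "Placed"), ("Comm&Mgmt", "Placed"), ("Comm&Mgmt", "Placed"),
    ("Comm&Mgmt", "Not Placed"),
    ("Sci&Tech", "Placed"), ("Sci&Tech", "Not Placed"),
    ("Others", "Not Placed"),
]


def pivot_counts(rows):
    """Count rows per (stream, status) cell, like a count pivot table."""
    return Counter(rows)


def placed_to_unplaced_ratio(counts, stream):
    """Ratio of placed to not-placed students for one stream (None if undefined)."""
    not_placed = counts[(stream, "Not Placed")]
    if not_placed == 0:
        return None
    return counts[(stream, "Placed")] / not_placed


counts = pivot_counts(records)
for stream in sorted({s for s, _ in records}):
    print(stream,
          counts[(stream, "Placed")],
          counts[(stream, "Not Placed")],
          placed_to_unplaced_ratio(counts, stream))
```

Comparing the ratio across streams, rather than the raw counts, avoids the effect of one stream simply having more students.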

Scatterplots :

[Figure: scatterplot of salary against degree percentage, coloured by stream]

For scatterplots, we have used 60% of the data provided. A scatterplot of salary against the percentage obtained in the degree examination is shown. Here, the points are coloured by stream, as shown in the legend.
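Taking 60% of the data can be sketched as a simple random sample. The post does not say how the subset was drawn in Orange, so random sampling without replacement, the fixed seed and the helper name `sample_fraction` are all assumptions made for illustration.

```python
import random


def sample_fraction(rows, fraction, seed=0):
    """Return a random sample (without replacement) of the given fraction of rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


rows = list(range(60))  # stand-ins for the 60 observations
subset = sample_fraction(rows, 0.6)
print(len(subset), "of", len(rows), "rows kept")
```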

Inference: The higher salaries have been offered to students whose scores lie in the range 64–74. Moreover, in terms of stream, most of the students who have been offered pay higher than 300,000 belong to Commerce and Management. Very few Science students, and even fewer students from other streams, have crossed the 300,000 threshold.

[Figure: scatterplot of salary against MBA percentage]

Inference: Students who specialise in Marketing and Finance and those who specialise in Marketing and HR score similarly in MBA percentage. However, the highest-paid students generally have scores in the range 62–70, approximately. Very few students have been offered pay higher than 400,000. The majority of students are offered salaries in the range of 250,000 to 350,000.

We can infer that maintaining an average score in the above-mentioned range should suffice for a decently paying placement.

Mosaic Plot :

[Figure: mosaic plot of placement status against employability test score and work experience]

Other than academic parameters, recruiting companies may also consider some other factors for placement. Employability tests conducted by colleges are key to establishing appropriate labour market linkages and ascertaining that the workforce is industry-ready.

Inference: From the plot above, we can see that of all the students who did not get placed, very few scored above 83.5. Most of the candidates who were not placed scored below 83.5.

Moreover, the plot suggests that students with prior work experience are preferred over freshers. Nearly all the sections of students not placed had no prior work experience, whereas those with work experience fall in the placed students section on the right.

From this, students can see that having experience in a work environment before campus recruitment proves beneficial. Thus, they can plan and prepare accordingly for their future.

Classification Tree :

[Figure: classification tree with placement status as target]

This classification tree has placement status (placed) as its target. It has the following parameters:

It is an induced binary tree.

Minimum number of instances in leaves: 2.

Do not split subsets smaller than: 5.

Limit the maximal tree depth to: 100.

Classification stops when the majority reaches 95%.
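To make the tree parameters above more concrete, here is a minimal sketch of how such stopping rules could be checked while growing a tree. This is not Orange's implementation; the function names and the way the rules are combined are assumptions made for illustration only.

```python
from collections import Counter


def should_stop(labels, depth, min_split=5, max_depth=100, majority=0.95):
    """Return True if a node holding `labels` should become a leaf."""
    if not labels:
        return True
    if len(labels) < min_split:  # subset too small to be split further
        return True
    if depth >= max_depth:  # depth limit reached
        return True
    # Stop once the most common class reaches the majority threshold.
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels) >= majority


def split_allowed(left, right, min_leaf=2):
    """A binary split is kept only if both children have enough instances."""
    return len(left) >= min_leaf and len(right) >= min_leaf


mixed_node = ["Placed"] * 3 + ["Not Placed"] * 3
print(should_stop(mixed_node, depth=2))      # node is still mixed
print(should_stop(["Placed"] * 20, depth=2))  # node is pure
```

With these settings, a node is split only while it is reasonably large, reasonably mixed, and both resulting leaves would hold at least two students.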

Based on the data provided, students can obtain a detailed analysis of how placement depends on the various academic and other factors. This tree gives a clear explanation of how the different attributes of a particular student influence their placement status.

[Figure: classification tree with salary offered as target]

This classification tree has the salary offered as its target. It has the following parameters:

It is an induced binary tree.

Minimum number of instances in leaves: 2.

Do not split subsets smaller than: 5.

Limit the maximal tree depth to: 100.

Classification stops when the majority reaches 95%.

Students can obtain a detailed analysis of how the salary offered to a candidate depends on the various academic and other factors. This tree gives a clear explanation of how the different attributes of a particular student influence their pay.

Vote of Thanks :

I would like to humbly and sincerely thank my mentor, Rocky Jagtiani. He is more of a friend to me than a mentor. The data analytics he taught and the various assignments we did, and are still doing, are the best way to learn and build skills in the Data Science field.

Recommended: https://datascience.suvenconsultants.com/

Translated from: https://medium.com/@ruchikaparag18/placement-outcomes-data-analysis-using-orange-gui-1884aa3ac0c2
