北方工業大學gpa計算_北方大學聯盟倉庫的探索性分析

北方工業大學gpa計算

This is my firts publication here and i will start simple.

這是我的第一篇出版物,這里我將簡單介紹

I want to make an exploratory data analysis of UFRN’s warehouse and answer some questions about the data using Python and Power BI.

我想對UFRN的倉庫進行探索性數據分析,并使用Python和Power BI回答有關數據的一些問題。

I have downloaded the dataset from “dados.gov.br”, wich is an open source website of public data from Brazil. You can have the used dataset here 👈

我已經從“ dados.gov.br ”下載了數據集,該數據集是來自巴西的公共數據的開源網站。 您可以在這里擁有使用的數據集👈

QUESTIONS TO BE ANSWERED:

需要回答的問題:

  1. What are the items with the greatest quantity in stock;

    庫存量最大物品是什么?

  2. What are the items with the greatest value in stock;

    庫存中價值最大物品是什么?

  3. Wich warehouse has the greatest quantity of items in stock and wich has the lower quantity;

    Wich 倉庫的庫存物品最多,而數量 較少的 倉庫 ;

  4. Wich warehouse has the greatest value in stock;

    Wich 倉庫的庫存價值 最大

  5. What are the most expensive items.

    什么是最昂貴的物品。

So, let’s start importing pandas in the Jupyter Notebook and opening the csv file:

因此,讓我們開始在Jupyter Notebook中導入熊貓并打開csv文件:

Image for post

Here ? we have our first problem:

?我們有第一個問題:

File is strange. Seems that all informations are in the same column…

文件很奇怪 。 似乎所有信息都在同一列中…

This factor makes it impossible to analyse the dataframe by column as we can see below:

這個因素使我們無法按列分析數據幀,如下所示:

Image for post

The problem in this case is the separator. Pandas uses the comma as default delimiter (or separator), and in this file, the delimiter is semicolon.

在這種情況下,問題是分隔符。 Pandas使用逗號作為默認定界符(或分隔符),在此文件中,定界符為分號。

So, we need to inform the delimiter right after the path file:

因此,我們需要在路徑文件之后立即通知定界符:

Image for post

Now, here we go! ?

現在,我們開始! ?

Let’s check some informations about the dataset, like columns, data types, lenght, and missing values.

讓我們檢查有關數據集的一些信息,例如數據類型長度缺失

Note that we have 5 columns:

請注意,我們有5列:

Image for post
Image for post

material (Item): object;almoxarifado (Warehouse): object;saldo (balance): float;preco (price): object;valor_total (Amount): object;

材料 (項目):對象; almoxarifado (倉庫):對象; saldo (平衡):浮動; preco (價格):對象; valor_total (金額):對象;

If we had to make some operations between columns, we should probably change some object types to integer or float values, but in this case we won’t need.

如果必須在列之間進行某些操作,則可能應將某些對象類型更改為整數或浮點值,但在這種情況下,我們將不需要。

Now we can analyse data through columns

現在我們可以通過列分析數據

Image for post

? Above we have the second issue:

?以上是第二個問題:

There are 2 different names for the same warehouse: “MEJC — ALMOXARIFADO NUTRI??O” and “MEJC — ALMOXARIFADO DE NUTRI??O”.

有對同一倉庫2個不同的名字:“MEJC - ALMOXARIFADONUTRI??O”和“MEJC - ALMOXARIFADO DENUTRI??O”。

It’s a typo, so let’s make them one.

這是一個錯字,因此讓我們將它們設為一個。

Image for post

Done!

做完了!

A very quick and easy way to have a lot of information in one code line is using Pandas Profilling.

使用Pandas Profilling是在一個代碼行中具有很多信息的非常快速,簡便的方法。

Analysing the earlier infos and Pandas Profilling below, we have: 9087 rows and 4536 distinct items. 49.9% unique items and 0 missing cells.

分析下面的早期信息和Pandas Profilling,我們有9087行4536個不同的項目。 49.9%的獨特物品和0個丟失的單元格。

It means that dataframe doesn’t need much effort in cleaning and treatment.

這意味著數據幀在清理和處理方面不需要太多的工作。

Image for post

This dataset is small and brings us just a few columns. Like i said, first publication, simple things (sorry 😬)

這個數據集很小,只給我們帶來了幾列。 就像我說的那樣,第一次出版,簡單的事情(對不起😬)

Excepting “saldo”, “preco” and “valor_total”, the columns are categorical data. So now, we can go to Power BI to make some visualizations and answer some questions.

除“ saldo”,“ preco”和“ valor_total”外,這些列均為分類數據。 現在,我們可以去Power BI進行可視化并回答一些問題。

But first, we need to export the dataset we edited in Jupyter Notebook:

但是首先,我們需要導出在Jupyter Notebook中編輯的數據集:

Image for post

With the file saved in the same folder of the actual project, we just need to open it in Power BI

將文件保存在實際項目的同一文件夾中,我們只需要在Power BI中打開它

The data is configured by default as a “Western European” source. Because of this, some words in brazilian portuguese are misspelled

默認情況下,數據配置為“西歐”來源。 因此,巴西葡萄牙語中的某些單詞 拼寫錯誤

Image for post

We solve this issue choosing “Unicode — UTF8” as the file source

我們選擇“ Unicode — UTF8”作為文件源來解決此問題

Image for post

At the Power Query editor, we can see that the columns are already in the right format (Text, Text, Role Number, Currency, Currency). So it doesn’t need any transformation, just close and apply.

在Power Query編輯器中,我們可以看到列的格式已經正確(文本,文本,角色編號,貨幣,貨幣)。 因此,它不需要任何轉換,只需關閉并應用即可。

Image for post

LET’S GO TO THE QUESTIONS!

讓我們去解決問題!

  1. What are the items with the greatest quantity in stock:

    什么是物品 庫存數量最多的庫存:

Image for post

Almost 2 millions syringes and double-distilled water.

近200萬支注射器和雙蒸餾水

2. What are the items with the greatest value in stock:

2.什么是物品 庫存最大的價值

Image for post

WOW! Wait..close to 1 million $ in coffee??? Seems that teachers are driven by coffee! 😂

哇! 等待..接近一百萬美元的咖啡 ??? 看來老師是被咖啡驅動的! 😂

Note that the value of these 2 items is very different from the others (could they be outliers?)

請注意,這兩項的價值與其他項有很大不同 (它們可能是離群值嗎?)

3. Wich warehouse has the greatest quantity of items in stock and wich has the lower quantity:

3. Wich 倉庫中的物品數量最多 ,而其中的物品數量 較少:

Image for post

Despite these values are being rounded, the medicine warehouse has the largest number of items, followed by the infra-structure warehouse.

盡管對這些值進行了四舍五入,但藥品倉庫的物品數量最多 ,其次是基礎結構倉庫。

Image for post

And the museum warehouse has de lower number of items.

博物館倉庫的物品數量較少

4. Wich warehouse has the greatest value in stock:

4. Wich 倉庫的庫存最大 價值

Image for post

We saw earlier that the Medicine Warehouse has the largest number of items, slightly larger than that of Infrastructure. But the stock value of the Infra Warehouse is twice that of Medicine.

前面我們看到, 藥品倉庫的物品數量最多 ,比基礎設施的數量略大。 但是Infra倉庫的庫存價值 醫學的兩倍

5. What are the most expensive items.

5.什么是最昂貴的物品。

Image for post

Ok, so a 250g package of coffee costs 30.4 thousands?? Sounds weird 🤔

好的,一包250克咖啡的價格為30,400歐元? 聽起來很奇怪🤔

Image for post
Image for post

Well, it’s true! 😐 Also 11 thousand $ in plastic cups.

好吧,這是真的! 😐還有一萬一千美元的塑料杯。

Another strange thing that i realize is that 107 items have 1 cent as a price:

我意識到的另一件奇怪的事是107件商品的價格為1美分

Image for post
Image for post

Does all these items really costs just 1 cent? Someone will need to answer it.

所有這些物品真的只花費1美分嗎? 有人需要回答。

SO WHAT??

所以呢??

Analysing this dataset we have seen that there is A LOT of money spent in coffe and plastic cup. ALMOST 1 MILLION!!!??? 💸💸💸

通過分析該數據集,我們發現在咖啡和塑料杯上花費了很多錢。 幾乎一百萬!!!!!! 💸💸💸

Maybe something it’s wrong…With this information i would probably check the veracity of the facts. Perhaps, the item’s description is not as much meaningfull as it could be (how big is this package?)

也許有些問題……有了這些信息,我可能會檢查事實的真實性。 也許,該項目的描述沒有那么有意義(此包裝有多大?)

Some inconclusions can turn into conclusions. In this case, some data curation could be needed.

一些結論可以得出結論。 在這種情況下,可能需要一些數據管理。

If you have some tips, comments or any suggestions, i would be very happy to learn more with you guys. I’m just a begginer.

如果您有任何提示,意見或建議,我將非常高興與你們一起學習更多。 我只是一個初學者。

Thank you for your time! 🙏

感謝您的時間! 🙏

翻譯自: https://medium.com/análises-exploratórias-de-dados/exploratory-analysis-of-ufrn-universidade-federal-do-rio-grande-do-norte-warehouses-e6f2ff334b0f

北方工業大學gpa計算

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/389288.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/389288.shtml
英文地址,請注明出處:http://en.pswp.cn/news/389288.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

泰坦尼克數據集預測分析_探索性數據分析-泰坦尼克號數據集案例研究(第二部分)

泰坦尼克數據集預測分析Data is simply useless until you don’t know what it’s trying to tell you.除非您不知道數據在試圖告訴您什么,否則數據將毫無用處。 With this quote we’ll continue on our quest to find the hidden secrets of the Titanic. ‘The …

各種數據庫連接的總結

SQL數據庫的連接 return new SqlConnection("server127.0.0.1;databasepart;uidsa;pwd;"); oracle連接字符串 OracleConnection oCnn new OracleConnection("Data SourceORCL_SERVER;USERM70;PASSWORDmmm;");oledb連接數據庫return new OleDbConnection…

關于我

我是誰? Who am I?這是個哲學問題。。 簡單來說,我是Light,一個靠前端吃飯,又不想單單靠前端吃飯的Coder。 用以下幾點稍微給自己打下標簽: 工作了兩三年,對,我是16年畢業的90后一直…

L1和L2正則

https://blog.csdn.net/jinping_shi/article/details/52433975轉載于:https://www.cnblogs.com/zyber/p/9257843.html

基于PyTorch搭建CNN實現視頻動作分類任務代碼詳解

數據及具體講解來源: 基于PyTorch搭建CNN實現視頻動作分類任務 import torch import torch.nn as nn import torchvision.transforms as T import scipy.io from torch.utils.data import DataLoader,Dataset import os from PIL import Image from torch.autograd…

missforest_missforest最佳丟失數據插補算法

missforestMissing data often plagues real-world datasets, and hence there is tremendous value in imputing, or filling in, the missing values. Unfortunately, standard ‘lazy’ imputation methods like simply using the column median or average don’t work wel…

華碩猛禽1080ti_F-22猛禽動力回路的視頻分析

華碩猛禽1080tiThe F-22 Raptor has vectored thrust. This means that the engines don’t just push towards the front of the aircraft. Instead, the thrust can be directed upward or downward (from the rear of the jet). With this vectored thrust, the Raptor can …

聊天常用js代碼

<script languagejavascript>//轉意義字符與替換圖象以及字體HtmlEncode(text)function HtmlEncode(text){return text.replace(//"/g, &quot;).replace(/</g, <).replace(/>/g, >).replace(/#br#/g,<br>).replace(/IMGSTART/g,<IMG style…

溫故而知新:柯里化 與 bind() 的認知

什么是柯里化?科里化是把一個多參數函數轉化為一個嵌套的一元函數的過程。&#xff08;簡單的說就是將函數的參數&#xff0c;變為多次入參&#xff09; const curry (fn, ...args) > fn.length < args.length ? fn(...args) : curry.bind(null, fn, ...args); // 想要…

OPENVAS運行

https://www.jianshu.com/p/382546aaaab5轉載于:https://www.cnblogs.com/diyunpeng/p/9258163.html

Memory-Associated Differential Learning論文及代碼解讀

Memory-Associated Differential Learning論文及代碼解讀 論文來源&#xff1a; 論文PDF&#xff1a; Memory-Associated Differential Learning論文 論文代碼&#xff1a; Memory-Associated Differential Learning代碼 論文解讀&#xff1a; 1.Abstract Conventional…

大數據技術 學習之旅_如何開始您的數據科學之旅?

大數據技術 學習之旅Machine Learning seems to be fascinating to a lot of beginners but they often get lost into the pool of information available across different resources. This is true that we have a lot of different algorithms and steps to learn but star…

純API函數實現串口讀寫。

以最后決定用純API函數實現串口讀寫。 先從網上搜索相關代碼&#xff08;關鍵字&#xff1a;C# API 串口&#xff09;&#xff0c;發現網上相關的資料大約來源于一個版本&#xff0c;那就是所謂的msdn提供的樣例代碼&#xff08;msdn的具體出處&#xff0c;我沒有考證&#xff…

數據可視化工具_數據可視化

數據可視化工具Visualizations are a great way to show the story that data wants to tell. However, not all visualizations are built the same. My rule of thumb is stick to simple, easy to understand, and well labeled graphs. Line graphs, bar charts, and histo…

Android Studio調試時遇見Install Repository and sync project的問題

我們可以看到&#xff0c;報的錯是“Failed to resolve: com.android.support:appcompat-v7:16.”&#xff0c;也就是我們在build.gradle中最后一段中的compile項內容。 AS自動生成的“com.android.support:appcompat-v7:16.”實際上是根據我們的最低版本16來選擇16.x.x及以上編…

Apache Ignite 學習筆記(二): Ignite Java Thin Client

前一篇文章&#xff0c;我們介紹了如何安裝部署Ignite集群&#xff0c;并且嘗試了用REST和SQL客戶端連接集群進行了緩存和數據庫的操作。現在我們就來寫點代碼&#xff0c;用Ignite的Java thin client來連接集群。 在開始介紹具體代碼之前&#xff0c;讓我們先簡單的了解一下Ig…

VGAE(Variational graph auto-encoders)論文及代碼解讀

一&#xff0c;論文來源 論文pdf Variational graph auto-encoders 論文代碼 github代碼 二&#xff0c;論文解讀 理論部分參考&#xff1a; Variational Graph Auto-Encoders&#xff08;VGAE&#xff09;理論參考和源碼解析 VGAE&#xff08;Variational graph auto-en…

IIS7設置

IIS 7.0和IIS 6.0相比改變很大誰都知道&#xff0c;而且在IIS 7.0中用VS2005來調試Web項目也不是什么新鮮的話題&#xff0c;但是我還是第一次運用這個東東&#xff0c;所以在此記下我的一些過程&#xff0c;希望能給更多的后來者帶了一點參考。其實我寫這篇文章時也參考了其他…

tableau大屏bi_Excel,Tableau,Power BI ...您應該使用什么?

tableau大屏biAfter publishing my previous article on data visualization with Power BI, I received quite a few questions about the abilities of Power BI as opposed to those of Tableau or Excel. Data, when used correctly, can turn into digital gold. So what …

python 可視化工具_最佳的python可視化工具

python 可視化工具Disclaimer: I work for Datapane免責聲明&#xff1a;我為Datapane工作 動機 (Motivation) There are amazing articles on data visualization on Medium every day. Although this comes at the cost of information overload, it shouldn’t prevent you …