How to Boost Your Data Analysis Skills with Python

If you're learning Python, you've likely heard about scikit-learn, NumPy, and Pandas. These are all important libraries to learn, but there is more to them than you might initially realize.

There are numerous tips and tricks in the world of Python that can help you speed up your data science tasks, improve your code, and write it more efficiently.

So I decided to compile some of the most valuable data analysis tips in this article for you.

Profile dataframes in Pandas

The primary purpose of profiling is to get a clear understanding of the data, and this is what the Python package Pandas Profiling does. It is a straightforward and fast way to perform exploratory analysis of Pandas dataframes.

The exploratory data analysis process usually starts with the Pandas functions df.info() and df.describe(). But these only give you a basic overview of the data, which might not be very helpful if you're dealing with a large data set.
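
As a quick illustration of those two first steps, here is a minimal sketch. The dataframe and its column names (product, price, units) are made up for this example and are not from any real data set.

```python
import io

import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "product": ["a", "b", "c", "d"],
    "price": [9.5, 12.0, 3.25, 7.0],
    "units": [3, 1, 4, 2],
})

# df.info() reports column names, dtypes, and non-null counts.
# It prints to stdout by default; here it is captured in a buffer first.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# df.describe() summarises the numeric columns only
# (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```

Note that the text column is left out of the describe() summary, which is part of why these two calls only give a basic overview.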

Pandas Profiling extends the Pandas dataframe with df.profile_report(), which helps you analyze data quickly. With just one line of code, it displays plenty of information in the form of an interactive HTML report.

For a set of data, Pandas profiling computes these statistics:

Make Pandas plots more interactive

Pandas has a built-in plot() function as part of the DataFrame class. However, the visualizations it produces are not very interactive, so they do not appeal much to a data science audience.

On the other hand, it is easy to plot a chart with the pandas.DataFrame.plot() function. The question then is, how do we plot interactive, Plotly-style charts using Pandas without making significant changes to the code?

You can do this with the Cufflinks library, which combines the power of Plotly with the flexibility of Pandas for quick plotting.

You can see the result in the images below.

Both visualizations show the same data. The first is a static chart, while the second is interactive and provides more detail than the first. Yet we got it without making any significant changes to the syntax.

Magic commands

The term "magic commands" refers to a set of convenience functions in Jupyter Notebooks. They were designed to solve many of the common problems that come up in standard data analysis.

There are two kinds of magic commands. First, there are the line magics, which are prefixed with a single % character and operate on one line of input.

The second kind are the cell magics, which are denoted by a double %% prefix and work on multiple lines of input. If the automagic setting is on (set to 1), you can also call line magics without typing the initial %.

Some of these commands come in handy for everyday data analysis tasks. They include:

%pastebin

This function uploads code to Pastebin and returns the URL. Pastebin is an online content hosting service where you can store plain text (such as source code snippets) and then share the URL with other people.

In fact, a GitHub gist is very similar to Pastebin, but it has version control.

%matplotlib notebook

The %matplotlib inline function renders static Matplotlib plots within Jupyter notebooks. Try replacing the inline part with notebook. This quickly gets you resizable and zoomable plots.

But make sure you call the function before you import the Matplotlib library.

%run

You can use this function to run a Python script in a notebook.

%%writefile

This function writes the content of a cell into a file. For example, a cell that starts with %%writefile foo.py writes its code into a file named foo.py and saves it in the current directory.
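
Outside a notebook, the combined effect of %%writefile and %run can be approximated in plain Python. The sketch below is not how the magics are implemented; it swaps in pathlib and runpy from the standard library, and the file content and the variable name greeting are made up for illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical cell content that we want to save as a script.
cell_source = 'greeting = "hello from foo.py"\nprint(greeting)\n'

with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "foo.py"

    # Roughly what "%%writefile foo.py" does: save the text to a file.
    script.write_text(cell_source)

    # Roughly what "%run foo.py" does: execute the script and
    # make the names it defined available afterwards.
    namespace = runpy.run_path(str(script))

result = namespace["greeting"]
print(result)
```

A temporary directory is used here so the sketch leaves no files behind; in a notebook, %%writefile saves into the current directory instead.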

%%latex

This function renders the cell content as LaTeX. It comes in handy when writing mathematical equations and formulae in a cell.
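
As a small illustration, a cell whose first line is %%latex could contain something like the following. The formula, the sample mean, is just an arbitrary example and is not from the original article.

```latex
\begin{equation}
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\end{equation}
```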

Find and remove errors

The interactive debugger is another magic function. However, for this article, it gets a category of its own.

If you run a code cell and get an exception, type %debug in a new cell and run it. This opens an interactive debugging environment that takes you back to the point where the exception happened.

You can also check the values of the variables assigned in the program and perform operations there. After that, if you want to exit the debugger, press q.

Use the -i option when running Python scripts

The typical way to run a Python script from the command line is python hello.py. But if you add -i and run the same script (python -i hello.py), you get more benefits. How?

First of all, once the program ends, Python does not exit the interpreter. This means that we can check the values of the different variables and the correctness of the functions defined in the program.

Second, since the interpreter is still available, it is easy to invoke the Python debugger with:

  • import pdb
  • pdb.pm()

From here, we can quickly get to the point where the exception happened and then work on the code.
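
To make "the point where the exception happened" more concrete, here is a minimal non-interactive sketch. The functions mean and failing_frame_locals are hypothetical names invented for this illustration. The sketch walks the traceback to its innermost frame and reads the local variables there, which is the same frame an interactive post-mortem session would start in.

```python
import sys


def mean(values):
    total = sum(values)
    return total / len(values)  # raises ZeroDivisionError for an empty list


def failing_frame_locals(func, *args):
    """Run func and, if it raises, return the local variables of the
    frame in which the exception happened (None if nothing was raised)."""
    try:
        func(*args)
    except Exception:
        tb = sys.exc_info()[2]
        # Follow the traceback chain down to the innermost frame.
        while tb.tb_next is not None:
            tb = tb.tb_next
        # In an interactive session, pdb.post_mortem(tb) would open
        # a debugger prompt here instead of just reading the locals.
        return dict(tb.tb_frame.f_locals)
    return None


crash_locals = failing_frame_locals(mean, [])
print(crash_locals)
```

In a real debugging session you would inspect these variables by typing their names at the debugger prompt rather than collecting them in a dictionary.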

Delete and restore

So what do you do when you mistakenly delete one cell within your Jupyter Notebook? Luckily there is a shortcut for you to undo that action.

If you deleted the content of a cell, you can recover it by pressing CTRL/CMD+Z.

If you have deleted an entire cell that you want to recover, press ESC+Z or choose Edit > Undo Delete Cells.

Conclusion

This article shared some tips to boost your data analysis skills with Python. These hacks should come in handy for you at some point in your Python data analysis journey.

Translated from: https://www.freecodecamp.org/news/how-to-boost-your-data-analysis-skills-with-python/
