Data analysis with Python
If you're learning Python, you've likely heard about scikit-learn, NumPy, and pandas. These are all important libraries to learn, but there is more to them than you might initially realize.
The Python world has numerous tips and tricks that can help you speed up data science tasks, improve your code, and write it more efficiently.
So I decided to compile some of the most valuable data analysis tips in this article for you.
Profile dataframes in Pandas
The purpose of profiling is to get a clear understanding of the data, and that is what the Python package Pandas Profiling does. It offers a straightforward and fast way to run an exploratory analysis of Pandas dataframes.
Exploratory data analysis usually starts with the Pandas functions df.info() and df.describe(). But these give you only a basic overview of the data, which might not be very helpful if you're dealing with a large data set.
Pandas Profiling extends the Pandas dataframe with df.profile_report(), which helps you analyze data quickly. With a single line of code, it generates an interactive HTML report containing plenty of information.
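For a data set, Pandas Profiling computes statistics such as column types, unique and missing values, quantile and descriptive statistics, most frequent values, histograms, and correlations.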
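As a rough illustration of that first-pass overview, the sketch below builds a small made-up dataframe (the column names and values are assumptions, not data from this article) and runs df.info() and df.describe() on it. The profiling call is left as a comment because it needs the separate Pandas Profiling package, which newer releases distribute under the name ydata-profiling.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Lagos", "Accra", "Lagos", None],
        "sales": [120.0, 95.5, None, 143.2],
    }
)

# The usual first steps: column types and non-null counts, then summary statistics.
df.info()
summary = df.describe(include="all")
print(summary)

# Missing values per column, one of the things a profiling report also surfaces.
missing = df.isna().sum()
print(missing)

# With the profiling package installed, the one-line report would look like this:
# import pandas_profiling  # adds df.profile_report() to DataFrame
# report = df.profile_report()
# report.to_file("report.html")
```

On a four-row frame the basic overview is easy to read by eye; the profiling report pays off when there are too many columns to scan one by one.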
Make pandas plots more interactive
The built-in plot() function is a method of the Pandas DataFrame class. However, the visualizations it produces are static rather than interactive, which limits their appeal to a data science audience.
On the other hand, it is easy to plot a chart with pandas.DataFrame.plot(). The question then is: how do we plot interactive, Plotly-style charts using Pandas without making significant changes to the code?
You can do this with the Cufflinks library, which binds Plotly’s power with Pandas's flexibility for plotting quickly.
You can see the result in the images below.
Both visualizations show the same data. The first is a static chart, while the second is interactive and provides more detail than the first. Yet we got this without making any significant changes to the syntax.
Magic commands
The term "magic commands" refers to a set of convenience functions in Jupyter Notebooks. They were created to solve many of the common problems that come up in standard data analysis.
There are two kinds of magic commands. First, there are line magics, which are prefixed with a single % character and operate on one line of input.
The second kind are cell magics, denoted by a double %% prefix, which operate on multiple lines of input. If the automagic setting is set to 1, you can call line magics without typing the initial %.
Several of these commands come in handy for everyday data analysis tasks. Here are a few of them:
%pastebin
This magic uploads code to Pastebin and returns the URL. Pastebin is an online content hosting service where you can store plain text (such as source code snippets) and then share the URL with other people.
A GitHub gist is very similar to Pastebin, but it also has version control.
%matplotlib notebook
The %matplotlib inline magic renders static Matplotlib plots within Jupyter notebooks. Replace inline with notebook, and you quickly get resizable and zoomable plots instead.
But make sure you call the magic before you import the Matplotlib library.
%run
You can use this function to run a Python script in a notebook.
%%writefile
This magic writes the cell content into a file. For example, starting a cell with %%writefile foo.py saves the code in that cell to a file named foo.py in the current directory.
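Magic commands only work inside IPython or Jupyter, so the sketch below is not the magics themselves. It is a rough plain-Python approximation of what %%writefile foo.py followed by %run foo.py does, using the standard pathlib and runpy modules. The greet function, the cell_source string, and the temporary directory are assumptions made for illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Code that a notebook cell starting with "%%writefile foo.py" might contain.
cell_source = (
    "def greet(name):\n"
    "    return f'Hello, {name}!'\n"
    "\n"
    "message = greet('Jupyter')\n"
)

with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "foo.py"

    # Rough equivalent of %%writefile: save the cell content to a file.
    script.write_text(cell_source, encoding="utf-8")

    # Rough equivalent of %run: execute the script and collect its globals.
    namespace = runpy.run_path(str(script))

print(namespace["message"])
```

In a notebook, %run also makes the script's variables available in the interactive namespace, which is what reading message from the returned dictionary imitates here.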
%%latex
This magic renders the cell content as LaTeX. It comes in handy when writing mathematical equations and formulae in a cell.
Find and remove errors
The interactive debugger is also a magic command, but it gets a category of its own in this article.
If you run a code cell and get an exception, type %debug in a new cell and run it. This opens an interactive debugging environment that takes you back to the point where the exception happened.
There, you can check the values of the variables assigned in the program and also perform operations on them. To exit the debugger, press q.
Use the -i option when running Python scripts
A typical way to run a Python script from the command line is python hello.py. But if you add -i and run the same script (python -i hello.py), you get more benefits. How?
First of all, when the program reaches its end, Python does not close the interpreter. This means that we can check the values of the different variables and verify that the functions defined in the program behave correctly.
Second, because the interpreter is still available, it is easy to invoke the Python debugger by running:
- import pdb
- pdb.pm()
From here, we can quickly get to the point where the exception happened and then work on the code.
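To make this workflow concrete, here is a minimal sketch of what a hello.py might contain. The average function and the scores list are hypothetical and chosen only so that there is something to inspect and something that can fail. The interactive steps are shown as comments because they are typed at the prompt, not stored in the script.

```python
# hello.py: a hypothetical script used only for illustration.

def average(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


scores = [2, 4, 6]
print(average(scores))

# Start it with:  python -i hello.py
# When the script finishes, the >>> prompt stays open, so you can inspect
# `scores` or call `average` again. If a call fails, for example
#     >>> average([])
# (which raises ZeroDivisionError), open the post-mortem debugger at the
# failing frame:
#     >>> import pdb
#     >>> pdb.pm()
```

Inside the debugger, typing the name of a variable such as values shows what was passed to the failing call, and q exits.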
Delete and restore
So what do you do when you mistakenly delete one cell within your Jupyter Notebook? Luckily there is a shortcut for you to undo that action.
You can restore content you deleted inside a cell by pressing CTRL/CMD+Z.
If you have deleted an entire cell that you want to recover, press ESC+Z, or choose Edit > Undo Delete Cells.
Conclusion
This article shared some tips to boost your data analysis skills with Python. These hacks should come in handy for you at some point in your Python data analysis journey.
Translated from: https://www.freecodecamp.org/news/how-to-boost-your-data-analysis-skills-with-python/