Top R Libraries for Data Science
Data science is the discipline of making data useful
When we talk about the top programming language for Data Science, Python is often named as the best fit. Python is undoubtedly an excellent choice for the vast majority of Data Science tasks, but there is another programming language that was built specifically to provide superior number-crunching capabilities for Data Science, and that is R.
In addition to robust statistical computing, R offers a huge collection of more than 16,000 libraries that cater to the needs of Data Scientists, Data Miners, and Statisticians alike. In this article, we shed some light on a handful of the top R libraries for Data Science.
Best R Libraries for Data Science
R is extremely popular among Data Miners and Statisticians, and part of the reason is the extensive range of libraries available for it. These tools and functions simplify statistical work to a great extent and make tasks such as data manipulation, visualization, web crawling, and Machine Learning a breeze. Some of these libraries are briefly explained below.
1. dplyr
The dplyr package, also known as a grammar of data manipulation, provides the most frequently used tools for data manipulation, including the following functions:
filter(): filters rows based on the given criteria
mutate(): adds new variables that are functions of existing variables
select(): selects variables based on their names
summarise(): reduces multiple values to a single summary
arrange(): changes the ordering of the rows
Additionally, you can use the group_by() function to group the data so that the functions above are applied to each group separately. If you're keen on checking out the dplyr package, you can either get it as part of the tidyverse or install the package directly with the command install.packages("dplyr").
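The functions listed above are usually chained together. The following is a minimal sketch of such a chain, assuming the built-in mtcars data set; the variable names kpl and summary_by_cyl and the horsepower threshold are made up for illustration.

```r
library(dplyr)

# mtcars ships with base R; every name created below is illustrative only
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                      # keep cars with more than 100 horsepower
  mutate(kpl = mpg * 0.425144) %>%          # new variable: km per litre, derived from mpg
  select(cyl, hp, kpl) %>%                  # keep only the columns we need
  group_by(cyl) %>%                         # later steps run once per cylinder count
  summarise(mean_kpl = mean(kpl), n = n()) %>%
  arrange(desc(mean_kpl))                   # most economical group first

print(summary_by_cyl)
```

Each step receives the result of the previous one, so the pipeline reads top to bottom in the order the operations are applied.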
2. tidyr
tidyr is one of the core packages in the tidyverse ecosystem and, as the name suggests, it is used to tidy up messy data. If you're wondering what tidy data is, let me clear that up: in tidy data, every column is a variable, every row is an observation, and every cell holds a single value.
According to the tidyr documentation, tidy data is the standard way of storing data that is used throughout the tidyverse, and it can help you save time and be more productive with your analysis. You can get the package as part of the tidyverse or with the command install.packages("tidyr").
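To make the idea of tidy data concrete, here is a small sketch that reshapes a hypothetical "wide" table, where the year is hidden in the column names, into tidy form with pivot_longer(). The data frame and its values are invented for the example.

```r
library(tidyr)

# Hypothetical wide data: one column per year, so "year" is not a variable yet
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 22),
  check.names = FALSE
)

# Move every column except country into year/value pairs
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "value"
)

print(long)
```

After the reshape, each row is one observation (a country in a given year), and year and value are ordinary columns.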
3. ggplot2
ggplot2 is among the top R libraries for data visualization and is actively used by thousands of users around the world to create compelling charts, graphs, and plots. The reason behind this popularity is that ggplot2 was created to simplify the visualization process: it takes minimal input from the developer, such as the data to visualize, the style, and the primitives to use, and leaves the rest to the library.
The result is a graph that presents complex statistics effortlessly. If you're looking to add more customizability to your charts, you can use IDEs like RStudio for more granular control. You can get your hands on ggplot2 via the tidyverse collection or as a standalone library with the command install.packages("ggplot2").
Read the R documentation to learn more about the ggplot2 functions.
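As a rough illustration of the "data, style, primitives" idea, the sketch below builds a scatter plot from the built-in mtcars data set. The choice of columns, labels, and theme is an assumption made for the example.

```r
library(ggplot2)

# Data and aesthetic mapping first, then the geometric primitive, then styling
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  ) +
  theme_minimal()

# In an interactive session, print(p) draws the plot
```

Scales, legends, and axis breaks are filled in by the library; only the mapping and the point layer were specified.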
4. lubridate
R is an excellent programming language for Data Science, but there are certain areas where R may feel incomplete. One such area is the handling of dates and times. Anyone who works extensively with dates and times in R may find its built-in capabilities cumbersome.
To overcome this, we have a handy package called lubridate. The package not only handles the standard dates and times in R, but also offers enhancements such as time periods, daylight saving time handling, leap days, support for various time zones, fast time parsing, and many helper functions. Should your project require you to work with dates and times, you can get the lubridate package as part of the tidyverse or install just the package with the command install.packages("lubridate").
Read the lubridate documentation for more details.
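A few of the features mentioned above (parsing, periods, leap days, and time zones) can be sketched as follows. The dates and the time zone are arbitrary examples, and the time zone conversion assumes the system's time zone database is available.

```r
library(lubridate)

# Parse a date and add a period; 2020 is a leap year, so 29 February exists
start <- ymd("2020-02-28")
two_days_later <- start + days(2)

is_leap <- leap_year(2020)

# %m+% adds months without rolling past the end of a shorter month
end_of_feb <- ymd("2020-01-31") %m+% months(1)

# Show the same instant in another time zone
noon_utc <- ymd_hms("2020-07-01 12:00:00", tz = "UTC")
noon_in_ny <- with_tz(noon_utc, "America/New_York")

print(two_days_later)
print(end_of_feb)
print(noon_in_ny)
```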
5. lattice
lattice is another elegant yet powerful data visualization library, focused on multivariate data. What makes this library special is that, apart from handling regular visualizations, lattice also comes prepared with support for nonstandard situations and requirements. As the implementation of Trellis graphics for R, it allows you to create Trellis graphs and offers options to tune them according to your requirements. lattice comes with R by default, and there is a companion package called latticeExtra that might come in handy if you want to extend the core features provided by lattice.
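A Trellis graph conditions one relationship on a third variable and draws one panel per level. The sketch below assumes the built-in mtcars data set; the choice of variables and the panel layout are illustrative.

```r
library(lattice)

# mpg against weight, with one panel per number of cylinders
p <- xyplot(
  mpg ~ wt | factor(cyl),
  data = mtcars,
  layout = c(3, 1),              # three panels in a single row
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon"
)

# In an interactive session, print(p) draws the panels
```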
6. mlr
Machine Learning in R (mlr) is a library that was released in 2013 and updated to mlr3 in 2019 with newer techniques, a better architecture, and a new core design. As of now, the library provides a framework for classification, regression, support vector machines, and many other Machine Learning activities.
mlr3 is targeted at Machine Learning practitioners and researchers and facilitates the benchmarking and deployment of various Machine Learning algorithms without much hassle. Those looking to extend or even combine the existing learners and to fine-tune the best technique for a task will find mlr3 to be a perfect option. mlr3 can be installed using the command install.packages("mlr3").
The mlr3 documentation describes the wide range of available functions.
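To give a feel for the workflow, here is a minimal sketch that evaluates one learner with cross-validation. It assumes the example task "iris" and the decision tree learner "classif.rpart" that ship with mlr3 (the learner relies on the rpart package); the seed and the number of folds are arbitrary.

```r
library(mlr3)

set.seed(1)  # resampling splits are random

task <- tsk("iris")                    # built-in example classification task
learner <- lrn("classif.rpart")        # decision tree learner
resampling <- rsmp("cv", folds = 3)    # 3-fold cross-validation

# Train and predict on each fold, then average the accuracy
rr <- resample(task, learner, resampling)
acc <- unname(rr$aggregate(msr("classif.acc")))

print(acc)
```

Swapping in another learner or measure only changes the keys passed to lrn() and msr(); the rest of the code stays the same.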
7. caret
Short for Classification And REgression Training, the caret library provides several functions that streamline model training for tricky regression and classification problems. caret comes with additional tools for tasks like data splitting, variable importance estimation, feature selection, pre-processing, and many more. With caret, you can also measure the performance of your models and fine-tune model behavior with parameters like tuneLength or tuneGrid. The package itself is easy to use and only loads the necessary components as it goes. The library can be installed with the command install.packages("caret").
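The data splitting, pre-processing, and tuning steps described above might look like the following sketch. It assumes the built-in iris data set and a k-nearest-neighbours model; the seed, split ratio, and tuneLength value are arbitrary choices for illustration.

```r
library(caret)

set.seed(42)  # the split and the resampling are random

# Stratified 80/20 split on the outcome
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set <- iris[-idx, ]

# 5-fold cross-validation; tuneLength = 5 tries five candidate values of k
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(
  Species ~ .,
  data = train_set,
  method = "knn",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneLength = 5
)

# Accuracy on the held-out rows
pred <- predict(fit, newdata = test_set)
accuracy <- mean(pred == test_set$Species)
print(accuracy)
```

To control the candidates explicitly, tuneLength can be replaced with a grid such as tuneGrid = data.frame(k = c(3, 5, 7)).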
8. esquisse
esquisse is not a library per se, but an add-in for the powerful data visualization library ggplot2. You might be wondering why you would need it alongside ggplot2, so let me clear that up. ggplot2 is already smart enough, but if you need an additional layer of intuitiveness for your visualizations, esquisse is the way to go. esquisse lets you simply drag and drop the required data and choose the desired customization options, and there you have it: a tailored plot built within a short time and ready to export to your application of choice. With esquisse, you can create visualizations such as bar plots, histograms, and scatter plots, and you can also plot sf objects. You can add esquisse to your environment using install.packages("esquisse").
9. shiny
shiny is a web application framework from RStudio that allows developers to create interactive web applications using R with minimal web development background. With shiny, you can build web pages, interactive visualizations, and dashboards, and even embed widgets in R documents. shiny can also be easily extended with CSS themes, JavaScript actions, and htmlwidgets for added customization. It comes with a host of attractive built-in widgets for presenting plots, tables, and the output of R objects, and whatever you code in shiny goes live the same instant, eliminating those annoying frequent page refreshes. If you're sold on the features and want to give it a shot, you can get shiny using the command install.packages("shiny").
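A shiny application pairs a user interface definition with a server function. The sketch below is a minimal example under assumed names: the input id "bins", the output id "hist", and the use of mtcars are all illustrative.

```r
library(shiny)

# User interface: a slider and a placeholder for the plot
ui <- fluidPage(
  titlePanel("Histogram of mpg"),
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server logic: the plot is redrawn whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(mtcars$mpg, breaks = input$bins,
         main = NULL, xlab = "Miles per gallon")
  })
}

app <- shinyApp(ui = ui, server = server)

# In an interactive session, runApp(app) starts the application
```

Because the plot depends on input$bins, moving the slider updates the histogram without a page refresh.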
10. Rcrawler
If you're looking for a tool that scrapes data off websites and returns it in an understandable format, look no further: Rcrawler is the right option for you. With Rcrawler's powerful web crawling, data scraping, and data mining capabilities, you can not only crawl through websites and scrape data, but also analyze the network structure of any website, including its internal and external hyperlinks. In case you're wondering why not use rvest, the Rcrawler package is a step up from rvest because it goes through all the pages of a website and extracts the data, which can be extremely helpful when you are trying to gather all the information from one source in one go. The package can be installed with the command install.packages("Rcrawler").
11. DT
The DT package acts as an R wrapper for the JavaScript library DataTables. DT allows you to transform the data in an R matrix or data frame into an interactive table on your HTML page, which makes searching, sorting, and filtering the data easy. The package works by letting its main function, datatable(), create an HTML widget for the R object. DT allows further fine-tuning via the options argument and even some additional customization of your tables, all without going deep into the coding. The DT package can be installed using the command install.packages("DT").
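As a small sketch of datatable() and its options argument, the example below turns part of the built-in iris data set into an interactive table. The page length and the column filter setting are arbitrary choices.

```r
library(DT)

# Build an HTML widget from a data frame
tbl <- datatable(
  head(iris, 20),
  options = list(pageLength = 5),  # show five rows per page
  filter = "top",                  # add a filter box above each column
  rownames = FALSE
)

# In an interactive session or an R Markdown document, printing tbl renders the table
```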
12. plotly
If you want to create interactive visualizations that steal the show, plotly is perfect for you. With plotly, you can create stunning, publication-worthy visualizations from a diverse collection of charts and graphs, such as scatter and line plots, bar charts, pie charts, histograms, heatmaps, contour plots, and time series. You name it, and plotly can make it. Built on top of the plotly.js library, plotly visualizations can also be displayed in web applications via Dash, shown in Jupyter Notebooks, or saved as HTML files. If you're interested in trying out the package, you can install it using the command install.packages("plotly").
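For a sense of the API, here is a minimal sketch of an interactive scatter plot built from the built-in mtcars data set. The column choices and axis titles are illustrative, and the file name in the final comment is hypothetical.

```r
library(plotly)

# Interactive scatter plot; hovering over a point shows its values
fig <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  color = ~factor(cyl),
  type = "scatter",
  mode = "markers"
)

fig <- layout(
  fig,
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Miles per gallon")
)

# To save as a standalone HTML file (file name is hypothetical):
# htmlwidgets::saveWidget(fig, "scatter.html")
```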
Other Noteworthy R Libraries
- Bioconductor
- knitr
- janitor
- randomForest
- e1071
- stringr
- data.table
- R Markdown
- rvest
Conclusion
Throughout this article, we covered some of the top R libraries for common Data Science tasks, such as visualization, data manipulation, and Machine Learning model training and optimization. We know that this is not an exhaustive list, and it by no means covers the entirety of the vast ecosystem of libraries that R has. CRAN, the repository for all things R, has thousands of equally capable and resourceful libraries for your specific needs, with detailed information and documentation. Should you ever need to find a library, we highly recommend you give CRAN a shot.
Note: To avoid any misunderstanding, I want to point out that this article represents only my personal opinion, and you have every right to disagree with it. If I've missed any important library, do let me know in the comments section.
About the Author
Claire D. is a Content Crafter and Marketer at Digitalogy, a tech sourcing and custom matchmaking marketplace that connects people across the globe with pre-screened, top-notch developers and designers based on their specific needs. Connect with Digitalogy on LinkedIn, Twitter, and Instagram.
Translated from: https://towardsdatascience.com/top-r-libraries-for-data-science-29b4e9f4907c