數據科學r語言_您應該為數據科學學習哪些語言?

數據科學r語言

Data science is an exciting field to work in, combining advanced statistical and quantitative skills with real-world programming ability. There are many potential programming languages that the aspiring data scientist might consider specializing in.

數據科學是一個令人興奮的領域,它將先進的統計和定量技能與實際編程能力相結合。 有抱負的數據科學家可能會考慮采用許多潛在的編程語言。

While there is no correct answer, there are several things to take into consideration. Your success as a data scientist will depend on many points, including:

盡管沒有正確的答案,但有幾件事要考慮。 您作為數據科學家的成功取決于很多方面,包括:

Specificity

特異性

When it comes to advanced data science, you will only get so far reinventing the wheel each time. Learn to master the various packages and modules offered in your chosen language. The extent to which this is possible depends on what domain-specific packages are available to you in the first place!

當涉及到高級數據科學時,每次您都只能重新發明輪子。 學習掌握以您選擇的語言提供的各種軟件包和模塊。 可能的程度首先取決于您可以使用哪些特定于域的軟件包!

Generality

概論

A top data scientist will have good all-round programming skills as well as the ability to crunch numbers. Much of the day-to-day work in data science revolves around sourcing and processing raw data or ‘data cleaning’. For this, no amount of fancy machine learning packages are going to help.

一位頂尖的數據科學家將具有良好的全面編程技能以及處理數字的能力。 數據科學中的許多日常工作都圍繞著采購和處理原始數據或“數據清理”。 為此,沒有任何花哨的機器學習包會有所幫助。

Productivity

生產率

In the often fast-paced world of commercial data science, there is much to be said for getting the job done quickly. However, this is what enables technical debt to creep in — and only with sensible practices can this be minimized.

在通常快節奏的商業數據科學世界中,要快速完成工作有很多話要說。 但是,這正是技術債務蔓延的原因,只有明智的實踐才能使這種債務減至最少。

Performance

性能

In some cases it is vital to optimize the performance of your code, especially when dealing with large volumes of mission-critical data. Compiled languages are typically much faster than interpreted ones; likewise statically typed languages are considerably more fail-proof than dynamically typed. The obvious trade-off is against productivity.

在某些情況下,優化代碼的性能至關重要,尤其是在處理大量關鍵任務數據時。 編譯語言通常比解釋語言要快得多。 同樣,靜態類型的語言比動態類型的語言具有更好的防故障能力。 明顯的權衡是不利于生產力。

To some extent, these can be seen as a pair of axes (Generality-Specificity, Performance-Productivity). Each of the languages below fall somewhere on these spectra.

在某種程度上,這些可以看作是一對軸(通用性,性能和生產率)。 下面的每種語言都屬于這些頻譜。

With these core principles in mind, let’s take a look at some of the more popular languages used in data science. What follows is a combination of research and personal experience of myself, friends and colleagues — but it is by no means definitive! In approximately order of popularity, here goes:

牢記這些核心原則,讓我們看一下數據科學中使用的一些更流行的語言。 接下來是我自己,朋友和同事的研究和個人經驗的結合,但這絕不是確定的! 按流行程度大致如下:

[R (R)

你需要知道的 (What you need to know)

Released in 1995 as a direct descendant of the older S programming language, R has since gone from strength to strength. Written in C, Fortran and itself, the project is currently supported by the R Foundation for Statistical Computing.

R作為較早的S編程語言的直接后代于1995年發布,此后R變得越來越強大。 該項目由C,Fortran及其本身編寫,目前得到R統計計算基金會的支持。

執照 (License)

Free!

自由!

優點 (Pros)

  • Excellent range of high-quality, domain specific and open source packages. R has a package for almost every quantitative and statistical application imaginable. This includes neural networks, non-linear regression, phylogenetics, advanced plotting and many, many others.

    高質量,特定領域和開源軟件包的優秀產品。 R提供了幾乎所有可以想象的定量和統計應用程序的軟件包。 這包括神經網絡,非線性回歸,系統發育,高級繪圖以及許多其他功能。

  • The base installation comes with very comprehensive, in-built statistical functions and methods. R also handles matrix algebra particularly well.

    基本安裝帶有非常全面的內置統計功能和方法。 R還可以很好地處理矩陣代數。
  • Data visualization is a key strength with the use of libraries such as ggplot2.

    借助ggplot2之類的庫,數據可視化是關鍵優勢。

缺點 (Cons)

  • Performance. There’s no two ways about it, R is not a quick language.

    性能。 關于它,沒有兩種方法, R不是一種快速的語言 。

  • Domain specificity. R is fantastic for statistics and data science purposes. But less so for general purpose programming.

    域特異性。 對于統計和數據科學而言,R太棒了。 但是對于通用編程而言則更少。
  • Quirks. R has a few unusual features that might catch out programmers experienced with other languages. For instance: indexing from 1, using multiple assignment operators, unconventional data structures.

    怪癖。 R具有一些不尋常的功能,這些功能可能趕不上使用其他語言的程序員。 例如:使用多個賦值運算符從1開始索引,使用非常規數據結構。

裁決-“為設計目的而精采” (Verdict — “brilliant at what it’s designed for”)

R is a powerful language that excels at a huge variety of statistical and data visualization applications, and being open source allows for a very active community of contributors. Its recent growth in popularity is a testament to how effective it is at what it does.

R是一種功能強大的語言,擅長于各種統計和數據可視化應用程序,并且開源是一個非常活躍的貢獻者社區。 它最近受歡迎程度的提高證明了它在工作中的有效性。

Python (Python)

你需要知道的 (What you need to know)

Guido van Rossum introduced Python back in 1991. It has since become an extremely popular general purpose language, and is widely used within the data science community. The major versions are currently 3.6 and 2.7.

Guido van Rossum于1991年引入Python。此后,Python成為一種非常流行的通用語言,并在數據科學界廣泛使用。 當前的主要版本是3.6和2.7 。

執照 (License)

Free!

自由!

優點 (Pros)

  • Python is a very popular, mainstream general purpose programming language. It has an extensive range of purpose-built modules and community support. Many online services provide a Python API.

    Python是一種非常流行的主流通用編程語言。 它具有廣泛的專用模塊和社區支持。 許多在線服務都提供Python API。

  • Python is an easy language to learn. The low barrier to entry makes it an ideal first language for those new to programming.

    Python是一種易于學習的語言。 入門門檻低,使其成為編程新手的理想第一語言。
  • Packages such as pandas, scikit-learn and Tensorflow make Python a solid option for advanced machine learning applications.

    諸如pandas , scikit-learn和Tensorflow之類的軟件包使Python成為高級機器學習應用程序的可靠選擇。

缺點 (Cons)

  • Type safety: Python is a dynamically typed language, which means you must show due care. Type errors (such as passing a String as an argument to a method which expects an Integer) are to be expected from time-to-time.

    類型安全性:Python是一種動態類型化的語言,這意味著您必須格外小心。 有時會出現類型錯誤(例如將String作為參數傳遞給需要Integer的方法)。
  • For specific statistical and data analysis purposes, R’s vast range of packages gives it a slight edge over Python. For general purpose languages, there are faster and safer alternatives to Python.

    為了實現特定的統計和數據分析目的,R廣泛的軟件包使其與Python相比有一點優勢。 對于通用語言,Python提供了更快,更安全的替代方法。

裁決-“優秀的全能選手” (Verdict — “excellent all-rounder”)

Python is a very good choice of language for data science, and not just at entry-level. Much of the data science process revolves around the ETL process (extraction-transformation-loading). This makes Python’s generality ideally suited. Libraries such as Google’s Tensorflow make Python a very exciting language to work in for machine learning.

Python是數據科學語言的很好選擇,而不僅僅是入門級的語言。 許多數據科學過程都圍繞ETL過程 (提取-轉換-加載)進行。 這使得Python的通用性非常適合。 諸如Google的Tensorflow之類的庫使Python成為一種非常激動人心的語言,可用于機器學習。

SQL (SQL)

你需要知道的 (What you need to know)

SQL (‘Structured Query Language’) defines, manages and queries relational databases. The language appeared by 1974 and has since undergone many implementations, but the core principles remain the same.

SQL (“結構化查詢語言”)定義,管理和查詢關系數據庫 。 該語言于1974年問世,此后經歷了許多實現,但是核心原理保持不變。

執照 (License)

Varies — some implementations are free, others proprietary

不同-有些實現是免費的,而另一些則是專有的

優點 (Pros)

  • Very efficient at querying, updating and manipulating relational databases.

    在查詢,更新和操作關系數據庫方面非常高效。
  • Declarative syntax makes SQL an often very readable language . There’s no ambiguity about what SELECT name FROM users WHERE age > 18 is supposed to do!

    聲明式語法使SQL成為一種非常易讀的語言。 對于SELECT name FROM users WHERE age > 18 SELECT name FROM users WHERE age >SELECT name FROM users WHERE age >應該做什么沒有任何歧義!

  • SQL is very used across a range of applications, making it a very useful language to be familiar with. Modules such as SQLAlchemy make integrating SQL with other languages straightforward.

    SQL在各種應用程序中都非常常用,這使其成為一種非常有用的語言。 諸如SQLAlchemy之類的模塊使SQL與其他語言的集成變得簡單。

缺點 (Cons)

  • SQL’s analytical capabilities are rather limited — beyond aggregating and summing, counting and averaging data, your options are limited.

    SQL的分析功能非常有限-除了聚合,求和,計數和平均數據之外,您的選擇也受到限制。
  • For programmers coming from an imperative background, SQL’s declarative syntax can present a learning curve.

    對于來自命令性背景的程序員而言,SQL的聲明性語法可以顯示學習曲線。
  • There are many different implementations of SQL such as PostgreSQL, SQLite, MariaDB . They are all different enough to make inter-operability something of a headache.

    SQL有許多不同的實現,例如PostgreSQL , SQLite , MariaDB 。 它們之間的差異足以使互操作性令人頭疼。

裁決-“永恒而高效” (Verdict — “timeless and efficient”)

SQL is more useful as a data processing language than as an advanced analytical tool. Yet so much of the data science process hinges upon ETL, and SQL’s longevity and efficiency are proof that it is a very useful language for the modern data scientist to know.

SQL作為數據處理語言比作為高級分析工具更有用。 然而,這么多的數據科學過程都取決于ETL,而SQL的壽命和效率證明了它是現代數據科學家了解的非常有用的語言。

Java (Java)

你需要知道的 (What you need to know)

Java is an extremely popular, general purpose language which runs on the (JVM) Java Virtual Machine. It’s an abstract computing system that enables seamless portability between platforms. Currently supported by Oracle Corporation.

Java是一種非常流行的通用語言,可在(JVM)Java虛擬機上運行。 這是一個抽象的計算系統,可實現平臺之間的無縫移植。 目前由Oracle Corporation支持。

執照 (License)

Version 8 — Free! Legacy versions, proprietary.

版本8 —免費! 舊版,專有。

優點 (Pros)

  • Ubiquity . Many modern systems and applications are built upon a Java back-end. The ability to integrate data science methods directly into the existing codebase is a powerful one to have.

    無處不在。 許多現代系統和應用程序都建立在Java后端上。 將數據科學方法直接集成到現有代碼庫中的能力非常強大。
  • Strongly typed. Java is no-nonsense when it comes to ensuring type safety. For mission-critical big data applications, this is invaluable.

    強類型。 在確保類型安全性方面,Java毫無疑問。 對于關鍵任務大數據應用程序來說,這是無價的。
  • Java is a high-performance, general purpose, compiled language . This makes it suitable for writing efficient ETL production code and computationally intensive machine learning algorithms.

    Java是一種高性能的通用編譯語言。 這使其適合編寫高效的ETL生產代碼和計算密集型機器學習算法。

缺點 (Cons)

  • For ad-hoc analyses and more dedicated statistical applications, Java’s verbosity makes it an unlikely first choice. Dynamically typed scripting languages such as R and Python lend themselves to much greater productivity.

    對于臨時分析和更專用的統計應用程序,Java的冗長性使其成為不太可能的首選。 動態類型的腳本語言(例如R和Python)可提高生產力。
  • Compared to domain-specific languages like R, there aren’t a great number of libraries available for advanced statistical methods in Java.

    與R之類的領域特定語言相比,Java中沒有太多可用于高級統計方法的庫。

裁決-“數據科學的有力競爭者” (Verdict — “a serious contender for data science”)

There is a lot to be said for learning Java as a first choice data science language. Many companies will appreciate the ability to seamlessly integrate data science production code directly into their existing codebase, and you will find Java’s performance and and type safety are real advantages.

學習Java作為首選的數據科學語言有很多話要說。 許多公司將欣賞將數據科學生產代碼直接無縫集成到其現有代碼庫中的能力,并且您會發現Java的性能和類型安全是真正的優勢。

However, you’ll be without the range of stats-specific packages available to other languages. That said, definitely one to consider — especially if you already know one of R and/or Python.

但是,您將沒有其他語言可用的特定于統計信息的軟件包范圍。 就是說,絕對要考慮的一個-特別是如果您已經了解R和/或Python之一。

Scala (Scala)

你需要知道的 (What you need to know)

Developed by Martin Odersky and released in 2004, Scala is a language which runs on the JVM. It is a multi-paradigm language, enabling both object-oriented and functional approaches. Cluster computing framework Apache Spark is written in Scala.

Scala由Martin Odersky開發并于2004年發布,是一種在JVM上運行的語言。 它是一種多范式語言,支持面向對象的方法和功能方法。 集群計算框架Apache Spark用Scala編寫。

執照 (License)

Free!

自由!

優點 (Pros)

  • Scala + Spark = High performance cluster computing. Scala is an ideal choice of language for those working with high-volume data sets.

    Scala + Spark =高性能集群計算。 對于使用大量數據集的人員來說,Scala是理想的語言選擇。
  • Multi-paradigmatic: Scala programmers can have the best of both worlds. Both object-oriented and functional programming paradigms available to them.

    多范式:Scala程序員可以兼得兩者。 面向對象和功能編程范例都可以使用。
  • Scala is compiled to Java bytecode and runs on a JVM. This allows inter-operability with the Java language itself, making Scala a very powerful general purpose language, while also being well-suited for data science.

    Scala被編譯為Java字節碼,并在JVM上運行。 這允許與Java語言本身進行互操作,從而使Scala成為功能非常強大的通用語言,同時也非常適合數據科學。

缺點 (Cons)

  • Scala is not a straightforward language to get up and running with if you’re just starting out. Your best bet is to download sbt and set up an IDE such as Eclipse or IntelliJ with a specific Scala plug-in.

    如果您剛入門,Scala不是一種簡單易用的語言。 最好的選擇是下載sbt并使用特定的Scala插件設置IDE(例如Eclipse或IntelliJ)。

  • The syntax and type system are often described as complex. This makes for a steep learning curve for those coming from dynamic languages such as Python.

    語法和類型系統通常被描述為復雜的。 這為那些來自動態語言(例如Python)的人提供了陡峭的學習曲線。

裁決-“完美,適用于大數據” (Verdict — “perfect, for suitably big data”)

When it comes to using cluster computing to work with Big Data, then Scala + Spark are fantastic solutions. If you have experience with Java and other statically typed languages, you’ll appreciate these features of Scala too.

在使用群集計算與大數據一起使用時,Scala + Spark是絕佳的解決方案。 如果您有Java和其他靜態類型語言的使用經驗,那么您也會喜歡Scala的這些功能。

Yet if your application doesn’t deal with the volumes of data that justify the added complexity of Scala, you will likely find your productivity being much higher using other languages such as R or Python.

但是,如果您的應用程序不處理足以證明Scala增加了復雜性的數據量,那么使用R或Python等其他語言可能會發現您的生產力要高得多。

朱莉亞 (Julia)

你需要知道的 (What you need to know)

Released just over 5 years ago, Julia has made an impression in the world of numerical computing. Its profile was raised thanks to early adoption by several major organizations including many in the finance industry.

Julia(Julia)發布于5年前,在數值計算領域給人留下了深刻的印象。 由于包括金融業在內的數個主要組織的早期采用,提高了它的形象。

執照 (License)

Free!

自由!

優點 (Pros)

  • Julia is a JIT (‘just-in-time’) compiled language, which lets it offer good performance. It also offers the simplicity, dynamic-typing and scripting capabilities of an interpreted language like Python.

    Julia是一種JIT(“及時”)編譯語言,可提供良好的性能。 它還提供了像Python這樣的解釋語言的簡單性,動態鍵入和腳本編寫功能。
  • Julia was purpose-designed for numerical analysis. It is capable of general purpose programming as well.

    Julia是專為數字分析而設計的。 它也能夠進行通用編程。
  • Readability. Many users of the language cite this as a key advantage

    可讀性。 許多使用該語言的用戶都將其作為主要優勢

缺點 (Cons)

  • Maturity. As a new language, some Julia users have experienced instability when using packages. But the core language itself is reportedly stable enough for production use.

    到期。 作為一種新語言,一些Julia用戶在使用軟件包時會遇到不穩定的情況。 但是據報道,核心語言本身已經足夠穩定,可供生產使用。
  • Limited packages are another consequence of the language’s youthfulness and small development community. Unlike long-established R and Python, Julia doesn’t have the choice of packages (yet).

    有限的軟件包是該語言的年輕化和小型開發社區的另一個結果。 與歷史悠久的R和Python不同,Julia尚未選擇軟件包。

裁決-“一個為未來” (Verdict — “one for the future”)

The main issue with Julia is one that cannot be blamed for. As a recently developed language, it isn’t as mature or production-ready as its main alternatives Python and R.

Julia的主要問題是不能責怪的。 作為一種新近開發的語言,它不像其主要替代品Python和R那樣成熟或可以投入生產。

But, if you are willing to be patient, there’s every reason to pay close attention as the language evolves in the coming years.

但是,如果您愿意耐心等待,那么隨著語言在未來幾年的發展,我們有充分的理由要密切注意。

的MATLAB (MATLAB)

你需要知道的 (What you need to know)

MATLAB is an established numerical computing language used throughout academia and industry. It is developed and licensed by MathWorks, a company established in 1984 to commercialize the software.

MATLAB是一種在學術界和行業中廣泛使用的已建立的數值計算語言。 它由MathWorks開發并獲得許可,該公司成立于1984年,旨在將該軟件商業化。

執照 (License)

Proprietary — pricing varies depending on your use case

專有-定價因您的用例而異

優點 (Pros)

  • Designed for numerical computing. MATLAB is well-suited for quantitative applications with sophisticated mathematical requirements such as signal processing, Fourier transforms, matrix algebra and image processing.

    專為數值計算而設計。 MATLAB非常適合具有復雜數學要求的定量應用,例如信號處理,傅立葉變換,矩陣代數和圖像處理。
  • Data Visualization. MATLAB has some great inbuilt plotting capabilities.

    數據可視化。 MATLAB具有一些出色的內置繪圖功能。
  • MATLAB is often taught as part of many undergraduate courses in quantitative subjects such as Physics, Engineering and Applied Mathematics. As a consequence, it is widely used within these fields.

    在許多物理,工程和應用數學等定量學科的本科課程中,經常將MATLAB授課。 結果,它在這些領域中被廣泛使用。

缺點 (Cons)

  • Proprietary licence. Depending on your use-case (academic, personal or enterprise) you may have to fork out for a pricey licence. There are free alternatives available such as Octave. This is something you should give real consideration to.

    專有許可證。 根據您的用例(學術,個人或企業),您可能需要為獲得昂貴的許可證付出代價。 有免費的替代方法,例如Octave 。 這是您應該真正考慮的事情。

  • MATLAB isn’t an obvious choice for general-purpose programming.

    對于通用編程,MATLAB不是一個明顯的選擇。

裁決-“最適合數學密集型應用程序” (Verdict — “best for mathematically intensive applications”)

MATLAB’s widespread use in a range of quantitative and numerical fields throughout industry and academia makes it a serious option for data science.

MATLAB在整個行業和學術界在定量和數值領域的廣泛使用使其成為數據科學的重要選擇。

The clear use-case would be when your application or day-to-day role requires intensive, advanced mathematical functionality. Indeed, MATLAB was specifically designed for this.

明確的用例是您的應用程序或日常角色需要密集的高級數學功能時。 實際上,MATLAB是為此專門設計的。

其他語言 (Other Languages)

There are other mainstream languages that may or may not be of interest to data scientists. This section provides a quick overview… with plenty of room for debate of course!

還有其他主流語言可能會或可能不會對數據科學家感興趣。 本節提供快速概述…當然還有足夠的討論空間!

C ++ (C++)

C++ is not a common choice for data science, although it has lightning fast performance and widespread mainstream popularity. The simple reason may be a question of productivity versus performance.

盡管C ++具有閃電般的快速性能和廣泛的主流流行度,但它并不是數據科學的常見選擇。 簡單的原因可能是生產率與性能的問題。

As one Quora user puts it:

正如Quora的一位用戶所說 :

“If you’re writing code to do some ad-hoc analysis that will probably only be run one time, would you rather spend 30 minutes writing a program that will run in 10 seconds, or 10 minutes writing a program that will run in 1 minute?”

“如果您正在編寫代碼以進行可能僅運行一次的臨時分析,您寧愿花30分鐘編寫將在10秒內運行的程序,還是花10分鐘編寫將在1秒內運行的程序?分鐘?”

The dude’s got a point. Yet for serious production-level performance, C++ would be an excellent choice for implementing machine learning algorithms optimized at a low-level.

花花公子的觀點。 但是對于嚴重的生產級性能,C ++將是實現在低級優化的機器學習算法的絕佳選擇。

Verdict — “not for day-to-day work, but if performance is critical…”

裁決-“不是日常工作,但如果績效至關重要……”

JavaScript (JavaScript)

With the rise of Node.js in recent years, JavaScript has become more and more a serious server-side language. However, its use in data science and machine learning domains has been limited to date (although checkout brain.js and synaptic.js!). It suffers from the following disadvantages:

近年來,隨著Node.js的興起, JavaScript越來越成為一種嚴肅的服務器端語言。 但是,它在數據科學和機器學習領域中的使用迄今受到限制(盡管結帳brain.js和synaptic.js !)。 它具有以下缺點:

  • Late to the game (Node.js is only 8 years old!), meaning…

    游戲晚了(Node.js才8歲!),這意味著…
  • Few relevant data science libraries and modules are available. This means no real mainstream interest or momentum

    很少有相關的數據科學庫和模塊可用。 這意味著沒有真正的主流興趣或動力
  • Performance-wise, Node.js is quick. But JavaScript as a language is not without its critics.

    在性能方面,Node.js很快。 但是JavaScript作為一種語言并不是沒有批評者的 。

Node’s strengths are in asynchronous I/O, its widespread use and the existence of languages which compile to JavaScript. So it’s conceivable that a useful framework for data science and realtime ETL processing could come together.

Node的優勢在于異步I / O,其廣泛使用以及可編譯為JavaScript的語言的存在。 因此可以想象,一個有用的數據科學框架和實時ETL處理可以融合在一起。

The key question is whether this would offer anything different to what already exists.

關鍵問題是,這是否會提供與已經存在的不同的東西。

Verdict — “there is much to do before JavaScript can be taken as a serious data science language”

結論—“在將JavaScript視為一種嚴肅的數據科學語言之前,還有很多事情要做”

Perl (Perl)

Perl is known as a ‘Swiss-army knife of programming languages’, due to its versatility as a general-purpose scripting language. It shares a lot in common with Python, being a dynamically typed scripting language. But, it has not seen anything like the popularity Python has in the field of data science.

Perl因其作為通用腳本語言的多功能性而被稱為“編程語言的瑞士軍刀”。 它是動態類型化的腳本語言,與Python有很多共同點。 但是,它沒有像Python在數據科學領域那樣受歡迎。

This is a little surprising, given its use in quantitative fields such as bioinformatics. Perl has several key disadvantages when it comes to data science. It isn’t stand-out fast, and its syntax is famously unfriendly. There hasn’t been the same drive towards developing data science specific libraries. And in any field, momentum is key.

考慮到它在生物信息學等定量領域的使用,這有點令人驚訝。 在數據科學方面,Perl有幾個關鍵的缺點。 它不是很快就脫穎而出,并且其語法眾所周知是不友好的 。 開發特定于數據科學的庫的驅動力不同。 在任何領域,動力都是關鍵。

Verdict — “a useful general purpose scripting language, yet it offers no real advantages for your data science CV”

判決-“一種有用的通用腳本語言,但對于您的數據科學CV并沒有任何真正的優勢”

Ruby (Ruby)

Ruby is another general purpose, dynamically typed interpreted language. Yet it also hasn’t seen the same adoption for data science as has Python.

Ruby是另一種通用的動態類型的解釋語言。 但是,它還沒有像Python那樣被數據科學采用。

This might seem surprising, but is likely a result of Python’s dominance in academia, and a positive feedback effect . The more people use Python, the more modules and frameworks are developed, and the more people will turn to Python.

這似乎令人驚訝,但很可能是Python在學術界的統治地位以及積極的反饋效應的結果。 使用Python的人越多,開發的模塊和框架就越多,并且使用Python的人也就越多。

The SciRuby project exists to bring scientific computing functionality, such as matrix algebra, to Ruby. But for the time being, Python still leads the way.

存在SciRuby項目是為了將科學計算功能(例如矩陣代數)引入Ruby。 但是就目前而言,Python仍然處于領先地位。

Verdict — “not an obvious choice yet for data science, but won’t harm the CV”

判決-“對于數據科學而言,尚不是一個顯而易見的選擇,但不會損害簡歷”

結論 (Conclusion)

Well, there you have it — a quickfire guide to which languages to consider for data science. The key here is to understand your usage requirements in terms of generality vs specificity, as well as your personal preferred development style of performance vs productivity.

好了,您已掌握了它-速成指南,可為數據科學考慮使用哪些語言。 這里的關鍵是要從通用性和特異性方面了解您的使用要求,以及您個人偏愛的性能與生產力開發風格。

I use R, Python and SQL on a regular basis, as my current role largely focuses on developing existing data pipeline and ETL processes. These languages give the right balance of generality and productivity to do the job, with the option of using R’s more advanced statistics packages when needed.

我經常使用R,Python和SQL,因為我目前的職責主要集中在開發現有的數據管道和ETL流程上。 這些語言可以在通用性和生產率之間實現適當的平衡,并在需要時可以選擇使用R的高級統計軟件包。

However — you may already have some experience with Java. Or you may want to use Scala for big data. Or, perhaps you’re keen to get involved with the Julia project.

但是,您可能已經對Java有一定的經驗。 或者您可能想將Scala用于大數據。 或者,也許您渴望參與Julia項目。

Maybe you learned MATLAB at university, or want to give SciRuby a chance? Perhaps you have an altogether different suggestion. If so, please leave a reply below — I look forward to hearing from you!

也許您是在大學學習過MATLAB的,還是想給SciRuby一個機會? 也許您有完全不同的建議。 如果是這樣,請在下面留下答復-我期待您的來信!

Thanks for reading!

謝謝閱讀!

翻譯自: https://www.freecodecamp.org/news/which-languages-should-you-learn-for-data-science-e806ba55a81f/

數據科學r語言

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/395367.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/395367.shtml
英文地址,請注明出處:http://en.pswp.cn/news/395367.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

Linux平臺不同解壓縮命令的使用方法

作者:郭孝星 微博:郭孝星的新浪微博 郵箱:allenwells163.com 博客:http://blog.csdn.net/allenwells github:https://github.com/AllenWell 一 .tar 解包 tar xvf FileName.tar 打包 tar cvf FileName.tar DirName 注意…

unity中怎么做河流_【干貨】工作中怎么做工業設計的?(一)

最近在找工作,一直在看招聘信息。看到工業設計工資還是蠻高的。應屆畢業生一般是4-6K,1-3年工作經驗是6-8K,3年以后的差不多是8K以上了。我沒有嫉妒羨慕恨,發誓,真的沒有。工業設計已經被重視,未來的道路會…

[易學易懂系列|golang語言|零基礎|快速入門|(一)]

golang編程語言,是google推出的一門語言。 主要應用在系統編程和高性能服務器編程,有廣大的市場前景,目前整個生態也越來越強大,未來可能在企業應用和人工智能等領域占有越來越重要的地位。 本文章是【易學易懂系列|編程語言入門】…

APUE學習之三個特殊位 設置用戶ID(set-user-ID),設置組ID(set-group-ID),sticky...

設置用戶ID(set-user-ID),設置組ID(set-group-ID),stickyset-user-ID: SUID當文件的該位有設置時,表示當該文件被執行時,程序具有文件所有者的權限而不是執行者的權限。這樣說有點繞…

微信調用html退后方法,微信瀏覽器后退關閉頁面

不需要引用 微信jssdk 關閉瀏覽器WeixinJSBridge.invoke(closeWindow, {}, function (res) { });參考:https://mp.weixin.qq.com/wiki/12/7dd29a53f4b55a8ddc6177ab60e5ee2c.html監聽微信、支付寶等移動app及瀏覽器的返回、后退、上一頁按鈕的事件方法參考&#xff…

在gitlab 中使用webhook 實現php 自動部署git 代碼

在技術團隊討論中,我們決定從svn 遷移到 git ,于是使用了gitlab,代碼自動部署使用了webhook在服務器上 1.開啟PHP需要的環境支持 服務器環境必須先安裝git 環境,webhook 依賴php運行環境,同時需要使用shell_exec 和 exec 等函數。…

spi收發時的寄存器sr不變_「正點原子Linux連載」第二十七章SPI實驗(二)

1)實驗平臺:正點原子Linux開發板2)摘自《正點原子I.MX6U嵌入式Linux驅動開發指南》關注官方微信號公眾號,獲取更多資料:正點原子文件bsp_spi.c中有兩個函數:spi_init和spich0_readwrite_byte,函數spi_init是SPI初始化函…

vue腳手架vue數據交互_學習Vue:3分鐘的交互式Vue JS教程

vue腳手架vue數據交互Vue.js is a JavaScript library for building user interfaces. Last year, it started to become quite popular among web developers. It’s lightweight, relatively easy to learn, and powerful.Vue.js是用于構建用戶界面JavaScript庫。 去年&#…

[JSOI2018]潛入行動

題解 一道思路不難但是寫起來很麻煩的樹形背包 我們發現每個節點有很多信息需要保留 所以就暴力的設\(f[u][j][0/1][0/1]\)表示點u的子樹分配了j個監察器,點u有沒有被控制,點u放沒放監察器 然后就分四種情況暴力討論就好了 注意背包的時候要卡常數 代碼 #include<cstdio>…

css。元素樣式、邊框樣式

1.外邊距  margin 縮寫形式&#xff1a; margin&#xff1a;上邊距  右邊距  下邊距  左邊距 margin&#xff1a;上下邊距  左右邊距 margin&#xff1a;上邊距  左右邊距  下邊距 2.內邊距  padding 縮寫形式&#xff1a; padding&#xff1a;上邊距  右邊距…

html文本對齊6,HTML對齊文本

我要像以下列方式顯示頁面上的文本&#xff1a;HTML對齊文本My Text: Text HereMy Text: More Text Here.........................................................Text from line above continued here.我有以下的標記只是為了測試&#xff1a;body {font-family: arial;}fo…

vue底部跳轉_詳解Vue底部導航欄組件

不多說直接上代碼 BottomNav.vue&#xff1a;{{item.name}}export default{props:[idx],data(){return {items:[{cls:"home",name:"首頁",push:"/home",icon:"../static/home.png",iconSelect:"../static/home_select.png"}…

Android Studio環境搭建

Android Studio環境搭建 個人博客 歡迎大家多多關注該獨立博客。 ###[csdn博客]&#xff08;http://blog.csdn.net/peace1213&#xff09; 一直想把自己的經驗分享出來&#xff0c;記得上次寫博客還是ok6410的筆記。感覺時代久遠啊。記得那個時候我還一心想搞硬件了。如今又一次…

hacktoberfest_Hacktoberfest和其他有趣的事情將在本周末在freeCodeCamp

hacktoberfestby Quincy Larson昆西拉爾森(Quincy Larson) Hacktoberfest和其他有趣的事情將在本周末在freeCodeCamp (Hacktoberfest and other fun things going on this weekend at freeCodeCamp) Earlier this month, the freeCodeCamp community turned 3 years old. And …

C# 動態創建數據庫三(MySQL)

前面有說明使用EF動態新建數據庫與表&#xff0c;數據庫使用的是SQL SERVER2008的&#xff0c;在使用MYSQL的時候還是有所不同 一、添加 EntityFramework.dll &#xff0c;System.Data.Entity.dll &#xff0c;MySql.Data, MySql.Data.Entity.EF6 注意&#xff1a;Entity Frame…

iOS開發Swift篇—(七)函數(1)

一、函數的定義 &#xff08;1&#xff09;函數的定義格式 1 func 函數名(形參列表) -> 返回值類型 { 2 // 函數體... 3 4 } &#xff08;2&#xff09;形參列表的格式 形參名1: 形參類型1, 形參名2: 形參類型2, … &#xff08;3&#xff09;舉例&#xff1a;計算2個…

如何用計算機管理員權限,教你電腦使用代碼添加管理員權限的詳細教程

我們在使用電腦運行某些軟件的時候&#xff0c;可能需要用到管理員權限才能運行&#xff0c;通常來說直接點擊右鍵就會有管理員權限&#xff0c;但最近有用戶向小編反饋&#xff0c;在需要管理員權限的軟件上點擊右鍵沒有看到管理員取得所有權&#xff0c;那么究竟該如何才能獲…

activiti 5.22的demo運行

activiti 5.22的demo運行 從github上clon下來的activiti項目,運行demo項目activiti-webapp-explorer2時&#xff0c;在使用到流程設計工作區&#xff0c;選取activiti modeler作為設計器的時候報錯。 從下面的報錯信息中發現&#xff0c;請求路徑http://localhost:8080/activit…

宣布JavaScript 2017狀況調查

by Sacha Greif由Sacha Greif 宣布JavaScript 2017狀況調查 (Announcing the State of JavaScript 2017 Survey) 讓我們找出去年以來發生的變化&#xff01; (Let’s find out what’s changed since last year!) In a hurry? You can take the survey here.匆忙&#xff1f;…

內是不是半包圍結構_輕鋼別墅的體系結構

一、輕鋼別墅介紹1、輕鋼別墅的屋面系統輕鋼別墅屋面系統是由屋架、結構OSB面板、防水層、輕型屋面瓦&#xff08;金屬或瀝青瓦&#xff09;組成的。輕鋼結構的屋面&#xff0c;外觀可以有多種組合。材料也有多種。在保障了防水這一技術的前提下&#xff0c;外觀有了許多的選擇…