20 More Hot Data Tools and What They DON’T Do

In the past few months, the data ecosystem has continued to burgeon as some parts of the stack consolidate and new challenges arise. Our first attempt to help stakeholders navigate this ecosystem, 25 Hot New Data Tools and What They DON’T Do, clarified which specific problems the featured companies and projects did and did NOT solve.

This effort was positively received by the data science, engineering and analytics communities, and spurred more engagement than we originally anticipated. Further, we were flattered to see the original post motivate other thought-provoking pieces such as 20 Hot New Data Tools and their Early Go-to-Market Strategies.

Taking it Further

Regardless, we quickly recognized our original post did not go far enough as we received dozens of emails, Twitter messages and Slack DMs about other solutions that were not covered. We had shed light on a small corner of the expanding universe of data tools and platforms, yet there was an opportunity to cover even more.

Although we cannot chronicle every additional data tool in just one follow-up post, here we continue our efforts to cultivate this ecosystem by highlighting a few more. The creators of these tools not only occupy meaningful parts of the ever-evolving modern data stack, they also graciously responded to our requests to help us understand where they fit in.

They sound off here in their own words.

More Tools and Responses

  1. Shipyard: Shipyard is a workflow orchestration platform that helps teams quickly launch, monitor, and share data solutions without worrying about infrastructure management. It lets users create reusable blueprints, share data seamlessly between jobs, and run code without any proprietary setup, all while scaling resources dynamically. Shipyard is NOT a no-code tool and does not support data versioning or data visualization.

  2. Count: Count is a data notebook that replaces dashboards for reporting and self-service, and supports data transformation. Count is uniquely good at team collaboration, enabling technical and non-technical users to work within the same notebook. Count is NOT a data science notebook.

  3. Castor: Castor is uniquely good at organizing information about data to support data discovery, GDPR compliance, and knowledge management. Through a plug-and-play solution, Castor builds a comprehensive and actionable map of all data assets. Castor is NOT a data visualization or BI tool.

  4. Census: Census is uniquely good at syncing data models from a warehouse to business tools like Salesforce. It complements existing warehouses, data loaders & transform tools to enable data teams to drive business operations. It is NOT a no-code tool nor does it automagically model your data; it relies on analysts writing models in SQL.

  5. Iteratively: Iteratively is a schema registry that helps teams collaborate to define, instrument, and validate their analytics. With Iteratively, you can ship high-quality analytics faster and prevent common data quality & privacy issues that undermine trust. Iteratively is NOT a BI tool, data pipeline, or transformation tool.

  6. StreamSQL: StreamSQL handles deploying, versioning, and sharing model features. Using your definitions, it generates features for both serving and training. Its registry facilitates re-using features across teams and models. StreamSQL does NOT do model management and is completely agnostic to what you do with the features once you get them.

  7. Xplenty: Xplenty is a cloud-based ETL solution providing simple visualized data pipelines for automated data flows across a wide range of sources and destinations. It is uniquely good at ingesting large volumes of data, performing code-free data transformations, and scheduling workflows. Xplenty does NOT do event streaming.

  8. Vectice: Vectice is uniquely good at tracking, documenting, and organizing all AI assets (e.g., datasets, features, models, experiments, dashboards, notebooks) and the underlying domain knowledge to successfully manage and scale enterprise AI initiatives. Vectice does NOT provide any runtime or computational environment.

  9. Snowplow Analytics: Snowplow is a streaming behavioral data engine that is uniquely good at generating event data from dedicated web/mobile/server SDKs, enhancing that data and delivering it to your data warehouse. Snowplow is NOT a data integration (ELT) tool, nor a general streaming framework, nor a BI tool.

  10. Datafold: Datafold is uniquely good at comparing datasets in a SQL data warehouse or across data warehouses. It enables running “git diff” on a table of any size. Datafold is NOT a database itself (it works on top of existing infrastructure) and it does NOT work with files.

  11. Splitgraph: Splitgraph is a tool for building, extending, versioning, and sharing SQL databases that is uniquely good at enhancing existing tools. Splitgraph also features a data catalogue including 40K open datasets that can be queried (and joined) with any SQL client. Splitgraph is NOT a database.

  12. Datacoral: Datacoral is uniquely good at automatically generating data ingestion and transformation pipelines from SQL-based declarative specifications, and at automatically capturing and displaying schema-level lineage. Datacoral plays nicely with data ingestion tools like Segment and workflow management tools like Airflow. Datacoral is NOT a data warehouse or a query engine.

  13. Apache Arrow: Apache Arrow is uniquely good as a language-independent standard for fast in-memory analytical processing and efficient interprocess transport (with minimal overhead) of large tabular datasets. While intended as a computational foundation for data frame projects, it is NOT a replacement for end-user facing tools like pandas.

  14. Datasaur: Datasaur is built to support NLP labeling via ML-assisted suggestions. It supports workforce management, maintains data privacy, and can be integrated via API to any ML workflow. Datasaur does NOT handle bounding boxes for image/video labeling.

  15. Datakin: Datakin is a DataOps solution that helps guarantee that data pipelines run without disruption and that the resulting data can be trusted. It does so by automatically discovering data lineage and providing tools to quickly identify and resolve issues. Datakin is NOT a data catalog, nor does it replace any existing data infrastructure components (workflow orchestration, data processing, …).

  16. ApertureData: ApertureData is a database for visual data like images, videos, feature vectors, and associated metadata like annotations. It natively supports complex searching and preprocessing operations over media objects, and integrates with cloud-based storage and ML frameworks like PyTorch/TensorFlow. ApertureData does NOT extract metadata or features from images/videos.

  17. Orchest: Orchest is uniquely good at assisting data scientists in interactively building data science pipelines by providing a visual pipeline editing environment in the browser. Pipeline steps are containerized notebooks or scripts. Orchest does NOT replace Jupyter notebooks, provide a no-code tool, or bring its own computational infrastructure.

  18. Gazette: Gazette is an open source streaming platform that breaks down the divide between batch and real-time data, enabling users to build real-time applications with exactly-once semantics. It offers real-time message streams, which are natively and durably stored as regular files in cloud storage. Gazette is NOT an ETL tool or an analytics platform.

  19. Coiled Computing: Coiled excels at scaling data science and machine learning workflows in native Python using Dask, which is familiar, widely adopted, and gives great feedback. Coiled is an opinionated way of bursting to clusters and the cloud while staying in the PyData ecosystem. Coiled/Dask is NOT a database or Kubernetes replacement.

  20. Upsolver: Upsolver is a cloud-native solution for integrating structured and unstructured data on cloud storage. It utilizes a visual SQL interface for quick and easy data transformation. Upsolver is NOT a Platform as a Service solution that requires developers to write additional code and learn low-level concepts to process data.

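To make one of the descriptions above more concrete: item 10 says Datafold enables running “git diff” on a table. The sketch below illustrates that general idea, a keyed row-level diff between two versions of a table, in plain Python. It is not Datafold's API or algorithm; the function name `diff_tables`, the `id` key column, and the sample rows are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Hashable, List

Row = Dict[str, Any]


def diff_tables(old: List[Row], new: List[Row], key: str) -> Dict[str, list]:
    """Compare two tables (lists of row dicts) by a primary-key column.

    Hypothetical helper for illustration only; assumes `key` is present
    and unique in every row of both tables.
    """
    old_by_key: Dict[Hashable, Row] = {row[key]: row for row in old}
    new_by_key: Dict[Hashable, Row] = {row[key]: row for row in new}

    # Rows whose key exists only in the new table, or only in the old one.
    added = [row for k, row in new_by_key.items() if k not in old_by_key]
    removed = [row for k, row in old_by_key.items() if k not in new_by_key]

    # Rows present in both tables whose values differ in at least one column.
    changed = []
    for k, before in old_by_key.items():
        after = new_by_key.get(k)
        if after is not None and before != after:
            columns = sorted(
                c for c in set(before) | set(after)
                if before.get(c) != after.get(c)
            )
            changed.append({"key": k, "columns": columns})

    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data: two snapshots of a small "accounts" table.
old_rows = [
    {"id": 1, "plan": "free"},
    {"id": 2, "plan": "pro"},
    {"id": 3, "plan": "pro"},
]
new_rows = [
    {"id": 1, "plan": "free"},
    {"id": 2, "plan": "team"},
    {"id": 4, "plan": "free"},
]

result = diff_tables(old_rows, new_rows, key="id")
print(result)
```

This in-memory version only works for tables that fit in a Python process; comparing tables "of any size" inside a warehouse would need a different strategy, which the source does not describe.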

As authors (Sarah, Abe & Pete), we’re collectively brainstorming how we can extend this effort and create an ever-growing list that helps practitioners find and adopt the right tools, founders align with the best partners, and investors map companies to their investment theses. We look forward to hearing your thoughts on the best medium to continue this exploration with the support of the community.

Translated from: https://towardsdatascience.com/20-more-hot-data-tools-and-what-they-dont-do-46bc365bea74
