20 More Hot Data Tools and What They DON’T Do
In the past few months, the data ecosystem has continued to burgeon as some parts of the stack consolidate and as new challenges arise. Our first attempt to help stakeholders navigate this ecosystem highlighted 25 Hot New Data Tools and What They DON’T Do — clarifying specific problems the featured companies and projects did and did NOT solve.
This effort was positively received by the data science, engineering and analytics communities, and spurred more engagement than we originally anticipated. Further, we were flattered to see the original post motivate other thought-provoking pieces such as 20 Hot New Data Tools and their Early Go-to-Market Strategies.
Taking it Further
Regardless, we quickly recognized our original post did not go far enough as we received dozens of emails, Twitter messages and Slack DMs about other solutions that were not covered. We had shed light on a small corner of the expanding universe of data tools and platforms, yet there was an opportunity to cover even more.
Although we cannot chronicle every additional data tool in just one follow-up post, here we continue our efforts to cultivate this ecosystem by highlighting a few more. The creators of these tools not only occupy meaningful parts of the ever-evolving modern data stack, they also graciously responded to our requests to help us understand where they fit in.
They sound off here in their own words.
More Tools and Responses
Shipyard: Shipyard is a workflow orchestration platform that helps teams quickly launch, monitor, and share data solutions without worrying about infrastructure management. It lets users create reusable blueprints, share data seamlessly between jobs, and run code without any proprietary setup, all while scaling resources dynamically. Shipyard is NOT a no-code tool and does not support data versioning or data visualization.
Count: Count is a data notebook that replaces dashboards for reporting and self-service, and supports data transformation. Count is uniquely good at team collaboration, enabling technical and non-technical users to work within the same notebook. Count is NOT a data science notebook.
Castor: Castor is uniquely good at organizing information about data to support data discovery, GDPR compliance, and knowledge management. Through a plug-and-play solution, Castor builds a comprehensive and actionable map of all data assets. Castor is NOT a data visualization or BI tool.
Census: Census is uniquely good at syncing data models from a warehouse to business tools like Salesforce. It complements existing warehouses, data loaders & transform tools to enable data teams to drive business operations. It is NOT a no-code tool nor does it automagically model your data; it relies on analysts writing models in SQL.
Iteratively: Iteratively is a schema registry that helps teams collaborate to define, instrument, and validate their analytics. With Iteratively, you can ship high-quality analytics faster and prevent common data quality & privacy issues that undermine trust. Iteratively is NOT a BI tool, data pipeline, or transformation tool.
StreamSQL: StreamSQL handles deploying, versioning, and sharing model features. Using your definitions, it generates features for both serving and training. Its registry facilitates re-using features across teams and models. StreamSQL does NOT do model management and is completely agnostic to what you do with the features once you get them.
Xplenty: Xplenty is a cloud-based ETL solution providing simple visualized data pipelines for automated data flows across a wide range of sources and destinations. It is uniquely good at ingesting large volumes of data, performing code-free data transformations, and scheduling workflows. Xplenty does NOT do event streaming.
Vectice: Vectice is uniquely good at tracking, documenting, and organizing all AI assets (e.g., datasets, features, models, experiments, dashboards, notebooks) and the underlying domain knowledge to successfully manage and scale enterprise AI initiatives. Vectice does NOT provide any runtime or computational environment.
Snowplow Analytics: Snowplow is a streaming behavioral data engine that is uniquely good at generating event data from dedicated web/mobile/server SDKs, enhancing that data and delivering it to your data warehouse. Snowplow is NOT a data integration (ELT) tool, nor a general streaming framework, nor a BI tool.
Datafold: Datafold is uniquely good at comparing datasets in a SQL data warehouse or across data warehouses. It enables running “git diff” on a table of any size. Datafold is NOT a database itself (it works on top of existing infrastructure) and it does NOT work with files.
Splitgraph: Splitgraph is a tool for building, extending, versioning, and sharing SQL databases that is uniquely good at enhancing existing tools. Splitgraph also features a data catalogue including 40K open datasets that can be queried (and joined) with any SQL client. Splitgraph is NOT a database.
Datacoral: Datacoral is uniquely good at automatically generating data ingestion and transformation pipelines from SQL-based declarative specifications, and automatically capturing and displaying schema level lineage. Datacoral plays nice with data ingestion tools like Segment, and workflow management tools like Airflow. Datacoral is NOT a data warehouse or a query engine.
Apache Arrow: Apache Arrow is uniquely good as a language-independent standard for fast in-memory analytical processing and efficient interprocess transport (with minimal overhead) of large tabular datasets. While intended as a computational foundation for data frame projects, it is NOT a replacement for end-user facing tools like pandas.
Datasaur: Datasaur is built to support NLP labeling via ML-assisted suggestions. It supports workforce management, maintains data privacy, and can be integrated via API to any ML workflow. Datasaur does NOT handle bounding boxes for image/video labeling.
Datakin: Datakin is a DataOps solution that helps guarantee that data pipelines run without disruption and resulting data can be trusted. It does so by automatically discovering data lineage and providing tools to quickly identify and resolve issues. Datakin is NOT a data catalog nor does it replace any existing data infrastructure components (workflow orchestration, data processing, …).
ApertureData: ApertureData is a database for visual data like images, videos, feature vectors, and associated metadata like annotations. It natively supports complex searching and preprocessing operations over media objects, and integrates with cloud-based storage and ML frameworks like PyTorch/TensorFlow. ApertureData does NOT extract metadata or features from images/videos.
Orchest: Orchest is uniquely good at assisting data scientists in interactively building data science pipelines by providing a visual pipeline editing environment in the browser. Pipeline steps are containerized notebooks or scripts. Orchest does NOT replace Jupyter notebooks, provide a no-code tool, or bring its own computational infrastructure.
Gazette: Gazette is an open source streaming platform that breaks down the divide between batch and real-time data, enabling users to build real-time applications with exactly-once semantics. It offers real-time message streams, which are natively and durably stored as regular files in cloud storage. Gazette is NOT an ETL tool or an analytics platform.
Coiled Computing: Coiled excels at scaling data science and machine learning workflows in native Python using Dask, which is familiar, widely adopted, and gives great feedback. Coiled is an opinionated way of bursting to clusters and the cloud while staying in the PyData ecosystem. Coiled/Dask is NOT a database or Kubernetes replacement.
Upsolver: Upsolver is a cloud-native solution for integrating structured and unstructured data on cloud storage. It utilizes a visual, SQL interface for quick and easy data transformation. Upsolver is NOT a Platform as a Service solution that requires developers to write additional code and learn low-level concepts to process data.
As authors (Sarah, Abe & Pete) we’re collectively brainstorming about how we can extend this effort and create an ever-growing list that helps practitioners find and adopt the right tools, founders align with the best partners, and investors map companies to their investment theses. We look forward to hearing your thoughts on the best medium to continue this exploration with the support of the community.
Source: https://towardsdatascience.com/20-more-hot-data-tools-and-what-they-dont-do-46bc365bea74