If You Work in “Small Science,” Are You Leveraging Data Repositories?

If you’re a scientist, especially one performing a lot of your research alone, you probably have more than one spreadsheet of important data that you just haven’t gotten around to writing up yet. Maybe you never will. Sitting idle on a hard drive, that “dark data” could prove very useful to someone in the future (or even someone in the present), especially as our climate and society change.

What are you going to do with those files? How are you going to preserve them?

If you’re like me, maybe you’ve felt the terror of losing data every time you moved your files to a new computer or moved your research to a new job. Did you remember to back up that spreadsheet from your brilliant pet project from 7 years ago? If you did back it up, are you sure you backed up the most recent version? It’s sobering to imagine other people have gone through this and lost potentially valuable species records, survey data, and field observations.

Digital Data Repositories to the Rescue

In the years before I returned to graduate school, I worked for a science nonprofit on Nantucket Island, Massachusetts, and this problem haunted me all the time. Over nearly a decade there, I accumulated spreadsheets filled with very localized ecological data but had no way to organize, save, and share them. Fortunately, a solution is emerging in the form of digital repositories backed with robust metadata schemes and indexing services. Importantly, some of these repositories are accessible to everyone, and no university affiliation is required.

In May 2020, Meghan Mitchell, Christopher Tillman Neal and I launched a digital repository for the Nantucket Biodiversity Initiative (NBI). The repository stores and protects environmental and ecology research data from around Nantucket, but it is focused on projects funded by NBI. Visit the Nantucket Biodiversity Digital Repository and browse through the files to learn about bat counts, spider surveys, sandplain grassland research, and much more.

A snapping turtle on Nantucket. Over half of Nantucket Island is conservation land, and scientific species inventories date back to the late 1800s. There is a wealth of information that would benefit from being published to a repository. Photo: Andrew Mckenna-Foster

We used Zenodo, a free platform that allows anyone to upload research-related files. Zenodo stores the files forever, makes them searchable on the internet, and even gives them a digital object identifier (DOI). However, uploading your files to a repository is the easy part of the solution; to make data useful far into the future, it is crucial to follow the core principles of data publishing and sharing. Uploading data with no context makes it one more piece of junk in the vastness of the internet.

Documenting Data Is Difficult but Absolutely Essential

Published data should be FAIR: Findable, Accessible, Interoperable, and Reusable. In practice, this means:

  • Describing the data with a solid description, useful keywords, and author information (metadata)
  • Using a standard metadata scheme so that the information can be easily shared
  • Uploading the files in an open format (like CSV)
  • Licensing the data so that people and machines will understand how the data can be used

That is only the bare minimum. While Zenodo and other free repository platforms like figshare and Dataverse simplify this process, it still requires work and planning.
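
As a rough illustration of the checklist above, here is a minimal sketch of a pre-upload check written in Python. It is not part of the NBI workflow: the function name `check_record`, the list of required fields, and the list of open file extensions are all assumptions made for this example, and the field names are only loosely modeled on the kind of metadata a repository upload form asks for.

```python
# Hypothetical pre-upload check: names and field lists are assumptions for illustration.
REQUIRED_FIELDS = ("title", "description", "creators", "keywords", "license")
OPEN_EXTENSIONS = (".csv", ".txt", ".json")  # assumed examples of open formats


def check_record(metadata, filenames):
    """Return a list of problems; an empty list means the basics are covered."""
    problems = []
    # Description, keywords, author information, and a license must all be present.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")
    # Files should be in an open format rather than a proprietary one.
    for name in filenames:
        if not name.lower().endswith(OPEN_EXTENSIONS):
            problems.append(f"not an open format: {name}")
    return problems


# Illustrative values only; this is not a real NBI record.
record = {
    "title": "Example spider survey",
    "description": "Pitfall trap counts by site and date.",
    "creators": [{"name": "Doe, Jane"}],
    "keywords": ["spiders", "survey"],
    "license": "cc-by-4.0",
}

print(check_record(record, ["survey_2019.csv"]))
print(check_record({"title": "Untitled"}, ["survey_2019.xlsx"]))
```

The first call prints an empty list, while the second reports the four missing fields and the spreadsheet that should be exported to an open format before upload. A check like this only covers the bare minimum; it says nothing about whether the description is actually useful.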

The meat of our project was working with NBI to create a workflow that curates and applies metadata to all reports and datasets before publication. If you want to set up a repository for yourself or your organization, this is where you should focus most of your energy. We built a documentation site on GitHub that describes the process in detail and is free to copy.

So, What Are the Outcomes?

The repository is growing as we curate and upload reports and data going back to 2005. More importantly:

  • NBI now has a permanent, accessible, and shareable library of the research it has supported.
  • Researchers who work on or near Nantucket now have a way to publish their data and reports.
  • People looking for data and information for the area can now browse current and past research. Importantly, they can cite any information they use, giving authors the credit they deserve.
  • I can sleep at night knowing the data I spent years collecting has a permanent home.

Charts showing what types of files have been uploaded to the digital repository
A summary of the repository as of August 2020. We use Zenodo’s API to harvest metadata from the Nantucket Biodiversity Digital Repository for visualization using Python. These charts are only possible because the workflow we designed controls how keywords are assigned.
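
The code behind these charts is not shown in this post, so what follows is only a sketch of how such a harvest might look in Python. It assumes Zenodo's public records search endpoint, that the `communities` and `size` query parameters are accepted, and that each hit in the JSON response carries a `metadata.keywords` list; the community identifier passed to `fetch_records` and both function names are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://zenodo.org/api/records"  # assumed public search endpoint


def fetch_records(community, size=25):
    """Fetch one page of public records for a community (needs network access)."""
    query = urlencode({"communities": community, "size": size})
    with urlopen(f"{API_URL}?{query}") as response:
        return json.load(response)


def count_keywords(payload):
    """Tally keywords across all records in a search response."""
    counts = Counter()
    for hit in payload.get("hits", {}).get("hits", []):
        for keyword in hit.get("metadata", {}).get("keywords", []):
            # Normalize case and whitespace so variants are counted together.
            counts[keyword.strip().lower()] += 1
    return counts


# Offline sample shaped like the assumed response, so the tally runs without network.
sample = {
    "hits": {
        "hits": [
            {"metadata": {"keywords": ["Bats", "survey"]}},
            {"metadata": {"keywords": ["spiders", "Survey "]}},
            {"metadata": {}},
        ]
    }
}

print(count_keywords(sample).most_common(1))  # [('survey', 2)]
# With network access: count_keywords(fetch_records("hypothetical-community-id"))
```

The resulting counts can be handed to any plotting library. The tally is only meaningful when keywords are assigned consistently, which is why a controlled workflow matters more than the harvesting code itself.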

As NBI continues to support research and add files to this repository, publishing the raw data, not just a project report, will be especially important. With that data in hand, researchers in 10, 50, or 100 years will be able to reproduce and directly compare data from species surveys, population surveys, and management regimes.

The Repository Is Already Being Used

The icing on the cake is that since the repository became operational, it has already proven useful: I recently shared a dataset on Nantucket tarantulas with another spider researcher who was looking for a way to cite our observations.

I hope you consider publishing your data whenever possible and choose to follow the FAIR principles. The open science community is growing rapidly and offers numerous resources for anyone to get started. I am always open to questions and collaborations, so please contact me if you’re interested in working together.

Translated from: https://medium.com/swlh/if-you-work-in-small-science-are-you-leveraging-data-repositories-357cabfc2326
