Small Databases
If you’re a scientist, especially one who does a lot of research alone, you probably have more than one spreadsheet of important data that you just haven’t gotten around to writing up yet. Maybe you never will. Sitting idle on a hard drive, that “dark data” could prove very useful to someone in the future (or even someone in the present), especially as our climate and society change.
What are you going to do with those files? How are you going to preserve them?
If you’re like me, maybe you’ve felt the terror of losing data every time you moved your files to a new computer or moved your research to a new job. Did you remember to back up that spreadsheet from your brilliant pet project from 7 years ago? If you did back it up, are you sure you backed up the most recent version? It’s sobering to imagine other people have gone through this and lost potentially valuable species records, survey data, and field observations.
Digital Data Repositories to the Rescue
In the years before I returned to graduate school, I worked for a science nonprofit on Nantucket Island, Massachusetts, and this problem haunted me all the time. Over nearly a decade there, I accumulated spreadsheets filled with very localized, ecological data, but had no way to organize it, save it, and share it. Fortunately, a solution is emerging in the form of digital repositories backed with robust metadata schemes and indexing services. Importantly, some of these repositories are accessible to everyone, and no university affiliation is required.
In May 2020, Meghan Mitchell, Christopher Tillman Neal, and I launched a digital repository for the Nantucket Biodiversity Initiative (NBI). The repository stores and protects environmental and ecological research data from around Nantucket, with a focus on projects funded by NBI. Visit the Nantucket Biodiversity Digital Repository and browse through the files to learn about bat counts, spider surveys, sandplain grassland research, and much more.

We used Zenodo, a free platform that allows anyone to upload research-related files. Zenodo stores the files forever, makes them searchable on the internet, and even gives them a digital object identifier (DOI). However, uploading your files to a repository is the easy part of the solution; to keep data useful far into the future, it is crucial to follow the core principles of data publishing and sharing. Data uploaded with no context is just one more piece of junk in the vastness of the internet.
Documenting Data Is Difficult but Absolutely Essential
Published data should be FAIR: Findable, Accessible, Interoperable, and Reusable. In practice, this means:
- Describing the data with a solid description, useful keywords, and author information (metadata)
- Using a standard metadata scheme so that the information can be easily shared
- Uploading the files in an open format (like CSV)
- Licensing the data so that people and machines will understand how the data can be used
That is only the bare minimum. While Zenodo and other free repository platforms like figshare and Dataverse simplify this process, it still requires work and planning.
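As a rough illustration of the checklist above, here is a minimal Python sketch that checks a metadata record for the basic descriptive fields and writes the observations as CSV text. The field names in `REQUIRED_FIELDS`, the helper functions, and the sample bat-count rows are all hypothetical, chosen only for this example; they are not the NBI workflow or any repository's official schema, so check your platform's documentation for the fields it actually expects.

```python
import csv
import io
import json

# Assumed minimum set of descriptive fields (hypothetical, not an official schema).
REQUIRED_FIELDS = ("title", "description", "creators", "keywords", "license")


def missing_fields(metadata):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def to_csv(rows, columns):
    """Serialize a list of dict rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up record and data, for illustration only.
metadata = {
    "title": "Example bat count survey",
    "description": "Nightly bat counts at two hypothetical sites.",
    "creators": [{"name": "Researcher, Example"}],
    "keywords": ["bats", "survey", "Nantucket"],
    "license": "CC-BY-4.0",
}
rows = [
    {"date": "2020-06-01", "site": "A", "count": 12},
    {"date": "2020-06-02", "site": "B", "count": 7},
]

problems = missing_fields(metadata)      # empty list when the record is complete
csv_text = to_csv(rows, ["date", "site", "count"])
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
```

Keeping the metadata in a plain JSON file next to the CSV means both stay readable without special software, which is the point of choosing open formats.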
The meat of our project was working with NBI to create a workflow that curates and applies metadata to all reports and datasets before publication. If you want to set up a repository for yourself or your organization, this is where you should focus most of your energy. We built a documentation site on GitHub that describes the process in detail and is free to copy.
So, What Are the Outcomes?
The repository is growing as we curate and upload reports and data going back to 2005. More importantly,
- NBI now has a permanent, accessible, and shareable library of the research it has supported.
- Researchers who work on or near Nantucket now have a way to publish their data and reports.
- People looking for data and information for the area can now browse current and past research. Importantly, they can cite any information they use, giving authors the credit they deserve.
- I can sleep at night knowing the data I spent years collecting has a permanent home.

As NBI continues to support research and add files to this repository, publishing the raw data, not just a project report, will be especially important. With that data in hand, researchers in 10, 50, or 100 years will be able to reproduce and directly compare data from species surveys, population surveys, and management regimes.
The Repository Is Already Being Used
The icing on the cake is that since the repository became operational, it has already proven useful: I recently shared a dataset on Nantucket tarantulas with another spider researcher who was looking for a way to cite our observations.
I hope you consider publishing your data whenever possible and choose to follow the FAIR principles. The open science community is growing rapidly and offers numerous resources to help anyone get started. I am always open to questions and collaborations, so please contact me if you’re interested in working together.
Translated from: https://medium.com/swlh/if-you-work-in-small-science-are-you-leveraging-data-repositories-357cabfc2326