電子表格轉換成數據庫_創建數據庫,將電子表格轉換為關系數據庫,第1部分...

電子表格轉換成數據庫

Part 1: Creating an Entity Relational Diagram (ERD)

第1部分:創建實體關系圖(ERD)

A Relational Database Management System (RDMS) is a program that allows us to create, update, and manage a relational database. Structured Query Language (SQL) is a programming language used to communicate with data stored in the RDMS. The SQL skill for using a RDMA is required for many data-related positions these days. In the social media forums like Quora or Reddit, there are many people who search for a public database for practicing their SQL querying skills. However, although there are many public data sets in a single spreadsheet, there are not many public databases online. Even if you found a data set for the topic you have interest in, the format of the data is usually just one spreadsheet, not a database for most cases. Therefore, it will be very useful to know how to convert a data set in one spreadsheet to a database with multiple tables fitting a relational database format. Knowing the process of this conversion can give us many chances to practice SQL querying skills with a variety of databases.

關系數據庫管理系統 (RDMS)是允許我們創建,更新和管理關系數據庫的程序。 結構化查詢語言 (SQL)是一種編程語言,用于與RDMS中存儲的數據進行通信。 如今,與許多數據相關的職位都需要使用RDMASQL技能。 在QuoraReddit等社交媒體論壇中,有很多人在搜索公共數據庫以練習其SQL查詢技能。 但是,盡管單個電子表格中有許多公共數據集,但在線的公共數據庫并不多。 即使您找到了感興趣的主題的數據集,數據的格式通常也只是一個電子表格,在大多數情況下不是數據庫。 因此,了解如何將一個電子表格中的數據集轉換為具有多個符合關系數據庫格式的表的數據庫將非常有用。 了解這種轉換的過程可以使我們有很多機會來練習各種數據庫SQL查詢技能。

This is the first article in a two part series. The goal of this series is to show how to create a relational database for SQL. The whole process is to convert a data in one spreadsheet to a relational database for SQL. In this first article I create an Entity Relational Diagram (ERD) which is a graphical representation showing the relationships between entities.

這是分兩部分的系列文章中的第一篇。 本系列的目的是展示如何為SQL創建關系數據庫。 整個過程是將一個電子表格中的數據轉換為SQL的關系數據庫。 在第一篇文章中,我創建一個實體關系圖(ERD) ,該圖是顯示實體之間關系的圖形表示。

數據集 (The Data Set)

Let’s find a data set for this practice. There are many public data on the Kaggle dataset. Among a bunch of data sets, I selected a data set named Sample Sales Data. The following pictures show the original format of the data which is contained in one spreadsheet.

讓我們為該練習找到一個數據集。 Kaggle數據集上有許多公共數據。 在一系列數據集中,我選擇了一個名為Sample Sales Data的數據集 下圖顯示了一個電子表格中包含的數據的原始格式。

Image for post
Image for post
Sample Sales Data from the Kaggle Dataset
來自Kaggle數據集的樣本銷售數據

The data has 25 columns and 2824 rows including headers. The list of the headers is as follows:

數據有25列和2824行,包括標題。 標頭列表如下:

  • ORDERNUMBER: the identification number for each order

    ORDERNUMBER :每個訂單的標識號

  • QUANTITYORDERED: the quantity ordered

    QUANTITYORDERED :訂購的數量

  • PRICEEACH: the actual price paid for the transaction (variable across transactions)

    PRICEEACH :為交易支付的實際價格(隨交易而變化)

  • ORDERLINENUMBER: the number of the order line

    ORDERLINENUMBER :訂單行的編號

  • SALES: the amount of sales

    銷售 :銷售額

  • ORDERDATE: the order date

    ORDERDATE :訂單日期

  • STATUS: the shipping status (Shipped, Resolved, Cancelled, On Hold, Disputed, and In Progress)

    狀態 :運輸狀態(已發貨,已解決,已取消,保留,有爭議和進行中)

  • QTR_ID: the quarter of the order date

    QTR_ID :訂購日期的四分之一

  • MONTH_ID: the month of the order date

    MONTH_ID :訂購日期的月份

  • YEAR_ID: the year of the order date

    YEAR_ID :訂單日期的年份

  • PRODUCTLINE: the category of products

    PRODUCTLINE :產品類別

  • MSRP: the manufacture’s suggested retail price (constant across transactions)

    MSRP :制造商的建議零售價(在交易中保持不變)

  • PRODUCTCODE: the identification code for each product

    PRODUCTCODE :每個產品的識別碼

  • CUSTOMERNAME: the names of customers

    客戶名稱:客戶名稱

  • PHONE: the phone numbers of customers

    電話 :客戶的電話號碼

  • ADDRESSLINE1: addressline 1 for customers

    ADDRESSLINE1 :客戶的地址行1

  • ADDRESSLINE2: address line 2 for customers

    ADDRESSLINE2 :客戶的地址行2

  • CITY: city names for customers

    城市 :客戶的城市名稱

  • STATE: state names for customers (only for customers located in the US)

    STATE :客戶的州名(僅適用于位于美國的客戶)

  • POSTALCODE: postal codes for customers

    郵政編碼 :為客戶郵政編碼

  • COUNTRY: countries for customers

    國家 :客戶所在的國家

  • TERRITORY: the regional names of each country (NA, EMEA, Japan, and APAC)

    地區 :每個國家的區域名稱(NA,EMEA,日本和亞太地區)

  • CONTACTLASTNAME and CONTACTFIRSTNAME: the last and first names of customers

    CONTACTLASTNAMECONTACTFIRSTNAME :客戶的姓氏和名字

  • DEALSIZE: the deal sizes of orders

    DEALSIZE :訂單的交易大小

創建實體關系圖(ERD) (Creating an Entity Relationship Diagram (ERD))

The information contained in the data can be basically divided into three entities: Customers, Products, and Orders.

數據中包含的信息基本上可以分為三個實體: 客戶產品訂單

顧客 (Customers)

Based on the columns in the data, the attributes to be included in the entity for Customers are CUSTOMERNAME, CONTACTLASTNAME, CONTACTFIRSTNAME, PHONE, ADDRESSLINE1, ADDRESSLINE2, POSTALCODE, CITY, STATE, COUNTRY, and TERRITORY. CustomerID is added into the entity as the primary key. And for convenience sake, the cases of the attributes are changed as follows:

根據數據中的列,要包含在“客戶”實體中的屬性是CUSTOMERNAMECONTACTLASTNAME,CONTACTFIRSTNAME, PHONEADDRESSLINE1ADDRESSLINE2POSTALCODECITY,STATE,COUNTRYTERRITORYCustomerID作為主鍵添加到實體中。 為了方便起見,屬性的情況如下更改:

Image for post

The problem of keeping City, State, Country, and Territory in the entity for Customers is that there will be many duplicate rows in the table for Customers. Therefore, entities for City, Country, and Territory are separately created, and then the links between them and Customers are made using foreign keys. The reason why the entity for State is not created is that there are many null values in State so that states do not determine countries and territories in this data.

保留城市國家地區的問題 客戶實體中 是表中的客戶會有很多重復的行。 因此,將分別創建用于城市國家地區的實體,然后使用外鍵在它們和客戶之間建立鏈接。 為什么不建立國有實體的原因是,在許多國家空值,這樣的狀態并不確定這個數據的國家和地區。

In the ERD, there are three types of relationships between entities: One-to-One, One-to-Many(or Many-to-One), and Many-to-Many. The relationships between Territories and Countries, Countries and Cities, and Cities and Customers are One-to-Many relationships. For example, each customer lives in one city in this data while one city has many customers. Therefore, using the Crow’s Foot Notation, the ERD between Customers, Cities, Countries, and Territories is as follows:

在ERD中,實體之間存在三種類型的關系: 一對一,一對多(或多對一)和多對多 。 區域國家國家城市以及城市客戶之間的關系是一對多關系。 例如,在此數據中,每個客戶居住在一個城市中,而一個城市有許多客戶。 因此,使用烏鴉腳符號 , 客戶之間的ERD, 城市國家 ,和地區如下:

Image for post
The Crow’s Foot Notations for Customers, Cities, Countries, and Territories
客戶,城市,國家和地區的魚尾紋符號

產品展示 (Products)

The attributes to be included in the entity for Products are PRODUCTCODE, PRODUCTLINE, MSRP, and ORDERNUMBER. Among these attributes, PRODUCTCODE can be a candidate for the primary key in the entity since it is unique and does not have null values. However, in order to keep the format of the primary keys consistent across entities, ProductID is added into the entity and is set to be the primary key. The letter cases are changed as follows:

產品實體中要包含的屬性是PRODUCTCODEPRODUCTLINEMSRPORDERNUMBER 。 在這些屬性中, PRODUCTCODE可以是實體中主鍵的候選者,因為它是唯一的并且沒有空值。 但是,為了使主鍵的格式在各個實體之間保持一致, 產品編號 被添加到實體中并設置為主鍵。 字母大小寫更改如下:

Image for post

As we saw in the case for the entity for Customers, including Productline in this entity generates duplicate rows as well. Therefore, the entity for ProductLines is created separately, and is linked to the entity for Products using a foreign key. Since each product is classified into one product line while each product line has many products, the relationship between ProductLines and Products is One-to-Many.

正如我們在“ 客戶 ”實體案例中看到的那樣,在該實體中包括Productline也會生成重復的行。 因此, ProductLine的實體是單獨創建的,并使用外鍵鏈接到Products的實體。 由于每個產品分類為一個產品線,而每個產品線具有許多產品,因此ProductLineProducts之間的關系為一對多

Image for post
The Crows’ Notation for ProductLines and Products
產品線和產品的烏鴉符號

This is not the final version for Products since OrderNumber will be included in the entity for Orders as well. Including OrderNumber in the entity for Orders will make some changes in the entity for Products. This will be covered after the entity for Orders is created.

這并非以來ORDERNUMBER 產品將包含在訂單的實體以及最終版本。 在訂單的實體,包括訂單號碼將在產品實體的一些變化。 創建訂單實體后將對此進行介紹。

命令 (Orders)

The entity for Orders should include ORDERNUMBER, ORDERDATE, PRODUCTCODE, STATUS, QUARTER_ID, MONTH_ID, YEAR_ID,and CUSTOMERNAME as its attributes. OrderID is added into this entity as the primary key. The names and cases of the attributes are changed as follows for convenience sake:

訂單實體應包括ORDERNUMBERORDERDATEPRODUCTCODESTATUSQUARTER_IDMONTH_IDYEAR_IDCUSTOMERNAME作為其屬性。 OrderID作為主鍵添加到該實體中。 為了方便起見,屬性的名稱和大小寫更改如下:

Image for post

In order to remove duplicate rows generated by including Status and CustomerName in this entity, the entities for Status and CustomerNames are created separately and are linked to the entity for Orders using foreign keys. The entity for CustomerNames (which is equivalent to the Customers) was already created previously. The relationships of Orders with Status and Customers are Many-to-One, respectively.

為了刪除通過在此實體中包含StatusCustomerName生成的重復行,分別創建了StatusCustomerNames的實體,并使用外鍵將其鏈接到Order的實體。 CustomerNames的實體(等同于Customer )是先前已創建的。 訂單狀態客戶的關系分別是多對一的

Image for post
The Crow’s Foot Notations for Orders, Customers, and Status
訂單,客戶和狀態的魚尾紋

訂單和產品 (Orders and Products)

Each order contains a list of products in it. This means that Orders and Products are related to each other. What is the relationship between these two entries? Each product can be ordered by multiple orders and each order can contain multiple products. Therefore, the relationship is many-to-many.

每個訂單中都包含一個產品列表。 這意味著訂單產品相互關聯。 這兩個條目之間是什么關系? 每個產品可以按多個訂單訂購,每個訂單可以包含多個產品。 因此,這種關系是多對多的

Image for post
The Crow’s Foot Notation for Orders and Products 1
訂單和產品的魚尾紋1

The way to create tables which are in the many-to-many relationship is to create a junction table between the two original tables and then to make one-to-many relationships between the original tables and the junction table. The junction table is the table for the combination of Orders and Products, and then one-to-many relationships are made between Orders and the junction, and between Products and the junction. The primary key for this junction entity should be the composite of OrderID and ProductID.

創建作為在許多一對多的關系表的方法是將原來的兩個表之間創建一個表,然后做一個對許多原始表和聯接表之間的關系。 聯結表是OrdersProducts組合的表,然后在Orders和聯結之間以及Products和聯結之間建立一對多關系。 此聯結實體的主鍵應為OrderID的組合 ProductID

The remaining columns which are not included in any entity so far are QUANTITYORDERED, PRICEEACH, ORDERLINENUMBER, SALES, and DEALSIZE. These columns cannot be determined by Orders or Products by themselves. For example, the quantity of a product in an order cannot be identified by an order or a product alone. Only the combination of an order and a product can identify it. Therefore, those columns are included in the junction entity as its attributes. The entities and their relationships for Orders and Products are as follows:

到目前為止,尚未包含在任何實體中的其余列為QUANTITYORDEREDPRICEEACHORDERLINENUMBERSALESDEALSIZE 。 這些列不能由訂單產品自行確定。 例如,訂單中的產品數量無法通過訂單或單獨的產品來識別。 只有訂單和產品的組合才能識別它。 因此,這些列作為其屬性包含在聯結實體中。 訂單產品的實體及其關系如下:

Image for post
The Crow’s Foot Notations for Orders and Products 2
訂單和產品的魚尾紋2

Due to duplicate rows, the entity for DealSizes is separately created and linked to the entity for Orders_Products using foreign key for SizeID.

由于重復行,為DealSizes實體被單獨創建并鏈接到實體使用SizeID外鍵Orders_Products。

整個ERD (The Entire ERD)

Now, we completed creating all the entities for this data and making relationships between them. The entire ERD is as follows:

現在,我們完成了為該數據創建所有實體并在它們之間建立關系。 整個ERD如下:

Image for post
The Final Version of the ERD for Sample Sales Data
樣品銷售數據ERD的最終版本

The usefulness of the ERD is that it makes us to easily understand the relationships between the columns (or the entities) in the data set and to see the entire structure of the data set at a glance.

ERD的有用之處在于,它使我們能夠輕松了解數據集中的列(或實體)之間的關系,并一目了然地看到數據集的整個結構。

下一步是什么? (What’s Next?)

This article covered how to create an Entity Relationship Diagram using a data set. The data for this practice is Sample Sales Data obtained from the Kaggle dataset. It is briefly covered how to divide the original data set into entities and make relationships between the entities.

本文介紹了如何使用數據集創建實體關系圖。 該實踐的數據是從Kaggle數據集獲得的樣本銷售數據 。 簡要介紹了如何將原始數據集劃分為實體以及如何在實體之間建立關系。

In the next article, the practice of converting a data in a single spread sheet to a relational database by using MySQL will be covered.

在下一篇文章中 ,將介紹通過使用MySQL將單個電子表格中的數據轉換為關系數據庫的實踐。

翻譯自: https://medium.com/swlh/creating-a-database-converting-a-spreadsheet-to-a-relational-database-part-1-2a9a228bf77a

電子表格轉換成數據庫

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/389965.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/389965.shtml
英文地址,請注明出處:http://en.pswp.cn/news/389965.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

【Vue.js學習】生命周期及數據綁定

一、生命后期 官網的圖片說明: Vue的生命周期總結 var app new Vue({el:"#app", beforeCreate: function(){console.log(1-beforeCreate 初始化之前);//加載loading},created: function(){console.log(2-created 創建完成);//關閉loading},be…

5885. 使每位學生都有座位的最少移動次數

5885. 使每位學生都有座位的最少移動次數 一個房間里有 n 個座位和 n 名學生,房間用一個數軸表示。給你一個長度為 n 的數組 seats ,其中 seats[i] 是第 i 個座位的位置。同時給你一個長度為 n 的數組 students ,其中 students[j] 是第 j 位…

Springboot(2.0.0.RELEASE)+spark(2.1.0)框架整合到jar包成功發布(原創)!!!

一、前言 首先說明一下,這個框架的整合可能對大神來說十分容易,但是對我來說十分不易,踩了不少坑。雖然整合的時間不長,但是值得來紀念下!!!我個人開發工具比較喜歡IDEA,創建的sprin…

求一個張量的梯度_張量流中離散策略梯度的最小工作示例2 0

求一個張量的梯度Training discrete actor networks with TensorFlow 2.0 is easy once you know how to do it, but also rather different from implementations in TensorFlow 1.0. As the 2.0 version was only released in September 2019, most examples that circulate …

docker環境 快速使用elasticsearch-head插件

docker環境 快速使用elasticsearch-head插件 #elasticsearch配置 #進入elk容器 docker exec -it elk /bin/bash #head插件訪問配置 echo #head插件訪問# http.cors.enabled: true http.cors.allow-origin: "*" >>/etc/elasticsearch/elasticsearch.yml#重啟el…

476. 數字的補數

476. 數字的補數 給你一個 正 整數 num ,輸出它的補數。補數是對該數的二進制表示取反。 例 1:輸入:num 5 輸出:2 解釋:5 的二進制表示為 101(沒有前導零位),其補數為 010。所以你…

zabbix網絡發現主機

1 功能介紹 默認情況下,當我在主機上安裝agent,然后要在server上手動添加主機并連接到模板,加入一個主機組。 如果有很多主機,并且經常變動,手動操作就很麻煩。 網絡發現就是主機上安裝了agent,然后server自…

python股市_如何使用python和破折號創建儀表板來主導股市

python股市始終關注大局 (Keep Your Eyes on the Big Picture) I’ve been fascinated with the stock market since I was a little kid. There is certainly no shortage of data to analyze, and if you find an edge you can make some easy money. To stay on top of the …

阿里巴巴開源 Sentinel,進一步完善 Dubbo 生態

為什么80%的碼農都做不了架構師?>>> 阿里巴巴開源 Sentinel,進一步完善 Dubbo 生態 Sentinel 開源地址:https://github.com/alibaba/Sentinel 轉載于:https://my.oschina.net/dyyweb/blog/1925839

數據結構與算法 —— 鏈表linked list(01)

鏈表(維基百科) 鏈表(Linked list)是一種常見的基礎數據結構,是一種線性表,但是并不會按線性的順序存儲數據,而是在每一個節點里存到下一個節點的指針(Pointer)。由于不必須按順序存儲,鏈表在插入的時候可以…

離群值如何處理_有理處理離群值的局限性

離群值如何處理ARIMA models can be quite adept when it comes to modelling the overall trend of a series along with seasonal patterns.ARIMA模型可以很好地建模一系列總體趨勢以及季節性模式。 In a previous article titled SARIMA: Forecasting Seasonal Data with P…

網絡爬蟲基礎練習

0.可以新建一個用于練習的html文件,在瀏覽器中打開。 1.利用requests.get(url)獲取網頁頁面的html文件 import requests newsurlhttp://news.gzcc.cn/html/xiaoyuanxinwen/ res requests.get(newsurl) #返回response對象 res.encodingutf-8 2.利用BeautifulSoup的H…

10生活便捷:購物、美食、看病時這樣搜,至少能省一半心

本次課程介紹實實在在能夠救命、省錢的網站,解決了眼前這些需求后,還有“詩和遠方”——不花錢也能點亮自己的生活,獲得美的享受! 1、健康醫療這么搜,安全又便捷 現在的醫療市場確實有些混亂,由于醫療的專業…

ppt圖表圖表類型起始_梅科圖表

ppt圖表圖表類型起始There are different types of variable width bar charts but two are the most popular: 1) Bar Mekko chart; 2) Marimekko chart.可變寬度條形圖有不同類型,但最受歡迎的有兩種:1)Mekko條形圖; 2)Marimekko圖表。 Th…

Tomcat日志亂碼了怎么處理?

【前言】 tomacat日志有三個地方,分別是Output(控制臺)、Tomcat Localhost Log(tomcat本地日志)、Tomcat Catalina Log。 啟動日志和大部分報錯日志、普通日志都在output打印;有些錯誤日志,在Tomcat Localhost Log。 三個日志顯示區,都可能…

python 編碼規范

縮進 用4個空格來縮進代碼 分號 不要在行尾加分號, 也不要用分號將兩條命令放在同一行。 行長度 每行不超過80個字符 以下情況除外: l 長的導入模塊語句 l 注釋里的URL 不要使用反斜杠連接行。 Python會將 圓括號, 中括號和花括號中的行隱式的連接起來 , 你可以利用…

5888. 網絡空閑的時刻

5888. 網絡空閑的時刻 給你一個有 n 個服務器的計算機網絡,服務器編號為 0 到 n - 1 。同時給你一個二維整數數組 edges ,其中 edges[i] [ui, vi] 表示服務器 ui 和 vi 之間有一條信息線路,在 一秒 內它們之間可以傳輸 任意 數目的信息。再…

django框架預備知識

內容: 1.web預備知識 2.django介紹 3.web框架的本質及分類 4.django安裝與基本設置 1.web預備知識 HTTP協議:https://www.cnblogs.com/wyb666/p/9383077.html 關于web的本質:http://www.cnblogs.com/wyb666/p/9034042.html 如何自定義web框架…

現實世界 機器學習_公司溝通分析簡介現實世界的機器學習方法

現實世界 機器學習In my previous posts I covered analytical subjects from a scientific point of view, rather than an applied real world problem. For this reason, this article aims at approaching an analytical idea from a managerial point of view, rather tha…

拷貝構造函數和賦值函數

1、拷貝構造函數:用一個已經有的對象構造一個新的對象。 CA(const CA & c )函數的名稱必須和類名稱相一致,它的唯一的一個參數是本類型的一個引用變量,該參數是const 類型,不可變。 拷貝構造函數什么時…