Take Your SQL Skills to the Next Level by Understanding the Self Join

The last couple of blog posts that I have written have been great for beginners (Data Concepts Without Learning To Code and Developing A Data Scientist’s Mindset). But I would really like to push myself to create content for other members of my audience as well. So today, we are going to take a step into more intermediate data analysis territory *dun dun dun*… and discuss self joins: what they are and how to use them to take your analysis to the next level.


Disclaimer: This post assumes that you already understand how joins work in SQL. If you are not familiar with this concept yet, no worries at all! Save this article for later because I think it’ll definitely be useful as you master SQL in the future.


What is a “Self” Join?

A self join is actually as literal as it gets — it is joining a database table to itself. You can use any kind of join you want to perform a self join (left, right, inner, etc.) — what makes it the self join is that you use the same table on both sides. Just make sure that you select the correct join type for your specific scenario and desired outcome.
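
To make "the same table on both sides" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Employees table, its columns, and the rows are made up purely for illustration; they are not part of the scenarios in this post. The point is only that each copy of the table gets its own alias, and that the join type decides which rows survive.

```python
import sqlite3

# Hypothetical table: every employee row may point at a manager row in the same table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmpId INTEGER PRIMARY KEY, Name TEXT, ManagerId INTEGER);
    INSERT INTO Employees VALUES (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cy', 1);
""")

# Inner self join: only employees that have a manager row to match.
inner_rows = conn.execute("""
    SELECT emp.Name, mgr.Name
    FROM Employees emp
    INNER JOIN Employees mgr ON emp.ManagerId = mgr.EmpId
    ORDER BY emp.EmpId
""").fetchall()

# Left self join: every employee, with NULL where no manager matches.
left_rows = conn.execute("""
    SELECT emp.Name, mgr.Name
    FROM Employees emp
    LEFT JOIN Employees mgr ON emp.ManagerId = mgr.EmpId
    ORDER BY emp.EmpId
""").fetchall()

print(inner_rows)  # [('Ben', 'Ana'), ('Cy', 'Ana')]
print(left_rows)   # [('Ana', None), ('Ben', 'Ana'), ('Cy', 'Ana')]
```

The aliases (emp and mgr here) are what let the database tell the two copies apart.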


When Should I Use a Self Join?

If you’ve been working or studying in the field of data analytics and data science for more than, say, 5 minutes, you’ll know that there are always 27 ways to solve a problem. Some are better than others of course, but sometimes the differences are almost indiscernible.


That being said, there is probably never going to be one exact case where you MUST HAVE a self join or your analysis will shrivel up and die with nowhere to turn. *drop me a scenario in the comments below if you’ve got one, of course*


But, I do at least have some scenarios where I have used self joins to solve my analytics problems, at work or in personal analysis. Here’s my own spin on two of the best (AKA the ones that I remember and can think of a good example for).


Scenario #1: Message/Response

Suppose that there exists a database table called Chats that holds all of the chat messages that have been sent or received by an online clothing store business.



It would be extremely beneficial for the clothing store owner to know how long it usually takes her to respond to messages from her customers.


But, the messages from her customers and messages to her customers are in the same data source. So, we can use a self join to query the data and provide this analysis to the store owner. We will need one copy of the Chats table to get the initial message from the customer and one copy of the Chats table to get the response from the owner. Then, we can do some date math on the dates associated with those events to figure out how long the store owner is taking to respond.


I would write this hypothetical self join query as follows:


SELECT
    msg.MessageDateTime AS CustomerMessageDateTime,
    resp.MessageDateTime AS ResponseDateTime,
    DATEDIFF(day, msg.MessageDateTime, resp.MessageDateTime) AS DaysToRespond
FROM
    Chats msg
    -- Second copy of Chats, aliased as resp, supplies the owner's replies
    INNER JOIN Chats resp ON msg.MsgId = resp.RespondingTo

Note: This SQL query is written using Transact-SQL. Use whatever date functions work for your database at hand.


This query is relatively straightforward, since the RespondingTo column gives us a one-to-one mapping of which original message to join back to.
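
If you want to try the idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented, and the table is cut down to just the three columns the query touches. SQLite has no DATEDIFF, so this version subtracts epoch seconds from strftime('%s', ...) and reports minutes rather than days, which is a swap of my own for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal shape of Chats: RespondingTo is NULL for customer messages.
    CREATE TABLE Chats (MsgId INTEGER PRIMARY KEY, MessageDateTime TEXT, RespondingTo INTEGER);
    INSERT INTO Chats VALUES
        (1, '2020-08-01 09:00:00', NULL),
        (2, '2020-08-01 09:45:00', 1),
        (3, '2020-08-02 14:00:00', NULL),
        (4, '2020-08-03 14:00:00', 3),
        (5, '2020-08-04 08:00:00', NULL);
""")

# msg is the customer's message, resp is the owner's reply to it.
rows = conn.execute("""
    SELECT
        msg.MsgId,
        resp.MsgId,
        (strftime('%s', resp.MessageDateTime) - strftime('%s', msg.MessageDateTime)) / 60
            AS MinutesToRespond
    FROM Chats msg
    INNER JOIN Chats resp ON msg.MsgId = resp.RespondingTo
    ORDER BY msg.MsgId
""").fetchall()

print(rows)  # [(1, 2, 45), (3, 4, 1440)]
```

Message 5 has no reply yet, so the inner join leaves it out. A left join would keep it with NULLs on the resp side, which is handy if the owner also wants to see what is still unanswered.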


Scenario #2: On/Off

Let’s say this time you are presented with a database table AccountActivity that holds a log of events that can occur on a yoga subscription site. The yoga site offers certain “premium trial periods” where customers can get a discounted membership rate for some period when they first join. The trial’s start and end dates are tracked in this table with the Premium_Start and Premium_End event types.



Suppose that some employees on the business side at this yoga subscription company are asking (1) how many people currently have the premium trial period active, and (2) how many used to have it active but no longer do.


Again, the events for a premium period starting and a premium period ending live in the same database table (along with all the other account activity).


Analysis Request A: Accounts in Premium Trial Period

To answer the first question, we need to find events where a premium trial was started but has not ended yet. So, we need to join the AccountActivity table to itself to look for premium start and premium end event matches. But we can’t use an inner join this time. We need the start rows whose end side comes back NULL… so left join it is.


SELECT
    t_start.UserId,
    t_start.EventDateTime AS PremiumTrialStart,
    DATEDIFF(day, t_start.EventDateTime, GETDATE()) AS DaysInTrial
FROM
    AccountActivity t_start
    LEFT JOIN AccountActivity t_end
        ON t_start.UserId = t_end.UserId
        AND t_end.EventType = 'Premium_End'
WHERE
    -- Filter the left side here; inside a LEFT JOIN's ON clause it would not remove rows
    t_start.EventType = 'Premium_Start'
    AND t_end.EventDateTime IS NULL

Notice how we pin each side of the join to one event type. We want the premium trial start on the left side of the join, and the premium trial end on the right side of the join. The Premium_Start filter sits in the WHERE clause because a left join keeps every left-side row no matter what the ON clause says; placed in ON, every non-start event would come back with a NULL end row and slip through the IS NULL check. We also make sure that the User Id matches on both sides. We wouldn’t want to join events from two different customers!
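
One detail worth checking in any left self join is where the filter on the left-hand copy lives. Here is a minimal sketch using Python's built-in sqlite3 module and a handful of invented rows (including a made-up 'Login' event type) that runs the same anti-join twice: once with the Premium_Start filter in the ON clause and once with it in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal shape of AccountActivity; the rows are illustrative only.
    CREATE TABLE AccountActivity (UserId INTEGER, EventType TEXT, EventDateTime TEXT);
    INSERT INTO AccountActivity VALUES
        (1, 'Premium_Start', '2020-07-01'),
        (1, 'Premium_End',   '2020-07-15'),
        (2, 'Premium_Start', '2020-08-01'),
        (2, 'Login',         '2020-08-02'),
        (3, 'Login',         '2020-08-03');
""")

# Start filter inside ON: the left join still keeps every left-side row.
filter_in_on = conn.execute("""
    SELECT t_start.UserId, t_start.EventType
    FROM AccountActivity t_start
    LEFT JOIN AccountActivity t_end
        ON t_start.UserId = t_end.UserId
        AND t_start.EventType = 'Premium_Start'
        AND t_end.EventType = 'Premium_End'
    WHERE t_end.EventDateTime IS NULL
    ORDER BY t_start.UserId, t_start.EventDateTime
""").fetchall()

# Start filter inside WHERE: only unmatched Premium_Start rows remain.
filter_in_where = conn.execute("""
    SELECT t_start.UserId, t_start.EventType
    FROM AccountActivity t_start
    LEFT JOIN AccountActivity t_end
        ON t_start.UserId = t_end.UserId
        AND t_end.EventType = 'Premium_End'
    WHERE t_start.EventType = 'Premium_Start'
        AND t_end.EventDateTime IS NULL
    ORDER BY t_start.UserId, t_start.EventDateTime
""").fetchall()

print(filter_in_on)     # [(1, 'Premium_End'), (2, 'Premium_Start'), (2, 'Login'), (3, 'Login')]
print(filter_in_where)  # [(2, 'Premium_Start')]
```

Only user 2 is actually in an open trial, and only the WHERE version says so. Both versions treat any Premium_End for a user as closing the trial, so they assume one trial per account; repeated trials would also need a comparison on EventDateTime.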


Analysis Request B: Accounts Who Used to Be in Premium Trial Period

Regarding the second question, we want to find the customers whose sweet premium trial has come to an end. We are going to need to self join AccountActivity again, but this time we can switch it up to be a little stricter. We want matches from both the left and right, since, in this population, the trial has ended. So, we can choose an inner join this time.


SELECT
    t_start.UserId,
    t_start.EventDateTime AS PremiumTrialStart,
    DATEDIFF(day, t_start.EventDateTime, t_end.EventDateTime) AS DaysInTrial
FROM
    AccountActivity t_start
    -- An inner join keeps only trial starts that have a matching end
    INNER JOIN AccountActivity t_end
        ON t_start.UserId = t_end.UserId
        AND t_start.EventType = 'Premium_Start'
        AND t_end.EventType = 'Premium_End'
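
Here is a minimal runnable sketch of the same inner self join, again using Python's built-in sqlite3 module with invented rows. SQLite has no DATEDIFF, so the day count comes from subtracting julianday() values, which is my substitution for illustration. It also assumes each account has at most one trial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal shape of AccountActivity; the rows are illustrative only.
    CREATE TABLE AccountActivity (UserId INTEGER, EventType TEXT, EventDateTime TEXT);
    INSERT INTO AccountActivity VALUES
        (1, 'Premium_Start', '2020-07-01'),
        (1, 'Premium_End',   '2020-07-15'),
        (2, 'Premium_Start', '2020-08-01'),
        (3, 'Premium_Start', '2020-06-01'),
        (3, 'Premium_End',   '2020-06-08');
""")

# Inner join: a start row survives only if the same user also has an end row.
rows = conn.execute("""
    SELECT
        t_start.UserId,
        t_start.EventDateTime AS PremiumTrialStart,
        CAST(julianday(t_end.EventDateTime) - julianday(t_start.EventDateTime) AS INTEGER)
            AS DaysInTrial
    FROM AccountActivity t_start
    INNER JOIN AccountActivity t_end
        ON t_start.UserId = t_end.UserId
        AND t_start.EventType = 'Premium_Start'
        AND t_end.EventType = 'Premium_End'
    ORDER BY t_start.UserId
""").fetchall()

print(rows)  # [(1, '2020-07-01', 14), (3, '2020-06-01', 7)]
```

User 2 started a trial that has not ended, so the inner join drops that row, which is exactly the population Request A was after.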

See, self joins are pretty fun. They can be pretty useful in cases where you have events that are related to each other in the same database table. Thanks for reading, and happy querying. 🙂


Originally published at https://datadreamer.io on August 13, 2020.


Translated from: https://towardsdatascience.com/take-your-sql-skills-to-the-next-level-by-understanding-the-self-join-75f1d52f2322
