SQL left join, full join
The last couple of blog posts I have written have been great for beginners (Data Concepts Without Learning To Code or Developing A Data Scientist’s Mindset). But I would really like to push myself to create content for other members of my audience as well. So today, we are going to step into more intermediate data analysis territory *dun dun dun*... and discuss self joins: what they are and how to use them to take your analysis to the next level.
Disclaimer: This post assumes that you already understand how joins work in SQL. If you are not familiar with this concept yet, no worries at all! Save this article for later because I think it’ll definitely be useful as you master SQL in the future.
What is a “Self” Join?
A self join is actually as literal as it gets — it is joining a database table to itself. You can use any kind of join you want to perform a self join (left, right, inner, etc.) — what makes it the self join is that you use the same table on both sides. Just make sure that you select the correct join type for your specific scenario and desired outcome.
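To make the idea concrete, here is a minimal sketch that runs a self join through Python's built-in sqlite3 module, so it can be tried without a database server. The Employees table, its columns, and the sample rows are all hypothetical and exist only for illustration. Note that SQLite's SQL dialect differs slightly from the Transact-SQL used later in this post.

```python
import sqlite3

# In-memory database with a hypothetical Employees table (illustration only)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmpId INTEGER PRIMARY KEY, Name TEXT, ManagerId INTEGER);
    INSERT INTO Employees VALUES
        (1, 'Ana', NULL),
        (2, 'Ben', 1),
        (3, 'Cal', 1);
""")

# The same table appears on both sides of the join.
# The aliases e (employee) and m (manager) tell the two copies apart.
rows = conn.execute("""
    SELECT e.Name, m.Name
    FROM Employees e
    LEFT JOIN Employees m ON e.ManagerId = m.EmpId
    ORDER BY e.EmpId
""").fetchall()

print(rows)
```

The left join keeps Ana in the result even though she has no manager row to match; an inner join would drop her. That is the kind of join-type decision to think through for your own scenario.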
When Should I Use a Self Join?
If you’ve been working or studying in the field of data analytics and data science for more than, say, 5 minutes, you’ll know that there are always 27 ways to solve a problem. Some are better than others of course, but sometimes the differences are almost indiscernible.
That being said, there is probably never going to be one exact case where you MUST HAVE a self join or your analysis will shrivel up and die with nowhere to turn. *drop me a scenario in the comments below if you’ve got one, of course*
But, I do at least have some scenarios where I have used self joins to solve my analytics problems, at work or in personal analysis. Here’s my own spin on two of the best (AKA the ones that I remember and can think of a good example for).
Scenario #1: Message/Response
Suppose that there exists a database table called Chats that holds all of the chat messages that have been sent or received by an online clothing store business.

It would be extremely beneficial for the clothing store owner to know how long it usually takes her to respond to messages from her customers.
But, the messages from her customers and messages to her customers are in the same data source. So, we can use a self join to query the data and provide this analysis to the store owner. We will need one copy of the Chats table to get the initial message from the customer and one copy of the Chats table to get the response from the owner. Then, we can do some date math on the dates associated with those events to figure out how long the store owner is taking to respond.
I would write this hypothetical self join query as the following:
SELECT
    msg.MessageDateTime AS CustomerMessageDateTime,
    resp.MessageDateTime AS ResponseDateTime,
    DATEDIFF(day, msg.MessageDateTime, resp.MessageDateTime) AS DaysToRespond
FROM
    Chats msg
    -- Second copy of Chats, aliased resp, holds the owner's replies
    INNER JOIN Chats resp ON msg.MsgId = resp.RespondingTo
Note: This SQL query is written using Transact-SQL. Use whatever date functions work for your database at hand.
This query is relatively straightforward, since the RespondingTo column gives us a one-to-one mapping of which original message to join back to.
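If you want to see the message/response join run end to end, here is a rough sketch using Python's built-in sqlite3 module. The sample rows are made up, and because SQLite has no DATEDIFF, the date math uses julianday() and reports minutes instead of days; that swap is my assumption for this example, not part of the Transact-SQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Chats (MsgId INTEGER PRIMARY KEY, MessageDateTime TEXT, RespondingTo INTEGER);
    -- Made-up sample data: rows with RespondingTo set are the owner's replies
    INSERT INTO Chats VALUES
        (1, '2020-08-01 09:00:00', NULL),
        (2, '2020-08-01 09:45:00', 1),
        (3, '2020-08-02 14:00:00', NULL),
        (4, '2020-08-03 14:00:00', 3),
        (5, '2020-08-04 10:00:00', NULL);
""")

# Self join: msg is the customer's message, resp is the reply pointing back at it.
# julianday() returns fractional days, so multiply by 24 * 60 to get minutes.
rows = conn.execute("""
    SELECT
        msg.MsgId,
        CAST(ROUND((julianday(resp.MessageDateTime)
                    - julianday(msg.MessageDateTime)) * 24 * 60) AS INTEGER)
            AS MinutesToRespond
    FROM Chats msg
    INNER JOIN Chats resp ON msg.MsgId = resp.RespondingTo
    ORDER BY msg.MsgId
""").fetchall()

print(rows)
```

Message 5 never got a reply, so the inner join leaves it out. If the store owner also wants to see unanswered messages, a left join would keep them with a null response time.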
Scenario #2: On/Off
Let’s say this time you are presented with a database table, AccountActivity, that holds a log of events that can occur on a yoga subscription site. The yoga site offers certain “premium trial periods” where customers can get a discounted membership rate for some period when they first join. Each trial’s start and end dates are tracked in this table with the Premium_Start and Premium_End event types.

Suppose that some employees on the business side at this yoga subscription company are asking 1. how many people have the premium trial period currently active, and 2. how many used to have the premium trial period active but now they don’t.
Again, we’ve got the event for the premium period being started and the premium period being ended in the same database table (along with the other account activity as well).
Analysis Request A: Accounts in Premium Trial Period
To answer the first question, we need to find events where a premium membership was started but has not been ended yet. So, we need to join the AccountActivity table to itself to look for premium start and premium end event matches. But, we can’t use an inner join this time. We need the null rows in the end table… so left join it is.
SELECT
    t_start.UserId,
    t_start.EventDateTime AS PremiumTrialStart,
    DATEDIFF(day, t_start.EventDateTime, GETDATE()) AS DaysInTrial
FROM
    AccountActivity t_start
    LEFT JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
        AND t_end.EventType = 'Premium_End'
WHERE
    -- Filter the left side here: in a LEFT JOIN, an ON condition never removes left rows
    t_start.EventType = 'Premium_Start'
    -- No matching Premium_End row means the trial is still active
    AND t_end.EventDateTime IS NULL
Notice how we also check and make sure that the events we are joining are in the right order. We want the premium trial start on the left side of the join, and the premium trial end on the right side of the join. We also make sure that the User Id matches on both sides. We wouldn’t want to join events from two different customers!
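One detail worth trying out on real rows is where the Premium_Start filter lives. The sketch below uses Python's built-in sqlite3 module with made-up AccountActivity data (the Signup event type and all rows are my assumptions). It compares two versions of the left join: one with the start filter in the ON clause and one with it in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccountActivity (UserId INTEGER, EventType TEXT, EventDateTime TEXT);
    -- Made-up sample data: user 1 finished a trial, user 2 is mid-trial, user 3 never started one
    INSERT INTO AccountActivity VALUES
        (1, 'Signup',        '2020-07-01'),
        (1, 'Premium_Start', '2020-07-02'),
        (1, 'Premium_End',   '2020-07-16'),
        (2, 'Signup',        '2020-07-10'),
        (2, 'Premium_Start', '2020-07-11'),
        (3, 'Signup',        '2020-07-20');
""")

# Start filter in ON: a left join still keeps every left row, so Signup and
# Premium_End rows come through with a null t_end and pass the IS NULL check.
filter_in_on = conn.execute("""
    SELECT t_start.UserId
    FROM AccountActivity t_start
    LEFT JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
        AND t_start.EventType = 'Premium_Start'
        AND t_end.EventType = 'Premium_End'
    WHERE t_end.EventDateTime IS NULL
    ORDER BY t_start.UserId
""").fetchall()

# Start filter in WHERE: only Premium_Start rows with no matching end survive.
filter_in_where = conn.execute("""
    SELECT t_start.UserId
    FROM AccountActivity t_start
    LEFT JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
        AND t_end.EventType = 'Premium_End'
    WHERE t_start.EventType = 'Premium_Start'
        AND t_end.EventDateTime IS NULL
    ORDER BY t_start.UserId
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

On this sample data, only the WHERE version returns just user 2, the one customer whose trial is still running. The condition on the right-hand table (Premium_End) stays in the ON clause so that unmatched starts are kept with nulls.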
Analysis Request B: Accounts Who Used to Be in Premium Trial Period
Regarding the second question, we want to find the customers whose sweet premium trial has come to an end. We are going to need to self join AccountActivity again, but this time we can switch it up to be a little stricter. We want matches from both the left and right, since, in this population, the trial has ended. So, we can choose an inner join this time.
SELECT
    t_start.UserId,
    t_start.EventDateTime AS PremiumTrialStart,
    DATEDIFF(day, t_start.EventDateTime, t_end.EventDateTime) AS DaysInTrial
FROM
    AccountActivity t_start
    INNER JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
        AND t_start.EventType = 'Premium_Start'
        AND t_end.EventType = 'Premium_End'
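Here is a small runnable sketch of the ended-trial query, again using Python's built-in sqlite3 module and made-up rows. Two things are my own assumptions and not part of the query above: SQLite's julianday() stands in for DATEDIFF, and I added a check that the end event comes after the start, which may help if a customer can have more than one trial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccountActivity (UserId INTEGER, EventType TEXT, EventDateTime TEXT);
    -- Made-up sample data: only user 1 has both a start and an end event
    INSERT INTO AccountActivity VALUES
        (1, 'Premium_Start', '2020-07-02'),
        (1, 'Premium_End',   '2020-07-16'),
        (2, 'Premium_Start', '2020-07-11');
""")

# Inner join: a row is returned only when a start has a matching end.
# ISO-formatted date strings compare correctly as text, so > works here.
rows = conn.execute("""
    SELECT
        t_start.UserId,
        CAST(julianday(t_end.EventDateTime)
             - julianday(t_start.EventDateTime) AS INTEGER) AS DaysInTrial
    FROM AccountActivity t_start
    INNER JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
        AND t_start.EventType = 'Premium_Start'
        AND t_end.EventType = 'Premium_End'
        AND t_end.EventDateTime > t_start.EventDateTime
    ORDER BY t_start.UserId
""").fetchall()

print(rows)
```

User 2 has no Premium_End event yet, so the inner join drops them, which is exactly the population split between the two analysis requests.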
See, self joins are pretty fun. They can be pretty useful in cases where you have events that are related to each other in the same database table. Thanks for reading, and happy querying. 🙂
Originally published at https://datadreamer.io on August 13, 2020.
Translated from: https://towardsdatascience.com/take-your-sql-skills-to-the-next-level-by-understanding-the-self-join-75f1d52f2322