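LeetCode Pandas Challenge (6): Merging Data

This article works through LeetCode's easy Pandas problem set and shows how to complete basic data-processing tasks with Pandas. It is aimed at Pandas beginners.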

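Problem 1: 1050. Actors and Directors Who Cooperated At Least Three Times

Problem description:

ActorDirector table:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |
+-------------+---------+
timestamp is the primary key (a column with unique values) of this table.

Write a solution to find every id pair (actor_id, director_id) where the actor and the director have cooperated at least three times.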

Example 1:

Input:
ActorDirector table:
+-------------+-------------+-------------+
| actor_id    | director_id | timestamp   |
+-------------+-------------+-------------+
| 1           | 1           | 0           |
| 1           | 1           | 1           |
| 1           | 1           | 2           |
| 1           | 2           | 3           |
| 1           | 2           | 4           |
| 2           | 1           | 5           |
| 2           | 1           | 6           |
+-------------+-------------+-------------+
Output:
+-------------+-------------+
| actor_id    | director_id |
+-------------+-------------+
| 1           | 1           |
+-------------+-------------+
Explanation:
The only qualifying id pair is (1, 1); they cooperated exactly 3 times.

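Approach:

Method 1: Use value_counts() to count how often each (actor_id, director_id) pair occurs.

Method 2: Group the rows by actor_id and director_id, use size() to count the rows in each group, and keep the groups whose count is greater than or equal to 3.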

Solution code:

Method 1:

import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
    # Count each (actor_id, director_id) pair directly with value_counts()
    counts = actor_director[['actor_id', 'director_id']].value_counts()
    # Keep pairs with at least 3 rows, then turn the index back into columns
    result = counts[counts >= 3].reset_index()[['actor_id', 'director_id']]
    return result

Method 2:

import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
    # Count how many times each actor-director pair cooperated
    collaboration_counts = (
        actor_director.groupby(['actor_id', 'director_id'])
        .size()
        .reset_index(name='count')
    )
    # Keep the pairs whose count is >= 3
    result = collaboration_counts[collaboration_counts['count'] >= 3]
    return result[['actor_id', 'director_id']]
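To check that the two methods agree, the sketch below runs both on the Example 1 input from the problem statement. The variable names (pair_counts, via_value_counts, group_sizes, via_groupby) are chosen here for illustration only, and DataFrame.value_counts() is assumed to be available (pandas 1.1 or later).

```python
import pandas as pd

# Sample input copied from Example 1 of the problem statement.
actor_director = pd.DataFrame({
    "actor_id":    [1, 1, 1, 1, 1, 2, 2],
    "director_id": [1, 1, 1, 2, 2, 1, 1],
    "timestamp":   [0, 1, 2, 3, 4, 5, 6],
})

# Method 1: count each (actor_id, director_id) pair directly.
pair_counts = actor_director[["actor_id", "director_id"]].value_counts()
via_value_counts = (
    pair_counts[pair_counts >= 3]
    .reset_index()[["actor_id", "director_id"]]
)

# Method 2: group by the pair and take the size of each group.
group_sizes = (
    actor_director.groupby(["actor_id", "director_id"])
    .size()
    .reset_index(name="count")
)
via_groupby = group_sizes.loc[
    group_sizes["count"] >= 3, ["actor_id", "director_id"]
]

print(via_value_counts.values.tolist())  # [[1, 1]]
print(via_groupby.values.tolist())       # [[1, 1]]
```

Both approaches return the single pair (1, 1). The pairs (1, 2) and (2, 1) appear only twice each, so they are filtered out.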

Problem 2: 1378. Replace Employee ID With The Unique Identifier

Problem description:

Employees table:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+
In SQL, id is the primary key of this table.
Each row of this table holds the name and ID of one employee of a company.

EmployeeUNI table:

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| unique_id     | int     |
+---------------+---------+
In SQL, (id, unique_id) is the primary key of this table.
Each row of this table holds the ID of one employee of the company and that employee's unique identifier (unique ID).

Show the unique ID of each employee; if an employee has no unique ID, fill in null instead.

You may return the result table in any order.

Approach:

Use merge() to left-join the two tables on id. Employees with no matching row in EmployeeUNI get NaN in unique_id.

Solution code:

import pandas as pd

def replace_employee_id(employees: pd.DataFrame, employee_uni: pd.DataFrame) -> pd.DataFrame:
    # Left-merge on id; employees without a match get NaN in unique_id
    employee_name_uni = pd.merge(employees, employee_uni, on='id', how='left')
    return employee_name_uni[['unique_id', 'name']]
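As a rough illustration of how the left merge fills in missing values, here is a small sketch. The sample rows (Alice, Bob, Carol and their ids) are made up for this example and do not come from the problem statement.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
employees = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
})
employee_uni = pd.DataFrame({
    "id": [1, 3],
    "unique_id": [101, 103],
})

# how="left" keeps every row of employees, matched or not.
merged = pd.merge(employees, employee_uni, on="id", how="left")
result = merged[["unique_id", "name"]]
print(result)
```

Bob has no row in employee_uni, so his unique_id comes back as NaN. Because NaN is a float, the unique_id column is converted from int to float64 after the merge.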

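Problem 3: 1280. Students and Examinations

Problem description:

Student table: Students

Subject table: Subjects

Exam table: Examinations

Find the number of times each student attended each exam, with the result ordered by student_id and subject_name.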

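Approach:

This problem combines several DataFrame operations, so it is easier to break it into smaller steps.

First, group the examinations by student id and subject and count the exams in each group. Next, build every student-subject combination with a cross merge and left-merge the counts onto it. Then fill the missing counts with 0, and finally sort in ascending order by student_id and subject_name.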

Solution code:

import pandas as pd

def students_and_examinations(students: pd.DataFrame, subjects: pd.DataFrame,
                              examinations: pd.DataFrame) -> pd.DataFrame:
    # Group by student id and subject, and count the exams in each group
    grouped = (
        examinations.groupby(['student_id', 'subject_name'])
        .size()
        .reset_index(name='attended_exams')
    )
    # Build every combination of student and subject
    all_id_subjects = pd.merge(students, subjects, how='cross')
    # Left-merge the counts onto the full set of combinations
    id_subjects_count = pd.merge(all_id_subjects, grouped,
                                 on=['student_id', 'subject_name'], how='left')
    # Fill missing counts with 0
    id_subjects_count['attended_exams'] = id_subjects_count['attended_exams'].fillna(0).astype(int)
    # Sort in ascending order
    id_subjects_count.sort_values(['student_id', 'subject_name'], inplace=True)
    return id_subjects_count[['student_id', 'student_name', 'subject_name', 'attended_exams']]
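The cross merge is the step that is easiest to overlook, so here is a minimal sketch of the whole pipeline on tiny made-up data. The sample rows are hypothetical, and how="cross" is assumed to be available (pandas 1.2 or later).

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
students = pd.DataFrame({
    "student_id": [1, 2],
    "student_name": ["Alice", "Bob"],
})
subjects = pd.DataFrame({"subject_name": ["Math", "Physics"]})
examinations = pd.DataFrame({
    "student_id": [1, 1, 2],
    "subject_name": ["Math", "Math", "Physics"],
})

# Every student paired with every subject: 2 x 2 = 4 rows.
all_pairs = pd.merge(students, subjects, how="cross")

# Exams actually taken, counted per (student, subject).
counts = (
    examinations.groupby(["student_id", "subject_name"])
    .size()
    .reset_index(name="attended_exams")
)

# Pairs with no exam get NaN from the left merge, which becomes 0.
report = all_pairs.merge(counts, on=["student_id", "subject_name"], how="left")
report["attended_exams"] = report["attended_exams"].fillna(0).astype(int)
report = report.sort_values(["student_id", "subject_name"]).reset_index(drop=True)

print(report["attended_exams"].tolist())  # [2, 0, 0, 1]
```

Without the cross merge, the pairs (Alice, Physics) and (Bob, Math) would be missing from the result entirely, because groupby only produces rows for combinations that occur in Examinations.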

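Problem 4: 570. Managers with at Least 5 Direct Reports

Problem description:

Table: Employee

Write a solution to find the managers who have at least five direct reports.

Return the result table in any order.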

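Approach:

Use groupby() to group the rows by managerId and count the ids in each group, which gives the number of direct reports for each manager. Keep the managerId values whose count is greater than or equal to 5, then look up the name of each of those ids.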

Solution code:

import pandas as pd

def find_managers(employee: pd.DataFrame) -> pd.DataFrame:
    # Group by managerId and count the ids in each group (the number of direct reports)
    subordinate_count = employee.groupby('managerId')['id'].count()
    # Keep the managers with at least 5 direct reports
    managers_with_5_subordinates = subordinate_count[subordinate_count >= 5].index
    # Look up the names of the selected manager ids
    result = employee[employee['id'].isin(managers_with_5_subordinates)]['name']
    return result.to_frame(name='name')
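The sketch below walks through the same three steps on a small made-up table. The rows are hypothetical; only the column names id, name and managerId are taken from the solution code above.

```python
import pandas as pd

# Hypothetical sample data: employee 101 has no manager and manages the other five.
employee = pd.DataFrame({
    "id": [101, 102, 103, 104, 105, 106],
    "name": ["John", "Dan", "James", "Amy", "Anne", "Ron"],
    "managerId": [None, 101, 101, 101, 101, 101],
})

# Step 1: direct reports per manager. Rows with a missing managerId
# are dropped by groupby by default, so John's own row is not counted.
report_counts = employee.groupby("managerId")["id"].count()

# Step 2: keep the manager ids with at least 5 direct reports.
manager_ids = report_counts[report_counts >= 5].index

# Step 3: map those ids back to names.
result = employee.loc[employee["id"].isin(manager_ids), ["name"]]

print(result["name"].tolist())  # ['John']
```

Selecting with a list of columns, ["name"], returns a DataFrame directly, which is one way to avoid the separate to_frame() call.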

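Problem 5: 607. Sales Person

Problem description:

Table: SalesPerson

Table: Company

Table: Orders

Write a solution to find the names of all salespersons who have no orders related to the company named "RED".

Return the result table in any order.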

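Approach:

Find the orders placed with the company RED, collect the salespersons attached to those orders, and then return every salesperson who is not in that set.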

Solution code:

import pandas as pd

def sales_person(sales_person: pd.DataFrame, company: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Find the company named RED; if it does not exist, every salesperson qualifies
    red_company = company[company['name'] == 'RED']
    if red_company.empty:
        return sales_person[['name']]
    # Select the orders placed with RED
    red_orders = orders[orders['com_id'] == red_company['com_id'].iloc[0]]
    # Collect the ids of the salespersons on those orders
    red_sales_ids = red_orders['sales_id'].unique()
    # Keep the salespersons who are not in that list
    non_red_sales = sales_person[~sales_person['sales_id'].isin(red_sales_ids)]
    return non_red_sales[['name']]
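Here is a small sketch of the same filtering idea on made-up data. It is a variant rather than the solution above: it matches com_id with isin() instead of iloc[0], so an empty RED lookup needs no special case. The sample rows and the order_id column are assumptions for illustration; sales_id, com_id and name come from the solution code.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales_person = pd.DataFrame({
    "sales_id": [1, 2, 3],
    "name": ["John", "Amy", "Mark"],
})
company = pd.DataFrame({
    "com_id": [1, 2],
    "name": ["RED", "ORANGE"],
})
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "com_id": [1, 2, 1],
    "sales_id": [1, 2, 1],
})

# Ids of companies named RED (an empty Series if there is none).
red_ids = company.loc[company["name"] == "RED", "com_id"]

# Salespersons who appear on at least one RED order.
red_sales_ids = orders.loc[orders["com_id"].isin(red_ids), "sales_id"].unique()

# Everyone else, including salespersons with no orders at all.
result = sales_person.loc[~sales_person["sales_id"].isin(red_sales_ids), ["name"]]

print(result["name"].tolist())  # ['Amy', 'Mark']
```

John is excluded because he has orders with RED. Amy only sold to ORANGE, and Mark has no orders, so both remain in the result.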