題目列表:
- 數據統計:
- 2082. The Number of Rich Customers
- 1173. Immediate Food Delivery I
- 1907. Count Salary Categories
- 數據分組
- 1741. Find Total Time Spent by Each Employee
- 511. Game Play Analysis I
- 2356. Number of Unique Subjects Taught by Each Teacher
- 596. Classes More Than 5 Students
- 586. Customer Placing the Largest Number of Orders
- 1484. Group Sold Products By The Date(好題)
數據統計:
2082. The Number of Rich Customers
原題鏈接:2082. The Number of Rich Customers
Table: Store
+-------------+------+
| Column Name | Type |
+-------------+------+
| bill_id | int |
| customer_id | int |
| amount | int |
+-------------+------+
bill_id is the primary key (column with unique values) for this table.
Each row contains information about the amount of one bill and the customer associated with it.
Write a solution to report the number of customers who had at least one bill with an amount strictly greater than 500
.
The result format is in the following example.
Example 1:
Input:
Store table:
+---------+-------------+--------+
| bill_id | customer_id | amount |
+---------+-------------+--------+
| 6 | 1 | 549 |
| 8 | 1 | 834 |
| 4 | 2 | 394 |
| 11 | 3 | 657 |
| 13 | 3 | 257 |
+---------+-------------+--------+
Output:
+------------+
| rich_count |
+------------+
| 2 |
+------------+
Explanation:
Customer 1 has two bills with amounts strictly greater than 500.
Customer 2 does not have any bills with an amount strictly greater than 500.
Customer 3 has one bill with an amount strictly greater than 500.
題目大意:
統計一下amount大于500的customer的數目,一個customer會有多張單據,需要去重后統計
pandas思路:
nunique()
方法能直接返回去重后的個數
pandas實現:
import pandas as pddef count_rich_customers(store: pd.DataFrame) -> pd.DataFrame:store = store[store['amount'] > 500]return pd.DataFrame({'rich_count': [store['customer_id'].nunique()]})
MySQL思路:
用一下 count
和 distinct
MySQL實現:
# Write your MySQL query statement below
SELECTCOUNT(DISTINCT customer_id) AS rich_count
FROM Store
WHERE amount > 500
1173. Immediate Food Delivery I
原題鏈接:1173. Immediate Food Delivery I
Table: Delivery
+-----------------------------+---------+
| Column Name | Type |
+-----------------------------+---------+
| delivery_id | int |
| customer_id | int |
| order_date | date |
| customer_pref_delivery_date | date |
+-----------------------------+---------+
delivery_id is the primary key (column with unique values) of this table.
The table holds information about food delivery to customers that make orders at some date and specify a preferred delivery date (on the same order date or after it).
If the customer’s preferred delivery date is the same as the order date, then the order is called immediate; otherwise, it is called scheduled.
Write a solution to find the percentage of immediate orders in the table, rounded to 2 decimal places.
The result format is in the following example.
Example 1:
Input:
Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 5 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-11 |
| 4 | 3 | 2019-08-24 | 2019-08-26 |
| 5 | 4 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
+-------------+-------------+------------+-----------------------------+
Output:
+----------------------+
| immediate_percentage |
+----------------------+
| 33.33 |
+----------------------+
Explanation: The orders with delivery id 2 and 3 are immediate while the others are scheduled.
題目大意:
計算一個兩個日期相等的條目在整表中所占的比例,保留兩位小數
pandas思路:
保留兩位小數用 round()
pandas實現:
import pandas as pddef food_delivery(delivery: pd.DataFrame) -> pd.DataFrame:tmp = delivery[delivery['order_date'] == delivery['customer_pref_delivery_date']]return pd.DataFrame({'immediate_percentage': [round((tmp.shape[0] / delivery.shape[0] * 100) , 2)]})
MySQL思路:
用 avg()
計算占比,用round()
保留兩位小數
MySQL實現:
SELECTround( 100 * avg( order_date = customer_pref_delivery_date ), 2 ) AS immediate_percentage
FROMdelivery
1907. Count Salary Categories
原題鏈接:1907. Count Salary Categories
Table: Accounts
+-------------+------+
| Column Name | Type |
+-------------+------+
| account_id | int |
| income | int |
+-------------+------+
account_id is the primary key (column with unique values) for this table.
Each row contains information about the monthly income for one bank account.
Write a solution to calculate the number of bank accounts for each salary category. The salary categories are:
- “Low Salary”: All the salaries strictly less than
$20000
. - “Average Salary”: All the salaries in the inclusive range
[$20000, $50000]
. - “High Salary”: All the salaries strictly greater than
$50000
.
The result table must contain all three categories. If there are no accounts in a category, return0
.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Accounts table:
+------------+--------+
| account_id | income |
+------------+--------+
| 3 | 108939 |
| 2 | 12747 |
| 8 | 87709 |
| 6 | 91796 |
±-----------±-------+
Output:
+----------------+----------------+
| category | accounts_count |
+----------------+----------------+
| Low Salary | 1 |
| Average Salary | 0 |
| High Salary | 3 |
+----------------+----------------+
Explanation:
Low Salary: Account 2.
Average Salary: No accounts.
High Salary: Accounts 3, 6, and 8.
題目大意:
按照規則統計低收入中等收入高收入三類人的人數
pandas思路:
統計數目比較容易,按行篩選后的行數即可。怎么組成題目要求返回的形式是本題的重點,用到 pd.DataFrame()
函數
pandas實現:
import pandas as pddef count_salary_categories(accounts: pd.DataFrame) -> pd.DataFrame:low_salary = accounts[accounts['income'] < 20000].shape[0]average_salary = accounts[(accounts['income'] >= 20000)&(accounts['income'] <= 50000)].shape[0]high_salary = accounts[accounts['income'] > 50000].shape[0]ans = pd.DataFrame({'category': ['Low Salary', 'Average Salary', 'High Salary'],'accounts_count': [low_salary, average_salary, high_salary]})return ans
MySQL思路:
case wen來按條件篩選,三個sql的結果union一下
MySQL實現:
SELECT 'Low Salary' AS category,SUM(CASE WHEN income < 20000 THEN 1 ELSE 0 END) AS accounts_count
FROM AccountsUNION
SELECT 'Average Salary' category,SUM(CASE WHEN income >= 20000 AND income <= 50000 THEN 1 ELSE 0 END) AS accounts_count
FROM AccountsUNION
SELECT 'High Salary' category,SUM(CASE WHEN income > 50000 THEN 1 ELSE 0 END) AS accounts_count
FROM Accounts
數據分組
1741. Find Total Time Spent by Each Employee
原題鏈接:1741. Find Total Time Spent by Each Employee
Table: Employees
+-------------+------+
| Column Name | Type |
+-------------+------+
| emp_id | int |
| event_day | date |
| in_time | int |
| out_time | int |
+-------------+------+
(emp_id, event_day, in_time) is the primary key (combinations of columns with unique values) of this table.
The table shows the employees' entries and exits in an office.
event_day is the day at which this event happened, in_time is the minute at which the employee entered the office, and out_time is the minute at which they left the office.
in_time and out_time are between 1 and 1440.
It is guaranteed that no two events on the same day intersect in time, and in_time < out_time.
Write a solution to calculate the total time in minutes spent by each employee on each day at the office. Note that within one day, an employee can enter and leave more than once. The time spent in the office for a single entry is out_time - in_time.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Employees table:
+--------+------------+---------+----------+
| emp_id | event_day | in_time | out_time |
+--------+------------+---------+----------+
| 1 | 2020-11-28 | 4 | 32 |
| 1 | 2020-11-28 | 55 | 200 |
| 1 | 2020-12-03 | 1 | 42 |
| 2 | 2020-11-28 | 3 | 33 |
| 2 | 2020-12-09 | 47 | 74 |
+--------+------------+---------+----------+
Output:
+------------+--------+------------+
| day | emp_id | total_time |
+------------+--------+------------+
| 2020-11-28 | 1 | 173 |
| 2020-11-28 | 2 | 30 |
| 2020-12-03 | 1 | 41 |
| 2020-12-09 | 2 | 27 |
+------------+--------+------------+
Explanation:
Employee 1 has three events: two on day 2020-11-28 with a total of (32 - 4) + (200 - 55) = 173, and one on day 2020-12-03 with a total of (42 - 1) = 41.
Employee 2 has two events: one on day 2020-11-28 with a total of (33 - 3) = 30, and one on day 2020-12-09 with a total of (74 - 47) = 27.
題目大意:
給出每個員工的進場和離場表,要求統計每個員工在場的總時長
pandas思路:
一個時間段的值作差就可得到,每個員工的總時長可以通過groupby之后sum一下得到
pandas實現:
import pandas as pddef total_time(employees: pd.DataFrame) -> pd.DataFrame:employees['total_time'] = employees['out_time'] - employees['in_time']ans = employees.groupby(by=['event_day', 'emp_id'], as_index=False).agg('sum')ans.rename(columns={'event_day':'day'}, inplace=True)ans = ans[['day', 'emp_id', 'total_time']]return ans
MySQL思路:
groupby一下,查詢sum就可以了
MySQL實現:
select event_day as day, emp_id, sum(out_time - in_time) as total_time
from Employees
group by event_day, emp_id
511. Game Play Analysis I
原題鏈接:511. Game Play Analysis I
Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.
Write a solution to find the first login date for each player.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+
題目大意:
輸出員工的最早打卡時間
pandas思路:
排序+去重保留第一條就可以實現題目要求
pandas實現:
import pandas as pddef game_analysis(activity: pd.DataFrame) -> pd.DataFrame:activity.sort_values(by='event_date', inplace=True) # 按照登錄時間排序activity.drop_duplicates(subset='player_id', keep='first', inplace=True) # 去重activity.rename(columns={'event_date':'first_login'}, inplace=True)return activity[['player_id', 'first_login']]
2356. Number of Unique Subjects Taught by Each Teacher
原題鏈接:2356. Number of Unique Subjects Taught by Each Teacher
Table: Teacher
+-------------+------+
| Column Name | Type |
+-------------+------+
| teacher_id | int |
| subject_id | int |
| dept_id | int |
+-------------+------+
(subject_id, dept_id) is the primary key (combinations of columns with unique values) of this table.
Each row in this table indicates that the teacher with teacher_id teaches the subject subject_id in the department dept_id.
Write a solution to calculate the number of unique subjects each teacher teaches in the university.
Return the result table in any order.
The result format is shown in the following example.
Example 1:
Input:
Teacher table:
+------------+------------+---------+
| teacher_id | subject_id | dept_id |
+------------+------------+---------+
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 1 | 3 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 1 |
+------------+------------+---------+
Output:
+------------+-----+
| teacher_id | cnt |
+------------+-----+
| 1 | 2 |
| 2 | 4 |
+------------+-----+
Explanation:
Teacher 1:
- They teach subject 2 in departments 3 and 4.
- They teach subject 3 in department 3.
Teacher 2:
- They teach subject 1 in department 1.
- They teach subject 2 in department 1.
- They teach subject 3 in department 1.
- They teach subject 4 in department 1.
題目大意:
統計一下每個老師所交的課程的數目,同一個課程不同教室只算一門
pandas思路:
用groupby按教師id進行分組,nunique()
用于統計去重后的科目數
pandas實現:
import pandas as pddef count_unique_subjects(teacher: pd.DataFrame) -> pd.DataFrame:ans = teacher.groupby(by='teacher_id')['subject_id'].nunique().reset_index()ans.rename(columns={'subject_id':'cnt'}, inplace=True)return ans
MySQL思路:
group by,然后用count() 計數
MySQL實現:
select teacher_id, count(distinct subject_id) as cnt
from Teacher
group by teacher_id
596. Classes More Than 5 Students
原題鏈接:596. Classes More Than 5 Students
Table: Courses
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| student | varchar |
| class | varchar |
+-------------+---------+
(student, class) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the name of a student and the class in which they are enrolled.
Write a solution to find all the classes that have at least five students.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Courses table:
+---------+----------+
| student | class |
+---------+----------+
| A | Math |
| B | English |
| C | Math |
| D | Biology |
| E | Math |
| F | Computer |
| G | Math |
| H | Math |
| I | Math |
+---------+----------+
Output:
+---------+
| class |
+---------+
| Math |
+---------+
Explanation:
- Math has 6 students, so we include it.
- English has 1 student, so we do not include it.
- Biology has 1 student, so we do not include it.
- Computer has 1 student, so we do not include it.
題目大意:
返回學生數大于等于5的課程名
pandas思路:
使用 groupby()
按班級分組, size()
計算每個班級的出現次數(也就是學生數),然后進行行篩選即可
pandas實現:
import pandas as pddef find_classes(courses: pd.DataFrame) -> pd.DataFrame:df = courses.groupby('class').size().reset_index(name='count') # size()計算每個值出現的次數df = df[df['count'] >= 5]return df[['class']]
MySQL思路:
where關鍵字無法和聚合函數一起使用,having子句可以篩選分組后的各組數據
MySQL實現:
select class
from courses
group by class
having count(class) >= 5
586. Customer Placing the Largest Number of Orders
原題鏈接:586. Customer Placing the Largest Number of Orders
Table: Orders
+-----------------+----------+
| Column Name | Type |
+-----------------+----------+
| order_number | int |
| customer_number | int |
+-----------------+----------+
order_number is the primary key (column with unique values) for this table.
This table contains information about the order ID and the customer ID.
Write a solution to find the customer_number
for the customer who has placed the largest number of orders.
The test cases are generated so that exactly one customer will have placed more orders than any other customer.
The result format is in the following example.
Example 1:
Input:
Orders table:
+--------------+-----------------+
| order_number | customer_number |
+--------------+-----------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
+--------------+-----------------+
Output:
+-----------------+
| customer_number |
+-----------------+
| 3 |
+-----------------+
Explanation:
The customer with number 3 has two orders, which is greater than either customer 1 or 2 because each of them only has one order.
So the result is customer_number 3.
Follow up:
What if more than one customer has the largest number of orders, can you find all the customer_number in this case?
題目大意:
返回點單數最多的顧客的編號
pandas思路:
通過 groupby()
進行分組,通過 size()
來統計出現次數,對出現次數進行降序排列,最后輸出第一行即可
pandas實現:
import pandas as pddef largest_orders(orders: pd.DataFrame) -> pd.DataFrame:df = orders.groupby('customer_number').size().reset_index(name='count') # 統計次數df.sort_values(by='count', ascending=False, inplace=True) # 根據次數降序排列return df[['customer_number']].head(1)
MySQL思路:
一樣也是 group by
聚合,然后 order by
排序,用 limit 1
來返回第一條
MySQL實現:
select customer_number
from orders
group by customer_number
order by count(*) desc
limit 1
1484. Group Sold Products By The Date(好題)
原題鏈接:1484. Group Sold Products By The Date
Table Activities
:
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| sell_date | date |
| product | varchar |
+-------------+---------+
There is no primary key (column with unique values) for this table. It may contain duplicates.
Each row of this table contains the product name and the date it was sold in a market.
Write a solution to find for each date the number of different products sold and their names.
The sold products names for each date should be sorted lexicographically(字典序).
Return the result table ordered by sell_date
.
The result format is in the following example.
Example 1:
Input:
Activities table:
+------------+------------+
| sell_date | product |
+------------+------------+
| 2020-05-30 | Headphone |
| 2020-06-01 | Pencil |
| 2020-06-02 | Mask |
| 2020-05-30 | Basketball |
| 2020-06-01 | Bible |
| 2020-06-02 | Mask |
| 2020-05-30 | T-Shirt |
+------------+------------+
Output:
+------------+----------+------------------------------+
| sell_date | num_sold | products |
+------------+----------+------------------------------+
| 2020-05-30 | 3 | Basketball,Headphone,T-shirt |
| 2020-06-01 | 2 | Bible,Pencil |
| 2020-06-02 | 1 | Mask |
+------------+----------+------------------------------+
Explanation:
For 2020-05-30, Sold items were (Headphone, Basketball, T-shirt), we sort them lexicographically and separate them by a comma.
For 2020-06-01, Sold items were (Pencil, Bible), we sort them lexicographically and separate them by a comma.
For 2020-06-02, the Sold item is (Mask), we just return it.
題目大意:
按照日期將多行合并為一行,并統計數目
pandas思路:
使用 groupby()
進行分組,然后 agg()
進行聚合操作,重點在于將多行合并到一行,這個時不時會遇到,我覺得一定要掌握!!!
pandas實現:
import pandas as pddef categorize_products(activities: pd.DataFrame) -> pd.DataFrame:groups = activities.groupby('sell_date') # groupby的結果是一個DataFrameGroupBy對象ans = groups.agg(num_sold = ('product', 'nunique'),products = ('product', lambda x : ','.join(sorted(set(x)))) # 去重后按字典序排列, 用逗號分隔組成字符串).reset_index()ans.sort_values('sell_date', inplace=True) # 按照日期排序return ans
MySQL思路:
重點也是多行合并到一行
MySQL實現:
selectsell_date,count(distinct product) as num_sold,group_concat(distinct product order by product separator ',') as products -- 重點
fromactivities
group bysell_date
order bysell_date asc