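Contents
- Detailed solution approach for the 2025 Teddy Cup Problem B
- Problem 1
- Problem analysis
- Mathematical model
- Python code
- Matlab code
- Problem 2
- Problem analysis
- Mathematical model
- Python code
- Matlab code
- Problem 3
- Problem analysis
- Mathematical model
- Python code
- Matlab code
- Problem 4
- Problem analysis
- Mathematical model
- Python code
- Matlab code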
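Detailed solution approach for the 2025 Teddy Cup Problem B
This is a preliminary write-up of the problem analysis and solution approach for Problem B. A detailed modeling paper and solution code will follow later, to be completed tomorrow.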
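Problem 1
Problem analysis
The task is to extract the MET value from the acceleration data in Attachment 1 and total the time spent at each intensity level. The key is handling the timestamp intervals and the MET band classification correctly. Because the timestamps are in milliseconds, compute the difference between adjacent timestamps and add it to the corresponding activity category. Pay attention to the precision of the time differences and to the boundary conditions of the MET bands.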
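The boundary handling described above can be checked on a toy input. The following is a minimal sketch, not the author's code: the function name `duration_by_intensity`, the English band labels, and the sample values are all illustrative assumptions. It uses left-closed bands, so a MET of exactly 1.6 counts as light activity and exactly 6.0 counts as vigorous.

```python
import pandas as pd


def duration_by_intensity(timestamps_ms, met_values):
    """Sum the time each sample covers (gap to the next sample) per MET band."""
    df = pd.DataFrame({"t": timestamps_ms, "met": met_values}).sort_values("t")
    # Gap to the next sample, converted from milliseconds to hours
    df["hours"] = df["t"].diff().shift(-1) / 3_600_000
    df = df.dropna(subset=["hours"])  # the last sample has no successor
    bins = [-float("inf"), 1.0, 1.6, 3.0, 6.0, float("inf")]
    labels = ["sleep", "sedentary", "light", "moderate", "vigorous"]
    # right=False makes every band left-closed, e.g. [1.6, 3.0)
    df["band"] = pd.cut(df["met"], bins=bins, labels=labels, right=False)
    return df.groupby("band", observed=False)["hours"].sum().to_dict()


# Hypothetical samples: 30 min at 0.95, 30 min at 1.6, 60 min at 6.0
totals = duration_by_intensity(
    [0, 1_800_000, 3_600_000, 7_200_000], [0.95, 1.6, 6.0, 2.0]
)
print(totals)
```

The last sample contributes no time because there is no later timestamp to measure against, which matches the `diff().shift(-1)` convention used in the solution code.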
Mathematical model
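One formulation consistent with the analysis above is sketched here. It is an assumption reconstructed from the described procedure, not the author's original model; the symbols \( t_i \), \( I_c \) and \( T_c \) are introduced only for illustration. With sorted millisecond timestamps \( t_1 < \dots < t_n \) and MET values \( MET_i \), the duration in hours for category \( c \) is the sum of the gaps whose starting sample falls in that category's band:

```latex
T_c = \sum_{i=1}^{n-1} \frac{t_{i+1} - t_i}{3.6 \times 10^{6}} \,
      \mathbf{1}\!\left[ MET_i \in I_c \right],
\qquad
T_{\text{total}} = \sum_{c} T_c
```

Here the bands are left-closed: \( I_{\text{sleep}} = (-\infty, 1) \), \( I_{\text{static}} = [1, 1.6) \), \( I_{\text{low}} = [1.6, 3) \), \( I_{\text{moderate}} = [3, 6) \), \( I_{\text{high}} = [6, \infty) \).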
Python code
import os
import re

import pandas as pd


def process_volunteer(file_path):
    df = pd.read_csv(file_path)
    df['日期'] = pd.to_numeric(df['日期'])  # timestamps in milliseconds
    df = df.sort_values('日期')
    # Time each sample covers: gap to the next sample, converted from ms to hours
    df['delta'] = df['日期'].diff().shift(-1) / (3600 * 1000)
    df = df.dropna(subset=['delta'])  # the last sample has no successor
    # Extract the MET value from the label text
    df['MET'] = df['標簽'].apply(
        lambda x: float(re.search(r'MET值\s*([0-9.]+)', x).group(1)))
    # Classify by intensity; right=False gives left-closed bins such as [1.6, 3)
    bins = [-float('inf'), 1, 1.6, 3, 6, float('inf')]
    labels = ['睡眠', '靜態活動', '低等強度', '中等強度', '高等強度']
    df['category'] = pd.cut(df['MET'], bins=bins, labels=labels, right=False)
    result = df.groupby('category', observed=False)['delta'].sum().to_dict()
    return {
        '志愿者ID': os.path.basename(file_path).split('.')[0],
        '記錄總時長(小時)': round(df['delta'].sum(), 4),
        '睡眠總時長(小時)': round(result.get('睡眠', 0), 4),
        '高等強度運動總時長(小時)': round(result.get('高等強度', 0), 4),
        '中等強度運動總時長(小時)': round(result.get('中等強度', 0), 4),
        '低等強度運動總時長(小時)': round(result.get('低等強度', 0), 4),
        '靜態活動總時長(小時)': round(result.get('靜態活動', 0), 4),
    }


# Main program (file name kept as written in the source)
metadata = pd.read_csv('Metadatal.csv')
results = []
for vid in metadata['志愿者ID']:
    file_path = f'附件1/P{vid}.csv'
    if os.path.exists(file_path):
        results.append(process_volunteer(file_path))
pd.DataFrame(results).to_excel('result_1.xlsx', index=False)
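Matlab code
function B1()
    dataDir = '附件1/';
    % Non-ASCII column names need 'preserve' and T.('name') access (R2020a or later)
    meta = readtable('Metadatal.csv', 'VariableNamingRule', 'preserve');
    ids = meta.('志愿者ID');
    results = cell(0, 7);  % grown row by row so skipped files leave no empty rows
    for i = 1:height(meta)
        vid = ids{i};
        file = [dataDir 'P' vid '.csv'];
        if ~exist(file, 'file'), continue; end
        % Read the data and sort by timestamp
        tbl = readtable(file, 'VariableNamingRule', 'preserve');
        t = tbl.('日期');
        if ~isnumeric(t), t = str2double(t); end
        [t, idx] = sort(t);
        tags = tbl.('標簽');
        tags = tags(idx);
        % Time differences between consecutive samples, ms -> hours
        delta = diff(t) / (3600 * 1000);
        met = zeros(length(delta), 1);
        for j = 1:length(delta)
            metVal = regexp(tags{j}, 'MET值\s*([0-9.]+)', 'tokens', 'once');
            met(j) = str2double(metVal{1});
        end
        % Classify into left-closed bins [edges(k), edges(k+1)) and sum durations
        edges = [-inf, 1, 1.6, 3, 6, inf];
        [~, bin] = histc(met, edges);
        total = sum(delta);
        counts = accumarray(bin, delta, [5, 1], @sum, 0);
        % Store: ID, total, sleep, high, moderate, low, static
        results(end+1, :) = {vid, total, counts(1), counts(5), counts(4), counts(3), counts(2)};
    end
    % Write to Excel
    T = cell2table(results, 'VariableNames', {'志愿者ID', '總時長', '睡眠', '高等', '中等', '低等', '靜態'});
    writetable(T, 'result_1.xlsx');
end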
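Problem 2
Problem analysis
A regression model is needed to predict the MET value. The input features are time-domain and frequency-domain statistics of the three-axis acceleration, plus metadata (age and sex). The model must capture the nonlinear relationship between acceleration and MET.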
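To make the frequency-domain statistics concrete, here is a minimal sketch of two such features computed with NumPy's real FFT. The function names and the sampling rate `fs` are assumptions for illustration; the source does not state the sampling rate of the attachments, so the value used in the usage lines is hypothetical.

```python
import numpy as np


def spectral_energy(signal):
    """Sum of squared magnitudes of the one-sided spectrum."""
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    return float(np.sum(spectrum ** 2))


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest spectral peak after removing the mean."""
    signal = np.asarray(signal, dtype=float)
    # Subtracting the mean keeps the DC component from dominating the peak search
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


# Usage on a synthetic 2 Hz sine sampled at a hypothetical 50 Hz for 5 seconds
fs = 50.0
t = np.arange(250) / fs
sine = np.sin(2 * np.pi * 2.0 * t)
print(dominant_frequency(sine, fs), spectral_energy(sine))
```

Removing the mean matters for raw accelerometer axes, where the constant gravity component would otherwise swamp the movement frequencies.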
Mathematical model
Python code
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def extract_features(data):
    x = data['X'].values
    y = data['Y'].values
    z = data['Z'].values
    vm = np.sqrt(x**2 + y**2 + z**2)  # vector magnitude
    # Time-domain features
    features = {
        'x_mean': np.mean(x), 'x_std': np.std(x),
        'y_mean': np.mean(y), 'y_std': np.std(y),
        'z_mean': np.mean(z), 'z_std': np.std(z),
        'vm_mean': np.mean(vm), 'vm_std': np.std(vm),
        'vm_rms': np.sqrt(np.mean(vm**2)),
    }
    # Frequency-domain features: spectral energy of each axis
    for axis, sig in zip(['x', 'y', 'z'], [x, y, z]):
        fft = np.abs(np.fft.rfft(sig))
        features[f'{axis}_energy'] = np.sum(fft**2)
    return features


# Prepare the training data (file name kept as written in the source)
metadata = pd.read_csv('Metadatal.csv')
X, y = [], []
for vid in metadata['志愿者ID']:
    df = pd.read_csv(f'附件1/P{vid}.csv')
    df['MET'] = df['標簽'].str.extract(r'MET值\s*([0-9.]+)', expand=False).astype(float)
    row = metadata.loc[metadata['志愿者ID'] == vid].iloc[0]
    age = row['年齡']
    gender = 1 if row['性別'] == '男' else 0
    # Non-overlapping windows of 5 rows (5 seconds only if sampled at 1 Hz)
    window_size = 5
    for i in range(0, len(df) - window_size + 1, window_size):
        window = df.iloc[i:i + window_size]
        feat = extract_features(window)
        feat['age'] = age
        feat['gender'] = gender
        X.append(feat)
        y.append(window['MET'].mean())

# Train the model
model = RandomForestRegressor(n_estimators=100)
scores = cross_val_score(model, pd.DataFrame(X), y, cv=5, scoring='r2')
print(f'Cross-validated R2 score: {np.mean(scores):.4f}')
model.fit(pd.DataFrame(X), y)
# Predict on the Attachment 2 data
Matlab code
function B2()
    % Load the data; non-ASCII column names need 'preserve' and T.('name') access
    meta = readtable('Metadatal.csv', 'VariableNamingRule', 'preserve');
    ids = meta.('志愿者ID');
    ages = meta.('年齡');
    genders = meta.('性別');
    X = []; y = [];
    for i = 1:height(meta)
        file = ['附件1/P' ids{i} '.csv'];
        tbl = readtable(file, 'VariableNamingRule', 'preserve');
        met = cellfun(@(s) str2double(regexp(s, 'MET值\s*([0-9.]+)', 'tokens', 'once')), tbl.('標簽'));
        % Non-overlapping windows of 5 rows (5 seconds only if sampled at 1 Hz)
        winSize = 5;
        for j = 1:winSize:height(tbl)-winSize+1
            rows = j:j+winSize-1;
            feat = extractFeatures(tbl.X(rows), tbl.Y(rows), tbl.Z(rows));
            X = [X; feat ages(i) strcmp(genders{i}, '男')];
            y = [y; mean(met(rows))];
        end
    end
    % Train a random forest
    model = TreeBagger(100, X, y, 'Method', 'regression');
    % Predict on Attachment 2
end

% Feature extraction, written as a local function so it shares no variables with B2
function feat = extractFeatures(x, y, z)
    vm = sqrt(x.^2 + y.^2 + z.^2);
    feat = [mean(x), std(x), mean(y), std(y), mean(z), std(z), ...
            mean(vm), std(vm), rms(vm), sum(abs(fft(x)).^2), ...
            sum(abs(fft(y)).^2), sum(abs(fft(z)).^2)];
end
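Problem 3
Problem analysis
Sleep is detected from periods of low activity. Compute the sliding-window mean of the vector magnitude (VM); windows whose mean falls below a threshold are treated as sleep candidates, which are then clustered into sleep patterns.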
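The thresholding step can be sketched as follows. This is an illustrative variant under a stated assumption, not the author's method: it assumes the acceleration is in units of g with gravity included, so a motionless sensor reads a vector magnitude near 1, and it therefore thresholds the mean deviation from 1 rather than the raw magnitude. The function name and parameters are hypothetical.

```python
import numpy as np


def low_activity_windows(x, y, z, window_size, threshold):
    """Return one boolean per full window: True where mean |VM - 1| < threshold."""
    vm = np.sqrt(np.asarray(x, dtype=float) ** 2
                 + np.asarray(y, dtype=float) ** 2
                 + np.asarray(z, dtype=float) ** 2)
    n_windows = vm.size // window_size
    # Drop the trailing partial window, then view the rest as (windows, samples)
    trimmed = vm[: n_windows * window_size].reshape(n_windows, window_size)
    # Assumption: readings are in g, so 1.0 is the at-rest magnitude
    activity = np.abs(trimmed - 1.0).mean(axis=1)
    return activity < threshold


# Usage: one still window followed by one window with large movement
mask = low_activity_windows([0] * 6, [0] * 6, [1, 1, 1, 2, 0, 2],
                            window_size=3, threshold=0.1)
print(mask.tolist())
```

If the data have already had gravity removed, the deviation from 1 should be replaced by the magnitude itself; which case applies depends on the attachment format.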
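Mathematical model
Python code
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def detect_sleep(file_path):
    df = pd.read_csv(file_path)
    df['vm'] = np.sqrt(df['X']**2 + df['Y']**2 + df['Z']**2)
    # Detect low activity in non-overlapping windows of 30 rows
    # (30 seconds only if the data are sampled at 1 Hz)
    window_size = 30
    df['window'] = df.index // window_size
    activity = df.groupby('window')['vm'].mean()
    # The 0.1 threshold presumes gravity has already been removed from the signal
    sleep_windows = activity[activity < 0.1].index
    # Per-window features: mean and standard deviation of VM
    features = []
    for win in sleep_windows:
        win_data = df[df['window'] == win]
        # ddof=0 avoids NaN when the trailing window holds a single row
        features.append([win_data['vm'].mean(), win_data['vm'].std(ddof=0)])
    if len(features) == 0:
        return {'睡眠總時長': 0.0, '模式一': 0.0, '模式二': 0.0, '模式三': 0.0}
    # K-means clustering; it cannot use more clusters than there are windows
    n_clusters = min(3, len(features))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    counts = np.bincount(kmeans.labels_, minlength=3)
    hours = counts * window_size / 3600  # rows to hours, assuming 1 Hz sampling
    return {
        '睡眠總時長': round(np.sum(hours), 4),
        '模式一': round(hours[0], 4),
        '模式二': round(hours[1], 4),
        '模式三': round(hours[2], 4),
    }
# Process Attachment 2 and save the results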
Matlab code
function B3()
    % Apply detectSleep to Attachment 2 (omitted)
end

function [total, modes] = detectSleep(file)
    tbl = readtable(file);
    vm = sqrt(tbl.X.^2 + tbl.Y.^2 + tbl.Z.^2);
    % Detect low-activity windows (30 rows; 30 seconds only if sampled at 1 Hz)
    winSize = 30;
    numWin = floor(height(tbl)/winSize);
    act = zeros(numWin, 1);
    for i = 1:numWin
        idx = (i-1)*winSize + 1 : i*winSize;
        act(i) = mean(vm(idx));
    end
    sleepWins = find(act < 0.1);
    % Extract features and cluster
    features = zeros(length(sleepWins), 2);
    for j = 1:length(sleepWins)
        idx = (sleepWins(j)-1)*winSize + 1 : sleepWins(j)*winSize;
        vmWin = vm(idx);
        features(j, :) = [mean(vmWin), std(vmWin)];
    end
    if isempty(features)
        total = 0; modes = zeros(1, 3);
    else
        % The first output of kmeans is the cluster index of each row
        k = min(3, size(features, 1));
        labels = kmeans(features, k);
        counts = accumarray(labels, 1, [3, 1])';
        total = sum(counts) * winSize / 3600;
        modes = counts * winSize / 3600;
    end
end
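Problem 4
Problem analysis
Detect periods of continuous sedentary activity (MET < 1.6) lasting longer than 30 minutes. Walk through the predicted MET sequence and record each run that satisfies the condition.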
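Mathematical model
Let the MET sequence be \( MET(t) \) and let \( \Delta t \) (in minutes) be the length of a candidate interval starting at time \( t \). The interval is flagged as sedentary when every sample in it is below the threshold and it lasts at least 30 minutes:
\[
MET(i) < 1.6 \quad \forall\, i \in [t,\, t+\Delta t] \quad \text{and} \quad \Delta t \geq 30
\]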
Python code
def sedentary_alert(met_series, window_min=5):
    # window_min: minutes covered by each MET value
    sedentary = []
    run_len = 0  # consecutive windows with MET < 1.6 (integer, avoids float drift)
    start_idx = None
    for i, met in enumerate(met_series):
        if met < 1.6:
            run_len += 1
            if start_idx is None:
                start_idx = i
        else:
            if run_len * window_min >= 30:  # at least 30 minutes
                sedentary.append((start_idx, i - 1, run_len * window_min / 60))
            run_len = 0
            start_idx = None
    # Flush a run that extends to the end of the series
    if run_len * window_min >= 30:
        sedentary.append((start_idx, len(met_series) - 1, run_len * window_min / 60))
    return sedentary  # (start index, end index, duration in hours)
# Apply to the Attachment 2 predictions
Matlab code
function B4()
    % Apply detectSedentary to Attachment 2 (omitted)
end

function alerts = detectSedentary(met, winSize)
    % winSize: minutes covered by each MET value
    alerts = [];
    start = []; count = 0;  % count = consecutive windows with MET < 1.6
    for i = 1:length(met)
        if met(i) < 1.6
            count = count + 1;
            if isempty(start), start = i; end
        else
            if count * winSize >= 30  % at least 30 minutes
                alerts = [alerts; start, i-1, count * winSize / 60];
            end
            count = 0;
            start = [];
        end
    end
    if count * winSize >= 30
        alerts = [alerts; start, length(met), count * winSize / 60];
    end
end