是否能聯通
psql -h 127.0.0.1 -U Shinelon-d mimic --password
pgadmin建庫
psql -d mimiciv -f mimic-iv/buildmimic/postgres/create.sql
也可以pgadmin直接運行create.sql
導入csv.gz數據
psql -d mimic -v ON_ERROR_STOP=1 -v mimic_data_dir=H:/mimic-iv-2.2 -f C:\Users\Shinelon\Downloads\mimic-code-main\mimic-iv\buildmimic\postgres\load_gz.sql
醫院表
表名 | 功能描述 |
---|---|
admissions | 患者入院信息(入院時間、出院時間、入院類型、科室等) |
d_hcpcs | HCPCS 編碼字典(解釋 HCPCS 代碼對應的醫療服務 / 物品含義) |
d_icd_diagnoses | ICD-9 診斷編碼字典(關聯diagnoses_icd ,提供診斷名稱、縮寫等) |
d_icd_procedures | ICD-9 手術編碼字典(關聯procedures_icd ,解釋手術編碼含義) |
d_labitems | 實驗室檢測項字典(關聯labevents ,解釋itemid 對應的檢測名稱、單位等) |
diagnoses_icd | 患者診斷記錄(存儲 ICD-9 編碼,關聯subject_id 、hadm_id ) |
drgcodes | DRG 分組信息(患者的診斷相關分組,用于醫保付費等) |
emar | 電子用藥醫囑(用藥醫囑概要,如藥品、時間) |
emar_detail | 電子用藥醫囑詳情(劑量、頻率、給藥途徑等) |
hcpcs_events | HCPCS 服務記錄(患者接受的 HCPCS 編碼對應的醫療服務) |
labevents | 實驗室檢測結果(患者的檢驗數據,關聯d_labitems 的itemid ) |
microbiologyevents | 微生物檢測結果(細菌培養、藥敏試驗等) |
omr | 門診病歷記錄(Outpatient Medical Records,門診相關數據) |
patients | 患者基本信息(性別、出生日期、唯一標識subject_id 等) |
pharmacy | 藥房信息(藥品庫存、調配記錄等) |
poe | 醫囑錄入(Provider Order Entry,醫囑錄入的基本信息) |
poe_detail | 醫囑詳情(poe 的補充信息,如醫囑具體參數) |
prescriptions | 處方記錄(患者的處方信息,含藥品、劑量、開方時間等) |
procedures_icd | 手術記錄(存儲 ICD-9 手術編碼,關聯subject_id 、hadm_id ) |
provider | 醫護人員信息(ID、姓名、科室等) |
services | 院內服務記錄(患者所在服務單元,如 ICU、普通病房等) |
transfers | 轉科記錄(患者在院內的科室轉移時間、去向等) |
卒中icd數據和提取
-
ICD-9 和 ICD-10 中 “卒中(stroke)” 的核心編碼范圍如下:
ICD-9-CM(MIMIC-III 常用)
卒中編碼集中在 430–438(循環系統疾病類目),關鍵亞類包括:
- 430:蛛網膜下腔出血
- 431:腦內出血(非創傷性)
- 434:腦動脈閉塞(如腦血栓形成、腦栓塞)
- 436:急性難分類腦血管病
- 排除:435(短暫性腦缺血發作,TIA)通常不被視為卒中
ICD-10-CM(MIMIC-IV 常用)
卒中編碼對應 I60–I69(腦血管疾患類目),關鍵亞類包括:
- I60:非創傷性蛛網膜下腔出血
- I61:非創傷性腦內出血
- I63:腦梗死(缺血性卒中)
- I64:未特指為出血或梗死的卒中
- I69:腦血管病后遺癥(如卒中后偏癱)
應用場景(如 MIMIC 數據庫)
- 若使用 MIMIC-III(ICD-9),可通過
diagnoses_icd.icd_code and diagnoses_icd.icd_version=9 BETWEEN '430' AND '438'
篩選卒中病例; - 若使用 MIMIC-IV(ICD-10),則通過
diagnoses_icd10.icd_code BETWEEN 'I60' AND 'I69'
篩選。
提取所有ICD-9編碼為434(缺血性卒中)的患者唯一ID(subject_id)
SELECT DISTINCT di.subject_id
FROM MIMICIV_HOSP.diagnoses_icd di – ICD-9診斷表
WHERE di.icd_code like ‘434%’ and di.icd_version=9 – 注意ICD編碼是字符串類型,需加單引號
or di.icd_code like ‘163%’ and di.icd_version=10 – 注意ICD編碼是字符串類型,需加單引號
提取AD的患者
G301 G309 G308 G300 3310
SELECT DISTINCT di.subject_id FROM MIMICIV_HOSP.diagnoses_icd di
WHERE (di.icd_version = 9 AND di.icd_code = ‘3310’) OR
(di.icd_version = 10 AND (di.icd_code IN (‘G301’, ‘G309’, ‘G308’, ‘G300’)) );
診斷排序
SELECT
icd_code,
count(icd_code) AS item_count – 每個itemid的出現次數
FROM MIMICIV_HOSP.diagnoses_icd
GROUP BY icd_code-- 按icd項目分組
ORDER BY desc item_count ; – 按itemid排序
匹配icd title
`import pandas as pd
# 讀取診斷數據
all_diagnosis_df = pd.read_csv(r’H:\mimic-iv-2.2\all_diagnosis.csv’)
# 讀取ICD字典數據
d_icd_df = pd.read_csv(r’H:\mimic-iv-2.2\hosp\csv\d_icd_diagnoses.csv’)
# 去除icd_code中的所有空格(包括中間和首尾的空格)
# 為原始數據添加處理后的icd_code列,用于匹配
all_diagnosis_df[‘icd_code_clean’] = all_diagnosis_df[‘icd_code’].astype(str).str.replace(r’\s+', ‘’, regex=True)
d_icd_df[‘icd_code_clean’] = d_icd_df[‘icd_code’].astype(str).str.replace(r’\s+', ‘’, regex=True)
# 提取ICD字典中處理后的所有icd_code并轉為集合(提高查找效率)
icd_codes_in_dict = set(d_icd_df[‘icd_code_clean’].unique())
# 提取待匹配的所有處理后的icd_code(去重)
all_icd_codes = all_diagnosis_df[‘icd_code_clean’].unique()
# 同時保留原始icd_code用于顯示
original_icd_mapping = dict(zip(all_diagnosis_df[‘icd_code_clean’], all_diagnosis_df[‘icd_code’]))
# 存儲未找到匹配的icd_code(原始值和處理后的值)
not_found_codes = []
# 逐個匹配并檢查
print(“開始匹配ICD代碼(已自動去除空格)…”)
for clean_code in all_icd_codes:
? original_code = original_icd_mapping[clean_code]
? if clean_code not in icd_codes_in_dict:
? not_found_codes.append({
? ‘原始代碼’: original_code,
? ‘處理后代碼’: clean_code
? })
? print(f"未找到匹配: 原始代碼=‘{original_code}’,處理后代碼=‘{clean_code}’")
# 打印總結信息
print(f"\n匹配完成。共檢查 {len(all_icd_codes)} 個唯一ICD代碼,其中 {len(not_found_codes)} 個未找到匹配。")
# 執行合并操作(使用處理后的代碼進行匹配)
merged_df = pd.merge(
? all_diagnosis_df,
? d_icd_df[[‘icd_code_clean’, ‘icd_version’, ‘long_title’, ‘icd_code’]], # 保留原始icd_code
? on=‘icd_code_clean’,
? how=‘left’,
? suffixes=(‘’, ‘_dict’) # 區分原始和字典中的icd_code
)
# 調整列順序,移除臨時的clean列
merged_df = merged_df.drop(columns=[‘icd_code_clean’])
# 可選:如果需要可以將字典中的原始icd_code重命名
# merged_df = merged_df.rename(columns={‘icd_code_dict’: ‘icd_code_from_dict’})
# 保存結果文件
csv_path = r’H:\mimic-iv-2.2\all_diagnosis_filled.csv’
merged_df.to_csv(csv_path, index=False)
print(f"\n合并結果已保存至: {csv_path}")
# 可選:將未找到的代碼保存到文件
if not_found_codes:
? not_found_df = pd.DataFrame(not_found_codes)
? not_found_path = r’H:\mimic-iv-2.2\icd_codes_not_found.csv’
? not_found_df.to_csv(not_found_path, index=False)
? print(f"未找到的ICD代碼已保存至: {not_found_path}")
? `
前20的診斷
一般診斷數越多病例越多越好出文章
icd_code | item_count | icd_version | long_title |
---|---|---|---|
4019 | 97361 | 9 | Unspecified essential hypertension |
2724 | 63875 | 9 | Other and unspecified hyperlipidemia |
I10 | 51706 | 10 | Essential (primary) hypertension |
E785 | 48581 | 10 | Hyperlipidemia, unspecified |
53081 | 46238 | 9 | Esophageal reflux |
25000 | 41029 | 9 | Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled |
Z87891 | 38893 | 10 | Personal history of nicotine dependence |
42731 | 35180 | 9 | Atrial fibrillation |
311 | 34632 | 9 | Depressive disorder, not elsewhere classified |
4280 | 34597 | 9 | Congestive heart failure, unspecified |
41401 | 34237 | 9 | Coronary atherosclerosis of native coronary artery |
K219 | 33737 | 10 | Gastro-esophageal reflux disease without esophagitis |
V1582 | 30197 | 9 | Personal history of tobacco use |
F329 | 28787 | 10 | Major depressive disorder, single episode, unspecified |
5849 | 27641 | 9 | Acute kidney failure, unspecified |
2449 | 27065 | 9 | Unspecified acquired hypothyroidism |
I2510 | 25398 | 10 | Atherosclerotic heart disease of native coronary artery without angina pectoris |
3051 | 24542 | 9 | Tobacco use disorder |
2859 | 23288 | 9 | Anemia, unspecified |
實驗室檢查分析統計
統計每個itemid的記錄數,并按itemid排列
SELECT
itemid,
count(itemid) AS item_count – 每個itemid的出現次數
FROM MIMICIV_HOSP.labevents
GROUP BY itemid – 按檢測項目分組
ORDER BY itemid desc; – 按itemid排序(此時排序有效)
依舊是匹配之后出前20的實驗室指標
itemid | item_count | label | fluid | category |
---|---|---|---|---|
51221 | 3325700 | Hematocrit | Blood | Hematology |
50912 | 3282278 | Creatinine | Blood | Chemistry |
51265 | 3216656 | Platelet Count | Blood | Hematology |
51006 | 3189474 | Urea Nitrogen | Blood | Chemistry |
51222 | 3188835 | Hemoglobin | Blood | Hematology |
51301 | 3171798 | White Blood Cells | Blood | Hematology |
51249 | 3167190 | MCHC | Blood | Hematology |
51279 | 3167075 | Red Blood Cells | Blood | Hematology |
51250 | 3167074 | MCV | Blood | Hematology |
51248 | 3167073 | MCH | Blood | Hematology |
51277 | 3166860 | RDW | Blood | Hematology |
50971 | 3150261 | Potassium | Blood | Chemistry |
50983 | 3119778 | Sodium | Blood | Chemistry |
50902 | 3083705 | Chloride | Blood | Chemistry |
50882 | 2972827 | Bicarbonate | Blood | Chemistry |
50868 | 2965452 | Anion Gap | Blood | Chemistry |
50931 | 2749562 | Glucose | Blood | Chemistry |
50893 | 2190816 | Calcium, Total | Blood | Chemistry |
50960 | 2154209 | Magnesium | Blood | Chemistry |
醫囑procedure分析
SELECT
itemid,
count(itemid) AS item_count – 每個itemid的出現次數
FROM MIMICIV_HOSP.procedures_icd
GROUP BY itemid – 按檢測項目分組
ORDER BY itemid desc; – 按itemid排序(此時排序有效)
前20procedure
icd_code | item_count | icd_version | long_title | icd_code_dict |
---|---|---|---|---|
3893 | 13928 | 9 | Venous catheterization, not elsewhere classified | 3893 |
02HV33Z | 10061 | 10 | Insertion of Infusion Device into Superior Vena Cava, Percutaneous Approach | 02HV33Z |
8938 | 10041 | 9 | Other nonoperative respiratory measurements | 8938 |
3897 | 9843 | 9 | Central venous catheter placement with guidance | 3897 |
8856 | 9043 | 9 | Coronary arteriography using two catheters | 8856 |
966 | 7811 | 9 | Enteral infusion of concentrated nutritional substances | 966 |
3995 | 7374 | 9 | Hemodialysis | 3995 |
0040 | 7158 | 9 | Procedure on single vessel | 0040 |
9671 | 7036 | 9 | Continuous invasive mechanical ventilation for less than 96 consecutive hours | 9671 |
8952 | 6506 | 9 | Electrocardiogram | 8952 |
5491 | 6245 | 9 | Percutaneous abdominal drainage | 5491 |
9604 | 6203 | 9 | Insertion of endotracheal tube | 9604 |
3722 | 5788 | 9 | Left heart cardiac catheterization | 3722 |
8744 | 5482 | 9 | Routine chest x-ray, so described | 8744 |
4513 | 5470 | 9 | Other endoscopy of small intestine | 4513 |
0066 | 5100 | 9 | Percutaneous transluminal coronary angioplasty [PTCA] | 0066 |
9925 | 5006 | 9 | Injection or infusion of cancer chemotherapeutic substance | 9925 |
3E0G76Z | 4871 | 10 | Introduction of Nutritional Substance into Upper GI, Via Natural or Artificial Opening | 3E0G76Z |
3961 | 4833 | 9 | Extracorporeal circulation auxiliary to open heart surgery | 3961 |
3899 | 4712 | 9 | Other puncture of vein | 3899 |
HCPC分析
HCPC 全稱為 Healthcare Common Procedure Coding System(醫療通用程序編碼系統),是美國用于標準化描述醫療服務、醫療程序、醫療用品(如器械、藥品)的編碼體系,由美國醫療保險和醫療補助服務中心(CMS)與美國醫學會(AMA)共同維護,核心作用是統一醫療服務的標識,支撐醫保結算、醫療數據統計和服務費用核算。
核心組成與用途
- 主要編碼類型
- Level I(CPT 編碼):對應醫生提供的診療服務(如手術、檢查、門診治療),由 AMA 制定,編碼為 5 位數字(如 99213 代表常規門診隨訪)。
- Level II:對應醫療用品、設備、特殊服務(如輪椅、假肢、家庭醫療服務),編碼以字母開頭 + 4 位數字(如 E0100 代表標準輪椅)。
- 核心用途
- 醫保報銷:作為美國醫保(如 Medicare/Medicaid)判斷服務是否覆蓋、計算報銷金額的依據;
- 數據標準化:統一不同醫療機構的服務記錄,便于醫療質量分析、疾病與治療關聯研究;
- 費用核算:明確醫療服務的具體編碼,對應收費標準。
SELECT
hcpcs_cd,
count(hcpcs_cd) AS item_count – 每個itemid的出現次數
FROM MIMICIV_HOSP.hcpcsevents
GROUP BY hcpcs_cd – 按檢測項目分組
ORDER BY hcpcs_cd desc; – 按itemid排序(此時排序有效)
前20hcpc
hcpcs_cd | item_count | code | short_description | long_description |
---|---|---|---|---|
G0378 | 53184 | G0378 | Hospital observation per hr | Hospital observation service, per hour |
99219 | 44401 | 99219 | Hospital observation services | |
99218 | 10222 | 99218 | Hospital observation services | |
99220 | 9775 | 99220 | Hospital observation services | |
43262 | 960 | 43262 | Digestive system | |
93454 | 923 | 93454 | Cardiovascular | |
44970 | 921 | 44970 | Digestive system | |
43264 | 839 | 43264 | Digestive system | |
43239 | 719 | 43239 | Digestive system | |
47562 | 717 | 47562 | Digestive system | |
C9600 | 627 | C9600 | Perc drug-el cor stent sing | Percutaneous transcatheter placement of drug eluting intracoronary stent(s), with coronary angioplasty when performed; single major coronary artery or branch |
43235 | 441 | 43235 | Digestive system | |
64447 | 398 | 64447 | Nervous system | |
45380 | 382 | 45380 | Digestive system | |
93458 | 373 | 93458 | Cardiovascular | |
64415 | 347 | 64415 | Nervous system | |
92980 | 328 | 92980 | Cardiovascular | |
43274 | 323 | 43274 | Digestive system | |
36246 | 311 | 36246 | Cardiovascular system |