MYSQL PARTITIONING分區操作和性能測試

PARTITION OR NOT PARTITION IN MYSQl

Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
According to RICK JAMES: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
let’s find out what’s going on by building a test case

TWO TABLES READY

How many partitions? views from Rick James: Have 20-50 partitions; no more.
In this page, we do 10 partitions
Remember: Always test your real case.

  1. Partition table with 10 partitions
CREATE TABLE points_partition 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time))
PARTITION BY RANGE( YEAR(created_time) ) (PARTITION p16 VALUES less than (2016),PARTITION p17 VALUES less than (2017),PARTITION p18 VALUES less than (2018),PARTITION p19 VALUES less than (2019),PARTITION p20 VALUES less than (2020),PARTITION p21 VALUES less than (2021),PARTITION p22 VALUES less than (2022),PARTITION p23 VALUES less than (2023),PARTITION p24 VALUES less than (2024),PARTITION p25 VALUES less than (2025)
) ;
  1. Normal table
CREATE TABLE points_full_table 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time));

Create millions of rows

For test case, each table holds 10 millions of rows
If using mysql to insert, example 2 is better than example 1

-- sql example 1
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
-- sql example 2
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),("data1", "data2"),("data1", "data2");

Add large data with tools

from faker import Faker
import randomdef insert_large_data(nums=10):fake = Faker()data = [(random.random(), random.random(), random.random(),str(fake.date_time_between(start_date='-10y', end_date='now'))) for i in range(nums)]cursor = connection.cursor()sql = f"INSERT INTO points_partition (x, y, z, created_time) VALUES (%s, %s, %s, %s)"# execute sql with your idea tool

DB-status

partition table take extra files to preserve data, also, extra disk space
請添加圖片描述
partition table
請添加圖片描述

TEST RESULTS WITHOUT EXTRA INDEX(created_time)

test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where

FROM:mysqlslap

# partition_table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.156 secondsMinimum number of seconds to run all queries: 0.156 secondsMaximum number of seconds to run all queries: 0.156 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full_table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.172 secondsMinimum number of seconds to run all queries: 0.172 secondsMaximum number of seconds to run all queries: 0.172 secondsNumber of clients running queries: 10Average number of queries per client: 10

In general, it is expected that fewer touched rows would result in less time for query execution.
since this query only required limit rows under condition without order, mysql optimizer is doing a good job here.
the worse case for the full table is that do a full table scan, but to get just 100 target rows from random data, much less time is needed.

however, if we put a order by in where clause, things will be a huge different.

test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where; Using filesort
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where; Using filesort

FROM:mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 4.931 secondsMinimum number of seconds to run all queries: 4.931 secondsMaximum number of seconds to run all queries: 4.931 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 54.652 secondsMinimum number of seconds to run all queries: 54.652 secondsMaximum number of seconds to run all queries: 54.652 secondsNumber of clients running queries: 10Average number of queries per client: 10

A huge time gap between two queries.
what’ going on?
under condition of “order by”
a full table needs a full table-field sort, that’s cost a lot,
a partition table only need to sort a partition after located target partition.
we always say: test your real case, by this way, you find your circumstance to do a partition table.

WHY:In most circumstances, you’re better off using indexes instead of partitioning

the test are not done yet
From mysql explain, the extra field print a message: “Using filesort”
normally, you should considering a index here to improve performance: MYSQL: explain-extra-information

let’s add a index

ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);

TEST RESULTS WITH INDEX

test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition; Using MRR

FROM: mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.168 secondsMinimum number of seconds to run all queries: 0.168 secondsMaximum number of seconds to run all queries: 0.168 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.368 secondsMinimum number of seconds to run all queries: 0.368 secondsMaximum number of seconds to run all queries: 0.368 secondsNumber of clients running queries: 10Average number of queries per client: 10

again: In general, it is expected that fewer touched rows would result in less time for query execution.
new queries cost a little more time than without extra index.
what happens? explain shows “condition index” are being used here.
stop here, it’s not how indexes are introduced.
sometimes, index is not help if the goal was retrieve 100 target rows. the worst case, yes, but not all.

let’s put a “order by” to see the magic

test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition

FROM: mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.162 secondsMinimum number of seconds to run all queries: 0.162 secondsMaximum number of seconds to run all queries: 0.162 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.185 secondsMinimum number of seconds to run all queries: 0.185 secondsMaximum number of seconds to run all queries: 0.185 secondsNumber of clients running queries: 10Average number of queries per client: 10

same touched rows as no “order by”.
but the time cost of queries are getting really closed.
makes sense “In this circumstance, you’re better off using indexes instead of partitioning”.
after all, there are different types of queries were influenced and Maintenance of PARTITION is also a big thing.
For example: select count() is much slower for partition tables. unless doing a partition count()

more tests?
let’s stop here

table vs (better view)

key/typepartitionnormalpartition+ordernormal+orderpartition+indexnormal+indexpartition+order+indexnormal+order+index
diskspace~590m~540m~590m~540m~750m~700m~750m~700m
mysqlslap-benchmark0.156s0.172s4.931s54.652s0.168s0.368s0.162s0.185s
mysql-explain-touched-rows9116259747207911625974720745581226417844558122641784
index////created_time_indexcreated_time_indexcreated_time_indexcreated_time_index

POINTS BASED ON TEST(mysqlslap & mysql workbench)

  1. Index works good without partitioning, most of cases even better
  2. Under condition of range query by partition field, partitioning tables works good indeed
  3. drop partitions is much more efficient when doing a big delete
  4. if queries use specific partition, performance will better

Other Points Related & documents & Links:

  1. Partitioning mainly helps when your full table is larger than RAM
  2. No partitioning without million rows, Only BY RANGE provides any performance…
  3. index order(DESC or ASC) is also important
  4. mysqlslap–benchmark tool
  5. questions about partition

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/web/62184.shtml
繁體地址,請注明出處:http://hk.pswp.cn/web/62184.shtml
英文地址,請注明出處:http://en.pswp.cn/web/62184.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

深入解析 Loss 減少方式:mean和sum的區別及其在大語言模型中的應用 (中英雙語)

深入解析 Loss 減少方式:mean 和 sum 的區別及其在大語言模型中的應用 在訓練大語言模型(Large Language Models, LLM)時,損失函數(Loss Function)的處理方式對模型的性能和優化過程有顯著影響。本文以 re…

基于 AutoFlow 快速搭建基于 TiDB 向量搜索的本地知識庫問答機器人

導讀 本文將詳細介紹如何通過 PingCAP 開源項目 AutoFlow 實現快速搭建基于 TiDB 的本地知識庫問答機器人。如果提前準備好 Docker、TiDB 環境,整個搭建過程估計在 10 分鐘左右即可完成,無須開發任何代碼。 文中使用一篇 TiDB 文檔作為本地數據源作為示…

生信技能63 - 構建gnomAD變異位點的SQLite查詢數據庫

將數據量巨大的gnomAD數據庫,通過SQLite數據庫尋找gnomAD中存在的各種變異注釋信息(如等位基因計數,深度,次要等位基因頻率等),查詢300.000個變量的查詢需要大約40秒,通過染色體編號+位置+REF+ALT即可進行快速查詢。 1. gnomAD變異注釋VCF文件字段 gnomAD VCF各版本包…

【前端】將vue的方法掛載到window上供全局使用,也方便跟原生js做交互

【前端】將vue的方法掛載到window上供全局使用&#xff0c;也方便跟原生js做交互 <template><div><el-button click"start">調用方法</el-button></div> </template> <script> // import { JScallbackProc } from ./JScal…

基于XML的AOP開發

AOP 為 Aspect Oriented Programming 的縮寫&#xff0c;意思為面向切面編程。 AOP相關術語&#xff1a; 目標對象(Target)&#xff1a; 你要去代理的對象&#xff0c;可以理解為之前很單純的那個對象。 代理對象(Proxy)&#xff1a; 你把你那個單純的對象給我&#xff0c…

記錄blender學習過程中遇到的問題

物體發射的方向不對 被發射物體&#xff08;例如一棵樹&#xff09;n鍵看旋轉歸0 切換正視圖 將被發射物體的局部坐標的Z軸 指向 全局方向的X軸時 并且把粒子系統設置的物體旋轉勾選上 方向就對了 做倒角發現有問題 檢查縮放應用、面朝向、有沒有重合點&#xff08;融合點&am…

Ubuntu系統中Redis的安裝步驟及服務配置

目錄 內容概括 系統環境 安裝方式 1、apt包管理器安裝 &#xff08;1&#xff09;安裝redis服務 &#xff08;2&#xff09;安裝客戶端&#xff08;進入命令行操作使用&#xff0c;包含redis-cli&#xff09; &#xff08;3&#xff09;安裝檢驗 &#xff08;4&#xf…

半導體設備中的微型導軌應如何選擇合適的潤滑油?

微型導軌的潤滑對于保證其高精度和高穩定性至關重要&#xff0c;尤其是在半導體設備中&#xff0c;微型導軌的潤滑油選擇需要考慮多個因素&#xff0c;以確保設備的最佳性能和壽命。以下是一些關鍵點&#xff1a; 1、黏度&#xff1a;潤滑油的黏度是影響其流動性和潤滑效果的重…

RocketMq詳解:六、RocketMq的負載均衡機制

上一章&#xff1a;《SpringBootAop實現RocketMq的冪等》 文章目錄 1.背景1.1 什么是負載均衡1.2 負載均衡的意義 2.RocketMQ消息消費2.1 消息的流轉過程2.2 Consumer消費消息的流程 3.RocketMq的負載均衡策略3.1 Broker負載均衡3.2 Producer發送消息負載均衡3.3 消費端的負載均…

yocto的xxx.bb文件在什么時候會拷貝文件到build目錄

在 Yocto 中&#xff0c;.bb 文件用于描述如何構建和安裝一個軟件包&#xff0c;而文件在構建過程中的拷貝操作通常會在某些特定的步驟中進行。具體來說&#xff0c;文件會在以下幾個階段被拷貝到 build 目錄&#xff08;或者更準確地說&#xff0c;拷貝到目標目錄 ${D}&#x…

主打極致性價比,AMD RX 8600/8800顯卡定了

*以下內容僅為網絡爆料及傳聞&#xff0c;一切以官方消息為準。 這誰能想到&#xff0c;率先掏出下一代桌面獨立顯卡的不是老大哥 NVIDIA&#xff0c;也不是 AMD&#xff0c;反而是三家中存在感最弱的 Intel&#xff01; 就在 12 月 3 日&#xff0c;Intel 正式發布了自家第二…

數組哪些方法會觸發Vue監聽,哪些不會觸發監聽

發現寶藏 前些天發現了一個巨牛的人工智能學習網站&#xff0c;通俗易懂&#xff0c;風趣幽默&#xff0c;忍不住分享一下給大家。【寶藏入口】。 在 Vue 中&#xff0c;數組的變化是通過 響應式 系統來監聽的。Vue 使用 getter 和 setter 來追蹤數組的變化&#xff0c;并在數…

npm, yarn, pnpm之間的區別

前言 在現代化的開發中&#xff0c;一個人可能同時開發多個項目&#xff0c;安裝的項目越來越多&#xff0c;所隨之安裝的依賴包也越來越臃腫&#xff0c;而且有時候所安裝的速度也很慢&#xff0c;甚至會安裝失敗。 因此我們就需要去了解一下&#xff0c;我們的包管理器&#…

工業檢測基礎-工業相機選型及應用場景

以下是一些常見的工業檢測相機種類、檢測原理、應用場景及選型依據&#xff1a; 2D相機 檢測原理&#xff1a;基于二維圖像捕獲&#xff0c;通過分析圖像的明暗、紋理、顏色等信息來檢測物體的特征和缺陷.應用場景&#xff1a;廣泛應用于平面工件的外觀檢測&#xff0c;如檢測…

C語言連接數據庫

文章目錄 一、初始化數據庫二、創建數據庫連接三、執行增刪改查語句1、增刪改2、查 四、執行增刪改查語句 接下來我簡單的介紹一下怎么用C語言連接數據庫。 初始化數據庫創建數據庫連接執行增刪改查語句關閉數據庫連接 一、初始化數據庫 // 數據庫初始化 MYSQL mysql; MYSQL* r…

優化LabVIEW數據運算效率的方法

在LabVIEW中進行大量數據運算時&#xff0c;提升計算效率并減少時間占用是開發過程中常遇到的挑戰。為此&#xff0c;可以從多個角度著手優化&#xff0c;包括合理選擇數據結構與算法、并行處理、多線程技術、硬件加速、內存管理和界面優化等。通過采用這些策略&#xff0c;可以…

開源模型應用落地-安全合規篇-用戶輸入價值觀判斷(四)

一、前言 在深度合規功能中,對用戶輸入內容的價值觀判斷具有重要意義。這一功能不僅僅是對信息合法性和合規性的簡單審核,更是對信息背后隱含的倫理道德和社會責任的深刻洞察。通過對價值觀的判斷,系統能夠識別可能引發不當影響或沖突的內容,從而為用戶提供更安全、更和諧的…

計算機的錯誤計算(一百七十六)

摘要 利用某一大語言模型計算 的值&#xff0c;輸出為 0 . 例1. 在某一大語言模型下&#xff0c;計算 的值。其中sin中值取弧度。結果保留16位有效數字。 直接貼圖吧&#xff1a; 點評&#xff1a; &#xff08;1&#xff09;以上為一個大模型給的答案。從其回答可知&…

數據結構與算法——1204—遞歸分治法

1、斐波那契數列優化 使用滾動變量&#xff0c;保存當前計算結果和前兩項值 (1)RAB (2)更新計算對象&#xff0c;AB&#xff0c;BR #include<iostream> using namespace std;int fun(int n) {if (n 0)return 0;if (n 1 || n 2)return 1;int num11;int num21;int su…

openstack內部rpc消息通信源碼分析

我們知道openstack內部消息隊列基于AMQP協議&#xff0c;默認使用的rabbitmq 消息隊列。談到rabbitmq&#xff0c;大家或許并不陌生&#xff0c;但或許會對oslo message有些陌生。openstack內部并不是直接使用rabbitmq&#xff0c;而是使用了oslo.message 。oslo.message 后端的…