Setting up DataX
Prerequisites
- JDK (1.8 or later; 1.8 recommended)
- Python (2 or 3)
- Apache Maven 3.x (only needed to compile DataX from source)
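Before downloading anything, it can help to confirm that these tools are actually on PATH. The following is a minimal sketch, not part of DataX; the executable names `java`, `python` and `mvn` are assumptions and may differ on your machine (for example `python3`). It needs Python 3.6 or later because of the f-string.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ("java", "python", "mvn")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A missing `mvn` only matters if you plan to build DataX yourself; the prebuilt package used below does not need it.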
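Download the prebuilt DataX package
https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz
Go to the download directory and open PowerShell there
Run the extract command
tar -xzf datax.tar.gz
This produces a datax directory.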
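Configuring the Windows command line for UTF-8
Without this step, Chinese text in the output of the sync command shows up as garbled characters.
Either of the following makes the garbled log readable:
- Change the encoding for the current window only
Open cmd and first run
chcp 65001
then run
python datax.py C:\Users\longz\Downloads\1.json
With the console code page set to UTF-8, Chinese text in the log displays correctly.
- Make it permanent (optional)
On Windows 10/11 you can switch the system-wide console encoding to UTF-8:
Settings → Time & Language → Region → Related settings → Administrative language settings → Change system locale → tick "Beta: Use Unicode UTF-8 for worldwide language support" → restart.
After that, every cmd/PowerShell window defaults to UTF-8 and no longer shows boxes or garbled text.
Alternatively, open the same dialog from the Control Panel,
tick the checkbox mentioned above,
and restart.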
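To see why the code page matters, the sketch below decodes the same bytes two ways. It assumes DataX writes its log as UTF-8 and that the console's legacy code page is GBK (cp936); both are assumptions about a typical Chinese-locale Windows setup, and `decode_log` is a hypothetical helper, not a DataX function.

```python
def decode_log(raw: bytes, console_encoding: str) -> str:
    """Decode raw log bytes the way a console using `console_encoding` would."""
    return raw.decode(console_encoding, errors="replace")


# A fragment of the job summary, encoded as UTF-8 (assumed log encoding).
message = "任務啟動時刻"
raw = message.encode("utf-8")

as_gbk = decode_log(raw, "gbk")      # legacy Chinese code page (cp936)
as_utf8 = decode_log(raw, "utf-8")   # the code page selected by `chcp 65001`

if __name__ == "__main__":
    print("decoded as GBK  :", as_gbk)
    print("decoded as UTF-8:", as_utf8)
```

Only the UTF-8 decoding reproduces the original text; the GBK decoding is the kind of garbage the unconfigured console displays.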
Preparing test data
Create two databases, test1 and test2, each containing a table with the same name, user.
CREATE TABLE IF NOT EXISTS `user` (
  `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `username` VARCHAR(50) NOT NULL,
  `password` VARCHAR(100) NOT NULL,
  `email` VARCHAR(100) NOT NULL,
  `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Insert test data into the test1 database.
I use a stored procedure here; calling it (for example CALL GenerateUserTestData(100);) generates the test data directly.
DELIMITER $$
CREATE PROCEDURE GenerateUserTestData(IN num_records INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_username VARCHAR(50);
  DECLARE v_password VARCHAR(100);
  DECLARE v_email VARCHAR(100);
  WHILE i <= num_records DO
    -- Generate a random username (e.g. User_12345)
    SET v_username = CONCAT('User_', FLOOR(10000 + RAND() * 90000));
    -- Generate a random password (simplified; it should be encrypted in practice)
    SET v_password = CONCAT('Pass_', FLOOR(100000 + RAND() * 900000));
    -- Generate a random email (e.g. user12345@example.com)
    SET v_email = CONCAT('user', FLOOR(10000 + RAND() * 90000), '@example.com');
    INSERT INTO `user` (username, password, email)
    VALUES (v_username, v_password, v_email);
    -- Commit every 1000 rows to keep the transaction from growing too large
    IF i % 1000 = 0 THEN
      COMMIT;
    END IF;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
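If you would rather generate the rows outside MySQL, the procedure's logic can be mirrored in a few lines of Python. This is a sketch only: `make_user_rows` and `INSERT_SQL` are hypothetical names, and the value ranges are chosen to match the `FLOOR(... + RAND() * ...)` expressions above.

```python
import random


def make_user_rows(num_records, seed=None):
    """Return (username, password, email) tuples shaped like the procedure's rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_records):
        username = f"User_{rng.randint(10000, 99999)}"
        password = f"Pass_{rng.randint(100000, 999999)}"  # plain text, demo only
        email = f"user{rng.randint(10000, 99999)}@example.com"
        rows.append((username, password, email))
    return rows


# Parameterized statement for a DB-API driver that uses the %s placeholder style.
INSERT_SQL = "INSERT INTO `user` (username, password, email) VALUES (%s, %s, %s)"
```

With a DB-API driver such as PyMySQL (an assumption; any MySQL driver with `%s` placeholders would do), the rows could be written with `cursor.executemany(INSERT_SQL, make_user_rows(100))` followed by a commit.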
Once the data is generated, the goal is to use DataX to sync the data in test1 to test2.
Create the DataX job script
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/test1"],
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "splitPk": ""
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/test2",
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
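The job above uses "column": ["*"], which DataX flags as risky in the run log further down. As a hedged alternative, the sketch below builds the same job from Python with an explicit column list taken from the CREATE TABLE statement. `build_mysql_job` is a hypothetical helper, not part of DataX; note that it keeps the reader's jdbcUrl as a list and the writer's as a plain string, exactly as in the job file.

```python
import json


def build_mysql_job(src_url, dst_url, table, columns, username, password, channel=1):
    """Build a mysqlreader -> mysqlwriter job dict with an explicit column list."""

    def params(jdbc_url, **extra):
        p = {
            "username": username,
            "password": password,
            "connection": [{"jdbcUrl": jdbc_url, "table": [table]}],
            "column": list(columns),
        }
        p.update(extra)
        return p

    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    # The reader takes a list of JDBC URLs, as in the job above.
                    "parameter": params([src_url], splitPk=""),
                },
                "writer": {
                    "name": "mysqlwriter",
                    # The writer takes a single JDBC URL string.
                    "parameter": params(dst_url, writeMode="insert"),
                },
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


job = build_mysql_job(
    "jdbc:mysql://localhost:3306/test1",
    "jdbc:mysql://localhost:3306/test2",
    table="user",
    columns=["id", "username", "password", "email", "created_at"],
    username="root",
    password="root",
)

if __name__ == "__main__":
    print(json.dumps(job, indent=2))
```

Redirecting the printed JSON into a file such as test-user.json gives a job file equivalent to the hand-written one, minus the wildcard columns.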
Run the sync command
python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json
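When the sync has to be triggered from another script, the same command can be launched with `subprocess`. This is a minimal sketch; `build_datax_command` and `run_datax` are hypothetical helpers, and treating exit code 0 as success is an assumption about how datax.py reports its result.

```python
import subprocess
import sys


def build_datax_command(datax_py, job_json, python_exe=sys.executable):
    """Return the argument list for `python datax.py <job.json>`."""
    return [python_exe, str(datax_py), str(job_json)]


def run_datax(datax_py, job_json):
    """Run the job and return the exit code (0 is assumed to mean success)."""
    return subprocess.call(build_datax_command(datax_py, job_json))
```

Passing the command as a list rather than a single string avoids quoting problems when the install path contains spaces.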
Execution result
D:\Information_Technology\worksapce_tool\datax-new\datax\bin>python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2025-08-08 14:00:36.524 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2025-08-08 14:00:36.526 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2025-08-08 14:00:36.536 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2025-08-08 14:00:36.541 [main] INFO Engine - the machine info =>
osInfo: Windows 10 amd64 10.0
jvmInfo: Oracle Corporation 1.8 25.341-b10
cpu num: 16
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2025-08-08 14:00:36.552 [main] INFO Engine -
{"content":[{"reader":{"name":"mysqlreader","parameter":{"username":"root","password":"****","connection":[{"jdbcUrl":["jdbc:mysql://localhost:3306/test1"],"table":["user"]}],"column":["*"],"splitPk":""}},"writer":{"name":"mysqlwriter","parameter":{"username":"root","password":"****","connection":[{"jdbcUrl":"jdbc:mysql://localhost:3306/test2","table":["user"]}],"column":["*"],"writeMode":"insert"}}}],"setting":{"speed":{"channel":1}}
}
2025-08-08 14:00:36.565 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2025-08-08 14:00:36.566 [main] INFO JobContainer - DataX jobContainer starts job.
2025-08-08 14:00:36.566 [main] INFO JobContainer - Set jobId = 0
Fri Aug 08 14:00:36 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:42.909 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2025-08-08 14:00:42.910 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的風險. 因為您未配置讀取數據庫表的列,當您的表字段個數、類型有變動時,可能影響任務正確性甚至會運行出錯。請檢查您的配置并作出修改.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.091 [job-0] INFO OriginalConfPretreatmentUtil - table:[user] all columns:[
id,username,password,email,created_at
].
2025-08-08 14:00:43.091 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在風險. 因為您配置的寫入數據庫表的列為*,當您的表字段個數、類型有變動時,可能影響任務正確性甚至會運行出錯。請檢查您的配置并作出修改.
2025-08-08 14:00:43.092 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,username,password,email,created_at) VALUES(?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://localhost:3306/test2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2025-08-08 14:00:43.094 [job-0] INFO JobContainer - jobContainer starts to do split ...
2025-08-08 14:00:43.094 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2025-08-08 14:00:43.097 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2025-08-08 14:00:43.097 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2025-08-08 14:00:43.115 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2025-08-08 14:00:43.117 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2025-08-08 14:00:43.119 [job-0] INFO JobContainer - Running by standalone Mode.
2025-08-08 14:00:43.124 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2025-08-08 14:00:43.126 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2025-08-08 14:00:43.127 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2025-08-08 14:00:43.136 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2025-08-08 14:00:43.140 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.179 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-08-08 14:00:43.456 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2025-08-08 14:00:43.456 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2025-08-08 14:00:53.142 [job-0] INFO StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.142 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2025-08-08 14:00:53.142 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2025-08-08 14:00:53.143 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2025-08-08 14:00:53.143 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2025-08-08 14:00:53.144 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: D:\Information_Technology\worksapce_tool\datax-new\datax\hook
2025-08-08 14:00:53.144 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.016s | 0.016s | 0.016s
PS Scavenge | 1 | 1 | 1 | 0.008s | 0.008s | 0.008s
2025-08-08 14:00:53.144 [job-0] INFO JobContainer - PerfTrace not enable!
2025-08-08 14:00:53.145 [job-0] INFO StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.146 [job-0] INFO JobContainer -
任務啟動時刻 : 2025-08-08 14:00:36
任務結束時刻 : 2025-08-08 14:00:53
任務總計耗時 : 16s
任務平均流量 : 519B/s
記錄寫入速度 : 10rec/s
讀出記錄總數 : 100
讀寫失敗總數 : 0
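For automated checks it is often enough to pull the counters out of the StandAloneJobContainerCommunicator summary line shown in the log. The sketch below is a hypothetical parser written against the exact line format in this run; other DataX versions may format the line differently.

```python
import re

SUMMARY_RE = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes"
    r".*?Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(line):
    """Extract record/byte counters from a summary line, or None if absent."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


# The final summary line from the run above.
line = (
    "2025-08-08 14:00:53.145 [job-0] INFO StandAloneJobContainerCommunicator - "
    "Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | "
    "Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | "
    "All Task WaitReaderTime 0.000s | Percentage 100.00%"
)
stats = parse_summary(line)
```

For this run, `stats["records"]` is 100 and `stats["error_records"]` is 0, matching the totals in the closing summary.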
Setting up datax-web
Download the source code
git clone https://github.com/WeiYe-Jing/datax-web.git
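Create the database
Run the datax_web.sql file under bin/db (note: the update statements in older versions specify a database name)
Modify the project configuration
1. Edit resources/application.yml under datax_admin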
# Data source
datasource:
  username: root
  password: root
  url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
  driver-class-name: com.mysql.jdbc.Driver
Update the data source settings; only MySQL is currently supported.
# Configure mybatis-plus SQL log output
logging:
  level:
    com.wugui.datax.admin.mapper: error
  path: ./data/applogs/admin
Update the log path (path).
# datax-web email
mail:
  host: smtp.qq.com
  port: 25
  username: xxx@qq.com
  password: xxx
  properties:
    mail:
      smtp:
        auth: true
        starttls:
          enable: true
          required: true
        socketFactory:
          class: javax.net.ssl.SSLSocketFactory
Update the mail settings (leave them unchanged if you do not need email).
2. Edit resources/application.yml under datax_executor
# log config
logging:
  config: classpath:logback.xml
  path: ./data/applogs/executor/jobhandler
Update the log path (path).
datax:
  job:
    admin:
      ### datax-web admin address
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      port: 9999
      ### job log path
      logpath: ./data/applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
  executor:
    jsonpath: D:\\Information_Technology\\worksapce_tool\\tmp\\executor\\json\\
  pypath: D:\\Information_Technology\\worksapce_tool\\datax-new\\datax\\bin\\datax.py
Update the datax.job settings:
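- admin.addresses: the address where datax_admin is deployed. If the scheduling center is deployed as a cluster with several addresses, separate them with commas. The executor uses this address for "executor heartbeat registration" and "task result callbacks".
- executor.appname: the executor's AppName, the unique identifier of each executor cluster and the grouping key for heartbeat registration.
- executor.ip: empty by default, which means the IP is detected automatically. With multiple network cards you can set a specific IP manually. This IP is not bound to the host and is used only for communication; the address is used for "executor registration" and for "scheduling center requests that trigger tasks".
- executor.port: the executor server port, 9999 by default. When several executors run on one machine, give each a different port.
- executor.logpath: the disk path where executor run logs are stored; read and write permission on this path is required.
- executor.logretentiondays: the number of days executor log files are kept; expired logs are cleaned up automatically. The setting takes effect only when the value is 3 or greater; otherwise (for example -1) automatic cleanup is disabled.
- executor.jsonpath: the path where temporary DataX JSON files are stored.
- pypath: the path of the DataX launch script, for example xxx/datax/bin/datax.py (the script that already exists from the DataX setup above).
If the DataX environment variable (DATAX_HOME) is set, logpath, jsonpath and pypath can be omitted; log files and temporary JSON files are then stored under that path.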
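Two of the rules above are easy to get wrong: the comma-separated admin.addresses value and the threshold on logretentiondays. The sketch below restates them as code. Both functions are hypothetical illustrations of the described behaviour, not datax-web's own implementation.

```python
def split_admin_addresses(addresses):
    """Split a comma-separated admin.addresses value into a clean list."""
    return [a.strip() for a in addresses.split(",") if a.strip()]


def effective_retention_days(configured):
    """Return the retention in days, or None when automatic cleanup is off.

    Mirrors the rule described above: only values of 3 or more take effect.
    """
    return configured if configured >= 3 else None
```

With the configuration in this article, `split_admin_addresses("http://127.0.0.1:8080")` yields a single address and `effective_retention_days(30)` keeps logs for 30 days, while a value such as -1 returns None, meaning no cleanup.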
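Starting the project
Local IDEA development environment
- 1. Run DataXAdminApplication under datax_admin
- 2. Run DataXExecutorApplication under datax_executor
Once admin has started, its log prints three addresses: two for the API documentation and one for the front-end page.
Startup succeeded
After a successful start, open the page (default administrator username: admin, password: 123456)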
http://localhost:8080/index.html#/dashboard
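If the page does not load, a quick TCP check tells you whether the two services are listening at all. This is a minimal sketch; `is_port_open` is a hypothetical helper, and ports 8080 and 9999 are the values from the configuration in this article.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the configuration in this article.
    for name, port in (("admin", 8080), ("executor", 9999)):
        state = "up" if is_port_open("127.0.0.1", port) else "down"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove the application finished starting, so the log output remains the authoritative signal.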
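Hands-on walkthrough
Project management: add a project
Configure the data source
Create an executor
Build a task to generate the sync JSON
Click Build; the DataX sync script is generated automatically. Copy the JSON and go back to the menu.
Create the task
Create the task, fill it in as follows, paste in the JSON, and change the value of byte to 0.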
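The byte edit can also be applied programmatically before pasting. The sketch below assumes that the generated JSON keeps the field at job.setting.speed.byte; that location is an assumption based on the usual DataX job layout, so check it against the JSON you actually copied. `set_speed_byte` is a hypothetical helper.

```python
import json


def set_speed_byte(job_json_text, value=0):
    """Return job JSON text with job.setting.speed.byte set to `value`.

    The job.setting.speed.byte location is an assumption about the generated JSON.
    """
    job = json.loads(job_json_text)
    speed = job.setdefault("job", {}).setdefault("setting", {}).setdefault("speed", {})
    speed["byte"] = value
    return json.dumps(job, indent=2)
```

Other fields in the job, such as channel, are left untouched.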