Supported Engines
Spark
Flink
SeaTunnel Zeta
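Key Features
Batch processing
Exactly-once processing
Column projection
Parallel processing
Supports user-defined splits
Supports query SQL, which can also be used to achieve column projection
Description
Reads data from an external data source through JDBC.
Supported DataSource Info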
Datasource | Supported versions | Driver | Url | Maven |
---|---|---|---|---|
Vertica | Different dependency versions have different driver classes. | com.vertica.jdbc.Driver | jdbc:vertica://localhost:5433/vertica | Download |
## Database Dependency
Download the driver listed in the 'Maven' column and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
For example, for the Vertica data source: cp vertica-jdbc-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/
Data Type Mapping
Vertica Data Type | SeaTunnel Data Type |
---|---|
BIT | BOOLEAN |
TINYINT TINYINT UNSIGNED SMALLINT SMALLINT UNSIGNED MEDIUMINT MEDIUMINT UNSIGNED INT INTEGER YEAR | INT |
INT UNSIGNED INTEGER UNSIGNED BIGINT | LONG |
BIGINT UNSIGNED | DECIMAL(20,0) |
DECIMAL(x,y) (the column's specified size is < 38) | DECIMAL(x,y) |
DECIMAL(x,y) (the column's specified size is > 38) | DECIMAL(38,18) |
DECIMAL UNSIGNED | DECIMAL((the column's specified size) + 1, (the column's number of digits to the right of the decimal point)) |
FLOAT FLOAT UNSIGNED | FLOAT |
DOUBLE DOUBLE UNSIGNED | DOUBLE |
CHAR VARCHAR TINYTEXT MEDIUMTEXT TEXT LONGTEXT JSON | STRING |
DATE | DATE |
TIME | TIME |
DATETIME TIMESTAMP | TIMESTAMP |
TINYBLOB MEDIUMBLOB BLOB LONGBLOB BINARY VARBINARY BIT(n) | BYTES |
GEOMETRY UNKNOWN | Not supported yet |
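
As a hedged illustration of the DECIMAL rows in the table above, the Python sketch below models the mapping as a small function. The name `map_decimal` and the constants are hypothetical, and the handling of a column size of exactly 38 is an assumption, because the table only covers sizes below and above 38. It is a reading aid, not SeaTunnel's actual implementation.

```python
from typing import Tuple

MAX_PRECISION = 38        # widest column size the table keeps unchanged
FALLBACK = (38, 18)       # DECIMAL(38,18), used for wider columns


def map_decimal(precision: int, scale: int) -> Tuple[int, int]:
    """Return (precision, scale) of the SeaTunnel DECIMAL for a source DECIMAL(x,y)."""
    if precision > MAX_PRECISION:
        # Column is wider than the table allows: fall back to DECIMAL(38,18).
        return FALLBACK
    # Assumption: a size of exactly 38 is kept as-is (the table does not say).
    return (precision, scale)


print(map_decimal(10, 2))
print(map_decimal(50, 4))
```

Under these assumptions, a DECIMAL(10,2) column keeps its precision and scale, while a DECIMAL(50,4) column is read as DECIMAL(38,18).
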
Source Options
Name | Type | Required | Default | Description |
---|---|---|---|---|
url | String | Yes | - | The URL of the JDBC connection. For example: jdbc:vertica://localhost:5433/vertica |
driver | String | Yes | - | The JDBC driver class name used to connect to the remote data source. For Vertica, the value is com.vertica.jdbc.Driver. |
user | String | No | - | Connection instance user name |
password | String | No | - | Connection instance password |
query | String | Yes | - | Query statement |
connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete |
partition_column | String | No | - | The column used to partition the data for parallel reads. Only a numeric primary key column is supported, and only one column can be configured. |
partition_lower_bound | Long | No | - | The minimum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the minimum value. |
partition_upper_bound | Long | No | - | The maximum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the maximum value. |
partition_num | Int | No | job parallelism | The number of partitions. Only positive integers are supported. The default value is the job parallelism. |
fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means the JDBC default value is used. |
common-options | | No | - | Source plugin common parameters. Refer to Source Common Options for details. |
- Tips
If partition_column is not set, the read runs with a single concurrency; if partition_column is set, it is executed in parallel according to the concurrency of the task.
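
The tip above, together with the partition_num default in the options table, can be read as a simple decision rule. The Python sketch below models that rule; `planned_partitions` and its parameters are hypothetical names that mirror the option names, and the function illustrates the documented behavior rather than SeaTunnel's internal code.

```python
from typing import Optional


def planned_partitions(partition_column: Optional[str],
                       partition_num: Optional[int],
                       job_parallelism: int) -> int:
    """Return how many partitions the read is divided into (illustrative only)."""
    if not partition_column:
        # No partition column configured: the read is not split.
        return 1
    if partition_num is None:
        # Documented default: partition_num falls back to the job parallelism.
        return job_parallelism
    if partition_num <= 0:
        raise ValueError("partition_num must be a positive integer")
    return partition_num


print(planned_partitions(None, None, 2))
print(planned_partitions("id", None, 2))
print(planned_partitions("id", 10, 2))
```
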
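Task Example
Simple:
This example queries 16 rows from the type_bin table in your test database with a single concurrency and reads all of its fields. You can also specify which fields to query. The final output is printed to the console.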
env {
  # You can set Flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    query = "select * from type_bin limit 16"
  }
}

transform {
  # If you would like more information about how to configure SeaTunnel and to see the full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}

sink {
  Console {}
}
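Parallel:
Read the queried table in parallel using the shard field and the number of shards you configure. Use this approach if you want to read the whole table.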
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    # Column used to shard the parallel read
    partition_column = "id"
    # Number of shards
    partition_num = 10
  }
}
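
In this example no bounds are set, so according to the options table SeaTunnel queries the database for the minimum and maximum of partition_column. The Python sketch below shows one plausible shape for such a lookup. The exact SQL that SeaTunnel issues is not given in this document, so the statement, the function name `bounds_query`, and the subquery alias `bounds_src` are all assumptions.

```python
def bounds_query(query: str, partition_column: str) -> str:
    """Build a MIN/MAX lookup over the user's query (hypothetical statement shape)."""
    # The user's query is wrapped as a subquery; `bounds_src` is a made-up alias.
    return (
        f"SELECT MIN({partition_column}), MAX({partition_column}) "
        f"FROM ({query}) AS bounds_src"
    )


print(bounds_query("select * from type_bin", "id"))
```

Setting partition_lower_bound and partition_upper_bound explicitly makes a lookup of this kind unnecessary.
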
Parallel Boundary:
Reading the data source within the lower and upper bounds you configure is more efficient.
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    partition_column = "id"
    # Lower bound of the read
    partition_lower_bound = 1
    # Upper bound of the read
    partition_upper_bound = 500
    partition_num = 10
  }
}
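
To make the effect of the bounds concrete, the Python sketch below splits the range 1 to 500 from the example into 10 contiguous sub-ranges. It is a minimal illustration of even range splitting, assuming both bounds are inclusive; the function `split_range` is hypothetical, and SeaTunnel's real split logic may divide the range differently.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, num: int) -> List[Tuple[int, int]]:
    """Split [lower, upper] (inclusive, by assumption) into at most `num` even ranges."""
    if num <= 0:
        raise ValueError("num must be a positive integer")
    if upper < lower:
        raise ValueError("upper must not be smaller than lower")
    total = upper - lower + 1
    num = min(num, total)               # avoid creating empty ranges
    base, extra = divmod(total, num)    # the first `extra` ranges get one more value
    ranges = []
    start = lower
    for i in range(num):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Values taken from the example: bounds 1..500, partition_num = 10.
for low, high in split_range(1, 500, 10):
    print(f"id between {low} and {high}")
```

With these inputs each of the 10 ranges covers 50 values of id, from 1 to 50 up to 451 to 500, so each parallel read would only scan its own slice under this model.
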
This article was published with support from 白鯨開源科技.