Paimon Catalog
Last updated: 2025-05-29
Quick Start: PALO & Paimon
Usage Notes
- When data is stored on HDFS, place core-site.xml, hdfs-site.xml, and hive-site.xml in the conf directories of both FE and BE. The Hadoop configuration files in the conf directory are read first, followed by the configuration files under the HADOOP_CONF_DIR environment variable.
- The currently supported Paimon version is 0.8.
Creating a Catalog
Paimon Catalog currently supports two types of Metastore for creating a Catalog:
- filesystem (default): stores both metadata and data on the filesystem.
- hive metastore: additionally stores metadata in Hive Metastore, so users can access the tables directly from Hive.
Creating a Catalog Based on FileSystem
HDFS
SQL
CREATE CATALOG `paimon_hdfs` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "hdfs://HDFS8000871/user/paimon",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.username" = "hadoop"
);

CREATE CATALOG `paimon_kerberos` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "hdfs://HDFS8000871/user/paimon",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.security.authentication" = "kerberos",
    "hadoop.kerberos.keytab" = "/doris/hdfs.keytab",
    "hadoop.kerberos.principal" = "hdfs@HADOOP.COM"
);
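After creating the catalog, you can verify it by switching to it and browsing its contents. Below is a minimal sketch; the database and table names are hypothetical placeholders:
SQL
-- Switch to the newly created catalog and list its databases
SWITCH paimon_hdfs;
SHOW DATABASES;
-- db_name and tbl are hypothetical placeholders for your own objects
USE db_name;
SELECT * FROM tbl LIMIT 10;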
MinIO
SQL
CREATE CATALOG `paimon_s3` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "s3://bucket_name/paimons3",
    "s3.endpoint" = "http://<ip>:<port>",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk"
);
OBS
SQL
CREATE CATALOG `paimon_obs` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "obs://bucket_name/paimon",
    "obs.endpoint" = "obs.cn-north-4.myhuaweicloud.com",
    "obs.access_key" = "ak",
    "obs.secret_key" = "sk"
);
COS
SQL
CREATE CATALOG `paimon_cos` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "cosn://paimon-1308700295/paimoncos",
    "cos.endpoint" = "cos.ap-beijing.myqcloud.com",
    "cos.access_key" = "ak",
    "cos.secret_key" = "sk"
);
OSS
SQL
CREATE CATALOG `paimon_oss` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "oss://paimon-zd/paimonoss",
    "oss.endpoint" = "oss-cn-beijing.aliyuncs.com",
    "oss.access_key" = "ak",
    "oss.secret_key" = "sk"
);
Google Cloud Storage
SQL
CREATE CATALOG `paimon_gcs` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "gs://bucket/warehouse",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.region" = "region",
    "s3.endpoint" = "storage.googleapis.com"
);
Creating a Catalog Based on Hive Metastore
SQL
CREATE CATALOG `paimon_hms` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "hms",
    "warehouse" = "hdfs://HDFS8000871/user/zhangdong/paimon2",
    "hive.metastore.uris" = "thrift://172.21.0.44:7004",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.username" = "hadoop"
);

CREATE CATALOG `paimon_kerberos` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "hms",
    "warehouse" = "hdfs://HDFS8000871/user/zhangdong/paimon2",
    "hive.metastore.uris" = "thrift://172.21.0.44:7004",
    "hive.metastore.sasl.enabled" = "true",
    "hive.metastore.kerberos.principal" = "hive/xxx@HADOOP.COM",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.security.authentication" = "kerberos",
    "hadoop.kerberos.principal" = "hdfs@HADOOP.COM",
    "hadoop.kerberos.keytab" = "/doris/hdfs.keytab"
);
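Because the hms catalog type also registers metadata in Hive Metastore, tables in this catalog are visible from Hive itself. A minimal sketch of checking this on the Hive side (run in Hive, e.g. via beeline; the database name is hypothetical):
SQL
-- Run in Hive, not in PALO: list Paimon tables registered in the metastore
SHOW TABLES IN paimon_db;  -- paimon_db is a hypothetical database name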
Creating a Catalog Based on Aliyun DLF
This feature is supported since versions 2.1.7 and 3.0.3.
SQL
CREATE CATALOG `paimon_dlf` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "warehouse" = "oss://xx/yy/",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.uid" = "xxxxx",
    "dlf.region" = "cn-beijing",
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk"

    -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
    -- "dlf.catalog.id" = "xxxx", -- optional
);
Creating a Catalog Based on Google Dataproc Metastore
SQL
CREATE CATALOG `paimon_gms` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "hms",
    "hive.metastore.uris" = "thrift://ip:port",
    "warehouse" = "gs://bucket/warehouse",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.region" = "region",
    "s3.endpoint" = "storage.googleapis.com"
);
Column Type Mapping
| Paimon Data Type | PALO Data Type | Comment |
|---|---|---|
| BooleanType | Boolean | |
| TinyIntType | TinyInt | |
| SmallIntType | SmallInt | |
| IntType | Int | |
| FloatType | Float | |
| BigIntType | BigInt | |
| DoubleType | Double | |
| VarCharType | VarChar | |
| CharType | Char | |
| VarBinaryType, BinaryType | String | |
| DecimalType(precision, scale) | Decimal(precision, scale) | |
| TimestampType, LocalZonedTimestampType | DateTime | |
| DateType | Date | |
| ArrayType | Array | Supports nested Array |
| MapType | Map | Supports nested Map |
| RowType | Struct | Supports nested Struct (since versions 2.0.10 and 2.1.3) |
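To check how the columns of a specific Paimon table were mapped, describe the table from PALO. A minimal sketch; the catalog, database, and table names below are hypothetical:
SQL
-- Inspect the PALO-side column types of a mapped Paimon table
SWITCH paimon_hdfs;  -- hypothetical catalog from the examples above
USE db_name;         -- hypothetical database
DESC tbl;            -- hypothetical table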
FAQ
- Kerberos issues
  - Make sure the principal and keytab are configured correctly.
  - Set up a scheduled task (e.g. via crontab) on the BE nodes that runs kinit -kt your_principal your_keytab at a regular interval (e.g. every 12 hours).
- Unknown type value: UNSUPPORTED
  This is a compatibility issue between PALO 2.0.2 and Paimon 0.5. Upgrade to 2.0.3 or later to resolve it, or apply a patch yourself.
- "File system not supported" errors when accessing object storage (OSS, S3, etc.)
  In versions up to and including 2.0.5, users must manually download the following jars, place them in the ${DORIS_HOME}/be/lib/java_extensions/preload-extensions directory, and restart BE:
  - For OSS: paimon-oss-0.6.0-incubating.jar
  - For other object storage: paimon-s3-0.6.0-incubating.jar
  From 2.0.6 onward, these jars no longer need to be placed manually.