Impala User Guide
Last updated: 2024-08-15
Impala
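Impala is an MPP (massively parallel processing) SQL query engine for processing large volumes of data stored in a Hadoop cluster. It is open-source software written in C++ and Java. Compared with other SQL engines for Hadoop, it offers high performance and low latency.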
Installation steps
Install the metastore
Install and configure the metastore as described in the section "S3-based Presto access" of the Presto User Guide.
Install Impala
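- Download the rpm packages, which are packed as a repository tarball, from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz
Extract it with `tar -zxvf cdh5.14.0-centos6.tar.gz`.
After extraction, `cd cdh/5.14.0` and start a local HTTP server there to serve the packages:
```bash
# Python 2 (the default on CentOS 6); with Python 3 the equivalent is: python3 -m http.server 8092 &
python -m SimpleHTTPServer 8092 &
```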
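- Configure the local yum repository: create /etc/yum.repos.d/localimp.repo (for example with `vim /etc/yum.repos.d/localimp.repo`) with the following content:
```ini
# baseurl points at the local HTTP server started in the previous step
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
```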
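If you prefer not to edit the file interactively, the same repo definition can be written with a heredoc. This is a minimal sketch: the function name `write_local_repo` and its optional arguments (target file, port) are hypothetical conveniences, and the defaults assume the local HTTP server from the previous step listens on port 8092.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes the same localimp repo definition as above
# without opening an editor. Both arguments are optional:
#   $1  target file (default /etc/yum.repos.d/localimp.repo)
#   $2  port of the local HTTP server (default 8092)
write_local_repo() {
  local repo_file="${1:-/etc/yum.repos.d/localimp.repo}"
  local port="${2:-8092}"
  # Unquoted EOF so that ${port} is expanded into the file.
  cat > "$repo_file" <<EOF
[localimp]
name=localimp
baseurl=http://127.0.0.1:${port}/
gpgcheck=0
enabled=1
EOF
}
```

Run as root, for example `write_local_repo && yum clean all && yum repolist`; the localimp repository should then show up in the list.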
- Install the packages with the following command:
```bash
yum install -y impala impala-server impala-state-store impala-catalog impala-shell
```
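- Copy the Hive metastore configuration file (metastore-site.xml) into Impala's configuration directory, renaming it to hive-site.xml:
```bash
# Copy the configured file into /etc/impala/conf/
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
```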
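Impala's catalog service locates the metastore through hive-site.xml, so a quick sanity check after copying can save debugging later. The sketch below only tests whether the copied file defines `hive.metastore.uris`. It assumes a remote metastore configured under that property name. If your metastore-site.xml uses a different key (for example `metastore.thrift.uris`, as a standalone Hive 3 metastore does), Impala 2.11 may not pick it up and you may need to add `hive.metastore.uris` by hand. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given hive-site.xml defines
# hive.metastore.uris; otherwise prints a warning and returns 1.
check_metastore_uris() {
  local conf="${1:-/etc/impala/conf/hive-site.xml}"
  if [ ! -r "$conf" ]; then
    echo "error: cannot read $conf" >&2
    return 1
  fi
  # Plain text match on the property name; the XML is not parsed.
  if grep -q '<name>[[:space:]]*hive\.metastore\.uris[[:space:]]*</name>' "$conf"; then
    echo "ok: hive.metastore.uris is set in $conf"
  else
    echo "warning: hive.metastore.uris not found in $conf" >&2
    return 1
  fi
}
```

For example, run `check_metastore_uris` with no argument to check /etc/impala/conf/hive-site.xml.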
- Add the S3 settings to /etc/impala/conf/core-site.xml (for example with `vim /etc/impala/conf/core-site.xml`), following the impala-s3 configuration reference. Replace the placeholders ${bucket}, ${AK} and ${SK} with your bucket name, access key and secret key:
```xml
<configuration>
  <property>
    <!-- 128 MB (134217728 bytes) -->
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <!-- fs.azure.* settings apply to Azure storage and are not needed for S3 access -->
  <property>
    <name>fs.azure.user.agent.prefix</name>
    <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
  <property>
    <!-- placeholder: replace ${bucket} with your bucket name -->
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.bj.bcebos.com</value>
    <description>endpoint</description>
  </property>
  <property>
    <!-- placeholder: replace ${AK} with your access key -->
    <name>fs.s3a.access.key</name>
    <value>${AK}</value>
    <description>AK</description>
  </property>
  <property>
    <!-- placeholder: replace ${SK} with your secret key -->
    <name>fs.s3a.secret.key</name>
    <value>${SK}</value>
    <description>SK</description>
  </property>
</configuration>
```
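Because ${bucket}, ${AK} and ${SK} above are placeholders to be replaced by hand, one option is to render the file from shell variables instead. This sketch is a convenience under stated assumptions, not part of the original procedure: the function name and argument order are hypothetical, it writes only the S3-related properties shown above (the Azure entry is left out), and it assumes the keys contain no XML special characters such as `&` or `<`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: renders core-site.xml with concrete values.
# Usage: render_core_site BUCKET AK SK [OUTPUT] [ENDPOINT]
render_core_site() {
  local bucket="$1" ak="$2" sk="$3"
  local out="${4:-/etc/impala/conf/core-site.xml}"
  local endpoint="${5:-s3.bj.bcebos.com}"
  if [ -z "$bucket" ] || [ -z "$ak" ] || [ -z "$sk" ]; then
    echo "usage: render_core_site BUCKET AK SK [OUTPUT] [ENDPOINT]" >&2
    return 1
  fi
  # Unquoted EOF: the bucket, key and endpoint variables are expanded here.
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>${endpoint}</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>${ak}</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>${sk}</value>
  </property>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
</configuration>
EOF
}
```

For example, `render_core_site my-bigdata "$AK" "$SK"` writes /etc/impala/conf/core-site.xml for the bucket used in the shell session below, with AK and SK taken from environment variables you set beforehand.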
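- Edit the bigtop configuration to set JAVA_HOME, and make sure the impala user can also access that JDK. Set the JAVA_HOME path in bigtop-utils on all three machines:
```bash
vim /etc/default/bigtop-utils
# add the following line:
export JAVA_HOME=/export/servers/jdk1.8.0_65
```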
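Since this edit has to be repeated on every node, a non-interactive variant may be handier. A sketch under stated assumptions: the function name is hypothetical, it removes any existing active `export JAVA_HOME=` line before appending the new one (so rerunning it does not accumulate duplicates), and it relies on GNU sed's `-i` option as shipped with CentOS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sets JAVA_HOME in the bigtop-utils defaults file.
# Usage: set_bigtop_java_home JDK_DIR [CONF_FILE]
set_bigtop_java_home() {
  local java_home="$1"
  local conf="${2:-/etc/default/bigtop-utils}"
  if [ -z "$java_home" ]; then
    echo "usage: set_bigtop_java_home JDK_DIR [CONF_FILE]" >&2
    return 1
  fi
  if [ -f "$conf" ]; then
    # Drop active (uncommented) JAVA_HOME exports; comment lines are kept.
    sed -i '/^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=/d' "$conf"
  fi
  echo "export JAVA_HOME=${java_home}" >> "$conf"
}
```

For example, `set_bigtop_java_home /export/servers/jdk1.8.0_65`. To check that the impala user can actually use that JDK, `sudo -u impala /export/servers/jdk1.8.0_65/bin/java -version` should print the Java version without a permission error.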
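- Create the symbolic link for the MySQL JDBC driver:
```bash
# The relative target is resolved against /usr/share/java, so the jar must already be located there
ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
```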
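A variant that does not depend on where the jar sits relative to /usr/share/java is to link to the jar by absolute path and then confirm the link is not dangling. In this sketch the function name and the jar location you pass in are assumptions; only the link path /usr/share/java/mysql-connector-java.jar comes from the step above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: links the MySQL connector jar by absolute path.
# Usage: link_mysql_connector JAR_PATH [LINK_PATH]
link_mysql_connector() {
  local jar="$1"
  local link="${2:-/usr/share/java/mysql-connector-java.jar}"
  if [ ! -f "$jar" ]; then
    echo "error: $jar not found" >&2
    return 1
  fi
  mkdir -p "$(dirname "$link")"
  # -f replaces an existing link; readlink -f makes the target absolute.
  ln -sfn "$(readlink -f "$jar")" "$link"
  # -e follows the symlink, so this fails if the link is dangling.
  [ -e "$link" ]
}
```

For example, `link_mysql_connector /path/to/mysql-connector-java-5.1.32.jar`, where /path/to is wherever you placed the jar.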
- Start Impala: first the statestore, then the catalog service, then the impalad server:
```bash
service impala-state-store start
service impala-catalog start
service impala-server start
```
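The order above matters: the catalog service and the impalad daemons both register with the statestore, so the statestore is started first. A small wrapper can enforce that order and stop at the first failure. This is a sketch; the function name and the `SERVICE_CMD` override (there only so the function can be exercised without real services) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: starts the Impala services in dependency order and
# stops at the first failure. SERVICE_CMD defaults to the SysV `service` tool.
start_impala() {
  local svc_cmd="${SERVICE_CMD:-service}"
  local svc
  for svc in impala-state-store impala-catalog impala-server; do
    if ! "$svc_cmd" "$svc" start; then
      echo "failed to start $svc; check the logs under /var/log/impala" >&2
      return 1
    fi
  done
}
```

On a real node, run `start_impala` as root.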
After startup, the logs are available under /var/log/impala. Run the impala-shell command:
```bash
[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan 6 13:27:16 PST 2018)

When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.

Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3         |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
```
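The same steps can also be run non-interactively by putting the statements in a file and passing it to impala-shell with `-f` (`-i` selects the impalad host:port to connect to). This sketch reuses the database name, S3 location and table from the session above; the function names are hypothetical, and `IF NOT EXISTS` is added so the DDL can be rerun (the INSERTs still add rows on every run).

```bash
#!/usr/bin/env bash
# Hypothetical helpers reproducing the interactive session above.
write_demo_sql() {
  local out="$1"
  # Quoted 'EOF': the SQL is written literally, without shell expansion.
  cat > "$out" <<'EOF'
CREATE DATABASE IF NOT EXISTS db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
USE db_on_s3;
CREATE TABLE IF NOT EXISTS hive_test (a INT, b STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT INTO hive_test (a, b) VALUES (1, 'tom');
INSERT INTO hive_test (a, b) VALUES (2, 'jerry');
SELECT * FROM hive_test ORDER BY a;
EOF
}

# Runs the script against an impalad (default localhost:21000, the port
# shown in the session above) and removes the temporary file afterwards.
run_demo() {
  local host="${1:-localhost:21000}"
  local sql rc
  sql="$(mktemp)"
  write_demo_sql "$sql"
  impala-shell -i "$host" -f "$sql"
  rc=$?
  rm -f "$sql"
  return "$rc"
}
```

For example, `run_demo my-node:21000`.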
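The newly written data files can then be seen under the corresponding path (by default the table directory s3a://my-bigdata/impala/s3/hive_test):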