Impala Usage Guide
Last updated: 2024-08-15
Impala
Impala is an MPP (massively parallel processing) SQL query engine for processing large volumes of data stored in a Hadoop cluster. It is open-source software written in C++ and Java. Compared with other SQL engines for Hadoop, it offers high performance and low latency.
Installation Steps
Install the metastore
Install and configure the metastore by following the "S3-based Presto access" section of the Presto Usage Guide.
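
Before moving on, it can help to confirm that the metastore service is actually up. A minimal check, assuming the metastore is running with its default Thrift port 9083:

Bash

# Check that the Hive metastore Thrift service is listening (default port 9083)
netstat -lnpt | grep 9083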
Install Impala
- Download the RPM tarball from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz
Extract it with tar -zxvf cdh5.14.0-centos6.tar.gz, then cd cdh/5.14.0 and start a local server by running:

Bash

python -m SimpleHTTPServer 8092 &
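
As a quick sanity check, confirm that the local HTTP server responds (a sketch, assuming it was started in cdh/5.14.0 as above):

Bash

# The repo content should be reachable on the port used above
curl -I http://127.0.0.1:8092/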
- Configure a local yum repository

Bash

vim /etc/yum.repos.d/localimp.repo
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
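
To confirm that yum picks up the local repository, a quick check such as the following can be used:

Bash

# Rebuild the yum cache and make sure the localimp repo shows up
yum clean all
yum repolist | grep localimp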
- Install the packages with the following command

Bash

yum install -y impala impala-server impala-state-store impala-catalog impala-shell
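
After the installation finishes, the installed Impala packages can be listed to confirm nothing was skipped:

Bash

# List the Impala packages installed from the local repo
rpm -qa | grep impala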
- Copy the Hive configuration file (metastore-site.xml) into Impala's configuration directory:

Bash

# Copy the prepared conf into /etc/impala/conf/
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
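
It is worth double-checking that the copied file actually points Impala at the metastore. A minimal check, assuming the file sets the standard hive.metastore.uris property:

Bash

# The metastore URI should appear in the copied hive-site.xml
grep -A 1 "hive.metastore.uris" /etc/impala/conf/hive-site.xml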
- Add the S3 configuration with vim /etc/impala/conf/core-site.xml, referring to the impala-s3 configuration:
XML

<configuration>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.azure.user.agent.prefix</name>
    <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.bj.bcebos.com</value>
    <description>endpoint</description>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>${AK}</value>
    <description>AK</description>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>${SK}</value>
    <description>SK</description>
  </property>
</configuration>
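
Before starting Impala, the ${bucket}, ${AK}, and ${SK} placeholders need to be replaced with a real bucket name and BOS credentials. A hypothetical one-liner (my-bigdata matches the bucket used in the impala-shell session below; the key values are illustrative only):

Bash

# Substitute the placeholders with real values (illustrative bucket name and keys)
sed -i 's|${bucket}|my-bigdata|g; s|${AK}|YOUR_ACCESS_KEY|g; s|${SK}|YOUR_SECRET_KEY|g' /etc/impala/conf/core-site.xml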
- Modify the bigtop configuration to set JAVA_HOME, and make sure the impala user also has access to it. Update the java_home path for bigtop (on all 3 machines):

Bash

vim /etc/default/bigtop-utils
export JAVA_HOME=/export/servers/jdk1.8.0_65
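
To confirm the JDK path is valid and usable by the impala user, a quick check such as the following can be run (assuming the JDK path above):

Bash

# Verify the JDK exists and that the impala user can execute it
ls -ld /export/servers/jdk1.8.0_65
sudo -u impala /export/servers/jdk1.8.0_65/bin/java -version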
- Create a symbolic link for the MySQL driver:

Bash

ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
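
The link target above is relative, so it resolves inside /usr/share/java; it is worth confirming that the versioned jar is actually there and the link is not dangling:

Bash

# -L follows the symlink; an error here means the link is dangling
ls -lL /usr/share/java/mysql-connector-java.jar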
- Start Impala

Bash

service impala-state-store start
service impala-catalog start
service impala-server start
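
Whether the three daemons came up can be checked via their service status or their listening ports (21000 and 25000 below are the impala-shell and impalad web UI ports seen later in this guide):

Bash

# Check daemon status and the default listening ports
service impala-state-store status
service impala-catalog status
service impala-server status
netstat -lnpt | grep -E '21000|25000'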
After startup, logs can be viewed under the /var/log/impala directory. Run the impala-shell command:
Bash

[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.

Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3         |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
The newly generated files can be seen under the corresponding path:
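
For example, they can be listed with the Hadoop client (a sketch, assuming a Hadoop client configured with the same s3a settings is available; the path follows the database location and table name used above):

Bash

# List the files produced by the INSERT statements under the table's S3 directory
hadoop fs -ls s3a://my-bigdata/impala/s3/hive_test/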

