Impala Usage Guide
Last updated: 2024-08-15
Impala
Impala is an MPP (massively parallel processing) SQL query engine for processing large volumes of data stored in a Hadoop cluster. It is open-source software written in C++ and Java. Compared with other SQL engines for Hadoop, it offers high performance and low latency.
Installation Steps
Install the metastore
Install and configure the metastore by following the "S3-based Presto access" section of the Presto Usage Guide.
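
Before moving on, it can help to confirm that the metastore service is actually up. A minimal check, assuming the metastore is running with its default Thrift port 9083:

Bash

# Check that the Hive metastore Thrift service is listening (default port 9083)
netstat -lnpt | grep 9083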
Install Impala
- Download the RPM tarball from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz
Extract it with tar -zxvf cdh5.14.0-centos6.tar.gz, then cd cdh/5.14.0 and start a local server by running:

Bash

python -m SimpleHTTPServer 8092 &
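
As a quick sanity check, confirm that the local HTTP server responds (a sketch, assuming it was started in cdh/5.14.0 as above):

Bash

# The repo content should be reachable on the port used above
curl -I http://127.0.0.1:8092/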
- Configure a local yum repository

Bash

vim /etc/yum.repos.d/localimp.repo
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
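
To confirm that yum picks up the local repository, a quick check such as the following can be used:

Bash

# Rebuild the yum cache and make sure the localimp repo shows up
yum clean all
yum repolist | grep localimp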
- Install the packages with the following command

Bash

yum install -y impala impala-server impala-state-store impala-catalog impala-shell
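
After the installation finishes, the installed Impala packages can be listed to confirm nothing was skipped:

Bash

# List the Impala packages installed from the local repo
rpm -qa | grep impala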
- Copy the Hive configuration file (metastore-site.xml) into Impala's configuration directory:

Bash

# Copy the prepared conf into /etc/impala/conf/
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
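
It is worth double-checking that the copied file actually points Impala at the metastore. A minimal check, assuming the file sets the standard hive.metastore.uris property:

Bash

# The metastore URI should appear in the copied hive-site.xml
grep -A 1 "hive.metastore.uris" /etc/impala/conf/hive-site.xml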
- Add the S3 configuration with vim /etc/impala/conf/core-site.xml, referring to the impala-s3 configuration:
XML

<configuration>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.azure.user.agent.prefix</name>
    <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.bj.bcebos.com</value>
    <description>endpoint</description>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>${AK}</value>
    <description>AK</description>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>${SK}</value>
    <description>SK</description>
  </property>
</configuration>
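
Before starting Impala, the ${bucket}, ${AK}, and ${SK} placeholders need to be replaced with a real bucket name and BOS credentials. A hypothetical one-liner (my-bigdata matches the bucket used in the impala-shell session below; the key values are illustrative only):

Bash

# Substitute the placeholders with real values (illustrative bucket name and keys)
sed -i 's|${bucket}|my-bigdata|g; s|${AK}|YOUR_ACCESS_KEY|g; s|${SK}|YOUR_SECRET_KEY|g' /etc/impala/conf/core-site.xml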
- Modify the bigtop configuration to set JAVA_HOME, and make sure the impala user also has access to it. Update the java_home path for bigtop (on all 3 machines):

Bash

vim /etc/default/bigtop-utils
export JAVA_HOME=/export/servers/jdk1.8.0_65
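
To confirm the JDK path is valid and usable by the impala user, a quick check such as the following can be run (assuming the JDK path above):

Bash

# Verify the JDK exists and that the impala user can execute it
ls -ld /export/servers/jdk1.8.0_65
sudo -u impala /export/servers/jdk1.8.0_65/bin/java -version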
- Create a symbolic link for the MySQL driver:

Bash

ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
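
The link target above is relative, so it resolves inside /usr/share/java; it is worth confirming that the versioned jar is actually there and the link is not dangling:

Bash

# -L follows the symlink; an error here means the link is dangling
ls -lL /usr/share/java/mysql-connector-java.jar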
- Start Impala

Bash

service impala-state-store start
service impala-catalog start
service impala-server start
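
Whether the three daemons came up can be checked via their service status or their listening ports (21000 and 25000 below are the impala-shell and impalad web UI ports seen later in this guide):

Bash

# Check daemon status and the default listening ports
service impala-state-store status
service impala-catalog status
service impala-server status
netstat -lnpt | grep -E '21000|25000'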
After startup, logs can be viewed under the /var/log/impala directory. Run the impala-shell command:
Bash

[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.

Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3         |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
The newly generated files can be seen under the corresponding path:
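
For example, they can be listed with the Hadoop client (a sketch, assuming a Hadoop client configured with the same s3a settings is available; the path follows the database location and table name used above):

Bash

# List the files produced by the INSERT statements under the table's S3 directory
hadoop fs -ls s3a://my-bigdata/impala/s3/hive_test/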

