Importing a PostgreSQL Table into Hive with Sqoop

To import a PostgreSQL table into a Hive table with Sqoop, first make sure Sqoop and its dependencies (including the PostgreSQL JDBC driver) are installed. Then proceed as follows:

1. Create a script named sqoop_import.sh with the following content:

```bash
#!/bin/bash
sqoop import \
--connect jdbc:postgresql://<host>:<port>/<database> \
--username <username> \
--password <password> \
--table <source table> \
--hive-import \
--hive-table <target Hive table> \
-m 1
```

Replace <host>, <port>, <database>, <username>, <password>, <source table>, and <target Hive table> with actual values.

2. Make the script executable:

```bash
chmod +x sqoop_import.sh
```

3. Run the script:

```bash
./sqoop_import.sh
```

The data in the PostgreSQL table will then be imported into the Hive table.

Sqoop is a tool for bulk data transfer between Hadoop and structured data stores such as relational databases. It can import data from a relational database into HDFS, Hive, or HBase, and can also export data from Hadoop back into a relational database. This article walks through using Sqoop to import a PostgreSQL table into a Hive table.

Environment Preparation

1. Install and configure Hadoop, Hive, PostgreSQL, and Sqoop.
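Before running any import, it is worth confirming that Sqoop can find the PostgreSQL JDBC driver, which must be copied into Sqoop's lib directory. A minimal check, assuming a typical `$SQOOP_HOME` layout (the paths are assumptions; adjust them for your install):

```bash
#!/bin/bash
# Look for the PostgreSQL JDBC driver jar in Sqoop's lib directory.
# SQOOP_HOME and the /usr/lib/sqoop fallback are assumptions about your install.
SQOOP_LIB="${SQOOP_HOME:-/usr/lib/sqoop}/lib"

find_jdbc_jar() {
    # Print the first postgresql-*.jar found in the given directory, if any.
    ls "$1"/postgresql-*.jar 2>/dev/null | head -n 1
}

jar=$(find_jdbc_jar "$SQOOP_LIB")
if [ -n "$jar" ]; then
    echo "PostgreSQL JDBC driver found: $jar"
else
    echo "No postgresql-*.jar in $SQOOP_LIB; download the driver and copy it there."
fi
```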


2. Create a table in PostgreSQL:

CREATE TABLE test_postgresql (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT
);
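A few sample rows make it easy to verify the import later. For example, run the following in psql against the test database (the values are purely illustrative):

```sql
INSERT INTO test_postgresql (id, name, age) VALUES
    (1, 'Alice', 30),
    (2, 'Bob',   25),
    (3, 'Carol', 41);
```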

3. Create a table in Hive with the same structure as the PostgreSQL table:


CREATE TABLE test_hive (
    id INT,
    name STRING,
    age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
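To confirm the Hive table was created with the expected columns, you can describe it through Beeline (the connection URL assumes a HiveServer2 listening on localhost:10000; adjust for your cluster):

```bash
# Show the schema of the target Hive table.
beeline -u "jdbc:hive2://localhost:10000/default" -e "DESCRIBE test_hive;"
```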

Importing the PostgreSQL Table into Hive with Sqoop

1. Run the following command to import the data from the PostgreSQL table into the Hive table:

sqoop import \
  --connect jdbc:postgresql://localhost:5432/test \
  --username postgres \
  --password password \
  --table test_postgresql \
  --columns "id,name,age" \
  --hive-import \
  --hive-table test_hive \
  --fields-terminated-by '\t' \
  --null-string '\\N' \
  --null-non-string '\\N' \
  -m 1
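Note that passing --password on the command line exposes the credential in shell history and process listings. Sqoop also accepts --password-file, which reads the password from a protected file in HDFS. A sketch (the file paths are assumptions):

```bash
# Store the password in a protected HDFS file instead of passing it inline.
# echo -n matters: a trailing newline would become part of the password.
echo -n 'password' > pg.password
hdfs dfs -put -f pg.password /user/hadoop/pg.password
hdfs dfs -chmod 400 /user/hadoop/pg.password

sqoop import \
  --connect jdbc:postgresql://localhost:5432/test \
  --username postgres \
  --password-file /user/hadoop/pg.password \
  --table test_postgresql \
  --hive-import \
  --hive-table test_hive \
  -m 1
```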

2. Once the command completes, Sqoop will have imported the PostgreSQL data into the Hive table. You can inspect the data with:


beeline -u "jdbc:hive2://localhost:10000/default" -e "SELECT * FROM test_hive;" --outputformat=tsv2
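A quick way to validate the import is to compare row counts on both sides (the psql flags and the HiveServer2 URL are assumptions about your environment):

```bash
# Count rows in the source PostgreSQL table...
psql -h localhost -p 5432 -U postgres -d test -t -c "SELECT COUNT(*) FROM test_postgresql;"
# ...and in the target Hive table; the two numbers should match.
beeline -u "jdbc:hive2://localhost:10000/default" -e "SELECT COUNT(*) FROM test_hive;"
```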


Original article by K-seo. If reproduced, please cite the source: https://www.kdun.cn/ask/503684.html
