1. Create a script file named sqoop_import.sh with the following content:

```bash
#!/bin/bash
sqoop import \
  --connect jdbc:postgresql://<database-host>:<port>/<database-name> \
  --username <username> \
  --password <password> \
  --table <source-table> \
  --hive-import \
  --hive-table <target-hive-table> \
  -m 1
```

Replace <database-host>, <port>, <database-name>, <username>, <password>, <source-table>, and <target-hive-table> with the actual values.

2. Make the script executable:

```bash
chmod +x sqoop_import.sh
```

3. Run the script:

```bash
./sqoop_import.sh
```
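If the script will be run in more than one environment, the placeholders can come from shell variables instead of hand edits. A minimal dry-run sketch (the variable names and defaults below are illustrative, not part of Sqoop itself):

```bash
#!/bin/bash
# Illustrative defaults for a local test setup; override via the environment.
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
DB_NAME="${DB_NAME:-test}"
DB_USER="${DB_USER:-postgres}"
SRC_TABLE="${SRC_TABLE:-test_postgresql}"
HIVE_TABLE="${HIVE_TABLE:-test_hive}"

# Assemble the sqoop invocation and echo it, so the final command can be
# reviewed (a dry run) before replacing the echo with a real execution.
CMD="sqoop import --connect jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME} \
--username ${DB_USER} --table ${SRC_TABLE} \
--hive-import --hive-table ${HIVE_TABLE} -m 1"
echo "$CMD"
```

Passwords are deliberately left out of the sketch; in practice `--password-file` or an interactive `-P` prompt avoids putting credentials in scripts or shell history.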
With that, the data in the PostgreSQL table is imported into the Hive table.

Sqoop is a tool for bulk data transfer between Hadoop and structured data stores such as relational databases. It can import data from a relational database into HDFS, Hive, or HBase, and can also export data from Hadoop back into a relational database. This article walks through using Sqoop to import a PostgreSQL table into a Hive table.
Environment Preparation
1. Install and configure Hadoop, Hive, PostgreSQL, and Sqoop.
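Before going further, it can save time to confirm that the command-line clients this setup relies on are actually on the PATH. A small helper sketch (the function name `check_tools` is my own, and the list assumes the `psql` client for PostgreSQL):

```bash
# Print any of the given commands that are not found on PATH; return
# non-zero if at least one is missing.
check_tools() {
  local rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return $rc
}

# Verify the clients used in this walkthrough.
check_tools hadoop hive psql sqoop || echo "install the missing tools before continuing"
```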
2. Create a table in PostgreSQL:

```sql
CREATE TABLE test_postgresql (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT
);
```
3. Create a table in Hive with the same structure as the PostgreSQL table. A plain table is sufficient, because Sqoop writes the imported data into it directly:

```sql
CREATE TABLE test_hive (
    id INT,
    name STRING,
    age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```
Importing the PostgreSQL Table into Hive with Sqoop
1. Run the following command to import the data from the PostgreSQL table into the Hive table:

```bash
sqoop import \
  --connect jdbc:postgresql://localhost:5432/test \
  --username postgres \
  --password password \
  --table test_postgresql \
  --columns "id,name,age" \
  --hive-import \
  --hive-table test_hive \
  --fields-terminated-by '\t' \
  --null-string '\\N' \
  --null-non-string '\\N' \
  -m 1
```
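`-m 1` runs the whole import in a single map task, which is the safe default for small tables. For larger tables, Sqoop can parallelize across several mappers if it is told which column to split the key range on. A dry-run sketch of that variant (the mapper count and split column are illustrative; the command is echoed rather than executed so it can be inspected first):

```bash
# Parallel variant of the import: 4 mappers, splitting on the primary key.
PARALLEL_CMD="sqoop import \
--connect jdbc:postgresql://localhost:5432/test \
--username postgres --password password \
--table test_postgresql \
--hive-import --hive-table test_hive \
--split-by id \
-m 4"
echo "$PARALLEL_CMD"
```

`--split-by` should name a column whose values are reasonably evenly distributed, such as a numeric primary key; otherwise some mappers get far more rows than others.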
2. Once the command completes, Sqoop will have imported the PostgreSQL data into the Hive table. You can inspect the result with:

```bash
beeline -u "jdbc:hive2://localhost:10000/default" \
  -e "SELECT * FROM test_hive;" \
  --outputformat=tsv2
```
Original article by K-seo. If reposting, please credit the source: https://www.kdun.cn/ask/503684.html