Unable to create Hive table using Presto from a CSV File

Tags: amazon-s3, sql, csv, hive, presto



I want to create a Hive table using Presto, with the data stored in a CSV file on S3.


I have uploaded the file to S3 and verified that Presto can connect to the bucket.


The CREATE TABLE statement succeeds, but when I query the table, every value in every row comes back as NULL.


I looked for similar issues, but Presto is not widely covered on Stack Overflow.


The header and first twenty rows of the file:


PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S
12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S
13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S
14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S
15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S
16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S
17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S
19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C

My CSV file is train.csv from the Titanic dataset. My Presto command is:


create table testing_nan_4 (
    PassengerId integer,
    Survived integer,
    Pclass integer,
    Name varchar,
    Sex varchar,
    Age integer,
    SibSp integer,
    Parch integer,
    Ticket integer,
    Fare double,
    Cabin varchar,
    Embarked varchar
) with (
    external_location = 's3://my_bucket/titanic_train/',
    format = 'textfile'
);

The results are:


 passengerid | survived | pclass | name | sex  | age  | sibsp | parch | ticket | fare | cabin | embarked
-------------+----------+--------+------+------+------+-------+-------+--------+------+-------+----------
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL
 NULL        | NULL     | NULL   | NULL | NULL | NULL | NULL  | NULL  | NULL   | NULL | NULL  | NULL

The expected result is the actual data from the file.
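A quick way to confirm the file is being read but simply not split into fields is to map each line to a single VARCHAR column. A minimal diagnostic sketch (testing_nan_raw is a hypothetical table name):

create table testing_nan_raw (line varchar)
with (
    external_location = 's3://my_bucket/titanic_train/',
    format = 'textfile'
);

select line from testing_nan_raw limit 3;

If each full CSV line comes back intact in the single column, the data is reachable and the problem is purely how the text serde splits fields.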


Answers for "Unable to create Hive table using Presto from a CSV File"

#1. Unable to create Hive table using Presto from a CSV File

Jun 16, 2019 · The original Stack Overflow question (asked June 2019, viewed 9k times); the suggestions below come from its answers.



The problem is the storage format: with format = 'textfile', the Hive text serde splits fields on its default delimiter (Ctrl-A, \u0001) rather than on commas, and it does not understand quoting, so each comma-separated line lands in a single unparseable field and every typed column reads as NULL. Use the CSV storage format instead. The Hive connector exposes CSV-specific table properties (csv_escape, csv_quote, csv_separator); for example:

CREATE TABLE hive.default.csv_table_with_custom_parameters (
    c_bigint varchar,
    c_varchar varchar
) WITH (
    csv_escape = '',
    csv_quote = '',
    csv_separator = U&'\0001', -- to pass a unicode character
    external_location = 'hdfs://hadoop/datacsv_table_with_custom_parameters',
    format = 'CSV')

Applied to the Titanic file. One caveat: Presto's CSV format supports only VARCHAR columns (and string is not a Presto type at all), so declare everything varchar and cast in queries:

CREATE TABLE hive.default.csv_table_with_custom_parameters (
    PassengerId varchar, Survived varchar, Pclass varchar, Name varchar,
    Sex varchar, Age varchar, SibSp varchar, Parch varchar,
    Ticket varchar, Fare varchar, Cabin varchar, Embarked varchar
) WITH (
    csv_escape = '\',
    csv_quote = '"',
    csv_separator = ',',
    skip_header_line_count = 1, -- skip the header row (supported in recent Presto versions)
    external_location = 's3://my_bucket/titanic_train/',
    format = 'CSV')

Alternatively, create the table from Hive itself with OpenCSVSerde, which does understand quoted fields, and skip the header row:

CREATE EXTERNAL TABLE mytable (
    PassengerId int, Survived int, Pclass int, Name string,
    Sex string, Age int, SibSp int, Parch int,
    Ticket int, Fare double, Cabin string, Embarked string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"',
    'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://bucket-path/csv_data/'
TBLPROPERTIES ("skip.header.line.count" = "1")

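Since a CSV-format table stores everything as VARCHAR, queries typically cast on read. A minimal sketch against the Presto table defined above (TRY_CAST is used because Age is blank for some passengers):

SELECT CAST(PassengerId AS integer) AS passenger_id,
       Name,
       TRY_CAST(Age AS integer) AS age,   -- blank ages become NULL instead of failing
       TRY_CAST(Fare AS double) AS fare
FROM hive.default.csv_table_with_custom_parameters
LIMIT 5;
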
#2. Unable to create a Hive table from the .csv file that exists in a directory in an S3 location

May 03, 2022 · Unable to create a Hive table from a .csv file that exists in a subdirectory of the S3 location (s3a://test/dir2/); the same statement works when the .csv file is directly at the bucket root (s3a://test/).


export AWS_ACCESS_KEY_ID=xMK6bdX8iY**************************************
export AWS_SECRET_KEY=34***************************************
set fs.s3a.endpoint=cluster.domain.*;
set fs.s3a.access.key=$$$$$$$$$$$$$$###;
set fs.s3a.secret.key=####$$$$;
0: jdbc:hive2://> CREATE EXTERNAL TABLE s3dir (
    col1 int,
    col2 string,
    col3 string,
    col4 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://test/dir2/'
TBLPROPERTIES (
    "s3select.format" = "csv"
);
22/05/03 03:06:32 [2199007f-0721-4e46-89b6-40cef824235c main]: WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
22/05/03 03:06:36 [HiveServer2-Background-Pool: Thread-71]: ERROR exec.Task: Failed
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)))
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1170) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1175) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:140) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:82) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:88) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:327) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_322]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_322]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) [hadoop-common-3.1.1.7.1.7.1000-141.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:345) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63918) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63886) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63812) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1796) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1783) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3622) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1082) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1067) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at com.sun.proxy.$Proxy35.createTable(Unknown Source) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3515) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at com.sun.proxy.$Proxy35.createTable(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1159) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
... 28 more
22/05/03 03:06:36 [HiveServer2-Background-Pool: Thread-71]: ERROR exec.Task: DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)))
ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)))
Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))) (state=08S01,code=40000)
0: jdbc:hive2://> CREATE EXTERNAL TABLE s3notdir (
    col1 int,
    col2 string,
    col3 string,
    col4 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://test/'
TBLPROPERTIES (
    "s3select.format" = "csv"
);
OK
No rows affected (2.223 seconds)
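The stack trace shows the AccessDeniedException is raised while the Hive metastore validates the table location, so the AWS keys must be visible to the HiveServer2/metastore processes (for example via fs.s3a.access.key and fs.s3a.secret.key in their core-site.xml), not just exported in the beeline client's shell. A quick way to confirm that the keys themselves work from a given host (placeholder values, passed for this one command only):

hdfs dfs -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
         -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
         -ls s3a://test/dir2/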

#3. Load and Query CSV File in S3 with Presto | by Yifeng …

Jul 08, 2020 · Create an external table for the CSV data. You can create many tables under a single schema. Note: a CSV-format table currently supports only the VARCHAR data type.


head -n 3 tlc_yellow_trips_2018.csv
VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount
2,05/19/2018 11:51:48 PM,05/20/2018 12:07:31 AM,1,2.01,1,N,48,158,2,11.5,0.5,0.5,0,0,0.3,12.8
1,05/19/2018 11:22:53 PM,05/19/2018 11:35:14 PM,1,1.3,1,N,142,164,2,9,0.5,0.5,0,0,0.3,10.3
1,05/19/2018 11:37:02 PM,05/19/2018 11:52:41 PM,1,2.2,1,N,164,114,1,11,0.5,0.5,3.05,0,0.3,15.35
hdfs dfs -mkdir -p s3a://deephub/warehouse/nyc_text.db/tlc_yellow_trips_2018
s5cmd --endpoint-url=http://192.168.170.12:80 cp tlc_yellow_trips_2018.csv s3://deephub/warehouse/nyc_text.db/tlc_yellow_trips_2018/tlc_yellow_trips_2018.csv
presto-cli --server <coordinate_node:port> --catalog hive

presto> CREATE SCHEMA nyc_text WITH (LOCATION = 's3a://deephub/warehouse/nyc_text.db');

presto> CREATE TABLE hive.nyc_text.tlc_yellow_trips_2018 (
    vendorid VARCHAR,
    tpep_pickup_datetime VARCHAR,
    tpep_dropoff_datetime VARCHAR,
    passenger_count VARCHAR,
    trip_distance VARCHAR,
    ratecodeid VARCHAR,
    store_and_fwd_flag VARCHAR,
    pulocationid VARCHAR,
    dolocationid VARCHAR,
    payment_type VARCHAR,
    fare_amount VARCHAR,
    extra VARCHAR,
    mta_tax VARCHAR,
    tip_amount VARCHAR,
    tolls_amount VARCHAR,
    improvement_surcharge VARCHAR,
    total_amount VARCHAR
)
WITH (FORMAT = 'CSV',
      skip_header_line_count = 1,
      EXTERNAL_LOCATION = 's3a://deephub/warehouse/nyc_text.db/tlc_yellow_trips_2018');

presto> SELECT * FROM nyc_text.tlc_yellow_trips_2018 LIMIT 10;
hdfs dfs -mkdir -p s3a://deephub/warehouse/nyc_parq.db

presto> CREATE SCHEMA nyc_parq WITH (LOCATION = 's3a://deephub/warehouse/nyc_parq.db');

presto> CREATE TABLE hive.nyc_parq.tlc_yellow_trips_2018
COMMENT '2018 Newyork City taxi data'
WITH (FORMAT = 'PARQUET')
AS SELECT
    cast(vendorid as INTEGER) as vendorid,
    date_parse(tpep_pickup_datetime, '%m/%d/%Y %h:%i:%s %p') as tpep_pickup_datetime,
    date_parse(tpep_dropoff_datetime, '%m/%d/%Y %h:%i:%s %p') as tpep_dropoff_datetime,
    cast(passenger_count as SMALLINT) as passenger_count,
    cast(trip_distance as DECIMAL(8, 2)) as trip_distance,
    cast(ratecodeid as INTEGER) as ratecodeid,
    cast(store_and_fwd_flag as CHAR(1)) as store_and_fwd_flag,
    cast(pulocationid as INTEGER) as pulocationid,
    cast(dolocationid as INTEGER) as dolocationid,
    cast(payment_type as SMALLINT) as payment_type,
    cast(fare_amount as DECIMAL(8, 2)) as fare_amount,
    cast(extra as DECIMAL(8, 2)) as extra,
    cast(mta_tax as DECIMAL(8, 2)) as mta_tax,
    cast(tip_amount as DECIMAL(8, 2)) as tip_amount,
    cast(tolls_amount as DECIMAL(8, 2)) as tolls_amount,
    cast(improvement_surcharge as DECIMAL(8, 2)) as improvement_surcharge,
    cast(total_amount as DECIMAL(8, 2)) as total_amount
FROM hive.nyc_text.tlc_yellow_trips_2018;

presto> SELECT * FROM nyc_parq.tlc_yellow_trips_2018 LIMIT 10;

presto> describe nyc_parq.tlc_yellow_trips_2018;

 Column                | Type         | Extra | Comment
-----------------------+--------------+-------+---------
 vendorid              | integer      |       |
 tpep_pickup_datetime  | timestamp    |       |
 tpep_dropoff_datetime | timestamp    |       |
 passenger_count       | smallint     |       |
 trip_distance         | decimal(8,2) |       |
 ratecodeid            | integer      |       |
 store_and_fwd_flag    | char(1)      |       |
 pulocationid          | integer      |       |
 dolocationid          | integer      |       |
 payment_type          | smallint     |       |
 fare_amount           | decimal(8,2) |       |
 extra                 | decimal(8,2) |       |
 mta_tax               | decimal(8,2) |       |
 tip_amount            | decimal(8,2) |       |
 tolls_amount          | decimal(8,2) |       |
 improvement_surcharge | decimal(8,2) |       |
 total_amount          | decimal(8,2) |       |
(17 rows)
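With the Parquet copy in place the columns are properly typed, so numeric work needs no casts. An illustrative aggregate (this query is mine, not from the original walkthrough):

SELECT payment_type,
       count(*) AS trips,
       avg(total_amount) AS avg_total
FROM hive.nyc_parq.tlc_yellow_trips_2018
GROUP BY payment_type
ORDER BY trips DESC;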

#4. hadoop - Unable to create table in Hive for csv file - Stack Overflow

Jul 09, 2016 · The CREATE TABLE fails because the field delimiter is wrapped in typographic quotes (‘,’) rather than plain ASCII single quotes. Running the script through cat -v makes the offending bytes visible: M-bM-^@M-^X and M-bM-^@M-^Y are the UTF-8 sequences for ‘ and ’.


CREATE TABLE TestData ( id1 int, id2 int, id3 int, id4 String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ stored as textfile;

FAILED: ParseException line 1:106 mismatched input ',' expecting StringLiteral near 'BY' in table row format's field separator 

cat -v file.txt
CREATE TABLE TestData ( id1 int, id2 int, id3 int, id4 String) ROW FORMAT DELIMITED FIELDS TERMINATED BY M-bM-^@M-^X,M-bM-^@M-^Y stored as textfile;
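The fix is to retype the delimiter with plain ASCII quotes; the statement is otherwise unchanged:

CREATE TABLE TestData (
    id1 int,
    id2 int,
    id3 int,
    id4 String
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;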

#5. Create Hive tables from CSV files - Cloudera Community


CREATE EXTERNAL TABLE table_name (
    colA ...,
    colB ...
)
PARTITIONED BY (...)
LOCATION '...';

-- Then register the partitions, either in bulk:
msck repair table table_name;
-- or one at a time:
alter table table_name add partition (...);
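A concrete sketch of that pattern, assuming hypothetical daily CSV drops under s3://my_bucket/events/dt=YYYY-MM-DD/ (table, column, and path names are illustrative):

CREATE EXTERNAL TABLE events (
    event_id string,
    payload string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my_bucket/events/';

-- Discover every dt=... directory already present under the location:
MSCK REPAIR TABLE events;

-- Or register a single partition explicitly:
ALTER TABLE events ADD PARTITION (dt='2019-06-16')
LOCATION 's3://my_bucket/events/dt=2019-06-16/';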

#6. Hive Load CSV File into Table - Spark by {Examples}


LOAD DATA [LOCAL] INPATH 'filepath'
[OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
[INPUTFORMAT 'inputformat' SERDE 'serde']

-- Example table:
CREATE TABLE IF NOT EXISTS emp.employee (
    id int,
    name string,
    age int,
    gender string
)
COMMENT 'Employee Table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load a CSV file that is already in HDFS (the file is moved, not copied,
-- into the table's warehouse directory):
LOAD DATA INPATH '/user/hive/data/data.csv'
INTO TABLE emp.employee;

hdfs dfs -ls /user/hive/warehouse/emp.db/employee/
-rw-r--r--   1 hive supergroup   52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt

SELECT * FROM emp.employee;

-- Load from the local filesystem instead of HDFS:
LOAD DATA LOCAL INPATH '/home/hive/data.csv'
INTO TABLE emp.employee;

-- OVERWRITE replaces the table's existing data:
LOAD DATA LOCAL INPATH '/home/hive/data.csv'
OVERWRITE INTO TABLE emp.employee;

-- Load into a specific partition (the target table must be partitioned):
LOAD DATA LOCAL INPATH '/home/hive/data.csv'
OVERWRITE INTO TABLE emp.employee PARTITION(date=2020);

-- Append individual rows:
INSERT INTO emp.employee values(7,'scott',23,'M');
INSERT INTO emp.employee values(8,'raman',50,'M');
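One behavioral difference worth verifying: LOAD DATA INPATH (without LOCAL) moves the source file, while LOCAL copies it from the client machine. A quick check, assuming the paths used above:

# The staging file disappears after a non-LOCAL load...
hdfs dfs -ls /user/hive/data/
# ...and reappears under the table's warehouse directory:
hdfs dfs -ls /user/hive/warehouse/emp.db/employee/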

#7. Create your first table on Hive using data from CSV

Jun 15, 2021 · In this section we download a CSV file to the local machine, transfer it to HDFS, and create a Hive table over it so the data can be queried with plain SQL.


start-dfs.sh
hadoop fs -mkdir /user/harssing/bds/
hadoop fs -put /user/harssing/downloads/yellow_tripdata_2019-01.csv /user/harssing/bds/

CREATE TABLE u_harssing.cabs (
    VendorID int, pickup timestamp, dropoff timestamp, passenger_count int,
    trip_distance float, RatecodeID int, store_and_fwd_flag string,
    PULocationID int, DOLocationID int, payment_type int, fare_amount int,
    extra int, mta_tax int, tip_amount int, tolls_amount int,
    improvement_surcharge int, total_amount int, congestion_surcharge int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"'
)
STORED AS TEXTFILE
LOCATION 'hdfs:///user/harssing/bds/'
TBLPROPERTIES('skip.header.line.count'='1');

SELECT * FROM u_harssing.cabs LIMIT 10;
CREATE TABLE u_harssing.cabs_orc (
    VendorID int, pickup timestamp, dropoff timestamp, passenger_count int,
    trip_distance float, RatecodeID int, store_and_fwd_flag string,
    PULocationID int, DOLocationID int, payment_type int, fare_amount int,
    extra int, mta_tax int, tip_amount int, tolls_amount int,
    improvement_surcharge int, total_amount int, congestion_surcharge int
)
STORED AS ORC;

INSERT INTO u_harssing.cabs_orc SELECT * FROM u_harssing.cabs;
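Note that OpenCSVSerde exposes every column as a string regardless of the declared types, so the INSERT ... SELECT above relies on Hive's implicit conversions while writing into the typed ORC columns. A quick sanity check on the copy (the query is illustrative):

SELECT count(*) AS rows_loaded,
       min(pickup) AS first_pickup,
       max(total_amount) AS max_total
FROM u_harssing.cabs_orc;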
