In Previous chapter we learned about HIVE DATA TYPES and today lets check out HIVE TABLE CREATION. In HIVE there are two kinds of tables , Internal and External. Lets check each one of them separately and at the end we will see the difference between them.
Table of Contents
HIVE Internal Table
Internal table is the one that gets created when we create a table without the External keyword. Internal tables are also known as Managed Tables.
How to Create Internal Table in HIVE
The Table creation in Hive is similar to SQL but with many additional features. In this blog I am listing down only few of the important ones.
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name (col_name data_type [COMMENT 'col_comment'], ...) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment..'], ...)] [COMMENT 'table_comment..'] [STORED AS file_format] [LOCATION 'hdfs_path'] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
- CREATE – Keyword to create a table.
- EXTERNAL – This is optional if we do not specify this then Internal table is created.
- IF NOT EXISTS – If the table you are trying to create already exists, then hive throws an error. To avoid that we use this keyword to tell hive to create the table only if it does not exists .
- COMMENT – You describe the column by adding a comment.
- PARTITIONED BY – If you have a partition column then you can mention the column here. We will study partition in dept in Later Chapters.
- STORED AS – This specifies the type of file in which you want to store the data.
- LOCATION – By default Internal table stores the data in hive.metastore.warehouse.dir (/user/hive/warehouse/databasename.db). If you want to store the data in some other location you can specify here.
- TBLPROPERTIES – You can provide table properties as key value pairs. Some of the properties which gets created by default are last_modified_user and last_modified_time.
Create table Emp1( Name String , Age Int) Show create table Emp1 CREATE TABLE `Emp1`( `name` string, `age` int) ** LOCATION 'hdfs://**/user/hive/warehouse/emp1' **
HIVE External Table
If you need to access any file present outside of HIVE warehouse you need to create external tables on top of them Ex- HDFS , Azure Storage. Imagine there is a file in some HDFS location and you need to read the data. In such case you create an external table and provide the location of this table where the file resides.
Remember hive doesn’t even check if the location exists while creating the Table.
How to Create External Table in HIVE
Creation of Hive External table is similar to internal table with EXTERNAL keyword added to it.
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name (col_name data_type [COMMENT 'col_comment'], ...) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment..'], ...)] [COMMENT 'table_comment..'] [STORED AS file_format] [LOCATION 'hdfs_path'] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
Note that here the EXTERNAL keyword is mandatory. The remaining properties are same as internal tables, refer the description at the start of this page.
HIVE External vs Internal Table
DIFFERENCE – Based on Data Storage
Internal tables stores that data by default in hive.metastore.warehouse.dir (/user/hive/warehouse/databasename.db)
External tables the data is present outside of Hive. Ex- HDFS , Azure Storage
Both Internal and External Table stores the metadata in Hive Metastore.
DIFFERENCE – Based on Table Drop
On dropping Internal table, Hive deletes both the metadata and actual data.
On dropping External table Hive only deletes the metadata but the actual data stays as it is.
DIFFERENCE – Based on TRUNCATE
Truncate works on Internal Table.
Truncate doesn’t work on External Table.
HIVE CREATE Table FAQ
How do I add table properties in hive?
There are various way in which the table properties can be altered, but the simplest would be to run an alter table query. Check the query below.
ALTER TABLE TABLE_NAME SET TBLPROPERTIES( key =value..)
To check the other methods you can visit this link.
Can we truncate external table in hive?
No we cannot. The reason being Hive does not own the data . But if it is an internal table then we can remove all the records. Remember when we truncate the any table whether internal or external only the records are deleted but the partitions remains as it is.
How can you tell if a table is external in hive?
There are different way to tell this but the simplest of them all is to run SHOW CREATE TABLE TABLE_NAME . This will show you the table creation statement .If you find EXTERNAL keyword then you know its an external table.
show create table tbl_nm CREATE EXTERNAL TABLE `tbl_nm`( `col1` string , ... )
Where does hive stores its data?
Hive stores the data in hive compatible file systems like HDFS, S3. It is a common misconception that the hive metastore stores the actual data, but metastore only stores metadata information of a table like location , partition columns etc. You can check this interesting article on Hive Metastore here.
Can we create an External table without providing the location?
YES ,when we create an external table without providing any location, it gets created in default hive location i.e user/hive/warehouse/ .Remember that we normally shouldn’t be doing this because if we want the data to be stores in warehouse we should create an internal table.
Can we create Internal Table by Providing an External Location?
Yes we can.
CREATE TABLE `Emp1`( `name` string, `age` int) LOCATION 'hdfs://** Location'
What happens to files when we drop an Internal Table pointing to external location.
As its an Internal Table, both the metadata and the data in the external location are deleted.
Can we insert data into Hive external table?
Yes, there are various ways by which we can do that. Lets check it with examples. Imagine we have an external table with 2 columns ‘name’ and ‘age’.
- Using Select.
insert into table1 select "Smith" , 23
- From another tables data.
insert into table1 select name , age from table2
- Using HDFS commands
hdfs dfs -put /path/ hdfs dfs -copyFromLocal /frompath/ /topath/
In the next chapter we will learn about