HIVE ALTER TABLE - UnderstandingBigData

In Previous chapter we learned about HIVE TABLE CREATION and today lets check out HIVE ALTER TABLE. Here we will list down all the alterations that can be done on a Hive Table.

Table of Contents

ALTER Statement on HIVE Table

Here we will discuss how we can change table level properties.

rename hive table

ALTER TABLE tbl_nm RENAME TO new_tbl_nm;

In the above statement the table name was changed from tbl_nm to new_tbl_nm. Note that there is no impact on the data that resides in the table. Lets check it with an example.

create table old_tbl (age int);
insert into old_tbl select 7;
alter table old_tbl rename to new_tbl;
select * from new_tbl;
output:
age
7

alter SerDe properties of hive table

ALTER TABLE tbl_nm_name [PARTITION part_spec] 
SET SERDE serde_class_name
WITH SERDEPROPERTIES ('prop1' = 'value1');
 
ALTER TABLE tbl_nm [PARTITION par_spec] 
SET SERDEPROPERTIES ('prop1' = 'value1');

SerDe stands for serialization and deserailization. This basically tells how we should read and write the data. Using this you can replace a SerDe property or can add a new one. In the below example we lets set the SerDe to OpenCSVSerde.

alter table new_tbl SET SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';

alter table properties [tblproperties]

ALTER TABLE table_name SET TBLPROPERTIES ('prop1'='value1');

Using this you can add or modify table properties. Remember that you won’t be able to remove any of the existing properties using this. In the below example lets add autopurge=true to our table.

alter table new_tbl set tblproperties ("auto.purge"="true")

alter storage properties

ALTER TABLE table_name SET FILEFORMAT file_type;

Hive supports various file formats like CSV , TEXT, ORC , PARQUET etc. We can change the file formats using the SET FILEFORMAT statement. Our table new_tbl stores the data in Text format, lets change it to Parquet.

alter table new_tbl set fileformat parquet;

ALTER Statement on HIVE Column

Using this you can alter different column related properties. Below you will find that each syntax contains [CASCADE|RESTRICT] which i will be explaining at the end.

alter hive column

ALTER TABLE tbl_nm CHANGE [COLUMN] col_old_nm col_new_nm col_type
  [COMMENT col_comment] [FIRST|AFTER col_nm] [CASCADE|RESTRICT];

The above syntax could look confusing but by using them you can perform a lot of operation like below.

alter column name
change data type of column
alter position
add comments to column

Let’s check them one by one:

--Create a table with 3 columns  
create table altcol(col1 string , col2 string , col3 string)

--change col1 name to newcol1
alter table altcol change col1 newcol1 String;
--the new col structure is newcol1 string, col2 string, col3 string

--move col3 from last position to first porition
alter table altcol change col3 col3 string first;
--the new col structure is col3 string ,newcol1 string, col2 string

--add comment to col3
alter table altcol change col3 col3 string comment 'newly moved to 1st position';
--the new col structure is col3 string COMMENT 'newly moved to 1st position' ,newcol1 string, col2 string

--Now, lets do multiple things together
--rename newcol1 to newcol2 and change the datatype from string to int and place it  after col2 
alter table altcol change newcol1 newcol2 int after col2;
--the new col structure col3 string COMMENT 'newly moved to 1st position', col2 string,newcol2 int

add or replace hive column

Using ADD you can add columns at the end of existing columns . If the table is partitioned the columns gets added at the end but before the partitioned column. The syntax is as follows.

ALTER TABLE tbl_nm ADD COLUMNS (col_nm data_type)   [CASCADE|RESTRICT]

Using REPLACE you can complete remove all the columns from the existing table and add new columns. Remember that the data is not dropped.

ALTER TABLE tbl_nm REPLACE COLUMNS (col_nm data_type)   [CASCADE|RESTRICT]

[CASCADE|RESTRICT] in Hive

This basically says if you want to apply the change to only the metadata of table or both table metadata and partition metadata. What i have noticed is, these are relevant only when your table has partitions. We will check couple of examples by adding columns to existing table with partition while using [CASCADE|RESTRICT].

with partition with restrict

Let’s create a table with partition and then add columns to it with RESTRICT and see how it behaves.

create table CRTest(name String,age int) partitioned by(country String)

--Data Inserted into CRTest
insert into CRTest partition(country = 'Canada') select "Nancy" , 45

--Data Visible
select * from CRTest
name	age	country
Nancy	45	Canada

--Column added with RESTRICT
alter table CRTest add columns (height float ) RESTRICT

--Data inserted into Existing Partition 'Canada'
insert into CRTest partition(country = 'Canada') select "Betty" , 35 ,5.6

--New column 'heigth' doesnt show data
select * from CRTest
name	age	height	country
Nancy	45	<null>	Canada
Betty	35	<null>	Canada

--Data inserted into new Partition 'India'
insert into CRTest partition(country = 'India') Select "Prrethy" ,37,5.6

--Data seen in 'height' column for new partition 'India'
select * from CRTest
name	age	height	country
Nancy	45	<null>	Canada
Betty	35	<null>	Canada
Prrethy	37	5.6	        India

So what we understand from this is, when we use RESTRICT the changes are visible for new Partitions created but the existing partitions are not impacted.

If you want the new column to be added to existing partition then you have to execute them separately for each partition. Also the default if we do not provide anything during alter statement is RESTRICT.

alter table CRTest partition(country = 'Canada') add columns (height float ) RESTRICT

select * from CRTest
name	age	height	country
Nancy	45	<null>	Canada
Betty	35	5.6	        Canada
Prrethy	37	5.6	    India

One thing to note here is that, as soon as i ran the alter on the partition Betty’s height became visible. That means the data was present all these while but hive was not showing it to us until we specifically added column to that partition. Now lets check cascade

with partition with cascade

If we want the change to propagate to all the existing and future partition metadata then we should use cascade while altering the table.

create table CRTest(name String,age int) partitioned by(country String)

--Data Inserted into CRTest
insert into CRTest partition(country = 'Canada') select "Nancy" , 45

--Data Visible
select * from CRTest
name	age	country
Nancy	45	Canada

--Column added with CASCADE
alter table CRTest add columns (height float ) CASCADE

--Data inserted into Existing Partition 'Canada'
insert into CRTest partition(country = 'Canada') select "Betty" , 35 ,5.6

--New column 'heigth' shows data
select * from CRTest
name	age	height	country
Nancy	45	<null>	Canada
Betty	35	5.6	Canada

--Data inserted into new Partition 'India'
insert into CRTest partition(country = 'India') Select "Prrethy" ,37,5.6

--Data seen for both old and new partition
select * from CRTest
name	age	height	country
Nancy	45	<null>	Canada
Betty	35	5.6	Canada
Prrethy	37	5.6	India