You can split a row in Hive table into multiple rows using lateral view explode function. The one thing that needs to be present is a delimiter using which we can split the values. Lets dive straight into how to implement it.

split row on single delimiter

In most cases all the values in the column are split using a single delimiter. For the example below I have used exclamation [!] as delimiter.

STEP 1 : Lets create a Hive table named ‘student_grp‘ which has two columns ,group name and students name in the group. The student names are split based on exclamation [!] as delimiter.

create table student_grp (group_name string , member_name string);
insert into student_grp
select 'GroupA' , 'Neha Sharma!Henry Smith'
union all select 'GroupB' , 'Shamira!Yuri Chen!Mike Hustle'
union all select 'GroupC', null

select * from student_grp;
group_name	member_name
GroupA	Neha Sharma!Henry Smith
GroupB	Shamira!Yuri Chen!Mike Hustle
GroupC	<null>

STEP 2: Lets now split the records on delimiter and explode the data.

Select group_name , student_name from student_grp
lateral view explode (split(member_name , ‘!’)) member_name as student_name

Select group_name , student_name from student_grp
lateral view explode (split(member_name , '!')) member_name as student_name

group_name	student_name
GroupA	Neha Sharma
GroupA	Henry Smith
GroupB	Shamira
GroupB	Yuri Chen
GroupB	Mike Hustle

Now you see each row is converted to multiple rows. Also notice that the group_name column value is also present in each record.

If you are a keen observer you must have noticed that our initial data has 3 groups Group A/B/C but after we explode the data we are now left with 2 groups A and B. The reason being GroupC has null in student_name column, and because of this it was rejected. If in your requirement you need to retain the record with null then you need to add outer keyword in your query.

Select group_name , student_name from student_grp
lateral view
outer explode (split(member_name , ‘!’)) member_name as student_name

Select group_name , student_name from student_grp
lateral view outer explode (split(member_name , '!')) member_name as student_name

group_name	student_name
GroupA	Neha Sharma
GroupA	Henry Smith
GroupB	Shamira
GroupB	Yuri Chen
GroupB	Mike Hustle
GroupC	<null>

You see that now we have null record as well.

split row on multiple delimiter

In some cases there could be more than 1 delimiter in a column. For example may the we need to split the data either on exclamation [!] delimiter or at the rate [@] delimiter.

Step 1 : Lets create a Hive table named ‘student_grp‘ which has two columns ,group name and students name in the group. The student names are split based on exclamation [!] delimiter and at the rate [@] delimiter.

create table student_grp (group_name string , member_name string);
insert into student_grp
select 'GroupA' , 'Neha Sharma!Henry Smith'
union all select 'GroupB' , 'Shamira!Yuri Chen@Mike Hustle'
union all select 'GroupC', null

select * from student_grp;
group_name	member_name
GroupA	Neha Sharma!Henry Smith
GroupB	Shamira!Yuri Chen@Mike Hustle
GroupC	<null>

STEP 2: Lets now split the records on delimiter and explode the data. Now we need to pass all the list of delimiters in qoutes separated by pipe[|].

Select group_name , student_name from student_grp
lateral view outer explode (split(member_name ,
‘@|!’)) member_name as student_name

Select group_name , student_name from student_grp
lateral view outer explode (split(member_name , '@|!')) member_name as student_name

group_name	student_name
GroupA	Neha Sharma
GroupA	Henry Smith
GroupB	Shamira
GroupB	Yuri Chen
GroupB	Mike Hustle
GroupC	<null>

Conclusion

So today we learnt how to split a row into multiple rows in Hive using lateral view explode. Also we understood how to split with single and multiple delimiter.

๐Ÿ™‚ kudos for learning something newย  ๐Ÿ™‚

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.