You can split a row in Hive table into multiple rows using lateral view explode function. The one thing that needs to be present is a delimiter using which we can split the values. Lets dive straight into how to implement it.
Table of Contents
split row on single delimiter
In most cases all the values in the column are split using a single delimiter. For the example below I have used exclamation [!] as delimiter.
STEP 1 : Lets create a Hive table named ‘student_grp‘ which has two columns ,group name and students name in the group. The student names are split based on exclamation [!] as delimiter.
create table student_grp (group_name string , member_name string); insert into student_grp select 'GroupA' , 'Neha Sharma!Henry Smith' union all select 'GroupB' , 'Shamira!Yuri Chen!Mike Hustle' union all select 'GroupC', null select * from student_grp; group_name member_name GroupA Neha Sharma!Henry Smith GroupB Shamira!Yuri Chen!Mike Hustle GroupC <null>
STEP 2: Lets now split the records on delimiter and explode the data.
Select group_name , student_name from student_grp
lateral view explode (split(member_name , ‘!’)) member_name as student_name
Select group_name , student_name from student_grp lateral view explode (split(member_name , '!')) member_name as student_name group_name student_name GroupA Neha Sharma GroupA Henry Smith GroupB Shamira GroupB Yuri Chen GroupB Mike Hustle
Now you see each row is converted to multiple rows. Also notice that the group_name column value is also present in each record.
If you are a keen observer you must have noticed that our initial data has 3 groups Group A/B/C but after we explode the data we are now left with 2 groups A and B. The reason being GroupC has null in student_name column, and because of this it was rejected. If in your requirement you need to retain the record with null then you need to add outer keyword in your query.
Select group_name , student_name from student_grp
lateral view outer explode (split(member_name , ‘!’)) member_name as student_name
Select group_name , student_name from student_grp lateral view outer explode (split(member_name , '!')) member_name as student_name group_name student_name GroupA Neha Sharma GroupA Henry Smith GroupB Shamira GroupB Yuri Chen GroupB Mike Hustle GroupC <null>
You see that now we have null record as well.
split row on multiple delimiter
In some cases there could be more than 1 delimiter in a column. For example may the we need to split the data either on exclamation [!] delimiter or at the rate [@] delimiter.
Step 1 : Lets create a Hive table named ‘student_grp‘ which has two columns ,group name and students name in the group. The student names are split based on exclamation [!] delimiter and at the rate [@] delimiter.
create table student_grp (group_name string , member_name string); insert into student_grp select 'GroupA' , 'Neha Sharma!Henry Smith' union all select 'GroupB' , 'Shamira!Yuri Chen@Mike Hustle' union all select 'GroupC', null select * from student_grp; group_name member_name GroupA Neha Sharma!Henry Smith GroupB Shamira!Yuri Chen@Mike Hustle GroupC <null>
STEP 2: Lets now split the records on delimiter and explode the data. Now we need to pass all the list of delimiters in qoutes separated by pipe[|].
Select group_name , student_name from student_grp
lateral view outer explode (split(member_name , ‘@|!’)) member_name as student_name
Select group_name , student_name from student_grp lateral view outer explode (split(member_name , '@|!')) member_name as student_name group_name student_name GroupA Neha Sharma GroupA Henry Smith GroupB Shamira GroupB Yuri Chen GroupB Mike Hustle GroupC <null>
Conclusion
So today we learnt how to split a row into multiple rows in Hive using lateral view explode. Also we understood how to split with single and multiple delimiter.
🙂 kudos for learning something new 🙂
1 thought on “Hive Split a row into multiple rows”