HIVE SHOW PARTITIONS - UnderstandingBigData

If you want to display all the Partitions of a HIVE table you can do that using SHOW PARTITIONS command. In big data world, efficient data querying is important. Apache Hive, a cornerstone in the Hadoop ecosystem, offers a powerful feature known as table partitioning. This feature not only optimizes query performance but also organizes datasets .If you want to learn more about Hive Table Partitions you can check it here.

Understanding Hive Partitions:

Table of Contents

Understanding Hive Partitions:

At its core, partitioning in Hive allows tables to be divided & structured into more manageable pieces, each stored in its separate directory. This division is based on the value of a table column (or columns), known as the partition key. When querying data, Hive can then skip over partitions that are irrelevant to the query, leading to faster query execution.

show partitions syntax

The syntax of show partition is pretty straight forward and it works on both internal or external Hive Tables. The output is order alphabetically by default.

SHOW PARTITIONS table_name;

Lets create a customer table with 2 partition columns ‘country’ and ‘state’ and add few partitions to it.

create table customer (cust_nm string , cust_id Int) partitioned by (country string ,state string);
alter table customer add partition (Country = 'USA' ,State = 'California');
alter table customer add partition (Country = 'USA' ,State = 'Ohio');

show partitions customer;
partition
country=USA/state=California
country=USA/state=Ohio

show partitions using where orderby & limit clause

From hive 4.0 we can use where , order by and limit clause along with show partitions in hive.Lets implement and see.

limit clause

Using limit clause you can limit the number of partitions you need to fetch. You can apply this on the entire table or on a sub partitions. This is especially useful when dealing with tables that have a large number of partitions. Lets check both the scenario below .

SHOW PARTITIONS table_name [PARTITION(partition_spec)] [LIMIT rows];

create table customer (cust_nm string , cust_id Int) partitioned by (country string ,state string);
alter table customer add partition (Country = 'USA' ,State = 'California');
alter table customer add partition (Country = 'USA' ,State = 'Ohio');
alter table customer add partition (Country = 'India' ,State = 'Karnataka');
alter table customer add partition (Country = 'India' ,State = 'Delhi');

--show only 2 partitions
show partitions customer limit 2;
partition
country=India/state=Delhi
country=India/state=Karnataka

--show only 1 sub-partition for partition India
show partitions customer (country = 'India') limit 1;
country=India/state=Delhi

where clause

Using where clause you can fetch specific partition information from the Hive table. Again like limit this is applicable on entire table or on specific partitions.

SHOW PARTITIONS table_name [PARTITION(partition_spec)] [WHERE where_condition] ;

--check if country partition has USA
show partitions customer where country ='USA';

--check if country partition for USA has Delhi as State partition
show partitions customer (country = 'India') where state = 'Delhi';

order by clause

By default, partitions are displayed in ascending order. However, with the ORDER BY clause, you can specify the order, either ascending or descending.

SHOW PARTITIONS table_name [PARTITION(partition_spec)] [ORDER BY col_list] ;

--check if country partition has USA and display the partitions in desc order
show partitions customer where country ='USA' order by state desc;

Note: You can also use all the clauses in one query in Hive

SHOW PARTITIONS table_name [PARTITION(partition_spec)] [WHERE where_condition] [ORDER BY column_list] [LIMIT rows];

Conclusion

Hive partitions are a game-changer in the world of big data. They not only enhance query performance but also provide a structured way to manage vast datasets. With the advancements in Hive 4.0, data engineers now have even more tools at their disposal to fine-tune their data retrieval processes. As we continue to generate more data, understanding and effectively utilizing features like Hive partitions will be crucial for anyone in the data engineering field.

🙂 kudos for learning something new 🙂