In the previous chapter we learned about Spark Dataframe Actions; today let's check out how to replace null values in a Spark Dataframe. Handling null values in a dataframe is important if we want to avoid null pointer exceptions. For this, the Spark Dataframe API has a DataFrameNaFunctions class with a fill() function. In this post we will […]
In the previous chapter we learned about HIVE SHOW PARTITION; today let's check out the difference between Hive Insert Into and Insert Overwrite. We will also discuss the impact on both partitioned and non-partitioned Hive tables in the blog below. Simply put, the Insert Into command appends rows to the existing table, whereas Insert Overwrite as […]
Today we will learn about Spark Lazy Evaluation: what it is, why it is required, how Spark implements it, and what its advantages are. We know that Spark is written in Scala, and Scala has an option to run lazily [You can check the lesson here], but for Spark, the execution […]
When a file is stored in HDFS, Hadoop breaks the file into blocks before storing it. This means that when you store a large file, Hadoop breaks it into smaller chunks based on a predefined block size and then stores them in Data Nodes across the cluster. The default block size is 128 MB […]
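The block arithmetic described in this excerpt can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: the function name `split_into_blocks` and the 500 MB sample file are assumptions made for the example, and sizes are kept as whole megabytes for simplicity.

```python
BLOCK_SIZE_MB = 128  # default block size mentioned in the excerpt


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be split into.

    Hypothetical helper for illustration; it only models the arithmetic.
    """
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever data is left over.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(500)  # hypothetical 500 MB file
print(len(blocks), blocks)  # 4 [128, 128, 128, 116]
```

A 500 MB file therefore needs three full blocks plus one smaller final block, rather than four full-size ones.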
If you want to display all the partitions of a Hive table, you can do that using the SHOW PARTITIONS command. If you want to learn more about Hive table partitions, you can check it here. So today we are going to understand the below topics. Table of Contents: show partitions syntax; show partitions using where orderby […]
You can split a row in a Hive table into multiple rows using the lateral view explode function. The one thing that needs to be present is a delimiter on which we can split the values. Let's dive straight into how to implement it. Table of Contents: split row on single delimiter; split row on multiple delimiter; Conclusion. split […]
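To make the idea of turning one row into many concrete, here is a small plain-Python model of the behaviour. It is not Hive syntax and does not call any Hive API: `explode_row`, the sample row, and its column names are all hypothetical, chosen only to show a single-delimiter split next to a multiple-delimiter split.

```python
import re


def explode_row(row, column, delimiter_pattern):
    """Yield one copy of `row` per value found by splitting row[column].

    Hypothetical helper that mimics the effect of splitting a delimited
    column and emitting one output row per piece.
    """
    for value in re.split(delimiter_pattern, row[column]):
        yield {**row, column: value}


row = {"id": 1, "tags": "spark,hive|hdfs"}  # hypothetical sample row

# Single delimiter: split on commas only.
single = list(explode_row(row, "tags", r","))

# Multiple delimiters: a character class matches either ',' or '|'.
multiple = list(explode_row(row, "tags", r"[,|]"))

print(len(single), len(multiple))  # 2 3
```

Every output row repeats the untouched columns (`id` here) and carries exactly one of the split values.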
Using Hive Partition you can divide a table horizontally into multiple sections. This division happens based on a partition key, which is just a column in your Hive table. Throughout this lesson we will understand various aspects of Hive Partition. Table of Contents: why use Partition in Hive; how to create partition in hive table; create […]
When we call an Action on a Spark dataframe, all the Transformations get executed one by one. This happens because of Spark Lazy Evaluation, which does not execute the transformations until an Action is called. In this article we will check commonly used Actions on a Spark dataframe. Table of Contents: Spark Dataframe show(); head() and first() […]
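The relationship between Transformations and Actions can be illustrated with a toy class in plain Python. This is a sketch of the general idea only, not how Spark is implemented: `LazyPipeline` and its `map`, `filter`, and `collect` methods are hypothetical names that merely echo the concepts in the excerpt.

```python
class LazyPipeline:
    """Toy stand-in for a dataframe: records steps, runs them only on an action."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: only remembered, nothing is computed here.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: only remembered, nothing is computed here.
        return LazyPipeline(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: the recorded steps are applied one by one, in order.
        rows = iter(self._data)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records when the transformation function actually runs


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyPipeline([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(calls)  # [] because no action has been called yet

result = pipeline.collect()
print(result)  # [6, 8]
```

Building the pipeline leaves `calls` empty; `double` only runs once `collect()` is invoked.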
The data we normally deal with may not be clean. In such cases we may need to clean the data by applying some logic. One such case is the presence of null values in rows. We can handle it by dropping the Spark dataframe rows using the drop() function. Table of Contents: drop rows […]
Using the Spark withColumn() function we can add, rename, derive, split, etc. a Dataframe column. There are many other things which can be achieved using withColumn(), which we will check one by one with suitable examples. But first let's create a dataframe which we will modify throughout this tutorial. Throughout this […]