Spark Lazy Evaluation

Today we will learn about Spark Lazy Evaluation. We will learn what it is, why it is required, how Spark implements it, and what […]
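To make the idea concrete before looking at Spark itself, here is a toy model in plain Python. It is not how Spark is implemented. The hypothetical `LazyDataset` class only records transformations (`map`, `filter`) in a plan, and nothing runs until the action `collect()` is called. A call counter shows that the mapping function has not executed before the action.

```python
from typing import Callable, List, Optional, Tuple


class LazyDataset:
    """Toy model of lazy evaluation (hypothetical, not Spark's internals)."""

    def __init__(self, source: List[int], plan: Optional[List[Tuple[str, Callable]]] = None):
        self._source = source
        self._plan = plan or []  # recorded steps: ("map" | "filter", function)

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: return a new dataset with one more step; nothing runs.
        return LazyDataset(self._source, self._plan + [("map", fn)])

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        # Transformation: also only recorded.
        return LazyDataset(self._source, self._plan + [("filter", pred)])

    def collect(self) -> List[int]:
        # Action: only now is the whole recorded plan executed, row by row.
        out = []
        for row in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    row = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                out.append(row)
        return out


calls = {"n": 0}


def double(x: int) -> int:
    calls["n"] += 1  # count how often the transformation really runs
    return x * 2


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = calls["n"]  # 0: no work done yet
result = ds.collect()             # [6, 8]
calls_after_action = calls["n"]   # 4: double() ran once per input row
```

Deferring work until an action lets an engine see the whole plan at once, which is what makes optimizations such as combining steps possible.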

HDFS Data Blocks and Block Size

When a file is stored in HDFS, Hadoop splits the file into blocks before storing it. What this means is, when you store a file […]
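The arithmetic behind block splitting can be sketched in a few lines. This assumes the default `dfs.blocksize` of 128 MB used by Hadoop 2.x and 3.x; clusters can configure a different value. The function name `split_into_blocks` is illustrative only.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # default dfs.blocksize in Hadoop 2.x/3.x


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list:
    """Return the sizes (in bytes) of the HDFS blocks a file would occupy."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        # The last block only holds the leftover bytes; it does not
        # take up a full block size on disk.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(300 * MB)
print(len(blocks), [b // MB for b in blocks])  # 3 [128, 128, 44]
```

A 300 MB file therefore becomes two full 128 MB blocks plus one 44 MB block.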


If you want to display all the partitions of a Hive table, you can do that using the SHOW PARTITIONS command. If you want to learn […]

Hive Split a row into multiple rows

You can split a row in a Hive table into multiple rows using LATERAL VIEW with the explode function. The one thing that needs to be present is […]

Hive Table Partition

Using Hive partitions you can divide a table horizontally into multiple sections. This division happens based on a partition key, which is just a column […]

Spark Dataframe Actions

When we call an action on a Spark DataFrame, all the transformations get executed one by one. This happens because of Spark Lazy Evaluation, which […]

Spark Dataframe withColumn

Using the Spark withColumn() function we can add, rename, derive, split, etc. a DataFrame column. There are many other things which can be achieved […]


Using Spark union() and unionAll() you can merge the data of two DataFrames and create a new DataFrame. Remember you can merge two Spark DataFrames only […]

Spark distinct and dropDuplicates

Both the Spark distinct() and dropDuplicates() functions help in removing duplicate records. One additional advantage of dropDuplicates() is that you can specify the columns to be […]


