In the previous chapter we learned about Spark Dataframe Actions, and today let's check out how to replace null values in a Spark Dataframe. It is really important to handle […]
Blogs
Hive Insert Into vs Insert Overwrite
In the previous chapter we learned about HIVE SHOW PARTITIONS, and today let's check out the difference between Hive Insert Into and Insert Overwrite. We will also […]
Spark Lazy Evaluation
Today we will learn about Spark Lazy Evaluation. We will learn what it is, why it is required, how Spark implements it, and what […]
HDFS Data Blocks and Block Size
When a file is stored in HDFS, Hadoop breaks the file into BLOCKS before storing it. This means that when you store a file […]
HIVE SHOW PARTITIONS
If you want to display all the partitions of a Hive table, you can do that using the SHOW PARTITIONS command. In the big data world, efficient […]
Hive Split a row into multiple rows
You can split a row in a Hive table into multiple rows using the lateral view explode function. The one thing that needs to be present is […]
Hive Table Partition
Using Hive partitions you can divide a table horizontally into multiple sections. This division is based on a partition key, which is just a column […]
Spark Dataframe Actions
When we call an Action on a Spark dataframe, all the Transformations get executed one by one. This happens because of Spark Lazy Evaluation, which […]
Spark Dataframe drop rows with NULL values
The data we normally deal with may not be clean. In such cases we may need to clean the data by applying some logic. […]
Spark Dataframe withColumn
Using the Spark withColumn() function we can add, rename, derive, split, etc. a Dataframe column. There are many other things that can be achieved […]