Latest from the Blog

How To Replace Null Values in Spark Dataframe

In the previous chapter we learned about Spark Dataframe Actions; today let's look at how to replace null values in a Spark DataFrame. Handling null values in a DataFrame matters if we want to avoid null pointer exceptions. For this, the Spark DataFrame API provides the DataFrameNaFunctions class with its fill() function. In this post we will […]

Hive Insert Into vs Insert Overwrite

In the previous chapter we learned about HIVE SHOW PARTITION; today let's look at the difference between Hive Insert Into and Insert Overwrite. We will also discuss the impact on both partitioned and non-partitioned Hive tables in the post below. Simply put, the Insert Into command appends rows to the existing table, whereas Insert Overwrite as […]

Spark Lazy Evaluation

Today we will learn about Spark lazy evaluation: what it is, why it is required, how Spark implements it, and what its advantages are. We know that Spark is written in Scala, and Scala has an option to evaluate lazily [you can check the lesson here], but for Spark, the execution […]
