Latest from the Blog
In the previous chapter we learned about Spark DataFrame actions; today let's look at how to replace null values in a Spark DataFrame. Handling nulls is important if we want to avoid a NullPointerException. For this, the Spark DataFrame API provides the DataFrameNaFunctions class with its fill() function. In this post we will […]
In the previous chapter we learned about Hive SHOW PARTITIONS; today let's look at the difference between Hive INSERT INTO and INSERT OVERWRITE. We will also discuss their impact on both partitioned and non-partitioned Hive tables. Simply put, INSERT INTO appends rows to the existing table, whereas INSERT OVERWRITE as […]
Today we will learn about Spark lazy evaluation: what it is, why it is required, how Spark implements it, and what its advantages are. We know that Spark is written in Scala, and Scala has an option to evaluate lazily, but for Spark, the execution […]