In this blog we will understand how to read a JSON file using Spark and load it into a DataFrame. All the code examples are […]
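A minimal sketch of reading JSON into a DataFrame, assuming Spark 2.2 or later with Scala. The object ReadJsonExample, the helper readJson and the sample records are hypothetical names used only for illustration. By default spark.read.json expects JSON Lines (one JSON object per line); the multiLine option is for a single record spread over several lines.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and sample data are for illustration only.
object ReadJsonExample {
  // Reads a JSON file into a DataFrame; Spark infers the schema from the data.
  // Set multiLine = true when a single JSON record spans several lines.
  def readJson(spark: SparkSession, path: String, multiLine: Boolean = false): DataFrame =
    spark.read.option("multiLine", multiLine).json(path)

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("ReadJsonExample").master("local[*]").getOrCreate()

    // Sample data in JSON Lines format: one JSON object per line.
    val path = Files.createTempFile("people", ".json")
    Files.write(path, "{\"name\":\"Alice\",\"age\":30}\n{\"name\":\"Bob\",\"age\":25}\n".getBytes("UTF-8"))

    val df = readJson(spark, path.toString)
    df.printSchema() // shows the inferred columns and types
    df.show()
    spark.stop()
  }
}
```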
Category: Spark Tutorial
This category contains Spark tutorial blog posts that explain Spark topics in an easy-to-understand way.
Spark Broadcast Variable explained
A broadcast variable lets the programmer keep a read-only copy of a variable cached on each machine/node where Spark is executing its job. The variable […]
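A small sketch of a broadcast variable, assuming a Spark 2.x/3.x application in Scala. BroadcastExample, lookupNames and the country-code map are hypothetical. The map is broadcast once with sparkContext.broadcast and each task reads it through .value instead of receiving its own copy.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the lookup map is illustrative sample data.
object BroadcastExample {
  // Maps country codes to names using a broadcast, read-only lookup table.
  def lookupNames(spark: SparkSession, codes: Seq[String]): Seq[String] = {
    val sc = spark.sparkContext
    val countryNames = Map("IN" -> "India", "US" -> "United States")

    // One read-only copy is cached per executor rather than shipped with every task.
    val bcNames = sc.broadcast(countryNames)

    val result = sc.parallelize(codes)
      .map(code => bcNames.value.getOrElse(code, "Unknown")) // executors read via .value
      .collect()
      .toSeq

    bcNames.unpersist() // release the cached copies once they are no longer needed
    result
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("BroadcastExample").master("local[*]").getOrCreate()
    println(lookupNames(spark, Seq("IN", "US", "FR")).mkString(", "))
    spark.stop()
  }
}
```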
Spark Read multiline (multiple line) CSV file with Scala
The Spark DataFrame API lets us read CSV files using spark.read.csv(). If a record in the CSV file spans multiple lines, it can be read using […]
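A minimal sketch, assuming Spark 2.2 or later, where the CSV reader supports the multiLine option. MultilineCsvExample, readMultilineCsv and the sample file contents are hypothetical. The value that contains a line break must be wrapped in quotes so the parser knows the newline belongs to the field.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and sample data are for illustration only.
object MultilineCsvExample {
  // Reads a CSV file with a header row whose quoted fields may contain line breaks.
  def readMultilineCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("multiLine", "true") // let a quoted value span several physical lines
      .csv(path)

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("MultilineCsvExample").master("local[*]").getOrCreate()

    // The "note" field of row 1 contains an embedded newline inside double quotes.
    val path = Files.createTempFile("notes", ".csv")
    Files.write(path, "id,note\n1,\"first line\nsecond line\"\n2,single line\n".getBytes("UTF-8"))

    readMultilineCsv(spark, path.toString).show(false) // false: do not truncate long values
    spark.stop()
  }
}
```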
How To Replace Null Values in Spark Dataframe
In the previous chapter we learned about Spark DataFrame actions; today let's check out how to replace null values in a Spark DataFrame. It is really important to handle […]
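A minimal sketch using DataFrame.na.fill with a per-column map of replacement values, assuming Spark 2.x/3.x with Scala. ReplaceNullsExample, fillDefaults, the column names and the defaults 0 and "Unknown" are hypothetical choices for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; column names and defaults are illustrative.
object ReplaceNullsExample {
  // Replaces nulls column by column: 0 for "age", "Unknown" for "city".
  def fillDefaults(df: DataFrame): DataFrame =
    df.na.fill(Map("age" -> 0, "city" -> "Unknown"))

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("ReplaceNullsExample").master("local[*]").getOrCreate()
    import spark.implicits._

    // None values become nulls in the DataFrame.
    val df = Seq(
      ("Alice", Some(30), Some("NY")),
      ("Bob", None, None)
    ).toDF("name", "age", "city")

    fillDefaults(df).show()
    spark.stop()
  }
}
```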
Spark Lazy Evaluation
Today we will learn about Spark lazy evaluation: what it is, why it is required, how Spark implements it, and what […]
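A minimal sketch of lazy evaluation, assuming Spark 2.x/3.x with Scala; LazyEvaluationExample and buildPipeline are hypothetical names. withColumn and filter only add steps to the query plan, explain() prints that plan without computing anything, and the data is processed only when the count() action runs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object for illustration.
object LazyEvaluationExample {
  // Builds a plan only: both steps are transformations, so no job runs here.
  def buildPipeline(spark: SparkSession): DataFrame =
    spark.range(1, 6)                       // ids 1 to 5
      .withColumn("doubled", col("id") * 2) // transformation
      .filter(col("doubled") > 4)           // transformation

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("LazyEvaluationExample").master("local[*]").getOrCreate()

    val pipeline = buildPipeline(spark)
    pipeline.explain()        // prints the recorded plan; the data is still untouched
    println(pipeline.count()) // action: Spark optimizes and executes the plan now
    spark.stop()
  }
}
```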
Spark Dataframe Actions
When we call an action on a Spark DataFrame, all the pending transformations get executed. This happens because of Spark lazy evaluation, which […]
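A small sketch, assuming Spark 2.x/3.x with Scala; DataFrameActionsExample and evens are hypothetical names. The filter is a transformation, and count(), take() and collect() are actions that each run it. Unless the DataFrame is cached, each action recomputes it from the source.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object for illustration.
object DataFrameActionsExample {
  // Transformation only: keeps the even ids 0, 2, 4, 6, 8.
  def evens(spark: SparkSession): DataFrame =
    spark.range(0, 10).toDF().filter(col("id") % 2 === 0)

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("DataFrameActionsExample").master("local[*]").getOrCreate()

    val df = evens(spark)
    // Each call below is an action, so each one triggers execution of the filter.
    println(df.count()) // 5
    println(df.take(2).map(_.getLong(0)).mkString(", "))
    println(df.collect().map(_.getLong(0)).sorted.mkString(", ")) // 0, 2, 4, 6, 8
    spark.stop()
  }
}
```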
Spark Dataframe drop rows with NULL values
The data we normally deal with may not be clean. In such cases we may need to clean the data by applying some logic. […]
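A minimal sketch using DataFrame.na.drop, assuming Spark 2.x/3.x with Scala. DropNullRowsExample, dropAny, dropIfNameMissing and the sample rows are hypothetical. With no arguments na.drop removes a row if any column is null; passing column names restricts the check to those columns, and "all" removes only rows where every column is null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; sample rows are illustrative.
object DropNullRowsExample {
  // Drops rows that contain a null in any column (same as na.drop("any")).
  def dropAny(df: DataFrame): DataFrame = df.na.drop()

  // Drops rows only when the "name" column is null.
  def dropIfNameMissing(df: DataFrame): DataFrame = df.na.drop(Seq("name"))

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("DropNullRowsExample").master("local[*]").getOrCreate()
    import spark.implicits._

    // None values become nulls in the DataFrame.
    val df = Seq(
      (Some("Alice"), Some(30)),
      (Some("Bob"), None),
      (None, Some(40))
    ).toDF("name", "age")

    dropAny(df).show()           // keeps only the Alice row
    dropIfNameMissing(df).show() // keeps the Alice and Bob rows
    df.na.drop("all").show()     // keeps all three rows: none is entirely null
    spark.stop()
  }
}
```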
Spark Dataframe withColumn
Using the Spark withColumn() function we can add, rename, derive, or split a DataFrame column. There are many other things which can be achieved […]
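A sketch of common withColumn uses, assuming Spark 2.x/3.x with Scala. WithColumnExample, transform and the columns full_name and salary are hypothetical. The rename step uses withColumnRenamed, Spark's dedicated method for renaming, alongside withColumn for adding, deriving and splitting.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, split}

// Hypothetical example object; column names and values are illustrative.
object WithColumnExample {
  def transform(df: DataFrame): DataFrame =
    df
      .withColumn("country", lit("IN"))                                  // add a constant column
      .withColumn("salary_k", col("salary") / 1000)                      // derive from an existing column
      .withColumn("first_name", split(col("full_name"), " ").getItem(0)) // split a column
      .withColumnRenamed("full_name", "name")                            // rename a column

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("WithColumnExample").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("Asha Rao", 50000), ("Ravi Kumar", 65000)).toDF("full_name", "salary")
    transform(df).show()
    spark.stop()
  }
}
```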
Spark DataFrame Union and UnionAll
Using Spark union and unionAll you can merge the data of two DataFrames into a new DataFrame. Remember, you can merge two Spark DataFrames only […]
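A minimal sketch, assuming Spark 2.0 or later, where unionAll is a deprecated alias of union. UnionExample and the monthly sample data are hypothetical. Both methods keep duplicate rows and match columns by position, so a distinct() is added for SQL UNION behaviour (Spark 2.3+ also offers unionByName to match columns by name).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; sample data is illustrative.
object UnionExample {
  // Keeps duplicates, like SQL UNION ALL; columns are matched by position.
  def combine(a: DataFrame, b: DataFrame): DataFrame = a.union(b)

  // Removes duplicates afterwards, like SQL UNION.
  def combineDistinct(a: DataFrame, b: DataFrame): DataFrame = a.union(b).distinct()

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("UnionExample").master("local[*]").getOrCreate()
    import spark.implicits._

    val jan = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
    val feb = Seq((2, "Bob"), (3, "Carol")).toDF("id", "name")

    combine(jan, feb).show()         // 4 rows: (2, Bob) appears twice
    combineDistinct(jan, feb).show() // 3 rows
    spark.stop()
  }
}
```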
Spark distinct and dropDuplicates
Both the Spark distinct() and dropDuplicates() functions help remove duplicate records. One additional advantage of dropDuplicates() is that you can specify the columns to be […]
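A minimal sketch, assuming Spark 2.x/3.x with Scala; DedupExample and the sample rows are hypothetical. distinct() compares whole rows, while dropDuplicates can take a subset of columns. When a subset is given, which of the matching rows survives is not guaranteed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; sample rows are illustrative.
object DedupExample {
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("Alice", "NY", 30),
      ("Alice", "NY", 30), // exact duplicate of the first row
      ("Alice", "LA", 31)
    ).toDF("name", "city", "age")
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM for experimenting; omit it under spark-submit.
    val spark = SparkSession.builder().appName("DedupExample").master("local[*]").getOrCreate()
    val df = sample(spark)

    df.distinct().show()                        // 2 rows: whole-row comparison
    df.dropDuplicates().show()                  // same result as distinct()
    df.dropDuplicates("name").show()            // 1 row: only "name" is compared
    df.dropDuplicates(Seq("name", "city")).show() // 2 rows
    spark.stop()
  }
}
```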