Today we will learn how to create an empty DataFrame in Spark Scala. We will cover several ways to create an empty DataFrame, both without a schema and with one.
Empty Dataframe with no schema
Here we will create an empty DataFrame that has no schema (no columns). For this we use the `emptyDataFrame` member of SparkSession. Let us see an example below.
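```scala
import org.apache.spark.sql.DataFrame

// `spark` is the active SparkSession (predefined in spark-shell)
val df: DataFrame = spark.emptyDataFrame
```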
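To confirm that this DataFrame really has neither columns nor rows, here is a minimal standalone sketch. It assumes the `spark-sql` dependency is on the classpath and runs with a local master; the object name `EmptyNoSchemaDemo` and helper `emptyNoSchema` are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyNoSchemaDemo {
  // Returns SparkSession's built-in empty DataFrame: no columns, no rows
  def emptyNoSchema(spark: SparkSession): DataFrame = spark.emptyDataFrame

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out; use your own cluster settings
    val spark = SparkSession.builder().master("local[*]").appName("empty-no-schema").getOrCreate()

    val df = emptyNoSchema(spark)
    println(df.columns.length) // 0: there are no columns
    println(df.count())        // 0: there are no rows
    df.printSchema()           // prints just "root"

    spark.stop()
  }
}
```

Because there are no columns, there is nothing to select or union against by name, which is why the schema-based approaches below are usually more useful in practice.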
Empty Dataframe with schema
Here we will create an empty DataFrame with a schema. We use the `createDataFrame` method, passing it an empty RDD created with `emptyRDD[Row]` on the SparkContext together with a `StructType` variable that defines the schema. Let us see an example.
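```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{StringType, StructType}

// Two non-nullable string columns
val schema = new StructType()
  .add("fnm", StringType, false)
  .add("lnm", StringType, false)

// An empty RDD[Row] combined with the explicit schema
val df: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

df.printSchema()
// root
//  |-- fnm: string (nullable = false)
//  |-- lnm: string (nullable = false)

df.show()
// +---+---+
// |fnm|lnm|
// +---+---+
// +---+---+
```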
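A variant that skips the RDD entirely: `createDataFrame` also has an overload that takes a `java.util.List[Row]` and a `StructType`, so an empty Java list works just as well. This is a sketch assuming `spark-sql` on the classpath and a local master; `EmptyWithSchemaDemo` and `emptyWithSchema` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType}

object EmptyWithSchemaDemo {
  // Same schema as in the example above: two non-nullable string columns
  val schema: StructType = new StructType()
    .add("fnm", StringType, nullable = false)
    .add("lnm", StringType, nullable = false)

  // No RDD needed: createDataFrame also accepts a java.util.List[Row]
  def emptyWithSchema(spark: SparkSession): DataFrame =
    spark.createDataFrame(java.util.Collections.emptyList[Row](), schema)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for experimentation
    val spark = SparkSession.builder().master("local[*]").appName("empty-with-schema").getOrCreate()

    val df = emptyWithSchema(spark)
    df.printSchema() // fnm and lnm, both nullable = false
    df.show()        // header only, no rows

    spark.stop()
  }
}
```

Either way, the schema you pass in (including `nullable = false`) is kept as given.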
Empty Dataframe with same Schema as another Dataframe
Here we will see how to create an empty dataframe having the same schema as another dataframe.
Let us say we have a DataFrame dfStudent with 2 columns and 3 rows.
```scala
import spark.implicits._ // enables toDF on local Scala collections

val dfStudent = Seq(("Mark", "Henry"), ("Alita", "Fernandez"), ("Cuban", "Leslie")).toDF("fnm", "lnm")

dfStudent.show()
// +-----+---------+
// |  fnm|      lnm|
// +-----+---------+
// | Mark|    Henry|
// |Alita|Fernandez|
// |Cuban|   Leslie|
// +-----+---------+
```
Now we need to create a new DataFrame dfTeacher that has the same schema as dfStudent but no records. Let's see how we can achieve that.
Example 1:
```scala
// limit(0) keeps the columns but returns no rows
val dfTeacher = dfStudent.limit(0)

dfTeacher.show()
// +---+---+
// |fnm|lnm|
// +---+---+
// +---+---+
```
Example 2:
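```scala
import org.apache.spark.sql.functions.lit

// A predicate that is always false (1 === 2) filters out every row
val dfTeacher = dfStudent.filter(lit(1) === lit(2))

dfTeacher.show()
// +---+---+
// |fnm|lnm|
// +---+---+
// +---+---+
```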
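A third option is to reuse only the schema: pass `dfStudent.schema` to `createDataFrame` together with an empty RDD, exactly as in the "Empty Dataframe with schema" section. With `limit(0)` or `filter`, the new DataFrame's logical plan still refers to dfStudent; building from the schema gives a DataFrame that shares only the column names, types, and nullability. This is a sketch assuming `spark-sql` on the classpath and a local master; `EmptyLikeDemo` and the helper `emptyLike` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object EmptyLikeDemo {
  // Builds an empty DataFrame that only borrows `source`'s schema
  def emptyLike(spark: SparkSession, source: DataFrame): DataFrame =
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], source.schema)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for experimentation
    val spark = SparkSession.builder().master("local[*]").appName("empty-like").getOrCreate()
    import spark.implicits._

    val dfStudent = Seq(("Mark", "Henry"), ("Alita", "Fernandez"), ("Cuban", "Leslie")).toDF("fnm", "lnm")
    val dfTeacher = emptyLike(spark, dfStudent)

    dfTeacher.printSchema() // same columns as dfStudent
    println(dfTeacher.count()) // 0

    spark.stop()
  }
}
```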
Empty Dataframe using Case Class
Here we will create an empty DataFrame whose schema is derived from a case class.
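```scala
import org.apache.spark.sql.DataFrame
import spark.implicits._ // provides the Encoder for the case class

// Column names and types come from the case class fields.
// In compiled code, define the case class at the top level (outside any method)
// so Spark can derive its Encoder.
case class Student(fnm: String, lnm: String)

val df: DataFrame = Seq.empty[Student].toDF()

df.printSchema()
// root
//  |-- fnm: string (nullable = true)
//  |-- lnm: string (nullable = true)

df.show()
// +---+---+
// |fnm|lnm|
// +---+---+
// +---+---+
```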
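If a typed result is more convenient, `SparkSession.emptyDataset[T]` returns an empty `Dataset[T]` directly, and `.toDF()` converts it to a DataFrame when needed. A minimal sketch, assuming `spark-sql` on the classpath and a local master; `EmptyFromCaseClassDemo` and `emptyStudents` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Top-level case class so Spark can derive an Encoder for it
case class Student(fnm: String, lnm: String)

object EmptyFromCaseClassDemo {
  // Typed variant: an empty Dataset[Student]
  def emptyStudents(spark: SparkSession): Dataset[Student] = {
    import spark.implicits._ // brings the implicit Encoder[Student] into scope
    spark.emptyDataset[Student]
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for experimentation
    val spark = SparkSession.builder().master("local[*]").appName("empty-case-class").getOrCreate()

    val ds = emptyStudents(spark)
    val df: DataFrame = ds.toDF() // untyped view with the same schema
    df.printSchema()              // fnm and lnm, both string (nullable = true)

    spark.stop()
  }
}
```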
Empty Dataframe using Implicit Encoder
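Instead of a case class, we can start from an empty Seq of tuples and pass the column names to `toDF`. The implicit Encoder from `spark.implicits._` derives each column's type from the tuple's element types. Let us see an example below.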
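```scala
import spark.implicits._ // provides the implicit Encoder for (String, String)

val schemaSeq = Seq("fnm", "lnm")
// `: _*` expands the Seq into toDF's varargs column names
val df = Seq.empty[(String, String)].toDF(schemaSeq: _*)

df.printSchema()
// root
//  |-- fnm: string (nullable = true)
//  |-- lnm: string (nullable = true)

df.show()
// +---+---+
// |fnm|lnm|
// +---+---+
// +---+---+
```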
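The tuple's element types also decide nullability: reference types such as `String` become nullable columns, while Scala primitives such as `Int` become non-nullable ones. The sketch below illustrates this with a hypothetical two-column `(name, age)` layout, assuming `spark-sql` on the classpath and a local master; `EmptyFromTupleDemo` and `emptyPeople` are illustrative names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyFromTupleDemo {
  // Hypothetical column names, expanded into toDF's varargs with `: _*`
  val columns: Seq[String] = Seq("name", "age")

  def emptyPeople(spark: SparkSession): DataFrame = {
    import spark.implicits._ // implicit Encoder for (String, Int)
    Seq.empty[(String, Int)].toDF(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for experimentation
    val spark = SparkSession.builder().master("local[*]").appName("empty-tuple").getOrCreate()

    emptyPeople(spark).printSchema()
    // root
    //  |-- name: string (nullable = true)
    //  |-- age: integer (nullable = false)

    spark.stop()
  }
}
```

If you need a nullable numeric column, one option is to use `Option[Int]` (or `java.lang.Integer`) as the tuple element type instead of `Int`.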
🙂 kudos for learning something new 🙂