Today we will learn how to create an empty dataframe in Spark Scala. We will cover several ways to create an empty dataframe, both without a schema and with one.

Empty Dataframe with no schema

Here we will create an empty dataframe that does not have any schema or columns. For this we will use the emptyDataFrame method of SparkSession. Let us see an example below.

  import org.apache.spark.sql.DataFrame
  val df: DataFrame = spark.emptyDataFrame
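To confirm that the result really has neither columns nor rows, a small check like the one below can help. It is only a sketch: the object name EmptyNoSchemaCheck, the local master setting and the app name are assumptions for illustration. In spark-shell the spark value already exists, so only the body of build would be needed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyNoSchemaCheck {
  // Returns a dataframe with no columns and no rows.
  def build(spark: SparkSession): DataFrame = spark.emptyDataFrame

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the article).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-df-check")
      .getOrCreate()

    val df = build(spark)
    println(df.columns.length) // number of columns
    println(df.count())        // number of rows

    spark.stop()
  }
}
```

Both printed values should be 0, since the dataframe has an empty schema and no data.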

Empty Dataframe with schema

Here we will create an empty dataframe with a schema. We will use the createDataFrame method to build the dataframe. Instead of emptyDataFrame, here we use emptyRDD[Row] to create an empty RDD, and we define the schema in a StructType variable. Let us see an example.

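  import org.apache.spark.sql.{DataFrame, Row}
  import org.apache.spark.sql.types.{StringType, StructType}

  val schema = new StructType()
    .add("fnm", StringType, false)
    .add("lnm", StringType, false)

  // The empty RDD[Row] supplies zero rows; the schema supplies the columns
  val df: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)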

  df.printSchema()

  root
  |-- fnm: string (nullable = false)
  |-- lnm: string (nullable = false)

  df.show()

  +---+---+
  |fnm|lnm|
  +---+---+
  +---+---+
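If you would rather not touch the RDD API, createDataFrame also has an overload that takes a java.util.List[Row] together with a StructType. The sketch below passes an empty Java list instead of an empty RDD. The object name EmptyWithSchemaFromList and the local session settings are assumptions for illustration.

```scala
import java.util.Collections

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType}

object EmptyWithSchemaFromList {
  // Same two string columns as in the example above.
  val schema: StructType = new StructType()
    .add("fnm", StringType, false)
    .add("lnm", StringType, false)

  // An empty java.util.List[Row] supplies zero rows; the schema supplies the columns.
  def build(spark: SparkSession): DataFrame =
    spark.createDataFrame(Collections.emptyList[Row](), schema)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-df-from-list")
      .getOrCreate()

    val df = build(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```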

Empty Dataframe with same Schema as another Dataframe

Here we will see how to create an empty dataframe having the same schema as another dataframe.

Let us say we have a dataframe dfStudent with 2 columns and 3 rows.

  import spark.implicits._ // needed for toDF on a Seq
  val dfStudent = Seq(("Mark", "Henry"), ("Alita", "Fernandez"), ("Cuban", "Leslie")).toDF("fnm", "lnm")
  dfStudent.show()

  +-----+---------+
  |  fnm|      lnm|
  +-----+---------+
  | Mark|    Henry|
  |Alita|Fernandez|
  |Cuban|   Leslie|
  +-----+---------+

Now we need to create a new dataframe dfTeacher that has the same schema as dfStudent but no records. Let's see how we can achieve that.

Example 1:

  val dfTeacher = dfStudent.limit(0)

  dfTeacher.show()

  +---+---+
  |fnm|lnm|
  +---+---+
  +---+---+

Example 2:

  import org.apache.spark.sql.functions.lit
  val dfTeacher = dfStudent.filter(lit(1) === lit(2)) // 1 = 2 is never true, so no row passes

  dfTeacher.show()

  +---+---+
  |fnm|lnm|
  +---+---+
  +---+---+
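A third option combines this section with the createDataFrame approach shown earlier: take the schema of the existing dataframe and pair it with an empty RDD[Row]. The following is a sketch under assumptions; the object name EmptyFromExistingSchema, the helper name emptyLike and the local session settings are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object EmptyFromExistingSchema {
  // Builds an empty dataframe that reuses the schema of `source`.
  // `emptyLike` is a hypothetical helper name, not a Spark API.
  def emptyLike(spark: SparkSession, source: DataFrame): DataFrame =
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], source.schema)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-df-like")
      .getOrCreate()
    import spark.implicits._

    val dfStudent = Seq(("Mark", "Henry"), ("Alita", "Fernandez")).toDF("fnm", "lnm")
    val dfTeacher = emptyLike(spark, dfStudent)

    println(dfTeacher.schema == dfStudent.schema) // compares column names, types and nullability
    println(dfTeacher.count())                    // number of rows in the new dataframe

    spark.stop()
  }
}
```

Unlike limit(0) and filter, this version does not keep the original dataframe in its query plan; it only copies the schema.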

Empty Dataframe using Case Class

Here we will create an empty dataframe using the schema derived from a case class.

  import spark.implicits._ // provides toDF on a Seq of case class instances

  case class Student(fnm: String, lnm: String)

  val df: DataFrame = Seq.empty[Student].toDF()

  df.printSchema()

  root
  |-- fnm: string (nullable = true)
  |-- lnm: string (nullable = true)

  df.show()

  +---+---+
  |fnm|lnm|
  +---+---+
  +---+---+
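If you want to keep the case class type instead of working with untyped rows, SparkSession also offers emptyDataset, which returns an empty Dataset of the given type. The sketch below passes the encoder explicitly through Encoders.product; the object name EmptyFromCaseClass, the method names typed and untyped, and the local session settings are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}

object EmptyFromCaseClass {
  // Defined inside an object (not inside a method) so an encoder can be derived.
  case class Student(fnm: String, lnm: String)

  // Typed empty Dataset; the encoder is passed explicitly instead of via spark.implicits._
  def typed(spark: SparkSession): Dataset[Student] =
    spark.emptyDataset[Student](Encoders.product[Student])

  // Same data viewed as an untyped dataframe.
  def untyped(spark: SparkSession): DataFrame = typed(spark).toDF()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-dataset")
      .getOrCreate()

    val df = untyped(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```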

Empty Dataframe using Implicit Encoder

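Here we create an empty dataframe from an empty Seq of tuples. The implicit encoders from spark.implicits._ supply the column types, and toDF supplies the column names. Let us see an example below.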

  val schemaSeq = Seq("fnm", "lnm")

  val df = Seq.empty[(String, String)].toDF(schemaSeq: _*)

  df.printSchema()

  root
  |-- fnm: string (nullable = true)
  |-- lnm: string (nullable = true)

  df.show()

  +---+---+
  |fnm|lnm|
  +---+---+
  +---+---+
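The same approach works when the columns have different types, because the column types come from the tuple's type parameters. Here is a sketch with one String and one Int column; the column names name and age, the object name EmptyFromTuple and the local session settings are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyFromTuple {
  // Column types are taken from the tuple type (String, Int);
  // column names are passed to toDF.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq.empty[(String, Int)].toDF("name", "age")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-df-tuple")
      .getOrCreate()

    val df = build(spark)
    df.printSchema() // shows the inferred type and nullability of each column
    df.show()

    spark.stop()
  }
}
```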

🙂 kudos for learning something new 🙂
