You can create a DataFrame from a List or Seq using the toDF() function. To use toDF(), first import spark.implicits._
scala> val value = Seq(("Smith",6,9.5),("Max",0,2.5))
value: Seq[(String, Int, Double)] = List((Smith,6,9.5), (Max,0,2.5))
scala> val df1 = value.toDF()
df1: org.apache.spark.sql.DataFrame = [_1: string, _2: int ... 1 more field]
scala> df1.show
+-----+---+---+
|   _1| _2| _3|
+-----+---+---+
|Smith|  6|9.5|
|  Max|  0|2.5|
+-----+---+---+
Here the column names default to _1, _2, and so on. To give the columns meaningful names, pass the names as arguments to toDF(), as in the syntax below.
scala> val value = Seq(("Smith",6,9.5),("Max",0,2.5))
value: Seq[(String, Int, Double)] = List((Smith,6,9.5), (Max,0,2.5))
scala> val df1 = value.toDF("Name","Mark1","Mark2")
df1: org.apache.spark.sql.DataFrame = [Name: string, Mark1: int ... 1 more field]
scala> df1.show
+-----+-----+-----+
| Name|Mark1|Mark2|
+-----+-----+-----+
|Smith|    6|  9.5|
|  Max|    0|  2.5|
+-----+-----+-----+
The same approach can be used to create a DataFrame from a List.
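As a minimal sketch of the List case, assuming a local SparkSession (in spark-shell the spark value already exists, and the builder call simply returns it). The names marks and dfFromList are illustrative, not from the original examples.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; getOrCreate() reuses an existing one if present.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("todf-from-list")
  .getOrCreate()

// toDF() comes from the implicits of this particular session.
import spark.implicits._

// A List of tuples instead of a Seq; each tuple becomes one row.
val marks = List(("Smith", 6, 9.5), ("Max", 0, 2.5))

// Column names are passed in the same order as the tuple fields.
val dfFromList = marks.toDF("Name", "Mark1", "Mark2")

dfFromList.show()
```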
Limitation:
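With toDF() we cannot specify the column types or the nullable property. Spark infers both from the Scala types: a String column is nullable, while Int and Double columns are not.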
scala> df1.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Mark1: integer (nullable = false)
 |-- Mark2: double (nullable = false)
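When control over column types and nullability is needed, one option is to skip toDF() and build the DataFrame with createDataFrame and an explicit StructType. The following is only a sketch under assumptions: the local SparkSession, the chosen nullable flags, and the names schema, rows and df2 are illustrative and not part of the original examples.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explicit-schema")
  .getOrCreate()

// Declare each column's name, type and nullability explicitly.
// The nullable choices here are arbitrary, picked to differ from what toDF() infers.
val schema = StructType(Seq(
  StructField("Name", StringType, nullable = false),
  StructField("Mark1", IntegerType, nullable = true),
  StructField("Mark2", DoubleType, nullable = true)
))

// Rows must match the schema in field order and type.
val rows = Seq(Row("Smith", 6, 9.5), Row("Max", 0, 2.5))

// createDataFrame(RDD[Row], StructType) applies the schema as given.
val df2 = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

df2.printSchema()
```

Unlike toDF(), this approach does not check the row values against the schema at compile time, so a mismatch only surfaces when the data is evaluated.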
Note: toDF() is not available directly on an Array. Calling it on one produces the error below.
scala> val value = Array(("Smith",6,9.5),("Max",0,2.5)).toDF()
<console>:23: error: value toDF is not a member of Array[(String, Int, Double)]
       val value = Array(("Smith",6,9.5),("Max",0,2.5)).toDF()
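A common way around this error is to convert the Array to a Seq first, since the implicit conversion that supplies toDF() applies to Seq. The sketch below assumes a local SparkSession; the names marksArray and df3 are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-to-df")
  .getOrCreate()

import spark.implicits._

val marksArray = Array(("Smith", 6, 9.5), ("Max", 0, 2.5))

// toSeq yields a Seq, on which the toDF() implicit is available.
val df3 = marksArray.toSeq.toDF("Name", "Mark1", "Mark2")

df3.show()
```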