We can create a DataFrame from a List or Seq using the toDF() function. To use toDF() we need to import spark.implicits._.
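In spark-shell the spark session and these implicits are already available; in a standalone application we have to set them up ourselves. A minimal sketch (the app name and master below are placeholders for local testing):

import org.apache.spark.sql.SparkSession

// Build a SparkSession; spark-shell does this for us automatically
val spark = SparkSession.builder()
  .appName("ToDFExample")   // placeholder application name
  .master("local[*]")       // placeholder master, for local testing only
  .getOrCreate()

// Brings toDF() into scope for local Seqs and Lists of tuples or case classes
import spark.implicits._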

scala> val value = Seq(("Smith",6,9.5),("Max",0,2.5))
value: Seq[(String, Int, Double)] = List((Smith,6,9.5), (Max,0,2.5))
scala> val df1 = value.toDF()
df1: org.apache.spark.sql.DataFrame = [_1: string, _2: int ... 1 more field]
scala> df1.show
+-----+---+---+
|   _1| _2| _3|
+-----+---+---+
|Smith|  6|9.5|
|  Max|  0|2.5|
+-----+---+---+

Here the column names default to _1, _2, and so on. If we want to provide proper column names, we can pass them to toDF() as shown below.

scala> val value = Seq(("Smith",6,9.5),("Max",0,2.5))
value: Seq[(String, Int, Double)] = List((Smith,6,9.5), (Max,0,2.5))
scala> val df1 = value.toDF("Name","Mark1","Mark2")
df1: org.apache.spark.sql.DataFrame = [Name: string, Mark1: int ... 1 more field]
scala> df1.show
+-----+-----+-----+
| Name|Mark1|Mark2|
+-----+-----+-----+
|Smith|    6|  9.5|
|  Max|    0|  2.5|
+-----+-----+-----+

The same approach can be used to create a DataFrame from a List.
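For example, a quick sketch using the same sample data held in a List (the column names are the same illustrative ones used above):

// A List of tuples works exactly like a Seq with toDF()
val listDF = List(("Smith",6,9.5),("Max",0,2.5)).toDF("Name","Mark1","Mark2")
listDF.show()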

Limitation:
When using toDF() we cannot specify the column types or the nullable property; Spark infers them from the Scala types, as the schema below shows.

scala> df1.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Mark1: integer (nullable = false)
 |-- Mark2: double (nullable = false)
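If we need control over the column types and the nullable property, one common alternative is spark.createDataFrame with an explicit schema. A minimal sketch, assuming the same spark session and sample data (df2 and the nullable choices here are just illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Explicit schema: we decide the type and nullability of every column
val schema = StructType(Seq(
  StructField("Name",  StringType,  nullable = true),
  StructField("Mark1", IntegerType, nullable = true),
  StructField("Mark2", DoubleType,  nullable = true)
))

val rows = Seq(Row("Smith", 6, 9.5), Row("Max", 0, 2.5))
val df2 = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df2.printSchema()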

Note: We cannot call toDF() directly on an Array. When we do, we get the error below.

scala> val value = Array(("Smith",6,9.5),("Max",0,2.5)).toDF()
<console>:23: error: value toDF is not a member of Array[(String, Int, Double)]
       val value = Array(("Smith",6,9.5),("Max",0,2.5)).toDF()
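The toDF() implicit is defined for Seq, so one simple workaround (a sketch) is to convert the Array to a Seq first:

// Converting the Array to a Seq makes the toDF() implicit available
val arrayDF = Array(("Smith",6,9.5),("Max",0,2.5)).toSeq.toDF("Name","Mark1","Mark2")
arrayDF.show()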
