We can create a DataFrame from a List or Seq using the toDF() function. To use toDF() we need to import spark.implicits._.
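In spark-shell the spark session and these implicits are already available; in a standalone application we have to set them up ourselves. A minimal sketch (the app name and master below are placeholders for local testing):

import org.apache.spark.sql.SparkSession

// Build a SparkSession; spark-shell does this for us automatically
val spark = SparkSession.builder()
  .appName("ToDFExample")   // placeholder application name
  .master("local[*]")       // placeholder master, for local testing only
  .getOrCreate()

// Brings toDF() into scope for local Seqs and Lists of tuples or case classes
import spark.implicits._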

scala> val value = Seq(("Smith",6,9.5),("Max",0,2.5))
value: Seq[(String, Int, Double)] = List((Smith,6,9.5), (Max,0,2.5))
scala> val df1 = value.toDF()
df1: org.apache.spark.sql.DataFrame = [_1: string, _2: int ... 1 more field]
scala> df1.show
+-----+---+---+
|   _1| _2| _3|
+-----+---+---+
|Smith|  6|9.5|
|  Max|  0|2.5|
+-----+---+---+

Here the column names default to _1, _2, and so on. If we want to provide proper column names, we can pass them to toDF() as shown below.

scala> val value = Seq(("Smith",6,9.5),("Max",0,2.5))
value: Seq[(String, Int, Double)] = List((Smith,6,9.5), (Max,0,2.5))
scala> val df1 = value.toDF("Name","Mark1","Mark2")
df1: org.apache.spark.sql.DataFrame = [Name: string, Mark1: int ... 1 more field]
scala> df1.show
+-----+-----+-----+
| Name|Mark1|Mark2|
+-----+-----+-----+
|Smith|    6|  9.5|
|  Max|    0|  2.5|
+-----+-----+-----+

The same approach can be used to create a DataFrame from a List.
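For example, a quick sketch using the same sample data held in a List (the column names are the same illustrative ones used above):

// A List of tuples works exactly like a Seq with toDF()
val listDF = List(("Smith",6,9.5),("Max",0,2.5)).toDF("Name","Mark1","Mark2")
listDF.show()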

Limitation:
When using toDF() we cannot specify the column types or the nullable property; Spark infers them from the Scala types, as the schema below shows.

scala> df1.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Mark1: integer (nullable = false)
 |-- Mark2: double (nullable = false)
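If we need control over the column types and the nullable property, one common alternative is spark.createDataFrame with an explicit schema. A minimal sketch, assuming the same spark session and sample data (df2 and the nullable choices here are just illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Explicit schema: we decide the type and nullability of every column
val schema = StructType(Seq(
  StructField("Name",  StringType,  nullable = true),
  StructField("Mark1", IntegerType, nullable = true),
  StructField("Mark2", DoubleType,  nullable = true)
))

val rows = Seq(Row("Smith", 6, 9.5), Row("Max", 0, 2.5))
val df2 = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df2.printSchema()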

Note: We cannot call toDF() directly on an Array. When we do, we get the error below.

scala> val value = Array(("Smith",6,9.5),("Max",0,2.5)).toDF()
<console>:23: error: value toDF is not a member of Array[(String, Int, Double)]
       val value = Array(("Smith",6,9.5),("Max",0,2.5)).toDF()
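The toDF() implicit is defined for Seq, so one simple workaround (a sketch) is to convert the Array to a Seq first:

// Converting the Array to a Seq makes the toDF() implicit available
val arrayDF = Array(("Smith",6,9.5),("Max",0,2.5)).toSeq.toDF("Name","Mark1","Mark2")
arrayDF.show()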
