Today we will learn different ways of creating a Delta table in Databricks. We will check how tables can be created using existing Apache Spark code and also Spark SQL.
create managed delta table using SQL
For a managed table, Databricks maintains both the data and the metadata of the table, which means that if you drop the table, the underlying data is also deleted. The syntax for creating a Delta table using SQL is similar to any other CREATE statement. By default, when you do not specify a format, a Delta table is created. The files are stored in a location managed by the Hive metastore (/user/hive/warehouse/myTable).
create table CountryCapital_SQL (Country String, Capital String)
To check whether a Delta table was actually created, you can run SHOW CREATE TABLE:
show create table CountryCapital_SQL

CREATE TABLE spark_catalog.default.countrycapital_sql (
  Country STRING,
  Capital STRING)
USING delta
TBLPROPERTIES (
  'Type' = 'MANAGED',
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion' = '2')
In the output above you can see “USING delta”, which means this table is in Delta format. We will learn more about “USING” when we create unmanaged / external tables.
Note that you can mention “USING DELTA” in the CREATE TABLE syntax to explicitly create a Delta table. Please check the below example:
create table CountryCapital_SQL1 (Country String, Capital String) USING DELTA
You can also create other types of tables by mentioning the format after USING, e.g. USING CSV, USING PARQUET, etc.
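For example, the same syntax creates tables backed by other formats (the table names below are hypothetical, chosen just for illustration):

```sql
-- Non-Delta formats are declared the same way, just with a different USING clause
CREATE TABLE CountryCapital_CSV (Country STRING, Capital STRING) USING CSV;
CREATE TABLE CountryCapital_Parquet (Country STRING, Capital STRING) USING PARQUET;
```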
create managed delta table using apache spark
To create a table you can use the existing Apache Spark write command, with the format set to “delta” instead of parquet, csv, etc.
val df = Seq(("Belgium","Brussels"),("Croatia","Zagreb"),("France","Paris"),("Greece","Athens"))
  .toDF("Country","Capital_City")

// Create a managed Delta table
df.write
  .format("delta")
  .saveAsTable("CountryCapital_ManagedTable")
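To confirm that the table above is managed and see where its files landed, you can run DESCRIBE DETAIL (a quick check, assuming the table created above):

```sql
%sql
-- The "format" column shows delta, and the "location" column points
-- to the metastore-managed path under /user/hive/warehouse/
DESCRIBE DETAIL CountryCapital_ManagedTable;
```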
create unmanaged or external delta table using SQL
For an unmanaged or external table, Databricks maintains only the metadata and not the actual data, which means that when you drop the table, the data is not deleted. While creating an unmanaged Delta table you also need to provide a location. Let’s check an example:
create table CountryCapital_SQL_UNMANAGED (Country String, Capital String) USING DELTA LOCATION "/dbfs/FileStore/import-stage"
To check whether an unmanaged table was actually created, let’s run the SHOW CREATE TABLE command:
show create table CountryCapital_SQL_UNMANAGED

CREATE TABLE spark_catalog.default.countrycapital_sql_unmanaged (
  Country STRING,
  Capital STRING)
USING delta
LOCATION 'dbfs:/dbfs/FileStore/import-stage'
TBLPROPERTIES (
  'Type' = 'EXTERNAL',
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion' = '2')
In the output above, you can see the Type is EXTERNAL, which means an unmanaged table has been created.
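You can verify the unmanaged behaviour directly: dropping the table removes only the metastore entry, while the Delta files stay on storage (a sketch, reusing the table and location from the example above):

```sql
-- Removes only the table definition from the metastore;
-- the files under /dbfs/FileStore/import-stage remain on storage
DROP TABLE CountryCapital_SQL_UNMANAGED;
```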
create unmanaged or external delta table using apache spark
We can write external Delta data using Apache Spark as below. The syntax is the same as writing any other external data, but the format must be “delta”. The syntax is as below:
val df = spark.range(0, 5)

df.write
  .format("delta")
  .save("dbfs:/FileStore/import-stage")
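Note that save only writes the Delta files to the location; it does not register a table in the metastore. One way to register an external table on top of those files afterwards (a sketch; the table name here is hypothetical):

```sql
%sql
-- Register an external table over the files written above
CREATE TABLE CountryCapital_Spark_Unmanaged
USING DELTA
LOCATION 'dbfs:/FileStore/import-stage';
```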
convert parquet table to delta table
Here we will learn ways to convert an existing Parquet table into a Delta table. This comes in handy when you want to move your project to Databricks and take advantage of the benefits of Delta tables.
The first step is to convert the Parquet files present in the location to Delta files. This can be done using the CONVERT TO DELTA command. Once this is run, the Parquet files are converted to Delta. Then we just need to create a Delta table on top of those files and read them.
%sql
CONVERT TO DELTA parquet.`/FileStore/parquetFile1`;
CREATE TABLE events USING DELTA LOCATION '/FileStore/parquetFile1';
SELECT * FROM events;
You can also directly convert a Parquet table into a Delta table using CONVERT TO DELTA. For this, let’s first create a Parquet table.
val df = spark.range(0, 5)
df.write
  .format("parquet")
  .saveAsTable("parTable")

%sql
show create table parTable;

createtab_stmt:
CREATE TABLE default.partable (
  id BIGINT)
USING parquet
As you can see, parTable is a Parquet table. Now let’s convert it into a Delta table.
%sql
convert to delta parTable;
show create table parTable;

CREATE TABLE spark_catalog.default.partable (
  id BIGINT)
USING delta
TBLPROPERTIES (
  'Type' = 'MANAGED',
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion' = '2')
As you can see above, the table has now been converted into a managed Delta table.
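If you prefer Scala over SQL, Delta Lake also exposes the same conversion through the DeltaTable API. A sketch, converting the path-based Parquet data from the earlier example:

```scala
import io.delta.tables.DeltaTable

// Equivalent of the SQL CONVERT TO DELTA command, using the path-based form
DeltaTable.convertToDelta(spark, "parquet.`/FileStore/parquetFile1`")
```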
You can learn about table creation in Hive here.
🙂 kudos for learning something new 🙂