Latest from the Blog

Hive Insert Into vs Insert Overwrite

In Previous chapter we learned about HIVE SHOW PARTITION and today lets check out the difference between Hive Insert Into vs Insert Overwrite. We will also discuss the impact on both Hive Partitioned and Non-Partitioned tables in the blog below. Simply put Insert Into command appends the rows in the existing table whereas Insert Overwrite as […]

Spark Lazy Evaluation

Today we will learn about Spark Lazy Evaluation. We will learn about what it is, why is it required, how spark implements them, and what is its advantage. We know that Spark is written in Scala and Scala has an option to run lazily [You can check the lesson here] but for Spark, the execution […]

HDFS Data Blocks and Block Size

When a file is stored in HDFS, Hadoop breaks the file into BLOCKS before storing them. What this means is, when you store a file of big size Hadoop breaks them into smaller chunks based on predefined block size and then stores them in Data Nodes across the cluster. The default block size is 128mb […]

Get new content delivered directly to your inbox.