Apache Spark: Dataset vs DataFrame - The Tortoise and Hare
Introduction

Hi, welcome to my first blog post, the first in a series that will follow. Who is the Tortoise and who is the Hare? Well, in the many books about Apache Spark that I have read, I didn’t find a clear picture of how the performance of DataFrames compares to that of Datasets. In this blog post, we will clear up that mystery and show some concrete results and insights on the matter. ...