
History of Spark

Apache Spark began in 2009 as a research project at the University of California, Berkeley, in the RAD Lab, which later became known as the AMPLab. The goal was to create a faster and more flexible alternative to Hadoop's MapReduce, which was inefficient for workloads such as iterative machine learning and interactive data analysis. Spark was designed to overcome these limitations by supporting in-memory data processing and efficient data reuse across computations. It was initially released under the BSD license, which allowed open and flexible use by developers and organizations.

By 2010, Spark was already showing impressive results, running 10 to 20 times faster than MapReduce on specific workloads. In 2013, the project was donated to the Apache Software Foundation, where it quickly gained popularity in the big data community, and it officially became a top-level Apache project in 2014.

Databricks, a company founded in 2013 by Spark's original creators, has played a major role in its development and adoption. Databricks contributed to the ongoing improvement of Spark and built a cloud-based platform that made it easier for organizations to use Spark for large-scale data analytics, machine learning, and artificial intelligence. Their work helped Spark grow into a widely used and trusted data processing engine.

Over time, Spark expanded to include several built-in libraries: Spark SQL for structured data, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. These made Spark a powerful unified engine for large-scale data processing. Today, it is widely used by companies and researchers around the world for big data and machine learning applications.
