Why Spark?
- kumarnitinkarn
- Sep 9, 2019
Apache Spark is an open-source, general-purpose distributed computing engine used for processing and analyzing large amounts of data.
Notice the words "processing" and "analyzing": Spark is not a storage system and does not store our data sets itself. It can, however, be integrated with file systems and object stores such as HDFS and S3, or with NoSQL databases such as Cassandra and MongoDB.
How are IT companies leveraging Apache Spark?
Nowadays, many professionals are moving into Big Data, Hadoop, Spark, and analytics. For beginners, it is very important to understand how and where Spark fits into analytics and data science roles.
Currently, two types of job profiles are in demand:
Big Data Hadoop - Hive, HDFS, Pig, Sqoop, Oozie, Spark, YARN, any one programming language (Python, Scala, or Java), and optionally a NoSQL database
Data Scientist (Machine Learning) - machine learning algorithms, Python, R, advanced Excel, and any reporting tool
Those who have already worked with MapReduce know the magic of Spark. Companies are either migrating their existing MapReduce jobs to Spark or building new ones directly on Spark.
Articles in this blog will mainly focus on Spark, and may also cover HDFS, Hive, Sqoop, Pig, and Oozie if readers ask for them.
Happy Learning!!