Big data is an important technology because it allows large amounts of information – both structured and unstructured – to be analyzed quickly. MapReduce is a paradigm that allows computation to be done in a distributed and parallel manner. It is cheaper to run 1,000 machines with 1 GB of RAM each than to buy a single machine with 1,000 GB of RAM.
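The MapReduce paradigm described above can be sketched in plain Python: a map phase emits key–value pairs, a shuffle phase groups them by key, and a reduce phase aggregates each group. This is a single-machine illustration only (the function names and sample documents are my own); real frameworks such as Hadoop distribute these same phases across many nodes.

```python
from collections import defaultdict
from functools import reduce

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: reduce(lambda a, b: a + b, counts)
            for word, counts in groups.items()}

# Toy input standing in for files spread across a cluster
documents = ["big data is big", "data is everywhere"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each document is mapped independently and each word's group is reduced independently, both phases can run in parallel across machines – which is exactly what makes the paradigm scale.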
Hadoop is a MapReduce framework that enables processing large datasets in parallel on clusters of commodity hardware. It is faster on massive data volumes because the data processing is done in parallel.
Apache Spark can be seen as an improved version of Hadoop that uses in-memory operations whenever possible. Spark provides a unified, comprehensive solution for managing numerous big data workloads, and its in-memory design makes it several times faster than other big data technologies.
Spark was originally written in the Scala programming language and runs on the Java Virtual Machine (JVM). It supports multiple programming languages for developing applications: Scala, Java, Python, SQL, and R.
Hi, I am Arvind. I completed my MS (Computer Science) from IISc in 2007 and worked at Nvidia, Symantec, and Synopsys before venturing into the world of startups.
I like to teach and speak at multiple meetups and conferences. Some of them include Droidcon (an international conference on Android), meetups by HasGeek and Yourstory, and GDG conferences.
I did my master's in Computer Architecture and Compiler Design.
I co-founded Limitless, a popular Chrome extension with more than 70,000 active users, worked on machine learning algorithms for classifying documents online, and advise a few tech companies on their technology stack.
I moderate India's largest technology forum, the Facebook group https://www.facebook.com/groups/core.cs/ with more than 300,000 members.
To know more, visit https://www.linkedin.com/in/arvinddevaraj/