Pandas for data analytics

Learn Data Analytics before diving deep into Data Science

Is it true that to become a data scientist you must master all of the following: statistics, linear algebra, calculus, programming, databases, distributed computing, machine learning, visualization, experimental design, clustering, deep learning, natural language processing, and more?

The answer is simply no.

Data science is simply the process of asking interesting questions and then answering those questions using large sets of data. Data science, in general, can be understood as a process that includes the following steps:

  • Formulate an interesting question
  • Gather data that might help you to answer that question
  • Clean the data
  • Explore, analyse, and visualize the data
  • Build and evaluate a machine learning model
  • Interpret and communicate the results

None of the tasks listed above necessarily requires knowledge of advanced mathematics, mastery of deep learning, or many of the other skills mentioned earlier.

They do, however, require the ability to understand data and to work with it in a programming language (R or Python). So don't start with complex concepts; start with data analytics using pandas.

Why data analysis with pandas?

For working with data in Python, you should learn how to use the pandas library.

Pandas provides a high-performance data structure (called a "DataFrame") that is suitable for tabular data with columns of different types, similar to an Excel spreadsheet or SQL table.

It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, visualizing data, and so much more. In short, learning pandas will significantly increase your efficiency when working with data.
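As an illustration, here is how a few of these everyday tasks look in pandas. This is a minimal sketch with made-up data, not from any particular dataset:

```python
import pandas as pd

# Build a small tabular dataset, similar to an Excel sheet or SQL table
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "age": [25, 31, None, 42],        # one missing value
    "city": ["Delhi", "Mumbai", "Delhi", "Chennai"],
})

# Handle missing data: fill the missing age with the column mean
df["age"] = df["age"].fillna(df["age"].mean())

# Filter rows: everyone from Delhi
delhi = df[df["city"] == "Delhi"]

# Summarize: average age per city
avg_age = df.groupby("city")["age"].mean()
print(delhi)
print(avg_age)
```

A few lines like these replace what would otherwise be loops and bookkeeping in plain Python, which is exactly where the efficiency gain comes from.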


Job Interview tips that will help you get hired

Here are 10 tips for a successful job interview

1) Write a powerful resume

Your resume is:
  • documented proof of the successes you have achieved in various fields in the past.
  • your version of your story.
  • the way you tell others what you would like them to know about you.

Here are some pointers for writing a powerful resume:

  • Maintain a master resume (it can be your LinkedIn profile) with all your projects and work experience. You can then derive a tailored resume from it to match each job you apply for.
  • The front page of your resume should be clear, so the employer is interested in reading further.
  • There are many templates and websites available for creating a resume. Use any of them and update your resume consistently.
  • Keep it focused. Don't give a complete summary of your academic qualifications or projects. Describe them in a few words; you can explain them in detail if the employer shows interest.
  • Get your resume reviewed by a friend, mentor, or career counselor.
  • Keep a LinkedIn profile so that you can update it easily whenever you have news.
Your resume reveals a lot more than your professional career. It reveals:
  • How you view yourself.
  • How well you can present yourself.
  • How much you value your successes.
  • How positive and energetic you are for achieving success.
  • How creative you are.
  • How well you can sell yourself.

2) Research about the company
Get to know the company you have applied to; this will help you decide whether it is the right place for you to work.

It will also help during your conversation with the interviewer: you can do well on "tell me about yourself" by adding a few words about the company and your interest in it. Several websites publish reviews of companies; you can look the company up there.

Match your qualifications with the job requirements listed, and be prepared to explain why you are the right choice for the position.

3) Make a checklist and check it the night before the interview

  • Copies of your resume and referral mail (if you are going through a referral)
  • Check the bus route to the company.
  • A formal outfit
  • A healthy meal
  • Prepare the questions you want to ask the interviewer to get clarity about the company or the position.
  • If you have the interviewer's details, check their professional profile online.
  • Avoid sweaty palms

4) Work on your Soft skills

Apart from technical skills, workplace success needs these skills. You may be asked a few indirect questions that reveal your soft skills, for example: "how do you spend your free time", "are you comfortable working in a group", "what was your last group project and how did you handle conflict", "did you have someone as a mentor", "do you like mentoring your juniors", "do you attend meetups and network", "what is your long-term goal", "where do you see yourself in 5 years", "have you taken online courses", "do you have a GitHub profile".
  •  People skills
  •  Find mentors
  •  Networking
  •  Build a brand
  •  Long-term career perspective
  •  Value non-technical courses
  •  Understand data gathering and analysis
  •  Presentation skills
  •  Separate personal and professional lives
  •  Work/life balance
  •  Prioritizing and executing
5) Dress for Success

Pick a comfortable formal outfit for your job interview. Keep it simple, because appearance plays a major role in making a good first impression.

6) Have a Mock interview

Practice the most common interview questions, such as "tell me about yourself", "what are your strengths and weaknesses", and "what are your expectations".

Most importantly, there are some things you should and should not say or do during an interview.

What to convey in an interview:

  • You are motivated
  • You are constantly seeking to learn new things
  • You are flexible at work
  • You are a good influencer
  • You are excited about the job.
  • Provide examples for the questions asked by the interviewer.

What to avoid in an interview:

  • Talking negatively about your current employer.
  • Showing no interest in knowing about the interviewer or the company.
  • Answering phone calls or text messages.
  • Taking a long time to answer a question. If you don't know the answer, be open and say that you don't know.
  • While introducing yourself, don't say too much about your personal life. A few words are fine.
7) Prepare for the Aptitude Test

The first step at almost every IT company is an aptitude test, which can be online or written. The aptitude test checks the candidate's overall problem-solving ability as well as the basics.
It may include:
• Arithmetic Aptitude (questions on basic maths)
• Logical Reasoning
• Verbal (checking how good the person is at English)
There are many websites where you can practice aptitude questions.

8) Prepare for Group Discussion

Some companies may conduct group discussion to evaluate your communication skills and how well you present your points of view
  • When the topic is assigned, if you have any doubts on the topic, just ask. Ask the coordinator whatever is not clear to you before the discussion starts.
  • The one who starts or concludes the discussion has the maximum chances of clearing the GD.
  • Be confident and fluent when expressing your points. You need to show your command of the English language as well as your confidence in the point you are stating.
  • Be clear with your points. If you are in favour of a topic, you need to support it throughout the discussion. You cannot switch your views during the discussion, even if you are against the whole crowd. Extra points are always awarded to the person who takes a stand in the discussion.
  • Don’t make the GD a debate. Remember it is a discussion and you don’t have to fight if someone has different views than yours.
  • Let others speak. If you see someone is trying to speak but is not able to speak because of others, you should give him/her a chance by actually saying “Let him speak” politely. It gives a good impression on the coordinator.
  • If you don’t have much knowledge of the topic and don’t have your own points, then listen to everyone’s points and form your view from that and speak. You have to speak at least once in the discussion.
  • When you are addressing the other people in the GD, don’t call them by their name. Address them by saying “my colleague” or “my fellow mate” or “previous speaker”. For example, if you want to carry forward the point of your previous speaker, you can start by “As my fellow mate said” or “I agree or disagree to my previous speaker”.
  • Quote examples on the topic or come up with the facts. It makes the coordinator feel that you have a good knowledge about the topic. Also, other than facts and examples, do come up with your own point of view on the topic as well.
  • If you keep all the above points in mind and practice group discussions with your friends, I can assure you there is a 90% chance that you will clear the GD.
9) Prepare for the Technical Round
The subjects you may need to study for the multiple choice questions:
• Database Management System
• Data Structures
• Design and Analysis of Algorithms
• Object Oriented Programming Concepts
• Basic C fundamentals

Depending on the background of the company, you may be asked questions from these subjects as well:
Operating Systems

Computer Networking

10) Prepare for the HR Interview

One common question asked during the HR round is about your strengths and weaknesses. Evaluate your strengths and weaknesses objectively.

Here are a few strengths, with examples, that you can quote in the interview (if you have them in you):

Hardworking: I got a 10 CGPA in class 10; or I got my branch changed on the basis of merit after first year; or, as the head of a college society, I made all the last-minute preparations for an event and made it a hit.
Team player: Member of college societies, member of the football team, successful completion of an internship project in a team of 4, etc.
Leadership qualities: Head of a college society, captain of the football team, or having successfully led a project in a company.
Punctuality: Always on time to class; timely completion of the projects assigned to you.
Creative: Organised creative events in college, or any creativity you have shown in your projects.
Good communication skills: You can prove this in the interview itself, or give examples of debates or speech competitions you have taken part in, or any anchoring you have done on stage.
Good command of a language, be it English or a foreign language (French, Spanish, etc.): This can be a real benefit if you are applying to a multinational company with foreign clients.
Patient: Give an example of a situation that you handled patiently.
Adaptable: You can talk about adapting to the hostel environment, or to a new place you moved to for a job, or about being adaptable to new technologies.
Handles pressure well: Give an example of a situation that was stressful but that you handled well. You can always talk about handling exam pressure or the pressure of a new project.

If you feel you don't have any strength worth telling, you can adapt your strengths to the examples above, but make sure you have a good example to tell.
Make sure you know your strengths before going into the interview.


Telling your weakness is a little more complex than telling your strengths, because this can go completely against you.
So whenever you are asked about your weakness, keep the following in mind:
When you state a weakness, also mention what you are doing to overcome it, or how it sometimes motivates you to work harder. This makes the interviewer understand that you are actually working on your weakness, and it leaves a good impression.

Here are some weaknesses along with how to overcome them. Use them as you see fit:

Time management: With such a busy schedule, I have a time-management problem, and out of so many things to do, some get missed.
REMEDY: Making a list of tasks and checking them off when completed so that nothing is missed; or making a proper schedule for the day and following it.
I get stressed out when I am not able to meet people’s expectations:
BENEFIT: It motivates me to work harder to fulfill everyone’s expectations and in that process, I get better and better.
Being too critical of myself: I often criticize myself for not performing up to the mark.
BENEFIT: It leads to my constant improvement.
You can mention any subject that you are afraid of but make sure you don’t mention that subject that is very important for the job.
English speaking skills: If you are applying for a pure desk job where you don't have to interact with many people, you can mention this weakness; otherwise, if you mention it, you have to explain what you have done so far to improve it.
IMPROVEMENTS: Reading English novels, watching English movies, reading English newspapers, taking English-speaking coaching classes.
I get stressed out when it comes to deadlines:
BENEFIT: So I make a proper schedule and make sure I complete my work a fair time before the deadline.

Always remember: you cannot name as your weakness a quality that is essential for the job.
For example, if you are applying for an IT job, you can't say that data structures are your weakness, but you can say that you don't like mathematics that much. If you are applying for an accounting job, however, you can't say that maths is your weakness.

So, you may have to mold your weakness on the basis of the company and the post you are applying for. Also, you need to show your flexibility in overcoming your weakness.
I hope you can now answer this tricky question well and turn the interview in your favor.

Full Stack development

  1. CSS, HTML, JavaScript, jQuery
  2. Bootstrap and Foundation are popular CSS frameworks
  3. A responsive website supports different layouts
  4. Frontend build tools – speed up the development process. Gulp and Grunt are popular ones
  5. Dependency management – Browserify, Webpack, Require.js, Yeoman
  6. JavaScript frameworks – React.js, Angular.js, Backbone.js, Ember.js, Vue.js
  7. Backend frameworks – This is where the business logic resides – Ruby on Rails, Python Django, PHP, Node.js, Java, C#
  8. Databases – Persistent stores (MySQL, MongoDB, Redis, PostgreSQL, Cassandra)
  9. Caching – Caching reduces the need to hit the database every time (Nginx, Apache, Redis, in-memory)
  10. Web platforms – Hosting providers (AWS, DigitalOcean, Heroku, Azure)
  11. DevOps – Bridges the development process with server administration; automates the workflow and deployment.
  12. Docker is used for containerizing applications; Vagrant is used for provisioning virtual machines (it ensures the development environment matches the server)
  13. Server management – Configuration management tools; a way of allowing servers to be provisioned (Salt, Puppet, Chef, Ansible)
  14. Others – Authentication, Authorization, APIs, RESTful services, SOA, Continuous Integration and Deployment
  15. Tools – FTP, SSH, GitHub

Introduction to Apache Hadoop and friends

Hadoop is a MapReduce framework that enables processing large datasets in parallel on clusters of commodity hardware. It is cheap, as it's an open-source solution that runs on commodity hardware while handling petabytes of data, and it's fast on massive data volumes because data processing is done in parallel.

A complete Hadoop MapReduce based solution may have the following layers.

1. Hadoop core – HDFS (Hadoop Distributed File System)

This is the data storage system, where data is split into large blocks (~64 MB or 128 MB) and saved. It can scale to thousands of nodes and is inspired by the Google File System, which solved the problem of indexing the web. To provide fault tolerance, HDFS by default keeps 3 replicas of the data it stores; hence there can occasionally be consistency issues across replicas.

2.MapReduce API

Allows job-based (batch), parallelizable processing across data in HDFS. The API auto-parallelizes work over huge amounts of data while preserving fault tolerance and adding high availability. It also brings the computing logic to the data, rather than having data travel across the network to be operated on. The MapReduce model expects hardware failures on the commodity hardware it uses, so automatic retry is built in.


MapReduce has 3 main phases. In the map phase, incoming files are split into blocks that can be saved in HDFS, and a key-value pair is emitted for each piece of data. This output is consumed by the shuffle phase, which routes data to the reducers. During shuffling, data can be subjected to optimizations such as sorting or pre-processing that save time at the reducers. The reducers then derive the final desired output by applying the defined reducing logic to the shuffled output.
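The three phases can be sketched in plain Python. This is a toy simulation of the model, not the actual Hadoop API:

```python
from collections import defaultdict

# --- Map phase: emit a (key, value) pair for every word in every line ---
def map_phase(lines):
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

# --- Shuffle phase: group all values by key, as the framework would ---
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# --- Reduce phase: apply the reducing logic (here: summation) per key ---
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)   # e.g. {'the': 3, 'fox': 2, ...}
```

In real Hadoop, the map and reduce functions run on different machines and the shuffle happens over the network; the structure of the computation, however, is exactly this.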

3. Data access

These layers access the data stored in HDFS in ways optimized or enhanced for particular domains. A few such layers available for use are:

● Pig – A scripting language that provides an 'Extract, Transform, Load' library for HDFS
● Hive – Provides the Hive Query Language, which is similar to SQL, for HDFS
● HBase – Inspired by Google BigTable for GFS, HBase provides a more abstract layer that leverages the distributed data storage of HDFS
● Mahout – A data access layer for scalable machine learning algorithms on HDFS

4. Tools and libraries

● HUE – Hadoop User Experience, which provides a web interface for the Hadoop platform.
● Sqoop – A tool for efficiently transferring bulk data between Hadoop and structured data stores. This is useful when an enterprise's relational databases encounter massive growth, leading to data volumes that can no longer be handled efficiently in relational databases.

5. Monitoring and alerting

Reports cluster node performance and statistics about currently executing jobs, mostly for use by administrators.

Though Hadoop is a powerful framework, it is not a solution for all scalability and data-volume problems in the enterprise. It should only be used when batch processing, and the latencies involved, are acceptable for the application.

For example, Hadoop may not be suitable for real-time analysis where interactivity is needed. Facebook uses Hadoop for ad targeting, and Hadoop is also used for point-of-sale transaction analysis (for relational systems), threat analysis, data sandboxing, recommendation engines, risk modelling and trade surveillance when huge volumes of data are involved.

While Hadoop is an open-source framework, several vendors such as Cloudera, Hortonworks, MapR, AWS and Windows Azure provide a premium layer on top of it at a cost, with value additions and support.

Introduction to Apache Spark

Apache Spark is an emerging platform that offers more flexibility than MapReduce but more structure than a basic message-passing interface.

It relies on the concept of distributed data structures (which it calls RDDs) and operators on them. Because Spark's core is relatively low-level, it ships with higher-level libraries to make it more accessible to data scientists; its machine learning library is called MLlib.

Apache Spark is an open-source data processing framework for Big Data Analytics.

It is a unified, parallel data processing framework, designed to cover a wide range of big data workloads such as batch processing, real-time processing, stream analytics, machine learning and interactive SQL.

Apart from supporting all these workloads in a single system, Spark lessens the burden of managing memory and separate tools.

Apache Spark was initially developed in UC Berkeley's AMPLab in 2009 and was open sourced in 2010 under the BSD license. It was donated to the Apache Software Foundation in 2013 and became a top-level Apache project in 2014. It is now considered an alternative to many other big data technologies.

Spark was originally written in the Scala programming language and runs in a Java Virtual Machine (JVM) environment. It supports multiple programming languages for developing applications: Scala, Java, Python, SQL, and R.

Apache Spark – Framework Libraries

Features of Apache Spark

Apache Spark is a fast cluster computing technology, designed for rapid processing of large-scale data. Spark provides a unified, comprehensive solution for managing numerous big data workloads.

With features like near real-time processing and in-memory data storage, it improves the performance and makes speed several times faster as compared to other big data technologies.

The important and advanced features of Apache Spark are:
Unified Framework

Spark offers a unified framework that is packaged with higher-level libraries. It can manage big data processing across data sets from diverse sources (batch and real-time streaming) and of diverse nature (text, graphics, audio, video, etc.). This unification improves performance and developer productivity.


Speed

Spark allows applications in a Hadoop cluster to run up to 100 times faster in memory and about 10 times faster on disk. Spark is considered very fast, reportedly about three times faster than Hadoop on some workloads. This is possible because Spark reduces the number of read/write operations on disk by holding intermediate results in memory rather than on disk.

Multiple Languages Support

Spark comes up with built-in consistent and concise APIs in multiple languages such as Java, Python, and Scala; and thus allows you to write applications in these different languages.

Ease of Use

Spark contains easy to use APIs for processing large datasets. It also has a built-in set of over 100 high-level operators for processing data, and can also be used to query data interactively within the shell.

Runs Everywhere

Apache Spark can run just about everywhere: the user can run Spark standalone, on Hadoop, on Mesos, or in the cloud. Spark typically uses HDFS for data storage, but it can also work with any Hadoop-compatible data source such as HDFS, HBase, Tachyon, Cassandra, etc.

Advanced Analytics

The Spark framework library consists of Spark Streaming, SQL and DataFrames, MLlib for machine learning, and GraphX for graph computation. Developers can use these libraries separately or combine them in the same application. Thus Spark supports streaming data, graph algorithms, SQL queries and machine learning, along with map and reduce operations.

Spark Core Engine

Apache Spark is designed around an execution engine, the Spark Core Engine, that can work in memory as well as on disk. The high-level Spark operators spill to external (disk-based) operations when the data size exceeds the memory limit. The engine can process data streams in a fast and expressive way.

There are some other features of Apache Spark that give it an edge over other big data processing techniques.

  • One of them is support for lazy evaluation of big data queries, which optimizes the number of steps in data processing.
  • It also provides a higher-level API that creates a consistent architectural model for big data analytics and enhances developer productivity.

These advanced features of Apache Spark extend the MapReduce model and make it efficient for stream processing and interactive queries.
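Lazy evaluation can be illustrated with Python generators. This is an analogy only; Spark's own RDD machinery is far more involved:

```python
# Lazy pipeline: nothing is computed until a result is actually requested,
# mirroring how Spark defers transformations until an action is called.
data = range(1, 1_000_000)

evens = (x for x in data if x % 2 == 0)   # "transformation": not run yet
squared = (x * x for x in evens)          # still not run

# "Action": only now does the pipeline execute, and only far enough
# to produce the first three results.
first_three = [next(squared) for _ in range(3)]
print(first_three)   # [4, 16, 36]
```

Because the pipeline is declared before it runs, the engine (Spark, or here the generator machinery) can avoid materializing intermediate results and stop as soon as the requested output is produced.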

_Competitive programming_

Every year, big IT giants conduct competitions for Computer Science / Information Technology / MCA students. In these competitions they test students' technical and soft skills and offer various types of prizes (such as internships, job offers and cash prizes).

Steps for getting into competitive programming

To begin your journey in competitive programming, below are some tips that will help you get started.

  • Learn a programming language (like C, C++, Java, etc.) that you are going to use in competitions. C++ is predominant in programming contests because of its speed, the data structures and algorithms available in its STL, and because virtually all competitions support it.
  • Participation and practice are the ultimate mantra. Start with easy problems, then move on to complex and topic-wise problems. Also check the early submissions of good coders, and read editorials; this will teach you a lot.
  • To avoid time-limit-exceeded errors, always consider worst cases before you start coding. Do a proper order-of-growth analysis of your solution: sometimes a little optimization is enough, but sometimes a totally new, more efficient algorithm is needed.
  • Learn the workings, proofs, code and analysis of basic, well-known algorithms. The book "Introduction to Algorithms" by Cormen et al. is considered the bible on this topic. Some well-known algorithms worth learning are:
    • Sieve of Eratosthenes, or another prime number sieve
    • Depth-first search
    • Breadth-first search
    • Dijkstra’s algorithm
    • Floyd–Warshall algorithm
    • Either Kruskal’s or Prim’s algorithm
    • Some implementation of topological sorting, such as by using DFS
    • Convex hull (like Monotone Chains algorithm)
    • Edmonds–Karp, or another implementation of the Ford–Fulkerson method; or a preflow-push algorithm; or Dinic's algorithm
  • To get good at writing fast code and improving your implementation, follow this: on Topcoder, start with Div2 250 and master it, then Div2 500, then move on to Div1 250. Meanwhile, keep learning algorithms and keep practicing them on SPOJ, Codechef or Codeforces. And do read the tutorials; after a while you will realize that the tricks and methods to solve problems repeat themselves.
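As a taste of the list above, here is the Sieve of Eratosthenes in Python (contest code would typically be C++, but the idea is identical):

```python
def sieve(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]          # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p, starting from p*p
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Its O(n log log n) running time is exactly the kind of order-of-growth fact the analysis tip above is about: a naive per-number primality test would be far too slow for contest limits.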
Benefits of competitive programming

Below are the highlights and top benefits of competitive programming when it comes to finding career success.

1. Makes you a desirable candidate to major companies:

International-level programming competitions, such as the ACM ICPC, are a great opportunity to be seen by Apple, Facebook, Google, IBM and many other companies in the IT field. These major companies constantly keep track of such events and their participants to find talented employees.

2. Teaches you how to work in teams:

This is a very important skill, because in most of the jobs you will need at some point to work in a team. Competitive programming helps you learn how to work effectively together, because you must work with others in your team to accomplish the same task. You learn how to assess the strengths and weaknesses of your team members and effectively divide the responsibilities between each other.

3. Makes you faster and more focused:

Training and taking part in programming contests make you more disciplined, faster and focused coder. In the programming competition, you must solve problems in stressful situations within a deadline or you will lose. So, taking part in competitive programming teaches you how to be more focused on the task and not only complete it quickly, but also accurately. These skills are highly beneficial for any job, not just in coding.

4. Helps you solve complicated problems:

In programming competitions, you must solve difficult tasks that ordinary programmers rarely face. This teaches you how to solve complicated, out-of-the-box tasks, which helps prepare you for a job because you can adapt and solve various types of problems in various types of situations.

5. Training and participation helps you prepare for a career in coding:

It is not surprising to receive job offers from companies immediately after participating in a coding competition. Having experience in competitive programming shows employers you can work in a team, solve complicated problems in stressful situations, and manage time and deadlines with accuracy. It also shows you are disciplined, focused and fast, which are all indispensable skills.

Below is the list of some Competitions in Computer Science field

  1. IBM TGMC (The Great Mind Challenge – India)
  2. ACM ICPC (ACM International Collegiate Programming Contest)
  3. HackerRank
  4. Microsoft Imagine Cup
  5. IOI (International Olympiad in Informatics)
  6. Accenture Innovation Jockeys
  7. Topcoder
  8. CodeChef
  9. Google Summer of Code (GSoC)
  10. HackerEarth
  11. Google Code Jam
  12. SPOJ (Sphere Online Judge)
  13. Facebook Hacker Cup
  14. Google AI Challenge
  15. Al Zimmermann's Programming Contests, and more

Creating Chatbots using Python

A chatbot is a computer program that holds a conversation with a user through chat. Chatbots are hot right now because they:

  • create a unique, user-friendly customer experience,
  • give you the feeling that you're talking to a real person rather than a computer,
  • can schedule meetings, tell you the weather, provide customer support, and much more. That's why businesses absolutely love them and use them as part of their branding.

We conducted a webinar on how to develop chatbot using python.

About the Speaker

Srushith, from CodeOps Technology, is an experienced software engineer with a demonstrated history of working in the information technology and services industry. He loves Python, serverless technologies, Multisim, Matlab, C++, Amazon Web Services, and cloud technologies.

This two-hour webinar covers:

  1. Writing a fun, tiny piece of Python code to tweet automatically (and become popular)
  2. Building a simple personal assistant / conversational user interface (chatbot) using Python
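A minimal rule-based chatbot along these lines can be sketched in a few lines of Python. This is an illustrative toy with made-up rules, not the code from the webinar:

```python
import re

# Ordered (pattern, response) rules; the first matching pattern wins
RULES = [
    (r"\b(hi|hello|hey)\b", "Hello! How can I help you today?"),
    (r"\bweather\b", "I can't see outside, but I hope it's sunny!"),
    (r"\b(meeting|schedule)\b", "Sure, what time should I schedule it for?"),
    (r"\bbye\b", "Goodbye! Have a great day."),
]

def reply(message):
    """Return the canned response for the first rule that matches."""
    for pattern, response in RULES:
        if re.search(pattern, message.lower()):
            return response
    return "Sorry, I didn't understand that. Could you rephrase?"

print(reply("Hey there"))
print(reply("Can you schedule a meeting?"))
```

Real production chatbots replace the regex table with intent classification and a dialogue manager, but the request-match-respond loop stays the same.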

Introduction to Tensorflow

TensorFlow is an open-source software library for numerical computation using data flow graphs. In these graphs, nodes represent mathematical operations, and edges represent the multidimensional data arrays (tensors) that flow between them. Computations can be deployed to one or more CPUs or GPUs on desktop or mobile devices.
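The idea of a dataflow graph can be sketched in plain Python. This is a toy evaluator for illustration, not TensorFlow's actual API:

```python
# A toy dataflow graph: each node is an operation, each edge carries a
# value (a "tensor", here just a number). Evaluating the output node
# walks the graph from inputs to outputs, the way a TensorFlow session
# executes a graph.

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self):
        args = [n.eval() for n in self.inputs]   # evaluate incoming edges
        return self.op(*args)

def const(v):
    return Node(lambda: v)                       # leaf node: a constant

# Build the graph for (3 + 4) * 2, then run it
a, b, c = const(3), const(4), const(2)
add = Node(lambda x, y: x + y, a, b)
mul = Node(lambda x, y: x * y, add, c)

print(mul.eval())   # 14
```

Separating graph construction from execution is what lets TensorFlow optimize the graph and place different nodes on CPUs or GPUs before any computation runs.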

TensorFlow for Machine learning and Deep learning

  • Open-source software library created by Google.
  • A library for dataflow programming.
  • As we know, both machine learning and deep learning have a pool of powerful algorithms, which train a computer to learn complex problems automatically, make decisions and provide solutions.
  • It leverages various optimization techniques to make the calculation of mathematical expressions easier and faster. Because of this, it is becoming the heart of machine learning and deep learning.

Some of the key features of TensorFlow are:

  • Tensorflow is implemented in C++ and is available for C language and Python
  • Efficiently works with mathematical expressions involving multi-dimensional arrays
  • Good support of deep neural networks and machine learning concepts
  • GPU/CPU computing where the same code can be executed on both architectures
  • High scalability of computation across machines and huge data sets

Together, these features make TensorFlow the perfect framework for machine intelligence at a production scale.
If you're interested in the details, please refer to the links below.

  1. In this link, you will learn how to use simple yet powerful machine learning methods in TensorFlow and how to use some of its auxiliary libraries to debug, visualize, and tweak the models you create with it.
  2. In this link, you will learn about some new features of TensorFlow's 1.4 release.
We will keep you updated as we come across more good articles.

For complete details on TensorFlow, refer to the official documentation.

Introduction to Data Mining

Today, the demand for data analysts and data scientists is so high that the companies are struggling to fill their open positions.

Data scientist is one of the most in-demand job titles in the market, and current trends suggest it will remain so for years to come. Learning data mining techniques will help you prepare to become a data analyst or data scientist.

Data mining is the process of extracting useful information from unorganised raw data.

These techniques are used to
  • predict future trends
  • identify customers and develop marketing strategies to increase sales.
The ultimate goal of data mining is prediction – and predictive data mining is the most common type of data mining.
The biggest challenge is to analyse the data to extract meaningful information that can be used to solve a problem or for the growth of the business. There are powerful tools and techniques available to mine data and find insights from it.

There are various data mining techniques. Each technique helps us find different patterns.

Below is the list of the most common data mining techniques.


Classification analysis

  • Classification sorts data into predefined classes based on their attributes.
  • These predefined classes help segregate the data for further analysis and better results.
  • Classification analysis is widely used in machine learning algorithms.
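As a toy illustration of classification, here is a 1-nearest-neighbour classifier that assigns a new point to the class of its closest known example. The data points and class labels are invented for this sketch.

```python
# Minimal sketch of classification: a 1-nearest-neighbour classifier.
# A new point gets the class of its closest labelled example.
# Data and labels here are made up for illustration.

def classify(point, examples):
    # examples: list of ((x, y), label) pairs with known classes
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(examples, key=lambda e: dist(point, e[0]))[1]

training = [((1, 1), "low"), ((1, 2), "low"), ((8, 9), "high"), ((9, 8), "high")]
print(classify((2, 1), training))   # → low
print(classify((8, 8), training))   # → high
```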


Regression analysis

  • In this analysis, you find the relationships between multiple variables.
  • It helps identify how strongly one variable depends on the others.
  • It predicts how one variable will change when a related variable changes.
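The simplest form of this idea is a straight-line fit: estimate a slope and intercept from data, then use them to predict one variable from another. The data points below are invented for illustration.

```python
# Minimal sketch of regression analysis: fit y = a*x + b by ordinary
# least squares, then use the slope to predict how y changes with x.
# The data points are invented for illustration.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx  # slope, intercept

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]   # perfectly linear: y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)  # → 2.0 1.0
```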


Association rule learning

  • It is used to find relationships between variables in large data sets and to extract hidden patterns from the data.
  • Its major application is in the retail industry, for example market-basket analysis.
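A small market-basket sketch shows the two core measures behind association rules: support (how often items appear together) and confidence (how often the rule holds). The transactions are invented examples.

```python
# Minimal sketch of association rule mining on retail "baskets":
# compute support and confidence for the rule bread -> butter.
# Transactions below are invented examples.

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(items):
    # Fraction of baskets that contain all the given items.
    return sum(items <= b for b in baskets) / len(baskets)

# Rule: bread -> butter
sup = support({"bread", "butter"})
conf = sup / support({"bread"})
print(round(sup, 3), round(conf, 3))  # → 0.5 0.667
```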


Clustering analysis

  • Clustering places data into groups based on similar values.
  • The grouping is done so that objects within the same cluster are very similar to each other, but very dissimilar to objects in other clusters.
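A one-dimensional k-means sketch makes the grouping idea concrete: assign each point to the nearest centre, recompute the centres, and repeat. The points and starting centres are invented for this example.

```python
# Minimal sketch of clustering: one-dimensional k-means with k = 2.
# Points go to the nearest centre, then centres are recomputed.
# Data and starting centres are invented.

def kmeans_1d(points, c1, c2, steps=10):
    for _ in range(steps):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Move each centre to the mean of its group.
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

low, high = kmeans_1d([1, 2, 2, 9, 10, 11], c1=0.0, c2=5.0)
print(low, high)  # → [1, 2, 2] [9, 10, 11]
```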

Anomaly detection

  • As the name suggests, it is used to detect unusual patterns.
  • It is widely applied to detecting fraud in credit/debit card transactions and detecting intrusions in network traffic.
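One of the simplest anomaly detectors is a z-score rule: flag any value that lies far from the mean in units of standard deviation. The transaction amounts below are invented, with one obvious outlier.

```python
# Minimal sketch of anomaly detection: flag transaction amounts more
# than 2 standard deviations from the mean (a simple z-score rule).
# Amounts are invented for illustration.

import statistics

amounts = [12.0, 15.5, 11.2, 14.8, 13.1, 950.0, 12.9, 14.2]

mean = statistics.mean(amounts)
sd = statistics.pstdev(amounts)

anomalies = [a for a in amounts if abs(a - mean) / sd > 2]
print(anomalies)  # → [950.0]
```

Real fraud-detection systems use far richer features and models, but the principle of scoring how unusual an observation is stays the same.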

Decision tree

  • Decision trees are represented graphically as hierarchical structures, which gives them the very useful property of being easy to read and understand.
  • In fact, they are among the few models that are interpretable: you can understand exactly why the classifier makes a decision. They can also handle both numerical and categorical data. Read more
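The interpretability of a decision tree is easy to see in code: the tree is just a hierarchy of readable if/else rules. The loan scenario and thresholds below are invented for illustration.

```python
# Minimal sketch of a decision tree: a hand-written hierarchy of
# if/else rules. Reading the path of conditions explains exactly
# why each decision was made. Thresholds are invented.

def approve_loan(income, has_debt):
    if income >= 50_000:            # root node: income test
        return "reject" if has_debt else "approve"
    else:                           # lower-income branch
        return "approve" if not has_debt and income >= 30_000 else "reject"

print(approve_loan(60_000, has_debt=False))  # → approve
print(approve_loan(35_000, has_debt=True))   # → reject
```

In practice, algorithms such as CART learn these splits from data rather than having them written by hand, but the resulting model reads the same way.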

Neural network

A neural network is an attempt to build computer models that work like a brain: if computers were more like the brain, they could be good at some of the things humans are good at, such as pattern recognition. A neural network simulates a collection of neurons, as in the brain, and these simulated neurons take inputs and produce outputs through their connections. Read more
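A single simulated neuron can be sketched in a few lines: it weights its inputs, sums them, and fires if the total crosses a threshold. The weights below are hand-picked (not learned) so that the neuron computes logical AND.

```python
# Minimal sketch of a single simulated neuron (a perceptron): it takes
# inputs, weights them, and fires if the sum crosses a threshold.
# The weights are chosen by hand here to implement logical AND.

def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0  # step activation

# Fires only when both inputs are 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron([a, b], weights=[1, 1], bias=-1.5))
```

A full neural network connects many such neurons in layers and learns the weights from data instead of setting them by hand.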

R programming language for statistics and data science

R is among the most widely used of all data analytics tools and languages because it is free, open-source software that is easily extendable with a large ecosystem of packages.

For these reasons, R programming is an important skill to acquire in the field of data science.

The importance of R cannot be captured in a few words, so here we have tried to consolidate resources for learning R. Below you will find lists of books, MOOCs, tutorials, and much more.


Many books have been written on R. R is worthwhile in a variety of fields (data science, business analytics, social media, and so on), so here the books are categorised according to the different ways R is used.

1.Books for Beginners

Beginning R – Free Download eBook – pdf

This book examines the R language using simple statistical examples, showing how R operates in a user-friendly context. It is useful for learning simple summary statistics, hypothesis testing, creating graphs, regression, and much more. It covers formula notation, complex statistics, manipulating data, extracting components, and rudimentary programming.

Hands-On Programming with R

With this book, you’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions and use all of R’s programming tools.

RStudio Master Instructor Garrett Grolemund not only teaches you how to program, but also shows you how to get more from R than just visualizing and modelling data.

2.Installing  RStudio

Getting Started with RStudio

This concise book provides new and experienced users with an overview of RStudio, as well as hands-on instructions for analysing data, generating reports, and developing R software packages.

3.Work with Graphics and R

R Graphics Cookbook

This practical guide provides more than 150 recipes to help you generate high-quality graphs quickly, without requiring deep knowledge of R’s graphics systems.

Guidebook to R Graphics Using Microsoft Windows

This book takes readers step by step through the process of creating histograms, boxplots, strip charts, time series graphs, stem-and-leaf displays, scatterplot matrices, and map graphs.

4.Data visualization using R and Java script

Pro Data Visualization using R and JavaScript

In this book, you will learn how to gather data effectively, and also how to understand the philosophy and implementation of each type of chart, so that you can represent your results visually.

5.Data science algorithms implemention in R

Practical Data Science Cookbook

This book guides you from the basics (how to set up your numerical programming environment) to an advanced level of the data science pipeline (introducing you to the iterative process of completing a data science project). After working through this book, you will be able to implement data science algorithms in both R and Python.

6.Machine learning algorithms with R

Machine Learning for Hackers

If you are an experienced programmer interested in crunching data, this book will get you started with machine learning: a toolkit of algorithms that enables computers to train themselves to automate useful tasks.

Using the R programming language, you will learn how to analyze sample datasets and write simple code for machine learning algorithms. Machine Learning for Hackers is ideal for programmers from any background, including business, government, and academic research.

Machine Learning with R Cookbook

This book covers the basics of R by setting up a user-friendly programming environment and performing data ETL in R. You will then dive into important machine learning topics including data classification, regression, clustering, association rule mining, and dimension reduction.

7.Social media analysis with  R

Social Media Mining with R

This book provides detailed instructions on how to obtain, process, and analyze a variety of socially generated data, while providing a theoretical background to help you accurately interpret your findings.

8.Business analytics and R

Data Mining and Business Analytics with R

In this book, readers are provided with the guidance needed to model and interpret complicated data and to become adept at building powerful models for prediction and classification.

9.Web development in R

Web Application Development with R Using Shiny

After working through this book, you will be able to build useful and engaging web applications with only a few lines of code, no JavaScript required.

10.Analysing Big Data:  R and Hadoop

Big Data Analytics with R and Hadoop

This book is focused on the techniques of integrating R and Hadoop by various tools such as RHIPE and RHadoop.

11. 500+ links on R programming, statistics, and visualization, by Alket Cecaj on Algorithms and Data Fusion


Tutorials to learn R

1. R Tutorial – Code School

Code School teaches programming languages under the banner of “learn by doing”. It is an interactive course, and the content presentation is very lucid.

2.DataCamp: The Easy Way To Learn R & Data Science Online

DataCamp is one of the best portals for learning data science online. Their tutorials are presented in a simple, accessible manner.

3.R Tutorial at tutorialspoint

Tutorialspoint, a site widely known for sharing knowledge about various programming languages, has created R tutorials as well.



Here is a list of different MOOC programs where you can learn R:

  1. R Programming – Johns Hopkins University | Coursera
  2. Introduction to R for Data Science
  3. Free Introduction to R Programming Online Course| DataCamp
  4. R Programming A-Z™: R For Data Science With Real Exercises!
  5. Learn R Programming from Scratch – Udemy
  6. Introduction to R for Data Science| edX
  7. R Fundamentals |
  8. swirl: Learn R, in R.