Exploratory data analysis

Exploratory data analysis (EDA), or exploratory statistics, is a branch of statistics. It examines and evaluates data about whose interrelationships little is known. Many EDA techniques are used in data mining. They are also often taught in statistics courses as an introduction to statistical thinking.

The term was introduced by John W. Tukey in the 1970s. Tukey argued that statistics placed too much emphasis on evaluating and testing given hypotheses. He suggested also using data to generate possible hypotheses, which are then tested. However, confusing the two types of analysis and applying both to the same data set can lead to bias.

Objectives

Goals of exploratory statistics are:

  • To form assumptions (hypotheses) about the causes of the observed data
  • To assess the assumptions on which statistical inference can be based
  • To support the selection of appropriate statistical tools and techniques
  • To provide a basis for further data collection through surveys or design of experiments

Method

Basic graphical methods used in exploratory statistics are:

  • Histogram
  • Q-Q plot
  • Scatter plot
  • Mosaic plot
  • Multi-vari chart
  • Run chart
  • Pareto chart
  • Stem-and-leaf plot
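
As an illustration of one of the graphical methods listed above, the following is a minimal sketch of a stem-and-leaf display. The function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions made for this example, not part of any particular package, and the sketch handles only non-empty collections of non-negative integers.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a simple stem-and-leaf display.

    Illustrative sketch: assumes a non-empty collection of non-negative
    integers. Each value is divided by leaf_unit; the last digit of the
    result is the leaf and the remaining leading digits are the stem.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = v // leaf_unit
        stems[scaled // 10].append(scaled % 10)

    lo, hi = min(stems), max(stems)
    width = len(str(hi))
    lines = []
    # Empty stems are printed too, so that gaps in the data stay visible.
    for stem in range(lo, hi + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaves}")
    return lines


if __name__ == "__main__":
    for line in stem_and_leaf([12, 15, 21, 34, 37, 38]):
        print(line)
```

Unlike a histogram, the display keeps the individual digits, so the shape of the distribution and the raw values can be read from the same picture.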

Basic quantitative methods are:

  • Median polish
  • Letter values
  • Resistant line
  • Resistant smooth
  • Rootogram
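
To make the first of these methods concrete, the following is a minimal sketch of median polish for a two-way table. It alternately sweeps row medians and column medians out of the residuals, so that each cell is decomposed as overall + row effect + column effect + residual. The function name `median_polish` and the fixed iteration count `n_iter` are assumptions made for this example; practical implementations usually stop once the residuals no longer change noticeably.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Decompose a two-way table by repeated median sweeps.

    Illustrative sketch: `table` is a non-empty list of equal-length rows
    of numbers. Returns (overall, row_effects, col_effects, residuals).
    """
    rows, cols = len(table), len(table[0])
    resid = [[float(x) for x in row] for row in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols

    for _ in range(n_iter):
        # Row sweep: move each row's median into the row effect.
        for i in range(rows):
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row_eff[i] += m
        # Re-centre the column effects, moving their median into the overall term.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m

        # Column sweep: move each column's median into the column effect.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            for i in range(rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Re-centre the row effects in the same way.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m

    return overall, row_eff, col_eff, resid


if __name__ == "__main__":
    overall, row_eff, col_eff, resid = median_polish([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(overall, row_eff, col_eff)
```

Because medians are used instead of means, a few unusual cells have little influence on the fitted effects and show up as large residuals instead.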

Specific methods

Software

  • Geodata - free, user-friendly graphical interface for applying various methods of exploratory data analysis.
  • Ggobi - free interactive multivariate visualization software with a link to R.
  • Live Graph - open-source framework for data visualization and exploratory data analysis (includes a real-time graph plotter).
  • MANET - free interactive EDA software for the Mac.
  • Mondrian - free interactive software for EDA.
  • Orange - free component-based software for interactive EDA.
  • PS-Explore
  • VISALIX - free interactive web application for data visualization and exploratory data analysis.
  • Open analyzer - data analysis software with function blocks for exploratory data analysis (free for college students; the free version is limited to 5000 records).
  • DataLab - full version available under a free research license; the free evaluation version is limited in the size of the data matrix.
  • Fathom 2 (dynamic stochastics and data analysis software) - the free evaluation version runs for one year but has printing, saving, and export restrictions.