Big Data

Big Data [bɪɡ deɪtə] (from English big = large, data = data) refers to data sets that are too large to evaluate with manual and classical methods of data processing. The data can come from a variety of sources such as sensors, cameras, or the monitoring of internet traffic. New technologies are needed to capture, distribute, store, search, analyze, and visualize big data. The term is vaguely defined, reached the German-speaking world around 2010, and is usually associated with data volumes on the order of terabytes, petabytes, and exabytes.


According to calculations from 2011, the global data volume doubles every two years. This development is driven mainly by the growing production of machine data, e.g. via logs of telecommunications connections (Call Detail Records, CDR) and web access (log files), automatic detections by RFID readers, cameras, microphones, and other sensors. Big data arises in the financial industry (financial transactions, stock exchange data), in the energy sector (consumption data), and in health care (prescriptions). Science also produces large amounts of data, for example in geology, genetics, climate research, and nuclear physics. The IT industry association Bitkom named big data a trend of 2012.
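The doubling claim above is simple exponential growth. A minimal sketch in Python, where the 2011 baseline figure is a hypothetical value chosen purely for illustration:

```python
# Project data volume under a "doubles every 2 years" assumption.
# The baseline of 1.8 zettabytes is an illustrative figure, not a sourced one.
def projected_volume(baseline_zb: float, years: float, doubling_period: float = 2.0) -> float:
    """Volume after `years`, growing by a factor of 2 every `doubling_period` years."""
    return baseline_zb * 2 ** (years / doubling_period)

if __name__ == "__main__":
    base = 1.8  # zettabytes at the baseline year (illustrative)
    for years in (2, 4, 10):
        print(f"after {years:2d} years: {projected_volume(base, years):.1f} ZB")
```

After ten years the volume grows by a factor of 2^5 = 32, which is why the article speaks of terabytes, petabytes, and exabytes in quick succession.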


For companies, the analysis of big data offers the opportunity to gain competitive advantages, generate savings, and create new business areas. In research, new insights can be gained by combining large amounts of data with statistical analysis. State agencies hope for better results in criminology and counterterrorism. Examples include:

  • Timely evaluation of web statistics and adaptation of online advertising
  • Better, faster market research
  • Discovery of irregularities in financial transactions (fraud detection)
  • Implementation and optimization of intelligent energy-consumption management (smart metering)
  • Identification of correlations in medical diagnostics
  • Real-time cross-selling and up-selling in e-commerce and brick-and-mortar retail
  • Construction of flexible billing systems in telecommunications
  • Secret service operations, such as the creation of movement profiles with programs like Boundless Informant
  • Data access and analysis of spatio-temporal raster data in science and industry, for example via the Open Geospatial Consortium standard Web Coverage Service
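One item in the list above, fraud detection, can be illustrated with a deliberately simple statistical sketch: flag transactions whose amounts deviate strongly from the historical mean (a z-score rule). This is a hedged toy example; production systems use far richer models, and all names below are hypothetical:

```python
# Toy fraud detection: flag transaction amounts more than `threshold`
# standard deviations away from the historical mean (a z-score rule).
# Illustration only; real fraud detection uses much richer features.
from statistics import mean, stdev

def flag_outliers(history: list[float], candidates: list[float],
                  threshold: float = 3.0) -> list[float]:
    mu = mean(history)
    sigma = stdev(history)
    return [x for x in candidates if abs(x - mu) > threshold * sigma]

if __name__ == "__main__":
    history = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 20.0]
    print(flag_outliers(history, [22.0, 500.0, 18.0]))  # only 500.0 is flagged
```

The appeal of big data here is that the "history" can span millions of transactions, which makes even such crude statistical baselines far more stable.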

Processing of Big Data

Traditional relational database systems, as well as statistical and visualization programs, are often unable to process such huge amounts of data. For big data, a new type of software is used that works in parallel on up to hundreds or thousands of processors or servers. The following challenges arise:

  • Processing of many data records
  • Processing of many columns within a record
  • Fast import of large amounts of data
  • Immediate querying of imported data (real-time processing)
  • Short response times even for complex queries
  • Ability to process many simultaneous queries (concurrent queries)

The development of software for processing big data is still at an early stage. Prominent is the MapReduce approach, which is used in open-source software (Apache Hadoop and MongoDB) as well as in some commercial products (Aster Data, Greenplum, MIOedge, and others).
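The MapReduce approach mentioned above can be sketched in a few lines of single-process Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. Frameworks such as Apache Hadoop run these same phases distributed across many machines; this local word-count sketch only illustrates the pattern:

```python
# Minimal single-process sketch of the MapReduce pattern: word count.
# Real frameworks (e.g. Apache Hadoop) distribute map and reduce tasks
# across many machines; here all three phases run locally.
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values; for word count, sum them."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    docs = ["big data big systems", "data processing"]
    counts = reduce_phase(shuffle(map_phase(docs)))
    print(counts)  # {'big': 2, 'data': 2, 'systems': 1, 'processing': 1}
```

Because each map call and each reduce call is independent, the framework can scale the computation simply by assigning documents and keys to different machines.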


Criticism of "big data" mainly points out that data collection and analysis are often driven by technical considerations: the technically simplest way to collect and process the data is chosen, which limits the informational value of the resulting evaluation. Statistical principles such as representative sampling are often neglected. The social researcher Danah Boyd criticized:

  • Larger amounts of data are not necessarily better-quality data
  • Not all data are equally valuable
  • "What" and "why" are two different questions
  • Caution is warranted in interpretation
  • Just because data are available does not make their use ethical

For example, a researcher determined that people maintain no more than 150 friendships, which was then introduced as a technical limit in social networks, on the false assumption that acquaintances labeled "friends" reflect real friendships. Certainly not everyone would name all of their Facebook friends as friends in an interview; the concept of a "friend" on Facebook merely indicates a willingness to communicate.

Another critical approach addresses the question of whether big data means the end of all theory. Chris Anderson, editor-in-chief of WIRED, described in 2008 the credibility problem facing every scientific hypothesis and model once living and non-living systems can be analyzed in real time: correlations become more important than causal explanations, which often can only be confirmed or falsified later.

Schleswig-Holstein's data protection commissioner has warned: "Big data opens up possibilities of informational abuse of power through manipulation, discrimination, and informational economic exploitation, combined with the violation of basic human rights."