The IT Law Wiki
[Q]uantifying the amount of information that exists in the world is hard. What is clear is that there is an awful lot of it, and it is growing at a terrific rate (a compound annual 60%) that is speeding up all the time. The flood of data from sensors, computers, research labs, cameras, phones and the like surpassed the capacity of storage technologies in 2007.[1]
90% of the data in the world today has been created in the last two years alone. Some estimate that data production will be 44 times greater in 2020 than it was in 2009. Others estimate an additional 2.5 quintillion bytes of data is being generated every day.[2]
Estimates show that the amount of data in the world doubles every two years. Should this trend continue, by 2020 there would be 500 times the amount of data as existed in 2011.[3]
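
The growth figures quoted above come from different sources and use different baselines, so it can help to put them on a common footing. The following is a minimal sketch, assuming simple constant compound growth (an assumption made here for illustration, not something the cited sources state); the function names are hypothetical and the year spans are taken from the dates quoted above.

```python
def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplier after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


def growth_factor_from_doubling(doubling_years: float, years: float) -> float:
    """Total multiplier when a quantity doubles every `doubling_years` years."""
    return 2.0 ** (years / doubling_years)


def implied_annual_rate(total_factor: float, years: float) -> float:
    """Constant annual rate that produces `total_factor` over `years` years."""
    return total_factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # 60% compound annual growth sustained over 2011-2020 (9 years): about 68.7x
    print(f"60%/yr for 9 years: {growth_factor(0.60, 9):.1f}x")

    # Doubling every two years over 2011-2020 (9 years): about 22.6x
    print(f"doubling every 2 years for 9 years: {growth_factor_from_doubling(2, 9):.1f}x")

    # 44x between 2009 and 2020 (11 years) implies roughly 41% per year
    print(f"44x over 11 years: {implied_annual_rate(44, 11):.1%} per year")

    # 500x between 2011 and 2020 (9 years) implies nearly 100% per year,
    # i.e. close to doubling every year rather than every two years
    print(f"500x over 9 years: {implied_annual_rate(500, 9):.1%} per year")
```

Under this simplified model the estimates do not agree with one another: doubling every two years gives roughly a 23-fold increase over nine years, whereas a 500-fold increase over the same period would require data to roughly double every year. The figures are best read as order-of-magnitude illustrations rather than as a single consistent forecast.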


Big data

  • consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis.[4]
  • is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight, decision making, and process optimization.[5]
  • refers to the rising flood of digital data from many sources, including the Web, biological and industrial sensors, video, e-mail and social network communications.[6]
  • is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.[7]
  • is a collection of datasets so large and complex that it is difficult to use on-hand database management tools, or traditional data processing applications, for their processing that includes capturing, storage, search, sharing, transfer, analysis, and visualization.[8]
  • [is] data which "exceed(s) the capacity or capability of current or conventional methods and systems." In other words, the notion of "big" is relative to the current standard of computation.[9]
  • [is] a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.[10]
  • [is] extremely large data sets that may be analysed computationally to reveal patterns, trends and associations, especially relating to human behaviour and interactions.[11]
  • [is] a confluence of factors, including the nearly ubiquitous collection of consumer data from a variety of sources, the plummeting cost of data storage, and powerful new capabilities to analyze data to draw connections and make inferences and predictions.[12]


Big data is big in two different senses. It is big in the quantity and variety of data that are available to be processed. And, it is big in the scale of analysis (termed "analytics") that can be applied to those data, ultimately to make inferences and draw conclusions. By data mining and other kinds of analytics, non‐obvious and sometimes private information can be derived from data that, at the time of their collection, seemed to raise no, or only manageable, privacy issues. Such new information, used appropriately, may often bring benefits to individuals and society. Even in principle, however, one can never know what information may later be extracted from any particular collection of big data, both because that information may result only from the combination of seemingly unrelated data sets, and because the algorithm for revealing the new information may not even have been invented at the time of collection.

What really matters about big data is what it does. Aside from how we define big data as a technological phenomenon, the wide variety of potential uses for big data analytics raises crucial questions about whether our legal, ethical, and social norms are sufficient to protect privacy and other values in a big data world. Unprecedented computational power and sophistication make possible unexpected discoveries, innovations, and advancements in our quality of life. But these capabilities, most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it.[13]

The same data and analytics that provide benefits to individuals and society if used appropriately can also create potential harms — threats to individual privacy according to privacy norms both widely shared and personal. For example, large‐scale analysis of research on disease, together with health data from electronic medical records and genomic information, might lead to better and timelier treatment for individuals but also to inappropriate disqualification for insurance or jobs. GPS tracking of individuals might lead to better community‐based public transportation facilities, but also to inappropriate use of the whereabouts of individuals.

Part of the challenge, too, lies in understanding the many different contexts in which big data comes into play. Big data may be viewed as property, as a public resource, or as an expression of individual identity. Big data applications may be the driver of America's economic future or a threat to cherished liberties. Big data may be all of these things.[14]

The Three Vs

A common framework for characterizing big data relies on the "three Vs," the volume, velocity, and variety of data, each of which is growing at a rapid rate as technological advances permit the analysis and use of this data in ways that were not possible previously.

  • Volume is the sheer quantity of data that can be collected and analyzed.
  • Velocity is the speed with which companies can accumulate, analyze, and use new data. Technological improvements allow companies to harness the predictive power of data more quickly than ever before, sometimes instantaneously.
  • Variety means the breadth of data that companies can analyze effectively. Companies can now combine very different, once unlinked, kinds of data — either on their own or through data brokers or analytics firms — to infer consumer preferences and predict consumer behavior, for example.

Together, the three Vs allow for more robust research and correlation. Previously, finding a representative data sample sufficient to produce statistically significant results could be very difficult and expensive. Today, the present scope and scale of data collection enables cost-effective, substantial research of even obscure or mundane topics (e.g., the amount of foot traffic in a park at different times of day).

Sources of big data

The sources and formats of data continue to grow in variety and complexity. A partial list of sources includes the public web; social media; mobile applications; federal, state and local records and databases; commercial databases that aggregate individual data from a spectrum of commercial transactions and public records; geospatial data; surveys; and traditional offline documents scanned by optical character recognition into electronic form. The advent of more Internet-enabled devices and sensors expands the capacity to collect data from physical entities, including sensors and radio-frequency identification (RFID) chips. Personal location data can come from GPS chips, cell-tower triangulation of mobile devices, mapping of wireless networks, and in-person payments.


  1. "Data, Data Everywhere: A Special Report on Managing Information," The Economist (Feb. 25, 2010) (full-text).
  2. IBM, "Big Data at the Speed of Business" (full-text); CSC, "Big Data Universe Beginning to Explode" (full-text).
  3. NIST Big Data Interoperability Framework, Vol. 1, at 13.
  4. NIST Big Data Interoperability Framework, Vol. 1, at 5.
  5. Gartner, "The Importance of 'Big Data': A Definition" (full-text).
  6. Steve Lohr, "New U.S. Research Will Aim at Flood of Digital Data," N.Y. Times (Mar. 29, 2012) (full-text).
  7. James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh & Angela Hung Byers, McKinsey Global Institute, "Big Data: The Next Frontier for Innovation, Competition and Productivity," Executive Summary 1 (May 2011) (full-text).
  8. Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives, at 7.
  9. National Institute of Standards and Technology.
  10. Jonathan Stuart Ward & Adam Barker, "Undefined By Data: A Survey of Big Data Definitions" (Sept. 20, 2013) (full-text).
  11. Make or Break: The UK's Digital Future, at 21 n.27.
  12. Big Data: A Tool for Inclusion or Exclusion?: Understanding the Issues, at 1.
  13. Big Data: Seizing Opportunities, Preserving Values, at 3.
  14. Id. (citation omitted).


External resources

  • "Data, Data Everywhere: A Special Report on Managing Information," The Economist (Feb. 25, 2010) (full-text).
  • "Dealing with Data," Science (special issue) (Feb. 11, 2011) (full-text).
  • Robert Kirkpatrick, "Beyond Targeted Ads: Big Data for a Better World" (2012) (full-text).
  • Jules Polonetsky & Omer Tene, "Privacy and Big Data: Making Ends Meet," 66 Stan. L. Rev. Online 25 (2013) (full-text).
  • Edith Ramirez, "The Privacy Challenges of Big Data: A View from the Lifeguard's Chair," Keynote Address by FTC Chairwoman Edith Ramirez (Technology Policy Institute Aspen Forum) (Aug. 19, 2013) (full-text).