Digital Universe study finds that 0.5% of global data is analysed, and just half of data requiring security measures is protected
The global data supply reached 2.8 zettabytes (ZB) in 2012 – or 2.8 trillion GB – but just 0.5% of this is used for analysis, according to the Digital Universe Study.
Volumes of data are projected to reach 40ZB by 2020, or roughly 5,250 GB per person, with emerging economies accounting for an increasingly large proportion of the world’s total.
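The unit conversions behind these figures can be checked with a short sketch. It assumes decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and a 2020 world population of about 7.6 billion. The population figure is an illustrative assumption, not a number from the study, and the function names are hypothetical.

```python
# Decimal (SI) units: 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes,
# so one zettabyte is 10**12 (one trillion) gigabytes.
GB_PER_ZB = 10**12

# Assumption for illustration only; the study's own population
# estimate is not given in the article.
ASSUMED_POPULATION_2020 = 7.6e9


def zb_to_gb(zettabytes):
    """Convert zettabytes to gigabytes."""
    return zettabytes * GB_PER_ZB


def gb_per_person(total_zb, population):
    """Average gigabytes per person for a given global data volume."""
    return zb_to_gb(total_zb) / population


if __name__ == "__main__":
    print(f"2012 supply: {zb_to_gb(2.8):,.0f} GB")
    per_head = gb_per_person(40, ASSUMED_POPULATION_2020)
    print(f"2020 projection: about {per_head:,.0f} GB per person")
```

Under that population assumption the 2020 projection comes out in the low thousands of gigabytes per person, several terabytes each.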
The report also contained a warning on data security, with levels of protection shown to be lagging behind the expansion in volume. In 2012 less than a fifth of the world’s data was protected, despite 35% requiring such measures.
The study, carried out by the International Data Corporation (IDC) and sponsored by big data specialists EMC, is the sixth annual audit of global data inventories, and includes all such material gathered, created and replicated to date.
IDC estimated that almost a quarter of data currently held could yield useful insights if properly tagged and analysed, but this potential is still a long way from being achieved.
Just 3% of all data is currently tagged and ready for manipulation, and only one sixth of this (0.5% of the total) is used for analysis. The gulf between availability and exploitation represents a significant opportunity for businesses worldwide, with global revenues from the collection, storage, and analysis of big data set to reach $16.9bn in 2015, a fivefold increase on 2010.
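As a rough check on these proportions, the sketch below uses exact fractions to relate the tagged and analysed shares and to turn the analysed share into an absolute volume. It takes the 2.8ZB total for 2012 and assumes 1 ZB = 1,000 exabytes (EB); the variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the article, held as exact fractions to avoid
# floating-point rounding.
TOTAL_ZB_2012 = Fraction("2.8")     # global data supply in 2012
TAGGED_SHARE = Fraction("0.03")     # 3% tagged and ready for manipulation
ANALYSED_SHARE = Fraction("0.005")  # 0.5% actually analysed

EB_PER_ZB = 1000  # decimal units: 1 ZB = 1,000 EB

# Share of the tagged data that is analysed.
analysed_of_tagged = ANALYSED_SHARE / TAGGED_SHARE

# Absolute volume of analysed data, in exabytes.
analysed_eb = TOTAL_ZB_2012 * ANALYSED_SHARE * EB_PER_ZB

print(analysed_of_tagged)   # 1/6
print(float(analysed_eb))   # 14.0
```

On these figures, the 0.5% that is analysed amounts to about 14 exabytes of the 2012 total.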
“As the volume and complexity of data barraging businesses from all angles increases, IT organizations have a choice: they can either succumb to information-overload paralysis, or they can take steps to harness the tremendous potential teeming within all of those data streams”, said Jeremy Burton, Executive Vice President, Product Operations and Marketing for EMC.
The composition of data with analysis potential is also projected to change, with consumer images and voice calls predicted to disappear almost completely by 2020 in proportional terms, while the share generated from medical uses is set to increase fourfold over the same period.
In 2012 just over a third of all data required some form of protection, but with companies and the public sector generating and holding increasing amounts of personal information, this proportion is expected to exceed 40% by 2020.
Currently the three main reasons for data to require protection are privacy (simple contact details such as an email address), custody (information whose leak could lead to identity theft) and confidentiality (private documents, contact lists). High security data, such as bank details and medical records, is nonetheless expected to overtake confidential information in volume over the next eight years.
The report also highlights the concern that emerging markets, which will account for almost two thirds of the world’s data by the end of this decade, typically have lower rates of data protection than the global average.
In 2012, 53% of the world’s data classified as requiring protection of some kind was found to have such measures in place, compared with 44% in India. Outside of Western Europe (59%), the US (58%), China (48%), and India, the rate was 49%.
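The two headline security figures fit together: if 35% of all data requires protection and 53% of that subset is actually protected, the protected share of all data is the product of the two. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def protected_share_of_all_data(share_requiring, share_of_those_protected):
    """Fraction of ALL data that is protected, given the fraction that
    requires protection and the fraction of that subset which has it."""
    return share_requiring * share_of_those_protected


# Figures quoted in the article for 2012.
REQUIRING_PROTECTION = 0.35   # share of all data needing protection
PROTECTED_OF_THOSE = 0.53     # share of that data actually protected

overall = protected_share_of_all_data(REQUIRING_PROTECTION, PROTECTED_OF_THOSE)
print(f"Protected share of all data: {overall:.2%}")
```

The result is roughly 18.5%, which is consistent with the statement that less than a fifth of the world’s data was protected in 2012.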