Dr Mark van Rijmenam
The rise of cloud storage has helped companies collect and manage massive amounts of data. Data comes from corporate systems, Internet of Things objects and unstructured sources like online forums. New analytics tools like Hadoop help companies make sense of that data.
Yet merely having data and analysis tools doesn’t mean the results of an analysis are meaningful. Getting real insights from data depends on the data being correct. With the many sources that feed into data lakes and the many transformations big data goes through to be processed, there are many possible ways for errors to be accidentally or deliberately introduced. It’s no surprise, then, that one survey found only a third of executives trust their analytics programs.
This lack of trust in the data not only limits its use within the enterprise that collected it but also limits the potential for companies to monetize their data by sharing it with others.
The solution to these problems may be found in an unexpected source: the blockchain technology that supports cryptocurrencies like Bitcoin.
Blockchain and Data Quality
As a technology, blockchain became famous along with Bitcoin, and most companies probably think its relevance, if any, is as another payments technology.
A better way to view blockchain, though, is as a general-purpose technology, with cryptocurrency as the first business domain where it was successfully applied. The features of blockchain apply to any industry, and they specifically address the concerns about data correctness and security that limit the use and sharing of big data.
After all, blockchain is essentially a distributed database forming a ledger. Changes to the ledger must be agreed upon by the participants in the blockchain, and that agreement is reached through a consensus mechanism such as Proof of Work or Proof of Stake. In addition, the hash algorithm and timestamps ensure that data on a blockchain is immutable, verifiable and traceable. However, it is important to note that low-quality data isn’t magically transformed into high-quality data. Garbage in still means garbage out. As such, it is vital that data is verified when it is acquired to ensure that bad data is not recorded on the blockchain.
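To make the role of hashing and timestamps concrete, here is a minimal Python sketch of a hash-linked ledger. It is an illustration only: the function names (`block_hash`, `append_block`, `verify_chain`) and the sensor-style records are assumptions made for this example, and the sketch leaves out the distributed consensus step entirely. It shows just one property: because each block's hash covers its data, its timestamp and the previous block's hash, a later change to any recorded value can be detected.

```python
import hashlib
import json
import time

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, timestamp, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash},
        sort_keys=True,  # stable key order so the same content always hashes the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data, timestamp=None):
    """Append a new block that commits to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    timestamp = time.time() if timestamp is None else timestamp
    block = {
        "index": index,
        "timestamp": timestamp,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, timestamp, data, prev_hash),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every stored hash and every link is still consistent."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(
            block["index"], block["timestamp"], block["data"], block["prev_hash"]
        )
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"sensor": "A1", "temp_c": 4.2})
append_block(ledger, {"sensor": "A1", "temp_c": 4.4})
print(verify_chain(ledger))          # the untouched chain verifies

ledger[0]["data"]["temp_c"] = 9.9    # quietly edit an old record
print(verify_chain(ledger))          # the edit no longer matches the stored hash
```

In a real blockchain the copies of this ledger are held by many parties, so an attacker would also have to convince the network to accept the rewritten history; the sketch only covers the tamper-evidence part.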
Therefore, organisations that want to adopt blockchain will need to ensure that their big data is correct and of the highest standard before it is recorded, since once on a blockchain it can no longer be altered. If done correctly, blockchain could be a catalyst for better data, resulting in better insights.
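One way to picture "verify before you record" is a validation gate that sits in front of the ledger. The Python sketch below is hypothetical: the `Reading` record, its fields and the plausibility limits are assumptions chosen for illustration, and a plain list stands in for the append-only ledger. The point is only that a record is checked at acquisition time and rejected records never reach the ledger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A hypothetical sensor record; the fields are illustrative assumptions."""
    sensor_id: str
    temp_c: float
    recorded_at: float  # Unix timestamp in seconds


def validation_errors(reading, now):
    """Return a list of problems found in the reading (empty if it looks valid)."""
    errors = []
    if not reading.sensor_id.strip():
        errors.append("sensor_id is empty")
    if not -50.0 <= reading.temp_c <= 60.0:  # assumed plausible range
        errors.append("temp_c outside plausible range")
    if reading.recorded_at > now:
        errors.append("recorded_at is in the future")
    return errors


def record_if_valid(ledger, reading, now):
    """Append the reading only if it passes every check; return the errors found."""
    errors = validation_errors(reading, now)
    if not errors:
        ledger.append(reading)
    return errors


ledger = []  # stand-in for an append-only ledger
print(record_if_valid(ledger, Reading("A1", 4.2, 100.0), now=200.0))
print(record_if_valid(ledger, Reading("A1", 400.0, 100.0), now=200.0))
print(len(ledger))
```

Which checks are appropriate depends entirely on the data source; the design choice worth keeping is that rejection happens before the write, because there is no correcting a record afterwards.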
Data Provenance and Analytics on a Blockchain
The first place to look at applying blockchain to big data is finance. Every Bitcoin transaction is held within Bitcoin’s blockchain. While Bitcoin offers a level of anonymity, the transaction data is not private; Bitcoin is pseudonymous. With sufficient data, it is possible to find patterns in Bitcoin transactions and eventually link those to people. Several companies are working on such solutions, including Chainalysis, which offers analytics to prevent, detect and investigate cryptocurrency money laundering, fraud and compliance violations.
But viewing blockchain as simply a database means it applies to any industry, not just finance. By storing granular transaction data, any industry can understand its interactions with its suppliers and customers. This provenance of transaction data offers a lot of possibilities for organisations. This is particularly true when sensors are added to products to collect data about usage throughout their lifecycle. Walmart is already using blockchain to increase food safety by increasing the traceability of the product from its origin to the consumer. With Walmart producing 40 petabytes of data daily, managing big data is integral to the store’s continued success, and having immutable, verifiable and traceable product data on a blockchain could give Walmart’s customers insight into the origin of their food and confidence that the data is reliable.
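As a rough illustration of what traceability from origin to consumer means in data terms, the Python sketch below reconstructs a product's chain of custody from a set of hand-over records. Everything here is an assumption made for the example: the event fields (`product_id`, `holder_from`, `holder_to`, `timestamp`), the function names and the sample data do not describe Walmart's system or any particular blockchain platform.

```python
from collections import defaultdict


def build_history(events):
    """Group custody events by product and sort each group by time."""
    history = defaultdict(list)
    for event in events:
        history[event["product_id"]].append(event)
    for product_events in history.values():
        product_events.sort(key=lambda e: e["timestamp"])
    return history


def trace(history, product_id):
    """Return the holders of a product in order, from origin to current holder."""
    product_events = history.get(product_id, [])
    if not product_events:
        return []
    path = [product_events[0]["holder_from"]]
    for event in product_events:
        # Each hand-over must start where the previous one ended.
        if event["holder_from"] != path[-1]:
            raise ValueError(f"custody gap before {event['holder_to']}")
        path.append(event["holder_to"])
    return path


events = [
    {"product_id": "lot-42", "holder_from": "processor", "holder_to": "store", "timestamp": 2},
    {"product_id": "lot-42", "holder_from": "farm", "holder_to": "processor", "timestamp": 1},
]
print(trace(build_history(events), "lot-42"))
```

If the events are read from an immutable ledger, a trace like this can be trusted to reflect what was recorded at the time, and a gap in custody shows up as an error instead of being silently papered over.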
Blockchains have a lot of potential especially when private data is involved. For example, doctor-patient interactions are transactions that generate lots of records that you want to keep private. Today’s electronic health records make sharing data difficult, are known to contain many errors and lack protection of patient data. Healthcare data provenance could offer researchers valuable insights into the effects of medicines and different treatments on patients. However, you want to be sure that this data is kept private and secure. In the UK, Google DeepMind and the NHS are partnering to use blockchain to encrypt and safely store patient data. One use of this data will be to create a verifiable data audit, ensuring that data used in research projects has the appropriate permissions for use in the research.
Big Data Tools for Blockchain
Companies that want to make use of big data and blockchain will find new tools developed to support this, such as BigchainDB, built on top of MongoDB. Because such tools add the features needed for enterprise development, including scalability, queryability and audit trails, it will become easier for organisations to build blockchain-based applications that meet corporate standards. The spread of these tools, along with the continued push for digital transformation that requires companies to make better use of digitally generated and collected data, will drive companies to adopt blockchain to out-compete others in their industries.
Being able to analyse data provenance, and to be assured that the data is reliable because it is immutable, verifiable and traceable, is a paradigm shift. Blockchain could be a catalyst for data quality, and as such, the convergence of big data and blockchain offers tremendous opportunities for organisations and consumers.