Across the globe, businesses must navigate a range of regulations, compliance laws and privacy restrictions. Although these rules are designed to protect people and societies across different cultures, global businesses spend massive amounts of time and money overcoming them in order to do business across borders.
Yet to gain competitive insight, businesses must access sensitive data that is sometimes located in different countries. To reach this critical data, they also need to work through each country’s regulations. The data must be protected, traced and tracked to keep it safe.
What is data lineage and how does this impact businesses?
Data lineage involves the curation of the original data: the original truth. No matter how many times this data is moved or copied, its origin and all of its locations and forms must remain traceable. When this requirement is multiplied by the petabytes of data held globally within an enterprise, the challenge can seem daunting for most. Successful enterprises can meet these requirements, but doing so can be costly.
Businesses in regulated industries must prove not only that they can store certain types of sensitive data effectively, but also that, when permitted, this data no longer exists anywhere. If data has been deleted but still shows up in any form or location, it can still be recalled, which can lead to litigation against the business. Whether or not the enterprise is aware of the rogue data, it is still liable. Data Life Cycle Management covers the sequence of creation, use, retention and eventual erasure of data. In some industries, this life cycle can span decades.
The cost and burden of managing data, and of tracking its movements, replicas and locations, can place a massive strain on an organisation’s ability to conduct its business. The issue lies in the “use” phase of the data’s life cycle: how do businesses make data useful, accessible and analysable across a vast web of multinational regulations without losing track of it? The answer is perhaps simpler than expected: leave the data in place, where it is safe and controllable, and leave the original as the original.
Analysing data in place sounds like an easy solution to this industry problem, but this is not the first time the approach has been tried. Network latency comes into play: it is not enough simply to access the original data, from anywhere, where it resides. The race between network latency and data size has been a back-and-forth struggle throughout the history of computer networking. Even as the world gets digitally smaller, network latency can make data seem too far away to be analysed efficiently with the high-performance analytical database engines already on the market.
What is the solution to network latency?
There are three primary types of network latency: latency caused by distance, latency caused by congestion, and latency caused by the network design itself, whether intentional or accidental. Combinations of these types in the same network make the issue much worse. Any of the three can make analytic access to data too slow to be useful, reducing the usable throughput needed to gain insight from critical data to an intolerable level.
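To see why latency alone can starve a link, consider a simplified, window-limited model of a single data stream: at most one window of data can be in flight per round trip, so throughput cannot exceed window size divided by round-trip time. The sketch below is illustrative only. The 64 KiB window, the round-trip times and the 1 Gbit/s link are assumed figures, not measurements from any particular network, and real transfers are also affected by congestion and loss.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput in Mbit/s.

    Simplified model: one window of data per round trip, no loss.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000.0)
    return bits_per_second / 1_000_000


def link_utilisation(window_bytes: int, rtt_ms: float, link_mbps: float) -> float:
    """Fraction of the link a single stream can fill under this model."""
    return min(1.0, max_throughput_mbps(window_bytes, rtt_ms) / link_mbps)


if __name__ == "__main__":
    WINDOW_BYTES = 64 * 1024   # assumed window size
    LINK_MBPS = 1000.0         # assumed 1 Gbit/s WAN link
    for rtt_ms in (1, 20, 150):  # assumed local, regional, intercontinental
        usable = max_throughput_mbps(WINDOW_BYTES, rtt_ms)
        share = link_utilisation(WINDOW_BYTES, rtt_ms, LINK_MBPS)
        print(f"RTT {rtt_ms:>3} ms: at most {usable:8.2f} Mbit/s "
              f"({share:.1%} of the link)")
```

Under these assumptions, the same stream that can use about half of the link at 1 ms round-trip time is held to a few megabits per second at 150 ms, however much capacity the link has.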
The instinctive solution is to place the data near the processing engines, where it is needed. This means copying data to local storage so that it can be accessed with local performance. However, this creates a whole new set of issues. Keeping track of where all these copies are located, and removing the data from every location once its use is complete, can be difficult and costly. This includes tracking down locally backed-up copies, off-site media copies and local disaster recovery replicas in those remote locations.
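The bookkeeping this implies can be sketched as a small registry that records every copy of a dataset and reports which ones still have to be erased. This is a minimal illustration under assumed names: `CopyRegistry`, its methods and the dataset and location labels are all hypothetical, not part of any product or standard.

```python
from dataclasses import dataclass, field


@dataclass
class CopyRegistry:
    """Hypothetical record of where copies of each dataset live."""
    copies: dict[str, set[str]] = field(default_factory=dict)

    def record_copy(self, dataset: str, location: str) -> None:
        """Note that a copy of the dataset now exists at this location."""
        self.copies.setdefault(dataset, set()).add(location)

    def record_deletion(self, dataset: str, location: str) -> None:
        """Note that the copy at this location has been erased."""
        self.copies.get(dataset, set()).discard(location)

    def outstanding(self, dataset: str) -> set[str]:
        """Locations that still hold a copy of the dataset."""
        return set(self.copies.get(dataset, set()))

    def is_fully_erased(self, dataset: str) -> bool:
        """True only when no recorded copy remains anywhere."""
        return not self.outstanding(dataset)


if __name__ == "__main__":
    registry = CopyRegistry()
    # Illustrative labels for the kinds of copies that accumulate.
    for place in ("staging-eu", "local-backup-eu", "offsite-tape", "dr-replica"):
        registry.record_copy("claims-2021", place)
    registry.record_deletion("claims-2021", "staging-eu")
    print(sorted(registry.outstanding("claims-2021")))
    print(registry.is_fully_erased("claims-2021"))
```

A registry like this only knows about the copies it is told about, which is the weakness of the copy-and-stage approach: one unrecorded backup is enough to leave rogue data behind.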
The simplest and most practical solution is to leave the original data in place. This is possible today with a combination of technologies already on the market. With the right combination, businesses could optimise latency in Wide Area Networks (WANs) and potentially increase throughput by more than seven times compared with the same WAN by itself.
Businesses could use as much as 95 percent of the WAN connection to analyse data where it is stored, rather than copying and staging the data closer to their analytic engines. That is global analytics with data in place and at scale. It frees IT teams to solve bigger issues instead of keeping track of where sensitive data is being copied, and lets them manage and control data where they need to. It also has the potential to minimise the regulatory requirements that apply, as some regulations permit the transient, in-flight use of data in other countries while restricting its persistence there.
The cost savings and reduced management effort could also play a role in planning data access methods. The right combination of technology suits on-premises, private, hybrid, public and multi-cloud environments where long network latency might otherwise stop enterprises from fully leveraging access to their sensitive data.
© 2022 Vcinity