Establishing a Continuous Data Pipeline with Vcinity on AWS

By Saptarshi Banerjee, Solutions Architect – AWS
By Scott Henderson, Solutions Architect – Vcinity

Modern enterprises and agencies manage enormous amounts of data—often widely dispersed across their organizations, which can pose hurdles regarding cost, security, and time.

These issues become increasingly problematic as an organization or agency’s operations and data sources scale and the distance between data at the edge and on-premises grows.

Maintaining a continuous data pipeline, despite the distance, is critical to business outcomes. It builds trust in your data and insights as your infrastructure expands to the edge. Future-proofing your business in this regard ensures optimal performance moving forward.

By getting data to the Amazon Web Services (AWS) cloud and delivering an end-to-end data pipeline that offers more secure, global access to data, users can optimize data-intensive application performance and derive greater value from their data.

In this post, we share how enterprises and agencies can quickly, efficiently, and easily extract the true value of their data with AWS microservices and market-ready solutions.

Vcinity is an AWS Specialization Partner and AWS Marketplace Seller with the Migration and Modernization Competency. Vcinity’s software products increase the agility and velocity of digital transformation for enterprises by enabling AWS services to instantly access and operate on data with local-like performance, regardless of where it exists.

Solution Overview

Vcinity offers two solutions to create a continuous data pipeline: data movement and remote data access. These tools make data more accessible—even over great distances—whether moving it into or around the AWS cloud or not moving it at all.

Data Movement

Moving data from one place to another is a core function of edge, cloud, hybrid, or multicloud computing. Vcinity enables fast, secure, and reliable data movement into and around AWS. You can learn more about the AWS Validation testing in this video.

Vcinity expedites data transfers by using existing wide-area network (WAN) connections and moving data at near line-rate speeds, sustaining more than 90% of the available bandwidth.
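To put the 90% figure in perspective, the short calculation below estimates how long a bulk transfer takes at a given sustained utilization. It is a back-of-the-envelope sketch only: the 100 TB dataset and 10 Gbps link are hypothetical values chosen for illustration, and the function name is ours, not part of any Vcinity tool.

```python
def transfer_time_hours(data_bytes: float, link_gbps: float, utilization: float = 0.9) -> float:
    """Hours needed to move data_bytes over a link at a sustained utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    effective_bits_per_second = link_gbps * 1e9 * utilization
    return data_bytes * 8 / effective_bits_per_second / 3600


# Hypothetical example: 100 TB (decimal) over a 10 Gbps WAN link.
hours_at_90 = transfer_time_hours(100e12, 10, 0.9)
hours_at_30 = transfer_time_hours(100e12, 10, 0.3)  # an arbitrary lower utilization for contrast
print(f"90% utilization: {hours_at_90:.1f} hours")
print(f"30% utilization: {hours_at_30:.1f} hours")
```

Because the article states "more than 90%", the 0.9 value acts as a lower bound on utilization, so the first result is an upper bound on transfer time under these assumptions.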

Figure 1 – Vcinity transforms AWS’s global reach to data at scale.

This solution is a great fit for:

Remote Data Access

Remote data access enables your geographically dispersed, unstructured data to become a single, globally accessible dataset. This means your AWS workloads can have access to critical, time-sensitive data from anywhere in the world.

Not having to move data, while still providing a local experience, saves time and transfer costs, reduces data management complexity, and reduces security risks for enterprises.

This enables AWS applications and microservices to tap into your data as soon as it’s created, regardless of whether they are in the AWS cloud or in another region.

Figure 2 – Vcinity turns distributed data into a single, globally-available dataset.

Remote data access is a strong fit for the following use cases:

  • Burst compute for rendering geo-distributed analytics at time of data creation.
  • Training artificial intelligence (AI), generative AI, and machine learning (ML) models on siloed data.
  • Enabling remote workstations or distributed teams.

How Do These Solutions Work?

Vcinity solutions are designed to extend and manage remote direct memory access (RDMA)-based protocols over high-latency networks where other protocols would be ineffective.

Rather than attempting to solve the problems of high-latency networks by modifying data or relying on minor improvements to legacy protocols, Vcinity has taken a different approach from the ground up to deliver world-class performance for global data movement and remote data access workflows.

This includes the following key functionalities:

  • Network fabric: Vcinity developed a new approach to addressing the effects of latency across long-haul data links, taking best-of-breed high-performance storage networking protocols and services and stretching them to WAN scale. Customer data is carried within Vcinity’s proprietary traffic-engineered flows to manage the network transport for high-latency and high-bandwidth connections and accommodate large amounts of data in flight.
  • Routing: Vcinity’s network traffic engineering is encapsulated in IP; this means Vcinity’s high-performance protocols can operate on any layer 1, 2, or 3 network and do not require multiple streams to achieve that performance.
  • Data in-flight buffering/crediting: The Vcinity system contains significant buffer memory to ensure data in flight over a high-latency network can persist in the event of packet loss or other errors on the network.

These buffers can support up to 2.5 seconds of latency for a link speed of 10 Gbps, or 250 ms at 100 Gbps, depending on system configuration.
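The two figures above are consistent with each other: both describe the same amount of data in flight. The sketch below works through the arithmetic using the standard bandwidth-delay product (link speed multiplied by latency). The resulting byte count is derived here for illustration; it is not a buffer size published by Vcinity, and the source does not say whether "latency" means one-way or round-trip time.

```python
def in_flight_bytes(link_gbps: float, latency_seconds: float) -> float:
    """Bandwidth-delay product: bytes on the wire for a given speed and latency."""
    bits_in_flight = link_gbps * 1e9 * latency_seconds
    return bits_in_flight / 8


at_10g = in_flight_bytes(10, 2.5)     # 10 Gbps link, 2.5 s of latency
at_100g = in_flight_bytes(100, 0.25)  # 100 Gbps link, 250 ms of latency

print(f"10 Gbps x 2.5 s   = {at_10g / 1e9:.3f} GB in flight")
print(f"100 Gbps x 250 ms = {at_100g / 1e9:.3f} GB in flight")
```

Both cases come to 25 Gbit, or 3.125 GB (decimal), which is why a tenfold increase in link speed corresponds to a tenfold reduction in the latency the same buffer can cover.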

Vcinity’s Solution Components

  • Flow control: Vcinity manages the lossless tunnel with flow control between any two endpoints, and includes forward congestion signaling downstream and backward congestion signaling upstream. Both lossless and lossy networks will benefit from this capability, as it ensures data is only put on the network when there’s space to receive the data at the far end of the network. It’s also possible to configure multiple tunnels for aggregated performance and increased overall resiliency with Vcinity’s DataPrizm capability.
  • Security: Vcinity supports the Advanced Encryption Standard in Galois Counter Mode (AES-GCM) and authentication at line rate for security. The encryption is done within the Field Programmable Gate Arrays (FPGAs) to minimize data impacts and performance degradation.

To create an even stronger security paradigm, customers can combine the multi-tunnel capabilities presented by DataPrizm with per-tunnel encryption, with even more bespoke security available by leveraging a key orchestration system that regularly rotates keys. This approach ensures that even a successful man-in-the-middle attack would only have access to an encrypted fraction of the in-flight data.
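Returning to the flow control item above, the idea that data is only put on the network when the far end has space for it can be pictured with a generic credit-based scheme. The following is a minimal simulation under stated assumptions: the class names, buffer size, and drain rate are all hypothetical, and this is a textbook illustration of credit-based flow control rather than Vcinity's implementation.

```python
from collections import deque


class Receiver:
    """Far-end endpoint with a fixed number of buffer slots."""

    def __init__(self, buffer_slots: int):
        self.capacity = buffer_slots
        self.buffer = deque()

    def accept(self, frame) -> None:
        if len(self.buffer) >= self.capacity:
            raise OverflowError("frame arrived with no buffer space")
        self.buffer.append(frame)

    def drain(self, max_frames: int) -> int:
        """Consume up to max_frames; the count freed is returned as credits."""
        freed = min(max_frames, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


class Sender:
    """Near-end endpoint that transmits only while it holds credits."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits
        self.sent = 0

    def try_send(self, receiver: Receiver, frame) -> bool:
        if self.credits == 0:
            return False  # no known space at the far end, so hold the frame
        receiver.accept(frame)
        self.credits -= 1
        self.sent += 1
        return True


def simulate(total_frames: int = 20, buffer_slots: int = 4, drain_per_tick: int = 2):
    receiver = Receiver(buffer_slots)
    sender = Sender(initial_credits=buffer_slots)
    ticks = 0
    peak_buffered = 0
    while sender.sent < total_frames:
        # Send as many frames as the current credits allow.
        while sender.sent < total_frames and sender.try_send(receiver, sender.sent):
            pass
        peak_buffered = max(peak_buffered, len(receiver.buffer))
        # The receiver frees space and returns that many credits.
        sender.credits += receiver.drain(drain_per_tick)
        ticks += 1
    return ticks, peak_buffered


ticks, peak_buffered = simulate()
print(f"finished in {ticks} ticks, peak receiver occupancy {peak_buffered} slots")
```

The receiver's buffer never overflows because the sender's credit count can never exceed the free slots at the far end, so nothing is dropped for lack of buffer space regardless of how slowly the receiver drains.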

Use Case: Supporting Analytics and AI/ML Workloads

In combination with cloud-based analytics tools, Vcinity’s solution enables customers across industries to unify and transform disparate data into business and customer value more quickly and easily.

AI/ML and analytics applications can accelerate the consolidation of data and even analyze data in near real-time across hybrid, distributed environments. Although Vcinity’s data movement solution accelerates how quickly analytic and AI/ML functions can begin processing and providing outputs, there are certain circumstances wherein data cannot be moved from a certain location.

Likewise, even the fastest data movement cannot always satisfy business or customer needs, such as time-sensitive actions like specialized, remotely located doctors reviewing patient medical imagery files to provide treatment recommendations, a financial analyst merging datasets to run portfolio analytics, or a data scientist training a genomics sequencing model.

In these cases, Vcinity’s remote data access solution enables organizations to access and operate on data while it stays in place—securely and in real-time with local-like performance. This eliminates the need to migrate, copy, or cache data to the user or application.

Getting Started

Whether you want to move edge or on-premises data to an AWS database for analysis, or deploy the remote data access solution to access and run analysis on data on-premises before it’s moved to the cloud, use the following steps. Additional details are available in the Vcinity AWS User Guide.

Step 1: Create and Launch the Vcinity Amazon EC2 Instance

  • Vcinity offers its software via AWS Marketplace. Choose the appropriate version and Amazon Elastic Compute Cloud (Amazon EC2) instance type for the Vcinity deployment.
    • Vcinity provides guidance and recommended instance types for the Amazon Machine Image (AMI) on AWS Marketplace. Further consultation on instance types is available via Vcinity account teams.
  • Choose your deployment parameters, including virtual private cloud (VPC), network segments, security groups, and attached Amazon Elastic Block Store (Amazon EBS) disks.
  • Each Vcinity EC2 endpoint is created independently.
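For teams that prefer to script the launch rather than use the console, the deployment parameters listed above map onto the request accepted by the `run_instances` call in boto3, the AWS SDK for Python. The sketch below is illustrative only: every identifier value (AMI ID, instance type, subnet, security group, key pair, device name, volume size) is a placeholder, the helper function names are ours, and the correct AMI and instance type should come from the AWS Marketplace listing and the Vcinity AWS User Guide.

```python
def build_launch_request(ami_id, instance_type, subnet_id,
                         security_group_ids, data_volume_gib, key_name):
    """Assemble keyword arguments for the EC2 run_instances API call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
        # One additional EBS data volume; device name and type are assumptions.
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/sdf",
                "Ebs": {
                    "VolumeSize": data_volume_gib,
                    "VolumeType": "gp3",
                    "DeleteOnTermination": False,
                },
            }
        ],
    }


def launch(request):
    """Submit the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party AWS SDK, imported lazily so the builder runs without it

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


# Placeholder values only; replace each one before calling launch(request).
request = build_launch_request(
    ami_id="ami-EXAMPLE",
    instance_type="INSTANCE-TYPE-FROM-MARKETPLACE-GUIDANCE",
    subnet_id="subnet-EXAMPLE",
    security_group_ids=["sg-EXAMPLE"],
    data_volume_gib=500,
    key_name="example-key-pair",
)
print(sorted(request))
```

Because each Vcinity EC2 endpoint is created independently, a script like this would be run once per endpoint with that endpoint's own network and storage values.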

Step 2: Prepare Configuration of the Vcinity EC2 Instance

  • Following the Vcinity AWS User Guide, perform the baseline system configuration via secure shell protocol (SSH) to gain root access to the Vcinity instance and name the host. Once that’s complete, the remaining configuration is defined in a configuration text file. Downloadable samples of this text file are included with the Vcinity AMI, and guidance on updating the text file is included in the User Guide.
  • Parameters defined include local and remote interface detail, network shared disks (NSDs) to be used for the Vcinity file system, firewall port information, and wide-area network information including frame size and tunnel size.
  • Each Vcinity EC2 endpoint is configured independently on each Vcinity system.

Step 3: Connect the Vcinity EC2 Instance to Another Vcinity Endpoint

  • Once the configuration file is complete, it’s uploaded via Vcinity’s browser-based graphical user interface (GUI). After upload, the system self-deploys using the details gathered in the configuration file.
  • After initial deployment is completed, the remote location’s Vcinity fabric extender is discovered and a tunnel is established.
  • This remote site discovery process is configured independently on each Vcinity system.

Step 4: Connect the Vcinity EC2 Instance to Local Services

  • The Vcinity node interfaces with local storage services via Amazon S3 (including Amazon S3 Express One Zone), Network File System (NFS), or Server Message Block (SMB) protocols.

Step 5: Start Moving Data

  • Once the tunnel between the two Vcinity endpoints is established and local storage and/or compute services are connected at each end, data can be copied or moved between two sites using legacy system tools, or by using Vcinity’s AccessX software.
  • Vcinity’s AccessX tool enables storage-to-storage data transfer and has some built-in tools to optimize the transfer process without any data modification. AccessX can package small files into a larger file to mitigate latency.
  • Copy jobs can be executed on demand or via a scheduler and can be run at a file or directory level, whichever is most appropriate for a customer workflow.
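The benefit of packaging small files is easiest to see with a generic example: many tiny files typically cost per-file overhead on a high-latency link, while one large file streams continuously. The sketch below uses Python's standard `tarfile` module to bundle a directory into a single uncompressed archive, leaving the file contents unmodified. It illustrates the general technique only; how AccessX packages files internally is not described in this post, and the function name and sample data are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(src_dir: Path, archive_path: Path) -> int:
    """Pack every file under src_dir into one uncompressed tar archive.

    Returns the number of files packed. archive_path should live outside
    src_dir so the archive does not try to include itself.
    """
    count = 0
    with tarfile.open(str(archive_path), "w") as tar:  # "w" means no compression
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                tar.add(str(path), arcname=path.relative_to(src_dir).as_posix())
                count += 1
    return count


# Demonstration with 100 small, made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "small_files"
    source.mkdir()
    for i in range(100):
        (source / f"record_{i:03d}.txt").write_text(f"sample record {i}\n")

    archive = Path(workdir) / "bundle.tar"
    bundled = bundle_directory(source, archive)

    with tarfile.open(str(archive)) as tar:
        names = tar.getnames()

print(f"packed {bundled} files into one archive with {len(names)} members")
```

On the receiving side the archive would be unpacked back into individual files, so the transfer moves one large object across the WAN instead of many small ones.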

Conclusion

In this post, we demonstrated how Vcinity supports organizations’ ability to access data—at scale, across great distances, and in real-time—fueling better and faster insights for customers.

Using Vcinity’s data movement and remote data access solutions can unlock distant datasets and clusters to deliver a continuous data pipeline and seamless access to distributed data at scale.

Vcinity allows you to modernize your data strategy and drive powerful outcomes by operating dynamically across diverse datasets, workloads, and systems, reliably delivering rich, data-driven results.

Get started in AWS Marketplace, where you can select the Vcinity solution that fits your environment needs. For deployment support, please contact Vcinity.