Overview
  • 10 Mar 2023

Hyperscale Compliance is an API-based interface designed to improve the performance of masking large datasets. It lets you achieve faster masking results with the existing Delphix Continuous Compliance offering, without the added complexity of configuring multiple jobs yourself. Hyperscale Compliance first breaks large, complex datasets into numerous smaller chunks and then orchestrates the masking jobs for those chunks across multiple Continuous Compliance Engines. In general, datasets larger than 10 TB see improved masking performance when run on the Hyperscale architecture.

Hyperscale Compliance Deployment Architecture

To achieve faster masking results, Hyperscale Compliance uses the bulk export and import utilities of the data sources. With these utilities, it exports the data into smaller chunks of delimited files. The Hyperscale Compliance Engine then configures masking jobs for the chunks and runs them across multiple Continuous Compliance Engines. When all masking jobs complete successfully, the masked data is imported back into the database.

[Figure: Hyperscale Compliance deployment architecture]
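The export, mask, and import flow can be illustrated with a minimal, self-contained Python sketch. It is not how Hyperscale Compliance is implemented: in-memory rows stand in for a bulk export, a thread pool stands in for the Continuous Compliance Engine cluster, and the hypothetical `mask_value` function (a truncated SHA-256 hash) stands in for a real masking algorithm. All function names here are illustrative.

```python
import csv
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def mask_value(value: str) -> str:
    """Hypothetical stand-in for a masking algorithm: a deterministic, truncated hash."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def unload_to_chunks(rows, staging_dir: Path, rows_per_chunk: int) -> list[Path]:
    """Split exported rows into several pipe-delimited chunk files in the staging area."""
    chunk_paths = []
    for start in range(0, len(rows), rows_per_chunk):
        path = staging_dir / f"chunk_{start // rows_per_chunk:04d}.csv"
        with path.open("w", newline="") as f:
            csv.writer(f, delimiter="|").writerows(rows[start:start + rows_per_chunk])
        chunk_paths.append(path)
    return chunk_paths


def mask_chunk(path: Path) -> Path:
    """Simulate one masking job: mask column 1 (the sensitive field) of every row in a chunk."""
    masked_path = path.with_name(path.stem + "_masked.csv")
    with path.open(newline="") as src, masked_path.open("w", newline="") as dst:
        writer = csv.writer(dst, delimiter="|")
        for row in csv.reader(src, delimiter="|"):
            writer.writerow([row[0], mask_value(row[1])])
    return masked_path


def load_chunks(paths: list[Path]) -> list[list[str]]:
    """Simulate the load step: read every masked chunk back, in chunk order."""
    rows = []
    for path in paths:
        with path.open(newline="") as f:
            rows.extend(csv.reader(f, delimiter="|"))
    return rows


def run_pipeline(rows, rows_per_chunk: int = 2, parallel_jobs: int = 3) -> list[list[str]]:
    """Export to chunks, mask the chunks in parallel, then load the masked rows."""
    with tempfile.TemporaryDirectory() as tmp:
        chunks = unload_to_chunks(rows, Path(tmp), rows_per_chunk)
        # Each chunk is an independent masking job; executor.map preserves chunk order.
        with ThreadPoolExecutor(max_workers=parallel_jobs) as pool:
            masked = list(pool.map(mask_chunk, chunks))
        return load_chunks(masked)


if __name__ == "__main__":
    source_rows = [["1", "alice"], ["2", "bob"], ["3", "carol"], ["4", "dave"], ["5", "erin"]]
    for masked_row in run_pipeline(source_rows):
        print(masked_row)
```

Because each chunk is masked independently, adding workers (or, in the real architecture, engines) raises throughput without changing the per-chunk logic.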

Hyperscale Compliance Components

The Hyperscale Compliance architecture consists of four main components: the Hyperscale Compliance Engine, the Source/Target Connectors, the Continuous Compliance Engine Cluster, and the Staging Server.

Hyperscale Compliance Engine

The Hyperscale Compliance Engine is responsible for unloading data from the source and for scaling the masking process horizontally by starting multiple masking jobs in parallel across the nodes of the Continuous Compliance Engine cluster. Once the data is masked, it loads the data back into the target data source. By adding or removing nodes in the cluster, you can increase or decrease the total throughput of an individual masking job. When the source and target data sources are relational databases, the engine also handles the pre-load tasks (disabling indexes, triggers, and constraints) and the post-load tasks (re-enabling indexes, triggers, and constraints). The Hyperscale Compliance Engine currently supports the following two strategies for distributing masking jobs across the available nodes:

  • Intelligent Load Balancing (Default): This strategy considers each Continuous Compliance Engine’s current capacity before assigning masking jobs to it. Capacity is calculated from the resources available on each node Continuous Compliance Engine and the masking jobs already running on it, using the following formulas:

      Engine’s current jobCapacity = Engine’s total jobCapacity - number of masking jobs currently running on the Engine
      Engine’s total jobCapacity = min(CapacityBasedOnMemory, CapacityBasedOnCores)

    where

      CapacityBasedOnMemory = TotalAllocatedMemoryForJobs on the Engine / MaxMemory assigned to each Engine job
      CapacityBasedOnCores = Engine’s CpuCoreCount - 1
  • Round Robin Load Balancing: This strategy distributes masking jobs across all node Continuous Compliance Engines in round-robin order, without considering their current capacity.
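The capacity formulas and both distribution strategies can be sketched in Python as follows. The formula terms map directly onto the `Engine` fields; everything else is an assumption made for illustration: the field names and MB units, flooring the memory-based capacity to whole jobs, and sending each job to the engine with the most remaining capacity (the text describes how capacity is computed, not how ties or fully loaded engines are handled).

```python
import itertools
from dataclasses import dataclass


@dataclass
class Engine:
    name: str
    total_job_memory_mb: int  # TotalAllocatedMemoryForJobs on the engine (assumed MB)
    max_job_memory_mb: int    # MaxMemory assigned to each engine job (assumed MB)
    cpu_core_count: int       # CpuCoreCount
    running_jobs: int = 0     # masking jobs currently running on the engine

    def total_job_capacity(self) -> int:
        # Assumption: only whole jobs count, so the memory-based value is floored.
        capacity_based_on_memory = self.total_job_memory_mb // self.max_job_memory_mb
        capacity_based_on_cores = self.cpu_core_count - 1
        return min(capacity_based_on_memory, capacity_based_on_cores)

    def current_job_capacity(self) -> int:
        return self.total_job_capacity() - self.running_jobs


def assign_intelligent(jobs, engines) -> dict:
    """Give each job to the engine with the most remaining capacity (hypothetical policy)."""
    assignments = {}
    for job in jobs:
        # max() returns the first engine with the highest capacity, so ties go to list order.
        engine = max(engines, key=lambda e: e.current_job_capacity())
        if engine.current_job_capacity() <= 0:
            # This sketch simply fails; a real scheduler would more likely queue the job.
            raise RuntimeError(f"no engine has capacity for {job}")
        engine.running_jobs += 1
        assignments[job] = engine.name
    return assignments


def assign_round_robin(jobs, engines) -> dict:
    """Assign jobs to engines in a fixed rotation, ignoring capacity."""
    rotation = itertools.cycle(engines)
    return {job: next(rotation).name for job in jobs}
```

For instance, an engine with 8192 MB of job memory, 1024 MB per job, and 4 CPU cores gets min(8, 3) = 3 job slots: it is limited by its cores rather than its memory.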

Staging Area

The Staging Area is where the Hyperscale Compliance Engine unloads data from the source system of record (SOR) into a series of files. It can be any file system that supports the NFS protocol. The file system can be backed by attached volumes, or it can be supplied through the empty VDB feature of the Delphix Continuous Data Engine. In either case, there must be enough storage available to hold the dataset in uncompressed form. The staging area should also be accessible to the Continuous Compliance Engine cluster so that the engines can mask the files.
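Because the staging area must hold the whole dataset uncompressed, it can be worth checking free space before a job starts. The following minimal sketch uses Python's standard `shutil.disk_usage`; the mount path and dataset size are values you would supply, and the 10% headroom is an arbitrary assumption, not a Delphix requirement.

```python
import shutil


def staging_has_room(staging_path: str, uncompressed_dataset_bytes: int,
                     headroom_fraction: float = 0.10) -> bool:
    """Return True if the staging mount can hold the uncompressed dataset plus headroom.

    headroom_fraction is an illustrative safety margin, not a product requirement.
    """
    free_bytes = shutil.disk_usage(staging_path).free
    required_bytes = int(uncompressed_dataset_bytes * (1 + headroom_fraction))
    return free_bytes >= required_bytes
```

For example, `staging_has_room("/mnt/staging", 12 * 1024**4)` checks a hypothetical staging mount against a 12 TiB uncompressed dataset.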

Continuous Compliance Engine Cluster

The Continuous Compliance Engine Cluster is a group of Delphix Continuous Compliance Engines (version 6.0.14.0 or later) that the Hyperscale Compliance Engine uses to run large masking jobs in parallel. For procedures to install and configure a Continuous Compliance Engine, see the Continuous Compliance Documentation.

Source and Target Data Sources

The Hyperscale Compliance Engine is responsible for unloading data from the source data source into a series of files in the staging area. It requires network access from its host to the source, as well as credentials to run the appropriate unload commands. After the files are masked, the masked data in them is loaded into the target data source.

For Oracle and MS SQL data sources, a failure during the load step may leave the target data source in an inconsistent state, because the load step truncates the target when it begins. If the source and target are configured as the same data source and the load step fails, it is recommended to restore that data source from a backup (or, if it is a VDB, to use the Continuous Data Engine's rewind feature) before starting another Hyperscale job. If the source and target are different data sources, you can use the Hyperscale Compliance Engine's restartability feature to restart the job from the point of failure in the load or post-load step.
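The recovery guidance above reduces to a small decision, sketched below in Python. The step names and their order, the returned action strings, and both functions are hypothetical illustrations; the actual restart is performed through the Hyperscale Compliance Engine's restartability feature, whose API is not shown here.

```python
# Hypothetical step names for one Hyperscale job, in an assumed execution order.
STEPS = ["unload", "mask", "pre-load", "load", "post-load"]


def remaining_steps(failed_step: str) -> list[str]:
    """Steps to rerun when a job is restarted from the point of failure."""
    return STEPS[STEPS.index(failed_step):]


def recovery_action(source: str, target: str, failed_step: str) -> str:
    """Suggest how to recover an Oracle or MS SQL job that failed while loading."""
    if failed_step not in ("load", "post-load"):
        raise ValueError("this sketch only covers failures in the load or post-load step")
    if source != target:
        # The source was never truncated, so the job can resume where it stopped.
        return "restart from " + failed_step + ": " + ", ".join(remaining_steps(failed_step))
    if failed_step == "load":
        # The load truncated the shared data source, which may now be inconsistent.
        return "restore the data source from a backup (or rewind the VDB), then start a new job"
    raise ValueError("same source and target with a post-load failure is outside this sketch")
```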

The Continuous Compliance Platform

Delphix Continuous Compliance is a multi-user, browser-based web application that provides complete, secure, and scalable software for sensitive data discovery, masking, and tokenization while meeting enterprise-class infrastructure requirements. For more about Continuous Compliance features and architecture, see the Continuous Compliance Documentation.
