How to Set Up a Hyperscale Compliance Job

Pre-checks

You must check the following before starting a job:

  • For NFS storage, the storage space must be two times the size of the source data.
  • You must have sufficient storage in the target database for loading the masked data.
  • You must check and, if needed, increase the size of the temporary tablespace in Oracle. For example, if you have 4 billion rows, you need approximately 100 GB of temporary tablespace.
  • You must check and provide the required permissions (i.e., 770 or 700) on the mount folder on the Hyperscale Compliance host after creating an empty VDB (or mounting an NFS share).
  • Based on the umask value of the user that performs the mount, the permissions on the staging area directory could get altered after the empty VDB or NFS share has been mounted. In such cases, you must re-apply the permissions (i.e., 770 or 700) on the staging area directory.
  • If you have already created the containers, you must restart the services after changing the permissions on the mounted VDB folder.
  • The Continuous Compliance Engine should be cleaned up before use and should be dedicated to Hyperscale jobs. Any other masking job running on the Continuous Compliance Engine will impact the performance of Hyperscale Compliance jobs.
  • Currently, the Hyperscale Compliance Engine does not let you configure the masking job behavior for non-conformant data and does not process non-conformant data warnings from the Delphix Continuous Compliance Engine. Therefore, it is recommended to verify the value of the DefaultNonConformantDataHandling algorithm group setting on all the Hyperscale Compliance Engines. For more information, refer to the Algorithm Group Settings section. Setting the value to FAIL is recommended, so that the Hyperscale job also fails instead of leaving the data unmasked.
  • If a table that you are masking has a column of type BLOB/CLOB, you must have a minimum of 2 GB of memory per BLOB/CLOB column. Depending on the unload-split you are using, you may need to increase this memory in multiples of that. For example, if you have 4 tables (each with 1 BLOB/CLOB column) and an unload-split of 3, the memory requirement on the Hyperscale Compliance host is: (4 tables x 2 GB per BLOB/CLOB column x 3 unload-split) + 16 GB (minimum memory for running the Hyperscale Compliance Engine) = 40 GB approx., as illustrated in the sketch below.
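
The memory sizing above is a simple multiplication. The following is a minimal Python sketch that reproduces the example calculation; the table count, per-column memory, and unload-split values are the illustrative numbers used above, not defaults.

# Estimate Hyperscale Compliance host memory when masked tables contain BLOB/CLOB columns.
# The constants mirror the guidance above; the example values are illustrative only.
BASE_MEMORY_GB = 16           # minimum memory required to run the Hyperscale Compliance Engine
MEMORY_PER_LOB_COLUMN_GB = 2  # minimum memory per BLOB/CLOB column

def estimate_memory_gb(lob_columns: int, unload_split: int) -> int:
    """Approximate host memory (in GB) for tables with BLOB/CLOB columns."""
    return lob_columns * MEMORY_PER_LOB_COLUMN_GB * unload_split + BASE_MEMORY_GB

# 4 tables, each with 1 BLOB/CLOB column, and an unload-split of 3 -> 40 GB
print(estimate_memory_gb(lob_columns=4, unload_split=3))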

API Flow to Set Up a Hyperscale Compliance Job

The following is the API flow for setting up and executing a Hyperscale Compliance job.

  1. Register Continuous Compliance Engine(s)
  2. Create a Mount Point
  3. Create Connector Info
  4. Create a Dataset
  5. Create a Job
  6. Create Execution

The following are sample API requests/responses for a typical Hyperscale Compliance job execution workflow. The APIs can be accessed using a Swagger-based API client at https://<hyperscale-compliance-host-address>/hyperscale-compliance.

Note

APIs must be called in the order listed above.
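
For illustration, the following is a minimal Python sketch of that ordered flow using the requests library. The base URL, the Authorization header value, and the verify=False setting are placeholders/assumptions; the payload arguments are the JSON request bodies shown in the sections that follow.

import requests

# Placeholders/assumptions: adjust the base URL and authentication to match your deployment.
BASE_URL = "https://<hyperscale-compliance-host-address>"
HEADERS = {"Authorization": "<api-key>", "Content-Type": "application/json"}

def post(path, payload):
    # verify=False skips TLS verification for this sketch only.
    response = requests.post(f"{BASE_URL}{path}", json=payload, headers=HEADERS, verify=False)
    response.raise_for_status()
    return response.json()

def run_hyperscale_flow(engine_payload, mount_payload, connector_payload, dataset_payload, job_payload):
    engine = post("/engines", engine_payload)               # 1. Register Continuous Compliance Engine(s)
    mount = post("/mount-filesystems", mount_payload)       # 2. Create a mount point
    connector = post("/connector-info", connector_payload)  # 3. Create connector info
    dataset = post("/data-sets", dataset_payload)           # 4. Create a dataset (references connector/mount ids)
    job = post("/jobs", job_payload)                        # 5. Create a job (references dataset/engine ids)
    return post("/executions", {"job_id": job["id"]})       # 6. Create an execution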

Engines API

POST /engines (Register an engine):

Request:

{
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"password": "password123"
}

Response:

{
"id": 1,
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"ssl": true,
"ssl_hostname_check": true
}

MountFileSystems API

POST /mount-filesystems (Create a File Mount)

Request:

{
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}

Response:

{
"id": 1,
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}

ConnectorInfo API

POST /connector-info (Create Connector Info for Hyperscale Compliance)

Oracle Request:

{
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user",
"password": "password123"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user",
"password": "password123"
}
}

Oracle Response:

{
"id": 1,
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user"
}
}

MSSQL Request:

{
  "source": {
        "jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019",
     "user": "sa",
     "password": "password123"
  },
  "target": {
    "jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019;",
    "user": "sa",
     "password": "password123"
  }
}

MSSQL Response:

{
  "id": 1,
  "source": {
    "user": "sa",
    "jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019"
  },
  "target": {
    "jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019;",
    "user": "sa"
  }
}
Warning
A failure in the load step may leave the target datasource in an inconsistent state, because the load step truncates the target when it begins. If the source and target are configured to be the same datasource and a failure occurs in the load step, it is recommended that the datasource be restored from a backup (or use the Continuous Data Engine's rewind feature if the datasource is a VDB), as it may be in an inconsistent state. After the datasource is restored, you can start another Hyperscale job. If the source and target are configured to be different datasources, you can use the Hyperscale Compliance Engine's restartability feature to restart the job from the point of failure in the load/post-load step.

DataSets API

Note
  • Table and schema names are case-sensitive.
  • For the MSSQL connector, it is recommended to provide a filter_key if the unload_split count is more than 1 and the table does not contain a primary/unique key; otherwise, data is unloaded sequentially, which may be slower. If filter_key is not provided and the table includes a primary/unique key, Hyperscale scans the table to fetch the key automatically.
  • The dataset date format should be the same as the environment variable date format value.
  • Within one dataset, all inventories should use the same date format for all date fields.

POST /data-sets (Create DataSet for Hyperscale Compliance)

Note

Alternatively, you can create or update the dataset using payload in a file with the below endpoints:

  • POST /data-sets/file-upload
  • PUT /data-sets/file-upload/{dataSetId}

The above endpoints require a file_upload_ref that can be generated via the POST /file-upload endpoint. For more information, refer to the Hyperscale Compliance API documentation.

Request (With Single Table):

{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4
},
"target": {
    "schema_name": "SCHEMA_1_TARGET",
    "table_name": "TABLE_1_TARGET",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}

Response (With Single Table):

{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4
},
"target": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}

Request (with single table & filter_key in the source):

{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4,
    "filter_key": "PKID"
},
"target": {
    "schema_name": "SCHEMA_1_TARGET",
    "table_name": "TABLE_1_TARGET",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}

Response (with single table & filter_key in the source):

{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "unload_split": 4,
    "filter_key": "PKID"
},
"target": {
    "schema_name": "SCHEMA_1",
    "table_name": "TABLE_1",
    "stream_size": 65536
},
"masking_inventory": [
    {
    "field_name": "FIRST_NAME",
    "domain_name": "FIRST_NAME",
    "algorithm_name": "FirstNameLookup"
    },
    {
    "field_name": "LAST_NAME",
    "domain_name": "LAST_NAME",
    "algorithm_name": "LastNameLookup"
    }
]
}
]
}

Request (With multiple tables):

{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS" -->(optional field, this needs to be added only while working with date/time masking)
}
]
}
]
}

Response (With multiple tables):

{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS"
}
]
}
]
}
Note

Algorithm and domain names provided in the dataset request must come from the Continuous Compliance Engine. The Continuous Compliance Engine APIs that can be used to get these names are:

  1. Get all algorithms (GET /algorithms) for Algorithm Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllAlgorithms
  2. Get all domains (GET /domains) for Domain Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllDomains

For the extra parameters that need to be provided in the dataset request for date and multi-column algorithms, refer to the Model DataSet_masking_inventory on the Hyperscale Compliance API Documentation page, available in the API Reference section of this documentation.
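
As a convenience, the names can also be pulled programmatically. The following is a minimal Python sketch, assuming the Continuous Compliance Engine's masking API is served under /masking/api, that POST /login returns an Authorization token, and that the list responses expose algorithmName/domainName fields inside responseList; confirm these details against your engine's API documentation.

import requests

# Assumption: the Continuous Compliance Engine host and credentials below are the ones
# registered with the Hyperscale Compliance Engine (see the Engines API example above).
ENGINE_URL = "http://de-6014-continuous-compliance.delphix.com"

def fetch_masking_names(username, password):
    # Log in to the Continuous Compliance Engine and obtain an Authorization token.
    login = requests.post(f"{ENGINE_URL}/masking/api/login",
                          json={"username": username, "password": password})
    login.raise_for_status()
    headers = {"Authorization": login.json()["Authorization"]}

    # Fetch all algorithm and domain names for use in the masking_inventory.
    algorithms = requests.get(f"{ENGINE_URL}/masking/api/algorithms", headers=headers).json()
    domains = requests.get(f"{ENGINE_URL}/masking/api/domains", headers=headers).json()
    algorithm_names = [a["algorithmName"] for a in algorithms.get("responseList", [])]
    domain_names = [d["domainName"] for d in domains.get("responseList", [])]
    return algorithm_names, domain_names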

Jobs API

POST /jobs (Create a Hyperscale Compliance Job)

{
  "name": "job_1",
  "masking_engine_ids": [
    1
  ],
  "data_set_id": 1,
  "app_name_prefix": "app",
  "env_name_prefix": "env",
  "retain_execution_data": "NO",
  "source_configs": {
    "max_concurrent_source_connection": 30
  },
  "target_configs": {
    "max_concurrent_target_connection": 30,
    "parallelism_degree": 15
  },
  "masking_job_config": {
    "max_memory": 1024,
    "min_memory": 0,
    "description": "Job created by Hyperscale Masking",
    "feedback_size": 100000,
    "stream_row_limit": 10000,
    "num_input_streams": 1
  }
}
Note
  • For more information on the retain_execution_data flag, see Cleaning Up Execution Data.
  • In the case of Oracle, set parallelism_degree in the target_config to use a degree of parallelism while re-creating the indexes in the post-load step, as sketched after this list:
    1. Set {"parallelism_degree": 0}: Use the unmodified DDL provided by Oracle.
    2. Set {"parallelism_degree": -1}: Remove any PARALLEL or NOPARALLEL clause from the DDL.
    3. Set {"parallelism_degree": any positive value}: Remove any existing PARALLEL degree or NOPARALLEL clause and add PARALLEL <parallelism_degree> to the DDL.
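
The following is a simplified Python sketch of the three behaviors described above, applied to an index DDL string. It is an illustration of the settings, not the engine's actual implementation.

import re

def apply_parallelism(ddl: str, parallelism_degree: int) -> str:
    """Illustrative only: show how the parallelism_degree settings affect an index DDL."""
    if parallelism_degree == 0:
        return ddl  # 0: use the unmodified DDL provided by Oracle
    stripped = re.sub(r"\b(NOPARALLEL|PARALLEL(\s+\d+)?)\b", "", ddl, flags=re.IGNORECASE).rstrip()
    if parallelism_degree == -1:
        return stripped  # -1: remove any PARALLEL/NOPARALLEL clause
    return f"{stripped} PARALLEL {parallelism_degree}"  # positive value: add PARALLEL <degree>

print(apply_parallelism("CREATE INDEX idx_name ON t (c) NOPARALLEL", 15))
# -> CREATE INDEX idx_name ON t (c) PARALLEL 15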

Response:

Note
The properties below are only applicable to an Oracle datasource.
{
  "id": 1,
  "name": "Test_Job",
  "masking_engine_ids": [
    1,
    2,
    3
  ],
  "data_set_id": 1,
  "app_name_prefix": "Test_App",
  "env_name_prefix": "Test_Env",
  "retain_execution_data": "NO",
  "source_configs": {
    "max_concurrent_source_connection": 30
  },
  "target_configs": {
    "max_concurrent_target_connection": 30
  },
  "masking_job_config": {
    "max_memory": 1024,
    "min_memory": 1024,
    "description": "Job created by Hyperscale Masking",
    "feedback_size": 100000,
    "stream_row_limit": 10000,
    "num_input_streams": 1
  }
}

JobExecution API

POST /executions (Create an execution of a Hyperscale Job)
Request:

{
    "job_id": 1
}

Response: (The immediate response will be as shown below. The real-time status can be fetched using the GET /executions/{execution_id} endpoint; see the polling sketch after the sample response below.)

{
  "id": 124,
  "job_id": 38,
  "status": "RUNNING",
  "create_time": "2023-05-04T12:43:03.444964",
  "tasks": [
    {
      "name": "Unload"
    },
    {
      "name": "Masking"
    },
    {
      "name": "Load"
    },
    {
      "name": "Post Load"
    }
  ]
}
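
For long-running jobs, the execution can be polled until it reaches a terminal state. The following is a minimal Python sketch; BASE_URL and HEADERS are the same placeholders used in the earlier flow sketch, and the status values are assumed from the sample responses in this article.

import time
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>"
HEADERS = {"Authorization": "<api-key>"}

def wait_for_execution(execution_id, poll_seconds=60):
    """Poll GET /executions/{execution_id} until the execution is no longer RUNNING."""
    while True:
        response = requests.get(f"{BASE_URL}/executions/{execution_id}", headers=HEADERS, verify=False)
        response.raise_for_status()
        execution = response.json()
        if execution["status"] != "RUNNING":  # e.g. SUCCEEDED or FAILED
            return execution
        time.sleep(poll_seconds)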

GET /executions/{id}/summary (Returns the Job Execution by execution id in summarized format)

{
  "id": 72,
  "job_id": 5,
  "status": "SUCCEEDED",
  "create_time": "2022-12-18T13:38:43.722917",
  "end_time": "2022-12-18T13:43:16.554603",
  "total_objects": 4,
  "total_rows": 16,
  "tasks": [
    {
      "name": "Unload",
      "status": "SUCCEEDED",
      "start_time": "2022-12-18T13:38:44.184296",
      "end_time": "2022-12-18T13:38:54.972883",
      "received_objects": 4,
      "succeeded_objects": 4,
      "failed_objects": 0,
      "processing_objects": 0,
      "processed_rows": 16,
      "total_rows": 16
    },
    {
      "name": "Masking",
      "status": "SUCCEEDED",
      "start_time": "2022-12-18T13:38:51.979725",
      "end_time": "2022-12-18T13:42:58.569202",
      "received_objects": 4,
      "succeeded_objects": 4,
      "failed_objects": 0,
      "processing_objects": 0,
      "processed_rows": 16,
      "total_rows": 16
    },
    {
      "name": "Load",
      "status": "SUCCEEDED",
      "start_time": "2022-12-18T13:40:39.350857",
      "end_time": "2022-12-18T13:43:12.966492",
      "received_objects": 4,
      "succeeded_objects": 4,
      "failed_objects": 0,
      "processing_objects": 0,
      "processed_rows": 16,
      "total_rows": 16
    },
    {
      "name": "Post Load",
      "status": "SUCCEEDED",
      "start_time": "2022-12-18T13:43:12.981490",
      "end_time": "2022-12-18T13:43:15.764366",
      "metadata": [
        {
          "type": "Constraints",
          "total": 20,
          "processed": 20,
          "status": "SUCCESS",
          "start_time": "2022-12-18T13:43:12.981490",
          "end_time": "2022-12-18T13:43:15.764366"
        },
        {
          "type": "Indexes",
          "total": 10,
          "processed": 10,
          "status": "SUCCESS",
          "start_time": "2022-12-18T13:43:12.981490",
          "end_time": "2022-12-18T13:43:15.764366"
        },
        {
          "type": "Triggers",
          "total": 5,
          "processed": 5,
          "status": "SUCCESS",
          "start_time": "2022-12-18T13:43:12.981490",
          "end_time": "2022-12-18T13:43:15.764366"
        }
      ]
    }
  ]
}

GET /executions/{execution_id} (Returns the Job Execution by execution_id in the detailed format)

Note
The execution response may initially return an approximate number of rows at the start of execution and provide actual values later during the execution.

Request:

id: 1

Response:

{
  "id": 1,
  "job_id": 1,
  "status": "SUCCEEDED",
  "create_time": "2023-04-26T12:34:38.012768",
  "end_time": "2023-04-26T12:37:32.410297",
  "total_objects": 1,
  "total_rows": 499999,
  "tasks": [
    {
      "name": "Unload",
      "status": "SUCCEEDED",
      "start_time": "2023-04-26T12:34:38.027224",
      "end_time": "2023-04-26T12:34:42.435849",
      "metadata": [
        {
          "source_key": "dbo.test_TEMP",
          "total_rows": 499999,
          "status": "SUCCEEDED",
          "unloaded_rows": 499999
        }
      ]
    },
    {
      "name": "Masking",
      "status": "SUCCEEDED",
      "start_time": "2023-04-26T12:34:40.420073",
      "end_time": "2023-04-26T12:35:12.423744",
      "metadata": [
        {
          "source_key": "dbo.test_TEMP",
          "total_rows": 499999,
          "status": "SUCCEEDED",
          "masked_rows": 499999
        }
      ]
    },
    {
      "name": "Load",
      "status": "SUCCEEDED",
      "start_time": "2023-04-26T12:37:08.482240",
      "end_time": "2023-04-26T12:37:22.417561",
      "metadata": [
        {
          "source_key": "dbo.test_TEMP",
          "total_rows": 499999,
          "status": "SUCCEEDED",
          "loaded_rows": 499999
        }
      ]
    },
    {
      "name": "Post Load",
      "status": "SUCCEEDED",
      "start_time": "2023-04-26T12:37:22.426813",
      "end_time": "2023-04-26T12:37:22.814583",
      "metadata": [
        {
          "status": "SUCCEEDED",
          "table_set": [
            "test_TEMP_Result"
          ],
          "object_details": [
            {
              "type": "Triggers",
              "total": 2,
              "processed": 2,
              "status": "SUCCEEDED",
              "start_time": "2023-04-26T12:35:10.325948",
              "end_time": "2023-04-26T12:37:22.804792"
            },
            {
              "type": "Indexes",
              "total": 4,
              "processed": 4,
              "status": "SUCCEEDED",
              "start_time": "2023-04-26T12:35:10.325948",
              "end_time": "2023-04-26T12:37:22.804792"
            },
            {
              "type": "Constraints",
              "total": 5,
              "processed": 5,
              "status": "SUCCEEDED",
              "start_time": "2023-04-26T12:35:10.325948",
              "end_time": "2023-04-26T12:37:22.804792"
            }
          ]
        }
      ]
    }
  ]
}
  • Only in the case of an execution failure, the below API can be used to restart the execution: PUT /executions/{execution_id}/restart (Restart a failed execution); see the sketch following this list.
  • The below API can be used only for manually cleaning up the execution: DELETE /executions/{execution_id} (Clean up the execution).
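
Building on the polling sketch above (it reuses BASE_URL, HEADERS, and wait_for_execution), the following minimal Python sketch restarts an execution only if it ended in failure; the "FAILED" status string is assumed from the statuses shown in this article.

import requests

def restart_if_failed(execution_id):
    """Restart an execution via PUT /executions/{execution_id}/restart only if it failed."""
    execution = wait_for_execution(execution_id)
    if execution["status"] == "FAILED":
        response = requests.put(f"{BASE_URL}/executions/{execution_id}/restart",
                                headers=HEADERS, verify=False)
        response.raise_for_status()
        return response.json()
    return execution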
