How to Set Up a Hyperscale Compliance Job
Updated on 09 May 2023
Pre-checks
You must check the following before starting a job:
- For NFS storage, the available storage space must be at least 2 times the size of the source data.
- You must have sufficient storage in the target DB for loading the masked data.
- You must check and, if required, increase the size of the temporary tablespace in Oracle. For example, if you have 4 billion rows, you must use a temporary tablespace of at least 100 GB.
- You must check and provide the required permissions (i.e., 770 or 700) on the mount folder on the Hyperscale Compliance host after creating an empty VDB (or mounting an NFS share).
- Based on the umask value of the user that performs the mount, the permissions of the staging area directory could get altered after the empty VDB or NFS share has been mounted. In such cases, you must re-apply the permissions (i.e., 770 or 700) on the staging area directory.
- You must restart the services after changing the permissions on the VDB-mounted folder if you have already created the containers.
- The Continuous Compliance Engine should be cleaned up before use and should be used only for Hyperscale jobs. Running any other masking job on the Continuous Compliance Engine alongside Hyperscale Compliance jobs will impact the performance of the Hyperscale Compliance jobs.
- Currently, the Hyperscale Compliance Engine does not provide the ability to configure the masking job behavior for non-conformant data and does not process non-conformant data warnings from the Delphix Continuous Compliance Engine. Therefore, it is recommended to verify the value of the DefaultNonConformantDataHandling algorithm group setting on all the Hyperscale Compliance Engines. For more information, refer to the Algorithm Group Settings section. It is recommended to set the value to FAIL so that the Hyperscale job also fails instead of leaving the data unmasked.
- If a table that you are masking has a column of type BLOB/CLOB, you must have a minimum of 2 GB of memory per CLOB/BLOB column. Depending on the unload-split you are using, you may need to increase this memory by that multiple. For example, if you have 4 tables (each with 1 column of BLOB/CLOB type) and the unload-split is 3, then the memory requirement on the Hyperscale Compliance host will be approximately: 4 (tables) x 2 GB (per CLOB/BLOB column) x 3 (unload-split) + 16 GB (minimum required for running the Hyperscale Compliance Engine) = 40 GB. A short calculation sketch follows this list.
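For a quick sanity check, the memory estimate above can be expressed as a small script. This is a minimal sketch: the 2 GB per CLOB/BLOB column and the 16 GB engine baseline come from the pre-checks on this page, while the function name and parameters are illustrative only.

```python
# Minimal sketch of the memory estimate from the pre-checks above.
# The 2 GB per CLOB/BLOB column and the 16 GB engine baseline are the figures
# stated in this section; everything else is an illustrative assumption.
def estimate_host_memory_gb(lob_tables: int, unload_split: int,
                            gb_per_lob_column: int = 2,
                            engine_baseline_gb: int = 16) -> int:
    """Approximate memory requirement (GB) for the Hyperscale Compliance host."""
    return lob_tables * gb_per_lob_column * unload_split + engine_baseline_gb

# Example from the text: 4 tables, each with one BLOB/CLOB column, unload-split of 3.
print(estimate_host_memory_gb(lob_tables=4, unload_split=3))  # prints 40
```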
API Flow to Set Up a Hyperscale Compliance Job
The following is the API flow for setting up and executing a Hyperscale Compliance job.
- Register Continuous Compliance Engine(s)
- Create a Mount Point
- Create Connector Info
- Create a Dataset
- Create a Job
- Create Execution
The following are sample API requests/responses for a typical Hyperscale Compliance job execution workflow. The APIs can be accessed using a Swagger-based API client at https://<hyperscale-compliance-host-address>/hyperscale-compliance. The APIs must be called only in the order listed above.
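As a rough illustration of that call order, the sketch below sets up a Python requests session against the base URL shown above. This is a minimal sketch under assumptions: the placeholder host must be replaced, and the authorization header (if any) and TLS verification behaviour depend on your deployment and are not specified on this page.

```python
# Minimal sketch of a client session for the workflow below (Python "requests").
# Assumptions: the base URL placeholder must be replaced with your host; any
# authorization header depends on your deployment; verify=False is used only to
# accommodate self-signed certificates.
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

session = requests.Session()
# session.headers.update({"Authorization": "<token-or-key-required-by-your-deployment>"})
session.verify = False

# The endpoints must be called in this order:
#   POST /engines -> POST /mount-filesystems -> POST /connector-info
#   -> POST /data-sets -> POST /jobs -> POST /executions
```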
Engines API
POST /engines (Register an engine):
Request:
{
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"password": "password123"
}
Response:
{
"id": 1,
"name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
"type": "MASKING",
"protocol": "http",
"hostname": "de-6014-continuous-compliance.delphix.com",
"username": "hyperscale_compliance_user",
"ssl": true,
"ssl_hostname_check": true
}
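A minimal sketch of calling this endpoint with Python requests is shown below. The payload mirrors the sample request above; the base URL, credentials, and verify=False are placeholders/assumptions for your environment.

```python
# Minimal sketch: register a Continuous Compliance Engine via POST /engines.
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

engine_payload = {
    "name": "Delphix Continuous Compliance Engine 6.0.14.0 on AWS",
    "type": "MASKING",
    "protocol": "http",
    "hostname": "de-6014-continuous-compliance.delphix.com",
    "username": "hyperscale_compliance_user",
    "password": "password123",
}

resp = requests.post(f"{BASE_URL}/engines", json=engine_payload, verify=False)
resp.raise_for_status()
engine_id = resp.json()["id"]  # referenced later in masking_engine_ids when creating the job
print("Registered engine with id", engine_id)
```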
MountFileSystems API
POST /mount-filesystems (Create a File Mount)
Request:
{
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}
Response:
{
"id": 1,
"mountName": "staging_area",
"hostAddress": "de-6014-continuous-data.dlpxdc.co",
"mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
"mountType": "NFS4",
"options": "rw"
}
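A minimal sketch of creating the file mount via Python requests is shown below; the host address and mount path are the sample values from the request above, and the base URL and TLS settings are assumptions for your environment.

```python
# Minimal sketch: create a file mount via POST /mount-filesystems.
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

mount_payload = {
    "mountName": "staging_area",
    "hostAddress": "de-6014-continuous-data.dlpxdc.co",
    "mountPath": "/domain0/group-2/appdata_container-12/appdata_timeflow-13/datafile",
    "mountType": "NFS4",
    "options": "rw",
}

resp = requests.post(f"{BASE_URL}/mount-filesystems", json=mount_payload, verify=False)
resp.raise_for_status()
mount_filesystem_id = resp.json()["id"]  # referenced by the dataset later
```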
ConnectorInfo API
POST /connector-info (Create Connector Info for Hyperscale Compliance)
Oracle Request:
{
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user",
"password": "password123"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user",
"password": "password123"
}
}
Oracle Response:
{
"id": 1,
"source": {
"jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
"user": "oracle_db_user"
},
"target": {
"jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
"user": "oracle_db_user"
}
}
MSSQL Request:
{
"source": {
"jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019",
"user": "sa",
"password": "password123"
},
"target": {
"jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019;",
"user": "sa",
"password": "password123"
}
}
MSSQL Response:
{
"id": 1,
"source": {
"user": "sa",
"jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019"
},
"target": {
"jdbc_url": "jdbc:sqlserver://hyperscale-mssql.dlpxdc.co;database=SourceDB2019;instanceName=SQL2019;",
"user": "sa",
}
}
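A minimal sketch of creating the connector info via Python requests is shown below, using the Oracle sample values from above. Note that the response echoes the JDBC URLs and users but never returns the passwords; the base URL and TLS settings are assumptions for your environment.

```python
# Minimal sketch: create connector info via POST /connector-info (Oracle sample values).
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

connector_payload = {
    "source": {
        "jdbc_url": "jdbc:oracle:thin:@oracle-19-src.dlpxdc.co:1521/VDBOMSRDC20SRC",
        "user": "oracle_db_user",
        "password": "password123",
    },
    "target": {
        "jdbc_url": "jdbc:oracle:thin:@rh79-ora-19-tgt.dlpxdc.co:1521/VDBOMSRDC200B_TGT",
        "user": "oracle_db_user",
        "password": "password123",
    },
}

resp = requests.post(f"{BASE_URL}/connector-info", json=connector_payload, verify=False)
resp.raise_for_status()
connector_id = resp.json()["id"]  # referenced by the dataset later
```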
DataSets API
- Table and schema names are case-sensitive.
- For the MSSQL connector, it is recommended to provide a filter_key if the unload_split count is more than 1 and the table does not contain a primary/unique key; otherwise, the data will be unloaded through a sequential approach, which may be slower. If filter_key is not provided and the table includes a primary/unique key, Hyperscale will scan the table to fetch the key automatically.
- The dataset date format should be the same as the environment variable date format value.
- Within one dataset, all the inventories should use the same date format.
POST /data-sets (Create DataSet for Hyperscale Compliance). Alternatively, you can create or update the dataset from a payload file; see the file-upload endpoints described after the examples below.
Request (With Single Table):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"unload_split": 4
},
"target": {
"schema_name": "SCHEMA_1_TARGET",
"table_name": "TABLE_1_TARGET",
"stream_size": 65536
},
"masking_inventory": [
{
"field_name": "FIRST_NAME",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
},
{
"field_name": "LAST_NAME",
"domain_name": "LAST_NAME",
"algorithm_name": "LastNameLookup"
}
]
}
]
}
Response (With Single Table):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"unload_split": 4
},
"target": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"stream_size": 65536
},
"masking_inventory": [
{
"field_name": "FIRST_NAME",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
},
{
"field_name": "LAST_NAME",
"domain_name": "LAST_NAME",
"algorithm_name": "LastNameLookup"
}
]
}
]
}
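A minimal sketch of posting the single-table dataset above via Python requests is shown below; connector_id and mount_filesystem_id are the ids returned by the previous steps, and the base URL and TLS settings are assumptions for your environment.

```python
# Minimal sketch: create a dataset via POST /data-sets (single-table example above).
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

dataset_payload = {
    "connector_id": 1,           # id returned by POST /connector-info
    "mount_filesystem_id": 1,    # id returned by POST /mount-filesystems
    "data_info": [
        {
            "source": {"schema_name": "SCHEMA_1", "table_name": "TABLE_1", "unload_split": 4},
            "target": {"schema_name": "SCHEMA_1_TARGET", "table_name": "TABLE_1_TARGET", "stream_size": 65536},
            "masking_inventory": [
                {"field_name": "FIRST_NAME", "domain_name": "FIRST_NAME", "algorithm_name": "FirstNameLookup"},
                {"field_name": "LAST_NAME", "domain_name": "LAST_NAME", "algorithm_name": "LastNameLookup"},
            ],
        }
    ],
}

resp = requests.post(f"{BASE_URL}/data-sets", json=dataset_payload, verify=False)
resp.raise_for_status()
data_set_id = resp.json()["id"]  # referenced when creating the job
```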
Request (with single table & filter_key in the source):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"unload_split": 4,
"filter_key": "PKID"
},
"target": {
"schema_name": "SCHEMA_1_TARGET",
"table_name": "TABLE_1_TARGET",
"stream_size": 65536
},
"masking_inventory": [
{
"field_name": "FIRST_NAME",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
},
{
"field_name": "LAST_NAME",
"domain_name": "LAST_NAME",
"algorithm_name": "LastNameLookup"
}
]
}
]
}
Response (with single table & filter_key in the source):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"unload_split": 4,
"filter_key": "PKID"
},
"target": {
"schema_name": "SCHEMA_1",
"table_name": "TABLE_1",
"stream_size": 65536
},
"masking_inventory": [
{
"field_name": "FIRST_NAME",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
},
{
"field_name": "LAST_NAME",
"domain_name": "LAST_NAME",
"algorithm_name": "LastNameLookup"
}
]
}
]
}
Request (With multiple tables):
{
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS" -->(optional field, this needs to be added only while working with date/time masking)
}
]
}
]
}
Response (With multiple tables):
{
"id": 1,
"connector_id": 1,
"mount_filesystem_id": 1,
"data_info": [
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_0"
},
"masking_inventory": [
{
"field_name": "col_VARCHAR",
"domain_name": "FIRST_NAME",
"algorithm_name": "FirstNameLookup"
}
]
},
{
"source": {
"unload_split": 2,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"target": {
"stream_size": 65536,
"schema_name": "DLPXDBORA",
"table_name": "test_multi_1"
},
"masking_inventory": [
{
"field_name": "COL_TIMESTAMP",
"domain_name": "DOB",
"algorithm_name": "DateShiftVariable",
"date_format": "yyyy-MM-dd HH:mm:ss.SSS"
}
]
}
]
}
Algorithm and domain names provided in the dataset request must come from the Continuous Compliance Engine. The following Continuous Compliance Engine APIs can be used to get these names:
- Get all algorithms (GET /algorithms) for Algorithm Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllAlgorithms
- Get all domains (GET /domains) for Domain Names. Sample Endpoint: https://maskingdocs.delphix.com/maskingApiEndpoints/5_1_15_maskingApiEndpoints.html#getAllDomains
For the extra parameters that need to be provided in the dataset request for Date and Multi-Column algorithms, refer to the Model DataSet_masking_inventory on the Hyperscale Compliance API Documentation page, available in the API Reference section of this documentation.
Alternatively, you can create or update the dataset from a payload file using the below endpoints:
- POST /data-sets/file-upload
- PUT /data-sets/file-upload/{dataSetId}
The above endpoints require a file_upload_ref, which can be generated via the POST /file-upload endpoint. For more information, refer to the Hyperscale Compliance API documentation. A hedged sketch of this flow is shown below.
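The sketch below illustrates the file-based alternative. The multipart form field name, the name of the reference field in the upload response, and the request body key for the reference are assumptions not confirmed on this page; consult the Hyperscale Compliance API documentation for the exact contract.

```python
# Hedged sketch: upload a dataset payload file via POST /file-upload, then create the
# dataset via POST /data-sets/file-upload using the returned reference.
# ASSUMPTIONS: the multipart field name ("file"), the upload response field holding the
# reference, and the request body key for the reference are not specified on this page.
import json
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

payload = {"connector_id": 1, "mount_filesystem_id": 1, "data_info": []}  # fill data_info as above
with open("dataset_payload.json", "w") as fh:
    json.dump(payload, fh)

with open("dataset_payload.json", "rb") as fh:
    upload = requests.post(f"{BASE_URL}/file-upload", files={"file": fh}, verify=False)
upload.raise_for_status()
file_upload_ref = upload.json().get("fileReferenceId")  # response field name is an assumption

resp = requests.post(f"{BASE_URL}/data-sets/file-upload",
                     json={"file_upload_ref": file_upload_ref},  # body key is an assumption
                     verify=False)
resp.raise_for_status()
data_set_id = resp.json()["id"]
```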
Jobs API
POST /jobs (Create a Hyperscale Compliance Job)
Request:
{
"name": "job_1",
"masking_engine_ids": [
1
],
"data_set_id": 1,
"app_name_prefix": "app",
"env_name_prefix": "env",
"retain_execution_data": "NO",
"source_configs": {
"max_concurrent_source_connection": 30
},
"target_configs": {
"max_concurrent_target_connection": 30,
"parallelism_degree": 15
},
"masking_job_config": {
"max_memory": 1024,
"min_memory": 0,
"description": "Job created by Hyperscale Masking",
"feedback_size": 100000,
"stream_row_limit": 10000,
"num_input_streams": 1
}
}
- For more information on the retain_execution_data flag, see Cleaning Up Execution Data.
- In the case of Oracle, set parallelism_degree in target_configs to control the degree of parallelism used while re-creating the indexes in the post-load step:
- Set {"parallelism_degree": 0}: use the unmodified DDL provided by Oracle.
- Set {"parallelism_degree": -1}: remove any parallel or noparallel clause from the DDL.
- Set {"parallelism_degree": <any positive value>}: remove the existing parallel degree or noparallel clause and add PARALLEL <parallelism_degree> to the DDL.
Response:
{
"id": 1,
"name": "Test_Job",
"masking_engine_ids": [
1,
2,
3
],
"data_set_id": 1,
"app_name_prefix": "Test_App",
"env_name_prefix": "Test_Env",
"retain_execution_data": "NO",
"source_configs": {
"max_concurrent_source_connection": 30
},
"target_configs": {
"max_concurrent_target_connection": 30
},
"masking_job_config": {
"max_memory": 1024,
"min_memory": 1024,
"description": "Job created by Hyperscale Masking",
"feedback_size": 100000,
"stream_row_limit": 10000,
"num_input_streams": 1
}
}
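A minimal sketch of creating the job via Python requests is shown below, wiring in the engine and dataset ids returned by the earlier steps; the values mirror the sample request above, and the base URL and TLS settings are assumptions for your environment.

```python
# Minimal sketch: create a Hyperscale Compliance job via POST /jobs.
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

job_payload = {
    "name": "job_1",
    "masking_engine_ids": [1],      # ids returned by POST /engines
    "data_set_id": 1,               # id returned by POST /data-sets
    "app_name_prefix": "app",
    "env_name_prefix": "env",
    "retain_execution_data": "NO",
    "source_configs": {"max_concurrent_source_connection": 30},
    "target_configs": {"max_concurrent_target_connection": 30, "parallelism_degree": 15},
    "masking_job_config": {
        "max_memory": 1024,
        "min_memory": 0,
        "description": "Job created by Hyperscale Masking",
        "feedback_size": 100000,
        "stream_row_limit": 10000,
        "num_input_streams": 1,
    },
}

resp = requests.post(f"{BASE_URL}/jobs", json=job_payload, verify=False)
resp.raise_for_status()
job_id = resp.json()["id"]  # used when creating the execution
```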
JobExecution API
POST /executions (Create an execution of a Hyperscale Job)
Request:
{
"job_id": 1
}
Response: (The immediate response will be as shown below. The real-time status can be fetched using the GET /executions/{execution_id} endpoint.)
{
"id": 124,
"job_id": 38,
"status": "RUNNING",
"create_time": "2023-05-04T12:43:03.444964",
"tasks": [
{
"name": "Unload"
},
{
"name": "Masking"
},
{
"name": "Load"
},
{
"name": "Post Load"
}
]
}
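A minimal sketch of starting an execution and polling its status is shown below. The polling interval is arbitrary, and the status values used here (RUNNING, SUCCEEDED) are taken from the samples on this page rather than an exhaustive list; the base URL and TLS settings are assumptions for your environment.

```python
# Minimal sketch: start an execution via POST /executions, then poll
# GET /executions/{execution_id} until it leaves the RUNNING state.
import time
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"

resp = requests.post(f"{BASE_URL}/executions", json={"job_id": 1}, verify=False)
resp.raise_for_status()
execution_id = resp.json()["id"]

status = "RUNNING"
while status == "RUNNING":
    time.sleep(60)  # polling interval is arbitrary
    execution = requests.get(f"{BASE_URL}/executions/{execution_id}", verify=False).json()
    status = execution["status"]
    print(f"Execution {execution_id}: {status}")
```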
GET /executions/{id}/summary (Returns the Job Execution by execution id in summarized format)
{
"id": 72,
"job_id": 5,
"status": "SUCCEEDED",
"create_time": "2022-12-18T13:38:43.722917",
"end_time": "2022-12-18T13:43:16.554603",
"total_objects": 4,
"total_rows": 16,
"tasks": [
{
"name": "Unload",
"status": "SUCCEEDED",
"start_time": "2022-12-18T13:38:44.184296",
"end_time": "2022-12-18T13:38:54.972883",
"received_objects": 4,
"succeeded_objects": 4,
"failed_objects": 0,
"processing_objects": 0,
"processed_rows": 16,
"total_rows": 16
},
{
"name": "Masking",
"status": "SUCCEEDED",
"start_time": "2022-12-18T13:38:51.979725",
"end_time": "2022-12-18T13:42:58.569202",
"received_objects": 4,
"succeeded_objects": 4,
"failed_objects": 0,
"processing_objects": 0,
"processed_rows": 16,
"total_rows": 16
},
{
"name": "Load",
"status": "SUCCEEDED",
"start_time": "2022-12-18T13:40:39.350857",
"end_time": "2022-12-18T13:43:12.966492",
"received_objects": 4,
"succeeded_objects": 4,
"failed_objects": 0,
"processing_objects": 0,
"processed_rows": 16,
"total_rows": 16
},
{
"name": "Post Load",
"status": "SUCCEEDED",
"start_time": "2022-12-18T13:43:12.981490",
"end_time": "2022-12-18T13:43:15.764366",
"metadata": [
{
"type": "Constraints",
"total": 20,
"processed": 20,
"status": "SUCCESS",
"start_time": "2022-12-18T13:43:12.981490",
"end_time": "2022-12-18T13:43:15.764366"
},
{
"type": "Indexes",
"total": 10,
"processed": 10,
"status": "SUCCESS",
"start_time": "2022-12-18T13:43:12.981490",
"end_time": "2022-12-18T13:43:15.764366"
},
{
"type": "Triggers",
"total": 5,
"processed": 5,
"status": "SUCCESS",
"start_time": "2022-12-18T13:43:12.981490",
"end_time": "2022-12-18T13:43:15.764366"
}
]
}
]
}
GET /executions/{execution_id} (Returns the Job Execution by execution_id in the detailed format)
Request:
id: 1
Response:
{
"id": 1,
"job_id": 1,
"status": "SUCCEEDED",
"create_time": "2023-04-26T12:34:38.012768",
"end_time": "2023-04-26T12:37:32.410297",
"total_objects": 1,
"total_rows": 499999,
"tasks": [
{
"name": "Unload",
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:34:38.027224",
"end_time": "2023-04-26T12:34:42.435849",
"metadata": [
{
"source_key": "dbo.test_TEMP",
"total_rows": 499999,
"status": "SUCCEEDED",
"unloaded_rows": 499999
}
]
},
{
"name": "Masking",
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:34:40.420073",
"end_time": "2023-04-26T12:35:12.423744",
"metadata": [
{
"source_key": "dbo.test_TEMP",
"total_rows": 499999,
"status": "SUCCEEDED",
"masked_rows": 499999
}
]
},
{
"name": "Load",
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:37:08.482240",
"end_time": "2023-04-26T12:37:22.417561",
"metadata": [
{
"source_key": "dbo.test_TEMP",
"total_rows": 499999,
"status": "SUCCEEDED",
"loaded_rows": 499999
}
]
},
{
"name": "Post Load",
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:37:22.426813",
"end_time": "2023-04-26T12:37:22.814583",
"metadata": [
{
"status": "SUCCEEDED",
"table_set": [
"test_TEMP_Result"
],
"object_details": [
{
"type": "Triggers",
"total": 2,
"processed": 2,
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:35:10.325948",
"end_time": "2023-04-26T12:37:22.804792"
},
{
"type": "Indexes",
"total": 4,
"processed": 4,
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:35:10.325948",
"end_time": "2023-04-26T12:37:22.804792"
},
{
"type": "Constraints",
"total": 5,
"processed": 5,
"status": "SUCCEEDED",
"start_time": "2023-04-26T12:35:10.325948",
"end_time": "2023-04-26T12:37:22.804792"
}
]
}
]
}
]
}
- Only in case of execution failure, the below API can be used to restart the execution:
PUT /executions/{execution_id}/restart (Restart a failed execution)
- The below API can be used only for manually cleaning up the execution:
DELETE /executions/{execution_id} (Clean up the execution)
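A minimal sketch of both maintenance calls is shown below; only restart an execution that has actually failed, and only delete an execution you intend to clean up manually. The example id, base URL, and TLS settings are placeholders/assumptions for your environment.

```python
# Minimal sketch: restart a failed execution, or manually clean one up.
import requests

BASE_URL = "https://<hyperscale-compliance-host-address>/hyperscale-compliance"
execution_id = 124  # example id from the responses above

# Restart a failed execution (valid only when the execution has failed)
requests.put(f"{BASE_URL}/executions/{execution_id}/restart", verify=False).raise_for_status()

# Manually clean up the execution
requests.delete(f"{BASE_URL}/executions/{execution_id}", verify=False).raise_for_status()
```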