Executing a Snowflake profiler job
Start a profiler job
1. Validate the configured connector-info entries on the Hyperscale controller using the /connector-info GET API endpoint (or /connector-info/<connector-id> for a specific connector).
2. Configure a profiler job using the /profiler-jobs POST API endpoint. The required details are similar to those for creating a Hyperscale Snowflake Connector data set, without the ‘masking-inventory’. The request takes two additional parameters:
a. max_rows_to_fetch – This determines the number of rows to query when ‘VARIANT’ type data is detected in the source table.
b. minimum_recognizer_score – This helps tune the overall result of the profiler. It sets the minimum recognizer score required for a column name/key name to match a prospective regex of a configured private field. For more information, see Tuning the profiler results.
{
  "connector_id": 1,
  "data_info": [
    {
      "max_rows_to_fetch": 5,
      "minimum_recognizer_score": 0.5,
      "source": {
        "warehouse_name": "SOURCE_WH",
        "database_name": "SOURCE_DB",
        "schema_name": "SOURCE_SCHEMA",
        "table_name": "SOURCE_TABLE",
        "stage_name": "SNOWFLAKE_EXTERNAL_STAGE",
        "max_file_size": 16777216
      },
      "target": {
        "warehouse_name": "TARGET_WH",
        "database_name": "TARGET_DB",
        "schema_name": "TARGET_SCHEMA",
        "table_name": "TARGET_TABLE"
      }
    }
  ]
}
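As a sketch, the request body above can be assembled programmatically; the connector ID and Snowflake object names are the placeholder values from the example, not real ones:

```python
import json

def build_profiler_job(connector_id, source, target,
                       max_rows_to_fetch=5, minimum_recognizer_score=0.5):
    """Assemble a /profiler-jobs POST body in the shape shown above."""
    return {
        "connector_id": connector_id,
        "data_info": [{
            "max_rows_to_fetch": max_rows_to_fetch,
            "minimum_recognizer_score": minimum_recognizer_score,
            "source": source,
            "target": target,
        }],
    }

job = build_profiler_job(
    connector_id=1,
    source={"warehouse_name": "SOURCE_WH", "database_name": "SOURCE_DB",
            "schema_name": "SOURCE_SCHEMA", "table_name": "SOURCE_TABLE",
            "stage_name": "SNOWFLAKE_EXTERNAL_STAGE", "max_file_size": 16777216},
    target={"warehouse_name": "TARGET_WH", "database_name": "TARGET_DB",
            "schema_name": "TARGET_SCHEMA", "table_name": "TARGET_TABLE"},
)
print(json.dumps(job, indent=2))
```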
3. After a profiler job is configured, trigger an execution using the /execution POST API endpoint. Note the execution ID returned in the response.
{
  "profiler_job_id": 1
}
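A minimal sketch of triggering the execution with Python's standard library; the controller URL and the auth header value are assumptions, so adjust both to your deployment:

```python
import json
from urllib import request

BASE_URL = "https://hyperscale-controller.example.com"  # hypothetical controller address
API_KEY = "<your-api-key>"  # hypothetical auth value; use your controller's scheme

def build_execution_request(profiler_job_id):
    """Build (but do not send) the POST /execution request."""
    body = json.dumps({"profiler_job_id": profiler_job_id}).encode()
    return request.Request(
        f"{BASE_URL}/execution",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": API_KEY},
        method="POST",
    )

def trigger_execution(profiler_job_id):
    """Send the request and return the execution ID to keep for status checks."""
    with request.urlopen(build_execution_request(profiler_job_id)) as resp:
        return json.load(resp)["id"]

# execution_id = trigger_execution(1)
```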
Monitoring execution results
To monitor and get the status of an execution, use the /execution/status/<execution_id> GET API endpoint.
{
  "id": 1,
  "profiler_job_id": 1,
  "status": "SUCCEEDED",
  "start_time": "2025-01-13T10:33:32.824710",
  "end_time": "2025-01-13T10:33:47.114322",
  "error": null,
  "profiled_masking_inventory": [
    {
      "warehouse_name": "SOURCE_WH",
      "database_name": "SOURCE_DB",
      "schema_name": "SOURCE_SCHEMA",
      "table_name": "SOURCE_TABLE",
      "masking_inventory": [
        {
          "field_name": "AGE",
          "domain_name": "AGE",
          "algorithm_name": "dlpx-core:Age SL"
        },
        {
          "field_name": "DETAILS",
          "structured_data_format_id": 9
        }
      ]
    }
  ]
}
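Polling the status endpoint until the job finishes can be sketched as below; the base URL is hypothetical, and the set of terminal statuses beyond the SUCCEEDED value shown above is an assumption:

```python
import json
import time
from urllib import request

BASE_URL = "https://hyperscale-controller.example.com"  # hypothetical controller address
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}  # assumed terminal states

def is_finished(status_payload):
    """True once the execution has reached a terminal status."""
    return status_payload["status"] in TERMINAL_STATUSES

def wait_for_execution(execution_id, poll_seconds=10):
    """Poll GET /execution/status/<execution_id> until the job finishes."""
    while True:
        with request.urlopen(f"{BASE_URL}/execution/status/{execution_id}") as resp:
            payload = json.load(resp)
        if is_finished(payload):
            return payload
        time.sleep(poll_seconds)

# result = wait_for_execution(execution_id)
# result["profiled_masking_inventory"] then holds the discovered inventory.
```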
The ‘structured_data_format_id’ contains the masking inventory details for ‘VARIANT’ type data.
Optional: to validate the structured data format returned by the profiler, the /structure-data-format/<structuredDataFormatId> GET API endpoint is extended and available on the controller.
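For instance, the structured_data_format_id values can be collected from a successful status response and each one checked against that endpoint; the walk below assumes the response shape shown above:

```python
def structured_format_ids(profiled_masking_inventory):
    """Collect every structured_data_format_id reported for VARIANT columns."""
    ids = []
    for table in profiled_masking_inventory:
        for field in table["masking_inventory"]:
            if "structured_data_format_id" in field:
                ids.append(field["structured_data_format_id"])
    return ids

sample = [{
    "warehouse_name": "SOURCE_WH",
    "database_name": "SOURCE_DB",
    "schema_name": "SOURCE_SCHEMA",
    "table_name": "SOURCE_TABLE",
    "masking_inventory": [
        {"field_name": "AGE", "domain_name": "AGE",
         "algorithm_name": "dlpx-core:Age SL"},
        {"field_name": "DETAILS", "structured_data_format_id": 9},
    ],
}]

# Each collected ID can then be validated with
# GET /structure-data-format/<structuredDataFormatId>.
print(structured_format_ids(sample))  # prints [9]
```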
Creating the Hyperscale Snowflake Connector Data Set
After an execution succeeds, verify the data set details using the /data-set/<execution_id> GET API endpoint.
{
  "connector_id": 1,
  "data_info": [
    {
      "source": {
        "warehouse_name": "SOURCE_WH",
        "database_name": "SOURCE_DB",
        "schema_name": "SOURCE_SCHEMA",
        "table_name": "SOURCE_TABLE",
        "stage_name": "SNOWFLAKE_EXTERNAL_STAGE",
        "max_file_size": 16777216
      },
      "target": {
        "warehouse_name": "TARGET_WH",
        "database_name": "TARGET_DB",
        "schema_name": "TARGET_SCHEMA",
        "table_name": "TARGET_TABLE"
      },
      "masking_inventory": [
        {
          "field_name": "AGE",
          "domain_name": "AGE",
          "algorithm_name": "dlpx-core:Age SL"
        },
        {
          "field_name": "DETAILS",
          "structured_data_format_id": 9
        }
      ]
    }
  ]
}
You can upload the profiled data set to the controller using the /data-set/upload/<execution_id> POST API endpoint.
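As a final sanity check before uploading, you may want to confirm that every table in the data set carries a masking inventory; this helper and the controller URL are illustrative assumptions, not part of the documented API:

```python
import json
from urllib import request

BASE_URL = "https://hyperscale-controller.example.com"  # hypothetical controller address

def has_full_inventory(data_set):
    """True when every data_info entry carries a non-empty masking_inventory."""
    return all(item.get("masking_inventory") for item in data_set["data_info"])

def upload_data_set(execution_id):
    """POST /data-set/upload/<execution_id> to push the profiled data set."""
    req = request.Request(f"{BASE_URL}/data-set/upload/{execution_id}", method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)

# if has_full_inventory(data_set):
#     upload_data_set(execution_id)
```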