Executing a Snowflake profiler job
Start a profiler job
1. Validate the configured connector-info entries on the Hyperscale controller using the /connector-info GET API endpoint (or /connector-info/<connector-id> for a specific connector).
2. Configure a profiler job using the /profiler-jobs POST API endpoint. The required details are similar to those for creating a Hyperscale Snowflake Connector data set, without the ‘masking-inventory’. The request takes two additional parameters:
a. max_rows_to_fetch – This determines the number of rows to query when ‘VARIANT’ type data is detected in the source table.
b. minimum_recognizer_score – This helps tune the overall result of the profiler. It sets the minimum recognizer score required for a column name/key name to match a prospective regex of a configured private field. For more information, see Tuning the profiler results.
{
  "connector_id": 1,
  "data_info": [
    {
      "max_rows_to_fetch": 5,
      "minimum_recognizer_score": 0.5,
      "source": {
        "warehouse_name": "SOURCE_WH",
        "database_name": "SOURCE_DB",
        "schema_name": "SOURCE_SCHEMA",
        "table_name": "SOURCE_TABLE",
        "stage_name": "SNOWFLAKE_EXTERNAL_STAGE",
        "max_file_size": 16777216
      },
      "target": {
        "warehouse_name": "TARGET_WH",
        "database_name": "TARGET_DB",
        "schema_name": "TARGET_SCHEMA",
        "table_name": "TARGET_TABLE"
      }
    }
  ]
}
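As a sketch, the request body above can be assembled programmatically; the connector ID and Snowflake object names are the placeholder values from the example, not real ones:

```python
import json

def build_profiler_job(connector_id, source, target,
                       max_rows_to_fetch=5, minimum_recognizer_score=0.5):
    """Assemble a /profiler-jobs POST body in the shape shown above."""
    return {
        "connector_id": connector_id,
        "data_info": [{
            "max_rows_to_fetch": max_rows_to_fetch,
            "minimum_recognizer_score": minimum_recognizer_score,
            "source": source,
            "target": target,
        }],
    }

job = build_profiler_job(
    connector_id=1,
    source={"warehouse_name": "SOURCE_WH", "database_name": "SOURCE_DB",
            "schema_name": "SOURCE_SCHEMA", "table_name": "SOURCE_TABLE",
            "stage_name": "SNOWFLAKE_EXTERNAL_STAGE", "max_file_size": 16777216},
    target={"warehouse_name": "TARGET_WH", "database_name": "TARGET_DB",
            "schema_name": "TARGET_SCHEMA", "table_name": "TARGET_TABLE"},
)
print(json.dumps(job, indent=2))
```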
3. After a profiler job is configured, trigger an execution using the /execution POST API endpoint. Note the execution ID returned in the response.
{
  "profiler_job_id": 1
}
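A minimal sketch of triggering the execution with Python's standard library; the controller URL and the auth header value are assumptions, so adjust both to your deployment:

```python
import json
from urllib import request

BASE_URL = "https://hyperscale-controller.example.com"  # hypothetical controller address
API_KEY = "<your-api-key>"  # hypothetical auth value; use your controller's scheme

def build_execution_request(profiler_job_id):
    """Build (but do not send) the POST /execution request."""
    body = json.dumps({"profiler_job_id": profiler_job_id}).encode()
    return request.Request(
        f"{BASE_URL}/execution",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": API_KEY},
        method="POST",
    )

def trigger_execution(profiler_job_id):
    """Send the request and return the execution ID to keep for status checks."""
    with request.urlopen(build_execution_request(profiler_job_id)) as resp:
        return json.load(resp)["id"]

# execution_id = trigger_execution(1)
```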
Monitoring execution results
To monitor and get the status of an execution, use the /execution/status/<execution_id> GET API endpoint.
{
  "id": 1,
  "profiler_job_id": 1,
  "status": "SUCCEEDED",
  "start_time": "2025-01-13T10:33:32.824710",
  "end_time": "2025-01-13T10:33:47.114322",
  "error": null,
  "profiled_masking_inventory": [
    {
      "warehouse_name": "SOURCE_WH",
      "database_name": "SOURCE_DB",
      "schema_name": "SOURCE_SCHEMA",
      "table_name": "SOURCE_TABLE",
      "masking_inventory": [
        {
          "field_name": "AGE",
          "domain_name": "AGE",
          "algorithm_name": "dlpx-core:Age SL"
        },
        {
          "field_name": "DETAILS",
          "structured_data_format_id": 9
        }
      ]
    }
  ]
}
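Polling the status endpoint until the job finishes can be sketched as below; the base URL is hypothetical, and the set of terminal statuses beyond the SUCCEEDED value shown above is an assumption:

```python
import json
import time
from urllib import request

BASE_URL = "https://hyperscale-controller.example.com"  # hypothetical controller address
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}  # assumed terminal states

def is_finished(status_payload):
    """True once the execution has reached a terminal status."""
    return status_payload["status"] in TERMINAL_STATUSES

def wait_for_execution(execution_id, poll_seconds=10):
    """Poll GET /execution/status/<execution_id> until the job finishes."""
    while True:
        with request.urlopen(f"{BASE_URL}/execution/status/{execution_id}") as resp:
            payload = json.load(resp)
        if is_finished(payload):
            return payload
        time.sleep(poll_seconds)

# result = wait_for_execution(execution_id)
# result["profiled_masking_inventory"] then holds the discovered inventory.
```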
The ‘structured_data_format_id’ contains the masking inventory details for ‘VARIANT’ type data.
Optional: to validate the structured data format returned by the profiler, the /structure-data-format/<structuredDataFormatId> GET API endpoint is extended and available on the controller.
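For instance, the structured_data_format_id values can be collected from a successful status response and each one checked against that endpoint; the walk below assumes the response shape shown above:

```python
def structured_format_ids(profiled_masking_inventory):
    """Collect every structured_data_format_id reported for VARIANT columns."""
    ids = []
    for table in profiled_masking_inventory:
        for field in table["masking_inventory"]:
            if "structured_data_format_id" in field:
                ids.append(field["structured_data_format_id"])
    return ids

sample = [{
    "warehouse_name": "SOURCE_WH",
    "database_name": "SOURCE_DB",
    "schema_name": "SOURCE_SCHEMA",
    "table_name": "SOURCE_TABLE",
    "masking_inventory": [
        {"field_name": "AGE", "domain_name": "AGE",
         "algorithm_name": "dlpx-core:Age SL"},
        {"field_name": "DETAILS", "structured_data_format_id": 9},
    ],
}]

# Each collected ID can then be validated with
# GET /structure-data-format/<structuredDataFormatId>.
print(structured_format_ids(sample))  # prints [9]
```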
Creating the Hyperscale Snowflake Connector Data Set
After an execution succeeds, verify the data set details using the /data-set/<execution_id> GET API endpoint.
{
  "connector_id": 1,
  "data_info": [
    {
      "source": {
        "warehouse_name": "SOURCE_WH",
        "database_name": "SOURCE_DB",
        "schema_name": "SOURCE_SCHEMA",
        "table_name": "SOURCE_TABLE",
        "stage_name": "SNOWFLAKE_EXTERNAL_STAGE",
        "max_file_size": 16777216
      },
      "target": {
        "warehouse_name": "TARGET_WH",
        "database_name": "TARGET_DB",
        "schema_name": "TARGET_SCHEMA",
        "table_name": "TARGET_TABLE"
      },
      "masking_inventory": [
        {
          "field_name": "AGE",
          "domain_name": "AGE",
          "algorithm_name": "dlpx-core:Age SL"
        },
        {
          "field_name": "DETAILS",
          "structured_data_format_id": 9
        }
      ]
    }
  ]
}
You can upload the profiled data set to the controller using the /data-set/upload/<execution_id> POST API endpoint.
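As a final sanity check before uploading, you may want to confirm that every table in the data set carries a masking inventory; this helper and the controller URL are illustrative assumptions, not part of the documented API:

```python
import json
from urllib import request

BASE_URL = "https://hyperscale-controller.example.com"  # hypothetical controller address

def has_full_inventory(data_set):
    """True when every data_info entry carries a non-empty masking_inventory."""
    return all(item.get("masking_inventory") for item in data_set["data_info"])

def upload_data_set(execution_id):
    """POST /data-set/upload/<execution_id> to push the profiled data set."""
    req = request.Request(f"{BASE_URL}/data-set/upload/{execution_id}", method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)

# if has_full_inventory(data_set):
#     upload_data_set(execution_id)
```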