Tuning the profiler results
The profiler uses regular expressions to identify sensitive data fields based on the column names and key values of VARIANT type data. These regular expressions are configured in a file called recognizers.yaml. Each expression is mapped to a masking algorithm and has a matching score. Here is an example configuration:
- name: ACCOUNT_NO
  patterns:
    - name: ACCOUNT_NO 1
      regex: '(?i)(?>(account|accnt|acct)_?-? ?(number|num|nbr|no|user))($|[ _-])'
      score: 0.67
  supported_entity: ACCOUNT_NO
  supported_language: en
To improve the results, you can edit recognizers.yaml to add new regular expressions or to change the score associated with an existing configuration.
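Before changing a pattern, it can help to check which column names it matches. The following is a minimal sketch, not part of the profiler: it assumes the profiler searches for the pattern anywhere in the column name, and it uses Python's re module, whose behavior may differ from the regex engine the profiler uses. The helper names and the sample column names are hypothetical.

```python
import re
import sys

# Pattern copied from the ACCOUNT_NO example configuration.
ACCOUNT_NO_REGEX = r"(?i)(?>(account|accnt|acct)_?-? ?(number|num|nbr|no|user))($|[ _-])"


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a recognizer regex with Python's re module."""
    if sys.version_info < (3, 11):
        # Python's re supports atomic groups "(?>...)" only from 3.11.
        # Approximation for older interpreters: a plain non-capturing group.
        pattern = pattern.replace("(?>", "(?:")
    return re.compile(pattern)


def matching_columns(pattern: str, column_names: list) -> list:
    """Return the column names in which the pattern is found (hypothetical helper)."""
    compiled = compile_pattern(pattern)
    return [name for name in column_names if compiled.search(name)]


# Made-up column names, for illustration only.
sample_columns = ["ACCOUNT_NUMBER", "acct_no", "Acct Nbr", "account_notes", "customer_name"]
matched = matching_columns(ACCOUNT_NO_REGEX, sample_columns)
print(matched)
```

In this sketch, account_notes is not matched because the text after "no" is neither the end of the name nor a space, underscore, or hyphen. A column that should be detected but is not matched is a candidate for a new pattern.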
The default recognizers.yaml is available in /app/src/config/recognizers.yaml, but the execution uses the recognizers.yaml copied over to /app/delphix/profiler.
Copy the recognizers.yaml from the pod/container.
For Kubernetes deployment:
kubectl cp --namespace=<profiler-namespace> <snowflake-profiler-service-pod-name>:/app/src/config/recognizers.yaml ./recognizers.yaml
For Docker Compose deployment:
docker cp <snowflake-profiler-service-container>:/app/src/config/recognizers.yaml ./recognizers.yaml
Edit the recognizers.yaml to either add new regular expressions or to edit the scores. For instance, in the example below we added an identifier for city name where the column name is enclosed in single quotes:
- name: CITY
  patterns:
    ...
    # New city identifier
    - name: CITY 3
      regex: (?i)(?>'(address_?-? ?city|city|city_?-? ?address)')
      score: 0.9
Copy the edited recognizers.yaml to the pod/container.
For Kubernetes deployment:
kubectl cp --namespace=<profiler-namespace> ./recognizers.yaml <snowflake-profiler-service-pod-name>:/app/delphix/profiler/recognizers.yaml
For Docker Compose deployment:
docker cp ./recognizers.yaml <snowflake-profiler-service-container>:/app/delphix/profiler/recognizers.yaml
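A newly added pattern such as CITY 3 can be tried locally before the file is copied back. The sketch below is an illustration under assumptions rather than the profiler's own logic: it uses Python's re module with search semantics, the function name is_match is hypothetical, and the sample names are made up.

```python
import re
import sys

# The CITY 3 pattern from the example; it expects the column name in single quotes.
CITY_3_REGEX = r"(?i)(?>'(address_?-? ?city|city|city_?-? ?address)')"


def is_match(pattern: str, text: str) -> bool:
    """Report whether the pattern is found anywhere in text (hypothetical helper)."""
    if sys.version_info < (3, 11):
        # Atomic groups need Python 3.11+; older versions get an approximation.
        pattern = pattern.replace("(?>", "(?:")
    return re.search(pattern, text) is not None


# Made-up names: two quoted city columns, one unquoted, one unrelated.
samples = ["'city'", "'City_Address'", "city", "'capacity'"]
checks = {name: is_match(CITY_3_REGEX, name) for name in samples}
for name, found in checks.items():
    print(name, found)
```

Here the unquoted name city is not matched, because the pattern requires the enclosing single quotes.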