Tuning the profiler results
The profiler uses regular expressions to identify sensitive data fields based on column names and on the key values of VARIANT-type data. These regular expressions are configured in a file called recognizers.yaml. Each expression is mapped to a masking algorithm and has a matching score. Here is an example configuration:
- name: ACCOUNT_NO
  patterns:
    - name: ACCOUNT_NO 1
      regex: '(?i)(?>(account|accnt|acct)_?-? ?(number|num|nbr|no|user))($|[ _-])'
      score: 0.67
  supported_entity: ACCOUNT_NO
  supported_language: en
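To get a feel for which column names a pattern like the one above accepts, it can be tried against a few sample names outside the profiler. The following is a minimal sketch, not the profiler's own matching code. It assumes the pattern may match anywhere in the name, and it rewrites the atomic group `(?>...)` as a non-capturing group `(?:...)` so that it also runs on Python versions older than 3.11. The helper `matches_account_no` and the sample names are hypothetical.

```python
import re

# Pattern from the ACCOUNT_NO example. The atomic group (?>...) is written
# as (?:...) here (a substitution, see the note above).
ACCOUNT_NO = re.compile(
    r"(?i)(?:(account|accnt|acct)_?-? ?(number|num|nbr|no|user))($|[ _-])"
)


def matches_account_no(column_name: str) -> bool:
    """Return True if the pattern is found anywhere in the column name.

    Assumption: the profiler searches within the name rather than
    requiring a full match. This is not confirmed by the documentation.
    """
    return ACCOUNT_NO.search(column_name) is not None


# Hypothetical column names used only for illustration.
samples = ["ACCOUNT_NO", "acct-number", "Accnt Nbr_1", "account_notes", "customer_id"]

for name in samples:
    print(f"{name}: {matches_account_no(name)}")
```

Here the trailing `($|[ _-])` is what keeps a name such as account_notes from matching: after "no", the pattern requires either the end of the name or a separator.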
To improve the results, you can edit recognizers.yaml to add new regular expressions or to change the score associated with an existing one.
The default recognizers.yaml is available at /app/src/config/recognizers.yaml. However, the execution uses the recognizers.yaml copied over to /app/delphix/profiler.
Copy the recognizers.yaml from the pod/container.
For Kubernetes deployment:
kubectl cp --namespace=<profiler-namespace> <snowflake-profiler-service-pod-name>:/app/src/config/recognizers.yaml ./recognizers.yaml
For Docker Compose deployment:
docker cp <snowflake-profiler-service-container>:/app/src/config/recognizers.yaml ./recognizers.yaml
Edit the recognizers.yaml to add new regular expressions or to change the scores. For instance, the example below adds an identifier for city names where the column name is enclosed in single quotes:
- name: CITY
  patterns:
    ...
    # New city identifier
    - name: CITY 3
      regex: (?i)(?>'(address_?-? ?city|city|city_?-? ?address)')
      score: 0.9
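Before copying an edited file back, it may help to check a new expression against the column names it is meant to catch. The sketch below does this for the city identifier. It is an illustration only: it assumes the pattern may match anywhere in the name, it replaces the atomic group `(?>...)` with `(?:...)` so that it also runs on Python versions older than 3.11, and the helper `matches_city` and the sample names are hypothetical.

```python
import re

# City identifier from the example above, with (?>...) written as (?:...).
CITY_QUOTED = re.compile(
    r"(?i)(?:'(address_?-? ?city|city|city_?-? ?address)')"
)


def matches_city(column_name: str) -> bool:
    """Return True if the quoted-city pattern is found in the column name.

    Assumption: the profiler searches within the name rather than
    requiring a full match.
    """
    return CITY_QUOTED.search(column_name) is not None


# Hypothetical column names: the first three are quoted, the fourth is not,
# and the last is quoted but is not a city field.
samples = ["'city'", "'Address_City'", "'city-address'", "city", "'capacity'"]

for name in samples:
    print(f"{name}: {matches_city(name)}")
```

Because both single quotes are part of the expression, an unquoted name such as city is not matched by this identifier and would have to be covered by one of the other CITY patterns.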
Copy the edited recognizers.yaml to the pod/container.
For Kubernetes deployment:
kubectl cp --namespace=<profiler-namespace> ./recognizers.yaml <snowflake-profiler-service-pod-name>:/app/delphix/profiler/recognizers.yaml
For Docker Compose deployment:
docker cp ./recognizers.yaml <snowflake-profiler-service-container>:/app/delphix/profiler/recognizers.yaml