Data Source Support
Oracle Connector
Oracle Database (commonly referred to as Oracle RDBMS or simply as Oracle) is a multi-model database management system produced and marketed by Oracle Corporation. The following table lists the versions that have been tested in the lab setup:
Platforms | Version |
---|---|
Linux | |
The user on the source database must have SELECT privileges.
The user on the target database must have ALL privileges and the SELECT_CATALOG_ROLE.
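For illustration, a minimal sketch of the corresponding grants, assuming a hypothetical masking user HYPERSCALE_USER and a hypothetical source table APP.CUSTOMERS (adjust to your environment):

```sql
-- Source database: read access to the tables being masked (user and table names are placeholders).
GRANT SELECT ON app.customers TO hyperscale_user;

-- Target database: broad privileges plus the catalog role, as noted above.
GRANT ALL PRIVILEGES TO hyperscale_user;
GRANT SELECT_CATALOG_ROLE TO hyperscale_user;
```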
Supported Data Types
The following are the different data types that are tested in our lab setup:
VARCHAR
VARCHAR2
NUMBER
FLOAT
DATE
TIMESTAMP (default)
CLOB
BLOB (with text)
XMLTYPE
Hyperscale Compliance does not support the following special characters in database column names: ~!@#$%^&*()\"?:;,/\\`+=[]{}|<>'-.
Property values
For default values, see Configuration settings.
MS SQL Connector
Supported versions
Microsoft SQL Server 2019
Supported data types
The following are the different data types that are tested in our lab setup:
VARCHAR
CHAR
DATETIME
INT
TEXT
XML (only unload/load)
VARBINARY (only unload/load)
SMALLINT
SMALLMONEY
MONEY
BIGINT
NVARCHAR
TINYINT
NUMERIC(X,Y)
DECIMAL(X,Y)
FLOAT
NCHAR
BIT
NTEXT
Property values
For default values, see Configuration settings.
Known Limitations
If the masked data produced by the applied algorithm exceeds the maximum value range of the corresponding target table column's data type, then job execution will fail in the load service.
Schema, table, and column names containing special characters are not supported.
Masking of columns with the VARBINARY data type is not supported.
Hyperscale Compliance can mask a maximum of 1000 tables in a single job.
Delimited Files Connector
The connector can be used to mask large delimited files. The delimited unload service splits large files into smaller chunks and passes them on to the masking service. After masking is completed, the files are sent to the load service, which joins the split files back together (the end user can also choose to disable the join operation).
Pre-requisites
The source and target (NFS) locations must be mounted onto the Docker containers of the unload and load services. Note that the paths as seen inside the containers are the ones that must be used when creating connector-infos via the controller.
```yaml
# As an example
unload-service:
  image: delphix-delimited-unload-service-app:<HYPERSCALE VERSION>
  ...
  volumes:
    ...
    - /path/to/nfs/mounted/source1/files:/mnt/source1
    - /path/to/nfs/mounted/source2/files:/mnt/source2
...
load-service:
  image: delphix-delimited-load-service-app:<HYPERSCALE VERSION>
  ...
  volumes:
    ...
    - /path/to/nfs/mounted/target1/files:/mnt/target1
    - /path/to/nfs/mounted/target2/files:/mnt/target2
```
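Because connector-infos must reference the container-side paths (for example /mnt/source1 rather than the host path), it can be worth confirming that the mounts are visible inside the running containers. A minimal check, assuming the service names from the example above:

```bash
# List the container-side source and target mount points.
docker compose exec unload-service ls /mnt/source1
docker compose exec load-service ls /mnt/target1
```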
Property values
Property | Value |
---|---|
SOURCE_KEY_FIELD_NAMES | unique_source_files_identifier |
LOAD_SERVICE_REQUIRE_POST_LOAD | false |
For default values, see Configuration settings.
Supported data types
The following are the supported data types for the delimited files Hyperscale connector:
String/Text
Double
Columns with the double data type will be converted to strings. For example, 36377974237282886994505 will be converted to "36377974237282886994505".
Int64
Columns with the int64 data type will be converted to strings. For example, 00009435304391722556805 will be converted to "00009435304391722556805".
Timestamp
Known Limitations
Supports only single-character ASCII delimiters.
The end-of-record character can only be \n, \r, or \r\n.
Output files will enclose all string values in double quotes (").
MongoDB Connector
The connector can be used to mask large MongoDB files. The Mongo unload service splits the large collections into smaller chunks and passes them onto the masking service. After the masking is completed, the files are sent to the Mongo load service, which imports the masked files into the target collection.
Supported Versions
Platforms | Version |
---|---|
Linux | MongoDB 4.4.x, MongoDB 5.0.x, MongoDB 6.0.x |
Pre-requisites
MongoDB user should have the following privileges:
```
use admin
db.createUser({user: "backupadmin", pwd: "xxxxxx", roles: [{role: "backup", db: "admin"}]})
```
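To verify the user was created with the expected role, a quick check from the MongoDB shell (using the backupadmin name from the example above):

```
use admin
db.getUser("backupadmin")  // should show the "backup" role on the "admin" database
```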
Mongo Unload and Mongo Load service image names are to be used under unload-service and load-service. The NFS location has to be mounted onto the Docker containers for the unload and load services. Example for mounting /mnt/hyperscale:
```yaml
# As an example docker-compose.yaml
unload-service:
  image: delphix-mongo-unload-service-app:${VERSION}
  volumes:
    # Uncomment below lines to mount respective paths.
    - /mnt/hyperscale:/etc/hyperscale
load-service:
  image: delphix-mongo-load-service-app:${VERSION}
  volumes:
    # Uncomment below lines to mount respective paths.
    - /mnt/hyperscale:/etc/hyperscale
```
Uncomment the below lines in the docker-compose.yaml file under controller > environment:
```yaml
# uncomment below for MongoDB connector
#- SOURCE_KEY_FIELD_NAMES=database_name,collection_name
#- VALIDATE_UNLOAD_ROW_COUNT_FOR_STATUS=${VALIDATE_UNLOAD_ROW_COUNT_FOR_STATUS:-false}
#- VALIDATE_MASKED_ROW_COUNT_FOR_STATUS=${VALIDATE_MASKED_ROW_COUNT_FOR_STATUS:-false}
#- VALIDATE_LOAD_ROW_COUNT_FOR_STATUS=${VALIDATE_LOAD_ROW_COUNT_FOR_STATUS:-false}
#- DISPLAY_BYTES_INFO_IN_STATUS=${DISPLAY_BYTES_INFO_IN_STATUS:-true}
#- DISPLAY_ROW_COUNT_IN_STATUS=${DISPLAY_ROW_COUNT_IN_STATUS:-false}
```
Set the value of LOAD_SERVICE_REQUIRE_POST_LOAD=false inside the .env file:
```
# Set LOAD_SERVICE_REQUIRE_POST_LOAD=false for MongoDB Connector
LOAD_SERVICE_REQUIRE_POST_LOAD=false
```
Uncomment the below lines in the .env file:
```
# Uncomment below for MongoDB Connector
#VALIDATE_UNLOAD_ROW_COUNT_FOR_STATUS=false
#VALIDATE_MASKED_ROW_COUNT_FOR_STATUS=false
#VALIDATE_LOAD_ROW_COUNT_FOR_STATUS=false
#DISPLAY_BYTES_INFO_IN_STATUS=true
#DISPLAY_ROW_COUNT_IN_STATUS=false
```
Property values
Mandatory changes are required for the MongoDB Connector in the docker-compose.yaml and .env files:
Property | Value |
---|---|
SOURCE_KEY_FIELD_NAMES | database_name,collection_name |
LOAD_SERVICE_REQUIRE_POST_LOAD | false |
VALIDATE_UNLOAD_ROW_COUNT_FOR_STATUS | false |
VALIDATE_MASKED_ROW_COUNT_FOR_STATUS | false |
VALIDATE_LOAD_ROW_COUNT_FOR_STATUS | false |
DISPLAY_BYTES_INFO_IN_STATUS | true |
DISPLAY_ROW_COUNT_IN_STATUS | false |
For default values, see Configuration settings.
Known Limitations
Sharded MongoDB Atlas is not supported.
In-Place Masking is not supported.
Parquet Connector
The connector can be used to mask large Parquet files. The Parquet unload service splits large files into smaller chunks and passes them on to the masking service. After masking is completed, the files are sent to the load service, which joins the split files back together (you can also choose to disable the join operation).
Pre-requisites
The connector should be able to access the AWS S3 buckets (the source and target locations). The following approaches are supported by the connector and can be used to authenticate with the S3 bucket:
Attaching the IAM role to the EC2 instance where the hyperscale masking services will be deployed.
IAM roles are designed to let applications securely make AWS API requests from EC2 instances without having to manage the security credentials that the applications use.
Using the AWS console UI or AWS CLI, attach the IAM role to the EC2 instance running the Hyperscale services. To know more, check the AWS Documentation.
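For example, with the AWS CLI the role can be attached by associating an instance profile that contains it; the instance ID and profile name below are placeholders:

```bash
# Associate an instance profile (containing the IAM role) with the EC2 instance.
aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=hyperscale-s3-access-profile
```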
With IAM role authentication, there is no need to pass the AWS credentials during the connector-info creation.
CODE# Example connector-info payload { "source": { "type": "AWS", "properties": { "server": "S3", "path": "aws_s3_bucket/sub_folder(s)" } }, "target": { "type": "AWS", "properties": { "server": "S3", "path": "aws_s3_bucket/sub_folder(s)" } } }
Passing the AWS Access Key ID & AWS Secret Access Key attached to an AWS role:
Access keys are long-term credentials generated for an IAM user or role. These keys can be used for programmatic requests to the AWS CLI or AWS API (directly or via the AWS SDK). To know more, check the AWS Documentation.
These credentials can be passed during the connector-info creation.
CODE# Example connector-info payload { "source": { "type": "AWS", "properties": { "server": "S3", "path": "aws_s3_bucket/sub_folder(s)", "aws_region": "us-west-2", "aws_access_key_id": "AWS_ACCESS_KEY_ID", "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY" } }, "target": { "type": "AWS", "properties": { "server": "S3", "path": "aws_s3_bucket/sub_folder(s)", "aws_region": "us-west-2", "aws_access_key_id": "AWS_ACCESS_KEY_ID", "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY" } } }
They can also be set as environment variables when bringing up the Parquet connector services.
```yaml
unload-service:
  ...
  environment:
    - AWS_DEFAULT_REGION=us-east-1
    - AWS_ACCESS_KEY_ID=<aws_access_key_id>
    - AWS_SECRET_ACCESS_KEY=<aws_secret_access_key>
...
load-service:
  ...
  environment:
    - AWS_DEFAULT_REGION=us-east-1
    - AWS_ACCESS_KEY_ID=<aws_access_key_id>
    - AWS_SECRET_ACCESS_KEY=<aws_secret_access_key>
```
Property values
Configurations on the controller service:
Property | Value |
---|---|
SOURCE_KEY_FIELD_NAMES | unique_source_files_identifier |
LOAD_SERVICE_REQUIRE_POST_LOAD | false |
Configuration on the parquet-unload-service:
Property | Value |
---|---|
| 512 |
For default values, see Configuration settings.
Supported data types
The following are the supported data types for the Parquet files Hyperscale connector:
BOOLEAN
INT32
INT64
INT96
FLOAT
DOUBLE
BYTE_ARRAY
Known Limitations
Parquet files are generally compressed, and the compression factor can vary from 2x to 70x or even more. When working with such large files, the connector needs a host with enough memory to accommodate the parallel processing of multiple large Parquet files. If the sum of the uncompressed sizes of the Parquet files being processed in parallel exceeds 80% of the host's RAM, the chances of an "out of memory" error are high. To avoid this, the end user can reduce MAX_WORKER_THREADS_PER_JOB (i.e., reduce the number of parallel threads), which in turn reduces memory usage.
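As a rough sizing sketch of the 80% guideline (the host RAM and per-file sizes below are assumptions for illustration, and where MAX_WORKER_THREADS_PER_JOB is configured may vary by deployment):

```
# Assumption: 64 GB RAM host, Parquet files that uncompress to ~10 GB each.
# Budget: 0.8 x 64 GB ≈ 51 GB of usable RAM.
# 5 parallel files x ~10 GB ≈ 50 GB, which fits; 6 or more likely would not.
MAX_WORKER_THREADS_PER_JOB=5
```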