Data Connection Info Parameter Description

When using the GPA tool, you need to enter "connection info" to access databases or data files. The following sections explain the connection parameters for each type of data source in detail.

SuperMap SDX+ (--providerType=sdx)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is sdx, indicating the use of the SuperMap SDX+ spatial data engine to access data.
--dbType, --type, -T, --db-type
(Required)
Engine type, supporting access to various databases and vector or raster files, for example:
UDBX file: --dbType=UDBX
PostGIS database: --dbType=PGGIS
Vector file: --dbType=VECTORFILE
--server, -S
(Required)
Database engine server address or file path.
--dataset, --dt
(Required)
Dataset name to read or save.
--CHARSET Character set encoding, used only for vector file engine type.
--driver Driver name required for data source connection.
--database, --db Database name for data source connection.
--alias, -A Alias.
--user, -U Username for database login.
--password, --pwd Password for database login.
--maxConnPoolNum Maximum number of connections for the data source.
--numPartitions, --num-parts, -N Number of partitions, default is 0.
--partitionField, --part-field Field used for partitioning. The default is the SmID field; if the database engine (e.g., PGGIS) has no SmID field, the user must specify a partition field.
--idField, --id-field Field whose values are read as unique feature IDs for feature objects. The user must ensure the field values are unique. If empty, the SmID field is used by default.
--blockingWrite Whether to write data in blocks, which can significantly improve performance for large datasets. Default is true.

Examples:

  • Connect to UDBX: --providerType=sdx --dbType=UDBX --server=F:\data\landuse.udbx --dataset=DLTB
  • Connect to PostGIS: --providerType=sdx --dbType=PGGIS --server=127.0.0.1 --database=postgis --user=postgres --password=uitest --maxConnPoolNum=10 --dataset=DLTB
  • Connect to ArcGIS for Oracle: --providerType=sdx --server=127.0.0.1:3452/xe --user=testdb --password=testdb --alias=test --maxConnPoolNum=1 --dataset=dt --dbType=ARCSDE_ORACLE
  • Connect to vector file (ShapeFile): --providerType=sdx --server=E:\data\DLTB.shp --dataset=DLTB --CHARSET=ANSI --dbType=VECTORFILE
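The partitioning parameters above can be combined with a database connection. A sketch, assuming a hypothetical PostGIS database and a gid field as the partition key (the server address, credentials, dataset, and field name below are placeholders, not values taken from the product):

```shell
# Hypothetical partitioned read from PostGIS via SDX+.
# --numPartitions and --partitionField are documented above;
# the server, credentials, dataset, and "gid" field are placeholders.
--providerType=sdx --dbType=PGGIS --server=127.0.0.1 \
  --database=postgis --user=postgres --password=uitest \
  --dataset=DLTB --numPartitions=8 --partitionField=gid
```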

JDBC (--providerType=jdbc)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is jdbc, indicating that data is accessed through the JDBC Java API.
--dbType, --type, -T, --db-type
(Required)
Engine type. Only PostGIS and OracleSpatial databases are supported; both allow distributed reads and writes in cluster mode.
--host
(Required)
Database server address.
--port
(Required)
Database service port number, default is 0.
--database, --db Database name for data source connection.
--dataset, --dt
(Required)
Dataset name to read or save.
--schema Database schema to access.
--user, -U Username for database login.
--password, --pwd Password for database login.
--numPartitions, --num-parts, -N Number of partitions, default is 0.
--partitionField, --part-field Field used for partitioning. The default is the SmID field; if the database engine (e.g., PGGIS) has no SmID field, the user must specify a partition field.
--predicates Array of partition conditions.

Examples:

  • Connect to OracleSpatial: --providerType=jdbc --host=127.0.0.1 --port=1521 --schema=testosp --database=orcl --user=testosp --password=testosp --dbtype=oracle --table=SMDTV (Note: for OracleSpatial, pass the dataset table name via --table.)
  • Connect to PostGIS: --providerType=jdbc --host=127.0.0.1 --port=5432 --schema=postgres --database=uitest --user=postgres --password=uitest --dbtype=postgis --dataset=DLTB
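Since PGGIS has no SmID field, a partitioned JDBC read needs --partitionField set explicitly. A sketch with placeholder values (the gid partition field is an assumption):

```shell
# Hypothetical partitioned JDBC read from PostGIS;
# host, credentials, and the "gid" partition field are placeholders.
--providerType=jdbc --host=127.0.0.1 --port=5432 --schema=postgres \
  --database=uitest --user=postgres --password=uitest \
  --dbtype=postgis --dataset=DLTB --numPartitions=4 --partitionField=gid
```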

DSF (--providerType=dsf)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is dsf, indicating the use of the DSF distributed spatial file engine to access data.
--path, -P, --url
(Required)
DSF directory address, supporting local directory and HDFS directory.
--bounds, -B Query bounds, used mainly for range queries; ignored when --as-dsf-rdd is true. Compared with a box query in Query, a bounds query filters once using the first-level index, which performs better.
-fields, --result-fields Field names to read, multiple fields separated by commas.
--as-dsf-rdd Whether to read the data as a DSFFeatureRDD; default is false. If true, the data is read as a DSFFeatureRDD, and when performing spatial operations or spatial predicates between two datasets, the user must ensure the partition indexes of the two DSFFeatureRDDs are consistent. If false, the data is read as a FeatureRDD, and partition indexes are built uniformly when spatial operations or predicates are performed between two datasets.

Examples:

  • Local directory (Windows system): --providerType=dsf --path=E:/data/Zoo_pt
  • Local directory (Linux system): --providerType=dsf --path=/home/data/Zoo_pt
  • HDFS directory: --providerType=dsf --path=hdfs://127.0.0.1:9000/data/DLTB
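A range query over a DSF directory can combine --bounds with --result-fields. A sketch, assuming the bounds are passed as a comma-separated left,bottom,right,top rectangle and that the dataset has SmID and DLBM fields (both assumptions, not documented values):

```shell
# Hypothetical range query; the bounds format (left,bottom,right,top)
# and the field names SmID,DLBM are assumptions.
--providerType=dsf --path=hdfs://127.0.0.1:9000/data/DLTB \
  --bounds=116.0,39.0,117.0,40.0 --result-fields=SmID,DLBM
```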

CSV (--providerType=csv)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is csv, indicating that the data source is a CSV file.
--path, -P, --url
(Required)
CSV directory address, supporting local directory and HDFS directory.
-fields, -F Fields to import, separated by commas. If the CSV has a header row, pass the field names and set --firstRowIsField to true; if there is no header, pass col0, col1, etc., in column order.
--geofields, --geo-fields, -GF Geometry fields to import, separated by commas. Pass one column for line or polygon data and two columns for point data; passing more columns is invalid. If the CSV has a header row, pass the field names and set --firstRowIsField to true; if there is no header, pass col0, col1, etc., in column order. Note: Only longitude/latitude coordinates in the WGS84 coordinate system are supported, and values must fall within [-180,180] and [-90,90].
--firstRowIsField, --first-as-field, -FAF Whether the first row is a header containing field names.
--idField, --id-field, -IF Specify the ID field name; if not specified, a UUID is used by default.
--modifyDesFieldTypes, --des-field-type, -DFT Specify field types; if not specified, all fields are read as strings by default. The input format is 'field name -> type, field name -> type'.
--timeZoneOffset, --time-Zone-Offset, -TZO When a field type is set to date in --modifyDesFieldTypes, values default to UTC. This parameter applies a time-zone offset in the format "+/-hh:mm:ss". For example, --timeZoneOffset=+08:00:00 converts to Beijing time (UTC+8).

Examples:

  • Local directory (Windows system): --providerType=csv --path=E:/data/Zoo_pt
  • Local directory (Linux system): --providerType=csv --path=/home/data/Zoo_pt
  • HDFS directory: --providerType=csv --path=hdfs://127.0.0.1:9000/data/DLTB
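For a point-data CSV with a header row, the field, geometry, type, and time-zone parameters above can be combined as follows. The file path and the column names (name, lon, lat, ts) are placeholders:

```shell
# Hypothetical point CSV with a header row: two geometry columns (lon, lat)
# and a "ts" column read as a date, shifted to UTC+8.
--providerType=csv --path=E:/data/Zoo_pt \
  -fields=name,lon,lat,ts --geofields=lon,lat --firstRowIsField=true \
  --modifyDesFieldTypes="ts -> date" --timeZoneOffset=+08:00:00
```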

ESRI File Geodatabase (--providerType=gdb)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is gdb, indicating that data is accessed from an ESRI File Geodatabase.
--path, -P, --url
(Required)
GDB directory address, supporting local directory and HDFS directory.
--dataset, --table Dataset name to read in GDB.
--minPartitions, --min-part, -M Spark RDD minimum number of partitions, default is 0.

Examples:

  • Local directory: --providerType=gdb --path=F:/data/landuse.gdb --table=DLTB
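To control read parallelism, --minPartitions can be added; the value below is a placeholder:

```shell
# Hypothetical GDB read with a minimum of 4 Spark partitions.
--providerType=gdb --path=F:/data/landuse.gdb --table=DLTB --minPartitions=4
```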

ShapeFile (--providerType=shape)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is shape, indicating that data is read from ShapeFile files.
--path, -P, --url
(Required)
ShapeFile directory address, supporting local directory and HDFS directory. Note: The address here is the folder address where the ShapeFile file is located.

Examples:

  • Local directory: --providerType=shape --path=E:\data\shp

Elasticsearch (--providerType=elasticsearch)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is elasticsearch, indicating that data is accessed through Elasticsearch.
--dataset, --table Specify the table to read.
--index
(Required)
Index name.
--es.nodes, --nodes
(Required)
ES server address.
--es.port, --port
(Required)
ES service port.
--es.batch.size.bytes, --batch-size-bytes, --BSB Batch write size, default is 1 MB.
--es.batch.size.entries, --batch-size-entries, --BSE Maximum number of entries per batch update; works together with es.batch.size.bytes. Default is 1000.
--es.internal.es.cluster.name, --cluster-name, --CN Cluster name; required for reads and writes on cluster versions 5.x and 6.x, optional for version 7.x.
--es.mapping.id The field whose value is used as the document ID when writing.
--number_of_shards Number of shards, default is 5.
--number_of_replicas Number of replicas, default is 1.

Examples:

  • --providerType=elasticsearch --index=test --table=test --nodes=localhost --port=9200
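For a 6.x cluster, the cluster name and the batch/shard settings above can be added to the connection. All values below are placeholders:

```shell
# Hypothetical connection to a 6.x Elasticsearch cluster with
# tuned batch size and index sharding; all values are placeholders.
--providerType=elasticsearch --index=test --table=test \
  --nodes=localhost --port=9200 --cluster-name=my-es-cluster \
  --batch-size-entries=2000 --number_of_shards=3 --number_of_replicas=1
```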

SimpleJson (--providerType=simplejson)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is simplejson, indicating that the data source is SimpleJson files.
--path, -P, --url
(Required)
SimpleJson directory address, supporting local directory and HDFS directory.
--minPartitions, --min-part, -M Spark RDD minimum number of partitions, default is 0.
--metaInfo, --meta-info, --meta SimpleJson file metadata, or the address of a metadata file; users can specify metadata for the SimpleJson file independently.
--idField, --id-field, -IF Specify the field used as the feature ID. If the SimpleJson object has no featureID, use --id-field to designate a field from which to construct one; the user must ensure the field values are unique.

Examples:

  • --providerType=simplejson --path=E:\data\Region.json
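When the SimpleJson objects lack a featureID, --id-field can name a unique field from which to construct one. The REGION_CODE field below is a placeholder, not a documented name:

```shell
# Hypothetical SimpleJson read; the featureID is built from the
# (assumed unique) REGION_CODE field, a placeholder name.
--providerType=simplejson --path=E:\data\Region.json --id-field=REGION_CODE
```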