Data Connection Info Parameter Description

When using the GPA tool, you need to enter "connection info" to access databases or data files. The following sections explain the connection parameters for each type of data source in detail.

SuperMap SDX+ (--providerType=sdx)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is sdx, indicating the use of the SuperMap SDX+ spatial data engine to access data.
--dbType, --type, -T, --db-type
(Required)
Engine type, supporting access to various databases and vector or raster files, for example:
UDBX file: --dbType=UDBX
PostGIS database: --dbType=PGGIS
Vector file: --dbType=VECTORFILE
--server, -S
(Required)
Database engine server address or file path.
--dataset, --dt
(Required)
Dataset name to read or save.
--CHARSET Character set encoding, used only for vector file engine type.
--driver Driver name required for data source connection.
--database, --db Database name for data source connection.
--alias, -A Alias.
--user, -U Username for database login.
--password, --pwd Password for database login.
--maxConnPoolNum Maximum number of connections for the data source.
--numPartitions, --num-parts, -N Number of partitions, default is 0.
--partitionField, --part-field Field used for partitioning. The default is the SmID field; if the database engine (e.g., PGGIS) has no SmID field, the user must specify a partition field.
--idField, --id-field Field whose values are read as unique feature IDs for feature objects. The user must ensure the field values are unique. If empty, the SmID field is used by default.
--blockingWrite Whether to write data in blocks, which can significantly improve performance for large datasets. Default is true.

Examples:

  • Connect to UDBX: --providerType=sdx --dbType=UDBX --server=F:\data\landuse.udbx --dataset=DLTB
  • Connect to PostGIS: --providerType=sdx --dbType=PGGIS --server=127.0.0.1 --database=postgis --user=postgres --password=uitest --maxConnPoolNum=10 --dataset=DLTB
  • Connect to ArcGIS for Oracle: --providerType=sdx --server=127.0.0.1:3452/xe --user=testdb --password=testdb --alias=test --maxConnPoolNum=1 --dataset=dt --dbType=ARCSDE_ORACLE
  • Connect to vector file (ShapeFile): --providerType=sdx --server=E:\data\DLTB.shp --dataset=DLTB --CHARSET=ANSI --dbType=VECTORFILE
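The partitioning parameters above can be combined with a database connection. A sketch, assuming a hypothetical PostGIS database and a gid field as the partition key (the server address, credentials, dataset, and field name below are placeholders, not values taken from the product):

```shell
# Hypothetical partitioned read from PostGIS via SDX+.
# --numPartitions and --partitionField are documented above;
# the server, credentials, dataset, and "gid" field are placeholders.
--providerType=sdx --dbType=PGGIS --server=127.0.0.1 \
  --database=postgis --user=postgres --password=uitest \
  --dataset=DLTB --numPartitions=8 --partitionField=gid
```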

JDBC (--providerType=jdbc)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is jdbc, indicating that data is accessed through the JDBC Java API.
--dbType, --type, -T, --db-type
(Required)
Engine type. Only PostGIS and OracleSpatial databases are supported; both allow distributed reads and writes in cluster mode.
--host
(Required)
Database server address.
--port
(Required)
Database service port number, default is 0.
--database, --db Database name for data source connection.
--dataset, --dt
(Required)
Dataset name to read or save.
--schema Database schema to access.
--user, -U Username for database login.
--password, --pwd Password for database login.
--numPartitions, --num-parts, -N Number of partitions, default is 0.
--partitionField, --part-field Field used for partitioning. The default is the SmID field; if the database engine (e.g., PGGIS) has no SmID field, the user must specify a partition field.
--predicates Array of partition conditions.

Examples:

  • Connect to OracleSpatial: --providerType=jdbc --host=127.0.0.1 --port=1521 --schema=testosp --database=orcl --user=testosp --password=testosp --dbtype=oracle --table=SMDTV (Note: for OracleSpatial, pass the dataset table name via --table.)
  • Connect to PostGIS: --providerType=jdbc --host=127.0.0.1 --port=5432 --schema=postgres --database=uitest --user=postgres --password=uitest --dbtype=postgis --dataset=DLTB
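Since PGGIS has no SmID field, a partitioned JDBC read needs --partitionField set explicitly. A sketch with placeholder values (the gid partition field is an assumption):

```shell
# Hypothetical partitioned JDBC read from PostGIS;
# host, credentials, and the "gid" partition field are placeholders.
--providerType=jdbc --host=127.0.0.1 --port=5432 --schema=postgres \
  --database=uitest --user=postgres --password=uitest \
  --dbtype=postgis --dataset=DLTB --numPartitions=4 --partitionField=gid
```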

DSF (--providerType=dsf)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is dsf, indicating the use of the DSF distributed spatial file engine to access data.
--path, -P, --url
(Required)
DSF directory address, supporting local directory and HDFS directory.
--bounds, -B Query bounds, used mainly for range queries; ignored when --as-dsf-rdd is true. Compared with a box query in Query, a bounds query filters once using the first-level index, which performs better.
-fields, --result-fields Field names to read, multiple fields separated by commas.
--as-dsf-rdd Whether to read the data as a DSFFeatureRDD; default is false. If true, the data is read as a DSFFeatureRDD, and when performing spatial operations or spatial predicates between two datasets, the user must ensure the partition indexes of the two DSFFeatureRDDs are consistent. If false, the data is read as a FeatureRDD, and partition indexes are built uniformly when spatial operations or predicates are performed between two datasets.

Examples:

  • Local directory (Windows system): --providerType=dsf --path=E:/data/Zoo_pt
  • Local directory (Linux system): --providerType=dsf --path=/home/data/Zoo_pt
  • HDFS directory: --providerType=dsf --path=hdfs://127.0.0.1:9000/data/DLTB
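A range query over a DSF directory can combine --bounds with --result-fields. A sketch, assuming the bounds are passed as a comma-separated left,bottom,right,top rectangle and that the dataset has SmID and DLBM fields (both assumptions, not documented values):

```shell
# Hypothetical range query; the bounds format (left,bottom,right,top)
# and the field names SmID,DLBM are assumptions.
--providerType=dsf --path=hdfs://127.0.0.1:9000/data/DLTB \
  --bounds=116.0,39.0,117.0,40.0 --result-fields=SmID,DLBM
```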

CSV (--providerType=csv)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is csv, indicating that the data source is a CSV file.
--path, -P, --url
(Required)
CSV directory address, supporting local directory and HDFS directory.
-fields, -F Fields to import, separated by commas. If the CSV has a header row, pass the field names and set --firstRowIsField to true; if there is no header, pass col0, col1, etc., in column order.
--geofields, --geo-fields, -GF Geometry fields to import, separated by commas. Pass one column for line or polygon data and two columns for point data; passing more columns is invalid. If the CSV has a header row, pass the field names and set --firstRowIsField to true; if there is no header, pass col0, col1, etc., in column order. Note: Only longitude/latitude coordinates in the WGS84 coordinate system are supported, and values must fall within [-180,180] and [-90,90].
--firstRowIsField, --first-as-field, -FAF Whether the first row is a header containing field names.
--idField, --id-field, -IF Specify the ID field name; if not specified, a UUID is used by default.
--modifyDesFieldTypes, --des-field-type, -DFT Specify field types; if not specified, all fields are read as strings by default. The input format is 'field name -> type, field name -> type'.
--timeZoneOffset, --time-Zone-Offset, -TZO When a field type is set to date in --modifyDesFieldTypes, values default to UTC. This parameter applies a time-zone offset in the format "+/-hh:mm:ss". For example, --timeZoneOffset=+08:00:00 converts to Beijing time (UTC+8).

Examples:

  • Local directory (Windows system): --providerType=csv --path=E:/data/Zoo_pt
  • Local directory (Linux system): --providerType=csv --path=/home/data/Zoo_pt
  • HDFS directory: --providerType=csv --path=hdfs://127.0.0.1:9000/data/DLTB
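For a point-data CSV with a header row, the field, geometry, type, and time-zone parameters above can be combined as follows. The file path and the column names (name, lon, lat, ts) are placeholders:

```shell
# Hypothetical point CSV with a header row: two geometry columns (lon, lat)
# and a "ts" column read as a date, shifted to UTC+8.
--providerType=csv --path=E:/data/Zoo_pt \
  -fields=name,lon,lat,ts --geofields=lon,lat --firstRowIsField=true \
  --modifyDesFieldTypes="ts -> date" --timeZoneOffset=+08:00:00
```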

ESRI File Geodatabase (--providerType=gdb)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is gdb, indicating that data is accessed from an ESRI File Geodatabase.
--path, -P, --url
(Required)
GDB directory address, supporting local directory and HDFS directory.
--dataset, --table Dataset name to read in GDB.
--minPartitions, --min-part, -M Spark RDD minimum number of partitions, default is 0.

Examples:

  • Local directory: --providerType=gdb --path=F:/data/landuse.gdb --table=DLTB
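To control read parallelism, --minPartitions can be added; the value below is a placeholder:

```shell
# Hypothetical GDB read with a minimum of 4 Spark partitions.
--providerType=gdb --path=F:/data/landuse.gdb --table=DLTB --minPartitions=4
```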

ShapeFile (--providerType=shape)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is shape, indicating that data is read from ShapeFile files.
--path, -P, --url
(Required)
ShapeFile directory address, supporting local directory and HDFS directory. Note: The address here is the folder address where the ShapeFile file is located.

Examples:

  • Local directory: --providerType=shape --path=E:\data\shp

Elasticsearch (--providerType=elasticsearch)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is elasticsearch, indicating that data is accessed through Elasticsearch.
--dataset, --table Specify the table to read.
--index
(Required)
Index name.
--es.nodes, --nodes
(Required)
ES server address.
--es.port, --port
(Required)
ES service port.
--es.batch.size.bytes, --batch-size-bytes, --BSB Batch write size, default is 1 MB.
--es.batch.size.entries, --batch-size-entries, --BSE Maximum number of entries per batch update; works together with es.batch.size.bytes. Default is 1000.
--es.internal.es.cluster.name, --cluster-name, --CN Cluster name; required for reads and writes on cluster versions 5.x and 6.x, optional for version 7.x.
--es.mapping.id The field whose value is used as the document ID when writing.
--number_of_shards Number of shards, default is 5.
--number_of_replicas Number of replicas, default is 1.

Examples:

  • --providerType=elasticsearch --index=test --table=test --nodes=localhost --port=9200
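For a 6.x cluster, the cluster name and the batch/shard settings above can be added to the connection. All values below are placeholders:

```shell
# Hypothetical connection to a 6.x Elasticsearch cluster with
# tuned batch size and index sharding; all values are placeholders.
--providerType=elasticsearch --index=test --table=test \
  --nodes=localhost --port=9200 --cluster-name=my-es-cluster \
  --batch-size-entries=2000 --number_of_shards=3 --number_of_replicas=1
```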

SimpleJson (--providerType=simplejson)

Parameter Name Parameter Interpretation
--providerType
(Required)
The type of data provider; here it is simplejson, indicating that the data source is SimpleJson files.
--path, -P, --url
(Required)
SimpleJson directory address, supporting local directory and HDFS directory.
--minPartitions, --min-part, -M Spark RDD minimum number of partitions, default is 0.
--metaInfo, --meta-info, --meta SimpleJson file metadata, or the address of a metadata file; users can specify metadata for the SimpleJson file independently.
--idField, --id-field, -IF Specify the field used as the feature ID. If the SimpleJson object has no featureID, use --id-field to designate a field from which to construct one; the user must ensure the field values are unique.

Examples:

  • --providerType=simplejson --path=E:\data\Region.json
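When the SimpleJson objects lack a featureID, --id-field can name a unique field from which to construct one. The REGION_CODE field below is a placeholder, not a documented name:

```shell
# Hypothetical SimpleJson read; the featureID is built from the
# (assumed unique) REGION_CODE field, a placeholder name.
--providerType=simplejson --path=E:\data\Region.json --id-field=REGION_CODE
```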