iServer's distributed analysis service is built on the Spark computing platform and provides distributed GIS analysis and processing capabilities. Hardware environments, Spark cluster configurations, and analysis datasets of different sizes all affect the performance of distributed analysis. To achieve the best performance, you need to tune the configuration for your scenario. Here are some commonly used optimization methods:
- When Spark runs, it starts executors to perform tasks. You can improve data-processing efficiency by adjusting the memory allocated to each executor (the iServer built-in Spark defaults to 4g) in the Spark configuration file to match your machine's actual resources. The method is:
Enter the conf directory of the Spark installation package, such as [iServer installation directory]\support\spark\conf, and rename the spark-defaults.conf.template file to spark-defaults.conf
Open spark-defaults.conf with a text editor and add: spark.executor.memory 8g
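The two steps above can be sketched as shell commands. This is a minimal sketch using a demo directory (/tmp/spark-conf-demo is an assumption for illustration); in a real deployment, point SPARK_CONF at your actual conf directory, e.g. [iServer installation directory]/support/spark/conf:

```shell
# Demo conf directory; replace with your real Spark conf path.
SPARK_CONF=/tmp/spark-conf-demo
mkdir -p "$SPARK_CONF"
touch "$SPARK_CONF/spark-defaults.conf.template"  # stand-in for the shipped template

# Spark only reads spark-defaults.conf, so activate the template by renaming it.
mv "$SPARK_CONF/spark-defaults.conf.template" "$SPARK_CONF/spark-defaults.conf"

# Raise executor memory from the 4g default; size the value to your machine.
echo "spark.executor.memory 8g" >> "$SPARK_CONF/spark-defaults.conf"
```

Restart the Spark cluster (or iServer, when using the built-in Spark) for the new setting to take effect.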
- When the analysis result data is relatively large, collecting it from the worker nodes to the Spark cluster's master node and then storing it in a local file or the iServer DataStore consumes considerable system resources. To improve analysis efficiency, you can apply the following optimizations:
Increase the Spark driver memory
Enter the conf directory of the Spark installation package, such as [iServer installation directory]\support\spark\conf, and rename the spark-defaults.conf.template file to spark-defaults.conf
Open spark-defaults.conf with a text editor and add: spark.driver.memory 5g
If you are using the built-in Spark in iServer, you can exclude the Spark master node from analysis tasks
In the iServer that hosts the Spark master node, click "Cluster" > "Join cluster" and uncheck "Whether to be the Distributed Analysis node" for the cluster reporter.
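After the driver-memory step above, the relevant lines of spark-defaults.conf might look like this (5g is the value from the steps; 8g is the executor setting from the earlier section, shown only for context; tune both to your hardware):

```
# Memory for the driver, which collects result data from the worker nodes
spark.driver.memory 5g
# Memory per executor (see the executor-memory section above)
spark.executor.memory 8g
```

Spark configuration files use one space- or whitespace-separated key/value pair per line; restart the cluster after editing.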
- Spark has two main scheduling modes: FIFO (First In, First Out) and FAIR (fair scheduling). The iServer built-in Spark uses FAIR at runtime, which can process multiple analysis jobs concurrently; standalone Spark defaults to FIFO. You can choose the mode that suits your workload. The method is:
Enter the conf directory of the Spark installation package, such as [iServer installation directory]\support\spark\conf, and rename the spark-defaults.conf.template file to spark-defaults.conf
Open spark-defaults.conf with a text editor and add: spark.scheduler.mode FIFO
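The scheduler mode is a single line in spark-defaults.conf; keep only one spark.scheduler.mode entry. A sketch of the two options:

```
# FAIR: multiple analysis jobs share the cluster concurrently (iServer built-in default)
# spark.scheduler.mode FAIR

# FIFO: jobs run strictly in submission order (upstream Spark default)
spark.scheduler.mode FIFO
```

FIFO lets a single large job use the whole cluster without contention; FAIR suits many small, concurrent analysis jobs.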