I'm using an r4.8xlarge instance on AWS Batch to run Spark. This is already a big machine: 32 vCPUs and 244 GB of RAM. On AWS Batch the process runs inside a Docker container. From multiple sources I read that we should run java with the parameters:
-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1
Even with these parameters the process never went over 31 GB of resident memory and 45 GB of virtual memory.
Here are the analyses I did (I would expect the estimated max heap to be roughly the available memory divided by MaxRAMFraction). First test:
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XshowSettings:vm -version
VM settings:
Max. Heap Size (Estimated): 26.67G
Ergonomics Machine Class: server
Using VM: OpenJDK 64-Bit Server VM
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
Second test:
docker run -it --rm 650967531325.dkr.ecr.eu-west-1.amazonaws.com/java8_aws java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -XshowSettings:vm -version
VM settings:
Max. Heap Size (Estimated): 26.67G
Ergonomics Machine Class: server
Using VM: OpenJDK 64-Bit Server VM
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
Third test:
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=10 -XshowSettings:vm -version
VM settings:
Max. Heap Size (Estimated): 11.38G
Ergonomics Machine Class: server
Using VM: OpenJDK 64-Bit Server VM
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
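As an extra check, the cgroup limit and the resulting max heap can be dumped from a tiny program to see what the JVM ergonomics actually works with inside the container. This is only a minimal sketch, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory (the path is an assumption; adjust it if the container mounts it elsewhere):

import scala.io.Source

// Minimal sketch: compare the cgroup memory limit with the heap the JVM picked.
object CgroupHeapCheck {
  def main(args: Array[String]): Unit = {
    val gib = 1024.0 * 1024 * 1024
    // Limit that -XX:+UseCGroupMemoryLimitForHeap is supposed to read (assumed cgroup v1 path):
    val limit = Source.fromFile("/sys/fs/cgroup/memory/memory.limit_in_bytes").mkString.trim.toLong
    // Max heap the running JVM actually derived from it:
    val maxHeap = Runtime.getRuntime.maxMemory
    println(f"cgroup memory limit: ${limit / gib}%.2f GiB")
    println(f"JVM max heap:        ${maxHeap / gib}%.2f GiB")
  }
}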
The system is built with Native Packager as a standalone application. A SparkSession is built as follows, with Cores equal to 31 (32 - 1):
SparkSession
  .builder()
  .appName(applicationName)
  .master(s"local[$Cores]")
  .config("spark.executor.memory", "3g")
  .getOrCreate()
Answer to egorlitvinenko:
$ docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
0c971993f830 ecs-marcos-BatchIntegration-DedupOrder-3-default-aab7fa93f0a6f1c86800 1946.34% 27.72GiB / 234.4GiB 11.83% 0B / 0B 72.9MB / 160kB 0
a5d6bf5522f6 ecs-agent 0.19% 19.56MiB / 240.1GiB 0.01% 0B / 0B 25.7MB / 930kB 0
More tests, now with the Oracle JDK; the memory never went over 4 GB:
$ 'spark-submit' '--class' 'integration.deduplication.DeduplicationApp' '--master' 'local[31]' '--executor-memory' '3G' '--driver-memory' '3G' '--conf' '-Xmx=150g' '/localName.jar' '--inPath' 's3a://dp-import-marcos-refined/platform-services/order/merged/*/*/*/*' '--outPath' 's3a://dp-import-marcos-refined/platform-services/order/deduplicated' '--jobName' 'DedupOrder' '--skuMappingPath' 's3a://dp-marcos-dwh/redshift/item_code_mapping'
I used the parameters -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 on my Spark job, and it clearly does not use all of the memory. How can I get around this issue?
1024 MB by default. How do you submit the Spark app?