Initial job has not accepted any resources : Investigating the cluster state
Application isn’t using all of the Cores: How to set the Cores used by a Spark App
Spark Executor OOM: How to set Memory Parameters on Spark
Increase the parallelism of your job. Try increasing the number of partitions in your job. By splitting the work into smaller sets of data less information will have to be resident in memory at a given time. For a Spark Cassandra Connector job this would mean decreasing the split size variable. The variable, spark.cassandra.input.split.size, can be set either on the command line as above or in the SparkConf object. For other RDD types look into their api’s to determine exactly how they determine partition size.
Increase the storage fraction variable, spark.storage.memoryFraction. This can be set as above on either the command line or in the SparkConf object. This variable sets exactly how much of the JVM will be dedicated to the caching and storage of RDD’s. You can set it as a value between 0 and 1, describing what portion of executor JVM memory will be dedicated for caching RDDs. If you have a job that will require very little shuffle memory but will utilize a lot of cached RDD’s increase this variable (example: Caching an RDD then performing aggregates on it.)
If all else fails you may just need additional ram on each worker. For DSE users adjust your spark-env.sh (or dse.yaml file in DSE 4.6) file to increase SPARK_MEM reserved for Spark jobs. You will need to restart your workers for these new memory limits to take effect (dse sparkworker restart.) Then increase the amount of ram the application requests by setting spark.executor.memory variable either on the command line or in the SparkConf object.
Shark Server/ Long Running Application Metadata Cleanup
Class Not Found: Classpath Issues
What is happening : A Brief Tour of The Spark UI