Smart Resource Utilization With Spark Dynamic Allocation
A Spark cluster is designed to run many jobs in parallel. Each job specifies the resources it requires, and when the cluster has enough free resources, it allocates them to that application. These resources are dedicated to the application throughout its entire lifecycle and cannot be used by other applications. This allocation pattern suits batch applications: they claim their resources, do their work, and when they finish, the resources are freed for other jobs. Lately, however, Spark has become a great platform for serving users through long-running applications. Such applications run perpetually, waiting for user commands or new data to arrive and processing them immediately. Their idle periods can be long, and during that time the resources allocated to them sit unused. Do they really need all their resources all the time? Or should we allocate them only a small portion? If so, what happens when they hit a resource-hungry situation, such as a large dataset or a complicated command?
Dynamic allocation is what we need here: at peak times the application acquires more resources, and at idle times it releases them (or at least most of them). This is exactly the solution provided by Spark Dynamic Allocation: elastic resource management. Interested in understanding how it works? Need a short guide to make it happen? Read more in my article at DZone.
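As a minimal sketch of what enabling this elastic behavior can look like, here is a `spark-submit` invocation with dynamic allocation turned on. The executor counts and the idle timeout are illustrative values, not recommendations, and the application class and jar name are hypothetical placeholders:

```shell
# Sketch: enable dynamic allocation for a long-running Spark application.
# Note: dynamic allocation needs a way to preserve shuffle data when
# executors are removed, e.g. the external shuffle service enabled below.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --class com.example.MyLongRunningApp \
  my-app.jar
```

With settings along these lines, the application starts small, scales up toward `maxExecutors` when work arrives, and releases executors that have been idle longer than the timeout, so idle periods no longer hold the whole allocation hostage.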