Cloudera
Apache Spark Application Performance Tuning Training Course
Pages: 2
Time to read: 3 mins
Language: English
This document is a training sheet describing a three-day, hands-on course on improving the performance of Apache Spark applications. The course aims to give developers the concepts and expertise needed to identify and resolve common performance issues in Spark applications. Participants learn about Apache Spark's architecture, how Spark executes jobs, and performance optimization techniques such as lazy execution and pipelining. The training combines instructor-led demonstrations with hands-on exercises in an interactive notebook environment, with an emphasis on applying the concepts in practice. Key topics include the performance characteristics of RDDs and DataFrames, choosing file formats for optimal performance, and techniques for addressing data skew. The course also introduces new features in Spark 3.0, including the Adaptive Query Execution framework. It is designed for software developers, engineers, and data scientists who already have experience developing Spark applications.