Apache Spark for Big Data Processing

...
This is now a virtual classroom course. You can find more information about our virtual classroom here
A discount is available when:
  • a single customer proceeds with more than three registrations for a particular session;
  • a learner is not currently employed.
Please contact us to receive your discount coupon if you meet any of the above-mentioned criteria.

Please also note that the upcoming sessions are guaranteed to run once they have a minimum number of learners registered.
Total price:
540.00 660.00 final price

Available Sessions

Date | Timezone | Training Type | Weekend-Weekday | Description | Price
Coming soon | 540.00 660.00 final price

The 16-hour course starts with an introduction to Apache Spark and the fundamental concepts and APIs that enable Big Data processing. Real-world datasets and a Spark computing cluster will be available, and fully working examples in all languages (Python / R / Java / Scala / SQL) will be provided along with exercises.

Participants then focus on using the Spark API for common real-world data engineering, data integration and processing tasks on data from various sources (e.g. relational databases, distributed filesystems) within the Spark engine. Particular focus is given to observability, monitoring and performance assessment of tasks, so that participants build the understanding needed to tune their applications for better performance and higher cluster utilization.

Concepts such as scheduling jobs and tuning schedulers for optimal performance are covered as well. Having built a solid understanding of the Spark API for processing batch data, participants move on to Spark Structured Streaming, where they build and optimize stream processing pipelines (both stateful and stateless) using Kafka and other real-world input sources.

Who should attend

This 16-hour course is open to anyone with some programming skills who is interested in using Spark for Data Engineering. It has, however, been designed with the following roles in mind:

  • Software Engineers,
  • Data Warehouse engineers, 
  • Data Scientists and
  • Data Engineers

with adequate programming skills who are willing to take the next step and understand how distributed data processing engines work in practice and how to make the best use of them to solve real-world problems.

Prerequisites

Programming experience with one of Python / R / Java / Scala / SQL, and a solid understanding of the selected language’s structures, collections and input/output API.

Experience managing data in relational databases and an understanding of SQL internals are a plus.

What you will learn

Distributed Data Processing Fundamentals

  • HDFS and Distributed Filesystems
  • Resource Managers 
  • Distributed Jobs Scheduling
  • File Formats 

Batch Processing with Spark

  • Introduction to Spark’s fundamental APIs (DataFrames, Datasets, RDDs)
  • Connecting to Sources, writing Output
  • Working with and integrating different types of data (Structured / Unstructured)
  • Schema definition and Management, Partition Management
  • High Performance Aggregations and Joins among disparate datasets
  • Advanced RDD operations
  • Deploying / Redeploying / Restarting after failure and Monitoring Spark applications
  • Debugging and Tuning Spark applications
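
As a rough illustration of the idea behind distributed aggregations and partition management, the sketch below uses plain Python with no Spark required. It is not course material and not Spark's implementation: the function names (`partial_aggregate`, `shuffle`, `final_aggregate`, `distributed_sum`) and the sample data are hypothetical, chosen only to show the usual two-stage pattern of aggregating within each input split first and then merging the partial results after routing them by key.

```python
import zlib
from collections import defaultdict


def partial_aggregate(split):
    """Sum values per key within a single input split (map-side combine)."""
    totals = defaultdict(int)
    for key, value in split:
        totals[key] += value
    return dict(totals)


def shuffle(partials, num_partitions):
    """Route every partial (key, total) to a partition picked by a stable hash."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for key, total in partial.items():
            index = zlib.crc32(key.encode("utf-8")) % num_partitions
            partitions[index][key].append(total)
    return partitions


def final_aggregate(partition):
    """Merge the partial totals that arrived in one partition."""
    return {key: sum(totals) for key, totals in partition.items()}


def distributed_sum(splits, num_partitions=2):
    """Two-stage aggregation: combine per split, shuffle by key, then merge."""
    partials = [partial_aggregate(split) for split in splits]
    result = {}
    for partition in shuffle(partials, num_partitions):
        result.update(final_aggregate(partition))
    return result


# Hypothetical sample data: two input splits of (city, amount) records.
splits = [
    [("athens", 3), ("berlin", 1), ("athens", 2)],
    [("berlin", 4), ("cairo", 5)],
]
totals = distributed_sum(splits)
print(sorted(totals.items()))
```

Because every key is routed to exactly one partition, the final totals do not depend on the number of partitions; only the distribution of work does. In Spark's DataFrame API the same result would typically be expressed with `groupBy(...).agg(...)`, leaving these steps to the engine.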

Stream Processing with Spark

  • Intro to Stream Processing with Structured Streaming
  • Structured Streaming Sources and Output
  • Event Time based Stream Processing
  • Stateful Stream Processing 
  • Monitoring and Optimizing Structured Streaming Applications
  • Highly available Streams 
  • Handling Errors, Restarting, Redeploying streams without losing data
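
To give a feel for what event-time based, stateful stream processing means, here is a minimal plain-Python sketch. It is a simplified illustration under stated assumptions, not Spark's implementation: the class `TumblingWindowCounter`, its parameters and the sample event times are hypothetical. It counts events per fixed-size event-time window, keeps the open windows as state, and uses a watermark (the highest event time seen minus an allowed lateness) to decide when a window is final and when a late event must be dropped.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per tumbling event-time window (illustrative only)."""

    def __init__(self, window_seconds, allowed_lateness_seconds):
        self.window_seconds = window_seconds
        self.allowed_lateness = allowed_lateness_seconds
        self.max_event_time = 0
        self.open_windows = defaultdict(int)  # state: window start -> count
        self.emitted = []  # finalized (window start, count) pairs

    def watermark(self):
        """Event time before which no further data is expected."""
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time):
        """Count one event; return False if it is too late to be counted."""
        window_start = event_time - event_time % self.window_seconds
        if window_start + self.window_seconds <= self.watermark():
            return False  # the window was already finalized
        self.open_windows[window_start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        self._close_finished_windows()
        return True

    def _close_finished_windows(self):
        """Emit and forget every window that ends at or before the watermark."""
        for start in sorted(self.open_windows):
            if start + self.window_seconds <= self.watermark():
                self.emitted.append((start, self.open_windows.pop(start)))


# Hypothetical event times in seconds, arriving out of order.
counter = TumblingWindowCounter(window_seconds=10, allowed_lateness_seconds=5)
accepted = [counter.process(t) for t in [1, 4, 12, 3, 27, 8]]
print(counter.emitted)
print(dict(counter.open_windows))
```

The event at time 3 arrives after the one at time 12 but is still counted, because its window is not yet behind the watermark; the event at time 8 arrives after time 27 and is dropped. Structured Streaming expresses a comparable idea through `withWatermark` and the `window` function, although its watermark is updated per trigger rather than per event, so the exact behaviour differs from this sketch.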

How it works

Registration

You may enroll in the course by providing your full name and email address through our website's 'add to cart' feature or by sending an email to hello@learningactors.com. We treat your personal data, including your full name and email address, with strict confidentiality.

Payment Options

We offer multiple payment methods, including credit card, bank transfer, and PayPal. To receive your invoice, please provide your VAT number (when applicable), address, and zip code. Again, you may provide us with this information either through our website's 'add to cart' feature or by sending an email to hello@learningactors.com. Please note that you can typically expect to receive your invoice on the first day of training, unless there are exceptional circumstances that necessitate a different approach. We’re happy to provide you with a pro-forma invoice if this meets your needs.

Expectations

Following registration, you can expect an information email from us approximately one week before the course start date. This email will contain course details, such as the title, date, and time, along with a request to complete a pre-course form. Additionally, we will provide the relevant Zoom link, which applies to all our virtual-live courses, and will invite you to the course's Slack channel on the LA Slack workspace (learningactors.slack.com). For specific courses, you will also receive a preparation email that will get you ready for the course.
Finally, after the session concludes, we kindly request that you take a moment to provide us with your feedback, as it is of great importance to us. Additionally, we will ensure that you receive the learning material through Slack, which will be instrumental in keeping your learning journey on track!

Reminders

  • Discounts are available in two scenarios: when a single customer registers for more than three sessions or if the customer is currently unemployed. Contact the LA team to receive a discount coupon if you meet these criteria.
  • All upcoming sessions displayed on our website are guaranteed to proceed once the minimum required number of learners register. In the case that we must cancel a session, a full refund will be issued to those who have pre-paid for that specific session.