Data is no longer static, and offloading it for reporting in big batches once a day is not enough. Nor is data stored only in databases: useful data for your business can be generated in real time by external sources (Facebook, real-time payment information delivered through a bank's webhooks, third-party services). Learn how to ingest and analyse such data in real time by developing streaming applications.
Would you like a deep dive into Big Data streaming architectures and techniques? Then you are in the right place. This course focuses on streaming Big Data applications: we will build, monitor and manipulate data pipelines with tools and frameworks such as Apache Spark.
Who should attend
IT professionals interested in crossing over into development territory in the Big Data domain. The following background is helpful:
- SQL fluency
- Python basics
- An overview of Batch Big Data Processing
- Some experience with queueing systems (optional) or Change Data Capture/Log Shipping engines found in modern relational databases
What you will learn
The course consists of the following chapters:
- Big Data Architecture Overview
- Describe the Big Data landscape, with examples of real-world streaming Big Data problems and the three key sources of Big Data: people, organizations, and sensors.
- Provide an explanation of the architectural components and programming models used for scalable streaming big data analysis.
- Summarize the features and value of core Hadoop stack components including the YARN resource and job management system, the HDFS file system and the MapReduce programming model.
- Big Data Tools & Practices
- Data Warehouse
- Streaming Applications
- Retrieve data from example databases and Big Data management systems in real time or near real time
- Identify when a Big Data problem needs data integration with real-time data-generating systems (e.g. web services, queueing systems)
- Execute simple big data integration and processing on Hadoop and Spark platforms
- Select a data model to suit the characteristics of your data
- Apply techniques to handle streaming data
- Persist and organize streaming data for further offline processing
- Unified Tools & User Interfaces for
- Operations on streaming data (joining streams with each other or with reference data, managing state, sorting, filtering) using a modern framework like Apache Spark
- Monitoring operations on streaming data and long running flows
- How to handle high availability and errors so as not to fall behind in real-time data processing
- Integration with other tools
- Recognize different streaming data elements in your own work and in everyday life problems
- Explain why your team needs to design a Big Data Infrastructure Plan and Information System Design
- Identify the frequent data operations required for various types of streaming data
- Build your own streaming data analytics pipelines with tools such as Hortonworks Streaming Analytics Manager
- Data Engineers Workflows
All course material will be taught on a reference Hadoop installation, and all participants will be required to run and/or develop examples using the tools that we will make available.
The next 2-day course has been scheduled for 15-16 May 2018, 9.30am – 5.30pm (available).
The course takes place at the Learning Actors premises, 62A Ethnikis Antistasis, Chalandri, Greece.