Overview
In this series, Mark Payne, a Principal Software Engineer at Cloudera and co-creator of Apache NiFi, explains several common ways that people use NiFi incorrectly or inefficiently. After explaining the weaknesses of each approach, Mark shows how to improve those flows to make better use of NiFi's design and architecture.
Course Length
This course includes 1 hour of video content.
Course Outline
- Part 1: Flows Overview examines a flow that splits and rejoins data, treats structured/semi-structured data as unstructured text, and blurs the line between FlowFile content and attributes.
- Part 2: Flow Layout illustrates how a disorganized dataflow can be difficult to understand and maintain. Mark shares tips for laying out the dataflow to make it clean, simple, and easy for others to follow.
- Part 3: Load Balancing explains how to make your dataflows more scalable by balancing the load across a cluster of nodes. Mark also references his Cloudera technical blog post that shows how NiFi can process more than one billion events per second.
- Part 4: Scheduling covers scheduling and concurrency anti-patterns. Mark discusses common problems related to thread pools and processor scheduling, and explains how to configure settings for best performance.
- Part 5: Primary Node Only looks at the primary node and how it is sometimes misused.