Introduction to Controlling Hadoop Jobs using Oozie
Apache Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It lets you define a workflow of dependent jobs (Hadoop MapReduce jobs, Hive queries, Pig scripts, and others) and schedule their execution. Here’s a basic introduction to controlling Hadoop jobs using Oozie:
1. Workflow Definition
Oozie workflows are defined in XML using the Hadoop Process Definition Language (hPDL), conventionally in a file named workflow.xml. A workflow is a directed graph of actions, where each action represents a Hadoop job or another task to be executed. Example actions include:
- Hadoop MapReduce Action: Executes a MapReduce job.
- Pig Action: Executes a Pig script.
- Hive Action: Executes a Hive query.
- Shell Action: Executes a shell script.
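To make the idea of a workflow built from actions concrete, here is a minimal sketch of a two-action hPDL definition, embedded in a short Python script that parses it and lists each action with its type. The workflow name, node names, the `clean.pig` script, the schema version (`uri:oozie:workflow:0.5`), and the `${jobTracker}` / `${nameNode}` parameters are all assumptions chosen for illustration, not values from the course.

```python
import xml.etree.ElementTree as ET

NS = "{uri:oozie:workflow:0.5}"  # namespace of the assumed schema version

# Hypothetical two-step workflow: a MapReduce action followed by a Pig action.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="example-wf">
    <start to="mr-step"/>
    <action name="mr-step">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="pig-step"/>
        <error to="fail"/>
    </action>
    <action name="pig-step">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>clean.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Failed at ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def list_actions(workflow_xml):
    """Return (action name, action type) pairs in document order."""
    root = ET.fromstring(workflow_xml)
    pairs = []
    for action in root.findall(NS + "action"):
        # The first child of <action> is the action body (map-reduce, pig, ...).
        body = action[0]
        pairs.append((action.get("name"), body.tag.split("}")[-1]))
    return pairs


print(list_actions(WORKFLOW_XML))
```

Each action names the node to run next on success (`ok`) and on failure (`error`), which is how the actions are chained into a workflow.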
2. Workflow Control Nodes
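An Oozie workflow is built from two kinds of nodes: control nodes, which define the structure and flow of the workflow, and action nodes, which do the work:
- Start Node: Entry point of the workflow.
- End Node: Marks the successful end of the workflow.
- Kill Node: Terminates the workflow with an error.
- Action Nodes: Define individual actions to be executed. These nodes specify what action to execute, its inputs and outputs, and where to go next on success or on error.
- Decision Nodes: Allow conditional execution paths, typically chosen from the result of a preceding action or the state of the data.
- Fork and Join Nodes: Enable parallel execution of actions. A fork starts several paths at once, and the matching join waits until all of them have completed.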
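The node types listed above can be sketched in a single hypothetical workflow: a decision node checks whether an input directory exists, a fork runs two Pig actions in parallel, and a join waits for both before the workflow ends. The Python wrapper is only an illustrative sanity check (not an Oozie feature) that every transition points at a node that is actually defined. All node names, script names, and the `inputDir` parameter are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow using decision, fork, join, kill, and end nodes.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="parallel-wf">
    <start to="check-input"/>
    <decision name="check-input">
        <switch>
            <case to="split">${fs:exists(inputDir)}</case>
            <default to="end"/>
        </switch>
    </decision>
    <fork name="split">
        <path start="clean"/>
        <path start="index"/>
    </fork>
    <action name="clean">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>clean.pig</script>
        </pig>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="index">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>index.pig</script>
        </pig>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>Failed at ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def dangling_transitions(workflow_xml):
    """Return transition targets that do not match any defined node name."""
    root = ET.fromstring(workflow_xml)
    # Every node except <start> carries a name attribute.
    names = {node.get("name") for node in root if node.get("name")}
    targets = set()
    for element in root.iter():
        # "to" is used by start, ok, error, case, default, and join;
        # "start" is used by the <path> children of a fork.
        for attribute in ("to", "start"):
            if element.get(attribute):
                targets.add(element.get(attribute))
    return sorted(targets - names)


print(dangling_transitions(WORKFLOW_XML))
```

Oozie performs its own validation when a workflow is submitted; a local check like this only catches simple naming mistakes earlier.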
3. Workflow Configuration
Each action in an Oozie workflow requires configuration specific to the type of action. For example, a MapReduce action needs to specify the jar file, input/output paths, and other job-specific properties. This configuration is defined in the XML workflow definition.
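As an illustration of action-specific configuration, the sketch below shows a hypothetical MapReduce action whose properties refer to parameters such as `${inputDir}`, together with a job.properties file that supplies their values. Oozie resolves these parameters itself at submission time; the Python code only imitates that substitution for simple `${name}` placeholders so the resulting settings can be inspected. The class names, paths, and host names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

NS = "{uri:oozie:workflow:0.5}"
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")  # matches simple ${name} parameters only

# Hypothetical MapReduce action body with job-specific properties.
ACTION_XML = """\
<map-reduce xmlns="uri:oozie:workflow:0.5">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapred.mapper.class</name>
            <value>org.example.WordCountMapper</value>
        </property>
        <property>
            <name>mapred.reducer.class</name>
            <value>org.example.WordCountReducer</value>
        </property>
        <property>
            <name>mapred.input.dir</name>
            <value>${nameNode}${inputDir}</value>
        </property>
        <property>
            <name>mapred.output.dir</name>
            <value>${nameNode}${outputDir}</value>
        </property>
    </configuration>
</map-reduce>
"""

# Hypothetical job.properties content supplying the parameter values.
JOB_PROPERTIES = """\
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
inputDir=/user/demo/input
outputDir=/user/demo/output
"""


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def action_config(action_xml, props):
    """Return the action's properties with ${name} placeholders filled in."""
    root = ET.fromstring(action_xml)
    config = {}
    for prop in root.iter(NS + "property"):
        name = prop.find(NS + "name").text
        value = prop.find(NS + "value").text
        config[name] = PLACEHOLDER.sub(lambda m: props[m.group(1)], value)
    return config


config = action_config(ACTION_XML, parse_properties(JOB_PROPERTIES))
print(config["mapred.input.dir"])
```

Keeping environment-specific values in job.properties means the same workflow definition can be reused across clusters without editing the XML.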
4. Coordinator and Bundle Applications
- Coordinator Application: Schedules recurrent workflows, triggering them at regular intervals or when input data becomes available.
- Bundle Application: Groups a set of coordinator applications so that they can be managed together as a single unit.
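A coordinator definition might look like the hypothetical coordinator.xml below, which runs a workflow once a day and waits for that day's log directory to appear in HDFS. The Python wrapper simply parses it and summarizes the schedule. The application name, dates, dataset name, URI template, schema version, and the `workflowAppPath` parameter are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{uri:oozie:coordinator:0.4}"  # namespace of the assumed schema version

# Hypothetical daily coordinator with one time trigger and one data dependency.
COORDINATOR_XML = """\
<coordinator-app name="daily-logs" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="logs" frequency="${coord:days(1)}"
                 initial-instance="2024-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="input" dataset="logs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
"""


def describe_coordinator(coordinator_xml):
    """Summarize the schedule, datasets, and workflow of a coordinator."""
    root = ET.fromstring(coordinator_xml)
    return {
        "frequency": root.get("frequency"),
        "datasets": [d.get("name") for d in root.iter(NS + "dataset")],
        "workflow": root.find(f"{NS}action/{NS}workflow/{NS}app-path").text,
    }


info = describe_coordinator(COORDINATOR_XML)
print(info)
```

The `frequency` attribute covers the time-based trigger, while the dataset and input event express the data-availability trigger.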
5. Workflow Execution
Once a workflow is defined and configured, you can submit it to Oozie for execution from the command line or through its API. Oozie coordinates the execution of actions based on the workflow definition. It detects the completion of each action through callbacks and polling, and can retry actions that fail.
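One way to submit a workflow programmatically is through Oozie's HTTP web services API. The sketch below only builds the request and does not send it, so it runs without a server. It assumes the v1 jobs endpoint with `action=start` and an XML configuration body; the server address, user name, and application path are placeholders, and the endpoint details should be checked against the documentation for your Oozie version.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_submit_request(oozie_url, properties):
    """Build (but do not send) an HTTP request that submits and starts a job."""
    # The job configuration is sent as a Hadoop-style <configuration> document.
    conf = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(conf, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs?action=start",
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml;charset=UTF-8"},
    )


# Placeholder values; replace with those of a real cluster.
request = build_submit_request(
    "http://localhost:11000/oozie",
    {
        "user.name": "demo",
        "oozie.wf.application.path": "hdfs://localhost:8020/user/demo/apps/example-wf",
    },
)
print(request.get_method(), request.full_url)
# To actually submit, pass the request to urllib.request.urlopen(request).
```

Separating request construction from sending keeps the example testable and makes it easy to inspect exactly what would be posted.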
6. Oozie CLI and Web UI
- Oozie CLI: Provides command-line tools (the oozie command) to interact with Oozie, submit workflows, check status, and manage jobs.
- Oozie Web UI: Web-based interface to monitor workflows, view job history, and manage coordinator and bundle applications.
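Typical CLI invocations can be sketched as argument lists, for example for use with Python's `subprocess` module. The helper below is hypothetical and only assembles the commands; the server URL and the job id are placeholders.

```python
import shlex

OOZIE_URL = "http://localhost:11000/oozie"  # assumed server address


def oozie_job_command(*args, oozie_url=OOZIE_URL):
    """Return the argument list for an `oozie job` invocation."""
    return ["oozie", "job", "-oozie", oozie_url, *args]


# Submit and start the workflow described by a local job.properties file.
run_cmd = oozie_job_command("-config", "job.properties", "-run")

# Check the status of a job (the id here is a made-up placeholder).
info_cmd = oozie_job_command("-info", "0000001-240101000000000-oozie-oozi-W")

print(shlex.join(run_cmd))
print(shlex.join(info_cmd))
```

Passing either list to `subprocess.run` would execute the command on a machine where the Oozie client is installed.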
7. Error Handling and Notifications
Oozie provides mechanisms for error handling and notifications:
- Actions can specify retry policies.
- You can define email notifications on job completion or failure.
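Both mechanisms can be sketched in one hypothetical workflow: the `load` action declares a retry policy through the `retry-max` and `retry-interval` attributes, and its error transition leads to an email action before the workflow is killed. The node names, script, and address are placeholders. The retry interval is understood to be in minutes, whether a given failure is retried depends on server-side settings, and the email action needs an SMTP server configured in Oozie. The Python code just reads the retry policy back out of the definition.

```python
import xml.etree.ElementTree as ET

NS = "{uri:oozie:workflow:0.5}"

# Hypothetical workflow with a retry policy and a failure notification.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="notify-wf">
    <start to="load"/>
    <action name="load" retry-max="3" retry-interval="1">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>load.pig</script>
        </pig>
        <ok to="end"/>
        <error to="notify-failure"/>
    </action>
    <action name="notify-failure">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>ops@example.com</to>
            <subject>Workflow ${wf:id()} failed</subject>
            <body>Failed node: ${wf:lastErrorNode()}</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Failed at ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def retry_settings(workflow_xml):
    """Map each action with a retry policy to (max retries, interval)."""
    root = ET.fromstring(workflow_xml)
    settings = {}
    for action in root.findall(NS + "action"):
        if action.get("retry-max") is not None:
            settings[action.get("name")] = (
                int(action.get("retry-max")),
                int(action.get("retry-interval", "0")),
            )
    return settings


print(retry_settings(WORKFLOW_XML))
```

Routing the error transition through the email action means the notification is sent before the kill node ends the workflow.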
Conclusion
Oozie provides a powerful framework for managing and coordinating Hadoop jobs through workflows. It simplifies the task of scheduling and monitoring complex job dependencies and executions. By defining workflows and coordinating actions, Oozie helps in efficiently managing data processing tasks in a Hadoop environment.
Controlling Hadoop Jobs using Oozie Cognitive Class Certification Answers
Module 1 – Introduction of Oozie Workflows Quiz Answers
Question 1: Oozie definitions written in the Hadoop Process Definition Language (hPDL) are encoded in which of the following files?
- workflow.txt
- workflow.html
- workflow.json
- workflow.xml
Question 2: Oozie detects job completion via callback and polling. True or false?
- False
- True
Question 3: The Oozie expression language (EL) provides access to all of the following except
- error codes
- workflow job size
- application name
- workflow job id
Module 2 – Oozie Coordinator Quiz Answers
Question 1: Which of the following can trigger the start of an Oozie job?
- The Oozie CLI
- Data
- An application call to the API
- Time
- All of the above
Question 2: The Oozie coordinator works with Central European Time (CET). True or false?
- False
- True
Question 3: The Coordinator Job uses all of the following files except
- job.properties
- coord-config-default.xml
- coordinator.properties
- coordinator.xml
Module 3 – BigInsights Workflow Editor Quiz Answers
Question 1: Which of the following statements about the BigInsights Workflow Editor is correct?
- It displays a read-only diagram to show the overall workflow
- It runs in an Eclipse environment
- It supports complex Oozie workflows without requiring knowledge of the Oozie XML XSD schema
- It’s a new feature, and it was introduced to BigInsights in version 2.0
- All of the above
Question 2: You can use the BigInsights Workflow Publishing Wizard as a graphical tool to create and modify a workflow.xml file. True or false?
- False
- True
Question 3: Which of the following statements is NOT correct?
- The InfoSphere BigInsights Tool for Eclipse is essentially an Eclipse module with BigInsights add-ins.
- At a higher level, we can link multiple applications to run in sequence.
- We cannot build sub-workflows in a workflow.
- Deployed applications can be scheduled.
Controlling Hadoop Jobs using Oozie Final Exam Answers
Question 1: What is the primary purpose of Oozie in the Hadoop architecture?
- To provide logging support for Hadoop jobs
- To support the execution of workflows consisting of a collection of actions
- To support SQL access to relational data stored in Hadoop
- To move data into HDFS
Question 2: How are Oozie workflows defined?
- Using the Java programming language
- Using JSON
- Using a plain text file that defines the graph elements
- Using hPDL
Question 3: Control nodes in an Oozie Workflow can contain all of the following except
- Start
- Fork
- Pig
- End
- Kill
Question 4: A workflow job can be executed from
- A Java API
- A Web-server API
- The command line
- All of the above
Question 5: Where do the workflow.xml, config-default.xml, JAR, and .so files need to be stored prior to Oozie workflow job execution?
- On a web-server
- In HDFS within a defined directory structure
- On the local file system where you are executing the job
- None of the above
Question 6: What is the purpose of the Oozie Coordinator?
- To invoke workflows when some external event occurs
- To invoke workflows when data becomes available
- To invoke workflows at regular intervals
- All of the above
Question 7: Which of the following need to be stored in HDFS?
- coordinator.xml only
- coord-config-default.xml only
- coordinator.properties only
- coordinator.xml and coord-config-default.xml only
- coordinator.xml and coordinator.properties only
Question 8: The Oozie coordinator can be executed from
- A Java API
- A Web-server API
- The command line
- All of the above
Question 9: How is an Oozie coordinator configured?
- Using the Java programming language
- Using JSON
- Using a plain text file that defines the workflow schedule
- Using XML
Question 10: By defining a dataset template as part of the coordinator.xml file, you can use the coordinator to trigger a workflow when an updated dataset has arrived in HDFS. True or false?
- True
- False
Question 11: coordinator.properties can be used to establish
- values for variables used in workflow.xml
- values for variables used in coordinator.xml
- the location of the coordinator job in HDFS
- All of the above
Question 12: job.properties can be used to establish
- The location of the workflow job in HDFS, only
- Values for variables used in workflow.xml, only
- The actions to perform at each stage of the workflow, only
- Values for variables used in workflow.xml, and the actions to perform at each stage of the workflow
- The location of the workflow job in HDFS, and values for variables used in workflow.xml
Question 13: The kill node is used to indicate a successful completion of the Oozie workflow. True or false?
- True
- False
Question 14: The join node in an Oozie workflow will wait until all forked paths have completed. True or false?
- True
- False
Question 15: Decision nodes can be used to select from multiple alternative paths through an Oozie workflow. True or false?
- True
- False