Introduction to Machine Learning with Apache SystemML
Apache SystemML is a powerful open-source platform designed to facilitate large-scale machine learning (ML) tasks. It aims to provide both flexibility and scalability, making it suitable for handling big data scenarios efficiently. Here’s an overview of Apache SystemML and its key features:
What is Apache SystemML?
Apache SystemML is a declarative machine learning platform that allows users to define their machine learning algorithms in a high-level language. It abstracts the underlying complexities of distributed computing and optimization, enabling data scientists and developers to focus more on the logic and structure of their algorithms rather than low-level implementation details.
Key Features
- Declarative Language: SystemML uses a high-level, declarative language called DML (Declarative Machine Learning) for expressing machine learning algorithms. This language is designed to be intuitive and close to mathematical notation, making it easier for data scientists to write and understand complex algorithms.
- Scalability: One of the primary strengths of Apache SystemML is its ability to scale efficiently. It leverages Apache Hadoop and Apache Spark frameworks for distributed computing, allowing it to process large datasets across clusters of computers.
- Automatic Optimization: SystemML includes an optimizer that automatically transforms high-level algorithm descriptions (expressed in DML) into distributed, optimized execution plans. This helps in achieving high performance while executing machine learning tasks on big data.
- Flexibility: It supports a wide range of machine learning algorithms, including linear regression, logistic regression, matrix factorization, and more. Users can also extend the system by integrating custom algorithms written in DML.
- Interoperability: Apache SystemML can be integrated with other Apache projects such as Apache Spark, allowing seamless interaction and leveraging the strengths of both platforms.
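The automatic-optimization idea can be illustrated with a toy sketch: estimate how much memory each matrix needs, then pick a local or distributed plan depending on whether the largest one fits in a memory budget. This is plain Python written for illustration only. The function names, the matrix shapes, and the 2 GiB budget are all assumptions, and the logic is far simpler than SystemML's actual optimizer.

```python
# Illustrative only: a toy stand-in for the idea of choosing an execution
# plan from size estimates. It is not SystemML's optimizer or cost model.

BYTES_PER_DOUBLE = 8


def dense_matrix_bytes(rows, cols):
    """Rough size of a dense matrix of doubles, ignoring object overhead."""
    return rows * cols * BYTES_PER_DOUBLE


def choose_plan(shapes, memory_budget_bytes):
    """Return "local" if the largest matrix fits in the budget, else "distributed"."""
    largest = max(dense_matrix_bytes(r, c) for r, c in shapes)
    return "local" if largest <= memory_budget_bytes else "distributed"


# Hypothetical (rows, cols) shapes for the inputs and intermediates of a script.
small_job = [(10_000, 100), (100, 100), (100, 1)]
big_job = [(1_000_000_000, 100), (100, 100), (100, 1)]

budget = 2 * 1024 ** 3  # assumed 2 GiB memory budget

print(choose_plan(small_job, budget))  # largest matrix is about 8 MB
print(choose_plan(big_job, budget))    # largest matrix is about 800 GB
```

The point of the sketch is only that the same script can end up with different plans depending on data size, which is what lets one algorithm description serve both small and large inputs.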
Getting Started with Apache SystemML
To begin working with Apache SystemML, follow these steps:
- Installation: Apache SystemML typically runs on Apache Spark. Install Apache Spark first and then configure SystemML to run on top of it.
- Learn DML: Familiarize yourself with the DML language, which is used to specify machine learning algorithms in SystemML. DML has an R-like syntax (a Python-like variant, PyDML, also exists) and is specifically designed for expressing the matrix and vector operations common in machine learning.
- Write and Execute Scripts: Start writing DML scripts to define and execute machine learning workflows. Use the built-in algorithms or implement custom ones as per your requirements.
- Scale Up: As your data and computation needs grow, scale up your SystemML deployment by adding more nodes to your Apache Spark cluster.
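To give a feel for the matrix-style computation that a DML script typically expresses, here is a minimal sketch of linear regression solved through the normal equations. It is plain Python, not DML and not the SystemML API, and the helper names and toy data are assumptions made for this example. In an R-like matrix language the core step would be written roughly as `solve(t(X) %*% X, t(X) %*% y)`; that one-liner is shown only to illustrate the notation.

```python
# Illustrative only: plain Python, not DML and not the SystemML API.
# Fits y ~ X w by solving the normal equations (X^T X) w = X^T y.


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b (b is a column vector) by Gaussian elimination
    with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i][0]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return [[v] for v in x]


def linear_regression(X, y):
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, y))


# Toy data generated from y = 1 + 2x; the first column of X is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[1.0], [3.0], [5.0], [7.0]]
w = linear_regression(X, y)
print(w)  # intercept close to 1, slope close to 2
```

In a declarative system the transpose, multiply, and solve helpers would be built-in operations, and the platform would decide how to execute them.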
Conclusion
Apache SystemML is a versatile tool for data scientists and engineers working with large-scale machine learning tasks. By providing a high-level language and leveraging distributed computing frameworks like Apache Spark, it simplifies the development and execution of complex machine learning algorithms on big data. Whether you are just starting with machine learning or managing advanced data science projects, Apache SystemML offers the scalability and flexibility necessary to tackle real-world challenges effectively.
Machine Learning with Apache SystemML Cognitive Class Certification Answers
Module 1 – What is SystemML Quiz Answers
Question 1: In machine learning, as analytical models are exposed to new data, they are able to independently adapt. True or false?
- True
- False
Question 2: Which of the following are alternatives to SystemML?
- R
- MLlib
- Spark R
- Mahout
- All of the above
Question 3: The R language was designed for machine learning and works great for big data. True or false?
- True
- False
Module 2 – SystemML and the Spark MLContext Quiz Answers
Question 1: What are the ways you can use SystemML’s Spark MLContext?
- spark-shell
- Through an application using the API
- Through the SystemML console
- A notebook interface
- None of the above
Question 2: You must pass in the reference of the SparkContext to the MLContext constructor. True or false?
- True
- False
Question 3: Why would you use the Spark MLContext?
- Programmatic interface into SystemML’s libraries
- To benefit from the optimizations that come with SystemML
- When you need to convert the data to a binary block matrix
- A and B only
- None of the above
Module 3 – SystemML Algorithms Quiz Answers
Question 1: The Regression algorithm is an ensemble learning method that creates a model composed of a set of tree models for classification. True or false?
- True
- False
Question 2: K-means is an unsupervised learning algorithm used to assign a category label to each record so that similar records tend to get the same label. True or false?
- True
- False
Question 3: The Kaplan-Meier algorithm predicts how likely it is someone will purchase a product of similar category. True or false?
- True
- False
Module 4 – Declarative Machine Learning (DML) Quiz Answers
Question 1: What does DML stand for?
- Data manipulation language
- Data machine language
- Declarative machine learning
- Declarative machine language
Question 2: To run a DML script, which of the following jar files is required at runtime?
- MLContext.jar
- DML.jar
- SystemML.jar
- spark-context.jar
Question 3: Which of the following ways to pass command-line arguments is recommended?
- positional arguments
- named arguments
- a comma separated list
- a file
Module 5 – SystemML Architecture and Optimization Quiz Answers
Question 1: In the ALS performance comparison, at which dataset size does the MLlib code run out of memory?
- Large
- Medium
- Small
- None
Question 2: Which of the following does NOT belong to the SystemML Optimizer stack?
- Create the RDDs for the high-level algorithm
- Compute memory estimates
- Generate runtime program
- Live variable analysis
Question 3: How does SystemML know it is better to run the code on one machine?
- Advanced Rewrites
- Propagation of statistics
- Live variable analysis
- Efficient runtime
- The developer tells it to
Machine Learning with Apache SystemML Final Exam Answers
Question 1: What is machine learning?
- Artificial intelligence for machines to make decisions
- Same as data science to gather insight using machines
- Enable computers to learn without being explicitly programmed
- Learning about how machines operate
Question 2: What is the purpose of SystemML?
- Programming language for big data
- In-memory analytics engine
- Machine learning for spark
- Machine learning on hadoop
- All of the above
Question 3: What are the challenges of machine learning on big data using R?
- Programmers are needed to convert the high-level code to low level code for parallel computing
- Each iteration of the code takes time to be rewritten and recompiled
- Chances for errors are higher during the translation of the algorithms
- All of the above
Question 4: What is the vision of SystemML?
- Run the same algorithm developed for small data on big data
- Provide flexible specification of ML algorithms
- Automatic generation of hybrid runtime plans
- All of the above
Question 5: Which of the following languages is SystemML most similar to?
- R
- Python
- Java
- Scala
- Perl
- R and Python
- Java and Scala
Question 6: Which of the following lines of code will launch the Spark shell with SystemML?
- ./bin/spark-shell --jars SystemML.jar
- ./bin/spark-shell --executor-memory 4G --jars SystemML.jar
- ./bin/spark-shell --driver-memory 4G --jars SystemML.jar
- ./bin/spark-shell --executor-memory 4G --driver-memory 4G --jars SystemML.jar
- All of the above
Question 7: Why would you convert a DataFrame to a binary-block matrix?
- To enable parallelization within the Spark engine
- To use the rich set of APIs provided by the binary-block matrix
- Allows algorithm performance to be measured separately from data conversion time
- Allows a more efficient runtime processing of the data
Question 8: Which of the following is TRUE with regard to helper methods in SystemML?
- SystemML’s output is encapsulated in the MLContext object
- SystemML’s output is encapsulated in the MLOutput object
- Helper methods retrieve the values from the MLOutput object
- Helper methods retrieve the values from the MLContext object
- A and D only
- B and C only
Question 9: Which is NOT a benefit of using SystemML algorithms?
- Run in parallel
- It is faster than all other algorithms
- No need for translation into a lower-level language
- Algorithms are optimized based on data and cluster characteristics
Question 10: Which of the following classes of algorithms provide a recommendation?
- Regression
- Classification
- Matrix Factorization
- Descriptive statistics
Question 11: Which of the following algorithms can group a set of data into known categories?
- Regression
- Clustering
- Survival Analysis
- Classification
Question 12: Which of the following algorithms can be used for prediction, forecasting, or error reduction?
- Clustering
- Regression
- Survival Analysis
- Descriptive statistics
Question 13: Which of the following value types is NOT supported in the DML language?
- String
- Double
- Varchar
- Boolean
Question 14: Matrix-vector operations avoid the need to create a replicated matrix for a certain subset of operations. True or false?
- True
- False
Question 15: Global variables cannot be accessed within a function. True or false?
- True
- False
Question 16: Which of the following is NOT a category of built-in functions in DML?
- Derivative built-in functions
- Matrix built-in functions
- Statistical built-in functions
- Casting built-in functions
Question 17: In the statistics propagation phase of the SystemML optimizer, what exactly is happening?
- To determine the confidence level of the computed results
- All the statistics are propagated to the top node to determine the most efficient runtime for query execution
- To determine the probability of the operation succeeding within a given period of time
- Find the widest matrix required and determine if it all fits into the heap.
Question 18: What is the benefit of doing the matrix rewrite?
- Reduce the lines of code needed to represent the matrix
- To determine the confidence level of the computed results
- Clean up any unused memory from the matrix
- To enable parallelization of the given matrix
- Represent the final matrix without computing the intermediate matrices
Question 19: Which is NOT part of the SystemML runtime for Spark?
- Automates critical performance decisions
- Distributed vs. local runtime
- Efficient linear algebra optimizations
- Automated RDD caching
- None of the above
Question 20: SystemML is an Apache open-source project. True or false?
- True
- False