Predictive Modeling Fundamentals I Cognitive Class Exam Answers


Introduction to Predictive Modeling Fundamentals I

Predictive modeling is a powerful technique used in data science and machine learning to predict outcomes based on data. It involves building a model from historical data with known outcomes (the training data) and using this model to make predictions on new data where the outcomes are unknown.

Key Concepts:

Predictive Modeling: The process of creating a model that predicts the value of a target variable based on input features.
Target Variable: Also known as the dependent variable or response variable, this is the variable you are trying to predict.
Features: These are the input variables used to predict the target variable. They can be numerical, categorical, or even text data.
Training Data: Historical data used to train or build the predictive model. It consists of features and their corresponding known target variable values.
Model Building: The process of selecting an appropriate algorithm, training the model on the training data, and optimizing its parameters to achieve accurate predictions.
Prediction: Applying the trained model to new data to predict the target variable’s value.

Steps in Predictive Modeling:

Data Collection: Gather relevant data that includes both features and the target variable.
Data Cleaning: Preprocess the data to handle missing values and outliers, and format it appropriately for modeling.
Feature Selection/Engineering: Choose relevant features that are likely to have a strong influence on predicting the target variable. This might involve transforming variables or creating new features.
Model Selection: Select an appropriate algorithm based on the nature of the problem (e.g., regression for continuous target variables, classification for categorical target variables).
Model Training: Fit the model to the training data so that it learns the relationship between the features and the target variable.
Model Evaluation: Assess the model’s performance on data not used for training, using metrics suited to the problem type: accuracy, precision, or recall for classification, and mean squared error for regression.
Model Tuning: Fine-tune the model by adjusting hyperparameters or trying different algorithms to improve its performance.
Prediction: Deploy the trained model to make predictions on new data where the target variable is unknown.
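
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the course's method: the data set (hours studied versus pass/fail), the 70/30 split, and the single-threshold classifier are all assumptions chosen to keep the example self-contained, and the names `predict`, `accuracy`, and `best_threshold` are hypothetical.

```python
import random

# Data collection: hypothetical records of (hours_studied, passed).
# None stands in for a missing measurement.
raw = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (4.0, 1),
       (5.0, 1), (6.0, 1), (2.5, 0), (4.5, 1), (5.5, 1)]

# Data cleaning: drop rows whose feature value is missing.
clean = [(x, y) for x, y in raw if x is not None]

# Hold out part of the data so evaluation uses unseen rows (assumed 70/30 split).
random.seed(0)
random.shuffle(clean)
split = int(0.7 * len(clean))
train, test = clean[:split], clean[split:]

# Model selection: the target is categorical, so use a classifier.
# Here the "model" is a single threshold on the one feature.
def predict(threshold, x):
    return 1 if x >= threshold else 0

def accuracy(threshold, rows):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)

# Model training and tuning: try each training value as the threshold
# and keep the one with the highest training accuracy.
candidates = sorted(x for x, _ in train)
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

# Model evaluation: measure accuracy on the held-out rows.
test_accuracy = accuracy(best_threshold, test)

# Prediction: apply the trained model to a new, unlabeled value.
new_prediction = predict(best_threshold, 4.2)

print(best_threshold, test_accuracy, new_prediction)
```

In practice each step would be far more involved, but the order of operations stays the same: clean first, split before training, and evaluate only on rows the model has not seen.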

Common Algorithms Used:

Linear Regression: For predicting continuous variables.
Logistic Regression: For predicting categorical outcomes, most commonly binary ones (e.g., yes/no).
Decision Trees: For both regression and classification tasks.
Random Forests: Ensemble method based on decision trees.
Gradient Boosting Machines (GBM): Another ensemble method that builds trees sequentially.
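
As a small illustration of the first algorithm in the list, the sketch below fits a one-feature linear regression with the ordinary least-squares formulas. The function name `fit_line` and the data points are hypothetical; real projects would normally rely on a statistics or machine learning library rather than hand-written code.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict the continuous target for a new feature value.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

Because the target here is a continuous number, regression is the appropriate choice; a yes/no target would call for one of the classification algorithms instead.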

Applications:

Finance: Predicting stock prices, credit risk assessment.
Healthcare: Diagnosing diseases based on symptoms.
Marketing: Targeted advertising and customer segmentation.
E-commerce: Recommender systems and predicting customer behavior.

Challenges:

Overfitting: When a model performs well on training data but fails to generalize to new, unseen data.
Underfitting: When a model is too simplistic to capture the underlying patterns in the data.
Data Quality: Poor quality data can lead to inaccurate predictions.
Interpretability: Complex models like neural networks might be accurate but difficult to interpret.
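
Overfitting can be seen on a toy example. The sketch below compares a model that memorizes the training set (a 1-nearest-neighbor lookup) with a simpler single-threshold rule. Everything here is assumed for illustration: the data is hypothetical, two training labels are deliberately noisy, and the cut at x = 5 is chosen by hand.

```python
# Hypothetical training data: the label is 1 when x >= 5,
# except for two noisy points (x = 3 and x = 7).
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]

# Hypothetical unseen data that follows the true pattern without noise.
test = [(1.2, 0), (2.9, 0), (4.2, 0), (5.8, 1), (7.1, 1), (8.8, 1)]

def nearest_neighbor(x):
    """Memorizing model: copy the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def threshold_rule(x):
    """Simpler model: a single cut at x = 5 (assumed, not learned)."""
    return 1 if x >= 5 else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

for name, model in [("nearest neighbor", nearest_neighbor),
                    ("threshold rule", threshold_rule)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizing model reproduces every training label, including the noisy ones, and then loses accuracy on the unseen points near them. The threshold rule misses the noisy training points but matches the underlying pattern, so it does better on the new data.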

Conclusion:

Predictive modeling is a key tool for extracting insights from data and making informed decisions. Understanding its fundamentals lets you apply data effectively to real-world problems across many domains.

Predictive Modeling Fundamentals I Cognitive Class Certification Answers

Module 1: Introduction to Data Mining Quiz Answers

Question 1: Which of the following applications would require the use of data mining? Select all that apply.

Predicting the outcome of flipping a fair coin
Determining which products in a store are likely to be purchased together
Predicting future stock prices using historical records
Determining the total number of products sold by a store
Sorting a student database by gender

Question 2: Which of the following is NOT a section of the Modeler Interface?

Nodes
Palettes
Stream Canvas
Stream, Outputs, and Model Manager
All of the above are sections of the Modeler Interface

Question 3: Which of the following is NOT a part of the Cross-Industry Process for Data Mining?

Business Understanding
Evaluation
Data Preparation
Data Storage
Modeling

Module 2: The Data Mining Process Quiz Answers

Question 1: Which phase of the data mining process focuses on understanding the project requirements and objectives?

Data Understanding
Data Exploration
Data Preprocessing
Business Understanding
Data Preparation

Question 2: Which Data Preprocessing task focuses on removing outliers and filling in missing values?

Data Integration
Data Cleaning
Data Transformation
Data Reduction
None of the above

Question 3: The IBM SPSS Modeler supports which data type?

Nominal
Categorical
Ordinal
Continuous
All of the above

Module 3: Modeling Techniques Quiz Answers

Question 1: Which of the following methods are commonly used for supervised learning tasks? Select all that apply.

Neural Networks
Decision Trees
K-Means
CARMA
Regression

Question 2: Classification is a subset of supervised learning that focuses on modeling continuous variables. True or false?

True
False

Question 3: Which of the following algorithms is NOT supported by the SPSS Modeler?

Logistic Regression
CARMA
K-Means
Apriori
All of the above algorithms are supported

Module 4: Model Evaluation Quiz Answers

Question 1: What is the term for a negative data point that is incorrectly classified as positive?

True Negative
False Positive
True Positive
False Negative
None of the above

Question 2: Which of the following is NOT a cost-sensitive performance metric?

Precision
Accuracy
Specificity
Sensitivity
All of the above metrics are cost-sensitive

Question 3: What is the formula for the precision metric?

(False Positive) / (True Negative + True Positive)
(True Positive) / (True Positive + False Positive)
(False Positive) / (True Positive + False Positive)
(True Positive) / (True Positive + False Negative)
(True Negative) / (True Negative + False Positive)
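
The terms used in this module can be made concrete with a short sketch that counts the four confusion-matrix outcomes and derives precision, sensitivity, and specificity from them using their standard definitions. The labels below are hypothetical, and `confusion_counts` is an illustrative helper rather than part of any library.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1   # positive correctly classified as positive
        elif p == 1 and a == 0:
            fp += 1   # negative incorrectly classified as positive
        elif p == 0 and a == 0:
            tn += 1   # negative correctly classified as negative
        else:
            fn += 1   # positive incorrectly classified as negative
    return tp, fp, tn, fn

# Hypothetical ground truth and model output for ten cases.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)

precision = tp / (tp + fp)      # share of predicted positives that are correct
sensitivity = tp / (tp + fn)    # share of actual positives that are found (recall)
specificity = tn / (tn + fp)    # share of actual negatives that are found

print(tp, fp, tn, fn, precision, sensitivity, specificity)
```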

Module 5: Deployment on IBM Bluemix Quiz Answers

Question 1: In general, the testing dataset should be significantly larger than the training dataset. True or false?

True
False

Question 2: Which of the following is NOT a model deployment solution?

Bluemix
CRISP-DM
IBM Collaboration and Deployment Services
SPSS Solution Publisher
All of the above are model deployment solutions

Question 3: Which of the following statements are true of IBM Bluemix? Select all that apply.

Bluemix generally takes about a week to deploy an app
Bluemix is supported by a growing community
Bluemix is closed-source
Bluemix provides a self-service application-hosting environment
Bluemix provides built-in load-balancing capabilities

Predictive Modeling Fundamentals I Final Exam Answers

Question 1: Which of the following suggests that the model is overfitting the data?

High accuracy on training data and high accuracy on testing data
Low accuracy on training data and high accuracy on testing data
Low accuracy on training data and low accuracy on testing data
High accuracy on training data and low accuracy on testing data
None of the above

Question 2: Which of the following tasks would require the use of data mining?

Predicting the outcome of rolling two fair dice
Determining which products in a store are likely to be purchased together
Sorting a customer database by age
Computing the number of products sold over a given time period
All of the above

Question 3: Suppose you have collected data on your customers and you wish to determine the demographics they fall into. Which technique is best suited for this task?

Neural Network
Logistic Regression
Clustering
Linear Regression
Decision Tree

Question 4: Suppose you wish to use data mining in order to determine which customers are most likely to sign up for a new service. Which technique is best suited for this task?

Apriori
Decision Tree
Sequence
K-means
CARMA

Question 5: Which SPSS Modeler node can be used to determine a model’s performance? Select all that apply.

Evaluation Node
Analysis Node
Table Node
Auto Classifier Node
Sequence Node

Question 6: Which of the following is NOT a classification or prediction algorithm in SPSS Modeler?

Linear Regression
Neural Network
Logistic Regression
Discriminant
Apriori

Question 7: Which SPSS Modeler node is used to specify whether a given field is an input or a target?

Auto Classifier Node
Data Audit Node
Table Node
Type Node
Analysis Node

Question 8: Which SPSS Modeler node is useful for exploratory analysis on a data set?

Analysis Node
Auto Classifier Node
Table Node
Data Audit Node
Evaluation Node

Question 9: Which SPSS Modeler node is used to both rename fields and exclude fields from the model?

Restructure Node
Filter Node
Partition Node
Data Audit Node
Evaluation Node

Question 10: What is the formula for the accuracy metric? TP = true positive, TN = true negative, FP = false positive, and FN = false negative.

TN / (TN + FP)
TP / (TP + FN)
TP / (TP + FP)
(FP + FN) / (TP + TN + FP + FN)
(TP + TN) / (TP + TN + FP + FN)
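
For reference, accuracy and its complement, the error rate, can be computed directly from the four counts using their standard definitions. The counts in this sketch are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Fraction of all cases that were classified incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)

# Hypothetical counts from evaluating a classifier on 100 cases.
tp, tn, fp, fn = 40, 45, 5, 10

acc = accuracy(tp, tn, fp, fn)
err = error_rate(tp, tn, fp, fn)
print(acc, err)
```

The two values always sum to 1, which is a quick sanity check when reading a confusion matrix.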

Question 11: Which major data preprocessing step focuses on feature selection and feature extraction?

Data Integration
Data Cleaning
Data Reduction
Data Audit
Data Transformation

Question 12: Which SPSS Modeler node is used to identify missing data and screen out potentially problematic fields?

Auto Data Preparation Node
Auto Classifier Node
Restructure Node
Evaluation Node
Data Audit Node

Question 13: SPSS Modeler provides automated tools that determine the best algorithm to use for an application. True or false?

True
False

Question 14: Which SPSS Modeler node is used for sampling the data set?

Type Node
Partition Node
Data Audit Node
Filter Node
Restructure Node

Question 15: Which phase of the data mining process focuses on gathering insights about the data set?

Data Integration
Data Understanding
Business Understanding
Data Preparation
Data Preprocessing