
**Enroll Here: Predictive Modeling Fundamentals I Cognitive Class Exam Quiz Answers**

**Introduction to Predictive Modeling Fundamentals I**

Predictive modeling is a powerful technique used in data science and machine learning to predict outcomes based on data. It involves building a model from historical data with known outcomes (the training data) and using this model to make predictions on new data where the outcomes are unknown.

**Key Concepts:**

- **Predictive Modeling:** The process of creating a model that predicts the value of a target variable based on input features.
- **Target Variable:** Also known as the dependent variable or response variable, this is the variable you are trying to predict.
- **Features:** These are the input variables used to predict the target variable. They can be numerical, categorical, or even text data.
- **Training Data:** Historical data used to train or build the predictive model. It consists of features and their corresponding known target variable values.
- **Model Building:** The process of selecting an appropriate algorithm, training the model on the training data, and optimizing its parameters to achieve accurate predictions.
- **Prediction:** Applying the trained model to new data to predict the target variable’s value.

**Steps in Predictive Modeling:**

- **Data Collection:** Gather relevant data that includes both features and the target variable.
- **Data Cleaning:** Preprocess the data by handling missing values and outliers and by formatting the data appropriately for modeling.
- **Feature Selection/Engineering:** Choose relevant features that are likely to have a strong influence on predicting the target variable. This might involve transforming variables or creating new features.
- **Model Selection:** Select an appropriate algorithm based on the nature of the problem (e.g., regression for continuous target variables, classification for categorical target variables).
- **Model Training:** Fit the model to the training data so that it learns the relationship between the features and the target variable.
- **Model Evaluation:** Assess the model’s performance using evaluation metrics such as accuracy, precision, recall, or mean squared error, depending on the problem type.
- **Model Tuning:** Fine-tune the model by adjusting hyperparameters or trying different algorithms to improve its performance.
- **Prediction:** Deploy the trained model to make predictions on new data where the target variable is unknown.
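
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a recommended workflow: the data set (hours studied versus exam score), the hold-out rule, and all function names are assumptions made up for the example, and the model is a one-feature linear regression fitted by ordinary least squares.

```python
# Hypothetical toy data: hours studied (feature) -> exam score (target).
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70),
        (6, 73), (7, 79), (8, 82), (9, 88), (10, 91)]


def split(rows, every=5):
    """Hold out every `every`-th row for testing; train on the rest."""
    train = [r for i, r in enumerate(rows) if (i + 1) % every != 0]
    test = [r for i, r in enumerate(rows) if (i + 1) % every == 0]
    return train, test


def fit_line(rows):
    """Model training: least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Prediction: apply the fitted line to a feature value."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, rows):
    """Model evaluation: average squared error on the given rows."""
    return sum((predict(model, x) - y) ** 2 for x, y in rows) / len(rows)


train_rows, test_rows = split(data)
model = fit_line(train_rows)
test_mse = mean_squared_error(model, test_rows)

print("slope, intercept:", model)
print("test MSE:", test_mse)
print("prediction for 11 hours:", predict(model, 11))
```

The model is scored on rows it never saw during fitting, which is the point of keeping a separate test set.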

**Common Algorithms Used:**

- **Linear Regression:** For predicting continuous variables.
- **Logistic Regression:** For predicting categorical variables.
- **Decision Trees:** For both regression and classification tasks.
- **Random Forests:** Ensemble method based on decision trees.
- **Gradient Boosting Machines (GBM):** Another ensemble method that builds trees sequentially.

**Applications:**

- **Finance:** Predicting stock prices, credit risk assessment.
- **Healthcare:** Diagnosing diseases based on symptoms.
- **Marketing:** Targeted advertising and customer segmentation.
- **E-commerce:** Recommender systems and predicting customer behavior.

**Challenges:**

- **Overfitting:** When a model performs well on training data but fails to generalize to new, unseen data.
- **Underfitting:** When a model is too simplistic to capture the underlying patterns in the data.
- **Data Quality:** Poor quality data can lead to inaccurate predictions.
- **Interpretability:** Complex models like neural networks might be accurate but difficult to interpret.
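
Overfitting is easiest to see by comparing accuracy on training data with accuracy on unseen data. The sketch below is a deliberately extreme, made-up case: a `Memorizer` that stores every training example is compared with a simple `ThresholdRule`. Both classes, the toy data, and the ground-truth rule are assumptions for illustration only.

```python
def true_label(x):
    # Hypothetical ground-truth rule used to generate the toy data.
    return 1 if x >= 50 else 0


train = [(x, true_label(x)) for x in range(0, 100, 2)]  # even x values
test = [(x, true_label(x)) for x in range(1, 100, 2)]   # odd x values, unseen


class Memorizer:
    """Overfit model: stores every training example, learns no general rule."""

    def fit(self, rows):
        self.table = dict(rows)
        return self

    def predict(self, x):
        return self.table.get(x, 0)  # unseen input: fall back to class 0


class ThresholdRule:
    """Simpler model: one cut point midway between the two class means."""

    def fit(self, rows):
        zeros = [x for x, y in rows if y == 0]
        ones = [x for x, y in rows if y == 1]
        self.cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        return self

    def predict(self, x):
        return 1 if x >= self.cut else 0


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


memorizer = Memorizer().fit(train)
rule = ThresholdRule().fit(train)

for name, m in [("memorizer", memorizer), ("threshold rule", rule)]:
    print(name, "train:", accuracy(m, train), "test:", accuracy(m, test))
```

The memorizer is perfect on the training rows but no better than guessing on the test rows, while the threshold rule scores almost the same on both sets. A large gap between the two scores is the signal to look for.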

**Conclusion:**

Predictive modeling is a crucial tool in extracting insights and making informed decisions from data. By understanding the fundamentals of predictive modeling, you can leverage data effectively to solve real-world problems across various domains.

**Predictive Modeling Fundamentals I Cognitive Class Certification Answers**

**Module 1: Introduction to Data Mining Quiz Answers**

**Question 1: Which of the following applications would require the use of data mining? Select all that apply.**

- Predicting the outcome of flipping a fair coin
- **Determining which products in a store are likely to be purchased together**
- **Predicting future stock prices using historical records**
- Determining the total number of products sold by a store
- Sorting a student database by gender
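
To make the "purchased together" case concrete, here is a small sketch of the two measures usually used for it, support and confidence. The baskets are invented for the example, and the code only counts item pairs; it is not an implementation of Apriori or CARMA.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]


print("support(bread, butter):", support("bread", "butter"))
print("confidence(butter -> bread):", confidence("butter", "bread"))
print("confidence(bread -> butter):", confidence("bread", "butter"))
```

By contrast, totalling sales or sorting a table needs only a simple query, which is why those options do not count as data mining.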

**Question 2: Which of the following is NOT a section of the Modeler Interface?**

- Nodes
- Palettes
- Stream Canvas
- Stream, Outputs, and Model Manager
- **All of the above are sections of the Modeler Interface**

**Question 3: Which of the following is NOT a part of the Cross-Industry Process for Data Mining?**

- Business Understanding
- Evaluation
- Data Preparation
- **Data Storage**
- Modeling

**Module 2: The Data Mining Process Quiz Answers**

**Question 1: Which phase of the data mining process focuses on understanding the project requirements and objectives?**

- Data Understanding
- Data Exploration
- Data Preprocessing
- **Business Understanding**
- Data Preparation

**Question 2: Which Data Preprocessing task focuses on removing outliers and filling in missing values?**

- Data Integration
- **Data Cleaning**
- Data Transformation
- Data Reduction
- None of the above

**Question 3: The IBM SPSS Modeler supports which data type?**

- Nominal
- Categorical
- Ordinal
- Continuous
- **All of the above**

**Module 3: Modeling Techniques Quiz Answers**

**Question 1: Which of the following methods are commonly used for supervised learning tasks? Select all that apply.**

- **Neural Networks**
- **Decision Trees**
- K-Means
- CARMA
- **Regression**

**Question 2: Classification is a subset of supervised learning that focuses on modeling continuous variables. True or false?**

- True
- **False**

**Question 3: Which of the following algorithms is NOT supported by the SPSS Modeler?**

- Logistic Regression
- CARMA
- K-Means
- Apriori
- **All of the above algorithms are supported**

**Module 4: Model Evaluation Quiz Answers**

**Question 1: What is the term for a negative data point that is incorrectly classified as positive?**

- True Negative
- **False Positive**
- True Positive
- False Negative
- None of the above

**Question 2: Which of the following is NOT a cost-sensitive performance metric?**

- Precision
- **Accuracy**
- Specificity
- Sensitivity
- All of the above metrics are cost-sensitive

**Question 3: What is the formula for the precision metric?**

- (False Positive) / (True Negative + True Positive)
- **(True Positive) / (True Positive + False Positive)**
- (False Positive) / (True Positive + False Positive)
- (True Positive) / (True Positive + False Negative)
- (True Negative) / (True Negative + False Positive)
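
The options above correspond to named metrics: the fourth is sensitivity (recall) and the fifth is specificity. A minimal sketch of how they are computed from predictions follows. The label lists and the `confusion_counts` helper are hypothetical, made up for this example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative
        else:
            if a == positive:
                fn += 1  # predicted negative, really positive
            else:
                tn += 1  # predicted negative, really negative
    return tp, fp, tn, fn


# Hypothetical true labels and model outputs.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)

precision = tp / (tp + fp)    # of the predicted positives, how many are right
sensitivity = tp / (tp + fn)  # of the real positives, how many were found
specificity = tn / (tn + fp)  # of the real negatives, how many were found

print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("precision:", precision)
print("sensitivity:", sensitivity)
print("specificity:", specificity)
```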

**Module 5: Deployment on IBM Bluemix Quiz Answers**

**Question 1: In general, the testing dataset should be significantly larger than the training dataset. True or false?**

- True
- **False**

**Question 2: Which of the following is NOT a model deployment solution?**

- Bluemix
- **CRISP-DM**
- IBM Collaboration and Deployment Services
- SPSS Solution Publisher
- All of the above are model deployment solutions

**Question 3: Which of the following statements are true of IBM Bluemix? Select all that apply.**

- Bluemix generally takes about a week to deploy an app
- **Bluemix is supported by a growing community**
- Bluemix is closed-source
- **Bluemix provides a self-service application-hosting environment**
- **Bluemix provides built-in load-balancing capabilities**

**Predictive Modeling Fundamentals I Final Exam Answers**

**Question 1: Which of the following suggests that the model is overfitting the data?**

- High accuracy on training data and high accuracy on testing data
- Low accuracy on training data and high accuracy on testing data
- Low accuracy on training data and low accuracy on testing data
- **High accuracy on training data and low accuracy on testing data**
- None of the above

**Question 2: Which of the following tasks would require the use of data mining?**

- Predicting the outcome of rolling two fair dice
- **Determining which products in a store are likely to be purchased together**
- Sorting a customer database by age
- Computing the number of products sold over a given time period
- All of the above

**Question 3: Suppose you have collected data on your customers and you wish to determine the demographics they fall into. Which technique is best suited for this task?**

- Neural Network
- Logistic Regression
- **Clustering**
- Linear Regression
- Decision Tree

**Question 4: Suppose you wish to use data mining in order to determine which customers are most likely to sign up for a new service. Which technique is best suited for this task?**

- Apriori
- **Decision Tree**
- Sequence
- K-means
- CARMA

**Question 5: Which SPSS Modeler node can be used to determine a model’s performance? Select all that apply.**

- **Evaluation Node**
- **Analysis Node**
- Table Node
- Auto Classifier Node
- Sequence Node

**Question 6: Which of the following is NOT a classification or prediction algorithm in SPSS Modeler?**

- Linear Regression
- Neural Network
- Logistic Regression
- Discriminant
- **Apriori**

**Question 7: Which SPSS Modeler node is used to specify whether a given field is an input or a target?**

- Auto Classifier Node
- Data Audit Node
- Table Node
- **Type Node**
- Analysis Node

**Question 8: Which SPSS Modeler node is useful for exploratory analysis on a data set?**

- Analysis Node
- Auto Classifier Node
- Table Node
- **Data Audit Node**
- Evaluation Node

**Question 9: Which SPSS Modeler node is used to both rename fields and exclude fields from the model?**

- Restructure Node
- **Filter Node**
- Partition Node
- Data Audit Node
- Evaluation Node

**Question 10: What is the formula for the accuracy metric? TP = true positive, TN = true negative, FP = false positive, and FN = false negative.**

- TN / (TN + FP)
- TP / (TP + FN)
- TP / (TP + FP)
- (FP + FN) / (TP + TN + FP + FN)
- **(TP + TN) / (TP + TN + FP + FN)**
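
As a quick check of the formula, the sketch below computes accuracy from the four counts. The counts are hypothetical and describe an imbalanced data set on which the model labels every case negative. The fourth option above is the error rate, which is simply one minus the accuracy.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    """Share of all cases that were classified incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)


# Hypothetical counts: 100 cases, only 5 real positives, and the model
# predicts "negative" for every single case.
tp, tn, fp, fn = 0, 95, 0, 5

acc = accuracy(tp, tn, fp, fn)
err = error_rate(tp, tn, fp, fn)
sensitivity = tp / (tp + fn)

print("accuracy:", acc)
print("error rate:", err)
print("sensitivity:", sensitivity)
```

The accuracy looks high even though the model misses every positive case. Accuracy weighs all errors equally, so it is usually read alongside metrics such as sensitivity or precision when false negatives and false positives have different costs.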

**Question 11: Which major data preprocessing step focuses on feature selection and feature extraction?**

- Data Integration
- Data Cleaning
- **Data Reduction**
- Data Audit
- Data Transformation

**Question 12: Which SPSS Modeler node is used to identify missing data and screen out potentially problematic fields?**

- **Auto Data Preparation Node**
- Auto Classifier Node
- Restructure Node
- Evaluation Node
- Data Audit Node

**Question 13: SPSS Modeler provides automated tools that determine the best algorithm to use for an application. True or false?**

- **True**
- False

**Question 14: Which SPSS Modeler node is used for sampling the data set?**

- Type Node
- **Partition Node**
- Data Audit Node
- Filter Node
- Restructure Node

**Question 15: Which phase of the data mining process focuses on gathering insights about the data set?**

- Data Integration
- **Data Understanding**
- Business Understanding
- Data Preparation
- Data Preprocessing