Introduction to Predictive Modeling Fundamentals I
Predictive modeling is a technique used in data science and machine learning to predict outcomes from data. It involves building a model from historical data with known outcomes (the training data) and then applying that model to new data whose outcomes are unknown.
Key Concepts:
- Predictive Modeling: The process of creating a model that predicts the value of a target variable based on input features.
- Target Variable: Also known as the dependent variable or response variable, this is the variable you are trying to predict.
- Features: These are the input variables used to predict the target variable. They can be numerical, categorical, or even text data.
- Training Data: Historical data used to train or build the predictive model. It consists of features and their corresponding known target variable values.
- Model Building: The process of selecting an appropriate algorithm, training the model on the training data, and optimizing its parameters to achieve accurate predictions.
- Prediction: Applying the trained model to new data to predict the target variable’s value.
Steps in Predictive Modeling:
- Data Collection: Gather relevant data that includes both features and the target variable.
- Data Cleaning: Preprocess the data by handling missing values and outliers and by formatting the data appropriately for modeling.
- Feature Selection/Engineering: Choose features that are likely to have a strong influence on the target variable. This might involve transforming existing variables or creating new features.
- Model Selection: Select an appropriate algorithm based on the nature of the problem (e.g., regression for continuous target variables, classification for categorical target variables).
- Model Training: Fit the model to the training data so that it learns the relationship between the features and the target variable.
- Model Evaluation: Assess the model’s performance using evaluation metrics suited to the problem type, such as accuracy, precision, and recall for classification, or mean squared error for regression.
- Model Tuning: Fine-tune the model by adjusting hyperparameters or trying different algorithms to improve its performance.
- Prediction: Deploy the trained model to make predictions on new data where the target variable is unknown.
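The steps above can be sketched end to end on a toy problem. The following is a minimal illustration in plain Python, not a production workflow: the data is synthetic, the single feature and the relationship y = 3x + 2 are assumptions made up for the example, and the helper names (`fit_line`, `mse`) are hypothetical.

```python
import random

# Data collection (simulated): one feature x and a target y that roughly
# follows y = 3x + 2 with some random noise. This data is made up.
random.seed(0)
data = [(x, 3.0 * x + 2.0 + random.gauss(0, 0.5)) for x in range(50)]

# Hold out part of the data so evaluation uses rows the model never saw.
random.shuffle(data)
train, test = data[:40], data[40:]

def fit_line(rows):
    """Model training: ordinary least squares for a single feature."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mse(rows, slope, intercept):
    """Model evaluation: mean squared error of the fitted line on rows."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)

slope, intercept = fit_line(train)
test_mse = mse(test, slope, intercept)

# Prediction: apply the trained model to a new input with unknown target.
new_x = 60
prediction = slope * new_x + intercept

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("test MSE:", round(test_mse, 3))
print("prediction for x = 60:", round(prediction, 2))
```

Data cleaning, feature engineering, and tuning are omitted here because the synthetic data needs none of them; on real data those steps usually take most of the effort.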
Common Algorithms Used:
- Linear Regression: For predicting continuous variables.
- Logistic Regression: For predicting categorical outcomes, most commonly binary ones.
- Decision Trees: For both regression and classification tasks.
- Random Forests: Ensemble method based on decision trees.
- Gradient Boosting Machines (GBM): Another ensemble method that builds trees sequentially, with each new tree correcting the errors of the trees before it.
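To make the difference between the first two algorithms in this list concrete, here is a small sketch in Python. The coefficients are hand-picked, hypothetical values rather than anything learned from data, and the function names are illustrative only.

```python
import math

# Hypothetical, hand-picked coefficients; a real model would learn them
# from training data.
SLOPE, INTERCEPT = 0.8, -4.0

def linear_regression_predict(x):
    """Linear regression: the output is an unbounded continuous value."""
    return SLOPE * x + INTERCEPT

def logistic_regression_predict(x):
    """Logistic regression: the same linear score mapped to a probability."""
    score = SLOPE * x + INTERCEPT
    return 1.0 / (1.0 + math.exp(-score))

def classify(x, threshold=0.5):
    """Turn the probability into a class label using a decision threshold."""
    return 1 if logistic_regression_predict(x) >= threshold else 0

for x in (2, 8):
    print(x, linear_regression_predict(x),
          round(logistic_regression_predict(x), 3), classify(x))
```

The linear model returns a number on any scale, while the logistic model returns a value strictly between 0 and 1 that is then thresholded into a class.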
Applications:
- Finance: Predicting stock prices and assessing credit risk.
- Healthcare: Diagnosing diseases based on symptoms.
- Marketing: Targeted advertising and customer segmentation.
- E-commerce: Building recommender systems and predicting customer behavior.
Challenges:
- Overfitting: When a model performs well on training data but fails to generalize to new, unseen data.
- Underfitting: When a model is too simplistic to capture the underlying patterns in the data.
- Data Quality: Poor quality data can lead to inaccurate predictions.
- Interpretability: Complex models like neural networks might be accurate but difficult to interpret.
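Overfitting is easiest to see with a deliberately extreme example. The sketch below, in plain Python, compares a "model" that only memorizes its training rows with a simple rule that captures the real pattern. The data, the lookup-table model, and the hard-coded threshold rule are all assumptions invented for illustration.

```python
# Toy data: the label is 1 when x >= 50, otherwise 0.
rows = [(x, int(x >= 50)) for x in range(100)]
train = [r for r in rows if r[0] % 2 == 0]  # even x values
test = [r for r in rows if r[0] % 2 == 1]   # odd x values, unseen in training

# A deliberately extreme "model": memorize every training row exactly.
lookup = dict(train)

def memorizer(x):
    # Returns the memorized label, or 0 when x was never seen in training.
    return lookup.get(x, 0)

def threshold_rule(x):
    # A simple rule matching the real pattern (hard-coded here for brevity).
    return int(x >= 50)

def accuracy(model, data):
    # Fraction of rows where the model's output equals the known label.
    return sum(model(x) == y for x, y in data) / len(data)

print("memorizer  train:", accuracy(memorizer, train))       # 1.0
print("memorizer  test: ", accuracy(memorizer, test))        # 0.5
print("threshold  train:", accuracy(threshold_rule, train))  # 1.0
print("threshold  test: ", accuracy(threshold_rule, test))   # 1.0
```

The memorizer is perfect on the training rows but no better than guessing on the test rows, which is the signature of overfitting: high training accuracy paired with low testing accuracy.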
Conclusion:
Predictive modeling is a key tool for extracting insights from data and making informed decisions. By understanding its fundamentals, you can use data effectively to solve real-world problems across many domains.
Predictive Modeling Fundamentals I Cognitive Class Certification Answers
Module 1: Introduction to Data Mining Quiz Answers
Question 1: Which of the following applications would require the use of data mining? Select all that apply.
- Predicting the outcome of flipping a fair coin
- Determining which products in a store are likely to be purchased together
- Predicting future stock prices using historical records
- Determining the total number of products sold by a store
- Sorting a student database by gender
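As a rough illustration of what finding products that are purchased together involves, the following Python sketch counts how often each pair of items appears in the same basket. The baskets are made-up data, and simple pair counting is only a simplified stand-in for a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count every unordered pair of items that occurs in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
support = top_count / len(baskets)  # share of baskets containing the pair

print(top_pair, top_count, support)  # ('bread', 'butter') 3 0.75
```

Unlike sorting or totaling, this kind of question cannot be answered by a single lookup or sum over the data; the pattern has to be discovered from co-occurrences.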
Question 2: Which of the following is NOT a section of the Modeler Interface?
- Nodes
- Palettes
- Stream Canvas
- Stream, Outputs, and Model Manager
- All of the above are sections of the Modeler Interface
Question 3: Which of the following is NOT a part of the Cross-Industry Process for Data Mining?
- Business Understanding
- Evaluation
- Data Preparation
- Data Storage
- Modeling
Module 2: The Data Mining Process Quiz Answers
Question 1: Which phase of the data mining process focuses on understanding the project requirements and objectives?
- Data Understanding
- Data Exploration
- Data Preprocessing
- Business Understanding
- Data Preparation
Question 2: Which Data Preprocessing task focuses on removing outliers and filling in missing values?
- Data Integration
- Data Cleaning
- Data Transformation
- Data Reduction
- None of the above
Question 3: The IBM SPSS Modeler supports which data type?
- Nominal
- Categorical
- Ordinal
- Continuous
- All of the above
Module 3: Modeling Techniques Quiz Answers
Question 1: Which of the following methods are commonly used for supervised learning tasks? Select all that apply.
- Neural Networks
- Decision Trees
- K-Means
- CARMA
- Regression
Question 2: Classification is a subset of supervised learning that focuses on modeling continuous variables. True or false?
- True
- False
Question 3: Which of the following algorithms is NOT supported by the SPSS Modeler?
- Logistic Regression
- CARMA
- K-Means
- Apriori
- All of the above algorithms are supported
Module 4: Model Evaluation Quiz Answers
Question 1: What is the term for a negative data point that is incorrectly classified as positive?
- True Negative
- False Positive
- True Positive
- False Negative
- None of the above
Question 2: Which of the following is NOT a cost-sensitive performance metric?
- Precision
- Accuracy
- Specificity
- Sensitivity
- All of the above metrics are cost-sensitive
Question 3: What is the formula for the precision metric?
- (False Positive) / (True Negative + True Positive)
- (True Positive) / (True Positive + False Positive)
- (False Positive) / (True Positive + False Positive)
- (True Positive) / (True Positive + False Negative)
- (True Negative) / (True Negative + False Positive)
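The confusion-matrix terms used in this module can be checked with a few lines of Python. This is a minimal sketch using the standard definitions of precision, sensitivity (recall), and specificity; the label lists are made-up data and the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # negative called positive
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # positive called negative
    return tp, tn, fp, fn

# Made-up labels: 4 actual positives followed by 6 actual negatives.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)

precision = tp / (tp + fp)    # of the predicted positives, how many are right
sensitivity = tp / (tp + fn)  # of the actual positives, how many are found
specificity = tn / (tn + fp)  # of the actual negatives, how many are found

print(tp, tn, fp, fn)  # 3 4 2 1
print(precision, sensitivity, round(specificity, 3))  # 0.6 0.75 0.667
```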
Module 5: Deployment on IBM Bluemix Quiz Answers
Question 1: In general, the testing dataset should be significantly larger than the training dataset. True or false?
- True
- False
Question 2: Which of the following is NOT a model deployment solution?
- Bluemix
- CRISP-DM
- IBM Collaboration and Deployment Services
- SPSS Solution Publisher
- All of the above are model deployment solutions
Question 3: Which of the following statements are true of IBM Bluemix? Select all that apply.
- Bluemix generally takes about a week to deploy an app
- Bluemix is supported by a growing community
- Bluemix is closed-source
- Bluemix provides a self-service application-hosting environment
- Bluemix provides built-in load-balancing capabilities
Predictive Modeling Fundamentals I Final Exam Answers
Question 1: Which of the following suggests that the model is overfitting the data?
- High accuracy on training data and high accuracy on testing data
- Low accuracy on training data and high accuracy on testing data
- Low accuracy on training data and low accuracy on testing data
- High accuracy on training data and low accuracy on testing data
- None of the above
Question 2: Which of the following tasks would require the use of data mining?
- Predicting the outcome of rolling two fair dice
- Determining which products in a store are likely to be purchased together
- Sorting a customer database by age
- Computing the number of products sold over a given time period
- All of the above
Question 3: Suppose you have collected data on your customers and you wish to determine the demographics they fall into. Which technique is best suited for this task?
- Neural Network
- Logistic Regression
- Clustering
- Linear Regression
- Decision Tree
Question 4: Suppose you wish to use data mining in order to determine which customers are most likely to sign up for a new service. Which technique is best suited for this task?
- Apriori
- Decision Tree
- Sequence
- K-means
- CARMA
Question 5: Which SPSS Modeler node can be used to determine a model’s performance? Select all that apply.
- Evaluation Node
- Analysis Node
- Table Node
- Auto Classifier Node
- Sequence Node
Question 6: Which of the following is NOT a classification or prediction algorithm in SPSS Modeler?
- Linear Regression
- Neural Network
- Logistic Regression
- Discriminant
- Apriori
Question 7: Which SPSS Modeler node is used to specify whether a given field is an input or a target?
- Auto Classifier Node
- Data Audit Node
- Table Node
- Type Node
- Analysis Node
Question 8: Which SPSS Modeler node is useful for exploratory analysis on a data set?
- Analysis Node
- Auto Classifier Node
- Table Node
- Data Audit Node
- Evaluation Node
Question 9: Which SPSS Modeler node is used to both rename fields and exclude fields from the model?
- Restructure Node
- Filter Node
- Partition Node
- Data Audit Node
- Evaluation Node
Question 10: What is the formula for the accuracy metric? TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
- TN / (TN + FP)
- TP / (TP + FN)
- TP / (TP + FP)
- (FP + FN) / (TP + TN + FP + FN)
- (TP + TN) / (TP + TN + FP + FN)
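As a worked example of the accuracy metric, the Python sketch below uses the standard definition, the share of all predictions that are correct. The counts describe a made-up imbalanced data set with 5 positives and 95 negatives, scored for a hypothetical model that labels every row negative.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of actual positives that the model finds."""
    return tp / (tp + fn)

# Hypothetical counts: the model predicts "negative" for all 100 rows,
# so it misses all 5 positives and gets all 95 negatives right.
tp, tn, fp, fn = 0, 95, 0, 5

print(accuracy(tp, tn, fp, fn))  # 0.95
print(recall(tp, fn))            # 0.0
```

The model scores 95% accuracy while finding none of the positive cases, which is why accuracy is usually read alongside metrics such as precision and recall when classes are imbalanced.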
Question 11: Which major data preprocessing step focuses on feature selection and feature extraction?
- Data Integration
- Data Cleaning
- Data Reduction
- Data Audit
- Data Transformation
Question 12: Which SPSS Modeler node is used to identify missing data and screen out potentially problematic fields?
- Auto Data Preparation Node
- Auto Classifier Node
- Restructure Node
- Evaluation Node
- Data Audit Node
Question 13: SPSS Modeler provides automated tools that determine the best algorithm to use for an application. True or false?
- True
- False
Question 14: Which SPSS Modeler node is used for sampling the data set?
- Type Node
- Partition Node
- Data Audit Node
- Filter Node
- Restructure Node
Question 15: Which phase of the data mining process focuses on gathering insights about the data set?
- Data Integration
- Data Understanding
- Business Understanding
- Data Preparation
- Data Preprocessing