Enroll Here: Data Science Methodology Cognitive Class Exam Quiz Answers
Introduction to Data Science Methodology
Data science methodology is the systematic approach used to derive insights and knowledge from data. Here is a structured overview:
1. Problem Understanding
- Define the Problem: Clearly articulate the business problem or question that data science aims to address.
- Goals: Establish measurable objectives that the analysis should achieve.
- Stakeholder Alignment: Ensure alignment between data scientists and stakeholders on goals and expectations.
2. Data Collection
- Identify Data Sources: Determine what data is available and relevant to the problem.
- Data Acquisition: Collect structured or unstructured data from sources such as databases, APIs, and files.
- Data Exploration: Perform initial exploration to assess data quality, understand its characteristics, and identify potential issues.
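As a rough illustration of the initial exploration step, the sketch below profiles one numeric column at a time: it counts missing values and computes a few descriptive statistics. The `records` data and the `profile` helper are hypothetical and exist only for this example; real projects would typically use a dataframe library instead.

```python
import statistics

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": 45, "income": None},
    {"age": 29, "income": 61000},
    {"age": None, "income": 48000},
]

def profile(rows, column):
    """Summarize one numeric column: missing count and basic statistics."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

age_profile = profile(records, "age")
income_profile = profile(records, "income")
print(age_profile)
print(income_profile)
```

A profile like this is usually enough to spot columns with many gaps or implausible ranges before any cleaning starts.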
3. Data Preparation
- Data Cleaning: Handle missing data, outliers, and inconsistencies.
- Data Transformation: Format data as required (e.g., normalization, scaling).
- Feature Engineering: Create new features from existing data to improve model performance.
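The three preparation activities above can be sketched in a few lines. This is a minimal example under stated assumptions: median imputation stands in for "handling missing data", min-max scaling for "normalization", and a debt-to-income ratio for "feature engineering". The data and helper names are hypothetical, and other techniques may suit a given problem better.

```python
import statistics

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw columns; None marks a missing income.
incomes = [52000, None, 61000, 48000]
debts = [13000, 8000, 30500, 12000]

# Data cleaning: fill the missing income.
clean_incomes = impute_median(incomes)
# Data transformation: scale incomes to [0, 1].
scaled_incomes = min_max_scale(clean_incomes)
# Feature engineering: derive a new ratio feature from two existing columns.
debt_to_income = [d / i for d, i in zip(debts, clean_incomes)]

print(clean_incomes)
print(scaled_incomes)
print(debt_to_income)
```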
4. Data Modeling
- Select Modeling Techniques: Choose appropriate algorithms or methods based on the problem and data.
- Model Training: Build a predictive or descriptive model using the prepared data.
- Model Evaluation: Assess model performance using metrics relevant to the problem (e.g., accuracy, precision, recall).
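To make the evaluation metrics concrete, here is a small sketch that splits data into training and test sets and computes accuracy, precision, and recall from true and predicted labels. The helper names, the 25% test fraction, and the label lists are assumptions for illustration; no actual model is trained here.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical dataset of eight row ids, split into train and test.
rows = list(range(8))
train, test = train_test_split(rows)

# Hypothetical true labels and model predictions on held-out data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(len(train), len(test))
print(metrics)
```

Which metric matters most depends on the problem: recall is often prioritized when missing a positive case is costly, precision when false alarms are costly.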
5. Model Deployment
- Deploy the Model: Integrate the model into the existing business process or software infrastructure.
- Monitor Performance: Continuously monitor the model to ensure it performs as expected in real-world scenarios.
- Feedback Loop: Gather feedback from users and stakeholders to iterate and improve the model.
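One simple way to monitor a deployed model is to track its accuracy over the most recent predictions and flag it for review when that accuracy drops below a threshold. The sketch below assumes that true outcomes eventually become available for comparison; the `AccuracyMonitor` class, window size, and threshold are hypothetical choices, not a standard API.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window=5, min_accuracy=0.7):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """Flag the model once the window is full and accuracy is too low."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy

monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
# Hypothetical (predicted, actual) pairs arriving after deployment.
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0), (0, 0)]:
    monitor.record(predicted, actual)

print(monitor.accuracy())
print(monitor.needs_review())
```

A flag from a monitor like this would feed the feedback loop: it signals that the model may need to be refined and redeployed.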
6. Iterate and Refine
- Iterative Process: Data science methodology often involves iterating through the above steps based on findings and feedback.
- Refinement: Improve models, data quality, or methodology based on insights gained during deployment and usage.
7. Conclusion and Reporting
- Summarize Findings: Present insights and findings to stakeholders in a clear and understandable manner.
- Document the Process: Maintain documentation of the methodology, data sources, and decisions made throughout the project.
Key Considerations
- Ethical Considerations: Address privacy, bias, and fairness issues in data usage and model development.
- Domain Knowledge: Collaboration with domain experts is crucial to ensure the relevance and applicability of the analysis.
Data science methodology provides a structured framework to navigate the complexities of working with data, ensuring that insights extracted contribute meaningfully to solving business problems or answering research questions.
Data Science Methodology Cognitive Class Certification Answers
Module 1 – From Problem to Approach Quiz Answers
Question 1: Select the correct statement.
- A methodology is an application for a computer program.
- A methodology is a set of instructions.
- A methodology is a system of methods used in a particular area of study or activity.
- All of the above statements are correct.
Question 2: Select the correct statement.
- The data science methodology described in this course is only used by certified data scientists.
- The data science methodology described in this course is outlined by John Rollins from IBM.
- The data science methodology described in this course is limited to IBM.
- None of the above statements are correct.
Question 3: Select the correct statement.
- The first stage of the data science methodology is data understanding.
- The first stage of the data science methodology is modeling.
- The first stage of the data science methodology is business understanding.
- The first stage of the data science methodology is data collection.
Module 2 – From Requirements to Collection Quiz Answers
Question 1: Select the correct statement.
- If a problem is a dish, then data is an answer.
- If a problem is a dish, then data is an ingredient.
- If a problem is a dish, then data is a list of information.
- None of the above statements are correct.
Question 2: Select the correct statement.
- A data requirement is never refined.
- A data requirement is set in stone.
- A data requirement is the initial set of ingredients.
- None of the above statements are correct.
Question 3: Select the correct statement.
- Data scientists determine how to prepare the data.
- Data scientists identify the data that is required for data modeling.
- Data scientists determine how to collect the data.
- All of the above.
Module 3 – From Understanding to Preparation Quiz Answers
Question 1: Select the correct statement about data preparation.
- Data preparation involves properly formatting the data.
- Data preparation involves correcting invalid values and addressing outliers.
- Data preparation involves removing duplicate data.
- Data preparation involves addressing missing values.
- All of the above statements are correct.
Question 2: Select the correct statement about data understanding.
- Data understanding encompasses removing redundant data.
- Data understanding encompasses all activities related to constructing the dataset.
- Data understanding encompasses sorting the data.
- All of the above statements about data understanding are correct.
Question 3: Select the correct statement about what data scientists and database administrators (DBAs) do during data preparation.
- During data preparation, data scientists and DBAs identify missing data.
- During data preparation, data scientists and DBAs determine the timing of events.
- During data preparation, data scientists and DBAs aggregate the data and merge them from different sources.
- During data preparation, data scientists and DBAs define the variables to be used in the model.
- All of the above statements are correct.
Module 4 – From Modeling to Evaluation Quiz Answers
Question 1: Select the correct statement.
- A training set is used for data visualization.
- A training set is used for predictive modeling.
- A training set is used for statistical analysis.
- A training set is used for descriptive modeling.
- None of the above statements are correct.
Question 2: A statistician calls a false negative a type I error and a false positive a type II error.
- True
- False
Question 3: Select the correct statement about model evaluation.
- Model evaluation can include statistical significance testing.
- Model evaluation includes ensuring that the data are properly handled and interpreted.
- Model evaluation includes ensuring the model is designed as intended.
- Model evaluation includes ensuring that the model is working as intended.
- All of the above statements are correct.
Module 5 – From Deployment to Feedback Quiz Answers
Question 1: The final stages of the data science methodology are an iterative cycle between model evaluation, deployment, and feedback.
- True
- False
Question 2: What is model evaluation used for?
- Assessing the model after getting deployed.
- Assessing the model before getting deployed.
- Determining if the model is good for other uses.
- All of the above.
- None of the above.
Question 3: Select the correct statement about the feedback stage of the data science methodology.
- Feedback is essential to the long-term viability of the model.
- Feedback is not helpful and gets in the way.
- Feedback is not required once launched.
- None of the above statements are correct.
Data Science Methodology Final Exam Answers
Question 1: Select the correct sentence about the data science methodology explained in the course.
- Data science methodology is not an iterative process – one does not go back and forth between methodological steps.
- Data science methodology is a specific strategy that guides processes and activities relating to data science only for text analytics.
- Data science methodology always starts with data collection.
- Data science methodology provides the data scientist with a framework for how to proceed to obtain answers.
- Data science methodology depends on a specific set of technologies or tools.
Question 2: Why is business understanding an important stage of the data science methodology?
- Because it shapes the rest of the methodological steps.
- Because it clearly defines the problem and the needs from a business perspective.
- Because it ensures that the work generates the intended solution.
- Because it involves domain expertise.
- All of the above.
Question 3: A data scientist determines that building a recommender system is the solution for a particular business problem at hand. What stage of the data science methodology does this represent?
- Modeling
- Deployment
- Model evaluation
- Analytic approach
- Data understanding
Question 4: Which of the following represent the two important characteristics of the data science methodology?
- It is a highly iterative process and immediately ends when the model is deployed.
- It is not an iterative process and it never ends.
- It has no endpoint because data collection occurs before identifying the data requirements.
- It immediately ends when the model is deployed because no feedback is required.
- It is a highly iterative process and it never ends.
Question 5: What do data scientists typically use for exploratory analysis of data and to get acquainted with them?
- They use support vector machines and neural networks as feature extraction techniques.
- They begin with regression, classification, or clustering.
- They use deep learning.
- They use descriptive statistics and data visualization techniques.
- All of the above.
Question 6: Select the correct statement about data preparation.
- Data preparation cannot be accelerated through automation.
- Data preparation involves dealing with missing or improperly coded data and can include using text analysis to structure unstructured or semi-structured text data.
- Data preparation is typically the least time-consuming methodological step.
- All of the above.
- None of the above.
Question 7: Which statement best describes the modeling stage of the data science methodology?
- Modeling is followed by the analytic approach stage.
- Modeling may require testing multiple algorithms and parameters.
- Modeling is always based on predictive models.
- Modeling always uses training and test sets.
- All of the above.
Question 8: Which of the following statements best describe the model evaluation stage of the data science methodology?
- Model evaluation may entail statistical significance tests, particularly when additional proof is necessary to justify some of the emerging recommendations.
- Model evaluation is important because it examines how well the model performs in the context of the business problem.
- Model evaluation entails computing graphs and/or various diagnostic measures such as a confusion matrix.
- Model evaluation is done using a test set if the model is a predictive one.
- All of the above.
Question 9: What does deploying a model into production represent?
- It represents the end of the iterative process that includes feedback, model refinement, and redeployment.
- It represents the beginning of an iterative process that includes feedback, model refinement and redeployment and requires the input of additional groups, such as marketing personnel and business owners.
- It represents the final data science product.
- None of the above.
Question 10: A data scientist, John, was asked to help reduce readmission rates at a local hospital. After some time, John provided a model that predicted which patients were more likely to be readmitted to the hospital and declared that his work was done. Which of the following best describes this scenario?
- John only provided one model as a solution and he should have provided multiple models.
- The scenario is already optimal.
- Even though John only submitted one solution, it might be a good one. However, John needed feedback on his model from the hospital to confirm that his model was able to address the problem appropriately and sufficiently.
- John’s mistake is that he lied in the analytic approach step of the data science methodology.
- John still needed to collect more data.
Question 11: A car company asked a data scientist to determine what type of customers are more likely to purchase their vehicles. However, the data comes from several sources and is in a relatively “raw format”. What kind of processing can the data scientist perform on the data to prepare it for modeling?
- Feature engineering.
- Transforming the data into more useful variables.
- Combining the data from the various sources.
- Addressing missing/invalid values.
- All of the above.
Question 12: High-performance, massively parallel systems can be used to facilitate the following methodological steps.
- Data Preparation and Modeling.
- Modeling only.
- Deployment.
- Business Understanding.
- All of the above.
Question 13: Data scientists may use either a “top-down” approach or a “bottom-up” approach to data science. These two approaches refer to:
- “Top-down” approach – the data, when sorted, is modeled from the “top” of the data towards the “bottom”. “Bottom-up” approach – the data is modeled from the “bottom” of the data to the “top”.
- “Top-down” approach – models are fit before the data is explored. “Bottom-up” approach – data is explored, and then a model is fit.
- “Top-down” approach – first defining a business problem then analyzing the data to find a solution. “Bottom-up” approach – starting with the data, and then coming up with a business problem based on the data.
- “Top-down” approach – using massively parallel warehouses with huge data volumes as the data source. “Bottom-up” approach – using a sample of small data before using large data.
- All of the above.
Question 14: The following are all examples of rapidly evolving technologies that affect data science methodology EXCEPT for?
- Data sampling.
- Automation.
- Text analysis.
- Platform growth.
- In-database analytics.
Question 15: Data understanding involves all of the following EXCEPT for?
- Discovering initial insights about the data.
- Visualizing the data.
- Assessing data quality.
- Understanding the content of the data.
- Gathering and analyzing feedback for assessment of the model’s performance.
Question 16: For predictive models, a test set, which is similar to – but independent of – the training set, is used to determine how well the model predicts outcomes. This is an example of what step in the methodology?
- Data preparation.
- Deployment.
- Analytic approach.
- Model evaluation.
- Data requirements.
Question 17: “When ______ data is available (such as customer call center logs or physicians’ notes in unstructured or semi-structured format), _______ analytics can be useful in deriving new structured variables to enrich the set of predictors and improve model accuracy.” Which of the following most appropriately fills in the blanks?
- text; text
- market; statistical
- big; digital
- highly structured; text
- text; predictive
Question 18: Typically, in a predictive model, the training set and the test set are very different and independent, such as having a different set of variables or structure.
- True
- False
Question 19: Data scientists may frequently return to a previous stage to make adjustments, as they learn more about the data and the modeling.
- True
- False
Question 20: Why should data scientists maintain continuous communication with business sponsors throughout a project?
- So that business sponsors can provide domain expertise.
- So that business sponsors can ensure the work remains on track to generate the intended solution.
- So that business sponsors can review intermediate findings.
- All of the above.
- None of the above.