Remember Data Modeling?
(Nguyen 2009)
How to become better at data modeling?
“We’ve seen large organizations hire 30+ PhDs without clear business alignment upfront. They then emerge from a six week research hole only to realize they had misunderstood the target variable, rendering the analysis irrelevant.”
(Hotz 2024b)
Introducing:
Cross-industry standard process for data mining
CRISP-DM
CRISP-DM is
CRISP-DM is not
CRISP-DM can easily be combined with management frameworks.
It is also easily adaptable to AI-related projects (Saltz 2024).
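CRISP-DM identifies six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.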
Each of the six phases is further broken down into a list of tasks.
Leaper (2009)
We will step through some key points here.
For further reading, please refer to the online material (Chapman et al. 1999; Martínez-Plumed et al. 2021; Hotz 2024c).
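The phase-and-task breakdown can be sketched as a small data structure. This is only an illustration, not part of CRISP-DM itself: the names `Phase`, `TASKS`, and `next_open_task` are hypothetical, and the task labels are meant to follow the generic tasks in Chapman et al. (1999) and should be checked against that guide.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases in their nominal order."""
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Assumption: task labels paraphrase the generic tasks of the CRISP-DM guide.
TASKS = {
    Phase.BUSINESS_UNDERSTANDING: [
        "Determine business objectives",
        "Assess situation",
        "Determine data mining goals",
        "Produce project plan",
    ],
    Phase.DATA_UNDERSTANDING: [
        "Collect initial data",
        "Describe data",
        "Explore data",
        "Verify data quality",
    ],
    Phase.DATA_PREPARATION: [
        "Select data",
        "Clean data",
        "Construct data",
        "Integrate data",
        "Format data",
    ],
    Phase.MODELING: [
        "Select modeling technique",
        "Generate test design",
        "Build model",
        "Assess model",
    ],
    Phase.EVALUATION: [
        "Evaluate results",
        "Review process",
        "Determine next steps",
    ],
    Phase.DEPLOYMENT: [
        "Plan deployment",
        "Plan monitoring and maintenance",
        "Produce final report",
        "Review project",
    ],
}


def next_open_task(done):
    """Return the first (phase, task) pair that is not in `done`, or None."""
    for phase in Phase:  # Enum members iterate in definition order
        for task in TASKS[phase]:
            if (phase, task) not in done:
                return phase, task
    return None


if __name__ == "__main__":
    phase, task = next_open_task(set())
    print(f"Start with: {phase.value} / {task}")
```

In practice CRISP-DM is iterative and projects move back and forth between phases, so the strictly linear walk in `next_open_task` is a simplification.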
You should first “thoroughly understand, from a business perspective, what the customer really wants to accomplish.”
Chapman et al. (1999)
Business Understanding leads to the definition of business success criteria.
The focus shifts to the identification, collection, and analysis of data sets.
Data Preparation often takes up most of the time spent on a project; 80% is a commonly cited rule of thumb.
What is meant by CRISP-DM’s modeling phase?
This does not mean data modeling as outlined in the presentation about data modeling!
Do the models meet business success criteria?
Common Data Science KPI Groups
Category | Question | Example |
---|---|---|
Traditional metrics | How are we performing relative to plan? | Time, budget, and scope variance to plan |
Agile metrics | How frequently are we providing value? | Velocity metrics, cycle times |
Lean metrics | What percent of our time is value-add? | Efficiency |
Financial metrics | Are we creating organizational financial value? | Revenue and cost metrics, payback period, ROI, NPV |
Organizational goals | Is my project impacting organizational goals? | Varies widely |
Artifact creation | Are we creating reusable artifacts? | Number / value of artifacts created |
Competencies gained | Are team members gaining valuable skillsets? | Number / value of competencies gained |
Stakeholder satisfaction | Are my project stakeholders satisfied? | Net promoter score; “gut feel” assessment |
Software metrics | What is the quality of the overall system being developed? | Defect count, defect resolution rate, latency, test coverage |
Model performance | How are the models performing? | RMSE, F1, recall, precision, ROC, p-value |
(Hotz 2024a)
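Several of the model performance examples in the last row have simple closed-form definitions. The following is a minimal sketch in plain Python, assuming binary labels encoded as 0 and 1; the function names and sample data are hypothetical and are not taken from Hotz (2024a).

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error for a regression model."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical classifier output: 3 true positives, 1 false positive, 1 false negative.
    labels = [1, 0, 1, 1, 0, 1]
    predictions = [1, 0, 0, 1, 1, 1]
    print(precision_recall_f1(labels, predictions))

    # Hypothetical regression output.
    print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

Which of these numbers matters depends on the business success criteria: a model with a high F1 can still fail the project if the cost of false negatives was the real concern.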
Anything missing on this list?
A common pitfall in all data-related projects:
Data does not fit the question.
Data and the models trained on it are not good at things that haven’t happened before!
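One way to see this pitfall is to fit a simple model on a narrow range of observations and then ask it about a situation far outside that range. The sketch below is a toy illustration under stated assumptions: the data follow a made-up quadratic relationship, the model is an ordinary least-squares line, and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Assumption: the true process is quadratic, but we only observed x from 0 to 5.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [x ** 2 for x in train_x]
model = fit_line(train_x, train_y)

# Error on situations the model has seen.
in_sample_error = max(abs(predict(model, x) - y) for x, y in zip(train_x, train_y))

# Error on a situation that never occurred in the training data.
unseen_x = 20
extrapolation_error = abs(predict(model, unseen_x) - unseen_x ** 2)

if __name__ == "__main__":
    print(f"largest error inside the observed range: {in_sample_error:.2f}")
    print(f"error at unseen x={unseen_x}: {extrapolation_error:.2f}")
```

The line looks acceptable inside the observed range and is badly wrong outside it, even though nothing about the fitting procedure was incorrect.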