What is the workflow or process of a data scientist? What tools do they use in data science workflows?
- Dr Dilek Celik
- Jun 25
Updated: Jul 9

In the workflow diagram, note all the purple arrows pointing backward: the data science workflow is non-linear, iterative, and cyclical. You can’t know the best path from the start.
Each stage demands different skills and tools.
Functional stack:
Stage 1: Ask a question (relevant to your organization)
Skills: scientific thinking, domain knowledge, curiosity, business sense
Tools: your brain, expert input, experience
Stage 2: Get the data
Skills: cleaning, querying, scraping, coding
Tools: SQL, Python, pandas, (Spark)
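As a rough illustration of Stage 2, here is a minimal sketch that queries data with SQL and cleans it with pandas. The `orders` table, its columns, and the in-memory SQLite database are all made up for this example; a real project would point at its own data source.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 20.0),
        (2, 'bob', NULL),
        (3, 'alice', 35.5),
        (3, 'alice', 35.5);
""")

# Query with SQL and load the result into a pandas DataFrame.
orders = pd.read_sql_query(
    "SELECT order_id, customer, amount FROM orders", conn
)
conn.close()

# Basic cleaning: drop exact duplicate rows and rows with a missing amount.
clean_orders = (
    orders.drop_duplicates()
    .dropna(subset=["amount"])
    .reset_index(drop=True)
)
print(clean_orders)
```

Which cleaning steps are appropriate depends on the question from Stage 1; dropping rows with missing values is only one option.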
Stage 3: Explore the data
Skills: pattern recognition, hypothesis building
Tools: matplotlib, numpy, scipy, pandas, (Spark)
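A first pass at Stage 3 might look like the sketch below, which summarizes a small table and checks how strongly two columns move together. The dataset (`ad_spend`, `sales`, `region`) is invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: weekly advertising spend and sales by region.
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50],
    "sales": [25, 41, 62, 79, 103],
    "region": ["north", "south", "north", "south", "north"],
})

# Summary statistics for the numeric columns.
summary = df[["ad_spend", "sales"]].describe()

# Average sales per region, a quick way to spot group differences.
mean_sales_by_region = df.groupby("region")["sales"].mean()

# Pearson correlation between spend and sales.
corr = np.corrcoef(df["ad_spend"], df["sales"])[0, 1]

print(summary)
print(mean_sales_by_region)
print(f"correlation: {corr:.3f}")
```

A high correlation here is a prompt for a hypothesis, not a conclusion; it is the kind of pattern that feeds into Stage 4.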
Stage 4: Model the data
Skills: regression, ML, validation
Tools: scikit-learn, pandas, (Spark, MLlib)
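To make the Stage 4 skills (regression, ML, validation) concrete, here is a small sketch using scikit-learn. The data is synthetic, generated from an assumed linear relationship plus noise, so the numbers mean nothing beyond showing the fit-then-validate pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (an assumption for this example): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Hold out a test set so the model is judged on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a linear regression and score it (R^2) on the held-out data.
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# 5-fold cross-validation on the training set as a second check.
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5)

print(f"slope: {model.coef_[0]:.2f}, intercept: {model.intercept_:.2f}")
print(f"test R^2: {test_r2:.3f}, mean CV R^2: {cv_scores.mean():.3f}")
```

If the validation scores disappoint, that is often a signal to loop back to an earlier stage rather than to keep tuning the model.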
Stage 5: Communicate the data
Skills: storytelling, visuals, writing
Tools: matplotlib, Illustrator, PowerPoint
Stage 6: Implementation
Skills: product sense, communication, organizational savvy
This stage is crucial. If you don’t push your work through to implementation, you’re just a consultant. You may not implement it alone, but it’s still your job to push it forward.
Stage 7: Test and measure impact
Skills: all previous
Did it work? Was it worth it? You’re best placed to answer.
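One possible way to approach "Did it work?" is to compare a metric between a group that got the change and a group that did not. The sketch below assumes a randomized control and treatment group, which the original post does not specify, and uses simulated numbers rather than real results.

```python
import numpy as np
from scipy import stats

# Simulated metric values (an assumption for this example): the treatment
# group is drawn from a distribution with a slightly higher mean.
rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=10.0, size=500)
treatment = rng.normal(loc=103.0, scale=10.0, size=500)

# Estimated impact: difference in group means.
lift = treatment.mean() - control.mean()

# Welch's t-test: is the difference larger than chance would explain?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"lift: {lift:.2f}, t: {t_stat:.2f}, p-value: {p_value:.4f}")
```

Whether the measured lift was "worth it" is still a judgment call that weighs the effect against the cost of getting there.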
Conclusion: Data science is a cyclical process: Ask, Get the Data, Explore, Model, Communicate, Implement, and Measure.
The strongest data scientists follow through all the way.
This framework stands out by stressing two things: the importance of asking meaningful questions and the need to revise those questions as you learn more.
There are many tools out there—this is just one powerful stack.
