What is the workflow or process of a data scientist? What tools do they use in data science workflows?
- Dr Dilek Celik
- Jun 25
Updated: Jul 9

Note all the purple arrows pointing backward—the data science workflow is non-linear, iterative, and cyclical. You can’t know the best path from the start.
Each stage demands different skills and tools.
Functional stack:
- Stage 1: Ask a question (relevant to your organization)
  - Skills: scientific thinking, domain knowledge, curiosity, business sense
  - Tools: your brain, expert input, experience
- Stage 2: Get the data
  - Skills: cleaning, querying, scraping, coding
  - Tools: SQL, Python, pandas, (Spark)
- Stage 3: Explore the data
  - Skills: pattern recognition, hypothesis building
  - Tools: matplotlib, numpy, scipy, pandas, (Spark)
- Stage 4: Model the data
  - Skills: regression, ML, validation
  - Tools: scikit-learn, pandas, (Spark, MLlib)
- Stage 5: Communicate the data
  - Skills: storytelling, visuals, writing
  - Tools: matplotlib, Illustrator, PowerPoint
- Stage 6: Implementation
  - Skills: product sense, communication, organizational savvy
  - This stage is crucial. Without pushing your work to implementation, you're just a consultant. You may not implement it alone, but it's still your job to push it forward.
- Stage 7: Test and measure impact
  - Skills: all previous
  - Did it work? Was it worth it? You're best placed to answer.
- Conclusion: Data science is a cyclical process (Ask, Get the Data, Explore, Model, Communicate, Implement, Measure).
  - The strongest data scientists follow through all the way.
  - This framework stands out by stressing two things: the importance of asking meaningful questions, and the need to revise those questions as you learn more.
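
To make Stages 2 through 4 a little more concrete, here is a minimal sketch using the pandas, numpy, and scikit-learn parts of the stack. Everything specific in it is an assumption made for illustration: the data is synthetic, the column names `ad_spend` and `sales` are hypothetical, and a linear regression is just one simple choice of model. In a real project the data would come from a query, a file, or a scraper, and you would likely loop back to earlier stages more than once.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stage 2: get the data.
# Synthetic stand-in (assumption): a noisy linear relationship between
# a hypothetical "ad_spend" column and a hypothetical "sales" column.
rng = np.random.default_rng(seed=0)
ad_spend = rng.uniform(0, 100, size=200)
sales = 3.0 * ad_spend + 20.0 + rng.normal(0, 5, size=200)
df = pd.DataFrame({"ad_spend": ad_spend, "sales": sales})

# Stage 3: explore the data.
# Summary statistics and a correlation are a quick first look
# before forming a hypothesis.
summary = df.describe()
correlation = df["ad_spend"].corr(df["sales"])

# Stage 4: model the data.
# Hold out a quarter of the rows so the model is validated on data
# it has not seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    df[["ad_spend"]], df["sales"], test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out rows

print(summary)
print(f"correlation: {correlation:.3f}")
print(f"slope: {model.coef_[0]:.2f}, held-out R^2: {r2:.3f}")
```

If the held-out score looked poor, that would be a cue to follow one of the backward arrows: revisit the question, fetch different data, or explore further before modeling again.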
 
There are many tools out there—this is just one powerful stack.

