Why data science takes so long



Thursday March 14, 2024


I think I finally figured out why data science tasks take so long: everything is either a spike or a bug.

Imagine you’re building a function to compute a histogram. Simple enough: sort the data, bin it, return the bins. Write a unit test to check that the outputs match what you expect for a given input.
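To make the "easy" half concrete, here is a minimal sketch of that histogram function plus its unit test, in plain Python. The names `histogram` and `num_bins`, the equal-width bins, and the choice to drop everything into the first bin when all values are identical are my assumptions for illustration, not a canonical implementation.

```python
def histogram(data, num_bins):
    """Count values into num_bins equal-width bins spanning [min, max]."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    values = sorted(data)  # sort first, as described above
    if not values:
        return [0] * num_bins
    lo, hi = values[0], values[-1]
    if lo == hi:
        # Assumption: identical values all land in the first bin.
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def test_histogram():
    # Four values near the low end, one far out on the tail.
    assert histogram([1, 2, 2, 3, 10], 3) == [4, 0, 1]
    assert histogram([], 2) == [0, 0]
    assert histogram([5, 5, 5], 2) == [3, 0]


test_histogram()
```

The test passes, and that is the point: the code is done in a few minutes, while the lone value sitting by itself in the last bin is a question the test can't answer.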

Now imagine using that: why are the data skewed? Bimodal? What’s up with that outlier all the way on the tail? Who collected this data anyway? Oh, I did?! What’d I mess up in my data processing??

The easy stuff is writing the code for the hyperparameter tuning or model fit, etc. The hard stuff is…the data.


#datascience #engineering #agile


Bryan lives somewhere at the intersection of faith, fatherhood, and futurism and writes about tech, books, Christianity, gratitude, and whatever’s on his mind.