Thursday, July 17, 2014

New Focus: Data Scientist

Read this: http://www.forbes.com/sites/emc/2014/06/26/the-hottest-jobs-in-it-training-tomorrows-data-scientists/

Interesting subject areas: Statistics, Machine Learning, Algorithms.

I've had questions about data science from folks who (somehow) felt that calculus and differential equations were important parts of data science. I couldn't figure out how they decided that diffeq's were important. Their weird focus on calculus didn't seem to involve using any data. Odd: wanting to be a data scientist, but being unable to collect actual data.

Folks involved in data science seem to think otherwise. Calculus appears to be a side-issue at best.

I can see that statistics are clearly important for data science. Correlation and regression-based models appear to be really useful. I think, perhaps, that these are the lynch-pins of much data science. Use a sample to develop a model, confirm it over successive samples, then apply it to the population as a whole.

Algorithms become important because doing dumb statistical processing on large data sets can often prove to be intractable. Computing the median of a very large set of data can be essentially impossible if the only algorithm you know is to sort the data and find the middle-most item.

Machine learning and pattern detection may be relevant for deducing a model that offers some predictive power. Personally, I've never worked with this. I've only worked with actuaries and other quants who have a model they want to confirm (or deny or improve.)