Tuesday, November 29, 2016

A Reason for Avoiding Programming

This came from someone in the process of becoming a data scientist. They had a question about regular expressions, and it made almost no sense. It appears that the core concepts of ETL -- Extracting source data, Transforming it into a useful form, and then Loading it into some persistent storage for long-term analysis -- had not been embraced. It appears the design pattern was unknown. All I could gather from the sketchy email chain was that something involving regular expressions had become difficult.
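For reference, the three ETL stages can be sketched in a few lines of Python. This is only an illustration, not the code from the email chain: the "name = value units" line format, the `measurement` table, and all of the function names are assumptions made up for the example.

```python
import io
import re
import sqlite3

# Hypothetical line format, assumed for illustration: "name = value units"
LINE_PATTERN = re.compile(
    r"^\s*(?P<name>\w+)\s*=\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<units>\w*)\s*$"
)


def extract(source):
    """Extract: yield a dict of raw strings for each line that fits the pattern."""
    for line in source:
        match = LINE_PATTERN.match(line)
        if match:  # lines that don't match (comments, blanks) are skipped
            yield match.groupdict()


def transform(rows):
    """Transform: normalize names, convert values to float, blank units to None."""
    for row in rows:
        yield row["name"].lower(), float(row["value"]), row["units"] or None


def load(records, connection):
    """Load: persist the cleaned records in a (hypothetical) measurement table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS measurement (name TEXT, value REAL, units TEXT)"
    )
    connection.executemany("INSERT INTO measurement VALUES (?, ?, ?)", records)
    connection.commit()


def run_etl(source, connection):
    """Chain the three stages; each one only knows about its own job."""
    load(transform(extract(source)), connection)
```

The point of the separation is that the regular expression lives in exactly one place (`extract`). If the regex becomes difficult, it can be tested against sample lines on its own, without touching the conversion or storage steps.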

I wrote this in response: Handling Irregular File Formats.

Here's part of the follow-up.

"I have been focusing on the math associated w/ math optimization. I have been using spreadsheets to perform the computations."

Really.

Spreadsheets.

The ETL pipeline question/rant/complaint was part of loading a spreadsheet?

That seems somehow wrong. There are real tools available that really do real data science work. The word "optimization" hints that scipy.optimize might be a more useful exercise than hacking around with spreadsheets.
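As a hedged illustration of what that exercise might look like, here is a minimal sketch using `scipy.optimize.minimize`. The cost function, starting point, and bounds are invented for the example; they are not the problem from the email.

```python
from scipy.optimize import minimize


def cost(point):
    """Hypothetical objective: squared distance from the point (1.0, 2.5)."""
    x, y = point
    return (x - 1.0) ** 2 + (y - 2.5) ** 2


# Assumed constraints for illustration: x >= 0 and 0 <= y <= 2.
# The unconstrained minimum (1.0, 2.5) lies outside the y bound,
# so the solver should settle on the boundary instead.
result = minimize(cost, x0=[0.0, 0.0], bounds=[(0.0, None), (0.0, 2.0)])

print(result.success, result.x, result.fun)
```

The whole model is a named function plus a list of bounds, which can be read, tested, and changed in one place. That's a lot harder to do with formulas scattered across spreadsheet cells.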

Perhaps some advice from a real data scientist might help: http://www.becomingadatascientist.com