by Caleb Jaffa

Archive for December, 2007

Importing Data to MySQL with Ruby

When I first started coding with Ruby regularly, my code was not always The Ruby Way. One task I’ve come back to time and time again is importing data from a text file into a database. Though I can’t recall the specific code, I believe my first attempt loaded the whole file into memory before running the INSERT statements. This was needlessly expensive on the machine doing the processing, and slow. I had started the process before leaving work, and on my commute home I puzzled out a better way to do it. The new version read a line in, inserted the data into the database, and then moved on to the next line, forgetting about the previous one. It took much less time to execute and worked wonderfully.
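A minimal sketch of the line-at-a-time approach looks something like this. The file format (pipe-delimited name and email), table name, and quoting helper are all hypothetical stand-ins, and in the real scripts the execute step would be a call through the MySQL client library:

```ruby
# Simplistic quoting for illustration only -- a real importer should
# use the client library's escaping or prepared statements.
def quote(value)
  "'#{value.gsub("'", "''")}'"
end

# Stream the file one line at a time. File.foreach never holds more
# than the current line in memory, so memory use stays flat no matter
# how large the input file is -- unlike reading the whole file up front.
def import_file(path)
  File.foreach(path) do |line|
    name, email = line.chomp.split("|")
    yield "INSERT INTO customers (name, email) " \
          "VALUES (#{quote(name)}, #{quote(email)})"
  end
end
```

In practice the block passed to `import_file` would run each statement against MySQL, e.g. `import_file("customers.txt") { |sql| client.query(sql) }`.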

Later I took over primary development of a large set of Ruby scripts that more or less managed importing and exporting data from MySQL using flat files. This system communicated with corporate headquarters and individual stores about everything: customers, orders, products, special pricing agreements, and so on. Processing time varied with file size, but most files took a couple of minutes to process even on new hardware, and the largest set of files, a potentially daily input of data, could take up to an hour to get through. There was some processing involved with this data: some fields had to be populated by calculation, and sometimes joins were necessary to set things up right.

However, the client was looking to take this process from a handful of sites to potentially hundreds. The system’s architecture was changed so we could scale by throwing more hardware at it, and I also looked for optimizations. It made sense to do things the way they were done: Ruby read in a line of data, performed any calculations or string concatenations, and then inserted the data into the database. There was something faster, though. MySQL’s LOAD DATA INFILE was brought in to do all the heavy lifting of getting data from the flat files into the database. Then a SQL query could be run to calculate any fields that needed it. This cut overall processing time for the most common file from minutes to seconds. It was a good reminder that if your database server can do something, it will probably do it faster than whatever language you’re working in.
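The shape of the faster version can be sketched as below. The table, column, and file names are hypothetical, and the derived-field query is an invented example; the point is that both the bulk load and the follow-up calculation run inside MySQL rather than row by row in Ruby:

```ruby
# Build the LOAD DATA statement that hands the whole flat file to
# MySQL in one shot, instead of one INSERT per line from Ruby.
def load_data_sql(path)
  <<~SQL
    LOAD DATA LOCAL INFILE '#{path}'
    INTO TABLE products
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\\n'
    (sku, description, price)
  SQL
end

# A single follow-up query computes any derived fields in the
# database, replacing the per-row calculations Ruby used to do.
CALC_SQL = "UPDATE products SET sale_price = price * 0.9 " \
           "WHERE price IS NOT NULL"
```

In practice both statements would be sent through the MySQL client, e.g. `client.query(load_data_sql("products.txt"))` followed by `client.query(CALC_SQL)`.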

Getting Things Done with OmniFocus

When I was employed as a web developer, my tasks typically came as one or two major priorities for the day or week, plus small support and update requests that came in during the day. I worked at my position in the assembly line, taking proposals and designs and making the website. There wasn’t much need to organize myself, as all the immediate tasks were laid out, part of the routine, or sitting in my inbox.

Life as a freelancer is not so simple. There is typically no project coordinator, manager, or boss, at least not one handling the scheduling of your time across all the various tasks you have to work on. I’d heard of the Getting Things Done approach before, but it was always overkill for my limited view of my pipeline. As a freelancer, I have a much better view of the pipeline. Luckily for me, The Omni Group is running a public beta of their new personal task management program, OmniFocus, which is modelled on the Getting Things Done principles. While I might not use all of its features, it has proven useful for getting a brain dump of everything on my plate and then getting it organized and done.

In a week of solid use it has proven useful in scheduling my time and priorities. Before the end of the year I’ll be putting it through its paces, as I’ve got a 30–40 hour project to be done alongside a few smaller projects.