When should I use Hadoop?

Chris has written a good blog post, "Don’t use Hadoop – your data isn’t that big".

I agree with much of what Chris says, although I am not sure that even data of around 5TB warrants using Hadoop. I wrote a short blog post on Hadoop a while back.

Elasticera processes hundreds of GBs of raw log files every month. Even on a 3-year-old desktop PC with a single SATA drive, we are able to grep across a month's worth of log files in a few minutes.
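
To make the point concrete, here is a minimal Python sketch of the same idea as grep: stream each file line by line and keep only the matches, so memory use stays flat no matter how large the logs are. The function name `scan_logs`, the `*.log` file pattern and the directory layout are assumptions for illustration, not how we actually do it.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def scan_logs(log_dir: str, pattern: str, glob: str = "*.log") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line matching pattern.

    Hypothetical helper: assumes plain-text logs sitting in a single directory.
    """
    regex = re.compile(pattern)
    # Sort so that files are visited in a predictable order.
    for path in sorted(Path(log_dir).glob(glob)):
        # Read one line at a time; the whole file is never held in memory.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, line_number, line.rstrip("\n")
```

Something like `for name, number, line in scan_logs("/var/log/myapp", r"ERROR"): print(name, number, line)` would then walk a month of files on a single disk (the path is made up). Plain grep will usually be faster still; the sketch only shows that nothing about the job needs a cluster.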

Because of the way we process log files, we are able to report on multiple TBs of data in seconds using our reporting UI.

We don’t have a single Hadoop instance in sight.

Bulk importing large log files into an RDBMS, so that every log file entry has a corresponding record, is always going to be painfully slow and resource intensive. Think about your data and how it is to be queried, and there is likely to be a simple solution.
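
One simple solution of that kind is to summarise the logs as you read them and store only the summary. The sketch below is an illustration under stated assumptions, not a description of our pipeline: it assumes a made-up log format of whitespace-separated `timestamp method path status` fields, and the function name `summarise` is hypothetical. It counts requests per day and status code, so a report reads a handful of totals instead of one row per log entry.

```python
from collections import Counter
from typing import Iterable


def summarise(lines: Iterable[str]) -> Counter:
    """Count log entries per (day, status) pair.

    Assumed (hypothetical) line format: "<ISO timestamp> <method> <path> <status>".
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            # Skip lines that do not fit the assumed format.
            continue
        timestamp, _method, _path, status = fields[:4]
        day = timestamp[:10]  # "YYYY-MM-DD" prefix of an ISO timestamp
        counts[(day, status)] += 1
    return counts
```

Because `summarise` accepts any iterable of lines, it can be fed straight from an open file, and the resulting counts are small enough to keep in a tiny table or even a flat file. Which dimensions to aggregate on depends entirely on the questions your reports need to answer.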

