Archive for January, 2012

eCommerce UK LinkedIn Group M-Commerce Pecha Kucha Video

January 22, 2012

Back in December I presented at the eCommerce UK LinkedIn Group M-Commerce Pecha Kucha event.

The videos from the event can be found here:

http://www.youtube.com/watch?v=aZI901waR8U

The slides for the event can be found here:

http://www.slideshare.net/MartinNewman/pecha-kucha-presentations

My presentation starts at slide 114.

Hadoop

January 9, 2012

I recently bought Hadoop: The Definitive Guide Second Edition from O’Reilly.

The book is pretty good and gives a solid overview of Hadoop and its family of projects:

  • MapReduce (a distributed data processing model and execution environment that runs on large clusters of commodity machines; see the word-count sketch after this list)
  • HDFS (a distributed filesystem that runs on large clusters of commodity machines; Hadoop also provides a filesystem abstraction layer, of which HDFS is one implementation)
  • Avro (an efficient serialization system for cross-language RPC and persistent data storage)
  • Pig (a data flow language and execution environment for exploring very large datasets)
  • Hive (an SQL-like query language whose queries are translated into MapReduce jobs)
  • HBase (a distributed, column-oriented (column-family) database that uses HDFS for its underlying storage; see the client sketch after this list)
  • ZooKeeper (a distributed, highly available coordination service)
  • Sqoop (a tool for efficiently moving data between relational databases and HDFS)
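
As a rough illustration of the MapReduce programming model mentioned above, here is the canonical word-count example sketched in Java against Hadoop's newer org.apache.hadoop.mapreduce API. Class names such as WordCount and TokenizerMapper are purely illustrative; treat this as a minimal sketch rather than production code.

    // WordCount.java - the canonical MapReduce example: count word occurrences in text files.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: for each line of input, emit a (word, 1) pair per word.
      public static class TokenizerMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable count : values) {
            sum += count.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      // Driver: wires the mapper and reducer into a job and submits it to the cluster.
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class); // safe as a combiner because addition is associative
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The compiled classes would typically be packaged into a jar and launched with the hadoop jar command, with the input and output paths pointing at directories on HDFS.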

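And to give a rough feel for HBase from the list above, here is a small sketch using the Java client API as it looked around this time (HTable, Put, Get). The table name, column family and values are made up purely for illustration, and the sketch assumes an hbase-site.xml on the classpath pointing at a running cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        HTable table = new HTable(conf, "orders");        // "orders" is a hypothetical table

        // Write one cell: row key "order-1", column family "d", qualifier "total".
        Put put = new Put(Bytes.toBytes("order-1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("total"), Bytes.toBytes("42.50"));
        table.put(put);

        // Read the cell back by row key.
        Get get = new Get(Bytes.toBytes("order-1"));
        Result result = table.get(get);
        byte[] total = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("total"));
        System.out.println("total = " + Bytes.toString(total));

        table.close();
      }
    }
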
I can see Hadoop being useful in certain scenarios, but the datasets would have to be huge and growing quickly. These days a commodity server running an RDBMS can have 48 cores and 1TB of RAM, which is pretty powerful. It also depends on how the data is to be used: if the data can be summarised or aggregated daily, then it can be stored efficiently within an RDBMS.

The book also includes some good case studies. If you are looking to learn about Hadoop and its family of projects, this is a good place to start.