I recently bought Hadoop: The Definitive Guide, Second Edition, from O’Reilly.
The book is pretty good and gives a solid overview of Hadoop and its family of projects:
- MapReduce (a distributed data processing model and execution environment that runs on large clusters of commodity machines)
- HDFS (a distributed filesystem that runs on large clusters of commodity machines; Hadoop also has a filesystem abstraction layer that supports multiple filesystems)
- Avro (an efficient serialization system for cross-language RPC, and persistent data storage)
- Pig (a data flow language and execution environment for exploring very large datasets)
- Hive (a SQL-like language that is translated into MapReduce jobs)
- HBase (a distributed, column-oriented (column-family) database that uses HDFS for storage)
- ZooKeeper (a distributed, highly available coordination service)
- Sqoop (a tool for efficiently moving data between relational databases and HDFS)
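To make the MapReduce model in the list above a little more concrete, here is a minimal single-machine sketch of the classic word count in plain Python. It does not use Hadoop's API at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, made up for illustration. On a real cluster the framework does the grouping step and spreads the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, which a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all the counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs MapReduce", "MapReduce runs jobs"]))
```

The map and reduce functions only ever see one line or one key at a time, which is what lets a framework run them in parallel.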
I can see Hadoop being useful in certain scenarios, but the datasets would have to be huge and growing quickly. These days a commodity server running an RDBMS can have 48 cores and 1 TB of RAM, which is pretty powerful. It also depends on how the data is to be used: if it can be summarised or aggregated daily, the summaries can be stored efficiently in an RDBMS.
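As a rough sketch of the daily-aggregation idea, the snippet below rolls raw events up into one row per day and page, using Python's built-in `sqlite3` module as a stand-in for a full RDBMS. The table names, column names and sample data are invented for the example.

```python
import sqlite3


def build_daily_summary(events):
    """Load raw (day, page, hits) rows and return one summed row per day and page."""
    conn = sqlite3.connect(":memory:")  # in-memory database, standing in for a real RDBMS
    conn.execute("CREATE TABLE events (day TEXT, page TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    # The summary table is what you would keep long term; the raw rows could be archived.
    conn.execute(
        """
        CREATE TABLE daily_summary AS
        SELECT day, page, SUM(hits) AS total_hits
        FROM events
        GROUP BY day, page
        """
    )
    rows = conn.execute(
        "SELECT day, page, total_hits FROM daily_summary ORDER BY day, page"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [
        ("2012-01-09", "/home", 3),
        ("2012-01-09", "/home", 2),
        ("2012-01-09", "/about", 1),
        ("2012-01-10", "/home", 4),
    ]
    for row in build_daily_summary(sample):
        print(row)
```

However many raw events arrive, the summary table grows by at most one row per page per day, which is the kind of volume a single database server handles comfortably.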
The book does include some good case studies. If you are looking to learn about Hadoop and its family of projects then this is a good book.
January 10, 2012 at 1:14 am
Also check the old videos on http://www.cloudera.com/resources/Video/
There is lots of sales and conference stuff, but watch the last two videos in the list, named “Hadoop Training”.
January 10, 2012 at 4:30 pm
Marc
Thanks.
Are you guys at Nestoria using Hadoop in production?
Regards
Rich
January 18, 2012 at 8:04 pm
Thanks for providing book information on Hadoop. Hadoop can also be used for OLAP and OLTP processing.