Have a Need for Speed? Use Cloudera Impala for Real-Time Hadoop Queries

Old-school MapReduce (MR) has been in widespread production use for several years now, and it certainly has both raving fans and detractors. Two of the most common and valid complaints from critics are that MR is difficult to use (it requires some programming expertise) and that MR jobs take too long to complete. Although solutions like Pig and Hive have long helped make MR more accessible to non-developers, only recently has the community been able to achieve faster run times. With Impala, Cloudera addressed both of these concerns in one product: a simple-to-use engine that sits on top of your existing Hadoop cluster and can return query results up to 70x faster.


Becoming a Rock Star Hadoop Administrator, Part 2

In Part 1, we discussed the challenge of administering the Hadoop platform for admin newbies. We also reviewed the rationale for, and advantages of, leveraging commercial Hadoop distributions to manage an evolving Hadoop platform. Today, we’ll look at HDFS, MapReduce, and security best practices.


Becoming a Rock Star Hadoop Administrator, Part 1

Hadoop is just storage and computing, so administering a Hadoop cluster should be a breeze, right? Well… not necessarily. When we’re talking about Hadoop, we’re talking about a fast-moving open source project that covers many disciplines and requires a deep understanding of Linux, Java, and ecosystem projects with funny names like ZooKeeper, Flume, and Sqoop. Fear not: in these posts, we hope to help you on your journey to becoming a rock star Hadoop administrator.
