Get to know Impala
Impala is an in-memory data processing tool created at Cloudera, which was recently donated to the Apache Software Foundation. Like Hive, Impala uses a SQL-based query language.
In this demo I'm running Cloudera on a VM in VirtualBox. You can get one of the different environments for running Impala from the official site here.
Let's get down to some coding.
Views in Impala work somewhat like a variable, or a wrapper for a longer statement: you give a query a name once and then select from that name as if it were a table. This makes a whole lot more sense in the video posted below.
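As a rough sketch of the idea, a view over the hospitalspend table used later in this post might look like the following. The view name hospital_locations and the city value are made up for illustration; only the hospitalname, city and address columns are taken from this post.

```sql
-- Hypothetical view: wraps a longer SELECT under a short, reusable name.
CREATE VIEW hospital_locations AS
SELECT hospitalname, city, address
FROM hospitalspend
WHERE city IS NOT NULL;

-- The view can now be queried like a table.
-- 'BOSTON' is an illustrative value, not taken from the data set.
SELECT hospitalname, address
FROM hospital_locations
WHERE city = 'BOSTON';
```

The view stores no data of its own; Impala runs the underlying SELECT each time the view is queried.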
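Next we take a look at partitioned tables in Impala. Partitioning is just what it sounds like: the data is split into separate sets based on the values of one or more columns. We do this as a preprocessing step so that later queries can skip the partitions they don't need and run faster.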
Create a partitioned table:
CREATE TABLE partitioned_medi (hospitalname STRING, score FLOAT, city STRING, address STRING) PARTITIONED BY (year SMALLINT, month TINYINT);
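To see what the partition columns buy you, here is a small sketch against the partitioned_medi table. The year and month values are made up for illustration.

```sql
-- Register an (empty) partition explicitly; 2015 and 6 are illustrative values.
ALTER TABLE partitioned_medi ADD PARTITION (year=2015, month=6);

-- List the partitions Impala knows about for this table.
SHOW PARTITIONS partitioned_medi;

-- Filtering on the partition key columns lets Impala read only the
-- matching partition instead of scanning the whole table.
SELECT hospitalname, score
FROM partitioned_medi
WHERE year = 2015 AND month = 6;
```

Note that year and month are not stored in the data files themselves; their values come from the partition each row lives in.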
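Let's script the import from an existing table. The query below generates one INSERT statement per partition. The partition clause has to name the partition key columns (year, month) rather than regular columns, so this assumes hospitalspend also has score, year and month columns:
SELECT DISTINCT concat('insert into partitioned_medi partition (year=', cast(year as string), ', month=', cast(month as string), ') select hospitalname, score, city, address from hospitalspend where year=', cast(year as string), ' and month=', cast(month as string), ';') AS command FROM hospitalspend;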
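As an alternative to generating one statement per partition, Impala also accepts an INSERT where the partition values come from the query itself. This is only a sketch: it assumes hospitalspend has score, year and month columns, which this post does not confirm.

```sql
-- Dynamic partition insert: the partition key columns are listed without
-- values, and the last columns of the SELECT supply them row by row.
-- Assumption: hospitalspend has score, year and month columns.
INSERT INTO partitioned_medi PARTITION (year, month)
SELECT hospitalname,
       CAST(score AS FLOAT),
       city,
       address,
       CAST(year AS SMALLINT),   -- explicit casts in case the source
       CAST(month AS TINYINT)    -- columns use wider integer types
FROM hospitalspend;
```

The scripted approach gives you one statement per partition that you can inspect and run separately, while this form does the whole load in a single statement.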