创建并导入表单

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
hive> create table info(id int,name string,sal int) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.388 seconds

hive> load data inpath '/info.txt' overwrite into table info;
Loading data to table hive.info
OK
Time taken: 0.194 seconds

hive> select * from info;
OK
1 hu 2000
2 ya 7500
3 dong 1765
4 win 8000
5 goo 9000
6 app 65000
Time taken: 0.061 seconds, Fetched: 6 row(s)

查询(select)

  • 查询条目数:select count(*) from info;
  • 获取总薪水:select sum(sal) from info;
  • 获取薪水最高的3个ID与对应薪水:’select id,sal from info order by sal desc limit 3;’
  • 获取大部分人的薪水状况:select sal,count(*) as sum from info group by sal order by sum desc;————以薪水为组别按照出现次数降序排序
  • 以2000为区间统计薪水分布区间:select concat(floor(sal/2000)*2000,'-',(floor(sal/2000)+1)*2000),count(*) from info group by floor(sal/2000);————对sal除以2000向下取整作为分组依据,可以取得一定范围内的数值,例如floor(3000/2000)=1;floor(3600/2000)=1。
  • 获取薪水大于8000的员工的姓名:hive> select name,sal from info where sal>8000;
  • 获取每个ID的总工资:select id,sum(sal) from info group by id;
  • 获取每个ID的平均工资并按照工资降序排列:select id,avg(sal) as avg_sal from info group by id order by avg_sal desc;

表单导出

1
insert overwrite local directory '/root/' row format delimited fields terminated by '\t' select count(*) from loan

其他

  • 区间分析:
    1
    2
    3
    4
    5
    6
    7
    select count(case when length(text) between 0 and 20 then 1 end) as text1,
    count(case when length(text) between 21 and 40 then 1 end) as text2,
    count(case when length(text) between 41 and 60 then 1 end) as text3,
    count(case when length(text) between 61 and 80 then 1 end) as text4,
    count(case when length(text) between 81 and 100 then 1 end) as text5,
    count(case when length(text) > 100 then 1 end) as text6
    from table;

参考文档

Hive SQL 分区间统计问题