
SciHive is a plugin for analyzing large NetCDF datasets in Hive.


gengyifeng/SciHive


SciHive is implemented as a plugin for analyzing large scientific datasets in Hive. It currently supports the NetCDF file format.

hncdump is a port of ncdump that works on NetCDF files stored in HDFS. It can also help you create a table that matches the structure of a NetCDF file. Run "./hncdump -h" to see the usage.

The current tool is built on hive-0.10.0 and hadoop-1.0.4. If you are interested in porting it to another Hadoop or Hive version, please check the patch in the v0.10.0 directory.

Usage example of SciHive:

1. Create the table (the column names and types must match those of the variables in the NetCDF file; the command can be generated by hncdump)
create external table AIPO2006(lon float,lat float,depth float,time float,ssh float,t float,s float,u float,v float)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.NcFileSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.contrib.fileformat.netcdf.NcFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
  location '/user/yifeng/AIPO2006';

2. Run queries
select avg(t) from AIPO2006 where depth<=100 and lon<60 and lat>30;
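
The table definition in step 1 follows a fixed pattern: one column per NetCDF variable, followed by the SerDe, input format, output format, and HDFS location. As a rough illustration only, the Python sketch below assembles such a statement from a list of variable names and types. The helper name build_create_table is hypothetical and is not part of SciHive or hncdump; the class names and the example table are copied from the usage example above.

```python
# Hypothetical helper, not part of SciHive: builds the CREATE TABLE
# statement from (variable name, Hive type) pairs.
SERDE = "org.apache.hadoop.hive.contrib.serde2.NcFileSerDe"
INPUT_FORMAT = "org.apache.hadoop.hive.contrib.fileformat.netcdf.NcFileInputFormat"
OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"


def build_create_table(table, columns, location):
    """Return a Hive statement for an external table backed by NetCDF files.

    columns is a list of (name, hive_type) pairs; names and types are
    expected to match the variables in the NetCDF file.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ",".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"create external table {table}({column_list}) "
        f"ROW FORMAT SERDE '{SERDE}' "
        f"STORED AS INPUTFORMAT '{INPUT_FORMAT}' "
        f"OUTPUTFORMAT '{OUTPUT_FORMAT}' "
        f"location '{location}';"
    )


# Variables of the AIPO2006 example; all are floats in that file.
names = ["lon", "lat", "depth", "time", "ssh", "t", "s", "u", "v"]
variables = [(name, "float") for name in names]
statement = build_create_table("AIPO2006", variables, "/user/yifeng/AIPO2006")
print(statement)
```

In practice hncdump can generate this command directly from the file, so a helper like this is only useful for seeing how the pieces of the statement fit together.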
