gengyifeng/SciHive
SciHive is implemented as a plugin for analyzing large scientific datasets in Hive. It currently supports the NetCDF file format.

Hncdump is a port of ncdump for NetCDF files stored in HDFS. It can also help create a table that matches the structure of a NetCDF file. Run "./hncdump -h" to see the usage.

The current tool is built on hive-0.10.0 and hadoop-1.0.4. If you are interested in porting it to another hadoop or hive version, please check the patch in the v0.10.0 directory.

Usage example of SciHive:

1. Create Table (The column names and types should match those of the variables in the NetCDF file. The command can be generated by hncdump.)

    create external table AIPO2006(lon float,lat float,depth float,time float,ssh float,t float,s float,u float,v float)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.NcFileSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.contrib.fileformat.netcdf.NcFileInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    location '/user/yifeng/AIPO2006';

2. Do queries

    select avg(t) from AIPO2006 where depth<=100 and lon<60 and lat>30;
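
The Create Table statement in step 1 follows a fixed pattern: a column list that mirrors the NetCDF variables, followed by the SerDe, input format, output format and location clauses. The sketch below shows one way such a statement could be assembled from a variable list. It is an illustration only, not how hncdump works: the function name build_create_table and its parameters are hypothetical, and only the class names and the example table are taken from the usage example above.

```python
# Hypothetical helper: assembles the CREATE EXTERNAL TABLE statement from step 1.
# The SerDe / InputFormat / OutputFormat class names are copied from the usage
# example; the function name and its parameters are illustrative assumptions
# and are not part of SciHive or hncdump.

SERDE = "org.apache.hadoop.hive.contrib.serde2.NcFileSerDe"
INPUT_FORMAT = "org.apache.hadoop.hive.contrib.fileformat.netcdf.NcFileInputFormat"
OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"


def build_create_table(table, variables, location):
    """Return HiveQL DDL for a NetCDF-backed external table.

    variables: list of (name, hive_type) pairs, in column order. The names
    and types are expected to match the variables in the NetCDF file.
    """
    columns = ",".join(f"{name} {hive_type}" for name, hive_type in variables)
    return (
        f"create external table {table}({columns}) "
        f"ROW FORMAT SERDE '{SERDE}' "
        f"STORED AS INPUTFORMAT '{INPUT_FORMAT}' "
        f"OUTPUTFORMAT '{OUTPUT_FORMAT}' "
        f"location '{location}';"
    )


# Rebuild the AIPO2006 statement from the usage example.
variables = [(name, "float") for name in
             ["lon", "lat", "depth", "time", "ssh", "t", "s", "u", "v"]]
ddl = build_create_table("AIPO2006", variables, "/user/yifeng/AIPO2006")
print(ddl)
```

In practice the variable names and types would come from the NetCDF file header, which is the information hncdump reports; here they are written out by hand.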