Hive Functions

Tajo provides a feature to use Hive functions directly, without recompilation or additional code.

Configuration

The only thing you need to do is register the path to a directory containing the jar files of your Hive functions. You can do this by setting tajo.function.hive.jar-dir in tajo-site.xml as follows.

<property>
  <name>tajo.function.hive.jar-dir</name>
  <value>/path/to/hive/function/jar</value>
</property>

Note

The path must be on the local filesystem. An HDFS directory is not supported because of a Java URI compatibility problem.

Warning

The path must point to a directory, not a file, and multiple directory entries are not allowed. However, multiple jar files in that directory can be loaded.

Usage in Detail

Function Name

Tajo loads Hive functions that extend the org.apache.hadoop.hive.ql.exec.UDF class. The function name is taken from the @Description annotation if it is present. If it is not, Tajo uses the fully qualified class name as the function name, which results in calls like select com_example_hive_udf_myupper('abcd'); for this reason, using the @Description annotation is recommended.
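For illustration, a minimal UDF with the @Description annotation might look like the following sketch (the package, class, and function names here are hypothetical):

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that upper-cases its input. With the annotation,
// it is callable as my_upper(...); without it, a name derived from the
// fully qualified class name (as in the example above) would be used.
@Description(name = "my_upper",
    value = "_FUNC_(str) - Returns str in upper case")
public class MyUpper extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null;
    }
    return new Text(input.toString().toUpperCase());
  }
}

With the annotation in place, the function can be invoked simply as select my_upper('abcd').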

Additionally, if duplicate function signatures occur, Tajo may throw an AmbiguousFunctionException.

Parameter type / Return type

Hive functions use Hadoop's Writable types, while Tajo uses its own internal Datum types. Consequently, only some Writable types are currently supported, via internal conversion. They are listed below.

Writable                 Tajo Type
ByteWritable             INT1
ShortWritable            INT2
IntWritable              INT4
LongWritable             INT8
FloatWritable            FLOAT4
DoubleWritable           FLOAT8
Text                     TEXT
BytesWritable            VARBINARY
DateWritable(*)          DATE
TimestampWritable(*)     TIMESTAMP
HiveCharWritable(*)      CHAR

Note

(*) These classes are in the org.apache.hadoop.hive.serde2.io package; the others are in the org.apache.hadoop.io package.
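As another minimal sketch, a UDF that stays within the supported types above (the names here are hypothetical) could use IntWritable, which maps to Tajo's INT4:

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;

// Hypothetical UDF using only supported Writable types: IntWritable
// maps to INT4 for both the parameters and the return value.
@Description(name = "my_add",
    value = "_FUNC_(a, b) - Returns a + b")
public class MyAdd extends UDF {
  public IntWritable evaluate(IntWritable a, IntWritable b) {
    if (a == null || b == null) {
      return null;
    }
    return new IntWritable(a.get() + b.get());
  }
}

In Tajo, this function would accept INT4 arguments and return INT4, e.g. select my_add(1, 2).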

Limitations

1. Currently, Hive UDAFs are not supported. The old UDAF interface is deprecated in Hive, and the new GenericUDAF interface cannot be applied because of design differences between Tajo and Hive functions. For the same reason, the new GenericUDF functions are not supported either.

2. Because HDFS directories are not supported, Hive UDF jar files must be copied to a local directory on each worker, and that path must be specified in each worker's tajo-site.xml.