Tajo - Big Data Warehouse System on Hadoop
Tajo is a distributed data warehouse system on Hadoop that provides low-latency, scalable ad-hoc queries and ETL on large data sets stored on HDFS and other data sources.
- Scalability and low latency
- Fully distributed SQL query processing on large data sets stored in HDFS and other data sources
- Very low response times (from about 100 ms) for simple queries (e.g., a plain aggregation or a join between a small and a large table) on reasonably sized data
- Long running query support
- Fault tolerance support that avoids restarting a query when some of its tasks fail
- Dynamic scheduling support that handles straggling and heterogeneous cluster nodes
- ETL features that transform one data format to another data format
- Support for various file formats, such as CSV, RCFile, and RowFile (a row store file)
- User-defined function support
- Scanner/Appender interface for custom file formats
- ANSI/ISO SQL standard compliance and PostgreSQL compliance for non-standard parts
- HiveQL mode support
- Access to tables in HCatalog and the Hive MetaStore
- JDBC driver support
- Interactive shell to allow users to submit SQL queries to Tajo clusters
- Backup/Restore utility
- Asynchronous/Synchronous Java API to enable clients to submit SQL queries to Tajo clusters
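
As a rough illustration of the JDBC driver support listed above, the following is a minimal sketch that uses only the standard `java.sql` API. The `jdbc:tajo://host:port/database` URL scheme, the host, port, and database values, and the `lineitem` table are assumptions made for this example and are not taken from this page. It also assumes the Tajo JDBC driver jar is on the classpath, so check the driver documentation for your release.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class TajoJdbcSketch {

    // Builds a connection URL. The "jdbc:tajo://" scheme is an assumption.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:tajo://" + host + ":" + port + "/" + database;
    }

    // Runs a simple aggregation query and returns the single value it yields.
    static long countRows(String url, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM " + table)) {
            return rs.next() ? rs.getLong(1) : 0L;
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, port, database, and table name.
        String url = jdbcUrl("localhost", 26002, "default");
        try {
            System.out.println("rows: " + countRows(url, "lineitem"));
        } catch (SQLException e) {
            // Reached when no driver is registered or the cluster is unreachable.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

Because the sketch relies only on `java.sql`, the same code shape applies to any JDBC-compliant driver; only the URL changes.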
- [2013-11-20] Tajo 0.2.0-incubating Released. Now available for download!
- [2013-10-15] Tajo was presented at Bay Area Hadoop User Group Special Event.
- [2013-10-15] Tajo was introduced at Deview 2013.
- [2013-05-27] Two projects were accepted to Google Summer of Code 2013.
- [2013-04-09] Tajo was demonstrated at IEEE ICDE 2013.
- [2013-03-07] Tajo Project enters incubation.
- [2012-10-15] A demonstration paper of Tajo was accepted to IEEE ICDE 2013.
Apache Tajo is an effort undergoing incubation at The Apache Software Foundation (ASF) sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.