Apache Tajo - An open source big data warehouse system in Hadoop
The main goal of the Apache Tajo project is to build an advanced open source data warehouse system in Hadoop for processing web-scale data sets.
- Interactive and Batch Queries
- Fully distributed SQL query processing on large data sets stored in HDFS and other data sources
- Very low response time (from about 100 ms) for simple queries (e.g., a plain aggregation or a join between a small and a large table) on reasonably sized data
- Long running query support
- Fault tolerance support that avoids restarting a query when some tasks fail
- Dynamic scheduling support that handles straggling and heterogeneous cluster nodes
- Query Optimization
- Cost-based optimization for bushy join trees
- Progressive query optimization for reoptimizing running queries
- ETL features that transform data from one format to another
- Support for various file formats, such as CSV, RCFile, and RowFile (a row-store file format)
- User-defined function support
- Scanner/Appender interface for custom file formats
- ANSI/ISO SQL standard compliance, with PostgreSQL compatibility for non-standard parts
- HiveQL mode support
- Access to tables in HCatalog and Hive MetaStore
- JDBC driver support
- Interactive shell to allow users to submit SQL queries to Tajo clusters
- Backup/Restore utility
- Asynchronous/Synchronous Java API to enable clients to submit SQL queries to Tajo clusters
- [2014-02-27] Keuntae Park was invited to ApacheCon 2014.
- [2014-01-02] Keuntae Park was invited to become a new committer.
- [2013-11-20] Tajo 0.2.0-incubating was released and is now available for download.
- [2013-10-15] Tajo was presented at Bay Area Hadoop User Group - LinkedIn Special Event.
- [2013-10-15] Tajo was introduced at Deview 2013.
- [2013-05-27] Two projects were accepted to Google Summer of Code 2013.
- [2013-04-09] Tajo was demonstrated at IEEE ICDE 2013.
- [2013-03-07] The Tajo project entered incubation.
- [2012-10-15] A demonstration paper of Tajo was accepted to IEEE ICDE 2013.
Apache Tajo is an effort undergoing incubation at The Apache Software Foundation (ASF) sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.