NoSQL vs. column-store DBs

Large scale analysis solution must be able to provide very fast response time (<1sec) to some complex queries which are typically structured from a large fact table which is joined to many dimension tables and grouped-by them.

 

Using standard SQL databases such as MySQL InnoDB (as well as leading commercial vendors) will result in poor performance even when correct indexes are set up since in many case the database will be forced to perform unnecessary full table scans which will kill the performance if dimension tables are large.

 

The solution in this case would be to leverage a column-store database such as InfoBright or LucidDB (many other exist) which are storing and indexing the data columns rather than the data rows. Column-stores have good bulk-load performance and great query performance, which makes them ideal for analytic purposes using an R-OLAP engine such as Mondrian which are capable of translating multi-dimensional queries in MDX syntax to SQL queries. Their weakness is in poor update performance which is not a problem in the context of data-warehousing and analysis.

 

Column-stores are SQL complaint and differ from NoSQL data-stores (read more). Most NoSQL database are not suited to handle massive multi-dimensional kind of analysis but some projects such as Apache's Hive over Hadoop are targeting to further optimize NoSQL to support those needs.

 

 

 

Thank you for your interest!

We will contact you as soon as possible.

Send us a message

Oops, something went wrong
Please try again or contact us by email at info@tikalk.com