InfiniDB: The Good, The Bad and the Ugly :: Tholis Consulting

2010-04-26 11:40

Last week the BBBT was visited by startup database company Calpont, although 'startup' hardly fits the bill considering that the company was founded in 2000. I won't go into the company or the BBBT session details; you can find the latter here. What I do want to blog about is the product itself. InfiniDB comes in both a Community Edition (CE) and an Enterprise Edition (EE), where the latter adds MPP capabilities, monitoring tools and different support options. For more details, visit the comparison page. Since I don't have an MPP cluster available, I took the CE for a spin and thought I'd share the results.

The Good

As you might have noticed on the edition comparison page, the core server features are the same for both CE and EE. That's good news compared to direct competitor Infobright, which has stripped DML (insert/update/delete) capabilities out of its Community Edition. Some people would call this 'crippleware', as the only way to update a data warehouse table is to drop, recreate and reload it. 1-0 for InfiniDB already.

More on the good side: the installation process. The product installs in minutes and is only a 12.7 MB download. I downloaded the 64-bit RPM version, which requires two steps: extract the zip file and run the command rpm -i Calpont*.rpm as root. This installs the software in the default location /usr/local/Calpont. Then invoke the script /usr/local/Calpont/bin/install-infinidb.sh, configure the InfiniDB aliases with . /usr/local/Calpont/bin/calpontAlias and you're good to go. In my case: almost good to go, since the data directories are now under /usr/local/Calpont, which is not my 12 SSD disk RAID set. Simply mounting the data device on /usr/local/Calpont/data1 solves that problem too. Alternatively, you can of course move the data1 directory to a different location and create a symlink in the original spot. The command service infinidb start fires up the database engine, and invoking idbmysql gives you command line access to the database. Remember, it's all MySQL, so if you're familiar with that, working with InfiniDB is a breeze.

MySQL compatibility is another of the goodies: all front end tools, including the MySQL Workbench (query browser, admin console), can be used with InfiniDB. The same JDBC drivers you already have for MySQL work with InfiniDB as well. The only difference when creating a new table is that you should specify InfiniDB as the engine, but that's about it.
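
To make the 'specify InfiniDB as the engine' remark concrete, here is a minimal sketch in Python that builds a MySQL-style CREATE TABLE statement. The helper name create_table_ddl is my own invention, the column list is the TPC-H nation table, and I'm assuming the engine name is spelled exactly InfiniDB in the ENGINE clause; check the Calpont documentation before relying on it.

```python
def create_table_ddl(table, columns, engine="InfiniDB"):
    """Build a MySQL-style CREATE TABLE statement for the given storage engine.

    `columns` is a list of (name, sql_type) pairs. The engine name is an
    assumption taken from the text above, not from the InfiniDB manual.
    """
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n  {column_sql}\n) ENGINE={engine};"


# Column definitions of the TPC-H nation table.
nation_columns = [
    ("n_nationkey", "INT"),
    ("n_name", "CHAR(25)"),
    ("n_regionkey", "INT"),
    ("n_comment", "VARCHAR(152)"),
]

print(create_table_ddl("nation", nation_columns))
```

The resulting statement can be pasted into idbmysql or any other MySQL client; apart from the ENGINE clause it is ordinary MySQL DDL.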

The last item I'd like to mention here is the bulk loader. The TPC-H benchmark consists of 22 queries and the database contains 8 tables. The data files can be created using the dbgen utility, which generates pipe delimited text files with a .tbl extension. The default settings InfiniDB uses for bulk loading text files are also the pipe delimiter and a .tbl extension, what a convenience! Other than that, the files have to be named exactly after the tables you want to load (so customer.tbl is data for the table customer) and placed in the Calpont data import directory. Invoking the command colxml creates a bulk loader import job based on these files and the metadata from the tables in the database. To start the import, simply run the cpimport command and InfiniDB starts loading. This may sound more complex than it actually is, so trust me, it couldn't be simpler. Or faster, for that matter: loading the 100GB data set took only about 25 minutes, a new record on my machine!
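
Since the loader matches files to tables purely by name, a quick check before running colxml can save a failed job. The sketch below is my own helper, not part of InfiniDB: missing_tbl_files lists the tables that have no matching .tbl file in a directory. The demo runs against a temporary directory because the location of the real import directory depends on your installation.

```python
import tempfile
from pathlib import Path

# The eight tables of the TPC-H schema.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def missing_tbl_files(import_dir, tables):
    """Return the tables that have no <table>.tbl file in import_dir."""
    import_dir = Path(import_dir)
    return [t for t in tables if not (import_dir / f"{t}.tbl").is_file()]


# Demo: pretend dbgen has only produced two of the eight files so far.
with tempfile.TemporaryDirectory() as demo_dir:
    for table in ("customer", "orders"):
        # One pipe delimited dummy row, just so the file exists.
        (Path(demo_dir) / f"{table}.tbl").write_text("1|dummy\n")
    print(missing_tbl_files(demo_dir, TPCH_TABLES))
```

An empty list means every table has its data file in place and the import job can be generated.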

The Bad

There's actually only one 'bad' thing about the current version of InfiniDB: it's not finished yet. Yes, it works, it's fast (more about that later), but there are still a couple of serious limitations. The most notable of these, at least when running a TPC-H benchmark, is the support for subqueries. Version 1.1.0 alpha didn't support any form of subquery, so even a select * from table where column in (select othercolumn from othertable) couldn't be run. Version 1.1.1 alpha, released on April 23, solved this last one, but more complex subquery constructs and correlated subqueries are not yet supported. The upgrade from 1.1.0 to 1.1.1 enabled InfiniDB to complete 10 of the 22 TPC-H queries, instead of the mere 5 it could run a week ago. But, as I've said, this should only be a temporary problem. The roadmap shows that in a month or so, phase two of the subquery support should be available in the next alpha release, with GA (General Availability) for version 1.1 set for early July. By then we can have a look at the complete run and see how it behaves, also when multiple threads are running in parallel.

The Ugly

Calpont touts 'no indexes needed' as one of the key benefits of the product; I tend to disagree on that one. It's nice that you don't need to explicitly specify indexes, but when a DBMS doesn't support any constraints AT ALL, well, that's plain ugly. Want to enforce a NOT NULL constraint? Bad luck. Primary/foreign key relationships? Ditto. You could argue that these features are not really mandatory in a data warehouse, but without constraint and index support all the constraint enforcement must be built into the ETL process.
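
To give an idea of what 'built into the ETL process' means in practice, here is a minimal sketch of the checks a load step would have to do itself. It is plain Python with no InfiniDB involvement; validate_rows and the sample column names are hypothetical, and a real ETL tool would do this in bulk rather than row by row.

```python
def validate_rows(rows, key, not_null=(), foreign_keys=None):
    """Split rows into (accepted, rejected) using checks the DBMS won't do.

    rows         -- list of dicts, one per record
    key          -- column that must be unique (primary key stand-in)
    not_null     -- columns that may not be None
    foreign_keys -- {column: set of valid parent key values}
    """
    foreign_keys = foreign_keys or {}
    seen_keys = set()
    accepted, rejected = [], []
    for row in rows:
        problems = []
        for col in not_null:
            if row.get(col) is None:
                problems.append(f"{col} is NULL")
        if row.get(key) in seen_keys:
            problems.append(f"duplicate {key}")
        for col, parent_keys in foreign_keys.items():
            value = row.get(col)
            # As in SQL, a NULL foreign key is not a referential violation.
            if value is not None and value not in parent_keys:
                problems.append(f"{col} has no parent row")
        if problems:
            rejected.append((row, problems))
        else:
            seen_keys.add(row.get(key))
            accepted.append(row)
    return accepted, rejected


customer_keys = {10, 11}
orders = [
    {"o_orderkey": 1, "o_custkey": 10},    # fine
    {"o_orderkey": 2, "o_custkey": 99},    # unknown customer
    {"o_orderkey": 1, "o_custkey": 11},    # duplicate order key
    {"o_orderkey": 3, "o_custkey": None},  # missing customer
]
accepted, rejected = validate_rows(
    orders,
    key="o_orderkey",
    not_null=["o_orderkey", "o_custkey"],
    foreign_keys={"o_custkey": customer_keys},
)
for row, problems in rejected:
    print(row, problems)
```

Only the accepted rows would be written to the .tbl file for loading; the rejected ones go to an error file or table for follow-up.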

The $64,000 question...

There is actually only one reason why anyone would want to use a column store like InfiniDB in the first place: performance! So the main question is: does it deliver? Yes, it does. Compared to MySQL the performance improvement is nothing less than spectacular. In fact, to date nobody has been brave (or patient) enough to try an SF100 TPC-H on MySQL, so a direct comparison is not even available. There are however plenty of other comparisons that can be made. The 10 queries that do run already all outperform Greenplum single node edition (except for query 11), for instance. Some queries are somewhat faster (Q10, Q12, Q18), some are 3-4 times faster (Q1, Q3, Q4, Q14, Q16), and query 6 is more than 20 times faster. For a disk based analytical database (InfiniDB doesn't seem to take as much advantage of memory as other products I evaluated) it's really, really fast. Query 1 is always a good indicator since it forces a full table scan on the largest fact table (600 million rows in this case). If you can do this in under a minute on my moderate hardware, you do have a potential winner.

Conclusion

My initial thoughts about InfiniDB when I first tested it weren't very positive, to say the least. But given that they are moving in the right direction and have kept their promised delivery dates so far, combined with the ease of installation, ease of use (it's all MySQL) and of course the already great performance, a second look is certainly warranted. Given the limitations of the direct competitors (Kickfire with its proprietary hardware, Infobright with its crippled community edition and lack of MPP/scale out capabilities), InfiniDB should be at the top of your shortlist when looking for a MySQL based data warehouse solution.