Page 1 of 1

Random questions about DB

Posted: Tue May 06, 2014 7:11 pm
by indolering
I'm trying not to bug Domob and Micheal with stupid questions about the databases on the developer mailing list, so I figured I would reveal my ignorance to the forum and ask some silly questions about the database backends.

Full disclaimer: I've programmed using NoSQL and relational databases but I really don't understand the internal mechanics of databases nor Bitcoin beyond basic functional usage. Anyway, back to the (side)show.

I'm worried about having to port this stuff to a cross-browser, client-side database for Speech.js. Right now, I use the Namecoin RPC interface to push new entries into CouchDB. I use CouchDB because it could easily scale to millions of clients and because PouchDB rolls CouchDB's amazing synchronization capabilities into a cross-browser database abstraction.

Right now, Libcoin relies on SQLite, Namecoin relies on BerkeleyDB, and the mainline Bitcoin is using LevelDB.

SQLite makes sense for Kraken as relational databases were explicitly designed for financial transactions. However, SQLite was meant for small databases but it's doubling the size of the Bitcoin blockchain (~30Gigs ATM). Perhaps they just sync their Bitcoin instances with a more mature SQL database for server transactions? Why wouldn't Kraken switch to libmysqld at this point?

Furthermore, SQLite has been giving Domob headaches and it's likely that they will keel HunterCoin on BerkeleyDB. Bitcoin switched to LevelDB but LevelDB is also tuned for small databases.

So, for my most clueless question: what about building a storage abstraction layer? I know that this is more work than just switching to any single database, but I know of a few OSS projects with them.

Re: Random questions about DB

Posted: Tue May 06, 2014 8:09 pm
by John Kenney
SQLite is very mature, it's the most widely deployed database engine in the world, can handle large databases efficiently & is more reliable than MySQL. MySQL is a poor choice for anything financial really, or anything where data corruption could be bad. If you're going to use MySQL at all now then it should be the MariaDB fork rather than Oracle's version, but I don't think it'd be suitable, it's nice for websites that don't store anything of much importance but that's about it's limit. Originally SQLite was meant for small databases, but that was many years ago, it works great with large databases too as long as they're designed properly, the 'Lite' part of the name is just the memory/code footprint, nothing to do with the size of databases. It lacks features found in some larger databases like postgres, but you probably don't need those features anyway.

There might already be some abstraction layers that are free to use & suitable. They may lower efficiency slightly. They're usually used when a user is expected to provide their own database server & it could be one of a few different ones. It's probably easier/better to stick with SQLite.

Re: Random questions about DB

Posted: Wed May 07, 2014 6:09 am
by domob
I'm also not really an expert about databases, but I'll try to contribute some impressions about libcoin vs Namecoin/Bitcoin: The major difference is that libcoin uses a relational database, which means that blocks, transactions and all that are "rewritten" in order to fit into the table structure. For BDB (and I guess that also LevelDB works this way, but this is just an uneducated guess), it is completely different: BDB basically provides a persistent (and robust against data corruption / transaction rollback) binary tree to do name-value lookups. (Namecoin anyone? :D) In other words, you can look up, for instance, entries via tx id or name - but only with a single "key" and not "arbitrary" queries. Entries are just binary blobs, which are serialised using the same mechanisms that are used for network transmission and hashing internally.

My gut feeling is that the latter allows a more compact and optimised storage / retrieval since it is manually which indices you want to create and how to format data for storage. But I doubt that it has an influence in practice with all the work being done on SQLite and other relational engines by people who probably know a lot more about that than I do.

In principle, there exists already a kind of "database abstraction layer" in the code: The class CDB (db.h) is a wrapper around BDB, and we could probably replace BDB by something else with relatively little effort just rewriting db.h and db.cpp.

Also note that I believe there's a lot of further optimisation potential (after my current blkindex patch is merged) in the current storage format even before we start pruning the block chain (no matter which backend DB we use).

Re: Random questions about DB

Posted: Fri May 16, 2014 4:28 am
by John Kenney
I'm only familiar with sql relational databases & no expert. I know little about current namecoin internals. Relational databases can be very efficient if the table structure is designed properly to minimise or eliminate redundancy. Postgres is possibly worth considering, it recently got a new binary json storage format.