Berkeley DB

Post by **snailbrain** » Thu Apr 17, 2014 10:25 pm

snailbrain wrote: I apologise if this is not relevant, but something about incoming transfers of names in the Qt client is ringing some bells.
phelix wrote: Hmm, there should not have been any incoming transfers on that wallet... some ever-pending ones, though.

I mean, it needs to check for them in order to put them in the GUI? I am probably wrong.

Post by **domob** » Fri Apr 18, 2014 5:24 am

I have a suspicion about why syncing is so slow. I don't think it is related to BDB, since the blockchain isn't actually stored in BDB at all, but in "plain" files (the blk000?.dat ones). When verifying a block or transaction, the client obviously needs to check all the tx outs that it spends. Those are looked up in the blockchain files, since that's the only place where transactions are stored (except for the wallet, which holds local transactions). Thus, every block containing a transaction whose output is being spent has to be read from disk, and this produces lots of random accesses to the blockchain files. It seems almost "obvious" that this should lead to horrible performance. Right? Even though this is somewhat trivial, I was not really aware of it in these terms (I had never thought about precisely how these things are done before).

What we can do about it: we could parse a new block first, looking for all the transactions that will be needed to verify it, and then look them up in order, to at least reduce cache misses a bit. However, I'm not sure this is the way we should go. Instead, I suggest that we keep track of the full UTXO set, including the full tx outs, either in memory or at least in the blkindex.dat BDB file. This way, it should be possible to verify a transaction or block from the UTXO set alone, without ever looking into the blockchain files. This is just guesswork, but I believe it could speed up syncing drastically. Is there anything I'm missing here? I don't think it would actually be that hard to do, and I will give it a try when I get around to it. (But it will probably need careful testing to make sure no bugs are introduced!)
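To make the UTXO idea concrete, here is a minimal C++ sketch. It is not the actual Namecoin code: all names (`OutPoint`, `TxOut`, `Tx`, `UtxoSet`, `ConnectTx`) are hypothetical, hashes are plain strings, and script and signature checks are left out entirely. It only illustrates that the inputs of a transaction could be checked against an in-memory map of unspent outputs, without touching the block files.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified types (assumption: the real client uses
// uint256 hashes and CTxOut; strings keep this sketch short).
struct OutPoint {
    std::string txid;  // hash of the transaction that created the output
    uint32_t index;    // position of the output within that transaction
    bool operator<(const OutPoint& o) const {
        return std::tie(txid, index) < std::tie(o.txid, o.index);
    }
};

struct TxOut {
    int64_t value;       // amount in base units
    std::string script;  // locking script, opaque in this sketch
};

struct Tx {
    std::string txid;
    std::vector<OutPoint> inputs;  // empty for a coinbase-like transaction
    std::vector<TxOut> outputs;
};

class UtxoSet {
public:
    // Checks that every input refers to a distinct, currently unspent output
    // and that inputs cover outputs. On success the spent outputs are removed
    // and the new ones added; on failure the set is left unchanged.
    // Script/signature verification is deliberately omitted.
    bool ConnectTx(const Tx& tx) {
        std::set<OutPoint> seen;
        int64_t totalIn = 0;
        for (const OutPoint& op : tx.inputs) {
            if (!seen.insert(op).second) return false;  // same output spent twice
            auto it = unspent_.find(op);
            if (it == unspent_.end()) return false;     // missing or already spent
            totalIn += it->second.value;
        }
        int64_t totalOut = 0;
        for (const TxOut& out : tx.outputs) totalOut += out.value;
        if (!tx.inputs.empty() && totalIn < totalOut) return false;

        for (const OutPoint& op : tx.inputs) unspent_.erase(op);
        for (uint32_t i = 0; i < tx.outputs.size(); ++i)
            unspent_[OutPoint{tx.txid, i}] = tx.outputs[i];
        return true;
    }

    std::size_t Size() const { return unspent_.size(); }

private:
    std::map<OutPoint, TxOut> unspent_;  // the full UTXO set, kept in memory
};
```

In this sketch every lookup is a map access, so the cost of verifying a block no longer depends on where the spent transactions sit on disk. Whether the set fits in memory, or has to live in a BDB file as suggested above, is a separate question the sketch does not address.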

Post by **crosser** » Fri Apr 18, 2014 7:30 am

domob wrote: each block from which some transaction uses an output has to be read from disk - and this produces lots of random accesses to the blockchain files.

I do not know if this theory is correct, but if it is, then the problem is common to Namecoin and Bitcoin. If so, it would make sense to address it "upstream", i.e. in the Bitcoin codebase.

Just my two namecents.

Post by **domob** » Fri Apr 18, 2014 8:39 am

domob wrote: each block from which some transaction uses an output has to be read from disk - and this produces lots of random accesses to the blockchain files.
crosser wrote: I do not know if this theory is correct, but if it is, then the problem is common to Namecoin and Bitcoin. If so, it would make sense to address it "upstream", i.e. in the Bitcoin codebase.

Not entirely, since Bitcoin no longer stores the blockchain in the unstructured files but in a LevelDB database instead. This possibly takes care of proper caching. (But I don't know the respective pieces of the Bitcoin code.) This makes the problem less severe than it is for Namecoin, so that it is acceptable for most users at the moment.

However, I believe that the UTXO ideas have never been implemented in Bitcoin so far, even though they have been floating around in discussions for a few years. Of course, the same could be done for Bitcoin as well (if the devs like it). This is no longer really "upstream", though, as the code bases have diverged, particularly in this respect (with Bitcoin now using LevelDB).

Post by **phelix** » Fri Apr 18, 2014 6:23 pm

Hmm, with dbcache=1500 things should not be limited by random disk access too much. Also, on SSD systems things are somehow slowed down with the GUI and a large wallet.

I will take a look at that mutex locking issue.

Post by **domob** » Sat Apr 19, 2014 7:12 am

phelix wrote: Hmm, with dbcache=1500 things should not be limited by random disk access too much. Also, on SSD systems things are somehow slowed down with the GUI and a large wallet. I will take a look at that mutex locking issue.

Have you actually tried this, and does dbcache=1500 remove the disk accesses? Does it give good performance on a non-SSD system? Note that, as mentioned above, the blockchain itself is not stored in BDB, so dbcache should not have any effect on it either. Instead, for each tx output that is checked in ConnectInputs, the respective tx is loaded from disk (as far as I can tell from the code). While the dbcache will help with accesses to blkindex.dat, I doubt that it actually speeds up transaction verification. This is why I propose to move the full UTXO set to a BDB file, not just the tx index.

Post by **phelix** » Wed Apr 23, 2014 12:44 pm

phelix wrote: Hmm, with dbcache=1500 things should not be limited by random disk access too much. Also, on SSD systems things are somehow slowed down with the GUI and a large wallet. I will take a look at that mutex locking issue.
domob wrote: Have you actually tried this, and does dbcache=1500 remove the disk accesses? Does it give good performance on a non-SSD system? Note that, as mentioned above, the blockchain itself is not stored in BDB, so dbcache should not have any effect on it either. Instead, for each tx output that is checked in ConnectInputs, the respective tx is loaded from disk (as far as I can tell from the code). While the dbcache will help with accesses to blkindex.dat, I doubt that it actually speeds up transaction verification. This is why I propose to move the full UTXO set to a BDB file, not just the tx index.

Back then I merged a patch from Bitcoin that allows changing the dbcache size. I am quite certain that it helped considerably on the non-SSD system I tested it on. Other optimizations were about flushing the db files less often.

When reading from a non-BDB file, shouldn't the system cache take care of the random reads (at least during the initial blockchain download and given enough RAM)?

I never looked at the flushing of the blkxxxx.dat files:

        // Flush stdio buffers and commit to disk before returning
        fflush(fileout);
        if (!IsInitialBlockDownload() || (nBestHeight+1) % 500 == 0)
        {
#ifdef __WXMSW__
            _commit(_fileno(fileout));
#else
            fsync(fileno(fileout));
#endif
        }
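
As a rough illustration of what the condition in the snippet above means for disk syncs, here is a small C++ sketch. `ShouldSyncToDisk` and `CountSyncs` are hypothetical helper names, not functions in the client, and the interval of 500 is simply taken from the snippet.

```cpp
// Mirrors the condition from the snippet: outside the initial block
// download every block write is synced to disk; during the initial
// download only every `interval`-th block height forces a sync.
bool ShouldSyncToDisk(bool initialBlockDownload, int bestHeight, int interval = 500) {
    return !initialBlockDownload || (bestHeight + 1) % interval == 0;
}

// Counts how many forced syncs happen while the best height runs from
// firstHeight to lastHeight (inclusive). Hypothetical helper for illustration.
int CountSyncs(int firstHeight, int lastHeight, bool initialBlockDownload) {
    int syncs = 0;
    for (int h = firstHeight; h <= lastHeight; ++h) {
        if (ShouldSyncToDisk(initialBlockDownload, h)) ++syncs;
    }
    return syncs;
}
```

Under these assumptions, heights 0 to 999 trigger 2 forced syncs during the initial download and 1000 otherwise, so the writes to the blkxxxx.dat files are already synced rather rarely while catching up.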

Anyway, the issue I see on my machine certainly is GUI related. I could not yet figure out how to get around it.

Post by **domob** » Thu Apr 24, 2014 8:34 am

phelix wrote: When reading from a non-BDB file, shouldn't the system cache take care of the random reads (at least during the initial blockchain download and given enough RAM)?

In theory, probably. But for Huntercoin, it makes an order of magnitude difference to move the blockchain directory to a RAM disk before syncing, and back after that. I've not explicitly tried it out with Namecoin, but it is probably the same.

phelix wrote:Anyway, the issue I see on my machine certainly is GUI related. I could not yet figure out how to get around it.

Have you tried the locking patch? Does it help? For me, syncing the Namecoin blockchain always was somewhat painful and slow even though I never run the Namecoin-UI. Just running Namecoin in the background (and syncing a day's worth of blocks or so) is usually fine, though.

Post by **snailbrain** » Thu Apr 24, 2014 9:02 am

phelix wrote: When reading from a non-BDB file, shouldn't the system cache take care of the random reads (at least during the initial blockchain download and given enough RAM)?
domob wrote: In theory, probably. But for Huntercoin, it makes an order of magnitude difference to move the blockchain directory to a RAM disk before syncing, and back after that. I've not explicitly tried it out with Namecoin, but it is probably the same.

phelix wrote: Anyway, the issue I see on my machine certainly is GUI related. I could not yet figure out how to get around it.
domob wrote: Have you tried the locking patch? Does it help? For me, syncing the Namecoin blockchain always was somewhat painful and slow even though I never run the Namecoin-UI. Just running Namecoin in the background (and syncing a day's worth of blocks or so) is usually fine, though.

With a mechanical hard disk:
The Namecoin daemon is pretty much the same as the Qt client for syncing, at least on Windows. It always goes into a "not responding" state and the hard drive is permanently busy.
Qt seems marginally slower, but you notice the "not responding" state a lot more.
On Linux you don't notice it as much, probably due to less fragmentation, and you don't get "not responding", although Qt can become unresponsive during sync. You still get insane hard disk activity, though.

Post by **phelix** » Fri Apr 25, 2014 2:31 pm

GUI + large wallet (with 200+ name_ops) = roughly 10-100 times slower

Unfortunately I could not really narrow it down to a particular function. Trying to profile it I see that it is wasting a lot of time locking and unlocking memory (but I could not get it to display function names).

I guess it is only a small problem as long as you are aware of it.

Namecoin Forum
