Re: Berkeley DB
Posted: Thu Apr 17, 2014 10:25 pm
by snailbrain
phelix wrote:snailbrain wrote:
I apologise if this is not relevant.. but something about incoming transfers of names in the Qt client is ringing some bells..
Hmm, there should not have been any incoming transfers on that wallet... only some ever-pending ones, though.
I mean, doesn't it need to check for them to put them in the GUI? I am probably wrong.
Re: Berkeley DB
Posted: Fri Apr 18, 2014 5:24 am
by domob
I think I've got a suspicion why syncing is so slow. I don't think it is related to BDB, since the blockchain actually isn't stored in BDB at all, but in "plain" files (the blk000?.dat ones). When verifying a block or transaction, the client obviously needs to check all the tx outs that are spent by it. Those are looked up from the blockchain files, since that's the only place where transactions are stored (except the wallet, for local transactions). Thus, each block containing a transaction whose outputs are spent has to be read from disk - and this produces lots of random accesses to the blockchain files. It seems almost "obvious" that this should lead to horrible performance, right? Even though this is somewhat trivial, I was not really aware of it in these terms (I had never thought about the precise way these things are done before).
What we can do about it: we could try to parse a new block first, collecting all the transactions needed to verify it, then look them up in order to at least reduce cache misses a bit. However, I'm not sure this is the way we should go. Instead, I suggest that we keep track of the full UTXO set, either in memory or at least in the blkindex.dat BDB file, including the full tx outs. This way, it should be possible to verify a transaction or block without ever looking into the blockchain files - just from the UTXO set. This is just guesswork, but I believe it could speed up syncing drastically. Anything I'm missing here? I don't think it would actually be so hard to do - I will give it a try when I get around to it. (But it will need careful testing to make sure no bugs are introduced!)
Re: Berkeley DB
Posted: Fri Apr 18, 2014 7:30 am
by crosser
domob wrote:each block containing a transaction whose outputs are spent has to be read from disk - and this produces lots of random accesses to the blockchain files.
I do not know if this theory is correct, but if it is, then the problem is common to Namecoin and Bitcoin. If so, it would make sense to address it "upstream", i.e. in the Bitcoin codebase.
Just my two namecents.
Re: Berkeley DB
Posted: Fri Apr 18, 2014 8:39 am
by domob
crosser wrote:domob wrote:each block containing a transaction whose outputs are spent has to be read from disk - and this produces lots of random accesses to the blockchain files.
I do not know if this theory is correct, but if it is, then the problem is common to Namecoin and Bitcoin. If so, it would make sense to address it "upstream", i.e. in the Bitcoin codebase.
Not entirely, since Bitcoin no longer stores the blockchain in the unstructured files but in LevelDB. This possibly takes care of proper caching. (But I don't know the respective pieces of the Bitcoin code.) It makes the problem less severe than for Namecoin, so that it is acceptable at the moment for most users.
However, I believe that UTXO ideas have not been implemented in Bitcoin so far, even though they have been floating around in discussions for a few years now. Of course, the same things could be done for Bitcoin as well (if the devs like it). This is no longer really "upstream", though, as the code bases have diverged particularly in this respect (with Bitcoin using LevelDB now).
Re: Berkeley DB
Posted: Fri Apr 18, 2014 6:23 pm
by phelix
Hmm, with dbcache=1500 things should not be limited by random disk access too much. Also, on SSD systems things are somehow slowed down by the GUI and a large wallet.
I will take a look at that mutex locking issue.
Re: Berkeley DB
Posted: Sat Apr 19, 2014 7:12 am
by domob
phelix wrote:Hmm, with dbcache=1500 things should not be limited by random disk access too much. Also, on SSD systems things are somehow slowed down by the GUI and a large wallet.
I will take a look at that mutex locking issue.
Have you actually tried this, and does dbcache=1500 remove the disk accesses? Does it give good performance on a non-SSD system? Note that, as mentioned above, the blockchain itself is not stored in BDB, so dbcache should not have an effect on it either. Instead, for each tx output that is checked in ConnectInputs, the respective tx is loaded from disk (as far as I can tell from the code). While the dbcache will help with accesses to blkindex.dat, I doubt that it actually speeds up transaction verification. This is why I propose to move the full UTXO set into a BDB file, not just the tx index.
Re: Berkeley DB
Posted: Wed Apr 23, 2014 12:44 pm
by phelix
domob wrote:phelix wrote:Hmm, with dbcache=1500 things should not be limited by random disk access too much. Also, on SSD systems things are somehow slowed down by the GUI and a large wallet.
I will take a look at that mutex locking issue.
Have you actually tried this, and does dbcache=1500 remove the disk accesses? Does it give good performance on a non-SSD system? Note that, as mentioned above, the blockchain itself is not stored in BDB, so dbcache should not have an effect on it either. Instead, for each tx output that is checked in ConnectInputs, the respective tx is loaded from disk (as far as I can tell from the code). While the dbcache will help with accesses to blkindex.dat, I doubt that it actually speeds up transaction verification. This is why I propose to move the full UTXO set into a BDB file, not just the tx index.
Back then I merged a patch from Bitcoin to allow changing the dbcache size. I am quite certain that it helped considerably on the non-SSD system I tested it on. Other optimizations were about flushing the db files less often.
When reading from a non-BDB file, shouldn't the system cache take care of the random reads (at least during initial blockchain download, given enough RAM)?
I never looked at the flushing of the blkxxxx.dat files:
Code:
// Flush stdio buffers and commit to disk before returning
fflush(fileout);
if (!IsInitialBlockDownload() || (nBestHeight+1) % 500 == 0)
{
#ifdef __WXMSW__
    _commit(_fileno(fileout));
#else
    fsync(fileno(fileout));
#endif
}
Anyway, the issue I see on my machine is certainly GUI-related. I have not yet figured out how to get around it.
Re: Berkeley DB
Posted: Thu Apr 24, 2014 8:34 am
by domob
phelix wrote:When reading from a non-BDB file, shouldn't the system cache take care of the random reads (at least during initial blockchain download, given enough RAM)?
In theory, probably. But for Huntercoin it makes an order-of-magnitude difference to move the blockchain directory to a RAM disk before syncing and back afterwards. I've not explicitly tried it with Namecoin, but it is probably the same.
phelix wrote:Anyway, the issue I see on my machine is certainly GUI-related. I have not yet figured out how to get around it.
Have you tried the locking patch? Does it help? For me, syncing the Namecoin blockchain was always somewhat painful and slow, even though I never run the Namecoin UI. Just running Namecoin in the background (and syncing a day's worth of blocks or so) is usually fine, though.
Re: Berkeley DB
Posted: Thu Apr 24, 2014 9:02 am
by snailbrain
domob wrote:phelix wrote:When reading from a non-BDB file, shouldn't the system cache take care of the random reads (at least during initial blockchain download, given enough RAM)?
In theory, probably. But for Huntercoin it makes an order-of-magnitude difference to move the blockchain directory to a RAM disk before syncing and back afterwards. I've not explicitly tried it with Namecoin, but it is probably the same.
phelix wrote:Anyway, the issue I see on my machine is certainly GUI-related. I have not yet figured out how to get around it.
Have you tried the locking patch? Does it help? For me, syncing the Namecoin blockchain was always somewhat painful and slow, even though I never run the Namecoin UI. Just running Namecoin in the background (and syncing a day's worth of blocks or so) is usually fine, though.
With a mechanical hard disk:
the Namecoin daemon is pretty much the same as the Qt client for syncing - at least on Windows... it always goes into a "not responding" state and the hard drive grinds constantly.
The Qt client seems marginally slower... but you notice the "not responding" a lot more.
On Linux you don't notice it as much, probably due to less fragmentation, and you don't get the "not responding" state - although the Qt client can still become unresponsive during sync. You still get insane hard drive activity, though.
Re: Berkeley DB
Posted: Fri Apr 25, 2014 2:31 pm
by phelix
GUI + large Wallet (with 200+ name_ops) = ~ 10-100 times slower
Unfortunately I could not really narrow it down to a particular function. Trying to profile it, I see that it wastes a lot of time locking and unlocking memory (but I could not get it to display function names).
I guess it is only a small problem as long as you are aware of it.