Page 2 of 2

Re: RealBay: A searchable torrent index

Posted: Fri Jan 23, 2015 7:13 pm
by krisives
What you're talking about is known as a "FN36 (full node 36 kiloblock) client" (at least that's what we're calling it now; expect a blog post formalizing this terminology soon). It's only necessary to process 36 kiloblocks of history, since name outputs (except name_new) are always spent after at most 36 kiloblocks. I don't think anyone's developed this yet, but it's probably not incredibly hard. I have a mostly working "FN36-ABR (full node 36 kiloblock all block receive) client", which has to download the full chain but only stores the most recent 36 kiloblocks. Once I've had some time to finish it up, assuming that no showstoppers are found, I'll release it along with a blogpost explaining how it works. The storage used by the FN36-ABR client is about 107 MiB.
Where can I grab the source to this or what build should I try to start from to do a similar approach? I haven't run the numbers yet but I could probably process the entire blockchain, since it's actually pretty small, and see how large a bloom filter lookup would be. I know there is work in Namecoin to replicate some of the bloom filter stuff happening in Bitcoin, but I don't know any status on that work and to be frank the Bitcoin work is still not in the ref client.
However, for full anonymity of lookups, you would need at least a FN36 client.
That's understandable. If possible I want to present two options to my users, one for users that are concerned about being tracked who would opt to run a full Namecoin client, and others that are less concerned about being tracked but upset their favorite torrent site is down.
I think name_filter is doing regexp operations on every name. Needless to say, that's very inefficient. I wouldn't object to a simple prefix search if it speeds things up a lot (which I assume it would). Depending on what you're using name_filter for, it's also plausible that an index of namespaces could help.
Does Namecoin have any indexing happening on it's own? I am interested in coming up with ways to make it faster to search for various use cases, even if it means keeping the data out of band. I think a bloom filter lookup table could be constructed without too much trouble, which would at least help you find which blocks have which text in them.
The downside is that those clients can not reliably service the network.
I think there is a solution to this. I think if all the clients could keep a bloom filter lookup table they would get what they want from the network without having to keep the entire Namecoin chain on disk. That leads to the problem that now there are clients that don't have the full chain to service the network. It could be possible to ensure that N number of clients distribute the index between each other, much like a torrent seeding. Each user would still not keep much Namecoin chain on disk, would still get fast lookup on domains from the bloom lookup, etc.

Thanks for helping me everyone I look forward to your replies and ideas.

Re: RealBay: A searchable torrent index

Posted: Sat Jan 24, 2015 8:01 am
by domob
krisives wrote:
I think name_filter is doing regexp operations on every name. Needless to say, that's very inefficient. I wouldn't object to a simple prefix search if it speeds things up a lot (which I assume it would). Depending on what you're using name_filter for, it's also plausible that an index of namespaces could help.
Does Namecoin have any indexing happening on it's own? I am interested in coming up with ways to make it faster to search for various use cases, even if it means keeping the data out of band. I think a bloom filter lookup table could be constructed without too much trouble, which would at least help you find which blocks have which text in them.
It keeps the names in BDB (old client) and LevelDB (Namecore). Those are (both) similar to a binary search tree, so that you can look up names by prefix quickly (and single names of course). Using name_show for single names and name_scan to start at some prefix uses this index. name_filter just iterates through all names, matching against your regexp.

A simple test shows that "name_scan d/foo 1" returns in 20 ms for me, while the roughly equivalent "name_filter ^d/foo 36000 0 1" takes 600 ms. (Both on Namecore.)
krisives wrote:
The downside is that those clients can not reliably service the network.
I think there is a solution to this. I think if all the clients could keep a bloom filter lookup table they would get what they want from the network without having to keep the entire Namecoin chain on disk. That leads to the problem that now there are clients that don't have the full chain to service the network. It could be possible to ensure that N number of clients distribute the index between each other, much like a torrent seeding. Each user would still not keep much Namecoin chain on disk, would still get fast lookup on domains from the bloom lookup, etc.
Yes, there's been some ideas and discussions going on for Bitcoin in this direction. I. e., each client (or those configured to do it to save disk space) keeps only a fraction of all blocks, ideally in some way that makes it easy to tell others which blocks those are with just a few bytes of data transferred as part of the handshake. AFAIK, the discussion went nowhere for the moment, though - but it is on gmaxwell's list. (And I'm interested in that, too.)

Re: RealBay: A searchable torrent index

Posted: Sun Jan 25, 2015 7:57 am
by krisives
Neat, so I can use name_scan and name_show. Thank you.

Re: RealBay: A searchable torrent index

Posted: Thu Jan 29, 2015 2:36 am
by biolizard89
domob wrote:Does your FN36 client also store the UTXO set? If so, note that you need no blocks at all for name lookup (it's in the UTXO set) nor for tx validation (also UTXO set). For Bitcoin there's a patch somewhere that does this and allows to run in "fully pruned mode". The same could be done with Namecore. The downside is that those clients can not reliably service the network.

If your client does not keep the UTXO set, it can not be fully validating: For tx processing you need potentially more than 36k blocks. I. e., you basically have some variant of SPV client.

I think that a client that keeps only the UTXO set and no blocks at all is the variant to go for in this "class" of light clients. gmaxwell hinted that they want to include the pruned mode into upstream Bitcoin, so at one point in the future, Namecore will have this as well.
It's not storing the full UTXO set, only the UNO set (or a subset matching a specific Namecoin namespace). It trusts that the currency inputs to name operations are valid, but it does verify the name inputs to name operations. So it's much lighter in terms of storage than storing the full UTXO set would be. Note that this is based on libcoin, so there's no concept of storing full blocks (it's in minimal persistence mode, so everything that is spent gets purged after maturity is reached).
krisives wrote:Where can I grab the source to this or what build should I try to start from to do a similar approach? I haven't run the numbers yet but I could probably process the entire blockchain, since it's actually pretty small, and see how large a bloom filter lookup would be. I know there is work in Namecoin to replicate some of the bloom filter stuff happening in Bitcoin, but I don't know any status on that work and to be frank the Bitcoin work is still not in the ref client.
Here's the code if you want to play with it: https://github.com/JeremyRand/libcoin/t ... -selective . A blogpost should be coming soon with more info on how this works. Note that no warranty is implied and it has been subject to very little testing, but if you have problems with it please do let me know.

FYI, since presumably your code doesn't need the d/, id/, or (nonstandard) u/ namespaces (which together are probably the majority of the Namecoin blockchain at the moment), the namespace whitelisting feature in this code may help you a lot. Basically it will only store the names that correspond to a set of namespaces that you specify. If you try this feature out, let us know how it goes.