RealBay: A searchable torrent index

krisives
Posts: 9
Joined: Mon Dec 29, 2014 1:47 am

RealBay: A searchable torrent index

Post by krisives »

Hi everyone!

I've recently been working on a piece of software called "RealBay" that indexes torrent data into publishable indexes. The idea is that you trust a list of Namecoin identities, so that you only search torrents from publishers you trust. Within the 500-byte record is a DHT hash to look up, which points to the index. The index is organized so that you only have to download a small portion of it to search it. (It contains a bloom filter lookup table within the first piece of the torrent.)

Most of my effort so far has gone into testing the bloom filter indexing, size constraints, and chaining the indexes together so that they can be efficiently searched without downloading much data, while maintaining plausible deniability of search terms. For example, if a torrent contains the word "Ubuntu", that word will never actually occur in the index at all (only a bloom filter representation of multiple words).
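
To illustrate the idea of an index that stores only a bloom filter representation of words, here is a minimal sketch. It is not RealBay's actual index format: the class name, the filter size, the number of hashes, and the use of salted SHA-256 are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; it stores bit positions only, never the words."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        data = word.lower().encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # True for every added word; may also be True for others (false positives).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


def build_filter(torrent_names):
    """Add every whitespace-separated word of every torrent name to one filter."""
    bf = BloomFilter()
    for name in torrent_names:
        for word in name.split():
            bf.add(word)
    return bf


index = build_filter(["Ubuntu 14.04 Desktop", "Debian 7 netinst"])
print(index.might_contain("ubuntu"))  # True
```

A search client would only need the filter bytes to decide whether a given index is worth downloading further. A negative answer is definitive, while a positive answer may be a false positive, which is also what gives the search terms some deniability.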

Now I've turned my eye to the Namecoin part of the equation. There are a lot of questions:

* I see that registered names carry a prefix, such as d/ and u/, and I think it would be a good idea to discuss this with others before I do much here. On the one hand I want to use the existing d/ or u/ prefixes, but I don't think my data maps well to those usages, and I don't want people wasting their time trying to resolve addresses that will never exist. I was planning on using the 500 bytes to store a 20-byte hash used to find the index on DHT, and the remaining data to improve search lookup.

* For end users who aren't that concerned about running a "full node", could I use a DNS resolver somewhere? My idea is to let users decide on startup whether they want to download the entire Namecoin chain / software or use a remote Namecoin resolver of some kind. Obviously this may be a problem with my custom format of mapping to 20-byte hashes.

* Is this invalid usage of Namecoin? It seems like a good usage, and I considered using BitMessage, but it didn't seem to solve my problems as well. I don't like that I have to download the entire Namecoin chain to get a single result from it, but I think that will be solved over time and isn't fundamental to the design of Namecoin. Namecoin seems like a good replacement for DNS if the goal is to avoid censorship.
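
The record layout mentioned in the first question (a 20-byte hash followed by extra search data) could be sketched roughly as below. The function names and the raw-bytes layout are hypothetical, and the 520-byte cap is the figure cited later in this thread rather than the 500 bytes mentioned above.

```python
import hashlib

MAX_VALUE_BYTES = 520  # limit cited later in the thread (assumption for this sketch)
HASH_BYTES = 20        # size of the DHT lookup hash


def pack_record(dht_hash, search_hint=b""):
    """Concatenate the 20-byte DHT hash and optional search data into one value."""
    if len(dht_hash) != HASH_BYTES:
        raise ValueError("DHT hash must be exactly 20 bytes")
    record = dht_hash + search_hint
    if len(record) > MAX_VALUE_BYTES:
        raise ValueError("record exceeds the name value size limit")
    return record


def unpack_record(record):
    """Split a record back into (dht_hash, search_hint)."""
    if len(record) < HASH_BYTES or len(record) > MAX_VALUE_BYTES:
        raise ValueError("malformed record")
    return record[:HASH_BYTES], record[HASH_BYTES:]


# Hypothetical example: SHA-1 is used only because it yields 20 bytes.
example_hash = hashlib.sha1(b"example index").digest()
record = pack_record(example_hash, search_hint=b"\x00" * 100)
print(len(record))  # 120
```

With this layout up to 500 bytes remain for search data after the hash, which could hold a small bloom filter or similar hint.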

------

For anyone who is interested in helping or just wants to read the code, it's currently still private at this Gitlab address:

https://gitlab.com/krisives/realbay/

The reason for it being private right now is that I want to avoid people looking at an unfinished project and writing it off too early. It's currently coded in Javascript using Node-Webkit (or just Node for the publishing tools on the command line). My "last" big problem to solve is building very large indexes of millions of torrents. The indexes work fine once they are built, but the time to build them is very high.

Thanks for reading and let me know if you wish to get involved in the project before the code is released.

phelix
Posts: 1634
Joined: Thu Aug 18, 2011 6:59 am

Re: RealBay: A searchable torrent index

Post by phelix »

Not quite sure what exactly you want to store but it may be a case for a new namespace. "identities to trust" sounds like a good use for id/, though.

note:
https://forum.namecoin.info/viewtopic.php?f=5&t=1381

What about publishers using namecoin directly?
nx.bit - some namecoin stats
nf.bit - shortcut to this forum

krisives
Posts: 9
Joined: Mon Dec 29, 2014 1:47 am

Re: RealBay: A searchable torrent index

Post by krisives »

I had looked at that thread a bit before posting this, but it has some slightly different goals.

> Not quite sure what exactly you want to store

It's an index of lots of torrents. The data is arbitrary and searched by keyword of the torrent name. Some use cases are all the Linux torrents ever published, a specific distro, etc. I expect the community to mix and match, both publishing original content and aggregating it.

> What about publishers using namecoin directly?

This is a LOT of data to be putting directly into the block chain.

Using the id/ prefix, what can I put into the 500 byte data value?

krisives
Posts: 9
Joined: Mon Dec 29, 2014 1:47 am

Re: RealBay: A searchable torrent index

Post by krisives »

Some work has been done on this:
https://github.com/realbay/realbay-cli

domob
Posts: 1129
Joined: Mon Jun 24, 2013 11:27 am

Re: RealBay: A searchable torrent index

Post by domob »

krisives wrote:
> > Not quite sure what exactly you want to store
>
> It's an index of lots of torrents. The data is arbitrary and searched by keyword of the torrent name. Some use cases are all the Linux torrents ever published, a specific distro, etc. I expect the community to mix and match, both publishing original content and aggregating it.

Sounds interesting! More or less this use case has been discussed before, and at least in my opinion it is an interesting one. So far nothing has been realised, so I'm looking forward to your project!

krisives wrote:
> > What about publishers using namecoin directly?
>
> This is a LOT of data to be putting directly into the block chain.
>
> Using the id/ prefix, what can I put into the 500 byte data value?

In principle, you can put whatever data you like into any name. However, in your case, I suggest using one of these formats:

1) Use id/ and mostly follow the spec. I.e., someone can just use an "ordinary" id/ value for their BM address, donation addresses, whatever they like. In addition, they would publish the 20-byte string pointing to their published torrent data in a new JSON field you create outside of the spec. The advantage is that you only need a single name; the downside is that you have no extra bytes for "improving the search" (not sure if that's a problem for you or not).

2) Use id/ plus a new namespace like "rb/" or "realbay/" or something like that. Your torrent data, plus whatever extra bytes of the 520 available you want to use, is stored in whatever format you like in, say, rb/my-torrent-directory. In id/domob, I would then just add a new JSON field that says something like "realbay-torrents":"rb/my-torrent-directory" to link to it. Then someone can still look up torrents for my id/ name, and they can be sure that the torrents are, indeed, published by me. This also makes it possible to publish multiple directories from a single id/ if you make the JSON field an array. And you get to use the full 520 bytes in whatever format you like for your payload, but you need to consider two names. (But the id/ name may be present already if the publisher has an id/ anyway.)

3) Use a new namespace like "realbay/" similarly to above, but don't even link to it from id/. Instead, the realbay/ name may include signatures made by the Namecoin address (or some other key, e.g., "signer") of one or more id/'s. This is the most straightforward approach for publishing torrents. With the signatures, users can still be sure that the torrent is published by the id/ they think it is. In addition, this link is fully optional, and one could even produce a torrent published and signed by multiple id/'s together. The disadvantage (not sure) is that users need to search for the realbay/ name, not the id/ name. I.e., they can't look up what I have published, but only a particular directory they know of (like realbay/debian-downloads).

I strongly advise against using id/ and filling all 520 bytes with your torrent data. This makes the id/ name unusable for standard ID purposes. As a publisher, I would like to have id/domob for non-publishing uses as well.
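
As a rough illustration of option 2, the sketch below adds a "realbay-torrents" field to an existing id/ JSON value and reads it back. The helper names are hypothetical, the "email" field is only a placeholder for whatever an id/ value already contains, and the array form follows the suggestion above for publishing multiple directories.

```python
import json

MAX_VALUE_BYTES = 520  # value size limit as quoted above


def add_directory_link(id_value_json, directory_name):
    """Return a new id/ JSON value that also links to a RealBay directory name."""
    value = json.loads(id_value_json)
    links = value.get("realbay-torrents", [])
    if isinstance(links, str):  # accept the single-string form too
        links = [links]
    if directory_name not in links:
        links = links + [directory_name]
    value["realbay-torrents"] = links
    encoded = json.dumps(value, separators=(",", ":"))
    if len(encoded.encode("utf-8")) > MAX_VALUE_BYTES:
        raise ValueError("id/ value would exceed 520 bytes")
    return encoded


def linked_directories(id_value_json):
    """List the directory names an id/ value links to (empty if none)."""
    links = json.loads(id_value_json).get("realbay-torrents", [])
    return [links] if isinstance(links, str) else list(links)


existing = '{"email":"someone@example.org"}'
updated = add_directory_link(existing, "rb/my-torrent-directory")
print(updated)
# {"email":"someone@example.org","realbay-torrents":["rb/my-torrent-directory"]}
```

The existing fields are left untouched, so the id/ name stays usable for its ordinary purposes while a client can follow the link to the rb/ name for the actual payload.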

Not sure if it is clear what I want to state - let me know if not.
BTC: 1domobKsPZ5cWk2kXssD8p8ES1qffGUCm | NMC: NCdomobcmcmVdxC5yxMitojQ4tvAtv99pY
BM-GtQnWM3vcdorfqpKXsmfHQ4rVYPG5pKS
Use your Namecoin identity as OpenID: https://nameid.org/

krisives
Posts: 9
Joined: Mon Dec 29, 2014 1:47 am

Re: RealBay: A searchable torrent index

Post by krisives »

@domob Thanks for your thoughts and encouragement. From what I can tell I am leaning closer to option #3, although #2 is something I have considered as well. Option #1 seemed too problematic for the same reasons you outlined.

My only problem right now is that the Namecoin bootstrap time is relatively high, but I know the solutions to that are a much larger separate issue (and Bitcoin is solving some of these as well). I was considering possibly using a "checkpoint" upon release, since there would be no "realbay" content in the blockchain before that.

But then we hit the touchy subject of running Namecoin as a child process, asking users to run Namecoin, etc. What are your thoughts on that?

domob
Posts: 1129
Joined: Mon Jun 24, 2013 11:27 am

Re: RealBay: A searchable torrent index

Post by domob »

krisives wrote:
> @domob Thanks for your thoughts and encouragement. From what I can tell I am leaning closer to option #3, although #2 is something I have considered as well. Option #1 seemed too problematic for the same reasons you outlined.
>
> My only problem right now is that the Namecoin bootstrap time is relatively high, but I know the solutions to that are a much larger separate issue (and Bitcoin is solving some of these as well). I was considering possibly using a "checkpoint" upon release, since there would be no "realbay" content in the blockchain before that.
>
> But then we hit the touchy subject of running Namecoin as a child process, asking users to run Namecoin, etc. What are your thoughts on that?

First of all, are you talking about running Namecoin to publish torrents (i.e., create names) or to look them up (only read names)? In the latter case, you could use NMControl as an intermediate layer. It lets you use a downloaded snapshot of names instead of running a real Namecoin process. It will (possibly, if a patch of mine is merged) also allow lookups through a trusted API server running Namecoin. This may be a good alternative for people who do not want to run Namecoin on their own and do not require full trustlessness.

Other than that, have you tried Namecore? It is Namecoin reimplemented on top of the current Bitcoin code (not the ancient 0.3.x one), and it already runs much more smoothly. You could also bundle (or download via Bittorrent) a precreated, signed blockchain so users only have to sync from that point onward. Or wait for a light client to appear, which has been under discussion for some time with ideas floating around.

My suggestion: Don't worry about that issue right now, and instead focus on adding your own stuff. We're working on improving syncing, and other solutions can be implemented, too, once your own original contribution is done and working.

krisives
Posts: 9
Joined: Mon Dec 29, 2014 1:47 am

Re: RealBay: A searchable torrent index

Post by krisives »

> My suggestion: Don't worry about that issue right now, and instead focus on adding your own stuff. We're working on improving syncing, and other solutions can be implemented, too, once your own original contribution is done and working.

I agree. My only goal is to find something end users can use.

> First of all, are you talking about running Namecoin to publish torrents (i.e., create names) or to look them up (only read names)?

I'm mainly talking about the end user client, which will need ways to find a given address. For example, if someone enters "linuxtorrents" to find the realbay user, the client needs to find something like "bay/linuxtorrents" in the Namecoin chain and get that 20-byte DHT address to continue getting more metadata.

> In the latter case, you could use NMControl as an intermediate layer. It lets you use a downloaded snapshot of names instead of running a real Namecoin process.
>
> Other than that, have you tried Namecore?

How hard would it be for me to make a slightly forked version (which you are welcome to merge back) that would not even care about the blocks before a certain date? My idea here is to only use blocks accepted pretty deep into the chain (like a day or two maybe?), and if something references a transaction before the "realbay epoch" it would assume it's valid since it's been in the index for a long time.

> It will (possibly, if a patch of mine is merged) also allow lookups through a trusted API server running Namecoin.

That could be useful for some users who are willing to compromise security more for the sake of speed / easy setup. It would be cool if we could figure out a scheme to exchange the desired search without actually revealing the search terms over the wire. Maybe bloom filters can help more there.

In the short term I'm simply using the default Namecoin client and trying `name_filter` out... But it's pretty slow.
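
The lookup flow described in this post (user enters "linuxtorrents", the client resolves "bay/linuxtorrents" and extracts a 20-byte DHT address) might look roughly like the sketch below. The lookup is injected as a plain callable so the example runs without a Namecoin node; in practice it would probably wrap something like the `name_show` RPC or an NMControl query. The function name, the "bay/" default, and the assumption that the value begins with the hash as 40 hex characters are all hypothetical.

```python
def resolve_publisher(short_name, lookup_value, namespace="bay/"):
    """Map a user-entered name to the 20-byte DHT hash stored under namespace + name.

    lookup_value is any callable that returns the name's value as a string,
    or None when the name does not exist.
    """
    value = lookup_value(namespace + short_name)
    if value is None:
        return None
    # Assumed format: the value starts with the hash as 40 hex characters.
    dht_hash = bytes.fromhex(value[:40])
    if len(dht_hash) != 20:
        raise ValueError("value does not start with a 20-byte hash")
    return dht_hash


# Stand-in for a real Namecoin lookup, used only for this example.
fake_chain = {"bay/linuxtorrents": "aa" * 20}
result = resolve_publisher("linuxtorrents", fake_chain.get)
print(result.hex())
```

Keeping the lookup behind a callable means the same client code could work against a local node, a downloaded snapshot, or a trusted remote server, depending on what the user chooses at startup.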

biolizard89
Posts: 2001
Joined: Tue Jun 05, 2012 6:25 am
os: linux

Re: RealBay: A searchable torrent index

Post by biolizard89 »

krisives wrote:
> > In the latter case, you could use NMControl as an intermediate layer. It allows to use a downloaded snapshot of names instead of running a real Namecoin process.
> >
> > Other than that, have you tried Namecore?
>
> How hard would it be for me to make a slightly forked version (which you are welcome to merge back) which would not even care about the blocks before a certain date. My idea here is only use blocks accepted pretty deep into the chain (like a day or two maybe?) and if something references a transaction before the "realbay epoch" it would assume it's valid since it's been in the index for a long time.

What you're talking about is known as a "FN36 (full node 36 kiloblock) client" (at least that's what we're calling it now; expect a blog post formalizing this terminology soon). It's only necessary to process 36 kiloblocks of history, since name outputs (except name_new) are always spent after at most 36 kiloblocks. I don't think anyone's developed this yet, but it's probably not incredibly hard. I have a mostly working "FN36-ABR (full node 36 kiloblock all block receive) client", which has to download the full chain but only stores the most recent 36 kiloblocks. Once I've had some time to finish it up, assuming that no showstoppers are found, I'll release it along with a blog post explaining how it works. The storage used by the FN36-ABR client is about 107 MiB.
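
A rough sketch of the 36-kiloblock window described above. The function names are hypothetical, and the exact boundary (whether a name lapses at exactly 36,000 blocks or one block later) is an assumption that should be checked against the consensus code.

```python
NAME_EXPIRATION_BLOCKS = 36_000  # "36 kiloblocks", as described above


def first_relevant_height(tip_height):
    """Lowest block height an FN36-style client would still need to process.

    Conservatively includes one block more than the strict window.
    """
    return max(0, tip_height - NAME_EXPIRATION_BLOCKS)


def is_name_unexpired(last_update_height, tip_height):
    """True if a name last updated at last_update_height can still be live at the tip."""
    return tip_height - last_update_height < NAME_EXPIRATION_BLOCKS


print(first_relevant_height(250_000))       # 214000
print(is_name_unexpired(214_001, 250_000))  # True
print(is_name_unexpired(214_000, 250_000))  # False
```

Any name whose last update lies below the window has expired by definition, which is why older blocks carry no information needed for name lookups.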
krisives wrote:
> > It will (possibly - if a patch by me is merged) also allow to use some trusted API server running Namecoin for lookups
>
> That could be useful for some users that are willing to compromise security more for the sake of speed / easy setup. It would be cool if we could figure a scheme to exchange the desired search without actually revealing the search terms over the wire. Maybe bloom filters can help more there.

We're looking at the possibility of doing SPV lookups of names over Tor, which would prevent certain attackers (not a passive global adversary) from learning which IP searched for which names. There are also some proposals for making names hashed, which would make it somewhat harder for an adversary to figure out what was searched for, depending on the entropy of the name. (There is not a consensus on whether this should be done.) However, for full anonymity of lookups, you would need at least a FN36 client.

krisives wrote:
> In the short term I'm simply using the default Namecoin client and trying `name_filter` out... But it's pretty slow.

I think name_filter is doing regexp operations on every name. Needless to say, that's very inefficient. I wouldn't object to a simple prefix search if it speeds things up a lot (which I assume it would). Depending on what you're using name_filter for, it's also plausible that an index of namespaces could help.
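
To show why a prefix search can be cheaper than a regexp scan, here is a small sketch comparing the two over an in-memory list of names. It says nothing about how `name_filter` is actually implemented; the function names and sample names are made up for the example.

```python
import bisect
import re


def filter_by_regex(names, pattern):
    """Scan every name with a regular expression (linear in the number of names)."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.search(n)]


def filter_by_prefix(sorted_names, prefix):
    """Binary-search a sorted list for the contiguous run of names with this prefix."""
    start = bisect.bisect_left(sorted_names, prefix)
    end = start
    while end < len(sorted_names) and sorted_names[end].startswith(prefix):
        end += 1
    return sorted_names[start:end]


names = sorted(["d/example", "id/alice", "bay/linuxtorrents", "bay/debian", "u/bob"])
print(filter_by_prefix(names, "bay/"))  # ['bay/debian', 'bay/linuxtorrents']
```

Because names sharing a prefix are adjacent in sorted order, the prefix search touches only the matching entries after one binary search, whereas the regexp version has to test every name.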
Jeremy Rand, Lead Namecoin Application Engineer
NameID: id/jeremy
DyName: Dynamic DNS update client for .bit domains.

Donations: BTC 1EcUWRa9H6ZuWPkF3BDj6k4k1vCgv41ab8 ; NMC NFqbaS7ReiQ9MBmsowwcDSmp4iDznjmEh5

domob
Posts: 1129
Joined: Mon Jun 24, 2013 11:27 am

Re: RealBay: A searchable torrent index

Post by domob »

biolizard89 wrote:
> What you're talking about is known as a "FN36 (full node 36 kiloblock) client" (at least that's what we're calling it now; expect a blog post formalizing this terminology soon). It's only necessary to process 36 kiloblocks of history, since name outputs (except name_new) are always spent after at most 36 kiloblocks. I don't think anyone's developed this yet, but it's probably not incredibly hard. I have a mostly working "FN36-ABR (full node 36 kiloblock all block receive) client", which has to download the full chain but only stores the most recent 36 kiloblocks. Once I've had some time to finish it up, assuming that no showstoppers are found, I'll release it along with a blog post explaining how it works. The storage used by the FN36-ABR client is about 107 MiB.

Does your FN36 client also store the UTXO set? If so, note that you need no blocks at all for name lookup (it's in the UTXO set) nor for tx validation (also the UTXO set). For Bitcoin there's a patch somewhere that does this and allows running in "fully pruned mode". The same could be done with Namecore. The downside is that those clients cannot reliably serve the network.

If your client does not keep the UTXO set, it cannot be fully validating: for tx processing you potentially need more than 36k blocks. I.e., you basically have some variant of an SPV client.

I think that a client that keeps only the UTXO set and no blocks at all is the variant to go for in this "class" of light clients. gmaxwell hinted that they want to include the pruned mode in upstream Bitcoin, so at some point in the future, Namecore will have this as well.
