Consensus on Compression

indolering
Posts: 801
Joined: Sun Aug 18, 2013 8:26 pm
os: mac

Consensus on Compression

Post by indolering »

EDIT: the resulting wiki article.

I was speaking with someone recently who brought up the idea of changing the storage format from JSON to a binary format to save space. I believe we have discussed this in the past and decided it wasn't worth doing, but I would like to confirm the consensus viewpoint on the issue before writing a wiki article on it.

I think the biggest objection was that encoding and decoding are non-trivial tasks:
  • Client applications would have to handle encoding and decoding. If your language of choice doesn't have a port of MessagePack (or whatever format), you have to write one yourself.
  • It would break generic searches of record values using the blockchain database. I do not believe either Namecoin or Libcoin uses this feature at the moment, but I certainly would not want to lose that capability.
  • It makes debugging more of a headache. The reason Unix stores config files in plain text is because prior systems required binary decoders to manipulate anything and it was a royal PITA. As the Unix saying goes, text is THE universal interface.
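To make the search objection concrete, here is a minimal sketch in Python. The record, its field names, and the binary layout are all hypothetical; the layout is a stand-in for "some binary format" and is not MessagePack or anything Namecoin actually uses.

```python
import ipaddress
import json

# Hypothetical record; the field names are illustrative only.
record = {"ip": "192.0.2.10", "ns": "ns1.registrar.example"}

# Plain-text storage: the value as serialized JSON bytes.
as_json = json.dumps(record).encode("utf-8")

# Stand-in binary encoding (NOT MessagePack): the IPv4 address packed into
# 4 raw bytes, followed by a length-prefixed nameserver string.
ns_bytes = record["ns"].encode("utf-8")
as_binary = (
    ipaddress.IPv4Address(record["ip"]).packed
    + bytes([len(ns_bytes)])
    + ns_bytes
)

# A generic byte-level search for the address as a human would type it.
needle = b"192.0.2.10"
found_in_json = needle in as_json
found_in_binary = needle in as_binary
print(found_in_json, found_in_binary)
```

The binary form is smaller, but any tool that greps the database for the textual address would now need to know the encoding first.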
The other issue is that the space savings are rather questionable, making it an exercise in premature optimization:
  • Clients with limited space should implement thin-client solutions. The savings from binary storage are a joke compared to SPV, UTXO, SCIP and friends.
  • Compression can be applied transparently at different layers of abstraction, such as within the database, at the file-system level, and over the wire.
  • Binary compression of records cannot utilize commonalities across records. For example, a domain registered via a registrar should simply reference a generic record for that registrar (which in turn simply references a nameserver). It's trivial for a DB or filesystem to compress this information, and it should yield savings within an order of magnitude of what one would achieve using binary compression.
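A rough way to see the cross-record effect, sketched in Python with zlib. The 200 records below are made up, and per-record zlib is only a stand-in for any encoding that looks at one record at a time; real savings depend on the actual data.

```python
import json
import zlib

# Hypothetical records: 200 domains that all point at the same registrar
# nameservers and differ only in one IP address.
records = [
    json.dumps({
        "ns": ["ns1.registrar.example", "ns2.registrar.example"],
        "ip": f"192.0.2.{i}",
    }).encode("utf-8")
    for i in range(200)
]

# Uncompressed size of all records.
raw_total = sum(len(r) for r in records)

# Each record compressed on its own: no sharing between records.
per_record_total = sum(len(zlib.compress(r)) for r in records)

# All records compressed as one stream, as a DB or filesystem layer could do.
whole_db = len(zlib.compress(b"\n".join(records)))

print(raw_total, per_record_total, whole_db)
```

Compressing the whole set in one stream lets the compressor reuse the repeated nameserver strings, which a record-by-record scheme cannot do.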
Last edited by indolering on Mon Apr 28, 2014 7:44 am, edited 1 time in total.
DNS is much more than a key->value datastore.

John Kenney
Posts: 94
Joined: Sat Mar 29, 2014 2:20 pm
os: linux
Location: Sheffield, England

Re: Consensus on Compression

Post by John Kenney »

JSON is simple & relatively readable. It's a bit hard to change to a totally different format too.

Compression should be used for data sent over the wire, if it isn't already. Maybe the database storage format could be optimised too. Database storage isn't a config file; the whole point of using a database rather than flat files is that it can be optimised. Your bold points are arguments against particular bad 'optimisations', not against optimising in general. I don't see any need to change the API from JSON.

virtual_master
Posts: 541
Joined: Mon May 20, 2013 12:03 pm

Re: Consensus on Compression

Post by virtual_master »

indolering wrote: I was speaking with someone recently who brought up the idea of changing the storage format from JSON to a binary format to save space. I believe we have discussed this in the past and decided it wasn't worth doing, but I would like to confirm the consensus viewpoint on the issue before writing a wiki article on it.

I think the biggest objection was that encoding and decoding are non-trivial tasks:
  • Client applications would have to handle encoding and decoding. If your language of choice doesn't have a port of MessagePack (or whatever format), you have to write one yourself.
  • It would break generic searches of record values using the blockchain database. I do not believe either Namecoin or Libcoin uses this feature at the moment, but I certainly would not want to lose that capability.
  • It makes debugging more of a headache. The reason Unix stores config files in plain text is because prior systems required binary decoders to manipulate anything and it was a royal PITA. As the Unix saying goes, text is THE universal interface.
The other issue is that the space savings are rather questionable, making it an exercise in premature optimization:
  • Clients with limited space should implement thin-client solutions. The savings from binary storage are a joke compared to SPV, UTXO, SCIP and friends.
  • Compression can be applied transparently at different layers of abstraction, such as within the database, at the file-system level, and over the wire.
  • Binary compression of records cannot utilize commonalities across records. For example, a domain registered via a registrar should simply reference a generic record for that registrar (which in turn simply references a nameserver). It's trivial for a DB or filesystem to compress this information, and it should yield savings within an order of magnitude of what one would achieve using binary compression.
They are all good points.
Another argument against general compression is that the characters are rather unpredictable and in some cases may even be binary data (e.g. when storing an avatar or a signature). A simple compression scheme wouldn't gain much, while a more sophisticated one would have to be smarter, would need more processing power, and would introduce more points of failure. Standard text compression algorithms wouldn't help much.
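A quick illustration of that point in Python, using pseudo-random bytes as a stand-in for an avatar or signature and a repeated made-up JSON record as a stand-in for predictable text. Both inputs are assumptions chosen for the sketch, not real Namecoin values.

```python
import json
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for an avatar or signature: 4096 pseudo-random bytes.
binary_blob = bytes(rng.getrandbits(8) for _ in range(4096))

# Stand-in for predictable text: a hypothetical JSON record repeated 50 times.
text_blob = json.dumps({"ns": ["ns1.registrar.example"]}).encode("utf-8") * 50


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


binary_ratio = ratio(binary_blob)
text_ratio = ratio(text_blob)
print(round(binary_ratio, 2), round(text_ratio, 2))
```

High-entropy payloads come out of zlib at roughly their original size, while the repetitive text shrinks to a small fraction.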
But I wouldn't throw away your idea.
It could have a good chance in more specific cases, where the text format is more predictable, space is more critical, and there are fewer interactions and points of failure with the rest of the network.
For example, a lite client for browsing .bit domains, or another lite client for Namecoin IDs.
But I think you also came to a similar conclusion:
indolering wrote: UTXO makes DNSChain obsolete because it removes the reliance on a trusted 3rd party to sign the certificates. UTXO and SPV take the security model offered by the blockchain and compress it into something we can fit inside of a browser, ~10megs (by my back-of-the-envelope calculations).
Maybe it could be used in my Namecoin side-chain proposal, version 1a:
http://forum.namecoin.info/viewtopic.php?f=24&t=1754
In addition to the main chain, nodes would generate one side chain with id/ entries and another with d/ entries.
http://namecoinia.org/

domob
Posts: 1129
Joined: Mon Jun 24, 2013 11:27 am

Re: Consensus on Compression

Post by domob »

Yes, sounds good to me. BTW, I've experimented a bit with using zlib to compress BDB entries, and a working patch is available for Huntercoin. There it compresses nameindexfull.dat to about 40% of its original size, but in general I think this is probably not worth it for now. I agree that if we want to optimise storage, it should happen at the networking and/or disk storage layer, not in the high-level protocol specification.
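For readers wondering what compression below the protocol level might look like, here is a minimal Python sketch. It is not the Huntercoin patch: the `CompressedStore` class is hypothetical, and a plain dict stands in for the BDB file.

```python
import json
import zlib


class CompressedStore:
    """Key-value store that compresses values on write and inflates on read."""

    def __init__(self):
        # A plain dict stands in for the BDB file (assumption for the sketch).
        self._db = {}

    def put(self, name: str, value: dict) -> None:
        # Callers hand over ordinary JSON-style data; compression is internal.
        encoded = json.dumps(value).encode("utf-8")
        self._db[name] = zlib.compress(encoded)

    def get(self, name: str) -> dict:
        # Inflate and parse, so callers never see the compressed bytes.
        return json.loads(zlib.decompress(self._db[name]).decode("utf-8"))


store = CompressedStore()
store.put("d/example", {"ip": "192.0.2.10", "ns": ["ns1.registrar.example"]})
print(store.get("d/example"))
```

Because the compression lives entirely inside `put` and `get`, the values clients read and write stay plain JSON.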
BTC: 1domobKsPZ5cWk2kXssD8p8ES1qffGUCm | NMC: NCdomobcmcmVdxC5yxMitojQ4tvAtv99pY
BM-GtQnWM3vcdorfqpKXsmfHQ4rVYPG5pKS
Use your Namecoin identity as OpenID: https://nameid.org/

indolering
Posts: 801
Joined: Sun Aug 18, 2013 8:26 pm
os: mac

Re: Consensus on Compression

Post by indolering »

domob wrote:Yes, sounds good to me. BTW, I've experimented a bit with using zlib to compress BDB entries, and a working patch is available for Huntercoin. It compresses nameindexfull.dat to about 40% there, but in general, I think this is probably not worth it for now. But I agree that if we want to optimise storage, it should be part of networking and/or disk storage, and not the high-level protocol specification.
And I thought you were already having trouble with BDB's speed!

domob
Posts: 1129
Joined: Mon Jun 24, 2013 11:27 am

Re: Consensus on Compression

Post by domob »

indolering wrote:
domob wrote:Yes, sounds good to me. BTW, I've experimented a bit with using zlib to compress BDB entries, and a working patch is available for Huntercoin. It compresses nameindexfull.dat to about 40% there, but in general, I think this is probably not worth it for now. But I agree that if we want to optimise storage, it should be part of networking and/or disk storage, and not the high-level protocol specification.
And I thought you were already having trouble with BDB's speed!
Well, both. ;) But my current conjecture is that BDB is in general more efficient (since it is just a B-tree-based name-value store and not a full relational database with SQL support), but that there's some bug in Namecoin's usage that makes it use excessive disk space. I hope to find out some time....
