[ATT: POOL OPS] PoolServerJ - scalable java pool backend

nodemaster
Posts: 172
Joined: Wed Jun 15, 2011 12:46 pm
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by nodemaster »

Nevermind I missed:

Code:

usePushPoolCompatibleFormat=true
I didn't pay attention as it was in the logging section and I thought it was only about file logging :mrgreen:
Access .bit domains with Firefox in 4 easy steps: https://masterpool.eu/proxy
MasterPool Namecoin Mining Pool

nodemaster
Posts: 172
Joined: Wed Jun 15, 2011 12:46 pm
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by nodemaster »

Could you please tell us something about the hardware requirements for PoolServerJ? I assumed they would be the same as pushpoold's if it is meant to replace it. I'm running the entire alpha.masterpool with pushpoold, namecoind, bitcoind, merged-mining-proxy, mysql, nginx and the web interface on one VPS with one CPU and 1 GiB of RAM. As soon as I start submitting shares using PoolServerJ instead of pushpoold the whole server lags badly :| resulting in miners waiting for getworks. Using pushpoold while testing merged mining I could satisfy at least 15 miners with a total of 4.5 Ghash/s on this setup without any problems... Any idea where I can tweak PoolServerJ?

I attached an image from my monitoring, switching two miners (400 Mhash/s each) between pushpoold and PoolServerJ. I used iptables to rewrite the port to one daemon or the other.
Attachments
Switching miners between pushpoold and PoolServerJ
poolserverj.png (21.55 KiB) Viewed 8919 times

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by shads »

Its memory usage will probably be higher, but it shouldn't put any more load on CPU or disk than pushpool... My testing has shown it performs quite a bit faster in most scenarios, and most of those tests were very heavily loaded. Admittedly I was quite liberal with the amount of memory available, though limiting its memory should impact its performance by at most a few tens of percent, not by orders of magnitude.

My best guess would be memory usage. psj can eat as much memory as you let it, and by default it's pretty greedy. But it can also run very lean. With the -server switch the default max heap size is something like 1/4 or 1/3 of available memory, so my guess would be that you've got virtual memory paging to disk.

Try setting the java switches -Xms16m -Xmx64m

That should be enough headroom for a pool that size. Remember that psj stores all its work records internally and doesn't offload to memcached, so it's guaranteed to use more memory than the pushpoold process if you run them side by side.

Also, are you running a 64-bit OS? If so I'd highly recommend using a 32-bit JDK. The 64-bit JDK only really offers the possible advantage of using heaps larger than 2gb. If you don't have that requirement, all it offers you is nearly double the memory usage and often slower performance. If a 32-bit JDK is not an option then you can use the -XX:+UseCompressedOops switch to reduce memory usage at the cost of a small performance hit.

If you've already tweaked memory then it might be the other way around... if Xmx is too low, the JVM will spend all its time trying to free memory.
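As a quick sanity check (my own sketch, not part of psj), you can confirm which heap limits the JVM actually picked up, since -Xmx is easy to get wrong in an init script:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Max heap the JVM will try to use (reflects -Xmx, or the default)
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        // Heap currently committed by the JVM (starts near -Xms)
        long committedMb = Runtime.getRuntime().totalMemory() / (1024 * 1024);
        System.out.println("max heap MB: " + maxMb);
        System.out.println("committed heap MB: " + committedMb);
    }
}
```

Run it with the same switches you pass to psj (e.g. java -Xms16m -Xmx64m HeapCheck) and the reported max should come out close to the -Xmx value.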

If you think I'm off the mark perhaps you could send me the properties file and I'll see if anything stands out...

One other thing to consider: because of the JIT compiler, psj actually needs some time to 'warm up' before it reaches peak performance. On a flat-out server this can take a minute or two; on a less loaded server it could take longer, though in that case I'd be surprised if the difference was noticeable. You can force it to precompile a lot sooner with the -Xbatch switch.

Here's a good explanation of some of the more common JVM tweaking params:
http://download.oracle.com/javase/1.5.0 ... /java.html (it's for Java 1.5 but it's all still relevant)

And if you want to get really hardcore about it, try some of these:
http://www.oracle.com/technetwork/java/ ... 40102.html

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by shads »

Actually I just realised you are running pushpool and psj at the same time and just switching clients between them... that would probably rule out what I said, or pushpool would be slowing down as well...

My new theory... if the pool is only occasionally utilised and you've got a large cache size set (I think the sample config file had something like 5000), then what might be happening is that on startup the server goes flat out until the cache is full. Then during the cache timeout you are only using a few of the available works in the cache, so after your maxWorkAgeToFlush almost the entire cache is dumped and the server goes flat out refilling 5000 works (most of which are wasted). Under a heavier load this would be more evenly distributed, as a new work request is started every time the cache drops below max size.

Suggest you tweak the following.
source.local.1.maxCacheSize=10000 (this will currently actually set the cache to 1/2 that value, this is hangover from not having finished implementing dynamic cache resizing)
source.local.1.maxWorkAgeToFlush=3000

Dropping the cache size should reduce the size of these bursts. Increasing maxWorkAgeToFlush substantially might be a useful diagnostic to see if this really is the problem: it might give enough space between dumping the cache and refilling it to see the busy/quiet cycle more clearly.

Also use the management interface to poke http://localhost:8997/?method=getsourcestats just after the server starts up and you can see the rate at which it's getting work in from the daemon. This should be well into the hundreds/sec if you have a JK-patched bitcoind.
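A minimal sketch of poking that endpoint from Java (my illustration, not psj code; it assumes a psj instance is listening on localhost:8997 as in the post above):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SourceStats {
    // Builds the management-interface query URL described in the post.
    static String statsUrl(String host, int port) {
        return "http://" + host + ":" + port + "/?method=getsourcestats";
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL(statsUrl("localhost", 8997));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(2000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw stats output from the server
            }
        }
    }
}
```

Or just hit the URL in a browser; the point is to watch the work-fetch rate right after startup.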

nodemaster
Posts: 172
Joined: Wed Jun 15, 2011 12:46 pm
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by nodemaster »

shads wrote: Suggest you tweak the following.
source.local.1.maxCacheSize=10000 (this will currently actually set the cache to 1/2 that value, this is hangover from not having finished implementing dynamic cache resizing)
source.local.1.maxWorkAgeToFlush=3000
Yeah, that did the trick :mrgreen: Thank you! Compatibility Mode works without problems this way.

Unfortunately I now have another question. I have a modified pushpoold and I'm already writing the share's timestamp to the database. As name/bitcoind returns UNIX timestamps as well, I'm using them too. Thus I tried the following in order to get the timestamp into the database:

Code:

usePushPoolCompatibleFormat=false
db.stmt.insertShare=INSERT INTO shares (rem_host, username, our_result, upstream_result, reason, solution, timestamp) VALUES (?, ?, ?, ?, ?, ?, UNIX_TIMESTAMP(?))
I enabled the MySQL log file and the statements are prepared (and can be executed on the MySQL console as well) but nothing is written to the database.

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by shads »

What type is the timestamp column in your database? I thought I'd tested with bigints, datetimes and timestamps (in which case you shouldn't need the UNIX_TIMESTAMP part).

INSERT INTO shares (rem_host, username, our_result, upstream_result, reason, solution, timestamp) VALUES (?, ?, ?, ?, ?, ?, ?)

It is set with the statement:
stmt.setTimestamp(7, new Timestamp(entry.createTime));

If not, can you set debug=true, logStacktraces=true; then if the database write fails it should throw an exception which is logged to stderr. That will show you both the error and the query resolved with values, so you can see exactly what it's passing... Post it here if that doesn't help you sort it out.
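For reference (my illustration, not from psj's source), java.sql.Timestamp wraps epoch milliseconds, so a millisecond createTime like the one bound above converts directly to the epoch seconds that MySQL's UNIX_TIMESTAMP() deals in; the sample value here is made up:

```java
import java.sql.Timestamp;

public class TimestampDemo {
    public static void main(String[] args) {
        // Hypothetical createTime in epoch milliseconds, as the
        // entry.createTime field quoted above would hold.
        long createTime = 1312243200000L; // 2011-08-02 00:00:00 UTC

        Timestamp ts = new Timestamp(createTime);
        // Epoch seconds, i.e. the kind of value UNIX_TIMESTAMP() yields
        long epochSeconds = ts.getTime() / 1000L;
        System.out.println(epochSeconds); // 1312243200
    }
}
```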


BTW if your test server is exposed publicly you should be aware of this issue:
https://bitcointalk.org/index.php?topic ... #msg467586

Validations are not working correctly and are allowing shares that don't meet the difficulty 1 target. I've made the fix, but due to a hideous flu I'm probably not going to get new binaries posted until tomorrow.

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by shads »

BTW just noticed you've got usePushPoolCompatibleFormat=false

that means our_result and upstream_result will be set as boolean (or 1,0). Not as 'Y' or 'N' that pushpool uses. If those columns are wrong type this could also cause a problem.
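To make the two formats concrete, here's a hypothetical helper (not psj's actual code) with the semantics described above:

```java
public class ResultFormat {
    // Maps an accepted/rejected flag to the column value depending on
    // usePushPoolCompatibleFormat, per the behaviour described in the post.
    // This helper itself is illustrative, not psj's real implementation.
    static String formatResult(boolean accepted, boolean pushPoolCompatible) {
        if (pushPoolCompatible) {
            return accepted ? "Y" : "N"; // pushpool-style CHAR column
        }
        return accepted ? "1" : "0";     // boolean-style column
    }

    public static void main(String[] args) {
        System.out.println(formatResult(true, true));  // Y
        System.out.println(formatResult(true, false)); // 1
    }
}
```

So if your our_result/upstream_result columns are CHAR(1) expecting 'Y'/'N', the boolean form will fail or silently mismatch.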

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by shads »

This release contains some fixes and updates essential for anyone running a live pool. Please read the changelog for details.

[0.2.8]
- implement 'include' in properties to allow separation of config blocks into different files for easy changeover
- add check for duplicate solution on submit
- changed 'stale-work' to 'stale' for pushpool compatibility
- change property name 'useEasiestDifficulty' to 'useRidiculouslyEasyTargetForTesingButDONTIfThisIsARealPool' to make it clear this isn't the same as pushpool's 'rpc.target.rewrite'
- crude support for share counter table updates rather than full submits.
- fix: cache size set to 1/2 maxCacheSize due to partially implemented dynamic cache sizing
- fix: shares accepted below difficulty due to endian issues setting difficulty target. (thanks luke-jr for the help)
- fix: duplicate work checks not working properly due to race conditions. Moved atomic duplicate check/update operations into synchronized block.
- fix: duplicate work not being logged
- updated json-rpc and utils libs
- update sample properties file to reflect recent changes.

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

Re: [ATT: POOL OPS] PoolServerJ - scalable java pool backend

Post by shads »

It appears the last critical update introduced a new bug which, if triggered, causes a memory leak due to the cache flushing thread crashing. If you're using 0.2.8, upgrading is urgently recommended.

[0.2.9]
- fix: FastEqualsSolution not serializable causing exception dumping workmap during safe restart
- fix: nullpointer exception crashing cache cleaner thread leading to eventual OOM error.
- added generic try/catch to all threads to catch unknown exceptions and prevent them stopping. Still need to add a 'shutdownOnCriticalError' option so these errors can be handled by shutting down the server and letting a wrapper script restart it.

shads
Posts: 26
Joined: Sun Jul 31, 2011 6:36 am
os: linux

0.3.0rc1 Released now under GPL license

Post by shads »

This is a major milestone release for PoolServerJ. Many of the fixes and improvements were the result of an extensive stress testing process by BTC Guild while they were migrating their pool over. To celebrate, I've changed the license to fully open source.

Notable changes include:

* Now licensed under GPL v3
* complete rewrite of longpolling code which has improved longpolling performance markedly
* alpha implementation of native longpolling listener with the aid of an intermediate daemon built by Caesium from rfcpool
* numerous stability fixes.

Complete changelog (including the changes from the unreleased 0.2.10):

[0.3.0rc1]

- license change to GPL v3.0
- added a temporary logger to record all stages of submission where a real solution is found.
- fix from <Eleuthria>: convert submitted data to lowercase, some miners return in uppercase resulting in failed work lookup from map.
- new mgmt interface methods:
?method=setCacheSize&source=<mySourceName>&value=<value>
?method=setMaxConcurrentDl&source=<mySourceName>&value=<value>
?method=setMaxWorkAgeToFlush&source=<mySourceName>&value=<value>
?method=setAllCacheSize&value=<value>
?method=setAllMaxConcurrentDl&value=<value>
?method=setAllMaxWorkAgeToFlush&value=<value>
?method=listWorkerCache

- force the cache to be trimmed when shrinking the cache size. If left untrimmed this can result in a lag when all the work expires. Normally, as work is requested it opens up a slot for fresh work to be fetched; however, with an oversized cache this doesn't happen. It's possible for the entire cache to end up getting purged, so the server has to catch up filling the cache from the daemon while servicing requests.
- fix: WorkerProxy was case sensitive. In cases where database case was different to user supplied case of worker name this would cause a cache miss and force a db query every time. Thanks to <Eleuthria> for finding.
- complete rewrite of longpolling code
- added async LP dispatch.
- restructured repo to include dependencies as source projects instead of binaries
- fix: clean up longpoll shutdown. Missed a threadpool executor which was preventing JVM exit.
- fix: longpoll connections are now all explicitly closed on shutdown. Some miners were not registering the closed socket and so didn't attempt to reconnect to LP when the server restarted.
- fix: prevent work being served from a source until it's confirmed on the new block
- pause work fetching for a source until it's confirmed on the new block
- add prev_block_hash checks to each incoming work as a new block indicator. This reduces the amount of polling needed when not in native longpoll mode.
- fix: set autoReconnect=true on JDBC connections. Worker caching can leave long intervals between uses of the connection, causing the SQL server to time it out.
- add fixed time worker cache eviction strategy. resolves issue #6
- implementation of longpoll connection counting and enforcement of optional limits
- trace logging with target groups for granular tracing.
- clean up of sample properties file add new config options
- fix: nullpointer if share output file not specified.
- fix: nullpointer if request output file not specified.
- addresses issue #7. When setting the worker IP, first check the X-Forwarded-For header, then fall back to remoteAddr. This covers situations where the server is behind a load balancing proxy.
- fix: use username.intern() to gain a per-user canonical sync lock object. Prevents an obscure bug where two near-simultaneous initial connections from one worker can result in multiple db lookups, because one hasn't been put into the cache before the other is looked up.
- add keep-alive header to json-rpc client requests.

improve worksource syncing on block change
- prevent entries entering cache during out of sync period
- synchronize the change of sync status process
- remove redundant sourcesOnCurrentBlockArray
- change NotifyLongpollClients thread to a Runnable task to avoid having to start up a new thread.
- acceptNotifyBlockChange now has a double check inside the sync block to prevent double accepts.
- prestart LP executor threads
- fix mismatched sync objects for block change syncing
- workSource resync moved inside sync block

native longpolling
- add registration of native-enabled sources to native lp listener
- improved debug logging
- address handling using canonical host name
- enable verification request to report success or failure
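The username.intern() fix in the changelog above relies on interned strings being canonical: equal usernames always yield the same String object, which can double as a per-user lock without maintaining a lock registry. A stand-alone sketch of the idea (my illustration, not psj's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class PerUserLock {
    static final Map<String, Integer> dbLookups = new HashMap<>();

    static void lookupWorker(String username) {
        // Equal usernames intern to the same object, so only one thread
        // per distinct username enters this block at a time, making the
        // cache-miss -> db-query -> cache-put sequence atomic per user.
        synchronized (username.intern()) {
            dbLookups.merge(username, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        String a = new String("worker1");
        String b = new String("worker1");
        System.out.println(a == b);                   // false: distinct objects
        System.out.println(a.intern() == b.intern()); // true: canonical object
    }
}
```

One caveat worth knowing: interned strings are JVM-global, so any unrelated code that synchronizes on the same string literal would contend on the same lock.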

[0.2.10] - unreleased

- fix: WorkSource request throttling was only activating for HTTP level failures. TCP failures (e.g. connection refused) would not activate throttling resulting in thousands of requests/sec and high CPU usage.
- convert blocknum from string to int
- refactor common elements to bitcoin-poolserverj-core
- fix missing semi-colons in sample sql scripts
