Discussion:
jute.maxbuffer not working
James Hardwick
2014-09-08 18:22:02 UTC
Hi All,

I’m experiencing an issue on multiple hosts with ZooKeeper 4.6 where Apache Solr has filled the /overseer/queue znode so full that it can no longer read from it, and I’m now trying to “rmr /overseer/queue” to get things working again. Both systems have 200k+ child nodes under the node at fault.

On both systems I set -Djute.maxbuffer=5242880 in zkServer.sh throughout the cluster and -Djute.maxbuffer=10000000 in zkCli.sh. On one system I couldn’t get this to work until I set zkCli’s value substantially higher than zkServer’s, but I *did* get it to work and have since cleared the queue on that system.
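For reference, the inline edits looked roughly like this (exact placement within the scripts is from memory, so treat it as a sketch rather than the precise diff):

    # zkServer.sh, on every node in the ensemble
    SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=5242880"

    # zkCli.sh, on the host we run the rmr from
    CLIENT_JVMFLAGS="$CLIENT_JVMFLAGS -Djute.maxbuffer=10000000"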

However, I’m beating my head against a wall on our other system. I’ve set exactly the same settings and am having no luck rmr’ing the node. I’ve tried bumping the maxbuffer settings 2-4x higher and still no luck. Every attempt from zkCli results in "ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue".

I’m at my wits’ end here. I’ve checked everything over and over and cannot see any reason why this should not work. The flag appears as a correctly set JVM arg when I grep the ZooKeeper process. Any advice from anyone is appreciated!
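(For what it’s worth, the check I mention is nothing fancier than something along the lines of

    ps -ef | grep zookeeper | grep jute.maxbuffer

on the ZooKeeper hosts, and the flag shows up there with the expected value.)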

--
James Hardwick
Esteban Gutierrez
2014-09-08 19:53:30 UTC
Hello James,

In which configuration file and environment variable are you setting
-Djute.maxbuffer? ZooKeeper can pick up the JVM flags used to pass
jute.maxbuffer from several different places. Ideally, zookeeper-env.sh under
the ZooKeeper configuration directory is the right place to export flags
such as JVMFLAGS, SERVER_JVMFLAGS or CLIENT_JVMFLAGS.
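For example, something along these lines in zookeeper-env.sh (the values here are just the ones you mentioned, adjust as needed):

    # conf/zookeeper-env.sh, sourced by zkEnv.sh and so seen by both
    # zkServer.sh and zkCli.sh
    export SERVER_JVMFLAGS="-Djute.maxbuffer=5242880"
    export CLIENT_JVMFLAGS="-Djute.maxbuffer=10000000"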

regards,
esteban.



--
Cloudera, Inc.
James Hardwick
2014-09-08 20:52:54 UTC
We were setting it inline within zkServer.sh and zkCli.sh.

The problem turned out to be that we weren’t setting it high enough. After further inspection, we stumbled upon this tidbit:

"java.io.IOException: Packet len48536074 is out of range!"

which is crazy! So we upped our maxbuffer to 50 MB and were then able to clear the queue. (That packet-length error on the client side is what was surfacing from zkCli as ConnectionLoss.) The fact that ZooKeeper lets you get into this situation in the first place, combined with the fact that Solr has a bug (fixed in 4.10) that can get you into it, is unfortunate.
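We actually set the flag inline in the scripts as before, but expressed as the environment flags Esteban mentioned it would look something like this (52428800 is just 50*1024*1024; the exact number only needs to exceed the ~48.5 MB packet):

    # picked up by zkServer.sh on every node in the ensemble
    export SERVER_JVMFLAGS="-Djute.maxbuffer=52428800"

    # picked up by zkCli.sh for the rmr
    export CLIENT_JVMFLAGS="-Djute.maxbuffer=52428800"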

--
James Hardwick