Discussion:
jute.maxbuffer not working
James Hardwick
2014-09-08 18:22:02 UTC
Hi All,

I’m experiencing an issue on multiple hosts with ZooKeeper 4.6 where Apache Solr has filled the /overseer/queue znode so full that it can no longer read from it, and I’m now trying to “rmr /overseer/queue” to get things working again. Both systems have 200k+ child nodes under the node at fault.

On both systems I set -Djute.maxbuffer=5242880 in zkServer.sh throughout the cluster and -Djute.maxbuffer=10000000 in zkCli.sh. On one system I couldn’t get this to work until I set zkCli’s value substantially higher than zkServer’s, but I *did* get it to work and have since cleared the queue on that system.
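For reference, the inline edits looked roughly like this (exact placement within the scripts is from memory, so treat it as a sketch rather than the precise diff):

    # zkServer.sh, on every node in the ensemble
    SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=5242880"

    # zkCli.sh, on the host we run the rmr from
    CLIENT_JVMFLAGS="$CLIENT_JVMFLAGS -Djute.maxbuffer=10000000"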

However, I’m beating my head against a wall on our other system. I’ve set exactly the same settings and am having no luck rmr’ing the node. I’ve tried bumping the maxbuffer settings 2-4x higher and still no luck. Every attempt from zkCli results in "ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue".

I’m at my wits’ end here. I’ve checked everything over and over and cannot see any reason why this should not work. The flag appears as a correctly set JVM arg when I grep the ZooKeeper process. Any advice from anyone is appreciated!
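(For what it’s worth, the check I mention is nothing fancier than something along the lines of

    ps -ef | grep zookeeper | grep jute.maxbuffer

on the ZooKeeper hosts, and the flag shows up there with the expected value.)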

--
James Hardwick
Esteban Gutierrez
2014-09-08 19:53:30 UTC
Hello James,

In which configuration file and environment variable are you setting
-Djute.maxbuffer? ZooKeeper can pick up the JVM flags used to pass
jute.maxbuffer from several different places. Ideally, zookeeper-env.sh under
the ZooKeeper configuration directory is the right place to export flags
such as JVMFLAGS, SERVER_JVMFLAGS or CLIENT_JVMFLAGS.
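For example, something along these lines in zookeeper-env.sh (the values here are just the ones you mentioned, adjust as needed):

    # conf/zookeeper-env.sh, sourced by zkEnv.sh and so seen by both
    # zkServer.sh and zkCli.sh
    export SERVER_JVMFLAGS="-Djute.maxbuffer=5242880"
    export CLIENT_JVMFLAGS="-Djute.maxbuffer=10000000"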

regards,
esteban.



--
Cloudera, Inc.
James Hardwick
2014-09-08 20:52:54 UTC
We were setting it inline within zkServer.sh and zkCli.sh.

The problem turned out to be that we weren’t setting it high enough. After further inspection, we stumbled upon this tidbit:

"java.io.IOException: Packet len48536074 is out of range!"

which is crazy! So we upped our maxbuffer to 50 MB and were then able to clear the queue. (That packet-length error on the client side is what was surfacing from zkCli as ConnectionLoss.) The fact that ZooKeeper lets you get into this situation in the first place, combined with the fact that Solr has a bug (fixed in 4.10) that can get you into it, is unfortunate.
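We actually set the flag inline in the scripts as before, but expressed as the environment flags Esteban mentioned it would look something like this (52428800 is just 50*1024*1024; the exact number only needs to exceed the ~48.5 MB packet):

    # picked up by zkServer.sh on every node in the ensemble
    export SERVER_JVMFLAGS="-Djute.maxbuffer=52428800"

    # picked up by zkCli.sh for the rmr
    export CLIENT_JVMFLAGS="-Djute.maxbuffer=52428800"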

--
James Hardwick