This version (2017/05/27 13:44) is a draft.

[09:12:46] * ChanServ sets mode: +o temporalfox

[12:21:26] * ChanServ sets mode: +o temporalfox

[17:10:25] * ChanServ sets mode: +o temporalfox

[22:52:01] * ChanServ sets mode: +o temporalfox

[23:28:14] <temporalfox> hi AlexLehm

[23:41:06] <AlexLehm> Hello temporalfox

[23:41:21] <temporalfox> I've made progress on the netty 4.1.3.Final OOM issue

[23:41:29] <temporalfox> now I know more or less what it is

[23:41:41] <temporalfox> basically it's not a leak

[23:41:59] <temporalfox> The error happening is not a raw out of memory but rather an indication that the VM spends more CPU time in the GC than in the application (98%).
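
(Editor's sketch, not part of the conversation.) The distinction described above can be checked in a test harness by looking at the error message. The snippet assumes a HotSpot JVM, where this variant of `OutOfMemoryError` carries the text "GC overhead limit exceeded"; the class and method names are hypothetical.

```java
final class OomKind {
    // Assumption: HotSpot reports the "too much time in GC" condition
    // with this message text; other JVMs may word it differently.
    private static final String GC_OVERHEAD = "GC overhead limit exceeded";

    // Returns true when the throwable is the GC-overhead variant of
    // OutOfMemoryError rather than a plain heap exhaustion.
    static boolean isGcOverheadLimit(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().contains(GC_OVERHEAD);
    }
}
```

A failing test could call `OomKind.isGcOverheadLimit(error)` to decide whether it is worth looking for a leak or for GC pressure instead.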

[23:42:13] <temporalfox> This happens when weak references are used

[23:42:38] <temporalfox> Netty has a Recycler class that recycles pooled objects

[23:42:58] <temporalfox> normally a thread recycles an object directly into its pool

[23:43:41] <temporalfox> when an object is recycled from another thread, this object is stored in a weakhashmap<Stack, WeakOrderQueue>

[23:44:11] <temporalfox> and later these objects are pulled when the Recycler needs objects (from the thread the recycler belongs to)
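
(Editor's sketch, not part of the conversation.) The recycling path described above can be modelled roughly as follows. This is a simplified illustration and not Netty's actual code: the class `DelayedRecycleSketch`, its `Stack` type and all method names are hypothetical, a plain queue stands in for `WeakOrderQueue`, and the step where the owning thread later pulls the parked objects is left out.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.WeakHashMap;

final class DelayedRecycleSketch {
    // Hypothetical stand-in for the per-thread pool of recycled objects.
    static final class Stack {
        final Thread owner = Thread.currentThread();
        final ArrayDeque<Object> pooled = new ArrayDeque<>();
    }

    // Per-thread weak map: owning stack -> objects this thread recycled on
    // behalf of that stack. An entry can only disappear once its key (the
    // stack) is no longer strongly reachable.
    static final ThreadLocal<Map<Stack, Queue<Object>>> DELAYED =
            ThreadLocal.withInitial(WeakHashMap::new);

    static void recycle(Stack stack, Object obj) {
        if (Thread.currentThread() == stack.owner) {
            // Normal case: the owning thread recycles straight into its pool.
            stack.pooled.push(obj);
        } else {
            // Foreign thread: park the object in this thread's weak map.
            DELAYED.get().computeIfAbsent(stack, s -> new ArrayDeque<>()).add(obj);
        }
    }

    // Number of foreign stacks the calling thread currently tracks.
    static int delayedStacks() {
        return DELAYED.get().size();
    }

    // Helper for the demo: creates a Stack owned by a short-lived thread.
    static Stack newStackOnOtherThread() {
        Stack[] holder = new Stack[1];
        Thread t = new Thread(() -> holder[0] = new Stack());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0];
    }
}
```

In this model every stack whose objects are recycled from another thread adds an entry to that thread's map, and each stack keeps a reference to its owning thread, so the map's size tracks how many foreign threads are still being retained.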

[23:44:37] <temporalfox> it turns out that the weakhashmap contains references to eventloop threads (VertxThread)

[23:44:54] <temporalfox> and recently a commit changed the behavior of the recycler

[23:45:06] <temporalfox> so more objects are in weakhashmap

[23:45:11] <temporalfox> and thus more threads

[23:45:30] <temporalfox> and it turns out that during tests it keeps a lot of threads from being garbage collected

[23:45:35] <temporalfox> especially on slow machines

[23:45:46] <temporalfox> where allocation is faster than GC

[23:45:55] <temporalfox> hence the specific GC issue

[23:46:41] <temporalfox> this commit is the change

[23:47:05] <temporalfox> I'm trying to figure out what is the correct thing to do

[23:47:19] <temporalfox> it may simply be a more appropriate GC setting for tests

[23:47:25] <temporalfox> or change the recycler parameters

[23:47:29] <temporalfox> I don't know yet :-)

[23:48:19] <temporalfox> in the Recycler class there is

[23:48:26] <AlexLehm> is that an issue that mostly happens in tests, since more objects are created and recycled there?

[23:48:32] <temporalfox> yes

[23:48:33] <AlexLehm> or would that happen in real applications as well?

[23:48:41] <temporalfox> no I don't think so

[23:48:43] <temporalfox> DELAYED_RECYCLED

[23:48:43] <AlexLehm> ok

[23:48:53] <temporalfox> this is the thread local weak hashmap

[23:50:12] <temporalfox> it's mainly because we create and destroy many Vertx instances

[23:50:14] <temporalfox> in tests

[23:51:13] <temporalfox> and it happens because of the ThreadDeathWatcher

[23:51:32] <temporalfox> that schedules tasks when some thread dies

[23:52:01] <temporalfox> with PoolThreadCache

[23:52:25] <temporalfox> so the ThreadDeathWatcher fast thread local contains a weakhashmap that grows

[23:52:35] <temporalfox> and weakly retains the vertx threads

[23:52:56] <temporalfox> with the new change this map reaches around 2000 elements

[23:53:02] <temporalfox> and without it, it remains quite low