[Nfd-dev] Memory related issues with NFD.
anilj.mailing at gmail.com
Sun May 8 23:39:46 PDT 2016
I tried to investigate using the stack traces obtained from the Valgrind
report. Can you please analyze these further and check if there is an issue here?
Here, each call allocates a ~9 KB block of memory. I am not sure when
this is released, but it is the top contributor to memory build-up.
Encoder::Encoder(size_t totalReserve/* = 8800*/, size_t reserveFromBack/* = 400*/)
  : m_buffer(new Buffer(totalReserve))
{
  m_begin = m_end = m_buffer->end() -
      (reserveFromBack < totalReserve ? reserveFromBack : 0);
}

Buffer* buf = new Buffer(size);
std::copy_backward(m_buffer->begin(), m_buffer->end(), buf->end());
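Since the quoted snippet is fragmentary, here is a small self-contained sketch of the allocation pattern it shows (my own simplified code, not NFD's actual Encoder; the Buffer alias, default sizes, and shared_ptr ownership are assumptions for illustration): every Encoder eagerly allocates a totalReserve-byte heap buffer, and that buffer is only freed once the last shared owner releases it.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified sketch of the allocation pattern quoted above; NOT NFD's
// actual Encoder. Each construction eagerly allocates totalReserve
// bytes (8800 by default), which would explain one ~9K heap block per
// live Encoder/Block in a heap profile.
using Buffer = std::vector<uint8_t>;

class Encoder {
public:
  explicit Encoder(size_t totalReserve = 8800, size_t reserveFromBack = 400)
    : m_buffer(std::make_shared<Buffer>(totalReserve))
  {
    // Position the write cursor so reserveFromBack bytes remain at the tail.
    m_begin = m_end =
      m_buffer->end() - (reserveFromBack < totalReserve ? reserveFromBack : 0);
  }

  // The underlying storage stays alive as long as ANY shared_ptr copy
  // (e.g. one held by a Block built from this encoder) is still around.
  std::shared_ptr<Buffer> getBuffer() const { return m_buffer; }

  size_t capacity() const { return m_buffer->size(); }

private:
  std::shared_ptr<Buffer> m_buffer;
  Buffer::iterator m_begin;
  Buffer::iterator m_end;
};
```

If many Blocks keep shared ownership of their encoder's buffer, each retained Block pins a full ~8800-byte allocation even for a tiny encoded value, which would match the "still reachable" build-up pattern rather than a classic leak.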
The other possibility is that the dead-nonce-list is not getting cleared
after the *loop detection duration*. Or perhaps the ndn::Block is not
released after the first ndn::Name::wireEncode() call. This is the
second-largest contributor to memory build-up.
DeadNonceList::makeEntry(const Name& name, uint32_t nonce)
  Block nameWire = name.wireEncode();

element_begin, element_end,
begin, element_end));
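To make the dead-nonce-list hypothesis concrete, here is a minimal self-contained sketch (my own illustration, not NFD's DeadNonceList; the class name, hash parameter, and integer timestamps are assumptions) of lifetime-based eviction: entries are appended on each makeEntry-style call, and memory is only reclaimed when the expiry sweep actually runs.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

// Minimal sketch of a dead-nonce-list with lifetime-based eviction;
// NOT NFD's actual implementation. Each entry stores an insertion
// "timestamp" and a hash of (name, nonce).
class DeadNonceSketch {
public:
  explicit DeadNonceSketch(uint64_t lifetime)
    : m_lifetime(lifetime) {}

  // Corresponds to makeEntry(): record the nonce seen for a name.
  void add(uint64_t nameNonceHash, uint64_t now) {
    m_entries.emplace_back(now, nameNonceHash);
  }

  // Drop entries older than the loop-detection lifetime. If this sweep
  // is delayed or never scheduled, the list grows without bound.
  void evictExpired(uint64_t now) {
    while (!m_entries.empty() && now - m_entries.front().first > m_lifetime) {
      m_entries.pop_front();
    }
  }

  size_t size() const { return m_entries.size(); }

private:
  uint64_t m_lifetime;
  std::deque<std::pair<uint64_t, uint64_t>> m_entries; // (time, hash)
};
```

The point of the sketch is only that retention is bounded by the eviction step: if profiling shows growth in this structure, the thing to check is whether and when the eviction actually runs.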
On Sat, May 7, 2016 at 12:24 AM, Anil Jangam <anilj.mailing at gmail.com> wrote:
> Hello All,
> We debugged this issue further and below are our findings.
> - The issue is also reproducible on standalone NFD. Vince tried about 100
> registration requests and saw a consistent increase in memory. The
> increase is present even if RibManager::sendSuccessResponse is not called.
> - The memory grows even if we bypass the RibManager completely by using
> "nfdc add-nexthop", and this problem is present in the latest NFD code,
> since Vince tested with the most up-to-date version.
> - Another possibility we considered was response messages getting cached
> in the CS, leading to increased memory consumption by NFD. To rule this
> out, we set the CS size to 1 by calling 'ndnHelper.setCsSize(1);' before
> installing the NDN L3 stack on the nodes, yet we still see memory growth.
> - I also checked that the default CS size is 100 packets. Even with this
> size, the CS should not grow beyond 100 packets, so we do not think the
> CS is causing this growth.
> StackHelper::StackHelper()
>   : m_needSetDefaultRoutes(false)
>   , m_maxCsSize(100)
>   , m_isRibManagerDisabled(false)
>   , m_isFaceManagerDisabled(false)
>   , m_isStatusServerDisabled(false)
>   , m_isStrategyChoiceManagerDisabled(false)
> - It seems to be some internal pipeline issue, because the memory
> increase was observed whether we performed 1000 add-nexthop commands or
> 1000 registration commands for the same prefix.
> As mentioned above, we believe this issue is also present in standalone
> NFD; it has not been reported yet, perhaps because of the scale involved.
> Since I am running 100+ nodes, each with its own NFD instance, on my
> laptop (8 GB RAM), the growth is very quick.
> We need your inputs to debug this issue further.
> On Wed, May 4, 2016 at 1:47 PM, Anil Jangam <anilj.mailing at gmail.com> wrote:
>> Here are some more data points from Valgrind Massif analysis. I have run
>> it for 25 and 50 nodes.
>> On Wed, May 4, 2016 at 2:26 AM, Anil Jangam <anilj.mailing at gmail.com> wrote:
>>> Hi Junxiao,
>>> The memory leak is now closed by backporting the fix you referred to.
>>> However, the growth in memory consumption is still evident. This time, I
>>> believe it is bloating of the process size. Looking at the attached
>>> Valgrind logs, can you please comment on whether this is a legitimate
>>> memory requirement of NFD, or whether it is just holding on to resources
>>> without really needing them? I see allocations emanating from RibManager
>>> and on receiving Interests as some of the major contributors.
>>> As you said earlier, these are perhaps fixed in the main branch of NFD
>>> but not yet ported into ndnSIM's NFD. Please check; the reports are below.
>>> 50 node simulation valgrind summary:
>>> ==9587== LEAK SUMMARY:
>>> ==9587== definitely lost: 0 bytes in 0 blocks
>>> ==9587== indirectly lost: 0 bytes in 0 blocks
>>> ==9587== possibly lost: 2,263,514 bytes in 67,928 blocks
>>> ==9587== still reachable: 1,474,943,776 bytes in 3,910,237 blocks
>>> ==9587== suppressed: 0 bytes in 0 blocks
>>> ==9587== For counts of detected and suppressed errors, rerun with: -v
>>> ==9587== ERROR SUMMARY: 37 errors from 37 contexts (suppressed: 0 from 0)
>>> 25 node simulation valgrind summary:
>>> ==9287== LEAK SUMMARY:
>>> ==9287== definitely lost: 0 bytes in 0 blocks
>>> ==9287== indirectly lost: 0 bytes in 0 blocks
>>> ==9287== possibly lost: 400,259 bytes in 11,100 blocks
>>> ==9287== still reachable: 437,147,930 bytes in 1,132,024 blocks
>>> ==9287== suppressed: 0 bytes in 0 blocks
>>> ==9287== For counts of detected and suppressed errors, rerun with: -v
>>> ==9287== ERROR SUMMARY: 31 errors from 31 contexts (suppressed: 0 from 0)
>>> On Tue, May 3, 2016 at 7:42 AM, Junxiao Shi <
>>> shijunxiao at email.arizona.edu> wrote:
>>>> Hi Anil
>>>> The call stack in the Valgrind report indicates that you are running
>>>> NFD within ndnSIM.
>>>> #3236 is fixed in
>>>> NFD commit 9c903e063ea8bdb324a421458eed4f51990ccd2c on Oct 04, 2015.
>>>> However, ndnSIM's NFD fork dates back to Aug 21, 2015 and does not
>>>> contain the fix.
>>>> You may try to backport that commit to ndnSIM's NFD fork, or ask the
>>>> ndnSIM developers to upgrade their fork.
>>>> Yours, Junxiao
>>>> On Mon, May 2, 2016 at 5:23 PM, Anil Jangam <anilj.mailing at gmail.com> wrote:
>>>>> Hi Junxiao,
>>>>> I am observing a memory leak with NFD, and to verify it I did a
>>>>> couple of Valgrind-enabled simulation runs with 25 and 50 nodes. Based
>>>>> on the Valgrind report and the output of the 'top' command, I see that
>>>>> RAM consumption grows consistently and rapidly. My scaling test is
>>>>> affected in that I am not able to run the simulation for a longer time
>>>>> and/or with a higher number of nodes. Also, I see a very high number
>>>>> of timeouts.
>>>>> I see an NFD leak issue in the closed state, which confirms this leak;
>>>>> it was, however, closed owing to its small size. Perhaps the leak is
>>>>> showing up here at a higher scale.
>>>>> Please check the attached Valgrind report. Let me know what other data
>>>>> you may need to debug this further. Also, can you please suggest a
>>>>> solution or workaround?