[Nfd-dev] Memory related issues with NFD.

Anil Jangam anilj.mailing at gmail.com
Sun May 8 23:39:46 PDT 2016


I tried to investigate using the stack traces obtained from the Valgrind
report. Can you please analyze these further and check whether there is an
issue here?

=================================
./NFD/rib/rib-manager.cpp

188   m_keyChain.sign(*responseData);
189   m_face.put(*responseData);


./NFD/daemon/mgmt/manager-base.cpp

98   m_keyChain.sign(*responseData);
99   m_face->put(*responseData);


Each call here allocates an ~9K block of memory. I am not sure when this is
released, but it is the largest contributor to the memory build-up.

./ndn-cxx/src/encoding/encoder.cpp

27 Encoder::Encoder(size_t totalReserve/* = 8800*/, size_t reserveFromBack/* = 400*/)
28   : m_buffer(new Buffer(totalReserve))
29 {
30   m_begin = m_end = m_buffer->end() - (reserveFromBack < totalReserve ? reserveFromBack : 0);
31 }


83     Buffer* buf = new Buffer(size);
84     std::copy_backward(m_buffer->begin(), m_buffer->end(), buf->end());
85
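
If the Encoder's Buffer is shared into the resulting Block (as the
block.cpp snippet further below suggests), each signed response would keep
the whole 8800-byte allocation alive for as long as any Block referencing
it exists. Here is a minimal, self-contained mock of that ownership
pattern (not the ndn-cxx code itself; all names are illustrative):

#include <cstdint>
#include <cstdio>
#include <memory>
#include <vector>

using Buffer = std::vector<uint8_t>;

struct MockBlock {
  std::shared_ptr<Buffer> buffer;  // shared ownership of the *whole* buffer
  size_t begin;                    // this block's sub-range within it
  size_t end;
};

MockBlock
encodeResponse()
{
  auto buf = std::make_shared<Buffer>(8800);  // mirrors Encoder's totalReserve
  // ... TLV bytes would be written near the back of the buffer here ...
  return MockBlock{buf, 8400, 8800};  // keeps all 8800 bytes alive
}

int
main()
{
  std::vector<MockBlock> retained;
  for (int i = 0; i < 1000; ++i)
    retained.push_back(encodeResponse());
  // ~8.8 MB is held here even though each block spans only 400 bytes.
  std::printf("blocks held: %zu\n", retained.size());
  return 0;
}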


=================================
The other possibility is that the dead-nonce-list is not getting cleared
after the *loop detection duration*. Or perhaps the ndn::Block is not
released after ndn::Name::wireEncode() is called for the first time. This
is the second-largest contributor to the memory build-up.
Ref:
https://github.com/named-data/NFD/blob/master/daemon/table/dead-nonce-list.hpp#L39
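
If the first hypothesis were the cause, it would mean entries are inserted
faster than they are evicted. A rough, self-contained sketch of the
expected clean-up behavior (a mock, not NFD's actual index structure; the
6-second lifetime is my assumption based on the header linked above):

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <utility>

using Clock = std::chrono::steady_clock;

struct MockDeadNonceList {
  std::chrono::seconds lifetime{6};  // assumed default loop detection duration
  std::deque<std::pair<uint64_t, Clock::time_point>> entries;

  void
  add(uint64_t entryHash)
  {
    entries.emplace_back(entryHash, Clock::now());
  }

  // If eviction never runs, or runs slower than insertion, the list
  // grows without bound, which is the suspected build-up pattern.
  void
  evictExpired()
  {
    auto cutoff = Clock::now() - lifetime;
    while (!entries.empty() && entries.front().second < cutoff)
      entries.pop_front();
  }
};

int
main()
{
  MockDeadNonceList dnl;
  for (uint64_t i = 0; i < 1000; ++i)
    dnl.add(i);
  dnl.evictExpired();  // nothing has expired yet, so all 1000 remain
  std::printf("entries: %zu\n", dnl.entries.size());
  return 0;
}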

./NFD/daemon/table/dead-nonce-list.cpp

105 DeadNonceList::Entry
106 DeadNonceList::makeEntry(const Name& name, uint32_t nonce)
107 {
108   Block nameWire = name.wireEncode();


./ndn-cxx/src/encoding/block.cpp

344       m_subBlocks.push_back(Block(m_buffer,
345                                   type,
346                                   element_begin, element_end,
347                                   begin, element_end));
348
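
On the second hypothesis: as far as I can tell, ndn::Name caches its wire
encoding after the first wireEncode(), so that Block is retained for the
Name's lifetime by design rather than leaked. A rough illustration of that
caching pattern (a mock, not the ndn-cxx implementation):

#include <cstdint>
#include <cstdio>
#include <memory>
#include <vector>

struct MockName {
  std::vector<uint8_t> components;  // stand-in for the name components
  mutable std::shared_ptr<std::vector<uint8_t>> wire;  // cached encoding

  const std::vector<uint8_t>&
  wireEncode() const
  {
    if (wire == nullptr) {
      // First call: allocate and keep the encoding for this object's lifetime.
      wire = std::make_shared<std::vector<uint8_t>>(components);
    }
    return *wire;  // later calls reuse the cached buffer
  }
};

int
main()
{
  MockName name{std::vector<uint8_t>(64, 0), nullptr};
  name.wireEncode();
  std::printf("cached wire bytes: %zu\n", name.wire->size());
  return 0;
}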



On Sat, May 7, 2016 at 12:24 AM, Anil Jangam <anilj.mailing at gmail.com>
wrote:

> Hello All,
>
> We debugged this issue further and below are our findings.
>
> - The issue is also reproducible on standalone NFD. Vince tried about 100
> registration requests and saw a consistent increase in memory. The
> increase is present even if RibManager::sendSuccessResponse is not called.
>
> - The memory grows even if we bypass the RibManager completely by using
> "nfdc add-nexthop", and the problem is present in the latest NFD code,
> since Vince tested with the most up-to-date version.
>
> - Another possibility we considered was response messages getting cached
> in the CS, leading to increased memory consumption by NFD. To rule this
> out, we set the CS size to 1 by calling 'ndnHelper.setCsSize(1);' before
> installing the NDN L3 stack on the nodes (see the sketch after this
> list), but we still see memory growth.
>
> - I also checked that the default CS size is 100 packets, so even at the
> default the CS should not grow beyond 100 packets. We therefore do not
> think the CS is causing this growth.
>
>  42 StackHelper::StackHelper()
>  43   : m_needSetDefaultRoutes(false)
>  44   , m_maxCsSize(100)
>  45   , m_isRibManagerDisabled(false)
>  46   , m_isFaceManagerDisabled(false)
>  47   , m_isStatusServerDisabled(false)
>  48   , m_isStrategyChoiceManagerDisabled(false)
>
>
> - It seems to be some internal pipeline issue: whether we performed 1000
> add-nexthop commands or 1000 registration commands for the same prefix,
> the same memory increase was observed.
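>
> For reference, the CS-limiting setup looks like this (a rough sketch
> assuming the ndnSIM 2.x StackHelper API; node creation and topology
> setup are elided):
>
> #include "ns3/network-module.h"
> #include "ns3/ndnSIM-module.h"
>
> void
> InstallNdnStack(ns3::NodeContainer& nodes)
> {
>   ns3::ndn::StackHelper ndnHelper;
>   ndnHelper.setCsSize(1);    // cap the CS at 1 packet to rule out caching
>   ndnHelper.Install(nodes);  // install the NDN L3 stack on the given nodes
> }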
>
> As mentioned above, we believe this issue is also present in standalone
> NFD; it has perhaps not been reported yet because of the scale involved.
> Since I am running 100+ nodes, each with its own NFD instance, on my
> laptop (8 GB RAM), the growth is very quick.
>
> We need your inputs to debug this issue further.
>
> Thanks,
> /anil
>
> On Wed, May 4, 2016 at 1:47 PM, Anil Jangam <anilj.mailing at gmail.com>
> wrote:
>
>> Here are some more data points from Valgrind Massif analysis. I have run
>> it for 25 and 50 nodes.
>>
>> /anil.
>>
>>
>> On Wed, May 4, 2016 at 2:26 AM, Anil Jangam <anilj.mailing at gmail.com>
>> wrote:
>>
>>> Hi Junxiao,
>>>
>>> The memory leak is now closed by backporting the fix you referred to.
>>> However, the growth in memory consumption is still evident. This time, I
>>> believe it is bloating of the process size. Looking at the attached
>>> Valgrind logs, can you please comment on whether this is a legitimate
>>> requirement of NFD, or whether it is just holding on to resources without
>>> really needing them? I see the allocations emanating from RibManager and
>>> those made on receiving an Interest as some of the major contributors.
>>>
>>> As you said earlier, these are perhaps fixed in the main branch of NFD
>>> but not yet ported into ndnSIM's NFD. Please check; the reports are
>>> attached.
>>>
>>> 50 node simulation valgrind summary:
>>> -------------------------------------------------------
>>> ==9587== LEAK SUMMARY:
>>> ==9587==    definitely lost: 0 bytes in 0 blocks
>>> ==9587==    indirectly lost: 0 bytes in 0 blocks
>>> ==9587==      possibly lost: 2,263,514 bytes in 67,928 blocks
>>> ==9587==    still reachable: 1,474,943,776 bytes in 3,910,237 blocks
>>> ==9587==         suppressed: 0 bytes in 0 blocks
>>> ==9587==
>>> ==9587== For counts of detected and suppressed errors, rerun with: -v
>>> ==9587== ERROR SUMMARY: 37 errors from 37 contexts (suppressed: 0 from 0)
>>>
>>> 25 node simulation valgrind summary:
>>> -------------------------------------------------------
>>> ==9287== LEAK SUMMARY:
>>> ==9287==    definitely lost: 0 bytes in 0 blocks
>>> ==9287==    indirectly lost: 0 bytes in 0 blocks
>>> ==9287==      possibly lost: 400,259 bytes in 11,100 blocks
>>> ==9287==    still reachable: 437,147,930 bytes in 1,132,024 blocks
>>> ==9287==         suppressed: 0 bytes in 0 blocks
>>> ==9287==
>>> ==9287== For counts of detected and suppressed errors, rerun with: -v
>>> ==9287== ERROR SUMMARY: 31 errors from 31 contexts (suppressed: 0 from 0)
>>>
>>> /anil.
>>>
>>>
>>>
>>>
>>> On Tue, May 3, 2016 at 7:42 AM, Junxiao Shi <
>>> shijunxiao at email.arizona.edu> wrote:
>>>
>>>> Hi Anil
>>>>
>>>> The call stack in the Valgrind report indicates that you are running
>>>> NFD within ndnSIM.
>>>> Issue #3236 is fixed in NFD commit
>>>> 9c903e063ea8bdb324a421458eed4f51990ccd2c on Oct 04, 2015. However,
>>>> ndnSIM's NFD fork dates back to Aug 21, 2015 and does not contain the
>>>> fix.
>>>> You may try to backport that commit to ndnSIM's NFD fork, or ask ndnSIM
>>>> developers to upgrade their fork.
>>>>
>>>> Yours, Junxiao
>>>>
>>>>
>>>> On Mon, May 2, 2016 at 5:23 PM, Anil Jangam <anilj.mailing at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Junxiao,
>>>>>
>>>>> I am observing a memory leak with NFD, and to verify it I did a couple
>>>>> of Valgrind-enabled simulation runs with 25 and 50 nodes. Based on the
>>>>> Valgrind report and the output of the 'top' command, I see that RAM
>>>>> consumption grows consistently and rapidly. My scaling test is affected:
>>>>> I am not able to run the simulation for a longer time and/or with a
>>>>> higher number of nodes. I also see a very high number of timeouts.
>>>>>
>>>>> I found an NFD leak issue in the closed state, which confirms this
>>>>> leak; however, it was closed owing to its small size. Perhaps it shows
>>>>> up at high scale?
>>>>> http://redmine.named-data.net/issues/3236/
>>>>>
>>>>> Please check the attached Valgrind report. Let me know what other data
>>>>> you may need to debug this further. Also, please suggest a solution or
>>>>> workaround.
>>>>>
>>>>> /anil.
>>>>>
>>>>>
>>>
>>
>