[Nfd-dev] Memory related issues with NFD.

Anil Jangam anilj.mailing at gmail.com
Mon May 9 11:47:49 PDT 2016


Done: http://redmine.named-data.net/issues/3618

On Mon, May 9, 2016 at 6:58 AM, Beichuan Zhang <bzhang at cs.arizona.edu>
wrote:

> Hi Anil,
>
> Can you create a redmine issue (http://redmine.named-data.net) to
> document all the information and discussion in one place? Most of us are
> working on a paper deadline this week, so responses may be slow.
>
> Thanks,
>
> Beichuan
>
> On May 9, 2016, at 1:39 AM, Anil Jangam <anilj.mailing at gmail.com> wrote:
>
> I tried to investigate using the stack traces obtained from the Valgrind
> report. Can you please analyze these further and check whether there is an
> issue here?
>
> =================================
> ./NFD/rib/rib-manager.cpp
>
> 188   m_keyChain.sign(*responseData);
> 189   m_face.put(*responseData);
>
>
> ./NFD/daemon/mgmt/manager-base.cpp
>
> 98   m_keyChain.sign(*responseData);
> 99   m_face->put(*responseData);
>
>
> Here, each time a ~9K block of memory is allocated. I am not sure when
> this is released, but this is the topmost contributor to memory build-up.
>
> ./ndn-cxx/src/encoding/encoder.cpp
>
> 27 Encoder::Encoder(size_t totalReserve/* = 8800*/, size_t reserveFromBack/* = 400*/)
> 28   : m_buffer(new Buffer(totalReserve))
> 29 {
> 30   m_begin = m_end = m_buffer->end() - (reserveFromBack < totalReserve ? reserveFromBack : 0);
> 31 }
>
>
> 83     Buffer* buf = new Buffer(size);
> 84     std::copy_backward(m_buffer->begin(), m_buffer->end(), buf->end());
> 85
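>
> For reference, here is a minimal sketch (assuming the ndn-cxx API used in
> the snippets above) of the path I believe triggers this allocation: every
> sign + put of a response Data calls wireEncode(), which constructs an
> Encoder with the 8800-byte default reserve, and that Buffer stays alive for
> as long as the encoded Block is referenced anywhere.
>
> #include <memory>
> #include <ndn-cxx/data.hpp>
> #include <ndn-cxx/face.hpp>
> #include <ndn-cxx/security/key-chain.hpp>
>
> void
> sendResponse(ndn::Face& face, ndn::KeyChain& keyChain, const ndn::Name& name)
> {
>   auto responseData = std::make_shared<ndn::Data>(name);
>   keyChain.sign(*responseData); // wireEncode() -> Encoder(8800) -> new Buffer(8800)
>   face.put(*responseData);      // the encoded Block keeps the Buffer alive via shared_ptr
>   // The ~9K Buffer is released only when the Data and every Block copied
>   // from it are destroyed; if any table or queue retains the Block, the
>   // memory remains allocated.
> }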
>
>
> =================================
> The other possibility is that the dead-nonce-list is not getting cleared
> after the *loop detection duration*, or perhaps the ndn::Block is not
> released after the first ndn::Name::wireEncode() call. This is the second
> biggest contributor to memory build-up.
> Ref:
> https://github.com/named-data/NFD/blob/master/daemon/table/dead-nonce-list.hpp#L39
>
> ./NFD/daemon/table/dead-nonce-list.cpp
>
> 105 DeadNonceList::Entry
> 106 DeadNonceList::makeEntry(const Name& name, uint32_t nonce)
> 107 {
> 108   Block nameWire = name.wireEncode();
>
>
> ./ndn-cxx/src/encoding/block.cpp
>
> 344       m_subBlocks.push_back(Block(m_buffer,
> 345                                   type,
> 346                                   element_begin, element_end,
> 347                                   begin, element_end));
> 348
>
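> As a small illustration (my assumption of how ndn-cxx caches the encoding),
> this is why the Block from the first wireEncode() may appear to be retained:
>
> #include <ndn-cxx/name.hpp>
>
> void
> inspectNameEncoding()
> {
>   ndn::Name name("/example/prefix");
>   const ndn::Block& wire1 = name.wireEncode(); // encodes and caches the Block inside the Name
>   const ndn::Block& wire2 = name.wireEncode(); // returns the cached Block; no new allocation
>   // The cached Block (and its underlying Buffer) is released only when the
>   // Name itself is destroyed or modified, not when makeEntry() is done with it.
>   (void)wire1;
>   (void)wire2;
> }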
>
>
> On Sat, May 7, 2016 at 12:24 AM, Anil Jangam <anilj.mailing at gmail.com>
> wrote:
>
>> Hello All,
>>
>> We debugged this issue further and below are our findings.
>>
>> - The issue is also reproducible on standalone NFD. Vince tried about 100
>> registration requests and there is a consistent increase in memory. This
>> increase is present even if RibManager::sendSuccessResponse is not called.
>>
>> - The memory grows even if we bypass the RibManager completely by using
>> "nfdc add-nexthop". This problem is also present in the latest NFD code,
>> since Vince tested with the most up-to-date version.
>>
>> - Another possibility we considered was response messages getting cached in
>> the CS, leading to an increase in NFD's memory consumption. To rule this
>> out, we set the CS size to 1 by calling 'ndnHelper.setCsSize(1);' before
>> installing the NDN L3 stack on the nodes (see the sketch after this list),
>> but we still see memory growth.
>>
>> - I also checked that the default CS size is 100 packets, so even with that
>> size the CS should not grow beyond 100 packets. We therefore do not think
>> the CS is causing this growth.
>>
>>  42 StackHelper::StackHelper()
>>  43   : m_needSetDefaultRoutes(false)
>>  44   , m_maxCsSize(100)
>>  45   , m_isRibManagerDisabled(false)
>>  46   , m_isFaceManagerDisabled(false)
>>  47   , m_isStatusServerDisabled(false)
>>  48   , m_isStrategyChoiceManagerDisabled(false)
>>
>>
>> - It seems to be some internal pipeline issue, because the memory increase
>> was observed whether we performed 1000 add-nexthop commands or 1000
>> registration commands for the same prefix.
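>>
>> As a reference, here is a minimal sketch (assuming the ndnSIM 2.x
>> StackHelper API we are using) of how the CS is limited before installing
>> the NDN stack in the scenario:
>>
>> #include "ns3/core-module.h"
>> #include "ns3/ndnSIM-module.h"
>>
>> int
>> main(int argc, char* argv[])
>> {
>>   ns3::CommandLine cmd;
>>   cmd.Parse(argc, argv);
>>
>>   ns3::ndn::StackHelper ndnHelper;
>>   ndnHelper.setCsSize(1);  // effectively disable the CS (default is 100 packets)
>>   ndnHelper.InstallAll();  // install the NDN L3 stack on every node
>>
>>   ns3::Simulator::Run();
>>   ns3::Simulator::Destroy();
>>   return 0;
>> }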
>>
>> As mentioned above, we believe this issue is also present in standalone
>> NFD; it has perhaps not been reported yet because of the scale involved.
>> Since I am running 100+ nodes, each with its own NFD instance, on my laptop
>> (8 GB RAM), the growth is very quick.
>>
>> We need your inputs to debug this issue further.
>>
>> Thanks,
>> /anil
>>
>> On Wed, May 4, 2016 at 1:47 PM, Anil Jangam <anilj.mailing at gmail.com>
>> wrote:
>>
>>> Here are some more data points from the Valgrind Massif analysis. I have
>>> run it for 25 and 50 nodes.
>>>
>>> /anil.
>>>
>>>
>>> On Wed, May 4, 2016 at 2:26 AM, Anil Jangam <anilj.mailing at gmail.com>
>>> wrote:
>>>
>>>> Hi Junxiao,
>>>>
>>>> The memory leak is now closed by backporting the fix you referred to.
>>>> However, the growth in memory consumption is still evident. This time, I
>>>> believe it is bloating of the process size. Looking at the attached
>>>> Valgrind logs, can you please comment on whether this is a legitimate
>>>> memory requirement of NFD, or whether it is just holding on to resources
>>>> without really needing them? I see allocations emanating from RibManager
>>>> and from Interest reception as some of the major contributors.
>>>>
>>>> As you said earlier, these are perhaps fixed in the main branch of NFD but
>>>> not yet ported into ndnSIM's NFD fork. Please check; the reports are
>>>> attached.
>>>>
>>>> 50-node simulation Valgrind summary:
>>>> -------------------------------------------------------
>>>> ==9587== LEAK SUMMARY:
>>>> ==9587==    definitely lost: 0 bytes in 0 blocks
>>>> ==9587==    indirectly lost: 0 bytes in 0 blocks
>>>> ==9587==      possibly lost: 2,263,514 bytes in 67,928 blocks
>>>> ==9587==    still reachable: 1,474,943,776 bytes in 3,910,237 blocks
>>>> ==9587==         suppressed: 0 bytes in 0 blocks
>>>> ==9587==
>>>> ==9587== For counts of detected and suppressed errors, rerun with: -v
>>>> ==9587== ERROR SUMMARY: 37 errors from 37 contexts (suppressed: 0 from 0)
>>>>
>>>> 25-node simulation Valgrind summary:
>>>> -------------------------------------------------------
>>>> ==9287== LEAK SUMMARY:
>>>> ==9287==    definitely lost: 0 bytes in 0 blocks
>>>> ==9287==    indirectly lost: 0 bytes in 0 blocks
>>>> ==9287==      possibly lost: 400,259 bytes in 11,100 blocks
>>>> ==9287==    still reachable: 437,147,930 bytes in 1,132,024 blocks
>>>> ==9287==         suppressed: 0 bytes in 0 blocks
>>>> ==9287==
>>>> ==9287== For counts of detected and suppressed errors, rerun with: -v
>>>> ==9287== ERROR SUMMARY: 31 errors from 31 contexts (suppressed: 0 from 0)
>>>>
>>>> /anil.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 3, 2016 at 7:42 AM, Junxiao Shi <
>>>> shijunxiao at email.arizona.edu> wrote:
>>>>
>>>>> Hi Anil
>>>>>
>>>>> The call stack in the Valgrind report indicates that you are running
>>>>> NFD within ndnSIM.
>>>>> Issue #3236 was fixed in NFD commit 9c903e063ea8bdb324a421458eed4f51990ccd2c
>>>>> on Oct 04, 2015. However, ndnSIM's NFD fork dates back to Aug 21, 2015 and
>>>>> does not contain the fix.
>>>>> You may try to backport that commit to ndnSIM's NFD fork, or ask
>>>>> ndnSIM developers to upgrade their fork.
>>>>>
>>>>> Yours, Junxiao
>>>>>
>>>>>
>>>>> On Mon, May 2, 2016 at 5:23 PM, Anil Jangam <anilj.mailing at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Junxiao,
>>>>>>
>>>>>> I am observing a memory leak with NFD, and to verify it I did a couple of
>>>>>> Valgrind-enabled simulation runs with 25 and 50 nodes. Based on the Valgrind
>>>>>> report and the output of the 'top' command, I see that RAM consumption grows
>>>>>> consistently and rapidly. My scaling test is affected: I am not able to run
>>>>>> the simulation for a longer time and/or with a higher number of nodes. I
>>>>>> also see a very high number of timeouts.
>>>>>>
>>>>>> I see an NFD leak issue in closed state, which matches this leak; however,
>>>>>> it was closed owing to its small size. Perhaps it is showing up at high
>>>>>> scale?
>>>>>> http://redmine.named-data.net/issues/3236/
>>>>>>
>>>>>> Please check the attached Valgrind report. Let me know what other data you
>>>>>> may need to debug this further. Also, could you please suggest a solution
>>>>>> or workaround?
>>>>>>
>>>>>> /anil.
>>>>>>
>>>>>>
>>>>
>>>
>>
> _______________________________________________
> Nfd-dev mailing list
> Nfd-dev at lists.cs.ucla.edu
> http://www.lists.cs.ucla.edu/mailman/listinfo/nfd-dev
>
>
>

