[Nfd-dev] Memory related issues with NFD.
Anil Jangam
anilj.mailing at gmail.com
Sun May 8 23:39:46 PDT 2016
I tried to investigate using the stack traces obtained from the Valgrind report. Can
you please analyze these further and check whether there is an issue here?
=================================
./NFD/rib/rib-manager.cpp
188 m_keyChain.sign(*responseData);
189 m_face.put(*responseData);
./NFD/daemon/mgmt/manager-base.cpp
98 m_keyChain.sign(*responseData);
99 m_face->put(*responseData);
Each time, this allocates a ~9 KB block of memory. I am not sure when it is
released, but it is the top contributor to the memory build-up.
./ndn-cxx/src/encoding/encoder.cpp
27 Encoder::Encoder(size_t totalReserve/* = 8800*/, size_t reserveFromBack/* = 400*/)
28 : m_buffer(new Buffer(totalReserve))
29 {
30   m_begin = m_end = m_buffer->end() - (reserveFromBack < totalReserve ? reserveFromBack : 0);
31 }
83 Buffer* buf = new Buffer(size);
84 std::copy_backward(m_buffer->begin(), m_buffer->end(), buf->end());
85
=================================
The other possibility is that the dead-nonce-list is not getting cleared after
the *loop detection duration*. Or perhaps the ndn::Block is not released
after ndn::Name::wireEncode() is done for the first time. This is the
second biggest contributor to the memory build-up.
Ref:
https://github.com/named-data/NFD/blob/master/daemon/table/dead-nonce-list.hpp#L39
./NFD/daemon/table/dead-nonce-list.cpp
105 DeadNonceList::Entry
106 DeadNonceList::makeEntry(const Name& name, uint32_t nonce)
107 {
108 Block nameWire = name.wireEncode();
./ndn-cxx/src/encoding/block.cpp
344 m_subBlocks.push_back(Block(m_buffer,
345 type,
346 element_begin, element_end,
347 begin, element_end));
348
On Sat, May 7, 2016 at 12:24 AM, Anil Jangam <anilj.mailing at gmail.com>
wrote:
> Hello All,
>
> We debugged this issue further; our findings are below.
>
> - The issue is also reproducible on standalone NFD. Vince tried about 100
> registration requests, and there is a consistent increase in memory. This
> increase is present even if RibManager::sendSuccessResponse is not called.
>
> - The memory grows even if we bypass the RibManager completely by using
> "nfdc add-nexthop". This problem is also present in the latest NFD code,
> since Vince tested with the most up-to-date version.
>
> - Another possibility we considered was response messages getting cached
> in the CS, leading to increased memory consumption by NFD. To rule this
> out, we set the CS size to 1 by calling 'ndnHelper.setCsSize(1);' before
> installing the NDN L3 stack on the nodes, yet we still see memory growth.
>
> - I also checked that the default CS size is 100 packets, so even with the
> default, the CS should not grow beyond 100 packets. We therefore do not
> think the CS is causing this growth.
>
> 42 StackHelper::StackHelper()
> 43 : m_needSetDefaultRoutes(false)
> 44 , m_maxCsSize(100)
> 45 , m_isRibManagerDisabled(false)
> 46 , m_isFaceManagerDisabled(false)
> 47 , m_isStatusServerDisabled(false)
> 48 , m_isStrategyChoiceManagerDisabled(false)
>
>
> - It seems to be some internal pipeline issue: the memory increase was
> observed whether we performed 1000 add-nexthop commands or 1000
> registration commands for the same prefix.
>
> As mentioned above, we believe this issue is also present in standalone
> NFD; it has not been reported yet, perhaps because of the scale involved.
> Since I am running 100+ nodes, each with its own NFD instance, on my
> laptop (8 GB RAM), the growth is very quick.
>
> We need your inputs to debug this issue further.
>
> Thanks,
> /anil
>
> On Wed, May 4, 2016 at 1:47 PM, Anil Jangam <anilj.mailing at gmail.com>
> wrote:
>
>> Here are some more data points from the Valgrind Massif analysis. I ran
>> it for 25 and 50 nodes.
>>
>> /anil.
>>
>>
>> On Wed, May 4, 2016 at 2:26 AM, Anil Jangam <anilj.mailing at gmail.com>
>> wrote:
>>
>>> Hi Junxiao,
>>>
>>> The memory leak is now closed by backporting the fix you referred to.
>>> However, the growth in memory consumption is still evident; this time, I
>>> believe it is bloating of the process size. Looking at the attached
>>> Valgrind logs, can you please comment on whether this is a legitimate
>>> requirement of NFD, or whether it is just holding on to resources without
>>> really needing them? I see allocations emanating from RibManager and from
>>> Interest reception as some of the major contributors.
>>>
>>> As you said earlier, these are perhaps fixed in NFD's main branch but
>>> not yet ported to ndnSIM's NFD. Please check; the reports are attached.
>>>
>>> 50 node simulation valgrind summary:
>>> -------------------------------------------------------
>>> ==9587== LEAK SUMMARY:
>>> ==9587== definitely lost: 0 bytes in 0 blocks
>>> ==9587== indirectly lost: 0 bytes in 0 blocks
>>> ==9587== possibly lost: 2,263,514 bytes in 67,928 blocks
>>> ==9587== still reachable: 1,474,943,776 bytes in 3,910,237 blocks
>>> ==9587== suppressed: 0 bytes in 0 blocks
>>> ==9587==
>>> ==9587== For counts of detected and suppressed errors, rerun with: -v
>>> ==9587== ERROR SUMMARY: 37 errors from 37 contexts (suppressed: 0 from 0)
>>>
>>> 25 node simulation valgrind summary:
>>> -------------------------------------------------------
>>> ==9287== LEAK SUMMARY:
>>> ==9287== definitely lost: 0 bytes in 0 blocks
>>> ==9287== indirectly lost: 0 bytes in 0 blocks
>>> ==9287== possibly lost: 400,259 bytes in 11,100 blocks
>>> ==9287== still reachable: 437,147,930 bytes in 1,132,024 blocks
>>> ==9287== suppressed: 0 bytes in 0 blocks
>>> ==9287==
>>> ==9287== For counts of detected and suppressed errors, rerun with: -v
>>> ==9287== ERROR SUMMARY: 31 errors from 31 contexts (suppressed: 0 from 0)
>>>
>>> /anil.
>>>
>>>
>>>
>>>
>>> On Tue, May 3, 2016 at 7:42 AM, Junxiao Shi <
>>> shijunxiao at email.arizona.edu> wrote:
>>>
>>>> Hi Anil
>>>>
>>>> The call stack in the Valgrind report indicates that you are running
>>>> NFD within ndnSIM.
>>>> #3236 was fixed in NFD commit 9c903e063ea8bdb324a421458eed4f51990ccd2c
>>>> on Oct 04, 2015. However, ndnSIM's NFD fork dates back to Aug 21, 2015
>>>> and doesn't contain the fix.
>>>> You may try to backport that commit to ndnSIM's NFD fork, or ask ndnSIM
>>>> developers to upgrade their fork.
>>>>
>>>> Yours, Junxiao
>>>>
>>>>
>>>> On Mon, May 2, 2016 at 5:23 PM, Anil Jangam <anilj.mailing at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Junxiao,
>>>>>
>>>>> I am observing a memory leak with NFD. To verify it, I did a couple of
>>>>> Valgrind-enabled simulation runs with 25 and 50 nodes. Based on the
>>>>> Valgrind report and the output of the 'top' command, I see that RAM
>>>>> consumption grows consistently and rapidly. My scaling tests are
>>>>> affected: I cannot run the simulation for a longer time and/or with a
>>>>> higher number of nodes. I also see a very high number of timeouts.
>>>>>
>>>>> I found an NFD leak issue in the closed state, which confirms this
>>>>> leak; it was apparently closed owing to its small size. Perhaps it is
>>>>> showing up at high scale?
>>>>> http://redmine.named-data.net/issues/3236/
>>>>>
>>>>> Please check the attached Valgrind report, and let me know what other
>>>>> data you need to debug this further. Also, could you please suggest a
>>>>> solution or workaround?
>>>>>
>>>>> /anil.
>>>>>
>>>>>
>>>
>>
>