[Nfd-dev] [Operators] "No buffer space available"

Dehart, John jdd at wustl.edu
Thu Sep 7 07:37:48 PDT 2017


OK. I have a clue.

I am also collecting face information every minute (nfdc face list) and storing
it in a log file so that I can see what faces existed around the time of the crash.
My thinking was that when we see the giant Interest in nfd.log, I would be able
to tell where it was coming from and track down what was sending it.
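
Roughly what the collection loop looks like (a sketch, not the exact script I run;
the log path and interval are illustrative, and this version also redirects stderr,
which mine did not):

  # Take a face snapshot every minute, capturing stderr as well.
  LOG=/var/log/nfdc-faces.log
  while true; do
      echo "==== $(date -u +%Y-%m-%dT%H:%M:%SZ) ====" >> "$LOG"
      nfdc face list >> "$LOG" 2>&1
      sleep 60
  done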

In this case, during the time that NFD is timing out FIB updates, I see in my
nfdc log file that nfdc no longer returns any results. I neglected to redirect stderr
to the log file, but I suspect that nfdc is failing to connect to get its results.
Just before it started failing, it showed 501 TCP faces.
I have a small number of nodes in the Testbed that have a problem with
fragmentation, and for them I run their faces over TCP. That seems to be part of the problem.

This node, SRRU, has faces to 6 other nodes, but just before it crashed
it had 70-100 TCP faces to each of those 6 nodes.
Why would the old TCP faces not go away when a new one was needed?
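
For what it's worth, this is roughly how I'm tallying faces per remote node from a
face-list snapshot (a sketch; faces.snapshot is just a placeholder for one snapshot
pulled from my log, and it assumes the remote=tcp4://host:port fields that
nfdc face list prints):

  # Count TCP faces per remote host in one nfdc snapshot.
  grep -o 'remote=tcp4://[^: ]*' faces.snapshot | sort | uniq -c | sort -rn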

John

On Sep 7, 2017, at 9:07 AM, Dehart, John <jdd at wustl.edu> wrote:


All:

We had another FATAL error overnight.
I have a DEBUG nfd log file that caught it, but I don’t see any enormous Interests.
I’m still going through the log file to see what might have happened.
In case anyone else wants to look at it, it is here:

https://www.arl.wustl.edu/~jdd/nfd.log.SRRU.FATAL.gz
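
In case anyone wants to scan it the same way I am, this is roughly my check for
enormous Interest names (a sketch; the 2000-character cutoff is arbitrary):

  # Print any Interest-related log lines that are suspiciously long.
  zcat nfd.log.SRRU.FATAL.gz | awk '/Interest/ && length($0) > 2000'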

I’m also very curious about what Eric said in the nfd conference call yesterday.
I believe he said that in integration testing he has seen an instance of a long Interest
with empty components. Is that actually something built into the tests, or is it something
we should be concerned about?

John


On Sep 6, 2017, at 5:49 AM, Dehart, John <jdd at wustl.edu> wrote:


On Sep 6, 2017, at 3:49 AM, Davide Pesavento <davide.pesavento at lip6.fr> wrote:

On Tue, Sep 5, 2017 at 7:57 PM, Dehart, John <jdd at wustl.edu> wrote:

On Sep 5, 2017, at 11:36 AM, Davide Pesavento <davide.pesavento at lip6.fr> wrote:

On Tue, Sep 5, 2017 at 5:23 PM, Dehart, John <jdd at wustl.edu> wrote:

The nfd.log file has not been touched in 11 hours:

So nfd is up but not doing anything meaningful? Can you run strace -p
on the stuck nfd?

Too late. I restarted with my updates to switch to DEBUG logging and capture log files.
Yes, it was stuck and not responding to anything. I did attach to it with gdb
to see if that would tell me anything, but I'm not sure this really tells us much:

(gdb) bt
#0  malloc_consolidate (av=av at entry=0x7f6a6de53760 <main_arena>) at malloc.c:4165
#1  0x00007f6a6db1072d in _int_free (av=0x7f6a6de53760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4059
#2  0x00007f6a6f50baf7 in std::_Sp_counted_ptr<ndn::Buffer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /usr/lib/x86_64-linux-gnu/libndn-cxx.so.0.5.1
#3  0x000000000045b639 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#4  0x0000000000472e44 in ?? ()
#5  0x000000000047c267 in ?? ()
#6  0x000000000047c393 in ?? ()
#7  0x000000000047a567 in ?? ()
#8  0x000000000044eff4 in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#9  0x0000000000450576 in ?? ()
#10 0x00000000004363ff in ?? ()
#11 0x00007f6a6dab2f45 in __libc_start_main (main=0x436060, argc=3, argv=0x7ffff81bf138, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffff81bf128) at libc-start.c:287
#12 0x0000000000449399 in ?? ()
(gdb)

When I see it again, I will try strace.
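
Something along these lines, I assume (a sketch; the pid lookup and options are
illustrative):

  # Attach to the running nfd and log syscalls with timestamps.
  sudo strace -f -tt -p "$(pidof nfd)" -o /tmp/nfd.strace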

Thanks. A proper gdb backtrace could be useful too, but the above one is
not. To make it more useful, you need to install the debug symbols for
ndn-cxx and NFD.
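
For a from-source build, something like the following should do it (a sketch;
the directory names are placeholders, and waf's --debug mode typically also
disables optimization):

  # Rebuild ndn-cxx and NFD with debug symbols (assumes from-source waf builds).
  cd ndn-cxx && ./waf configure --debug && ./waf && sudo ./waf install
  cd ../NFD  && ./waf configure --debug && ./waf && sudo ./waf install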

What is the penalty for running versions built with debug symbols?
Should we be doing that at all times, or is it a performance killer?


When nfd was stuck, do you know if CPU consumption was at or close to 100%?

No. The CPU usage was close to 0.

John


Thanks,
Davide

_______________________________________________
Nfd-dev mailing list
Nfd-dev at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/nfd-dev

