[Nfd-dev] [Operators] "No buffer space available"

Thu Sep 7 07:07:52 PDT 2017

All:

We had another FATAL error overnight.
I have a DEBUG-level nfd log file that caught it, but I don’t see any enormous Interests in it.
I’m still going through the log file to see what might have happened.
In case anyone else wants to look at it, it is here:

https://www.arl.wustl.edu/~jdd/nfd.log.SRRU.FATAL.gz

I’m also very curious about what Eric said in the nfd conference call yesterday.
I believe he said that in integration testing he has seen an instance of a long interest
with empty components. Is that actually something built into the tests or is this something
we should be concerned about?

John

On Sep 6, 2017, at 5:49 AM, Dehart, John <jdd at wustl.edu> wrote:

On Sep 6, 2017, at 3:49 AM, Davide Pesavento <davide.pesavento at lip6.fr> wrote:

On Tue, Sep 5, 2017 at 7:57 PM, Dehart, John <jdd at wustl.edu> wrote:

On Sep 5, 2017, at 11:36 AM, Davide Pesavento <davide.pesavento at lip6.fr> wrote:

On Tue, Sep 5, 2017 at 5:23 PM, Dehart, John <jdd at wustl.edu> wrote:

nfd.log file has not been touched in 11 hours:

So nfd is up but not doing anything meaningful? Can you run strace -p
on the stuck nfd?

Too late. I restarted with my updates to change to DEBUG and capture log files.
Yes, it was stuck and not responding to anything. I did attach to it with gdb
to see if that would tell me anything, but I'm not sure the backtrace really tells us much:

(gdb) bt
#0  malloc_consolidate (av=av@entry=0x7f6a6de53760 <main_arena>) at malloc.c:4165
#1  0x00007f6a6db1072d in _int_free (av=0x7f6a6de53760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4059
#2  0x00007f6a6f50baf7 in std::_Sp_counted_ptr<ndn::Buffer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /usr/lib/x86_64-linux-gnu/libndn-cxx.so.0.5.1
#3  0x000000000045b639 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#4  0x0000000000472e44 in ?? ()
#5  0x000000000047c267 in ?? ()
#6  0x000000000047c393 in ?? ()
#7  0x000000000047a567 in ?? ()
#8  0x000000000044eff4 in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
#9  0x0000000000450576 in ?? ()
#10 0x00000000004363ff in ?? ()
#11 0x00007f6a6dab2f45 in __libc_start_main (main=0x436060, argc=3, argv=0x7ffff81bf138, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffff81bf128) at libc-start.c:287
#12 0x0000000000449399 in ?? ()
(gdb)

When I see it again, I will try strace.
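
For catching the next hang, here is a minimal Python sketch of the two steps discussed above: noticing that nfd.log has stopped being written, and building the strace invocation to attach to the stuck process. The function names, the threshold, and the extra strace flags (-f, -tt) are illustrative assumptions, not something either of us has actually run on the testbed.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if `path` was last modified more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) > max_age_seconds


def strace_command(pid):
    """Build the argument list for attaching strace to a running process.

    -p selects the process, -f also attaches to its threads, and
    -tt prefixes each syscall with a timestamp. The extra flags are
    an assumption about what would be useful here.
    """
    return ["strace", "-f", "-tt", "-p", str(pid)]
```

A check such as is_stale("nfd.log", 3600) could run from cron; the log path and the one-hour threshold are placeholders. The command list can be handed to subprocess.run once the pid of the stuck nfd is known.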

Thanks. A proper gdb backtrace could be useful too, but the above is
not. To make it more useful you need to install the debug symbols for
ndn-cxx and nfd.
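
Once the debug symbols are installed, one possible way to grab a backtrace of every thread without an interactive session is gdb's batch mode. The sketch below only wraps that command line in Python; the helper names and the timeout are hypothetical, and attaching normally requires root or the same user as nfd.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb invocation that prints all thread backtraces."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, timeout=60):
    """Run gdb against `pid` and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

With symbols present, the frames shown as ?? in the trace above should resolve to function names and source lines.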

What is the penalty for running with the versions with debug symbols?
Should we be doing that at all times or is it a performance killer?

When nfd was stuck, do you know if CPU consumption was at or close to 100%?

No. The CPU usage was close to 0.

John

Thanks,
Davide

_______________________________________________
Nfd-dev mailing list
Nfd-dev at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/nfd-dev