[Nfd-dev] Emergency

Mon Aug 21 16:19:49 PDT 2017

On Mon, Aug 21, 2017 at 7:10 PM, Junxiao Shi
<shijunxiao at email.arizona.edu> wrote:
> Hi John
>
>> >
>> > What does this message mean?
>> >
>> > 1503354619.041373 FATAL: [NFD] Non-recoverable error: request timed out
>> > code: 10060
>
> It seems there's an uncaught exception somewhere.
> This looks like it comes from ndn::nfd::Controller, and 10060 means "timeout".
> It's probably caused by either NFD or NLSR not responding to NFD-RIB's
> command, as the only usage of ndn::nfd::Controller is in NFD-RIB.

I think the exception is thrown by FibUpdater::onUpdateError. NFD is
repeatedly timing out on a FibAddNextHopCommand or
FibRemoveNextHopCommand. After several failed retries, FibUpdater
gives up and kills the whole daemon.
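
To illustrate the retry-then-abort pattern I am describing, here is a
minimal standalone sketch. It is not the actual FibUpdater code:
SendCommand, sendWithRetries, and kMaxRetries are hypothetical names,
and the retry limit is an assumed value. Only the 10060 timeout code
and the wording of the error message are taken from the log above.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for sending one FIB command.
// Returns 0 on success, or an error code such as 10060 on failure.
using SendCommand = std::function<int()>;

constexpr int kTimeoutCode = 10060;  // code seen in the log above
constexpr int kMaxRetries = 3;       // assumed value, not taken from NFD

// Retry the command while it keeps timing out. Any other error, or a
// timeout after the last retry, is reported by throwing.
// Returns the number of attempts made when the command succeeds.
int sendWithRetries(const SendCommand& send, int maxRetries = kMaxRetries)
{
  for (int attempt = 1; attempt <= maxRetries + 1; ++attempt) {
    const int code = send();
    if (code == 0) {
      return attempt;
    }
    if (code != kTimeoutCode) {
      // Non-timeout errors are not retried in this sketch.
      throw std::runtime_error("Non-recoverable error: code: " +
                               std::to_string(code));
    }
  }
  // Every attempt timed out: give up.
  throw std::runtime_error("Non-recoverable error: request timed out code: " +
                           std::to_string(kTimeoutCode));
}
```

If nothing catches the exception thrown on the last line, the process
terminates, which would match the FATAL message in the log.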

Maybe there's a flood of requests and NFD can't keep up?

>> 1503352769.396513 FATAL: [NFD] remote_endpoint: Transport endpoint is not
>> connected
>
> I've never seen this one. It probably comes from Boost libraries.
> Please provide a stack trace when it occurs. Make sure to install debug
> symbols, and use 'bt full' instead of 'bt' to collect the stack trace.

Yes, I think it comes from calling remote_endpoint() on an unconnected
socket. The question is which socket, and why is it not connected?
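
For reference, the same error can be reproduced without Boost. As far
as I know, remote_endpoint() ends up calling getpeername() on POSIX
systems, and getpeername() fails with ENOTCONN on a socket that was
never connected or has already been disconnected. The sketch below
uses plain POSIX calls; peerErrnoOfUnconnectedSocket is a hypothetical
helper name, and none of this is NFD code.

```cpp
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP socket, never connect it, and ask the OS for its peer
// address. Returns the resulting errno, or 0 if the call succeeded.
int peerErrnoOfUnconnectedSocket()
{
  const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return errno;  // could not even create the socket
  }

  sockaddr_in peer{};
  socklen_t len = sizeof(peer);
  int err = 0;
  if (::getpeername(fd, reinterpret_cast<sockaddr*>(&peer), &len) != 0) {
    err = errno;  // ENOTCONN is expected here
  }

  ::close(fd);
  return err;
}
```

On Linux, strerror(ENOTCONN) typically reads "Transport endpoint is not
connected", which is the text in the log. With Boost.Asio, the
overload of remote_endpoint() that takes a boost::system::error_code
reports this condition without throwing.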