[Nfd-dev] Emergency

Mon Aug 21 16:43:51 PDT 2017

Junxiao,

I doubt very much that it was your app.
We were seeing almost every node on the testbed restarting repeatedly.
Once I killed all my AWS instances the problem went away. So, it seems
to have something to do with my instances, which I had left running for a long time
while waiting for our SIGCOMM demo.

John

On Aug 21, 2017, at 4:35 PM, Junxiao Shi <shijunxiao at email.arizona.edu> wrote:

Hi Davide

>> >
>> > What does this message mean?
>> >
>> > 1503354619.041373 FATAL: [NFD] Non-recoverable error: request timed out
>> > code: 10060
>
> It seems there's an uncaught exception somewhere.
> This looks like it's coming from ndn::nfd::Controller, and 10060 means "timeout".
> It's probably caused by either NFD or NLSR not responding to NFD-RIB's
> command, as the only usage of ndn::nfd::Controller is in NFD-RIB.

I think the exception is thrown by FibUpdater::onUpdateError. NFD is
repeatedly timing out on a FibAddNextHopCommand or
FibRemoveNextHopCommand. After several failed retries, FibUpdater
gives up and kills the whole daemon.
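To make this failure mode concrete, here is a minimal Python sketch of a retry-then-give-up loop of the kind described above. It is not FibUpdater's actual code: the function name, the FatalError type, and the default of 3 retries are all hypothetical. The point is that a bounded retry on timeout, followed by a fatal error, turns a slow or overloaded forwarder into a daemon restart.

```python
class FatalError(Exception):
    """Hypothetical stand-in for the non-recoverable error that stops the daemon."""


def update_with_retries(send_command, max_retries=3):
    """Send a FIB command, retrying on timeout; give up after max_retries retries.

    send_command is a hypothetical callable that returns on success and raises
    TimeoutError when the forwarder does not answer in time.
    """
    attempts = 0
    while True:
        try:
            return send_command()
        except TimeoutError:
            attempts += 1
            if attempts > max_retries:
                # After repeated timeouts, escalate instead of retrying forever.
                raise FatalError("request timed out") from None
```

With a forwarder that never answers, the command is sent once plus max_retries more times before FatalError propagates.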

Maybe there's a flood of requests and NFD can't keep up?

I might have to apologize for this one. I was working on a tunneling app that currently keeps 60 outstanding Interests and retransmits aggressively. It still misbehaves when the peer is offline: NFD immediately returns a Nack, and the consumer immediately sends a new Interest. I forgot to turn off my other endpoint yesterday, so it may have been blasting Interests. It's turned off now. I'll implement a proper flow control procedure before reconnecting it.
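One common way to avoid the Nack-then-resend loop is to cap outstanding Interests with an AIMD window and to delay retransmissions with exponential backoff. The Python sketch below illustrates that idea under stated assumptions: the class and function names, the 60-Interest ceiling, and the backoff constants are hypothetical, and this is neither an ndn-cxx API nor the flow control that was actually implemented for the tunneling app.

```python
class AimdWindow:
    """Hypothetical consumer-side congestion window (AIMD), counted in Interests."""

    def __init__(self, initial=1.0, max_window=60.0, min_window=1.0):
        self.cwnd = initial
        self.max_window = max_window
        self.min_window = min_window
        self.outstanding = 0

    def can_send(self):
        # Only send a new Interest while fewer than cwnd are in flight.
        return self.outstanding < int(self.cwnd)

    def on_send(self):
        self.outstanding += 1

    def on_data(self):
        self.outstanding -= 1
        # Additive increase: roughly +1 Interest per window's worth of Data.
        self.cwnd = min(self.max_window, self.cwnd + 1.0 / self.cwnd)

    def on_nack_or_timeout(self):
        self.outstanding -= 1
        # Multiplicative decrease: halve the window instead of resending at once.
        self.cwnd = max(self.min_window, self.cwnd / 2.0)


def retransmit_delay(retry_count, base=0.05, cap=4.0):
    """Exponential backoff, in seconds, to wait before retransmitting after a Nack."""
    return min(cap, base * (2 ** retry_count))
```

With this scheme, an offline peer that Nacks every Interest shrinks the window to one Interest in flight, and each retry waits longer than the last, so the consumer stops flooding the forwarder.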

Yours, Junxiao
