[Ndn-interest] Ndn : Nlsr fundamental problems and short term workarounds

Mon Sep 23 08:08:45 PDT 2019

I wanted to follow up here and provide an updated status regarding Nfd
and Nlsr issues encountered during testing, as well as the workarounds
that we have implemented to allow testing in a dynamic environment.

The problems encountered are mostly related to Nlsr. However, I think
it is appropriate to post here as some of the issues are fundamental
from an NDN architectural point of view and also explicitly addressed
in the NDN Protocol Design Principles [1].

We have had some trouble testing and using Nfd / Nlsr due to several
issues, including the requirement for clock synchronization [2], the
requirement to back up Nlsr sequence numbers to offline storage [3],
and the inability of Nlsr / sync to "forget"/delete a node once seen [4].

These issues were described in the thread on this list with subject
"Network inconsistency and NLSR sequence numbers" [4], and are
summarized here.

There are at least three main issues which currently make testing and
use of Nlsr in a dynamic environment difficult and also inefficient
network-wise.

- Nlsr requires backup of sequence numbers (to an offline medium, in
  case of re-install).

- Nlsr requires a periodic, globally synchronized restart (stop, wait,
  start) of all Nlsr instances in a network, to get rid of old
  information in Nlsr / sync. (Nfd can be left running during the
  restart of Nlsr.)

- Nlsr requires synchronized clocks, that is, global time, for correct
  operation. (Nfd does not seem to require globally synchronized
  clocks.)

These issues are all quite painful when trying to use Nfd / Nlsr in a
test environment which is not completely static (nodes are
re-installed from time to time, nodes come and go, nodes form smaller
sub-networks, etc).

The second issue above, the inability to forget / delete nodes not
seen for some time, is known to the people involved in Nlsr
development, but not solved. The issue is highly problematic. A node
that showed up in the network only once will be remembered by
the nodes in the rest of the network. Hence, nodes will continue to
request information from such disappeared nodes, leading to wasted
network bandwidth.

After some consideration, I ended up with some modifications to Nlsr
which apparently overcome these issues by means of workarounds. Please
note that the goal here is rather pragmatic: to allow Nfd / Nlsr to
be used in a somewhat dynamic environment. I also tried to make the
fewest possible changes to the current Nlsr code base, while having
decent behavior from both system management and Nlsr network
performance points of view.

Briefly described, the issues have been addressed by changes to the
Nlsr code base as follows:

Sequence numbers: Removed the requirement for backup of sequence
numbers. When a request arrives for information generated by this
particular node, Nlsr checks whether the requested sequence number is
higher than the one the node itself currently uses. This will happen
if a node is re-installed and then brought online again. In this case,
Nlsr writes this higher sequence number to its sequence number file
and then exits. The Nlsr process is auto-restarted externally. An Nlsr
process restart will happen multiple times after a re-install, to
synchronize all the sequence numbers. For example, both name and
adjacency sequence numbers are synchronized. Additionally, to avoid
live-lock, Nlsr exits if no request for the node's own LSA arrives
within a certain time limit, triggering an auto-restart.
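
The sequence number recovery described above can be sketched roughly
as follows. This is only an illustration in Python, not the actual
change to the Nlsr code base: the function names, the single-integer
file format and the time limit parameter are assumptions made for the
example.

```python
from pathlib import Path


def recover_sequence_number(seq_file: Path, requested_seq: int) -> bool:
    """Return True if the process should exit and be restarted externally.

    seq_file is a hypothetical file holding one integer; the real Nlsr
    sequence number file format may differ.
    """
    local_seq = int(seq_file.read_text()) if seq_file.exists() else 0
    if requested_seq > local_seq:
        # The network remembers a higher number than we do (for example
        # after a re-install): adopt it and ask the caller to exit.
        seq_file.write_text(str(requested_seq))
        return True
    return False


def own_lsa_request_overdue(last_request: float, now: float, limit: float) -> bool:
    """Return True if no request for our own LSA arrived within `limit` seconds."""
    return (now - last_request) > limit
```

A caller would exit the process (relying on the external auto-restart)
whenever either function returns True.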

Forget / delete nodes: This one seemed difficult to get around
without getting deep into Nlsr and Sync (chronosync or
psync). Fortunately, at each node I have access to IP routing
information from a traditional link state routing protocol running in
parallel with Nlsr. Whenever an LSA packet is to be sent, the modified
Nlsr code consults the IP routing information to check whether the
corresponding node is reachable. Based on this check, all LSA
packets destined for unreachable nodes are dropped /
pruned. Consequently, network bandwidth is not wasted. Hopefully,
nodes which have not been seen in the network for a long time will
also be forgotten, as Nfd, Nlsr and the nodes themselves are
restarted from time to time.
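
As a rough illustration of the pruning step, the filter can be thought
of as below. This is a Python sketch under stated assumptions, not the
modified Nlsr code: the OutgoingLsaPacket type, its fields, and the
idea that reachability is available as a set of addresses taken from
the IP routing table are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class OutgoingLsaPacket(NamedTuple):
    destination: str  # hypothetical: address of the node the packet is for
    payload: bytes


def prune_unreachable(packets: Iterable[OutgoingLsaPacket],
                      reachable: Set[str]) -> List[OutgoingLsaPacket]:
    """Keep only packets whose destination the IP routing table can reach.

    `reachable` is assumed to be built elsewhere from the routing
    information of the link state protocol running in parallel.
    """
    return [p for p in packets if p.destination in reachable]
```

How the set of reachable nodes is obtained depends on the IP routing
protocol in use and is left out here.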

Clock synchronization: Removed the clock synchronization requirement
by rewriting, within Nlsr, the timestamp of each received LSA using
the local time of the receiving node. In other words, pretending that
the LSA was generated when received.
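
The timestamp rewrite can be sketched as follows. Again this is only a
Python illustration: the Lsa record and its field names are
assumptions for the example and do not mirror the actual Nlsr data
structures.

```python
import time
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Lsa:
    origin: str       # hypothetical: name of the originating router
    seq_no: int
    timestamp: float  # seconds since the epoch, as set by the sender


def restamp_on_receive(lsa: Lsa, now: Optional[float] = None) -> Lsa:
    """Return a copy of the LSA stamped with the receiver's local time.

    The sender's clock is ignored entirely, so any age-based handling
    of the LSA is measured from local receipt.
    """
    local_now = time.time() if now is None else now
    return replace(lsa, timestamp=local_now)
```

The `now` parameter exists only to make the sketch easy to exercise
with a fixed time.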

I think all these issues should be fixed in a principled manner, for a
long term solution. Please let me know if there is any interest in
addressing these issues, with the goal of a less fragile and less
ad-hoc approach. Clearly, the need for filtering outgoing Nlsr packets
by means of another link state routing protocol must be removed.

Best regards
Viktor S. Wold Eide

[1] https://named-data.net/project/ndn-design-principles/
[2] https://www.lists.cs.ucla.edu/pipermail/ndn-interest/2019-June/002482.html
[3] https://www.lists.cs.ucla.edu/pipermail/ndn-interest/2019-August/002558.html
[4] https://www.lists.cs.ucla.edu/pipermail/ndn-interest/2019-August/002564.html