[Ndn-interest] any comments on naming convention?

Sat Sep 20 13:56:29 PDT 2014

Dave already touched on some of these concerns but let me add a couple of
things.

On 9/20/14, 11:15 AM, "Tai-Lin Chu" <tailinchu at gmail.com> wrote:

>1. LPM allows "data discovery". How will exact match do similar things?

LPM in the reverse path creates more problems than it solves.  It works in
some cases where you have only 2 nodes in the network, but when you have a
large complicated topology it doesn’t help that much.  Are you going to
request right-most child on every hop?   What if you have cross traffic
for the same prefix?  You may never get to what you’re looking for because
there is always a newer match.  The way to solve these problems is to
know more about the namespace.  If you know more about the namespace then
why did you need this type of discovery in the first place?

Discovery at the forwarder is a privacy problem.  Do you allow people to
ask for “/” and then start exploring everything you have in the cache?  Or
do you plan to limit this with a set of rules?

Discovery is needed, but what we need is not “forwarder-level” discovery.
 If you want a system that allows discovery and exploration of the caches
at the forwarder level then create a protocol to do that. If you want a
discovery protocol to run at the transport level then create a protocol
that does that.  If you need a discovery protocol at the application
level, then create one of those.

To top it off, LPM for interests is a very complicated problem if you
don’t have selectors.  You’re going to end up inserting termination
markers to do exact match so that you don’t get random stuff you don’t
want.  At that point you might as well move to exact matching.  Selectors by
themselves are a problem; see below.
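
To make the termination-marker point concrete, here is a rough Python
sketch.  Everything in it is an assumption made for illustration: names are
modelled as tuples of byte strings, the cache is a plain dict, and
TERMINATOR is a made-up marker component, not something taken from the CCN
or NDN specs.  It only shows that prefix matching with a terminator
appended on both sides collapses to exact matching.

```python
# Illustrative only: names are tuples of components, the cache is a dict.
# TERMINATOR is a hypothetical end-of-name marker, not part of any spec.
TERMINATOR = b"\x00END"

cache = {
    (b"parc", b"video"): "index",
    (b"parc", b"video", b"v2"): "version 2",
    (b"parc", b"video", b"v2", b"chunk0"): "first chunk",
}

def prefix_match(cache, interest_name):
    """Return every cached name that the interest name is a prefix of."""
    n = len(interest_name)
    return sorted(name for name in cache if name[:n] == interest_name)

def exact_match(cache, interest_name):
    """Return the cached name only if it equals the interest name."""
    return [interest_name] if interest_name in cache else []

def terminated_prefix_match(cache, interest_name):
    """Prefix matching where both sides append a terminator component."""
    wanted = interest_name + (TERMINATOR,)
    n = len(wanted)
    return sorted(
        name for name in cache if (name + (TERMINATOR,))[:n] == wanted
    )

interest = (b"parc", b"video")
print(len(prefix_match(cache, interest)))         # several candidates
print(terminated_prefix_match(cache, interest))   # only the exact name
print(exact_match(cache, interest))               # same result, less work
```

With plain prefix matching the forwarder has to pick among several
candidates (hence selectors); with the terminator you get the same answer
as exact match, just by a longer route.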

Finally, it’s unlikely that medium and big routers will implement LPM and
selector matching.  Cisco doesn’t think they work, Alcatel-Lucent doesn’t
think they work, PARC doesn't think they work;  Huawei, what do you think?
 Ericsson? Juniper?  Anybody?

>2. will removing selectors improve performance? How do we use other
>faster technique to replace selector?

Selectors are needed if you use LPM for interest matching, otherwise you
can’t do much with LPM.

If your argument is that selectors are useful because they are used for
discovery, then why not use real selectors that implement full regular
expressions?  Why not a full query language?

In terms of performance, what is the current limit on selectors?  How many
excludes can I have? (And what effect does that have on router
performance?) Are these enough? How many roundtrips do I need to take to
discover the data that I want?    I’ve always found this a funny argument
because you’re basically saying: I’m not sure what I want (hence the
discovery protocol), but I’ll know what I want when I see it.

Well, maybe if you had a real protocol that could specify what you wanted
you would have gotten it in one round trip. So this is general protocol
performance.

I would venture a guess that for most situations you can come up with that
require LPM and selectors you’re either talking about a situation where
you need a custom protocol (like some form of exploratory ad-hoc
networks), or you’re relying on flooding the whole network, or you’re
trying to solve a transport/application level problem.

>3. fixed byte length and type. I agree more that type can be fixed
>byte, but 2 bytes for length might not be enough for future.

This is a valid question: is 2 bytes enough for length?

The first thing I want to say is that we should not confuse the length of
the network TLV format with application level objects.  This is like
arguing about the block size of hard disks.  As Dave mentioned, so far, we
haven’t had a lot of problems in getting to line rate with “small
objects”.  We (the network) are interested in things like MTUs of various
networks and how they interact with application layer data blobs, but we
don’t need to make all of these the same.

If you have a 4K movie that takes 25Gigs, it’s unlikely that’s going to be
a single network unit.  When you read these off disk, when you move them
through the filesystem or OS it’s going to be done in things smaller than
25Gigs.  It’s true that some optical links can have large envelopes, but
that’s not a problem. We can always encapsulate many little packets into a
big bundle.  The question to ask would be performance and overhead.  How
much overhead do we have by needing to use headers for every “regular
packet” inside one of these 25Gig envelopes?   I guess if your encoding
is inefficient this could be high, but so far this doesn’t seem to be the
case (at least not for CCN, not sure about NDN overhead).

However, for the nearish future it’s hard to see all links being able to
support this (especially wireless links, which, as you imagine, are growing
in popularity). This means that we’ll need to chop this 25Gig thing up
into smaller units for the rest of the network.  This implies
fragmentation if the original message was 25Gigs.  Fragmentation is bad.
It’s expensive and breaks a lot of things (or at least makes them harder).
  Why not just use reasonable size messages and bundle them when we need
to send them in a large envelope?

In terms of the TLVs used for large files.  Well, you can always have a
field for “File Length”, of size “8”, so you can include a 64-bit number
that refers to the actual file size.  There is no need for the network
level TLV parsing engine to deal with the overhead of 8 byte L fields or
with the need to process variable length encodings.
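
As a rough sketch of what that looks like (Python, for illustration only):
the type numbers T_FILE_LENGTH and T_PAYLOAD and the helper names are
invented for this example and do not come from any CCN or NDN registry.
The point is just that T and L stay at a fixed 2 bytes each while the value
of a "File Length" field carries a 64-bit number.

```python
import struct

# Hypothetical type numbers chosen for this example only.
T_FILE_LENGTH = 0x0101
T_PAYLOAD = 0x0102

def encode_tlv(tlv_type, value):
    """Encode one TLV with a fixed 2-byte type and 2-byte length (big-endian)."""
    if len(value) > 0xFFFF:
        raise ValueError("value does not fit in a 2-byte length field")
    return struct.pack("!HH", tlv_type, len(value)) + value

def decode_tlvs(buf):
    """Walk a buffer of back-to-back TLVs and return (type, value) pairs."""
    out = []
    offset = 0
    while offset < len(buf):
        tlv_type, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        out.append((tlv_type, value))
        offset += length
    return out

# A 25 GB file: its size goes in an 8-byte value, not in the L field.
file_size = 25 * 10**9
message = (
    encode_tlv(T_FILE_LENGTH, struct.pack("!Q", file_size))
    + encode_tlv(T_PAYLOAD, b"first chunk")
)

for tlv_type, value in decode_tlvs(message):
    print(hex(tlv_type), len(value))
```

The parser only ever reads fixed 4-byte headers; the application decides
what to do with the 64-bit file size it finds inside the value.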

Nacho

>
>On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) <oran at cisco.com> wrote:
>>
>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu <tailinchu at gmail.com> wrote:
>>
>>>> I know how to make #2 flexible enough to do what things I can
>>>>envision we need to do, and with a few simple conventions on how the
>>>>registry of types is managed.
>>>
>>> Could you share it with us?
>>>
>> Sure. Here’s a strawman.
>>
>> The type space is 16 bits, so you have 65,536 types.
>>
>> The type space is currently shared with the types used for the entire
>>protocol, that gives us two options:
>> (1) we reserve a range for name component types. Given the likelihood
>>there will be at least as much and probably more need for component types
>>than protocol extensions, we could reserve 1/2 of the type space, giving
>>us 32K types for name components.
>> (2) since there is no parsing ambiguity between name components and
>>other fields of the protocol (since they are sub-types of the name type)
>>we could reuse numbers and thereby have an entire 65K name component
>>types.
>>
>> We divide the type space into regions, and manage it with a registry.
>>If we ever get to the point of creating an IETF standard, IANA has 25
>>years of experience running registries and there are well-understood
>>rule sets for different kinds of registries (open, requires a written
>>spec, requires standards approval).
>>
>> - We allocate one “default” name component type for “generic name”,
>>which would be used on name prefixes and other common cases where there
>>are no special semantics on the name component.
>> - We allocate a range of name component types, say 1024, to globally
>>understood types that are part of the base or extension NDN
>>specifications (e.g. chunk#, version#, etc.)
>> - We reserve some portion of the space for unanticipated uses (say
>>another 1024 types)
>> - We give the rest of the space to application assignment.
>>
>> Make sense?
>>
>>
>>>> While I’m sympathetic to that view, there are three ways in which
>>>>Moore’s law or hardware tricks will not save us from performance flaws
>>>>in the design
>>>
>>> we could design for performance,
>> That’s not what people are advocating. We are advocating that we *not*
>>design for known bad performance and hope serendipity or Moore’s Law
>>will come to the rescue.
>>
>>> but I think there will be a turning
>>> point when the slower design starts to become "fast enough”.
>> Perhaps, perhaps not. Relative performance is what matters so things
>>that don’t get faster while others do tend to get dropped or not used
>>because they impose a performance penalty relative to the things that go
>>faster. There is also the “low-end” phenomenon where improvements in
>>technology get applied to lowering cost rather than improving
>>performance. For those environments bad performance just never gets
>>better.
>>
>>> Do you
>>> think there will be some design of ndn that will *never* have
>>> performance improvement?
>>>
>> I suspect LPM on data will always be slow (relative to the other
>>functions).
>> I suspect exclusions will always be slow because they will require
>>extra memory references.
>>
>> However I of course don’t claim to clairvoyance so this is just
>>speculation based on 35+ years of seeing performance improve by 4 orders
>>of magnitude and still having to worry about counting cycles and memory
>>references…
>>
>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) <oran at cisco.com>
>>>wrote:
>>>>
>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu <tailinchu at gmail.com> wrote:
>>>>
>>>>> We should not look at a certain chip nowadays and want ndn to perform
>>>>> well on it. It should be the other way around: once  ndn app becomes
>>>>> popular, a better chip will be designed for ndn.
>>>>>
>>>> While I’m sympathetic to that view, there are three ways in which
>>>>Moore’s law or hardware tricks will not save us from performance flaws
>>>>in the design:
>>>> a) clock rates are not getting (much) faster
>>>> b) memory accesses are getting (relatively) more expensive
>>>> c) data structures that require locks to manipulate successfully will
>>>>be relatively more expensive, even with near-zero lock contention.
>>>>
>>>> The fact is, IP *did* have some serious performance flaws in its
>>>>design. We just forgot those because the design elements that depended
>>>>on those mistakes have fallen into disuse. The poster children for
>>>>this are:
>>>> 1. IP options. Nobody can use them because they are too slow on
>>>>modern forwarding hardware, so they can’t be reliably used anywhere
>>>> 2. the UDP checksum, which was a bad design when it was specified and
>>>>is now a giant PITA that still causes major pain in working around.
>>>>
>>>> I’m afraid students today are being taught that the designers of IP
>>>>were flawless, as opposed to very good scientists and engineers that
>>>>got most of it right.
>>>>
>>>>> I feel the discussion today and yesterday has been off-topic. Now I
>>>>> see that there are 3 approaches:
>>>>> 1. we should not define a naming convention at all
>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>> 3. marked component: introduce only one more type and add additional
>>>>> marker space
>>>>>
>>>> I know how to make #2 flexible enough to do what things I can
>>>>envision we need to do, and with a few simple conventions on how the
>>>>registry of types is managed.
>>>>
>>>> It is just as powerful in practice as either throwing up our hands
>>>>and letting applications design their own mutually incompatible
>>>>schemes or trying to make naming conventions with markers in a way
>>>>that is fast to generate/parse and also resilient against aliasing.
>>>>
>>>>> Also everybody thinks that the current utf8 marker naming convention
>>>>> needs to be revised.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe <felix at rabe.io> wrote:
>>>>>> Would that chip be suitable, i.e. can we expect most names to fit
>>>>>>in (the
>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN
>>>>>> experiments?
>>>>>>
>>>>>> I guess wide deployment could make for even longer names. Related:
>>>>>>Many URLs
>>>>>> I encounter nowadays easily don't fit within two 80-column text
>>>>>>lines, and
>>>>>> NDN will have to carry more information than URLs, as far as I see.
>>>>>>
>>>>>>
>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>
>>>>>> In fact, the index in separate TLV will be slower on some
>>>>>>architectures,
>>>>>> like the ezChip NP4.  The NP4 can hold the first 96 frame bytes in
>>>>>>memory,
>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte
>>>>>>blocks
>>>>>> (there can be at most 5 blocks available at any one time).  If you
>>>>>>need to
>>>>>> switch between arrays, it would be very expensive.  If you have to
>>>>>>read past
>>>>>> the name to get to the 2nd array, then read it, then backup to get
>>>>>>to the
>>>>>> name, it will be pretty expensive too.
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Sep 18, 2014, at 2:02 PM, <Ignacio.Solis at parc.com>
>>>>>> <Ignacio.Solis at parc.com> wrote:
>>>>>>
>>>>>> Does this make that much difference?
>>>>>>
>>>>>> If you want to parse the first 5 components.  One way to do it is:
>>>>>>
>>>>>> Read the index, find entry 5, then read in that many bytes from the
>>>>>>start
>>>>>> offset of the beginning of the name.
>>>>>> OR
>>>>>> Start reading name, (find size + move ) 5 times.
>>>>>>
>>>>>> How much speed are you getting from one to the other?  You seem to
>>>>>>imply
>>>>>> that the first one is faster.  I don’t think this is the case.
>>>>>>
>>>>>> In the first one you’ll probably have to get the cache line for the
>>>>>>index,
>>>>>> then all the required cache lines for the first 5 components.  For
>>>>>>the
>>>>>> second, you’ll have to get all the cache lines for the first 5
>>>>>>components.
>>>>>> Given an assumption that a cache miss is way more expensive than
>>>>>> evaluating a number and computing an addition, you might find that
>>>>>>the
>>>>>> performance of the index is actually slower than the performance of
>>>>>>the
>>>>>> direct access.
>>>>>>
>>>>>> Granted, there is a case where you don’t access the name at all, for
>>>>>> example, if you just get the offsets and then send the offsets as
>>>>>> parameters to another processor/GPU/NPU/etc.  In this case you may
>>>>>>see a
>>>>>> gain IF there are more cache line misses in reading the name than in
>>>>>> reading the index.   So, if the regular part of the name that you’re
>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is
>>>>>>to be
>>>>>> processed by a different processor, then you might see some
>>>>>>performance
>>>>>> gain in using the index, but in all other circumstances I bet this
>>>>>>is not
>>>>>> the case.   I may be wrong, haven’t actually tested it.
>>>>>>
>>>>>> This is all to say, I don’t think we should be designing the
>>>>>>protocol with
>>>>>> only one architecture in mind. (The architecture of sending the
>>>>>>name to a
>>>>>> different processor than the index).
>>>>>>
>>>>>> If you have numbers that show that the index is faster I would like
>>>>>>to see
>>>>>> under what conditions and architectural assumptions.
>>>>>>
>>>>>> Nacho
>>>>>>
>>>>>> (I may have misinterpreted your description so feel free to correct
>>>>>>me if
>>>>>> I’m wrong.)
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nacho (Ignacio) Solis
>>>>>> Protocol Architect
>>>>>> Principal Scientist
>>>>>> Palo Alto Research Center (PARC)
>>>>>> +1(650)812-4458
>>>>>> Ignacio.Solis at parc.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo"
>>>>>><massimo.gallo at alcatel-lucent.com>
>>>>>> wrote:
>>>>>>
>>>>>> Indeed each components' offset must be encoded using a fixed amount
>>>>>>of
>>>>>> bytes:
>>>>>>
>>>>>> i.e.,
>>>>>> Type = Offsets
>>>>>> Length = 10 Bytes
>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>
>>>>>> You may also imagine to have a "Offset_2byte" type if your name is
>>>>>>too
>>>>>> long.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>
>>>>>> if you do not need the entire hierarchical structure (suppose you only
>>>>>> want the first x components) you can directly have it using the
>>>>>> offsets. With the Nested TLV structure you have to iteratively parse
>>>>>> the first x-1 components. With the offset structure you can
>>>>>>directly
>>>>>> access the first x components.
>>>>>>
>>>>>> I don't get it. What you described only works if the "offset" is
>>>>>> encoded in fixed bytes. With varNum, you will still need to parse
>>>>>>x-1
>>>>>> offsets to get to the x offset.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo
>>>>>> <massimo.gallo at alcatel-lucent.com> wrote:
>>>>>>
>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>
>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the
>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand
>>>>>>what
>>>>>> you
>>>>>> _do_ prefer, though. it sounds like you're describing an entirely
>>>>>> different
>>>>>> scheme where the info that describes the name-components is ...
>>>>>> someplace
>>>>>> other than _in_ the name-components. is that correct? when you say
>>>>>> "field
>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)?
>>>>>>
>>>>>> Correct.
>>>>>> In particular, with our name encoding, a TLV indicates the name
>>>>>> hierarchy
>>>>>> with offsets in the name and other TLV(s) indicates the offset to
>>>>>>use
>>>>>> in
>>>>>> order to retrieve special components.
>>>>>> As for the field separator, it is something like "/". Aliasing is
>>>>>> avoided as
>>>>>> you do not rely on field separators to parse the name; you use the
>>>>>> "offset
>>>>>> TLV " to do that.
>>>>>>
>>>>>> So now, it may be an aesthetic question but:
>>>>>>
>>>>>> if you do not need the entire hierarchical structure (suppose you only
>>>>>> want
>>>>>> the first x components) you can directly have it using the offsets.
>>>>>> With the
>>>>>> Nested TLV structure you have to iteratively parse the first x-1
>>>>>> components.
>>>>>> With the offset structure you can directly access the first x
>>>>>> components.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>>
>>>>>> -- Mark
>>>>>>
>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>
>>>>>> The why is simple:
>>>>>>
>>>>>> You use a lot of "generic component type" and very few "specific
>>>>>> component type". You are imposing types for every component in order
>>>>>> to
>>>>>> handle few exceptions (segmentation, etc..). You create a rule
>>>>>> (specify
>>>>>> the component's type ) to handle exceptions!
>>>>>>
>>>>>> I would prefer not to have typed components. Instead I would prefer
>>>>>> to
>>>>>> have the name as simple sequence bytes with a field separator. Then,
>>>>>> outside the name, if you have some components that could be used at
>>>>>> network layer (e.g. a TLV field), you simply need something that
>>>>>> indicates which is the offset allowing you to retrieve the version,
>>>>>> segment, etc in the name...
>>>>>>
>>>>>>
>>>>>> Max
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>
>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>
>>>>>> I think we agree on the small number of "component types".
>>>>>> However, if you have a small number of types, you will end up with
>>>>>> names
>>>>>> containing many generic components types and few specific
>>>>>> components
>>>>>> types. Due to the fact that the component type specification is an
>>>>>> exception in the name, I would prefer something that specify
>>>>>> component's
>>>>>> type only when needed (something like UTF8 conventions but that
>>>>>> applications MUST use).
>>>>>>
>>>>>> so ... I can't quite follow that. the thread has had some
>>>>>> explanation
>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.)
>>>>>> and
>>>>>> there's been email trying to explain that applications don't have to
>>>>>> use types if they don't need to. your email sounds like "I prefer
>>>>>> the
>>>>>> UTF8 convention", but it doesn't say why you have that preference in
>>>>>> the face of the points about the problems. can you say why it is
>>>>>> that
>>>>>> you express a preference for the "convention" with problems ?
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
>>>>>>
>>>>>> .
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ndn-interest mailing list
>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>