[Ndn-interest] any comments on naming convention?

Sat Sep 20 12:14:34 PDT 2014

On Sep 20, 2014, at 2:15 PM, Tai-Lin Chu <tailinchu at gmail.com> wrote:

> I had thought about these questions, but I want to know your idea
> besides typed component:
> 1. LPM allows "data discovery". How will exact match do similar things?
It doesn’t. You layer discovery on top. And you do it in a way that does not permit cache exploration by snoopers as a bad security side effect.
There are any number of possible discovery protocols, including having “pointer” objects anchored at various places in the name tree. It would be really useful to start some research on the alternative of layering discovery above the NDN L3 as opposed to building it in as is done currently with selectors.
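
As a rough illustration of the "pointer object" idea, here is a minimal Python sketch of discovery layered above a strictly exact-match store. Everything in it is an assumption made for illustration: the in-memory `store`, the `/_latest` anchor name, and the `publish_version` / `discover_latest` helpers are hypothetical and not part of any NDN specification. It also ignores how the pointer itself would be versioned, signed, or expired.

```python
# Exact-match store: a name either matches a stored object exactly or misses.
store = {}

POINTER_SUFFIX = "/_latest"  # hypothetical well-known anchor in the name tree


def publish(name, payload):
    """Store one object under its full name."""
    store[name] = payload


def fetch(name):
    """Exact match only: no prefix matching, no selectors."""
    return store.get(name)


def publish_version(prefix, version, payload):
    """Publish a versioned object and refresh the pointer anchored at the prefix."""
    full_name = f"{prefix}/v{version}"
    publish(full_name, payload)
    publish(prefix + POINTER_SUFFIX, full_name)  # the pointer holds a full name
    return full_name


def discover_latest(prefix):
    """Two exact-match fetches: first the pointer, then the object it names."""
    target = fetch(prefix + POINTER_SUFFIX)
    if target is None:
        return None
    return target, fetch(target)
```

A consumer that only knows the prefix asks for the pointer by its exact name and then for the object the pointer names. A bare prefix lookup returns nothing, so the store cannot be explored by probing prefixes.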

> 2. will removing selectors improve performance?
Undoubtedly. Especially in caches. They are less problematic if executed by the publisher application.

> How do we use other
> faster technique to replace selector?
If by “faster” you mean just the individual lookup, that’s not the only issue. More important is the “cache chasing” problem which can result in search multipliers and many round trips.

> 3. fixed byte length and type. I agree more that type can be fixed
> byte, but 2 bytes for length might not be enough for future.
> 
Whether we need individual TLVs to be longer than 64K is worth further discussion. The question of whether entire NDN object messages need to be longer than 64K could be either a dependent design decision, if the overall wire format uses recursive TLV, or an independent one, if there is a fixed header that permits the overall length to exceed 64K.
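
To make the 2-byte limit concrete, here is a minimal Python sketch of a TLV with a fixed 2-byte type and a fixed 2-byte length. The big-endian layout and the function names are assumptions for illustration, not the NDN or CCNx wire format. With this layout a single value can be at most 65,535 bytes.

```python
import struct

MAX_VALUE_LEN = 0xFFFF  # largest length a 2-byte field can express (65,535)


def encode_tlv(tlv_type, value):
    """Encode one TLV: 2-byte type, 2-byte length (both big-endian), then value."""
    if not 0 <= tlv_type <= 0xFFFF:
        raise ValueError("type does not fit in 2 bytes")
    if len(value) > MAX_VALUE_LEN:
        raise ValueError("value does not fit in a 2-byte length field")
    return struct.pack("!HH", tlv_type, len(value)) + value


def decode_tlv(buf, pos=0):
    """Decode the TLV starting at pos and return (type, value, next_pos)."""
    tlv_type, length = struct.unpack_from("!HH", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated TLV")
    return tlv_type, buf[start:end], end
```

Because the header is always 4 bytes, a parser can find the value and the next TLV without first decoding a variable-length number. The cost is the hard ceiling enforced in `encode_tlv`.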

Here’s one argument for not bothering to go over 64K. 

In 1978 or so, IP packets rarely exceeded 576 bytes. By 1982 this increased to 1500 or so due to Ethernet. Larger packets were attempted using IP fragmentation, which turned out to be a HORRIBLE design in practice, so PMTU discovery was proposed and adopted with limited success. Over the next two decades networks increased in speed by about 3 orders of magnitude, and inherent error rates (except for wireless) reduced by about 5 orders of magnitude. In that time, all we have seen in terms of L3 packet size increase has been the shift to 9K jumbo frames under carefully controlled deployments.

We now have hardware (e.g. NICs) that happily runs at 10Gbps with no performance penalty due to packets of 9K or smaller. Next-gen NICs (and router switching hardware) will run cost-effectively at between 40 and 100 Gbps without increasing the packet size, since nobody uses interrupt-driven code or per-packet serial operations anymore (and hasn’t for 5-10 years).

There is certainly an argument to be made that the system can be made faster and a bit simpler by allowing bigger L3 packets/messages. So far, however, we aren’t even close to needing to go over 64K, and there are no existence proofs of a big win on future hardware from doing so. 30+ years of dramatic network performance increases have not pushed us beyond 64K.

That said, if the complexity cost or the cost for low-end memory-starved systems is small enough to justify larger length fields, then the balance may tip in favor of the future-proofing even in the absence of a compelling case today or in the near future.

DaveO.

> 
> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) <oran at cisco.com> wrote:
>> 
>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu <tailinchu at gmail.com> wrote:
>> 
>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>> 
>>> Could you share it with us?
>>> 
>> Sure. Here’s a strawman.
>> 
>> The type space is 16 bits, so you have 65,536 types.
>> 
>> The type space is currently shared with the types used for the entire protocol, which gives us two options:
>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components.
>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have the entire 65K space for name component types.
>> 
>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).
>> 
>> - We allocate one “default” name component type for “generic name”, which would be used on name prefixes and other common cases where there are no special semantics on the name component.
>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.)
>> - We reserve some portion of the space for unanticipated uses (say another 1024 types)
>> - We give the rest of the space to application assignment.
>> 
>> Make sense?
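The partitioning in this strawman can be sketched in a few lines of Python, assuming option (2), where name components get the whole 16-bit space. The concrete boundaries below (generic at 0, well-known types next, then the reserved block) are hypothetical. The strawman gives only the sizes, not the numeric assignments.

```python
# Hypothetical layout of a 16-bit name-component type space.
GENERIC = 0x0000                      # the single "generic name" type
WELL_KNOWN = range(0x0001, 0x0401)    # 1024 globally understood types
RESERVED = range(0x0401, 0x0801)      # 1024 types held for unanticipated uses
APPLICATION = range(0x0801, 0x10000)  # everything else, assigned by applications


def classify(component_type):
    """Return the registry region a 16-bit component type falls into."""
    if not 0 <= component_type <= 0xFFFF:
        raise ValueError("component type does not fit in 16 bits")
    if component_type == GENERIC:
        return "generic"
    if component_type in WELL_KNOWN:
        return "well-known"
    if component_type in RESERVED:
        return "reserved"
    return "application"
```

Under these assumed boundaries the four regions cover all 65,536 values exactly once, and about 63K types are left for application assignment.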
>> 
>> 
>>>> While I’m sympathetic to that view, there are three ways in which Moore’s law or hardware tricks will not save us from performance flaws in the design
>>> 
>>> we could design for performance,
>> That’s not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore’s Law will come to the rescue.
>> 
>>> but I think there will be a turning
>>> point when the slower design starts to become "fast enough”.
>> Perhaps, perhaps not. Relative performance is what matters, so things that don’t get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the “low-end” phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better.
>> 
>>> Do you
>>> think there will be some design of ndn that will *never* have
>>> performance improvement?
>>> 
>> I suspect LPM on data will always be slow (relative to the other functions).
>> I suspect exclusions will always be slow because they will require extra memory references.
>> 
>> However, I of course don’t claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references…
>> 
>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) <oran at cisco.com> wrote:
>>>> 
>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu <tailinchu at gmail.com> wrote:
>>>> 
>>>>> We should not look at a certain chip nowadays and want ndn to perform
>>>>> well on it. It should be the other way around: once an ndn app becomes
>>>>> popular, a better chip will be designed for ndn.
>>>>> 
>>>> While I’m sympathetic to that view, there are three ways in which Moore’s law or hardware tricks will not save us from performance flaws in the design:
>>>> a) clock rates are not getting (much) faster
>>>> b) memory accesses are getting (relatively) more expensive
>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
>>>> 
>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can’t be reliably used anywhere
>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around.
>>>> 
>>>> I’m afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right.
>>>> 
>>>>> I feel the discussion today and yesterday has been off-topic. Now I
>>>>> see that there are 3 approaches:
>>>>> 1. we should not define a naming convention at all
>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>> 3. marked component: introduce only one more type and add additional
>>>>> marker space
>>>>> 
>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>> 
>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.
>>>> 
>>>>> Also everybody thinks that the current utf8 marker naming convention
>>>>> needs to be revised.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe <felix at rabe.io> wrote:
>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the
>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN
>>>>>> experiments?
>>>>>> 
>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs
>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and
>>>>>> NDN will have to carry more information than URLs, as far as I see.
>>>>>> 
>>>>>> 
>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>> 
>>>>>> In fact, the index in separate TLV will be slower on some architectures,
>>>>>> like the ezChip NP4.  The NP4 can hold the first 96 frame bytes in memory,
>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks
>>>>>> (there can be at most 5 blocks available at any one time).  If you need to
>>>>>> switch between arrays, it would be very expensive.  If you have to read past
>>>>>> the name to get to the 2nd array, then read it, then backup to get to the
>>>>>> name, it will be pretty expensive too.
>>>>>> 
>>>>>> Marc
>>>>>> 
>>>>>> On Sep 18, 2014, at 2:02 PM, <Ignacio.Solis at parc.com>
>>>>>> <Ignacio.Solis at parc.com> wrote:
>>>>>> 
>>>>>> Does this make that much difference?
>>>>>> 
>>>>>> If you want to parse the first 5 components.  One way to do it is:
>>>>>> 
>>>>>> Read the index, find entry 5, then read in that many bytes from the start
>>>>>> offset of the beginning of the name.
>>>>>> OR
>>>>>> Start reading name, (find size + move ) 5 times.
>>>>>> 
>>>>>> How much speed are you getting from one to the other?  You seem to imply
>>>>>> that the first one is faster.  I don't think this is the case.
>>>>>> 
>>>>>> In the first one you'll probably have to get the cache line for the index,
>>>>>> then all the required cache lines for the first 5 components.  For the
>>>>>> second, you'll have to get all the cache lines for the first 5 components.
>>>>>> Given an assumption that a cache miss is way more expensive than
>>>>>> evaluating a number and computing an addition, you might find that the
>>>>>> performance of the index is actually slower than the performance of the
>>>>>> direct access.
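The two approaches compared above can be written side by side. This is a minimal Python sketch under assumed conditions: each name component is a 1-byte type, a 1-byte length and the value, and the index is a separate list of end offsets. The layout and the function names are hypothetical. The sketch shows only that both paths yield the same prefix. It says nothing about cache misses, which is the actual point of contention.

```python
def encode_name(components):
    """Return (name_bytes, end_offsets) for a list of (type, value) components."""
    name = bytearray()
    end_offsets = []
    for ctype, value in components:
        if len(value) > 255:
            raise ValueError("value does not fit in a 1-byte length")
        name += bytes([ctype, len(value)]) + value
        end_offsets.append(len(name))  # offset just past this component
    return bytes(name), end_offsets


def prefix_by_index(name, end_offsets, n):
    """Approach 1: read entry n of the index, then take that many bytes."""
    return name[:end_offsets[n - 1]]


def prefix_by_walking(name, n):
    """Approach 2: (find size + move) n times, touching only the name itself."""
    pos = 0
    for _ in range(n):
        length = name[pos + 1]  # byte after the 1-byte type
        pos += 2 + length       # skip type, length and value
    return name[:pos]
```

The index path reads one extra structure and then the name. The walking path reads only the name but does n small additions. Which one wins depends on where the index and the name sit in memory.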
>>>>>> 
>>>>>> Granted, there is a case where you don't access the name at all, for
>>>>>> example, if you just get the offsets and then send the offsets as
>>>>>> parameters to another processor/GPU/NPU/etc.  In this case you may see a
>>>>>> gain IF there are more cache line misses in reading the name than in
>>>>>> reading the index.   So, if the regular part of the name that you're
>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be
>>>>>> processed by a different processor, then you might see some performance
>>>>>> gain in using the index, but in all other circumstances I bet this is not
>>>>>> the case.   I may be wrong, haven't actually tested it.
>>>>>> 
>>>>>> This is all to say, I don't think we should be designing the protocol with
>>>>>> only one architecture in mind. (The architecture of sending the name to a
>>>>>> different processor than the index).
>>>>>> 
>>>>>> If you have numbers that show that the index is faster I would like to see
>>>>>> under what conditions and architectural assumptions.
>>>>>> 
>>>>>> Nacho
>>>>>> 
>>>>>> (I may have misinterpreted your description so feel free to correct me if
>>>>>> I'm wrong.)
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Nacho (Ignacio) Solis
>>>>>> Protocol Architect
>>>>>> Principal Scientist
>>>>>> Palo Alto Research Center (PARC)
>>>>>> +1(650)812-4458
>>>>>> Ignacio.Solis at parc.com
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" <massimo.gallo at alcatel-lucent.com>
>>>>>> wrote:
>>>>>> 
>>>>>> Indeed each component's offset must be encoded using a fixed amount of
>>>>>> bytes:
>>>>>> 
>>>>>> i.e.,
>>>>>> Type = Offsets
>>>>>> Length = 10 Bytes
>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>> 
>>>>>> You could also imagine having an "Offset_2byte" type if your name is too
>>>>>> long.
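A fixed-width Offsets TLV along these lines could look like the following minimal Python sketch. The type codes `0x20` and `0x21`, the 1-byte type and length header, and the function names are all hypothetical. Only the idea of fixed 1-byte offsets with a 2-byte fallback comes from the description above.

```python
import struct

TYPE_OFFSETS_1BYTE = 0x20  # hypothetical type code: every offset is 1 byte
TYPE_OFFSETS_2BYTE = 0x21  # hypothetical "Offset_2byte" code: every offset is 2 bytes


def encode_offsets_tlv(offsets):
    """Encode offsets as 1-byte type, 1-byte length, then fixed-width entries."""
    if any(not 0 <= o <= 0xFFFF for o in offsets):
        raise ValueError("offset does not fit in 2 bytes")
    wide = any(o > 0xFF for o in offsets)
    fmt = "!%dH" if wide else "!%dB"
    value = struct.pack(fmt % len(offsets), *offsets)
    if len(value) > 0xFF:
        raise ValueError("too many offsets for a 1-byte length")
    tlv_type = TYPE_OFFSETS_2BYTE if wide else TYPE_OFFSETS_1BYTE
    return bytes([tlv_type, len(value)]) + value


def nth_offset(tlv, n):
    """Return the n-th offset (1-based) with one indexed read, no scanning."""
    width = 2 if tlv[0] == TYPE_OFFSETS_2BYTE else 1
    start = 2 + (n - 1) * width
    if n < 1 or start + width > 2 + tlv[1]:
        raise IndexError("no such offset")
    return int.from_bytes(tlv[start:start + width], "big")
```

Because every entry has the same width, `nth_offset` computes the position of entry n directly. With a variable-length number encoding it would have to decode the n-1 entries before it.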
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>> 
>>>>>> if you do not need the entire hierarchical structure (suppose you only
>>>>>> want the first x components) you can directly have it using the
>>>>>> offsets. With the Nested TLV structure you have to iteratively parse
>>>>>> the first x-1 components. With the offset structure you can directly
>>>>>> access the first x components.
>>>>>> 
>>>>>> I don't get it. What you described only works if the "offset" is
>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1
>>>>>> offsets to get to the x offset.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo
>>>>>> <massimo.gallo at alcatel-lucent.com> wrote:
>>>>>> 
>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>> 
>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the
>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what
>>>>>> you _do_ prefer, though. it sounds like you're describing an entirely
>>>>>> different scheme where the info that describes the name-components is
>>>>>> ... someplace other than _in_ the name-components. is that correct?
>>>>>> when you say "field separator", what do you mean (since that's not a
>>>>>> "TL" from a TLV)?
>>>>>> 
>>>>>> Correct.
>>>>>> In particular, with our name encoding, a TLV indicates the name
>>>>>> hierarchy with offsets in the name and other TLV(s) indicate the
>>>>>> offset to use in order to retrieve special components.
>>>>>> As for the field separator, it is something like "/". Aliasing is
>>>>>> avoided as you do not rely on field separators to parse the name; you
>>>>>> use the "offset TLV" to do that.
>>>>>> 
>>>>>> So now, it may be an aesthetic question but:
>>>>>> 
>>>>>> if you do not need the entire hierarchical structure (suppose you only
>>>>>> want the first x components) you can directly have it using the
>>>>>> offsets. With the Nested TLV structure you have to iteratively parse
>>>>>> the first x-1 components. With the offset structure you can directly
>>>>>> access the first x components.
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> 
>>>>>> -- Mark
>>>>>> 
>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>> 
>>>>>> The why is simple:
>>>>>> 
>>>>>> You use a lot of "generic component type" and very few "specific
>>>>>> component type". You are imposing types for every component in order
>>>>>> to handle a few exceptions (segmentation, etc.). You create a rule
>>>>>> (specify the component's type) to handle exceptions!
>>>>>> 
>>>>>> I would prefer not to have typed components. Instead I would prefer
>>>>>> to have the name as a simple sequence of bytes with a field separator.
>>>>>> Then, outside the name, if you have some components that could be used
>>>>>> at network layer (e.g. a TLV field), you simply need something that
>>>>>> indicates which is the offset allowing you to retrieve the version,
>>>>>> segment, etc in the name...
>>>>>> 
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>> 
>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>> 
>>>>>> I think we agree on the small number of "component types".
>>>>>> However, if you have a small number of types, you will end up with
>>>>>> names containing many generic component types and few specific
>>>>>> component types. Because the component type specification is an
>>>>>> exception in the name, I would prefer something that specifies a
>>>>>> component's type only when needed (something like UTF8 conventions but
>>>>>> that applications MUST use).
>>>>>> 
>>>>>> so ... I can't quite follow that. the thread has had some
>>>>>> explanation about why the UTF8 requirement has problems (with
>>>>>> aliasing, e.g.) and there's been email trying to explain that
>>>>>> applications don't have to use types if they don't need to. your email
>>>>>> sounds like "I prefer the UTF8 convention", but it doesn't say why you
>>>>>> have that preference in the face of the points about the problems. can
>>>>>> you say why it is that you express a preference for the "convention"
>>>>>> with problems ?
>>>>>> 
>>>>>> Thanks,
>>>>>> Mark
>>>>>> 
>>>>>> .
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Ndn-interest mailing list
>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>>>>> 
>>>>> 
>>>> 
>>