[Ndn-interest] Discovery (was: any comments on naming convention?)

Burke, Jeff jburke at remap.ucla.edu
Fri Oct 3 19:23:08 PDT 2014


So far this makes sense to me.   My impression is that the current NDN
proposal is to use set reconciliation / SYNC style approaches for
'discover all' and that LPM+selectors provide an option for 'discover
greatest' or 'discover nearest'.  (An example of the latter is an interest
in a video seek operation that is designed to return the closest keyframe
to a specified time, which is then followed by exact match interests to
get subsequent segments for playout.)
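
(As a rough sketch, that 'discover nearest' seek might look like the
following. This is illustrative pseudocode against a made-up helper API,
not any particular NDN library:)

    # Hypothetical: express_interest() takes a dict describing a name plus
    # selectors and returns the best-matching data packet.
    def seek_keyframe(express_interest, video_prefix, target_time):
        interest = {
            "name": video_prefix + ["keyframe"],
            "child_selector": "rightmost",   # 'discover greatest'...
            "exclude_after": target_time,    # ...but not past the seek point
        }
        keyframe = express_interest(interest)  # closest keyframe <= target
        # Subsequent segments are then fetched with exact-match interests.
        return keyframe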

Another class of 'queries' that has come up in several applications is kNN:
given some target object name, return the k nearest neighbors to that
name.

Jeff


On 9/22/14, 1:33 PM, "Marc.Mosko at parc.com" <Marc.Mosko at parc.com> wrote:

>I received some feedback that people would like to know more about
>discovery and what we have done in ccnx 1.0, so I am starting a new
>thread for it.
>
>As I mentioned, we are not ready to publish our discovery protocols and I
>don’t want to go off halfway with something we are still working on, but
>I can talk about what I think discovery is and should do.  Discovery is a
>very important topic, and I don’t think research on it is anywhere near
>done.
>
>First, I think we need a clear definition of what services discovery
>offers.  I would say it should do, at least, “discover greatest” and
>“discover all”, with the obvious variations based on sort order and range,
>for some given prefix.  It should also support discovery scoped by one
>(or possibly more) publisher keys.
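>
>(As a minimal sketch, that service definition might look like the
>following abstract interface. This is purely illustrative; all names and
>parameters are made up, not a proposed API:)
>
>    from abc import ABC, abstractmethod
>
>    class Discovery(ABC):
>        @abstractmethod
>        def discover_greatest(self, prefix, order="canonical",
>                              scope_keys=None):
>            """Return the greatest name under prefix, per the given sort
>            order, optionally scoped to the given publisher keys."""
>
>        @abstractmethod
>        def discover_all(self, prefix, order="canonical", name_range=None,
>                         scope_keys=None):
>            """Iterate over every name under prefix (possibly millions of
>            results), efficiently if not optimally."""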
>
>What does “discover greatest” mean?  One could do something similar to
>ccnx 0.x and ndn, where it’s based on a canonical sort order of name
>components, and one could ask for the greatest name after a given prefix.
>Or, one could do something specific to a versioning protocol or creation
>time, etc.
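>
>(For the canonical-sort-order variant, the comparator is roughly the
>following sketch: shorter component sorts first, ties broken bytewise.
>This mirrors my reading of the ccnx 0.x/ndn ordering; check the specs for
>the authoritative rule.)
>
>    import functools
>
>    def canonical_cmp(a: bytes, b: bytes) -> int:
>        # Shorter name component sorts first; equal lengths compare as
>        # unsigned bytes, left to right.
>        if len(a) != len(b):
>            return -1 if len(a) < len(b) else 1
>        return (a > b) - (a < b)
>
>    def greatest(components):
>        # "Discover greatest" then reduces to a max() in this order.
>        return max(components, key=functools.cmp_to_key(canonical_cmp))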
>
>What does “discover all” mean?  First, I think we should recognize that
>some data sets might be very, very large: millions or billions of
>possible results (content objects).  The discovery protocol should be
>able to discover them all, efficiently (if not optimally).
>
>I also think there is no one discovery protocol.  Some applications may
>want strict ACID-style discovery (i.e. only see “completed” or “whole”
>results, such as where all segments are available), some might want
>eventually consistent discovery, and some might take best-effort
>discovery.  Some discovery protocols may require authentication and some
>may be open to the world.
>
>I think the discovery process should be separate from the retrieval
>process.  If one is publishing, say, 64KB objects or even 4GB objects (or
>even larger objects!), one does not want to have to fetch each object to
>discover it.
>
>All this leads me to think we need discovery protocols that allow us to
>talk about content without actually transferring the content.
>
>The discovery protocol should be able to handle multiple objects with the
>same name, i.e. two publishers overwrite each other or even a single
>publisher publishes two objects with the same name.
>
>Discovery is also closely related to how routing and forwarding strategy
>work.  Does an Interest flood all replicas?  Does it only go anycast
>style?  How large can an Interest be, and what’s the performance tradeoff
>for large interests (i.e. dropping fragments)?  Is there an in-network
>control protocol to NAK or are all NAKs end-to-end?  How does a discovery
>process terminate (i.e. when do you know you’re done)?
>
>Marc
>
>
>
>On Sep 21, 2014, at 9:23 AM, Mosko, Marc <Marc.Mosko at parc.com>
><Marc.Mosko at parc.com> wrote:
>
>> No matter what the expressiveness of the predicates, if the forwarder
>>can send interests different ways, you don't have a consistent underlying
>>set to talk about, so you would always need non-range exclusions to
>>discover every version.
>> 
>> Range exclusions only work, I believe, if you get an authoritative
>>answer.   If different content pieces are scattered among different
>>caches, I don't see how range exclusions would work to discover every
>>version.
>> 
>> I'm sorry to be pointing out problems without offering solutions but
>>we're not ready to publish our discovery protocols.
>> 
>> Sent from my telephone
>> 
>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" <tailinchu at gmail.com> wrote:
>>> 
>>> I see. Can you briefly describe how the ccnx discovery protocol solves
>>> all the problems that you mentioned (not just exclude)? A doc would be
>>> better.
>>> 
>>> My unserious conjecture ( :) ): exclude is equal to [not]. I soon
>>> expect [and] and [or], so that boolean algebra is fully supported.
>>> Regular languages or context-free languages might become part of
>>> selectors too.
>>> 
>>>> On Sat, Sep 20, 2014 at 11:25 PM,  <Marc.Mosko at parc.com> wrote:
>>>> That will get you one reading; then you need to exclude it and ask
>>>>again.
>>>> 
>>>> Sent from my telephone
>>>> 
>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" <tailinchu at gmail.com> wrote:
>>>> 
>>>>>> Yes, my point was that if you cannot talk about a consistent set
>>>>>>with a particular cache, then you need to always use individual
>>>>>>excludes, not range excludes, if you want to discover all the versions
>>>>>>of an object.
>>>>> 
>>>>> I am very confused. For your example, if I want to get all today's
>>>>> sensor data, I just do (Any..Last second of last day)(First second of
>>>>> tomorrow..Any). That's 18 bytes.
>>>>> 
>>>>> 
>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>> 
>>>>>> On Sat, Sep 20, 2014 at 10:55 PM,  <Marc.Mosko at parc.com> wrote:
>>>>>> 
>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu <tailinchu at gmail.com>
>>>>>>wrote:
>>>>>> 
>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily
>>>>>>>>could miss content objects you want to discover unless you avoid
>>>>>>>>all range exclusions and only exclude explicit versions.
>>>>>>> 
>>>>>>> Could you explain why the missing-content-object situation happens?
>>>>>>>Also, a range exclusion is just a shorter notation for many explicit
>>>>>>>excludes; converting from explicit excludes to a ranged exclude is
>>>>>>>always possible.
>>>>>> 
>>>>>> Yes, my point was that if you cannot talk about a consistent set
>>>>>>with a particular cache, then you need to always use individual
>>>>>>excludes, not range excludes, if you want to discover all the versions
>>>>>>of an object.  For something like a sensor reading that is updated,
>>>>>>say, once per second, you will have 86,400 of them per day.  If each
>>>>>>exclusion is a timestamp (say 8 bytes), that’s 691,200 bytes of
>>>>>>exclusions (plus encoding overhead) per day.
>>>>>> 
>>>>>> Yes, maybe using a more deterministic version number than a
>>>>>>timestamp makes sense here, but it’s just an example of needing a lot
>>>>>>of exclusions.
>>>>>> 
>>>>>>> 
>>>>>>>> You exclude through 100 then issue a new interest.  This goes to
>>>>>>>>cache B
>>>>>>> 
>>>>>>> I feel this case is invalid because cache A will also get the
>>>>>>> interest, and cache A will return v101 if it exists. Like you said,
>>>>>>> if this goes to cache B only, it means that cache A died. How do you
>>>>>>> know that v101 even exists?
>>>>>> 
>>>>>> I guess this depends on what the forwarding strategy is.  If the
>>>>>>forwarder will always send each interest to all replicas, then yes,
>>>>>>modulo packet loss, you would discover v101 on cache A.  If the
>>>>>>forwarder is just doing “best path” and can round-robin between
>>>>>>cache A and cache B, then your application could miss v101.
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> c,d In general I agree that LPM performance is related to the
>>>>>>> number of components. In my own thread-safe LPM implementation, I
>>>>>>> used only one RWMutex for the whole tree. I don't know whether
>>>>>>> adding a lock for every node will be faster or not because of lock
>>>>>>> overhead.
>>>>>>> 
>>>>>>> However, we should compare (exact match + discovery protocol) vs
>>>>>>> (ndn lpm). Comparing the performance of exact match to lpm is
>>>>>>> unfair.
>>>>>> 
>>>>>> Yes, we should compare them.  And we need to publish the ccnx 1.0
>>>>>>specs for doing the exact match discovery.  So, as I said, I’m not
>>>>>>ready to claim it’s better yet because we have not done that.
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM,  <Marc.Mosko at parc.com> wrote:
>>>>>>>> I would point out that using LPM for content object to Interest
>>>>>>>>matching to do discovery has its own set of problems.  Discovery
>>>>>>>>involves more than just “latest version” discovery too.
>>>>>>>> 
>>>>>>>> This is probably getting off-topic from the original post about
>>>>>>>>naming conventions.
>>>>>>>> 
>>>>>>>> a.  If Interests can be forwarded in multiple directions and two
>>>>>>>>different caches are responding, the exclusion set you build up
>>>>>>>>talking with cache A will be invalid for cache B.  If you talk
>>>>>>>>sometimes to A and sometimes to B, you very easily could miss
>>>>>>>>content objects you want to discover unless you avoid all range
>>>>>>>>exclusions and only exclude explicit versions.  That will lead to
>>>>>>>>very large interest packets.  In ccnx 1.0, we believe that an
>>>>>>>>explicit discovery protocol that allows conversations about
>>>>>>>>consistent sets is better.
>>>>>>>> 
>>>>>>>> b. Yes, if you just want “latest version” discovery, that
>>>>>>>>should be transitive between caches, but imagine this.  You send
>>>>>>>>Interest #1 to cache A, which returns version 100.  You exclude
>>>>>>>>through 100 then issue a new interest.  This goes to cache B, which
>>>>>>>>only has version 99, so the interest times out or is NACK’d.  So
>>>>>>>>you think you have it!  But cache A already has version 101; you
>>>>>>>>just don’t know.  If you cannot have a conversation around
>>>>>>>>consistent sets, it seems like even doing latest version discovery
>>>>>>>>is difficult with selector based discovery.  From what I saw in
>>>>>>>>ccnx 0.x, one ended up getting an Interest all the way to the
>>>>>>>>authoritative source because you can never believe an intermediate
>>>>>>>>cache that there’s not something more recent.
>>>>>>>> 
>>>>>>>> I’m sure you’ve walked through cases (a) and (b) in ndn; I’d be
>>>>>>>>interested in seeing your analysis.  Case (a) is that a node can
>>>>>>>>correctly discover every version of a name prefix, and (b) is that
>>>>>>>>a node can correctly discover the latest version.  We have not
>>>>>>>>formally compared (or yet published) our discovery protocols (we
>>>>>>>>have three: 2 for content, 1 for device) against selector based
>>>>>>>>discovery, so I cannot yet claim they are better, but they do not
>>>>>>>>have the non-determinism sketched above.
>>>>>>>> 
>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>>>>>>must do in the PIT to match a content object.  If you have a name
>>>>>>>>tree or a threaded hash table, those don’t all need to be hash
>>>>>>>>lookups, but you need to walk up the name tree for every prefix of
>>>>>>>>the content object name and evaluate the selector predicate.
>>>>>>>>Content Based Networking (CBN) had some methods to create
>>>>>>>>data structures based on predicates; maybe those would be better.
>>>>>>>>But in any case, you will potentially need to retrieve many PIT
>>>>>>>>entries if there is Interest traffic for many prefixes of a root.
>>>>>>>>Even on an Intel system, you’ll likely miss cache lines, so you’ll
>>>>>>>>have a lot of NUMA accesses for each one.  In CCNx 1.0, even a naive
>>>>>>>>implementation only requires at most 3 lookups (one by name, one
>>>>>>>>by name + keyid, one by name + content object hash), and one can
>>>>>>>>do other things to optimize lookup for an extra write.
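>>>>>>>> 
>>>>>>>> (A minimal sketch of those three exact-match lookups, assuming a
>>>>>>>> flat hash table keyed on tuples; purely illustrative, not the
>>>>>>>> actual CCNx 1.0 data structures:)
>>>>>>>> 
>>>>>>>>     def pit_match(pit: dict, name, keyid, obj_hash):
>>>>>>>>         # At most 3 exact-match probes, regardless of name depth.
>>>>>>>>         for key in ((name,),            # 1. by name alone
>>>>>>>>                     (name, keyid),      # 2. by name + keyid
>>>>>>>>                     (name, obj_hash)):  # 3. by name + object hash
>>>>>>>>             entry = pit.get(key)
>>>>>>>>             if entry is not None:
>>>>>>>>                 return entry
>>>>>>>>         return None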
>>>>>>>> 
>>>>>>>> d. In (c) above, if you have a threaded name tree or are just
>>>>>>>>walking parent pointers, I suspect you’ll need locking of the
>>>>>>>>ancestors in a multi-threaded system (“threaded” here meaning LWP)
>>>>>>>>and that will be expensive.  It would be interesting to see what a
>>>>>>>>cache consistent multi-threaded name tree looks like.
>>>>>>>> 
>>>>>>>> Marc
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu <tailinchu at gmail.com>
>>>>>>>>>wrote:
>>>>>>>>> 
>>>>>>>>> I had thought about these questions, but I want to know your idea
>>>>>>>>> besides typed components:
>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>>>>>>> things?
>>>>>>>>> 2. Will removing selectors improve performance? How do we use
>>>>>>>>> another, faster technique to replace selectors?
>>>>>>>>> 3. Fixed byte length and type. I now agree more that the type can
>>>>>>>>> be a fixed byte, but 2 bytes for length might not be enough for
>>>>>>>>> the future.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran)
>>>>>>>>>><oran at cisco.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu <tailinchu at gmail.com>
>>>>>>>>>>wrote:
>>>>>>>>>> 
>>>>>>>>>>>> I know how to make #2 flexible enough to do the things I can
>>>>>>>>>>>>envision we need to do, and with a few simple conventions on
>>>>>>>>>>>>how the registry of types is managed.
>>>>>>>>>>> 
>>>>>>>>>>> Could you share it with us?
>>>>>>>>>> Sure. Here’s a strawman.
>>>>>>>>>> 
>>>>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>>>> 
>>>>>>>>>> The type space is currently shared with the types used for the
>>>>>>>>>>entire protocol, which gives us two options:
>>>>>>>>>> (1) we reserve a range for name component types. Given the
>>>>>>>>>>likelihood there will be at least as much and probably more need
>>>>>>>>>>for component types than protocol extensions, we could reserve
>>>>>>>>>>1/2 of the type space, giving us 32K types for name components.
>>>>>>>>>> (2) since there is no parsing ambiguity between name components
>>>>>>>>>>and other fields of the protocol (since they are sub-types of the
>>>>>>>>>>name type) we could reuse numbers and thereby have an entire 65K
>>>>>>>>>>name component types.
>>>>>>>>>> 
>>>>>>>>>> We divide the type space into regions, and manage it with a
>>>>>>>>>>registry. If we ever get to the point of creating an IETF
>>>>>>>>>>standard, IANA has 25 years of experience running registries and
>>>>>>>>>>there are well-understood rule sets for different kinds of
>>>>>>>>>>registries (open, requires a written spec, requires standards
>>>>>>>>>>approval).
>>>>>>>>>> 
>>>>>>>>>> - We allocate one “default” name component type for “generic
>>>>>>>>>>name”, which would be used on name prefixes and other common
>>>>>>>>>>cases where there are no special semantics on the name component.
>>>>>>>>>> - We allocate a range of name component types, say 1024, to
>>>>>>>>>>globally understood types that are part of the base or extension
>>>>>>>>>>NDN specifications (e.g. chunk#, version#, etc.).
>>>>>>>>>> - We reserve some portion of the space for unanticipated uses
>>>>>>>>>>(say another 1024 types).
>>>>>>>>>> - We give the rest of the space to application assignment.
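>>>>>>>>>> 
>>>>>>>>>> (In code, that partition might look like the sketch below. The
>>>>>>>>>> exact boundary values are made up; only the proportions follow
>>>>>>>>>> the strawman:)
>>>>>>>>>> 
>>>>>>>>>>     GENERIC_NAME = 0x0001                  # the one "default" type
>>>>>>>>>>     GLOBAL_TYPES = range(0x0002, 0x0402)   # ~1024 base/extension types
>>>>>>>>>>     RESERVED     = range(0x0402, 0x0802)   # ~1024 for unanticipated uses
>>>>>>>>>>     APP_ASSIGNED = range(0x0802, 0x10000)  # the rest: applications
>>>>>>>>>> 
>>>>>>>>>>     def classify(t: int) -> str:
>>>>>>>>>>         if t == GENERIC_NAME: return "generic"
>>>>>>>>>>         if t in GLOBAL_TYPES: return "global"
>>>>>>>>>>         if t in RESERVED:     return "reserved"
>>>>>>>>>>         return "application"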
>>>>>>>>>> 
>>>>>>>>>> Make sense?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> While I’m sympathetic to that view, there are three ways in
>>>>>>>>>>>>which Moore’s law or hardware tricks will not save us from
>>>>>>>>>>>>performance flaws in the design
>>>>>>>>>>> 
>>>>>>>>>>> we could design for performance,
>>>>>>>>>> That’s not what people are advocating. We are advocating that
>>>>>>>>>>we *not* design for known bad performance and hope serendipity
>>>>>>>>>>or Moore’s Law will come to the rescue.
>>>>>>>>>> 
>>>>>>>>>>> but I think there will be a turning
>>>>>>>>>>> point when the slower design starts to become "fast enough”.
>>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so
>>>>>>>>>>things that don’t get faster while others do tend to get dropped
>>>>>>>>>>or not used, because they impose a performance penalty relative
>>>>>>>>>>to the things that go faster. There is also the “low-end”
>>>>>>>>>>phenomenon where improvements in technology get applied to
>>>>>>>>>>lowering cost rather than improving performance. For those
>>>>>>>>>>environments, bad performance just never gets better.
>>>>>>>>>> 
>>>>>>>>>>> Do you
>>>>>>>>>>> think there will be some design of ndn that will *never* have
>>>>>>>>>>> performance improvement?
>>>>>>>>>> I suspect LPM on data will always be slow (relative to the
>>>>>>>>>>other functions).
>>>>>>>>>> I suspect exclusions will always be slow because they will
>>>>>>>>>>require extra memory references.
>>>>>>>>>> 
>>>>>>>>>> However, I of course don’t claim clairvoyance, so this is just
>>>>>>>>>>speculation based on 35+ years of seeing performance improve by
>>>>>>>>>>4 orders of magnitude and still having to worry about counting
>>>>>>>>>>cycles and memory references…
>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran)
>>>>>>>>>>>>><oran at cisco.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu
>>>>>>>>>>>>><tailinchu at gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to
>>>>>>>>>>>>> perform well on it. It should be the other way around: once an
>>>>>>>>>>>>> ndn app becomes popular, a better chip will be designed for ndn.
>>>>>>>>>>>> While I’m sympathetic to that view, there are three ways in
>>>>>>>>>>>>which Moore’s law or hardware tricks will not save us from
>>>>>>>>>>>>performance flaws in the design:
>>>>>>>>>>>> a) clock rates are not getting (much) faster
>>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>>>>>>>> c) data structures that require locks to manipulate
>>>>>>>>>>>>successfully will be relatively more expensive, even with
>>>>>>>>>>>>near-zero lock contention.
>>>>>>>>>>>> 
>>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in
>>>>>>>>>>>>its design. We just forgot those because the design elements
>>>>>>>>>>>>that depended on those mistakes have fallen into disuse. The
>>>>>>>>>>>>poster children for this are:
>>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow
>>>>>>>>>>>>on modern forwarding hardware, so they can’t be reliably used
>>>>>>>>>>>>anywhere.
>>>>>>>>>>>> 2. The UDP checksum, which was a bad design when it was
>>>>>>>>>>>>specified and is now a giant PITA that still causes major pain
>>>>>>>>>>>>to work around.
>>>>>>>>>>>> 
>>>>>>>>>>>> I’m afraid students today are being taught that the designers
>>>>>>>>>>>>of IP were flawless, as opposed to very good scientists and
>>>>>>>>>>>>engineers who got most of it right.
>>>>>>>>>>>> 
>>>>>>>>>>>>> I feel the discussion today and yesterday has been
>>>>>>>>>>>>>off-topic. Now I
>>>>>>>>>>>>> see that there are 3 approaches:
>>>>>>>>>>>>> 1. we should not define a naming convention at all
>>>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of
>>>>>>>>>>>>>types
>>>>>>>>>>>>> 3. marked component: introduce only one more type and add
>>>>>>>>>>>>>additional
>>>>>>>>>>>>> marker space
>>>>>>>>>>>> I know how to make #2 flexible enough to do the things I can
>>>>>>>>>>>>envision we need to do, and with a few simple conventions on
>>>>>>>>>>>>how the registry of types is managed.
>>>>>>>>>>>> 
>>>>>>>>>>>> It is just as powerful in practice as either throwing up our
>>>>>>>>>>>>hands and letting applications design their own mutually
>>>>>>>>>>>>incompatible schemes or trying to make naming conventions with
>>>>>>>>>>>>markers in a way that is fast to generate/parse and also
>>>>>>>>>>>>resilient against aliasing.
>>>>>>>>>>>> 
>>>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming
>>>>>>>>>>>>>convention
>>>>>>>>>>>>> needs to be revised.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe <felix at rabe.io>
>>>>>>>>>>>>>>wrote:
>>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names
>>>>>>>>>>>>>>to fit in (the
>>>>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in
>>>>>>>>>>>>>>current NDN
>>>>>>>>>>>>>> experiments?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I guess wide deployment could make for even longer names.
>>>>>>>>>>>>>>Related: Many URLs
>>>>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column
>>>>>>>>>>>>>>text lines, and
>>>>>>>>>>>>>> NDN will have to carry more information than URLs, as far
>>>>>>>>>>>>>>as I see.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In fact, the index in separate TLV will be slower on some
>>>>>>>>>>>>>>architectures,
>>>>>>>>>>>>>>like the ezChip NP4.  The NP4 can hold the first 96 frame
>>>>>>>>>>>>>>bytes in memory,
>>>>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent
>>>>>>>>>>>>>>32-byte blocks
>>>>>>>>>>>>>> (there can be at most 5 blocks available at any one time).
>>>>>>>>>>>>>>If you need to
>>>>>>>>>>>>>> switch between arrays, it would be very expensive.  If you
>>>>>>>>>>>>>>have to read past
>>>>>>>>>>>>>> the name to get to the 2nd array, then read it, then back
>>>>>>>>>>>>>> up to get to the name, it will be pretty expensive too.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, <Ignacio.Solis at parc.com>
>>>>>>>>>>>>>> <Ignacio.Solis at parc.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Does this make that much difference?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Suppose you want to parse the first 5 components. One way
>>>>>>>>>>>>>> to do it is:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes
>>>>>>>>>>>>>> from the start offset of the beginning of the name.
>>>>>>>>>>>>>> OR
>>>>>>>>>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>>>>>>>>> 
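>>>>>>>>>>>>>> (The two alternatives, sketched in code; fixed 2-byte T and
>>>>>>>>>>>>>> L fields are assumed purely for illustration:)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     import struct
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     def component_via_index(buf, offsets, k):
>>>>>>>>>>>>>>         # Method 1: jump straight to component k's offset.
>>>>>>>>>>>>>>         start = offsets[k]
>>>>>>>>>>>>>>         _, length = struct.unpack_from("!HH", buf, start)
>>>>>>>>>>>>>>         return buf[start + 4 : start + 4 + length]
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     def component_sequential(buf, k):
>>>>>>>>>>>>>>         # Method 2: (find size + move) k times from the start.
>>>>>>>>>>>>>>         pos = 0
>>>>>>>>>>>>>>         for _ in range(k):
>>>>>>>>>>>>>>             _, length = struct.unpack_from("!HH", buf, pos)
>>>>>>>>>>>>>>             pos += 4 + length
>>>>>>>>>>>>>>         _, length = struct.unpack_from("!HH", buf, pos)
>>>>>>>>>>>>>>         return buf[pos + 4 : pos + 4 + length]
>>>>>>>>>>>>>> 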
>>>>>>>>>>>>>> How much speed are you getting from one to the other?  You
>>>>>>>>>>>>>> seem to imply that the first one is faster.  I don’t think
>>>>>>>>>>>>>> this is the case.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In the first one you’ll probably have to get the cache line
>>>>>>>>>>>>>> for the index, then all the required cache lines for the
>>>>>>>>>>>>>> first 5 components. For the second, you’ll have to get all
>>>>>>>>>>>>>> the cache lines for the first 5 components. Given an
>>>>>>>>>>>>>> assumption that a cache miss is way more expensive than
>>>>>>>>>>>>>> evaluating a number and computing an addition, you might
>>>>>>>>>>>>>> find that the performance of the index is actually slower
>>>>>>>>>>>>>> than the performance of the direct access.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Granted, there is a case where you don’t access the name at
>>>>>>>>>>>>>> all, for example, if you just get the offsets and then send
>>>>>>>>>>>>>> the offsets as parameters to another processor/GPU/NPU/etc.
>>>>>>>>>>>>>> In this case you may see a gain IF there are more cache line
>>>>>>>>>>>>>> misses in reading the name than in reading the index.   So,
>>>>>>>>>>>>>> if the regular part of the name that you’re parsing is bigger
>>>>>>>>>>>>>> than the cache line (64 bytes?) and the name is to be
>>>>>>>>>>>>>> processed by a different processor, then you might see some
>>>>>>>>>>>>>> performance gain in using the index, but in all other
>>>>>>>>>>>>>> circumstances I bet this is not the case.   I may be wrong;
>>>>>>>>>>>>>> I haven’t actually tested it.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This is all to say, I don’t think we should be designing
>>>>>>>>>>>>>> the protocol with only one architecture in mind. (The
>>>>>>>>>>>>>> architecture of sending the name to a different processor
>>>>>>>>>>>>>> than the index.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If you have numbers that show that the index is faster, I
>>>>>>>>>>>>>> would like to see under what conditions and architectural
>>>>>>>>>>>>>> assumptions.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Nacho
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> (I may have misinterpreted your description, so feel free to
>>>>>>>>>>>>>> correct me if I’m wrong.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Nacho (Ignacio) Solis
>>>>>>>>>>>>>> Protocol Architect
>>>>>>>>>>>>>> Principal Scientist
>>>>>>>>>>>>>> Palo Alto Research Center (PARC)
>>>>>>>>>>>>>> +1(650)812-4458
>>>>>>>>>>>>>> Ignacio.Solis at parc.com
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo"
>>>>>>>>>>>>>><massimo.gallo at alcatel-lucent.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Indeed, each component's offset must be encoded using a
>>>>>>>>>>>>>> fixed number of bytes:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> i.e.,
>>>>>>>>>>>>>> Type = Offsets
>>>>>>>>>>>>>> Length = 10 Bytes
>>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your
>>>>>>>>>>>>>> name is too long.
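>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> (A sketch of that encoding, assuming 2-byte T and L fields;
>>>>>>>>>>>>>> the type code is made up:)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     import struct
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     TYPE_OFFSETS = 0x00F0  # illustrative code for "Offsets"
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     def encode_offsets(offsets):
>>>>>>>>>>>>>>         # One fixed byte per component offset, as described.
>>>>>>>>>>>>>>         value = bytes(offsets)  # raises if an offset > 255
>>>>>>>>>>>>>>         return struct.pack("!HH", TYPE_OFFSETS, len(value)) + value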
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> if you do not need the entire hierarchical structure
>>>>>>>>>>>>>> (suppose you only want the first x components) you can
>>>>>>>>>>>>>> directly have it using the offsets. With the nested TLV
>>>>>>>>>>>>>> structure you have to iteratively parse the first x-1
>>>>>>>>>>>>>> components. With the offset structure you can directly
>>>>>>>>>>>>>> access the first x components.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don't get it. What you described only works if the
>>>>>>>>>>>>>> "offset" is encoded in fixed bytes. With varNum, you will
>>>>>>>>>>>>>> still need to parse x-1 offsets to get to the x-th offset.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo
>>>>>>>>>>>>>> <massimo.gallo at alcatel-lucent.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I
>>>>>>>>>>>>>> like the existing NDN UTF8 'convention'." I'm still not sure
>>>>>>>>>>>>>> I understand what you _do_ prefer, though. it sounds like
>>>>>>>>>>>>>> you're describing an entirely different scheme where the
>>>>>>>>>>>>>> info that describes the name-components is ... someplace
>>>>>>>>>>>>>> other than _in_ the name-components. is that correct? when
>>>>>>>>>>>>>> you say "field separator", what do you mean (since that's
>>>>>>>>>>>>>> not a "TL" from a TLV)?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Correct.
>>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the
>>>>>>>>>>>>>> name hierarchy with offsets in the name, and other TLV(s)
>>>>>>>>>>>>>> indicate the offset to use in order to retrieve special
>>>>>>>>>>>>>> components.
>>>>>>>>>>>>>> As for the field separator, it is something like "/".
>>>>>>>>>>>>>> Aliasing is avoided as you do not rely on field separators
>>>>>>>>>>>>>> to parse the name; you use the "offset TLV" to do that.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So now, it may be an aesthetic question, but:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> if you do not need the entire hierarchical structure
>>>>>>>>>>>>>> (suppose you only want the first x components) you can
>>>>>>>>>>>>>> directly have it using the offsets. With the nested TLV
>>>>>>>>>>>>>> structure you have to iteratively parse the first x-1
>>>>>>>>>>>>>> components. With the offset structure you can directly
>>>>>>>>>>>>>> access the first x components.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -- Mark
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The why is simple:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You use a lot of "generic component type" and very few
>>>>>>>>>>>>>> "specific component type". You are imposing types for every
>>>>>>>>>>>>>> component in order to handle a few exceptions (segmentation,
>>>>>>>>>>>>>> etc.). You create a rule (specify the component's type) to
>>>>>>>>>>>>>> handle exceptions!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I would prefer not to have typed components. Instead, I
>>>>>>>>>>>>>> would prefer to have the name as a simple sequence of bytes
>>>>>>>>>>>>>> with a field separator. Then, outside the name, if you have
>>>>>>>>>>>>>> some components that could be used at the network layer
>>>>>>>>>>>>>> (e.g. a TLV field), you simply need something that indicates
>>>>>>>>>>>>>> the offset allowing you to retrieve the version, segment,
>>>>>>>>>>>>>> etc. in the name...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think we agree on the small number of "component types".
>>>>>>>>>>>>>> However, if you have a small number of types, you will end
>>>>>>>>>>>>>> up with names containing many generic component types and
>>>>>>>>>>>>>> few specific component types. Because the component type
>>>>>>>>>>>>>> specification is an exception in the name, I would prefer
>>>>>>>>>>>>>> something that specifies a component's type only when needed
>>>>>>>>>>>>>> (something like the UTF8 conventions, but that applications
>>>>>>>>>>>>>> MUST use).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some
>>>>>>>>>>>>>> explanation about why the UTF8 requirement has problems
>>>>>>>>>>>>>> (with aliasing, e.g.) and there's been email trying to
>>>>>>>>>>>>>> explain that applications don't have to use types if they
>>>>>>>>>>>>>> don't need to. your email sounds like "I prefer the UTF8
>>>>>>>>>>>>>> convention", but it doesn't say why you have that preference
>>>>>>>>>>>>>> in the face of the points about the problems. can you say
>>>>>>>>>>>>>> why it is that you express a preference for the "convention"
>>>>>>>>>>>>>> with problems?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>> 
>
>_______________________________________________
>Ndn-interest mailing list
>Ndn-interest at lists.cs.ucla.edu
>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest





More information about the Ndn-interest mailing list