[Ndn-interest] any comments on naming convention?

Marc.Mosko at parc.com Marc.Mosko at parc.com
Wed Sep 24 00:37:14 PDT 2014


Ok, let’s take that example and run with it a bit.  I’ll walk through a “discover all” example.  This example leads me to why I say discovery should be separate from data retrieval.  I don’t claim that we have a final solution to this problem; I think that in a distributed peer-to-peer environment solving it is difficult.  If you have a counterexample showing how this discovery could progress using only the information known a priori by the requester, I would be interested in seeing that example worked out.  Please do correct me if you think this is wrong.

You have mails that were originally numbered 0 - 10000, sequentially by the server.  

You travel between several places and access different emails from different places.  This populates caches.  Let’s say 0,3,6,9,… are on cache A; 1,4,7,10,… are on cache B; and 2,5,8,11,… are on cache C.  Also, you have deleted 500 random emails, so there are only 9,500 emails actually out there.

You set up a new computer and now want to download all your emails.  The new computer is on the path of cache C, then B, then A, then the authoritative source server.  The new email program has no initial state.  The email program only knows that the email number is an integer that starts at 0.  It issues an interest for /mail/inbox and asks for the left-most child because it wants to populate in order.  It gets a response from cache C with mail 2.

Now, what does the email program do?  It cannot exclude the range 0..2 because that would possibly miss 0 and 1.  So, all it can do is exclude the exact number “2” and ask again.  It then gets cache C again and it responds with “5”.  There are about 3000 emails on cache C, and if they all take 4 bytes (for the exclude component plus its coding overhead), then that’s 12KB of exclusions to finally exhaust cache C.
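To make the loop concrete, here is a rough Python sketch of the exclude-and-reask behavior described above (send_interest and its leftmost-child selector are hypothetical stand-ins, not a real NDN/CCNx API):

    # Naive "discover all" by exclusion: each reply adds exactly one more
    # component to the exclude list of the next interest.
    # send_interest() is a hypothetical helper, not an actual API.
    def discover_all(prefix, send_interest):
        discovered = set()
        while True:
            reply = send_interest(prefix,
                                  exclude=sorted(discovered),   # grows by 1 each round
                                  child_selector="leftmost")
            if reply is None:              # timeout or NACK: nothing new reachable
                return discovered
            discovered.add(reply.number)   # e.g. 2, then 5, then 8, ... from cache C

Each round trip recovers one email number, and the exclude list carried by the next interest grows by one entry.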

If we want Interests to avoid fragmentation, we can fit about 1200 bytes of exclusions, or 300 exclusion components, per Interest.  With roughly 3,000 emails on cache C, that means we need about 10 interest messages to cover them all.  Each interest would be something like “exclude 2,5,8,11,…, >300”, then the next would be “exclude <300, 302, 305, 308, …, >600”, etc.

Those interests that exclude everything at cache C would then hit, say, cache B and start getting results 1, 4, 7, ….  This means an Interest like “exclude 2,5,8,11,…, >300” would then get back number 1.  That means the next request actually has to split that one interest’s exclude into two (because the interest was already at maximum size), so you now issue two interests where one is “exclude 1, 2, 5, 8, …, >210” and the other is “exclude <210, 212, 215, …, >300”.
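The splitting step can be sketched roughly like this (again, just an illustration; the CCNx 0.8 Java logic mentioned below is more involved):

    # Hypothetical split of one full interest's exclusion when a newly
    # discovered number (e.g. 1 from cache B) lands inside its range.
    # An "interest" here is represented as (low_bound, high_bound, explicit_excludes).
    def split_exclusion(lo, hi, excludes, new_number, max_components=300):
        excludes = sorted(set(excludes) | {new_number})
        if len(excludes) <= max_components:
            return [(lo, hi, excludes)]               # still fits in one interest
        mid = excludes[len(excludes) // 2]            # split roughly in half
        left  = (lo, mid, [x for x in excludes if x <= mid])
        right = (mid, hi, [x for x in excludes if x >= mid])
        return [left, right]                          # mid is excluded by both halves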

If you look in the CCNx 0.8 Java code, there should be a class that does these Interest-based discoveries and does the Interest splitting based on the currently known range of discovered content.  I don’t have the specific reference right now, but I can send a link if you are interested in seeing that.  The Java class keeps state of what has been discovered so far, so it could re-start later if interrupted.

So all those interests would now be getting results from cache B.  You would then start to split all those ranges to accommodate the numbers coming back from B.  Eventually, you’ll have at least 10 Interest messages outstanding that would be excluding all the 9,500 messages that are in caches A, B, and C.  Some of those interest messages might actually reach an authoritative server, which might respond too.  It would likely be more than 10 interests due to the algorithm used to split full interests, which is probably not optimal because it does not know a priori exactly where the breaks should be.

Once you have exhausted caches A, B, and C, the interest messages would reach the authoritative source (if it’s online), and it would be issuing NACKs (I assume) for interests that have excluded all non-deleted emails.

In any case, it takes, at best, 9,500 round trips to “discover” all 9,500 emails.  It also required roughly Sum_{i=1..10000} 4*i = 200,020,000 bytes of Interest exclusions.  Note that it’s an arithmetic sum of bytes of exclusion, because with each Interest the size of the exclusions increases by 4.  There was an NDN paper about light bulb discovery (or something like that) that noted this same problem and proposed some workaround, but I don’t remember what they proposed.
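As a quick sanity check on that arithmetic (one new 4-byte exclusion per round trip, as assumed above):

    n = 10000
    exclusion_bytes = sum(4 * i for i in range(1, n + 1))   # arithmetic series
    print(exclusion_bytes)   # 200,020,000 bytes, i.e. roughly 200 MB of exclusions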

Yes, you could possibly pipeline it, but what would you do?  In this example, knowing that the emails are numbered 0 - 10000 (minus some random ones) would allow you, if you knew that a priori, to issue say 10 interests in parallel that ask for different ranges.  But 2 years from now your undeleted emails might range from 100,000 - 150,000.  The point is that a discovery protocol does not know, a priori, what is to be discovered.  It might start learning some things as it goes on.

If you could have retrieved just a table of contents from each cache, where each “row” is say 64 bytes (i.e. the name continuation plus hash value), you would need to retrieve 3300 * 64 = 211 KB from each cache (total about 640 KB) to list all the emails.  That would take 640 KB / 1200 = 534 interest messages of say 64 bytes each = 34 KB to discover all 9,500 emails, plus another set to fetch the header rows.  That’s, say, 68 KB of interest traffic compared to 200 MB.  Now, I’ve not said how to list these tables of contents, so an actual protocol might have a higher communication cost, but even if it were 10x worse that would still be an attractive tradeoff.
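The corresponding back-of-the-envelope numbers for the table-of-contents approach (all sizes are the rough assumptions above, not measurements):

    row_bytes      = 64                       # name continuation + hash per entry
    per_cache      = 3300 * row_bytes         # ~211 KB listed per cache
    total_listing  = 3 * per_cache            # ~634 KB across caches A, B, C
    interests      = total_listing // 1200    # ~528 interests at ~1200-byte payloads
                                              # (rounding the listing up to 640 KB gives the ~534 above)
    listing_pass   = interests * 64           # ~34 KB of interest traffic for the listing
    print(per_cache, total_listing, 2 * listing_pass)   # ~68 KB with the header-row pass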

This assumes that you publish just the “header” in the 1st segment (say 1 KB total object size including the signatures).  That’s about 10 MB to learn the headers for all 9,500 emails.

You could also argue that the distribution of emails over caches is arbitrary.  That’s true, I picked a difficult sequence.  But unless you have some positive controls on what could be in a cache, it could be any difficult sequence.  I also did not address the timeout issue, or how you know you are done.

This is also why sync works so much better than doing raw interest discovery.  Sync exchanges tables of contents and diffs; it does not need to enumerate by exclusion everything it retrieves.
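As a cartoon of why that is cheaper, sync only has to fetch the difference between two listings rather than enumerate everything by exclusion (this is just an illustration of the idea, not the actual NDN Sync protocol):

    # Exchange compact tables of contents, then fetch only what is missing.
    def missing_entries(my_listing, remote_listing):
        return sorted(set(remote_listing) - set(my_listing))

    have   = {0, 3, 6}
    theirs = {0, 1, 2, 3, 4, 5, 6}
    print(missing_entries(have, theirs))   # [1, 2, 4, 5] -> fetch just these four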

Marc



On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu <tailinchu at gmail.com> wrote:

> discovery can be reduced to "pattern detection" (can we infer what
> exists?) and "pattern validation" (can we confirm this guess?)
> 
> For example, I see a pattern /mail/inbox/148. I, a human being, see a
> pattern with static (/mail/inbox) and variable (148) components; with
> proper naming convention, computers can also detect this pattern
> easily. Now I want to look for all mails in my inbox. I can generate a
> list of /mail/inbox/<number>. These are my guesses, and with selectors
> I can further refine my guesses.
> 
> To validate them, a bloom filter can provide "best effort"
> discovery (with some false positives, so I call it "best-effort")
> before I stupidly send all the interests to the network.
> 
> The discovery protocol, as I described above, is essentially "pattern
> detection by naming convention" and "bloom filter validation." This is
> definitely one of the "simpler" discovery protocols, because the data
> producer only needs to add an additional bloom filter. Notice that we can
> progressively add entries to the bloom filter with low computation cost.
> 
> On Tue, Sep 23, 2014 at 2:34 AM,  <Marc.Mosko at parc.com> wrote:
>> Ok, yes I think those would all be good things.
>> 
>> One thing to keep in mind, especially with things like time series sensor
>> data, is that people see a pattern and infer a way of doing it.  That’s easy
>> for a human :)  But in Discovery, one should assume that one does not know
>> of patterns in the data beyond what the protocols used to publish the data
>> explicitly require.  That said, I think some of the things you listed are
>> good places to start: sensor data, web content, climate data or genome data.
>> 
>> We also need to state what the forwarding strategies are and what the cache
>> behavior is.
>> 
>> I outlined some of the points that I think are important in that other
>> posting.  While “discover latest” is useful, “discover all” is also
>> important, and that one gets complicated fast.  So points like separating
>> discovery from retrieval and working with large data sets have been
>> important in shaping our thinking.  That all said, I’d be happy starting
>> from 0 and working through the Discovery service definition from scratch
>> along with data set use cases.
>> 
>> Marc
>> 
>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff <jburke at remap.ucla.edu> wrote:
>> 
>> Hi Marc,
>> 
>> Thanks – yes, I saw that as well. I was just trying to get one step more
>> specific, which was to see if we could identify a few specific use cases
>> around which to have the conversation.  (e.g., time series sensor data and
>> web content retrieval for "get latest"; climate data for huge data sets;
>> local data in a vehicular network; etc.)  What have you been looking at
>> that's driving considerations of discovery?
>> 
>> Thanks,
>> Jeff
>> 
>> From: <Marc.Mosko at parc.com>
>> Date: Mon, 22 Sep 2014 22:29:43 +0000
>> To: Jeff Burke <jburke at remap.ucla.edu>
>> Cc: <tailinchu at gmail.com>, <ndn-interest at lists.cs.ucla.edu>
>> Subject: Re: [Ndn-interest] any comments on naming convention?
>> 
>> Jeff,
>> 
>> Take a look at my posting (that Felix fixed) in a new thread on Discovery.
>> 
>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>> 
>> I think it would be very productive to talk about what Discovery should do,
>> and not focus on the how.  It is sometimes easy to get caught up in the how,
>> which I think is a less important topic than the what at this stage.
>> 
>> Marc
>> 
>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff <jburke at remap.ucla.edu> wrote:
>> 
>> Marc,
>> 
>> If you can't talk about your protocols, perhaps we can discuss this based
>> on use cases.   What are the use cases you are using to evaluate
>> discovery?
>> 
>> Jeff
>> 
>> 
>> 
>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" <Marc.Mosko at parc.com> wrote:
>> 
>> No matter what the expressiveness of the predicates, if the forwarder can
>> send interests different ways you don't have a consistent underlying set
>> to talk about, so you would always need non-range exclusions to discover
>> every version.
>> 
>> Range exclusions only work I believe if you get an authoritative answer.
>> If different content pieces are scattered between different caches I
>> don't see how range exclusions would work to discover every version.
>> 
>> I'm sorry to be pointing out problems without offering solutions but
>> we're not ready to publish our discovery protocols.
>> 
>> Sent from my telephone
>> 
>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" <tailinchu at gmail.com> wrote:
>> 
>> I see. Can you briefly describe how the ccnx discovery protocol solves
>> all the problems that you mentioned (not just exclude)? a doc will be
>> better.
>> 
>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon
>> expect [and] and [or], so boolean algebra is fully supported.  Regular
>> language or context free language might become part of selector too.
>> 
>> On Sat, Sep 20, 2014 at 11:25 PM,  <Marc.Mosko at parc.com> wrote:
>> That will get you one reading then you need to exclude it and ask
>> again.
>> 
>> Sent from my telephone
>> 
>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" <tailinchu at gmail.com> wrote:
>> 
>> Yes, my point was that if you cannot talk about a consistent set
>> with a particular cache, then you need to always use individual
>> excludes not range excludes if you want to discover all the versions
>> of an object.
>> 
>> 
>> I am very confused. For your example, if I want to get all today's
>> sensor data, I just do (Any..Last second of last day)(First second of
>> tomorrow..Any). That's 18 bytes.
>> 
>> 
>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>> 
>> On Sat, Sep 20, 2014 at 10:55 PM,  <Marc.Mosko at parc.com> wrote:
>> 
>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu <tailinchu at gmail.com> wrote:
>> 
>> If you talk sometimes to A and sometimes to B, you very easily
>> could miss content objects you want to discover unless you avoid
>> all range exclusions and only exclude explicit versions.
>> 
>> 
>> Could you explain why the missing content object situation happens? Also,
>> range exclusion is just a shorter notation for many explicit
>> excludes;
>> converting from explicit excludes to ranged exclude is always
>> possible.
>> 
>> 
>> Yes, my point was that if you cannot talk about a consistent set
>> with a particular cache, then you need to always use individual
>> excludes not range excludes if you want to discover all the versions
>> of an object.  For something like a sensor reading that is updated,
>> say, once per second you will have 86,400 of them per day.  If each
>> exclusion is a timestamp (say 8 bytes), that’s 691,200 bytes of
>> exclusions (plus encoding overhead) per day.
>> 
>> yes, maybe using a more deterministic version number than a
>> timestamp makes sense here, but it's just an example of needing a lot
>> of exclusions.
>> 
>> 
>> You exclude through 100 then issue a new interest.  This goes to
>> cache B
>> 
>> 
>> I feel this case is invalid because cache A will also get the
>> interest, and cache A will return v101 if it exists. Like you said,
>> if
>> this goes to cache B only, it means that cache A dies. How do you
>> know
>> that v101 even exists?
>> 
>> 
>> I guess this depends on what the forwarding strategy is.  If the
>> forwarder will always send each interest to all replicas, then yes,
>> modulo packet loss, you would discover v101 on cache A.  If the
>> forwarder is just doing “best path” and can round-robin between cache
>> A and cache B, then your application could miss v101.
>> 
>> 
>> 
>> c,d In general I agree that LPM performance is related to the number
>> of components. In my own thread-safe LPM implementation, I used only
>> one RWMutex for the whole tree. I don't know whether adding lock for
>> every node will be faster or not because of lock overhead.
>> 
>> However, we should compare (exact match + discovery protocol) vs
>> (ndn
>> lpm). Comparing performance of exact match to lpm is unfair.
>> 
>> 
>> Yes, we should compare them.  And we need to publish the ccnx 1.0
>> specs for doing the exact match discovery.  So, as I said, I’m not
>> ready to claim it's better yet because we have not done that.
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sat, Sep 20, 2014 at 2:38 PM,  <Marc.Mosko at parc.com> wrote:
>> I would point out that using LPM on content object to Interest
>> matching to do discovery has its own set of problems.  Discovery
>> involves more than just “latest version” discovery too.
>> 
>> This is probably getting off-topic from the original post about
>> naming conventions.
>> 
>> a.  If Interests can be forwarded multiple directions and two
>> different caches are responding, the exclusion set you build up
>> talking with cache A will be invalid for cache B.  If you talk
>> sometimes to A and sometimes to B, you very easily could miss
>> content objects you want to discover unless you avoid all range
>> exclusions and only exclude explicit versions.  That will lead to
>> very large interest packets.  In ccnx 1.0, we believe that an
>> explicit discovery protocol that allows conversations about
>> consistent sets is better.
>> 
>> b. Yes, if you just want the “latest version” discovery that
>> should be transitive between caches, but imagine this.  You send
>> Interest #1 to cache A which returns version 100.  You exclude
>> through 100 then issue a new interest.  This goes to cache B who
>> only has version 99, so the interest times out or is NACK’d.  So
>> you think you have it!  But, cache A already has version 101, you
>> just don’t know.   If you cannot have a conversation around
>> consistent sets, it seems like even doing latest version discovery
>> is difficult with selector based discovery.  From what I saw in
>> ccnx 0.x, one ended up getting an Interest all the way to the
>> authoritative source because you can never believe an intermediate
>> cache that there’s not something more recent.
>> 
>> I’m sure you’ve walked through cases (a) and (b) in ndn, I’d be
>> interested in seeing your analysis.  Case (a) is that a node can
>> correctly discover every version of a name prefix, and (b) is that
>> a node can correctly discover the latest version.  We have not
>> formally compared (or yet published) our discovery protocols (we
>> have three, 2 for content, 1 for device) against selector based
>> discovery, so I cannot yet claim they are better, but they do not
>> have the non-determinism sketched above.
>> 
>> c. Using LPM, there is a non-deterministic number of lookups you
>> must do in the PIT to match a content object.  If you have a name
>> tree or a threaded hash table, those don’t all need to be hash
>> lookups, but you need to walk up the name tree for every prefix of
>> the content object name and evaluate the selector predicate.
>> Content Based Networking (CBN) had some methods to create data
>> structures based on predicates, maybe those would be better.  But
>> in any case, you will potentially need to retrieve many PIT entries
>> if there is Interest traffic for many prefixes of a root.  Even on
>> an Intel system, you’ll likely miss cache lines, so you’ll have a
>> lot of NUMA access for each one.  In CCNx 1.0, even a naive
>> implementation only requires at most 3 lookups (one by name, one by
>> name + keyid, one by name + content object hash), and one can do
>> other things to optimize lookup for an extra write.
>> 
>> d. In (c) above, if you have a threaded name tree or are just
>> walking parent pointers, I suspect you’ll need locking of the
>> ancestors in a multi-threaded system (“threaded" here meaning LWP)
>> and that will be expensive.  It would be interesting to see what a
>> cache consistent multi-threaded name tree looks like.
>> 
>> Marc
>> 
>> 
>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu <tailinchu at gmail.com>
>> wrote:
>> 
>> I had thought about these questions, but I want to know your idea
>> besides typed component:
>> 1. LPM allows "data discovery". How will exact match do similar
>> things?
>> 2. will removing selectors improve performance? How do we use
>> other
>> faster technique to replace selector?
>> 3. fixed byte length and type. I agree more that type can be fixed
>> byte, but 2 bytes for length might not be enough for future.
>> 
>> 
>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran)
>> <oran at cisco.com> wrote:
>> 
>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu <tailinchu at gmail.com>
>> wrote:
>> 
>> I know how to make #2 flexible enough to do the things I can
>> envision we need to do, and with a few simple conventions on
>> how the registry of types is managed.
>> 
>> 
>> Could you share it with us?
>> 
>> Sure. Here’s a strawman.
>> 
>> The type space is 16 bits, so you have 65,536 types.
>> 
>> The type space is currently shared with the types used for the
>> entire protocol, which gives us two options:
>> (1) we reserve a range for name component types. Given the
>> likelihood there will be at least as much and probably more need
>> for component types than for protocol extensions, we could reserve 1/2
>> of the type space, giving us 32K types for name components.
>> (2) since there is no parsing ambiguity between name components
>> and other fields of the protocol (since they are sub-types of the
>> name type) we could reuse numbers and thereby have an entire 65K
>> name component types.
>> 
>> We divide the type space into regions, and manage it with a
>> registry. If we ever get to the point of creating an IETF
>> standard, IANA has 25 years of experience running registries and
>> there are well-understood rule sets for different kinds of
>> registries (open, requires a written spec, requires standards
>> approval).
>> 
>> - We allocate one “default" name component type for “generic
>> name”, which would be used on name prefixes and other common
>> cases where there are no special semantics on the name component.
>> - We allocate a range of name component types, say 1024, to
>> globally understood types that are part of the base or extension
>> NDN specifications (e.g. chunk#, version#, etc.)
>> - We reserve some portion of the space for unanticipated uses
>> (say another 1024 types)
>> - We give the rest of the space to application assignment.
>> 
>> Make sense?
>> 
>> 
>> While I’m sympathetic to that view, there are three ways in
>> which Moore’s law or hardware tricks will not save us from
>> performance flaws in the design
>> 
>> 
>> we could design for performance,
>> 
>> That’s not what people are advocating. We are advocating that we
>> *not* design for known bad performance and hope serendipity or
>> Moore’s Law will come to the rescue.
>> 
>> but I think there will be a turning
>> point when the slower design starts to become "fast enough”.
>> 
>> Perhaps, perhaps not. Relative performance is what matters so
>> things that don’t get faster while others do tend to get dropped
>> or not used because they impose a performance penalty relative to
>> the things that go faster. There is also the “low-end” phenomenon
>> where improvements in technology get applied to lowering cost
>> rather than improving performance. For those environments bad
>> performance just never gets better.
>> 
>> Do you
>> think there will be some design of ndn that will *never* have
>> performance improvement?
>> 
>> I suspect LPM on data will always be slow (relative to the other
>> functions).
>> I suspect exclusions will always be slow because they will
>> require extra memory references.
>> 
>> However I of course don’t claim to clairvoyance so this is just
>> speculation based on 35+ years of seeing performance improve by 4
>> orders of magnitude and still having to worry about counting
>> cycles and memory references…
>> 
>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran)
>> <oran at cisco.com> wrote:
>> 
>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu
>> <tailinchu at gmail.com> wrote:
>> 
>> We should not look at a certain chip nowadays and want ndn to
>> perform
>> well on it. It should be the other way around: once  ndn app
>> becomes
>> popular, a better chip will be designed for ndn.
>> 
>> While I’m sympathetic to that view, there are three ways in
>> which Moore’s law or hardware tricks will not save us from
>> performance flaws in the design:
>> a) clock rates are not getting (much) faster
>> b) memory accesses are getting (relatively) more expensive
>> c) data structures that require locks to manipulate
>> successfully will be relatively more expensive, even with
>> near-zero lock contention.
>> 
>> The fact is, IP *did* have some serious performance flaws in
>> its design. We just forgot those because the design elements
>> that depended on those mistakes have fallen into disuse. The
>> poster children for this are:
>> 1. IP options. Nobody can use them because they are too slow
>> on modern forwarding hardware, so they can’t be reliably used
>> anywhere
>> 2. the UDP checksum, which was a bad design when it was
>> specified and is now a giant PITA that still causes major pain
>> in working around.
>> 
>> I’m afraid students today are being taught that the designers
>> of IP were flawless, as opposed to very good scientists and
>> engineers that got most of it right.
>> 
>> I feel the discussion today and yesterday has been off-topic.
>> Now I
>> see that there are 3 approaches:
>> 1. we should not define a naming convention at all
>> 2. typed component: use tlv type space and add a handful of
>> types
>> 3. marked component: introduce only one more type and add
>> additional
>> marker space
>> 
>> I know how to make #2 flexible enough to do the things I can
>> envision we need to do, and with a few simple conventions on
>> how the registry of types is managed.
>> 
>> It is just as powerful in practice as either throwing up our
>> hands and letting applications design their own mutually
>> incompatible schemes or trying to make naming conventions with
>> markers in a way that is fast to generate/parse and also
>> resilient against aliasing.
>> 
>> Also everybody thinks that the current utf8 marker naming
>> convention
>> needs to be revised.
>> 
>> 
>> 
>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe <felix at rabe.io>
>> wrote:
>> Would that chip be suitable, i.e. can we expect most names
>> to fit in (the
>> magnitude of) 96 bytes? What length are names usually in
>> current NDN
>> experiments?
>> 
>> I guess wide deployment could make for even longer names.
>> Related: Many URLs
>> I encounter nowadays easily don't fit within two 80-column
>> text lines, and
>> NDN will have to carry more information than URLs, as far as
>> I see.
>> 
>> 
>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>> 
>> In fact, the index in a separate TLV will be slower on some
>> architectures,
>> like the ezChip NP4.  The NP4 can hold the first 96 frame
>> bytes in memory,
>> then any subsequent memory is accessed only as two adjacent
>> 32-byte blocks
>> (there can be at most 5 blocks available at any one time).
>> If you need to
>> switch between arrays, it would be very expensive.  If you
>> have to read past
>> the name to get to the 2nd array, then read it, then back up
>> to get to the
>> name, it will be pretty expensive too.
>> 
>> Marc
>> 
>> On Sep 18, 2014, at 2:02 PM, <Ignacio.Solis at parc.com>
>> <Ignacio.Solis at parc.com> wrote:
>> 
>> Does this make that much difference?
>> 
>> If you want to parse the first 5 components, one way to do
>> it is:
>> 
>> Read the index, find entry 5, then read in that many bytes
>> from the start
>> offset of the beginning of the name.
>> OR
>> Start reading name, (find size + move ) 5 times.
>> 
>> How much speed are you getting from one to the other?  You
>> seem to imply
>> that the first one is faster.  I don't think this is the
>> case.
>> 
>> In the first one you'll probably have to get the cache line
>> for the index,
>> then all the required cache lines for the first 5
>> components. For the
>> second, you'll have to get all the cache lines for the first
>> 5 components.
>> Given an assumption that a cache miss is way more expensive
>> than
>> evaluating a number and computing an addition, you might
>> find that the
>> performance of the index is actually slower than the
>> performance of the
>> direct access.
>> 
>> Granted, there is a case where you don't access the name at
>> all, for
>> example, if you just get the offsets and then send the
>> offsets as
>> parameters to another processor/GPU/NPU/etc. In this case
>> you may see a
>> gain IF there are more cache line misses in reading the name
>> than in
>> reading the index.   So, if the regular part of the name
>> that you're
>> parsing is bigger than the cache line (64 bytes?) and the
>> name is to be
>> processed by a different processor, then you might see some
>> performance
>> gain in using the index, but in all other circumstances I
>> bet this is not
>> the case.   I may be wrong, haven't actually tested it.
>> 
>> This is all to say, I don't think we should be designing the
>> protocol with
>> only one architecture in mind. (The architecture of sending
>> the name to a
>> different processor than the index).
>> 
>> If you have numbers that show that the index is faster I
>> would like to see
>> under what conditions and architectural assumptions.
>> 
>> Nacho
>> 
>> (I may have misinterpreted your description so feel free to
>> correct me if
>> I'm wrong.)
>> 
>> 
>> --
>> Nacho (Ignacio) Solis
>> Protocol Architect
>> Principal Scientist
>> Palo Alto Research Center (PARC)
>> +1(650)812-4458
>> Ignacio.Solis at parc.com
>> 
>> 
>> 
>> 
>> 
>> On 9/18/14, 12:54 AM, "Massimo Gallo"
>> <massimo.gallo at alcatel-lucent.com>
>> wrote:
>> 
>> Indeed each component's offset must be encoded using a fixed
>> amount of
>> bytes:
>> 
>> i.e.,
>> Type = Offsets
>> Length = 10 Bytes
>> Value = Offset1(1byte), Offset2(1byte), ...
>> 
>> You may also imagine having an "Offset_2byte" type if your
>> name is too
>> long.
>> 
>> Max
>> 
>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>> 
>> if you do not need the entire hierarchical structure (suppose
>> you only
>> want the first x components) you can directly have it using
>> the
>> offsets. With the Nested TLV structure you have to
>> iteratively parse
>> the first x-1 components. With the offset structure you can
>> directly
>> access the first x components.
>> 
>> I don't get it. What you described only works if the
>> "offset" is
>> encoded in fixed bytes. With varNum, you will still need to
>> parse x-1
>> offsets to get to the x offset.
>> 
>> 
>> 
>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo
>> <massimo.gallo at alcatel-lucent.com> wrote:
>> 
>> On 17/09/2014 14:56, Mark Stapp wrote:
>> 
>> ah, thanks - that's helpful. I thought you were saying "I
>> like the
>> existing NDN UTF8 'convention'." I'm still not sure I
>> understand what
>> you
>> _do_ prefer, though. it sounds like you're describing an
>> entirely
>> different
>> scheme where the info that describes the name-components is
>> ...
>> someplace
>> other than _in_ the name-components. is that correct? when
>> you say
>> "field
>> separator", what do you mean (since that's not a "TL" from a
>> TLV)?
>> 
>> Correct.
>> In particular, with our name encoding, a TLV indicates the
>> name
>> hierarchy
>> with offsets in the name and other TLV(s) indicates the
>> offset to use
>> in
>> order to retrieve special components.
>> As for the field separator, it is something like "/".
>> Aliasing is
>> avoided as
>> you do not rely on field separators to parse the name; you
>> use the
>> "offset
>> TLV " to do that.
>> 
>> So now, it may be an aesthetic question but:
>> 
>> if you do not need the entire hierarchical structure (suppose
>> you only
>> want
>> the first x components) you can directly have it using the
>> offsets.
>> With the
>> Nested TLV structure you have to iteratively parse the first
>> x-1
>> components.
>> With the offset structure you can directly access the
>> first x
>> components.
>> 
>> Max
>> 
>> 
>> -- Mark
>> 
>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>> 
>> The why is simple:
>> 
>> You use a lot of "generic component type" and very few
>> "specific
>> component type". You are imposing types for every component
>> in order
>> to
>> handle a few exceptions (segmentation, etc.). You create a
>> rule
>> (specify
>> the component's type) to handle exceptions!
>> 
>> I would prefer not to have typed components. Instead I would
>> prefer
>> to
>> have the name as a simple sequence of bytes with a field
>> separator. Then,
>> outside the name, if you have some components that could be
>> used at
>> network layer (e.g. a TLV field), you simply need something
>> that
>> indicates which is the offset allowing you to retrieve the
>> version,
>> segment, etc in the name...
>> 
>> 
>> Max
>> 
>> 
>> 
>> 
>> 
>> On 16/09/2014 20:33, Mark Stapp wrote:
>> 
>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>> 
>> I think we agree on the small number of "component types".
>> However, if you have a small number of types, you will end
>> up with
>> names
>> containing many generic component types and few specific
>> component
>> types. Due to the fact that the component type specification
>> is an
>> exception in the name, I would prefer something that specifies a
>> component's
>> type only when needed (something like UTF8 conventions but
>> that
>> applications MUST use).
>> 
>> so ... I can't quite follow that. the thread has had some
>> explanation
>> about why the UTF8 requirement has problems (with aliasing,
>> e.g.)
>> and
>> there's been email trying to explain that applications don't
>> have to
>> use types if they don't need to. your email sounds like "I
>> prefer
>> the
>> UTF8 convention", but it doesn't say why you have that
>> preference in
>> the face of the points about the problems. can you say why
>> it is
>> that
>> you express a preference for the "convention" with problems?
>> 
>> Thanks,
>> Mark
>> 
