[Nfd-dev] Clarification of 'repo' definition

Sat Sep 10 23:38:18 PDT 2016

The CCNx 0.x repo sync protocol allowed a wildcard filter in the definition of a ‘slice’.  I’ve copied the old relevant docs below.

One could create multiple slices with different filters and those would create different views on the same underlying data.  I don’t remember what actually got implemented along these lines, but I think of it along the lines of a SQL view (a virtual table).  The repo only needs to store an object once and can index its name in different slices as their filters dictate.

I believe one shortcoming of this was that, in a multi-hop sync, one would only see data traverse the sync if it fits the intersection of the slice filters.  We had done some other design work to allow for the transitive propagation of slice filters (where allowed) so one could ‘tunnel’ that data without it showing up in the repos without those filters.  Kind of like a slice filter subscription.

Marc

https://github.com/ProjectCCNx/ccnx/blob/master/doc/technical/CreateCollectionProtocol.txt

==== Name

The *+Name+* in a filter clause is a name prefix that restricts the names in the Collection, and may contain wild card components.

* Each wild card component in a filter clause name matches a single component in a name.

* The encoding of a wild card component is the single byte 255 (+0xFF+). To enable byte 255 to start a literal component, any pattern component that starts with byte 255 and has more than 1 byte is treated as the literal component consisting of the bytes following the initial byte.

* A name matches a filter if it matches any of the filter clauses. (All components are required.) Components of a name longer than the name in the filter clause are accepted as matching. For example, using CCN URI syntax, +/X/%FF/Z+ matches the names +/X/Y/Z+ and +/X/Y/Z/W+, but does not match the name +/X/Z+.
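
The matching rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CCNx implementation: it assumes a name is a list of raw byte components, and `parse_uri`, `component_matches`, `clause_matches` and `filter_matches` are hypothetical helper names. The URI parser is simplified and handles only %XX escapes.

```python
from typing import List, Sequence
from urllib.parse import unquote_to_bytes

WILDCARD = b"\xff"  # the single byte 255 encodes a wild card component


def parse_uri(uri: str) -> List[bytes]:
    """Hypothetical helper: split a URI-style name into byte components.

    Simplified on purpose: only %XX escapes are decoded.
    """
    return [unquote_to_bytes(part) for part in uri.split("/") if part]


def component_matches(pattern: bytes, component: bytes) -> bool:
    """Match one pattern component against one name component."""
    if pattern == WILDCARD:
        # A lone 0xFF matches any single component.
        return True
    if pattern.startswith(WILDCARD):
        # 0xFF followed by more bytes is an escaped literal:
        # compare against the bytes after the initial 0xFF.
        return component == pattern[1:]
    return component == pattern


def clause_matches(clause: Sequence[bytes], name: Sequence[bytes]) -> bool:
    """All clause components are required; longer names are accepted."""
    if len(name) < len(clause):
        return False
    return all(component_matches(p, c) for p, c in zip(clause, name))


def filter_matches(clauses: Sequence[Sequence[bytes]], name: Sequence[bytes]) -> bool:
    """A name matches a filter if it matches any of the filter clauses."""
    return any(clause_matches(clause, name) for clause in clauses)


pattern = parse_uri("/X/%FF/Z")
for candidate in ("/X/Y/Z", "/X/Y/Z/W", "/X/Z"):
    print(candidate, filter_matches([pattern], parse_uri(candidate)))
```

With the example from the text, the first two candidates match and `/X/Z` does not, because the wild card must consume exactly one component. How a filter with no clauses should behave is not covered by the excerpt; in this sketch it matches nothing.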

From: Nfd-dev <nfd-dev-bounces at lists.cs.ucla.edu> on behalf of "Burke, Jeff" <jburke at remap.UCLA.EDU>
Date: Saturday, September 10, 2016 at 9:36 PM
To: "Zhang, Lixia" <lixia at CS.UCLA.EDU>
Cc: "nfd-dev at lists.cs.ucla.edu" <nfd-dev at lists.cs.ucla.edu>
Subject: Re: [Nfd-dev] Clarification of 'repo' definition

I think I figured out how to put this in a more straightforward way:

It seems to me that it may be common (or at least reasonable) that apps will be configured and deployed such that it is desirable that repos store data for name patterns that do not correspond to “all data in a given branch”.

i.e.,  it might be desirable to have a repo for

                /ucla/bms/*/electrical/*

but we seem pushed towards naming such that the repo must be for

                /ucla/bms/electrical/*

What are the implications of this?

A use case seems to be precisely the mini-bms hierarchical aggregator example implemented this year, except split by subsystem – i.e., a vertical solution vendor (Siemens for electrical, say) provides gathering and storage for one complete data type, another vendor supports another data type (Honeywell for HVAC, for example).  Leveraging NDN they can all contribute to the same coherent namespace of BMS data and participate in aggregation across data types, but in practice, different data types are published/stored by different vendors’ systems that cover different portions of the tree...  It’s not realistic here to either 1) require one vendor’s repo to store another’s data in such a system; or 2) have manufacturer-specific names when that is not relevant.

I understand this may all seem basic – one just needs to organize the data such that Interests can get directed to the right place.  But, I think the subtlety here in this use case, and probably in others, is that the topology of the nodes providing the ‘live’ data may not match that of the repos.  So these may place competing requirements on the namespace design.  (There’s more to it than that – also security requirements further drive namespace considerations, etc)

Also, one thing that I mean by “brittleness” is that the “repos must store the complete branch they register” requirement seems to require a lot of advance knowledge of the namespace, such that namespace/repo design can be coordinated in this way. I doubt this is realistic.

For example, what if in the BMS case, the app developers/deployers figure out a scheme that organizes repos by rooms – so each room has a repo that stores all data for the room (or set of rooms) it registers - supporting
                /ucla/bms/melnitz/room=1403/electrical
                /ucla/bms/melnitz/room=2910/hvac, etc.
So there is a ‘room=2910’ repo and/or a melnitz repo, and a campus bms repo, etc.

What if then, the system operators acquire a new, high bandwidth sensor type and install it in all the rooms a year later:
                /ucla/bms/melnitz/room=2910/hd-equipment-room-camera

For administrative reasons, I want this under the /ucla/bms hierarchy, as it simplifies the trust management.
And I can afford to put one new box in each building to act as a repo for the data.

I’m not sure what to do about this:

1) Sure, maybe I misconfigured the namespace, but it wasn’t unreasonable when I deployed originally, and it’s hard to change now.
2) Maybe I really should put the cameras under some other namespace, but that has other implications for configuring trust management and/or interaction with other subsystems in the room
3) I can update all of the repos to register all their available child prefixes, perhaps...  (so I touch all application configuration?)
4) I can customize the forwarding somehow to get the camera interests to the appropriate place

I *think* what I’d like to be able to do in deployment is to start up the new building-level camera repos and have them register their prefixes, using commands signed appropriately, and have it just work.  (I think it is not so far off to achieve this, if strategy does LPM to forward. But unless I am missing something, it does require that the pre-existing repos, which don’t have the resources or desire to hold the new data, don’t have to.)
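
To illustrate why longest-prefix match (LPM) would let the new camera repos attract their Interests without touching the existing room repos, here is a minimal Python sketch. It is not NFD's FIB or strategy code: the `Fib` class, its `register` and `lookup` methods, and the repo labels are all hypothetical, and names are simply split on "/" into components.

```python
from typing import Dict, List, Tuple


def components(name: str) -> Tuple[str, ...]:
    """Split a URI-style name into its components."""
    return tuple(part for part in name.split("/") if part)


class Fib:
    """Toy forwarding table keyed by name prefix (hypothetical, not NFD)."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, ...], List[str]] = {}

    def register(self, prefix: str, next_hop: str) -> None:
        """Record that next_hop announced the given prefix."""
        self._routes.setdefault(components(prefix), []).append(next_hop)

    def lookup(self, name: str) -> List[str]:
        """Return the next hops of the longest registered prefix of name."""
        comps = components(name)
        # Try the full name first, then drop one trailing component at a time.
        for length in range(len(comps), -1, -1):
            hops = self._routes.get(comps[:length])
            if hops:
                return list(hops)
        return []


fib = Fib()
# Pre-existing repos, registered when the system was first deployed.
fib.register("/ucla/bms/melnitz", "melnitz-repo")
fib.register("/ucla/bms/melnitz/room=2910", "room-2910-repo")
# A year later: the new building-level camera repo registers a deeper prefix.
fib.register("/ucla/bms/melnitz/room=2910/hd-equipment-room-camera",
             "melnitz-camera-repo")

for interest in (
    "/ucla/bms/melnitz/room=2910/hd-equipment-room-camera/frame=17",
    "/ucla/bms/melnitz/room=2910/hvac",
    "/ucla/bms/melnitz/room=1403/electrical",
):
    print(interest, "->", fib.lookup(interest))
```

In this sketch the camera Interest goes only to the camera repo, while the room repo keeps receiving the rest of its branch. Note that the camera repo has to register one deep prefix per room, and the room repo still "covers" a branch it no longer holds completely, which is exactly the open question above.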

Perhaps we tend to think about namespace design as if it can always be done in advance, but it seems like it will grow more messily, and that resources will be added deep in a given namespace sometimes, and it would be nice if that “adding storage” to a namespace can be done without causing cascading responsibilities to be placed on repos already ‘covering’ the affected branches.

Jeff

From: Lixia Zhang <lixia at cs.ucla.edu>
Date: Saturday, September 10, 2016 at 8:46 PM
To: Jeff Burke <jburke at remap.ucla.edu>
Cc: "nfd-dev at lists.cs.ucla.edu" <nfd-dev at lists.cs.ucla.edu>
Subject: Re: [Nfd-dev] Clarification of 'repo' definition

On Sep 10, 2016, at 2:57 PM, Burke, Jeff <jburke at remap.ucla.edu> wrote:

On Sep 8, 2016, at 7:18 AM, Burke, Jeff <jburke at remap.ucla.edu> wrote:

Hi,

(Sorry for these messages all at once, and let me know if this should go to ndn-interest.)

Through an earlier conversation with Lixia this week, I realized that the definition / expectations of a ‘repo’ may be more specific than I previously understood.  In particular, she mentioned that repos storing and registering the same namespace should (eventually) have the same data for that namespace, presumably via sync.

I'll start by saying that this is all researchy: we are walking into a new territory that we are yet to fully explore.
So what I said, before or here, is just to share ideas with others.

[jb] Yes, of course. :)

It is easy to see why one would want all repos for the same name prefix to be sync'ed up: an Interest for that prefix may be forwarded to any of the repos. If all repos have the same data, any one of them can answer it; otherwise routers have to handle NACKs and reroute the Interest to try other repos.

[jb] Right, but this is a forwarding-centric view. If I am an app that has some (sparse) knowledge of a namespace, perhaps I just want to make it available to the network, without having to worry about forwarding?  I thought we are not supposed to have to think about forwarding! (At least this is what I get told when I ask about publisher mobility... What about a mobile repo? :)

publisher mobility is a different story than a partial repo.

A few related questions:

1) Is this attempt at eventual consistency of what’s stored for a given namespace a requirement to be called a ‘repo’?

see the above explanation.
I won't call it a requirement, but it is desirable for performance reasons.

[jb] As I alluded to above, I am having trouble reconciling this with the “share what you know” approach of sync-based communication.  If I am a repo storing the thousand most popular Netflix titles for the local geographic region (persistently, for a month, let’s say), do I register a thousand prefixes, or do I need a namespace for popular content (hope not), or knowledge of other names to NACK the content I don’t have?  I’m just not sure of the implications of what it means to know about all content for a prefix you are, as a repo, publishing...  So I am wondering still if I’m missing a connotation of exactly what a repo is vs. some other publisher.

Jeff, how about we look at *each* of the different cases, one at a time, to understand its design tradeoffs?
If we believe this is the beginning of research on repo usage, then we have yet to find general solutions.

2) Practically, if such sync is ultimately required behavior for a repo, will/should repo-ng itself incorporate some basic synchronization features?

as I mentioned, there was a piece of work done on repo sync 2 years back (but I don't know its implementation status)

[jb] Yes, I remember this somewhat too.  But it seems that 1) persistent storage is fundamental to apps; 2) interaction with storage happens near/right above the network layer in the case of repo; and 3) there are some performance and conceptual concerns with providing consistency between nodes providing storage in the same namespace, so this should be important to support?  (Or, at least, do we need to inform users of the existing repo-ng of the probable trajectory of development that is expected but not happening yet, in the repo documentation on the wiki? I think this type of getting our implicit plans out there is important to supporting community effort.)

1/ I agree with everything said above.

2/ One should not read anything more into repo-ng than a best-effort trial piece of code.

3/ Yes we all know we need documentation.
Sadly there is a simple issue of manpower shortage.

3) Does opportunistic caching behavior (without sync) mean that the storage is not a ‘repo’?

My own view: no.
my explanation to others:
- caching: opportunistic
- (managed) repo: managed storage

 [jb] Ok. (I will go back to the chilled water example, though... what about a publisher that naturally only has knowledge of part of a tree..?  Is that not really a repo if its knowledge isn’t complete for the branch it stores? What about if it doesn’t store depth all the way down?)

When one only has knowledge of one branch of a tree, one can/should only announce that specific branch of the tree.

4) Would, for example, network-attached storage that stores everything for a prefix but only up to a given depth in the tree not qualify as a repo?

everything is attached to the network :)
so I am not sure what "network-attached storage" really means.

 [jb] Sorry, the adjective probably wasn’t that relevant.  :)

if your question is "what if a managed repo for a name prefix only contains incomplete data under that prefix" -- that still works, and *may* even work OK if one understands the traffic patterns so that most of the Interests forwarded to that repo can be satisfied, without having to zigzag to other places seeking data.

 [jb] Not sure I understand who the “one” is and how they control Interest forwarding?

One in "if one understands the traffic patterns" means the app developer.
I do not mean that he controls Interest forwarding; I am just saying it may work if he has a good understanding of app behavior.

I’m trying to think in terms of scenarios where app developers don’t really have the ability to manage network forwarding any more than they do today. (Yes, of course, some people deploying large apps control their network infrastructure to that level of detail but many don’t.)

5) Or, for example, in the BMS case, if I use a repo to store all of the electrical current samples for the UCLA campus, but not chilled water, it will only have some of the tree for the campus bms prefix.  Is the storage not a repo?  Should it not be registering the root bms prefix?   Should I have / what do I call storage that is filling in part of the tree but doesn’t need to or can’t store all of it?

1/ if chilled water has its own prefix announcement, maybe one can find a way to attract all interests for chilled water data to the place for chilled water.

 [jb] Yes, I think I understand that approach, but that’s not really the deployment that I’m thinking of... the scenario I have in mind is this.

Two buildings, with data described like this:

/building-1/electrical/power/<time>
which aggregates
    /building-1/room-7/electrical/power/<time>
    /building-1/room-8/electrical/power/<time>

but there is also
/building-1/hvac/chilled_water_in/<time>
etc.

then, let’s say we want a root level repo that stores electrical data but not HVAC.   It provides some secondary advantage by being “close” to some processes that want to run analytics on all of the electrical data but don’t care about anything else.  It’s also run by the electrical folks and they don’t want to worry about anyone else’s data being in their system.    It also provides persistence or access scalability for that electrical data beyond what can be offered by the panels in the field.

So, to follow what I understand of your comments:

1) we need the repo to register every prefix with electrical data that it wishes to serve (which could be very long list of names at different granularity), or
2) we need the data names to start with /electrical (which has other implications for forwarding interests based on building/campus layout), or
3) we need NACK support in strategy(?) so that the repo could NACK prefixes it knows it’s not going to store,
or?

1) No, I did not say that; how many things are feasible to announce is an engineering design issue. For things one does not announce, one needs other means to get the Interests to move toward the data.

3) NFD does handle NACKs.

3/ again one needs to think not just what one wants to put where, but also how forwarding can work well.

[jb] Yes,  I understand this.  But I think that we are entering territory where what we are asking of people developing and deploying apps can be very sophisticated consideration of multiple intersecting requirements on namespaces....

Now, given that NDN makes apps and the network share the same namespace, app people need to work with network people to figure out the best way to achieve the apps' goals.
No one dictates one solution or another; we simply need to figure things out through trial and error.
There is no ready solution at this time.

so when picking data names we have to consider:
        1. what makes sense for the app internally (and the data itself)
        2. how trust schema are embodied in the namespace
        3. how access granularity / permissions may be embodied in the namespace
        4. global / Internet / enterprise forwarding implications (e.g., forward interests for /building-1 towards building 1)
        5. what requests we want to make fast/efficient with simple interests  (e.g., where to put time in the namespace for time series)
        6. AND persistency/storage implications should the data at some point be stored persistently

You’ve convinced me before to stop worrying about #4 and allow mechanisms to evolve in the network to take care of this, for the most part.

because app people need to first tell us what apps need; then we can see what may be the best way to handle it.

for things that do not seem feasible to be directly supported by network forwarding, one can always add a layer of indirection.

Having app people start by worrying about network layer problems does not seem to me the best starting point.

  And that for #5, maybe apps should just publish in multiple namespaces, or use sync.  But #6 seems to be in the same category... should we really have to worry about what node is making what data persistent in designing the namespace?  Can we do that a priori in real networks with lots of interacting nodes under the control of different developers, deployers, and users?

1/ again, app designs should start with app needs first
2/ then we can see how well that can be supported, then maybe we need to iterate the design.

Here’s another example – Consider the smart home, where each subsystem that people buy from Home Depot (a la Philips Hue) could have its own persistent storage that really doesn’t want to be involved in storage and publishing of data that’s not relevant to it, but the subsystems might (naively?) be thought to publish in the same prefix as each other...

For example, let’s say that you buy a few bulbs of the Philips Hue system and you buy a few bulbs from another manufacturer.  Both register /<my-home-prefix>/usage/energy/lights because it’s a well-known convention, and use name discovery/sync techniques to pick non-conflicting names that make sense to users and apps.

Let’s say from each manufacturer, you also get a wall-wart-sized repo that stores a year’s worth of energy consumption data for any device from Philips (evaluated through some data signing/verification scheme)... and another from the other manufacturer.

So here, if we want those repos to have efficient forwarding of appropriate interests to them, the data (from the lights) should have the manufacturer-specific subprefixes Philips/ and Foo/ for the respective repos because we don’t want them to publish persistent data in a prefix they don’t have complete information for?   Even though we might not really want to enforce this manufacturer-centric data model on the energy consumption data of our lights necessarily?    (In fact the names exposed to the user / their apps for other purposes should ideally not have this at all.)

my brain is slow tonight so I didn't figure out exactly what is the ideal solution you wanted from the above.
Once we know that, then we can see what would be the best way to achieve it.

6) What are other requirements to be a ‘repo’?  (Alternatively, is there a canonical reference in the literature for the type of storage that constitutes a repo?)

it does not really matter what we want to call something.
there is no arbitrary requirement.
as I said already, this is research, we have not done much playing with repo up to now.  It is all about figuring out how to make the system work in the best way it can.

[jb] I’d argue that it’s just as important to figure out whether we can keep brittleness and hidden dependencies out of app design and deployment that emerge from interdependencies between things ostensibly in the control of app developers (names) and those that are not (forwarding configuration, knowing whether ‘my’ repo knows everything in the tree it’s publishing in)....

I do not believe there is any brittleness or hidden dependency to worry about at this time, as no design has been done yet.  I do believe there is a need for better communication to help everyone see the problem clearly and avoid misunderstandings.

Lixia
