[Nfd-dev] Clarification of 'repo' definition

Sat Sep 10 14:57:47 PDT 2016

On Sep 8, 2016, at 7:18 AM, Burke, Jeff <jburke at remap.ucla.edu> wrote:

Hi,

(Sorry for these messages all at once, and let me know if this should go to ndn-interest.)

Through an earlier conversation with Lixia this week, I realized that the definition / expectations of a ‘repo’ may be more specific than I previously understood.  In particular, she mentioned that repos storing and registering the same namespace should (eventually) have the same data for that namespace, presumably via sync.

I'll start by saying that this is all researchy: we are walking into new territory that we have yet to fully explore.
So what I said, before or here, is just to share ideas with others.

[jb] Yes, of course. :)

It is easy to see why one would want all repos for the same name prefix to be sync'ed up: an interest for that prefix may be forwarded to any of the repos. If all repos have the same data, any of them can answer it; otherwise routers have to handle NACKs and reroute the interest to try other repos.
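
A minimal sketch of that cost, under loud assumptions: each repo is modeled as a plain dictionary, the forwarder simply tries repos in order, and a miss counts as one NACK. The names fetch, synced and partial, and the /bms/power/... data names, are hypothetical and not part of repo-ng or NFD.

```python
def fetch(name, repos):
    """Try each repo in turn; return (data, nacks_seen_before_success)."""
    nacks = 0
    for store in repos:
        if name in store:
            return store[name], nacks
        nacks += 1  # this repo has no such data: NACK, try the next repo
    return None, nacks  # no repo could satisfy the interest


# Two repos serving the same prefix with identical (synced) content.
synced = [
    {"/bms/power/1": "sample-1", "/bms/power/2": "sample-2"},
    {"/bms/power/1": "sample-1", "/bms/power/2": "sample-2"},
]

# Two repos serving the same prefix, each holding only part of the data.
partial = [
    {"/bms/power/1": "sample-1"},
    {"/bms/power/2": "sample-2"},
]

print(fetch("/bms/power/2", synced))
print(fetch("/bms/power/2", partial))
```

With synced repos the first repo tried can answer; with partial repos the same interest may cost a NACK and a retry before it reaches the repo that holds the data.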

[jb] Right, but this is a forwarding-centric view. If I am an app that has some (sparse) knowledge of a namespace, perhaps I just want to make it available to the network, without having to worry about forwarding?  I thought we are not supposed to have to think about forwarding! (At least this is what I get told when I ask about publisher mobility... What about a mobile repo? :)

A few related questions:

1) Is this attempt at eventual consistency of what’s stored for a given namespace a requirement to be called a ‘repo’?

See the above explanation.
I won't call it a requirement, but it is desirable for performance reasons.

[jb] As I alluded to above, I am having trouble reconciling this with the “share what you know” approach of sync-based communication.  If I am a repo storing the thousand most popular Netflix titles for the local geographic region (persistently, for a month, let’s say), do I register a thousand prefixes, or do I need a namespace for popular content (hope not), or knowledge of other names to NACK the content I don’t have?  I’m just not sure of the implications of what it means to know about all content for a prefix you are, as a repo, publishing...  So I am wondering still if I’m missing a connotation of exactly what a repo is vs. some other publisher.

2) Practically, if such sync is ultimately required behavior for a repo, will/should repo-ng itself incorporate some basic synchronization features?

As I mentioned, there was a piece of work done on repo sync 2 years back (but I don't know its implementation status).

[jb] Yes, I remember this somewhat too.  But it seems that 1) persistent storage is fundamental to apps; 2) interaction with storage happens near/right above the network layer in the case of repo; and 3) there are some performance and conceptual concerns with providing consistency between nodes providing storage in the same namespace, so this should be important to support?  (Or, at least, do we need to inform users of the existing repo-ng of the probable trajectory of development that is expected but not happening yet, in the repo documentation on the wiki? I think this type of getting our implicit plans out there is important to supporting community effort.)

3) Does opportunistic caching behavior (without sync) mean that the storage is not a ‘repo’?

My own view: no.
my explanation to others:
- caching: opportunistic
- (managed) repo: managed storage

[jb] Ok. (I will go back to the chilled water example, though... what about a publisher that naturally only has knowledge of part of a tree..?  Is that not really a repo if its knowledge isn’t complete for the branch it stores? What about if it doesn’t store depth all the way down?)

4) Would, for example, network-attached storage that stores everything for a prefix but only up to a given depth in the tree not qualify as a repo?

everything is attached to the network :)
so I am not sure what "network-attached storage" really means.

 [jb] Sorry, the adjective probably wasn’t that relevant.  :)

if your question is "what if a managed repo for a name prefix only contains incomplete data under that prefix" -- that still works, and *may* even work OK if one understands the traffic patterns so that most of the Interests forwarded to that repo can be satisfied, without having to zigzag to other places seeking data.

 [jb] Not sure I understand who the “one” is and how they control Interest forwarding?  I’m trying to think in terms of scenarios where app developers don’t really have the ability to manage network forwarding any more than they do today. (Yes, of course, some people deploying large apps control their network infrastructure to that level of detail but many don’t.)

5) Or, for example, in the BMS case, if I use a repo to store all of the electrical current samples for the UCLA campus, but not chilled water, it will only have some of the tree for the campus bms prefix.  Is the storage not a repo?  Should it not be registering the root bms prefix?   Should I have / what do I call storage that is filling in part of the tree but doesn’t need to or can’t store all of it?

1/ if chilled water has its own prefix announcement, maybe one can find a way to attract all interests for chilled water data to the place for chilled water.

 [jb] Yes, I think I understand that approach, but that’s not really the deployment that I’m thinking of... the scenario I have in mind is this.

Two buildings, with data described like this:

/building-1/electrical/power/<time>
which aggregates
  /building-1/room-7/electrical/power/<time>
  /building-1/room-8/electrical/power/<time>

but there is also
  /building-1/hvac/chilled_water_in/<time>
  etc.

then, let’s say we want a root level repo that stores electrical data but not HVAC.   It provides some secondary advantage by being “close” to some processes that want to run analytics on all of the electrical data but don’t care about anything else.  It’s also run by the electrical folks and they don’t want to worry about anyone else’s data being in their system.    It also provides persistence or access scalability for that electrical data beyond what can be offered by the panels in the field.

So, to follow what I understand of your comments:

1) we need the repo to register every prefix with electrical data that it wishes to serve (which could be a very long list of names at different granularity), or
2) we need the data names to start with /electrical (which has other implications for forwarding interests based on building/campus layout), or
3) we need NACK support in strategy(?) so that the repo could NACK prefixes it knows it’s not going to store,
or?
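
To make the three options concrete, here is a rough sketch assuming plain longest-prefix matching on name components. Everything in it is a toy model: the table layout, the next-hop labels ("electrical-repo", "field-panels"), the timestamp, and the repo_response policy are illustrative assumptions, not NFD's or repo-ng's actual data structures or behavior.

```python
def components(name):
    """Split a name such as '/a/b' into the tuple ('a', 'b')."""
    return tuple(c for c in name.split("/") if c)


def longest_prefix_match(fib, name):
    """Return the next hop registered under the longest matching prefix."""
    target = components(name)
    best_hop, best_len = None, -1
    for prefix, hop in fib.items():
        p = components(prefix)
        if target[:len(p)] == p and len(p) > best_len:
            best_hop, best_len = hop, len(p)
    return best_hop


# Option 1: the repo registers every prefix that carries electrical data.
fib_option_1 = {
    "/building-1": "field-panels",
    "/building-1/electrical": "electrical-repo",
    "/building-1/room-7/electrical": "electrical-repo",
    "/building-1/room-8/electrical": "electrical-repo",
}

# Option 2: data names start with /electrical, so one registration suffices.
fib_option_2 = {"/electrical": "electrical-repo"}

# Option 3: the repo registers the root and NACKs what it will not store.
fib_option_3 = {"/building-1": "electrical-repo"}


def repo_response(name):
    """Hypothetical repo policy: serve electrical names, NACK the rest."""
    return "DATA" if "electrical" in components(name) else "NACK"


power = "/building-1/room-7/electrical/power/1473544667"
water = "/building-1/hvac/chilled_water_in/1473544667"

print(longest_prefix_match(fib_option_1, power))
print(longest_prefix_match(fib_option_1, water))
print(longest_prefix_match(fib_option_2, "/electrical/building-1/room-7/power/1473544667"))
print(longest_prefix_match(fib_option_3, water), repo_response(water))
```

In this model option 1 needs one table entry per room, option 2 needs a single entry but moves /electrical ahead of the building in every data name, and option 3 keeps names and registrations simple at the cost of a NACK round trip for every non-electrical interest that reaches the repo.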

3/ again one needs to think not just what one wants to put where, but also how forwarding can work well.

[jb] Yes,  I understand this.  But I think that we are entering territory where what we are asking of people developing and deploying apps can be very sophisticated consideration of multiple intersecting requirements on namespaces....

so when picking data names we have to consider:
        1. what makes sense for the app internally (and the data itself)
        2. how trust schema are embodied in the namespace
        3. how access granularity / permissions may be embodied in the namespace
        4. global / Internet / enterprise forwarding implications (e.g., forward interests for /building-1 towards building 1)
        5. what requests we want to make fast/efficient with simple interests  (e.g., where to put time in the namespace for time series)
        6. AND persistency/storage implications should the data at some point be stored persistently

You’ve convinced me before to stop worrying about #4 and allow mechanisms to evolve in the network to take care of this, for the most part.  And that for #5, maybe apps should just publish in multiple namespaces, or use sync.  But #6 seems to be in the same category... should we really have to worry about what node is making what data persistent in designing the namespace?  Can we do that a priori in real networks with lots of interacting nodes under the control of different developers, deployers, and users?

Here’s another example – Consider the smart home, where each subsystem that people buy from Home Depot (à la Philips Hue) could have its own persistent storage that really doesn’t want to be involved in storage and publishing of data that’s not relevant to it... but might (naively?) be thought to publish in the same prefix as each other...

For example, let’s say that you buy a few bulbs of the Philips Hue system and you buy a few bulbs from another manufacturer.  Both register /<my-home-prefix>/usage/energy/lights because it’s a well-known convention, and use name discovery/sync techniques to pick non-conflicting names that make sense to users and apps.

Let’s say from each manufacturer, you also get a wall-wart-sized repo that stores a year’s worth of energy consumption data for any device from Philips (evaluated through some data signing/verification scheme)... and another from the other manufacturer.

So here, if we want those repos to have efficient forwarding of appropriate interests to them, the data (from the lights) should have the manufacturer-specific subprefixes Philips/ and Foo/ for the respective repos because we don’t want them to publish persistent data in a prefix they don’t have complete information for?   Even though we might not really want to enforce this manufacturer-centric data model on the energy consumption data of our lights necessarily?    (In fact the names exposed to the user / their apps for other purposes should ideally not have this at all.)

6) What are other requirements to be a ‘repo’?  (Alternatively, is there a canonical reference in the literature for the type of storage that constitutes a repo?)

it does not really matter what we want to call something.
there is no arbitrary requirement.
as I said already, this is research, we have not done much playing with repo up to now.  It is all about figuring out how to make the system work in the best way it can.

[jb] I’d argue that it’s just as important to figure out whether we can keep brittleness and hidden dependencies out of app design and deployment that emerge from interdependencies between things ostensibly in the control of app developers (names) and those that are not (forwarding configuration, knowing whether ‘my’ repo knows everything in the tree it’s publishing in)....

my 2 cents,
Lixia
