[Ndn-interest] Repo vs aggregation queries

Thu Jul 18 16:20:40 PDT 2019

On Thu, Jul 18, 2019, 17:17 Nick Briggs via Ndn-interest <
ndn-interest at lists.cs.ucla.edu> wrote:

>
> On Jul 18, 2019, at 4:00 PM, Junxiao Shi <shijunxiao at email.arizona.edu>
> wrote:
>
> Dear folks
>
> I saw a new paper coming out last week: The Role of Data Repositories in
> Named Data Networking.
> https://ieeexplore.ieee.org/abstract/document/8756944
> It explains the importance of data repositories as an architecture
> component, and how the repo could be used as storage for application data.
> The paper asserts that having a repo makes data available even if the data
> producer application is offline.
>
> While the above points are all correct, I have one doubt: how to support
> aggregation queries in this architecture?
>
> I have a building sensing application.
> Each data point is, for example, the temperature measurement in a certain
> room at a certain time point. Using either passive or active data
> insertion, this Data could be stored into the repo. Then, anyone can ask
> for a single data point by expressing an Interest.
> Paired with a (general-purpose) namespace enumeration protocol, it's also
> possible for a consumer to discover what Data are available in a repo.
>
> A common use case is an *aggregation query*. For example, a consumer
> wants to know what's the maximum temperature among all the data points
> collected in a set of rooms within a time period. We further assume that
> the possible queries are not known in advance; for example, the time period
> could be arbitrary, and not necessarily aligned to the hour/day.
> In a relational database, this use case is supported by a simple SQL
> query. The SQL server spends computation power, but network usage is
> minimal.
> In a plain repo, this would require the consumer to retrieve every data
> point from the repo, and run the aggregation operator locally. If the query
> covers a long time period, the number of Data retrieved could be on the order
> of 10^4. Isn't this a huge waste of network bandwidth?
> Of course, once an aggregation has been performed by someone, the result
> can be stored in a repo for future usage. But the first aggregation is very
> expensive.
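
To make the bandwidth concern above concrete, here is a minimal sketch in Python. It is not NDN code: the repo is modeled as an in-memory dictionary, and the names `build_repo`, `consumer_side_max`, and `repo_side_max` are hypothetical, introduced only for illustration. It simply counts how many data points would have to cross the network in each approach.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical stand-in for a repo: (room, unix_time) -> temperature reading.
Repo = Dict[Tuple[str, int], float]


def build_repo(rooms: Iterable[str], start: int, end: int, step: int) -> Repo:
    """Fill the toy repo with one synthetic reading per room per time step."""
    repo: Repo = {}
    for r_idx, room in enumerate(rooms):
        for t in range(start, end, step):
            repo[(room, t)] = 20.0 + r_idx + (t % 7) * 0.5
    return repo


def consumer_side_max(repo: Repo, rooms: Set[str], t0: int, t1: int) -> Tuple[float, int]:
    """Plain-repo approach: fetch every matching data point, aggregate locally.

    Returns (maximum temperature, number of data points transferred).
    """
    fetched = [v for (room, t), v in repo.items() if room in rooms and t0 <= t < t1]
    return max(fetched), len(fetched)


def repo_side_max(repo: Repo, rooms: Set[str], t0: int, t1: int) -> Tuple[float, int]:
    """Database-style approach: aggregate next to the storage, return one result.

    Returns (maximum temperature, number of data points transferred).
    """
    result = max(v for (room, t), v in repo.items() if room in rooms and t0 <= t < t1)
    return result, 1
```

With two rooms sampled once per time unit, a query over 50 time units transfers 100 data points in the first approach and a single result in the second; the gap grows linearly with the length of the queried period.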
>
> I can think about adding the aggregation operators into the repo. However,
> this would require the repo to understand application semantics, at a level
> much higher than understanding data retrieval pattern.
> At that level, has the "repo" stopped being a repo and become a distributed
> database?
>
> On the other hand, I could have a separate "database" application
> answering aggregation queries. In this case, the database application could
> easily provide individual data points as well. Then, is there any value to
> still have the repo, and store every data point twice?
>
>
> I would not add aggregation operators to the repo.  I would, however,
> have an aggregation app serving a slightly different namespace, that was
> able to locally access the data in the repo and provide the aggregated
> result to any clients requesting it.  That's going to be local, not
> network, I/O.  The aggregated response might be cached, possibly in the
> same repo as the raw measurements.  It doesn't require storing the data
> twice.
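
The suggestion above could be sketched roughly as follows. This is a toy model in Python under stated assumptions, not an implementation against any NDN library: the repo is an in-memory dictionary that the app reads locally, the name layout `/agg/max/<rooms>/<t0>/<t1>` is hypothetical, and the cache is a plain dictionary standing in for "store the aggregated result, possibly in the same repo".

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a repo: (room, unix_time) -> temperature reading.
Repo = Dict[Tuple[str, int], float]


class AggregationApp:
    """Toy aggregation app answering names under a separate, assumed prefix."""

    PREFIX = "/agg/max/"  # hypothetical namespace, distinct from the raw data

    def __init__(self, repo: Repo) -> None:
        self.repo = repo                   # accessed locally, no network I/O
        self.cache: Dict[str, float] = {}  # previously computed aggregates
        self.computed = 0                  # number of aggregations actually run

    def handle(self, name: str) -> float:
        """Answer a request named /agg/max/<room,room,...>/<t0>/<t1>."""
        if name in self.cache:
            return self.cache[name]
        if not name.startswith(self.PREFIX):
            raise ValueError("name is outside the aggregation namespace")
        rooms_part, t0_s, t1_s = name[len(self.PREFIX):].split("/")
        rooms = set(rooms_part.split(","))
        t0, t1 = int(t0_s), int(t1_s)
        values = [v for (room, t), v in self.repo.items()
                  if room in rooms and t0 <= t < t1]
        if not values:
            raise KeyError("no data points match the query")
        result = max(values)
        self.computed += 1
        self.cache[name] = result          # only the result is stored, not the raw data
        return result
```

In this sketch the raw measurements live only in the repo; the app keeps just the aggregated results, so repeated requests for the same name are answered without recomputation.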
>
> What if the data under the same namespace is scattered over multiple repos?

>
>
> Suggestions?
> Yours, Junxiao
> _______________________________________________
> Ndn-interest mailing list
> Ndn-interest at lists.cs.ucla.edu
> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>
>