<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 18, 2019, at 4:00 PM, Junxiao Shi &lt;<a href="mailto:shijunxiao@email.arizona.edu" class="">shijunxiao@email.arizona.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="auto" class=""><div dir="ltr" class="">Dear folks<div class=""><br class=""></div><div class="">I saw a new paper that came out last week: The Role of Data Repositories in Named Data Networking. <a href="https://ieeexplore.ieee.org/abstract/document/8756944" target="_blank" rel="noreferrer" class="">https://ieeexplore.ieee.org/abstract/document/8756944</a> </div><div class="">It explains the importance of data repositories as an architectural component, and how the repo could be used as storage for application data. The paper asserts that having a repo makes data available even if the data producer application is offline.<br class=""></div><div class=""><br class=""></div><div class="">While the above points are all correct, I have one doubt: how can aggregation queries be supported in this architecture?</div><div class=""><br class=""></div><div class="">I have a building sensing application.</div><div dir="auto" class="">Each data point is, for example, the temperature measurement in a certain room at a certain point in time. Using either passive or active data insertion, this Data could be stored in the repo. Then, anyone can ask for a single data point by expressing an Interest.</div><div dir="auto" class="">Paired with a (general-purpose) namespace enumeration protocol, it's also possible for a consumer to discover what Data are available in a repo.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">A common use case is an <b class="">aggregation query</b>. 
For example, a consumer wants to know the maximum temperature among all the data points collected in a set of rooms within a time period. We further assume that the possible queries are not known in advance; for example, the time period could be arbitrary, and not necessarily aligned to the hour/day.</div><div dir="auto" class="">In a relational database, this use case is supported by a simple SQL query. The SQL server spends computing power, but network usage is minimal.</div><div dir="auto" class="">In a plain repo, this would require the consumer to retrieve every data point from the repo, and run the aggregation operator locally. If the query covers a long time period, the number of Data retrieved could be on the order of 10^4. Isn't this a huge waste of network bandwidth?</div><div dir="auto" class="">Of course, once an aggregation has been performed by someone, the result can be stored in a repo for future use. But the first aggregation is very expensive.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">I can think of adding aggregation operators to the repo. However, this would require the repo to understand application semantics, at a level much higher than understanding data retrieval patterns.</div><div dir="auto" class="">At that level, has the "repo" stopped being a repo and become a distributed database?</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">On the other hand, I could have a separate "database" application answering aggregation queries. In this case, the database application could easily provide individual data points as well. Then, is there still any value in having the repo and storing every data point twice?</div></div></div></div></blockquote><div><br class=""></div>I would not add aggregation operators to the repo.  
I would, however, have an aggregation app serving a slightly different namespace that can access the data in the repo locally and provide the aggregated result to any client requesting it.  That's going to be local I/O, not network I/O.  The aggregated response might be cached, possibly in the same repo as the raw measurements.  It doesn't require storing the data twice.</div><div><br class=""></div><div><br class=""></div><div><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div dir="ltr" class=""><div class=""><br class=""></div><div dir="auto" class="">Suggestions?</div><div class="">Yours, Junxiao</div></div></div>

_______________________________________________<br class="">Ndn-interest mailing list<br class=""><a href="mailto:Ndn-interest@lists.cs.ucla.edu" class="">Ndn-interest@lists.cs.ucla.edu</a><br class="">http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest<br class=""></div></blockquote></div><br class=""></body></html>