From felix at rabe.io Tue Sep 2 14:19:02 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 02 Sep 2014 14:19:02 -0700 Subject: [Ndn-interest] Dockerfile for NFD on Ubuntu In-Reply-To: <8AF85944-F5FD-40AD-8E59-02E8706B2B5A@ucla.edu> References: <5403C70B.8020103@rabe.io> <8AF85944-F5FD-40AD-8E59-02E8706B2B5A@ucla.edu> Message-ID: <54063446.8000005@rabe.io> Greetings from around the corner :) As I see no way to edit that Wiki page (I'm logged in to Redmine), I've just written a bit of a writeup about the most important Docker commands and how to use the image in the attached file. Feel free to put the content on the wiki. - Felix On 01/Sep/14 03:38, Alex Afanasyev wrote: > Cool! > > When you have time, can you also write some basic steps on how one would start working with Docker? For example, on this wiki page: http://redmine.named-data.net/projects/nfd/wiki/Using_NFD_with_Docker?parent=Wiki (I already linked it from main NFD wiki) > > I have only read briefly about Docker and others probably also have very limited knowledge about it. But from what I get, it could be a very convenient way to deploy NFD on any Linux distribution without the need to tailor the binary packages for each. > > --- > Alex > > On Aug 31, 2014, at 6:08 PM, Felix Rabe wrote: > >> I've just managed to install NFD on Ubuntu inside Docker, it was straightforward thanks to the instructions on http://named-data.net/doc/NFD/0.2.0/INSTALL.html. >> >> Dockerfile: >> ====== >> FROM ubuntu:14.04 >> >> RUN apt-get update >> RUN apt-get install -y software-properties-common >> >> RUN add-apt-repository -y ppa:named-data/ppa >> RUN apt-get update >> RUN apt-get install -y nfd >> ====== >> >> If you want man pages, throw an `apt-get install -y man` in there. >> >> Caveat: Have not actually run the software :) will do that on my next opportunity before the meeting.
>> >> - Felix >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- # Using NFD with Docker Docker is a command-line frontend to operating system-level virtualization solutions such as Linux containers (LXC). This allows running application processes in isolation without the overhead of a more traditional VM: for example, there is no (operating system) boot process involved. For more information, see https://www.docker.com/ and http://en.wikipedia.org/wiki/Docker_(software). ## Step 1: Install Docker Go to https://docs.docker.com/installation/ for installation instructions. Docker runs natively on (recent) Linux kernels, and via a VM on Windows and OS X. ## Step 2: Save the Dockerfile Create a new directory and put the following in a file called `Dockerfile`: # This is based on the instructions from: # http://named-data.net/doc/NFD/0.2.0/INSTALL.html#install-nfd-using-the-ndn-ppa-repository-on-ubuntu-linux FROM ubuntu:14.04 RUN apt-get update RUN apt-get install -y software-properties-common RUN add-apt-repository -y ppa:named-data/ppa RUN apt-get update RUN apt-get install -y nfd The syntax is documented at http://docs.docker.com/reference/builder/. ## Step 3: Build the image Open a terminal, change to the new directory created above, and execute the following command to build an image called `named_data/nfd`: docker build -t named_data/nfd . This is similar to a compilation step to transform source code (the Dockerfile) into executable code (the Docker image). If you are new to Docker, the first time you run that command, it will pull the Ubuntu base image, which will take some time. Later rebuilds happen fast, thanks to Docker's snapshotting. ## Step 4: Run a shell from the image To start a process (create a Docker container), you use the `docker run` command.
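For actually running NFD (as opposed to an exploratory shell), a container is usually started in the background with `-d`. The following is an untested sketch: the image name comes from Step 3, but `/usr/bin/nfd` as a long-running foreground command is an assumption about what the `nfd` package installs.

```shell
# Untested sketch -- the entry point path is an assumption:
docker run -d --name nfd named_data/nfd /usr/bin/nfd

# Check that the container is still running, and look at its output:
docker ps
docker logs nfd

# Tear it down again when done:
docker stop nfd
docker rm nfd
```

The named container (`--name nfd`) can then also be referenced by the `--link` and `--volumes-from` arguments listed in the summary below.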
As you will probably want to explore the container first, here is the usual way to start a Bash process that leaves no traces (`--rm` removes the container after it exits): docker run --rm -ti named_data/nfd /bin/bash You can find the full documentation of the `docker run` command at https://docs.docker.com/reference/run/, and the full command-line reference at https://docs.docker.com/reference/commandline/cli/. ## Next steps There is an interactive Docker tutorial at https://www.docker.com/tryit/, and more documentation at https://docs.docker.com/. `docker help [command]` is also helpful. Docker works best if a container runs only one process at a time, such as NFD. Bash is usually only used for exploration. To trim down the image, consider using Debian (90 MB) or Busybox (2.5 MB) as a base image instead of Ubuntu (225 MB). (There are currently no instructions for these base images, as these distributions are not supported / tested by the Named Data project.) (TODO: Push a trusted build to https://registry.hub.docker.com/ so others can directly pull the pre-built image.) ## Summary of the Docker command line This section lists the `docker` commands and arguments that are most commonly used. ### Commands build Build an image from a Dockerfile (`docker build -t imageName directory`) run Run a command in a new container (`docker run --rm -ti ubuntu:14.04`) ### `docker run` arguments: general Arguments marked with `*` can also be defined in the `Dockerfile`. -d Run in the background (and use `docker ps/logs/stop/kill`) -i Keep STDIN open (together with `-t`, for Bash) --name containerName Give a container a name (for `--link` and `--volumes-from`) --rm Remove the container after it exits -t Allocate a pseudo-TTY (together with `-i`, for Bash) ### `docker run` arguments: networking --expose port * Expose a port for use with `--link` --link otherContainer:alias Link to exposed ports of another container (sets env.
vars) -p hostPort:containerPort Publish a TCP port to the host ### `docker run` arguments: filesystem -v /containerPath * Make a mount point available without content (for `--volumes-from`) -v /hostPath:/containerPath Mount a host path into a volume (mount point, also for `--volumes-from`) --volumes-from name Mount volumes from another container From felix at rabe.io Tue Sep 2 14:34:24 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 02 Sep 2014 14:34:24 -0700 Subject: [Ndn-interest] Dockerfile for NFD on Ubuntu In-Reply-To: <54063446.8000005@rabe.io> References: <5403C70B.8020103@rabe.io> <8AF85944-F5FD-40AD-8E59-02E8706B2B5A@ucla.edu> <54063446.8000005@rabe.io> Message-ID: <540637E0.3010707@rabe.io> Just noticed that the Wikipedia link is broken, fixed in the new attachment. On 02/Sep/14 14:19, Felix Rabe wrote: > Greetings from around the corner :) > > As I see no way to edit that Wiki page (I'm logged in to Redmine), > I've just written a bit of a writeup about the most important Docker > commands and how to use the image in the attached file. Feel free to > put the content on the wiki. > > - Felix > > On 01/Sep/14 03:38, Alex Afanasyev wrote: >> Cool! >> >> When you have time, can you also write some basic steps on how one >> would start working with Docker? For example, on this wiki page: >> http://redmine.named-data.net/projects/nfd/wiki/Using_NFD_with_Docker?parent=Wiki >> (I already linked it from main NFD wiki) >> >> I have only read briefly about the Docker and others probably also >> have very limited knowledge about it. But from what I get, it could >> be very convenient way to deploy NFD on any linux distribution >> without the need to tailor the binary packages for each. >> >> --- >> Alex >> >> On Aug 31, 2014, at 6:08 PM, Felix Rabe wrote: >> >>> I've just managed to install NFD on Ubuntu inside Docker, it was >>> straightforward thanks to the instructions on >>> http://named-data.net/doc/NFD/0.2.0/INSTALL.html.
>>> >>> Dockerfile: >>> ====== >>> FROM ubuntu:14.04 >>> >>> RUN apt-get update >>> RUN apt-get install -y software-properties-common >>> >>> RUN add-apt-repository -y ppa:named-data/ppa >>> RUN apt-get update >>> RUN apt-get install -y nfd >>> ====== >>> >>> If you want man pages, throw an `apt-get install -y man` in there. >>> >>> Caveat: Have not actually run the software :) will do that on my >>> next opportunity before the meeting. >>> >>> - Felix >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- # Using NFD with Docker Docker is a command-line frontend to operating system-level virtualization solutions such as Linux containers (LXC). This allows running application processes in isolation, just like a VM, but without the overhead of a traditional VM: for example, there is no (operating system) boot process involved. For more information, see https://www.docker.com/ and http://en.wikipedia.org/wiki/Docker_%28software%29. ## Step 1: Install Docker Go to https://docs.docker.com/installation/ for installation instructions. Docker runs natively on (recent) Linux kernels, and via a VM on Windows and OS X.
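Before moving on, it can be worth checking that the Docker client can actually reach the daemon. A small smoke test (`hello-world` is Docker's official test image; it just prints a greeting and exits):

```shell
# Show client and daemon versions; failure here usually means
# the Docker daemon is not running or not reachable:
docker version

# Pull and run the tiny official test image:
docker run --rm hello-world
```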
## Step 2: Save the Dockerfile Create a new directory and put the following in a file called `Dockerfile`: # This is based on the instructions from: # http://named-data.net/doc/NFD/0.2.0/INSTALL.html#install-nfd-using-the-ndn-ppa-repository-on-ubuntu-linux FROM ubuntu:14.04 RUN apt-get update RUN apt-get install -y software-properties-common RUN add-apt-repository -y ppa:named-data/ppa RUN apt-get update RUN apt-get install -y nfd The syntax is documented at http://docs.docker.com/reference/builder/. ## Step 3: Build the image Open a terminal, change to the new directory created above, and execute the following command to build an image called `named_data/nfd`: docker build -t named_data/nfd . This is similar to a compilation step to transform source code (the Dockerfile) into executable code (the Docker image). If you are new to Docker, the first time you run that command, it will pull the Ubuntu base image, which will take some time. Later rebuilds happen fast, thanks to Docker's snapshotting. ## Step 4: Run a shell from the image To start a process (create a Docker container), you use the `docker run` command. As you will probably want to explore the container first, here is the usual way to start a Bash process that leaves no traces (`--rm` removes the container after it exits): docker run --rm -ti named_data/nfd /bin/bash You can find the full documentation of the `docker run` command at https://docs.docker.com/reference/run/, and the full command-line reference at https://docs.docker.com/reference/commandline/cli/. ## Next steps There is an interactive Docker tutorial at https://www.docker.com/tryit/, and more documentation at https://docs.docker.com/. `docker help [command]` is also helpful. Docker works best if a container runs only one process at a time, such as NFD. Bash is usually only used for exploration. To trim down the image, consider using Debian (90 MB) or Busybox (2.5 MB) as a base image instead of Ubuntu (225 MB).
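Independently of the base image, each `RUN` line in a Dockerfile adds a filesystem layer, so some size can also be saved by chaining the apt steps into a single `RUN` and dropping the apt cache. An untested variant of the Dockerfile from Step 2:

```dockerfile
# Untested sketch: same packages as Step 2, one layer, apt cache removed.
FROM ubuntu:14.04
RUN apt-get update \
 && apt-get install -y software-properties-common \
 && add-apt-repository -y ppa:named-data/ppa \
 && apt-get update \
 && apt-get install -y nfd \
 && rm -rf /var/lib/apt/lists/*
```

The trade-off: a single layer rebuilds from scratch on any change, losing some of the snapshotting benefit mentioned in Step 3.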
(There are currently no instructions for these base images, as these distributions are not supported / tested by the Named Data project.) (TODO: Push a trusted build to https://registry.hub.docker.com/ so others can directly pull the pre-built image.) ## Summary of the Docker command line This section lists the `docker` commands and arguments that are most commonly used. ### Commands build Build an image from a Dockerfile (`docker build -t imageName directory`) run Run a command in a new container (`docker run --rm -ti ubuntu:14.04`) ### `docker run` arguments: general Arguments marked with `*` can also be defined in the `Dockerfile`. -d Run in the background (and use `docker ps/logs/stop/kill`) -i Keep STDIN open (together with `-t`, for Bash) --name containerName Give a container a name (for `--link` and `--volumes-from`) --rm Remove the container after it exits -t Allocate a pseudo-TTY (together with `-i`, for Bash) ### `docker run` arguments: networking --expose port * Expose a port for use with `--link` --link otherContainer:alias Link to exposed ports of another container (sets env.
vars) -p hostPort:containerPort Publish a TCP port to the host ### `docker run` arguments: filesystem -v /containerPath * Make a mount point available without content (for `--volumes-from`) -v /hostPath:/containerPath Mount a host path into a volume (mount point, also for `--volumes-from`) --volumes-from name Mount volumes from another container From felix at rabe.io Tue Sep 2 15:36:22 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 02 Sep 2014 15:36:22 -0700 Subject: [Ndn-interest] Running NFD on a (VirtualBox) VM using Vagrant (Docker setup FAILS right now) Message-ID: <54064666.8020301@rabe.io> Hi I need to work more on the Docker-based installation method, it currently fails with the following error: felix-mba:docker-images fr$ docker run --rm -ti nfd root@cb92f60fb4c0:/# nfd-start root@cb92f60fb4c0:/# nfd-status ERROR: error while connecting to the forwarder (No such file or directory) In the meantime, I've got NFD running on Ubuntu 14.04 inside a VM using Vagrant: https://github.com/named-data-education/vagrant Whereas Docker is a frontend to Linux Containers, Vagrant is a frontend to VMs such as VirtualBox or VMware. You can get it at http://www.vagrantup.com/. Quick start once you have Vagrant (and Git) installed: (This should work identically on OS X, Linux, Windows) git clone https://github.com/named-data-education/vagrant.git cd vagrant vagrant up vagrant ssh # and inside the VM: sudo bash nfd-status If you need help, `vagrant -h` is your friend.
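For orientation, a Vagrantfile that provisions NFD the same way as the Docker instructions would look roughly like this. This is a hypothetical sketch, not the actual file shipped in the named-data-education/vagrant repository:

```ruby
# Hypothetical sketch -- the repository ships its own Vagrantfile;
# this just illustrates the mechanism Vagrant uses.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"   # Ubuntu 14.04
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y software-properties-common
    add-apt-repository -y ppa:named-data/ppa
    apt-get update
    apt-get install -y nfd
  SHELL
end
```

`vagrant up` reads this file, downloads the base box if needed, boots the VM, and runs the inline shell provisioner once.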
- Felix From felix at rabe.io Tue Sep 2 16:19:44 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 02 Sep 2014 16:19:44 -0700 Subject: [Ndn-interest] Running NFD on a (VirtualBox) VM using Vagrant (Docker setup FAILS right now) In-Reply-To: <54064666.8020301@rabe.io> References: <54064666.8020301@rabe.io> Message-ID: <54065090.7060701@rabe.io> Regarding the broken Docker setup, I've created a repository and an issue: https://github.com/named-data-education/ndn-with-docker/issues/1 Feel free to comment if you know Docker and have hints how to solve this. (I will investigate this later myself.) On 02/Sep/14 15:36, Felix Rabe wrote: > Hi > > I need to work more on the Docker-based installation method, it > currently fails with the following error: > > felix-mba:docker-images fr$ docker run --rm -ti nfd > root at cb92f60fb4c0:/# nfd-start > root at cb92f60fb4c0:/# nfd-status > ERROR: error while connecting to the forwarder (No such file or > directory) > > In the meantime, I've got NFD running on Ubuntu 14.04 inside a VM > using Vagrant: > > https://github.com/named-data-education/vagrant > > Whereas Docker is a frontend to Linux Containers, Vagrant is a > frontend to VMs such as VirtualBox or VMWare. You can get it at > http://www.vagrantup.com/. > > Quick start once you have Vagrant (and Git) installed: (This should > work identically on OS X, Linux, Windows) > > git clone https://github.com/named-data-education/vagrant.git > cd vagrant > vagrant up > vagrant ssh > # and inside the VM: > sudo bash > nfd-status > > If you need help, `vagrant -h` is your friend. 
> > - Felix > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From shijunxiao at email.arizona.edu Tue Sep 2 16:32:05 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Tue, 2 Sep 2014 16:32:05 -0700 Subject: [Ndn-interest] Dockerfile for NFD on Ubuntu In-Reply-To: <54063446.8000005@rabe.io> References: <5403C70B.8020103@rabe.io> <8AF85944-F5FD-40AD-8E59-02E8706B2B5A@ucla.edu> <54063446.8000005@rabe.io> Message-ID: Hi Felix I edited NFD wiki front page to link to your GitHub repositories. You can upload the document as README.md in the repository, so that readers can see it. Yours, Junxiao On Tue, Sep 2, 2014 at 2:19 PM, Felix Rabe wrote: > Greetings from around the corner :) > > As I see no way to edit that Wiki page (I'm logged in to Redmine), I've > just written a bit of a writeup about the most important Docker commands > and how to use the image in the attached file. Feel free to put the content > on the wiki. > > - Felix > > > On 01/Sep/14 03:38, Alex Afanasyev wrote: > >> Cool! >> >> When you have time, can you also write some basic steps on how one would >> start working with Docker? For example, on this wiki page: >> http://redmine.named-data.net/projects/nfd/wiki/Using_NFD_ >> with_Docker?parent=Wiki (I already linked it from main NFD wiki) >> >> I have only read briefly about the Docker and others probably also have >> very limited knowledge about it. But from what I get, it could be very >> convenient way to deploy NFD on any linux distribution without the need to >> tailor the binary packages for each. >> >> --- >> Alex >> >> On Aug 31, 2014, at 6:08 PM, Felix Rabe wrote: >> >> I've just managed to install NFD on Ubuntu inside Docker, it was >>> straightforward thanks to the instructions on >>> http://named-data.net/doc/NFD/0.2.0/INSTALL.html. 
>>> >>> Dockerfile: >>> ====== >>> FROM ubuntu:14.04 >>> >>> RUN apt-get update >>> RUN apt-get install -y software-properties-common >>> >>> RUN add-apt-repository -y ppa:named-data/ppa >>> RUN apt-get update >>> RUN apt-get install -y nfd >>> ====== >>> >>> If you want man pages, throw an `apt-get install -y man` in there. >>> >>> Caveat: Have not actually run the software :) will do that on my next >>> opportunity before the meeting. >>> >>> - Felix >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix at rabe.io Tue Sep 2 16:53:08 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 02 Sep 2014 16:53:08 -0700 Subject: [Ndn-interest] Dockerfile for NFD on Ubuntu In-Reply-To: References: <5403C70B.8020103@rabe.io> <8AF85944-F5FD-40AD-8E59-02E8706B2B5A@ucla.edu> <54063446.8000005@rabe.io> Message-ID: <54065864.7050802@rabe.io> Thanks, done. On 02/Sep/14 16:32, Junxiao Shi wrote: > Hi Felix > > I edited NFD wiki front page > to link to your > GitHub repositories. > You can upload the document as README.md in the repository, so that > readers can see it. > > Yours, Junxiao > > > On Tue, Sep 2, 2014 at 2:19 PM, Felix Rabe > wrote: > > Greetings from around the corner :) > > As I see no way to edit that Wiki page (I'm logged in to Redmine), > I've just written a bit of a writeup about the most important > Docker commands and how to use the image in the attached file. > Feel free to put the content on the wiki. > > - Felix > > > On 01/Sep/14 03:38, Alex Afanasyev wrote: > > Cool! 
> > When you have time, can you also write some basic steps on how > one would start working with Docker? For example, on this > wiki page: > http://redmine.named-data.net/projects/nfd/wiki/Using_NFD_with_Docker?parent=Wiki > (I already linked it from main NFD wiki) > > I have only read briefly about the Docker and others probably > also have very limited knowledge about it. But from what I > get, it could be very convenient way to deploy NFD on any > linux distribution without the need to tailor the binary > packages for each. > > --- > Alex > > On Aug 31, 2014, at 6:08 PM, Felix Rabe > wrote: > > I've just managed to install NFD on Ubuntu inside Docker, > it was straightforward thanks to the instructions on > http://named-data.net/doc/NFD/0.2.0/INSTALL.html. > > Dockerfile: > ====== > FROM ubuntu:14.04 > > RUN apt-get update > RUN apt-get install -y software-properties-common > > RUN add-apt-repository -y ppa:named-data/ppa > RUN apt-get update > RUN apt-get install -y nfd > ====== > > If you want man pages, throw an `apt-get install -y man` > in there. > > Caveat: Have not actually run the software :) will do that > on my next opportunity before the meeting. > > - Felix > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From 0023 at naver.com Tue Sep 2 22:09:25 2014 From: 0023 at naver.com (=?UTF-8?B?7LWc7ZW07J28?=) Date: Wed, 3 Sep 2014 14:09:25 +0900 (KST) Subject: [Ndn-interest] =?utf-8?q?I_have_some_questions_of_NDN=2E?= In-Reply-To: <4bb2beaf1be3421cfe8b619acd112bf9@cweb10.nm.nhnsystem.com> References: <217039accbf4736317d59b9c578eccd@cweb08.nm.nhnsystem.com> <4bb2beaf1be3421cfe8b619acd112bf9@cweb10.nm.nhnsystem.com> Message-ID: Dear NDN. Hello. My name is Haeil Choi. I study networking at Kookmin graduate school in Korea. I'm interested in the future internet, so I read your paper on NDN: - Networking Named Content by V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, R. L. Braynard, CoNEXT 2009, Rome, December 2009. I have some questions, so I am writing this e-mail. This paper is from 5 years ago, and I do not know what has changed since then. I would appreciate your response. 1. I want to know what the IP+CCN Router in section 4.1 is. 2. In figure 6, if routers have both IP and CCN functions, does IP Router C know its adjacent Routers A, B, E, F? How does it know that? 3. I summarized the problem related to figure 6 in the attached ppt file. Is my description accurate? 4. I understand that CCN doesn't have source and destination addresses. How does the IP Router C find A and B for an Interest packet sent from F? 5. I want to simulate the same conditions as above in NS3, but I don't know how to design an IP+CCN router in NS3. I wonder if I could get related data and ndnSIM source. I am looking forward to hearing from you. Thank you for your time. Sincerely, Haeil Choi. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: Guestion of ndn.pptx Type: application/vnd.openxmlformats-officedocument.presentationml.presentation Size: 46157 bytes Desc: not available URL: From tailinchu at gmail.com Tue Sep 2 23:35:54 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Tue, 2 Sep 2014 23:35:54 -0700 Subject: [Ndn-interest] I have some questions of NDN. In-Reply-To: References: <217039accbf4736317d59b9c578eccd@cweb08.nm.nhnsystem.com> <4bb2beaf1be3421cfe8b619acd112bf9@cweb10.nm.nhnsystem.com> Message-ID: hi, CCN has now been renamed to NDN, and a lot of changes have happened in five years. Here is a good reading list to keep up with those changes: http://www.caida.org/workshops/ndn/1409/ I hope this helps. Best, Tai-Lin Chu On Tue, Sep 2, 2014 at 10:09 PM, ??? <0023 at naver.com> wrote: > > > Dear NDN. > > > > Hello. My name is Haeil Choi. I study Networking at Kookmin graduate > school from Korea. > > I'm interesting the future internet. so I read the your paper NDN. > > - Networking Named Content by V. Jacobson, D. K. Smetters, J. D. > Thornton, M. F. Plass, N. H. Briggs, R. L. Braynard CoNEXT 2009, Rome, > December, 2009. > > > > I have some questions. so I write this e-mail. > > This paper was 5 years ago, I do not know what is changed now. I would > appreciate your response. > > > > 1. I want to know what is IP+CCN Router in section 4.1. > > > > 2. In figure 6, If Routers have both IP function and CCN function, Does IP > Router C know adjacency Router A, B, E, F? How does it know that? > > > > 3. I organized the problem related to figure 6 in attached ppt file. Do I > show you the exact description? > > > > 4. I understand that CCN don't have source and destination. How does the > IP Router C find the A and B in Interest packet sent from the F? > > > > 5. I want to simulate the same conditions as above in NS3. I don't know > how can I designed IP+CCN router in NS3. > > I wonder if I could get a related data and ndnSIM source. > > > > I am looking forward to hearing from you.
> > Thank you for your time. > > > > Sincerely, > > > > Haeil Choi. > > > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 0023 at naver.com Wed Sep 3 00:02:33 2014 From: 0023 at naver.com (=?UTF-8?B?7LWc7ZW07J28?=) Date: Wed, 3 Sep 2014 16:02:33 +0900 (KST) Subject: [Ndn-interest] =?utf-8?q?How_can_you_operate_with_ccn_and_ip=3F?= Message-ID: <47d197bd91908e967e3ac063d0174a@cweb15.nm.nhnsystem.com> I think it was difficult for you to answer a lot of questions at once, so I will ask the question I most want to know about. When CCN routers are deployed only partially, I think they must be able to work together with the current routers. Could IP routers (with a routing protocol like OSPF) operate with CCN routers (NDN)? I would appreciate an example or a related paper. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Wed Sep 3 00:30:48 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Wed, 3 Sep 2014 00:30:48 -0700 Subject: [Ndn-interest] How can you operate with ccn and ip? In-Reply-To: <47d197bd91908e967e3ac063d0174a@cweb15.nm.nhnsystem.com> References: <47d197bd91908e967e3ac063d0174a@cweb15.nm.nhnsystem.com> Message-ID: yes. [NLSR](https://github.com/named-data/NLSR) [nfd](https://github.com/named-data/NFD) On Wed, Sep 3, 2014 at 12:02 AM, ??? <0023 at naver.com> wrote: > I think you were so difficult to a lot of questions at once. > > > > so I ask a question I most want to know. > > > > When CCN Router is performed partially, I think the Current-Router must be > able to work together. > > > > Could IP routers(like OSPF routing protocol) operate with CCN Routers(NDN)? > > > > I would appreciate you giving an example or involved paper.
> > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Wed Sep 3 00:34:08 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Wed, 3 Sep 2014 00:34:08 -0700 Subject: [Ndn-interest] How can you operate with ccn and ip? In-Reply-To: References: <47d197bd91908e967e3ac063d0174a@cweb15.nm.nhnsystem.com> Message-ID: Also I assume that you mean "routing algorithm" instead of "IP router". On Wed, Sep 3, 2014 at 12:30 AM, Tai-Lin Chu wrote: > yes. > > [NLSR](https://github.com/named-data/NLSR) > [nfd](https://github.com/named-data/NFD) > > > On Wed, Sep 3, 2014 at 12:02 AM, ??? <0023 at naver.com> wrote: > >> I think you were so difficult to a lot of questions at once. >> >> >> >> so I ask a question I most want to know. >> >> >> >> When CCN Router is performed partially, I think the Current-Router must >> be able to work together. >> >> >> >> Could IP routers(like OSPF routing protocol) operate with CCN >> Routers(NDN)? >> >> >> >> I would appreciate you giving an example or involved paper. >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lanwang at memphis.edu Wed Sep 3 06:49:11 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Wed, 3 Sep 2014 13:49:11 +0000 Subject: [Ndn-interest] How can you operate with ccn and ip? In-Reply-To: <47d197bd91908e967e3ac063d0174a@cweb15.nm.nhnsystem.com> References: <47d197bd91908e967e3ac063d0174a@cweb15.nm.nhnsystem.com> Message-ID: If you are asking whether IP routers and NDN routers can interoperate, I think the answer is no. Lan On Sep 3, 2014, at 2:02 AM, ???
<0023 at naver.com> wrote: I think you were so difficult to a lot of questions at once. so I ask a question I most want to know. When CCN Router is performed partially, I think the Current-Router must be able to work together. Could IP routers(like OSPF routing protocol) operate with CCN Routers(NDN)? I would appreciate you giving an example or involved paper. _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From jburke at remap.ucla.edu Wed Sep 3 15:51:54 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Wed, 3 Sep 2014 22:51:54 +0000 Subject: [Ndn-interest] Launch of NDN Consortium Message-ID: Hello everyone, We're happy to announce the launch of the NDN Consortium per the press release below. For more information or to participate, please see http://named-data.net/consortium/ - The NDN Project Team ________________________________ UCLA-led consortium to focus on developing a new architecture for the Internet Universities will collaborate with Verisign, Cisco, Panasonic and other corporations to advance Named Data Networking protocol. September 3, 2014 Link to online version Launching a critical new phase in developing the Internet of the future, UCLA will host a consortium of universities and leading technology companies to promote the development and adoption of Named Data Networking. NDN is an emerging Internet architecture that promises to increase network security, accommodate growing bandwidth requirements and simplify the creation of increasingly sophisticated applications.
The consortium is being organized by a team of NDN researchers at the UCLA Henry Samueli School of Engineering and Applied Science. Other founding academic members of the NDN project are UC San Diego, Colorado State University, the University of Arizona, the University of Illinois Urbana-Champaign, the University of Memphis, the University of Michigan and Washington University in St. Louis. The first NDN community meeting will be held Sept. 4 and 5 at UCLA's School of Theater, Film and Television, which has played a key role in envisioning the future of human communication over NDN since the project's origins in 2010. Among the industry partners planning to participate are Verisign, Cisco Systems and Panasonic. They will be joined by representatives from Anyang University (Korea), Tongji University and Tsinghua University (China), the University of Basel (Switzerland) and Waseda University (Japan). "Collaboration with industry is an important step toward bringing Future Internet Architectures out of the laboratory and into the real world," said Darleen Fisher, the NSF program officer who oversees the Future Internet Architectures program supporting NDN. The NDN team's goal is to build a replacement for Transmission Control Protocol/Internet Protocol, or TCP/IP, the current underlying approach to all communication over the Internet. The consortium aims to generate a vibrant ecosystem of research and experimentation around NDN; preserve and promote the openness of the core NDN architecture; and organize community meetings, workshops and other activities. "NDN has built significant momentum through a commitment to an open approach that aims to limit proprietary intellectual property claims on core elements of the architecture," said Lixia Zhang, UCLA's Jonathan B. Postel Chair in Computer Science and a co-leader of the project. "This has spurred substantial interest from both academia and industry.
Our goal with the consortium is to accelerate the development of an architecture that will lift the Internet from its origins as a messaging and information tool and better prepare it for the wide-ranging uses it has today and will have tomorrow," Zhang said. NDN leverages empirical evidence about what has worked on the Internet and what hasn't, adapting to changes in usage over the past 30-plus years and simplifying the foundation for development of mobile platforms, smart cars and the Internet of Things, in which objects and devices are equipped with embedded software and are able to communicate with wireless digital networks. Since 2010, the National Science Foundation's Future Internet Architectures program has provided more than $13.5 million to the NDN project led by UCLA, including a grant of $5 million that was announced in May. "Cisco Systems is enthusiastic about the formation of the NDN community," said David Oran, a Cisco Fellow and a pioneer in Internet Protocol technologies. "It will help evolve NDN by establishing a multifaceted community of academics, industry and users. We expect this consortium to be a major help in advancing the design, producing open-source software, and fostering standardization and adoption of the technology." The NDN project is co-led by Zhang and Van Jacobson, a UCLA adjunct professor and member of the Internet Hall of Fame. UCLA became the birthplace of the Internet in 1969, when a message from the lab of UCLA computer science professor Leonard Kleinrock was sent to the Stanford Research Institute, the first-ever message transmitted over the network that later became known as the Internet. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ignacio.Solis at parc.com Thu Sep 4 01:18:36 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Thu, 4 Sep 2014 08:18:36 +0000 Subject: [Ndn-interest] I have some questions of NDN.
In-Reply-To: References: <217039accbf4736317d59b9c578eccd@cweb08.nm.nhnsystem.com> <4bb2beaf1be3421cfe8b619acd112bf9@cweb10.nm.nhnsystem.com> Message-ID: Hi Tai-Lin Chu, Just to clarify: CCN has not been renamed NDN. CCN is the Content Centric Networking project (and protocol) by PARC. You can find the project home page at www.ccnx.org. NDN is a separate project funded by NSF that was originally based on CCN but has now forked in code and functionality. This is the NDN interest list. Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/2/14, 11:35 PM, "Tai-Lin Chu" > wrote: hi, CCN has now been renamed to NDN, and a lot of changes have happened in five years. Here is a good reading list to keep up with those changes: http://www.caida.org/workshops/ndn/1409/ I hope this helps. Best, Tai-Lin Chu On Tue, Sep 2, 2014 at 10:09 PM, Haeil Choi <0023 at naver.com> wrote: Dear NDN, Hello. My name is Haeil Choi. I study networking in the Kookmin University graduate school in Korea. I'm interested in the future Internet, so I read your NDN paper: Networking Named Content, by V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, R. L. Braynard, CoNEXT 2009, Rome, December 2009. I have some questions, so I am writing this e-mail. That paper is 5 years old, and I do not know what has changed since then; I would appreciate your response. 1. I want to know what the IP+CCN Router in section 4.1 is. 2. In figure 6, if routers have both IP and CCN functions, does IP Router C know its adjacent Routers A, B, E and F? How does it know that? 3. I described the problem related to figure 6 in the attached ppt file. Is my description accurate? 4. I understand that CCN doesn't have source and destination addresses. How does IP Router C find A and B for an Interest packet sent from F? 5. I want to simulate the same conditions as above in NS3, but I don't know how to design an IP+CCN router in NS3.
I wonder if I could get the related data and the ndnSIM source. I am looking forward to hearing from you. Thank you for your time. Sincerely, Haeil Choi. _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From davide.pesavento at lip6.fr Thu Sep 4 11:46:33 2014 From: davide.pesavento at lip6.fr (Davide Pesavento) Date: Thu, 4 Sep 2014 11:46:33 -0700 Subject: [Ndn-interest] NFD usage survey - please participate Message-ID: Hi all, In order to better understand how NFD and the ndn-cxx library are used by the community, we (the NFD dev team) have prepared a very short survey that you can find at the following link: https://docs.google.com/forms/d/17cRq_c5a1OzBm5glGOthn5pk_wSANNWrWZtuT_HQv0U/viewform?usp=send_form The results will influence some design choices and will tell us where to focus our future development efforts. We encourage as many people as possible to participate in the survey. Even if you do not currently use NFD but plan to do so in the future, let us know what your needs are. Please submit your responses before Tue 9/9/2014 23:59:59 UTC. Thanks for your participation. On behalf of the NFD team, Davide Pesavento From ithkuil at gmail.com Thu Sep 4 19:00:58 2014 From: ithkuil at gmail.com (Jason Livesay) Date: Thu, 4 Sep 2014 19:00:58 -0700 Subject: [Ndn-interest] Ethereum Message-ID: Hello. Is there any similarity between Ethereum and NDN? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shijunxiao at email.arizona.edu Thu Sep 4 23:13:23 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Thu, 4 Sep 2014 23:13:23 -0700 Subject: [Ndn-interest] Ethereum In-Reply-To: References: Message-ID: Hi Jason I briefly read the Ethereum white paper. It's an interesting document.
I feel that the two solve different problems: * Ethereum is a blockchain system to carry out transactions. * NDN is a network protocol to distribute content. That said, it is feasible to run Ethereum as an application on an NDN network. It is also possible to define an NDN trust model with Ethereum contracts. Yours, Junxiao On Sep 4, 2014 9:01 PM, "Jason Livesay" wrote: > Hello. Is there any similarity between Ethereum and NDN? Thanks. > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix at rabe.io Fri Sep 5 10:54:49 2014 From: felix at rabe.io (Felix Rabe) Date: Fri, 05 Sep 2014 10:54:49 -0700 Subject: [Ndn-interest] Email over NDN Message-ID: <5409F8E9.4030007@rabe.io> Hi This morning I was having a short discussion with someone about email over NDN and how current protocols are ill-suited to secure replying and forwarding of signed messages. I can't remember who you were; please quickly drop me a line to felix at rabe.io. I'd like to continue the discussion. - Felix From wentaoshang at gmail.com Fri Sep 5 12:02:40 2014 From: wentaoshang at gmail.com (Wentao Shang) Date: Fri, 5 Sep 2014 12:02:40 -0700 Subject: [Ndn-interest] Email over NDN In-Reply-To: <5409F8E9.4030007@rabe.io> References: <5409F8E9.4030007@rabe.io> Message-ID: Hi Felix, I'm the guy you talked to ;) My group at UCLA had a few discussions on this topic before. I'd like to exchange ideas with you either through this mailing list or privately. Best, Wentao On Fri, Sep 5, 2014 at 10:54 AM, Felix Rabe wrote: > Hi > > This morning I was having a short discussion with someone about email over > NDN and how current protocols are ill suited to secure replying and > forwarding of signed messages. > > I can't remember who you were; please quickly drop me a line to > felix at rabe.io.
I'd like to continue the discussion. > > - Felix > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -- PhD @ IRL, CSD, UCLA -------------- next part -------------- An HTML attachment was scrubbed... URL: From nano at remap.ucla.edu Fri Sep 5 13:03:16 2014 From: nano at remap.ucla.edu (Alex Horn) Date: Fri, 5 Sep 2014 13:03:16 -0700 Subject: [Ndn-interest] Email over NDN In-Reply-To: <5409F8E9.4030007@rabe.io> References: <5409F8E9.4030007@rabe.io> Message-ID: Felix, can't speak to how it's 'ill suited' - but i can point to past work that demonstrates signed interests in NDN, and of course Wentao's further work in NDN-JS and BMS. also, the lighting control during the demos yesterday used signed interests. more generally - the following is a follow-up to our discussion yesterday / what your poster made me think of. (to those not familiar w/ his poster, his proposal is a sort of 'NDN Pinterest or Mural.ly' (all apologies Felix :)) what I was trying to say yesterday re: 'documents' - a) each 'document' is a merkle tree of the names (including versions) of the elements (you call them bits, i'd change that terminology for this crowd :) - i used 'elements' in my namespace) b) there is a 'link' data object that you can use, which avoids duplicating content by linking back to a different content object. it's like a filesystem symbolic link. Likely not necessary in initial implementations, but something to keep in mind. here is a potential app namespace: /ndn/appname/user/document/[name]/[version] /ndn/appname/user/document/[name]/[version]/[element]/[version]/ the [element] above is shorthand for: /ndn/appname/user/element/[name]/[version] in this way.
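(Editorial aside, not part of the original email: the app namespace sketched above can be written out as a few string-building helpers. Everything here is hypothetical; "appname", the user names, and the plain-string name form are placeholders, and a real application would build binary NDN names with a library such as ndn-cxx or NDN-JS.)

```python
# Sketch of the hypothetical document/element namespace described above.
# All path segments are illustrative placeholders, not a real application.

def element_name(user: str, name: str, version: str) -> str:
    """An element lives under the user's own /element subtree."""
    return f"/ndn/appname/{user}/element/{name}/{version}"

def document_name(user: str, name: str, version: str) -> str:
    """A document is named under the user's /document subtree."""
    return f"/ndn/appname/{user}/document/{name}/{version}"

def element_in_document(user: str, doc: str, doc_ver: str,
                        elem: str, elem_ver: str) -> str:
    # [element]/[version] appended to the document name; [element] is
    # shorthand for the element's own name in the /element subtree.
    return f"{document_name(user, doc, doc_ver)}/{elem}/{elem_ver}"

print(element_in_document("felix", "poster", "v2", "title", "v1"))
# -> /ndn/appname/felix/document/poster/v2/title/v1
```

Because the element's version appears in the document name, a consumer can pin the version the author used or fetch a newer one, which is the update behavior Alex describes below.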
each user has /documents and /elements. this is a potential way for users to include other users' content/elements in their 'documents', and it will automatically allow each user to update the elements in their document, or use the element version that they created it with (or any other version). anyway - this is what your poster made me think of... it could arguably even be implemented by sticking data=[NDN NAME] in existing HTML Divs. that way you can replace content in the divs with the content of the NDN packet; in this way you can use a browser as-is, and use NDN as transport. this can all be done with javascript, with a back-end repo on the same machine as NFD to store content permanently. note there would be some discussion w/ the team required to see what prefix (other than /ndn/appname/) should be used to make your application globally routable... but you can pick any prefix for development, and make sure it's easy to change when ready for others to use :) anyway, just some notes & suggestions to start, likely better ways. cheers Alex On Fri, Sep 5, 2014 at 10:54 AM, Felix Rabe wrote: > Hi > > This morning I was having a short discussion with someone about email over > NDN and how current protocols are ill suited to secure replying and > forwarding of signed messages. > > I can't remember who you were; please quickly drop me a line to > felix at rabe.io. I'd like to continue the discussion. > > - Felix > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From felix at rabe.io Fri Sep 5 16:40:48 2014 From: felix at rabe.io (Felix Rabe) Date: Fri, 05 Sep 2014 16:40:48 -0700 Subject: [Ndn-interest] Chat (IRC) and Forum Message-ID: <540A4A00.6060604@rabe.io> Hello list I'm looking for two more people :) Chat Question: I've looked at registering #ndn on FreeNode just now, but it was already registered by user "frank_o" about 11 hours ago. Are you on this list, and is this a chatroom for NDN? Forum Question: There was mention (during the NDNcomm 2014 meeting) of an online forum. Whoever started it: Where is it, can it be published / linked to as an auxiliary forum to the mailing lists, e.g. for topic-specific discussions not covered by these lists or development forums? - Felix From nano at remap.ucla.edu Fri Sep 5 16:53:25 2014 From: nano at remap.ucla.edu (Alex Horn) Date: Fri, 5 Sep 2014 16:53:25 -0700 Subject: [Ndn-interest] syncthing.net Message-ID: haven't reviewed the source yet, but it quite reminds me of a few NDN research apps :) http://syncthing.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From iliamo at ucla.edu Fri Sep 5 18:10:46 2014 From: iliamo at ucla.edu (Ilya Moiseenko) Date: Fri, 5 Sep 2014 18:10:46 -0700 Subject: [Ndn-interest] Chat (IRC) and Forum References: Message-ID: <6BBEC67E-6E0A-48B2-A4DB-6CFCDC7A39A5@ucla.edu> > Hi Felix > >> Hello list >> >> I'm looking for two more people :) >> >> Chat Question: I've looked at registering #ndn on FreeNode just now, but it was already registered by user "frank_o" about 11 hours ago. Are you on this list, and is this a chatroom for NDN? >> >> Forum Question: There was mention (during the NDNcomm 2014 meeting) of an online forum. Whoever started it: Where is it, can it be published / linked to as an auxiliary forum to the mailing lists, e.g. for topic-specific discussions not covered by these lists or development forums? > > I was talking about the named-data.net website.
Right now there is a section with mailing lists on the right side of the front page. > I proposed to add a discussion section to the top and include links to redmine pages, github pages, ICNRG, etc. > > Maybe it will get done soon > > Ilya > >> - Felix >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From jburke at remap.UCLA.EDU Sat Sep 6 23:02:14 2014 From: jburke at remap.UCLA.EDU (Burke, Jeff) Date: Sun, 7 Sep 2014 06:02:14 +0000 Subject: [Ndn-interest] Chat (IRC) and Forum In-Reply-To: <6BBEC67E-6E0A-48B2-A4DB-6CFCDC7A39A5@ucla.edu> Message-ID: The discussion menu item has been added to the named-data page; ideas on what should go there would be much appreciated. Jeff On 9/5/14, 6:10 PM, "Ilya Moiseenko" wrote: > >> Hi Felix >> >>> Hello list >>> >>> I'm looking for two more people :) >>> >>> Chat Question: I've looked at registering #ndn on FreeNode just now, >>>but it was already registered by user "frank_o" about 11 hours ago. Are >>>you on this list, and is this a chatroom for NDN? >>> >>> Forum Question: There was mention (during the NDNcomm 2014 meeting) of >>>an online forum. Whoever started it: Where is it, can it be published / >>>linked to as an auxiliary forum to the mailing lists, e.g. for >>>topic-specific discussions not covered by these lists or development >>>forums? >> >> I was talking about named-data.net website. Right now there is a >>section with mailing lists on the right side of the front page. >> I proposed to add a discussion section to the top and include links to >>redmine pages, github pages, ICNRG, etc. 
>> >> Maybe it will get done soon >> >> Ilya >> >>> - Felix >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jburke at remap.ucla.edu Sat Sep 6 23:21:49 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Sun, 7 Sep 2014 06:21:49 +0000 Subject: [Ndn-interest] Email over NDN In-Reply-To: Message-ID: Email was one of the original applications that we were asked to clarify at the time of the NDN proposal submission. FWIW, here's an only slightly edited version of the response ca. 2010, in case it is helpful: Consider the example of sending an email from named-data.net to the UCLA address "van at ucla.edu". The UCLA email server would 'register' that it wanted to receive Interests with the prefix 'ucla.edu/email_rcpt' (where the well-known name "email_rcpt" identifies the email service in the same way that TCP port 25 identifies it for IP). The named-data.net client (named "cs-32-22" in this example) trying to send mail to "van at ucla.edu" would generate a locally unique 'conversation ID', say #21443 (the equivalent of a locally unique client port 21443), then initiate the transaction by sending an NDN Interest with the name: ucla.edu/email_rcpt/van/named-data.net|cs-32-22|#21443 after 'registering' that it wanted to receive Interests with the prefix: named-data.net/cs-32-22/#21443/ucla.edu/email_rcpt/van (the 'reverse' of the server-to-client data flow name, which is the client-to-server data flow name). The prefixes "ucla.edu" and "named-data.net" stand for routable prefixes so that Interests can get from one site to another; we are not addressing the operation of routing in this discussion.
If the UCLA server wanted to accept the mail, it could respond with an "OK" Data packet to the client's Interest, then send the Interest 'named-data.net/cs-32-22/#21443/ucla.edu/email_rcpt/van/S0' to pull down the first segment (the "S0" component of the name) of the incoming message. This is the general case: in the special case when the message is short, this retrieval of message content can be avoided by including the content in the initial Interest, achieving the two-packet optimization mentioned earlier. This example is intentionally simplified from what we would expect in practice, in order to illustrate clearly the simple structure of the basic mechanism. In practice, the initiating Interest would contain additional information beyond the name components for the reverse path. The additional information would include SMTP parameters such as identification of the sender, a signature so authenticity of the request may be verified, and even the entire message in some cases as mentioned earlier. Such information may be encrypted in the name to preserve privacy. Partial name encryption is facilitated in NDN by the fact that names are binary values, not text strings as in the examples. Existing SMTP and MIME formats need only be adapted slightly for the parameters, results, and email data, though encoded and packaged differently. A more detailed description that addresses encryption and attachments along with some other issues can be provided. Finally, this basic example focused on how existing mail protocols can be adapted advantageously to NDN, which does not preclude new forms of email architecture designed to leverage NDN further. From: Wentao Shang > Date: Fri, 5 Sep 2014 12:02:40 -0700 To: Felix Rabe > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Email over NDN Hi Felix, I'm the guy you talked to ;) My group at UCLA had a few discussions on this topic before.
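(Editorial aside, not part of the original email: the name construction in the 2010 response above can be sketched as follows. The host names, user, and conversation ID are taken directly from Jeff's example; the helper function and the plain-string name form are illustrative, since on the wire NDN names are binary TLVs.)

```python
# Sketch of the email-over-NDN name flow described above. The values are
# from the example in the text; the helper itself is hypothetical.

def email_names(sender_site, client, conv_id, rcpt_site, user):
    # Interest name the client sends to initiate the transaction:
    initiate = f"{rcpt_site}/email_rcpt/{user}/{sender_site}|{client}|{conv_id}"
    # Prefix the client registers beforehand (the 'reverse' name), under
    # which the server can pull down the message segments:
    reverse_prefix = f"{sender_site}/{client}/{conv_id}/{rcpt_site}/email_rcpt/{user}"
    return initiate, reverse_prefix

initiate, reverse_prefix = email_names(
    "named-data.net", "cs-32-22", "#21443", "ucla.edu", "van")
# The server's Interest for the first segment appends the "S0" component:
first_segment = reverse_prefix + "/S0"
print(initiate)       # ucla.edu/email_rcpt/van/named-data.net|cs-32-22|#21443
print(first_segment)  # named-data.net/cs-32-22/#21443/ucla.edu/email_rcpt/van/S0
```

The symmetry is the point: each side only ever expresses Interests under a prefix the other side has registered, so no source or destination address is needed.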
I'd like to exchange ideas with you either through this mailing list or privately. Best, Wentao On Fri, Sep 5, 2014 at 10:54 AM, Felix Rabe > wrote: Hi This morning I was having a short discussion with someone about email over NDN and how current protocols are ill suited to secure replying and forwarding of signed messages. I can't remember who you were; please quickly drop me a line to felix at rabe.io. I'd like to continue the discussion. - Felix _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -- PhD @ IRL, CSD, UCLA _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix at rabe.io Sun Sep 7 15:03:17 2014 From: felix at rabe.io (Felix Rabe) Date: Sun, 07 Sep 2014 18:03:17 -0400 Subject: [Ndn-interest] Chat (IRC) and Forum In-Reply-To: References: Message-ID: <540CD625.6000004@rabe.io> Thanks Jeff. Maybe rename "Redmine (Codebase Issues & Wikis)" to "Development (Issues, Wikis)". I'm looking to get an IRC channel with chat logs. (Or equivalent.) Also, a forum or separate mailing list for security (and maybe other topics as well) might be a good addition. On a related note - this is a mockup of the next iteration of named-data.education: http://rabe.io/preview-nde.txt. Comments welcome. - Felix On 06/Sep/14 23:02, Burke, Jeff wrote: > The discussion menu item has been added to the named-data page; ideas on > what should go there would be much appreciated. > Jeff > > > On 9/5/14, 6:10 PM, "Ilya Moiseenko" wrote: > >>> Hi Felix >>> >>>> Hello list >>>> >>>> I'm looking for two more people :) >>>> >>>> Chat Question: I've looked at registering #ndn on FreeNode just now, >>>> but it was already registered by user "frank_o" about 11 hours ago. 
Are >>>> you on this list, and is this a chatroom for NDN? >>>> >>>> Forum Question: There was mention (during the NDNcomm 2014 meeting) of >>>> an online forum. Whoever started it: Where is it, can it be published / >>>> linked to as an auxiliary forum to the mailing lists, e.g. for >>>> topic-specific discussions not covered by these lists or development >>>> forums? >>> I was talking about named-data.net website. Right now there is a >>> section with mailing lists on the right side of the front page. >>> I proposed to add a discussion section to the top and include links to >>> redmine pages, github pages, ICNRG, etc. >>> >>> Maybe it will get done soon >>> >>> Ilya >>> >>>> - Felix >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From nano at remap.ucla.edu Mon Sep 8 08:59:53 2014 From: nano at remap.ucla.edu (Alex Horn) Date: Mon, 8 Sep 2014 08:59:53 -0700 Subject: [Ndn-interest] NDN-esque IOT, Apps tracking Message-ID: In an effort to help everyone stay current with existing IoT and 'NDN-ish' applications, I've compiled some correspondence into a Google document. It's short, but gives a best-of-class overview. Until we have a better means of collaborative editing, I'm just posting the contents to this list and inviting comments. please comment on the doc if there are things missing or relevant; I will revise it.
Cheers Alex public comment link: http://goo.gl/jK4cHJ current contents: Internet of Things - articles compilation of various IOT tools artistic IOT applications short presentation Internet of Things - API/SDK AllJoyn HomeKit Thread Group flutter - half-mile encrypted arduino rx/tx eclipse IOT contiki openIOT tessel Thing System Kinoma Heim Control IOT hardware apollo sensor stick Oort http://smarthings.com NDN-ish applications https://winch.io/how-it-works/ http://syncthing.net/ http://www.firebase.com/ https://www.ethereum.org/ bittorrent sync -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Sun Sep 14 20:39:25 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sun, 14 Sep 2014 20:39:25 -0700 Subject: [Ndn-interest] any comments on naming convention? Message-ID: hi, Just some questions to learn how people feel about the NDN naming conventions [1]. 1. Do you like them or not? Why? 2. Do they fit the needs of your application? 3. What might be some possible changes (or even a big redesign) if you were asked to propose a naming convention? 4. Some other thoughts? Feel free to answer any of the questions. Thanks [1] http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf From mjs at cisco.com Mon Sep 15 07:55:30 2014 From: mjs at cisco.com (Mark Stapp) Date: Mon, 15 Sep 2014 10:55:30 -0400 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <5416FDE2.4050705@cisco.com> hmm - is this really "just some questions", some personal curiosity? or is it a broader invitation to the community to dig into what's been presented in v1 of the TR, with a view to developing the next version? just to be clear: I do not like the 'naming conventions', I see them as a hangover from the bad old days. Since my "application" is mainly infrastructure, the conventions do not meet my needs.
I don't think we can rely on use of magic values in allegedly-opaque name component data, and hope for the best. I think we do need to have typed name components so that we have clear places to carry special kinds of name information, like segment number and version, in a way that is unambiguous. Thanks, Mark On 9/14/14 11:39 PM, Tai-Lin Chu wrote: > hi, > Just some questions to know how people feel about it. > 1. Do you like it or not? why? > 2. Does it fit the need of your application? > 3. What might be some possible changes (or even a big redesign) if you > are asked to purpose a naming convention? > 4. some other thoughts > > Feel free to answer any of the questions. > Thanks > > > [1] http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From tailinchu at gmail.com Mon Sep 15 10:41:39 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Mon, 15 Sep 2014 10:41:39 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <5416FDE2.4050705@cisco.com> References: <5416FDE2.4050705@cisco.com> Message-ID: I think for the naming convention, the community should come up with a solution that people "want" to use. The NDN platform libraries also follow the naming convention; I don't want people to ditch them because they dislike the convention. This will be a serious discussion for the next version of the TR. Could you describe the ambiguous case? I think I know what you mean by ambiguous, but I just want to make sure. (I remember someone from Cisco mentioned this at NDNcomm 2014.) To avoid ambiguity, I would start from the name component TLVs and allocate more types for different meanings: 8 = regular name component = segmented name component Thanks.
On Mon, Sep 15, 2014 at 7:55 AM, Mark Stapp wrote: > hmm - is this really "just some questions", some personal curiosity? or is > it a broader invitation to the community to dig into what's been presented > in v1 of the TR, with a view to developing the next version? > > just to be clear: I do not like the 'naming conventions', I see them as a > hangover from the bad old days. Since my "application" is mainly > infrastructure, the conventions do not meet my needs. I don't think we can > rely on use of magic values in allegedly-opaque name component data, and > hope for the best. I think we do need to have typed name components so that > we have clear places to carry special kinds of name information, like > segment number and version, in a way that is unambiguous. > > Thanks, > Mark > > > On 9/14/14 11:39 PM, Tai-Lin Chu wrote: >> >> hi, >> Just some questions to know how people feel about it. >> 1. Do you like it or not? why? >> 2. Does it fit the need of your application? >> 3. What might be some possible changes (or even a big redesign) if you >> are asked to purpose a naming convention? >> 4. some other thoughts >> >> Feel free to answer any of the questions. >> Thanks >> >> >> [1] >> http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From shijunxiao at email.arizona.edu Mon Sep 15 10:49:39 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Mon, 15 Sep 2014 10:49:39 -0700 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <5416FDE2.4050705@cisco.com> References: <5416FDE2.4050705@cisco.com> Message-ID: Dear folks I agree with @MarkStapp that Naming Conventions rev1 does not guarantee version/segment components to be unambiguous. One alternate proposal was to use an additional NameComponent before the number as a marker, such as "_v/" "_s/". This alternate proposal is also unable to make version/segment components unambiguous, and it doesn't work well with ChildSelector. One easy solution to this problem is: restrict the octets to be used in regular names. In rev1, we could require regular NameComponent to start with a valid UTF8 character. In alternate proposal, we could forbid regular NameComponent to start with "_". However, this solution is undesirable, because some applications do need to operate with binary components (eg. SignatureBits component in signed Interest). NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a component is a number. This is insufficient because it doesn't say the meaning of a number: is it a version number or a segment number? One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc. This can guarantee unambiguity, but this restricts the introduction of new convention, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type. If I'm to redesign the convention, I would introduce a MarkedComponent TLV type. The MarkedComponent TLV can appear in place of NameComponent. The value part of a MarkedComponent contains a VAR-NUMBER which is a marker code, followed by zero or more arbitrary octets. Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)* FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | MarkedComponent) MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE* The canonical order is defined as: - A MarkedComponent is less than any NameComponent. 
- Two MarkedComponents are compared by their length and value, in the same way as NameComponent. The benefits of this solution are: - Version/segment/etc components are distinguished from regular NameComponent, because they have a distinct TLV-TYPE: MarkedComponent. - Adding a new convention only needs allocation of a marker code. No new TLV type is introduced, so that old consumers can continue to work. - Encoding the marker code as VAR-NUMBER allows a much larger marker space than restricting to a one-octet marker. - Canonical order evaluation is efficient. It's unnecessary to compare marker code and BYTE* individually, because most applications won't have different markers under the same prefix. Yours, Junxiao -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Mon Sep 15 11:14:08 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Mon, 15 Sep 2014 11:14:08 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID: I love your solution :) I don't think ndn-tlv 0.2 should add numberComponent; instead markerComponent should be introduced. On Mon, Sep 15, 2014 at 10:49 AM, Junxiao Shi wrote: > Dear folks > > I agree with @MarkStapp that Naming Conventions rev1 does not guarantee > version/segment components to be unambiguous. > One alternate proposal was to use an additional NameComponent before the > number as a marker, such as "_v/" "_s/". This alternate > proposal is also unable to make version/segment components unambiguous, and > it doesn't work well with ChildSelector. > > One easy solution to this problem is: restrict the octets to be used in > regular names. > In rev1, we could require regular NameComponent to start with a valid UTF8 > character. > In alternate proposal, we could forbid regular NameComponent to start with > "_".
> However, this solution is undesirable, because some applications do need to > operate with binary components (eg. SignatureBits component in signed > Interest). > > NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a > component is a number. > This is insufficient because it doesn't say the meaning of a number: is it a > version number or a segment number? > > One solution is to declare many new TLV types: VersionComponent, > SegmentComponent, TimestampComponent, etc. > This can guarantee unambiguity, but this restricts the introduction of new > convention, because when we want to introduce another convention in the > future, old consumer applications would not understand the new TLV type. > > > If I'm to redesign the convention, I would introduce a MarkedComponent TLV > type. > The MarkedComponent TLV can appear in place of NameComponent. > The value part of a MarkedComponent contains a VAR-NUMBER which is a marker > code, followed by zero or more arbitrary octets. > > Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)* > FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | > MarkedComponent) > MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE* > > The canonical order is defined as: > > A MarkedComponent is less than any NameComponent. > Two MarkedComponents are compared by their length and value, in the same way > as NameComponent. > > > The benefits of this solution is: > > Version/segment/etc components are distinguished from regular NameComponent, > because they have a distinct TLV-TYPE: MarkedComponent. > Adding a new convention only needs allocation of a marker code. No new TLV > type is introduced, so that old consumer can continue to work. > Encoding marker code as VAR-NUMBER allows much larger marker space than > restricting to one-octet marker. > Canonical order evaluation is efficient. 
It's unnecessary to compare marker > code and BYTE* individually, because most applications won't have different > markers under the same prefix. > > > > Yours, Junxiao > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From Marc.Mosko at parc.com Mon Sep 15 11:20:36 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 15 Sep 2014 18:20:36 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID: This is an interesting discussion. At PARC, when we went away from ccnb to TLV-based name components, we agreed with the Cisco position that different types of name components should have different TLV types. Anything that used to be a command marker was moved to a TLV type and we no longer use command markers. We see having TLV types in the name as redundant with command markers, so long as there is a type space for applications to use to generate their own application-dependent types. We use one general name (binary) name component, one for versions, one for segments (chunks), one for nonces (in the name, not an Interest nonce), one for keys. In our re-implementation of the 0.x repo protocol, those repo command-markers became their own application-dependent name TLV types. In our sync protocol, we use other application-dependent TLV types instead of command markers. Our ordering is defined as the lexicographic compare of each TLV, including the T and L. Because we use a fixed type and fixed length value, this ordering is always well-defined. About a year ago, when we were considering different variable length TLV schemes, it became clear that it is difficult to have a ?strcmp()? style comparison over the raw TLV bytes with a variable-length T and L encoding. 
There are some T and L encoding schemes that still allow comparison over the raw bytes, but they have their own drawbacks.

> One solution is to declare many new TLV types: VersionComponent, SegmentComponent,
> TimestampComponent, etc. This can guarantee unambiguity, but this restricts the
> introduction of new convention, because when we want to introduce another convention
> in the future, old consumer applications would not understand the new TLV type.

I would disagree with this statement. Anytime you introduce a new command-marker, old applications will not understand it. If the new command-marker is required for application execution, then all applications must be updated. If the new command-marker (or TLV type) is not required, then the old application should continue just fine, treating the type as opaque.

Marc Mosko

On Sep 15, 2014, at 10:49 AM, Junxiao Shi wrote:
> [...]

_______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From mjs at cisco.com Mon Sep 15 11:33:06 2014 From: mjs at cisco.com (Mark Stapp) Date: Mon, 15 Sep 2014 14:33:06 -0400 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID: <541730E2.5090005@cisco.com>

interesting:

On 9/15/14 1:41 PM, Tai-Lin Chu wrote:
> I think for naming convention, the community should come up with a
> solution that people "want" to use.

I think that we should work toward specifying something that does what we've learned (over several years of work) we need to do, and that does not add any additional pain or unnecessary complexity.

> Could you describe the ambiguous case? I think I know what you mean by
> ambiguous, but I just want to make sure. (I remembered someone from
> cisco mentioned this in ndncomm 2014.)
>
> To avoid ambiguity, I will start from name component tlvs and allocate
> more types for different meanings.
>
> 8 = regular name component
> = segmented name component

yes, this is the approach I favor. typed name components allow us to identify the 'infrastructure' data in name components, while allowing us to set aside some number space for application-defined components. code that just wants to compare names or identify component boundaries needs to be able to treat the components as 'opaque', of course.

Thanks, Mark

From jefft0 at remap.ucla.edu Mon Sep 15 11:33:53 2014 From: jefft0 at remap.ucla.edu (Thompson, Jeff) Date: Mon, 15 Sep 2014 18:33:53 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID:

Hi Mark,

Thanks for the clear summary. You say "it became clear that it is difficult to have a 'strcmp()' style comparison over the raw TLV bytes with a variable-length T and L encoding." Can you say more about why variable-length encoding makes strcmp difficult?

- Jeff T

From: "Marc.Mosko at parc.com" Date: Monday, September 15, 2014 11:20 AM To: "shijunxiao at email.arizona.edu" Cc: "ndn-interest at lists.cs.ucla.edu" Subject: Re: [Ndn-interest] any comments on naming convention?

> [...]

From mjs at cisco.com Mon Sep 15 11:45:20 2014 From: mjs at cisco.com (Mark Stapp) Date: Mon, 15 Sep 2014 14:45:20 -0400 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID: <541733C0.3040202@cisco.com>

On 9/15/14 1:49 PM, Junxiao Shi wrote:
> Dear folks

I agree with your analysis of the issues with several schemes that have been proposed over the years.

> One solution is to declare many new TLV types: VersionComponent,
> SegmentComponent, TimestampComponent, etc.
> This can guarantee unambiguity, but this restricts the introduction of
> new convention, because when we want to introduce another convention in
> the future, old consumer applications would not understand the new TLV type.

that's always going to be an issue, whether you make the type part of the TLV "T" or hide it inside the "V", right? For now, to start, I'd prefer to allocate some "T" codes for information that we've already got in name-components, work with them, and see whether we are pushed into a more elaborate scheme by some actual use. I don't think we would need to "declare ... many", actually: I'll bet there'd be just a handful.

[I see now that Marc Mosko has also replied - thanks, Marc, for detailing some of the kinds of name-comps you've been working with. Like him, I also think that simplifying the overall TLV encoding helps with issues like comparison and ordering.]

Thanks, Mark

From Marc.Mosko at parc.com Mon Sep 15 11:50:53 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 15 Sep 2014 18:50:53 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID: <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com>

On Sep 15, 2014, at 11:33 AM, Thompson, Jeff wrote:
> Hi Mark,
> Thanks for the clear summary. You say "it became clear that it is difficult to
> have a 'strcmp()' style comparison over the raw TLV bytes with a variable-length
> T and L encoding." Can you say more about why variable-length encoding makes
> strcmp difficult?

At the time, we were having discussions about whether a 2-byte "0" is different from a 1-byte "0", for example. If they have the same meaning, but one is just incorrectly encoded in 2 bytes, then do we have to validate each T and throw away the ones that are mis-encoded? Also, if the T comes before the L, then the short-lex ordering does not work. Short-lex says that name component A is less than B if the length of A is less than the length of B, or if |A| = |B| and A sorts lexicographically before B.
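As a minimal sketch, the short-lex rule as just stated can be written as a comparator over component values (illustrative only; `shortlex_cmp` is a hypothetical helper, not any implementation's API):

```python
from functools import cmp_to_key

def shortlex_cmp(a: bytes, b: bytes) -> int:
    """Short-lex order on component values: the shorter value sorts first;
    equal lengths fall back to plain lexicographic (memcmp-style) comparison."""
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    if a == b:
        return 0
    return -1 if a < b else 1

# Every 1-byte value sorts before every 2-byte value, regardless of content.
values = [b"\x00\x01", b"\xff", b"\x00", b"\x01\x00"]
print(sorted(values, key=cmp_to_key(shortlex_cmp)))
# -> [b'\x00', b'\xff', b'\x00\x01', b'\x01\x00']
```

Note that this comparator looks at the length before the bytes, which is why a wire format with L before a fixed-size T can evaluate it directly over the raw encoding.
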
Short-lex says that name component A is less than B if then length of A is less than B or of |A| = |B| and A sorts before B. If the T comes before the L, then you cannot simply do a strcmp() because the variable length T?s will throw things off. All you can say is that within a T value, you use short-lex. Marc - Jeff T From: "Marc.Mosko at parc.com" > Date: Monday, September 15, 2014 11:20 AM To: "shijunxiao at email.arizona.edu" > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] any comments on naming convention? This is an interesting discussion. At PARC, when we went away from ccnb to TLV-based name components, we agreed with the Cisco position that different types of name components should have different TLV types. Anything that used to be a command marker was moved to a TLV type and we no longer use command markers. We see having TLV types in the name as redundant with command markers, so long as there is a type space for applications to use to generate their own application-dependent types. We use one general name (binary) name component, one for versions, one for segments (chunks), one for nonces (in the name, not an Interest nonce), one for keys. In our re-implementation of the 0.x repo protocol, those repo command-markers became their own application-dependent name TLV types. In our sync protocol, we use other application-dependent TLV types instead of command markers. Our ordering is defined as the lexicographic compare of each TLV, including the T and L. Because we use a fixed type and fixed length value, this ordering is always well-defined. About a year ago, when we were considering different variable length TLV schemes, it became clear that it is difficult to have a ?strcmp()? style comparison over the raw TLV bytes with a variable-length T and L encoding. There are some T and L encoding schemes that still allow comparison over the raw bytes, but they have their own drawbacks. 
One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc. This can guarantee unambiguity, but this restricts the introduction of new convention, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type. I would disagree with this statement. Anytime you introduce a new command-marker, old applications will not understand it. If the new command-marker is required for application execution, then all applications must be updated. If the new command-marker (or tlv type) is not required, then the old application should continue just fine treating the type as opaque. Marc Mosko On Sep 15, 2014, at 10:49 AM, Junxiao Shi > wrote: Dear folks I agree with @MarkStapp that Naming Conventions rev1 does not guarantee version/segment components to be unambiguous. One alternate proposal was to use an additional NameComponent before the number as a marker, such as "_v/" "_s/". This alternate proposal is also unable to make version/segment components unambiguous, and it doesn't work well with ChildSelector. One easy solution to this problem is: restrict the octets to be used in regular names. In rev1, we could require regular NameComponent to start with a valid UTF8 character. In alternate proposal, we could forbid regular NameComponent to start with "_". However, this solution is undesirable, because some applications do need to operate with binary components (eg. SignatureBits component in signed Interest). NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a component is a number. This is insufficient because it doesn't say the meaning of a number: is it a version number or a segment number? One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc. 
This can guarantee unambiguity, but this restricts the introduction of new convention, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type. If I'm to redesign the convention, I would introduce a MarkedComponent TLV type. The MarkedComponent TLV can appear in place of NameComponent. The value part of a MarkedComponent contains a VAR-NUMBER which is a marker code, followed by zero or more arbitrary octets. Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)* FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | MarkedComponent) MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE* The canonical order is defined as: * A MarkedComponent is less than any NameComponent. * Two MarkedComponents are compared by their length and value, in the same way as NameComponent. The benefits of this solution is: * Version/segment/etc components are distinguished from regular NameComponent, because they have a distinct TLV-TYPE: MarkedComponent. * Adding a new convention only needs allocation of a marker code. No new TLV type is introduced, so that old consumer can continue to work. * Encoding marker code as VAR-NUMBER allows much larger marker space than restricting to one-octet marker. * Canonical order evaluation is efficient. It's unnecessary to compare marker code and BYTE* individually, because most applications won't have different markers under the same prefix. Yours, Junxiao _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From jefft0 at remap.ucla.edu Mon Sep 15 12:12:05 2014 From: jefft0 at remap.ucla.edu (Thompson, Jeff) Date: Mon, 15 Sep 2014 19:12:05 +0000 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com> References: <5416FDE2.4050705@cisco.com> <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com> Message-ID:

Hi Marc. You say "if the T comes before the L, then the short-lex ordering does not work", meaning that the ordering will not depend on the length of the name component "value" but on the type. It seems Junxiao worried about this too when he said "It's unnecessary to compare marker code and BYTE* individually, because most applications won't have different markers under the same prefix."

Is there a use case where it matters that short-lex ordering is thrown off when comparing two name components with different types? Is it safe to assume that an application will always be doing short-lex comparison of two name components of the same type (for example, leftmost child of two version components)?

- Jeff T

From: "Marc.Mosko at parc.com" Date: Monday, September 15, 2014 11:50 AM To: Jeff Thompson Cc: "shijunxiao at email.arizona.edu", "ndn-interest at lists.cs.ucla.edu" Subject: Re: [Ndn-interest] any comments on naming convention?

> [...]

From jburke at remap.ucla.edu Mon Sep 15 12:12:48 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Mon, 15 Sep 2014 19:12:48 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541730E2.5090005@cisco.com> Message-ID:

Hi all,

On 9/15/14, 11:33 AM, "Mark Stapp" wrote:
>interesting:
>
>On 9/15/14 1:41 PM, Tai-Lin Chu wrote:
>> I think for naming convention, the community should come up with a
>> solution that people "want" to use.
> >I think that we should work toward specifying something that does what >we've learned (over several years of work) we need to do, and that does >not add any additional pain or unnecessary complexity. So - just to poke at this a bit - who's the "we" here, and whose pain and complexity? Application developers, network architects, equipment mfrs, or ? (And do they agree?) As you know, NDN takes an application-motivated approach to the development and evaluation of the architecture, and NSF has asked for this explicitly in their most recent RFP. If we could discuss some pros/cons from that perspective - perhaps via quick examples or case studies that have led you to this conclusion, it would be helpful (in my mind) to this current discussion. Attached are some previously shared thoughts on segmenting and versioning conventions related to this discussion, and some arguments against types, after page 2. Though Junxiao's marker type is an interesting new twist, I am so far unconvinced that it addresses all of the concerns in this doc. Note - these were not originally intended for a public discussion, at least not in this form, so be kind. :) I can also imagine that some of the discussion could be dismissed via "drop selectors and add manifests" but would suggest addressing the spirit of argument in light of the current NDN architecture instead. (Unfortunately I am on the road so not sure I can keep up with the discussion, but wanted to inject some other perspectives. :) Thanks, Jeff > >> >> Could you describe the ambiguous case? I think I know what you mean by >> ambiguous, but I just want to make sure. (I remembered someone from >> cisco mentioned this in ndncomm 2014.) >> >> To avoid ambiguity, I will start from name component tlvs and allocate >> more types for different meanings. >> >> 8 = regular name component >> = segmented name component >> > >yes, this is the approach I favor. 
>typed name components allow us to
>identify the 'infrastructure' data in name components, while allowing us
>to set aside some number space for application-defined components. code
>that just wants to compare names or identify component boundaries needs
>to be able to treat the components as 'opaque', of course.
>
>Thanks,
>Mark
>_______________________________________________
>Ndn-interest mailing list
>Ndn-interest at lists.cs.ucla.edu
>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
-------------- next part --------------
A non-text attachment was scrubbed...
Name: A Few Arguments for Explicit Segment and Version Naming 4.pdf
Type: application/pdf
Size: 224524 bytes
Desc: A Few Arguments for Explicit Segment and Version Naming 4.pdf
URL: 

From Marc.Mosko at parc.com Mon Sep 15 12:23:12 2014
From: Marc.Mosko at parc.com (Marc.Mosko at parc.com)
Date: Mon, 15 Sep 2014 19:23:12 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <5416FDE2.4050705@cisco.com> <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com>
Message-ID: <3D3E112D-44A2-4D87-9374-D1450E718BBC@parc.com>

If you wish to maintain the current NDN definition of canonical order, having the T before the L will not work, if the T can have different values.

How do you do exclusions? Are they not based on canonical order? And prefix matching in general for "left most child" versus "right most child"? That is all affected by the canonical order. You can change the definition of the canonical order to make these things work, but it will ripple through the stack and forwarder.

I would also disagree with the statement "because most applications won't have different markers under the same prefix." What is that based on? Of course you need to match the markers and octet strings. Also, the forwarder has no idea of "the application."
It has to treat all these things as opaque values; it will be comparing the entire values to determine canonical ordering for content-to-interest matching.

Marc

On Sep 15, 2014, at 12:12 PM, Thompson, Jeff > wrote:

Hi Marc. You say "if the T comes before the L, then the short-lex ordering does not work" meaning that the ordering will not depend on the length of the name component "value" but on the type.

It seems Junxiao worried about this too when he said "It's unnecessary to compare marker code and BYTE* individually, because most applications won't have different markers under the same prefix."

Is there a use case where it matters that short-lex ordering is thrown off when comparing two name components with different types? Is it safe to assume that an application will always be doing short-lex comparison of two name components of the same type (for example, leftmost child of two version components)?

- Jeff T

From: "Marc.Mosko at parc.com" >
Date: Monday, September 15, 2014 11:50 AM
To: Jeff Thompson >
Cc: "shijunxiao at email.arizona.edu" >, "ndn-interest at lists.cs.ucla.edu" >
Subject: Re: [Ndn-interest] any comments on naming convention?

On Sep 15, 2014, at 11:33 AM, Thompson, Jeff > wrote:

Hi Mark,

Thanks for the clear summary. You say "it became clear that it is difficult to have a 'strcmp()' style comparison over the raw TLV bytes with a variable-length T and L encoding." Can you say more about why variable-length encoding makes strcmp difficult?

At the time, we were having discussions about whether a 2-byte "0" is different from a 1-byte "0", for example. If they are the same meaning, but one is just incorrectly encoded in 2 bytes, then do we have to validate each T and throw away the ones that are mis-encoded?

Also, if the T comes before the L, then the short-lex ordering does not work. Short-lex says that name component A is less than B if the length of A is less than the length of B, or if |A| = |B| and A sorts before B.
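The short-lex rule just stated, and the way a plain strcmp-style byte compare diverges from it, can be sketched in a few lines. This is an illustrative helper under assumed inputs, not code from any NDN or CCNx implementation:

```python
# Sketch of the short-lex order described above (hypothetical helper,
# not taken from any NDN/CCNx codebase).
def shortlex_less(a: bytes, b: bytes) -> bool:
    """Component A < B if A is shorter, or the same length and A sorts first."""
    if len(a) != len(b):
        return len(a) < len(b)
    return a < b

# Short-lex puts the shorter component first, regardless of byte values...
assert shortlex_less(b"\x02", b"\x01\x00")
# ...but a raw bytewise compare of the same values disagrees,
# which is why the two orders cannot be conflated:
assert not (b"\x02" < b"\x01\x00")
```

With a fixed-width T and L prepended to every component, a bytewise compare of whole encoded TLVs becomes well-defined again, which is the property Marc describes PARC's encoding relying on.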
If the T comes before the L, then you cannot simply do a strcmp() because the variable-length T's will throw things off. All you can say is that within a T value, you use short-lex.

Marc

- Jeff T

From: "Marc.Mosko at parc.com" >
Date: Monday, September 15, 2014 11:20 AM
To: "shijunxiao at email.arizona.edu" >
Cc: "ndn-interest at lists.cs.ucla.edu" >
Subject: Re: [Ndn-interest] any comments on naming convention?

This is an interesting discussion. At PARC, when we went away from ccnb to TLV-based name components, we agreed with the Cisco position that different types of name components should have different TLV types.

Anything that used to be a command marker was moved to a TLV type and we no longer use command markers. We see having TLV types in the name as redundant with command markers, so long as there is a type space for applications to use to generate their own application-dependent types.

We use one general (binary) name component, one for versions, one for segments (chunks), one for nonces (in the name, not an Interest nonce), one for keys. In our re-implementation of the 0.x repo protocol, those repo command-markers became their own application-dependent name TLV types. In our sync protocol, we use other application-dependent TLV types instead of command markers.

Our ordering is defined as the lexicographic compare of each TLV, including the T and L. Because we use a fixed type and fixed-length value, this ordering is always well-defined. About a year ago, when we were considering different variable-length TLV schemes, it became clear that it is difficult to have a 'strcmp()' style comparison over the raw TLV bytes with a variable-length T and L encoding. There are some T and L encoding schemes that still allow comparison over the raw bytes, but they have their own drawbacks.

One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc.
This can guarantee unambiguity, but it restricts the introduction of new conventions, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type.

I would disagree with this statement. Anytime you introduce a new command-marker, old applications will not understand it. If the new command-marker is required for application execution, then all applications must be updated. If the new command-marker (or TLV type) is not required, then the old application should continue just fine, treating the type as opaque.

Marc Mosko

On Sep 15, 2014, at 10:49 AM, Junxiao Shi > wrote:

Dear folks

I agree with @MarkStapp that Naming Conventions rev1 does not guarantee version/segment components to be unambiguous. One alternate proposal was to use an additional NameComponent before the number as a marker, such as "_v/" "_s/". This alternate proposal is also unable to make version/segment components unambiguous, and it doesn't work well with ChildSelector.

One easy solution to this problem is: restrict the octets to be used in regular names. In rev1, we could require a regular NameComponent to start with a valid UTF-8 character. In the alternate proposal, we could forbid a regular NameComponent to start with "_". However, this solution is undesirable, because some applications do need to operate with binary components (e.g. the SignatureBits component in a signed Interest).

NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a component is a number. This is insufficient because it doesn't say the meaning of a number: is it a version number or a segment number?

One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc.
This can guarantee unambiguity, but it restricts the introduction of new conventions, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type.

If I were to redesign the convention, I would introduce a MarkedComponent TLV type. The MarkedComponent TLV can appear in place of NameComponent. The value part of a MarkedComponent contains a VAR-NUMBER, which is a marker code, followed by zero or more arbitrary octets.

Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)*
FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | MarkedComponent)
MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE*

The canonical order is defined as:
* A MarkedComponent is less than any NameComponent.
* Two MarkedComponents are compared by their length and value, in the same way as NameComponent.

The benefits of this solution are:
* Version/segment/etc. components are distinguished from regular NameComponents, because they have a distinct TLV-TYPE: MarkedComponent.
* Adding a new convention only requires allocating a marker code. No new TLV type is introduced, so old consumers can continue to work.
* Encoding the marker code as a VAR-NUMBER allows a much larger marker space than restricting it to a one-octet marker.
* Canonical order evaluation is efficient. It's unnecessary to compare the marker code and BYTE* individually, because most applications won't have different markers under the same prefix.

Yours, Junxiao
_______________________________________________
Ndn-interest mailing list
Ndn-interest at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oran at cisco.com Mon Sep 15 12:31:04 2014
From: oran at cisco.com (Dave Oran (oran))
Date: Mon, 15 Sep 2014 19:31:04 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <5416FDE2.4050705@cisco.com>
Message-ID: 

On Sep 15, 2014, at 1:49 PM, Junxiao Shi wrote:

> Dear folks
>
> I agree with @MarkStapp that Naming Conventions rev1 does not guarantee version/segment components to be unambiguous.
> One alternate proposal was to use an additional NameComponent before the number as a marker, such as "_v/" "_s/". This alternate proposal is also unable to make version/segment components unambiguous, and it doesn't work well with ChildSelector.
>
> One easy solution to this problem is: restrict the octets to be used in regular names.
> In rev1, we could require regular NameComponent to start with a valid UTF8 character.
> In alternate proposal, we could forbid regular NameComponent to start with "_".
> However, this solution is undesirable, because some applications do need to operate with binary components (eg. SignatureBits component in signed Interest).
>
> NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a component is a number.
> This is insufficient because it doesn't say the meaning of a number: is it a version number or a segment number?
>
> One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc.
> This can guarantee unambiguity, but this restricts the introduction of new convention, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type.
>
I believe that given the tradeoffs, this is the best approach. Why?

- old consumer applications won't understand a new naming convention no matter how we do it. The question is what is "safe" for them to do. For this we need a "meta-convention" about how to treat typed name components not understood by an application. Here, there's a quite straightforward solution, which is that if the typed name components include a "genericNameComponent" type, you simply say that if you don't understand a name component type, you treat it as if it were "genericNameComponent"

- routers probably should not be required to interpret names in the first place, so this is a non-issue.
However, if we define the above meta-convention for applications, we simply say routers treat all name component types as if they were genericNameComponent.

- if somebody can come up with a persuasive reason for a router to understand some of the name component types, we can specify that in the architecture and deal with forward compatibility issues that might arise on a case-by-case basis. (I can actually think of a few clever uses, but don't want to incite a flame-war by suggesting these.)

> If I'm to redesign the convention, I would introduce a MarkedComponent TLV type.
> The MarkedComponent TLV can appear in place of NameComponent.
> The value part of a MarkedComponent contains a VAR-NUMBER which is a marker code, followed by zero or more arbitrary octets.
>
> Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)*
> FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | MarkedComponent)
> MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE*
>
> The canonical order is defined as:
> * A MarkedComponent is less than any NameComponent.
> * Two MarkedComponents are compared by their length and value, in the same way as NameComponent.
>
> The benefits of this solution is:
> * Version/segment/etc components are distinguished from regular NameComponent, because they have a distinct TLV-TYPE: MarkedComponent.
> * Adding a new convention only needs allocation of a marker code. No new TLV type is introduced, so that old consumer can continue to work.

I don't see a significant difference between using the TLV or a marker in terms of flexibility. In either case you need a registry to avoid collisions. Using the "T" of a TLV for a name component has the substantial advantage of fitting directly in with the basic parsing machinery.

> * Encoding marker code as VAR-NUMBER allows much larger marker space than restricting to one-octet marker.
> * Canonical order evaluation is efficient.
> It's unnecessary to compare marker code and BYTE* individually, because most applications won't have different markers under the same prefix.
>
>
> Yours, Junxiao
> _______________________________________________
> Ndn-interest mailing list
> Ndn-interest at lists.cs.ucla.edu
> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From Ignacio.Solis at parc.com Mon Sep 15 13:22:14 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Mon, 15 Sep 2014 20:22:14 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <541730E2.5090005@cisco.com>
Message-ID: 

What is this "type explosion" the document is talking about? Whether it is a type for a component, a command marker, or a marker component, there needs to be some agreement on what the meaning of the type/marker is. If not, then it would just be a regular name component for a single application. So, like Dave says, there needs to be some registry or overall agreement in some form.

Wouldn't there be the same number of types as command markers? So you would basically have "command marker explosion"? Or is there something special about command markers?

What we are discussing here (I hope) is: what is the best way for applications to talk about the specific meaning of components (Versions, Segments, etc.)? There are multiple ways to do this.

- Typed name components are explicit and unambiguous. They always exist. They can be the Generic type or some special type (like Version). They allow arbitrary binary data in a component.

- Command markers may or may not exist. This leads to aliasing if you allow arbitrary binary data in a component. If you don't allow arbitrary binary data in a component, then to solve aliasing you need to escape potentially ambiguous data.

- Command components suffer from the same fate. If your data structures do not use component types, then you'll have to escape binary data to have a consistent type/marker system.
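The aliasing problem described for command markers can be shown concretely. The marker value and helper names below are hypothetical, chosen only to illustrate the point, not taken from any NDN or CCNx specification:

```python
# Hypothetical one-octet command marker meaning "version"; illustrative only.
VERSION_MARKER = b"\xfd"

def marker_decode(component: bytes):
    """Marker convention: a component starting with the marker is a version."""
    if component.startswith(VERSION_MARKER):
        return ("version", component[1:])
    return ("generic", component)

# Opaque application data that merely begins with 0xFD is misread as a
# version -- the aliasing described above:
opaque = b"\xfd\x12\x34"
assert marker_decode(opaque) == ("version", b"\x12\x34")

# With typed components, the type travels outside the value, so the same
# bytes stay unambiguous without any escaping:
typed = ("generic", b"\xfd\x12\x34")   # (type, value) pair
assert typed[0] == "generic" and typed[1] == opaque
```

Escaping rules can patch the marker scheme, but only by forbidding or rewriting certain value bytes, which is exactly the burden on binary data that the typed-component approach avoids.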
Can somebody describe the disadvantage they see of using typed name components? Is it the potential requirement to register? Is it the limited type space? Is it the extra bytes?

What is the benefit of command markers over typed name components? Is it possible for applications to follow a convention that doesn't create aliasing? Yes, you could start creating rules that don't allow binary data of some value in some components if followed by some components, etc. So now, any application that wants to put binary data has to check its own data to make sure it doesn't break other apps.

Sometimes it seems to me that people assume that these names will be human readable most of the time. Where is this assumption coming from? For all we know everything will be binary.

Taking my convert-to-CCN hat off for a second, I think that it's in your best interest to use component markers; that way, if you keep using prefix matching, at least you can request /foo/bar/v_/* and at least force the network to get you something that is a version. Having said that, I still think you'll end up doing exact matching soon, but in the meantime might as well take advantage of it.

Nacho

On 9/15/14, 12:12 PM, "Burke, Jeff" wrote:
>
>Hi all,
>
>
>On 9/15/14, 11:33 AM, "Mark Stapp" wrote:
>
>>interesting:
>>
>>On 9/15/14 1:41 PM, Tai-Lin Chu wrote:
>>> I think for naming convention, the community should come up with a
>>> solution that people "want" to use.
>>
>>I think that we should work toward specifying something that does what
>>we've learned (over several years of work) we need to do, and that does
>>not add any additional pain or unnecessary complexity.
>
>So - just to poke at this a bit - who's the "we" here, and whose pain and
>complexity? Application developers, network architects, equipment mfrs,
>or ? (And do they agree?)
As you know, NDN takes an application-motivated >approach to the development and evaluation of the architecture, and NSF >has asked for this explicitly in their most recent RFP. If we could >discuss some pros/cons from that perspective - perhaps via quick examples >or case studies that have led you to this conclusion, it would be helpful >(in my mind) to this current discussion. > >Attached are some previously shared thoughts on segmenting and versioning >conventions related to this discussion, and some arguments against types, >after page 2. Though Junxiao's marker type is an interesting new twist, I >am so far unconvinced that it addresses all of the concerns in this doc. > >Note - these were not originally intended for a public discussion, at >least not in this form, so be kind. :) I can also imagine that some of the >discussion could be dismissed via "drop selectors and add manifests" but >would suggest addressing the spirit of argument in light of the current >NDN architecture instead. > >(Unfortunately I am on the road so not sure I can keep up with the >discussion, but wanted to inject some other perspectives. :) > >Thanks, >Jeff > >> >>> >>> Could you describe the ambiguous case? I think I know what you mean by >>> ambiguous, but I just want to make sure. (I remembered someone from >>> cisco mentioned this in ndncomm 2014.) >>> >>> To avoid ambiguity, I will start from name component tlvs and allocate >>> more types for different meanings. >>> >>> 8 = regular name component >>> = segmented name component >>> >> >>yes, this is the approach I favor. typed name components allow us to >>identify the 'infrastucture' data in name components, while allowing us >>to set aside some number space for application-defined components. code >>that just wants to compare names or identify component boundaries needs >>to be able to treat the components as 'opaque', of course. 
>> >>Thanks, >>Mark >>_______________________________________________ >>Ndn-interest mailing list >>Ndn-interest at lists.cs.ucla.edu >>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com From zaher at illinois.EDU Mon Sep 15 13:05:56 2014 From: zaher at illinois.EDU (Abdelzaher, Tarek) Date: Mon, 15 Sep 2014 15:05:56 -0500 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <541746A4.1000206@illinois.edu> Ok, here is a short perspective from a person who does not write network code, but rather builds distributed applications: I feel squeamish about specifying naming conventions at all. I feel that the design should favor simplicity and should not burden application developers with having to understand what's better for the network. Hence, I would argue in favor of names that are just series of bits organized into substrings whose semantics are up to the application. I would not use any special bits/characters or other special conventions. I would also most definitely not embed any assumptions on name conventions into network-layer code. As an application developer, I would like to be able to think about name spaces the way I think of UNIX file-name hierarchies. To put it differently, I do not want to read a tutorial on UNIX filename design guidelines in order to ensure that the UNIX file system does file caching, block management, and other functions efficiently for my application. I want file system plumbing to be hidden from me, the application developer. A good design is one that hides such plumbing from the application without impacting efficiency. Same applies to NDN in my opinion. A discussion of special delimiters, markers, etc, that enhances efficiency of certain network functions seems to be going in the opposite direction from what makes NDN general and flexible. 
Tarek

On 9/14/2014 10:39 PM, Tai-Lin Chu wrote:
> hi,
> Just some questions to know how people feel about it.
> 1. Do you like it or not? why?
> 2. Does it fit the need of your application?
> 3. What might be some possible changes (or even a big redesign) if you
> are asked to propose a naming convention?
> 4. some other thoughts
>
> Feel free to answer any of the questions.
> Thanks
>
>
> [1] http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf
> _______________________________________________
> Ndn-interest mailing list
> Ndn-interest at lists.cs.ucla.edu
> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From tailinchu at gmail.com Mon Sep 15 13:53:11 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Mon, 15 Sep 2014 13:53:11 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <541730E2.5090005@cisco.com>
Message-ID: 

>So - just to poke at this a bit - who's the "we" here, and whose pain and complexity? Application developers, network architects, equipment mfrs, or ? (And do they agree?)

I think "we" are the people who question whether there exists a better naming convention. The current naming convention has not received enough comments from the people you described.

I don't agree with "type explosion". But before I say my reason, I hope someone can bring up Van's reason why this type explosion happens (maybe he has a point too).

On Mon, Sep 15, 2014 at 12:12 PM, Burke, Jeff wrote:
>
> Hi all,
>
>
> On 9/15/14, 11:33 AM, "Mark Stapp" wrote:
>
>>interesting:
>>
>>On 9/15/14 1:41 PM, Tai-Lin Chu wrote:
>>> I think for naming convention, the community should come up with a
>>> solution that people "want" to use.
>>
>>I think that we should work toward specifying something that does what
>>we've learned (over several years of work) we need to do, and that does
>>not add any additional pain or unnecessary complexity.
> > So - just to poke at this a bit - who's the "we" here, and whose pain and > complexity? Application developers, network architects, equipment mfrs, > or ? (And do they agree?) As you know, NDN takes an application-motivated > approach to the development and evaluation of the architecture, and NSF > has asked for this explicitly in their most recent RFP. If we could > discuss some pros/cons from that perspective - perhaps via quick examples > or case studies that have led you to this conclusion, it would be helpful > (in my mind) to this current discussion. > > Attached are some previously shared thoughts on segmenting and versioning > conventions related to this discussion, and some arguments against types, > after page 2. Though Junxiao's marker type is an interesting new twist, I > am so far unconvinced that it addresses all of the concerns in this doc. > > Note - these were not originally intended for a public discussion, at > least not in this form, so be kind. :) I can also imagine that some of the > discussion could be dismissed via "drop selectors and add manifests" but > would suggest addressing the spirit of argument in light of the current > NDN architecture instead. > > (Unfortunately I am on the road so not sure I can keep up with the > discussion, but wanted to inject some other perspectives. :) > > Thanks, > Jeff > >> >>> >>> Could you describe the ambiguous case? I think I know what you mean by >>> ambiguous, but I just want to make sure. (I remembered someone from >>> cisco mentioned this in ndncomm 2014.) >>> >>> To avoid ambiguity, I will start from name component tlvs and allocate >>> more types for different meanings. >>> >>> 8 = regular name component >>> = segmented name component >>> >> >>yes, this is the approach I favor. typed name components allow us to >>identify the 'infrastucture' data in name components, while allowing us >>to set aside some number space for application-defined components. 
>>code that just wants to compare names or identify component boundaries
>>needs to be able to treat the components as 'opaque', of course.
>>
>>Thanks,
>>Mark
>>_______________________________________________
>>Ndn-interest mailing list
>>Ndn-interest at lists.cs.ucla.edu
>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>

From tailinchu at gmail.com Mon Sep 15 13:56:29 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Mon, 15 Sep 2014 13:56:29 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <3D3E112D-44A2-4D87-9374-D1450E718BBC@parc.com>
References: <5416FDE2.4050705@cisco.com> <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com> <3D3E112D-44A2-4D87-9374-D1450E718BBC@parc.com>
Message-ID: 

@Marc, do you mind giving a link to PARC's TLV definition so that people will know the exact difference? Thanks.

On Mon, Sep 15, 2014 at 12:23 PM, wrote:
> If you wish to maintain the current NDN definition of canonical order,
> having the T before the L will not work, if the T can have different values.
>
> How do you do exclusions? Are they not based on canonical order? And prefix
> matching in general for "left most child" versus "right most child"? That
> is all affected by the canonical order. You can change the definition of
> the canonical order to make these things work, but it will ripple through
> the stack and forwarder.
>
> I would also disagree with the statement "because most applications won't
> have different markers under the same prefix." What is that based on? Of
> course you need to match the markers and octet strings. Also, the forwarder
> has no idea of "the application." It has to treat all these things as
> opaque values, it will be comparing the entire values to determine canonical
> ordering for content to interest matching.
>
> Marc
>
>
> On Sep 15, 2014, at 12:12 PM, Thompson, Jeff wrote:
>
> Hi Marc.
You say "if the T comes before the L, then the short-lex ordering
> does not work" meaning that the ordering will not depend on the length of
> the name component "value" but on the type.
>
> It seems Junxiao worried about this too when he said "It's unnecessary to
> compare marker code and BYTE* individually, because most applications won't
> have different markers under the same prefix."
>
> Is there a use case where it matters that short-lex ordering is thrown off
> when comparing two name components with different types? Is it safe to
> assume that an application will always be doing short-lex comparison of two
> name components of the same type (for example, leftmost child of two version
> components)?
>
> - Jeff T
>
>
> From: "Marc.Mosko at parc.com"
> Date: Monday, September 15, 2014 11:50 AM
> To: Jeff Thompson
> Cc: "shijunxiao at email.arizona.edu" ,
> "ndn-interest at lists.cs.ucla.edu"
> Subject: Re: [Ndn-interest] any comments on naming convention?
>
> On Sep 15, 2014, at 11:33 AM, Thompson, Jeff wrote:
>
> Hi Mark,
>
> Thanks for the clear summary. You say "it became clear that it is difficult
> to have a 'strcmp()' style comparison over the raw TLV bytes with a
> variable-length T and L encoding." Can you say more about why
> variable-length encoding makes strcmp difficult?
>
>
> At the time, we were having discussions about whether a 2-byte "0" is different
> from a 1-byte "0", for example. If they are the same meaning, but one is just
> incorrectly encoded in 2 bytes, then do we have to validate each T and throw
> away the ones that are mis-encoded?
>
> Also, if the T comes before the L, then the short-lex ordering does not
> work. Short-lex says that name component A is less than B if the length of
> A is less than that of B, or if |A| = |B| and A sorts before B.
>
> Marc
>
> - Jeff T
>
> From: "Marc.Mosko at parc.com"
> Date: Monday, September 15, 2014 11:20 AM
> To: "shijunxiao at email.arizona.edu"
> Cc: "ndn-interest at lists.cs.ucla.edu"
> Subject: Re: [Ndn-interest] any comments on naming convention?
>
> This is an interesting discussion. At PARC, when we went away from ccnb to
> TLV-based name components, we agreed with the Cisco position that different
> types of name components should have different TLV types.
>
> Anything that used to be a command marker was moved to a TLV type and we no
> longer use command markers. We see having TLV types in the name as
> redundant with command markers, so long as there is a type space for
> applications to use to generate their own application-dependent types.
>
> We use one general name (binary) name component, one for versions, one for
> segments (chunks), one for nonces (in the name, not an Interest nonce), one
> for keys. In our re-implementation of the 0.x repo protocol, those repo
> command-markers became their own application-dependent name TLV types. In
> our sync protocol, we use other application-dependent TLV types instead of
> command markers.
>
> Our ordering is defined as the lexicographic compare of each TLV, including
> the T and L. Because we use a fixed type and fixed length value, this
> ordering is always well-defined. About a year ago, when we were considering
> different variable length TLV schemes, it became clear that it is difficult
> to have a 'strcmp()' style comparison over the raw TLV bytes with a
> variable-length T and L encoding. There are some T and L encoding schemes
> that still allow comparison over the raw bytes, but they have their own
> drawbacks.
>
> One solution is to declare many new TLV types: VersionComponent,
> SegmentComponent, TimestampComponent, etc.
> This can guarantee unambiguity, but this restricts the introduction of new > convention, because when we want to introduce another convention in the > future, old consumer applications would not understand the new TLV type. > > > I would disagree with this statement. Anytime you introduce a new > command-marker, old applications will not understand it. If the new > command-marker is required for application execution, then all applications > must be updated. If the new command-marker (or tlv type) is not required, > then the old application should continue just fine treating the type as > opaque. > > Marc Mosko > > > On Sep 15, 2014, at 10:49 AM, Junxiao Shi > wrote: > > Dear folks > > I agree with @MarkStapp that Naming Conventions rev1 does not guarantee > version/segment components to be unambiguous. > One alternate proposal was to use an additional NameComponent before the > number as a marker, such as "_v/" "_s/". This alternate > proposal is also unable to make version/segment components unambiguous, and > it doesn't work well with ChildSelector. > > One easy solution to this problem is: restrict the octets to be used in > regular names. > In rev1, we could require regular NameComponent to start with a valid UTF8 > character. > In alternate proposal, we could forbid regular NameComponent to start with > "_". > However, this solution is undesirable, because some applications do need to > operate with binary components (eg. SignatureBits component in signed > Interest). > > NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a > component is a number. > This is insufficient because it doesn't say the meaning of a number: is it a > version number or a segment number? > > One solution is to declare many new TLV types: VersionComponent, > SegmentComponent, TimestampComponent, etc. 
> This can guarantee unambiguity, but this restricts the introduction of new > convention, because when we want to introduce another convention in the > future, old consumer applications would not understand the new TLV type. > > > If I'm to redesign the convention, I would introduce a MarkedComponent TLV > type. > The MarkedComponent TLV can appear in place of NameComponent. > The value part of a MarkedComponent contains a VAR-NUMBER which is a marker > code, followed by zero or more arbitrary octets. > > Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)* > FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | > MarkedComponent) > MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE* > > The canonical order is defined as: > > A MarkedComponent is less than any NameComponent. > Two MarkedComponents are compared by their length and value, in the same way > as NameComponent. > > > The benefits of this solution is: > > Version/segment/etc components are distinguished from regular NameComponent, > because they have a distinct TLV-TYPE: MarkedComponent. > Adding a new convention only needs allocation of a marker code. No new TLV > type is introduced, so that old consumer can continue to work. > Encoding marker code as VAR-NUMBER allows much larger marker space than > restricting to one-octet marker. > Canonical order evaluation is efficient. It's unnecessary to compare marker > code and BYTE* individually, because most applications won't have different > markers under the same prefix. 
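Junxiao's two ordering rules quoted above can be sketched in code (an illustrative model only, not code from any NDN library; the (kind, value) tuples and the marker code are hypothetical):

```python
# Illustrative model of the proposed canonical order. A component is a
# (kind, value) tuple: kind is "marked" or "name"; for a MarkedComponent the
# value holds the marker code followed by the payload octets.

def compare(a, b):
    """Return -1, 0, or 1 per the two rules quoted above."""
    kind_a, val_a = a
    kind_b, val_b = b
    if kind_a != kind_b:                      # rule 1: MarkedComponent first
        return -1 if kind_a == "marked" else 1
    if len(val_a) != len(val_b):              # rule 2: shorter first, ...
        return -1 if len(val_a) < len(val_b) else 1
    if val_a == val_b:                        # ... then byte-wise lexicographic
        return 0
    return -1 if val_a < val_b else 1

version = ("marked", b"\x01\x0a")   # hypothetical marker code 1, version 10
regular = ("name", b"index.html")
assert compare(version, regular) == -1   # any marked component sorts first
assert compare(regular, regular) == 0
```

Note that rule 2 is the same short-first-then-lexicographic order NDN already uses for regular NameComponents, so no new comparison machinery is needed within a kind.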
> > > Yours, Junxiao > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From Marc.Mosko at parc.com Mon Sep 15 13:58:16 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 15 Sep 2014 20:58:16 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <562EEC6D-1C1D-4271-9E79-BB3E4D8A7769@parc.com> Jeff, Thank you for sharing that document. I have a few comments. In CCNx 1.0, a name component is the tuple (type, length, value), carried everywhere. The "type" is not just a namespace encoding, it is considered an integral part of the name component, as a command-marker would be. We do not ignore component types in forwarding. Yes, the URI representation gets a bit messier as each name component is "type=value" (we prefer to call it "label=value", but that's just wording). I do not think you can separate the name component type from forwarding. If two name component types have different meanings, then they need to be forwarded individually. They need to be included in other things, like selector processing, too. Otherwise, how could one exclude a name component with one type but not another? Using a separate name component to indicate the type again falls back to convention to indicate semantics. I could write a program that puts in a binary value that just happens to decode to "_v" without intending the next component to be a version. Yes, if you specifically type the "_v" component differently than a binary component, then it is not ambiguous, but now you are using the name component type to indicate semantics, so why not just use it everywhere instead of "_v" and "_s", etc.? 
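Marc's aliasing point can be illustrated with a small sketch (hypothetical component types and a toy exclude check, not any real forwarder API): a value-only comparison cannot distinguish a version marker from application data that happens to carry the same bytes.

```python
# Sketch of the aliasing problem, using hypothetical (type, value) components.
# A version marker and an opaque binary component can carry identical bytes.
marker_component = ("version-marker", b"_v")
binary_component = ("generic", b"_v")   # application data that decodes to "_v"

def excluded_value_only(component, exclude_values):
    """Ambiguous: matches on the value bytes alone, ignoring the type."""
    return component[1] in exclude_values

def excluded_typed(component, exclude_set):
    """Unambiguous: matches on the full (type, value) pair."""
    return component in exclude_set

# Value-only matching cannot tell the two components apart...
assert excluded_value_only(marker_component, {b"_v"})
assert excluded_value_only(binary_component, {b"_v"})
# ...while typed matching excludes only the intended one.
assert excluded_typed(marker_component, {("version-marker", b"_v")})
assert not excluded_typed(binary_component, {("version-marker", b"_v")})
```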
In the document you forwarded, I noticed someone make this observation, and the response, along the lines of "this is true for any convention" and "use unusual markers", does not seem like a good justification. Putting the type in a TLV "T" for all name components is not just a convention, it is a mandatory, standardized system. For any "unusual marker" one will always be able to find some binary name component that aliases it. In some discussions we had at PARC about this, my position was that if you are using a naming convention, it should be explicit somewhere and standardized, like a URI scheme. If an application cannot place a "_v" binary component followed by a number in the 4th-to-last name component (e.g. /foo/_v/10/_s/20), then that really needs to be explicit so we know if an application is conforming to the convention. If the convention is mandatory, then it's a standard and should be explicit somewhere in the packet. I do not subscribe to the notion that "the application" will know; the point of having data on a network is that many applications could share the data, like a photo. It seems to me that there is little difference between coordinating the TLV type space for name components and the marker names ("_v", etc.) in a naming convention. Everyone still needs to agree on them. I guess I'm disagreeing with the sentiment quoted here. > Conventions > can come into being organically based on agreement within a particular application community, and if > technical governance structures are needed to create standards that resolve ambiguity or promote > interoperability, they can arise - but all of this dialogue happens around namespaces, not the weird > intersection of typespaces and namespaces. In regards to the name component "namespace" becoming bifurcated from the global type namespace, in CCNx 1.0 we promote a non-global TLV system. It just seems inevitable that we will end up there. Our main reason is this. Once a company "A" 
gets assigned a TLV type, say in an Interest Guider for proprietary QoS, they will not want to coordinate with the rest of the world for the TLVs they put inside that assigned TLV container. They will want to have control over the inside of their container. Then company "B" gets another assigned TLV number for an Interest Guider for a flow id, and they start putting their own TLVs in that container. If someone wants to support both of those proprietary types, then they will end up with context-dependent types. Either that, or each company must coordinate every TLV they put in a proprietary type. In CCNx 1.0 we specifically set aside a block of "application dependent" types so applications or application communities could use non-standardized types. Typically, one would have a name component that identified the application or conventions being used prior to those types. If at some point they were considered generally useful to the community as a whole they could be promoted to standardized types. In a sense, one could think about this like context-dependent types in a name, their meaning coming from a "left" name component that identifies the application or protocol being used. Marc On Sep 15, 2014, at 12:12 PM, Burke, Jeff wrote: > > Hi all, > > > On 9/15/14, 11:33 AM, "Mark Stapp" wrote: > >> interesting: >> >> On 9/15/14 1:41 PM, Tai-Lin Chu wrote: >>> I think for naming convention, the community should come up with a >>> solution that people "want" to use. >> >> I think that we should work toward specifying something that does what >> we've learned (over several years of work) we need to do, and that does >> not add any additional pain or unnecessary complexity. > > So - just to poke at this a bit - who's the "we" here, and whose pain and > complexity? Application developers, network architects, equipment mfrs, > or ...? (And do they agree?) 
As you know, NDN takes an application-motivated > approach to the development and evaluation of the architecture, and NSF > has asked for this explicitly in their most recent RFP. If we could > discuss some pros/cons from that perspective - perhaps via quick examples > or case studies that have led you to this conclusion, it would be helpful > (in my mind) to this current discussion. > > Attached are some previously shared thoughts on segmenting and versioning > conventions related to this discussion, and some arguments against types, > after page 2. Though Junxiao's marker type is an interesting new twist, I > am so far unconvinced that it addresses all of the concerns in this doc. > > Note - these were not originally intended for a public discussion, at > least not in this form, so be kind. :) I can also imagine that some of the > discussion could be dismissed via "drop selectors and add manifests" but > would suggest addressing the spirit of argument in light of the current > NDN architecture instead. > > (Unfortunately I am on the road so not sure I can keep up with the > discussion, but wanted to inject some other perspectives. :) > > Thanks, > Jeff > >> >>> >>> Could you describe the ambiguous case? I think I know what you mean by >>> ambiguous, but I just want to make sure. (I remembered someone from >>> cisco mentioned this in ndncomm 2014.) >>> >>> To avoid ambiguity, I will start from name component tlvs and allocate >>> more types for different meanings. >>> >>> 8 = regular name component >>> = segmented name component >>> >> >> yes, this is the approach I favor. typed name components allow us to >> identify the 'infrastucture' data in name components, while allowing us >> to set aside some number space for application-defined components. code >> that just wants to compare names or identify component boundaries needs >> to be able to treat the components as 'opaque', of course. 
>> >> Thanks, >> Mark >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Mon Sep 15 14:02:48 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 15 Sep 2014 21:02:48 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com> <3D3E112D-44A2-4D87-9374-D1450E718BBC@parc.com> Message-ID: <7F4E75C6-531C-418A-9405-4BC0BFC0C1A0@parc.com> All of our protocol specs are at http://www.ccnx.org/documentation/. You need to click on the black bar that says "CCNx 1.0 PROTOCOL SPECIFICATION" to expand it. This document specifies the overall TLV format: http://www.ccnx.org/pubs/ccnx-mosko-tlvpackets-01.txt This document specifies how we use our TLV format for CCNx messages and includes the definitions of the common name component types: http://www.ccnx.org/pubs/ccnx-mosko-tlvmessages-01.txt In Labeled Segment URIs, we describe a standard URI format for representing URIs with labels for each path segment: http://www.ccnx.org/pubs/ccnx-mosko-labeledsegments-01.txt In Labeled Content Information, we describe the new CCNx 1.0 URI scheme "lci:": http://www.ccnx.org/pubs/ccnx-mosko-labeledcontent-01.txt In some of the other standards, you may find we introduce other TLV types for name components, as needed in those specifications (note: there's an error in a couple that assigned overlapping numbers, which we will be fixing soon). Marc On Sep 15, 2014, at 1:56 PM, Tai-Lin Chu wrote: > @Marc, > Do you mind giving a link to parc's tlv definition so that people will > know the exact difference? > > Thanks. 
> > On Mon, Sep 15, 2014 at 12:23 PM, wrote: >> If you wish to maintain the current NDN definition of canonical order, >> having the T before the L will not work, if the T can have different values. >> >> How do you do exclusions? Are they not based on canonical order? And prefix >> matching in general for "left most child" versus "right most child"? That >> is all affected by the canonical order. You can change the definition of >> the canonical order to make these things work, but it will ripple through >> the stack and forwarder. >> >> I would also disagree with the statement "because most applications won't >> have different markers under the same prefix." What is that based on? Of >> course you need to match the markers and octet strings. Also, the forwarder >> has no idea of "the application." It has to treat all these things as >> opaque values, it will be comparing the entire values to determine canonical >> ordering for content to interest matching. >> >> Marc >> >> >> On Sep 15, 2014, at 12:12 PM, Thompson, Jeff wrote: >> >> Hi Marc. You say "if the T comes before the L, then the short-lex ordering >> does not work" meaning that the ordering will not depend on the length of >> the name component "value" but on the type. >> >> It seems Junxiao worried about this too when he said "It's unnecessary to >> compare marker code and BYTE* individually, because most applications won't >> have different markers under the same prefix." >> >> Is there a use case where it matters that short-lex ordering is thrown off >> when comparing two name components with different types? Is it safe to >> assume that an application will always be doing short-lex comparison of two >> name components of the same type (for example, leftmost child of two version >> components)? 
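The short-lex question above can be made concrete with a sketch (the 1-byte and 2-byte T encodings below are hypothetical, chosen only to show the effect Marc describes):

```python
# Sketch: why a variable-length T placed before L breaks a plain byte-wise
# ("strcmp"-style) comparison, while fixed-size T and L preserve short-lex.
# The type codes and encodings below are hypothetical.

def short_lex(a: bytes, b: bytes) -> int:
    """Short-lex order: shorter component first; equal lengths compare byte-wise."""
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    return (a > b) - (a < b)

def encode_fixed(value: bytes) -> bytes:
    """Fixed 1-byte T (here 8) and 1-byte L, then the value octets."""
    return bytes([8, len(value)]) + value

a, b = b"zz", b"aaa"
assert short_lex(a, b) == -1                 # shorter component sorts first
assert encode_fixed(a) < encode_fixed(b)     # raw-byte compare agrees

# With variable-length T encodings in front, raw bytes stop tracking short-lex:
raw_a = bytes([0x81, 0x01, len(a)]) + a      # same type, 2-byte T encoding
raw_b = bytes([0x08, len(b)]) + b            # same type, 1-byte T encoding
assert raw_a > raw_b                         # raw-byte order contradicts short-lex
```

With a fixed-size T the raw-byte comparison and short-lex agree, which is exactly why the PARC ordering over (T, L, V) stays well-defined.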
>> >> - Jeff T >> >> >> From: "Marc.Mosko at parc.com" >> Date: Monday, September 15, 2014 11:50 AM >> To: Jeff Thompson >> Cc: "shijunxiao at email.arizona.edu" , >> "ndn-interest at lists.cs.ucla.edu" >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> On Sep 15, 2014, at 11:33 AM, Thompson, Jeff wrote: >> >> Hi Mark, >> >> Thanks for the clear summary. You say "it became clear that it is difficult >> to have a 'strcmp()' style comparison over the raw TLV bytes with a >> variable-length T and L encoding." Can you say more about why >> variable-length encoding makes strcmp difficult? >> >> >> At the time, we were having discussions about whether a 2-byte "0" is different than >> a 1-byte "0", for example. If they are the same meaning, but one is just >> incorrectly encoded in 2-bytes, then do we have to validate each T and throw >> away the ones that are mis-encoded? >> >> Also, if the T comes before the L, then the short-lex ordering does not >> work. Short-lex says that name component A is less than B if the length of >> A is less than the length of B, or if |A| = |B| and A sorts before B. If the T comes >> before the L, then you cannot simply do a strcmp() because the variable >> length T's will throw things off. All you can say is that within a T value, >> you use short-lex. >> >> Marc >> >> - Jeff T >> >> From: "Marc.Mosko at parc.com" >> Date: Monday, September 15, 2014 11:20 AM >> To: "shijunxiao at email.arizona.edu" >> Cc: "ndn-interest at lists.cs.ucla.edu" >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> This is an interesting discussion. At PARC, when we went away from ccnb to >> TLV-based name components, we agreed with the Cisco position that different >> types of name components should have different TLV types. >> >> Anything that used to be a command marker was moved to a TLV type and we no >> longer use command markers. 
We see having TLV types in the name as >> redundant with command markers, so long as there is a type space for >> applications to use to generate their own application-dependent types. >> >> We use one general name (binary) name component, one for versions, one for >> segments (chunks), one for nonces (in the name, not an Interest nonce), one >> for keys. In our re-implementation of the 0.x repo protocol, those repo >> command-markers became their own application-dependent name TLV types. In >> our sync protocol, we use other application-dependent TLV types instead of >> command markers. >> >> Our ordering is defined as the lexicographic compare of each TLV, including >> the T and L. Because we use a fixed type and fixed length value, this >> ordering is always well-defined. About a year ago, when we were considering >> different variable length TLV schemes, it became clear that it is difficult >> to have a "strcmp()" style comparison over the raw TLV bytes with a >> variable-length T and L encoding. There are some T and L encoding schemes >> that still allow comparison over the raw bytes, but they have their own >> drawbacks. >> >> One solution is to declare many new TLV types: VersionComponent, >> SegmentComponent, TimestampComponent, etc. >> This can guarantee unambiguity, but this restricts the introduction of new >> convention, because when we want to introduce another convention in the >> future, old consumer applications would not understand the new TLV type. >> >> >> I would disagree with this statement. Anytime you introduce a new >> command-marker, old applications will not understand it. If the new >> command-marker is required for application execution, then all applications >> must be updated. If the new command-marker (or tlv type) is not required, >> then the old application should continue just fine treating the type as >> opaque. 
>> >> Marc Mosko >> >> >> On Sep 15, 2014, at 10:49 AM, Junxiao Shi >> wrote: >> >> Dear folks >> >> I agree with @MarkStapp that Naming Conventions rev1 does not guarantee >> version/segment components to be unambiguous. >> One alternate proposal was to use an additional NameComponent before the >> number as a marker, such as "_v/" "_s/". This alternate >> proposal is also unable to make version/segment components unambiguous, and >> it doesn't work well with ChildSelector. >> >> One easy solution to this problem is: restrict the octets to be used in >> regular names. >> In rev1, we could require regular NameComponent to start with a valid UTF8 >> character. >> In alternate proposal, we could forbid regular NameComponent to start with >> "_". >> However, this solution is undesirable, because some applications do need to >> operate with binary components (eg. SignatureBits component in signed >> Interest). >> >> NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a >> component is a number. >> This is insufficient because it doesn't say the meaning of a number: is it a >> version number or a segment number? >> >> One solution is to declare many new TLV types: VersionComponent, >> SegmentComponent, TimestampComponent, etc. >> This can guarantee unambiguity, but this restricts the introduction of new >> convention, because when we want to introduce another convention in the >> future, old consumer applications would not understand the new TLV type. >> >> >> If I'm to redesign the convention, I would introduce a MarkedComponent TLV >> type. >> The MarkedComponent TLV can appear in place of NameComponent. >> The value part of a MarkedComponent contains a VAR-NUMBER which is a marker >> code, followed by zero or more arbitrary octets. 
>> >> Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)* >> FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | >> MarkedComponent) >> MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE* >> >> The canonical order is defined as: >> >> A MarkedComponent is less than any NameComponent. >> Two MarkedComponents are compared by their length and value, in the same way >> as NameComponent. >> >> >> The benefits of this solution is: >> >> Version/segment/etc components are distinguished from regular NameComponent, >> because they have a distinct TLV-TYPE: MarkedComponent. >> Adding a new convention only needs allocation of a marker code. No new TLV >> type is introduced, so that old consumer can continue to work. >> Encoding marker code as VAR-NUMBER allows much larger marker space than >> restricting to one-octet marker. >> Canonical order evaluation is efficient. It's unnecessary to compare marker >> code and BYTE* individually, because most applications won't have different >> markers under the same prefix. >> >> >> >> Yours, Junxiao >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> From christian.tschudin at unibas.ch Mon Sep 15 14:03:59 2014 From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch) Date: Mon, 15 Sep 2014 23:03:59 +0200 (CEST) Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> Message-ID: I question whether these marker components have to be part of the "name" that is of concern to the router (of the forwarder). I would collect all markers in a separate packet field. 
If one takes the TR language of Section "3 Functional Name Components" at face value, it is about function parameters to a command or data rendering function, hence the following deconstruction: let INTEREST(NDN-name-according-to-TR-0022) render_and_retrieve_data(routingHint, attributeSet) routingHint ::= dataStemName dataStemName ::= NAME-TYPE TLV-LENGTH NameComponent* attributeSet ::= ATTRSET-TYPE TLV-LENGTH attributeVal* attributeVal ::= finalBlockID | version | segment | param1 | ... | paramN | signatureInfo | signatureValue | The attributeSet goes beyond routing, concerns the lookup in the CS, the application layer, perhaps in-network security checks. If some attributes are positional (=depend on name component context), then they should obtain their own type number or use other conventions (e.g. hierarchical version info relating to a document's chapters, its sections, paragraphs). Note that this separate attribute set also covers I-feel-lucky: render_and_retrieve_data("/com/google/search?", "NDN-0022 filetype:pdf") christian On Mon, 15 Sep 2014, Dave Oran (oran) wrote: > > On Sep 15, 2014, at 1:49 PM, Junxiao Shi wrote: > >> Dear folks >> >> I agree with @MarkStapp that Naming Conventions rev1 does not guarantee version/segment components to be unambiguous. >> One alternate proposal was to use an additional NameComponent before the number as a marker, such as "_v/" "_s/". This alternate proposal is also unable to make version/segment components unambiguous, and it doesn't work well with ChildSelector. >> >> One easy solution to this problem is: restrict the octets to be used in regular names. >> In rev1, we could require regular NameComponent to start with a valid UTF8 character. >> In alternate proposal, we could forbid regular NameComponent to start with "_". >> However, this solution is undesirable, because some applications do need to operate with binary components (eg. SignatureBits component in signed Interest). 
>> >> NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate a component is a number. >> This is insufficient because it doesn't say the meaning of a number: is it a version number or a segment number? >> >> One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc. >> This can guarantee unambiguity, but this restricts the introduction of new convention, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type. >> > I believe that given the tradeoffs, this is the best approach. Why? > > - old consumer applications won't understand a new naming convention no matter how we do it. The question is what is "safe" for them to do. For this we need a "meta-convention" about how to treat typed name components not understood by an application. Here, there's a quite straightforward solution, which is that if the typed name components include a "genericNameComponent" type, you simply say if you don't understand the name component type you treat it as if it were "genericNameComponent" > > - routers probably should not be required to interpret names in the first place, so this is a non-issue. However, if we define the above meta-convention for applications, we simply say routers treat all name component types as if they were genericNameComponent. > > - if somebody can come up with a persuasive reason for a router to understand some of the name component types, we can specify that in the architecture and deal with forward compatibility issues that might arise on a case-by-case basis. (I can actually think of a few clever uses, but don't want to incite a flame-war by suggesting these). > > >> >> If I'm to redesign the convention, I would introduce a MarkedComponent TLV type. >> The MarkedComponent TLV can appear in place of NameComponent. 
>> The value part of a MarkedComponent contains a VAR-NUMBER which is a marker code, followed by zero or more arbitrary octets. >> >> Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)* >> FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | MarkedComponent) >> MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE* >> >> The canonical order is defined as: >> - A MarkedComponent is less than any NameComponent. >> - Two MarkedComponents are compared by their length and value, in the same way as NameComponent. >> >> The benefits of this solution are: >> - Version/segment/etc components are distinguished from regular NameComponent, because they have a distinct TLV-TYPE: MarkedComponent. >> - Adding a new convention only needs allocation of a marker code. No new TLV type is introduced, so that old consumers can continue to work. > I don't see a significant difference between using the TLV or a marker in terms of flexibility. In either case you need a registry to avoid collisions. Using the "T" of a TLV for a name component has the substantial advantage of fitting directly in with the basic parsing machinery. > >> - Encoding marker code as VAR-NUMBER allows much larger marker space than restricting to one-octet marker. >> - Canonical order evaluation is efficient. It's unnecessary to compare marker code and BYTE* individually, because most applications won't have different markers under the same prefix. 
>> >> >> Yours, Junxiao >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From Ignacio.Solis at parc.com Mon Sep 15 14:09:26 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Mon, 15 Sep 2014 21:09:26 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <5416FDE2.4050705@cisco.com> <00C6FDEB-681A-436D-B196-84C48FC1A260@parc.com> <3D3E112D-44A2-4D87-9374-D1450E718BBC@parc.com> Message-ID: >From a general perspective we gave a short description of the changes from CCN 0.x to 1.x at the last IETF. You can find the slides at http://www.ietf.org/proceedings/90/slides/slides-90-icnrg-10.pdf Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/15/14, 1:56 PM, "Tai-Lin Chu" wrote: >@Marc, >Do you mind giving a link to parc's tlv definition so that people will >know the exact difference? > >Thanks. > >On Mon, Sep 15, 2014 at 12:23 PM, wrote: >> If you wish to maintain the current NDN definition of canonical order, >> having the T before the L will not work, if the T can have different >>values. >> >> How do you do exclusions? Are the not based on canonical order? And >>prefix >> matching in general for ?left most child? versus ?right most child?? >>That >> is all affected by the canonical order. You can change the definition >>of >> the canonical order to make these things work, but it will ripple >>through >> the stack and forwarder. >> >> I would also disagree with the statement "because most applications >>won't >> have different markers under the same prefix.? What is that based on? 
>>Of >> course you need to match the markers and octet strings. Also, the >>forwarder >> has no idea of ?the application.? It has to treat all these things as >> opaque values, it will be comparing the entire values to determine >>canonical >> ordering for content to interest matching. >> >> Marc >> >> >> On Sep 15, 2014, at 12:12 PM, Thompson, Jeff >>wrote: >> >> Hi Marc. You say "if the T comes before the L, then the short-lex >>ordering >> does not work" meaning that the ordering will not depend on the length >>of >> the name component "value" but on the type. >> >> It seems Junxiao worried about this too when he said "It's unnecessary >>to >> compare marker code and BYTE* individually, because most applications >>won't >> have different markers under the same prefix." >> >> Is there a use case where it matters that short-lex odering is thrown >>off >> when comparing two name components with different types? Is it safe to >> assume that an application will always be doing short-lex comparison of >>two >> name components of the same type (for example, leftmost child of two >>version >> components)? >> >> - Jeff T >> >> >> From: "Marc.Mosko at parc.com" >> Date: Monday, September 15, 2014 11:50 AM >> To: Jeff Thompson >> Cc: "shijunxiao at email.arizona.edu" , >> "ndn-interest at lists.cs.ucla.edu" >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> On Sep 15, 2014, at 11:33 AM, Thompson, Jeff >>wrote: >> >> Hi Mark, >> >> Thanks for the clear summary. You say "it became clear that it is >>difficult >> to have a ?strcmp()? style comparison over the raw TLV bytes with a >> variable-length T and L encoding." Can you say more about why >> varible-length encoding makes strcmp difficult? >> >> >> At the time, we were having discussions about is a 2-byte ?0? different >>than >> a 1-byte ?0?, for example. 
If they are the same meaning, but one is >>just >> incorrectly encoded in 2-bytes, then do we have to validate each T and >>throw >> away the ones that are mis-encoded? >> >> Also, if the T comes before the L, then the short-lex ordering does not >> work. Short-lex says that name component A is less than B if then >>length of >> A is less than B or of |A| = |B| and A sorts before B. If the T comes >> before the L, then you cannot simply do a strcmp() because the variable >> length T?s will throw things off. All you can say is that within a T >>value, >> you use short-lex. >> >> Marc >> >> - Jeff T >> >> From: "Marc.Mosko at parc.com" >> Date: Monday, September 15, 2014 11:20 AM >> To: "shijunxiao at email.arizona.edu" >> Cc: "ndn-interest at lists.cs.ucla.edu" >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> This is an interesting discussion. At PARC, when we went away from >>ccnb to >> TLV-based name components, we agreed with the Cisco position that >>different >> types of name components should have different TLV types. >> >> Anything that used to be a command marker was moved to a TLV type and >>we no >> longer use command markers. We see having TLV types in the name as >> redundant with command markers, so long as there is a type space for >> applications to use to generate their own application-dependent types. >> >> We use one general name (binary) name component, one for versions, one >>for >> segments (chunks), one for nonces (in the name, not an Interest nonce), >>one >> for keys. In our re-implementation of the 0.x repo protocol, those repo >> command-markers became their own application-dependent name TLV types. >>In >> our sync protocol, we use other application-dependent TLV types instead >>of >> command markers. >> >> Our ordering is defined as the lexicographic compare of each TLV, >>including >> the T and L. Because we use a fixed type and fixed length value, this >> ordering is always well-defined. 
About a year ago, when we were >>considering >> different variable length TLV schemes, it became clear that it is >>difficult >> to have a ?strcmp()? style comparison over the raw TLV bytes with a >> variable-length T and L encoding. There are some T and L encoding >>schemes >> that still allow comparison over the raw bytes, but they have their own >> drawbacks. >> >> One solution is to declare many new TLV types: VersionComponent, >> SegmentComponent, TimestampComponent, etc. >> This can guarantee unambiguity, but this restricts the introduction of >>new >> convention, because when we want to introduce another convention in the >> future, old consumer applications would not understand the new TLV type. >> >> >> I would disagree with this statement. Anytime you introduce a new >> command-marker, old applications will not understand it. If the new >> command-marker is required for application execution, then all >>applications >> must be updated. If the new command-marker (or tlv type) is not >>required, >> then the old application should continue just fine treating the type as >> opaque. >> >> Marc Mosko >> >> >> On Sep 15, 2014, at 10:49 AM, Junxiao Shi >> wrote: >> >> Dear folks >> >> I agree with @MarkStapp that Naming Conventions rev1 does not guarantee >> version/segment components to be unambiguous. >> One alternate proposal was to use an additional NameComponent before the >> number as a marker, such as "_v/" "_s/". This >>alternate >> proposal is also unable to make version/segment components unambiguous, >>and >> it doesn't work well with ChildSelector. >> >> One easy solution to this problem is: restrict the octets to be used in >> regular names. >> In rev1, we could require regular NameComponent to start with a valid >>UTF8 >> character. >> In alternate proposal, we could forbid regular NameComponent to start >>with >> "_". >> However, this solution is undesirable, because some applications do >>need to >> operate with binary components (eg. 
SignatureBits component in signed Interest).
>>
>> NDN-TLV 0.2.0 (unapproved spec) introduces NumberComponent to indicate that a component is a number.
>> This is insufficient because it doesn't say what the number means: is it a version number or a segment number?
>>
>> One solution is to declare many new TLV types: VersionComponent, SegmentComponent, TimestampComponent, etc.
>> This can guarantee unambiguity, but it restricts the introduction of new conventions, because when we want to introduce another convention in the future, old consumer applications would not understand the new TLV type.
>>
>>
>> If I were to redesign the convention, I would introduce a MarkedComponent TLV type.
>> The MarkedComponent TLV can appear in place of NameComponent.
>> The value part of a MarkedComponent contains a VAR-NUMBER which is a marker code, followed by zero or more arbitrary octets.
>>
>> Name ::= NAME-TYPE TLV-LENGTH (NameComponent | MarkedComponent)*
>> FinalBlockId ::= FINAL-BLOCK-ID-TYPE TLV-LENGTH (NameComponent | MarkedComponent)
>> MarkedComponent ::= MARKED-COMPONENT-TYPE TLV-LENGTH VAR-NUMBER BYTE*
>>
>> The canonical order is defined as:
>>
>> A MarkedComponent is less than any NameComponent.
>> Two MarkedComponents are compared by their length and value, in the same way as NameComponent.
>>
>>
>> The benefits of this solution are:
>>
>> Version/segment/etc. components are distinguished from regular NameComponent, because they have a distinct TLV-TYPE: MarkedComponent.
>> Adding a new convention only needs allocation of a marker code. No new TLV type is introduced, so old consumers can continue to work.
>> Encoding the marker code as a VAR-NUMBER allows a much larger marker space than restricting to a one-octet marker.
>> Canonical order evaluation is efficient. It's unnecessary to compare the marker code and BYTE* individually, because most applications won't have different markers under the same prefix.
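The canonical order proposed above (a MarkedComponent sorts before any regular NameComponent; components of the same kind compare by length, then value) can be sketched as follows. This is an illustrative Python sketch in which a component is modeled as an `(is_marked, value_bytes)` pair; the TLV type codes and VAR-NUMBER encoding are elided, and nothing here comes from an actual NDN library:

```python
# Canonical order for the proposed MarkedComponent, modeled on the rules
# in the message above. A component is (is_marked, value) where value
# covers the whole value part (marker code plus trailing octets).
def component_cmp(a, b) -> int:
    a_marked, a_val = a
    b_marked, b_val = b
    # Rule 1: a MarkedComponent is less than any regular NameComponent.
    if a_marked != b_marked:
        return -1 if a_marked else 1
    # Rule 2: same kind -> compare by length, then bytewise value
    # (the same rule NameComponent already uses).
    if len(a_val) != len(b_val):
        return -1 if len(a_val) < len(b_val) else 1
    return (a_val > b_val) - (a_val < b_val)
```

Comparing the marker code and trailing octets as one opaque value is what makes the evaluation cheap, per the efficiency claim in the proposal.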
>>
>>
>>
>> Yours, Junxiao
>> _______________________________________________
>> Ndn-interest mailing list
>> Ndn-interest at lists.cs.ucla.edu
>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>
>>
>>
>> _______________________________________________
>> Ndn-interest mailing list
>> Ndn-interest at lists.cs.ucla.edu
>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>
>_______________________________________________
>Ndn-interest mailing list
>Ndn-interest at lists.cs.ucla.edu
>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From felix at rabe.io Mon Sep 15 14:17:29 2014
From: felix at rabe.io (Felix Rabe)
Date: Mon, 15 Sep 2014 23:17:29 +0200
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <541746A4.1000206@illinois.edu>
References: <541746A4.1000206@illinois.edu>
Message-ID: <54175769.6070205@rabe.io>

I agree with a need for simplicity.

But maybe we cannot have simplicity, so let's look at some use cases that might justify the complexity of the conventions. (I will only discuss versions and timestamps here, as they interest me for my work.)

Versions: I can give someone a link to a document, and know that they will see the version I currently see, unmodified, even if there is a newer version. Of course, any participant can always get a newer version.
- But: I think applications should come up with conventions in versioning, as one application does not need it, another needs sequential versioning, and another (like Git-style source code management) needs a DAG.

Timestamps: My measurements are a growing data set of (time, value) tuples. I can access multiple measurements across time for comparisons.
- But: Also, this needs to be defined by applications, as time-since-Unix-epoch is (I think) unsuitable for a wide range of applications (archaeology, astronomy), where both range and granularity requirements are different. Then there are vector clocks, which work with logical time instead of real time.
- Another aspect is that this could just be a special case of versioning, using time as a version "number". Segments: ... Sequence: ... Just some thoughts - Felix On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: > Ok, here is a short perspective from a person who does not write > network code, but rather builds distributed applications: > > I feel squeamish about specifying naming conventions at all. I feel > that the design should favor simplicity and should not burden > application developers with having to understand what's better for the > network. Hence, I would argue in favor of names that are just series > of bits organized into substrings whose semantics are up to the > application. I would not use any special bits/characters or other > special conventions. I would also most definitely not embed any > assumptions on name conventions into network-layer code. > > As an application developer, I would like to be able to think about > name spaces the way I think of UNIX file-name hierarchies. To put it > differently, I do not want to read a tutorial on UNIX filename design > guidelines in order to ensure that the UNIX file system does file > caching, block management, and other functions efficiently for my > application. I want file system plumbing to be hidden from me, the > application developer. A good design is one that hides such plumbing > from the application without impacting efficiency. Same applies to NDN > in my opinion. A discussion of special delimiters, markers, etc, that > enhances efficiency of certain network functions seems to be going in > the opposite direction from what makes NDN general and flexible. > > Tarek > > > On 9/14/2014 10:39 PM, Tai-Lin Chu wrote: >> hi, >> Just some questions to know how people feel about it. >> 1. Do you like it or not? why? >> 2. Does it fit the need of your application? >> 3. What might be some possible changes (or even a big redesign) if you >> are asked to purpose a naming convention? >> 4. 
some other thoughts >> >> Feel free to answer any of the questions. >> Thanks >> >> >> [1] >> http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From zaher at illinois.EDU Mon Sep 15 16:07:43 2014 From: zaher at illinois.EDU (Abdelzaher, Tarek) Date: Mon, 15 Sep 2014 18:07:43 -0500 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <54175769.6070205@rabe.io> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> Message-ID: <5417713F.4090201@illinois.edu> Felix, At the risk of spamming the list, let me elaborate. What I am arguing for is that we do not need unified conventions. Different applications will find solutions that are more appropriate to them. Otherwise, there will be too many things to add and it is a slippery slope. Assume all we have are name strings composed of a concatenation of sub-strings that, as far as the network is concerned, are all of the same type (i.e., no special characters, markers, delimiters, type values, etc), although from the application's perspective can have different semantics. Hence, for example, if versions are important to my application, I should have version numbers as part of the name space (e.g., /path/data-name/version). To get a specific version, one can ask for it by name (e.g., /path/data-name/version-label). To get the latest, there is an issue with caching (how long should things be cached before they expire). Barring that, one can ask for /path/data-name and have the interest be forwarded to the provider assuming that stale versions expired from caches. 
Alternatively, to make sure one bypasses caching, one can have a "no cache" bit in the interest or ask for, say, /path/data-name/random-unique-substring-encoding-a-request-to-the-app. Since the random-unique-substring-encoding-a-request-to-the-app will not match any cached names, the interest will be sent to the provider advertising the prefix, forwarded up to the application (registered on the face exporting the prefix) and invoke an application-layer function that will decode and interpret the unique-substring-encoding-a-request-to-the-app. The function will then answer the request by returning, say, the name of the requested latest version to the client, so it can send a proper interest for the named version. I am not suggesting that the above is necessarily a good idea. I am just saying there can be many different ways to solve the problem that depend on things such as how frequently versions are updated, how big the objects are, etc. Hence, I am arguing for simplicity and generality by making the underlying support as simple as possible. Philosophically speaking, the more "common cases" we think of and the more complex we make the underlying name format, the more assumptions we may be making that time may break later, causing the work to be potentially more short-lived. Tarek On 9/15/2014 4:17 PM, Felix Rabe wrote: > I agree with a need for simplicity. > > But maybe we cannot have simplicity so let's look at some use cases > that might justify the complexity of the conventions: (I will only > discuss versions and timestamps here, as they interest me for my work.) > > Versions: I can give someone a link to a document, and know that they > will see the version I currently see, unmodified, even if there is a > newer version. Of course, any participant can always get a newer version. 
> - But: I think applications should come up with conventions in > versioning, as one application does not need it, another needs > sequential versioning, another (like Git-style source code management) > needs a DAG. > > Timestamps: My measurements are a growing data set of (time, value) > tuples. I can access multiple measurements across time for comparisons. > - But: Also, this needs to be defined by applications, as > time-since-unix-epoch is (I think) unsuitable for a wide range of > applications (archaeology, astronomy), where both range and > granularity requirements are different. Then, there is vector clocks > which work with logical time instead of real time. > - Another aspect is that this could just be a special case of > versioning, using time as a version "number". > > Segments: ... > Sequence: ... > > Just some thoughts > - Felix > > On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: >> Ok, here is a short perspective from a person who does not write >> network code, but rather builds distributed applications: >> >> I feel squeamish about specifying naming conventions at all. I feel >> that the design should favor simplicity and should not burden >> application developers with having to understand what's better for >> the network. Hence, I would argue in favor of names that are just >> series of bits organized into substrings whose semantics are up to >> the application. I would not use any special bits/characters or other >> special conventions. I would also most definitely not embed any >> assumptions on name conventions into network-layer code. >> >> As an application developer, I would like to be able to think about >> name spaces the way I think of UNIX file-name hierarchies. To put it >> differently, I do not want to read a tutorial on UNIX filename design >> guidelines in order to ensure that the UNIX file system does file >> caching, block management, and other functions efficiently for my >> application. 
I want file system plumbing to be hidden from me, the application developer. A good design is one that hides such plumbing from the application without impacting efficiency. The same applies to NDN, in my opinion. A discussion of special delimiters, markers, etc., that enhances the efficiency of certain network functions seems to be going in the opposite direction from what makes NDN general and flexible.
>>
>> Tarek
>>
>>
>> On 9/14/2014 10:39 PM, Tai-Lin Chu wrote:
>>> hi,
>>> Just some questions to know how people feel about it.
>>> 1. Do you like it or not? why?
>>> 2. Does it fit the need of your application?
>>> 3. What might be some possible changes (or even a big redesign) if you are asked to propose a naming convention?
>>> 4. some other thoughts
>>>
>>> Feel free to answer any of the questions.
>>> Thanks
>>>
>>>
>>> [1] http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf
>>> _______________________________________________
>>> Ndn-interest mailing list
>>> Ndn-interest at lists.cs.ucla.edu
>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>
>> _______________________________________________
>> Ndn-interest mailing list
>> Ndn-interest at lists.cs.ucla.edu
>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>

From tailinchu at gmail.com Mon Sep 15 17:06:12 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Mon, 15 Sep 2014 17:06:12 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <5417713F.4090201@illinois.edu>
References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu>
Message-ID:

I think at least a segmentation name component needs to be defined. If we remove selectors, then how do we find the next segment name without an explicit segmentation name component? Segmentation, in my opinion, is more important than other types like versioning.
On Mon, Sep 15, 2014 at 4:07 PM, Abdelzaher, Tarek wrote: > Felix, > At the risk of spamming the list, let me elaborate. What I am arguing for is > that we do not need unified conventions. Different applications will find > solutions that are more appropriate to them. Otherwise, there will be too > many things to add and it is a slippery slope. Assume all we have are name > strings composed of a concatenation of sub-strings that, as far as the > network is concerned, are all of the same type (i.e., no special characters, > markers, delimiters, type values, etc), although from the application's > perspective can have different semantics. > > Hence, for example, if versions are important to my application, I should > have version numbers as part of the name space (e.g., > /path/data-name/version). To get a specific version, one can ask for it by > name (e.g., /path/data-name/version-label). To get the latest, there is an > issue with caching (how long should things be cached before they expire). > Barring that, one can ask for /path/data-name and have the interest be > forwarded to the provider assuming that stale versions expired from caches. > Alternatively, to make sure one bypasses caching, one can have a "no cache" > bit in the interest or ask for, say, > /path/data-name/random-unique-substring-encoding-a-request-to-the-app. Since > the random-unique-substring-encoding-a-request-to-the-app will not match any > cached names, the interest will be sent to the provider advertising the > prefix, forwarded up to the application (registered on the face exporting > the prefix) and invoke an application-layer function that will decode and > interpret the unique-substring-encoding-a-request-to-the-app. The function > will then answer the request by returning, say, the name of the requested > latest version to the client, so it can send a proper interest for the named > version. I am not suggesting that the above is necessarily a good idea. 
I am > just saying there can be many different ways to solve the problem that > depend on things such as how frequently versions are updated, how big the > objects are, etc. Hence, I am arguing for simplicity and generality by > making the underlying support as simple as possible. Philosophically > speaking, the more "common cases" we think of and the more complex we make > the underlying name format, the more assumptions we may be making that time > may break later, causing the work to be potentially more short-lived. > > Tarek > > > > On 9/15/2014 4:17 PM, Felix Rabe wrote: >> >> I agree with a need for simplicity. >> >> But maybe we cannot have simplicity so let's look at some use cases that >> might justify the complexity of the conventions: (I will only discuss >> versions and timestamps here, as they interest me for my work.) >> >> Versions: I can give someone a link to a document, and know that they will >> see the version I currently see, unmodified, even if there is a newer >> version. Of course, any participant can always get a newer version. >> - But: I think applications should come up with conventions in versioning, >> as one application does not need it, another needs sequential versioning, >> another (like Git-style source code management) needs a DAG. >> >> Timestamps: My measurements are a growing data set of (time, value) >> tuples. I can access multiple measurements across time for comparisons. >> - But: Also, this needs to be defined by applications, as >> time-since-unix-epoch is (I think) unsuitable for a wide range of >> applications (archaeology, astronomy), where both range and granularity >> requirements are different. Then, there is vector clocks which work with >> logical time instead of real time. >> - Another aspect is that this could just be a special case of versioning, >> using time as a version "number". >> >> Segments: ... >> Sequence: ... 
>> >> Just some thoughts >> - Felix >> >> On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: >>> >>> Ok, here is a short perspective from a person who does not write network >>> code, but rather builds distributed applications: >>> >>> I feel squeamish about specifying naming conventions at all. I feel that >>> the design should favor simplicity and should not burden application >>> developers with having to understand what's better for the network. Hence, I >>> would argue in favor of names that are just series of bits organized into >>> substrings whose semantics are up to the application. I would not use any >>> special bits/characters or other special conventions. I would also most >>> definitely not embed any assumptions on name conventions into network-layer >>> code. >>> >>> As an application developer, I would like to be able to think about name >>> spaces the way I think of UNIX file-name hierarchies. To put it differently, >>> I do not want to read a tutorial on UNIX filename design guidelines in order >>> to ensure that the UNIX file system does file caching, block management, and >>> other functions efficiently for my application. I want file system plumbing >>> to be hidden from me, the application developer. A good design is one that >>> hides such plumbing from the application without impacting efficiency. Same >>> applies to NDN in my opinion. A discussion of special delimiters, markers, >>> etc, that enhances efficiency of certain network functions seems to be going >>> in the opposite direction from what makes NDN general and flexible. >>> >>> Tarek >>> >>> >>> On 9/14/2014 10:39 PM, Tai-Lin Chu wrote: >>>> >>>> hi, >>>> Just some questions to know how people feel about it. >>>> 1. Do you like it or not? why? >>>> 2. Does it fit the need of your application? >>>> 3. What might be some possible changes (or even a big redesign) if you >>>> are asked to purpose a naming convention? >>>> 4. 
some other thoughts
>>>>
>>>> Feel free to answer any of the questions.
>>>> Thanks
>>>>
>>>>
>>>> [1] http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf
>>>> _______________________________________________
>>>> Ndn-interest mailing list
>>>> Ndn-interest at lists.cs.ucla.edu
>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>>
>>>
>>> _______________________________________________
>>> Ndn-interest mailing list
>>> Ndn-interest at lists.cs.ucla.edu
>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>
>
> _______________________________________________
> Ndn-interest mailing list
> Ndn-interest at lists.cs.ucla.edu
> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From jburke at remap.ucla.edu Mon Sep 15 18:11:53 2014
From: jburke at remap.ucla.edu (Burke, Jeff)
Date: Tue, 16 Sep 2014 01:11:53 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To:
Message-ID:

Nacho,

I guess I'm not following most of these questions - there are some musings (at least) in the document that try to sketch an alternate perspective to typed components, and most of these questions are covered in some way. I should qualify that they just represent my attempt to provide another side to the argument.

Jeff

On 9/15/14, 1:22 PM, "Ignacio.Solis at parc.com" wrote:

>What is this "type explosion" the document is talking about?
>
>Whether it is a type for a component, a command marker, or a marker component, there needs to be some agreement on what the meaning of the type/marker is. If not, then it would just be a regular name component for a single application. So, like Dave says, there needs to be some registry or overall agreement in some form.
>
>Wouldn't there be the same number of types as command markers? So you would basically have "command marker explosion"? Or is there something special about command markers?
>
>What we are discussing here (I hope) is: what is the best way for applications to talk about the specific meaning of components (versions, segments, etc.)? There are multiple ways to do this.
>
>- Typed name components are explicit and unambiguous. They always exist. They can be the Generic type or some special type (like Version). They allow arbitrary binary data in a component.
>- Command markers may or may not exist. This leads to aliasing if you allow arbitrary binary data in a component. If you don't allow arbitrary binary data in a component, then to solve aliasing you need to escape potentially ambiguous data.
>- Command components suffer from the same fate.
>
>If your data structures do not use component types, then you'll have to escape binary data to have a consistent type/marker system.
>
>Can somebody describe the disadvantage they see of using typed name components? Is it the potential requirement to register? Is it the limited type space? Is it the extra bytes?
>
>What is the benefit of command markers over typed name components?
>
>
>Is it possible for applications to follow a convention that doesn't create aliasing? Yes, you could start creating rules that don't allow binary data of some value in some components if followed by some components, etc. So now, any application that wants to put binary data has to check its own data to make sure it doesn't break other apps.
>
>Sometimes it seems to me that people assume that these names will be human-readable most of the time. Where is this assumption coming from? For all we know, everything will be binary.
>
>Taking my convert-to-CCN hat off for a second: I think that it's in your best interest to use component markers; that way, if you keep using prefix matching, at least you can request /foo/bar/v_/* and at least force the network to get you something that is a version.
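The aliasing concern raised above can be made concrete with a small sketch. This is a hypothetical illustration, not any real NDN/CCN scheme: suppose a command-marker convention reserves a leading byte (0xFD here, an invented value) to mean "version" inside an otherwise untyped component. Arbitrary application data can then collide with it:

```python
# Hypothetical command-marker scheme: a component whose first byte is
# VERSION_MARKER is interpreted as a version number. The marker value
# 0xFD is an illustrative assumption, not from any published spec.
VERSION_MARKER = 0xFD

def looks_like_version(component: bytes) -> bool:
    return len(component) > 0 and component[0] == VERSION_MARKER

real_version = bytes([VERSION_MARKER, 0x02])  # intended as "version 2"
binary_blob  = bytes([VERSION_MARKER, 0x02])  # opaque app data, same bytes

# The two components are byte-identical, so a consumer cannot tell them
# apart: this is the aliasing problem. Avoiding it requires either
# escaping binary data or carrying the distinction in a typed component.
assert looks_like_version(real_version) and looks_like_version(binary_blob)
```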
Having said that, I still think you'll end up doing exact matching soon, but in the meantime you might as well take advantage of it.
>
>Nacho
>
>
>On 9/15/14, 12:12 PM, "Burke, Jeff" wrote:
>>
>>Hi all,
>>
>>
>>On 9/15/14, 11:33 AM, "Mark Stapp" wrote:
>>>interesting:
>>>
>>>On 9/15/14 1:41 PM, Tai-Lin Chu wrote:
>>>> I think for naming convention, the community should come up with a solution that people "want" to use.
>>>
>>>I think that we should work toward specifying something that does what we've learned (over several years of work) we need to do, and that does not add any additional pain or unnecessary complexity.
>>
>>So - just to poke at this a bit - who's the "we" here, and whose pain and complexity? Application developers, network architects, equipment mfrs, or ...? (And do they agree?) As you know, NDN takes an application-motivated approach to the development and evaluation of the architecture, and NSF has asked for this explicitly in their most recent RFP. If we could discuss some pros/cons from that perspective - perhaps via quick examples or case studies that have led you to this conclusion - it would be helpful (in my mind) to this current discussion.
>>
>>Attached are some previously shared thoughts on segmenting and versioning conventions related to this discussion, and some arguments against types, after page 2. Though Junxiao's marker type is an interesting new twist, I am so far unconvinced that it addresses all of the concerns in this doc.
>>
>>Note - these were not originally intended for a public discussion, at least not in this form, so be kind. :) I can also imagine that some of the discussion could be dismissed via "drop selectors and add manifests" but would suggest addressing the spirit of the argument in light of the current NDN architecture instead.
>>
>>(Unfortunately I am on the road so not sure I can keep up with the discussion, but wanted to inject some other perspectives.
:)
>>
>>Thanks,
>>Jeff
>>
>>>
>>>> Could you describe the ambiguous case? I think I know what you mean by ambiguous, but I just want to make sure. (I remember someone from Cisco mentioned this at ndncomm 2014.)
>>>>
>>>> To avoid ambiguity, I will start from the name component TLVs and allocate more types for different meanings.
>>>>
>>>> 8 = regular name component
>>>> = segmented name component
>>>>
>>>
>>>yes, this is the approach I favor. typed name components allow us to identify the 'infrastructure' data in name components, while allowing us to set aside some number space for application-defined components. code that just wants to compare names or identify component boundaries needs to be able to treat the components as 'opaque', of course.
>>>
>>>Thanks,
>>>Mark
>>>_______________________________________________
>>>Ndn-interest mailing list
>>>Ndn-interest at lists.cs.ucla.edu
>>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
>>
>
>--
>Nacho (Ignacio) Solis
>Protocol Architect
>Principal Scientist
>Palo Alto Research Center (PARC)
>+1(650)812-4458
>Ignacio.Solis at parc.com
>

From Ignacio.Solis at parc.com Mon Sep 15 19:13:49 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Tue, 16 Sep 2014 02:13:49 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <5417713F.4090201@illinois.edu>
References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu>
Message-ID:

Coordination of the namespace is needed in some way. This can be done directly or indirectly, be it with pure names, name types, markers, or manifests. This is needed because you need a way for the transport protocol to communicate. What is the next item to retrieve? How do you open transport windows? When are you done transmitting? You could have every app re-implement a new transport protocol so it can talk to itself, creating new conventions for every type of communication.
However, most apps will use a library and/or transport layer framework. These inevitably create agreements of some sort. Libraries that become popular create conventions. IP doesn't define the use of port numbers; TCP does. Every app could implement a new transport protocol on top of IP, but instead most use TCP or UDP. The same is true for NDN/CCN.

In the case of CCN, we've decided that the lower "network layer" offers the ability to use typed components. It (the network layer / base protocol) doesn't make any assumptions about the meanings of the types (especially since we use exact matching). We define other protocols (like the chunking protocol or versioning protocol) that use one of the name component types to mean something. Yes, this involves a registry, but it basically means that once this settles as a standard, it is clear to any node or end device what the meaning of this component type is.

Applications are always free to use the data portion of the names in any way they see fit. They can have their own conventions, markers, etc. As things become more and more settled and the community starts adopting them, they eventually turn into standards, potentially requiring a new type.

For implementations of the transport (which I believe NDN calls the library, of which there are 2 implementations?), the existence of these types means we know how to behave. This is especially important when you have multiple applications interacting (say, multiple browsers and multiple servers).

There was already an email on the list talking about how many applications do not care about what the network does. We strongly believe in this. We don't think applications need to be reimplementing reliable delivery, in-order delivery, dynamic window allocation, etc. This is done by the transport stack (in the case of CCN) or the client library (in the case of NDN). The APIs will need to reflect this.
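The typed-component idea described above can be sketched as a simple TLV encoding. This is an illustrative Python sketch: the type codes (GENERIC=1, VERSION=2, SEGMENT=3) and the fixed 2-byte T and L fields are invented for the example and are not values from any published NDN or CCNx specification:

```python
import struct

# Illustrative component type codes (assumptions, not real registry values).
GENERIC, VERSION, SEGMENT = 1, 2, 3

def encode_component(t: int, value: bytes) -> bytes:
    """Encode one name component as fixed-size T (2 bytes), L (2 bytes), V."""
    return struct.pack("!HH", t, len(value)) + value

def encode_name(components) -> bytes:
    """Encode a name from an iterable of (type, value) pairs."""
    return b"".join(encode_component(t, v) for t, v in components)

# /foo, version 42, segment 0 -- the meaning of each component is carried
# by its type code rather than by a marker inside the value.
name = encode_name([(GENERIC, b"foo"),
                    (VERSION, (42).to_bytes(1, "big")),
                    (SEGMENT, (0).to_bytes(1, "big"))])
# len(name) == 17
```

Because the T and L fields here are fixed-size, a plain bytewise comparison of two encoded names stays well-defined, which is the ordering property noted earlier in the thread.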
Please don't confuse the ability of the applications to name the data with the names that the network needs to use to communicate at the interest/content-object level.

Nacho

--
Nacho (Ignacio) Solis
Protocol Architect
Principal Scientist
Palo Alto Research Center (PARC)
+1(650)812-4458
Ignacio.Solis at parc.com

On 9/15/14, 4:07 PM, "Abdelzaher, Tarek" wrote:

>Felix,
>At the risk of spamming the list, let me elaborate. What I am arguing for is that we do not need unified conventions. Different applications will find solutions that are more appropriate to them. Otherwise, there will be too many things to add and it is a slippery slope. Assume all we have are name strings composed of a concatenation of sub-strings that, as far as the network is concerned, are all of the same type (i.e., no special characters, markers, delimiters, type values, etc), although from the application's perspective can have different semantics.
>
>Hence, for example, if versions are important to my application, I should have version numbers as part of the name space (e.g., /path/data-name/version). To get a specific version, one can ask for it by name (e.g., /path/data-name/version-label). To get the latest, there is an issue with caching (how long should things be cached before they expire). Barring that, one can ask for /path/data-name and have the interest be forwarded to the provider assuming that stale versions expired from caches.
>Alternatively, to make sure one bypasses caching, one can have a "no cache" bit in the interest or ask for, say, /path/data-name/random-unique-substring-encoding-a-request-to-the-app.
>Since the random-unique-substring-encoding-a-request-to-the-app will not >match any cached names, the interest will be sent to the provider >advertising the prefix, forwarded up to the application (registered on >the face exporting the prefix) and invoke an application-layer function >that will decode and interpret the >unique-substring-encoding-a-request-to-the-app. The function will then >answer the request by returning, say, the name of the requested latest >version to the client, so it can send a proper interest for the named >version. I am not suggesting that the above is necessarily a good idea. >I am just saying there can be many different ways to solve the problem >that depend on things such as how frequently versions are updated, how >big the objects are, etc. Hence, I am arguing for simplicity and >generality by making the underlying support as simple as possible. >Philosophically speaking, the more "common cases" we think of and the >more complex we make the underlying name format, the more assumptions we >may be making that time may break later, causing the work to be >potentially more short-lived. > >Tarek > > >On 9/15/2014 4:17 PM, Felix Rabe wrote: >> I agree with a need for simplicity. >> >> But maybe we cannot have simplicity so let's look at some use cases >> that might justify the complexity of the conventions: (I will only >> discuss versions and timestamps here, as they interest me for my work.) >> >> Versions: I can give someone a link to a document, and know that they >> will see the version I currently see, unmodified, even if there is a >> newer version. Of course, any participant can always get a newer >>version. >> - But: I think applications should come up with conventions in >> versioning, as one application does not need it, another needs >> sequential versioning, another (like Git-style source code management) >> needs a DAG. >> >> Timestamps: My measurements are a growing data set of (time, value) >> tuples. 
I can access multiple measurements across time for comparisons. >> - But: Also, this needs to be defined by applications, as >> time-since-unix-epoch is (I think) unsuitable for a wide range of >> applications (archaeology, astronomy), where both range and >> granularity requirements are different. Then, there are vector clocks, >> which work with logical time instead of real time. >> - Another aspect is that this could just be a special case of >> versioning, using time as a version "number". >> >> Segments: ... >> Sequence: ... >> >> Just some thoughts >> - Felix >> >> On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: >>> Ok, here is a short perspective from a person who does not write >>> network code, but rather builds distributed applications: >>> >>> I feel squeamish about specifying naming conventions at all. I feel >>> that the design should favor simplicity and should not burden >>> application developers with having to understand what's better for >>> the network. Hence, I would argue in favor of names that are just >>> series of bits organized into substrings whose semantics are up to >>> the application. I would not use any special bits/characters or other >>> special conventions. I would also most definitely not embed any >>> assumptions on name conventions into network-layer code. >>> >>> As an application developer, I would like to be able to think about >>> name spaces the way I think of UNIX file-name hierarchies. To put it >>> differently, I do not want to read a tutorial on UNIX filename design >>> guidelines in order to ensure that the UNIX file system does file >>> caching, block management, and other functions efficiently for my >>> application. I want file system plumbing to be hidden from me, the >>> application developer. A good design is one that hides such plumbing >>> from the application without impacting efficiency. Same applies to >>> NDN in my opinion.
A discussion of special delimiters, markers, etc, >>> that enhances efficiency of certain network functions seems to be >>> going in the opposite direction from what makes NDN general and >>> flexible. >>> >>> Tarek >>> >>> >>> On 9/14/2014 10:39 PM, Tai-Lin Chu wrote: >>>> hi, >>>> Just some questions to know how people feel about it. >>>> 1. Do you like it or not? why? >>>> 2. Does it fit the need of your application? >>>> 3. What might be some possible changes (or even a big redesign) if you >>>> are asked to propose a naming convention? >>>> 4. some other thoughts >>>> >>>> Feel free to answer any of the questions. >>>> Thanks >>>> >>>> >>>> [1] >>>> >>>>http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-nam >>>>ing-conventions.pdf >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Mon Sep 15 19:45:59 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Tue, 16 Sep 2014 02:45:59 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: Just as a clarification to the document, the current CCNx (1.0) uses typed components, we abandoned command markers. I read the document more thoroughly. According to page 8, the disadvantages to (#1) typed name components are: - "type explosion" (still unsure where that is coming from), - defining a few components, - defining a way to add more, - define a URI representation, - defining what to do with unknown types.
Aren't all of these a problem with all options? The disadvantages to (#2) extra components: - aliasing (components can be ambiguous) The disadvantages to (#3) command markers: - aliasing (components can be ambiguous) Isn't this a good case for typed components? To get rid of aliasing you're basically going to require extra components or markers before every other component, effectively using types. Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/15/14, 6:11 PM, "Burke, Jeff" wrote: >Nacho, >I guess I'm not following most of these questions - there are some musings >(at least) in the document that try to sketch an alternate perspective to >typed components, and most of these questions are covered in some way. I >should qualify that they just represent my attempt to provide another side >to the argument. >Jeff > >On 9/15/14, 1:22 PM, "Ignacio.Solis at parc.com" >wrote: > >>What is this "type explosion" the document is talking about? >> >>Whether it is a type for a component, a command marker, or a marker >>component, there needs to be some agreement on what the meaning of >>type/marker is. If not, then it would just be a regular name component >>for a single application. So, like Dave says, there needs to be some >>registry or overall agreement in some form. >> >>Wouldn't there be the same number of types as command markers? So you >>would basically have "command marker explosion"? Or is there something >>special about command markers? >> >> >>What we are discussing here (I hope) is, what is the best way for >>applications to talk about specific meaning of components. (Versions, >>Segments, etc). There are multiple ways to do this. >> >>- Typed named components are explicit and unambiguous. They always exist. >>They can be the Generic type or some special type (like Version). They >>allow arbitrary binary data in a component. >>- Command markers may or may not exist.
This leads to aliasing if you >>allow arbitrary binary data in a component. If you don't allow arbitrary >>binary data in a component, then to solve aliasing you need to escape >>potentially ambiguous data. >>- Command components suffer from the same fate. >> >>If your data structures do not use component types, then you'll have to >>escape binary data to have a consistent type/marker system. >> >>Can somebody describe the disadvantage they see of using typed named >>components? Is it the potential requirement to register? Is it the >>limited type space? Is it the extra bytes? >> >>What is the benefit of command markers over typed named components? >> >> >> >> >>Is it possible for applications to follow a convention that doesn't >>create >>aliasing? Yes, you could start creating rules that don't allow binary >>data >>of some value in some components if followed by some components, etc. So >>now, any application that wants to put binary data has to check its own >>data to make sure it doesn't break other apps. >> >>Sometimes it seems to me that people assume that these names will be >>human >>readable most of the time. Where is this assumption coming from? For all >>we know everything will be binary. >> >>Taking my convert-to-CCN hat off for a second I think that it's in your >>best interest to use component markers, that way if you keep using prefix >>matching at least you can request /foo/bar/v_/* and at least force the >>network to get you something that is a version. Having said that, I still >>think you'll end up doing exact matching soon, but in the meantime might >>as well take advantage of it. >> >>Nacho >> >> >> >> >>On 9/15/14, 12:12 PM, "Burke, Jeff" wrote: >> >>> >>>Hi all, >>> >>> >>>On 9/15/14, 11:33 AM, "Mark Stapp" wrote: >>> >>>>interesting: >>>> >>>>On 9/15/14 1:41 PM, Tai-Lin Chu wrote: >>>>> I think for naming convention, the community should come up with a >>>>> solution that people "want" to use.
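The aliasing concern Nacho describes can be made concrete with a small sketch. This is illustrative Python, not the NDN or CCNx wire format; the marker byte, type codes, and `(type, value)` component representation are all assumptions invented for the example.

```python
# Sketch: why marker-in-value schemes alias, and why typed components don't.
# All encodings here are hypothetical, not NDN/CCNx wire formats.

VERSION_MARKER = b"\xfd"  # assumed marker byte placed inside a component value

def marker_component(version: int) -> bytes:
    # Marker scheme: the semantics live inside the component's own bytes.
    return VERSION_MARKER + version.to_bytes(4, "big")

def is_marker_version(component: bytes) -> bool:
    return component.startswith(VERSION_MARKER)

# Application data that innocently begins with 0xfd aliases a version:
app_data = b"\xfd\x00\x00\x00\x07"
assert is_marker_version(marker_component(7))
assert is_marker_version(app_data)  # false positive: this is the aliasing

# Typed scheme: the type travels out-of-band as a (type, value) pair,
# so arbitrary binary values can never be mistaken for a version.
TYPE_GENERIC, TYPE_VERSION = 8, 9  # assumed type codes

def typed_component(t: int, value: bytes) -> tuple:
    return (t, value)

assert typed_component(TYPE_VERSION, (7).to_bytes(4, "big"))[0] == TYPE_VERSION
assert typed_component(TYPE_GENERIC, app_data)[0] != TYPE_VERSION  # no aliasing
```

Escaping the marker byte inside values would also fix the collision, at the cost Nacho mentions: every application must inspect its own binary data before publishing.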
>>>> >>>>I think that we should work toward specifying something that does what >>>>we've learned (over several years of work) we need to do, and that does >>>>not add any additional pain or unnecessary complexity. >>> >>>So - just to poke at this a bit - who's the "we" here, and whose pain >>>and >>>complexity? Application developers, network architects, equipment mfrs, >>>or ? (And do they agree?) As you know, NDN takes an >>>application-motivated >>>approach to the development and evaluation of the architecture, and NSF >>>has asked for this explicitly in their most recent RFP. If we could >>>discuss some pros/cons from that perspective - perhaps via quick >>>examples >>>or case studies that have led you to this conclusion, it would be >>>helpful >>>(in my mind) to this current discussion. >>> >>>Attached are some previously shared thoughts on segmenting and >>>versioning >>>conventions related to this discussion, and some arguments against >>>types, >>>after page 2. Though Junxiao's marker type is an interesting new twist, >>>I >>>am so far unconvinced that it addresses all of the concerns in this doc. >>> >>>Note - these were not originally intended for a public discussion, at >>>least not in this form, so be kind. :) I can also imagine that some of >>>the >>>discussion could be dismissed via "drop selectors and add manifests" but >>>would suggest addressing the spirit of argument in light of the current >>>NDN architecture instead. >>> >>>(Unfortunately I am on the road so not sure I can keep up with the >>>discussion, but wanted to inject some other perspectives. :) >>> >>>Thanks, >>>Jeff >>> >>>> >>>>> >>>>> Could you describe the ambiguous case? I think I know what you mean >>>>>by >>>>> ambiguous, but I just want to make sure. (I remembered someone from >>>>> cisco mentioned this in ndncomm 2014.) >>>>> >>>>> To avoid ambiguity, I will start from name component tlvs and >>>>>allocate >>>>> more types for different meanings. 
>>>>> >>>>> 8 = regular name component >>>>> = segmented name component >>>>> >>>> >>>>yes, this is the approach I favor. typed name components allow us to >>>>identify the 'infrastructure' data in name components, while allowing us >>>>to set aside some number space for application-defined components. code >>>>that just wants to compare names or identify component boundaries needs >>>>to be able to treat the components as 'opaque', of course. >>>> >>>>Thanks, >>>>Mark >>>>_______________________________________________ >>>>Ndn-interest mailing list >>>>Ndn-interest at lists.cs.ucla.edu >>>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> >>-- >>Nacho (Ignacio) Solis >>Protocol Architect >>Principal Scientist >>Palo Alto Research Center (PARC) >>+1(650)812-4458 >>Ignacio.Solis at parc.com >> > > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From tailinchu at gmail.com Mon Sep 15 20:37:39 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Mon, 15 Sep 2014 20:37:39 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: First, I am all for typed name components at this moment, but I am still open to better solutions if any exist. Type explosion might mean that (I guessed based on the content in the attachment): 1. we give away all the types reserved to developers in name components, and there is no way to take them back. 2. ...and we soon run out of types that the architecture can define in the future. But what we really need is maybe just a handful of extra types for components (maybe first for segments). We can define more as needed, so no explosion will happen. Even if this is the case, ccnx will run out of types (2^16 types) much faster than ndn (2^64 types).
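The type-space comparison Tai-Lin makes can be sketched as follows. It assumes CCNx 1.0 carries TLV types in a fixed 2-byte field (2^16 values) while NDN-TLV encodes types as a variable-length number that can grow to 8 bytes (2^64 values); treat this as an illustration of the arithmetic, not a wire-accurate encoder.

```python
# Sketch of the two TLV type spaces being compared: a fixed 16-bit type
# field vs. a variable-length number extending to 64 bits. Illustrative
# only; consult the actual CCNx/NDN packet specs for wire formats.

def ccnx_type(t: int) -> bytes:
    # Fixed 2-byte type field: the whole space is 2^16 values.
    return t.to_bytes(2, "big")

def ndn_var_number(t: int) -> bytes:
    # Variable-length number: 1 byte for small values, with escape
    # bytes introducing 2-, 4-, or 8-byte encodings for larger ones.
    if t < 253:
        return t.to_bytes(1, "big")
    if t <= 0xFFFF:
        return b"\xfd" + t.to_bytes(2, "big")
    if t <= 0xFFFFFFFF:
        return b"\xfe" + t.to_bytes(4, "big")
    return b"\xff" + t.to_bytes(8, "big")

assert len(ccnx_type(8)) == 2
assert ndn_var_number(8) == b"\x08"     # common types stay compact
assert len(ndn_var_number(2**40)) == 9  # but the space extends to 2^64
```

The point of the comparison: the variable-length scheme keeps common types as small as a marker byte would be, while leaving room to "define more as needed".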
On Mon, Sep 15, 2014 at 7:45 PM, wrote: > Just as a clarification to the document, the current CCNx (1.0) uses typed > components, we abandoned command markers. > > I read the document more thoroughly. According to page 8, the disadvantages > to (#1) typed name components are: > - "type explosion" (still unsure where that is coming from), > - defining a few components, > - defining a way to add more, > - define a URI representation, > - defining what to do with unknown types. > > Aren't all of these a problem with all options? > > The disadvantages to (#2) extra components: > - aliasing (components can be ambiguous) > > The disadvantages to (#3) command markers: > - aliasing (components can be ambiguous) > > > Isn't this a good case for typed components? > > To get rid of aliasing you're basically going to require extra components > or markers before every other component, effectively using types. > > > Nacho > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/15/14, 6:11 PM, "Burke, Jeff" wrote: > >>Nacho, >>I guess I'm not following most of these questions - there are some musings >>(at least) in the document that try to sketch an alternate perspective to >>typed components, and most of these questions are covered in some way. I >>should qualify that they just represent my attempt to provide another side >>to the argument. >>Jeff >> >>On 9/15/14, 1:22 PM, "Ignacio.Solis at parc.com" >>wrote: >> >>>What is this "type explosion" the document is talking about? >>> >>>Whether it is a type for a component, a command marker, or a marker >>>component, there needs to be some agreement on what the meaning of >>>type/marker is. If not, then it would just be a regular name component >>>for a single application. So, like Dave says, there needs to be some >>>registry or overall agreement in some form.
>>> >>>Wouldn't there be the same number of types as command markers? So you >>>would basically have "command marker explosion"? Or is there something >>>special about command markers? >>> >>> >>>What we are discussing here (I hope) is, what is the best way for >>>applications to talk about specific meaning of components. (Versions, >>>Segments, etc). There are multiple ways to do this. >>> >>>- Typed named components are explicit and unambiguous. They always exist. >>>They can be the Generic type or some special type (like Version). They >>>allow arbitrary binary data in a component. >>>- Command markers may or may not exist. This leads to aliasing if you >>>allow arbitrary binary data in a component. If you don't allow arbitrary >>>binary data in a component, then to solve aliasing you need to escape >>>potentially ambiguous data. >>>- Command components suffer from the same fate. >>> >>>If your data structures do not use component types, then you'll have to >>>escape binary data to have a consistent type/marker system. >>> >>>Can somebody describe the disadvantage they see of using typed named >>>components? Is it the potential requirement to register? Is it the >>>limited type space? Is it the extra bytes? >>> >>>What is the benefit of command markers over typed named components? >>> >>> >>> >>> >>>Is it possible for applications to follow a convention that doesn't >>>create >>>aliasing? Yes, you could start creating rules that don't allow binary >>>data >>>of some value in some components if followed by some components, etc. So >>>now, any application that wants to put binary data has to check its own >>>data to make sure it doesn't break other apps. >>> >>>Sometimes it seems to me that people assume that these names will be >>>human >>>readable most of the time. Where is this assumption coming from? For all >>>we know everything will be binary.
>>> >>>Taking my convert-to-CCN hat off for a second I think that it's in your >>>best interest to use component markers, that way if you keep using prefix >>>matching at least you can request /foo/bar/v_/* and at least force the >>>network to get you something that is a version. Having said that, I still >>>think you'll end up doing exact matching soon, but in the meantime might >>>as well take advantage of it. >>> >>>Nacho >>> >>> >>> >>> >>>On 9/15/14, 12:12 PM, "Burke, Jeff" wrote: >>> >>>> >>>>Hi all, >>>> >>>> >>>>On 9/15/14, 11:33 AM, "Mark Stapp" wrote: >>>> >>>>>interesting: >>>>> >>>>>On 9/15/14 1:41 PM, Tai-Lin Chu wrote: >>>>>> I think for naming convention, the community should come up with a >>>>>> solution that people "want" to use. >>>>> >>>>>I think that we should work toward specifying something that does what >>>>>we've learned (over several years of work) we need to do, and that does >>>>>not add any additional pain or unnecessary complexity. >>>> >>>>So - just to poke at this a bit - who's the "we" here, and whose pain >>>>and >>>>complexity? Application developers, network architects, equipment mfrs, >>>>or ? (And do they agree?) As you know, NDN takes an >>>>application-motivated >>>>approach to the development and evaluation of the architecture, and NSF >>>>has asked for this explicitly in their most recent RFP. If we could >>>>discuss some pros/cons from that perspective - perhaps via quick >>>>examples >>>>or case studies that have led you to this conclusion, it would be >>>>helpful >>>>(in my mind) to this current discussion. >>>> >>>>Attached are some previously shared thoughts on segmenting and >>>>versioning >>>>conventions related to this discussion, and some arguments against >>>>types, >>>>after page 2. Though Junxiao's marker type is an interesting new twist, >>>>I >>>>am so far unconvinced that it addresses all of the concerns in this doc.
>>>> >>>>Note - these were not originally intended for a public discussion, at >>>>least not in this form, so be kind. :) I can also imagine that some of >>>>the >>>>discussion could be dismissed via "drop selectors and add manifests" but >>>>would suggest addressing the spirit of argument in light of the current >>>>NDN architecture instead. >>>> >>>>(Unfortunately I am on the road so not sure I can keep up with the >>>>discussion, but wanted to inject some other perspectives. :) >>>> >>>>Thanks, >>>>Jeff >>>> >>>>> >>>>>> >>>>>> Could you describe the ambiguous case? I think I know what you mean >>>>>>by >>>>>> ambiguous, but I just want to make sure. (I remembered someone from >>>>>> cisco mentioned this in ndncomm 2014.) >>>>>> >>>>>> To avoid ambiguity, I will start from name component tlvs and >>>>>>allocate >>>>>> more types for different meanings. >>>>>> >>>>>> 8 = regular name component >>>>>> = segmented name component >>>>>> >>>>> >>>>>yes, this is the approach I favor. typed name components allow us to >>>>>identify the 'infrastucture' data in name components, while allowing us >>>>>to set aside some number space for application-defined components. code >>>>>that just wants to compare names or identify component boundaries needs >>>>>to be able to treat the components as 'opaque', of course. 
>>>>> >>>>>Thanks, >>>>>Mark >>>>>_______________________________________________ >>>>>Ndn-interest mailing list >>>>>Ndn-interest at lists.cs.ucla.edu >>>>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>> >>>-- >>>Nacho (Ignacio) Solis >>>Protocol Architect >>>Principal Scientist >>>Palo Alto Research Center (PARC) >>>+1(650)812-4458 >>>Ignacio.Solis at parc.com >>> >> >> >>_______________________________________________ >>Ndn-interest mailing list >>Ndn-interest at lists.cs.ucla.edu >>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From zaher at illinois.EDU Mon Sep 15 20:11:24 2014 From: zaher at illinois.EDU (Abdelzaher, Tarek) Date: Mon, 15 Sep 2014 22:11:24 -0500 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> Message-ID: <5417AA5C.8020906@illinois.edu> Yes, naming conventions make sense in the context of a transport layer protocol. This to me is below the application layer, however, so I was answering from the perspective of a higher layer. From the perspective of transport layer design, we absolutely need conventions that only the transport protocol is aware of. In principle, there may be multiple transport layer protocols using different conventions optimized for the need of different types of applications. In the transport protocol that we are currently designing at UIUC, we assumed that all the network knows is just strings with no assumptions on structure beyond the fact that longer prefixes mean a better match (i.e., a hierarchical structure). On top of that, the transport layer uses a few naming conventions, but that's a long discussion, probably too long for email. In general, this is a topic of interest for me.
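Tarek's assumption that "longer prefixes mean a better match" is essentially a longest-prefix-match lookup over hierarchical names. A minimal sketch, with a hypothetical FIB and face labels (none of this is from an actual forwarder implementation):

```python
# Sketch: longest-prefix match over hierarchical names, as a network that
# "knows only strings" might forward interests. FIB contents are made up.

def components(name: str) -> list:
    return [c for c in name.split("/") if c]

def longest_prefix_match(fib: dict, name: str):
    """Return the FIB entry for the longest registered prefix of `name`."""
    comps = components(name)
    for i in range(len(comps), 0, -1):  # try longest prefix first
        prefix = "/" + "/".join(comps[:i])
        if prefix in fib:
            return fib[prefix]
    return fib.get("/")  # default route, if any

fib = {"/path": "face1", "/path/data-name": "face2"}
assert longest_prefix_match(fib, "/path/data-name/version") == "face2"
assert longest_prefix_match(fib, "/path/other") == "face1"
```

Nothing in this lookup inspects component semantics, which is the separation Tarek argues for: versioning and similar conventions live above it, in the transport or the application.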
We are a little isolated here at UIUC, but I'd be happy to get involved in any telecons or other (higher bandwidth) discussions on how names can be exploited by the transport protocol for different application purposes. Our specific interest is, let us say, "big data" applications; applications that rely primarily on sampling larger data sets for various purposes (because the totality of data is overwhelming). Sorry for the digression from the naming discussion. It appears to me that if we are discussing transport layer design in the naming conventions document, it may also be good to preface it with the types of applications the particular transport is designed for. Real-time streaming applications, for example, may need different conventions from offline data retrieval applications, and applications that need reliability may favor different conventions from those who don't... at least in principle. There is no reason to believe that the same set of conventions will optimally suit all applications. This is another reason, btw, why the network layer should remain completely independent of any naming assumptions and conventions made at the transport layer - the simpler the better. Tarek On 9/15/2014 9:13 PM, Ignacio.Solis at parc.com wrote: > Coordination of the namespace is needed in some way. This can be done > directly or indirectly be it pure names, name types, markers or manifests. > This is needed because you need a way for the transport protocol to > communicate. What is the next item to retrieve? How do you open transport > windows? When are you done transmitting? > > You could have every app re-implement a new transport protocol so it can > talk to itself; creating new conventions for every type of communication. > However, most apps will use a library and/or transport layer framework. > These are inevitably creating agreements of some sort. Libraries that > become popular create conventions. > > IP doesn't define the use of port numbers, TCP does.
Every app could > implement a new transport protocol on top of IP, but instead, most use TCP > or UDP. The same is true for NDN/CCN. In the case of CCN, we've decided > that the lower "network layer" offers the ability to use typed components. > It (the network layer / base protocol), doesn't make any assumptions > about the meanings of the types (especially since we use exact matching). > We define other protocols (like the chunking protocol or versioning > protocol) that use one of the name component types to mean something. > Yes, this involves a registry, but it basically means that once this > settles as a standard, it is clear for any node or end device to know what > the meaning is of this component type. > > Applications are always free to use the data portion of the names in any > way they see fit. They can have their own conventions, markers, etc. As > things become more and more settled and the community starts adopting > them, they eventually convert into standards, potentially requiring a new > type. > > For implementations of the transport (which I believe NDN calls this the > library, of which there are 2 implementations?), the existence of these > types means we know how to behave. This is especially important when you > have multiple applications interacting (say, multiple browsers and > multiple servers). > > > There was already an email on the list talking about how many applications > do not care about what the network does. We strongly believe in this. We > don't think applications need to be reimplementing reliable delivery, > in-order delivery, dynamic window allocation, etc. This is done by the > transport stack (in the case of CCN) or the client library (in the case of > NDN). The APIs will need to reflect this. > > Please don't confuse the ability of the applications to name the data with > the names that the network needs to use to communicate at the > interest/content-object level.
> > > Nacho > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/15/14, 4:07 PM, "Abdelzaher, Tarek" wrote: > >> Felix, >> At the risk of spamming the list, let me elaborate. What I am arguing >> for is that we do not need unified conventions. Different applications >> will find solutions that are more appropriate to them. Otherwise, there >> will be too many things to add and it is a slippery slope. Assume all we >> have are name strings composed of a concatenation of sub-strings that, >> as far as the network is concerned, are all of the same type (i.e., no >> special characters, markers, delimiters, type values, etc), although > >from the application's perspective can have different semantics. >> Hence, for example, if versions are important to my application, I >> should have version numbers as part of the name space (e.g., >> /path/data-name/version). To get a specific version, one can ask for it >> by name (e.g., /path/data-name/version-label). To get the latest, there >> is an issue with caching (how long should things be cached before they >> expire). Barring that, one can ask for /path/data-name and have the >> interest be forwarded to the provider assuming that stale versions >> expired from caches. Alternatively, to make sure one bypasses caching, >> one can have a "no cache" bit in the interest or ask for, say, >> /path/data-name/random-unique-substring-encoding-a-request-to-the-app. >> Since the random-unique-substring-encoding-a-request-to-the-app will not >> match any cached names, the interest will be sent to the provider >> advertising the prefix, forwarded up to the application (registered on >> the face exporting the prefix) and invoke an application-layer function >> that will decode and interpret the >> unique-substring-encoding-a-request-to-the-app. 
The function will then >> answer the request by returning, say, the name of the requested latest >> version to the client, so it can send a proper interest for the named >> version. I am not suggesting that the above is necessarily a good idea. >> I am just saying there can be many different ways to solve the problem >> that depend on things such as how frequently versions are updated, how >> big the objects are, etc. Hence, I am arguing for simplicity and >> generality by making the underlying support as simple as possible. >> Philosophically speaking, the more "common cases" we think of and the >> more complex we make the underlying name format, the more assumptions we >> may be making that time may break later, causing the work to be >> potentially more short-lived. >> >> Tarek >> >> >> On 9/15/2014 4:17 PM, Felix Rabe wrote: >>> I agree with a need for simplicity. >>> >>> But maybe we cannot have simplicity so let's look at some use cases >>> that might justify the complexity of the conventions: (I will only >>> discuss versions and timestamps here, as they interest me for my work.) >>> >>> Versions: I can give someone a link to a document, and know that they >>> will see the version I currently see, unmodified, even if there is a >>> newer version. Of course, any participant can always get a newer >>> version. >>> - But: I think applications should come up with conventions in >>> versioning, as one application does not need it, another needs >>> sequential versioning, another (like Git-style source code management) >>> needs a DAG. >>> >>> Timestamps: My measurements are a growing data set of (time, value) >>> tuples. I can access multiple measurements across time for comparisons. >>> - But: Also, this needs to be defined by applications, as >>> time-since-unix-epoch is (I think) unsuitable for a wide range of >>> applications (archaeology, astronomy), where both range and >>> granularity requirements are different. 
Then, there are vector clocks, >>> which work with logical time instead of real time. >>> - Another aspect is that this could just be a special case of >>> versioning, using time as a version "number". >>> >>> Segments: ... >>> Sequence: ... >>> >>> Just some thoughts >>> - Felix >>> >>> On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: >>>> Ok, here is a short perspective from a person who does not write >>>> network code, but rather builds distributed applications: >>>> >>>> I feel squeamish about specifying naming conventions at all. I feel >>>> that the design should favor simplicity and should not burden >>>> application developers with having to understand what's better for >>>> the network. Hence, I would argue in favor of names that are just >>>> series of bits organized into substrings whose semantics are up to >>>> the application. I would not use any special bits/characters or other >>>> special conventions. I would also most definitely not embed any >>>> assumptions on name conventions into network-layer code. >>>> >>>> As an application developer, I would like to be able to think about >>>> name spaces the way I think of UNIX file-name hierarchies. To put it >>>> differently, I do not want to read a tutorial on UNIX filename design >>>> guidelines in order to ensure that the UNIX file system does file >>>> caching, block management, and other functions efficiently for my >>>> application. I want file system plumbing to be hidden from me, the >>>> application developer. A good design is one that hides such plumbing >>>> from the application without impacting efficiency. Same applies to >>>> NDN in my opinion. A discussion of special delimiters, markers, etc, >>>> that enhances efficiency of certain network functions seems to be >>>> going in the opposite direction from what makes NDN general and >>>> flexible. >>>> >>>> Tarek >>>> >>>> >>>> On 9/14/2014 10:39 PM, Tai-Lin Chu wrote: >>>>> hi, >>>>> Just some questions to know how people feel about it. >>>>> 1.
Do you like it or not? why? >>>>> 2. Does it fit the need of your application? >>>>> 3. What might be some possible changes (or even a big redesign) if you >>>>> are asked to purpose a naming convention? >>>>> 4. some other thoughts >>>>> >>>>> Feel free to answer any of the questions. >>>>> Thanks >>>>> >>>>> >>>>> [1] >>>>> >>>>> http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-nam >>>>> ing-conventions.pdf >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From zaher at illinois.EDU Mon Sep 15 20:11:24 2014 From: zaher at illinois.EDU (Abdelzaher, Tarek) Date: Mon, 15 Sep 2014 22:11:24 -0500 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> Message-ID: <5417AA5C.8020906@illinois.edu> Yes, naming conventions make sense in the context of a transport layer protocol. This to me is below the application layer, however, so I was answering from the perspective of a higher layer. From the perspective of transport layer design, we absolutely need conventions that only the transport protocol is aware of. In principle, there may be multiple transport layer protocols using different conventions optimized for the needs of different types of applications.
In the transport protocol that we are currently designing at UIUC, we assumed that all the network knows is just strings with no assumptions on structure beyond the fact that longer prefixes mean a better match (i.e., a hierarchical structure). On top of that, the transport layer uses a few naming conventions, but that's a long discussion, probably too long for email. In general, this is a topic of interest for me. We are a little isolated here at UIUC, but I'd be happy to get involved in any telecons or other (higher bandwidth) discussions on how names can be exploited by the transport protocol for different application purposes. Our specific interest is, let us say, "big data" applications; applications that rely primarily on sampling larger data sets for various purposes (because the totality of data is overwhelming). Sorry for the digression from the naming discussion. It appears to me that if we are discussing transport layer design in the naming conventions document, it may also be good to preface it with the types of applications the particular transport is designed for. Real-time streaming applications, for example, may need different conventions from offline data retrieval applications, and applications that need reliability may favor different conventions from those that don't... at least in principle. There is no reason to believe that the same set of conventions will optimally suit all applications. This is another reason, btw, why the network layer should remain completely independent of any naming assumptions and conventions made at the transport layer - the simpler the better. Tarek On 9/15/2014 9:13 PM, Ignacio.Solis at parc.com wrote: > Coordination of the namespace is needed in some way. This can be done > directly or indirectly be it pure names, name types, markers or manifests. > This is needed because you need a way for the transport protocol to > communicate. What is the next item to retrieve? How do you open transport
windows? When are you done transmitting? > > You could have every app re-implement a new transport protocol so it can > talk to itself; creating new conventions for every type of communication. > However, most apps will use a library and/or transport layer framework. > These are inevitably creating agreements of some sort. Libraries that > become popular create conventions. > > IP doesn't define the use of port numbers, TCP does. Every app could > implement a new transport protocol on top of IP, but instead, most use TCP > or UDP. The same is true for NDN/CCN. In the case of CCN, we've decided > that the lower "network layer" offers the ability to use typed components. > It (the network layer / base protocol) doesn't make any assumptions > about the meanings of the types (especially since we use exact matching). > We define other protocols (like the chunking protocol or versioning > protocol) that use one of the name component types to mean something. > Yes, this involves a registry, but it basically means that once this > settles as a standard, it is clear for any node or end device to know what > the meaning is of this component type. > > Applications are always free to use the data portion of the names in any > way they see fit. They can have their own conventions, markers, etc. As > things become more and more settled and the community starts adopting > them, they eventually convert into standards, potentially requiring a new > type. > > For implementations of the transport (which I believe NDN calls the > library, of which there are 2 implementations?), the existence of these > types means we know how to behave. This is especially important when you > have multiple applications interacting (say, multiple browsers and > multiple servers). > > > There was already an email on the list talking about how many applications > do not care about what the network does. We strongly believe in this.
We > don't think applications need to be reimplementing reliable delivery, > in-order delivery, dynamic window allocation, etc. This is done by the > transport stack (in the case of CCN) or the client library (in the case of > NDN). The APIs will need to reflect this. > > Please don't confuse the ability of the applications to name the data with > the names that the network needs to use to communicate at the > interest/content-object level. > > > Nacho > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/15/14, 4:07 PM, "Abdelzaher, Tarek" wrote: > >> Felix, >> At the risk of spamming the list, let me elaborate. What I am arguing >> for is that we do not need unified conventions. Different applications >> will find solutions that are more appropriate to them. Otherwise, there >> will be too many things to add and it is a slippery slope. Assume all we >> have are name strings composed of a concatenation of sub-strings that, >> as far as the network is concerned, are all of the same type (i.e., no >> special characters, markers, delimiters, type values, etc), although >> from the application's perspective can have different semantics. >> Hence, for example, if versions are important to my application, I >> should have version numbers as part of the name space (e.g., >> /path/data-name/version). To get a specific version, one can ask for it >> by name (e.g., /path/data-name/version-label). To get the latest, there >> is an issue with caching (how long should things be cached before they >> expire). Barring that, one can ask for /path/data-name and have the >> interest be forwarded to the provider assuming that stale versions >> expired from caches. Alternatively, to make sure one bypasses caching, >> one can have a "no cache" bit in the interest or ask for, say, >> /path/data-name/random-unique-substring-encoding-a-request-to-the-app.
>> Since the random-unique-substring-encoding-a-request-to-the-app will not >> match any cached names, the interest will be sent to the provider >> advertising the prefix, forwarded up to the application (registered on >> the face exporting the prefix) and invoke an application-layer function >> that will decode and interpret the >> unique-substring-encoding-a-request-to-the-app. The function will then >> answer the request by returning, say, the name of the requested latest >> version to the client, so it can send a proper interest for the named >> version. I am not suggesting that the above is necessarily a good idea. >> I am just saying there can be many different ways to solve the problem >> that depend on things such as how frequently versions are updated, how >> big the objects are, etc. Hence, I am arguing for simplicity and >> generality by making the underlying support as simple as possible. >> Philosophically speaking, the more "common cases" we think of and the >> more complex we make the underlying name format, the more assumptions we >> may be making that time may break later, causing the work to be >> potentially more short-lived. >> >> Tarek >> >> >> On 9/15/2014 4:17 PM, Felix Rabe wrote: >>> I agree with a need for simplicity. >>> >>> But maybe we cannot have simplicity so let's look at some use cases >>> that might justify the complexity of the conventions: (I will only >>> discuss versions and timestamps here, as they interest me for my work.) >>> >>> Versions: I can give someone a link to a document, and know that they >>> will see the version I currently see, unmodified, even if there is a >>> newer version. Of course, any participant can always get a newer >>> version. >>> - But: I think applications should come up with conventions in >>> versioning, as one application does not need it, another needs >>> sequential versioning, another (like Git-style source code management) >>> needs a DAG. 
>>> >>> Timestamps: My measurements are a growing data set of (time, value) >>> tuples. I can access multiple measurements across time for comparisons. >>> - But: Also, this needs to be defined by applications, as >>> time-since-unix-epoch is (I think) unsuitable for a wide range of >>> applications (archaeology, astronomy), where both range and >>> granularity requirements are different. Then, there is vector clocks >>> which work with logical time instead of real time. >>> - Another aspect is that this could just be a special case of >>> versioning, using time as a version "number". >>> >>> Segments: ... >>> Sequence: ... >>> >>> Just some thoughts >>> - Felix >>> >>> On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: >>>> Ok, here is a short perspective from a person who does not write >>>> network code, but rather builds distributed applications: >>>> >>>> I feel squeamish about specifying naming conventions at all. I feel >>>> that the design should favor simplicity and should not burden >>>> application developers with having to understand what's better for >>>> the network. Hence, I would argue in favor of names that are just >>>> series of bits organized into substrings whose semantics are up to >>>> the application. I would not use any special bits/characters or other >>>> special conventions. I would also most definitely not embed any >>>> assumptions on name conventions into network-layer code. >>>> >>>> As an application developer, I would like to be able to think about >>>> name spaces the way I think of UNIX file-name hierarchies. To put it >>>> differently, I do not want to read a tutorial on UNIX filename design >>>> guidelines in order to ensure that the UNIX file system does file >>>> caching, block management, and other functions efficiently for my >>>> application. I want file system plumbing to be hidden from me, the >>>> application developer. A good design is one that hides such plumbing >>>> from the application without impacting efficiency. 
Same applies to >>>> NDN in my opinion. A discussion of special delimiters, markers, etc, >>>> that enhances efficiency of certain network functions seems to be >>>> going in the opposite direction from what makes NDN general and >>>> flexible. >>>> >>>> Tarek >>>> >>>> >>>> On 9/14/2014 10:39 PM, Tai-Lin Chu wrote: >>>>> hi, >>>>> Just some questions to know how people feel about it. >>>>> 1. Do you like it or not? why? >>>>> 2. Does it fit the need of your application? >>>>> 3. What might be some possible changes (or even a big redesign) if you >>>>> are asked to purpose a naming convention? >>>>> 4. some other thoughts >>>>> >>>>> Feel free to answer any of the questions. >>>>> Thanks >>>>> >>>>> >>>>> [1] >>>>> >>>>> http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-nam >>>>> ing-conventions.pdf >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From felix at rabe.io Tue Sep 16 00:34:59 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 16 Sep 2014 09:34:59 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <5417713F.4090201@illinois.edu> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> Message-ID: <5417E823.70306@rabe.io> To clarify: I agree with you. The "But" sentences are my arguments against conventions at NDN level for versions and timestamps. On 16/Sep/14 01:07, Abdelzaher, Tarek wrote: > Felix, > At the risk of spamming the list, let me elaborate. 
What I am arguing > for is that we do not need unified conventions. Different applications > will find solutions that are more appropriate to them. Otherwise, > there will be too many things to add and it is a slippery slope. > Assume all we have are name strings composed of a concatenation of > sub-strings that, as far as the network is concerned, are all of the > same type (i.e., no special characters, markers, delimiters, type > values, etc), although from the application's perspective can have > different semantics. > > Hence, for example, if versions are important to my application, I > should have version numbers as part of the name space (e.g., > /path/data-name/version). To get a specific version, one can ask for > it by name (e.g., /path/data-name/version-label). To get the latest, > there is an issue with caching (how long should things be cached > before they expire). Barring that, one can ask for /path/data-name and > have the interest be forwarded to the provider assuming that stale > versions expired from caches. Alternatively, to make sure one bypasses > caching, one can have a "no cache" bit in the interest or ask for, > say, > /path/data-name/random-unique-substring-encoding-a-request-to-the-app. > Since the random-unique-substring-encoding-a-request-to-the-app will > not match any cached names, the interest will be sent to the provider > advertising the prefix, forwarded up to the application (registered on > the face exporting the prefix) and invoke an application-layer > function that will decode and interpret the > unique-substring-encoding-a-request-to-the-app. The function will then > answer the request by returning, say, the name of the requested latest > version to the client, so it can send a proper interest for the named > version. I am not suggesting that the above is necessarily a good > idea. 
I am just saying there can be many different ways to solve the > problem that depend on things such as how frequently versions are > updated, how big the objects are, etc. Hence, I am arguing for > simplicity and generality by making the underlying support as simple > as possible. Philosophically speaking, the more "common cases" we > think of and the more complex we make the underlying name format, the > more assumptions we may be making that time may break later, causing > the work to be potentially more short-lived. > > Tarek > > > On 9/15/2014 4:17 PM, Felix Rabe wrote: >> I agree with a need for simplicity. >> >> But maybe we cannot have simplicity so let's look at some use cases >> that might justify the complexity of the conventions: (I will only >> discuss versions and timestamps here, as they interest me for my work.) >> >> Versions: I can give someone a link to a document, and know that they >> will see the version I currently see, unmodified, even if there is a >> newer version. Of course, any participant can always get a newer >> version. >> - But: I think applications should come up with conventions in >> versioning, as one application does not need it, another needs >> sequential versioning, another (like Git-style source code >> management) needs a DAG. >> >> Timestamps: My measurements are a growing data set of (time, value) >> tuples. I can access multiple measurements across time for comparisons. >> - But: Also, this needs to be defined by applications, as >> time-since-unix-epoch is (I think) unsuitable for a wide range of >> applications (archaeology, astronomy), where both range and >> granularity requirements are different. Then, there is vector clocks >> which work with logical time instead of real time. >> - Another aspect is that this could just be a special case of >> versioning, using time as a version "number". >> >> Segments: ... >> Sequence: ... 
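The version-in-the-name scheme Tarek sketches above (versions carried as ordinary name components, with cache bypass via a component that cannot match anything cached) can be illustrated in a few lines of plain Python. The helper names below are hypothetical, for illustration only, and are not part of any NDN library:

```python
# Plain-Python sketch of the version-in-the-name scheme described above.
# These helpers are hypothetical illustrations, not an NDN library API.

import uuid

def versioned_name(prefix: str, data_name: str, version: str) -> str:
    """A specific, immutable version: /path/data-name/version-label."""
    return f"{prefix}/{data_name}/{version}"

def latest_request_name(prefix: str, data_name: str) -> str:
    """A cache-bypassing request: the random component matches no cached
    name, so the Interest is forwarded all the way to the producer app."""
    return f"{prefix}/{data_name}/req-latest-{uuid.uuid4().hex}"

print(versioned_name("/path", "data-name", "v3"))  # /path/data-name/v3
```

The producer application registered on the prefix would decode the `req-latest-` component and answer with the name of the current version, as described in the quoted text.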
>> >> Just some thoughts >> - Felix >> >> On 15/Sep/14 22:05, Abdelzaher, Tarek wrote: >>> Ok, here is a short perspective from a person who does not write >>> network code, but rather builds distributed applications: >>> >>> I feel squeamish about specifying naming conventions at all. I feel >>> that the design should favor simplicity and should not burden >>> application developers with having to understand what's better for >>> the network. Hence, I would argue in favor of names that are just >>> series of bits organized into substrings whose semantics are up to >>> the application. I would not use any special bits/characters or >>> other special conventions. I would also most definitely not embed >>> any assumptions on name conventions into network-layer code. >>> >>> As an application developer, I would like to be able to think about >>> name spaces the way I think of UNIX file-name hierarchies. To put it >>> differently, I do not want to read a tutorial on UNIX filename >>> design guidelines in order to ensure that the UNIX file system does >>> file caching, block management, and other functions efficiently for >>> my application. I want file system plumbing to be hidden from me, >>> the application developer. A good design is one that hides such >>> plumbing from the application without impacting efficiency. Same >>> applies to NDN in my opinion. A discussion of special delimiters, >>> markers, etc, that enhances efficiency of certain network functions >>> seems to be going in the opposite direction from what makes NDN >>> general and flexible. >>> >>> Tarek >>> >>> >>> On 9/14/2014 10:39 PM, Tai-Lin Chu wrote: >>>> hi, >>>> Just some questions to know how people feel about it. >>>> 1. Do you like it or not? why? >>>> 2. Does it fit the need of your application? >>>> 3. What might be some possible changes (or even a big redesign) if you >>>> are asked to purpose a naming convention? >>>> 4. 
some other thoughts >>>> >>>> Feel free to answer any of the questions. >>>> Thanks >>>> >>>> >>>> [1] >>>> http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > From massimo.gallo at alcatel-lucent.com Tue Sep 16 02:12:17 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Tue, 16 Sep 2014 11:12:17 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <5417E823.70306@rabe.io> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> Message-ID: <5417FEF1.1060307@alcatel-lucent.com> Dear all, Interesting discussion! Here are my 2 cents on the discussion. I think we should avoid having explicit components' type. CCN/NDN are layer three technologies that allow an end host process to retrieve some named data. The networking layer needs only to check local cache (exact match IMO), PIT (exact match IMO) and FIB (LPM, I think we all agree on this :D!). The way a "consumer" requests the next in-order segment is up to the transport layer as Tarek said and not a layer three functionality. Moreover, adding components' types may introduce problems such as type explosion (also pointed out by someone else in this thread). That said, there might be some "special" components requiring conventions: Segment: the most important one IMO because with that field many segments can be identified as a single object and treated differently for caching purposes. (i.e.
a naming convention here can be that the Segment ID must be the last name component, hence without an explicit segment identification) Versioning and Timestamp: there might be a case for explicitly identifying those two components as suggested in NDN-TR-22 but, IMO, too many naming conventions will complicate layer-three operations, which I would keep as simple as possible. Max -------------- next part -------------- An HTML attachment was scrubbed... URL: From oran at cisco.com Tue Sep 16 06:35:37 2014 From: oran at cisco.com (Dave Oran (oran)) Date: Tue, 16 Sep 2014 13:35:37 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <5417FEF1.1060307@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> Message-ID: On Sep 16, 2014, at 5:12 AM, Massimo Gallo wrote: > > Dear all, > > Interesting discussion! > > Here are my 2 cents to the discussion. > > I think we should avoid having explicit components' type. Below you seem to argue pretty much the opposite. > CCN/NDN are layer three technologies that allow an end host process to retrieve some named data. The networking layer does need only to check local cache (exact match IMO), PIT (exact match IMO) and FIB (LPM, I think we all agree on this :D!). I certainly wish we could agree on this change to the design, but I think it's premature to say we all agree that LPM on the PIT and CS has proven to be highly problematic. > The way a "consumer" requests the next in order segment is up to the transport layer as Tarek said and not a layer three functionality. Agree, but we're discussing the protocol encoding, not the fetch algorithm. Names as a data structure are "shared" across multiple layers (a good thing IMO) and hence defining their semantics once as opposed to having different interpretations in different layers seems a good tradeoff, even if it "exposes"
some things to layer 3 that are not strictly necessary for layer 3 operation and could be opaque. I'll note that the semantics and encoding of the typed name components needs to be (and is in the current proposal) done in such a way that simple byte-wise compares do the right thing for all of exact match, LPM, and anti-aliasing. > Moreover, adding components' types may introduce problems as type explosion (also pointed by someone else in this thread). Type explosion is a concern, but frankly is equally a problem no matter whether typing is done by convention, markers, or types. What distinguishes the options in my opinion is resilience against aliasing, for which I think explicit typing has substantial advantages. > > That said there might be some "special" components requiring conventions: > > Segment: the most important one IMO because with that field many segments can be identified as a single object and treated differently for caching purposes. (i.e. a naming convention here can be that Segment ID must be last name's component hence without an explicit segment identification) > Precisely. Which means all applications have to agree on a convention, or this needs to be "baked" into the architecture somehow. There are multiple ways to do this, but typed name components seem utterly straightforward, and can leverage the exact registration policy and machinery that is used to control all architectural constants defined by the protocol. > Versioning and Timestamp: there might be a case for explicitly identify those two components as suggested in the NDN-TR-22 but, IMO, too many naming conventions will complexify layer three operations that I would keep as simple as possible. > As long as L3 can treat these as opaque and is not required to be cognizant of name component types, their existence does not complicate L3 in any way.
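The byte-wise comparison property Dave describes can be made concrete with a small sketch: if each component is encoded as a fixed-width type, a length, and a value, then exact match, prefix match, and anti-aliasing all reduce to plain byte comparisons. The one-byte type codes below are made up for illustration; they are not the actual NDN-TLV or CCNx code-point assignments:

```python
# Sketch: typed name components as fixed-width Type + Length + Value.
# Because the type participates in the encoded bytes, a Segment component
# whose value is 0x07 can never alias a generic component with value 0x07.
# Type codes are illustrative only, not real NDN-TLV/CCNx assignments.

import struct

T_GENERIC, T_SEGMENT = 0x08, 0x21  # hypothetical code points

def component(ctype: int, value: bytes) -> bytes:
    # 1-byte type and 1-byte length are enough for this sketch
    return struct.pack("BB", ctype, len(value)) + value

def name(components) -> bytes:
    return b"".join(component(t, v) for t, v in components)

seg = name([(T_GENERIC, b"video"), (T_SEGMENT, b"\x07")])
gen = name([(T_GENERIC, b"video"), (T_GENERIC, b"\x07")])
assert seg != gen                                     # typing prevents aliasing
assert seg.startswith(name([(T_GENERIC, b"video")]))  # prefix match = bytes prefix
```

A variable-length type field would break this property unless the length encoding is itself ordered, which is the point behind Jeff Thompson's question below about fixed-length type encodings.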
> Max > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jefft0 at remap.ucla.edu Tue Sep 16 06:43:06 2014 From: jefft0 at remap.ucla.edu (Thompson, Jeff) Date: Tue, 16 Sep 2014 13:43:06 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> Message-ID: Hi Dave. You say: > I'll note that the semantics and encoding of the typed name components >needs to be (and is in the current proposal) done in such a way that >simple byte-wise compares do the right thing for all of exact match, LPM, > and anti-aliasing. Do I understand correctly that byte-wise compare works for longest prefix match because the type field of the TLV is fixed-length? If NDN-TLV adopts typed name components, would it also have to use a fixed-length type encoding in order to preserve byte-wise compares? - Jeff T On 2014/9/16 6:35 AM, "Dave Oran (oran)" wrote: > >On Sep 16, 2014, at 5:12 AM, Massimo Gallo > wrote: > >> >> Dear all, >> >> Interesting discussion! >> >> Here are my 2 cents to the discussion. >> >> I think we should avoid having explicit components' type. >Below you seem to argue pretty much the opposite. > >> CCN/NDN are layer three technologies that allow an end host process to >>retrieve some named data. The networking layer does need only to >>check local cache (exact match IMO), PIT (exact match IMO) and FIB (LPM, >>I think we all agree on this :D!). >I certainly wish we could agree on this change to the design, but I think >it's premature to say we all agree that LPM on the PIT and CS has proven >to be highly problematic. > >> The way a "consumer" requests the next in order segment is up to the >>transport layer as Tarek said and not a layer three functionality.
>Agree, but we're discussing the protocol encoding, not the fetch >algorithm. Names as a data structure are "shared" across multiple layers >(a good thing IMO) and hence defining their semantics once as opposed to >having different interpretations in different layers seems a good >tradeoff, even if it "exposes" some things to layer 3 that are not strictly >necessary for layer 3 operation and could be opaque. I'll note that the >semantics and encoding of the typed name components needs to be (and is >in the current proposal) done in such a way that simple byte-wise >compares do the right thing for all of exact match, LPM, and >anti-aliasing. > >> Moreover, adding components' types may introduce problems as type >>explosion (also pointed by someone else in this thread). >Type explosion is a concern, but frankly is equally a problem no matter >whether typing is done by convention, markers, or types. What >distinguishes the option in my opinion is resilience against aliasing, >for which I think explicit typing has substantial advantages. > >> >> That said there might be some "special" components requiring >>conventions: >> >> Segment: the most important one IMO because with that field many >>segments can be identified as a single object and treated differently >>for caching purposes. (i.e. a naming convention here can be that Segment >>ID must be last name's component hence without an explicit segment >>identification) >> >Precisely. Which means all applications have to agree on a convention, or >this needs to be "baked" into the architecture somehow. There are >multiple ways to do this, but typed name components seem utterly >straightforward, and can leverage the exact registration policy and >machinery that is used to control all architectural constants defined by >the protocol.
> >> Versioning and Timestamp: there might be a case for explicitly identify >>those two components as suggested in the NDN-TR-22 but, IMO, too many >>naming conventions will complexify layer three operations that I would >>keep as simple as possible. >> >As long as L3 can treat these as opaque and is not required to be >cognizant of name component types, their existence does not complicate L3 >in any way. > >> Max >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From massimo.gallo at alcatel-lucent.com Tue Sep 16 06:56:52 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Tue, 16 Sep 2014 15:56:52 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> Message-ID: <541841A4.70508@alcatel-lucent.com> Hi Dave, Just to clarify, what I wanted to say is that we should specify name component types if and only if layer three needs to understand them (e.g., segment, versioning?, ...). What is the advantage of having components' types at layer three if it NEVER uses them? Max On 16/09/2014 15:35, Dave Oran (oran) wrote: > On Sep 16, 2014, at 5:12 AM, Massimo Gallo wrote: > >> Dear all, >> >> Interesting discussion! >> >> Here are my 2 cents to the discussion. >> >> I think we should avoid having explicit components' type. > Below you seem to argue pretty much the opposite. > >> CCN/NDN are layer three technologies that allow an end host process to retrieve some named data.
The networking layer does need only to check local cache (exact match IMO), PIT (exact match IMO) and FIB (LPM, I think we all agree on this :D!). > I certainly wish we could agree on this change to the design, but I think it's premature to say we all agree that LPM on the PIT and CS has proven to be highly problematic. > >> The way a "consumer" requests the next in order segment is up to the transport layer as Tarek said and not a layer three functionality. > Agree, but we're discussing the protocol encoding, not the fetch algorithm. Names as a data structure are "shared" across multiple layers (a good thing IMO) and hence defining their semantics once as opposed to having different interpretations in different layers seems a good tradeoff, even if it "exposes" some things to layer 3 that are not strictly necessary for layer 3 operation and could be opaque. I'll note that the semantics and encoding of the typed name components needs to be (and is in the current proposal) done in such a way that simple byte-wise compares do the right thing for all of exact match, LPM, and anti-aliasing. > >> Moreover, adding components' types may introduce problems as type explosion (also pointed by someone else in this thread). > Type explosion is a concern, but frankly is equally a problem no matter whether typing is done by convention, markers, or types. What distinguishes the option in my opinion is resilience against aliasing, for which I think explicit typing has substantial advantages. > >> That said there might be some "special" components requiring conventions: >> >> Segment: the most important one IMO because with that field many segments can be identified as a single object and treated differently for caching purposes. (i.e. a naming convention here can be that Segment ID must be last name's component hence without an explicit segment identification) >> > Precisely. Which means all applications have to agree on a convention, or this needs to be "baked"
into the architecture somehow. There are multiple ways to do this, but typed name components seem utterly straightforward, and can leverage the exact registration policy and machinery that is used to control all architectural constants defined by the protocol. > >> Versioning and Timestamp: there might be a case for explicitly identify those two components as suggested in the NDN-TR-22 but, IMO, too many naming conventions will complexify layer three operations that I would keep as simple as possible. >> > As long as L3 can treat these as opaque and is not required to be cognizant of name component types, their existence does not complicate L3 in any way. > >> Max >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > From mjs at cisco.com Tue Sep 16 06:58:01 2014 From: mjs at cisco.com (Mark Stapp) Date: Tue, 16 Sep 2014 09:58:01 -0400 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <5417FEF1.1060307@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> Message-ID: <541841E9.6010608@cisco.com> There's a difference between saying "someone else" made some assertion, and that assertion's being supported with facts - isn't there? No one has demonstrated how the number of name-component types that need to be unambiguously indicated in packets is reduced by the use of "conventions." And no one has demonstrated that an extremely large number of standard name components is needed. The discussion has been about a handful of types: - segment - version - maybe timestamp (but that could be in 'version' too if necessary) - maybe signature (for things like Jeff's signed interests.
but again that could be in an opaque or app-specific component if there's no expectation that the network or the general-purpose stack/client lib will need to process the thing) I can imagine a couple more (like 'message digest' for self-certifying names), and I think the PARC folks have thought of a couple. but that's still pretty much a "handful", not an "explosion." And I'm sure there will be plenty of discussion about everything beyond the core four or five. I haven't seen anyone say anything like "I don't think we need segmentation, I hate segmentation." I haven't seen anyone say "I have a way to make a _convention_ safer or more robust than an explicit name-component type." And I haven't seen anyone say "here's a list of ten thousand standard components that we would have to introduce if we left the UTF8 convention behind." So let's not keep throwing the "explosion" word around until it's supported with some actual substance. I have to point out that when the issue of the ccnb encoding was raised a couple of years ago, there were a lot of the same arguments made: that TLVs weren't sufficiently flexible, that applications didn't want TLVs, that we couldn't predict what we might need in the future. The reality is that applications that didn't care then still don't care - they wanted APIs to use, and didn't really care about the bits-on-the-wire representation. TLVs didn't harm them. But there are other parts of the network, and those parts (the client software stack, the forwarders, the caches) really benefitted from the encoding change. A similar thing is going on here, imo. There's no evidence that there will be a name-component "type explosion", but the phrase is out there and it sure sounds scary. In fact, we're talking about a handful of well-known component types, just a handful, that need to be visible outside the application namespace, and we all actually seem to agree about the initial set. 
The encoding change is opaque to applications that don't care about the actual details of the encoding, but it's very beneficial to other parts of the software stack and the overall network. There are only benefits to removing encoding ambiguity and aliasing by removing the use of "conventions." And there will be plenty of back-pressure if folks come forward and want to allocate additional values from the well-known space. Thanks, Mark From mjs at cisco.com Tue Sep 16 07:05:48 2014 From: mjs at cisco.com (Mark Stapp) Date: Tue, 16 Sep 2014 10:05:48 -0400 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541841A4.70508@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> Message-ID: <541843BC.1040307@cisco.com> And that seems like a reasonable place to start. There's a small number of types that are in wide use. The UTF8 convention should be eliminated, and they should be assigned component types. Now, I would want a way for applications to be able to use app-specific components, because they may want to do so to improve their own processing. That'd be an area opened up for experimentation by making the initial move to typed components. And as you say, there's no need to standardize/publicize component types that are application-specific, we'd just want to identify the code-point boundary between 'common' and 'app-specific'. -- Mark On 9/16/14 9:56 AM, Massimo Gallo wrote: > Hi Dave, > > Just to clarify, what I wanted to say is just that we should > specify name's component types if and only if layer three needs to > understand them (e.g., segment, versioning?, ...). > What is the advantage of having components' types at layer three if it > NEVER uses them?
> > Max > > > > > On 16/09/2014 15:35, Dave Oran (oran) wrote: >> [...] >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > . > From massimo.gallo at alcatel-lucent.com Tue Sep 16 07:29:52 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Tue, 16 Sep 2014 16:29:52 +0200 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <541843BC.1040307@cisco.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> Message-ID: <54184960.8060102@alcatel-lucent.com> I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies a component's type only when needed (something like UTF8 conventions but that applications MUST use). Max On 16/09/2014 16:05, Mark Stapp wrote: > And that seems like a reasonable place to start. there's a small > number of types that are in wide use. The UTF8 convention should be > eliminated, and they should be assigned component types. > > Now, I would want a way for applications to be able to use > app-specific components, because they may want to do so to improve > their own processing. that'd be an area opened up for experimentation > by making the initial move to typed components. and as you say, > there's no need to standardize/publicize component types that are > application-specific, we'd just want to identify the code-point > boundary between 'common' and 'app-specific'. > > -- Mark > > On 9/16/14 9:56 AM, Massimo Gallo wrote: >> Hi Dave, >> >> Just to clarify, what i wanted to say is just said that we should >> specify name's component types if and only if layer three needs to >> understand them (e.g., segment, versioning?, ...). >> What is the advantage of having components' types at layer three if it >> NEVER uses them? >> >> Max >> >> >> >> >> On 16/09/2014 15:35, Dave Oran (oran) wrote: >>> On Sep 16, 2014, at 5:12 AM, Massimo Gallo >>> wrote: >>> >>>> Dear all, >>>> >>>> Interesting discussion!
>>> [...]
>> > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > From jburke at remap.UCLA.EDU Tue Sep 16 08:18:10 2014 From: jburke at remap.UCLA.EDU (Burke, Jeff) Date: Tue, 16 Sep 2014 15:18:10 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541841E9.6010608@cisco.com> Message-ID: Mark, Since I made it, let me explain that the intention of the "type explosion" comment was not the fear-mongering that you are suggesting. :) Rather, it was an attempt to explore the potential consequences of a design choice by posing the logical extreme... The examples didn't get captured in that doc, sorry. I don't know that it's so unimaginable that applications would start using T-V pairs loosely if allowed. Maybe we can chat at ICN about this. (This paper has been an interesting one to think about with respect to k/v descriptions of NDN data objects - https://www.dropbox.com/s/68ief1wcee908z9/Sechrest%26McClennen_BlendingHierarchical.pdf?dl=0) I agree that applications using types for their own purposes is likely not a sufficient reason to discard the approach. But two questions I could still use some help on: First, in the document I sent, there are seven specific, though not equally well considered, reasons to use marker components that have nothing to do with the so-called type explosion. As far as I can tell, no one has addressed these from the application developer's perspective. Could someone? Second, if the most important issue is eliminating ambiguity/aliasing, then why not define a new type that hints that the component can be interpreted as a key/value pair with some encoding convention? This could enable an unambiguous, short list of commonly used conventions that you've mentioned (using marker-like keys), while keeping information describing the data object in the name.
It would also be very useful for applications that desire their own k/v representation for components, which Dave has argued for in other circumstances and we keep running across. It doesn't rule out use of hierarchy, and doesn't limit what application-defined keys could be. Yet, it could be ignored in forwarding (just another component) and perhaps have a still-meaningful sort order (key, then value). cheers, Jeff On 9/16/14, 4:58 PM, "Mark Stapp" wrote: > >There's a difference between saying "someone else" made some assertion, >and that assertion's being supported with facts - isn't there? No one >has demonstrated how the number of name-component types that need to be >unambiguously indicated in packets is reduced by the use of >"conventions." And no one has demonstrated that an extremely large number >of standard name components is needed. The discussion has been about a >handful of types: > >- segment >- version >- maybe timestamp (but that could be in 'version' too if necessary) >- maybe signature (for things like Jeff's signed interests. but again >that could be in an opaque or app-specific component if there's no >expectation that the network or the general-purpose stack/client lib >will need to process the thing) > >I can imagine a couple more (like 'message digest' for self-certifying >names), and I think the PARC folks have thought of a couple. but that's >still pretty much a "handful", not an "explosion." And I'm sure there >will be plenty of discussion about everything beyond the core four or >five. > >I haven't seen anyone say anything like "I don't think we need >segmentation, I hate segmentation." I haven't seen anyone say "I have a >way to make a _convention_ safer or more robust than an explicit >name-component type." And I haven't seen anyone say "here's a list of >ten thousand standard components that we would have to introduce if we >left the UTF8 convention behind."
So let's not keep throwing the >"explosion" word around until it's supported with some actual substance. > >I have to point out that when the issue of the ccnb encoding was raised >a couple of years ago, there were a lot of the same arguments made: that >TLVs weren't sufficiently flexible, that applications didn't want TLVs, >that we couldn't predict what we might need in the future. The reality >is that applications that didn't care then still don't care - they >wanted APIs to use, and didn't really care about the bits-on-the-wire >representation. TLVs didn't harm them. But there are other parts of the >network, and those parts (the client software stack, the forwarders, the >caches) really benefitted from the encoding change. > >A similar thing is going on here, imo. There's no evidence that there >will be a name-component "type explosion", but the phrase is out there >and it sure sounds scary. In fact, we're talking about a handful of >well-known component types, just a handful, that need to be visible >outside the application namespace, and we all actually seem to agree >about the initial set. The encoding change is opaque to applications >that don't care about the actual details of the encoding, but it's very >beneficial to other parts of the software stack and the overall network. >There are only benefits to removing encoding ambiguity and aliasing by >removing the use of "conventions." And there will be plenty of >back-pressure if folks come forward and want to allocate additional >values from the well-known space. > >Thanks, >Mark >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From shijunxiao at email.arizona.edu Tue Sep 16 08:47:50 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Tue, 16 Sep 2014 08:47:50 -0700 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: Hi Jeff Please see my proposal of MarkedComponent < http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000085.html>, which is a solution to eliminate ambiguity by defining a new type specifically for key-value pairs. Yours, Junxiao On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff wrote: > > Second, if the most important issue is eliminating ambiguity/aliasing, > then why not define a new type that hints that the component can be > interpreted as a key/value pair with some encoding convention? This could > enable an unambiguous, short list of commonly used conventions that you've > mentioned (using marker-like keys), while keeping information describing > the data object in the name. It would also be very useful for applications > that desire their own k/v representation for components, which Dave has > argued for in other circumstances and we keep running across. It doesn't > rule out use of hierarchy, and doesn't limit what an application defined > keys could be. Yet, it could be ignored in forwarding (just another > component) and perhaps have a still-meaningful sort order (key, then > value). > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Tue Sep 16 10:21:17 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Tue, 16 Sep 2014 10:21:17 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: Summarize all types that people need (feel free to add some, and paste in your reply) - regular - segment - version (timestamp) - signature - key: assuming that the next regular component will be the value. The value is empty if it sees another key component immediately after. - app-specific However, I am not convinced that we need version, signature, and app-specific as typed components. Will these change how packets route?
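Tai-Lin's list, together with his pairing rule for key components, can be sketched concretely. The numeric codes and helper name below are hypothetical, purely illustrative, and not actual NDN-TLV type assignments:

```python
# Hypothetical codes for the component kinds listed above (illustration
# only; these are NOT real NDN-TLV type assignments).
REGULAR, SEGMENT, VERSION, SIGNATURE, KEY, APP_SPECIFIC = range(6)

def pair_keys(components):
    """Apply the proposed rule: a KEY component takes the next REGULAR
    component as its value; the value is empty if another KEY component
    (or nothing) follows immediately."""
    pairs = []
    i = 0
    while i < len(components):
        ctype, cvalue = components[i]
        if ctype == KEY:
            if i + 1 < len(components) and components[i + 1][0] == REGULAR:
                pairs.append((cvalue, components[i + 1][1]))
                i += 2
                continue
            pairs.append((cvalue, b""))  # the empty-value case
        i += 1
    return pairs

# A name carrying one full key/value pair and two valueless keys:
name = [(REGULAR, b"example"), (KEY, b"v"), (REGULAR, b"2"),
        (KEY, b"lang"), (KEY, b"seg")]
assert pair_keys(name) == [(b"v", b"2"), (b"lang", b""), (b"seg", b"")]
```

Whether the empty-value case should be representable at all is exactly what Felix questions in his reply.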
On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi wrote: > Hi Jeff > > Please see my proposal of MarkedComponent > , > which is a solution to eliminate ambiguity by defining a new type > specifically for key-value pair. > > Yours, Junxiao > > On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff wrote: >> >> >> Second, if the most important issue is eliminating ambiguity/aliasing, >> then why not define a new type that hints that the component can be >> interpreted as a key/value pair with some encoding convention? This could >> enable an unambiguous, short list of commonly used conventions that you've >> mentioned (using marker-like keys), while keeping information describing >> the data object in the name. It would also be very useful for applications >> that desire their own k/v representation for components, which Dave has >> argued for in other circumstances and we keep running across. It doesn't >> rule out use of hierarchy, and doesn't limit what an application defined >> keys could be. Yet, it could be ignored in forwarding (just another >> component) and perhaps have a still-meaningful sort order (key, then >> value). >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From felix at rabe.io Tue Sep 16 10:54:04 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 16 Sep 2014 19:54:04 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: <5418793C.9000808@rabe.io> On 16/Sep/14 19:21, Tai-Lin Chu wrote: > Summarize all types that people need (feel free to add some, and paste > in your reply) > - regular > - segment Segment: As I think more of it, this seems just like versioned content with a known end (length), with continuous numbering from 0...length-1. Whereas "usual" versioned content has an open end, but a latest version (somewhere). 
(Of course, the difference is that versioned content is similar or related content in a sequence, whereas segments are parts of the whole data.) > - version (timestamp) > - signature > - key: assuming that the next regular component will be value. The > value is empty if it sees another key component immediately after. Empty value: I would not allow this special case. You will soon find someone who tries to have an empty component, but for that would need to add some other (dummy) key just after it. Better to either allow empty components in general, or don't allow them here either. > - app-specific > > > However, I am not convinced that we need version, signature, and > app-specific as typed component. Will these change how packet routes? > > > > On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi > wrote: >> Hi Jeff >> >> Please see my proposal of MarkedComponent >> , >> which is a solution to eliminate ambiguity by defining a new type >> specifically for key-value pair. >> >> Yours, Junxiao >> >> On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff wrote: >>> >>> Second, if the most important issue is eliminating ambiguity/aliasing, >>> then why not define a new type that hints that the component can be >>> interpreted as a key/value pair with some encoding convention? This could >>> enable an unambiguous, short list of commonly used conventions that you've >>> mentioned (using marker-like keys), while keeping information describing >>> the data object in the name. It would also be very useful for applications >>> that desire their own k/v representation for components, which Dave has >>> argued for in other circumstances and we keep running across. It doesn't >>> rule out use of hierarchy, and doesn't limit what an application defined >>> keys could be. Yet, it could be ignored in forwarding (just another >>> component) and perhaps have a still-meaningful sort order (key, then >>> value).
>>> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Tue Sep 16 10:53:16 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Tue, 16 Sep 2014 17:53:16 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: I am not sure why something being a typed name is related to routing. Isn't routing going to be over the full TLV representation of the name? Or do you consider the TLV "T" as separate from the name component and not used in FIB or PIT matching? I think a serial version is more useful than a timestamp version. In CCNx 1.0, we have a type for both, but we generally use the serial version. A timestamp does not give distributed versioning, so like a serial version it is only useful from a single publisher, and it gives an easy way to determine the next version. It does, of course, require that the publisher maintain state rather than rely on its real-time clock (or ntp) for its version number. A serial version number also allows unlimited version number generation, whereas a quantized (e.g. millisecond) timestamp limits the number of versions one can generate at a time without keeping state. As a general philosophy on named addresses, I see the hierarchical name components as providing protocol encapsulation, essentially encapsulating name components to the left (not all components are like this, but some are). For example, when you add a version component to a name it is a statement that a versioning protocol has encapsulated and possibly modified the content object identified by the left name.
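To make the serial-versus-timestamp trade-off above concrete, here is a small sketch (the helper names are hypothetical; the point is that a quantized timestamp aliases two versions published within the same millisecond, while a serial counter never collides but forces the publisher to keep state):

```python
import time

def timestamp_version(clock=time.time):
    # Stateless: derived from the real-time clock, quantized to ms.
    return int(clock() * 1000)

def make_serial_versioner():
    # Stateful: the publisher must remember the last version issued,
    # but every new version is distinct and strictly increasing.
    last = {"v": 0}
    def next_version():
        last["v"] += 1
        return last["v"]
    return next_version

# Two publications within the same millisecond collide under the
# timestamp scheme but not under the serial scheme:
frozen_clock = lambda: 1410886800.0004
assert timestamp_version(frozen_clock) == timestamp_version(frozen_clock)

next_version = make_serial_versioner()
assert next_version() != next_version()
```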
When a segmentation protocol is applied to a content object, it encapsulates a name to the left. They serve a similar purpose to header encapsulation in traditional packets. Therefore, I think that when a protocol is encapsulating the left-name, those should be unambiguous and explicit. For protocols that everyone needs to understand, like versioning or segmenting, those should be a standardized value, and not exclusive of other protocols. Someone might come out, for example, with a better segmentation protocol and that should have a different identifier than the earlier segmentation protocol. Therefore, wherever you do your multiplexing you need to coordinate. That's going to be either in the TLV "T" or in the "key" of a "key=value" inside the "V". Marc On Sep 16, 2014, at 10:21 AM, Tai-Lin Chu wrote: > Summarize all types that people need (feel free to add some, and paste > in your reply) > - regular > - segment > - version (timestamp) > - signature > - key: assuming that the next regular component will be value. The > value is empty if it sees another key component immediately after. > - app-specific > > > However, I am not convinced that we need version, signature, and > app-specific as typed component. Will these change how packet routes? > > > > On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi > wrote: >> Hi Jeff >> >> Please see my proposal of MarkedComponent >> , >> which is a solution to eliminate ambiguity by defining a new type >> specifically for key-value pair. >> >> Yours, Junxiao >> >> On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff wrote: >>> >>> >>> Second, if the most important issue is eliminating ambiguity/aliasing, >>> then why not define a new type that hints that the component can be >>> interpreted as a key/value pair with some encoding convention? This could >>> enable an unambiguous, short list of commonly used conventions that you've >>> mentioned (using marker-like keys), while keeping information describing >>> the data object in the name.
It would also be very useful for applications >>> that desire their own k/v representation for components, which Dave has >>> argued for in other circumstances and we keep running across. It doesn't >>> rule out use of hierarchy, and doesn't limit what an application defined >>> keys could be. Yet, it could be ignored in forwarding (just another >>> component) and perhaps have a still-meaningful sort order (key, then >>> value). >>> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From oran at cisco.com Tue Sep 16 11:10:12 2014 From: oran at cisco.com (Dave Oran (oran)) Date: Tue, 16 Sep 2014 18:10:12 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541841A4.70508@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> Message-ID: On Sep 16, 2014, at 9:56 AM, Massimo Gallo wrote: > Hi Dave, > > Just to clarify, what I wanted to say is just that we should specify name's component types if and only if layer three needs to understand them (e.g., segment, versioning?, ...). > What is the advantage of having components' types at layer three if it NEVER uses them? > Because Names are a multi-layer data structure. We don't have naming tunnels, or other encapsulations in NDN. Also, history has shown that unless one goes to extreme lengths to opacify things (e.g.
through encryption), lower layer implementations will inevitably "peek" and possibly "poke" at the supposedly higher layer data structures and conventions. We've seen this with port numbers and lots of other "higher layer" constructs in the IP world. Therefore, it seems prudent to design the system such that the inevitable attempts by L3 to exploit L4+ stuff have less chance of breaking things. That's why I advocate ensuring that L3 does not have to look at typed name components to work right, but that we not strictly layer naming conventions in such a way that we create aliasing problems, or robustness problems when the L3 guys do their "peeking". DaveO. > Max > > > > > On 16/09/2014 15:35, Dave Oran (oran) wrote: >> On Sep 16, 2014, at 5:12 AM, Massimo Gallo wrote: >> >>> Dear all, >>> >>> Interesting discussion! >>> >>> Here are my 2 cents to the discussion. >>> >>> I think we should avoid having explicit components' type. >> Below you seem to argue pretty much the opposite. >> >>> CCN/NDN are layer three technologies that allow an end host process to retrieve some named data. The networking layer does need only to check local cache (exact match IMO), PIT (exact match IMO) and FIB (LPM, I think we all agree on this :D!). >> I certainly wish we could agree on this change to the design, but I think it's premature to say we all agree that LPM on the PIT and CS has proven to be highly problematic. >> >>> The way a "consumer" requests the next in order segment is up to the transport layer as Tarek said and not a layer three functionality. >> Agree, but we're discussing the protocol encoding, not the fetch algorithm. Names as a data structure are "shared" across multiple layers (a good thing IMO) and hence defining their semantics once as opposed to having different interpretations in different layers seems a good tradeoff, even if it "exposes" some things to layer 3 that are not strictly necessary for layer 3 operation and could be opaque.
I'll note that the semantics and encoding of the typed name components need to be (and, in the current proposal, are) done in such a way that simple byte-wise compares do the right thing for all of exact match, LPM, and anti-aliasing. >> >>> Moreover, adding components' types may introduce problems such as type explosion (also pointed out by someone else in this thread). >> Type explosion is a concern, but frankly it is equally a problem no matter whether typing is done by convention, markers, or types. What distinguishes the options in my opinion is resilience against aliasing, for which I think explicit typing has substantial advantages. >> >>> That said, there might be some "special" components requiring conventions: >>> >>> Segment: the most important one IMO because with that field many segments can be identified as a single object and treated differently for caching purposes. (i.e. a naming convention here can be that the Segment ID must be the last name component, hence without an explicit segment identification) >>> >> Precisely. Which means all applications have to agree on a convention, or this needs to be "baked" into the architecture somehow. There are multiple ways to do this, but typed name components seem utterly straightforward, and can leverage the exact registration policy and machinery that is used to control all architectural constants defined by the protocol. >> >>> Versioning and Timestamp: there might be a case for explicitly identifying those two components as suggested in NDN-TR-22 but, IMO, too many naming conventions will complicate layer three operations, which I would keep as simple as possible. >>> >> As long as L3 can treat these as opaque and is not required to be cognizant of name component types, their existence does not complicate L3 in any way.
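Dave's claim that simple byte-wise compares can serve exact match, LPM, and anti-aliasing at once is easy to illustrate. The sketch below is illustrative only: the type codes, helper names, and 1-byte TLV layout are hypothetical, not taken from the NDN or CCNx wire specifications.

```python
# Sketch: typed name components encoded as TLV, with HYPOTHETICAL type codes.
# Byte-wise comparison of the encoded names gives exact match, a prefix test
# gives LPM, and the explicit type prevents a literal byte string from
# aliasing with a typed marker component.

T_GENERIC = 0x08   # hypothetical code for a generic (opaque) component
T_SEGMENT = 0x21   # hypothetical code for a segment-number component

def encode_component(t: int, value: bytes) -> bytes:
    assert len(value) < 253          # keep the sketch to 1-byte T and L
    return bytes([t, len(value)]) + value

def encode_name(components) -> bytes:
    return b"".join(encode_component(t, v) for t, v in components)

name_a = encode_name([(T_GENERIC, b"foo"), (T_GENERIC, b"bar"),
                      (T_SEGMENT, b"\x00")])
name_b = encode_name([(T_GENERIC, b"foo"), (T_GENERIC, b"bar")])

# LPM reduces to a prefix test on the encoded bytes:
assert name_a.startswith(name_b)

# A generic component whose *value* happens to be the segment bytes does
# not alias with the typed segment component:
aliased = encode_name([(T_GENERIC, b"foo"), (T_GENERIC, b"bar"),
                       (T_GENERIC, b"\x00")])
assert aliased != name_a
```

The forwarder never interprets the types here; it only compares bytes, which is the property Dave describes.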
>> >>> Max >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> > From mjs at cisco.com Tue Sep 16 11:33:04 2014 From: mjs at cisco.com (Mark Stapp) Date: Tue, 16 Sep 2014 14:33:04 -0400 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <54184960.8060102@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> Message-ID: <54188260.50306@cisco.com> On 9/16/14 10:29 AM, Massimo Gallo wrote: > > I think we agree on the small number of "component types". > However, if you have a small number of types, you will end up with names > containing many generic components types and few specific components > types. Due to the fact that the component type specification is an > exception in the name, I would prefer something that specify component's > type only when needed (something like UTF8 conventions but that > applications MUST use). > so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. can you say why it is that you express a preference for the "convention" with problems ? Thanks, Mark From christian.tschudin at unibas.ch Tue Sep 16 15:56:17 2014 From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch) Date: Wed, 17 Sep 2014 00:56:17 +0200 (CEST) Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: Hi Marc, a question regarding the insightful encapsulation-to-the-left view (and taking up Tai-Lin's routing question): I see that demux is useful for end nodes where the applications are sitting. But are there important cases where core routers should demux, i.e. forward to different faces, based on that handful of typed components we talk about? If not, then LPM for the forwarding can be constrained to the (encapsulated) untyped name components up to the first marker - all following bytes will not influence routing. PIT and CS are another story. best, christian On Tue, 16 Sep 2014, Marc.Mosko at parc.com wrote: > I am not sure why something being a typed name is related to routing. > Isn't routing going to be over the full TLV representation of the > name? Or do you consider the TLV "T" as separate from the name > component and not used in FIB or PIT matching? > > I think a serial version is more useful than a timestamp version. In > CCNx 1.0, we have a type for both, but we generally use the serial > version. A timestamp does not give distributed versioning, so like a > serial version it is only useful from a single publisher, and it gives > an easy way to determine the next version. It does, of course, > require that the publisher maintain state rather than rely on its > real-time clock (or NTP) for its version number. A serial version > number also allows unlimited version number generation, whereas a > quantized (e.g. millisecond) timestamp limits the number of versions > one can generate at a time without keeping state. > > As a general philosophy on named addresses, I see the hierarchical > name components as providing protocol encapsulation, essentially > encapsulating name components to the left (not all components are like > this, but some are).
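Marc's serial-versus-timestamp tradeoff can be made concrete with a small sketch. The class names are hypothetical; the millisecond quantization matches his example.

```python
# Sketch of the two versioning schemes Marc contrasts. Names and units
# here are illustrative, not part of any protocol specification.
import time

class SerialVersioner:
    """Publisher keeps a counter: unlimited versions, but stateful."""
    def __init__(self):
        self.n = 0
    def next_version(self) -> int:
        self.n += 1
        return self.n

class TimestampVersioner:
    """Stateless, but quantized: two versions generated within the same
    millisecond would collide, which a serial counter avoids."""
    def next_version(self) -> int:
        return int(time.time() * 1000)

s = SerialVersioner()
assert [s.next_version() for _ in range(3)] == [1, 2, 3]

t = TimestampVersioner()
a, b = t.next_version(), t.next_version()
# a and b may well be equal here -- exactly the rate limit Marc describes.
```

Both schemes also share the property Marc notes: each is only meaningful from a single publisher, since neither gives distributed versioning.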
For example, when you add a version component to > a name, it is a statement that a versioning protocol has encapsulated > and possibly modified the content object identified by the left name. > When a segmentation protocol is applied to a content object, it > encapsulates a name to the left. They serve a similar purpose to > header encapsulation in traditional packets. Therefore, I think that > when a protocol is encapsulating the left-name, those should be > unambiguous and explicit. > > For protocols that everyone needs to understand, like versioning or > segmenting, those should be a standardized value, and not exclusive of > other protocols. Someone might come out, for example, with a better > segmentation protocol, and that should have a different identifier than > the earlier segmentation protocol. Therefore, wherever you do your > multiplexing you need to coordinate. That's going to be either in the > TLV "T" or in the "key" of a "key=value" inside the "V". > > Marc > > On Sep 16, 2014, at 10:21 AM, Tai-Lin Chu wrote: > >> Summarize all types that people need (feel free to add some, and paste >> in your reply) >> - regular >> - segment >> - version (timestamp) >> - signature >> - key: assuming that the next regular component will be the value. The >> value is empty if it sees another key component immediately after. >> - app-specific >> >> >> However, I am not convinced that we need version, signature, and >> app-specific as typed components. Will these change how the packet routes? >> >> >> >> On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi >> wrote: >>> Hi Jeff >>> >>> Please see my proposal of MarkedComponent >>> , >>> which is a solution to eliminate ambiguity by defining a new type >>> specifically for key-value pairs.
>>> >>> Yours, Junxiao >>> >>> On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff wrote: >>>> >>>> >>>> Second, if the most important issue is eliminating ambiguity/aliasing, >>>> then why not define a new type that hints that the component can be >>>> interpreted as a key/value pair with some encoding convention? This could >>>> enable an unambiguous, short list of commonly used conventions that you've >>>> mentioned (using marker-like keys), while keeping information describing >>>> the data object in the name. It would also be very useful for applications >>>> that desire their own k/v representation for components, which Dave has >>>> argued for in other circumstances and we keep running across. It doesn't >>>> rule out use of hierarchy, and doesn't limit what an application defined >>>> keys could be. Yet, it could be ignored in forwarding (just another >>>> component) and perhaps have a still-meaningful sort order (key, then >>>> value). >>>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > From tailinchu at gmail.com Tue Sep 16 16:39:42 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Tue, 16 Sep 2014 16:39:42 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: > If not, then LPM for the forwarding can be constrained to the (encapsulated) untyped name components up to the first marker - all following bytes will not influence routing. PIT and CS is another story. I assume that it is up to and "not including" the first marker. What you just said in my opinion limits the typed component to be placed in the end of a name, and I don't think this limit will be good. 
For example, /folder1/v1/file1/v2. According to your statement, file1 will not influence routing, but I think it should. IMHO, the only one requirement of allocating new types for typed components is necessity. i.e. Will this typed component help upper layer or router to process this packet differently (perhaps more efficiently)? On Tue, Sep 16, 2014 at 3:56 PM, wrote: > Hi Marc, > > a question regarding the insightful encapsulation-to-the-left view (and > taking up Tai-Lin's routing question): > > I see that demux is useful for end nodes where the applications are sitting. > But are there important cases where core routers should demux, i.e. forward > to different faces, based on that handful set of typed components we talk > about? > > If not, then LPM for the forwarding can be constrained to the (encapsulated) > untyped name components up to the first marker - all following bytes will > not influence routing. PIT and CS is another story. > > best, christian > > > > On Tue, 16 Sep 2014, Marc.Mosko at parc.com wrote: > >> I am not sure why something being a typed name is related to routing. >> Isn?t routing going to be over the full TLV representation of the name? Or >> do you consider the TLV ?T? as separate from the name component and not used >> in FIB or PIT matching? >> >> I think a serial version is more useful than a timestamp version. In CCNx >> 1.0, we have a type for both, but we generally use the serial version. A >> timestamp does not give distributed versioning, so like a serial version it >> is only useful from a single publisher and it gives an easy way to determine >> the next version. It does, of course, require that the publisher maintain >> state rather than rely on its real-time clock (or ntp) for its version >> number. A serial version number also allows unlimited version number >> generation, whereas a quantized (e.g. milli-second) timestamp limits the >> number of versions one can generate at a time without keeping state. 
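The constraint Christian proposes, and Tai-Lin's counterexample to it, can be sketched in a few lines. The component representation and type labels below are hypothetical, purely for illustration.

```python
# Sketch of the forwarding rule under discussion: constrain LPM to the
# untyped components up to (and not including) the first typed marker.
# Tai-Lin's example /folder1/v1/file1/v2 shows what the rule throws away.

def routable_prefix(name):
    """Return the components a FIB lookup would consider: everything up
    to, and not including, the first typed (marker) component."""
    prefix = []
    for typ, value in name:
        if typ != "generic":      # hit the first marker: stop
            break
        prefix.append(value)
    return prefix

name = [("generic", "folder1"), ("version", "1"),
        ("generic", "file1"), ("version", "2")]

# Under the proposed constraint, only /folder1 influences routing;
# "file1" is cut off -- which is Tai-Lin's objection.
assert routable_prefix(name) == ["folder1"]
```

A name with no markers is unaffected, so the constraint only bites when typed components appear before untyped ones, as in Tai-Lin's example.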
>> >> As a general philosophy on named addresses, I see the hierarchical name >> components as providing protocol encapsulation, essentially encapsulating >> name components to the left (not all components are like this, but some >> are). For example, when you add a version component to a name it is a >> statement that a versioning protocol has encapsulated and possibly modified >> the content object identified by the left name. When a segmentation protocol >> is applied to a content object, it encapsulates a name to the left. They >> serve a similar purpose to header encapsulation in traditional packets. >> Therefore, I think that when a protocol is encapsulating the left-name, >> those should be unambiguous and explicit. >> >> For protocols that everyone needs to understand, like versioning or >> segmenting, those should be a standardized value, and not exclusive of other >> protocols. Someone might come out, for example, with a better segmentation >> protocol and that should have a different identifier than the earlier >> segmentation protocol. Therefore, wherever you do your multiplexing you >> need to coordinate. That?s going to be either in the TLV ?T? or in the >> ?key? of a ?key=value? inside the ?V?. >> >> Marc >> >> On Sep 16, 2014, at 10:21 AM, Tai-Lin Chu wrote: >> >>> Summarize all types that people need (feel free to add some, and paste >>> in your reply) >>> - regular >>> - segment >>> - version (timestamp) >>> - signature >>> - key: assuming that the next regular component will be value. The >>> value is empty if it sees another key component immediately after. >>> - app-specific >>> >>> >>> However, I am not convinced that we need version, signature, and >>> app-specific as typed component. Will these change how packet routes? 
>>> >>> >>> >>> On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi >>> wrote: >>>> >>>> Hi Jeff >>>> >>>> Please see my proposal of MarkedComponent >>>> >>>> , >>>> which is a solution to eliminate ambiguity by defining a new type >>>> specifically for key-value pair. >>>> >>>> Yours, Junxiao >>>> >>>> On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff >>>> wrote: >>>>> >>>>> >>>>> >>>>> Second, if the most important issue is eliminating ambiguity/aliasing, >>>>> then why not define a new type that hints that the component can be >>>>> interpreted as a key/value pair with some encoding convention? This >>>>> could >>>>> enable an unambiguous, short list of commonly used conventions that >>>>> you've >>>>> mentioned (using marker-like keys), while keeping information >>>>> describing >>>>> the data object in the name. It would also be very useful for >>>>> applications >>>>> that desire their own k/v representation for components, which Dave has >>>>> argued for in other circumstances and we keep running across. It >>>>> doesn't >>>>> rule out use of hierarchy, and doesn't limit what an application >>>>> defined >>>>> keys could be. Yet, it could be ignored in forwarding (just another >>>>> component) and perhaps have a still-meaningful sort order (key, then >>>>> value). >>>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> > From massimo.gallo at alcatel-lucent.com Wed Sep 17 03:02:26 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Wed, 17 Sep 2014 12:02:26 +0200 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <54188260.50306@cisco.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> Message-ID: <54195C32.6000609@alcatel-lucent.com> The why is simple: You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions! I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates the offset allowing you to retrieve the version, segment, etc. in the name... Max On 16/09/2014 20:33, Mark Stapp wrote: > > > On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >> I think we agree on the small number of "component types". >> However, if you have a small number of types, you will end up with names >> containing many generic components types and few specific components >> types. Due to the fact that the component type specification is an >> exception in the name, I would prefer something that specify component's >> type only when needed (something like UTF8 conventions but that >> applications MUST use). >> > > so ... I can't quite follow that. the thread has had some explanation > about why the UTF8 requirement has problems (with aliasing, e.g.) and > there's been email trying to explain that applications don't have to > use types if they don't need to. your email sounds like "I prefer the > UTF8 convention", but it doesn't say why you have that preference in > the face of the points about the problems.
can you say why it is that > you express a preference for the "convention" with problems ? > > Thanks, > Mark > From mjs at cisco.com Wed Sep 17 05:56:12 2014 From: mjs at cisco.com (Mark Stapp) Date: Wed, 17 Sep 2014 08:56:12 -0400 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <54195C32.6000609@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> Message-ID: <541984EC.1000009@cisco.com> ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. it sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. is that correct? when you say "field separator", what do you mean (since that's not a "TL" from a TLV)? -- Mark On 9/17/14 6:02 AM, Massimo Gallo wrote: > The why is simple: > > You use a lot of "generic component type" and very few "specific > component type". You are imposing types for every component in order to > handle few exceptions (segmentation, etc..). You create a rule (specify > the component's type ) to handle exceptions! > > I would prefer not to have typed components. Instead I would prefer to > have the name as simple sequence bytes with a field separator. Then, > outside the name, if you have some components that could be used at > network layer (e.g. a TLV field), you simply need something that > indicates which is the offset allowing you to retrieve the version, > segment, etc in the name... 
> > > Max > > > > > > On 16/09/2014 20:33, Mark Stapp wrote: >> >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end up with names >>> containing many generic components types and few specific components >>> types. Due to the fact that the component type specification is an >>> exception in the name, I would prefer something that specify component's >>> type only when needed (something like UTF8 conventions but that >>> applications MUST use). >>> >> >> so ... I can't quite follow that. the thread has had some explanation >> about why the UTF8 requirement has problems (with aliasing, e.g.) and >> there's been email trying to explain that applications don't have to >> use types if they don't need to. your email sounds like "I prefer the >> UTF8 convention", but it doesn't say why you have that preference in >> the face of the points about the problems. can you say why it is that >> you express a preference for the "convention" with problems ? >> >> Thanks, >> Mark >> > > . > From oran at cisco.com Wed Sep 17 06:11:08 2014 From: oran at cisco.com (Dave Oran (oran)) Date: Wed, 17 Sep 2014 13:11:08 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <54195C32.6000609@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> Message-ID: <41566E14-D244-4416-A64E-B75E9BAE3714@cisco.com> On Sep 17, 2014, at 6:02 AM, Massimo Gallo wrote: > The why is simple: > > You use a lot of "generic component type" and very few "specific component type". 
You are imposing types for every component in order to handle few exceptions (segmentation, etc..). You create a rule (specify the component's type) to handle exceptions! > > I would prefer not to have typed components. Instead I would prefer to have the name as simple sequence bytes with a field separator. Field separators are perhaps the most problematic cause of aliasing. They also break purely binary name components, since they have to be escaped. > Then, outside the name, if you have some components that could be used at network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc in the name... Maybe we're into aesthetics and not functionality, but I tend to prefer self-describing data structures rather than ones with ancillary stuff like pointers into the middle. > > > Max > > > > > > On 16/09/2014 20:33, Mark Stapp wrote: >> >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end up with names >>> containing many generic components types and few specific components >>> types. Due to the fact that the component type specification is an >>> exception in the name, I would prefer something that specify component's >>> type only when needed (something like UTF8 conventions but that >>> applications MUST use). >>> >> >> so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems.
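Dave's point about separators, escaping, and aliasing can be demonstrated in a few lines. The separator and escaping convention below are hypothetical; percent-encoding is used only as a familiar example.

```python
# Sketch of the aliasing hazard with separator-delimited flat names: a
# binary component that happens to contain the separator byte collides
# with a component boundary unless an escaping convention is imposed.

SEP = b"/"

def join_naive(components):
    return SEP.join(components)

# Two *different* names -- ["a/b"] versus ["a", "b"] -- alias to the
# same byte string:
assert join_naive([b"a/b"]) == join_naive([b"a", b"b"])

# Avoiding the collision requires escaping, which every layer that
# "peeks" at names must then implement identically:
def escape(c: bytes) -> bytes:
    return c.replace(b"%", b"%25").replace(b"/", b"%2F")

def join_escaped(components):
    return SEP.join(escape(c) for c in components)

assert join_escaped([b"a/b"]) != join_escaped([b"a", b"b"])
```

A length-prefixed (TLV) encoding avoids both problems, since component boundaries are carried by the lengths rather than by an in-band byte.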
>> >> Thanks, >> Mark >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Wed Sep 17 07:36:28 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Wed, 17 Sep 2014 14:36:28 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <54195C32.6000609@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> Message-ID: <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> I think if you require all name components to have a "key=value" format, then that is an ok system, but it seems duplicative of having the TLV type. Personally, I would then encode the "key=" piece the same way you encode the TLV "T" (i.e. the 1/3/5 system). Though it does seem rather duplicative of having the TLV type, as I said. I think having one T (call it T0) for non-tagged and one T (call it T1) for "key=value" introduces more overhead than is needed, as you now have two type systems for one value. You will need to keep the T0/T1 distinction everywhere in the code so you know if there is a "key=" embedded in the name. Programmatically, you'll probably need to sprout a new class or getter/setter for the "key=" for those types. It seems simpler to me for the TLV "T" to always be associated with a name component, so you are always working with the (T, value) pair. Marc On Sep 17, 2014, at 3:02 AM, Massimo Gallo wrote: > The why is simple: > > You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle few exceptions (segmentation, etc..).
You create a rule (specify the component's type ) to handle exceptions! > > I would prefer not to have typed components. Instead I would prefer to have the name as simple sequence bytes with a field separator. Then, outside the name, if you have some components that could be used at network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc in the name... > > > Max > > > > > > On 16/09/2014 20:33, Mark Stapp wrote: >> >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end up with names >>> containing many generic components types and few specific components >>> types. Due to the fact that the component type specification is an >>> exception in the name, I would prefer something that specify component's >>> type only when needed (something like UTF8 conventions but that >>> applications MUST use). >>> >> >> so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. can you say why it is that you express a preference for the "convention" with problems ? >> >> Thanks, >> Mark >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From shijunxiao at email.arizona.edu Wed Sep 17 07:46:54 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Wed, 17 Sep 2014 07:46:54 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> Message-ID: Hi Marc The MarkedComponent proposal < http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000085.html> is precisely a T0/T1 system: - T0=NameComponent - T1=MarkedComponent - key is encoded as a variable-length number (the same way as T) This still requires all code to distinguish between NameComponent and MarkedComponent everywhere, but we'll have exactly two types, instead of potentially many types. However, I agree that putting the key into the TLV-TYPE is better than using MarkedComponent or having a key in every component, because the processing cost isn't any different. Yours, Junxiao On Wed, Sep 17, 2014 at 7:36 AM, wrote: > I think if you require all name components to have a "key=value" format, > then that is an ok system, but it seems duplicative of having the TLV > type. Personally, I would then encode the "key=" piece the same way you > encode the TLV "T" (i.e. the 1/3/5 system). > > Though it does seem rather duplicative of having the TLV type, as I said. > I think having one T (call it T0) for non-tagged and one T (call it T1) for > "key=value" introduces more overhead than is needed, as you now have two > type systems for one value.
You will need to keep the T0/T1 distinction > everywhere in the code so you know if there is a "key=" embedded in the > name. Programmatically, you'll probably need to sprout a new class or > getter/setter for the "key=" for those types. It seems simpler to me for > the TLV "T" to always be associated with a name component, so you are > always working with the (T, value) pair. > > Marc > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Marc.Mosko at parc.com Wed Sep 17 07:47:57 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Wed, 17 Sep 2014 14:47:57 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541841E9.6010608@cisco.com> Message-ID: <88FFB87C-CB9F-4190-A5AF-92968905E6DA@parc.com> As Tai-Lin's example shows, there can always be multiple encapsulations. This is also very common for the "metadata" markers, /foo/bar/v1/bibliography/v0/s0, for example, where the left name is encapsulated by a metadata object that describes it via some second object. Christian's question was specifically about a "core router". I seriously doubt core routers would have FIB entries that go out to application names or beyond as part of the ICN routing protocol. That said, they might have some special internal or administrative routes that include such things. Routers beyond core routers will have such routes when you get to data centers or enterprises. Also, in some implementations, there is little difference between the FIB and the PIT. If I remember correctly, in the 0.x ccnd the PIT names were inserted in the same name tree as the FIB to find the LPM of a content object on the reverse path, then walk up the name tree to find all other possible matches for a returning content object.
Marc On Sep 16, 2014, at 4:39 PM, Tai-Lin Chu wrote: >> If not, then LPM for the forwarding can be constrained to the (encapsulated) untyped name components up to the first marker - all following bytes will not influence routing. PIT and CS is another story. > > I assume that it is up to and "not including" the first marker. What > you just said in my opinion limits the typed component to be placed in > the end of a name, and I don't think this limit will be good. For > example, /folder1/v1/file1/v2. According to your statement, file1 will > not influence routing, but I think it should. > > > IMHO, the only one requirement of allocating new types for typed > components is necessity. i.e. Will this typed component help upper > layer or router to process this packet differently (perhaps more > efficiently)? > > > > > > On Tue, Sep 16, 2014 at 3:56 PM, wrote: >> Hi Marc, >> >> a question regarding the insightful encapsulation-to-the-left view (and >> taking up Tai-Lin's routing question): >> >> I see that demux is useful for end nodes where the applications are sitting. >> But are there important cases where core routers should demux, i.e. forward >> to different faces, based on that handful set of typed components we talk >> about? >> >> If not, then LPM for the forwarding can be constrained to the (encapsulated) >> untyped name components up to the first marker - all following bytes will >> not influence routing. PIT and CS is another story. >> >> best, christian >> >> >> >> On Tue, 16 Sep 2014, Marc.Mosko at parc.com wrote: >> >>> I am not sure why something being a typed name is related to routing. >>> Isn't routing going to be over the full TLV representation of the name? Or >>> do you consider the TLV "T" as separate from the name component and not used >>> in FIB or PIT matching? >>> >>> I think a serial version is more useful than a timestamp version. In CCNx >>> 1.0, we have a type for both, but we generally use the serial version.
A >>> timestamp does not give distributed versioning, so like a serial version it >>> is only useful from a single publisher and it gives an easy way to determine >>> the next version. It does, of course, require that the publisher maintain >>> state rather than rely on its real-time clock (or ntp) for its version >>> number. A serial version number also allows unlimited version number >>> generation, whereas a quantized (e.g. milli-second) timestamp limits the >>> number of versions one can generate at a time without keeping state. >>> >>> As a general philosophy on named addresses, I see the hierarchical name >>> components as providing protocol encapsulation, essentially encapsulating >>> name components to the left (not all components are like this, but some >>> are). For example, when you add a version component to a name it is a >>> statement that a versioning protocol has encapsulated and possibly modified >>> the content object identified by the left name. When a segmentation protocol >>> is applied to a content object, it encapsulates a name to the left. They >>> serve a similar purpose to header encapsulation in traditional packets. >>> Therefore, I think that when a protocol is encapsulating the left-name, >>> those should be unambiguous and explicit. >>> >>> For protocols that everyone needs to understand, like versioning or >>> segmenting, those should be a standardized value, and not exclusive of other >>> protocols. Someone might come out, for example, with a better segmentation >>> protocol and that should have a different identifier than the earlier >>> segmentation protocol. Therefore, wherever you do your multiplexing you >>> need to coordinate. That's going to be either in the TLV "T" or in the >>> "key" of a "key=value" inside the "V".
>>> >>> Marc >>> >>> On Sep 16, 2014, at 10:21 AM, Tai-Lin Chu wrote: >>> >>>> Summarize all types that people need (feel free to add some, and paste >>>> in your reply) >>>> - regular >>>> - segment >>>> - version (timestamp) >>>> - signature >>>> - key: assuming that the next regular component will be value. The >>>> value is empty if it sees another key component immediately after. >>>> - app-specific >>>> >>>> >>>> However, I am not convinced that we need version, signature, and >>>> app-specific as typed component. Will these change how packet routes? >>>> >>>> >>>> >>>> On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi >>>> wrote: >>>>> >>>>> Hi Jeff >>>>> >>>>> Please see my proposal of MarkedComponent >>>>> >>>>> , >>>>> which is a solution to eliminate ambiguity by defining a new type >>>>> specifically for key-value pair. >>>>> >>>>> Yours, Junxiao >>>>> >>>>> On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff >>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> Second, if the most important issue is eliminating ambiguity/aliasing, >>>>>> then why not define a new type that hints that the component can be >>>>>> interpreted as a key/value pair with some encoding convention? This >>>>>> could >>>>>> enable an unambiguous, short list of commonly used conventions that >>>>>> you've >>>>>> mentioned (using marker-like keys), while keeping information >>>>>> describing >>>>>> the data object in the name. It would also be very useful for >>>>>> applications >>>>>> that desire their own k/v representation for components, which Dave has >>>>>> argued for in other circumstances and we keep running across. It >>>>>> doesn't >>>>>> rule out use of hierarchy, and doesn't limit what an application >>>>>> defined >>>>>> keys could be. Yet, it could be ignored in forwarding (just another >>>>>> component) and perhaps have a still-meaningful sort order (key, then >>>>>> value). 
>>>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From Marc.Mosko at parc.com Wed Sep 17 07:54:41 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Wed, 17 Sep 2014 14:54:41 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> Message-ID: Yes, I meant to be referring to the MarkedComponent proposal with my T0/T1 example, but I did not call it out specifically. In the MarkedComponent proposal, the program state per name component is (hasKey, [key], value). So, each name component becomes multi-modal (ok, bi-modal). Some have a key, some do not have a key. In the "use the TLV type for markers" proposal (need a spiffier name for that, but it's what ccnx 1.0 does), the program state is (T, value). Not multi-modal.
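[Editor's note: the two per-component program states Marc contrasts can be made concrete with a toy sketch. Field and type names here are illustrative assumptions, not taken from any NDN or CCNx codebase.]

```python
from typing import NamedTuple, Optional

# MarkedComponent proposal: components are bi-modal -- only some carry a key.
class MarkedStyle(NamedTuple):
    has_key: bool
    key: Optional[int]   # only meaningful when has_key is True
    value: bytes

# "Use the TLV type for markers" (CCNx 1.0) style: always a uniform (T, value) pair.
class TypedStyle(NamedTuple):
    t: int
    value: bytes

VERSION = 0x02  # illustrative type/key code, not a standardized value

marked = MarkedStyle(has_key=True, key=VERSION, value=b"\x01")
typed = TypedStyle(t=VERSION, value=b"\x01")

# Code handling MarkedStyle must branch on has_key everywhere;
# code handling TypedStyle always works with the same two fields.
assert marked.has_key and marked.key == typed.t
```

The point of the contrast: the bi-modal representation pushes an `if has_key` check into every piece of code that touches a component, while the uniform (T, value) pair does not.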
Marc On Sep 17, 2014, at 7:46 AM, Junxiao Shi wrote: > Hi Marc > > The MarkedComponent proposal is precisely a T0/T1 system: > T0=NameComponent > T1=MarkedComponent > key is encoded as variable length number (same way as T) > This still requires all codes to distinguish between NameComponent and MarkedComponent everywhere, but we'll have exactly two types, instead of potentially many types. > > However, I agree that putting the key into TLV-TYPE is better than using MarkedComponent or having a key in every component, because the processing cost isn't any different. > > Yours, Junxiao > > On Wed, Sep 17, 2014 at 7:36 AM, wrote: > I think if you require all name components to have a "key=value" format, then that is an ok system, but it seems duplicative of having the TLV type. Personally, I would then encode the "key=" piece the same way you encode the TLV "T" (i.e. the 1/3/5 system). > > Though it does seem rather duplicative of having the TLV type, as I said. I think having one T (call it T0) for non-tagged and one T (call it T1) for "key=value" introduces more overhead than is needed, as you now have two type systems for one value. You will need to keep the T0/T1 distinction everywhere in the code so you know if there is a "key=" embedded in the name. Programmatically, you'll probably need to sprout a new class or getter/setter for the "key=" for those types. It seems simpler to me for the TLV "T" to always be associated with a name component, so you are always working with the (T, value) pair. > > Marc > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jefft0 at remap.ucla.edu Wed Sep 17 10:10:33 2014 From: jefft0 at remap.ucla.edu (Thompson, Jeff) Date: Wed, 17 Sep 2014 17:10:33 +0000 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> Message-ID: Hi Junxiao, In your MarkedComponent proposal, would you want to make the marker code fixed-length to address the concerns explained by Marc? In other words, the desire that a shorter name component value always sorts before a longer component value, regardless of its type? (The risk is that a shorter component value may have a marker code which encodes as a long VAR-NUMBER which would make the overall TLV longer and mess up the sorting.) - Jeff T From: Junxiao Shi > Date: Wednesday, September 17, 2014 7:46 AM To: "Marc.Mosko at parc.com" > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] any comments on naming convention? Hi Marc The MarkedComponent proposal is precisely a T0/T1 system: * T0=NameComponent * T1=MarkedComponent * key is encoded as variable length number (same way as T) This still requires all codes to distinguish between NameComponent and MarkedComponent everywhere, but we'll have exactly two types, instead of potentially many types. However, I agree that putting the key into TLV-TYPE is better than using MarkedComponent or having a key in every component, because the processing cost isn't any different. Yours, Junxiao On Wed, Sep 17, 2014 at 7:36 AM, > wrote: I think if you require all name components to have a "key=value" format, then that is an ok system, but it seems duplicative of having the TLV type. Personally, I would then encode the "key=" piece the same way you encode the TLV "T" (i.e. the 1/3/5 system). Though it does seem rather duplicative of having the TLV type, as I said.
I think having one T (call it T0) for non-tagged and one T (call it T1) for "key=value" introduces more overhead than is needed, as you now have two type systems for one value. You will need to keep the T0/T1 distinction everywhere in the code so you know if there is a "key=" embedded in the name. Programmatically, you'll probably need to sprout a new class or getter/setter for the "key=" for those types. It seems simpler to me for the TLV "T" to always be associated with a name component, so you are always working with the (T, value) pair. Marc -------------- next part -------------- An HTML attachment was scrubbed... URL: From Marc.Mosko at parc.com Wed Sep 17 11:13:20 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Wed, 17 Sep 2014 18:13:20 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> Message-ID: <080F0738-7D28-4365-9115-55616B2CDCFC@parc.com> If you want to keep exclusions, and to some extent left/right-most child, I think you only have a few options. For example, how do I exclude a version component but not a binary component with the same bytes? Are you going to add a MarkedComponent option to each exclusion predicate?

1) Use one "T" then always use a (key=value) pair in the name component of every name component. This is a TLKV encoding.
2) Ditch TLV for name components, use a run-length encoding L(key=value) (LKV encoding), the exclude will include the "KV" as the "NameComponent".
3) Use #2 with the observation that it's really LTV encoding by another name.
4) Use one "T" and use the "_v/..." name component pairs.
5) Use MarkedComponents and extend the exclusion predicate to include the type.
6) Don't use exclusions (the ccnx 1.0 option)

It is not straightforward to include the type directly in the exclusion if you use multiple "T" TLV encoding with multiple types. You'd need to change the grammar of the exclusion.

Exclude ::= EXCLUDE-TYPE TLV-LENGTH Any? (NameComponent (Any)?)+
Any ::= ANY-TYPE TLV-LENGTH(=0)

Did I miss anything? Marc On Sep 17, 2014, at 10:10 AM, Thompson, Jeff wrote: > Hi Junxiao, > > In your MarkedComponent proposal, would you want to make the marker code fixed-length to address the concerns explained by Marc? In other words, the desire that a shorter name component value always sorts before a longer component value, regardless of its type? (The risk is that a shorter component value may have a marker code which encodes as a long VAR-NUMBER which would make the overall TLV longer and mess up the sorting.) > > - Jeff T > > From: Junxiao Shi > Date: Wednesday, September 17, 2014 7:46 AM > To: "Marc.Mosko at parc.com" > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] any comments on naming convention? > > Hi Marc > > The MarkedComponent proposal is precisely a T0/T1 system: > T0=NameComponent > T1=MarkedComponent > key is encoded as variable length number (same way as T) > This still requires all codes to distinguish between NameComponent and MarkedComponent everywhere, but we'll have exactly two types, instead of potentially many types. > > However, I agree that putting the key into TLV-TYPE is better than using MarkedComponent or having a key in every component, because the processing cost isn't any different. > > Yours, Junxiao > > On Wed, Sep 17, 2014 at 7:36 AM, wrote: >> I think if you require all name components to have a "key=value" format, then that is an ok system, but it seems duplicative of having the TLV type. Personally, I would then encode the "key=" piece the same way you encode the TLV "T" (i.e. the 1/3/5 system).
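[Editor's note: the "1/3/5 system" mentioned in this thread refers to the NDN-TLV VAR-NUMBER encoding, in which a number occupies 1, 3, 5, or 9 octets depending on its magnitude. A sketch of the encoder, for reference:]

```python
import struct

def encode_var_number(n: int) -> bytes:
    """NDN-TLV VAR-NUMBER: 1, 3, 5, or 9 octets depending on magnitude."""
    if n < 0xFD:
        return struct.pack("!B", n)            # 1 octet
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("!H", n)  # marker + 2 octets
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("!I", n)  # marker + 4 octets
    return b"\xff" + struct.pack("!Q", n)      # marker + 8 octets

assert encode_var_number(10) == b"\x0a"
assert encode_var_number(0xFD) == b"\xfd\x00\xfd"
assert len(encode_var_number(1 << 20)) == 5
```

This is also the source of Jeff Thompson's sorting concern upthread: a large marker code crossing an encoding-size boundary lengthens the whole TLV, so a naive length-then-bytes comparison of encoded components no longer tracks the logical ordering.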
>> >> Though it does seem rather duplicative of having the TLV type, as I said. I think having one T (call it T0) for non-tagged and one T (call it T1) for ?key=value? introduces more overhead than is needed, as you now have two type systems for one value. You will need to keep the T0/T1 distinction everywhere in the code so you know if there is a ?key=? embedded in the name. Programmatically, you?ll probably need to sprout a new class or getter/setter for the ?key=? for those types. It seems simpler to me for the TLV ?T? to always be associated with a name component, so you are always working with the (T, value) pair. >> >> Marc >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From tailinchu at gmail.com Wed Sep 17 14:57:02 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Wed, 17 Sep 2014 14:57:02 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <080F0738-7D28-4365-9115-55616B2CDCFC@parc.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <0E30789C-A55C-4B6B-871F-9B6870A9E55B@parc.com> <080F0738-7D28-4365-9115-55616B2CDCFC@parc.com> Message-ID: 1. Because we will never open up the entire tlv type space to developers, markComponent is a must for application developers to add custom types. we could make version and segment in tlv type space, but since we will have markComponent anyway, placing them into markComponent might be be a bad choice. 2. for comparison, it is fairly simple to address. It only masses up sorting if we do direct strcmp. 
We parse both key and value. Compare the key first. If the keys are the same, compare the value like a normal name component. Also, markComponent sorts less than a regular component. This means that we define another sorting scheme for markComponent. I don't think exclude will be an issue anymore. On Wed, Sep 17, 2014 at 11:13 AM, wrote: > > If you want to keep exclusions, and to some extent left/right-most child, I > think you only have a few options. For example, how do I exclude a version > component but not a binary component with the same bytes? Are you going to > add a MarkedComponent option to each exclusion predicate? > > 1) Use one "T" then always use a (key=value) pair in the name component of > every name component. This is a TLKV encoding. > 2) Ditch TLV for name components, use a run-length encoding L(key=value) > (LKV encoding), the exclude will include the "KV" as the "NameComponent". > 3) Use #2 with the observation that it's really LTV encoding by another > name. > 4) Use one "T" and use the "_v/..." name component pairs. > 5) Use MarkedComponents and extend the exclusion predicate to include the > type. > 6) Don't use exclusions (the ccnx 1.0 option) > > It is not straightforward to include the type directly in the exclusion if > you use multiple "T" TLV encoding with multiple types. You'd need to change > the grammar of the exclusion. > > Exclude ::= EXCLUDE-TYPE TLV-LENGTH Any? (NameComponent (Any)?)+ > Any ::= ANY-TYPE TLV-LENGTH(=0) > > > Did I miss anything? > > Marc > > On Sep 17, 2014, at 10:10 AM, Thompson, Jeff wrote: > > Hi Junxiao, > > In your MarkedComponent proposal, would you want to make the marker code > fixed-length to address the concerns explained by Marc? In other words, the > desire that a shorter name component value always sorts before a longer > component value, regardless of its type?
(The risk is that a shorter > component value may have a marker code which encodes as a long VAR-NUMBER > which would make the overall TLV longer and mess up the sorting.) > > - Jeff T > > From: Junxiao Shi > Date: Wednesday, September 17, 2014 7:46 AM > To: "Marc.Mosko at parc.com" > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] any comments on naming convention? > > Hi Marc > > The MarkedComponent proposal > > is precisely a T0/T1 system: > > T0=NameComponent > T1=MarkedComponent > key is encoded as variable length number (same way as T) > > This still requires all codes to distinguish between NameComponent and > MarkedComponent everywhere, but we'll have exactly two types, instead of > potentially many types. > > However, I agree that putting the key into TLV-TYPE is better than using > MarkedComponent or having a key in every component, because the processing > cost isn't any different. > > Yours, Junxiao > > On Wed, Sep 17, 2014 at 7:36 AM, wrote: >> >> I think if you require all name components to have a "key=value" format, >> then that is an ok system, but it seems duplicative of having the TLV type. >> Personally, I would then encode the "key=" piece the same way you encode the >> TLV "T" (i.e. the 1/3/5 system). >> >> Though it does seem rather duplicative of having the TLV type, as I said. >> I think having one T (call it T0) for non-tagged and one T (call it T1) for >> "key=value" introduces more overhead than is needed, as you now have two >> type systems for one value. You will need to keep the T0/T1 distinction >> everywhere in the code so you know if there is a "key=" embedded in the >> name. Programmatically, you'll probably need to sprout a new class or >> getter/setter for the "key=" for those types. It seems simpler to me for >> the TLV "T" to always be associated with a name component, so you are always >> working with the (T, value) pair.
>> >> Marc >> > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From massimo.gallo at alcatel-lucent.com Wed Sep 17 23:57:01 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Thu, 18 Sep 2014 08:57:01 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541984EC.1000009@cisco.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> Message-ID: <541A823D.4010108@alcatel-lucent.com> On 17/09/2014 14:56, Mark Stapp wrote: > ah, thanks - that's helpful. I thought you were saying "I like the > existing NDN UTF8 'convention'." I'm still not sure I understand what > you _do_ prefer, though. it sounds like you're describing an entirely > different scheme where the info that describes the name-components is > ... someplace other than _in_ the name-components. is that correct? > when you say "field separator", what do you mean (since that's not a > "TL" from a TLV)? Correct. In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name and other TLV(s) indicates the offset to use in order to retrieve special components. As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV " to do that. So now, it may be an aesthetic question but: if you do not need the entire hierarchal structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. 
With the offset structure you can directly access the first x components. Max > > -- Mark > > On 9/17/14 6:02 AM, Massimo Gallo wrote: >> The why is simple: >> >> You use a lot of "generic component type" and very few "specific >> component type". You are imposing types for every component in order to >> handle few exceptions (segmentation, etc..). You create a rule (specify >> the component's type ) to handle exceptions! >> >> I would prefer not to have typed components. Instead I would prefer to >> have the name as simple sequence bytes with a field separator. Then, >> outside the name, if you have some components that could be used at >> network layer (e.g. a TLV field), you simply need something that >> indicates which is the offset allowing you to retrieve the version, >> segment, etc in the name... >> >> >> Max >> >> >> >> >> >> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>> >>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end up with >>>> names >>>> containing many generic components types and few specific components >>>> types. Due to the fact that the component type specification is an >>>> exception in the name, I would prefer something that specify >>>> component's >>>> type only when needed (something like UTF8 conventions but that >>>> applications MUST use). >>>> >>> >>> so ... I can't quite follow that. the thread has had some explanation >>> about why the UTF8 requirement has problems (with aliasing, e.g.) and >>> there's been email trying to explain that applications don't have to >>> use types if they don't need to. your email sounds like "I prefer the >>> UTF8 convention", but it doesn't say why you have that preference in >>> the face of the points about the problems. can you say why it is that >>> you express a preference for the "convention" with problems ? >>> >>> Thanks, >>> Mark >>> >> >> .
>> > > From tailinchu at gmail.com Thu Sep 18 00:27:43 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Thu, 18 Sep 2014 00:27:43 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541A823D.4010108@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> Message-ID: > if you do not need the entire hierarchal structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you cane directly access to the firs x components. I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x offset. On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote: > > On 17/09/2014 14:56, Mark Stapp wrote: >> >> ah, thanks - that's helpful. I thought you were saying "I like the >> existing NDN UTF8 'convention'." I'm still not sure I understand what you >> _do_ prefer, though. it sounds like you're describing an entirely different >> scheme where the info that describes the name-components is ... someplace >> other than _in_ the name-components. is that correct? when you say "field >> separator", what do you mean (since that's not a "TL" from a TLV)? > > Correct. > In particular, with our name encoding, a TLV indicates the name hierarchy > with offsets in the name and other TLV(s) indicates the offset to use in > order to retrieve special components. > As for the field separator, it is something like "/". 
Aliasing is avoided as > you do not rely on field separators to parse the name; you use the "offset > TLV " to do that. > > So now, it may be an aesthetic question but: > > if you do not need the entire hierarchal structure (suppose you only want > the first x components) you can directly have it using the offsets. With the > Nested TLV structure you have to iteratively parse the first x-1 components. > With the offset structure you cane directly access to the firs x components. > > Max > > >> >> -- Mark >> >> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>> The why is simple: >>> >>> You use a lot of "generic component type" and very few "specific >>> component type". You are imposing types for every component in order to >>> handle few exceptions (segmentation, etc..). You create a rule (specify >>> the component's type ) to handle exceptions! >>> >>> I would prefer not to have typed components. Instead I would prefer to >>> have the name as simple sequence bytes with a field separator. Then, >>> outside the name, if you have some components that could be used at >>> network layer (e.g. a TLV field), you simply need something that >>> indicates which is the offset allowing you to retrieve the version, >>> segment, etc in the name... >>> >>> >>> Max >>> >>> >>> >>> >>> >>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>> >>>>> >>>>> I think we agree on the small number of "component types". >>>>> However, if you have a small number of types, you will end up with >>>>> names >>>>> containing many generic components types and few specific components >>>>> types. Due to the fact that the component type specification is an >>>>> exception in the name, I would prefer something that specify >>>>> component's >>>>> type only when needed (something like UTF8 conventions but that >>>>> applications MUST use). >>>>> >>>> >>>> so ... I can't quite follow that. 
the thread has had some explanation >>>> about why the UTF8 requirement has problems (with aliasing, e.g.) and >>>> there's been email trying to explain that applications don't have to >>>> use types if they don't need to. your email sounds like "I prefer the >>>> UTF8 convention", but it doesn't say why you have that preference in >>>> the face of the points about the problems. can you say why it is that >>>> you express a preference for the "convention" with problems ? >>>> >>>> Thanks, >>>> Mark >>>> >>> >>> . >>> >> >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From massimo.gallo at alcatel-lucent.com Thu Sep 18 00:54:58 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Thu, 18 Sep 2014 09:54:58 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> Message-ID: <541A8FD2.8010302@alcatel-lucent.com> Indeed, each component's offset must be encoded using a fixed amount of bytes, i.e.:

Type = Offsets
Length = 10 Bytes
Value = Offset1 (1 byte), Offset2 (1 byte), ...

You might also imagine having an "Offset_2byte" type if your name is too long. Max On 18/09/2014 09:27, Tai-Lin Chu wrote: >> if you do not need the entire hierarchal structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you cane directly access to the firs x components. > I don't get it.
What you described only works if the "offset" is > encoded in fixed bytes. With varNum, you will still need to parse x-1 > offsets to get to the x offset. > > > > On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > wrote: >> On 17/09/2014 14:56, Mark Stapp wrote: >>> ah, thanks - that's helpful. I thought you were saying "I like the >>> existing NDN UTF8 'convention'." I'm still not sure I understand what you >>> _do_ prefer, though. it sounds like you're describing an entirely different >>> scheme where the info that describes the name-components is ... someplace >>> other than _in_ the name-components. is that correct? when you say "field >>> separator", what do you mean (since that's not a "TL" from a TLV)? >> Correct. >> In particular, with our name encoding, a TLV indicates the name hierarchy >> with offsets in the name and other TLV(s) indicates the offset to use in >> order to retrieve special components. >> As for the field separator, it is something like "/". Aliasing is avoided as >> you do not rely on field separators to parse the name; you use the "offset >> TLV " to do that. >> >> So now, it may be an aesthetic question but: >> >> if you do not need the entire hierarchal structure (suppose you only want >> the first x components) you can directly have it using the offsets. With the >> Nested TLV structure you have to iteratively parse the first x-1 components. >> With the offset structure you cane directly access to the firs x components. >> >> Max >> >> >>> -- Mark >>> >>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few "specific >>>> component type". You are imposing types for every component in order to >>>> handle few exceptions (segmentation, etc..). You create a rule (specify >>>> the component's type ) to handle exceptions! >>>> >>>> I would prefer not to have typed components. 
Instead I would prefer to >>>> have the name as simple sequence bytes with a field separator. Then, >>>> outside the name, if you have some components that could be used at >>>> network layer (e.g. a TLV field), you simply need something that >>>> indicates which is the offset allowing you to retrieve the version, >>>> segment, etc in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>> >>>>> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>> >>>>>> I think we agree on the small number of "component types". >>>>>> However, if you have a small number of types, you will end up with >>>>>> names >>>>>> containing many generic components types and few specific components >>>>>> types. Due to the fact that the component type specification is an >>>>>> exception in the name, I would prefer something that specify >>>>>> component's >>>>>> type only when needed (something like UTF8 conventions but that >>>>>> applications MUST use). >>>>>> >>>>> so ... I can't quite follow that. the thread has had some explanation >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) and >>>>> there's been email trying to explain that applications don't have to >>>>> use types if they don't need to. your email sounds like "I prefer the >>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>> the face of the points about the problems. can you say why it is that >>>>> you express a preference for the "convention" with problems ? >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>> . 
>>>> >>> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From blee at AIT.IE Thu Sep 18 01:01:19 2014 From: blee at AIT.IE (Brian Lee) Date: Thu, 18 Sep 2014 08:01:19 +0000 Subject: [Ndn-interest] Hhy Message-ID: <0D29F65C-2D14-4CF6-BDB2-D14A7A17509D@AIT.IE> Sent from my iPhone From Ignacio.Solis at parc.com Thu Sep 18 14:02:26 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Thu, 18 Sep 2014 21:02:26 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541A8FD2.8010302@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> Message-ID: Does this make that much difference? If you want to parse the first 5 components, one way to do it is: Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name. OR Start reading the name, (find size + move) 5 times.
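The two parsing strategies just contrasted can be sketched side by side. The 1-byte type and 1-byte length fields below are an assumption made to keep the example small, not the actual NDN TLV wire format, and the helper names are invented for illustration.

```python
# Sketch: extracting the first x components of a TLV-encoded name,
# either by walking the nested TLVs or by direct index lookup.

T_COMPONENT = 8  # toy type code, an assumption for this sketch

def encode_name(components):
    """Encode each component as [type, length, value] with 1-byte T and L."""
    out = bytearray()
    for c in components:
        out += bytes([T_COMPONENT, len(c)]) + c
    return bytes(out)

def prefix_by_walking(buf, x):
    """Sequential parse: (find size + move) x times."""
    pos = 0
    for _ in range(x):
        length = buf[pos + 1]  # read L of this component's TLV
        pos += 2 + length      # skip T, L, and V
    return buf[:pos]

def prefix_by_index(buf, index, x):
    """Index lookup: entry x-1 holds the end offset of component x."""
    return buf[:index[x - 1]]

components = [b"ndn", b"example", b"video", b"v3", b"s0"]
wire = encode_name(components)

# Precompute the index: end offset of each component on the wire.
index = []
pos = 0
for c in components:
    pos += 2 + len(c)
    index.append(pos)
```

Both paths return the same prefix bytes; the debate above is only about which one is cheaper on real hardware.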
How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case. In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access. Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it. This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index). If you have numbers that show that the index is faster I would like to see under what conditions and architectural assumptions. Nacho (I may have misinterpreted your description so feel free to correct me if I'm wrong.) -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/18/14, 12:54 AM, "Massimo Gallo" wrote: >Indeed each component's offset must be encoded using a fixed amount of >bytes: > >i.e., >Type = Offsets >Length = 10 Bytes >Value = Offset1(1byte), Offset2(1byte), ...
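The fixed-width "Offsets" TLV just described (one byte per offset, so the k-th offset is read by direct indexing rather than by parsing k-1 variable-length numbers) might be sketched like this. The type code 0x20 and the helper functions are invented for illustration, not part of any published encoding.

```python
# Sketch of a fixed-width "Offsets" TLV: 1-byte type, 1-byte count,
# then one byte per component offset (illustrative layout only).

T_OFFSETS = 0x20  # arbitrary type code, an assumption for this sketch

def encode_offsets_tlv(offsets):
    assert all(o < 256 for o in offsets), "1-byte offsets only"
    return bytes([T_OFFSETS, len(offsets)]) + bytes(offsets)

def read_offset(tlv, k):
    """Direct access to the k-th offset: no iteration over earlier entries."""
    assert tlv[0] == T_OFFSETS and k < tlv[1]
    return tlv[2 + k]

tlv = encode_offsets_tlv([0, 4, 12, 18, 21])
```

A longer name would need the "Offset_2byte" variant mentioned above, trading two bytes per entry for the same constant-time lookup.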
> >You may also imagine to have an "Offset_2byte" type if your name is too >long. > >Max > >On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> if you do not need the entire hierarchical structure (suppose you only >>>want the first x components) you can directly have it using the >>>offsets. With the Nested TLV structure you have to iteratively parse >>>the first x-1 components. With the offset structure you can directly >>>access the first x components. >> I don't get it. What you described only works if the "offset" is >> encoded in fixed bytes. With varNum, you will still need to parse x-1 >> offsets to get to the x offset. >> >> >> >> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> wrote: >>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>you >>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>different >>>> scheme where the info that describes the name-components is ... >>>>someplace >>>> other than _in_ the name-components. is that correct? when you say >>>>"field >>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>> Correct. >>> In particular, with our name encoding, a TLV indicates the name >>>hierarchy >>> with offsets in the name and other TLV(s) indicate the offset to use >>>in >>> order to retrieve special components. >>> As for the field separator, it is something like "/". Aliasing is >>>avoided as >>> you do not rely on field separators to parse the name; you use the >>>"offset >>> TLV" to do that. >>> >>> So now, it may be an aesthetic question but: >>> >>> if you do not need the entire hierarchical structure (suppose you only >>>want >>> the first x components) you can directly have it using the offsets. >>>With the >>> Nested TLV structure you have to iteratively parse the first x-1 >>>components.
>>> With the offset structure you can directly access the first x >>>components. >>> >>> Max >>> >>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>> The why is simple: >>>>> >>>>> You use a lot of "generic component type" and very few "specific >>>>> component type". You are imposing types for every component in order >>>>>to >>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>(specify >>>>> the component's type) to handle exceptions! >>>>> >>>>> I would prefer not to have typed components. Instead I would prefer >>>>>to >>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>> outside the name, if you have some components that could be used at >>>>> network layer (e.g. a TLV field), you simply need something that >>>>> indicates which is the offset allowing you to retrieve the version, >>>>> segment, etc. in the name... >>>>> >>>>> >>>>> Max >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>> >>>>>> >>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> I think we agree on the small number of "component types". >>>>>>> However, if you have a small number of types, you will end up with >>>>>>> names >>>>>>> containing many generic component types and few specific >>>>>>>component >>>>>>> types. Due to the fact that the component type specification is an >>>>>>> exception in the name, I would prefer something that specifies the >>>>>>> component's >>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>> applications MUST use). >>>>>>> >>>>>> so ... I can't quite follow that. the thread has had some >>>>>>explanation >>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>and >>>>>> there's been email trying to explain that applications don't have to >>>>>> use types if they don't need to.
your email sounds like "I prefer >>>>>>the >>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>> the face of the points about the problems. can you say why it is >>>>>>that >>>>>> you express a preference for the "convention" with problems? >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>> . >>>>> >>>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Thu Sep 18 14:15:21 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Thu, 18 Sep 2014 21:15:21 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> Message-ID: <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> In fact, the index in separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too. Marc On Sep 18, 2014, at 2:02 PM, wrote: > Does this make that much difference?
> > If you want to parse the first 5 components, one way to do it is: > > Read the index, find entry 5, then read in that many bytes from the start > offset of the beginning of the name. > OR > Start reading the name, (find size + move) 5 times. > > How much speed are you getting from one to the other? You seem to imply > that the first one is faster. I don't think this is the case. > > In the first one you'll probably have to get the cache line for the index, > then all the required cache lines for the first 5 components. For the > second, you'll have to get all the cache lines for the first 5 components. > Given an assumption that a cache miss is way more expensive than > evaluating a number and computing an addition, you might find that the > performance of the index is actually slower than the performance of the > direct access. > > Granted, there is a case where you don't access the name at all, for > example, if you just get the offsets and then send the offsets as > parameters to another processor/GPU/NPU/etc. In this case you may see a > gain IF there are more cache line misses in reading the name than in > reading the index. So, if the regular part of the name that you're > parsing is bigger than the cache line (64 bytes?) and the name is to be > processed by a different processor, then you might see some performance > gain in using the index, but in all other circumstances I bet this is not > the case. I may be wrong, haven't actually tested it. > > This is all to say, I don't think we should be designing the protocol with > only one architecture in mind. (The architecture of sending the name to a > different processor than the index). > > If you have numbers that show that the index is faster I would like to see > under what conditions and architectural assumptions. > > Nacho > > (I may have misinterpreted your description so feel free to correct me if > I'm wrong.)
> > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/18/14, 12:54 AM, "Massimo Gallo" > wrote: > >> Indeed each component's offset must be encoded using a fixed amount of >> bytes: >> >> i.e., >> Type = Offsets >> Length = 10 Bytes >> Value = Offset1(1byte), Offset2(1byte), ... >> >> You may also imagine to have an "Offset_2byte" type if your name is too >> long. >> >> Max >> >> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> if you do not need the entire hierarchical structure (suppose you only >>>> want the first x components) you can directly have it using the >>>> offsets. With the Nested TLV structure you have to iteratively parse >>>> the first x-1 components. With the offset structure you can directly >>>> access the first x components. >>> I don't get it. What you described only works if the "offset" is >>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>> offsets to get to the x offset. >>> >>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>> you >>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>> different >>>>> scheme where the info that describes the name-components is ... >>>>> someplace >>>>> other than _in_ the name-components. is that correct? when you say >>>>> "field >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the name >>>> hierarchy >>>> with offsets in the name and other TLV(s) indicate the offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/".
Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you use the >>>> "offset >>>> TLV" to do that. >>>> >>>> So now, it may be an aesthetic question but: >>>> >>>> if you do not need the entire hierarchical structure (suppose you only >>>> want >>>> the first x components) you can directly have it using the offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first x-1 >>>> components. >>>> With the offset structure you can directly access the first x >>>> components. >>>> >>>> Max >>>> >>>> >>>>> -- Mark >>>>> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>> The why is simple: >>>>>> >>>>>> You use a lot of "generic component type" and very few "specific >>>>>> component type". You are imposing types for every component in order >>>>>> to >>>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>> (specify >>>>>> the component's type) to handle exceptions! >>>>>> >>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>> to >>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>> outside the name, if you have some components that could be used at >>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>> segment, etc. in the name... >>>>>> >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>> >>>>>>> >>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> I think we agree on the small number of "component types". >>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>> names >>>>>>>> containing many generic component types and few specific >>>>>>>> component >>>>>>>> types.
Due to the fact that the component type specification is an >>>>>>>> exception in the name, I would prefer something that specifies the >>>>>>>> component's >>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>> applications MUST use). >>>>>>>> >>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>> explanation >>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>> and >>>>>>> there's been email trying to explain that applications don't have to >>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>> the >>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>> the face of the points about the problems. can you say why it is >>>>>>> that >>>>>>> you express a preference for the "convention" with problems? >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>> . >>>>>> >>>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From felix at rabe.io Thu Sep 18 15:27:05 2014 From: felix at rabe.io (Felix Rabe) Date: Fri, 19 Sep 2014 00:27:05 +0200 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> Message-ID: <541B5C39.2050502@rabe.io> Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments? I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see. On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: > In fact, the index in separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too. > > Marc > > On Sep 18, 2014, at 2:02 PM, wrote: > >> Does this make that much difference? >> >> If you want to parse the first 5 components, one way to do it is: >> >> Read the index, find entry 5, then read in that many bytes from the start >> offset of the beginning of the name. >> OR >> Start reading the name, (find size + move) 5 times. >> >> How much speed are you getting from one to the other? You seem to imply >> that the first one is faster. I don't think this is the case.
>> >> In the first one you'll probably have to get the cache line for the index, >> then all the required cache lines for the first 5 components. For the >> second, you'll have to get all the cache lines for the first 5 components. >> Given an assumption that a cache miss is way more expensive than >> evaluating a number and computing an addition, you might find that the >> performance of the index is actually slower than the performance of the >> direct access. >> >> Granted, there is a case where you don't access the name at all, for >> example, if you just get the offsets and then send the offsets as >> parameters to another processor/GPU/NPU/etc. In this case you may see a >> gain IF there are more cache line misses in reading the name than in >> reading the index. So, if the regular part of the name that you're >> parsing is bigger than the cache line (64 bytes?) and the name is to be >> processed by a different processor, then you might see some performance >> gain in using the index, but in all other circumstances I bet this is not >> the case. I may be wrong, haven't actually tested it. >> >> This is all to say, I don't think we should be designing the protocol with >> only one architecture in mind. (The architecture of sending the name to a >> different processor than the index). >> >> If you have numbers that show that the index is faster I would like to see >> under what conditions and architectural assumptions. >> >> Nacho >> >> (I may have misinterpreted your description so feel free to correct me if >> I'm wrong.) >> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> On 9/18/14, 12:54 AM, "Massimo Gallo" >> wrote: >> >>> Indeed each component's offset must be encoded using a fixed amount of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ...
>>> >>> You may also imagine to have an "Offset_2byte" type if your name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>> if you do not need the entire hierarchical structure (suppose you only >>>>> want the first x components) you can directly have it using the >>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>> the first x-1 components. With the offset structure you can directly >>>>> access the first x components. >>>> I don't get it. What you described only works if the "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>> offsets to get to the x offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>> you >>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>> different >>>>>> scheme where the info that describes the name-components is ... >>>>>> someplace >>>>>> other than _in_ the name-components. is that correct? when you say >>>>>> "field >>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>> Correct. >>>>> In particular, with our name encoding, a TLV indicates the name >>>>> hierarchy >>>>> with offsets in the name and other TLV(s) indicate the offset to use >>>>> in >>>>> order to retrieve special components. >>>>> As for the field separator, it is something like "/". Aliasing is >>>>> avoided as >>>>> you do not rely on field separators to parse the name; you use the >>>>> "offset >>>>> TLV" to do that. >>>>> >>>>> So now, it may be an aesthetic question but: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose you only >>>>> want >>>>> the first x components) you can directly have it using the offsets.
>>>>> With the >>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>> components. >>>>> With the offset structure you can directly access the first x >>>>> components. >>>>> >>>>> Max >>>>> >>>>> >>>>>> -- Mark >>>>>> >>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>> The why is simple: >>>>>>> >>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>> component type". You are imposing types for every component in order >>>>>>> to >>>>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>>> (specify >>>>>>> the component's type) to handle exceptions! >>>>>>> >>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>> to >>>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>>> outside the name, if you have some components that could be used at >>>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>> segment, etc. in the name... >>>>>>> >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>> >>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>> names >>>>>>>>> containing many generic component types and few specific >>>>>>>>> component >>>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>>> exception in the name, I would prefer something that specifies the >>>>>>>>> component's >>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>> applications MUST use). >>>>>>>>> >>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>> explanation >>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.)
>>>>>>>> and >>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>> the >>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>> that >>>>>>>> you express a preference for the "convention" with problems? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>> . >>>>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Thu Sep 18 15:37:01 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Thu, 18 Sep 2014 22:37:01 +0000 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <541B5C39.2050502@rabe.io> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> Message-ID: IMO it's not good to equate URLs to NDN/CCN network names. In the case of CCN, we carry data in the interest payload field, so for us names smaller than 96 bytes are possible and probable. Longer names may exist for some manifests; we don't have a clear notion of the frequency (or ratio) of those yet. In terms of URL sizes, there is quite a bit of info on that from some of the Cisco work and the datasets out there. (See http://www.ietf.org/proceedings/89/slides/slides-89-icnrg-10.pdf ) Note that the size of the complete name may not be an issue in some forwarding situations because you're looking for routing prefixes. Finally, if the size of the complete name matters, because you're trying to hash/match the whole thing, then the index is not going to help you. Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/18/14, 3:27 PM, "Felix Rabe" > wrote: Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments? I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see.
On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: In fact, the index in separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too. Marc On Sep 18, 2014, at 2:02 PM, wrote: Does this make that much difference? If you want to parse the first 5 components, one way to do it is: Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name. OR Start reading the name, (find size + move) 5 times. How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case. In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access. Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case.
I may be wrong, haven?t actually tested it. This is all to say, I don?t think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index). If you have numbers that show that the index is faster I would like to see under what conditions and architectural assumptions. Nacho (I may have misinterpreted your description so feel free to correct me if I?m wrong.) -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/18/14, 12:54 AM, "Massimo Gallo" wrote: Indeed each components' offset must be encoded using a fixed amount of bytes: i.e., Type = Offsets Length = 10 Bytes Value = Offset1(1byte), Offset2(1byte), ... You may also imagine to have a "Offset_2byte" type if your name is too long. Max On 18/09/2014 09:27, Tai-Lin Chu wrote: if you do not need the entire hierarchal structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you cane directly access to the firs x components. I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x offset. On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote: On 17/09/2014 14:56, Mark Stapp wrote: ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. it sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. is that correct? when you say "field separator", what do you mean (since that's not a "TL" from a TLV)? Correct. 
In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name and other TLV(s) indicate the offset to use in order to retrieve special components. As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that. So now, it may be an aesthetic question but: if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components. Max -- Mark On 9/17/14 6:02 AM, Massimo Gallo wrote: The why is simple: You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions! I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates the offset allowing you to retrieve the version, segment, etc. in the name... Max On 16/09/2014 20:33, Mark Stapp wrote: On 9/16/14 10:29 AM, Massimo Gallo wrote: I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like UTF8 conventions but that applications MUST use). so ... I can't quite follow that.
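Massimo's fixed-width "Offsets" TLV (Type = Offsets, Length = 10 Bytes, Value = Offset1(1byte), Offset2(1byte), ...) can be sketched as follows. The type number 0x20 and the 1-byte component length prefix are invented here for illustration — they are not from any NDN specification:

```python
# Hypothetical encoder for the fixed-width "Offsets" TLV described in the
# thread: one 1-byte offset per component boundary, carried outside the
# name, so the first x components are reachable without parsing x-1
# predecessors. The type number below is made up for this sketch.

OFFSETS_TYPE = 0x20   # invented type number, for illustration only

def build_offsets_tlv(components: list[bytes]) -> bytes:
    """Return a TLV whose value is a 1-byte offset per component start."""
    offsets, pos = [], 0
    for comp in components:
        offsets.append(pos)
        pos += 1 + len(comp)          # assumed 1-byte length prefix + value
    if pos > 0xFF:
        # This is the case Massimo flags: a longer name would need an
        # "Offset_2byte" type instead of 1-byte offsets.
        raise ValueError("name too long for 1-byte offsets")
    value = bytes(offsets)
    return bytes([OFFSETS_TYPE, len(value)]) + value
```

Because every offset is a fixed width, entry x can be read directly — which is exactly why Tai-Lin objects that variable-length (varNum) offsets would reintroduce the sequential parse.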
the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. can you say why it is that you express a preference for the "convention" with problems ? Thanks, Mark . _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Thu Sep 18 16:41:48 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Thu, 18 Sep 2014 16:41:48 -0700 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <541B5C39.2050502@rabe.io> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> Message-ID: We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once an ndn app becomes popular, a better chip will be designed for ndn. I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches: 1. we should not define a naming convention at all 2. typed component: use tlv type space and add a handful of types 3. marked component: introduce only one more type and add additional marker space Also everybody thinks that the current utf8 marker naming convention needs to be revised. On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: > Would that chip be suitable, i.e. can we expect most names to fit in (the > magnitude of) 96 bytes? What length are names usually in current NDN > experiments? > > I guess wide deployment could make for even longer names. Related: Many URLs > I encounter nowadays easily don't fit within two 80-column text lines, and > NDN will have to carry more information than URLs, as far as I see. > > > On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: > > In fact, the index in a separate TLV will be slower on some architectures, > like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, > then any subsequent memory is accessed only as two adjacent 32-byte blocks > (there can be at most 5 blocks available at any one time). If you need to > switch between arrays, it would be very expensive.
If you have to read past > the name to get to the 2nd array, then read it, then back up to get to the > name, it will be pretty expensive too. > > Marc > > On Sep 18, 2014, at 2:02 PM, > wrote: > > Does this make that much difference? > > If you want to parse the first 5 components, one way to do it is: > > Read the index, find entry 5, then read in that many bytes from the start > offset of the beginning of the name. > OR > Start reading the name, (find size + move) 5 times. > > How much speed are you getting from one to the other? You seem to imply > that the first one is faster. I don't think this is the case. > > In the first one you'll probably have to get the cache line for the index, > then all the required cache lines for the first 5 components. For the > second, you'll have to get all the cache lines for the first 5 components. > Given an assumption that a cache miss is way more expensive than > evaluating a number and computing an addition, you might find that the > performance of the index is actually slower than the performance of the > direct access. > > Granted, there is a case where you don't access the name at all, for > example, if you just get the offsets and then send the offsets as > parameters to another processor/GPU/NPU/etc. In this case you may see a > gain IF there are more cache line misses in reading the name than in > reading the index. So, if the regular part of the name that you're > parsing is bigger than the cache line (64 bytes?) and the name is to be > processed by a different processor, then you might see some performance > gain in using the index, but in all other circumstances I bet this is not > the case. I may be wrong, haven't actually tested it. > > This is all to say, I don't think we should be designing the protocol with > only one architecture in mind. (The architecture of sending the name to a > different processor than the index).
> > If you have numbers that show that the index is faster I would like to see > under what conditions and architectural assumptions. > > Nacho > > (I may have misinterpreted your description so feel free to correct me if > I?m wrong.) > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/18/14, 12:54 AM, "Massimo Gallo" > wrote: > > Indeed each components' offset must be encoded using a fixed amount of > bytes: > > i.e., > Type = Offsets > Length = 10 Bytes > Value = Offset1(1byte), Offset2(1byte), ... > > You may also imagine to have a "Offset_2byte" type if your name is too > long. > > Max > > On 18/09/2014 09:27, Tai-Lin Chu wrote: > > if you do not need the entire hierarchal structure (suppose you only > want the first x components) you can directly have it using the > offsets. With the Nested TLV structure you have to iteratively parse > the first x-1 components. With the offset structure you cane directly > access to the firs x components. > > I don't get it. What you described only works if the "offset" is > encoded in fixed bytes. With varNum, you will still need to parse x-1 > offsets to get to the x offset. > > > > On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > wrote: > > On 17/09/2014 14:56, Mark Stapp wrote: > > ah, thanks - that's helpful. I thought you were saying "I like the > existing NDN UTF8 'convention'." I'm still not sure I understand what > you > _do_ prefer, though. it sounds like you're describing an entirely > different > scheme where the info that describes the name-components is ... > someplace > other than _in_ the name-components. is that correct? when you say > "field > separator", what do you mean (since that's not a "TL" from a TLV)? > > Correct. 
> In particular, with our name encoding, a TLV indicates the name > hierarchy > with offsets in the name and other TLV(s) indicates the offset to use > in > order to retrieve special components. > As for the field separator, it is something like "/". Aliasing is > avoided as > you do not rely on field separators to parse the name; you use the > "offset > TLV " to do that. > > So now, it may be an aesthetic question but: > > if you do not need the entire hierarchal structure (suppose you only > want > the first x components) you can directly have it using the offsets. > With the > Nested TLV structure you have to iteratively parse the first x-1 > components. > With the offset structure you cane directly access to the firs x > components. > > Max > > > -- Mark > > On 9/17/14 6:02 AM, Massimo Gallo wrote: > > The why is simple: > > You use a lot of "generic component type" and very few "specific > component type". You are imposing types for every component in order > to > handle few exceptions (segmentation, etc..). You create a rule > (specify > the component's type ) to handle exceptions! > > I would prefer not to have typed components. Instead I would prefer > to > have the name as simple sequence bytes with a field separator. Then, > outside the name, if you have some components that could be used at > network layer (e.g. a TLV field), you simply need something that > indicates which is the offset allowing you to retrieve the version, > segment, etc in the name... > > > Max > > > > > > On 16/09/2014 20:33, Mark Stapp wrote: > > On 9/16/14 10:29 AM, Massimo Gallo wrote: > > I think we agree on the small number of "component types". > However, if you have a small number of types, you will end up with > names > containing many generic components types and few specific > components > types. 
Due to the fact that the component type specification is an > exception in the name, I would prefer something that specify > component's > type only when needed (something like UTF8 conventions but that > applications MUST use). > > so ... I can't quite follow that. the thread has had some > explanation > about why the UTF8 requirement has problems (with aliasing, e.g.) > and > there's been email trying to explain that applications don't have to > use types if they don't need to. your email sounds like "I prefer > the > UTF8 convention", but it doesn't say why you have that preference in > the face of the points about the problems. can you say why it is > that > you express a preference for the "convention" with problems ? > > Thanks, > Mark > > . > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From oran at cisco.com Thu Sep 18 17:20:24 2014 From: oran at cisco.com (Dave Oran (oran)) Date: Fri, 19 Sep 2014 00:20:24 +0000 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> Message-ID: <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: > We should not look at a certain chip nowadays and want ndn to perform > well on it. It should be the other way around: once an ndn app becomes > popular, a better chip will be designed for ndn. > While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: a) clock rates are not getting (much) faster b) memory accesses are getting (relatively) more expensive c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. > I feel the discussion today and yesterday has been off-topic. Now I > see that there are 3 approaches: > 1.
we should not define a naming convention at all > 2. typed component: use tlv type space and add a handful of types > 3. marked component: introduce only one more type and add additional > marker space > I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. > Also everybody thinks that the current utf8 marker naming convention > needs to be revised. > > > > On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >> Would that chip be suitable, i.e. can we expect most names to fit in (the >> magnitude of) 96 bytes? What length are names usually in current NDN >> experiments? >> >> I guess wide deployment could make for even longer names. Related: Many URLs >> I encounter nowadays easily don't fit within two 80-column text lines, and >> NDN will have to carry more information than URLs, as far as I see. >> >> >> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >> >> In fact, the index in a separate TLV will be slower on some architectures, >> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, >> then any subsequent memory is accessed only as two adjacent 32-byte blocks >> (there can be at most 5 blocks available at any one time). If you need to >> switch between arrays, it would be very expensive. If you have to read past >> the name to get to the 2nd array, then read it, then back up to get to the >> name, it will be pretty expensive too. >> >> Marc >> >> On Sep 18, 2014, at 2:02 PM, >> wrote: >> >> Does this make that much difference? >> >> If you want to parse the first 5 components.
One way to do it is: >> >> Read the index, find entry 5, then read in that many bytes from the start >> offset of the beginning of the name. >> OR >> Start reading name, (find size + move ) 5 times. >> >> How much speed are you getting from one to the other? You seem to imply >> that the first one is faster. I don?t think this is the case. >> >> In the first one you?ll probably have to get the cache line for the index, >> then all the required cache lines for the first 5 components. For the >> second, you?ll have to get all the cache lines for the first 5 components. >> Given an assumption that a cache miss is way more expensive than >> evaluating a number and computing an addition, you might find that the >> performance of the index is actually slower than the performance of the >> direct access. >> >> Granted, there is a case where you don?t access the name at all, for >> example, if you just get the offsets and then send the offsets as >> parameters to another processor/GPU/NPU/etc. In this case you may see a >> gain IF there are more cache line misses in reading the name than in >> reading the index. So, if the regular part of the name that you?re >> parsing is bigger than the cache line (64 bytes?) and the name is to be >> processed by a different processor, then your might see some performance >> gain in using the index, but in all other circumstances I bet this is not >> the case. I may be wrong, haven?t actually tested it. >> >> This is all to say, I don?t think we should be designing the protocol with >> only one architecture in mind. (The architecture of sending the name to a >> different processor than the index). >> >> If you have numbers that show that the index is faster I would like to see >> under what conditions and architectural assumptions. >> >> Nacho >> >> (I may have misinterpreted your description so feel free to correct me if >> I?m wrong.) 
>> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> On 9/18/14, 12:54 AM, "Massimo Gallo" >> wrote: >> >> Indeed each components' offset must be encoded using a fixed amount of >> bytes: >> >> i.e., >> Type = Offsets >> Length = 10 Bytes >> Value = Offset1(1byte), Offset2(1byte), ... >> >> You may also imagine to have a "Offset_2byte" type if your name is too >> long. >> >> Max >> >> On 18/09/2014 09:27, Tai-Lin Chu wrote: >> >> if you do not need the entire hierarchal structure (suppose you only >> want the first x components) you can directly have it using the >> offsets. With the Nested TLV structure you have to iteratively parse >> the first x-1 components. With the offset structure you cane directly >> access to the firs x components. >> >> I don't get it. What you described only works if the "offset" is >> encoded in fixed bytes. With varNum, you will still need to parse x-1 >> offsets to get to the x offset. >> >> >> >> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> wrote: >> >> On 17/09/2014 14:56, Mark Stapp wrote: >> >> ah, thanks - that's helpful. I thought you were saying "I like the >> existing NDN UTF8 'convention'." I'm still not sure I understand what >> you >> _do_ prefer, though. it sounds like you're describing an entirely >> different >> scheme where the info that describes the name-components is ... >> someplace >> other than _in_ the name-components. is that correct? when you say >> "field >> separator", what do you mean (since that's not a "TL" from a TLV)? >> >> Correct. >> In particular, with our name encoding, a TLV indicates the name >> hierarchy >> with offsets in the name and other TLV(s) indicates the offset to use >> in >> order to retrieve special components. >> As for the field separator, it is something like "/". 
Aliasing is >> avoided as >> you do not rely on field separators to parse the name; you use the >> "offset >> TLV " to do that. >> >> So now, it may be an aesthetic question but: >> >> if you do not need the entire hierarchal structure (suppose you only >> want >> the first x components) you can directly have it using the offsets. >> With the >> Nested TLV structure you have to iteratively parse the first x-1 >> components. >> With the offset structure you cane directly access to the firs x >> components. >> >> Max >> >> >> -- Mark >> >> On 9/17/14 6:02 AM, Massimo Gallo wrote: >> >> The why is simple: >> >> You use a lot of "generic component type" and very few "specific >> component type". You are imposing types for every component in order >> to >> handle few exceptions (segmentation, etc..). You create a rule >> (specify >> the component's type ) to handle exceptions! >> >> I would prefer not to have typed components. Instead I would prefer >> to >> have the name as simple sequence bytes with a field separator. Then, >> outside the name, if you have some components that could be used at >> network layer (e.g. a TLV field), you simply need something that >> indicates which is the offset allowing you to retrieve the version, >> segment, etc in the name... >> >> >> Max >> >> >> >> >> >> On 16/09/2014 20:33, Mark Stapp wrote: >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >> I think we agree on the small number of "component types". >> However, if you have a small number of types, you will end up with >> names >> containing many generic components types and few specific >> components >> types. Due to the fact that the component type specification is an >> exception in the name, I would prefer something that specify >> component's >> type only when needed (something like UTF8 conventions but that >> applications MUST use). >> >> so ... I can't quite follow that. 
the thread has had some >> explanation >> about why the UTF8 requirement has problems (with aliasing, e.g.) >> and >> there's been email trying to explain that applications don't have to >> use types if they don't need to. your email sounds like "I prefer >> the >> UTF8 convention", but it doesn't say why you have that preference in >> the face of the points about the problems. can you say why it is >> that >> you express a preference for the "convention" with problems ? >> >> Thanks, >> Mark >> >> . >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From tailinchu at gmail.com Thu Sep 18 18:09:22 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Thu, 18 Sep 2014 18:09:22 -0700 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> Message-ID: > I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. Could you share it with us? >While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design we could design for performance, but I think there will be a turning point when the slower design starts to become "fast enough". Do you think there will be some design of ndn that will *never* have performance improvement? On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: > > On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: > >> We should not look at a certain chip nowadays and want ndn to perform >> well on it. It should be the other way around: once an ndn app becomes >> popular, a better chip will be designed for ndn. >> > While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: > a) clock rates are not getting (much) faster > b) memory accesses are getting (relatively) more expensive > c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. > > The fact is, IP *did* have some serious performance flaws in its design.
We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: > 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere > 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. > > I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. > >> I feel the discussion today and yesterday has been off-topic. Now I >> see that there are 3 approaches: >> 1. we should not define a naming convention at all >> 2. typed component: use tlv type space and add a handful of types >> 3. marked component: introduce only one more type and add additional >> marker space >> > I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. > > It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. > >> Also everybody thinks that the current utf8 marker naming convention >> needs to be revised. >> >> >> >> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>> magnitude of) 96 bytes? What length are names usually in current NDN >>> experiments? >>> >>> I guess wide deployment could make for even longer names. Related: Many URLs >>> I encounter nowadays easily don't fit within two 80-column text lines, and >>> NDN will have to carry more information than URLs, as far as I see.
>>> >>> >>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>> >>> In fact, the index in separate TLV will be slower on some architectures, >>> like the ezChip NP4. The NP4 can hold the fist 96 frame bytes in memory, >>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>> (there can be at most 5 blocks available at any one time). If you need to >>> switch between arrays, it would be very expensive. If you have to read past >>> the name to get to the 2nd array, then read it, then backup to get to the >>> name, it will be pretty expensive too. >>> >>> Marc >>> >>> On Sep 18, 2014, at 2:02 PM, >>> wrote: >>> >>> Does this make that much difference? >>> >>> If you want to parse the first 5 components. One way to do it is: >>> >>> Read the index, find entry 5, then read in that many bytes from the start >>> offset of the beginning of the name. >>> OR >>> Start reading name, (find size + move ) 5 times. >>> >>> How much speed are you getting from one to the other? You seem to imply >>> that the first one is faster. I don?t think this is the case. >>> >>> In the first one you?ll probably have to get the cache line for the index, >>> then all the required cache lines for the first 5 components. For the >>> second, you?ll have to get all the cache lines for the first 5 components. >>> Given an assumption that a cache miss is way more expensive than >>> evaluating a number and computing an addition, you might find that the >>> performance of the index is actually slower than the performance of the >>> direct access. >>> >>> Granted, there is a case where you don?t access the name at all, for >>> example, if you just get the offsets and then send the offsets as >>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>> gain IF there are more cache line misses in reading the name than in >>> reading the index. So, if the regular part of the name that you?re >>> parsing is bigger than the cache line (64 bytes?) 
and the name is to be >>> processed by a different processor, then your might see some performance >>> gain in using the index, but in all other circumstances I bet this is not >>> the case. I may be wrong, haven?t actually tested it. >>> >>> This is all to say, I don?t think we should be designing the protocol with >>> only one architecture in mind. (The architecture of sending the name to a >>> different processor than the index). >>> >>> If you have numbers that show that the index is faster I would like to see >>> under what conditions and architectural assumptions. >>> >>> Nacho >>> >>> (I may have misinterpreted your description so feel free to correct me if >>> I?m wrong.) >>> >>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>> wrote: >>> >>> Indeed each components' offset must be encoded using a fixed amount of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>> You may also imagine to have a "Offset_2byte" type if your name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> >>> if you do not need the entire hierarchal structure (suppose you only >>> want the first x components) you can directly have it using the >>> offsets. With the Nested TLV structure you have to iteratively parse >>> the first x-1 components. With the offset structure you cane directly >>> access to the firs x components. >>> >>> I don't get it. What you described only works if the "offset" is >>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>> offsets to get to the x offset. >>> >>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>> >>> On 17/09/2014 14:56, Mark Stapp wrote: >>> >>> ah, thanks - that's helpful. 
I thought you were saying "I like the >>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>> you >>> _do_ prefer, though. it sounds like you're describing an entirely >>> different >>> scheme where the info that describes the name-components is ... >>> someplace >>> other than _in_ the name-components. is that correct? when you say >>> "field >>> separator", what do you mean (since that's not a "TL" from a TLV)? >>> >>> Correct. >>> In particular, with our name encoding, a TLV indicates the name >>> hierarchy >>> with offsets in the name and other TLV(s) indicates the offset to use >>> in >>> order to retrieve special components. >>> As for the field separator, it is something like "/". Aliasing is >>> avoided as >>> you do not rely on field separators to parse the name; you use the >>> "offset >>> TLV " to do that. >>> >>> So now, it may be an aesthetic question but: >>> >>> if you do not need the entire hierarchal structure (suppose you only >>> want >>> the first x components) you can directly have it using the offsets. >>> With the >>> Nested TLV structure you have to iteratively parse the first x-1 >>> components. >>> With the offset structure you cane directly access to the firs x >>> components. >>> >>> Max >>> >>> >>> -- Mark >>> >>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>> The why is simple: >>> >>> You use a lot of "generic component type" and very few "specific >>> component type". You are imposing types for every component in order >>> to >>> handle few exceptions (segmentation, etc..). You create a rule >>> (specify >>> the component's type ) to handle exceptions! >>> >>> I would prefer not to have typed components. Instead I would prefer >>> to >>> have the name as simple sequence bytes with a field separator. Then, >>> outside the name, if you have some components that could be used at >>> network layer (e.g. 
a TLV field), you simply need something that >>> indicates which is the offset allowing you to retrieve the version, >>> segment, etc in the name... >>> >>> >>> Max >>> >>> >>> >>> >>> >>> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end up with >>> names >>> containing many generic components types and few specific >>> components >>> types. Due to the fact that the component type specification is an >>> exception in the name, I would prefer something that specify >>> component's >>> type only when needed (something like UTF8 conventions but that >>> applications MUST use). >>> >>> so ... I can't quite follow that. the thread has had some >>> explanation >>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>> and >>> there's been email trying to explain that applications don't have to >>> use types if they don't need to. your email sounds like "I prefer >>> the >>> UTF8 convention", but it doesn't say why you have that preference in >>> the face of the points about the problems. can you say why it is >>> that >>> you express a preference for the "convention" with problems ? >>> >>> Thanks, >>> Mark >>> >>> . 
>>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From massimo.gallo at alcatel-lucent.com Fri Sep 19 04:00:56 2014 From: massimo.gallo at alcatel-lucent.com (Massimo Gallo) Date: Fri, 19 Sep 2014 13:00:56 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> Message-ID: <541C0CE8.40902@alcatel-lucent.com> Hi Nacho, I agree that the protocol design should not be done with one single architecture in mind. 
Indeed, different architectures may have different cache line sizes, memory latency, etc... So I think the performance of the two name encodings is really architecture and implementation dependent. For example: Let's say you need first the name up to the 5th component, then the name up to the 4th component: - In the first way you described, to get the name up to the 4th component the second time, you already have the needed data in the L1/L2 cache. - In the second way you may also have all you need in memory (the name can be longer with T and L in the middle) but you need to iterate again to get the name up to the 4th component. Now, in this case you can improve your performance by computing hash values while you parse the name. However, you are spending time computing hash values (very computationally expensive) that you may not need. So based on the implementation you can have better performance with one or the other. Max On 18/09/2014 23:02, Ignacio.Solis at parc.com wrote: > Does this make that much difference? > > If you want to parse the first 5 components. One way to do it is: > > Read the index, find entry 5, then read in that many bytes from the start > offset of the beginning of the name. > OR > Start reading name, (find size + move ) 5 times. > > How much speed are you getting from one to the other? You seem to imply > that the first one is faster. I don't think this is the case. > > In the first one you'll probably have to get the cache line for the index, > then all the required cache lines for the first 5 components. For the > second, you'll have to get all the cache lines for the first 5 components. > Given an assumption that a cache miss is way more expensive than > evaluating a number and computing an addition, you might find that the > performance of the index is actually slower than the performance of the > direct access.
> > Granted, there is a case where you don't access the name at all, for > example, if you just get the offsets and then send the offsets as > parameters to another processor/GPU/NPU/etc. In this case you may see a > gain IF there are more cache line misses in reading the name than in > reading the index. So, if the regular part of the name that you're > parsing is bigger than the cache line (64 bytes?) and the name is to be > processed by a different processor, then you might see some performance > gain in using the index, but in all other circumstances I bet this is not > the case. I may be wrong, haven't actually tested it. > > This is all to say, I don't think we should be designing the protocol with > only one architecture in mind. (The architecture of sending the name to a > different processor than the index). > > If you have numbers that show that the index is faster I would like to see > under what conditions and architectural assumptions. > > Nacho > > (I may have misinterpreted your description so feel free to correct me if > I'm wrong.) > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/18/14, 12:54 AM, "Massimo Gallo" > wrote: > >> Indeed each component's offset must be encoded using a fixed amount of >> bytes: >> >> i.e., >> Type = Offsets >> Length = 10 Bytes >> Value = Offset1(1byte), Offset2(1byte), ... >> >> You may also imagine having an "Offset_2byte" type if your name is too >> long. >> >> Max >> >> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> if you do not need the entire hierarchical structure (suppose you only >>>> want the first x components) you can directly have it using the >>>> offsets. With the Nested TLV structure you have to iteratively parse >>>> the first x-1 components. With the offset structure you can directly >>>> access the first x components. >>> I don't get it.
What you described only works if the "offset" is >>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>> offsets to get to the x offset. >>> >>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>> you >>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>> different >>>>> scheme where the info that describes the name-components is ... >>>>> someplace >>>>> other than _in_ the name-components. is that correct? when you say >>>>> "field >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the name >>>> hierarchy >>>> with offsets in the name and other TLV(s) indicates the offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/". Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you use the >>>> "offset >>>> TLV " to do that. >>>> >>>> So now, it may be an aesthetic question but: >>>> >>>> if you do not need the entire hierarchal structure (suppose you only >>>> want >>>> the first x components) you can directly have it using the offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first x-1 >>>> components. >>>> With the offset structure you cane directly access to the firs x >>>> components. >>>> >>>> Max >>>> >>>> >>>>> -- Mark >>>>> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>> The why is simple: >>>>>> >>>>>> You use a lot of "generic component type" and very few "specific >>>>>> component type". You are imposing types for every component in order >>>>>> to >>>>>> handle few exceptions (segmentation, etc..). 
You create a rule >>>>>> (specify >>>>>> the component's type ) to handle exceptions! >>>>>> >>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>> to >>>>>> have the name as simple sequence bytes with a field separator. Then, >>>>>> outside the name, if you have some components that could be used at >>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>> segment, etc in the name... >>>>>> >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>> >>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>> I think we agree on the small number of "component types". >>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>> names >>>>>>>> containing many generic components types and few specific >>>>>>>> components >>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>> exception in the name, I would prefer something that specify >>>>>>>> component's >>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>> applications MUST use). >>>>>>>> >>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>> explanation >>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>> and >>>>>>> there's been email trying to explain that applications don't have to >>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>> the >>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>> the face of the points about the problems. can you say why it is >>>>>>> that >>>>>>> you express a preference for the "convention" with problems ? >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>> . 
>>>>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From felix at rabe.io Fri Sep 19 10:57:22 2014 From: felix at rabe.io (Felix Rabe) Date: Fri, 19 Sep 2014 19:57:22 +0200 Subject: [Ndn-interest] CloudFlare announces keyless SSL Message-ID: <541C6E82.9050002@rabe.io> Hi list I'm sure you will (and please do) tell me "we've considered this already, there's 12 research papers from 1980 talking about this", but I still find it interesting to throw in here for discussion: http://blog.cloudflare.com/announcing-keyless-ssl-all-the-benefits-of-cloudflare-without-having-to-turn-over-your-private-ssl-keys/ HN: https://news.ycombinator.com/item?id=8334933 tl;dr (aka what I find interesting): Being a CDN, CloudFlare now implements a scheme where they proxy SSL connections without having direct access to the private SSL key themselves. Now, as far as I understand (and please correct me), NDN does not protect the transport but the individual packet. For static (or cacheable) content, NDN provides the caching, and DoS attacks (another main advantage of CloudFlare) are mitigated by ignoring unsolicited traffic. So NDN is like "Internet with CloudFlare built-in". (I'll post a related question in a separate email, to keep the topic in a certain boundary.) 
- Felix From felix at rabe.io Fri Sep 19 11:01:34 2014 From: felix at rabe.io (Felix Rabe) Date: Fri, 19 Sep 2014 20:01:34 +0200 Subject: [Ndn-interest] Security: Privacy of Interest names Message-ID: <541C6F7E.7000302@rabe.io> Hi list Someone at NDNcomm, maybe it was Steven Dale, raised this issue that I wonder about as well: Using SSL, an eavesdropper can see that I connect to the bank and the amount of information that I exchange, but he cannot see *what* information I access. Whereas with NDN, to provide at least as much privacy, the Interest names would need to be encrypted as well. Is this assumption correct? If yes, what mechanism does NDN provide for privacy of names in Interests? If no, enlighten me please :) - Felix From thecodemaiden at gmail.com Fri Sep 19 11:12:41 2014 From: thecodemaiden at gmail.com (Adeola Bannis) Date: Fri, 19 Sep 2014 11:12:41 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types Message-ID: Hello all, I am proposing to add an HMAC type, using SHA256 as the hash function, to the signature types defined at http://named-data.net/doc/NDN-TLV/current/signature.html. This will enable communication with symmetric keys, which reduces the signing and verification load on resource-constrained devices. The proposal is attached. Please review it and reply with any comments or suggestions. Thanks, Adeola -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: TLV_spec_HMAC_SHA256.docx Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document Size: 103049 bytes Desc: not available URL: From gts at ics.uci.EDU Fri Sep 19 11:48:50 2014 From: gts at ics.uci.EDU (GTS) Date: Fri, 19 Sep 2014 11:48:50 -0700 Subject: [Ndn-interest] Security: Privacy of Interest names In-Reply-To: <541C6F7E.7000302@rabe.io> References: <541C6F7E.7000302@rabe.io> Message-ID: <541C7A92.3030604@ics.uci.edu> Felix: it is indeed the case that, with cleartext interests, more information is leaked in an NDN interaction than in its IP counterpart. However, it is quite trivial to encrypt an arbitrary number of name suffix components. Assuming that Alice (customer) and her bank (Wells Fargo) share a key (see ** below), Alice can issue interests with names, such as: \ndn\com\usa\wells-fargo\california\ENCRYPTED-GLOP The cleartext prefix is rout-able. This kind of an interest leaks no more than an IP packet sent by Alice to california.wells-fargo.com with contents of ENCRYPTED-GLOP. In fact, in NDN/CCNx, it leaks less information since IP also leaks the source. One intuitive use-case for encrypted name suffixes is to implement something like a VPN. E.g., a wells-fargo VPN border router would receive the above interest and decrypt ENCRYPTED-GLOP to produce a name that might be rout-able within the wells-fargo private network, e.g.: \ndn\wells-fargo-private\orange-county\laguna-beach\retail\Alice\withdrawal\etc. This way, both the bank's internal structure and Alice's request details are concealed from eavesdroppers. **How Alice and her bank come to share a key is a separate issue. Alice might begin by issuing an interest for the content corresponding to the bank's public key. Sort of like an SSL Client Hello. The reply would be similar to SSL Server Hello. 
The rest is left as a homework exercise :-) Cheers, Gene ====================== Gene Tsudik Chancellor's Professor of Computer Science University of California, Irvine On 9/19/14, 11:01 AM, Felix Rabe wrote: > Hi list > > Someone at NDNcomm, maybe it was Steven Dale, raised this issue that I > wonder about as well: > > Using SSL, an eavesdropper can see that I connect to the bank and the > amount of information that I exchange, but he cannot see *what* > information I access. > > Whereas with NDN, to provide at least as much privacy, the Interest > names would need to be encrypted as well. Is this assumption correct? > If yes, what mechanism does NDN provide for privacy of names in > Interests? If no, enlighten me please :) > > - Felix > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From felix at rabe.io Fri Sep 19 12:01:18 2014 From: felix at rabe.io (Felix Rabe) Date: Fri, 19 Sep 2014 21:01:18 +0200 Subject: [Ndn-interest] Fwd: Re: Security: Privacy of Interest names In-Reply-To: References: Message-ID: <541C7D7E.2020001@rabe.io> -------- Forwarded Message -------- Subject: Re: [Ndn-interest] Security: Privacy of Interest names Date: Fri, 19 Sep 2014 11:27:10 -0700 From: Alex Horn To: Felix Rabe Whereas with NDN, to provide at least as much privacy, the Interest names would need to be encrypted as well. Is this assumption correct? If yes, what mechanism does NDN provide for privacy of names in Interests? for privacy, applications can encrypt names. NDN-CCL doesn't currently provide encryption schemes, yet NDN protocol itself allows for this by allowing any value as name components. 
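[Editorial sketch, not code from the thread.] The structural point made above can be illustrated in a few lines: the routable prefix stays in cleartext while everything after a chosen split point travels as a single opaque component. The XOR "cipher" below is a toy stand-in for a real encryption scheme and provides no actual security; the names and the `conceal_suffix`/`reveal_suffix` helpers are hypothetical:

```python
# Toy keyed transform (SHA-256 counter-mode keystream XOR). A placeholder
# for a real AEAD scheme -- do NOT use this for actual privacy.
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    out = bytearray()
    for block in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + block.to_bytes(4, "big")).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out += bytes(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)

def conceal_suffix(name: list, split: int, key: bytes) -> list:
    """/prefix.../ENCRYPTED-GLOP: components after `split` become one
    opaque component. (Toy: assumes no component contains b"/".)"""
    plain = b"/".join(name[split:])
    return name[:split] + [keystream_xor(key, plain)]

def reveal_suffix(name: list, key: bytes) -> list:
    """What a border router sharing the key would do."""
    return name[:-1] + keystream_xor(key, name[-1]).split(b"/")

key = b"shared-with-the-bank"
full = [b"ndn", b"com", b"usa", b"wells-fargo", b"california",
        b"retail", b"alice", b"withdrawal"]
sealed = conceal_suffix(full, 5, key)       # routable prefix stays cleartext
assert sealed[:5] == full[:5]
assert reveal_suffix(sealed, key) == full   # border router recovers the suffix
```

An eavesdropper sees only the routable prefix plus an opaque blob, which is the property Gene's VPN example relies on.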
I'm sure there are more examples; meanwhile, some writing on name encryption: http://named-data.net/publications/cache_privacy-icdcs13/ http://named-data.net/publications/2013ccr-privacy/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.tschudin at unibas.ch Fri Sep 19 12:47:09 2014 From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch) Date: Fri, 19 Sep 2014 21:47:09 +0200 (CEST) Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <88FFB87C-CB9F-4190-A5AF-92968905E6DA@parc.com> References: <541841E9.6010608@cisco.com> <88FFB87C-CB9F-4190-A5AF-92968905E6DA@parc.com> Message-ID: Marc, thanks for that example and the routing comment, which reinforces my suspicion that we are trying to do too much, in too specific ways. The writeup below is probably out of scope for the naming convention and encoding debate, but might serve as a background for the goals of these marked components. In a nutshell: serializing marked components has built-in assumptions which might not hold in all cases, so alternatives should be considered, also encoding-wise. I believe that in the forwarding architecture we should distinguish the "name-under-routing" (high-speed) from the "name-in-the-interest-or-data-object" (subject to slow path treatment), which are currently collapsed. Analysis: I write the semantics of your name /foo/bar/v1/bibliography/v0/s0 in the usual object-oriented way (this is encapsulate-to-the-left) as ((((/foo/bar).getVersion(1)).getMetadata(bibliography)).getVersion(0)).getSegment(0) The problem is (looking from a named-function perspective) that each of these function invocations can happen anywhere, not necessarily where the /foo/bar data is. Your Metadata example is insightful, because metadata really is a new object.
If a computing center stores thumbnails on a different server than bibliography or legal copyright notices or the concerned object itself, and if you want to route on the metadata parameter ("thumbnail") instead of "bibliography" (or the name of the underlying object), you are in trouble: with LPM on the serialized name, this is not possible. Or you would have to inject metadata routes for every possible version of the /foo/bar:

FIB:
/foo/bar/v --> rightFace
/foo/bar/v0/bibliography --> leftFace
/foo/bar/v1/bibliography --> leftFace
/foo/bar/v2/bibliography --> leftFace
...

because with LPM, you cannot have something like:

/foo/bar/v --> rightFace
/foo/bar/v*/bibliography --> leftFace

To continue the analysis, here is a writeup of the semantics in a more explicit form than the OO-syntax-sugar-way from above suggests:

name    how it is produced
#1 = /foo/bar
#2 = (#1.getVersion) (#1, 1)
#3 = (#2.getMetadata) (#2, bibliography)
#4 = (#3.getVersion) (#3, 0)
#5 = (#4.getSegment) (#4, 0)

Reading example: The getMetadata() function is specific to the version 1 instance of the /foo/bar object - this function neither has to reside at the same place as the object itself nor does it have to produce/reference an object that resides at the same place. Note that three "parameters" are involved: the function name, the object name, and the 'bibliography' value. In order to produce intermediate result #4, you have to - locate the function (its name is "#2.getMetadata") - locate the data ("#2") - find an execution place, to which you also provide the 'bibliography' value Potentially, all three of these places are different. But even if you claim that typically they are not, the result can be at another place: The location of result #4 in the metadata case could be different for 'thumbnails' than 'copyright' or 'bibliography': in fact, it could be anywhere on the planet (and was computed by the getMetadata function - there is no way to infer it from the elements in the original name).
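[Editorial sketch, not from the thread.] The FIB problem above can be made concrete with a toy component-wise longest-prefix match: the table can only branch on exact leading components, so there is no place for a /foo/bar/v*/bibliography rule, and each version needs its own explicit route. The face names are hypothetical:

```python
# Toy component-wise longest-prefix match over a FIB with no wildcards.
def lpm(fib: dict, name: tuple) -> str:
    """Return the face of the longest FIB prefix matching `name`."""
    best, face = -1, None
    for prefix, f in fib.items():
        n = len(prefix)
        if n <= len(name) and name[:n] == prefix and n > best:
            best, face = n, f
    return face

fib = {
    ("foo", "bar"): "rightFace",
    # No way to say ("foo", "bar", ANY, "bibliography") -- enumerate instead:
    ("foo", "bar", "v0", "bibliography"): "leftFace",
    ("foo", "bar", "v1", "bibliography"): "leftFace",
}
assert lpm(fib, ("foo", "bar", "v1", "s0")) == "rightFace"
assert lpm(fib, ("foo", "bar", "v1", "bibliography", "v0")) == "leftFace"
assert lpm(fib, ("foo", "bar", "v7", "bibliography")) == "rightFace"  # falls back
```

The last assertion is exactly the failure mode described: an unenumerated version silently falls back to the default route instead of reaching the metadata server.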
So this comes back to my suspicion that every marker location potentially is a routing inflection point. If you intend to do routing decisions on the flat (oo-way) encoding, and because only one /foo/bar name has been given, you restrict that retrieving and processing can and will happen only at a single place or in marker sequence. The problem really is that the names of intermediate results may point to other locations but are never spelled out, while in general we have a name-rewriting situation. Design options This leads to different "solutions": 1) Impose a "locality doctrine": the stem name MUST be enough to locate that single place where the object, the referenced functions, and any other object-related data reside. 2) Ban "getMetadata()" and any other marker that potentially could lead to other locations than implied by the stem name (which cannot be captured by LPM). The getVersion() is probably safe because of the contextualization (a new chapter version will be the new context of an even newer section in the name tree, hence LPM is fine). 3) Separate 'name-under-routing' from 'name-in-the-message': It can be that some markers "rewrite" the routing target, hence we need a way of keeping the signed message intact but informing the forwarding that another name should be routed on. The use of the fixed header comes to mind: fixedHeaderMsg { . version, TTL, etc . name-under-routing = point to name in msg . optional headers . msg = interest(/foo/bar/v1/bibliography/v0/s0) } --> fixedHeaderMsg { . version, TTL, etc . name-under-routing = point to name in optional headers . optional headers name-under-routing = /foo/bibliography-owner/v0/bar/v1/s0 .
msg = interest(/foo/bar/v1/bibliography/v0/s0) } Or, for NDN, wrapping the message in another one: msg = interest( nameMangling( /foo/bibliography-owner/v0/bar/v1/s0, /foo/bar/v1/bibliography/v0/s0 ) ) Of course, rewriting/repointing is what named-functions need in order to work on a request by letting the network decide on the flow of processing through different locations. It seems to me that markers, especially getMetadata(), were already a step into named-function land. 4) If one feels uncomfortable with this rewriting at header level, the solution is to return a redirection back to the requestor, which would have to relaunch the query with the name it learned. This means that the packet format imposes edge-only processing, which could have easily been solved at the lower routing level:

-> interest(/foo/bar/v1/bibliography/v0/s0)
<- data("please visit /foo/bibliography-owner/v0/bar/v1/s0")
-> interest(/foo/bibliography-owner/v0/bar/v1/s0)
<- data(finally arrives and burned 20'000 miles)

Thanks, christian On Wed, 17 Sep 2014, Marc.Mosko at parc.com wrote: > As Tai-Lin's example shows, there can always be multiple > encapsulations. This is also very common for the "metadata" markers, > /foo/bar/v1/bibliography/v0/s0, for example, where > the left name is encapsulated by a metadata object that describes it > via some second object. > > Christian's question was specifically about a "core router". I > seriously doubt core routers would have FIB entries that go out to > application names or beyond as part of the ICN routing protocol. > That said, they might have some special internal or administrative > routes that include such things. > > Routers, beyond core routers, will have such routes when you get to > data centers or enterprises. > > Also, in some implementations, there is little difference between the > FIB and the PIT.
If I remember correctly, in the 0.x ccnd PIT names > were inserted in the same name tree as the FIB to find the LPM of a > content object on the reverse path, then walk up the name tree to find > all other possible matches for a returning content object. > > Marc > > On Sep 16, 2014, at 4:39 PM, Tai-Lin Chu wrote: > >>> If not, then LPM for the forwarding can be constrained to the (encapsulated) untyped name components up to the first marker - all following bytes will not influence routing. PIT and CS is another story. >> >> I assume that it is up to and "not including" the first marker. What >> you just said in my opinion limits the typed component to be placed at >> the end of a name, and I don't think this limit will be good. For >> example, /folder1/v1/file1/v2. According to your statement, file1 will >> not influence routing, but I think it should. >> >> >> IMHO, the only requirement for allocating new types for typed >> components is necessity. i.e. Will this typed component help the upper >> layer or router to process this packet differently (perhaps more >> efficiently)? >> >> >> >> >> >> On Tue, Sep 16, 2014 at 3:56 PM, wrote: >>> Hi Marc, >>> >>> a question regarding the insightful encapsulation-to-the-left view (and >>> taking up Tai-Lin's routing question): >>> >>> I see that demux is useful for end nodes where the applications are sitting. >>> But are there important cases where core routers should demux, i.e. forward >>> to different faces, based on that handful of typed components we talk >>> about? >>> >>> If not, then LPM for the forwarding can be constrained to the (encapsulated) >>> untyped name components up to the first marker - all following bytes will >>> not influence routing. PIT and CS is another story. >>> >>> best, christian >>> >>> >>> >>> On Tue, 16 Sep 2014, Marc.Mosko at parc.com wrote: >>> >>>> I am not sure why something being a typed name is related to routing.
>>>> Isn't routing going to be over the full TLV representation of the name? Or >>>> do you consider the TLV "T" as separate from the name component and not used >>>> in FIB or PIT matching? >>>> >>>> I think a serial version is more useful than a timestamp version. In CCNx >>>> 1.0, we have a type for both, but we generally use the serial version. A >>>> timestamp does not give distributed versioning, so like a serial version it >>>> is only useful from a single publisher and it gives an easy way to determine >>>> the next version. It does, of course, require that the publisher maintain >>>> state rather than rely on its real-time clock (or NTP) for its version >>>> number. A serial version number also allows unlimited version number >>>> generation, whereas a quantized (e.g. millisecond) timestamp limits the >>>> number of versions one can generate at a time without keeping state. >>>> >>>> As a general philosophy on named addresses, I see the hierarchical name >>>> components as providing protocol encapsulation, essentially encapsulating >>>> name components to the left (not all components are like this, but some >>>> are). For example, when you add a version component to a name it is a >>>> statement that a versioning protocol has encapsulated and possibly modified >>>> the content object identified by the left name. When a segmentation protocol >>>> is applied to a content object, it encapsulates a name to the left. They >>>> serve a similar purpose to header encapsulation in traditional packets. >>>> Therefore, I think that when a protocol is encapsulating the left-name, >>>> those should be unambiguous and explicit. >>>> >>>> For protocols that everyone needs to understand, like versioning or >>>> segmenting, those should be a standardized value, and not exclusive of other >>>> protocols. Someone might come out, for example, with a better segmentation >>>> protocol and that should have a different identifier than the earlier >>>> segmentation protocol.
Therefore, wherever you do your multiplexing you >>>> need to coordinate. That's going to be either in the TLV "T" or in the >>>> "key" of a "key=value" inside the "V". >>>> >>>> Marc >>>> >>>> On Sep 16, 2014, at 10:21 AM, Tai-Lin Chu wrote: >>>> >>>>> Summarize all types that people need (feel free to add some, and paste >>>>> in your reply) >>>>> - regular >>>>> - segment >>>>> - version (timestamp) >>>>> - signature >>>>> - key: assuming that the next regular component will be the value. The >>>>> value is empty if it sees another key component immediately after. >>>>> - app-specific >>>>> >>>>> >>>>> However, I am not convinced that we need version, signature, and >>>>> app-specific as typed components. Will these change how the packet routes? >>>>> >>>>> >>>>> >>>>> On Tue, Sep 16, 2014 at 8:47 AM, Junxiao Shi >>>>> wrote: >>>>>> >>>>>> Hi Jeff >>>>>> >>>>>> Please see my proposal of MarkedComponent >>>>>> >>>>>> , >>>>>> which is a solution to eliminate ambiguity by defining a new type >>>>>> specifically for key-value pairs. >>>>>> >>>>>> Yours, Junxiao >>>>>> >>>>>> On Tue, Sep 16, 2014 at 8:18 AM, Burke, Jeff >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Second, if the most important issue is eliminating ambiguity/aliasing, >>>>>>> then why not define a new type that hints that the component can be >>>>>>> interpreted as a key/value pair with some encoding convention? This >>>>>>> could >>>>>>> enable an unambiguous, short list of commonly used conventions that >>>>>>> you've >>>>>>> mentioned (using marker-like keys), while keeping information >>>>>>> describing >>>>>>> the data object in the name. It would also be very useful for >>>>>>> applications >>>>>>> that desire their own k/v representation for components, which Dave has >>>>>>> argued for in other circumstances and we keep running across. It >>>>>>> doesn't >>>>>>> rule out use of hierarchy, and doesn't limit what >>>>>>> application-defined >>>>>>> keys could be.
Yet, it could be ignored in forwarding (just another >>>>>>> component) and perhaps have a still-meaningful sort order (key, then >>>>>>> value). >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> >>> > > From Ignacio.Solis at parc.com Fri Sep 19 13:21:41 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Fri, 19 Sep 2014 20:21:41 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <541C0CE8.40902@alcatel-lucent.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <541C0CE8.40902@alcatel-lucent.com> Message-ID: Let me see if I parse your example: For /a/b/c/d/e You want to compute hash(/a/b/c/d/e) and then compute hash(/a/b/c/d/). 
Case A (index):
Step 1 - Read index
Step 2 - find 5th component termination
Step 3 - Read memory for /a/b/c/d/e
Step 4 - Compute hash of /a/b/c/d/e
Step 5 - Read index (cached)
Step 6 - find 4th component
Step 7 - Read memory for /a/b/c/d (cached)
Step 8 - Compute hash of /a/b/c/d

Case B1 (no index):
Step 1a - Read component /a, parse
Step 1b - Read component /b, parse
Step 1c - Read component /c, parse
Step 1d - Read component /d, parse
Step 1e - Read component /e, parse
Step 2 - compute hash of /a/b/c/d/e
Step 3a - Read component /a, parse (cached)
Step 3b - Read component /b, parse (cached)
Step 3c - Read component /c, parse (cached)
Step 3d - Read component /d, parse (cached)
Step 4 - compute hash of /a/b/c/d

Case B2 (no index, compute hash along the way):
Step 1 - Read component /a, parse
Step 2 - compute hash of /a
Step 3 - Read component /b, parse
Step 4 - compute hash of /a/b
Step 5 - Read component /c, parse
Step 6 - compute hash of /a/b/c
Step 7 - Read component /d, parse
Step 8 - compute hash of /a/b/c/d
Step 9 - Read component /e, parse
Step 10 - compute hash of /a/b/c/d/e

Your argument is that Case A might be faster because Case B1 needs to reparse and Case B2 needs to recompute the hash. In Case B1, the reparse is actually really fast if the name is in cache. Hitting the L1 cache costs about 4 cycles (in something like the Core i7 processors), and the parsing and add are basically single-cycle instructions. An L2 cache miss is at least 40 cycles in the best of conditions (hitting an unshared L3 cache line) and much more if it has to go to RAM. In the case of B2, this may not be that high of a cost depending on the hashing algorithm. After all, the algorithm has to go through all the bytes. For small names that don't span multiple cache lines this won't matter: everything will be small, everything will be fast. For large names where you go through a lot of memory, you want to optimize the access to that memory.
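Case B2's one-pass prefix hashing can be sketched with an incremental hash state. This is an illustrative Python sketch only: it concatenates raw component bytes, whereas a real NDN implementation would hash the TLV-encoded name, and `/a`...`/e` are just the components from the example above.

```python
import hashlib

def prefix_hashes(components):
    """Compute the hash of every name prefix in a single pass (Case B2).

    A single incremental SHA-256 state is fed component by component;
    snapshotting the state after each update yields the digests of
    /a, /a/b, ... without re-reading earlier components.
    """
    h = hashlib.sha256()
    digests = []
    for comp in components:
        h.update(comp)
        # copy() snapshots the running state so we can finalize a
        # prefix digest without disturbing the ongoing computation
        digests.append(h.copy().hexdigest())
    return digests

hashes = prefix_hashes([b"a", b"b", b"c", b"d", b"e"])
# hashes[4] covers /a/b/c/d/e, hashes[3] covers /a/b/c/d
```

The point of the snapshot is that the /a/b/c/d digest falls out for free on the way to /a/b/c/d/e, which is exactly the trade Case B2 makes against the index.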
Anyway, I'm not an expert in any of this, so I may be mistaken, but I fail to see the big gains of the index. So my question is: why add the complexity of an index external to the name? Are you really saving that much? Do you have any results for this? Because, if it's a matter of saving time, why not just pre-compute the hashes and send them along with the packet so they don't need to be computed again? Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/19/14, 4:00 AM, "Massimo Gallo" wrote: >Hi Nacho, > >I agree that the protocol design should not be done with one single >architecture in mind. Indeed, different architectures may have different >cache line sizes, memory latency, etc... So I think the performance of >the two name encodings is really architecture- and implementation-dependent. > >For example: >Let's say you need first the name up to the 5th component, then the name >up to the 4th component: > >- In the first way you described to get the name up to the 4th >component, the second time, you already have the needed data in the >L1/L2 cache. > >- In the second way you may also have all you need in memory (name can >be longer with T and L in the middle) but you need to iterate again to >get the name up to the 4th component. Now, in this case you can improve >your performance by computing hash values while you parse the name. >However, you are spending time computing hash values (very >computationally expensive) that you may not need. > >So based on the implementation you can have better performance with one >or the other. > >Max > > >On 18/09/2014 23:02, Ignacio.Solis at parc.com wrote: >> Does this make that much difference? >> >> If you want to parse the first 5 components, one way to do it is: >> >> Read the index, find entry 5, then read in that many bytes from the >>start >> offset of the beginning of the name.
>> OR >> Start reading name, (find size + move) 5 times. >> >> How much speed are you getting from one to the other? You seem to imply >> that the first one is faster. I don't think this is the case. >> >> In the first one you'll probably have to get the cache line for the >>index, >> then all the required cache lines for the first 5 components. For the >> second, you'll have to get all the cache lines for the first 5 >>components. >> Given an assumption that a cache miss is way more expensive than >> evaluating a number and computing an addition, you might find that the >> performance of the index is actually slower than the performance of the >> direct access. >> >> Granted, there is a case where you don't access the name at all, for >> example, if you just get the offsets and then send the offsets as >> parameters to another processor/GPU/NPU/etc. In this case you may see a >> gain IF there are more cache line misses in reading the name than in >> reading the index. So, if the regular part of the name that you're >> parsing is bigger than the cache line (64 bytes?) and the name is to be >> processed by a different processor, then you might see some performance >> gain in using the index, but in all other circumstances I bet this is >>not >> the case. I may be wrong, haven't actually tested it. >> >> This is all to say, I don't think we should be designing the protocol >>with >> only one architecture in mind. (The architecture of sending the name to >>a >> different processor than the index.) >> >> If you have numbers that show that the index is faster I would like to >>see >> under what conditions and architectural assumptions. >> >> Nacho >> >> (I may have misinterpreted your description so feel free to correct me >>if >> I'm wrong.)
>> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> On 9/18/14, 12:54 AM, "Massimo Gallo" >> wrote: >> >>> Indeed each component's offset must be encoded using a fixed amount of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>> You may also imagine having an "Offset_2byte" type if your name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>> if you do not need the entire hierarchical structure (suppose you only >>>>> want the first x components) you can directly have it using the >>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>> the first x-1 components. With the offset structure you can directly >>>>> access the first x components. >>>> I don't get it. What you described only works if the "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>> offsets to get to the x-th offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>> you >>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>> different >>>>>> scheme where the info that describes the name-components is ... >>>>>> someplace >>>>>> other than _in_ the name-components. is that correct? when you say >>>>>> "field >>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>> Correct. >>>>> In particular, with our name encoding, a TLV indicates the name >>>>> hierarchy >>>>> with offsets in the name and other TLV(s) indicate the offset to use >>>>> in >>>>> order to retrieve special components.
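The fixed-width offset table being debated can be sketched as follows. This is an illustrative Python sketch, not an actual NDN encoding: it assumes 1-byte offsets (as in the "Offset1(1byte)" example) over a flat name buffer, so component x is reachable without iterating over the x-1 components before it.

```python
def component(name_buf: bytes, offsets: bytes, i: int) -> bytes:
    """Return component i of a flat name buffer using a fixed-width
    (1-byte-per-entry) offset table -- no iterative TLV parsing.

    offsets[i] is the start of component i; its end is the start of
    component i+1 (or the end of the buffer for the last component).
    """
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(name_buf)
    return name_buf[start:end]

# The name /alpha/b/gamma flattened, plus its offset table
buf = b"alphabgamma"
offs = bytes([0, 5, 6])

# Direct access to the last component, and to the first-2-components
# prefix, each in O(1) lookups rather than O(x) parses
last = component(buf, offs, 2)
prefix2 = buf[:offs[2]]
```

This also illustrates Tai-Lin's counterpoint: the O(1) jump only works because each offset entry has a fixed width; with variable-length offset encoding you are back to sequential parsing of the table itself.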
>>>>> As for the field separator, it is something like "/". Aliasing is >>>>> avoided as >>>>> you do not rely on field separators to parse the name; you use the >>>>> "offset >>>>> TLV" to do that. >>>>> >>>>> So now, it may be an aesthetic question but: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose you only >>>>> want >>>>> the first x components) you can directly have it using the offsets. >>>>> With the >>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>> components. >>>>> With the offset structure you can directly access the first x >>>>> components. >>>>> >>>>> Max >>>>> >>>>> >>>>>> -- Mark >>>>>> >>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>> The why is simple: >>>>>>> >>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>> component type". You are imposing types for every component in order >>>>>>> to >>>>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>>> (specify >>>>>>> the component's type) to handle exceptions! >>>>>>> >>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>> to >>>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>>> outside the name, if you have some components that could be used at >>>>>>> the network layer (e.g. a TLV field), you simply need something that >>>>>>> indicates which offset to use to retrieve the version, >>>>>>> segment, etc. in the name... >>>>>>> >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>> >>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>> names >>>>>>>>> containing many generic component types and few specific >>>>>>>>> component >>>>>>>>> types.
Due to the fact that the component type specification is >>>>>>>>>an >>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>> component's >>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>> applications MUST use). >>>>>>>>> >>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>> explanation >>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>>> and >>>>>>>> there's been email trying to explain that applications don't have >>>>>>>>to >>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>> the >>>>>>>> UTF8 convention", but it doesn't say why you have that preference >>>>>>>>in >>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>> that >>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>> . >>>>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From shijunxiao at email.arizona.edu Fri Sep 19 16:19:10 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Fri, 19 Sep 2014 16:19:10 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: Hi Adeola I agree with the necessity of HMAC signature. I have the following questions on the details: - What's expected to appear in KeyLocator? - What's the benefit of using opad and ipad? 
- Why should SignatureValue contain two SHA256 hash functions? Why not use just "SHA256(KeyValue, Name, MetaInfo, Content, SignatureInfo)"? An accompanying document is needed to provide guidance on how to design an application that makes use of HMAC signatures and still guarantees a strong level of provenance. In particular, is this scheme usable if producer and sender do not exist at the same time? Yours, Junxiao -------------- next part -------------- An HTML attachment was scrubbed... URL: From thecodemaiden at gmail.com Fri Sep 19 16:58:21 2014 From: thecodemaiden at gmail.com (Adeola Bannis) Date: Fri, 19 Sep 2014 16:58:21 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: On Fri, Sep 19, 2014 at 4:19 PM, Junxiao Shi wrote: > Hi Adeola > > I agree with the necessity of an HMAC signature. > > I have the following questions on the details: > > - What's expected to appear in KeyLocator? > > In my current implementation, I am setting up communications between two devices, and each of these devices is assigned an NDN name, which I can use to identify the sender/receiver of a signed packet. I think this is an implementation detail, similar to (partial) certificate names being used as key names with the current RSA signature. That is, there is nothing forcing someone implementing their own trust model with RSA signatures to use our certificate Data type and certificate names. To relate to the current RSA signature KeyLocator, you can think of it as an identity instead of a full certificate name. > > - What's the benefit of using opad and ipad? > - Why should SignatureValue contain two SHA256 hash functions? Why not > use just "SHA256(KeyValue, Name, MetaInfo, Content, SignatureInfo)"? > > This is how HMAC is defined ( http://en.wikipedia.org/wiki/Hash-based_message_authentication_code http://www.ietf.org/rfc/rfc2104.txt). The two applications of SHA256 allow the symmetric key to be embedded in the hash.
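For readers unfamiliar with the construction: the two SHA256 applications follow RFC 2104. A minimal sketch, where `msg` is only a stand-in for the signed portion of an NDN packet (not the actual NDN wire encoding):

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """RFC 2104 HMAC: SHA256((K xor opad) || SHA256((K xor ipad) || msg))."""
    # A key longer than the block size is hashed first; shorter keys
    # are zero-padded up to the block size.
    if len(key) > BLOCK:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()   # first SHA256 application
    return hashlib.sha256(opad + inner).digest()  # second SHA256 application

key = b"shared-secret"
msg = b"Name|MetaInfo|Content|SignatureInfo"  # placeholder for packet fields
sig = hmac_sha256(key, msg)

# Sanity check against the standard library implementation
assert sig == hmac.new(key, msg, hashlib.sha256).digest()
```

The nested structure is what binds the key into the digest; a single hash over key plus message is vulnerable to length-extension attacks, which is the standard reason for the ipad/opad design.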
Otherwise, it would be a simple digest and could not prove the identity of a sender. The choice of ipad and opad were made by someone more aware of hash function attacks than I am. > > An accompanying document is needed to cover some guidance about how to > design an application that makes use of HMAC signature and still guarantee > a strong level of provenance. > There are many implementations of HMAC for authenticating web services. See http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/HMACAuth.html for an example. I am not sure that I would be able to provide better guidance. > In particular, is this scheme usable if producer and sender do not exist > at the same time? > I'm not sure what you mean by exist. If they both know the key, they can exchange data. If you have old data stored and then someone tells you the symmetric key used in signing, you can verify it. It is exactly the same as if you encountered old data signed with an RSA private key, and then got the corresponding public key by whatever means: you would then be able to verify it. > Yours, Junxiao? > Thanks, Adeola -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Fri Sep 19 17:46:10 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Fri, 19 Sep 2014 17:46:10 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: 1. just to make sure: you are proposing "standard" sha256 hmac. 2. The biggest benefit that I can see from hmac is that it is faster to both encode/decode. As a result, we can use RSA to first bootstrap a symmetric key and use it for hmac. On Fri, Sep 19, 2014 at 4:58 PM, Adeola Bannis wrote: > > > On Fri, Sep 19, 2014 at 4:19 PM, Junxiao Shi > wrote: >> >> Hi Adeola >> >> I agree with the necessity of HMAC signature. >> >> I have the following questions on the details: >> >> What's expected to appear in KeyLocator? 
> > In my current implementation, I am setting up communications between two > devices, and each of these devices is assigned an NDN name, which I can use > to identify the sender/receiver of a signed packet. I think this is an > implementation detail, similar to (partial) certificate names being used as > key names with the current RSA signature. That is, there is nothing forcing > someone implementing their own trust model with RSA signatures to use our > certificate Data type and certificate names. > > To relate to the current RSA signature KeyLocator, you can think of it as an > identity instead of a full certificate name. > >> >> What's the benefit of using opad and ipad? >> Why should SignatureValue contain two SHA256 hash functions? Why not use >> just "SHA256(KeyValue, Name, MetaInfo, Content, SignatureInfo)"? > > This is how HMAC is defined > (http://en.wikipedia.org/wiki/Hash-based_message_authentication_code > http://www.ietf.org/rfc/rfc2104.txt). The two applications of SHA256 allow > the symmetric key to be embedded in the hash. Otherwise, it would be a > simple digest and could not prove the identity of a sender. The choice of > ipad and opad were made by someone more aware of hash function attacks than > I am. > >> >> >> An accompanying document is needed to cover some guidance about how to >> design an application that makes use of HMAC signature and still guarantee a >> strong level of provenance. > > > There are many implementations of HMAC for authenticating web services. See > http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/HMACAuth.html > for an example. I am not sure that I would be able to provide better > guidance. > > >> >> In particular, is this scheme usable if producer and sender do not exist >> at the same time? > > > I'm not sure what you mean by exist. If they both know the key, they can > exchange data. If you have old data stored and then someone tells you the > symmetric key used in signing, you can verify it. 
It is exactly the same as > if you encountered old data signed with an RSA private key, and then got the > corresponding public key by whatever means: you would then be able to verify > it. > >> >> Yours, Junxiao > > > Thanks, > Adeola > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From abannis at ucla.edu Fri Sep 19 17:50:29 2014 From: abannis at ucla.edu (Adeola Bannis) Date: Fri, 19 Sep 2014 17:50:29 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: On Fri, Sep 19, 2014 at 5:46 PM, Tai-Lin Chu wrote: > 1. just to make sure: you are proposing "standard" sha256 hmac. > > Yes. > 2. The biggest benefit that I can see from hmac is that it is faster > to both encode/decode. As a result, we can use RSA to first bootstrap > a symmetric key and use it for hmac. > > On Fri, Sep 19, 2014 at 4:58 PM, Adeola Bannis > wrote: > > > > > > On Fri, Sep 19, 2014 at 4:19 PM, Junxiao Shi < > shijunxiao at email.arizona.edu> > > wrote: > >> > >> Hi Adeola > >> > >> I agree with the necessity of HMAC signature. > >> > >> I have the following questions on the details: > >> > >> What's expected to appear in KeyLocator? > > > > In my current implementation, I am setting up communications between two > > devices, and each of these devices is assigned an NDN name, which I can > use > > to identify the sender/receiver of a signed packet. I think this is an > > implementation detail, similar to (partial) certificate names being used > as > > key names with the current RSA signature. That is, there is nothing > forcing > > someone implementing their own trust model with RSA signatures to use our > > certificate Data type and certificate names. > > > > To relate to the current RSA signature KeyLocator, you can think of it > as an > > identity instead of a full certificate name. 
> > > >> > >> What's the benefit of using opad and ipad? > >> Why should SignatureValue contain two SHA256 hash functions? Why not use > >> just "SHA256(KeyValue, Name, MetaInfo, Content, SignatureInfo)"? > > > > This is how HMAC is defined > > (http://en.wikipedia.org/wiki/Hash-based_message_authentication_code > > http://www.ietf.org/rfc/rfc2104.txt). The two applications of SHA256 > allow > > the symmetric key to be embedded in the hash. Otherwise, it would be a > > simple digest and could not prove the identity of a sender. The choice of > > ipad and opad were made by someone more aware of hash function attacks > than > > I am. > > > >> > >> > >> An accompanying document is needed to cover some guidance about how to > >> design an application that makes use of HMAC signature and still > guarantee a > >> strong level of provenance. > > > > > > There are many implementations of HMAC for authenticating web services. > See > > > http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/HMACAuth.html > > for an example. I am not sure that I would be able to provide better > > guidance. > > > > > >> > >> In particular, is this scheme usable if producer and sender do not exist > >> at the same time? > > > > > > I'm not sure what you mean by exist. If they both know the key, they can > > exchange data. If you have old data stored and then someone tells you the > > symmetric key used in signing, you can verify it. It is exactly the same > as > > if you encountered old data signed with an RSA private key, and then got > the > > corresponding public key by whatever means: you would then be able to > verify > > it. 
> > > >> > >> Yours, Junxiao > > > > > > Thanks, > > Adeola > > > > _______________________________________________ > > Ndn-interest mailing list > > Ndn-interest at lists.cs.ucla.edu > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wentaoshang at gmail.com Fri Sep 19 19:08:12 2014 From: wentaoshang at gmail.com (Wentao Shang) Date: Fri, 19 Sep 2014 19:08:12 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: On Friday, September 19, 2014, Tai-Lin Chu wrote: > 1. just to make sure: you are proposing "standard" sha256 hmac. > > 2. The biggest benefit that I can see from hmac is that it is faster > to both encode/decode. As a result, we can use RSA to first bootstrap > a symmetric key and use it for hmac. Another important benefit is that for resource-constrained devices asymmetric signature may not be feasible at all and symmetric signature provides a viable alternative. Wentao > > On Fri, Sep 19, 2014 at 4:58 PM, Adeola Bannis > wrote: > > > > > > On Fri, Sep 19, 2014 at 4:19 PM, Junxiao Shi < > shijunxiao at email.arizona.edu > > > wrote: > >> > >> Hi Adeola > >> > >> I agree with the necessity of HMAC signature. > >> > >> I have the following questions on the details: > >> > >> What's expected to appear in KeyLocator? > > > > In my current implementation, I am setting up communications between two > > devices, and each of these devices is assigned an NDN name, which I can > use > > to identify the sender/receiver of a signed packet. I think this is an > > implementation detail, similar to (partial) certificate names being used > as > > key names with the current RSA signature. 
That is, there is nothing > forcing > > someone implementing their own trust model with RSA signatures to use our > > certificate Data type and certificate names. > > > > To relate to the current RSA signature KeyLocator, you can think of it > as an > > identity instead of a full certificate name. > > > >> > >> What's the benefit of using opad and ipad? > >> Why should SignatureValue contain two SHA256 hash functions? Why not use > >> just "SHA256(KeyValue, Name, MetaInfo, Content, SignatureInfo)"? > > > > This is how HMAC is defined > > (http://en.wikipedia.org/wiki/Hash-based_message_authentication_code > > http://www.ietf.org/rfc/rfc2104.txt). The two applications of SHA256 > allow > > the symmetric key to be embedded in the hash. Otherwise, it would be a > > simple digest and could not prove the identity of a sender. The choice of > > ipad and opad were made by someone more aware of hash function attacks > than > > I am. > > > >> > >> > >> An accompanying document is needed to cover some guidance about how to > >> design an application that makes use of HMAC signature and still > guarantee a > >> strong level of provenance. > > > > > > There are many implementations of HMAC for authenticating web services. > See > > > http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/HMACAuth.html > > for an example. I am not sure that I would be able to provide better > > guidance. > > > > > >> > >> In particular, is this scheme usable if producer and sender do not exist > >> at the same time? > > > > > > I'm not sure what you mean by exist. If they both know the key, they can > > exchange data. If you have old data stored and then someone tells you the > > symmetric key used in signing, you can verify it. It is exactly the same > as > > if you encountered old data signed with an RSA private key, and then got > the > > corresponding public key by whatever means: you would then be able to > verify > > it. 
> > > >> > >> Yours, Junxiao > > > > > > Thanks, > > Adeola > > > > _______________________________________________ > > Ndn-interest mailing list > > Ndn-interest at lists.cs.ucla.edu > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -- PhD @ IRL, CSD, UCLA -------------- next part -------------- An HTML attachment was scrubbed... URL: From yingdi at CS.UCLA.EDU Fri Sep 19 21:45:29 2014 From: yingdi at CS.UCLA.EDU (Yingdi Yu) Date: Fri, 19 Sep 2014 21:45:29 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> Hi Adeola, It is great that we have a proposal for HMAC; a few comments about the doc: 1. I think you should mention in the spec how to handle keys that are longer than the hash output. 2. We should either disallow keys that are shorter than the hash output or state how to generate an HMAC when the key is short. Just "discourage" is not enough. Yingdi On Sep 19, 2014, at 11:12 AM, Adeola Bannis wrote: > Hello all, > > I am proposing to add an HMAC type, using SHA256 as the hash function, to the signature types defined at http://named-data.net/doc/NDN-TLV/current/signature.html. This will enable communication with symmetric keys, which reduces the signing and verification load on resource-constrained devices. > > The proposal is attached. Please review it and reply with any comments or suggestions. > > Thanks, > Adeola > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From tailinchu at gmail.com Fri Sep 19 22:09:13 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Fri, 19 Sep 2014 22:09:13 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> Message-ID: just some hmac facts: 1. if key is longer than block size, key = hash(key) 2. if key is shorter than block size, key = key pad with zeros It might be better if the doc simply says that "standard hmac with sha256 hash is used". hmac has so many details, and we should not redocument them again. On Fri, Sep 19, 2014 at 9:45 PM, Yingdi Yu wrote: > Hi Adeola, > > It is great that we have a proposal for HMAC, a few comments about the doc. > > 1. I think you should mentioned in the spec that how to handle keys that are > longer than the hash output. > 2. we should either disable keys that are shorter than hash output or still > state how to generate HMAC when a key is short. Just "discourage" is not > enough. > > Yingdi > > On Sep 19, 2014, at 11:12 AM, Adeola Bannis wrote: > > Hello all, > > I am proposing to add an HMAC type, using SHA256 as the hash function, to > the signature types defined at > http://named-data.net/doc/NDN-TLV/current/signature.html. This will enable > communication with symmetric keys, which reduces the signing and > verification load on resource-constrained devices. > > The proposal is attached. Please review it and reply with any comments or > suggestions. 
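Tai-Lin's two key-normalization facts can be checked directly against a standard HMAC implementation. The sketch below uses Python's `hmac` module; the keys and message are arbitrary examples.

```python
import hashlib
import hmac

msg = b"data"

# Fact 1: a key longer than the block size (64 bytes for SHA-256) is
# replaced by its hash before use.
long_key = b"k" * 100
assert hmac.new(long_key, msg, hashlib.sha256).digest() == \
       hmac.new(hashlib.sha256(long_key).digest(), msg, hashlib.sha256).digest()

# Fact 2: a key shorter than the block size is zero-padded, so
# appending trailing zero bytes does not change the MAC.
short_key = b"secret"
assert hmac.new(short_key, msg, hashlib.sha256).digest() == \
       hmac.new(short_key + b"\x00" * 3, msg, hashlib.sha256).digest()
```

Fact 2 is also why the spec discussion above worries about short keys: distinct keys that differ only in trailing zeros authenticate identically.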
> > Thanks, > Adeola > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From wentaoshang at gmail.com Fri Sep 19 22:44:17 2014 From: wentaoshang at gmail.com (Wentao Shang) Date: Fri, 19 Sep 2014 22:44:17 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> Message-ID: On Fri, Sep 19, 2014 at 9:45 PM, Yingdi Yu wrote: > Hi Adeola, > > It is great that we have a proposal for HMAC, a few comments about the doc. > > 1. I think you should mentioned in the spec that how to handle keys that > are longer than the hash output. > Hi Yingdi, Correct me if I'm wrong: I thought the key should have the same length as the hash output. What people usually do is to provide some kind of secret (e.g., a password) and use key derivation function to get the actual HMAC key. Wentao > 2. we should either disable keys that are shorter than hash output or > still state how to generate HMAC when a key is short. Just "discourage" is > not enough. > > Yingdi > > On Sep 19, 2014, at 11:12 AM, Adeola Bannis > wrote: > > Hello all, > > I am proposing to add an HMAC type, using SHA256 as the hash function, to > the signature types defined at > http://named-data.net/doc/NDN-TLV/current/signature.html. This will > enable communication with symmetric keys, which reduces the signing and > verification load on resource-constrained devices. > > The proposal is attached. Please review it and reply with any comments or > suggestions. 
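The key-derivation approach Wentao mentions can be sketched with a standard KDF such as PBKDF2. This is an illustrative example only; the password, salt, and iteration count are arbitrary placeholders, not values from the proposal.

```python
import hashlib

# Derive a 32-byte HMAC key from a lower-entropy secret such as a
# password. The salt and iteration count here are arbitrary examples;
# real deployments choose them per application.
password = b"correct horse battery staple"
salt = b"app-specific-salt"
hmac_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

# The derived key matches the SHA-256 output length (32 bytes),
# the convention Wentao describes.
```

Deriving the key this way sidesteps the short-key and long-key corner cases discussed above, since the KDF output is always exactly the digest length.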
> > Thanks, > Adeola > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -- PhD @ IRL, CSD, UCLA -------------- next part -------------- An HTML attachment was scrubbed... URL: From yingdi at CS.UCLA.EDU Fri Sep 19 23:09:26 2014 From: yingdi at CS.UCLA.EDU (Yingdi Yu) Date: Fri, 19 Sep 2014 23:09:26 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> Message-ID: either way. My point is that there should be no ambiguity in the spec. Yingdi On Sep 19, 2014, at 10:09 PM, Tai-Lin Chu wrote: > just some hmac facts: > 1. if key is longer than block size, key = hash(key) > 2. if key is shorter than block size, key = key pad with zeros > > It might be better if the doc simply says that "standard hmac with > sha256 hash is used". hmac has so many details, and we should not > redocument them again. > > On Fri, Sep 19, 2014 at 9:45 PM, Yingdi Yu wrote: >> Hi Adeola, >> >> It is great that we have a proposal for HMAC, a few comments about the doc. >> >> 1. I think you should mentioned in the spec that how to handle keys that are >> longer than the hash output. >> 2. we should either disable keys that are shorter than hash output or still >> state how to generate HMAC when a key is short. Just "discourage" is not >> enough. >> >> Yingdi >> >> On Sep 19, 2014, at 11:12 AM, Adeola Bannis wrote: >> >> Hello all, >> >> I am proposing to add an HMAC type, using SHA256 as the hash function, to >> the signature types defined at >> http://named-data.net/doc/NDN-TLV/current/signature.html. 
This will enable >> communication with symmetric keys, which reduces the signing and >> verification load on resource-constrained devices. >> >> The proposal is attached. Please review it and reply with any comments or >> suggestions. >> >> Thanks, >> Adeola >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From yingdi at CS.UCLA.EDU Fri Sep 19 23:18:48 2014 From: yingdi at CS.UCLA.EDU (Yingdi Yu) Date: Fri, 19 Sep 2014 23:18:48 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> Message-ID: <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> On Sep 19, 2014, at 10:44 PM, Wentao Shang wrote: > > > On Fri, Sep 19, 2014 at 9:45 PM, Yingdi Yu wrote: > Hi Adeola, > > It is great that we have a proposal for HMAC, a few comments about the doc. > > 1. I think you should mentioned in the spec that how to handle keys that are longer than the hash output. > > Hi Yingdi, > > Correct me if I'm wrong: I thought the key should have the same length as the hash output. Not necessarily. At least the RFC does not prevent the usage of a longer key. > What people usually do is to provide some kind of secret (e.g., a password) and use key derivation function to get the actual HMAC key. This is only a way to derive a symmetric key, but an HMAC key does not have to be derived in this way. 
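[Editorial note: the RFC 2104 key-handling rules discussed in this thread (hash keys longer than the block size, zero-pad keys shorter than it) can be checked directly with Python's standard `hmac` module; this is just an illustration of the standard behavior, not part of the NDN spec.]

```python
import hashlib
import hmac

# RFC 2104 key handling for HMAC-SHA256 (block size = 64 bytes):
#  - a key longer than 64 bytes is first replaced by SHA256(key)
#  - a key shorter than 64 bytes is zero-padded up to 64 bytes
msg = b"NDN Data packet bytes"

long_key = b"x" * 100   # longer than the block size
short_key = b"secret"   # shorter than the block size

# A conforming implementation treats a long key and its hash identically.
mac_long = hmac.new(long_key, msg, hashlib.sha256).digest()
mac_hashed = hmac.new(hashlib.sha256(long_key).digest(), msg, hashlib.sha256).digest()
assert mac_long == mac_hashed

# Zero-padding a short key to the block size also changes nothing.
mac_short = hmac.new(short_key, msg, hashlib.sha256).digest()
padded = short_key + b"\x00" * (64 - len(short_key))
assert mac_short == hmac.new(padded, msg, hashlib.sha256).digest()

print(len(mac_long))  # SHA-256 output: 32 bytes
```

This is why "standard HMAC with SHA256" already pins down the long-key and short-key cases without restating them in the NDN document.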
You do not want to impose such a restriction here. @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if the key size is longer than the hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. Yingdi -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From tailinchu at gmail.com Sat Sep 20 01:06:15 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sat, 20 Sep 2014 01:06:15 -0700 Subject: [Ndn-interest] [Clue] A Cloud-Applicable Network Policy Enforcement Strategy using Named Data In-Reply-To: <4ADA6F19-A536-4455-B9A6-5BD206AE1C8F@gmail.com> References: <1EFB1E57-5904-4661-AE8C-22D75B052062@gmail.com> <880D24C3-724A-4FC0-8E84-773609EEA184@gmail.com> <4ADA6F19-A536-4455-B9A6-5BD206AE1C8F@gmail.com> Message-ID: > I hope you could read the spec of signed interest carefully and think a little bit more before making the claim above. Sorry, I was making an extreme example of an unsynced clock (I know that nfd uses unix UTC time). Btw, do you know why we have both nonce and timestamp in signed interest? Will seq no alone solve this problem? I am worried that msec might not be sufficient in the future. On Sat, Sep 20, 2014 at 12:19 AM, Yingdi Yu wrote: > On Sep 19, 2014, at 10:01 PM, Tai-Lin Chu wrote: > > However, desync clocks (for example. nfd runs UTC, and we are in a > different timezone) makes signed interest unsecure. If we are in +8 > timezone, then all signed interests will give attackers 8 hours to > replay any command before it gets too old. If we are in -8 timezone, > then all signed interests are already too old. 
If we carefully match > the time, the threshold for expiration will have to compensate > network delay, and give attackers a few secs to replay. Therefore, the > current signed interest is not very secure IMHO. Maybe you can also > propose a better scheme for signed interest. > > > Hi Tai-Lin > > I hope you could read the spec of signed interest carefully and think a > little bit more before making the claim above. > > First, the spec says that timestamp is "millisecond offset from UTC > 1970-01-01 00:00:00", that means it is independent from the time zone... > > Second, about timestamp checking, it is not > > check timestamp to see whether this command is too old > > > The spec says that: > > Recipients of a signed interest may further check the timestamp and the > uniqueness of the signed interest (e.g., when the signed interest carries a > command). In this case, a signed interest may be treated as invalid if : > > a valid signed Interest whose timestamp is equal or later than the timestamp > of the received one has been received before. > > This check simply prevents replay attack. > > Yingdi > > > From oran at cisco.com Sat Sep 20 05:36:00 2014 From: oran at cisco.com (Dave Oran (oran)) Date: Sat, 20 Sep 2014 12:36:00 +0000 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> Message-ID: <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. > > Could you share it with us? > Sure. Here's a strawman. The type space is 16 bits, so you have 65,536 types. The type space is currently shared with the types used for the entire protocol, that gives us two options: (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). 
- We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) - We reserve some portion of the space for unanticipated uses (say another 1024 types) - We give the rest of the space to application assignment. Make sense? >> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design > > we could design for performance, That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. > but I think there will be a turning > point when the slower design starts to become "fast enough". Perhaps, perhaps not. Relative performance is what matters so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. > Do you > think there will be some design of ndn that will *never* have > performance improvement? > I suspect LPM on data will always be slow (relative to the other functions). I suspect exclusions will always be slow because they will require extra memory references. However I of course don't claim clairvoyance so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... 
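[Editorial note: the strawman partition of the 16-bit type space above can be sketched as a simple classifier. All region boundaries below are hypothetical, chosen only to mirror the "default / ~1024 global / ~1024 reserved / rest application" proposal; they appear in no NDN specification.]

```python
# Hypothetical partition of a 16-bit name-component type space,
# mirroring the strawman in this thread (illustrative boundaries only):
#   0            -> "generic" default component
#   1..1024      -> globally understood types (chunk#, version#, ...)
#   1025..2048   -> reserved for unanticipated uses
#   2049..65535  -> application-assigned
def classify_component_type(t: int) -> str:
    if not 0 <= t <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if t == 0:
        return "generic"
    if t <= 1024:
        return "global"
    if t <= 2048:
        return "reserved"
    return "application"

print(classify_component_type(0))      # generic
print(classify_component_type(17))     # global
print(classify_component_type(40000))  # application
```

A real registry would of course record an owner and a spec reference per allocation; the point here is only that region checks are a couple of integer comparisons per component.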
> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >> >> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >> >>> We should not look at a certain chip nowadays and want ndn to perform >>> well on it. It should be the other way around: once ndn app becomes >>> popular, a better chip will be designed for ndn. >>> >> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >> a) clock rates are not getting (much) faster >> b) memory accesses are getting (relatively) more expensive >> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >> >> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere >> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >> >> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. >> >>> I feel the discussion today and yesterday has been off-topic. Now I >>> see that there are 3 approaches: >>> 1. we should not define a naming convention at all >>> 2. typed component: use tlv type space and add a handful of types >>> 3. marked component: introduce only one more type and add additional >>> marker space >>> >> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. 
>> >> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >> >>> Also everybody thinks that the current utf8 marker naming convention >>> needs to be revised. >>> >>> >>> >>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>> experiments? >>>> >>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>> NDN will have to carry more information than URLs, as far as I see. >>>> >>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in separate TLV will be slower on some architectures, >>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, >>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). If you need to >>>> switch between arrays, it would be very expensive. If you have to read past >>>> the name to get to the 2nd array, then read it, then backup to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components. One way to do it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading name, (find size + move ) 5 times. >>>> >>>> How much speed are you getting from one to the other? You seem to imply >>>> that the first one is faster. I don't think this is the case. 
>>>> >>>> In the first one you'll probably have to get the cache line for the index, >>>> then all the required cache lines for the first 5 components. For the >>>> second, you'll have to get all the cache lines for the first 5 components. >>>> Given an assumption that a cache miss is way more expensive than >>>> evaluating a number and computing an addition, you might find that the >>>> performance of the index is actually slower than the performance of the >>>> direct access. >>>> >>>> Granted, there is a case where you don't access the name at all, for >>>> example, if you just get the offsets and then send the offsets as >>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>> gain IF there are more cache line misses in reading the name than in >>>> reading the index. So, if the regular part of the name that you're >>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>> processed by a different processor, then you might see some performance >>>> gain in using the index, but in all other circumstances I bet this is not >>>> the case. I may be wrong, haven't actually tested it. >>>> >>>> This is all to say, I don't think we should be designing the protocol with >>>> only one architecture in mind. (The architecture of sending the name to a >>>> different processor than the index). >>>> >>>> If you have numbers that show that the index is faster I would like to see >>>> under what conditions and architectural assumptions. >>>> >>>> Nacho >>>> >>>> (I may have misinterpreted your description so feel free to correct me if >>>> I'm wrong.) 
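[Editorial note: the two parsing strategies debated above, walking the TLV-encoded name sequentially versus jumping via a precomputed offset index, can be sketched in a few lines. The 1-byte type/length encoding and the component type code 0x08 below are simplifying assumptions for illustration; real NDN-TLV uses variable-length numbers.]

```python
def parse_first_n_sequential(buf: bytes, n: int):
    """Walk TLV components (1-byte type, 1-byte length) one by one."""
    components, pos = [], 0
    for _ in range(n):
        length = buf[pos + 1]                  # L of this component's TL
        components.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length                      # skip T, L, and V
    return components

def parse_first_n_indexed(buf: bytes, index, n: int):
    """Jump straight to each component using a precomputed offset table."""
    return [buf[index[i] + 2 : index[i] + 2 + buf[index[i] + 1]]
            for i in range(n)]

# Encode /a/bb/ccc with the assumed component type 0x08.
name = b"\x08\x01a" + b"\x08\x02bb" + b"\x08\x03ccc"
offsets = [0, 3, 7]  # start of each TLV; in practice carried separately

assert parse_first_n_sequential(name, 3) == parse_first_n_indexed(name, offsets, 3)
print(parse_first_n_sequential(name, 2))  # [b'a', b'bb']
```

Both routines touch the same name bytes, which is exactly Nacho's point: the index only pays off when the offsets are consumed somewhere the name bytes are not.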
>>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>> wrote: >>>> >>>> Indeed each components' offset must be encoded using a fixed amount of >>>> bytes: >>>> >>>> i.e., >>>> Type = Offsets >>>> Length = 10 Bytes >>>> Value = Offset1(1byte), Offset2(1byte), ... >>>> >>>> You may also imagine having an "Offset_2byte" type if your name is too >>>> long. >>>> >>>> Max >>>> >>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> >>>> if you do not need the entire hierarchical structure (suppose you only >>>> want the first x components) you can directly have it using the >>>> offsets. With the Nested TLV structure you have to iteratively parse >>>> the first x-1 components. With the offset structure you can directly >>>> access the first x components. >>>> >>>> I don't get it. What you described only works if the "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>> offsets to get to the x offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>> >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> >>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>> you >>>> _do_ prefer, though. it sounds like you're describing an entirely >>>> different >>>> scheme where the info that describes the name-components is ... >>>> someplace >>>> other than _in_ the name-components. is that correct? when you say >>>> "field >>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>> >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the name >>>> hierarchy >>>> with offsets in the name and other TLV(s) indicates the offset to use >>>> in >>>> order to retrieve special components. 
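[Editorial note: Massimo's "Offsets" TLV sketch above (Type = Offsets, one byte per entry, fixed width so entry x is reachable without parsing entries 1..x-1) could be encoded as follows. The type code 0xF0 is invented purely for illustration.]

```python
def build_offsets_tlv(offsets):
    """Fixed-width offsets TLV: Type = hypothetical 0xF0,
    Length = number of entries, Value = one byte per offset.
    Fixed width is what makes random access possible."""
    if any(o > 0xFF for o in offsets):
        # Massimo's escape hatch for long names:
        raise ValueError('use a wider "Offset_2byte" type instead')
    return bytes([0xF0, len(offsets)]) + bytes(offsets)

# Offsets for a three-component name whose TLVs start at 0, 3, and 7.
tlv = build_offsets_tlv([0, 3, 7])
print(tlv.hex())  # f003000307

# Entry i sits at a computable position (2 + i), so finding the start of
# component x costs one indexed read rather than x-1 TLV parses.
assert tlv[2 + 2] == 7
```

This is also Tai-Lin's caveat in code form: the random-access property holds only because each entry has a fixed width; with varNum entries you would be back to sequential parsing.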
>>>> As for the field separator, it is something like "/". Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you use the >>>> "offset >>>> TLV " to do that. >>>> >>>> So now, it may be an aesthetic question but: >>>> >>>> if you do not need the entire hierarchical structure (suppose you only >>>> want >>>> the first x components) you can directly have it using the offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first x-1 >>>> components. >>>> With the offset structure you can directly access the first x >>>> components. >>>> >>>> Max >>>> >>>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few "specific >>>> component type". You are imposing types for every component in order >>>> to >>>> handle a few exceptions (segmentation, etc.). You create a rule >>>> (specify >>>> the component's type) to handle exceptions! >>>> >>>> I would prefer not to have typed components. Instead I would prefer >>>> to >>>> have the name as a simple sequence of bytes with a field separator. Then, >>>> outside the name, if you have some components that could be used at >>>> network layer (e.g. a TLV field), you simply need something that >>>> indicates which is the offset allowing you to retrieve the version, >>>> segment, etc in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end up with >>>> names >>>> containing many generic components types and few specific >>>> components >>>> types. 
Due to the fact that the component type specification is an >>>> exception in the name, I would prefer something that specify >>>> component's >>>> type only when needed (something like UTF8 conventions but that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>> and >>>> there's been email trying to explain that applications don't have to >>>> use types if they don't need to. your email sounds like "I prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that preference in >>>> the face of the points about the problems. can you say why it is >>>> that >>>> you express a preference for the "convention" with problems ? >>>> >>>> Thanks, >>>> Mark >>>> >>>> . >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> 
From shijunxiao at email.arizona.edu Sat Sep 20 10:10:17 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Sat, 20 Sep 2014 10:10:17 -0700 Subject: [Ndn-interest] [Clue] A Cloud-Applicable Network Policy Enforcement Strategy using Named Data In-Reply-To: References: <1EFB1E57-5904-4661-AE8C-22D75B052062@gmail.com> <880D24C3-724A-4FC0-8E84-773609EEA184@gmail.com> <4ADA6F19-A536-4455-B9A6-5BD206AE1C8F@gmail.com> Message-ID: Hi Tai-Lin In signed Interest, - timestamp is to prevent replay attack: the timestamp in a new command must be greater than any existing timestamps - nonce is to guarantee uniqueness; this is useful when producer is not checking the timestamp Each consumer is expected to have its own unique keypair. Under this assumption, the system can tolerate a clock skew of 60 seconds between consumer and producer. Millisecond granularity is sufficient for the intended usage of signed Interest - infrequent command execution. Also note that the timestamp is never compared to wallclock after the initial command. Therefore, the consumer can operate as follows to send frequent commands: 1. the initial command must carry a timestamp equal to wallclock 2. in each subsequent command, increment timestamp by 1 3. in case a command is rejected due to invalid timestamp, it means latest timestamp state is lost on the producer, therefore consumer should resend the command as an initial command (step 1) But this doesn't solve all problems with high-frequency signed Interests. See bug 1990. Yours, Junxiao On Sat, Sep 20, 2014 at 1:06 AM, Tai-Lin Chu wrote: > > I hope you could read the spec of signed interest carefully and think a > little bit more before making the claim above. > sorry, I was making an extreme example of unsynced clock (I know that > nfd uses unix UTC time). > > Btw, do you know why we have both nonce and timestamp in signed > interest? Will seq no alone solve this problem? I am worried that msec > might not be sufficient in the future. 
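[Editorial note: Junxiao's three consumer-side steps can be sketched as a small state machine. The class and method names below are illustrative, not taken from any NDN library.]

```python
import time

class CommandSigner:
    """Consumer-side timestamp handling for signed Interest commands,
    following the three steps above (a sketch; names are hypothetical)."""

    def __init__(self):
        self.last_timestamp = None

    def next_timestamp(self) -> int:
        if self.last_timestamp is None:
            # step 1: the initial command carries the wallclock, in ms
            self.last_timestamp = int(time.time() * 1000)
        else:
            # step 2: each subsequent command just increments by 1, so
            # the timestamp is never re-compared to the wallclock and
            # msec granularity does not cap the command rate
            self.last_timestamp += 1
        return self.last_timestamp

    def on_rejected(self):
        # step 3: the producer lost its timestamp state;
        # resync from the wallclock on the next command
        self.last_timestamp = None

signer = CommandSigner()
t1 = signer.next_timestamp()
t2 = signer.next_timestamp()
assert t2 == t1 + 1
```

The producer side is the mirror image: accept a command only if its timestamp is strictly greater than the last valid one seen for that key.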
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zaher at illinois.edu Sat Sep 20 10:12:54 2014 From: zaher at illinois.edu (Abdelzaher, Tarek) Date: Sat, 20 Sep 2014 12:12:54 -0500 Subject: [Ndn-interest] any comments on naming convention? [version marker comment] In-Reply-To: References: Message-ID: <541DB596.5010400@illinois.edu> Jeff, > in the document I sent, there are seven specific, though not > equally well considered, reasons to use marker components that have > nothing to with the so-called type explosion. As far as I can tell, no > one has addressed these from the application developer's perspective. > Could someone? I have a small comment on the original name conventions document sent by Tai-Lin. I'd like to suggest a slight semantic generalization of the meaning of one of the proposed markers; namely, the versioning marker 0xFD (which identifies the version of the component). Do you see harm in overloading the semantics of the field identified by 0xFD to refer to a general "priority" hint? As the document suggests, by convention, priority could be given to, say, lexicographically larger values in that field. The document says "[versioning] can be used by third-parties, e.g., to prioritize caching of the latest versions of data". I can see other reasons that applications or transport protocols might want to instruct caches to prioritize objects that have the same prefix. It would be convenient to be able to insert a value in the 0xFD field such that larger lexicographic values are prioritized (in caches, etc) over smaller ones. One such scenario is described in the "Information Funnel" paper, where the lexicographical priority "value" would be an entire name postfix (referring a subtree in which some lexicographically ordered branches are more important than others). 
Application software and/or transport layers could then properly name producer objects slated for sharing with subscribers such that the network is aware of their relative importance. Think, for example, of layered video encoding, where base-layer objects should be prioritized over enhancement-layer objects. Any thoughts/comments? Tarek From gts at ics.uci.EDU Sat Sep 20 10:24:20 2014 From: gts at ics.uci.EDU (GTS) Date: Sat, 20 Sep 2014 10:24:20 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: Message-ID: <541DB844.6040402@ics.uci.edu> Hi, adding an HMAC-based authenticator option to secure NDN content is a good idea. It certainly makes sense for intranets and other intra-AS settings. I believe that CCNx has introduced an HMAC option quite some time ago. It might be worthwhile to check with PARC and not to reinvent the wheel. Cheers, Gene On 9/19/14, 11:12 AM, Adeola Bannis wrote: > Hello all, > > I am proposing to add an HMAC type, using SHA256 as the hash function, > to the signature types defined at > http://named-data.net/doc/NDN-TLV/current/signature.html. This will > enable communication with symmetric keys, which reduces the signing > and verification load on resource-constrained devices. > > The proposal is attached. Please review it and reply with any comments > or suggestions. > > Thanks, > Adeola > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yingdi at CS.UCLA.EDU Sat Sep 20 11:07:01 2014 From: yingdi at CS.UCLA.EDU (Yingdi Yu) Date: Sat, 20 Sep 2014 11:07:01 -0700 Subject: [Ndn-interest] Signed interest In-Reply-To: References: <1EFB1E57-5904-4661-AE8C-22D75B052062@gmail.com> <880D24C3-724A-4FC0-8E84-773609EEA184@gmail.com> <4ADA6F19-A536-4455-B9A6-5BD206AE1C8F@gmail.com> Message-ID: <9292AC2E-197B-4058-AEA3-5802DBEBA9DA@cs.ucla.edu> Changed the topic since it is no longer about the original topic of the discussion. On Sep 20, 2014, at 10:10 AM, Junxiao Shi wrote: > Hi Tai-Lin > > In signed Interest, > timestamp is to prevent replay attack: the timestamp in a new command must be greater than any existing timestamps > nonce is to guarantee uniqueness; this is useful when producer is not checking the timestamp > Each consumer is expected to have its own unique keypair. Under this assumption, the system can tolerate a clock skew of 60 seconds between consumer and producer. > > Millisecond granularity is sufficient for the intended usage of signed Interest - infrequent command execution. > Also note that the timestamp is never compared to wallclock after the initial command. Therefore, the consumer can operate as follows to send frequent commands: > the initial command must carry a timestamp equal to wallclock > in each subsequent command, increment timestamp by 1 > in case a command is rejected due to invalid timestamp, it means latest timestamp state is lost on the producer, therefore consumer should resend the command as an initial command (step 1) This is similar to the current KeyChain signed Interest operation. The difference is that we only increase the timestamp by 1 when the timestamp of an interest is the same as the previous one. In the other cases, we simply use the current timestamp. Unless there is an app that needs to generate more than 1000 signed interests per second using the same key, this solution should work. > But this doesn't solve all problems with high-frequency signed Interests. 
See bug 1990. As I replied on redmine, if order really matters, the interest sender should wait for the confirmation from the interest recipient. And this should be enforced by the app. > Yours, Junxiao > > On Sat, Sep 20, 2014 at 1:06 AM, Tai-Lin Chu wrote: > > I hope you could read the spec of signed interest carefully and think a little bit more before making the claim above. > sorry, I was making an extreme example of unsynced clock (I know that > nfd uses unix UTC time). > > Btw, do you know why we have both nonce and timestamp in signed > interest? Will seq no alone solve this problem? I am worried that msec > might not be sufficient in the future. Using seqNo requires you to persistently remember the last used seqNo (even if the app is turned off), otherwise you cannot guarantee that a seqNo has not been used before. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From tailinchu at gmail.com Sat Sep 20 11:15:37 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sat, 20 Sep 2014 11:15:37 -0700 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> Message-ID: I had thought about these questions, but I want to know your ideas besides typed components: 1. LPM allows "data discovery". How will exact match do similar things? 2. Will removing selectors improve performance? How do we use other, faster techniques to replace selectors? 3. Fixed byte length and type. I agree that the type can be a fixed byte, but 2 bytes for length might not be enough in the future. On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: > > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: > >>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >> >> Could you share it with us? >> > Sure. Here's a strawman. > > The type space is 16 bits, so you have 65,536 types. > > The type space is currently shared with the types used for the entire protocol, that gives us two options: > (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. 
> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. > > We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). > > - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. > - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) > - We reserve some portion of the space for unanticipated uses (say another 1024 types) > - We give the rest of the space to application assignment. > > Make sense? > > >>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >> >> we could design for performance, > That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. > >> but I think there will be a turning >> point when the slower design starts to become "fast enough". > Perhaps, perhaps not. Relative performance is what matters so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. 
> >> Do you >> think there will be some design of ndn that will *never* have >> performance improvement? >> > I suspect LPM on data will always be slow (relative to the other functions). > I suspect exclusions will always be slow because they will require extra memory references. > > However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... > >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>> >>>> We should not look at a certain chip nowadays and want ndn to perform >>>> well on it. It should be the other way around: once ndn apps become >>>> popular, a better chip will be designed for ndn. >>>> >>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>> a) clock rates are not getting (much) faster >>> b) memory accesses are getting (relatively) more expensive >>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>> >>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere. >>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>> >>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right. >>> >>>> I feel the discussion today and yesterday has been off-topic. 
Now I >>>> see that there are 3 approaches: >>>> 1. we should not define a naming convention at all >>>> 2. typed component: use tlv type space and add a handful of types >>>> 3. marked component: introduce only one more type and add additional >>>> marker space >>>> >>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>> >>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes, or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>> >>>> Also, everybody thinks that the current utf8 marker naming convention >>>> needs to be revised. >>>> >>>> >>>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>> experiments? >>>>> >>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>> NDN will have to carry more information than URLs, as far as I see. >>>>> >>>>> >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>> >>>>> In fact, the index in a separate TLV will be slower on some architectures, >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory; >>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>> switch between arrays, it would be very expensive. If you have to read past >>>>> the name to get to the 2nd array, then read it, then back up to get to the >>>>> name, it will be pretty expensive too. 
>>>>> >>>>> Marc >>>>> >>>>> On Sep 18, 2014, at 2:02 PM, >>>>> wrote: >>>>> >>>>> Does this make that much difference? >>>>> >>>>> If you want to parse the first 5 components, one way to do it is: >>>>> >>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>> offset of the beginning of the name. >>>>> OR >>>>> Start reading the name, (find size + move) 5 times. >>>>> >>>>> How much speed are you getting from one to the other? You seem to imply >>>>> that the first one is faster. I don't think this is the case. >>>>> >>>>> In the first one you'll probably have to get the cache line for the index, >>>>> then all the required cache lines for the first 5 components. For the >>>>> second, you'll have to get all the cache lines for the first 5 components. >>>>> Given an assumption that a cache miss is way more expensive than >>>>> evaluating a number and computing an addition, you might find that the >>>>> performance of the index is actually slower than the performance of the >>>>> direct access. >>>>> >>>>> Granted, there is a case where you don't access the name at all, for >>>>> example, if you just get the offsets and then send the offsets as >>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>> gain IF there are more cache line misses in reading the name than in >>>>> reading the index. So, if the regular part of the name that you're >>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>> processed by a different processor, then you might see some performance >>>>> gain in using the index, but in all other circumstances I bet this is not >>>>> the case. I may be wrong; I haven't actually tested it. >>>>> >>>>> This is all to say, I don't think we should be designing the protocol with >>>>> only one architecture in mind. (The architecture of sending the name to a >>>>> different processor than the index.) 
>>>>> >>>>> If you have numbers that show that the index is faster, I would like to see >>>>> under what conditions and architectural assumptions. >>>>> >>>>> Nacho >>>>> >>>>> (I may have misinterpreted your description, so feel free to correct me if >>>>> I'm wrong.) >>>>> >>>>> >>>>> -- >>>>> Nacho (Ignacio) Solis >>>>> Protocol Architect >>>>> Principal Scientist >>>>> Palo Alto Research Center (PARC) >>>>> +1(650)812-4458 >>>>> Ignacio.Solis at parc.com >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>> wrote: >>>>> >>>>> Indeed each component's offset must be encoded using a fixed amount of >>>>> bytes: >>>>> >>>>> i.e., >>>>> Type = Offsets >>>>> Length = 10 Bytes >>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>> >>>>> You may also imagine having an "Offset_2byte" type if your name is too >>>>> long. >>>>> >>>>> Max >>>>> >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose you only >>>>> want the first x components) you can directly have it using the >>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>> the first x-1 components. With the offset structure you can directly >>>>> access the first x components. >>>>> >>>>> I don't get it. What you described only works if the "offset" is >>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>> offsets to get to the x-th offset. >>>>> >>>>> >>>>> >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>> wrote: >>>>> >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>> >>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>> you >>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>> different >>>>> scheme where the info that describes the name-components is ... >>>>> someplace >>>>> other than _in_ the name-components. is that correct? 
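The two access patterns being compared in this thread — walking nested TLVs sequentially versus jumping via a fixed-width offset table — can be sketched side by side. This is an illustration of the trade-off only; the 1-byte type/length encoding and the flat-name-plus-offsets layout mirror Massimo's example above, not an actual NDN or CCNx wire format:

```python
def parse_first_x_sequential(buf: bytes, x: int) -> list[bytes]:
    """Nested-TLV style: read (T, L, V) x times, advancing by L each step."""
    comps, pos = [], 0
    for _ in range(x):
        t, length = buf[pos], buf[pos + 1]  # toy encoding: 1-byte type, 1-byte length
        comps.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return comps

def parse_first_x_offsets(name: bytes, offsets: bytes, x: int) -> list[bytes]:
    """Offset-table style: fixed 1-byte offsets let us slice component i directly.

    offsets[i] is the start of component i in the flat name; a final
    sentinel entry marks the end of the last component."""
    return [name[offsets[i] : offsets[i + 1]] for i in range(x)]

# Example: the name /ndn/example/data in both representations.
tlv_name = bytes([8, 3]) + b"ndn" + bytes([8, 7]) + b"example" + bytes([8, 4]) + b"data"
flat_name, offset_table = b"ndnexampledata", bytes([0, 3, 10, 14])
```

The sequential parser touches every byte up to component x; the offset parser can slice component x in one step — but only because each offset is a fixed-size field, which is exactly Tai-Lin's varNum objection.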
when you say >>>>> "field >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>> >>>>> Correct. >>>>> In particular, with our name encoding, a TLV indicates the name >>>>> hierarchy >>>>> with offsets in the name, and other TLV(s) indicate the offset to use >>>>> in >>>>> order to retrieve special components. >>>>> As for the field separator, it is something like "/". Aliasing is >>>>> avoided, as >>>>> you do not rely on field separators to parse the name; you use the >>>>> "offset >>>>> TLV" to do that. >>>>> >>>>> So now, it may be an aesthetic question, but: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose you only >>>>> want >>>>> the first x components) you can directly have it using the offsets. >>>>> With the >>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>> components. >>>>> With the offset structure you can directly access the first x >>>>> components. >>>>> >>>>> Max >>>>> >>>>> >>>>> -- Mark >>>>> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>> >>>>> The why is simple: >>>>> >>>>> You use a lot of "generic component type" and very few "specific >>>>> component type". You are imposing types for every component in order >>>>> to >>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>> (specifying >>>>> the component's type) to handle exceptions! >>>>> >>>>> I would prefer not to have typed components. Instead, I would prefer >>>>> to >>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>> outside the name, if you have some components that could be used at the >>>>> network layer (e.g. a TLV field), you simply need something that >>>>> indicates the offset allowing you to retrieve the version, >>>>> segment, etc. in the name... 
>>>>> >>>>> >>>>> Max >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>> >>>>> I think we agree on the small number of "component types". >>>>> However, if you have a small number of types, you will end up with >>>>> names >>>>> containing many generic component types and few specific >>>>> component >>>>> types. Due to the fact that the component type specification is an >>>>> exception in the name, I would prefer something that specifies a >>>>> component's >>>>> type only when needed (something like the UTF8 conventions, but that >>>>> applications MUST use). >>>>> >>>>> so ... I can't quite follow that. the thread has had some >>>>> explanation >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>> and >>>>> there's been email trying to explain that applications don't have to >>>>> use types if they don't need to. your email sounds like "I prefer >>>>> the >>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>> the face of the points about the problems. can you say why it is >>>>> that >>>>> you express a preference for the "convention" with problems? >>>>> >>>>> Thanks, >>>>> Mark >>>>> 
>>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From abannis at ucla.edu Sat Sep 20 11:18:12 2014 From: abannis at ucla.edu (Adeola Bannis) Date: Sat, 20 Sep 2014 11:18:12 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu wrote: > > On Sep 19, 2014, at 10:44 PM, Wentao Shang wrote: > > > > On Fri, Sep 19, 2014 at 9:45 PM, Yingdi Yu wrote: > >> Hi Adeola, >> >> It is great that we have a proposal for HMAC; a few comments about the >> doc. >> >> 1. I think you should mention in the spec how to handle keys that >> are longer than the hash output. 
>> > > Hi Yingdi, > > Correct me if I'm wrong: I thought the key should have the same length as > the hash output. > > > Not necessarily. At least the RFC does not prevent the usage of a longer > key. > > What people usually do is to provide some kind of secret (e.g., a > password) and use a key derivation function to get the actual HMAC key. > > > This is only one way to derive a symmetric key, but an HMAC key does not have > to be derived in this way. You do not want to impose such a restriction > here. > > Yes, "standard" HMAC does not explicitly restrict the size of the key, and it also recommends that some key derivation function (bcrypt, PBKDF2, etc.) be used with any passphrase that is not the same size as the hash output. I think Tai-Lin's point is fair, and we should let people do whatever they would have done with their HMAC key. > > @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC > signature. Because if the key size is longer than the hash output, the key digest > is used instead. If we allow KeyDigest in KeyLocator, then some careless > programmers may leak the secret. > Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. > Yingdi > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... 
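The key-length rules under discussion are the ones from RFC 2104: a key longer than the hash's block size (64 bytes for SHA-256) is first hashed and the digest is used as the effective key. Python's `hmac` module implements exactly this, so the rule can be checked directly; this is a small sanity check of RFC 2104 behavior, not NDN-specific code:

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 per RFC 2104 (key handling included by the hmac module)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

# A key longer than the 64-byte SHA-256 block is replaced by its hash,
# so HMAC(long_key, m) == HMAC(SHA256(long_key), m).
long_key = b"k" * 100
folded_key = hashlib.sha256(long_key).digest()
```

This is why the spec needs to say something about long keys: two implementations that disagree on the folding rule will produce different tags for the same key.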
URL: From tailinchu at gmail.com Sat Sep 20 11:19:52 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sat, 20 Sep 2014 11:19:52 -0700 Subject: [Ndn-interest] Signed interest In-Reply-To: <9292AC2E-197B-4058-AEA3-5802DBEBA9DA@cs.ucla.edu> References: <1EFB1E57-5904-4661-AE8C-22D75B052062@gmail.com> <880D24C3-724A-4FC0-8E84-773609EEA184@gmail.com> <4ADA6F19-A536-4455-B9A6-5BD206AE1C8F@gmail.com> <9292AC2E-197B-4058-AEA3-5802DBEBA9DA@cs.ucla.edu> Message-ID: > Using seqNo requires you to persistently remember the last used seqNo (even if the app is turned off), otherwise you cannot guarantee that a seqNo has not been used before. Can we assume that once the underlying connection tears down, the seqNo resets to 0, so that after the app is turned off, you can safely start from 0 again? On Sat, Sep 20, 2014 at 11:07 AM, Yingdi Yu wrote: > Changed the topic since it is no longer about the original topic of the > discussion. > > On Sep 20, 2014, at 10:10 AM, Junxiao Shi > wrote: > > Hi Tai-Lin > > In a signed Interest, > > the timestamp is to prevent replay attacks: the timestamp in a new command must > be greater than any existing timestamps; > the nonce is to guarantee uniqueness; this is useful when the producer is not > checking the timestamp. > > Each consumer is expected to have its own unique keypair. Under this > assumption, the system can tolerate a clock skew of 60 seconds between > consumer and producer. > > Millisecond granularity is sufficient for the intended usage of signed > Interest - infrequent command execution. > Also note that the timestamp is never compared to wallclock after the > initial command. 
Therefore, the consumer can operate as follows to send > frequent commands: > > the initial command must carry a timestamp equal to wallclock; > in each subsequent command, increment the timestamp by 1; > in case a command is rejected due to an invalid timestamp, it means the latest > timestamp state was lost on the producer, so the consumer should resend > the command as an initial command (step 1). > > This is similar to the current KeyChain signed Interest operation. The difference > is that we only increase the timestamp by 1 when the timestamp of an > interest is the same as the previous one. In the other cases, we simply use > the current timestamp. Unless there is an app that needs to generate more > than 1000 signed interests per second using the same key, this solution should work. > > But this doesn't solve all problems with high-frequency signed Interests. > See bug 1990. > > > As I replied on redmine, if order really matters, the interest sender should > wait for the confirmation from the interest recipient. And this should be > enforced by the app. > > Yours, Junxiao > > On Sat, Sep 20, 2014 at 1:06 AM, Tai-Lin Chu wrote: >> >> > I hope you could read the spec of signed interest carefully and think a >> > little bit more before making the claim above. >> Sorry, I was making an extreme example of an unsynced clock (I know that >> nfd uses unix UTC time). >> >> Btw, do you know why we have both nonce and timestamp in signed >> interests? Would a seq no alone solve this problem? I am worried that msec >> might not be sufficient in the future. > > > Using seqNo requires you to persistently remember the last used seqNo (even > if the app is turned off), otherwise you cannot guarantee that a seqNo has > not been used before. 
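The consumer-side timestamp policy described above (use the current wallclock, bump by 1 only on a same-millisecond collision, fall back to a fresh wallclock timestamp after a producer rejection) is a small state machine. A minimal sketch — the class and method names are invented for illustration, and this is not the actual ndn-cxx KeyChain API:

```python
import time

class SignedInterestClock:
    """Consumer-side timestamp policy for signed Interests (illustrative).

    Timestamps are monotonically increasing milliseconds: wallclock when it
    has advanced, previous + 1 when two commands share a millisecond."""

    def __init__(self):
        self.last_ts = None  # no command sent yet

    def next_timestamp(self) -> int:
        now_ms = int(time.time() * 1000)
        if self.last_ts is None or now_ms > self.last_ts:
            self.last_ts = now_ms   # fresh (or initial) command: use wallclock
        else:
            self.last_ts += 1       # same millisecond as last command: bump by 1
        return self.last_ts

    def on_rejected(self):
        # Producer lost its timestamp state; restart as an initial command,
        # so the next timestamp comes from wallclock again.
        self.last_ts = None
```

As the thread notes, a producer that only checks "greater than the last seen timestamp" still cannot guarantee ordering; that has to be enforced by the app (e.g. waiting for a confirmation before the next command).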
From Marc.Mosko at parc.com Sat Sep 20 11:20:06 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Sat, 20 Sep 2014 18:20:06 +0000 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: <541DB844.6040402@ics.uci.edu> References: <541DB844.6040402@ics.uci.edu> Message-ID: <592B44F3-A841-4EAC-80F2-750534E69F9C@parc.com> In CCNx 1.0, we have HMAC-SHA256 running alongside RSA-SHA256/512. We require a keyid field in the signature algorithm section. This could be a SHA256 of the key, or we do allow it to be any identifier agreed upon by the two parties as part of a key exchange protocol. For example, two nodes could begin using integers 0, 1, 2, ... for some given namespace. They would only need to remember 2 identifiers at a time (the current one and the last one, to handle packets in flight during change-over). The keyid does not need to be globally significant like for RSA, as MACs should really be pair-wise authentications. Obviously, the key itself is never communicated outside a key exchange protocol. A previous poster was correct. The HMAC key can be any length up to and including the hash block length (which may be greater than the output length). Section 2 of the RFC also states that if the key is longer than the block length, one should hash the key with the hash function, then use that as the key. The keyid would be the hash of that hash (obviously not wanting to put the key in the keyid). For HMAC, section 2 recommends keys the same length as the hash function output. We have a specific identifier for each validation algorithm, whether it is CRC-32c, HMAC-SHA256, RSA-256, etc. See http://www.ccnx.org/pubs/ccnx-mosko-tlvmessages-01.txt, sec. 3.6. Thus, if we wanted to use HMAC-BLAKE2s-32, for example, that would end up as a new crypto-suite identifier for us. We do not identify the signing algorithm apart from the hashing algorithm. 
Marc On Sep 20, 2014, at 10:24 AM, GTS wrote: > Hi, > > adding an HMAC-based authenticator option to secure NDN content is a good idea. > It certainly makes sense for intranets and other intra-AS settings. > > I believe that CCNx has introduced an HMAC option quite some time ago. > It might be worthwhile to check with PARC and not to reinvent the wheel. > > Cheers, > Gene > > > > > > > > On 9/19/14, 11:12 AM, Adeola Bannis wrote: >> Hello all, >> >> I am proposing to add an HMAC type, using SHA256 as the hash function, to the signature types defined at http://named-data.net/doc/NDN-TLV/current/signature.html. This will enable communication with symmetric keys, which reduces the signing and verification load on resource-constrained devices. >> >> The proposal is attached. Please review it and reply with any comments or suggestions. >> >> Thanks, >> Adeola >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From abannis at ucla.edu Sat Sep 20 11:36:00 2014 From: abannis at ucla.edu (Adeola Bannis) Date: Sat, 20 Sep 2014 11:36:00 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: <592B44F3-A841-4EAC-80F2-750534E69F9C@parc.com> References: <541DB844.6040402@ics.uci.edu> <592B44F3-A841-4EAC-80F2-750534E69F9C@parc.com> Message-ID: It sounds like the current proposal for NDN is in line with what's defined in CCNx. 
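Marc's keyid rule can be sketched as follows: for a key no longer than the block length, the keyid is the SHA-256 of the key; for a longer key, RFC 2104 says to use the hashed key as the effective key, so the keyid becomes the hash of that hash. Either way, the key itself never appears in the packet. This is an illustration of the description above, not code from the CCNx codebase:

```python
import hashlib

BLOCK_LEN = 64  # SHA-256 block length in bytes

def hmac_keyid(key: bytes) -> bytes:
    """Derive an identifier for an HMAC key without revealing the key."""
    if len(key) > BLOCK_LEN:
        # Long key: RFC 2104 folds it to SHA256(key) before use, so the
        # keyid is the hash of that hashed (effective) key.
        effective = hashlib.sha256(key).digest()
    else:
        effective = key
    return hashlib.sha256(effective).digest()
```

The alternative Marc mentions — small integers agreed during key exchange — avoids hashing entirely, at the cost of the two parties tracking the mapping themselves.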
On Sat, Sep 20, 2014 at 11:20 AM, wrote: > In CCNx 1.0, we have HMAC-SHA256 running along side RSA-SHA256/512. > > We require a keyid field in the signature algorithm section. This could > be a SHA256 of the key, or we do allow it to be any identifier agreed upon > by the two parties as part of a key exchange protocol. For example, two > nodes could begin using integers 0, 1, 2, ? for some given namespace. They > would only need to remember 2 identifiers at a time (the current and the > last one to handle packets in flight during change over). The keyid does > not need to be globally significant like for RSA, as MACs should really be > pair-wise authentications. > > Obviously, the key itself is never communicated outside a key exchange > protocol. > > A previous poster was correct. The HMAC key can be any length up to and > including the hash block length (which may be greater than the output > length). Section 2 of the the RFC also states that if the key is longer > than the block length, one should hash the key with the hash function, then > use that as the key. The keyid would be the hash of that hash (obviously > not wanting to put the key in the keyid). For HMAC, section 2 recommends > keys the same length as the hash function output. > > We have a specific identifier for each validation algorithm, whether it is > CRC-32c, HMAC-SHA256, RSA-256,etc. See > http://www.ccnx.org/pubs/ccnx-mosko-tlvmessages-01.txt, sec. 3.6. Thus, > if we wanted to use HMAC-BLAKE2s-32, for example, that would end up as a > new crypto-suite identifier for us. We do not identifier the signing > algorithm apart from the hashing algorithm. > > Marc > > > > On Sep 20, 2014, at 10:24 AM, GTS wrote: > > Hi, > > adding an HMAC-based authenticator option to secure NDN content is a good > idea. > It certainly makes sense for intranets and other intra-AS settings. > > I believe that CCNx has introduced an HMAC option quite some time ago. 
> It might be worthwhile to check with PARC and not to reinvent the wheel. > > Cheers, > Gene > > > > > > > > On 9/19/14, 11:12 AM, Adeola Bannis wrote: > > Hello all, > > I am proposing to add an HMAC type, using SHA256 as the hash function, > to the signature types defined at > http://named-data.net/doc/NDN-TLV/current/signature.html. This will > enable communication with symmetric keys, which reduces the signing and > verification load on resource-constrained devices. > > The proposal is attached. Please review it and reply with any comments > or suggestions. > > Thanks, > Adeola > > > _______________________________________________ > Ndn-interest mailing listNdn-interest at lists.cs.ucla.eduhttp://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravi.ravindran at gmail.com Sat Sep 20 11:41:58 2014 From: ravi.ravindran at gmail.com (Ravi Ravindran) Date: Sat, 20 Sep 2014 11:41:58 -0700 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> Message-ID: I agree on the 2B length field comment; that should be variable, to accommodate ICN interfacing with high-speed optical networks, or in general future requirements to ship large (GB) chunks of data. Regards, Ravi On Sat, Sep 20, 2014 at 11:15 AM, Tai-Lin Chu wrote: > I had thought about these questions, but I want to know your ideas > besides typed components: > 1. LPM allows "data discovery". How will exact match do similar things? > 2. Will removing selectors improve performance? How do we use other > faster techniques to replace selectors? > 3. Fixed byte length and type. I agree more that the type can be a fixed > byte, but 2 bytes for length might not be enough for the future. > > > On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: > > > > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: > > > >>> I know how to make #2 flexible enough to do the things I can envision > we need to do, and with a few simple conventions on how the registry of > types is managed. > >> > >> Could you share it with us? > >> > > Sure. Here's a strawman. > > > > The type space is 16 bits, so you have 65,536 types. > > > > The type space is currently shared with the types used for the entire > protocol, which gives us two options: > > (1) we reserve a range for name component types. 
Given the likelihood > there will be at least as much and probably more need to component types > than protocol extensions, we could reserve 1/2 of the type space, giving us > 32K types for name components. > > (2) since there is no parsing ambiguity between name components and > other fields of the protocol (sine they are sub-types of the name type) we > could reuse numbers and thereby have an entire 65K name component types. > > > > We divide the type space into regions, and manage it with a registry. If > we ever get to the point of creating an IETF standard, IANA has 25 years of > experience running registries and there are well-understood rule sets for > different kinds of registries (open, requires a written spec, requires > standards approval). > > > > - We allocate one ?default" name component type for ?generic name?, > which would be used on name prefixes and other common cases where there are > no special semantics on the name component. > > - We allocate a range of name component types, say 1024, to globally > understood types that are part of the base or extension NDN specifications > (e.g. chunk#, version#, etc. > > - We reserve some portion of the space for unanticipated uses (say > another 1024 types) > > - We give the rest of the space to application assignment. > > > > Make sense? > > > > > >>> While I?m sympathetic to that view, there are three ways in which > Moore?s law or hardware tricks will not save us from performance flaws in > the design > >> > >> we could design for performance, > > That?s not what people are advocating. We are advocating that we *not* > design for known bad performance and hope serendipity or Moore?s Law will > come to the rescue. > > > >> but I think there will be a turning > >> point when the slower design starts to become "fast enough?. > > Perhaps, perhaps not. 
Relative performance is what matters so things > that don?t get faster while others do tend to get dropped or not used > because they impose a performance penalty relative to the things that go > faster. There is also the ?low-end? phenomenon where impovements in > technology get applied to lowering cost rather than improving performance. > For those environments bad performance just never get better. > > > >> Do you > >> think there will be some design of ndn that will *never* have > >> performance improvement? > >> > > I suspect LPM on data will always be slow (relative to the other > functions). > > i suspect exclusions will always be slow because they will require extra > memory references. > > > > However I of course don?t claim to clairvoyance so this is just > speculation based on 35+ years of seeing performance improve by 4 orders of > magnitude and still having to worry about counting cycles and memory > references? > > > >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) > wrote: > >>> > >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: > >>> > >>>> We should not look at a certain chip nowadays and want ndn to perform > >>>> well on it. It should be the other way around: once ndn app becomes > >>>> popular, a better chip will be designed for ndn. > >>>> > >>> While I?m sympathetic to that view, there are three ways in which > Moore?s law or hardware tricks will not save us from performance flaws in > the design: > >>> a) clock rates are not getting (much) faster > >>> b) memory accesses are getting (relatively) more expensive > >>> c) data structures that require locks to manipulate successfully will > be relatively more expensive, even with near-zero lock contention. > >>> > >>> The fact is, IP *did* have some serious performance flaws in its > design. We just forgot those because the design elements that depended on > those mistakes have fallen into disuse. The poster children for this are: > >>> 1. IP options. 
Nobody can use them because they are too slow on modern > forwarding hardware, so they can?t be reliably used anywhere > >>> 2. the UDP checksum, which was a bad design when it was specified and > is now a giant PITA that still causes major pain in working around. > >>> > >>> I?m afraid students today are being taught the that designers of IP > were flawless, as opposed to very good scientists and engineers that got > most of it right. > >>> > >>>> I feel the discussion today and yesterday has been off-topic. Now I > >>>> see that there are 3 approaches: > >>>> 1. we should not define a naming convention at all > >>>> 2. typed component: use tlv type space and add a handful of types > >>>> 3. marked component: introduce only one more type and add additional > >>>> marker space > >>>> > >>> I know how to make #2 flexible enough to do what things I can envision > we need to do, and with a few simple conventions on how the registry of > types is managed. > >>> > >>> It is just as powerful in practice as either throwing up our hands and > letting applications design their own mutually incompatible schemes or > trying to make naming conventions with markers in a way that is fast to > generate/parse and also resilient against aliasing. > >>> > >>>> Also everybody thinks that the current utf8 marker naming convention > >>>> needs to be revised. > >>>> > >>>> > >>>> > >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: > >>>>> Would that chip be suitable, i.e. can we expect most names to fit in > (the > >>>>> magnitude of) 96 bytes? What length are names usually in current NDN > >>>>> experiments? > >>>>> > >>>>> I guess wide deployment could make for even longer names. Related: > Many URLs > >>>>> I encounter nowadays easily don't fit within two 80-column text > lines, and > >>>>> NDN will have to carry more information than URLs, as far as I see. 
> >>>>> > >>>>> > >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: > >>>>> > >>>>> In fact, the index in separate TLV will be slower on some > architectures, > >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in > memory, > >>>>> then any subsequent memory is accessed only as two adjacent 32-byte > blocks > >>>>> (there can be at most 5 blocks available at any one time). If you > need to > >>>>> switch between arrays, it would be very expensive. If you have to > read past > >>>>> the name to get to the 2nd array, then read it, then back up to get > to the > >>>>> name, it will be pretty expensive too. > >>>>> > >>>>> Marc > >>>>> > >>>>> On Sep 18, 2014, at 2:02 PM, > >>>>> wrote: > >>>>> > >>>>> Does this make that much difference? > >>>>> > >>>>> If you want to parse the first 5 components, one way to do it is: > >>>>> > >>>>> Read the index, find entry 5, then read in that many bytes from the > start > >>>>> offset of the beginning of the name. > >>>>> OR > >>>>> Start reading the name, (find size + move) 5 times. > >>>>> > >>>>> How much speed are you getting from one to the other? You seem to > imply > >>>>> that the first one is faster. I don't think this is the case. > >>>>> > >>>>> In the first one you'll probably have to get the cache line for the > index, > >>>>> then all the required cache lines for the first 5 components. For > the > >>>>> second, you'll have to get all the cache lines for the first 5 > components. > >>>>> Given an assumption that a cache miss is way more expensive than > >>>>> evaluating a number and computing an addition, you might find that > the > >>>>> performance of the index is actually slower than the performance of > the > >>>>> direct access. > >>>>> > >>>>> Granted, there is a case where you don't access the name at all, for > >>>>> example, if you just get the offsets and then send the offsets as > >>>>> parameters to another processor/GPU/NPU/etc.
In this case you may > see a > >>>>> gain IF there are more cache line misses in reading the name than in > >>>>> reading the index. So, if the regular part of the name that you're > >>>>> parsing is bigger than the cache line (64 bytes?) and the name is to > be > >>>>> processed by a different processor, then you might see some > performance > >>>>> gain in using the index, but in all other circumstances I bet this > is not > >>>>> the case. I may be wrong, haven't actually tested it. > >>>>> > >>>>> This is all to say, I don't think we should be designing the > protocol with > >>>>> only one architecture in mind. (The architecture of sending the name > to a > >>>>> different processor than the index.) > >>>>> > >>>>> If you have numbers that show that the index is faster I would like > to see > >>>>> under what conditions and architectural assumptions. > >>>>> > >>>>> Nacho > >>>>> > >>>>> (I may have misinterpreted your description so feel free to correct > me if > >>>>> I'm wrong.) > >>>>> > >>>>> > >>>>> -- > >>>>> Nacho (Ignacio) Solis > >>>>> Protocol Architect > >>>>> Principal Scientist > >>>>> Palo Alto Research Center (PARC) > >>>>> +1(650)812-4458 > >>>>> Ignacio.Solis at parc.com > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" < > massimo.gallo at alcatel-lucent.com> > >>>>> wrote: > >>>>> > >>>>> Indeed each component's offset must be encoded using a fixed amount > of > >>>>> bytes: > >>>>> > >>>>> i.e., > >>>>> Type = Offsets > >>>>> Length = 10 Bytes > >>>>> Value = Offset1(1byte), Offset2(1byte), ... > >>>>> > >>>>> You may also imagine having an "Offset_2byte" type if your name is > too > >>>>> long. > >>>>> > >>>>> Max > >>>>> > >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: > >>>>> > >>>>> if you do not need the entire hierarchical structure (suppose you only > >>>>> want the first x components) you can directly have it using the > >>>>> offsets.
With the Nested TLV structure you have to iteratively parse > >>>>> the first x-1 components. With the offset structure you can directly > >>>>> access the first x components. > >>>>> > >>>>> I don't get it. What you described only works if the "offset" is > >>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 > >>>>> offsets to get to the x offset. > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > >>>>> wrote: > >>>>> > >>>>> On 17/09/2014 14:56, Mark Stapp wrote: > >>>>> > >>>>> ah, thanks - that's helpful. I thought you were saying "I like the > >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what > >>>>> you > >>>>> _do_ prefer, though. it sounds like you're describing an entirely > >>>>> different > >>>>> scheme where the info that describes the name-components is ... > >>>>> someplace > >>>>> other than _in_ the name-components. is that correct? when you say > >>>>> "field > >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? > >>>>> > >>>>> Correct. > >>>>> In particular, with our name encoding, a TLV indicates the name > >>>>> hierarchy > >>>>> with offsets in the name and other TLV(s) indicate the offset to use > >>>>> in > >>>>> order to retrieve special components. > >>>>> As for the field separator, it is something like "/". Aliasing is > >>>>> avoided as > >>>>> you do not rely on field separators to parse the name; you use the > >>>>> "offset > >>>>> TLV" to do that. > >>>>> > >>>>> So now, it may be an aesthetic question but: > >>>>> > >>>>> if you do not need the entire hierarchical structure (suppose you only > >>>>> want > >>>>> the first x components) you can directly have it using the offsets. > >>>>> With the > >>>>> Nested TLV structure you have to iteratively parse the first x-1 > >>>>> components. > >>>>> With the offset structure you can directly access the first x > >>>>> components.
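The offset-table-versus-nested-parsing tradeoff being argued here can be made concrete with a small sketch. This is illustrative Python, not the NDN or CCN wire format: the one-byte type code (8) and one-byte T/L fields are invented for the example, and the one-byte-per-entry offset table only works while every offset fits in a single byte — which is exactly Tai-Lin's point that variable-length offsets would put you back to sequential parsing.

```python
def encode_nested(components):
    """Encode components back to back as [T][L][V] records, one byte each for T and L."""
    out = bytearray()
    for c in components:
        out += bytes([8, len(c)]) + c          # type 8 is an arbitrary placeholder
    return bytes(out)

def build_offset_table(buf):
    """Fixed-width (one byte per entry) offset table, as in Massimo's example."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 2 + buf[pos + 1]                # skip T, L, and the value
    return offsets

def nth_component_nested(buf, x):
    """Reach component x by walking x-1 TL headers: O(x) dependent reads."""
    pos = 0
    for _ in range(x - 1):
        pos += 2 + buf[pos + 1]
    return buf[pos + 2 : pos + 2 + buf[pos + 1]]

def nth_component_indexed(buf, offsets, x):
    """Reach component x in one step through the offset table: O(1)."""
    start = offsets[x - 1]
    return buf[start + 2 : start + 2 + buf[start + 1]]
```

Whether the O(1) lookup wins in practice is exactly Nacho's cache-line question: the table itself costs an extra memory reference, so the index only pays off when it avoids more cache misses than it adds.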
> >>>>> > >>>>> Max > >>>>> > >>>>> > >>>>> -- Mark > >>>>> > >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: > >>>>> > >>>>> The why is simple: > >>>>> > >>>>> You use a lot of "generic component type" and very few "specific > >>>>> component type". You are imposing types for every component in order > >>>>> to > >>>>> handle a few exceptions (segmentation, etc.). You create a rule > >>>>> (specify > >>>>> the component's type) to handle exceptions! > >>>>> > >>>>> I would prefer not to have typed components. Instead I would prefer > >>>>> to > >>>>> have the name as a simple sequence of bytes with a field separator. Then, > >>>>> outside the name, if you have some components that could be used at > >>>>> the network layer (e.g. a TLV field), you simply need something that > >>>>> indicates the offset allowing you to retrieve the version, > >>>>> segment, etc. in the name... > >>>>> > >>>>> > >>>>> Max > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On 16/09/2014 20:33, Mark Stapp wrote: > >>>>> > >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: > >>>>> > >>>>> I think we agree on the small number of "component types". > >>>>> However, if you have a small number of types, you will end up with > >>>>> names > >>>>> containing many generic component types and few specific > >>>>> component > >>>>> types. Because the component type specification is an > >>>>> exception in the name, I would prefer something that specifies the > >>>>> component's > >>>>> type only when needed (something like UTF8 conventions, but ones that > >>>>> applications MUST use). > >>>>> > >>>>> so ... I can't quite follow that. the thread has had some > >>>>> explanation > >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) > >>>>> and > >>>>> there's been email trying to explain that applications don't have to > >>>>> use types if they don't need to.
your email sounds like "I prefer > >>>>> the > >>>>> UTF8 convention", but it doesn't say why you have that preference in > >>>>> the face of the points about the problems. can you say why it is > >>>>> that > >>>>> you express a preference for the "convention" with problems? > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>>> _______________________________________________ > >>>>> Ndn-interest mailing list > >>>>> Ndn-interest at lists.cs.ucla.edu > >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From oran at cisco.com Sat Sep 20 12:14:34 2014 From: oran at cisco.com (Dave Oran (oran)) Date: Sat, 20 Sep 2014 19:14:34 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> Message-ID: <86D039FF-F464-4699-90F9-8E93C1879042@cisco.com> On Sep 20, 2014, at 2:15 PM, Tai-Lin Chu wrote: > I had thought about these questions, but I want to know your idea > besides typed component: > 1. LPM allows "data discovery". How will exact match do similar things? It doesn't. You layer discovery on top. And you do it in a way that does not permit cache exploration by snoopers as a bad security side effect. There are any number of possible discovery protocols, including having "pointer" objects anchored at various places in the name tree. It would be really useful to start some research on the alternative of layering discovery above the NDN L3 as opposed to building it in as is done currently with selectors. > 2. will removing selectors improve performance? Undoubtedly. Especially in caches. They are less problematic if executed by the publisher application. > How do we use other > faster techniques to replace selectors? If by "faster" you mean just the individual lookup, that's not the only issue. More important is the "cache chasing" problem, which can result in search multipliers and many round trips. > 3. fixed byte length and type.
I agree more that type can be fixed > byte, but 2 bytes for length might not be enough for the future. > Whether we need individual TLVs to be longer than 64K is worth further discussion. The question of whether entire NDN object messages need to be longer than 64K could be either a dependent design decision if the overall wire format uses recursive TLV, or independent if there is a fixed header that permits the overall length to exceed 65K. Here's one argument for not bothering to go over 64K. In 1978 or so, IP packets rarely exceeded 576 bytes. By 1982 this increased to 1500 or so due to Ethernet. Larger packets were attempted using IP fragmentation, which turned out to be a HORRIBLE design in practice, so PMTU discovery was proposed and adopted with limited success. Over the next two decades networks increased in speed by about 3 orders of magnitude, and inherent error rates (except for wireless) reduced by about 5 orders of magnitude. In that time, all we have seen in terms of L3 packet size increase has been the shift to 9K jumbo frames under carefully controlled deployments. We now have hardware (e.g. NICs) that happily runs at 10Gbps with no performance penalty due to packets of 9K or smaller. Next-gen NICs (and router switching hardware) will run cost-effectively at between 40-100 Gig without increasing the packet size, since nobody uses interrupt-driven code or per-packet serial operations anymore (and hasn't for 5-10 years). There is certainly an argument to be made that the system can be made faster and a bit simpler by allowing bigger L3 packets/messages. So far, however, we aren't even close to needing to go over 64K, and there is no existence proof of a big win on future hardware from doing so. 30+ years of dramatic network performance increase has not pushed us beyond 64K.
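To make the 64K boundary concrete: with a fixed 2-byte length field, a single TLV simply cannot describe a value larger than 65535 bytes, so bigger application objects must be segmented above this layer. Here is a sketch, assuming a 2-byte type and 2-byte length in network byte order — an illustrative layout, not the actual NDN or CCN header format:

```python
import struct

MAX_TLV_VALUE = 0xFFFF  # 65535: the largest value length a 2-byte L can express

def encode_tlv(t, value):
    """Fixed 2-byte T and 2-byte L; anything bigger than 64K-1 must be
    segmented into multiple TLVs above this layer rather than encoded here."""
    if len(value) > MAX_TLV_VALUE:
        raise ValueError("value exceeds 2-byte length field; segment it instead")
    return struct.pack("!HH", t, len(value)) + value

def decode_tlv(buf, pos=0):
    """Return (type, value, next_pos). The fixed-width header makes this a
    single unconditional 4-byte read, with no variable-length number parsing."""
    t, l = struct.unpack_from("!HH", buf, pos)
    return t, buf[pos + 4 : pos + 4 + l], pos + 4 + l
```

The hard cap is the cost; the single fixed-width read in `decode_tlv` is the simplicity the fixed field buys.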
That said, if the complexity cost or the cost for low-end memory-starved systems is small enough to justify larger length fields, then the balance may tip in favor of the future-proofing even in the absence of a compelling case today or in the near future. DaveO. > > On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >> >> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >> >>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>> >>> Could you share it with us? >>> >> Sure. Here's a strawman. >> >> The type space is 16 bits, so you have 65,536 types. >> >> The type space is currently shared with the types used for the entire protocol, which gives us two options: >> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >> >> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >> >> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.)
>> - We reserve some portion of the space for unanticipated uses (say another 1024 types) >> - We give the rest of the space to application assignment. >> >> Make sense? >> >> >>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>> >>> we could design for performance, >> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. >> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough". >> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. >> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >> I suspect LPM on data will always be slow (relative to the other functions). >> I suspect exclusions will always be slow because they will require extra memory references. >> >> However I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references. >> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>> >>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>> well on it. It should be the other way around: once ndn app becomes >>>>> popular, a better chip will be designed for ndn.
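The strawman's partition of the 16-bit component-type space can be sketched as a small registry table. The boundary values below are hypothetical: the strawman fixes only the rough region sizes (one generic type, ~1024 base-spec types, ~1024 reserved, the rest for applications), not concrete code points.

```python
GENERIC_NAME = 0x0000                    # the single "default" generic name type
BASE_SPEC = range(0x0001, 0x0401)        # ~1024 globally understood types (chunk#, version#, ...)
RESERVED = range(0x0401, 0x0801)         # ~1024 held back for unanticipated uses
APPLICATION = range(0x0801, 0x10000)     # remainder assigned by applications

def classify(t):
    """Map a 16-bit name-component type code to its (hypothetical) registry region."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if t == GENERIC_NAME:
        return "generic"
    if t in BASE_SPEC:
        return "base-spec"
    if t in RESERVED:
        return "reserved"
    return "application"
```

Under option (2) above, this whole 65K space would be private to name components; under option (1), only half of it would be.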
>>>>> [...] >> From Ignacio.Solis at parc.com Sat Sep 20 13:56:29 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Sat, 20 Sep 2014 20:56:29 +0000
Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> Message-ID: Dave already touched on some of these concerns but let me add a couple of things. On 9/20/14, 11:15 AM, "Tai-Lin Chu" wrote: >1. LPM allows "data discovery". How will exact match do similar things? LPM in the reverse path creates more problems than it solves. It works in some cases where you have only 2 nodes in the network, but when you have a large complicated topology it doesn't help that much. Are you going to request the right-most child on every hop? What if you have cross traffic for the same prefix? You may never get to what you're looking for because there is always a newer match. The way to solve these problems is to know more about the namespace. If you know more about the namespace, then why did you need this type of discovery in the first place? Discovery at the forwarder is a privacy problem. Do you allow people to ask for "/" and then start exploring everything you have in the cache? Or do you plan to limit this with a set of rules? Discovery is needed, but what we need is not "forwarder-level" discovery. If you want a system that allows discovery and exploration of the caches at the forwarder level, then create a protocol to do that. If you want a discovery protocol to run at the transport level, then create a protocol that does that.
If you need a discovery protocol at the application level, then create one of those. To top it off, LPM for interests is a very complicated problem if you don't have selectors. You're going to end up inserting termination markers to do exact match so that you don't get random stuff you don't want. At that point you might as well move to exact matching. Selectors by themselves are a problem; see below. Finally, it's unlikely that medium and big routers will implement LPM and selector matching. Cisco doesn't think they work, Alcatel-Lucent doesn't think they work, PARC doesn't think they work; Huawei, what do you think? Ericsson? Juniper? Anybody? >2. will removing selectors improve performance? How do we use other >faster techniques to replace selectors? Selectors are needed if you use LPM for interest matching; otherwise you can't do much with LPM. If your argument is that selectors are useful because they are used for discovery, then why not use real selectors that implement full regular expressions? Why not a full query language? In terms of performance, what is the current limit on selectors? How many excludes can I have? (And what effect does that have on router performance?) Are these enough? How many round trips do I need to take to discover the data that I want? I've always found this a funny argument because you're basically saying: I'm not sure what I want (hence the discovery protocol), but I'll know what I want when I see it. Well, maybe if you had a real protocol that could specify what you wanted, you would have gotten it in one round trip. So this is general protocol performance. I would venture a guess that for most situations you can come up with that require LPM and selectors, you're either talking about a situation where you need a custom protocol (like some form of exploratory ad-hoc networks), or you're relying on flooding the whole network, or you're trying to solve a transport/application level problem. >3. fixed byte length and type.
>3. fixed byte length and type. >I agree more that type can be fixed >byte, but 2 bytes for length might not be enough for future. This is a valid question: is 2 bytes enough for length? The first thing I want to say is that we should not confuse the length of the network TLV format with application-level objects. This is like arguing about the block size of hard disks. As Dave mentioned, so far we haven't had a lot of problems in getting to line rate with "small objects". We (the network) are interested in things like the MTUs of various networks and how they interact with application-layer data blobs, but we don't need to make all of these the same. If you have a 4K movie that takes 25 Gigs, it's unlikely that's going to be a single network unit. When you read these off disk, when you move them through the filesystem or OS, it's going to be done in things smaller than 25 Gigs. It's true that some optical links can have large envelopes, but that's not a problem. We can always encapsulate many little packets into a big bundle. The question to ask would be performance and overhead. How much overhead do we have by needing to use headers for every "regular packet" inside one of these 25 Gig envelopes? I guess if your encoding is inefficient this could be high, but so far this doesn't seem to be the case (at least not for CCN; not sure about NDN overhead). However, for the nearish future it's hard to see all links being able to support this (especially wireless links, which, as you can imagine, are growing in popularity). This means that we'll need to chop this 25 Gig thing up into smaller units for the rest of the network. This implies fragmentation if the original message was 25 Gigs. Fragmentation is bad. It's expensive and breaks a lot of things (or at least makes them harder). Why not just use reasonable-size messages and bundle them when we need to send them in a large envelope? In terms of the TLVs used for large files. 
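[A minimal sketch of the fixed-width framing under discussion -- the type code here is invented: T and L are each 2 bytes on the wire, and a large file's total size rides inside an ordinary 8-byte value, so the network-level parser never needs an L field wider than 2 bytes:]

```python
import struct

T_FILE_LENGTH = 0x0F10  # hypothetical type code, not from any spec

def encode_tlv(t, value):
    # Fixed 2-byte type and 2-byte length, network byte order.
    assert len(value) <= 0xFFFF, "payloads must be chunked below 64 KiB"
    return struct.pack("!HH", t, len(value)) + value

def decode_tlv(buf):
    t, length = struct.unpack_from("!HH", buf, 0)
    return t, buf[4:4 + length]

# A 25 GB movie: its total size fits comfortably in an 8-byte value,
# while the movie itself travels as many small TLV-framed chunks.
tlv = encode_tlv(T_FILE_LENGTH, struct.pack("!Q", 25 * 10**9))
t, v = decode_tlv(tlv)
assert t == T_FILE_LENGTH and struct.unpack("!Q", v)[0] == 25 * 10**9
```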
Well, you can always have a field for "File Length", of size "8", so you can include a 64-bit number that refers to the actual file size. There is no need for the network-level TLV parsing engine to deal with the overhead of 8-byte L fields or with the need to process variable-length encodings. Nacho > >On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >> >> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >> >>>> I know how to make #2 flexible enough to do what things I can >>>>envision we need to do, and with a few simple conventions on how the >>>>registry of types is managed. >>> >>> Could you share it with us? >>> >> Sure. Here's a strawman. >> >> The type space is 16 bits, so you have 65,536 types. >> >> The type space is currently shared with the types used for the entire >>protocol, which gives us two options: >> (1) we reserve a range for name component types. Given the likelihood >>there will be at least as much and probably more need for component types >>than protocol extensions, we could reserve 1/2 of the type space, giving >>us 32K types for name components. >> (2) since there is no parsing ambiguity between name components and >>other fields of the protocol (since they are sub-types of the name type) >>we could reuse numbers and thereby have an entire 65K name component >>types. >> >> We divide the type space into regions, and manage it with a registry. >>If we ever get to the point of creating an IETF standard, IANA has 25 >>years of experience running registries and there are well-understood >>rule sets for different kinds of registries (open, requires a written >>spec, requires standards approval). >> >> - We allocate one "default" name component type for "generic name", >>which would be used on name prefixes and other common cases where there >>are no special semantics on the name component. 
>> - We allocate a range of name component types, say 1024, to globally >>understood types that are part of the base or extension NDN >>specifications (e.g. chunk#, version#, etc.) >> - We reserve some portion of the space for unanticipated uses (say >>another 1024 types) >> - We give the rest of the space to application assignment. >> >> Make sense? >> >> >>>> While I'm sympathetic to that view, there are three ways in which >>>>Moore's law or hardware tricks will not save us from performance flaws >>>>in the design >>> >>> we could design for performance, >> That's not what people are advocating. We are advocating that we *not* >>design for known bad performance and hope serendipity or Moore's Law >>will come to the rescue. >> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough". >> Perhaps, perhaps not. Relative performance is what matters, so things >>that don't get faster while others do tend to get dropped or not used >>because they impose a performance penalty relative to the things that go >>faster. There is also the "low-end" phenomenon where improvements in >>technology get applied to lowering cost rather than improving >>performance. For those environments bad performance just never gets >>better. >> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >> I suspect LPM on data will always be slow (relative to the other >>functions). >> I suspect exclusions will always be slow because they will require >>extra memory references. >> >> However I of course don't claim to clairvoyance, so this is just >>speculation based on 35+ years of seeing performance improve by 4 orders >>of magnitude and still having to worry about counting cycles and memory >>references... 
>> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>> >>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>> well on it. It should be the other way around: once ndn app becomes >>>>> popular, a better chip will be designed for ndn. >>>>> >>>> While I'm sympathetic to that view, there are three ways in which >>>>Moore's law or hardware tricks will not save us from performance flaws >>>>in the design: >>>> a) clock rates are not getting (much) faster >>>> b) memory accesses are getting (relatively) more expensive >>>> c) data structures that require locks to manipulate successfully will >>>>be relatively more expensive, even with near-zero lock contention. >>>> >>>> The fact is, IP *did* have some serious performance flaws in its >>>>design. We just forgot those because the design elements that depended >>>>on those mistakes have fallen into disuse. The poster children for >>>>this are: >>>> 1. IP options. Nobody can use them because they are too slow on >>>>modern forwarding hardware, so they can't be reliably used anywhere >>>> 2. the UDP checksum, which was a bad design when it was specified and >>>>is now a giant PITA that still causes major pain in working around. >>>> >>>> I'm afraid students today are being taught that the designers of IP >>>>were flawless, as opposed to very good scientists and engineers that >>>>got most of it right. >>>> >>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>> see that there are 3 approaches: >>>>> 1. we should not define a naming convention at all >>>>> 2. typed component: use tlv type space and add a handful of types >>>>> 3. marked component: introduce only one more type and add additional >>>>> marker space >>>>> >>>> I know how to make #2 flexible enough to do what things I can >>>>envision we need to do, and with a few simple conventions on how the >>>>registry of types is managed. 
>>>> >>>> It is just as powerful in practice as either throwing up our hands >>>>and letting applications design their own mutually incompatible >>>>schemes or trying to make naming conventions with markers in a way >>>>that is fast to generate/parse and also resilient against aliasing. >>>> >>>>> Also everybody thinks that the current utf8 marker naming convention >>>>> needs to be revised. >>>>> >>>>> >>>>> >>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>> Would that chip be suitable, i.e. can we expect most names to fit >>>>>>in (the >>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>> experiments? >>>>>> >>>>>> I guess wide deployment could make for even longer names. Related: >>>>>>Many URLs >>>>>> I encounter nowadays easily don't fit within two 80-column text >>>>>>lines, and >>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>> >>>>>> >>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>> >>>>>> In fact, the index in separate TLV will be slower on some >>>>>>architectures, >>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in >>>>>>memory, >>>>>> then any subsequent memory is accessed only as two adjacent 32-byte >>>>>>blocks >>>>>> (there can be at most 5 blocks available at any one time). If you >>>>>>need to >>>>>> switch between arrays, it would be very expensive. If you have to >>>>>>read past >>>>>> the name to get to the 2nd array, then read it, then back up to get >>>>>>to the >>>>>> name, it will be pretty expensive too. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>> wrote: >>>>>> >>>>>> Does this make that much difference? >>>>>> >>>>>> If you want to parse the first 5 components, one way to do it is: >>>>>> >>>>>> Read the index, find entry 5, then read in that many bytes from the >>>>>>start >>>>>> offset of the beginning of the name. >>>>>> OR >>>>>> Start reading name, (find size + move ) 5 times. 
>>>>>> >>>>>> How much speed are you getting from one to the other? You seem to >>>>>>imply >>>>>> that the first one is faster. I don't think this is the case. >>>>>> >>>>>> In the first one you'll probably have to get the cache line for the >>>>>>index, >>>>>> then all the required cache lines for the first 5 components. For >>>>>>the >>>>>> second, you'll have to get all the cache lines for the first 5 >>>>>>components. >>>>>> Given an assumption that a cache miss is way more expensive than >>>>>> evaluating a number and computing an addition, you might find that >>>>>>the >>>>>> performance of the index is actually slower than the performance of >>>>>>the >>>>>> direct access. >>>>>> >>>>>> Granted, there is a case where you don't access the name at all, for >>>>>> example, if you just get the offsets and then send the offsets as >>>>>> parameters to another processor/GPU/NPU/etc. In this case you may >>>>>>see a >>>>>> gain IF there are more cache line misses in reading the name than in >>>>>> reading the index. So, if the regular part of the name that you're >>>>>> parsing is bigger than the cache line (64 bytes?) and the name is >>>>>>to be >>>>>> processed by a different processor, then you might see some >>>>>>performance >>>>>> gain in using the index, but in all other circumstances I bet this >>>>>>is not >>>>>> the case. I may be wrong, haven't actually tested it. >>>>>> >>>>>> This is all to say, I don't think we should be designing the >>>>>>protocol with >>>>>> only one architecture in mind. (The architecture of sending the >>>>>>name to a >>>>>> different processor than the index). >>>>>> >>>>>> If you have numbers that show that the index is faster, I would like >>>>>>to see >>>>>> under what conditions and architectural assumptions. >>>>>> >>>>>> Nacho >>>>>> >>>>>> (I may have misinterpreted your description so feel free to correct >>>>>>me if >>>>>> I'm wrong.) 
>>>>>> >>>>>> >>>>>> -- >>>>>> Nacho (Ignacio) Solis >>>>>> Protocol Architect >>>>>> Principal Scientist >>>>>> Palo Alto Research Center (PARC) >>>>>> +1(650)812-4458 >>>>>> Ignacio.Solis at parc.com >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>> >>>>>> wrote: >>>>>> >>>>>> Indeed each components' offset must be encoded using a fixed amount >>>>>>of >>>>>> bytes: >>>>>> >>>>>> i.e., >>>>>> Type = Offsets >>>>>> Length = 10 Bytes >>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>> >>>>>> You may also imagine to have a "Offset_2byte" type if your name is >>>>>>too >>>>>> long. >>>>>> >>>>>> Max >>>>>> >>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>> >>>>>> if you do not need the entire hierarchal structure (suppose you only >>>>>> want the first x components) you can directly have it using the >>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>> the first x-1 components. With the offset structure you cane >>>>>>directly >>>>>> access to the firs x components. >>>>>> >>>>>> I don't get it. What you described only works if the "offset" is >>>>>> encoded in fixed bytes. With varNum, you will still need to parse >>>>>>x-1 >>>>>> offsets to get to the x offset. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>> wrote: >>>>>> >>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>> >>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand >>>>>>what >>>>>> you >>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>> different >>>>>> scheme where the info that describes the name-components is ... >>>>>> someplace >>>>>> other than _in_ the name-components. is that correct? when you say >>>>>> "field >>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>> >>>>>> Correct. 
>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>> hierarchy >>>>>> with offsets in the name and other TLV(s) indicates the offset to >>>>>>use >>>>>> in >>>>>> order to retrieve special components. >>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>> avoided as >>>>>> you do not rely on field separators to parse the name; you use the >>>>>> "offset >>>>>> TLV " to do that. >>>>>> >>>>>> So now, it may be an aesthetic question but: >>>>>> >>>>>> if you do not need the entire hierarchal structure (suppose you only >>>>>> want >>>>>> the first x components) you can directly have it using the offsets. >>>>>> With the >>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>> components. >>>>>> With the offset structure you cane directly access to the firs x >>>>>> components. >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> -- Mark >>>>>> >>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>> >>>>>> The why is simple: >>>>>> >>>>>> You use a lot of "generic component type" and very few "specific >>>>>> component type". You are imposing types for every component in order >>>>>> to >>>>>> handle few exceptions (segmentation, etc..). You create a rule >>>>>> (specify >>>>>> the component's type ) to handle exceptions! >>>>>> >>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>> to >>>>>> have the name as simple sequence bytes with a field separator. Then, >>>>>> outside the name, if you have some components that could be used at >>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>> segment, etc in the name... >>>>>> >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>> >>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>> >>>>>> I think we agree on the small number of "component types". 
>>>>>> However, if you have a small number of types, you will end up with >>>>>> names >>>>>> containing many generic components types and few specific >>>>>> components >>>>>> types. Due to the fact that the component type specification is an >>>>>> exception in the name, I would prefer something that specify >>>>>> component's >>>>>> type only when needed (something like UTF8 conventions but that >>>>>> applications MUST use). >>>>>> >>>>>> so ... I can't quite follow that. the thread has had some >>>>>> explanation >>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>> and >>>>>> there's been email trying to explain that applications don't have to >>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>> the >>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>> the face of the points about the problems. can you say why it is >>>>>> that >>>>>> you express a preference for the "convention" with problems ? >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> . 
>>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >> > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Sat Sep 20 14:10:35 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Sat, 20 Sep 2014 21:10:35 +0000 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> Message-ID: <1FF22A0E-E81E-439E-89C9-5C0F93C61A36@parc.com> If one publishes a Content Object at 1 GB, that is the unit of retransmission. What happens when that 1 GB content object hits an access point with a 5% loss rate? Is the access point really going to cache multiple 1 GB objects? If the unit of retransmission can be a fragment, then I would argue that's a content object, and the 1 GB thing is really an aggregation, not an atom that must be fragmented on smaller-MTU networks. Yes, if you have an all-optical network with a very low loss rate (including buffer drops) and won't be hitting anything that needs to fragment, then going with a large MTU might be beneficial. It remains to be shown that one can do efficient cut-through forwarding of a CCN packet. Obviously, if one is exploiting content object hash naming, one cannot do cut-through forwarding because you need to hash before picking the next hop. If content objects are 5 GB, how much packet memory are you going to need on a typical line card? At 40 Gbps, a 5 GB packet takes a full second to serialize, so you could likely have one arriving on each interface. Next-gen chipsets are looking at 400 Gbps, so you could have 10x the interfaces, meaning you would need at least 50 GB of frame memory. You clearly are not multiplexing any real-time traffic on those interfaces... 
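[The line-card numbers can be checked with a few lines of arithmetic (mine, not from the thread; decimal units, one in-flight packet per interface):]

```python
GB = 10**9    # bytes, decimal giga
Gbps = 10**9  # bits per second

packet = 5 * GB  # one 5 GB content object

# Serialization time of one packet on a 40 Gbps link.
serialize_40g = packet * 8 / (40 * Gbps)
assert serialize_40g == 1.0  # a full second per packet

# At 400 Gbps with 10 such interfaces, holding just one in-flight
# packet per interface already costs 10 x 5 GB of frame memory.
interfaces = 10
frame_memory = interfaces * packet
assert frame_memory == 50 * GB
```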
What would be the use-case for such large packets without fragmentation? What does such a system look like? Marc On Sep 20, 2014, at 8:41 PM, Ravi Ravindran wrote: > I agree on the 2B length field comment; that should be variable to accommodate ICN interfacing with high-speed optical networks, or in general future requirements to ship large (GB) chunks of data. > > Regards, > Ravi > > On Sat, Sep 20, 2014 at 11:15 AM, Tai-Lin Chu wrote: > I had thought about these questions, but I want to know your idea > besides typed component: > 1. LPM allows "data discovery". How will exact match do similar things? > 2. will removing selectors improve performance? How do we use other > faster technique to replace selector? > 3. fixed byte length and type. I agree more that type can be fixed > byte, but 2 bytes for length might not be enough for future. > > > On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: > > > > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: > > > >>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. > >> > >> Could you share it with us? > >> > > Sure. Here's a strawman. > > > > The type space is 16 bits, so you have 65,536 types. > > > > The type space is currently shared with the types used for the entire protocol, which gives us two options: > > (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. > > (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. > > > > We divide the type space into regions, and manage it with a registry. 
If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). > > > > - We allocate one ?default" name component type for ?generic name?, which would be used on name prefixes and other common cases where there are no special semantics on the name component. > > - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc. > > - We reserve some portion of the space for unanticipated uses (say another 1024 types) > > - We give the rest of the space to application assignment. > > > > Make sense? > > > > > >>> While I?m sympathetic to that view, there are three ways in which Moore?s law or hardware tricks will not save us from performance flaws in the design > >> > >> we could design for performance, > > That?s not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore?s Law will come to the rescue. > > > >> but I think there will be a turning > >> point when the slower design starts to become "fast enough?. > > Perhaps, perhaps not. Relative performance is what matters so things that don?t get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the ?low-end? phenomenon where impovements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never get better. > > > >> Do you > >> think there will be some design of ndn that will *never* have > >> performance improvement? > >> > > I suspect LPM on data will always be slow (relative to the other functions). > > i suspect exclusions will always be slow because they will require extra memory references. 
> > > > However I of course don?t claim to clairvoyance so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references? > > > >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: > >>> > >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: > >>> > >>>> We should not look at a certain chip nowadays and want ndn to perform > >>>> well on it. It should be the other way around: once ndn app becomes > >>>> popular, a better chip will be designed for ndn. > >>>> > >>> While I?m sympathetic to that view, there are three ways in which Moore?s law or hardware tricks will not save us from performance flaws in the design: > >>> a) clock rates are not getting (much) faster > >>> b) memory accesses are getting (relatively) more expensive > >>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. > >>> > >>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: > >>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can?t be reliably used anywhere > >>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. > >>> > >>> I?m afraid students today are being taught the that designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. > >>> > >>>> I feel the discussion today and yesterday has been off-topic. Now I > >>>> see that there are 3 approaches: > >>>> 1. we should not define a naming convention at all > >>>> 2. typed component: use tlv type space and add a handful of types > >>>> 3. 
marked component: introduce only one more type and add additional > >>>> marker space > >>>> > >>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. > >>> > >>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. > >>> > >>>> Also everybody thinks that the current utf8 marker naming convention > >>>> needs to be revised. > >>>> > >>>> > >>>> > >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: > >>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the > >>>>> magnitude of) 96 bytes? What length are names usually in current NDN > >>>>> experiments? > >>>>> > >>>>> I guess wide deployment could make for even longer names. Related: Many URLs > >>>>> I encounter nowadays easily don't fit within two 80-column text lines, and > >>>>> NDN will have to carry more information than URLs, as far as I see. > >>>>> > >>>>> > >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: > >>>>> > >>>>> In fact, the index in separate TLV will be slower on some architectures, > >>>>> like the ezChip NP4. The NP4 can hold the fist 96 frame bytes in memory, > >>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks > >>>>> (there can be at most 5 blocks available at any one time). If you need to > >>>>> switch between arrays, it would be very expensive. If you have to read past > >>>>> the name to get to the 2nd array, then read it, then backup to get to the > >>>>> name, it will be pretty expensive too. > >>>>> > >>>>> Marc > >>>>> > >>>>> On Sep 18, 2014, at 2:02 PM, > >>>>> wrote: > >>>>> > >>>>> Does this make that much difference? > >>>>> > >>>>> If you want to parse the first 5 components. 
One way to do it is: > >>>>> > >>>>> Read the index, find entry 5, then read in that many bytes from the start > >>>>> offset of the beginning of the name. > >>>>> OR > >>>>> Start reading name, (find size + move ) 5 times. > >>>>> > >>>>> How much speed are you getting from one to the other? You seem to imply > >>>>> that the first one is faster. I don?t think this is the case. > >>>>> > >>>>> In the first one you?ll probably have to get the cache line for the index, > >>>>> then all the required cache lines for the first 5 components. For the > >>>>> second, you?ll have to get all the cache lines for the first 5 components. > >>>>> Given an assumption that a cache miss is way more expensive than > >>>>> evaluating a number and computing an addition, you might find that the > >>>>> performance of the index is actually slower than the performance of the > >>>>> direct access. > >>>>> > >>>>> Granted, there is a case where you don?t access the name at all, for > >>>>> example, if you just get the offsets and then send the offsets as > >>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a > >>>>> gain IF there are more cache line misses in reading the name than in > >>>>> reading the index. So, if the regular part of the name that you?re > >>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be > >>>>> processed by a different processor, then your might see some performance > >>>>> gain in using the index, but in all other circumstances I bet this is not > >>>>> the case. I may be wrong, haven?t actually tested it. > >>>>> > >>>>> This is all to say, I don?t think we should be designing the protocol with > >>>>> only one architecture in mind. (The architecture of sending the name to a > >>>>> different processor than the index). > >>>>> > >>>>> If you have numbers that show that the index is faster I would like to see > >>>>> under what conditions and architectural assumptions. 
> >>>>> > >>>>> Nacho > >>>>> > >>>>> (I may have misinterpreted your description so feel free to correct me if > >>>>> I?m wrong.) > >>>>> > >>>>> > >>>>> -- > >>>>> Nacho (Ignacio) Solis > >>>>> Protocol Architect > >>>>> Principal Scientist > >>>>> Palo Alto Research Center (PARC) > >>>>> +1(650)812-4458 > >>>>> Ignacio.Solis at parc.com > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" > >>>>> wrote: > >>>>> > >>>>> Indeed each components' offset must be encoded using a fixed amount of > >>>>> bytes: > >>>>> > >>>>> i.e., > >>>>> Type = Offsets > >>>>> Length = 10 Bytes > >>>>> Value = Offset1(1byte), Offset2(1byte), ... > >>>>> > >>>>> You may also imagine to have a "Offset_2byte" type if your name is too > >>>>> long. > >>>>> > >>>>> Max > >>>>> > >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: > >>>>> > >>>>> if you do not need the entire hierarchal structure (suppose you only > >>>>> want the first x components) you can directly have it using the > >>>>> offsets. With the Nested TLV structure you have to iteratively parse > >>>>> the first x-1 components. With the offset structure you cane directly > >>>>> access to the firs x components. > >>>>> > >>>>> I don't get it. What you described only works if the "offset" is > >>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 > >>>>> offsets to get to the x offset. > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > >>>>> wrote: > >>>>> > >>>>> On 17/09/2014 14:56, Mark Stapp wrote: > >>>>> > >>>>> ah, thanks - that's helpful. I thought you were saying "I like the > >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what > >>>>> you > >>>>> _do_ prefer, though. it sounds like you're describing an entirely > >>>>> different > >>>>> scheme where the info that describes the name-components is ... > >>>>> someplace > >>>>> other than _in_ the name-components. is that correct? 
when you say > >>>>> "field > >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? > >>>>> > >>>>> Correct. > >>>>> In particular, with our name encoding, a TLV indicates the name > >>>>> hierarchy > >>>>> with offsets in the name and other TLV(s) indicates the offset to use > >>>>> in > >>>>> order to retrieve special components. > >>>>> As for the field separator, it is something like "/". Aliasing is > >>>>> avoided as > >>>>> you do not rely on field separators to parse the name; you use the > >>>>> "offset > >>>>> TLV " to do that. > >>>>> > >>>>> So now, it may be an aesthetic question but: > >>>>> > >>>>> if you do not need the entire hierarchal structure (suppose you only > >>>>> want > >>>>> the first x components) you can directly have it using the offsets. > >>>>> With the > >>>>> Nested TLV structure you have to iteratively parse the first x-1 > >>>>> components. > >>>>> With the offset structure you cane directly access to the firs x > >>>>> components. > >>>>> > >>>>> Max > >>>>> > >>>>> > >>>>> -- Mark > >>>>> > >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: > >>>>> > >>>>> The why is simple: > >>>>> > >>>>> You use a lot of "generic component type" and very few "specific > >>>>> component type". You are imposing types for every component in order > >>>>> to > >>>>> handle few exceptions (segmentation, etc..). You create a rule > >>>>> (specify > >>>>> the component's type ) to handle exceptions! > >>>>> > >>>>> I would prefer not to have typed components. Instead I would prefer > >>>>> to > >>>>> have the name as simple sequence bytes with a field separator. Then, > >>>>> outside the name, if you have some components that could be used at > >>>>> network layer (e.g. a TLV field), you simply need something that > >>>>> indicates which is the offset allowing you to retrieve the version, > >>>>> segment, etc in the name... 
> >>>>> > >>>>> > >>>>> Max > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On 16/09/2014 20:33, Mark Stapp wrote: > >>>>> > >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: > >>>>> > >>>>> I think we agree on the small number of "component types". > >>>>> However, if you have a small number of types, you will end up with > >>>>> names > >>>>> containing many generic components types and few specific > >>>>> components > >>>>> types. Due to the fact that the component type specification is an > >>>>> exception in the name, I would prefer something that specify > >>>>> component's > >>>>> type only when needed (something like UTF8 conventions but that > >>>>> applications MUST use). > >>>>> > >>>>> so ... I can't quite follow that. the thread has had some > >>>>> explanation > >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) > >>>>> and > >>>>> there's been email trying to explain that applications don't have to > >>>>> use types if they don't need to. your email sounds like "I prefer > >>>>> the > >>>>> UTF8 convention", but it doesn't say why you have that preference in > >>>>> the face of the points about the problems. can you say why it is > >>>>> that > >>>>> you express a preference for the "convention" with problems ? > >>>>> > >>>>> Thanks, > >>>>> Mark > >>>>> > >>>>> . 
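The offset-based name encoding debated in the quoted thread above can be sketched in a few lines. This is a toy model under stated assumptions — a flat '/'-separated name plus a table of fixed one-byte end offsets — not any specified NDN/CCNx encoding:

```python
# Toy sketch of the offset scheme from the thread: the name is a flat
# byte string, and a separate "Offsets" table holds one fixed-size end
# offset per component. All encoding choices here are assumptions.

def encode(components):
    """Return (name, offsets): '/'-joined bytes plus per-component end offsets."""
    name = b"/".join(components)
    offsets, pos = [], 0
    for comp in components:
        pos += len(comp)
        offsets.append(pos)   # end offset of this component in `name`
        pos += 1              # skip the '/' separator
    return name, offsets

def first_x(name, offsets, x):
    """Direct access to the first x components: one table read, no TLV walk."""
    return name[:offsets[x - 1]]

name, offsets = encode([b"com", b"example", b"video", b"v1", b"s0"])
assert first_x(name, offsets, 2) == b"com/example"
```

With fixed-width offsets the first-x extraction is a single indexed read; as Tai-Lin notes above, variable-length (varNum) offsets would force parsing x-1 entries anyway, giving up exactly this advantage.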
> >>>>> _______________________________________________
> >>>>> Ndn-interest mailing list
> >>>>> Ndn-interest at lists.cs.ucla.edu
> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2595 bytes
Desc: not available
URL: 
From Marc.Mosko at parc.com Sat Sep 20 14:38:01 2014
From: Marc.Mosko at parc.com (Marc.Mosko at parc.com)
Date: Sat, 20 Sep 2014 21:38:01 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com>
Message-ID: <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com>

I would point out that using LPM on content-object-to-Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. This is probably getting off-topic from the original post about naming conventions.

a. If Interests can be forwarded in multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you could very easily miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large Interest packets. In CCNx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better.

b. Yes, if you just want "latest version" discovery, that should be transitive between caches, but imagine this. You send Interest #1 to cache A, which returns version 100. 
You exclude through 100 and then issue a new Interest. This goes to cache B, which only has version 99, so the Interest times out or is NACK'd. So you think you have it! But cache A already has version 101; you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest-version discovery is difficult with selector-based discovery. From what I saw in CCNx 0.x, one ended up getting an Interest all the way to the authoritative source, because you can never believe an intermediate cache that there's not something more recent.

I'm sure you've walked through cases (a) and (b) in NDN; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three: 2 for content, 1 for device) against selector-based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above.

c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA accesses for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.

d. 
In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP), and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like.

Marc

On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:

> I had thought about these questions, but I want to know your idea
> besides typed component:
> 1. LPM allows "data discovery". How will exact match do similar things?
> 2. will removing selectors improve performance? How do we use other
> faster technique to replace selector?
> 3. fixed byte length and type. I agree more that type can be fixed
> byte, but 2 bytes for length might not be enough for future.
>
> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>
>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>
>>>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>
>>> Could you share it with us?
>>>
>> Sure. Here's a strawman.
>>
>> The type space is 16 bits, so you have 65,536 types.
>>
>> The type space is currently shared with the types used for the entire protocol, which gives us two options:
>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components.
>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types.
>>
>> We divide the type space into regions, and manage it with a registry. 
If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries, and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).
>>
>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component.
>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.).
>> - We reserve some portion of the space for unanticipated uses (say another 1024 types).
>> - We give the rest of the space to application assignment.
>>
>> Make sense?
>>
>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design
>>>
>>> we could design for performance,
>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.
>>
>>> but I think there will be a turning
>>> point when the slower design starts to become "fast enough".
>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used, because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better.
>>
>>> Do you
>>> think there will be some design of ndn that will *never* have
>>> performance improvement?
>>>
>> I suspect LPM on data will always be slow (relative to the other functions).
>> I suspect exclusions will always be slow because they will require extra memory references. 
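Dave's strawman partition of the 16-bit component-type space, quoted above, can be written down as a toy registry. The boundary values below are illustrative assumptions — the strawman only fixes the rough sizes (one generic type, ~1024 well-known, ~1024 reserved, remainder for applications):

```python
# Illustrative registry for a 16-bit name-component type space, following
# the strawman's rough sizes. Exact boundaries are assumed, not specified.
REGIONS = [
    (0x0000, 0x0000, "generic"),      # the single "default" generic-name type
    (0x0001, 0x0400, "well-known"),   # ~1024 base/extension spec types (chunk#, version#, ...)
    (0x0401, 0x0800, "reserved"),     # ~1024 held for unanticipated uses
    (0x0801, 0xFFFF, "application"),  # everything else: application assignment
]

def classify(type_code):
    """Map a 16-bit component type code to its registry region."""
    if not 0 <= type_code <= 0xFFFF:
        raise ValueError("component type must fit in 16 bits")
    for lo, hi, region in REGIONS:
        if lo <= type_code <= hi:
            return region

assert classify(0x0000) == "generic"
assert classify(0x9000) == "application"
```

Whether the well-known region holds 1024 or some other count is exactly the kind of decision the proposed registry (IANA-style, with per-region policies) would pin down.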
>>
>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references...
>>
>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>
>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>
>>>>> We should not look at a certain chip nowadays and want ndn to perform
>>>>> well on it. It should be the other way around: once ndn app becomes
>>>>> popular, a better chip will be designed for ndn.
>>>>>
>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design:
>>>> a) clock rates are not getting (much) faster
>>>> b) memory accesses are getting (relatively) more expensive
>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
>>>>
>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere.
>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around.
>>>>
>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right.
>>>>
>>>>> I feel the discussion today and yesterday has been off-topic. Now I
>>>>> see that there are 3 approaches:
>>>>> 1. we should not define a naming convention at all
>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>> 3. 
marked component: introduce only one more type and add additional >>>>> marker space >>>>> >>>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>> >>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>> >>>>> Also everybody thinks that the current utf8 marker naming convention >>>>> needs to be revised. >>>>> >>>>> >>>>> >>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>> experiments? >>>>>> >>>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>> >>>>>> >>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>> >>>>>> In fact, the index in separate TLV will be slower on some architectures, >>>>>> like the ezChip NP4. The NP4 can hold the fist 96 frame bytes in memory, >>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>> the name to get to the 2nd array, then read it, then backup to get to the >>>>>> name, it will be pretty expensive too. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>> wrote: >>>>>> >>>>>> Does this make that much difference? >>>>>> >>>>>> If you want to parse the first 5 components. 
One way to do it is: >>>>>> >>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>> offset of the beginning of the name. >>>>>> OR >>>>>> Start reading name, (find size + move ) 5 times. >>>>>> >>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>> that the first one is faster. I don?t think this is the case. >>>>>> >>>>>> In the first one you?ll probably have to get the cache line for the index, >>>>>> then all the required cache lines for the first 5 components. For the >>>>>> second, you?ll have to get all the cache lines for the first 5 components. >>>>>> Given an assumption that a cache miss is way more expensive than >>>>>> evaluating a number and computing an addition, you might find that the >>>>>> performance of the index is actually slower than the performance of the >>>>>> direct access. >>>>>> >>>>>> Granted, there is a case where you don?t access the name at all, for >>>>>> example, if you just get the offsets and then send the offsets as >>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>> gain IF there are more cache line misses in reading the name than in >>>>>> reading the index. So, if the regular part of the name that you?re >>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>>> processed by a different processor, then your might see some performance >>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>> the case. I may be wrong, haven?t actually tested it. >>>>>> >>>>>> This is all to say, I don?t think we should be designing the protocol with >>>>>> only one architecture in mind. (The architecture of sending the name to a >>>>>> different processor than the index). >>>>>> >>>>>> If you have numbers that show that the index is faster I would like to see >>>>>> under what conditions and architectural assumptions. 
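The two strategies Nacho weighs above — "find size + move" x times versus one read into a precomputed index — look like this over a toy TLV name (hypothetical 1-byte type and length fields). In Python they do the same work; the thread's real question is which touches fewer cache lines on actual hardware, and only measurement settles that:

```python
# Toy comparison of sequential TLV parsing vs. a precomputed offset index.
# Each component is [type:1][length:1][value]; the layout is hypothetical.

def seq_first_x(buf, x):
    """'Find size + move' x times: the nested-TLV walk."""
    pos = 0
    for _ in range(x):
        length = buf[pos + 1]   # skip 1-byte type, read 1-byte length
        pos += 2 + length
    return buf[:pos]

def indexed_first_x(buf, index, x):
    """Single read into a precomputed end-offset index."""
    return buf[:index[x - 1]]

buf = bytes([8, 3]) + b"com" + bytes([8, 7]) + b"example" + bytes([8, 2]) + b"v1"
index = [5, 14, 18]             # end offsets, computed once at encode time
assert seq_first_x(buf, 2) == indexed_first_x(buf, index, 2)
```

Both return identical prefixes; the trade-off is a possible extra cache-line fetch for the index against x-1 length reads for the walk, which is Nacho's point that the winner depends on architecture.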
>>>>>> >>>>>> Nacho >>>>>> >>>>>> (I may have misinterpreted your description so feel free to correct me if >>>>>> I?m wrong.) >>>>>> >>>>>> >>>>>> -- >>>>>> Nacho (Ignacio) Solis >>>>>> Protocol Architect >>>>>> Principal Scientist >>>>>> Palo Alto Research Center (PARC) >>>>>> +1(650)812-4458 >>>>>> Ignacio.Solis at parc.com >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>> wrote: >>>>>> >>>>>> Indeed each components' offset must be encoded using a fixed amount of >>>>>> bytes: >>>>>> >>>>>> i.e., >>>>>> Type = Offsets >>>>>> Length = 10 Bytes >>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>> >>>>>> You may also imagine to have a "Offset_2byte" type if your name is too >>>>>> long. >>>>>> >>>>>> Max >>>>>> >>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>> >>>>>> if you do not need the entire hierarchal structure (suppose you only >>>>>> want the first x components) you can directly have it using the >>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>> the first x-1 components. With the offset structure you cane directly >>>>>> access to the firs x components. >>>>>> >>>>>> I don't get it. What you described only works if the "offset" is >>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>>> offsets to get to the x offset. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>> wrote: >>>>>> >>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>> >>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>> you >>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>> different >>>>>> scheme where the info that describes the name-components is ... >>>>>> someplace >>>>>> other than _in_ the name-components. is that correct? 
when you say >>>>>> "field >>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>> >>>>>> Correct. >>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>> hierarchy >>>>>> with offsets in the name and other TLV(s) indicates the offset to use >>>>>> in >>>>>> order to retrieve special components. >>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>> avoided as >>>>>> you do not rely on field separators to parse the name; you use the >>>>>> "offset >>>>>> TLV " to do that. >>>>>> >>>>>> So now, it may be an aesthetic question but: >>>>>> >>>>>> if you do not need the entire hierarchal structure (suppose you only >>>>>> want >>>>>> the first x components) you can directly have it using the offsets. >>>>>> With the >>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>> components. >>>>>> With the offset structure you cane directly access to the firs x >>>>>> components. >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> -- Mark >>>>>> >>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>> >>>>>> The why is simple: >>>>>> >>>>>> You use a lot of "generic component type" and very few "specific >>>>>> component type". You are imposing types for every component in order >>>>>> to >>>>>> handle few exceptions (segmentation, etc..). You create a rule >>>>>> (specify >>>>>> the component's type ) to handle exceptions! >>>>>> >>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>> to >>>>>> have the name as simple sequence bytes with a field separator. Then, >>>>>> outside the name, if you have some components that could be used at >>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>> segment, etc in the name... 
>>>>>> >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>> >>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>> >>>>>> I think we agree on the small number of "component types". >>>>>> However, if you have a small number of types, you will end up with >>>>>> names >>>>>> containing many generic components types and few specific >>>>>> components >>>>>> types. Due to the fact that the component type specification is an >>>>>> exception in the name, I would prefer something that specify >>>>>> component's >>>>>> type only when needed (something like UTF8 conventions but that >>>>>> applications MUST use). >>>>>> >>>>>> so ... I can't quite follow that. the thread has had some >>>>>> explanation >>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>> and >>>>>> there's been email trying to explain that applications don't have to >>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>> the >>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>> the face of the points about the problems. can you say why it is >>>>>> that >>>>>> you express a preference for the "convention" with problems ? >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> . 
>>>>>> _______________________________________________
>>>>>> Ndn-interest mailing list
>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2595 bytes
Desc: not available
URL: 
From ravi.ravindran at gmail.com Sat Sep 20 15:56:31 2014
From: ravi.ravindran at gmail.com (Ravi Ravindran)
Date: Sat, 20 Sep 2014 15:56:31 -0700
Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <1FF22A0E-E81E-439E-89C9-5C0F93C61A36@parc.com>
References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <1FF22A0E-E81E-439E-89C9-5C0F93C61A36@parc.com>
Message-ID: 

Hi Marc,

You point to a use case where you might have an end-to-end optical network; Google Fiber-like access networks could be a good example. Another is when you have content producers/distributors collaborating on a high-speed optical backbone to move large amounts of data. But once it interacts with consuming networks, it can be adapted to any level of granularity.

-Ravi

On Sat, Sep 20, 2014 at 2:10 PM, wrote:

> If one publishes a Content Object at 1 GB, that is the unit of
> retransmission. What happens when that 1 GB content object hits an access
> point with a 5% loss rate? Is the access point really going to cache
> multiple 1 GB objects?
>
> If the unit of retransmission can be a fragment, then I would argue that's
> a content object, and the 1 GB thing is really an aggregation, not an atom
> that must be fragmented on smaller-MTU networks.
>
> Yes, if you have an all-optical network with a very low loss rate (including
> buffer drops) and won't be hitting anything that needs to fragment, then
> going with a large MTU might be beneficial. It remains to be shown that
> one can do efficient cut-through forwarding of a CCN packet. 
Obviously, if
> one is exploiting content object hash naming, one cannot do cut-through
> forwarding, because you need to hash before picking the next hop.
>
> If content objects are 5 GB, how much packet memory are you going to need
> on a typical line card? At 40 Gbps, a 5 GB packet takes a full second to
> serialize, so you could likely have one arriving on each interface. Next-gen
> chipsets are looking at 400 Gbps, so you could have 10x interfaces, so you
> would need at least 50 GB of frame memory. You clearly are not multiplexing
> any real-time traffic on those interfaces...
>
> What would be the use case for such large packets without fragmentation?
> What does such a system look like?
>
> Marc
>
> On Sep 20, 2014, at 8:41 PM, Ravi Ravindran wrote:
>
> I agree on the 2B length field comment; that should be variable to
> accommodate ICN interfacing with high-speed optical networks, or in general
> future requirements to ship large (GB) chunks of data.
>
> Regards,
> Ravi
>
> On Sat, Sep 20, 2014 at 11:15 AM, Tai-Lin Chu wrote:
>
>> I had thought about these questions, but I want to know your idea
>> besides typed component:
>> 1. LPM allows "data discovery". How will exact match do similar things?
>> 2. will removing selectors improve performance? How do we use other
>> faster technique to replace selector?
>> 3. fixed byte length and type. I agree more that type can be fixed
>> byte, but 2 bytes for length might not be enough for future.
>>
>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>> >
>> > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>> >
>> >>> I know how to make #2 flexible enough to do what things I can
>> envision we need to do, and with a few simple conventions on how the
>> registry of types is managed.
>> >>
>> >> Could you share it with us?
>> >>
>> > Sure. Here's a strawman.
>> >
>> > The type space is 16 bits, so you have 65,536 types. 
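The frame-memory arithmetic in Marc's reply above can be checked directly; the figures are the thread's own (a 5 GB object on a 40 Gbps link occupies the wire for a full second, and ten interfaces each holding one such object in flight need ~50 GB of buffer):

```python
# Serialization delay and worst-case frame memory for giant content
# objects, using the figures discussed in the thread.
GB = 10**9

def serialization_delay_s(size_bytes, rate_bps):
    """Seconds a packet of size_bytes occupies a link of rate_bps."""
    return size_bytes * 8 / rate_bps

assert serialization_delay_s(5 * GB, 40 * 10**9) == 1.0   # one full second on the wire
assert 10 * 5 * GB == 50 * GB                             # ten interfaces, one object each
```

The same function makes the contrast vivid: a conventional 1500-byte packet at the same 40 Gbps occupies the link for well under a microsecond.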
>> > >> > The type space is currently shared with the types used for the entire >> protocol, that gives us two options: >> > (1) we reserve a range for name component types. Given the likelihood >> there will be at least as much and probably more need to component types >> than protocol extensions, we could reserve 1/2 of the type space, giving us >> 32K types for name components. >> > (2) since there is no parsing ambiguity between name components and >> other fields of the protocol (sine they are sub-types of the name type) we >> could reuse numbers and thereby have an entire 65K name component types. >> > >> > We divide the type space into regions, and manage it with a registry. >> If we ever get to the point of creating an IETF standard, IANA has 25 years >> of experience running registries and there are well-understood rule sets >> for different kinds of registries (open, requires a written spec, requires >> standards approval). >> > >> > - We allocate one ?default" name component type for ?generic name?, >> which would be used on name prefixes and other common cases where there are >> no special semantics on the name component. >> > - We allocate a range of name component types, say 1024, to globally >> understood types that are part of the base or extension NDN specifications >> (e.g. chunk#, version#, etc. >> > - We reserve some portion of the space for unanticipated uses (say >> another 1024 types) >> > - We give the rest of the space to application assignment. >> > >> > Make sense? >> > >> > >> >>> While I?m sympathetic to that view, there are three ways in which >> Moore?s law or hardware tricks will not save us from performance flaws in >> the design >> >> >> >> we could design for performance, >> > That?s not what people are advocating. We are advocating that we *not* >> design for known bad performance and hope serendipity or Moore?s Law will >> come to the rescue. 
>> > >> >> but I think there will be a turning >> >> point when the slower design starts to become "fast enough?. >> > Perhaps, perhaps not. Relative performance is what matters so things >> that don?t get faster while others do tend to get dropped or not used >> because they impose a performance penalty relative to the things that go >> faster. There is also the ?low-end? phenomenon where impovements in >> technology get applied to lowering cost rather than improving performance. >> For those environments bad performance just never get better. >> > >> >> Do you >> >> think there will be some design of ndn that will *never* have >> >> performance improvement? >> >> >> > I suspect LPM on data will always be slow (relative to the other >> functions). >> > i suspect exclusions will always be slow because they will require >> extra memory references. >> > >> > However I of course don?t claim to clairvoyance so this is just >> speculation based on 35+ years of seeing performance improve by 4 orders of >> magnitude and still having to worry about counting cycles and memory >> references? >> > >> >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >> wrote: >> >>> >> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >> >>> >> >>>> We should not look at a certain chip nowadays and want ndn to perform >> >>>> well on it. It should be the other way around: once ndn app becomes >> >>>> popular, a better chip will be designed for ndn. >> >>>> >> >>> While I?m sympathetic to that view, there are three ways in which >> Moore?s law or hardware tricks will not save us from performance flaws in >> the design: >> >>> a) clock rates are not getting (much) faster >> >>> b) memory accesses are getting (relatively) more expensive >> >>> c) data structures that require locks to manipulate successfully will >> be relatively more expensive, even with near-zero lock contention. >> >>> >> >>> The fact is, IP *did* have some serious performance flaws in its >> design. 
We just forgot those because the design elements that depended on >> those mistakes have fallen into disuse. The poster children for this are: >> >>> 1. IP options. Nobody can use them because they are too slow on >> modern forwarding hardware, so they can?t be reliably used anywhere >> >>> 2. the UDP checksum, which was a bad design when it was specified and >> is now a giant PITA that still causes major pain in working around. >> >>> >> >>> I?m afraid students today are being taught the that designers of IP >> were flawless, as opposed to very good scientists and engineers that got >> most of it right. >> >>> >> >>>> I feel the discussion today and yesterday has been off-topic. Now I >> >>>> see that there are 3 approaches: >> >>>> 1. we should not define a naming convention at all >> >>>> 2. typed component: use tlv type space and add a handful of types >> >>>> 3. marked component: introduce only one more type and add additional >> >>>> marker space >> >>>> >> >>> I know how to make #2 flexible enough to do what things I can >> envision we need to do, and with a few simple conventions on how the >> registry of types is managed. >> >>> >> >>> It is just as powerful in practice as either throwing up our hands >> and letting applications design their own mutually incompatible schemes or >> trying to make naming conventions with markers in a way that is fast to >> generate/parse and also resilient against aliasing. >> >>> >> >>>> Also everybody thinks that the current utf8 marker naming convention >> >>>> needs to be revised. >> >>>> >> >>>> >> >>>> >> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >> >>>>> Would that chip be suitable, i.e. can we expect most names to fit >> in (the >> >>>>> magnitude of) 96 bytes? What length are names usually in current NDN >> >>>>> experiments? >> >>>>> >> >>>>> I guess wide deployment could make for even longer names. 
Related: >> Many URLs >> >>>>> I encounter nowadays easily don't fit within two 80-column text >> lines, and >> >>>>> NDN will have to carry more information than URLs, as far as I see. >> >>>>> >> >>>>> >> >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >> >>>>> >> >>>>> In fact, the index in a separate TLV will be slower on some >> architectures, >> >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in >> memory, >> >>>>> then any subsequent memory is accessed only as two adjacent 32-byte >> blocks >> >>>>> (there can be at most 5 blocks available at any one time). If you >> need to >> >>>>> switch between arrays, it would be very expensive. If you have to >> read past >> >>>>> the name to get to the 2nd array, then read it, then back up to get >> to the >> >>>>> name, it will be pretty expensive too. >> >>>>> >> >>>>> Marc >> >>>>> >> >>>>> On Sep 18, 2014, at 2:02 PM, >> >>>>> wrote: >> >>>>> >> >>>>> Does this make that much difference? >> >>>>> >> >>>>> If you want to parse the first 5 components, one way to do it is: >> >>>>> >> >>>>> Read the index, find entry 5, then read in that many bytes from the >> start >> >>>>> offset of the beginning of the name. >> >>>>> OR >> >>>>> Start reading the name, (find size + move) 5 times. >> >>>>> >> >>>>> How much speed are you getting from one to the other? You seem to >> imply >> >>>>> that the first one is faster. I don't think this is the case. >> >>>>> >> >>>>> In the first one you'll probably have to get the cache line for the >> index, >> >>>>> then all the required cache lines for the first 5 components. For >> the >> >>>>> second, you'll have to get all the cache lines for the first 5 >> components. >> >>>>> Given an assumption that a cache miss is way more expensive than >> >>>>> evaluating a number and computing an addition, you might find that >> the >> >>>>> performance of the index is actually slower than the performance of >> the >> >>>>> direct access. 
>> >>>>> >> >>>>> Granted, there is a case where you don't access the name at all, for >> >>>>> example, if you just get the offsets and then send the offsets as >> >>>>> parameters to another processor/GPU/NPU/etc. In this case you may >> see a >> >>>>> gain IF there are more cache line misses in reading the name than in >> >>>>> reading the index. So, if the regular part of the name that you're >> >>>>> parsing is bigger than the cache line (64 bytes?) and the name is >> to be >> >>>>> processed by a different processor, then you might see some >> performance >> >>>>> gain in using the index, but in all other circumstances I bet this >> is not >> >>>>> the case. I may be wrong, haven't actually tested it. >> >>>>> >> >>>>> This is all to say, I don't think we should be designing the >> protocol with >> >>>>> only one architecture in mind. (The architecture of sending the >> name to a >> >>>>> different processor than the index.) >> >>>>> >> >>>>> If you have numbers that show that the index is faster I would like >> to see >> >>>>> under what conditions and architectural assumptions. >> >>>>> >> >>>>> Nacho >> >>>>> >> >>>>> (I may have misinterpreted your description so feel free to correct >> me if >> >>>>> I'm wrong.) >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> Nacho (Ignacio) Solis >> >>>>> Protocol Architect >> >>>>> Principal Scientist >> >>>>> Palo Alto Research Center (PARC) >> >>>>> +1(650)812-4458 >> >>>>> Ignacio.Solis at parc.com >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" < >> massimo.gallo at alcatel-lucent.com> >> >>>>> wrote: >> >>>>> >> >>>>> Indeed each component's offset must be encoded using a fixed amount >> of >> >>>>> bytes: >> >>>>> >> >>>>> i.e., >> >>>>> Type = Offsets >> >>>>> Length = 10 Bytes >> >>>>> Value = Offset1(1byte), Offset2(1byte), ... >> >>>>> >> >>>>> You may also imagine having an "Offset_2byte" type if your name is >> too >> >>>>> long. 
>> >>>>> >> >>>>> Max >> >>>>> >> >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >> >>>>> >> >>>>> if you do not need the entire hierarchical structure (suppose you only >> >>>>> want the first x components) you can directly have it using the >> >>>>> offsets. With the Nested TLV structure you have to iteratively parse >> >>>>> the first x-1 components. With the offset structure you can >> directly >> >>>>> access the first x components. >> >>>>> >> >>>>> I don't get it. What you described only works if the "offset" is >> >>>>> encoded in fixed bytes. With varNum, you will still need to parse >> x-1 >> >>>>> offsets to get to the x-th offset. >> >>>>> >> >>>>> >> >>>>> >> >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> >>>>> wrote: >> >>>>> >> >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >> >>>>> >> >>>>> ah, thanks - that's helpful. I thought you were saying "I like the >> >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand >> what >> >>>>> you >> >>>>> _do_ prefer, though. it sounds like you're describing an entirely >> >>>>> different >> >>>>> scheme where the info that describes the name-components is ... >> >>>>> someplace >> >>>>> other than _in_ the name-components. is that correct? when you say >> >>>>> "field >> >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >> >>>>> >> >>>>> Correct. >> >>>>> In particular, with our name encoding, a TLV indicates the name >> >>>>> hierarchy >> >>>>> with offsets in the name, and other TLV(s) indicate the offset to >> use >> >>>>> in >> >>>>> order to retrieve special components. >> >>>>> As for the field separator, it is something like "/". Aliasing is >> >>>>> avoided as >> >>>>> you do not rely on field separators to parse the name; you use the >> >>>>> "offset >> >>>>> TLV" to do that. 
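[Editor's note] The two parsing strategies debated above (walking nested TLVs component by component versus jumping via a fixed-width offset table) can be sketched as follows. The 1-byte type and 1-byte length fields are illustrative assumptions, not the actual NDN or CCNx wire encoding:

```python
def parse_first_x_sequential(buf: bytes, x: int) -> list:
    """Nested-TLV style: read (type, length, value) triples in order.
    Assumes a 1-byte type and 1-byte length purely for illustration."""
    comps, pos = [], 0
    for _ in range(x):
        length = buf[pos + 1]                    # T at pos, L at pos+1
        comps.append(buf[pos + 2:pos + 2 + length])
        pos += 2 + length                        # advance past this component
    return comps

def parse_first_x_indexed(buf: bytes, offsets: list, x: int) -> list:
    """Offset-table style: a fixed-width index gives each component's start,
    so component x is reachable without walking components 1..x-1."""
    comps = []
    for i in range(x):
        start = offsets[i]
        length = buf[start + 1]
        comps.append(buf[start + 2:start + 2 + length])
    return comps

# The name /a/bb/ccc as three TLVs (type 0x08), plus its 1-byte offset table.
name = bytes([0x08, 1]) + b"a" + bytes([0x08, 2]) + b"bb" + bytes([0x08, 3]) + b"ccc"
offsets = [0, 3, 7]
```

Both return the same components; the disagreement in the thread is about which one touches fewer cache lines, and as the messages above note, that depends on the architecture.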
>> >>>>> >> >>>>> So now, it may be an aesthetic question but: >> >>>>> >> >>>>> if you do not need the entire hierarchical structure (suppose you only >> >>>>> want >> >>>>> the first x components) you can directly have it using the offsets. >> >>>>> With the >> >>>>> Nested TLV structure you have to iteratively parse the first x-1 >> >>>>> components. >> >>>>> With the offset structure you can directly access the first x >> >>>>> components. >> >>>>> >> >>>>> Max >> >>>>> >> >>>>> >> >>>>> -- Mark >> >>>>> >> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >> >>>>> >> >>>>> The why is simple: >> >>>>> >> >>>>> You use a lot of "generic component type" and very few "specific >> >>>>> component type". You are imposing types for every component in order >> >>>>> to >> >>>>> handle a few exceptions (segmentation, etc.). You create a rule >> >>>>> (specify >> >>>>> the component's type) to handle exceptions! >> >>>>> >> >>>>> I would prefer not to have typed components. Instead I would prefer >> >>>>> to >> >>>>> have the name as a simple sequence of bytes with a field separator. Then, >> >>>>> outside the name, if you have some components that could be used at >> >>>>> network layer (e.g. a TLV field), you simply need something that >> >>>>> indicates which is the offset allowing you to retrieve the version, >> >>>>> segment, etc. in the name... >> >>>>> >> >>>>> >> >>>>> Max >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >> >>>>> >> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >>>>> >> >>>>> I think we agree on the small number of "component types". >> >>>>> However, if you have a small number of types, you will end up with >> >>>>> names >> >>>>> containing many generic component types and few specific >> >>>>> component >> >>>>> types. 
Due to the fact that the component type specification is an >> >>>>> exception in the name, I would prefer something that specifies the >> >>>>> component's >> >>>>> type only when needed (something like the UTF8 conventions but that >> >>>>> applications MUST use). >> >>>>> >> >>>>> so ... I can't quite follow that. the thread has had some >> >>>>> explanation >> >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >> >>>>> and >> >>>>> there's been email trying to explain that applications don't have to >> >>>>> use types if they don't need to. your email sounds like "I prefer >> >>>>> the >> >>>>> UTF8 convention", but it doesn't say why you have that preference in >> >>>>> the face of the points about the problems. can you say why it is >> >>>>> that >> >>>>> you express a preference for the "convention" with problems? >> >>>>> >> >>>>> Thanks, >> >>>>> Mark >> >>>>> >> >>>>> . >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>> >> >>>> _______________________________________________ >> >>>> Ndn-interest mailing list >> >>>> Ndn-interest at lists.cs.ucla.edu >> >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>> >> > >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Sat Sep 20 16:47:04 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sat, 20 Sep 2014 16:47:04 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> Message-ID: > If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. Could you explain why the missing-content-object situation happens? 
Also, range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to a ranged exclude is always possible. > You exclude through 100 then issue a new interest. This goes to cache B I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists? c,d In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead. However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm is unfair. On Sat, Sep 20, 2014 at 2:38 PM, wrote: > I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. > > This is probably getting off-topic from the original post about naming conventions. > > a. If Interests can be forwarded in multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. > > b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. 
This goes to cache B who only has version 99, so the interest times out or is NACK'd. So you think you have it! But cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. > > I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. > > c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. > > d. 
In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like. > > Marc > > > On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: > >> I had thought about these questions, but I want to know your idea >> besides typed component: >> 1. LPM allows "data discovery". How will exact match do similar things? >> 2. will removing selectors improve performance? How do we use other >> faster techniques to replace selectors? >> 3. fixed byte length and type. I agree more that type can be a fixed >> byte, but 2 bytes for length might not be enough for the future. >> >> >> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>> >>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>> >>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, with a few simple conventions on how the registry of types is managed. >>>> >>>> Could you share it with us? >>>> >>> Sure. Here's a strawman. >>> >>> The type space is 16 bits, so you have 65,536 types. >>> >>> The type space is currently shared with the types used for the entire protocol, which gives us two options: >>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >>> >>> We divide the type space into regions, and manage it with a registry. 
If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >>> >>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) >>> - We reserve some portion of the space for unanticipated uses (say another 1024 types) >>> - We give the rest of the space to application assignment. >>> >>> Make sense? >>> >>> >>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>>> >>>> we could design for performance, >>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. >>> >>>> but I think there will be a turning >>>> point when the slower design starts to become "fast enough". >>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. >>> >>>> Do you >>>> think there will be some design of ndn that will *never* have >>>> performance improvement? >>>> >>> I suspect LPM on data will always be slow (relative to the other functions). >>> I suspect exclusions will always be slow because they will require extra memory references. 
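[Editor's note] The strawman partition described above (one generic type, a registered global range, a reserved range, and the remainder for applications) could be modeled as below. The specific boundary values are illustrative assumptions of mine; the proposal only fixes the kinds of regions, not their exact numbers:

```python
# Sketch of the strawman 16-bit name-component type registry.
# Boundary values are hypothetical; only the region kinds come from the thread.
GENERIC = 0x0001                         # single "generic name component" type
GLOBAL_RANGE = range(0x0002, 0x0402)     # ~1024 globally understood types
RESERVED_RANGE = range(0x0402, 0x0802)   # ~1024 reserved for unanticipated uses
# everything above RESERVED_RANGE: application assignment

def classify(t: int) -> str:
    """Return which registry region a 16-bit component type falls into."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("name component types are 16 bits")
    if t == GENERIC:
        return "generic"
    if t in GLOBAL_RANGE:
        return "global"      # e.g. chunk#, version# would live here
    if t in RESERVED_RANGE:
        return "reserved"
    return "application"
```

A registry like this is what makes option (2) above workable: applications mint types only from their own region, so they cannot collide with globally understood types.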
>>> >>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references. >>> >>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>> >>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>> >>>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>>> well on it. It should be the other way around: once ndn app becomes >>>>>> popular, a better chip will be designed for ndn. >>>>>> >>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>>>> a) clock rates are not getting (much) faster >>>>> b) memory accesses are getting (relatively) more expensive >>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>>>> >>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere. >>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>>>> >>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. >>>>> >>>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>>> see that there are 3 approaches: >>>>>> 1. we should not define a naming convention at all >>>>>> 2. typed component: use tlv type space and add a handful of types >>>>>> 3. 
marked component: introduce only one more type and add additional >>>>>> marker space >>>>>> >>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, with a few simple conventions on how the registry of types is managed. >>>>> >>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>> >>>>>> Also everybody thinks that the current utf8 marker naming convention >>>>>> needs to be revised. >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>>> experiments? >>>>>>> >>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>>> >>>>>>> >>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>> >>>>>>> In fact, the index in a separate TLV will be slower on some architectures, >>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, >>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>>> the name to get to the 2nd array, then read it, then back up to get to the >>>>>>> name, it will be pretty expensive too. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>> wrote: >>>>>>> >>>>>>> Does this make that much difference? >>>>>>> >>>>>>> If you want to parse the first 5 components, 
one way to do it is: >>>>>>> >>>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>>> offset of the beginning of the name. >>>>>>> OR >>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>> >>>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>>> that the first one is faster. I don't think this is the case. >>>>>>> >>>>>>> In the first one you'll probably have to get the cache line for the index, >>>>>>> then all the required cache lines for the first 5 components. For the >>>>>>> second, you'll have to get all the cache lines for the first 5 components. >>>>>>> Given an assumption that a cache miss is way more expensive than >>>>>>> evaluating a number and computing an addition, you might find that the >>>>>>> performance of the index is actually slower than the performance of the >>>>>>> direct access. >>>>>>> >>>>>>> Granted, there is a case where you don't access the name at all, for >>>>>>> example, if you just get the offsets and then send the offsets as >>>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>>> gain IF there are more cache line misses in reading the name than in >>>>>>> reading the index. So, if the regular part of the name that you're >>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>>>> processed by a different processor, then you might see some performance >>>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>> >>>>>>> This is all to say, I don't think we should be designing the protocol with >>>>>>> only one architecture in mind. (The architecture of sending the name to a >>>>>>> different processor than the index.) >>>>>>> >>>>>>> If you have numbers that show that the index is faster I would like to see >>>>>>> under what conditions and architectural assumptions. 
>>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> (I may have misinterpreted your description so feel free to correct me if >>>>>>> I'm wrong.) >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nacho (Ignacio) Solis >>>>>>> Protocol Architect >>>>>>> Principal Scientist >>>>>>> Palo Alto Research Center (PARC) >>>>>>> +1(650)812-4458 >>>>>>> Ignacio.Solis at parc.com >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>> wrote: >>>>>>> >>>>>>> Indeed each component's offset must be encoded using a fixed amount of >>>>>>> bytes: >>>>>>> >>>>>>> i.e., >>>>>>> Type = Offsets >>>>>>> Length = 10 Bytes >>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>> >>>>>>> You may also imagine having an "Offset_2byte" type if your name is too >>>>>>> long. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>> want the first x components) you can directly have it using the >>>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>>> the first x-1 components. With the offset structure you can directly >>>>>>> access the first x components. >>>>>>> >>>>>>> I don't get it. What you described only works if the "offset" is >>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>>>> offsets to get to the x-th offset. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>> wrote: >>>>>>> >>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>> >>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>>> you >>>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>>> different >>>>>>> scheme where the info that describes the name-components is ... >>>>>>> someplace >>>>>>> other than _in_ the name-components. is that correct? 
when you say >>>>>>> "field >>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>> >>>>>>> Correct. >>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>>> hierarchy >>>>>>> with offsets in the name, and other TLV(s) indicate the offset to use >>>>>>> in >>>>>>> order to retrieve special components. >>>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>>> avoided as >>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>> "offset >>>>>>> TLV" to do that. >>>>>>> >>>>>>> So now, it may be an aesthetic question but: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>> want >>>>>>> the first x components) you can directly have it using the offsets. >>>>>>> With the >>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>> components. >>>>>>> With the offset structure you can directly access the first x >>>>>>> components. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> -- Mark >>>>>>> >>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> The why is simple: >>>>>>> >>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>> component type". You are imposing types for every component in order >>>>>>> to >>>>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>>> (specify >>>>>>> the component's type) to handle exceptions! >>>>>>> >>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>> to >>>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>>> outside the name, if you have some components that could be used at >>>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>> segment, etc. in the name... 
>>>>>>> >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>> >>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> I think we agree on the small number of "component types". >>>>>>> However, if you have a small number of types, you will end up with >>>>>>> names >>>>>>> containing many generic component types and few specific >>>>>>> component >>>>>>> types. Due to the fact that the component type specification is an >>>>>>> exception in the name, I would prefer something that specifies the >>>>>>> component's >>>>>>> type only when needed (something like the UTF8 conventions but that >>>>>>> applications MUST use). >>>>>>> >>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>> explanation >>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>> and >>>>>>> there's been email trying to explain that applications don't have to >>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>> the >>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>> the face of the points about the problems. can you say why it is >>>>>>> that >>>>>>> you express a preference for the "convention" with problems? >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> . 
>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From Marc.Mosko at parc.com Sat Sep 20 22:55:11 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Sun, 21 Sep 2014 05:55:11 +0000 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> Message-ID: On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. > > Could you explain why the missing-content-object situation happens? Also, > range exclusion is just a shorter notation for many explicit excludes; > converting from explicit excludes to a ranged exclude is always > possible. Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions. > >> You exclude through 100 then issue a new interest. This goes to cache B > > I feel this case is invalid because cache A will also get the > interest, and cache A will return v101 if it exists. 
Like you said, if > this goes to cache B only, it means that cache A dies. How do you know > that v101 even exists? I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101. > > > c,d In general I agree that LPM performance is related to the number > of components. In my own thread-safe LPM implementation, I used only > one RWMutex for the whole tree. I don't know whether adding a lock for > every node will be faster or not because of lock overhead. > > However, we should compare (exact match + discovery protocol) vs (ndn > lpm). Comparing performance of exact match to lpm is unfair. Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that. > > > > > > On Sat, Sep 20, 2014 at 2:38 PM, wrote: >> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. >> >> This is probably getting off-topic from the original post about naming conventions. >> >> a. If Interests can be forwarded in multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. >> >> b. Yes, if you just want the "latest version"
discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B who only has version 99, so the interest times out or is NACK'd. So you think you have it! But, cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. >> >> I'm sure you've walked through cases (a) and (b) in ndn, I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. >> >> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one.
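Case (b) can be sketched as a toy simulation. The cache contents, the lowest-version-first reply policy, and the strict round-robin forwarder below are simplifying assumptions for illustration, not actual NDN semantics:

```python
# Toy model of case (b): two caches, a forwarder that round-robins between
# them, and a consumer doing "exclude what I've seen" latest-version discovery.
# Caches reply with the lowest version above the exclusion floor -- a
# simplification, but consistent with A returning v100 while holding v101.
cache_a = {100, 101}
cache_b = {99}
replicas = [cache_a, cache_b]

def reply(cache, floor):
    """Lowest cached version above `floor`, or None (timeout/NACK)."""
    above = [v for v in cache if v > floor]
    return min(above) if above else None

floor, last_seen = 0, None
for cache in replicas:            # round-robin: A first, then B
    got = reply(cache, floor)
    if got is None:               # timeout: consumer concludes it is done
        break
    last_seen, floor = got, got   # exclude through the version just seen

print(last_seen)                  # 100 -- consumer believes v100 is latest
print(max(cache_a))               # 101 -- but cache A already holds v101
```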
In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. >> >> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like. >> >> Marc >> >> >> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: >> >>> I had thought about these questions, but I want to know your idea >>> besides typed component: >>> 1. LPM allows "data discovery". How will exact match do similar things? >>> 2. will removing selectors improve performance? How do we use other >>> faster techniques to replace selectors? >>> 3. fixed byte length and type. I agree more that type can be fixed >>> byte, but 2 bytes for length might not be enough for the future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>>> >>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>>> >>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>> >>>>> Could you share it with us? >>>>> >>>> Sure. Here's a strawman. >>>> >>>> The type space is 16 bits, so you have 65,536 types. >>>> >>>> The type space is currently shared with the types used for the entire protocol, which gives us two options: >>>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components.
>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >>>> >>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >>>> >>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.). >>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types) >>>> - We give the rest of the space to application assignment. >>>> >>>> Make sense? >>>> >>>> >>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>>>> >>>>> we could design for performance, >>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. >>>> >>>>> but I think there will be a turning >>>>> point when the slower design starts to become "fast enough". >>>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance.
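Dave's strawman registry split can be sketched as constants. Only the single "generic" type and the two 1024-sized ranges come from his message; the concrete numeric boundaries below are invented for illustration:

```python
# Hedged sketch of the strawman type registry: one generic type, a 1024-type
# range for base/extension spec types, a 1024-type reserved range, and the
# rest of the 16-bit space for applications. Boundary values are illustrative.
TYPE_GENERIC    = 0x0000                              # the one "default" type
BASE_SPEC_TYPES = range(0x0001, 0x0001 + 1024)        # chunk#, version#, ...
RESERVED_TYPES  = range(0x0401, 0x0401 + 1024)        # unanticipated uses
# everything from 0x0801 through 0xFFFF: application-assigned

def registry_region(t):
    """Classify a 16-bit name component type into its registry region."""
    if t == TYPE_GENERIC:
        return "generic"
    if t in BASE_SPEC_TYPES:
        return "base-spec"
    if t in RESERVED_TYPES:
        return "reserved"
    return "application"

print(registry_region(0x0002))   # base-spec
print(registry_region(0x9000))   # application
```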
For those environments bad performance just never gets better. >>>> >>>>> Do you >>>>> think there will be some design of ndn that will *never* have >>>>> performance improvement? >>>>> >>>> I suspect LPM on data will always be slow (relative to the other functions). >>>> I suspect exclusions will always be slow because they will require extra memory references. >>>> >>>> However I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... >>>> >>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>>> >>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>>> >>>>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>>>> well on it. It should be the other way around: once ndn apps become >>>>>>> popular, a better chip will be designed for ndn. >>>>>>> >>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>>>>> a) clock rates are not getting (much) faster >>>>>> b) memory accesses are getting (relatively) more expensive >>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>>>>> >>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere >>>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around.
>>>>>> >>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right. >>>>>> >>>>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>>>> see that there are 3 approaches: >>>>>>> 1. we should not define a naming convention at all >>>>>>> 2. typed component: use tlv type space and add a handful of types >>>>>>> 3. marked component: introduce only one more type and add additional >>>>>>> marker space >>>>>>> >>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>> >>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>>> >>>>>>> Also everybody thinks that the current utf8 marker naming convention >>>>>>> needs to be revised. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>>>> experiments? >>>>>>>> >>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>>>> >>>>>>>> >>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>> >>>>>>>> In fact, the index in a separate TLV will be slower on some architectures, >>>>>>>> like the ezChip NP4.
The NP4 can hold the first 96 frame bytes in memory, >>>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>>>> the name to get to the 2nd array, then read it, then back up to get to the >>>>>>>> name, it will be pretty expensive too. >>>>>>>> >>>>>>>> Marc >>>>>>>> >>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>> wrote: >>>>>>>> >>>>>>>> Does this make that much difference? >>>>>>>> >>>>>>>> If you want to parse the first 5 components, one way to do it is: >>>>>>>> >>>>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>>>> offset of the beginning of the name. >>>>>>>> OR >>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>> >>>>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>>>> that the first one is faster. I don't think this is the case. >>>>>>>> >>>>>>>> In the first one you'll probably have to get the cache line for the index, >>>>>>>> then all the required cache lines for the first 5 components. For the >>>>>>>> second, you'll have to get all the cache lines for the first 5 components. >>>>>>>> Given an assumption that a cache miss is way more expensive than >>>>>>>> evaluating a number and computing an addition, you might find that the >>>>>>>> performance of the index is actually slower than the performance of the >>>>>>>> direct access. >>>>>>>> >>>>>>>> Granted, there is a case where you don't access the name at all, for >>>>>>>> example, if you just get the offsets and then send the offsets as >>>>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>>>> gain IF there are more cache line misses in reading the name than in >>>>>>>> reading the index. So, if the regular part of the name that you're >>>>>>>> parsing is bigger than the cache line (64 bytes?)
and the name is to be >>>>>>>> processed by a different processor, then you might see some performance >>>>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>>> >>>>>>>> This is all to say, I don't think we should be designing the protocol with >>>>>>>> only one architecture in mind. (The architecture of sending the name to a >>>>>>>> different processor than the index). >>>>>>>> >>>>>>>> If you have numbers that show that the index is faster I would like to see >>>>>>>> under what conditions and architectural assumptions. >>>>>>>> >>>>>>>> Nacho >>>>>>>> >>>>>>>> (I may have misinterpreted your description so feel free to correct me if >>>>>>>> I'm wrong.) >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nacho (Ignacio) Solis >>>>>>>> Protocol Architect >>>>>>>> Principal Scientist >>>>>>>> Palo Alto Research Center (PARC) >>>>>>>> +1(650)812-4458 >>>>>>>> Ignacio.Solis at parc.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>> wrote: >>>>>>>> >>>>>>>> Indeed each component's offset must be encoded using a fixed amount of >>>>>>>> bytes: >>>>>>>> >>>>>>>> i.e., >>>>>>>> Type = Offsets >>>>>>>> Length = 10 Bytes >>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>> >>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too >>>>>>>> long. >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>> >>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>> want the first x components) you can directly have it using the >>>>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>>>> the first x-1 components. With the offset structure you can directly >>>>>>>> access the first x components. >>>>>>>> >>>>>>>> I don't get it. What you described only works if the "offset" is >>>>>>>> encoded in fixed bytes.
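The trade-off being debated (a fixed-width offset table vs. iteratively walking nested TLVs) can be sketched in a few lines. The wire layouts and component names below are invented for illustration, not the actual NDN or CCNx encodings:

```python
# Toy comparison of the two name encodings discussed above: a fixed-width
# offset table gives O(1) access to the i-th component, while a nested
# TLV-style encoding forces an O(i) walk over earlier components.
import struct

components = [b"parc", b"csl", b"sensor", b"temp", b"v42"]

# Encoding 1: raw name bytes plus a table of 1-byte end offsets.
name_blob = b"".join(components)
ends, acc = [], 0
for c in components:
    acc += len(c)
    ends.append(acc)
offset_table = struct.pack("%dB" % len(ends), *ends)

def component_via_table(table, blob, i):
    # One indexed read into the table, then a direct slice.
    start = table[i - 1] if i > 0 else 0
    return blob[start:table[i]]

# Encoding 2: nested TLV-style, 1-byte length prefix per component.
tlv_blob = b"".join(bytes([len(c)]) + c for c in components)

def component_via_tlv(blob, i):
    # Must skip i earlier (length, value) pairs to reach component i.
    pos = 0
    for _ in range(i):
        pos += 1 + blob[pos]
    ln = blob[pos]
    return blob[pos + 1 : pos + 1 + ln]

print(component_via_table(offset_table, name_blob, 2))  # b'sensor'
print(component_via_tlv(tlv_blob, 2))                   # b'sensor'
```

As Tai-Lin notes, the O(1) property only holds because the offsets are fixed-width; with variable-length offset numbers the table itself must be walked.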
With varNum, you will still need to parse x-1 >>>>>>>> offsets to get to the x-th offset. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>> >>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>>>> you >>>>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>>>> different >>>>>>>> scheme where the info that describes the name-components is ... >>>>>>>> someplace >>>>>>>> other than _in_ the name-components. is that correct? when you say >>>>>>>> "field >>>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>>> >>>>>>>> Correct. >>>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>>>> hierarchy >>>>>>>> with offsets in the name and other TLV(s) indicate the offset to use >>>>>>>> in >>>>>>>> order to retrieve special components. >>>>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>>>> avoided as >>>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>>> "offset >>>>>>>> TLV" to do that. >>>>>>>> >>>>>>>> So now, it may be an aesthetic question but: >>>>>>>> >>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>> want >>>>>>>> the first x components) you can directly have it using the offsets. >>>>>>>> With the >>>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>>> components. >>>>>>>> With the offset structure you can directly access the first x >>>>>>>> components. >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> >>>>>>>> -- Mark >>>>>>>> >>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> The why is simple: >>>>>>>> >>>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>>> component type".
You are imposing types for every component in order >>>>>>>> to >>>>>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>>>> (specify >>>>>>>> the component's type) to handle exceptions! >>>>>>>> >>>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>>> to >>>>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>>>> outside the name, if you have some components that could be used at >>>>>>>> the network layer (e.g. a TLV field), you simply need something that >>>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>>> segment, etc. in the name... >>>>>>>> >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>> >>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> I think we agree on the small number of "component types". >>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>> names >>>>>>>> containing many generic component types and few specific >>>>>>>> component >>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>> exception in the name, I would prefer something that specifies the >>>>>>>> component's >>>>>>>> type only when needed (something like the UTF8 conventions but that >>>>>>>> applications MUST use). >>>>>>>> >>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>> explanation >>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>>> and >>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>> the >>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>> that >>>>>>>> you express a preference for the "convention" with problems?
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From Marc.Mosko at parc.com Sat Sep 20 22:57:03 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Sun, 21 Sep 2014 05:57:03 +0000 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <1FF22A0E-E81E-439E-89C9-5C0F93C61A36@parc.com> Message-ID: <6134CB3B-B827-4F4B-8F7C-F957A4DC02D3@parc.com> >> it can be adapted to any level of granularity. So by that you mean re-signed and re-published as a new set of content objects with a more friendly MTU for the non-optical network? Also, in my example my math was off a little. A 5 GByte object at 40 Gbps is 1 second, not 1/8th of a second, of serialization delay. Marc On Sep 21, 2014, at 12:56 AM, Ravi Ravindran wrote: > Hi Marc, > > You point to a use case where you might have an end-to-end optical network; Google Fiber-like access networks could be a good example. Another is when you have content producers/distributors collaborate on a high speed optical backbone to move large amounts of data, but once it interacts with consuming networks, it can be adapted to any level of granularity. > > -Ravi > > On Sat, Sep 20, 2014 at 2:10 PM, wrote: > If one publishes a Content Object at 1 GB, that is the unit of retransmission. What happens when that 1 GB content object hits an access point with a 5% loss rate? Is the access point really going to cache multiple 1 GB objects? > > If the unit of retransmission can be a fragment, then I would argue that's a content object and the 1 GB thing is really an aggregation, not an atom that must be fragmented on smaller MTU networks.
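Marc's serialization-delay correction at the top of this message can be checked directly:

```python
# Checking the corrected figure: serializing a 5 GByte content object onto a
# 40 Gbps link takes 1 second (the earlier figure of 125 ms was 1/8th of that).
OBJECT_BYTES = 5 * 10**9     # 5 GB content object
LINK_BPS = 40 * 10**9        # 40 Gbps line rate

delay_s = OBJECT_BYTES * 8 / LINK_BPS
print(delay_s)               # 1.0 second of serialization delay
```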
> > Yes, if you have an all-optical network with a very low loss rate (including buffer drops) and won't be hitting anything that needs to fragment, then going with a large MTU might be beneficial. It remains to be shown that one can do efficient cut-through forwarding of a CCN packet. Obviously if one is exploiting content object hash naming one cannot do cut-through forwarding, because you need to hash before picking the next hop. > > If content objects are 5 GB, how much packet memory are you going to need on a typical line card? At 40 Gbps, a 5 GB packet takes 125 milliseconds, so you could likely have one arriving on each interface. Next-gen chipsets are looking at 400 Gbps, so you could have 10x interfaces, so you would need at least 50 GB of frame memory? You clearly are not multiplexing any real-time traffic on those interfaces... > > What would be the use-case for such large packets without fragmentation? What does such a system look like? > > Marc > > On Sep 20, 2014, at 8:41 PM, Ravi Ravindran wrote: > >> I agree on the 2B length field comment; that should be variable to accommodate ICN interfacing with high speed optical networks, or in general future requirements to ship large (GB) chunks of data. >> >> Regards, >> Ravi >> >> On Sat, Sep 20, 2014 at 11:15 AM, Tai-Lin Chu wrote: >> I had thought about these questions, but I want to know your idea >> besides typed component: >> 1. LPM allows "data discovery". How will exact match do similar things? >> 2. will removing selectors improve performance? How do we use other >> faster techniques to replace selectors? >> 3. fixed byte length and type. I agree more that type can be fixed >> byte, but 2 bytes for length might not be enough for the future.
>> >> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >> > >> > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >> > >> >>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >> >> >> >> Could you share it with us? >> >> >> > Sure. Here's a strawman. >> > >> > The type space is 16 bits, so you have 65,536 types. >> > >> > The type space is currently shared with the types used for the entire protocol, which gives us two options: >> > (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >> > (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >> > >> > We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >> > >> > - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >> > - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.). >> > - We reserve some portion of the space for unanticipated uses (say another 1024 types) >> > - We give the rest of the space to application assignment. >> > >> > Make sense?
>> > >> > >> >>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >> >> >> >> we could design for performance, >> > That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. >> > >> >> but I think there will be a turning >> >> point when the slower design starts to become "fast enough". >> > Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. >> > >> >> Do you >> >> think there will be some design of ndn that will *never* have >> >> performance improvement? >> >> >> > I suspect LPM on data will always be slow (relative to the other functions). >> > I suspect exclusions will always be slow because they will require extra memory references. >> > >> > However I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... >> > >> >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >> >>> >> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >> >>> >> >>>> We should not look at a certain chip nowadays and want ndn to perform >> >>>> well on it. It should be the other way around: once ndn apps become >> >>>> popular, a better chip will be designed for ndn.
>> >>>> >> >>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >> >>> a) clock rates are not getting (much) faster >> >>> b) memory accesses are getting (relatively) more expensive >> >>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >> >>> >> >>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >> >>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere >> >>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >> >>> >> >>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right. >> >>> >> >>>> I feel the discussion today and yesterday has been off-topic. Now I >> >>>> see that there are 3 approaches: >> >>>> 1. we should not define a naming convention at all >> >>>> 2. typed component: use tlv type space and add a handful of types >> >>>> 3. marked component: introduce only one more type and add additional >> >>>> marker space >> >>>> >> >>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >> >>> >> >>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.
>> >>> >> >>>> Also everybody thinks that the current utf8 marker naming convention >> >>>> needs to be revised. >> >>>> >> >>>> >> >>>> >> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >> >>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >> >>>>> magnitude of) 96 bytes? What length are names usually in current NDN >> >>>>> experiments? >> >>>>> >> >>>>> I guess wide deployment could make for even longer names. Related: Many URLs >> >>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >> >>>>> NDN will have to carry more information than URLs, as far as I see. >> >>>>> >> >>>>> >> >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >> >>>>> >> >>>>> In fact, the index in a separate TLV will be slower on some architectures, >> >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, >> >>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >> >>>>> (there can be at most 5 blocks available at any one time). If you need to >> >>>>> switch between arrays, it would be very expensive. If you have to read past >> >>>>> the name to get to the 2nd array, then read it, then back up to get to the >> >>>>> name, it will be pretty expensive too. >> >>>>> >> >>>>> Marc >> >>>>> >> >>>>> On Sep 18, 2014, at 2:02 PM, >> >>>>> wrote: >> >>>>> >> >>>>> Does this make that much difference? >> >>>>> >> >>>>> If you want to parse the first 5 components, one way to do it is: >> >>>>> >> >>>>> Read the index, find entry 5, then read in that many bytes from the start >> >>>>> offset of the beginning of the name. >> >>>>> OR >> >>>>> Start reading the name, (find size + move) 5 times. >> >>>>> >> >>>>> How much speed are you getting from one to the other? You seem to imply >> >>>>> that the first one is faster. I don't think this is the case.
>> >>>>> >> >>>>> In the first one you'll probably have to get the cache line for the index, >> >>>>> then all the required cache lines for the first 5 components. For the >> >>>>> second, you'll have to get all the cache lines for the first 5 components. >> >>>>> Given an assumption that a cache miss is way more expensive than >> >>>>> evaluating a number and computing an addition, you might find that the >> >>>>> performance of the index is actually slower than the performance of the >> >>>>> direct access. >> >>>>> >> >>>>> Granted, there is a case where you don't access the name at all, for >> >>>>> example, if you just get the offsets and then send the offsets as >> >>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >> >>>>> gain IF there are more cache line misses in reading the name than in >> >>>>> reading the index. So, if the regular part of the name that you're >> >>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >> >>>>> processed by a different processor, then you might see some performance >> >>>>> gain in using the index, but in all other circumstances I bet this is not >> >>>>> the case. I may be wrong, haven't actually tested it. >> >>>>> >> >>>>> This is all to say, I don't think we should be designing the protocol with >> >>>>> only one architecture in mind. (The architecture of sending the name to a >> >>>>> different processor than the index). >> >>>>> >> >>>>> If you have numbers that show that the index is faster I would like to see >> >>>>> under what conditions and architectural assumptions. >> >>>>> >> >>>>> Nacho >> >>>>> >> >>>>> (I may have misinterpreted your description so feel free to correct me if >> >>>>> I'm wrong.)
>> >>>>> >> >>>>> >> >>>>> -- >> >>>>> Nacho (Ignacio) Solis >> >>>>> Protocol Architect >> >>>>> Principal Scientist >> >>>>> Palo Alto Research Center (PARC) >> >>>>> +1(650)812-4458 >> >>>>> Ignacio.Solis at parc.com >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >> >>>>> wrote: >> >>>>> >> >>>>> Indeed each component's offset must be encoded using a fixed amount of >> >>>>> bytes: >> >>>>> >> >>>>> i.e., >> >>>>> Type = Offsets >> >>>>> Length = 10 Bytes >> >>>>> Value = Offset1(1byte), Offset2(1byte), ... >> >>>>> >> >>>>> You may also imagine having an "Offset_2byte" type if your name is too >> >>>>> long. >> >>>>> >> >>>>> Max >> >>>>> >> >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >> >>>>> >> >>>>> if you do not need the entire hierarchical structure (suppose you only >> >>>>> want the first x components) you can directly have it using the >> >>>>> offsets. With the Nested TLV structure you have to iteratively parse >> >>>>> the first x-1 components. With the offset structure you can directly >> >>>>> access the first x components. >> >>>>> >> >>>>> I don't get it. What you described only works if the "offset" is >> >>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >> >>>>> offsets to get to the x-th offset. >> >>>>> >> >>>>> >> >>>>> >> >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> >>>>> wrote: >> >>>>> >> >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >> >>>>> >> >>>>> ah, thanks - that's helpful. I thought you were saying "I like the >> >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >> >>>>> you >> >>>>> _do_ prefer, though. it sounds like you're describing an entirely >> >>>>> different >> >>>>> scheme where the info that describes the name-components is ... >> >>>>> someplace >> >>>>> other than _in_ the name-components. is that correct? when you say >> >>>>> "field >> >>>>> separator", what do you mean (since that's not a "TL" from a TLV)?
>> >>>>> >> >>>>> Correct. >> >>>>> In particular, with our name encoding, a TLV indicates the name >> >>>>> hierarchy >> >>>>> with offsets in the name and other TLV(s) indicate the offset to use >> >>>>> in >> >>>>> order to retrieve special components. >> >>>>> As for the field separator, it is something like "/". Aliasing is >> >>>>> avoided as >> >>>>> you do not rely on field separators to parse the name; you use the >> >>>>> "offset >> >>>>> TLV" to do that. >> >>>>> >> >>>>> So now, it may be an aesthetic question but: >> >>>>> >> >>>>> if you do not need the entire hierarchical structure (suppose you only >> >>>>> want >> >>>>> the first x components) you can directly have it using the offsets. >> >>>>> With the >> >>>>> Nested TLV structure you have to iteratively parse the first x-1 >> >>>>> components. >> >>>>> With the offset structure you can directly access the first x >> >>>>> components. >> >>>>> >> >>>>> Max >> >>>>> >> >>>>> >> >>>>> -- Mark >> >>>>> >> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >> >>>>> >> >>>>> The why is simple: >> >>>>> >> >>>>> You use a lot of "generic component type" and very few "specific >> >>>>> component type". You are imposing types for every component in order >> >>>>> to >> >>>>> handle a few exceptions (segmentation, etc..). You create a rule >> >>>>> (specify >> >>>>> the component's type) to handle exceptions! >> >>>>> >> >>>>> I would prefer not to have typed components. Instead I would prefer >> >>>>> to >> >>>>> have the name as a simple sequence of bytes with a field separator. Then, >> >>>>> outside the name, if you have some components that could be used at >> >>>>> the network layer (e.g. a TLV field), you simply need something that >> >>>>> indicates the offset allowing you to retrieve the version, >> >>>>> segment, etc in the name...
>> >>>>> >> >>>>> >> >>>>> Max >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >> >>>>> >> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >>>>> >> >>>>> I think we agree on the small number of "component types". >> >>>>> However, if you have a small number of types, you will end up with >> >>>>> names >> >>>>> containing many generic components types and few specific >> >>>>> components >> >>>>> types. Due to the fact that the component type specification is an >> >>>>> exception in the name, I would prefer something that specify >> >>>>> component's >> >>>>> type only when needed (something like UTF8 conventions but that >> >>>>> applications MUST use). >> >>>>> >> >>>>> so ... I can't quite follow that. the thread has had some >> >>>>> explanation >> >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >> >>>>> and >> >>>>> there's been email trying to explain that applications don't have to >> >>>>> use types if they don't need to. your email sounds like "I prefer >> >>>>> the >> >>>>> UTF8 convention", but it doesn't say why you have that preference in >> >>>>> the face of the points about the problems. can you say why it is >> >>>>> that >> >>>>> you express a preference for the "convention" with problems ? >> >>>>> >> >>>>> Thanks, >> >>>>> Mark >> >>>>> >> >>>>> . 
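The two layouts being debated in this exchange, nested TLVs walked sequentially versus a fixed-width offset index like Massimo's "Offsets" TLV, can be sketched side by side. A minimal Python illustration; the 1-byte type/length layout and the 0x80 offsets type are invented for this example and are not the NDN or CCNx wire format:

```python
def encode_name(components, comp_type=8):
    """Nested-TLV style: each component as Type, Length, Value
    (1-byte T and L here, purely for illustration)."""
    out = bytearray()
    for c in components:
        out += bytes([comp_type, len(c)]) + c
    return bytes(out)

def encode_offsets_tlv(name, offsets_type=0x80):
    """Massimo-style side TLV: one fixed 1-byte start offset per
    component, so offset x is readable without parsing offsets 1..x-1."""
    offsets, pos = [], 0
    while pos < len(name):
        offsets.append(pos)
        pos += 2 + name[pos + 1]          # skip T, L, and V
    assert all(o < 256 for o in offsets)  # else an "Offset_2byte" type
    return bytes([offsets_type, len(offsets)]) + bytes(offsets)

def first_x_sequential(name, x):
    """Walk the TLVs: (find size + move) x times."""
    pos = 0
    for _ in range(x):
        pos += 2 + name[pos + 1]
    return name[:pos]

def first_x_via_offsets(name, offsets_tlv, x):
    """Jump straight to entry x in the fixed-width index."""
    count = offsets_tlv[1]
    end = offsets_tlv[2 + x] if x < count else len(name)
    return name[:end]

name = encode_name([b"ndn", b"edu", b"ucla", b"video", b"v1"])
idx = encode_offsets_tlv(name)
assert first_x_sequential(name, 3) == first_x_via_offsets(name, idx, 3)
```

Both calls return the same prefix; the disagreement in the thread is only about which one touches fewer cache lines, not about correctness.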
>> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> Ndn-interest mailing list >> >>>>> Ndn-interest at lists.cs.ucla.edu >> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>>>> >> >>>> >> >>>> _______________________________________________ >> >>>> Ndn-interest mailing list >> >>>> Ndn-interest at lists.cs.ucla.edu >> >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >>> >> > >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From lixia at CS.UCLA.EDU Sat Sep 20 23:20:58 2014 From: lixia at CS.UCLA.EDU (Lixia Zhang) Date: Sat, 20 Sep 2014 23:20:58 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <2E7244C8-FADE-4DC4-BD3D-FF2EDC9F7985@cs.ucla.edu> On Sep 14, 2014, at 8:39 PM, Tai-Lin Chu wrote: > hi, > Just some questions to know how people feel about it. > 1. Do you like it or not? why? > 2. Does it fit the need of your application? > 3. What might be some possible changes (or even a big redesign) if you > are asked to propose a naming convention? > 4. some other thoughts > > Feel free to answer any of the questions. > Thanks > > [1] http://named-data.net/wp-content/uploads/2014/08/ndn-tr-22-ndn-memo-naming-conventions.pdf Someone reminded me that the above msg has led to a very active exchange. My apology for falling (pretty far) behind email, due to an unexpected big deadline (a number of NDNers are probably in a similar situation). Hope to catch up and join the discussion within a week. Lixia From tailinchu at gmail.com Sat Sep 20 23:22:52 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sat, 20 Sep 2014 23:22:52 -0700 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> Message-ID: > Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes. [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude On Sat, Sep 20, 2014 at 10:55 PM, wrote: > > On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: > >>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discovery unless you avoid all range exclusions and only exclude explicit versions. >> >> Could you explain why missing content object situation happens? also >> range exclusion is just a shorter notation for many explicit exclude; >> converting from explicit excludes to ranged exclude is always >> possible. > > Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. 
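The sizes being traded in this exchange are simple arithmetic. A sketch of both figures, assuming one reading per second, 8-byte timestamps, 1-byte Any markers, and ignoring TLV encoding overhead (all of these are assumptions for illustration):

```python
# Rough arithmetic behind the exclusion-size debate (no TLV overhead).
readings_per_day = 24 * 60 * 60        # 86,400 sensor readings per day
timestamp_len = 8                      # assumed bytes per version timestamp

# Marc's case: excluding every version individually.
individual_excludes = readings_per_day * timestamp_len
assert individual_excludes == 691_200  # bytes of exclusions per day

# Tai-Lin's counter: two open-ended ranges, (Any..T1)(T2..Any),
# i.e. two timestamps plus two assumed 1-byte Any markers.
range_excludes = 2 * timestamp_len + 2
assert range_excludes == 18            # matches the "18 bytes" above
```

The two figures differ by four orders of magnitude, which is why the rest of the thread turns on whether range excludes remain valid when different caches hold inconsistent version sets.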
If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. > > yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions. > >> >>> You exclude through 100 then issue a new interest. This goes to cache B >> >> I feel this case is invalid because cache A will also get the >> interest, and cache A will return v101 if it exists. Like you said, if >> this goes to cache B only, it means that cache A dies. How do you know >> that v101 even exists? > > I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101. > >> >> >> c,d In general I agree that LPM performance is related to the number >> of components. In my own thread-safe LPM implementation, I used only >> one RWMutex for the whole tree. I don't know whether adding a lock for >> every node will be faster or not because of lock overhead. >> >> However, we should compare (exact match + discovery protocol) vs (ndn >> lpm). Comparing performance of exact match to lpm is unfair. > > Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that. > >> >> >> >> >> >> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. >>> >>> This is probably getting off-topic from the original post about naming conventions. >>> >>> a.
If Interests can be forwarded in multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. >>> >>> b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B, which only has version 99, so the interest times out or is NACK'd. So you think you have it! But, cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. >>> >>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. >>> >>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object.
If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. >>> >>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like. >>> >>> Marc >>> >>> >>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: >>> >>>> I had thought about these questions, but I want to know your idea >>>> besides typed component: >>>> 1. LPM allows "data discovery". How will exact match do similar things? >>>> 2. will removing selectors improve performance? How do we use other >>>> faster technique to replace selector? >>>> 3. fixed byte length and type. I agree more that type can be fixed >>>> byte, but 2 bytes for length might not be enough for the future. >>>> >>>> >>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>>>> >>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>>>> >>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>> >>>>>> Could you share it with us? >>>>>> >>>>> Sure.
Here's a strawman. >>>>> >>>>> The type space is 16 bits, so you have 65,536 types. >>>>> >>>>> The type space is currently shared with the types used for the entire protocol, which gives us two options: >>>>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >>>>> >>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >>>>> >>>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) >>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types) >>>>> - We give the rest of the space to application assignment. >>>>> >>>>> Make sense? >>>>> >>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>>>>> >>>>>> we could design for performance, >>>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.
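The strawman's partitioning can be sketched as a table of regions over the 16-bit space. The boundary values below are illustrative guesses consistent with the sizes mentioned in the thread (one generic type, ~1024 well-known, ~1024 reserved, remainder for applications), not a proposed allocation:

```python
# Hypothetical carve-up of a 16-bit name-component type space into
# registry regions, following the strawman's proportions.
REGIONS = [
    ("generic",     0,    0),      # the single default "generic name" type
    ("well-known",  1,    1024),   # chunk#, version#, etc.
    ("reserved",    1025, 2048),   # held back for unanticipated uses
    ("application", 2049, 65535),  # application-assigned
]

def region_of(t):
    """Return the registry region a component type falls into."""
    assert 0 <= t <= 0xFFFF, "type space is 16 bits"
    for name, lo, hi in REGIONS:
        if lo <= t <= hi:
            return name

assert region_of(0) == "generic"
assert region_of(37) == "well-known"
assert region_of(5000) == "application"
# The regions tile the whole 65,536-value space with no gaps:
assert sum(hi - lo + 1 for _, lo, hi in REGIONS) == 65536
```

The point of the sketch is only that a registry makes forwarder behavior type-driven (look up the region) rather than convention-driven (guess from markers inside the component value).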
>>>>> >>>>>> but I think there will be a turning >>>>>> point when the slower design starts to become "fast enough". >>>>> Perhaps, perhaps not. Relative performance is what matters so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. >>>>> >>>>>> Do you >>>>>> think there will be some design of ndn that will *never* have >>>>>> performance improvement? >>>>>> >>>>> I suspect LPM on data will always be slow (relative to the other functions). >>>>> I suspect exclusions will always be slow because they will require extra memory references. >>>>> >>>>> However I of course don't claim clairvoyance so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... >>>>> >>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>>>> >>>>>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>>>>> well on it. It should be the other way around: once ndn app becomes >>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>>>>>> a) clock rates are not getting (much) faster >>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>>>>>> >>>>>>> The fact is, IP *did* have some serious performance flaws in its design.
We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere >>>>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>>>>>> >>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. >>>>>>> >>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>>>>> see that there are 3 approaches: >>>>>>>> 1. we should not define a naming convention at all >>>>>>>> 2. typed component: use tlv type space and add a handful of types >>>>>>>> 3. marked component: introduce only one more type and add additional >>>>>>>> marker space >>>>>>>> >>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>> >>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>>>> >>>>>>>> Also everybody thinks that the current utf8 marker naming convention >>>>>>>> needs to be revised. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>>>>> experiments? >>>>>>>>> >>>>>>>>> I guess wide deployment could make for even longer names.
Related: Many URLs >>>>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>>>>> >>>>>>>>> >>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>> >>>>>>>>> In fact, the index in a separate TLV will be slower on some architectures, >>>>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, >>>>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>>>>> the name to get to the 2nd array, then read it, then back up to get to the >>>>>>>>> name, it will be pretty expensive too. >>>>>>>>> >>>>>>>>> Marc >>>>>>>>> >>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Does this make that much difference? >>>>>>>>> >>>>>>>>> If you want to parse the first 5 components, one way to do it is: >>>>>>>>> >>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>>>>> offset of the beginning of the name. >>>>>>>>> OR >>>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>>> >>>>>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>>>>> that the first one is faster. I don't think this is the case. >>>>>>>>> >>>>>>>>> In the first one you'll probably have to get the cache line for the index, >>>>>>>>> then all the required cache lines for the first 5 components. For the >>>>>>>>> second, you'll have to get all the cache lines for the first 5 components. >>>>>>>>> Given an assumption that a cache miss is way more expensive than >>>>>>>>> evaluating a number and computing an addition, you might find that the >>>>>>>>> performance of the index is actually slower than the performance of the >>>>>>>>> direct access.
>>>>>>>>> >>>>>>>>> Granted, there is a case where you don't access the name at all, for >>>>>>>>> example, if you just get the offsets and then send the offsets as >>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>>>>> gain IF there are more cache line misses in reading the name than in >>>>>>>>> reading the index. So, if the regular part of the name that you're >>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>>>>>> processed by a different processor, then you might see some performance >>>>>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>>>> >>>>>>>>> This is all to say, I don't think we should be designing the protocol with >>>>>>>>> only one architecture in mind. (The architecture of sending the name to a >>>>>>>>> different processor than the index). >>>>>>>>> >>>>>>>>> If you have numbers that show that the index is faster I would like to see >>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>> >>>>>>>>> Nacho >>>>>>>>> >>>>>>>>> (I may have misinterpreted your description so feel free to correct me if >>>>>>>>> I'm wrong.) >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>> Protocol Architect >>>>>>>>> Principal Scientist >>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>> +1(650)812-4458 >>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Indeed each component's offset must be encoded using a fixed amount of >>>>>>>>> bytes: >>>>>>>>> >>>>>>>>> i.e., >>>>>>>>> Type = Offsets >>>>>>>>> Length = 10 Bytes >>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>> >>>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too >>>>>>>>> long.
>>>>>>>>> >>>>>>>>> Max >>>>>>>>> >>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>> >>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>> want the first x components) you can directly have it using the >>>>>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>>>>> the first x-1 components. With the offset structure you can directly >>>>>>>>> access the first x components. >>>>>>>>> >>>>>>>>> I don't get it. What you described only works if the "offset" is >>>>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>>>>>> offsets to get to the x-th offset. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>> >>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>>>>> you >>>>>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>>>>> different >>>>>>>>> scheme where the info that describes the name-components is ... >>>>>>>>> someplace >>>>>>>>> other than _in_ the name-components. is that correct? when you say >>>>>>>>> "field >>>>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>>>> >>>>>>>>> Correct. >>>>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>>>>> hierarchy >>>>>>>>> with offsets in the name and other TLV(s) indicate the offset to use >>>>>>>>> in >>>>>>>>> order to retrieve special components. >>>>>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>>>>> avoided as >>>>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>>>> "offset >>>>>>>>> TLV" to do that.
>>>>>>>>> >>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>> >>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>> want >>>>>>>>> the first x components) you can directly have it using the offsets. >>>>>>>>> With the >>>>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>>>> components. >>>>>>>>> With the offset structure you can directly access the first x >>>>>>>>> components. >>>>>>>>> >>>>>>>>> Max >>>>>>>>> >>>>>>>>> >>>>>>>>> -- Mark >>>>>>>>> >>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>> >>>>>>>>> The why is simple: >>>>>>>>> >>>>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>>>> component type". You are imposing types for every component in order >>>>>>>>> to >>>>>>>>> handle a few exceptions (segmentation, etc..). You create a rule >>>>>>>>> (specify >>>>>>>>> the component's type) to handle exceptions! >>>>>>>>> >>>>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>>>> to >>>>>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>>>>> outside the name, if you have some components that could be used at >>>>>>>>> the network layer (e.g. a TLV field), you simply need something that >>>>>>>>> indicates the offset allowing you to retrieve the version, >>>>>>>>> segment, etc in the name... >>>>>>>>> >>>>>>>>> >>>>>>>>> Max >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>> >>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>> >>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>> names >>>>>>>>> containing many generic component types and few specific >>>>>>>>> component >>>>>>>>> types.
Due to the fact that the component type specification is an >>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>> component's >>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>> applications MUST use). >>>>>>>>> >>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>> explanation >>>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>>>> and >>>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>>> the >>>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>>> that >>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mark >>>>>>>>> >>>>>>>>> . >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> > From Marc.Mosko at parc.com Sat Sep 20 23:25:35 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Sun, 21 Sep 2014 06:25:35 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> , Message-ID: That will get you one reading then you need to exclude it and ask again. Sent from my telephone On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. > > I am very confused. For your example, if I want to get all today's > sensor data, I just do (Any..Last second of last day)(First second of > tomorrow..Any). That's 18 bytes. 
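Tai-Lin's size claim for the two-range exclude, and the 691,200-byte figure for individual excludes quoted below it, are easy to check with rough arithmetic. The per-element overhead constant here is an illustrative assumption (roughly one type byte plus one length byte per TLV element), not the exact NDN-TLV encoding:

```python
# Back-of-the-envelope sizes for the two exclusion strategies discussed
# in this thread. Overhead constants are rough assumptions, not exact
# NDN-TLV encoding figures.

TLV_OVERHEAD = 2      # assumed: 1-byte type + 1-byte length per element
TIMESTAMP_LEN = 8     # 8-byte timestamp version component (as in the thread)

def individual_excludes(n_versions):
    """Exclude each known version explicitly: one component TLV apiece."""
    return n_versions * (TIMESTAMP_LEN + TLV_OVERHEAD)

def two_range_exclude():
    """(Any .. start-of-day)(end-of-day .. Any): two components + two Any markers."""
    return 2 * (TIMESTAMP_LEN + TLV_OVERHEAD) + 2 * TLV_OVERHEAD

per_second_readings = 24 * 60 * 60              # 86,400 samples per day
print(individual_excludes(per_second_readings)) # 864000 bytes: grows without bound
print(two_range_exclude())                      # 24 bytes: constant size
```

The point of contention is not the encoding size but whether a range exclude gives correct results when different caches hold inconsistent version sets.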
> > > [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude > >> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >> >> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >> >>>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. >>> >>> Could you explain why the missing-content-object situation happens? Also, >>> range exclusion is just a shorter notation for many explicit excludes; >>> converting from explicit excludes to a ranged exclude is always >>> possible. >> >> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second, you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. >> >> Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions. >> >>> >>>> You exclude through 100 then issue a new interest. This goes to cache B >>> >>> I feel this case is invalid because cache A will also get the >>> interest, and cache A will return v101 if it exists. Like you said, if >>> this goes to cache B only, it means that cache A dies. How do you know >>> that v101 even exists? >> >> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101. >> >>> >>> >>> c, d: In general I agree that LPM performance is related to the number >>> of components.
In my own thread-safe LPM implementation, I used only >>> one RWMutex for the whole tree. I don't know whether adding a lock for >>> every node will be faster or not because of lock overhead. >>> >>> However, we should compare (exact match + discovery protocol) vs (ndn >>> lpm). Comparing performance of exact match to lpm is unfair. >> >> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that. >> >>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. >>>> >>>> This is probably getting off-topic from the original post about naming conventions. >>>> >>>> a. If Interests can be forwarded in multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. >>>> >>>> b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A, which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B, which only has version 99, so the interest times out or is NACK'd. So you think you have it! But cache A already has version 101; you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery.
From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. >>>> >>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three: 2 for content, 1 for device) against selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. >>>> >>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA accesses for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. >>>> >>>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP), and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like.
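Point (c) above can be sketched concretely. The toy PIT below is keyed by name prefix, so matching one content object means probing the table once per prefix of its name and evaluating a selector predicate at each hit. The names and data layout are invented for illustration and are not NFD's actual structures:

```python
# Sketch of the non-deterministic PIT lookup count under selector-based
# matching: a content object with name /a/b/c/d can satisfy a pending
# Interest at *any* prefix, so the forwarder probes every prefix.

def prefixes(name):
    """All prefixes of a name given as a component tuple, longest first."""
    return [name[:i] for i in range(len(name), 0, -1)]

def match_content_object(pit, name):
    """Probe the PIT at every prefix; each probe is a potential cache miss."""
    probes = 0
    matched = []
    for p in prefixes(name):
        probes += 1
        for entry in pit.get(p, []):
            if entry["selectors"](name):   # evaluate the selector predicate
                matched.append(entry)
    return matched, probes

pit = {
    ("ndn", "sensor"): [{"selectors": lambda n: True}],             # broad Interest
    ("ndn", "sensor", "temp", "v100"): [{"selectors": lambda n: True}],
}
matched, probes = match_content_object(pit, ("ndn", "sensor", "temp", "v100"))
print(probes)   # 4 probes for a 4-component name, vs. a fixed 3 in CCNx 1.0
```

The probe count scales with name length, which is the cost being contrasted with CCNx 1.0's bounded three exact-match lookups.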
>>>> Marc >>>> >>>> >>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: >>>>> >>>>> I had thought about these questions, but I want to know your idea >>>>> besides typed components: >>>>> 1. LPM allows "data discovery". How will exact match do similar things? >>>>> 2. Will removing selectors improve performance? How do we use another, >>>>> faster technique to replace selectors? >>>>> 3. Fixed byte length and type. I agree more that the type can be a fixed >>>>> byte, but 2 bytes for length might not be enough for the future. >>>>> >>>>> >>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>>>>> >>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>>>>> >>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>> >>>>>>> Could you share it with us? >>>>>> Sure. Here's a strawman. >>>>>> >>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>> >>>>>> The type space is currently shared with the types used for the entire protocol, which gives us two options: >>>>>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than for protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type), we could reuse numbers and thereby have the entire 65K space for name component types. >>>>>> >>>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries, and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).
>>>>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.). >>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types). >>>>>> - We give the rest of the space to application assignment. >>>>>> >>>>>> Make sense? >>>>>> >>>>>> >>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>>>>>> >>>>>>> we could design for performance, >>>>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. >>>>>> >>>>>>> but I think there will be a turning >>>>>>> point when the slower design starts to become "fast enough". >>>>>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used, because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments, bad performance just never gets better. >>>>>> >>>>>>> Do you >>>>>>> think there will be some design of ndn that will *never* have >>>>>>> performance improvement? >>>>>> I suspect LPM on data will always be slow (relative to the other functions). >>>>>> I suspect exclusions will always be slow because they will require extra memory references.
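The four-region partition in the strawman can be written down as a small lookup table. The region sizes follow the numbers in the strawman (one generic type, 1024 standard, another 1024 reserved, the rest for applications), but the concrete numeric boundaries below are an assumption for illustration:

```python
# Hypothetical layout of the 16-bit name-component type registry from
# the strawman above. The exact boundaries are illustrative assumptions.

REGISTRY = [
    (0,    0,     "generic"),      # single default "generic name" type
    (1,    1024,  "standard"),     # base/extension NDN types (chunk#, version#, ...)
    (1025, 2048,  "reserved"),     # held back for unanticipated uses
    (2049, 65535, "application"),  # the rest: application assignment
]

def classify(component_type):
    """Map a 16-bit name-component type code to its registry region."""
    if not 0 <= component_type <= 0xFFFF:
        raise ValueError("type space is 16 bits")
    for lo, hi, region in REGISTRY:
        if lo <= component_type <= hi:
            return region

print(classify(0))      # generic
print(classify(17))     # standard
print(classify(40000))  # application
```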
>>>>>> >>>>>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... >>>>>> >>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>>>>> >>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>>>>>> >>>>>>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>>>>>> well on it. It should be the other way around: once ndn apps become >>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>>>>>>> a) clock rates are not getting (much) faster >>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>>>>>>> >>>>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere. >>>>>>>> 2. The UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>>>>>>> >>>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right. >>>>>>>> >>>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>>>>>> see that there are 3 approaches: >>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>> 2.
typed component: use tlv type space and add a handful of types >>>>>>>>> 3. marked component: introduce only one more type and add additional >>>>>>>>> marker space >>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>>> >>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>>>>> >>>>>>>>> Also everybody thinks that the current utf8 marker naming convention >>>>>>>>> needs to be revised. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>>>>>> experiments? >>>>>>>>>> >>>>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>> >>>>>>>>>> In fact, the index in a separate TLV will be slower on some architectures, >>>>>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory; >>>>>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>>>>>> the name to get to the 2nd array, then read it, then back up to get to the >>>>>>>>>> name, it will be pretty expensive too.
>>>>>>>>>> >>>>>>>>>> Marc >>>>>>>>>> >>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Does this make that much difference? >>>>>>>>>> >>>>>>>>>> If you want to parse the first 5 components, one way to do it is: >>>>>>>>>> >>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>>>>>> offset of the beginning of the name. >>>>>>>>>> OR >>>>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>>>> >>>>>>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>>>>>> that the first one is faster. I don't think this is the case. >>>>>>>>>> >>>>>>>>>> In the first one you'll probably have to get the cache line for the index, >>>>>>>>>> then all the required cache lines for the first 5 components. For the >>>>>>>>>> second, you'll have to get all the cache lines for the first 5 components. >>>>>>>>>> Given an assumption that a cache miss is way more expensive than >>>>>>>>>> evaluating a number and computing an addition, you might find that the >>>>>>>>>> performance of the index is actually slower than the performance of the >>>>>>>>>> direct access. >>>>>>>>>> >>>>>>>>>> Granted, there is a case where you don't access the name at all, for >>>>>>>>>> example, if you just get the offsets and then send the offsets as >>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>>>>>> gain IF there are more cache line misses in reading the name than in >>>>>>>>>> reading the index. So, if the regular part of the name that you're >>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>>>>>>> processed by a different processor, then you might see some performance >>>>>>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>>>>>> the case. I may be wrong, haven't actually tested it.
>>>>>>>>>> >>>>>>>>>> This is all to say, I don't think we should be designing the protocol with >>>>>>>>>> only one architecture in mind (the architecture of sending the name to a >>>>>>>>>> different processor than the index). >>>>>>>>>> >>>>>>>>>> If you have numbers that show that the index is faster, I would like to see >>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>> >>>>>>>>>> Nacho >>>>>>>>>> >>>>>>>>>> (I may have misinterpreted your description, so feel free to correct me if >>>>>>>>>> I'm wrong.) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>> Protocol Architect >>>>>>>>>> Principal Scientist >>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>> +1(650)812-4458 >>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Indeed each component's offset must be encoded using a fixed amount of >>>>>>>>>> bytes: >>>>>>>>>> >>>>>>>>>> i.e., >>>>>>>>>> Type = Offsets >>>>>>>>>> Length = 10 Bytes >>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>> >>>>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too >>>>>>>>>> long. >>>>>>>>>> >>>>>>>>>> Max >>>>>>>>>> >>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>> >>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>>> want the first x components) you can directly have it using the >>>>>>>>>> offsets. With the nested TLV structure you have to iteratively parse >>>>>>>>>> the first x-1 components. With the offset structure you can directly >>>>>>>>>> access the first x components. >>>>>>>>>> >>>>>>>>>> I don't get it. What you described only works if the "offset" is >>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>>>>>>> offsets to get to the x-th offset.
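The two access patterns in this sub-thread can be put side by side: sequential parsing of nested TLVs versus random access through a table of fixed-width offsets. The encodings below are illustrative toys, not the NDN or CCNx wire formats; note that the offsets must be fixed-width (here 2 bytes) for the random access to work, which is exactly Tai-Lin's caveat about varNum:

```python
# Toy encodings of a name as (a) nested TLVs and (b) a fixed-width
# offset table, to contrast sequential vs. direct component access.
import struct

def encode_name(components):
    """Each component: 1-byte type (8) + 1-byte length + value, plus
    a side table of fixed 2-byte big-endian offsets into the wire."""
    wire = b"".join(bytes([8, len(c)]) + c for c in components)
    offsets, pos = [], 0
    for c in components:
        offsets.append(pos)
        pos += 2 + len(c)
    index = struct.pack(f">{len(offsets)}H", *offsets)
    return wire, index

def component_sequential(wire, x):
    """Nested-TLV style: skip components 0..x-1 to reach component x."""
    pos = 0
    for _ in range(x):
        pos += 2 + wire[pos + 1]
    return wire[pos + 2 : pos + 2 + wire[pos + 1]]

def component_indexed(wire, index, x):
    """Offset-table style: one fixed-width read, then jump straight to x."""
    (pos,) = struct.unpack_from(">H", index, 2 * x)
    return wire[pos + 2 : pos + 2 + wire[pos + 1]]

wire, index = encode_name([b"ndn", b"sensor", b"temp", b"v100", b"s0"])
print(component_sequential(wire, 3))      # b'v100'
print(component_indexed(wire, index, 3))  # b'v100'
```

Which variant wins in practice depends on cache-line behavior, as Nacho's message argues; the sketch only shows the difference in access pattern, not relative speed.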
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>> >>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>>>>>> you >>>>>>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>>>>>> different >>>>>>>>>> scheme where the info that describes the name-components is ... >>>>>>>>>> someplace >>>>>>>>>> other than _in_ the name-components. is that correct? when you say >>>>>>>>>> "field >>>>>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>>>>> >>>>>>>>>> Correct. >>>>>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>>>>>> hierarchy >>>>>>>>>> with offsets in the name and other TLV(s) indicates the offset to use >>>>>>>>>> in >>>>>>>>>> order to retrieve special components. >>>>>>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>>>>>> avoided as >>>>>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>>>>> "offset >>>>>>>>>> TLV " to do that. >>>>>>>>>> >>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>> >>>>>>>>>> if you do not need the entire hierarchal structure (suppose you only >>>>>>>>>> want >>>>>>>>>> the first x components) you can directly have it using the offsets. >>>>>>>>>> With the >>>>>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>>>>> components. >>>>>>>>>> With the offset structure you cane directly access to the firs x >>>>>>>>>> components. >>>>>>>>>> >>>>>>>>>> Max >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- Mark >>>>>>>>>> >>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>> >>>>>>>>>> The why is simple: >>>>>>>>>> >>>>>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>>>>> component type". 
You are imposing types for every component in order >>>>>>>>>> to >>>>>>>>>> handle few exceptions (segmentation, etc..). You create a rule >>>>>>>>>> (specify >>>>>>>>>> the component's type ) to handle exceptions! >>>>>>>>>> >>>>>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>>>>> to >>>>>>>>>> have the name as simple sequence bytes with a field separator. Then, >>>>>>>>>> outside the name, if you have some components that could be used at >>>>>>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>>>>> segment, etc in the name... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Max >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>> >>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>> >>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>>> names >>>>>>>>>> containing many generic components types and few specific >>>>>>>>>> components >>>>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>>> component's >>>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>>> applications MUST use). >>>>>>>>>> >>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>> explanation >>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>>>>> and >>>>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>>>> the >>>>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>>>> the face of the points about the problems. 
can you say why it is >>>>>>>>>> that >>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mark >>>>>>>>>> >>>>>>>>>> . >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> From ravi.ravindran at gmail.com Sat Sep 20 23:42:02 2014 From: ravi.ravindran at gmail.com (Ravi Ravindran) Date: Sat, 20 Sep 2014 23:42:02 -0700 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <6134CB3B-B827-4F4B-8F7C-F957A4DC02D3@parc.com> References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <1FF22A0E-E81E-439E-89C9-5C0F93C61A36@parc.com> <6134CB3B-B827-4F4B-8F7C-F957A4DC02D3@parc.com> Message-ID: IMO there can be two cases Marc: If the content is moving hands from one service to another, one has to republish, this then can consider the MTU restrictions. If one requires service continuity, publisher could provide multiple sizes of the same content, considering the content distributors MTU requirements, in this case one may not to republish, this is to suit the large chunks are for the high capacity paths in the network leading to the consumers. Regards, Ravi On Sat, Sep 20, 2014 at 10:57 PM, wrote: > > it can be adopted to any level of granularity. > > > So by that you mean re-signed and re-published as a new set of content > objects with a more friendly MTU for the non-optical network? > > Also, in my example my math was off a little. A 5 GByte object at 40 Gbps > is 1 second, not 1/8th of a second, of serialization delay. 
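The corrected arithmetic from Marc's note is easy to verify:

```python
# Serialization delay and line-card frame memory for the giant-packet
# scenario in this thread: a 5 GB content object on a 40 Gbps link.

def serialization_delay_s(size_bytes, link_bps):
    """Time to clock one packet onto the wire."""
    return size_bytes * 8 / link_bps

GB = 10**9
print(serialization_delay_s(5 * GB, 40 * 10**9))  # 1.0 second, as corrected

# One such packet arriving per interface on a 10-interface next-gen line
# card implies on the order of 10 x 5 GB = 50 GB of frame memory.
print(10 * 5 * GB / GB)                           # 50.0
```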
> > Marc > > On Sep 21, 2014, at 12:56 AM, Ravi Ravindran > wrote: > > Hi Marc, > > You point to a use case where you might have an end-to-end optical network; google-fibre-like access networks could be a good example. Another is when > you have content producers/distributors collaborate on a high-speed optical > backbone to move large amounts of data, but once it interacts with consuming > networks, it can be adopted to any level of granularity. > > -Ravi > > On Sat, Sep 20, 2014 at 2:10 PM, wrote: > >> If one publishes a Content Object at 1 GB, that is the unit of >> retransmission. What happens when that 1 GB content object hits an access >> point with a 5% loss rate? Is the access point really going to cache >> multiple 1 GB objects? >> >> If the unit of retransmission can be a fragment, then I would argue >> that's a content object, and the 1 GB thing is really an aggregation, not an >> atom that must be fragmented on smaller-MTU networks. >> >> Yes, if you have an all-optical network with a very low loss rate >> (including buffer drops) and won't be hitting anything that needs to >> fragment, then going with a large MTU might be beneficial. It remains to >> be shown that one can do efficient cut-through forwarding of a CCN packet. >> Obviously, if one is exploiting content object hash naming, one cannot do >> cut-through forwarding because you need to hash before picking the next hop. >> >> If content objects are 5 GB, how much packet memory are you going to need >> on a typical line card? At 40 Gbps, a 5 GB packet takes 125 milli-seconds, >> so you could likely have one arriving on each interface. Next-gen chipsets >> are looking at 400 Gbps, so you could have 10x the interfaces, so you would need >> at least 50 GB of frame memory... You clearly are not multiplexing any >> real-time traffic on those interfaces... >> >> What would be the use-case for such large packets without fragmentation? >> What does such a system look like?
>> >> Marc >> >> On Sep 20, 2014, at 8:41 PM, Ravi Ravindran >> wrote: >> >> I agree on the 2B length field comment, that should be variable to >> accommodate ICN interfacing with high speed optical networks, or in general >> future requirements to ship large (GB) chunks of data. >> >> Regards, >> Ravi >> >> On Sat, Sep 20, 2014 at 11:15 AM, Tai-Lin Chu >> wrote: >> >>> I had thought about these questions, but I want to know your idea >>> besides typed component: >>> 1. LPM allows "data discovery". How will exact match do similar things? >>> 2. will removing selectors improve performance? How do we use other >>> faster technique to replace selector? >>> 3. fixed byte length and type. I agree more that type can be fixed >>> byte, but 2 bytes for length might not be enough for future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>> wrote: >>> > >>> > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>> > >>> >>> I know how to make #2 flexible enough to do what things I can >>> envision we need to do, and with a few simple conventions on how the >>> registry of types is managed. >>> >> >>> >> Could you share it with us? >>> >> >>> > Sure. Here?s a strawman. >>> > >>> > The type space is 16 bits, so you have 65,565 types. >>> > >>> > The type space is currently shared with the types used for the entire >>> protocol, that gives us two options: >>> > (1) we reserve a range for name component types. Given the likelihood >>> there will be at least as much and probably more need to component types >>> than protocol extensions, we could reserve 1/2 of the type space, giving us >>> 32K types for name components. >>> > (2) since there is no parsing ambiguity between name components and >>> other fields of the protocol (sine they are sub-types of the name type) we >>> could reuse numbers and thereby have an entire 65K name component types. >>> > >>> > We divide the type space into regions, and manage it with a registry. 
>>> If we ever get to the point of creating an IETF standard, IANA has 25 years >>> of experience running registries and there are well-understood rule sets >>> for different kinds of registries (open, requires a written spec, requires >>> standards approval). >>> > >>> > - We allocate one ?default" name component type for ?generic name?, >>> which would be used on name prefixes and other common cases where there are >>> no special semantics on the name component. >>> > - We allocate a range of name component types, say 1024, to globally >>> understood types that are part of the base or extension NDN specifications >>> (e.g. chunk#, version#, etc. >>> > - We reserve some portion of the space for unanticipated uses (say >>> another 1024 types) >>> > - We give the rest of the space to application assignment. >>> > >>> > Make sense? >>> > >>> > >>> >>> While I?m sympathetic to that view, there are three ways in which >>> Moore?s law or hardware tricks will not save us from performance flaws in >>> the design >>> >> >>> >> we could design for performance, >>> > That?s not what people are advocating. We are advocating that we *not* >>> design for known bad performance and hope serendipity or Moore?s Law will >>> come to the rescue. >>> > >>> >> but I think there will be a turning >>> >> point when the slower design starts to become "fast enough?. >>> > Perhaps, perhaps not. Relative performance is what matters so things >>> that don?t get faster while others do tend to get dropped or not used >>> because they impose a performance penalty relative to the things that go >>> faster. There is also the ?low-end? phenomenon where impovements in >>> technology get applied to lowering cost rather than improving performance. >>> For those environments bad performance just never get better. >>> > >>> >> Do you >>> >> think there will be some design of ndn that will *never* have >>> >> performance improvement? 
>>> >> >>> > I suspect LPM on data will always be slow (relative to the other >>> functions). >>> > I suspect exclusions will always be slow because they will require >>> extra memory references. >>> > >>> > However I of course don't claim clairvoyance, so this is just >>> speculation based on 35+ years of seeing performance improve by 4 orders of >>> magnitude and still having to worry about counting cycles and memory >>> references... >>> > >>> >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>> wrote: >>> >>> >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>> wrote: >>> >>> >>> >>>> We should not look at a certain chip nowadays and want ndn to >>> perform >>> >>>> well on it. It should be the other way around: once ndn apps become >>> >>>> popular, a better chip will be designed for ndn. >>> >>>> >>> >>> While I'm sympathetic to that view, there are three ways in which >>> Moore's law or hardware tricks will not save us from performance flaws in >>> the design: >>> >>> a) clock rates are not getting (much) faster >>> >>> b) memory accesses are getting (relatively) more expensive >>> >>> c) data structures that require locks to manipulate successfully >>> will be relatively more expensive, even with near-zero lock contention. >>> >>> >>> >>> The fact is, IP *did* have some serious performance flaws in its >>> design. We just forgot those because the design elements that depended on >>> those mistakes have fallen into disuse. The poster children for this are: >>> >>> 1. IP options. Nobody can use them because they are too slow on >>> modern forwarding hardware, so they can't be reliably used anywhere >>> >>> 2. the UDP checksum, which was a bad design when it was specified >>> and is now a giant PITA that still causes major pain in working around. >>> >>> >>> >>> I'm afraid students today are being taught that the designers of IP >>> were flawless, as opposed to very good scientists and engineers that got >>> most of it right. 
>>> >>> >>> >>>> I feel the discussion today and yesterday has been off-topic. Now I >>> >>>> see that there are 3 approaches: >>> >>>> 1. we should not define a naming convention at all >>> >>>> 2. typed component: use tlv type space and add a handful of types >>> >>>> 3. marked component: introduce only one more type and add additional >>> >>>> marker space >>> >>>> >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on how the >>> registry of types is managed. >>> >>> >>> >>> It is just as powerful in practice as either throwing up our hands >>> and letting applications design their own mutually incompatible schemes or >>> trying to make naming conventions with markers in a way that is fast to >>> generate/parse and also resilient against aliasing. >>> >>> >>> >>>> Also everybody thinks that the current utf8 marker naming convention >>> >>>> needs to be revised. >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>> >>>>> Would that chip be suitable, i.e. can we expect most names to fit >>> in (the >>> >>>>> magnitude of) 96 bytes? What length are names usually in current >>> NDN >>> >>>>> experiments? >>> >>>>> >>> >>>>> I guess wide deployment could make for even longer names. Related: >>> Many URLs >>> >>>>> I encounter nowadays easily don't fit within two 80-column text >>> lines, and >>> >>>>> NDN will have to carry more information than URLs, as far as I see. >>> >>>>> >>> >>>>> >>> >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>> >>>>> >>> >>>>> In fact, the index in a separate TLV will be slower on some >>> architectures, >>> >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in >>> memory, >>> >>>>> then any subsequent memory is accessed only as two adjacent >>> 32-byte blocks >>> >>>>> (there can be at most 5 blocks available at any one time). 
If you >>> need to >>> >>>>> switch between arrays, it would be very expensive. If you have to >>> read past >>> >>>>> the name to get to the 2nd array, then read it, then back up to get >>> to the >>> >>>>> name, it will be pretty expensive too. >>> >>>>> >>> >>>>> Marc >>> >>>>> >>> >>>>> On Sep 18, 2014, at 2:02 PM, >>> >>>>> wrote: >>> >>>>> >>> >>>>> Does this make that much difference? >>> >>>>> >>> >>>>> If you want to parse the first 5 components, one way to do it is: >>> >>>>> >>> >>>>> Read the index, find entry 5, then read in that many bytes from >>> the start >>> >>>>> offset of the beginning of the name. >>> >>>>> OR >>> >>>>> Start reading the name, (find size + move) 5 times. >>> >>>>> >>> >>>>> How much speed are you getting from one to the other? You seem to >>> imply >>> >>>>> that the first one is faster. I don't think this is the case. >>> >>>>> >>> >>>>> In the first one you'll probably have to get the cache line for >>> the index, >>> >>>>> then all the required cache lines for the first 5 components. For >>> the >>> >>>>> second, you'll have to get all the cache lines for the first 5 >>> components. >>> >>>>> Given an assumption that a cache miss is way more expensive than >>> >>>>> evaluating a number and computing an addition, you might find that >>> the >>> >>>>> performance of the index is actually slower than the performance >>> of the >>> >>>>> direct access. >>> >>>>> >>> >>>>> Granted, there is a case where you don't access the name at all, >>> for >>> >>>>> example, if you just get the offsets and then send the offsets as >>> >>>>> parameters to another processor/GPU/NPU/etc. In this case you may >>> see a >>> >>>>> gain IF there are more cache line misses in reading the name than >>> in >>> >>>>> reading the index. So, if the regular part of the name that >>> you're >>> >>>>> parsing is bigger than the cache line (64 bytes?) 
and the name is >>> to be >>> >>>>> processed by a different processor, then you might see some >>> performance >>> >>>>> gain in using the index, but in all other circumstances I bet this >>> is not >>> >>>>> the case. I may be wrong, haven't actually tested it. >>> >>>>> >>> >>>>> This is all to say, I don't think we should be designing the >>> protocol with >>> >>>>> only one architecture in mind. (The architecture of sending the >>> name to a >>> >>>>> different processor than the index.) >>> >>>>> >>> >>>>> If you have numbers that show that the index is faster I would >>> like to see >>> >>>>> under what conditions and architectural assumptions. >>> >>>>> >>> >>>>> Nacho >>> >>>>> >>> >>>>> (I may have misinterpreted your description so feel free to >>> correct me if >>> >>>>> I'm wrong.) >>> >>>>> >>> >>>>> >>> >>>>> -- >>> >>>>> Nacho (Ignacio) Solis >>> >>>>> Protocol Architect >>> >>>>> Principal Scientist >>> >>>>> Palo Alto Research Center (PARC) >>> >>>>> +1(650)812-4458 >>> >>>>> Ignacio.Solis at parc.com >>> >>>>> >>> >>>>> >>> >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" < >>> massimo.gallo at alcatel-lucent.com> >>> >>>>> wrote: >>> >>>>> >>> >>>>> Indeed each component's offset must be encoded using a fixed >>> amount of >>> >>>>> bytes: >>> >>>>> >>> >>>>> i.e., >>> >>>>> Type = Offsets >>> >>>>> Length = 10 Bytes >>> >>>>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>>>> >>> >>>>> You may also imagine having an "Offset_2byte" type if your name is >>> too >>> >>>>> long. >>> >>>>> >>> >>>>> Max >>> >>>>> >>> >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> >>>>> >>> >>>>> if you do not need the entire hierarchical structure (suppose you >>> only >>> >>>>> want the first x components) you can directly have it using the >>> >>>>> offsets. With the Nested TLV structure you have to iteratively >>> parse >>> >>>>> the first x-1 components. 
With the offset structure you can >>> directly >>> >>>>> access the first x components. >>> >>>>> >>> >>>>> I don't get it. What you described only works if the "offset" is >>> >>>>> encoded in fixed bytes. With varNum, you will still need to parse >>> x-1 >>> >>>>> offsets to get to the x offset. >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> >>>>> wrote: >>> >>>>> >>> >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>> >>>>> >>> >>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>> >>>>> existing NDN UTF8 'convention'." I'm still not sure I understand >>> what >>> >>>>> you >>> >>>>> _do_ prefer, though. it sounds like you're describing an entirely >>> >>>>> different >>> >>>>> scheme where the info that describes the name-components is ... >>> >>>>> someplace >>> >>>>> other than _in_ the name-components. is that correct? when you say >>> >>>>> "field >>> >>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>> >>>>> >>> >>>>> Correct. >>> >>>>> In particular, with our name encoding, a TLV indicates the name >>> >>>>> hierarchy >>> >>>>> with offsets in the name and other TLV(s) indicate the offset to >>> use >>> >>>>> in >>> >>>>> order to retrieve special components. >>> >>>>> As for the field separator, it is something like "/". Aliasing is >>> >>>>> avoided as >>> >>>>> you do not rely on field separators to parse the name; you use the >>> >>>>> "offset >>> >>>>> TLV" to do that. >>> >>>>> >>> >>>>> So now, it may be an aesthetic question but: >>> >>>>> >>> >>>>> if you do not need the entire hierarchical structure (suppose you >>> only >>> >>>>> want >>> >>>>> the first x components) you can directly have it using the offsets. >>> >>>>> With the >>> >>>>> Nested TLV structure you have to iteratively parse the first x-1 >>> >>>>> components. >>> >>>>> With the offset structure you can directly access the first x >>> >>>>> components. 
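[Editorial note: the two parsing strategies debated above — walking nested TLV headers component by component versus jumping via a fixed-width offset table — can be sketched side by side. The one-byte type/length wire format below is a simplified stand-in, not the actual NDN TLV encoding.]

```python
import struct

# Toy TLV name encoding: each component is (type: 1 byte, length: 1 byte, value).
# Assumed component type 8 is arbitrary for this sketch.

def encode_name(components):
    buf = bytearray()
    offsets = []
    for c in components:
        offsets.append(len(buf))                     # start offset of this component
        buf += struct.pack("BB", 8, len(c)) + c
    return bytes(buf), offsets                       # offsets play the "index TLV" role

def first_x_sequential(name, x):
    """Nested-TLV style: walk (type, length) headers x times."""
    out, pos = [], 0
    for _ in range(x):
        _t, length = name[pos], name[pos + 1]
        out.append(name[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return out

def first_x_indexed(name, offsets, x):
    """Offset-table style: jump straight to each component's start."""
    out = []
    for off in offsets[:x]:
        length = name[off + 1]
        out.append(name[off + 2 : off + 2 + length])
    return out

name, offsets = encode_name([b"ndn", b"ucla", b"video", b"v1", b"s0"])
assert first_x_sequential(name, 3) == first_x_indexed(name, offsets, 3)
```

Both return the same components; as the thread notes, the index only wins if it avoids cache-line misses that the sequential walk would incur, and only if the offsets themselves are fixed-width.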
>>> >>>>> >>> >>>>> Max >>> >>>>> >>> >>>>> >>> >>>>> -- Mark >>> >>>>> >>> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>>>> >>> >>>>> The why is simple: >>> >>>>> >>> >>>>> You use a lot of "generic component type" and very few "specific >>> >>>>> component type". You are imposing types for every component in >>> order >>> >>>>> to >>> >>>>> handle a few exceptions (segmentation, etc.). You create a rule >>> >>>>> (specifying >>> >>>>> the component's type) to handle exceptions! >>> >>>>> >>> >>>>> I would prefer not to have typed components. Instead I would prefer >>> >>>>> to >>> >>>>> have the name as a simple sequence of bytes with a field separator. >>> Then, >>> >>>>> outside the name, if you have some components that could be used at >>> >>>>> the network layer (e.g. a TLV field), you simply need something that >>> >>>>> indicates the offset allowing you to retrieve the version, >>> >>>>> segment, etc. in the name... >>> >>>>> >>> >>>>> >>> >>>>> Max >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>>>> >>> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>>>> >>> >>>>> I think we agree on the small number of "component types". >>> >>>>> However, if you have a small number of types, you will end up with >>> >>>>> names >>> >>>>> containing many generic component types and few specific >>> >>>>> component >>> >>>>> types. Due to the fact that the component type specification is an >>> >>>>> exception in the name, I would prefer something that specifies the >>> >>>>> component's >>> >>>>> type only when needed (something like UTF8 conventions but that >>> >>>>> applications MUST use). >>> >>>>> >>> >>>>> so ... I can't quite follow that. the thread has had some >>> >>>>> explanation >>> >>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>> >>>>> and >>> >>>>> there's been email trying to explain that applications don't have >>> to >>> >>>>> use types if they don't need to. 
your email sounds like "I prefer >>> >>>>> the >>> >>>>> UTF8 convention", but it doesn't say why you have that preference >>> in >>> >>>>> the face of the points about the problems. can you say why it is >>> >>>>> that >>> >>>>> you express a preference for the "convention" with problems? >>> >>>>> >>> >>>>> Thanks, >>> >>>>> Mark >>> >>>>> >>> >>>>> _______________________________________________ >>> >>>>> Ndn-interest mailing list >>> >>>>> Ndn-interest at lists.cs.ucla.edu >>> >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> >> _______________________________________________ >> Ndn-interest mailing list >> 
Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Sat Sep 20 23:50:34 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Sat, 20 Sep 2014 23:50:34 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> Message-ID: I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc would be better. My unserious conjecture ( :) ): exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular languages or context-free languages might become part of selectors too. On Sat, Sep 20, 2014 at 11:25 PM, wrote: > That will get you one reading, then you need to exclude it and ask again. > > Sent from my telephone > > On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. >> >> I am very confused. For your example, if I want to get all today's >> sensor data, I just do (Any..Last second of last day)(First second of >> tomorrow..Any). 
That's 18 bytes. >> >> >> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >> >>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>> >>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>> >>>>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. >>>> >>>> Could you explain why the missing content object situation happens? Also, >>>> range exclusion is just a shorter notation for many explicit excludes; >>>> converting from explicit excludes to ranged excludes is always >>>> possible. >>> >>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. >>> >>> Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions. >>> >>>> >>>>> You exclude through 100 then issue a new interest. This goes to cache B >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, if >>>> this goes to cache B only, it means that cache A died. How do you know >>>> that v101 even exists? >>> >>> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101. >>> >>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number 
In my own thread-safe LMP implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>> every node will be faster or not because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs (ndn >>>> lpm). Comparing performance of exact match to lpm is unfair. >>> >>> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I?m not ready to claim its better yet because we have not done that. >>> >>>> >>>> >>>> >>>> >>>> >>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just ?latest version? discovery too. >>>>> >>>>> This is probably getting off-topic from the original post about naming conventions. >>>>> >>>>> a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discovery unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. >>>>> >>>>> b. Yes, if you just want the ?latest version? discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B who only has version 99, so the interest times out or is NACK?d. So you think you have it! But, cache A already has version 101, you just don?t know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. 
From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. >>>>> >>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) against selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. >>>>> >>>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. >>>>> >>>>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like. 
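[Editorial note: Marc's point (c) — that CCNx 1.0 exact match needs at most three PIT probes per content object — can be sketched with a plain hash table. The key shapes and function names below are illustrative assumptions, not the CCNx implementation.]

```python
# Sketch of bounded exact-match PIT lookup: a content object is matched
# by name only, by name + keyid, or by name + content-object hash.

pit = {}

def add_interest(name, keyid=None, obj_hash=None, face=0):
    """Record a pending Interest under its exact restriction key."""
    pit.setdefault((name, keyid, obj_hash), []).append(face)

def match_content(name, keyid, obj_hash):
    """Return faces satisfied by a content object, using exactly 3 probes."""
    faces = []
    for key in ((name, None, None),        # probe 1: name only
                (name, keyid, None),       # probe 2: name + keyid
                (name, None, obj_hash)):   # probe 3: name + hash
        faces += pit.pop(key, [])          # consume matched PIT entries
    return faces

add_interest("/ndn/video", face=1)
add_interest("/ndn/video", keyid="k1", face=2)
print(match_content("/ndn/video", "k1", "h9"))   # [1, 2]
```

The contrast with the LPM + selector case in point (c) is the bound: here the probe count is a constant 3, independent of how many prefixes of the name carry Interest traffic.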
>>>>> >>>>> Marc >>>>> >>>>> >>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: >>>>>> >>>>>> I had thought about these questions, but I want to know your idea >>>>>> besides typed component: >>>>>> 1. LPM allows "data discovery". How will exact match do similar things? >>>>>> 2. will removing selectors improve performance? How do we use other >>>>>> faster technique to replace selector? >>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>> byte, but 2 bytes for length might not be enough for future. >>>>>> >>>>>> >>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>>>>>> >>>>>>>>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>>> >>>>>>>> Could you share it with us? >>>>>>> Sure. Here?s a strawman. >>>>>>> >>>>>>> The type space is 16 bits, so you have 65,565 types. >>>>>>> >>>>>>> The type space is currently shared with the types used for the entire protocol, that gives us two options: >>>>>>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need to component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >>>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (sine they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >>>>>>> >>>>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). 
>>>>>>> >>>>>>> - We allocate one ?default" name component type for ?generic name?, which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc. >>>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types) >>>>>>> - We give the rest of the space to application assignment. >>>>>>> >>>>>>> Make sense? >>>>>>> >>>>>>> >>>>>>>>> While I?m sympathetic to that view, there are three ways in which Moore?s law or hardware tricks will not save us from performance flaws in the design >>>>>>>> >>>>>>>> we could design for performance, >>>>>>> That?s not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore?s Law will come to the rescue. >>>>>>> >>>>>>>> but I think there will be a turning >>>>>>>> point when the slower design starts to become "fast enough?. >>>>>>> Perhaps, perhaps not. Relative performance is what matters so things that don?t get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the ?low-end? phenomenon where impovements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never get better. >>>>>>> >>>>>>>> Do you >>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>> performance improvement? >>>>>>> I suspect LPM on data will always be slow (relative to the other functions). >>>>>>> i suspect exclusions will always be slow because they will require extra memory references. 
>>>>>>> >>>>>>> However I of course don?t claim to clairvoyance so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references? >>>>>>> >>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>>>>>> >>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>>>>>>> >>>>>>>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>>>>>>> well on it. It should be the other way around: once ndn app becomes >>>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>>> While I?m sympathetic to that view, there are three ways in which Moore?s law or hardware tricks will not save us from performance flaws in the design: >>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>>>>>>>> >>>>>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can?t be reliably used anywhere >>>>>>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>>>>>>>> >>>>>>>>> I?m afraid students today are being taught the that designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. >>>>>>>>> >>>>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>>>>>>> see that there are 3 approaches: >>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>> 2. 
typed component: use tlv type space and add a handful of types >>>>>>>>>> 3. marked component: introduce only one more type and add additional >>>>>>>>>> marker space >>>>>>>>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>>>> >>>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>>>>>> >>>>>>>>>> Also everybody thinks that the current utf8 marker naming convention >>>>>>>>>> needs to be revised. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>>>>>>> experiments? >>>>>>>>>>> >>>>>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>>>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>>> >>>>>>>>>>> In fact, the index in separate TLV will be slower on some architectures, >>>>>>>>>>> like the ezChip NP4. The NP4 can hold the fist 96 frame bytes in memory, >>>>>>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>>>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>>>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>>>>>>> the name to get to the 2nd array, then read it, then backup to get to the >>>>>>>>>>> name, it will be pretty expensive too. 
>>>>>>>>>>> >>>>>>>>>>> Marc >>>>>>>>>>> >>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Does this make that much difference? >>>>>>>>>>> >>>>>>>>>>> If you want to parse the first 5 components. One way to do it is: >>>>>>>>>>> >>>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>>>>>>> offset of the beginning of the name. >>>>>>>>>>> OR >>>>>>>>>>> Start reading name, (find size + move ) 5 times. >>>>>>>>>>> >>>>>>>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>>>>>>> that the first one is faster. I don't think this is the case. >>>>>>>>>>> >>>>>>>>>>> In the first one you'll probably have to get the cache line for the index, >>>>>>>>>>> then all the required cache lines for the first 5 components. For the >>>>>>>>>>> second, you'll have to get all the cache lines for the first 5 components. >>>>>>>>>>> Given an assumption that a cache miss is way more expensive than >>>>>>>>>>> evaluating a number and computing an addition, you might find that the >>>>>>>>>>> performance of the index is actually slower than the performance of the >>>>>>>>>>> direct access. >>>>>>>>>>> >>>>>>>>>>> Granted, there is a case where you don't access the name at all, for >>>>>>>>>>> example, if you just get the offsets and then send the offsets as >>>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>>>>>>> gain IF there are more cache line misses in reading the name than in >>>>>>>>>>> reading the index. So, if the regular part of the name that you're >>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>>>>>>>> processed by a different processor, then you might see some performance >>>>>>>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>>>>>>> the case. I may be wrong, haven't actually tested it.
>>>>>>>>>>> >>>>>>>>>>> This is all to say, I don't think we should be designing the protocol with >>>>>>>>>>> only one architecture in mind. (The architecture of sending the name to a >>>>>>>>>>> different processor than the index). >>>>>>>>>>> >>>>>>>>>>> If you have numbers that show that the index is faster I would like to see >>>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>>> >>>>>>>>>>> Nacho >>>>>>>>>>> >>>>>>>>>>> (I may have misinterpreted your description so feel free to correct me if >>>>>>>>>>> I'm wrong.) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>>> Protocol Architect >>>>>>>>>>> Principal Scientist >>>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>>> +1(650)812-4458 >>>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Indeed each components' offset must be encoded using a fixed amount of >>>>>>>>>>> bytes: >>>>>>>>>>> >>>>>>>>>>> i.e., >>>>>>>>>>> Type = Offsets >>>>>>>>>>> Length = 10 Bytes >>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>>> >>>>>>>>>>> You may also imagine to have a "Offset_2byte" type if your name is too >>>>>>>>>>> long. >>>>>>>>>>> >>>>>>>>>>> Max >>>>>>>>>>> >>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>>> >>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>>>> want the first x components) you can directly have it using the >>>>>>>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>>>>>>> the first x-1 components. With the offset structure you can directly >>>>>>>>>>> access the first x components. >>>>>>>>>>> >>>>>>>>>>> I don't get it. What you described only works if the "offset" is >>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>>>>>>>> offsets to get to the x offset.
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>>> >>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>>>>>>> you >>>>>>>>>>> _do_ prefer, though. it sounds like you're describing an entirely >>>>>>>>>>> different >>>>>>>>>>> scheme where the info that describes the name-components is ... >>>>>>>>>>> someplace >>>>>>>>>>> other than _in_ the name-components. is that correct? when you say >>>>>>>>>>> "field >>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>>>>>> >>>>>>>>>>> Correct. >>>>>>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>>>>>>> hierarchy >>>>>>>>>>> with offsets in the name and other TLV(s) indicates the offset to use >>>>>>>>>>> in >>>>>>>>>>> order to retrieve special components. >>>>>>>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>>>>>>> avoided as >>>>>>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>>>>>> "offset >>>>>>>>>>> TLV " to do that. >>>>>>>>>>> >>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>> >>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>>>> want >>>>>>>>>>> the first x components) you can directly have it using the offsets. >>>>>>>>>>> With the >>>>>>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>>>>>> components. >>>>>>>>>>> With the offset structure you can directly access the first x >>>>>>>>>>> components.
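[Editorial note: the sequential-vs-indexed trade-off debated above can be made concrete. The sketch below uses a deliberately simplified, hypothetical encoding (1-byte type, 1-byte length, 2-byte fixed-width offsets) rather than NDN's or CCNx's actual wire format; it only illustrates the point that direct access to the k-th component works precisely because the offsets are fixed-width, as Massimo and Tai-Lin discuss.]

```python
import struct

def encode_name(components):
    # Toy TLV encoding: 1-byte type (0x08), 1-byte length, then the value.
    return b"".join(bytes([0x08, len(c)]) + c for c in components)

def first_k_sequential(buf, k):
    # The nested-TLV case: walk component by component, k times.
    out, pos = [], 0
    for _ in range(k):
        length = buf[pos + 1]
        out.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return out

def build_offset_index(buf):
    # Fixed-width (2-byte) start offset per component, one after another.
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 2 + buf[pos + 1]
    return b"".join(struct.pack("!H", o) for o in offsets)

def first_k_indexed(buf, index, k):
    # Jump straight to each component; possible only because every
    # index entry is the same width (with varNum offsets you would be
    # back to parsing k-1 entries first).
    out = []
    for i in range(k):
        (o,) = struct.unpack_from("!H", index, 2 * i)
        out.append(buf[o + 2 : o + 2 + buf[o + 1]])
    return out
```

Whether the indexed variant is actually faster is exactly the cache-line question Nacho raises; this sketch only shows the access pattern, not the memory behavior.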
>>>>>>>>>>> >>>>>>>>>>> Max >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- Mark >>>>>>>>>>> >>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>> >>>>>>>>>>> The why is simple: >>>>>>>>>>> >>>>>>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>>>>>> component type". You are imposing types for every component in order >>>>>>>>>>> to >>>>>>>>>>> handle few exceptions (segmentation, etc..). You create a rule >>>>>>>>>>> (specify >>>>>>>>>>> the component's type ) to handle exceptions! >>>>>>>>>>> >>>>>>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>>>>>> to >>>>>>>>>>> have the name as simple sequence bytes with a field separator. Then, >>>>>>>>>>> outside the name, if you have some components that could be used at >>>>>>>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>>>>>> segment, etc in the name... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Max >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>> >>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>> >>>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>>>> names >>>>>>>>>>> containing many generic components types and few specific >>>>>>>>>>> components >>>>>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>>>> component's >>>>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>>>> applications MUST use). >>>>>>>>>>> >>>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>>> explanation >>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) 
>>>>>>>>>>> and >>>>>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>>>>> the >>>>>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>>>>> that >>>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mark >>>>>>>>>>> >>>>>>>>>>> . >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> 
Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> From yingdi at CS.UCLA.EDU Sun Sep 21 00:00:05 2014 From: yingdi at CS.UCLA.EDU (Yingdi Yu) Date: Sun, 21 Sep 2014 00:00:05 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: On Sep 20, 2014, at 11:18 AM, Adeola Bannis wrote: > > On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu wrote: > > @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if the key size is longer than the hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. > > Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. I did not mean "careless" as you describe it. I mean those who do not know the details of HMAC. The problem is that when the key is shorter than the hash output, it is safe to put the key digest in the key locator. However, when the key is longer than the hash output, putting the key digest into the key locator directly exposes the secret to others, i.e., attackers can directly use the key digest to construct any legitimate HMAC. Therefore you either explicitly specify what should be used when KeyDigest is used as the KeyLocator, or you can disable the usage of KeyDigest in KeyLocator. Yingdi -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
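[Editorial note: Yingdi's warning can be demonstrated directly with Python's standard library. Per RFC 2104, HMAC replaces a key that is longer than the hash's *block size* (64 bytes for SHA-256; the thread says "hash output", but the RFC threshold is the block size) with the key's digest. So for such keys, publishing the key digest is equivalent to publishing a working signing key:]

```python
import hmac
import hashlib

def key_digest_leaks_secret():
    secret = b"S" * 100          # longer than SHA-256's 64-byte block size
    msg = b"/ndn/example/data"

    # The legitimate signer computes an HMAC with the long secret.
    real = hmac.new(secret, msg, hashlib.sha256).digest()

    # An attacker who learns only the key's digest (e.g. from a KeyDigest
    # KeyLocator) can compute the same MAC: HMAC internally replaces an
    # over-long key with its hash, so the digest *is* the effective key.
    key_digest = hashlib.sha256(secret).digest()
    forged = hmac.new(key_digest, msg, hashlib.sha256).digest()
    return hmac.compare_digest(real, forged)
```

For keys at or below the block size no such substitution happens, which is why the digest is harmless in that case, exactly the asymmetry Yingdi describes.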
Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: Message signed with OpenPGP using GPGMail URL: From Marc.Mosko at parc.com Sun Sep 21 00:23:52 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Sun, 21 Sep 2014 07:23:52 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> , Message-ID: <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com> No matter what the expressiveness of the predicates if the forwarder can send interests different ways you don't have a consistent underlying set to talk about so you would always need non-range exclusions to discover every version. Range exclusions only work I believe if you get an authoritative answer. If different content pieces are scattered between different caches I don't see how range exclusions would work to discover every version. I'm sorry to be pointing out problems without offering solutions but we're not ready to publish our discovery protocols. Sent from my telephone > On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: > > I see. Can you briefly describe how ccnx discovery protocol solves the > all problems that you mentioned (not just exclude)? a doc will be > better. > > My unserious conjecture( :) ) : exclude is equal to [not]. I will soon > expect [and] and [or], so boolean algebra is fully supported. 
Regular > language or context free language might become part of selector too. > >> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >> That will get you one reading then you need to exclude it and ask again. >> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >> >>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. >>> >>> I am very confused. For your example, if I want to get all today's >>> sensor data, I just do (Any..Last second of last day)(First second of >>> tomorrow..Any). That's 18 bytes. >>> >>> >>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>>>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. >>>>> >>>>> Could you explain why missing content object situation happens? also >>>>> range exclusion is just a shorter notation for many explicit exclude; >>>>> converting from explicit excludes to ranged exclude is always >>>>> possible. >>>> >>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. >>>> >>>> yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions. >>>> >>>>> >>>>>> You exclude through 100 then issue a new interest.
This goes to cache B >>>>> >>>>> I feel this case is invalid because cache A will also get the >>>>> interest, and cache A will return v101 if it exists. Like you said, if >>>>> this goes to cache B only, it means that cache A dies. How do you know >>>>> that v101 even exists? >>>> >>>> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101. >>>> >>>>> >>>>> >>>>> c,d In general I agree that LPM performance is related to the number >>>>> of components. In my own thread-safe LPM implementation, I used only >>>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>>> every node will be faster or not because of lock overhead. >>>>> >>>>> However, we should compare (exact match + discovery protocol) vs (ndn >>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that. >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. >>>>>> >>>>>> This is probably getting off-topic from the original post about naming conventions. >>>>>> >>>>>> a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B.
If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. >>>>>> >>>>>> b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B who only has version 99, so the interest times out or is NACK'd. So you think you have it! But, cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. >>>>>> >>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. >>>>>> >>>>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate.
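[Editorial note: Marc's sizing argument above is easy to check. Excluding every per-second reading of a day individually, at the 8 bytes per timestamp assumed in the thread and ignoring TLV encoding overhead, already costs hundreds of kilobytes per day, whereas Tai-Lin's two-range exclude stays at a small constant size (~18 bytes in his example):]

```python
READINGS_PER_DAY = 24 * 60 * 60   # one sensor reading per second
TIMESTAMP_BYTES = 8               # per-exclusion size assumed in the thread

def explicit_exclude_bytes(days=1):
    """Payload of excluding every version individually (no TLV overhead)."""
    return READINGS_PER_DAY * TIMESTAMP_BYTES * days

# Marc's figure: 691,200 bytes of exclusions per day, growing linearly,
# versus a range exclude whose size is independent of the number of
# versions excluded.
```

The catch, as Marc notes, is that range excludes are only safe given an authoritative (consistent) answer; without that, one falls back to the linearly growing explicit form computed here.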
Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. >>>>>> >>>>>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like. >>>>>> >>>>>> Marc >>>>>> >>>>>> >>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: >>>>>>> >>>>>>> I had thought about these questions, but I want to know your idea >>>>>>> besides typed component: >>>>>>> 1. LPM allows "data discovery". How will exact match do similar things? >>>>>>> 2. will removing selectors improve performance? How do we use other >>>>>>> faster technique to replace selector? >>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>> byte, but 2 bytes for length might not be enough for future. >>>>>>> >>>>>>> >>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>>>>>>> >>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>>>>>>> >>>>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>>>> >>>>>>>>> Could you share it with us? >>>>>>>> Sure. Here's a strawman. >>>>>>>> >>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>> >>>>>>>> The type space is currently shared with the types used for the entire protocol, that gives us two options: >>>>>>>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >>>>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. >>>>>>>> >>>>>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >>>>>>>> >>>>>>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>>>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) >>>>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types) >>>>>>>> - We give the rest of the space to application assignment. >>>>>>>> >>>>>>>> Make sense? >>>>>>>> >>>>>>>> >>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>>>>>>>> >>>>>>>>> we could design for performance, >>>>>>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.
>>>>>>>> >>>>>>>>> but I think there will be a turning >>>>>>>>> point when the slower design starts to become "fast enough". >>>>>>>> Perhaps, perhaps not. Relative performance is what matters so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. >>>>>>>> >>>>>>>>> Do you >>>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>>> performance improvement? >>>>>>>> I suspect LPM on data will always be slow (relative to the other functions). >>>>>>>> I suspect exclusions will always be slow because they will require extra memory references. >>>>>>>> >>>>>>>> However I of course don't claim clairvoyance so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... >>>>>>>> >>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>>>>>>>> >>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>>>>>>>> >>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to perform >>>>>>>>>>> well on it. It should be the other way around: once ndn app becomes >>>>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
>>>>>>>>>> >>>>>>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere >>>>>>>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>>>>>>>>> >>>>>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. >>>>>>>>>> >>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I >>>>>>>>>>> see that there are 3 approaches: >>>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>>> 2. typed component: use tlv type space and add a handful of types >>>>>>>>>>> 3. marked component: introduce only one more type and add additional >>>>>>>>>>> marker space >>>>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>>>>> >>>>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>>>>>>> >>>>>>>>>>> Also everybody thinks that the current utf8 marker naming convention >>>>>>>>>>> needs to be revised. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the >>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN >>>>>>>>>>>> experiments?
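[Editorial note: Dave's strawman carve-up of the 16-bit component-type space can be written down concretely. The range boundaries below are illustrative only, chosen to match his rough numbers (one generic type, ~1024 spec-defined types, ~1024 reserved, the remainder application-assigned); they are not part of any published specification:]

```python
# Illustrative partition of a 16-bit name-component type space,
# following the strawman registry in the thread.
REGISTRY = [
    (0x0000, 0x0000, "generic"),      # the single "default" component type
    (0x0001, 0x0400, "global"),       # ~1024 spec-defined types (chunk#, version#, ...)
    (0x0401, 0x0800, "reserved"),     # ~1024 held back for unanticipated uses
    (0x0801, 0xFFFF, "application"),  # everything else: application-assigned
]

def classify(t):
    """Return the registry region a 16-bit component type falls into."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("component type must fit in 16 bits")
    for lo, hi, region in REGISTRY:
        if lo <= t <= hi:
            return region
```

Whether the component-type numbers share the protocol's TLV number space (Dave's option 1) or reuse it (option 2) does not change this partitioning logic, only how many of the 65,536 values are available.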
>>>>>>>>>>>> >>>>>>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs >>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and >>>>>>>>>>>> NDN will have to carry more information than URLs, as far as I see. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>>>> >>>>>>>>>>>> In fact, the index in separate TLV will be slower on some architectures, >>>>>>>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, >>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks >>>>>>>>>>>> (there can be at most 5 blocks available at any one time). If you need to >>>>>>>>>>>> switch between arrays, it would be very expensive. If you have to read past >>>>>>>>>>>> the name to get to the 2nd array, then read it, then backup to get to the >>>>>>>>>>>> name, it will be pretty expensive too. >>>>>>>>>>>> >>>>>>>>>>>> Marc >>>>>>>>>>>> >>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Does this make that much difference? >>>>>>>>>>>> >>>>>>>>>>>> If you want to parse the first 5 components. One way to do it is: >>>>>>>>>>>> >>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start >>>>>>>>>>>> offset of the beginning of the name. >>>>>>>>>>>> OR >>>>>>>>>>>> Start reading name, (find size + move ) 5 times. >>>>>>>>>>>> >>>>>>>>>>>> How much speed are you getting from one to the other? You seem to imply >>>>>>>>>>>> that the first one is faster. I don't think this is the case. >>>>>>>>>>>> >>>>>>>>>>>> In the first one you'll probably have to get the cache line for the index, >>>>>>>>>>>> then all the required cache lines for the first 5 components. For the >>>>>>>>>>>> second, you'll have to get all the cache lines for the first 5 components.
>>>>>>>>>>>> Given an assumption that a cache miss is way more expensive than >>>>>>>>>>>> evaluating a number and computing an addition, you might find that the >>>>>>>>>>>> performance of the index is actually slower than the performance of the >>>>>>>>>>>> direct access. >>>>>>>>>>>> >>>>>>>>>>>> Granted, there is a case where you don't access the name at all, for >>>>>>>>>>>> example, if you just get the offsets and then send the offsets as >>>>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case you may see a >>>>>>>>>>>> gain IF there are more cache line misses in reading the name than in >>>>>>>>>>>> reading the index. So, if the regular part of the name that you're >>>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be >>>>>>>>>>>> processed by a different processor, then you might see some performance >>>>>>>>>>>> gain in using the index, but in all other circumstances I bet this is not >>>>>>>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>>>>>>> >>>>>>>>>>>> This is all to say, I don't think we should be designing the protocol with >>>>>>>>>>>> only one architecture in mind. (The architecture of sending the name to a >>>>>>>>>>>> different processor than the index). >>>>>>>>>>>> >>>>>>>>>>>> If you have numbers that show that the index is faster I would like to see >>>>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>>>> >>>>>>>>>>>> Nacho >>>>>>>>>>>> >>>>>>>>>>>> (I may have misinterpreted your description so feel free to correct me if >>>>>>>>>>>> I'm wrong.)
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>>>> Protocol Architect >>>>>>>>>>>> Principal Scientist >>>>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>>>> +1(650)812-4458 >>>>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Indeed each components' offset must be encoded using a fixed amount of >>>>>>>>>>>> bytes: >>>>>>>>>>>> >>>>>>>>>>>> i.e., >>>>>>>>>>>> Type = Offsets >>>>>>>>>>>> Length = 10 Bytes >>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>>>> >>>>>>>>>>>> You may also imagine to have a "Offset_2byte" type if your name is too >>>>>>>>>>>> long. >>>>>>>>>>>> >>>>>>>>>>>> Max >>>>>>>>>>>> >>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>>>> >>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>>>>> want the first x components) you can directly have it using the >>>>>>>>>>>> offsets. With the Nested TLV structure you have to iteratively parse >>>>>>>>>>>> the first x-1 components. With the offset structure you can directly >>>>>>>>>>>> access the first x components. >>>>>>>>>>>> >>>>>>>>>>>> I don't get it. What you described only works if the "offset" is >>>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1 >>>>>>>>>>>> offsets to get to the x offset. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>>>> >>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the >>>>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what >>>>>>>>>>>> you >>>>>>>>>>>> _do_ prefer, though.
it sounds like you're describing an entirely >>>>>>>>>>>> different >>>>>>>>>>>> scheme where the info that describes the name-components is ... >>>>>>>>>>>> someplace >>>>>>>>>>>> other than _in_ the name-components. is that correct? when you say >>>>>>>>>>>> "field >>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>>>>>>> >>>>>>>>>>>> Correct. >>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the name >>>>>>>>>>>> hierarchy >>>>>>>>>>>> with offsets in the name and other TLV(s) indicates the offset to use >>>>>>>>>>>> in >>>>>>>>>>>> order to retrieve special components. >>>>>>>>>>>> As for the field separator, it is something like "/". Aliasing is >>>>>>>>>>>> avoided as >>>>>>>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>>>>>>> "offset >>>>>>>>>>>> TLV " to do that. >>>>>>>>>>>> >>>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>>> >>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>>>>> want >>>>>>>>>>>> the first x components) you can directly have it using the offsets. >>>>>>>>>>>> With the >>>>>>>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>>>>>>> components. >>>>>>>>>>>> With the offset structure you can directly access the first x >>>>>>>>>>>> components. >>>>>>>>>>>> >>>>>>>>>>>> Max >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- Mark >>>>>>>>>>>> >>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>>> >>>>>>>>>>>> The why is simple: >>>>>>>>>>>> >>>>>>>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>>>>>>> component type". You are imposing types for every component in order >>>>>>>>>>>> to >>>>>>>>>>>> handle few exceptions (segmentation, etc..). You create a rule >>>>>>>>>>>> (specify >>>>>>>>>>>> the component's type ) to handle exceptions! >>>>>>>>>>>> >>>>>>>>>>>> I would prefer not to have typed components.
Instead I would prefer >>>>>>>>>>>> to >>>>>>>>>>>> have the name as simple sequence bytes with a field separator. Then, >>>>>>>>>>>> outside the name, if you have some components that could be used at >>>>>>>>>>>> network layer (e.g. a TLV field), you simply need something that >>>>>>>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>>>>>>> segment, etc in the name... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Max >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>>> >>>>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>>>>> names >>>>>>>>>>>> containing many generic components types and few specific >>>>>>>>>>>> components >>>>>>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>>>>> component's >>>>>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>>>>> applications MUST use). >>>>>>>>>>>> >>>>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>>>> explanation >>>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>>>>>>> and >>>>>>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>>>>>> the >>>>>>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>>>>>> that >>>>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Mark >>>>>>>>>>>> >>>>>>>>>>>> . 
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Ndn-interest mailing list
>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From christian.tschudin at unibas.ch Sun Sep 21 06:21:33 2014
From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch)
Date: Sun, 21 Sep 2014 15:21:33 +0200 (CEST)
Subject: [Ndn-interest] retrieving named data and layers (was Re: any comments on naming convention?
In-Reply-To: <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com>
References: <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> , <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com>
Message-ID: 

Hi Marc, Tai-Lin and all,

to me, this feasibility-vs-drawback-of-selectors discussion has traits of an unproductive binary showdown, while I think it would be more interesting to understand and enable heterogeneity (inevitable in an evolving protocol world).

Could, for example, ways be found such that selector-carrying interests can traverse an exact-match stretch (and work with the caching being done there)?

I would see a yes to the above question as a desirable specialization: simpler forwarding semantics would be deployed in domains with high speed, high volume or special technology like optical; richer forwarding functions wherever possible, functionally necessary or economically justifiable (and to still have some in-network optimization benefits).

Here is a layout of a possible layering where CCNx and NDN live side-by-side (layers 3.b and 3.c):

4.b) numerous applications
     ----- fancy data access ----
4.a) many high-level APIs, from prodCons, pubSub, and groupComm to sync
3.c) a few "discovery+selection APIs", mapping to either or a combination of
     - exact match + selProtocol
     - selectors in various shades (regExp)
     - named functions
3.b) heterogeneous concatenation of NDN and CCNx stretches
3.a) virtualization/redirection/fragm layer (common to NDN and CCNx)
2.b) one common wire format and naming convention (covering layers up to 3.c)
     ----- raw media access -------
2.a) link

In that picture, CCNx would have a top-heavy stack: selection/discovery protocol at 3.c, lean forwarding at 3.b.
NDN has priorities inverted: small or empty 3.c layer, but more involved forwarding semantics at 3.b.

I added 3.a (which to me relates to the fixed header discussion) because I think this is missing so far.

Looking forward to comments. Would the Paris ICNRG interim meeting be a place to discuss this picture?

best, christian

On Sun, 21 Sep 2014, Marc.Mosko at parc.com wrote:

> No matter what the expressiveness of the predicates, if the forwarder
> can send interests different ways you don't have a consistent
> underlying set to talk about, so you would always need non-range
> exclusions to discover every version.
>
> Range exclusions only work, I believe, if you get an authoritative
> answer. If different content pieces are scattered between different
> caches I don't see how range exclusions would work to discover every
> version.
>
> I'm sorry to be pointing out problems without offering solutions but
> we're not ready to publish our discovery protocols.
>
> Sent from my telephone
>
>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>
>> I see. Can you briefly describe how the ccnx discovery protocol solves
>> all the problems that you mentioned (not just exclude)? a doc will be
>> better.
>>
>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon
>> expect [and] and [or], so boolean algebra is fully supported. Regular
>> language or context free language might become part of selector too.
...
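Marc's objection quoted above, that range exclusions against mutually inconsistent caches can silently miss versions, can be made concrete with a toy simulation. Everything here is invented for illustration (the version numbers, the round-robin forwarding policy, the stop-on-timeout consumer logic); it is a sketch of the failure mode, not of any real forwarder:

```python
# Two caches hold different subsets of versions of the same name.
cache_a = {100, 101}
cache_b = {99, 100}

def respond(cache, exclude_upto):
    """Return the lowest version above the excluded range, if any."""
    candidates = [v for v in cache if v > exclude_upto]
    return min(candidates) if candidates else None

# Consumer: exclude through everything seen so far; the forwarder
# round-robins between the two replicas (A first, then B).
seen, exclude = [], 0
for cache in [cache_a, cache_b]:
    v = respond(cache, exclude)
    if v is None:
        break          # timeout/NACK: the consumer concludes it is done
    seen.append(v)
    exclude = v

# The first interest hits A and returns 100; the follow-up (excluding
# through 100) hits B, which has nothing newer, so the consumer stops --
# never learning that A holds 101.
assert seen == [100]
assert 101 in cache_a
```

With individual (non-range) excludes and a forwarder that floods every replica, the consumer would eventually enumerate all versions, at the cost of the very large Interests discussed elsewhere in the thread.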
From shijunxiao at email.ARIZONA.EDU Sun Sep 21 11:55:59 2014 From: shijunxiao at email.ARIZONA.EDU (Junxiao Shi) Date: Sun, 21 Sep 2014 11:55:59 -0700 Subject: [Ndn-interest] Signed interest In-Reply-To: References: <1EFB1E57-5904-4661-AE8C-22D75B052062@gmail.com> <880D24C3-724A-4FC0-8E84-773609EEA184@gmail.com> <4ADA6F19-A536-4455-B9A6-5BD206AE1C8F@gmail.com> <9292AC2E-197B-4058-AEA3-5802DBEBA9DA@cs.ucla.edu> Message-ID: Hi Tai-Lin No, there is no way to detect an underlying connection is down, because NDN network operates on the granularity of Interest-Data exchange, and has no concept of "connection". Yours, Junxiao On Sat, Sep 20, 2014 at 11:19 AM, Tai-Lin Chu wrote: > >Using seqNo requires you to persistently remember the last used seqNo > (even if the app is turned off), otherwise you cannot guarantee that a > seqNo has not been used before. > > can we assume that once the underlying connection tears down, the > seqNo resets to 0? so after the app is turned off, you can safely > start from 0 again? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shijunxiao at email.ARIZONA.EDU Sun Sep 21 12:22:39 2014 From: shijunxiao at email.ARIZONA.EDU (Junxiao Shi) Date: Sun, 21 Sep 2014 12:22:39 -0700 Subject: [Ndn-interest] Signed interest Message-ID: Hi Tai-Lin NDN USA testbed is deployed with UDP tunnels. UDP has no concept of connection. More importantly, consumer and producer can be multiple hops away, and there can be multiple paths between them. Even if consumer or producer can be notified about a link failure, it doesn't mean consumer or producer have to restart. On the other direction, if consumer or producer restarts, there is always a link failure: the UNIX socket between application and local forwarder fails. However, forwarding plane will not notify all correspondents about such a failure. Yours, Junxiao On Sun, Sep 21, 2014 at 12:14 PM, Tai-Lin Chu wrote: > thanks. 
> what I mean is like tcp/udp that ndn runs on top of. Also I
> actually found another weakness from my proposal.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Marc.Mosko at parc.com Mon Sep 22 02:33:23 2014
From: Marc.Mosko at parc.com (Marc.Mosko at parc.com)
Date: Mon, 22 Sep 2014 09:33:23 +0000
Subject: [Ndn-interest] Discovery (was: any comments on naming convention?)
In-Reply-To: <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com>
References: <541746A4.1000206@illinois.edu> <54175769.6070205@rabe.io> <5417713F.4090201@illinois.edu> <5417E823.70306@rabe.io> <5417FEF1.1060307@alcatel-lucent.com> <541841A4.70508@alcatel-lucent.com> <541843BC.1040307@cisco.com> <54184960.8060102@alcatel-lucent.com> <54188260.50306@cisco.com> <54195C32.6000609@alcatel-lucent.com> <541984EC.1000009@cisco.com> <541A823D.4010108@alcatel-lucent.com> <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> , <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com>
Message-ID: <8B644CB3-E05E-4A1E-A9D6-222B09E64B7F@parc.com>

I received some feedback that people would like to know more about discovery and what we have done in ccnx 1.0, so I am starting a new thread for it. As I mentioned, we are not ready to publish our discovery protocols and I don't want to go off halfway with something we are still working on, but I can talk about what I think discovery is and should do. Discovery is a very important topic and I don't think research is anywhere near done on the topic.

First, I think we need a clear definition of what services discovery offers. I would say it should do, at least, "discover greatest" and "discover all" with the obvious variations based on sort order and range, for some given prefix.
It should also support discovery scoped by one (or possibly more) publisher keys.

What does "discover greatest" mean? One could do something similar to ccnx 0.x and ndn, where it's based on a canonical sort order of name components and one could ask for the greatest name after a given prefix. Or, one could do something specific to a versioning protocol or creation time, etc.

What does "discover all" mean? First, I think we should recognize that some data sets might be very, very large. Like millions or billions of possible results (content objects). The discovery protocol should be able to discover it all, efficiently (if not optimally).

I also think there is no one discovery protocol. Some applications may want strict ACID-style discovery (i.e. only see "completed" or "whole" results, such as where all segments are available), some might want eventually consistent discovery, some might take best-effort discovery. Some discovery protocols may require authentication and some may be open to the world.

I think the discovery process should be separate from the retrieval process. If one is publishing, say, 64KB objects or even 4GB objects (or even larger objects!), one does not want to have to fetch each object to discover it. All this leads me to think we need discovery protocols that allow us to talk about content without actually transferring the content. The discovery protocol should be able to handle multiple objects with the same name -- i.e. two publishers overwrite each other, or even a single publisher publishes two objects with the same name.

Discovery is also closely related to how routing works and forwarding strategy. Does an Interest flood all replicas? Does it only go anycast style? How large can an Interest be, and what's the performance tradeoff for large interests (i.e. dropping fragments)? Is there an in-network control protocol to NAK or are all NAKs end-to-end? How does a discovery process terminate (i.e. when do you know you're done)?
Marc

On Sep 21, 2014, at 9:23 AM, Mosko, Marc wrote:

> No matter what the expressiveness of the predicates, if the forwarder
> can send interests different ways you don't have a consistent
> underlying set to talk about, so you would always need non-range
> exclusions to discover every version.
>
> Range exclusions only work, I believe, if you get an authoritative
> answer. If different content pieces are scattered between different
> caches I don't see how range exclusions would work to discover every
> version.
>
> I'm sorry to be pointing out problems without offering solutions but
> we're not ready to publish our discovery protocols.
>
> Sent from my telephone
>
>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>
>> I see. Can you briefly describe how the ccnx discovery protocol solves
>> all the problems that you mentioned (not just exclude)? a doc will be
>> better.
>>
>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon
>> expect [and] and [or], so boolean algebra is fully supported. Regular
>> language or context free language might become part of selector too.
>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>
>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>>
>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>>
>>>>>>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions.
>>>>>>
>>>>>> Could you explain why the missing-content-object situation happens? also
>>>>>> range exclusion is just a shorter notation for many explicit excludes;
>>>>>> converting from explicit excludes to ranged excludes is always
>>>>>> possible.
>>>>>
>>>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day.
>>>>>
>>>>> yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions.
>>>>>
>>>>>>> You exclude through 100 then issue a new interest. This goes to cache B
>>>>>>
>>>>>> I feel this case is invalid because cache A will also get the
>>>>>> interest, and cache A will return v101 if it exists. Like you said, if
>>>>>> this goes to cache B only, it means that cache A dies. How do you know
>>>>>> that v101 even exists?
>>>>>
>>>>> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101.
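The exclusion-overhead figure quoted just above is easy to verify, under the same assumptions the message states (one reading per second, an 8-byte timestamp per exclusion, encoding overhead ignored):

```python
# Back-of-the-envelope check of the 691,200 bytes/day estimate.
readings_per_day = 24 * 60 * 60        # one reading per second
bytes_per_exclusion = 8                # an 8-byte timestamp each

total_exclusion_bytes = readings_per_day * bytes_per_exclusion

assert readings_per_day == 86_400
assert total_exclusion_bytes == 691_200   # matches the figure in the thread
```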
>>>>>
>>>>>> c,d In general I agree that LPM performance is related to the number
>>>>>> of components. In my own thread-safe LPM implementation, I used only
>>>>>> one RWMutex for the whole tree. I don't know whether adding a lock for
>>>>>> every node will be faster or not because of lock overhead.
>>>>>>
>>>>>> However, we should compare (exact match + discovery protocol) vs (ndn
>>>>>> lpm). Comparing performance of exact match to lpm is unfair.
>>>>>
>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that.
>>>>>
>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>>>>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too.
>>>>>>>
>>>>>>> This is probably getting off-topic from the original post about naming conventions.
>>>>>>>
>>>>>>> a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better.
>>>>>>>
>>>>>>> b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B, who only has version 99, so the interest times out or is NACK'd. So you think you have it!
But, cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest-version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent.
>>>>>>>
>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above.
>>>>>>>
>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.
>>>>>>>
>>>>>>> d.
In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like.
>>>>>>>
>>>>>>> Marc
>>>>>>>
>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>>>
>>>>>>>> I had thought about these questions, but I want to know your idea
>>>>>>>> besides typed component:
>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar things?
>>>>>>>> 2. will removing selectors improve performance? How do we use other
>>>>>>>> faster techniques to replace selectors?
>>>>>>>> 3. fixed byte length and type. I agree more that type can be fixed
>>>>>>>> byte, but 2 bytes for length might not be enough for the future.
>>>>>>>>
>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>>>>
>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>>>>
>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>>>>>
>>>>>>>>>> Could you share it with us?
>>>>>>>>> Sure. Here's a strawman.
>>>>>>>>>
>>>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>>>
>>>>>>>>> The type space is currently shared with the types used for the entire protocol, which gives us two options:
>>>>>>>>> (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components.
>>>>>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types.
>>>>>>>>>
>>>>>>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).
>>>>>>>>>
>>>>>>>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component.
>>>>>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.)
>>>>>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types)
>>>>>>>>> - We give the rest of the space to application assignment.
>>>>>>>>>
>>>>>>>>> Make sense?
>>>>>>>>>
>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's Law or hardware tricks will not save us from performance flaws in the design
>>>>>>>>>>
>>>>>>>>>> we could design for performance,
>>>>>>>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.
>>>>>>>>>
>>>>>>>>>> but I think there will be a turning
>>>>>>>>>> point when the slower design starts to become "fast enough".
>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better.
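Dave's strawman registry above can be written down as a sketch. The strawman only fixes the region *sizes* (one generic type, 1024 globally understood, 1024 reserved, the rest for applications); the concrete boundary positions and type values below are invented for illustration:

```python
# 16-bit name-component type space, partitioned per the strawman.
TYPE_SPACE = 1 << 16          # 65,536 possible types

GENERIC        = 0x0001                        # the single "generic name" type (assumed value)
GLOBAL_RANGE   = range(0x0002, 0x0002 + 1024)  # chunk#, version#, ... (assumed placement)
RESERVED_RANGE = range(GLOBAL_RANGE.stop, GLOBAL_RANGE.stop + 1024)
APP_RANGE      = range(RESERVED_RANGE.stop, TYPE_SPACE)

def classify(t: int) -> str:
    """Say which registry region a component type value falls in."""
    if t == GENERIC:
        return "generic"
    if t in GLOBAL_RANGE:
        return "global"
    if t in RESERVED_RANGE:
        return "reserved"
    if t in APP_RANGE:
        return "application"
    return "unassigned"

assert classify(GENERIC) == "generic"
assert len(GLOBAL_RANGE) == len(RESERVED_RANGE) == 1024
```

Whether the regions sit at these offsets, and whether the space is shared with protocol TLV types (option 1) or reused (option 2), is exactly what the registry decision would pin down.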
>>>>>>>>>
>>>>>>>>>> Do you
>>>>>>>>>> think there will be some design of ndn that will *never* have
>>>>>>>>>> performance improvement?
>>>>>>>>> I suspect LPM on data will always be slow (relative to the other functions).
>>>>>>>>> I suspect exclusions will always be slow because they will require extra memory references.
>>>>>>>>>
>>>>>>>>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references...
>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to perform
>>>>>>>>>>>> well on it. It should be the other way around: once ndn apps become
>>>>>>>>>>>> popular, a better chip will be designed for ndn.
>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's Law or hardware tricks will not save us from performance flaws in the design:
>>>>>>>>>>> a) clock rates are not getting (much) faster
>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
>>>>>>>>>>>
>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere
>>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around.
>>>>>>>>>>>
>>>>>>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right.
>>>>>>>>>>>
>>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I
>>>>>>>>>>>> see that there are 3 approaches:
>>>>>>>>>>>> 1. we should not define a naming convention at all
>>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>>>>>>>>> 3. marked component: introduce only one more type and add additional
>>>>>>>>>>>> marker space
>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>>>>>>
>>>>>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.
>>>>>>>>>>>
>>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming convention
>>>>>>>>>>>> needs to be revised.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the
>>>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in current NDN
>>>>>>>>>>>>> experiments?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs
>>>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column text lines, and
>>>>>>>>>>>>> NDN will have to carry more information than URLs, as far as I see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> In fact, the index in a separate TLV will be slower on some architectures,
>>>>>>>>>>>>> like the ezChip NP4.
The NP4 can hold the first 96 frame bytes in memory,
>>>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent 32-byte blocks
>>>>>>>>>>>>> (there can be at most 5 blocks available at any one time). If you need to
>>>>>>>>>>>>> switch between arrays, it would be very expensive. If you have to read past
>>>>>>>>>>>>> the name to get to the 2nd array, then read it, then back up to get to the
>>>>>>>>>>>>> name, it will be pretty expensive too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does this make that much difference?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to parse the first 5 components, one way to do it is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start
>>>>>>>>>>>>> offset of the beginning of the name.
>>>>>>>>>>>>> OR
>>>>>>>>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How much speed are you getting from one to the other? You seem to imply
>>>>>>>>>>>>> that the first one is faster. I don't think this is the case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the first one you'll probably have to get the cache line for the index,
>>>>>>>>>>>>> then all the required cache lines for the first 5 components. For the
>>>>>>>>>>>>> second, you'll have to get all the cache lines for the first 5 components.
>>>>>>>>>>>>> Given an assumption that a cache miss is way more expensive than
>>>>>>>>>>>>> evaluating a number and computing an addition, you might find that the
>>>>>>>>>>>>> performance of the index is actually slower than the performance of the
>>>>>>>>>>>>> direct access.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Granted, there is a case where you don't access the name at all, for
>>>>>>>>>>>>> example, if you just get the offsets and then send the offsets as
>>>>>>>>>>>>> parameters to another processor/GPU/NPU/etc.
In this case you may see a
>>>>>>>>>>>>> gain IF there are more cache line misses in reading the name than in
>>>>>>>>>>>>> reading the index. So, if the regular part of the name that you're
>>>>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the name is to be
>>>>>>>>>>>>> processed by a different processor, then you might see some performance
>>>>>>>>>>>>> gain in using the index, but in all other circumstances I bet this is not
>>>>>>>>>>>>> the case. I may be wrong, haven't actually tested it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is all to say, I don't think we should be designing the protocol with
>>>>>>>>>>>>> only one architecture in mind. (The architecture of sending the name to a
>>>>>>>>>>>>> different processor than the index.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you have numbers that show that the index is faster I would like to see
>>>>>>>>>>>>> under what conditions and architectural assumptions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nacho
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I may have misinterpreted your description so feel free to correct me if
>>>>>>>>>>>>> I'm wrong.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Nacho (Ignacio) Solis
>>>>>>>>>>>>> Protocol Architect
>>>>>>>>>>>>> Principal Scientist
>>>>>>>>>>>>> Palo Alto Research Center (PARC)
>>>>>>>>>>>>> +1(650)812-4458
>>>>>>>>>>>>> Ignacio.Solis at parc.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed each component's offset must be encoded using a fixed amount of
>>>>>>>>>>>>> bytes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> i.e.,
>>>>>>>>>>>>> Type = Offsets
>>>>>>>>>>>>> Length = 10 Bytes
>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too
>>>>>>>>>>>>> long.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only
>>>>>>>>>>>>> want the first x components) you can have it directly using the
>>>>>>>>>>>>> offsets. With the nested TLV structure you have to iteratively parse
>>>>>>>>>>>>> the first x-1 components. With the offset structure you can directly
>>>>>>>>>>>>> access the first x components.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't get it. What you described only works if the "offset" is
>>>>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to parse x-1
>>>>>>>>>>>>> offsets to get to the x-th offset.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the
>>>>>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand what you
>>>>>>>>>>>>> _do_ prefer, though. it sounds like you're describing an entirely different
>>>>>>>>>>>>> scheme where the info that describes the name-components is ... someplace
>>>>>>>>>>>>> other than _in_ the name-components. is that correct? when you say "field
>>>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a TLV)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Correct.
>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the name hierarchy
>>>>>>>>>>>>> with offsets in the name, and other TLV(s) indicate the offset to use in
>>>>>>>>>>>>> order to retrieve special components.
>>>>>>>>>>>>> As for the field separator, it is something like "/".
Aliasing is >>>>>>>>>>>>> avoided as >>>>>>>>>>>>> you do not rely on field separators to parse the name; you use the >>>>>>>>>>>>> "offset >>>>>>>>>>>>> TLV" to do that. >>>>>>>>>>>>> >>>>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>>>> >>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only >>>>>>>>>>>>> want >>>>>>>>>>>>> the first x components) you can directly have it using the offsets. >>>>>>>>>>>>> With the >>>>>>>>>>>>> Nested TLV structure you have to iteratively parse the first x-1 >>>>>>>>>>>>> components. >>>>>>>>>>>>> With the offset structure you can directly access the first x >>>>>>>>>>>>> components. >>>>>>>>>>>>> >>>>>>>>>>>>> Max >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- Mark >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> The why is simple: >>>>>>>>>>>>> >>>>>>>>>>>>> You use a lot of "generic component type" and very few "specific >>>>>>>>>>>>> component type". You are imposing types for every component in order >>>>>>>>>>>>> to >>>>>>>>>>>>> handle a few exceptions (segmentation, etc.). You create a rule >>>>>>>>>>>>> (specify >>>>>>>>>>>>> the component's type) to handle exceptions! >>>>>>>>>>>>> >>>>>>>>>>>>> I would prefer not to have typed components. Instead I would prefer >>>>>>>>>>>>> to >>>>>>>>>>>>> have the name as a simple sequence of bytes with a field separator. Then, >>>>>>>>>>>>> outside the name, if you have some components that could be used at >>>>>>>>>>>>> the network layer (e.g. a TLV field), you simply need something that >>>>>>>>>>>>> indicates which is the offset allowing you to retrieve the version, >>>>>>>>>>>>> segment, etc. in the name...
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Max >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>>>>> However, if you have a small number of types, you will end up with >>>>>>>>>>>>> names >>>>>>>>>>>>> containing many generic component types and few specific >>>>>>>>>>>>> component >>>>>>>>>>>>> types. Due to the fact that the component type specification is an >>>>>>>>>>>>> exception in the name, I would prefer something that specifies the >>>>>>>>>>>>> component's >>>>>>>>>>>>> type only when needed (something like UTF8 conventions but that >>>>>>>>>>>>> applications MUST use). >>>>>>>>>>>>> >>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>>>>> explanation >>>>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, e.g.) >>>>>>>>>>>>> and >>>>>>>>>>>>> there's been email trying to explain that applications don't have to >>>>>>>>>>>>> use types if they don't need to. your email sounds like "I prefer >>>>>>>>>>>>> the >>>>>>>>>>>>> UTF8 convention", but it doesn't say why you have that preference in >>>>>>>>>>>>> the face of the points about the problems. can you say why it is >>>>>>>>>>>>> that >>>>>>>>>>>>> you express a preference for the "convention" with problems? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Mark >>>>>>>>>>>>> >>>>>>>>>>>>> .
>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From felix at rabe.io Mon Sep 22 03:00:49 2014 From: felix at rabe.io (Felix Rabe) Date: Mon, 22 Sep 2014 12:00:49 +0200 Subject: [Ndn-interest] Discovery (was: any comments on naming convention?) Message-ID: <541FF351.40209@rabe.io> Marc's email actually went to the same thread. This should start a new one. Feel free to reply to Marc here. - Felix On 22/Sep/14 11:33, Marc.Mosko at parc.com wrote: > I received some feedback that people would like to know more about discovery and what we have done in ccnx 1.0, so I am starting a new thread for it. > > As I mentioned, we are not ready to publish our discovery protocols and I don't want to go off halfway with something we are still working on, but I can talk about what I think discovery is and should do. Discovery is a very important topic and I don't think research is anywhere near done on the topic. > > First, I think we need a clear definition of what services discovery offers. I would say it should do, at least, "discover greatest" and "discover all" with the obvious variations based on sort order and range, for some given prefix. It should also support discovery scoped by one (or possibly more) publisher keys. > > What does "discover greatest" mean? One could do something similar to ccnx 0.x and ndn, where it's based on a canonical sort order of name components and one could ask for the greatest name after a given prefix. Or, one could do something specific to a versioning protocol or creation time, etc. > > What does "discover all" mean? First, I think we should recognize that some data sets might be very, very large. Like millions or billions of possible results (content objects). The discovery protocol should be able to discover it all, efficiently (if not optimally). > > I also think there is no one discovery protocol. Some applications may want strict ACID-style discovery (i.e. only see "completed" or "whole"
results, such as where all segments are available), some might want eventually consistent discovery, some might take best-effort discovery. Some discovery protocols may require authentication and some may be open to the world. > > I think the discovery process should be separate from the retrieval process. If one is publishing, say, 64KB objects or even 4GB objects (or even larger objects!), one does not want to have to fetch each object to discover it. > > All this leads me to think we need discovery protocols that allow us to talk about content without actually transferring the content. > > The discovery protocol should be able to handle multiple objects with the same name, i.e. two publishers overwrite each other, or even a single publisher publishes two objects with the same name. > > Discovery is also closely related to how routing works and forwarding strategy. Does an Interest flood all replicas? Does it only go anycast style? How large can an Interest be, and what's the performance tradeoff for large interests (i.e. dropping fragments)? Is there an in-network control protocol to NAK or are all NAKs end-to-end? How does a discovery process terminate (i.e. when do you know you're done)? > > Marc From caozhenpku at gmail.com Mon Sep 22 05:36:42 2014 From: caozhenpku at gmail.com (Zhen Cao) Date: Mon, 22 Sep 2014 14:36:42 +0200 Subject: [Ndn-interest] Middlebox in NDN Era Message-ID: Hello Everybody, I am working on middlebox problems in the Internet, and also thinking about MBs in NDN. Will NDN relax the middlebox dependency? I do not know if there is already some work/articles about the topic. I would appreciate it if you could help give me some references. If NDN is built as an overlay on IP, MBs like NAT/IPS/WAN-Ops will still be necessary. If NDN creates a clean slate, is there a need for some new types of MBs? Many thanks for the discussion.
Best regards, zhen From shijunxiao at email.arizona.edu Mon Sep 22 05:39:09 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Mon, 22 Sep 2014 05:39:09 -0700 Subject: [Ndn-interest] Middlebox in NDN Era In-Reply-To: References: Message-ID: Hi Zhen Can you explain more on the definition of Middlebox? Yours, Junxiao -------------- next part -------------- An HTML attachment was scrubbed... URL: From caozhenpku at gmail.com Mon Sep 22 05:44:06 2014 From: caozhenpku at gmail.com (Zhen Cao) Date: Mon, 22 Sep 2014 14:44:06 +0200 Subject: [Ndn-interest] Middlebox in NDN Era In-Reply-To: References: Message-ID: Hi Junxiao, I think RFC 3234 is a good summary of MBs on the Internet. http://tools.ietf.org/html/rfc3234 Its first paragraph gives a neat definition: "The phrase "middlebox" was coined by Prof. Lixia Zhang as a graphic description of a recent phenomenon in the Internet. A middlebox is defined as any intermediary device performing functions other than the normal, standard functions of an IP router on the datagram path between a source host and destination host." Cheers, zhen On Mon, Sep 22, 2014 at 2:39 PM, Junxiao Shi wrote: > Hi Zhen > > Can you explain more on the definition of Middlebox? > > Yours, Junxiao From christian.tschudin at unibas.ch Mon Sep 22 05:54:13 2014 From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch) Date: Mon, 22 Sep 2014 14:54:13 +0200 (CEST) Subject: [Ndn-interest] Middlebox in NDN Era In-Reply-To: References: Message-ID: ... which means that as soon as caching, selectors or named functions are involved, every NDN-enabled node is a middlebox: especially the destination node has to be aware of their potential actions. best, christian On Mon, 22 Sep 2014, Zhen Cao wrote: > Hi Junxiao, > > I think RFC 3234 is a good summary of MBs on the Internet. > http://tools.ietf.org/html/rfc3234 > > Its first paragraph gives a neat definition: > "The phrase "middlebox" was coined by Prof.
Lixia Zhang as a graphic > description of a recent phenomenon in the Internet. A middlebox is > defined as any intermediary device performing functions other than > the normal, standard functions of an IP router on the datagram path > between a source host and destination host." > > Cheers, > zhen > > On Mon, Sep 22, 2014 at 2:39 PM, Junxiao Shi > wrote: >> Hi Zhen >> >> Can you explain more on the definition of Middlebox? >> >> Yours, Junxiao > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From Marc.Mosko at parc.com Mon Sep 22 05:57:08 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 22 Sep 2014 12:57:08 +0000 Subject: [Ndn-interest] retrieving named data and layers (was Re: any comments on naming convention? In-Reply-To: References: <541A8FD2.8010302@alcatel-lucent.com> <096077D8-5479-4CC2-A0F0-1990CDDE57F4@parc.com> <541B5C39.2050502@rabe.io> <86620BF2-348A-424D-AB1A-237936550ECA@cisco.com> <9CFF122E-98C5-437C-9B18-E5EBD0AC4F19@cisco.com> <18CE40D5-3C63-47EF-84A5-D819A2623590@parc.com> , <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com> Message-ID: <77A00781-A8DE-4382-8B80-D2B5E8CFB7B9@parc.com> Christian, Attached is a proposal we were working with to use the ccnx 0.x selector discovery over exact match names. Basically, all the selectors are TLV-encoded into a single name component and the response is encapsulated with a new PayloadType. We do not have any code around this; it was just an early proposal of ours. We have not advanced this route because, as you may notice from the other thread on Discovery, I think a discovery protocol should do more than what selectors offer. So, if we are going to run one (or more) discovery protocols over exact match, I think it should be a different protocol than selectors.
That said, to offer compatibility to ccnx 0.x or ndn applications, one could use a scheme like this. Marc -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ccnx-mosko-selectors-00.txt URL: -------------- next part -------------- On Sep 21, 2014, at 3:21 PM, christian.tschudin at unibas.ch wrote: > Hi Marc, Tai-Lin and all > > to me, this feasibility-vs-drawback-of-selectors discussion has traits of an unproductive binary showdown, while I think it would be more interesting to understand and enable heterogeneity (inevitable in an evolving protocol world). > > Could, for example, ways be found such that selector-carrying interests can traverse an exact-match stretch (and work with the caching being done there)? > > I would see a yes to the above question as a desirable specialization: simpler forwarding semantics would be deployed in domains with high speed, high volume or special technology like optical, richer forwarding functions wherever possible, functionally necessary or economically justifiable (and to still have some in-network optimization benefits). > > Here is a layout of a possible layering where CCNx and NDN live side-by-side (layers 3.b and 3.c): > > > 4.b) numerous applications > > ----- fancy data access ---- > > 4.a) many high-level APIs, from prodCons, pubSub, and groupComm to sync > > 3.c) a few "discovery+selection APIs", mapping to either or a combination of > - exact match + selProtocol > - selectors in various shades (regExp) > - named functions > > 3.b) heterogeneous concatenation of NDN and CCNx stretches > > 3.a) virtualization/redirection/fragm layer (common to NDN and CCNx) > > 2.b) one common wire format and naming conventions (covering layers up to 3.c) > > ----- raw media access ------- > > 2.a) link > > > In that picture, CCNx would have a top-heavy stack: selection/discovery protocol at 3.c, lean forwarding at 3.b.
> > NDN has priorities inverted: small or empty 3.c layer, but more involved forwarding semantics at 3.b. > > I added 3.a (which to me relates to the fixed header discussion) because I think this is missing so far. > > Looking forward to comments. Would the Paris ICNRG interim meeting be a place to discuss this picture? > > best, christian. > > > On Sun, 21 Sep 2014, Marc.Mosko at parc.com wrote: > >> No matter what the expressiveness of the predicates if the forwarder can send interests different ways you don't have a consistent underlying set to talk about so you would always need non-range exclusions to discover every version. >> >> Range exclusions only work I believe if you get an authoritative answer. If different content pieces are scattered between different caches I don't see how range exclusions would work to discover every version. >> >> I'm sorry to be pointing out problems without offering solutions but we're not ready to publish our discovery protocols. >> >> Sent from my telephone >> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how ccnx discovery protocol solves the >>> all problems that you mentioned (not just exclude)? a doc will be >>> better. >>> >>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >>> expect [and] and [or], so boolean algebra is fully supported. Regular >>> language or context free language might become part of selector too. > ... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From jefft0 at remap.UCLA.EDU Mon Sep 22 09:53:47 2014 From: jefft0 at remap.UCLA.EDU (Thompson, Jeff) Date: Mon, 22 Sep 2014 16:53:47 +0000 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: > However, when key is longer than the hash output, putting key digest into key locator directly expose the secret to other. Oh crap, you're right! Thanks for pointing that out. I agree that we should prohibit KeyDigest in the case that the key is longer than 32 bytes. I would reword the reason slightly: "When the key is longer than 32 bytes, the HmacWithSha256 algorithm uses the SHA-256 digest of the key, which would be the same as the KeyDigest, effectively exposing the secret." - Jeff T From: Yingdi Yu > Date: Sunday, September 21, 2014 12:00 AM To: Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Sep 20, 2014, at 11:18 AM, Adeola Bannis > wrote: On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if key size is longer than hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. I did not mean that "careless" as you describe. I mean those who do not know the details of HMAC. The problem is that when key is shorter than hash output, it is safe to put key digest in the key locator. 
However, when the key is longer than the hash output, putting the key digest into the key locator directly exposes the secret to others, i.e., attackers can directly use the key digest to construct any legitimate HMAC. Therefore you either explicitly specify what should be used when KeyDigest is used as the KeyLocator, or you can disable the usage of KeyDigest in KeyLocator. Yingdi -------------- next part -------------- An HTML attachment was scrubbed... URL: From jefft0 at remap.UCLA.EDU Mon Sep 22 09:55:51 2014 From: jefft0 at remap.UCLA.EDU (Thompson, Jeff) Date: Mon, 22 Sep 2014 16:55:51 +0000 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: (On a second read, I see now that Marc made the same point...) From: , Jeff Thompson > Date: Monday, September 22, 2014 9:53 AM To: Yingdi Yu >, Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: > However, when the key is longer than the hash output, putting the key digest into the key locator directly exposes the secret to others. Oh crap, you're right! Thanks for pointing that out. I agree that we should prohibit KeyDigest in the case that the key is longer than 32 bytes. I would reword the reason slightly: "When the key is longer than 32 bytes, the HmacWithSha256 algorithm uses the SHA-256 digest of the key, which would be the same as the KeyDigest, effectively exposing the secret."
- Jeff T From: Yingdi Yu > Date: Sunday, September 21, 2014 12:00 AM To: Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Sep 20, 2014, at 11:18 AM, Adeola Bannis > wrote: On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if key size is longer than hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. I did not mean that "careless" as you describe. I mean those who do not know the details of HMAC. The problem is that when key is shorter than hash output, it is safe to put key digest in the key locator. However, when key is longer than the hash output, putting key digest into key locator directly expose the secret to other, i.e., attackers can directly use the key digest to construct any legitimate HMAC. Therefore you either explicitly specify what should be used when KeyDigest is used as the KeyLocator, or you can disable the usage of KeyDigest in KeyLocator. Yingdi -------------- next part -------------- An HTML attachment was scrubbed... URL: From Marc.Mosko at parc.com Mon Sep 22 10:00:21 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 22 Sep 2014 17:00:21 +0000 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> , Message-ID: Actually I would say the keyid is the hash of the "functional" key. If you're using the hash of a long key as the key, I'd put the hash of the hash as the keyid. 
Marc Sent from my telephone On Sep 22, 2014, at 18:56, "Thompson, Jeff" > wrote: (On a second read, I see now that Marc made the same point?) From: , Jeff Thompson > Date: Monday, September 22, 2014 9:53 AM To: Yingdi Yu >, Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: > However, when key is longer than the hash output, putting key digest into key locator directly expose the secret to other. Oh crap, you're right! Thanks for pointing that out. I agree that we should prohibit KeyDigest in the case that the key is longer than 32 bytes. I would reword the reason slightly: "When the key is longer than 32 bytes, the HmacWithSha256 algorithm uses the SHA-256 digest of the key, which would be the same as the KeyDigest, effectively exposing the secret." - Jeff T From: Yingdi Yu > Date: Sunday, September 21, 2014 12:00 AM To: Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Sep 20, 2014, at 11:18 AM, Adeola Bannis > wrote: On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if key size is longer than hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. I did not mean that "careless" as you describe. I mean those who do not know the details of HMAC. The problem is that when key is shorter than hash output, it is safe to put key digest in the key locator. 
However, when the key is longer than the hash output, putting the key digest into the key locator directly exposes the secret to others, i.e., attackers can directly use the key digest to construct any legitimate HMAC. Therefore you either explicitly specify what should be used when KeyDigest is used as the KeyLocator, or you can disable the usage of KeyDigest in KeyLocator. Yingdi _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From jefft0 at remap.UCLA.EDU Mon Sep 22 10:02:21 2014 From: jefft0 at remap.UCLA.EDU (Thompson, Jeff) Date: Mon, 22 Sep 2014 17:02:21 +0000 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: Hash output length vs. block size? Marc is right. The HmacWithSha256 algorithm hashes the key when it is longer than the block size (64 bytes), not the hash output length (32 bytes). So we would prohibit keys longer than 64 bytes (not 32 bytes). Also, most applications will use a crypto library's HMAC function which should automatically hash the key if it is longer than the block size. It could be confusing to put this in the spec since the application writer may unnecessarily hash the key when the crypto library will do it anyway, and more efficiently. - Jeff T From: , Jeff Thompson > Date: Monday, September 22, 2014 9:53 AM To: Yingdi Yu >, Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: > However, when the key is longer than the hash output, putting the key digest into the key locator directly exposes the secret to others. Oh crap, you're right! Thanks for pointing that out.
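The block-size point above, and Yingdi's leak concern, can be checked directly with a standard HMAC library. This sketch uses Python's hmac module, which follows the RFC 2104 rule of hashing keys longer than the 64-byte SHA-256 block size before use:

```python
import hashlib
import hmac

# HMAC-SHA256 hashes keys longer than the block size (64 bytes), not the
# 32-byte hash output length, before use. So for an over-long key,
# publishing SHA-256(key) -- e.g. as a KeyDigest -- hands out a value that
# produces exactly the same MACs as the original secret.
long_key = b"k" * 100                    # longer than the 64-byte block size
digest = hashlib.sha256(long_key).digest()

msg = b"some NDN packet bytes"
mac_from_key = hmac.new(long_key, msg, hashlib.sha256).digest()
mac_from_digest = hmac.new(digest, msg, hashlib.sha256).digest()
assert mac_from_key == mac_from_digest   # the digest is a working key

# A 64-byte key is used as-is, so its digest does NOT reproduce the MAC:
short_key = b"k" * 64
assert hmac.new(short_key, msg, hashlib.sha256).digest() != \
       hmac.new(hashlib.sha256(short_key).digest(), msg, hashlib.sha256).digest()
```

The first assertion is why publishing the SHA-256 KeyDigest of a key longer than the block size is equivalent to publishing the key itself, while for keys at or below the block size the digest does not itself function as the MAC key.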
I agree that we should prohibit KeyDigest in the case that the key is longer than 32 bytes. I would reword the reason slightly: "When the key is longer than 32 bytes, the HmacWithSha256 algorithm uses the SHA-256 digest of the key, which would be the same as the KeyDigest, effectively exposing the secret." - Jeff T From: Yingdi Yu > Date: Sunday, September 21, 2014 12:00 AM To: Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types On Sep 20, 2014, at 11:18 AM, Adeola Bannis > wrote: On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu > wrote: @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if key size is longer than hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. I did not mean that "careless" as you describe. I mean those who do not know the details of HMAC. The problem is that when key is shorter than hash output, it is safe to put key digest in the key locator. However, when key is longer than the hash output, putting key digest into key locator directly expose the secret to other, i.e., attackers can directly use the key digest to construct any legitimate HMAC. Therefore you either explicitly specify what should be used when KeyDigest is used as the KeyLocator, or you can disable the usage of KeyDigest in KeyLocator. Yingdi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shijunxiao at email.arizona.edu Mon Sep 22 10:02:22 2014 From: shijunxiao at email.arizona.edu (Junxiao Shi) Date: Mon, 22 Sep 2014 10:02:22 -0700 Subject: [Ndn-interest] Middlebox in NDN Era In-Reply-To: References: Message-ID: Hi Zhen To align with RFC3234 definition, in NDN: A middlebox is defined as any intermediary device performing functions other than the normal, standard functions of an NDN router on the datagram path between a consumer and producer. Standard functions of an NDN router include: - forward Interest by Name - return Data by PIT states - cache Data Middlebox functions can include: - translate between routable and non-routable Names - perform Data validation - filter packets according to firewall rules Yours, Junxiao -------------- next part -------------- An HTML attachment was scrubbed... URL: From smherwig at gmail.com Mon Sep 22 12:26:46 2014 From: smherwig at gmail.com (Stephen Herwig) Date: Mon, 22 Sep 2014 15:26:46 -0400 Subject: [Ndn-interest] new PhD student looking to get involved in NDN Message-ID: Hi, I just started a PhD program in computer science at the University of Maryland. Over the past month, I've been reading about NDN and playing with v0.3, and I find it very interesting. I'm thinking of ways to get involved, and wondering what the critical problems are. For instance, distributed trust-models seem pervasive. On the other hand, a lot of the work seems to be protocol experimentation (e.g., can I write such-and-such application in NDN). In short, I'm just sort of wondering what the difficult problems are that the community is working on. Hopefully, I'll be able to help out. Thanks, Stephen -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abannis at ucla.edu Mon Sep 22 12:34:13 2014 From: abannis at ucla.edu (Adeola Bannis) Date: Mon, 22 Sep 2014 12:34:13 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: So far, I haven't heard anything I object to. Could someone with strong feelings about KeyDigest use rewrite the spec? Are there any other areas we want to rewrite (the definition of HMAC)? Otherwise, it sounds like this is something we would implement soon. -Adeola On Mon, Sep 22, 2014 at 10:02 AM, Thompson, Jeff wrote: > Hash output length vs. block size? > > Marc is right. The HmacWithSha256 algorithm hashes the key when it is > longer than the block size (64 bytes), not the hash output length (32 > bytes). So we would prohibit keys longer than 64 bytes (not 32 bytes). > > Also, most applications will use a crypto library's HMAC function which > should automatically hash the key if it is longer than the block size. It > could be confusing to put this in the spec since the application writer may > unnecessarily has the key when the crypto library will do it anyway, and > more efficiently. > > - Jeff T > > From: , Jeff Thompson > Date: Monday, September 22, 2014 9:53 AM > To: Yingdi Yu , Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types > > On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu wrote: > > However, when key is longer than the hash output, putting key digest > into key locator directly expose the secret to other. > > Oh crap, you're right! Thanks for pointing that out. I agree that we > should prohibit KeyDigest in the case that the key is longer than 32 > bytes. 
I would reword the reason slightly: "When the key is longer than 32 > bytes, the HmacWithSha256 algorithm uses the SHA-256 digest of the key, > which would be the same as the KeyDigest, effectively exposing the secret." > > - Jeff T > > From: Yingdi Yu > Date: Sunday, September 21, 2014 12:00 AM > To: Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types > > On Sep 20, 2014, at 11:18 AM, Adeola Bannis wrote: > > > On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu wrote: > >> >> @Adeola, you probably want to forbid KeyDigest in KeyLocator for this >> HMAC signature. Because if key size is longer than hash output, the key >> digest is used instead. If we allow KeyDigest in KeyLocator, then some >> careless programmers may leak the secret. >> > > Well, a careless programmer could put a passphrase, or something used to > derive the key, into a KeyLocator KeyName as well. We can go ahead and make > the restriction, but there are other ways a programmer could shoot herself > in the foot here. > > > I did not mean that "careless" as you describe. I mean those who do not > know the details of HMAC. > > The problem is that when key is shorter than hash output, it is safe to > put key digest in the key locator. However, when key is longer than the > hash output, putting key digest into key locator directly expose the secret > to other, i.e., attackers can directly use the key digest to construct any > legitimate HMAC. Therefore you either explicitly specify what should be > used when KeyDigest is used as the KeyLocator, or you can disable the usage > of KeyDigest in KeyLocator. > > Yingdi > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From felix at rabe.io Mon Sep 22 13:46:28 2014 From: felix at rabe.io (Felix Rabe) Date: Mon, 22 Sep 2014 22:46:28 +0200 Subject: [Ndn-interest] NDNcomm 2014 videos Message-ID: <54208AA4.9060709@rabe.io> Hi list (or, REMAP) The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are they available as downloads somewhere? Also, I see some videos are barely viewable (at least [1], but [2] seems to be fine), skipping a few seconds every few seconds. Do you still have a complete version? [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 Kind regards - Felix From jburke at remap.UCLA.EDU Mon Sep 22 14:04:01 2014 From: jburke at remap.UCLA.EDU (Burke, Jeff) Date: Mon, 22 Sep 2014 21:04:01 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <250C55BE-CC5C-4147-8391-A676A60B7728@parc.com> Message-ID: Marc, If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery? Jeff On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >No matter what the expressiveness of the predicates if the forwarder can >send interests different ways you don't have a consistent underlying set >to talk about so you would always need non-range exclusions to discover >every version. > >Range exclusions only work I believe if you get an authoritative answer. > If different content pieces are scattered between different caches I >don't see how range exclusions would work to discover every version. > >I'm sorry to be pointing out problems without offering solutions but >we're not ready to publish our discovery protocols. > >Sent from my telephone > >> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >> >> I see. Can you briefly describe how ccnx discovery protocol solves the >> all problems that you mentioned (not just exclude)? a doc will be >> better. 
>> >> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >> expect [and] and [or], so boolean algebra is fully supported. Regular >> language or context free language might become part of selector too. >> >>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>> That will get you one reading then you need to exclude it and ask >>>again. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> >>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>with a particular cache, then you need to always use individual >>>>>excludes not range excludes if you want to discover all the versions >>>>>of an object. >>>> >>>> I am very confused. For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. >>>> >>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>> >>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>> >>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>>could miss content objects you want to discovery unless you avoid >>>>>>>all range exclusions and only exclude explicit versions. >>>>>> >>>>>> Could you explain why missing content object situation happens? also >>>>>> range exclusion is just a shorter notation for many explicit >>>>>>exclude; >>>>>> converting from explicit excludes to ranged exclude is always >>>>>> possible. >>>>> >>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>with a particular cache, then you need to always use individual >>>>>excludes not range excludes if you want to discover all the versions >>>>>of an object. For something like a sensor reading that is updated, >>>>>say, once per second you will have 86,400 of them per day. If each >>>>>exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of >>>>>exclusions (plus encoding overhead) per day. 
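Marc's arithmetic on explicit per-version exclusions, and Tai-Lin's range-exclude counterpoint, are easy to reproduce (the sizes follow the figures quoted in the thread; TLV encoding overhead is ignored):

```python
# One sensor reading per second, one explicit exclude of 8 bytes per version.
readings_per_day = 24 * 60 * 60            # 86,400 versions per day
explicit_exclude_bytes = readings_per_day * 8
assert explicit_exclude_bytes == 691200    # Marc's figure (plus encoding overhead)

# Tai-Lin's alternative: two range exclusions,
# (Any .. last second of yesterday)(first second of tomorrow .. Any),
# which he estimates at 18 bytes total.
range_exclude_bytes = 18
assert explicit_exclude_bytes // range_exclude_bytes == 38400  # ~38,000x smaller
```

The catch, per Marc, is that the compact range form only works when one cache holds a consistent set of versions; with versions scattered across caches, only the large explicit form discovers everything.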
>>>>> >>>>> yes, maybe using a more deterministic version number than a >>>>>timestamp makes sense here, but its just an example of needing a lot >>>>>of exclusions. >>>>> >>>>>> >>>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>>>cache B >>>>>> >>>>>> I feel this case is invalid because cache A will also get the >>>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>>>if >>>>>> this goes to cache B only, it means that cache A dies. How do you >>>>>>know >>>>>> that v101 even exist? >>>>> >>>>> I guess this depends on what the forwarding strategy is. If the >>>>>forwarder will always send each interest to all replicas, then yes, >>>>>modulo packet loss, you would discover v101 on cache A. If the >>>>>forwarder is just doing ?best path? and can round-robin between cache >>>>>A and cache B, then your application could miss v101. >>>>> >>>>>> >>>>>> >>>>>> c,d In general I agree that LPM performance is related to the number >>>>>> of components. In my own thread-safe LMP implementation, I used only >>>>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>>>> every node will be faster or not because of lock overhead. >>>>>> >>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>>(ndn >>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>> >>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>specs for doing the exact match discovery. So, as I said, I?m not >>>>>ready to claim its better yet because we have not done that. >>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>> I would point out that using LPM on content object to Interest >>>>>>>matching to do discovery has its own set of problems. Discovery >>>>>>>involves more than just ?latest version? discovery too. >>>>>>> >>>>>>> This is probably getting off-topic from the original post about >>>>>>>naming conventions. >>>>>>> >>>>>>> a. 
If Interests can be forwarded multiple directions and two >>>>>>>different caches are responding, the exclusion set you build up >>>>>>>talking with cache A will be invalid for cache B. If you talk >>>>>>>sometimes to A and sometimes to B, you very easily could miss >>>>>>>content objects you want to discover unless you avoid all range >>>>>>>exclusions and only exclude explicit versions. That will lead to >>>>>>>very large interest packets. In ccnx 1.0, we believe that an >>>>>>>explicit discovery protocol that allows conversations about >>>>>>>consistent sets is better. >>>>>>> >>>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>>>should be transitive between caches, but imagine this. You send >>>>>>>Interest #1 to cache A which returns version 100. You exclude >>>>>>>through 100 then issue a new interest. This goes to cache B who >>>>>>>only has version 99, so the interest times out or is NACK'd. So >>>>>>>you think you have it! But, cache A already has version 101, you >>>>>>>just don't know. If you cannot have a conversation around >>>>>>>consistent sets, it seems like even doing latest version discovery >>>>>>>is difficult with selector based discovery. From what I saw in >>>>>>>ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>>authoritative source because you can never believe an intermediate >>>>>>>cache that there's not something more recent. >>>>>>> >>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>>>>>interested in seeing your analysis. Case (a) is that a node can >>>>>>>correctly discover every version of a name prefix, and (b) is that >>>>>>>a node can correctly discover the latest version. We have not >>>>>>>formally compared (or yet published) our discovery protocols (we >>>>>>>have three, 2 for content, 1 for device) compared to selector based >>>>>>>discovery, so I cannot yet claim they are better, but they do not >>>>>>>have the non-determinism sketched above. >>>>>>> >>>>>>> c. 
Using LPM, there is a non-deterministic number of lookups you >>>>>>>must do in the PIT to match a content object. If you have a name >>>>>>>tree or a threaded hash table, those don't all need to be hash >>>>>>>lookups, but you need to walk up the name tree for every prefix of >>>>>>>the content object name and evaluate the selector predicate. >>>>>>>Content Based Networking (CBN) had some methods to create data >>>>>>>structures based on predicates, maybe those would be better. But >>>>>>>in any case, you will potentially need to retrieve many PIT entries >>>>>>>if there is Interest traffic for many prefixes of a root. Even on >>>>>>>an Intel system, you'll likely miss cache lines, so you'll have a >>>>>>>lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>implementation only requires at most 3 lookups (one by name, one by >>>>>>>name + keyid, one by name + content object hash), and one can do >>>>>>>other things to optimize lookup for an extra write. >>>>>>> >>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>>walking parent pointers, I suspect you'll need locking of the >>>>>>>ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>>>>and that will be expensive. It would be interesting to see what a >>>>>>>cache consistent multi-threaded name tree looks like. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> >>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>>>wrote: >>>>>>>> >>>>>>>> I had thought about these questions, but I want to know your idea >>>>>>>> besides typed component: >>>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>>>things? >>>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>>>other >>>>>>>> faster technique to replace selector? >>>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>>> byte, but 2 bytes for length might not be enough for future. 
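Marc's point (c), that selector matching must probe the PIT for every prefix of an arriving content object's name while CCNx 1.0 exact match is bounded at 3 probes, can be sketched with a toy model (this is an illustration of the lookup count only, not either implementation):

```python
def prefixes(name):
    """All prefixes of a hierarchical name, longest first."""
    return [tuple(name[:i]) for i in range(len(name), 0, -1)]

# Toy PIT keyed by exact name prefix; selector matching must probe every
# prefix of the arriving content object's name and evaluate predicates.
pit = {("ndn", "ucla", "video"): "interest-with-selectors"}
content_name = ["ndn", "ucla", "video", "v101", "s0"]

probes = 0
matched = None
for p in prefixes(content_name):
    probes += 1
    if p in pit:
        matched = pit[p]
        break  # a real implementation would keep walking: other PIT entries
               # at shorter prefixes may also match this content object

assert probes == 3          # grows with the name length and PIT contents
# By contrast, CCNx 1.0 exact match needs at most 3 probes regardless of
# name length: by name, by name+keyid, by name+content-object-hash.
```

The `break` is where the toy model is kindest to LPM; the worst case Marc describes is a probe (and predicate evaluation) at every prefix with pending Interests.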
>>>>>>>> >>>>>>>> >>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>>>>wrote: >>>>>>>>> >>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>envision we need to do, and with a few simple conventions on >>>>>>>>>>>how the registry of types is managed. >>>>>>>>>> >>>>>>>>>> Could you share it with us? >>>>>>>>> Sure. Here's a strawman. >>>>>>>>> >>>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>>> >>>>>>>>> The type space is currently shared with the types used for the >>>>>>>>>entire protocol, that gives us two options: >>>>>>>>> (1) we reserve a range for name component types. Given the >>>>>>>>>likelihood there will be at least as much and probably more need >>>>>>>>>for component types than protocol extensions, we could reserve 1/2 >>>>>>>>>of the type space, giving us 32K types for name components. >>>>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>>>>and other fields of the protocol (since they are sub-types of the >>>>>>>>>name type) we could reuse numbers and thereby have an entire 65K >>>>>>>>>name component types. >>>>>>>>> >>>>>>>>> We divide the type space into regions, and manage it with a >>>>>>>>>registry. If we ever get to the point of creating an IETF >>>>>>>>>standard, IANA has 25 years of experience running registries and >>>>>>>>>there are well-understood rule sets for different kinds of >>>>>>>>>registries (open, requires a written spec, requires standards >>>>>>>>>approval). >>>>>>>>> >>>>>>>>> - We allocate one "default" name component type for "generic >>>>>>>>>name", which would be used on name prefixes and other common >>>>>>>>>cases where there are no special semantics on the name component. >>>>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>>>>globally understood types that are part of the base or extension >>>>>>>>>NDN specifications (e.g. 
chunk#, version#, etc. >>>>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>>>>(say another 1024 types) >>>>>>>>> - We give the rest of the space to application assignment. >>>>>>>>> >>>>>>>>> Make sense? >>>>>>>>> >>>>>>>>> >>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>which Moore's law or hardware tricks will not save us from >>>>>>>>>>>performance flaws in the design >>>>>>>>>> >>>>>>>>>> we could design for performance, >>>>>>>>> That's not what people are advocating. We are advocating that we >>>>>>>>>*not* design for known bad performance and hope serendipity or >>>>>>>>>Moore's Law will come to the rescue. >>>>>>>>> >>>>>>>>>> but I think there will be a turning >>>>>>>>>> point when the slower design starts to become "fast enough". >>>>>>>>> Perhaps, perhaps not. Relative performance is what matters so >>>>>>>>>things that don't get faster while others do tend to get dropped >>>>>>>>>or not used because they impose a performance penalty relative to >>>>>>>>>the things that go faster. There is also the "low-end" phenomenon >>>>>>>>>where improvements in technology get applied to lowering cost >>>>>>>>>rather than improving performance. For those environments bad >>>>>>>>>performance just never gets better. >>>>>>>>> >>>>>>>>>> Do you >>>>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>>>> performance improvement? >>>>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>>>>functions). >>>>>>>>> I suspect exclusions will always be slow because they will >>>>>>>>>require extra memory references. >>>>>>>>> >>>>>>>>> However I of course don't claim clairvoyance so this is just >>>>>>>>>speculation based on 35+ years of seeing performance improve by 4 >>>>>>>>>orders of magnitude and still having to worry about counting >>>>>>>>>cycles and memory references... 
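Dave's strawman partition of the 16-bit type space can be written down concretely. A sketch (the exact boundary values are illustrative; he only fixes the two 1024-entry range sizes and the single generic type):

```python
TYPE_SPACE = 2 ** 16                       # 65,536 possible name component types

# Strawman regions, under option (2): the full 16-bit space is reused
# for name component types since they never parse ambiguously.
GENERIC_NAME = 0                           # one "default" generic component type
BASE_SPEC_TYPES = range(1, 1 + 1024)       # globally understood: chunk#, version#, etc.
RESERVED_TYPES = range(1025, 1025 + 1024)  # held back for unanticipated uses
APP_TYPES = range(2049, TYPE_SPACE)        # the rest: application assignment

assert TYPE_SPACE == 65536                 # 16 bits, not 65,565
assert len(BASE_SPEC_TYPES) == len(RESERVED_TYPES) == 1024
assert len(APP_TYPES) == TYPE_SPACE - 2049 # 63,487 types left for applications
```

Under option (1), the same layout would apply within a reserved 32K half of the shared protocol type space.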
>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>>>>>>perform >>>>>>>>>>>> well on it. It should be the other way around: once ndn app >>>>>>>>>>>>becomes >>>>>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>which Moore's law or hardware tricks will not save us from >>>>>>>>>>>performance flaws in the design: >>>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>>> c) data structures that require locks to manipulate >>>>>>>>>>>successfully will be relatively more expensive, even with >>>>>>>>>>>near-zero lock contention. >>>>>>>>>>> >>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>>>>>its design. We just forgot those because the design elements >>>>>>>>>>>that depended on those mistakes have fallen into disuse. The >>>>>>>>>>>poster children for this are: >>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>>>>>>on modern forwarding hardware, so they can't be reliably used >>>>>>>>>>>anywhere >>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>>>>>>specified and is now a giant PITA that still causes major pain >>>>>>>>>>>in working around. >>>>>>>>>>> >>>>>>>>>>> I'm afraid students today are being taught that the designers >>>>>>>>>>>of IP were flawless, as opposed to very good scientists and >>>>>>>>>>>engineers that got most of it right. >>>>>>>>>>> >>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>>>>>>>Now I >>>>>>>>>>>> see that there are 3 approaches: >>>>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>>>> 2. 
typed component: use tlv type space and add a handful of >>>>>>>>>>>>types >>>>>>>>>>>> 3. marked component: introduce only one more type and add >>>>>>>>>>>>additional >>>>>>>>>>>> marker space >>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>envision we need to do, and with a few simple conventions on >>>>>>>>>>>how the registry of types is managed. >>>>>>>>>>> >>>>>>>>>>> It is just as powerful in practice as either throwing up our >>>>>>>>>>>hands and letting applications design their own mutually >>>>>>>>>>>incompatible schemes or trying to make naming conventions with >>>>>>>>>>>markers in a way that is fast to generate/parse and also >>>>>>>>>>>resilient against aliasing. >>>>>>>>>>> >>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>>>>>>>convention >>>>>>>>>>>> needs to be revised. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>>>>>>>>to fit in (the >>>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>>>>>>>>current NDN >>>>>>>>>>>>> experiments? >>>>>>>>>>>>> >>>>>>>>>>>>> I guess wide deployment could make for even longer names. >>>>>>>>>>>>>Related: Many URLs >>>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>>>>>>>>text lines, and >>>>>>>>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>>>>>>>>I see. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> In fact, the index in separate TLV will be slower on some >>>>>>>>>>>>>architectures, >>>>>>>>>>>>> like the ezChip NP4. 
The NP4 can hold the first 96 frame >>>>>>>>>>>>>bytes in memory, >>>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>>>>>>>>32-byte blocks >>>>>>>>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>>>>>>>>If you need to >>>>>>>>>>>>> switch between arrays, it would be very expensive. If you >>>>>>>>>>>>>have to read past >>>>>>>>>>>>> the name to get to the 2nd array, then read it, then backup >>>>>>>>>>>>>to get to the >>>>>>>>>>>>> name, it will be pretty expensive too. >>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Does this make that much difference? >>>>>>>>>>>>> >>>>>>>>>>>>> If you want to parse the first 5 components, one way to do >>>>>>>>>>>>>it is: >>>>>>>>>>>>> >>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>>>>>>>>from the start >>>>>>>>>>>>> offset of the beginning of the name. >>>>>>>>>>>>> OR >>>>>>>>>>>>> Start reading name, (find size + move ) 5 times. >>>>>>>>>>>>> >>>>>>>>>>>>> How much speed are you getting from one to the other? You >>>>>>>>>>>>>seem to imply >>>>>>>>>>>>> that the first one is faster. I don't think this is the >>>>>>>>>>>>>case. >>>>>>>>>>>>> >>>>>>>>>>>>> In the first one you'll probably have to get the cache line >>>>>>>>>>>>>for the index, >>>>>>>>>>>>> then all the required cache lines for the first 5 >>>>>>>>>>>>>components. For the >>>>>>>>>>>>> second, you'll have to get all the cache lines for the first >>>>>>>>>>>>>5 components. >>>>>>>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>>>>>>>>than >>>>>>>>>>>>> evaluating a number and computing an addition, you might >>>>>>>>>>>>>find that the >>>>>>>>>>>>> performance of the index is actually slower than the >>>>>>>>>>>>>performance of the >>>>>>>>>>>>> direct access. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Granted, there is a case where you don't access the name at >>>>>>>>>>>>>all, for >>>>>>>>>>>>> example, if you just get the offsets and then send the >>>>>>>>>>>>>offsets as >>>>>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>>>>>>>>you may see a >>>>>>>>>>>>> gain IF there are more cache line misses in reading the name >>>>>>>>>>>>>than in >>>>>>>>>>>>> reading the index. So, if the regular part of the name >>>>>>>>>>>>>that you're >>>>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>>>>>>>>name is to be >>>>>>>>>>>>> processed by a different processor, then you might see some >>>>>>>>>>>>>performance >>>>>>>>>>>>> gain in using the index, but in all other circumstances I >>>>>>>>>>>>>bet this is not >>>>>>>>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>>>>>>>> >>>>>>>>>>>>> This is all to say, I don't think we should be designing the >>>>>>>>>>>>>protocol with >>>>>>>>>>>>> only one architecture in mind. (The architecture of sending >>>>>>>>>>>>>the name to a >>>>>>>>>>>>> different processor than the index). >>>>>>>>>>>>> >>>>>>>>>>>>> If you have numbers that show that the index is faster I >>>>>>>>>>>>>would like to see >>>>>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>>>>> >>>>>>>>>>>>> Nacho >>>>>>>>>>>>> >>>>>>>>>>>>> (I may have misinterpreted your description so feel free to >>>>>>>>>>>>>correct me if >>>>>>>>>>>>> I'm wrong.) 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>>>>> Protocol Architect >>>>>>>>>>>>> Principal Scientist >>>>>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>>>>> +1(650)812-4458 >>>>>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Indeed each components' offset must be encoded using a fixed >>>>>>>>>>>>>amount of >>>>>>>>>>>>> bytes: >>>>>>>>>>>>> >>>>>>>>>>>>> i.e., >>>>>>>>>>>>> Type = Offsets >>>>>>>>>>>>> Length = 10 Bytes >>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>>>>> >>>>>>>>>>>>> You may also imagine to have a "Offset_2byte" type if your >>>>>>>>>>>>>name is too >>>>>>>>>>>>> long. >>>>>>>>>>>>> >>>>>>>>>>>>> Max >>>>>>>>>>>>> >>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> if you do not need the entire hierarchal structure (suppose >>>>>>>>>>>>>you only >>>>>>>>>>>>> want the first x components) you can directly have it using >>>>>>>>>>>>>the >>>>>>>>>>>>> offsets. With the Nested TLV structure you have to >>>>>>>>>>>>>iteratively parse >>>>>>>>>>>>> the first x-1 components. With the offset structure you cane >>>>>>>>>>>>>directly >>>>>>>>>>>>> access to the firs x components. >>>>>>>>>>>>> >>>>>>>>>>>>> I don't get it. What you described only works if the >>>>>>>>>>>>>"offset" is >>>>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>>>>>>>>parse x-1 >>>>>>>>>>>>> offsets to get to the x offset. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>>>>>>>>like the >>>>>>>>>>>>> existing NDN UTF8 'convention'." 
I'm still not sure I >>>>>>>>>>>>>understand what >>>>>>>>>>>>> you >>>>>>>>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>>>>>>>>entirely >>>>>>>>>>>>> different >>>>>>>>>>>>> scheme where the info that describes the name-components is >>>>>>>>>>>>>... >>>>>>>>>>>>> someplace >>>>>>>>>>>>> other than _in_ the name-components. is that correct? when >>>>>>>>>>>>>you say >>>>>>>>>>>>> "field >>>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>>>>>>>>TLV)? >>>>>>>>>>>>> >>>>>>>>>>>>> Correct. >>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>>>>>>>>name >>>>>>>>>>>>> hierarchy >>>>>>>>>>>>> with offsets in the name and other TLV(s) indicates the >>>>>>>>>>>>>offset to use >>>>>>>>>>>>> in >>>>>>>>>>>>> order to retrieve special components. >>>>>>>>>>>>> As for the field separator, it is something like "/". >>>>>>>>>>>>>Aliasing is >>>>>>>>>>>>> avoided as >>>>>>>>>>>>> you do not rely on field separators to parse the name; you >>>>>>>>>>>>>use the >>>>>>>>>>>>> "offset >>>>>>>>>>>>> TLV " to do that. >>>>>>>>>>>>> >>>>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>>>> >>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>>>>>>>you only >>>>>>>>>>>>> want >>>>>>>>>>>>> the first x components) you can directly have it using the >>>>>>>>>>>>>offsets. >>>>>>>>>>>>> With the >>>>>>>>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>>>>>>>>x-1 >>>>>>>>>>>>> components. >>>>>>>>>>>>> With the offset structure you can directly access the >>>>>>>>>>>>>first x >>>>>>>>>>>>> components. >>>>>>>>>>>>> >>>>>>>>>>>>> Max >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- Mark >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> The why is simple: >>>>>>>>>>>>> >>>>>>>>>>>>> You use a lot of "generic component type" and very few >>>>>>>>>>>>>"specific >>>>>>>>>>>>> component type". 
You are imposing types for every component >>>>>>>>>>>>>in order >>>>>>>>>>>>> to >>>>>>>>>>>>> handle few exceptions (segmentation, etc..). You create a >>>>>>>>>>>>>rule >>>>>>>>>>>>> (specify >>>>>>>>>>>>> the component's type ) to handle exceptions! >>>>>>>>>>>>> >>>>>>>>>>>>> I would prefer not to have typed components. Instead I would >>>>>>>>>>>>>prefer >>>>>>>>>>>>> to >>>>>>>>>>>>> have the name as simple sequence bytes with a field >>>>>>>>>>>>>separator. Then, >>>>>>>>>>>>> outside the name, if you have some components that could be >>>>>>>>>>>>>used at >>>>>>>>>>>>> network layer (e.g. a TLV field), you simply need something >>>>>>>>>>>>>that >>>>>>>>>>>>> indicates which is the offset allowing you to retrieve the >>>>>>>>>>>>>version, >>>>>>>>>>>>> segment, etc in the name... >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Max >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>>>>> However, if you have a small number of types, you will end >>>>>>>>>>>>>up with >>>>>>>>>>>>> names >>>>>>>>>>>>> containing many generic components types and few specific >>>>>>>>>>>>> components >>>>>>>>>>>>> types. Due to the fact that the component type specification >>>>>>>>>>>>>is an >>>>>>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>>>>>> component's >>>>>>>>>>>>> type only when needed (something like UTF8 conventions but >>>>>>>>>>>>>that >>>>>>>>>>>>> applications MUST use). >>>>>>>>>>>>> >>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>>>>> explanation >>>>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>>>>>>>>e.g.) 
>>>>>>>>>>>>> and >>>>>>>>>>>>> there's been email trying to explain that applications don't >>>>>>>>>>>>>have to >>>>>>>>>>>>> use types if they don't need to. your email sounds like "I >>>>>>>>>>>>>prefer >>>>>>>>>>>>> the >>>>>>>>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>>>>>>>>preference in >>>>>>>>>>>>> the face of the points about the problems. can you say why >>>>>>>>>>>>>it is >>>>>>>>>>>>> that >>>>>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Mark >>>>>>>>>>>>> >>>>>>>>>>>>> . >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu 
>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jburke at remap.ucla.edu Mon Sep 22 15:11:58 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Mon, 22 Sep 2014 22:11:58 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: Message-ID: Hi Nacho, > >Discovery is needed, but what we need is not "forwarder-level" discovery. Can you share with the list some use cases that can be employed to evaluate this statement? i.e., In what you have looked at, are there *no* applications that can benefit, or only some without sufficient impact? >Finally, it's unlikely that medium and big routers will implement LPM and >selector matching. Cisco doesn't think they work, Alcatel-Lucent doesn't >think they work, PARC doesn't think they work; Huawei, what do you think? > Ericsson? Juniper? Anybody? > There seem to be at least three threads of equally valid work here: 1) what can applications benefit from; 2) how effective are the proposed architectural abstractions in providing those benefits; 3) how efficiently (and where in the architecture) can one implement what works well for a broad range of applications? This thread seems to be barreling around the third question without (at least from what I can tell) a lot about the first two. Marc mentioned that it's not possible for PARC to share the discovery alternative that you are thinking of, so I'm not sure how to compare solutions. But perhaps we can look at use cases. The types of applications that we have been considering most recently were outlined at NDNComm. 
In the long run, we plan to evaluate those use cases from the application development perspective by adapting usability frameworks like Green & Petre's cognitive dimensions framework. [1] Certainly performance--whether measured by round trips, application CPU cycles, router requirements, etc. is also important and will be evaluated, but other aspects of application development play a significant role here. In addition to the use cases, are you able to talk about how you are comparing the impact on application design and deployment? >IF your argument is that selectors are useful because they are used for >discovery, then why not use real selectors that implement full regular >expressions? Why not a full query language? > >In terms of performance, what is the current limit on selectors? How many >excludes can I have? (And what effect does that have on router >performance?) Are these enough? How many roundtrips do I need to take to >discover the data that I want? I?ve always found this a funny argument >because you?re basically saying: I?m not sure what I want (hence the >discovery protocol), but I?ll know what I want when I see it. To me, part of the innovation of ICN architectures like NDN and CCN is that they are data dissemination focused and as a result different consumption patterns can be applied to the same data. e.g., One application's real-time video stream is another's archival playout is another's multi-segment file to transfer. So if a small number of Interest selectors allow a simplification of interoperability between applications by enabling them to use the same patterns with a wide variety of data from both live and historical producers, this is a really significant benefit... especially in a research context that (for us) is encouraged by NSF to be application driven. One can always create a discovery protocol that directly talks to end nodes in NDN, but this doesn't necessarily negate the usefulness of selectors. 
If they end up broadly useful, perhaps we need to push on where and how they can be implemented a little more. Also, I wonder if the dichotomy between "exact match" and "wander the namespace" could be misleading. What about other application-level primitives like "best effort for fresh content", which seems natural for some web browsing, or search applications that use sortable feature vectors as names and LPM+selectors to implement nearest neighbor matches? These are some nice possibilities for NDN. Sure, they can all be done with explicit protocols, but perhaps there are more tradeoffs such as publisher load or more roundtrips in those cases. Seems like there is still room for comparison and discussion by looking at where selectors work well in addition to where they appear to have problems. > >Well, maybe if you had a real protocol that could specify what you wanted >you would have gotten it in one round trip. So this is general protocol >performance. Not knowing exactly what you mean by a "real protocol" it seems to involve either some reliance on direct interaction with the producer or with caches that are deemed authoritative (which in turn brings up the latest-matching problem in and of itself). Could you clarify what you mean by "real" and whether it involves either of these? If the former, you're trading publisher load for round trips, which again can be done in NDN anyway (e.g., protocols that pose discovery questions to end nodes). But I am a little confused about the emphasis on round-trips with no context here. If NDN is proposed to replace Layer 2, is the optimization of a single packet round-trip for an as-yet-undefined use case the gold standard? Can we identify the number of IP round trips in a specific application or two as a point of comparison? Thanks, Jeff [1] Green, Thomas R. G., and Marian Petre. "Usability analysis of visual programming environments: a 'cognitive dimensions' framework." 
Journal of Visual Languages & Computing 7.2 (1996): 131-174. From Marc.Mosko at parc.com Mon Sep 22 15:29:43 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Mon, 22 Sep 2014 22:29:43 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <93D4F3C6-A679-4AA4-88FE-CD809577A0F1@parc.com> Jeff, Take a look at my posting (that Felix fixed) in a new thread on Discovery. http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage. Marc On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: > Marc, > > If you can't talk about your protocols, perhaps we can discuss this based > on use cases. What are the use cases you are using to evaluate > discovery? > > Jeff > > > > On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: > >> No matter what the expressiveness of the predicates if the forwarder can >> send interests different ways you don't have a consistent underlying set >> to talk about so you would always need non-range exclusions to discover >> every version. >> >> Range exclusions only work I believe if you get an authoritative answer. >> If different content pieces are scattered between different caches I >> don't see how range exclusions would work to discover every version. >> >> I'm sorry to be pointing out problems without offering solutions but >> we're not ready to publish our discovery protocols. >> >> Sent from my telephone >> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how ccnx discovery protocol solves all >>> the problems that you mentioned (not just exclude)? a doc will be >>> better. >>> >>> My unserious conjecture( :) ) : exclude is equal to [not]. 
I will soon >>> expect [and] and [or], so boolean algebra is fully supported. Regular >>> language or context free language might become part of selector too. >>> >>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>> That will get you one reading then you need to exclude it and ask >>>> again. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. >>>>> >>>>> I am very confused. For your example, if I want to get all today's >>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>> tomorrow..Any). That's 18 bytes. >>>>> >>>>> >>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>> >>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>> >>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>>> >>>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>>> could miss content objects you want to discover unless you avoid >>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>> >>>>>>> Could you explain why the missing content object situation happens? also >>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>> exclude; >>>>>>> converting from explicit excludes to ranged exclude is always >>>>>>> possible. >>>>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. For something like a sensor reading that is updated, >>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>> exclusions (plus encoding overhead) per day. 
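As a sanity check on the arithmetic above, here is the exclusion-size estimate in a few lines (a sketch only; the once-per-second update rate and the 8-byte timestamp size are the assumptions stated in the message, not fixed protocol parameters):

```python
# Size of excluding every version individually, per the example above:
# one sensor reading per second, each excluded by an 8-byte timestamp.
readings_per_day = 24 * 60 * 60          # 86,400 readings
timestamp_bytes = 8

exclusion_bytes = readings_per_day * timestamp_bytes
print(exclusion_bytes)                   # 691200 bytes/day, before TLV overhead

# By contrast, the two range exclusions quoted earlier in the thread,
# (Any..last second of yesterday)(first second of tomorrow..Any),
# cover the same day in roughly 18 bytes.
```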
>>>>>> >>>>>> yes, maybe using a more deterministic version number than a >>>>>> timestamp makes sense here, but it's just an example of needing a lot >>>>>> of exclusions. >>>>>> >>>>>>> >>>>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>>>> cache B >>>>>>> >>>>>>> I feel this case is invalid because cache A will also get the >>>>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>>>> if >>>>>>> this goes to cache B only, it means that cache A dies. How do you >>>>>>> know >>>>>>> that v101 even exists? >>>>>> >>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>> forwarder is just doing "best path" and can round-robin between cache >>>>>> A and cache B, then your application could miss v101. >>>>>> >>>>>>> >>>>>>> >>>>>>> c,d In general I agree that LPM performance is related to the number >>>>>>> of components. In my own thread-safe LPM implementation, I used only >>>>>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>>>>> every node will be faster or not because of lock overhead. >>>>>>> >>>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>>> (ndn >>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>> >>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>>> ready to claim it's better yet because we have not done that. >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>>> I would point out that using LPM on content object to Interest >>>>>>>> matching to do discovery has its own set of problems. Discovery >>>>>>>> involves more than just "latest version" discovery too. 
>>>>>>>> >>>>>>>> This is probably getting off-topic from the original post about >>>>>>>> naming conventions. >>>>>>>> >>>>>>>> a. If Interests can be forwarded multiple directions and two >>>>>>>> different caches are responding, the exclusion set you build up >>>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>>> content objects you want to discover unless you avoid all range >>>>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>>> explicit discovery protocol that allows conversations about >>>>>>>> consistent sets is better. >>>>>>>> >>>>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>>>> should be transitive between caches, but imagine this. You send >>>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>>> through 100 then issue a new interest. This goes to cache B who >>>>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>>>> you think you have it! But, cache A already has version 101, you >>>>>>>> just don't know. If you cannot have a conversation around >>>>>>>> consistent sets, it seems like even doing latest version discovery >>>>>>>> is difficult with selector based discovery. From what I saw in >>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>>> authoritative source because you can never believe an intermediate >>>>>>>> cache that there's not something more recent. >>>>>>>> >>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>>>>>> interested in seeing your analysis. Case (a) is that a node can >>>>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>>>> a node can correctly discover the latest version. 
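Marc's scenario (b) can be made concrete with a toy model (illustrative only; the "caches" here are plain sets of version numbers and the alternating forwarder stands in for a "best path" strategy, not either project's actual protocol):

```python
# Toy model of the cache A / cache B race described above: latest-version
# discovery via exclusions, when successive Interests reach different caches.

def answer(cache, exclude_through):
    """Newest version above the excluded range, or None (timeout/NACK)."""
    candidates = {v for v in cache if v > exclude_through}
    return max(candidates) if candidates else None

cache_a = {100}
cache_b = {99}

v1 = answer(cache_a, exclude_through=-1)   # Interest #1 reaches cache A
assert v1 == 100                           # consumer now excludes through 100

cache_a.add(101)                           # meanwhile v101 arrives at cache A

v2 = answer(cache_b, exclude_through=100)  # Interest #2 reaches cache B
assert v2 is None                          # timeout: consumer concludes 100
                                           # is the latest, but A holds 101
```

The point of the sketch: from the consumer's side, "no answer" cannot distinguish "nothing newer exists" from "this particular cache has not seen it", which is exactly the consistent-set problem being argued.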
We have not >>>>>>>> formally compared (or yet published) our discovery protocols (we >>>>>>>> have three, 2 for content, 1 for device) with selector based >>>>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>>>> have the non-determinism sketched above. >>>>>>>> >>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>>>> must do in the PIT to match a content object. If you have a name >>>>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>>>> the content object name and evaluate the selector predicate. >>>>>>>> Content Based Networking (CBN) had some methods to create data >>>>>>>> structures based on predicates, maybe those would be better. But >>>>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>> implementation only requires at most 3 lookups (one by name, one by >>>>>>>> name + keyid, one by name + content object hash), and one can do >>>>>>>> other things to optimize lookup for an extra write. >>>>>>>> >>>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>>> walking parent pointers, I suspect you'll need locking of the >>>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>>>>> and that will be expensive. It would be interesting to see what a >>>>>>>> cache consistent multi-threaded name tree looks like. >>>>>>>> >>>>>>>> Marc >>>>>>>> >>>>>>>> >>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I had thought about these questions, but I want to know your idea >>>>>>>>> besides typed component: >>>>>>>>> 1. LPM allows "data discovery". 
How will exact match do similar >>>>>>>>> things? >>>>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>>>> other >>>>>>>>> faster technique to replace selector? >>>>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>>>> byte, but 2 bytes for length might not be enough for future. >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>> >>>>>>>>>>> Could you share it with us? >>>>>>>>>> Sure. Here's a strawman. >>>>>>>>>> >>>>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>>>> >>>>>>>>>> The type space is currently shared with the types used for the >>>>>>>>>> entire protocol, that gives us two options: >>>>>>>>>> (1) we reserve a range for name component types. Given the >>>>>>>>>> likelihood there will be at least as much and probably more need >>>>>>>>>> for component types than protocol extensions, we could reserve 1/2 >>>>>>>>>> of the type space, giving us 32K types for name components. >>>>>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>>>>> and other fields of the protocol (since they are sub-types of the >>>>>>>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>>>>>>> name component types. >>>>>>>>>> >>>>>>>>>> We divide the type space into regions, and manage it with a >>>>>>>>>> registry. If we ever get to the point of creating an IETF >>>>>>>>>> standard, IANA has 25 years of experience running registries and >>>>>>>>>> there are well-understood rule sets for different kinds of >>>>>>>>>> registries (open, requires a written spec, requires standards >>>>>>>>>> approval). 
>>>>>>>>>> >>>>>>>>>> - We allocate one "default" name component type for "generic >>>>>>>>>> name", which would be used on name prefixes and other common >>>>>>>>>> cases where there are no special semantics on the name component. >>>>>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>>>>> globally understood types that are part of the base or extension >>>>>>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>>>>> (say another 1024 types) >>>>>>>>>> - We give the rest of the space to application assignment. >>>>>>>>>> >>>>>>>>>> Make sense? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>>>>>> performance flaws in the design >>>>>>>>>>> >>>>>>>>>>> we could design for performance, >>>>>>>>>> That's not what people are advocating. We are advocating that we >>>>>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>>>>> Moore's Law will come to the rescue. >>>>>>>>>> >>>>>>>>>>> but I think there will be a turning >>>>>>>>>>> point when the slower design starts to become "fast enough". >>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters so >>>>>>>>>> things that don't get faster while others do tend to get dropped >>>>>>>>>> or not used because they impose a performance penalty relative to >>>>>>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>>>>>> where improvements in technology get applied to lowering cost >>>>>>>>>> rather than improving performance. For those environments bad >>>>>>>>>> performance just never gets better. >>>>>>>>>> >>>>>>>>>>> Do you >>>>>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>>>>> performance improvement? >>>>>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>>>>> functions). 
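Dave's registry strawman above can be sketched as a partition of the 16-bit space (the concrete boundary values below are hypothetical illustrations chosen for round sizes, not proposed assignments):

```python
# Hypothetical carving of a 16-bit name-component type space, following
# the strawman above; real boundaries would come from a managed registry.
TOTAL_TYPES = 1 << 16                     # 65,536 possible type codes

GENERIC_NAME = 0x0000                     # the single "default" generic type
WELL_KNOWN = range(0x0001, 0x0401)        # 1024 globally understood types
                                          # (chunk#, version#, ...)
RESERVED = range(0x0401, 0x0801)          # 1024 held for unanticipated uses
APPLICATION = range(0x0801, TOTAL_TYPES)  # the rest: application-assigned

assert len(WELL_KNOWN) == len(RESERVED) == 1024
```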
>>>>>>>>>> I suspect exclusions will always be slow because they will >>>>>>>>>> require extra memory references. >>>>>>>>>> >>>>>>>>>> However I of course don't claim clairvoyance so this is just >>>>>>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>>>>>> orders of magnitude and still having to worry about counting >>>>>>>>>> cycles and memory references... >>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>>>>>>> perform >>>>>>>>>>>>> well on it. It should be the other way around: once ndn app >>>>>>>>>>>>> becomes >>>>>>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>>>>>> performance flaws in the design: >>>>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>>>> c) data structures that require locks to manipulate >>>>>>>>>>>> successfully will be relatively more expensive, even with >>>>>>>>>>>> near-zero lock contention. >>>>>>>>>>>> >>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>>>>>> its design. We just forgot those because the design elements >>>>>>>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>>>>>>> poster children for this are: >>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>>>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>>>>>>>> anywhere >>>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>>>>>>> in working around. 
>>>>>>>>>>>> >>>>>>>>>>>> I'm afraid students today are being taught that the designers >>>>>>>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>>>>>>> engineers that got most of it right. >>>>>>>>>>>> >>>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>>>>>>>> Now I >>>>>>>>>>>>> see that there are 3 approaches: >>>>>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>>>>>>>> types >>>>>>>>>>>>> 3. marked component: introduce only one more type and add >>>>>>>>>>>>> additional >>>>>>>>>>>>> marker space >>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>>> >>>>>>>>>>>> It is just as powerful in practice as either throwing up our >>>>>>>>>>>> hands and letting applications design their own mutually >>>>>>>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>>>>>>> markers in a way that is fast to generate/parse and also >>>>>>>>>>>> resilient against aliasing. >>>>>>>>>>>> >>>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>>>>>>>> convention >>>>>>>>>>>>> needs to be revised. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>>>>>>>>> to fit in (the >>>>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>>>>>>>>> current NDN >>>>>>>>>>>>>> experiments? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess wide deployment could make for even longer names. 
>>>>>>>>>>>>>> Related: Many URLs >>>>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>>>>>>>>> text lines, and >>>>>>>>>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>>>>>>>>> I see. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> In fact, the index in a separate TLV will be slower on some >>>>>>>>>>>>>> architectures, >>>>>>>>>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>>>>>>>>>>> bytes in memory, >>>>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>>>>>>>>> 32-byte blocks >>>>>>>>>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>>>>>>>>> If you need to >>>>>>>>>>>>>> switch between arrays, it would be very expensive. If you >>>>>>>>>>>>>> have to read past >>>>>>>>>>>>>> the name to get to the 2nd array, then read it, then backup >>>>>>>>>>>>>> to get to the >>>>>>>>>>>>>> name, it will be pretty expensive too. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Marc >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does this make that much difference? >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you want to parse the first 5 components, one way to do >>>>>>>>>>>>>> it is: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>>>>>>>>> from the start >>>>>>>>>>>>>> offset of the beginning of the name. >>>>>>>>>>>>>> OR >>>>>>>>>>>>>> Start reading name, (find size + move ) 5 times. >>>>>>>>>>>>>> >>>>>>>>>>>>>> How much speed are you getting from one to the other? You >>>>>>>>>>>>>> seem to imply >>>>>>>>>>>>>> that the first one is faster. I don't think this is the >>>>>>>>>>>>>> case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In the first one you'll probably have to get the cache line >>>>>>>>>>>>>> for the index, >>>>>>>>>>>>>> then all the required cache lines for the first 5 >>>>>>>>>>>>>> components. 
For the >>>>>>>>>>>>>> second, you'll have to get all the cache lines for the first >>>>>>>>>>>>>> 5 components. >>>>>>>>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>>>>>>>>> than >>>>>>>>>>>>>> evaluating a number and computing an addition, you might >>>>>>>>>>>>>> find that the >>>>>>>>>>>>>> performance of the index is actually slower than the >>>>>>>>>>>>>> performance of the >>>>>>>>>>>>>> direct access. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Granted, there is a case where you don't access the name at >>>>>>>>>>>>>> all, for >>>>>>>>>>>>>> example, if you just get the offsets and then send the >>>>>>>>>>>>>> offsets as >>>>>>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>>>>>>>>> you may see a >>>>>>>>>>>>>> gain IF there are more cache line misses in reading the name >>>>>>>>>>>>>> than in >>>>>>>>>>>>>> reading the index. So, if the regular part of the name >>>>>>>>>>>>>> that you're >>>>>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>>>>>>>>> name is to be >>>>>>>>>>>>>> processed by a different processor, then you might see some >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> gain in using the index, but in all other circumstances I >>>>>>>>>>>>>> bet this is not >>>>>>>>>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is all to say, I don't think we should be designing the >>>>>>>>>>>>>> protocol with >>>>>>>>>>>>>> only one architecture in mind. (The architecture of sending >>>>>>>>>>>>>> the name to a >>>>>>>>>>>>>> different processor than the index). >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you have numbers that show that the index is faster I >>>>>>>>>>>>>> would like to see >>>>>>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nacho >>>>>>>>>>>>>> >>>>>>>>>>>>>> (I may have misinterpreted your description so feel free to >>>>>>>>>>>>>> correct me if >>>>>>>>>>>>>> I'm wrong.) 
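The two access patterns being compared here can be sketched roughly as follows (a toy encoding: the one-byte type and length fields and the precomputed `offsets` list are illustrative assumptions, not the actual NDN or CCNx wire format):

```python
# Toy TLV name: each component is [type (1 byte)][length (1 byte)][value].

def nth_component_sequential(buf: bytes, n: int) -> bytes:
    """Walk the TLVs one by one: n length-reads plus n pointer moves."""
    pos = 0
    for _ in range(n):
        pos += 2 + buf[pos + 1]          # skip type+length, then the value
    length = buf[pos + 1]
    return buf[pos + 2 : pos + 2 + length]

def nth_component_indexed(buf: bytes, offsets: list[int], n: int) -> bytes:
    """Jump straight to component n via a fixed-width offset index."""
    pos = offsets[n]
    length = buf[pos + 1]
    return buf[pos + 2 : pos + 2 + length]

# /ndn/test/a with a made-up "generic" component type 0x08:
name = bytes([0x08, 3]) + b"ndn" + bytes([0x08, 4]) + b"test" + bytes([0x08, 1]) + b"a"
offsets = [0, 5, 11]
assert nth_component_sequential(name, 2) == b"a"
assert nth_component_indexed(name, offsets, 2) == b"a"
```

Which one wins is exactly the open question in the thread: the sequential walk does more length-reads, but the index costs an extra cache line (or, on an NP4-style pipeline, a separate array access), and with variable-width offsets the index itself must be walked too.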
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>>>>>> Protocol Architect >>>>>>>>>>>>>> Principal Scientist >>>>>>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>>>>>> +1(650)812-4458 >>>>>>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Indeed each component's offset must be encoded using a fixed >>>>>>>>>>>>>> amount of >>>>>>>>>>>>>> bytes: >>>>>>>>>>>>>> >>>>>>>>>>>>>> i.e., >>>>>>>>>>>>>> Type = Offsets >>>>>>>>>>>>>> Length = 10 Bytes >>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>>>>>> >>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your >>>>>>>>>>>>>> name is too >>>>>>>>>>>>>> long. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Max >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>>>>>>>> you only >>>>>>>>>>>>>> want the first x components) you can directly have it using >>>>>>>>>>>>>> the >>>>>>>>>>>>>> offsets. With the Nested TLV structure you have to >>>>>>>>>>>>>> iteratively parse >>>>>>>>>>>>>> the first x-1 components. With the offset structure you can >>>>>>>>>>>>>> directly >>>>>>>>>>>>>> access the first x components. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't get it. What you described only works if the >>>>>>>>>>>>>> "offset" is >>>>>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>>>>>>>>> parse x-1 >>>>>>>>>>>>>> offsets to get to the x offset. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ah, thanks - that's helpful. 
I thought you were saying "I >>>>>>>>>>>>>> like the >>>>>>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>>>>>>>>>> understand what >>>>>>>>>>>>>> you >>>>>>>>>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>>>>>>>>> entirely >>>>>>>>>>>>>> different >>>>>>>>>>>>>> scheme where the info that describes the name-components is >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> someplace >>>>>>>>>>>>>> other than _in_ the name-components. is that correct? when >>>>>>>>>>>>>> you say >>>>>>>>>>>>>> "field >>>>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>>>>>>>>> TLV)? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Correct. >>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>>>>>>>>> name >>>>>>>>>>>>>> hierarchy >>>>>>>>>>>>>> with offsets in the name and other TLV(s) indicate the >>>>>>>>>>>>>> offset to use >>>>>>>>>>>>>> in >>>>>>>>>>>>>> order to retrieve special components. >>>>>>>>>>>>>> As for the field separator, it is something like "/". >>>>>>>>>>>>>> Aliasing is >>>>>>>>>>>>>> avoided as >>>>>>>>>>>>>> you do not rely on field separators to parse the name; you >>>>>>>>>>>>>> use the >>>>>>>>>>>>>> "offset >>>>>>>>>>>>>> TLV" to do that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>>>>> >>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>>>>>>>> you only >>>>>>>>>>>>>> want >>>>>>>>>>>>>> the first x components) you can directly have it using the >>>>>>>>>>>>>> offsets. >>>>>>>>>>>>>> With the >>>>>>>>>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>>>>>>>>> x-1 >>>>>>>>>>>>>> components. >>>>>>>>>>>>>> With the offset structure you can directly access the >>>>>>>>>>>>>> first x >>>>>>>>>>>>>> components. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Max >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- Mark >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> The why is simple: >>>>>>>>>>>>>> >>>>>>>>>>>>>> You use a lot of "generic component type" and very few >>>>>>>>>>>>>> "specific >>>>>>>>>>>>>> component type". You are imposing types for every component >>>>>>>>>>>>>> in order >>>>>>>>>>>>>> to >>>>>>>>>>>>>> handle few exceptions (segmentation, etc.). You create a >>>>>>>>>>>>>> rule >>>>>>>>>>>>>> (specify >>>>>>>>>>>>>> the component's type) to handle exceptions! >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would prefer not to have typed components. Instead I would >>>>>>>>>>>>>> prefer >>>>>>>>>>>>>> to >>>>>>>>>>>>>> have the name as a simple sequence of bytes with a field >>>>>>>>>>>>>> separator. Then, >>>>>>>>>>>>>> outside the name, if you have some components that could be >>>>>>>>>>>>>> used at >>>>>>>>>>>>>> network layer (e.g. a TLV field), you simply need something >>>>>>>>>>>>>> that >>>>>>>>>>>>>> indicates which is the offset allowing you to retrieve the >>>>>>>>>>>>>> version, >>>>>>>>>>>>>> segment, etc in the name... >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Max >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>>>>>> However, if you have a small number of types, you will end >>>>>>>>>>>>>> up with >>>>>>>>>>>>>> names >>>>>>>>>>>>>> containing many generic component types and few specific >>>>>>>>>>>>>> component >>>>>>>>>>>>>> types. 
Due to the fact that the component type specification >>>>>>>>>>>>>> is an >>>>>>>>>>>>>> exception in the name, I would prefer something that specifies >>>>>>>>>>>>>> a component's >>>>>>>>>>>>>> type only when needed (something like UTF8 conventions but >>>>>>>>>>>>>> that >>>>>>>>>>>>>> applications MUST use). >>>>>>>>>>>>>> >>>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>>>>>> explanation >>>>>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>>>>>>>>> e.g.) >>>>>>>>>>>>>> and >>>>>>>>>>>>>> there's been email trying to explain that applications don't >>>>>>>>>>>>>> have to >>>>>>>>>>>>>> use types if they don't need to. your email sounds like "I >>>>>>>>>>>>>> prefer >>>>>>>>>>>>>> the >>>>>>>>>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>>>>>>>>> preference in >>>>>>>>>>>>>> the face of the points about the problems. can you say why >>>>>>>>>>>>>> it is >>>>>>>>>>>>>> that >>>>>>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Mark >>>>>>>>>>>>>> >>>>>>>>>>>>>> . 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jburke at remap.ucla.edu Mon Sep 22 15:36:26 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Mon, 22 Sep 2014 22:36:26 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <93D4F3C6-A679-4AA4-88FE-CD809577A0F1@parc.com> Message-ID: Hi Marc, Thanks -- yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery? Thanks, Jeff 
If different content pieces are scattered between different caches, I don't see how range exclusions would work to discover every version. I'm sorry to be pointing out problems without offering solutions, but we're not ready to publish our discovery protocols. Sent from my telephone On Sep 21, 2014, at 8:50, "Tai-Lin Chu" > wrote: I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc would be better. My unserious conjecture( :) ) : exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular language or context free language might become part of selector too. On Sat, Sep 20, 2014 at 11:25 PM, > wrote: That will get you one reading, then you need to exclude it and ask again. Sent from my telephone On Sep 21, 2014, at 8:22, "Tai-Lin Chu" > wrote: Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes. [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude On Sat, Sep 20, 2014 at 10:55 PM, > wrote: On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu > wrote: If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. Could you explain why the missing-content-object situation happens? Also, range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to ranged excludes is always possible. 
Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second, you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions. You exclude through 100 then issue a new interest. This goes to cache B I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists? I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101. c,d In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead. However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm is unfair. Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that. On Sat, Sep 20, 2014 at 2:38 PM, > wrote: I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. 
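Marc's per-day figure is easy to verify. A minimal sketch of the arithmetic (the 2 bytes of per-component TLV framing used below is my own illustrative assumption, not a protocol constant):

```python
# Back-of-envelope check of the exclusion-overhead example above:
# one sensor reading per second, one excluded timestamp per reading.

READINGS_PER_DAY = 24 * 60 * 60   # 86,400 readings per day
TIMESTAMP_BYTES = 8               # size of each excluded version component

raw = READINGS_PER_DAY * TIMESTAMP_BYTES
print(raw)                        # 691200 bytes of exclusions, before encoding overhead

# Assuming ~2 bytes of TLV framing per excluded component (an assumption
# for illustration), the Interest's Exclude section grows further:
framed = READINGS_PER_DAY * (TIMESTAMP_BYTES + 2)
print(framed)                     # 864000 bytes
```

Either way, an Interest carrying a full day of individual excludes is far beyond any reasonable packet size, which is the point being made.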
Discovery involves more than just "latest version" discovery too. This is probably getting off-topic from the original post about naming conventions. a. If Interests can be forwarded in multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B who only has version 99, so the interest times out or is NACK'd. So you think you have it! But, cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three: 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. c. 
Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like. Marc On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu > wrote: I had thought about these questions, but I want to know your idea besides typed component: 1. LPM allows "data discovery". How will exact match do similar things? 2. Will removing selectors improve performance? How do we use other, faster techniques to replace selectors? 3. fixed byte length and type. I agree more that type can be fixed byte, but 2 bytes for length might not be enough for the future. On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) > wrote: On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu > wrote: I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. Could you share it with us? Sure. Here's a strawman. 
The type space is 16 bits, so you have 65,536 types. The type space is currently shared with the types used for the entire protocol, which gives us two options: (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) - We reserve some portion of the space for unanticipated uses (say another 1024 types) - We give the rest of the space to application assignment. Make sense? While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design we could design for performance, That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. but I think there will be a turning point when the slower design starts to become "fast enough". Perhaps, perhaps not. 
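Dave's strawman split of the type space can be written down as explicit ranges. A sketch under the assumptions in his message (the exact boundaries are illustrative, not an agreed registry):

```python
# Strawman name-component type registry, following option (2) above:
# reuse the full 16-bit space for name component types.
# All range boundaries here are illustrative assumptions.

TYPE_SPACE = 2 ** 16                     # 65,536 possible types

GENERIC = 0                              # the one "default" generic name component
BASE_SPEC = range(1, 1025)               # ~1024 globally understood types (chunk#, version#, ...)
RESERVED = range(1025, 2049)             # ~1024 held back for unanticipated uses
APP_ASSIGNED = range(2049, TYPE_SPACE)   # everything else: application assignment

# The regions partition the space exactly:
assert 1 + len(BASE_SPEC) + len(RESERVED) + len(APP_ASSIGNED) == TYPE_SPACE
print(len(APP_ASSIGNED))                 # 63487 types left for applications
```

Even after carving out the generic, base-spec, and reserved regions, the vast majority of the space stays available to applications, which is the point of the strawman.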
Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. Do you think there will be some design of ndn that will *never* have performance improvement? I suspect LPM on data will always be slow (relative to the other functions). I suspect exclusions will always be slow because they will require extra memory references. However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) > wrote: On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu > wrote: We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once ndn apps become popular, a better chip will be designed for ndn. While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: a) clock rates are not getting (much) faster b) memory accesses are getting (relatively) more expensive c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere 2. 
the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain to work around. I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right. I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches: 1. we should not define a naming convention at all 2. typed component: use tlv type space and add a handful of types 3. marked component: introduce only one more type and add additional marker space I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. Also, everybody thinks that the current utf8 marker naming convention needs to be revised. On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe > wrote: Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments? I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see. On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. 
If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too. Marc On Sep 18, 2014, at 2:02 PM, > > wrote: Does this make that much difference? Say you want to parse the first 5 components. One way to do it is: Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name. OR Start reading the name, (find size + move) 5 times. How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case. In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access. Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it. This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index). If you have numbers that show that the index is faster, I would like to see under what conditions and architectural assumptions. 
Nacho (I may have misinterpreted your description so feel free to correct me if I'm wrong.) -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/18/14, 12:54 AM, "Massimo Gallo" > wrote: Indeed each component's offset must be encoded using a fixed amount of bytes: i.e., Type = Offsets Length = 10 Bytes Value = Offset1(1byte), Offset2(1byte), ... You may also imagine having an "Offset_2byte" type if your name is too long. Max On 18/09/2014 09:27, Tai-Lin Chu wrote: if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components. I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x-th offset. On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > wrote: On 17/09/2014 14:56, Mark Stapp wrote: ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. it sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. is that correct? when you say "field separator", what do you mean (since that's not a "TL" from a TLV)? Correct. In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name and other TLV(s) indicate the offset to use in order to retrieve special components. As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that. 
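The trade-off Massimo and Tai-Lin are debating can be made concrete with a toy encoding. A sketch assuming 1-byte types and 1-byte lengths (a simplification for illustration, not the actual NDN-TLV or CCNx wire format):

```python
# (1) Offset structure: jump straight to component x via a fixed-width index.
def component_via_offsets(offsets, name_bytes, x):
    start = offsets[x]                  # byte offset of component x's TLV
    length = name_bytes[start + 1]      # 1-byte L after 1-byte T (assumed encoding)
    return name_bytes[start + 2 : start + 2 + length]

# (2) Nested TLV structure: skip over the first x components one by one.
def component_via_walk(name_bytes, x):
    pos = 0
    for _ in range(x):                  # must parse x TL headers to get past them
        pos += 2 + name_bytes[pos + 1]
    length = name_bytes[pos + 1]
    return name_bytes[pos + 2 : pos + 2 + length]

# name = /ab/c encoded as T=8 L=2 'ab', T=8 L=1 'c' (type 8 is arbitrary here)
name = bytes([8, 2]) + b"ab" + bytes([8, 1]) + b"c"
offsets = [0, 4]                        # fixed 1-byte offsets, as in Massimo's example

assert component_via_offsets(offsets, name, 1) == b"c"
assert component_via_walk(name, 1) == b"c"
```

Note that, as Tai-Lin points out, (1) only avoids the sequential scan if the offsets themselves are fixed-width; a variable-length offset encoding reintroduces the x-1 parse steps.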
So now, it may be an aesthetic question, but: if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components. Max -- Mark On 9/17/14 6:02 AM, Massimo Gallo wrote: The why is simple: You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc..). You create a rule (specify the component's type) to handle exceptions! I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc in the name... Max On 16/09/2014 20:33, Mark Stapp wrote: On 9/16/14 10:29 AM, Massimo Gallo wrote: I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like UTF8 conventions but that applications MUST use). so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. 
can you say why it is that you express a preference for the "convention" with problems ? Thanks, Mark . _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From jburke at remap.ucla.edu Mon Sep 22 23:01:41 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Tue, 23 Sep 2014 06:01:41 +0000 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: <54208AA4.9060709@rabe.io> Message-ID: Hi Felix, Unfortunately the live stream records are what they are - some quirks in the early recording can't be fixed. We should have separate local recordings as well, but are pretty swamped right now. 
Is there something in particular you'd like to see posted? Jeff On 9/23/14, 12:46 AM, "Felix Rabe" wrote: >Hi list (or, REMAP) > >The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are >they available as downloads somewhere? > >Also, I see some videos are barely viewable (at least [1], but [2] seems >to be fine), skipping a few seconds every few seconds. Do you still have >a complete version? > >[1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 >[2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 > >Kind regards >- Felix >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From felix at rabe.io Mon Sep 22 23:35:52 2014 From: felix at rabe.io (Felix Rabe) Date: Tue, 23 Sep 2014 08:35:52 +0200 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: References: <54208AA4.9060709@rabe.io> Message-ID: <542114C8.7020802@rabe.io> Hi Jeff Don't want to bother you right now, but I'd like to review them sometime in the next month, as my notes are incomplete in many places. Also, Xiaoke mentioned that the streamed versions are unsuitable for folks in China with a slow connection. They would still like to see them. - Felix On 23/Sep/14 08:01, Burke, Jeff wrote: > Hi Felix, > > Unfortunately the live stream records are what they are - some quirks in > the early recording can't be fixed. > > We should have separate local recordings as well, but are pretty swamped > right now. Is there something in particular you'd like to see posted? > > Jeff > > > > On 9/23/14, 12:46 AM, "Felix Rabe" wrote: > >> Hi list (or, REMAP) >> >> The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are >> they available as downloads somewhere? >> >> Also, I see some videos are barely viewable (at least [1], but [2] seems >> to be fine), skipping a few seconds every few seconds. Do you still have >> a complete version? 
>> >> [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 >> [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 >> >> Kind regards >> - Felix >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From xhbreezehu at gmail.com Mon Sep 22 23:46:46 2014 From: xhbreezehu at gmail.com (Hu, Xiaoyan) Date: Tue, 23 Sep 2014 14:46:46 +0800 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: <542114C8.7020802@rabe.io> References: <54208AA4.9060709@rabe.io> <542114C8.7020802@rabe.io> Message-ID: Hi Jeff, Yes indeed. It is really a slow connection. Really appreciate if there would be download links offered. Thanks very much! Best Regards, Xiaoyan Hu PhD Candidate School of Computer Science and Engineering, Southeast University, NanJing, China (Post code: 211189) xhbreezehu at gmail dot com +86-186-5187-8116 On Tue, Sep 23, 2014 at 2:35 PM, Felix Rabe wrote: > Hi Jeff > > Don't want to bother you right now, but I'd like to review them sometime > in the next month, as my notes are incomplete in many places. > > Also, Xiaoke mentioned that the streamed versions are unsuitable for folks > in China with a slow connection. They would still like to see them. > > - Felix > > > > On 23/Sep/14 08:01, Burke, Jeff wrote: > >> Hi Felix, >> >> Unfortunately the live stream records are what they are - some quirks in >> the early recording can't be fixed. >> >> We should have separate local recordings as well, but are pretty swamped >> right now. Is there something in particular you'd like to see posted? >> >> Jeff >> >> >> >> On 9/23/14, 12:46 AM, "Felix Rabe" wrote: >> >> Hi list (or, REMAP) >>> >>> The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are >>> they available as downloads somewhere? 
>>> >>> Also, I see some videos are barely viewable (at least [1], but [2] seems >>> to be fine), skipping a few seconds every few seconds. Do you still have >>> a complete version? >>> >>> [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 >>> [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 >>> >>> Kind regards >>> - Felix >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Mon Sep 22 23:46:53 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Mon, 22 Sep 2014 23:46:53 -0700 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: <542114C8.7020802@rabe.io> References: <54208AA4.9060709@rabe.io> <542114C8.7020802@rabe.io> Message-ID: I was watching it a week after ndncomm. Besides slow connection, the other painful thing is that I cannot fast forward/backward, and seek. On Mon, Sep 22, 2014 at 11:35 PM, Felix Rabe wrote: > Hi Jeff > > Don't want to bother you right now, but I'd like to review them sometime in > the next month, as my notes are incomplete in many places. > > Also, Xiaoke mentioned that the streamed versions are unsuitable for folks > in China with a slow connection. They would still like to see them. > > - Felix > > > > On 23/Sep/14 08:01, Burke, Jeff wrote: >> >> Hi Felix, >> >> Unfortunately the live stream records are what they are - some quirks in >> the early recording can't be fixed. >> >> We should have separate local recordings as well, but are pretty swamped >> right now. Is there something in particular you'd like to see posted? 
>> >> Jeff >> >> >> >> On 9/23/14, 12:46 AM, "Felix Rabe" wrote: >> >>> Hi list (or, REMAP) >>> >>> The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are >>> they available as downloads somewhere? >>> >>> Also, I see some videos are barely viewable (at least [1], but [2] seems >>> to be fine), skipping a few seconds every few seconds. Do you still have >>> a complete version? >>> >>> [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 >>> [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 >>> >>> Kind regards >>> - Felix >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From ithkuil at gmail.com Mon Sep 22 22:11:02 2014 From: ithkuil at gmail.com (Jason Livesay) Date: Mon, 22 Sep 2014 22:11:02 -0700 Subject: [Ndn-interest] NDN and the Metaverse Message-ID: Hello, wondering if anyone has built a virtual reality system on top of NDN? Here is my idea: https://github.com/runvnc/vr Obviously I am just starting to learn about NDN, but I hope I have understood some main concepts well enough. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From caozhenpku at gmail.com Tue Sep 23 01:19:09 2014 From: caozhenpku at gmail.com (Zhen Cao) Date: Tue, 23 Sep 2014 10:19:09 +0200 Subject: [Ndn-interest] Middlebox in NDN Era In-Reply-To: References: Message-ID: Hi Junxiao, Thanks for the analysis. It is helpful. 
Regards, zhen On Mon, Sep 22, 2014 at 7:02 PM, Junxiao Shi wrote: > Hi Zhen > > To align with RFC3234 definition, in NDN: > A middlebox is defined as any intermediary device performing functions other > than the normal, standard functions of an NDN router on the datagram path > between a consumer and producer. > > Standard functions of an NDN router include: > > forward Interest by Name > return Data by PIT states > cache Data > > Middlebox functions can include: > > translate between routable and non-routable Names > perform Data validation > filter packets according to firewall rules > > > Yours, Junxiao From Marc.Mosko at parc.com Tue Sep 23 02:34:58 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Tue, 23 Sep 2014 09:34:58 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> Ok, yes I think those would all be good things. One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That's easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require. That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data. We also need to state what the forwarding strategies are and what the cache behavior is. I outlined some of the points that I think are important in that other posting. While "discover latest" is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking. That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases. 
Marc On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: > Hi Marc, > > Thanks, yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery? > > Thanks, > Jeff > > From: > Date: Mon, 22 Sep 2014 22:29:43 +0000 > To: Jeff Burke > Cc: , > Subject: Re: [Ndn-interest] any comments on naming convention? > >> Jeff, >> >> Take a look at my posting (that Felix fixed) in a new thread on Discovery. >> >> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >> >> I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage. >> >> Marc >> >> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >> >>> Marc, >>> >>> If you can't talk about your protocols, perhaps we can discuss this based >>> on use cases. What are the use cases you are using to evaluate >>> discovery? >>> >>> Jeff >>> >>> >>> >>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >>> >>>> No matter what the expressiveness of the predicates if the forwarder can >>>> send interests different ways you don't have a consistent underlying set >>>> to talk about so you would always need non-range exclusions to discover >>>> every version. >>>> >>>> Range exclusions only work I believe if you get an authoritative answer. >>>> If different content pieces are scattered between different caches I >>>> don't see how range exclusions would work to discover every version. >>>> >>>> I'm sorry to be pointing out problems without offering solutions but >>>> we're not ready to publish our discovery protocols. 
>>>> >>>> Sent from my telephone >>>> >>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>> >>>>> I see. Can you briefly describe how ccnx discovery protocol solves the >>>>> all problems that you mentioned (not just exclude)? a doc will be >>>>> better. >>>>> >>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >>>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>>> language or context free language might become part of selector too. >>>>> >>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>> That will get you one reading then you need to exclude it and ask >>>>>> again. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>>> >>>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>>> with a particular cache, then you need to always use individual >>>>>>>> excludes not range excludes if you want to discover all the versions >>>>>>>> of an object. >>>>>>> >>>>>>> I am very confused. For your example, if I want to get all today's >>>>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>> >>>>>>> >>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>> >>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>>> >>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>>>>> >>>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>>>>> could miss content objects you want to discovery unless you avoid >>>>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>>>> >>>>>>>>> Could you explain why missing content object situation happens? also >>>>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>>>> exclude; >>>>>>>>> converting from explicit excludes to ranged exclude is always >>>>>>>>> possible. 
>>>>>>>> >>>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>>> with a particular cache, then you need to always use individual >>>>>>>> excludes not range excludes if you want to discover all the versions >>>>>>>> of an object. For something like a sensor reading that is updated, >>>>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>>>> exclusions (plus encoding overhead) per day. >>>>>>>> >>>>>>>> Yes, maybe using a more deterministic version number than a >>>>>>>> timestamp makes sense here, but it's just an example of needing a lot >>>>>>>> of exclusions. >>>>>>>> >>>>>>>>> >>>>>>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>>>>>> cache B >>>>>>>>> >>>>>>>>> I feel this case is invalid because cache A will also get the >>>>>>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>>>>>> if >>>>>>>>> this goes to cache B only, it means that cache A dies. How do you >>>>>>>>> know >>>>>>>>> that v101 even exists? >>>>>>>> >>>>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>>> forwarder is just doing "best path" and can round-robin between cache >>>>>>>> A and cache B, then your application could miss v101. >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> c,d In general I agree that LPM performance is related to the number >>>>>>>>> of components. In my own thread-safe LPM implementation, I used only >>>>>>>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>>>>>>> every node will be faster or not because of lock overhead. >>>>>>>>> >>>>>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>>>>> (ndn >>>>>>>>> lpm). Comparing performance of exact match to lpm is unfair. 
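Marc's exclusion-overhead arithmetic earlier in this exchange (one reading per second, one 8-byte timestamp per excluded version) can be checked quickly; this is just a back-of-envelope sketch of the figures he quotes:

```python
# Sanity-check of the exclusion-overhead numbers quoted above.
readings_per_day = 24 * 60 * 60      # one sensor reading per second
bytes_per_exclusion = 8              # an 8-byte timestamp per excluded version
exclusion_bytes = readings_per_day * bytes_per_exclusion

assert readings_per_day == 86_400
assert exclusion_bytes == 691_200    # bytes per day, before TLV encoding overhead
```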
>>>>>>>> >>>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>>>>> ready to claim it's better yet because we have not done that. >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>>>>> I would point out that using LPM on content object to Interest >>>>>>>>>> matching to do discovery has its own set of problems. Discovery >>>>>>>>>> involves more than just ?latest version? discovery too. >>>>>>>>>> >>>>>>>>>> This is probably getting off-topic from the original post about >>>>>>>>>> naming conventions. >>>>>>>>>> >>>>>>>>>> a. If Interests can be forwarded in multiple directions and two >>>>>>>>>> different caches are responding, the exclusion set you build up >>>>>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>>>>> content objects you want to discover unless you avoid all range >>>>>>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>>>>> explicit discovery protocol that allows conversations about >>>>>>>>>> consistent sets is better. >>>>>>>>>> >>>>>>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>>>>>> should be transitive between caches, but imagine this. You send >>>>>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>>>>> through 100 then issue a new interest. This goes to cache B who >>>>>>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>>>>>> you think you have it! But cache A already has version 101, you >>>>>>>>>> just don't know. If you cannot have a conversation around >>>>>>>>>> consistent sets, it seems like even doing latest version discovery >>>>>>>>>> is difficult with selector based discovery. 
From what I saw in >>>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>>>>> authoritative source because you can never believe an intermediate >>>>>>>>>> cache that there's not something more recent. >>>>>>>>>> >>>>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be >>>>>>>>>> interested in seeing your analysis. Case (a) is that a node can >>>>>>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>>>>>> a node can correctly discover the latest version. We have not >>>>>>>>>> formally compared (or yet published) our discovery protocols (we >>>>>>>>>> have three, 2 for content, 1 for device) to selector based >>>>>>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>>>>>> have the non-determinism sketched above. >>>>>>>>>> >>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>>>>>> must do in the PIT to match a content object. If you have a name >>>>>>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>>>>>> the content object name and evaluate the selector predicate. >>>>>>>>>> Content Based Networking (CBN) had some methods to create data >>>>>>>>>> structures based on predicates; maybe those would be better. But >>>>>>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>>>> implementation only requires at most 3 lookups (one by name, one by >>>>>>>>>> name + keyid, one by name + content object hash), and one can do >>>>>>>>>> other things to optimize lookup for an extra write. >>>>>>>>>> >>>>>>>>>> d. 
In (c) above, if you have a threaded name tree or are just >>>>>>>>>> walking parent pointers, I suspect you'll need locking of the >>>>>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>>>>>>> and that will be expensive. It would be interesting to see what a >>>>>>>>>> cache-consistent multi-threaded name tree looks like. >>>>>>>>>> >>>>>>>>>> Marc >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> I had thought about these questions, but I want to know your idea >>>>>>>>>>> besides typed component: >>>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>>>>>> things? >>>>>>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>>>>>> other, >>>>>>>>>>> faster techniques to replace selectors? >>>>>>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>>>>>> byte, but 2 bytes for length might not be enough for the future. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>>>> >>>>>>>>>>>>> Could you share it with us? >>>>>>>>>>>> Sure. Here's a strawman. >>>>>>>>>>>> >>>>>>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>>>>>> >>>>>>>>>>>> The type space is currently shared with the types used for the >>>>>>>>>>>> entire protocol, which gives us two options: >>>>>>>>>>>> (1) we reserve a range for name component types. 
Given the >>>>>>>>>>>> likelihood there will be at least as much and probably more need >>>>>>>>>>>> for component types than protocol extensions, we could reserve 1/2 >>>>>>>>>>>> of the type space, giving us 32K types for name components. >>>>>>>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>>>>>>> and other fields of the protocol (since they are sub-types of the >>>>>>>>>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>>>>>>>>> name component types. >>>>>>>>>>>> >>>>>>>>>>>> We divide the type space into regions, and manage it with a >>>>>>>>>>>> registry. If we ever get to the point of creating an IETF >>>>>>>>>>>> standard, IANA has 25 years of experience running registries and >>>>>>>>>>>> there are well-understood rule sets for different kinds of >>>>>>>>>>>> registries (open, requires a written spec, requires standards >>>>>>>>>>>> approval). >>>>>>>>>>>> >>>>>>>>>>>> - We allocate one "default" name component type for "generic >>>>>>>>>>>> name", which would be used on name prefixes and other common >>>>>>>>>>>> cases where there are no special semantics on the name component. >>>>>>>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>>>>>>> globally understood types that are part of the base or extension >>>>>>>>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>>>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>>>>>>> (say another 1024 types) >>>>>>>>>>>> - We give the rest of the space to application assignment. >>>>>>>>>>>> >>>>>>>>>>>> Make sense? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>>>>>>>> performance flaws in the design >>>>>>>>>>>>> >>>>>>>>>>>>> we could design for performance, >>>>>>>>>>>> That's not what people are advocating. 
We are advocating that we >>>>>>>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>>>>>>> Moore's Law will come to the rescue. >>>>>>>>>>>> >>>>>>>>>>>>> but I think there will be a turning >>>>>>>>>>>>> point when the slower design starts to become "fast enough". >>>>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so >>>>>>>>>>>> things that don't get faster while others do tend to get dropped >>>>>>>>>>>> or not used because they impose a performance penalty relative to >>>>>>>>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>>>>>>>> where improvements in technology get applied to lowering cost >>>>>>>>>>>> rather than improving performance. For those environments bad >>>>>>>>>>>> performance just never gets better. >>>>>>>>>>>> >>>>>>>>>>>>> Do you >>>>>>>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>>>>>>> performance improvement? >>>>>>>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>>>>>>> functions). >>>>>>>>>>>> I suspect exclusions will always be slow because they will >>>>>>>>>>>> require extra memory references. >>>>>>>>>>>> >>>>>>>>>>>> However I of course don't claim clairvoyance so this is just >>>>>>>>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>>>>>>>> orders of magnitude and still having to worry about counting >>>>>>>>>>>> cycles and memory references... >>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>>>>>>>>> perform >>>>>>>>>>>>>>> well on it. It should be the other way around: once ndn app >>>>>>>>>>>>>>> becomes >>>>>>>>>>>>>>> popular, a better chip will be designed for ndn. 
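Dave's 16-bit type-space strawman earlier in this exchange can be illustrated concretely. This is only a sketch: the exact range boundaries below are assumptions chosen to match his rough sizes (one generic type, ~1024 well-known types, ~1024 reserved, the rest for applications), not a specification:

```python
# Illustrative partition of a 16-bit name-component type space, following
# the strawman quoted above. Boundaries are assumptions, not a spec.
TYPE_SPACE = 1 << 16                      # 16 bits -> 65,536 values

generic     = range(0, 1)                 # one default "generic name" type
well_known  = range(1, 1 + 1024)          # chunk#, version#, etc.
reserved    = range(1025, 1025 + 1024)    # held back for unanticipated uses
application = range(2049, TYPE_SPACE)     # remainder for application assignment

assert TYPE_SPACE == 65_536
# The four regions tile the space with no overlap and no gaps:
assert sum(len(r) for r in (generic, well_known, reserved, application)) == TYPE_SPACE
```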
>>>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>>>>>>>> performance flaws in the design: >>>>>>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>>>>>> c) data structures that require locks to manipulate >>>>>>>>>>>>>> successfully will be relatively more expensive, even with >>>>>>>>>>>>>> near-zero lock contention. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>>>>>>>> its design. We just forgot those because the design elements >>>>>>>>>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>>>>>>>>> poster children for this are: >>>>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>>>>>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>>>>>>>>>> anywhere. >>>>>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>>>>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>>>>>>>>> in working around. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm afraid students today are being taught that the designers >>>>>>>>>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>>>>>>>>> engineers who got most of it right. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>>>>>>>>>> Now I >>>>>>>>>>>>>>> see that there are 3 approaches: >>>>>>>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>>>>>>>>>> types >>>>>>>>>>>>>>> 3. 
marked component: introduce only one more type and add >>>>>>>>>>>>>>> additional >>>>>>>>>>>>>>> marker space >>>>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It is just as powerful in practice as either throwing up our >>>>>>>>>>>>>> hands and letting applications design their own mutually >>>>>>>>>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>>>>>>>>> markers in a way that is fast to generate/parse and also >>>>>>>>>>>>>> resilient against aliasing. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>>>>>>>>>> convention >>>>>>>>>>>>>>> needs to be revised. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>>>>>>>>>>> to fit in (the >>>>>>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>>>>>>>>>>> current NDN >>>>>>>>>>>>>>>> experiments? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I guess wide deployment could make for even longer names. >>>>>>>>>>>>>>>> Related: Many URLs >>>>>>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>>>>>>>>>>> text lines, and >>>>>>>>>>>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>>>>>>>>>>> I see. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In fact, the index in separate TLV will be slower on some >>>>>>>>>>>>>>>> architectures, >>>>>>>>>>>>>>>> like the ezChip NP4. 
The NP4 can hold the first 96 frame >>>>>>>>>>>>>>>> bytes in memory, >>>>>>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>>>>>>>>>>> 32-byte blocks >>>>>>>>>>>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>>>>>>>>>>> If you need to >>>>>>>>>>>>>>>> switch between arrays, it would be very expensive. If you >>>>>>>>>>>>>>>> have to read past >>>>>>>>>>>>>>>> the name to get to the 2nd array, then read it, then back up >>>>>>>>>>>>>>>> to get to the >>>>>>>>>>>>>>>> name, it will be pretty expensive too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Marc >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does this make that much difference? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Suppose you want to parse the first 5 components. One way to do >>>>>>>>>>>>>>>> it is: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>>>>>>>>>>> from the start >>>>>>>>>>>>>>>> offset of the beginning of the name. >>>>>>>>>>>>>>>> OR >>>>>>>>>>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> How much speed are you getting from one to the other? You >>>>>>>>>>>>>>>> seem to imply >>>>>>>>>>>>>>>> that the first one is faster. I don't think this is the >>>>>>>>>>>>>>>> case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In the first one you'll probably have to get the cache line >>>>>>>>>>>>>>>> for the index, >>>>>>>>>>>>>>>> then all the required cache lines for the first 5 >>>>>>>>>>>>>>>> components. For the >>>>>>>>>>>>>>>> second, you'll have to get all the cache lines for the first >>>>>>>>>>>>>>>> 5 components. 
>>>>>>>>>>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>>>>>>>>>>> than >>>>>>>>>>>>>>>> evaluating a number and computing an addition, you might >>>>>>>>>>>>>>>> find that the >>>>>>>>>>>>>>>> performance of the index is actually slower than the >>>>>>>>>>>>>>>> performance of the >>>>>>>>>>>>>>>> direct access. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Granted, there is a case where you don?t access the name at >>>>>>>>>>>>>>>> all, for >>>>>>>>>>>>>>>> example, if you just get the offsets and then send the >>>>>>>>>>>>>>>> offsets as >>>>>>>>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>>>>>>>>>>> you may see a >>>>>>>>>>>>>>>> gain IF there are more cache line misses in reading the name >>>>>>>>>>>>>>>> than in >>>>>>>>>>>>>>>> reading the index. So, if the regular part of the name >>>>>>>>>>>>>>>> that you?re >>>>>>>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>>>>>>>>>>> name is to be >>>>>>>>>>>>>>>> processed by a different processor, then your might see some >>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>> gain in using the index, but in all other circumstances I >>>>>>>>>>>>>>>> bet this is not >>>>>>>>>>>>>>>> the case. I may be wrong, haven?t actually tested it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is all to say, I don?t think we should be designing the >>>>>>>>>>>>>>>> protocol with >>>>>>>>>>>>>>>> only one architecture in mind. (The architecture of sending >>>>>>>>>>>>>>>> the name to a >>>>>>>>>>>>>>>> different processor than the index). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If you have numbers that show that the index is faster I >>>>>>>>>>>>>>>> would like to see >>>>>>>>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Nacho >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (I may have misinterpreted your description so feel free to >>>>>>>>>>>>>>>> correct me if >>>>>>>>>>>>>>>> I?m wrong.) 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>>>>>>>> Protocol Architect >>>>>>>>>>>>>>>> Principal Scientist >>>>>>>>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>>>>>>>> +1(650)812-4458 >>>>>>>>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Indeed each component's offset must be encoded using a fixed >>>>>>>>>>>>>>>> amount of >>>>>>>>>>>>>>>> bytes: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> i.e., >>>>>>>>>>>>>>>> Type = Offsets >>>>>>>>>>>>>>>> Length = 10 Bytes >>>>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your >>>>>>>>>>>>>>>> name is too >>>>>>>>>>>>>>>> long. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Max >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>>>>>>>>>> you only >>>>>>>>>>>>>>>> want the first x components) you can directly have it using >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> offsets. With the Nested TLV structure you have to >>>>>>>>>>>>>>>> iteratively parse >>>>>>>>>>>>>>>> the first x-1 components. With the offset structure you can >>>>>>>>>>>>>>>> directly >>>>>>>>>>>>>>>> access the first x components. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't get it. What you described only works if the >>>>>>>>>>>>>>>> "offset" is >>>>>>>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>>>>>>>>>>> parse x-1 >>>>>>>>>>>>>>>> offsets to get to the x offset. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>>>>>>>>>>> like the >>>>>>>>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>>>>>>>>>>>> understand what >>>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>>>>>>>>>>> entirely >>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>> scheme where the info that describes the name-components is >>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> someplace >>>>>>>>>>>>>>>> other than _in_ the name-components. is that correct? when >>>>>>>>>>>>>>>> you say >>>>>>>>>>>>>>>> "field >>>>>>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>>>>>>>>>>> TLV)? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Correct. >>>>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>>>>>>>>>>> name >>>>>>>>>>>>>>>> hierarchy >>>>>>>>>>>>>>>> with offsets in the name and other TLV(s) indicates the >>>>>>>>>>>>>>>> offset to use >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> order to retrieve special components. >>>>>>>>>>>>>>>> As for the field separator, it is something like "/". >>>>>>>>>>>>>>>> Aliasing is >>>>>>>>>>>>>>>> avoided as >>>>>>>>>>>>>>>> you do not rely on field separators to parse the name; you >>>>>>>>>>>>>>>> use the >>>>>>>>>>>>>>>> "offset >>>>>>>>>>>>>>>> TLV " to do that. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> if you do not need the entire hierarchal structure (suppose >>>>>>>>>>>>>>>> you only >>>>>>>>>>>>>>>> want >>>>>>>>>>>>>>>> the first x components) you can directly have it using the >>>>>>>>>>>>>>>> offsets. 
>>>>>>>>>>>>>>>> With the >>>>>>>>>>>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>>>>>>>>>>> x-1 >>>>>>>>>>>>>>>> components. >>>>>>>>>>>>>>>> With the offset structure you can directly access the >>>>>>>>>>>>>>>> first x >>>>>>>>>>>>>>>> components. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Max >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- Mark >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The why is simple: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You use a lot of "generic component type" and very few >>>>>>>>>>>>>>>> "specific >>>>>>>>>>>>>>>> component type". You are imposing types for every component >>>>>>>>>>>>>>>> in order >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> handle a few exceptions (segmentation, etc.). You create a >>>>>>>>>>>>>>>> rule >>>>>>>>>>>>>>>> (specify >>>>>>>>>>>>>>>> the component's type) to handle exceptions! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I would prefer not to have typed components. Instead I would >>>>>>>>>>>>>>>> prefer >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> have the name as a simple sequence of bytes with a field >>>>>>>>>>>>>>>> separator. Then, >>>>>>>>>>>>>>>> outside the name, if you have some components that could be >>>>>>>>>>>>>>>> used at >>>>>>>>>>>>>>>> the network layer (e.g. a TLV field), you simply need something >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> indicates which is the offset allowing you to retrieve the >>>>>>>>>>>>>>>> version, >>>>>>>>>>>>>>>> segment, etc. in the name... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Max >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think we agree on the small number of "component types". 
>>>>>>>>>>>>>>>> However, if you have a small number of types, you will end >>>>>>>>>>>>>>>> up with >>>>>>>>>>>>>>>> names >>>>>>>>>>>>>>>> containing many generic components types and few specific >>>>>>>>>>>>>>>> components >>>>>>>>>>>>>>>> types. Due to the fact that the component type specification >>>>>>>>>>>>>>>> is an >>>>>>>>>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>>>>>>>>> component's >>>>>>>>>>>>>>>> type only when needed (something like UTF8 conventions but >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> applications MUST use). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>>>>>>>>>> explanation >>>>>>>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>>>>>>>>>>> e.g.) >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> there's been email trying to explain that applications don't >>>>>>>>>>>>>>>> have to >>>>>>>>>>>>>>>> use types if they don't need to. your email sounds like "I >>>>>>>>>>>>>>>> prefer >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>>>>>>>>>>> preference in >>>>>>>>>>>>>>>> the face of the points about the problems. can you say why >>>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Mark >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> . 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From Marc.Mosko at parc.com Tue Sep 23 03:12:08 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Tue, 23 Sep 2014 10:12:08 +0000 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: <18D3BE4F-0F7B-49C5-9AF0-B16581F2A34B@parc.com> One could always just double hash the key to get the keyid for hmac. Personally, I would think that in general hmac keys need to be agreed on by a key exchange protocol so they are rotated periodically. However, the way that agreement protocol identifies keys could also be used as the keyid, such as a small integer. Marc On Sep 22, 2014, at 7:02 PM, Thompson, Jeff wrote: > Hash output length vs. block size? > > Marc is right. The HmacWithSha256 algorithm hashes the key when it is longer than the block size (64 bytes), not the hash output length (32 bytes). So we would prohibit keys longer than 64 bytes (not 32 bytes). > > Also, most applications will use a crypto library's HMAC function which should automatically hash the key if it is longer than the block size. It could be confusing to put this in the spec since the application writer may unnecessarily hash the key when the crypto library will do it anyway, and more efficiently. 
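Jeff's point about block size is easy to demonstrate with a standard library: per RFC 2104, an HMAC key longer than the 64-byte SHA-256 block size is first replaced by its hash, so the MAC computed with the long key equals the MAC computed with the key's SHA-256 digest. A minimal sketch (the key and message bytes are arbitrary examples):

```python
# Demonstrates why publishing the digest of a long HMAC key leaks the secret:
# HMAC-SHA256 hashes any key longer than the 64-byte block size (RFC 2104).
import hashlib
import hmac

long_key = b"k" * 100                 # longer than the 64-byte block size
msg = b"example payload"

mac_long = hmac.new(long_key, msg, hashlib.sha256).digest()
mac_hashed = hmac.new(hashlib.sha256(long_key).digest(), msg, hashlib.sha256).digest()

# The library hashed the long key internally, so the two MACs agree --
# anyone holding the key's digest can forge valid MACs.
assert mac_long == mac_hashed
```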
> > - Jeff T > > From: , Jeff Thompson > Date: Monday, September 22, 2014 9:53 AM > To: Yingdi Yu , Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types > > On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu wrote: > > However, when key is longer than the hash output, putting key digest into key locator directly expose the secret to other. > > Oh crap, you're right! Thanks for pointing that out. I agree that we should prohibit KeyDigest in the case that the key is longer than 32 bytes. I would reword the reason slightly: "When the key is longer than 32 bytes, the HmacWithSha256 algorithm uses the SHA-256 digest of the key, which would be the same as the KeyDigest, effectively exposing the secret." > > - Jeff T > > From: Yingdi Yu > Date: Sunday, September 21, 2014 12:00 AM > To: Adeola Bannis > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types > > On Sep 20, 2014, at 11:18 AM, Adeola Bannis wrote: > >> >> On Fri, Sep 19, 2014 at 11:18 PM, Yingdi Yu wrote: >>> >>> @Adeola, you probably want to forbid KeyDigest in KeyLocator for this HMAC signature. Because if key size is longer than hash output, the key digest is used instead. If we allow KeyDigest in KeyLocator, then some careless programmers may leak the secret. >> >> Well, a careless programmer could put a passphrase, or something used to derive the key, into a KeyLocator KeyName as well. We can go ahead and make the restriction, but there are other ways a programmer could shoot herself in the foot here. > > I did not mean that "careless" as you describe. I mean those who do not know the details of HMAC. > > The problem is that when key is shorter than hash output, it is safe to put key digest in the key locator. 
However, when the key is longer than the hash output, putting the key digest into the key locator directly exposes the secret to others, i.e., attackers can directly use the key digest to construct any legitimate HMAC. Therefore you either explicitly specify what should be used when KeyDigest is used as the KeyLocator, or you disable the usage of KeyDigest in KeyLocator.
>
> Yingdi
> _______________________________________________
> Ndn-interest mailing list
> Ndn-interest at lists.cs.ucla.edu
> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From gts at ics.uci.EDU Tue Sep 23 08:28:20 2014
From: gts at ics.uci.EDU (Gene Tsudik)
Date: Tue, 23 Sep 2014 08:28:20 -0700
Subject: [Ndn-interest] Adding HMAC to available NDN signature types
In-Reply-To: <18D3BE4F-0F7B-49C5-9AF0-B16581F2A34B@parc.com>
References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> <18D3BE4F-0F7B-49C5-9AF0-B16581F2A34B@parc.com>
Message-ID:

I would suggest *not* using the (single or double) hash of the key itself as the key-id. One simple (and reasonably secure) way of computing a key-id is as HMAC(key, string), where "string" is drawn from a set of non-secret session values, e.g., timestamp, seq#, endpoint names/addresses, etc.

Cheers,
Gene

On Tue, Sep 23, 2014 at 3:12 AM, wrote:

> One could always just double-hash the key to get the keyid for HMAC.
>
> Personally, I would think that in general HMAC keys need to be agreed on by a key exchange protocol so they are rotated periodically. However, whatever that agreement protocol uses to identify keys could also be used as the keyid, such as a small integer.
>
> Marc
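Gene's HMAC(key, string) key-id idea can be sketched in a few lines of standard-library Python. The session-string fields (timestamp, endpoint names) and the 8-byte truncation are illustration choices here, not anything specified for NDN:

```python
import hashlib
import hmac

def key_id(key: bytes, timestamp: str, endpoints: str) -> bytes:
    """Derive a non-secret key-id as HMAC(key, string) over public
    session values, instead of hashing the key itself (which, for an
    over-long key, would reveal the effective signing key)."""
    session_string = (timestamp + "|" + endpoints).encode()
    # Truncation is fine: this is an identifier, not a MAC tag.
    return hmac.new(key, session_string, hashlib.sha256).digest()[:8]

kid = key_id(b"shared-secret", "2014-09-23T08:28:20", "alice|bob")
assert len(kid) == 8
```

Both sides of a session can recompute the same key-id from the shared key and the public session values, while an observer who sees only the key-id learns nothing usable about the key.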
From shijunxiao at email.arizona.edu Tue Sep 23 08:47:32 2014
From: shijunxiao at email.arizona.edu (Junxiao Shi)
Date: Tue, 23 Sep 2014 08:47:32 -0700
Subject: [Ndn-interest] Adding HMAC to available NDN signature types
In-Reply-To:
References:
Message-ID:

Dear folks

It appears to me that HMAC, as a signing algorithm that requires a symmetric key, is suitable only for realtime applications with a small number of mutually trusted participants. The reasons are:

- Participants must be mutually trusted, because any participant who wants to verify Data must know the symmetric key, and knowing the symmetric key allows a participant to sign Data as well.
- Verifying old Data requires knowing the symmetric key. However, it's impossible to establish mutual trust when the participant who generated the Data is gone. Thus HMAC is suitable for realtime applications only, where all participants are still alive.

To use HMAC signing, we must first use existing pubkey signing methods to establish mutual trust between participants, and negotiate a symmetric key. This key should also be rotated periodically. The mechanism for key management should be designed, in order for HMAC signing to be useful.

Yours, Junxiao

From gts at ics.uci.EDU Tue Sep 23 09:04:33 2014
From: gts at ics.uci.EDU (Gene Tsudik)
Date: Tue, 23 Sep 2014 09:04:33 -0700
Subject: [Ndn-interest] Adding HMAC to available NDN signature types
In-Reply-To:
References:
Message-ID:

>>> The mechanism for key management should be designed, in order for HMAC signing to be useful.

Just as a word of caution, I hope that, instead of designing a homegrown (and quite possibly insecure) NDN- or CCNx-specific key management approach, people consult ample prior work that contains many such protocols, i.e., using public key-based authenticated key exchange/agreement to yield a symmetric key.
(By "prior work" I don't just mean papers but also real deployed protocols.) Cheers, Gene On Tue, Sep 23, 2014 at 8:47 AM, Junxiao Shi wrote: > Dear folks > > It appears to me that HMAC, as a signing algorithm that requires a > symmetric key, is suitable only for realtime applications with a small > number of mutually trusted participants. > The reasons are: > > - Participants must be mutually trusted, because any participant who > wants to verify Data must know the symmetric key, and knowing the symmetric > key allows a participant to sign Data as well. > - Verifying old Data requires knowing the symmetric key. However, it's > impossible to establish mutual trust when the participant who generated the > Data is gone. Thus HMAC is suitable for realtime applications only, where > all participants are still alive. > > > To use HMAC signing, we must first use existing pubkey signing methods to > establish mutual trust between participants, and negotiate a symmetric key. > This key should also be rotated periodically. > The mechanism for key management should be designed, in order for HMAC > signing to be useful. > > Yours, Junxiao > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yingdi at CS.UCLA.EDU Tue Sep 23 09:51:42 2014 From: yingdi at CS.UCLA.EDU (Yingdi Yu) Date: Tue, 23 Sep 2014 09:51:42 -0700 Subject: [Ndn-interest] Adding HMAC to available NDN signature types In-Reply-To: References: <7BB03D9C-A407-444F-A07E-4EB2370DF504@cs.ucla.edu> <6F30B27A-8E53-4B36-9557-E07D16A57EBE@cs.ucla.edu> Message-ID: Hi Jeff, On Sep 22, 2014, at 10:02 AM, Thompson, Jeff wrote: > Hash output length vs. block size? > > Marc is right. The HmacWithSha256 algorithm hashes the key when it is longer than the block size (64 bytes), not the hash output length (32 bytes). 
So we would prohibit keys longer than 64 bytes (not 32 bytes).

In Adeola's spec, the block size is not required to be 64 bytes. The RFC does not require the block size to be 64. I wonder if we should clarify that in the spec?

> Also, most applications will use a crypto library's HMAC function, which should automatically hash the key if it is longer than the block size. It could be confusing to put this in the spec, since the application writer may unnecessarily hash the key when the crypto library will do it anyway, and more efficiently.

I think the purpose of this spec is not only for application developers, but also for library developers, especially if we want to have good interoperability. So I think it would be better to clearly specify what the requirements are for generating a signature, how to generate the signature, and what should be put into the SignatureInfo. Moreover, the definition of KeyDigest is important in the validation part, so that data consumers can determine which symmetric key should be used to verify the HMAC.

I agree with Marc that, for a key longer than the block size, we can use the digest of the digest of the key as the key id, because the digest of the key is the actual key in this case.

Yingdi

From nano at remap.ucla.edu Tue Sep 23 12:21:07 2014
From: nano at remap.ucla.edu (Alex Horn)
Date: Tue, 23 Sep 2014 12:21:07 -0700
Subject: [Ndn-interest] NDN and the Metaverse
In-Reply-To:
References:
Message-ID:

Jason,

closest work is likely https://github.com/remap/ndn-mog

Unity game engine for display, NDN for world partitioning & state syncing.

refines earlier work presented in this tech report: http://named-data.net/publications/techreports/tregalcar/

cheers
Alex

On Mon, Sep 22, 2014 at 10:11 PM, Jason Livesay wrote:

> Hello, wondering if anyone has built a virtual reality system on top of NDN? Here is my idea: https://github.com/runvnc/vr
>
> Obviously I am just starting to learn about NDN, but I hope I have understood some main concepts well enough.
>
> Thanks.

From lanwang at memphis.edu Tue Sep 23 13:12:56 2014
From: lanwang at memphis.edu (Lan Wang (lanwang))
Date: Tue, 23 Sep 2014 20:12:56 +0000
Subject: [Ndn-interest] new PhD student looking to get involved in NDN
In-Reply-To:
References:
Message-ID: <7BF6066F-2C57-44C5-B042-194B7B90A97C@memphis.edu>

Stephen,

What general research area in NDN are you looking for a problem in? Trust models are definitely one of the areas that need more work. On the other hand, I think "can I write such-and-such application" is a good starting point to get you thinking in an NDN way. Without this experience, sometimes the research becomes irrelevant (not addressing the right problems).

Lan

On Sep 22, 2014, at 2:26 PM, Stephen Herwig wrote:

> Hi,
>
> I just started a PhD program in computer science at the University of Maryland. Over the past month, I've been reading about NDN and playing with v0.3, and I find it very interesting.
>
> I'm thinking of ways to get involved, and wondering what the critical problems are. For instance, distributed trust-models seem pervasive. On the other hand, a lot of the work seems to be protocol experimentation (e.g., can I write such-and-such application in NDN).
>
> In short, I'm just sort of wondering what the difficult problems are that the community is working on. Hopefully, I'll be able to help out.
>
> Thanks,
> Stephen

From jefft0 at remap.UCLA.EDU Tue Sep 23 14:28:23 2014
From: jefft0 at remap.UCLA.EDU (Thompson, Jeff)
Date: Tue, 23 Sep 2014 21:28:23 +0000
Subject: [Ndn-interest] Adding HMAC to available NDN signature types
In-Reply-To:
References:
Message-ID:

> In Adeola's spec, the block size is not required to be 64 bytes. The RFC does not require the block size to be 64. I wonder if we should clarify that in the spec?

The RFC is general for any hash algorithm, but Adeola's spec is specific to SHA-256. Since the spec is specific to SHA-256, I think we should show the exact values. In this case, the block size is 64 bytes.

- Jeff T

From: Yingdi Yu
Date: Tuesday, September 23, 2014 9:51 AM
To: Jeff Thompson
Cc: Adeola Bannis, "ndn-interest at lists.cs.ucla.edu"
Subject: Re: [Ndn-interest] Adding HMAC to available NDN signature types

Hi Jeff,

On Sep 22, 2014, at 10:02 AM, Thompson, Jeff wrote:

Hash output length vs. block size?

Marc is right. The HmacWithSha256 algorithm hashes the key when it is longer than the block size (64 bytes), not the hash output length (32 bytes). So we would prohibit keys longer than 64 bytes (not 32 bytes).

In Adeola's spec, the block size is not required to be 64 bytes. The RFC does not require the block size to be 64. I wonder if we should clarify that in the spec?

Also, most applications will use a crypto library's HMAC function, which should automatically hash the key if it is longer than the block size. It could be confusing to put this in the spec, since the application writer may unnecessarily hash the key when the crypto library will do it anyway, and more efficiently.

I think the purpose of this spec is not only for application developers, but also for library developers, especially if we want to have good interoperability. So I think it would be better to clearly specify what the requirements are for generating a signature, how to generate the signature, and what should be put into the SignatureInfo. Moreover, the definition of KeyDigest is important in the validation part, so that data consumers can determine which symmetric key should be used to verify the HMAC.

I agree with Marc that, for a key longer than the block size, we can use the digest of the digest of the key as the key id, because the digest of the key is the actual key in this case.

Yingdi

From abannis at UCLA.EDU Tue Sep 23 16:24:06 2014
From: abannis at UCLA.EDU (Adeola Bannis)
Date: Tue, 23 Sep 2014 16:24:06 -0700
Subject: [Ndn-interest] Raspberry Pi IoT toolkit
Message-ID:

Hello all,

The Raspberry Pi IoT toolkit (as described at NDNcomm) is now available for download. The main component of the toolkit is an SD card image for Raspberry Pi. Using the image, users can connect Raspberry Pis on the same LAN to form a home network over NDN.

For image downloads and current source code, see https://github.com/remap/ndn-pi/releases

Thanks,
Adeola

From tailinchu at gmail.com Tue Sep 23 19:27:13 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Tue, 23 Sep 2014 19:27:13 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com>
References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com>
Message-ID:

Discovery can be reduced to "pattern detection" (can we infer what exists?) and "pattern validation" (can we confirm this guess?).

For example, I see a pattern /mail/inbox/148. I, a human being, see a pattern with static (/mail/inbox) and variable (148) components; with a proper naming convention, computers can also detect this pattern easily.

Now I want to look for all mails in my inbox. I can generate a list of /mail/inbox/. These are my guesses, and with selectors I can further refine my guesses. To validate them, a bloom filter can provide "best effort" discovery (with some false positives, so I call it "best-effort") before I stupidly send all the interests to the network.

The discovery protocol, as I described above, is essentially "pattern detection by naming convention" and "bloom filter validation." This is definitely one of the "simpler" discovery protocols, because the data producer only needs to add an additional bloom filter. Notice that we can progressively add entries to the bloom filter with low computation cost.

On Tue, Sep 23, 2014 at 2:34 AM, wrote:

> Ok, yes I think those would all be good things.
>
> One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That's easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require. That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data.
>
> We also need to state what the forwarding strategies are and what the cache behavior is.
>
> I outlined some of the points that I think are important in that other posting. While "discover latest" is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking. That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases.
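The "bloom filter validation" step Tai-Lin describes could look roughly like the following sketch. The bit-array size, hash count, and /mail/inbox names are arbitrary illustration values; a real deployment would tune the filter to a target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-bit array.
    Membership tests may yield false positives, never false negatives."""
    def __init__(self, m_bits: int = 1024, k_hashes: int = 4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, name: str):
        # Derive k independent probe positions by salting the hash input.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, name: str):
        for p in self._positions(name):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, name: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(name))

# The producer publishes a filter over the names that actually exist...
bf = BloomFilter()
for i in (146, 147, 148):
    bf.add(f"/mail/inbox/{i}")

# ...and a consumer screens its generated guesses before expressing
# Interests, so most nonexistent names never hit the network.
guesses = [f"/mail/inbox/{i}" for i in range(140, 150)]
to_fetch = [n for n in guesses if bf.maybe_contains(n)]
assert all(f"/mail/inbox/{i}" in to_fetch for i in (146, 147, 148))
```

Entries can be added incrementally (just more bit-sets), which matches the "progressively add entries with low computation cost" point; the trade-off is that plain Bloom filters do not support deletion.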
> Marc
>
> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:
>
> Hi Marc,
>
> Thanks, yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery?
>
> Thanks,
> Jeff
>
> From:
> Date: Mon, 22 Sep 2014 22:29:43 +0000
> To: Jeff Burke
> Cc: ,
> Subject: Re: [Ndn-interest] any comments on naming convention?
>
> Jeff,
>
> Take a look at my posting (that Felix fixed) in a new thread on Discovery.
>
> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>
> I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage.
>
> Marc
>
> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:
>
> Marc,
>
> If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery?
>
> Jeff
>
> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:
>
> No matter what the expressiveness of the predicates, if the forwarder can send interests different ways, you don't have a consistent underlying set to talk about, so you would always need non-range exclusions to discover every version.
>
> Range exclusions only work, I believe, if you get an authoritative answer. If different content pieces are scattered between different caches, I don't see how range exclusions would work to discover every version.
>
> I'm sorry to be pointing out problems without offering solutions, but we're not ready to publish our discovery protocols.
>
> Sent from my telephone
>
> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>
> I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc would be better.
>
> My unserious conjecture (:)): exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular languages or context-free languages might become part of selectors too.
>
> On Sat, Sep 20, 2014 at 11:25 PM, wrote:
>
> That will get you one reading, then you need to exclude it and ask again.
>
> Sent from my telephone
>
> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:
>
> > Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object.
>
> I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes.
>
> [1] http://named-data.net/doc/ndn-tlv/interest.html#exclude
>
> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>
> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>
> > If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions.
>
> Could you explain why the missing-content-object situation happens? Also, a range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to a ranged exclude is always possible.
>
> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second, you will have 86,400 of them per day.
If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day.
>
> Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions.
>
> > You exclude through 100 then issue a new interest. This goes to cache B
>
> I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists?
>
> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101.
>
> c,d In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead.
>
> However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm is unfair.
>
> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that.
>
> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>
> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too.
>
> This is probably getting off-topic from the original post about naming conventions.
>
> a.
If Interests can be forwarded multiple directions and two > different caches are responding, the exclusion set you build up > talking with cache A will be invalid for cache B. If you talk > sometimes to A and sometimes to B, you very easily could miss > content objects you want to discovery unless you avoid all range > exclusions and only exclude explicit versions. That will lead to > very large interest packets. In ccnx 1.0, we believe that an > explicit discovery protocol that allows conversations about > consistent sets is better. > > b. Yes, if you just want the ?latest version? discovery that > should be transitive between caches, but imagine this. You send > Interest #1 to cache A which returns version 100. You exclude > through 100 then issue a new interest. This goes to cache B who > only has version 99, so the interest times out or is NACK?d. So > you think you have it! But, cache A already has version 101, you > just don?t know. If you cannot have a conversation around > consistent sets, it seems like even doing latest version discovery > is difficult with selector based discovery. From what I saw in > ccnx 0.x, one ended up getting an Interest all the way to the > authoritative source because you can never believe an intermediate > cache that there?s not something more recent. > > I?m sure you?ve walked through cases (a) and (b) in ndn, I?d be > interest in seeing your analysis. Case (a) is that a node can > correctly discover every version of a name prefix, and (b) is that > a node can correctly discover the latest version. We have not > formally compared (or yet published) our discovery protocols (we > have three, 2 for content, 1 for device) compared to selector based > discovery, so I cannot yet claim they are better, but they do not > have the non-determinism sketched above. > > c. Using LPM, there is a non-deterministic number of lookups you > must do in the PIT to match a content object. 
If you have a name > tree or a threaded hash table, those don?t all need to be hash > lookups, but you need to walk up the name tree for every prefix of > the content object name and evaluate the selector predicate. > Content Based Networking (CBN) had some some methods to create data > structures based on predicates, maybe those would be better. But > in any case, you will potentially need to retrieve many PIT entries > if there is Interest traffic for many prefixes of a root. Even on > an Intel system, you?ll likely miss cache lines, so you?ll have a > lot of NUMA access for each one. In CCNx 1.0, even a naive > implementation only requires at most 3 lookups (one by name, one by > name + keyid, one by name + content object hash), and one can do > other things to optimize lookup for an extra write. > > d. In (c) above, if you have a threaded name tree or are just > walking parent pointers, I suspect you?ll need locking of the > ancestors in a multi-threaded system (?threaded" here meaning LWP) > and that will be expensive. It would be interesting to see what a > cache consistent multi-threaded name tree looks like. > > Marc > > > On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu > wrote: > > I had thought about these questions, but I want to know your idea > besides typed component: > 1. LPM allows "data discovery". How will exact match do similar > things? > 2. will removing selectors improve performance? How do we use > other > faster technique to replace selector? > 3. fixed byte length and type. I agree more that type can be fixed > byte, but 2 bytes for length might not be enough for future. > > > On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) > wrote: > > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu > wrote: > > I know how to make #2 flexible enough to do what things I can > envision we need to do, and with a few simple conventions on > how the registry of types is managed. > > > Could you share it with us? > > Sure. Here?s a strawman. 
> > The type space is 16 bits, so you have 65,565 types. > > The type space is currently shared with the types used for the > entire protocol, that gives us two options: > (1) we reserve a range for name component types. Given the > likelihood there will be at least as much and probably more need > to component types than protocol extensions, we could reserve 1/2 > of the type space, giving us 32K types for name components. > (2) since there is no parsing ambiguity between name components > and other fields of the protocol (sine they are sub-types of the > name type) we could reuse numbers and thereby have an entire 65K > name component types. > > We divide the type space into regions, and manage it with a > registry. If we ever get to the point of creating an IETF > standard, IANA has 25 years of experience running registries and > there are well-understood rule sets for different kinds of > registries (open, requires a written spec, requires standards > approval). > > - We allocate one ?default" name component type for ?generic > name?, which would be used on name prefixes and other common > cases where there are no special semantics on the name component. > - We allocate a range of name component types, say 1024, to > globally understood types that are part of the base or extension > NDN specifications (e.g. chunk#, version#, etc. > - We reserve some portion of the space for unanticipated uses > (say another 1024 types) > - We give the rest of the space to application assignment. > > Make sense? > > > While I?m sympathetic to that view, there are three ways in > which Moore?s law or hardware tricks will not save us from > performance flaws in the design > > > we could design for performance, > > That?s not what people are advocating. We are advocating that we > *not* design for known bad performance and hope serendipity or > Moore?s Law will come to the rescue. > > but I think there will be a turning > point when the slower design starts to become "fast enough?. 
> > Perhaps, perhaps not. Relative performance is what matters so > things that don?t get faster while others do tend to get dropped > or not used because they impose a performance penalty relative to > the things that go faster. There is also the ?low-end? phenomenon > where impovements in technology get applied to lowering cost > rather than improving performance. For those environments bad > performance just never get better. > > Do you > think there will be some design of ndn that will *never* have > performance improvement? > > I suspect LPM on data will always be slow (relative to the other > functions). > i suspect exclusions will always be slow because they will > require extra memory references. > > However I of course don?t claim to clairvoyance so this is just > speculation based on 35+ years of seeing performance improve by 4 > orders of magnitude and still having to worry about counting > cycles and memory references? > > On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) > wrote: > > On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu > wrote: > > We should not look at a certain chip nowadays and want ndn to > perform > well on it. It should be the other way around: once ndn app > becomes > popular, a better chip will be designed for ndn. > > While I?m sympathetic to that view, there are three ways in > which Moore?s law or hardware tricks will not save us from > performance flaws in the design: > a) clock rates are not getting (much) faster > b) memory accesses are getting (relatively) more expensive > c) data structures that require locks to manipulate > successfully will be relatively more expensive, even with > near-zero lock contention. > > The fact is, IP *did* have some serious performance flaws in > its design. We just forgot those because the design elements > that depended on those mistakes have fallen into disuse. The > poster children for this are: > 1. IP options. 
Nobody can use them because they are too slow > on modern forwarding hardware, so they can?t be reliably used > anywhere > 2. the UDP checksum, which was a bad design when it was > specified and is now a giant PITA that still causes major pain > in working around. > > I?m afraid students today are being taught the that designers > of IP were flawless, as opposed to very good scientists and > engineers that got most of it right. > > I feel the discussion today and yesterday has been off-topic. > Now I > see that there are 3 approaches: > 1. we should not define a naming convention at all > 2. typed component: use tlv type space and add a handful of > types > 3. marked component: introduce only one more type and add > additional > marker space > > I know how to make #2 flexible enough to do what things I can > envision we need to do, and with a few simple conventions on > how the registry of types is managed. > > It is just as powerful in practice as either throwing up our > hands and letting applications design their own mutually > incompatible schemes or trying to make naming conventions with > markers in a way that is fast to generate/parse and also > resilient against aliasing. > > Also everybody thinks that the current utf8 marker naming > convention > needs to be revised. > > > > On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe > wrote: > Would that chip be suitable, i.e. can we expect most names > to fit in (the > magnitude of) 96 bytes? What length are names usually in > current NDN > experiments? > > I guess wide deployment could make for even longer names. > Related: Many URLs > I encounter nowadays easily don't fit within two 80-column > text lines, and > NDN will have to carry more information than URLs, as far as > I see. > > > On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: > > In fact, the index in separate TLV will be slower on some > architectures, > like the ezChip NP4. 
The NP4 can hold the fist 96 frame > bytes in memory, > then any subsequent memory is accessed only as two adjacent > 32-byte blocks > (there can be at most 5 blocks available at any one time). > If you need to > switch between arrays, it would be very expensive. If you > have to read past > the name to get to the 2nd array, then read it, then backup > to get to the > name, it will be pretty expensive too. > > Marc > > On Sep 18, 2014, at 2:02 PM, > wrote: > > Does this make that much difference? > > If you want to parse the first 5 components. One way to do > it is: > > Read the index, find entry 5, then read in that many bytes > from the start > offset of the beginning of the name. > OR > Start reading name, (find size + move ) 5 times. > > How much speed are you getting from one to the other? You > seem to imply > that the first one is faster. I don?t think this is the > case. > > In the first one you?ll probably have to get the cache line > for the index, > then all the required cache lines for the first 5 > components. For the > second, you?ll have to get all the cache lines for the first > 5 components. > Given an assumption that a cache miss is way more expensive > than > evaluating a number and computing an addition, you might > find that the > performance of the index is actually slower than the > performance of the > direct access. > > Granted, there is a case where you don?t access the name at > all, for > example, if you just get the offsets and then send the > offsets as > parameters to another processor/GPU/NPU/etc. In this case > you may see a > gain IF there are more cache line misses in reading the name > than in > reading the index. So, if the regular part of the name > that you?re > parsing is bigger than the cache line (64 bytes?) and the > name is to be > processed by a different processor, then your might see some > performance > gain in using the index, but in all other circumstances I > bet this is not > the case. 
I may be wrong, haven't actually tested it. > > This is all to say, I don't think we should be designing the > protocol with > only one architecture in mind. (The architecture of sending > the name to a > different processor than the index). > > If you have numbers that show that the index is faster I > would like to see > under what conditions and architectural assumptions. > > Nacho > > (I may have misinterpreted your description so feel free to > correct me if > I'm wrong.) > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/18/14, 12:54 AM, "Massimo Gallo" > > wrote: > > Indeed each component's offset must be encoded using a fixed > amount of > bytes: > > i.e., > Type = Offsets > Length = 10 Bytes > Value = Offset1(1byte), Offset2(1byte), ... > > You may also imagine having an "Offset_2byte" type if your > name is too > long. > > Max > > On 18/09/2014 09:27, Tai-Lin Chu wrote: > > if you do not need the entire hierarchical structure (suppose > you only > want the first x components) you can directly have it using > the > offsets. With the Nested TLV structure you have to > iteratively parse > the first x-1 components. With the offset structure you can > directly > access the first x components. > > I don't get it. What you described only works if the > "offset" is > encoded in fixed bytes. With varNum, you will still need to > parse x-1 > offsets to get to the x-th offset. > > > > On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > wrote: > > On 17/09/2014 14:56, Mark Stapp wrote: > > ah, thanks - that's helpful. I thought you were saying "I > like the > existing NDN UTF8 'convention'." I'm still not sure I > understand what > you > _do_ prefer, though. it sounds like you're describing an > entirely > different > scheme where the info that describes the name-components is > ... > someplace > other than _in_ the name-components. is that correct?
when > you say > "field > separator", what do you mean (since that's not a "TL" from a > TLV)? > > Correct. > In particular, with our name encoding, a TLV indicates the > name > hierarchy > with offsets in the name, and other TLV(s) indicate the > offset to use > in > order to retrieve special components. > As for the field separator, it is something like "/". > Aliasing is > avoided as > you do not rely on field separators to parse the name; you > use the > "offset > TLV" to do that. > > So now, it may be an aesthetic question but: > > if you do not need the entire hierarchical structure (suppose > you only > want > the first x components) you can directly have it using the > offsets. > With the > Nested TLV structure you have to iteratively parse the first > x-1 > components. > With the offset structure you can directly access the > first x > components. > > Max > > > -- Mark > > On 9/17/14 6:02 AM, Massimo Gallo wrote: > > The why is simple: > > You use a lot of "generic component type" and very few > "specific > component type". You are imposing types for every component > in order > to > handle a few exceptions (segmentation, etc.). You create a > rule > (specify > the component's type) to handle exceptions! > > I would prefer not to have typed components. Instead I would > prefer > to > have the name as a simple sequence of bytes with a field > separator. Then, > outside the name, if you have some components that could be > used at > the network layer (e.g. a TLV field), you simply need something > that > indicates which is the offset allowing you to retrieve the > version, > segment, etc. in the name... > > > Max > > > > > > On 16/09/2014 20:33, Mark Stapp wrote: > > On 9/16/14 10:29 AM, Massimo Gallo wrote: > > I think we agree on the small number of "component types". > However, if you have a small number of types, you will end > up with > names > containing many generic component types and few specific > component > types.
Due to the fact that the component type specification > is an > exception in the name, I would prefer something that specifies > the component's > type only when needed (something like UTF8 conventions but > that > applications MUST use). > > so ... I can't quite follow that. the thread has had some > explanation > about why the UTF8 requirement has problems (with aliasing, > e.g.) > and > there's been email trying to explain that applications don't > have to > use types if they don't need to. your email sounds like "I > prefer > the > UTF8 convention", but it doesn't say why you have that > preference in > the face of the points about the problems. can you say why > it is > that > you express a preference for the "convention" with problems? > > Thanks, > Mark > > . > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu >
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > From Marc.Mosko at parc.com Wed Sep 24 00:37:14 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Wed, 24 Sep 2014 07:37:14 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> Message-ID: <65E2A60C-5494-4991-B78D-7164B1277176@parc.com> Ok, let's take that example and run with it a bit. I'll walk through a "discover all" example. This example leads me to why I say discovery should be separate from data retrieval. I don't claim that we have a final solution to this problem; I think in a distributed peer-to-peer environment solving this problem is difficult. If you have a counterexample as to how this discovery could progress using only the information known a priori by the requester, I would be interested in seeing that example worked out. Please do correct me if you think this is wrong. You have mails that were originally numbered 0 - 10000, sequentially by the server. You travel between several places and access different emails from different places. This populates caches. Let's say 0, 3, 6, 9, ... are on cache A, 1, 4, 7, 10, ... are on cache B, and 2, 5, 8, 11, ... are on cache C. Also, you have deleted 500 random emails, so there's only 9500 emails actually out there. You set up a new computer and now want to download all your emails. The new computer is on the path of caches C, B, then A, then the authoritative source server. The new email program has no initial state. The email program only knows that the email number is an integer that starts at 0. It issues an interest for /mail/inbox, and asks for left-most child because it wants to populate in order. It gets a response from cache C with mail 2. Now, what does the email program do?
It cannot exclude the range 0..2 because that would possibly miss 0 and 1. So, all it can do is exclude the exact number "2" and ask again. It then gets cache C again and it responds with "5". There are about 3000 emails on cache C, and if they all take 4 bytes (for the exclude component plus its coding overhead), then that's 12KB of exclusions to finally exhaust cache C. If we want Interests to avoid fragmentation, we can fit about 1200 bytes of exclusions, or 300 components. This means we need about 10 interest messages. Each interest would be something like "exclude 2, 5, 8, 11, ..., >300", then the next would be "exclude <300, 302, 305, 308, ..., >600", etc. Those interests that exclude everything at cache C would then hit, say, cache B and start getting results 1, 4, 7, .... This means an Interest like "exclude 2, 5, 8, 11, ..., >300" would then get back number 1. That means the next request actually has to split that one interest's exclude into two (because the interest was at maximum size), so you now issue two interests where one is "exclude 1, 2, 5, 8, >210" and the other is "<210, 212, 215, ..., >300". If you look in the CCNx 0.8 java code, there should be a class that does these Interest-based discoveries and does the Interest splitting based on the currently known range of discovered content. I don't have the specific reference right now, but I can send a link if you are interested in seeing that. The java class keeps state of what has been discovered so far, so it could restart later if interrupted. So all those interests would now be getting results from cache B. You would then start to split all those ranges to accommodate the numbers coming back from B. Eventually, you'll have at least 10 Interest messages outstanding that would be excluding all the 9500 messages that are in caches A, B, and C. Some of those interest messages might actually reach an authoritative server, which might respond too.
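The Interest-splitting step described above (a full exclude list must absorb a newly discovered number, so one Interest becomes two over disjoint ranges) can be modeled in a small sketch. This is a toy version: the real CCNx 0.8 class also tracks wire sizes and open-ended range bounds like ">300"; here capacity is simply counted in excluded components and the split point is the median.

```python
def split_exclude_set(excluded, new_number, max_components):
    """Absorb new_number into an exclude set; if the result no longer fits
    in one Interest (max_components), split into two disjoint ranges."""
    items = sorted(set(excluded) | {new_number})
    if len(items) <= max_components:
        return [items]            # still fits in a single Interest
    mid = items[len(items) // 2]  # split roughly in half at the median
    return [[i for i in items if i < mid],
            [i for i in items if i >= mid]]
```

For example, a maximum-size Interest excluding 2, 5, 8, ... (300 components) that then discovers number 1 splits into two half-range Interests, mirroring the "exclude 1, 2, 5, 8, >210" / "<210, 212, 215, ..." pair in the walkthrough.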
It would likely be more than 10 interests due to the algorithm that's used to split full interests, which likely is not optimal because it does not know exactly where breaks should be a priori. Once you have exhausted caches A, B, and C, the interest messages would reach the authoritative source (if it's online), and it would be issuing NACKs (I assume) for interests that have excluded all non-deleted emails. In any case, it takes, at best, 9,500 round trips to "discover" all 9500 emails. It also required Sum_{i=1..10000} 4*i = 200,020,000 bytes of Interest exclusions. Note that it's an arithmetic sum of bytes of exclusion, because at each Interest the size of the exclusions increases by 4. There was an NDN paper about light bulb discovery (or something like that) that noted this same problem and proposed some workaround, but I don't remember what they proposed. Yes, you could possibly pipeline it, but what would you do? In this example, knowing that emails 0 - 10000 (minus some random ones) exist would allow you, if you knew that a priori, to issue say 10 interests in parallel that ask for different ranges. But, 2 years from now your undeleted emails might range from 100,000 - 150,000. The point is that a discovery protocol does not know, a priori, what is to be discovered. It might start learning some stuff as it goes on. If you could have retrieved just a table of contents from each cache, where each "row" is say 64 bytes (i.e. the name continuation plus hash value), you would need to retrieve 3300 * 64 = 211KB from each cache (total 640 KB) to list all the emails. That would take 640KB / 1200 = 534 interest messages of say 64 bytes = 34 KB to discover all 9500 emails, plus another set to fetch the header rows. That's, say, 68 KB of interest traffic compared to 200 MB. Now, I've not said how to list these tables of contents, so an actual protocol might have higher communication cost, but even if it was 10x worse that would still be an attractive tradeoff.
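Marc's back-of-envelope numbers check out; a short script reproduces the arithmetic, with the constants taken from the example above (the email rounds 3 * 211 KB up to "640 KB" and 640 KB / 1200 to "534" Interests):

```python
# Exclusion-based discovery: the i-th Interest carries ~4*i bytes of
# exclusions, so total exclusion bytes form an arithmetic series.
exclusion_bytes = sum(4 * i for i in range(1, 10001))
print(exclusion_bytes)               # 200,020,000 bytes, i.e. ~200 MB

# Table-of-contents alternative: ~3300 rows of 64 bytes per cache, 3 caches.
toc_bytes_per_cache = 3300 * 64      # 211,200 bytes, ~211 KB
toc_total = 3 * toc_bytes_per_cache  # ~633 KB ("total 640 KB" in the email)
interests_needed = -(-toc_total // 1200)   # ceil-divide by 1200-byte payloads
interest_traffic = interests_needed * 64   # ~34 KB of Interest traffic
print(toc_bytes_per_cache, interests_needed, interest_traffic)
```

The contrast the email draws is between ~200 MB of cumulative exclusions versus tens of KB of Interest traffic for the table-of-contents approach, roughly four orders of magnitude.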
This assumes that you publish just the "header" in the 1st segment (say 1 KB total object size including the signatures). That's 10 MB to learn the headers. You could also argue that the distribution of emails over caches is arbitrary. That's true, I picked a difficult sequence. But unless you have some positive controls on what could be in a cache, it could be any difficult sequence. I also did not address the timeout issue, and how do you know you are done? This is also why sync works so much better than doing raw interest discovery. Sync exchanges tables of contents and diffs; it does not need to enumerate by exclusion everything to retrieve. Marc On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote: > discovery can be reduced to "pattern detection" (can we infer what > exists?) and "pattern validation" (can we confirm this guess?) > > For example, I see a pattern /mail/inbox/148. I, a human being, see a > pattern with static (/mail/inbox) and variable (148) components; with > proper naming convention, computers can also detect this pattern > easily. Now I want to look for all mails in my inbox. I can generate a > list of /mail/inbox/. These are my guesses, and with selectors > I can further refine my guesses. > > To validate them, a bloom filter can provide "best effort" > discovery (with some false positives, so I call it "best-effort") > before I stupidly send all the interests to the network. > > The discovery protocol, as I described above, is essentially "pattern > detection by naming convention" and "bloom filter validation." This is > definitely one of the "simpler" discovery protocols, because the data > producer only needs to add an additional bloom filter. Notice that we can > progressively add entries to the bfilter with low computation cost. > > On Tue, Sep 23, 2014 at 2:34 AM, wrote: >> Ok, yes I think those would all be good things.
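Tai-Lin's "pattern validation" step above - check guessed names against a producer-supplied bloom filter before issuing Interests for them - might look like the following minimal sketch. The filter parameters, hashing scheme, and the /mail/inbox/<n> name pattern are illustrative choices, not anything specified in the thread.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions per item over a fixed bit array.
    No false negatives; a small, tunable rate of false positives."""

    def __init__(self, size_bits=8192, hashes=4):
        self.size, self.k = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k positions by salting a cryptographic hash (toy choice).
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A consumer would generate candidate names from the detected pattern and only send Interests for names the filter admits, which is exactly the "best effort" pre-filtering described: published names always pass, while most guesses for non-existent names are dropped locally, at the cost of occasional false positives. Progressive insertion is cheap, matching the note that entries can be added incrementally.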
>> >> One thing to keep in mind, especially with things like time series sensor >> data, is that people see a pattern and infer a way of doing it. That's easy >> for a human :) But in Discovery, one should assume that one does not know >> of patterns in the data beyond what the protocols used to publish the data >> explicitly require. That said, I think some of the things you listed are >> good places to start: sensor data, web content, climate data or genome data. >> >> We also need to state what the forwarding strategies are and what the cache >> behavior is. >> >> I outlined some of the points that I think are important in that other >> posting. While "discover latest" is useful, "discover all" is also >> important, and that one gets complicated fast. So points like separating >> discovery from retrieval and working with large data sets have been >> important in shaping our thinking. That all said, I'd be happy starting >> from 0 and working through the Discovery service definition from scratch >> along with data set use cases. >> >> Marc >> >> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: >> >> Hi Marc, >> >> Thanks - yes, I saw that as well. I was just trying to get one step more >> specific, which was to see if we could identify a few specific use cases >> around which to have the conversation. (e.g., time series sensor data and >> web content retrieval for "get latest"; climate data for huge data sets; >> local data in a vehicular network; etc.) What have you been looking at >> that's driving considerations of discovery? >> >> Thanks, >> Jeff >> >> From: >> Date: Mon, 22 Sep 2014 22:29:43 +0000 >> To: Jeff Burke >> Cc: , >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> Jeff, >> >> Take a look at my posting (that Felix fixed) in a new thread on Discovery.
>> >> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >> >> I think it would be very productive to talk about what Discovery should do, >> and not focus on the how. It is sometimes easy to get caught up in the how, >> which I think is a less important topic than the what at this stage. >> >> Marc >> >> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >> >> Marc, >> >> If you can't talk about your protocols, perhaps we can discuss this based >> on use cases. What are the use cases you are using to evaluate >> discovery? >> >> Jeff >> >> >> >> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >> >> No matter what the expressiveness of the predicates, if the forwarder can >> send interests different ways you don't have a consistent underlying set >> to talk about, so you would always need non-range exclusions to discover >> every version. >> >> Range exclusions only work, I believe, if you get an authoritative answer. >> If different content pieces are scattered between different caches I >> don't see how range exclusions would work to discover every version. >> >> I'm sorry to be pointing out problems without offering solutions but >> we're not ready to publish our discovery protocols. >> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >> >> I see. Can you briefly describe how the ccnx discovery protocol solves >> all the problems that you mentioned (not just exclude)? A doc would be >> better. >> >> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >> expect [and] and [or], so boolean algebra is fully supported. Regular >> language or context-free language might become part of selector too. >> >> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >> That will get you one reading, then you need to exclude it and ask >> again.
>> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >> >> Yes, my point was that if you cannot talk about a consistent set >> with a particular cache, then you need to always use individual >> excludes, not range excludes, if you want to discover all the versions >> of an object. >> >> >> I am very confused. For your example, if I want to get all today's >> sensor data, I just do (Any..Last second of last day)(First second of >> tomorrow..Any). That's 18 bytes. >> >> >> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >> >> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >> >> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >> >> If you talk sometimes to A and sometimes to B, you very easily >> could miss content objects you want to discover unless you avoid >> all range exclusions and only exclude explicit versions. >> >> >> Could you explain why the missing content object situation happens? Also, >> range exclusion is just a shorter notation for many explicit >> excludes; >> converting from explicit excludes to ranged excludes is always >> possible. >> >> >> Yes, my point was that if you cannot talk about a consistent set >> with a particular cache, then you need to always use individual >> excludes, not range excludes, if you want to discover all the versions >> of an object. For something like a sensor reading that is updated, >> say, once per second, you will have 86,400 of them per day. If each >> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >> exclusions (plus encoding overhead) per day. >> >> yes, maybe using a more deterministic version number than a >> timestamp makes sense here, but it's just an example of needing a lot >> of exclusions. >> >> >> You exclude through 100 then issue a new interest. This goes to >> cache B >> >> >> I feel this case is invalid because cache A will also get the >> interest, and cache A will return v101 if it exists.
Like you said, >> if >> this goes to cache B only, it means that cache A dies. How do you >> know >> that v101 even exists? >> >> >> I guess this depends on what the forwarding strategy is. If the >> forwarder will always send each interest to all replicas, then yes, >> modulo packet loss, you would discover v101 on cache A. If the >> forwarder is just doing "best path" and can round-robin between cache >> A and cache B, then your application could miss v101. >> >> >> >> c,d In general I agree that LPM performance is related to the number >> of components. In my own thread-safe LPM implementation, I used only >> one RWMutex for the whole tree. I don't know whether adding a lock for >> every node will be faster or not because of lock overhead. >> >> However, we should compare (exact match + discovery protocol) vs >> (ndn >> lpm). Comparing performance of exact match to lpm is unfair. >> >> >> Yes, we should compare them. And we need to publish the ccnx 1.0 >> specs for doing the exact match discovery. So, as I said, I'm not >> ready to claim it's better yet because we have not done that. >> >> >> >> >> >> >> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >> I would point out that using LPM on content object to Interest >> matching to do discovery has its own set of problems. Discovery >> involves more than just "latest version" discovery too. >> >> This is probably getting off-topic from the original post about >> naming conventions. >> >> a. If Interests can be forwarded in multiple directions and two >> different caches are responding, the exclusion set you build up >> talking with cache A will be invalid for cache B. If you talk >> sometimes to A and sometimes to B, you very easily could miss >> content objects you want to discover unless you avoid all range >> exclusions and only exclude explicit versions. That will lead to >> very large interest packets.
In ccnx 1.0, we believe that an >> explicit discovery protocol that allows conversations about >> consistent sets is better. >> >> b. Yes, if you just want the "latest version" discovery that >> should be transitive between caches, but imagine this. You send >> Interest #1 to cache A which returns version 100. You exclude >> through 100, then issue a new interest. This goes to cache B, which >> only has version 99, so the interest times out or is NACK'd. So >> you think you have it! But, cache A already has version 101, you >> just don't know. If you cannot have a conversation around >> consistent sets, it seems like even doing latest version discovery >> is difficult with selector-based discovery. From what I saw in >> ccnx 0.x, one ended up getting an Interest all the way to the >> authoritative source because you can never believe an intermediate >> cache that there's not something more recent. >> >> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >> interested in seeing your analysis. Case (a) is that a node can >> correctly discover every version of a name prefix, and (b) is that >> a node can correctly discover the latest version. We have not >> formally compared (or yet published) our discovery protocols (we >> have three, 2 for content, 1 for device) compared to selector-based >> discovery, so I cannot yet claim they are better, but they do not >> have the non-determinism sketched above. >> >> c. Using LPM, there is a non-deterministic number of lookups you >> must do in the PIT to match a content object. If you have a name >> tree or a threaded hash table, those don't all need to be hash >> lookups, but you need to walk up the name tree for every prefix of >> the content object name and evaluate the selector predicate. >> Content Based Networking (CBN) had some methods to create data >> structures based on predicates, maybe those would be better.
But >> in any case, you will potentially need to retrieve many PIT entries >> if there is Interest traffic for many prefixes of a root. Even on >> an Intel system, you'll likely miss cache lines, so you'll have a >> lot of NUMA access for each one. In CCNx 1.0, even a naive >> implementation only requires at most 3 lookups (one by name, one by >> name + keyid, one by name + content object hash), and one can do >> other things to optimize lookup for an extra write. >> >> d. In (c) above, if you have a threaded name tree or are just >> walking parent pointers, I suspect you'll need locking of the >> ancestors in a multi-threaded system ("threaded" here meaning LWP) >> and that will be expensive. It would be interesting to see what a >> cache-consistent multi-threaded name tree looks like. >> >> Marc >> >> >> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >> wrote: >> >> I had thought about these questions, but I want to know your idea >> besides typed components: >> 1. LPM allows "data discovery". How will exact match do similar >> things? >> 2. Will removing selectors improve performance? How do we use >> other >> faster techniques to replace selectors? >> 3. fixed byte length and type. I agree more that type can be fixed >> byte, but 2 bytes for length might not be enough for the future. >> >> >> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >> wrote: >> >> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >> wrote: >> >> I know how to make #2 flexible enough to do the things I can >> envision we need to do, and with a few simple conventions on >> how the registry of types is managed. >> >> >> Could you share it with us? >> >> Sure. Here's a strawman. >> >> The type space is 16 bits, so you have 65,536 types. >> >> The type space is currently shared with the types used for the >> entire protocol, which gives us two options: >> (1) we reserve a range for name component types.
Given the >> likelihood there will be at least as much and probably more need >> for component types than protocol extensions, we could reserve 1/2 >> of the type space, giving us 32K types for name components. >> (2) since there is no parsing ambiguity between name components >> and other fields of the protocol (since they are sub-types of the >> name type) we could reuse numbers and thereby have an entire 65K >> name component types. >> >> We divide the type space into regions, and manage it with a >> registry. If we ever get to the point of creating an IETF >> standard, IANA has 25 years of experience running registries and >> there are well-understood rule sets for different kinds of >> registries (open, requires a written spec, requires standards >> approval). >> >> - We allocate one "default" name component type for "generic >> name", which would be used on name prefixes and other common >> cases where there are no special semantics on the name component. >> - We allocate a range of name component types, say 1024, to >> globally understood types that are part of the base or extension >> NDN specifications (e.g. chunk#, version#, etc.) >> - We reserve some portion of the space for unanticipated uses >> (say another 1024 types) >> - We give the rest of the space to application assignment. >> >> Make sense? >> >> >> While I'm sympathetic to that view, there are three ways in >> which Moore's law or hardware tricks will not save us from >> performance flaws in the design >> >> >> we could design for performance, >> >> That's not what people are advocating. We are advocating that we >> *not* design for known bad performance and hope serendipity or >> Moore's Law will come to the rescue. >> >> but I think there will be a turning >> point when the slower design starts to become "fast enough". >> >> Perhaps, perhaps not.
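The strawman partition above can be written down concretely. The exact boundary values below are illustrative choices made here; the strawman itself only fixes the sizes (one generic type, roughly 1024 globally specified types, another 1024 reserved, and the remainder for application assignment within a 16-bit space):

```python
# Hypothetical partition of the 16-bit name-component type space,
# following the sizes in the strawman; boundaries are assumptions.
GENERIC = 0x0001                         # the one "generic name" type
GLOBAL_RANGE = range(0x0002, 0x0402)     # 1024 globally specified types
RESERVED_RANGE = range(0x0402, 0x0802)   # 1024 reserved for future use
APP_RANGE = range(0x0802, 0x10000)       # remainder: application assignment

def classify(t: int) -> str:
    """Map a 16-bit name-component type code to its registry region."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("name component type must fit in 16 bits")
    if t == GENERIC:
        return "generic"
    if t in GLOBAL_RANGE:
        return "global"
    if t in RESERVED_RANGE:
        return "reserved"
    if t in APP_RANGE:
        return "application"
    return "unassigned"  # e.g. type 0
```

A registry in the IANA style would then attach an allocation policy to each region (open for the application range, spec-required for the global range, and so on), which is the management discipline the strawman points to.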
Relative performance is what matters, so >> things that don't get faster while others do tend to get dropped >> or not used because they impose a performance penalty relative to >> the things that go faster. There is also the "low-end" phenomenon >> where improvements in technology get applied to lowering cost >> rather than improving performance. For those environments bad >> performance just never gets better. >> >> Do you >> think there will be some design of ndn that will *never* have >> performance improvement? >> >> I suspect LPM on data will always be slow (relative to the other >> functions). >> I suspect exclusions will always be slow because they will >> require extra memory references. >> >> However I of course don't claim clairvoyance, so this is just >> speculation based on 35+ years of seeing performance improve by 4 >> orders of magnitude and still having to worry about counting >> cycles and memory references. >> >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >> wrote: >> >> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >> wrote: >> >> We should not look at a certain chip nowadays and want ndn to >> perform >> well on it. It should be the other way around: once ndn app >> becomes >> popular, a better chip will be designed for ndn. >> >> While I'm sympathetic to that view, there are three ways in >> which Moore's law or hardware tricks will not save us from >> performance flaws in the design: >> a) clock rates are not getting (much) faster >> b) memory accesses are getting (relatively) more expensive >> c) data structures that require locks to manipulate >> successfully will be relatively more expensive, even with >> near-zero lock contention. >> >> The fact is, IP *did* have some serious performance flaws in >> its design. We just forgot those because the design elements >> that depended on those mistakes have fallen into disuse. The >> poster children for this are: >> 1. IP options.
Nobody can use them because they are too slow >> on modern forwarding hardware, so they can?t be reliably used >> anywhere >> 2. the UDP checksum, which was a bad design when it was >> specified and is now a giant PITA that still causes major pain >> in working around. >> >> I?m afraid students today are being taught the that designers >> of IP were flawless, as opposed to very good scientists and >> engineers that got most of it right. >> >> I feel the discussion today and yesterday has been off-topic. >> Now I >> see that there are 3 approaches: >> 1. we should not define a naming convention at all >> 2. typed component: use tlv type space and add a handful of >> types >> 3. marked component: introduce only one more type and add >> additional >> marker space >> >> I know how to make #2 flexible enough to do what things I can >> envision we need to do, and with a few simple conventions on >> how the registry of types is managed. >> >> It is just as powerful in practice as either throwing up our >> hands and letting applications design their own mutually >> incompatible schemes or trying to make naming conventions with >> markers in a way that is fast to generate/parse and also >> resilient against aliasing. >> >> Also everybody thinks that the current utf8 marker naming >> convention >> needs to be revised. >> >> >> >> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >> wrote: >> Would that chip be suitable, i.e. can we expect most names >> to fit in (the >> magnitude of) 96 bytes? What length are names usually in >> current NDN >> experiments? >> >> I guess wide deployment could make for even longer names. >> Related: Many URLs >> I encounter nowadays easily don't fit within two 80-column >> text lines, and >> NDN will have to carry more information than URLs, as far as >> I see. >> >> >> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >> >> In fact, the index in separate TLV will be slower on some >> architectures, >> like the ezChip NP4. 
The NP4 can hold the fist 96 frame >> bytes in memory, >> then any subsequent memory is accessed only as two adjacent >> 32-byte blocks >> (there can be at most 5 blocks available at any one time). >> If you need to >> switch between arrays, it would be very expensive. If you >> have to read past >> the name to get to the 2nd array, then read it, then backup >> to get to the >> name, it will be pretty expensive too. >> >> Marc >> >> On Sep 18, 2014, at 2:02 PM, >> wrote: >> >> Does this make that much difference? >> >> If you want to parse the first 5 components. One way to do >> it is: >> >> Read the index, find entry 5, then read in that many bytes >> from the start >> offset of the beginning of the name. >> OR >> Start reading name, (find size + move ) 5 times. >> >> How much speed are you getting from one to the other? You >> seem to imply >> that the first one is faster. I don?t think this is the >> case. >> >> In the first one you?ll probably have to get the cache line >> for the index, >> then all the required cache lines for the first 5 >> components. For the >> second, you?ll have to get all the cache lines for the first >> 5 components. >> Given an assumption that a cache miss is way more expensive >> than >> evaluating a number and computing an addition, you might >> find that the >> performance of the index is actually slower than the >> performance of the >> direct access. >> >> Granted, there is a case where you don?t access the name at >> all, for >> example, if you just get the offsets and then send the >> offsets as >> parameters to another processor/GPU/NPU/etc. In this case >> you may see a >> gain IF there are more cache line misses in reading the name >> than in >> reading the index. So, if the regular part of the name >> that you?re >> parsing is bigger than the cache line (64 bytes?) 
and the >> name is to be >> processed by a different processor, then you might see some >> performance >> gain in using the index, but in all other circumstances I >> bet this is not >> the case. I may be wrong, haven't actually tested it. >> >> This is all to say, I don't think we should be designing the >> protocol with >> only one architecture in mind. (The architecture of sending >> the name to a >> different processor than the index). >> >> If you have numbers that show that the index is faster I >> would like to see >> under what conditions and architectural assumptions. >> >> Nacho >> >> (I may have misinterpreted your description so feel free to >> correct me if >> I'm wrong.) >> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> On 9/18/14, 12:54 AM, "Massimo Gallo" >> >> wrote: >> >> Indeed each component's offset must be encoded using a fixed >> amount of >> bytes: >> >> i.e., >> Type = Offsets >> Length = 10 Bytes >> Value = Offset1(1byte), Offset2(1byte), ... >> >> You may also imagine having an "Offset_2byte" type if your >> name is too >> long. >> >> Max >> >> On 18/09/2014 09:27, Tai-Lin Chu wrote: >> >> if you do not need the entire hierarchical structure (suppose >> you only >> want the first x components) you can directly have it using >> the >> offsets. With the Nested TLV structure you have to >> iteratively parse >> the first x-1 components. With the offset structure you can >> directly >> access the first x components. >> >> I don't get it. What you described only works if the >> "offset" is >> encoded in fixed bytes. With varNum, you will still need to >> parse x-1 >> offsets to get to the x offset. >> >> >> >> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> wrote: >> >> On 17/09/2014 14:56, Mark Stapp wrote: >> >> ah, thanks - that's helpful.
I thought you were saying "I >> like the >> existing NDN UTF8 'convention'." I'm still not sure I >> understand what >> you >> _do_ prefer, though. it sounds like you're describing an >> entirely >> different >> scheme where the info that describes the name-components is >> ... >> someplace >> other than _in_ the name-components. is that correct? when >> you say >> "field >> separator", what do you mean (since that's not a "TL" from a >> TLV)? >> >> Correct. >> In particular, with our name encoding, a TLV indicates the >> name >> hierarchy >> with offsets in the name and other TLV(s) indicate the >> offset to use >> in >> order to retrieve special components. >> As for the field separator, it is something like "/". >> Aliasing is >> avoided as >> you do not rely on field separators to parse the name; you >> use the >> "offset >> TLV" to do that. >> >> So now, it may be an aesthetic question but: >> >> if you do not need the entire hierarchical structure (suppose >> you only >> want >> the first x components) you can directly have it using the >> offsets. >> With the >> Nested TLV structure you have to iteratively parse the first >> x-1 >> components. >> With the offset structure you can directly access the >> first x >> components. >> >> Max >> >> >> -- Mark >> >> On 9/17/14 6:02 AM, Massimo Gallo wrote: >> >> The why is simple: >> >> You use a lot of "generic component type" and very few >> "specific >> component type". You are imposing types for every component >> in order >> to >> handle few exceptions (segmentation, etc..). You create a >> rule >> (specify >> the component's type) to handle exceptions! >> >> I would prefer not to have typed components. Instead I would >> prefer >> to >> have the name as a simple sequence of bytes with a field >> separator. Then, >> outside the name, if you have some components that could be >> used at the >> network layer (e.g.
a TLV field), you simply need something >> that >> indicates which offset allows you to retrieve the >> version, >> segment, etc. in the name... >> >> >> Max >> >> >> >> >> >> On 16/09/2014 20:33, Mark Stapp wrote: >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >> I think we agree on the small number of "component types". >> However, if you have a small number of types, you will end >> up with >> names >> containing many generic component types and few specific >> component >> types. Due to the fact that the component type specification >> is an >> exception in the name, I would prefer something that specifies a >> component's >> type only when needed (something like UTF8 conventions but >> that >> applications MUST use). >> >> so ... I can't quite follow that. the thread has had some >> explanation >> about why the UTF8 requirement has problems (with aliasing, >> e.g.) >> and >> there's been email trying to explain that applications don't >> have to >> use types if they don't need to. your email sounds like "I >> prefer >> the >> UTF8 convention", but it doesn't say why you have that >> preference in >> the face of the points about the problems. can you say why >> it is >> that >> you express a preference for the "convention" with problems? >> >> Thanks, >> Mark >> >> .
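The offset-table scheme debated in this exchange can be sketched in a few lines. This is a toy encoding for illustration only (flat name with '/' separators and a side table of fixed 1-byte offsets), not the actual NDN or CCNx wire format; it shows why fixed-width offsets give direct access to the first x components while a nested-TLV layout forces a sequential walk.

```python
# Toy "offset TLV" sketch from the thread (hypothetical encoding, not the
# real NDN/CCNx format): a flat name plus a fixed-width offset table, so
# the first x components become one table lookup and one slice.

def build_offset_table(name: bytes, sep: int = ord('/')) -> list[int]:
    """Record the end offset of each component. With 1-byte offsets this
    only works for names up to 255 bytes (the "Offset_2byte" case covers
    longer names)."""
    offsets = [i for i, b in enumerate(name) if b == sep]
    offsets.append(len(name))
    return offsets

def first_x_components(name: bytes, offsets: list[int], x: int) -> bytes:
    # Direct access: read one offset, take one slice.
    return name[:offsets[x - 1]]

def first_x_components_nested(name: bytes, x: int, sep: int = ord('/')) -> bytes:
    # Nested-TLV style: must scan the components one at a time.
    pos = 0
    for _ in range(x):
        end = name.find(bytes([sep]), pos)
        pos = (len(name) if end == -1 else end) + 1
    return name[:min(pos - 1, len(name))]

name = b"mail/inbox/148"          # example name, purely illustrative
table = build_offset_table(name)  # -> [4, 10, 14]
```

Both functions return the same prefix; the difference is that the offset-table version touches the name bytes only once, which is the memory-access argument made above.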
>> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Wed Sep 24 00:51:04 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Wed, 24 Sep 2014 07:51:04 +0000 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <65E2A60C-5494-4991-B78D-7164B1277176@parc.com> References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> <65E2A60C-5494-4991-B78D-7164B1277176@parc.com> Message-ID: Sorry for an immediate follow-up, but I also wanted to address compressing the ranges once you have a dense set. Yes, once you have discovered 0, 1, 2, 3, 4, 5, 6 you could exclude through 6 as one expression. But with random deletions and the difficult order of cache responses, such compression might not be possible in many cases. In a worse distribution of emails, you could say that instead of random deletions every 4th email was deleted, or something like that. Marc On Sep 24, 2014, at 9:37 AM, wrote: > Ok, let's take that example and run with it a bit. I'll walk through a "discover all" example. This example leads me to why I say discovery should be separate from data retrieval. I don't claim that we have a final solution to this problem; I think that in a distributed peer-to-peer environment solving this problem is difficult. If you have a counterexample as to how this discovery could progress using only the information known a priori by the requester, I would be interested in seeing that example worked out. Please do correct me if you think this is wrong. > > You have mails that were originally numbered 0 - 10000, sequentially by the server. > > You travel between several places and access different emails from different places. This populates caches. Let's say 0,3,6,9,... are on cache A, 1,4,7,10,... are on cache B, and 2,5,8,11... are on cache C. Also, you have deleted 500 random emails, so there are only 9500 emails actually out there. > > You set up a new computer and now want to download all your emails. The new computer is on the path of caches C, B, then A, then the authoritative source server. The new email program has no initial state. The email program only knows that the email number is an integer that starts at 0.
It issues an interest for /mail/inbox, and asks for left-most child because it wants to populate in order. It gets a response from cache C with mail 2. > > Now, what does the email program do? It cannot exclude the range 0..2 because that would possibly miss 0 and 1. So, all it can do is exclude the exact number "2" and ask again. It then gets cache C again and it responds with "5". There are about 3000 emails on cache C, and if they all take 4 bytes (for the exclude component plus its coding overhead), then that's 12KB of exclusions to finally exhaust cache C. > > If we want Interests to avoid fragmentation, we can fit about 1200 bytes of exclusions, or 300 components. This means we need about 10 interest messages. Each interest would be something like "exclude 2,5,8,11,..., >300", then the next would be "exclude <300, 302, 305, 308, ..., >600", etc. > > Those interests that exclude everything at cache C would then hit, say, cache B and start getting results 1, 4, 7, .... This means an Interest like "exclude 2,5,8,11,..., >300" would then get back number 1. That means the next request actually has to split that one interest's exclude into two (because the interest was at maximum size), so you now issue two interests where one is "exclude 1, 2, 5, 8, >210" and the other is "<210, 212, 215, ..., >300". > > If you look in the CCNx 0.8 java code, there should be a class that does these Interest-based discoveries and does the Interest splitting based on the currently known range of discovered content. I don't have the specific reference right now, but I can send a link if you are interested in seeing that. The java class keeps state of what has been discovered so far, so it could re-start later if interrupted. > > So all those interests would now be getting results from cache B. You would then start to split all those ranges to accommodate the numbers coming back from B.
Eventually, you'll have at least 10 Interest messages outstanding that would be excluding all the 9500 messages that are in caches A, B, and C. Some of those interest messages might actually reach an authoritative server, which might respond too. It would likely be more than 10 interests due to the algorithm that's used to split full interests, which likely is not optimal because it does not know exactly where the breaks should be a priori. > > Once you have exhausted caches A, B, and C, the interest messages would reach the authoritative source (if it's online), and it would be issuing NACKs (I assume) for interests that have excluded all non-deleted emails. > > In any case, it takes, at best, 9,500 round trips to "discover" all 9500 emails. It also requires Sum_{i=1..10000} 4*i = 200,020,000 bytes of Interest exclusions. Note that it's an arithmetic sum of bytes of exclusion, because at each Interest the size of the exclusions increases by 4. There was an NDN paper about light bulb discovery (or something like that) that noted this same problem and proposed some workaround, but I don't remember what they proposed. > > Yes, you could possibly pipeline it, but what would you do? In this example, knowing that emails 0 - 10000 (minus some random ones) exist would allow you - if you knew that a priori - to issue say 10 interests in parallel that ask for different ranges. But, 2 years from now your undeleted emails might range from 100,000 - 150,000. The point is that a discovery protocol does not know, a priori, what is to be discovered. It might start learning some stuff as it goes on. > > If you could have retrieved just a table of contents from each cache, where each "row" is say 64 bytes (i.e. the name continuation plus hash value), you would need to retrieve 3300 * 64 = 211KB from each cache (total 640 KB) to list all the emails. That would take 640KB / 1200 = 534 interest messages of say 64 bytes = 34 KB to discover all 9500 emails plus another set to fetch the header rows.
That's, say, 68 KB of interest traffic compared to 200 MB. Now, I've not said how to list these tables of contents, so an actual protocol might have a higher communication cost, but even if it was 10x worse that would still be an attractive tradeoff. > > This assumes that you publish just the "header" in the 1st segment (say 1 KB total object size including the signatures). That's 10 MB to learn the headers. > > You could also argue that the distribution of emails over caches is arbitrary. That's true, I picked a difficult sequence. But unless you have some positive controls on what could be in a cache, it could be any difficult sequence. I also did not address the timeout issue, and how do you know you are done? > > This is also why sync works so much better than doing raw interest discovery. Sync exchanges tables of contents and diffs; it does not need to enumerate by exclusion everything to retrieve. > > Marc > > > > On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote: > >> discovery can be reduced to "pattern detection" (can we infer what >> exists?) and "pattern validation" (can we confirm this guess?) >> >> For example, I see a pattern /mail/inbox/148. I, a human being, see a >> pattern with static (/mail/inbox) and variable (148) components; with >> proper naming convention, computers can also detect this pattern >> easily. Now I want to look for all mails in my inbox. I can generate a >> list of /mail/inbox/. These are my guesses, and with selectors >> I can further refine my guesses. >> >> To validate them, a bloom filter can provide "best effort" >> discovery (with some false positives, so I call it "best-effort") >> before I stupidly send all the interests to the network. >> >> The discovery protocol, as I described above, is essentially "pattern >> detection by naming convention" and "bloom filter validation." This is >> definitely one of the "simpler" discovery protocols, because the data >> producer only needs to add an additional bloom filter.
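The "bloom filter validation" idea can be sketched in a few lines. The filter size, hash construction, and the /mail/inbox names below are arbitrary illustrative assumptions, not a proposal for any wire format; the sketch just shows the cheap per-entry add and the pre-filtering of guessed names (with the false-positive caveat the author notes).

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash positions per entry, no deletions.
    Parameters are illustrative, not tuned."""
    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item: bytes):
        # Progressive add: O(k) bit-sets per new entry.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Producer advertises which names exist; a consumer pre-filters its
# pattern-generated guesses before sending any Interests.
bf = BloomFilter()
for n in (2, 5, 8, 148):
    bf.add(b"/mail/inbox/%d" % n)
guesses = [n for n in range(200) if bf.might_contain(b"/mail/inbox/%d" % n)]
```

Every published name survives the filter; a few extra guesses may survive too, which is why this is only "best-effort" discovery and real Interests are still needed to confirm.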
Notice that we can >> progressively add entries to the bfilter with low computation cost. >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>> Ok, yes I think those would all be good things. >>> >>> One thing to keep in mind, especially with things like time series sensor >>> data, is that people see a pattern and infer a way of doing it. That's easy >>> for a human :) But in Discovery, one should assume that one does not know >>> of patterns in the data beyond what the protocols used to publish the data >>> explicitly require. That said, I think some of the things you listed are >>> good places to start: sensor data, web content, climate data or genome data. >>> >>> We also need to state what the forwarding strategies are and what the cache >>> behavior is. >>> >>> I outlined some of the points that I think are important in that other >>> posting. While "discover latest" is useful, "discover all" is also >>> important, and that one gets complicated fast. So points like separating >>> discovery from retrieval and working with large data sets have been >>> important in shaping our thinking. That all said, I'd be happy starting >>> from 0 and working through the Discovery service definition from scratch >>> along with data set use cases. >>> >>> Marc >>> >>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: >>> >>> Hi Marc, >>> >>> Thanks - yes, I saw that as well. I was just trying to get one step more >>> specific, which was to see if we could identify a few specific use cases >>> around which to have the conversation. (e.g., time series sensor data and >>> web content retrieval for "get latest"; climate data for huge data sets; >>> local data in a vehicular network; etc.) What have you been looking at >>> that's driving considerations of discovery? >>> >>> Thanks, >>> Jeff >>> >>> From: >>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>> To: Jeff Burke >>> Cc: , >>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>> >>> Jeff, >>> >>> Take a look at my posting (that Felix fixed) in a new thread on Discovery. >>> >>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>> >>> I think it would be very productive to talk about what Discovery should do, >>> and not focus on the how. It is sometimes easy to get caught up in the how, >>> which I think is a less important topic than the what at this stage. >>> >>> Marc >>> >>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >>> >>> Marc, >>> >>> If you can't talk about your protocols, perhaps we can discuss this based >>> on use cases. What are the use cases you are using to evaluate >>> discovery? >>> >>> Jeff >>> >>> >>> >>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >>> >>> No matter what the expressiveness of the predicates, if the forwarder can >>> send interests different ways you don't have a consistent underlying set >>> to talk about, so you would always need non-range exclusions to discover >>> every version. >>> >>> Range exclusions only work, I believe, if you get an authoritative answer. >>> If different content pieces are scattered between different caches I >>> don't see how range exclusions would work to discover every version. >>> >>> I'm sorry to be pointing out problems without offering solutions, but >>> we're not ready to publish our discovery protocols. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how the ccnx discovery protocol solves all >>> the problems that you mentioned (not just exclude)? A doc would be >>> better. >>> >>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >>> expect [and] and [or], so boolean algebra is fully supported. Regular >>> language or context-free language might become part of selectors too. >>> >>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>> That will get you one reading, then you need to exclude it and ask >>> again.
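The exclude-and-re-ask loop described here can be simulated to make the cost growth concrete. This is a toy model, not a real forwarder: one cache, each Interest carries the full exclude set (4 bytes per exclusion, as in the email example) and retrieves exactly one not-yet-seen item.

```python
def discover_by_exclusion(cache: set[int], bytes_per_exclusion: int = 4):
    """Toy model of selector-style enumeration: each round trip repeats
    the whole exclude set and yields one new item."""
    discovered: set[int] = set()
    round_trips = 0
    exclusion_bytes = 0
    while discovered != cache:
        # Every re-ask carries all previously discovered items as excludes.
        exclusion_bytes += len(discovered) * bytes_per_exclusion
        item = min(cache - discovered)  # cache returns some unexcluded item
        discovered.add(item)
        round_trips += 1
    return round_trips, exclusion_bytes

# 100 items: 100 round trips, and exclusion bytes grow arithmetically
# (0 + 4 + 8 + ... + 4*99) -- the quadratic blow-up behind the 200 MB
# figure for 10,000 emails in the thread.
rt, total = discover_by_exclusion(set(range(100)))
```

Scaling the same loop to 10,000 items reproduces the arithmetic-sum behavior (Sum 4*i) that motivates fetching a table of contents instead.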
>>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. >>> >>> >>> I am very confused. For your example, if I want to get all of today's >>> sensor data, I just do (Any..Last second of last day)(First second of >>> tomorrow..Any). That's 18 bytes. >>> >>> >>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>> >>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>> >>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>> >>> If you talk sometimes to A and sometimes to B, you very easily >>> could miss content objects you want to discover unless you avoid >>> all range exclusions and only exclude explicit versions. >>> >>> >>> Could you explain why the missed content object situation happens? Also, >>> range exclusion is just a shorter notation for many explicit >>> excludes; >>> converting from explicit excludes to a ranged exclude is always >>> possible. >>> >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. For something like a sensor reading that is updated, >>> say, once per second you will have 86,400 of them per day. If each >>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>> exclusions (plus encoding overhead) per day. >>> >>> Yes, maybe using a more deterministic version number than a >>> timestamp makes sense here, but it's just an example of needing a lot >>> of exclusions. >>> >>> >>> You exclude through 100 then issue a new interest. This goes to >>> cache B >>> >>> >>> I feel this case is invalid because cache A will also get the >>> interest, and cache A will return v101 if it exists.
Like you said, >>> if >>> this goes to cache B only, it means that cache A died. How do you >>> know >>> that v101 even exists? >>> >>> >>> I guess this depends on what the forwarding strategy is. If the >>> forwarder will always send each interest to all replicas, then yes, >>> modulo packet loss, you would discover v101 on cache A. If the >>> forwarder is just doing "best path" and can round-robin between cache >>> A and cache B, then your application could miss v101. >>> >>> >>> >>> c,d In general I agree that LPM performance is related to the number >>> of components. In my own thread-safe LPM implementation, I used only >>> one RWMutex for the whole tree. I don't know whether adding a lock for >>> every node will be faster or not because of lock overhead. >>> >>> However, we should compare (exact match + discovery protocol) vs >>> (ndn >>> lpm). Comparing performance of exact match to lpm is unfair. >>> >>> >>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>> specs for doing the exact match discovery. So, as I said, I'm not >>> ready to claim it's better yet because we have not done that. >>> >>> >>> >>> >>> >>> >>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>> I would point out that using LPM on content object to Interest >>> matching to do discovery has its own set of problems. Discovery >>> involves more than just "latest version" discovery too. >>> >>> This is probably getting off-topic from the original post about >>> naming conventions. >>> >>> a. If Interests can be forwarded multiple directions and two >>> different caches are responding, the exclusion set you build up >>> talking with cache A will be invalid for cache B. If you talk >>> sometimes to A and sometimes to B, you very easily could miss >>> content objects you want to discover unless you avoid all range >>> exclusions and only exclude explicit versions. That will lead to >>> very large interest packets.
In ccnx 1.0, we believe that an >>> explicit discovery protocol that allows conversations about >>> consistent sets is better. >>> >>> b. Yes, if you just want "latest version" discovery, that >>> should be transitive between caches, but imagine this. You send >>> Interest #1 to cache A which returns version 100. You exclude >>> through 100 then issue a new interest. This goes to cache B who >>> only has version 99, so the interest times out or is NACK'd. So >>> you think you have it! But, cache A already has version 101, you >>> just don't know. If you cannot have a conversation around >>> consistent sets, it seems like even doing latest version discovery >>> is difficult with selector based discovery. From what I saw in >>> ccnx 0.x, one ended up getting an Interest all the way to the >>> authoritative source because you can never believe an intermediate >>> cache that there's not something more recent. >>> >>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>> interested in seeing your analysis. Case (a) is that a node can >>> correctly discover every version of a name prefix, and (b) is that >>> a node can correctly discover the latest version. We have not >>> formally compared (or yet published) our discovery protocols (we >>> have three, 2 for content, 1 for device) against selector based >>> discovery, so I cannot yet claim they are better, but they do not >>> have the non-determinism sketched above. >>> >>> c. Using LPM, there is a non-deterministic number of lookups you >>> must do in the PIT to match a content object. If you have a name >>> tree or a threaded hash table, those don't all need to be hash >>> lookups, but you need to walk up the name tree for every prefix of >>> the content object name and evaluate the selector predicate. >>> Content Based Networking (CBN) had some methods to create data >>> structures based on predicates; maybe those would be better.
But >>> in any case, you will potentially need to retrieve many PIT entries >>> if there is Interest traffic for many prefixes of a root. Even on >>> an Intel system, you'll likely miss cache lines, so you'll have a >>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>> implementation only requires at most 3 lookups (one by name, one by >>> name + keyid, one by name + content object hash), and one can do >>> other things to optimize lookup for an extra write. >>> >>> d. In (c) above, if you have a threaded name tree or are just >>> walking parent pointers, I suspect you'll need locking of the >>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>> and that will be expensive. It would be interesting to see what a >>> cache consistent multi-threaded name tree looks like. >>> >>> Marc >>> >>> >>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>> wrote: >>> >>> I had thought about these questions, but I want to know your idea >>> besides typed components: >>> 1. LPM allows "data discovery". How will exact match do similar >>> things? >>> 2. will removing selectors improve performance? How do we use >>> other >>> faster techniques to replace selectors? >>> 3. fixed byte length and type. I agree more that type can be fixed >>> bytes, but 2 bytes for length might not be enough for the future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>> wrote: >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> >>> Could you share it with us? >>> >>> Sure. Here's a strawman. >>> >>> The type space is 16 bits, so you have 65,536 types. >>> >>> The type space is currently shared with the types used for the >>> entire protocol, that gives us two options: >>> (1) we reserve a range for name component types. Given the
Given the >>> likelihood there will be at least as much and probably more need >>> to component types than protocol extensions, we could reserve 1/2 >>> of the type space, giving us 32K types for name components. >>> (2) since there is no parsing ambiguity between name components >>> and other fields of the protocol (sine they are sub-types of the >>> name type) we could reuse numbers and thereby have an entire 65K >>> name component types. >>> >>> We divide the type space into regions, and manage it with a >>> registry. If we ever get to the point of creating an IETF >>> standard, IANA has 25 years of experience running registries and >>> there are well-understood rule sets for different kinds of >>> registries (open, requires a written spec, requires standards >>> approval). >>> >>> - We allocate one ?default" name component type for ?generic >>> name?, which would be used on name prefixes and other common >>> cases where there are no special semantics on the name component. >>> - We allocate a range of name component types, say 1024, to >>> globally understood types that are part of the base or extension >>> NDN specifications (e.g. chunk#, version#, etc. >>> - We reserve some portion of the space for unanticipated uses >>> (say another 1024 types) >>> - We give the rest of the space to application assignment. >>> >>> Make sense? >>> >>> >>> While I?m sympathetic to that view, there are three ways in >>> which Moore?s law or hardware tricks will not save us from >>> performance flaws in the design >>> >>> >>> we could design for performance, >>> >>> That?s not what people are advocating. We are advocating that we >>> *not* design for known bad performance and hope serendipity or >>> Moore?s Law will come to the rescue. >>> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough?. >>> >>> Perhaps, perhaps not. 
Relative performance is what matters, so >>> things that don't get faster while others do tend to get dropped >>> or not used because they impose a performance penalty relative to >>> the things that go faster. There is also the "low-end" phenomenon >>> where improvements in technology get applied to lowering cost >>> rather than improving performance. For those environments bad >>> performance just never gets better. >>> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >>> I suspect LPM on data will always be slow (relative to the other >>> functions). >>> I suspect exclusions will always be slow because they will >>> require extra memory references. >>> >>> However, I of course don't claim clairvoyance, so this is just >>> speculation based on 35+ years of seeing performance improve by 4 >>> orders of magnitude and still having to worry about counting >>> cycles and memory references... >>> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>> wrote: >>> >>> We should not look at a certain chip nowadays and want ndn to >>> perform >>> well on it. It should be the other way around: once ndn apps >>> become >>> popular, a better chip will be designed for ndn. >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design: >>> a) clock rates are not getting (much) faster >>> b) memory accesses are getting (relatively) more expensive >>> c) data structures that require locks to manipulate >>> successfully will be relatively more expensive, even with >>> near-zero lock contention. >>> >>> The fact is, IP *did* have some serious performance flaws in >>> its design. We just forgot those because the design elements >>> that depended on those mistakes have fallen into disuse. The >>> poster children for this are: >>> 1. IP options.
Nobody can use them because they are too slow >>> on modern forwarding hardware, so they can't be reliably used >>> anywhere >>> 2. the UDP checksum, which was a bad design when it was >>> specified and is now a giant PITA that still causes major pain >>> in working around. >>> >>> I'm afraid students today are being taught that the designers >>> of IP were flawless, as opposed to very good scientists and >>> engineers that got most of it right. >>> >>> I feel the discussion today and yesterday has been off-topic. >>> Now I >>> see that there are 3 approaches: >>> 1. we should not define a naming convention at all >>> 2. typed component: use tlv type space and add a handful of >>> types >>> 3. marked component: introduce only one more type and add >>> additional >>> marker space >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> It is just as powerful in practice as either throwing up our >>> hands and letting applications design their own mutually >>> incompatible schemes or trying to make naming conventions with >>> markers in a way that is fast to generate/parse and also >>> resilient against aliasing. >>> >>> Also everybody thinks that the current utf8 marker naming >>> convention >>> needs to be revised. >>> >>> >>> >>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>> wrote: >>> Would that chip be suitable, i.e. can we expect most names >>> to fit in (the >>> magnitude of) 96 bytes? What length are names usually in >>> current NDN >>> experiments? >>> >>> I guess wide deployment could make for even longer names. >>> Related: Many URLs >>> I encounter nowadays easily don't fit within two 80-column >>> text lines, and >>> NDN will have to carry more information than URLs, as far as >>> I see.
>>> >>> >>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>> >>> In fact, the index in a separate TLV will be slower on some >>> architectures, >>> like the ezChip NP4. The NP4 can hold the first 96 frame >>> bytes in memory, >>> then any subsequent memory is accessed only as two adjacent >>> 32-byte blocks >>> (there can be at most 5 blocks available at any one time). >>> If you need to >>> switch between arrays, it would be very expensive. If you >>> have to read past >>> the name to get to the 2nd array, then read it, then back up >>> to get to the >>> name, it will be pretty expensive too. >>> >>> Marc >>> >>> On Sep 18, 2014, at 2:02 PM, >>> wrote: >>> >>> Does this make that much difference? >>> >>> If you want to parse the first 5 components, one way to do >>> it is: >>> >>> Read the index, find entry 5, then read in that many bytes >>> from the start >>> offset of the beginning of the name. >>> OR >>> Start reading the name, (find size + move) 5 times. >>> >>> How much speed are you getting from one to the other? You >>> seem to imply >>> that the first one is faster. I don't think this is the >>> case. >>> >>> In the first one you'll probably have to get the cache line >>> for the index, >>> then all the required cache lines for the first 5 >>> components. For the >>> second, you'll have to get all the cache lines for the first >>> 5 components. >>> Given the assumption that a cache miss is way more expensive >>> than >>> evaluating a number and computing an addition, you might >>> find that the >>> performance of the index is actually slower than the >>> performance of the >>> direct access. >>> >>> Granted, there is a case where you don't access the name at >>> all, for >>> example, if you just get the offsets and then send the >>> offsets as >>> parameters to another processor/GPU/NPU/etc. In this case >>> you may see a >>> gain IF there are more cache line misses in reading the name >>> than in >>> reading the index. 
So, if the regular part of the name >>> that you're >>> parsing is bigger than the cache line (64 bytes?) and the >>> name is to be >>> processed by a different processor, then you might see some >>> performance >>> gain in using the index, but in all other circumstances I >>> bet this is not >>> the case. I may be wrong, haven't actually tested it. >>> >>> This is all to say, I don't think we should be designing the >>> protocol with >>> only one architecture in mind. (The architecture of sending >>> the name to a >>> different processor than the index). >>> >>> If you have numbers that show that the index is faster I >>> would like to see >>> under what conditions and architectural assumptions. >>> >>> Nacho >>> >>> (I may have misinterpreted your description so feel free to >>> correct me if >>> I'm wrong.) >>> >>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>> >>> wrote: >>> >>> Indeed each component's offset must be encoded using a fixed >>> amount of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>> You could also imagine having an "Offset_2byte" type if your >>> name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want the first x components) you can directly have it using >>> the >>> offsets. With the Nested TLV structure you have to >>> iteratively parse >>> the first x-1 components. With the offset structure you can >>> directly >>> access the first x components. >>> >>> I don't get it. What you described only works if the >>> "offset" is >>> encoded in fixed bytes. With varNum, you will still need to >>> parse x-1 >>> offsets to get to the x-th offset. 
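The tradeoff being debated above can be made concrete with a toy encoding. Everything here is illustrative, not the actual NDN TLV wire format: each component is a 1-byte length plus its value, and an optional fixed-width offset index (1 byte per entry, in the spirit of Massimo's "Offsets" TLV) lets a parser jump straight to the i-th component instead of walking the preceding TLVs.

```python
# Toy name encoding (NOT the real NDN TLV format): each component is
# [1-byte length][value], plus a fixed-width index of 1-byte offsets.

def encode_name(components):
    """Encode components and build a fixed-width offset index."""
    body = bytearray()
    offsets = []
    for c in components:
        offsets.append(len(body))   # start offset of this component
        body.append(len(c))         # 1-byte length
        body.extend(c)              # value
    return bytes(body), bytes(offsets)

def first_x_sequential(body, x):
    """Nested-TLV style: walk the components one by one."""
    out, pos = [], 0
    for _ in range(x):
        ln = body[pos]
        out.append(body[pos + 1 : pos + 1 + ln])
        pos += 1 + ln
    return out

def first_x_indexed(body, offsets, x):
    """Offset-index style: jump directly to each component."""
    out = []
    for i in range(x):
        pos = offsets[i]
        ln = body[pos]
        out.append(body[pos + 1 : pos + 1 + ln])
    return out

name = [b"ndn", b"ucla", b"video", b"v42", b"s0"]
body, idx = encode_name(name)
assert first_x_sequential(body, 3) == first_x_indexed(body, idx, 3) == name[:3]
```

Both paths ultimately touch the same component bytes; as Nacho argues, the index mainly wins when you can avoid reading the name at all (e.g. handing just the offsets to another processor), and as Tai-Lin notes, direct access only works because the offsets are fixed-width.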
>>> >>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>> >>> On 17/09/2014 14:56, Mark Stapp wrote: >>> >>> ah, thanks - that's helpful. I thought you were saying "I >>> like the >>> existing NDN UTF8 'convention'." I'm still not sure I >>> understand what >>> you >>> _do_ prefer, though. it sounds like you're describing an >>> entirely >>> different >>> scheme where the info that describes the name-components is >>> ... >>> someplace >>> other than _in_ the name-components. is that correct? when >>> you say >>> "field >>> separator", what do you mean (since that's not a "TL" from a >>> TLV)? >>> >>> Correct. >>> In particular, with our name encoding, a TLV indicates the >>> name >>> hierarchy >>> with offsets in the name, and other TLV(s) indicate the >>> offset to use >>> in >>> order to retrieve special components. >>> As for the field separator, it is something like "/". >>> Aliasing is >>> avoided as >>> you do not rely on field separators to parse the name; you >>> use the >>> "offset >>> TLV" to do that. >>> >>> So now, it may be an aesthetic question but: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want >>> the first x components) you can directly have it using the >>> offsets. >>> With the >>> Nested TLV structure you have to iteratively parse the first >>> x-1 >>> components. >>> With the offset structure you can directly access the >>> first x >>> components. >>> >>> Max >>> >>> >>> -- Mark >>> >>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>> The why is simple: >>> >>> You use a lot of "generic component type" and very few >>> "specific >>> component type". You are imposing types for every component >>> in order >>> to >>> handle a few exceptions (segmentation, etc.). You create a >>> rule >>> (specify >>> the component's type) to handle exceptions! >>> >>> I would prefer not to have typed components. 
Instead I would >>> prefer >>> to >>> have the name as a simple sequence of bytes with a field >>> separator. Then, >>> outside the name, if you have some components that could be >>> used at the >>> network layer (e.g. a TLV field), you simply need something >>> that >>> indicates the offset allowing you to retrieve the >>> version, >>> segment, etc. in the name... >>> >>> >>> Max >>> >>> >>> >>> >>> >>> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end >>> up with >>> names >>> containing many generic component types and few specific >>> component >>> types. Due to the fact that the component type specification >>> is an >>> exception in the name, I would prefer something that specifies the >>> component's >>> type only when needed (something like UTF8 conventions but >>> that >>> applications MUST use). >>> >>> so ... I can't quite follow that. the thread has had some >>> explanation >>> about why the UTF8 requirement has problems (with aliasing, >>> e.g.) >>> and >>> there's been email trying to explain that applications don't >>> have to >>> use types if they don't need to. your email sounds like "I >>> prefer >>> the >>> UTF8 convention", but it doesn't say why you have that >>> preference in >>> the face of the points about the problems. can you say why >>> it is >>> that >>> you express a preference for the "convention" with problems? >>> >>> Thanks, >>> Mark >>> >>> . 
>>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From jburke at remap.UCLA.EDU Wed Sep 24 01:33:42 2014 From: jburke at remap.UCLA.EDU (Burke, Jeff) Date: Wed, 24 Sep 2014 08:33:42 +0000 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: Message-ID: Thanks for the feedback; we will look at uploading to Vimeo or a similar service - I suspect this can happen in October. Jeff On 9/23/14, 10:46 AM, "Tai-Lin Chu" wrote: >I was watching it a week after ndncomm. Besides the slow connection, the >other painful thing is that I cannot fast forward/backward, or seek. > > >On Mon, Sep 22, 2014 at 11:35 PM, Felix Rabe wrote: >> Hi Jeff >> >> Don't want to bother you right now, but I'd like to review them >>sometime in >> the next month, as my notes are incomplete in many places. >> >> Also, Xiaoke mentioned that the streamed versions are unsuitable for >>folks >> in China with a slow connection. They would still like to see them. >> >> - Felix >> >> >> >> On 23/Sep/14 08:01, Burke, Jeff wrote: >>> >>> Hi Felix, >>> >>> Unfortunately the live stream records are what they are - some quirks >>>in >>> the early recording can't be fixed. >>> >>> We should have separate local recordings as well, but are pretty >>>swamped >>> right now. Is there something in particular you'd like to see posted? >>> >>> Jeff >>> >>> >>> >>> On 9/23/14, 12:46 AM, "Felix Rabe" wrote: >>> >>>> Hi list (or, REMAP) >>>> >>>> The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are >>>> they available as downloads somewhere? >>>> >>>> Also, I see some videos are barely viewable (at least [1], but [2] >>>>seems >>>> to be fine), skipping a few seconds every few seconds. Do you still >>>>have >>>> a complete version? 
>>>> >>>> [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 >>>> [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 >>>> >>>> Kind regards >>>> - Felix >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From lanwang at memphis.edu Wed Sep 24 07:37:58 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Wed, 24 Sep 2014 14:37:58 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <65E2A60C-5494-4991-B78D-7164B1277176@parc.com> References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> <65E2A60C-5494-4991-B78D-7164B1277176@parc.com> Message-ID: <5D39D37F-D41D-4834-A0DB-A94322D79DA3@memphis.edu> Agree that name discovery will be useful in this case. But I think the question is whether you can get rid of selectors in the name discovery and use exact matching only. Here's how I see the name discovery will work in this case: - the email client can request the latest list of email names in the mailbox /username/mailbox/list (using the right-most selector). - of course, there may be cached lists in the network. This should be minimized by a very short FreshnessSeconds on the data set by the email server (e.g., 1 second). So most likely there is at most one cached list is out there. - after the client gets an email list (/username/mailbox/list/20), the client should immediately issue another Interest (/username/mailbox/list/) with the selector (>20) to see if there are most recent ones. - eventually the request will get to the server. The server may respond with a more recent list or a NACK. - meanwhile, the email client can retrieve the emails using the names obtained in these lists. 
Some emails may turn out to be unnecessary, so they will be discarded when a more recent list comes. The email client can also keep state about the names of the emails it has deleted to minimize this problem. Lan On Sep 24, 2014, at 2:37 AM, Marc.Mosko at parc.com wrote: > Ok, let's take that example and run with it a bit. I'll walk through a "discover all" example. This example leads me to why I say discovery should be separate from data retrieval. I don't claim that we have a final solution to this problem; I think in a distributed peer-to-peer environment solving this problem is difficult. If you have a counterexample as to how this discovery could progress using only the information known a priori by the requester, I would be interested in seeing that example worked out. Please do correct me if you think this is wrong. > > You have mails that were originally numbered 0 - 10000, sequentially by the server. > > You travel between several places and access different emails from different places. This populates caches. Let's say 0, 3, 6, 9, ... are on cache A, 1, 4, 7, 10, ... are on cache B, and 2, 5, 8, 11, ... are on cache C. Also, you have deleted 500 random emails, so there are only 9500 emails actually out there. > > You set up a new computer and now want to download all your emails. The new computer is on the path of caches C, B, then A, then the authoritative source server. The new email program has no initial state. The email program only knows that the email number is an integer that starts at 0. It issues an interest for /mail/inbox, and asks for left-most child because it wants to populate in order. It gets a response from cache C with mail 2. > > Now, what does the email program do? It cannot exclude the range 0..2 because that would possibly miss 0 and 1. So, all it can do is exclude the exact number "2" and ask again. It then gets cache C again and it responds with "5". 
There are about 3000 emails on cache C, and if they all take 4 bytes (for the exclude component plus its coding overhead), then that's 12KB of exclusions to finally exhaust cache C. > > If we want Interests to avoid fragmentation, we can fit about 1200 bytes of exclusions, or 300 components. This means we need about 10 interest messages. Each interest would be something like "exclude 2, 5, 8, 11, ..., >300", then the next would be "exclude <300, 302, 305, 308, ..., >600", etc. > > Those interests that exclude everything at cache C would then hit, say, cache B and start getting results 1, 4, 7, .... This means an Interest like "exclude 2, 5, 8, 11, ..., >300" would then get back number 1. That means the next request actually has to split that one interest's exclude in two (because the interest was at maximum size), so you now issue two interests where one is "exclude 1, 2, 5, 8, >210" and the other is "<210, 212, 215, ..., >300". > > If you look in the CCNx 0.8 java code, there should be a class that does these Interest-based discoveries and does the Interest splitting based on the currently known range of discovered content. I don't have the specific reference right now, but I can send a link if you are interested in seeing that. The java class keeps state of what has been discovered so far, so it could re-start later if interrupted. > > So all those interests would now be getting results from cache B. You would then start to split all those ranges to accommodate the numbers coming back from B. Eventually, you'll have at least 10 Interest messages outstanding that would be excluding all the 9500 messages that are in caches A, B, and C. Some of those interest messages might actually reach an authoritative server, which might respond too. It would likely be more than 10 interests due to the algorithm that's used to split full interests, which likely is not optimal because it does not know exactly where the breaks should be a priori. 
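A toy simulation of the "discover all by exclusion" walk described above. This models only the round-trip count against a single cache, not the real Interest wire format, the byte budget, or the splitting of oversized excludes; the function and data names are hypothetical.

```python
# Toy model (not the NDN/CCNx protocol): each round trip returns one
# object the requester has not yet excluded, and the requester's exclude
# set grows by one entry per reply.

def discover_all(cache):
    """Enumerate a cache's contents by repeated exclusion."""
    excluded = set()
    round_trips = 0
    while True:
        candidates = [m for m in cache if m not in excluded]
        if not candidates:
            break                 # cache exhausted (a NACK/timeout in practice)
        reply = min(candidates)   # left-most child selection
        excluded.add(reply)
        round_trips += 1
    return excluded, round_trips

cache_c = set(range(2, 9000, 3))  # mails 2, 5, 8, ... as in Marc's example
found, rtts = discover_all(cache_c)
assert found == cache_c
assert rtts == len(cache_c)       # one round trip per discovered object
```

The linear growth of the exclude set per reply is exactly what makes the total exclusion traffic an arithmetic series in Marc's cost estimate.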
> > Once you have exhausted caches A, B, and C, the interest messages would reach the authoritative source (if it's online), and it would be issuing NACKs (I assume) for interests that have excluded all non-deleted emails. > > In any case, it takes, at best, 9,500 round trips to "discover" all 9500 emails. It also requires Sum_{i=1..10000} 4*i = 200,020,000 bytes of Interest exclusions. Note that it's an arithmetic sum of bytes of exclusion, because at each Interest the size of the exclusions increases by 4. There was an NDN paper about light bulb discovery (or something like that) that noted this same problem and proposed some workaround, but I don't remember what they proposed. > > Yes, you could possibly pipeline it, but what would you do? In this example, emails 0 - 10000 (minus some random ones) would allow you, if you knew a priori, to issue say 10 interests in parallel that ask for different ranges. But, 2 years from now your undeleted emails might range from 100,000 - 150,000. The point is that a discovery protocol does not know, a priori, what is to be discovered. It might start learning some stuff as it goes on. > > If you could have retrieved just a table of contents from each cache, where each "row" is say 64 bytes (i.e. the name continuation plus hash value), you would need to retrieve 3300 * 64 = 211KB from each cache (total 640 KB) to list all the emails. That would take 640KB / 1200 = 534 interest messages of say 64 bytes = 34 KB to discover all 9500 emails, plus another set to fetch the header rows. That's, say, 68 KB of interest traffic compared to 200 MB. Now, I've not said how to list these tables of contents, so an actual protocol might have higher communication cost, but even if it was 10x worse that would still be an attractive tradeoff. > > This assumes that you publish just the "header" in the 1st segment (say 1 KB total object size including the signatures). That's 10 MB to learn the headers. 
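The cost figures above can be checked directly. The 4-byte exclude entry, 64-byte table-of-contents row, and ~1200-byte Interest budget are the assumptions stated in the message; everything else follows arithmetically.

```python
# Exclusion-based discovery: the i-th Interest carries roughly i exclude
# entries of ~4 bytes each, so total exclusion bytes form an arithmetic
# series over the 10000 original email numbers.
n = 10000
exclusion_bytes = sum(4 * i for i in range(1, n + 1))
assert exclusion_bytes == 200_020_000   # ~200 MB, matching the text

# Table-of-contents alternative: 64-byte rows, ~3300 emails per cache,
# three caches, fetched in ~1200-byte Interest/Data exchanges.
toc_per_cache = 3300 * 64               # 211,200 bytes (~211 KB per cache)
total_toc = 3 * toc_per_cache           # ~634 KB across three caches
interests = -(-total_toc // 1200)       # ceil: Interests needed to fetch it
assert interests * 64 < 40_000          # tens of KB of Interest traffic,
                                        # versus ~200 MB of exclusions
```

The exact Interest count comes out slightly below the 534 quoted above (which rounds the total to 640 KB), but the conclusion is the same: roughly four orders of magnitude less Interest traffic than enumeration by exclusion.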
> > You could also argue that the distribution of emails over caches is arbitrary. That's true; I picked a difficult sequence. But unless you have some positive controls on what could be in a cache, it could be any difficult sequence. I also did not address the timeout issue, and how do you know you are done? > > This is also why sync works so much better than doing raw interest discovery. Sync exchanges tables of contents and diffs; it does not need to enumerate by exclusion everything to retrieve. > > Marc > > > > On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote: > >> discovery can be reduced to "pattern detection" (can we infer what >> exists?) and "pattern validation" (can we confirm this guess?) >> >> For example, I see a pattern /mail/inbox/148. I, a human being, see a >> pattern with static (/mail/inbox) and variable (148) components; with >> proper naming convention, computers can also detect this pattern >> easily. Now I want to look for all mails in my inbox. I can generate a >> list of /mail/inbox/. These are my guesses, and with selectors >> I can further refine my guesses. >> >> To validate them, a bloom filter can provide "best effort" >> discovery (with some false positives, so I call it "best-effort") >> before I stupidly send all the interests to the network. >> >> The discovery protocol, as I described above, is essentially "pattern >> detection by naming convention" and "bloom filter validation." This is >> definitely one of the "simpler" discovery protocols, because the data >> producer only needs to add an additional bloom filter. Notice that we can >> progressively add entries to the bfilter with low computation cost. >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>> Ok, yes I think those would all be good things. >>> >>> One thing to keep in mind, especially with things like time series sensor >>> data, is that people see a pattern and infer a way of doing it. 
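Tai-Lin's "bloom filter validation" step above could be sketched as follows. The `Bloom` class, its parameters, and the hashing scheme are all illustrative assumptions, not anything specified for NDN; a real producer would have to publish the filter itself as retrievable data.

```python
import hashlib

# Minimal Bloom filter sketch: the producer adds the names that exist;
# the consumer tests guessed names before sending Interests. False
# positives are possible; false negatives are not.

class Bloom:
    def __init__(self, m=4096, k=4):
        self.m, self.k, self.bits = m, k, 0   # bit array held as an int

    def _positions(self, name):
        # k independent bit positions derived from the name (illustrative)
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, name):
        for p in self._positions(name):
            self.bits |= 1 << p

    def might_contain(self, name):
        return all(self.bits >> p & 1 for p in self._positions(name))

# Producer side: register the mails that actually exist.
bf = Bloom()
for seq in (3, 17, 148):
    bf.add(f"/mail/inbox/{seq}")

# Consumer side: filter generated guesses before expressing Interests.
guesses = [f"/mail/inbox/{i}" for i in range(200)]
plausible = [g for g in guesses if bf.might_contain(g)]
assert "/mail/inbox/148" in plausible   # added names are always retained
```

Appending new entries only sets bits, which is the "progressively add entries with low computation cost" property noted above; the price is that entries can never be removed and false positives accumulate as the filter fills.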
That's easy >>> for a human :) But in Discovery, one should assume that one does not know >>> of patterns in the data beyond what the protocols used to publish the data >>> explicitly require. That said, I think some of the things you listed are >>> good places to start: sensor data, web content, climate data or genome data. >>> >>> We also need to state what the forwarding strategies are and what the cache >>> behavior is. >>> >>> I outlined some of the points that I think are important in that other >>> posting. While "discover latest" is useful, "discover all" is also >>> important, and that one gets complicated fast. So points like separating >>> discovery from retrieval and working with large data sets have been >>> important in shaping our thinking. That all said, I'd be happy starting >>> from 0 and working through the Discovery service definition from scratch >>> along with data set use cases. >>> >>> Marc >>> >>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: >>> >>> Hi Marc, >>> >>> Thanks - yes, I saw that as well. I was just trying to get one step more >>> specific, which was to see if we could identify a few specific use cases >>> around which to have the conversation. (e.g., time series sensor data and >>> web content retrieval for "get latest"; climate data for huge data sets; >>> local data in a vehicular network; etc.) What have you been looking at >>> that's driving considerations of discovery? >>> >>> Thanks, >>> Jeff >>> >>> From: >>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>> To: Jeff Burke >>> Cc: , >>> Subject: Re: [Ndn-interest] any comments on naming convention? >>> >>> Jeff, >>> >>> Take a look at my posting (that Felix fixed) in a new thread on Discovery. >>> >>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>> >>> I think it would be very productive to talk about what Discovery should do, >>> and not focus on the how. 
It is sometimes easy to get caught up in the how, >>> which I think is a less important topic than the what at this stage. >>> >>> Marc >>> >>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >>> >>> Marc, >>> >>> If you can't talk about your protocols, perhaps we can discuss this based >>> on use cases. What are the use cases you are using to evaluate >>> discovery? >>> >>> Jeff >>> >>> >>> >>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >>> >>> No matter what the expressiveness of the predicates, if the forwarder can >>> send interests different ways you don't have a consistent underlying set >>> to talk about, so you would always need non-range exclusions to discover >>> every version. >>> >>> Range exclusions only work, I believe, if you get an authoritative answer. >>> If different content pieces are scattered between different caches I >>> don't see how range exclusions would work to discover every version. >>> >>> I'm sorry to be pointing out problems without offering solutions, but >>> we're not ready to publish our discovery protocols. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how the ccnx discovery protocol solves >>> all the problems that you mentioned (not just exclude)? A doc would be >>> better. >>> >>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >>> expect [and] and [or], so boolean algebra is fully supported. Regular >>> languages or context-free languages might become part of selectors too. >>> >>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>> That will get you one reading, then you need to exclude it and ask >>> again. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. 
>>> >>> >>> I am very confused. For your example, if I want to get all today's >>> sensor data, I just do (Any..Last second of last day)(First second of >>> tomorrow..Any). That's 18 bytes. >>> >>> >>> [1] http://named-data.net/doc/ndn-tlv/interest.html#exclude >>> >>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>> >>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>> >>> If you talk sometimes to A and sometimes to B, you very easily >>> could miss content objects you want to discover unless you avoid >>> all range exclusions and only exclude explicit versions. >>> >>> >>> Could you explain why the missing content object situation happens? Also, >>> range exclusion is just a shorter notation for many explicit >>> excludes; >>> converting from explicit excludes to a ranged exclude is always >>> possible. >>> >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. For something like a sensor reading that is updated, >>> say, once per second you will have 86,400 of them per day. If each >>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>> exclusions (plus encoding overhead) per day. >>> >>> Yes, maybe using a more deterministic version number than a >>> timestamp makes sense here, but it's just an example of needing a lot >>> of exclusions. >>> >>> >>> You exclude through 100 then issue a new interest. This goes to >>> cache B >>> >>> >>> I feel this case is invalid because cache A will also get the >>> interest, and cache A will return v101 if it exists. Like you said, >>> if >>> this goes to cache B only, it means that cache A dies. How do you >>> know >>> that v101 even exists? >>> >>> >>> I guess this depends on what the forwarding strategy is. If the >>> forwarder will always send each interest to all replicas, then yes, >>> modulo packet loss, you would discover v101 on cache A. 
If the >>> forwarder is just doing "best path" and can round-robin between cache >>> A and cache B, then your application could miss v101. >>> >>> >>> >>> c,d In general I agree that LPM performance is related to the number >>> of components. In my own thread-safe LPM implementation, I used only >>> one RWMutex for the whole tree. I don't know whether adding a lock for >>> every node will be faster or not because of lock overhead. >>> >>> However, we should compare (exact match + discovery protocol) vs (ndn >>> lpm). Comparing performance of exact match to lpm is unfair. >>> >>> >>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>> specs for doing the exact match discovery. So, as I said, I'm not >>> ready to claim it's better yet because we have not done that. >>> >>> >>> >>> >>> >>> >>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>> I would point out that using LPM on content object to Interest >>> matching to do discovery has its own set of problems. Discovery >>> involves more than just "latest version" discovery too. >>> >>> This is probably getting off-topic from the original post about >>> naming conventions. >>> >>> a. If Interests can be forwarded multiple directions and two >>> different caches are responding, the exclusion set you build up >>> talking with cache A will be invalid for cache B. If you talk >>> sometimes to A and sometimes to B, you very easily could miss >>> content objects you want to discover unless you avoid all range >>> exclusions and only exclude explicit versions. That will lead to >>> very large interest packets. In ccnx 1.0, we believe that an >>> explicit discovery protocol that allows conversations about >>> consistent sets is better. >>> >>> b. Yes, if you just want the "latest version" discovery that >>> should be transitive between caches, but imagine this. You send >>> Interest #1 to cache A which returns version 100. You exclude >>> through 100 then issue a new interest. 
This goes to cache B, which >>> only has version 99, so the interest times out or is NACK'd. So >>> you think you have it! But cache A already has version 101, you >>> just don't know. If you cannot have a conversation around >>> consistent sets, it seems like even doing latest version discovery >>> is difficult with selector based discovery. From what I saw in >>> ccnx 0.x, one ended up getting an Interest all the way to the >>> authoritative source because you can never believe an intermediate >>> cache that there's not something more recent. >>> >>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be >>> interested in seeing your analysis. Case (a) is that a node can >>> correctly discover every version of a name prefix, and (b) is that >>> a node can correctly discover the latest version. We have not >>> formally compared (or yet published) our discovery protocols (we >>> have three, 2 for content, 1 for device) compared to selector based >>> discovery, so I cannot yet claim they are better, but they do not >>> have the non-determinism sketched above. >>> >>> c. Using LPM, there is a non-deterministic number of lookups you >>> must do in the PIT to match a content object. If you have a name >>> tree or a threaded hash table, those don't all need to be hash >>> lookups, but you need to walk up the name tree for every prefix of >>> the content object name and evaluate the selector predicate. >>> Content Based Networking (CBN) had some methods to create data >>> structures based on predicates; maybe those would be better. But >>> in any case, you will potentially need to retrieve many PIT entries >>> if there is Interest traffic for many prefixes of a root. Even on >>> an Intel system, you'll likely miss cache lines, so you'll have a >>> lot of NUMA access for each one. 
In CCNx 1.0, even a naive >>> implementation only requires at most 3 lookups (one by name, one by >>> name + keyid, one by name + content object hash), and one can do >>> other things to optimize lookup for an extra write. >>> >>> d. In (c) above, if you have a threaded name tree or are just >>> walking parent pointers, I suspect you'll need locking of the >>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>> and that will be expensive. It would be interesting to see what a >>> cache consistent multi-threaded name tree looks like. >>> >>> Marc >>> >>> >>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>> wrote: >>> >>> I had thought about these questions, but I want to know your idea >>> besides typed components: >>> 1. LPM allows "data discovery". How will exact match do similar >>> things? >>> 2. will removing selectors improve performance? How do we use >>> other >>> faster techniques to replace selectors? >>> 3. fixed byte length and type. I agree more that type can be a fixed >>> byte, but 2 bytes for length might not be enough for the future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>> wrote: >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> >>> Could you share it with us? >>> >>> Sure. Here's a strawman. >>> >>> The type space is 16 bits, so you have 65,536 types. >>> >>> The type space is currently shared with the types used for the >>> entire protocol, which gives us two options: >>> (1) we reserve a range for name component types. Given the >>> likelihood there will be at least as much and probably more need >>> for component types than protocol extensions, we could reserve 1/2 >>> of the type space, giving us 32K types for name components. 
>>> (2) since there is no parsing ambiguity between name components >>> and other fields of the protocol (since they are sub-types of the >>> name type) we could reuse numbers and thereby have the entire 65K >>> space for name component types. >>> >>> We divide the type space into regions, and manage it with a >>> registry. If we ever get to the point of creating an IETF >>> standard, IANA has 25 years of experience running registries and >>> there are well-understood rule sets for different kinds of >>> registries (open, requires a written spec, requires standards >>> approval). >>> >>> - We allocate one "default" name component type for "generic >>> name", which would be used on name prefixes and other common >>> cases where there are no special semantics on the name component. >>> - We allocate a range of name component types, say 1024, to >>> globally understood types that are part of the base or extension >>> NDN specifications (e.g. chunk#, version#, etc.) >>> - We reserve some portion of the space for unanticipated uses >>> (say another 1024 types) >>> - We give the rest of the space to application assignment. >>> >>> Make sense? >>> >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design >>> >>> >>> we could design for performance, >>> >>> That's not what people are advocating. We are advocating that we >>> *not* design for known bad performance and hope serendipity or >>> Moore's Law will come to the rescue. >>> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough". >>> >>> Perhaps, perhaps not. Relative performance is what matters, so >>> things that don't get faster while others do tend to get dropped >>> or not used because they impose a performance penalty relative to >>> the things that go faster. There is also the "low-end"
phenomenon >>> where improvements in technology get applied to lowering cost >>> rather than improving performance. For those environments bad >>> performance just never gets better. >>> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >>> I suspect LPM on data will always be slow (relative to the other >>> functions). >>> I suspect exclusions will always be slow because they will >>> require extra memory references. >>> >>> However, I of course don't claim clairvoyance, so this is just >>> speculation based on 35+ years of seeing performance improve by 4 >>> orders of magnitude and still having to worry about counting >>> cycles and memory references... >>> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>> wrote: >>> >>> We should not look at a certain chip nowadays and want ndn to >>> perform >>> well on it. It should be the other way around: once ndn apps >>> become >>> popular, a better chip will be designed for ndn. >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design: >>> a) clock rates are not getting (much) faster >>> b) memory accesses are getting (relatively) more expensive >>> c) data structures that require locks to manipulate >>> successfully will be relatively more expensive, even with >>> near-zero lock contention. >>> >>> The fact is, IP *did* have some serious performance flaws in >>> its design. We just forgot those because the design elements >>> that depended on those mistakes have fallen into disuse. The >>> poster children for this are: >>> 1. IP options. Nobody can use them because they are too slow >>> on modern forwarding hardware, so they can't be reliably used >>> anywhere. >>> 2.
the UDP checksum, which was a bad design when it was >>> specified and is now a giant PITA that still causes major pain >>> in working around. >>> >>> I'm afraid students today are being taught that the designers >>> of IP were flawless, as opposed to very good scientists and >>> engineers who got most of it right. >>> >>> I feel the discussion today and yesterday has been off-topic. >>> Now I >>> see that there are 3 approaches: >>> 1. we should not define a naming convention at all >>> 2. typed component: use tlv type space and add a handful of >>> types >>> 3. marked component: introduce only one more type and add >>> additional >>> marker space >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> It is just as powerful in practice as either throwing up our >>> hands and letting applications design their own mutually >>> incompatible schemes or trying to make naming conventions with >>> markers in a way that is fast to generate/parse and also >>> resilient against aliasing. >>> >>> Also, everybody thinks that the current utf8 marker naming >>> convention >>> needs to be revised. >>> >>> >>> >>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>> wrote: >>> Would that chip be suitable, i.e. can we expect most names >>> to fit in (the >>> magnitude of) 96 bytes? What length are names usually in >>> current NDN >>> experiments? >>> >>> I guess wide deployment could make for even longer names. >>> Related: Many URLs >>> I encounter nowadays easily don't fit within two 80-column >>> text lines, and >>> NDN will have to carry more information than URLs, as far as >>> I see. >>> >>> >>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>> >>> In fact, the index in a separate TLV will be slower on some >>> architectures, >>> like the ezChip NP4.
The NP4 can hold the first 96 frame >>> bytes in memory, >>> then any subsequent memory is accessed only as two adjacent >>> 32-byte blocks >>> (there can be at most 5 blocks available at any one time). >>> If you need to >>> switch between arrays, it would be very expensive. If you >>> have to read past >>> the name to get to the 2nd array, then read it, then back up >>> to get to the >>> name, it will be pretty expensive too. >>> >>> Marc >>> >>> On Sep 18, 2014, at 2:02 PM, >>> wrote: >>> >>> Does this make that much difference? >>> >>> If you want to parse the first 5 components, one way to do >>> it is: >>> >>> Read the index, find entry 5, then read in that many bytes >>> from the start >>> offset of the beginning of the name. >>> OR >>> Start reading the name, (find size + move) 5 times. >>> >>> How much speed are you getting from one to the other? You >>> seem to imply >>> that the first one is faster. I don't think this is the >>> case. >>> >>> In the first one you'll probably have to get the cache line >>> for the index, >>> then all the required cache lines for the first 5 >>> components. For the >>> second, you'll have to get all the cache lines for the first >>> 5 components. >>> Given an assumption that a cache miss is way more expensive >>> than >>> evaluating a number and computing an addition, you might >>> find that the >>> performance of the index is actually slower than the >>> performance of the >>> direct access. >>> >>> Granted, there is a case where you don't access the name at >>> all, for >>> example, if you just get the offsets and then send the >>> offsets as >>> parameters to another processor/GPU/NPU/etc. In this case >>> you may see a >>> gain IF there are more cache line misses in reading the name >>> than in >>> reading the index. So, if the regular part of the name >>> that you're >>> parsing is bigger than the cache line (64 bytes?)
and the >>> name is to be >>> processed by a different processor, then you might see some >>> performance >>> gain in using the index, but in all other circumstances I >>> bet this is not >>> the case. I may be wrong, haven't actually tested it. >>> >>> This is all to say, I don't think we should be designing the >>> protocol with >>> only one architecture in mind. (The architecture of sending >>> the name to a >>> different processor than the index). >>> >>> If you have numbers that show that the index is faster I >>> would like to see >>> under what conditions and architectural assumptions. >>> >>> Nacho >>> >>> (I may have misinterpreted your description so feel free to >>> correct me if >>> I'm wrong.) >>> >>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>> >>> wrote: >>> >>> Indeed each component's offset must be encoded using a fixed >>> amount of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>> You may also imagine having an "Offset_2byte" type if your >>> name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want the first x components) you can directly have it using >>> the >>> offsets. With the Nested TLV structure you have to >>> iteratively parse >>> the first x-1 components. With the offset structure you can >>> directly >>> access the first x components. >>> >>> I don't get it. What you described only works if the >>> "offset" is >>> encoded in fixed bytes. With varNum, you will still need to >>> parse x-1 >>> offsets to get to the xth offset.
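[Editor's sketch: the fixed-width-offset scheme being debated above, in a few lines of toy code. Helper names are invented; 1-byte offsets follow Massimo's example, which limits names to 255 bytes unless an "Offset_2byte" variant is used. The contrast function shows the iterative (length, value) walk a nested-TLV encoding requires.]

```python
# Flat name bytes plus a side table of fixed 1-byte component offsets:
# component i is reachable without parsing components 0..i-1.

def build_name(components):
    name = b""
    offsets = []
    for c in components:
        offsets.append(len(name))   # fixed 1-byte offsets: name must be < 256 bytes
        name += c
    return name, bytes(offsets)     # offsets value: Offset1(1byte), Offset2(1byte), ...

def component(name, offsets, i):
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(name)
    return name[start:end]

# Nested-TLV style (1-byte lengths here for simplicity): to reach
# component i you must walk over every earlier (length, value) pair.
def nth_component_sequential(wire, i):
    pos = 0
    for _ in range(i):
        ln = wire[pos]
        pos += 1 + ln               # skip length byte plus value
    ln = wire[pos]
    return wire[pos + 1 : pos + 1 + ln]

comps = [b"mail", b"inbox", b"148"]
flat, offs = build_name(comps)
wire = b"".join(bytes([len(c)]) + c for c in comps)
print(component(flat, offs, 2))           # b'148' via direct offset
print(nth_component_sequential(wire, 2))  # b'148' via iterative parse
```

Both paths return the same component; the argument in the thread is purely about how many memory accesses each path costs on a given architecture.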
>>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>> >>> On 17/09/2014 14:56, Mark Stapp wrote: >>> >>> ah, thanks - that's helpful. I thought you were saying "I >>> like the >>> existing NDN UTF8 'convention'." I'm still not sure I >>> understand what >>> you >>> _do_ prefer, though. it sounds like you're describing an >>> entirely >>> different >>> scheme where the info that describes the name-components is >>> ... >>> someplace >>> other than _in_ the name-components. is that correct? when >>> you say >>> "field >>> separator", what do you mean (since that's not a "TL" from a >>> TLV)? >>> >>> Correct. >>> In particular, with our name encoding, a TLV indicates the >>> name >>> hierarchy >>> with offsets in the name, and other TLV(s) indicate the >>> offset to use >>> in >>> order to retrieve special components. >>> As for the field separator, it is something like "/". >>> Aliasing is >>> avoided as >>> you do not rely on field separators to parse the name; you >>> use the >>> "offset >>> TLV" to do that. >>> >>> So now, it may be an aesthetic question but: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want >>> the first x components) you can directly have it using the >>> offsets. >>> With the >>> Nested TLV structure you have to iteratively parse the first >>> x-1 >>> components. >>> With the offset structure you can directly access the >>> first x >>> components. >>> >>> Max >>> >>> >>> -- Mark >>> >>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>> The why is simple: >>> >>> You use a lot of "generic component type" and very few >>> "specific >>> component type". You are imposing types for every component >>> in order >>> to >>> handle a few exceptions (segmentation, etc.). You create a >>> rule >>> (specify >>> the component's type) to handle exceptions! >>> >>> I would prefer not to have typed components.
Instead I would >>> prefer >>> to >>> have the name as a simple sequence of bytes with a field >>> separator. Then, >>> outside the name, if you have some components that could be >>> used at >>> the network layer (e.g. a TLV field), you simply need something >>> that >>> indicates which is the offset allowing you to retrieve the >>> version, >>> segment, etc. in the name... >>> >>> >>> Max >>> >>> >>> >>> >>> >>> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end >>> up with >>> names >>> containing many generic component types and few specific >>> component >>> types. Due to the fact that the component type specification >>> is an >>> exception in the name, I would prefer something that specifies >>> the component's >>> type only when needed (something like the UTF8 conventions, but >>> that >>> applications MUST use). >>> >>> so ... I can't quite follow that. the thread has had some >>> explanation >>> about why the UTF8 requirement has problems (with aliasing, >>> e.g.) >>> and >>> there's been email trying to explain that applications don't >>> have to >>> use types if they don't need to. your email sounds like "I >>> prefer >>> the >>> UTF8 convention", but it doesn't say why you have that >>> preference in >>> the face of the points about the problems. can you say why >>> it is >>> that >>> you express a preference for the "convention" with problems? >>> >>> Thanks, >>> Mark >>> .
>>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Wed Sep 24 08:20:02 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Wed, 24 Sep 2014 15:20:02 +0000 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> Message-ID: On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >For example, I see a pattern /mail/inbox/148. I, a human being, see a >pattern with static (/mail/inbox) and variable (148) components; with >proper naming convention, computers can also detect this pattern >easily. Now I want to look for all mails in my inbox. I can generate a >list of /mail/inbox/. These are my guesses, and with selectors >I can further refine my guesses. I think this is a very bad example (or at least a very bad application design). You have an app (a mail server / inbox) and you want it to list your emails? An email list is an application data structure. I don't think you should use the network structure to reflect this. I'll give you an example: how do you delete emails from your inbox? If an email was cached in the network it can never be deleted from your inbox? Or moved to another mailbox? Do you rely on the emails expiring? This problem is true for most (any?) situations where you use network name structure to directly reflect the application data structure. Nacho >On Tue, Sep 23, 2014 at 2:34 AM, wrote: >> Ok, yes I think those would all be good things. >> >> One thing to keep in mind, especially with things like time series >>sensor >> data, is that people see a pattern and infer a way of doing it. That's >>easy >> for a human :) But in Discovery, one should assume that one does not >>know >> of patterns in the data beyond what the protocols used to publish the >>data >> explicitly require. That said, I think some of the things you listed >>are >> good places to start: sensor data, web content, climate data or genome >>data. >> >> We also need to state what the forwarding strategies are and what the >>cache >> behavior is. >> >> I outlined some of the points that I think are important in that other >> posting. While "discover latest" is useful, "discover all"
is also >> important, and that one gets complicated fast. So points like >>separating >> discovery from retrieval and working with large data sets have been >> important in shaping our thinking. That all said, I'd be happy starting >> from 0 and working through the Discovery service definition from scratch >> along with data set use cases. >> >> Marc >> >> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: >> >> Hi Marc, >> >> Thanks - yes, I saw that as well. I was just trying to get one step more >> specific, which was to see if we could identify a few specific use cases >> around which to have the conversation. (e.g., time series sensor data >>and >> web content retrieval for "get latest"; climate data for huge data sets; >> local data in a vehicular network; etc.) What have you been looking at >> that's driving considerations of discovery? >> >> Thanks, >> Jeff >> >> From: >> Date: Mon, 22 Sep 2014 22:29:43 +0000 >> To: Jeff Burke >> Cc: , >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> Jeff, >> >> Take a look at my posting (that Felix fixed) in a new thread on >>Discovery. >> >> >>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200 >>.html >> >> I think it would be very productive to talk about what Discovery should >>do, >> and not focus on the how. It is sometimes easy to get caught up in the >>how, >> which I think is a less important topic than the what at this stage. >> >> Marc >> >> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >> >> Marc, >> >> If you can't talk about your protocols, perhaps we can discuss this >>based >> on use cases. What are the use cases you are using to evaluate >> discovery?
>> >> Jeff >> >> >> >> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >> >> No matter what the expressiveness of the predicates, if the forwarder can >> send interests different ways you don't have a consistent underlying set >> to talk about, so you would always need non-range exclusions to discover >> every version. >> >> Range exclusions only work, I believe, if you get an authoritative answer. >> If different content pieces are scattered between different caches I >> don't see how range exclusions would work to discover every version. >> >> I'm sorry to be pointing out problems without offering solutions, but >> we're not ready to publish our discovery protocols. >> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >> >> I see. Can you briefly describe how the ccnx discovery protocol solves >> all the problems that you mentioned (not just exclude)? A doc will be >> better. >> >> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >> expect [and] and [or], so boolean algebra is fully supported. Regular >> languages or context free languages might become part of selectors too. >> >> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >> That will get you one reading, then you need to exclude it and ask >> again. >> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >> >> Yes, my point was that if you cannot talk about a consistent set >> with a particular cache, then you need to always use individual >> excludes, not range excludes, if you want to discover all the versions >> of an object. >> >> >> I am very confused. For your example, if I want to get all of today's >> sensor data, I just do (Any..Last second of last day)(First second of >> tomorrow..Any). That's 18 bytes.
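[Editor's sketch: a back-of-the-envelope check of the sizes being argued about here. The 2-byte per-entry TLV overhead is an assumption for illustration only; actual NDN-TLV overhead differs, so treat the results as order-of-magnitude.]

```python
# Explicit excludes vs. a two-range exclude for a day of 1 Hz sensor data.

READINGS_PER_DAY = 86_400     # one reading per second, as in the thread
TIMESTAMP_BYTES = 8           # per-exclusion timestamp size from the thread
PER_ENTRY_OVERHEAD = 2        # assumed TLV type + length byte per entry

# One explicit exclusion per version:
explicit = READINGS_PER_DAY * (TIMESTAMP_BYTES + PER_ENTRY_OVERHEAD)

# Two half-open ranges, e.g. (Any..T1)(T2..Any): two Any markers plus
# two timestamp components, each carrying per-entry overhead.
ranged = 2 * PER_ENTRY_OVERHEAD + 2 * (TIMESTAMP_BYTES + PER_ENTRY_OVERHEAD)

print(explicit)  # 864000 -- why explicit excludes make "very large interest packets"
print(ranged)    # 24 -- same ballpark as the 18 bytes quoted above
```

This is exactly the tension in the exchange: range excludes stay tiny but assume a consistent answer set, while explicit excludes stay correct across inconsistent caches but grow without bound.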
>> >> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >> >> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >> >> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >> >> If you talk sometimes to A and sometimes to B, you very easily >> could miss content objects you want to discover unless you avoid >> all range exclusions and only exclude explicit versions. >> >> >> Could you explain why the missing content object situation happens? Also, >> range exclusion is just a shorter notation for many explicit >> excludes; >> converting from explicit excludes to a ranged exclude is always >> possible. >> >> >> Yes, my point was that if you cannot talk about a consistent set >> with a particular cache, then you need to always use individual >> excludes, not range excludes, if you want to discover all the versions >> of an object. For something like a sensor reading that is updated, >> say, once per second you will have 86,400 of them per day. If each >> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >> exclusions (plus encoding overhead) per day. >> >> Yes, maybe using a more deterministic version number than a >> timestamp makes sense here, but it's just an example of needing a lot >> of exclusions. >> >> >> You exclude through 100 then issue a new interest. This goes to >> cache B >> >> >> I feel this case is invalid because cache A will also get the >> interest, and cache A will return v101 if it exists. Like you said, >> if >> this goes to cache B only, it means that cache A dies. How do you >> know >> that v101 even exists? >> >> >> I guess this depends on what the forwarding strategy is. If the >> forwarder will always send each interest to all replicas, then yes, >> modulo packet loss, you would discover v101 on cache A. If the >> forwarder is just doing "best path" and can round-robin between cache >> A and cache B, then your application could miss v101. >> >> >> >> c, d: In general I agree that LPM performance is related to the number >> of components.
In my own thread-safe LPM implementation, I used only >> one RWMutex for the whole tree. I don't know whether adding a lock for >> every node will be faster or not because of lock overhead. >> >> However, we should compare (exact match + discovery protocol) vs >> (ndn >> lpm). Comparing performance of exact match to lpm is unfair. >> >> >> Yes, we should compare them. And we need to publish the ccnx 1.0 >> specs for doing the exact match discovery. So, as I said, I'm not >> ready to claim it's better yet because we have not done that. >> >> >> >> >> >> >> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >> I would point out that using LPM on content object to Interest >> matching to do discovery has its own set of problems. Discovery >> involves more than just "latest version" discovery too. >> >> This is probably getting off-topic from the original post about >> naming conventions. >> >> a. If Interests can be forwarded multiple directions and two >> different caches are responding, the exclusion set you build up >> talking with cache A will be invalid for cache B. If you talk >> sometimes to A and sometimes to B, you very easily could miss >> content objects you want to discover unless you avoid all range >> exclusions and only exclude explicit versions. That will lead to >> very large interest packets. In ccnx 1.0, we believe that an >> explicit discovery protocol that allows conversations about >> consistent sets is better. >> >> b. Yes, if you just want the "latest version" discovery that >> should be transitive between caches, but imagine this. You send >> Interest #1 to cache A which returns version 100. You exclude >> through 100 then issue a new interest. This goes to cache B who >> only has version 99, so the interest times out or is NACK'd. So >> you think you have it! But, cache A already has version 101, you >> just don't know.
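[Editor's sketch: scenario (b) modeled in a few lines. This is a toy model, not a real forwarder; the cache contents and version numbers follow the example in the text, and `latest_excluding` is an invented stand-in for a selector-style "latest version" query.]

```python
# Two caches behind a round-robin ("best path") forwarder.

def latest_excluding(cache, exclude_upto):
    """Selector-style query: highest version in the cache above the
    excluded range; None models a timeout/NACK."""
    candidates = [v for v in cache if v > exclude_upto]
    return max(candidates) if candidates else None

cache_a = {99, 100}   # cache A's state when Interest #1 arrives
cache_b = {99}

got = latest_excluding(cache_a, 0)          # Interest #1 -> cache A: 100
cache_a.add(101)                            # v101 later appears at cache A
# The follow-up interest (exclude through 100) is forwarded to cache B:
follow_up = latest_excluding(cache_b, got)

print(got)        # 100
print(follow_up)  # None: the consumer wrongly concludes 100 is the latest
```

With a forwarder that floods every replica, the follow-up would also reach cache A and find 101; the miss only arises when the two interests take different paths, which is the non-determinism being argued.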
If you cannot have a conversation around >> consistent sets, it seems like even doing latest version discovery >> is difficult with selector based discovery. From what I saw in >> ccnx 0.x, one ended up getting an Interest all the way to the >> authoritative source because you can never believe an intermediate >> cache that there?s not something more recent. >> >> I?m sure you?ve walked through cases (a) and (b) in ndn, I?d be >> interest in seeing your analysis. Case (a) is that a node can >> correctly discover every version of a name prefix, and (b) is that >> a node can correctly discover the latest version. We have not >> formally compared (or yet published) our discovery protocols (we >> have three, 2 for content, 1 for device) compared to selector based >> discovery, so I cannot yet claim they are better, but they do not >> have the non-determinism sketched above. >> >> c. Using LPM, there is a non-deterministic number of lookups you >> must do in the PIT to match a content object. If you have a name >> tree or a threaded hash table, those don?t all need to be hash >> lookups, but you need to walk up the name tree for every prefix of >> the content object name and evaluate the selector predicate. >> Content Based Networking (CBN) had some some methods to create data >> structures based on predicates, maybe those would be better. But >> in any case, you will potentially need to retrieve many PIT entries >> if there is Interest traffic for many prefixes of a root. Even on >> an Intel system, you?ll likely miss cache lines, so you?ll have a >> lot of NUMA access for each one. In CCNx 1.0, even a naive >> implementation only requires at most 3 lookups (one by name, one by >> name + keyid, one by name + content object hash), and one can do >> other things to optimize lookup for an extra write. >> >> d. 
In (c) above, if you have a threaded name tree or are just >> walking parent pointers, I suspect you?ll need locking of the >> ancestors in a multi-threaded system (?threaded" here meaning LWP) >> and that will be expensive. It would be interesting to see what a >> cache consistent multi-threaded name tree looks like. >> >> Marc >> >> >> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >> wrote: >> >> I had thought about these questions, but I want to know your idea >> besides typed component: >> 1. LPM allows "data discovery". How will exact match do similar >> things? >> 2. will removing selectors improve performance? How do we use >> other >> faster technique to replace selector? >> 3. fixed byte length and type. I agree more that type can be fixed >> byte, but 2 bytes for length might not be enough for future. >> >> >> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >> wrote: >> >> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >> wrote: >> >> I know how to make #2 flexible enough to do what things I can >> envision we need to do, and with a few simple conventions on >> how the registry of types is managed. >> >> >> Could you share it with us? >> >> Sure. Here?s a strawman. >> >> The type space is 16 bits, so you have 65,565 types. >> >> The type space is currently shared with the types used for the >> entire protocol, that gives us two options: >> (1) we reserve a range for name component types. Given the >> likelihood there will be at least as much and probably more need >> to component types than protocol extensions, we could reserve 1/2 >> of the type space, giving us 32K types for name components. >> (2) since there is no parsing ambiguity between name components >> and other fields of the protocol (sine they are sub-types of the >> name type) we could reuse numbers and thereby have an entire 65K >> name component types. >> >> We divide the type space into regions, and manage it with a >> registry. 
If we ever get to the point of creating an IETF >> standard, IANA has 25 years of experience running registries and >> there are well-understood rule sets for different kinds of >> registries (open, requires a written spec, requires standards >> approval). >> >> - We allocate one ?default" name component type for ?generic >> name?, which would be used on name prefixes and other common >> cases where there are no special semantics on the name component. >> - We allocate a range of name component types, say 1024, to >> globally understood types that are part of the base or extension >> NDN specifications (e.g. chunk#, version#, etc. >> - We reserve some portion of the space for unanticipated uses >> (say another 1024 types) >> - We give the rest of the space to application assignment. >> >> Make sense? >> >> >> While I?m sympathetic to that view, there are three ways in >> which Moore?s law or hardware tricks will not save us from >> performance flaws in the design >> >> >> we could design for performance, >> >> That?s not what people are advocating. We are advocating that we >> *not* design for known bad performance and hope serendipity or >> Moore?s Law will come to the rescue. >> >> but I think there will be a turning >> point when the slower design starts to become "fast enough?. >> >> Perhaps, perhaps not. Relative performance is what matters so >> things that don?t get faster while others do tend to get dropped >> or not used because they impose a performance penalty relative to >> the things that go faster. There is also the ?low-end? phenomenon >> where impovements in technology get applied to lowering cost >> rather than improving performance. For those environments bad >> performance just never get better. >> >> Do you >> think there will be some design of ndn that will *never* have >> performance improvement? >> >> I suspect LPM on data will always be slow (relative to the other >> functions). 
>> i suspect exclusions will always be slow because they will >> require extra memory references. >> >> However I of course don?t claim to clairvoyance so this is just >> speculation based on 35+ years of seeing performance improve by 4 >> orders of magnitude and still having to worry about counting >> cycles and memory references? >> >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >> wrote: >> >> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >> wrote: >> >> We should not look at a certain chip nowadays and want ndn to >> perform >> well on it. It should be the other way around: once ndn app >> becomes >> popular, a better chip will be designed for ndn. >> >> While I?m sympathetic to that view, there are three ways in >> which Moore?s law or hardware tricks will not save us from >> performance flaws in the design: >> a) clock rates are not getting (much) faster >> b) memory accesses are getting (relatively) more expensive >> c) data structures that require locks to manipulate >> successfully will be relatively more expensive, even with >> near-zero lock contention. >> >> The fact is, IP *did* have some serious performance flaws in >> its design. We just forgot those because the design elements >> that depended on those mistakes have fallen into disuse. The >> poster children for this are: >> 1. IP options. Nobody can use them because they are too slow >> on modern forwarding hardware, so they can?t be reliably used >> anywhere >> 2. the UDP checksum, which was a bad design when it was >> specified and is now a giant PITA that still causes major pain >> in working around. >> >> I?m afraid students today are being taught the that designers >> of IP were flawless, as opposed to very good scientists and >> engineers that got most of it right. >> >> I feel the discussion today and yesterday has been off-topic. >> Now I >> see that there are 3 approaches: >> 1. we should not define a naming convention at all >> 2. 
typed component: use tlv type space and add a handful of >> types >> 3. marked component: introduce only one more type and add >> additional >> marker space >> >> I know how to make #2 flexible enough to do what things I can >> envision we need to do, and with a few simple conventions on >> how the registry of types is managed. >> >> It is just as powerful in practice as either throwing up our >> hands and letting applications design their own mutually >> incompatible schemes or trying to make naming conventions with >> markers in a way that is fast to generate/parse and also >> resilient against aliasing. >> >> Also everybody thinks that the current utf8 marker naming >> convention >> needs to be revised. >> >> >> >> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >> wrote: >> Would that chip be suitable, i.e. can we expect most names >> to fit in (the >> magnitude of) 96 bytes? What length are names usually in >> current NDN >> experiments? >> >> I guess wide deployment could make for even longer names. >> Related: Many URLs >> I encounter nowadays easily don't fit within two 80-column >> text lines, and >> NDN will have to carry more information than URLs, as far as >> I see. >> >> >> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >> >> In fact, the index in separate TLV will be slower on some >> architectures, >> like the ezChip NP4. The NP4 can hold the fist 96 frame >> bytes in memory, >> then any subsequent memory is accessed only as two adjacent >> 32-byte blocks >> (there can be at most 5 blocks available at any one time). >> If you need to >> switch between arrays, it would be very expensive. If you >> have to read past >> the name to get to the 2nd array, then read it, then backup >> to get to the >> name, it will be pretty expensive too. >> >> Marc >> >> On Sep 18, 2014, at 2:02 PM, >> wrote: >> >> Does this make that much difference? >> >> If you want to parse the first 5 components. 
One way to do >> it is: >> >> Read the index, find entry 5, then read in that many bytes >> from the start >> offset of the beginning of the name. >> OR >> Start reading name, (find size + move) 5 times. >> >> How much speed are you getting from one to the other? You >> seem to imply >> that the first one is faster. I don't think this is the >> case. >> >> In the first one you'll probably have to get the cache line >> for the index, >> then all the required cache lines for the first 5 >> components. For the >> second, you'll have to get all the cache lines for the first >> 5 components. >> Given an assumption that a cache miss is way more expensive >> than >> evaluating a number and computing an addition, you might >> find that the >> performance of the index is actually slower than the >> performance of the >> direct access. >> >> Granted, there is a case where you don't access the name at >> all, for >> example, if you just get the offsets and then send the >> offsets as >> parameters to another processor/GPU/NPU/etc. In this case >> you may see a >> gain IF there are more cache line misses in reading the name >> than in >> reading the index. So, if the regular part of the name >> that you're >> parsing is bigger than the cache line (64 bytes?) and the >> name is to be >> processed by a different processor, then you might see some >> performance >> gain in using the index, but in all other circumstances I >> bet this is not >> the case. I may be wrong, haven't actually tested it. >> >> This is all to say, I don't think we should be designing the >> protocol with >> only one architecture in mind. (The architecture of sending >> the name to a >> different processor than the index.) >> >> If you have numbers that show that the index is faster I >> would like to see >> under what conditions and architectural assumptions. >> >> Nacho >> >> (I may have misinterpreted your description so feel free to >> correct me if >> I'm wrong.) 
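The two strategies Nacho compares can be sketched concretely. This is an illustrative model, not the NDN wire format: assume each name component is encoded as a 1-byte type, a 1-byte length, and the value bytes, and that the optional index is an array of component start offsets (as in the fixed-width offset proposal discussed elsewhere in this thread). The type value 8 below is an arbitrary placeholder.

```python
# Illustrative sketch (not real NDN encoding) of the two parsing
# strategies: walking nested TLVs sequentially vs jumping via a
# separate offset index.

def parse_first_n_sequential(name: bytes, n: int):
    """Start reading the name: (find size + move) n times."""
    components, pos = [], 0
    for _ in range(n):
        length = name[pos + 1]               # name[pos] is the type byte
        components.append(name[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return components

def parse_first_n_indexed(name: bytes, index, n: int):
    """Read the index, then jump straight to each component."""
    components = []
    for start in index[:n]:
        length = name[start + 1]
        components.append(name[start + 2 : start + 2 + length])
    return components

# Build a toy 5-component name /a/b/c/dd/e plus its offset index.
parts = [b"a", b"b", b"c", b"dd", b"e"]
name, index = b"", []
for p in parts:
    index.append(len(name))
    name += bytes([8, len(p)]) + p           # type 8 is an assumption

assert parse_first_n_sequential(name, 5) == parts
assert parse_first_n_indexed(name, index, 5) == parts
```

Note that both variants read every byte of the components they extract; as argued above, the index only pays off when the name bytes themselves can be skipped entirely, e.g. when just the offsets are handed to another processor.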
>> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> On 9/18/14, 12:54 AM, "Massimo Gallo" >> >> wrote: >> >> Indeed each component's offset must be encoded using a fixed >> amount of >> bytes: >> >> i.e., >> Type = Offsets >> Length = 10 Bytes >> Value = Offset1(1byte), Offset2(1byte), ... >> >> You may also imagine having an "Offset_2byte" type if your >> name is too >> long. >> >> Max >> >> On 18/09/2014 09:27, Tai-Lin Chu wrote: >> >> if you do not need the entire hierarchical structure (suppose >> you only >> want the first x components) you can directly have it using >> the >> offsets. With the Nested TLV structure you have to >> iteratively parse >> the first x-1 components. With the offset structure you can >> directly >> access the first x components. >> >> I don't get it. What you described only works if the >> "offset" is >> encoded in fixed bytes. With varNum, you will still need to >> parse x-1 >> offsets to get to the x offset. >> >> >> >> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> wrote: >> >> On 17/09/2014 14:56, Mark Stapp wrote: >> >> ah, thanks - that's helpful. I thought you were saying "I >> like the >> existing NDN UTF8 'convention'." I'm still not sure I >> understand what >> you >> _do_ prefer, though. it sounds like you're describing an >> entirely >> different >> scheme where the info that describes the name-components is >> ... >> someplace >> other than _in_ the name-components. is that correct? when >> you say >> "field >> separator", what do you mean (since that's not a "TL" from a >> TLV)? >> >> Correct. >> In particular, with our name encoding, a TLV indicates the >> name >> hierarchy >> with offsets in the name, and other TLV(s) indicate the >> offset to use >> in >> order to retrieve special components. >> As for the field separator, it is something like "/". 
>> Aliasing is >> avoided as >> you do not rely on field separators to parse the name; you >> use the >> "offset >> TLV" to do that. >> >> So now, it may be an aesthetic question but: >> >> if you do not need the entire hierarchical structure (suppose >> you only >> want >> the first x components) you can directly have it using the >> offsets. >> With the >> Nested TLV structure you have to iteratively parse the first >> x-1 >> components. >> With the offset structure you can directly access the >> first x >> components. >> >> Max >> >> >> -- Mark >> >> On 9/17/14 6:02 AM, Massimo Gallo wrote: >> >> The why is simple: >> >> You use a lot of "generic component type" and very few >> "specific >> component type". You are imposing types for every component >> in order >> to >> handle few exceptions (segmentation, etc.). You create a >> rule >> (specify >> the component's type) to handle exceptions! >> >> I would prefer not to have typed components. Instead I would >> prefer >> to >> have the name as a simple sequence of bytes with a field >> separator. Then, >> outside the name, if you have some components that could be >> used at >> the network layer (e.g. a TLV field), you simply need something >> that >> indicates which is the offset allowing you to retrieve the >> version, >> segment, etc. in the name... >> >> >> Max >> >> >> >> >> >> On 16/09/2014 20:33, Mark Stapp wrote: >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >> I think we agree on the small number of "component types". >> However, if you have a small number of types, you will end >> up with >> names >> containing many generic component types and few specific >> component >> types. Due to the fact that the component type specification >> is an >> exception in the name, I would prefer something that specifies the >> component's >> type only when needed (something like UTF8 conventions but >> that >> applications MUST use). >> >> so ... I can't quite follow that. 
the thread has had some >> explanation >> about why the UTF8 requirement has problems (with aliasing, >> e.g.) >> and >> there's been email trying to explain that applications don't >> have to >> use types if they don't need to. your email sounds like "I >> prefer >> the >> UTF8 convention", but it doesn't say why you have that >> preference in >> the face of the points about the problems. can you say why >> it is >> that >> you express a preference for the "convention" with problems? >> >> Thanks, >> Mark >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Wed Sep 24 08:35:05 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Wed, 24 Sep 2014 15:35:05 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <5D39D37F-D41D-4834-A0DB-A94322D79DA3@memphis.edu> References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> <65E2A60C-5494-4991-B78D-7164B1277176@parc.com> <5D39D37F-D41D-4834-A0DB-A94322D79DA3@memphis.edu> Message-ID: On 9/24/14, 4:37 PM, "Lan Wang (lanwang)" wrote: >Here's how I see the name discovery will work in this case: > >- the email client can request the latest list of email names in the >mailbox /username/mailbox/list (using the right-most selector). > >- of course, there may be cached lists in the network. This should be >minimized by a very short FreshnessSeconds on the data set by the email >server (e.g., 1 second). So most likely there is at most one cached list >out there. > >- after the client gets an email list (/username/mailbox/list/20), the >client should immediately issue another Interest >(/username/mailbox/list/) with the selector (>20) to see if there are >more recent ones. > >- eventually the request will get to the server. The server may respond >with a more recent list or a NACK. Let me see if I follow: A1- You publish /username/mailbox/list/1 (lifetime of 1 second) A2- You publish /username/mailbox/list/2 (lifetime of 1 second) A3- You publish /username/mailbox/list/3 (lifetime of 1 second) ... 
A20- You publish /username/mailbox/list/20 (lifetime of 1 second) B- You request /username/mailbox/list C- You receive /username/mailbox/list/20 (lifetime of 1 second) D- You request /username/mailbox/list (/>20) E- Request gets routed to actual publisher F- It replies with a NACK or a new list. Why did you do all these extra steps? Why not just do: B- You request /username/mailbox/list E- Request gets routed to actual publisher F- It replies with a new list (lifetime of 0 seconds) In Scheme A you sent 2 interests, received 2 objects, going all the way to source. In Scheme B you sent 1 interest, received 1 object, going all the way to source. Scheme B is always better (doesn't need to do C, D) for this example and it uses exact matching. You can play tricks with the lifetime of the object in both cases, selectors or not. > >- meanwhile, the email client can retrieve the emails using the names >obtained in these lists. Some emails may turn out to be unnecessary, so >they will be discarded when a more recent list comes. The email client >can also keep state about the names of the emails it has deleted to >minimize this problem. This is independent of selectors / exact matching. Nacho > >On Sep 24, 2014, at 2:37 AM, Marc.Mosko at parc.com wrote: > >> Ok, let's take that example and run with it a bit. I'll walk through a >>'discover all' example. This example leads me to why I say discovery >>should be separate from data retrieval. I don't claim that we have a >>final solution to this problem; I think in a distributed peer-to-peer >>environment solving this problem is difficult. If you have a counter >>example as to how this discovery could progress using only the >>information known a priori by the requester, I would be interested in >>seeing that example worked out. Please do correct me if you think this >>is wrong. >> >> You have mails that were originally numbered 0 - 10000, sequentially by >>the server. 
>> >> You travel between several places and access different emails from >>different places. This populates caches. Let's say 0, 3, 6, 9, ... are on >>cache A; 1, 4, 7, 10, ... are on cache B; and 2, 5, 8, 11, ... are on cache C. >>Also, you have deleted 500 random emails, so there's only 9500 emails >>actually out there. >> >> You set up a new computer and now want to download all your emails. The >>new computer is on the path of caches C, B, then A, then the >>authoritative source server. The new email program has no initial >>state. The email program only knows that the email number is an integer >>that starts at 0. It issues an interest for /mail/inbox, and asks for >>left-most child because it wants to populate in order. It gets a >>response from cache C with mail 2. >> >> Now, what does the email program do? It cannot exclude the range 0..2 >>because that would possibly miss 0 and 1. So, all it can do is exclude >>the exact number '2' and ask again. It then gets cache C again and it >>responds with '5'. There are about 3000 emails on cache C, and if they >>all take 4 bytes (for the exclude component plus its coding overhead), >>then that's 12 KB of exclusions to finally exhaust cache C. >> >> If we want Interests to avoid fragmentation, we can fit about 1200 >>bytes of exclusions, or 300 components. This means we need about 10 >>interest messages. Each interest would be something like 'exclude >>2, 5, 8, 11, ..., >300', then the next would be 'exclude <300, 302, 305, 308, >>..., >600', etc. >> >> Those interests that exclude everything at cache C would then hit, say, >>cache B and start getting results 1, 4, 7, .... This means an Interest >>like 'exclude 2, 5, 8, 11, ..., >300' would then get back number 1. That >>means the next request actually has to split that one interest's exclude >>into two (because the interest was at maximum size), so you now issue >>two interests where one is 'exclude 1, 2, 5, 8, >210' and the other is >>'<210, 212, 215, ..., >300'. 
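Marc's back-of-the-envelope numbers in this example can be reproduced in a few lines. The constants are the assumptions stated in the text: 4 bytes per excluded component, about 1200 bytes of exclusions per unfragmented Interest, roughly 3000 emails per cache, and about 10000 names discovered one exclusion at a time.

```python
# Reproduce the exclusion-cost arithmetic from this example.
# Assumptions taken from the text, not from any NDN spec.

EXCL_BYTES = 4          # bytes per excluded component, incl. overhead
INTEREST_BUDGET = 1200  # bytes of exclusions per unfragmented Interest

# Exhausting one cache of ~3000 emails by exclusion:
per_cache = 3000 * EXCL_BYTES                    # 12 KB of exclusions
interests_needed = per_cache // INTEREST_BUDGET  # ~10 Interests

# Discovering ~10000 names one at a time: round trip i carries
# roughly 4*i bytes of exclusions, so the total is an arithmetic series.
total_exclusion_bytes = sum(EXCL_BYTES * i for i in range(1, 10001))

assert per_cache == 12_000
assert interests_needed == 10
assert total_exclusion_bytes == 200_020_000      # the ~200 MB figure
```

The arithmetic series is the key point: each new Interest must repeat every exclusion made so far, which is why the total grows quadratically rather than linearly in the number of items discovered.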
>> >> If you look in the CCNx 0.8 java code, there should be a class that >>does these Interest-based discoveries and does the Interest splitting >>based on the currently known range of discovered content. I don't have >>the specific reference right now, but I can send a link if you are >>interested in seeing that. The java class keeps state of what has been >>discovered so far, so it could re-start later if interrupted. >> >> So all those interests would now be getting results from cache B. You >>would then start to split all those ranges to accommodate the numbers >>coming back from B. Eventually, you'll have at least 10 Interest >>messages outstanding that would be excluding all the 9500 messages that >>are in caches A, B, and C. Some of those interest messages might >>actually reach an authoritative server, which might respond too. It >>would likely be more than 10 interests due to the algorithm that's used to >>split full interests, which likely is not optimal because it does not >>know exactly where the breaks should be a priori. >> >> Once you have exhausted caches A, B, and C, the interest messages would >>reach the authoritative source (if it's online), and it would be issuing >>NACKs (I assume) for interests that have excluded all non-deleted emails. >> >> In any case, it takes, at best, 9,500 round trips to 'discover' all >>9500 emails. It also requires Sum_{i=1..10000} 4*i = 200,020,000 bytes >>of Interest exclusions. Note that it's an arithmetic sum of bytes of >>exclusion, because at each Interest the size of the exclusions increases >>by 4. There was an NDN paper about light bulb discovery (or something >>like that) that noted this same problem and proposed some work-around, >>but I don't remember what they proposed. >> >> Yes, you could possibly pipeline it, but what would you do? In this >>example, where emails 0 - 10000 (minus some random ones) would allow you, >>if you knew a priori, 
to issue, say, 10 interests in parallel that ask >>for different ranges. But, 2 years from now your undeleted emails might >>range from 100,000 - 150,000. The point is that a discovery protocol >>does not know, a priori, what is to be discovered. It might start >>learning some stuff as it goes on. >> >> If you could have retrieved just a table of contents from each cache, >>where each 'row' is say 64 bytes (i.e. the name continuation plus hash >>value), you would need to retrieve 3300 * 64 = 211 KB from each cache >>(total 640 KB) to list all the emails. That would take 640 KB / 1200 = >>534 interest messages of say 64 bytes = 34 KB to discover all 9500 >>emails, plus another set to fetch the header rows. That's, say, 68 KB of >>interest traffic compared to 200 MB. Now, I've not said how to list >>these tables of contents, so an actual protocol might have higher >>communication cost, but even if it was 10x worse that would still be an >>attractive tradeoff. >> >> This assumes that you publish just the 'header' in the 1st segment (say >>1 KB total object size including the signatures). That's 10 MB to learn >>the headers. >> >> You could also argue that the distribution of emails over caches is >>arbitrary. That's true, I picked a difficult sequence. But unless you >>have some positive controls on what could be in a cache, it could be any >>difficult sequence. I also did not address the timeout issue, and how >>do you know you are done? >> >> This is also why sync works so much better than doing raw interest >>discovery. Sync exchanges tables of contents and diffs; it does not >>need to enumerate everything by exclusion in order to retrieve it. >> >> Marc >> >> >> >> On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote: >> >>> discovery can be reduced to "pattern detection" (can we infer what >>> exists?) and "pattern validation" (can we confirm this guess?) >>> >>> For example, I see a pattern /mail/inbox/148. 
I, a human being, see a >>> pattern with static (/mail/inbox) and variable (148) components; with >>> proper naming convention, computers can also detect this pattern >>> easily. Now I want to look for all mails in my inbox. I can generate a >>> list of /mail/inbox/. These are my guesses, and with selectors >>> I can further refine my guesses. >>> >>> To validate them, a bloom filter can provide "best effort" >>> discovery (with some false positives, so I call it "best-effort") >>> before I stupidly send all the interests to the network. >>> >>> The discovery protocol, as I described above, is essentially "pattern >>> detection by naming convention" and "bloom filter validation." This is >>> definitely one of the "simpler" discovery protocols, because the data >>> producer only needs to add an additional bloom filter. Notice that we can >>> progressively add entries to the bloom filter with low computation cost. >>> >>> >>> >>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>> Ok, yes I think those would all be good things. >>>> >>>> One thing to keep in mind, especially with things like time series >>>>sensor >>>> data, is that people see a pattern and infer a way of doing it. >>>>That's easy >>>> for a human :) But in Discovery, one should assume that one does not >>>>know >>>> of patterns in the data beyond what the protocols used to publish the >>>>data >>>> explicitly require. That said, I think some of the things you listed >>>>are >>>> good places to start: sensor data, web content, climate data or >>>>genome data. >>>> >>>> We also need to state what the forwarding strategies are and what the >>>>cache >>>> behavior is. >>>> >>>> I outlined some of the points that I think are important in that other >>>> posting. While 'discover latest' is useful, 'discover all' is also >>>> important, and that one gets complicated fast. 
So points like >>>>separating >>>> discovery from retrieval and working with large data sets have been >>>> important in shaping our thinking. That all said, I'd be happy >>>>starting >>>> from 0 and working through the Discovery service definition from >>>>scratch >>>> along with data set use cases. >>>> >>>> Marc >>>> >>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>wrote: >>>> >>>> Hi Marc, >>>> >>>> Thanks, yes, I saw that as well. I was just trying to get one step >>>>more >>>> specific, which was to see if we could identify a few specific use >>>>cases >>>> around which to have the conversation. (e.g., time series sensor >>>>data and >>>> web content retrieval for "get latest"; climate data for huge data >>>>sets; >>>> local data in a vehicular network; etc.) What have you been looking >>>>at >>>> that's driving considerations of discovery? >>>> >>>> Thanks, >>>> Jeff >>>> >>>> From: >>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>> To: Jeff Burke >>>> Cc: , >>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>> >>>> Jeff, >>>> >>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>Discovery. >>>> >>>> >>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>>> >>>> I think it would be very productive to talk about what Discovery >>>>should do, >>>> and not focus on the how. It is sometimes easy to get caught up in >>>>the how, >>>> which I think is a less important topic than the what at this stage. >>>> >>>> Marc >>>> >>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>wrote: >>>> >>>> Marc, >>>> >>>> If you can't talk about your protocols, perhaps we can discuss this >>>>based >>>> on use cases. What are the use cases you are using to evaluate >>>> discovery? 
>>>> >>>> Jeff >>>> >>>> >>>> >>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>wrote: >>>> >>>> No matter what the expressiveness of the predicates, if the forwarder >>>>can >>>> send interests different ways, you don't have a consistent underlying >>>>set >>>> to talk about, so you would always need non-range exclusions to >>>>discover >>>> every version. >>>> >>>> Range exclusions only work, I believe, if you get an authoritative >>>>answer. >>>> If different content pieces are scattered between different caches, I >>>> don't see how range exclusions would work to discover every version. >>>> >>>> I'm sorry to be pointing out problems without offering solutions, but >>>> we're not ready to publish our discovery protocols. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>> >>>> I see. Can you briefly describe how the ccnx discovery protocol solves >>>> all the problems that you mentioned (not just exclude)? A doc would be >>>> better. >>>> >>>> My unserious conjecture( :) ): exclude is equal to [not]. I will soon >>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>> language or context-free language might become part of selector too. >>>> >>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>> That will get you one reading, then you need to exclude it and ask >>>> again. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes, not range excludes, if you want to discover all the versions >>>> of an object. >>>> >>>> >>>> I am very confused. For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. 
>>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>> If you talk sometimes to A and sometimes to B, you very easily >>>> could miss content objects you want to discover unless you avoid >>>> all range exclusions and only exclude explicit versions. >>>> >>>> >>>> Could you explain why the missing content object situation happens? Also, >>>> range exclusion is just a shorter notation for many explicit >>>> excludes; >>>> converting from explicit excludes to a ranged exclude is always >>>> possible. >>>> >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes, not range excludes, if you want to discover all the versions >>>> of an object. For something like a sensor reading that is updated, >>>> say, once per second, you will have 86,400 of them per day. If each >>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>> exclusions (plus encoding overhead) per day. >>>> >>>> Yes, maybe using a more deterministic version number than a >>>> timestamp makes sense here, but it's just an example of needing a lot >>>> of exclusions. >>>> >>>> >>>> You exclude through 100 then issue a new interest. This goes to >>>> cache B >>>> >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, >>>> if >>>> this goes to cache B only, it means that cache A dies. How do you >>>> know >>>> that v101 even exists? >>>> >>>> >>>> I guess this depends on what the forwarding strategy is. If the >>>> forwarder will always send each interest to all replicas, then yes, >>>> modulo packet loss, you would discover v101 on cache A. If the >>>> forwarder is just doing 'best path' and can round-robin between cache >>>> A and cache B, then your application could miss v101. 
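The round-robin scenario described here can be made concrete with a toy simulation. This is a sketch under simplified, hypothetical semantics: a cache answers a "latest version" Interest with its highest version above the excluded range, or with nothing (a timeout or NACK) if no version qualifies, and the forwarder strictly alternates between two caches.

```python
# Toy model of selector-based "latest version" discovery when the
# forwarder round-robins between two caches with different contents.

cache_a = {99, 100}    # cache A; v101 will arrive here shortly
cache_b = {98, 99}     # cache B lags behind

def ask(cache, exclude_through):
    """A cache returns its highest non-excluded version, else None."""
    candidates = [v for v in cache if v > exclude_through]
    return max(candidates) if candidates else None

exclude = -1                 # nothing excluded yet

# Interest #1 goes to cache A and returns v100.
v = ask(cache_a, exclude)
assert v == 100
exclude = v                  # "exclude through 100"

cache_a.add(101)             # meanwhile v101 is published, cached at A

# Interest #2 round-robins to cache B, which has nothing newer than
# v99, so it times out / NACKs. The client concludes v100 is latest...
v = ask(cache_b, exclude)
assert v is None and 101 in cache_a   # ...but v101 sits unseen in A
```

The miss is not a bug in either cache; it follows from the two caches never presenting a consistent set, which is the point being argued in this part of the thread.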
>>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number >>>> of components. In my own thread-safe LPM implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>> every node will be faster or not, because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs >>>> (ndn >>>> lpm). Comparing performance of exact match to lpm is unfair. >>>> >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>> specs for doing the exact match discovery. So, as I said, I'm not >>>> ready to claim it's better yet because we have not done that. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest >>>> matching to do discovery has its own set of problems. Discovery >>>> involves more than just 'latest version' discovery too. >>>> >>>> This is probably getting off-topic from the original post about >>>> naming conventions. >>>> >>>> a. If Interests can be forwarded in multiple directions and two >>>> different caches are responding, the exclusion set you build up >>>> talking with cache A will be invalid for cache B. If you talk >>>> sometimes to A and sometimes to B, you very easily could miss >>>> content objects you want to discover unless you avoid all range >>>> exclusions and only exclude explicit versions. That will lead to >>>> very large interest packets. In ccnx 1.0, we believe that an >>>> explicit discovery protocol that allows conversations about >>>> consistent sets is better. >>>> >>>> b. Yes, if you just want the 'latest version' discovery that >>>> should be transitive between caches, but imagine this. You send >>>> Interest #1 to cache A, which returns version 100. You exclude >>>> through 100 then issue a new interest. This goes to cache B, which >>>> only has version 99, so the interest times out or is NACK'd. So >>>> you think you have it! 
But, cache A already has version 101, you >>>> just don't know. If you cannot have a conversation around >>>> consistent sets, it seems like even doing latest version discovery >>>> is difficult with selector based discovery. From what I saw in >>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>> authoritative source because you can never believe an intermediate >>>> cache that there's not something more recent. >>>> >>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>> interested in seeing your analysis. Case (a) is that a node can >>>> correctly discover every version of a name prefix, and (b) is that >>>> a node can correctly discover the latest version. We have not >>>> formally compared (or yet published) our discovery protocols (we >>>> have three, 2 for content, 1 for device) compared to selector based >>>> discovery, so I cannot yet claim they are better, but they do not >>>> have the non-determinism sketched above. >>>> >>>> c. Using LPM, there is a non-deterministic number of lookups you >>>> must do in the PIT to match a content object. If you have a name >>>> tree or a threaded hash table, those don't all need to be hash >>>> lookups, but you need to walk up the name tree for every prefix of >>>> the content object name and evaluate the selector predicate. >>>> Content Based Networking (CBN) had some methods to create data >>>> structures based on predicates, maybe those would be better. But >>>> in any case, you will potentially need to retrieve many PIT entries >>>> if there is Interest traffic for many prefixes of a root. Even on >>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>> implementation only requires at most 3 lookups (one by name, one by >>>> name + keyid, one by name + content object hash), and one can do >>>> other things to optimize lookup for an extra write. >>>> >>>> d. 
In (c) above, if you have a threaded name tree or are just >>>> walking parent pointers, I suspect you'll need locking of the >>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>> and that will be expensive. It would be interesting to see what a >>>> cache consistent multi-threaded name tree looks like. >>>> >>>> Marc >>>> >>>> >>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I had thought about these questions, but I want to know your idea >>>> besides typed components: >>>> 1. LPM allows "data discovery". How will exact match do similar >>>> things? >>>> 2. will removing selectors improve performance? How do we use >>>> other >>>> faster techniques to replace selectors? >>>> 3. fixed byte length and type. I agree more that type can be fixed >>>> bytes, but 2 bytes for length might not be enough for the future. >>>> >>>> >>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I know how to make #2 flexible enough to do the things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> >>>> Could you share it with us? >>>> >>>> Sure. Here's a strawman. >>>> >>>> The type space is 16 bits, so you have 65,536 types. >>>> >>>> The type space is currently shared with the types used for the >>>> entire protocol, which gives us two options: >>>> (1) we reserve a range for name component types. Given the >>>> likelihood there will be at least as much and probably more need >>>> for component types than protocol extensions, we could reserve 1/2 >>>> of the type space, giving us 32K types for name components. >>>> (2) since there is no parsing ambiguity between name components >>>> and other fields of the protocol (since they are sub-types of the >>>> name type) we could reuse numbers and thereby have an entire 65K >>>> name component types. 
>>>> >>>> We divide the type space into regions, and manage it with a >>>> registry. If we ever get to the point of creating an IETF >>>> standard, IANA has 25 years of experience running registries and >>>> there are well-understood rule sets for different kinds of >>>> registries (open, requires a written spec, requires standards >>>> approval). >>>> >>>> - We allocate one "default" name component type for "generic >>>> name", which would be used on name prefixes and other common >>>> cases where there are no special semantics on the name component. >>>> - We allocate a range of name component types, say 1024, to >>>> globally understood types that are part of the base or extension >>>> NDN specifications (e.g. chunk#, version#, etc.) >>>> - We reserve some portion of the space for unanticipated uses >>>> (say another 1024 types) >>>> - We give the rest of the space to application assignment. >>>> >>>> Make sense? >>>> >>>> >>>> While I'm sympathetic to that view, there are three ways in >>>> which Moore's law or hardware tricks will not save us from >>>> performance flaws in the design >>>> >>>> >>>> we could design for performance, >>>> >>>> That's not what people are advocating. We are advocating that we >>>> *not* design for known bad performance and hope serendipity or >>>> Moore's Law will come to the rescue. >>>> >>>> but I think there will be a turning >>>> point when the slower design starts to become "fast enough". >>>> >>>> Perhaps, perhaps not. Relative performance is what matters, so >>>> things that don't get faster while others do tend to get dropped >>>> or not used because they impose a performance penalty relative to >>>> the things that go faster. There is also the "low-end" phenomenon, >>>> where improvements in technology get applied to lowering cost >>>> rather than improving performance. For those environments, bad >>>> performance just never gets better. 
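The strawman registry partition sketched above can be written down as a table of ranges. The 1024-entry sizes come from the text; the exact boundary values below are illustrative guesses, not part of the proposal.

```python
# Strawman partition of a 16-bit name-component type space, following
# the allocation sketched above. Boundaries are hypothetical.

REGISTRY = [
    (0x0000, 0x0000, "generic name component (the one default type)"),
    (0x0001, 0x0400, "globally understood types (chunk#, version#, ...)"),
    (0x0401, 0x0800, "reserved for unanticipated uses"),
    (0x0801, 0xFFFF, "application-assigned types"),
]

def classify(t: int) -> str:
    """Return the registry region a component type falls into."""
    for lo, hi, desc in REGISTRY:
        if lo <= t <= hi:
            return desc
    raise ValueError(f"type {t:#06x} outside the 16-bit space")

# The regions tile the whole 65,536-value space with no gaps.
assert sum(hi - lo + 1 for lo, hi, _ in REGISTRY) == 65536
assert classify(0x0002).startswith("globally")
assert classify(0x9000).startswith("application")
```

Writing it out this way makes the registry trade-off visible: the two fixed 1024-entry regions are small compared to the roughly 63K types left for applications.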
>>>> >>>> Do you >>>> think there will be some design of ndn that will *never* have >>>> performance improvement? >>>> >>>> I suspect LPM on data will always be slow (relative to the other >>>> functions). >>>> I suspect exclusions will always be slow because they will >>>> require extra memory references. >>>> >>>> However, I of course don't claim clairvoyance, so this is just >>>> speculation based on 35+ years of seeing performance improve by 4 >>>> orders of magnitude and still having to worry about counting >>>> cycles and memory references ... >>>> >>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> We should not look at a certain chip nowadays and want ndn to >>>> perform >>>> well on it. It should be the other way around: once an ndn app >>>> becomes >>>> popular, a better chip will be designed for ndn. >>>> >>>> While I'm sympathetic to that view, there are three ways in >>>> which Moore's law or hardware tricks will not save us from >>>> performance flaws in the design: >>>> a) clock rates are not getting (much) faster >>>> b) memory accesses are getting (relatively) more expensive >>>> c) data structures that require locks to manipulate >>>> successfully will be relatively more expensive, even with >>>> near-zero lock contention. >>>> >>>> The fact is, IP *did* have some serious performance flaws in >>>> its design. We just forgot those because the design elements >>>> that depended on those mistakes have fallen into disuse. The >>>> poster children for this are: >>>> 1. IP options. Nobody can use them because they are too slow >>>> on modern forwarding hardware, so they can't be reliably used >>>> anywhere >>>> 2. the UDP checksum, which was a bad design when it was >>>> specified and is now a giant PITA that still causes major pain >>>> in working around. 
>>>> >>>> I'm afraid students today are being taught that the designers >>>> of IP were flawless, as opposed to very good scientists and >>>> engineers who got most of it right. >>>> >>>> I feel the discussion today and yesterday has been off-topic. >>>> Now I >>>> see that there are 3 approaches: >>>> 1. we should not define a naming convention at all >>>> 2. typed component: use TLV type space and add a handful of >>>> types >>>> 3. marked component: introduce only one more type and add >>>> additional >>>> marker space >>>> >>>> I know how to make #2 flexible enough to do the things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> It is just as powerful in practice as either throwing up our >>>> hands and letting applications design their own mutually >>>> incompatible schemes or trying to make naming conventions with >>>> markers in a way that is fast to generate/parse and also >>>> resilient against aliasing. >>>> >>>> Also, everybody thinks that the current utf8 marker naming >>>> convention >>>> needs to be revised. >>>> >>>> >>>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>> wrote: >>>> Would that chip be suitable, i.e. can we expect most names >>>> to fit in (the >>>> magnitude of) 96 bytes? What length are names usually in >>>> current NDN >>>> experiments? >>>> >>>> I guess wide deployment could make for even longer names. >>>> Related: Many URLs >>>> I encounter nowadays easily don't fit within two 80-column >>>> text lines, and >>>> NDN will have to carry more information than URLs, as far as >>>> I see. >>>> >>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in a separate TLV will be slower on some >>>> architectures, >>>> like the ezChip NP4. 
The NP4 can hold the first 96 frame >>>> bytes in memory; >>>> any subsequent memory is accessed only as two adjacent >>>> 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). >>>> If you need to >>>> switch between arrays, it would be very expensive. If you >>>> have to read past >>>> the name to get to the 2nd array, then read it, then back up >>>> to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components, one way to do >>>> it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes >>>> from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading the name, (find size + move) 5 times. >>>> >>>> How much speed are you getting from one to the other? You >>>> seem to imply >>>> that the first one is faster. I don't think this is the >>>> case. >>>> >>>> In the first one you'll probably have to get the cache line >>>> for the index, >>>> then all the required cache lines for the first 5 >>>> components. For the >>>> second, you'll have to get all the cache lines for the first >>>> 5 components. >>>> Given the assumption that a cache miss is way more expensive >>>> than >>>> evaluating a number and computing an addition, you might >>>> find that the >>>> performance of the index is actually slower than the >>>> performance of >>>> direct access. >>>> >>>> Granted, there is a case where you don't access the name at >>>> all, for >>>> example, if you just get the offsets and then send the >>>> offsets as >>>> parameters to another processor/GPU/NPU/etc. In this case >>>> you may see a >>>> gain IF there are more cache line misses in reading the name >>>> than in >>>> reading the index. So, if the regular part of the name >>>> that you're >>>> parsing is bigger than the cache line (64 bytes?) 
and the >>>> name is to be >>>> processed by a different processor, then you might see some >>>> performance >>>> gain in using the index, but in all other circumstances I >>>> bet this is not >>>> the case. I may be wrong; I haven't actually tested it. >>>> >>>> This is all to say, I don't think we should be designing the >>>> protocol with >>>> only one architecture in mind. (The architecture of sending >>>> the name to a >>>> different processor than the index.) >>>> >>>> If you have numbers that show that the index is faster, I >>>> would like to see >>>> under what conditions and architectural assumptions. >>>> >>>> Nacho >>>> >>>> (I may have misinterpreted your description, so feel free to >>>> correct me if >>>> I'm wrong.) >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>> >>>> wrote: >>>> >>>> Indeed, each component's offset must be encoded using a fixed >>>> amount of >>>> bytes: >>>> >>>> i.e., >>>> Type = Offsets >>>> Length = 10 Bytes >>>> Value = Offset1(1byte), Offset2(1byte), ... >>>> >>>> You may also imagine having an "Offset_2byte" type if your >>>> name is too >>>> long. >>>> >>>> Max >>>> >>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> >>>> If you do not need the entire hierarchical structure (suppose >>>> you only >>>> want the first x components) you can directly have it using >>>> the >>>> offsets. With the nested TLV structure you have to >>>> iteratively parse >>>> the first x-1 components. With the offset structure you can >>>> directly >>>> access the first x components. >>>> >>>> I don't get it. What you described only works if the >>>> "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to >>>> parse x-1 >>>> offsets to get to the x-th offset. 
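The two strategies Nacho compares (offset index vs. sequential size-and-move), together with Massimo's fixed-width "Offsets" TLV, can be sketched roughly like this. The encoding below is a simplified toy TLV (1-byte type, 1-byte length), not the actual NDN wire format:

```python
import struct

# Toy TLV name encoding: each component is (1-byte type, 1-byte length,
# value). This is a simplified stand-in, not the real NDN wire format.
def encode_name(components):
    return b"".join(struct.pack("BB", 8, len(c)) + c for c in components)

# Strategy 1: sequential walk -- (find size + move) x times,
# touching every component's TL bytes along the way.
def first_x_sequential(buf, x):
    out, pos = [], 0
    for _ in range(x):
        length = buf[pos + 1]
        out.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return out

# Strategy 2: a fixed-width offset index, as in Massimo's "Offsets" TLV.
# Once built, the i-th component is found with a single lookup; with
# variable-length (varNum) offsets you would be back to walking,
# which is Tai-Lin's objection above.
def build_offset_index(buf):
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 2 + buf[pos + 1]
    return offsets

def component_at(buf, index, i):
    pos = index[i]
    return buf[pos + 2 : pos + 2 + buf[pos + 1]]
```

Both strategies return the same components; Nacho's point is that the difference lies in which cache lines each one touches, not in the bookkeeping arithmetic.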
>>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>> >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> >>>> ah, thanks - that's helpful. I thought you were saying "I >>>> like the >>>> existing NDN UTF8 'convention'." I'm still not sure I >>>> understand what >>>> you >>>> _do_ prefer, though. It sounds like you're describing an >>>> entirely >>>> different >>>> scheme where the info that describes the name-components is >>>> ... >>>> someplace >>>> other than _in_ the name-components. Is that correct? When >>>> you say >>>> "field >>>> separator", what do you mean (since that's not a "TL" from a >>>> TLV)? >>>> >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the >>>> name >>>> hierarchy >>>> with offsets in the name, and other TLV(s) indicate the >>>> offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/". >>>> Aliasing is >>>> avoided, as >>>> you do not rely on field separators to parse the name; you >>>> use the >>>> "offset >>>> TLV" to do that. >>>> >>>> So now, it may be an aesthetic question, but: >>>> >>>> if you do not need the entire hierarchical structure (suppose >>>> you only >>>> want >>>> the first x components) you can directly have it using the >>>> offsets. >>>> With the >>>> nested TLV structure you have to iteratively parse the first >>>> x-1 >>>> components. >>>> With the offset structure you can directly access the >>>> first x >>>> components. >>>> >>>> Max >>>> >>>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few >>>> "specific >>>> component type". You are imposing types for every component >>>> in order >>>> to >>>> handle a few exceptions (segmentation, etc.). You create a >>>> rule >>>> (specifying >>>> the component's type) to handle exceptions! >>>> >>>> I would prefer not to have typed components. 
Instead, I would >>>> prefer >>>> to >>>> have the name as a simple sequence of bytes with a field >>>> separator. Then, >>>> outside the name, if you have some components that could be >>>> used at the >>>> network layer (e.g. a TLV field), you simply need something >>>> that >>>> indicates the offset allowing you to retrieve the >>>> version, >>>> segment, etc. in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end >>>> up with >>>> names >>>> containing many generic component types and few specific >>>> component >>>> types. Due to the fact that the component type specification >>>> is an >>>> exception in the name, I would prefer something that specifies the >>>> component's >>>> type only when needed (something like the UTF8 conventions, but >>>> ones that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, >>>> e.g.) >>>> and >>>> there's been email trying to explain that applications don't >>>> have to >>>> use types if they don't need to. your email sounds like "I >>>> prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that >>>> preference in >>>> the face of the points about the problems. can you say why >>>> it is >>>> that >>>> you express a preference for the "convention" with problems? >>>> >>>> Thanks, >>>> Mark >>>> >>>> . 
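The aliasing problem Mark alludes to can be shown in a few lines: with an in-band marker convention, the marker byte lives in the component's value, so opaque application data that happens to begin with that byte is indistinguishable from a "special" component, while a typed component carries its type out-of-band in the T field. A toy sketch follows (0xFD as the version marker follows the NDN naming-conventions memo; the typed-component type numbers are made up for illustration):

```python
# Sketch of marker aliasing. 0xFD as a version marker follows the NDN
# naming-conventions memo; everything else here is an illustrative toy.

VERSION_MARKER = 0xFD

def is_version_marker(component: bytes) -> bool:
    # Marker convention: meaning is inferred from the value bytes alone.
    return len(component) > 0 and component[0] == VERSION_MARKER

# A real version component, and an opaque application payload that
# merely *starts* with the same byte, are indistinguishable:
version = bytes([VERSION_MARKER]) + (1).to_bytes(2, "big")
payload = bytes([VERSION_MARKER]) + b"\x12\x34"   # arbitrary app data
assert is_version_marker(version) and is_version_marker(payload)

# With typed components, the type travels out-of-band in the T field,
# so the value bytes can be anything (type numbers below are made up):
TYPE_GENERIC, TYPE_VERSION = 0x08, 0x23
typed_version = (TYPE_VERSION, (1).to_bytes(2, "big"))
typed_payload = (TYPE_GENERIC, bytes([VERSION_MARKER]) + b"\x12\x34")
assert typed_version[0] != typed_payload[0]       # no aliasing
```

This is the trade-off the thread circles around: markers keep the name a plain byte sequence but can alias, while types remove the ambiguity at the cost of a registry.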
>>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jburke at remap.ucla.edu Wed Sep 24 
08:46:00 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Wed, 24 Sep 2014 15:46:00 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: Message-ID: On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote: >On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: > >>For example, I see a pattern /mail/inbox/148. I, a human being, see a >>pattern with static (/mail/inbox) and variable (148) components; with >>proper naming convention, computers can also detect this pattern >>easily. Now I want to look for all mails in my inbox. I can generate a >>list of /mail/inbox/. These are my guesses, and with selectors >>I can further refine my guesses. > >I think this is a very bad example (or at least a very bad application >design). You have an app (a mail server / inbox) and you want it to list >your emails? An email list is an application data structure. I don't >think you should use the network structure to reflect this. I think Tai-Lin is trying to sketch a small example, not propose a full-scale approach to email. (Maybe I am misunderstanding.) Another way to look at it is that if the network architecture is providing the equivalent of distributed storage to the application, perhaps the application data structure could be adapted to match the affordances of the network. Then it would not be so bad that the two structures were aligned. > >I'll give you an example: how do you delete emails from your inbox? If an >email was cached in the network, can it ever be deleted from your inbox? This is conflating two issues - what you are pointing out is that the data structure of a linear list doesn't handle common email management operations well. Again, I'm not sure if that's what he was getting at here. But deletion is not the issue - the availability of a data object on the network does not necessarily mean it's valid from the perspective of the application. >Or moved to another mailbox? Do you rely on the emails expiring? 
> >This problem is true for most (any?) situations where you use network name >structure to directly reflect the application data structure. Not sure I understand how you make the leap from the example to the general statement. Jeff > >Nacho > > > >>On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>> Ok, yes I think those would all be good things. >>> >>> One thing to keep in mind, especially with things like time series >>>sensor >>> data, is that people see a pattern and infer a way of doing it. That's >>>easy >>> for a human :) But in Discovery, one should assume that one does not >>>know >>> of patterns in the data beyond what the protocols used to publish the >>>data >>> explicitly require. That said, I think some of the things you listed >>>are >>> good places to start: sensor data, web content, climate data or genome >>>data. >>> >>> We also need to state what the forwarding strategies are and what the >>>cache >>> behavior is. >>> >>> I outlined some of the points that I think are important in that other >>> posting. While "discover latest" is useful, "discover all" is also >>> important, and that one gets complicated fast. So points like >>>separating >>> discovery from retrieval and working with large data sets have been >>> important in shaping our thinking. That all said, I'd be happy >>>starting >>> from 0 and working through the Discovery service definition from >>>scratch >>> along with data set use cases. >>> >>> Marc >>> >>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>wrote: >>> >>> Hi Marc, >>> >>> Thanks - yes, I saw that as well. I was just trying to get one step >>>more >>> specific, which was to see if we could identify a few specific use >>>cases >>> around which to have the conversation. (e.g., time series sensor data >>>and >>> web content retrieval for "get latest"; climate data for huge data >>>sets; >>> local data in a vehicular network; etc.) What have you been looking at >>> that's driving considerations of discovery? 
>>> >>> Thanks, >>> Jeff >>> >>> From: >>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>> To: Jeff Burke >>> Cc: , >>> Subject: Re: [Ndn-interest] any comments on naming convention? >>> >>> Jeff, >>> >>> Take a look at my posting (that Felix fixed) in a new thread on >>>Discovery. >>> >>> >>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>> >>> I think it would be very productive to talk about what Discovery should >>>do, >>> and not focus on the how. It is sometimes easy to get caught up in the >>>how, >>> which I think is a less important topic than the what at this stage. >>> >>> Marc >>> >>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>wrote: >>> >>> Marc, >>> >>> If you can't talk about your protocols, perhaps we can discuss this >>>based >>> on use cases. What are the use cases you are using to evaluate >>> discovery? >>> >>> Jeff >>> >>> >>> >>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>wrote: >>> >>> No matter what the expressiveness of the predicates, if the forwarder >>>can >>> send interests different ways you don't have a consistent underlying >>>set >>> to talk about, so you would always need non-range exclusions to discover >>> every version. >>> >>> Range exclusions only work, I believe, if you get an authoritative >>>answer. >>> If different content pieces are scattered between different caches, I >>> don't see how range exclusions would work to discover every version. >>> >>> I'm sorry to be pointing out problems without offering solutions, but >>> we're not ready to publish our discovery protocols. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how the ccnx discovery protocol solves >>> all the problems that you mentioned (not just exclude)? A doc would be >>> better. >>> >>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >>> expect [and] and [or], so that boolean algebra is fully supported. 
Regular >>> languages or context-free languages might become part of selectors too. >>> >>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>> That will get you one reading; then you need to exclude it and ask >>> again. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. >>> >>> >>> I am very confused. For your example, if I want to get all of today's >>> sensor data, I just do (Any..Last second of last day)(First second of >>> tomorrow..Any). That's 18 bytes. >>> >>> >>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>> >>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>> >>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>> >>> If you talk sometimes to A and sometimes to B, you very easily >>> could miss content objects you want to discover unless you avoid >>> all range exclusions and only exclude explicit versions. >>> >>> >>> Could you explain why the missing-content-object situation happens? Also, >>> range exclusion is just a shorter notation for many explicit >>> excludes; >>> converting from explicit excludes to a ranged exclude is always >>> possible. >>> >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. For something like a sensor reading that is updated, >>> say, once per second, you will have 86,400 of them per day. If each >>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>> exclusions (plus encoding overhead) per day. >>> >>> Yes, maybe using a more deterministic version number than a >>> timestamp makes sense here, but it's just an example of needing a lot >>> of exclusions. 
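Marc's back-of-the-envelope numbers are easy to check, and they frame the trade-off against Tai-Lin's 18-byte range exclusion:

```python
# Reproducing the exclusion-overhead arithmetic from the message above:
# one sensor reading per second, each excluded by an 8-byte timestamp.
READINGS_PER_DAY = 24 * 60 * 60            # 86,400 readings
TIMESTAMP_BYTES = 8

explicit_exclusions = READINGS_PER_DAY * TIMESTAMP_BYTES   # bytes/day

# Tai-Lin's counterpoint: one range exclusion covering the same day,
# e.g. (Any..end of yesterday)(start of tomorrow..Any), stays a
# constant ~18 bytes regardless of how many readings exist.
RANGE_EXCLUSION_BYTES = 18

print(explicit_exclusions)                 # 691,200 bytes per day
```

The disagreement in the thread is not about this arithmetic but about whether range exclusions are safe at all when different caches hold different subsets of the versions.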
>>> >>> >>> You exclude through 100 then issue a new interest. This goes to >>> cache B >>> >>> >>> I feel this case is invalid because cache A will also get the >>> interest, and cache A will return v101 if it exists. Like you said, >>> if >>> this goes to cache B only, it means that cache A died. How do you >>> know >>> that v101 even exists? >>> >>> >>> I guess this depends on what the forwarding strategy is. If the >>> forwarder will always send each interest to all replicas, then yes, >>> modulo packet loss, you would discover v101 on cache A. If the >>> forwarder is just doing "best path" and can round-robin between cache >>> A and cache B, then your application could miss v101. >>> >>> >>> >>> c,d In general I agree that LPM performance is related to the number >>> of components. In my own thread-safe LPM implementation, I used only >>> one RWMutex for the whole tree. I don't know whether adding a lock for >>> every node will be faster or not because of lock overhead. >>> >>> However, we should compare (exact match + discovery protocol) vs >>> (ndn >>> lpm). Comparing performance of exact match to lpm is unfair. >>> >>> >>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>> specs for doing the exact match discovery. So, as I said, I'm not >>> ready to claim it's better yet because we have not done that. >>> >>> >>> >>> >>> >>> >>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>> I would point out that using LPM on content object to Interest >>> matching to do discovery has its own set of problems. Discovery >>> involves more than just "latest version" discovery too. >>> >>> This is probably getting off-topic from the original post about >>> naming conventions. >>> >>> a. If Interests can be forwarded multiple directions and two >>> different caches are responding, the exclusion set you build up >>> talking with cache A will be invalid for cache B. 
If you talk >>> sometimes to A and sometimes to B, you very easily could miss >>> content objects you want to discover unless you avoid all range >>> exclusions and only exclude explicit versions. That will lead to >>> very large interest packets. In ccnx 1.0, we believe that an >>> explicit discovery protocol that allows conversations about >>> consistent sets is better. >>> >>> b. Yes, if you just want the "latest version" discovery that >>> should be transitive between caches, but imagine this. You send >>> Interest #1 to cache A, which returns version 100. You exclude >>> through 100 then issue a new interest. This goes to cache B, which >>> only has version 99, so the interest times out or is NACK'd. So >>> you think you have it! But cache A already has version 101; you >>> just don't know. If you cannot have a conversation around >>> consistent sets, it seems like even doing latest version discovery >>> is difficult with selector based discovery. From what I saw in >>> ccnx 0.x, one ended up getting an Interest all the way to the >>> authoritative source because you can never believe an intermediate >>> cache that there's not something more recent. >>> >>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be >>> interested in seeing your analysis. Case (a) is that a node can >>> correctly discover every version of a name prefix, and (b) is that >>> a node can correctly discover the latest version. We have not >>> formally compared (or yet published) our discovery protocols (we >>> have three, 2 for content, 1 for device) against selector based >>> discovery, so I cannot yet claim they are better, but they do not >>> have the non-determinism sketched above. >>> >>> c. Using LPM, there is a non-deterministic number of lookups you >>> must do in the PIT to match a content object. 
If you have a name >>> tree or a threaded hash table, those don't all need to be hash >>> lookups, but you need to walk up the name tree for every prefix of >>> the content object name and evaluate the selector predicate. >>> Content Based Networking (CBN) had some methods to create data >>> structures based on predicates; maybe those would be better. But >>> in any case, you will potentially need to retrieve many PIT entries >>> if there is Interest traffic for many prefixes of a root. Even on >>> an Intel system, you'll likely miss cache lines, so you'll have a >>> lot of NUMA accesses for each one. In CCNx 1.0, even a naive >>> implementation only requires at most 3 lookups (one by name, one by >>> name + keyid, one by name + content object hash), and one can do >>> other things to optimize lookup for an extra write. >>> >>> d. In (c) above, if you have a threaded name tree or are just >>> walking parent pointers, I suspect you'll need locking of the >>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>> and that will be expensive. It would be interesting to see what a >>> cache consistent multi-threaded name tree looks like. >>> >>> Marc >>> >>> >>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>> wrote: >>> >>> I had thought about these questions, but I want to know your idea >>> besides typed components: >>> 1. LPM allows "data discovery". How will exact match do similar >>> things? >>> 2. Will removing selectors improve performance? How do we use >>> other, faster techniques to replace selectors? >>> 3. Fixed byte length and type. I agree that the type can be a fixed >>> byte, but 2 bytes for length might not be enough for the future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>> wrote: >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. 
>>> >>> >>> Could you share it with us? >>> >>> Sure. Here's a strawman. >>> >>> The type space is 16 bits, so you have 65,536 types. >>> >>> The type space is currently shared with the types used for the >>> entire protocol, which gives us two options: >>> (1) we reserve a range for name component types. Given the >>> likelihood there will be at least as much and probably more need >>> for component types than protocol extensions, we could reserve 1/2 >>> of the type space, giving us 32K types for name components. >>> (2) since there is no parsing ambiguity between name components >>> and other fields of the protocol (since they are sub-types of the >>> name type) we could reuse numbers and thereby have the entire 65K >>> name component type space. >>> >>> We divide the type space into regions, and manage it with a >>> registry. If we ever get to the point of creating an IETF >>> standard, IANA has 25 years of experience running registries and >>> there are well-understood rule sets for different kinds of >>> registries (open, requires a written spec, requires standards >>> approval). >>> >>> - We allocate one "default" name component type for "generic >>> name", which would be used on name prefixes and other common >>> cases where there are no special semantics on the name component. >>> - We allocate a range of name component types, say 1024, to >>> globally understood types that are part of the base or extension >>> NDN specifications (e.g. chunk#, version#, etc.). >>> - We reserve some portion of the space for unanticipated uses >>> (say another 1024 types). >>> - We give the rest of the space to application assignment. >>> >>> Make sense? >>> >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design >>> >>> >>> we could design for performance, >>> >>> That's not what people are advocating. 
We are advocating that we >>> *not* design for known bad performance and hope serendipity or >>> Moore's Law will come to the rescue. >>> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough". >>> >>> Perhaps, perhaps not. Relative performance is what matters, so >>> things that don't get faster while others do tend to get dropped >>> or not used because they impose a performance penalty relative to >>> the things that go faster. There is also the "low-end" phenomenon >>> where improvements in technology get applied to lowering cost >>> rather than improving performance. For those environments bad >>> performance just never gets better. >>> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >>> I suspect LPM on data will always be slow (relative to the other >>> functions). >>> I suspect exclusions will always be slow because they will >>> require extra memory references. >>> >>> However, I of course don't claim clairvoyance, so this is just >>> speculation based on 35+ years of seeing performance improve by 4 >>> orders of magnitude and still having to worry about counting >>> cycles and memory references ... >>> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>> wrote: >>> >>> We should not look at a certain chip nowadays and want ndn to >>> perform >>> well on it. It should be the other way around: once an ndn app >>> becomes >>> popular, a better chip will be designed for ndn. 
>>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design: >>> a) clock rates are not getting (much) faster >>> b) memory accesses are getting (relatively) more expensive >>> c) data structures that require locks to manipulate >>> successfully will be relatively more expensive, even with >>> near-zero lock contention. >>> >>> The fact is, IP *did* have some serious performance flaws in >>> its design. We just forgot those because the design elements >>> that depended on those mistakes have fallen into disuse. The >>> poster children for this are: >>> 1. IP options. Nobody can use them because they are too slow >>> on modern forwarding hardware, so they can't be reliably used >>> anywhere >>> 2. the UDP checksum, which was a bad design when it was >>> specified and is now a giant PITA that still causes major pain >>> in working around. >>> >>> I'm afraid students today are being taught that the designers >>> of IP were flawless, as opposed to very good scientists and >>> engineers who got most of it right. >>> >>> I feel the discussion today and yesterday has been off-topic. >>> Now I >>> see that there are 3 approaches: >>> 1. we should not define a naming convention at all >>> 2. typed component: use TLV type space and add a handful of >>> types >>> 3. marked component: introduce only one more type and add >>> additional >>> marker space >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> It is just as powerful in practice as either throwing up our >>> hands and letting applications design their own mutually >>> incompatible schemes or trying to make naming conventions with >>> markers in a way that is fast to generate/parse and also >>> resilient against aliasing. 
>>>
>>> Also, everybody thinks that the current utf8 marker naming
>>> convention needs to be revised.
>>>
>>>
>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>> Would that chip be suitable, i.e. can we expect most names
>>> to fit in (the magnitude of) 96 bytes? What length are names
>>> usually in current NDN experiments?
>>>
>>> I guess wide deployment could make for even longer names.
>>> Related: many URLs I encounter nowadays easily don't fit within
>>> two 80-column text lines, and NDN will have to carry more
>>> information than URLs, as far as I see.
>>>
>>>
>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>
>>> In fact, the index in a separate TLV will be slower on some
>>> architectures, like the ezChip NP4. The NP4 can hold the first 96
>>> frame bytes in memory; any subsequent memory is accessed only as
>>> two adjacent 32-byte blocks (there can be at most 5 blocks
>>> available at any one time). If you need to switch between arrays,
>>> it would be very expensive. If you have to read past the name to
>>> get to the 2nd array, then read it, then back up to get to the
>>> name, it will be pretty expensive too.
>>>
>>> Marc
>>>
>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>
>>> Does this make that much difference?
>>>
>>> If you want to parse the first 5 components, one way to do it is:
>>>
>>> Read the index, find entry 5, then read in that many bytes from
>>> the start offset of the beginning of the name.
>>> OR
>>> Start reading the name, (find size + move) 5 times.
>>>
>>> How much speed are you getting from one to the other? You seem to
>>> imply that the first one is faster. I don't think this is the case.
>>>
>>> In the first one you'll probably have to get the cache line for
>>> the index, then all the required cache lines for the first 5
>>> components. For the second, you'll have to get all the cache lines
>>> for the first 5 components.
>>> Given an assumption that a cache miss is way more expensive than
>>> evaluating a number and computing an addition, you might find that
>>> the performance of the index is actually slower than the
>>> performance of the direct access.
>>>
>>> Granted, there is a case where you don't access the name at all,
>>> for example, if you just get the offsets and then send the offsets
>>> as parameters to another processor/GPU/NPU/etc. In this case you
>>> may see a gain IF there are more cache line misses in reading the
>>> name than in reading the index. So, if the regular part of the
>>> name that you're parsing is bigger than the cache line (64 bytes?)
>>> and the name is to be processed by a different processor, then you
>>> might see some performance gain in using the index, but in all
>>> other circumstances I bet this is not the case. I may be wrong,
>>> haven't actually tested it.
>>>
>>> This is all to say, I don't think we should be designing the
>>> protocol with only one architecture in mind. (The architecture of
>>> sending the name to a different processor than the index.)
>>>
>>> If you have numbers that show that the index is faster, I would
>>> like to see under what conditions and architectural assumptions.
>>>
>>> Nacho
>>>
>>> (I may have misinterpreted your description, so feel free to
>>> correct me if I'm wrong.)
>>>
>>>
>>> --
>>> Nacho (Ignacio) Solis
>>> Protocol Architect
>>> Principal Scientist
>>> Palo Alto Research Center (PARC)
>>> +1(650)812-4458
>>> Ignacio.Solis at parc.com
>>>
>>>
>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>
>>> Indeed, each component's offset must be encoded using a fixed
>>> amount of bytes:
>>>
>>> i.e.,
>>> Type = Offsets
>>> Length = 10 Bytes
>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>
>>> You may also imagine having an "Offset_2byte" type if your name is
>>> too long.
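The two access patterns being debated here can be sketched in a few lines. The wire format below is invented for illustration (1-byte lengths, and a fixed 1-byte-per-offset "Offsets" TLV as in Massimo's example); it is NOT the actual NDN/CCNx TLV encoding, and the type code 0xF0 is made up.

```python
# Sketch: iterative nested-TLV walking vs. direct access via a
# fixed-width offset table. Assumed toy format: each component is
# [1-byte length][value]; the index TLV is [0xF0][count][1-byte offsets].

def encode_name(components):
    """Encode components plus a fixed-width Offsets TLV (hypothetical format)."""
    body = b"".join(bytes([len(c)]) + c for c in components)
    offsets, pos = [], 0
    for c in components:
        offsets.append(pos)
        pos += 1 + len(c)
    index = bytes([0xF0, len(offsets)]) + bytes(offsets)
    return index, body

def first_x_iterative(body, x):
    """Nested-TLV style: (read size + skip) x times."""
    out, pos = [], 0
    for _ in range(x):
        ln = body[pos]
        out.append(body[pos + 1:pos + 1 + ln])
        pos += 1 + ln
    return out

def component_i(index, body, i):
    """Offset-table style: jump straight to component i, no walking."""
    off = index[2 + i]          # one extra read of the index...
    ln = body[off]              # ...which may cost a cache line (Nacho's point)
    return body[off + 1:off + 1 + ln]

index, body = encode_name([b"mail", b"inbox", b"148"])
print(first_x_iterative(body, 2))   # [b'mail', b'inbox']
print(component_i(index, body, 2))  # b'148'
```

Note that the direct jump only works because each offset is fixed-width; with variable-length numbers you are back to walking the offsets, which is Tai-Lin's objection below.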
>>>
>>> Max
>>>
>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>
>>> if you do not need the entire hierarchal structure (suppose you
>>> only want the first x components) you can directly have it using
>>> the offsets. With the Nested TLV structure you have to iteratively
>>> parse the first x-1 components. With the offset structure you can
>>> directly access the first x components.
>>>
>>> I don't get it. What you described only works if the "offset" is
>>> encoded in fixed bytes. With varNum, you will still need to parse
>>> x-1 offsets to get to the x-th offset.
>>>
>>>
>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>
>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>
>>> ah, thanks - that's helpful. I thought you were saying "I like the
>>> existing NDN UTF8 'convention'." I'm still not sure I understand
>>> what you _do_ prefer, though. it sounds like you're describing an
>>> entirely different scheme where the info that describes the
>>> name-components is ... someplace other than _in_ the
>>> name-components. is that correct? when you say "field separator",
>>> what do you mean (since that's not a "TL" from a TLV)?
>>>
>>> Correct.
>>> In particular, with our name encoding, a TLV indicates the name
>>> hierarchy with offsets in the name, and other TLV(s) indicate the
>>> offset to use in order to retrieve special components.
>>> As for the field separator, it is something like "/". Aliasing is
>>> avoided, as you do not rely on field separators to parse the name;
>>> you use the "offset TLV" to do that.
>>>
>>> So now, it may be an aesthetic question but:
>>>
>>> if you do not need the entire hierarchal structure (suppose you
>>> only want the first x components) you can directly have it using
>>> the offsets. With the Nested TLV structure you have to iteratively
>>> parse the first x-1 components.
>>> With the offset structure you can directly access the first x
>>> components.
>>>
>>> Max
>>>
>>>
>>> -- Mark
>>>
>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>
>>> The why is simple:
>>>
>>> You use a lot of "generic component type" and very few "specific
>>> component type". You are imposing types for every component in
>>> order to handle a few exceptions (segmentation, etc.). You create
>>> a rule (specify the component's type) to handle exceptions!
>>>
>>> I would prefer not to have typed components. Instead I would
>>> prefer to have the name as a simple sequence of bytes with a field
>>> separator. Then, outside the name, if you have some components
>>> that could be used at the network layer (e.g. a TLV field), you
>>> simply need something that indicates the offset allowing you to
>>> retrieve the version, segment, etc. in the name...
>>>
>>>
>>> Max
>>>
>>>
>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>
>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>
>>> I think we agree on the small number of "component types".
>>> However, if you have a small number of types, you will end up with
>>> names containing many generic component types and few specific
>>> component types. Due to the fact that the component type
>>> specification is an exception in the name, I would prefer
>>> something that specifies the component's type only when needed
>>> (something like UTF8 conventions, but that applications MUST use).
>>>
>>> so ... I can't quite follow that. the thread has had some
>>> explanation about why the UTF8 requirement has problems (with
>>> aliasing, e.g.) and there's been email trying to explain that
>>> applications don't have to use types if they don't need to. your
>>> email sounds like "I prefer the UTF8 convention", but it doesn't
>>> say why you have that preference in the face of the points about
>>> the problems.
can you say why it is
>>> that you express a preference for the "convention" with problems?
>>>
>>> Thanks,
>>> Mark
>>>
>>> _______________________________________________
>>> Ndn-interest mailing list
>>> Ndn-interest at lists.cs.ucla.edu
>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From Marc.Mosko at parc.com Wed Sep 24 09:25:53 2014
From: Marc.Mosko at parc.com (Marc.Mosko at parc.com)
Date: Wed, 24 Sep 2014 16:25:53 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To:
References:
Message-ID:

I think Tai-Lin's example was just fine to talk about discovery.
/blah/blah/value: how do you discover all the "value"s? Discovery
shouldn't care if it's email messages or temperature readings or world
cup photos.

I described one set of problems using the exclusion approach, and noted
that an NDN paper on device discovery described a similar problem,
though they did not go into the details of splitting interests, etc.
That all was simple enough to see from the example.

Another question is how one does the discovery with exact match names,
which is also conflating things. You could do a different discovery
with continuation names too, just not the exclude method.

As I alluded to, one needs a way to talk with a specific cache about
its "table of contents" for a prefix, so one can get a consistent set
of results without all the round-trips of exclusions. Actually
downloading the "headers" of the messages would be the same bytes, more
or less. In a way, this is a little like name enumeration from a ccnx
0.x repo, but that protocol has its own set of problems and I'm not
suggesting we use it directly.

One approach is to encode a request in a name component, and a
participating cache can reply. It replies in such a way that one could
continue talking with that cache to get its TOC. One would then issue
another interest with a request for not-that-cache.

Another approach is to try to ask the authoritative source for the
"current" manifest name, i.e. /mail/inbox/current/, which could return
the manifest or a link to the manifest. Then fetching the actual
manifest from the link could come from caches, because you now have a
consistent set of names to ask for.
If you cannot talk with an authoritative source, you could try again
without the nonce and see if there's a cached copy of a recent version
around.

Marc

On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote:

>
>
> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote:
>
>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote:
>>
>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a
>>> pattern with static (/mail/inbox) and variable (148) components; with
>>> a proper naming convention, computers can also detect this pattern
>>> easily. Now I want to look for all mails in my inbox. I can generate a
>>> list of /mail/inbox/. These are my guesses, and with selectors
>>> I can further refine my guesses.
>>
>> I think this is a very bad example (or at least a very bad application
>> design). You have an app (a mail server / inbox) and you want it to list
>> your emails? An email list is an application data structure. I don't
>> think you should use the network structure to reflect this.
>
> I think Tai-Lin is trying to sketch a small example, not propose a
> full-scale approach to email. (Maybe I am misunderstanding.)
>
> Another way to look at it is that if the network architecture is providing
> the equivalent of distributed storage to the application, perhaps the
> application data structure could be adapted to match the affordances of
> the network. Then it would not be so bad that the two structures were
> aligned.
>
>> I'll give you an example: how do you delete emails from your inbox? If an
>> email was cached in the network, it can never be deleted from your inbox?
>
> This is conflating two issues - what you are pointing out is that the data
> structure of a linear list doesn't handle common email management
> operations well. Again, I'm not sure if that's what he was getting at
> here. But deletion is not the issue - the availability of a data object
> on the network does not necessarily mean it's valid from the perspective
> of the application.
>
>> Or moved to another mailbox? Do you rely on the emails expiring?
>>
>> This problem is true for most (any?) situations where you use the
>> network name structure to directly reflect the application data
>> structure.
>
> Not sure I understand how you make the leap from the example to the
> general statement.
>
> Jeff
>
>
>> Nacho
>>
>>
>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote:
>>>> Ok, yes I think those would all be good things.
>>>>
>>>> One thing to keep in mind, especially with things like time series
>>>> sensor data, is that people see a pattern and infer a way of doing
>>>> it. That's easy for a human :) But in Discovery, one should assume
>>>> that one does not know of patterns in the data beyond what the
>>>> protocols used to publish the data explicitly require. That said, I
>>>> think some of the things you listed are good places to start: sensor
>>>> data, web content, climate data or genome data.
>>>>
>>>> We also need to state what the forwarding strategies are and what
>>>> the cache behavior is.
>>>>
>>>> I outlined some of the points that I think are important in that
>>>> other posting. While "discover latest" is useful, "discover all" is
>>>> also important, and that one gets complicated fast. So points like
>>>> separating discovery from retrieval and working with large data sets
>>>> have been important in shaping our thinking. That all said, I'd be
>>>> happy starting from 0 and working through the Discovery service
>>>> definition from scratch along with data set use cases.
>>>>
>>>> Marc
>>>>
>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> Thanks - yes, I saw that as well. I was just trying to get one step
>>>> more specific, which was to see if we could identify a few specific
>>>> use cases around which to have the conversation.
>>>> (e.g., time series sensor data and web content retrieval for "get
>>>> latest"; climate data for huge data sets; local data in a vehicular
>>>> network; etc.) What have you been looking at that's driving
>>>> considerations of discovery?
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>> From:
>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000
>>>> To: Jeff Burke
>>>> Cc: ,
>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>
>>>> Jeff,
>>>>
>>>> Take a look at my posting (that Felix fixed) in a new thread on
>>>> Discovery.
>>>>
>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>>>>
>>>> I think it would be very productive to talk about what Discovery
>>>> should do, and not focus on the how. It is sometimes easy to get
>>>> caught up in the how, which I think is a less important topic than
>>>> the what at this stage.
>>>>
>>>> Marc
>>>>
>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:
>>>>
>>>> Marc,
>>>>
>>>> If you can't talk about your protocols, perhaps we can discuss this
>>>> based on use cases. What are the use cases you are using to evaluate
>>>> discovery?
>>>>
>>>> Jeff
>>>>
>>>>
>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:
>>>>
>>>> No matter what the expressiveness of the predicates, if the
>>>> forwarder can send interests different ways, you don't have a
>>>> consistent underlying set to talk about, so you would always need
>>>> non-range exclusions to discover every version.
>>>>
>>>> Range exclusions only work, I believe, if you get an authoritative
>>>> answer. If different content pieces are scattered between different
>>>> caches, I don't see how range exclusions would work to discover
>>>> every version.
>>>>
>>>> I'm sorry to be pointing out problems without offering solutions,
>>>> but we're not ready to publish our discovery protocols.
>>>>
>>>> Sent from my telephone
>>>>
>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>>>
>>>> I see. Can you briefly describe how the ccnx discovery protocol
>>>> solves all the problems that you mentioned (not just exclude)? A
>>>> doc would be better.
>>>>
>>>> My unserious conjecture( :) ): exclude is equal to [not]. I will
>>>> soon expect [and] and [or], so boolean algebra is fully supported.
>>>> Regular languages or context-free languages might become part of
>>>> selectors too.
>>>>
>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote:
>>>> That will get you one reading, then you need to exclude it and ask
>>>> again.
>>>>
>>>> Sent from my telephone
>>>>
>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:
>>>>
>>>> Yes, my point was that if you cannot talk about a consistent set
>>>> with a particular cache, then you need to always use individual
>>>> excludes, not range excludes, if you want to discover all the
>>>> versions of an object.
>>>>
>>>>
>>>> I am very confused. For your example, if I want to get all of
>>>> today's sensor data, I just do (Any..Last second of last
>>>> day)(First second of tomorrow..Any). That's 18 bytes.
>>>>
>>>>
>>>> [1] http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>
>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>
>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>
>>>> If you talk sometimes to A and sometimes to B, you very easily
>>>> could miss content objects you want to discover unless you avoid
>>>> all range exclusions and only exclude explicit versions.
>>>>
>>>>
>>>> Could you explain why the missing-content-object situation
>>>> happens? Also, range exclusion is just a shorter notation for many
>>>> explicit excludes; converting from explicit excludes to a ranged
>>>> exclude is always possible.
>>>>
>>>>
>>>> Yes, my point was that if you cannot talk about a consistent set
>>>> with a particular cache, then you need to always use individual
>>>> excludes, not range excludes, if you want to discover all the
>>>> versions of an object. For something like a sensor reading that is
>>>> updated, say, once per second, you will have 86,400 of them per
>>>> day. If each exclusion is a timestamp (say 8 bytes), that's
>>>> 691,200 bytes of exclusions (plus encoding overhead) per day.
>>>>
>>>> Yes, maybe using a more deterministic version number than a
>>>> timestamp makes sense here, but it's just an example of needing a
>>>> lot of exclusions.
>>>>
>>>>
>>>> You exclude through 100 then issue a new interest. This goes to
>>>> cache B
>>>>
>>>>
>>>> I feel this case is invalid because cache A will also get the
>>>> interest, and cache A will return v101 if it exists. Like you
>>>> said, if this goes to cache B only, it means that cache A died.
>>>> How do you know that v101 even exists?
>>>>
>>>>
>>>> I guess this depends on what the forwarding strategy is. If the
>>>> forwarder will always send each interest to all replicas, then
>>>> yes, modulo packet loss, you would discover v101 on cache A. If
>>>> the forwarder is just doing "best path" and can round-robin
>>>> between cache A and cache B, then your application could miss
>>>> v101.
>>>>
>>>>
>>>> c,d: In general I agree that LPM performance is related to the
>>>> number of components. In my own thread-safe LPM implementation, I
>>>> used only one RWMutex for the whole tree. I don't know whether
>>>> adding a lock for every node would be faster or not, because of
>>>> lock overhead.
>>>>
>>>> However, we should compare (exact match + discovery protocol) vs
>>>> (ndn lpm). Comparing the performance of exact match to lpm is
>>>> unfair.
>>>>
>>>>
>>>> Yes, we should compare them. And we need to publish the ccnx 1.0
>>>> specs for doing the exact match discovery.
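The numbers in this exchange check out as back-of-envelope arithmetic. The sketch below is mine; TLV encoding overhead is ignored, and the size of an "Any" marker is an assumption, not taken from any spec.

```python
# Back-of-envelope check of the exclusion sizes discussed above
# (my arithmetic; TLV overhead ignored, Any-marker size assumed).

READINGS_PER_DAY = 24 * 60 * 60   # once-per-second sensor: 86,400 readings
TIMESTAMP_BYTES = 8

# Individual excludes, as needed when no consistent set is available:
explicit_bytes = READINGS_PER_DAY * TIMESTAMP_BYTES
print(explicit_bytes)             # 691200 bytes of exclusions per day

# Tai-Lin's range form (Any..ts1)(ts2..Any): two timestamps plus two
# Any markers (assumed 1 byte each here), on the order of 18 bytes.
range_bytes = 2 * TIMESTAMP_BYTES + 2
print(range_bytes)                # 18
```

The gap (18 bytes vs. ~691 KB) is the whole argument: range excludes are compact but only safe against a consistent store, while explicit excludes are safe but grow linearly with the number of versions.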
So, as I said, I'm not
>>>> ready to claim it's better yet, because we have not done that.
>>>>
>>>>
>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>> I would point out that using LPM on content object to Interest
>>>> matching to do discovery has its own set of problems. Discovery
>>>> involves more than just "latest version" discovery too.
>>>>
>>>> This is probably getting off-topic from the original post about
>>>> naming conventions.
>>>>
>>>> a. If Interests can be forwarded multiple directions and two
>>>> different caches are responding, the exclusion set you build up
>>>> talking with cache A will be invalid for cache B. If you talk
>>>> sometimes to A and sometimes to B, you very easily could miss
>>>> content objects you want to discover unless you avoid all range
>>>> exclusions and only exclude explicit versions. That will lead to
>>>> very large interest packets. In ccnx 1.0, we believe that an
>>>> explicit discovery protocol that allows conversations about
>>>> consistent sets is better.
>>>>
>>>> b. Yes, if you just want "latest version" discovery, that should
>>>> be transitive between caches, but imagine this. You send Interest
>>>> #1 to cache A, which returns version 100. You exclude through
>>>> 100, then issue a new interest. This goes to cache B, which only
>>>> has version 99, so the interest times out or is NACK'd. So you
>>>> think you have it! But cache A already has version 101; you just
>>>> don't know. If you cannot have a conversation around consistent
>>>> sets, it seems like even doing latest-version discovery is
>>>> difficult with selector-based discovery. From what I saw in ccnx
>>>> 0.x, one ended up getting an Interest all the way to the
>>>> authoritative source, because you can never believe an
>>>> intermediate cache that there's not something more recent.
>>>>
>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be
>>>> interested in seeing your analysis.
Case (a) is that a node can
>>>> correctly discover every version of a name prefix, and (b) is
>>>> that a node can correctly discover the latest version. We have
>>>> not formally compared (or yet published) our discovery protocols
>>>> (we have three: 2 for content, 1 for device) against selector-based
>>>> discovery, so I cannot yet claim they are better, but they do not
>>>> have the non-determinism sketched above.
>>>>
>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>> must do in the PIT to match a content object. If you have a name
>>>> tree or a threaded hash table, those don't all need to be hash
>>>> lookups, but you need to walk up the name tree for every prefix
>>>> of the content object name and evaluate the selector predicate.
>>>> Content Based Networking (CBN) had some methods to create data
>>>> structures based on predicates; maybe those would be better. But
>>>> in any case, you will potentially need to retrieve many PIT
>>>> entries if there is Interest traffic for many prefixes of a root.
>>>> Even on an Intel system, you'll likely miss cache lines, so
>>>> you'll have a lot of NUMA accesses for each one. In CCNx 1.0,
>>>> even a naive implementation only requires at most 3 lookups (one
>>>> by name, one by name + keyid, one by name + content object hash),
>>>> and one can do other things to optimize lookup for an extra
>>>> write.
>>>>
>>>> d. In (c) above, if you have a threaded name tree or are just
>>>> walking parent pointers, I suspect you'll need locking of the
>>>> ancestors in a multi-threaded system ("threaded" here meaning
>>>> LWP), and that will be expensive. It would be interesting to see
>>>> what a cache-consistent multi-threaded name tree looks like.
>>>>
>>>> Marc
>>>>
>>>>
>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>
>>>> I had thought about these questions, but I want to know your
>>>> ideas besides typed components:
>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>> things?
>>>> 2. will removing selectors improve performance? How do we use
>>>> other faster techniques to replace selectors?
>>>> 3. fixed byte length and type. I agree more that type can be a
>>>> fixed byte, but 2 bytes for length might not be enough for the
>>>> future.
>>>>
>>>>
>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>
>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>
>>>> I know how to make #2 flexible enough to do the things I can
>>>> envision we need to do, with a few simple conventions on how
>>>> the registry of types is managed.
>>>>
>>>>
>>>> Could you share it with us?
>>>>
>>>> Sure. Here's a strawman.
>>>>
>>>> The type space is 16 bits, so you have 65,536 types.
>>>>
>>>> The type space is currently shared with the types used for the
>>>> entire protocol, which gives us two options:
>>>> (1) we reserve a range for name component types. Given the
>>>> likelihood there will be at least as much and probably more need
>>>> for component types than protocol extensions, we could reserve
>>>> 1/2 of the type space, giving us 32K types for name components.
>>>> (2) since there is no parsing ambiguity between name components
>>>> and other fields of the protocol (since they are sub-types of the
>>>> name type), we could reuse numbers and thereby have an entire 65K
>>>> name component types.
>>>>
>>>> We divide the type space into regions, and manage it with a
>>>> registry. If we ever get to the point of creating an IETF
>>>> standard, IANA has 25 years of experience running registries, and
>>>> there are well-understood rule sets for different kinds of
>>>> registries (open, requires a written spec, requires standards
>>>> approval).
>>>>
>>>> - We allocate one "default" name component type for "generic
>>>> name", which would be used on name prefixes and other common
>>>> cases where there are no special semantics on the name component.
>>>> - We allocate a range of name component types, say 1024, to
>>>> globally understood types that are part of the base or extension
>>>> NDN specifications (e.g. chunk#, version#, etc.)
>>>> - We reserve some portion of the space for unanticipated uses
>>>> (say another 1024 types)
>>>> - We give the rest of the space to application assignment.
>>>>
>>>> Make sense?
>>>>
>>>>
>>>> While I'm sympathetic to that view, there are three ways in
>>>> which Moore's law or hardware tricks will not save us from
>>>> performance flaws in the design
>>>>
>>>>
>>>> we could design for performance,
>>>>
>>>> That's not what people are advocating. We are advocating that we
>>>> *not* design for known bad performance and hope serendipity or
>>>> Moore's Law will come to the rescue.
>>>>
>>>> but I think there will be a turning
>>>> point when the slower design starts to become "fast enough".
>>>>
>>>> Perhaps, perhaps not. Relative performance is what matters, so
>>>> things that don't get faster while others do tend to get dropped
>>>> or not used, because they impose a performance penalty relative
>>>> to the things that go faster. There is also the "low-end"
>>>> phenomenon where improvements in technology get applied to
>>>> lowering cost rather than improving performance. For those
>>>> environments, bad performance just never gets better.
>>>>
>>>> Do you
>>>> think there will be some design of ndn that will *never* have
>>>> performance improvement?
>>>>
>>>> I suspect LPM on data will always be slow (relative to the other
>>>> functions).
>>>> I suspect exclusions will always be slow because they will
>>>> require extra memory references.
>>>>
>>>> However, I of course don't claim clairvoyance, so this is just
>>>> speculation based on 35+ years of seeing performance improve by 4
>>>> orders of magnitude and still having to worry about counting
>>>> cycles and memory references...
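Dave's strawman registry can be illustrated with a small sketch. The region sizes follow his text (one generic type, ~1024 well-known, ~1024 reserved, the rest for applications), but the concrete boundary values below are my own assumption, not part of the proposal.

```python
# Illustration of the strawman: a 16-bit name-component type space
# (65,536 values) carved into managed registry regions.
# Boundary values are assumed for the sketch, not specified anywhere.

RANGES = {
    "generic":     (0x0000, 0x0000),  # the single default "generic name" type
    "well-known":  (0x0001, 0x0400),  # ~1024 base/extension types (chunk#, version#, ...)
    "reserved":    (0x0401, 0x0800),  # ~1024 held back for unanticipated uses
    "application": (0x0801, 0xFFFF),  # the rest, assigned by applications
}

def region_of(t):
    """Classify a 16-bit name-component type into its registry region."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("name-component types are 16 bits")
    for name, (lo, hi) in RANGES.items():
        if lo <= t <= hi:
            return name

print(region_of(0x0000))   # generic
print(region_of(0x0012))   # well-known
print(region_of(0x9ABC))   # application
```

A registry like this is what makes option (2) workable: each region can carry its own allocation policy (fixed, spec-required, or open), in the style of IANA-run registries.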
>>>>
>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>
>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>
>>>> We should not look at a certain chip nowadays and want ndn to
>>>> perform well on it. It should be the other way around: once an
>>>> ndn app becomes popular, a better chip will be designed for ndn.
>>>>
>>>> While I'm sympathetic to that view, there are three ways in
>>>> which Moore's law or hardware tricks will not save us from
>>>> performance flaws in the design:
>>>> a) clock rates are not getting (much) faster
>>>> b) memory accesses are getting (relatively) more expensive
>>>> c) data structures that require locks to manipulate
>>>> successfully will be relatively more expensive, even with
>>>> near-zero lock contention.
>>>>
>>>> The fact is, IP *did* have some serious performance flaws in
>>>> its design. We just forgot those because the design elements
>>>> that depended on those mistakes have fallen into disuse. The
>>>> poster children for this are:
>>>> 1. IP options. Nobody can use them because they are too slow
>>>> on modern forwarding hardware, so they can't be reliably used
>>>> anywhere.
>>>> 2. the UDP checksum, which was a bad design when it was
>>>> specified and is now a giant PITA that still causes major pain
>>>> to work around.
>>>>
>>>> I'm afraid students today are being taught that the designers
>>>> of IP were flawless, as opposed to very good scientists and
>>>> engineers who got most of it right.
>>>>
>>>> I feel the discussion today and yesterday has been off-topic.
>>>> Now I see that there are 3 approaches:
>>>> 1. we should not define a naming convention at all
>>>> 2. typed component: use tlv type space and add a handful of types
>>>> 3. marked component: introduce only one more type and add
>>>> additional marker space
>>>>
>>>> I know how to make #2 flexible enough to do the things I can
>>>> envision we need to do, with a few simple conventions on
>>>> how the registry of types is managed.
>>>>
>>>> It is just as powerful in practice as either throwing up our
>>>> hands and letting applications design their own mutually
>>>> incompatible schemes, or trying to make naming conventions with
>>>> markers in a way that is fast to generate/parse and also
>>>> resilient against aliasing.
>>>>
>>>> Also, everybody thinks that the current utf8 marker naming
>>>> convention needs to be revised.
>>>>
>>>>
>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>> Would that chip be suitable, i.e. can we expect most names
>>>> to fit in (the magnitude of) 96 bytes? What length are names
>>>> usually in current NDN experiments?
>>>>
>>>> I guess wide deployment could make for even longer names.
>>>> Related: many URLs I encounter nowadays easily don't fit within
>>>> two 80-column text lines, and NDN will have to carry more
>>>> information than URLs, as far as I see.
>>>>
>>>>
>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>
>>>> In fact, the index in a separate TLV will be slower on some
>>>> architectures, like the ezChip NP4. The NP4 can hold the first 96
>>>> frame bytes in memory; any subsequent memory is accessed only as
>>>> two adjacent 32-byte blocks (there can be at most 5 blocks
>>>> available at any one time). If you need to switch between arrays,
>>>> it would be very expensive. If you have to read past the name to
>>>> get to the 2nd array, then read it, then back up to get to the
>>>> name, it will be pretty expensive too.
>>>>
>>>> Marc
>>>>
>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>
>>>> Does this make that much difference?
>>>>
>>>> If you want to parse the first 5 components.
One way to do it is:
>>>>
>>>> Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name.
>>>> OR
>>>> Start reading the name, (find size + move) 5 times.
>>>>
>>>> How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case.
>>>>
>>>> In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access.
>>>>
>>>> Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it.
>>>>
>>>> This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index.)
>>>>
>>>> If you have numbers that show that the index is faster I would like to see under what conditions and architectural assumptions.
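The two parse strategies being compared here can be sketched in a few lines. This is a toy encoding (1-byte type and 1-byte length per component) invented for illustration only; it is not the NDN or CCNx wire format, and the fixed-width offset index is likewise an assumed layout:

```python
# Toy sketch of the two strategies: sequential TLV walk vs. a
# precomputed fixed-width offset index. Encoding is illustrative only.

def encode_name(components):
    """Encode components as a flat [type=0x08][len][value] sequence."""
    out = bytearray()
    for c in components:
        out += bytes([0x08, len(c)]) + c
    return bytes(out)

def parse_first_k_sequential(buf, k):
    """Walk the TLVs one by one: (find size + move) k times."""
    comps, pos = [], 0
    for _ in range(k):
        length = buf[pos + 1]
        comps.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return comps

def build_offset_index(buf):
    """Fixed-width index: the start offset of each component."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 2 + buf[pos + 1]
    return offsets

def parse_first_k_indexed(buf, offsets, k):
    """Jump straight to component boundaries via the index."""
    return [buf[o + 2 : o + 2 + buf[o + 1]] for o in offsets[:k]]
```

Both produce the same components; as the thread notes, which one is faster depends on cache behavior, since the index costs extra memory reads while the sequential walk costs only cheap length/offset arithmetic.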
>>>> Nacho
>>>>
>>>> (I may have misinterpreted your description so feel free to correct me if I'm wrong.)
>>>>
>>>> --
>>>> Nacho (Ignacio) Solis
>>>> Protocol Architect
>>>> Principal Scientist
>>>> Palo Alto Research Center (PARC)
>>>> +1(650)812-4458
>>>> Ignacio.Solis at parc.com
>>>>
>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>
>>>> Indeed each component's offset must be encoded using a fixed amount of bytes:
>>>>
>>>> i.e.,
>>>> Type = Offsets
>>>> Length = 10 Bytes
>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>
>>>> You may also imagine having an "Offset_2byte" type if your name is too long.
>>>>
>>>> Max
>>>>
>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>
>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>
>>>> I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x-th offset.
>>>>
>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>
>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>
>>>> ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. It sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. Is that correct? When you say "field separator", what do you mean (since that's not a "TL" from a TLV)?
>>>>
>>>> Correct.
>>>> In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name, and other TLV(s) indicate the offset to use in order to retrieve special components.
>>>> As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that.
>>>>
>>>> So now, it may be an aesthetic question but:
>>>>
>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>
>>>> Max
>>>>
>>>> -- Mark
>>>>
>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>
>>>> The why is simple:
>>>>
>>>> You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions!
>>>>
>>>> I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates the offset allowing you to retrieve the version, segment, etc. in the name...
>>>>
>>>> Max
>>>>
>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>
>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>
>>>> I think we agree on the small number of "component types".
>>>> However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies a component's type only when needed (something like the UTF8 conventions, but ones that applications MUST use).
>>>>
>>>> so ... I can't quite follow that. The thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. Your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. Can you say why it is that you express a preference for the "convention" with problems?
>>>>
>>>> Thanks,
>>>> Mark
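The separate "Offsets" TLV Massimo describes (fixed 1-byte offsets, a "/" field separator, no reliance on the separator for parsing) can be sketched as follows. The type code 0x80 and the exact layout are assumptions for illustration; they are not from any published spec:

```python
# Sketch of the fixed-width "Offsets" TLV discussed in the thread.
# TYPE_OFFSETS and the layout are hypothetical, illustrative values.

TYPE_OFFSETS = 0x80  # hypothetical type code

def build_offsets_tlv(name_components):
    """Return (flat_name, offsets_tlv): the TLV's value holds one byte
    per component, its start offset in the flat name. Offsets > 255
    would need the "Offset_2byte" variant mentioned in the thread."""
    flat = bytearray()
    offsets = []
    for comp in name_components:
        offsets.append(len(flat))
        flat += comp + b"/"          # "/" as the field separator
    value = bytes(offsets)
    tlv = bytes([TYPE_OFFSETS, len(value)]) + value
    return bytes(flat), tlv

def component(flat, tlv, i):
    """Random access to the i-th component via the offsets TLV: no
    iterative parse of components 0..i-1, and no aliasing risk from
    "/" bytes inside a component, since separators are never scanned."""
    length = tlv[1]
    offs = list(tlv[2 : 2 + length]) + [len(flat)]
    start, end = offs[i], offs[i + 1] - 1   # drop the trailing "/"
    return flat[start:end]
```

This shows Tai-Lin's point as well: random access works only because each offset has a fixed width; with variable-length (varNum) offsets, reaching the x-th offset would again require parsing the x-1 before it.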
>>>> _______________________________________________
>>>> Ndn-interest mailing list
>>>> Ndn-interest at lists.cs.ucla.edu
>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From jburke at remap.ucla.edu Wed Sep 24 09:30:08 2014
From: jburke at remap.ucla.edu (Burke, Jeff)
Date: Wed, 24 Sep 2014 16:30:08 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
Message-ID: 

On 9/24/14, 8:35 AM, "Ignacio.Solis at parc.com" wrote:

>On 9/24/14, 4:37 PM, "Lan Wang (lanwang)" wrote:
>
>>Here's how I see the name discovery will work in this case:
>>
>>- the email client can request the latest list of email names in the mailbox /username/mailbox/list (using the right-most selector).
>>
>>- of course, there may be cached lists in the network. This should be minimized by a very short FreshnessSeconds on the data set by the email server (e.g., 1 second). So most likely there is at most one cached list out there.
>>
>>- after the client gets an email list (/username/mailbox/list/20), the client should immediately issue another Interest (/username/mailbox/list/) with the selector (>20) to see if there are more recent ones.
>>
>>- eventually the request will get to the server. The server may respond with a more recent list or a NACK.

Ok, so sync-style approaches may work better for this example as Marc already pointed out, but nonetheless... (Marc, I am catching up on emails and will respond to that shortly.)

>Let me see if I follow:
>
>A1- You publish /username/mailbox/list/1 (lifetime of 1 second)
>A2- You publish /username/mailbox/list/2 (lifetime of 1 second)
>A3- You publish /username/mailbox/list/3 (lifetime of 1 second)
>...
>A20- You publish /username/mailbox/list/20 (lifetime of 1 second)

This isn't 20 steps. First, no data leaves the publisher without an Interest.
Second, it's more like one API call: make this list available as versioned Data with a minimum allowable time between responses of one second. No matter how many requests are outstanding, there is a stable load on the source.

>B- You request /username/mailbox/list
>
>C- You receive /username/mailbox/list/20 (lifetime of 1 second)

At this point, you decide if list v20 is sufficient for your purposes. Perhaps it is. If not,

>D- You request /username/mailbox/list (/>20)
>E- Request gets routed to actual publisher
>F- It replies with a NACK or a new list.
>
>Why did you do all these extra steps?
>
>Why not just do:
>
>B- You request /username/mailbox/list
>E- Request gets routed to actual publisher
>F- It replies with a new list (lifetime of 0 seconds)

(Again, one could use sync-style set reconciliation here. Also, one can achieve this pattern in NDN by throwing a unique string on the end. But let's think of this as just a versioned file.)

Some thoughts:

- In Scheme B, if the list has not changed, you still get a response, because the publisher has no way to know anything about the consumer's knowledge. In Scheme A, publishers have that knowledge from the exclusion and need not reply. If NACKs are used as heartbeats, they can be returned more slowly... say every 3-10 seconds. So, many data packets are potentially saved. Hopefully we don't get one email per second... :)

- The benefit seems apparent in multi-consumer scenarios, even without sync. Let's say I have 5 personal devices requesting mail. In Scheme B, every publisher receives and processes 5 interests per second on average. In Scheme A, with an upstream caching node, each receives 1 per second maximum. The publisher still has to throttle requests, but with no help or scaling support from the network.

>In Scheme A you sent 2 interests, received 2 objects, going all the way to source.
>In Scheme B you sent 1 interest, received 1 object, going all the way to source.
>
>Scheme B is always better (doesn't need to do C, D) for this example and it uses exact matching.

It's better if your metric is round trips and you don't care about load on the publisher, lower traffic in times of no new data, etc. But if that is your metric, you can certainly implement Scheme B on NDN, too.

Jeff

>You can play tricks with the lifetime of the object in both cases, selectors or not.
>
>>- meanwhile, the email client can retrieve the emails using the names obtained in these lists. Some emails may turn out to be unnecessary, so they will be discarded when a more recent list comes. The email client can also keep state about the names of the emails it has deleted to minimize this problem.
>
>This is independent of selectors / exact matching.
>
>Nacho
>
>>On Sep 24, 2014, at 2:37 AM, Marc.Mosko at parc.com wrote:
>>
>>> Ok, let's take that example and run with it a bit. I'll walk through a "discover all" example. This example leads me to why I say discovery should be separate from data retrieval. I don't claim that we have a final solution to this problem; I think that in a distributed peer-to-peer environment, solving this problem is difficult. If you have a counterexample as to how this discovery could progress using only the information known a priori by the requester, I would be interested in seeing that example worked out. Please do correct me if you think this is wrong.
>>>
>>> You have mails that were originally numbered 0 - 10000, sequentially by the server.
>>>
>>> You travel between several places and access different emails from different places. This populates caches. Let's say 0,3,6,9,... are on cache A, 1,4,7,10,... are on cache B, and 2,5,8,11... are on cache C. Also, you have deleted 500 random emails, so there are only 9500 emails actually out there.
>>>
>>> You set up a new computer and now want to download all your emails.
The new computer is on the path of caches C, then B, then A, then the authoritative source server. The new email program has no initial state. The email program only knows that the email number is an integer that starts at 0. It issues an interest for /mail/inbox, and asks for the left-most child because it wants to populate in order. It gets a response from cache C with mail 2.
>>>
>>> Now, what does the email program do? It cannot exclude the range 0..2 because that would possibly miss 0 and 1. So, all it can do is exclude the exact number "2" and ask again. It then hits cache C again, which responds with "5". There are about 3000 emails on cache C, and if they all take 4 bytes (for the exclude component plus its coding overhead), then that's 12KB of exclusions to finally exhaust cache C.
>>>
>>> If we want Interests to avoid fragmentation, we can fit about 1200 bytes of exclusions, or 300 components. This means we need about 10 interest messages. Each interest would be something like "exclude 2,5,8,11,..., >300", then the next would be "exclude <300, 302, 305, 308, ..., >600", etc.
>>>
>>> Those interests that exclude everything at cache C would then hit, say, cache B and start getting results 1, 4, 7, .... This means an Interest like "exclude 2,5,8,11,..., >300" would then get back number 1. That means the next request actually has to split that one interest's exclude in two (because the interest was at maximum size), so you now issue two interests where one is "exclude 1, 2, 5, 8, >210" and the other is "<210, 212, 215, ..., >300".
>>>
>>> If you look in the CCNx 0.8 java code, there should be a class that does these Interest-based discoveries and does the Interest splitting based on the currently known range of discovered content. I don't have the specific reference right now, but I can send a link if you are interested in seeing that.
The java class keeps state of what has been discovered so far, so it could re-start later if interrupted.
>>>
>>> So all those interests would now be getting results from cache B. You would then start to split all those ranges to accommodate the numbers coming back from B. Eventually, you'll have at least 10 Interest messages outstanding that would be excluding all the 9500 messages that are in caches A, B, and C. Some of those interest messages might actually reach an authoritative server, which might respond too. It would likely be more than 10 interests due to the algorithm that's used to split full interests, which likely is not optimal because it does not know exactly where the breaks should be a priori.
>>>
>>> Once you have exhausted caches A, B, and C, the interest messages would reach the authoritative source (if it's online), and it would be issuing NACKs (I assume) for interests that have excluded all non-deleted emails.
>>>
>>> In any case, it takes, at best, 9,500 round trips to "discover" all 9500 emails. It also requires Sum_{i=1..10000} 4*i = 200,020,000 bytes of Interest exclusions. Note that it's an arithmetic sum of bytes of exclusion, because at each Interest the size of the exclusions increases by 4. There was an NDN paper about light bulb discovery (or something like that) that noted this same problem and proposed some workaround, but I don't remember what they proposed.
>>>
>>> Yes, you could possibly pipeline it, but what would you do? In this example, knowing that emails 0 - 10000 (minus some random ones) exist would allow you - if you knew it a priori - to issue say 10 interests in parallel that ask for different ranges. But, 2 years from now your undeleted emails might range from 100,000 - 150,000. The point is that a discovery protocol does not know, a priori, what is to be discovered. It might start learning some stuff as it goes on.
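Marc's back-of-the-envelope exclusion cost can be checked directly. A minimal sketch, under his stated simplifying assumptions (4 bytes per excluded component, each new Interest repeating every previous exclusion plus one more):

```python
# Check of the exclusion-cost arithmetic from the thread, using the
# thread's assumption of 4 bytes per excluded component.

BYTES_PER_EXCLUSION = 4

def total_exclusion_bytes(n):
    """Bytes of exclusions summed over n Interests: an arithmetic
    series, since Interest i carries i exclusions of 4 bytes each."""
    return sum(BYTES_PER_EXCLUSION * i for i in range(1, n + 1))

# ~3000 emails on cache C at 4 bytes each: roughly 12KB of exclusions
cache_c_bytes = 3000 * BYTES_PER_EXCLUSION

# Sum_{i=1..10000} 4*i = 200,020,000 bytes, as stated in the thread
grand_total = total_exclusion_bytes(10000)
```

The quadratic growth, not the per-exclusion constant, is the point: doubling the number of items roughly quadruples the total exclusion traffic.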
>>> If you could have retrieved just a table of contents from each cache, where each "row" is say 64 bytes (i.e. the name continuation plus hash value), you would need to retrieve 3300 * 64 = 211KB from each cache (total 640 KB) to list all the emails. That would take 640KB / 1200 = 534 interest messages of say 64 bytes = 34 KB to discover all 9500 emails, plus another set to fetch the header rows. That's, say, 68 KB of interest traffic compared to 200 MB. Now, I've not said how to list these tables of contents, so an actual protocol might have a higher communication cost, but even if it was 10x worse that would still be an attractive tradeoff.
>>>
>>> This assumes that you publish just the "header" in the 1st segment (say 1 KB total object size including the signatures). That's 10 MB to learn the headers.
>>>
>>> You could also argue that the distribution of emails over caches is arbitrary. That's true; I picked a difficult sequence. But unless you have some positive controls on what could be in a cache, it could be any difficult sequence. I also did not address the timeout issue, and how do you know you are done?
>>>
>>> This is also why sync works so much better than doing raw interest discovery. Sync exchanges tables of contents and diffs; it does not need to enumerate by exclusion everything to retrieve.
>>>
>>> Marc
>>>
>>> On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote:
>>>
>>>> discovery can be reduced to "pattern detection" (can we infer what exists?) and "pattern validation" (can we confirm this guess?)
>>>>
>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a pattern with static (/mail/inbox) and variable (148) components; with a proper naming convention, computers can also detect this pattern easily. Now I want to look for all mails in my inbox. I can generate a list of /mail/inbox/.
These are my guesses, and with selectors I can further refine my guesses.
>>>>
>>>> To validate them, a bloom filter can provide "best effort" discovery (with some false positives, so I call it "best-effort") before I stupidly send all the interests to the network.
>>>>
>>>> The discovery protocol, as I described above, is essentially "pattern detection by naming convention" and "bloom filter validation." This is definitely one of the "simpler" discovery protocols, because the data producer only needs to add an additional bloom filter. Notice that we can progressively add entries to the bfilter with low computation cost.
>>>>
>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote:
>>>>> Ok, yes I think those would all be good things.
>>>>>
>>>>> One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That's easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require. That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data.
>>>>>
>>>>> We also need to state what the forwarding strategies are and what the cache behavior is.
>>>>>
>>>>> I outlined some of the points that I think are important in that other posting. While "discover latest" is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking. That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases.
>>>>> Marc
>>>>>
>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Thanks - yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery?
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>>
>>>>> From:
>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000
>>>>> To: Jeff Burke
>>>>> Cc: ,
>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>
>>>>> Jeff,
>>>>>
>>>>> Take a look at my posting (that Felix fixed) in a new thread on Discovery.
>>>>>
>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>>>>>
>>>>> I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage.
>>>>>
>>>>> Marc
>>>>>
>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:
>>>>>
>>>>> Marc,
>>>>>
>>>>> If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery?
>>>>>
>>>>> Jeff
>>>>>
>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:
>>>>>
>>>>> No matter what the expressiveness of the predicates, if the forwarder can send interests different ways you don't have a consistent underlying set to talk about, so you would always need non-range exclusions to discover every version.
>>>>>
>>>>> Range exclusions only work, I believe, if you get an authoritative answer.
>>>>> If different content pieces are scattered between different caches, I don't see how range exclusions would work to discover every version.
>>>>>
>>>>> I'm sorry to be pointing out problems without offering solutions, but we're not ready to publish our discovery protocols.
>>>>>
>>>>> Sent from my telephone
>>>>>
>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>>>>
>>>>> I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc will be better.
>>>>>
>>>>> My unserious conjecture( :) ): exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular language or context free language might become part of selector too.
>>>>>
>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote:
>>>>> That will get you one reading, then you need to exclude it and ask again.
>>>>>
>>>>> Sent from my telephone
>>>>>
>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:
>>>>>
>>>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object.
>>>>>
>>>>> I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes.
>>>>>
>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>>
>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>>
>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>>
>>>>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions.
>>>>>
>>>>> Could you explain why the missing content object situation happens?
Also, range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to ranged excludes is always possible.
>>>>>
>>>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second, you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day.
>>>>>
>>>>> Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions.
>>>>>
>>>>> You exclude through 100 then issue a new interest. This goes to cache B
>>>>>
>>>>> I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists?
>>>>>
>>>>> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101.
>>>>>
>>>>> c,d In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead.
>>>>>
>>>>> However, we should compare (exact match + discovery protocol) vs (ndn lpm).
Comparing the performance of exact match to lpm is unfair.
>>>>>
>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that.
>>>>>
>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too.
>>>>>
>>>>> This is probably getting off-topic from the original post about naming conventions.
>>>>>
>>>>> a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better.
>>>>>
>>>>> b. Yes, if you just want "latest version" discovery, that should be transitive between caches, but imagine this. You send Interest #1 to cache A, which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B, which only has version 99, so the interest times out or is NACK'd. So you think you have it! But cache A already has version 101; you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest-version discovery is difficult with selector-based discovery.
From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent.
>>>>>
>>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three: 2 for content, 1 for device) against selector-based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above.
>>>>>
>>>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA accesses for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.
>>>>>
>>>>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive.
It would be interesting to see what a >>>>> cache-consistent multi-threaded name tree looks like. >>>>> >>>>> Marc >>>>> >>>>> >>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>> wrote: >>>>> >>>>> I had thought about these questions, but I want to know your idea >>>>> besides typed component: >>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>> things? >>>>> 2. will removing selectors improve performance? How do we use >>>>> other, >>>>> faster techniques to replace selectors? >>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>> byte, but 2 bytes for length might not be enough for the future. >>>>> >>>>> >>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>> wrote: >>>>> >>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>> wrote: >>>>> >>>>> I know how to make #2 flexible enough to do what things I can >>>>> envision we need to do, and with a few simple conventions on >>>>> how the registry of types is managed. >>>>> >>>>> >>>>> Could you share it with us? >>>>> >>>>> Sure. Here's a strawman. >>>>> >>>>> The type space is 16 bits, so you have 65,536 types. >>>>> >>>>> The type space is currently shared with the types used for the >>>>> entire protocol, which gives us two options: >>>>> (1) we reserve a range for name component types. Given the >>>>> likelihood there will be at least as much and probably more need >>>>> for component types than for protocol extensions, we could reserve 1/2 >>>>> of the type space, giving us 32K types for name components. >>>>> (2) since there is no parsing ambiguity between name components >>>>> and other fields of the protocol (since they are sub-types of the >>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>> name component types. >>>>> >>>>> We divide the type space into regions, and manage it with a >>>>> registry.
If we ever get to the point of creating an IETF >>>>> standard, IANA has 25 years of experience running registries and >>>>> there are well-understood rule sets for different kinds of >>>>> registries (open, requires a written spec, requires standards >>>>> approval). >>>>> >>>>> - We allocate one "default" name component type for "generic >>>>> name", which would be used on name prefixes and other common >>>>> cases where there are no special semantics on the name component. >>>>> - We allocate a range of name component types, say 1024, to >>>>> globally understood types that are part of the base or extension >>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>> - We reserve some portion of the space for unanticipated uses >>>>> (say another 1024 types) >>>>> - We give the rest of the space to application assignment. >>>>> >>>>> Make sense? >>>>> >>>>> >>>>> While I'm sympathetic to that view, there are three ways in >>>>> which Moore's law or hardware tricks will not save us from >>>>> performance flaws in the design >>>>> >>>>> >>>>> we could design for performance, >>>>> >>>>> That's not what people are advocating. We are advocating that we >>>>> *not* design for known bad performance and hope serendipity or >>>>> Moore's Law will come to the rescue. >>>>> >>>>> but I think there will be a turning >>>>> point when the slower design starts to become "fast enough". >>>>> >>>>> Perhaps, perhaps not. Relative performance is what matters, so >>>>> things that don't get faster while others do tend to get dropped >>>>> or not used because they impose a performance penalty relative to >>>>> the things that go faster. There is also the "low-end" phenomenon >>>>> where improvements in technology get applied to lowering cost >>>>> rather than improving performance. For those environments bad >>>>> performance just never gets better. >>>>> >>>>> Do you >>>>> think there will be some design of ndn that will *never* have >>>>> performance improvement?
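The strawman region layout above reads naturally as a lookup table. The sketch below encodes it in Python; the region sizes (one generic type, ~1024 well-known, ~1024 reserved, remainder to applications) follow the text, but the specific numeric boundaries are assumptions for illustration, since no such registry has actually been defined.

```python
# Sketch of the strawman 16-bit name-component type registry described
# above. The exact boundary values are illustrative assumptions, not
# part of any published specification.

GENERIC = 0x0000                      # the one "default" generic-name type
WELL_KNOWN = range(0x0001, 0x0401)    # ~1024 globally understood types (chunk#, version#, ...)
RESERVED = range(0x0401, 0x0801)      # ~1024 held back for unanticipated uses
# Everything above RESERVED, up to 0xFFFF, goes to application assignment.

def classify(component_type: int) -> str:
    """Map a 16-bit component type to its registry region."""
    if not 0 <= component_type <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if component_type == GENERIC:
        return "generic"
    if component_type in WELL_KNOWN:
        return "well-known"
    if component_type in RESERVED:
        return "reserved"
    return "application"
```

This corresponds to option (1) restricted to a sub-range; under option (2), the same layout would apply but with the full 65,536 values available to name components.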
>>>>> >>>>> I suspect LPM on data will always be slow (relative to the other >>>>> functions). >>>>> I suspect exclusions will always be slow because they will >>>>> require extra memory references. >>>>> >>>>> However I of course don't claim clairvoyance, so this is just >>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>> orders of magnitude and still having to worry about counting >>>>> cycles and memory references... >>>>> >>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>> wrote: >>>>> >>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>> wrote: >>>>> >>>>> We should not look at a certain chip nowadays and want ndn to >>>>> perform >>>>> well on it. It should be the other way around: once ndn app >>>>> becomes >>>>> popular, a better chip will be designed for ndn. >>>>> >>>>> While I'm sympathetic to that view, there are three ways in >>>>> which Moore's law or hardware tricks will not save us from >>>>> performance flaws in the design: >>>>> a) clock rates are not getting (much) faster >>>>> b) memory accesses are getting (relatively) more expensive >>>>> c) data structures that require locks to manipulate >>>>> successfully will be relatively more expensive, even with >>>>> near-zero lock contention. >>>>> >>>>> The fact is, IP *did* have some serious performance flaws in >>>>> its design. We just forgot those because the design elements >>>>> that depended on those mistakes have fallen into disuse. The >>>>> poster children for this are: >>>>> 1. IP options. Nobody can use them because they are too slow >>>>> on modern forwarding hardware, so they can't be reliably used >>>>> anywhere >>>>> 2. the UDP checksum, which was a bad design when it was >>>>> specified and is now a giant PITA that still causes major pain >>>>> to work around. >>>>> >>>>> I'm afraid students today are being taught that the designers >>>>> of IP were flawless, as opposed to very good scientists and >>>>> engineers that got most of it right.
>>>>> >>>>> I feel the discussion today and yesterday has been off-topic. >>>>> Now I >>>>> see that there are 3 approaches: >>>>> 1. we should not define a naming convention at all >>>>> 2. typed component: use tlv type space and add a handful of >>>>> types >>>>> 3. marked component: introduce only one more type and add >>>>> additional >>>>> marker space >>>>> >>>>> I know how to make #2 flexible enough to do what things I can >>>>> envision we need to do, and with a few simple conventions on >>>>> how the registry of types is managed. >>>>> >>>>> It is just as powerful in practice as either throwing up our >>>>> hands and letting applications design their own mutually >>>>> incompatible schemes or trying to make naming conventions with >>>>> markers in a way that is fast to generate/parse and also >>>>> resilient against aliasing. >>>>> >>>>> Also everybody thinks that the current utf8 marker naming >>>>> convention >>>>> needs to be revised. >>>>> >>>>> >>>>> >>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>> wrote: >>>>> Would that chip be suitable, i.e. can we expect most names >>>>> to fit in (the >>>>> magnitude of) 96 bytes? What length are names usually in >>>>> current NDN >>>>> experiments? >>>>> >>>>> I guess wide deployment could make for even longer names. >>>>> Related: Many URLs >>>>> I encounter nowadays easily don't fit within two 80-column >>>>> text lines, and >>>>> NDN will have to carry more information than URLs, as far as >>>>> I see. >>>>> >>>>> >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>> >>>>> In fact, the index in separate TLV will be slower on some >>>>> architectures, >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>> bytes in memory, >>>>> then any subsequent memory is accessed only as two adjacent >>>>> 32-byte blocks >>>>> (there can be at most 5 blocks available at any one time). >>>>> If you need to >>>>> switch between arrays, it would be very expensive.
If you >>>>> have to read past >>>>> the name to get to the 2nd array, then read it, then backup >>>>> to get to the >>>>> name, it will be pretty expensive too. >>>>> >>>>> Marc >>>>> >>>>> On Sep 18, 2014, at 2:02 PM, >>>>> wrote: >>>>> >>>>> Does this make that much difference? >>>>> >>>>> If you want to parse the first 5 components, one way to do >>>>> it is: >>>>> >>>>> Read the index, find entry 5, then read in that many bytes >>>>> from the start >>>>> offset of the beginning of the name. >>>>> OR >>>>> Start reading name, (find size + move ) 5 times. >>>>> >>>>> How much speed are you getting from one to the other? You >>>>> seem to imply >>>>> that the first one is faster. I don't think this is the >>>>> case. >>>>> >>>>> In the first one you'll probably have to get the cache line >>>>> for the index, >>>>> then all the required cache lines for the first 5 >>>>> components. For the >>>>> second, you'll have to get all the cache lines for the first >>>>> 5 components. >>>>> Given an assumption that a cache miss is way more expensive >>>>> than >>>>> evaluating a number and computing an addition, you might >>>>> find that the >>>>> performance of the index is actually slower than the >>>>> performance of the >>>>> direct access. >>>>> >>>>> Granted, there is a case where you don't access the name at >>>>> all, for >>>>> example, if you just get the offsets and then send the >>>>> offsets as >>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>> you may see a >>>>> gain IF there are more cache line misses in reading the name >>>>> than in >>>>> reading the index. So, if the regular part of the name >>>>> that you're >>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>> name is to be >>>>> processed by a different processor, then you might see some >>>>> performance >>>>> gain in using the index, but in all other circumstances I >>>>> bet this is not >>>>> the case. I may be wrong, haven't actually tested it.
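The two access patterns being compared — "read the index, jump to entry 5" versus "(find size + move) 5 times" — can be sketched as follows. The 1-byte type and 1-byte length encoding is a simplification invented for illustration, not the actual NDN-TLV or CCNx wire format.

```python
# Sketch of the two name-parsing strategies: sequential TLV walking vs.
# jumping via a precomputed offset index. The 1-byte type (0x08 is a
# made-up "component" code) and 1-byte length are simplifying assumptions.

def encode_name(components):
    """Encode a list of byte-string components as flat T-L-V triples."""
    out = bytearray()
    for c in components:
        out += bytes([0x08, len(c)]) + c
    return bytes(out)

def first_x_sequential(buf, x):
    """(find size + move) x times -- no index needed."""
    comps, pos = [], 0
    for _ in range(x):
        length = buf[pos + 1]
        comps.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return comps

def build_index(buf):
    """Start offset of each component; on the wire this index would be
    carried as its own TLV alongside the name."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 2 + buf[pos + 1]
    return offsets

def first_x_indexed(buf, index, x):
    """Jump straight to the first x components via the index."""
    return [buf[o + 2 : o + 2 + buf[o + 1]] for o in index[:x]]
```

Both functions return the same components; the argument in the thread is purely about how many cache lines each pattern touches, which a functional sketch like this cannot measure — it only pins down what the two patterns do.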
>>>>> >>>>> This is all to say, I don't think we should be designing the >>>>> protocol with >>>>> only one architecture in mind. (The architecture of sending >>>>> the name to a >>>>> different processor than the index). >>>>> >>>>> If you have numbers that show that the index is faster I >>>>> would like to see >>>>> under what conditions and architectural assumptions. >>>>> >>>>> Nacho >>>>> >>>>> (I may have misinterpreted your description so feel free to >>>>> correct me if >>>>> I'm wrong.) >>>>> >>>>> >>>>> -- >>>>> Nacho (Ignacio) Solis >>>>> Protocol Architect >>>>> Principal Scientist >>>>> Palo Alto Research Center (PARC) >>>>> +1(650)812-4458 >>>>> Ignacio.Solis at parc.com >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>> >>>>> wrote: >>>>> >>>>> Indeed each component's offset must be encoded using a fixed >>>>> amount of >>>>> bytes: >>>>> >>>>> i.e., >>>>> Type = Offsets >>>>> Length = 10 Bytes >>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>> >>>>> You may also imagine having an "Offset_2byte" type if your >>>>> name is too >>>>> long. >>>>> >>>>> Max >>>>> >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose >>>>> you only >>>>> want the first x components) you can directly have it using >>>>> the >>>>> offsets. With the Nested TLV structure you have to >>>>> iteratively parse >>>>> the first x-1 components. With the offset structure you can >>>>> directly >>>>> access the first x components. >>>>> >>>>> I don't get it. What you described only works if the >>>>> "offset" is >>>>> encoded in fixed bytes. With varNum, you will still need to >>>>> parse x-1 >>>>> offsets to get to the x offset. >>>>> >>>>> >>>>> >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>> wrote: >>>>> >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>> >>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>> like the >>>>> existing NDN UTF8 'convention'."
I'm still not sure I >>>>> understand what >>>>> you >>>>> _do_ prefer, though. it sounds like you're describing an >>>>> entirely >>>>> different >>>>> scheme where the info that describes the name-components is >>>>> ... >>>>> someplace >>>>> other than _in_ the name-components. is that correct? when >>>>> you say >>>>> "field >>>>> separator", what do you mean (since that's not a "TL" from a >>>>> TLV)? >>>>> >>>>> Correct. >>>>> In particular, with our name encoding, a TLV indicates the >>>>> name >>>>> hierarchy >>>>> with offsets in the name and other TLV(s) indicates the >>>>> offset to use >>>>> in >>>>> order to retrieve special components. >>>>> As for the field separator, it is something like "/". >>>>> Aliasing is >>>>> avoided as >>>>> you do not rely on field separators to parse the name; you >>>>> use the >>>>> "offset >>>>> TLV" to do that. >>>>> >>>>> So now, it may be an aesthetic question but: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose >>>>> you only >>>>> want >>>>> the first x components) you can directly have it using the >>>>> offsets. >>>>> With the >>>>> Nested TLV structure you have to iteratively parse the first >>>>> x-1 >>>>> components. >>>>> With the offset structure you can directly access the >>>>> first x >>>>> components. >>>>> >>>>> Max >>>>> >>>>> >>>>> -- Mark >>>>> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>> >>>>> The why is simple: >>>>> >>>>> You use a lot of "generic component type" and very few >>>>> "specific >>>>> component type". You are imposing types for every component >>>>> in order >>>>> to >>>>> handle few exceptions (segmentation, etc..). You create a >>>>> rule >>>>> (specify >>>>> the component's type) to handle exceptions! >>>>> >>>>> I would prefer not to have typed components. Instead I would >>>>> prefer >>>>> to >>>>> have the name as a simple sequence of bytes with a field >>>>> separator.
Then, >>>>> outside the name, if you have some components that could be >>>>> used at >>>>> network layer (e.g. a TLV field), you simply need something >>>>> that >>>>> indicates which is the offset allowing you to retrieve the >>>>> version, >>>>> segment, etc. in the name... >>>>> >>>>> >>>>> Max >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>> >>>>> I think we agree on the small number of "component types". >>>>> However, if you have a small number of types, you will end >>>>> up with >>>>> names >>>>> containing many generic component types and few specific >>>>> component >>>>> types. Due to the fact that the component type specification >>>>> is an >>>>> exception in the name, I would prefer something that specifies >>>>> the >>>>> component's >>>>> type only when needed (something like UTF8 conventions but >>>>> that >>>>> applications MUST use). >>>>> >>>>> so ... I can't quite follow that. the thread has had some >>>>> explanation >>>>> about why the UTF8 requirement has problems (with aliasing, >>>>> e.g.) >>>>> and >>>>> there's been email trying to explain that applications don't >>>>> have to >>>>> use types if they don't need to. your email sounds like "I >>>>> prefer >>>>> the >>>>> UTF8 convention", but it doesn't say why you have that >>>>> preference in >>>>> the face of the points about the problems. can you say why >>>>> it is >>>>> that >>>>> you express a preference for the "convention" with problems? >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> .
>>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >>_______________________________________________ >>Ndn-interest mailing list >>Ndn-interest at lists.cs.ucla.edu
>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From bzhang at cs.arizona.edu Wed Sep 24 13:46:55 2014 From: bzhang at cs.arizona.edu (Beichuan Zhang) Date: Wed, 24 Sep 2014 22:46:55 +0200 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: References: <54208AA4.9060709@rabe.io> <542114C8.7020802@rabe.io> Message-ID: <142C6142-4DD2-4BF1-A932-5030D8815B72@cs.arizona.edu> Maybe we can stream it over NDN testbed -:) There're 3 nodes in China. Beichuan On Sep 23, 2014, at 8:46 AM, Hu, Xiaoyan wrote: > Hi Jeff, > > Yes indeed. It is really a slow connection. > Really appreciate it if download links could be offered. > Thanks very much! > > Best Regards, > Xiaoyan Hu > PhD Candidate > School of Computer Science and Engineering, Southeast University, NanJing, China (Post code: 211189) > xhbreezehu at gmail dot com > +86-186-5187-8116 > > On Tue, Sep 23, 2014 at 2:35 PM, Felix Rabe wrote: > Hi Jeff > > Don't want to bother you right now, but I'd like to review them sometime in the next month, as my notes are incomplete in many places. > > Also, Xiaoke mentioned that the streamed versions are unsuitable for folks in China with a slow connection. They would still like to see them. > > - Felix > > > > On 23/Sep/14 08:01, Burke, Jeff wrote: > Hi Felix, > > Unfortunately the live stream records are what they are - some quirks in > the early recording can't be fixed. > > We should have separate local recordings as well, but are pretty swamped > right now. Is there something in particular you'd like to see posted? > > Jeff > > > > On 9/23/14, 12:46 AM, "Felix Rabe" wrote: > > Hi list (or, REMAP) > > The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are > they available as downloads somewhere?
> > Also, I see some videos are barely viewable (at least [1], but [2] seems > to be fine), skipping a few seconds every few seconds. Do you still have > a complete version? > > [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 > [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 > > Kind regards > - Felix > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From shock.jiang at gmail.com Wed Sep 24 13:54:48 2014 From: shock.jiang at gmail.com (Xiaoke Jiang) Date: Wed, 24 Sep 2014 13:54:48 -0700 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: <142C6142-4DD2-4BF1-A932-5030D8815B72@cs.arizona.edu> References: <54208AA4.9060709@rabe.io> <542114C8.7020802@rabe.io> <142C6142-4DD2-4BF1-A932-5030D8815B72@cs.arizona.edu> Message-ID: On Wednesday, 24 September, 2014 at 1:46 pm, Beichuan Zhang wrote: > Maybe we can stream it over NDN testbed -:) There're 3 nodes in China. > Good solution. Some other guys in China also complained about the slow connection. Also, NDN could show its advantage in data delivery with this solution. Xiaoke (Shock) > > Beichuan > > On Sep 23, 2014, at 8:46 AM, Hu, Xiaoyan wrote: > > Hi Jeff, > > > > Yes indeed. It is really a slow connection. > > Really appreciate it if download links could be offered. > > Thanks very much!
> > > > Best Regards, > > Xiaoyan Hu > > PhD Candidate > > School of Computer Science and Engineering, Southeast University, NanJing, China (Post code: 211189) > > xhbreezehu at gmail dot com > > +86-186-5187-8116 > > > > On Tue, Sep 23, 2014 at 2:35 PM, Felix Rabe wrote: > > > Hi Jeff > > > > > > Don't want to bother you right now, but I'd like to review them sometime in the next month, as my notes are incomplete in many places. > > > > > > Also, Xiaoke mentioned that the streamed versions are unsuitable for folks in China with a slow connection. They would still like to see them. > > > > > > - Felix > > > > > > > > > > > > On 23/Sep/14 08:01, Burke, Jeff wrote: > > > > Hi Felix, > > > > > > > > Unfortunately the live stream records are what they are - some quirks in > > > > the early recording can't be fixed. > > > > > > > > We should have separate local recordings as well, but are pretty swamped > > > > right now. Is there something in particular you'd like to see posted? > > > > > > > > Jeff > > > > > > > > > > > > > > > > On 9/23/14, 12:46 AM, "Felix Rabe" wrote: > > > > > > > > > Hi list (or, REMAP) > > > > > > > > > > The NDNcomm 2014 videos at http://new.livestream.com/uclaremap/, are > > > > > they available as downloads somewhere? > > > > > > > > > > Also, I see some videos are barely viewable (at least [1], but [2] seems > > > > > to be fine), skipping a few seconds every few seconds. Do you still have > > > > > a complete version? 
> > > > > > > > > > [1] http://new.livestream.com/uclaremap/events/3355835/videos/61160218 > > > > > [2] http://new.livestream.com/uclaremap/events/3357542/videos/61244919 > > > > > > > > > > Kind regards > > > > > - Felix > > > > > _______________________________________________ > > > > > Ndn-interest mailing list > > > > > Ndn-interest at lists.cs.ucla.edu (mailto:Ndn-interest at lists.cs.ucla.edu) > > > > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > > > _______________________________________________ > > > Ndn-interest mailing list > > > Ndn-interest at lists.cs.ucla.edu (mailto:Ndn-interest at lists.cs.ucla.edu) > > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > > Ndn-interest mailing list > > Ndn-interest at lists.cs.ucla.edu (mailto:Ndn-interest at lists.cs.ucla.edu) > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu (mailto:Ndn-interest at lists.cs.ucla.edu) > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gq.wang at huawei.com Wed Sep 24 15:04:10 2014 From: gq.wang at huawei.com (GQ Wang) Date: Wed, 24 Sep 2014 22:04:10 +0000 Subject: [Ndn-interest] Ndn-interest Digest, Vol 7, Issue 60 In-Reply-To: References: Message-ID: <6759D6D370C7024D9BB22CD5F92D2D3318565884@SJCEML701-CHM.china.huawei.com> It seemed to me that the arguments were caused by the functions we require "a name" to do: 1. Ask it for routing/forwarding 2. Ask it for data retrieval services (someone mentioned email service) If we separate CS from CCN/NDN data forwarding plane and treat it just as a plain service (I asked this question at NDNComm 2014), then "locator" becomes an attribute purely for service layer functions, which is handled by Apps, not by the networking. 
Will this separation make the naming convention (and its scope) clearer? G.Q -----Original Message----- From: Ndn-interest [mailto:ndn-interest-bounces at lists.cs.ucla.edu] On Behalf Of ndn-interest-request at lists.cs.ucla.edu Sent: Wednesday, September 24, 2014 9:26 AM To: ndn-interest at lists.cs.ucla.edu Subject: Ndn-interest Digest, Vol 7, Issue 60 Send Ndn-interest mailing list submissions to ndn-interest at lists.cs.ucla.edu To subscribe or unsubscribe via the World Wide Web, visit http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest or, via email, send a message with subject or body 'help' to ndn-interest-request at lists.cs.ucla.edu You can reach the person managing the list at ndn-interest-owner at lists.cs.ucla.edu When replying, please edit your Subject line so it is more specific than "Re: Contents of Ndn-interest digest..." Today's Topics: 1. Re: any comments on naming convention? (Marc.Mosko at parc.com) ---------------------------------------------------------------------- Message: 1 Date: Wed, 24 Sep 2014 16:25:53 +0000 From: To: Cc: Ignacio.Solis at parc.com, ndn-interest at lists.cs.ucla.edu Subject: Re: [Ndn-interest] any comments on naming convention? Message-ID: Content-Type: text/plain; charset="euc-kr" I think Tai-Lin's example was just fine to talk about discovery. /blah/blah/value, how do you discover all the "value"s? Discovery shouldn't care if it's email messages or temperature readings or world cup photos. I described one set of problems using the exclusion approach, and that an NDN paper on device discovery described a similar problem, though they did not go into the details of splitting interests, etc. That all was simple enough to see from the example. Another question is how does one do the discovery with exact match names, which is also conflating things. You could do a different discovery with continuation names too, just not the exclude method.
As I alluded to, one needs a way to talk with a specific cache about its "table of contents" for a prefix so one can get a consistent set of results without all the round-trips of exclusions. Actually downloading the "headers" of the messages would be the same bytes, more or less. In a way, this is a little like name enumeration from a ccnx 0.x repo, but that protocol has its own set of problems and I'm not suggesting we use that directly. One approach is to encode a request in a name component and a participating cache can reply. It replies in such a way that one could continue talking with that cache to get its TOC. One would then issue another interest with a request for not-that-cache. Another approach is to try to ask the authoritative source for the "current" manifest name, i.e. /mail/inbox/current/, which could return the manifest or a link to the manifest. Then fetching the actual manifest from the link could come from caches because you now have a consistent set of names to ask for. If you cannot talk with an authoritative source, you could try again without the nonce and see if there's a cached copy of a recent version around. Marc On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: > > > On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" > wrote: > >> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >> >>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>> pattern with static (/mail/inbox) and variable (148) components; with >>> proper naming convention, computers can also detect this pattern >>> easily. Now I want to look for all mails in my inbox. I can generate a >>> list of /mail/inbox/. These are my guesses, and with selectors >>> I can further refine my guesses. >> >> I think this is a very bad example (or at least a very bad application >> design). You have an app (a mail server / inbox) and you want it to list >> your emails? An email list is an application data structure.
I don't >> think you should use the network structure to reflect this. > > I think Tai-Lin is trying to sketch a small example, not propose a > full-scale approach to email. (Maybe I am misunderstanding.) > > > Another way to look at it is that if the network architecture is providing > the equivalent of distributed storage to the application, perhaps the > application data structure could be adapted to match the affordances of > the network. Then it would not be so bad that the two structures were > aligned. > >> >> I'll give you an example: how do you delete emails from your inbox? If an >> email was cached in the network it can never be deleted from your inbox? > > This is conflating two issues - what you are pointing out is that the data > structure of a linear list doesn't handle common email management > operations well. Again, I'm not sure if that's what he was getting at > here. But deletion is not the issue - the availability of a data object > on the network does not necessarily mean it's valid from the perspective > of the application. > >> Or moved to another mailbox? Do you rely on the emails expiring? >> >> This problem is true for most (any?) situations where you use network name >> structure to directly reflect the application data structure. > > Not sure I understand how you make the leap from the example to the > general statement. > > Jeff > > > >> >> Nacho >> >> >> >>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>> Ok, yes I think those would all be good things. >>>> >>>> One thing to keep in mind, especially with things like time series >>>> sensor >>>> data, is that people see a pattern and infer a way of doing it. That's >>>> easy >>>> for a human :) But in Discovery, one should assume that one does not >>>> know >>>> of patterns in the data beyond what the protocols used to publish the >>>> data >>>> explicitly require.
That said, I think some of the things you listed >>>> are >>>> good places to start: sensor data, web content, climate data or genome >>>> data. >>>> >>>> We also need to state what the forwarding strategies are and what the >>>> cache >>>> behavior is. >>>> >>>> I outlined some of the points that I think are important in that other >>>> posting. While "discover latest" is useful, "discover all" is also >>>> important, and that one gets complicated fast. So points like >>>> separating >>>> discovery from retrieval and working with large data sets have been >>>> important in shaping our thinking. That all said, I'd be happy >>>> starting >>>> from 0 and working through the Discovery service definition from >>>> scratch >>>> along with data set use cases. >>>> >>>> Marc >>>> >>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>> wrote: >>>> >>>> Hi Marc, >>>> >>>> Thanks - yes, I saw that as well. I was just trying to get one step >>>> more >>>> specific, which was to see if we could identify a few specific use >>>> cases >>>> around which to have the conversation. (e.g., time series sensor data >>>> and >>>> web content retrieval for "get latest"; climate data for huge data >>>> sets; >>>> local data in a vehicular network; etc.) What have you been looking at >>>> that's driving considerations of discovery? >>>> >>>> Thanks, >>>> Jeff >>>> >>>> From: >>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>> To: Jeff Burke >>>> Cc: , >>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>> >>>> Jeff, >>>> >>>> Take a look at my posting (that Felix fixed) in a new thread on >>>> Discovery. >>>> >>>> >>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>>> >>>> I think it would be very productive to talk about what Discovery should >>>> do, >>>> and not focus on the how. It is sometimes easy to get caught up in the >>>> how, >>>> which I think is a less important topic than the what at this stage.
>>>> >>>> Marc >>>> >>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>> wrote: >>>> >>>> Marc, >>>> >>>> If you can't talk about your protocols, perhaps we can discuss this >>>> based >>>> on use cases. What are the use cases you are using to evaluate >>>> discovery? >>>> >>>> Jeff >>>> >>>> >>>> >>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>> wrote: >>>> >>>> No matter what the expressiveness of the predicates if the forwarder >>>> can >>>> send interests different ways you don't have a consistent underlying >>>> set >>>> to talk about so you would always need non-range exclusions to discover >>>> every version. >>>> >>>> Range exclusions only work I believe if you get an authoritative >>>> answer. >>>> If different content pieces are scattered between different caches I >>>> don't see how range exclusions would work to discover every version. >>>> >>>> I'm sorry to be pointing out problems without offering solutions but >>>> we're not ready to publish our discovery protocols. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>> >>>> I see. Can you briefly describe how ccnx discovery protocol solves the >>>> all problems that you mentioned (not just exclude)? a doc will be >>>> better. >>>> >>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>> language or context free language might become part of selector too. >>>> >>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>> That will get you one reading then you need to exclude it and ask >>>> again. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes not range excludes if you want to discover all the versions >>>> of an object. >>>> >>>> >>>> I am very confused. 
For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. >>>> >>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>> If you talk sometimes to A and sometimes to B, you very easily >>>> could miss content objects you want to discovery unless you avoid >>>> all range exclusions and only exclude explicit versions. >>>> >>>> >>>> Could you explain why missing content object situation happens? also >>>> range exclusion is just a shorter notation for many explicit >>>> exclude; >>>> converting from explicit excludes to ranged exclude is always >>>> possible. >>>> >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes not range excludes if you want to discover all the versions >>>> of an object. For something like a sensor reading that is updated, >>>> say, once per second you will have 86,400 of them per day. If each >>>> exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of >>>> exclusions (plus encoding overhead) per day. >>>> >>>> yes, maybe using a more deterministic version number than a >>>> timestamp makes sense here, but its just an example of needing a lot >>>> of exclusions. >>>> >>>> >>>> You exclude through 100 then issue a new interest. This goes to >>>> cache B >>>> >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, >>>> if >>>> this goes to cache B only, it means that cache A dies. How do you >>>> know >>>> that v101 even exist? >>>> >>>> >>>> I guess this depends on what the forwarding strategy is. 
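The arithmetic traded back and forth above can be checked in a couple of lines. This is only a sketch of the numbers quoted in the thread (one reading per second, 8-byte timestamp exclusions, an 18-byte two-range exclude), ignoring TLV encoding overhead:

```python
# One sensor reading per second, each excluded by an 8-byte timestamp,
# versus the two range exclusions mentioned above (18 bytes total).
readings_per_day = 24 * 60 * 60           # 86,400 readings
explicit_exclusions = readings_per_day * 8  # one 8-byte exclusion each
range_exclusions = 18                     # (Any..X)(Y..Any), per the thread

print(readings_per_day)       # 86400
print(explicit_exclusions)    # 691200 bytes per day, before encoding overhead
print(range_exclusions)       # 18
```

The gap between 691,200 bytes and 18 bytes is the whole argument: range excludes are compact, but only safe against a consistent set.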
If the >>>> forwarder will always send each interest to all replicas, then yes, >>>> modulo packet loss, you would discover v101 on cache A. If the >>>> forwarder is just doing ?best path? and can round-robin between cache >>>> A and cache B, then your application could miss v101. >>>> >>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number >>>> of components. In my own thread-safe LMP implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>> every node will be faster or not because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs >>>> (ndn >>>> lpm). Comparing performance of exact match to lpm is unfair. >>>> >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>> specs for doing the exact match discovery. So, as I said, I?m not >>>> ready to claim its better yet because we have not done that. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest >>>> matching to do discovery has its own set of problems. Discovery >>>> involves more than just ?latest version? discovery too. >>>> >>>> This is probably getting off-topic from the original post about >>>> naming conventions. >>>> >>>> a. If Interests can be forwarded multiple directions and two >>>> different caches are responding, the exclusion set you build up >>>> talking with cache A will be invalid for cache B. If you talk >>>> sometimes to A and sometimes to B, you very easily could miss >>>> content objects you want to discovery unless you avoid all range >>>> exclusions and only exclude explicit versions. That will lead to >>>> very large interest packets. In ccnx 1.0, we believe that an >>>> explicit discovery protocol that allows conversations about >>>> consistent sets is better. >>>> >>>> b. Yes, if you just want the ?latest version? 
discovery that
>>>> should be transitive between caches, but imagine this. You send
>>>> Interest #1 to cache A which returns version 100. You exclude
>>>> through 100 then issue a new interest. This goes to cache B who
>>>> only has version 99, so the interest times out or is NACK'd. So
>>>> you think you have it! But cache A already has version 101, you
>>>> just don't know. If you cannot have a conversation around
>>>> consistent sets, it seems like even doing latest-version discovery
>>>> is difficult with selector-based discovery. From what I saw in
>>>> ccnx 0.x, one ended up getting an Interest all the way to the
>>>> authoritative source because you can never believe an intermediate
>>>> cache that there's not something more recent.
>>>>
>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be
>>>> interested in seeing your analysis. Case (a) is that a node can
>>>> correctly discover every version of a name prefix, and (b) is that
>>>> a node can correctly discover the latest version. We have not
>>>> formally compared (or yet published) our discovery protocols (we
>>>> have three: 2 for content, 1 for device) against selector-based
>>>> discovery, so I cannot yet claim they are better, but they do not
>>>> have the non-determinism sketched above.
>>>>
>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>> must do in the PIT to match a content object. If you have a name
>>>> tree or a threaded hash table, those don't all need to be hash
>>>> lookups, but you need to walk up the name tree for every prefix of
>>>> the content object name and evaluate the selector predicate.
>>>> Content Based Networking (CBN) had some methods to create data
>>>> structures based on predicates; maybe those would be better. But
>>>> in any case, you will potentially need to retrieve many PIT entries
>>>> if there is Interest traffic for many prefixes of a root. Even on
>>>> an Intel system, you'll likely miss cache lines, so you'll have a
>>>> lot of NUMA accesses for each one. In CCNx 1.0, even a naive
>>>> implementation only requires at most 3 lookups (one by name, one by
>>>> name + keyid, one by name + content object hash), and one can do
>>>> other things to optimize lookup for an extra write.
>>>>
>>>> d. In (c) above, if you have a threaded name tree or are just
>>>> walking parent pointers, I suspect you'll need locking of the
>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP)
>>>> and that will be expensive. It would be interesting to see what a
>>>> cache-consistent multi-threaded name tree looks like.
>>>>
>>>> Marc
>>>>
>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>
>>>> I had thought about these questions, but I want to know your idea
>>>> besides typed components:
>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>> things?
>>>> 2. Will removing selectors improve performance? How do we use
>>>> other, faster techniques to replace selectors?
>>>> 3. Fixed byte length and type. I agree more that type can be a
>>>> fixed byte, but 2 bytes for length might not be enough for the
>>>> future.
>>>>
>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>
>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>
>>>> I know how to make #2 flexible enough to do the things I can
>>>> envision we need to do, and with a few simple conventions on
>>>> how the registry of types is managed.
>>>>
>>>> Could you share it with us?
>>>>
>>>> Sure. Here's a strawman.
>>>>
>>>> The type space is 16 bits, so you have 65,536 types.
>>>>
>>>> The type space is currently shared with the types used for the
>>>> entire protocol; that gives us two options:
>>>> (1) we reserve a range for name component types. Given the
>>>> likelihood there will be at least as much and probably more need
>>>> for component types than protocol extensions, we could reserve 1/2
>>>> of the type space, giving us 32K types for name components.
>>>> (2) since there is no parsing ambiguity between name components
>>>> and other fields of the protocol (since they are sub-types of the
>>>> name type) we could reuse numbers and thereby have an entire 65K
>>>> name component types.
>>>>
>>>> We divide the type space into regions, and manage it with a
>>>> registry. If we ever get to the point of creating an IETF
>>>> standard, IANA has 25 years of experience running registries and
>>>> there are well-understood rule sets for different kinds of
>>>> registries (open, requires a written spec, requires standards
>>>> approval).
>>>>
>>>> - We allocate one "default" name component type for "generic
>>>> name", which would be used on name prefixes and other common
>>>> cases where there are no special semantics on the name component.
>>>> - We allocate a range of name component types, say 1024, to
>>>> globally understood types that are part of the base or extension
>>>> NDN specifications (e.g. chunk#, version#, etc.)
>>>> - We reserve some portion of the space for unanticipated uses
>>>> (say another 1024 types)
>>>> - We give the rest of the space to application assignment.
>>>>
>>>> Make sense?
>>>>
>>>> While I'm sympathetic to that view, there are three ways in
>>>> which Moore's law or hardware tricks will not save us from
>>>> performance flaws in the design
>>>>
>>>> we could design for performance,
>>>>
>>>> That's not what people are advocating. We are advocating that we
>>>> *not* design for known bad performance and hope serendipity or
>>>> Moore's Law will come to the rescue.
>>>>
>>>> but I think there will be a turning
>>>> point when the slower design starts to become "fast enough".
>>>>
>>>> Perhaps, perhaps not.
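Dave's registry strawman earlier in this message (one generic type, roughly 1024 well-known types, another 1024 reserved, the rest for applications) can be sketched as a simple range table. The boundary values below are invented for illustration; nothing here is an agreed allocation:

```python
# Illustrative carve-up of a 16-bit name-component type space,
# following the strawman in the thread; boundaries are made up.
TYPE_SPACE = 1 << 16          # 65,536 possible types
GENERIC = 0                   # the single "generic name" default type
WELL_KNOWN = range(1, 1025)   # base/extension types (chunk#, version#, ...)
RESERVED = range(1025, 2049)  # held back for unanticipated uses

def classify(t: int) -> str:
    """Map a name-component type code to its registry region."""
    if not 0 <= t < TYPE_SPACE:
        raise ValueError("name component type must fit in 16 bits")
    if t == GENERIC:
        return "generic"
    if t in WELL_KNOWN:
        return "well-known"
    if t in RESERVED:
        return "reserved"
    return "application"      # everything else is application-assigned
```

Whether the well-known region holds 1024 or some other count is exactly the kind of decision a registry (IANA-style) would own.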
Relative performance is what matters, so
>>>> things that don't get faster while others do tend to get dropped
>>>> or not used because they impose a performance penalty relative to
>>>> the things that go faster. There is also the "low-end" phenomenon
>>>> where improvements in technology get applied to lowering cost
>>>> rather than improving performance. For those environments, bad
>>>> performance just never gets better.
>>>>
>>>> Do you
>>>> think there will be some design of ndn that will *never* have
>>>> performance improvement?
>>>>
>>>> I suspect LPM on data will always be slow (relative to the other
>>>> functions).
>>>> I suspect exclusions will always be slow because they will
>>>> require extra memory references.
>>>>
>>>> However, I of course don't claim clairvoyance, so this is just
>>>> speculation based on 35+ years of seeing performance improve by 4
>>>> orders of magnitude and still having to worry about counting
>>>> cycles and memory references...
>>>>
>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>
>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>
>>>> We should not look at a certain chip nowadays and want ndn to
>>>> perform well on it. It should be the other way around: once ndn
>>>> apps become popular, a better chip will be designed for ndn.
>>>>
>>>> While I'm sympathetic to that view, there are three ways in
>>>> which Moore's law or hardware tricks will not save us from
>>>> performance flaws in the design:
>>>> a) clock rates are not getting (much) faster
>>>> b) memory accesses are getting (relatively) more expensive
>>>> c) data structures that require locks to manipulate
>>>> successfully will be relatively more expensive, even with
>>>> near-zero lock contention.
>>>>
>>>> The fact is, IP *did* have some serious performance flaws in
>>>> its design. We just forgot those because the design elements
>>>> that depended on those mistakes have fallen into disuse. The
>>>> poster children for this are:
>>>> 1. IP options. Nobody can use them because they are too slow
>>>> on modern forwarding hardware, so they can't be reliably used
>>>> anywhere.
>>>> 2. The UDP checksum, which was a bad design when it was
>>>> specified and is now a giant PITA that still causes major pain
>>>> to work around.
>>>>
>>>> I'm afraid students today are being taught that the designers
>>>> of IP were flawless, as opposed to very good scientists and
>>>> engineers who got most of it right.
>>>>
>>>> I feel the discussion today and yesterday has been off-topic.
>>>> Now I see that there are 3 approaches:
>>>> 1. we should not define a naming convention at all
>>>> 2. typed component: use tlv type space and add a handful of
>>>> types
>>>> 3. marked component: introduce only one more type and add
>>>> additional marker space
>>>>
>>>> I know how to make #2 flexible enough to do the things I can
>>>> envision we need to do, and with a few simple conventions on
>>>> how the registry of types is managed.
>>>>
>>>> It is just as powerful in practice as either throwing up our
>>>> hands and letting applications design their own mutually
>>>> incompatible schemes or trying to make naming conventions with
>>>> markers in a way that is fast to generate/parse and also
>>>> resilient against aliasing.
>>>>
>>>> Also, everybody thinks that the current utf8 marker naming
>>>> convention needs to be revised.
>>>>
>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>> Would that chip be suitable, i.e. can we expect most names
>>>> to fit in (the magnitude of) 96 bytes? What length are names
>>>> usually in current NDN experiments?
>>>>
>>>> I guess wide deployment could make for even longer names.
>>>> Related: Many URLs I encounter nowadays easily don't fit within
>>>> two 80-column text lines, and NDN will have to carry more
>>>> information than URLs, as far as I see.
>>>> >>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in separate TLV will be slower on some >>>> architectures, >>>> like the ezChip NP4. The NP4 can hold the fist 96 frame >>>> bytes in memory, >>>> then any subsequent memory is accessed only as two adjacent >>>> 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). >>>> If you need to >>>> switch between arrays, it would be very expensive. If you >>>> have to read past >>>> the name to get to the 2nd array, then read it, then backup >>>> to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components. One way to do >>>> it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes >>>> from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading name, (find size + move ) 5 times. >>>> >>>> How much speed are you getting from one to the other? You >>>> seem to imply >>>> that the first one is faster. I don?t think this is the >>>> case. >>>> >>>> In the first one you?ll probably have to get the cache line >>>> for the index, >>>> then all the required cache lines for the first 5 >>>> components. For the >>>> second, you?ll have to get all the cache lines for the first >>>> 5 components. >>>> Given an assumption that a cache miss is way more expensive >>>> than >>>> evaluating a number and computing an addition, you might >>>> find that the >>>> performance of the index is actually slower than the >>>> performance of the >>>> direct access. >>>> >>>> Granted, there is a case where you don?t access the name at >>>> all, for >>>> example, if you just get the offsets and then send the >>>> offsets as >>>> parameters to another processor/GPU/NPU/etc. 
In this case
>>>> you may see a gain IF there are more cache line misses in reading
>>>> the name than in reading the index. So, if the regular part of the
>>>> name that you're parsing is bigger than the cache line (64 bytes?)
>>>> and the name is to be processed by a different processor, then you
>>>> might see some performance gain in using the index, but in all
>>>> other circumstances I bet this is not the case. I may be wrong,
>>>> haven't actually tested it.
>>>>
>>>> This is all to say, I don't think we should be designing the
>>>> protocol with only one architecture in mind. (The architecture of
>>>> sending the name to a different processor than the index.)
>>>>
>>>> If you have numbers that show that the index is faster, I would
>>>> like to see under what conditions and architectural assumptions.
>>>>
>>>> Nacho
>>>>
>>>> (I may have misinterpreted your description, so feel free to
>>>> correct me if I'm wrong.)
>>>>
>>>> --
>>>> Nacho (Ignacio) Solis
>>>> Protocol Architect
>>>> Principal Scientist
>>>> Palo Alto Research Center (PARC)
>>>> +1(650)812-4458
>>>> Ignacio.Solis at parc.com
>>>>
>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>
>>>> Indeed, each component's offset must be encoded using a fixed
>>>> amount of bytes:
>>>>
>>>> i.e.,
>>>> Type = Offsets
>>>> Length = 10 Bytes
>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>
>>>> You may also imagine having an "Offset_2byte" type if your name
>>>> is too long.
>>>>
>>>> Max
>>>>
>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>
>>>> if you do not need the entire hierarchical structure (suppose you
>>>> only want the first x components) you can directly have it using
>>>> the offsets. With the Nested TLV structure you have to iteratively
>>>> parse the first x-1 components. With the offset structure you can
>>>> directly access the first x components.
>>>>
>>>> I don't get it. What you described only works if the "offset" is
>>>> encoded in fixed bytes. With varNum, you will still need to parse
>>>> x-1 offsets to get to the x-th offset.
>>>>
>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>
>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>
>>>> ah, thanks - that's helpful. I thought you were saying "I like the
>>>> existing NDN UTF8 'convention'." I'm still not sure I understand
>>>> what you _do_ prefer, though. it sounds like you're describing an
>>>> entirely different scheme where the info that describes the
>>>> name-components is ... someplace other than _in_ the
>>>> name-components. is that correct? when you say "field separator",
>>>> what do you mean (since that's not a "TL" from a TLV)?
>>>>
>>>> Correct.
>>>> In particular, with our name encoding, a TLV indicates the name
>>>> hierarchy with offsets in the name and other TLV(s) indicate the
>>>> offset to use in order to retrieve special components.
>>>> As for the field separator, it is something like "/". Aliasing is
>>>> avoided as you do not rely on field separators to parse the name;
>>>> you use the "offset TLV" to do that.
>>>>
>>>> So now, it may be an aesthetic question, but:
>>>>
>>>> if you do not need the entire hierarchical structure (suppose you
>>>> only want the first x components) you can directly have it using
>>>> the offsets. With the Nested TLV structure you have to iteratively
>>>> parse the first x-1 components. With the offset structure you can
>>>> directly access the first x components.
>>>>
>>>> Max
>>>>
>>>> -- Mark
>>>>
>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>
>>>> The why is simple:
>>>>
>>>> You use a lot of "generic component type" and very few "specific
>>>> component type".
You are imposing types for every component >>>> in order >>>> to >>>> handle few exceptions (segmentation, etc..). You create a >>>> rule >>>> (specify >>>> the component's type ) to handle exceptions! >>>> >>>> I would prefer not to have typed components. Instead I would >>>> prefer >>>> to >>>> have the name as simple sequence bytes with a field >>>> separator. Then, >>>> outside the name, if you have some components that could be >>>> used at >>>> network layer (e.g. a TLV field), you simply need something >>>> that >>>> indicates which is the offset allowing you to retrieve the >>>> version, >>>> segment, etc in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end >>>> up with >>>> names >>>> containing many generic components types and few specific >>>> components >>>> types. Due to the fact that the component type specification >>>> is an >>>> exception in the name, I would prefer something that specify >>>> component's >>>> type only when needed (something like UTF8 conventions but >>>> that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, >>>> e.g.) >>>> and >>>> there's been email trying to explain that applications don't >>>> have to >>>> use types if they don't need to. your email sounds like "I >>>> prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that >>>> preference in >>>> the face of the points about the problems. can you say why >>>> it is >>>> that >>>> you express a preference for the "convention" with problems ? >>>> >>>> Thanks, >>>> Mark >>>> >>>> . 
>>>> _______________________________________________
>>>> Ndn-interest mailing list
>>>> Ndn-interest at lists.cs.ucla.edu
>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

------------------------------

Subject: Digest Footer

_______________________________________________
Ndn-interest mailing list
Ndn-interest at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

------------------------------

End of Ndn-interest Digest, Vol 7, Issue 60
*******************************************

From gq.wang at huawei.com Wed Sep 24 15:27:05 2014
From: gq.wang at huawei.com (GQ Wang)
Date: Wed, 24 Sep 2014 22:27:05 +0000
Subject: [Ndn-interest] any comments on naming convention? (Marc.Mosko@parc.com)
References:
Message-ID: <6759D6D370C7024D9BB22CD5F92D2D3318565918@SJCEML701-CHM.china.huawei.com>

It seemed to me that the arguments were caused by the functions we require "a name" to do:

1. Ask it for routing/forwarding
2. Ask it for data retrieval services (someone mentioned email service)

If we separate CS from the CCN/NDN data forwarding plane and treat it just as a plain service (I asked this question at NDNComm 2014), then "locator" becomes an attribute purely for service layer functions, which is handled by Apps, not by the networking. Will this separation make the naming convention (and its scope) more clear?
G.Q

-----Original Message-----
From: Ndn-interest [mailto:ndn-interest-bounces at lists.cs.ucla.edu] On Behalf Of ndn-interest-request at lists.cs.ucla.edu
Sent: Wednesday, September 24, 2014 9:26 AM
To: ndn-interest at lists.cs.ucla.edu
Subject: Ndn-interest Digest, Vol 7, Issue 60

Send Ndn-interest mailing list submissions to
	ndn-interest at lists.cs.ucla.edu

To subscribe or unsubscribe via the World Wide Web, visit
	http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
or, via email, send a message with subject or body 'help' to
	ndn-interest-request at lists.cs.ucla.edu

You can reach the person managing the list at
	ndn-interest-owner at lists.cs.ucla.edu

When replying, please edit your Subject line so it is more specific than "Re: Contents of Ndn-interest digest..."

Today's Topics:

   1. Re: any comments on naming convention? (Marc.Mosko at parc.com)

----------------------------------------------------------------------

Message: 1
Date: Wed, 24 Sep 2014 16:25:53 +0000
From:
To:
Cc: Ignacio.Solis at parc.com, ndn-interest at lists.cs.ucla.edu
Subject: Re: [Ndn-interest] any comments on naming convention?
Message-ID:
Content-Type: text/plain; charset="euc-kr"

I think Tai-Lin's example was just fine to talk about discovery. /blah/blah/value, how do you discover all the "value"s? Discovery shouldn't care if it's email messages or temperature readings or world cup photos.

I described one set of problems using the exclusion approach, and noted that an NDN paper on device discovery described a similar problem, though they did not go into the details of splitting interests, etc. That all was simple enough to see from the example.

Another question is how one does discovery with exact match names, which is also conflating things. You could do a different discovery with continuation names too, just not the exclude method.

As I alluded to, one needs a way to talk with a specific cache about its "table of contents" for a prefix so one can get a consistent set of results without all the round-trips of exclusions. Actually downloading the "headers" of the messages would be the same bytes, more or less. In a way, this is a little like name enumeration from a ccnx 0.x repo, but that protocol has its own set of problems and I'm not suggesting we use it directly.

One approach is to encode a request in a name component and a participating cache can reply. It replies in such a way that one could continue talking with that cache to get its TOC. One would then issue another interest with a request for not-that-cache.

Another approach is to try to ask the authoritative source for the "current" manifest name, i.e. /mail/inbox/current/, which could return the manifest or a link to the manifest. Then fetching the actual manifest from the link could come from caches because you now have a consistent set of names to ask for. If you cannot talk with an authoritative source, you could try again without the nonce and see if there's a cached copy of a recent version around.

Marc

On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote:

> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote:
>
>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote:
>>
>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a
>>> pattern with static (/mail/inbox) and variable (148) components; with
>>> proper naming convention, computers can also detect this pattern
>>> easily. Now I want to look for all mails in my inbox. I can generate a
>>> list of /mail/inbox/. These are my guesses, and with selectors
>>> I can further refine my guesses.
>>
>> I think this is a very bad example (or at least a very bad application
>> design). You have an app (a mail server / inbox) and you want it to list
>> your emails? An email list is an application data structure. I don't
>> think you should use the network structure to reflect this.
> > I think Tai-Lin is trying to sketch a small example, not propose a > full-scale approach to email. (Maybe I am misunderstanding.) > > > Another way to look at it is that if the network architecture is providing > the equivalent of distributed storage to the application, perhaps the > application data structure could be adapted to match the affordances of > the network. Then it would not be so bad that the two structures were > aligned. > >> >> I?ll give you an example, how do you delete emails from your inbox? If an >> email was cached in the network it can never be deleted from your inbox? > > This is conflating two issues - what you are pointing out is that the data > structure of a linear list doesn't handle common email management > operations well. Again, I'm not sure if that's what he was getting at > here. But deletion is not the issue - the availability of a data object > on the network does not necessarily mean it's valid from the perspective > of the application. > >> Or moved to another mailbox? Do you rely on the emails expiring? >> >> This problem is true for most (any?) situations where you use network name >> structure to directly reflect the application data structure. > > Not sure I understand how you make the leap from the example to the > general statement. > > Jeff > > > >> >> Nacho >> >> >> >>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>> Ok, yes I think those would all be good things. >>>> >>>> One thing to keep in mind, especially with things like time series >>>> sensor >>>> data, is that people see a pattern and infer a way of doing it. That?s >>>> easy >>>> for a human :) But in Discovery, one should assume that one does not >>>> know >>>> of patterns in the data beyond what the protocols used to publish the >>>> data >>>> explicitly require. That said, I think some of the things you listed >>>> are >>>> good places to start: sensor data, web content, climate data or genome >>>> data. 
>>>> >>>> We also need to state what the forwarding strategies are and what the >>>> cache >>>> behavior is. >>>> >>>> I outlined some of the points that I think are important in that other >>>> posting. While "discover latest" is useful, "discover all" is also >>>> important, and that one gets complicated fast. So points like >>>> separating >>>> discovery from retrieval and working with large data sets have been >>>> important in shaping our thinking. That all said, I'd be happy >>>> starting >>>> from 0 and working through the Discovery service definition from >>>> scratch >>>> along with data set use cases. >>>> >>>> Marc >>>> >>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>> wrote: >>>> >>>> Hi Marc, >>>> >>>> Thanks - yes, I saw that as well. I was just trying to get one step >>>> more >>>> specific, which was to see if we could identify a few specific use >>>> cases >>>> around which to have the conversation. (e.g., time series sensor data >>>> and >>>> web content retrieval for "get latest"; climate data for huge data >>>> sets; >>>> local data in a vehicular network; etc.) What have you been looking at >>>> that's driving considerations of discovery? >>>> >>>> Thanks, >>>> Jeff >>>> >>>> From: >>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>> To: Jeff Burke >>>> Cc: , >>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>> >>>> Jeff, >>>> >>>> Take a look at my posting (that Felix fixed) in a new thread on >>>> Discovery. >>>> >>>> >>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>>> >>>> I think it would be very productive to talk about what Discovery should >>>> do, >>>> and not focus on the how. It is sometimes easy to get caught up in the >>>> how, >>>> which I think is a less important topic than the what at this stage. 
>>>> >>>> Marc >>>> >>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>> wrote: >>>> >>>> Marc, >>>> >>>> If you can't talk about your protocols, perhaps we can discuss this >>>> based >>>> on use cases. What are the use cases you are using to evaluate >>>> discovery? >>>> >>>> Jeff >>>> >>>> >>>> >>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>> wrote: >>>> >>>> No matter what the expressiveness of the predicates, if the forwarder >>>> can >>>> send interests different ways you don't have a consistent underlying >>>> set >>>> to talk about, so you would always need non-range exclusions to discover >>>> every version. >>>> >>>> Range exclusions only work, I believe, if you get an authoritative >>>> answer. >>>> If different content pieces are scattered between different caches I >>>> don't see how range exclusions would work to discover every version. >>>> >>>> I'm sorry to be pointing out problems without offering solutions, but >>>> we're not ready to publish our discovery protocols. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>> >>>> I see. Can you briefly describe how the ccnx discovery protocol solves >>>> all the problems that you mentioned (not just exclude)? A doc would be >>>> better. >>>> >>>> My unserious conjecture (:)): exclude is equal to [not]. I will soon >>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>> language or context free language might become part of selectors too. >>>> >>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>> That will get you one reading, then you need to exclude it and ask >>>> again. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes, not range excludes, if you want to discover all the versions >>>> of an object. >>>> >>>> >>>> I am very confused. 
For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. >>>> >>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>> If you talk sometimes to A and sometimes to B, you very easily >>>> could miss content objects you want to discover unless you avoid >>>> all range exclusions and only exclude explicit versions. >>>> >>>> >>>> Could you explain why the missing content object situation happens? Also, >>>> range exclusion is just a shorter notation for many explicit >>>> excludes; >>>> converting from explicit excludes to a ranged exclude is always >>>> possible. >>>> >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes, not range excludes, if you want to discover all the versions >>>> of an object. For something like a sensor reading that is updated, >>>> say, once per second you will have 86,400 of them per day. If each >>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>> exclusions (plus encoding overhead) per day. >>>> >>>> Yes, maybe using a more deterministic version number than a >>>> timestamp makes sense here, but it's just an example of needing a lot >>>> of exclusions. >>>> >>>> >>>> You exclude through 100 then issue a new interest. This goes to >>>> cache B >>>> >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, >>>> if >>>> this goes to cache B only, it means that cache A has died. How do you >>>> know >>>> that v101 even exists? >>>> >>>> >>>> I guess this depends on what the forwarding strategy is. 
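The byte counts traded back and forth above are easy to reproduce. A back-of-the-envelope sketch, where the 8-byte timestamp size is the assumption from Marc's example:

```python
# Cost of "discover every version" via individual exclusions, using the
# once-per-second sensor example from the thread.
SECONDS_PER_DAY = 24 * 60 * 60        # one reading per second
TIMESTAMP_BYTES = 8                   # assumed size of one excluded version

readings_per_day = SECONDS_PER_DAY
exclusion_bytes_per_day = readings_per_day * TIMESTAMP_BYTES

print(readings_per_day)          # 86400
print(exclusion_bytes_per_day)   # 691200, before TLV encoding overhead

# By contrast, Tai-Lin's two range exclusions,
# (Any..last second of yesterday)(first second of tomorrow..Any),
# stay at a constant ~18 bytes; but, per Marc's point, a range exclude
# only yields a consistent answer against an authoritative source.
```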
If the >>>> forwarder will always send each interest to all replicas, then yes, >>>> modulo packet loss, you would discover v101 on cache A. If the >>>> forwarder is just doing "best path" and can round-robin between cache >>>> A and cache B, then your application could miss v101. >>>> >>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number >>>> of components. In my own thread-safe LPM implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>> every node will be faster or not because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs >>>> (ndn >>>> lpm). Comparing the performance of exact match to lpm is unfair. >>>> >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>> specs for doing the exact match discovery. So, as I said, I'm not >>>> ready to claim it's better yet because we have not done that. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest >>>> matching to do discovery has its own set of problems. Discovery >>>> involves more than just "latest version" discovery too. >>>> >>>> This is probably getting off-topic from the original post about >>>> naming conventions. >>>> >>>> a. If Interests can be forwarded multiple directions and two >>>> different caches are responding, the exclusion set you build up >>>> talking with cache A will be invalid for cache B. If you talk >>>> sometimes to A and sometimes to B, you very easily could miss >>>> content objects you want to discover unless you avoid all range >>>> exclusions and only exclude explicit versions. That will lead to >>>> very large interest packets. In ccnx 1.0, we believe that an >>>> explicit discovery protocol that allows conversations about >>>> consistent sets is better. >>>> >>>> b. Yes, if you just want the "latest version" 
discovery that >>>> should be transitive between caches, but imagine this. You send >>>> Interest #1 to cache A which returns version 100. You exclude >>>> through 100 then issue a new interest. This goes to cache B who >>>> only has version 99, so the interest times out or is NACK'd. So >>>> you think you have it! But cache A already has version 101; you >>>> just don't know. If you cannot have a conversation around >>>> consistent sets, it seems like even doing latest version discovery >>>> is difficult with selector-based discovery. From what I saw in >>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>> authoritative source because you can never believe an intermediate >>>> cache that there's not something more recent. >>>> >>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be >>>> interested in seeing your analysis. Case (a) is that a node can >>>> correctly discover every version of a name prefix, and (b) is that >>>> a node can correctly discover the latest version. We have not >>>> formally compared (or yet published) our discovery protocols (we >>>> have three: 2 for content, 1 for device) to selector-based >>>> discovery, so I cannot yet claim they are better, but they do not >>>> have the non-determinism sketched above. >>>> >>>> c. Using LPM, there is a non-deterministic number of lookups you >>>> must do in the PIT to match a content object. If you have a name >>>> tree or a threaded hash table, those don't all need to be hash >>>> lookups, but you need to walk up the name tree for every prefix of >>>> the content object name and evaluate the selector predicate. >>>> Content Based Networking (CBN) had some methods to create data >>>> structures based on predicates; maybe those would be better. But >>>> in any case, you will potentially need to retrieve many PIT entries >>>> if there is Interest traffic for many prefixes of a root. 
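Point (c) above can be made concrete with a small model. The sketch below is an illustrative PIT keyed by name prefix, not any real forwarder's data structure; it shows why the number of lookups grows with the number of name components, each lookup followed by a selector-predicate evaluation.

```python
# Model of LPM-style PIT matching: a content object can satisfy pending
# Interests at *any* prefix of its name, so every prefix is a lookup.
def prefixes(name):
    """Yield every prefix of a component list, longest first."""
    for i in range(len(name), 0, -1):
        yield tuple(name[:i])

def match_pit(pit, data_name, data):
    """pit: dict mapping name-prefix tuple -> list of selector predicates."""
    satisfied = []
    for p in prefixes(data_name):          # one lookup per prefix ...
        for predicate in pit.get(p, []):
            if predicate(data):            # ... plus a predicate evaluation
                satisfied.append(p)
    return satisfied
```

With exact-match names, as in the CCNx 1.0 position described here, matching instead degenerates to at most three fixed lookups (name; name + KeyId; name + object hash), independent of how many components the name has.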
Even on >>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>> lot of NUMA accesses for each one. In CCNx 1.0, even a naive >>>> implementation only requires at most 3 lookups (one by name, one by >>>> name + keyid, one by name + content object hash), and one can do >>>> other things to optimize lookup for an extra write. >>>> >>>> d. In (c) above, if you have a threaded name tree or are just >>>> walking parent pointers, I suspect you'll need locking of the >>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>> and that will be expensive. It would be interesting to see what a >>>> cache-consistent multi-threaded name tree looks like. >>>> >>>> Marc >>>> >>>> >>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I had thought about these questions, but I want to know your idea >>>> besides typed components: >>>> 1. LPM allows "data discovery". How will exact match do similar >>>> things? >>>> 2. Will removing selectors improve performance? How do we use >>>> other, >>>> faster techniques to replace selectors? >>>> 3. Fixed byte length and type. I agree more that type can be a fixed >>>> byte, but 2 bytes for length might not be enough for the future. >>>> >>>> >>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I know how to make #2 flexible enough to do the things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> >>>> Could you share it with us? >>>> >>>> Sure. Here's a strawman. >>>> >>>> The type space is 16 bits, so you have 65,536 types. >>>> >>>> The type space is currently shared with the types used for the >>>> entire protocol, which gives us two options: >>>> (1) we reserve a range for name component types. 
Given the >>>> likelihood there will be at least as much and probably more need >>>> for component types than protocol extensions, we could reserve 1/2 >>>> of the type space, giving us 32K types for name components. >>>> (2) since there is no parsing ambiguity between name components >>>> and other fields of the protocol (since they are sub-types of the >>>> name type) we could reuse numbers and thereby have an entire 65K >>>> name component types. >>>> >>>> We divide the type space into regions, and manage it with a >>>> registry. If we ever get to the point of creating an IETF >>>> standard, IANA has 25 years of experience running registries and >>>> there are well-understood rule sets for different kinds of >>>> registries (open, requires a written spec, requires standards >>>> approval). >>>> >>>> - We allocate one "default" name component type for "generic >>>> name", which would be used on name prefixes and other common >>>> cases where there are no special semantics on the name component. >>>> - We allocate a range of name component types, say 1024, to >>>> globally understood types that are part of the base or extension >>>> NDN specifications (e.g. chunk#, version#, etc.) >>>> - We reserve some portion of the space for unanticipated uses >>>> (say another 1024 types) >>>> - We give the rest of the space to application assignment. >>>> >>>> Make sense? >>>> >>>> >>>> While I'm sympathetic to that view, there are three ways in >>>> which Moore's law or hardware tricks will not save us from >>>> performance flaws in the design >>>> >>>> >>>> we could design for performance, >>>> >>>> That's not what people are advocating. We are advocating that we >>>> *not* design for known bad performance and hope serendipity or >>>> Moore's Law will come to the rescue. >>>> >>>> but I think there will be a turning 
Relative performance is what matters so >>>> things that don?t get faster while others do tend to get dropped >>>> or not used because they impose a performance penalty relative to >>>> the things that go faster. There is also the ?low-end? phenomenon >>>> where impovements in technology get applied to lowering cost >>>> rather than improving performance. For those environments bad >>>> performance just never get better. >>>> >>>> Do you >>>> think there will be some design of ndn that will *never* have >>>> performance improvement? >>>> >>>> I suspect LPM on data will always be slow (relative to the other >>>> functions). >>>> i suspect exclusions will always be slow because they will >>>> require extra memory references. >>>> >>>> However I of course don?t claim to clairvoyance so this is just >>>> speculation based on 35+ years of seeing performance improve by 4 >>>> orders of magnitude and still having to worry about counting >>>> cycles and memory references? >>>> >>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> We should not look at a certain chip nowadays and want ndn to >>>> perform >>>> well on it. It should be the other way around: once ndn app >>>> becomes >>>> popular, a better chip will be designed for ndn. >>>> >>>> While I?m sympathetic to that view, there are three ways in >>>> which Moore?s law or hardware tricks will not save us from >>>> performance flaws in the design: >>>> a) clock rates are not getting (much) faster >>>> b) memory accesses are getting (relatively) more expensive >>>> c) data structures that require locks to manipulate >>>> successfully will be relatively more expensive, even with >>>> near-zero lock contention. >>>> >>>> The fact is, IP *did* have some serious performance flaws in >>>> its design. We just forgot those because the design elements >>>> that depended on those mistakes have fallen into disuse. 
The >>>> poster children for this are: >>>> 1. IP options. Nobody can use them because they are too slow >>>> on modern forwarding hardware, so they can't be reliably used >>>> anywhere >>>> 2. the UDP checksum, which was a bad design when it was >>>> specified and is now a giant PITA that still causes major pain >>>> in working around. >>>> >>>> I'm afraid students today are being taught that the designers >>>> of IP were flawless, as opposed to very good scientists and >>>> engineers that got most of it right. >>>> >>>> I feel the discussion today and yesterday has been off-topic. >>>> Now I >>>> see that there are 3 approaches: >>>> 1. we should not define a naming convention at all >>>> 2. typed component: use the tlv type space and add a handful of >>>> types >>>> 3. marked component: introduce only one more type and add >>>> additional >>>> marker space >>>> >>>> I know how to make #2 flexible enough to do the things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> It is just as powerful in practice as either throwing up our >>>> hands and letting applications design their own mutually >>>> incompatible schemes or trying to make naming conventions with >>>> markers in a way that is fast to generate/parse and also >>>> resilient against aliasing. >>>> >>>> Also, everybody thinks that the current utf8 marker naming >>>> convention >>>> needs to be revised. >>>> >>>> >>>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>> wrote: >>>> Would that chip be suitable, i.e. can we expect most names >>>> to fit in (the >>>> magnitude of) 96 bytes? What length are names usually in >>>> current NDN >>>> experiments? >>>> >>>> I guess wide deployment could make for even longer names. >>>> Related: Many URLs >>>> I encounter nowadays easily don't fit within two 80-column >>>> text lines, and >>>> NDN will have to carry more information than URLs, as far as >>>> I see. 
>>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in a separate TLV will be slower on some >>>> architectures, >>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>> bytes in memory, >>>> then any subsequent memory is accessed only as two adjacent >>>> 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). >>>> If you need to >>>> switch between arrays, it would be very expensive. If you >>>> have to read past >>>> the name to get to the 2nd array, then read it, then back up >>>> to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components, one way to do >>>> it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes >>>> from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading the name, (find size + move) 5 times. >>>> >>>> How much speed are you getting from one to the other? You >>>> seem to imply >>>> that the first one is faster. I don't think this is the >>>> case. >>>> >>>> In the first one you'll probably have to get the cache line >>>> for the index, >>>> then all the required cache lines for the first 5 >>>> components. For the >>>> second, you'll have to get all the cache lines for the first >>>> 5 components. >>>> Given an assumption that a cache miss is way more expensive >>>> than >>>> evaluating a number and computing an addition, you might >>>> find that the >>>> performance of the index is actually slower than the >>>> performance of the >>>> direct access. >>>> >>>> Granted, there is a case where you don't access the name at >>>> all, for >>>> example, if you just get the offsets and then send the >>>> offsets as >>>> parameters to another processor/GPU/NPU/etc. 
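Nacho's two parsing strategies (read an offset index, or walk the TLVs "(find size + move) 5 times") can be sketched side by side. The encodings below are simplified stand-ins, a 1-byte type and 1-byte length per component and 1-byte offsets, not the actual wire formats, so this only illustrates the access pattern being debated:

```python
# Strategy 1: an offset table records where each component ends, so the
# first x components are one indexed read plus one slice.
def first_x_via_offsets(offset_table, name_bytes, x):
    end = offset_table[x - 1]
    return name_bytes[:end]

# Strategy 2: nested TLV; walk the (type, length) headers of
# components 1..x, advancing past each value.
def first_x_via_tlv(name_bytes, x):
    pos = 0
    for _ in range(x):
        length = name_bytes[pos + 1]   # 1-byte type at pos, then 1-byte length
        pos += 2 + length
    return name_bytes[:pos]
```

Which one wins depends on cache behavior, as the thread argues, and Tai-Lin's later objection applies too: if the offsets themselves were variable-length (varNum) encoded, strategy 1 would degrade into a sequential walk as well.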
In this case >>>> you may see a >>>> gain IF there are more cache line misses in reading the name >>>> than in >>>> reading the index. So, if the regular part of the name >>>> that you're >>>> parsing is bigger than the cache line (64 bytes?) and the >>>> name is to be >>>> processed by a different processor, then you might see some >>>> performance >>>> gain in using the index, but in all other circumstances I >>>> bet this is not >>>> the case. I may be wrong; I haven't actually tested it. >>>> >>>> This is all to say, I don't think we should be designing the >>>> protocol with >>>> only one architecture in mind. (The architecture of sending >>>> the name to a >>>> different processor than the index.) >>>> >>>> If you have numbers that show that the index is faster I >>>> would like to see >>>> under what conditions and architectural assumptions. >>>> >>>> Nacho >>>> >>>> (I may have misinterpreted your description, so feel free to >>>> correct me if >>>> I'm wrong.) >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>> >>>> wrote: >>>> >>>> Indeed, each component's offset must be encoded using a fixed >>>> amount of >>>> bytes: >>>> >>>> i.e., >>>> Type = Offsets >>>> Length = 10 Bytes >>>> Value = Offset1(1byte), Offset2(1byte), ... >>>> >>>> You may also imagine having an "Offset_2byte" type if your >>>> name is too >>>> long. >>>> >>>> Max >>>> >>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> >>>> if you do not need the entire hierarchical structure (suppose >>>> you only >>>> want the first x components) you can directly have it using >>>> the >>>> offsets. With the Nested TLV structure you have to >>>> iteratively parse >>>> the first x-1 components. With the offset structure you can >>>> directly >>>> access the first x components. >>>> >>>> I don't get it. 
What you described only works if the >>>> "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to >>>> parse x-1 >>>> offsets to get to the xth offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>> >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> >>>> ah, thanks - that's helpful. I thought you were saying "I >>>> like the >>>> existing NDN UTF8 'convention'." I'm still not sure I >>>> understand what >>>> you >>>> _do_ prefer, though. it sounds like you're describing an >>>> entirely >>>> different >>>> scheme where the info that describes the name-components is >>>> ... >>>> someplace >>>> other than _in_ the name-components. is that correct? when >>>> you say >>>> "field >>>> separator", what do you mean (since that's not a "TL" from a >>>> TLV)? >>>> >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the >>>> name >>>> hierarchy >>>> with offsets in the name, and other TLV(s) indicate the >>>> offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/". >>>> Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you >>>> use the >>>> "offset >>>> TLV" to do that. >>>> >>>> So now, it may be an aesthetic question, but: >>>> >>>> if you do not need the entire hierarchical structure (suppose >>>> you only >>>> want >>>> the first x components) you can directly have it using the >>>> offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first >>>> x-1 >>>> components. >>>> With the offset structure you can directly access the >>>> first x >>>> components. >>>> >>>> Max >>>> >>>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few >>>> "specific >>>> component type". 
You are imposing types for every component >>>> in order >>>> to >>>> handle a few exceptions (segmentation, etc.). You create a >>>> rule >>>> (specify >>>> the component's type) to handle exceptions! >>>> >>>> I would prefer not to have typed components. Instead I would >>>> prefer >>>> to >>>> have the name as a simple sequence of bytes with a field >>>> separator. Then, >>>> outside the name, if you have some components that could be >>>> used at the >>>> network layer (e.g. a TLV field), you simply need something >>>> that >>>> indicates the offset allowing you to retrieve the >>>> version, >>>> segment, etc. in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end >>>> up with >>>> names >>>> containing many generic component types and few specific >>>> component >>>> types. Due to the fact that the component type specification >>>> is an >>>> exception in the name, I would prefer something that specifies >>>> the component's >>>> type only when needed (something like the UTF8 conventions, but >>>> ones that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, >>>> e.g.) >>>> and >>>> there's been email trying to explain that applications don't >>>> have to >>>> use types if they don't need to. your email sounds like "I >>>> prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that >>>> preference in >>>> the face of the points about the problems. can you say why >>>> it is >>>> that >>>> you express a preference for the "convention" with problems? >>>> >>>> Thanks, >>>> Mark 
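The aliasing problem with marker conventions that this exchange keeps circling can be shown in a few lines. The 0xFD version marker is the one used by the NDN naming conventions; the helper function and the sample components are illustrative assumptions.

```python
# Sketch of marker aliasing: under the marker convention, a version
# component is an ordinary byte string whose first byte is the marker
# (0xFD). Opaque application data that happens to start with the same
# byte is indistinguishable from a real version component.
VERSION_MARKER = 0xFD

def looks_like_version(component: bytes) -> bool:
    return len(component) > 0 and component[0] == VERSION_MARKER

real_version = bytes([VERSION_MARKER]) + (1024).to_bytes(2, "big")  # version 1024
opaque_payload = bytes([0xFD, 0x04, 0x00])  # app data, not a version

print(real_version == opaque_payload)        # True -- byte-identical
print(looks_like_version(opaque_payload))    # True -- a false positive
```

Typed components avoid this ambiguity by carrying the semantics in the TLV type field rather than inside the value bytes, which is the substance of Dave's argument for approach #2.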
_______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest 
------------------------------ End of Ndn-interest Digest, Vol 7, Issue 60 ******************************************* From nano at remap.ucla.edu Wed Sep 24 15:28:15 2014 From: nano at remap.ucla.edu (Alex Horn) Date: Wed, 24 Sep 2014 15:28:15 -0700 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: References: <54208AA4.9060709@rabe.io> <542114C8.7020802@rabe.io> <142C6142-4DD2-4BF1-A932-5030D8815B72@cs.arizona.edu> Message-ID: Indeed - we had NDN video testbed distribution for a few years! ndn-video (source, tech report) was very useful in testing the early testbed. Unfortunately it is out of date, as it: a) uses pyccn/ccnx - needs to be updated to pyndn2/NFD b) uses gst 0.10 - needs to be updated to gst 1.x We don't internally have the resources for that at the moment (our recent video work has been in ndn-RTC), but if someone wanted to take it on, it is a fairly straightforward effort. Meanwhile, we will get the conference video online in some form as soon as possible. Thanks for your interest! Alex On Wed, Sep 24, 2014 at 1:54 PM, Xiaoke Jiang wrote: > On Wednesday, 24 September, 2014 at 1:46 pm, Beichuan Zhang wrote: > > Maybe we can stream it over the NDN testbed :-) There're 3 nodes in China. > > Good solution. Some other guys in China have also complained about the slow > connection. Also, NDN could show its advantage in data delivery with > this solution. > > Xiaoke (Shock) 
From gq.wang at huawei.com Wed Sep 24 15:32:16 2014 From: gq.wang at huawei.com (GQ Wang) Date: Wed, 24 Sep 2014 22:32:16 +0000 Subject: [Ndn-interest] any comments on naming convention? References: Message-ID: <6759D6D370C7024D9BB22CD5F92D2D331856593B@SJCEML701-CHM.china.huawei.com> It seems to me that the arguments are caused by the functions we require "a name" to perform: 1. Ask it for routing/forwarding 2. Ask it for data retrieval services (someone mentioned email service) If we separate the CS from the CCN/NDN data forwarding plane and treat it just as a plain service (I asked this question at NDNComm 2014), then "Selector" becomes an attribute purely for service-layer functions, which is handled by apps, not by the networking. Would this separation make the naming convention (and its scope) clearer? G.Q -----Original Message----- From: Ndn-interest [mailto:ndn-interest-bounces at lists.cs.ucla.edu] On Behalf Of ndn-interest-request at lists.cs.ucla.edu Sent: Wednesday, September 24, 2014 9:26 AM To: ndn-interest at lists.cs.ucla.edu Subject: Ndn-interest Digest, Vol 7, Issue 60 Today's Topics: 1. Re: any comments on naming convention? (Marc.Mosko at parc.com) ---------------------------------------------------------------------- Message: 1 Date: Wed, 24 Sep 2014 16:25:53 +0000 From: To: Cc: Ignacio.Solis at parc.com, ndn-interest at lists.cs.ucla.edu Subject: Re: [Ndn-interest] any comments on naming convention? 
Message-ID: Content-Type: text/plain; charset="euc-kr" I think Tai-Lin's example was just fine to talk about discovery. /blah/blah/value, how do you discover all the "value"s? Discovery shouldn't care if it's email messages or temperature readings or world cup photos. I described one set of problems using the exclusion approach, and that an NDN paper on device discovery described a similar problem, though they did not go into the details of splitting interests, etc. That all was simple enough to see from the example. Another question is how one does the discovery with exact match names, which is also conflating things. You could do a different discovery with continuation names too, just not the exclude method. As I alluded to, one needs a way to talk with a specific cache about its "table of contents" for a prefix so one can get a consistent set of results without all the round-trips of exclusions. Actually downloading the "headers" of the messages would be the same bytes, more or less. In a way, this is a little like name enumeration from a ccnx 0.x repo, but that protocol has its own set of problems and I'm not suggesting we use it directly. One approach is to encode a request in a name component and a participating cache can reply. It replies in such a way that one could continue talking with that cache to get its TOC. One would then issue another interest with a request for not-that-cache. Another approach is to ask the authoritative source for the "current" manifest name, i.e. /mail/inbox/current/, which could return the manifest or a link to the manifest. Then fetching the actual manifest from the link could come from caches, because you now have a consistent set of names to ask for. If you cannot talk with an authoritative source, you could try again without the nonce and see if there's a cached copy of a recent version around. 
Marc On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: > > > On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" > wrote: > >> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >> >>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>> pattern with static (/mail/inbox) and variable (148) components; with >>> proper naming convention, computers can also detect this pattern >>> easily. Now I want to look for all mails in my inbox. I can generate a >>> list of /mail/inbox/. These are my guesses, and with selectors >>> I can further refine my guesses. >> >> I think this is a very bad example (or at least a very bad application >> design). You have an app (a mail server / inbox) and you want it to list >> your emails? An email list is an application data structure. I don?t >> think you should use the network structure to reflect this. > > I think Tai-Lin is trying to sketch a small example, not propose a > full-scale approach to email. (Maybe I am misunderstanding.) > > > Another way to look at it is that if the network architecture is providing > the equivalent of distributed storage to the application, perhaps the > application data structure could be adapted to match the affordances of > the network. Then it would not be so bad that the two structures were > aligned. > >> >> I?ll give you an example, how do you delete emails from your inbox? If an >> email was cached in the network it can never be deleted from your inbox? > > This is conflating two issues - what you are pointing out is that the data > structure of a linear list doesn't handle common email management > operations well. Again, I'm not sure if that's what he was getting at > here. But deletion is not the issue - the availability of a data object > on the network does not necessarily mean it's valid from the perspective > of the application. > >> Or moved to another mailbox? Do you rely on the emails expiring? >> >> This problem is true for most (any?) 
situations where you use network name >> structure to directly reflect the application data structure. > > Not sure I understand how you make the leap from the example to the > general statement. > > Jeff > > > >> >> Nacho >> >> >> >>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>> Ok, yes I think those would all be good things. >>>> >>>> One thing to keep in mind, especially with things like time series >>>> sensor >>>> data, is that people see a pattern and infer a way of doing it. That's >>>> easy >>>> for a human :) But in Discovery, one should assume that one does not >>>> know >>>> of patterns in the data beyond what the protocols used to publish the >>>> data >>>> explicitly require. That said, I think some of the things you listed >>>> are >>>> good places to start: sensor data, web content, climate data or genome >>>> data. >>>> >>>> We also need to state what the forwarding strategies are and what the >>>> cache >>>> behavior is. >>>> >>>> I outlined some of the points that I think are important in that other >>>> posting. While "discover latest" is useful, "discover all" is also >>>> important, and that one gets complicated fast. So points like >>>> separating >>>> discovery from retrieval and working with large data sets have been >>>> important in shaping our thinking. That all said, I'd be happy >>>> starting >>>> from 0 and working through the Discovery service definition from >>>> scratch >>>> along with data set use cases. >>>> >>>> Marc >>>> >>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>> wrote: >>>> >>>> Hi Marc, >>>> >>>> Thanks - yes, I saw that as well. I was just trying to get one step >>>> more >>>> specific, which was to see if we could identify a few specific use >>>> cases >>>> around which to have the conversation. (e.g., time series sensor data >>>> and >>>> web content retrieval for "get latest"; climate data for huge data >>>> sets; >>>> local data in a vehicular network; etc.) 
What have you been looking at >>>> that's driving considerations of discovery? >>>> >>>> Thanks, >>>> Jeff >>>> >>>> From: >>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>> To: Jeff Burke >>>> Cc: , >>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>> >>>> Jeff, >>>> >>>> Take a look at my posting (that Felix fixed) in a new thread on >>>> Discovery. >>>> >>>> >>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/00020 >>>> 0 >>>> .html >>>> >>>> I think it would be very productive to talk about what Discovery should >>>> do, >>>> and not focus on the how. It is sometimes easy to get caught up in the >>>> how, >>>> which I think is a less important topic than the what at this stage. >>>> >>>> Marc >>>> >>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>> wrote: >>>> >>>> Marc, >>>> >>>> If you can't talk about your protocols, perhaps we can discuss this >>>> based >>>> on use cases. What are the use cases you are using to evaluate >>>> discovery? >>>> >>>> Jeff >>>> >>>> >>>> >>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>> wrote: >>>> >>>> No matter what the expressiveness of the predicates if the forwarder >>>> can >>>> send interests different ways you don't have a consistent underlying >>>> set >>>> to talk about so you would always need non-range exclusions to discover >>>> every version. >>>> >>>> Range exclusions only work I believe if you get an authoritative >>>> answer. >>>> If different content pieces are scattered between different caches I >>>> don't see how range exclusions would work to discover every version. >>>> >>>> I'm sorry to be pointing out problems without offering solutions but >>>> we're not ready to publish our discovery protocols. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>> >>>> I see. Can you briefly describe how ccnx discovery protocol solves the >>>> all problems that you mentioned (not just exclude)? a doc will be >>>> better. 
>>>> >>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>> language or context free language might become part of selector too. >>>> >>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>> That will get you one reading then you need to exclude it and ask >>>> again. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes not range excludes if you want to discover all the versions >>>> of an object. >>>> >>>> >>>> I am very confused. For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. >>>> >>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>> If you talk sometimes to A and sometimes to B, you very easily >>>> could miss content objects you want to discovery unless you avoid >>>> all range exclusions and only exclude explicit versions. >>>> >>>> >>>> Could you explain why missing content object situation happens? also >>>> range exclusion is just a shorter notation for many explicit >>>> exclude; >>>> converting from explicit excludes to ranged exclude is always >>>> possible. >>>> >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes not range excludes if you want to discover all the versions >>>> of an object. For something like a sensor reading that is updated, >>>> say, once per second you will have 86,400 of them per day. 
If each >>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>> exclusions (plus encoding overhead) per day. >>>> >>>> yes, maybe using a more deterministic version number than a >>>> timestamp makes sense here, but it's just an example of needing a lot >>>> of exclusions. >>>> >>>> >>>> You exclude through 100 then issue a new interest. This goes to >>>> cache B >>>> >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, >>>> if >>>> this goes to cache B only, it means that cache A dies. How do you >>>> know >>>> that v101 even exists? >>>> >>>> >>>> I guess this depends on what the forwarding strategy is. If the >>>> forwarder will always send each interest to all replicas, then yes, >>>> modulo packet loss, you would discover v101 on cache A. If the >>>> forwarder is just doing "best path" and can round-robin between cache >>>> A and cache B, then your application could miss v101. >>>> >>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number >>>> of components. In my own thread-safe LPM implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>> every node will be faster or not because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs >>>> (ndn >>>> lpm). Comparing performance of exact match to lpm is unfair. >>>> >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>> specs for doing the exact match discovery. So, as I said, I'm not >>>> ready to claim it's better yet because we have not done that. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest >>>> matching to do discovery has its own set of problems. Discovery >>>> involves more than just "latest version" discovery too. 
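[Editor's illustration] The per-day exclusion overhead quoted above is easy to verify arithmetically:

```python
# Back-of-the-envelope check of the exclusion overhead discussed above:
# one sensor reading per second, each later excluded by an 8-byte timestamp.
readings_per_day = 24 * 60 * 60          # seconds in a day
timestamp_bytes = 8                      # size of one explicit exclusion
exclusion_bytes = readings_per_day * timestamp_bytes

print(readings_per_day)   # 86400 readings per day
print(exclusion_bytes)    # 691200 bytes of exclusions, before TLV overhead
```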
>>>> >>>> This is probably getting off-topic from the original post about >>>> naming conventions. >>>> >>>> a. If Interests can be forwarded multiple directions and two >>>> different caches are responding, the exclusion set you build up >>>> talking with cache A will be invalid for cache B. If you talk >>>> sometimes to A and sometimes to B, you very easily could miss >>>> content objects you want to discover unless you avoid all range >>>> exclusions and only exclude explicit versions. That will lead to >>>> very large interest packets. In ccnx 1.0, we believe that an >>>> explicit discovery protocol that allows conversations about >>>> consistent sets is better. >>>> >>>> b. Yes, if you just want the "latest version" discovery that >>>> should be transitive between caches, but imagine this. You send >>>> Interest #1 to cache A which returns version 100. You exclude >>>> through 100 then issue a new interest. This goes to cache B who >>>> only has version 99, so the interest times out or is NACK'd. So >>>> you think you have it! But, cache A already has version 101, you >>>> just don't know. If you cannot have a conversation around >>>> consistent sets, it seems like even doing latest version discovery >>>> is difficult with selector based discovery. From what I saw in >>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>> authoritative source because you can never believe an intermediate >>>> cache that there's not something more recent. >>>> >>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>> interested in seeing your analysis. Case (a) is that a node can >>>> correctly discover every version of a name prefix, and (b) is that >>>> a node can correctly discover the latest version. 
We have not >>>> formally compared (or yet published) our discovery protocols (we >>>> have three, 2 for content, 1 for device) compared to selector based >>>> discovery, so I cannot yet claim they are better, but they do not >>>> have the non-determinism sketched above. >>>> >>>> c. Using LPM, there is a non-deterministic number of lookups you >>>> must do in the PIT to match a content object. If you have a name >>>> tree or a threaded hash table, those don?t all need to be hash >>>> lookups, but you need to walk up the name tree for every prefix of >>>> the content object name and evaluate the selector predicate. >>>> Content Based Networking (CBN) had some some methods to create data >>>> structures based on predicates, maybe those would be better. But >>>> in any case, you will potentially need to retrieve many PIT entries >>>> if there is Interest traffic for many prefixes of a root. Even on >>>> an Intel system, you?ll likely miss cache lines, so you?ll have a >>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>> implementation only requires at most 3 lookups (one by name, one by >>>> name + keyid, one by name + content object hash), and one can do >>>> other things to optimize lookup for an extra write. >>>> >>>> d. In (c) above, if you have a threaded name tree or are just >>>> walking parent pointers, I suspect you?ll need locking of the >>>> ancestors in a multi-threaded system (?threaded" here meaning LWP) >>>> and that will be expensive. It would be interesting to see what a >>>> cache consistent multi-threaded name tree looks like. >>>> >>>> Marc >>>> >>>> >>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I had thought about these questions, but I want to know your idea >>>> besides typed component: >>>> 1. LPM allows "data discovery". How will exact match do similar >>>> things? >>>> 2. will removing selectors improve performance? How do we use >>>> other >>>> faster technique to replace selector? >>>> 3. 
fixed byte length and type. I agree more that type can be fixed >>>> byte, but 2 bytes for length might not be enough for the future. >>>> >>>> >>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I know how to make #2 flexible enough to do what things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> >>>> Could you share it with us? >>>> >>>> Sure. Here's a strawman. >>>> >>>> The type space is 16 bits, so you have 65,536 types. >>>> >>>> The type space is currently shared with the types used for the >>>> entire protocol, that gives us two options: >>>> (1) we reserve a range for name component types. Given the >>>> likelihood there will be at least as much and probably more need >>>> for component types than protocol extensions, we could reserve 1/2 >>>> of the type space, giving us 32K types for name components. >>>> (2) since there is no parsing ambiguity between name components >>>> and other fields of the protocol (since they are sub-types of the >>>> name type) we could reuse numbers and thereby have an entire 65K >>>> name component types. >>>> >>>> We divide the type space into regions, and manage it with a >>>> registry. If we ever get to the point of creating an IETF >>>> standard, IANA has 25 years of experience running registries and >>>> there are well-understood rule sets for different kinds of >>>> registries (open, requires a written spec, requires standards >>>> approval). >>>> >>>> - We allocate one "default" name component type for "generic >>>> name", which would be used on name prefixes and other common >>>> cases where there are no special semantics on the name component. >>>> - We allocate a range of name component types, say 1024, to >>>> globally understood types that are part of the base or extension >>>> NDN specifications (e.g. chunk#, version#, etc.) 
>>>> - We reserve some portion of the space for unanticipated uses >>>> (say another 1024 types) >>>> - We give the rest of the space to application assignment. >>>> >>>> Make sense? >>>> >>>> >>>> While I'm sympathetic to that view, there are three ways in >>>> which Moore's law or hardware tricks will not save us from >>>> performance flaws in the design >>>> >>>> >>>> we could design for performance, >>>> >>>> That's not what people are advocating. We are advocating that we >>>> *not* design for known bad performance and hope serendipity or >>>> Moore's Law will come to the rescue. >>>> >>>> but I think there will be a turning >>>> point when the slower design starts to become "fast enough". >>>> >>>> Perhaps, perhaps not. Relative performance is what matters so >>>> things that don't get faster while others do tend to get dropped >>>> or not used because they impose a performance penalty relative to >>>> the things that go faster. There is also the "low-end" phenomenon >>>> where improvements in technology get applied to lowering cost >>>> rather than improving performance. For those environments bad >>>> performance just never gets better. >>>> >>>> Do you >>>> think there will be some design of ndn that will *never* have >>>> performance improvement? >>>> >>>> I suspect LPM on data will always be slow (relative to the other >>>> functions). >>>> I suspect exclusions will always be slow because they will >>>> require extra memory references. >>>> >>>> However I of course don't claim clairvoyance so this is just >>>> speculation based on 35+ years of seeing performance improve by 4 >>>> orders of magnitude and still having to worry about counting >>>> cycles and memory references... >>>> >>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> We should not look at a certain chip nowadays and want ndn to >>>> perform 
It should be the other way around: once ndn app >>>> becomes >>>> popular, a better chip will be designed for ndn. >>>> >>>> While I?m sympathetic to that view, there are three ways in >>>> which Moore?s law or hardware tricks will not save us from >>>> performance flaws in the design: >>>> a) clock rates are not getting (much) faster >>>> b) memory accesses are getting (relatively) more expensive >>>> c) data structures that require locks to manipulate >>>> successfully will be relatively more expensive, even with >>>> near-zero lock contention. >>>> >>>> The fact is, IP *did* have some serious performance flaws in >>>> its design. We just forgot those because the design elements >>>> that depended on those mistakes have fallen into disuse. The >>>> poster children for this are: >>>> 1. IP options. Nobody can use them because they are too slow >>>> on modern forwarding hardware, so they can?t be reliably used >>>> anywhere >>>> 2. the UDP checksum, which was a bad design when it was >>>> specified and is now a giant PITA that still causes major pain >>>> in working around. >>>> >>>> I?m afraid students today are being taught the that designers >>>> of IP were flawless, as opposed to very good scientists and >>>> engineers that got most of it right. >>>> >>>> I feel the discussion today and yesterday has been off-topic. >>>> Now I >>>> see that there are 3 approaches: >>>> 1. we should not define a naming convention at all >>>> 2. typed component: use tlv type space and add a handful of >>>> types >>>> 3. marked component: introduce only one more type and add >>>> additional >>>> marker space >>>> >>>> I know how to make #2 flexible enough to do what things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. 
>>>> >>>> It is just as powerful in practice as either throwing up our >>>> hands and letting applications design their own mutually >>>> incompatible schemes or trying to make naming conventions with >>>> markers in a way that is fast to generate/parse and also >>>> resilient against aliasing. >>>> >>>> Also everybody thinks that the current utf8 marker naming >>>> convention >>>> needs to be revised. >>>> >>>> >>>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>> wrote: >>>> Would that chip be suitable, i.e. can we expect most names >>>> to fit in (the >>>> magnitude of) 96 bytes? What length are names usually in >>>> current NDN >>>> experiments? >>>> >>>> I guess wide deployment could make for even longer names. >>>> Related: Many URLs >>>> I encounter nowadays easily don't fit within two 80-column >>>> text lines, and >>>> NDN will have to carry more information than URLs, as far as >>>> I see. >>>> >>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in separate TLV will be slower on some >>>> architectures, >>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>> bytes in memory, >>>> then any subsequent memory is accessed only as two adjacent >>>> 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). >>>> If you need to >>>> switch between arrays, it would be very expensive. If you >>>> have to read past >>>> the name to get to the 2nd array, then read it, then backup >>>> to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components, one way to do >>>> it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes >>>> from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading name, (find size + move ) 5 times. >>>> >>>> How much speed are you getting from one to the other? 
You >>>> seem to imply >>>> that the first one is faster. I don't think this is the >>>> case. >>>> >>>> In the first one you'll probably have to get the cache line >>>> for the index, >>>> then all the required cache lines for the first 5 >>>> components. For the >>>> second, you'll have to get all the cache lines for the first >>>> 5 components. >>>> Given an assumption that a cache miss is way more expensive >>>> than >>>> evaluating a number and computing an addition, you might >>>> find that the >>>> performance of the index is actually slower than the >>>> performance of the >>>> direct access. >>>> >>>> Granted, there is a case where you don't access the name at >>>> all, for >>>> example, if you just get the offsets and then send the >>>> offsets as >>>> parameters to another processor/GPU/NPU/etc. In this case >>>> you may see a >>>> gain IF there are more cache line misses in reading the name >>>> than in >>>> reading the index. So, if the regular part of the name >>>> that you're >>>> parsing is bigger than the cache line (64 bytes?) and the >>>> name is to be >>>> processed by a different processor, then you might see some >>>> performance >>>> gain in using the index, but in all other circumstances I >>>> bet this is not >>>> the case. I may be wrong, haven't actually tested it. >>>> >>>> This is all to say, I don't think we should be designing the >>>> protocol with >>>> only one architecture in mind. (The architecture of sending >>>> the name to a >>>> different processor than the index). >>>> >>>> If you have numbers that show that the index is faster I >>>> would like to see >>>> under what conditions and architectural assumptions. >>>> >>>> Nacho >>>> >>>> (I may have misinterpreted your description so feel free to >>>> correct me if >>>> I'm wrong.) 
>>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>> >>>> wrote: >>>> >>>> Indeed each component's offset must be encoded using a fixed >>>> amount of >>>> bytes: >>>> >>>> i.e., >>>> Type = Offsets >>>> Length = 10 Bytes >>>> Value = Offset1(1byte), Offset2(1byte), ... >>>> >>>> You may also imagine to have a "Offset_2byte" type if your >>>> name is too >>>> long. >>>> >>>> Max >>>> >>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> >>>> if you do not need the entire hierarchical structure (suppose >>>> you only >>>> want the first x components) you can directly have it using >>>> the >>>> offsets. With the Nested TLV structure you have to >>>> iteratively parse >>>> the first x-1 components. With the offset structure you can >>>> directly >>>> access the first x components. >>>> >>>> I don't get it. What you described only works if the >>>> "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to >>>> parse x-1 >>>> offsets to get to the x offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>> >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> >>>> ah, thanks - that's helpful. I thought you were saying "I >>>> like the >>>> existing NDN UTF8 'convention'." I'm still not sure I >>>> understand what >>>> you >>>> _do_ prefer, though. it sounds like you're describing an >>>> entirely >>>> different >>>> scheme where the info that describes the name-components is >>>> ... >>>> someplace >>>> other than _in_ the name-components. is that correct? when >>>> you say >>>> "field >>>> separator", what do you mean (since that's not a "TL" from a >>>> TLV)? >>>> >>>> Correct. 
>>>> In particular, with our name encoding, a TLV indicates the >>>> name >>>> hierarchy >>>> with offsets in the name and other TLV(s) indicates the >>>> offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/". >>>> Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you >>>> use the >>>> "offset >>>> TLV " to do that. >>>> >>>> So now, it may be an aesthetic question but: >>>> >>>> if you do not need the entire hierarchical structure (suppose >>>> you only >>>> want >>>> the first x components) you can directly have it using the >>>> offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first >>>> x-1 >>>> components. >>>> With the offset structure you can directly access the >>>> first x >>>> components. >>>> >>>> Max >>>> >>>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few >>>> "specific >>>> component type". You are imposing types for every component >>>> in order >>>> to >>>> handle few exceptions (segmentation, etc..). You create a >>>> rule >>>> (specify >>>> the component's type ) to handle exceptions! >>>> >>>> I would prefer not to have typed components. Instead I would >>>> prefer >>>> to >>>> have the name as a simple sequence of bytes with a field >>>> separator. Then, >>>> outside the name, if you have some components that could be >>>> used at >>>> network layer (e.g. a TLV field), you simply need something >>>> that >>>> indicates which is the offset allowing you to retrieve the >>>> version, >>>> segment, etc in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". 
>>>> However, if you have a small number of types, you will end >>>> up with >>>> names >>>> containing many generic components types and few specific >>>> components >>>> types. Due to the fact that the component type specification >>>> is an >>>> exception in the name, I would prefer something that specify >>>> component's >>>> type only when needed (something like UTF8 conventions but >>>> that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, >>>> e.g.) >>>> and >>>> there's been email trying to explain that applications don't >>>> have to >>>> use types if they don't need to. your email sounds like "I >>>> prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that >>>> preference in >>>> the face of the points about the problems. can you say why >>>> it is >>>> that >>>> you express a preference for the "convention" with problems ? >>>> >>>> Thanks, >>>> Mark >>>> >>>> . 
>>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part 
-------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: ------------------------------ Subject: Digest Footer _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest ------------------------------ End of Ndn-interest Digest, Vol 7, Issue 60 ******************************************* From tailinchu at gmail.com Wed Sep 24 21:48:30 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Wed, 24 Sep 2014 21:48:30 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com> Message-ID: I agree. My initial idea was not well thought out. After reading the comments, I think selectors need to be redesigned, or even be removed. For example, exclude is very inefficient. While other selectors express "what I want", exclude expresses "what I don't want". Given that we could have a variable number of components and an infinite number of possible values in each component, using exclude could be problematic. Just as we detach the trust model from ndn, we might as well detach discovery from ndn. It is for extensibility because different applications have their own way of discovering data. On Wed, Sep 24, 2014 at 8:20 AM, wrote: > On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: > >>For example, I see a pattern /mail/inbox/148. I, a human being, see a >>pattern with static (/mail/inbox) and variable (148) components; with >>proper naming convention, computers can also detect this pattern >>easily. Now I want to look for all mails in my inbox. I can generate a >>list of /mail/inbox/. These are my guesses, and with selectors >>I can further refine my guesses. > > I think this is a very bad example (or at least a very bad application > design). 
You have an app (a mail server / inbox) and you want it to list > your emails? An email list is an application data structure. I don?t > think you should use the network structure to reflect this. > > I?ll give you an example, how do you delete emails from your inbox? If an > email was cached in the network it can never be deleted from your inbox? > Or moved to another mailbox? Do you rely on the emails expiring? > > This problem is true for most (any?) situations where you use network name > structure to directly reflect the application data structure. > > Nacho > > > >>On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>> Ok, yes I think those would all be good things. >>> >>> One thing to keep in mind, especially with things like time series >>>sensor >>> data, is that people see a pattern and infer a way of doing it. That?s >>>easy >>> for a human :) But in Discovery, one should assume that one does not >>>know >>> of patterns in the data beyond what the protocols used to publish the >>>data >>> explicitly require. That said, I think some of the things you listed >>>are >>> good places to start: sensor data, web content, climate data or genome >>>data. >>> >>> We also need to state what the forwarding strategies are and what the >>>cache >>> behavior is. >>> >>> I outlined some of the points that I think are important in that other >>> posting. While ?discover latest? is useful, ?discover all? is also >>> important, and that one gets complicated fast. So points like >>>separating >>> discovery from retrieval and working with large data sets have been >>> important in shaping our thinking. That all said, I?d be happy starting >>> from 0 and working through the Discovery service definition from scratch >>> along with data set use cases. >>> >>> Marc >>> >>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: >>> >>> Hi Marc, >>> >>> Thanks ? yes, I saw that as well. 
I was just trying to get one step more >>> specific, which was to see if we could identify a few specific use cases >>> around which to have the conversation. (e.g., time series sensor data >>>and >>> web content retrieval for "get latest"; climate data for huge data sets; >>> local data in a vehicular network; etc.) What have you been looking at >>> that's driving considerations of discovery? >>> >>> Thanks, >>> Jeff >>> >>> From: >>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>> To: Jeff Burke >>> Cc: , >>> Subject: Re: [Ndn-interest] any comments on naming convention? >>> >>> Jeff, >>> >>> Take a look at my posting (that Felix fixed) in a new thread on >>>Discovery. >>> >>> >>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200 >>>.html >>> >>> I think it would be very productive to talk about what Discovery should >>>do, >>> and not focus on the how. It is sometimes easy to get caught up in the >>>how, >>> which I think is a less important topic than the what at this stage. >>> >>> Marc >>> >>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >>> >>> Marc, >>> >>> If you can't talk about your protocols, perhaps we can discuss this >>>based >>> on use cases. What are the use cases you are using to evaluate >>> discovery? >>> >>> Jeff >>> >>> >>> >>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >>> >>> No matter what the expressiveness of the predicates if the forwarder can >>> send interests different ways you don't have a consistent underlying set >>> to talk about so you would always need non-range exclusions to discover >>> every version. >>> >>> Range exclusions only work I believe if you get an authoritative answer. >>> If different content pieces are scattered between different caches I >>> don't see how range exclusions would work to discover every version. >>> >>> I'm sorry to be pointing out problems without offering solutions but >>> we're not ready to publish our discovery protocols. 
>>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how the ccnx discovery protocol solves >>> all the problems that you mentioned (not just exclude)? a doc will be >>> better. >>> >>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >>> expect [and] and [or], so boolean algebra is fully supported. Regular >>> language or context free language might become part of selector too. >>> >>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>> That will get you one reading then you need to exclude it and ask >>> again. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes not range excludes if you want to discover all the versions >>> of an object. >>> >>> >>> I am very confused. For your example, if I want to get all today's >>> sensor data, I just do (Any..Last second of last day)(First second of >>> tomorrow..Any). That's 18 bytes. >>> >>> >>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>> >>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>> >>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>> >>> If you talk sometimes to A and sometimes to B, you very easily >>> could miss content objects you want to discover unless you avoid >>> all range exclusions and only exclude explicit versions. >>> >>> >>> Could you explain why the missing-content-object situation happens? Also, >>> range exclusion is just a shorter notation for many explicit >>> excludes; >>> converting from explicit excludes to ranged exclude is always >>> possible. >>> >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes not range excludes if you want to discover all the versions >>> of an object.
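[The byte-count claims traded in this exchange can be checked with quick arithmetic. This is only an illustrative sketch: the 8-byte timestamp component, the per-second update rate, and the 1-byte "Any" markers are assumptions drawn from the examples in the thread, not the actual NDN wire format.]

```python
# Rough cost comparison for "discover all of today's readings" via
# excludes, following the sensor example in this thread. Sizes count
# exclude payload only (TLV encoding overhead ignored).
READINGS_PER_DAY = 24 * 60 * 60   # one reading per second -> 86,400/day
TIMESTAMP_BYTES = 8               # assumed size of one timestamp component

# Excluding every known version individually, as needed when caches
# cannot be treated as a consistent set:
individual_exclude_bytes = READINGS_PER_DAY * TIMESTAMP_BYTES

# A two-sided range exclude, (Any..last second of yesterday)
# (first second of tomorrow..Any): two timestamps plus two Any markers.
range_exclude_bytes = 2 * TIMESTAMP_BYTES + 2

print(individual_exclude_bytes)  # 691200
print(range_exclude_bytes)       # 18
```

The two print lines reproduce both figures quoted in the thread: ~691,200 bytes per day for individual excludes versus ~18 bytes for the range form.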
For something like a sensor reading that is updated, >>> say, once per second you will have 86,400 of them per day. If each >>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>> exclusions (plus encoding overhead) per day. >>> >>> yes, maybe using a more deterministic version number than a >>> timestamp makes sense here, but it's just an example of needing a lot >>> of exclusions. >>> >>> >>> You exclude through 100 then issue a new interest. This goes to >>> cache B >>> >>> >>> I feel this case is invalid because cache A will also get the >>> interest, and cache A will return v101 if it exists. Like you said, >>> if >>> this goes to cache B only, it means that cache A dies. How do you >>> know >>> that v101 even exists? >>> >>> >>> I guess this depends on what the forwarding strategy is. If the >>> forwarder will always send each interest to all replicas, then yes, >>> modulo packet loss, you would discover v101 on cache A. If the >>> forwarder is just doing "best path" and can round-robin between cache >>> A and cache B, then your application could miss v101. >>> >>> >>> >>> c,d In general I agree that LPM performance is related to the number >>> of components. In my own thread-safe LPM implementation, I used only >>> one RWMutex for the whole tree. I don't know whether adding a lock for >>> every node will be faster or not because of lock overhead. >>> >>> However, we should compare (exact match + discovery protocol) vs >>> (ndn >>> lpm). Comparing performance of exact match to lpm is unfair. >>> >>> >>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>> specs for doing the exact match discovery. So, as I said, I'm not >>> ready to claim it's better yet because we have not done that. >>> >>> >>> >>> >>> >>> >>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>> I would point out that using LPM on content object to Interest >>> matching to do discovery has its own set of problems. Discovery >>> involves more than just "latest version"
discovery too. >>> >>> This is probably getting off-topic from the original post about >>> naming conventions. >>> >>> a. If Interests can be forwarded in multiple directions and two >>> different caches are responding, the exclusion set you build up >>> talking with cache A will be invalid for cache B. If you talk >>> sometimes to A and sometimes to B, you very easily could miss >>> content objects you want to discover unless you avoid all range >>> exclusions and only exclude explicit versions. That will lead to >>> very large interest packets. In ccnx 1.0, we believe that an >>> explicit discovery protocol that allows conversations about >>> consistent sets is better. >>> >>> b. Yes, if you just want the "latest version" discovery that >>> should be transitive between caches, but imagine this. You send >>> Interest #1 to cache A which returns version 100. You exclude >>> through 100 then issue a new interest. This goes to cache B who >>> only has version 99, so the interest times out or is NACK'd. So >>> you think you have it! But, cache A already has version 101, you >>> just don't know. If you cannot have a conversation around >>> consistent sets, it seems like even doing latest version discovery >>> is difficult with selector based discovery. From what I saw in >>> ccnx 0.x, one ended up getting an Interest all the way to the >>> authoritative source because you can never believe an intermediate >>> cache that there's not something more recent. >>> >>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>> interested in seeing your analysis. Case (a) is that a node can >>> correctly discover every version of a name prefix, and (b) is that >>> a node can correctly discover the latest version.
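[Marc's case (b) is easy to reproduce in a toy model. The code below is pure illustration, not NDN or CCNx code: integer version numbers stand in for named content objects, the "route" list stands in for the forwarder's per-interest choice of replica, and an empty answer stands in for a timeout/NACK.]

```python
# Toy model of case (b): "latest version" discovery with excludes, when
# a forwarder may round-robin between two caches whose contents differ.
TIMELINE = [
    {"A": {99, 100},      "B": {99}},  # interest #1: cache A has up to v100
    {"A": {99, 100, 101}, "B": {99}},  # interest #2: cache A now has v101
]

def discover_latest(route):
    """route[i] is the cache that receives interest i."""
    excluded_through = -1
    for step, cache in enumerate(route):
        store = TIMELINE[min(step, len(TIMELINE) - 1)][cache]
        newer = [v for v in store if v > excluded_through]
        if not newer:
            # timeout/NACK: the consumer concludes it already has the latest
            return excluded_through
        excluded_through = max(newer)
    return excluded_through

print(discover_latest(["A", "A"]))  # 101: a consistent path finds v101
print(discover_latest(["A", "B"]))  # 100: round-robin silently misses v101
```

The second call shows the non-determinism Marc describes: the consumer gets no error, it simply stops one version early because cache B could not speak for cache A's contents.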
We have not >>> formally compared (or yet published) our discovery protocols (we >>> have three, 2 for content, 1 for device) against selector based >>> discovery, so I cannot yet claim they are better, but they do not >>> have the non-determinism sketched above. >>> >>> c. Using LPM, there is a non-deterministic number of lookups you >>> must do in the PIT to match a content object. If you have a name >>> tree or a threaded hash table, those don't all need to be hash >>> lookups, but you need to walk up the name tree for every prefix of >>> the content object name and evaluate the selector predicate. >>> Content Based Networking (CBN) had some methods to create data >>> structures based on predicates, maybe those would be better. But >>> in any case, you will potentially need to retrieve many PIT entries >>> if there is Interest traffic for many prefixes of a root. Even on >>> an Intel system, you'll likely miss cache lines, so you'll have a >>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>> implementation only requires at most 3 lookups (one by name, one by >>> name + keyid, one by name + content object hash), and one can do >>> other things to optimize lookup for an extra write. >>> >>> d. In (c) above, if you have a threaded name tree or are just >>> walking parent pointers, I suspect you'll need locking of the >>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>> and that will be expensive. It would be interesting to see what a >>> cache consistent multi-threaded name tree looks like. >>> >>> Marc >>> >>> >>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>> wrote: >>> >>> I had thought about these questions, but I want to know your idea >>> besides typed component: >>> 1. LPM allows "data discovery". How will exact match do similar >>> things? >>> 2. will removing selectors improve performance? How do we use >>> other >>> faster techniques to replace selectors? >>> 3. fixed byte length and type.
I agree more that type can be fixed >>> byte, but 2 bytes for length might not be enough for the future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>> wrote: >>> >>> I know how to make #2 flexible enough to do what things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> >>> Could you share it with us? >>> >>> Sure. Here's a strawman. >>> >>> The type space is 16 bits, so you have 65,536 types. >>> >>> The type space is currently shared with the types used for the >>> entire protocol, which gives us two options: >>> (1) we reserve a range for name component types. Given the >>> likelihood there will be at least as much and probably more need >>> for component types than protocol extensions, we could reserve 1/2 >>> of the type space, giving us 32K types for name components. >>> (2) since there is no parsing ambiguity between name components >>> and other fields of the protocol (since they are sub-types of the >>> name type) we could reuse numbers and thereby have an entire 65K >>> name component types. >>> >>> We divide the type space into regions, and manage it with a >>> registry. If we ever get to the point of creating an IETF >>> standard, IANA has 25 years of experience running registries and >>> there are well-understood rule sets for different kinds of >>> registries (open, requires a written spec, requires standards >>> approval). >>> >>> - We allocate one "default" name component type for "generic >>> name", which would be used on name prefixes and other common >>> cases where there are no special semantics on the name component. >>> - We allocate a range of name component types, say 1024, to >>> globally understood types that are part of the base or extension >>> NDN specifications (e.g. chunk#, version#, etc.)
- We reserve some portion of the space for unanticipated uses >>> (say another 1024 types) >>> - We give the rest of the space to application assignment. >>> >>> Make sense? >>> >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design >>> >>> >>> we could design for performance, >>> >>> That's not what people are advocating. We are advocating that we >>> *not* design for known bad performance and hope serendipity or >>> Moore's Law will come to the rescue. >>> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough". >>> >>> Perhaps, perhaps not. Relative performance is what matters so >>> things that don't get faster while others do tend to get dropped >>> or not used because they impose a performance penalty relative to >>> the things that go faster. There is also the "low-end" phenomenon >>> where improvements in technology get applied to lowering cost >>> rather than improving performance. For those environments bad >>> performance just never gets better. >>> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >>> I suspect LPM on data will always be slow (relative to the other >>> functions). >>> I suspect exclusions will always be slow because they will >>> require extra memory references. >>> >>> However I of course don't claim clairvoyance so this is just >>> speculation based on 35+ years of seeing performance improve by 4 >>> orders of magnitude and still having to worry about counting >>> cycles and memory references... >>> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>> wrote: >>> >>> We should not look at a certain chip nowadays and want ndn to >>> perform >>> well on it.
It should be the other way around: once an ndn app >>> becomes >>> popular, a better chip will be designed for ndn. >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design: >>> a) clock rates are not getting (much) faster >>> b) memory accesses are getting (relatively) more expensive >>> c) data structures that require locks to manipulate >>> successfully will be relatively more expensive, even with >>> near-zero lock contention. >>> >>> The fact is, IP *did* have some serious performance flaws in >>> its design. We just forgot those because the design elements >>> that depended on those mistakes have fallen into disuse. The >>> poster children for this are: >>> 1. IP options. Nobody can use them because they are too slow >>> on modern forwarding hardware, so they can't be reliably used >>> anywhere >>> 2. the UDP checksum, which was a bad design when it was >>> specified and is now a giant PITA that still causes major pain >>> to work around. >>> >>> I'm afraid students today are being taught that the designers >>> of IP were flawless, as opposed to very good scientists and >>> engineers that got most of it right. >>> >>> I feel the discussion today and yesterday has been off-topic. >>> Now I >>> see that there are 3 approaches: >>> 1. we should not define a naming convention at all >>> 2. typed component: use tlv type space and add a handful of >>> types >>> 3. marked component: introduce only one more type and add >>> additional >>> marker space >>> >>> I know how to make #2 flexible enough to do what things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed.
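[For concreteness, Dave's strawman partitioning of the 16-bit component type space might look like the following. The boundary values are illustrative assumptions (his proposal says "say 1024" for both middle ranges), shown under option (2) where all 65,536 values are usable for name components.]

```python
# Sketch of the strawman registry for 16-bit name component types.
# Exact boundaries are assumptions; only the proportions come from
# the proposal: 1 generic default, ~1024 global, ~1024 reserved,
# and the remainder for application assignment.
GENERIC = 0                         # default "generic name" component type
GLOBAL_RANGE = range(1, 1025)       # globally specified (chunk#, version#, ...)
RESERVED_RANGE = range(1025, 2049)  # held for unanticipated uses
APP_RANGE = range(2049, 65536)      # free for application assignment

def classify(component_type):
    """Map a 16-bit component type to its registry region."""
    if component_type == GENERIC:
        return "generic"
    if component_type in GLOBAL_RANGE:
        return "global"
    if component_type in RESERVED_RANGE:
        return "reserved"
    if component_type in APP_RANGE:
        return "application"
    raise ValueError("not a 16-bit name component type")

print(classify(0))      # generic
print(classify(7))      # global
print(classify(40000))  # application
```

Note the regions tile the whole 2^16 space, so every component type an application mints falls into exactly one registry policy.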
>>> >>> It is just as powerful in practice as either throwing up our >>> hands and letting applications design their own mutually >>> incompatible schemes or trying to make naming conventions with >>> markers in a way that is fast to generate/parse and also >>> resilient against aliasing. >>> >>> Also everybody thinks that the current utf8 marker naming >>> convention >>> needs to be revised. >>> >>> >>> >>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>> wrote: >>> Would that chip be suitable, i.e. can we expect most names >>> to fit in (the >>> magnitude of) 96 bytes? What length are names usually in >>> current NDN >>> experiments? >>> >>> I guess wide deployment could make for even longer names. >>> Related: Many URLs >>> I encounter nowadays easily don't fit within two 80-column >>> text lines, and >>> NDN will have to carry more information than URLs, as far as >>> I see. >>> >>> >>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>> >>> In fact, the index in a separate TLV will be slower on some >>> architectures, >>> like the ezChip NP4. The NP4 can hold the first 96 frame >>> bytes in memory, >>> then any subsequent memory is accessed only as two adjacent >>> 32-byte blocks >>> (there can be at most 5 blocks available at any one time). >>> If you need to >>> switch between arrays, it would be very expensive. If you >>> have to read past >>> the name to get to the 2nd array, then read it, then back up >>> to get to the >>> name, it will be pretty expensive too. >>> >>> Marc >>> >>> On Sep 18, 2014, at 2:02 PM, >>> wrote: >>> >>> Does this make that much difference? >>> >>> If you want to parse the first 5 components, one way to do >>> it is: >>> >>> Read the index, find entry 5, then read in that many bytes >>> from the start >>> offset of the beginning of the name. >>> OR >>> Start reading name, (find size + move) 5 times. >>> >>> How much speed are you getting from one to the other? You >>> seem to imply >>> that the first one is faster.
I don't think this is the >>> case. >>> >>> In the first one you'll probably have to get the cache line >>> for the index, >>> then all the required cache lines for the first 5 >>> components. For the >>> second, you'll have to get all the cache lines for the first >>> 5 components. >>> Given an assumption that a cache miss is way more expensive >>> than >>> evaluating a number and computing an addition, you might >>> find that the >>> performance of the index is actually slower than the >>> performance of the >>> direct access. >>> >>> Granted, there is a case where you don't access the name at >>> all, for >>> example, if you just get the offsets and then send the >>> offsets as >>> parameters to another processor/GPU/NPU/etc. In this case >>> you may see a >>> gain IF there are more cache line misses in reading the name >>> than in >>> reading the index. So, if the regular part of the name >>> that you're >>> parsing is bigger than the cache line (64 bytes?) and the >>> name is to be >>> processed by a different processor, then you might see some >>> performance >>> gain in using the index, but in all other circumstances I >>> bet this is not >>> the case. I may be wrong, haven't actually tested it. >>> >>> This is all to say, I don't think we should be designing the >>> protocol with >>> only one architecture in mind. (The architecture of sending >>> the name to a >>> different processor than the index). >>> >>> If you have numbers that show that the index is faster I >>> would like to see >>> under what conditions and architectural assumptions. >>> >>> Nacho >>> >>> (I may have misinterpreted your description so feel free to >>> correct me if >>> I'm wrong.)
>>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>> >>> wrote: >>> >>> Indeed each component's offset must be encoded using a fixed >>> number of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>> You could also imagine having an "Offset_2byte" type if your >>> name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want the first x components) you can directly have it using >>> the >>> offsets. With the Nested TLV structure you have to >>> iteratively parse >>> the first x-1 components. With the offset structure you can >>> directly >>> access the first x components. >>> >>> I don't get it. What you described only works if the >>> "offset" is >>> encoded in fixed bytes. With varNum, you will still need to >>> parse x-1 >>> offsets to get to the x-th offset. >>> >>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>> >>> On 17/09/2014 14:56, Mark Stapp wrote: >>> >>> ah, thanks - that's helpful. I thought you were saying "I >>> like the >>> existing NDN UTF8 'convention'." I'm still not sure I >>> understand what >>> you >>> _do_ prefer, though. it sounds like you're describing an >>> entirely >>> different >>> scheme where the info that describes the name-components is >>> ... >>> someplace >>> other than _in_ the name-components. is that correct? when >>> you say >>> "field >>> separator", what do you mean (since that's not a "TL" from a >>> TLV)? >>> >>> Correct. >>> In particular, with our name encoding, a TLV indicates the >>> name >>> hierarchy >>> with offsets in the name and other TLV(s) indicate the >>> offset to use >>> in >>> order to retrieve special components.
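[The two parsing strategies being compared here -- walking nested component TLVs ("find size + move" x times) versus jumping via a fixed-width offset index like Massimo's Offsets TLV -- can be sketched in a few lines. The 1-byte type and 1-byte length encoding below is an illustrative assumption, not the actual NDN or CCNx wire format.]

```python
# Two ways to extract the first x components of a name.

def parse_first_x_sequential(name_tlv, x):
    """Walk component TLVs: read each (type, length), then skip the value."""
    components, pos = [], 0
    for _ in range(x):
        t, l = name_tlv[pos], name_tlv[pos + 1]  # 1-byte type, 1-byte length
        components.append(name_tlv[pos + 2 : pos + 2 + l])
        pos += 2 + l
    return components

def parse_first_x_indexed(name_bytes, offsets, x):
    """Jump straight to each component via a fixed 1-byte-per-entry index."""
    bounds = list(offsets) + [len(name_bytes)]
    return [name_bytes[bounds[i] : bounds[i + 1]] for i in range(x)]

# "/mail/inbox/148" as three components, in each representation:
seq = bytes([1, 4]) + b"mail" + bytes([1, 5]) + b"inbox" + bytes([1, 3]) + b"148"
flat, offsets = b"mailinbox148", [0, 4, 9]

print(parse_first_x_sequential(seq, 2))       # [b'mail', b'inbox']
print(parse_first_x_indexed(flat, offsets, 2))  # [b'mail', b'inbox']
```

Both return the same components; the debate in the thread is purely about which costs fewer cache-line fetches on a given architecture, which this sketch does not settle.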
>>> As for the field separator, it is something like "/". >>> Aliasing is >>> avoided as >>> you do not rely on field separators to parse the name; you >>> use the >>> "offset >>> TLV" to do that. >>> >>> So now, it may be an aesthetic question but: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want >>> the first x components) you can directly have it using the >>> offsets. >>> With the >>> Nested TLV structure you have to iteratively parse the first >>> x-1 >>> components. >>> With the offset structure you can directly access the >>> first x >>> components. >>> >>> Max >>> >>> >>> -- Mark >>> >>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>> The why is simple: >>> >>> You use a lot of "generic component type" and very few >>> "specific >>> component type". You are imposing types for every component >>> in order >>> to >>> handle a few exceptions (segmentation, etc.). You create a >>> rule >>> (specify >>> the component's type) to handle exceptions! >>> >>> I would prefer not to have typed components. Instead I would >>> prefer >>> to >>> have the name as a simple sequence of bytes with a field >>> separator. Then, >>> outside the name, if you have some components that could be >>> used at >>> the network layer (e.g. a TLV field), you simply need something >>> that >>> indicates which is the offset allowing you to retrieve the >>> version, >>> segment, etc. in the name... >>> >>> >>> Max >>> >>> >>> >>> >>> >>> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end >>> up with >>> names >>> containing many generic component types and few specific >>> component >>> types.
Due to the fact that the component type specification >>> is an >>> exception in the name, I would prefer something that specifies >>> the component's >>> type only when needed (something like UTF8 conventions but >>> that >>> applications MUST use). >>> >>> so ... I can't quite follow that. the thread has had some >>> explanation >>> about why the UTF8 requirement has problems (with aliasing, >>> e.g.) >>> and >>> there's been email trying to explain that applications don't >>> have to >>> use types if they don't need to. your email sounds like "I >>> prefer >>> the >>> UTF8 convention", but it doesn't say why you have that >>> preference in >>> the face of the points about the problems. can you say why >>> it is >>> that >>> you express a preference for the "convention" with problems? >>> >>> Thanks, >>> Mark >>> >>>
From jburke at remap.UCLA.EDU Wed Sep 24 22:07:20 2014 From: jburke at remap.UCLA.EDU (Burke, Jeff) Date: Thu, 25 Sep 2014 05:07:20 +0000 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: Message-ID: Yes. If someone can help, we can get this up and running... :) Probably about a week to port and a week to test with the specific video files. Jeff From: Alex Horn > Date: Wed, 24 Sep 2014 15:28:15 -0700 Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] NDNcomm 2014 videos Indeed - we had ndn-video testbed distribution for a few years! ndn-video (source, tech report) was very useful in testing the early testbed. unfortunately it is out of date, as: a) uses pyccn/ccnx - needs to be updated to pyndn2/NFD b) uses gst 0.10 - needs to be updated to gst 1.X we don't internally have the resources for that, at the moment... (our recent video work has been in ndn-RTC) but if someone wanted to take it on, it is a fairly straightforward effort. meanwhile, we will get the conference video online in some form as soon as possible. thanks for your interest! Alex On Wed, Sep 24, 2014 at 1:54 PM, Xiaoke Jiang > wrote: On Wednesday, 24 September, 2014 at 1:46 pm, Beichuan Zhang wrote: Maybe we can stream it over NDN testbed -:) There're 3 nodes in China. Good solution. Some other guys in China also complained the slow connection.
also, NDN could show its advantage over data delivery with this solution. Xiaoke (Shock) _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jburke at remap.ucla.edu Wed Sep 24 22:31:32 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Thu, 25 Sep 2014 05:31:32 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: Message-ID: Tai-Lin, I think you gave up too easily. :) While it's true that the approach described isn't necessarily that efficient for your example (you should look at the set reconciliation / SYNC work that is part of both NDN and CCN), even with it there are tradeoffs between roundtrip performance (which is only one metric) and other benefits. For example, in ndnrtc, the real-time video conferencing test application we are working on, we do some gymnastics on the consumer side to acquire the latest data the network can deliver through excludes and selectors, and then use exact match to continue playout. NDN's current design is a superset of what Nacho is describing - it includes exact match as an option, so there is always an engineering choice that can be made to use it. In fact, it looks like applications might often use both. There's a lot more research to be done to tweak and evaluate the ndnrtc example, but the approach taken means that regardless of the number of consumers of the video stream, the load on the publisher doesn't change because we never get to what Nacho called the "actual publisher". This scaling property is really valuable. It's related to how WUSTL demonstrated 1000s of clients of NDNVideo (different but similar code) publishing from a relatively simple publisher and a plain vanilla NDN network.
While in the conferencing case, there are indeed more roundtrips at startup, and occasional probes of the latest data the network can deliver, they are traded off for a very interesting and useful property that decouples producers and consumers even in low-latency video delivery - fewer roundtrips in the aggregate. We can argue the merits of the approach, but the point is that there is still research to be done before discarding the general notions of exclusions and selectors. Put another way, excludes provide a way of embedding requestor knowledge in the interest and selectors+LPM provide ways to help reduce the knowledge a consumer needs of the namespace to make a request. These have benefits - for example, in producer naming. Consider again "best-effort latest-value" type delivery of a sensor value in an IoT scenario, where the consumer is looking not for all values, but the best that the *network*, not the *producer*, can do in providing the current value of a sensor, image from a camera, etc. Then you can use excludes and selectors efficiently, let the data be named with a timestamp (not known to the consumer), and be done - it doesn't require the additional work done in the video example to acquire all segments. The reframing, though, is useful - interests with selectors and excludes are more akin to asking the "network" a question, which is potentially very powerful in comparison with having to often/always reach an "actual publisher". Jeff On 9/24/14, 9:48 PM, "Tai-Lin Chu" wrote: >I agree. My initial idea is not thoughtful. > >After reading the comments, I think selectors need to be redesigned, >or even be removed. >For example, exclude is very inefficient. While other selectors >express "what I want", exclude expresses "what I don't want". Given >that we could have variable number of components and infinite number >of possible values in each component, using exclude could be >problematic. 
For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. >>>> >>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>> If you talk sometimes to A and sometimes to B, you very easily >>>> could miss content objects you want to discover unless you avoid >>>> all range exclusions and only exclude explicit versions. >>>> >>>> >>>> Could you explain why the missing content object situation happens? Also, >>>> range exclusion is just a shorter notation for many explicit >>>> excludes; >>>> converting from explicit excludes to a ranged exclude is always >>>> possible. >>>> >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes not range excludes if you want to discover all the versions >>>> of an object. For something like a sensor reading that is updated, >>>> say, once per second you will have 86,400 of them per day. If each >>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>> exclusions (plus encoding overhead) per day. >>>> >>>> Yes, maybe using a more deterministic version number than a >>>> timestamp makes sense here, but it's just an example of needing a lot >>>> of exclusions. >>>> >>>> >>>> You exclude through 100 then issue a new interest. This goes to >>>> cache B >>>> >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, >>>> if >>>> this goes to cache B only, it means that cache A dies. How do you >>>> know >>>> that v101 even exists? >>>> >>>> >>>> I guess this depends on what the forwarding strategy is.
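The byte counts traded back and forth above check out. A quick sketch (TLV encoding overhead ignored, as in the original estimates):

```python
# Back-of-envelope check of the per-day exclusion overhead for a
# once-per-second sensor, as discussed above. Sizes are illustrative.

readings_per_day = 24 * 60 * 60          # 86,400 samples per day
timestamp_bytes = 8                      # one explicit exclude per version

# Marc's figure: one explicit exclusion per reading.
explicit = readings_per_day * timestamp_bytes
print(explicit)                          # = 691200 bytes of exclusions

# Tai-Lin's figure: two range excludes,
# (Any..last second of yesterday)(first second of tomorrow..Any),
# roughly two 8-byte bounds plus two Any markers.
ranged = 2 * timestamp_bytes + 2
print(ranged)                            # = 18 bytes
```

The two numbers illustrate why the disagreement persists: ranges are tiny but assume a consistent set to range over, while explicit excludes make no such assumption and grow linearly with the version count.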
If the >>>> forwarder will always send each interest to all replicas, then yes, >>>> modulo packet loss, you would discover v101 on cache A. If the >>>> forwarder is just doing ?best path? and can round-robin between cache >>>> A and cache B, then your application could miss v101. >>>> >>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number >>>> of components. In my own thread-safe LMP implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>> every node will be faster or not because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs >>>> (ndn >>>> lpm). Comparing performance of exact match to lpm is unfair. >>>> >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>> specs for doing the exact match discovery. So, as I said, I?m not >>>> ready to claim its better yet because we have not done that. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest >>>> matching to do discovery has its own set of problems. Discovery >>>> involves more than just ?latest version? discovery too. >>>> >>>> This is probably getting off-topic from the original post about >>>> naming conventions. >>>> >>>> a. If Interests can be forwarded multiple directions and two >>>> different caches are responding, the exclusion set you build up >>>> talking with cache A will be invalid for cache B. If you talk >>>> sometimes to A and sometimes to B, you very easily could miss >>>> content objects you want to discovery unless you avoid all range >>>> exclusions and only exclude explicit versions. That will lead to >>>> very large interest packets. In ccnx 1.0, we believe that an >>>> explicit discovery protocol that allows conversations about >>>> consistent sets is better. >>>> >>>> b. Yes, if you just want the ?latest version? 
discovery that >>>> should be transitive between caches, but imagine this. You send >>>> Interest #1 to cache A which returns version 100. You exclude >>>> through 100 then issue a new interest. This goes to cache B who >>>> only has version 99, so the interest times out or is NACK'd. So >>>> you think you have it! But, cache A already has version 101, you >>>> just don't know. If you cannot have a conversation around >>>> consistent sets, it seems like even doing latest version discovery >>>> is difficult with selector based discovery. From what I saw in >>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>> authoritative source because you can never believe an intermediate >>>> cache that there's not something more recent. >>>> >>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>> interested in seeing your analysis. Case (a) is that a node can >>>> correctly discover every version of a name prefix, and (b) is that >>>> a node can correctly discover the latest version. We have not >>>> formally compared (or yet published) our discovery protocols (we >>>> have three, 2 for content, 1 for device) to selector based >>>> discovery, so I cannot yet claim they are better, but they do not >>>> have the non-determinism sketched above. >>>> >>>> c. Using LPM, there is a non-deterministic number of lookups you >>>> must do in the PIT to match a content object. If you have a name >>>> tree or a threaded hash table, those don't all need to be hash >>>> lookups, but you need to walk up the name tree for every prefix of >>>> the content object name and evaluate the selector predicate. >>>> Content Based Networking (CBN) had some methods to create data >>>> structures based on predicates, maybe those would be better. But >>>> in any case, you will potentially need to retrieve many PIT entries >>>> if there is Interest traffic for many prefixes of a root.
Even on >>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>> implementation only requires at most 3 lookups (one by name, one by >>>> name + keyid, one by name + content object hash), and one can do >>>> other things to optimize lookup for an extra write. >>>> >>>> d. In (c) above, if you have a threaded name tree or are just >>>> walking parent pointers, I suspect you'll need locking of the >>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>> and that will be expensive. It would be interesting to see what a >>>> cache consistent multi-threaded name tree looks like. >>>> >>>> Marc >>>> >>>> >>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I had thought about these questions, but I want to know your idea >>>> besides typed component: >>>> 1. LPM allows "data discovery". How will exact match do similar >>>> things? >>>> 2. Will removing selectors improve performance? How do we use >>>> other >>>> faster techniques to replace selectors? >>>> 3. Fixed byte length and type. I agree more that type can be fixed >>>> byte, but 2 bytes for length might not be enough for the future. >>>> >>>> >>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I know how to make #2 flexible enough to do what things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> >>>> Could you share it with us? >>>> >>>> Sure. Here's a strawman. >>>> >>>> The type space is 16 bits, so you have 65,536 types. >>>> >>>> The type space is currently shared with the types used for the >>>> entire protocol, that gives us two options: >>>> (1) we reserve a range for name component types.
Given the >>>> likelihood there will be at least as much and probably more need >>>> to component types than protocol extensions, we could reserve 1/2 >>>> of the type space, giving us 32K types for name components. >>>> (2) since there is no parsing ambiguity between name components >>>> and other fields of the protocol (sine they are sub-types of the >>>> name type) we could reuse numbers and thereby have an entire 65K >>>> name component types. >>>> >>>> We divide the type space into regions, and manage it with a >>>> registry. If we ever get to the point of creating an IETF >>>> standard, IANA has 25 years of experience running registries and >>>> there are well-understood rule sets for different kinds of >>>> registries (open, requires a written spec, requires standards >>>> approval). >>>> >>>> - We allocate one ?default" name component type for ?generic >>>> name?, which would be used on name prefixes and other common >>>> cases where there are no special semantics on the name component. >>>> - We allocate a range of name component types, say 1024, to >>>> globally understood types that are part of the base or extension >>>> NDN specifications (e.g. chunk#, version#, etc. >>>> - We reserve some portion of the space for unanticipated uses >>>> (say another 1024 types) >>>> - We give the rest of the space to application assignment. >>>> >>>> Make sense? >>>> >>>> >>>> While I?m sympathetic to that view, there are three ways in >>>> which Moore?s law or hardware tricks will not save us from >>>> performance flaws in the design >>>> >>>> >>>> we could design for performance, >>>> >>>> That?s not what people are advocating. We are advocating that we >>>> *not* design for known bad performance and hope serendipity or >>>> Moore?s Law will come to the rescue. >>>> >>>> but I think there will be a turning >>>> point when the slower design starts to become "fast enough?. >>>> >>>> Perhaps, perhaps not. 
Relative performance is what matters, so >>>> things that don't get faster while others do tend to get dropped >>>> or not used because they impose a performance penalty relative to >>>> the things that go faster. There is also the "low-end" phenomenon >>>> where improvements in technology get applied to lowering cost >>>> rather than improving performance. For those environments bad >>>> performance just never gets better. >>>> >>>> Do you >>>> think there will be some design of ndn that will *never* have >>>> performance improvement? >>>> >>>> I suspect LPM on data will always be slow (relative to the other >>>> functions). >>>> I suspect exclusions will always be slow because they will >>>> require extra memory references. >>>> >>>> However I of course don't claim clairvoyance, so this is just >>>> speculation based on 35+ years of seeing performance improve by 4 >>>> orders of magnitude and still having to worry about counting >>>> cycles and memory references... >>>> >>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> We should not look at a certain chip nowadays and want ndn to >>>> perform >>>> well on it. It should be the other way around: once ndn app >>>> becomes >>>> popular, a better chip will be designed for ndn. >>>> >>>> While I'm sympathetic to that view, there are three ways in >>>> which Moore's law or hardware tricks will not save us from >>>> performance flaws in the design: >>>> a) clock rates are not getting (much) faster >>>> b) memory accesses are getting (relatively) more expensive >>>> c) data structures that require locks to manipulate >>>> successfully will be relatively more expensive, even with >>>> near-zero lock contention. >>>> >>>> The fact is, IP *did* have some serious performance flaws in >>>> its design. We just forgot those because the design elements >>>> that depended on those mistakes have fallen into disuse.
The >>>> poster children for this are: >>>> 1. IP options. Nobody can use them because they are too slow >>>> on modern forwarding hardware, so they can?t be reliably used >>>> anywhere >>>> 2. the UDP checksum, which was a bad design when it was >>>> specified and is now a giant PITA that still causes major pain >>>> in working around. >>>> >>>> I?m afraid students today are being taught the that designers >>>> of IP were flawless, as opposed to very good scientists and >>>> engineers that got most of it right. >>>> >>>> I feel the discussion today and yesterday has been off-topic. >>>> Now I >>>> see that there are 3 approaches: >>>> 1. we should not define a naming convention at all >>>> 2. typed component: use tlv type space and add a handful of >>>> types >>>> 3. marked component: introduce only one more type and add >>>> additional >>>> marker space >>>> >>>> I know how to make #2 flexible enough to do what things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> It is just as powerful in practice as either throwing up our >>>> hands and letting applications design their own mutually >>>> incompatible schemes or trying to make naming conventions with >>>> markers in a way that is fast to generate/parse and also >>>> resilient against aliasing. >>>> >>>> Also everybody thinks that the current utf8 marker naming >>>> convention >>>> needs to be revised. >>>> >>>> >>>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>> wrote: >>>> Would that chip be suitable, i.e. can we expect most names >>>> to fit in (the >>>> magnitude of) 96 bytes? What length are names usually in >>>> current NDN >>>> experiments? >>>> >>>> I guess wide deployment could make for even longer names. >>>> Related: Many URLs >>>> I encounter nowadays easily don't fit within two 80-column >>>> text lines, and >>>> NDN will have to carry more information than URLs, as far as >>>> I see. 
>>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in a separate TLV will be slower on some >>>> architectures, >>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>> bytes in memory, >>>> then any subsequent memory is accessed only as two adjacent >>>> 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). >>>> If you need to >>>> switch between arrays, it would be very expensive. If you >>>> have to read past >>>> the name to get to the 2nd array, then read it, then back up >>>> to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components, one way to do >>>> it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes >>>> from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading the name, (find size + move) 5 times. >>>> >>>> How much speed are you getting from one to the other? You >>>> seem to imply >>>> that the first one is faster. I don't think this is the >>>> case. >>>> >>>> In the first one you'll probably have to get the cache line >>>> for the index, >>>> then all the required cache lines for the first 5 >>>> components. For the >>>> second, you'll have to get all the cache lines for the first >>>> 5 components. >>>> Given an assumption that a cache miss is way more expensive >>>> than >>>> evaluating a number and computing an addition, you might >>>> find that the >>>> performance of the index is actually slower than the >>>> performance of the >>>> direct access. >>>> >>>> Granted, there is a case where you don't access the name at >>>> all, for >>>> example, if you just get the offsets and then send the >>>> offsets as >>>> parameters to another processor/GPU/NPU/etc.
In this case >>>> you may see a >>>> gain IF there are more cache line misses in reading the name >>>> than in >>>> reading the index. So, if the regular part of the name >>>> that you're >>>> parsing is bigger than the cache line (64 bytes?) and the >>>> name is to be >>>> processed by a different processor, then you might see some >>>> performance >>>> gain in using the index, but in all other circumstances I >>>> bet this is not >>>> the case. I may be wrong, haven't actually tested it. >>>> >>>> This is all to say, I don't think we should be designing the >>>> protocol with >>>> only one architecture in mind. (The architecture of sending >>>> the name to a >>>> different processor than the index). >>>> >>>> If you have numbers that show that the index is faster I >>>> would like to see >>>> under what conditions and architectural assumptions. >>>> >>>> Nacho >>>> >>>> (I may have misinterpreted your description so feel free to >>>> correct me if >>>> I'm wrong.) >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>> >>>> wrote: >>>> >>>> Indeed each component's offset must be encoded using a fixed >>>> amount of >>>> bytes: >>>> >>>> i.e., >>>> Type = Offsets >>>> Length = 10 Bytes >>>> Value = Offset1(1byte), Offset2(1byte), ... >>>> >>>> You may also imagine having an "Offset_2byte" type if your >>>> name is too >>>> long. >>>> >>>> Max >>>> >>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> >>>> if you do not need the entire hierarchical structure (suppose >>>> you only >>>> want the first x components) you can directly have it using >>>> the >>>> offsets. With the Nested TLV structure you have to >>>> iteratively parse >>>> the first x-1 components. With the offset structure you can >>>> directly >>>> access the first x components. >>>> >>>> I don't get it.
What you described only works if the >>>> "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to >>>> parse x-1 >>>> offsets to get to the x offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>> >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> >>>> ah, thanks - that's helpful. I thought you were saying "I >>>> like the >>>> existing NDN UTF8 'convention'." I'm still not sure I >>>> understand what >>>> you >>>> _do_ prefer, though. it sounds like you're describing an >>>> entirely >>>> different >>>> scheme where the info that describes the name-components is >>>> ... >>>> someplace >>>> other than _in_ the name-components. is that correct? when >>>> you say >>>> "field >>>> separator", what do you mean (since that's not a "TL" from a >>>> TLV)? >>>> >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the >>>> name >>>> hierarchy >>>> with offsets in the name and other TLV(s) indicates the >>>> offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/". >>>> Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you >>>> use the >>>> "offset >>>> TLV " to do that. >>>> >>>> So now, it may be an aesthetic question but: >>>> >>>> if you do not need the entire hierarchal structure (suppose >>>> you only >>>> want >>>> the first x components) you can directly have it using the >>>> offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first >>>> x-1 >>>> components. >>>> With the offset structure you cane directly access to the >>>> firs x >>>> components. >>>> >>>> Max >>>> >>>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few >>>> "specific >>>> component type". 
You are imposing types for every component >>>> in order >>>> to >>>> handle few exceptions (segmentation, etc..). You create a >>>> rule >>>> (specify >>>> the component's type ) to handle exceptions! >>>> >>>> I would prefer not to have typed components. Instead I would >>>> prefer >>>> to >>>> have the name as simple sequence bytes with a field >>>> separator. Then, >>>> outside the name, if you have some components that could be >>>> used at >>>> network layer (e.g. a TLV field), you simply need something >>>> that >>>> indicates which is the offset allowing you to retrieve the >>>> version, >>>> segment, etc in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end >>>> up with >>>> names >>>> containing many generic components types and few specific >>>> components >>>> types. Due to the fact that the component type specification >>>> is an >>>> exception in the name, I would prefer something that specify >>>> component's >>>> type only when needed (something like UTF8 conventions but >>>> that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, >>>> e.g.) >>>> and >>>> there's been email trying to explain that applications don't >>>> have to >>>> use types if they don't need to. your email sounds like "I >>>> prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that >>>> preference in >>>> the face of the points about the problems. can you say why >>>> it is >>>> that >>>> you express a preference for the "convention" with problems ? >>>> >>>> Thanks, >>>> Mark >>>> >>>> . 
>>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jburke at remap.ucla.edu Wed Sep 24
22:45:59 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Thu, 25 Sep 2014 05:45:59 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: Message-ID: From: > Date: Wed, 24 Sep 2014 16:25:53 +0000 To: Jeff Burke > Cc: >, >, > Subject: Re: [Ndn-interest] any comments on naming convention? I think Tai-Lin's example was just fine to talk about discovery. /blah/blah/value, how do you discover all the "value"s? Discovery shouldn't care if it's email messages or temperature readings or world cup photos. This is true if discovery means "finding everything" - in which case, as you point out, sync-style approaches may be best. But I am not sure that this definition is complete. The most pressing example that I can think of is best-effort latest-value, in which the consumer's goal is to get the latest copy the network can deliver at the moment, and may not care about previous values or (if freshness is used well) potential later versions. Another case that seems to work well is video seeking. Let's say I want to enable random access to a video by timecode. The publisher can provide a time-code based discovery namespace that's queried using an Interest that essentially says "give me the closest keyframe to 00:37:03:12", which returns a Data packet that, via the name, provides the exact timecode of the keyframe in question and a link to a segment-based namespace for efficient exact match playout. In two roundtrips and in a very lightweight way, the consumer has random access capability. If NDN is the moral equivalent of IP, then I am not sure we should be afraid of roundtrips that provide this kind of functionality, just as they are used in TCP. I described one set of problems using the exclusion approach, and that an NDN paper on device discovery described a similar problem, though they did not go into the details of splitting interests, etc. That all was simple enough to see from the example.
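The seek-by-timecode flow sketched in Jeff's message above can be made concrete with a small example. The names here are hypothetical; the real namespace design belongs to the publisher.

```python
# Sketch of the two-round-trip video seek described above.
# Round trip 1: query the publisher's discovery namespace for the
# keyframe closest to a requested timecode; the answer's name carries
# the exact keyframe timecode plus a link into a segment-based
# namespace for exact-match playout (round trip 2 onward).
import bisect

# Keyframe timecodes (seconds) known on the publisher side.
keyframes = [0.0, 2.0, 4.0, 6.0]

def closest_keyframe(t):
    """Answer a 'closest keyframe to t' query with a name and a link."""
    i = bisect.bisect_left(keyframes, t)
    best = min(keyframes[max(0, i - 1):i + 1], key=lambda k: abs(k - t))
    # The Data name answers the query exactly; the link points at an
    # exact-match segment namespace for efficient playout.
    return f"/video/kf/{best}", f"/video/segments/{best}"

print(closest_keyframe(3.1))  # -> ('/video/kf/4.0', '/video/segments/4.0')
```

The consumer never needs to know the keyframe layout in advance: one query resolves the fuzzy timecode, and all subsequent fetches are plain exact-match names that cache well.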
Another question is how does one do the discovery with exact match names, which is also conflating things. You could do a different discovery with continuation names too, just not the exclude method. As I alluded to, one needs a way to talk with a specific cache about its "table of contents" for a prefix so one can get a consistent set of results without all the round-trips of exclusions. Actually downloading the "headers" of the messages would be the same bytes, more or less. In a way, this is a little like name enumeration from a ccnx 0.x repo, but that protocol has its own set of problems and I'm not suggesting to use that directly. One approach is to encode a request in a name component and a participating cache can reply. It replies in such a way that one could continue talking with that cache to get its TOC. One would then issue another interest with a request for not-that-cache. I'm curious how the TOC approach works in a multi-publisher scenario? Another approach is to try to ask the authoritative source for the "current" manifest name, i.e. /mail/inbox/current/, which could return the manifest or a link to the manifest. Then fetching the actual manifest from the link could come from caches because you now have a consistent set of names to ask for. If you cannot talk with an authoritative source, you could try again without the nonce and see if there's a cached copy of a recent version around. Marc On Sep 24, 2014, at 5:46 PM, Burke, Jeff > wrote: On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" > wrote: On 9/24/14, 4:27 AM, "Tai-Lin Chu" > wrote: For example, I see a pattern /mail/inbox/148. I, a human being, see a pattern with static (/mail/inbox) and variable (148) components; with proper naming convention, computers can also detect this pattern easily. Now I want to look for all mails in my inbox. I can generate a list of /mail/inbox/. These are my guesses, and with selectors I can further refine my guesses.
I think this is a very bad example (or at least a very bad application design). You have an app (a mail server / inbox) and you want it to list your emails? An email list is an application data structure. I don?t think you should use the network structure to reflect this. I think Tai-Lin is trying to sketch a small example, not propose a full-scale approach to email. (Maybe I am misunderstanding.) Another way to look at it is that if the network architecture is providing the equivalent of distributed storage to the application, perhaps the application data structure could be adapted to match the affordances of the network. Then it would not be so bad that the two structures were aligned. I?ll give you an example, how do you delete emails from your inbox? If an email was cached in the network it can never be deleted from your inbox? This is conflating two issues - what you are pointing out is that the data structure of a linear list doesn't handle common email management operations well. Again, I'm not sure if that's what he was getting at here. But deletion is not the issue - the availability of a data object on the network does not necessarily mean it's valid from the perspective of the application. Or moved to another mailbox? Do you rely on the emails expiring? This problem is true for most (any?) situations where you use network name structure to directly reflect the application data structure. Not sure I understand how you make the leap from the example to the general statement. Jeff Nacho On Tue, Sep 23, 2014 at 2:34 AM, > wrote: Ok, yes I think those would all be good things. One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That?s easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require. 
That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data. We also need to state what the forwarding strategies are and what the cache behavior is. I outlined some of the points that I think are important in that other posting. While "discover latest" is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking.

That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases.

Marc

On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:

Hi Marc,

Thanks, yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery?

Thanks,
Jeff

From: > Date: Mon, 22 Sep 2014 22:29:43 +0000 To: Jeff Burke > Cc: >, > Subject: Re: [Ndn-interest] any comments on naming convention?

Jeff,

Take a look at my posting (that Felix fixed) in a new thread on Discovery. http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html

I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage.

Marc

On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:

Marc,

If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery?
Jeff

On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:

No matter what the expressiveness of the predicates, if the forwarder can send interests different ways you don't have a consistent underlying set to talk about, so you would always need non-range exclusions to discover every version. Range exclusions only work, I believe, if you get an authoritative answer. If different content pieces are scattered between different caches, I don't see how range exclusions would work to discover every version. I'm sorry to be pointing out problems without offering solutions, but we're not ready to publish our discovery protocols.

Sent from my telephone

On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:

I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc would be better.

My unserious conjecture ( :) ): exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular languages or context-free languages might become part of selectors too.

On Sat, Sep 20, 2014 at 11:25 PM, > wrote:

That will get you one reading, then you need to exclude it and ask again.

Sent from my telephone

On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:

Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object.

I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes. [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude

On Sat, Sep 20, 2014 at 10:55 PM, > wrote:

On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:

If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions.
Could you explain why the missing content object situation happens? Also, range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to ranged excludes is always possible.

Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second, you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day. Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions.

You exclude through 100 then issue a new interest. This goes to cache B

I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists?

I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101.

c,d

In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead.

However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm alone is unfair.

Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery.
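Marc's back-of-the-envelope cost for explicit exclusions (86,400 once-per-second readings, 8 bytes per excluded timestamp) can be checked directly:

```python
# Cost of enumerating every version of a once-per-second sensor feed
# using explicit (non-range) exclusions, as in the example above.
READINGS_PER_DAY = 24 * 60 * 60   # 86,400 versions per day
TIMESTAMP_BYTES = 8               # assumed size of one excluded version component

exclusion_bytes = READINGS_PER_DAY * TIMESTAMP_BYTES
print(exclusion_bytes)            # 691200 bytes per day, before TLV encoding overhead
```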
So, as I said, I'm not ready to claim it's better yet because we have not done that.

On Sat, Sep 20, 2014 at 2:38 PM, > wrote:

I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too. This is probably getting off-topic from the original post about naming conventions.

a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better.

b. Yes, if you just want "latest version" discovery, that should be transitive between caches, but imagine this. You send Interest #1 to cache A, which returns version 100. You exclude through 100, then issue a new interest. This goes to cache B, which only has version 99, so the interest times out or is NACK'd. So you think you have it! But cache A already has version 101; you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest-version discovery is difficult with selector-based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent.

I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version.
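Case (b) above can be illustrated with a toy model; the version sets and the `respond` helper are invented for this sketch and assume a forwarder that round-robins between two replicas:

```python
# Toy illustration of case (b): latest-version discovery via excludes can
# stall on a stale cache when the forwarder round-robins between replicas.
cache_a = {100}   # cache A's versions at the time of Interest #1
cache_b = {99}    # cache B never received v100

def respond(store, excluded_through):
    """Return the newest version not excluded, or None (timeout / NACK)."""
    candidates = [v for v in store if v > excluded_through]
    return max(candidates) if candidates else None

v1 = respond(cache_a, 0)    # Interest #1 hits cache A and returns v100
cache_a.add(101)            # meanwhile v101 is published to cache A
v2 = respond(cache_b, v1)   # Interest #2 (exclude through 100) hits cache B
# v2 is None, so the consumer concludes v100 is latest -- but A holds v101.
```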
We have not formally compared (or yet published) our discovery protocols (we have three: 2 for content, 1 for device) against selector-based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above.

c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA accesses for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.

d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like.

Marc

On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:

I had thought about these questions, but I want to know your ideas besides typed components:

1. LPM allows "data discovery". How will exact match do similar things?
2. Will removing selectors improve performance? How do we use other, faster techniques to replace selectors?
3. Fixed byte length and type. I agree more that type can be a fixed byte, but 2 bytes for length might not be enough for the future.
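The bounded-lookup point in (c) can be sketched with hash probes; the key tuples and function names below are assumptions for illustration, not the actual CCNx 1.0 data structures:

```python
# Sketch of exact-match PIT lookup: at most three O(1) hash probes per
# content object (name; name + keyid; name + object hash), versus LPM's
# walk over every prefix of the name with selector evaluation at each step.
pit = {}

def pit_insert(name, keyid=None, obj_hash=None, face=0):
    pit[(name, keyid, obj_hash)] = face

def pit_match(name, keyid, obj_hash):
    for key in ((name, None, None), (name, keyid, None), (name, None, obj_hash)):
        if key in pit:            # each probe is a single hash lookup
            return pit[key]
    return None

pit_insert("/mail/inbox/148", keyid="key-1")
face = pit_match("/mail/inbox/148", "key-1", "hash-xyz")  # matched on name + keyid
```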
On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:

On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:

I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.

Could you share it with us?

Sure. Here's a strawman. The type space is 16 bits, so you have 65,536 types. The type space is currently shared with the types used for the entire protocol; that gives us two options: (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. (2) Since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types.

We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).

- We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component.
- We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.)
- We reserve some portion of the space for unanticipated uses (say another 1024 types).
- We give the rest of the space to application assignment.

Make sense?
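The strawman carve-up above can be written down as ranges. The exact boundaries below are the illustrative numbers from the proposal (one generic type, ~1024 base types, ~1024 reserved), not a specification:

```python
# Strawman partition of a 16-bit name-component type space, following the
# allocation sketched above (boundaries are illustrative only).
TYPE_SPACE = 2 ** 16                    # 65,536 possible types
GENERIC = 0                             # the one "default" generic-name type
BASE_SPEC = range(1, 1 + 1024)          # globally understood (chunk#, version#, ...)
RESERVED = range(1025, 1025 + 1024)     # held back for unanticipated uses
APPLICATION = range(2049, TYPE_SPACE)   # everything else: application assignment

remaining = TYPE_SPACE - 1 - len(BASE_SPEC) - len(RESERVED)
```

Even after the generic, base, and reserved allocations, applications keep the overwhelming majority of the space.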
While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design

we could design for performance,

That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.

but I think there will be a turning point when the slower design starts to become "fast enough".

Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments, bad performance just never gets better.

Do you think there will be some design of ndn that will *never* have performance improvement?

I suspect LPM on data will always be slow (relative to the other functions). I suspect exclusions will always be slow because they will require extra memory references.

However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references...

On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:

On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:

We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once ndn apps become popular, a better chip will be designed for ndn.
While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design:
a) clock rates are not getting (much) faster
b) memory accesses are getting (relatively) more expensive
c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.

The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere.
2. The UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around.

I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right.

I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches:
1. we should not define a naming convention at all
2. typed component: use the tlv type space and add a handful of types
3. marked component: introduce only one more type and add additional marker space

I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.

Also, everybody thinks that the current utf8 marker naming convention needs to be revised.

On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:

Would that chip be suitable, i.e.
can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments?

I guess wide deployment could make for even longer names. Related: many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see.

On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:

In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too.

Marc

On Sep 18, 2014, at 2:02 PM, > wrote:

Does this make that much difference? If you want to parse the first 5 components, one way to do it is:

Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name.
OR
Start reading the name, (find size + move) 5 times.

How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case. In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access.

Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc.
In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong; I haven't actually tested it.

This is all to say, I don't think we should be designing the protocol with only one architecture in mind (the architecture of sending the name to a different processor than the index). If you have numbers that show that the index is faster, I would like to see under what conditions and architectural assumptions.

Nacho

(I may have misinterpreted your description, so feel free to correct me if I'm wrong.)

--
Nacho (Ignacio) Solis
Protocol Architect
Principal Scientist
Palo Alto Research Center (PARC)
+1(650)812-4458
Ignacio.Solis at parc.com

On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:

Indeed each component's offset must be encoded using a fixed amount of bytes, i.e.:

Type = Offsets
Length = 10 Bytes
Value = Offset1 (1 byte), Offset2 (1 byte), ...

You may also imagine having an "Offset_2byte" type if your name is too long.

Max

On 18/09/2014 09:27, Tai-Lin Chu wrote:

if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.

I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x-th offset.

On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:

On 17/09/2014 14:56, Mark Stapp wrote:

ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'."
I'm still not sure I understand what you _do_ prefer, though. It sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. Is that correct? When you say "field separator", what do you mean (since that's not a "TL" from a TLV)?

Correct. In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name, and other TLV(s) indicate the offset to use in order to retrieve special components. As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that.

So now, it may be an aesthetic question, but: if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.

Max

--
Mark

On 9/17/14 6:02 AM, Massimo Gallo wrote:

The why is simple:

You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions!

I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc. in the name...

Max

On 16/09/2014 20:33, Mark Stapp wrote:

On 9/16/14 10:29 AM, Massimo Gallo wrote:

I think we agree on the small number of "component types".
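The trade-off being debated here (fixed-width offset index vs. iterative nested-TLV walk) can be made concrete. The one-byte length encoding below is a simplification invented for the sketch, not the NDN or CCNx wire format:

```python
# Simplified name: each component encoded as [length: 1 byte][bytes].
def encode_name(components):
    return b"".join(bytes([len(c)]) + c.encode() for c in components)

def parse_first_x(buf, x):
    """Nested-TLV style: read each length and skip, x times."""
    out, pos = [], 0
    for _ in range(x):
        ln = buf[pos]
        out.append(buf[pos + 1:pos + 1 + ln].decode())
        pos += 1 + ln
    return out

def offset_index(buf):
    """The 'offset TLV' idea: a fixed-width side table of component offsets."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 1 + buf[pos]
    return offsets

name = encode_name(["mail", "inbox", "148"])
idx = offset_index(name)
# With the index, component i is addressed directly at idx[i]; without it,
# reaching component x costs one length read per preceding component.
first_two = parse_first_x(name, 2)
third = name[idx[2] + 1:idx[2] + 1 + name[idx[2]]].decode()
```

Note the index only gives random access because each offset is fixed-width; with variable-length offsets you are back to sequential parsing of the index itself, which is Tai-Lin's objection above.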
However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies a component's type only when needed (something like the UTF8 conventions, but that applications MUST use).

so ... I can't quite follow that. The thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. Your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. Can you say why it is that you express a preference for the "convention" with problems?

Thanks,
Mark

_______________________________________________
Ndn-interest mailing list
Ndn-interest at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From jburke at remap.ucla.edu Wed Sep 24 22:48:22 2014
From: jburke at remap.ucla.edu (Burke, Jeff)
Date: Thu, 25 Sep 2014 05:48:22 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <773536A5-3E3F-454E-8352-BB88354FCE5D@parc.com>
Message-ID:

From: > Date: Tue, 23 Sep 2014 09:34:58 +0000 To: Jeff Burke > Cc: >, > Subject: Re: [Ndn-interest] any comments on naming convention?

Ok, yes I think those would all be good things. One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That's easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require.

That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data. We also need to state what the forwarding strategies are and what the cache behavior is. I outlined some of the points that I think are important in that other posting. While "discover latest" is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking.
That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases.

Yeah, I'm not sure whether there is an argument here that excludes and selectors should be used for "discover all" - maybe someone else will have one. But I think that's the purpose of the sync-style protocols that are under development.

Jeff

Marc

On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:

Hi Marc,

Thanks, yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery?

Thanks,
Jeff

From: > Date: Mon, 22 Sep 2014 22:29:43 +0000 To: Jeff Burke > Cc: >, > Subject: Re: [Ndn-interest] any comments on naming convention?

Jeff,

Take a look at my posting (that Felix fixed) in a new thread on Discovery. http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html

I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage.

Marc

On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:

Marc,

If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery?

Jeff

On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:

No matter what the expressiveness of the predicates, if the forwarder can send interests different ways you don't have a consistent underlying set to talk about, so you would always need non-range exclusions to discover every version. Range exclusions only work, I believe, if you get an authoritative answer.
If different content pieces are scattered between different caches I don't see how range exclusions would work to discover every version. I'm sorry to be pointing out problems without offering solutions but we're not ready to publish our discovery protocols. Sent from my telephone On Sep 21, 2014, at 8:50, "Tai-Lin Chu" > wrote: I see. Can you briefly describe how ccnx discovery protocol solves the all problems that you mentioned (not just exclude)? a doc will be better. My unserious conjecture( :) ) : exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular language or context free language might become part of selector too. On Sat, Sep 20, 2014 at 11:25 PM, > wrote: That will get you one reading then you need to exclude it and ask again. Sent from my telephone On Sep 21, 2014, at 8:22, "Tai-Lin Chu" > wrote: Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes. [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude On Sat, Sep 20, 2014 at 10:55 PM, > wrote: On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu > wrote: If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discovery unless you avoid all range exclusions and only exclude explicit versions. Could you explain why missing content object situation happens? also range exclusion is just a shorter notation for many explicit exclude; converting from explicit excludes to ranged exclude is always possible. 
Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes not range excludes if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of exclusions (plus encoding overhead) per day. yes, maybe using a more deterministic version number than a timestamp makes sense here, but its just an example of needing a lot of exclusions. You exclude through 100 then issue a new interest. This goes to cache B I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exist? I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing ?best path? and can round-robin between cache A and cache B, then your application could miss v101. c,d In general I agree that LPM performance is related to the number of components. In my own thread-safe LMP implementation, I used only one RWMutex for the whole tree. I don't know whether adding lock for every node will be faster or not because of lock overhead. However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm is unfair. Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I?m not ready to claim its better yet because we have not done that. On Sat, Sep 20, 2014 at 2:38 PM, > wrote: I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. 
Discovery involves more than just ?latest version? discovery too. This is probably getting off-topic from the original post about naming conventions. a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discovery unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better. b. Yes, if you just want the ?latest version? discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B who only has version 99, so the interest times out or is NACK?d. So you think you have it! But, cache A already has version 101, you just don?t know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there?s not something more recent. I?m sure you?ve walked through cases (a) and (b) in ndn, I?d be interest in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. c. 
Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like. Marc On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu > wrote: I had thought about these questions, but I want to know your idea besides typed component: 1. LPM allows "data discovery". How will exact match do similar things? 2. will removing selectors improve performance? How do we use other faster techniques to replace selectors? 3. fixed byte length and type. I agree more that type can be fixed byte, but 2 bytes for length might not be enough for the future. On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) > wrote: On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu > wrote: I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. Could you share it with us? Sure. Here's a strawman. 
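[Editor's note: Marc's point (c) above - that LPM data matching touches a variable number of PIT entries - can be illustrated with a toy sketch. This is not either implementation; the data structures and names are invented for illustration.]

```python
# A toy illustration of why LPM data matching touches a variable number of
# PIT entries: every prefix of the content object's name may hold pending
# Interests whose selectors still have to be evaluated.

def pit_entries_to_check(pit: dict, name: tuple) -> list:
    """Collect PIT buckets for every prefix of `name` (longest first)."""
    hits = []
    for i in range(len(name), 0, -1):
        prefix = name[:i]
        if prefix in pit:
            hits.append(prefix)   # each hit still needs selector evaluation
    return hits

pit = {("a",): ["interest1"], ("a", "b"): ["interest2"]}
data_name = ("a", "b", "c", "v100")
print(pit_entries_to_check(pit, data_name))  # [('a', 'b'), ('a',)]
```

By contrast, the exact-match scheme Marc describes needs at most three constant probes (name, name + keyid, name + content object hash), independent of name length.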
The type space is 16 bits, so you have 65,536 types. The type space is currently shared with the types used for the entire protocol, which gives us two options: (1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types. We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.) - We reserve some portion of the space for unanticipated uses (say another 1024 types) - We give the rest of the space to application assignment. Make sense? While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design we could design for performance, That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. but I think there will be a turning point when the slower design starts to become "fast enough". Perhaps, perhaps not. 
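[Editor's note: Dave's strawman partition of the 16-bit type space can be sketched as a classifier. The exact boundary values are assumptions for illustration; the strawman only says "say 1024" for the global and reserved ranges.]

```python
# A sketch of the strawman partition of a 16-bit name-component type space:
# one generic default, ~1024 globally understood types, ~1024 reserved,
# and the remainder for application assignment.

GENERIC = 0                        # the single "generic name" default type
GLOBAL_END = 1 + 1024              # base/extension NDN types (chunk#, version#, ...)
RESERVED_END = GLOBAL_END + 1024   # held back for unanticipated uses

def region(t: int) -> str:
    if not 0 <= t <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if t == GENERIC:
        return "generic"
    if t < GLOBAL_END:
        return "global"
    if t < RESERVED_END:
        return "reserved"
    return "application"

print(region(0))        # generic
print(region(17))       # global
print(region(1500))     # reserved
print(region(40000))    # application
```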
Relative performance is what matters so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. Do you think there will be some design of ndn that will *never* have performance improvement? I suspect LPM on data will always be slow (relative to the other functions). I suspect exclusions will always be slow because they will require extra memory references. However I of course don't claim clairvoyance so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references... On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) > wrote: On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu > wrote: We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once ndn apps become popular, a better chip will be designed for ndn. While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: a) clock rates are not getting (much) faster b) memory accesses are getting (relatively) more expensive c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere 2. 
the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right. I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches: 1. we should not define a naming convention at all 2. typed component: use tlv type space and add a handful of types 3. marked component: introduce only one more type and add additional marker space I know how to make #2 flexible enough to do what things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. Also everybody thinks that the current utf8 marker naming convention needs to be revised. On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe > wrote: Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments? I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see. On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. 
If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too. Marc On Sep 18, 2014, at 2:02 PM, > > wrote: Does this make that much difference? If you want to parse the first 5 components. One way to do it is: Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name. OR Start reading name, (find size + move) 5 times. How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case. In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access. Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it. This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index). If you have numbers that show that the index is faster I would like to see under what conditions and architectural assumptions. 
Nacho (I may have misinterpreted your description so feel free to correct me if I'm wrong.) -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/18/14, 12:54 AM, "Massimo Gallo" > wrote: Indeed each component's offset must be encoded using a fixed amount of bytes: i.e., Type = Offsets Length = 10 Bytes Value = Offset1(1byte), Offset2(1byte), ... You may also imagine having an "Offset_2byte" type if your name is too long. Max On 18/09/2014 09:27, Tai-Lin Chu wrote: if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components. I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x offset. On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > wrote: On 17/09/2014 14:56, Mark Stapp wrote: ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. it sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. is that correct? when you say "field separator", what do you mean (since that's not a "TL" from a TLV)? Correct. In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name and other TLV(s) indicate the offset to use in order to retrieve special components. As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that. 
So now, it may be an aesthetic question but: if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components. Max -- Mark On 9/17/14 6:02 AM, Massimo Gallo wrote: The why is simple: You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle few exceptions (segmentation, etc..). You create a rule (specify the component's type) to handle exceptions! I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc in the name... Max On 16/09/2014 20:33, Mark Stapp wrote: On 9/16/14 10:29 AM, Massimo Gallo wrote: I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like UTF8 conventions but that applications MUST use). so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. 
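[Editor's note: Massimo's offset-table idea above can be sketched as follows. The 1-byte cumulative offsets follow his "Offset1(1byte), Offset2(1byte), ..." example; the encoding details beyond that are assumptions for illustration.]

```python
# A rough sketch of the offset-TLV scheme: the name is a flat byte string
# plus a separate fixed-width offset table, so the first x components can be
# sliced out directly instead of iteratively parsing x-1 nested TLVs.

def encode(components: list[bytes]) -> tuple[bytes, bytes]:
    name = b"".join(components)
    offsets, pos = [], 0
    for c in components:
        pos += len(c)
        offsets.append(pos)          # cumulative end offset of each component
    return name, bytes(offsets)      # fails if any offset > 255, hence the
                                     # proposed "Offset_2byte" variant

def first_x(name: bytes, offsets: bytes, x: int) -> bytes:
    # one table read, one slice - no per-component TLV walk
    return name[:offsets[x - 1]]

name, idx = encode([b"mail", b"inbox", b"148"])
print(first_x(name, idx, 2))   # b'mailinbox'
```

This also shows Tai-Lin's counterpoint: the direct access only works because each offset occupies a fixed number of bytes; with variable-length numbers you would be back to parsing x-1 entries.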
can you say why it is that you express a preference for the "convention" with problems? Thanks, Mark . _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Wed Sep 24 23:16:30 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Wed, 24 Sep 2014 23:16:30 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: However, I cannot see whether we can achieve "best-effort *all*-value" efficiently. There are still interesting topics on 1. how do we express the discovery query? 2. is selector "discovery-complete"? i.e. 
can we express any discovery query with current selectors? 3. if so, can we re-express current selectors in a more efficient way? I personally see named data as a set, which can then be categorized into "ordered set" and "unordered set". Some questions that any discovery expression must solve: 1. is this a nil set or not? nil set means that this name is a leaf 2. does the set contain member X? 3. is the set ordered or not 4. (ordered) first, prev, next, last 5. if we enforce component ordering, answer question 4. 6. recursively answer all questions above on any set member On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff wrote: > > > From: > Date: Wed, 24 Sep 2014 16:25:53 +0000 > To: Jeff Burke > Cc: , , > > Subject: Re: [Ndn-interest] any comments on naming convention? > > I think Tai-Lin's example was just fine to talk about discovery. > /blah/blah/value, how do you discover all the "value"s? Discovery shouldn't > care if it's email messages or temperature readings or world cup photos. > > > This is true if discovery means "finding everything" - in which case, as you > point out, sync-style approaches may be best. But I am not sure that this > definition is complete. The most pressing example that I can think of is > best-effort latest-value, in which the consumer's goal is to get the latest > copy the network can deliver at the moment, and may not care about previous > values or (if freshness is used well) potential later versions. > > Another case that seems to work well is video seeking. Let's say I want to > enable random access to a video by timecode. The publisher can provide a > time-code based discovery namespace that's queried using an Interest that > essentially says "give me the closest keyframe to 00:37:03:12", which > returns an interest that, via the name, provides the exact timecode of the > keyframe in question and a link to a segment-based namespace for efficient > exact match playout. 
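[Editor's note: Tai-Lin's six set questions above can be sketched as an abstract interface that any discovery scheme would have to answer. The class and method names are invented for illustration and appear in no specification.]

```python
# A sketch of the "named data as a set" framing: discovery must answer
# leaf-ness, membership, ordering, and navigation queries per name prefix.
from abc import ABC, abstractmethod
from typing import Optional

class DiscoverableSet(ABC):
    @abstractmethod
    def is_leaf(self) -> bool: ...                 # 1. nil set => name is a leaf
    @abstractmethod
    def contains(self, member: str) -> bool: ...   # 2. membership
    @abstractmethod
    def is_ordered(self) -> bool: ...              # 3. ordering
    @abstractmethod
    def first(self) -> Optional[str]: ...          # 4. navigation (ordered only)
    @abstractmethod
    def next(self, member: str) -> Optional[str]: ...

class VersionSet(DiscoverableSet):
    """Toy ordered set of version components under one prefix."""
    def __init__(self, versions): self.v = sorted(versions)
    def is_leaf(self): return not self.v
    def contains(self, m): return m in self.v
    def is_ordered(self): return True
    def first(self): return self.v[0] if self.v else None
    def next(self, m):
        later = [x for x in self.v if x > m]
        return later[0] if later else None

s = VersionSet(["v100", "v099", "v101"])
print(s.first())        # v099
print(s.next("v100"))   # v101
```

Question 6 (recursing into set members) would correspond to asking the same interface of each child prefix.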
In two roundtrips and in a very lightweight way, the > consumer has random access capability. If NDN is the moral equivalent > of IP, then I am not sure we should be afraid of roundtrips that provide > this kind of functionality, just as they are used in TCP. > > > I described one set of problems using the exclusion approach, and that an > NDN paper on device discovery described a similar problem, though they did > not go into the details of splitting interests, etc. That all was simple > enough to see from the example. > > Another question is how does one do the discovery with exact match names, > which is also conflating things. You could do a different discovery with > continuation names too, just not the exclude method. > > As I alluded to, one needs a way to talk with a specific cache about its > "table of contents" for a prefix so one can get a consistent set of results > without all the round-trips of exclusions. Actually downloading the > "headers" of the messages would be the same bytes, more or less. In a way, > this is a little like name enumeration from a ccnx 0.x repo, but that > protocol has its own set of problems and I'm not suggesting to use that > directly. > > One approach is to encode a request in a name component and a participating > cache can reply. It replies in such a way that one could continue talking > with that cache to get its TOC. One would then issue another interest with > a request for not-that-cache. > > > I'm curious how the TOC approach works in a multi-publisher scenario? > > > Another approach is to try to ask the authoritative source for the "current" > manifest name, i.e. /mail/inbox/current/, which could return the > manifest or a link to the manifest. Then fetching the actual manifest from > the link could come from caches because you now have a consistent set of > names to ask for. 
If you cannot talk with an authoritative source, you > could try again without the nonce and see if there's a cached copy of a > recent version around. > > Marc > > > On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: > > > > On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" > wrote: > > On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: > > For example, I see a pattern /mail/inbox/148. I, a human being, see a > pattern with static (/mail/inbox) and variable (148) components; with a > proper naming convention, computers can also detect this pattern > easily. Now I want to look for all mails in my inbox. I can generate a > list of /mail/inbox/. These are my guesses, and with selectors > I can further refine my guesses. > > > I think this is a very bad example (or at least a very bad application > design). You have an app (a mail server / inbox) and you want it to list > your emails? An email list is an application data structure. I don't > think you should use the network structure to reflect this. > > > I think Tai-Lin is trying to sketch a small example, not propose a > full-scale approach to email. (Maybe I am misunderstanding.) > > > Another way to look at it is that if the network architecture is providing > the equivalent of distributed storage to the application, perhaps the > application data structure could be adapted to match the affordances of > the network. Then it would not be so bad that the two structures were > aligned. > > > I'll give you an example: how do you delete emails from your inbox? If an > email was cached in the network it can never be deleted from your inbox? > > > This is conflating two issues - what you are pointing out is that the data > structure of a linear list doesn't handle common email management > operations well. Again, I'm not sure if that's what he was getting at > here. But deletion is not the issue - the availability of a data object > on the network does not necessarily mean it's valid from the perspective > of the application. 
> > Or moved to another mailbox? Do you rely on the emails expiring? > > This problem is true for most (any?) situations where you use network name > structure to directly reflect the application data structure. > > > Not sure I understand how you make the leap from the example to the > general statement. > > Jeff > > > > > Nacho > > > > On Tue, Sep 23, 2014 at 2:34 AM, wrote: > > Ok, yes I think those would all be good things. > > One thing to keep in mind, especially with things like time series > sensor > data, is that people see a pattern and infer a way of doing it. That's > easy > for a human :) But in Discovery, one should assume that one does not > know > of patterns in the data beyond what the protocols used to publish the > data > explicitly require. That said, I think some of the things you listed > are > good places to start: sensor data, web content, climate data or genome > data. > > We also need to state what the forwarding strategies are and what the > cache > behavior is. > > I outlined some of the points that I think are important in that other > posting. While "discover latest" is useful, "discover all" is also > important, and that one gets complicated fast. So points like > separating > discovery from retrieval and working with large data sets have been > important in shaping our thinking. That all said, I'd be happy > starting > from 0 and working through the Discovery service definition from > scratch > along with data set use cases. > > Marc > > On Sep 23, 2014, at 12:36 AM, Burke, Jeff > wrote: > > Hi Marc, > > Thanks - yes, I saw that as well. I was just trying to get one step > more > specific, which was to see if we could identify a few specific use > cases > around which to have the conversation. (e.g., time series sensor data > and > web content retrieval for "get latest"; climate data for huge data > sets; > local data in a vehicular network; etc.) What have you been looking at > that's driving considerations of discovery? 
> > Thanks, > Jeff > > From: > Date: Mon, 22 Sep 2014 22:29:43 +0000 > To: Jeff Burke > Cc: , > Subject: Re: [Ndn-interest] any comments on naming convention? > > Jeff, > > Take a look at my posting (that Felix fixed) in a new thread on > Discovery. > > > http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html > > I think it would be very productive to talk about what Discovery should > do, > and not focus on the how. It is sometimes easy to get caught up in the > how, > which I think is a less important topic than the what at this stage. > > Marc > > On Sep 22, 2014, at 11:04 PM, Burke, Jeff > wrote: > > Marc, > > If you can't talk about your protocols, perhaps we can discuss this > based > on use cases. What are the use cases you are using to evaluate > discovery? > > Jeff > > > > On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" > wrote: > > No matter what the expressiveness of the predicates, if the forwarder > can > send interests different ways you don't have a consistent underlying > set > to talk about, so you would always need non-range exclusions to discover > every version. > > Range exclusions only work, I believe, if you get an authoritative > answer. > If different content pieces are scattered between different caches I > don't see how range exclusions would work to discover every version. > > I'm sorry to be pointing out problems without offering solutions but > we're not ready to publish our discovery protocols. > > Sent from my telephone > > On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: > > I see. Can you briefly describe how the ccnx discovery protocol solves > all the problems that you mentioned (not just exclude)? a doc will be > better. > > My unserious conjecture ( :) ): exclude is equal to [not]. I will soon > expect [and] and [or], so boolean algebra is fully supported. Regular > language or context free language might become part of selector too. 
> > On Sat, Sep 20, 2014 at 11:25 PM, wrote: > That will get you one reading, then you need to exclude it and ask > again. > > Sent from my telephone > > On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: > > Yes, my point was that if you cannot talk about a consistent set > with a particular cache, then you need to always use individual > excludes not range excludes if you want to discover all the versions > of an object. > > > I am very confused. For your example, if I want to get all today's > sensor data, I just do (Any..Last second of last day)(First second of > tomorrow..Any). That's 18 bytes. > > > [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude > > On Sat, Sep 20, 2014 at 10:55 PM, wrote: > > On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: > > If you talk sometimes to A and sometimes to B, you very easily > could miss content objects you want to discover unless you avoid > all range exclusions and only exclude explicit versions. > > > Could you explain why the missing content object situation happens? Also > range exclusion is just a shorter notation for many explicit > excludes; > converting from explicit excludes to ranged excludes is always > possible. > > > Yes, my point was that if you cannot talk about a consistent set > with a particular cache, then you need to always use individual > excludes not range excludes if you want to discover all the versions > of an object. For something like a sensor reading that is updated, > say, once per second you will have 86,400 of them per day. If each > exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of > exclusions (plus encoding overhead) per day. > > Yes, maybe using a more deterministic version number than a > timestamp makes sense here, but it's just an example of needing a lot > of exclusions. > > > You exclude through 100 then issue a new interest. This goes to > cache B > > > I feel this case is invalid because cache A will also get the > interest, and cache A will return v101 if it exists. 
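[Editor's note: Tai-Lin's "that's 18 bytes" figure above appears to assume two range exclusions, each pairing an ANY element with one 8-byte timestamp. A rough byte count under those assumptions; real NDN-TLV excludes add Type/Length overhead on top.]

```python
# Byte count for the two-range exclude "(Any..end-of-yesterday)
# (start-of-tomorrow..Any)", which keeps only today's readings in scope.
ANY_MARKER = 1      # assumed 1-byte ANY element
TIMESTAMP = 8       # one explicit version/timestamp component

def two_range_exclude_bytes() -> int:
    # (ANY .. ts1)(ts2 .. ANY) -> 2 markers + 2 timestamps
    return 2 * ANY_MARKER + 2 * TIMESTAMP

print(two_range_exclude_bytes())   # 18, vs ~691,200 for per-reading excludes
```

Marc's counterpoint is that this compact form only selects a *range*; enumerating every individual version without a consistent cache still degenerates to one explicit exclusion per reading.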
Like you said, > if > this goes to cache B only, it means that cache A dies. How do you > know > that v101 even exists? > > > I guess this depends on what the forwarding strategy is. If the > forwarder will always send each interest to all replicas, then yes, > modulo packet loss, you would discover v101 on cache A. If the > forwarder is just doing "best path" and can round-robin between cache > A and cache B, then your application could miss v101. > > > > c,d In general I agree that LPM performance is related to the number > of components. In my own thread-safe LPM implementation, I used only > one RWMutex for the whole tree. I don't know whether adding a lock for > every node will be faster or not because of lock overhead. > > However, we should compare (exact match + discovery protocol) vs > (ndn > lpm). Comparing performance of exact match to lpm is unfair. > > > Yes, we should compare them. And we need to publish the ccnx 1.0 > specs for doing the exact match discovery. So, as I said, I'm not > ready to claim it's better yet because we have not done that. > > > > > > > On Sat, Sep 20, 2014 at 2:38 PM, wrote: > I would point out that using LPM on content object to Interest > matching to do discovery has its own set of problems. Discovery > involves more than just "latest version" discovery too. > > This is probably getting off-topic from the original post about > naming conventions. > > a. If Interests can be forwarded multiple directions and two > different caches are responding, the exclusion set you build up > talking with cache A will be invalid for cache B. If you talk > sometimes to A and sometimes to B, you very easily could miss > content objects you want to discover unless you avoid all range > exclusions and only exclude explicit versions. That will lead to > very large interest packets. In ccnx 1.0, we believe that an > explicit discovery protocol that allows conversations about > consistent sets is better. > > b. 
Yes, if you just want the "latest version" discovery that > should be transitive between caches, but imagine this. You send > Interest #1 to cache A which returns version 100. You exclude > through 100 then issue a new interest. This goes to cache B who > only has version 99, so the interest times out or is NACK'd. So > you think you have it! But, cache A already has version 101, you > just don't know. If you cannot have a conversation around > consistent sets, it seems like even doing latest version discovery > is difficult with selector based discovery. From what I saw in > ccnx 0.x, one ended up getting an Interest all the way to the > authoritative source because you can never believe an intermediate > cache that there's not something more recent. > > I'm sure you've walked through cases (a) and (b) in ndn, I'd be > interested in seeing your analysis. Case (a) is that a node can > correctly discover every version of a name prefix, and (b) is that > a node can correctly discover the latest version. We have not > formally compared (or yet published) our discovery protocols (we > have three, 2 for content, 1 for device) compared to selector based > discovery, so I cannot yet claim they are better, but they do not > have the non-determinism sketched above. > > c. Using LPM, there is a non-deterministic number of lookups you > must do in the PIT to match a content object. If you have a name > tree or a threaded hash table, those don't all need to be hash > lookups, but you need to walk up the name tree for every prefix of > the content object name and evaluate the selector predicate. > Content Based Networking (CBN) had some methods to create data > structures based on predicates, maybe those would be better. But > in any case, you will potentially need to retrieve many PIT entries > if there is Interest traffic for many prefixes of a root. Even on > an Intel system, you'll likely miss cache lines, so you'll have a > lot of NUMA access for each one. 
In CCNx 1.0, even a naive > implementation only requires at most 3 lookups (one by name, one by > name + keyid, one by name + content object hash), and one can do > other things to optimize lookup for an extra write. > > d. In (c) above, if you have a threaded name tree or are just > walking parent pointers, I suspect you'll need locking of the > ancestors in a multi-threaded system ("threaded" here meaning LWP) > and that will be expensive. It would be interesting to see what a > cache consistent multi-threaded name tree looks like. > > Marc > > > On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu > wrote: > > I had thought about these questions, but I want to know your idea > besides typed component: > 1. LPM allows "data discovery". How will exact match do similar > things? > 2. will removing selectors improve performance? How do we use > other > faster techniques to replace selectors? > 3. fixed byte length and type. I agree more that type can be fixed > byte, but 2 bytes for length might not be enough for the future. > > > On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) > wrote: > > On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu > wrote: > > I know how to make #2 flexible enough to do what things I can > envision we need to do, and with a few simple conventions on > how the registry of types is managed. > > > Could you share it with us? > > Sure. Here's a strawman. > > The type space is 16 bits, so you have 65,536 types. > > The type space is currently shared with the types used for the > entire protocol, which gives us two options: > (1) we reserve a range for name component types. Given the > likelihood there will be at least as much and probably more need > for component types than protocol extensions, we could reserve 1/2 > of the type space, giving us 32K types for name components. 
> (2) since there is no parsing ambiguity between name components > and other fields of the protocol (since they are sub-types of the > name type) we could reuse numbers and thereby have an entire 65K > name component types. > > We divide the type space into regions, and manage it with a > registry. If we ever get to the point of creating an IETF > standard, IANA has 25 years of experience running registries and > there are well-understood rule sets for different kinds of > registries (open, requires a written spec, requires standards > approval). > > - We allocate one "default" name component type for "generic > name", which would be used on name prefixes and other common > cases where there are no special semantics on the name component. > - We allocate a range of name component types, say 1024, to > globally understood types that are part of the base or extension > NDN specifications (e.g. chunk#, version#, etc.) > - We reserve some portion of the space for unanticipated uses > (say another 1024 types) > - We give the rest of the space to application assignment. > > Make sense? > > > While I'm sympathetic to that view, there are three ways in > which Moore's law or hardware tricks will not save us from > performance flaws in the design > > > we could design for performance, > > That's not what people are advocating. We are advocating that we > *not* design for known bad performance and hope serendipity or > Moore's Law will come to the rescue. > > but I think there will be a turning > point when the slower design starts to become "fast enough". > > Perhaps, perhaps not. Relative performance is what matters, so > things that don't get faster while others do tend to get dropped > or not used because they impose a performance penalty relative to > the things that go faster. There is also the "low-end" phenomenon > where improvements in technology get applied to lowering cost > rather than improving performance. 
For those environments bad > performance just never gets better. > > Do you > think there will be some design of ndn that will *never* have > performance improvement? > > I suspect LPM on data will always be slow (relative to the other > functions). > I suspect exclusions will always be slow because they will > require extra memory references. > > However I of course don't claim clairvoyance, so this is just > speculation based on 35+ years of seeing performance improve by 4 > orders of magnitude and still having to worry about counting > cycles and memory references… > > On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) > wrote: > > On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu > wrote: > > We should not look at a certain chip nowadays and want ndn to > perform > well on it. It should be the other way around: once ndn app > becomes > popular, a better chip will be designed for ndn. > > While I'm sympathetic to that view, there are three ways in > which Moore's law or hardware tricks will not save us from > performance flaws in the design: > a) clock rates are not getting (much) faster > b) memory accesses are getting (relatively) more expensive > c) data structures that require locks to manipulate > successfully will be relatively more expensive, even with > near-zero lock contention. > > The fact is, IP *did* have some serious performance flaws in > its design. We just forgot those because the design elements > that depended on those mistakes have fallen into disuse. The > poster children for this are: > 1. IP options. Nobody can use them because they are too slow > on modern forwarding hardware, so they can't be reliably used > anywhere > 2. the UDP checksum, which was a bad design when it was > specified and is now a giant PITA that still causes major pain > in working around. > > I'm afraid students today are being taught that the designers > of IP were flawless, as opposed to very good scientists and > engineers that got most of it right. 
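[Editor's note: for concreteness, the 16-bit type-space partition sketched in the strawman earlier in this message might look like the following. The boundary values (one generic type, ~1024 well-known, ~1024 reserved, remainder to applications) come from the proposal itself; the exact range edges and the `classify` helper are illustrative, not a spec.]

```python
# Sketch of the strawman's 16-bit name-component type registry.
# Range boundaries are illustrative; the proposal says "say 1024".

GENERIC = 0                       # one "default" generic name component type
WELL_KNOWN = range(1, 1025)       # chunk#, version#, ... (base/extension specs)
RESERVED = range(1025, 2049)      # held back for unanticipated uses
APPLICATION = range(2049, 65536)  # the rest goes to application assignment

def classify(t):
    """Map a 16-bit type code to its registry region."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("name component types are 16 bits")
    if t == GENERIC:
        return "generic"
    if t in WELL_KNOWN:
        return "well-known"
    if t in RESERVED:
        return "reserved"
    return "application"
```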
> > I feel the discussion today and yesterday has been off-topic. > Now I > see that there are 3 approaches: > 1. we should not define a naming convention at all > 2. typed component: use tlv type space and add a handful of > types > 3. marked component: introduce only one more type and add > additional > marker space > > I know how to make #2 flexible enough to do what things I can > envision we need to do, and with a few simple conventions on > how the registry of types is managed. > > It is just as powerful in practice as either throwing up our > hands and letting applications design their own mutually > incompatible schemes or trying to make naming conventions with > markers in a way that is fast to generate/parse and also > resilient against aliasing. > > Also everybody thinks that the current utf8 marker naming > convention > needs to be revised. > > > > On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe > wrote: > Would that chip be suitable, i.e. can we expect most names > to fit in (the > magnitude of) 96 bytes? What length are names usually in > current NDN > experiments? > > I guess wide deployment could make for even longer names. > Related: Many URLs > I encounter nowadays easily don't fit within two 80-column > text lines, and > NDN will have to carry more information than URLs, as far as > I see. > > > On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: > > In fact, the index in a separate TLV will be slower on some > architectures, > like the ezChip NP4. The NP4 can hold the first 96 frame > bytes in memory, > then any subsequent memory is accessed only as two adjacent > 32-byte blocks > (there can be at most 5 blocks available at any one time). > If you need to > switch between arrays, it would be very expensive. If you > have to read past > the name to get to the 2nd array, then read it, then back up > to get to the > name, it will be pretty expensive too. > > Marc > > On Sep 18, 2014, at 2:02 PM, > wrote: > > Does this make that much difference? 
> > If you want to parse the first 5 components, one way to do > it is: > > Read the index, find entry 5, then read in that many bytes > from the start > offset of the beginning of the name. > OR > Start reading the name, (find size + move) 5 times. > > How much speed are you getting from one to the other? You > seem to imply > that the first one is faster. I don't think this is the > case. > > In the first one you'll probably have to get the cache line > for the index, > then all the required cache lines for the first 5 > components. For the > second, you'll have to get all the cache lines for the first > 5 components. > Given an assumption that a cache miss is way more expensive > than > evaluating a number and computing an addition, you might > find that the > performance of the index is actually slower than the > performance of the > direct access. > > Granted, there is a case where you don't access the name at > all, for > example, if you just get the offsets and then send the > offsets as > parameters to another processor/GPU/NPU/etc. In this case > you may see a > gain IF there are more cache line misses in reading the name > than in > reading the index. So, if the regular part of the name > that you're > parsing is bigger than the cache line (64 bytes?) and the > name is to be > processed by a different processor, then you might see some > performance > gain in using the index, but in all other circumstances I > bet this is not > the case. I may be wrong, haven't actually tested it. > > This is all to say, I don't think we should be designing the > protocol with > only one architecture in mind. (The architecture of sending > the name to a > different processor than the index.) > > If you have numbers that show that the index is faster I > would like to see > under what conditions and architectural assumptions. > > Nacho > > (I may have misinterpreted your description so feel free to > correct me if > I'm wrong.) 
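[Editor's note: the two access patterns compared above can be sketched in a few lines. This uses a toy layout (1-byte type and length fields, invented component type 8), not the actual CCNx or NDN wire encoding; it only illustrates "index jump" versus "size + move" parsing of the first x components.]

```python
# Toy flat TLV name: [type][len][value] per component, 1-byte T and L.
# Real TLV uses variable-length numbers, which is exactly why the
# trade-off discussed above exists.

def encode_name(components):
    """Encode components as a flat [type][len][value] sequence."""
    out = bytearray()
    for c in components:
        out += bytes([8, len(c)]) + c   # 8 = hypothetical 'generic' type
    return bytes(out)

def parse_first_x_sequential(buf, x):
    """Walk the TLVs one by one: x (find size + move) steps."""
    comps, pos = [], 0
    for _ in range(x):
        l = buf[pos + 1]
        comps.append(buf[pos + 2:pos + 2 + l])
        pos += 2 + l
    return comps

def build_offset_index(buf):
    """Side table of component start offsets (the 'offset TLV' idea)."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 2 + buf[pos + 1]
    return offsets

def parse_first_x_indexed(buf, index, x):
    """Jump straight to each component via the precomputed index."""
    return [buf[o + 2:o + 2 + buf[o + 1]] for o in index[:x]]

name = encode_name([b"mail", b"inbox", b"148"])
index = build_offset_index(name)
```

Both paths yield the same components; which is faster depends on cache-line behavior, as the message argues.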
> > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > On 9/18/14, 12:54 AM, "Massimo Gallo" > > wrote: > > Indeed each component's offset must be encoded using a fixed > amount of > bytes: > > i.e., > Type = Offsets > Length = 10 Bytes > Value = Offset1(1byte), Offset2(1byte), ... > > You may also imagine to have an "Offset_2byte" type if your > name is too > long. > > Max > > On 18/09/2014 09:27, Tai-Lin Chu wrote: > > if you do not need the entire hierarchical structure (suppose > you only > want the first x components) you can directly have it using > the > offsets. With the Nested TLV structure you have to > iteratively parse > the first x-1 components. With the offset structure you can > directly > access the first x components. > > I don't get it. What you described only works if the > "offset" is > encoded in fixed bytes. With varNum, you will still need to > parse x-1 > offsets to get to the x offset. > > > > On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo > wrote: > > On 17/09/2014 14:56, Mark Stapp wrote: > > ah, thanks - that's helpful. I thought you were saying "I > like the > existing NDN UTF8 'convention'." I'm still not sure I > understand what > you > _do_ prefer, though. it sounds like you're describing an > entirely > different > scheme where the info that describes the name-components is > ... > someplace > other than _in_ the name-components. is that correct? when > you say > "field > separator", what do you mean (since that's not a "TL" from a > TLV)? > > Correct. > In particular, with our name encoding, a TLV indicates the > name > hierarchy > with offsets in the name and other TLV(s) indicates the > offset to use > in > order to retrieve special components. > As for the field separator, it is something like "/". 
> Aliasing is > avoided, as > you do not rely on field separators to parse the name; you > use the > "offset > TLV" to do that. > > So now, it may be an aesthetic question but: > > if you do not need the entire hierarchical structure (suppose > you only > want > the first x components) you can directly have it using the > offsets. > With the > Nested TLV structure you have to iteratively parse the first > x-1 > components. > With the offset structure you can directly access the > first x > components. > > Max > > > -- Mark > > On 9/17/14 6:02 AM, Massimo Gallo wrote: > > The why is simple: > > You use a lot of "generic component type" and very few > "specific > component type". You are imposing types for every component > in order > to > handle a few exceptions (segmentation, etc..). You create a > rule > (specify > the component's type) to handle exceptions! > > I would prefer not to have typed components. Instead I would > prefer > to > have the name as a simple sequence of bytes with a field > separator. Then, > outside the name, if you have some components that could be > used at the > network layer (e.g. a TLV field), you simply need something > that > indicates which is the offset allowing you to retrieve the > version, > segment, etc. in the name... > > > Max > > > > > > On 16/09/2014 20:33, Mark Stapp wrote: > > On 9/16/14 10:29 AM, Massimo Gallo wrote: > > I think we agree on the small number of "component types". > However, if you have a small number of types, you will end > up with > names > containing many generic component types and few specific > component > types. Due to the fact that the component type specification > is an > exception in the name, I would prefer something that specifies the > component's > type only when needed (something like UTF8 conventions but > that > applications MUST use). > > so ... I can't quite follow that. the thread has had some > explanation > about why the UTF8 requirement has problems (with aliasing, > e.g.) 
> and > there's been email trying to explain that applications don't > have to > use types if they don't need to. your email sounds like "I > prefer > the > UTF8 convention", but it doesn't say why you have that > preference in > the face of the points about the problems. can you say why > it is > that > you express a preference for the "convention" with problems? > > Thanks, > Mark > > . > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > From jburke at remap.ucla.edu Wed Sep 24 23:18:34 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Thu, 25 Sep 2014 06:18:34 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: Message-ID: http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html J. On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >However, I cannot see whether we can achieve "best-effort *all*-value" >efficiently. >There are still interesting topics on >1. how do we express the discovery query? >2. is selector "discovery-complete"? i.e. can we express any >discovery query with current selector? >3. if so, can we re-express current selector in a more efficient way? > >I personally see named data as a set, which can then be categorized >into "ordered set", and "unordered set". >some questions that any discovery expression must solve: >1. is this a nil set or not? a nil set means that this name is a leaf >2. set contains member X? >3. is set ordered or not >4. (ordered) first, prev, next, last >5. if we enforce component ordering, answer question 4. >6. recursively answer all questions above on any set member > > > >On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >wrote: >> >> >> From: >> Date: Wed, 24 Sep 2014 16:25:53 +0000 >> To: Jeff Burke >> Cc: , , >> >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> I think Tai-Lin's example was just fine to talk about discovery. >> /blah/blah/value, how do you discover all the "value"s? Discovery >>shouldn't >> care if it's email messages or temperature readings or world cup photos. 
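[Editor's note: Tai-Lin's set-based framing of discovery queries above can be made concrete with a toy model. The nested-dict namespace and the names in it (/mail/inbox/14x, reused from later in the thread) are invented for illustration; the point is just that each of his queries maps to a cheap set operation once a cache exposes a prefix's children.]

```python
# Toy model: a name prefix denotes a set of child components; a leaf
# has an empty child set. Names here are illustrative only.

inbox = {"148": {}, "149": {}, "150": {}}  # children of /mail/inbox

def is_nil(node):
    """Query 1: nil set means this name is a leaf."""
    return len(node) == 0

def contains(node, x):
    """Query 2: set membership."""
    return x in node

def ordered_members(node):
    """Queries 3-5: impose a component ordering to get first/next/last."""
    return sorted(node)

def next_after(node, x):
    """Query 4: 'next' under the imposed ordering, or None at the end."""
    members = ordered_members(node)
    i = members.index(x)
    return members[i + 1] if i + 1 < len(members) else None

first = ordered_members(inbox)[0]
last = ordered_members(inbox)[-1]
```

Query 6 (recursion) would apply the same operations to each child node.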
>> >> >> This is true if discovery means "finding everything" - in which case, >>as you >> point out, sync-style approaches may be best. But I am not sure that >>this >> definition is complete. The most pressing example that I can think of >>is >> best-effort latest-value, in which the consumer's goal is to get the >>latest >> copy the network can deliver at the moment, and may not care about >>previous >> values or (if freshness is used well) potential later versions. >> >> Another case that seems to work well is video seeking. Let's say I >>want to >> enable random access to a video by timecode. The publisher can provide a >> time-code based discovery namespace that's queried using an Interest >>that >> essentially says "give me the closest keyframe to 00:37:03:12", which >> returns a data object that, via the name, provides the exact timecode of >>the >> keyframe in question and a link to a segment-based namespace for >>efficient >> exact match playout. In two roundtrips and in a very lightweight way, >>the >> consumer has random access capability. If NDN is the moral >>equivalent >> of IP, then I am not sure we should be afraid of roundtrips that provide >> this kind of functionality, just as they are used in TCP. >> >> >> I described one set of problems using the exclusion approach, and that >>an >> NDN paper on device discovery described a similar problem, though they >>did >> not go into the details of splitting interests, etc. That all was >>simple >> enough to see from the example. >> >> Another question is how does one do the discovery with exact match >>names, >> which is also conflating things. You could do a different discovery >>with >> continuation names too, just not the exclude method. >> >> As I alluded to, one needs a way to talk with a specific cache about its >> "table of contents" for a prefix so one can get a consistent set of >>results >> without all the round-trips of exclusions. Actually downloading the >> "headers" 
of the messages would be the same bytes, more or less. In a >>way, >> this is a little like name enumeration from a ccnx 0.x repo, but that >> protocol has its own set of problems and I'm not suggesting to use that >> directly. >> >> One approach is to encode a request in a name component and a >>participating >> cache can reply. It replies in such a way that one could continue >>talking >> with that cache to get its TOC. One would then issue another interest >>with >> a request for not-that-cache. >> >> >> I'm curious how the TOC approach works in a multi-publisher scenario? >> >> >> Another approach is to try to ask the authoritative source for the >>"current" >> manifest name, i.e. /mail/inbox/current/, which could return the >> manifest or a link to the manifest. Then fetching the actual manifest >>from >> the link could come from caches because you now have a consistent set of >> names to ask for. If you cannot talk with an authoritative source, you >> could try again without the nonce and see if there's a cached copy of a >> recent version around. >> >> Marc >> >> >> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: >> >> >> >> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >> wrote: >> >> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >> >> For example, I see a pattern /mail/inbox/148. I, a human being, see a >> pattern with static (/mail/inbox) and variable (148) components; with >> proper naming convention, computers can also detect this pattern >> easily. Now I want to look for all mails in my inbox. I can generate a >> list of /mail/inbox/. These are my guesses, and with selectors >> I can further refine my guesses. >> >> >> I think this is a very bad example (or at least a very bad application >> design). You have an app (a mail server / inbox) and you want it to >>list >> your emails? An email list is an application data structure. I don't >> think you should use the network structure to reflect this. 
>> >> >> I think Tai-Lin is trying to sketch a small example, not propose a >> full-scale approach to email. (Maybe I am misunderstanding.) >> >> >> Another way to look at it is that if the network architecture is >>providing >> the equivalent of distributed storage to the application, perhaps the >> application data structure could be adapted to match the affordances of >> the network. Then it would not be so bad that the two structures were >> aligned. >> >> >> I'll give you an example: how do you delete emails from your inbox? If >>an >> email was cached in the network it can never be deleted from your inbox? >> >> >> This is conflating two issues - what you are pointing out is that the >>data >> structure of a linear list doesn't handle common email management >> operations well. Again, I'm not sure if that's what he was getting at >> here. But deletion is not the issue - the availability of a data object >> on the network does not necessarily mean it's valid from the perspective >> of the application. >> >> Or moved to another mailbox? Do you rely on the emails expiring? >> >> This problem is true for most (any?) situations where you use network >>name >> structure to directly reflect the application data structure. >> >> >> Not sure I understand how you make the leap from the example to the >> general statement. >> >> Jeff >> >> >> >> >> Nacho >> >> >> >> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >> >> Ok, yes I think those would all be good things. >> >> One thing to keep in mind, especially with things like time series >> sensor >> data, is that people see a pattern and infer a way of doing it. That's >> easy >> for a human :) But in Discovery, one should assume that one does not >> know >> of patterns in the data beyond what the protocols used to publish the >> data >> explicitly require. That said, I think some of the things you listed >> are >> good places to start: sensor data, web content, climate data or genome >> data. 
>> >> We also need to state what the forwarding strategies are and what the >> cache >> behavior is. >> >> I outlined some of the points that I think are important in that other >> posting. While "discover latest" is useful, "discover all" is also >> important, and that one gets complicated fast. So points like >> separating >> discovery from retrieval and working with large data sets have been >> important in shaping our thinking. That all said, I'd be happy >> starting >> from 0 and working through the Discovery service definition from >> scratch >> along with data set use cases. >> >> Marc >> >> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >> wrote: >> >> Hi Marc, >> >> Thanks - yes, I saw that as well. I was just trying to get one step >> more >> specific, which was to see if we could identify a few specific use >> cases >> around which to have the conversation. (e.g., time series sensor data >> and >> web content retrieval for "get latest"; climate data for huge data >> sets; >> local data in a vehicular network; etc.) What have you been looking at >> that's driving considerations of discovery? >> >> Thanks, >> Jeff >> >> From: >> Date: Mon, 22 Sep 2014 22:29:43 +0000 >> To: Jeff Burke >> Cc: , >> Subject: Re: [Ndn-interest] any comments on naming convention? >> >> Jeff, >> >> Take a look at my posting (that Felix fixed) in a new thread on >> Discovery. >> >> >> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >> >> I think it would be very productive to talk about what Discovery should >> do, >> and not focus on the how. It is sometimes easy to get caught up in the >> how, >> which I think is a less important topic than the what at this stage. >> >> Marc >> >> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >> wrote: >> >> Marc, >> >> If you can't talk about your protocols, perhaps we can discuss this >> based >> on use cases. What are the use cases you are using to evaluate >> discovery? 
>> >> Jeff >> >> >> >> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >> wrote: >> >> No matter what the expressiveness of the predicates if the forwarder >> can >> send interests different ways you don't have a consistent underlying >> set >> to talk about so you would always need non-range exclusions to discover >> every version. >> >> Range exclusions only work I believe if you get an authoritative >> answer. >> If different content pieces are scattered between different caches I >> don't see how range exclusions would work to discover every version. >> >> I'm sorry to be pointing out problems without offering solutions but >> we're not ready to publish our discovery protocols. >> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >> >> I see. Can you briefly describe how ccnx discovery protocol solves the >> all problems that you mentioned (not just exclude)? a doc will be >> better. >> >> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >> expect [and] and [or], so boolean algebra is fully supported. Regular >> language or context free language might become part of selector too. >> >> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >> That will get you one reading then you need to exclude it and ask >> again. >> >> Sent from my telephone >> >> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >> >> Yes, my point was that if you cannot talk about a consistent set >> with a particular cache, then you need to always use individual >> excludes not range excludes if you want to discover all the versions >> of an object. >> >> >> I am very confused. For your example, if I want to get all today's >> sensor data, I just do (Any..Last second of last day)(First second of >> tomorrow..Any). That's 18 bytes. 
>> >> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >> >> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >> >> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >> >> If you talk sometimes to A and sometimes to B, you very easily >> could miss content objects you want to discover unless you avoid >> all range exclusions and only exclude explicit versions. >> >> >> Could you explain why the missing content object situation happens? also >> range exclusion is just a shorter notation for many explicit >> excludes; >> converting from explicit excludes to a ranged exclude is always >> possible. >> >> >> Yes, my point was that if you cannot talk about a consistent set >> with a particular cache, then you need to always use individual >> excludes, not range excludes, if you want to discover all the versions >> of an object. For something like a sensor reading that is updated, >> say, once per second you will have 86,400 of them per day. If each >> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >> exclusions (plus encoding overhead) per day. >> >> yes, maybe using a more deterministic version number than a >> timestamp makes sense here, but it's just an example of needing a lot >> of exclusions. >> >> >> You exclude through 100 then issue a new interest. This goes to >> cache B >> >> >> I feel this case is invalid because cache A will also get the >> interest, and cache A will return v101 if it exists. Like you said, >> if >> this goes to cache B only, it means that cache A dies. How do you >> know >> that v101 even exists? >> >> >> I guess this depends on what the forwarding strategy is. If the >> forwarder will always send each interest to all replicas, then yes, >> modulo packet loss, you would discover v101 on cache A. If the >> forwarder is just doing "best path" and can round-robin between cache >> A and cache B, then your application could miss v101. >> >> >> >> c,d In general I agree that LPM performance is related to the number >> of components. 
In my own thread-safe LPM implementation, I used only >> one RWMutex for the whole tree. I don't know whether adding a lock for >> every node will be faster or not because of lock overhead. >> >> However, we should compare (exact match + discovery protocol) vs >> (ndn >> lpm). Comparing performance of exact match to lpm is unfair. >> >> >> Yes, we should compare them. And we need to publish the ccnx 1.0 >> specs for doing the exact match discovery. So, as I said, I'm not >> ready to claim it's better yet because we have not done that. >> >> >> >> >> >> >> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >> I would point out that using LPM on content object to Interest >> matching to do discovery has its own set of problems. Discovery >> involves more than just "latest version" discovery too. >> >> This is probably getting off-topic from the original post about >> naming conventions. >> >> a. If Interests can be forwarded multiple directions and two >> different caches are responding, the exclusion set you build up >> talking with cache A will be invalid for cache B. If you talk >> sometimes to A and sometimes to B, you very easily could miss >> content objects you want to discover unless you avoid all range >> exclusions and only exclude explicit versions. That will lead to >> very large interest packets. In ccnx 1.0, we believe that an >> explicit discovery protocol that allows conversations about >> consistent sets is better. >> >> b. Yes, if you just want the "latest version" discovery that >> should be transitive between caches, but imagine this. You send >> Interest #1 to cache A which returns version 100. You exclude >> through 100 then issue a new interest. This goes to cache B who >> only has version 99, so the interest times out or is NACK'd. So >> you think you have it! But, cache A already has version 101, you >> just don't know. 
If you cannot have a conversation around >> consistent sets, it seems like even doing latest version discovery >> is difficult with selector based discovery. From what I saw in >> ccnx 0.x, one ended up getting an Interest all the way to the >> authoritative source because you can never believe an intermediate >> cache that there's not something more recent. >> >> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >> interested in seeing your analysis. Case (a) is that a node can >> correctly discover every version of a name prefix, and (b) is that >> a node can correctly discover the latest version. We have not >> formally compared (or yet published) our discovery protocols (we >> have three, 2 for content, 1 for device) against selector based >> discovery, so I cannot yet claim they are better, but they do not >> have the non-determinism sketched above. >> >> c. Using LPM, there is a non-deterministic number of lookups you >> must do in the PIT to match a content object. If you have a name >> tree or a threaded hash table, those don't all need to be hash >> lookups, but you need to walk up the name tree for every prefix of >> the content object name and evaluate the selector predicate. >> Content Based Networking (CBN) had some methods to create data >> structures based on predicates, maybe those would be better. But >> in any case, you will potentially need to retrieve many PIT entries >> if there is Interest traffic for many prefixes of a root. Even on >> an Intel system, you'll likely miss cache lines, so you'll have a >> lot of NUMA access for each one. In CCNx 1.0, even a naive >> implementation only requires at most 3 lookups (one by name, one by >> name + keyid, one by name + content object hash), and one can do >> other things to optimize lookup for an extra write. >> >> d. 
In (c) above, if you have a threaded name tree or are just >> walking parent pointers, I suspect you'll need locking of the >> ancestors in a multi-threaded system ("threaded" here meaning LWP) >> and that will be expensive. It would be interesting to see what a >> cache consistent multi-threaded name tree looks like. >> >> Marc >> >> >> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >> wrote: >> >> I had thought about these questions, but I want to know your idea >> besides typed component: >> 1. LPM allows "data discovery". How will exact match do similar >> things? >> 2. will removing selectors improve performance? How do we use >> other >> faster technique to replace selector? >> 3. fixed byte length and type. I agree more that type can be fixed >> byte, but 2 bytes for length might not be enough for future. >> >> >> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >> wrote: >> >> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >> wrote: >> >> I know how to make #2 flexible enough to do what things I can >> envision we need to do, and with a few simple conventions on >> how the registry of types is managed. >> >> >> Could you share it with us? >> >> Sure. Here's a strawman. >> >> The type space is 16 bits, so you have 65,536 types. >> >> The type space is currently shared with the types used for the >> entire protocol, that gives us two options: >> (1) we reserve a range for name component types. Given the >> likelihood there will be at least as much and probably more need >> for component types than protocol extensions, we could reserve 1/2 >> of the type space, giving us 32K types for name components. >> (2) since there is no parsing ambiguity between name components >> and other fields of the protocol (since they are sub-types of the >> name type) we could reuse numbers and thereby have an entire 65K >> name component types. >> >> We divide the type space into regions, and manage it with a >> registry. 
If we ever get to the point of creating an IETF >> standard, IANA has 25 years of experience running registries and >> there are well-understood rule sets for different kinds of >> registries (open, requires a written spec, requires standards >> approval). >> >> - We allocate one "default" name component type for "generic >> name", which would be used on name prefixes and other common >> cases where there are no special semantics on the name component. >> - We allocate a range of name component types, say 1024, to >> globally understood types that are part of the base or extension >> NDN specifications (e.g. chunk#, version#, etc.). >> - We reserve some portion of the space for unanticipated uses >> (say another 1024 types) >> - We give the rest of the space to application assignment. >> >> Make sense? >> >> >> While I'm sympathetic to that view, there are three ways in >> which Moore's law or hardware tricks will not save us from >> performance flaws in the design >> >> >> we could design for performance, >> >> That's not what people are advocating. We are advocating that we >> *not* design for known bad performance and hope serendipity or >> Moore's Law will come to the rescue. >> >> but I think there will be a turning >> point when the slower design starts to become "fast enough". >> >> Perhaps, perhaps not. Relative performance is what matters, so >> things that don't get faster while others do tend to get dropped >> or not used because they impose a performance penalty relative to >> the things that go faster. There is also the "low-end" phenomenon >> where improvements in technology get applied to lowering cost >> rather than improving performance. For those environments bad >> performance just never gets better. >> >> Do you >> think there will be some design of ndn that will *never* have >> performance improvement? >> >> I suspect LPM on data will always be slow (relative to the other >> functions). 
>> I suspect exclusions will always be slow because they will >> require extra memory references. >> >> However I of course don't claim clairvoyance, so this is just >> speculation based on 35+ years of seeing performance improve by 4 >> orders of magnitude and still having to worry about counting >> cycles and memory references... >> >> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >> wrote: >> >> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >> wrote: >> >> We should not look at a certain chip nowadays and want ndn to >> perform >> well on it. It should be the other way around: once an ndn app >> becomes >> popular, a better chip will be designed for ndn. >> >> While I'm sympathetic to that view, there are three ways in >> which Moore's law or hardware tricks will not save us from >> performance flaws in the design: >> a) clock rates are not getting (much) faster >> b) memory accesses are getting (relatively) more expensive >> c) data structures that require locks to manipulate >> successfully will be relatively more expensive, even with >> near-zero lock contention. >> >> The fact is, IP *did* have some serious performance flaws in >> its design. We just forgot those because the design elements >> that depended on those mistakes have fallen into disuse. The >> poster children for this are: >> 1. IP options. Nobody can use them because they are too slow >> on modern forwarding hardware, so they can't be reliably used >> anywhere >> 2. the UDP checksum, which was a bad design when it was >> specified and is now a giant PITA that still causes major pain >> in working around. >> >> I'm afraid students today are being taught that the designers >> of IP were flawless, as opposed to very good scientists and >> engineers that got most of it right. >> >> I feel the discussion today and yesterday has been off-topic. >> Now I >> see that there are 3 approaches: >> 1. we should not define a naming convention at all >> 2. 
typed component: use tlv type space and add a handful of >> types >> 3. marked component: introduce only one more type and add >> additional >> marker space >> >> I know how to make #2 flexible enough to do the things I can >> envision we need to do, and with a few simple conventions on >> how the registry of types is managed. >> >> It is just as powerful in practice as either throwing up our >> hands and letting applications design their own mutually >> incompatible schemes or trying to make naming conventions with >> markers in a way that is fast to generate/parse and also >> resilient against aliasing. >> >> Also everybody thinks that the current utf8 marker naming >> convention >> needs to be revised. >> >> >> >> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >> wrote: >> Would that chip be suitable, i.e. can we expect most names >> to fit in (the >> magnitude of) 96 bytes? What length are names usually in >> current NDN >> experiments? >> >> I guess wide deployment could make for even longer names. >> Related: Many URLs >> I encounter nowadays easily don't fit within two 80-column >> text lines, and >> NDN will have to carry more information than URLs, as far as >> I see. >> >> >> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >> >> In fact, the index in a separate TLV will be slower on some >> architectures, >> like the ezChip NP4. The NP4 can hold the first 96 frame >> bytes in memory, >> then any subsequent memory is accessed only as two adjacent >> 32-byte blocks >> (there can be at most 5 blocks available at any one time). >> If you need to >> switch between arrays, it would be very expensive. If you >> have to read past >> the name to get to the 2nd array, then read it, then back up >> to get to the >> name, it will be pretty expensive too. >> >> Marc >> >> On Sep 18, 2014, at 2:02 PM, >> wrote: >> >> Does this make that much difference? >> >> If you want to parse the first 5 components. 
One way to do >> it is: >> >> Read the index, find entry 5, then read in that many bytes >> from the start >> offset of the beginning of the name. >> OR >> Start reading the name, (find size + move) 5 times. >> >> How much speed are you getting from one to the other? You >> seem to imply >> that the first one is faster. I don't think this is the >> case. >> >> In the first one you'll probably have to get the cache line >> for the index, >> then all the required cache lines for the first 5 >> components. For the >> second, you'll have to get all the cache lines for the first >> 5 components. >> Given an assumption that a cache miss is way more expensive >> than >> evaluating a number and computing an addition, you might >> find that the >> performance of the index is actually slower than the >> performance of the >> direct access. >> >> Granted, there is a case where you don't access the name at >> all, for >> example, if you just get the offsets and then send the >> offsets as >> parameters to another processor/GPU/NPU/etc. In this case >> you may see a >> gain IF there are more cache line misses in reading the name >> than in >> reading the index. So, if the regular part of the name >> that you're >> parsing is bigger than the cache line (64 bytes?) and the >> name is to be >> processed by a different processor, then you might see some >> performance >> gain in using the index, but in all other circumstances I >> bet this is not >> the case. I may be wrong, haven't actually tested it. >> >> This is all to say, I don't think we should be designing the >> protocol with >> only one architecture in mind. (The architecture of sending >> the name to a >> different processor than the index.) >> >> If you have numbers that show that the index is faster I >> would like to see >> under what conditions and architectural assumptions. >> >> Nacho >> >> (I may have misinterpreted your description so feel free to >> correct me if >> I'm wrong.) 
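The two parsing strategies compared above can be sketched in a few lines. This is a toy model only: it assumes single-byte type and length fields and a precomputed list of component start offsets, and is not the actual NDN or CCNx wire encoding.

```python
# Toy model: a name as back-to-back TLV components (1-byte T, 1-byte L
# assumed) versus a precomputed offset index into the same buffer.

def parse_first_x_sequential(buf: bytes, x: int) -> list:
    """Nested-TLV walk: read T and L, skip over V, repeated x times."""
    comps, pos = [], 0
    for _ in range(x):
        length = buf[pos + 1]               # buf[pos] is the type octet
        comps.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return comps

def parse_first_x_indexed(index: list, buf: bytes, x: int) -> list:
    """Offset index: jump straight to each component, no walking."""
    out = []
    for i in range(x):
        start = index[i]                    # byte offset of component i's TLV
        length = buf[start + 1]
        out.append(buf[start + 2 : start + 2 + length])
    return out

# A 3-component name, roughly /a/bb/ccc, with an assumed type code 0x08
name = b"\x08\x01a" + b"\x08\x02bb" + b"\x08\x03ccc"
index = [0, 3, 7]   # fixed-width offsets, as in Massimo's Offsets TLV
assert parse_first_x_sequential(name, 2) == parse_first_x_indexed(index, name, 2)
```

Both variants touch the same component bytes; the difference argued over in the thread is which cache lines each strategy pulls in, which a sketch like this cannot show.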
>> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> On 9/18/14, 12:54 AM, "Massimo Gallo" >> >> wrote: >> >> Indeed each component's offset must be encoded using a fixed >> amount of >> bytes: >> >> i.e., >> Type = Offsets >> Length = 10 Bytes >> Value = Offset1(1byte), Offset2(1byte), ... >> >> You may also imagine having an "Offset_2byte" type if your >> name is too >> long. >> >> Max >> >> On 18/09/2014 09:27, Tai-Lin Chu wrote: >> >> if you do not need the entire hierarchical structure (suppose >> you only >> want the first x components) you can directly have it using >> the >> offsets. With the Nested TLV structure you have to >> iteratively parse >> the first x-1 components. With the offset structure you can >> directly >> access the first x components. >> >> I don't get it. What you described only works if the >> "offset" is >> encoded in fixed bytes. With varNum, you will still need to >> parse x-1 >> offsets to get to the x-th offset. >> >> >> >> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >> wrote: >> >> On 17/09/2014 14:56, Mark Stapp wrote: >> >> ah, thanks - that's helpful. I thought you were saying "I >> like the >> existing NDN UTF8 'convention'." I'm still not sure I >> understand what >> you >> _do_ prefer, though. it sounds like you're describing an >> entirely >> different >> scheme where the info that describes the name-components is >> ... >> someplace >> other than _in_ the name-components. is that correct? when >> you say >> "field >> separator", what do you mean (since that's not a "TL" from a >> TLV)? >> >> Correct. >> In particular, with our name encoding, a TLV indicates the >> name >> hierarchy >> with offsets in the name and other TLV(s) indicate the >> offset to use >> in >> order to retrieve special components. >> As for the field separator, it is something like "/". 
>> Aliasing is >> avoided as >> you do not rely on field separators to parse the name; you >> use the >> "offset >> TLV" to do that. >> >> So now, it may be an aesthetic question but: >> >> if you do not need the entire hierarchical structure (suppose >> you only >> want >> the first x components) you can directly have it using the >> offsets. >> With the >> Nested TLV structure you have to iteratively parse the first >> x-1 >> components. >> With the offset structure you can directly access the >> first x >> components. >> >> Max >> >> >> -- Mark >> >> On 9/17/14 6:02 AM, Massimo Gallo wrote: >> >> The why is simple: >> >> You use a lot of "generic component type" and very few >> "specific >> component type". You are imposing types for every component >> in order >> to >> handle few exceptions (segmentation, etc.). You create a >> rule >> (specify >> the component's type) to handle exceptions! >> >> I would prefer not to have typed components. Instead I would >> prefer >> to >> have the name as a simple sequence of bytes with a field >> separator. Then, >> outside the name, if you have some components that could be >> used at >> the network layer (e.g. a TLV field), you simply need something >> that >> indicates the offset allowing you to retrieve the >> version, >> segment, etc. in the name... >> >> >> Max >> >> >> >> >> >> On 16/09/2014 20:33, Mark Stapp wrote: >> >> On 9/16/14 10:29 AM, Massimo Gallo wrote: >> >> I think we agree on the small number of "component types". >> However, if you have a small number of types, you will end >> up with >> names >> containing many generic component types and few specific >> component >> types. Due to the fact that the component type specification >> is an >> exception in the name, I would prefer something that specifies the >> component's >> type only when needed (something like UTF8 conventions but >> that >> applications MUST use). >> >> so ... I can't quite follow that. 
the thread has had some >> explanation >> about why the UTF8 requirement has problems (with aliasing, >> e.g.) >> and >> there's been email trying to explain that applications don't >> have to >> use types if they don't need to. your email sounds like "I >> prefer >> the >> UTF8 convention", but it doesn't say why you have that >> preference in >> the face of the points about the problems. can you say why >> it is >> that >> you express a preference for the "convention" with problems? >> >> Thanks, >> Mark >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> From felix at rabe.io Thu Sep 25 00:15:20 2014 From: felix at rabe.io (Felix Rabe) Date: Thu, 25 Sep 2014 09:15:20 +0200 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <5423C108.5020106@rabe.io> On 24/Sep/14 18:25, Marc.Mosko at parc.com wrote: > I think Tai-Lin's example was just fine to talk about discovery. > /blah/blah/value, how do you discover all the "value"s? Discovery > shouldn't care if it's email messages or temperature readings or world > cup photos. (Sorry, have only skim-read the discussion, so this point might have already been made.) How about a /blah/blah/// scheme? The would be a magic marker that amounts to "ls /blah/blah", the makes sure that I will be able to iterate a specific (fresh) version of this list as it exists at this very moment directly from the publisher, and is there in case the list is too large. (There might be a selector between and for stuff like limit queries, e.g. to get all records from "sa..." to "sb..." from a database - thinking about the LevelDB model here.) - Felix -------------- next part -------------- An HTML attachment was scrubbed... URL: From Marc.Mosko at parc.com Thu Sep 25 00:17:21 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Thu, 25 Sep 2014 07:17:21 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> My beating on "discover all" is exactly because of this. Let's define the discovery service. 
If the service is just "discover latest" (left/right), can we not simplify the current approach? If the service includes more than "latest", then is the current approach the right approach? Sync has its place and is the right solution for some things. However, it should not be a bandage over discovery. Discovery should be its own valid and useful service. I agree that the exclusion approach can work, and work relatively well, for finding the rightmost/leftmost child. I believe this is because that operation is transitive through caches. So, within whatever timeout an application is willing to wait to find the "latest", it can keep asking and asking. I do think it would be best to actually try to ask an authoritative source first (i.e. a non-cached value), and if that fails then probe caches, but experimentation may show what works well. This is based on my belief that in the real world in broad use, the namespace will become pretty polluted and probing will result in a lot of junk, but that's future prognosticating. Also, in the exact match vs. continuation match of content object to interest, it is pretty easy to encode that "selector" request in a name component (i.e. "exclude_before=(t=version, l=2, v=279) & sort=right") and any participating cache can respond with a link (or encapsulate) a response in an exact match system. In the CCNx 1.0 spec, one could also encode this a different way. One could use a name like "/mail/inbox/selector_matching/" and in the payload include "exclude_before=(t=version, l=2, v=279) & sort=right". This means that any cache that could process the "selector_matching" function could look at the interest payload and evaluate the predicate there. The predicate could become large and not pollute the PIT with all the computation state. Including "" 
in the name means that one could get a cached response if someone else had asked the same exact question (subject to the content object's cache lifetime) and it also serves to multiplex different payloads for the same function (selector_matching). Marc On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote: > > http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf > > https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html > > J. > > > On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: > >> However, I cannot see whether we can achieve "best-effort *all*-value" >> efficiently. >> There are still interesting topics on >> 1. how do we express the discovery query? >> 2. is the selector "discovery-complete"? i.e. can we express any >> discovery query with the current selector? >> 3. if so, can we re-express the current selector in a more efficient way? >> >> I personally see named data as a set, which can then be categorized >> into "ordered set" and "unordered set". >> some questions that any discovery expression must solve: >> 1. is this a nil set or not? nil set means that this name is the leaf >> 2. does the set contain member X? >> 3. is the set ordered or not >> 4. (ordered) first, prev, next, last >> 5. if we enforce component ordering, answer question 4. >> 6. recursively answer all questions above on any set member >> >> >> >> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >> wrote: >>> >>> >>> From: >>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>> To: Jeff Burke >>> Cc: , , >>> >>> Subject: Re: [Ndn-interest] any comments on naming convention? >>> >>> I think Tai-Lin's example was just fine to talk about discovery. >>> /blah/blah/value, how do you discover all the "value"s? Discovery >>> shouldn't >>> care if it's email messages or temperature readings or world cup photos. >>> >>> >>> This is true if discovery means "finding everything" - in which case, >>> as you >>> point out, sync-style approaches may be best. 
But I am not sure that >>> this >>> definition is complete. The most pressing example that I can think of >>> is >>> best-effort latest-value, in which the consumer's goal is to get the >>> latest >>> copy the network can deliver at the moment, and may not care about >>> previous >>> values or (if freshness is used well) potential later versions. >>> >>> Another case that seems to work well is video seeking. Let's say I >>> want to >>> enable random access to a video by timecode. The publisher can provide a >>> time-code based discovery namespace that's queried using an Interest >>> that >>> essentially says "give me the closest keyframe to 00:37:03:12", which >>> returns a data object that, via the name, provides the exact timecode of >>> the >>> keyframe in question and a link to a segment-based namespace for >>> efficient >>> exact match playout. In two roundtrips and in a very lightweight way, >>> the >>> consumer has random access capability. If NDN is the moral >>> equivalent >>> of IP, then I am not sure we should be afraid of roundtrips that provide >>> this kind of functionality, just as they are used in TCP. >>> >>> >>> I described one set of problems using the exclusion approach, and that >>> an >>> NDN paper on device discovery described a similar problem, though they >>> did >>> not go into the details of splitting interests, etc. That all was >>> simple >>> enough to see from the example. >>> >>> Another question is how one does discovery with exact match >>> names, >>> which is also conflating things. You could do a different discovery >>> with >>> continuation names too, just not the exclude method. >>> >>> As I alluded to, one needs a way to talk with a specific cache about its >>> "table of contents" for a prefix so one can get a consistent set of >>> results >>> without all the round-trips of exclusions. Actually downloading the >>> "headers" of the messages would be the same bytes, more or less. 
In a >>> way, >>> this is a little like name enumeration from a ccnx 0.x repo, but that >>> protocol has its own set of problems and I'm not suggesting using that >>> directly. >>> >>> One approach is to encode a request in a name component and a >>> participating >>> cache can reply. It replies in such a way that one could continue >>> talking >>> with that cache to get its TOC. One would then issue another interest >>> with >>> a request for not-that-cache. >>> >>> >>> I'm curious how the TOC approach works in a multi-publisher scenario? >>> >>> >>> Another approach is to try to ask the authoritative source for the >>> "current" >>> manifest name, i.e. /mail/inbox/current/, which could return the >>> manifest or a link to the manifest. Then fetching the actual manifest >>> from >>> the link could come from caches because you now have a consistent set of >>> names to ask for. If you cannot talk with an authoritative source, you >>> could try again without the nonce and see if there's a cached copy of a >>> recent version around. >>> >>> Marc >>> >>> >>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: >>> >>> >>> >>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>> wrote: >>> >>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>> >>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>> pattern with static (/mail/inbox) and variable (148) components; with >>> a proper naming convention, computers can also detect this pattern >>> easily. Now I want to look for all mails in my inbox. I can generate a >>> list of /mail/inbox/. These are my guesses, and with selectors >>> I can further refine my guesses. >>> >>> >>> I think this is a very bad example (or at least a very bad application >>> design). You have an app (a mail server / inbox) and you want it to >>> list >>> your emails? An email list is an application data structure. I don't >>> think you should use the network structure to reflect this. 
>>> >>> >>> I think Tai-Lin is trying to sketch a small example, not propose a >>> full-scale approach to email. (Maybe I am misunderstanding.) >>> >>> >>> Another way to look at it is that if the network architecture is >>> providing >>> the equivalent of distributed storage to the application, perhaps the >>> application data structure could be adapted to match the affordances of >>> the network. Then it would not be so bad that the two structures were >>> aligned. >>> >>> >>> I'll give you an example: how do you delete emails from your inbox? If >>> an >>> email was cached in the network, it can never be deleted from your inbox? >>> >>> >>> This is conflating two issues - what you are pointing out is that the >>> data >>> structure of a linear list doesn't handle common email management >>> operations well. Again, I'm not sure if that's what he was getting at >>> here. But deletion is not the issue - the availability of a data object >>> on the network does not necessarily mean it's valid from the perspective >>> of the application. >>> >>> Or moved to another mailbox? Do you rely on the emails expiring? >>> >>> This problem is true for most (any?) situations where you use network >>> name >>> structure to directly reflect the application data structure. >>> >>> >>> Not sure I understand how you make the leap from the example to the >>> general statement. >>> >>> Jeff >>> >>> >>> >>> >>> Nacho >>> >>> >>> >>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>> >>> Ok, yes I think those would all be good things. >>> >>> One thing to keep in mind, especially with things like time series >>> sensor >>> data, is that people see a pattern and infer a way of doing it. That's >>> easy >>> for a human :) But in Discovery, one should assume that one does not >>> know >>> of patterns in the data beyond what the protocols used to publish the >>> data >>> explicitly require. 
That said, I think some of the things you listed >>> are >>> good places to start: sensor data, web content, climate data or genome >>> data. >>> >>> We also need to state what the forwarding strategies are and what the >>> cache >>> behavior is. >>> >>> I outlined some of the points that I think are important in that other >>> posting. While "discover latest" is useful, "discover all" is also >>> important, and that one gets complicated fast. So points like >>> separating >>> discovery from retrieval and working with large data sets have been >>> important in shaping our thinking. That all said, I'd be happy >>> starting >>> from 0 and working through the Discovery service definition from >>> scratch >>> along with data set use cases. >>> >>> Marc >>> >>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>> wrote: >>> >>> Hi Marc, >>> >>> Thanks - yes, I saw that as well. I was just trying to get one step >>> more >>> specific, which was to see if we could identify a few specific use >>> cases >>> around which to have the conversation. (e.g., time series sensor data >>> and >>> web content retrieval for "get latest"; climate data for huge data >>> sets; >>> local data in a vehicular network; etc.) What have you been looking at >>> that's driving considerations of discovery? >>> >>> Thanks, >>> Jeff >>> >>> From: >>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>> To: Jeff Burke >>> Cc: , >>> Subject: Re: [Ndn-interest] any comments on naming convention? >>> >>> Jeff, >>> >>> Take a look at my posting (that Felix fixed) in a new thread on >>> Discovery. >>> >>> >>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>> >>> I think it would be very productive to talk about what Discovery should >>> do, >>> and not focus on the how. It is sometimes easy to get caught up in the >>> how, >>> which I think is a less important topic than the what at this stage. 
>>> >>> Marc >>> >>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>> wrote: >>> >>> Marc, >>> >>> If you can't talk about your protocols, perhaps we can discuss this >>> based >>> on use cases. What are the use cases you are using to evaluate >>> discovery? >>> >>> Jeff >>> >>> >>> >>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>> wrote: >>> >>> No matter what the expressiveness of the predicates, if the forwarder >>> can >>> send interests different ways you don't have a consistent underlying >>> set >>> to talk about, so you would always need non-range exclusions to discover >>> every version. >>> >>> Range exclusions only work, I believe, if you get an authoritative >>> answer. >>> If different content pieces are scattered between different caches, I >>> don't see how range exclusions would work to discover every version. >>> >>> I'm sorry to be pointing out problems without offering solutions, but >>> we're not ready to publish our discovery protocols. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>> >>> I see. Can you briefly describe how the ccnx discovery protocol solves >>> all the problems that you mentioned (not just exclude)? a doc will be >>> better. >>> >>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >>> expect [and] and [or], so boolean algebra is fully supported. Regular >>> languages or context-free languages might become part of selectors too. >>> >>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>> That will get you one reading, then you need to exclude it and ask >>> again. >>> >>> Sent from my telephone >>> >>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. >>> >>> >>> I am very confused. 
For your example, if I want to get all today's >>> sensor data, I just do (Any..Last second of last day)(First second of >>> tomorrow..Any). That's 18 bytes. >>> >>> [1] http://named-data.net/doc/ndn-tlv/interest.html#exclude >>> >>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>> >>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>> >>> If you talk sometimes to A and sometimes to B, you very easily >>> could miss content objects you want to discover unless you avoid >>> all range exclusions and only exclude explicit versions. >>> >>> >>> Could you explain why the missing content object situation happens? also >>> range exclusion is just a shorter notation for many explicit >>> excludes; >>> converting from explicit excludes to a ranged exclude is always >>> possible. >>> >>> >>> Yes, my point was that if you cannot talk about a consistent set >>> with a particular cache, then you need to always use individual >>> excludes, not range excludes, if you want to discover all the versions >>> of an object. For something like a sensor reading that is updated, >>> say, once per second you will have 86,400 of them per day. If each >>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>> exclusions (plus encoding overhead) per day. >>> >>> yes, maybe using a more deterministic version number than a >>> timestamp makes sense here, but it's just an example of needing a lot >>> of exclusions. >>> >>> >>> You exclude through 100 then issue a new interest. This goes to >>> cache B >>> >>> >>> I feel this case is invalid because cache A will also get the >>> interest, and cache A will return v101 if it exists. Like you said, >>> if >>> this goes to cache B only, it means that cache A dies. How do you >>> know >>> that v101 even exists? >>> >>> >>> I guess this depends on what the forwarding strategy is. If the >>> forwarder will always send each interest to all replicas, then yes, >>> modulo packet loss, you would discover v101 on cache A. 
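The individual-exclude discovery loop being debated here can be sketched against a toy in-memory cache. The `fetch` callback and the set-based exclude are illustrative stand-ins for expressing an Interest with an Exclude selector, not the real wire protocol.

```python
# Toy sketch of "discover every version by repeated individual excludes".
# `fetch(excluded)` stands in for sending an Interest whose Exclude lists
# every version already seen; it returns one matching version, or None
# to model a timeout/NACK when nothing else matches.

def discover_versions(fetch, max_rounds=1000):
    seen = set()
    for _ in range(max_rounds):
        v = fetch(frozenset(seen))
        if v is None:          # nothing left that isn't excluded
            break
        seen.add(v)
    return sorted(seen)

# Against a single authoritative cache, the loop does find every version:
cache = {99, 100, 101}
def fetch(excluded):
    candidates = cache - excluded
    return max(candidates) if candidates else None   # rightmost child

assert discover_versions(fetch) == [99, 100, 101]

# The cost is that the exclude set grows by one entry per version seen:
# at one 8-byte timestamp exclusion per reading, a once-per-second sensor
# accumulates 86,400 * 8 = 691,200 bytes of exclusions per day.
```

With two caches and a round-robin forwarder, the same loop is where the non-determinism Marc describes creeps in: a timeout from the cache that happens to answer does not prove no newer version exists elsewhere.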
If the >>> forwarder is just doing "best path" and can round-robin between cache >>> A and cache B, then your application could miss v101. >>> >>> >>> >>> c,d In general I agree that LPM performance is related to the number >>> of components. In my own thread-safe LPM implementation, I used only >>> one RWMutex for the whole tree. I don't know whether adding a lock for >>> every node will be faster or not because of lock overhead. >>> >>> However, we should compare (exact match + discovery protocol) vs (ndn >>> lpm). Comparing the performance of exact match to lpm is unfair. >>> >>> >>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>> specs for doing the exact match discovery. So, as I said, I'm not >>> ready to claim it's better yet because we have not done that. >>> >>> >>> >>> >>> >>> >>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>> I would point out that using LPM on content object to Interest >>> matching to do discovery has its own set of problems. Discovery >>> involves more than just "latest version" discovery too. >>> >>> This is probably getting off-topic from the original post about >>> naming conventions. >>> >>> a. If Interests can be forwarded multiple directions and two >>> different caches are responding, the exclusion set you build up >>> talking with cache A will be invalid for cache B. If you talk >>> sometimes to A and sometimes to B, you very easily could miss >>> content objects you want to discover unless you avoid all range >>> exclusions and only exclude explicit versions. That will lead to >>> very large interest packets. In ccnx 1.0, we believe that an >>> explicit discovery protocol that allows conversations about >>> consistent sets is better. >>> >>> b. Yes, if you just want the "latest version" discovery that >>> should be transitive between caches, but imagine this. You send >>> Interest #1 to cache A which returns version 100. You exclude >>> through 100 then issue a new interest. 
This goes to cache B who >>> only has version 99, so the interest times out or is NACK'd. So >>> you think you have it! But, cache A already has version 101, you >>> just don't know. If you cannot have a conversation around >>> consistent sets, it seems like even doing latest version discovery >>> is difficult with selector based discovery. From what I saw in >>> ccnx 0.x, one ended up getting an Interest all the way to the >>> authoritative source because you can never believe an intermediate >>> cache that there's not something more recent. >>> >>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>> interested in seeing your analysis. Case (a) is that a node can >>> correctly discover every version of a name prefix, and (b) is that >>> a node can correctly discover the latest version. We have not >>> formally compared (or yet published) our discovery protocols (we >>> have three, 2 for content, 1 for device) to selector based >>> discovery, so I cannot yet claim they are better, but they do not >>> have the non-determinism sketched above. >>> >>> c. Using LPM, there is a non-deterministic number of lookups you >>> must do in the PIT to match a content object. If you have a name >>> tree or a threaded hash table, those don't all need to be hash >>> lookups, but you need to walk up the name tree for every prefix of >>> the content object name and evaluate the selector predicate. >>> Content Based Networking (CBN) had some methods to create data >>> structures based on predicates, maybe those would be better. But >>> in any case, you will potentially need to retrieve many PIT entries >>> if there is Interest traffic for many prefixes of a root. Even on >>> an Intel system, you'll likely miss cache lines, so you'll have a >>> lot of NUMA access for each one.
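The contrast Marc draws in point (c) — a PIT probe per prefix of the data name under selector matching, versus a bounded set of probes under CCNx 1.0's exact matching — can be sketched with a toy dictionary-based PIT. This is an editor's illustrative model only, not forwarder code; the names, keys, and the 3-probe bound follow the description in the thread:

```python
# Toy model: selector-style matching may probe the PIT once per prefix of
# an n-component data name; exact matching probes a fixed set of keys.

def lpm_probes(name_components, pit):
    """Probe the PIT once per prefix of the data name (longest first).

    Returns (probe_count, matched_prefixes). In a real forwarder each
    matched entry would still need its selector predicate evaluated."""
    probes = 0
    matched = []
    for i in range(len(name_components), 0, -1):
        probes += 1
        prefix = tuple(name_components[:i])
        if prefix in pit:
            matched.append(prefix)
    return probes, matched

def exact_match_keys(name_components, keyid, obj_hash):
    """CCNx-1.0-style matching probes a fixed key set: name, name+keyid,
    name+hash -- at most 3 lookups regardless of name length."""
    name = tuple(name_components)
    return [name, (name, keyid), (name, obj_hash)]

pit = {("a", "b"): "pending-interest-entry"}
probes, matched = lpm_probes(["a", "b", "c", "d", "e", "f"], pit)
print(probes, matched)                                   # 6 probes, 1 match
print(len(exact_match_keys(["a", "b", "c", "d", "e", "f"], "k", "h")))  # 3
```

The probe count on the LPM side grows with name depth (and with how many prefixes carry Interest traffic), which is exactly the cache-miss concern raised above.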
In CCNx 1.0, even a naive >>> implementation only requires at most 3 lookups (one by name, one by >>> name + keyid, one by name + content object hash), and one can do >>> other things to optimize lookup for an extra write. >>> >>> d. In (c) above, if you have a threaded name tree or are just >>> walking parent pointers, I suspect you'll need locking of the >>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>> and that will be expensive. It would be interesting to see what a >>> cache consistent multi-threaded name tree looks like. >>> >>> Marc >>> >>> >>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>> wrote: >>> >>> I had thought about these questions, but I want to know your ideas >>> besides typed components: >>> 1. LPM allows "data discovery". How will exact match do similar >>> things? >>> 2. Will removing selectors improve performance? How do we use >>> other >>> faster techniques to replace selectors? >>> 3. Fixed byte length and type. I agree more that type can be fixed >>> bytes, but 2 bytes for length might not be enough for the future. >>> >>> >>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>> wrote: >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> >>> Could you share it with us? >>> >>> Sure. Here's a strawman. >>> >>> The type space is 16 bits, so you have 65,536 types. >>> >>> The type space is currently shared with the types used for the >>> entire protocol, which gives us two options: >>> (1) we reserve a range for name component types. Given the >>> likelihood there will be at least as much and probably more need >>> for component types than for protocol extensions, we could reserve 1/2 >>> of the type space, giving us 32K types for name components.
>>> (2) since there is no parsing ambiguity between name components >>> and other fields of the protocol (since they are sub-types of the >>> name type) we could reuse numbers and thereby have an entire 65K >>> name component types. >>> >>> We divide the type space into regions, and manage it with a >>> registry. If we ever get to the point of creating an IETF >>> standard, IANA has 25 years of experience running registries and >>> there are well-understood rule sets for different kinds of >>> registries (open, requires a written spec, requires standards >>> approval). >>> >>> - We allocate one "default" name component type for "generic >>> name", which would be used on name prefixes and other common >>> cases where there are no special semantics on the name component. >>> - We allocate a range of name component types, say 1024, to >>> globally understood types that are part of the base or extension >>> NDN specifications (e.g. chunk#, version#, etc.) >>> - We reserve some portion of the space for unanticipated uses >>> (say another 1024 types) >>> - We give the rest of the space to application assignment. >>> >>> Make sense? >>> >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design >>> >>> >>> we could design for performance, >>> >>> That's not what people are advocating. We are advocating that we >>> *not* design for known bad performance and hope serendipity or >>> Moore's Law will come to the rescue. >>> >>> but I think there will be a turning >>> point when the slower design starts to become "fast enough". >>> >>> Perhaps, perhaps not. Relative performance is what matters, so >>> things that don't get faster while others do tend to get dropped >>> or not used because they impose a performance penalty relative to >>> the things that go faster. There is also the "low-end"
phenomenon >>> where improvements in technology get applied to lowering cost >>> rather than improving performance. For those environments bad >>> performance just never gets better. >>> >>> Do you >>> think there will be some design of ndn that will *never* have >>> performance improvement? >>> >>> I suspect LPM on data will always be slow (relative to the other >>> functions). >>> I suspect exclusions will always be slow because they will >>> require extra memory references. >>> >>> However I of course don't claim clairvoyance, so this is just >>> speculation based on 35+ years of seeing performance improve by 4 >>> orders of magnitude and still having to worry about counting >>> cycles and memory references... >>> >>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>> wrote: >>> >>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>> wrote: >>> >>> We should not look at a certain chip nowadays and want ndn to >>> perform >>> well on it. It should be the other way around: once ndn apps >>> become >>> popular, a better chip will be designed for ndn. >>> >>> While I'm sympathetic to that view, there are three ways in >>> which Moore's law or hardware tricks will not save us from >>> performance flaws in the design: >>> a) clock rates are not getting (much) faster >>> b) memory accesses are getting (relatively) more expensive >>> c) data structures that require locks to manipulate >>> successfully will be relatively more expensive, even with >>> near-zero lock contention. >>> >>> The fact is, IP *did* have some serious performance flaws in >>> its design. We just forgot those because the design elements >>> that depended on those mistakes have fallen into disuse. The >>> poster children for this are: >>> 1. IP options. Nobody can use them because they are too slow >>> on modern forwarding hardware, so they can't be reliably used >>> anywhere >>> 2.
the UDP checksum, which was a bad design when it was >>> specified and is now a giant PITA that still causes major pain >>> in working around. >>> >>> I'm afraid students today are being taught that the designers >>> of IP were flawless, as opposed to very good scientists and >>> engineers who got most of it right. >>> >>> I feel the discussion today and yesterday has been off-topic. >>> Now I >>> see that there are 3 approaches: >>> 1. we should not define a naming convention at all >>> 2. typed component: use tlv type space and add a handful of >>> types >>> 3. marked component: introduce only one more type and add >>> additional >>> marker space >>> >>> I know how to make #2 flexible enough to do the things I can >>> envision we need to do, and with a few simple conventions on >>> how the registry of types is managed. >>> >>> It is just as powerful in practice as either throwing up our >>> hands and letting applications design their own mutually >>> incompatible schemes or trying to make naming conventions with >>> markers in a way that is fast to generate/parse and also >>> resilient against aliasing. >>> >>> Also everybody thinks that the current utf8 marker naming >>> convention >>> needs to be revised. >>> >>> >>> >>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>> wrote: >>> Would that chip be suitable, i.e. can we expect most names >>> to fit in (the >>> magnitude of) 96 bytes? What length are names usually in >>> current NDN >>> experiments? >>> >>> I guess wide deployment could make for even longer names. >>> Related: Many URLs >>> I encounter nowadays easily don't fit within two 80-column >>> text lines, and >>> NDN will have to carry more information than URLs, as far as >>> I see. >>> >>> >>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>> >>> In fact, the index in a separate TLV will be slower on some >>> architectures, >>> like the ezChip NP4.
The NP4 can hold the first 96 frame >>> bytes in memory, >>> then any subsequent memory is accessed only as two adjacent >>> 32-byte blocks >>> (there can be at most 5 blocks available at any one time). >>> If you need to >>> switch between arrays, it would be very expensive. If you >>> have to read past >>> the name to get to the 2nd array, then read it, then back up >>> to get to the >>> name, it will be pretty expensive too. >>> >>> Marc >>> >>> On Sep 18, 2014, at 2:02 PM, >>> wrote: >>> >>> Does this make that much difference? >>> >>> If you want to parse the first 5 components, one way to do >>> it is: >>> >>> Read the index, find entry 5, then read in that many bytes >>> from the start >>> offset of the beginning of the name. >>> OR >>> Start reading the name, (find size + move) 5 times. >>> >>> How much speed are you getting from one to the other? You >>> seem to imply >>> that the first one is faster. I don't think this is the >>> case. >>> >>> In the first one you'll probably have to get the cache line >>> for the index, >>> then all the required cache lines for the first 5 >>> components. For the >>> second, you'll have to get all the cache lines for the first >>> 5 components. >>> Given an assumption that a cache miss is way more expensive >>> than >>> evaluating a number and computing an addition, you might >>> find that the >>> performance of the index is actually slower than the >>> performance of the >>> direct access. >>> >>> Granted, there is a case where you don't access the name at >>> all, for >>> example, if you just get the offsets and then send the >>> offsets as >>> parameters to another processor/GPU/NPU/etc. In this case >>> you may see a >>> gain IF there are more cache line misses in reading the name >>> than in >>> reading the index. So, if the regular part of the name >>> that you're >>> parsing is bigger than the cache line (64 bytes?)
and the >>> name is to be >>> processed by a different processor, then you might see some >>> performance >>> gain in using the index, but in all other circumstances I >>> bet this is not >>> the case. I may be wrong, haven't actually tested it. >>> >>> This is all to say, I don't think we should be designing the >>> protocol with >>> only one architecture in mind. (The architecture of sending >>> the name to a >>> different processor than the index). >>> >>> If you have numbers that show that the index is faster I >>> would like to see >>> under what conditions and architectural assumptions. >>> >>> Nacho >>> >>> (I may have misinterpreted your description so feel free to >>> correct me if >>> I'm wrong.) >>> >>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>> >>> wrote: >>> >>> Indeed each component's offset must be encoded using a fixed >>> amount of >>> bytes: >>> >>> i.e., >>> Type = Offsets >>> Length = 10 Bytes >>> Value = Offset1(1byte), Offset2(1byte), ... >>> >>> You may also imagine having an "Offset_2byte" type if your >>> name is too >>> long. >>> >>> Max >>> >>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want the first x components) you can directly have it using >>> the >>> offsets. With the Nested TLV structure you have to >>> iteratively parse >>> the first x-1 components. With the offset structure you can >>> directly >>> access the first x components. >>> >>> I don't get it. What you described only works if the >>> "offset" is >>> encoded in fixed bytes. With varNum, you will still need to >>> parse x-1 >>> offsets to get to the x-th offset.
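The two access patterns being compared — walking nested TLVs versus jumping via a fixed-width offset table — can be sketched concretely. This is an editor's illustration with a simplified encoding (1-byte type and length fields for brevity; real NDN TLVs use variable-size type and length numbers, which is exactly Tai-Lin's varNum objection):

```python
# Sketch of the two parsing strategies: sequential TLV walk vs. offset index.

def encode_name(components):
    """Encode components as a flat [type=8][len][value] byte sequence."""
    out = bytearray()
    for c in components:
        out += bytes([8, len(c)]) + c
    return bytes(out)

def nth_component_sequential(buf, n):
    """Walk TLVs from the start: n (find size + skip) steps to reach #n."""
    off = 0
    for _ in range(n):
        off += 2 + buf[off + 1]          # skip type byte, length byte, value
    length = buf[off + 1]
    return buf[off + 2 : off + 2 + length]

def nth_component_indexed(buf, offsets, n):
    """With a side table of fixed-width offsets, jump straight to #n."""
    off = offsets[n]
    length = buf[off + 1]
    return buf[off + 2 : off + 2 + length]

name = [b"mail", b"inbox", b"148"]
buf = encode_name(name)

offsets, off = [], 0                     # the offset table, built at encode time
for c in name:
    offsets.append(off)
    off += 2 + len(c)

print(nth_component_sequential(buf, 2))       # b'148'
print(nth_component_indexed(buf, offsets, 2)) # b'148'
```

Both variants touch the same name bytes here, which is Nacho's point: the index only clearly wins if it lets you avoid reading the name at all (e.g. shipping the offsets to another processor), and only if the offsets themselves are fixed-width.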
>>> >>> >>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>> wrote: >>> >>> On 17/09/2014 14:56, Mark Stapp wrote: >>> >>> ah, thanks - that's helpful. I thought you were saying "I >>> like the >>> existing NDN UTF8 'convention'." I'm still not sure I >>> understand what >>> you >>> _do_ prefer, though. it sounds like you're describing an >>> entirely >>> different >>> scheme where the info that describes the name-components is >>> ... >>> someplace >>> other than _in_ the name-components. is that correct? when >>> you say >>> "field >>> separator", what do you mean (since that's not a "TL" from a >>> TLV)? >>> >>> Correct. >>> In particular, with our name encoding, a TLV indicates the >>> name >>> hierarchy >>> with offsets in the name and other TLV(s) indicate the >>> offset to use >>> in >>> order to retrieve special components. >>> As for the field separator, it is something like "/". >>> Aliasing is >>> avoided as >>> you do not rely on field separators to parse the name; you >>> use the >>> "offset >>> TLV" to do that. >>> >>> So now, it may be an aesthetic question but: >>> >>> if you do not need the entire hierarchical structure (suppose >>> you only >>> want >>> the first x components) you can directly have it using the >>> offsets. >>> With the >>> Nested TLV structure you have to iteratively parse the first >>> x-1 >>> components. >>> With the offset structure you can directly access the >>> first x >>> components. >>> >>> Max >>> >>> >>> -- Mark >>> >>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>> >>> The why is simple: >>> >>> You use a lot of "generic component type" and very few >>> "specific >>> component type". You are imposing types for every component >>> in order >>> to >>> handle a few exceptions (segmentation, etc.). You create a >>> rule >>> (specify >>> the component's type) to handle exceptions! >>> >>> I would prefer not to have typed components.
Instead I would >>> prefer >>> to >>> have the name as a simple sequence of bytes with a field >>> separator. Then, >>> outside the name, if you have some components that could be >>> used at >>> network layer (e.g. a TLV field), you simply need something >>> that >>> indicates which is the offset allowing you to retrieve the >>> version, >>> segment, etc. in the name... >>> >>> >>> Max >>> >>> >>> >>> >>> >>> On 16/09/2014 20:33, Mark Stapp wrote: >>> >>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>> >>> I think we agree on the small number of "component types". >>> However, if you have a small number of types, you will end >>> up with >>> names >>> containing many generic component types and few specific >>> component >>> types. Due to the fact that the component type specification >>> is an >>> exception in the name, I would prefer something that specifies the >>> component's >>> type only when needed (something like UTF8 conventions but >>> that >>> applications MUST use). >>> >>> so ... I can't quite follow that. the thread has had some >>> explanation >>> about why the UTF8 requirement has problems (with aliasing, >>> e.g.) >>> and >>> there's been email trying to explain that applications don't >>> have to >>> use types if they don't need to. your email sounds like "I >>> prefer >>> the >>> UTF8 convention", but it doesn't say why you have that >>> preference in >>> the face of the points about the problems. can you say why >>> it is >>> that >>> you express a preference for the "convention" with problems? >>> >>> Thanks, >>> Mark
>>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From Ignacio.Solis at parc.com Thu Sep 25 00:45:37 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Thu, 25 Sep 2014 07:45:37 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: On 9/24/14, 6:30 PM, "Burke, Jeff" wrote: >Ok, so sync-style approaches may work better for this example as Marc >already pointed out, but nonetheless... (Marc, I am catching up on emails >and will respond to that shortly.) Sync can be done without selectors. :-) >>... >>A20- You publish /username/mailbox/list/20 (lifetime of 1 second) > > >This isn't 20 steps. First, no data leaves the publisher without an >Interest. Second, it's more like one API call: make this list available >as versioned Data with a minimum allowable time between responses of one >second. No matter how many requests are outstanding, a stable load on the >source. Agreed. I wasn't counting this as overhead. >> >>B- You request /username/mailbox/list >> >>C- You receive /username/mailbox/list/20 (lifetime of 1 second) >At this point, you decide if list v20 is sufficient for your purposes. >Perhaps it is. This is true in the non-selector case as well. > >Some thoughts: > >- In Scheme B, if the list has not changed, you still get a response, >because the publisher has no way to know anything about the consumer's >knowledge. In Scheme A, publishers have that knowledge from the exclusion >and need not reply. This is true without selectors. > If NACKs are used as heartbeats, they can be returned >more slowly... say every 3-10 seconds. So, many data packets are >potentially saved. Hopefully we don't get one email per second... :) I'm not sure what you mean by this.
I wouldn't recommend relying on not answering valid requests. Otherwise you start relying on timeouts. >- Benefit seems apparent in multi-consumer scenarios, even without sync. >Let's say I have 5 personal devices requesting mail. In Scheme B, every >publisher receives and processes 5 interests per second on average. In >Scheme A, with an upstream caching node, each receives 1 per second >maximum. The publisher still has to throttle requests, but with no help >or scaling support from the network. This can be done without selectors. As long as all the clients produce a request for the same name they can take advantage of caching. Nacho >> >> >>In Scheme A you sent 2 interests, received 2 objects, going all the way >>to >>source. >>In Scheme B you sent 1 interest, received 1 object, going all the way to >>source. >> >>Scheme B is always better (doesn't need to do C, D) for this example and >>it uses exact matching. >It's better if your metric is roundtrips and you don't care about load on >the publisher, lower traffic in times of no new data, etc. But if you >don't, then you can certainly implement Scheme B on NDN, too. > >Jeff > >> >>You can play tricks with the lifetime of the object in both cases, >>selectors or not. >> >>> >>>- meanwhile, the email client can retrieve the emails using the names >>>obtained in these lists. Some emails may turn out to be unnecessary, so >>>they will be discarded when a more recent list comes. The email client >>>can also keep state about the names of the emails it has deleted to >>>minimize this problem. >> >>This is independent of selectors / exact matching. >> >>Nacho >> >> >> >>> >>>On Sep 24, 2014, at 2:37 AM, Marc.Mosko at parc.com wrote: >>> >>>> Ok, let's take that example and run with it a bit. I'll walk through >>>>a >>>>"discover all" example. This example leads me to why I say discovery >>>>should be separate from data retrieval.
I don't claim that we have a >>>>final solution to this problem; I think in a distributed peer-to-peer >>>>environment solving this problem is difficult. If you have a counter >>>>example as to how this discovery could progress using only the >>>>information known a priori by the requester, I would be interested in >>>>seeing that example worked out. Please do correct me if you think this >>>>is wrong. >>>> >>>> You have mails that were originally numbered 0 - 10000, sequentially >>>>by >>>>the server. >>>> >>>> You travel between several places and access different emails from >>>>different places. This populates caches. Let's say 0,3,6,9,... are on >>>>cache A, 1,4,7,10,... are on cache B, and 2,5,8,11,... are on cache C. >>>>Also, you have deleted 500 random emails, so there are only 9500 emails >>>>actually out there. >>>> >>>> You set up a new computer and now want to download all your emails. >>>>The >>>>new computer is on the path of caches C, then B, then A, then the >>>>authoritative source server. The new email program has no initial >>>>state. The email program only knows that the email number is an >>>>integer >>>>that starts at 0. It issues an interest for /mail/inbox, and asks for >>>>the left-most child because it wants to populate in order. It gets a >>>>response from cache C with mail 2. >>>> >>>> Now, what does the email program do? It cannot exclude the range 0..2 >>>>because that would possibly miss 0 and 1. So, all it can do is exclude >>>>the exact number "2" and ask again. It then gets cache C again and it >>>>responds with "5". There are about 3000 emails on cache C, and if they >>>>all take 4 bytes (for the exclude component plus its coding overhead), >>>>then that's 12KB of exclusions to finally exhaust cache C. >>>> >>>> If we want Interests to avoid fragmentation, we can fit about 1200 >>>>bytes of exclusions, or 300 components. This means we need about 10 >>>>interest messages.
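The enumeration loop Marc describes can be simulated to show how the exclusion set grows. This is an editor's sketch under his stated assumptions (4 bytes per excluded component, a cache holding roughly 3000 items, one item discovered per round trip); it ignores interest-size limits and the multi-cache splitting he discusses next:

```python
# Small simulation of "discover all by exclusion" against a single cache:
# every answer must be added to the next Interest's exclusion set, so the
# total exclusion bytes sent grow as an arithmetic sum.
def enumerate_cache(cache_items, bytes_per_exclusion=4):
    excluded = set()
    interests = 0
    exclusion_bytes = 0
    while True:
        interests += 1
        exclusion_bytes += len(excluded) * bytes_per_exclusion
        remaining = [x for x in cache_items if x not in excluded]
        if not remaining:
            break  # cache exhausted: a timeout or NACK in the real protocol
        excluded.add(min(remaining))     # leftmost-child style response
    return interests, exclusion_bytes

# Cache C holds emails 2, 5, 8, ... (3000 items):
interests, total = enumerate_cache(list(range(2, 9000, 3)))
print(interests)   # 3001 round trips to exhaust ~3000 items
print(total)       # 18006000 bytes of exclusions sent in aggregate
```

Note the final interest alone carries 3000 × 4 = 12 KB of exclusions, matching the figure above; the aggregate over all round trips is far larger because each interest repeats the whole set.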
Each interest would be something like "exclude >>>>2,5,8,11,..., >300", then the next would be "exclude < 300, 302, 305, >>>>308, >>>>..., >600", etc. >>>> >>>> Those interests that exclude everything at cache C would then hit, say, >>>>cache B and start getting results 1, 4, 7, .... This means an Interest >>>>like "exclude 2,5,8,11,..., >300" would then get back number 1. That >>>>means the next request actually has to split that one interest's >>>>exclude >>>>into two (because the interest was at maximum size), so you now issue >>>>two interests where one is "exclude 1, 2, 5, 8, >210" and the other is >>>>"<210, 212, 215, ..., >300". >>>> >>>> If you look in the CCNx 0.8 java code, there should be a class that >>>>does these Interest based discoveries and does the Interest splitting >>>>based on the currently known range of discovered content. I don't have >>>>the specific reference right now, but I can send a link if you are >>>>interested in seeing that. The java class keeps state of what has been >>>>discovered so far, so it could re-start later if interrupted. >>>> >>>> So all those interests would now be getting results from cache B. You >>>>would then start to split all those ranges to accommodate the numbers >>>>coming back from B. Eventually, you'll have at least 10 Interest >>>>messages outstanding that would be excluding all the 9500 messages that >>>>are in caches A, B, and C. Some of those interest messages might >>>>actually reach an authoritative server, which might respond too. It >>>>would likely be more than 10 interests due to the algorithm that's used >>>>to >>>>split full interests, which likely is not optimal because it does not >>>>know exactly where the breaks should be a priori. >>>> >>>> Once you have exhausted caches A, B, and C, the interest messages >>>>would >>>>reach the authoritative source (if it's on line), and it would be >>>>issuing >>>>NACKs (I assume) for interests that have excluded all non-deleted >>>>emails.
>>>> >>>> In any case, it takes, at best, 9,500 round trips to "discover" all >>>>9500 emails. It also required Sum_{i=1..10000} 4*i = 200,020,000 >>>>bytes >>>>of Interest exclusions. Note that it's an arithmetic sum of bytes of >>>>exclusion, because at each Interest the size of the exclusions >>>>increases >>>>by 4. There was an NDN paper about light bulb discovery (or something >>>>like that) that noted this same problem and proposed some workaround, >>>>but I don't remember what they proposed. >>>> >>>> Yes, you could possibly pipeline it, but what would you do? In this >>>>example, emails 0 - 10000 (minus some random ones) would allow >>>>you, if you knew a priori, to issue say 10 interests in parallel that ask >>>>for different ranges. But, 2 years from now your undeleted emails >>>>might >>>>range from 100,000 - 150,000. The point is that a discovery protocol >>>>does not know, a priori, what is to be discovered. It might start >>>>learning some stuff as it goes on. >>>> >>>> If you could have retrieved just a table of contents from each cache, >>>>where each "row" is say 64 bytes (i.e. the name continuation plus hash >>>>value), you would need to retrieve 3300 * 64 = 211KB from each cache >>>>(total 640 KB) to list all the emails. That would take 640KB / 1200 = >>>>534 interest messages of say 64 bytes = 34 KB to discover all 9500 >>>>emails, plus another set to fetch the header rows. That's, say, 68 KB of >>>>interest traffic compared to 200 MB. Now, I've not said how to list >>>>these tables of contents, so an actual protocol might have higher >>>>communication cost, but even if it was 10x worse that would still be an >>>>attractive tradeoff. >>>> >>>> This assumes that you publish just the "header" in the 1st segment >>>>(say >>>>1 KB total object size including the signatures). That's 10 MB to >>>>learn >>>>the headers. >>>> >>>> You could also argue that the distribution of emails over caches is >>>>arbitrary.
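The totals in this walkthrough can be verified directly (arithmetic only; the 64-byte row, 1200-byte interest payload, and 3300 rows per cache are the figures stated above):

```python
# Exclusion-based discovery: interest i carries roughly 4*i bytes of
# exclusions, so the aggregate is an arithmetic sum.
exclusion_total = sum(4 * i for i in range(1, 10001))
print(exclusion_total)        # 200020000, i.e. 4 * 10000 * 10001 / 2

# Table-of-contents alternative: ~3300 rows of 64 bytes per cache, 3 caches.
per_cache = 3300 * 64
print(per_cache)              # 211200 bytes (~211 KB) per cache
total_toc = 3 * per_cache     # ~640 KB across all three caches (rounded above)
interests_needed = -(-640_000 // 1200)   # ceiling division
print(total_toc, interests_needed)       # 633600 534
```

So the table-of-contents approach costs on the order of hundreds of kilobytes against roughly 200 MB of exclusion traffic, which is the three-orders-of-magnitude gap the argument rests on.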
That's true, I picked a difficult sequence. But unless you >>>>have some positive controls on what could be in a cache, it could be >>>>any >>>>difficult sequence. I also did not address the timeout issue, and how >>>>do you know you are done? >>>> >>>> This is also why sync works so much better than doing raw interest >>>>discovery. Sync exchanges tables of contents and diffs; it does not >>>>need to enumerate by exclusion everything to retrieve. >>>> >>>> Marc >>>> >>>> >>>> >>>> On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote: >>>> >>>>> discovery can be reduced to "pattern detection" (can we infer what >>>>> exists?) and "pattern validation" (can we confirm this guess?) >>>>> >>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>>> pattern with static (/mail/inbox) and variable (148) components; with >>>>> a proper naming convention, computers can also detect this pattern >>>>> easily. Now I want to look for all mails in my inbox. I can generate >>>>>a >>>>> list of /mail/inbox/. These are my guesses, and with >>>>>selectors >>>>> I can further refine my guesses. >>>>> >>>>> To validate them, a bloom filter can provide "best effort" >>>>> discovery (with some false positives, so I call it "best-effort") >>>>> before I stupidly send all the interests to the network. >>>>> >>>>> The discovery protocol, as I described above, is essentially "pattern >>>>> detection by naming convention" and "bloom filter validation." This >>>>>is >>>>> definitely one of the "simpler" discovery protocols, because the data >>>>> producer only needs to add an additional bloom filter. Notice that we can >>>>> progressively add entries to the bfilter with low computation cost. >>>>> >>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>> Ok, yes I think those would all be good things.
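Tai-Lin's "bloom filter validation" step can be sketched minimally. This is an editor's illustration only: the filter size, hash count, and name scheme are arbitrary, no NDN packet encoding is implied, and the producer/consumer split simply follows his description (producer publishes a filter over names it holds; consumer tests guessed names before sending Interests):

```python
# Minimal Bloom filter sketch: false positives are possible, false
# negatives are not -- hence "best effort" discovery.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0                    # the bit array, as one big integer

    def _positions(self, name):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, name):                 # producer side: cheap, incremental
        for p in self._positions(name):
            self.bits |= 1 << p

    def might_contain(self, name):       # consumer side: validate a guess
        return all(self.bits >> p & 1 for p in self._positions(name))

bf = BloomFilter()
for n in range(0, 100, 2):               # producer holds even-numbered mails
    bf.add(f"/mail/inbox/{n}")

print(bf.might_contain("/mail/inbox/42"))  # True (present; no false negatives)
print(bf.might_contain("/mail/inbox/43"))  # very likely False (absent)
```

The consumer would only issue Interests for guessed names that pass `might_contain`, trading a small false-positive rate for not flooding the network with guesses.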
>>>>>> >>>>>> One thing to keep in mind, especially with things like time series >>>>>>sensor >>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>>That's easy >>>>>> for a human :) But in Discovery, one should assume that one does >>>>>>not >>>>>>know >>>>>> of patterns in the data beyond what the protocols used to publish >>>>>>the >>>>>>data >>>>>> explicitly require. That said, I think some of the things you >>>>>>listed >>>>>>are >>>>>> good places to start: sensor data, web content, climate data or >>>>>>genome data. >>>>>> >>>>>> We also need to state what the forwarding strategies are and what >>>>>>the >>>>>>cache >>>>>> behavior is. >>>>>> >>>>>> I outlined some of the points that I think are important in that >>>>>>other >>>>>> posting. While "discover latest" is useful, "discover all" is also >>>>>> important, and that one gets complicated fast. So points like >>>>>>separating >>>>>> discovery from retrieval and working with large data sets have been >>>>>> important in shaping our thinking. That all said, I'd be happy >>>>>>starting >>>>>> from 0 and working through the Discovery service definition from >>>>>>scratch >>>>>> along with data set use cases. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>wrote: >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> Thanks - yes, I saw that as well. I was just trying to get one step >>>>>>more >>>>>> specific, which was to see if we could identify a few specific use >>>>>>cases >>>>>> around which to have the conversation. (e.g., time series sensor >>>>>>data and >>>>>> web content retrieval for "get latest"; climate data for huge data >>>>>>sets; >>>>>> local data in a vehicular network; etc.) What have you been looking >>>>>>at >>>>>> that's driving considerations of discovery?
>>>>>> >>>>>> Thanks, >>>>>> Jeff >>>>>> >>>>>> From: >>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>> To: Jeff Burke >>>>>> Cc: , >>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>> >>>>>> Jeff, >>>>>> >>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>Discovery. >>>>>> >>>>>> >>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/00 >>>>>>0 >>>>>>2 >>>>>>00.html >>>>>> >>>>>> I think it would be very productive to talk about what Discovery >>>>>>should do, >>>>>> and not focus on the how. It is sometimes easy to get caught up in >>>>>>the how, >>>>>> which I think is a less important topic than the what at this stage. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>wrote: >>>>>> >>>>>> Marc, >>>>>> >>>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>>>based >>>>>> on use cases. What are the use cases you are using to evaluate >>>>>> discovery? >>>>>> >>>>>> Jeff >>>>>> >>>>>> >>>>>> >>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>wrote: >>>>>> >>>>>> No matter what the expressiveness of the predicates if the forwarder >>>>>>can >>>>>> send interests different ways you don't have a consistent underlying >>>>>>set >>>>>> to talk about so you would always need non-range exclusions to >>>>>>discover >>>>>> every version. >>>>>> >>>>>> Range exclusions only work I believe if you get an authoritative >>>>>>answer. >>>>>> If different content pieces are scattered between different caches I >>>>>> don't see how range exclusions would work to discover every version. >>>>>> >>>>>> I'm sorry to be pointing out problems without offering solutions but >>>>>> we're not ready to publish our discovery protocols. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>>> >>>>>> I see. 
Can you briefly describe how the ccnx discovery protocol solves all >>>>>> the problems that you mentioned (not just exclude)? a doc will be >>>>>> better. >>>>>> >>>>>> My unserious conjecture ( :) ): exclude is equal to [not]. I will soon >>>>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>>>> language or context free language might become part of selector too. >>>>>> >>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>> That will get you one reading then you need to exclude it and ask >>>>>> again. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. >>>>>> >>>>>> >>>>>> I am very confused. For your example, if I want to get all today's >>>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>>> tomorrow..Any). That's 18 bytes. >>>>>> >>>>>> >>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>> >>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>> >>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>>> >>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>> could miss content objects you want to discover unless you avoid >>>>>> all range exclusions and only exclude explicit versions. >>>>>> >>>>>> >>>>>> Could you explain why the missing content object situation happens? also >>>>>> range exclusion is just a shorter notation for many explicit excludes; >>>>>> converting from explicit excludes to a ranged exclude is always >>>>>> possible.
>>>>>> >>>>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. For something like a sensor reading that is updated, >>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>> exclusions (plus encoding overhead) per day. >>>>>> >>>>>> Yes, maybe using a more deterministic version number than a >>>>>> timestamp makes sense here, but it's just an example of needing a lot >>>>>> of exclusions. >>>>>> >>>>>> >>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>> cache B >>>>>> >>>>>> >>>>>> I feel this case is invalid because cache A will also get the >>>>>> interest, and cache A will return v101 if it exists. Like you said, if >>>>>> this goes to cache B only, it means that cache A dies. How do you know >>>>>> that v101 even exists? >>>>>> >>>>>> >>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>> forwarder is just doing "best path" and can round-robin between cache >>>>>> A and cache B, then your application could miss v101. >>>>>> >>>>>> >>>>>> c,d In general I agree that LPM performance is related to the number >>>>>> of components. In my own thread-safe LPM implementation, I used only >>>>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>>>> every node will be faster or not because of lock overhead. >>>>>> >>>>>> However, we should compare (exact match + discovery protocol) vs (ndn >>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>> >>>>>> >>>>>> Yes, we should compare them.
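The back-of-the-envelope arithmetic behind the exclusion-size estimate above is worth making explicit. The figures (one reading per second, 8-byte timestamp per exclusion) come straight from the example in the thread; everything else is just multiplication.

```python
# Size of an exclusion set that individually excludes one sensor reading
# per second for a whole day, per the example in the thread.
readings_per_day = 24 * 60 * 60   # 86,400 readings
exclusion_bytes = 8               # one 8-byte timestamp per exclusion
total = readings_per_day * exclusion_bytes
print(total)                      # 691200 bytes/day, before TLV encoding overhead
```

That is roughly 675 KiB of exclusions per day per name prefix, which is why range exclusions (or an explicit discovery protocol) matter for this workload.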
And we need to publish the ccnx 1.0 >>>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>>> ready to claim it's better yet because we have not done that. >>>>>> >>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>> I would point out that using LPM on content object to Interest >>>>>> matching to do discovery has its own set of problems. Discovery >>>>>> involves more than just "latest version" discovery too. >>>>>> >>>>>> This is probably getting off-topic from the original post about >>>>>> naming conventions. >>>>>> >>>>>> a. If Interests can be forwarded in multiple directions and two >>>>>> different caches are responding, the exclusion set you build up >>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>> content objects you want to discover unless you avoid all range >>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>> explicit discovery protocol that allows conversations about >>>>>> consistent sets is better. >>>>>> >>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>> should be transitive between caches, but imagine this. You send >>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>> through 100 then issue a new interest. This goes to cache B who >>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>> you think you have it! But, cache A already has version 101, you >>>>>> just don't know. If you cannot have a conversation around >>>>>> consistent sets, it seems like even doing latest version discovery >>>>>> is difficult with selector based discovery.
From what I saw in >>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>> authoritative source because you can never believe an intermediate >>>>>> cache that there's not something more recent. >>>>>> >>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>>>> interested in seeing your analysis. Case (a) is that a node can >>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>> a node can correctly discover the latest version. We have not >>>>>> formally compared (or yet published) our discovery protocols (we >>>>>> have three, 2 for content, 1 for device) compared to selector based >>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>> have the non-determinism sketched above. >>>>>> >>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>> must do in the PIT to match a content object. If you have a name >>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>> the content object name and evaluate the selector predicate. >>>>>> Content Based Networking (CBN) had some methods to create data >>>>>> structures based on predicates, maybe those would be better. But >>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>> implementation only requires at most 3 lookups (one by name, one by >>>>>> name + keyid, one by name + content object hash), and one can do >>>>>> other things to optimize lookup for an extra write. >>>>>> >>>>>> d.
In (c) above, if you have a threaded name tree or are just >>>>>> walking parent pointers, I suspect you'll need locking of the >>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>>> and that will be expensive. It would be interesting to see what a >>>>>> cache consistent multi-threaded name tree looks like. >>>>>> >>>>>> Marc >>>>>> >>>>>> >>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>> wrote: >>>>>> >>>>>> I had thought about these questions, but I want to know your idea >>>>>> besides typed component: >>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>> things? >>>>>> 2. will removing selectors improve performance? How do we use >>>>>> other >>>>>> faster techniques to replace selectors? >>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>> byte, but 2 bytes for length might not be enough for the future. >>>>>> >>>>>> >>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>> wrote: >>>>>> >>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>> wrote: >>>>>> >>>>>> I know how to make #2 flexible enough to do what things I can >>>>>> envision we need to do, and with a few simple conventions on >>>>>> how the registry of types is managed. >>>>>> >>>>>> >>>>>> Could you share it with us? >>>>>> >>>>>> Sure. Here's a strawman. >>>>>> >>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>> >>>>>> The type space is currently shared with the types used for the >>>>>> entire protocol, that gives us two options: >>>>>> (1) we reserve a range for name component types. Given the >>>>>> likelihood there will be at least as much and probably more need >>>>>> for component types than protocol extensions, we could reserve 1/2 >>>>>> of the type space, giving us 32K types for name components.
>>>>>> (2) since there is no parsing ambiguity between name components >>>>>> and other fields of the protocol (since they are sub-types of the >>>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>>> name component types. >>>>>> >>>>>> We divide the type space into regions, and manage it with a >>>>>> registry. If we ever get to the point of creating an IETF >>>>>> standard, IANA has 25 years of experience running registries and >>>>>> there are well-understood rule sets for different kinds of >>>>>> registries (open, requires a written spec, requires standards >>>>>> approval). >>>>>> >>>>>> - We allocate one "default" name component type for "generic >>>>>> name", which would be used on name prefixes and other common >>>>>> cases where there are no special semantics on the name component. >>>>>> - We allocate a range of name component types, say 1024, to >>>>>> globally understood types that are part of the base or extension >>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>> - We reserve some portion of the space for unanticipated uses >>>>>> (say another 1024 types) >>>>>> - We give the rest of the space to application assignment. >>>>>> >>>>>> Make sense? >>>>>> >>>>>> >>>>>> While I'm sympathetic to that view, there are three ways in >>>>>> which Moore's law or hardware tricks will not save us from >>>>>> performance flaws in the design >>>>>> >>>>>> >>>>>> we could design for performance, >>>>>> >>>>>> That's not what people are advocating. We are advocating that we >>>>>> *not* design for known bad performance and hope serendipity or >>>>>> Moore's Law will come to the rescue. >>>>>> >>>>>> but I think there will be a turning >>>>>> point when the slower design starts to become "fast enough". >>>>>> >>>>>> Perhaps, perhaps not.
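Dave's strawman registry for the 16-bit type space can be laid out concretely. The exact boundary values below are illustrative only; the text fixes just the proportions (one "generic name" type, roughly 1024 globally understood types, roughly 1024 reserved, and the rest given to applications).

```python
# Sketch of the strawman 16-bit name-component type registry.
# Boundaries are hypothetical; only the region sizes follow the proposal.
TYPE_SPACE = 1 << 16   # 65,536 possible type values

REGIONS = {
    "generic":     range(0, 1),              # the single default "generic name" type
    "global":      range(1, 1 + 1024),       # base/extension types: chunk#, version#, ...
    "reserved":    range(1025, 1025 + 1024), # held back for unanticipated uses
    "application": range(2049, TYPE_SPACE),  # everything else: application assignment
}

def region_of(t):
    """Return which registry region a component type falls into."""
    for name, r in REGIONS.items():
        if t in r:
            return name
    raise ValueError(f"type {t} out of range")
```

The regions partition the space exactly: 1 + 1024 + 1024 + 63,487 = 65,536, leaving applications with the overwhelming majority of the space.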
Relative performance is what matters so >>>>>> things that don't get faster while others do tend to get dropped >>>>>> or not used because they impose a performance penalty relative to >>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>> where improvements in technology get applied to lowering cost >>>>>> rather than improving performance. For those environments bad >>>>>> performance just never gets better. >>>>>> >>>>>> Do you >>>>>> think there will be some design of ndn that will *never* have >>>>>> performance improvement? >>>>>> >>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>> functions). >>>>>> I suspect exclusions will always be slow because they will >>>>>> require extra memory references. >>>>>> >>>>>> However I of course don't claim clairvoyance so this is just >>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>> orders of magnitude and still having to worry about counting >>>>>> cycles and memory references... >>>>>> >>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>> wrote: >>>>>> >>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>> wrote: >>>>>> >>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>> perform >>>>>> well on it. It should be the other way around: once ndn apps >>>>>> become >>>>>> popular, a better chip will be designed for ndn. >>>>>> >>>>>> While I'm sympathetic to that view, there are three ways in >>>>>> which Moore's law or hardware tricks will not save us from >>>>>> performance flaws in the design: >>>>>> a) clock rates are not getting (much) faster >>>>>> b) memory accesses are getting (relatively) more expensive >>>>>> c) data structures that require locks to manipulate >>>>>> successfully will be relatively more expensive, even with >>>>>> near-zero lock contention. >>>>>> >>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>> its design.
We just forgot those because the design elements >>>>>> that depended on those mistakes have fallen into disuse. The >>>>>> poster children for this are: >>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>> anywhere >>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>> specified and is now a giant PITA that still causes major pain >>>>>> in working around. >>>>>> >>>>>> I'm afraid students today are being taught that the designers >>>>>> of IP were flawless, as opposed to very good scientists and >>>>>> engineers that got most of it right. >>>>>> >>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>> Now I >>>>>> see that there are 3 approaches: >>>>>> 1. we should not define a naming convention at all >>>>>> 2. typed component: use tlv type space and add a handful of >>>>>> types >>>>>> 3. marked component: introduce only one more type and add >>>>>> additional >>>>>> marker space >>>>>> >>>>>> I know how to make #2 flexible enough to do what things I can >>>>>> envision we need to do, and with a few simple conventions on >>>>>> how the registry of types is managed. >>>>>> >>>>>> It is just as powerful in practice as either throwing up our >>>>>> hands and letting applications design their own mutually >>>>>> incompatible schemes or trying to make naming conventions with >>>>>> markers in a way that is fast to generate/parse and also >>>>>> resilient against aliasing. >>>>>> >>>>>> Also everybody thinks that the current utf8 marker naming >>>>>> convention >>>>>> needs to be revised. >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>> wrote: >>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>> to fit in (the >>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>> current NDN >>>>>> experiments? >>>>>> >>>>>> I guess wide deployment could make for even longer names.
>>>>>> Related: Many URLs >>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>> text lines, and >>>>>> NDN will have to carry more information than URLs, as far as >>>>>> I see. >>>>>> >>>>>> >>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>> >>>>>> In fact, the index in a separate TLV will be slower on some >>>>>> architectures, >>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>>> bytes in memory, >>>>>> then any subsequent memory is accessed only as two adjacent >>>>>> 32-byte blocks >>>>>> (there can be at most 5 blocks available at any one time). >>>>>> If you need to >>>>>> switch between arrays, it would be very expensive. If you >>>>>> have to read past >>>>>> the name to get to the 2nd array, then read it, then back up >>>>>> to get to the >>>>>> name, it will be pretty expensive too. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>> wrote: >>>>>> >>>>>> Does this make that much difference? >>>>>> >>>>>> If you want to parse the first 5 components, one way to do >>>>>> it is: >>>>>> >>>>>> Read the index, find entry 5, then read in that many bytes >>>>>> from the start >>>>>> offset of the beginning of the name. >>>>>> OR >>>>>> Start reading the name, (find size + move) 5 times. >>>>>> >>>>>> How much speed are you getting from one to the other? You >>>>>> seem to imply >>>>>> that the first one is faster. I don't think this is the >>>>>> case. >>>>>> >>>>>> In the first one you'll probably have to get the cache line >>>>>> for the index, >>>>>> then all the required cache lines for the first 5 >>>>>> components. For the >>>>>> second, you'll have to get all the cache lines for the first >>>>>> 5 components. >>>>>> Given an assumption that a cache miss is way more expensive >>>>>> than >>>>>> evaluating a number and computing an addition, you might >>>>>> find that the >>>>>> performance of the index is actually slower than the >>>>>> performance of the >>>>>> direct access.
>>>>>> >>>>>> Granted, there is a case where you don?t access the name at >>>>>> all, for >>>>>> example, if you just get the offsets and then send the >>>>>> offsets as >>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>> you may see a >>>>>> gain IF there are more cache line misses in reading the name >>>>>> than in >>>>>> reading the index. So, if the regular part of the name >>>>>> that you?re >>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>> name is to be >>>>>> processed by a different processor, then your might see some >>>>>> performance >>>>>> gain in using the index, but in all other circumstances I >>>>>> bet this is not >>>>>> the case. I may be wrong, haven?t actually tested it. >>>>>> >>>>>> This is all to say, I don?t think we should be designing the >>>>>> protocol with >>>>>> only one architecture in mind. (The architecture of sending >>>>>> the name to a >>>>>> different processor than the index). >>>>>> >>>>>> If you have numbers that show that the index is faster I >>>>>> would like to see >>>>>> under what conditions and architectural assumptions. >>>>>> >>>>>> Nacho >>>>>> >>>>>> (I may have misinterpreted your description so feel free to >>>>>> correct me if >>>>>> I?m wrong.) >>>>>> >>>>>> >>>>>> -- >>>>>> Nacho (Ignacio) Solis >>>>>> Protocol Architect >>>>>> Principal Scientist >>>>>> Palo Alto Research Center (PARC) >>>>>> +1(650)812-4458 >>>>>> Ignacio.Solis at parc.com >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>> >>>>>> wrote: >>>>>> >>>>>> Indeed each components' offset must be encoded using a fixed >>>>>> amount of >>>>>> bytes: >>>>>> >>>>>> i.e., >>>>>> Type = Offsets >>>>>> Length = 10 Bytes >>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>> >>>>>> You may also imagine to have a "Offset_2byte" type if your >>>>>> name is too >>>>>> long. 
>>>>>> >>>>>> Max >>>>>> >>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>> >>>>>> if you do not need the entire hierarchical structure (suppose >>>>>> you only >>>>>> want the first x components) you can directly have it using >>>>>> the >>>>>> offsets. With the nested TLV structure you have to >>>>>> iteratively parse >>>>>> the first x-1 components. With the offset structure you can >>>>>> directly >>>>>> access the first x components. >>>>>> >>>>>> I don't get it. What you described only works if the >>>>>> "offset" is >>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>> parse x-1 >>>>>> offsets to get to the x-th offset. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>> wrote: >>>>>> >>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>> >>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>> like the >>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>> understand what >>>>>> you >>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>> entirely >>>>>> different >>>>>> scheme where the info that describes the name-components is >>>>>> ... >>>>>> someplace >>>>>> other than _in_ the name-components. is that correct? when >>>>>> you say >>>>>> "field >>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>> TLV)? >>>>>> >>>>>> Correct. >>>>>> In particular, with our name encoding, a TLV indicates the >>>>>> name >>>>>> hierarchy >>>>>> with offsets in the name and other TLV(s) indicate the >>>>>> offset to use >>>>>> in >>>>>> order to retrieve special components. >>>>>> As for the field separator, it is something like "/". >>>>>> Aliasing is >>>>>> avoided as >>>>>> you do not rely on field separators to parse the name; you >>>>>> use the >>>>>> "offset >>>>>> TLV" to do that.
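The two encodings being compared here can be sketched side by side: a nested-TLV name that must be walked component by component, versus a fixed-width offset table that gives direct access to the first x components. The byte layouts below are illustrative toys, not the actual CCNx or NDN wire formats.

```python
# Toy comparison of iterative TLV parsing vs a fixed-width offset table.
import struct

components = [b"mail", b"inbox", b"list", b"v101"]

# Nested TLV encoding: each component is (type=1, 1-byte length, value).
wire = b"".join(struct.pack("!BB", 1, len(c)) + c for c in components)

def first_x_tlv(buf, x):
    """Iterative parse: must walk every earlier TLV to reach component x."""
    out, pos = [], 0
    for _ in range(x):
        _t, l = struct.unpack_from("!BB", buf, pos)
        out.append(buf[pos + 2 : pos + 2 + l])
        pos += 2 + l
    return out

# Offset table: one fixed-size (here 1-byte) end offset per component, so
# component boundaries are known without walking the earlier TLVs. This is
# the property that breaks down if offsets use a variable-length number.
offsets, pos = [], 0
for c in components:
    pos += 2 + len(c)
    offsets.append(pos)

def first_x_offsets(buf, offs, x):
    """Direct access: jump straight to each component via the offset table."""
    out, start = [], 0
    for i in range(x):
        end = offs[i]
        out.append(buf[start + 2 : end])  # skip the 2-byte TL header
        start = end
    return out
```

Both return the same components; the difference is only in how much of the buffer must be scanned, which is the cache-line argument made earlier in the thread.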
>>>>>> >>>>>> So now, it may be an aesthetic question but: >>>>>> >>>>>> if you do not need the entire hierarchical structure (suppose >>>>>> you only >>>>>> want >>>>>> the first x components) you can directly have it using the >>>>>> offsets. >>>>>> With the >>>>>> nested TLV structure you have to iteratively parse the first >>>>>> x-1 >>>>>> components. >>>>>> With the offset structure you can directly access the >>>>>> first x >>>>>> components. >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> -- Mark >>>>>> >>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>> >>>>>> The why is simple: >>>>>> >>>>>> You use a lot of "generic component type" and very few >>>>>> "specific >>>>>> component type". You are imposing types for every component >>>>>> in order >>>>>> to >>>>>> handle a few exceptions (segmentation, etc.). You create a >>>>>> rule >>>>>> (specify >>>>>> the component's type) to handle exceptions! >>>>>> >>>>>> I would prefer not to have typed components. Instead I would >>>>>> prefer >>>>>> to >>>>>> have the name as a simple sequence of bytes with a field >>>>>> separator. Then, >>>>>> outside the name, if you have some components that could be >>>>>> used at >>>>>> the network layer (e.g. a TLV field), you simply need something >>>>>> that >>>>>> indicates which is the offset allowing you to retrieve the >>>>>> version, >>>>>> segment, etc in the name... >>>>>> >>>>>> >>>>>> Max >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>> >>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>> >>>>>> I think we agree on the small number of "component types". >>>>>> However, if you have a small number of types, you will end >>>>>> up with >>>>>> names >>>>>> containing many generic component types and few specific >>>>>> component >>>>>> types.
Due to the fact that the component type specification >>>>>> is an >>>>>> exception in the name, I would prefer something that specify >>>>>> component's >>>>>> type only when needed (something like UTF8 conventions but >>>>>> that >>>>>> applications MUST use). >>>>>> >>>>>> so ... I can't quite follow that. the thread has had some >>>>>> explanation >>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>> e.g.) >>>>>> and >>>>>> there's been email trying to explain that applications don't >>>>>> have to >>>>>> use types if they don't need to. your email sounds like "I >>>>>> prefer >>>>>> the >>>>>> UTF8 convention", but it doesn't say why you have that >>>>>> preference in >>>>>> the face of the points about the problems. can you say why >>>>>> it is >>>>>> that >>>>>> you express a preference for the "convention" with problems ? >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> . >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> 
_______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >
Exact matching gives us that. If you're interested in selector matching and searching, then create that protocol over exact matching. Marc just described a simple "Selector Protocol", basically: - Encode selectors (or any query you want) in the interest payload. - Add a name segment to indicate that this is a selector based query - Add a name segment to uniquely identify the query (a hash of the payload for example) Example: name = /mail/inbox/list/selector_matching/ payload = version > 100 Topology: A -- B -- C A and C run the Selector Protocol B does not run the Selector Protocol Now: - Any node that does not understand the Selector Protocol (B) forwards normally and does exact matching. - Any node that understands the Selector Protocol (C) can parse the payload to find a match. If no match is found, forward the interest. If a match is found, create a reply. The reply can contain 2 types of data: - Structured data with links to the actual content objects - Encapsulated content objects So, in our example, the Selector Protocol reply could be: name = /mail/inbox/list/selector_matching/ payload = [ matching name = /mail/inbox/list/v101 ] [ embedded object < name = /mail/inbox/list/v101, payload = list, signature = mail server > ] signature = responding cache A few notes: - Malicious nodes could inject false replies. So, if C is malicious, it can insert a reply linking to some random object or just return junk. Well, this would be the case with regular selectors as well. C could reply with random crap or it could reply with a valid answer that is not the optimal answer (so, for example, not the right-most child or something). This is something that we can't prevent. In the case of CCN, our fast path does not check signatures, so you wouldn't be able to check the signature of the reply no matter what. I'm unsure if NDN is still advocating that every node checks signatures. If you are, then this approach might not work for you.
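The name construction Nacho describes (query in the Interest payload, digest of the payload as the final name segment so the query is exactly matchable) can be sketched in a few lines. The helper name, the digest truncation, and the query syntax are all assumptions made for illustration; only the /mail/inbox/list/selector_matching naming scheme comes from the thread.

```python
# Sketch of building a "Selector Protocol" Interest name: the selector
# query travels in the payload, and its hash becomes a name segment so
# that plain exact-match forwarders can handle the Interest unmodified.
import hashlib

def selector_interest(prefix, query):
    """Return (name, payload) for a hypothetical selector-matching Interest."""
    digest = hashlib.sha256(query).hexdigest()[:16]  # truncated for readability
    name = f"{prefix}/selector_matching/{digest}"
    return name, query

name, payload = selector_interest("/mail/inbox/list", b"version > 100")
# A node like B that does not speak the protocol exact-matches `name`;
# a node like C that does can parse `payload`, answer, or forward.
```

Because the digest is deterministic, two consumers issuing the same query produce the same name, so caches can aggregate and answer them under plain exact matching.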
Nodes that DO understand the Selector Protocol can check the signature of the encapsulated reply (if they wanted to). Nodes that DO understand the Selector Protocol can unpack the reply, and add the corresponding object to their cache, effectively enabling them to answer other Selector Protocol queries. - The reply from the Selector Protocol enabled node (C) could: - include a list of all valid answers - embed no objects - embed more than 1 object - process complex queries, regex, etc. The Selector Protocol could also: - include a method for authentication - include a cursor or some other state between queries I think this sort of protocol gives you everything you want while still maintaining an exact match protocol as the core protocol. What is this protocol missing to satisfy your needs? Can we create a protocol that will satisfy your needs on top of exact matching? Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com From christian.tschudin at unibas.ch Thu Sep 25 03:04:57 2014 From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch) Date: Thu, 25 Sep 2014 12:04:57 +0200 (CEST) Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: On Thu, 25 Sep 2014, Ignacio.Solis at parc.com wrote: > On 9/25/14, 9:17 AM, "Marc.Mosko at parc.com" wrote: >> In the CCNx 1.0 spec, one could also encode this a different way. One >> could use a name like "/mail/inbox/selector_matching/" >> and in the payload include "exclude_before=(t=version, l=2, v=279) & >> sort=right". this discussion turns into a thread on how to encode function calls. It would be nice if CCNx 1.0 would go ahead and offer a general function call schema and apply it itself. For example, the CCNx 1.0 spec still has a special field "contentObjectHash" just for invoking the compare-the-message-digest function, same for "keyIdRestriction".
I would like to see them handled with a common function call schema, and dealt with at the same level as selectors as you propose, namely as a request to the "network as a whole" without mandating that each node has to satisfy it.

Along the line of Marc's notation:

"/mail/inbox/20140925/"
and in the payload include "matchObjectHash(h=abcd) & matchKeyId(i=xxx)"

Of course, in PARC's network each node will honor such function calls; but other networks could opt to guarantee that semantics edge-to-edge, yet remain interoperable (=catenet-friendly).

This links to your main point, namely which functions have to be built in at each forwarder (as opposed to the network-as-a-whole). I agree with you that we should examine the requirements for letting people like us (using selectors or named functions) use your substrate. Here is one more wish, beyond the generic function-call packet format above:

- Distinguish the name on which to route from the name of the object.

  This can be implemented by having a mandatory pointer field
  "nameUnderRoute" in the fixed header that points to either the
  name in the interest, or a field in the optional headers section
  (inserted by some forwarder).

  The reason for this is that an object's name does not necessarily
  reflect where the request is best satisfied - it could be a
  function's name. For example, there could be a few nodes *in* your
  network knowing how to handle selectors, so you might want to
  route to them. The network will have to figure this out.

  Another use case is virtualization tricks like label stacks.

best, christian

>
> I want to highlight this.
>
> There is a role that selectors can play in a network. However, our
> biggest issue with selectors is that they are mandated at the forwarder
> level. This means that every node must support selectors.
>
> We want to make sure that the core protocol is simple and efficient.
> Exact matching gives us that.
From Ignacio.Solis at parc.com Thu Sep 25 03:22:20 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Thu, 25 Sep 2014 10:22:20 +0000
Subject: [Ndn-interest] Selector Protocol over exact matching
In-Reply-To: References: Message-ID:

On 9/25/14, 12:04 PM, "christian.tschudin at unibas.ch" wrote:

>On Thu, 25 Sep 2014, Ignacio.Solis at parc.com wrote:
>
>> On 9/25/14, 9:17 AM, "Marc.Mosko at parc.com" wrote:
>>> In the CCNx 1.0 spec, one could also encode this a different way. One
>>> could use a name like "/mail/inbox/selector_matching/"
>>> and in the payload include "exclude_before=(t=version, l=2, v=279) &
>>> sort=right".
>
>This discussion turns into a thread on how to encode function calls. It
>would be nice if CCNx 1.0 would go ahead and offer a general function
>call schema and apply it itself.
>
>For example, the CCNx 1.0 spec still has a special field
>"contentObjectHash" just for invoking the compare-the-message-digest
>function, same for "keyIdRestriction". I would like to see them handled
>with a common function call schema and dealt with at the same
>level as selectors as you propose, namely as a request to the "network
>as a whole" without mandating that each node has to satisfy it.

This is a really good point. Those can, in fact, be described in a general way. However, we believe that they are important enough that we can't have them be optional. We use them at every node.

They are part of the data that identifies the content, but are not used for (FIB) forwarding. Our network allows objects with the same name but a different KeyId or a different hash. Every node needs to be able to distinguish these. Hence, we can't make these optional.

>Along the line of Marc's notation:
>
>"/mail/inbox/20140925/"
>and in the payload include "matchObjectHash(h=abcd) & matchKeyId(i=xxx)"
>
>Of course, in PARC's network each node will honor such function calls;
>but other networks could opt to guarantee that semantics edge-to-edge,
>yet remain interoperable (=catenet-friendly).
>
>This links to your main point, namely which functions have to be
>built in at each forwarder (as opposed to the network-as-a-whole). I agree
>with you that we should examine the requirements for letting people like
>us (using selectors or named functions) use your substrate.

We believe these are required functions. We use them for various things (like self-certified names and nameless objects) that are required at every node.
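[The matching rule described here - exact name match plus the mandatory KeyId and ContentObjectHash restrictions - could be sketched roughly as a content-store check. The dict field names and SHA-256 are illustrative assumptions, not the CCNx wire format.]

```python
import hashlib

def satisfies(interest, obj):
    """Check whether a cached content object satisfies an interest:
    exact match on the full name, plus the two mandatory restrictions
    (KeyId and ContentObjectHash) whenever the interest carries them.
    Packets are modeled as plain dicts; field names are illustrative."""
    if interest["name"] != obj["name"]:
        return False  # exact match on the full name, no prefix semantics
    kid = interest.get("key_id_restriction")
    if kid is not None and kid != obj["key_id"]:
        return False  # same name but a different signer: not a match
    h = interest.get("hash_restriction")
    if h is not None and h != hashlib.sha256(obj["wire"]).digest():
        return False  # restriction names one specific object by digest
    return True
```

[Because every node performs these checks, two objects with the same name but different signers or digests can coexist in caches without being confused - which is why the fields cannot be optional.]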
>Here is one
>more wish, beyond the generic function-call packet format above:
>
>- Distinguish the name on which to route from the name of the object.

We agree with this. But the "name of the object" is out of scope of the network protocol. The network cares about the network name. The name of the object is something that is associated more with manifests (and metadata).

BTW, this is one reason why we like nameless objects: they can be used for any name (and any manifest), independently of location and routing.

The name that a user gives an object is NOT the same as the name the network gives the object. Otherwise we would be renaming (and re-signing/re-encrypting) every network object every time you rename a file, or move a directory, or move from one location to another.

> Another use case is virtualization tricks like label stacks.

We cover some of these with manifests. They can be used as advanced links.

Nacho

From Marc.Mosko at parc.com Thu Sep 25 03:30:26 2014
From: Marc.Mosko at parc.com (Marc.Mosko at parc.com)
Date: Thu, 25 Sep 2014 10:30:26 +0000
Subject: [Ndn-interest] Selector Protocol over exact matching
In-Reply-To: References: Message-ID:

In the ccnx 1.0 packet format:

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+---------------+
|    Version    |  PacketType   |         PayloadLength         |
+---------------+---------------+---------------+---------------+
|  HopLimit (I)*|   reserved    |          HeaderLength         |
+---------------+---------------+---------------+---------------+
/                    Optional header TLVs                       /
+---------------+---------------+---------------+---------------+
|                      CCNx Message TLV                         /
+---------------+---------------+---------------+---------------+
/           Optional CCNx ValidationAlgorithm TLV               /
+---------------+---------------+---------------+---------------+
/ Optional CCNx ValidationPayload TLV (ValidationAlg required)  /
+---------------+---------------+---------------+---------------+

which shares
with NDN the property that the validation is put around the core message, things like the KeyId are particular to the authentication, not necessarily integral to the name.

For example, a sensor could publish /parc/building33/pod22/room2270/t=20140925T110203 with an HMAC signature identified by a key exchange protocol identifier "33790" with its IoT gateway. The IoT gateway might then strip away the HMAC validator and sign it with an RSA signature with a new validator with a standard KeyId. Therefore, I would not want to be including things like KeyId in the name. Though of course the IoT gateway could re-name the measurement...

Our approach was that every function applied in the fast path should have a distinguished field; thus the KeyIdRestriction and ContentObjectHashRestriction are distinguished TLV fields, not buried in the name. Things that must execute in the forwarding for correct operation need to be governed by the "version" of the packet header. We would say that the correct processing of these two restrictions is necessary for correct operation.

Optional or proprietary forwarding hints could be encoded in the name, in the "optional header tlvs", or in vendor-specific (assigned) TLVs in the message body. They would be skipped by systems that don't know about them.

Marc

On Sep 25, 2014, at 12:04 PM, christian.tschudin at unibas.ch wrote:

> On Thu, 25 Sep 2014, Ignacio.Solis at parc.com wrote:
>
>> On 9/25/14, 9:17 AM, "Marc.Mosko at parc.com" wrote:
>>> In the CCNx 1.0 spec, one could also encode this a different way. One
>>> could use a name like "/mail/inbox/selector_matching/"
>>> and in the payload include "exclude_before=(t=version, l=2, v=279) &
>>> sort=right".
>
> This discussion turns into a thread on how to encode function calls. It would be nice if CCNx 1.0 would go ahead and offer a general function call schema and apply it itself.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2595 bytes
Desc: not available
URL:

From haroonr at iiitd.ac.in Thu Sep 25 08:20:39 2014
From: haroonr at iiitd.ac.in (Haroon Rashid)
Date: Thu, 25 Sep 2014 20:50:39 +0530
Subject: [Ndn-interest] Simulation vs Emulation of NDN applications
Message-ID:

Hello All,

I need your guidance in implementing NDN applications. Please correct me on the following points:

1. I want to develop some applications and do measurements in the NDN architecture. Since I am new to this field, I think it will be easier to start with ndnSIM rather than going for the NDNX/CCNX emulators directly. Once I get a good grip on this architecture, I can then start working with emulators. Am I going in the right direction, or should I change my approach?

2. If I work on ndnSIM, do I need to understand the NS-3 architecture first, or should I start working with ndnSIM directly? In either case, how long will it take to get a good grip on ndnSIM if I devote 3 hours daily? Regarding my background: I know C++, but I have never used NS-3.

3. Is there anything else you would suggest for a better understanding of NDN, apart from reading the papers? I have already read many papers and most of the concepts are clear now.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dibenede at cs.colostate.edu Thu Sep 25 09:01:10 2014
From: dibenede at cs.colostate.edu (Steve DiBenedetto)
Date: Thu, 25 Sep 2014 18:01:10 +0200
Subject: [Ndn-interest] Simulation vs Emulation of NDN applications
In-Reply-To: References: Message-ID:

On Sep 25, 2014, at 5:20 PM, Haroon Rashid wrote:

>
> Hello All,
>
> I need your guidance in implementing NDN applications. Please correct me on the following points:
>
> 1. I want to develop some applications and do measurements in the NDN architecture. Since I am new to this field, I think it will be easier to start with ndnSIM rather than going for the NDNX/CCNX emulators directly.
> Once I get a good grip on this architecture, I can then start working with emulators. Am I going in the right direction, or should I change my approach?

Can you tell us a bit more about what you want to do?

ndnSIM is good for performing experiments, especially if you're interested in large topologies. If you're more interested in writing real NDN applications, some kinds of measurement (e.g. performance on live networks), or seeing how some concept translates into NDN, then NFD and the NDN library for your preferred programming language would be the better choice.

> 2. If I work on ndnSIM, do I need to understand the NS-3 architecture first, or should I start working with ndnSIM directly? In either case, how long will it take to get a good grip on ndnSIM if I devote 3 hours daily? Regarding my background: I know C++, but I have never used NS-3.

In general, it's a good idea to have a basic understanding of NS-3's object & factory system, callbacks, tracing, and maybe some other concepts, because they're ubiquitous. However, I think you can avoid a lot of the other details if you're only interested in the application domain.

> 3. Is there anything else you would suggest for a better understanding of NDN, apart from reading the papers? I have already read many papers and most of the concepts are clear now.

I would encourage you to start writing applications to figure out what gaps remain.
If you're interested in writing C++, the ndn-cxx library has some simple producer/consumer examples to help you get started:

http://named-data.net/doc/ndn-cxx/current/examples.html

If you prefer to code in something like Python, I will shamelessly plug the code walkthrough I put together for ICN:

https://github.com/dibenede/NFD-ICN2014/blob/ICN2014/README-ICN2014.md
https://github.com/dibenede/NFD-ICN2014/tree/ICN2014/ICN2014-apps

(However, you should install NFD according to the official instructions here http://named-data.net/doc/NFD/current/INSTALL.html rather than using the one provided by the above github links.)

-Steve

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lanwang at memphis.edu Thu Sep 25 11:41:38 2014
From: lanwang at memphis.edu (Lan Wang (lanwang))
Date: Thu, 25 Sep 2014 18:41:38 +0000
Subject: [Ndn-interest] NDNcomm 2014 videos
In-Reply-To: References: Message-ID: <256B7501-66FE-4004-97BE-EC5EFA0F0939@memphis.edu>

Some videos contain long periods of silence (during breaks). It would be best to remove these before publishing.

Lan

On Sep 25, 2014, at 12:07 AM, "Burke, Jeff" wrote:

Yes. If someone can help, we can get this up and running... :) Probably about a week to port and a week to test with the specific video files.

Jeff

From: Alex Horn
Date: Wed, 24 Sep 2014 15:28:15 -0700
Cc: "ndn-interest at lists.cs.ucla.edu"
Subject: Re: [Ndn-interest] NDNcomm 2014 videos

Indeed - we had an NDN video testbed distribution for a few years! ndn-video (source, tech report) was very useful in testing the early testbed. Unfortunately it is out of date, as it:

a) uses pyccn/ccnx - needs to be updated to pyndn2/NFD
b) uses gst 0.10 - needs to be updated to gst 1.X

We don't internally have the resources for that at the moment (our recent video work has been in ndn-RTC), but if someone wanted to take it on, it is a fairly straightforward effort.
Meanwhile, we will get the conference video online in some form as soon as possible.

Thanks for your interest!

Alex

On Wed, Sep 24, 2014 at 1:54 PM, Xiaoke Jiang wrote:

On Wednesday, 24 September, 2014 at 1:46 pm, Beichuan Zhang wrote:

Maybe we can stream it over the NDN testbed :-) There are 3 nodes in China.

Good solution. Some other guys in China have also complained about the slow connection. Also, NDN could show its advantage in data delivery with this solution.

Xiaoke (Shock)

_______________________________________________
Ndn-interest mailing list
Ndn-interest at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lanwang at memphis.EDU Thu Sep 25 12:53:45 2014
From: lanwang at memphis.EDU (Lan Wang (lanwang))
Date: Thu, 25 Sep 2014 19:53:45 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com>
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com>
Message-ID:

How can a cache respond to /mail/inbox/selector_matching/ with a table of contents? This name prefix is owned by the mail server. Also, the reply really depends on what is in the cache at the moment, so the same name would correspond to different data.

Lan

On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote:

> My beating on "discover all" is exactly because of this. Let's define discovery service. If the service is just "discover latest" (left/right), can we not simplify the current approach? If the service includes more than "latest", then is the current approach the right approach?
>
> Sync has its place and is the right solution for some things. However, it should not be a bandage over discovery. Discovery should be its own valid and useful service.
>
> I agree that the exclusion approach can work, and work relatively well, for finding the rightmost/leftmost child. I believe this is because that operation is transitive through caches. So, within whatever timeout an application is willing to wait to find the "latest", it can keep asking and asking.
>
> I do think it would be best to actually try to ask an authoritative source first (i.e. a non-cached value), and if that fails then probe caches, but experimentation may show what works well. This is based on my belief that in the real world in broad use, the namespace will become pretty polluted and probing will result in a lot of junk, but that's future prognosticating.
>
> Also, in the exact match vs. continuation match of content object to interest, it is pretty easy to encode that "selector" request in a name component (i.e. "exclude_before=(t=version, l=2, v=279) & sort=right") and any participating cache can respond with a link (or encapsulate) a response in an exact match system.
>
> In the CCNx 1.0 spec, one could also encode this a different way. One could use a name like "/mail/inbox/selector_matching/" and in the payload include "exclude_before=(t=version, l=2, v=279) & sort=right". This means that any cache that could process the "selector_matching" function could look at the interest payload and evaluate the predicate there. The predicate could become large and not pollute the PIT with all the computation state. Including ?? in the name means that one could get a cached response if someone else had asked the same exact question (subject to the content object's cache lifetime), and it also serves to multiplex different payloads for the same function (selector_matching).
>
> Marc
>
> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote:
>
>>
>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf
>>
>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationPr
>> otocol.html
>>
>> J.
>> >> >> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >> >>> However, I cannot see whether we can achieve "best-effort *all*-value" >>> efficiently. >>> There are still interesting topics on >>> 1. how do we express the discovery query? >>> 2. is selector "discovery-complete"? i. e. can we express any >>> discovery query with current selector? >>> 3. if so, can we re-express current selector in a more efficient way? >>> >>> I personally see a named data as a set, which can then be categorized >>> into "ordered set", and "unordered set". >>> some questions that any discovery expression must solve: >>> 1. is this a nil set or not? nil set means that this name is the leaf >>> 2. set contains member X? >>> 3. is set ordered or not >>> 4. (ordered) first, prev, next, last >>> 5. if we enforce component ordering, answer question 4. >>> 6. recursively answer all questions above on any set member >>> >>> >>> >>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>> wrote: >>>> >>>> >>>> From: >>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>> To: Jeff Burke >>>> Cc: , , >>>> >>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>> >>>> I think Tai-Lin?s example was just fine to talk about discovery. >>>> /blah/blah/value, how do you discover all the ?value?s? Discovery >>>> shouldn?t >>>> care if its email messages or temperature readings or world cup photos. >>>> >>>> >>>> This is true if discovery means "finding everything" - in which case, >>>> as you >>>> point out, sync-style approaches may be best. But I am not sure that >>>> this >>>> definition is complete. The most pressing example that I can think of >>>> is >>>> best-effort latest-value, in which the consumer's goal is to get the >>>> latest >>>> copy the network can deliver at the moment, and may not care about >>>> previous >>>> values or (if freshness is used well) potential later versions. >>>> >>>> Another case that seems to work well is video seeking. 
Let's say I >>>> want to >>>> enable random access to a video by timecode. The publisher can provide a >>>> time-code based discovery namespace that's queried using an Interest >>>> that >>>> essentially says "give me the closest keyframe to 00:37:03:12", which >>>> returns an interest that, via the name, provides the exact timecode of >>>> the >>>> keyframe in question and a link to a segment-based namespace for >>>> efficient >>>> exact match playout. In two roundtrips and in a very lightweight way, >>>> the >>>> consumer has random access capability. If the NDN is the moral >>>> equivalent >>>> of IP, then I am not sure we should be afraid of roundtrips that provide >>>> this kind of functionality, just as they are used in TCP. >>>> >>>> >>>> I described one set of problems using the exclusion approach, and that >>>> an >>>> NDN paper on device discovery described a similar problem, though they >>>> did >>>> not go into the details of splitting interests, etc. That all was >>>> simple >>>> enough to see from the example. >>>> >>>> Another question is how does one do the discovery with exact match >>>> names, >>>> which is also conflating things. You could do a different discovery >>>> with >>>> continuation names too, just not the exclude method. >>>> >>>> As I alluded to, one needs a way to talk with a specific cache about its >>>> ?table of contents? for a prefix so one can get a consistent set of >>>> results >>>> without all the round-trips of exclusions. Actually downloading the >>>> ?headers? of the messages would be the same bytes, more or less. In a >>>> way, >>>> this is a little like name enumeration from a ccnx 0.x repo, but that >>>> protocol has its own set of problems and I?m not suggesting to use that >>>> directly. >>>> >>>> One approach is to encode a request in a name component and a >>>> participating >>>> cache can reply. It replies in such a way that one could continue >>>> talking >>>> with that cache to get its TOC. 
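As a toy rendering of the two-roundtrip seek flow described above (the names and the keyframe table are invented for the example, not taken from any NDN application):

```python
import bisect

# Hypothetical publisher-side index: keyframe timecode (in frames) -> segment name.
KEYFRAMES = {0: "/video/seg/0", 900: "/video/seg/31", 1800: "/video/seg/63"}

def closest_keyframe(timecode):
    """Roundtrip 1: 'give me the closest keyframe at or before T' returns a
    name carrying the exact keyframe timecode plus a link for exact-match
    playout (roundtrip 2 then fetches the linked segment)."""
    times = sorted(KEYFRAMES)
    i = max(bisect.bisect_right(times, timecode) - 1, 0)
    t = times[i]
    return f"/video/keyframes/{t}", KEYFRAMES[t]

name, link = closest_keyframe(1000)   # -> ("/video/keyframes/900", "/video/seg/31")
```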
One would then issue another interest >>>> with >>>> a request for not-that-cache. >>>> >>>> >>>> I'm curious how the TOC approach works in a multi-publisher scenario? >>>> >>>> >>>> Another approach is to try to ask the authoritative source for the >>>> ?current? >>>> manifest name, i.e. /mail/inbox/current/, which could return the >>>> manifest or a link to the manifest. Then fetching the actual manifest >>>> from >>>> the link could come from caches because you how have a consistent set of >>>> names to ask for. If you cannot talk with an authoritative source, you >>>> could try again without the nonce and see if there?s a cached copy of a >>>> recent version around. >>>> >>>> Marc >>>> >>>> >>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: >>>> >>>> >>>> >>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>> wrote: >>>> >>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>> >>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>> pattern with static (/mail/inbox) and variable (148) components; with >>>> proper naming convention, computers can also detect this pattern >>>> easily. Now I want to look for all mails in my inbox. I can generate a >>>> list of /mail/inbox/. These are my guesses, and with selectors >>>> I can further refine my guesses. >>>> >>>> >>>> I think this is a very bad example (or at least a very bad application >>>> design). You have an app (a mail server / inbox) and you want it to >>>> list >>>> your emails? An email list is an application data structure. I don?t >>>> think you should use the network structure to reflect this. >>>> >>>> >>>> I think Tai-Lin is trying to sketch a small example, not propose a >>>> full-scale approach to email. (Maybe I am misunderstanding.) 
>>>> >>>> >>>> Another way to look at it is that if the network architecture is >>>> providing >>>> the equivalent of distributed storage to the application, perhaps the >>>> application data structure could be adapted to match the affordances of >>>> the network. Then it would not be so bad that the two structures were >>>> aligned. >>>> >>>> >>>> I?ll give you an example, how do you delete emails from your inbox? If >>>> an >>>> email was cached in the network it can never be deleted from your inbox? >>>> >>>> >>>> This is conflating two issues - what you are pointing out is that the >>>> data >>>> structure of a linear list doesn't handle common email management >>>> operations well. Again, I'm not sure if that's what he was getting at >>>> here. But deletion is not the issue - the availability of a data object >>>> on the network does not necessarily mean it's valid from the perspective >>>> of the application. >>>> >>>> Or moved to another mailbox? Do you rely on the emails expiring? >>>> >>>> This problem is true for most (any?) situations where you use network >>>> name >>>> structure to directly reflect the application data structure. >>>> >>>> >>>> Not sure I understand how you make the leap from the example to the >>>> general statement. >>>> >>>> Jeff >>>> >>>> >>>> >>>> >>>> Nacho >>>> >>>> >>>> >>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>> >>>> Ok, yes I think those would all be good things. >>>> >>>> One thing to keep in mind, especially with things like time series >>>> sensor >>>> data, is that people see a pattern and infer a way of doing it. That?s >>>> easy >>>> for a human :) But in Discovery, one should assume that one does not >>>> know >>>> of patterns in the data beyond what the protocols used to publish the >>>> data >>>> explicitly require. That said, I think some of the things you listed >>>> are >>>> good places to start: sensor data, web content, climate data or genome >>>> data. 
>>>> >>>> We also need to state what the forwarding strategies are and what the >>>> cache >>>> behavior is. >>>> >>>> I outlined some of the points that I think are important in that other >>>> posting. While ?discover latest? is useful, ?discover all? is also >>>> important, and that one gets complicated fast. So points like >>>> separating >>>> discovery from retrieval and working with large data sets have been >>>> important in shaping our thinking. That all said, I?d be happy >>>> starting >>>> from 0 and working through the Discovery service definition from >>>> scratch >>>> along with data set use cases. >>>> >>>> Marc >>>> >>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>> wrote: >>>> >>>> Hi Marc, >>>> >>>> Thanks ? yes, I saw that as well. I was just trying to get one step >>>> more >>>> specific, which was to see if we could identify a few specific use >>>> cases >>>> around which to have the conversation. (e.g., time series sensor data >>>> and >>>> web content retrieval for "get latest"; climate data for huge data >>>> sets; >>>> local data in a vehicular network; etc.) What have you been looking at >>>> that's driving considerations of discovery? >>>> >>>> Thanks, >>>> Jeff >>>> >>>> From: >>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>> To: Jeff Burke >>>> Cc: , >>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>> >>>> Jeff, >>>> >>>> Take a look at my posting (that Felix fixed) in a new thread on >>>> Discovery. >>>> >>>> >>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/00020 >>>> 0 >>>> .html >>>> >>>> I think it would be very productive to talk about what Discovery should >>>> do, >>>> and not focus on the how. It is sometimes easy to get caught up in the >>>> how, >>>> which I think is a less important topic than the what at this stage. 
>>>> >>>> Marc >>>> >>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>> wrote: >>>> >>>> Marc, >>>> >>>> If you can't talk about your protocols, perhaps we can discuss this >>>> based >>>> on use cases. What are the use cases you are using to evaluate >>>> discovery? >>>> >>>> Jeff >>>> >>>> >>>> >>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>> wrote: >>>> >>>> No matter what the expressiveness of the predicates if the forwarder >>>> can >>>> send interests different ways you don't have a consistent underlying >>>> set >>>> to talk about so you would always need non-range exclusions to discover >>>> every version. >>>> >>>> Range exclusions only work I believe if you get an authoritative >>>> answer. >>>> If different content pieces are scattered between different caches I >>>> don't see how range exclusions would work to discover every version. >>>> >>>> I'm sorry to be pointing out problems without offering solutions but >>>> we're not ready to publish our discovery protocols. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>> >>>> I see. Can you briefly describe how ccnx discovery protocol solves the >>>> all problems that you mentioned (not just exclude)? a doc will be >>>> better. >>>> >>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will soon >>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>> language or context free language might become part of selector too. >>>> >>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>> That will get you one reading then you need to exclude it and ask >>>> again. >>>> >>>> Sent from my telephone >>>> >>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes not range excludes if you want to discover all the versions >>>> of an object. >>>> >>>> >>>> I am very confused. 
For your example, if I want to get all today's >>>> sensor data, I just do (Any..Last second of last day)(First second of >>>> tomorrow..Any). That's 18 bytes. >>>> >>>> >>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>> >>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>> >>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>> >>>> If you talk sometimes to A and sometimes to B, you very easily >>>> could miss content objects you want to discover unless you avoid >>>> all range exclusions and only exclude explicit versions. >>>> >>>> >>>> Could you explain why the missing content object situation happens? Also, >>>> range exclusion is just a shorter notation for many explicit >>>> excludes; >>>> converting from explicit excludes to a ranged exclude is always >>>> possible. >>>> >>>> >>>> Yes, my point was that if you cannot talk about a consistent set >>>> with a particular cache, then you need to always use individual >>>> excludes, not range excludes, if you want to discover all the versions >>>> of an object. For something like a sensor reading that is updated, >>>> say, once per second you will have 86,400 of them per day. If each >>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>> exclusions (plus encoding overhead) per day. >>>> >>>> Yes, maybe using a more deterministic version number than a >>>> timestamp makes sense here, but it's just an example of needing a lot >>>> of exclusions. >>>> >>>> >>>> You exclude through 100 then issue a new interest. This goes to >>>> cache B >>>> >>>> >>>> I feel this case is invalid because cache A will also get the >>>> interest, and cache A will return v101 if it exists. Like you said, >>>> if >>>> this goes to cache B only, it means that cache A dies. How do you >>>> know >>>> that v101 even exists? >>>> >>>> >>>> I guess this depends on what the forwarding strategy is. 
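The per-version exclusion cost quoted above is easy to check: one reading per second for a day, one 8-byte timestamp exclusion each (TLV encoding overhead not counted):

```python
readings_per_day = 24 * 60 * 60           # one sensor reading per second
bytes_per_exclusion = 8                   # 8-byte timestamp per explicit exclude
exclusion_bytes = readings_per_day * bytes_per_exclusion
print(readings_per_day, exclusion_bytes)  # 86400 691200
```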
If the >>>> forwarder will always send each interest to all replicas, then yes, >>>> modulo packet loss, you would discover v101 on cache A. If the >>>> forwarder is just doing ?best path? and can round-robin between cache >>>> A and cache B, then your application could miss v101. >>>> >>>> >>>> >>>> c,d In general I agree that LPM performance is related to the number >>>> of components. In my own thread-safe LMP implementation, I used only >>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>> every node will be faster or not because of lock overhead. >>>> >>>> However, we should compare (exact match + discovery protocol) vs >>>> (ndn >>>> lpm). Comparing performance of exact match to lpm is unfair. >>>> >>>> >>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>> specs for doing the exact match discovery. So, as I said, I?m not >>>> ready to claim its better yet because we have not done that. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>> I would point out that using LPM on content object to Interest >>>> matching to do discovery has its own set of problems. Discovery >>>> involves more than just ?latest version? discovery too. >>>> >>>> This is probably getting off-topic from the original post about >>>> naming conventions. >>>> >>>> a. If Interests can be forwarded multiple directions and two >>>> different caches are responding, the exclusion set you build up >>>> talking with cache A will be invalid for cache B. If you talk >>>> sometimes to A and sometimes to B, you very easily could miss >>>> content objects you want to discovery unless you avoid all range >>>> exclusions and only exclude explicit versions. That will lead to >>>> very large interest packets. In ccnx 1.0, we believe that an >>>> explicit discovery protocol that allows conversations about >>>> consistent sets is better. >>>> >>>> b. Yes, if you just want the ?latest version? 
discovery that >>>> should be transitive between caches, but imagine this. You send >>>> Interest #1 to cache A which returns version 100. You exclude >>>> through 100 then issue a new interest. This goes to cache B who >>>> only has version 99, so the interest times out or is NACK?d. So >>>> you think you have it! But, cache A already has version 101, you >>>> just don?t know. If you cannot have a conversation around >>>> consistent sets, it seems like even doing latest version discovery >>>> is difficult with selector based discovery. From what I saw in >>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>> authoritative source because you can never believe an intermediate >>>> cache that there?s not something more recent. >>>> >>>> I?m sure you?ve walked through cases (a) and (b) in ndn, I?d be >>>> interest in seeing your analysis. Case (a) is that a node can >>>> correctly discover every version of a name prefix, and (b) is that >>>> a node can correctly discover the latest version. We have not >>>> formally compared (or yet published) our discovery protocols (we >>>> have three, 2 for content, 1 for device) compared to selector based >>>> discovery, so I cannot yet claim they are better, but they do not >>>> have the non-determinism sketched above. >>>> >>>> c. Using LPM, there is a non-deterministic number of lookups you >>>> must do in the PIT to match a content object. If you have a name >>>> tree or a threaded hash table, those don?t all need to be hash >>>> lookups, but you need to walk up the name tree for every prefix of >>>> the content object name and evaluate the selector predicate. >>>> Content Based Networking (CBN) had some some methods to create data >>>> structures based on predicates, maybe those would be better. But >>>> in any case, you will potentially need to retrieve many PIT entries >>>> if there is Interest traffic for many prefixes of a root. 
Even on >>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>> implementation only requires at most 3 lookups (one by name, one by >>>> name + keyid, one by name + content object hash), and one can do >>>> other things to optimize lookup for an extra write. >>>> >>>> d. In (c) above, if you have a threaded name tree or are just >>>> walking parent pointers, I suspect you'll need locking of the >>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>> and that will be expensive. It would be interesting to see what a >>>> cache-consistent multi-threaded name tree looks like. >>>> >>>> Marc >>>> >>>> >>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I had thought about these questions, but I want to know your idea >>>> besides typed components: >>>> 1. LPM allows "data discovery". How will exact match do similar >>>> things? >>>> 2. Will removing selectors improve performance? How do we use >>>> other, >>>> faster techniques to replace selectors? >>>> 3. Fixed byte length and type. I agree more that type can be a fixed >>>> byte, but 2 bytes for length might not be enough for the future. >>>> >>>> >>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> I know how to make #2 flexible enough to do the things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> >>>> Could you share it with us? >>>> >>>> Sure. Here's a strawman. >>>> >>>> The type space is 16 bits, so you have 65,536 types. >>>> >>>> The type space is currently shared with the types used for the >>>> entire protocol, which gives us two options: >>>> (1) we reserve a range for name component types. 
Given the >>>> likelihood there will be at least as much and probably more need >>>> for component types than protocol extensions, we could reserve 1/2 >>>> of the type space, giving us 32K types for name components. >>>> (2) since there is no parsing ambiguity between name components >>>> and other fields of the protocol (since they are sub-types of the >>>> name type) we could reuse numbers and thereby have an entire 65K >>>> name component types. >>>> >>>> We divide the type space into regions, and manage it with a >>>> registry. If we ever get to the point of creating an IETF >>>> standard, IANA has 25 years of experience running registries and >>>> there are well-understood rule sets for different kinds of >>>> registries (open, requires a written spec, requires standards >>>> approval). >>>> >>>> - We allocate one "default" name component type for "generic >>>> name", which would be used on name prefixes and other common >>>> cases where there are no special semantics on the name component. >>>> - We allocate a range of name component types, say 1024, to >>>> globally understood types that are part of the base or extension >>>> NDN specifications (e.g. chunk#, version#, etc.). >>>> - We reserve some portion of the space for unanticipated uses >>>> (say another 1024 types). >>>> - We give the rest of the space to application assignment. >>>> >>>> Make sense? >>>> >>>> >>>> While I'm sympathetic to that view, there are three ways in >>>> which Moore's law or hardware tricks will not save us from >>>> performance flaws in the design >>>> >>>> >>>> we could design for performance, >>>> >>>> That's not what people are advocating. We are advocating that we >>>> *not* design for known bad performance and hope serendipity or >>>> Moore's Law will come to the rescue. >>>> >>>> but I think there will be a turning >>>> point when the slower design starts to become "fast enough". >>>> >>>> Perhaps, perhaps not. 
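For concreteness, the strawman partition sketched above adds up as follows (a 16-bit space holds 65,536 values; the split is only the strawman's, not any adopted NDN registry):

```python
TYPE_SPACE = 2 ** 16   # 16-bit type space: 65,536 values
GENERIC = 1            # one "default" generic name-component type
GLOBAL_TYPES = 1024    # base/extension NDN types (chunk#, version#, ...)
RESERVED = 1024        # held back for unanticipated uses
APPLICATION = TYPE_SPACE - GENERIC - GLOBAL_TYPES - RESERVED

print(TYPE_SPACE, APPLICATION)  # 65536 63487
```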
Relative performance is what matters so >>>> things that don?t get faster while others do tend to get dropped >>>> or not used because they impose a performance penalty relative to >>>> the things that go faster. There is also the ?low-end? phenomenon >>>> where impovements in technology get applied to lowering cost >>>> rather than improving performance. For those environments bad >>>> performance just never get better. >>>> >>>> Do you >>>> think there will be some design of ndn that will *never* have >>>> performance improvement? >>>> >>>> I suspect LPM on data will always be slow (relative to the other >>>> functions). >>>> i suspect exclusions will always be slow because they will >>>> require extra memory references. >>>> >>>> However I of course don?t claim to clairvoyance so this is just >>>> speculation based on 35+ years of seeing performance improve by 4 >>>> orders of magnitude and still having to worry about counting >>>> cycles and memory references? >>>> >>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>> wrote: >>>> >>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>> wrote: >>>> >>>> We should not look at a certain chip nowadays and want ndn to >>>> perform >>>> well on it. It should be the other way around: once ndn app >>>> becomes >>>> popular, a better chip will be designed for ndn. >>>> >>>> While I?m sympathetic to that view, there are three ways in >>>> which Moore?s law or hardware tricks will not save us from >>>> performance flaws in the design: >>>> a) clock rates are not getting (much) faster >>>> b) memory accesses are getting (relatively) more expensive >>>> c) data structures that require locks to manipulate >>>> successfully will be relatively more expensive, even with >>>> near-zero lock contention. >>>> >>>> The fact is, IP *did* have some serious performance flaws in >>>> its design. We just forgot those because the design elements >>>> that depended on those mistakes have fallen into disuse. 
The >>>> poster children for this are: >>>> 1. IP options. Nobody can use them because they are too slow >>>> on modern forwarding hardware, so they can?t be reliably used >>>> anywhere >>>> 2. the UDP checksum, which was a bad design when it was >>>> specified and is now a giant PITA that still causes major pain >>>> in working around. >>>> >>>> I?m afraid students today are being taught the that designers >>>> of IP were flawless, as opposed to very good scientists and >>>> engineers that got most of it right. >>>> >>>> I feel the discussion today and yesterday has been off-topic. >>>> Now I >>>> see that there are 3 approaches: >>>> 1. we should not define a naming convention at all >>>> 2. typed component: use tlv type space and add a handful of >>>> types >>>> 3. marked component: introduce only one more type and add >>>> additional >>>> marker space >>>> >>>> I know how to make #2 flexible enough to do what things I can >>>> envision we need to do, and with a few simple conventions on >>>> how the registry of types is managed. >>>> >>>> It is just as powerful in practice as either throwing up our >>>> hands and letting applications design their own mutually >>>> incompatible schemes or trying to make naming conventions with >>>> markers in a way that is fast to generate/parse and also >>>> resilient against aliasing. >>>> >>>> Also everybody thinks that the current utf8 marker naming >>>> convention >>>> needs to be revised. >>>> >>>> >>>> >>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>> wrote: >>>> Would that chip be suitable, i.e. can we expect most names >>>> to fit in (the >>>> magnitude of) 96 bytes? What length are names usually in >>>> current NDN >>>> experiments? >>>> >>>> I guess wide deployment could make for even longer names. >>>> Related: Many URLs >>>> I encounter nowadays easily don't fit within two 80-column >>>> text lines, and >>>> NDN will have to carry more information than URLs, as far as >>>> I see. 
>>>> >>>> >>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>> >>>> In fact, the index in separate TLV will be slower on some >>>> architectures, >>>> like the ezChip NP4. The NP4 can hold the fist 96 frame >>>> bytes in memory, >>>> then any subsequent memory is accessed only as two adjacent >>>> 32-byte blocks >>>> (there can be at most 5 blocks available at any one time). >>>> If you need to >>>> switch between arrays, it would be very expensive. If you >>>> have to read past >>>> the name to get to the 2nd array, then read it, then backup >>>> to get to the >>>> name, it will be pretty expensive too. >>>> >>>> Marc >>>> >>>> On Sep 18, 2014, at 2:02 PM, >>>> wrote: >>>> >>>> Does this make that much difference? >>>> >>>> If you want to parse the first 5 components. One way to do >>>> it is: >>>> >>>> Read the index, find entry 5, then read in that many bytes >>>> from the start >>>> offset of the beginning of the name. >>>> OR >>>> Start reading name, (find size + move ) 5 times. >>>> >>>> How much speed are you getting from one to the other? You >>>> seem to imply >>>> that the first one is faster. I don?t think this is the >>>> case. >>>> >>>> In the first one you?ll probably have to get the cache line >>>> for the index, >>>> then all the required cache lines for the first 5 >>>> components. For the >>>> second, you?ll have to get all the cache lines for the first >>>> 5 components. >>>> Given an assumption that a cache miss is way more expensive >>>> than >>>> evaluating a number and computing an addition, you might >>>> find that the >>>> performance of the index is actually slower than the >>>> performance of the >>>> direct access. >>>> >>>> Granted, there is a case where you don?t access the name at >>>> all, for >>>> example, if you just get the offsets and then send the >>>> offsets as >>>> parameters to another processor/GPU/NPU/etc. 
In this case >>>> you may see a >>>> gain IF there are more cache line misses in reading the name >>>> than in >>>> reading the index. So, if the regular part of the name >>>> that you?re >>>> parsing is bigger than the cache line (64 bytes?) and the >>>> name is to be >>>> processed by a different processor, then your might see some >>>> performance >>>> gain in using the index, but in all other circumstances I >>>> bet this is not >>>> the case. I may be wrong, haven?t actually tested it. >>>> >>>> This is all to say, I don?t think we should be designing the >>>> protocol with >>>> only one architecture in mind. (The architecture of sending >>>> the name to a >>>> different processor than the index). >>>> >>>> If you have numbers that show that the index is faster I >>>> would like to see >>>> under what conditions and architectural assumptions. >>>> >>>> Nacho >>>> >>>> (I may have misinterpreted your description so feel free to >>>> correct me if >>>> I?m wrong.) >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>> >>>> wrote: >>>> >>>> Indeed each components' offset must be encoded using a fixed >>>> amount of >>>> bytes: >>>> >>>> i.e., >>>> Type = Offsets >>>> Length = 10 Bytes >>>> Value = Offset1(1byte), Offset2(1byte), ... >>>> >>>> You may also imagine to have a "Offset_2byte" type if your >>>> name is too >>>> long. >>>> >>>> Max >>>> >>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>> >>>> if you do not need the entire hierarchal structure (suppose >>>> you only >>>> want the first x components) you can directly have it using >>>> the >>>> offsets. With the Nested TLV structure you have to >>>> iteratively parse >>>> the first x-1 components. With the offset structure you cane >>>> directly >>>> access to the firs x components. >>>> >>>> I don't get it. 
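The fixed-width offset encoding Massimo describes can be sketched as follows (one byte per offset, as in his example; the helper functions are illustrative only, not part of any specification):

```python
def build_offsets_tlv(components):
    # Value = Offset1(1 byte), Offset2(1 byte), ... into the flattened name.
    offsets, pos = [], 0
    for comp in components:
        offsets.append(pos)
        pos += len(comp)
    return bytes(offsets), b"".join(components)

def first_x_components(offsets_tlv, flat_name, x):
    # Fixed-width offsets allow a direct jump to where component x begins;
    # no iterative TLV parse of the first x-1 components is needed.
    end = offsets_tlv[x] if x < len(offsets_tlv) else len(flat_name)
    return flat_name[:end]

offs, flat = build_offsets_tlv([b"mail", b"inbox", b"148"])
prefix = first_x_components(offs, flat, 2)   # -> b"mailinbox"
```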
What you described only works if the >>>> "offset" is >>>> encoded in fixed bytes. With varNum, you will still need to >>>> parse x-1 >>>> offsets to get to the x offset. >>>> >>>> >>>> >>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>> wrote: >>>> >>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>> >>>> ah, thanks - that's helpful. I thought you were saying "I >>>> like the >>>> existing NDN UTF8 'convention'." I'm still not sure I >>>> understand what >>>> you >>>> _do_ prefer, though. it sounds like you're describing an >>>> entirely >>>> different >>>> scheme where the info that describes the name-components is >>>> ... >>>> someplace >>>> other than _in_ the name-components. is that correct? when >>>> you say >>>> "field >>>> separator", what do you mean (since that's not a "TL" from a >>>> TLV)? >>>> >>>> Correct. >>>> In particular, with our name encoding, a TLV indicates the >>>> name >>>> hierarchy >>>> with offsets in the name and other TLV(s) indicates the >>>> offset to use >>>> in >>>> order to retrieve special components. >>>> As for the field separator, it is something like "/". >>>> Aliasing is >>>> avoided as >>>> you do not rely on field separators to parse the name; you >>>> use the >>>> "offset >>>> TLV " to do that. >>>> >>>> So now, it may be an aesthetic question but: >>>> >>>> if you do not need the entire hierarchal structure (suppose >>>> you only >>>> want >>>> the first x components) you can directly have it using the >>>> offsets. >>>> With the >>>> Nested TLV structure you have to iteratively parse the first >>>> x-1 >>>> components. >>>> With the offset structure you cane directly access to the >>>> firs x >>>> components. >>>> >>>> Max >>>> >>>> >>>> -- Mark >>>> >>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>> >>>> The why is simple: >>>> >>>> You use a lot of "generic component type" and very few >>>> "specific >>>> component type". 
You are imposing types for every component >>>> in order >>>> to >>>> handle few exceptions (segmentation, etc..). You create a >>>> rule >>>> (specify >>>> the component's type ) to handle exceptions! >>>> >>>> I would prefer not to have typed components. Instead I would >>>> prefer >>>> to >>>> have the name as simple sequence bytes with a field >>>> separator. Then, >>>> outside the name, if you have some components that could be >>>> used at >>>> network layer (e.g. a TLV field), you simply need something >>>> that >>>> indicates which is the offset allowing you to retrieve the >>>> version, >>>> segment, etc in the name... >>>> >>>> >>>> Max >>>> >>>> >>>> >>>> >>>> >>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>> >>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>> >>>> I think we agree on the small number of "component types". >>>> However, if you have a small number of types, you will end >>>> up with >>>> names >>>> containing many generic components types and few specific >>>> components >>>> types. Due to the fact that the component type specification >>>> is an >>>> exception in the name, I would prefer something that specify >>>> component's >>>> type only when needed (something like UTF8 conventions but >>>> that >>>> applications MUST use). >>>> >>>> so ... I can't quite follow that. the thread has had some >>>> explanation >>>> about why the UTF8 requirement has problems (with aliasing, >>>> e.g.) >>>> and >>>> there's been email trying to explain that applications don't >>>> have to >>>> use types if they don't need to. your email sounds like "I >>>> prefer >>>> the >>>> UTF8 convention", but it doesn't say why you have that >>>> preference in >>>> the face of the points about the problems. can you say why >>>> it is >>>> that >>>> you express a preference for the "convention" with problems ? >>>> >>>> Thanks, >>>> Mark >>>> >>>> . 
>>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> 
From lanwang at memphis.edu Thu Sep 25 13:01:57 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Thu, 25 Sep 2014 20:01:57 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: Message-ID: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> On Sep 25, 2014, at 2:45 AM, Ignacio.Solis at parc.com wrote: > On 9/24/14, 6:30 PM, "Burke, Jeff" wrote: > >> Ok, so sync-style approaches may work better for this example as Marc >> already pointed out, but nonetheless... (Marc, I am catching up on emails >> and will respond to that shortly.) > > Sync can be done without selectors. :-) How? I remember Alex or Yingdi mentioned that ChronoSync needs selectors and/or at least longest-prefix matching (not exact matching). > >>> ... >>> A20- You publish /username/mailbox/list/20 (lifetime of 1 second) >> >> >> This isn't 20 steps. First, no data leaves the publisher without an >> Interest. Second, it's more like one API call: make this list available >> as versioned Data with a minimum allowable time between responses of one >> second. No matter how many requests are outstanding, the load on the >> source stays stable. > > Agreed. I wasn't counting this as overhead. > > >>> >>> B- You request /username/mailbox/list >>> >>> C- You receive /username/mailbox/list/20 (lifetime of 1 second) >> >> At this point, you decide if list v20 is sufficient for your purposes. >> Perhaps it is. > > This is true in the non-selector case as well. 
> > >> >> Some thoughts: >> >> - In Scheme B, if the list has not changed, you still get a response, >> because the publisher has no way to know anything about the consumer's >> knowledge. In Scheme A, publishers have that knowledge from the exclusion >> and need not reply. > > This is true without selectors. > >> If NACKs are used as heartbeats, they can be returned >> more slowly... say every 3-10 seconds. So, many data packets are >> potentially saved. Hopefully we don't get one email per second... :) > > I'm not sure what you mean by this. I wouldn't recommend relying on > not-answering for valid requests. Otherwise you start relying on timeouts. > > >> - Benefit seems apparent in multi-consumer scenarios, even without sync. >> Let's say I have 5 personal devices requesting mail. In Scheme B, every >> publisher receives and processes 5 interests per second on average. In >> Scheme A, with an upstream caching node, each receives 1 per second >> maximum. The publisher still has to throttle requests, but with no help >> or scaling support from the network. > > This can be done without selectors. As long as all the clients produce a > request for the same name they can take advantage of caching. What Jeff responded to is that Scheme B requires a freshness of 0 for the initial interest to get to the producer (in order to get the latest list of email names). If freshness is 0, then there's no caching of the data. No matter how the clients name their Interests, they can't take advantage of caching. Lan > > Nacho > > > > 
But if you >> don't, then you can certainly implement Scheme B on NDN, too. >> >> Jeff >> >>> >>> You can play tricks with the lifetime of the object in both cases, >>> selectors or not. >>> >>>> >>>> - meanwhile, the email client can retrieve the emails using the names >>>> obtained in these lists. Some emails may turn out to be unnecessary, so >>>> they will be discarded when a more recent list comes. The email client >>>> can also keep state about the names of the emails it has deleted to >>>> minimize this problem. >>> >>> This is independent of selectors / exact matching. >>> >>> Nacho >>> >>> >>> >>>> >>>> On Sep 24, 2014, at 2:37 AM, Marc.Mosko at parc.com wrote: >>>> >>>>> Ok, let's take that example and run with it a bit. I'll walk through >>>>> a >>>>> "discover all" example. This example leads me to why I say discovery >>>>> should be separate from data retrieval. I don't claim that we have a >>>>> final solution to this problem; I think in a distributed peer-to-peer >>>>> environment solving this problem is difficult. If you have a counterexample >>>>> as to how this discovery could progress using only the >>>>> information known a priori by the requester, I would be interested in >>>>> seeing that example worked out. Please do correct me if you think this >>>>> is wrong. >>>>> >>>>> You have mails that were originally numbered 0 - 10000, sequentially >>>>> by >>>>> the server. >>>>> >>>>> You travel between several places and access different emails from >>>>> different places. This populates caches. Let's say 0,3,6,9,... are on >>>>> cache A, 1,4,7,10,... are on cache B, and 2,5,8,11,... are on cache C. >>>>> Also, you have deleted 500 random emails, so there's only 9500 emails >>>>> actually out there. >>>>> >>>>> You set up a new computer and now want to download all your emails. >>>>> The >>>>> new computer is on the path of caches C, B, then A, then the >>>>> authoritative source server. The new email program has no initial >>>>> state. 
The email program only knows that the email number is an >>>>> integer >>>>> that starts at 0. It issues an interest for /mail/inbox, and asks for >>>>> left-most child because it wants to populate in order. It gets a >>>>> response from cache C with mail 2. >>>>> >>>>> Now, what does the email program do? It cannot exclude the range 0..2 >>>>> because that would possibly miss 0 and 1. So, all it can do is exclude >>>>> the exact number "2" and ask again. It then gets cache C again and it >>>>> responds with "5". There are about 3000 emails on cache C, and if they >>>>> all take 4 bytes (for the exclude component plus its coding overhead), >>>>> then that's 12KB of exclusions to finally exhaust cache C. >>>>> >>>>> If we want Interests to avoid fragmentation, we can fit about 1200 >>>>> bytes of exclusions, or 300 components. This means we need about 10 >>>>> interest messages. Each interest would be something like "exclude >>>>> 2,5,8,11,..., >300", then the next would be "exclude <300, 302, 305, >>>>> 308, >>>>> ..., >600", etc. >>>>> >>>>> Those interests that exclude everything at cache C would then hit, say, >>>>> cache B and start getting results 1, 4, 7, .... This means an Interest >>>>> like "exclude 2,5,8,11,..., >300" would then get back number 1. That >>>>> means the next request actually has to split that one interest's >>>>> exclude >>>>> into two (because the interest was at maximum size), so you now issue >>>>> two interests where one is "exclude 1, 2, 5, 8, >210" and the other is >>>>> "<210, 212, 215, ..., >300". >>>>> >>>>> If you look in the CCNx 0.8 java code, there should be a class that >>>>> does these Interest-based discoveries and does the Interest splitting >>>>> based on the currently known range of discovered content. I don't have >>>>> the specific reference right now, but I can send a link if you are >>>>> interested in seeing that. The java class keeps state of what has been >>>>> discovered so far, so it could restart later if interrupted. 
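The respond-exclude-re-ask loop Marc walks through can be modeled with a short Python sketch. This is a toy illustration only: `fetch` and `make_fetch` are hypothetical stand-ins for the network (they are not any NDN API), and real selector encoding, interest splitting, and timeouts are omitted.

```python
# Toy model of selector-based "discover all": every answer must be
# excluded before re-asking, so each discovered item costs one round
# trip and the exclusion set carried per Interest keeps growing.
def discover_all(fetch, prefix="/mail/inbox"):
    discovered = set()
    while True:
        item = fetch(prefix, excludes=discovered)
        if item is None:          # timeout/NACK: nothing left anywhere
            return sorted(discovered)
        discovered.add(item)      # one more item, one more exclusion

def make_fetch(cache_contents):
    """Simulate caches answering with the leftmost non-excluded item."""
    def fetch(prefix, excludes):
        remaining = [n for n in cache_contents if n not in excludes]
        return min(remaining) if remaining else None
    return fetch

emails = discover_all(make_fetch({0, 2, 3, 5, 6, 9}))
```

Note that `discovered` grows by one on every round trip, which is exactly the arithmetic blow-up Marc quantifies next.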
>>>>> So all those interests would now be getting results from cache B. You >>>>> would then start to split all those ranges to accommodate the numbers >>>>> coming back from B. Eventually, you'll have at least 10 Interest >>>>> messages outstanding that would be excluding all the 9500 messages that >>>>> are in caches A, B, and C. Some of those interest messages might >>>>> actually reach an authoritative server, which might respond too. It >>>>> would likely be more than 10 interests due to the algorithm that's used >>>>> to >>>>> split full interests, which likely is not optimal because it does not >>>>> know exactly where breaks should be a priori. >>>>> >>>>> Once you have exhausted caches A, B, and C, the interest messages >>>>> would >>>>> reach the authoritative source (if it's online), and it would be >>>>> issuing >>>>> NACKs (I assume) for interests that have excluded all non-deleted >>>>> emails. >>>>> >>>>> In any case, it takes, at best, 9,500 round trips to "discover" all >>>>> 9500 emails. It also required Sum_{i=1..10000} 4*i = 200,020,000 >>>>> bytes >>>>> of Interest exclusions. Note that it's an arithmetic sum of bytes of >>>>> exclusion, because at each Interest the size of the exclusions >>>>> increases >>>>> by 4. There was an NDN paper about light bulb discovery (or something >>>>> like that) that noted this same problem and proposed some workaround, >>>>> but I don't remember what they proposed. >>>>> >>>>> Yes, you could possibly pipeline it, but what would you do? In this >>>>> example, knowing a priori that emails 0 - 10000 (minus some random ones) >>>>> exist would allow you to issue, say, 10 interests in parallel that ask >>>>> for different ranges. But, 2 years from now your undeleted emails >>>>> might >>>>> range from 100,000 - 150,000. The point is that a discovery protocol >>>>> does not know, a priori, what is to be discovered. It might start >>>>> learning some stuff as it goes on. 
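The 200,020,000-byte figure above can be reproduced with a couple of lines. This is only a back-of-the-envelope check using the text's own assumption of roughly 4 bytes per excluded component:

```python
# The i-th Interest carries i exclusions of ~4 bytes each, so the
# total exclusion bytes over n items form an arithmetic sum.
def exclusion_bytes(n_items, bytes_per_exclusion=4):
    return sum(bytes_per_exclusion * i for i in range(1, n_items + 1))

total = exclusion_bytes(10_000)   # 200,020,000 bytes, as in the text
```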
>>>>> If you could have retrieved just a table of contents from each cache, >>>>> where each "row" is say 64 bytes (i.e. the name continuation plus hash >>>>> value), you would need to retrieve 3300 * 64 = 211KB from each cache >>>>> (total 640 KB) to list all the emails. That would take 640KB / 1200 = >>>>> 534 interest messages of say 64 bytes = 34 KB to discover all 9500 >>>>> emails plus another set to fetch the header rows. That's, say, 68 KB of >>>>> interest traffic compared to 200 MB. Now, I've not said how to list >>>>> these tables of contents, so an actual protocol might have a higher >>>>> communication cost, but even if it was 10x worse that would still be an >>>>> attractive tradeoff. >>>>> >>>>> This assumes that you publish just the "header" in the 1st segment >>>>> (say >>>>> 1 KB total object size including the signatures). That's 10 MB to >>>>> learn >>>>> the headers. >>>>> >>>>> You could also argue that the distribution of emails over caches is >>>>> arbitrary. That's true, I picked a difficult sequence. But unless you >>>>> have some positive controls on what could be in a cache, it could be >>>>> any >>>>> difficult sequence. I also did not address the timeout issue, and how >>>>> do you know you are done? >>>>> >>>>> This is also why sync works so much better than doing raw interest >>>>> discovery. Sync exchanges tables of contents and diffs; it does not >>>>> need to enumerate everything by exclusion in order to retrieve it. >>>>> >>>>> Marc >>>>> >>>>> >>>>> >>>>> On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote: >>>>> >>>>>> Discovery can be reduced to "pattern detection" (can we infer what >>>>>> exists?) and "pattern validation" (can we confirm this guess?) >>>>>> >>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>>>> pattern with static (/mail/inbox) and variable (148) components; with >>>>>> proper naming convention, computers can also detect this pattern >>>>>> easily. Now I want to look for all mails in my inbox. 
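Marc's table-of-contents estimate can likewise be checked numerically. The row size, packet payload, and rounding below are the text's own ballpark figures, not measured values:

```python
import math

rows_per_cache = 3300    # table-of-contents entries per cache
row_bytes = 64           # name continuation plus hash value
caches = 3
payload = 1200           # fragmentation-free bytes per packet

toc_total = rows_per_cache * row_bytes * caches   # 633,600 bytes (~640 KB)
interests = math.ceil(640_000 / payload)          # 534, rounding the total to 640 KB as the text does
interest_traffic = 2 * interests * 64             # ToC fetch plus header fetch, ~68 KB vs. ~200 MB
```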
I can generate >>>>>> a >>>>>> list of /mail/inbox/. These are my guesses, and with >>>>>> selectors >>>>>> I can further refine my guesses. >>>>>> >>>>>> To validate them, a bloom filter can provide "best effort" >>>>>> discovery (with some false positives, so I call it "best-effort") >>>>>> before I stupidly send all the interests to the network. >>>>>> >>>>>> The discovery protocol, as I described above, is essentially "pattern >>>>>> detection by naming convention" and "bloom filter validation." This >>>>>> is >>>>>> definitely one of the "simpler" discovery protocols, because the data >>>>>> producer only needs to add an additional bloom filter. Notice that we can >>>>>> progressively add entries to the bloom filter with low computation cost. >>>>>> >>>>>> >>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>>> Ok, yes I think those would all be good things. >>>>>>> >>>>>>> One thing to keep in mind, especially with things like time series >>>>>>> sensor >>>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>>> That's easy >>>>>>> for a human :) But in Discovery, one should assume that one does >>>>>>> not >>>>>>> know >>>>>>> of patterns in the data beyond what the protocols used to publish >>>>>>> the >>>>>>> data >>>>>>> explicitly require. That said, I think some of the things you >>>>>>> listed >>>>>>> are >>>>>>> good places to start: sensor data, web content, climate data or >>>>>>> genome data. >>>>>>> >>>>>>> We also need to state what the forwarding strategies are and what >>>>>>> the >>>>>>> cache >>>>>>> behavior is. >>>>>>> >>>>>>> I outlined some of the points that I think are important in that >>>>>>> other >>>>>>> posting. While "discover latest" is useful, "discover all" is also >>>>>>> important, and that one gets complicated fast. 
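Tai-Lin's "bloom filter validation" idea can be sketched minimally. The filter size, hash count, and name scheme below are illustrative assumptions of mine, not anything specified for NDN; the point is only that a consumer can test guessed names before issuing Interests, with false positives possible but no false negatives.

```python
import hashlib

# Minimal Bloom filter for pattern validation: the producer publishes
# a filter over existing names; a consumer tests guessed names like
# /mail/inbox/<n> before sending Interests for them.
class Bloom:
    def __init__(self, bits=1 << 16, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, name):
        # Derive k bit positions by salting a cryptographic hash.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, name):
        for p in self._positions(name):
            self.array[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, name):
        # False positives possible; "no" answers are definitive.
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(name))

producer = Bloom()
producer.add("/mail/inbox/148")
```

Adding an entry only sets a few bits, which matches the remark that entries can be added progressively at low computational cost.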
So points like >>>>>>> separating >>>>>>> discovery from retrieval and working with large data sets have been >>>>>>> important in shaping our thinking. That all said, I'd be happy >>>>>>> starting >>>>>>> from 0 and working through the Discovery service definition from >>>>>>> scratch >>>>>>> along with data set use cases. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>> wrote: >>>>>>> >>>>>>> Hi Marc, >>>>>>> >>>>>>> Thanks - yes, I saw that as well. I was just trying to get one step >>>>>>> more >>>>>>> specific, which was to see if we could identify a few specific use >>>>>>> cases >>>>>>> around which to have the conversation. (e.g., time series sensor >>>>>>> data and >>>>>>> web content retrieval for "get latest"; climate data for huge data >>>>>>> sets; >>>>>>> local data in a vehicular network; etc.) What have you been looking >>>>>>> at >>>>>>> that's driving considerations of discovery? >>>>>>> >>>>>>> Thanks, >>>>>>> Jeff >>>>>>> >>>>>>> From: >>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>> To: Jeff Burke >>>>>>> Cc: , >>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>> >>>>>>> Jeff, >>>>>>> >>>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>> Discovery. >>>>>>> >>>>>>> >>>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>>>>>> >>>>>>> I think it would be very productive to talk about what Discovery >>>>>>> should do, >>>>>>> and not focus on the how. It is sometimes easy to get caught up in >>>>>>> the how, >>>>>>> which I think is a less important topic than the what at this stage. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>> wrote: >>>>>>> >>>>>>> Marc, >>>>>>> >>>>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>>>> based >>>>>>> on use cases. What are the use cases you are using to evaluate >>>>>>> discovery? 
>>>>>>> >>>>>>> Jeff >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>> wrote: >>>>>>> >>>>>>> No matter what the expressiveness of the predicates, if the forwarder >>>>>>> can >>>>>>> send interests different ways you don't have a consistent underlying >>>>>>> set >>>>>>> to talk about, so you would always need non-range exclusions to >>>>>>> discover >>>>>>> every version. >>>>>>> >>>>>>> Range exclusions only work, I believe, if you get an authoritative >>>>>>> answer. >>>>>>> If different content pieces are scattered between different caches I >>>>>>> don't see how range exclusions would work to discover every version. >>>>>>> >>>>>>> I'm sorry to be pointing out problems without offering solutions but >>>>>>> we're not ready to publish our discovery protocols. >>>>>>> >>>>>>> Sent from my telephone >>>>>>> >>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>> I see. Can you briefly describe how the ccnx discovery protocol solves >>>>>>> all >>>>>>> the problems that you mentioned (not just exclude)? A doc will be >>>>>>> better. >>>>>>> >>>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will >>>>>>> soon >>>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>>> Regular >>>>>>> language or context-free language might become part of selectors too. >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>>> That will get you one reading, then you need to exclude it and ask >>>>>>> again. >>>>>>> >>>>>>> Sent from my telephone >>>>>>> >>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>> with a particular cache, then you need to always use individual >>>>>>> excludes, not range excludes, if you want to discover all the versions >>>>>>> of an object. >>>>>>> >>>>>>> >>>>>>> I am very confused. 
For your example, if I want to get all today's >>>>>>> sensor data, I just do (Any..Last second of last day)(First second >>>>>>> of >>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>> >>>>>>> >>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>> >>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>> could miss content objects you want to discover unless you avoid >>>>>>> all range exclusions and only exclude explicit versions. >>>>>>> >>>>>>> >>>>>>> Could you explain why the missing-content-object situation happens? Also, >>>>>>> a range exclusion is just shorter notation for many explicit >>>>>>> excludes; >>>>>>> converting from explicit excludes to a ranged exclude is always >>>>>>> possible. >>>>>>> >>>>>>> >>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>> with a particular cache, then you need to always use individual >>>>>>> excludes, not range excludes, if you want to discover all the versions >>>>>>> of an object. For something like a sensor reading that is updated, >>>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>>> exclusions (plus encoding overhead) per day. >>>>>>> >>>>>>> Yes, maybe using a more deterministic version number than a >>>>>>> timestamp makes sense here, but it's just an example of needing a lot >>>>>>> of exclusions. >>>>>>> >>>>>>> >>>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>>> cache B >>>>>>> >>>>>>> >>>>>>> I feel this case is invalid because cache A will also get the >>>>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>>>> if >>>>>>> this goes to cache B only, it means that cache A dies. How do you >>>>>>> know >>>>>>> that v101 even exists? 
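The two sizes being compared in this exchange are easy to restate numerically. Both figures are the thread's own ballpark estimates, not wire-exact TLV sizes:

```python
# Per-version exclusion: one ~8-byte timestamp exclude per reading.
readings_per_day = 86_400                     # one sensor reading per second
per_version_bytes = readings_per_day * 8      # 691,200 bytes, plus encoding overhead

# Tai-Lin's alternative: two range excludes bracketing the day.
range_exclude_bytes = 18
```

The range form is tiny, but as Marc argues above, it is only safe when a single cache (or an authoritative source) holds a consistent set of versions.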
>>>>>>> >>>>>>> >>>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>> forwarder is just doing "best path" and can round-robin between >>>>>>> cache >>>>>>> A and cache B, then your application could miss v101. >>>>>>> >>>>>>> >>>>>>> >>>>>>> c,d In general I agree that LPM performance is related to the number >>>>>>> of components. In my own thread-safe LPM implementation, I used only >>>>>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>>>>> every node will be faster or not because of lock overhead. >>>>>>> >>>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>>> (ndn >>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>>> >>>>>>> >>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>>>> ready to claim it's better yet because we have not done that. >>>>>>> >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>> I would point out that using LPM on content-object-to-Interest >>>>>>> matching to do discovery has its own set of problems. Discovery >>>>>>> involves more than just "latest version" discovery too. >>>>>>> >>>>>>> This is probably getting off-topic from the original post about >>>>>>> naming conventions. >>>>>>> >>>>>>> a. If Interests can be forwarded in multiple directions and two >>>>>>> different caches are responding, the exclusion set you build up >>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>> content objects you want to discover unless you avoid all range >>>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>>> very large interest packets. 
In ccnx 1.0, we believe that an >>>>>>> explicit discovery protocol that allows conversations about >>>>>>> consistent sets is better. >>>>>>> >>>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>>> should be transitive between caches, but imagine this. You send >>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>> through 100 then issue a new interest. This goes to cache B who >>>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>>> you think you have it! But cache A already has version 101, you >>>>>>> just don't know. If you cannot have a conversation around >>>>>>> consistent sets, it seems like even doing latest-version discovery >>>>>>> is difficult with selector-based discovery. From what I saw in >>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>> authoritative source because you can never believe an intermediate >>>>>>> cache that there's not something more recent. >>>>>>> >>>>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be >>>>>>> interested in seeing your analysis. Case (a) is that a node can >>>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>>> a node can correctly discover the latest version. We have not >>>>>>> formally compared (or yet published) our discovery protocols (we >>>>>>> have three: 2 for content, 1 for device) against selector-based >>>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>>> have the non-determinism sketched above. >>>>>>> >>>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>>> must do in the PIT to match a content object. If you have a name >>>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>>> the content object name and evaluate the selector predicate. 
>>>>>>> Content Based Networking (CBN) had some methods to create data >>>>>>> structures based on predicates; maybe those would be better. But >>>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>> implementation only requires at most 3 lookups (one by name, one by >>>>>>> name + keyid, one by name + content object hash), and one can do >>>>>>> other things to optimize lookup for an extra write. >>>>>>> >>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>> walking parent pointers, I suspect you'll need locking of the >>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>>>> and that will be expensive. It would be interesting to see what a >>>>>>> cache-consistent multi-threaded name tree looks like. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> >>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> I had thought about these questions, but I want to know your idea >>>>>>> besides typed component: >>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>> things? >>>>>>> 2. Will removing selectors improve performance? How do we use >>>>>>> other, >>>>>>> faster techniques to replace selectors? >>>>>>> 3. Fixed byte length and type. I agree more that type can be a fixed >>>>>>> byte, but 2 bytes for length might not be enough for the future. >>>>>>> >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>> wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> I know how to make #2 flexible enough to do the things I can >>>>>>> envision we need to do, and with a few simple conventions on >>>>>>> how the registry of types is managed. >>>>>>> >>>>>>> >>>>>>> Could you share it with us? 
>>>>>>> >>>>>>> Sure. Here's a strawman. >>>>>>> >>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>> >>>>>>> The type space is currently shared with the types used for the >>>>>>> entire protocol, which gives us two options: >>>>>>> (1) we reserve a range for name component types. Given the >>>>>>> likelihood there will be at least as much and probably more need >>>>>>> for component types than for protocol extensions, we could reserve 1/2 >>>>>>> of the type space, giving us 32K types for name components. >>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>> and other fields of the protocol (since they are sub-types of the >>>>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>>>> name component types. >>>>>>> >>>>>>> We divide the type space into regions, and manage it with a >>>>>>> registry. If we ever get to the point of creating an IETF >>>>>>> standard, IANA has 25 years of experience running registries and >>>>>>> there are well-understood rule sets for different kinds of >>>>>>> registries (open, requires a written spec, requires standards >>>>>>> approval). >>>>>>> >>>>>>> - We allocate one "default" name component type for "generic >>>>>>> name", which would be used on name prefixes and other common >>>>>>> cases where there are no special semantics on the name component. >>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>> globally understood types that are part of the base or extension >>>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>> (say another 1024 types) >>>>>>> - We give the rest of the space to application assignment. >>>>>>> >>>>>>> Make sense? 
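The strawman partition above can be written down concretely. The exact boundary values below are illustrative choices of mine (the message only says "say 1024"), and this takes option (2), reusing the full 16-bit space for name component types:

```python
# Sketch of the strawman 16-bit name-component type space partition.
TYPE_SPACE = 1 << 16                    # 65,536 possible type values

GENERIC = 0                             # one "default" generic-name type
WELL_KNOWN = range(1, 1 + 1024)         # chunk#, version#, other spec-defined types
RESERVED = range(1025, 1025 + 1024)     # held back for unanticipated uses
APPLICATION = range(2049, TYPE_SPACE)   # everything else, application-assigned

def region(t):
    """Classify a type value into its registry region."""
    if t == GENERIC:
        return "generic"
    if t in WELL_KNOWN:
        return "well-known"
    if t in RESERVED:
        return "reserved"
    return "application"
```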
>>>>>>> >>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>> performance flaws in the design >>>>>>> >>>>>>> >>>>>>> we could design for performance, >>>>>>> >>>>>>> That's not what people are advocating. We are advocating that we >>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>> Moore's Law will come to the rescue. >>>>>>> >>>>>>> but I think there will be a turning >>>>>>> point when the slower design starts to become "fast enough". >>>>>>> >>>>>>> Perhaps, perhaps not. Relative performance is what matters, so >>>>>>> things that don't get faster while others do tend to get dropped >>>>>>> or not used because they impose a performance penalty relative to >>>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>>> where improvements in technology get applied to lowering cost >>>>>>> rather than improving performance. For those environments bad >>>>>>> performance just never gets better. >>>>>>> >>>>>>> Do you >>>>>>> think there will be some design of ndn that will *never* have >>>>>>> performance improvement? >>>>>>> >>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>> functions). >>>>>>> I suspect exclusions will always be slow because they will >>>>>>> require extra memory references. >>>>>>> >>>>>>> However, I of course don't claim clairvoyance, so this is just >>>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>>> orders of magnitude and still having to worry about counting >>>>>>> cycles and memory references... >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>> wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>> perform >>>>>>> well on it. 
It should be the other way around: once ndn app >>>>>>> becomes >>>>>>> popular, a better chip will be designed for ndn. >>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>> performance flaws in the design: >>>>>>> a) clock rates are not getting (much) faster >>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>> c) data structures that require locks to manipulate >>>>>>> successfully will be relatively more expensive, even with >>>>>>> near-zero lock contention. >>>>>>> >>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>> its design. We just forgot those because the design elements >>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>> poster children for this are: >>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>>> anywhere >>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>> in working around. >>>>>>> >>>>>>> I'm afraid students today are being taught that the designers >>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>> engineers that got most of it right. >>>>>>> >>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>> Now I >>>>>>> see that there are 3 approaches: >>>>>>> 1. we should not define a naming convention at all >>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>> types >>>>>>> 3. marked component: introduce only one more type and add >>>>>>> additional >>>>>>> marker space >>>>>>> >>>>>>> I know how to make #2 flexible enough to do the things I can >>>>>>> envision we need to do, and with a few simple conventions on >>>>>>> how the registry of types is managed. 
>>>>>>> >>>>>>> It is just as powerful in practice as either throwing up our >>>>>>> hands and letting applications design their own mutually >>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>> markers in a way that is fast to generate/parse and also >>>>>>> resilient against aliasing. >>>>>>> >>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>> convention >>>>>>> needs to be revised. >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>> wrote: >>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>> to fit in (the >>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>> current NDN >>>>>>> experiments? >>>>>>> >>>>>>> I guess wide deployment could make for even longer names. >>>>>>> Related: Many URLs >>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>> text lines, and >>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>> I see. >>>>>>> >>>>>>> >>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>> >>>>>>> In fact, the index in a separate TLV will be slower on some >>>>>>> architectures, >>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>>>> bytes in memory, >>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>> 32-byte blocks >>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>> If you need to >>>>>>> switch between arrays, it would be very expensive. If you >>>>>>> have to read past >>>>>>> the name to get to the 2nd array, then read it, then back up >>>>>>> to get to the >>>>>>> name, it will be pretty expensive too. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>> wrote: >>>>>>> >>>>>>> Does this make that much difference? >>>>>>> >>>>>>> If you want to parse the first 5 components. 
One way to do >>>>>>> it is: >>>>>>> >>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>> from the start >>>>>>> offset of the beginning of the name. >>>>>>> OR >>>>>>> Start reading name, (find size + move ) 5 times. >>>>>>> >>>>>>> How much speed are you getting from one to the other? You >>>>>>> seem to imply >>>>>>> that the first one is faster. I don't think this is the >>>>>>> case. >>>>>>> >>>>>>> In the first one you'll probably have to get the cache line >>>>>>> for the index, >>>>>>> then all the required cache lines for the first 5 >>>>>>> components. For the >>>>>>> second, you'll have to get all the cache lines for the first >>>>>>> 5 components. >>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>> than >>>>>>> evaluating a number and computing an addition, you might >>>>>>> find that the >>>>>>> performance of the index is actually slower than the >>>>>>> performance of the >>>>>>> direct access. >>>>>>> >>>>>>> Granted, there is a case where you don't access the name at >>>>>>> all, for >>>>>>> example, if you just get the offsets and then send the >>>>>>> offsets as >>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>> you may see a >>>>>>> gain IF there are more cache line misses in reading the name >>>>>>> than in >>>>>>> reading the index. So, if the regular part of the name >>>>>>> that you're >>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>> name is to be >>>>>>> processed by a different processor, then you might see some >>>>>>> performance >>>>>>> gain in using the index, but in all other circumstances I >>>>>>> bet this is not >>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>> >>>>>>> This is all to say, I don't think we should be designing the >>>>>>> protocol with >>>>>>> only one architecture in mind. (The architecture of sending >>>>>>> the name to a >>>>>>> different processor than the index). 
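The two access methods compared above can be sketched concretely. This is an illustrative simplification (one-byte type and length fields, rather than NDN's actual variable-length TLV encoding), meant only to show where the sequential walk spends its work and why the index is a single lookup:

```python
# Sketch, not from the thread: locating the end of the first x components
# of a TLV-encoded name, sequentially vs. via a separate offset table.
# Assumed layout: each component is [type: 1 byte][length: 1 byte][value].

def parse_sequential(buf, x):
    """Walk the TLVs one component at a time (find size + move, x times)."""
    pos = 0
    for _ in range(x):
        length = buf[pos + 1]   # read the L of this component's TL header
        pos += 2 + length       # skip T, L, and the value bytes
    return pos                  # byte offset just past component x

def parse_with_index(offsets, x):
    """With a precomputed fixed-width index, one lookup suffices."""
    return offsets[x - 1]

# /a/bb/ccc encoded with an assumed generic-component type of 8
name = bytes([8, 1, ord('a'), 8, 2, ord('b'), ord('b'), 8, 3]) + b'ccc'
offsets = [3, 7, 12]            # end offset of each component
assert parse_sequential(name, 2) == parse_with_index(offsets, 2)
```

As Nacho argues, both methods touch roughly the same cache lines when the name itself must be read; the index only wins when the name bytes can be skipped entirely.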
>>>>>>> >>>>>>> If you have numbers that show that the index is faster I >>>>>>> would like to see >>>>>>> under what conditions and architectural assumptions. >>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> (I may have misinterpreted your description so feel free to >>>>>>> correct me if >>>>>>> I'm wrong.) >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nacho (Ignacio) Solis >>>>>>> Protocol Architect >>>>>>> Principal Scientist >>>>>>> Palo Alto Research Center (PARC) >>>>>>> +1(650)812-4458 >>>>>>> Ignacio.Solis at parc.com >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> Indeed each component's offset must be encoded using a fixed >>>>>>> amount of >>>>>>> bytes: >>>>>>> >>>>>>> i.e., >>>>>>> Type = Offsets >>>>>>> Length = 10 Bytes >>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>> >>>>>>> You may also imagine having an "Offset_2byte" type if your >>>>>>> name is too >>>>>>> long. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>> you only >>>>>>> want the first x components) you can directly have it using >>>>>>> the >>>>>>> offsets. With the Nested TLV structure you have to >>>>>>> iteratively parse >>>>>>> the first x-1 components. With the offset structure you can >>>>>>> directly >>>>>>> access the first x components. >>>>>>> >>>>>>> I don't get it. What you described only works if the >>>>>>> "offset" is >>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>> parse x-1 >>>>>>> offsets to get to the x offset. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>> wrote: >>>>>>> >>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>> >>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>> like the >>>>>>> existing NDN UTF8 'convention'." 
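Massimo's fixed-width "Offsets" TLV, and Tai-Lin's point about why the widths must be fixed, can be sketched as follows. The type number (100) and the one-byte field widths are assumptions for illustration, not from any spec:

```python
# Sketch of a fixed-width offset table for a TLV name. Each component is
# assumed to cost 2 header bytes (1-byte T, 1-byte L) plus its value.

OFFSETS_TYPE = 100  # hypothetical type number for the "Offsets" TLV

def build_offsets_tlv(component_lengths):
    """Value = one fixed-width (1-byte) end offset per component."""
    offsets, pos = [], 0
    for length in component_lengths:
        pos += 2 + length           # T + L + value of this component
        offsets.append(pos)
    return bytes([OFFSETS_TYPE, len(offsets)] + offsets)

def offset_of(offsets_tlv, x):
    """Direct access: because every offset is the same width, the x-th
    offset sits at a fixed position -- no parse of the first x-1 entries.
    With variable-width (varNum) offsets this lookup would not work."""
    return offsets_tlv[2 + (x - 1)]

tlv = build_offsets_tlv([1, 2, 3])  # e.g. the name /a/bb/ccc
assert offset_of(tlv, 3) == 12
```
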
I'm still not sure I >>>>>>> understand what >>>>>>> you >>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>> entirely >>>>>>> different >>>>>>> scheme where the info that describes the name-components is >>>>>>> ... >>>>>>> someplace >>>>>>> other than _in_ the name-components. is that correct? when >>>>>>> you say >>>>>>> "field >>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>> TLV)? >>>>>>> >>>>>>> Correct. >>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>> name >>>>>>> hierarchy >>>>>>> with offsets in the name and other TLV(s) indicate the >>>>>>> offset to use >>>>>>> in >>>>>>> order to retrieve special components. >>>>>>> As for the field separator, it is something like "/". >>>>>>> Aliasing is >>>>>>> avoided as >>>>>>> you do not rely on field separators to parse the name; you >>>>>>> use the >>>>>>> "offset >>>>>>> TLV" to do that. >>>>>>> >>>>>>> So now, it may be an aesthetic question but: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>> you only >>>>>>> want >>>>>>> the first x components) you can directly have it using the >>>>>>> offsets. >>>>>>> With the >>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>> x-1 >>>>>>> components. >>>>>>> With the offset structure you can directly access the >>>>>>> first x >>>>>>> components. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> -- Mark >>>>>>> >>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> The why is simple: >>>>>>> >>>>>>> You use a lot of "generic component type" and very few >>>>>>> "specific >>>>>>> component type". You are imposing types for every component >>>>>>> in order >>>>>>> to >>>>>>> handle a few exceptions (segmentation, etc.). You create a >>>>>>> rule >>>>>>> (specify >>>>>>> the component's type) to handle exceptions! >>>>>>> >>>>>>> I would prefer not to have typed components. 
Instead I would >>>>>>> prefer >>>>>>> to >>>>>>> have the name as a simple sequence of bytes with a field >>>>>>> separator. Then, >>>>>>> outside the name, if you have some components that could be >>>>>>> used at >>>>>>> the network layer (e.g. a TLV field), you simply need something >>>>>>> that >>>>>>> indicates the offset allowing you to retrieve the >>>>>>> version, >>>>>>> segment, etc. in the name... >>>>>>> >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>> >>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> I think we agree on the small number of "component types". >>>>>>> However, if you have a small number of types, you will end >>>>>>> up with >>>>>>> names >>>>>>> containing many generic component types and few specific >>>>>>> component >>>>>>> types. Because the component type specification >>>>>>> is an >>>>>>> exception in the name, I would prefer something that specifies the >>>>>>> component's >>>>>>> type only when needed (something like the UTF8 conventions, but >>>>>>> ones that >>>>>>> applications MUST use). >>>>>>> >>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>> explanation >>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>> e.g.) >>>>>>> and >>>>>>> there's been email trying to explain that applications don't >>>>>>> have to >>>>>>> use types if they don't need to. your email sounds like "I >>>>>>> prefer >>>>>>> the >>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>> preference in >>>>>>> the face of the points about the problems. can you say why >>>>>>> it is >>>>>>> that >>>>>>> you express a preference for the "convention" with problems? >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> . 
>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>> >>>> 
_______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> > From lanwang at memphis.edu Thu Sep 25 13:35:27 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Thu, 25 Sep 2014 20:35:27 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: On Sep 25, 2014, at 3:10 AM, Ignacio.Solis at parc.com wrote: > On 9/25/14, 9:17 AM, "Marc.Mosko at parc.com" wrote: >> In the CCNx 1.0 spec, one could also encode this a different way. One >> could use a name like "/mail/inbox/selector_matching/" >> and in the payload include "exclude_before=(t=version, l=2, v=279) & >> sort=right". > > > I want to highlight this. > > There is a role that selectors can play in a network. However, our > biggest issue with selectors is that they are mandated at the forwarder > level. This means that every node must support selectors. > > We want to make sure that the core protocol is simple and efficient. > Exact matching gives us that. If you're interested in selector matching > and searching, then create that protocol over exact matching. > > Marc just described a simple "Selector Protocol", basically: > - Encode selectors (or any query you want) in the interest payload. > - Add a name segment to indicate that this is a selector-based query > - Add a name segment to uniquely identify the query (a hash of the payload > for example) > > Example: > name = /mail/inbox/list/selector_matching/ > > payload = version > 100 > > Topology: > > A --- B --- 
C > > A and C run the Selector Protocol > B does not run the Selector Protocol > > Now: > - Any node that does not understand the Selector Protocol (B) forwards > normally and does exact matching. > - Any node that understands the Selector Protocol (C) can parse the > payload to find a match. > > If no match is found, forward the interest. > If a match is found, create a reply. > > The reply can contain 2 types of data: > - Structured data with links to the actual content objects > - Encapsulated content objects > > So, in our example, the Selector Protocol reply could be: > > name = /mail/inbox/list/selector_matching/ > payload = > [ matching name = /mail/inbox/list/v101 ] > [ embedded object < name = /mail/inbox/list/v101, payload = list, > signature = mail server > ] > signature = responding cache > > > > A few notes: > - Malicious nodes could inject false replies. So, if C is malicious, it > can insert a reply linking to some random object or just return junk. > Well, this would be the case with regular selectors as well. C could > reply with random crap or it could reply with a valid answer that is not > the optimal answer (so, for example, not the right-most child or > something). > This is something that we can't prevent. > > In the case of CCN, our fast path does not check signatures, so you > wouldn't be able to check the signature of the reply no matter what. I'm > unsure if NDN is still advocating that every node checks signatures. If > you are, then this approach might not work for you. In NDN, if a router wants to check the signature, then it can check. If it wants to skip the checking, that's fine too. If the design doesn't allow the router to verify the signature, then that's a problem. In the above description, the cache signs a data packet with a name owned by someone else; it seems problematic for a design to advocate this. > > Nodes that DO understand the Selector Protocol can check the signature of > the encapsulated reply (if they wanted to). 
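The name construction Marc and Nacho describe can be sketched as follows. The prefix, the `selector_matching` segment, and the choice of SHA-256 (truncated) as the payload digest are illustrative assumptions, not part of any published CCNx or NDN spec:

```python
# Sketch of a "Selector Protocol" query name: the query itself rides in
# the interest payload, and a digest of the payload becomes the final
# name segment, so the whole request is exact-matchable at every node.
import hashlib

def selector_query_name(prefix, payload):
    digest = hashlib.sha256(payload).hexdigest()[:16]  # assumed digest/width
    return prefix + "/selector_matching/" + digest

payload = b"version > 100"
name = selector_query_name("/mail/inbox/list", payload)

# Two consumers issuing the same query build the same name, so plain
# exact-match caching still helps; a different query yields a new name.
assert name == selector_query_name("/mail/inbox/list", payload)
assert name != selector_query_name("/mail/inbox/list", b"version > 110")
```

This is what makes node B in the example safe to leave unmodified: to B the query is just an opaque exact-match name.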
> Nodes that DO understand the Selector Protocol can unpack the reply, and > add the corresponding object to their cache, effectively enabling them to > answer other Selector Protocol queries. > > - The reply from the Selector Protocol-enabled node (C) could: > - include a list of all valid answers > - embed no objects > - embed more than 1 object > - process complex queries, regex, etc. > > The Selector Protocol could also: > - include a method for authentication > - include a cursor or some other state between queries > > > I think this sort of protocol gives you everything you want while still > maintaining an exact match protocol as the core protocol. > > > What is this protocol missing to satisfy your needs? > Can we create a protocol that will satisfy your needs on top of exact > matching? One difference is that here the returned data can satisfy only that interest. In the original selector design, the returned data can satisfy other Interests with the same name but different selectors (e.g., > 50). Lan > > > Nacho > > > > -- > Nacho (Ignacio) Solis > Protocol Architect > Principal Scientist > Palo Alto Research Center (PARC) > +1(650)812-4458 > Ignacio.Solis at parc.com > > > > > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Thu Sep 25 14:09:32 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Thu, 25 Sep 2014 21:09:32 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> References: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> Message-ID: <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> On Sep 25, 2014, at 10:01 PM, Lan Wang (lanwang) wrote: >>> - Benefit seems apparent in multi-consumer scenarios, even without sync. >>> Let's say I have 5 personal devices requesting mail. 
In Scheme B, every >>> publisher receives and processes 5 interests per second on average. In >>> Scheme A, with an upstream caching node, each receives 1 per second >>> maximum. The publisher still has to throttle requests, but with no help >>> or scaling support from the network. >> >> This can be done without selectors. As long as all the clients produce a >> request for the same name they can take advantage of caching. > > What Jeff responded to is that scheme B requires a freshness of 0 for the initial interest to get to the producer (in order to get the latest list of email names). If freshness is 0, then there's no caching of the data. No matter how the clients name their Interests, they can't take advantage of caching. > How do selectors prevent you from sending an Interest to the producer, if it's connected? I send a first interest "exclude <= 100" and cache A responds with version 110. Don't you then turn around and send a second interest "exclude <= 110" to see if another cache has a more recent version? Won't that interest go to the producer, if it's connected? It will then need to send a NACK (or you need to time out), if there's nothing more recent. Using selectors, you still never know if there's a more recent version until you get to the producer or you time out. You always need to keep asking and asking. Also, there's nothing special about the content object from the producer, so you still don't necessarily believe that it's the most recent, and you'll then ask again. Sure, an application could just accept the 1st or 2nd content object it gets back, but it never really knows. Sure, if the CreateTime (I think you call it Timestamp in NDN, if you still have it) is very recent and you assume synchronized clocks, then you might have some belief that it's current. We could also talk about FreshnessSeconds and MustBeFresh, but that would be best left to its own thread. Marc -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From Marc.Mosko at parc.com Thu Sep 25 14:39:46 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Thu, 25 Sep 2014 21:39:46 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> References: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> Message-ID: <3D15CFCB-18F5-4CB3-AB94-5C0482DD1BD0@parc.com> I should clarify that the problems I describe below for selectors never knowing if there's a more recent version apply to, I think, any protocol that's talking with caches. I don't mean to pick on selectors; these arguments apply in general. If the caches cannot tell you about the most recent version, then you'll never know it by asking them. So it's not just selectors: it applies to exact match names or continuation names or any discovery method that gets responses from intermediate caches (unless there's a consensus protocol operating, but then they are active participants, not opportunistic caches). Only a response from the producer -- one that says it is from the producer and recent -- could give an application assurance that it's really getting the most recent version. Distributed systems and consistency is a difficult problem. Marc On Sep 25, 2014, at 11:09 PM, wrote: > > On Sep 25, 2014, at 10:01 PM, Lan Wang (lanwang) wrote: > >>>> - Benefit seems apparent in multi-consumer scenarios, even without sync. >>>> Let's say I have 5 personal devices requesting mail. In Scheme B, every >>>> publisher receives and processes 5 interests per second on average. In >>>> Scheme A, with an upstream caching node, each receives 1 per second >>>> maximum. The publisher still has to throttle requests, but with no help >>>> or scaling support from the network. >>> >>> This can be done without selectors. 
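The ask-and-exclude loop Marc describes (send "exclude <= 100", learn of version 110, send "exclude <= 110", and so on until nothing newer comes back) can be simulated. The node model below is an assumption for illustration: each round, the first node on the path holding something newer than what the consumer has seen answers the interest:

```python
# Sketch of iterative version discovery through caches: the consumer can
# only stop once a whole round returns nothing newer (a timeout or NACK),
# so it "always needs to keep asking and asking".

def fetch_latest(nodes):
    """nodes: versions held along the path (caches first, producer last)."""
    seen, rounds = 0, 0
    while True:
        rounds += 1
        # "exclude <= seen": the first node with a newer version replies
        reply = next((v for v in nodes if v > seen), None)
        if reply is None:
            return seen, rounds  # no newer version anywhere: done asking
        seen = reply             # got *a* newer version, not necessarily
                                 # the newest, so ask again

# cache A holds v110, cache B holds v115, the producer holds v120
version, rounds = fetch_latest([110, 115, 120])
assert version == 120 and rounds == 4  # 3 answers + 1 final empty round
```

The final empty round is the cost Marc points to: even after receiving v120 from the producer, the consumer cannot distinguish it from a cache reply and must probe once more.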
As long as all the clients produce a >>> request for the same name they can take advantage caching. >> >> What Jeff responded to is that scheme B requires a freshness of 0 for the initial interest to get to the producer (in order to get the latest list of email names). If freshness is 0, then there's no caching of the data. No meter how the clients name their Interests, they can't take advantage of caching. >> > > How do selectors prevent you from sending an Interest to the producer, if it?s connected. I send first interest ?exclude <= 100? and cache A responds with version 110. Don?t you then turn around and send a second interest ?exclude <= 110? to see if another cache has a more recent version? Won?t that interest go to the producer, if its connected? It will then need to send a NACk (or you need to timeout), if there?s nothing more recent. > > Using selectors, you still never know if there?s a more recent version until you get to the producer or you timeout. You always need to keep asking and asking. Also, there?s nothing special about the content object from the producer, so you still don?t necessarily believe that its the most recent, and you?ll then ask again. Sure, an application could just accept the 1st or 2nd content object it gets back, but it never really knows. Sure, if the CreateTime (I think you call it Timestamp in NDN, if you still have it) is very recent and you assume synchronized clocks, then you might have some belief that it?s current. > > We could also talk about FreshnessSeconds and MustBeFresh, but that would be best to start its own thread on. > > Marc > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From iliamo at ucla.edu Thu Sep 25 14:51:56 2014 From: iliamo at ucla.edu (Ilya Moiseenko) Date: Thu, 25 Sep 2014 14:51:56 -0700 Subject: [Ndn-interest] NDNcomm 2014 videos In-Reply-To: References: Message-ID: <2637A161-EE48-4B68-9A42-7DD07C711644@ucla.edu> Hi Jeff, I have a plan to rewrite NDNvideo using the Consumer-Producer API. Not sure it'll be done next week though. Ilya On Sep 24, 2014, at 10:07 PM, Burke, Jeff wrote: > > Yes. If someone can help, we can get this up and running... :) > Probably about a week to port and a week to test with the specific video files. > > Jeff > > > From: Alex Horn > Date: Wed, 24 Sep 2014 15:28:15 -0700 > Cc: "ndn-interest at lists.cs.ucla.edu" > Subject: Re: [Ndn-interest] NDNcomm 2014 videos > >> Indeed - we had an ndn-video testbed distribution for a few years! >> >> ndn-video (source, tech report) was very useful in testing the early testbed. >> >> unfortunately it is out of date, as it: >> >> a) uses pyccn/ccnx - needs to be updated to pyndn2/NFD >> b) uses gst 0.10 - needs to be updated to gst 1.x >> >> we don't internally have the resources for that at the moment... >> >> (our recent video work has been in ndn-RTC) >> >> but if someone wanted to take it on, it is a fairly straightforward effort. >> >> meanwhile, we will get the conference video online in some form as soon as possible. >> >> thanks for your interest! >> >> Alex >> >> >> >> On Wed, Sep 24, 2014 at 1:54 PM, Xiaoke Jiang wrote: >>> On Wednesday, 24 September, 2014 at 1:46 pm, Beichuan Zhang wrote: >>>> Maybe we can stream it over the NDN testbed :-) There're 3 nodes in China. >>>> >>> >>> Good solution. Some other guys in China also complained about the slow connection. Also, NDN could show its advantage in data delivery with this solution. 
>>> >>> Xiaoke (Shock) >> _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From abannis at ucla.edu Thu Sep 25 15:26:47 2014 From: abannis at ucla.edu (Adeola Bannis) Date: Thu, 25 Sep 2014 15:26:47 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: <3D15CFCB-18F5-4CB3-AB94-5C0482DD1BD0@parc.com> References: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> <3D15CFCB-18F5-4CB3-AB94-5C0482DD1BD0@parc.com> Message-ID: I understand that there were probably many network architecture decisions that went into the following, and I am not primarily a network architect. However, it seems most of these examples run up against at least one of two things: 1) An interest with a selector is allowed to match in a cache. By this I mean, the application semantics of 'leftmost child' and 'rightmost child' as I understand them are 'give me the earliest data ever produced under this prefix', or 'give me the latest data ever', respectively. (Early/late are lexicographical, not temporal.) Allowing caches to respond changes the meaning of the request to 'Give me the earliest/latest data that you've handled recently', and depending on how frequently things are evicted from the cache, it's almost meaningless. I think interests with selectors have special semantics and should never be served from a cache; they should propagate all the way back to the publisher. Excludes might be able to stay the same; the intent is 'please send me *any* data with this prefix, as long as it's not ___'. 2) An interest can only be answered with one Data packet. 
I understand the PIT will stay smaller, on average, if we evict an entry as soon as it is satisfied. However, interests have time-outs associated with them; would it be possible to just forward everything that comes in until it is time to evict the entry? I understand it might also complicate the handling on the interest sender's side, since now you would either have to block delivery until the timeout ends and deliver the batch of responses, or maintain the timeout in the application somewhere and handle the stream of response data. But in the case of multiple caches needing to be exhausted, they could at least be searched in parallel, reducing the number of interests that need to be sent. -Adeola On Thu, Sep 25, 2014 at 2:39 PM, wrote: > I should clarify that the problems I describe below for selectors never > knowing if there?s a more recent version apply to, I think, any protocol > that?s talking with caches. I don?t mean to pick on selectors, these > arguments apply in general. If the caches cannot tell you about the most > recent, then you?ll never know by asking them. So its not just selectors, > it applies to exact match names or continuation names or any discovery > method that gets responses from intermediate caches (unless there?s a > consensus protocol operating, but then they are active participants not > opportunistic caches). > > Only a response from the producer ? that says its from the producer and > recent ? could give an application assurance that its getting the really > most recent version. Distributed systems and consistency is a difficult > problem. > > Marc > > On Sep 25, 2014, at 11:09 PM, > wrote: > > > > > On Sep 25, 2014, at 10:01 PM, Lan Wang (lanwang) > wrote: > > > >>>> - Benefit seems apparent in multi-consumer scenarios, even without > sync. > >>>> Let's say I have 5 personal devices requesting mail. In Scheme B, > every > >>>> publisher receives and processes 5 interests per second on average. 
> In > >>>> Scheme A, with an upstream caching node, each receives 1 per second > >>>> maximum. The publisher still has to throttle requests, but with no > help > >>>> or scaling support from the network. > >>> > >>> This can be done without selectors. As long as all the clients > produce a > >>> request for the same name they can take advantage caching. > >> > >> What Jeff responded to is that scheme B requires a freshness of 0 for > the initial interest to get to the producer (in order to get the latest > list of email names). If freshness is 0, then there's no caching of the > data. No meter how the clients name their Interests, they can't take > advantage of caching. > >> > > > > How do selectors prevent you from sending an Interest to the producer, > if it?s connected. I send first interest ?exclude <= 100? and cache A > responds with version 110. Don?t you then turn around and send a second > interest ?exclude <= 110? to see if another cache has a more recent > version? Won?t that interest go to the producer, if its connected? It > will then need to send a NACk (or you need to timeout), if there?s nothing > more recent. > > > > Using selectors, you still never know if there?s a more recent version > until you get to the producer or you timeout. You always need to keep > asking and asking. Also, there?s nothing special about the content object > from the producer, so you still don?t necessarily believe that its the most > recent, and you?ll then ask again. Sure, an application could just accept > the 1st or 2nd content object it gets back, but it never really knows. > Sure, if the CreateTime (I think you call it Timestamp in NDN, if you still > have it) is very recent and you assume synchronized clocks, then you might > have some belief that it?s current. > > > > We could also talk about FreshnessSeconds and MustBeFresh, but that > would be best to start its own thread on. 
> > > > Marc > > > > _______________________________________________ > > Ndn-interest mailing list > > Ndn-interest at lists.cs.ucla.edu > > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ignacio.Solis at parc.com Thu Sep 25 16:17:06 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Thu, 25 Sep 2014 23:17:06 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: >In NDN, if a router wants to check the signature, then it can check. If >it wants to skip the checking, that's fine too. If the design doesn't >allow the router to verify the signature, then that's a problem. In the >above description, the cache signs a data packet with a name owned by >someone else; it seems problematic for a design to advocate this. Routers that support the Selector Protocol could check signatures. > >One difference is that here the returned data can satisfy only that >interest. In the original selector design, the returned data can satisfy >other Interests with the same name but different selectors (e.g., > 50). Interests with the same selector would be exact-matched throughout the network at any node. Interests with different selectors would not match on all routers, but they would match just fine on routers that supported the Selector Protocol. Basically, everything you want works on routers that support the Selector Protocol, but routers that don't want to support it don't have to. 
Nacho >> >> >> >> -- >> Nacho (Ignacio) Solis >> Protocol Architect >> Principal Scientist >> Palo Alto Research Center (PARC) >> +1(650)812-4458 >> Ignacio.Solis at parc.com >> >> >> >> >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From jburke at remap.ucla.edu Fri Sep 26 00:11:24 2014 From: jburke at remap.ucla.edu (Burke, Jeff) Date: Fri, 26 Sep 2014 07:11:24 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: Message-ID: I understand that there were probably many network architecture decisions that went into the following, and I am not primarily a network architect. However, it seems most of these examples run up against at least one of two things: 1) An interest with a selector is allowed to match in a cache. By this I mean, the application semantics of 'leftmost child' and 'rightmost child' as I understand them are 'give me the earliest data ever produced under this prefix', or 'give me the latest data ever', respectively. (Early/late are lexicographical, not temporal.) Allowing caches to respond changes the meaning of the request to 'Give me the earliest/latest data that you've handled recently', and depending on how frequently things are evicted from the cache, it's almost meaningless. I think interests with selectors have special semantics and should never be served from a cache; they should propagate all the way back to the publisher. Excludes might be able to stay the same; the intent is 'please send me *any* data with this prefix, as long as it's not ___'. I think differentiating between a cache and a producer is not necessarily a good thing. 
A core architectural idea (as I interpret it) of NDN is that you are asking the network for content, not a specific producer (or cache). Direct communication is a special case. If we focus on that, even in the case of selectors, we seem to lose a lot of interesting scaling properties. Not sure that we want to do that so easily. 2) An interest can only be answered with one Data packet. I understand the PIT will stay smaller, on average, if we evict an entry as soon as it is satisfied. However, interests have time-outs associated with them; would it be possible to just forward everything that comes in until it is time to evict the entry? I understand it might also complicate the handling on the interest sender's side, since now you would either have to block delivery until the timeout ends and deliver the batch of responses, or maintain the timeout in the application somewhere and handle the stream of response data. But in the case of multiple caches needing to be exhausted, they could at least be searched in parallel, reducing the number of interests that need to be sent. There was a paper @ ICN that proposed this as well. But there are some flow balance and denial-of-service issues, among other things, that perhaps others can comment on. -Adeola On Thu, Sep 25, 2014 at 2:39 PM, > wrote: I should clarify that the problems I describe below for selectors never knowing if there's a more recent version apply to, I think, any protocol that's talking with caches. I don't mean to pick on selectors, these arguments apply in general. If the caches cannot tell you about the most recent, then you'll never know by asking them. So it's not just selectors, it applies to exact match names or continuation names or any discovery method that gets responses from intermediate caches (unless there's a consensus protocol operating, but then they are active participants, not opportunistic caches). Only a response from the producer -- one that says it's from the producer and recent --
could give an application assurance that it's really getting the most recent version. Distributed systems and consistency is a difficult problem. Marc On Sep 25, 2014, at 11:09 PM, > > wrote: > > On Sep 25, 2014, at 10:01 PM, Lan Wang (lanwang) > wrote: > >>>> - Benefit seems apparent in multi-consumer scenarios, even without sync. >>>> Let's say I have 5 personal devices requesting mail. In Scheme B, every >>>> publisher receives and processes 5 interests per second on average. In >>>> Scheme A, with an upstream caching node, each receives 1 per second >>>> maximum. The publisher still has to throttle requests, but with no help >>>> or scaling support from the network. >>> >>> This can be done without selectors. As long as all the clients produce a >>> request for the same name they can take advantage of caching. >> >> What Jeff responded to is that scheme B requires a freshness of 0 for the initial interest to get to the producer (in order to get the latest list of email names). If freshness is 0, then there's no caching of the data. No matter how the clients name their Interests, they can't take advantage of caching. >> > > How do selectors prevent you from sending an Interest to the producer, if it's connected? I send a first interest "exclude <= 100" and cache A responds with version 110. Don't you then turn around and send a second interest "exclude <= 110" to see if another cache has a more recent version? Won't that interest go to the producer, if it's connected? It will then need to send a NACK (or you need to time out), if there's nothing more recent. > > Using selectors, you still never know if there's a more recent version until you get to the producer or you timeout. You always need to keep asking and asking. Also, there's nothing special about the content object from the producer, so you still don't necessarily believe that it's the most recent, and you'll then ask again.
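Marc's exclude-and-ask-again loop can be walked through in a few lines; the version sets and the stop-on-no-answer rule are simplifying assumptions for illustration:

```python
# Toy walk-through of the exclude loop (version sets are invented; a
# real consumer sees one response per round and stops on NACK/timeout).

def best_answer(nodes, floor):
    """Best version any reachable node can offer above the excluded floor."""
    hits = [v for versions in nodes for v in versions if v > floor]
    return max(hits) if hits else None      # None models NACK / timeout

def discover_latest(nodes, floor):
    """Keep excluding everything seen so far until nothing answers."""
    rounds = 0
    while True:
        rounds += 1
        answer = best_answer(nodes, floor)
        if answer is None:
            return floor, rounds            # believed-latest, and the cost
        floor = answer

cache_a, cache_b = {100, 110}, {99}
latest, rounds = discover_latest([cache_a, cache_b], floor=100)
# Even after the loop terminates, the consumer only *believes* the
# result is latest: a node it never reached could hold something newer.
```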
Sure, an application could just accept the 1st or 2nd content object it gets back, but it never really knows. Sure, if the CreateTime (I think you call it Timestamp in NDN, if you still have it) is very recent and you assume synchronized clocks, then you might have some belief that it?s current. > > We could also talk about FreshnessSeconds and MustBeFresh, but that would be best to start its own thread on. > > Marc > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest _______________________________________________ Ndn-interest mailing list Ndn-interest at lists.cs.ucla.edu http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: From tailinchu at gmail.com Fri Sep 26 00:42:36 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Fri, 26 Sep 2014 00:42:36 -0700 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: How will fib work under exact matching? that implies #routable interest name = #fib name. After you mention selectors in name and key hash, I see a huge number of fib entries to be created. I am worried about those dummy nodes. On Thu, Sep 25, 2014 at 4:17 PM, wrote: > On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: > >>In NDN, if a router wants to check the signature, then it can check. If >>it wants to skip the checking, that's fine too. If the design doesn't >>allow the router to verify the signature, then that's a problem. In the >>above description, the cache signs a data packet with a name owned by >>someone else, it seems problematic for a design to advocate this. > > Routers that support the Selector Protocol could check signatures. 
> >> >>One difference is that here the returned data can satisfy only that >>interest. In the original selector design, the returned data can satisfy >>other Interests with the same name but different selectors (e.g., > 50). > > Interests with the same selector would be exact matched throughout the > network at any node. > > Interests with different selectors would not match on all routers, but > they would match just fine on routers that supported the Selector Protocol. > > Basically, everything you want works on routers that support the Selector > Protocol, but routers that don?t want to support it, don?t have to. > > > Nacho > > > >>> >>> >>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >>_______________________________________________ >>Ndn-interest mailing list >>Ndn-interest at lists.cs.ucla.edu >>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Fri Sep 26 00:46:37 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Fri, 26 Sep 2014 07:46:37 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> Message-ID: On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote: >How can a cache respond to /mail/inbox/selector_matching/payload> with a table of content? This name prefix is owned by the mail >server. Also the reply really depends on what is in the cache at the >moment, so the same name would correspond to different data. 
A - Yes, the same name would correspond to different data. This is true given that the data has changed. NDN (and CCN) has no architectural requirement that a name maps to the same piece of data (obviously not talking about self-certifying hash-based names). B - Yes, you can consider the name prefix is "owned" by the server, but the answer is actually something that the cache is choosing. The cache is choosing from the set of data that it has. The data that it encapsulates _is_ signed by the producer. Anybody that can decapsulate the data can verify that this is the case. Nacho >On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: > >> My beating on "discover all" is exactly because of this. Let's define >>discovery service. If the service is just "discover latest" >>(left/right), can we not simplify the current approach? If the service >>includes more than "latest", then is the current approach the right >>approach? >> >> Sync has its place and is the right solution for some things. However, >>it should not be a bandage over discovery. Discovery should be its >>own valid and useful service. >> >> I agree that the exclusion approach can work, and work relatively well, >>for finding the rightmost/leftmost child. I believe this is because >>that operation is transitive through caches. So, within whatever >>timeout an application is willing to wait to find the "latest", it can >>keep asking and asking. >> >> I do think it would be best to actually try to ask an authoritative >>source first (i.e. a non-cached value), and if that fails then probe >>caches, but experimentation may show what works well. This is based on >>my belief that in the real world in broad use, the namespace will become >>pretty polluted and probing will result in a lot of junk, but that's >>future prognosticating. >> >> Also, in the exact match vs. continuation match of content object to >>interest, it is pretty easy to encode that "selector" request in a name >>component (i.e.
"exclude_before=(t=version, l=2, v=279) & sort=right") >>and any participating cache can respond with a link (or encapsulate) a >>response in an exact match system. >> >> In the CCNx 1.0 spec, one could also encode this a different way. One >>could use a name like "/mail/inbox/selector_matching/<hash of payload>" >>and in the payload include "exclude_before=(t=version, l=2, v=279) & >>sort=right". This means that any cache that could process the >>"selector_matching" function could look at the interest payload and >>evaluate the predicate there. The predicate could become large and not >>pollute the PIT with all the computation state. Including "<hash of >>payload>" in the name means that one could get a cached response if >>someone else had asked the same exact question (subject to the content >>object's cache lifetime) and it also serves to multiplex different >>payloads for the same function (selector_matching). >> >> Marc >> >> >> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote: >>> >>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>> >>> >>>https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html >>> >>> J. >>> >>> >>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>> >>>> However, I cannot see whether we can achieve "best-effort *all*-value" >>>> efficiently. >>>> There are still interesting topics on >>>> 1. how do we express the discovery query? >>>> 2. is selector "discovery-complete"? i.e. can we express any >>>> discovery query with current selectors? >>>> 3. if so, can we re-express current selectors in a more efficient way? >>>> >>>> I personally see a named data as a set, which can then be categorized >>>> into "ordered set" and "unordered set". >>>> some questions that any discovery expression must solve: >>>> 1. is this a nil set or not? nil set means that this name is the leaf >>>> 2. set contains member X? >>>> 3. is set ordered or not >>>> 4. (ordered) first, prev, next, last >>>> 5.
if we enforce component ordering, answer question 4. >>>> 6. recursively answer all questions above on any set member >>>> >>>> >>>> >>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>> wrote: >>>>> >>>>> >>>>> From: >>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>> To: Jeff Burke >>>>> Cc: , , >>>>> >>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>> >>>>> I think Tai-Lin?s example was just fine to talk about discovery. >>>>> /blah/blah/value, how do you discover all the ?value?s? Discovery >>>>> shouldn?t >>>>> care if its email messages or temperature readings or world cup >>>>>photos. >>>>> >>>>> >>>>> This is true if discovery means "finding everything" - in which case, >>>>> as you >>>>> point out, sync-style approaches may be best. But I am not sure that >>>>> this >>>>> definition is complete. The most pressing example that I can think >>>>>of >>>>> is >>>>> best-effort latest-value, in which the consumer's goal is to get the >>>>> latest >>>>> copy the network can deliver at the moment, and may not care about >>>>> previous >>>>> values or (if freshness is used well) potential later versions. >>>>> >>>>> Another case that seems to work well is video seeking. Let's say I >>>>> want to >>>>> enable random access to a video by timecode. The publisher can >>>>>provide a >>>>> time-code based discovery namespace that's queried using an Interest >>>>> that >>>>> essentially says "give me the closest keyframe to 00:37:03:12", which >>>>> returns an interest that, via the name, provides the exact timecode >>>>>of >>>>> the >>>>> keyframe in question and a link to a segment-based namespace for >>>>> efficient >>>>> exact match playout. In two roundtrips and in a very lightweight >>>>>way, >>>>> the >>>>> consumer has random access capability. If the NDN is the moral >>>>> equivalent >>>>> of IP, then I am not sure we should be afraid of roundtrips that >>>>>provide >>>>> this kind of functionality, just as they are used in TCP. 
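Jeff's video-seeking example can be sketched as publisher-side logic: one round trip maps an arbitrary timecode to the exact name of the nearest keyframe, after which playout proceeds by exact match. The frame numbers and name scheme below are hypothetical:

```python
# Sketch of the "closest keyframe to T" discovery namespace.  The
# keyframe positions (in frames) and the /video/... name scheme are
# assumptions for illustration, not from any NDN application.
import bisect

keyframes = [0, 250, 500, 780, 1030]      # frame numbers of keyframes

def closest_keyframe(t):
    """Publisher-side answer: the keyframe nearest to frame t."""
    i = bisect.bisect_left(keyframes, t)
    below = keyframes[max(i - 1, 0)]
    above = keyframes[min(i, len(keyframes) - 1)]
    return below if t - below <= above - t else above

def keyframe_name(t):
    """Exact name the consumer then fetches, cacheable by exact match."""
    return "/video/clip1/keyframes/%d" % closest_keyframe(t)
```

The discovery response itself is a normal Data packet, so the second round trip (and all subsequent segment fetches) can be satisfied from any cache.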
>>>>> >>>>> >>>>> I described one set of problems using the exclusion approach, and >>>>>that >>>>> an >>>>> NDN paper on device discovery described a similar problem, though >>>>>they >>>>> did >>>>> not go into the details of splitting interests, etc. That all was >>>>> simple >>>>> enough to see from the example. >>>>> >>>>> Another question is how does one do the discovery with exact match >>>>> names, >>>>> which is also conflating things. You could do a different discovery >>>>> with >>>>> continuation names too, just not the exclude method. >>>>> >>>>> As I alluded to, one needs a way to talk with a specific cache about >>>>>its >>>>> ?table of contents? for a prefix so one can get a consistent set of >>>>> results >>>>> without all the round-trips of exclusions. Actually downloading the >>>>> ?headers? of the messages would be the same bytes, more or less. In >>>>>a >>>>> way, >>>>> this is a little like name enumeration from a ccnx 0.x repo, but that >>>>> protocol has its own set of problems and I?m not suggesting to use >>>>>that >>>>> directly. >>>>> >>>>> One approach is to encode a request in a name component and a >>>>> participating >>>>> cache can reply. It replies in such a way that one could continue >>>>> talking >>>>> with that cache to get its TOC. One would then issue another >>>>>interest >>>>> with >>>>> a request for not-that-cache. >>>>> >>>>> >>>>> I'm curious how the TOC approach works in a multi-publisher scenario? >>>>> >>>>> >>>>> Another approach is to try to ask the authoritative source for the >>>>> ?current? >>>>> manifest name, i.e. /mail/inbox/current/, which could return >>>>>the >>>>> manifest or a link to the manifest. Then fetching the actual >>>>>manifest >>>>> from >>>>> the link could come from caches because you how have a consistent >>>>>set of >>>>> names to ask for. 
If you cannot talk with an authoritative source, >>>>>you >>>>> could try again without the nonce and see if there?s a cached copy >>>>>of a >>>>> recent version around. >>>>> >>>>> Marc >>>>> >>>>> >>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>wrote: >>>>> >>>>> >>>>> >>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>> >>>>> wrote: >>>>> >>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>> >>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>>> pattern with static (/mail/inbox) and variable (148) components; with >>>>> proper naming convention, computers can also detect this pattern >>>>> easily. Now I want to look for all mails in my inbox. I can generate >>>>>a >>>>> list of /mail/inbox/. These are my guesses, and with >>>>>selectors >>>>> I can further refine my guesses. >>>>> >>>>> >>>>> I think this is a very bad example (or at least a very bad >>>>>application >>>>> design). You have an app (a mail server / inbox) and you want it to >>>>> list >>>>> your emails? An email list is an application data structure. I >>>>>don?t >>>>> think you should use the network structure to reflect this. >>>>> >>>>> >>>>> I think Tai-Lin is trying to sketch a small example, not propose a >>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>> >>>>> >>>>> Another way to look at it is that if the network architecture is >>>>> providing >>>>> the equivalent of distributed storage to the application, perhaps the >>>>> application data structure could be adapted to match the affordances >>>>>of >>>>> the network. Then it would not be so bad that the two structures >>>>>were >>>>> aligned. >>>>> >>>>> >>>>> I?ll give you an example, how do you delete emails from your inbox? >>>>>If >>>>> an >>>>> email was cached in the network it can never be deleted from your >>>>>inbox? 
>>>>> >>>>> >>>>> This is conflating two issues - what you are pointing out is that the >>>>> data >>>>> structure of a linear list doesn't handle common email management >>>>> operations well. Again, I'm not sure if that's what he was getting >>>>>at >>>>> here. But deletion is not the issue - the availability of a data >>>>>object >>>>> on the network does not necessarily mean it's valid from the >>>>>perspective >>>>> of the application. >>>>> >>>>> Or moved to another mailbox? Do you rely on the emails expiring? >>>>> >>>>> This problem is true for most (any?) situations where you use network >>>>> name >>>>> structure to directly reflect the application data structure. >>>>> >>>>> >>>>> Not sure I understand how you make the leap from the example to the >>>>> general statement. >>>>> >>>>> Jeff >>>>> >>>>> >>>>> >>>>> >>>>> Nacho >>>>> >>>>> >>>>> >>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>> >>>>> Ok, yes I think those would all be good things. >>>>> >>>>> One thing to keep in mind, especially with things like time series >>>>> sensor >>>>> data, is that people see a pattern and infer a way of doing it. >>>>>That?s >>>>> easy >>>>> for a human :) But in Discovery, one should assume that one does not >>>>> know >>>>> of patterns in the data beyond what the protocols used to publish the >>>>> data >>>>> explicitly require. That said, I think some of the things you listed >>>>> are >>>>> good places to start: sensor data, web content, climate data or >>>>>genome >>>>> data. >>>>> >>>>> We also need to state what the forwarding strategies are and what the >>>>> cache >>>>> behavior is. >>>>> >>>>> I outlined some of the points that I think are important in that >>>>>other >>>>> posting. While ?discover latest? is useful, ?discover all? is also >>>>> important, and that one gets complicated fast. So points like >>>>> separating >>>>> discovery from retrieval and working with large data sets have been >>>>> important in shaping our thinking. 
That all said, I?d be happy >>>>> starting >>>>> from 0 and working through the Discovery service definition from >>>>> scratch >>>>> along with data set use cases. >>>>> >>>>> Marc >>>>> >>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>> wrote: >>>>> >>>>> Hi Marc, >>>>> >>>>> Thanks ? yes, I saw that as well. I was just trying to get one step >>>>> more >>>>> specific, which was to see if we could identify a few specific use >>>>> cases >>>>> around which to have the conversation. (e.g., time series sensor >>>>>data >>>>> and >>>>> web content retrieval for "get latest"; climate data for huge data >>>>> sets; >>>>> local data in a vehicular network; etc.) What have you been looking >>>>>at >>>>> that's driving considerations of discovery? >>>>> >>>>> Thanks, >>>>> Jeff >>>>> >>>>> From: >>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>> To: Jeff Burke >>>>> Cc: , >>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>> >>>>> Jeff, >>>>> >>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>> Discovery. >>>>> >>>>> >>>>> >>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000 >>>>>20 >>>>> 0 >>>>> .html >>>>> >>>>> I think it would be very productive to talk about what Discovery >>>>>should >>>>> do, >>>>> and not focus on the how. It is sometimes easy to get caught up in >>>>>the >>>>> how, >>>>> which I think is a less important topic than the what at this stage. >>>>> >>>>> Marc >>>>> >>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>> wrote: >>>>> >>>>> Marc, >>>>> >>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>> based >>>>> on use cases. What are the use cases you are using to evaluate >>>>> discovery? 
>>>>> >>>>> Jeff >>>>> >>>>> >>>>> >>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>> wrote: >>>>> >>>>> No matter what the expressiveness of the predicates if the forwarder >>>>> can >>>>> send interests different ways you don't have a consistent underlying >>>>> set >>>>> to talk about so you would always need non-range exclusions to >>>>>discover >>>>> every version. >>>>> >>>>> Range exclusions only work I believe if you get an authoritative >>>>> answer. >>>>> If different content pieces are scattered between different caches I >>>>> don't see how range exclusions would work to discover every version. >>>>> >>>>> I'm sorry to be pointing out problems without offering solutions but >>>>> we're not ready to publish our discovery protocols. >>>>> >>>>> Sent from my telephone >>>>> >>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>> >>>>> I see. Can you briefly describe how ccnx discovery protocol solves >>>>>the >>>>> all problems that you mentioned (not just exclude)? a doc will be >>>>> better. >>>>> >>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will >>>>>soon >>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>Regular >>>>> language or context free language might become part of selector too. >>>>> >>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>> That will get you one reading then you need to exclude it and ask >>>>> again. >>>>> >>>>> Sent from my telephone >>>>> >>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>> >>>>> Yes, my point was that if you cannot talk about a consistent set >>>>> with a particular cache, then you need to always use individual >>>>> excludes not range excludes if you want to discover all the versions >>>>> of an object. >>>>> >>>>> >>>>> I am very confused. For your example, if I want to get all today's >>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>> tomorrow..Any). That's 18 bytes. 
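The two encodings being compared here can be checked with back-of-the-envelope arithmetic; the 8-byte timestamps, 1-byte Any markers, and zero framing overhead are simplifying assumptions:

```python
# Range exclusions vs. explicit per-version exclusions for one day of
# once-per-second sensor readings.  Sizes are simplifying assumptions:
# 8-byte timestamp components, 1-byte Any markers, no TLV framing.

TIMESTAMP_BYTES = 8
ANY_BYTES = 1

# (Any .. last second of yesterday)(first second of tomorrow .. Any)
range_form = 2 * (ANY_BYTES + TIMESTAMP_BYTES)        # the "18 bytes"

# One explicit exclusion per reading after a full day:
readings_per_day = 24 * 60 * 60                       # 86,400
explicit_form = readings_per_day * TIMESTAMP_BYTES    # 691,200 bytes
```

Both figures match the ones quoted in the thread; the contrast is the whole argument — range excludes stay tiny but assume a consistent set, explicit excludes make no such assumption and grow linearly.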
>>>>> >>>>> >>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>> >>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>> >>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>> >>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>> could miss content objects you want to discovery unless you avoid >>>>> all range exclusions and only exclude explicit versions. >>>>> >>>>> >>>>> Could you explain why missing content object situation happens? also >>>>> range exclusion is just a shorter notation for many explicit >>>>> exclude; >>>>> converting from explicit excludes to ranged exclude is always >>>>> possible. >>>>> >>>>> >>>>> Yes, my point was that if you cannot talk about a consistent set >>>>> with a particular cache, then you need to always use individual >>>>> excludes not range excludes if you want to discover all the versions >>>>> of an object. For something like a sensor reading that is updated, >>>>> say, once per second you will have 86,400 of them per day. If each >>>>> exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of >>>>> exclusions (plus encoding overhead) per day. >>>>> >>>>> yes, maybe using a more deterministic version number than a >>>>> timestamp makes sense here, but its just an example of needing a lot >>>>> of exclusions. >>>>> >>>>> >>>>> You exclude through 100 then issue a new interest. This goes to >>>>> cache B >>>>> >>>>> >>>>> I feel this case is invalid because cache A will also get the >>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>> if >>>>> this goes to cache B only, it means that cache A dies. How do you >>>>> know >>>>> that v101 even exist? >>>>> >>>>> >>>>> I guess this depends on what the forwarding strategy is. If the >>>>> forwarder will always send each interest to all replicas, then yes, >>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>> forwarder is just doing ?best path? 
and can round-robin between cache >>>>> A and cache B, then your application could miss v101. >>>>> >>>>> >>>>> >>>>> c,d In general I agree that LPM performance is related to the number >>>>> of components. In my own thread-safe LPM implementation, I used only >>>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>>> every node will be faster or not because of lock overhead. >>>>> >>>>> However, we should compare (exact match + discovery protocol) vs >>>>> (ndn >>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>> >>>>> >>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>> ready to claim it's better yet because we have not done that. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>> I would point out that using LPM on content object to Interest >>>>> matching to do discovery has its own set of problems. Discovery >>>>> involves more than just "latest version" discovery too. >>>>> >>>>> This is probably getting off-topic from the original post about >>>>> naming conventions. >>>>> >>>>> a. If Interests can be forwarded multiple directions and two >>>>> different caches are responding, the exclusion set you build up >>>>> talking with cache A will be invalid for cache B. If you talk >>>>> sometimes to A and sometimes to B, you very easily could miss >>>>> content objects you want to discover unless you avoid all range >>>>> exclusions and only exclude explicit versions. That will lead to >>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>> explicit discovery protocol that allows conversations about >>>>> consistent sets is better. >>>>> >>>>> b. Yes, if you just want the "latest version" discovery that >>>>> should be transitive between caches, but imagine this. You send >>>>> Interest #1 to cache A which returns version 100.
You exclude >>>>> through 100 then issue a new interest. This goes to cache B who >>>>> only has version 99, so the interest times out or is NACK'd. So >>>>> you think you have it! But, cache A already has version 101, you >>>>> just don't know. If you cannot have a conversation around >>>>> consistent sets, it seems like even doing latest version discovery >>>>> is difficult with selector based discovery. From what I saw in >>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>> authoritative source because you can never believe an intermediate >>>>> cache that there's not something more recent. >>>>> >>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>>> interested in seeing your analysis. Case (a) is that a node can >>>>> correctly discover every version of a name prefix, and (b) is that >>>>> a node can correctly discover the latest version. We have not >>>>> formally compared (or yet published) our discovery protocols (we >>>>> have three, 2 for content, 1 for device) compared to selector based >>>>> discovery, so I cannot yet claim they are better, but they do not >>>>> have the non-determinism sketched above. >>>>> >>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>> must do in the PIT to match a content object. If you have a name >>>>> tree or a threaded hash table, those don't all need to be hash >>>>> lookups, but you need to walk up the name tree for every prefix of >>>>> the content object name and evaluate the selector predicate. >>>>> Content Based Networking (CBN) had some methods to create data >>>>> structures based on predicates, maybe those would be better. But >>>>> in any case, you will potentially need to retrieve many PIT entries >>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>> lot of NUMA access for each one.
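Marc's point (c) is about lookup counts: LPM-style content-to-PIT matching consults the PIT once per prefix of the content name, while the exact-match design bounds it at three probes. A toy count, with an invented name:

```python
# Toy lookup count (the deep name is invented).  LPM content-object
# matching walks every prefix of the name; the exact-match design
# probes at most three keys: name, name+keyid, name+hash.

def prefixes(name):
    """/a/b/c -> ['/a', '/a/b', '/a/b/c']"""
    parts = name.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]

def lpm_pit_probes(content_name):
    return len(prefixes(content_name))   # one PIT consultation per prefix

EXACT_MATCH_PROBES = 3                   # fixed, independent of name depth

deep_name = "/ndn/edu/ucla/remap/demo/video/seg7"
```

The point of contrast is that `lpm_pit_probes` grows with name depth (and each probe may touch a different cache line), while the exact-match bound is constant.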
In CCNx 1.0, even a naive >>>>> implementation only requires at most 3 lookups (one by name, one by >>>>> name + keyid, one by name + content object hash), and one can do >>>>> other things to optimize lookup for an extra write. >>>>> >>>>> d. In (c) above, if you have a threaded name tree or are just >>>>> walking parent pointers, I suspect you'll need locking of the >>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>> and that will be expensive. It would be interesting to see what a >>>>> cache consistent multi-threaded name tree looks like. >>>>> >>>>> Marc >>>>> >>>>> >>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>> wrote: >>>>> >>>>> I had thought about these questions, but I want to know your idea >>>>> besides typed component: >>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>> things? >>>>> 2. will removing selectors improve performance? How do we use >>>>> other >>>>> faster techniques to replace selectors? >>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>> byte, but 2 bytes for length might not be enough for the future. >>>>> >>>>> >>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>> wrote: >>>>> >>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>> wrote: >>>>> >>>>> I know how to make #2 flexible enough to do what things I can >>>>> envision we need to do, and with a few simple conventions on >>>>> how the registry of types is managed. >>>>> >>>>> >>>>> Could you share it with us? >>>>> >>>>> Sure. Here's a strawman. >>>>> >>>>> The type space is 16 bits, so you have 65,536 types. >>>>> >>>>> The type space is currently shared with the types used for the >>>>> entire protocol, that gives us two options: >>>>> (1) we reserve a range for name component types.
Given the >>>>> likelihood there will be at least as much and probably more need >>>>> for component types than protocol extensions, we could reserve 1/2 >>>>> of the type space, giving us 32K types for name components. >>>>> (2) since there is no parsing ambiguity between name components >>>>> and other fields of the protocol (since they are sub-types of the >>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>> name component types. >>>>> >>>>> We divide the type space into regions, and manage it with a >>>>> registry. If we ever get to the point of creating an IETF >>>>> standard, IANA has 25 years of experience running registries and >>>>> there are well-understood rule sets for different kinds of >>>>> registries (open, requires a written spec, requires standards >>>>> approval). >>>>> >>>>> - We allocate one "default" name component type for "generic >>>>> name", which would be used on name prefixes and other common >>>>> cases where there are no special semantics on the name component. >>>>> - We allocate a range of name component types, say 1024, to >>>>> globally understood types that are part of the base or extension >>>>> NDN specifications (e.g. chunk#, version#, etc.). >>>>> - We reserve some portion of the space for unanticipated uses >>>>> (say another 1024 types). >>>>> - We give the rest of the space to application assignment. >>>>> >>>>> Make sense? >>>>> >>>>> >>>>> While I'm sympathetic to that view, there are three ways in >>>>> which Moore's law or hardware tricks will not save us from >>>>> performance flaws in the design >>>>> >>>>> >>>>> we could design for performance, >>>>> >>>>> That's not what people are advocating. We are advocating that we >>>>> *not* design for known bad performance and hope serendipity or >>>>> Moore's Law will come to the rescue. >>>>> >>>>> but I think there will be a turning >>>>> point when the slower design starts to become "fast enough". >>>>> >>>>> Perhaps, perhaps not.
Relative performance is what matters, so >>>>> things that don't get faster while others do tend to get dropped >>>>> or not used because they impose a performance penalty relative to >>>>> the things that go faster. There is also the "low-end" phenomenon >>>>> where improvements in technology get applied to lowering cost >>>>> rather than improving performance. For those environments bad >>>>> performance just never gets better. >>>>> >>>>> Do you >>>>> think there will be some design of ndn that will *never* have >>>>> performance improvement? >>>>> >>>>> I suspect LPM on data will always be slow (relative to the other >>>>> functions). >>>>> I suspect exclusions will always be slow because they will >>>>> require extra memory references. >>>>> >>>>> However, I of course don't claim clairvoyance, so this is just >>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>> orders of magnitude and still having to worry about counting >>>>> cycles and memory references. >>>>> >>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>> wrote: >>>>> >>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>> wrote: >>>>> >>>>> We should not look at a certain chip nowadays and want ndn to >>>>> perform >>>>> well on it. It should be the other way around: once an ndn app >>>>> becomes >>>>> popular, a better chip will be designed for ndn. >>>>> >>>>> While I'm sympathetic to that view, there are three ways in >>>>> which Moore's law or hardware tricks will not save us from >>>>> performance flaws in the design: >>>>> a) clock rates are not getting (much) faster >>>>> b) memory accesses are getting (relatively) more expensive >>>>> c) data structures that require locks to manipulate >>>>> successfully will be relatively more expensive, even with >>>>> near-zero lock contention. >>>>> >>>>> The fact is, IP *did* have some serious performance flaws in >>>>> its design.
We just forgot those because the design elements >>>>> that depended on those mistakes have fallen into disuse. The >>>>> poster children for this are: >>>>> 1. IP options. Nobody can use them because they are too slow >>>>> on modern forwarding hardware, so they can't be reliably used >>>>> anywhere. >>>>> 2. The UDP checksum, which was a bad design when it was >>>>> specified and is now a giant PITA that still causes major pain >>>>> to work around. >>>>> >>>>> I'm afraid students today are being taught that the designers >>>>> of IP were flawless, as opposed to very good scientists and >>>>> engineers who got most of it right. >>>>> >>>>> I feel the discussion today and yesterday has been off-topic. >>>>> Now I >>>>> see that there are 3 approaches: >>>>> 1. we should not define a naming convention at all >>>>> 2. typed component: use tlv type space and add a handful of >>>>> types >>>>> 3. marked component: introduce only one more type and add >>>>> additional >>>>> marker space >>>>> >>>>> I know how to make #2 flexible enough to do the things I can >>>>> envision we need to do, and with a few simple conventions on >>>>> how the registry of types is managed. >>>>> >>>>> It is just as powerful in practice as either throwing up our >>>>> hands and letting applications design their own mutually >>>>> incompatible schemes or trying to make naming conventions with >>>>> markers in a way that is fast to generate/parse and also >>>>> resilient against aliasing. >>>>> >>>>> Also everybody thinks that the current utf8 marker naming >>>>> convention >>>>> needs to be revised. >>>>> >>>>> >>>>> >>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>> wrote: >>>>> Would that chip be suitable, i.e. can we expect most names >>>>> to fit in (the >>>>> magnitude of) 96 bytes? What length are names usually in >>>>> current NDN >>>>> experiments? >>>>> >>>>> I guess wide deployment could make for even longer names.
>>>>> Related: Many URLs >>>>> I encounter nowadays easily don't fit within two 80-column >>>>> text lines, and >>>>> NDN will have to carry more information than URLs, as far as >>>>> I see. >>>>> >>>>> >>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>> >>>>> In fact, the index in a separate TLV will be slower on some >>>>> architectures, >>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>> bytes in memory, >>>>> then any subsequent memory is accessed only as two adjacent >>>>> 32-byte blocks >>>>> (there can be at most 5 blocks available at any one time). >>>>> If you need to >>>>> switch between arrays, it would be very expensive. If you >>>>> have to read past >>>>> the name to get to the 2nd array, then read it, then back up >>>>> to get to the >>>>> name, it will be pretty expensive too. >>>>> >>>>> Marc >>>>> >>>>> On Sep 18, 2014, at 2:02 PM, >>>>> wrote: >>>>> >>>>> Does this make that much difference? >>>>> >>>>> If you want to parse the first 5 components, one way to do >>>>> it is: >>>>> >>>>> Read the index, find entry 5, then read in that many bytes >>>>> from the start >>>>> offset of the beginning of the name. >>>>> OR >>>>> Start reading the name, (find size + move) 5 times. >>>>> >>>>> How much speed are you getting from one to the other? You >>>>> seem to imply >>>>> that the first one is faster. I don't think this is the >>>>> case. >>>>> >>>>> In the first one you'll probably have to get the cache line >>>>> for the index, >>>>> then all the required cache lines for the first 5 >>>>> components. For the >>>>> second, you'll have to get all the cache lines for the first >>>>> 5 components. >>>>> Given an assumption that a cache miss is way more expensive >>>>> than >>>>> evaluating a number and computing an addition, you might >>>>> find that the >>>>> performance of the index is actually slower than the >>>>> performance of the >>>>> direct access.
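The two strategies being compared — walking the name TLV by TLV versus jumping via a precomputed offset index — can be sketched as follows. The 1-byte type / 1-byte length layout here is a deliberate simplification for illustration, not the actual NDN or CCNx wire encoding:

```python
# Sketch of the two parse strategies: sequential TLV walk vs. offset index.
# The 1-byte type / 1-byte length TLV layout is a simplification,
# not the real NDN or CCNx wire format.

def encode_name(components, comp_type=8):
    """Encode components as toy TLVs and also build an offset index."""
    out, index = bytearray(), []
    for c in components:
        index.append(len(out))             # start offset of this component's TLV
        out += bytes([comp_type, len(c)]) + c
    index.append(len(out))                 # sentinel: end of the name
    return bytes(out), index

def first_n_sequential(buf, n):
    """(find size + move) n times: read each TL header, then skip the value."""
    comps, pos = [], 0
    for _ in range(n):
        length = buf[pos + 1]
        comps.append(buf[pos + 2:pos + 2 + length])
        pos += 2 + length
    return comps

def first_n_indexed(buf, index, n):
    """Jump straight to each component via the precomputed index."""
    return [buf[index[i] + 2:index[i + 1]] for i in range(n)]

buf, idx = encode_name([b"mail", b"inbox", b"148"])
print(first_n_sequential(buf, 2))    # [b'mail', b'inbox']
print(first_n_indexed(buf, idx, 2))  # [b'mail', b'inbox']
```

Both touch the same component bytes; as the thread notes, the difference is whether the extra cache lines for the index cost more than the add-and-compare work of the sequential walk.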
>>>>> >>>>> Granted, there is a case where you don't access the name at >>>>> all, for >>>>> example, if you just get the offsets and then send the >>>>> offsets as >>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>> you may see a >>>>> gain IF there are more cache line misses in reading the name >>>>> than in >>>>> reading the index. So, if the regular part of the name >>>>> that you're >>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>> name is to be >>>>> processed by a different processor, then you might see some >>>>> performance >>>>> gain in using the index, but in all other circumstances I >>>>> bet this is not >>>>> the case. I may be wrong, I haven't actually tested it. >>>>> >>>>> This is all to say, I don't think we should be designing the >>>>> protocol with >>>>> only one architecture in mind (the architecture of sending >>>>> the name to a >>>>> different processor than the index). >>>>> >>>>> If you have numbers that show that the index is faster I >>>>> would like to see >>>>> under what conditions and architectural assumptions. >>>>> >>>>> Nacho >>>>> >>>>> (I may have misinterpreted your description so feel free to >>>>> correct me if >>>>> I'm wrong.) >>>>> >>>>> >>>>> -- >>>>> Nacho (Ignacio) Solis >>>>> Protocol Architect >>>>> Principal Scientist >>>>> Palo Alto Research Center (PARC) >>>>> +1(650)812-4458 >>>>> Ignacio.Solis at parc.com >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>> >>>>> wrote: >>>>> >>>>> Indeed each component's offset must be encoded using a fixed >>>>> amount of >>>>> bytes: >>>>> >>>>> i.e., >>>>> Type = Offsets >>>>> Length = 10 Bytes >>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>> >>>>> You may also imagine having an "Offset_2byte" type if your >>>>> name is too >>>>> long.
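Massimo's fixed-width "Offsets" TLV, and the O(1) access it buys (the point Tai-Lin raises about varNum below), might look like this sketch. The type code 0xF0 and the offset values are invented for illustration:

```python
# Sketch of the fixed-width "Offsets" TLV described above. Because every
# entry is exactly one byte, entry i sits at a known position: there is
# no need to parse the preceding i-1 entries (the problem with a
# variable-length varNum encoding).

OFFSETS_TYPE = 0xF0  # hypothetical type code for the Offsets TLV

def build_offsets_tlv(offsets):
    """Encode component start offsets as 1-byte fixed-width entries."""
    if any(o > 0xFF for o in offsets):
        raise ValueError("name too long; would need an Offset_2byte type")
    value = bytes(offsets)
    return bytes([OFFSETS_TYPE, len(value)]) + value

def nth_offset(tlv, i):
    """O(1) lookup: skip the 2-byte TL header and index directly."""
    return tlv[2 + i]

tlv = build_offsets_tlv([0, 6, 13, 18])
print(nth_offset(tlv, 2))  # 13
```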
>>>>> >>>>> Max >>>>> >>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose >>>>> you only >>>>> want the first x components) you can directly have it using >>>>> the >>>>> offsets. With the Nested TLV structure you have to >>>>> iteratively parse >>>>> the first x-1 components. With the offset structure you can >>>>> directly >>>>> access the first x components. >>>>> >>>>> I don't get it. What you described only works if the >>>>> "offset" is >>>>> encoded in fixed bytes. With varNum, you will still need to >>>>> parse x-1 >>>>> offsets to get to the x-th offset. >>>>> >>>>> >>>>> >>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>> wrote: >>>>> >>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>> >>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>> like the >>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>> understand what >>>>> you >>>>> _do_ prefer, though. it sounds like you're describing an >>>>> entirely >>>>> different >>>>> scheme where the info that describes the name-components is >>>>> ... >>>>> someplace >>>>> other than _in_ the name-components. is that correct? when >>>>> you say >>>>> "field >>>>> separator", what do you mean (since that's not a "TL" from a >>>>> TLV)? >>>>> >>>>> Correct. >>>>> In particular, with our name encoding, a TLV indicates the >>>>> name >>>>> hierarchy >>>>> with offsets in the name and other TLV(s) indicate the >>>>> offset to use >>>>> in >>>>> order to retrieve special components. >>>>> As for the field separator, it is something like "/". >>>>> Aliasing is >>>>> avoided as >>>>> you do not rely on field separators to parse the name; you >>>>> use the >>>>> "offset >>>>> TLV" to do that.
>>>>> >>>>> So now, it may be an aesthetic question but: >>>>> >>>>> if you do not need the entire hierarchical structure (suppose >>>>> you only >>>>> want >>>>> the first x components) you can directly have it using the >>>>> offsets. >>>>> With the >>>>> Nested TLV structure you have to iteratively parse the first >>>>> x-1 >>>>> components. >>>>> With the offset structure you can directly access the >>>>> first x >>>>> components. >>>>> >>>>> Max >>>>> >>>>> >>>>> -- Mark >>>>> >>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>> >>>>> The why is simple: >>>>> >>>>> You use a lot of "generic component type" and very few >>>>> "specific >>>>> component type". You are imposing types for every component >>>>> in order >>>>> to >>>>> handle a few exceptions (segmentation, etc.). You create a >>>>> rule >>>>> (specify >>>>> the component's type) to handle exceptions! >>>>> >>>>> I would prefer not to have typed components. Instead I would >>>>> prefer >>>>> to >>>>> have the name as a simple sequence of bytes with a field >>>>> separator. Then, >>>>> outside the name, if you have some components that could be >>>>> used at >>>>> the network layer (e.g. a TLV field), you simply need something >>>>> that >>>>> indicates which is the offset allowing you to retrieve the >>>>> version, >>>>> segment, etc. in the name... >>>>> >>>>> >>>>> Max >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>> >>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>> >>>>> I think we agree on the small number of "component types". >>>>> However, if you have a small number of types, you will end >>>>> up with >>>>> names >>>>> containing many generic component types and few specific >>>>> component >>>>> types. Due to the fact that the component type specification >>>>> is an >>>>> exception in the name, I would prefer something that specifies the >>>>> component's >>>>> type only when needed (something like UTF8 conventions but >>>>> that >>>>> applications MUST use).
>>>>> >>>>> so ... I can't quite follow that. the thread has had some >>>>> explanation >>>>> about why the UTF8 requirement has problems (with aliasing, >>>>> e.g.) >>>>> and >>>>> there's been email trying to explain that applications don't >>>>> have to >>>>> use types if they don't need to. your email sounds like "I >>>>> prefer >>>>> the >>>>> UTF8 convention", but it doesn't say why you have that >>>>> preference in >>>>> the face of the points about the problems. can you say why >>>>> it is >>>>> that >>>>> you express a preference for the "convention" with problems ? >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> . >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Fri Sep 26 01:09:01 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Fri, 26 Sep 2014 08:09:01 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: , Message-ID: It wasn't as clear as it could have been in the presentation. The FIB stays LPM; it's only content object to interest matching that's exact. Sent from my telephone > On Sep 26, 2014, at 9:43, "Tai-Lin Chu" wrote: > > How will the fib work under exact matching? that implies #routable > interest names = #fib names. > After you mention selectors in name and key hash, I see a huge number > of fib entries to be created. > > I am worried about those dummy nodes.
> >> On Thu, Sep 25, 2014 at 4:17 PM, wrote: >>> On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: >>> >>> In NDN, if a router wants to check the signature, then it can check. If >>> it wants to skip the checking, that's fine too. If the design doesn't >>> allow the router to verify the signature, then that's a problem. In the >>> above description, the cache signs a data packet with a name owned by >>> someone else, it seems problematic for a design to advocate this. >> >> Routers that support the Selector Protocol could check signatures. >> >>> >>> One difference is that here the returned data can satisfy only that >>> interest. In the original selector design, the returned data can satisfy >>> other Interests with the same name but different selectors (e.g., > 50). >> >> Interests with the same selector would be exact matched throughout the >> network at any node. >> >> Interests with different selectors would not match on all routers, but >> they would match just fine on routers that supported the Selector Protocol. >> >> Basically, everything you want works on routers that support the Selector >> Protocol, but routers that don?t want to support it, don?t have to. 
>> >> >> Nacho >> >> >> >>>> >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From christian.tschudin at unibas.ch Fri Sep 26 01:12:45 2014 From: christian.tschudin at unibas.ch (christian.tschudin at unibas.ch) Date: Fri, 26 Sep 2014 10:12:45 +0200 (CEST) Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: On Fri, 26 Sep 2014, Ignacio.Solis at parc.com wrote: > On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: > >> In NDN, if a router wants to check the signature, then it can check. If >> it wants to skip the checking, that's fine too. If the design doesn't >> allow the router to verify the signature, then that's a problem. In the >> above description, the cache signs a data packet with a name owned by >> someone else, it seems problematic for a design to advocate this. > > Routers that support the Selector Protocol could check signatures. > >> >> One difference is that here the returned data can satisfy only that >> interest. 
In the original selector design, the returned data can satisfy >> other Interests with the same name but different selectors (e.g., > 50). > > Interests with the same selector would be exact matched throughout the > network at any node. > > Interests with different selectors would not match on all routers, but > they would match just fine on routers that supported the Selector Protocol. > > Basically, everything you want works on routers that support the Selector > Protocol, but routers that don't want to support it, don't have to. A perhaps more careful statement would be: - interests carry a query, not the name of data (except if you go for the hash). - caching results based on the query (that you call a name) only works if the query is idempotent. I doubt that the selector expressiveness remains in idempotent land. One requirement thus would be that selector-carrying interests can disable cache responses from nodes that don't support selectors. While it is true that you can force this by adding a nonce to each query, it would be cleaner to have explicit signaling. Such a don't-cache flag towards selector-ignoring nodes would be different from the don't-cache flag in the query (that is directed to selector-aware nodes).
christian > > > Nacho > > > >>> >>> >>> >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From tailinchu at gmail.com Fri Sep 26 01:21:06 2014 From: tailinchu at gmail.com (Tai-Lin Chu) Date: Fri, 26 Sep 2014 01:21:06 -0700 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> Message-ID: >B - Yes, you can consider the name prefix is ?owned? by the server, but the answer is actually something that the cache is choosing. The cache is choosing from the set if data that it has. The data that it encapsulates _is_ signed by the producer. Anybody that can decapsulate the data can verify that this is the case. does the producer sign the table of content directly? or producer signs the cache's key, which in turns sign the table of content, so that we can verify? On Fri, Sep 26, 2014 at 12:46 AM, wrote: > On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote: > >>How can a cache respond to /mail/inbox/selector_matching/>payload> with a table of content? This name prefix is owned by the mail >>server. Also the reply really depends on what is in the cache at the >>moment, so the same name would correspond to different data. > > A - Yes, the same name would correspond to different data. 
This is true > given that then data has changed. NDN (and CCN) has no architectural > requirement that a name maps to the same piece of data (Obviously not > talking about self certifying hash-based names). > > B - Yes, you can consider the name prefix is ?owned? by the server, but > the answer is actually something that the cache is choosing. The cache is > choosing from the set if data that it has. The data that it encapsulates > _is_ signed by the producer. Anybody that can decapsulate the data can > verify that this is the case. > > Nacho > > >>On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >> >>> My beating on ?discover all? is exactly because of this. Let?s define >>>discovery service. If the service is just ?discover latest? >>>(left/right), can we not simplify the current approach? If the service >>>includes more than ?latest?, then is the current approach the right >>>approach? >>> >>> Sync has its place and is the right solution for somethings. However, >>>it should not be a a bandage over discovery. Discovery should be its >>>own valid and useful service. >>> >>> I agree that the exclusion approach can work, and work relatively well, >>>for finding the rightmost/leftmost child. I believe this is because >>>that operation is transitive through caches. So, within whatever >>>timeout an application is willing to wait to find the ?latest?, it can >>>keep asking and asking. >>> >>> I do think it would be best to actually try to ask an authoritative >>>source first (i.e. a non-cached value), and if that fails then probe >>>caches, but experimentation may show what works well. This is based on >>>my belief that in the real world in broad use, the namespace will become >>>pretty polluted and probing will result in a lot of junk, but that?s >>>future prognosticating. >>> >>> Also, in the exact match vs. continuation match of content object to >>>interest, it is pretty easy to encode that ?selector? request in a name >>>component (i.e. 
?exclude_before=(t=version, l=2, v=279) & sort=right?) >>>and any participating cache can respond with a link (or encapsulate) a >>>response in an exact match system. >>> >>> In the CCNx 1.0 spec, one could also encode this a different way. One >>>could use a name like ?/mail/inbox/selector_matching/? >>>and in the payload include "exclude_before=(t=version, l=2, v=279) & >>>sort=right?. This means that any cache that could process the ? >>>selector_matching? function could look at the interest payload and >>>evaluate the predicate there. The predicate could become large and not >>>pollute the PIT with all the computation state. Including ?>>payload>? in the name means that one could get a cached response if >>>someone else had asked the same exact question (subject to the content >>>object?s cache lifetime) and it also servers to multiplex different >>>payloads for the same function (selector_matching). >>> >>> Marc >>> >>> >>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote: >>> >>>> >>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>> >>>> >>>>https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/Synchronizatio >>>>nPr >>>> otocol.html >>>> >>>> J. >>>> >>>> >>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>>> >>>>> However, I cannot see whether we can achieve "best-effort *all*-value" >>>>> efficiently. >>>>> There are still interesting topics on >>>>> 1. how do we express the discovery query? >>>>> 2. is selector "discovery-complete"? i. e. can we express any >>>>> discovery query with current selector? >>>>> 3. if so, can we re-express current selector in a more efficient way? >>>>> >>>>> I personally see a named data as a set, which can then be categorized >>>>> into "ordered set", and "unordered set". >>>>> some questions that any discovery expression must solve: >>>>> 1. is this a nil set or not? nil set means that this name is the leaf >>>>> 2. set contains member X? >>>>> 3. is set ordered or not >>>>> 4. 
(ordered) first, prev, next, last >>>>> 5. if we enforce component ordering, answer question 4. >>>>> 6. recursively answer all questions above on any set member >>>>> >>>>> >>>>> >>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>> wrote: >>>>>> >>>>>> >>>>>> From: >>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>> To: Jeff Burke >>>>>> Cc: , , >>>>>> >>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>> >>>>>> I think Tai-Lin?s example was just fine to talk about discovery. >>>>>> /blah/blah/value, how do you discover all the ?value?s? Discovery >>>>>> shouldn?t >>>>>> care if its email messages or temperature readings or world cup >>>>>>photos. >>>>>> >>>>>> >>>>>> This is true if discovery means "finding everything" - in which case, >>>>>> as you >>>>>> point out, sync-style approaches may be best. But I am not sure that >>>>>> this >>>>>> definition is complete. The most pressing example that I can think >>>>>>of >>>>>> is >>>>>> best-effort latest-value, in which the consumer's goal is to get the >>>>>> latest >>>>>> copy the network can deliver at the moment, and may not care about >>>>>> previous >>>>>> values or (if freshness is used well) potential later versions. >>>>>> >>>>>> Another case that seems to work well is video seeking. Let's say I >>>>>> want to >>>>>> enable random access to a video by timecode. The publisher can >>>>>>provide a >>>>>> time-code based discovery namespace that's queried using an Interest >>>>>> that >>>>>> essentially says "give me the closest keyframe to 00:37:03:12", which >>>>>> returns an interest that, via the name, provides the exact timecode >>>>>>of >>>>>> the >>>>>> keyframe in question and a link to a segment-based namespace for >>>>>> efficient >>>>>> exact match playout. In two roundtrips and in a very lightweight >>>>>>way, >>>>>> the >>>>>> consumer has random access capability. 
If the NDN is the moral >>>>>> equivalent >>>>>> of IP, then I am not sure we should be afraid of roundtrips that >>>>>>provide >>>>>> this kind of functionality, just as they are used in TCP. >>>>>> >>>>>> >>>>>> I described one set of problems using the exclusion approach, and >>>>>>that >>>>>> an >>>>>> NDN paper on device discovery described a similar problem, though >>>>>>they >>>>>> did >>>>>> not go into the details of splitting interests, etc. That all was >>>>>> simple >>>>>> enough to see from the example. >>>>>> >>>>>> Another question is how does one do the discovery with exact match >>>>>> names, >>>>>> which is also conflating things. You could do a different discovery >>>>>> with >>>>>> continuation names too, just not the exclude method. >>>>>> >>>>>> As I alluded to, one needs a way to talk with a specific cache about >>>>>>its >>>>>> ?table of contents? for a prefix so one can get a consistent set of >>>>>> results >>>>>> without all the round-trips of exclusions. Actually downloading the >>>>>> ?headers? of the messages would be the same bytes, more or less. In >>>>>>a >>>>>> way, >>>>>> this is a little like name enumeration from a ccnx 0.x repo, but that >>>>>> protocol has its own set of problems and I?m not suggesting to use >>>>>>that >>>>>> directly. >>>>>> >>>>>> One approach is to encode a request in a name component and a >>>>>> participating >>>>>> cache can reply. It replies in such a way that one could continue >>>>>> talking >>>>>> with that cache to get its TOC. One would then issue another >>>>>>interest >>>>>> with >>>>>> a request for not-that-cache. >>>>>> >>>>>> >>>>>> I'm curious how the TOC approach works in a multi-publisher scenario? >>>>>> >>>>>> >>>>>> Another approach is to try to ask the authoritative source for the >>>>>> ?current? >>>>>> manifest name, i.e. /mail/inbox/current/, which could return >>>>>>the >>>>>> manifest or a link to the manifest. 
Then fetching the actual >>>>>>manifest >>>>>> from >>>>>> the link could come from caches because you how have a consistent >>>>>>set of >>>>>> names to ask for. If you cannot talk with an authoritative source, >>>>>>you >>>>>> could try again without the nonce and see if there?s a cached copy >>>>>>of a >>>>>> recent version around. >>>>>> >>>>>> Marc >>>>>> >>>>>> >>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>>wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>> >>>>>> wrote: >>>>>> >>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>>> >>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>>>> pattern with static (/mail/inbox) and variable (148) components; with >>>>>> proper naming convention, computers can also detect this pattern >>>>>> easily. Now I want to look for all mails in my inbox. I can generate >>>>>>a >>>>>> list of /mail/inbox/. These are my guesses, and with >>>>>>selectors >>>>>> I can further refine my guesses. >>>>>> >>>>>> >>>>>> I think this is a very bad example (or at least a very bad >>>>>>application >>>>>> design). You have an app (a mail server / inbox) and you want it to >>>>>> list >>>>>> your emails? An email list is an application data structure. I >>>>>>don?t >>>>>> think you should use the network structure to reflect this. >>>>>> >>>>>> >>>>>> I think Tai-Lin is trying to sketch a small example, not propose a >>>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>>> >>>>>> >>>>>> Another way to look at it is that if the network architecture is >>>>>> providing >>>>>> the equivalent of distributed storage to the application, perhaps the >>>>>> application data structure could be adapted to match the affordances >>>>>>of >>>>>> the network. Then it would not be so bad that the two structures >>>>>>were >>>>>> aligned. >>>>>> >>>>>> >>>>>> I?ll give you an example, how do you delete emails from your inbox? 
>>>>>>If >>>>>> an >>>>>> email was cached in the network it can never be deleted from your >>>>>>inbox? >>>>>> >>>>>> >>>>>> This is conflating two issues - what you are pointing out is that the >>>>>> data >>>>>> structure of a linear list doesn't handle common email management >>>>>> operations well. Again, I'm not sure if that's what he was getting >>>>>>at >>>>>> here. But deletion is not the issue - the availability of a data >>>>>>object >>>>>> on the network does not necessarily mean it's valid from the >>>>>>perspective >>>>>> of the application. >>>>>> >>>>>> Or moved to another mailbox? Do you rely on the emails expiring? >>>>>> >>>>>> This problem is true for most (any?) situations where you use network >>>>>> name >>>>>> structure to directly reflect the application data structure. >>>>>> >>>>>> >>>>>> Not sure I understand how you make the leap from the example to the >>>>>> general statement. >>>>>> >>>>>> Jeff >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Nacho >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>> >>>>>> Ok, yes I think those would all be good things. >>>>>> >>>>>> One thing to keep in mind, especially with things like time series >>>>>> sensor >>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>>That?s >>>>>> easy >>>>>> for a human :) But in Discovery, one should assume that one does not >>>>>> know >>>>>> of patterns in the data beyond what the protocols used to publish the >>>>>> data >>>>>> explicitly require. That said, I think some of the things you listed >>>>>> are >>>>>> good places to start: sensor data, web content, climate data or >>>>>>genome >>>>>> data. >>>>>> >>>>>> We also need to state what the forwarding strategies are and what the >>>>>> cache >>>>>> behavior is. >>>>>> >>>>>> I outlined some of the points that I think are important in that >>>>>>other >>>>>> posting. While ?discover latest? is useful, ?discover all? is also >>>>>> important, and that one gets complicated fast. 
So points like >>>>>> separating >>>>>> discovery from retrieval and working with large data sets have been >>>>>> important in shaping our thinking. That all said, I?d be happy >>>>>> starting >>>>>> from 0 and working through the Discovery service definition from >>>>>> scratch >>>>>> along with data set use cases. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>> wrote: >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> Thanks ? yes, I saw that as well. I was just trying to get one step >>>>>> more >>>>>> specific, which was to see if we could identify a few specific use >>>>>> cases >>>>>> around which to have the conversation. (e.g., time series sensor >>>>>>data >>>>>> and >>>>>> web content retrieval for "get latest"; climate data for huge data >>>>>> sets; >>>>>> local data in a vehicular network; etc.) What have you been looking >>>>>>at >>>>>> that's driving considerations of discovery? >>>>>> >>>>>> Thanks, >>>>>> Jeff >>>>>> >>>>>> From: >>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>> To: Jeff Burke >>>>>> Cc: , >>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>> >>>>>> Jeff, >>>>>> >>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>> Discovery. >>>>>> >>>>>> >>>>>> >>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000 >>>>>>20 >>>>>> 0 >>>>>> .html >>>>>> >>>>>> I think it would be very productive to talk about what Discovery >>>>>>should >>>>>> do, >>>>>> and not focus on the how. It is sometimes easy to get caught up in >>>>>>the >>>>>> how, >>>>>> which I think is a less important topic than the what at this stage. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>> wrote: >>>>>> >>>>>> Marc, >>>>>> >>>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>>> based >>>>>> on use cases. What are the use cases you are using to evaluate >>>>>> discovery? 
>>>>>> >>>>>> Jeff >>>>>> >>>>>> >>>>>> >>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>> wrote: >>>>>> >>>>>> No matter what the expressiveness of the predicates if the forwarder >>>>>> can >>>>>> send interests different ways you don't have a consistent underlying >>>>>> set >>>>>> to talk about so you would always need non-range exclusions to >>>>>>discover >>>>>> every version. >>>>>> >>>>>> Range exclusions only work I believe if you get an authoritative >>>>>> answer. >>>>>> If different content pieces are scattered between different caches I >>>>>> don't see how range exclusions would work to discover every version. >>>>>> >>>>>> I'm sorry to be pointing out problems without offering solutions but >>>>>> we're not ready to publish our discovery protocols. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>>> >>>>>> I see. Can you briefly describe how ccnx discovery protocol solves >>>>>>the >>>>>> all problems that you mentioned (not just exclude)? a doc will be >>>>>> better. >>>>>> >>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will >>>>>>soon >>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>>Regular >>>>>> language or context free language might become part of selector too. >>>>>> >>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>> That will get you one reading then you need to exclude it and ask >>>>>> again. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. >>>>>> >>>>>> >>>>>> I am very confused. For your example, if I want to get all today's >>>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>>> tomorrow..Any). 
That's 18 bytes.
>>>>>>
>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>>>
>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>>>
>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> If you talk sometimes to A and sometimes to B, you very easily
>>>>>> could miss content objects you want to discover unless you avoid
>>>>>> all range exclusions and only exclude explicit versions.
>>>>>>
>>>>>> Could you explain why the missing content object situation happens? also
>>>>>> range exclusion is just a shorter notation for many explicit excludes;
>>>>>> converting from explicit excludes to a ranged exclude is always
>>>>>> possible.
>>>>>>
>>>>>> Yes, my point was that if you cannot talk about a consistent set
>>>>>> with a particular cache, then you need to always use individual
>>>>>> excludes, not range excludes, if you want to discover all the versions
>>>>>> of an object. For something like a sensor reading that is updated,
>>>>>> say, once per second you will have 86,400 of them per day. If each
>>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of
>>>>>> exclusions (plus encoding overhead) per day.
>>>>>>
>>>>>> yes, maybe using a more deterministic version number than a
>>>>>> timestamp makes sense here, but it's just an example of needing a lot
>>>>>> of exclusions.
>>>>>>
>>>>>> You exclude through 100 then issue a new interest. This goes to
>>>>>> cache B
>>>>>>
>>>>>> I feel this case is invalid because cache A will also get the
>>>>>> interest, and cache A will return v101 if it exists. Like you said, if
>>>>>> this goes to cache B only, it means that cache A dies. How do you know
>>>>>> that v101 even exists?
>>>>>>
>>>>>> I guess this depends on what the forwarding strategy is. If the
>>>>>> forwarder will always send each interest to all replicas, then yes,
>>>>>> modulo packet loss, you would discover v101 on cache A.
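The back-of-envelope arithmetic in the exchange above (86,400 per-second readings, 8-byte timestamp exclusions, vs. Tai-Lin's two-sided range exclude) can be checked in a few lines. The breakdown of the 18-byte figure into two bounds plus two Any markers is an assumption of this sketch, and TLV encoding overhead is ignored, as in Marc's example.

```python
# Back-of-envelope sketch of the exclusion-size argument above.
# Assumes 8-byte timestamp components; ignores TLV encoding overhead.
READINGS_PER_DAY = 24 * 60 * 60          # one sensor reading per second
TIMESTAMP_BYTES = 8

# Excluding every version individually:
explicit_excludes = READINGS_PER_DAY * TIMESTAMP_BYTES

# Tai-Lin's (Any..a)(b..Any) form: assumed as two bounds plus two Any markers.
range_exclude = 2 * TIMESTAMP_BYTES + 2

print(explicit_excludes)  # 691200 bytes/day, matching the figure above
print(range_exclude)      # 18 bytes
```

The three-orders-of-magnitude gap is why the thread keeps returning to whether range excludes can be used safely against inconsistent caches.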
If the >>>>>> forwarder is just doing ?best path? and can round-robin between cache >>>>>> A and cache B, then your application could miss v101. >>>>>> >>>>>> >>>>>> >>>>>> c,d In general I agree that LPM performance is related to the number >>>>>> of components. In my own thread-safe LMP implementation, I used only >>>>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>>>> every node will be faster or not because of lock overhead. >>>>>> >>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>> (ndn >>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>> >>>>>> >>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>> specs for doing the exact match discovery. So, as I said, I?m not >>>>>> ready to claim its better yet because we have not done that. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>> I would point out that using LPM on content object to Interest >>>>>> matching to do discovery has its own set of problems. Discovery >>>>>> involves more than just ?latest version? discovery too. >>>>>> >>>>>> This is probably getting off-topic from the original post about >>>>>> naming conventions. >>>>>> >>>>>> a. If Interests can be forwarded multiple directions and two >>>>>> different caches are responding, the exclusion set you build up >>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>> content objects you want to discovery unless you avoid all range >>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>> explicit discovery protocol that allows conversations about >>>>>> consistent sets is better. >>>>>> >>>>>> b. Yes, if you just want the ?latest version? discovery that >>>>>> should be transitive between caches, but imagine this. 
You send
>>>>>> Interest #1 to cache A which returns version 100. You exclude
>>>>>> through 100 then issue a new interest. This goes to cache B who
>>>>>> only has version 99, so the interest times out or is NACK'd. So
>>>>>> you think you have it! But, cache A already has version 101, you
>>>>>> just don't know. If you cannot have a conversation around
>>>>>> consistent sets, it seems like even doing latest version discovery
>>>>>> is difficult with selector based discovery. From what I saw in
>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the
>>>>>> authoritative source because you can never believe an intermediate
>>>>>> cache that there's not something more recent.
>>>>>>
>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be
>>>>>> interested in seeing your analysis. Case (a) is that a node can
>>>>>> correctly discover every version of a name prefix, and (b) is that
>>>>>> a node can correctly discover the latest version. We have not
>>>>>> formally compared (or yet published) our discovery protocols (we
>>>>>> have three, 2 for content, 1 for device) compared to selector based
>>>>>> discovery, so I cannot yet claim they are better, but they do not
>>>>>> have the non-determinism sketched above.
>>>>>>
>>>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>>>> must do in the PIT to match a content object. If you have a name
>>>>>> tree or a threaded hash table, those don't all need to be hash
>>>>>> lookups, but you need to walk up the name tree for every prefix of
>>>>>> the content object name and evaluate the selector predicate.
>>>>>> Content Based Networking (CBN) had some methods to create data
>>>>>> structures based on predicates, maybe those would be better. But
>>>>>> in any case, you will potentially need to retrieve many PIT entries
>>>>>> if there is Interest traffic for many prefixes of a root.
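Marc's scenario (b) can be sketched as a toy simulation. This is an illustration only: caches are modeled as sets of version numbers, and the assumption that a cache may answer with any matching version (here, the lowest) stands in for real selector processing, which Marc's example also relies on when cache A returns v100 despite holding v101.

```python
# Toy model of case (b): round-robin "latest version" discovery between
# two caches. Names and selectors are simplified stand-ins, not real NDN.
cache_a = {100, 101}   # cache A also holds v101
cache_b = {99}

def respond(cache, excluded_through):
    """A cache may return any matching version; modeled here as the lowest."""
    candidates = [v for v in cache if v > excluded_through]
    return min(candidates) if candidates else None  # None models timeout/NACK

# Round-robin strategy: the first interest goes to A, the follow-up to B.
first = respond(cache_a, 0)          # cache A answers with v100
followup = respond(cache_b, first)   # cache B has only v99: no answer
# The consumer concludes v100 is the latest, never learning that A holds v101.
print(first, followup)
```

With a forwarding strategy that floods all replicas the follow-up would also reach A and find v101, which is exactly the strategy-dependence both sides of the thread point out.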
Even on
>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a
>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive
>>>>>> implementation only requires at most 3 lookups (one by name, one by
>>>>>> name + keyid, one by name + content object hash), and one can do
>>>>>> other things to optimize lookup for an extra write.
>>>>>>
>>>>>> d. In (c) above, if you have a threaded name tree or are just
>>>>>> walking parent pointers, I suspect you'll need locking of the
>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP)
>>>>>> and that will be expensive. It would be interesting to see what a
>>>>>> cache consistent multi-threaded name tree looks like.
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> I had thought about these questions, but I want to know your idea
>>>>>> besides typed component:
>>>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>>>> things?
>>>>>> 2. will removing selectors improve performance? How do we use other
>>>>>> faster techniques to replace selectors?
>>>>>> 3. fixed byte length and type. I agree more that type can be fixed
>>>>>> byte, but 2 bytes for length might not be enough for the future.
>>>>>>
>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>
>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> I know how to make #2 flexible enough to do what things I can
>>>>>> envision we need to do, and with a few simple conventions on
>>>>>> how the registry of types is managed.
>>>>>>
>>>>>> Could you share it with us?
>>>>>>
>>>>>> Sure. Here's a strawman.
>>>>>>
>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>
>>>>>> The type space is currently shared with the types used for the
>>>>>> entire protocol, that gives us two options:
>>>>>> (1) we reserve a range for name component types.
Given the
>>>>>> likelihood there will be at least as much and probably more need
>>>>>> for component types than protocol extensions, we could reserve 1/2
>>>>>> of the type space, giving us 32K types for name components.
>>>>>> (2) since there is no parsing ambiguity between name components
>>>>>> and other fields of the protocol (since they are sub-types of the
>>>>>> name type) we could reuse numbers and thereby have an entire 65K
>>>>>> name component types.
>>>>>>
>>>>>> We divide the type space into regions, and manage it with a
>>>>>> registry. If we ever get to the point of creating an IETF
>>>>>> standard, IANA has 25 years of experience running registries and
>>>>>> there are well-understood rule sets for different kinds of
>>>>>> registries (open, requires a written spec, requires standards
>>>>>> approval).
>>>>>>
>>>>>> - We allocate one "default" name component type for "generic
>>>>>> name", which would be used on name prefixes and other common
>>>>>> cases where there are no special semantics on the name component.
>>>>>> - We allocate a range of name component types, say 1024, to
>>>>>> globally understood types that are part of the base or extension
>>>>>> NDN specifications (e.g. chunk#, version#, etc.)
>>>>>> - We reserve some portion of the space for unanticipated uses
>>>>>> (say another 1024 types)
>>>>>> - We give the rest of the space to application assignment.
>>>>>>
>>>>>> Make sense?
>>>>>>
>>>>>> While I'm sympathetic to that view, there are three ways in
>>>>>> which Moore's law or hardware tricks will not save us from
>>>>>> performance flaws in the design
>>>>>>
>>>>>> we could design for performance,
>>>>>>
>>>>>> That's not what people are advocating. We are advocating that we
>>>>>> *not* design for known bad performance and hope serendipity or
>>>>>> Moore's Law will come to the rescue.
>>>>>>
>>>>>> but I think there will be a turning
>>>>>> point when the slower design starts to become "fast enough".
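Dave Oran's strawman above, under option (2) where the full 16-bit space is reused for name component types, can be written down as a partition table. The exact boundary values below are assumptions for this sketch; the strawman fixes only the approximate sizes (one generic type, ~1024 global types, ~1024 reserved, the remainder for applications).

```python
# Sketch of the strawman registry: a partition of the 16-bit component
# type space. Boundary values are illustrative assumptions only.
TYPE_SPACE = 1 << 16          # 65,536 name component types

REGISTRY = {
    "generic":     range(0, 1),                 # the single "generic name" type
    "global":      range(1, 1 + 1024),          # chunk#, version#, etc.
    "reserved":    range(1025, 1025 + 1024),    # unanticipated uses
    "application": range(2049, TYPE_SPACE),     # application assignment
}

def classify(t):
    for region, r in REGISTRY.items():
        if t in r:
            return region
    raise ValueError("type outside the 16-bit space")

# The regions tile the space exactly.
assert sum(len(r) for r in REGISTRY.values()) == TYPE_SPACE
print(classify(0), classify(17), classify(40000))
```

Under option (1), the same table would simply be built over `range(0, 1 << 15)` instead, leaving the upper half for protocol extensions.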
>>>>>>
>>>>>> Perhaps, perhaps not. Relative performance is what matters, so
>>>>>> things that don't get faster while others do tend to get dropped
>>>>>> or not used because they impose a performance penalty relative to
>>>>>> the things that go faster. There is also the "low-end" phenomenon
>>>>>> where improvements in technology get applied to lowering cost
>>>>>> rather than improving performance. For those environments bad
>>>>>> performance just never gets better.
>>>>>>
>>>>>> Do you
>>>>>> think there will be some design of ndn that will *never* have
>>>>>> performance improvement?
>>>>>>
>>>>>> I suspect LPM on data will always be slow (relative to the other
>>>>>> functions).
>>>>>> I suspect exclusions will always be slow because they will
>>>>>> require extra memory references.
>>>>>>
>>>>>> However I of course don't claim clairvoyance, so this is just
>>>>>> speculation based on 35+ years of seeing performance improve by 4
>>>>>> orders of magnitude and still having to worry about counting
>>>>>> cycles and memory references...
>>>>>>
>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>>>
>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> We should not look at a certain chip nowadays and want ndn to
>>>>>> perform well on it. It should be the other way around: once ndn
>>>>>> apps become popular, a better chip will be designed for ndn.
>>>>>>
>>>>>> While I'm sympathetic to that view, there are three ways in
>>>>>> which Moore's law or hardware tricks will not save us from
>>>>>> performance flaws in the design:
>>>>>> a) clock rates are not getting (much) faster
>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>> c) data structures that require locks to manipulate
>>>>>> successfully will be relatively more expensive, even with
>>>>>> near-zero lock contention.
>>>>>>
>>>>>> The fact is, IP *did* have some serious performance flaws in
>>>>>> its design.
We just forgot those because the design elements
>>>>>> that depended on those mistakes have fallen into disuse. The
>>>>>> poster children for this are:
>>>>>> 1. IP options. Nobody can use them because they are too slow
>>>>>> on modern forwarding hardware, so they can't be reliably used
>>>>>> anywhere
>>>>>> 2. the UDP checksum, which was a bad design when it was
>>>>>> specified and is now a giant PITA that still causes major pain
>>>>>> in working around.
>>>>>>
>>>>>> I'm afraid students today are being taught that the designers
>>>>>> of IP were flawless, as opposed to very good scientists and
>>>>>> engineers that got most of it right.
>>>>>>
>>>>>> I feel the discussion today and yesterday has been off-topic.
>>>>>> Now I see that there are 3 approaches:
>>>>>> 1. we should not define a naming convention at all
>>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>>> 3. marked component: introduce only one more type and add
>>>>>> additional marker space
>>>>>>
>>>>>> I know how to make #2 flexible enough to do what things I can
>>>>>> envision we need to do, and with a few simple conventions on
>>>>>> how the registry of types is managed.
>>>>>>
>>>>>> It is just as powerful in practice as either throwing up our
>>>>>> hands and letting applications design their own mutually
>>>>>> incompatible schemes or trying to make naming conventions with
>>>>>> markers in a way that is fast to generate/parse and also
>>>>>> resilient against aliasing.
>>>>>>
>>>>>> Also everybody thinks that the current utf8 marker naming
>>>>>> convention needs to be revised.
>>>>>>
>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>> Would that chip be suitable, i.e. can we expect most names
>>>>>> to fit in (the magnitude of) 96 bytes? What length are names
>>>>>> usually in current NDN experiments?
>>>>>>
>>>>>> I guess wide deployment could make for even longer names.
>>>>>> Related: Many URLs >>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>> text lines, and >>>>>> NDN will have to carry more information than URLs, as far as >>>>>> I see. >>>>>> >>>>>> >>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>> >>>>>> In fact, the index in separate TLV will be slower on some >>>>>> architectures, >>>>>> like the ezChip NP4. The NP4 can hold the fist 96 frame >>>>>> bytes in memory, >>>>>> then any subsequent memory is accessed only as two adjacent >>>>>> 32-byte blocks >>>>>> (there can be at most 5 blocks available at any one time). >>>>>> If you need to >>>>>> switch between arrays, it would be very expensive. If you >>>>>> have to read past >>>>>> the name to get to the 2nd array, then read it, then backup >>>>>> to get to the >>>>>> name, it will be pretty expensive too. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>> wrote: >>>>>> >>>>>> Does this make that much difference? >>>>>> >>>>>> If you want to parse the first 5 components. One way to do >>>>>> it is: >>>>>> >>>>>> Read the index, find entry 5, then read in that many bytes >>>>>> from the start >>>>>> offset of the beginning of the name. >>>>>> OR >>>>>> Start reading name, (find size + move ) 5 times. >>>>>> >>>>>> How much speed are you getting from one to the other? You >>>>>> seem to imply >>>>>> that the first one is faster. I don?t think this is the >>>>>> case. >>>>>> >>>>>> In the first one you?ll probably have to get the cache line >>>>>> for the index, >>>>>> then all the required cache lines for the first 5 >>>>>> components. For the >>>>>> second, you?ll have to get all the cache lines for the first >>>>>> 5 components. >>>>>> Given an assumption that a cache miss is way more expensive >>>>>> than >>>>>> evaluating a number and computing an addition, you might >>>>>> find that the >>>>>> performance of the index is actually slower than the >>>>>> performance of the >>>>>> direct access. 
>>>>>> >>>>>> Granted, there is a case where you don?t access the name at >>>>>> all, for >>>>>> example, if you just get the offsets and then send the >>>>>> offsets as >>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>> you may see a >>>>>> gain IF there are more cache line misses in reading the name >>>>>> than in >>>>>> reading the index. So, if the regular part of the name >>>>>> that you?re >>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>> name is to be >>>>>> processed by a different processor, then your might see some >>>>>> performance >>>>>> gain in using the index, but in all other circumstances I >>>>>> bet this is not >>>>>> the case. I may be wrong, haven?t actually tested it. >>>>>> >>>>>> This is all to say, I don?t think we should be designing the >>>>>> protocol with >>>>>> only one architecture in mind. (The architecture of sending >>>>>> the name to a >>>>>> different processor than the index). >>>>>> >>>>>> If you have numbers that show that the index is faster I >>>>>> would like to see >>>>>> under what conditions and architectural assumptions. >>>>>> >>>>>> Nacho >>>>>> >>>>>> (I may have misinterpreted your description so feel free to >>>>>> correct me if >>>>>> I?m wrong.) >>>>>> >>>>>> >>>>>> -- >>>>>> Nacho (Ignacio) Solis >>>>>> Protocol Architect >>>>>> Principal Scientist >>>>>> Palo Alto Research Center (PARC) >>>>>> +1(650)812-4458 >>>>>> Ignacio.Solis at parc.com >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>> >>>>>> wrote: >>>>>> >>>>>> Indeed each components' offset must be encoded using a fixed >>>>>> amount of >>>>>> bytes: >>>>>> >>>>>> i.e., >>>>>> Type = Offsets >>>>>> Length = 10 Bytes >>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>> >>>>>> You may also imagine to have a "Offset_2byte" type if your >>>>>> name is too >>>>>> long. 
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>
>>>>>> if you do not need the entire hierarchical structure (suppose
>>>>>> you only want the first x components) you can directly have it
>>>>>> using the offsets. With the Nested TLV structure you have to
>>>>>> iteratively parse the first x-1 components. With the offset
>>>>>> structure you can directly access the first x components.
>>>>>>
>>>>>> I don't get it. What you described only works if the "offset" is
>>>>>> encoded in fixed bytes. With varNum, you will still need to
>>>>>> parse x-1 offsets to get to the x-th offset.
>>>>>>
>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>
>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>
>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the
>>>>>> existing NDN UTF8 'convention'." I'm still not sure I understand
>>>>>> what you _do_ prefer, though. it sounds like you're describing an
>>>>>> entirely different scheme where the info that describes the
>>>>>> name-components is ... someplace other than _in_ the
>>>>>> name-components. is that correct? when you say "field separator",
>>>>>> what do you mean (since that's not a "TL" from a TLV)?
>>>>>>
>>>>>> Correct.
>>>>>> In particular, with our name encoding, a TLV indicates the name
>>>>>> hierarchy with offsets in the name and other TLV(s) indicate the
>>>>>> offset to use in order to retrieve special components.
>>>>>> As for the field separator, it is something like "/". Aliasing is
>>>>>> avoided as you do not rely on field separators to parse the name;
>>>>>> you use the "offset TLV" to do that.
>>>>>>
>>>>>> So now, it may be an aesthetic question but:
>>>>>>
>>>>>> if you do not need the entire hierarchical structure (suppose
>>>>>> you only want the first x components) you can directly have it
>>>>>> using the offsets. With the Nested TLV structure you have to
>>>>>> iteratively parse the first x-1 components. With the offset
>>>>>> structure you can directly access the first x components.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> -- Mark
>>>>>>
>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>
>>>>>> The why is simple:
>>>>>>
>>>>>> You use a lot of "generic component type" and very few "specific
>>>>>> component type". You are imposing types for every component in
>>>>>> order to handle few exceptions (segmentation, etc..). You create
>>>>>> a rule (specify the component's type) to handle exceptions!
>>>>>>
>>>>>> I would prefer not to have typed components. Instead I would
>>>>>> prefer to have the name as a simple sequence of bytes with a field
>>>>>> separator. Then, outside the name, if you have some components
>>>>>> that could be used at network layer (e.g. a TLV field), you simply
>>>>>> need something that indicates which is the offset allowing you to
>>>>>> retrieve the version, segment, etc in the name...
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>
>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>
>>>>>> I think we agree on the small number of "component types".
>>>>>> However, if you have a small number of types, you will end up with
>>>>>> names containing many generic component types and few specific
>>>>>> component types.
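The two parsing strategies being compared in this sub-thread, iteratively walking nested TLVs versus jumping through a fixed-width offset table, can be sketched side by side. The 1-byte type / 1-byte length encoding below is a simplification for illustration, not the actual NDN or CCNx wire format.

```python
# Sketch of the two name-parsing strategies discussed above, over a
# simplified TLV encoding (1-byte type, 1-byte length per component).
def parse_sequential(buf, x):
    """Walk nested TLVs: (find size + move) x times."""
    comps, pos = [], 0
    for _ in range(x):
        length = buf[pos + 1]                    # buf[pos] is the type
        comps.append(bytes(buf[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return comps

def parse_with_index(buf, offsets, x):
    """Fixed-width offset table: jump straight to each component."""
    return [bytes(buf[offsets[i] + 2:offsets[i + 1]]) for i in range(x)]

# A name /a/bb/ccc encoded as three TLVs of type 1.
name = bytes([1, 1, ord('a'), 1, 2, ord('b'), ord('b'), 1, 3]) + b'ccc'
offsets = [0, 3, 7, 12]   # start of each TLV, plus the total length

print(parse_sequential(name, 2))           # [b'a', b'bb']
print(parse_with_index(name, offsets, 2))  # [b'a', b'bb']
```

Both return the same components; Nacho's point is that which one is faster depends on cache-line behavior and where the index lives, not on the count of arithmetic operations, and Tai-Lin's point is that the index only gives random access if the offsets themselves are fixed-width.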
Due to the fact that the component type specification >>>>>> is an >>>>>> exception in the name, I would prefer something that specify >>>>>> component's >>>>>> type only when needed (something like UTF8 conventions but >>>>>> that >>>>>> applications MUST use). >>>>>> >>>>>> so ... I can't quite follow that. the thread has had some >>>>>> explanation >>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>> e.g.) >>>>>> and >>>>>> there's been email trying to explain that applications don't >>>>>> have to >>>>>> use types if they don't need to. your email sounds like "I >>>>>> prefer >>>>>> the >>>>>> UTF8 convention", but it doesn't say why you have that >>>>>> preference in >>>>>> the face of the points about the problems. can you say why >>>>>> it is >>>>>> that >>>>>> you express a preference for the "convention" with problems ? >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> . >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>> >>>>>> >>>>>> 
_______________________________________________
Ndn-interest mailing list
Ndn-interest at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From Ignacio.Solis at parc.com Fri Sep 26 01:26:29 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Fri, 26 Sep 2014 08:26:29 +0000
Subject: [Ndn-interest] any
comments on naming convention? In-Reply-To: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> References: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> Message-ID: On 9/25/14, 10:01 PM, "Lan Wang (lanwang)" wrote: >On Sep 25, 2014, at 2:45 AM, Ignacio.Solis at parc.com wrote: >> On 9/24/14, 6:30 PM, "Burke, Jeff" wrote: >> >>> Ok, so sync-style approaches may work better for this example as Marc >>> already pointed out, but nonetheless... (Marc, I am catching up on >>>emails >>> and will respond to that shortly.) >> >> Sync can be done without selectors. :-) > >How? I remember Alex or Yingdi mentioned that chronosync needs selectors >and/or at least longest prefix matching (not exact matching). Last time I checked, from a simplistic perspective, chronosync ends up doing exact match for most of the data. Maybe I missed something? Let me do a simple recap form the paper (please correct me if I read this wrong): 1: "every party keeps an outstanding sync interest with the current state digest? So, you issue an interest for /prefix/ In steady state, nothing happens. 2: "As soon as some party generates new data, the state digest changes, and the outstanding interest gets satisfied" So, some party has new data. That means they have a new root. So you answer the outstanding interest to tell the participants there is new data. This data answer can have the same name. 3: "Whoever receives the sync data updates the digest tree to reflect the new change to the dataset state, and sends out a new sync interest with the updated state digest? 4: "Meanwhile, the users may send interests to request for Alice?s text message using the data name directly." Exact match all the way. Now, the paper describes a case they solve with excludes: Alice: State1 Bob: State1 Ted: State1 Alice an bob produce content: Alice: State1->State1a Bob: State1->State1b Ted: State1 Bob?s update reaches Ted. But it doesn?t reach Alice. Alice?s update doesn?t reach anybody. 
Alice: State1->State1a Bob: State1->State1b Ted: State1->State1b It?s unclear to me what happens at this time. The paper seems to imply that the protocol is now stuck if it the parties don?t ask for the previous state and get a different answer. (See discussion bellow). I would have assumed that this is situation is not fatal to the protocol. If you see an interest for something you don?t know, you ask about it. So, if Alice sees an interest for State1b and it doesn?t recognize it, it should realize that they have data that it doesn?t have and it should ask about what data that is. This can be done with exact matching. True, it may be that the cached copy of the (State1->State1b) message won?t be around, but this wouldn?t affect the system drastically. We can talk about this over email but I?m sure selectors are not needed to sync states at this point in the protocol. Nacho Discussion: The paper states: "When the wait time Tw times out, Ted proceeds to send a sync interest with the previous state digest again" Does this mean that it can only sync back up if there is only one state difference? In the situation of: Alice: State1->State1b->State1c->State1d Bob: State1->State1e->State1f->State1g Do Alice and Bob try to sync by issuing interests for the previous states? State1c and State1f? I would assume not. I would also assume that it doesn?t need to send interests for State1b and State1e. Otherwise any partition would make the system get out of sync. Given 2 (or n) parties with large differences in state, can Chronosync recover? If so, then it would seem that excludes are not needed. Assumption: In a system with 2 nodes, those nodes can sync. Say you have 4 nodes that need to sync. 
You would have a situation where:

Node1: State-A
Node2: State-B
Node3: State-C
Node4: State-D

Nodes 1 and 2 will sync (given that we're assuming that any 2 nodes can
sync), producing the following:

Node1: State-AB
Node2: State-AB
Node3: State-C
Node4: State-D

Next round, 2 nodes would merge their state, and the following time 2
nodes would sync. Effectively syncing everybody.

>
>>
>>>> ...
>>>> A20- You publish /username/mailbox/list/20 (lifetime of 1 second)
>>>
>>> This isn't 20 steps. First, no data leaves the publisher without an
>>> Interest. Second, it's more like one API call: make this list available
>>> as versioned Data with a minimum allowable time between responses of
>>> one second. No matter how many requests outstanding, a stable load on
>>> the source.
>>
>> Agreed. I wasn't counting this as overhead.
>>
>>>> B- You request /username/mailbox/list
>>>>
>>>> C- You receive /username/mailbox/list/20 (lifetime of 1 second)
>>>
>>> At this point, you decide if list v20 is sufficient for your purposes.
>>> Perhaps it is.
>>
>> This is true in the non-selector case as well.
>>
>>> Some thoughts:
>>>
>>> - In Scheme B, if the list has not changed, you still get a response,
>>> because the publisher has no way to know anything about the consumer's
>>> knowledge. In Scheme A, publishers have that knowledge from the
>>> exclusion and need not reply.
>>
>> This is true without selectors.
>>
>>> If NACKs are used as heartbeats, they can be returned
>>> more slowly... say every 3-10 seconds. So, many data packets are
>>> potentially saved. Hopefully we don't get one email per second... :)
>>
>> I'm not sure what you mean by this. I wouldn't recommend relying on
>> not-answering for valid requests. Otherwise you start relying on
>> timeouts.
>>
>>> - Benefit seems apparent in multi-consumer scenarios, even without
>>> sync. Let's say I have 5 personal devices requesting mail. In Scheme B,
>>> every publisher receives and processes 5 interests per second on
>>> average. In Scheme A, with an upstream caching node, each receives 1
>>> per second maximum. The publisher still has to throttle requests, but
>>> with no help or scaling support from the network.
>>
>> This can be done without selectors. As long as all the clients produce
>> a request for the same name they can take advantage of caching.
>
>What Jeff responded to is that scheme B requires a freshness of 0 for the
>initial interest to get to the producer (in order to get the latest list
>of email names). If freshness is 0, then there's no caching of the data.
>No matter how the clients name their Interests, they can't take advantage
>of caching.
>
>Lan
>>
>> Nacho
>>
>>>> In Scheme A you sent 2 interests, received 2 objects, going all the
>>>> way to source.
>>>> In Scheme B you sent 1 interest, received 1 object, going all the way
>>>> to source.
>>>>
>>>> Scheme B is always better (doesn't need to do C, D) for this example
>>>> and it uses exact matching.
>>>
>>> It's better if your metric is roundtrips and you don't care about load
>>> on the publisher, lower traffic in times of no new data, etc. But if
>>> you don't, then you can certainly implement Scheme B on NDN, too.
>>>
>>> Jeff
>>>
>>>> You can play tricks with the lifetime of the object in both cases,
>>>> selectors or not.
>>>>
>>>>> - meanwhile, the email client can retrieve the emails using the names
>>>>> obtained in these lists. Some emails may turn out to be unnecessary,
>>>>> so they will be discarded when a more recent list comes. The email
>>>>> client can also keep state about the names of the emails it has
>>>>> deleted to minimize this problem.
>>>>
>>>> This is independent of selectors / exact matching.
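Nacho's pairwise-merge argument above (assume any 2 nodes can sync; then
repeated pairwise syncing converges everybody) can be checked with a toy
simulation. This is a minimal sketch, not ChronoSync itself: the set-union
merge simply stands in for the assumed 2-node sync primitive, and the node
states are arbitrary labels.

```python
# Toy model: each node's "state" is a set of known updates.
# Assumption: any two nodes, when paired, can fully merge their
# states (the 2-node sync primitive from the discussion).

def sync_pair(a, b):
    """Merge two nodes' states via set union (the 2-node primitive)."""
    merged = a | b
    return merged, set(merged)

def rounds_to_converge(states):
    """Repeatedly sync node pairs until all states are equal."""
    rounds = 0
    while len({frozenset(s) for s in states}) > 1:
        # Pair up nodes (0,1), (2,3), ... and sync each pair.
        for i in range(0, len(states) - 1, 2):
            states[i], states[i + 1] = sync_pair(states[i], states[i + 1])
        # Rotate so different nodes get paired next round.
        states.append(states.pop(0))
        rounds += 1
    return rounds

# Four nodes, each starting with a distinct update (State-A .. State-D),
# as in the example above.
nodes = [{"A"}, {"B"}, {"C"}, {"D"}]
print(rounds_to_converge(nodes))  # prints 2
```

With four nodes the first round produces two merged pairs (AB, AB, CD, CD)
and the second round merges across the pairs, matching the "next round, 2
nodes would merge their state" description.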
>>>>
>>>> Nacho
>>>>
>>>>>
>>>>> On Sep 24, 2014, at 2:37 AM, Marc.Mosko at parc.com wrote:
>>>>>
>>>>>> Ok, let's take that example and run with it a bit. I'll walk through
>>>>>> a "discover all" example. This example leads me to why I say
>>>>>> discovery should be separate from data retrieval. I don't claim that
>>>>>> we have a final solution to this problem; I think in a distributed
>>>>>> peer-to-peer environment solving this problem is difficult. If you
>>>>>> have a counter example as to how this discovery could progress using
>>>>>> only the information known a priori by the requester, I would be
>>>>>> interested in seeing that example worked out. Please do correct me
>>>>>> if you think this is wrong.
>>>>>>
>>>>>> You have mails that were originally numbered 0 - 10000, sequentially
>>>>>> by the server.
>>>>>>
>>>>>> You travel between several places and access different emails from
>>>>>> different places. This populates caches. Let's say 0,3,6,9,... are
>>>>>> on cache A, 1,4,7,10,... are on cache B, and 2,5,8,11,... are on
>>>>>> cache C. Also, you have deleted 500 random emails, so there's only
>>>>>> 9500 emails actually out there.
>>>>>>
>>>>>> You set up a new computer and now want to download all your emails.
>>>>>> The new computer is on the path of caches C, B, then A, then the
>>>>>> authoritative source server. The new email program has no initial
>>>>>> state. The email program only knows that the email number is an
>>>>>> integer that starts at 0. It issues an interest for /mail/inbox, and
>>>>>> asks for left-most child because it wants to populate in order. It
>>>>>> gets a response from cache C with mail 2.
>>>>>>
>>>>>> Now, what does the email program do? It cannot exclude the range
>>>>>> 0..2 because that would possibly miss 0 and 1. So, all it can do is
>>>>>> exclude the exact number "2" and ask again. It then gets cache C
>>>>>> again and it responds with "5". There are about 3000 emails on cache
>>>>>> C, and if they all take 4 bytes (for the exclude component plus its
>>>>>> coding overhead), then that's 12KB of exclusions to finally exhaust
>>>>>> cache C.
>>>>>>
>>>>>> If we want Interests to avoid fragmentation, we can fit about 1200
>>>>>> bytes of exclusions, or 300 components. This means we need about 10
>>>>>> interest messages. Each interest would be something like "exclude
>>>>>> 2,5,8,11,..., >300", then the next would be "exclude <300, 302, 305,
>>>>>> 308, ..., >600", etc.
>>>>>>
>>>>>> Those interests that exclude everything at cache C would then hit,
>>>>>> say, cache B and start getting results 1, 4, 7, .... This means an
>>>>>> Interest like "exclude 2,5,8,11,..., >300" would then get back
>>>>>> number 1. That means the next request actually has to split that one
>>>>>> interest's exclude in two (because the interest was at maximum
>>>>>> size), so you now issue two interests where one is "exclude 1, 2, 5,
>>>>>> 8, >210" and the other is "<210, 212, 215, ..., >300".
>>>>>>
>>>>>> If you look in the CCNx 0.8 java code, there should be a class that
>>>>>> does these Interest-based discoveries and does the Interest
>>>>>> splitting based on the currently known range of discovered content.
>>>>>> I don't have the specific reference right now, but I can send a link
>>>>>> if you are interested in seeing that. The java class keeps state of
>>>>>> what has been discovered so far, so it could re-start later if
>>>>>> interrupted.
>>>>>>
>>>>>> So all those interests would now be getting results from cache B.
>>>>>> You would then start to split all those ranges to accommodate the
>>>>>> numbers coming back from B. Eventually, you'll have at least 10
>>>>>> Interest messages outstanding that would be excluding all the 9500
>>>>>> messages that are in caches A, B, and C. Some of those interest
>>>>>> messages might actually reach an authoritative server, which might
>>>>>> respond too. It would likely be more than 10 interests due to the
>>>>>> algorithm that's used to split full interests, which likely is not
>>>>>> optimal because it does not know exactly where breaks should be a
>>>>>> priori.
>>>>>>
>>>>>> Once you have exhausted caches A, B, and C, the interest messages
>>>>>> would reach the authoritative source (if it's online), and it would
>>>>>> be issuing NACKs (I assume) for interests that have excluded all
>>>>>> non-deleted emails.
>>>>>>
>>>>>> In any case, it takes, at best, 9,500 round trips to "discover" all
>>>>>> 9500 emails. It also required Sum_{i=1..10000} 4*i = 200,020,000
>>>>>> bytes of Interest exclusions. Note that it's an arithmetic sum of
>>>>>> bytes of exclusion, because at each Interest the size of the
>>>>>> exclusions increases by 4. There was an NDN paper about light bulb
>>>>>> discovery (or something like that) that noted this same problem and
>>>>>> proposed some workaround, but I don't remember what they proposed.
>>>>>>
>>>>>> Yes, you could possibly pipeline it, but what would you do? In this
>>>>>> example, knowing the emails were 0 - 10000 (minus some random ones)
>>>>>> would allow you, if you knew it a priori, to issue say 10 interests
>>>>>> in parallel that ask for different ranges. But, 2 years from now
>>>>>> your undeleted emails might range from 100,000 - 150,000. The point
>>>>>> is that a discovery protocol does not know, a priori, what is to be
>>>>>> discovered. It might start learning some stuff as it goes on.
>>>>>>
>>>>>> If you could have retrieved just a table of contents from each
>>>>>> cache, where each "row" is say 64 bytes (i.e. the name continuation
>>>>>> plus hash value), you would need to retrieve 3300 * 64 = 211KB from
>>>>>> each cache (total 640 KB) to list all the emails. That would take
>>>>>> 640KB / 1200 = 534 interest messages of say 64 bytes = 34 KB to
>>>>>> discover all 9500 emails, plus another set to fetch the header rows.
>>>>>> That's, say, 68 KB of interest traffic compared to 200 MB. Now, I've
>>>>>> not said how to list these tables of contents, so an actual protocol
>>>>>> might have higher communication cost, but even if it was 10x worse
>>>>>> that would still be an attractive tradeoff.
>>>>>>
>>>>>> This assumes that you publish just the "header" in the 1st segment
>>>>>> (say 1 KB total object size including the signatures). That's 10 MB
>>>>>> to learn the headers.
>>>>>>
>>>>>> You could also argue that the distribution of emails over caches is
>>>>>> arbitrary. That's true, I picked a difficult sequence. But unless
>>>>>> you have some positive controls on what could be in a cache, it
>>>>>> could be any difficult sequence. I also did not address the timeout
>>>>>> issue, and how do you know you are done?
>>>>>>
>>>>>> This is also why sync works so much better than doing raw interest
>>>>>> discovery. Sync exchanges tables of contents and diffs; it does not
>>>>>> need to enumerate by exclusion everything to retrieve.
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Sep 24, 2014, at 4:27 AM, Tai-Lin Chu wrote:
>>>>>>
>>>>>>> discovery can be reduced to "pattern detection" (can we infer what
>>>>>>> exists?) and "pattern validation" (can we confirm this guess?)
>>>>>>>
>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see
>>>>>>> a pattern with static (/mail/inbox) and variable (148) components;
>>>>>>> with proper naming convention, computers can also detect this
>>>>>>> pattern easily. Now I want to look for all mails in my inbox. I can
>>>>>>> generate a list of /mail/inbox/<number>. These are my guesses, and
>>>>>>> with selectors I can further refine my guesses.
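Tai-Lin's pattern-detection idea (guess /mail/inbox/<number> names, then
validate the guesses before sending interests to the network) can be
sketched with a small Bloom filter published by the producer. Everything
here is illustrative only: the filter size, the hash construction, and the
names are assumptions, not a proposed wire format.

```python
import hashlib

# Toy Bloom filter a producer could publish alongside its data,
# letting consumers validate guessed names before sending interests.
# (Sizes and hash choices here are arbitrary, for illustration only.)

M = 1024  # filter size in bits
K = 3     # number of hash functions

def _hashes(name):
    # Derive K bit positions from SHA-256 of a salted name.
    for i in range(K):
        h = hashlib.sha256(f"{i}:{name}".encode()).digest()
        yield int.from_bytes(h[:4], "big") % M

def bloom_add(bits, name):
    for pos in _hashes(name):
        bits[pos] = 1

def bloom_maybe_contains(bits, name):
    # False means "definitely absent"; True means "probably present".
    return all(bits[pos] for pos in _hashes(name))

# Producer side: advertise which inbox entries exist (entries can be
# added progressively at low cost, as noted in the thread).
bits = [0] * M
existing = {f"/mail/inbox/{n}" for n in (3, 148, 2077)}
for name in existing:
    bloom_add(bits, name)

# Consumer side: detect the pattern /mail/inbox/<number>, generate
# guesses, and only send interests for names the filter admits.
guesses = [f"/mail/inbox/{n}" for n in range(3000)]
candidates = [g for g in guesses if bloom_maybe_contains(bits, g)]
# All real names pass; a few false positives may slip through.
```

This captures the "best-effort" property: the filter never rejects a real
name, but it can admit a small number of wasted interests.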
>>>>>>> To validate them, bloom filter can provide "best effort" discovery
>>>>>>> (with some false positives, so I call it "best-effort") before I
>>>>>>> stupidly send all the interests to the network.
>>>>>>>
>>>>>>> The discovery protocol, as I described above, is essentially
>>>>>>> "pattern detection by naming convention" and "bloom filter
>>>>>>> validation." This is definitely one of the "simpler" discovery
>>>>>>> protocols, because the data producer only needs to add an
>>>>>>> additional bloom filter. Notice that we can progressively add
>>>>>>> entries to the bfilter with low computation cost.
>>>>>>>
>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote:
>>>>>>>> Ok, yes I think those would all be good things.
>>>>>>>>
>>>>>>>> One thing to keep in mind, especially with things like time series
>>>>>>>> sensor data, is that people see a pattern and infer a way of doing
>>>>>>>> it. That's easy for a human :) But in Discovery, one should assume
>>>>>>>> that one does not know of patterns in the data beyond what the
>>>>>>>> protocols used to publish the data explicitly require. That said,
>>>>>>>> I think some of the things you listed are good places to start:
>>>>>>>> sensor data, web content, climate data or genome data.
>>>>>>>>
>>>>>>>> We also need to state what the forwarding strategies are and what
>>>>>>>> the cache behavior is.
>>>>>>>>
>>>>>>>> I outlined some of the points that I think are important in that
>>>>>>>> other posting. While "discover latest" is useful, "discover all"
>>>>>>>> is also important, and that one gets complicated fast. So points
>>>>>>>> like separating discovery from retrieval and working with large
>>>>>>>> data sets have been important in shaping our thinking. That all
>>>>>>>> said, I'd be happy starting from 0 and working through the
>>>>>>>> Discovery service definition from scratch along with data set use
>>>>>>>> cases.
>>>>>>>>
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> Hi Marc,
>>>>>>>>
>>>>>>>> Thanks - yes, I saw that as well. I was just trying to get one
>>>>>>>> step more specific, which was to see if we could identify a few
>>>>>>>> specific use cases around which to have the conversation. (e.g.,
>>>>>>>> time series sensor data and web content retrieval for "get
>>>>>>>> latest"; climate data for huge data sets; local data in a
>>>>>>>> vehicular network; etc.) What have you been looking at that's
>>>>>>>> driving considerations of discovery?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> From:
>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000
>>>>>>>> To: Jeff Burke
>>>>>>>> Cc: ,
>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>>>>
>>>>>>>> Jeff,
>>>>>>>>
>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread on
>>>>>>>> Discovery.
>>>>>>>>
>>>>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>>>>>>>>
>>>>>>>> I think it would be very productive to talk about what Discovery
>>>>>>>> should do, and not focus on the how. It is sometimes easy to get
>>>>>>>> caught up in the how, which I think is a less important topic than
>>>>>>>> the what at this stage.
>>>>>>>>
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> Marc,
>>>>>>>>
>>>>>>>> If you can't talk about your protocols, perhaps we can discuss
>>>>>>>> this based on use cases. What are the use cases you are using to
>>>>>>>> evaluate discovery?
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:
>>>>>>>>
>>>>>>>> No matter what the expressiveness of the predicates, if the
>>>>>>>> forwarder can send interests different ways you don't have a
>>>>>>>> consistent underlying set to talk about, so you would always need
>>>>>>>> non-range exclusions to discover every version.
>>>>>>>>
>>>>>>>> Range exclusions only work, I believe, if you get an authoritative
>>>>>>>> answer. If different content pieces are scattered between
>>>>>>>> different caches I don't see how range exclusions would work to
>>>>>>>> discover every version.
>>>>>>>>
>>>>>>>> I'm sorry to be pointing out problems without offering solutions
>>>>>>>> but we're not ready to publish our discovery protocols.
>>>>>>>>
>>>>>>>> Sent from my telephone
>>>>>>>>
>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>>>>>>>
>>>>>>>> I see. Can you briefly describe how the ccnx discovery protocol
>>>>>>>> solves all the problems that you mentioned (not just exclude)? a
>>>>>>>> doc will be better.
>>>>>>>>
>>>>>>>> My unserious conjecture( :) ): exclude is equal to [not]. I will
>>>>>>>> soon expect [and] and [or], so boolean algebra is fully supported.
>>>>>>>> Regular language or context free language might become part of
>>>>>>>> selector too.
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote:
>>>>>>>> That will get you one reading, then you need to exclude it and ask
>>>>>>>> again.
>>>>>>>>
>>>>>>>> Sent from my telephone
>>>>>>>>
>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:
>>>>>>>>
>>>>>>>> Yes, my point was that if you cannot talk about a consistent set
>>>>>>>> with a particular cache, then you need to always use individual
>>>>>>>> excludes, not range excludes, if you want to discover all the
>>>>>>>> versions of an object.
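The cost of enumerating by explicit exclusion, which both Marc's mailbox
walkthrough and the sensor example in this thread quantify, is easy to
sanity-check. A back-of-the-envelope sketch using the thread's own assumed
sizes (4-byte exclude components for mail numbers, 8-byte timestamps for
sensor readings):

```python
# Back-of-the-envelope check of the exclusion-cost figures in this
# thread (component sizes are the examples' assumptions, not
# measurements of any implementation).

# Sensor case: one reading per second, each later excluded by an
# 8-byte timestamp component -> bytes of exclusions accumulated per day.
readings_per_day = 24 * 60 * 60        # 86,400
sensor_exclusion_bytes = readings_per_day * 8
print(sensor_exclusion_bytes)          # 691200

# Mailbox case: enumerating ~10,000 emails one exclusion at a time,
# with each exclude component costing ~4 bytes; the i-th interest
# carries 4*i bytes of exclusions, so the total is an arithmetic sum.
total = sum(4 * i for i in range(1, 10_001))
print(total)                           # 200020000
```

Both figures match the numbers quoted in the thread (691,200 bytes/day and
Sum_{i=1..10000} 4*i = 200,020,000 bytes), which is why the per-interest
exclusion list grows linearly and the total traffic grows quadratically.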
>>>>>>>>
>>>>>>>> I am very confused. For your example, if I want to get all
>>>>>>>> today's sensor data, I just do (Any..Last second of last
>>>>>>>> day)(First second of tomorrow..Any). That's 18 bytes.
>>>>>>>>
>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>>>>>
>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>>>>>
>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily
>>>>>>>> could miss content objects you want to discover unless you avoid
>>>>>>>> all range exclusions and only exclude explicit versions.
>>>>>>>>
>>>>>>>> Could you explain why the missing content object situation
>>>>>>>> happens? also range exclusion is just a shorter notation for many
>>>>>>>> explicit excludes; converting from explicit excludes to ranged
>>>>>>>> excludes is always possible.
>>>>>>>>
>>>>>>>> Yes, my point was that if you cannot talk about a consistent set
>>>>>>>> with a particular cache, then you need to always use individual
>>>>>>>> excludes, not range excludes, if you want to discover all the
>>>>>>>> versions of an object. For something like a sensor reading that is
>>>>>>>> updated, say, once per second you will have 86,400 of them per
>>>>>>>> day. If each exclusion is a timestamp (say 8 bytes), that's
>>>>>>>> 691,200 bytes of exclusions (plus encoding overhead) per day.
>>>>>>>>
>>>>>>>> yes, maybe using a more deterministic version number than a
>>>>>>>> timestamp makes sense here, but it's just an example of needing a
>>>>>>>> lot of exclusions.
>>>>>>>>
>>>>>>>> You exclude through 100 then issue a new interest. This goes to
>>>>>>>> cache B
>>>>>>>>
>>>>>>>> I feel this case is invalid because cache A will also get the
>>>>>>>> interest, and cache A will return v101 if it exists. Like you
>>>>>>>> said, if this goes to cache B only, it means that cache A dies.
>>>>>>>> How do you know that v101 even exists?
>>>>>>>>
>>>>>>>> I guess this depends on what the forwarding strategy is. If the
>>>>>>>> forwarder will always send each interest to all replicas, then
>>>>>>>> yes, modulo packet loss, you would discover v101 on cache A. If
>>>>>>>> the forwarder is just doing "best path" and can round-robin
>>>>>>>> between cache A and cache B, then your application could miss
>>>>>>>> v101.
>>>>>>>>
>>>>>>>> c,d In general I agree that LPM performance is related to the
>>>>>>>> number of components. In my own thread-safe LPM implementation, I
>>>>>>>> used only one RWMutex for the whole tree. I don't know whether
>>>>>>>> adding a lock for every node will be faster or not because of lock
>>>>>>>> overhead.
>>>>>>>>
>>>>>>>> However, we should compare (exact match + discovery protocol) vs
>>>>>>>> (ndn lpm). Comparing performance of exact match to lpm is unfair.
>>>>>>>>
>>>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0
>>>>>>>> specs for doing the exact match discovery. So, as I said, I'm not
>>>>>>>> ready to claim it's better yet because we have not done that.
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>>>>>> I would point out that using LPM on content object to Interest
>>>>>>>> matching to do discovery has its own set of problems. Discovery
>>>>>>>> involves more than just "latest version" discovery too.
>>>>>>>>
>>>>>>>> This is probably getting off-topic from the original post about
>>>>>>>> naming conventions.
>>>>>>>>
>>>>>>>> a. If Interests can be forwarded multiple directions and two
>>>>>>>> different caches are responding, the exclusion set you build up
>>>>>>>> talking with cache A will be invalid for cache B. If you talk
>>>>>>>> sometimes to A and sometimes to B, you very easily could miss
>>>>>>>> content objects you want to discover unless you avoid all range
>>>>>>>> exclusions and only exclude explicit versions. That will lead to
>>>>>>>> very large interest packets. In ccnx 1.0, we believe that an
>>>>>>>> explicit discovery protocol that allows conversations about
>>>>>>>> consistent sets is better.
>>>>>>>>
>>>>>>>> b. Yes, if you just want the "latest version" discovery that
>>>>>>>> should be transitive between caches, but imagine this. You send
>>>>>>>> Interest #1 to cache A which returns version 100. You exclude
>>>>>>>> through 100 then issue a new interest. This goes to cache B who
>>>>>>>> only has version 99, so the interest times out or is NACK'd. So
>>>>>>>> you think you have it! But, cache A already has version 101, you
>>>>>>>> just don't know. If you cannot have a conversation around
>>>>>>>> consistent sets, it seems like even doing latest version discovery
>>>>>>>> is difficult with selector based discovery. From what I saw in
>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the
>>>>>>>> authoritative source because you can never believe an intermediate
>>>>>>>> cache that there's not something more recent.
>>>>>>>>
>>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be
>>>>>>>> interested in seeing your analysis. Case (a) is that a node can
>>>>>>>> correctly discover every version of a name prefix, and (b) is that
>>>>>>>> a node can correctly discover the latest version. We have not
>>>>>>>> formally compared (or yet published) our discovery protocols (we
>>>>>>>> have three, 2 for content, 1 for device) compared to selector
>>>>>>>> based discovery, so I cannot yet claim they are better, but they
>>>>>>>> do not have the non-determinism sketched above.
>>>>>>>>
>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>>>>>> must do in the PIT to match a content object. If you have a name
If you have a name >>>>>>>> tree or a threaded hash table, those don?t all need to be hash >>>>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>>>> the content object name and evaluate the selector predicate. >>>>>>>> Content Based Networking (CBN) had some some methods to create >>>>>>>>data >>>>>>>> structures based on predicates, maybe those would be better. But >>>>>>>> in any case, you will potentially need to retrieve many PIT >>>>>>>>entries >>>>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>>>> an Intel system, you?ll likely miss cache lines, so you?ll have a >>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>> implementation only requires at most 3 lookups (one by name, one >>>>>>>>by >>>>>>>> name + keyid, one by name + content object hash), and one can do >>>>>>>> other things to optimize lookup for an extra write. >>>>>>>> >>>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>>> walking parent pointers, I suspect you?ll need locking of the >>>>>>>> ancestors in a multi-threaded system (?threaded" here meaning LWP) >>>>>>>> and that will be expensive. It would be interesting to see what a >>>>>>>> cache consistent multi-threaded name tree looks like. >>>>>>>> >>>>>>>> Marc >>>>>>>> >>>>>>>> >>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>>> wrote: >>>>>>>> >>>>>>>> I had thought about these questions, but I want to know your idea >>>>>>>> besides typed component: >>>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>>> things? >>>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>>> other >>>>>>>> faster technique to replace selector? >>>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>>> byte, but 2 bytes for length might not be enough for future. 
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>>>
>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>>>
>>>>>>>> I know how to make #2 flexible enough to do the things I can
>>>>>>>> envision we need to do, and with a few simple conventions on how
>>>>>>>> the registry of types is managed.
>>>>>>>>
>>>>>>>> Could you share it with us?
>>>>>>>>
>>>>>>>> Sure. Here's a strawman.
>>>>>>>>
>>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>>
>>>>>>>> The type space is currently shared with the types used for the
>>>>>>>> entire protocol; that gives us two options:
>>>>>>>> (1) we reserve a range for name component types. Given the
>>>>>>>> likelihood there will be at least as much and probably more need
>>>>>>>> for component types than protocol extensions, we could reserve 1/2
>>>>>>>> of the type space, giving us 32K types for name components.
>>>>>>>> (2) since there is no parsing ambiguity between name components
>>>>>>>> and other fields of the protocol (since they are sub-types of the
>>>>>>>> name type) we could reuse numbers and thereby have an entire 65K
>>>>>>>> name component types.
>>>>>>>>
>>>>>>>> We divide the type space into regions, and manage it with a
>>>>>>>> registry. If we ever get to the point of creating an IETF
>>>>>>>> standard, IANA has 25 years of experience running registries and
>>>>>>>> there are well-understood rule sets for different kinds of
>>>>>>>> registries (open, requires a written spec, requires standards
>>>>>>>> approval).
>>>>>>>>
>>>>>>>> - We allocate one "default" name component type for "generic
>>>>>>>> name", which would be used on name prefixes and other common cases
>>>>>>>> where there are no special semantics on the name component.
>>>>>>>> - We allocate a range of name component types, say 1024, to
>>>>>>>> globally understood types that are part of the base or extension
>>>>>>>> NDN specifications (e.g. chunk#, version#, etc.)
>>>>>>>> - We reserve some portion of the space for unanticipated uses (say
>>>>>>>> another 1024 types)
>>>>>>>> - We give the rest of the space to application assignment.
>>>>>>>>
>>>>>>>> Make sense?
>>>>>>>>
>>>>>>>> While I'm sympathetic to that view, there are three ways in which
>>>>>>>> Moore's law or hardware tricks will not save us from performance
>>>>>>>> flaws in the design
>>>>>>>>
>>>>>>>> we could design for performance,
>>>>>>>>
>>>>>>>> That's not what people are advocating. We are advocating that we
>>>>>>>> *not* design for known bad performance and hope serendipity or
>>>>>>>> Moore's Law will come to the rescue.
>>>>>>>>
>>>>>>>> but I think there will be a turning point when the slower design
>>>>>>>> starts to become "fast enough".
>>>>>>>>
>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so
>>>>>>>> things that don't get faster while others do tend to get dropped
>>>>>>>> or not used because they impose a performance penalty relative to
>>>>>>>> the things that go faster. There is also the "low-end" phenomenon
>>>>>>>> where improvements in technology get applied to lowering cost
>>>>>>>> rather than improving performance. For those environments bad
>>>>>>>> performance just never gets better.
>>>>>>>>
>>>>>>>> Do you think there will be some design of ndn that will *never*
>>>>>>>> have performance improvement?
>>>>>>>>
>>>>>>>> I suspect LPM on data will always be slow (relative to the other
>>>>>>>> functions).
>>>>>>>> I suspect exclusions will always be slow because they will require
>>>>>>>> extra memory references.
>>>>>>>>
>>>>>>>> However I of course don't claim clairvoyance, so this is just
>>>>>>>> speculation based on 35+ years of seeing performance improve by 4
>>>>>>>> orders of magnitude and still having to worry about counting
>>>>>>>> cycles and memory references...
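Dave's strawman registry can be sketched as a range classifier over the
16-bit type space. The concrete boundary values below are my own
assumptions for illustration; the strawman only fixes the region sizes
(one generic type, ~1024 spec-assigned, ~1024 reserved, the remainder
application-assigned).

```python
# Sketch of the strawman name-component type registry. The 16-bit
# space gives 65,536 codes. The boundary values below are assumed
# purely for illustration; only the region sizes come from the text.

REGIONS = [
    (0, 0, "generic"),            # the one default "generic name" type
    (1, 1024, "spec-assigned"),   # base/extension specs (chunk#, version#, ...)
    (1025, 2048, "reserved"),     # unanticipated future uses
    (2049, 65535, "application"), # application-assigned types
]

def classify(type_code):
    """Map a 16-bit name-component type code to its registry region."""
    if not 0 <= type_code <= 0xFFFF:
        raise ValueError("type code must fit in 16 bits")
    for lo, hi, region in REGIONS:
        if lo <= type_code <= hi:
            return region
    raise AssertionError("unreachable: regions cover the whole space")

print(classify(0))      # generic
print(classify(7))      # spec-assigned
print(classify(40000))  # application
```

A real registry would pin these boundaries in a spec (or with IANA, as
suggested), but the point is just that classification is a trivial range
check.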
>>>>>>>> >>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>> wrote: >>>>>>>> >>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>> wrote: >>>>>>>> >>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>> perform >>>>>>>> well on it. It should be the other way around: once ndn app >>>>>>>> becomes >>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>> >>>>>>>> While I?m sympathetic to that view, there are three ways in >>>>>>>> which Moore?s law or hardware tricks will not save us from >>>>>>>> performance flaws in the design: >>>>>>>> a) clock rates are not getting (much) faster >>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>> c) data structures that require locks to manipulate >>>>>>>> successfully will be relatively more expensive, even with >>>>>>>> near-zero lock contention. >>>>>>>> >>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>> its design. We just forgot those because the design elements >>>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>>> poster children for this are: >>>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>>> on modern forwarding hardware, so they can?t be reliably used >>>>>>>> anywhere >>>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>>> in working around. >>>>>>>> >>>>>>>> I?m afraid students today are being taught the that designers >>>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>>> engineers that got most of it right. >>>>>>>> >>>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>>> Now I >>>>>>>> see that there are 3 approaches: >>>>>>>> 1. we should not define a naming convention at all >>>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>>> types >>>>>>>> 3. 
marked component: introduce only one more type and add >>>>>>>> additional >>>>>>>> marker space >>>>>>>> >>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>> how the registry of types is managed. >>>>>>>> >>>>>>>> It is just as powerful in practice as either throwing up our >>>>>>>> hands and letting applications design their own mutually >>>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>>> markers in a way that is fast to generate/parse and also >>>>>>>> resilient against aliasing. >>>>>>>> >>>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>>> convention >>>>>>>> needs to be revised. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>>> wrote: >>>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>>> to fit in (the >>>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>>> current NDN >>>>>>>> experiments? >>>>>>>> >>>>>>>> I guess wide deployment could make for even longer names. >>>>>>>> Related: Many URLs >>>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>>> text lines, and >>>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>>> I see. >>>>>>>> >>>>>>>> >>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>> >>>>>>>> In fact, the index in separate TLV will be slower on some >>>>>>>> architectures, >>>>>>>> like the ezChip NP4. The NP4 can hold the fist 96 frame >>>>>>>> bytes in memory, >>>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>>> 32-byte blocks >>>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>>> If you need to >>>>>>>> switch between arrays, it would be very expensive. 
If you >>>>>>>> have to read past >>>>>>>> the name to get to the 2nd array, then read it, then backup >>>>>>>> to get to the >>>>>>>> name, it will be pretty expensive too. >>>>>>>> >>>>>>>> Marc >>>>>>>> >>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>> wrote: >>>>>>>> >>>>>>>> Does this make that much difference? >>>>>>>> >>>>>>>> If you want to parse the first 5 components, one way to do >>>>>>>> it is: >>>>>>>> >>>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>>> from the start >>>>>>>> offset of the beginning of the name. >>>>>>>> OR >>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>> >>>>>>>> How much speed are you getting from one to the other? You >>>>>>>> seem to imply >>>>>>>> that the first one is faster. I don't think this is the >>>>>>>> case. >>>>>>>> >>>>>>>> In the first one you'll probably have to get the cache line >>>>>>>> for the index, >>>>>>>> then all the required cache lines for the first 5 >>>>>>>> components. For the >>>>>>>> second, you'll have to get all the cache lines for the first >>>>>>>> 5 components. >>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>>> than >>>>>>>> evaluating a number and computing an addition, you might >>>>>>>> find that the >>>>>>>> performance of the index is actually slower than the >>>>>>>> performance of the >>>>>>>> direct access. >>>>>>>> >>>>>>>> Granted, there is a case where you don't access the name at >>>>>>>> all, for >>>>>>>> example, if you just get the offsets and then send the >>>>>>>> offsets as >>>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>>> you may see a >>>>>>>> gain IF there are more cache line misses in reading the name >>>>>>>> than in >>>>>>>> reading the index. So, if the regular part of the name >>>>>>>> that you're >>>>>>>> parsing is bigger than the cache line (64 bytes?)
and the >>>>>>>> name is to be >>>>>>>> processed by a different processor, then you might see some >>>>>>>> performance >>>>>>>> gain in using the index, but in all other circumstances I >>>>>>>> bet this is not >>>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>>> >>>>>>>> This is all to say, I don't think we should be designing the >>>>>>>> protocol with >>>>>>>> only one architecture in mind. (The architecture of sending >>>>>>>> the name to a >>>>>>>> different processor than the index). >>>>>>>> >>>>>>>> If you have numbers that show that the index is faster I >>>>>>>> would like to see >>>>>>>> under what conditions and architectural assumptions. >>>>>>>> >>>>>>>> Nacho >>>>>>>> >>>>>>>> (I may have misinterpreted your description so feel free to >>>>>>>> correct me if >>>>>>>> I'm wrong.) >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nacho (Ignacio) Solis >>>>>>>> Protocol Architect >>>>>>>> Principal Scientist >>>>>>>> Palo Alto Research Center (PARC) >>>>>>>> +1(650)812-4458 >>>>>>>> Ignacio.Solis at parc.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Indeed each component's offset must be encoded using a fixed >>>>>>>> amount of >>>>>>>> bytes: >>>>>>>> >>>>>>>> i.e., >>>>>>>> Type = Offsets >>>>>>>> Length = 10 Bytes >>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>> >>>>>>>> You may also imagine having an "Offset_2byte" type if your >>>>>>>> name is too >>>>>>>> long. >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>> >>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>> you only >>>>>>>> want the first x components) you can directly have it using >>>>>>>> the >>>>>>>> offsets. With the nested TLV structure you have to >>>>>>>> iteratively parse >>>>>>>> the first x-1 components. With the offset structure you can >>>>>>>> directly >>>>>>>> access the first x components.
>>>>>>>> >>>>>>>> I don't get it. What you described only works if the >>>>>>>> "offset" is >>>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>>> parse x-1 >>>>>>>> offsets to get to the x-th offset. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>> >>>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>>> like the >>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>>>> understand what >>>>>>>> you >>>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>>> entirely >>>>>>>> different >>>>>>>> scheme where the info that describes the name-components is >>>>>>>> ... >>>>>>>> someplace >>>>>>>> other than _in_ the name-components. is that correct? when >>>>>>>> you say >>>>>>>> "field >>>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>>> TLV)? >>>>>>>> >>>>>>>> Correct. >>>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>>> name >>>>>>>> hierarchy >>>>>>>> with offsets in the name and other TLV(s) indicate the >>>>>>>> offset to use >>>>>>>> in >>>>>>>> order to retrieve special components. >>>>>>>> As for the field separator, it is something like "/". >>>>>>>> Aliasing is >>>>>>>> avoided as >>>>>>>> you do not rely on field separators to parse the name; you >>>>>>>> use the >>>>>>>> "offset >>>>>>>> TLV" to do that. >>>>>>>> >>>>>>>> So now, it may be an aesthetic question but: >>>>>>>> >>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>> you only >>>>>>>> want >>>>>>>> the first x components) you can directly have it using the >>>>>>>> offsets. >>>>>>>> With the >>>>>>>> nested TLV structure you have to iteratively parse the first >>>>>>>> x-1 >>>>>>>> components. >>>>>>>> With the offset structure you can directly access the >>>>>>>> first x >>>>>>>> components.
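The trade-off being debated here, walking nested TLVs versus jumping through a fixed-width offset table, can be sketched in a few lines. This is a toy byte layout invented for illustration (the 0x08 component type, 1-byte lengths, and 1-byte offsets are assumptions, not the NDN or CCNx wire format):

```python
# Toy TLV layouts for the two ways of reaching the first x name
# components discussed in the thread. Hypothetical format, not NDN/CCNx.
import struct

def encode_name(components):
    """Nested TLV: each component is (type=0x08, 1-byte length, value)."""
    return b"".join(struct.pack("BB", 0x08, len(c)) + c for c in components)

def encode_offset_table(components):
    """Fixed 1-byte offsets to the *end* of each component's TLV."""
    offsets, pos = [], 0
    for c in components:
        pos += 2 + len(c)            # 2 bytes of T and L, then the value
        offsets.append(pos)
    return bytes(offsets)

def first_x_sequential(name_bytes, x):
    """Iterative parse: (find size + move) x times, the nested-TLV way."""
    pos = 0
    for _ in range(x):
        _, length = struct.unpack_from("BB", name_bytes, pos)
        pos += 2 + length
    return name_bytes[:pos]

def first_x_indexed(name_bytes, table, x):
    """One lookup; only possible because each offset has a fixed width."""
    return name_bytes[: table[x - 1]]

comps = [b"mail", b"inbox", b"148"]
name = encode_name(comps)
table = encode_offset_table(comps)
assert first_x_sequential(name, 2) == first_x_indexed(name, table, 2)
print(first_x_indexed(name, table, 2))  # the TLVs for /mail/inbox only
```

Tai-Lin's caveat holds in this sketch too: the one-lookup jump in `first_x_indexed` works only because each table entry has a fixed width; with variable-length (varNum) offsets you are back to parsing x-1 entries.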
>>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> >>>>>>>> -- Mark >>>>>>>> >>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> The why is simple: >>>>>>>> >>>>>>>> You use a lot of "generic component type" and very few >>>>>>>> "specific >>>>>>>> component type". You are imposing types for every component >>>>>>>> in order >>>>>>>> to >>>>>>>> handle few exceptions (segmentation, etc..). You create a >>>>>>>> rule >>>>>>>> (specify >>>>>>>> the component's type ) to handle exceptions! >>>>>>>> >>>>>>>> I would prefer not to have typed components. Instead I would >>>>>>>> prefer >>>>>>>> to >>>>>>>> have the name as simple sequence bytes with a field >>>>>>>> separator. Then, >>>>>>>> outside the name, if you have some components that could be >>>>>>>> used at >>>>>>>> network layer (e.g. a TLV field), you simply need something >>>>>>>> that >>>>>>>> indicates which is the offset allowing you to retrieve the >>>>>>>> version, >>>>>>>> segment, etc in the name... >>>>>>>> >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>> >>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> I think we agree on the small number of "component types". >>>>>>>> However, if you have a small number of types, you will end >>>>>>>> up with >>>>>>>> names >>>>>>>> containing many generic components types and few specific >>>>>>>> components >>>>>>>> types. Due to the fact that the component type specification >>>>>>>> is an >>>>>>>> exception in the name, I would prefer something that specify >>>>>>>> component's >>>>>>>> type only when needed (something like UTF8 conventions but >>>>>>>> that >>>>>>>> applications MUST use). >>>>>>>> >>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>> explanation >>>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>>> e.g.) 
>>>>>>>> and >>>>>>>> there's been email trying to explain that applications don't >>>>>>>> have to >>>>>>>> use types if they don't need to. your email sounds like "I >>>>>>>> prefer >>>>>>>> the >>>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>>> preference in >>>>>>>> the face of the points about the problems. can you say why >>>>>>>> it is >>>>>>>> that >>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>>> . >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ndn-interest mailing list >>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >> > > >_______________________________________________ >Ndn-interest mailing list >Ndn-interest at lists.cs.ucla.edu >http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Fri Sep 26 01:26:41 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Fri, 26 Sep 2014 08:26:41 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: The ccnx 1.0 cache control directives for content objects are specified here, by the way. This applies to content objects. http://www.ccnx.org/pubs/ccnx-mosko-caching-01.txt For an Interest, we played with the idea of adding a "MaximumAge" restriction so only a response with at least that many milliseconds of remaining lifetime would satisfy the interest, but that didn't seem like a great idea, so it's not in the spec anywhere. Saying "MustBeFresh" in an Interest, as it's written up in NDN and in the old CCNx 0.x specs, does not work either. Here's a counterexample, with A and B both attached to C: A --- C, B --- C, C --- D --- E. Node A sends an Interest for "stale ok".
Node C forwards it to D. E is the actual producer who could send a fresh response. Node B sends a "MustBeFresh" Interest. Node C, seeing that it's different from the previous interest, forwards it to node D too. Node D receives the 1st interest from A and returns a stale Content Object. Node C receives a ContentObject that says "Freshness=5". So, it must be fresh, yes? Node C will send the stale response to both A and B. Node E will eventually get the request from node B, because D could not satisfy the request for a fresh response. Node E returns a fresh response to D, which caches it, then forwards it to C. Node C has already satisfied the two interests, so it drops the fresh response. Marc On Sep 26, 2014, at 10:12 AM, wrote: > On Fri, 26 Sep 2014, Ignacio.Solis at parc.com wrote: > >> On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: >> >>> In NDN, if a router wants to check the signature, then it can check. If >>> it wants to skip the checking, that's fine too. If the design doesn't >>> allow the router to verify the signature, then that's a problem. In the >>> above description, the cache signs a data packet with a name owned by >>> someone else; it seems problematic for a design to advocate this. >> >> Routers that support the Selector Protocol could check signatures. >> >>> >>> One difference is that here the returned data can satisfy only that >>> interest. In the original selector design, the returned data can satisfy >>> other Interests with the same name but different selectors (e.g., > 50). >> >> Interests with the same selector would be exact matched throughout the >> network at any node. >> >> Interests with different selectors would not match on all routers, but >> they would match just fine on routers that supported the Selector Protocol. >> >> Basically, everything you want works on routers that support the Selector >> Protocol, but routers that don't want to support it, don't have to.
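Marc's counterexample can be reproduced with a toy forwarder model. This is a hypothetical sketch (invented Node and ContentObject classes, not NFD or CCNx code); the key assumption, taken from the thread, is that a node judges freshness from the carried FreshnessPeriod relative to the object's *arrival* time at that node, which is what lets a stale object satisfy B's MustBeFresh interest at C:

```python
# Toy model of the MustBeFresh counterexample: A and B attach to C,
# C --- D --- E, E is the producer, D holds a stale cached copy.

class ContentObject:
    def __init__(self, name, produced_at, freshness_period):
        self.name = name
        self.produced_at = produced_at        # true wall-clock origin
        self.freshness_period = freshness_period

class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}   # name -> (arrival_time, ContentObject)
        self.pit = {}     # name -> list of (requester, must_be_fresh)

    def receive_interest(self, requester, name, must_be_fresh, now):
        self.pit.setdefault(name, []).append((requester, must_be_fresh))

    def receive_data(self, co, now):
        # Freshness restarts on arrival: the node only sees the carried
        # FreshnessPeriod, not the object's true age.
        self.store[co.name] = (now, co)
        return self.pit.pop(co.name, [])  # everyone pending gets it

    def is_fresh(self, name, now):
        arrival, co = self.store[name]
        return (now - arrival) < co.freshness_period

# E produced the object at t=0 with FreshnessPeriod=5; D cached it then.
co = ContentObject("/sensor/temp", produced_at=0, freshness_period=5)
d = Node("D"); d.store["/sensor/temp"] = (0, co)   # stale copy at D

c = Node("C")
now = 100   # the object is long stale by true age (100 >> 5) ...
c.receive_interest("A", "/sensor/temp", must_be_fresh=False, now=now)
c.receive_interest("B", "/sensor/temp", must_be_fresh=True, now=now)
# ... D answers A's interest from its cache; C receives the stale object.
winners = c.receive_data(co, now=now)
# C's freshness clock restarted at arrival, so both pending interests,
# including B's MustBeFresh, are satisfied with stale data.
print(c.is_fresh("/sensor/temp", now=now))   # True at C, despite age 100
print([r for r, _ in winners])               # ['A', 'B']
```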
> A perhaps more careful statement would be: > > - interests carry a query, not the name of data > (except if you go for the hash). > > - caching results based on the query (that you call a name) > only works if the query is idempotent. > > I doubt that the selector expressiveness remains in idempotent land. > > One requirement thus would be that selector-carrying interests can disable cache responses from nodes that don't support selectors. > > While it is true that you can force this by adding a nonce to each query, it would be cleaner to have explicit signaling. Such a don't-cache-flag towards selector-ignoring nodes would be different from the don't-cache-flag in the query (that is directed to selector-aware nodes). > > > christian > >> >> >> Nacho >> >> >> >>>> >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From Ignacio.Solis at parc.com Fri Sep 26 01:29:47 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Fri, 26 Sep 2014 08:29:47 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> Message-ID: On 9/26/14, 10:21 AM, "Tai-Lin Chu" wrote: >>B - Yes, you can consider the name prefix is "owned" by the server, but >the answer is actually something that the cache is choosing. The cache is >choosing from the set of data that it has. The data that it encapsulates >_is_ signed by the producer. Anybody that can decapsulate the data can >verify that this is the case. > >does the producer sign the table of content directly? or does the producer >sign the cache's key, which in turn signs the table of content, so >that we can verify? The producer doesn't sign anything (new). You (a node running the Selector Protocol) can verify because you can decapsulate the message, and inside you find a data packet that is the original data packet that the producer generated. See my diagram: name = /mail/inbox/list/selector_matching/<hash> payload = [ matching name = /mail/inbox/list/v101 ] [ embedded object = < name = /mail/inbox/list/v101, payload = list, signature = mail server > ] signature = responding cache The embedded object can be verified in any traditional way. Nacho > >On Fri, Sep 26, 2014 at 12:46 AM, wrote: >> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote: >> >>>How can a cache respond to /mail/inbox/selector_matching/<hash of >>>payload> with a table of content? This name prefix is owned by the mail >>>server. Also the reply really depends on what is in the cache at the >>>moment, so the same name would correspond to different data. >> >> A - Yes, the same name would correspond to different data. This is true >> given that the data has changed.
NDN (and CCN) has no architectural >> requirement that a name maps to the same piece of data (obviously not >> talking about self-certifying hash-based names). >> >> B - Yes, you can consider the name prefix is "owned" by the server, but >> the answer is actually something that the cache is choosing. The cache >>is >> choosing from the set of data that it has. The data that it >>encapsulates >> _is_ signed by the producer. Anybody that can decapsulate the data can >> verify that this is the case. >> >> Nacho >> >> >>>On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >>> >>>> My beating on "discover all" is exactly because of this. Let's define >>>>the discovery service. If the service is just "discover latest" >>>>(left/right), can we not simplify the current approach? If the service >>>>includes more than "latest", then is the current approach the right >>>>approach? >>>> >>>> Sync has its place and is the right solution for some things. However, >>>>it should not be a bandage over discovery. Discovery should be its >>>>own valid and useful service. >>>> >>>> I agree that the exclusion approach can work, and work relatively >>>>well, >>>>for finding the rightmost/leftmost child. I believe this is because >>>>that operation is transitive through caches. So, within whatever >>>>timeout an application is willing to wait to find the "latest", it can >>>>keep asking and asking. >>>> >>>> I do think it would be best to actually try to ask an authoritative >>>>source first (i.e. a non-cached value), and if that fails then probe >>>>caches, but experimentation may show what works well. This is based on >>>>my belief that in the real world, in broad use, the namespace will >>>>become >>>>pretty polluted and probing will result in a lot of junk, but that's >>>>future prognosticating. >>>> >>>> Also, in the exact match vs. continuation match of content object to >>>>interest, it is pretty easy to encode that "selector" request in a name >>>>component (i.e.
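The decapsulate-and-verify step described in the diagram above can be sketched as follows. HMAC stands in for real public-key signatures here, and the key names and dict layout are invented for illustration, not a CCNx message format:

```python
# Toy sketch of the Selector Protocol encapsulation: the cache's reply
# wraps the producer's original signed object, so any node can
# decapsulate and check the *producer's* signature on the inner data.
import hmac, hashlib

PRODUCER_KEY = b"mail-server-key"   # hypothetical signing keys
CACHE_KEY = b"cache-key"

def sign(key, blob):
    return hmac.new(key, blob, hashlib.sha256).digest()

# The producer's original object, signed once at publication time.
inner = {
    "name": "/mail/inbox/list/v101",
    "payload": b"message list ...",
}
inner["signature"] = sign(PRODUCER_KEY, inner["payload"])

# The cache's reply: it chooses which object matches the selectors,
# but it does not (and cannot) re-sign the producer's data.
reply = {
    "name": "/mail/inbox/list/selector_matching/<hash>",
    "matching_name": inner["name"],
    "embedded_object": inner,
}
reply["signature"] = sign(CACHE_KEY, reply["matching_name"].encode())

def verify_decapsulated(reply, producer_key):
    """Check the producer's signature on the decapsulated inner object."""
    obj = reply["embedded_object"]
    expected = sign(producer_key, obj["payload"])
    return hmac.compare_digest(expected, obj["signature"])

print(verify_decapsulated(reply, PRODUCER_KEY))  # True
```

The point of the structure is visible in the code: trust in the answer does not depend on trusting the cache, only on verifying the embedded object against the producer's key.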
"exclude_before=(t=version, l=2, v=279) & sort=right") >>>>and any participating cache can respond with a link to (or encapsulate) a >>>>response in an exact match system. >>>> >>>> In the CCNx 1.0 spec, one could also encode this a different way. One >>>>could use a name like "/mail/inbox/selector_matching/<hash of payload>" >>>>and in the payload include "exclude_before=(t=version, l=2, v=279) & >>>>sort=right". This means that any cache that could process the >>>>"selector_matching" function could look at the interest payload and >>>>evaluate the predicate there. The predicate could become large and not >>>>pollute the PIT with all the computation state. Including "<hash of >>>>payload>" in the name means that one could get a cached response if >>>>someone else had asked the same exact question (subject to the content >>>>object's cache lifetime) and it also serves to multiplex different >>>>payloads for the same function (selector_matching). >>>> >>>> Marc >>>> >>>> >>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff >>>>wrote: >>>> >>>>> >>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>>> >>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html >>>>> >>>>> J. >>>>> >>>>> >>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>>>> >>>>>> However, I cannot see whether we can achieve "best-effort >>>>>>*all*-value" >>>>>> efficiently. >>>>>> There are still interesting topics on >>>>>> 1. how do we express the discovery query? >>>>>> 2. is the selector "discovery-complete"? i.e. can we express any >>>>>> discovery query with the current selector? >>>>>> 3. if so, can we re-express the current selector in a more efficient >>>>>>way? >>>>>> >>>>>> I personally see named data as a set, which can then be >>>>>>categorized >>>>>> into "ordered set" and "unordered set". >>>>>> some questions that any discovery expression must solve: >>>>>> 1. is this a nil set or not? nil set means that this name is the >>>>>>leaf >>>>>> 2.
set contains member X? >>>>>> 3. is set ordered or not >>>>>> 4. (ordered) first, prev, next, last >>>>>> 5. if we enforce component ordering, answer question 4. >>>>>> 6. recursively answer all questions above on any set member >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> From: >>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>> To: Jeff Burke >>>>>>> Cc: , , >>>>>>> >>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>> >>>>>>> I think Tai-Lin's example was just fine to talk about discovery. >>>>>>> /blah/blah/value, how do you discover all the "value"s? Discovery >>>>>>> shouldn't >>>>>>> care if it's email messages or temperature readings or world cup >>>>>>>photos. >>>>>>> >>>>>>> >>>>>>> This is true if discovery means "finding everything" - in which >>>>>>>case, >>>>>>> as you >>>>>>> point out, sync-style approaches may be best. But I am not sure >>>>>>>that >>>>>>> this >>>>>>> definition is complete. The most pressing example that I can think >>>>>>>of >>>>>>> is >>>>>>> best-effort latest-value, in which the consumer's goal is to get >>>>>>>the >>>>>>> latest >>>>>>> copy the network can deliver at the moment, and may not care about >>>>>>> previous >>>>>>> values or (if freshness is used well) potential later versions. >>>>>>> >>>>>>> Another case that seems to work well is video seeking. Let's say I >>>>>>> want to >>>>>>> enable random access to a video by timecode. The publisher can >>>>>>>provide a >>>>>>> time-code based discovery namespace that's queried using an >>>>>>>Interest >>>>>>> that >>>>>>> essentially says "give me the closest keyframe to 00:37:03:12", >>>>>>>which >>>>>>> returns a data packet that, via the name, provides the exact timecode >>>>>>>of >>>>>>> the >>>>>>> keyframe in question and a link to a segment-based namespace for >>>>>>> efficient >>>>>>> exact match playout.
In two roundtrips and in a very lightweight >>>>>>>way, >>>>>>> the >>>>>>> consumer has random access capability. If NDN is the moral >>>>>>> equivalent >>>>>>> of IP, then I am not sure we should be afraid of roundtrips that >>>>>>>provide >>>>>>> this kind of functionality, just as they are used in TCP. >>>>>>> >>>>>>> >>>>>>> I described one set of problems using the exclusion approach, and >>>>>>>that >>>>>>> an >>>>>>> NDN paper on device discovery described a similar problem, though >>>>>>>they >>>>>>> did >>>>>>> not go into the details of splitting interests, etc. That all was >>>>>>> simple >>>>>>> enough to see from the example. >>>>>>> >>>>>>> Another question is how one does discovery with exact match >>>>>>> names, >>>>>>> which is also conflating things. You could do a different >>>>>>>discovery >>>>>>> with >>>>>>> continuation names too, just not the exclude method. >>>>>>> >>>>>>> As I alluded to, one needs a way to talk with a specific cache >>>>>>>about >>>>>>>its >>>>>>> "table of contents" for a prefix so one can get a consistent set of >>>>>>> results >>>>>>> without all the round-trips of exclusions. Actually downloading >>>>>>>the >>>>>>> "headers" of the messages would be the same bytes, more or less. >>>>>>>In >>>>>>>a >>>>>>> way, >>>>>>> this is a little like name enumeration from a ccnx 0.x repo, but >>>>>>>that >>>>>>> protocol has its own set of problems and I'm not suggesting we use >>>>>>>that >>>>>>> directly. >>>>>>> >>>>>>> One approach is to encode a request in a name component and a >>>>>>> participating >>>>>>> cache can reply. It replies in such a way that one could continue >>>>>>> talking >>>>>>> with that cache to get its TOC. One would then issue another >>>>>>>interest >>>>>>> with >>>>>>> a request for not-that-cache. >>>>>>> >>>>>>> >>>>>>> I'm curious how the TOC approach works in a multi-publisher >>>>>>>scenario?
>>>>>>> >>>>>>> >>>>>>> Another approach is to try to ask the authoritative source for the >>>>>>> "current" >>>>>>> manifest name, i.e. /mail/inbox/current/<nonce>, which could return >>>>>>>the >>>>>>> manifest or a link to the manifest. Then fetching the actual >>>>>>>manifest >>>>>>> from >>>>>>> the link could come from caches because you now have a consistent >>>>>>>set of >>>>>>> names to ask for. If you cannot talk with an authoritative source, >>>>>>>you >>>>>>> could try again without the nonce and see if there's a cached copy >>>>>>>of a >>>>>>> recent version around. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> >>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>>>wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, >>>>>>>see a >>>>>>> pattern with static (/mail/inbox) and variable (148) components; >>>>>>>with >>>>>>> a proper naming convention, computers can also detect this pattern >>>>>>> easily. Now I want to look for all mails in my inbox. I can >>>>>>>generate >>>>>>>a >>>>>>> list of /mail/inbox/<number>. These are my guesses, and with >>>>>>>selectors >>>>>>> I can further refine my guesses. >>>>>>> >>>>>>> >>>>>>> I think this is a very bad example (or at least a very bad >>>>>>>application >>>>>>> design). You have an app (a mail server / inbox) and you want it >>>>>>>to >>>>>>> list >>>>>>> your emails? An email list is an application data structure. I >>>>>>>don't >>>>>>> think you should use the network structure to reflect this. >>>>>>> >>>>>>> >>>>>>> I think Tai-Lin is trying to sketch a small example, not propose a >>>>>>> full-scale approach to email. (Maybe I am misunderstanding.)
>>>>>>> >>>>>>> Another way to look at it is that if the network architecture is >>>>>>> providing >>>>>>> the equivalent of distributed storage to the application, perhaps >>>>>>>the >>>>>>> application data structure could be adapted to match the >>>>>>>affordances >>>>>>>of >>>>>>> the network. Then it would not be so bad that the two structures >>>>>>>were >>>>>>> aligned. >>>>>>> >>>>>>> >>>>>>> I'll give you an example: how do you delete emails from your inbox? >>>>>>>If >>>>>>> an >>>>>>> email was cached in the network it can never be deleted from your >>>>>>>inbox? >>>>>>> >>>>>>> >>>>>>> This is conflating two issues - what you are pointing out is that >>>>>>>the >>>>>>> data >>>>>>> structure of a linear list doesn't handle common email management >>>>>>> operations well. Again, I'm not sure if that's what he was getting >>>>>>>at >>>>>>> here. But deletion is not the issue - the availability of a data >>>>>>>object >>>>>>> on the network does not necessarily mean it's valid from the >>>>>>>perspective >>>>>>> of the application. >>>>>>> >>>>>>> Or moved to another mailbox? Do you rely on the emails expiring? >>>>>>> >>>>>>> This problem is true for most (any?) situations where you use >>>>>>>network >>>>>>> name >>>>>>> structure to directly reflect the application data structure. >>>>>>> >>>>>>> >>>>>>> Not sure I understand how you make the leap from the example to the >>>>>>> general statement. >>>>>>> >>>>>>> Jeff >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>>> >>>>>>> Ok, yes I think those would all be good things. >>>>>>> >>>>>>> One thing to keep in mind, especially with things like time series >>>>>>> sensor >>>>>>> data, is that people see a pattern and infer a way of doing it.
>>>>>>>That's >>>>>>> easy >>>>>>> for a human :) But in Discovery, one should assume that one does >>>>>>>not >>>>>>> know >>>>>>> of patterns in the data beyond what the protocols used to publish >>>>>>>the >>>>>>> data >>>>>>> explicitly require. That said, I think some of the things you >>>>>>>listed >>>>>>> are >>>>>>> good places to start: sensor data, web content, climate data or >>>>>>>genome >>>>>>> data. >>>>>>> >>>>>>> We also need to state what the forwarding strategies are and what >>>>>>>the >>>>>>> cache >>>>>>> behavior is. >>>>>>> >>>>>>> I outlined some of the points that I think are important in that >>>>>>>other >>>>>>> posting. While "discover latest" is useful, "discover all" is also >>>>>>> important, and that one gets complicated fast. So points like >>>>>>> separating >>>>>>> discovery from retrieval and working with large data sets have been >>>>>>> important in shaping our thinking. That all said, I'd be happy >>>>>>> starting >>>>>>> from 0 and working through the Discovery service definition from >>>>>>> scratch >>>>>>> along with data set use cases. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>> wrote: >>>>>>> >>>>>>> Hi Marc, >>>>>>> >>>>>>> Thanks - yes, I saw that as well. I was just trying to get one step >>>>>>> more >>>>>>> specific, which was to see if we could identify a few specific use >>>>>>> cases >>>>>>> around which to have the conversation. (e.g., time series sensor >>>>>>>data >>>>>>> and >>>>>>> web content retrieval for "get latest"; climate data for huge data >>>>>>> sets; >>>>>>> local data in a vehicular network; etc.) What have you been >>>>>>>looking >>>>>>>at >>>>>>> that's driving considerations of discovery? >>>>>>> >>>>>>> Thanks, >>>>>>> Jeff >>>>>>> >>>>>>> From: >>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>> To: Jeff Burke >>>>>>> Cc: , >>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>>> >>>>>>> Jeff, >>>>>>> >>>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>> Discovery. >>>>>>> >>>>>>> >>>>>>> >>>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/0 >>>>>>>00 >>>>>>>20 >>>>>>> 0 >>>>>>> .html >>>>>>> >>>>>>> I think it would be very productive to talk about what Discovery >>>>>>>should >>>>>>> do, >>>>>>> and not focus on the how. It is sometimes easy to get caught up in >>>>>>>the >>>>>>> how, >>>>>>> which I think is a less important topic than the what at this >>>>>>>stage. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>> wrote: >>>>>>> >>>>>>> Marc, >>>>>>> >>>>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>>>> based >>>>>>> on use cases. What are the use cases you are using to evaluate >>>>>>> discovery? >>>>>>> >>>>>>> Jeff >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>> wrote: >>>>>>> >>>>>>> No matter what the expressiveness of the predicates if the >>>>>>>forwarder >>>>>>> can >>>>>>> send interests different ways you don't have a consistent >>>>>>>underlying >>>>>>> set >>>>>>> to talk about so you would always need non-range exclusions to >>>>>>>discover >>>>>>> every version. >>>>>>> >>>>>>> Range exclusions only work I believe if you get an authoritative >>>>>>> answer. >>>>>>> If different content pieces are scattered between different caches >>>>>>>I >>>>>>> don't see how range exclusions would work to discover every >>>>>>>version. >>>>>>> >>>>>>> I'm sorry to be pointing out problems without offering solutions >>>>>>>but >>>>>>> we're not ready to publish our discovery protocols. >>>>>>> >>>>>>> Sent from my telephone >>>>>>> >>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" >>>>>>>wrote: >>>>>>> >>>>>>> I see. Can you briefly describe how ccnx discovery protocol solves >>>>>>>the >>>>>>> all problems that you mentioned (not just exclude)? a doc will be >>>>>>> better. 
>>>>>>> >>>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will >>>>>>>soon >>>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>>>Regular >>>>>>> language or context free language might become part of selector >>>>>>>too. >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>>> That will get you one reading then you need to exclude it and ask >>>>>>> again. >>>>>>> >>>>>>> Sent from my telephone >>>>>>> >>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" >>>>>>>wrote: >>>>>>> >>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>> with a particular cache, then you need to always use individual >>>>>>> excludes not range excludes if you want to discover all the >>>>>>>versions >>>>>>> of an object. >>>>>>> >>>>>>> >>>>>>> I am very confused. For your example, if I want to get all today's >>>>>>> sensor data, I just do (Any..Last second of last day)(First second >>>>>>>of >>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>> >>>>>>> >>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>> >>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>>wrote: >>>>>>> >>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>> could miss content objects you want to discover unless you avoid >>>>>>> all range exclusions and only exclude explicit versions. >>>>>>> >>>>>>> >>>>>>> Could you explain why missing content object situation happens? >>>>>>>also >>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>> excludes; >>>>>>> converting from explicit excludes to ranged exclude is always >>>>>>> possible. >>>>>>> >>>>>>> >>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>> with a particular cache, then you need to always use individual >>>>>>> excludes not range excludes if you want to discover all the >>>>>>>versions >>>>>>> of an object. 
For something like a sensor reading that is updated, >>>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>>> exclusions (plus encoding overhead) per day. >>>>>>> >>>>>>> yes, maybe using a more deterministic version number than a >>>>>>> timestamp makes sense here, but it's just an example of needing a >>>>>>>lot >>>>>>> of exclusions. >>>>>>> >>>>>>> >>>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>>> cache B >>>>>>> >>>>>>> >>>>>>> I feel this case is invalid because cache A will also get the >>>>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>>>> if >>>>>>> this goes to cache B only, it means that cache A dies. How do you >>>>>>> know >>>>>>> that v101 even exists? >>>>>>> >>>>>>> >>>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>> forwarder is just doing "best path" and can round-robin between >>>>>>>cache >>>>>>> A and cache B, then your application could miss v101. >>>>>>> >>>>>>> >>>>>>> >>>>>>> c,d In general I agree that LPM performance is related to the >>>>>>>number >>>>>>> of components. In my own thread-safe LPM implementation, I used >>>>>>>only >>>>>>> one RWMutex for the whole tree. I don't know whether adding a lock >>>>>>>for >>>>>>> every node will be faster or not because of lock overhead. >>>>>>> >>>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>>> (ndn >>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>>> >>>>>>> >>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>>>> ready to claim it's better yet because we have not done that. 
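As a sanity check on the numbers quoted above: one reading per second gives 86,400 readings per day, and excluding each one with an 8-byte timestamp (the size assumed in the post) costs 691,200 bytes before encoding overhead:

```python
# Back-of-the-envelope size of per-version explicit exclusions, as in
# the discussion above: one excluded timestamp per sensor reading.
readings_per_day = 24 * 60 * 60   # one reading per second -> 86,400/day
timestamp_bytes = 8               # assumed size of one excluded version

exclusion_bytes = readings_per_day * timestamp_bytes
print(readings_per_day, exclusion_bytes)  # 86400 691200
```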
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>> I would point out that using LPM on content object to Interest >>>>>>> matching to do discovery has its own set of problems. Discovery >>>>>>> involves more than just "latest version" discovery too. >>>>>>> >>>>>>> This is probably getting off-topic from the original post about >>>>>>> naming conventions. >>>>>>> >>>>>>> a. If Interests can be forwarded multiple directions and two >>>>>>> different caches are responding, the exclusion set you build up >>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>> content objects you want to discover unless you avoid all range >>>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>> explicit discovery protocol that allows conversations about >>>>>>> consistent sets is better. >>>>>>> >>>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>>> should be transitive between caches, but imagine this. You send >>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>> through 100 then issue a new interest. This goes to cache B who >>>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>>> you think you have it! But, cache A already has version 101, you >>>>>>> just don't know. If you cannot have a conversation around >>>>>>> consistent sets, it seems like even doing latest version discovery >>>>>>> is difficult with selector based discovery. From what I saw in >>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>> authoritative source because you can never believe an intermediate >>>>>>> cache that there's not something more recent. >>>>>>> >>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be >>>>>>> interested in seeing your analysis. 
Case (a) is that a node can >>>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>>> a node can correctly discover the latest version. We have not >>>>>>> formally compared (or yet published) our discovery protocols (we >>>>>>> have three, 2 for content, 1 for device) against selector based >>>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>>> have the non-determinism sketched above. >>>>>>> >>>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>>> must do in the PIT to match a content object. If you have a name >>>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>>> the content object name and evaluate the selector predicate. >>>>>>> Content Based Networking (CBN) had some methods to create data >>>>>>> structures based on predicates, maybe those would be better. But >>>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>> implementation only requires at most 3 lookups (one by name, one by >>>>>>> name + keyid, one by name + content object hash), and one can do >>>>>>> other things to optimize lookup for an extra write. >>>>>>> >>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>> walking parent pointers, I suspect you'll need locking of the >>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP) >>>>>>> and that will be expensive. It would be interesting to see what a >>>>>>> cache consistent multi-threaded name tree looks like. 
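A toy illustration of the exact-match PIT point above: matching a content object needs at most three probes (by name; by name + keyid; by name + content object hash). The dictionary layout and function names here are hypothetical sketches, not CCNx's actual data structures:

```python
# Toy PIT keyed by exact match. Matching a content object takes at most
# three dictionary probes, never a walk over every name prefix.
# Structure and field names are illustrative assumptions only.
pit = {}

def add_interest(name, keyid=None, obj_hash=None, face="face"):
    # An Interest pends under the exact (name, keyid, hash) restriction it carries.
    pit.setdefault((name, keyid, obj_hash), []).append(face)

def match_content(name, keyid, obj_hash):
    # Probe 1: name only; probe 2: name + keyid; probe 3: name + hash.
    for key in ((name, None, None), (name, keyid, None), (name, None, obj_hash)):
        if key in pit:
            return pit.pop(key)   # consume the PIT entry, return waiting faces
    return None

add_interest("/sensor/temp", keyid="k1")
assert match_content("/sensor/temp", "k1", "h1") == ["face"]
```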
>>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> >>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> I had thought about these questions, but I want to know your idea >>>>>>> besides typed component: >>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>> things? >>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>> other >>>>>>> faster technique to replace selector? >>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>> byte, but 2 bytes for length might not be enough for future. >>>>>>> >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>> wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>> envision we need to do, and with a few simple conventions on >>>>>>> how the registry of types is managed. >>>>>>> >>>>>>> >>>>>>> Could you share it with us? >>>>>>> >>>>>>> Sure. Here's a strawman. >>>>>>> >>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>> >>>>>>> The type space is currently shared with the types used for the >>>>>>> entire protocol, that gives us two options: >>>>>>> (1) we reserve a range for name component types. Given the >>>>>>> likelihood there will be at least as much and probably more need >>>>>>> for component types than protocol extensions, we could reserve 1/2 >>>>>>> of the type space, giving us 32K types for name components. >>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>> and other fields of the protocol (since they are sub-types of the >>>>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>>>> name component types. >>>>>>> >>>>>>> We divide the type space into regions, and manage it with a >>>>>>> registry. 
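The strawman partition above can be sketched as a small classifier. The range sizes (one generic type, roughly 1024 well-known, 1024 reserved) follow the numbers in the post, but the concrete boundary values and names are illustrative assumptions:

```python
# Strawman partition of a 16-bit name-component type space, following
# the ranges sketched above. Boundary values are illustrative assumptions.
def classify_component_type(t):
    assert 0 <= t <= 0xFFFF            # 16-bit space: 65,536 values
    if t == 0:
        return "generic"               # the single default "generic name" type
    if t <= 1024:
        return "well-known"            # base/extension specs (chunk#, version#, ...)
    if t <= 2048:
        return "reserved"              # held back for unanticipated uses
    return "application"               # the rest: application-assigned

assert classify_component_type(0) == "generic"
assert classify_component_type(17) == "well-known"
assert classify_component_type(3000) == "application"
```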
If we ever get to the point of creating an IETF >>>>>>> standard, IANA has 25 years of experience running registries and >>>>>>> there are well-understood rule sets for different kinds of >>>>>>> registries (open, requires a written spec, requires standards >>>>>>> approval). >>>>>>> >>>>>>> - We allocate one "default" name component type for "generic >>>>>>> name", which would be used on name prefixes and other common >>>>>>> cases where there are no special semantics on the name component. >>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>> globally understood types that are part of the base or extension >>>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>> (say another 1024 types) >>>>>>> - We give the rest of the space to application assignment. >>>>>>> >>>>>>> Make sense? >>>>>>> >>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>> performance flaws in the design >>>>>>> >>>>>>> >>>>>>> we could design for performance, >>>>>>> >>>>>>> That's not what people are advocating. We are advocating that we >>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>> Moore's Law will come to the rescue. >>>>>>> >>>>>>> but I think there will be a turning >>>>>>> point when the slower design starts to become "fast enough". >>>>>>> >>>>>>> Perhaps, perhaps not. Relative performance is what matters so >>>>>>> things that don't get faster while others do tend to get dropped >>>>>>> or not used because they impose a performance penalty relative to >>>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>>> where improvements in technology get applied to lowering cost >>>>>>> rather than improving performance. For those environments bad >>>>>>> performance just never gets better. 
>>>>>>> >>>>>>> Do you >>>>>>> think there will be some design of ndn that will *never* have >>>>>>> performance improvement? >>>>>>> >>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>> functions). >>>>>>> I suspect exclusions will always be slow because they will >>>>>>> require extra memory references. >>>>>>> >>>>>>> However I of course don't claim clairvoyance so this is just >>>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>>> orders of magnitude and still having to worry about counting >>>>>>> cycles and memory references... >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>> wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>> perform >>>>>>> well on it. It should be the other way around: once ndn app >>>>>>> becomes >>>>>>> popular, a better chip will be designed for ndn. >>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>> performance flaws in the design: >>>>>>> a) clock rates are not getting (much) faster >>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>> c) data structures that require locks to manipulate >>>>>>> successfully will be relatively more expensive, even with >>>>>>> near-zero lock contention. >>>>>>> >>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>> its design. We just forgot those because the design elements >>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>> poster children for this are: >>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>>> anywhere >>>>>>> 2. 
the UDP checksum, which was a bad design when it was >>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>> in working around. >>>>>>> >>>>>>> I'm afraid students today are being taught that the designers >>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>> engineers that got most of it right. >>>>>>> >>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>> Now I >>>>>>> see that there are 3 approaches: >>>>>>> 1. we should not define a naming convention at all >>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>> types >>>>>>> 3. marked component: introduce only one more type and add >>>>>>> additional >>>>>>> marker space >>>>>>> >>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>> envision we need to do, and with a few simple conventions on >>>>>>> how the registry of types is managed. >>>>>>> >>>>>>> It is just as powerful in practice as either throwing up our >>>>>>> hands and letting applications design their own mutually >>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>> markers in a way that is fast to generate/parse and also >>>>>>> resilient against aliasing. >>>>>>> >>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>> convention >>>>>>> needs to be revised. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>> wrote: >>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>> to fit in (the >>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>> current NDN >>>>>>> experiments? >>>>>>> >>>>>>> I guess wide deployment could make for even longer names. >>>>>>> Related: Many URLs >>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>> text lines, and >>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>> I see. 
>>>>>>> >>>>>>> >>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>> >>>>>>> In fact, the index in separate TLV will be slower on some >>>>>>> architectures, >>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>>>> bytes in memory, >>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>> 32-byte blocks >>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>> If you need to >>>>>>> switch between arrays, it would be very expensive. If you >>>>>>> have to read past >>>>>>> the name to get to the 2nd array, then read it, then backup >>>>>>> to get to the >>>>>>> name, it will be pretty expensive too. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>> wrote: >>>>>>> >>>>>>> Does this make that much difference? >>>>>>> >>>>>>> If you want to parse the first 5 components. One way to do >>>>>>> it is: >>>>>>> >>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>> from the start >>>>>>> offset of the beginning of the name. >>>>>>> OR >>>>>>> Start reading name, (find size + move) 5 times. >>>>>>> >>>>>>> How much speed are you getting from one to the other? You >>>>>>> seem to imply >>>>>>> that the first one is faster. I don't think this is the >>>>>>> case. >>>>>>> >>>>>>> In the first one you'll probably have to get the cache line >>>>>>> for the index, >>>>>>> then all the required cache lines for the first 5 >>>>>>> components. For the >>>>>>> second, you'll have to get all the cache lines for the first >>>>>>> 5 components. >>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>> than >>>>>>> evaluating a number and computing an addition, you might >>>>>>> find that the >>>>>>> performance of the index is actually slower than the >>>>>>> performance of the >>>>>>> direct access. 
>>>>>>> >>>>>>> Granted, there is a case where you don't access the name at >>>>>>> all, for >>>>>>> example, if you just get the offsets and then send the >>>>>>> offsets as >>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>> you may see a >>>>>>> gain IF there are more cache line misses in reading the name >>>>>>> than in >>>>>>> reading the index. So, if the regular part of the name >>>>>>> that you're >>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>> name is to be >>>>>>> processed by a different processor, then you might see some >>>>>>> performance >>>>>>> gain in using the index, but in all other circumstances I >>>>>>> bet this is not >>>>>>> the case. I may be wrong, haven't actually tested it. >>>>>>> >>>>>>> This is all to say, I don't think we should be designing the >>>>>>> protocol with >>>>>>> only one architecture in mind. (The architecture of sending >>>>>>> the name to a >>>>>>> different processor than the index). >>>>>>> >>>>>>> If you have numbers that show that the index is faster I >>>>>>> would like to see >>>>>>> under what conditions and architectural assumptions. >>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> (I may have misinterpreted your description so feel free to >>>>>>> correct me if >>>>>>> I'm wrong.) >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nacho (Ignacio) Solis >>>>>>> Protocol Architect >>>>>>> Principal Scientist >>>>>>> Palo Alto Research Center (PARC) >>>>>>> +1(650)812-4458 >>>>>>> Ignacio.Solis at parc.com >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> Indeed each components' offset must be encoded using a fixed >>>>>>> amount of >>>>>>> bytes: >>>>>>> >>>>>>> i.e., >>>>>>> Type = Offsets >>>>>>> Length = 10 Bytes >>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>> >>>>>>> You may also imagine having a "Offset_2byte" type if your >>>>>>> name is too >>>>>>> long. 
>>>>>>> >>>>>>> Max >>>>>>> >>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>> you only >>>>>>> want the first x components) you can directly have it using >>>>>>> the >>>>>>> offsets. With the Nested TLV structure you have to >>>>>>> iteratively parse >>>>>>> the first x-1 components. With the offset structure you can >>>>>>> directly >>>>>>> access the first x components. >>>>>>> >>>>>>> I don't get it. What you described only works if the >>>>>>> "offset" is >>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>> parse x-1 >>>>>>> offsets to get to the x offset. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>> wrote: >>>>>>> >>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>> >>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>> like the >>>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>>> understand what >>>>>>> you >>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>> entirely >>>>>>> different >>>>>>> scheme where the info that describes the name-components is >>>>>>> ... >>>>>>> someplace >>>>>>> other than _in_ the name-components. is that correct? when >>>>>>> you say >>>>>>> "field >>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>> TLV)? >>>>>>> >>>>>>> Correct. >>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>> name >>>>>>> hierarchy >>>>>>> with offsets in the name and other TLV(s) indicates the >>>>>>> offset to use >>>>>>> in >>>>>>> order to retrieve special components. >>>>>>> As for the field separator, it is something like "/". >>>>>>> Aliasing is >>>>>>> avoided as >>>>>>> you do not rely on field separators to parse the name; you >>>>>>> use the >>>>>>> "offset >>>>>>> TLV " to do that. 
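The trade-off discussed above, iterating "find size + move" per component versus jumping through a fixed-width offset table, can be sketched as follows. The one-byte length encoding is a deliberate simplification for illustration, not the actual NDN or CCNx TLV wire format:

```python
# Simplified name encoding for illustration: each component is a 1-byte
# length followed by that many value bytes (not the real NDN/CCNx TLV).
def encode_name(components):
    return b"".join(bytes([len(c)]) + c for c in components)

def nth_component_sequential(buf, n):
    # "find size + move" n times: touches every preceding length byte.
    pos = 0
    for _ in range(n):
        pos += 1 + buf[pos]
    return buf[pos + 1 : pos + 1 + buf[pos]]

def build_offset_table(buf, count):
    # Side table of component start offsets (the "offset TLV" idea).
    offsets, pos = [], 0
    for _ in range(count):
        offsets.append(pos)
        pos += 1 + buf[pos]
    return offsets

def nth_component_direct(buf, offsets, n):
    # With precomputed fixed-width offsets, access to component n is one jump.
    pos = offsets[n]
    return buf[pos + 1 : pos + 1 + buf[pos]]

name = encode_name([b"ndn", b"sensor", b"temp", b"v1", b"s0"])
offsets = build_offset_table(name, 5)
assert nth_component_sequential(name, 2) == b"temp"
assert nth_component_direct(name, offsets, 2) == b"temp"
```

Note the offset table only pays off if it is encoded with fixed-width entries; with variable-length numbers you are back to sequential parsing of the table itself, which is exactly the objection raised in the thread.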
>>>>>>> >>>>>>> So now, it may be an aesthetic question but: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>> you only >>>>>>> want >>>>>>> the first x components) you can directly have it using the >>>>>>> offsets. >>>>>>> With the >>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>> x-1 >>>>>>> components. >>>>>>> With the offset structure you can directly access the >>>>>>> first x >>>>>>> components. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> -- Mark >>>>>>> >>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> The why is simple: >>>>>>> >>>>>>> You use a lot of "generic component type" and very few >>>>>>> "specific >>>>>>> component type". You are imposing types for every component >>>>>>> in order >>>>>>> to >>>>>>> handle few exceptions (segmentation, etc.). You create a >>>>>>> rule >>>>>>> (specify >>>>>>> the component's type) to handle exceptions! >>>>>>> >>>>>>> I would prefer not to have typed components. Instead I would >>>>>>> prefer >>>>>>> to >>>>>>> have the name as a simple sequence of bytes with a field >>>>>>> separator. Then, >>>>>>> outside the name, if you have some components that could be >>>>>>> used at >>>>>>> network layer (e.g. a TLV field), you simply need something >>>>>>> that >>>>>>> indicates which is the offset allowing you to retrieve the >>>>>>> version, >>>>>>> segment, etc in the name... >>>>>>> >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>> >>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> I think we agree on the small number of "component types". >>>>>>> However, if you have a small number of types, you will end >>>>>>> up with >>>>>>> names >>>>>>> containing many generic component types and few specific >>>>>>> component >>>>>>> types. 
Due to the fact that the component type specification >>>>>>> is an >>>>>>> exception in the name, I would prefer something that specifies the >>>>>>> component's >>>>>>> type only when needed (something like UTF8 conventions but >>>>>>> that >>>>>>> applications MUST use). >>>>>>> >>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>> explanation >>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>> e.g.) >>>>>>> and >>>>>>> there's been email trying to explain that applications don't >>>>>>> have to >>>>>>> use types if they don't need to. your email sounds like "I >>>>>>> prefer >>>>>>> the >>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>> preference in >>>>>>> the face of the points about the problems. can you say why >>>>>>> it is >>>>>>> that >>>>>>> you express a preference for the "convention" with problems? >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> . >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> 
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ndn-interest mailing list >>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>>_______________________________________________ >>>Ndn-interest mailing list >>>Ndn-interest at lists.cs.ucla.edu >>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Fri Sep 26 
01:34:30 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Fri, 26 Sep 2014 08:34:30 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: I'm not sure what you mean here. The FIB is a regular FIB, it does longest prefix match. The PIT needs an entry for every interest that can't be aggregated. This is true for selectors as well (even though you might optimize this by having only 1 name entry and a bunch of selector entries branched off of this one). Nacho -- Nacho (Ignacio) Solis Protocol Architect Principal Scientist Palo Alto Research Center (PARC) +1(650)812-4458 Ignacio.Solis at parc.com On 9/26/14, 9:42 AM, "Tai-Lin Chu" wrote: >How will fib work under exact matching? that implies #routable >interest name = #fib name. >After you mention selectors in name and key hash, I see a huge number >of fib entries to be created. > >I am worried about those dummy nodes. > >On Thu, Sep 25, 2014 at 4:17 PM, wrote: >> On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: >> >>>In NDN, if a router wants to check the signature, then it can check. If >>>it wants to skip the checking, that's fine too. If the design doesn't >>>allow the router to verify the signature, then that's a problem. In the >>>above description, the cache signs a data packet with a name owned by >>>someone else, it seems problematic for a design to advocate this. >> >> Routers that support the Selector Protocol could check signatures. >> >>> >>>One difference is that here the returned data can satisfy only that >>>interest. In the original selector design, the returned data can >>>satisfy >>>other Interests with the same name but different selectors (e.g., > 50). >> >> Interests with the same selector would be exact matched throughout the >> network at any node. >> >> Interests with different selectors would not match on all routers, but >> they would match just fine on routers that supported the Selector >>Protocol. 
>> >> Basically, everything you want works on routers that support the >>Selector >> Protocol, but routers that don't want to support it, don't have to. >> >> >> Nacho >> >> >> >>>> >>>> >>>> >>>> -- >>>> Nacho (Ignacio) Solis >>>> Protocol Architect >>>> Principal Scientist >>>> Palo Alto Research Center (PARC) >>>> +1(650)812-4458 >>>> Ignacio.Solis at parc.com >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>>_______________________________________________ >>>Ndn-interest mailing list >>>Ndn-interest at lists.cs.ucla.edu >>>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Marc.Mosko at parc.com Fri Sep 26 01:57:15 2014 From: Marc.Mosko at parc.com (Marc.Mosko at parc.com) Date: Fri, 26 Sep 2014 08:57:15 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: <8BEDC57D-24A4-41C6-92C9-1482FF1DA78F@parc.com> > Node E will eventually get the request from node B, because D could not satisfy the request for a fresh response. Node E forwards the interest to D, who caches it, then forwards it to C. Node C has already satisfied the two interest, so it drops the fresh response. The last paragraph was not as clear as it should have been. Should say: Node E will eventually get the Interest from node B, because D could not satisfy the request for a fresh content object. Node E forwards a fresh content object to D, who caches it, then forwards it to C. Node C has already satisfied the two interests, so it drops the fresh response. 
Marc On Sep 26, 2014, at 10:26 AM, wrote: > The ccnx 1.0 cache control directives for content objects are specified here, by the way. This applies to content objects. > > http://www.ccnx.org/pubs/ccnx-mosko-caching-01.txt > > For an Interest, we played with the idea of adding a "MaximumAge" restriction so only a response with at least that many milliseconds of remaining lifetime would satisfy the interest, but that didn't seem like a great idea, so it's not in the spec anywhere. > > Saying "MustBeFresh" in an Interest, as it's written up in NDN and in the old CCNx 0.x specs, does not work either. Here's a counter example: > > A -- | > B -- C -- D -- E > > Node A sends an Interest for "stale ok". Node C forwards it to D. E is the actual producer who could send a fresh response. > > Node B sends a "MustBeFresh" Interest. Node C, seeing that it's different than the previous interest, forwards it to node D too. > > Node D receives the 1st interest from A and returns a stale Content Object. > > Node C receives a ContentObject that says "Freshness=5". So, it must be fresh, yes? Node C will send the stale response to both A and B. > > Node E will eventually get the request from node B, because D could not satisfy the request for a fresh response. Node E forwards the interest to D, who caches it, then forwards it to C. Node C has already satisfied the two interest, so it drops the fresh response. > > Marc > > On Sep 26, 2014, at 10:12 AM, wrote: > >> On Fri, 26 Sep 2014, Ignacio.Solis at parc.com wrote: >> >>> On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: >>> >>>> In NDN, if a router wants to check the signature, then it can check. If >>>> it wants to skip the checking, that's fine too. If the design doesn't >>>> allow the router to verify the signature, then that's a problem. In the >>>> above description, the cache signs a data packet with a name owned by >>>> someone else, it seems problematic for a design to advocate this. 
>>> >>> Routers that support the Selector Protocol could check signatures. >>> >>>> >>>> One difference is that here the returned data can satisfy only that >>>> interest. In the original selector design, the returned data can satisfy >>>> other Interests with the same name but different selectors (e.g., > 50). >>> >>> Interests with the same selector would be exact matched throughout the >>> network at any node. >>> >>> Interests with different selectors would not match on all routers, but >>> they would match just fine on routers that supported the Selector Protocol. >>> >>> Basically, everything you want works on routers that support the Selector >>> Protocol, but routers that don't want to support it, don't have to. >> >> A perhaps more careful statement would be: >> >> - interests carry a query, not the name of data >> (except if you go for the hash). >> >> - caching results based on the query (that you call a name) >> only works if the query is idempotent. >> >> I doubt that the selector expressiveness remains in idempotent land. >> >> One requirement thus would be that selector-carrying interests can disable cache responses from nodes that don't support selectors. >> >> While it is true that you can force this by adding a nonce to each query, it would be cleaner to have explicit signaling. Such a don't-cache flag towards selector-ignoring nodes would be different from the don't-cache flag in the query (that is directed to selector-aware nodes).
>> >> >> christian >> >>> >>> >>> Nacho >>> >>> >>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Nacho (Ignacio) Solis >>>>> Protocol Architect >>>>> Principal Scientist >>>>> Palo Alto Research Center (PARC) >>>>> +1(650)812-4458 >>>>> Ignacio.Solis at parc.com >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Ndn-interest mailing list >>>>> Ndn-interest at lists.cs.ucla.edu >>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>> >>>> >>>> _______________________________________________ >>>> Ndn-interest mailing list >>>> Ndn-interest at lists.cs.ucla.edu >>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > > _______________________________________________ > Ndn-interest mailing list > Ndn-interest at lists.cs.ucla.edu > http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2595 bytes Desc: not available URL: From lanwang at memphis.edu Fri Sep 26 08:01:28 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Fri, 26 Sep 2014 15:01:28 +0000 Subject: [Ndn-interest] any comments on naming convention? 
In-Reply-To: <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> References: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> Message-ID: <38317000-9E95-4EF7-AB9E-357CD519C2A3@memphis.edu> On Sep 25, 2014, at 4:09 PM, wrote: > > On Sep 25, 2014, at 10:01 PM, Lan Wang (lanwang) wrote: > >>>> - Benefit seems apparent in multi-consumer scenarios, even without sync. >>>> Let's say I have 5 personal devices requesting mail. In Scheme B, every >>>> publisher receives and processes 5 interests per second on average. In >>>> Scheme A, with an upstream caching node, each receives 1 per second >>>> maximum. The publisher still has to throttle requests, but with no help >>>> or scaling support from the network. >>> >>> This can be done without selectors. As long as all the clients produce a >>> request for the same name they can take advantage of caching. >> >> What Jeff responded to is that scheme B requires a freshness of 0 for the initial interest to get to the producer (in order to get the latest list of email names). If freshness is 0, then there's no caching of the data. No matter how the clients name their Interests, they can't take advantage of caching. >> > > How do selectors prevent you from sending an Interest to the producer, if it's connected? I send a first interest "exclude <= 100" and cache A responds with version 110. Don't you then turn around and send a second interest "exclude <= 110" to see if another cache has a more recent version? Won't that interest go to the producer, if it's connected? It will then need to send a NACK (or you need to timeout), if there's nothing more recent. > > Using selectors, you still never know if there's a more recent version until you get to the producer or you timeout. You always need to keep asking and asking. Also, there's nothing special about the content object from the producer, so you still don't necessarily believe that it's the most recent, and you'll then ask again.
Sure, an application could just accept the 1st or 2nd content object it gets back, but it never really knows. Sure, if the CreateTime (I think you call it Timestamp in NDN, if you still have it) is very recent and you assume synchronized clocks, then you might have some belief that it?s current. > > We could also talk about FreshnessSeconds and MustBeFresh, but that would be best to start its own thread on. First of all, I'm not saying selectors prevent you from sending an Interest to the producer. Jeff's example is when you have five devices all wanting to get your emails, then the caching of the Data packet that contains the list of emails in Scheme A helps reduce the load on the producer. No matter how many devices want to get the list and when they send their Interest, the load on the server is constant (at most one Interest for the email list per second in the example). But in Scheme B, in worst case, the producer can get 5 Interests per second. Second, with or without selectors, you need to keep asking since you never know when new emails will arrive and the list will change. With any design, you need to keep asking. The question is how often to ask. The user may be happy to get his new emails once every 10 minutes. You can ask for a new list every 10 minutes. If you get a list from somewhere (if it was cached, it must have been less than 1 second old if FreshnessSecond of 1 second was used; if not, it must have been generated by the server) or get a NACK from the server, you may stop asking. If the user insists getting new emails as soon as possible, then the email client can send an Interest, say, every minute. This serves as a pending Interest at the server, and the server will respond whenever there's a new list. This pending Interest needs to be refreshed whenever it times out (every minute in this example). This is similar to sync. 
Lan From lanwang at memphis.edu Fri Sep 26 13:25:26 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Fri, 26 Sep 2014 20:25:26 +0000 Subject: [Ndn-interest] Selector Protocol over exact matching In-Reply-To: References: Message-ID: <0D231132-16A3-4EE2-AF5F-76707F86F2FA@memphis.edu> On Sep 25, 2014, at 6:17 PM, wrote: > On 9/25/14, 10:35 PM, "Lan Wang (lanwang)" wrote: > >> In NDN, if a router wants to check the signature, then it can check. If >> it wants to skip the checking, that's fine too. If the design doesn't >> allow the router to verify the signature, then that's a problem. In the >> above description, the cache signs a data packet with a name owned by >> someone else, it seems problematic for a design to advocate this. > > Routers that support the Selector Protocol could check signatures. The question here is whether the cache should be allowed to generate a data packet with a name belonging to someone else. There is no clear trust model here and checking signature without a clear idea of what key is allowed to sign the data does not seem to be meaningful to me (or if any key is allowed to sign a piece of data, then there's no need for generating the signature and signature checking). > >> >> One difference is that here the returned data can satisfy only that >> interest. In the original selector design, the returned data can satisfy >> other Interests with the same name but different selectors (e.g., > 50). > > Interests with the same selector would be exact matched throughout the > network at any node. > > Interests with different selectors would not match on all routers, but > they would match just fine on routers that supported the Selector Protocol. > > Basically, everything you want works on routers that support the Selector > Protocol, but routers that don?t want to support it, don?t have to. > It seems that the behavior of the routers that support the hash-based selectors would be very complicated. 
Suppose it gets a real data packet (/name/100) from its upstream to satisfy the Interest (/name/100). In addition to forwarding the data, it needs to check all other Interests with the /name prefix but with some other component following /name, because they might be hash-based Interests. Some of them may not be hash-based Interests, just Interests like /name/99, but it's hard to tell which is which. So you have to check all of them. If any of them is such a hash-based selector Interest, you check the selector, and if the data satisfies the selector, you need to encapsulate the real data packet and sign it to send it downstream (in addition to forwarding the real data packet). Another problem is what if the name of the real data /name/xxx collides with the Interest with the hash-based selector (the hash is also xxx). Nevertheless, my main concern is the first point about the cache generating the data packet with others' name prefixes. Lan > > Nacho > > > >>> -- >>> Nacho (Ignacio) Solis >>> Protocol Architect >>> Principal Scientist >>> Palo Alto Research Center (PARC) >>> +1(650)812-4458 >>> Ignacio.Solis at parc.com >>> >>> _______________________________________________ >>> Ndn-interest mailing list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >> >> >> _______________________________________________ >> Ndn-interest mailing list >> Ndn-interest at lists.cs.ucla.edu >> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest > From lanwang at memphis.edu Fri Sep 26 13:50:37 2014 From: lanwang at memphis.edu (Lan Wang (lanwang)) Date: Fri, 26 Sep 2014 20:50:37 +0000 Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> Message-ID: <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu> On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote: > On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote: > >> How can a cache respond to /mail/inbox/selector_matching/<hash of >> payload> with a table of contents? This name prefix is owned by the mail >> server. Also the reply really depends on what is in the cache at the >> moment, so the same name would correspond to different data. > > A - Yes, the same name would correspond to different data. This is true > given that the data has changed. NDN (and CCN) has no architectural > requirement that a name maps to the same piece of data (Obviously not > talking about self-certifying hash-based names). There is a difference. A complete NDN name including the implicit digest uniquely identifies a piece of data. But here the same complete name may map to different data (I suppose you don't have an implicit digest in an effort to do exact matching). In other words, in your proposal, the same name /mail/inbox/selector_matching/hash1 may map to two or more different data packets. But in NDN, two Data packets may share a name prefix, but definitely not the implicit digest. And at least it is my understanding that the application design should make sure that the same producer doesn't produce different Data packets with the same name prefix before the implicit digest. It is possible in attack scenarios for different producers to generate Data packets with the same name prefix before the implicit digest, but still not the same implicit digest. Lan > > B - Yes, you can consider the name prefix is "owned" by the server, but > the answer is actually something that the cache is choosing. The cache is > choosing from the set of data that it has. The data that it encapsulates > _is_ signed by the producer. Anybody that can decapsulate the data can > verify that this is the case.
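Lan's point about implicit digests can be made concrete in a few lines (a sketch: the string names and the `sha256digest=` suffix are illustrative, not the NDN wire encoding):

```python
# The implicit digest is the SHA-256 over the entire encoded Data
# packet, appended as the final name component. Same explicit name,
# different packet bytes => different full names.
import hashlib

def full_name(name_prefix: str, encoded_packet: bytes) -> str:
    digest = hashlib.sha256(encoded_packet).hexdigest()
    return f"{name_prefix}/sha256digest={digest}"

a = full_name("/mail/inbox/selector_matching/hash1", b"toc-version-1")
b = full_name("/mail/inbox/selector_matching/hash1", b"toc-version-2")
assert a != b  # full names differ even though the explicit prefix is identical
```

This is why "the same complete name maps to different data" cannot happen in NDN once the implicit digest is counted as part of the name.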
> > Nacho > > >> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >> >>> My beating on "discover all" is exactly because of this. Let's define >>> the discovery service. If the service is just "discover latest" >>> (left/right), can we not simplify the current approach? If the service >>> includes more than "latest", then is the current approach the right >>> approach? >>> >>> Sync has its place and is the right solution for some things. However, >>> it should not be a bandage over discovery. Discovery should be its >>> own valid and useful service. >>> >>> I agree that the exclusion approach can work, and work relatively well, >>> for finding the rightmost/leftmost child. I believe this is because >>> that operation is transitive through caches. So, within whatever >>> timeout an application is willing to wait to find the "latest", it can >>> keep asking and asking. >>> >>> I do think it would be best to actually try to ask an authoritative >>> source first (i.e. a non-cached value), and if that fails then probe >>> caches, but experimentation may show what works well. This is based on >>> my belief that in the real world in broad use, the namespace will become >>> pretty polluted and probing will result in a lot of junk, but that's >>> future prognosticating. >>> >>> Also, in the exact match vs. continuation match of content object to >>> interest, it is pretty easy to encode that "selector" request in a name >>> component (i.e. "exclude_before=(t=version, l=2, v=279) & sort=right") >>> and any participating cache can respond with a link (or encapsulate) a >>> response in an exact match system. >>> >>> In the CCNx 1.0 spec, one could also encode this a different way. One >>> could use a name like "/mail/inbox/selector_matching/<hash of payload>" >>> and in the payload include "exclude_before=(t=version, l=2, v=279) & >>> sort=right". This means that any cache that could process the
"selector_matching" function could look at the interest payload and >>> evaluate the predicate there. The predicate could become large and not >>> pollute the PIT with all the computation state. Including "<hash of >>> payload>" in the name means that one could get a cached response if >>> someone else had asked the same exact question (subject to the content >>> object's cache lifetime) and it also serves to multiplex different >>> payloads for the same function (selector_matching). >>> >>> Marc >>> >>> >>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote: >>> >>>> >>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>> >>>> >>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html >>>> >>>> J. >>>> >>>> >>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>>> >>>>> However, I cannot see whether we can achieve "best-effort *all*-value" >>>>> efficiently. >>>>> There are still interesting topics on >>>>> 1. how do we express the discovery query? >>>>> 2. is the selector "discovery-complete"? i.e. can we express any >>>>> discovery query with the current selector? >>>>> 3. if so, can we re-express the current selector in a more efficient way? >>>>> >>>>> I personally see named data as a set, which can then be categorized >>>>> into "ordered set", and "unordered set". >>>>> Some questions that any discovery expression must solve: >>>>> 1. is this a nil set or not? nil set means that this name is the leaf >>>>> 2. set contains member X? >>>>> 3. is set ordered or not >>>>> 4. (ordered) first, prev, next, last >>>>> 5. if we enforce component ordering, answer question 4. >>>>> 6. recursively answer all questions above on any set member >>>>> >>>>> >>>>> >>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>> wrote: >>>>>> >>>>>> >>>>>> From: >>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>> To: Jeff Burke >>>>>> Cc: , , >>>>>> >>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
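The name-encoding trick Marc sketches — predicate in the Interest payload, hash of the payload in the name — can be illustrated like this (the name layout and the digest truncation are assumptions for readability, not the CCNx 1.0 encoding):

```python
# Put the selector predicate in the Interest payload and a hash of
# that payload in the name, so exact-match caches can still cache and
# multiplex responses per distinct query without parsing the predicate.
import hashlib

def selector_matching_name(prefix: str, predicate: bytes) -> str:
    h = hashlib.sha256(predicate).hexdigest()[:16]  # truncated for readability
    return f"{prefix}/selector_matching/{h}"

q1 = selector_matching_name("/mail/inbox", b"exclude_before=(t=version, l=2, v=279) & sort=right")
q2 = selector_matching_name("/mail/inbox", b"exclude_before=(t=version, l=2, v=279) & sort=right")
q3 = selector_matching_name("/mail/inbox", b"exclude_before=(t=version, l=2, v=300) & sort=right")
assert q1 == q2  # identical queries map to the same cacheable name
assert q1 != q3  # different predicates get distinct names
```

Two consumers asking "the same exact question" thus hit the same cache entry, while different predicates never collide in the exact-match table.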
>>>>>> >>>>>> I think Tai-Lin?s example was just fine to talk about discovery. >>>>>> /blah/blah/value, how do you discover all the ?value?s? Discovery >>>>>> shouldn?t >>>>>> care if its email messages or temperature readings or world cup >>>>>> photos. >>>>>> >>>>>> >>>>>> This is true if discovery means "finding everything" - in which case, >>>>>> as you >>>>>> point out, sync-style approaches may be best. But I am not sure that >>>>>> this >>>>>> definition is complete. The most pressing example that I can think >>>>>> of >>>>>> is >>>>>> best-effort latest-value, in which the consumer's goal is to get the >>>>>> latest >>>>>> copy the network can deliver at the moment, and may not care about >>>>>> previous >>>>>> values or (if freshness is used well) potential later versions. >>>>>> >>>>>> Another case that seems to work well is video seeking. Let's say I >>>>>> want to >>>>>> enable random access to a video by timecode. The publisher can >>>>>> provide a >>>>>> time-code based discovery namespace that's queried using an Interest >>>>>> that >>>>>> essentially says "give me the closest keyframe to 00:37:03:12", which >>>>>> returns an interest that, via the name, provides the exact timecode >>>>>> of >>>>>> the >>>>>> keyframe in question and a link to a segment-based namespace for >>>>>> efficient >>>>>> exact match playout. In two roundtrips and in a very lightweight >>>>>> way, >>>>>> the >>>>>> consumer has random access capability. If the NDN is the moral >>>>>> equivalent >>>>>> of IP, then I am not sure we should be afraid of roundtrips that >>>>>> provide >>>>>> this kind of functionality, just as they are used in TCP. >>>>>> >>>>>> >>>>>> I described one set of problems using the exclusion approach, and >>>>>> that >>>>>> an >>>>>> NDN paper on device discovery described a similar problem, though >>>>>> they >>>>>> did >>>>>> not go into the details of splitting interests, etc. That all was >>>>>> simple >>>>>> enough to see from the example. 
>>>>>> >>>>>> Another question is how does one do the discovery with exact match >>>>>> names, >>>>>> which is also conflating things. You could do a different discovery >>>>>> with >>>>>> continuation names too, just not the exclude method. >>>>>> >>>>>> As I alluded to, one needs a way to talk with a specific cache about >>>>>> its >>>>>> ?table of contents? for a prefix so one can get a consistent set of >>>>>> results >>>>>> without all the round-trips of exclusions. Actually downloading the >>>>>> ?headers? of the messages would be the same bytes, more or less. In >>>>>> a >>>>>> way, >>>>>> this is a little like name enumeration from a ccnx 0.x repo, but that >>>>>> protocol has its own set of problems and I?m not suggesting to use >>>>>> that >>>>>> directly. >>>>>> >>>>>> One approach is to encode a request in a name component and a >>>>>> participating >>>>>> cache can reply. It replies in such a way that one could continue >>>>>> talking >>>>>> with that cache to get its TOC. One would then issue another >>>>>> interest >>>>>> with >>>>>> a request for not-that-cache. >>>>>> >>>>>> >>>>>> I'm curious how the TOC approach works in a multi-publisher scenario? >>>>>> >>>>>> >>>>>> Another approach is to try to ask the authoritative source for the >>>>>> ?current? >>>>>> manifest name, i.e. /mail/inbox/current/, which could return >>>>>> the >>>>>> manifest or a link to the manifest. Then fetching the actual >>>>>> manifest >>>>>> from >>>>>> the link could come from caches because you how have a consistent >>>>>> set of >>>>>> names to ask for. If you cannot talk with an authoritative source, >>>>>> you >>>>>> could try again without the nonce and see if there?s a cached copy >>>>>> of a >>>>>> recent version around. 
>>>>>> >>>>>> Marc >>>>>> >>>>>> >>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>> >>>>>> wrote: >>>>>> >>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>>> >>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>>>> pattern with static (/mail/inbox) and variable (148) components; with >>>>>> proper naming convention, computers can also detect this pattern >>>>>> easily. Now I want to look for all mails in my inbox. I can generate >>>>>> a >>>>>> list of /mail/inbox/. These are my guesses, and with >>>>>> selectors >>>>>> I can further refine my guesses. >>>>>> >>>>>> >>>>>> I think this is a very bad example (or at least a very bad >>>>>> application >>>>>> design). You have an app (a mail server / inbox) and you want it to >>>>>> list >>>>>> your emails? An email list is an application data structure. I >>>>>> don?t >>>>>> think you should use the network structure to reflect this. >>>>>> >>>>>> >>>>>> I think Tai-Lin is trying to sketch a small example, not propose a >>>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>>> >>>>>> >>>>>> Another way to look at it is that if the network architecture is >>>>>> providing >>>>>> the equivalent of distributed storage to the application, perhaps the >>>>>> application data structure could be adapted to match the affordances >>>>>> of >>>>>> the network. Then it would not be so bad that the two structures >>>>>> were >>>>>> aligned. >>>>>> >>>>>> >>>>>> I?ll give you an example, how do you delete emails from your inbox? >>>>>> If >>>>>> an >>>>>> email was cached in the network it can never be deleted from your >>>>>> inbox? >>>>>> >>>>>> >>>>>> This is conflating two issues - what you are pointing out is that the >>>>>> data >>>>>> structure of a linear list doesn't handle common email management >>>>>> operations well. 
Again, I'm not sure if that's what he was getting >>>>>> at >>>>>> here. But deletion is not the issue - the availability of a data >>>>>> object >>>>>> on the network does not necessarily mean it's valid from the >>>>>> perspective >>>>>> of the application. >>>>>> >>>>>> Or moved to another mailbox? Do you rely on the emails expiring? >>>>>> >>>>>> This problem is true for most (any?) situations where you use network >>>>>> name >>>>>> structure to directly reflect the application data structure. >>>>>> >>>>>> >>>>>> Not sure I understand how you make the leap from the example to the >>>>>> general statement. >>>>>> >>>>>> Jeff >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Nacho >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>> >>>>>> Ok, yes I think those would all be good things. >>>>>> >>>>>> One thing to keep in mind, especially with things like time series >>>>>> sensor >>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>> That?s >>>>>> easy >>>>>> for a human :) But in Discovery, one should assume that one does not >>>>>> know >>>>>> of patterns in the data beyond what the protocols used to publish the >>>>>> data >>>>>> explicitly require. That said, I think some of the things you listed >>>>>> are >>>>>> good places to start: sensor data, web content, climate data or >>>>>> genome >>>>>> data. >>>>>> >>>>>> We also need to state what the forwarding strategies are and what the >>>>>> cache >>>>>> behavior is. >>>>>> >>>>>> I outlined some of the points that I think are important in that >>>>>> other >>>>>> posting. While ?discover latest? is useful, ?discover all? is also >>>>>> important, and that one gets complicated fast. So points like >>>>>> separating >>>>>> discovery from retrieval and working with large data sets have been >>>>>> important in shaping our thinking. 
That all said, I?d be happy >>>>>> starting >>>>>> from 0 and working through the Discovery service definition from >>>>>> scratch >>>>>> along with data set use cases. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>> wrote: >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> Thanks ? yes, I saw that as well. I was just trying to get one step >>>>>> more >>>>>> specific, which was to see if we could identify a few specific use >>>>>> cases >>>>>> around which to have the conversation. (e.g., time series sensor >>>>>> data >>>>>> and >>>>>> web content retrieval for "get latest"; climate data for huge data >>>>>> sets; >>>>>> local data in a vehicular network; etc.) What have you been looking >>>>>> at >>>>>> that's driving considerations of discovery? >>>>>> >>>>>> Thanks, >>>>>> Jeff >>>>>> >>>>>> From: >>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>> To: Jeff Burke >>>>>> Cc: , >>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>> >>>>>> Jeff, >>>>>> >>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>> Discovery. >>>>>> >>>>>> >>>>>> >>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000 >>>>>> 20 >>>>>> 0 >>>>>> .html >>>>>> >>>>>> I think it would be very productive to talk about what Discovery >>>>>> should >>>>>> do, >>>>>> and not focus on the how. It is sometimes easy to get caught up in >>>>>> the >>>>>> how, >>>>>> which I think is a less important topic than the what at this stage. >>>>>> >>>>>> Marc >>>>>> >>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>> wrote: >>>>>> >>>>>> Marc, >>>>>> >>>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>>> based >>>>>> on use cases. What are the use cases you are using to evaluate >>>>>> discovery? 
>>>>>> >>>>>> Jeff >>>>>> >>>>>> >>>>>> >>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>> wrote: >>>>>> >>>>>> No matter what the expressiveness of the predicates if the forwarder >>>>>> can >>>>>> send interests different ways you don't have a consistent underlying >>>>>> set >>>>>> to talk about so you would always need non-range exclusions to >>>>>> discover >>>>>> every version. >>>>>> >>>>>> Range exclusions only work I believe if you get an authoritative >>>>>> answer. >>>>>> If different content pieces are scattered between different caches I >>>>>> don't see how range exclusions would work to discover every version. >>>>>> >>>>>> I'm sorry to be pointing out problems without offering solutions but >>>>>> we're not ready to publish our discovery protocols. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>>> >>>>>> I see. Can you briefly describe how ccnx discovery protocol solves >>>>>> the >>>>>> all problems that you mentioned (not just exclude)? a doc will be >>>>>> better. >>>>>> >>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I will >>>>>> soon >>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>> Regular >>>>>> language or context free language might become part of selector too. >>>>>> >>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>> That will get you one reading then you need to exclude it and ask >>>>>> again. >>>>>> >>>>>> Sent from my telephone >>>>>> >>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. >>>>>> >>>>>> >>>>>> I am very confused. For your example, if I want to get all today's >>>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>>> tomorrow..Any). 
That's 18 bytes. >>>>>> >>>>>> >>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>> >>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>> >>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>>> >>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>> could miss content objects you want to discover unless you avoid >>>>>> all range exclusions and only exclude explicit versions. >>>>>> >>>>>> >>>>>> Could you explain why the missing-content-object situation happens? Also, >>>>>> range exclusion is just a shorter notation for many explicit >>>>>> excludes; >>>>>> converting from explicit excludes to ranged excludes is always >>>>>> possible. >>>>>> >>>>>> >>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>> with a particular cache, then you need to always use individual >>>>>> excludes not range excludes if you want to discover all the versions >>>>>> of an object. For something like a sensor reading that is updated, >>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>> exclusions (plus encoding overhead) per day. >>>>>> >>>>>> Yes, maybe using a more deterministic version number than a >>>>>> timestamp makes sense here, but it's just an example of needing a lot >>>>>> of exclusions. >>>>>> >>>>>> >>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>> cache B >>>>>> >>>>>> >>>>>> I feel this case is invalid because cache A will also get the >>>>>> interest, and cache A will return v101 if it exists. Like you said, >>>>>> if >>>>>> this goes to cache B only, it means that cache A dies. How do you >>>>>> know >>>>>> that v101 even exists? >>>>>> >>>>>> >>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>> modulo packet loss, you would discover v101 on cache A.
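Marc's per-day exclusion estimate checks out; a two-line sanity check (Python, purely illustrative — one 8-byte timestamp exclusion per reading, TLV encoding overhead ignored):

```python
# Back-of-the-envelope size of an exclusion list that names every
# version of a once-per-second sensor reading individually.
readings_per_day = 24 * 60 * 60      # one sensor reading per second
bytes_per_exclusion = 8              # e.g., an 8-byte timestamp component
exclusion_bytes_per_day = readings_per_day * bytes_per_exclusion
assert readings_per_day == 86_400
assert exclusion_bytes_per_day == 691_200
```

Compare that to the 18 bytes Tai-Lin quotes for a single pair of range exclusions: the cost of giving up ranges is roughly four orders of magnitude.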
If the >>>>>> forwarder is just doing ?best path? and can round-robin between cache >>>>>> A and cache B, then your application could miss v101. >>>>>> >>>>>> >>>>>> >>>>>> c,d In general I agree that LPM performance is related to the number >>>>>> of components. In my own thread-safe LMP implementation, I used only >>>>>> one RWMutex for the whole tree. I don't know whether adding lock for >>>>>> every node will be faster or not because of lock overhead. >>>>>> >>>>>> However, we should compare (exact match + discovery protocol) vs >>>>>> (ndn >>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>> >>>>>> >>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>> specs for doing the exact match discovery. So, as I said, I?m not >>>>>> ready to claim its better yet because we have not done that. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>> I would point out that using LPM on content object to Interest >>>>>> matching to do discovery has its own set of problems. Discovery >>>>>> involves more than just ?latest version? discovery too. >>>>>> >>>>>> This is probably getting off-topic from the original post about >>>>>> naming conventions. >>>>>> >>>>>> a. If Interests can be forwarded multiple directions and two >>>>>> different caches are responding, the exclusion set you build up >>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>> content objects you want to discovery unless you avoid all range >>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>> explicit discovery protocol that allows conversations about >>>>>> consistent sets is better. >>>>>> >>>>>> b. Yes, if you just want the ?latest version? discovery that >>>>>> should be transitive between caches, but imagine this. 
You send >>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>> through 100 then issue a new interest. This goes to cache B who >>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>> you think you have it! But, cache A already has version 101, you >>>>>> just don't know. If you cannot have a conversation around >>>>>> consistent sets, it seems like even doing latest version discovery >>>>>> is difficult with selector-based discovery. From what I saw in >>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>> authoritative source because you can never believe an intermediate >>>>>> cache that there's not something more recent. >>>>>> >>>>>> I'm sure you've walked through cases (a) and (b) in NDN, I'd be >>>>>> interested in seeing your analysis. Case (a) is that a node can >>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>> a node can correctly discover the latest version. We have not >>>>>> formally compared (or yet published) our discovery protocols (we >>>>>> have three, 2 for content, 1 for device) compared to selector-based >>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>> have the non-determinism sketched above. >>>>>> >>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>> must do in the PIT to match a content object. If you have a name >>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>> the content object name and evaluate the selector predicate. >>>>>> Content Based Networking (CBN) had some methods to create data >>>>>> structures based on predicates, maybe those would be better. But >>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>> if there is Interest traffic for many prefixes of a root.
Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA accesses for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.
>>>>>>
>>>>>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP), and that will be expensive. It would be interesting to see what a cache-consistent multi-threaded name tree looks like.
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> I had thought about these questions, but I want to know your ideas besides typed components:
>>>>>> 1. LPM allows "data discovery". How will exact match do similar things?
>>>>>> 2. Will removing selectors improve performance? How do we use other, faster techniques to replace selectors?
>>>>>> 3. Fixed byte length and type. I agree more that type can be a fixed byte, but 2 bytes for length might not be enough for the future.
>>>>>>
>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>
>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>
>>>>>> Could you share it with us?
>>>>>>
>>>>>> Sure. Here's a strawman.
>>>>>>
>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>
>>>>>> The type space is currently shared with the types used for the entire protocol, which gives us two options:
>>>>>> (1) we reserve a range for name component types.
Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components.
>>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type), we could reuse numbers and thereby have the entire 65K for name component types.
>>>>>>
>>>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries, and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).
>>>>>>
>>>>>> - We allocate one "default" name component type for "generic name", which would be used for name prefixes and other common cases where there are no special semantics on the name component.
>>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.).
>>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types).
>>>>>> - We give the rest of the space to application assignment.
>>>>>>
>>>>>> Make sense?
>>>>>>
>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design
>>>>>>
>>>>>> we could design for performance,
>>>>>>
>>>>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.
>>>>>>
>>>>>> but I think there will be a turning point when the slower design starts to become "fast enough".
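The strawman partition above can be written down as a sketch. The sizes (one generic type, 1024 standardized, 1024 reserved, remainder for applications) come from the proposal; the exact boundary values below are illustrative assumptions:

```python
# Sketch of the strawman partition of a 16-bit name component type
# space. Only the region sizes come from the proposal; the concrete
# boundaries chosen here are illustrative, not a published registry.

GENERIC = 0x0000                        # the one "default" generic-name type
STD_RANGE = range(0x0001, 0x0401)       # 1024 globally understood types
RESERVED_RANGE = range(0x0401, 0x0801)  # 1024 held for unanticipated uses
APP_RANGE = range(0x0801, 0x10000)      # remainder for application assignment

def classify(t):
    """Map a 16-bit type code to its registry region."""
    if t == GENERIC:
        return "generic"
    if t in STD_RANGE:
        return "standard"
    if t in RESERVED_RANGE:
        return "reserved"
    if t in APP_RANGE:
        return "application"
    raise ValueError("not a 16-bit type")

print(classify(0x0002), classify(0x9000))   # standard application
```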
>>>>>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used, because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon, where improvements in technology get applied to lowering cost rather than improving performance. For those environments, bad performance just never gets better.
>>>>>>
>>>>>> Do you think there will be some design of ndn that will *never* have performance improvement?
>>>>>>
>>>>>> I suspect LPM on data will always be slow (relative to the other functions).
>>>>>> I suspect exclusions will always be slow, because they will require extra memory references.
>>>>>>
>>>>>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references...
>>>>>>
>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>>>
>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>>>
>>>>>> We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once ndn apps become popular, a better chip will be designed for ndn.
>>>>>>
>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design:
>>>>>> a) clock rates are not getting (much) faster
>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
>>>>>>
>>>>>> The fact is, IP *did* have some serious performance flaws in its design.
We just forgot those, because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere.
>>>>>> 2. The UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain to work around.
>>>>>>
>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right.
>>>>>>
>>>>>> I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches:
>>>>>> 1. we should not define a naming convention at all
>>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>>> 3. marked component: introduce only one more type and add additional marker space
>>>>>>
>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>
>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes, or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.
>>>>>>
>>>>>> Also, everybody thinks that the current utf8 marker naming convention needs to be revised.
>>>>>>
>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments?
>>>>>>
>>>>>> I guess wide deployment could make for even longer names.
>>>>>> Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see.
>>>>>>
>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>
>>>>>> In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory; then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too.
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>>>
>>>>>> Does this make that much difference?
>>>>>>
>>>>>> If you want to parse the first 5 components, one way to do it is:
>>>>>>
>>>>>> Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name.
>>>>>> OR
>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>
>>>>>> How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case.
>>>>>>
>>>>>> In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given the assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access.
>>>>>> Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong; I haven't actually tested it.
>>>>>>
>>>>>> This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index.)
>>>>>>
>>>>>> If you have numbers that show that the index is faster, I would like to see under what conditions and architectural assumptions.
>>>>>>
>>>>>> Nacho
>>>>>>
>>>>>> (I may have misinterpreted your description, so feel free to correct me if I'm wrong.)
>>>>>>
>>>>>> --
>>>>>> Nacho (Ignacio) Solis
>>>>>> Protocol Architect
>>>>>> Principal Scientist
>>>>>> Palo Alto Research Center (PARC)
>>>>>> +1(650)812-4458
>>>>>> Ignacio.Solis at parc.com
>>>>>>
>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>>>
>>>>>> Indeed, each component's offset must be encoded using a fixed amount of bytes:
>>>>>>
>>>>>> i.e.,
>>>>>> Type = Offsets
>>>>>> Length = 10 Bytes
>>>>>> Value = Offset1 (1 byte), Offset2 (1 byte), ...
>>>>>>
>>>>>> You may also imagine having an "Offset_2byte" type if your name is too long.
>>>>>> Max
>>>>>>
>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>
>>>>>> If you do not need the entire hierarchical structure (suppose you only want the first x components), you can directly have it using the offsets. With the nested TLV structure, you have to iteratively parse the first x-1 components. With the offset structure, you can directly access the first x components.
>>>>>>
>>>>>> I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x-th offset.
>>>>>>
>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>
>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>
>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. It sounds like you're describing an entirely different scheme, where the info that describes the name-components is ... someplace other than _in_ the name-components. Is that correct? When you say "field separator", what do you mean (since that's not a "TL" from a TLV)?
>>>>>>
>>>>>> Correct.
>>>>>> In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name, and other TLV(s) indicate the offset to use in order to retrieve special components.
>>>>>> As for the field separator, it is something like "/". Aliasing is avoided, as you do not rely on field separators to parse the name; you use the "offset TLV" to do that.
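The trade-off being debated (a fixed-width offset table versus walking nested TLVs) can be sketched on a toy encoding. The 1-byte type and length format below is illustrative only, not the NDN or CCNx wire format:

```python
# Toy contrast of the two parses discussed above. With a fixed-width
# offset table, component x is reached with one table read; with nested
# TLVs, components 0..x-1 must be walked first. The 1-byte type and
# 1-byte length per component are illustrative, not a real wire format.

def encode(components):
    name = b"".join(bytes([8, len(c)]) + c for c in components)
    offsets, pos = [], 0
    for c in components:
        offsets.append(pos)
        pos += 2 + len(c)              # type byte + length byte + value
    return name, bytes(offsets)        # fixed 1-byte offsets

def component_iterative(name, x):
    pos = 0
    for _ in range(x):                 # walk x TLVs to skip them
        pos += 2 + name[pos + 1]
    return name[pos + 2 : pos + 2 + name[pos + 1]]

def component_indexed(name, offsets, x):
    pos = offsets[x]                   # single fixed-width read
    return name[pos + 2 : pos + 2 + name[pos + 1]]

name, offsets = encode([b"mail", b"inbox", b"v100"])
assert component_iterative(name, 2) == component_indexed(name, offsets, 2) == b"v100"
```

As Tai-Lin notes, this direct access only works because the offsets are fixed-width; with variable-length offset encoding the table itself must be walked.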
>>>>>> So now, it may be an aesthetic question, but:
>>>>>>
>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components), you can directly have it using the offsets. With the nested TLV structure, you have to iteratively parse the first x-1 components. With the offset structure, you can directly access the first x components.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> -- Mark
>>>>>>
>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>
>>>>>> The why is simple:
>>>>>>
>>>>>> You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions!
>>>>>>
>>>>>> I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc. in the name...
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>
>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>
>>>>>> I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types.
Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like the UTF8 conventions, but ones that applications MUST use).
>>>>>>
>>>>>> so ... I can't quite follow that. The thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.), and there's been email trying to explain that applications don't have to use types if they don't need to. Your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. Can you say why it is that you express a preference for the "convention" with problems?
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ndn-interest mailing list
>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From Ignacio.Solis at parc.com Sat Sep 27 01:03:37 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Sat, 27 Sep 2014 08:03:37 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <38317000-9E95-4EF7-AB9E-357CD519C2A3@memphis.edu>
References: <4F6F35A2-C2F2-45BB-94D2-B7A9859959C5@memphis.edu> <7ACEC7C8-4DCB-4B22-8540-49D4C71FE116@parc.com> <38317000-9E95-4EF7-AB9E-357CD519C2A3@memphis.edu>
Message-ID:

On 9/26/14, 5:01 PM, "Lan Wang (lanwang)" wrote:

>On Sep 25, 2014, at 4:09 PM, wrote:
>
>> On Sep 25, 2014, at 10:01 PM, Lan Wang (lanwang) wrote:
>>
>>>>> - Benefit seems apparent in multi-consumer scenarios, even without sync. Let's say I have 5 personal devices requesting mail. In Scheme B, every publisher receives and processes 5 interests per second on average. In Scheme A, with an upstream caching node, each receives 1 per second maximum. The publisher still has to throttle requests, but with no help or scaling support from the network.
>>>>
>>>> This can be done without selectors. As long as all the clients produce a request for the same name, they can take advantage of caching.
>>>
>>> What Jeff responded to is that Scheme B requires a freshness of 0 for the initial interest to get to the producer (in order to get the latest list of email names). If freshness is 0, then there's no caching of the data. No matter how the clients name their Interests, they can't take advantage of caching.
>>
>> How do selectors prevent you from sending an Interest to the producer, if it's connected? I send a first interest "exclude <= 100" and cache A responds with version 110. Don't you then turn around and send a second interest "exclude <= 110" to see if another cache has a more recent version? Won't that interest go to the producer, if it's connected? It will then need to send a NACK (or you need to time out), if there's nothing more recent.
>>
>> Using selectors, you still never know if there's a more recent version until you get to the producer or you time out. You always need to keep asking and asking.
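The exchange just described can be replayed in a toy model (plain Python sets standing in for caches, no wire format). It shows that with exclusions the consumer can only stop once the producer, or a timeout, reports nothing newer:

```python
# Toy replay of the exclusion exchange above: each round excludes
# everything at or below the best version seen so far, and discovery
# terminates only when the producer (or a timeout) reports nothing
# newer. Version numbers follow the example (100, then 110).

producer = {100, 105, 110}        # authoritative version set
cache_a = {100, 110}              # cache A's subset

def newest_above(store, floor):
    """Answer an 'exclude <= floor' interest, or None (timeout/NACK)."""
    return max((v for v in store if v > floor), default=None)

best = 100                        # round 1: some cache answered v100
best = newest_above(cache_a, best)        # round 2, "exclude <= 100": A has 110
final = newest_above(producer, best)      # round 3, "exclude <= 110": producer
print(best, final)                # only the producer's None lets us stop
```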
Also, there's nothing special about the content object from the producer, so you still don't necessarily believe that it's the most recent, and you'll then ask again. Sure, an application could just accept the 1st or 2nd content object it gets back, but it never really knows. Sure, if the CreateTime (I think you call it Timestamp in NDN, if you still have it) is very recent and you assume synchronized clocks, then you might have some belief that it's current.
>>
>> We could also talk about FreshnessSeconds and MustBeFresh, but that would be best to start its own thread on.

>First of all, I'm not saying selectors prevent you from sending an Interest to the producer. Jeff's example is when you have five devices all wanting to get your emails; then the caching of the Data packet that contains the list of emails in Scheme A helps reduce the load on the producer. No matter how many devices want to get the list and when they send their Interests, the load on the server is constant (at most one Interest for the email list per second in the example). But in Scheme B, in the worst case, the producer can get 5 Interests per second.

First, the worst case is the worst case in _any_ scheme. There is no guarantee that any node will cache any data object.

Second, the assumption that every interest must go to the server depends on the protocol/naming scheme you are using.

Example 1:
For example, say I name the data: /mailbox/list/
This means that I can programmatically generate a name that will get me the "latest" data within a 1-minute window. If multiple clients request at the same time, they get the same object, with no load on the server. If nodes request data at different times, then they would request different objects. In this case, the protocol/naming scheme is forcing the 1-minute window of naming. Assuming always-caching, the server would see at most one request per minute.
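Example 1's minute-window naming can be sketched as follows. The name template /mailbox/list/<YYYYMMDDhhmm> is an assumption, since the original message elides the exact component:

```python
# Sketch of Example 1: embed the request time, rounded down to the
# minute, in the name, so all clients asking within the same minute ask
# for the *same* object and can share a cached copy. The component
# format (YYYYMMDDhhmm) is a hypothetical choice.

from datetime import datetime, timezone

def mailbox_list_name(now=None):
    now = now or datetime.now(timezone.utc)
    window = now.replace(second=0, microsecond=0)    # 1-minute window
    return "/mailbox/list/" + window.strftime("%Y%m%d%H%M")

t1 = datetime(2014, 9, 27, 8, 3, 12, tzinfo=timezone.utc)
t2 = datetime(2014, 9, 27, 8, 3, 55, tzinfo=timezone.utc)
t3 = datetime(2014, 9, 27, 8, 4, 1, tzinfo=timezone.utc)
assert mailbox_list_name(t1) == mailbox_list_name(t2)   # same window: cache hit
assert mailbox_list_name(t1) != mailbox_list_name(t3)   # next window: new fetch
```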
Example 2:
Server publishes an object called /mailbox/list/latest with a lifetime of 1 minute. Clients issue requests for this object. If clients are requesting at the same time, they get the same object. If clients are not requesting at the same time, they may get different objects. This method limits the load on the server, again assuming the network is caching.

Both of these methods limit the frequency at which the server produces data. Because of this limit, clients are not able to force a request that is dynamically generated for them at that point in time. Both examples have a probability for that to happen, though. This is a big limitation on the client. It can't query dynamically.

But wait, there's more! (That's a joke, for those not familiar with American TV commercials.)

If you want to allow clients to get to the server for fresh data, then we are increasing the load on the server, independently of whether you use selectors or not.

Other approaches might include:

Example 3:
- A mixture of Example 1 and Example 2.

Example 4:
- A mixture of Example 1 (or/and 2) and a dynamic query mechanism.

Note that while selectors might be able to get you some of the dynamic answers, they would have a similar probability to getting the cached answers from Examples 1 and 2, and they would still have to make a choice of whether to ask the server dynamically. After all, if we didn't want to ask dynamically, we would be satisfied with Example 1 or 2.

>Second, with or without selectors, you need to keep asking, since you never know when new emails will arrive and the list will change. With any design, you need to keep asking. The question is how often to ask. The user may be happy to get his new emails once every 10 minutes. You can ask for a new list every 10 minutes.
If you get a list from somewhere (if it was cached, it must have been less than 1 second old, if a FreshnessSecond of 1 second was used; if not, it must have been generated by the server) or get a NACK from the server, you may stop asking. If the user insists on getting new emails as soon as possible, then the email client can send an Interest, say, every minute. This serves as a pending Interest at the server, and the server will respond whenever there's a new list. This pending Interest needs to be refreshed whenever it times out (every minute in this example). This is similar to sync.

I don't think you should model your protocols assuming PIT entries will live that long. It may be true that for a small network you might get 1 minute of PIT lifetime, but for a real network this won't be the case. It will be very expensive to get PIT lifetimes across the internet that are more than a few seconds. (Think of routers running a few 100-gig interfaces, the size of PIT entries, and the amount of memory required.)

Nacho

From Ignacio.Solis at parc.com Sat Sep 27 01:10:43 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Sat, 27 Sep 2014 08:10:43 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
Message-ID:

On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:

>On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>
>>> How can a cache respond to /mail/inbox/selector_matching/<hash of payload> with a table of contents? This name prefix is owned by the mail server. Also, the reply really depends on what is in the cache at the moment, so the same name would correspond to different data.
>>
>> A - Yes, the same name would correspond to different data. This is true given that the data has changed.
NDN (and CCN) has no architectural requirement that a name maps to the same piece of data (obviously not talking about self-certifying hash-based names).
>
>There is a difference. A complete NDN name including the implicit digest uniquely identifies a piece of data.

That's the same thing for CCN with a ContentObjectHash.

>But here the same complete name may map to different data (I suppose you don't have an implicit digest, in an effort to do exact matching).

We do; it's called the ContentObjectHash, but it's not considered part of the name, it's considered a matching restriction.

>In other words, in your proposal, the same name /mail/inbox/selector_matching/hash1 may map to two or more different data packets. But in NDN, two Data packets may share a name prefix, but definitely not the implicit digest. And at least it is my understanding that the application design should make sure that the same producer doesn't produce different Data packets with the same name prefix before the implicit digest.

This is an application design issue. The network cannot enforce this. Applications will be able to name various data objects with the same name. After all, applications don't really control the implicit digest.

>It is possible in attack scenarios for different producers to generate Data packets with the same name prefix before the implicit digest, but still not the same implicit digest.

Why is this an attack scenario? Isn't it true that if I name my local printer /printer, that name can exist in the network at different locations from different publishers?

Just to clarify: in the examples provided, we weren't using implicit hashes anywhere. IF we were using implicit hashes (as in, we knew what the implicit hash was), then selectors are useless. If you know the implicit hash, then you don't need selectors.
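The /mail/inbox/selector_matching/hash1 scheme discussed here can be sketched as below. The hash choice (SHA-256, truncated) and name layout are illustrative assumptions, not the CCNx specification:

```python
# Sketch of the "selector_matching" idea: the selector predicate travels
# in the interest payload, and only a hash of that payload appears in
# the name, so identical questions share one cacheable name. The hash
# function, truncation, and name layout are illustrative only.

import hashlib

def selector_matching_interest(prefix, predicate):
    payload = predicate.encode()
    digest = hashlib.sha256(payload).hexdigest()[:16]  # truncated for readability
    return f"{prefix}/selector_matching/{digest}", payload

n1, _ = selector_matching_interest("/mail/inbox", "exclude_before=(t=version, l=2, v=279) & sort=right")
n2, _ = selector_matching_interest("/mail/inbox", "exclude_before=(t=version, l=2, v=279) & sort=right")
n3, _ = selector_matching_interest("/mail/inbox", "exclude_before=(t=version, l=2, v=280) & sort=right")
assert n1 == n2        # same question, same name: cacheable
assert n1 != n3        # different predicate, different name
```

As Lan Wang points out above, the cost of this cacheability is that one name may legitimately map to different (cache-chosen) answers over time.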
In the case of CCN, we use names without explicit hashes for most of our initial traffic (discovery, manifests, dynamically generated data, etc.), but after that, we use implicit digests (the ContentObjectHash restriction) for practically all of the other traffic.

Nacho

>>
>> B - Yes, you can consider that the name prefix is "owned" by the server, but the answer is actually something that the cache is choosing. The cache is choosing from the set of data that it has. The data that it encapsulates _is_ signed by the producer. Anybody who can decapsulate the data can verify that this is the case.
>>
>> Nacho
>>
>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote:
>>>
>>>> My beating on "discover all" is exactly because of this. Let's define the discovery service. If the service is just "discover latest" (left/right), can we not simplify the current approach? If the service includes more than "latest", then is the current approach the right approach?
>>>>
>>>> Sync has its place and is the right solution for some things. However, it should not be a bandage over discovery. Discovery should be its own valid and useful service.
>>>>
>>>> I agree that the exclusion approach can work, and work relatively well, for finding the rightmost/leftmost child. I believe this is because that operation is transitive through caches. So, within whatever timeout an application is willing to wait to find the "latest", it can keep asking and asking.
>>>>
>>>> I do think it would be best to actually try to ask an authoritative source first (i.e. a non-cached value), and if that fails then probe caches, but experimentation may show what works well. This is based on my belief that in the real world, in broad use, the namespace will become pretty polluted and probing will result in a lot of junk, but that's future prognosticating.
>>>>
>>>> Also, in the exact match vs.
continuation match of content object to interest, it is pretty easy to encode that "selector" request in a name component (i.e. "exclude_before=(t=version, l=2, v=279) & sort=right"), and any participating cache can respond with a link to (or encapsulate) a response in an exact match system.
>>>>
>>>> In the CCNx 1.0 spec, one could also encode this a different way. One could use a name like "/mail/inbox/selector_matching/<hash of payload>" and in the payload include "exclude_before=(t=version, l=2, v=279) & sort=right". This means that any cache that could process the "selector_matching" function could look at the interest payload and evaluate the predicate there. The predicate could become large and not pollute the PIT with all the computation state. Including "<hash of payload>" in the name means that one could get a cached response if someone else had asked the same exact question (subject to the content object's cache lifetime), and it also serves to multiplex different payloads for the same function (selector_matching).
>>>>
>>>> Marc
>>>>
>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote:
>>>>>
>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf
>>>>>
>>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html
>>>>>
>>>>> J.
>>>>>
>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote:
>>>>>
>>>>>> However, I cannot see whether we can achieve "best-effort *all*-value" efficiently.
>>>>>> There are still interesting topics on:
>>>>>> 1. how do we express the discovery query?
>>>>>> 2. is the selector "discovery-complete"? i.e. can we express any discovery query with the current selectors?
>>>>>> 3. if so, can we re-express the current selectors in a more efficient way?
>>>>>>
>>>>>> I personally see named data as a set, which can then be categorized into "ordered set" and "unordered set".
>>>>>> some questions that any discovery expression must solve: >>>>>> 1. is this a nil set or not? nil set means that this name is the >>>>>>leaf >>>>>> 2. set contains member X? >>>>>> 3. is set ordered or not >>>>>> 4. (ordered) first, prev, next, last >>>>>> 5. if we enforce component ordering, answer question 4. >>>>>> 6. recursively answer all questions above on any set member >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> From: >>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>> To: Jeff Burke >>>>>>> Cc: , , >>>>>>> >>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>> >>>>>>> I think Tai-Lin?s example was just fine to talk about discovery. >>>>>>> /blah/blah/value, how do you discover all the ?value?s? Discovery >>>>>>> shouldn?t >>>>>>> care if its email messages or temperature readings or world cup >>>>>>> photos. >>>>>>> >>>>>>> >>>>>>> This is true if discovery means "finding everything" - in which >>>>>>>case, >>>>>>> as you >>>>>>> point out, sync-style approaches may be best. But I am not sure >>>>>>>that >>>>>>> this >>>>>>> definition is complete. The most pressing example that I can think >>>>>>> of >>>>>>> is >>>>>>> best-effort latest-value, in which the consumer's goal is to get >>>>>>>the >>>>>>> latest >>>>>>> copy the network can deliver at the moment, and may not care about >>>>>>> previous >>>>>>> values or (if freshness is used well) potential later versions. >>>>>>> >>>>>>> Another case that seems to work well is video seeking. Let's say I >>>>>>> want to >>>>>>> enable random access to a video by timecode. 
The publisher can >>>>>>> provide a time-code based discovery namespace that's queried using an >>>>>>> Interest that essentially says "give me the closest keyframe to 00:37:03:12", >>>>>>> which returns an interest that, via the name, provides the exact timecode of >>>>>>> the keyframe in question and a link to a segment-based namespace for >>>>>>> efficient exact match playout. In two roundtrips and in a very lightweight >>>>>>> way, the consumer has random access capability. If NDN is the moral >>>>>>> equivalent of IP, then I am not sure we should be afraid of roundtrips that >>>>>>> provide this kind of functionality, just as they are used in TCP. >>>>>>> >>>>>>> I described one set of problems using the exclusion approach, and that an >>>>>>> NDN paper on device discovery described a similar problem, though they did >>>>>>> not go into the details of splitting interests, etc. That all was simple >>>>>>> enough to see from the example. >>>>>>> >>>>>>> Another question is how one does the discovery with exact match names, >>>>>>> which is also conflating things. You could do a different discovery with >>>>>>> continuation names too, just not the exclude method. >>>>>>> >>>>>>> As I alluded to, one needs a way to talk with a specific cache about its >>>>>>> "table of contents" for a prefix so one can get a consistent set of results >>>>>>> without all the round-trips of exclusions. Actually downloading the >>>>>>> "headers" of the messages would be the same bytes, more or less. In a way, >>>>>>> this is a little like name enumeration from a ccnx 0.x repo, but that >>>>>>> protocol has its own set of problems and I'm not suggesting to use that >>>>>>> directly.
>>>>>>> >>>>>>> One approach is to encode a request in a name component and a >>>>>>> participating cache can reply. It replies in such a way that one could >>>>>>> continue talking with that cache to get its TOC. One would then issue >>>>>>> another interest with a request for not-that-cache. >>>>>>> >>>>>>> I'm curious how the TOC approach works in a multi-publisher scenario? >>>>>>> >>>>>>> Another approach is to try to ask the authoritative source for the >>>>>>> "current" manifest name, i.e. /mail/inbox/current/<nonce>, which could >>>>>>> return the manifest or a link to the manifest. Then fetching the actual >>>>>>> manifest from the link could come from caches because you now have a >>>>>>> consistent set of names to ask for. If you cannot talk with an >>>>>>> authoritative source, you could try again without the nonce and see if >>>>>>> there's a cached copy of a recent version around. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote: >>>>>>> >>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote: >>>>>>> >>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a >>>>>>> pattern with static (/mail/inbox) and variable (148) components; with a >>>>>>> proper naming convention, computers can also detect this pattern >>>>>>> easily. Now I want to look for all mails in my inbox. I can generate a >>>>>>> list of /mail/inbox/. These are my guesses, and with selectors >>>>>>> I can further refine my guesses. >>>>>>> >>>>>>> I think this is a very bad example (or at least a very bad application >>>>>>> design). You have an app (a mail server / inbox) and you want it to list >>>>>>> your emails?
An email list is an application data structure. I >>>>>>> don't think you should use the network structure to reflect this. >>>>>>> >>>>>>> I think Tai-Lin is trying to sketch a small example, not propose a >>>>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>>>> >>>>>>> Another way to look at it is that if the network architecture is >>>>>>> providing the equivalent of distributed storage to the application, >>>>>>> perhaps the application data structure could be adapted to match the >>>>>>> affordances of the network. Then it would not be so bad that the two >>>>>>> structures were aligned. >>>>>>> >>>>>>> I'll give you an example: how do you delete emails from your inbox? If >>>>>>> an email was cached in the network it can never be deleted from your >>>>>>> inbox? >>>>>>> >>>>>>> This is conflating two issues - what you are pointing out is that the >>>>>>> data structure of a linear list doesn't handle common email management >>>>>>> operations well. Again, I'm not sure if that's what he was getting at >>>>>>> here. But deletion is not the issue - the availability of a data object >>>>>>> on the network does not necessarily mean it's valid from the perspective >>>>>>> of the application. >>>>>>> >>>>>>> Or moved to another mailbox? Do you rely on the emails expiring? >>>>>>> >>>>>>> This problem is true for most (any?) situations where you use network >>>>>>> name structure to directly reflect the application data structure. >>>>>>> >>>>>>> Not sure I understand how you make the leap from the example to the >>>>>>> general statement. >>>>>>> >>>>>>> Jeff >>>>>>> >>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>>> >>>>>>> Ok, yes I think those would all be good things.
>>>>>>> >>>>>>> One thing to keep in mind, especially with things like time series >>>>>>> sensor data, is that people see a pattern and infer a way of doing it. >>>>>>> That's easy for a human :) But in Discovery, one should assume that one >>>>>>> does not know of patterns in the data beyond what the protocols used to >>>>>>> publish the data explicitly require. That said, I think some of the >>>>>>> things you listed are good places to start: sensor data, web content, >>>>>>> climate data or genome data. >>>>>>> >>>>>>> We also need to state what the forwarding strategies are and what the >>>>>>> cache behavior is. >>>>>>> >>>>>>> I outlined some of the points that I think are important in that other >>>>>>> posting. While "discover latest" is useful, "discover all" is also >>>>>>> important, and that one gets complicated fast. So points like separating >>>>>>> discovery from retrieval and working with large data sets have been >>>>>>> important in shaping our thinking. That all said, I'd be happy starting >>>>>>> from 0 and working through the Discovery service definition from scratch >>>>>>> along with data set use cases. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote: >>>>>>> >>>>>>> Hi Marc, >>>>>>> >>>>>>> Thanks - yes, I saw that as well. I was just trying to get one step more >>>>>>> specific, which was to see if we could identify a few specific use cases >>>>>>> around which to have the conversation. (e.g., time series sensor data >>>>>>> and web content retrieval for "get latest"; climate data for huge data >>>>>>> sets; local data in a vehicular network; etc.) What have you been >>>>>>> looking at that's driving considerations of discovery?
>>>>>>> >>>>>>> Thanks, >>>>>>> Jeff >>>>>>> >>>>>>> From: >>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>> To: Jeff Burke >>>>>>> Cc: , >>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>> >>>>>>> Jeff, >>>>>>> >>>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>> Discovery. >>>>>>> >>>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>>>>>> >>>>>>> I think it would be very productive to talk about what Discovery should >>>>>>> do, and not focus on the how. It is sometimes easy to get caught up in >>>>>>> the how, which I think is a less important topic than the what at this >>>>>>> stage. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote: >>>>>>> >>>>>>> Marc, >>>>>>> >>>>>>> If you can't talk about your protocols, perhaps we can discuss this >>>>>>> based on use cases. What are the use cases you are using to evaluate >>>>>>> discovery? >>>>>>> >>>>>>> Jeff >>>>>>> >>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote: >>>>>>> >>>>>>> No matter what the expressiveness of the predicates, if the forwarder >>>>>>> can send interests different ways you don't have a consistent underlying >>>>>>> set to talk about, so you would always need non-range exclusions to >>>>>>> discover every version. >>>>>>> >>>>>>> Range exclusions only work, I believe, if you get an authoritative >>>>>>> answer. If different content pieces are scattered between different >>>>>>> caches I don't see how range exclusions would work to discover every >>>>>>> version. >>>>>>> >>>>>>> I'm sorry to be pointing out problems without offering solutions but >>>>>>> we're not ready to publish our discovery protocols.
>>>>>>> >>>>>>> Sent from my telephone >>>>>>> >>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>> I see. Can you briefly describe how the ccnx discovery protocol solves >>>>>>> all the problems that you mentioned (not just exclude)? a doc will be >>>>>>> better. >>>>>>> >>>>>>> My unserious conjecture( :) ): exclude is equal to [not]. I will soon >>>>>>> expect [and] and [or], so boolean algebra is fully supported. Regular >>>>>>> language or context free language might become part of selector too. >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>>> That will get you one reading, then you need to exclude it and ask >>>>>>> again. >>>>>>> >>>>>>> Sent from my telephone >>>>>>> >>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>> with a particular cache, then you need to always use individual >>>>>>> excludes, not range excludes, if you want to discover all the versions >>>>>>> of an object. >>>>>>> >>>>>>> I am very confused. For your example, if I want to get all today's >>>>>>> sensor data, I just do (Any..Last second of last day)(First second of >>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>> >>>>>>> [1] http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>> >>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote: >>>>>>> >>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>> could miss content objects you want to discover unless you avoid >>>>>>> all range exclusions and only exclude explicit versions. >>>>>>> >>>>>>> Could you explain why the missing content object situation happens? >>>>>>> Also, range exclusion is just a shorter notation for many explicit >>>>>>> excludes; converting from explicit excludes to a ranged exclude is >>>>>>> always possible.
>>>>>>> >>>>>>> >>>>>>> Yes, my point was that if you cannot talk about a consistent set >>>>>>> with a particular cache, then you need to always use individual >>>>>>> excludes, not range excludes, if you want to discover all the versions >>>>>>> of an object. For something like a sensor reading that is updated, >>>>>>> say, once per second you will have 86,400 of them per day. If each >>>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of >>>>>>> exclusions (plus encoding overhead) per day. >>>>>>> >>>>>>> Yes, maybe using a more deterministic version number than a >>>>>>> timestamp makes sense here, but it's just an example of needing a lot >>>>>>> of exclusions. >>>>>>> >>>>>>> You exclude through 100 then issue a new interest. This goes to >>>>>>> cache B >>>>>>> >>>>>>> I feel this case is invalid because cache A will also get the >>>>>>> interest, and cache A will return v101 if it exists. Like you said, if >>>>>>> this goes to cache B only, it means that cache A dies. How do you know >>>>>>> that v101 even exists? >>>>>>> >>>>>>> I guess this depends on what the forwarding strategy is. If the >>>>>>> forwarder will always send each interest to all replicas, then yes, >>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>> forwarder is just doing "best path" and can round-robin between cache >>>>>>> A and cache B, then your application could miss v101. >>>>>>> >>>>>>> c,d In general I agree that LPM performance is related to the number >>>>>>> of components. In my own thread-safe LPM implementation, I used only >>>>>>> one RWMutex for the whole tree. I don't know whether adding a lock for >>>>>>> every node will be faster or not because of lock overhead. >>>>>>> >>>>>>> However, we should compare (exact match + discovery protocol) vs (ndn >>>>>>> lpm). Comparing the performance of exact match to lpm is unfair.
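[Editorial aside: the exclusion-growth arithmetic in the exchange above (86,400 per-second readings, 8-byte timestamps) can be sketched in a few lines. The per-entry TLV overhead constant below is an illustrative assumption, not a figure from the thread.]

```python
# Sketch of the exclusion-growth arithmetic discussed above: if every
# published version must be excluded individually, the exclude list
# grows linearly with the number of versions.

SECONDS_PER_DAY = 86_400   # one sensor reading per second
TIMESTAMP_BYTES = 8        # per-version timestamp, as in the example
TLV_OVERHEAD = 2           # assumed type + length bytes per exclude entry

def exclusion_bytes(num_versions: int, per_entry_overhead: int = 0) -> int:
    """Total bytes needed to exclude each version explicitly."""
    return num_versions * (TIMESTAMP_BYTES + per_entry_overhead)

raw = exclusion_bytes(SECONDS_PER_DAY)                    # 691,200 bytes/day
encoded = exclusion_bytes(SECONDS_PER_DAY, TLV_OVERHEAD)  # plus encoding overhead
print(raw, encoded)
```

The raw figure matches the 691,200 bytes/day quoted above; any real encoding adds per-entry overhead on top, which is the "plus encoding overhead" caveat.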
>>>>>>> >>>>>>> >>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 >>>>>>> specs for doing the exact match discovery. So, as I said, I'm not >>>>>>> ready to claim it's better yet because we have not done that. >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>> I would point out that using LPM on content object to Interest >>>>>>> matching to do discovery has its own set of problems. Discovery >>>>>>> involves more than just "latest version" discovery too. >>>>>>> >>>>>>> This is probably getting off-topic from the original post about >>>>>>> naming conventions. >>>>>>> >>>>>>> a. If Interests can be forwarded multiple directions and two >>>>>>> different caches are responding, the exclusion set you build up >>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>> content objects you want to discover unless you avoid all range >>>>>>> exclusions and only exclude explicit versions. That will lead to >>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>> explicit discovery protocol that allows conversations about >>>>>>> consistent sets is better. >>>>>>> >>>>>>> b. Yes, if you just want "latest version" discovery, that >>>>>>> should be transitive between caches, but imagine this. You send >>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>> through 100 then issue a new interest. This goes to cache B, who >>>>>>> only has version 99, so the interest times out or is NACK'd. So >>>>>>> you think you have the latest! But cache A already has version 101; you >>>>>>> just don't know it. If you cannot have a conversation around >>>>>>> consistent sets, it seems like even doing latest version discovery >>>>>>> is difficult with selector based discovery.
From what I saw in >>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>> authoritative source because you can never believe an intermediate >>>>>>> cache that there's not something more recent. >>>>>>> >>>>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be >>>>>>> interested in seeing your analysis. Case (a) is that a node can >>>>>>> correctly discover every version of a name prefix, and (b) is that >>>>>>> a node can correctly discover the latest version. We have not >>>>>>> formally compared (or yet published) our discovery protocols (we >>>>>>> have three: 2 for content, 1 for device) against selector based >>>>>>> discovery, so I cannot yet claim they are better, but they do not >>>>>>> have the non-determinism sketched above. >>>>>>> >>>>>>> c. Using LPM, there is a non-deterministic number of lookups you >>>>>>> must do in the PIT to match a content object. If you have a name >>>>>>> tree or a threaded hash table, those don't all need to be hash >>>>>>> lookups, but you need to walk up the name tree for every prefix of >>>>>>> the content object name and evaluate the selector predicate. >>>>>>> Content Based Networking (CBN) had some methods to create data >>>>>>> structures based on predicates; maybe those would be better. But >>>>>>> in any case, you will potentially need to retrieve many PIT entries >>>>>>> if there is Interest traffic for many prefixes of a root. Even on >>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a >>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>> implementation only requires at most 3 lookups (one by name, one by >>>>>>> name + keyid, one by name + content object hash), and one can do >>>>>>> other things to optimize lookup for an extra write. >>>>>>> >>>>>>> d.
In (c) above, if you have a threaded name tree or are just >>>>>>> walking parent pointers, I suspect you?ll need locking of the >>>>>>> ancestors in a multi-threaded system (?threaded" here meaning LWP) >>>>>>> and that will be expensive. It would be interesting to see what a >>>>>>> cache consistent multi-threaded name tree looks like. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> >>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> I had thought about these questions, but I want to know your idea >>>>>>> besides typed component: >>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>> things? >>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>> other >>>>>>> faster technique to replace selector? >>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>> byte, but 2 bytes for length might not be enough for future. >>>>>>> >>>>>>> >>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>> wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>> envision we need to do, and with a few simple conventions on >>>>>>> how the registry of types is managed. >>>>>>> >>>>>>> >>>>>>> Could you share it with us? >>>>>>> >>>>>>> Sure. Here?s a strawman. >>>>>>> >>>>>>> The type space is 16 bits, so you have 65,565 types. >>>>>>> >>>>>>> The type space is currently shared with the types used for the >>>>>>> entire protocol, that gives us two options: >>>>>>> (1) we reserve a range for name component types. Given the >>>>>>> likelihood there will be at least as much and probably more need >>>>>>> to component types than protocol extensions, we could reserve 1/2 >>>>>>> of the type space, giving us 32K types for name components. 
>>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>> and other fields of the protocol (since they are sub-types of the >>>>>>> name type) we could reuse numbers and thereby have an entire 65K >>>>>>> name component types. >>>>>>> >>>>>>> We divide the type space into regions, and manage it with a >>>>>>> registry. If we ever get to the point of creating an IETF >>>>>>> standard, IANA has 25 years of experience running registries and >>>>>>> there are well-understood rule sets for different kinds of >>>>>>> registries (open, requires a written spec, requires standards >>>>>>> approval). >>>>>>> >>>>>>> - We allocate one "default" name component type for "generic >>>>>>> name", which would be used on name prefixes and other common >>>>>>> cases where there are no special semantics on the name component. >>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>> globally understood types that are part of the base or extension >>>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>> (say another 1024 types) >>>>>>> - We give the rest of the space to application assignment. >>>>>>> >>>>>>> Make sense? >>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>> which Moore's Law or hardware tricks will not save us from >>>>>>> performance flaws in the design >>>>>>> >>>>>>> we could design for performance, >>>>>>> >>>>>>> That's not what people are advocating. We are advocating that we >>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>> Moore's Law will come to the rescue. >>>>>>> >>>>>>> but I think there will be a turning >>>>>>> point when the slower design starts to become "fast enough". >>>>>>> >>>>>>> Perhaps, perhaps not.
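[Editorial aside: the registry strawman above can be made concrete with a small sketch. The exact boundary values below are illustrative assumptions (the strawman only says "say 1024"); only the overall partitioning of the 16-bit space is from the thread.]

```python
# Illustrative partition of the 16-bit name-component type space per the
# strawman: one generic type, a block of globally understood types, a
# reserved block, and the remainder for application assignment.

REGIONS = [
    ("generic",  0x0000, 0x0000),  # default "generic name" component
    ("global",   0x0001, 0x0400),  # base/extension spec types (chunk#, version#, ...)
    ("reserved", 0x0401, 0x0800),  # held back for unanticipated uses
    ("app",      0x0801, 0xFFFF),  # application-assigned
]

def region_of(component_type: int) -> str:
    """Map a 16-bit name-component type to its registry region."""
    if not 0 <= component_type <= 0xFFFF:
        raise ValueError("name component type must fit in 16 bits")
    for name, lo, hi in REGIONS:
        if lo <= component_type <= hi:
            return name
    raise AssertionError("regions cover the whole 16-bit space")
```

A real registry would pin these boundaries down in a spec and record assignments, which is the IANA-style process the strawman alludes to.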
Relative performance is what matters, so >>>>>>> things that don't get faster while others do tend to get dropped >>>>>>> or not used because they impose a performance penalty relative to >>>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>>> where improvements in technology get applied to lowering cost >>>>>>> rather than improving performance. For those environments bad >>>>>>> performance just never gets better. >>>>>>> >>>>>>> Do you >>>>>>> think there will be some design of ndn that will *never* have >>>>>>> performance improvement? >>>>>>> >>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>> functions). >>>>>>> I suspect exclusions will always be slow because they will >>>>>>> require extra memory references. >>>>>>> >>>>>>> However, I of course don't claim clairvoyance, so this is just >>>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>>> orders of magnitude and still having to worry about counting >>>>>>> cycles and memory references... >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>> wrote: >>>>>>> >>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>> wrote: >>>>>>> >>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>> perform well on it. It should be the other way around: once ndn >>>>>>> apps become popular, a better chip will be designed for ndn. >>>>>>> >>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>> which Moore's Law or hardware tricks will not save us from >>>>>>> performance flaws in the design: >>>>>>> a) clock rates are not getting (much) faster >>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>> c) data structures that require locks to manipulate >>>>>>> successfully will be relatively more expensive, even with >>>>>>> near-zero lock contention. >>>>>>> >>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>> its design.
We just forgot those because the design elements >>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>> poster children for this are: >>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>>> anywhere >>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>> in working around. >>>>>>> >>>>>>> I'm afraid students today are being taught that the designers >>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>> engineers that got most of it right. >>>>>>> >>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>> Now I >>>>>>> see that there are 3 approaches: >>>>>>> 1. we should not define a naming convention at all >>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>> types >>>>>>> 3. marked component: introduce only one more type and add >>>>>>> additional marker space >>>>>>> >>>>>>> I know how to make #2 flexible enough to do the things I can >>>>>>> envision we need to do, and with a few simple conventions on >>>>>>> how the registry of types is managed. >>>>>>> >>>>>>> It is just as powerful in practice as either throwing up our >>>>>>> hands and letting applications design their own mutually >>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>> markers in a way that is fast to generate/parse and also >>>>>>> resilient against aliasing. >>>>>>> >>>>>>> Also, everybody thinks that the current utf8 marker naming >>>>>>> convention needs to be revised. >>>>>>> >>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>> wrote: >>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>> to fit in (the magnitude of) 96 bytes? What length are names >>>>>>> usually in current NDN experiments?
>>>>>>> >>>>>>> I guess wide deployment could make for even longer names. >>>>>>> Related: Many URLs I encounter nowadays easily don't fit within >>>>>>> two 80-column text lines, and NDN will have to carry more >>>>>>> information than URLs, as far as I see. >>>>>>> >>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>> >>>>>>> In fact, the index in a separate TLV will be slower on some >>>>>>> architectures, like the ezChip NP4. The NP4 can hold the first 96 >>>>>>> frame bytes in memory, then any subsequent memory is accessed only >>>>>>> as two adjacent 32-byte blocks (there can be at most 5 blocks >>>>>>> available at any one time). If you need to switch between arrays, >>>>>>> it would be very expensive. If you have to read past the name to >>>>>>> get to the 2nd array, then read it, then back up to get to the >>>>>>> name, it will be pretty expensive too. >>>>>>> >>>>>>> Marc >>>>>>> >>>>>>> On Sep 18, 2014, at 2:02 PM, wrote: >>>>>>> >>>>>>> Does this make that much difference? >>>>>>> >>>>>>> If you want to parse the first 5 components, one way to do it is: >>>>>>> >>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>> from the start offset of the beginning of the name. >>>>>>> OR >>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>> >>>>>>> How much speed are you getting from one to the other? You >>>>>>> seem to imply that the first one is faster. I don't think this is >>>>>>> the case. >>>>>>> >>>>>>> In the first one you'll probably have to get the cache line >>>>>>> for the index, then all the required cache lines for the first 5 >>>>>>> components. For the second, you'll have to get all the cache >>>>>>> lines for the first 5 components.
>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>> than evaluating a number and computing an addition, you might >>>>>>> find that the performance of the index is actually slower than >>>>>>> the performance of the direct access. >>>>>>> >>>>>>> Granted, there is a case where you don't access the name at >>>>>>> all, for example, if you just get the offsets and then send the >>>>>>> offsets as parameters to another processor/GPU/NPU/etc. In this >>>>>>> case you may see a gain IF there are more cache line misses in >>>>>>> reading the name than in reading the index. So, if the regular >>>>>>> part of the name that you're parsing is bigger than the cache >>>>>>> line (64 bytes?) and the name is to be processed by a different >>>>>>> processor, then you might see some performance gain in using the >>>>>>> index, but in all other circumstances I bet this is not the >>>>>>> case. I may be wrong, haven't actually tested it. >>>>>>> >>>>>>> This is all to say, I don't think we should be designing the >>>>>>> protocol with only one architecture in mind. (The architecture >>>>>>> of sending the name to a different processor than the index.) >>>>>>> >>>>>>> If you have numbers that show that the index is faster I >>>>>>> would like to see under what conditions and architectural >>>>>>> assumptions. >>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> (I may have misinterpreted your description, so feel free to >>>>>>> correct me if I'm wrong.)
>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Nacho (Ignacio) Solis >>>>>>> Protocol Architect >>>>>>> Principal Scientist >>>>>>> Palo Alto Research Center (PARC) >>>>>>> +1(650)812-4458 >>>>>>> Ignacio.Solis at parc.com >>>>>>> >>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>> wrote: >>>>>>> >>>>>>> Indeed each component's offset must be encoded using a fixed >>>>>>> amount of bytes: >>>>>>> >>>>>>> i.e., >>>>>>> Type = Offsets >>>>>>> Length = 10 Bytes >>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>> >>>>>>> You may also imagine having an "Offset_2byte" type if your >>>>>>> name is too long. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>> >>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>> you only want the first x components) you can directly have it >>>>>>> using the offsets. With the nested TLV structure you have to >>>>>>> iteratively parse the first x-1 components. With the offset >>>>>>> structure you can directly access the first x components. >>>>>>> >>>>>>> I don't get it. What you described only works if the >>>>>>> "offset" is encoded in fixed bytes. With varNum, you will still >>>>>>> need to parse x-1 offsets to get to the x-th offset. >>>>>>> >>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>> wrote: >>>>>>> >>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>> >>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>> like the existing NDN UTF8 'convention'." I'm still not sure I >>>>>>> understand what you _do_ prefer, though. it sounds like you're >>>>>>> describing an entirely different scheme where the info that >>>>>>> describes the name-components is ... someplace other than _in_ >>>>>>> the name-components. is that correct?
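[Editorial aside: the trade-off being debated here can be sketched in a few lines of code. The encoding below (1-byte types and lengths, a fixed-width offset table) is a simplified assumption for illustration, not the actual CCN/NDN wire format.]

```python
# Sequential nested-TLV walk vs. direct access via a fixed-width offset
# table. With fixed-width offsets you can jump to component x without
# touching components 0..x-1; with variable-width offsets (Tai-Lin's
# point) you are back to a sequential parse of the offset table itself.

def first_x_sequential(buf: bytes, x: int) -> list[bytes]:
    """Parse the first x components by walking (type, length, value) triples."""
    comps, pos = [], 0
    for _ in range(x):
        length = buf[pos + 1]              # assume 1-byte type, 1-byte length
        comps.append(buf[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return comps

def component_via_index(buf: bytes, offsets: list[int], x: int) -> bytes:
    """Jump straight to component x using a precomputed offset table."""
    pos = offsets[x]
    length = buf[pos + 1]
    return buf[pos + 2 : pos + 2 + length]

# A name with two components, "a" and "bb":
name = bytes([1, 1, ord("a"), 1, 2, ord("b"), ord("b")])
index = [0, 3]
print(first_x_sequential(name, 2), component_via_index(name, index, 1))
```

Whether the index actually wins then comes down to the cache-line argument made above, not to instruction counts.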
when >>>>>>> you say >>>>>>> "field >>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>> TLV)? >>>>>>> >>>>>>> Correct. >>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>> name >>>>>>> hierarchy >>>>>>> with offsets in the name and other TLV(s) indicates the >>>>>>> offset to use >>>>>>> in >>>>>>> order to retrieve special components. >>>>>>> As for the field separator, it is something like "/". >>>>>>> Aliasing is >>>>>>> avoided as >>>>>>> you do not rely on field separators to parse the name; you >>>>>>> use the >>>>>>> "offset >>>>>>> TLV " to do that. >>>>>>> >>>>>>> So now, it may be an aesthetic question but: >>>>>>> >>>>>>> if you do not need the entire hierarchal structure (suppose >>>>>>> you only >>>>>>> want >>>>>>> the first x components) you can directly have it using the >>>>>>> offsets. >>>>>>> With the >>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>> x-1 >>>>>>> components. >>>>>>> With the offset structure you cane directly access to the >>>>>>> firs x >>>>>>> components. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> >>>>>>> -- Mark >>>>>>> >>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>> >>>>>>> The why is simple: >>>>>>> >>>>>>> You use a lot of "generic component type" and very few >>>>>>> "specific >>>>>>> component type". You are imposing types for every component >>>>>>> in order >>>>>>> to >>>>>>> handle few exceptions (segmentation, etc..). You create a >>>>>>> rule >>>>>>> (specify >>>>>>> the component's type ) to handle exceptions! >>>>>>> >>>>>>> I would prefer not to have typed components. Instead I would >>>>>>> prefer >>>>>>> to >>>>>>> have the name as simple sequence bytes with a field >>>>>>> separator. Then, >>>>>>> outside the name, if you have some components that could be >>>>>>> used at >>>>>>> network layer (e.g. 
a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc. in the name...
>>>>>>>
>>>>>>> Max
>>>>>>>
>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>>
>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>>
>>>>>>> I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like UTF8 conventions, but that applications MUST use).
>>>>>>>
>>>>>>> so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. can you say why it is that you express a preference for the "convention" with problems?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mark
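[Editor's note: the fixed-width-offset trade-off debated above can be sketched in a few lines. Everything concrete here — the 1-byte type and length fields, the 1-byte offsets, the component values — is an illustrative assumption, not the wire format of any NDN or CCNx specification.]

```python
def first_x_with_offsets(name, offsets, x):
    """Fixed-width offsets: one array read, then one slice (O(1) parses)."""
    return name[:offsets[x - 1]]

def first_x_nested_tlv(buf, x):
    """Nested TLVs: must walk component by component (O(x) parses).
    Assumes a 1-byte type and a 1-byte length per component."""
    pos = 0
    for _ in range(x):
        length = buf[pos + 1]     # read L of the next component's TL header
        pos += 2 + length         # skip T, L, and the value
    return buf[:pos]

# A flat name "abc" with one end-offset per component boundary ...
assert first_x_with_offsets(b"abc", [1, 2, 3], 2) == b"ab"

# ... versus the same three 1-byte components as nested TLVs (type 0x08).
tlv = bytes([8, 1, ord("a"), 8, 1, ord("b"), 8, 1, ord("c")])
assert first_x_nested_tlv(tlv, 2) == bytes([8, 1, ord("a"), 8, 1, ord("b")])
```

This also illustrates Tai-Lin's counterpoint: the O(1) access only holds if each offset occupies a fixed number of bytes; with variable-length numbers you are back to parsing x-1 entries.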
_______________________________________________
Ndn-interest mailing list
Ndn-interest at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From tailinchu at gmail.com Sat Sep 27 11:40:41 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Sat, 27 Sep 2014 11:40:41 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To:
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
Message-ID:

> /mail/inbox/selector_matching/

So is this implicit?

BTW, I read all your replies. I think the discovery protocol (send out a table of content) has to reach the original provider; otherwise there will be some issues in the trust model. At least the cached table of content has to be confirmed with the original provider, either by key delegation or by another confirmation protocol.

Besides this, LGTM.
On Sat, Sep 27, 2014 at 1:10 AM, wrote:
> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:
>
>> On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>>
>>>> How can a cache respond to /mail/inbox/selector_matching/<hash of payload> with a table of content? This name prefix is owned by the mail server. Also the reply really depends on what is in the cache at the moment, so the same name would correspond to different data.
>>>
>>> A - Yes, the same name would correspond to different data. This is true given that the data has changed. NDN (and CCN) has no architectural requirement that a name maps to the same piece of data (obviously not talking about self-certifying hash-based names).
>>
>> There is a difference. A complete NDN name including the implicit digest uniquely identifies a piece of data.
>
> That's the same thing for CCN with a ContentObjectHash.
>
>> But here the same complete name may map to different data (I suppose you don't have implicit digest in an effort to do exact matching).
>
> We do, it's called ContentObjectHash, but it's not considered part of the name, it's considered a matching restriction.
>
>> In other words, in your proposal, the same name /mail/inbox/selector_matching/hash1 may map to two or more different data packets. But in NDN, two Data packets may share a name prefix, but definitely not the implicit digest. And at least it is my understanding that the application design should make sure that the same producer doesn't produce different Data packets with the same name prefix before implicit digest.
>
> This is an application design issue. The network cannot enforce this. Applications will be able to name various data objects with the same name. After all, applications don't really control the implicit digest.
>> It is possible in attack scenarios for different producers to generate Data packets with the same name prefix before implicit digest, but still not the same implicit digest.
>
> Why is this an attack scenario? Isn't it true that if I name my local printer /printer that name can exist in the network at different locations from different publishers?
>
> Just to clarify, in the examples provided we weren't using implicit hashes anywhere. If we were using implicit hashes (as in, we knew what the implicit hash was), then selectors are useless. If you know the implicit hash, then you don't need selectors.
>
> In the case of CCN, we use names without explicit hashes for most of our initial traffic (discovery, manifests, dynamically generated data, etc.), but after that, we use implicit digests (ContentObjectHash restriction) for practically all of the other traffic.
>
> Nacho
>
>>> B - Yes, you can consider the name prefix is "owned" by the server, but the answer is actually something that the cache is choosing. The cache is choosing from the set of data that it has. The data that it encapsulates _is_ signed by the producer. Anybody that can decapsulate the data can verify that this is the case.
>>>
>>> Nacho
>>>
>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote:
>>>>
>>>>> My beating on "discover all" is exactly because of this. Let's define discovery service. If the service is just "discover latest" (left/right), can we not simplify the current approach? If the service includes more than "latest", then is the current approach the right approach?
>>>>>
>>>>> Sync has its place and is the right solution for some things. However, it should not be a bandage over discovery. Discovery should be its own valid and useful service.
>>>>>
>>>>> I agree that the exclusion approach can work, and work relatively well, for finding the rightmost/leftmost child.
I believe this is because that operation is transitive through caches. So, within whatever timeout an application is willing to wait to find the "latest", it can keep asking and asking.
>>>>>
>>>>> I do think it would be best to actually try to ask an authoritative source first (i.e. a non-cached value), and if that fails then probe caches, but experimentation may show what works well. This is based on my belief that in the real world in broad use, the namespace will become pretty polluted and probing will result in a lot of junk, but that's future prognosticating.
>>>>>
>>>>> Also, in the exact match vs. continuation match of content object to interest, it is pretty easy to encode that "selector" request in a name component (i.e. "exclude_before=(t=version, l=2, v=279) & sort=right") and any participating cache can respond with a link to (or encapsulate) a response in an exact match system.
>>>>>
>>>>> In the CCNx 1.0 spec, one could also encode this a different way. One could use a name like "/mail/inbox/selector_matching/<hash of payload>" and in the payload include "exclude_before=(t=version, l=2, v=279) & sort=right". This means that any cache that could process the "selector_matching" function could look at the interest payload and evaluate the predicate there. The predicate could become large and not pollute the PIT with all the computation state. Including "<hash of payload>" in the name means that one could get a cached response if someone else had asked the same exact question (subject to the content object's cache lifetime) and it also serves to multiplex different payloads for the same function (selector_matching).
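[Editor's note: the payload-digest naming trick Marc describes is easy to demonstrate. A minimal sketch — the name layout and the use of SHA-256 are assumptions for illustration, not the CCNx 1.0 encoding:]

```python
import hashlib

def selector_matching_name(prefix, predicate):
    # The predicate travels in the Interest payload; only its digest
    # appears in the name. The PIT entry stays small, and two consumers
    # asking the exact same question produce the exact same name.
    payload = predicate.encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return "%s/selector_matching/%s" % (prefix, digest), payload

name_a, _ = selector_matching_name(
    "/mail/inbox", "exclude_before=(t=version, l=2, v=279) & sort=right")
name_b, _ = selector_matching_name(
    "/mail/inbox", "exclude_before=(t=version, l=2, v=279) & sort=right")
assert name_a == name_b  # same question -> same name -> a cache hit is possible
```

Because the name is a deterministic function of the question, a cached response can satisfy a repeat of the same query within the content object's cache lifetime, while different predicates hash to different names under the one `selector_matching` function.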
>>>>> Marc
>>>>>
>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote:
>>>>>>
>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf
>>>>>>
>>>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html
>>>>>>
>>>>>> J.
>>>>>>
>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote:
>>>>>>
>>>>>>> However, I cannot see whether we can achieve "best-effort *all*-value" efficiently. There are still interesting topics on
>>>>>>> 1. how do we express the discovery query?
>>>>>>> 2. is the selector "discovery-complete"? i.e. can we express any discovery query with the current selector?
>>>>>>> 3. if so, can we re-express the current selector in a more efficient way?
>>>>>>>
>>>>>>> I personally see named data as a set, which can then be categorized into "ordered set" and "unordered set". Some questions that any discovery expression must solve:
>>>>>>> 1. is this a nil set or not? a nil set means that this name is the leaf
>>>>>>> 2. does the set contain member X?
>>>>>>> 3. is the set ordered or not?
>>>>>>> 4. (ordered) first, prev, next, last
>>>>>>> 5. if we enforce component ordering, answer question 4.
>>>>>>> 6. recursively answer all questions above on any set member
>>>>>>>
>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> From:
>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000
>>>>>>>> To: Jeff Burke
>>>>>>>> Cc: , ,
>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>>>>
>>>>>>>> I think Tai-Lin's example was just fine to talk about discovery. /blah/blah/value, how do you discover all the "value"s? Discovery shouldn't care if it's email messages or temperature readings or world cup photos.
>>>>>>>> This is true if discovery means "finding everything" - in which case, as you point out, sync-style approaches may be best. But I am not sure that this definition is complete. The most pressing example that I can think of is best-effort latest-value, in which the consumer's goal is to get the latest copy the network can deliver at the moment, and may not care about previous values or (if freshness is used well) potential later versions.
>>>>>>>>
>>>>>>>> Another case that seems to work well is video seeking. Let's say I want to enable random access to a video by timecode. The publisher can provide a time-code based discovery namespace that's queried using an Interest that essentially says "give me the closest keyframe to 00:37:03:12", which returns an interest that, via the name, provides the exact timecode of the keyframe in question and a link to a segment-based namespace for efficient exact match playout. In two roundtrips and in a very lightweight way, the consumer has random access capability. If NDN is the moral equivalent of IP, then I am not sure we should be afraid of roundtrips that provide this kind of functionality, just as they are used in TCP.
>>>>>>>>
>>>>>>>> I described one set of problems using the exclusion approach, and that an NDN paper on device discovery described a similar problem, though they did not go into the details of splitting interests, etc. That all was simple enough to see from the example.
>>>>>>>> Another question is how does one do the discovery with exact match names, which is also conflating things. You could do a different discovery with continuation names too, just not the exclude method.
>>>>>>>>
>>>>>>>> As I alluded to, one needs a way to talk with a specific cache about its "table of contents" for a prefix so one can get a consistent set of results without all the round-trips of exclusions. Actually downloading the "headers" of the messages would be the same bytes, more or less. In a way, this is a little like name enumeration from a ccnx 0.x repo, but that protocol has its own set of problems and I'm not suggesting to use that directly.
>>>>>>>>
>>>>>>>> One approach is to encode a request in a name component and a participating cache can reply. It replies in such a way that one could continue talking with that cache to get its TOC. One would then issue another interest with a request for not-that-cache.
>>>>>>>>
>>>>>>>> I'm curious how the TOC approach works in a multi-publisher scenario?
>>>>>>>>
>>>>>>>> Another approach is to try to ask the authoritative source for the "current" manifest name, i.e. /mail/inbox/current/<nonce>, which could return the manifest or a link to the manifest. Then fetching the actual manifest from the link could come from caches because you now have a consistent set of names to ask for. If you cannot talk with an authoritative source, you could try again without the nonce and see if there's a cached copy of a recent version around.
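[Editor's note: the two-step manifest fetch sketched above can be made concrete. This is a toy simulation under stated assumptions — the names, the `<nonce>` component, and the `express_interest` callback are all hypothetical, standing in for whatever transport an application would actually use:]

```python
import random

def fetch_current_manifest(express_interest):
    """Step 1: a fresh nonce forces the request past caches to the
    authoritative producer; fall back to a possibly stale cached link
    if the producer is unreachable. Step 2: the linked manifest has a
    stable exact name, so any cache can answer that fetch."""
    nonce = random.getrandbits(32)
    link = express_interest("/mail/inbox/current/%d" % nonce)
    if link is None:
        # Producer unreachable: retry without the nonce and accept
        # whatever recent cached copy of the link is around.
        link = express_interest("/mail/inbox/current")
    return express_interest(link) if link is not None else None
```

The point of the split is that only the small "what is current?" exchange needs authoritative freshness; the bulk manifest transfer then uses a consistent name set and benefits fully from in-network caching.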
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote:
>>>>>>>>
>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote:
>>>>>>>>
>>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see a pattern with static (/mail/inbox) and variable (148) components; with a proper naming convention, computers can also detect this pattern easily. Now I want to look for all mails in my inbox. I can generate a list of /mail/inbox/. These are my guesses, and with selectors I can further refine my guesses.
>>>>>>>>
>>>>>>>> I think this is a very bad example (or at least a very bad application design). You have an app (a mail server / inbox) and you want it to list your emails? An email list is an application data structure. I don't think you should use the network structure to reflect this.
>>>>>>>>
>>>>>>>> I think Tai-Lin is trying to sketch a small example, not propose a full-scale approach to email. (Maybe I am misunderstanding.)
>>>>>>>>
>>>>>>>> Another way to look at it is that if the network architecture is providing the equivalent of distributed storage to the application, perhaps the application data structure could be adapted to match the affordances of the network. Then it would not be so bad that the two structures were aligned.
>>>>>>>>
>>>>>>>> I'll give you an example, how do you delete emails from your inbox? If an email was cached in the network it can never be deleted from your inbox?
>>>>>>>> This is conflating two issues - what you are pointing out is that the data structure of a linear list doesn't handle common email management operations well. Again, I'm not sure if that's what he was getting at here. But deletion is not the issue - the availability of a data object on the network does not necessarily mean it's valid from the perspective of the application.
>>>>>>>>
>>>>>>>> Or moved to another mailbox? Do you rely on the emails expiring?
>>>>>>>>
>>>>>>>> This problem is true for most (any?) situations where you use network name structure to directly reflect the application data structure.
>>>>>>>>
>>>>>>>> Not sure I understand how you make the leap from the example to the general statement.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> Nacho
>>>>>>>>
>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote:
>>>>>>>>
>>>>>>>> Ok, yes I think those would all be good things.
>>>>>>>>
>>>>>>>> One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That's easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require. That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data.
>>>>>>>>
>>>>>>>> We also need to state what the forwarding strategies are and what the cache behavior is.
>>>>>>>>
>>>>>>>> I outlined some of the points that I think are important in that other posting. While "discover latest"
is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking. That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases.
>>>>>>>>
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> Hi Marc,
>>>>>>>>
>>>>>>>> Thanks - yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> From:
>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000
>>>>>>>> To: Jeff Burke
>>>>>>>> Cc: ,
>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>>>>
>>>>>>>> Jeff,
>>>>>>>>
>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread on Discovery.
>>>>>>>>
>>>>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>>>>>>>>
>>>>>>>> I think it would be very productive to talk about what Discovery should do, and not focus on the how. It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage.
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> Marc,
>>>>>>>>
>>>>>>>> If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery?
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:
>>>>>>>>
>>>>>>>> No matter what the expressiveness of the predicates, if the forwarder can send interests different ways you don't have a consistent underlying set to talk about, so you would always need non-range exclusions to discover every version.
>>>>>>>>
>>>>>>>> Range exclusions only work, I believe, if you get an authoritative answer. If different content pieces are scattered between different caches, I don't see how range exclusions would work to discover every version.
>>>>>>>>
>>>>>>>> I'm sorry to be pointing out problems without offering solutions, but we're not ready to publish our discovery protocols.
>>>>>>>>
>>>>>>>> Sent from my telephone
>>>>>>>>
>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>>>>>>>
>>>>>>>> I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc will be better.
>>>>>>>>
>>>>>>>> My unserious conjecture( :) ): exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported. Regular language or context free language might become part of selector too.
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote:
>>>>>>>>
>>>>>>>> That will get you one reading, then you need to exclude it and ask again.
>>>>>>>> Sent from my telephone
>>>>>>>>
>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:
>>>>>>>>
>>>>>>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object.
>>>>>>>>
>>>>>>>> I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes.
>>>>>>>>
>>>>>>>> [1] http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>>>>>
>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>>>>>
>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions.
>>>>>>>>
>>>>>>>> Could you explain why the missing content object situation happens? Also, range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to a ranged exclude is always possible.
>>>>>>>>
>>>>>>>> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object. For something like a sensor reading that is updated, say, once per second you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day.
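[Editor's note: the figures in Marc's example check out; a back-of-the-envelope calculation makes the growth obvious. Sketch only — the 8-byte timestamp size is the assumption stated in the message:]

```python
# Excluding every version of a once-per-second sensor reading
# individually (no range excludes), one 8-byte timestamp each.
readings_per_day = 24 * 60 * 60          # 86,400 versions per day
bytes_per_exclusion = 8                  # one timestamp per exclusion
total_bytes = readings_per_day * bytes_per_exclusion
print(readings_per_day, total_bytes)     # 86400 691200 (before TLV overhead)
```

So even a single day's worth of individual exclusions is far larger than any practical Interest packet, which is why the argument turns on whether range excludes can be trusted without a consistent set.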
>>>>>>>> yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions.
>>>>>>>>
>>>>>>>> You exclude through 100 then issue a new interest. This goes to cache B
>>>>>>>>
>>>>>>>> I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists?
>>>>>>>>
>>>>>>>> I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101.
>>>>>>>>
>>>>>>>> c,d In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead.
>>>>>>>>
>>>>>>>> However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm is unfair.
>>>>>>>>
>>>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery. So, as I said, I'm not ready to claim it's better yet because we have not done that.
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>>>>>>
>>>>>>>> I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems.
Discovery involves more than just "latest version" discovery too.
>>>>>>>>
>>>>>>>> This is probably getting off-topic from the original post about naming conventions.
>>>>>>>>
>>>>>>>> a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better.
>>>>>>>>
>>>>>>>> b. Yes, if you just want the "latest version" discovery that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B who only has version 99, so the interest times out or is NACK'd. So you think you have it! But cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery. From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent.
>>>>>>>>
>>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version.
We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above.
>>>>>>>>
>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.
>>>>>>>>
>>>>>>>> d. In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like.
>>>>>>>>
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>>>
>>>>>>>> I had thought about these questions, but I want to know your idea besides typed components:
>>>>>>>> 1. LPM allows "data discovery".
How will exact match do similar >>>>>>>> things? >>>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>>> other >>>>>>>> faster techniques to replace selectors? >>>>>>>> 3. fixed byte length and type. I agree more that type can be fixed >>>>>>>> bytes, but 2 bytes for length might not be enough for the future. >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>>> wrote: >>>>>>>> >>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>>> wrote: >>>>>>>> >>>>>>>> I know how to make #2 flexible enough to do the things I can >>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>> how the registry of types is managed. >>>>>>>> >>>>>>>> >>>>>>>> Could you share it with us? >>>>>>>> >>>>>>>> Sure. Here's a strawman. >>>>>>>> >>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>> >>>>>>>> The type space is currently shared with the types used for the >>>>>>>> entire protocol, which gives us two options: >>>>>>>> (1) we reserve a range for name component types. Given the >>>>>>>> likelihood there will be at least as much and probably more need >>>>>>>> for component types than protocol extensions, we could reserve 1/2 >>>>>>>> of the type space, giving us 32K types for name components. >>>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>>> and other fields of the protocol (since they are sub-types of the >>>>>>>> name type), we could reuse numbers and thereby have an entire 65K >>>>>>>> name component types. >>>>>>>> >>>>>>>> We divide the type space into regions, and manage it with a >>>>>>>> registry. If we ever get to the point of creating an IETF >>>>>>>> standard, IANA has 25 years of experience running registries and >>>>>>>> there are well-understood rule sets for different kinds of >>>>>>>> registries (open, requires a written spec, requires standards >>>>>>>> approval).
>>>>>>>> >>>>>>>> - We allocate one "default" name component type for "generic >>>>>>>> name", which would be used on name prefixes and other common >>>>>>>> cases where there are no special semantics on the name component. >>>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>>> globally understood types that are part of the base or extension >>>>>>>> NDN specifications (e.g. chunk#, version#, etc.). >>>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>>> (say another 1024 types). >>>>>>>> - We give the rest of the space to application assignment. >>>>>>>> >>>>>>>> Make sense? >>>>>>>> >>>>>>>> >>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>> performance flaws in the design >>>>>>>> >>>>>>>> >>>>>>>> we could design for performance, >>>>>>>> >>>>>>>> That's not what people are advocating. We are advocating that we >>>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>>> Moore's Law will come to the rescue. >>>>>>>> >>>>>>>> but I think there will be a turning >>>>>>>> point when the slower design starts to become "fast enough". >>>>>>>> >>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so >>>>>>>> things that don't get faster while others do tend to get dropped >>>>>>>> or not used because they impose a performance penalty relative to >>>>>>>> the things that go faster. There is also the "low-end" phenomenon >>>>>>>> where improvements in technology get applied to lowering cost >>>>>>>> rather than improving performance. For those environments bad >>>>>>>> performance just never gets better. >>>>>>>> >>>>>>>> Do you >>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>> performance improvement? >>>>>>>> >>>>>>>> I suspect LPM on data will always be slow (relative to the other >>>>>>>> functions).
>>>>>>>> I suspect exclusions will always be slow because they will >>>>>>>> require extra memory references. >>>>>>>> >>>>>>>> However, I of course don't claim clairvoyance, so this is just >>>>>>>> speculation based on 35+ years of seeing performance improve by 4 >>>>>>>> orders of magnitude and still having to worry about counting >>>>>>>> cycles and memory references... >>>>>>>> >>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>> wrote: >>>>>>>> >>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>> wrote: >>>>>>>> >>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>> perform >>>>>>>> well on it. It should be the other way around: once an ndn app >>>>>>>> becomes >>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>> >>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>> performance flaws in the design: >>>>>>>> a) clock rates are not getting (much) faster >>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>> c) data structures that require locks to manipulate >>>>>>>> successfully will be relatively more expensive, even with >>>>>>>> near-zero lock contention. >>>>>>>> >>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>> its design. We just forgot those because the design elements >>>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>>> poster children for this are: >>>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>>> on modern forwarding hardware, so they can't be reliably used >>>>>>>> anywhere >>>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>>> in working around.
>>>>>>>> >>>>>>>> I'm afraid students today are being taught that the designers >>>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>>> engineers who got most of it right. >>>>>>>> >>>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>>> Now I >>>>>>>> see that there are 3 approaches: >>>>>>>> 1. we should not define a naming convention at all >>>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>>> types >>>>>>>> 3. marked component: introduce only one more type and add >>>>>>>> additional >>>>>>>> marker space >>>>>>>> >>>>>>>> I know how to make #2 flexible enough to do the things I can >>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>> how the registry of types is managed. >>>>>>>> >>>>>>>> It is just as powerful in practice as either throwing up our >>>>>>>> hands and letting applications design their own mutually >>>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>>> markers in a way that is fast to generate/parse and also >>>>>>>> resilient against aliasing. >>>>>>>> >>>>>>>> Also, everybody thinks that the current utf8 marker naming >>>>>>>> convention >>>>>>>> needs to be revised. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>>> wrote: >>>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>>> to fit in (the >>>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>>> current NDN >>>>>>>> experiments? >>>>>>>> >>>>>>>> I guess wide deployment could make for even longer names. >>>>>>>> Related: Many URLs >>>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>>> text lines, and >>>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>>> I see.
>>>>>>>> >>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>> >>>>>>>> In fact, the index in a separate TLV will be slower on some >>>>>>>> architectures, >>>>>>>> like the ezChip NP4. The NP4 can hold the first 96 frame >>>>>>>> bytes in memory, >>>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>>> 32-byte blocks >>>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>>> If you need to >>>>>>>> switch between arrays, it would be very expensive. If you >>>>>>>> have to read past >>>>>>>> the name to get to the 2nd array, then read it, then back up >>>>>>>> to get to the >>>>>>>> name, it will be pretty expensive too. >>>>>>>> >>>>>>>> Marc >>>>>>>> >>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>> wrote: >>>>>>>> >>>>>>>> Does this make that much difference? >>>>>>>> >>>>>>>> If you want to parse the first 5 components, one way to do >>>>>>>> it is: >>>>>>>> >>>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>>> from the start >>>>>>>> offset of the beginning of the name. >>>>>>>> OR >>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>> >>>>>>>> How much speed are you getting from one to the other? You >>>>>>>> seem to imply >>>>>>>> that the first one is faster. I don't think this is the >>>>>>>> case. >>>>>>>> >>>>>>>> In the first one you'll probably have to get the cache line >>>>>>>> for the index, >>>>>>>> then all the required cache lines for the first 5 >>>>>>>> components. For the >>>>>>>> second, you'll have to get all the cache lines for the first >>>>>>>> 5 components. >>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>>> than >>>>>>>> evaluating a number and computing an addition, you might >>>>>>>> find that the >>>>>>>> performance of the index is actually slower than the >>>>>>>> performance of the >>>>>>>> direct access.
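The two access patterns being compared can be sketched as follows (illustrative code only: a simplified 1-byte type and length, and an offset index over a flat name buffer, neither of which is any real CCN/NDN wire format):

```python
import struct

def parse_first_x_sequential(buf, x):
    """Walk nested (type, length, value) triples: one length read and one
    pointer move per component, so reaching component x costs O(x) steps."""
    comps, pos = [], 0
    for _ in range(x):
        _t, length = struct.unpack_from("!BB", buf, pos)  # 1-byte T and L
        comps.append(buf[pos + 2:pos + 2 + length])
        pos += 2 + length
    return comps

def parse_first_x_indexed(name_bytes, ends, x):
    """With a fixed-width offset index (cumulative end offsets), each
    component's bounds are known without touching the earlier ones."""
    starts = [0] + list(ends)
    return [name_bytes[starts[i]:ends[i]] for i in range(x)]

# Same two-component name, encoded both ways:
tlv_name = b"\x08\x03abc\x08\x02de"   # T=8, L=3, "abc"; T=8, L=2, "de"
flat_name, ends = b"abcde", [3, 5]    # flat bytes + 1-byte-per-entry index
```

As Nacho notes, both paths still touch the same component bytes, so whether the index wins depends entirely on cache-line behavior, not on instruction counts.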
>>>>>>>> >>>>>>>> Granted, there is a case where you don't access the name at >>>>>>>> all, for >>>>>>>> example, if you just get the offsets and then send the >>>>>>>> offsets as >>>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>>> you may see a >>>>>>>> gain IF there are more cache line misses in reading the name >>>>>>>> than in >>>>>>>> reading the index. So, if the regular part of the name >>>>>>>> that you're >>>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>>> name is to be >>>>>>>> processed by a different processor, then you might see some >>>>>>>> performance >>>>>>>> gain in using the index, but in all other circumstances I >>>>>>>> bet this is not >>>>>>>> the case. I may be wrong; I haven't actually tested it. >>>>>>>> >>>>>>>> This is all to say, I don't think we should be designing the >>>>>>>> protocol with >>>>>>>> only one architecture in mind. (The architecture of sending >>>>>>>> the name to a >>>>>>>> different processor than the index). >>>>>>>> >>>>>>>> If you have numbers that show that the index is faster, I >>>>>>>> would like to see >>>>>>>> under what conditions and architectural assumptions. >>>>>>>> >>>>>>>> Nacho >>>>>>>> >>>>>>>> (I may have misinterpreted your description so feel free to >>>>>>>> correct me if >>>>>>>> I'm wrong.) >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Nacho (Ignacio) Solis >>>>>>>> Protocol Architect >>>>>>>> Principal Scientist >>>>>>>> Palo Alto Research Center (PARC) >>>>>>>> +1(650)812-4458 >>>>>>>> Ignacio.Solis at parc.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Indeed each component's offset must be encoded using a fixed >>>>>>>> amount of >>>>>>>> bytes: >>>>>>>> >>>>>>>> i.e., >>>>>>>> Type = Offsets >>>>>>>> Length = 10 Bytes >>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>> >>>>>>>> You may also imagine having an "Offset_2byte" type if your >>>>>>>> name is too >>>>>>>> long. >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>> >>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>> you only >>>>>>>> want the first x components) you can directly have it using >>>>>>>> the >>>>>>>> offsets. With the Nested TLV structure you have to >>>>>>>> iteratively parse >>>>>>>> the first x-1 components. With the offset structure you can >>>>>>>> directly >>>>>>>> access the first x components. >>>>>>>> >>>>>>>> I don't get it. What you described only works if the >>>>>>>> "offset" is >>>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>>> parse x-1 >>>>>>>> offsets to get to the x-th offset. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>> >>>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>>> like the >>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>>>> understand what >>>>>>>> you >>>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>>> entirely >>>>>>>> different >>>>>>>> scheme where the info that describes the name-components is >>>>>>>> ... >>>>>>>> someplace >>>>>>>> other than _in_ the name-components. is that correct? when >>>>>>>> you say >>>>>>>> "field >>>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>>> TLV)? >>>>>>>> >>>>>>>> Correct. >>>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>>> name >>>>>>>> hierarchy >>>>>>>> with offsets in the name and other TLV(s) indicate the >>>>>>>> offset to use >>>>>>>> in >>>>>>>> order to retrieve special components. >>>>>>>> As for the field separator, it is something like "/".
>>>>>>>> Aliasing is >>>>>>>> avoided as >>>>>>>> you do not rely on field separators to parse the name; you >>>>>>>> use the >>>>>>>> "offset >>>>>>>> TLV" to do that. >>>>>>>> >>>>>>>> So now, it may be an aesthetic question, but: >>>>>>>> >>>>>>>> if you do not need the entire hierarchical structure (suppose >>>>>>>> you only >>>>>>>> want >>>>>>>> the first x components) you can directly have it using the >>>>>>>> offsets. >>>>>>>> With the >>>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>>> x-1 >>>>>>>> components. >>>>>>>> With the offset structure you can directly access the >>>>>>>> first x >>>>>>>> components. >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> >>>>>>>> -- Mark >>>>>>>> >>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> The why is simple: >>>>>>>> >>>>>>>> You use a lot of "generic component type" and very few >>>>>>>> "specific >>>>>>>> component type". You are imposing types for every component >>>>>>>> in order >>>>>>>> to >>>>>>>> handle a few exceptions (segmentation, etc.). You create a >>>>>>>> rule >>>>>>>> (specify >>>>>>>> the component's type) to handle exceptions! >>>>>>>> >>>>>>>> I would prefer not to have typed components. Instead I would >>>>>>>> prefer >>>>>>>> to >>>>>>>> have the name as a simple sequence of bytes with a field >>>>>>>> separator. Then, >>>>>>>> outside the name, if you have some components that could be >>>>>>>> used at >>>>>>>> the network layer (e.g. a TLV field), you simply need something >>>>>>>> that >>>>>>>> indicates the offset allowing you to retrieve the >>>>>>>> version, >>>>>>>> segment, etc. in the name... >>>>>>>> >>>>>>>> >>>>>>>> Max >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>> >>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>> >>>>>>>> I think we agree on the small number of "component types".
>>>>>>>> However, if you have a small number of types, you will end >>>>>>>> up with >>>>>>>> names >>>>>>>> containing many generic component types and few specific >>>>>>>> component >>>>>>>> types. Due to the fact that the component type specification >>>>>>>> is an >>>>>>>> exception in the name, I would prefer something that specifies the >>>>>>>> component's >>>>>>>> type only when needed (something like the UTF8 conventions, but >>>>>>>> ones that >>>>>>>> applications MUST use). >>>>>>>> >>>>>>>> so ... I can't quite follow that. the thread has had some >>>>>>>> explanation >>>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>>> e.g.) >>>>>>>> and >>>>>>>> there's been email trying to explain that applications don't >>>>>>>> have to >>>>>>>> use types if they don't need to. your email sounds like "I >>>>>>>> prefer >>>>>>>> the >>>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>>> preference in >>>>>>>> the face of the points about the problems. can you say why >>>>>>>> it is >>>>>>>> that >>>>>>>> you express a preference for the "convention" with problems? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>>> .
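For reference, the strawman partition of the 16-bit type space proposed earlier in the thread could be sketched as follows. All boundary values here are hypothetical placeholders chosen for illustration, not an agreed allocation:

```python
# Hypothetical partition of a 16-bit name-component type space, following
# the strawman earlier in the thread; the boundary values are made up.
GENERIC = 0x0000                      # the one "default"/generic name type
WELL_KNOWN = range(0x0001, 0x0401)    # ~1024 globally understood types
RESERVED = range(0x0401, 0x0801)      # ~1024 reserved for unanticipated uses
APPLICATION = range(0x0801, 0x10000)  # the remainder for applications

def region(t):
    """Classify a type number into its registry region."""
    if not 0 <= t <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if t == GENERIC:
        return "generic"
    if t in WELL_KNOWN:
        return "well-known"
    if t in RESERVED:
        return "reserved"
    return "application"
```

The point is only that a registry of this shape is cheap to check at parse time; the actual ranges would be decided by whoever manages the registry.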
>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From Ignacio.Solis at parc.com Sat Sep 27 12:40:55 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Sat, 27 Sep 2014 19:40:55 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu> Message-ID: On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote: >> /mail/inbox/selector_matching/ > >So is this implicit? No. This is an explicit hash of the interest payload. So, an interest could look like: Interest: name = /mail/inbox/selector_matching/1234567890 payload = "user=nacho" where hash("user=nacho") = 1234567890 >BTW, I read all your replies.
I think the discovery protocol (send out >table of content) has to reach the original provider; otherwise there >will be some issues in the trust model. At least the cached table of >content has to be confirmed with the original provider either by key >delegation or by other confirmation protocol. Besides this, LGTM. The trust model is just slightly different. You could have something like: Interest: name = /mail/inbox/selector_matching/1234567890 payload = "user=nacho,publisher=mail_server_key" In this case, the reply would come signed by some random cache, but the encapsulated object would be signed by mail_server_key. So, any node that understood the Selector Protocol could decapsulate the reply and check the signature. Nodes that do not understand the Selector Protocol would not be able to check the signature of the encapsulated answer. This to me is not a problem. Base nodes (the ones not running the Selector Protocol) would not be checking signatures anyway, at least not in the fast path. This is an expensive operation that requires the node to get the key, etc. Nodes that run the Selector Protocol can check signatures if they wish (and can get their hands on a key). Nacho >On Sat, Sep 27, 2014 at 1:10 AM, wrote: >> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote: >> >>>On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote: >>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote: >>>> >>>>> How can a cache respond to /mail/inbox/selector_matching/>>>> payload> with a table of content? This name prefix is owned by the >>>>>mail >>>>> server. Also the reply really depends on what is in the cache at >>>>>the >>>>> moment, so the same name would correspond to different data. >>>> >>>> A - Yes, the same name would correspond to different data. This is >>>>true >>>> given that the data has changed.
NDN (and CCN) has no architectural >>>> requirement that a name maps to the same piece of data (Obviously not >>>> talking about self certifying hash-based names). >>> >>>There is a difference. A complete NDN name including the implicit >>>digest >>>uniquely identifies a piece of data. >> >> That?s the same thing for CCN with a ContentObjectHash. >> >> >>>But here the same complete name may map to different data (I suppose you >>>don't have implicit digest in an effort to do exact matching). >> >> We do, it?s called ContentObjectHash, but it?s not considered part of >>the >> name, it?s considered a matching restriction. >> >> >>>In other words, in your proposal, the same name >>>/mail/inbox/selector_matching/hash1 may map to two or more different >>>data >>>packets. But in NDN, two Data packets may share a name prefix, but >>>definitely not the implicit digest. And at least it is my understanding >>>that the application design should make sure that the same producer >>>doesn't produce different Data packets with the same name prefix before >>>implicit digest. >> >> This is an application design issue. The network cannot enforce this. >> Applications will be able to name various data objects with the same >>name. >> After all, applications don?t really control the implicit digest. >> >>>It is possible in attack scenarios for different producers to generate >>>Data packets with the same name prefix before implicit digest, but still >>>not the same implicit digest. >> >> Why is this an attack scenario? Isn?t it true that if I name my local >> printer /printer that name can exist in the network at different >>locations >> from different publishers? >> >> >> Just to clarify, in the examples provided we weren?t using implicit >>hashes >> anywhere. IF we were using implicit hashes (as in, we knew what the >> implicit hash was), then selectors are useless. If you know the implicit >> hash, then you don?t need selectors. 
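The explicit payload-hash naming discussed above (/mail/inbox/selector_matching/1234567890 with the selector predicate carried in the interest payload) can be sketched like this. This is hypothetical illustration code, not the CCNx wire format or hash function:

```python
import hashlib

def selector_interest(prefix, payload):
    """Name a selector query by a digest of its payload so that two
    consumers asking the same question produce the same interest name
    (and can therefore share a cached reply)."""
    digest = int.from_bytes(
        hashlib.sha256(payload.encode()).digest()[:5], "big")
    return {"name": f"{prefix}/selector_matching/{digest}",
            "payload": payload}

a = selector_interest("/mail/inbox", "user=nacho")
b = selector_interest("/mail/inbox", "user=nacho")
# Same payload -> same name, so a cached reply to `a` can answer `b`;
# a different payload yields a different name component.
```

The design choice being illustrated is that the match stays exact (on the full name including the digest) while the expensive predicate lives in the payload, outside the PIT key.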
>> >> In the case of CCN, we use names without explicit hashes for most of our >> initial traffic (discovery, manifests, dynamically generated data, >>etc.), >> but after that, we use implicit digests (ContentObjectHash restriction) >> for practically all of the other traffic. >> >> Nacho >> >> >>>> >>>> B - Yes, you can consider the name prefix is ?owned? by the server, >>>>but >>>> the answer is actually something that the cache is choosing. The cache >>>>is >>>> choosing from the set if data that it has. The data that it >>>>encapsulates >>>> _is_ signed by the producer. Anybody that can decapsulate the data >>>>can >>>> verify that this is the case. >>>> >>>> Nacho >>>> >>>> >>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >>>>> >>>>>> My beating on ?discover all? is exactly because of this. Let?s >>>>>>define >>>>>> discovery service. If the service is just ?discover latest? >>>>>> (left/right), can we not simplify the current approach? If the >>>>>>service >>>>>> includes more than ?latest?, then is the current approach the right >>>>>> approach? >>>>>> >>>>>> Sync has its place and is the right solution for somethings. >>>>>>However, >>>>>> it should not be a a bandage over discovery. Discovery should be >>>>>>its >>>>>> own valid and useful service. >>>>>> >>>>>> I agree that the exclusion approach can work, and work relatively >>>>>>well, >>>>>> for finding the rightmost/leftmost child. I believe this is because >>>>>> that operation is transitive through caches. So, within whatever >>>>>> timeout an application is willing to wait to find the ?latest?, it >>>>>>can >>>>>> keep asking and asking. >>>>>> >>>>>> I do think it would be best to actually try to ask an authoritative >>>>>> source first (i.e. a non-cached value), and if that fails then probe >>>>>> caches, but experimentation may show what works well. 
This is based >>>>>>on >>>>>> my belief that in the real world in broad use, the namespace will >>>>>>become >>>>>> pretty polluted and probing will result in a lot of junk, but >>>>>>that's >>>>>> future prognosticating. >>>>>> >>>>>> Also, in the exact match vs. continuation match of content object to >>>>>> interest, it is pretty easy to encode that "selector" request in a >>>>>>name >>>>>> component (i.e. "exclude_before=(t=version, l=2, v=279) & >>>>>>sort=right") >>>>>> and any participating cache can respond with a link to (or >>>>>>encapsulate) a >>>>>> response in an exact match system. >>>>>> >>>>>> In the CCNx 1.0 spec, one could also encode this a different way. >>>>>>One >>>>>> could use a name like "/mail/inbox/selector_matching/>>>>>payload>" >>>>>> and in the payload include "exclude_before=(t=version, l=2, v=279) & >>>>>> sort=right". This means that any cache that could process the " >>>>>> selector_matching" function could look at the interest payload and >>>>>> evaluate the predicate there. The predicate could become large and >>>>>>not >>>>>> pollute the PIT with all the computation state. Including ">>>>> payload>" in the name means that one could get a cached response if >>>>>> someone else had asked the same exact question (subject to the >>>>>>content >>>>>> object's cache lifetime) and it also serves to multiplex different >>>>>> payloads for the same function (selector_matching). >>>>>> >>>>>> Marc >>>>>> >>>>>> >>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff >>>>>>wrote: >>>>>> >>>>>>> >>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>>>>> >>>>>>> >>>>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html >>>>>>> >>>>>>> J. >>>>>>> >>>>>>> >>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>>>>>> >>>>>>>> However, I cannot see whether we can achieve "best-effort >>>>>>>>*all*-value" >>>>>>>> efficiently.
>>>>>>>> There are still interesting topics on >>>>>>>> 1. how do we express the discovery query? >>>>>>>> 2. is selector "discovery-complete"? i. e. can we express any >>>>>>>> discovery query with current selector? >>>>>>>> 3. if so, can we re-express current selector in a more efficient >>>>>>>>way? >>>>>>>> >>>>>>>> I personally see a named data as a set, which can then be >>>>>>>>categorized >>>>>>>> into "ordered set", and "unordered set". >>>>>>>> some questions that any discovery expression must solve: >>>>>>>> 1. is this a nil set or not? nil set means that this name is the >>>>>>>>leaf >>>>>>>> 2. set contains member X? >>>>>>>> 3. is set ordered or not >>>>>>>> 4. (ordered) first, prev, next, last >>>>>>>> 5. if we enforce component ordering, answer question 4. >>>>>>>> 6. recursively answer all questions above on any set member >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> From: >>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>>>> To: Jeff Burke >>>>>>>>> Cc: , , >>>>>>>>> >>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>> >>>>>>>>> I think Tai-Lin?s example was just fine to talk about discovery. >>>>>>>>> /blah/blah/value, how do you discover all the ?value?s? >>>>>>>>>Discovery >>>>>>>>> shouldn?t >>>>>>>>> care if its email messages or temperature readings or world cup >>>>>>>>> photos. >>>>>>>>> >>>>>>>>> >>>>>>>>> This is true if discovery means "finding everything" - in which >>>>>>>>>case, >>>>>>>>> as you >>>>>>>>> point out, sync-style approaches may be best. But I am not sure >>>>>>>>>that >>>>>>>>> this >>>>>>>>> definition is complete. 
The most pressing example that I can >>>>>>>>>think >>>>>>>>> of >>>>>>>>> is >>>>>>>>> best-effort latest-value, in which the consumer's goal is to get >>>>>>>>>the >>>>>>>>> latest >>>>>>>>> copy the network can deliver at the moment, and may not care >>>>>>>>>about >>>>>>>>> previous >>>>>>>>> values or (if freshness is used well) potential later versions. >>>>>>>>> >>>>>>>>> Another case that seems to work well is video seeking. Let's >>>>>>>>>say I >>>>>>>>> want to >>>>>>>>> enable random access to a video by timecode. The publisher can >>>>>>>>> provide a >>>>>>>>> time-code based discovery namespace that's queried using an >>>>>>>>>Interest >>>>>>>>> that >>>>>>>>> essentially says "give me the closest keyframe to 00:37:03:12", >>>>>>>>>which >>>>>>>>> returns an interest that, via the name, provides the exact >>>>>>>>>timecode >>>>>>>>> of >>>>>>>>> the >>>>>>>>> keyframe in question and a link to a segment-based namespace for >>>>>>>>> efficient >>>>>>>>> exact match playout. In two roundtrips and in a very lightweight >>>>>>>>> way, >>>>>>>>> the >>>>>>>>> consumer has random access capability. If the NDN is the moral >>>>>>>>> equivalent >>>>>>>>> of IP, then I am not sure we should be afraid of roundtrips that >>>>>>>>> provide >>>>>>>>> this kind of functionality, just as they are used in TCP. >>>>>>>>> >>>>>>>>> >>>>>>>>> I described one set of problems using the exclusion approach, and >>>>>>>>> that >>>>>>>>> an >>>>>>>>> NDN paper on device discovery described a similar problem, though >>>>>>>>> they >>>>>>>>> did >>>>>>>>> not go into the details of splitting interests, etc. That all >>>>>>>>>was >>>>>>>>> simple >>>>>>>>> enough to see from the example. >>>>>>>>> >>>>>>>>> Another question is how does one do the discovery with exact >>>>>>>>>match >>>>>>>>> names, >>>>>>>>> which is also conflating things. You could do a different >>>>>>>>>discovery >>>>>>>>> with >>>>>>>>> continuation names too, just not the exclude method. 
>>>>>>>>> >>>>>>>>> As I alluded to, one needs a way to talk with a specific cache >>>>>>>>>about >>>>>>>>> its >>>>>>>>> "table of contents" for a prefix so one can get a consistent set >>>>>>>>>of >>>>>>>>> results >>>>>>>>> without all the round-trips of exclusions. Actually downloading >>>>>>>>>the >>>>>>>>> "headers" of the messages would be the same bytes, more or less. >>>>>>>>>In >>>>>>>>> a >>>>>>>>> way, >>>>>>>>> this is a little like name enumeration from a ccnx 0.x repo, but >>>>>>>>>that >>>>>>>>> protocol has its own set of problems and I'm not suggesting to >>>>>>>>>use >>>>>>>>> that >>>>>>>>> directly. >>>>>>>>> >>>>>>>>> One approach is to encode a request in a name component and a >>>>>>>>> participating >>>>>>>>> cache can reply. It replies in such a way that one could >>>>>>>>>continue >>>>>>>>> talking >>>>>>>>> with that cache to get its TOC. One would then issue another >>>>>>>>> interest >>>>>>>>> with >>>>>>>>> a request for not-that-cache. >>>>>>>>> >>>>>>>>> >>>>>>>>> I'm curious how the TOC approach works in a multi-publisher >>>>>>>>>scenario? >>>>>>>>> >>>>>>>>> >>>>>>>>> Another approach is to try to ask the authoritative source for >>>>>>>>>the >>>>>>>>> "current" >>>>>>>>> manifest name, i.e. /mail/inbox/current/, which could >>>>>>>>>return >>>>>>>>> the >>>>>>>>> manifest or a link to the manifest. Then fetching the actual >>>>>>>>> manifest >>>>>>>>> from >>>>>>>>> the link could come from caches because you now have a consistent >>>>>>>>> set of >>>>>>>>> names to ask for. If you cannot talk with an authoritative >>>>>>>>>source, >>>>>>>>> you >>>>>>>>> could try again without the nonce and see if there's a cached >>>>>>>>>copy >>>>>>>>> of a >>>>>>>>> recent version around.
Marc

On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote:

On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote:

On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote:

For example, I see a pattern /mail/inbox/148. I, a human being, see a pattern with static (/mail/inbox) and variable (148) components; with proper naming convention, computers can also detect this pattern easily. Now I want to look for all mails in my inbox. I can generate a list of /mail/inbox/. These are my guesses, and with selectors I can further refine my guesses.

I think this is a very bad example (or at least a very bad application design). You have an app (a mail server / inbox) and you want it to list your emails? An email list is an application data structure. I don't think you should use the network structure to reflect this.

I think Tai-Lin is trying to sketch a small example, not propose a full-scale approach to email. (Maybe I am misunderstanding.)

Another way to look at it is that if the network architecture is providing the equivalent of distributed storage to the application, perhaps the application data structure could be adapted to match the affordances of the network. Then it would not be so bad that the two structures were aligned.

I'll give you an example, how do you delete emails from your inbox? If an email was cached in the network it can never be deleted from your inbox?
This is conflating two issues - what you are pointing out is that the data structure of a linear list doesn't handle common email management operations well. Again, I'm not sure if that's what he was getting at here. But deletion is not the issue - the availability of a data object on the network does not necessarily mean it's valid from the perspective of the application.

Or moved to another mailbox? Do you rely on the emails expiring?

This problem is true for most (any?) situations where you use network name structure to directly reflect the application data structure.

Not sure I understand how you make the leap from the example to the general statement.

Jeff

Nacho

On Tue, Sep 23, 2014 at 2:34 AM, wrote:

Ok, yes I think those would all be good things.

One thing to keep in mind, especially with things like time series sensor data, is that people see a pattern and infer a way of doing it. That's easy for a human :) But in Discovery, one should assume that one does not know of patterns in the data beyond what the protocols used to publish the data explicitly require. That said, I think some of the things you listed are good places to start: sensor data, web content, climate data or genome data.

We also need to state what the forwarding strategies are and what the cache behavior is.
I outlined some of the points that I think are important in that other posting. While "discover latest" is useful, "discover all" is also important, and that one gets complicated fast. So points like separating discovery from retrieval and working with large data sets have been important in shaping our thinking. That all said, I'd be happy starting from 0 and working through the Discovery service definition from scratch along with data set use cases.

Marc

On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:

Hi Marc,

Thanks - yes, I saw that as well. I was just trying to get one step more specific, which was to see if we could identify a few specific use cases around which to have the conversation. (e.g., time series sensor data and web content retrieval for "get latest"; climate data for huge data sets; local data in a vehicular network; etc.) What have you been looking at that's driving considerations of discovery?

Thanks,
Jeff

From:
Date: Mon, 22 Sep 2014 22:29:43 +0000
To: Jeff Burke
Cc: ,
Subject: Re: [Ndn-interest] any comments on naming convention?

Jeff,

Take a look at my posting (that Felix fixed) in a new thread on Discovery.

http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html

I think it would be very productive to talk about what Discovery should do, and not focus on the how.
It is sometimes easy to get caught up in the how, which I think is a less important topic than the what at this stage.

Marc

On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:

Marc,

If you can't talk about your protocols, perhaps we can discuss this based on use cases. What are the use cases you are using to evaluate discovery?

Jeff

On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:

No matter what the expressiveness of the predicates, if the forwarder can send interests different ways you don't have a consistent underlying set to talk about, so you would always need non-range exclusions to discover every version.

Range exclusions only work, I believe, if you get an authoritative answer. If different content pieces are scattered between different caches, I don't see how range exclusions would work to discover every version.

I'm sorry to be pointing out problems without offering solutions, but we're not ready to publish our discovery protocols.

Sent from my telephone

On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:

I see. Can you briefly describe how the ccnx discovery protocol solves all the problems that you mentioned (not just exclude)? A doc will be better.

My unserious conjecture ( :) ): exclude is equal to [not]. I will soon expect [and] and [or], so boolean algebra is fully supported.
Regular language or context free language might become part of selector too.

On Sat, Sep 20, 2014 at 11:25 PM, wrote:

That will get you one reading, then you need to exclude it and ask again.

Sent from my telephone

On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:

> Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object.

I am very confused. For your example, if I want to get all today's sensor data, I just do (Any..Last second of last day)(First second of tomorrow..Any). That's 18 bytes.

[1] http://named-data.net/doc/ndn-tlv/interest.html#exclude

On Sat, Sep 20, 2014 at 10:55 PM, wrote:

On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:

> If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions.

Could you explain why the missing content object situation happens? Also, range exclusion is just a shorter notation for many explicit excludes; converting from explicit excludes to ranged excludes is always possible.

Yes, my point was that if you cannot talk about a consistent set with a particular cache, then you need to always use individual excludes, not range excludes, if you want to discover all the versions of an object.
For something like a sensor reading that is updated, say, once per second, you will have 86,400 of them per day. If each exclusion is a timestamp (say 8 bytes), that's 691,200 bytes of exclusions (plus encoding overhead) per day.

Yes, maybe using a more deterministic version number than a timestamp makes sense here, but it's just an example of needing a lot of exclusions.

> You exclude through 100 then issue a new interest. This goes to cache B

I feel this case is invalid because cache A will also get the interest, and cache A will return v101 if it exists. Like you said, if this goes to cache B only, it means that cache A dies. How do you know that v101 even exists?

I guess this depends on what the forwarding strategy is. If the forwarder will always send each interest to all replicas, then yes, modulo packet loss, you would discover v101 on cache A. If the forwarder is just doing "best path" and can round-robin between cache A and cache B, then your application could miss v101.

c,d: In general I agree that LPM performance is related to the number of components. In my own thread-safe LPM implementation, I used only one RWMutex for the whole tree. I don't know whether adding a lock for every node will be faster or not because of lock overhead.

However, we should compare (exact match + discovery protocol) vs (ndn lpm). Comparing performance of exact match to lpm is unfair.

Yes, we should compare them. And we need to publish the ccnx 1.0 specs for doing the exact match discovery.
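As a side note, the exclusion-size arithmetic from the sensor example above checks out mechanically (one reading per second, one 8-byte timestamp exclusion each, before any encoding overhead):

```python
# Back-of-the-envelope check of the per-day exclusion cost discussed above.
READINGS_PER_DAY = 24 * 60 * 60   # one reading per second
TIMESTAMP_BYTES = 8               # one 8-byte timestamp per exclusion

exclusion_bytes = READINGS_PER_DAY * TIMESTAMP_BYTES
print(READINGS_PER_DAY, exclusion_bytes)  # 86400 691200
```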
So, as I said, I'm not ready to claim it's better yet because we have not done that.

On Sat, Sep 20, 2014 at 2:38 PM, wrote:

I would point out that using LPM on content object to Interest matching to do discovery has its own set of problems. Discovery involves more than just "latest version" discovery too.

This is probably getting off-topic from the original post about naming conventions.

a. If Interests can be forwarded multiple directions and two different caches are responding, the exclusion set you build up talking with cache A will be invalid for cache B. If you talk sometimes to A and sometimes to B, you very easily could miss content objects you want to discover unless you avoid all range exclusions and only exclude explicit versions. That will lead to very large interest packets. In ccnx 1.0, we believe that an explicit discovery protocol that allows conversations about consistent sets is better.

b. Yes, if you just want the "latest version" discovery, that should be transitive between caches, but imagine this. You send Interest #1 to cache A which returns version 100. You exclude through 100 then issue a new interest. This goes to cache B, who only has version 99, so the interest times out or is NACK'd. So you think you have it! But cache A already has version 101, you just don't know. If you cannot have a conversation around consistent sets, it seems like even doing latest version discovery is difficult with selector based discovery.
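Case (b) can be mimicked with a toy simulation. This is not a real forwarder or selector implementation, just an illustration of the race: a consumer excluding through the highest version seen, while the forwarder round-robins between two caches with different contents.

```python
from itertools import cycle

cache_a = {100, 101}   # cache A holds v100 and, unknown to the consumer, v101
cache_b = {99}         # cache B only has v99

def answer(cache, exclude_through):
    """A cache returns its leftmost version above the excluded range,
    or None to model a timeout/NACK when nothing matches."""
    matching = sorted(v for v in cache if v > exclude_through)
    return matching[0] if matching else None

def discover_latest(caches):
    """Exclude through the highest version seen; stop at the first
    timeout, as an exclude-based consumer would."""
    latest = 0
    for cache in cycle(caches):
        version = answer(cache, latest)
        if version is None:      # consumer concludes it has the latest
            return latest
        latest = version

# Interest #1 -> cache A returns 100; Interest #2 -> cache B times out.
print(discover_latest([cache_a, cache_b]))  # 100, even though v101 exists on A
```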
From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent.

I'm sure you've walked through cases (a) and (b) in ndn, I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) compared to selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above.

c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates, maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write.

d.
In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like.

Marc

On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:

I had thought about these questions, but I want to know your idea besides typed component:
1. LPM allows "data discovery". How will exact match do similar things?
2. will removing selectors improve performance? How do we use other faster techniques to replace selectors?
3. fixed byte length and type. I agree more that type can be fixed byte, but 2 bytes for length might not be enough for future.

On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:

On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:

> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.

Could you share it with us?

Sure. Here's a strawman.

The type space is 16 bits, so you have 65,536 types.

The type space is currently shared with the types used for the entire protocol, which gives us two options:
(1) we reserve a range for name component types. Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components.
(2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have an entire 65K name component types.

We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).

- We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component.
- We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.)
- We reserve some portion of the space for unanticipated uses (say another 1024 types)
- We give the rest of the space to application assignment.

Make sense?

>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design

> we could design for performance,

That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.

> but I think there will be a turning point when the slower design starts to become "fast enough".

Perhaps, perhaps not.
Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better.

> Do you think there will be some design of ndn that will *never* have performance improvement?

I suspect LPM on data will always be slow (relative to the other functions). I suspect exclusions will always be slow because they will require extra memory references.

However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references...

On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:

On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:

> We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once an ndn app becomes popular, a better chip will be designed for ndn.

While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design:
a) clock rates are not getting (much) faster
b) memory accesses are getting (relatively) more expensive
c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere.
2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around.

I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers that got most of it right.

> I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches:
> 1. we should not define a naming convention at all
> 2. typed component: use tlv type space and add a handful of types
> 3. marked component: introduce only one more type and add additional marker space

I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.

It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.

> Also everybody thinks that the current utf8 marker naming convention needs to be revised.

On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:

Would that chip be suitable, i.e.
can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments?

I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see.

On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:

In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then backup to get to the name, it will be pretty expensive too.

Marc

On Sep 18, 2014, at 2:02 PM, wrote:

Does this make that much difference?

If you want to parse the first 5 components, one way to do it is:

Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name.
OR
Start reading the name, (find size + move) 5 times.

How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case.

In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components.
For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access.

Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it.

This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index).

If you have numbers that show that the index is faster, I would like to see under what conditions and architectural assumptions.

Nacho

(I may have misinterpreted your description so feel free to correct me if I'm wrong.)
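The two parsing strategies being compared above can be sketched side by side: walking the component TLVs one by one versus jumping via a separate offset index. The encoding here is a toy 1-byte type / 1-byte length TLV, not the actual NDN or CCNx wire format, and it only illustrates the control flow, not the cache-line behavior under discussion.

```python
def encode_name(components):
    """Encode components as toy TLVs and build a parallel offset index."""
    out = bytearray()
    offsets = []
    for c in components:
        offsets.append(len(out))       # start offset of this component's TLV
        out += bytes([8, len(c)]) + c  # type=8 (generic), length, value
    return bytes(out), offsets

def first_n_sequential(wire, n):
    """Walk the TLVs: (read size + move) n times."""
    comps, pos = [], 0
    for _ in range(n):
        length = wire[pos + 1]
        comps.append(wire[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return comps

def first_n_indexed(wire, offsets, n):
    """Jump straight to each component via the offset index."""
    return [wire[o + 2 : o + 2 + wire[o + 1]] for o in offsets[:n]]

wire, idx = encode_name([b"mail", b"inbox", b"148"])
print(first_n_sequential(wire, 2))  # [b'mail', b'inbox']
print(first_n_indexed(wire, idx, 2))  # [b'mail', b'inbox']
```

Both return the same components; the argument in the thread is purely about which one touches fewer cache lines on a given architecture.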
--
Nacho (Ignacio) Solis
Protocol Architect
Principal Scientist
Palo Alto Research Center (PARC)
+1(650)812-4458
Ignacio.Solis at parc.com

On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:

Indeed each component's offset must be encoded using a fixed amount of bytes:

i.e.,
Type = Offsets
Length = 10 Bytes
Value = Offset1(1byte), Offset2(1byte), ...

You may also imagine to have an "Offset_2byte" type if your name is too long.

Max

On 18/09/2014 09:27, Tai-Lin Chu wrote:

> if you do not need the entire hierarchal structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.

I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x offset.

On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:

On 17/09/2014 14:56, Mark Stapp wrote:

ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. It sounds like you're describing an entirely different scheme where the info that describes the name-components is ...
someplace other than _in_ the name-components. Is that correct? When you say "field separator", what do you mean (since that's not a "TL" from a TLV)?

Correct.
In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name and other TLV(s) indicate the offset to use in order to retrieve special components.
As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that.

So now, it may be an aesthetic question but:

if you do not need the entire hierarchal structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.

Max

-- Mark

On 9/17/14 6:02 AM, Massimo Gallo wrote:

The why is simple:

You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle few exceptions (segmentation, etc..). You create a rule (specify the component's type) to handle exceptions!

I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator.
Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc in the name...

Max

On 16/09/2014 20:33, Mark Stapp wrote:

On 9/16/14 10:29 AM, Massimo Gallo wrote:

I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like UTF8 conventions, but that applications MUST use).

so ... I can't quite follow that. The thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. Your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. Can you say why it is that you express a preference for the "convention" with problems?

Thanks,
Mark
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Ndn-interest mailing list
>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From tailinchu at gmail.com Sat Sep 27 13:19:57 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Sat, 27 Sep 2014 13:19:57 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To:
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com>
 <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
Message-ID:

The concern is that the table of content is not confirmed by the
original provider; the cache server's data is "trusted with some other
chains".
This trust model works but brings restrictions. It basically requires
building another trust model for the "cache server"; otherwise, nothing
discovered can be trusted, which also means that you discover nothing.

Another critical point is that those cache servers are not hierarchical,
so we can only apply flat signing (one guy signs them all). This looks
very problematic. A quick fix is to just impose the name hierarchy, but
that is cumbersome too.

Here is our discussion so far:
exact matching -> ... needs discovery protocol (because it is not lpm)
-> discovery needs table of content -> restrictive trust model

My argument is that this restrictive trust model logically discourages
exact matching.

On Sat, Sep 27, 2014 at 12:40 PM, wrote:
> On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote:
>
>>> /mail/inbox/selector_matching/
>>
>> So is this implicit?
>
> No. This is an explicit hash of the interest payload.
>
> So, an interest could look like:
>
> Interest:
> name = /mail/inbox/selector_matching/1234567890
> payload = "user=nacho"
>
> where hash("user=nacho") = 1234567890
>
>> BTW, I read all your replies. I think the discovery protocol (send
>> out table of content) has to reach the original provider; otherwise
>> there will be some issues in the trust model. At least the cached
>> table of content has to be confirmed with the original provider
>> either by key delegation or by other confirmation protocol. Besides
>> this, LGTM.
>
> The trust model is just slightly different.
>
> You could have something like:
>
> Interest:
> name = /mail/inbox/selector_matching/1234567890
> payload = "user=nacho,publisher=mail_server_key"
>
> In this case, the reply would come signed by some random cache, but
> the encapsulated object would be signed by mail_server_key. So, any
> node that understood the Selector Protocol could decapsulate the
> reply and check the signature.
>
> Nodes that do not understand the Selector Protocol would not be able
> to check the signature of the encapsulated answer.
>
> This to me is not a problem. Base nodes (the ones not running the
> Selector Protocol) would not be checking signatures anyway, at least
> not in the fast path. This is an expensive operation that requires the
> node to get the key, etc. Nodes that run the Selector Protocol can
> check signatures if they wish (and can get their hands on a key).
>
> Nacho
>
>> On Sat, Sep 27, 2014 at 1:10 AM, wrote:
>>> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:
>>>
>>>> On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>>>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>>>>
>>>>>> How can a cache respond to /mail/inbox/selector_matching/<hash of
>>>>>> payload> with a table of content? This name prefix is owned by
>>>>>> the mail server. Also the reply really depends on what is in the
>>>>>> cache at the moment, so the same name would correspond to
>>>>>> different data.
>>>>>
>>>>> A - Yes, the same name would correspond to different data. This is
>>>>> true given that the data has changed. NDN (and CCN) has no
>>>>> architectural requirement that a name maps to the same piece of
>>>>> data (obviously not talking about self-certifying hash-based
>>>>> names).
>>>>
>>>> There is a difference. A complete NDN name including the implicit
>>>> digest uniquely identifies a piece of data.
>>>
>>> That's the same thing for CCN with a ContentObjectHash.
>>>
>>>> But here the same complete name may map to different data (I
>>>> suppose you don't have implicit digest in an effort to do exact
>>>> matching).
>>>
>>> We do, it's called ContentObjectHash, but it's not considered part
>>> of the name, it's considered a matching restriction.
>>>
>>>> In other words, in your proposal, the same name
>>>> /mail/inbox/selector_matching/hash1 may map to two or more
>>>> different data packets.
>>>> But in NDN, two Data packets may share a name prefix, but
>>>> definitely not the implicit digest. And at least it is my
>>>> understanding that the application design should make sure that the
>>>> same producer doesn't produce different Data packets with the same
>>>> name prefix before implicit digest.
>>>
>>> This is an application design issue. The network cannot enforce
>>> this. Applications will be able to name various data objects with
>>> the same name. After all, applications don't really control the
>>> implicit digest.
>>>
>>>> It is possible in attack scenarios for different producers to
>>>> generate Data packets with the same name prefix before implicit
>>>> digest, but still not the same implicit digest.
>>>
>>> Why is this an attack scenario? Isn't it true that if I name my
>>> local printer /printer that name can exist in the network at
>>> different locations from different publishers?
>>>
>>> Just to clarify, in the examples provided we weren't using implicit
>>> hashes anywhere. IF we were using implicit hashes (as in, we knew
>>> what the implicit hash was), then selectors are useless. If you know
>>> the implicit hash, then you don't need selectors.
>>>
>>> In the case of CCN, we use names without explicit hashes for most of
>>> our initial traffic (discovery, manifests, dynamically generated
>>> data, etc.), but after that, we use implicit digests
>>> (ContentObjectHash restriction) for practically all of the other
>>> traffic.
>>>
>>> Nacho
>>>
>>>>> B - Yes, you can consider the name prefix is "owned" by the
>>>>> server, but the answer is actually something that the cache is
>>>>> choosing. The cache is choosing from the set of data that it has.
>>>>> The data that it encapsulates _is_ signed by the producer. Anybody
>>>>> that can decapsulate the data can verify that this is the case.
>>>>>
>>>>> Nacho
>>>>>
>>>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote:
>>>>>>
>>>>>>> My beating on "discover all" is exactly because of this. Let's
>>>>>>> define discovery service. If the service is just "discover
>>>>>>> latest" (left/right), can we not simplify the current approach?
>>>>>>> If the service includes more than "latest", then is the current
>>>>>>> approach the right approach?
>>>>>>>
>>>>>>> Sync has its place and is the right solution for some things.
>>>>>>> However, it should not be a bandage over discovery. Discovery
>>>>>>> should be its own valid and useful service.
>>>>>>>
>>>>>>> I agree that the exclusion approach can work, and work
>>>>>>> relatively well, for finding the rightmost/leftmost child. I
>>>>>>> believe this is because that operation is transitive through
>>>>>>> caches. So, within whatever timeout an application is willing to
>>>>>>> wait to find the "latest", it can keep asking and asking.
>>>>>>>
>>>>>>> I do think it would be best to actually try to ask an
>>>>>>> authoritative source first (i.e. a non-cached value), and if
>>>>>>> that fails then probe caches, but experimentation may show what
>>>>>>> works well. This is based on my belief that in the real world in
>>>>>>> broad use, the namespace will become pretty polluted and probing
>>>>>>> will result in a lot of junk, but that's future prognosticating.
>>>>>>>
>>>>>>> Also, in the exact match vs. continuation match of content
>>>>>>> object to interest, it is pretty easy to encode that "selector"
>>>>>>> request in a name component (i.e. "exclude_before=(t=version,
>>>>>>> l=2, v=279) & sort=right") and any participating cache can
>>>>>>> respond with a link to (or encapsulate) a response in an exact
>>>>>>> match system.
>>>>>>>
>>>>>>> In the CCNx 1.0 spec, one could also encode this a different way.
>>>>>>> One could use a name like
>>>>>>> "/mail/inbox/selector_matching/<hash of payload>" and in the
>>>>>>> payload include "exclude_before=(t=version, l=2, v=279) &
>>>>>>> sort=right". This means that any cache that could process the
>>>>>>> "selector_matching" function could look at the interest payload
>>>>>>> and evaluate the predicate there. The predicate could become
>>>>>>> large and not pollute the PIT with all the computation state.
>>>>>>> Including "<hash of payload>" in the name means that one could
>>>>>>> get a cached response if someone else had asked the same exact
>>>>>>> question (subject to the content object's cache lifetime) and it
>>>>>>> also serves to multiplex different payloads for the same
>>>>>>> function (selector_matching).
>>>>>>>
>>>>>>> Marc
>>>>>>>
>>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote:
>>>>>>>>
>>>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf
>>>>>>>>
>>>>>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote:
>>>>>>>>
>>>>>>>>> However, I cannot see whether we can achieve "best-effort
>>>>>>>>> *all*-value" efficiently.
>>>>>>>>> There are still interesting topics on
>>>>>>>>> 1. how do we express the discovery query?
>>>>>>>>> 2. is selector "discovery-complete"? i.e. can we express any
>>>>>>>>> discovery query with current selectors?
>>>>>>>>> 3. if so, can we re-express current selectors in a more
>>>>>>>>> efficient way?
>>>>>>>>>
>>>>>>>>> I personally see a named data as a set, which can then be
>>>>>>>>> categorized into "ordered set" and "unordered set".
>>>>>>>>> some questions that any discovery expression must solve:
>>>>>>>>> 1. is this a nil set or not? nil set means that this name is
>>>>>>>>> the leaf
>>>>>>>>> 2. set contains member X?
>>>>>>>>> 3. is set ordered or not
>>>>>>>>> 4. (ordered) first, prev, next, last
>>>>>>>>> 5. if we enforce component ordering, answer question 4.
>>>>>>>>> 6. recursively answer all questions above on any set member
>>>>>>>>>
>>>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff wrote:
>>>>>>>>>>
>>>>>>>>>> From:
>>>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000
>>>>>>>>>> To: Jeff Burke
>>>>>>>>>> Cc: , ,
>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>>>>>>
>>>>>>>>>> I think Tai-Lin's example was just fine to talk about
>>>>>>>>>> discovery. /blah/blah/value, how do you discover all the
>>>>>>>>>> "value"s? Discovery shouldn't care if it's email messages or
>>>>>>>>>> temperature readings or world cup photos.
>>>>>>>>>>
>>>>>>>>>> This is true if discovery means "finding everything" - in
>>>>>>>>>> which case, as you point out, sync-style approaches may be
>>>>>>>>>> best. But I am not sure that this definition is complete. The
>>>>>>>>>> most pressing example that I can think of is best-effort
>>>>>>>>>> latest-value, in which the consumer's goal is to get the
>>>>>>>>>> latest copy the network can deliver at the moment, and may
>>>>>>>>>> not care about previous values or (if freshness is used well)
>>>>>>>>>> potential later versions.
>>>>>>>>>>
>>>>>>>>>> Another case that seems to work well is video seeking. Let's
>>>>>>>>>> say I want to enable random access to a video by timecode.
>>>>>>>>>> The publisher can provide a time-code based discovery
>>>>>>>>>> namespace that's queried using an Interest that essentially
>>>>>>>>>> says "give me the closest keyframe to 00:37:03:12", which
>>>>>>>>>> returns an interest that, via the name, provides the exact
>>>>>>>>>> timecode of the keyframe in question and a link to a
>>>>>>>>>> segment-based namespace for efficient exact match playout. In
>>>>>>>>>> two roundtrips and in a very lightweight way, the consumer
>>>>>>>>>> has random access capability. If the NDN is the moral
>>>>>>>>>> equivalent of IP, then I am not sure we should be afraid of
>>>>>>>>>> roundtrips that provide this kind of functionality, just as
>>>>>>>>>> they are used in TCP.
>>>>>>>>>>
>>>>>>>>>> I described one set of problems using the exclusion approach,
>>>>>>>>>> and that an NDN paper on device discovery described a similar
>>>>>>>>>> problem, though they did not go into the details of splitting
>>>>>>>>>> interests, etc. That all was simple enough to see from the
>>>>>>>>>> example.
>>>>>>>>>>
>>>>>>>>>> Another question is how does one do the discovery with exact
>>>>>>>>>> match names, which is also conflating things. You could do a
>>>>>>>>>> different discovery with continuation names too, just not the
>>>>>>>>>> exclude method.
>>>>>>>>>>
>>>>>>>>>> As I alluded to, one needs a way to talk with a specific
>>>>>>>>>> cache about its "table of contents" for a prefix so one can
>>>>>>>>>> get a consistent set of results without all the round-trips
>>>>>>>>>> of exclusions. Actually downloading the "headers" of the
>>>>>>>>>> messages would be the same bytes, more or less.
>>>>>>>>>> In a way, this is a little like name enumeration from a ccnx
>>>>>>>>>> 0.x repo, but that protocol has its own set of problems and
>>>>>>>>>> I'm not suggesting to use that directly.
>>>>>>>>>>
>>>>>>>>>> One approach is to encode a request in a name component and a
>>>>>>>>>> participating cache can reply. It replies in such a way that
>>>>>>>>>> one could continue talking with that cache to get its TOC.
>>>>>>>>>> One would then issue another interest with a request for
>>>>>>>>>> not-that-cache.
>>>>>>>>>>
>>>>>>>>>> I'm curious how the TOC approach works in a multi-publisher
>>>>>>>>>> scenario?
>>>>>>>>>>
>>>>>>>>>> Another approach is to try to ask the authoritative source
>>>>>>>>>> for the "current" manifest name, i.e. /mail/inbox/current/,
>>>>>>>>>> which could return the manifest or a link to the manifest.
>>>>>>>>>> Then fetching the actual manifest from the link could come
>>>>>>>>>> from caches because you now have a consistent set of names to
>>>>>>>>>> ask for. If you cannot talk with an authoritative source, you
>>>>>>>>>> could try again without the nonce and see if there's a cached
>>>>>>>>>> copy of a recent version around.
>>>>>>>>>>
>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote:
>>>>>>>>>>
>>>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote:
>>>>>>>>>>
>>>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote:
>>>>>>>>>>
>>>>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human
>>>>>>>>>> being, see a pattern with static (/mail/inbox) and variable
>>>>>>>>>> (148) components; with proper naming convention, computers
>>>>>>>>>> can also detect this pattern easily.
>>>>>>>>>> Now I want to look for all mails in my inbox. I can generate
>>>>>>>>>> a list of /mail/inbox/. These are my guesses, and with
>>>>>>>>>> selectors I can further refine my guesses.
>>>>>>>>>>
>>>>>>>>>> I think this is a very bad example (or at least a very bad
>>>>>>>>>> application design). You have an app (a mail server / inbox)
>>>>>>>>>> and you want it to list your emails? An email list is an
>>>>>>>>>> application data structure. I don't think you should use the
>>>>>>>>>> network structure to reflect this.
>>>>>>>>>>
>>>>>>>>>> I think Tai-Lin is trying to sketch a small example, not
>>>>>>>>>> propose a full-scale approach to email. (Maybe I am
>>>>>>>>>> misunderstanding.)
>>>>>>>>>>
>>>>>>>>>> Another way to look at it is that if the network architecture
>>>>>>>>>> is providing the equivalent of distributed storage to the
>>>>>>>>>> application, perhaps the application data structure could be
>>>>>>>>>> adapted to match the affordances of the network. Then it
>>>>>>>>>> would not be so bad that the two structures were aligned.
>>>>>>>>>>
>>>>>>>>>> I'll give you an example, how do you delete emails from your
>>>>>>>>>> inbox? If an email was cached in the network it can never be
>>>>>>>>>> deleted from your inbox?
>>>>>>>>>>
>>>>>>>>>> This is conflating two issues - what you are pointing out is
>>>>>>>>>> that the data structure of a linear list doesn't handle
>>>>>>>>>> common email management operations well. Again, I'm not sure
>>>>>>>>>> if that's what he was getting at here. But deletion is not
>>>>>>>>>> the issue - the availability of a data object on the network
>>>>>>>>>> does not necessarily mean it's valid from the perspective of
>>>>>>>>>> the application.
>>>>>>>>>> Or moved to another mailbox? Do you rely on the emails
>>>>>>>>>> expiring?
>>>>>>>>>>
>>>>>>>>>> This problem is true for most (any?) situations where you use
>>>>>>>>>> network name structure to directly reflect the application
>>>>>>>>>> data structure.
>>>>>>>>>>
>>>>>>>>>> Not sure I understand how you make the leap from the example
>>>>>>>>>> to the general statement.
>>>>>>>>>>
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>> Nacho
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote:
>>>>>>>>>>
>>>>>>>>>> Ok, yes I think those would all be good things.
>>>>>>>>>>
>>>>>>>>>> One thing to keep in mind, especially with things like time
>>>>>>>>>> series sensor data, is that people see a pattern and infer a
>>>>>>>>>> way of doing it. That's easy for a human :) But in Discovery,
>>>>>>>>>> one should assume that one does not know of patterns in the
>>>>>>>>>> data beyond what the protocols used to publish the data
>>>>>>>>>> explicitly require. That said, I think some of the things you
>>>>>>>>>> listed are good places to start: sensor data, web content,
>>>>>>>>>> climate data or genome data.
>>>>>>>>>>
>>>>>>>>>> We also need to state what the forwarding strategies are and
>>>>>>>>>> what the cache behavior is.
>>>>>>>>>>
>>>>>>>>>> I outlined some of the points that I think are important in
>>>>>>>>>> that other posting. While "discover latest" is useful,
>>>>>>>>>> "discover all" is also important, and that one gets
>>>>>>>>>> complicated fast. So points like separating discovery from
>>>>>>>>>> retrieval and working with large data sets have been
>>>>>>>>>> important in shaping our thinking.
>>>>>>>>>> That all said, I'd be happy starting from 0 and working
>>>>>>>>>> through the Discovery service definition from scratch along
>>>>>>>>>> with data set use cases.
>>>>>>>>>>
>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Marc,
>>>>>>>>>>
>>>>>>>>>> Thanks - yes, I saw that as well. I was just trying to get
>>>>>>>>>> one step more specific, which was to see if we could identify
>>>>>>>>>> a few specific use cases around which to have the
>>>>>>>>>> conversation. (e.g., time series sensor data and web content
>>>>>>>>>> retrieval for "get latest"; climate data for huge data sets;
>>>>>>>>>> local data in a vehicular network; etc.) What have you been
>>>>>>>>>> looking at that's driving considerations of discovery?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>> From:
>>>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000
>>>>>>>>>> To: Jeff Burke
>>>>>>>>>> Cc: ,
>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention?
>>>>>>>>>>
>>>>>>>>>> Jeff,
>>>>>>>>>>
>>>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread
>>>>>>>>>> on Discovery.
>>>>>>>>>>
>>>>>>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html
>>>>>>>>>>
>>>>>>>>>> I think it would be very productive to talk about what
>>>>>>>>>> Discovery should do, and not focus on the how. It is
>>>>>>>>>> sometimes easy to get caught up in the how, which I think is
>>>>>>>>>> a less important topic than the what at this stage.
>>>>>>>>>>
>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff wrote:
>>>>>>>>>>
>>>>>>>>>> Marc,
>>>>>>>>>>
>>>>>>>>>> If you can't talk about your protocols, perhaps we can
>>>>>>>>>> discuss this based on use cases. What are the use cases you
>>>>>>>>>> are using to evaluate discovery?
>>>>>>>>>>
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" wrote:
>>>>>>>>>>
>>>>>>>>>> No matter what the expressiveness of the predicates, if the
>>>>>>>>>> forwarder can send interests different ways you don't have a
>>>>>>>>>> consistent underlying set to talk about, so you would always
>>>>>>>>>> need non-range exclusions to discover every version.
>>>>>>>>>>
>>>>>>>>>> Range exclusions only work, I believe, if you get an
>>>>>>>>>> authoritative answer. If different content pieces are
>>>>>>>>>> scattered between different caches I don't see how range
>>>>>>>>>> exclusions would work to discover every version.
>>>>>>>>>>
>>>>>>>>>> I'm sorry to be pointing out problems without offering
>>>>>>>>>> solutions, but we're not ready to publish our discovery
>>>>>>>>>> protocols.
>>>>>>>>>>
>>>>>>>>>> Sent from my telephone
>>>>>>>>>>
>>>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" wrote:
>>>>>>>>>>
>>>>>>>>>> I see. Can you briefly describe how the ccnx discovery
>>>>>>>>>> protocol solves all the problems that you mentioned (not just
>>>>>>>>>> exclude)? a doc will be better.
>>>>>>>>>>
>>>>>>>>>> My unserious conjecture( :) ): exclude is equal to [not]. I
>>>>>>>>>> will soon expect [and] and [or], so boolean algebra is fully
>>>>>>>>>> supported. Regular language or context free language might
>>>>>>>>>> become part of selectors too.
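[Editor's illustration of the exclude-based discovery discussed in this thread: an exclude filter mixes explicit component values with open-ended "Any" ranges, and the interest only matches components the filter does not reject. The list layout below is a toy model, not the NDN-TLV Exclude encoding.]

```python
# Toy model of an NDN-style Exclude selector: a sorted list of explicit
# component values and ANY markers. An ANY adjacent to a value extends
# the exclusion to everything on that side of the value.

ANY = object()  # stands for the "Any" marker in an exclude list

def excluded(component, exclude):
    """Return True if `component` is rejected by the exclude list."""
    for i, e in enumerate(exclude):
        if e is ANY:
            lo = exclude[i - 1] if i > 0 else None
            hi = exclude[i + 1] if i + 1 < len(exclude) else None
            if (lo is None or component >= lo) and \
               (hi is None or component <= hi):
                return True
        elif e == component:
            return True
    return False

# "All of today's readings" via two ranges, as in the thread:
# (Any .. last second of yesterday)(first second of tomorrow .. Any)
ex = [ANY, b"20140926", b"20140928", ANY]
assert excluded(b"20140101", ex)       # before the window: excluded
assert not excluded(b"20140927", ex)   # today: still matchable
assert excluded(b"20140929", ex)       # after the window: excluded
```

This also makes the size argument from the thread concrete: two range endpoints cover a whole day, whereas enumerating each already-seen reading as an explicit exclude grows linearly (one entry per reading per day).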
>>>>>>>>>>
>>>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote:
>>>>>>>>>>
>>>>>>>>>> That will get you one reading, then you need to exclude it
>>>>>>>>>> and ask again.
>>>>>>>>>>
>>>>>>>>>> Sent from my telephone
>>>>>>>>>>
>>>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent
>>>>>>>>>> set with a particular cache, then you need to always use
>>>>>>>>>> individual excludes, not range excludes, if you want to
>>>>>>>>>> discover all the versions of an object.
>>>>>>>>>>
>>>>>>>>>> I am very confused. For your example, if I want to get all
>>>>>>>>>> today's sensor data, I just do (Any..Last second of last
>>>>>>>>>> day)(First second of tomorrow..Any). That's 18 bytes.
>>>>>>>>>>
>>>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude
>>>>>>>>>>
>>>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote:
>>>>>>>>>>
>>>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu wrote:
>>>>>>>>>>
>>>>>>>>>> If you talk sometimes to A and sometimes to B, you very
>>>>>>>>>> easily could miss content objects you want to discover unless
>>>>>>>>>> you avoid all range exclusions and only exclude explicit
>>>>>>>>>> versions.
>>>>>>>>>>
>>>>>>>>>> Could you explain why the missing content object situation
>>>>>>>>>> happens? also range exclusion is just a shorter notation for
>>>>>>>>>> many explicit excludes; converting from explicit excludes to
>>>>>>>>>> a ranged exclude is always possible.
>>>>>>>>>>
>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent
>>>>>>>>>> set with a particular cache, then you need to always use
>>>>>>>>>> individual excludes, not range excludes, if you want to
>>>>>>>>>> discover all the versions of an object. For something like a
>>>>>>>>>> sensor reading that is updated, say, once per second you will
>>>>>>>>>> have 86,400 of them per day.
>>>>>>>>>> If each exclusion is a timestamp (say 8 bytes), that's
>>>>>>>>>> 691,200 bytes of exclusions (plus encoding overhead) per day.
>>>>>>>>>>
>>>>>>>>>> yes, maybe using a more deterministic version number than a
>>>>>>>>>> timestamp makes sense here, but it's just an example of
>>>>>>>>>> needing a lot of exclusions.
>>>>>>>>>>
>>>>>>>>>> You exclude through 100 then issue a new interest. This goes
>>>>>>>>>> to cache B
>>>>>>>>>>
>>>>>>>>>> I feel this case is invalid because cache A will also get the
>>>>>>>>>> interest, and cache A will return v101 if it exists. Like you
>>>>>>>>>> said, if this goes to cache B only, it means that cache A
>>>>>>>>>> dies. How do you know that v101 even exists?
>>>>>>>>>>
>>>>>>>>>> I guess this depends on what the forwarding strategy is. If
>>>>>>>>>> the forwarder will always send each interest to all replicas,
>>>>>>>>>> then yes, modulo packet loss, you would discover v101 on
>>>>>>>>>> cache A. If the forwarder is just doing "best path" and can
>>>>>>>>>> round-robin between cache A and cache B, then your
>>>>>>>>>> application could miss v101.
>>>>>>>>>>
>>>>>>>>>> c,d In general I agree that LPM performance is related to the
>>>>>>>>>> number of components. In my own thread-safe LPM
>>>>>>>>>> implementation, I used only one RWMutex for the whole tree. I
>>>>>>>>>> don't know whether adding a lock for every node will be
>>>>>>>>>> faster or not because of lock overhead.
>>>>>>>>>>
>>>>>>>>>> However, we should compare (exact match + discovery protocol)
>>>>>>>>>> vs (ndn lpm). Comparing performance of exact match to lpm is
>>>>>>>>>> unfair.
>>>>>>>>>>
>>>>>>>>>> Yes, we should compare them. And we need to publish the ccnx
>>>>>>>>>> 1.0 specs for doing the exact match discovery.
>>>>>>>>>> So, as I said, I'm not ready to claim it's better yet
>>>>>>>>>> because we have not done that.
>>>>>>>>>>
>>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>>>>>>>>
>>>>>>>>>> I would point out that using LPM on content object to
>>>>>>>>>> Interest matching to do discovery has its own set of
>>>>>>>>>> problems. Discovery involves more than just "latest version"
>>>>>>>>>> discovery too.
>>>>>>>>>>
>>>>>>>>>> This is probably getting off-topic from the original post
>>>>>>>>>> about naming conventions.
>>>>>>>>>>
>>>>>>>>>> a. If Interests can be forwarded in multiple directions and
>>>>>>>>>> two different caches are responding, the exclusion set you
>>>>>>>>>> build up talking with cache A will be invalid for cache B. If
>>>>>>>>>> you talk sometimes to A and sometimes to B, you very easily
>>>>>>>>>> could miss content objects you want to discover unless you
>>>>>>>>>> avoid all range exclusions and only exclude explicit
>>>>>>>>>> versions. That will lead to very large interest packets. In
>>>>>>>>>> ccnx 1.0, we believe that an explicit discovery protocol that
>>>>>>>>>> allows conversations about consistent sets is better.
>>>>>>>>>>
>>>>>>>>>> b. Yes, if you just want the "latest version" discovery that
>>>>>>>>>> should be transitive between caches, but imagine this. You
>>>>>>>>>> send Interest #1 to cache A which returns version 100. You
>>>>>>>>>> exclude through 100 then issue a new interest. This goes to
>>>>>>>>>> cache B who only has version 99, so the interest times out or
>>>>>>>>>> is NACK'd. So you think you have it! But, cache A already has
>>>>>>>>>> version 101, you just don't know. If you cannot have a
>>>>>>>>>> conversation around consistent sets, it seems like even doing
>>>>>>>>>> latest version discovery is difficult with selector based
>>>>>>>>>> discovery.
From what I saw in ccnx 0.x, one ended up getting an Interest all the way to the authoritative source because you can never believe an intermediate cache that there's not something more recent. >>>>>>>>>> >>>>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn; I'd be interested in seeing your analysis. Case (a) is that a node can correctly discover every version of a name prefix, and (b) is that a node can correctly discover the latest version. We have not formally compared (or yet published) our discovery protocols (we have three, 2 for content, 1 for device) against selector based discovery, so I cannot yet claim they are better, but they do not have the non-determinism sketched above. >>>>>>>>>> >>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you must do in the PIT to match a content object. If you have a name tree or a threaded hash table, those don't all need to be hash lookups, but you need to walk up the name tree for every prefix of the content object name and evaluate the selector predicate. Content Based Networking (CBN) had some methods to create data structures based on predicates; maybe those would be better. But in any case, you will potentially need to retrieve many PIT entries if there is Interest traffic for many prefixes of a root. Even on an Intel system, you'll likely miss cache lines, so you'll have a lot of NUMA access for each one. In CCNx 1.0, even a naive implementation only requires at most 3 lookups (one by name, one by name + keyid, one by name + content object hash), and one can do other things to optimize lookup for an extra write. >>>>>>>>>> >>>>>>>>>> d.
In (c) above, if you have a threaded name tree or are just walking parent pointers, I suspect you'll need locking of the ancestors in a multi-threaded system ("threaded" here meaning LWP) and that will be expensive. It would be interesting to see what a cache consistent multi-threaded name tree looks like. >>>>>>>>>> >>>>>>>>>> Marc >>>>>>>>>> >>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote: >>>>>>>>>> >>>>>>>>>> I had thought about these questions, but I want to know your idea besides typed component: >>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar things? >>>>>>>>>> 2. will removing selectors improve performance? How do we use other faster techniques to replace selectors? >>>>>>>>>> 3. fixed byte length and type. I agree more that type can be a fixed byte, but 2 bytes for length might not be enough for the future. >>>>>>>>>> >>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote: >>>>>>>>>> >>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote: >>>>>>>>>> >>>>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed. >>>>>>>>>> >>>>>>>>>> Could you share it with us? >>>>>>>>>> >>>>>>>>>> Sure. Here's a strawman. >>>>>>>>>> >>>>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>>>> >>>>>>>>>> The type space is currently shared with the types used for the entire protocol, which gives us two options: >>>>>>>>>> (1) we reserve a range for name component types.
Given the likelihood there will be at least as much and probably more need for component types than protocol extensions, we could reserve 1/2 of the type space, giving us 32K types for name components. >>>>>>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type) we could reuse numbers and thereby have the entire 65K name component types. >>>>>>>>>> >>>>>>>>>> We divide the type space into regions, and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval). >>>>>>>>>> >>>>>>>>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component. >>>>>>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.). >>>>>>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types). >>>>>>>>>> - We give the rest of the space to application assignment. >>>>>>>>>> >>>>>>>>>> Make sense? >>>>>>>>>> >>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design >>>>>>>>>> >>>>>>>>>> we could design for performance, >>>>>>>>>> >>>>>>>>>> That's not what people are advocating.
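[Editorial aside] Dave's strawman registry above partitions the 16-bit type space into a generic type, a globally-understood range, a reserved range, and an application range. A sketch of that layout; the concrete boundary values are illustrative assumptions, not part of his proposal:

```python
# Sketch of the strawman name-component type registry described above.
# The exact range boundaries here are assumed for illustration.
TYPE_SPACE = 2 ** 16                  # 16-bit type space: 65,536 types

GENERIC = 0                           # one "default" generic name component type
GLOBAL_TYPES = range(1, 1 + 1024)     # ~1024 globally understood types (chunk#, version#, ...)
RESERVED = range(1025, 1025 + 1024)   # ~1024 reserved for unanticipated uses
APP_TYPES = range(2049, TYPE_SPACE)   # the rest goes to application assignment

# The regions partition the type space with nothing left over:
assert 1 + len(GLOBAL_TYPES) + len(RESERVED) + len(APP_TYPES) == TYPE_SPACE
print(len(APP_TYPES))  # 63487 types left for applications
```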
We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue. >>>>>>>>>> >>>>>>>>>> but I think there will be a turning point when the slower design starts to become "fast enough". >>>>>>>>>> >>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments bad performance just never gets better. >>>>>>>>>> >>>>>>>>>> Do you think there will be some design of ndn that will *never* have performance improvement? >>>>>>>>>> >>>>>>>>>> I suspect LPM on data will always be slow (relative to the other functions). >>>>>>>>>> I suspect exclusions will always be slow because they will require extra memory references. >>>>>>>>>> >>>>>>>>>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references. >>>>>>>>>> >>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote: >>>>>>>>>> >>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote: >>>>>>>>>> >>>>>>>>>> We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once ndn apps become popular, a better chip will be designed for ndn.
>>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design: >>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention. >>>>>>>>>> >>>>>>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are: >>>>>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere. >>>>>>>>>> 2. the UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain in working around. >>>>>>>>>> >>>>>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right. >>>>>>>>>> >>>>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches: >>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>> 2. typed component: use tlv type space and add a handful of types >>>>>>>>>> 3. marked component: introduce only one more type and add additional marker space >>>>>>>>>> >>>>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes, or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing. >>>>>>>>>> >>>>>>>>>> Also everybody thinks that the current utf8 marker naming convention needs to be revised. >>>>>>>>>> >>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote: >>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments? >>>>>>>>>> >>>>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see. >>>>>>>>>> >>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>> >>>>>>>>>> In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory, then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too. >>>>>>>>>> >>>>>>>>>> Marc >>>>>>>>>> >>>>>>>>>> On Sep 18, 2014, at 2:02 PM, wrote: >>>>>>>>>> >>>>>>>>>> Does this make that much difference?
>>>>>>>>>> If you want to parse the first 5 components, one way to do it is: >>>>>>>>>> >>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name. >>>>>>>>>> OR >>>>>>>>>> Start reading the name, (find size + move) 5 times. >>>>>>>>>> >>>>>>>>>> How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case. >>>>>>>>>> >>>>>>>>>> In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access. >>>>>>>>>> >>>>>>>>>> Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it.
>>>>>>>>>> This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index.) >>>>>>>>>> >>>>>>>>>> If you have numbers that show that the index is faster I would like to see under what conditions and architectural assumptions. >>>>>>>>>> >>>>>>>>>> Nacho >>>>>>>>>> >>>>>>>>>> (I may have misinterpreted your description so feel free to correct me if I'm wrong.) >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>> Protocol Architect >>>>>>>>>> Principal Scientist >>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>> +1(650)812-4458 >>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>> >>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote: >>>>>>>>>> >>>>>>>>>> Indeed each component's offset must be encoded using a fixed amount of bytes: >>>>>>>>>> >>>>>>>>>> i.e., >>>>>>>>>> Type = Offsets >>>>>>>>>> Length = 10 Bytes >>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>> >>>>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too long. >>>>>>>>>> >>>>>>>>>> Max >>>>>>>>>> >>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>> >>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components. >>>>>>>>>> >>>>>>>>>> I don't get it. What you described only works if the "offset" is encoded in fixed bytes.
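[Editorial aside] The two parsing strategies debated above can be sketched on a toy TLV encoding (1-byte type, 1-byte length; this is illustrative, not the NDN or CCNx wire format):

```python
# Sketch of the two name-parsing strategies discussed above, on a toy
# TLV encoding with a 1-byte type and 1-byte length per component.

def parse_first_x_sequential(buf, x):
    """Nested-TLV style: (find size + move) x times."""
    comps, off = [], 0
    for _ in range(x):
        length = buf[off + 1]                       # read L of this TLV
        comps.append(bytes(buf[off + 2: off + 2 + length]))
        off += 2 + length                           # hop to the next TLV
    return comps

def parse_first_x_indexed(buf, offsets, x):
    """Offset-index style: jump straight to each component's start offset."""
    comps = []
    for i in range(x):
        off = offsets[i]
        length = buf[off + 1]
        comps.append(bytes(buf[off + 2: off + 2 + length]))
    return comps

# Toy two-component name: "ab" then "c"
buf = bytes([1, 2, 0x61, 0x62, 1, 1, 0x63])
offsets = [0, 4]  # fixed-width start offsets, as in Massimo's Offsets TLV idea
assert parse_first_x_sequential(buf, 2) == parse_first_x_indexed(buf, offsets, 2)
```

Both return the same components; the debate above is purely about which costs fewer cache-line fetches, which this sketch does not model.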
With varNum, you will still need to parse x-1 offsets to get to the x-th offset. >>>>>>>>>> >>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote: >>>>>>>>>> >>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>> >>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. it sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. is that correct? when you say "field separator", what do you mean (since that's not a "TL" from a TLV)? >>>>>>>>>> >>>>>>>>>> Correct. >>>>>>>>>> In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name and other TLV(s) indicate the offset to use in order to retrieve special components. >>>>>>>>>> As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that. >>>>>>>>>> >>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>> >>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>>>>>>> Max >>>>>>>>>> >>>>>>>>>> -- Mark >>>>>>>>>> >>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>> >>>>>>>>>> The why is simple: >>>>>>>>>> >>>>>>>>>> You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions! >>>>>>>>>> >>>>>>>>>> I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc. in the name... >>>>>>>>>> >>>>>>>>>> Max >>>>>>>>>> >>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>> >>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>> >>>>>>>>>> I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like UTF8 conventions but that applications MUST use). >>>>>>>>>> >>>>>>>>>> so ... I can't quite follow that.
the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. can you say why it is that you express a preference for the "convention" with problems? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mark >>>>>>>>>> >>>>>>>>>> . >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>> >>> _______________________________________________ >>> Ndn-interest mailing
list >>> Ndn-interest at lists.cs.ucla.edu >>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest From Ignacio.Solis at parc.com Sat Sep 27 14:44:06 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Sat, 27 Sep 2014 21:44:06 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu> Message-ID: On 9/27/14, 10:19 PM, "Tai-Lin Chu" wrote: >The concern is that the table of content is not confirmed by the >original provider; the cache server's data is "trusted with some other >chains". This trust model works but brings restrictions. It basically >requires building another trust model on the "cache server"; otherwise, >nothing discovered can be trusted, which also means that you discover >nothing. I'm not sure what you mean by trust to the cache. NDN has no trust to the cache and no way to trust that a selector match is the correct match. As I know, a cache can have /foo/1 /foo/2 /foo/3 It could reply with /foo/2 and not give you /foo/3 as the "latest". You have no way to trust that a cache will give you anything specific. You can't really require this because you can't require cache nodes to have a specific cache replacement policy, so as far as you know, the cache could have dropped /foo/3 from the cache. As a matter of fact, unless you require signature verification at cache nodes (CCN requires this), you don't even have that. From what I've been told, it's optional for nodes to check signatures. So, at any point in the network, you never know if previous nodes have verified the signature. So, I'm not sure what kind of "trust model" you refer to.
Is there some trust model that this Selector Protocol breaks at the nodes that run the Selector Protocol? If so, could you please explain it. >Another critical point is that those cache servers are not >hierarchical, so we can only apply flat signing (one guy signs them >all.) This looks very problematic. A quick fix is that you just >impose the name hierarchy, but it is cumbersome too. Nobody really cares about the signature of the reply. You care about what's encapsulated inside, which, in fact, does authenticate to the selector request. Every node running the Selector Protocol can check this reply and this signature. >Here is our discussion so far: >exact matching -> ... needs discovery protocol (because it is not lpm) >-> discovery needs table of content -> restrictive trust model >My argument is that this restrictive trust model logically discourages >exact matching. I'm not sure what to make of this. Every system needs a discovery protocol. NDN is doing it via selector matching at the forwarder. CCN does it at a layer above that. We don't believe you should force nodes to let their caches be discoverable and to run the computation needed for this. There is no restrictive trust model. In CCN we don't do any of what I've described because we don't do the Selector Protocol. The Selector Protocol I've just described is meant to give you the same semantics as NDN using exact matching. This includes the security model. Just because the "layer underneath" (aka CCN) does not do the same security model doesn't mean that the protocol doesn't deliver it to you. It seems to me that you'd be hard pressed to find a feature difference between NDN selectors and the CCN nodes running the Selector Protocol I described. Let me go over it once again: Network: A - - - B - - C - - D E - F - + A, B, D, E and F are running the Selector Protocol. C is not.
D is serving content for /foo. B has a copy of /foo/100, signed by D. Node A wants something that starts with /foo/ but has a next component greater than 50. A issues an interest: Interest: name = /foo/hash(Sel(>50)) Payload = Sel(>50) The interest arrives at B. B notices that it's a Selector Protocol based interest. It interprets the payload, looks at the cache and finds /foo/100 as a match. It generates a reply. Data: Name = /foo/hash(Sel(>50)) Payload = ( Data: Name = /foo/100, Signature = Node D, Payload = data ) Signature = Node B That data is sent to node A. A is running the Selector Protocol. It notices that this reply is a Selector Protocol reply. It decapsulates the Payload. It extracts /foo/100. It checks that /foo/100 is signed by Node D. A's interest is satisfied. A issues a new interest (for something newer; it wasn't satisfied with 100). A issues an interest: Interest: name = /foo/hash(Sel(>100)) Payload = Sel(>100) and sends the interest to B. B knows it's a Selector Protocol interest. It parses the payload for the selectors. It looks at the cache, finds no match. B sends the interest to C. C doesn't understand the Selector Protocol. It just does exact matching. It finds no match. It forwards the interest to node D. D is running the Selector Protocol. D looks at the data to see what's the latest one. It's 200. D creates an encapsulated reply. Data: Name = /foo/hash(Sel(>100)) Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data ) Signature = Node D and forwards the data to C. C doesn't know the Selector Protocol. It matches the Data to the Interest based on exact match of the name /foo/hash(Sel(>100)). It may or may not cache the data. C forwards the data to B. B is running the Selector Protocol. It matches the data to the interest based on the PIT. It then proceeds to check that the selectors actually match. It checks that /foo/200 is greater than /foo/100. The check passes. It decides to keep a copy of /foo/200 in its cache.
Node B forwards the data to Node A, which receives it. Node A is running the Selector Protocol. It decapsulates the data, checks its authenticity and hands it to the app. Node E wants some of the /foo data, but only with the right signature. Node E issues an interest: Interest: name = /foo/hash(Sel(Key=NodeD)) Payload = Sel(Key=NodeD) and sends it to Node F. F receives the interest. It knows it's a Selector Protocol interest. It parses the payload, looks in the cache but finds no match. F forwards the interest to node B. B receives the interest. It knows it's a Selector Protocol interest. It parses the payload, looks in the cache and finds a match (namely /foo/200). /foo/200 checks out since it is signed by Node D. Node B creates a reply by encapsulating /foo/200: Name = /foo/hash(Sel(Key=NodeD)) Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data ) Signature = Node B It sends the data to F. Node F is running the Selector Protocol. It sees the reply. It decapsulates the object inside (/foo/200). It knows that this PIT entry has selectors and requires that the signature come from Node D. It checks that the signature of /foo/200 is from node D. It is. This is a valid reply to the interest, so it forwards the data along to node E and consumes the interest. Node F keeps a copy of the /foo/200 object. Node E receives the object. It matches it to the PIT. It decapsulates the data (since E is running the Selector Protocol), matches it to the selectors and once checked sends it to the application. Done. In this scenario, most nodes were running the Selector Protocol. But it's possible for some nodes not to run it. Those nodes would only do exact matching (like node C). In this example, Node C kept a copy of the packet /foo/hash(Sel(>100)) (which encapsulated /foo/200); it could use this as a reply to another interest with the same name, but it wouldn't be able to use it to answer a selector of /foo/hash(Sel(>150)) since that would require selector parsing.
That request would just be forwarded. To summarize, nodes running the Selector Protocol behave like NDN nodes. The rest of the nodes can do regular CCN with exact matching. Again, we are not advocating for this discovery protocol; we are just saying that you could implement the selector functionality on top of exact matching. Those nodes that wanted to run the protocol would be able to do so, and those that did not want to run the protocol would not be required to do so. Nacho >On Sat, Sep 27, 2014 at 12:40 PM, wrote: >> On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote: >> >>>> /mail/inbox/selector_matching/ >>> >>>So is this implicit? >> >> No. This is an explicit hash of the interest payload. >> >> So, an interest could look like: >> >> Interest: >> name = /mail/inbox/selector_matching/1234567890 >> payload = "user=nacho" >> >> where hash("user=nacho") = 1234567890 >> >> >>>BTW, I read all your replies. I think the discovery protocol (send out a >>>table of content) has to reach the original provider; otherwise there >>>will be some issues in the trust model. At least the cached table of >>>content has to be confirmed with the original provider either by key >>>delegation or by other confirmation protocol. Besides this, LGTM. >> >> >> The trust model is just slightly different. >> >> You could have something like: >> >> Interest: >> name = /mail/inbox/selector_matching/1234567890 >> payload = "user=nacho,publisher=mail_server_key" >> >> >> In this case, the reply would come signed by some random cache, but the >> encapsulated object would be signed by mail_server_key. So, any node that >> understood the Selector Protocol could decapsulate the reply and check the >> signature. >> >> Nodes that do not understand the Selector Protocol would not be able to >> check the signature of the encapsulated answer. >> >> This to me is not a problem.
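[Editorial aside] The core mechanics Nacho walks through above — naming the interest by a hash of its selector payload, and encapsulating the producer-signed object in the reply — can be sketched as follows. This is a toy model: the hashing scheme, cache contents, and the `Sel()` predicate are stand-ins, not the CCN wire format or a real signature check.

```python
# Toy sketch of the Selector Protocol exchange described above.
import hashlib

def sel_name(prefix, selector_payload):
    # Interest name = prefix + hash of the selector payload, so that
    # exact-match-only nodes can match replies without parsing selectors.
    digest = hashlib.sha256(selector_payload.encode()).hexdigest()[:10]
    return f"{prefix}/selector_matching/{digest}"

# A Selector-Protocol node's cache: names -> (toy) objects signed by D.
cache = {"/foo/100": "data-signed-by-D", "/foo/200": "data-signed-by-D"}

def answer(prefix, selector_payload, predicate):
    """Parse the selector payload and encapsulate a matching signed object."""
    for name, obj in cache.items():
        if name.startswith(prefix) and predicate(name):
            # The reply's name exactly matches the interest; the original
            # producer-signed object rides inside the payload (encapsulation).
            return {"name": sel_name(prefix, selector_payload),
                    "payload": {"name": name, "data": obj}}
    return None  # no match: forward the interest instead

reply = answer("/foo", "Sel(>150)", lambda n: int(n.rsplit("/", 1)[1]) > 150)
print(reply["payload"]["name"])  # /foo/200
```

A consumer (or any Selector-Protocol node on the path) would then decapsulate `reply["payload"]` and verify the inner object's producer signature, which is the part of the scheme that makes the outer cache signature unimportant.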
Base nodes (the ones not running the Selector Protocol) would not be checking signatures anyway, at least not in the fast path. This is an expensive operation that requires the node to get the key, etc. Nodes that run the Selector Protocol can check signatures if they wish (and can get their hands on a key).

Nacho

>>> On Sat, Sep 27, 2014 at 1:10 AM, wrote:
>>>> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:
>>>>
>>>>> On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>>>>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>>>>>
>>>>>>> How can a cache respond to /mail/inbox/selector_matching/<hash of
>>>>>>> payload> with a table of content? This name prefix is owned by the
>>>>>>> mail server. Also, the reply really depends on what is in the cache
>>>>>>> at the moment, so the same name would correspond to different data.
>>>>>>
>>>>>> A - Yes, the same name would correspond to different data. This is
>>>>>> true given that the data has changed. NDN (and CCN) has no
>>>>>> architectural requirement that a name maps to the same piece of data
>>>>>> (obviously not talking about self-certifying hash-based names).
>>>>>
>>>>> There is a difference. A complete NDN name, including the implicit
>>>>> digest, uniquely identifies a piece of data.
>>>>
>>>> That's the same thing for CCN with a ContentObjectHash.
>>>>
>>>>> But here the same complete name may map to different data (I suppose
>>>>> you don't have an implicit digest in an effort to do exact matching).
>>>>
>>>> We do; it's called ContentObjectHash, but it's not considered part of
>>>> the name, it's considered a matching restriction.
>>>>
>>>>> In other words, in your proposal, the same name
>>>>> /mail/inbox/selector_matching/hash1 may map to two or more different
>>>>> data packets. But in NDN, two Data packets may share a name prefix,
>>>>> but definitely not the implicit digest.
And at least it is my >>>>>understanding >>>>>that the application design should make sure that the same producer >>>>>doesn't produce different Data packets with the same name prefix >>>>>before >>>>>implicit digest. >>>> >>>> This is an application design issue. The network cannot enforce this. >>>> Applications will be able to name various data objects with the same >>>>name. >>>> After all, applications don?t really control the implicit digest. >>>> >>>>>It is possible in attack scenarios for different producers to generate >>>>>Data packets with the same name prefix before implicit digest, but >>>>>still >>>>>not the same implicit digest. >>>> >>>> Why is this an attack scenario? Isn?t it true that if I name my >>>>local >>>> printer /printer that name can exist in the network at different >>>>locations >>>> from different publishers? >>>> >>>> >>>> Just to clarify, in the examples provided we weren?t using implicit >>>>hashes >>>> anywhere. IF we were using implicit hashes (as in, we knew what the >>>> implicit hash was), then selectors are useless. If you know the >>>>implicit >>>> hash, then you don?t need selectors. >>>> >>>> In the case of CCN, we use names without explicit hashes for most of >>>>our >>>> initial traffic (discovery, manifests, dynamically generated data, >>>>etc.), >>>> but after that, we use implicit digests (ContentObjectHash >>>>restriction) >>>> for practically all of the other traffic. >>>> >>>> Nacho >>>> >>>> >>>>>> >>>>>> B - Yes, you can consider the name prefix is ?owned? by the server, >>>>>>but >>>>>> the answer is actually something that the cache is choosing. The >>>>>>cache >>>>>>is >>>>>> choosing from the set if data that it has. The data that it >>>>>>encapsulates >>>>>> _is_ signed by the producer. Anybody that can decapsulate the data >>>>>>can >>>>>> verify that this is the case. 
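The implicit-digest point in the exchange above can be made concrete with a small sketch (a simplification: real NDN computes the digest over the encoded Data packet, and CCN treats the equivalent ContentObjectHash as a matching restriction rather than a name component):

```python
import hashlib

def full_name(prefix, packet_bytes):
    """Append the implicit digest: a hash of the whole Data packet acts
    as a final, implicit name component, so a complete name identifies
    exactly one packet even when name prefixes collide."""
    return f"{prefix}/sha256={hashlib.sha256(packet_bytes).hexdigest()}"

# Two different publishers can both use the prefix /printer, but their
# complete names (prefix + implicit digest) can never collide.
a = full_name("/printer", b"object from publisher 1")
b = full_name("/printer", b"object from publisher 2")
assert a != b                                                  # distinct complete names
assert a == full_name("/printer", b"object from publisher 1")  # deterministic
```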
>>>>>> >>>>>> Nacho >>>>>> >>>>>> >>>>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >>>>>>> >>>>>>>> My beating on ?discover all? is exactly because of this. Let?s >>>>>>>>define >>>>>>>> discovery service. If the service is just ?discover latest? >>>>>>>> (left/right), can we not simplify the current approach? If the >>>>>>>>service >>>>>>>> includes more than ?latest?, then is the current approach the >>>>>>>>right >>>>>>>> approach? >>>>>>>> >>>>>>>> Sync has its place and is the right solution for somethings. >>>>>>>>However, >>>>>>>> it should not be a a bandage over discovery. Discovery should be >>>>>>>>its >>>>>>>> own valid and useful service. >>>>>>>> >>>>>>>> I agree that the exclusion approach can work, and work relatively >>>>>>>>well, >>>>>>>> for finding the rightmost/leftmost child. I believe this is >>>>>>>>because >>>>>>>> that operation is transitive through caches. So, within whatever >>>>>>>> timeout an application is willing to wait to find the ?latest?, it >>>>>>>>can >>>>>>>> keep asking and asking. >>>>>>>> >>>>>>>> I do think it would be best to actually try to ask an >>>>>>>>authoritative >>>>>>>> source first (i.e. a non-cached value), and if that fails then >>>>>>>>probe >>>>>>>> caches, but experimentation may show what works well. This is >>>>>>>>based >>>>>>>>on >>>>>>>> my belief that in the real world in broad use, the namespace will >>>>>>>>become >>>>>>>> pretty polluted and probing will result in a lot of junk, but >>>>>>>>that?s >>>>>>>> future prognosticating. >>>>>>>> >>>>>>>> Also, in the exact match vs. continuation match of content object >>>>>>>>to >>>>>>>> interest, it is pretty easy to encode that ?selector? request in a >>>>>>>>name >>>>>>>> component (i.e. ?exclude_before=(t=version, l=2, v=279) & >>>>>>>>sort=right?) >>>>>>>> and any participating cache can respond with a link (or >>>>>>>>encapsulate) a >>>>>>>> response in an exact match system. 
>>>>>>>> >>>>>>>> In the CCNx 1.0 spec, one could also encode this a different way. >>>>>>>>One >>>>>>>> could use a name like ?/mail/inbox/selector_matching/>>>>>>>payload>? >>>>>>>> and in the payload include "exclude_before=(t=version, l=2, >>>>>>>>v=279) & >>>>>>>> sort=right?. This means that any cache that could process the ? >>>>>>>> selector_matching? function could look at the interest payload and >>>>>>>> evaluate the predicate there. The predicate could become large >>>>>>>>and >>>>>>>>not >>>>>>>> pollute the PIT with all the computation state. Including ?>>>>>>>of >>>>>>>> payload>? in the name means that one could get a cached response >>>>>>>>if >>>>>>>> someone else had asked the same exact question (subject to the >>>>>>>>content >>>>>>>> object?s cache lifetime) and it also servers to multiplex >>>>>>>>different >>>>>>>> payloads for the same function (selector_matching). >>>>>>>> >>>>>>>> Marc >>>>>>>> >>>>>>>> >>>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff >>>>>>>>wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/Synchron >>>>>>>>>iz >>>>>>>>>at >>>>>>>>>io >>>>>>>>> nPr >>>>>>>>> otocol.html >>>>>>>>> >>>>>>>>> J. >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>>>>>>>> >>>>>>>>>> However, I cannot see whether we can achieve "best-effort >>>>>>>>>>*all*-value" >>>>>>>>>> efficiently. >>>>>>>>>> There are still interesting topics on >>>>>>>>>> 1. how do we express the discovery query? >>>>>>>>>> 2. is selector "discovery-complete"? i. e. can we express any >>>>>>>>>> discovery query with current selector? >>>>>>>>>> 3. if so, can we re-express current selector in a more efficient >>>>>>>>>>way? >>>>>>>>>> >>>>>>>>>> I personally see a named data as a set, which can then be >>>>>>>>>>categorized >>>>>>>>>> into "ordered set", and "unordered set". 
>>>>>>>>>> some questions that any discovery expression must solve: >>>>>>>>>> 1. is this a nil set or not? nil set means that this name is the >>>>>>>>>>leaf >>>>>>>>>> 2. set contains member X? >>>>>>>>>> 3. is set ordered or not >>>>>>>>>> 4. (ordered) first, prev, next, last >>>>>>>>>> 5. if we enforce component ordering, answer question 4. >>>>>>>>>> 6. recursively answer all questions above on any set member >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: >>>>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>>>>>> To: Jeff Burke >>>>>>>>>>> Cc: , , >>>>>>>>>>> >>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>>>> >>>>>>>>>>> I think Tai-Lin?s example was just fine to talk about >>>>>>>>>>>discovery. >>>>>>>>>>> /blah/blah/value, how do you discover all the ?value?s? >>>>>>>>>>>Discovery >>>>>>>>>>> shouldn?t >>>>>>>>>>> care if its email messages or temperature readings or world cup >>>>>>>>>>> photos. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This is true if discovery means "finding everything" - in which >>>>>>>>>>>case, >>>>>>>>>>> as you >>>>>>>>>>> point out, sync-style approaches may be best. But I am not >>>>>>>>>>>sure >>>>>>>>>>>that >>>>>>>>>>> this >>>>>>>>>>> definition is complete. The most pressing example that I can >>>>>>>>>>>think >>>>>>>>>>> of >>>>>>>>>>> is >>>>>>>>>>> best-effort latest-value, in which the consumer's goal is to >>>>>>>>>>>get >>>>>>>>>>>the >>>>>>>>>>> latest >>>>>>>>>>> copy the network can deliver at the moment, and may not care >>>>>>>>>>>about >>>>>>>>>>> previous >>>>>>>>>>> values or (if freshness is used well) potential later versions. >>>>>>>>>>> >>>>>>>>>>> Another case that seems to work well is video seeking. Let's >>>>>>>>>>>say I >>>>>>>>>>> want to >>>>>>>>>>> enable random access to a video by timecode. 
The publisher can >>>>>>>>>>> provide a >>>>>>>>>>> time-code based discovery namespace that's queried using an >>>>>>>>>>>Interest >>>>>>>>>>> that >>>>>>>>>>> essentially says "give me the closest keyframe to 00:37:03:12", >>>>>>>>>>>which >>>>>>>>>>> returns an interest that, via the name, provides the exact >>>>>>>>>>>timecode >>>>>>>>>>> of >>>>>>>>>>> the >>>>>>>>>>> keyframe in question and a link to a segment-based namespace >>>>>>>>>>>for >>>>>>>>>>> efficient >>>>>>>>>>> exact match playout. In two roundtrips and in a very >>>>>>>>>>>lightweight >>>>>>>>>>> way, >>>>>>>>>>> the >>>>>>>>>>> consumer has random access capability. If the NDN is the >>>>>>>>>>>moral >>>>>>>>>>> equivalent >>>>>>>>>>> of IP, then I am not sure we should be afraid of roundtrips >>>>>>>>>>>that >>>>>>>>>>> provide >>>>>>>>>>> this kind of functionality, just as they are used in TCP. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I described one set of problems using the exclusion approach, >>>>>>>>>>>and >>>>>>>>>>> that >>>>>>>>>>> an >>>>>>>>>>> NDN paper on device discovery described a similar problem, >>>>>>>>>>>though >>>>>>>>>>> they >>>>>>>>>>> did >>>>>>>>>>> not go into the details of splitting interests, etc. That all >>>>>>>>>>>was >>>>>>>>>>> simple >>>>>>>>>>> enough to see from the example. >>>>>>>>>>> >>>>>>>>>>> Another question is how does one do the discovery with exact >>>>>>>>>>>match >>>>>>>>>>> names, >>>>>>>>>>> which is also conflating things. You could do a different >>>>>>>>>>>discovery >>>>>>>>>>> with >>>>>>>>>>> continuation names too, just not the exclude method. >>>>>>>>>>> >>>>>>>>>>> As I alluded to, one needs a way to talk with a specific cache >>>>>>>>>>>about >>>>>>>>>>> its >>>>>>>>>>> ?table of contents? for a prefix so one can get a consistent >>>>>>>>>>>set >>>>>>>>>>>of >>>>>>>>>>> results >>>>>>>>>>> without all the round-trips of exclusions. Actually >>>>>>>>>>>downloading >>>>>>>>>>>the >>>>>>>>>>> ?headers? 
of the messages would be the same bytes, more or >>>>>>>>>>>less. >>>>>>>>>>>In >>>>>>>>>>> a >>>>>>>>>>> way, >>>>>>>>>>> this is a little like name enumeration from a ccnx 0.x repo, >>>>>>>>>>>but >>>>>>>>>>>that >>>>>>>>>>> protocol has its own set of problems and I?m not suggesting to >>>>>>>>>>>use >>>>>>>>>>> that >>>>>>>>>>> directly. >>>>>>>>>>> >>>>>>>>>>> One approach is to encode a request in a name component and a >>>>>>>>>>> participating >>>>>>>>>>> cache can reply. It replies in such a way that one could >>>>>>>>>>>continue >>>>>>>>>>> talking >>>>>>>>>>> with that cache to get its TOC. One would then issue another >>>>>>>>>>> interest >>>>>>>>>>> with >>>>>>>>>>> a request for not-that-cache. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm curious how the TOC approach works in a multi-publisher >>>>>>>>>>>scenario? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Another approach is to try to ask the authoritative source for >>>>>>>>>>>the >>>>>>>>>>> ?current? >>>>>>>>>>> manifest name, i.e. /mail/inbox/current/, which could >>>>>>>>>>>return >>>>>>>>>>> the >>>>>>>>>>> manifest or a link to the manifest. Then fetching the actual >>>>>>>>>>> manifest >>>>>>>>>>> from >>>>>>>>>>> the link could come from caches because you how have a >>>>>>>>>>>consistent >>>>>>>>>>> set of >>>>>>>>>>> names to ask for. If you cannot talk with an authoritative >>>>>>>>>>>source, >>>>>>>>>>> you >>>>>>>>>>> could try again without the nonce and see if there?s a cached >>>>>>>>>>>copy >>>>>>>>>>> of a >>>>>>>>>>> recent version around. >>>>>>>>>>> >>>>>>>>>>> Marc >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>>>>>>>> >>>>>>>>>>> For example, I see a pattern /mail/inbox/148. 
I, a human being, >>>>>>>>>>>see a >>>>>>>>>>> pattern with static (/mail/inbox) and variable (148) >>>>>>>>>>>components; >>>>>>>>>>>with >>>>>>>>>>> proper naming convention, computers can also detect this >>>>>>>>>>>pattern >>>>>>>>>>> easily. Now I want to look for all mails in my inbox. I can >>>>>>>>>>>generate >>>>>>>>>>> a >>>>>>>>>>> list of /mail/inbox/. These are my guesses, and with >>>>>>>>>>> selectors >>>>>>>>>>> I can further refine my guesses. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think this is a very bad example (or at least a very bad >>>>>>>>>>> application >>>>>>>>>>> design). You have an app (a mail server / inbox) and you want >>>>>>>>>>>it >>>>>>>>>>>to >>>>>>>>>>> list >>>>>>>>>>> your emails? An email list is an application data structure. >>>>>>>>>>>I >>>>>>>>>>> don?t >>>>>>>>>>> think you should use the network structure to reflect this. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think Tai-Lin is trying to sketch a small example, not >>>>>>>>>>>propose >>>>>>>>>>>a >>>>>>>>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Another way to look at it is that if the network architecture >>>>>>>>>>>is >>>>>>>>>>> providing >>>>>>>>>>> the equivalent of distributed storage to the application, >>>>>>>>>>>perhaps >>>>>>>>>>>the >>>>>>>>>>> application data structure could be adapted to match the >>>>>>>>>>>affordances >>>>>>>>>>> of >>>>>>>>>>> the network. Then it would not be so bad that the two >>>>>>>>>>>structures >>>>>>>>>>> were >>>>>>>>>>> aligned. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I?ll give you an example, how do you delete emails from your >>>>>>>>>>>inbox? >>>>>>>>>>> If >>>>>>>>>>> an >>>>>>>>>>> email was cached in the network it can never be deleted from >>>>>>>>>>>your >>>>>>>>>>> inbox? 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This is conflating two issues - what you are pointing out is >>>>>>>>>>>that >>>>>>>>>>>the >>>>>>>>>>> data >>>>>>>>>>> structure of a linear list doesn't handle common email >>>>>>>>>>>management >>>>>>>>>>> operations well. Again, I'm not sure if that's what he was >>>>>>>>>>>getting >>>>>>>>>>> at >>>>>>>>>>> here. But deletion is not the issue - the availability of a >>>>>>>>>>>data >>>>>>>>>>> object >>>>>>>>>>> on the network does not necessarily mean it's valid from the >>>>>>>>>>> perspective >>>>>>>>>>> of the application. >>>>>>>>>>> >>>>>>>>>>> Or moved to another mailbox? Do you rely on the emails >>>>>>>>>>>expiring? >>>>>>>>>>> >>>>>>>>>>> This problem is true for most (any?) situations where you use >>>>>>>>>>>network >>>>>>>>>>> name >>>>>>>>>>> structure to directly reflect the application data structure. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Not sure I understand how you make the leap from the example to >>>>>>>>>>>the >>>>>>>>>>> general statement. >>>>>>>>>>> >>>>>>>>>>> Jeff >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Nacho >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>>>>>>> >>>>>>>>>>> Ok, yes I think those would all be good things. >>>>>>>>>>> >>>>>>>>>>> One thing to keep in mind, especially with things like time >>>>>>>>>>>series >>>>>>>>>>> sensor >>>>>>>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>>>>>>> That?s >>>>>>>>>>> easy >>>>>>>>>>> for a human :) But in Discovery, one should assume that one >>>>>>>>>>>does >>>>>>>>>>>not >>>>>>>>>>> know >>>>>>>>>>> of patterns in the data beyond what the protocols used to >>>>>>>>>>>publish >>>>>>>>>>>the >>>>>>>>>>> data >>>>>>>>>>> explicitly require. That said, I think some of the things you >>>>>>>>>>>listed >>>>>>>>>>> are >>>>>>>>>>> good places to start: sensor data, web content, climate data or >>>>>>>>>>> genome >>>>>>>>>>> data. 
>>>>>>>>>>> >>>>>>>>>>> We also need to state what the forwarding strategies are and >>>>>>>>>>>what >>>>>>>>>>>the >>>>>>>>>>> cache >>>>>>>>>>> behavior is. >>>>>>>>>>> >>>>>>>>>>> I outlined some of the points that I think are important in >>>>>>>>>>>that >>>>>>>>>>> other >>>>>>>>>>> posting. While ?discover latest? is useful, ?discover all? is >>>>>>>>>>>also >>>>>>>>>>> important, and that one gets complicated fast. So points like >>>>>>>>>>> separating >>>>>>>>>>> discovery from retrieval and working with large data sets have >>>>>>>>>>>been >>>>>>>>>>> important in shaping our thinking. That all said, I?d be happy >>>>>>>>>>> starting >>>>>>>>>>> from 0 and working through the Discovery service definition >>>>>>>>>>>from >>>>>>>>>>> scratch >>>>>>>>>>> along with data set use cases. >>>>>>>>>>> >>>>>>>>>>> Marc >>>>>>>>>>> >>>>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Marc, >>>>>>>>>>> >>>>>>>>>>> Thanks ? yes, I saw that as well. I was just trying to get one >>>>>>>>>>>step >>>>>>>>>>> more >>>>>>>>>>> specific, which was to see if we could identify a few specific >>>>>>>>>>>use >>>>>>>>>>> cases >>>>>>>>>>> around which to have the conversation. (e.g., time series >>>>>>>>>>>sensor >>>>>>>>>>> data >>>>>>>>>>> and >>>>>>>>>>> web content retrieval for "get latest"; climate data for huge >>>>>>>>>>>data >>>>>>>>>>> sets; >>>>>>>>>>> local data in a vehicular network; etc.) What have you been >>>>>>>>>>>looking >>>>>>>>>>> at >>>>>>>>>>> that's driving considerations of discovery? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Jeff >>>>>>>>>>> >>>>>>>>>>> From: >>>>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>>>>>> To: Jeff Burke >>>>>>>>>>> Cc: , >>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>>>> >>>>>>>>>>> Jeff, >>>>>>>>>>> >>>>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>>>>>> Discovery. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-Septemb >>>>>>>>>>>er >>>>>>>>>>>/0 >>>>>>>>>>>00 >>>>>>>>>>> 20 >>>>>>>>>>> 0 >>>>>>>>>>> .html >>>>>>>>>>> >>>>>>>>>>> I think it would be very productive to talk about what >>>>>>>>>>>Discovery >>>>>>>>>>> should >>>>>>>>>>> do, >>>>>>>>>>> and not focus on the how. It is sometimes easy to get caught >>>>>>>>>>>up >>>>>>>>>>>in >>>>>>>>>>> the >>>>>>>>>>> how, >>>>>>>>>>> which I think is a less important topic than the what at this >>>>>>>>>>>stage. >>>>>>>>>>> >>>>>>>>>>> Marc >>>>>>>>>>> >>>>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Marc, >>>>>>>>>>> >>>>>>>>>>> If you can't talk about your protocols, perhaps we can discuss >>>>>>>>>>>this >>>>>>>>>>> based >>>>>>>>>>> on use cases. What are the use cases you are using to >>>>>>>>>>>evaluate >>>>>>>>>>> discovery? >>>>>>>>>>> >>>>>>>>>>> Jeff >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> No matter what the expressiveness of the predicates if the >>>>>>>>>>>forwarder >>>>>>>>>>> can >>>>>>>>>>> send interests different ways you don't have a consistent >>>>>>>>>>>underlying >>>>>>>>>>> set >>>>>>>>>>> to talk about so you would always need non-range exclusions to >>>>>>>>>>> discover >>>>>>>>>>> every version. >>>>>>>>>>> >>>>>>>>>>> Range exclusions only work I believe if you get an >>>>>>>>>>>authoritative >>>>>>>>>>> answer. >>>>>>>>>>> If different content pieces are scattered between different >>>>>>>>>>>caches >>>>>>>>>>>I >>>>>>>>>>> don't see how range exclusions would work to discover every >>>>>>>>>>>version. >>>>>>>>>>> >>>>>>>>>>> I'm sorry to be pointing out problems without offering >>>>>>>>>>>solutions >>>>>>>>>>>but >>>>>>>>>>> we're not ready to publish our discovery protocols. 
>>>>>>>>>>> >>>>>>>>>>> Sent from my telephone >>>>>>>>>>> >>>>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" >>>>>>>>>>>wrote: >>>>>>>>>>> >>>>>>>>>>> I see. Can you briefly describe how ccnx discovery protocol >>>>>>>>>>>solves >>>>>>>>>>> the >>>>>>>>>>> all problems that you mentioned (not just exclude)? a doc will >>>>>>>>>>>be >>>>>>>>>>> better. >>>>>>>>>>> >>>>>>>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I >>>>>>>>>>>will >>>>>>>>>>> soon >>>>>>>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>>>>>>> Regular >>>>>>>>>>> language or context free language might become part of selector >>>>>>>>>>>too. >>>>>>>>>>> >>>>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>>>>>>> That will get you one reading then you need to exclude it and >>>>>>>>>>>ask >>>>>>>>>>> again. >>>>>>>>>>> >>>>>>>>>>> Sent from my telephone >>>>>>>>>>> >>>>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" >>>>>>>>>>>wrote: >>>>>>>>>>> >>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>set >>>>>>>>>>> with a particular cache, then you need to always use individual >>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>versions >>>>>>>>>>> of an object. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I am very confused. For your example, if I want to get all >>>>>>>>>>>today's >>>>>>>>>>> sensor data, I just do (Any..Last second of last day)(First >>>>>>>>>>>second >>>>>>>>>>>of >>>>>>>>>>> tomorrow..Any). That's 18 bytes. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>>>>>> >>>>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>>>>>> >>>>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>>>>>>wrote: >>>>>>>>>>> >>>>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>>>>>> could miss content objects you want to discovery unless you >>>>>>>>>>>avoid >>>>>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Could you explain why missing content object situation happens? >>>>>>>>>>>also >>>>>>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>>>>>> exclude; >>>>>>>>>>> converting from explicit excludes to ranged exclude is always >>>>>>>>>>> possible. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>set >>>>>>>>>>> with a particular cache, then you need to always use individual >>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>versions >>>>>>>>>>> of an object. For something like a sensor reading that is >>>>>>>>>>>updated, >>>>>>>>>>> say, once per second you will have 86,400 of them per day. If >>>>>>>>>>>each >>>>>>>>>>> exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of >>>>>>>>>>> exclusions (plus encoding overhead) per day. >>>>>>>>>>> >>>>>>>>>>> yes, maybe using a more deterministic version number than a >>>>>>>>>>> timestamp makes sense here, but its just an example of needing >>>>>>>>>>>a >>>>>>>>>>>lot >>>>>>>>>>> of exclusions. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> You exclude through 100 then issue a new interest. This goes >>>>>>>>>>>to >>>>>>>>>>> cache B >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I feel this case is invalid because cache A will also get the >>>>>>>>>>> interest, and cache A will return v101 if it exists. 
Like you >>>>>>>>>>>said, >>>>>>>>>>> if >>>>>>>>>>> this goes to cache B only, it means that cache A dies. How do >>>>>>>>>>>you >>>>>>>>>>> know >>>>>>>>>>> that v101 even exist? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I guess this depends on what the forwarding strategy is. If >>>>>>>>>>>the >>>>>>>>>>> forwarder will always send each interest to all replicas, then >>>>>>>>>>>yes, >>>>>>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>>>>>> forwarder is just doing ?best path? and can round-robin between >>>>>>>>>>>cache >>>>>>>>>>> A and cache B, then your application could miss v101. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> c,d In general I agree that LPM performance is related to the >>>>>>>>>>>number >>>>>>>>>>> of components. In my own thread-safe LMP implementation, I used >>>>>>>>>>>only >>>>>>>>>>> one RWMutex for the whole tree. I don't know whether adding >>>>>>>>>>>lock >>>>>>>>>>>for >>>>>>>>>>> every node will be faster or not because of lock overhead. >>>>>>>>>>> >>>>>>>>>>> However, we should compare (exact match + discovery protocol) >>>>>>>>>>>vs >>>>>>>>>>> (ndn >>>>>>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yes, we should compare them. And we need to publish the ccnx >>>>>>>>>>>1.0 >>>>>>>>>>> specs for doing the exact match discovery. So, as I said, I?m >>>>>>>>>>>not >>>>>>>>>>> ready to claim its better yet because we have not done that. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>>>>>> I would point out that using LPM on content object to Interest >>>>>>>>>>> matching to do discovery has its own set of problems. >>>>>>>>>>>Discovery >>>>>>>>>>> involves more than just ?latest version? discovery too. >>>>>>>>>>> >>>>>>>>>>> This is probably getting off-topic from the original post about >>>>>>>>>>> naming conventions. >>>>>>>>>>> >>>>>>>>>>> a. 
If Interests can be forwarded multiple directions and two >>>>>>>>>>> different caches are responding, the exclusion set you build up >>>>>>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>>>>>> content objects you want to discovery unless you avoid all >>>>>>>>>>>range >>>>>>>>>>> exclusions and only exclude explicit versions. That will lead >>>>>>>>>>>to >>>>>>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>>>>>> explicit discovery protocol that allows conversations about >>>>>>>>>>> consistent sets is better. >>>>>>>>>>> >>>>>>>>>>> b. Yes, if you just want the ?latest version? discovery that >>>>>>>>>>> should be transitive between caches, but imagine this. You >>>>>>>>>>>send >>>>>>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>>>>>> through 100 then issue a new interest. This goes to cache B >>>>>>>>>>>who >>>>>>>>>>> only has version 99, so the interest times out or is NACK?d. >>>>>>>>>>>So >>>>>>>>>>> you think you have it! But, cache A already has version 101, >>>>>>>>>>>you >>>>>>>>>>> just don?t know. If you cannot have a conversation around >>>>>>>>>>> consistent sets, it seems like even doing latest version >>>>>>>>>>>discovery >>>>>>>>>>> is difficult with selector based discovery. From what I saw in >>>>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>>>>>> authoritative source because you can never believe an >>>>>>>>>>>intermediate >>>>>>>>>>> cache that there?s not something more recent. >>>>>>>>>>> >>>>>>>>>>> I?m sure you?ve walked through cases (a) and (b) in ndn, I?d be >>>>>>>>>>> interest in seeing your analysis. Case (a) is that a node can >>>>>>>>>>> correctly discover every version of a name prefix, and (b) is >>>>>>>>>>>that >>>>>>>>>>> a node can correctly discover the latest version. 
We have not >>>>>>>>>>> formally compared (or yet published) our discovery protocols >>>>>>>>>>>(we >>>>>>>>>>> have three, 2 for content, 1 for device) compared to selector >>>>>>>>>>>based >>>>>>>>>>> discovery, so I cannot yet claim they are better, but they do >>>>>>>>>>>not >>>>>>>>>>> have the non-determinism sketched above. >>>>>>>>>>> >>>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups >>>>>>>>>>>you >>>>>>>>>>> must do in the PIT to match a content object. If you have a >>>>>>>>>>>name >>>>>>>>>>> tree or a threaded hash table, those don?t all need to be hash >>>>>>>>>>> lookups, but you need to walk up the name tree for every prefix >>>>>>>>>>>of >>>>>>>>>>> the content object name and evaluate the selector predicate. >>>>>>>>>>> Content Based Networking (CBN) had some some methods to create >>>>>>>>>>>data >>>>>>>>>>> structures based on predicates, maybe those would be better. >>>>>>>>>>>But >>>>>>>>>>> in any case, you will potentially need to retrieve many PIT >>>>>>>>>>>entries >>>>>>>>>>> if there is Interest traffic for many prefixes of a root. Even >>>>>>>>>>>on >>>>>>>>>>> an Intel system, you?ll likely miss cache lines, so you?ll >>>>>>>>>>>have a >>>>>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>>>>> implementation only requires at most 3 lookups (one by name, >>>>>>>>>>>one >>>>>>>>>>>by >>>>>>>>>>> name + keyid, one by name + content object hash), and one can >>>>>>>>>>>do >>>>>>>>>>> other things to optimize lookup for an extra write. >>>>>>>>>>> >>>>>>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>>>>>> walking parent pointers, I suspect you?ll need locking of the >>>>>>>>>>> ancestors in a multi-threaded system (?threaded" here meaning >>>>>>>>>>>LWP) >>>>>>>>>>> and that will be expensive. It would be interesting to see >>>>>>>>>>>what >>>>>>>>>>>a >>>>>>>>>>> cache consistent multi-threaded name tree looks like. 
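Point (c) above — the bounded lookup count under exact matching versus walking every prefix under LPM — can be sketched with a toy dictionary PIT (a hypothetical key layout for illustration, not the CCNx wire format or a real forwarder's index):

```python
def pit_match(pit, name, keyid, obj_hash):
    """At most three exact-match lookups for an arriving content object:
    by name + ContentObjectHash, by name + KeyId, then by name alone --
    versus evaluating selector predicates on every prefix of the name."""
    for key in ((name, None, obj_hash),   # interests with a hash restriction
                (name, keyid, None),      # interests with a KeyId restriction
                (name, None, None)):      # plain interests by name
        if key in pit:
            return pit[key]
    return None

# One pending interest, expressed by name only.
pit = {("/foo/bar", None, None): "pending-interest-1"}
assert pit_match(pit, "/foo/bar", "key-A", "hash-X") == "pending-interest-1"
assert pit_match(pit, "/foo/baz", "key-A", "hash-X") is None
```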
>>>>>>>>>>> Marc
>>>>>>>>>>>
>>>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>
>>>>>>>>>>> I had thought about these questions, but I want to know your
>>>>>>>>>>> idea besides typed components:
>>>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>>>>>>>>> things?
>>>>>>>>>>> 2. Will removing selectors improve performance? How do we use
>>>>>>>>>>> other, faster techniques to replace selectors?
>>>>>>>>>>> 3. Fixed byte length and type. I agree more that type can be a
>>>>>>>>>>> fixed byte, but 2 bytes for length might not be enough for the
>>>>>>>>>>> future.
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>
>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can
>>>>>>>>>>> envision we need to do, and with a few simple conventions on
>>>>>>>>>>> how the registry of types is managed.
>>>>>>>>>>>
>>>>>>>>>>> Could you share it with us?
>>>>>>>>>>>
>>>>>>>>>>> Sure. Here's a strawman.
>>>>>>>>>>>
>>>>>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>>>>>
>>>>>>>>>>> The type space is currently shared with the types used for the
>>>>>>>>>>> entire protocol, which gives us two options:
>>>>>>>>>>> (1) we reserve a range for name component types. Given the
>>>>>>>>>>> likelihood there will be at least as much and probably more need
>>>>>>>>>>> for component types than protocol extensions, we could reserve
>>>>>>>>>>> 1/2 of the type space, giving us 32K types for name components.
>>>>>>>>>>> (2) since there is no parsing ambiguity between name components and other fields of the protocol (since they are sub-types of the name type), we could reuse numbers and thereby have the entire 65K of name component types.
>>>>>>>>>>>
>>>>>>>>>>> We divide the type space into regions and manage it with a registry. If we ever get to the point of creating an IETF standard, IANA has 25 years of experience running registries, and there are well-understood rule sets for different kinds of registries (open, requires a written spec, requires standards approval).
>>>>>>>>>>>
>>>>>>>>>>> - We allocate one "default" name component type for "generic name", which would be used on name prefixes and other common cases where there are no special semantics on the name component.
>>>>>>>>>>> - We allocate a range of name component types, say 1024, to globally understood types that are part of the base or extension NDN specifications (e.g. chunk#, version#, etc.).
>>>>>>>>>>> - We reserve some portion of the space for unanticipated uses (say another 1024 types).
>>>>>>>>>>> - We give the rest of the space to application assignment.
>>>>>>>>>>>
>>>>>>>>>>> Make sense?
>>>>>>>>>>>
>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design
>>>>>>>>>>>
>>>>>>>>>>> we could design for performance,
>>>>>>>>>>>
>>>>>>>>>>> That's not what people are advocating. We are advocating that we *not* design for known bad performance and hope serendipity or Moore's Law will come to the rescue.
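Dave's registry carve-up might look like this in outline (the specific ranges below are illustrative placeholders, not an actual allocation):

```python
# Hypothetical partition of a 16-bit name-component type space along the
# lines of the strawman: one generic type, 1024 globally understood base
# types, 1024 reserved, and the remainder for application assignment.
GENERIC_NAME = 0x0000                   # the single "default" generic type
BASE_TYPES   = range(0x0001, 0x0401)    # 1024 base/extension spec types
RESERVED     = range(0x0401, 0x0801)    # 1024 held for unanticipated uses
APP_TYPES    = range(0x0801, 0x10000)   # rest: application assignment

def classify(t):
    """Map a 16-bit component type code to its registry region."""
    if t == GENERIC_NAME:
        return "generic"
    if t in BASE_TYPES:
        return "base"
    if t in RESERVED:
        return "reserved"
    if t in APP_TYPES:
        return "application"
    raise ValueError("outside 16-bit type space")
```

The four regions tile the whole 16-bit space, which is the point of managing it with a registry: every code has exactly one owner.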
>>>>>>>>>>>
>>>>>>>>>>> but I think there will be a turning point when the slower design starts to become "fast enough".
>>>>>>>>>>>
>>>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so things that don't get faster while others do tend to get dropped or not used, because they impose a performance penalty relative to the things that go faster. There is also the "low-end" phenomenon where improvements in technology get applied to lowering cost rather than improving performance. For those environments, bad performance just never gets better.
>>>>>>>>>>>
>>>>>>>>>>> Do you think there will be some design of ndn that will *never* have performance improvement?
>>>>>>>>>>>
>>>>>>>>>>> I suspect LPM on data will always be slow (relative to the other functions).
>>>>>>>>>>> I suspect exclusions will always be slow because they will require extra memory references.
>>>>>>>>>>>
>>>>>>>>>>> However, I of course don't claim clairvoyance, so this is just speculation based on 35+ years of seeing performance improve by 4 orders of magnitude and still having to worry about counting cycles and memory references.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>
>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to perform well on it. It should be the other way around: once ndn apps become popular, a better chip will be designed for ndn.
>>>>>>>>>>>
>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in which Moore's law or hardware tricks will not save us from performance flaws in the design:
>>>>>>>>>>> a) clock rates are not getting (much) faster
>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>>>>>>> c) data structures that require locks to manipulate successfully will be relatively more expensive, even with near-zero lock contention.
>>>>>>>>>>>
>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in its design. We just forgot those because the design elements that depended on those mistakes have fallen into disuse. The poster children for this are:
>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow on modern forwarding hardware, so they can't be reliably used anywhere.
>>>>>>>>>>> 2. The UDP checksum, which was a bad design when it was specified and is now a giant PITA that still causes major pain to work around.
>>>>>>>>>>>
>>>>>>>>>>> I'm afraid students today are being taught that the designers of IP were flawless, as opposed to very good scientists and engineers who got most of it right.
>>>>>>>>>>>
>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. Now I see that there are 3 approaches:
>>>>>>>>>>> 1. we should not define a naming convention at all
>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of types
>>>>>>>>>>> 3. marked component: introduce only one more type and add additional marker space
>>>>>>>>>>>
>>>>>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>>>>>>
>>>>>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes, or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.
>>>>>>>>>>>
>>>>>>>>>>> Also, everybody thinks that the current utf8 marker naming convention needs to be revised.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments?
>>>>>>>>>>>
>>>>>>>>>>> I guess wide deployment could make for even longer names. Related: many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see.
>>>>>>>>>>>
>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>>>>>>
>>>>>>>>>>> In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory; then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time). If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too.
>>>>>>>>>>>
>>>>>>>>>>> Marc
>>>>>>>>>>>
>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>>>>>>>>
>>>>>>>>>>> Does this make that much difference?
>>>>>>>>>>>
>>>>>>>>>>> If you want to parse the first 5 components, one way to do it is:
>>>>>>>>>>>
>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name.
>>>>>>>>>>> OR
>>>>>>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>>>>>>
>>>>>>>>>>> How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case.
>>>>>>>>>>>
>>>>>>>>>>> In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access.
>>>>>>>>>>>
>>>>>>>>>>> Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc. In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?)
and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it.
>>>>>>>>>>>
>>>>>>>>>>> This is all to say, I don't think we should be designing the protocol with only one architecture in mind (the architecture of sending the name to a different processor than the index).
>>>>>>>>>>>
>>>>>>>>>>> If you have numbers that show that the index is faster, I would like to see under what conditions and architectural assumptions.
>>>>>>>>>>>
>>>>>>>>>>> Nacho
>>>>>>>>>>>
>>>>>>>>>>> (I may have misinterpreted your description, so feel free to correct me if I'm wrong.)
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Nacho (Ignacio) Solis
>>>>>>>>>>> Protocol Architect
>>>>>>>>>>> Principal Scientist
>>>>>>>>>>> Palo Alto Research Center (PARC)
>>>>>>>>>>> +1(650)812-4458
>>>>>>>>>>> Ignacio.Solis at parc.com
>>>>>>>>>>>
>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>>>>>>>>
>>>>>>>>>>> Indeed, each component's offset must be encoded using a fixed amount of bytes:
>>>>>>>>>>>
>>>>>>>>>>> i.e.,
>>>>>>>>>>> Type = Offsets
>>>>>>>>>>> Length = 10 Bytes
>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>>>>>
>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too long.
>>>>>>>>>>>
>>>>>>>>>>> Max
>>>>>>>>>>>
>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>>>>>>
>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets.
With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>>>>>>>>
>>>>>>>>>>> I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x-th offset.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>>>>>>
>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. It sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. Is that correct? When you say "field separator", what do you mean (since that's not a "TL" from a TLV)?
>>>>>>>>>>>
>>>>>>>>>>> Correct.
>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name, and other TLV(s) indicate the offset to use in order to retrieve special components.
>>>>>>>>>>> As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that.
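The trade-off under discussion, sequential TLV walking versus a fixed-width offset table, can be sketched as follows (a toy 1-byte type / 1-byte length encoding invented for illustration; this is not the NDN or CCNx wire format):

```python
# With nested TLVs you must walk n entries to reach component n; with a
# fixed-width offset table you jump straight to it. As Tai-Lin notes,
# this only works if each offset itself has a fixed size.

def nth_component_sequential(buf, n):
    """Walk n TLVs (1-byte type, 1-byte length) to reach component n."""
    pos = 0
    for _ in range(n):
        length = buf[pos + 1]
        pos += 2 + length            # skip type, length, and value
    length = buf[pos + 1]
    return buf[pos + 2 : pos + 2 + length]

def nth_component_indexed(buf, offsets, n):
    """With fixed-size offsets, jump directly to component n."""
    pos = offsets[n]
    length = buf[pos + 1]
    return buf[pos + 2 : pos + 2 + length]
```

Note that building `offsets` still requires one sequential pass somewhere; the savings only appear when the index is computed once and reused (or shipped to another processor), which is exactly Nacho's caveat about cache lines.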
>>>>>>>>>>>
>>>>>>>>>>> So now, it may be an aesthetic question, but:
>>>>>>>>>>>
>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>>>>>>>>
>>>>>>>>>>> Max
>>>>>>>>>>>
>>>>>>>>>>> -- Mark
>>>>>>>>>>>
>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>>>>>>
>>>>>>>>>>> The why is simple:
>>>>>>>>>>>
>>>>>>>>>>> You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions!
>>>>>>>>>>>
>>>>>>>>>>> I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator. Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates which is the offset allowing you to retrieve the version, segment, etc. in the name...
>>>>>>>>>>>
>>>>>>>>>>> Max
>>>>>>>>>>>
>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think we agree on the small number of "component types".
>>>>>>>>>>>
>>>>>>>>>>> However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies the component's type only when needed (something like the UTF8 conventions, but that applications MUST use).
>>>>>>>>>>>
>>>>>>>>>>> so ... I can't quite follow that. The thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. Your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. Can you say why it is that you express a preference for the "convention" with problems?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Ndn-interest mailing list
>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From tailinchu at gmail.com Sat Sep 27 15:09:32 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Sat, 27 Sep 2014 15:09:32 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
Message-ID: 

> I'm not sure what you mean by trust to the cache. NDN has no trust to the cache and no way to trust that a selector match is the correct match.

ndn does not allow a cache to publish new data under another provider's prefix, i.e., a table of content, but your discovery protocol is doing this.

On Sat, Sep 27, 2014 at 2:44 PM, wrote:
> On 9/27/14, 10:19 PM, "Tai-Lin Chu" wrote:
>
>>The concern is that the table of content is not confirmed by the original provider; the cache server's data is "trusted with some other chains". This trust model works but brings restrictions. It basically requires building another trust model on the "cache server"; otherwise, nothing discovered can be trusted, which also means that you discover nothing.
>
> I'm not sure what you mean by trust to the cache. NDN has no trust to the cache and no way to trust that a selector match is the correct match.
>
> As I know, a cache can have
>
> /foo/1
> /foo/2
> /foo/3
>
> It could reply with /foo/2 and not give you /foo/3 as the "latest". You have no way to trust that a cache will give you anything specific. You can't really require this because you can't require cache nodes to have a specific cache replacement policy, so as far as you know, the cache could have dropped /foo/3 from the cache.
>
> As a matter of fact, unless you require signature verification at cache nodes (CCN requires this), you don't even have that. From what I've been told, it's optional for nodes to check signatures. So, at any point in the network, you never know if previous nodes have verified the signature.
>
> So, I'm not sure what kind of "trust model" you refer to. Is there some trust model that this Selector Protocol breaks at the nodes that run the Selector Protocol? If so, could you please explain it.
>
>>Another critical point is that those cache servers are not hierarchical, so we can only apply flat signing (one guy signs them all). This looks very problematic. A quick fix is to just impose the name hierarchy, but that is cumbersome too.
>
> Nobody really cares about the signature of the reply. You care about what's encapsulated inside, which, in fact, does authenticate to the selector request. Every node running the Selector Protocol can check this reply and this signature.
>
>>Here is our discussion so far:
>>exact matching -> ... needs discovery protocol (because it is not lpm) -> discovery needs table of content -> restrictive trust model
>>My argument is that this restrictive trust model logically discourages exact matching.
>
> I'm not sure what to make of this.
>
> Every system needs a discovery protocol. NDN is doing it via selector matching at the forwarder. CCN does it at a layer above that. We don't believe you should force nodes to let their caches be discoverable and to run the computation needed for this.
>
> There is no restrictive trust model. In CCN we don't do anything of what I've described because we don't do the Selector Protocol. The Selector Protocol I've just described is meant to give you the same semantics as NDN using exact matching. This includes the security model. Just because the "layer underneath" (aka CCN) does not do the same security model doesn't mean that the protocol doesn't deliver it to you.
>
> It seems to me that you'd be hard pressed to find a feature difference between NDN selectors and CCN nodes running the Selector Protocol I described.
>
> Let me go over it once again:
>
> Network:
>
> A - - - B - - C - - D
> E - F - +
>
> A, B, D, E and F are running the Selector Protocol. C is not.
> D is serving content for /foo
> B has a copy of /foo/100, signed by D
>
> Node A wants something that starts with /foo/ but has a next component greater than 50
>
> A issues an interest:
> Interest: name = /foo/hash(Sel(>50))
> Payload = Sel(>50)
>
> The interest arrives at B. B notices that it's a Selector Protocol based interest.
> It interprets the payload, looks at the cache and finds /foo/100 as a match.
> It generates a reply.
>
> Data:
> Name = /foo/hash(Sel(>50))
> Payload = ( Data: Name = /foo/100, Signature = Node D, Payload = data )
> Signature = Node B
>
> That data is sent to node A.
>
> A is running the Selector Protocol.
> It notices that this reply is a Selector Protocol reply.
> It decapsulates the Payload. It extracts /foo/100.
> It checks that /foo/100 is signed by Node D.
>
> A's interest is satisfied.
>
> A issues a new interest (for something newer; it wasn't satisfied with 100).
>
> A issues an interest:
> Interest: name = /foo/hash(Sel(>100))
> Payload = Sel(>100)
>
> Sends interest to B.
>
> B knows it's a Selector Protocol interest.
> It parses the payload for the selectors. It looks at the cache, finds no match.
>
> B sends the interest to C.
>
> C doesn't understand the Selector Protocol. It just does exact matching. It finds no match.
> It forwards the interest to node D.
>
> D is running the Selector Protocol.
> D looks at the data to see what's the latest one. It's 200.
> D creates an encapsulated reply.
>
> Data:
> Name = /foo/hash(Sel(>100))
> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data )
> Signature = Node D
>
> Forwards the data to C.
>
> C doesn't know the Selector Protocol. It matches the Data to the Interest based on exact match of the name /foo/hash(Sel(>100)). It may or may not cache the data.
>
> C forwards the data to B.
>
> B is running the Selector Protocol. It matches the data to the interest based on the PIT. It then proceeds to check that the selectors actually
It check that /foo/200 is greater than /foo/100. The check > passes. It decides to keep a copy of /foo/200 in its cache. > > Node B forwards the data to Node A, which receives it. Node A is running > the Selector Protocol. It decapsulates the data, checks the authenticity > and hands it to the app. > > > > Node E wants some of the /foo data, but only with the right signature. > > Node E issues an interest: > Interest: name = /foo/hash(Sel(Key=NodeD)) > Payload = Sel(Key=NodeD) > > > > Sends it to Node F. > > > F receives the interest. It knows it?s a Selector Protocol interest. > Parses payload, looks in cache but finds no match. > > F forwards the interest to node B. > > B receives the interest. It knows it?s a Selector Protocol interest. > Parses payload, looks in the cache and finds a match (namely /foo/200). > /foo/200 checks out since it is signed by Node D. > > Node B creates a reply by encapsulating /foo/200: > Name = /foo/hash(Sel(Key=NodeD)) > Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data) > Signature = Node B > > It sends the data to F. > > Node F is running the Selector Protocol. It sees the reply. It > decapsulates the object inside (/foo/200). It knows that this PIT entry > has selectors and requires that the signature come from Node D. It checks > that the signature of /foo/200 is from node D. It is. This is a valid > reply to the interest, so it forwards the data along to node E and > consumes the interest. Node F keeps a copy of the /foo/200 object. > > Node E receives the object. Matches it to the PIT. Decapsulates the data > (since E is running the Selector Protocol), matches it to the selectors > and once checked sends it to the application. > > > > Done. > > In this scenario, most nodes were running the Selector Protocol. But it?s > possible for some nodes not to run it. Those nodes would only do exact > matching (like node C). 
In this example, Node C kept a copy of the packet /foo/hash(Sel(>100)) (which encapsulated /foo/200); it could use this as a reply to another interest with the same name, but it wouldn't be able to use it to answer a selector of /foo/hash(Sel(>150)), since that would require selector parsing. That request would just be forwarded.
>
> To summarize, nodes running the Selector Protocol behave like NDN nodes. The rest of the nodes can do regular CCN with exact matching.
>
> Again, we are not advocating for this discovery protocol; we are just saying that you could implement the selector functionality on top of exact matching. Those nodes that wanted to run the protocol would be able to do so, and those that did not want to run the protocol would not be required to do so.
>
> Nacho
>
>>On Sat, Sep 27, 2014 at 12:40 PM, wrote:
>>> On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote:
>>>
>>>>> /mail/inbox/selector_matching/
>>>>
>>>>So is this implicit?
>>>
>>> No. This is an explicit hash of the interest payload.
>>>
>>> So, an interest could look like:
>>>
>>> Interest:
>>> name = /mail/inbox/selector_matching/1234567890
>>> payload = "user=nacho"
>>>
>>> where hash("user=nacho") = 1234567890
>>>
>>>>BTW, I read all your replies. I think the discovery protocol (send out table of content) has to reach the original provider; otherwise there will be some issues in the trust model. At least the cached table of content has to be confirmed with the original provider, either by key delegation or by some other confirmation protocol. Besides this, LGTM.
>>>
>>> The trust model is just slightly different.
>>>
>>> You could have something like:
>>>
>>> Interest:
>>> name = /mail/inbox/selector_matching/1234567890
>>> payload = "user=nacho,publisher=mail_server_key"
>>>
>>> In this case, the reply would come signed by some random cache, but the encapsulated object would be signed by mail_server_key.
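The hash-named interest construction described above can be sketched as follows (illustrative only; the thread does not specify a hash function, so SHA-256 truncated to 16 hex digits is an assumption here, and the dict is a stand-in for a real interest packet):

```python
import hashlib

def selector_interest(prefix, selector_payload):
    """Name the interest by a hash of its selector payload, so the same
    query from any consumer maps to the same (cacheable) name."""
    digest = hashlib.sha256(selector_payload.encode()).hexdigest()[:16]
    return {"name": "%s/selector_matching/%s" % (prefix, digest),
            "payload": selector_payload}
```

Because the name is derived from the payload, two consumers asking the same question produce the same name and can share a cached reply, while a different predicate (e.g. adding a publisher restriction) yields a different name.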
So, any node that understood the Selector Protocol could decapsulate the reply and check the signature.
>>>
>>> Nodes that do not understand the Selector Protocol would not be able to check the signature of the encapsulated answer.
>>>
>>> This to me is not a problem. Base nodes (the ones not running the Selector Protocol) would not be checking signatures anyway, at least not in the fast path. This is an expensive operation that requires the node to get the key, etc. Nodes that run the Selector Protocol can check signatures if they wish (and can get their hands on a key).
>>>
>>> Nacho
>>>
>>>>On Sat, Sep 27, 2014 at 1:10 AM, wrote:
>>>>> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:
>>>>>
>>>>>>On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>>>>>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>>>>>>
>>>>>>>> How can a cache respond to /mail/inbox/selector_matching/<hash of payload> with a table of content? This name prefix is owned by the mail server. Also, the reply really depends on what is in the cache at the moment, so the same name would correspond to different data.
>>>>>>>
>>>>>>> A - Yes, the same name would correspond to different data. This is true given that the data has changed. NDN (and CCN) has no architectural requirement that a name maps to the same piece of data (obviously not talking about self-certifying hash-based names).
>>>>>>
>>>>>>There is a difference. A complete NDN name including the implicit digest uniquely identifies a piece of data.
>>>>>
>>>>> That's the same thing for CCN with a ContentObjectHash.
>>>>>
>>>>>>But here the same complete name may map to different data (I suppose you don't have implicit digest in an effort to do exact matching).
We do, it's called ContentObjectHash, but it's not considered part of the name; it's considered a matching restriction.
>>>>>
>>>>>>In other words, in your proposal, the same name /mail/inbox/selector_matching/hash1 may map to two or more different data packets. But in NDN, two Data packets may share a name prefix, but definitely not the implicit digest. And at least it is my understanding that the application design should make sure that the same producer doesn't produce different Data packets with the same name prefix before implicit digest.
>>>>>
>>>>> This is an application design issue. The network cannot enforce this. Applications will be able to name various data objects with the same name. After all, applications don't really control the implicit digest.
>>>>>
>>>>>>It is possible in attack scenarios for different producers to generate Data packets with the same name prefix before implicit digest, but still not the same implicit digest.
>>>>>
>>>>> Why is this an attack scenario? Isn't it true that if I name my local printer /printer, that name can exist in the network at different locations from different publishers?
>>>>>
>>>>> Just to clarify, in the examples provided we weren't using implicit hashes anywhere. IF we were using implicit hashes (as in, we knew what the implicit hash was), then selectors are useless. If you know the implicit hash, then you don't need selectors.
>>>>>
>>>>> In the case of CCN, we use names without explicit hashes for most of our initial traffic (discovery, manifests, dynamically generated data, etc.), but after that, we use implicit digests (ContentObjectHash restriction) for practically all of the other traffic.
>>>>>
>>>>> Nacho
>>>>>
>>>>>>>
>>>>>>> B - Yes, you can consider the name prefix is "owned"
by the server, >>>>>>>but >>>>>>> the answer is actually something that the cache is choosing. The >>>>>>>cache >>>>>>>is >>>>>>> choosing from the set if data that it has. The data that it >>>>>>>encapsulates >>>>>>> _is_ signed by the producer. Anybody that can decapsulate the data >>>>>>>can >>>>>>> verify that this is the case. >>>>>>> >>>>>>> Nacho >>>>>>> >>>>>>> >>>>>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >>>>>>>> >>>>>>>>> My beating on ?discover all? is exactly because of this. Let?s >>>>>>>>>define >>>>>>>>> discovery service. If the service is just ?discover latest? >>>>>>>>> (left/right), can we not simplify the current approach? If the >>>>>>>>>service >>>>>>>>> includes more than ?latest?, then is the current approach the >>>>>>>>>right >>>>>>>>> approach? >>>>>>>>> >>>>>>>>> Sync has its place and is the right solution for somethings. >>>>>>>>>However, >>>>>>>>> it should not be a a bandage over discovery. Discovery should be >>>>>>>>>its >>>>>>>>> own valid and useful service. >>>>>>>>> >>>>>>>>> I agree that the exclusion approach can work, and work relatively >>>>>>>>>well, >>>>>>>>> for finding the rightmost/leftmost child. I believe this is >>>>>>>>>because >>>>>>>>> that operation is transitive through caches. So, within whatever >>>>>>>>> timeout an application is willing to wait to find the ?latest?, it >>>>>>>>>can >>>>>>>>> keep asking and asking. >>>>>>>>> >>>>>>>>> I do think it would be best to actually try to ask an >>>>>>>>>authoritative >>>>>>>>> source first (i.e. a non-cached value), and if that fails then >>>>>>>>>probe >>>>>>>>> caches, but experimentation may show what works well. This is >>>>>>>>>based >>>>>>>>>on >>>>>>>>> my belief that in the real world in broad use, the namespace will >>>>>>>>>become >>>>>>>>> pretty polluted and probing will result in a lot of junk, but >>>>>>>>>that?s >>>>>>>>> future prognosticating. >>>>>>>>> >>>>>>>>> Also, in the exact match vs. 
continuation match of content object >>>>>>>>>to >>>>>>>>> interest, it is pretty easy to encode that ?selector? request in a >>>>>>>>>name >>>>>>>>> component (i.e. ?exclude_before=(t=version, l=2, v=279) & >>>>>>>>>sort=right?) >>>>>>>>> and any participating cache can respond with a link (or >>>>>>>>>encapsulate) a >>>>>>>>> response in an exact match system. >>>>>>>>> >>>>>>>>> In the CCNx 1.0 spec, one could also encode this a different way. >>>>>>>>>One >>>>>>>>> could use a name like ?/mail/inbox/selector_matching/>>>>>>>>payload>? >>>>>>>>> and in the payload include "exclude_before=(t=version, l=2, >>>>>>>>>v=279) & >>>>>>>>> sort=right?. This means that any cache that could process the ? >>>>>>>>> selector_matching? function could look at the interest payload and >>>>>>>>> evaluate the predicate there. The predicate could become large >>>>>>>>>and >>>>>>>>>not >>>>>>>>> pollute the PIT with all the computation state. Including ?>>>>>>>>of >>>>>>>>> payload>? in the name means that one could get a cached response >>>>>>>>>if >>>>>>>>> someone else had asked the same exact question (subject to the >>>>>>>>>content >>>>>>>>> object?s cache lifetime) and it also servers to multiplex >>>>>>>>>different >>>>>>>>> payloads for the same function (selector_matching). >>>>>>>>> >>>>>>>>> Marc >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff >>>>>>>>>wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/Synchron >>>>>>>>>>iz >>>>>>>>>>at >>>>>>>>>>io >>>>>>>>>> nPr >>>>>>>>>> otocol.html >>>>>>>>>> >>>>>>>>>> J. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote: >>>>>>>>>> >>>>>>>>>>> However, I cannot see whether we can achieve "best-effort >>>>>>>>>>>*all*-value" >>>>>>>>>>> efficiently. >>>>>>>>>>> There are still interesting topics on >>>>>>>>>>> 1. 
how do we express the discovery query? >>>>>>>>>>> 2. is selector "discovery-complete"? i. e. can we express any >>>>>>>>>>> discovery query with current selector? >>>>>>>>>>> 3. if so, can we re-express current selector in a more efficient >>>>>>>>>>>way? >>>>>>>>>>> >>>>>>>>>>> I personally see a named data as a set, which can then be >>>>>>>>>>>categorized >>>>>>>>>>> into "ordered set", and "unordered set". >>>>>>>>>>> some questions that any discovery expression must solve: >>>>>>>>>>> 1. is this a nil set or not? nil set means that this name is the >>>>>>>>>>>leaf >>>>>>>>>>> 2. set contains member X? >>>>>>>>>>> 3. is set ordered or not >>>>>>>>>>> 4. (ordered) first, prev, next, last >>>>>>>>>>> 5. if we enforce component ordering, answer question 4. >>>>>>>>>>> 6. recursively answer all questions above on any set member >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> From: >>>>>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>> Cc: , , >>>>>>>>>>>> >>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>>>>> >>>>>>>>>>>> I think Tai-Lin?s example was just fine to talk about >>>>>>>>>>>>discovery. >>>>>>>>>>>> /blah/blah/value, how do you discover all the ?value?s? >>>>>>>>>>>>Discovery >>>>>>>>>>>> shouldn?t >>>>>>>>>>>> care if its email messages or temperature readings or world cup >>>>>>>>>>>> photos. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This is true if discovery means "finding everything" - in which >>>>>>>>>>>>case, >>>>>>>>>>>> as you >>>>>>>>>>>> point out, sync-style approaches may be best. But I am not >>>>>>>>>>>>sure >>>>>>>>>>>>that >>>>>>>>>>>> this >>>>>>>>>>>> definition is complete. 
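Tai-Lin's six questions about named data as ordered/unordered sets could be captured by a minimal interface like the following. This is an illustrative toy over an in-memory sorted list, not code from any NDN library; the class and method names are invented for this sketch.

```python
import bisect

class OrderedNameSet:
    """Toy model of one namespace level, answering the six discovery
    questions for the ordered case: nil? member? ordered? first/prev/next/last."""

    def __init__(self, components=()):
        self._items = sorted(components)

    def is_nil(self):        # question 1: is this name a leaf (empty set)?
        return not self._items

    def contains(self, x):   # question 2: set membership
        i = bisect.bisect_left(self._items, x)
        return i < len(self._items) and self._items[i] == x

    def is_ordered(self):    # question 3: trivially true in this toy model
        return True

    def first(self):         # question 4: ordered navigation
        return self._items[0]

    def last(self):
        return self._items[-1]

    def next(self, x):
        i = bisect.bisect_right(self._items, x)
        return self._items[i] if i < len(self._items) else None

    def prev(self, x):
        i = bisect.bisect_left(self._items, x)
        return self._items[i - 1] if i > 0 else None

s = OrderedNameSet(["v100", "v099", "v101"])
assert not s.is_nil() and s.contains("v100")
assert s.first() == "v099" and s.last() == "v101"
assert s.next("v100") == "v101" and s.prev("v100") == "v099"
```

Question 6 (recursing into set members) would amount to nesting such sets per name component; a discovery protocol would answer these queries over the network rather than in memory.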
The most pressing example that I can >>>>>>>>>>>>think >>>>>>>>>>>> of >>>>>>>>>>>> is >>>>>>>>>>>> best-effort latest-value, in which the consumer's goal is to >>>>>>>>>>>>get >>>>>>>>>>>>the >>>>>>>>>>>> latest >>>>>>>>>>>> copy the network can deliver at the moment, and may not care >>>>>>>>>>>>about >>>>>>>>>>>> previous >>>>>>>>>>>> values or (if freshness is used well) potential later versions. >>>>>>>>>>>> >>>>>>>>>>>> Another case that seems to work well is video seeking. Let's >>>>>>>>>>>>say I >>>>>>>>>>>> want to >>>>>>>>>>>> enable random access to a video by timecode. The publisher can >>>>>>>>>>>> provide a >>>>>>>>>>>> time-code based discovery namespace that's queried using an >>>>>>>>>>>>Interest >>>>>>>>>>>> that >>>>>>>>>>>> essentially says "give me the closest keyframe to 00:37:03:12", >>>>>>>>>>>>which >>>>>>>>>>>> returns an interest that, via the name, provides the exact >>>>>>>>>>>>timecode >>>>>>>>>>>> of >>>>>>>>>>>> the >>>>>>>>>>>> keyframe in question and a link to a segment-based namespace >>>>>>>>>>>>for >>>>>>>>>>>> efficient >>>>>>>>>>>> exact match playout. In two roundtrips and in a very >>>>>>>>>>>>lightweight >>>>>>>>>>>> way, >>>>>>>>>>>> the >>>>>>>>>>>> consumer has random access capability. If the NDN is the >>>>>>>>>>>>moral >>>>>>>>>>>> equivalent >>>>>>>>>>>> of IP, then I am not sure we should be afraid of roundtrips >>>>>>>>>>>>that >>>>>>>>>>>> provide >>>>>>>>>>>> this kind of functionality, just as they are used in TCP. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I described one set of problems using the exclusion approach, >>>>>>>>>>>>and >>>>>>>>>>>> that >>>>>>>>>>>> an >>>>>>>>>>>> NDN paper on device discovery described a similar problem, >>>>>>>>>>>>though >>>>>>>>>>>> they >>>>>>>>>>>> did >>>>>>>>>>>> not go into the details of splitting interests, etc. That all >>>>>>>>>>>>was >>>>>>>>>>>> simple >>>>>>>>>>>> enough to see from the example. 
>>>>>>>>>>>> >>>>>>>>>>>> Another question is how does one do the discovery with exact >>>>>>>>>>>>match >>>>>>>>>>>> names, >>>>>>>>>>>> which is also conflating things. You could do a different >>>>>>>>>>>>discovery >>>>>>>>>>>> with >>>>>>>>>>>> continuation names too, just not the exclude method. >>>>>>>>>>>> >>>>>>>>>>>> As I alluded to, one needs a way to talk with a specific cache >>>>>>>>>>>>about >>>>>>>>>>>> its >>>>>>>>>>>> ?table of contents? for a prefix so one can get a consistent >>>>>>>>>>>>set >>>>>>>>>>>>of >>>>>>>>>>>> results >>>>>>>>>>>> without all the round-trips of exclusions. Actually >>>>>>>>>>>>downloading >>>>>>>>>>>>the >>>>>>>>>>>> ?headers? of the messages would be the same bytes, more or >>>>>>>>>>>>less. >>>>>>>>>>>>In >>>>>>>>>>>> a >>>>>>>>>>>> way, >>>>>>>>>>>> this is a little like name enumeration from a ccnx 0.x repo, >>>>>>>>>>>>but >>>>>>>>>>>>that >>>>>>>>>>>> protocol has its own set of problems and I?m not suggesting to >>>>>>>>>>>>use >>>>>>>>>>>> that >>>>>>>>>>>> directly. >>>>>>>>>>>> >>>>>>>>>>>> One approach is to encode a request in a name component and a >>>>>>>>>>>> participating >>>>>>>>>>>> cache can reply. It replies in such a way that one could >>>>>>>>>>>>continue >>>>>>>>>>>> talking >>>>>>>>>>>> with that cache to get its TOC. One would then issue another >>>>>>>>>>>> interest >>>>>>>>>>>> with >>>>>>>>>>>> a request for not-that-cache. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I'm curious how the TOC approach works in a multi-publisher >>>>>>>>>>>>scenario? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Another approach is to try to ask the authoritative source for >>>>>>>>>>>>the >>>>>>>>>>>> ?current? >>>>>>>>>>>> manifest name, i.e. /mail/inbox/current/, which could >>>>>>>>>>>>return >>>>>>>>>>>> the >>>>>>>>>>>> manifest or a link to the manifest. 
Then fetching the actual >>>>>>>>>>>> manifest >>>>>>>>>>>> from >>>>>>>>>>>> the link could come from caches because you how have a >>>>>>>>>>>>consistent >>>>>>>>>>>> set of >>>>>>>>>>>> names to ask for. If you cannot talk with an authoritative >>>>>>>>>>>>source, >>>>>>>>>>>> you >>>>>>>>>>>> could try again without the nonce and see if there?s a cached >>>>>>>>>>>>copy >>>>>>>>>>>> of a >>>>>>>>>>>> recent version around. >>>>>>>>>>>> >>>>>>>>>>>> Marc >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote: >>>>>>>>>>>> >>>>>>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, >>>>>>>>>>>>see a >>>>>>>>>>>> pattern with static (/mail/inbox) and variable (148) >>>>>>>>>>>>components; >>>>>>>>>>>>with >>>>>>>>>>>> proper naming convention, computers can also detect this >>>>>>>>>>>>pattern >>>>>>>>>>>> easily. Now I want to look for all mails in my inbox. I can >>>>>>>>>>>>generate >>>>>>>>>>>> a >>>>>>>>>>>> list of /mail/inbox/. These are my guesses, and with >>>>>>>>>>>> selectors >>>>>>>>>>>> I can further refine my guesses. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think this is a very bad example (or at least a very bad >>>>>>>>>>>> application >>>>>>>>>>>> design). You have an app (a mail server / inbox) and you want >>>>>>>>>>>>it >>>>>>>>>>>>to >>>>>>>>>>>> list >>>>>>>>>>>> your emails? An email list is an application data structure. >>>>>>>>>>>>I >>>>>>>>>>>> don?t >>>>>>>>>>>> think you should use the network structure to reflect this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think Tai-Lin is trying to sketch a small example, not >>>>>>>>>>>>propose >>>>>>>>>>>>a >>>>>>>>>>>> full-scale approach to email. (Maybe I am misunderstanding.) 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Another way to look at it is that if the network architecture >>>>>>>>>>>>is >>>>>>>>>>>> providing >>>>>>>>>>>> the equivalent of distributed storage to the application, >>>>>>>>>>>>perhaps >>>>>>>>>>>>the >>>>>>>>>>>> application data structure could be adapted to match the >>>>>>>>>>>>affordances >>>>>>>>>>>> of >>>>>>>>>>>> the network. Then it would not be so bad that the two >>>>>>>>>>>>structures >>>>>>>>>>>> were >>>>>>>>>>>> aligned. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I?ll give you an example, how do you delete emails from your >>>>>>>>>>>>inbox? >>>>>>>>>>>> If >>>>>>>>>>>> an >>>>>>>>>>>> email was cached in the network it can never be deleted from >>>>>>>>>>>>your >>>>>>>>>>>> inbox? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This is conflating two issues - what you are pointing out is >>>>>>>>>>>>that >>>>>>>>>>>>the >>>>>>>>>>>> data >>>>>>>>>>>> structure of a linear list doesn't handle common email >>>>>>>>>>>>management >>>>>>>>>>>> operations well. Again, I'm not sure if that's what he was >>>>>>>>>>>>getting >>>>>>>>>>>> at >>>>>>>>>>>> here. But deletion is not the issue - the availability of a >>>>>>>>>>>>data >>>>>>>>>>>> object >>>>>>>>>>>> on the network does not necessarily mean it's valid from the >>>>>>>>>>>> perspective >>>>>>>>>>>> of the application. >>>>>>>>>>>> >>>>>>>>>>>> Or moved to another mailbox? Do you rely on the emails >>>>>>>>>>>>expiring? >>>>>>>>>>>> >>>>>>>>>>>> This problem is true for most (any?) situations where you use >>>>>>>>>>>>network >>>>>>>>>>>> name >>>>>>>>>>>> structure to directly reflect the application data structure. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Not sure I understand how you make the leap from the example to >>>>>>>>>>>>the >>>>>>>>>>>> general statement. 
>>>>>>>>>>>> >>>>>>>>>>>> Jeff >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Nacho >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>>>>>>>> >>>>>>>>>>>> Ok, yes I think those would all be good things. >>>>>>>>>>>> >>>>>>>>>>>> One thing to keep in mind, especially with things like time >>>>>>>>>>>>series >>>>>>>>>>>> sensor >>>>>>>>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>>>>>>>> That?s >>>>>>>>>>>> easy >>>>>>>>>>>> for a human :) But in Discovery, one should assume that one >>>>>>>>>>>>does >>>>>>>>>>>>not >>>>>>>>>>>> know >>>>>>>>>>>> of patterns in the data beyond what the protocols used to >>>>>>>>>>>>publish >>>>>>>>>>>>the >>>>>>>>>>>> data >>>>>>>>>>>> explicitly require. That said, I think some of the things you >>>>>>>>>>>>listed >>>>>>>>>>>> are >>>>>>>>>>>> good places to start: sensor data, web content, climate data or >>>>>>>>>>>> genome >>>>>>>>>>>> data. >>>>>>>>>>>> >>>>>>>>>>>> We also need to state what the forwarding strategies are and >>>>>>>>>>>>what >>>>>>>>>>>>the >>>>>>>>>>>> cache >>>>>>>>>>>> behavior is. >>>>>>>>>>>> >>>>>>>>>>>> I outlined some of the points that I think are important in >>>>>>>>>>>>that >>>>>>>>>>>> other >>>>>>>>>>>> posting. While ?discover latest? is useful, ?discover all? is >>>>>>>>>>>>also >>>>>>>>>>>> important, and that one gets complicated fast. So points like >>>>>>>>>>>> separating >>>>>>>>>>>> discovery from retrieval and working with large data sets have >>>>>>>>>>>>been >>>>>>>>>>>> important in shaping our thinking. That all said, I?d be happy >>>>>>>>>>>> starting >>>>>>>>>>>> from 0 and working through the Discovery service definition >>>>>>>>>>>>from >>>>>>>>>>>> scratch >>>>>>>>>>>> along with data set use cases. 
>>>>>>>>>>>> >>>>>>>>>>>> Marc >>>>>>>>>>>> >>>>>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Marc, >>>>>>>>>>>> >>>>>>>>>>>> Thanks ? yes, I saw that as well. I was just trying to get one >>>>>>>>>>>>step >>>>>>>>>>>> more >>>>>>>>>>>> specific, which was to see if we could identify a few specific >>>>>>>>>>>>use >>>>>>>>>>>> cases >>>>>>>>>>>> around which to have the conversation. (e.g., time series >>>>>>>>>>>>sensor >>>>>>>>>>>> data >>>>>>>>>>>> and >>>>>>>>>>>> web content retrieval for "get latest"; climate data for huge >>>>>>>>>>>>data >>>>>>>>>>>> sets; >>>>>>>>>>>> local data in a vehicular network; etc.) What have you been >>>>>>>>>>>>looking >>>>>>>>>>>> at >>>>>>>>>>>> that's driving considerations of discovery? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Jeff >>>>>>>>>>>> >>>>>>>>>>>> From: >>>>>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>> Cc: , >>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>>>>> >>>>>>>>>>>> Jeff, >>>>>>>>>>>> >>>>>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>>>>>>> Discovery. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-Septemb >>>>>>>>>>>>er >>>>>>>>>>>>/0 >>>>>>>>>>>>00 >>>>>>>>>>>> 20 >>>>>>>>>>>> 0 >>>>>>>>>>>> .html >>>>>>>>>>>> >>>>>>>>>>>> I think it would be very productive to talk about what >>>>>>>>>>>>Discovery >>>>>>>>>>>> should >>>>>>>>>>>> do, >>>>>>>>>>>> and not focus on the how. It is sometimes easy to get caught >>>>>>>>>>>>up >>>>>>>>>>>>in >>>>>>>>>>>> the >>>>>>>>>>>> how, >>>>>>>>>>>> which I think is a less important topic than the what at this >>>>>>>>>>>>stage. 
>>>>>>>>>>>> >>>>>>>>>>>> Marc >>>>>>>>>>>> >>>>>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Marc, >>>>>>>>>>>> >>>>>>>>>>>> If you can't talk about your protocols, perhaps we can discuss >>>>>>>>>>>>this >>>>>>>>>>>> based >>>>>>>>>>>> on use cases. What are the use cases you are using to >>>>>>>>>>>>evaluate >>>>>>>>>>>> discovery? >>>>>>>>>>>> >>>>>>>>>>>> Jeff >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> No matter what the expressiveness of the predicates if the >>>>>>>>>>>>forwarder >>>>>>>>>>>> can >>>>>>>>>>>> send interests different ways you don't have a consistent >>>>>>>>>>>>underlying >>>>>>>>>>>> set >>>>>>>>>>>> to talk about so you would always need non-range exclusions to >>>>>>>>>>>> discover >>>>>>>>>>>> every version. >>>>>>>>>>>> >>>>>>>>>>>> Range exclusions only work I believe if you get an >>>>>>>>>>>>authoritative >>>>>>>>>>>> answer. >>>>>>>>>>>> If different content pieces are scattered between different >>>>>>>>>>>>caches >>>>>>>>>>>>I >>>>>>>>>>>> don't see how range exclusions would work to discover every >>>>>>>>>>>>version. >>>>>>>>>>>> >>>>>>>>>>>> I'm sorry to be pointing out problems without offering >>>>>>>>>>>>solutions >>>>>>>>>>>>but >>>>>>>>>>>> we're not ready to publish our discovery protocols. >>>>>>>>>>>> >>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>> >>>>>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" >>>>>>>>>>>>wrote: >>>>>>>>>>>> >>>>>>>>>>>> I see. Can you briefly describe how ccnx discovery protocol >>>>>>>>>>>>solves >>>>>>>>>>>> the >>>>>>>>>>>> all problems that you mentioned (not just exclude)? a doc will >>>>>>>>>>>>be >>>>>>>>>>>> better. >>>>>>>>>>>> >>>>>>>>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I >>>>>>>>>>>>will >>>>>>>>>>>> soon >>>>>>>>>>>> expect [and] and [or], so boolean algebra is fully supported. 
>>>>>>>>>>>> Regular >>>>>>>>>>>> language or context free language might become part of selector >>>>>>>>>>>>too. >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>>>>>>>> That will get you one reading then you need to exclude it and >>>>>>>>>>>>ask >>>>>>>>>>>> again. >>>>>>>>>>>> >>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>> >>>>>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" >>>>>>>>>>>>wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>set >>>>>>>>>>>> with a particular cache, then you need to always use individual >>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>versions >>>>>>>>>>>> of an object. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I am very confused. For your example, if I want to get all >>>>>>>>>>>>today's >>>>>>>>>>>> sensor data, I just do (Any..Last second of last day)(First >>>>>>>>>>>>second >>>>>>>>>>>>of >>>>>>>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>>>>>>>wrote: >>>>>>>>>>>> >>>>>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>>>>>>> could miss content objects you want to discovery unless you >>>>>>>>>>>>avoid >>>>>>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Could you explain why missing content object situation happens? >>>>>>>>>>>>also >>>>>>>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>>>>>>> exclude; >>>>>>>>>>>> converting from explicit excludes to ranged exclude is always >>>>>>>>>>>> possible. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>set >>>>>>>>>>>> with a particular cache, then you need to always use individual >>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>versions >>>>>>>>>>>> of an object. For something like a sensor reading that is >>>>>>>>>>>>updated, >>>>>>>>>>>> say, once per second you will have 86,400 of them per day. If >>>>>>>>>>>>each >>>>>>>>>>>> exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of >>>>>>>>>>>> exclusions (plus encoding overhead) per day. >>>>>>>>>>>> >>>>>>>>>>>> yes, maybe using a more deterministic version number than a >>>>>>>>>>>> timestamp makes sense here, but its just an example of needing >>>>>>>>>>>>a >>>>>>>>>>>>lot >>>>>>>>>>>> of exclusions. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> You exclude through 100 then issue a new interest. This goes >>>>>>>>>>>>to >>>>>>>>>>>> cache B >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I feel this case is invalid because cache A will also get the >>>>>>>>>>>> interest, and cache A will return v101 if it exists. Like you >>>>>>>>>>>>said, >>>>>>>>>>>> if >>>>>>>>>>>> this goes to cache B only, it means that cache A dies. How do >>>>>>>>>>>>you >>>>>>>>>>>> know >>>>>>>>>>>> that v101 even exist? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I guess this depends on what the forwarding strategy is. If >>>>>>>>>>>>the >>>>>>>>>>>> forwarder will always send each interest to all replicas, then >>>>>>>>>>>>yes, >>>>>>>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>>>>>>> forwarder is just doing ?best path? and can round-robin between >>>>>>>>>>>>cache >>>>>>>>>>>> A and cache B, then your application could miss v101. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> c,d In general I agree that LPM performance is related to the >>>>>>>>>>>>number >>>>>>>>>>>> of components. 
In my own thread-safe LPM implementation, I used >>>>>>>>>>>>only >>>>>>>>>>>> one RWMutex for the whole tree. I don't know whether adding >>>>>>>>>>>>lock >>>>>>>>>>>>for >>>>>>>>>>>> every node will be faster or not because of lock overhead. >>>>>>>>>>>> >>>>>>>>>>>> However, we should compare (exact match + discovery protocol) >>>>>>>>>>>>vs >>>>>>>>>>>> (ndn >>>>>>>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Yes, we should compare them. And we need to publish the ccnx >>>>>>>>>>>>1.0 >>>>>>>>>>>> specs for doing the exact match discovery. So, as I said, I'm >>>>>>>>>>>>not >>>>>>>>>>>> ready to claim it's better yet because we have not done that. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote: >>>>>>>>>>>> I would point out that using LPM on content object to Interest >>>>>>>>>>>> matching to do discovery has its own set of problems. >>>>>>>>>>>>Discovery >>>>>>>>>>>> involves more than just "latest version" discovery too. >>>>>>>>>>>> >>>>>>>>>>>> This is probably getting off-topic from the original post about >>>>>>>>>>>> naming conventions. >>>>>>>>>>>> >>>>>>>>>>>> a. If Interests can be forwarded multiple directions and two >>>>>>>>>>>> different caches are responding, the exclusion set you build up >>>>>>>>>>>> talking with cache A will be invalid for cache B. If you talk >>>>>>>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>>>>>>> content objects you want to discover unless you avoid all >>>>>>>>>>>>range >>>>>>>>>>>> exclusions and only exclude explicit versions. That will lead >>>>>>>>>>>>to >>>>>>>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>>>>>>> explicit discovery protocol that allows conversations about >>>>>>>>>>>> consistent sets is better. >>>>>>>>>>>> >>>>>>>>>>>> b. Yes, if you just want the "latest version" 
discovery that >>>>>>>>>>>> should be transitive between caches, but imagine this. You >>>>>>>>>>>>send >>>>>>>>>>>> Interest #1 to cache A which returns version 100. You exclude >>>>>>>>>>>> through 100 then issue a new interest. This goes to cache B >>>>>>>>>>>>who >>>>>>>>>>>> only has version 99, so the interest times out or is NACK?d. >>>>>>>>>>>>So >>>>>>>>>>>> you think you have it! But, cache A already has version 101, >>>>>>>>>>>>you >>>>>>>>>>>> just don?t know. If you cannot have a conversation around >>>>>>>>>>>> consistent sets, it seems like even doing latest version >>>>>>>>>>>>discovery >>>>>>>>>>>> is difficult with selector based discovery. From what I saw in >>>>>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>>>>>>> authoritative source because you can never believe an >>>>>>>>>>>>intermediate >>>>>>>>>>>> cache that there?s not something more recent. >>>>>>>>>>>> >>>>>>>>>>>> I?m sure you?ve walked through cases (a) and (b) in ndn, I?d be >>>>>>>>>>>> interest in seeing your analysis. Case (a) is that a node can >>>>>>>>>>>> correctly discover every version of a name prefix, and (b) is >>>>>>>>>>>>that >>>>>>>>>>>> a node can correctly discover the latest version. We have not >>>>>>>>>>>> formally compared (or yet published) our discovery protocols >>>>>>>>>>>>(we >>>>>>>>>>>> have three, 2 for content, 1 for device) compared to selector >>>>>>>>>>>>based >>>>>>>>>>>> discovery, so I cannot yet claim they are better, but they do >>>>>>>>>>>>not >>>>>>>>>>>> have the non-determinism sketched above. >>>>>>>>>>>> >>>>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups >>>>>>>>>>>>you >>>>>>>>>>>> must do in the PIT to match a content object. 
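The non-deterministic lookup count Marc describes can be made concrete in a sketch: matching a content object under LPM means probing the PIT once per prefix of the data name (and with selectors, evaluating a predicate on each hit), whereas exact match needs a bounded number of probes. This toy uses a plain dict keyed by name-prefix tuples; it is an illustration of the cost argument, not a forwarder implementation.

```python
def lpm_pit_matches(pit, name):
    """Probe the PIT once for every prefix of the data name, longest
    first. With selectors, each hit would additionally need its
    selector predicate evaluated against the content object."""
    matches = []
    for i in range(len(name), 0, -1):
        entry = pit.get(tuple(name[:i]))
        if entry is not None:
            matches.append(entry)
    return matches

pit = {("mail",): "I1", ("mail", "inbox"): "I2"}
name = ["mail", "inbox", "148", "v1"]

# Four prefixes are probed; two PIT entries match.
assert lpm_pit_matches(pit, name) == ["I2", "I1"]
```

The number of probes grows with the component count of the data name, which is the point of contrast with the "at most 3 lookups" (name, name + keyid, name + content object hash) claimed for a naive CCNx 1.0 exact-match table.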
If you have a >>>>>>>>>>>>name >>>>>>>>>>>> tree or a threaded hash table, those don?t all need to be hash >>>>>>>>>>>> lookups, but you need to walk up the name tree for every prefix >>>>>>>>>>>>of >>>>>>>>>>>> the content object name and evaluate the selector predicate. >>>>>>>>>>>> Content Based Networking (CBN) had some some methods to create >>>>>>>>>>>>data >>>>>>>>>>>> structures based on predicates, maybe those would be better. >>>>>>>>>>>>But >>>>>>>>>>>> in any case, you will potentially need to retrieve many PIT >>>>>>>>>>>>entries >>>>>>>>>>>> if there is Interest traffic for many prefixes of a root. Even >>>>>>>>>>>>on >>>>>>>>>>>> an Intel system, you?ll likely miss cache lines, so you?ll >>>>>>>>>>>>have a >>>>>>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>>>>>> implementation only requires at most 3 lookups (one by name, >>>>>>>>>>>>one >>>>>>>>>>>>by >>>>>>>>>>>> name + keyid, one by name + content object hash), and one can >>>>>>>>>>>>do >>>>>>>>>>>> other things to optimize lookup for an extra write. >>>>>>>>>>>> >>>>>>>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>>>>>>> walking parent pointers, I suspect you?ll need locking of the >>>>>>>>>>>> ancestors in a multi-threaded system (?threaded" here meaning >>>>>>>>>>>>LWP) >>>>>>>>>>>> and that will be expensive. It would be interesting to see >>>>>>>>>>>>what >>>>>>>>>>>>a >>>>>>>>>>>> cache consistent multi-threaded name tree looks like. >>>>>>>>>>>> >>>>>>>>>>>> Marc >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I had thought about these questions, but I want to know your >>>>>>>>>>>>idea >>>>>>>>>>>> besides typed component: >>>>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar >>>>>>>>>>>> things? >>>>>>>>>>>> 2. will removing selectors improve performance? 
How do we use >>>>>>>>>>>> other >>>>>>>>>>>> faster technique to replace selector? >>>>>>>>>>>> 3. fixed byte length and type. I agree more that type can be >>>>>>>>>>>>fixed >>>>>>>>>>>> byte, but 2 bytes for length might not be enough for future. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Could you share it with us? >>>>>>>>>>>> >>>>>>>>>>>> Sure. Here's a strawman. >>>>>>>>>>>> >>>>>>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>>>>>> >>>>>>>>>>>> The type space is currently shared with the types used for the >>>>>>>>>>>> entire protocol, that gives us two options: >>>>>>>>>>>> (1) we reserve a range for name component types. Given the >>>>>>>>>>>> likelihood there will be at least as much and probably more >>>>>>>>>>>>need >>>>>>>>>>>> for component types than protocol extensions, we could reserve >>>>>>>>>>>>1/2 >>>>>>>>>>>> of the type space, giving us 32K types for name components. >>>>>>>>>>>> (2) since there is no parsing ambiguity between name components >>>>>>>>>>>> and other fields of the protocol (since they are sub-types of >>>>>>>>>>>>the >>>>>>>>>>>> name type) we could reuse numbers and thereby have an entire >>>>>>>>>>>>65K >>>>>>>>>>>> name component types. >>>>>>>>>>>> >>>>>>>>>>>> We divide the type space into regions, and manage it with a >>>>>>>>>>>> registry. 
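The "divide the type space into regions and manage it with a registry" idea can be sketched as a toy table. The region sizes follow the strawman allocations later in the message (one generic type, 1024 globally assigned, another 1024 reserved, the remainder for applications); the exact boundary positions are assumptions of this sketch.

```python
TYPE_SPACE = 2 ** 16  # 16-bit type space = 65,536 values

# Strawman partition; where each region starts is illustrative only.
regions = {
    "generic":     range(0, 1),               # one "default" name type
    "global":      range(1, 1 + 1024),        # base/extension NDN specs
    "reserved":    range(1 + 1024, 1 + 2048), # unanticipated uses
    "application": range(1 + 2048, TYPE_SPACE),
}

def region_of(t):
    """Classify a 16-bit name component type into its registry region."""
    for name, r in regions.items():
        if t in r:
            return name
    raise ValueError("type outside the 16-bit space")

# The regions tile the whole space exactly once.
assert sum(len(r) for r in regions.values()) == 65_536
assert region_of(0) == "generic" and region_of(500) == "global"
assert region_of(3000) == "application"
```

Option (2) in the message (reusing numbers for name-component sub-types) would apply the same table to a fresh 65,536-value space instead of carving it out of the shared protocol type space.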
If we ever get to the point of creating an IETF >>>>>>>>>>>> standard, IANA has 25 years of experience running registries >>>>>>>>>>>>and >>>>>>>>>>>> there are well-understood rule sets for different kinds of >>>>>>>>>>>> registries (open, requires a written spec, requires standards >>>>>>>>>>>> approval). >>>>>>>>>>>> >>>>>>>>>>>> - We allocate one ?default" name component type for ?generic >>>>>>>>>>>> name?, which would be used on name prefixes and other common >>>>>>>>>>>> cases where there are no special semantics on the name >>>>>>>>>>>>component. >>>>>>>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>>>>>>> globally understood types that are part of the base or >>>>>>>>>>>>extension >>>>>>>>>>>> NDN specifications (e.g. chunk#, version#, etc. >>>>>>>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>>>>>>> (say another 1024 types) >>>>>>>>>>>> - We give the rest of the space to application assignment. >>>>>>>>>>>> >>>>>>>>>>>> Make sense? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> While I?m sympathetic to that view, there are three ways in >>>>>>>>>>>> which Moore?s law or hardware tricks will not save us from >>>>>>>>>>>> performance flaws in the design >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> we could design for performance, >>>>>>>>>>>> >>>>>>>>>>>> That?s not what people are advocating. We are advocating that >>>>>>>>>>>>we >>>>>>>>>>>> *not* design for known bad performance and hope serendipity or >>>>>>>>>>>> Moore?s Law will come to the rescue. >>>>>>>>>>>> >>>>>>>>>>>> but I think there will be a turning >>>>>>>>>>>> point when the slower design starts to become "fast enough?. >>>>>>>>>>>> >>>>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters so >>>>>>>>>>>> things that don?t get faster while others do tend to get >>>>>>>>>>>>dropped >>>>>>>>>>>> or not used because they impose a performance penalty relative >>>>>>>>>>>>to >>>>>>>>>>>> the things that go faster. There is also the ?low-end? 
>>>>>>>>>>>>phenomenon >>>>>>>>>>>> where impovements in technology get applied to lowering cost >>>>>>>>>>>> rather than improving performance. For those environments bad >>>>>>>>>>>> performance just never get better. >>>>>>>>>>>> >>>>>>>>>>>> Do you >>>>>>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>>>>>> performance improvement? >>>>>>>>>>>> >>>>>>>>>>>> I suspect LPM on data will always be slow (relative to the >>>>>>>>>>>>other >>>>>>>>>>>> functions). >>>>>>>>>>>> i suspect exclusions will always be slow because they will >>>>>>>>>>>> require extra memory references. >>>>>>>>>>>> >>>>>>>>>>>> However I of course don?t claim to clairvoyance so this is just >>>>>>>>>>>> speculation based on 35+ years of seeing performance improve >>>>>>>>>>>>by 4 >>>>>>>>>>>> orders of magnitude and still having to worry about counting >>>>>>>>>>>> cycles and memory references? >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>>>>>> perform >>>>>>>>>>>> well on it. It should be the other way around: once ndn app >>>>>>>>>>>> becomes >>>>>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>>>>>> >>>>>>>>>>>> While I?m sympathetic to that view, there are three ways in >>>>>>>>>>>> which Moore?s law or hardware tricks will not save us from >>>>>>>>>>>> performance flaws in the design: >>>>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>>>> c) data structures that require locks to manipulate >>>>>>>>>>>> successfully will be relatively more expensive, even with >>>>>>>>>>>> near-zero lock contention. >>>>>>>>>>>> >>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>>>>>> its design. 
We just forgot those because the design elements >>>>>>>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>>>>>>> poster children for this are: >>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow >>>>>>>>>>>> on modern forwarding hardware, so they can?t be reliably used >>>>>>>>>>>> anywhere >>>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was >>>>>>>>>>>> specified and is now a giant PITA that still causes major pain >>>>>>>>>>>> in working around. >>>>>>>>>>>> >>>>>>>>>>>> I?m afraid students today are being taught the that designers >>>>>>>>>>>> of IP were flawless, as opposed to very good scientists and >>>>>>>>>>>> engineers that got most of it right. >>>>>>>>>>>> >>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic. >>>>>>>>>>>> Now I >>>>>>>>>>>> see that there are 3 approaches: >>>>>>>>>>>> 1. we should not define a naming convention at all >>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of >>>>>>>>>>>> types >>>>>>>>>>>> 3. marked component: introduce only one more type and add >>>>>>>>>>>> additional >>>>>>>>>>>> marker space >>>>>>>>>>>> >>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>>> >>>>>>>>>>>> It is just as powerful in practice as either throwing up our >>>>>>>>>>>> hands and letting applications design their own mutually >>>>>>>>>>>> incompatible schemes or trying to make naming conventions with >>>>>>>>>>>> markers in a way that is fast to generate/parse and also >>>>>>>>>>>> resilient against aliasing. >>>>>>>>>>>> >>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming >>>>>>>>>>>> convention >>>>>>>>>>>> needs to be revised. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe >>>>>>>>>>>> wrote: >>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names >>>>>>>>>>>> to fit in (the >>>>>>>>>>>> magnitude of) 96 bytes? What length are names usually in >>>>>>>>>>>> current NDN >>>>>>>>>>>> experiments? >>>>>>>>>>>> >>>>>>>>>>>> I guess wide deployment could make for even longer names. >>>>>>>>>>>> Related: Many URLs >>>>>>>>>>>> I encounter nowadays easily don't fit within two 80-column >>>>>>>>>>>> text lines, and >>>>>>>>>>>> NDN will have to carry more information than URLs, as far as >>>>>>>>>>>> I see. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote: >>>>>>>>>>>> >>>>>>>>>>>> In fact, the index in separate TLV will be slower on some >>>>>>>>>>>> architectures, >>>>>>>>>>>> like the ezChip NP4. The NP4 can hold the fist 96 frame >>>>>>>>>>>> bytes in memory, >>>>>>>>>>>> then any subsequent memory is accessed only as two adjacent >>>>>>>>>>>> 32-byte blocks >>>>>>>>>>>> (there can be at most 5 blocks available at any one time). >>>>>>>>>>>> If you need to >>>>>>>>>>>> switch between arrays, it would be very expensive. If you >>>>>>>>>>>> have to read past >>>>>>>>>>>> the name to get to the 2nd array, then read it, then backup >>>>>>>>>>>> to get to the >>>>>>>>>>>> name, it will be pretty expensive too. >>>>>>>>>>>> >>>>>>>>>>>> Marc >>>>>>>>>>>> >>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Does this make that much difference? >>>>>>>>>>>> >>>>>>>>>>>> If you want to parse the first 5 components. One way to do >>>>>>>>>>>> it is: >>>>>>>>>>>> >>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes >>>>>>>>>>>> from the start >>>>>>>>>>>> offset of the beginning of the name. >>>>>>>>>>>> OR >>>>>>>>>>>> Start reading name, (find size + move ) 5 times. >>>>>>>>>>>> >>>>>>>>>>>> How much speed are you getting from one to the other? 
You >>>>>>>>>>>> seem to imply >>>>>>>>>>>> that the first one is faster. I don?t think this is the >>>>>>>>>>>> case. >>>>>>>>>>>> >>>>>>>>>>>> In the first one you?ll probably have to get the cache line >>>>>>>>>>>> for the index, >>>>>>>>>>>> then all the required cache lines for the first 5 >>>>>>>>>>>> components. For the >>>>>>>>>>>> second, you?ll have to get all the cache lines for the first >>>>>>>>>>>> 5 components. >>>>>>>>>>>> Given an assumption that a cache miss is way more expensive >>>>>>>>>>>> than >>>>>>>>>>>> evaluating a number and computing an addition, you might >>>>>>>>>>>> find that the >>>>>>>>>>>> performance of the index is actually slower than the >>>>>>>>>>>> performance of the >>>>>>>>>>>> direct access. >>>>>>>>>>>> >>>>>>>>>>>> Granted, there is a case where you don?t access the name at >>>>>>>>>>>> all, for >>>>>>>>>>>> example, if you just get the offsets and then send the >>>>>>>>>>>> offsets as >>>>>>>>>>>> parameters to another processor/GPU/NPU/etc. In this case >>>>>>>>>>>> you may see a >>>>>>>>>>>> gain IF there are more cache line misses in reading the name >>>>>>>>>>>> than in >>>>>>>>>>>> reading the index. So, if the regular part of the name >>>>>>>>>>>> that you?re >>>>>>>>>>>> parsing is bigger than the cache line (64 bytes?) and the >>>>>>>>>>>> name is to be >>>>>>>>>>>> processed by a different processor, then your might see some >>>>>>>>>>>> performance >>>>>>>>>>>> gain in using the index, but in all other circumstances I >>>>>>>>>>>> bet this is not >>>>>>>>>>>> the case. I may be wrong, haven?t actually tested it. >>>>>>>>>>>> >>>>>>>>>>>> This is all to say, I don?t think we should be designing the >>>>>>>>>>>> protocol with >>>>>>>>>>>> only one architecture in mind. (The architecture of sending >>>>>>>>>>>> the name to a >>>>>>>>>>>> different processor than the index). 
>>>>>>>>>>>> >>>>>>>>>>>> If you have numbers that show that the index is faster I >>>>>>>>>>>> would like to see >>>>>>>>>>>> under what conditions and architectural assumptions. >>>>>>>>>>>> >>>>>>>>>>>> Nacho >>>>>>>>>>>> >>>>>>>>>>>> (I may have misinterpreted your description so feel free to >>>>>>>>>>>> correct me if >>>>>>>>>>>> I?m wrong.) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nacho (Ignacio) Solis >>>>>>>>>>>> Protocol Architect >>>>>>>>>>>> Principal Scientist >>>>>>>>>>>> Palo Alto Research Center (PARC) >>>>>>>>>>>> +1(650)812-4458 >>>>>>>>>>>> Ignacio.Solis at parc.com >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Indeed each components' offset must be encoded using a fixed >>>>>>>>>>>> amount of >>>>>>>>>>>> bytes: >>>>>>>>>>>> >>>>>>>>>>>> i.e., >>>>>>>>>>>> Type = Offsets >>>>>>>>>>>> Length = 10 Bytes >>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ... >>>>>>>>>>>> >>>>>>>>>>>> You may also imagine to have a "Offset_2byte" type if your >>>>>>>>>>>> name is too >>>>>>>>>>>> long. >>>>>>>>>>>> >>>>>>>>>>>> Max >>>>>>>>>>>> >>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote: >>>>>>>>>>>> >>>>>>>>>>>> if you do not need the entire hierarchal structure (suppose >>>>>>>>>>>> you only >>>>>>>>>>>> want the first x components) you can directly have it using >>>>>>>>>>>> the >>>>>>>>>>>> offsets. With the Nested TLV structure you have to >>>>>>>>>>>> iteratively parse >>>>>>>>>>>> the first x-1 components. With the offset structure you cane >>>>>>>>>>>> directly >>>>>>>>>>>> access to the firs x components. >>>>>>>>>>>> >>>>>>>>>>>> I don't get it. What you described only works if the >>>>>>>>>>>> "offset" is >>>>>>>>>>>> encoded in fixed bytes. With varNum, you will still need to >>>>>>>>>>>> parse x-1 >>>>>>>>>>>> offsets to get to the x offset. 
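The trade-off being debated here (offset table versus iterative nested-TLV parsing) can be sketched concretely. This is a toy encoding, not the actual NDN or CCNx wire format: it assumes 1-byte type and length fields and 1-byte fixed-width offsets, as in Massimo's example; `NAME_COMPONENT` and the helper names are illustrative.

```python
NAME_COMPONENT = 0x08  # illustrative type code, not a real spec value

def encode_name(components):
    """Encode components as concatenated TLVs plus a fixed-width
    offset table (one byte per component, as in the Offsets TLV idea)."""
    body, offsets = bytearray(), bytearray()
    for c in components:
        offsets.append(len(body))              # start offset of this component
        body += bytes([NAME_COMPONENT, len(c)]) + c
    return bytes(body), bytes(offsets)

def component_via_offsets(body, offsets, x):
    """Direct access: one table lookup, no iteration. Only possible
    because every offset entry is fixed-width."""
    start = offsets[x]
    length = body[start + 1]
    return body[start + 2 : start + 2 + length]

def component_iterative(body, x):
    """Nested-TLV access: skip x components one at a time."""
    pos = 0
    for _ in range(x):
        pos += 2 + body[pos + 1]               # skip T, L, and V
    length = body[pos + 1]
    return body[pos + 2 : pos + 2 + length]

name_body, name_offsets = encode_name([b"foo", b"bar", b"v2", b"s0"])
```

Note that Tai-Lin's objection holds in this sketch: the O(1) lookup in `component_via_offsets` works only because each offset entry is fixed-width; with variable-length (varNum) offsets you are back to parsing x-1 entries, and Marc's cache-line argument is about whether the extra table is worth its own memory accesses at all.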
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote: >>>>>>>>>>>> >>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I >>>>>>>>>>>> like the >>>>>>>>>>>> existing NDN UTF8 'convention'." I'm still not sure I >>>>>>>>>>>> understand what >>>>>>>>>>>> you >>>>>>>>>>>> _do_ prefer, though. it sounds like you're describing an >>>>>>>>>>>> entirely >>>>>>>>>>>> different >>>>>>>>>>>> scheme where the info that describes the name-components is >>>>>>>>>>>> ... >>>>>>>>>>>> someplace >>>>>>>>>>>> other than _in_ the name-components. is that correct? when >>>>>>>>>>>> you say >>>>>>>>>>>> "field >>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a >>>>>>>>>>>> TLV)? >>>>>>>>>>>> >>>>>>>>>>>> Correct. >>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the >>>>>>>>>>>> name >>>>>>>>>>>> hierarchy >>>>>>>>>>>> with offsets in the name and other TLV(s) indicates the >>>>>>>>>>>> offset to use >>>>>>>>>>>> in >>>>>>>>>>>> order to retrieve special components. >>>>>>>>>>>> As for the field separator, it is something like "/". >>>>>>>>>>>> Aliasing is >>>>>>>>>>>> avoided as >>>>>>>>>>>> you do not rely on field separators to parse the name; you >>>>>>>>>>>> use the >>>>>>>>>>>> "offset >>>>>>>>>>>> TLV " to do that. >>>>>>>>>>>> >>>>>>>>>>>> So now, it may be an aesthetic question but: >>>>>>>>>>>> >>>>>>>>>>>> if you do not need the entire hierarchal structure (suppose >>>>>>>>>>>> you only >>>>>>>>>>>> want >>>>>>>>>>>> the first x components) you can directly have it using the >>>>>>>>>>>> offsets. >>>>>>>>>>>> With the >>>>>>>>>>>> Nested TLV structure you have to iteratively parse the first >>>>>>>>>>>> x-1 >>>>>>>>>>>> components. >>>>>>>>>>>> With the offset structure you cane directly access to the >>>>>>>>>>>> firs x >>>>>>>>>>>> components. 
>>>>>>>>>>>> >>>>>>>>>>>> Max >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- Mark >>>>>>>>>>>> >>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote: >>>>>>>>>>>> >>>>>>>>>>>> The why is simple: >>>>>>>>>>>> >>>>>>>>>>>> You use a lot of "generic component type" and very few >>>>>>>>>>>> "specific >>>>>>>>>>>> component type". You are imposing types for every component >>>>>>>>>>>> in order >>>>>>>>>>>> to >>>>>>>>>>>> handle few exceptions (segmentation, etc..). You create a >>>>>>>>>>>> rule >>>>>>>>>>>> (specify >>>>>>>>>>>> the component's type ) to handle exceptions! >>>>>>>>>>>> >>>>>>>>>>>> I would prefer not to have typed components. Instead I would >>>>>>>>>>>> prefer >>>>>>>>>>>> to >>>>>>>>>>>> have the name as simple sequence bytes with a field >>>>>>>>>>>> separator. Then, >>>>>>>>>>>> outside the name, if you have some components that could be >>>>>>>>>>>> used at >>>>>>>>>>>> network layer (e.g. a TLV field), you simply need something >>>>>>>>>>>> that >>>>>>>>>>>> indicates which is the offset allowing you to retrieve the >>>>>>>>>>>> version, >>>>>>>>>>>> segment, etc in the name... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Max >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote: >>>>>>>>>>>> >>>>>>>>>>>> I think we agree on the small number of "component types". >>>>>>>>>>>> However, if you have a small number of types, you will end >>>>>>>>>>>> up with >>>>>>>>>>>> names >>>>>>>>>>>> containing many generic components types and few specific >>>>>>>>>>>> components >>>>>>>>>>>> types. Due to the fact that the component type specification >>>>>>>>>>>> is an >>>>>>>>>>>> exception in the name, I would prefer something that specify >>>>>>>>>>>> component's >>>>>>>>>>>> type only when needed (something like UTF8 conventions but >>>>>>>>>>>> that >>>>>>>>>>>> applications MUST use). >>>>>>>>>>>> >>>>>>>>>>>> so ... 
I can't quite follow that. the thread has had some >>>>>>>>>>>> explanation >>>>>>>>>>>> about why the UTF8 requirement has problems (with aliasing, >>>>>>>>>>>> e.g.) >>>>>>>>>>>> and >>>>>>>>>>>> there's been email trying to explain that applications don't >>>>>>>>>>>> have to >>>>>>>>>>>> use types if they don't need to. your email sounds like "I >>>>>>>>>>>> prefer >>>>>>>>>>>> the >>>>>>>>>>>> UTF8 convention", but it doesn't say why you have that >>>>>>>>>>>> preference in >>>>>>>>>>>> the face of the points about the problems. can you say why >>>>>>>>>>>> it is >>>>>>>>>>>> that >>>>>>>>>>>> you express a preference for the "convention" with problems ? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Mark >>>>>>>>>>>> >>>>>>>>>>>> . >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 
_______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Ndn-interest mailing list >>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Ndn-interest mailing list >>>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Ndn-interest mailing list >>>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Ndn-interest mailing list >>>>>>>> Ndn-interest at lists.cs.ucla.edu >>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest >>>>>>> >>>>>> >>>>>> 

From tailinchu at gmail.com Sat Sep 27 15:56:12 2014
From: tailinchu at gmail.com (Tai-Lin Chu)
Date: Sat, 27 Sep 2014 15:56:12 -0700
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To:
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
Message-ID:

Thanks for the detail. I see yet another problem: how does a node know whether a delivered packet is encapsulated or not? In the worst case it has to check twice: it finds no match in the outer packet, and then has to read the data portion again. (Although some marking in the packet could solve this.)

Just some thoughts: this protocol actually makes an NDN node harder to build, because it now has to know whether some CCNx node in the network is trying to encapsulate. A CCNx node does not itself understand selectors, yet it creates problems for the NDN nodes to solve. :( But thanks for this co-existing world proposal.

On Sat, Sep 27, 2014 at 3:09 PM, Tai-Lin Chu wrote:
>> I'm not sure what you mean by trust to the cache. NDN has no trust to the
> cache and no way to trust that a selector match is the correct match.
>
> ndn does not allow a cache to publish new data under another provider's
> prefix, i.e., a table of contents, but your discovery protocol is doing
> this.
> > On Sat, Sep 27, 2014 at 2:44 PM, wrote: >> On 9/27/14, 10:19 PM, "Tai-Lin Chu" wrote: >> >>>The concern is that the table of content is not confirmed by the >>>original provider; the cache server's data is "trusted with some other >>>chains". This trust model works but brings restriction. It basically >>>requires build another trust model on "cache server"; otherwise, >>>nothing discovered can be trusted, which also means that you discover >>>nothing. >> >> I?m not sure what you mean by trust to the cache. NDN has no trust to the >> cache and no way to trust that a selector match is the correct match. >> >> As I know, a cache can have >> >> /foo/1 >> /foo/2 >> /foo/3 >> >> At could reply with /foo/2 and not give you /foo/3 as the ?latest?. You >> have no way to trust that a cache will give you anything specific. You >> can?t really require this because you can?t require cache nodes to have a >> specific cache replacement policy, so as far as you know, the cache could >> have dropped /foo/3 from the cache. >> >> As a matter of fact, unless you require signature verification at cache >> nodes (CCN requires this), you don?t even have that. From what I?ve been >> told, it?s optional for nodes to check for signatures. So, at any point >> in the network, you never know if previous nodes have verified the >> signature. >> >> So, I?m not sure what kind of ?trust model? you refer to. Is there some >> trust model has that this Selector Protocol breaks at the nodes that run >> the Selector Protocol? If so, could you please explain it. >> >> >>>Another critical point is that those cache servers are not >>>hierarchical, so we can only apply flat signing (one guy signs them >>>all.) This looks very problematic. An quick fix is that you just use >>>impose the name hierarchy, but it is cumbersome too. >> >> Nobody really cares about the signature of the reply. You care about >> what?s encapsulated inside, which, in fact, does authenticate to the >> selector request. 
Every node running the Selector Protocol can check this >> reply and this signature. >> >>>Here is our discussion so far: >>>exact matching -> ... needs discovery protocol (because it is not lpm) >>>-> discovery needs table of content -> restrictive trust model >>>My argument is that this restrictive trust model logically discourages >>>exact matching. >> >> I?m not sure what to make of this. >> >> Every system needs a discovery protocol. NDN is doing it via selector >> matching at the forwarder. CCN does it at a layer above that. We don?t >> believe you should force nodes to let their caches be discoverable and to >> run the computation needed for this. >> >> There is no restrictive trust model. In CCN we don?t do anything of what >> I?ve described because we don?t do Selector Protocol. The Selector >> Protocol I?m just described is meant to give you the same semantics as NDN >> using exact matching. this includes the security model. Just because the >> ?layer underneath? (aka CCN) does not do the same security model doesn?t >> mean that the protocol doesn?t deliver it to you. >> >> It seems to me that you?d be hard pressed to find a feature difference >> between NDN selectors and the CCN nodes running the Selector Protocol I >> described. >> >> >> Let me go over it once again: >> >> >> Network: >> >> A - - - B - - C - - D >> E - F - + >> >> >> A, B, D, E and F are running the Selector Protocol. C is not. >> >> D is serving content for /foo >> B has a copy of /foo/100, signed by D >> >> Node A wants something that starts with /foo/ but has a next component >> greater than 50 >> >> >> >> A issues an interest: >> Interest: name = /foo/hash(Sel(>50)) >> Payload = Sel(>50) >> >> >> Interest arrives at B. B notices that it?s a Selector Protocol based >> interest. >> It interprets the payload, looks at the cache and finds /foo/100 as a >> match. >> It generates a reply. 
>> >> Data: >> Name = /foo/hash(Sel(>50)) >> Payload = ( Data: Name = /foo/100, Signature = Node D, Payload = data) >> Signature = Node B >> >> That data is sent to node A. >> >> A is running the Selector Protocol. >> It notices that this reply is a Selector Protocol reply. >> It decapsulates the Payload. It extracts /foo/100. >> It checks that /foo/100 is signed by Node D. >> >> A?s interest is satisfied. >> >> >> A issues new interest (for something newer, it wasn?t satisfied with 100). >> >> A issues an interest: >> Interest: name = /foo/hash(Sel(>100)) >> Payload = Sel(>100) >> >> Sends interest to B. >> >> B knows it?s a Selector Protocol interest. >> Parses the payload for the selectors. It looks at the cache, finds no >> match. >> >> B sends the interest to C >> >> >> C doesn?t understand Selector Protocol. It just does exact matching. Finds >> no match. >> It forwards the interest to node D. >> >> >> D is running the Selector Protocol. >> D looks at the data to see what?s the latest one. It?s 200. >> D creates encapsulated reply. >> >> Data: >> Name = /foo/hash(Sel(>100)) >> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data) >> Signature = Node D >> >> >> Forwards the data to C. >> >> C doesn?t know Selector Protocol. It matches the Data to the Interest >> based on exact match of the name >> /foo/hash(Sel(>100)). It may or may not cache the data. >> >> C forwards the data to B. >> >> B is running the Selector Protocol. It matches the data to the interest >> based on PIT. It then proceeds to check that the selectors actually >> match. It check that /foo/200 is greater than /foo/100. The check >> passes. It decides to keep a copy of /foo/200 in its cache. >> >> Node B forwards the data to Node A, which receives it. Node A is running >> the Selector Protocol. It decapsulates the data, checks the authenticity >> and hands it to the app. >> >> >> >> Node E wants some of the /foo data, but only with the right signature. 
>> >> Node E issues an interest: >> Interest: name = /foo/hash(Sel(Key=NodeD)) >> Payload = Sel(Key=NodeD) >> >> >> >> Sends it to Node F. >> >> >> F receives the interest. It knows it?s a Selector Protocol interest. >> Parses payload, looks in cache but finds no match. >> >> F forwards the interest to node B. >> >> B receives the interest. It knows it?s a Selector Protocol interest. >> Parses payload, looks in the cache and finds a match (namely /foo/200). >> /foo/200 checks out since it is signed by Node D. >> >> Node B creates a reply by encapsulating /foo/200: >> Name = /foo/hash(Sel(Key=NodeD)) >> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data) >> Signature = Node B >> >> It sends the data to F. >> >> Node F is running the Selector Protocol. It sees the reply. It >> decapsulates the object inside (/foo/200). It knows that this PIT entry >> has selectors and requires that the signature come from Node D. It checks >> that the signature of /foo/200 is from node D. It is. This is a valid >> reply to the interest, so it forwards the data along to node E and >> consumes the interest. Node F keeps a copy of the /foo/200 object. >> >> Node E receives the object. Matches it to the PIT. Decapsulates the data >> (since E is running the Selector Protocol), matches it to the selectors >> and once checked sends it to the application. >> >> >> >> Done. >> >> In this scenario, most nodes were running the Selector Protocol. But it?s >> possible for some nodes not to run it. Those nodes would only do exact >> matching (like node C). In this example, Node C kept a copy of the >> packet /foo/hash(Sel(>100)) (which encapsulated /foo/200), it could use >> this as a reply to another interest with the same name, but it wouldn?t be >> able to use this to answer a selector of /foo/hash(Sel(>150)) since that >> would require selector parsing. That request would just be forwarded. >> >> >> To summarize, nodes running the Selector Protocol behave like NDN nodes. 
>> The rest of the other nodes can do regular CCN with exact matching. >> >> Again, we are not advocating for this discovery protocol, we are just >> saying that you could implement the selector functionality on top of exact >> matching. Those nodes that wanted to run the protocol would be able to do >> so, and those that did not want to run the protocol would not be required >> to do so. >> >> Nacho >> >> >> >>>On Sat, Sep 27, 2014 at 12:40 PM, wrote: >>>> On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote: >>>> >>>>>> /mail/inbox/selector_matching/ >>>>> >>>>>So Is this implicit? >>>> >>>> No. This is an explicit hash of the interest payload. >>>> >>>> So, an interest could look like: >>>> >>>> Interest: >>>> name = /mail/inbox/selector_matching/1234567890 >>>> payload = ?user=nacho? >>>> >>>> where hash(?user=nacho?) = 1234567890 >>>> >>>> >>>>>BTW, I read all your replies. I think the discovery protocol (send out >>>>>table of content) has to reach the original provider ; otherwise there >>>>>will be some issues in the trust model. At least the cached table of >>>>>content has to be confirmed with the original provider either by key >>>>>delegation or by other confirmation protocol. Besides this, LGTM. >>>> >>>> >>>> The trust model is just slightly different. >>>> >>>> You could have something like: >>>> >>>> Interest: >>>> name = /mail/inbox/selector_matching/1234567890 >>>> payload = ?user=nacho,publisher=mail_server_key? >>>> >>>> >>>> In this case, the reply would come signed by some random cache, but the >>>> encapsulated object would be signed by mail_server_key. So, any node >>>>that >>>> understood the Selector Protocol could decapsulate the reply and check >>>>the >>>> signature. >>>> >>>> Nodes that do not understand the Selector Protocol would not be able to >>>> check the signature of the encapsulated answer. >>>> >>>> This to me is not a problem. 
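The hash-named interest and encapsulated reply described in this walkthrough can be condensed into a toy model. Everything below is illustrative, not CCNx code: a truncated SHA-256 stands in for the real payload hash, and `sign` is a placeholder hash-based tag, not an actual signature scheme.

```python
import hashlib

def selector_interest(prefix, selector_payload):
    """Name the interest by a hash of the selector payload so identical
    queries map to the same exact-match name (and so can share a cache entry)."""
    h = hashlib.sha256(selector_payload).hexdigest()[:16]
    return {"name": prefix + "/selector_matching/" + h,
            "payload": selector_payload}

def sign(data, key):
    # Stand-in for a real signature: keyed hash, for illustration only.
    return hashlib.sha256(key + data).hexdigest()

def encapsulate_reply(interest, inner_name, inner_data, producer_key, cache_key):
    """A Selector-Protocol node answers with the interest's exact-match
    name, carrying the producer-signed object encapsulated in the payload."""
    inner = {"name": inner_name, "data": inner_data,
             "sig": sign(inner_name.encode() + inner_data, producer_key)}
    return {"name": interest["name"], "inner": inner,
            "sig": sign(interest["name"].encode(), cache_key)}

def decapsulate_and_verify(reply, producer_key):
    """What nodes A/E/F do above: ignore the outer (cache) signature,
    extract the inner object, and check the producer's signature."""
    inner = reply["inner"]
    ok = inner["sig"] == sign(inner["name"].encode() + inner["data"], producer_key)
    return inner["name"], inner["data"], ok

producer_key, cache_key = b"node-D-key", b"node-B-key"
interest = selector_interest("/foo", b">50")
reply = encapsulate_reply(interest, "/foo/100", b"payload-bytes", producer_key, cache_key)
inner_name, inner_data, ok = decapsulate_and_verify(reply, producer_key)
```

A node like C that does not run the protocol sees only `reply["name"]`, which exactly matches the interest name; only protocol-aware nodes look inside and verify the inner signature.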
Base nodes (the ones not running the >>>>Selector >>>> Protocol) would not be checking signatures anyway, at least not in the >>>> fast path. This is an expensive operation that requires the node to get >>>> the key, etc. Nodes that run the Selector Protocol can check signatures >>>> if they wish (and can get their hands on a key). >>>> >>>> >>>> >>>> Nacho >>>> >>>> >>>>>On Sat, Sep 27, 2014 at 1:10 AM, wrote: >>>>>> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" >>>>>>wrote: >>>>>> >>>>>>>On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote: >>>>>>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" >>>>>>>>wrote: >>>>>>>> >>>>>>>>> How can a cache respond to /mail/inbox/selector_matching/>>>>>>>> payload> with a table of content? This name prefix is owned by the >>>>>>>>>mail >>>>>>>>> server. Also the reply really depends on what is in the cache at >>>>>>>>>the >>>>>>>>> moment, so the same name would correspond to different data. >>>>>>>> >>>>>>>> A - Yes, the same name would correspond to different data. This is >>>>>>>>true >>>>>>>> given that then data has changed. NDN (and CCN) has no architectural >>>>>>>> requirement that a name maps to the same piece of data (Obviously >>>>>>>>not >>>>>>>> talking about self certifying hash-based names). >>>>>>> >>>>>>>There is a difference. A complete NDN name including the implicit >>>>>>>digest >>>>>>>uniquely identifies a piece of data. >>>>>> >>>>>> That?s the same thing for CCN with a ContentObjectHash. >>>>>> >>>>>> >>>>>>>But here the same complete name may map to different data (I suppose >>>>>>>you >>>>>>>don't have implicit digest in an effort to do exact matching). >>>>>> >>>>>> We do, it?s called ContentObjectHash, but it?s not considered part of >>>>>>the >>>>>> name, it?s considered a matching restriction. >>>>>> >>>>>> >>>>>>>In other words, in your proposal, the same name >>>>>>>/mail/inbox/selector_matching/hash1 may map to two or more different >>>>>>>data >>>>>>>packets. 
But in NDN, two Data packets may share a name prefix, but >>>>>>>definitely not the implicit digest. And at least it is my >>>>>>>understanding >>>>>>>that the application design should make sure that the same producer >>>>>>>doesn't produce different Data packets with the same name prefix >>>>>>>before >>>>>>>implicit digest. >>>>>> >>>>>> This is an application design issue. The network cannot enforce this. >>>>>> Applications will be able to name various data objects with the same >>>>>>name. >>>>>> After all, applications don?t really control the implicit digest. >>>>>> >>>>>>>It is possible in attack scenarios for different producers to generate >>>>>>>Data packets with the same name prefix before implicit digest, but >>>>>>>still >>>>>>>not the same implicit digest. >>>>>> >>>>>> Why is this an attack scenario? Isn?t it true that if I name my >>>>>>local >>>>>> printer /printer that name can exist in the network at different >>>>>>locations >>>>>> from different publishers? >>>>>> >>>>>> >>>>>> Just to clarify, in the examples provided we weren?t using implicit >>>>>>hashes >>>>>> anywhere. IF we were using implicit hashes (as in, we knew what the >>>>>> implicit hash was), then selectors are useless. If you know the >>>>>>implicit >>>>>> hash, then you don?t need selectors. >>>>>> >>>>>> In the case of CCN, we use names without explicit hashes for most of >>>>>>our >>>>>> initial traffic (discovery, manifests, dynamically generated data, >>>>>>etc.), >>>>>> but after that, we use implicit digests (ContentObjectHash >>>>>>restriction) >>>>>> for practically all of the other traffic. >>>>>> >>>>>> Nacho >>>>>> >>>>>> >>>>>>>> >>>>>>>> B - Yes, you can consider the name prefix is ?owned? by the server, >>>>>>>>but >>>>>>>> the answer is actually something that the cache is choosing. The >>>>>>>>cache >>>>>>>>is >>>>>>>> choosing from the set if data that it has. The data that it >>>>>>>>encapsulates >>>>>>>> _is_ signed by the producer. 
Anybody that can decapsulate the data >>>>>>>>can >>>>>>>> verify that this is the case. >>>>>>>> >>>>>>>> Nacho >>>>>>>> >>>>>>>> >>>>>>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >>>>>>>>> >>>>>>>>>> My beating on ?discover all? is exactly because of this. Let?s >>>>>>>>>>define >>>>>>>>>> discovery service. If the service is just ?discover latest? >>>>>>>>>> (left/right), can we not simplify the current approach? If the >>>>>>>>>>service >>>>>>>>>> includes more than ?latest?, then is the current approach the >>>>>>>>>>right >>>>>>>>>> approach? >>>>>>>>>> >>>>>>>>>> Sync has its place and is the right solution for somethings. >>>>>>>>>>However, >>>>>>>>>> it should not be a a bandage over discovery. Discovery should be >>>>>>>>>>its >>>>>>>>>> own valid and useful service. >>>>>>>>>> >>>>>>>>>> I agree that the exclusion approach can work, and work relatively >>>>>>>>>>well, >>>>>>>>>> for finding the rightmost/leftmost child. I believe this is >>>>>>>>>>because >>>>>>>>>> that operation is transitive through caches. So, within whatever >>>>>>>>>> timeout an application is willing to wait to find the ?latest?, it >>>>>>>>>>can >>>>>>>>>> keep asking and asking. >>>>>>>>>> >>>>>>>>>> I do think it would be best to actually try to ask an >>>>>>>>>>authoritative >>>>>>>>>> source first (i.e. a non-cached value), and if that fails then >>>>>>>>>>probe >>>>>>>>>> caches, but experimentation may show what works well. This is >>>>>>>>>>based >>>>>>>>>>on >>>>>>>>>> my belief that in the real world in broad use, the namespace will >>>>>>>>>>become >>>>>>>>>> pretty polluted and probing will result in a lot of junk, but >>>>>>>>>>that?s >>>>>>>>>> future prognosticating. >>>>>>>>>> >>>>>>>>>> Also, in the exact match vs. continuation match of content object >>>>>>>>>>to >>>>>>>>>> interest, it is pretty easy to encode that ?selector? request in a >>>>>>>>>>name >>>>>>>>>> component (i.e. ?exclude_before=(t=version, l=2, v=279) & >>>>>>>>>>sort=right?) 
>>>>>>>>>> and any participating cache can respond with a link (or encapsulate) a
>>>>>>>>>> response in an exact match system.
>>>>>>>>>>
>>>>>>>>>> In the CCNx 1.0 spec, one could also encode this a different way. One
>>>>>>>>>> could use a name like "/mail/inbox/selector_matching/<hash of payload>"
>>>>>>>>>> and in the payload include "exclude_before=(t=version, l=2, v=279) &
>>>>>>>>>> sort=right". This means that any cache that could process the
>>>>>>>>>> "selector_matching" function could look at the interest payload and
>>>>>>>>>> evaluate the predicate there. The predicate could become large and not
>>>>>>>>>> pollute the PIT with all the computation state. Including "<hash of
>>>>>>>>>> payload>" in the name means that one could get a cached response if
>>>>>>>>>> someone else had asked the same exact question (subject to the content
>>>>>>>>>> object's cache lifetime) and it also serves to multiplex different
>>>>>>>>>> payloads for the same function (selector_matching).
>>>>>>>>>>
>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff wrote:
>>>>>>>>>>
>>>>>>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf
>>>>>>>>>>>
>>>>>>>>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html
>>>>>>>>>>>
>>>>>>>>>>> J.
>>>>>>>>>>>
>>>>>>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" wrote:
>>>>>>>>>>>
>>>>>>>>>>>> However, I cannot see whether we can achieve "best-effort *all*-value"
>>>>>>>>>>>> efficiently.
>>>>>>>>>>>> There are still interesting topics on
>>>>>>>>>>>> 1. how do we express the discovery query?
>>>>>>>>>>>> 2. is selector "discovery-complete"? i.e. can we express any
>>>>>>>>>>>> discovery query with the current selector?
>>>>>>>>>>>> 3.
if so, can we re-express current selector in a more efficient >>>>>>>>>>>>way? >>>>>>>>>>>> >>>>>>>>>>>> I personally see a named data as a set, which can then be >>>>>>>>>>>>categorized >>>>>>>>>>>> into "ordered set", and "unordered set". >>>>>>>>>>>> some questions that any discovery expression must solve: >>>>>>>>>>>> 1. is this a nil set or not? nil set means that this name is the >>>>>>>>>>>>leaf >>>>>>>>>>>> 2. set contains member X? >>>>>>>>>>>> 3. is set ordered or not >>>>>>>>>>>> 4. (ordered) first, prev, next, last >>>>>>>>>>>> 5. if we enforce component ordering, answer question 4. >>>>>>>>>>>> 6. recursively answer all questions above on any set member >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> From: >>>>>>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>>> Cc: , , >>>>>>>>>>>>> >>>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>>>>>> >>>>>>>>>>>>> I think Tai-Lin?s example was just fine to talk about >>>>>>>>>>>>>discovery. >>>>>>>>>>>>> /blah/blah/value, how do you discover all the ?value?s? >>>>>>>>>>>>>Discovery >>>>>>>>>>>>> shouldn?t >>>>>>>>>>>>> care if its email messages or temperature readings or world cup >>>>>>>>>>>>> photos. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This is true if discovery means "finding everything" - in which >>>>>>>>>>>>>case, >>>>>>>>>>>>> as you >>>>>>>>>>>>> point out, sync-style approaches may be best. But I am not >>>>>>>>>>>>>sure >>>>>>>>>>>>>that >>>>>>>>>>>>> this >>>>>>>>>>>>> definition is complete. 
The most pressing example that I can >>>>>>>>>>>>>think >>>>>>>>>>>>> of >>>>>>>>>>>>> is >>>>>>>>>>>>> best-effort latest-value, in which the consumer's goal is to >>>>>>>>>>>>>get >>>>>>>>>>>>>the >>>>>>>>>>>>> latest >>>>>>>>>>>>> copy the network can deliver at the moment, and may not care >>>>>>>>>>>>>about >>>>>>>>>>>>> previous >>>>>>>>>>>>> values or (if freshness is used well) potential later versions. >>>>>>>>>>>>> >>>>>>>>>>>>> Another case that seems to work well is video seeking. Let's >>>>>>>>>>>>>say I >>>>>>>>>>>>> want to >>>>>>>>>>>>> enable random access to a video by timecode. The publisher can >>>>>>>>>>>>> provide a >>>>>>>>>>>>> time-code based discovery namespace that's queried using an >>>>>>>>>>>>>Interest >>>>>>>>>>>>> that >>>>>>>>>>>>> essentially says "give me the closest keyframe to 00:37:03:12", >>>>>>>>>>>>>which >>>>>>>>>>>>> returns an interest that, via the name, provides the exact >>>>>>>>>>>>>timecode >>>>>>>>>>>>> of >>>>>>>>>>>>> the >>>>>>>>>>>>> keyframe in question and a link to a segment-based namespace >>>>>>>>>>>>>for >>>>>>>>>>>>> efficient >>>>>>>>>>>>> exact match playout. In two roundtrips and in a very >>>>>>>>>>>>>lightweight >>>>>>>>>>>>> way, >>>>>>>>>>>>> the >>>>>>>>>>>>> consumer has random access capability. If the NDN is the >>>>>>>>>>>>>moral >>>>>>>>>>>>> equivalent >>>>>>>>>>>>> of IP, then I am not sure we should be afraid of roundtrips >>>>>>>>>>>>>that >>>>>>>>>>>>> provide >>>>>>>>>>>>> this kind of functionality, just as they are used in TCP. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I described one set of problems using the exclusion approach, >>>>>>>>>>>>>and >>>>>>>>>>>>> that >>>>>>>>>>>>> an >>>>>>>>>>>>> NDN paper on device discovery described a similar problem, >>>>>>>>>>>>>though >>>>>>>>>>>>> they >>>>>>>>>>>>> did >>>>>>>>>>>>> not go into the details of splitting interests, etc. That all >>>>>>>>>>>>>was >>>>>>>>>>>>> simple >>>>>>>>>>>>> enough to see from the example. 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another question is how does one do the discovery with exact match
>>>>>>>>>>>>> names, which is also conflating things. You could do a different
>>>>>>>>>>>>> discovery with continuation names too, just not the exclude method.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As I alluded to, one needs a way to talk with a specific cache about
>>>>>>>>>>>>> its "table of contents" for a prefix so one can get a consistent set
>>>>>>>>>>>>> of results without all the round-trips of exclusions. Actually
>>>>>>>>>>>>> downloading the "headers" of the messages would be the same bytes,
>>>>>>>>>>>>> more or less. In a way, this is a little like name enumeration from a
>>>>>>>>>>>>> ccnx 0.x repo, but that protocol has its own set of problems and I'm
>>>>>>>>>>>>> not suggesting to use that directly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One approach is to encode a request in a name component and a
>>>>>>>>>>>>> participating cache can reply. It replies in such a way that one
>>>>>>>>>>>>> could continue talking with that cache to get its TOC. One would then
>>>>>>>>>>>>> issue another interest with a request for not-that-cache.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm curious how the TOC approach works in a multi-publisher scenario?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another approach is to try to ask the authoritative source for the
>>>>>>>>>>>>> "current" manifest name, i.e. /mail/inbox/current/<nonce>, which
>>>>>>>>>>>>> could return the manifest or a link to the manifest.
Then fetching the actual manifest from
>>>>>>>>>>>>> the link could come from caches because you now have a consistent
>>>>>>>>>>>>> set of names to ask for. If you cannot talk with an authoritative
>>>>>>>>>>>>> source, you could try again without the nonce and see if there's a
>>>>>>>>>>>>> cached copy of a recent version around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human being, see
>>>>>>>>>>>>> a pattern with static (/mail/inbox) and variable (148) components;
>>>>>>>>>>>>> with a proper naming convention, computers can also detect this
>>>>>>>>>>>>> pattern easily. Now I want to look for all mails in my inbox. I can
>>>>>>>>>>>>> generate a list of /mail/inbox/<number>. These are my guesses, and
>>>>>>>>>>>>> with selectors I can further refine my guesses.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this is a very bad example (or at least a very bad
>>>>>>>>>>>>> application design). You have an app (a mail server / inbox) and you
>>>>>>>>>>>>> want it to list your emails? An email list is an application data
>>>>>>>>>>>>> structure. I don't think you should use the network structure to
>>>>>>>>>>>>> reflect this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think Tai-Lin is trying to sketch a small example, not propose a
>>>>>>>>>>>>> full-scale approach to email.
(Maybe I am misunderstanding.) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Another way to look at it is that if the network architecture >>>>>>>>>>>>>is >>>>>>>>>>>>> providing >>>>>>>>>>>>> the equivalent of distributed storage to the application, >>>>>>>>>>>>>perhaps >>>>>>>>>>>>>the >>>>>>>>>>>>> application data structure could be adapted to match the >>>>>>>>>>>>>affordances >>>>>>>>>>>>> of >>>>>>>>>>>>> the network. Then it would not be so bad that the two >>>>>>>>>>>>>structures >>>>>>>>>>>>> were >>>>>>>>>>>>> aligned. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I?ll give you an example, how do you delete emails from your >>>>>>>>>>>>>inbox? >>>>>>>>>>>>> If >>>>>>>>>>>>> an >>>>>>>>>>>>> email was cached in the network it can never be deleted from >>>>>>>>>>>>>your >>>>>>>>>>>>> inbox? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This is conflating two issues - what you are pointing out is >>>>>>>>>>>>>that >>>>>>>>>>>>>the >>>>>>>>>>>>> data >>>>>>>>>>>>> structure of a linear list doesn't handle common email >>>>>>>>>>>>>management >>>>>>>>>>>>> operations well. Again, I'm not sure if that's what he was >>>>>>>>>>>>>getting >>>>>>>>>>>>> at >>>>>>>>>>>>> here. But deletion is not the issue - the availability of a >>>>>>>>>>>>>data >>>>>>>>>>>>> object >>>>>>>>>>>>> on the network does not necessarily mean it's valid from the >>>>>>>>>>>>> perspective >>>>>>>>>>>>> of the application. >>>>>>>>>>>>> >>>>>>>>>>>>> Or moved to another mailbox? Do you rely on the emails >>>>>>>>>>>>>expiring? >>>>>>>>>>>>> >>>>>>>>>>>>> This problem is true for most (any?) situations where you use >>>>>>>>>>>>>network >>>>>>>>>>>>> name >>>>>>>>>>>>> structure to directly reflect the application data structure. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Not sure I understand how you make the leap from the example to >>>>>>>>>>>>>the >>>>>>>>>>>>> general statement. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Nacho >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, yes I think those would all be good things. >>>>>>>>>>>>> >>>>>>>>>>>>> One thing to keep in mind, especially with things like time >>>>>>>>>>>>>series >>>>>>>>>>>>> sensor >>>>>>>>>>>>> data, is that people see a pattern and infer a way of doing it. >>>>>>>>>>>>> That?s >>>>>>>>>>>>> easy >>>>>>>>>>>>> for a human :) But in Discovery, one should assume that one >>>>>>>>>>>>>does >>>>>>>>>>>>>not >>>>>>>>>>>>> know >>>>>>>>>>>>> of patterns in the data beyond what the protocols used to >>>>>>>>>>>>>publish >>>>>>>>>>>>>the >>>>>>>>>>>>> data >>>>>>>>>>>>> explicitly require. That said, I think some of the things you >>>>>>>>>>>>>listed >>>>>>>>>>>>> are >>>>>>>>>>>>> good places to start: sensor data, web content, climate data or >>>>>>>>>>>>> genome >>>>>>>>>>>>> data. >>>>>>>>>>>>> >>>>>>>>>>>>> We also need to state what the forwarding strategies are and >>>>>>>>>>>>>what >>>>>>>>>>>>>the >>>>>>>>>>>>> cache >>>>>>>>>>>>> behavior is. >>>>>>>>>>>>> >>>>>>>>>>>>> I outlined some of the points that I think are important in >>>>>>>>>>>>>that >>>>>>>>>>>>> other >>>>>>>>>>>>> posting. While ?discover latest? is useful, ?discover all? is >>>>>>>>>>>>>also >>>>>>>>>>>>> important, and that one gets complicated fast. So points like >>>>>>>>>>>>> separating >>>>>>>>>>>>> discovery from retrieval and working with large data sets have >>>>>>>>>>>>>been >>>>>>>>>>>>> important in shaping our thinking. That all said, I?d be happy >>>>>>>>>>>>> starting >>>>>>>>>>>>> from 0 and working through the Discovery service definition >>>>>>>>>>>>>from >>>>>>>>>>>>> scratch >>>>>>>>>>>>> along with data set use cases. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Marc, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks ? yes, I saw that as well. I was just trying to get one >>>>>>>>>>>>>step >>>>>>>>>>>>> more >>>>>>>>>>>>> specific, which was to see if we could identify a few specific >>>>>>>>>>>>>use >>>>>>>>>>>>> cases >>>>>>>>>>>>> around which to have the conversation. (e.g., time series >>>>>>>>>>>>>sensor >>>>>>>>>>>>> data >>>>>>>>>>>>> and >>>>>>>>>>>>> web content retrieval for "get latest"; climate data for huge >>>>>>>>>>>>>data >>>>>>>>>>>>> sets; >>>>>>>>>>>>> local data in a vehicular network; etc.) What have you been >>>>>>>>>>>>>looking >>>>>>>>>>>>> at >>>>>>>>>>>>> that's driving considerations of discovery? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> From: >>>>>>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>>> Cc: , >>>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming convention? >>>>>>>>>>>>> >>>>>>>>>>>>> Jeff, >>>>>>>>>>>>> >>>>>>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread on >>>>>>>>>>>>> Discovery. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-Septemb >>>>>>>>>>>>>er >>>>>>>>>>>>>/0 >>>>>>>>>>>>>00 >>>>>>>>>>>>> 20 >>>>>>>>>>>>> 0 >>>>>>>>>>>>> .html >>>>>>>>>>>>> >>>>>>>>>>>>> I think it would be very productive to talk about what >>>>>>>>>>>>>Discovery >>>>>>>>>>>>> should >>>>>>>>>>>>> do, >>>>>>>>>>>>> and not focus on the how. It is sometimes easy to get caught >>>>>>>>>>>>>up >>>>>>>>>>>>>in >>>>>>>>>>>>> the >>>>>>>>>>>>> how, >>>>>>>>>>>>> which I think is a less important topic than the what at this >>>>>>>>>>>>>stage. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Marc, >>>>>>>>>>>>> >>>>>>>>>>>>> If you can't talk about your protocols, perhaps we can discuss >>>>>>>>>>>>>this >>>>>>>>>>>>> based >>>>>>>>>>>>> on use cases. What are the use cases you are using to >>>>>>>>>>>>>evaluate >>>>>>>>>>>>> discovery? >>>>>>>>>>>>> >>>>>>>>>>>>> Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> No matter what the expressiveness of the predicates if the >>>>>>>>>>>>>forwarder >>>>>>>>>>>>> can >>>>>>>>>>>>> send interests different ways you don't have a consistent >>>>>>>>>>>>>underlying >>>>>>>>>>>>> set >>>>>>>>>>>>> to talk about so you would always need non-range exclusions to >>>>>>>>>>>>> discover >>>>>>>>>>>>> every version. >>>>>>>>>>>>> >>>>>>>>>>>>> Range exclusions only work I believe if you get an >>>>>>>>>>>>>authoritative >>>>>>>>>>>>> answer. >>>>>>>>>>>>> If different content pieces are scattered between different >>>>>>>>>>>>>caches >>>>>>>>>>>>>I >>>>>>>>>>>>> don't see how range exclusions would work to discover every >>>>>>>>>>>>>version. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm sorry to be pointing out problems without offering >>>>>>>>>>>>>solutions >>>>>>>>>>>>>but >>>>>>>>>>>>> we're not ready to publish our discovery protocols. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I see. Can you briefly describe how ccnx discovery protocol >>>>>>>>>>>>>solves >>>>>>>>>>>>> the >>>>>>>>>>>>> all problems that you mentioned (not just exclude)? a doc will >>>>>>>>>>>>>be >>>>>>>>>>>>> better. >>>>>>>>>>>>> >>>>>>>>>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. 
I >>>>>>>>>>>>>will >>>>>>>>>>>>> soon >>>>>>>>>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>>>>>>>>> Regular >>>>>>>>>>>>> language or context free language might become part of selector >>>>>>>>>>>>>too. >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, wrote: >>>>>>>>>>>>> That will get you one reading then you need to exclude it and >>>>>>>>>>>>>ask >>>>>>>>>>>>> again. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>>set >>>>>>>>>>>>> with a particular cache, then you need to always use individual >>>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>>versions >>>>>>>>>>>>> of an object. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I am very confused. For your example, if I want to get all >>>>>>>>>>>>>today's >>>>>>>>>>>>> sensor data, I just do (Any..Last second of last day)(First >>>>>>>>>>>>>second >>>>>>>>>>>>>of >>>>>>>>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> If you talk sometimes to A and sometimes to B, you very easily >>>>>>>>>>>>> could miss content objects you want to discovery unless you >>>>>>>>>>>>>avoid >>>>>>>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Could you explain why missing content object situation happens? >>>>>>>>>>>>>also >>>>>>>>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>>>>>>>> exclude; >>>>>>>>>>>>> converting from explicit excludes to ranged exclude is always >>>>>>>>>>>>> possible. 
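The two-range exclude sketched just above, "(Any..Last second of last day)(First second of tomorrow..Any)", can be modeled in a few lines. This is an illustrative sketch only: integer epoch-second timestamps standing in for version components are an assumption, and it models the selector logic, not the actual NDN-TLV Exclude wire encoding.

```python
# Model an Exclude filter as a list of (low, high) ranges, where None
# means ANY (open-ended). To request only "today's" sensor readings,
# exclude everything up to the last second of yesterday and everything
# from the first second of tomorrow onward.
DAY = 86_400  # seconds per day

def make_today_exclude(day_start: int) -> list:
    """Two ranges: (ANY .. day_start-1) and (day_start+DAY .. ANY)."""
    return [(None, day_start - 1), (day_start + DAY, None)]

def excluded(exclude: list, version: int) -> bool:
    """True if `version` falls inside any excluded range."""
    for lo, hi in exclude:
        if (lo is None or lo <= version) and (hi is None or version <= hi):
            return True
    return False

day_start = 1_411_171_200  # hypothetical midnight, in epoch seconds
ex = make_today_exclude(day_start)
assert excluded(ex, day_start - 1)           # yesterday: excluded
assert not excluded(ex, day_start + 12_345)  # today: still eligible
assert excluded(ex, day_start + DAY)         # tomorrow: excluded
```

As the message notes, a ranged exclude like this is just compact notation for many explicit excludes; the open question in the thread is whether the excluded set stays consistent across caches.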
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>>set >>>>>>>>>>>>> with a particular cache, then you need to always use individual >>>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>>versions >>>>>>>>>>>>> of an object. For something like a sensor reading that is >>>>>>>>>>>>>updated, >>>>>>>>>>>>> say, once per second you will have 86,400 of them per day. If >>>>>>>>>>>>>each >>>>>>>>>>>>> exclusion is a timestamp (say 8 bytes), that?s 691,200 bytes of >>>>>>>>>>>>> exclusions (plus encoding overhead) per day. >>>>>>>>>>>>> >>>>>>>>>>>>> yes, maybe using a more deterministic version number than a >>>>>>>>>>>>> timestamp makes sense here, but its just an example of needing >>>>>>>>>>>>>a >>>>>>>>>>>>>lot >>>>>>>>>>>>> of exclusions. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> You exclude through 100 then issue a new interest. This goes >>>>>>>>>>>>>to >>>>>>>>>>>>> cache B >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I feel this case is invalid because cache A will also get the >>>>>>>>>>>>> interest, and cache A will return v101 if it exists. Like you >>>>>>>>>>>>>said, >>>>>>>>>>>>> if >>>>>>>>>>>>> this goes to cache B only, it means that cache A dies. How do >>>>>>>>>>>>>you >>>>>>>>>>>>> know >>>>>>>>>>>>> that v101 even exist? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I guess this depends on what the forwarding strategy is. If >>>>>>>>>>>>>the >>>>>>>>>>>>> forwarder will always send each interest to all replicas, then >>>>>>>>>>>>>yes, >>>>>>>>>>>>> modulo packet loss, you would discover v101 on cache A. If the >>>>>>>>>>>>> forwarder is just doing ?best path? and can round-robin between >>>>>>>>>>>>>cache >>>>>>>>>>>>> A and cache B, then your application could miss v101. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> c,d In general I agree that LPM performance is related to the >>>>>>>>>>>>>number >>>>>>>>>>>>> of components. 
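The per-day exclusion estimate quoted above (one reading per second, each excluded by an 8-byte timestamp) checks out arithmetically; a minimal sketch of the back-of-the-envelope numbers:

```python
# Back-of-the-envelope check of the exclusion-size estimate in the
# message above: excluding every per-second reading individually costs
# one 8-byte timestamp component each, before TLV encoding overhead.
readings_per_day = 24 * 60 * 60   # 86,400 readings
bytes_per_exclusion = 8           # timestamp component, excl. encoding overhead
total_bytes = readings_per_day * bytes_per_exclusion
print(readings_per_day, total_bytes)  # 86400 691200
```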
In my own thread-safe LPM implementation, I used only
>>>>>>>>>>>>> one RWMutex for the whole tree. I don't know whether adding a lock
>>>>>>>>>>>>> for every node will be faster or not because of lock overhead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, we should compare (exact match + discovery protocol) vs
>>>>>>>>>>>>> (ndn lpm). Comparing performance of exact match to lpm is unfair.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, we should compare them. And we need to publish the ccnx 1.0
>>>>>>>>>>>>> specs for doing the exact match discovery. So, as I said, I'm not
>>>>>>>>>>>>> ready to claim it's better yet because we have not done that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, wrote:
>>>>>>>>>>>>> I would point out that using LPM on content object to Interest
>>>>>>>>>>>>> matching to do discovery has its own set of problems. Discovery
>>>>>>>>>>>>> involves more than just "latest version" discovery too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is probably getting off-topic from the original post about
>>>>>>>>>>>>> naming conventions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> a. If Interests can be forwarded multiple directions and two
>>>>>>>>>>>>> different caches are responding, the exclusion set you build up
>>>>>>>>>>>>> talking with cache A will be invalid for cache B. If you talk
>>>>>>>>>>>>> sometimes to A and sometimes to B, you very easily could miss
>>>>>>>>>>>>> content objects you want to discover unless you avoid all range
>>>>>>>>>>>>> exclusions and only exclude explicit versions. That will lead to
>>>>>>>>>>>>> very large interest packets. In ccnx 1.0, we believe that an
>>>>>>>>>>>>> explicit discovery protocol that allows conversations about
>>>>>>>>>>>>> consistent sets is better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> b.
Yes, if you just want the "latest version" discovery, that
>>>>>>>>>>>>> should be transitive between caches, but imagine this. You send
>>>>>>>>>>>>> Interest #1 to cache A which returns version 100. You exclude
>>>>>>>>>>>>> through 100 then issue a new interest. This goes to cache B who
>>>>>>>>>>>>> only has version 99, so the interest times out or is NACK'd. So
>>>>>>>>>>>>> you think you have it! But, cache A already has version 101, you
>>>>>>>>>>>>> just don't know. If you cannot have a conversation around
>>>>>>>>>>>>> consistent sets, it seems like even doing latest version discovery
>>>>>>>>>>>>> is difficult with selector based discovery. From what I saw in
>>>>>>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the
>>>>>>>>>>>>> authoritative source because you can never believe an intermediate
>>>>>>>>>>>>> cache that there's not something more recent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd be
>>>>>>>>>>>>> interested in seeing your analysis. Case (a) is that a node can
>>>>>>>>>>>>> correctly discover every version of a name prefix, and (b) is that
>>>>>>>>>>>>> a node can correctly discover the latest version. We have not
>>>>>>>>>>>>> formally compared (or yet published) our discovery protocols (we
>>>>>>>>>>>>> have three, 2 for content, 1 for device) compared to selector based
>>>>>>>>>>>>> discovery, so I cannot yet claim they are better, but they do not
>>>>>>>>>>>>> have the non-determinism sketched above.
>>>>>>>>>>>>>
>>>>>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>>>>>>>>>>> must do in the PIT to match a content object.
If you have a name
>>>>>>>>>>>>> tree or a threaded hash table, those don't all need to be hash
>>>>>>>>>>>>> lookups, but you need to walk up the name tree for every prefix of
>>>>>>>>>>>>> the content object name and evaluate the selector predicate.
>>>>>>>>>>>>> Content Based Networking (CBN) had some methods to create data
>>>>>>>>>>>>> structures based on predicates; maybe those would be better. But
>>>>>>>>>>>>> in any case, you will potentially need to retrieve many PIT entries
>>>>>>>>>>>>> if there is Interest traffic for many prefixes of a root. Even on
>>>>>>>>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a
>>>>>>>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive
>>>>>>>>>>>>> implementation only requires at most 3 lookups (one by name, one by
>>>>>>>>>>>>> name + keyid, one by name + content object hash), and one can do
>>>>>>>>>>>>> other things to optimize lookup for an extra write.
>>>>>>>>>>>>>
>>>>>>>>>>>>> d. In (c) above, if you have a threaded name tree or are just
>>>>>>>>>>>>> walking parent pointers, I suspect you'll need locking of the
>>>>>>>>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP)
>>>>>>>>>>>>> and that will be expensive. It would be interesting to see what a
>>>>>>>>>>>>> cache consistent multi-threaded name tree looks like.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had thought about these questions, but I want to know your ideas
>>>>>>>>>>>>> besides typed components:
>>>>>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>>>>>>>>>>> things?
>>>>>>>>>>>>> 2. will removing selectors improve performance?
How do we use other,
>>>>>>>>>>>>> faster techniques to replace selectors?
>>>>>>>>>>>>> 3. fixed byte length and type. I agree more that type can be a
>>>>>>>>>>>>> fixed byte, but 2 bytes for length might not be enough for the
>>>>>>>>>>>>> future.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know how to make #2 flexible enough to do the things I can
>>>>>>>>>>>>> envision we need to do, and with a few simple conventions on
>>>>>>>>>>>>> how the registry of types is managed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you share it with us?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure. Here's a strawman.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The type space is currently shared with the types used for the
>>>>>>>>>>>>> entire protocol, which gives us two options:
>>>>>>>>>>>>> (1) we reserve a range for name component types. Given the
>>>>>>>>>>>>> likelihood there will be at least as much and probably more need
>>>>>>>>>>>>> for component types than protocol extensions, we could reserve 1/2
>>>>>>>>>>>>> of the type space, giving us 32K types for name components.
>>>>>>>>>>>>> (2) since there is no parsing ambiguity between name components
>>>>>>>>>>>>> and other fields of the protocol (since they are sub-types of the
>>>>>>>>>>>>> name type) we could reuse numbers and thereby have an entire 65K
>>>>>>>>>>>>> name component types.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We divide the type space into regions, and manage it with a
>>>>>>>>>>>>> registry.
If we ever get to the point of creating an IETF
>>>>>>>>>>>>> standard, IANA has 25 years of experience running registries and
>>>>>>>>>>>>> there are well-understood rule sets for different kinds of
>>>>>>>>>>>>> registries (open, requires a written spec, requires standards
>>>>>>>>>>>>> approval).
>>>>>>>>>>>>>
>>>>>>>>>>>>> - We allocate one "default" name component type for "generic
>>>>>>>>>>>>> name", which would be used on name prefixes and other common
>>>>>>>>>>>>> cases where there are no special semantics on the name component.
>>>>>>>>>>>>> - We allocate a range of name component types, say 1024, to
>>>>>>>>>>>>> globally understood types that are part of the base or extension
>>>>>>>>>>>>> NDN specifications (e.g. chunk#, version#, etc.)
>>>>>>>>>>>>> - We reserve some portion of the space for unanticipated uses
>>>>>>>>>>>>> (say another 1024 types)
>>>>>>>>>>>>> - We give the rest of the space to application assignment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Make sense?
>>>>>>>>>>>>>
>>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in
>>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from
>>>>>>>>>>>>> performance flaws in the design
>>>>>>>>>>>>>
>>>>>>>>>>>>> we could design for performance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's not what people are advocating. We are advocating that we
>>>>>>>>>>>>> *not* design for known bad performance and hope serendipity or
>>>>>>>>>>>>> Moore's Law will come to the rescue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> but I think there will be a turning
>>>>>>>>>>>>> point when the slower design starts to become "fast enough".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps, perhaps not.
Relative performance is what matters, so
>>>>>>>>>>>>> things that don't get faster while others do tend to get dropped
>>>>>>>>>>>>> or not used because they impose a performance penalty relative to
>>>>>>>>>>>>> the things that go faster. There is also the "low-end" phenomenon
>>>>>>>>>>>>> where improvements in technology get applied to lowering cost
>>>>>>>>>>>>> rather than improving performance. For those environments bad
>>>>>>>>>>>>> performance just never gets better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you
>>>>>>>>>>>>> think there will be some design of ndn that will *never* have
>>>>>>>>>>>>> performance improvement?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I suspect LPM on data will always be slow (relative to the other
>>>>>>>>>>>>> functions).
>>>>>>>>>>>>> I suspect exclusions will always be slow because they will
>>>>>>>>>>>>> require extra memory references.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However I of course don't claim clairvoyance, so this is just
>>>>>>>>>>>>> speculation based on 35+ years of seeing performance improve by 4
>>>>>>>>>>>>> orders of magnitude and still having to worry about counting
>>>>>>>>>>>>> cycles and memory references.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to
>>>>>>>>>>>>> perform well on it. It should be the other way around: once ndn
>>>>>>>>>>>>> apps become popular, a better chip will be designed for ndn.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in
>>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from
>>>>>>>>>>>>> performance flaws in the design:
>>>>>>>>>>>>> a) clock rates are not getting (much) faster
>>>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>>>>>>>>> c) data structures that require locks to manipulate
>>>>>>>>>>>>> successfully will be relatively more expensive, even with
>>>>>>>>>>>>> near-zero lock contention.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in
>>>>>>>>>>>>> its design. We just forgot those because the design elements
>>>>>>>>>>>>> that depended on those mistakes have fallen into disuse. The
>>>>>>>>>>>>> poster children for this are:
>>>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow
>>>>>>>>>>>>> on modern forwarding hardware, so they can't be reliably used
>>>>>>>>>>>>> anywhere
>>>>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was
>>>>>>>>>>>>> specified and is now a giant PITA that still causes major pain
>>>>>>>>>>>>> in working around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm afraid students today are being taught that the designers
>>>>>>>>>>>>> of IP were flawless, as opposed to very good scientists and
>>>>>>>>>>>>> engineers who got most of it right.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic.
>>>>>>>>>>>>> Now I see that there are 3 approaches:
>>>>>>>>>>>>> 1. we should not define a naming convention at all
>>>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of
>>>>>>>>>>>>> types
>>>>>>>>>>>>> 3.
>>>>>>>>>>>>> marked component: introduce only one more type and add additional marker space
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know how to make #2 flexible enough to do the things I can envision we need to do, and with a few simple conventions on how the registry of types is managed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is just as powerful in practice as either throwing up our hands and letting applications design their own mutually incompatible schemes or trying to make naming conventions with markers in a way that is fast to generate/parse and also resilient against aliasing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also everybody thinks that the current utf8 marker naming convention needs to be revised.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to fit in (the magnitude of) 96 bytes? What length are names usually in current NDN experiments?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I guess wide deployment could make for even longer names. Related: Many URLs I encounter nowadays easily don't fit within two 80-column text lines, and NDN will have to carry more information than URLs, as far as I see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> In fact, the index in a separate TLV will be slower on some architectures, like the ezChip NP4. The NP4 can hold the first 96 frame bytes in memory; then any subsequent memory is accessed only as two adjacent 32-byte blocks (there can be at most 5 blocks available at any one time).
>>>>>>>>>>>>> If you need to switch between arrays, it would be very expensive. If you have to read past the name to get to the 2nd array, then read it, then back up to get to the name, it will be pretty expensive too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does this make that much difference?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to parse the first 5 components, one way to do it is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes from the start offset of the beginning of the name.
>>>>>>>>>>>>> OR
>>>>>>>>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How much speed are you getting from one to the other? You seem to imply that the first one is faster. I don't think this is the case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the first one you'll probably have to get the cache line for the index, then all the required cache lines for the first 5 components. For the second, you'll have to get all the cache lines for the first 5 components. Given an assumption that a cache miss is way more expensive than evaluating a number and computing an addition, you might find that the performance of the index is actually slower than the performance of the direct access.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Granted, there is a case where you don't access the name at all, for example, if you just get the offsets and then send the offsets as parameters to another processor/GPU/NPU/etc.
>>>>>>>>>>>>> In this case you may see a gain IF there are more cache line misses in reading the name than in reading the index. So, if the regular part of the name that you're parsing is bigger than the cache line (64 bytes?) and the name is to be processed by a different processor, then you might see some performance gain in using the index, but in all other circumstances I bet this is not the case. I may be wrong, haven't actually tested it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is all to say, I don't think we should be designing the protocol with only one architecture in mind. (The architecture of sending the name to a different processor than the index.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you have numbers that show that the index is faster I would like to see under what conditions and architectural assumptions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nacho
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I may have misinterpreted your description so feel free to correct me if I'm wrong.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Nacho (Ignacio) Solis
>>>>>>>>>>>>> Protocol Architect
>>>>>>>>>>>>> Principal Scientist
>>>>>>>>>>>>> Palo Alto Research Center (PARC)
>>>>>>>>>>>>> +1(650)812-4458
>>>>>>>>>>>>> Ignacio.Solis at parc.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed each component's offset must be encoded using a fixed amount of bytes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> i.e.,
>>>>>>>>>>>>> Type = Offsets
>>>>>>>>>>>>> Length = 10 Bytes
>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your name is too long.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't get it. What you described only works if the "offset" is encoded in fixed bytes. With varNum, you will still need to parse x-1 offsets to get to the x offset.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I like the existing NDN UTF8 'convention'." I'm still not sure I understand what you _do_ prefer, though. it sounds like you're describing an entirely different scheme where the info that describes the name-components is ... someplace other than _in_ the name-components. is that correct? when you say "field separator", what do you mean (since that's not a "TL" from a TLV)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Correct.
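Tai-Lin's varNum point can be sketched the same way: with a variable-length (varint-style) encoding of the offsets themselves, reaching the x-th offset still requires decoding every offset before it. The base-128 varint used here is illustrative, not any NDN wire format:

```python
# With variable-length offsets, "direct access" to the x-th offset is lost:
# you must decode the x preceding varints to know where offset x begins.

def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:            # high bit clear: last byte of varint
            return value, pos
        shift += 7

def nth_varint_offset(buf, n):
    """Sequentially decode varints to reach the n-th offset (0-based)."""
    pos = 0
    for _ in range(n):
        _, pos = read_varint(buf, pos)
    return read_varint(buf, pos)[0]

# Offsets 0, 2, 300 encoded as varints (300 -> 0xAC 0x02)
encoded = bytes([0, 2, 0xAC, 0x02])
assert nth_varint_offset(encoded, 2) == 300
```

Fixed-width offsets trade wasted bytes for O(1) indexing; varints save space but reduce the offset table to the same sequential scan as the nested TLVs it was meant to avoid.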
>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the name hierarchy with offsets in the name, and other TLV(s) indicate the offset to use in order to retrieve special components. As for the field separator, it is something like "/". Aliasing is avoided as you do not rely on field separators to parse the name; you use the "offset TLV" to do that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So now, it may be an aesthetic question but:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose you only want the first x components) you can directly have it using the offsets. With the Nested TLV structure you have to iteratively parse the first x-1 components. With the offset structure you can directly access the first x components.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The why is simple:
>>>>>>>>>>>>>
>>>>>>>>>>>>> You use a lot of "generic component type" and very few "specific component type". You are imposing types for every component in order to handle a few exceptions (segmentation, etc.). You create a rule (specify the component's type) to handle exceptions!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would prefer not to have typed components. Instead I would prefer to have the name as a simple sequence of bytes with a field separator.
>>>>>>>>>>>>> Then, outside the name, if you have some components that could be used at the network layer (e.g. a TLV field), you simply need something that indicates the offset allowing you to retrieve the version, segment, etc. in the name...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we agree on the small number of "component types". However, if you have a small number of types, you will end up with names containing many generic component types and few specific component types. Due to the fact that the component type specification is an exception in the name, I would prefer something that specifies a component's type only when needed (something like UTF8 conventions, but that applications MUST use).
>>>>>>>>>>>>>
>>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some explanation about why the UTF8 requirement has problems (with aliasing, e.g.) and there's been email trying to explain that applications don't have to use types if they don't need to. your email sounds like "I prefer the UTF8 convention", but it doesn't say why you have that preference in the face of the points about the problems. can you say why it is that you express a preference for the "convention" with problems?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Mark
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Ndn-interest mailing list
>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest

From Ignacio.Solis at parc.com Sat
Sep 27 21:16:42 2014 From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com) Date: Sun, 28 Sep 2014 04:16:42 +0000 Subject: [Ndn-interest] any comments on naming convention? In-Reply-To: References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com> <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu> Message-ID:

On 9/28/14, 12:09 AM, "Tai-Lin Chu" wrote:

>> I'm not sure what you mean by trust in the cache. NDN has no trust in the cache and no way to trust that a selector match is the correct match.
>
> ndn does not allow a cache to publish new data under another provider's prefix, i.e., a table of content, but your discovery protocol is doing this.

NDN allows caches to pick an answer out of a set. This answer is originally signed by the producer.

The Selector Protocol allows a cache to pick an answer from a set (actually, we could even pick multiple answers from a set). This answer is signed by the producer. The answers are packaged before being transmitted, but nothing is broken in terms of security.

In both cases, the same bits are transmitted. But in the Selector Protocol, the data travels encapsulated, with some bits outside it. From the point of view of the Selector Protocol, the same data is transmitted in NDN and CCN.

Saying that the Selector Protocol has a different trust model is like saying that NDN over UDP/TCP has a different trust model. From the point of view of NDN or the Selector Protocol, what happens at the other nodes is basically a form of link layer communication.

In the Selector Protocol it is obvious that a cache picked the answer. It's also obvious which cache picked the answer. In NDN you don't know when a cache picks an answer or, for that matter, which cache picked the answer.

Nacho

> On Sat, Sep 27, 2014 at 2:44 PM, wrote:
>> On 9/27/14, 10:19 PM, "Tai-Lin Chu" wrote:
>>
>>> The concern is that the table of content is not confirmed by the original provider; the cache server's data is "trusted with some other chains".
>>> This trust model works but brings restrictions. It basically requires building another trust model on the "cache server"; otherwise, nothing discovered can be trusted, which also means that you discover nothing.
>>
>> I'm not sure what you mean by trust in the cache. NDN has no trust in the cache and no way to trust that a selector match is the correct match.
>>
>> As I know, a cache can have
>>
>> /foo/1
>> /foo/2
>> /foo/3
>>
>> It could reply with /foo/2 and not give you /foo/3 as the "latest". You have no way to trust that a cache will give you anything specific. You can't really require this because you can't require cache nodes to have a specific cache replacement policy, so as far as you know, the cache could have dropped /foo/3 from the cache.
>>
>> As a matter of fact, unless you require signature verification at cache nodes (CCN requires this), you don't even have that. From what I've been told, it's optional for nodes to check signatures. So, at any point in the network, you never know if previous nodes have verified the signature.
>>
>> So, I'm not sure what kind of "trust model" you refer to. Is there some trust model that this Selector Protocol breaks at the nodes that run the Selector Protocol? If so, could you please explain it.
>>
>>> Another critical point is that those cache servers are not hierarchical, so we can only apply flat signing (one guy signs them all). This looks very problematic. A quick fix is to just impose the name hierarchy, but it is cumbersome too.
>>
>> Nobody really cares about the signature of the reply. You care about what's encapsulated inside, which, in fact, does authenticate to the selector request. Every node running the Selector Protocol can check this reply and this signature.
>>
>>> Here is our discussion so far:
>>> exact matching -> ...
>>> needs discovery protocol (because it is not lpm) -> discovery needs table of content -> restrictive trust model
>>> My argument is that this restrictive trust model logically discourages exact matching.
>>
>> I'm not sure what to make of this.
>>
>> Every system needs a discovery protocol. NDN is doing it via selector matching at the forwarder. CCN does it at a layer above that. We don't believe you should force nodes to let their caches be discoverable and to run the computation needed for this.
>>
>> There is no restrictive trust model. In CCN we don't do anything of what I've described because we don't do the Selector Protocol. The Selector Protocol I've just described is meant to give you the same semantics as NDN using exact matching; this includes the security model. Just because the "layer underneath" (aka CCN) does not do the same security model doesn't mean that the protocol doesn't deliver it to you.
>>
>> It seems to me that you'd be hard pressed to find a feature difference between NDN selectors and CCN nodes running the Selector Protocol I described.
>>
>> Let me go over it once again:
>>
>> Network:
>>
>> A - - - B - - C - - D
>>         E - F - +
>>
>> A, B, D, E and F are running the Selector Protocol. C is not.
>>
>> D is serving content for /foo
>> B has a copy of /foo/100, signed by D
>>
>> Node A wants something that starts with /foo/ but has a next component greater than 50.
>>
>> A issues an interest:
>> Interest: name = /foo/hash(Sel(>50))
>> Payload = Sel(>50)
>>
>> The interest arrives at B. B notices that it's a Selector Protocol based interest. It interprets the payload, looks at the cache and finds /foo/100 as a match. It generates a reply.
>>
>> Data:
>> Name = /foo/hash(Sel(>50))
>> Payload = ( Data: Name = /foo/100, Signature = Node D, Payload = data)
>> Signature = Node B
>>
>> That data is sent to node A.
>>
>> A is running the Selector Protocol.
>> It notices that this reply is a Selector Protocol reply.
>> It decapsulates the Payload. It extracts /foo/100.
>> It checks that /foo/100 is signed by Node D.
>>
>> A's interest is satisfied.
>>
>> A issues a new interest (for something newer; it wasn't satisfied with 100).
>>
>> A issues an interest:
>> Interest: name = /foo/hash(Sel(>100))
>> Payload = Sel(>100)
>>
>> Sends interest to B.
>>
>> B knows it's a Selector Protocol interest.
>> Parses the payload for the selectors. It looks at the cache, finds no match.
>>
>> B sends the interest to C.
>>
>> C doesn't understand the Selector Protocol. It just does exact matching. Finds no match.
>> It forwards the interest to node D.
>>
>> D is running the Selector Protocol.
>> D looks at the data to see what's the latest one. It's 200.
>> D creates an encapsulated reply.
>>
>> Data:
>> Name = /foo/hash(Sel(>100))
>> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data)
>> Signature = Node D
>>
>> Forwards the data to C.
>>
>> C doesn't know the Selector Protocol. It matches the Data to the Interest based on exact match of the name /foo/hash(Sel(>100)). It may or may not cache the data.
>>
>> C forwards the data to B.
>>
>> B is running the Selector Protocol. It matches the data to the interest based on the PIT. It then proceeds to check that the selectors actually match. It checks that /foo/200 is greater than /foo/100. The check passes. It decides to keep a copy of /foo/200 in its cache.
>>
>> Node B forwards the data to Node A, which receives it. Node A is running the Selector Protocol. It decapsulates the data, checks the authenticity and hands it to the app.
>>
>> Node E wants some of the /foo data, but only with the right signature.
>>
>> Node E issues an interest:
>> Interest: name = /foo/hash(Sel(Key=NodeD))
>> Payload = Sel(Key=NodeD)
>>
>> Sends it to Node F.
>>
>> F receives the interest. It knows it's a Selector Protocol interest.
>> Parses the payload, looks in the cache but finds no match.
>>
>> F forwards the interest to node B.
>>
>> B receives the interest. It knows it's a Selector Protocol interest.
>> Parses the payload, looks in the cache and finds a match (namely /foo/200).
>> /foo/200 checks out since it is signed by Node D.
>>
>> Node B creates a reply by encapsulating /foo/200:
>> Name = /foo/hash(Sel(Key=NodeD))
>> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload = data)
>> Signature = Node B
>>
>> It sends the data to F.
>>
>> Node F is running the Selector Protocol. It sees the reply. It decapsulates the object inside (/foo/200). It knows that this PIT entry has selectors and requires that the signature come from Node D. It checks that the signature of /foo/200 is from node D. It is. This is a valid reply to the interest, so it forwards the data along to node E and consumes the interest. Node F keeps a copy of the /foo/200 object.
>>
>> Node E receives the object. Matches it to the PIT. Decapsulates the data (since E is running the Selector Protocol), matches it to the selectors and, once checked, sends it to the application.
>>
>> Done.
>>
>> In this scenario, most nodes were running the Selector Protocol. But it's possible for some nodes not to run it. Those nodes would only do exact matching (like node C). In this example, Node C kept a copy of the packet /foo/hash(Sel(>100)) (which encapsulated /foo/200); it could use this as a reply to another interest with the same name, but it wouldn't be able to use it to answer a selector of /foo/hash(Sel(>150)), since that would require selector parsing. That request would just be forwarded.
>>
>> To summarize, nodes running the Selector Protocol behave like NDN nodes. The rest of the other nodes can do regular CCN with exact matching.
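The walkthrough above can be condensed into a small sketch. All names, fields, and functions here are illustrative, not a real NDN or CCNx API: the selector predicate travels in the interest payload, a digest of that payload becomes the final name component (so exact-match nodes and caches can still operate on the name), and a cache's reply encapsulates the producer-signed object under its own outer signature:

```python
# Toy model of the Selector Protocol naming and encapsulation described above.
import hashlib

def selector_interest(prefix, selector_payload):
    """Bind the interest name to the selector payload via its hash."""
    digest = hashlib.sha256(selector_payload).hexdigest()[:16]
    return {"name": f"{prefix}/selector_matching/{digest}",
            "payload": selector_payload}

def encapsulate(reply_name, inner_data, cache_signer):
    """Cache wraps a producer-signed object; the outer signature is the cache's."""
    return {"name": reply_name, "payload": inner_data, "signature": cache_signer}

# Producer-signed object a cache (Node B) might hold:
inner = {"name": "/foo/200", "payload": b"data", "signature": "NodeD"}

interest = selector_interest("/foo", b"next_component>100")
reply = encapsulate(interest["name"], inner, cache_signer="NodeB")

# A consumer running the protocol decapsulates and checks the inner signature;
# exact-match nodes only ever see the outer name.
assert reply["payload"]["signature"] == "NodeD"
assert reply["name"] == interest["name"]
```

Two consumers issuing the same selector payload produce the same outer name, which is what lets a plain exact-match node (like C in the example) serve a cached encapsulated reply without understanding selectors.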
>> Again, we are not advocating for this discovery protocol; we are just saying that you could implement the selector functionality on top of exact matching. Those nodes that wanted to run the protocol would be able to do so, and those that did not want to run the protocol would not be required to do so.
>>
>> Nacho
>>
>>> On Sat, Sep 27, 2014 at 12:40 PM, wrote:
>>>> On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote:
>>>>
>>>>>> /mail/inbox/selector_matching/<hash of payload>
>>>>>
>>>>> So is this implicit?
>>>>
>>>> No. This is an explicit hash of the interest payload.
>>>>
>>>> So, an interest could look like:
>>>>
>>>> Interest:
>>>> name = /mail/inbox/selector_matching/1234567890
>>>> payload = "user=nacho"
>>>>
>>>> where hash("user=nacho") = 1234567890
>>>>
>>>>> BTW, I read all your replies. I think the discovery protocol (send out table of content) has to reach the original provider; otherwise there will be some issues in the trust model. At least the cached table of content has to be confirmed with the original provider, either by key delegation or by another confirmation protocol. Besides this, LGTM.
>>>>
>>>> The trust model is just slightly different.
>>>>
>>>> You could have something like:
>>>>
>>>> Interest:
>>>> name = /mail/inbox/selector_matching/1234567890
>>>> payload = "user=nacho,publisher=mail_server_key"
>>>>
>>>> In this case, the reply would come signed by some random cache, but the encapsulated object would be signed by mail_server_key. So, any node that understood the Selector Protocol could decapsulate the reply and check the signature.
>>>>
>>>> Nodes that do not understand the Selector Protocol would not be able to check the signature of the encapsulated answer.
>>>>
>>>> This to me is not a problem. Base nodes (the ones not running the Selector Protocol) would not be checking signatures anyway, at least not in the fast path.
>>>> This is an expensive operation that requires the node to get the key, etc. Nodes that run the Selector Protocol can check signatures if they wish (and can get their hands on a key).
>>>>
>>>> Nacho
>>>>
>>>>> On Sat, Sep 27, 2014 at 1:10 AM, wrote:
>>>>>> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:
>>>>>>
>>>>>>> On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>>>>>>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>>>>>>>
>>>>>>>>> How can a cache respond to /mail/inbox/selector_matching/<hash of payload> with a table of content? This name prefix is owned by the mail server. Also, the reply really depends on what is in the cache at the moment, so the same name would correspond to different data.
>>>>>>>>
>>>>>>>> A - Yes, the same name would correspond to different data. This is true given that the data has changed. NDN (and CCN) has no architectural requirement that a name maps to the same piece of data (obviously not talking about self-certifying hash-based names).
>>>>>>>
>>>>>>> There is a difference. A complete NDN name including the implicit digest uniquely identifies a piece of data.
>>>>>>
>>>>>> That's the same thing for CCN with a ContentObjectHash.
>>>>>>
>>>>>>> But here the same complete name may map to different data (I suppose you don't have an implicit digest in an effort to do exact matching).
>>>>>>
>>>>>> We do, it's called ContentObjectHash, but it's not considered part of the name; it's considered a matching restriction.
>>>>>>
>>>>>>> In other words, in your proposal, the same name /mail/inbox/selector_matching/hash1 may map to two or more different data packets. But in NDN, two Data packets may share a name prefix, but definitely not the implicit digest.
>>>>>>> And at least it is my understanding that the application design should make sure that the same producer doesn't produce different Data packets with the same name prefix before implicit digest.
>>>>>>
>>>>>> This is an application design issue. The network cannot enforce this. Applications will be able to name various data objects with the same name. After all, applications don't really control the implicit digest.
>>>>>>
>>>>>>> It is possible in attack scenarios for different producers to generate Data packets with the same name prefix before implicit digest, but still not the same implicit digest.
>>>>>>
>>>>>> Why is this an attack scenario? Isn't it true that if I name my local printer /printer, that name can exist in the network at different locations from different publishers?
>>>>>>
>>>>>> Just to clarify, in the examples provided we weren't using implicit hashes anywhere. IF we were using implicit hashes (as in, we knew what the implicit hash was), then selectors are useless. If you know the implicit hash, then you don't need selectors.
>>>>>>
>>>>>> In the case of CCN, we use names without explicit hashes for most of our initial traffic (discovery, manifests, dynamically generated data, etc.), but after that, we use implicit digests (ContentObjectHash restriction) for practically all of the other traffic.
>>>>>>
>>>>>> Nacho
>>>>>>
>>>>>>>> B - Yes, you can consider the name prefix is "owned" by the server, but the answer is actually something that the cache is choosing. The cache is choosing from the set of data that it has. The data that it encapsulates _is_ signed by the producer. Anybody that can decapsulate the data can verify that this is the case.
>>>>>>>> Nacho
>>>>>>>>
>>>>>>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote:
>>>>>>>>>
>>>>>>>>>> My beating on "discover all" is exactly because of this. Let's define the discovery service. If the service is just "discover latest" (left/right), can we not simplify the current approach? If the service includes more than "latest", then is the current approach the right approach?
>>>>>>>>>>
>>>>>>>>>> Sync has its place and is the right solution for some things. However, it should not be a bandage over discovery. Discovery should be its own valid and useful service.
>>>>>>>>>>
>>>>>>>>>> I agree that the exclusion approach can work, and work relatively well, for finding the rightmost/leftmost child. I believe this is because that operation is transitive through caches. So, within whatever timeout an application is willing to wait to find the "latest", it can keep asking and asking.
>>>>>>>>>>
>>>>>>>>>> I do think it would be best to actually try to ask an authoritative source first (i.e. a non-cached value), and if that fails then probe caches, but experimentation may show what works well. This is based on my belief that in the real world in broad use, the namespace will become pretty polluted and probing will result in a lot of junk, but that's future prognosticating.
>>>>>>>>>>
>>>>>>>>>> Also, in the exact match vs. continuation match of content object to interest, it is pretty easy to encode that "selector" request in a name component (i.e. "exclude_before=(t=version, l=2, v=279) & sort=right")
>>>>>>>>>> and any participating cache can respond with a link (or >>>>>>>>>>encapsulate) a >>>>>>>>>> response in an exact match system. >>>>>>>>>> >>>>>>>>>> In the CCNx 1.0 spec, one could also encode this a different >>>>>>>>>>way. >>>>>>>>>>One >>>>>>>>>> could use a name like "/mail/inbox/selector_matching/<hash of >>>>>>>>>>payload>" >>>>>>>>>> and in the payload include "exclude_before=(t=version, l=2, >>>>>>>>>>v=279) & >>>>>>>>>> sort=right". This means that any cache that could process the " >>>>>>>>>> selector_matching" function could look at the interest payload >>>>>>>>>>and >>>>>>>>>> evaluate the predicate there. The predicate could become large >>>>>>>>>>and >>>>>>>>>>not >>>>>>>>>> pollute the PIT with all the computation state. Including "<hash >>>>>>>>>>of >>>>>>>>>> payload>" in the name means that one could get a cached response >>>>>>>>>>if >>>>>>>>>> someone else had asked the same exact question (subject to the >>>>>>>>>>content >>>>>>>>>> object's cache lifetime) and it also serves to multiplex >>>>>>>>>>different >>>>>>>>>> payloads for the same function (selector_matching). >>>>>>>>>> >>>>>>>>>> Marc >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff >>>>>>>>>>wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/SynchronizationProtocol.html >>>>>>>>>>> >>>>>>>>>>> J. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" >>>>>>>>>>>wrote: >>>>>>>>>>> >>>>>>>>>>>> However, I cannot see whether we can achieve "best-effort >>>>>>>>>>>>*all*-value" >>>>>>>>>>>> efficiently. >>>>>>>>>>>> There are still interesting topics on >>>>>>>>>>>> 1. how do we express the discovery query? >>>>>>>>>>>> 2. is selector "discovery-complete"? i.e.
can we express any >>>>>>>>>>>> discovery query with the current selector? >>>>>>>>>>>> 3. if so, can we re-express the current selector in a more >>>>>>>>>>>>efficient >>>>>>>>>>>>way? >>>>>>>>>>>> >>>>>>>>>>>> I personally see named data as a set, which can then be >>>>>>>>>>>>categorized >>>>>>>>>>>> into "ordered set" and "unordered set". >>>>>>>>>>>> Some questions that any discovery expression must solve: >>>>>>>>>>>> 1. is this a nil set or not? nil set means that this name is >>>>>>>>>>>>the >>>>>>>>>>>>leaf >>>>>>>>>>>> 2. does the set contain member X? >>>>>>>>>>>> 3. is the set ordered or not? >>>>>>>>>>>> 4. (ordered) first, prev, next, last >>>>>>>>>>>> 5. if we enforce component ordering, answer question 4. >>>>>>>>>>>> 6. recursively answer all questions above on any set member >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> From: >>>>>>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>>> Cc: , , >>>>>>>>>>>>> >>>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming >>>>>>>>>>>>>convention? >>>>>>>>>>>>> >>>>>>>>>>>>> I think Tai-Lin's example was just fine to talk about >>>>>>>>>>>>>discovery. >>>>>>>>>>>>> /blah/blah/value, how do you discover all the "value"s? >>>>>>>>>>>>>Discovery >>>>>>>>>>>>> shouldn't >>>>>>>>>>>>> care if it's email messages or temperature readings or world >>>>>>>>>>>>>cup >>>>>>>>>>>>> photos. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This is true if discovery means "finding everything" - in >>>>>>>>>>>>>which >>>>>>>>>>>>>case, >>>>>>>>>>>>> as you >>>>>>>>>>>>> point out, sync-style approaches may be best. But I am not >>>>>>>>>>>>>sure >>>>>>>>>>>>>that >>>>>>>>>>>>> this >>>>>>>>>>>>> definition is complete.
The most pressing example that I can >>>>>>>>>>>>>think >>>>>>>>>>>>> of >>>>>>>>>>>>> is >>>>>>>>>>>>> best-effort latest-value, in which the consumer's goal is to >>>>>>>>>>>>>get >>>>>>>>>>>>>the >>>>>>>>>>>>> latest >>>>>>>>>>>>> copy the network can deliver at the moment, and may not care >>>>>>>>>>>>>about >>>>>>>>>>>>> previous >>>>>>>>>>>>> values or (if freshness is used well) potential later >>>>>>>>>>>>>versions. >>>>>>>>>>>>> >>>>>>>>>>>>> Another case that seems to work well is video seeking. Let's >>>>>>>>>>>>>say I >>>>>>>>>>>>> want to >>>>>>>>>>>>> enable random access to a video by timecode. The publisher >>>>>>>>>>>>>can >>>>>>>>>>>>> provide a >>>>>>>>>>>>> time-code based discovery namespace that's queried using an >>>>>>>>>>>>>Interest >>>>>>>>>>>>> that >>>>>>>>>>>>> essentially says "give me the closest keyframe to >>>>>>>>>>>>>00:37:03:12", >>>>>>>>>>>>>which >>>>>>>>>>>>> returns an interest that, via the name, provides the exact >>>>>>>>>>>>>timecode >>>>>>>>>>>>> of >>>>>>>>>>>>> the >>>>>>>>>>>>> keyframe in question and a link to a segment-based namespace >>>>>>>>>>>>>for >>>>>>>>>>>>> efficient >>>>>>>>>>>>> exact match playout. In two roundtrips and in a very >>>>>>>>>>>>>lightweight >>>>>>>>>>>>> way, >>>>>>>>>>>>> the >>>>>>>>>>>>> consumer has random access capability. If NDN is the >>>>>>>>>>>>>moral >>>>>>>>>>>>> equivalent >>>>>>>>>>>>> of IP, then I am not sure we should be afraid of roundtrips >>>>>>>>>>>>>that >>>>>>>>>>>>> provide >>>>>>>>>>>>> this kind of functionality, just as they are used in TCP. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I described one set of problems using the exclusion approach, >>>>>>>>>>>>>and >>>>>>>>>>>>> that >>>>>>>>>>>>> an >>>>>>>>>>>>> NDN paper on device discovery described a similar problem, >>>>>>>>>>>>>though >>>>>>>>>>>>> they >>>>>>>>>>>>> did >>>>>>>>>>>>> not go into the details of splitting interests, etc.
That >>>>>>>>>>>>>all >>>>>>>>>>>>>was >>>>>>>>>>>>> simple >>>>>>>>>>>>> enough to see from the example. >>>>>>>>>>>>> >>>>>>>>>>>>> Another question is how does one do the discovery with exact >>>>>>>>>>>>>match >>>>>>>>>>>>> names, >>>>>>>>>>>>> which is also conflating things. You could do a different >>>>>>>>>>>>>discovery >>>>>>>>>>>>> with >>>>>>>>>>>>> continuation names too, just not the exclude method. >>>>>>>>>>>>> >>>>>>>>>>>>> As I alluded to, one needs a way to talk with a specific >>>>>>>>>>>>>cache >>>>>>>>>>>>>about >>>>>>>>>>>>> its >>>>>>>>>>>>> "table of contents" for a prefix so one can get a consistent >>>>>>>>>>>>>set >>>>>>>>>>>>>of >>>>>>>>>>>>> results >>>>>>>>>>>>> without all the round-trips of exclusions. Actually >>>>>>>>>>>>>downloading >>>>>>>>>>>>>the >>>>>>>>>>>>> "headers" of the messages would be the same bytes, more or >>>>>>>>>>>>>less. >>>>>>>>>>>>>In >>>>>>>>>>>>> a >>>>>>>>>>>>> way, >>>>>>>>>>>>> this is a little like name enumeration from a ccnx 0.x repo, >>>>>>>>>>>>>but >>>>>>>>>>>>>that >>>>>>>>>>>>> protocol has its own set of problems and I'm not suggesting >>>>>>>>>>>>>to >>>>>>>>>>>>>use >>>>>>>>>>>>> that >>>>>>>>>>>>> directly. >>>>>>>>>>>>> >>>>>>>>>>>>> One approach is to encode a request in a name component and a >>>>>>>>>>>>> participating >>>>>>>>>>>>> cache can reply. It replies in such a way that one could >>>>>>>>>>>>>continue >>>>>>>>>>>>> talking >>>>>>>>>>>>> with that cache to get its TOC. One would then issue another >>>>>>>>>>>>> interest >>>>>>>>>>>>> with >>>>>>>>>>>>> a request for not-that-cache. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm curious how the TOC approach works in a multi-publisher >>>>>>>>>>>>>scenario? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Another approach is to try to ask the authoritative source >>>>>>>>>>>>>for >>>>>>>>>>>>>the >>>>>>>>>>>>> "current" >>>>>>>>>>>>> manifest name, i.e.
/mail/inbox/current/<nonce>, which could >>>>>>>>>>>>>return >>>>>>>>>>>>> the >>>>>>>>>>>>> manifest or a link to the manifest. Then fetching the actual >>>>>>>>>>>>> manifest >>>>>>>>>>>>> from >>>>>>>>>>>>> the link could come from caches because you now have a >>>>>>>>>>>>>consistent >>>>>>>>>>>>> set of >>>>>>>>>>>>> names to ask for. If you cannot talk with an authoritative >>>>>>>>>>>>>source, >>>>>>>>>>>>> you >>>>>>>>>>>>> could try again without the nonce and see if there's a cached >>>>>>>>>>>>>copy >>>>>>>>>>>>> of a >>>>>>>>>>>>> recent version around. >>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human >>>>>>>>>>>>>being, >>>>>>>>>>>>>see a >>>>>>>>>>>>> pattern with static (/mail/inbox) and variable (148) >>>>>>>>>>>>>components; >>>>>>>>>>>>>with >>>>>>>>>>>>> proper naming convention, computers can also detect this >>>>>>>>>>>>>pattern >>>>>>>>>>>>> easily. Now I want to look for all mails in my inbox. I can >>>>>>>>>>>>>generate >>>>>>>>>>>>> a >>>>>>>>>>>>> list of /mail/inbox/. These are my guesses, and with >>>>>>>>>>>>> selectors >>>>>>>>>>>>> I can further refine my guesses. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I think this is a very bad example (or at least a very bad >>>>>>>>>>>>> application >>>>>>>>>>>>> design). You have an app (a mail server / inbox) and you >>>>>>>>>>>>>want >>>>>>>>>>>>>it >>>>>>>>>>>>>to >>>>>>>>>>>>> list >>>>>>>>>>>>> your emails? An email list is an application data structure. >>>>>>>>>>>>>I >>>>>>>>>>>>> don't >>>>>>>>>>>>> think you should use the network structure to reflect this.
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I think Tai-Lin is trying to sketch a small example, not >>>>>>>>>>>>>propose >>>>>>>>>>>>>a >>>>>>>>>>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Another way to look at it is that if the network architecture >>>>>>>>>>>>>is >>>>>>>>>>>>> providing >>>>>>>>>>>>> the equivalent of distributed storage to the application, >>>>>>>>>>>>>perhaps >>>>>>>>>>>>>the >>>>>>>>>>>>> application data structure could be adapted to match the >>>>>>>>>>>>>affordances >>>>>>>>>>>>> of >>>>>>>>>>>>> the network. Then it would not be so bad that the two >>>>>>>>>>>>>structures >>>>>>>>>>>>> were >>>>>>>>>>>>> aligned. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'll give you an example: how do you delete emails from your >>>>>>>>>>>>>inbox? >>>>>>>>>>>>> If >>>>>>>>>>>>> an >>>>>>>>>>>>> email was cached in the network it can never be deleted from >>>>>>>>>>>>>your >>>>>>>>>>>>> inbox? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This is conflating two issues - what you are pointing out is >>>>>>>>>>>>>that >>>>>>>>>>>>>the >>>>>>>>>>>>> data >>>>>>>>>>>>> structure of a linear list doesn't handle common email >>>>>>>>>>>>>management >>>>>>>>>>>>> operations well. Again, I'm not sure if that's what he was >>>>>>>>>>>>>getting >>>>>>>>>>>>> at >>>>>>>>>>>>> here. But deletion is not the issue - the availability of a >>>>>>>>>>>>>data >>>>>>>>>>>>> object >>>>>>>>>>>>> on the network does not necessarily mean it's valid from the >>>>>>>>>>>>> perspective >>>>>>>>>>>>> of the application. >>>>>>>>>>>>> >>>>>>>>>>>>> Or moved to another mailbox? Do you rely on the emails >>>>>>>>>>>>>expiring? >>>>>>>>>>>>> >>>>>>>>>>>>> This problem is true for most (any?) situations where you use >>>>>>>>>>>>>network >>>>>>>>>>>>> name >>>>>>>>>>>>> structure to directly reflect the application data structure.
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Not sure I understand how you make the leap from the example >>>>>>>>>>>>>to >>>>>>>>>>>>>the >>>>>>>>>>>>> general statement. >>>>>>>>>>>>> >>>>>>>>>>>>> Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Nacho >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, yes I think those would all be good things. >>>>>>>>>>>>> >>>>>>>>>>>>> One thing to keep in mind, especially with things like time >>>>>>>>>>>>>series >>>>>>>>>>>>> sensor >>>>>>>>>>>>> data, is that people see a pattern and infer a way of doing >>>>>>>>>>>>>it. >>>>>>>>>>>>> That's >>>>>>>>>>>>> easy >>>>>>>>>>>>> for a human :) But in Discovery, one should assume that one >>>>>>>>>>>>>does >>>>>>>>>>>>>not >>>>>>>>>>>>> know >>>>>>>>>>>>> of patterns in the data beyond what the protocols used to >>>>>>>>>>>>>publish >>>>>>>>>>>>>the >>>>>>>>>>>>> data >>>>>>>>>>>>> explicitly require. That said, I think some of the things >>>>>>>>>>>>>you >>>>>>>>>>>>>listed >>>>>>>>>>>>> are >>>>>>>>>>>>> good places to start: sensor data, web content, climate data >>>>>>>>>>>>>or >>>>>>>>>>>>> genome >>>>>>>>>>>>> data. >>>>>>>>>>>>> >>>>>>>>>>>>> We also need to state what the forwarding strategies are and >>>>>>>>>>>>>what >>>>>>>>>>>>>the >>>>>>>>>>>>> cache >>>>>>>>>>>>> behavior is. >>>>>>>>>>>>> >>>>>>>>>>>>> I outlined some of the points that I think are important in >>>>>>>>>>>>>that >>>>>>>>>>>>> other >>>>>>>>>>>>> posting. While "discover latest" is useful, "discover all" >>>>>>>>>>>>>is >>>>>>>>>>>>>also >>>>>>>>>>>>> important, and that one gets complicated fast. So points >>>>>>>>>>>>>like >>>>>>>>>>>>> separating >>>>>>>>>>>>> discovery from retrieval and working with large data sets >>>>>>>>>>>>>have >>>>>>>>>>>>>been >>>>>>>>>>>>> important in shaping our thinking.
That all said, I'd be >>>>>>>>>>>>>happy >>>>>>>>>>>>> starting >>>>>>>>>>>>> from 0 and working through the Discovery service definition >>>>>>>>>>>>>from >>>>>>>>>>>>> scratch >>>>>>>>>>>>> along with data set use cases. >>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Marc, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks - yes, I saw that as well. I was just trying to get >>>>>>>>>>>>>one >>>>>>>>>>>>>step >>>>>>>>>>>>> more >>>>>>>>>>>>> specific, which was to see if we could identify a few >>>>>>>>>>>>>specific >>>>>>>>>>>>>use >>>>>>>>>>>>> cases >>>>>>>>>>>>> around which to have the conversation. (e.g., time series >>>>>>>>>>>>>sensor >>>>>>>>>>>>> data >>>>>>>>>>>>> and >>>>>>>>>>>>> web content retrieval for "get latest"; climate data for huge >>>>>>>>>>>>>data >>>>>>>>>>>>> sets; >>>>>>>>>>>>> local data in a vehicular network; etc.) What have you been >>>>>>>>>>>>>looking >>>>>>>>>>>>> at >>>>>>>>>>>>> that's driving considerations of discovery? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> From: >>>>>>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>>> Cc: , >>>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming >>>>>>>>>>>>>convention? >>>>>>>>>>>>> >>>>>>>>>>>>> Jeff, >>>>>>>>>>>>> >>>>>>>>>>>>> Take a look at my posting (that Felix fixed) in a new thread >>>>>>>>>>>>>on >>>>>>>>>>>>> Discovery. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-September/000200.html >>>>>>>>>>>>> >>>>>>>>>>>>> I think it would be very productive to talk about what >>>>>>>>>>>>>Discovery >>>>>>>>>>>>> should >>>>>>>>>>>>> do, >>>>>>>>>>>>> and not focus on the how.
It is sometimes easy to get caught >>>>>>>>>>>>>up >>>>>>>>>>>>>in >>>>>>>>>>>>> the >>>>>>>>>>>>> how, >>>>>>>>>>>>> which I think is a less important topic than the what at this >>>>>>>>>>>>>stage. >>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Marc, >>>>>>>>>>>>> >>>>>>>>>>>>> If you can't talk about your protocols, perhaps we can >>>>>>>>>>>>>discuss >>>>>>>>>>>>>this >>>>>>>>>>>>> based >>>>>>>>>>>>> on use cases. What are the use cases you are using to >>>>>>>>>>>>>evaluate >>>>>>>>>>>>> discovery? >>>>>>>>>>>>> >>>>>>>>>>>>> Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> No matter what the expressiveness of the predicates, if the >>>>>>>>>>>>>forwarder >>>>>>>>>>>>> can >>>>>>>>>>>>> send interests different ways you don't have a consistent >>>>>>>>>>>>>underlying >>>>>>>>>>>>> set >>>>>>>>>>>>> to talk about, so you would always need non-range exclusions >>>>>>>>>>>>>to >>>>>>>>>>>>> discover >>>>>>>>>>>>> every version. >>>>>>>>>>>>> >>>>>>>>>>>>> Range exclusions only work, I believe, if you get an >>>>>>>>>>>>>authoritative >>>>>>>>>>>>> answer. >>>>>>>>>>>>> If different content pieces are scattered between different >>>>>>>>>>>>>caches, >>>>>>>>>>>>>I >>>>>>>>>>>>> don't see how range exclusions would work to discover every >>>>>>>>>>>>>version. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm sorry to be pointing out problems without offering >>>>>>>>>>>>>solutions, >>>>>>>>>>>>>but >>>>>>>>>>>>> we're not ready to publish our discovery protocols. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I see.
Can you briefly describe how the ccnx discovery protocol >>>>>>>>>>>>>solves >>>>>>>>>>>>> all >>>>>>>>>>>>> the problems that you mentioned (not just exclude)? A doc >>>>>>>>>>>>>will >>>>>>>>>>>>>be >>>>>>>>>>>>> better. >>>>>>>>>>>>> >>>>>>>>>>>>> My unserious conjecture ( :) ): exclude is equal to [not]. I >>>>>>>>>>>>>will >>>>>>>>>>>>> soon >>>>>>>>>>>>> expect [and] and [or], so boolean algebra is fully supported. >>>>>>>>>>>>> Regular >>>>>>>>>>>>> language or context free language might become part of >>>>>>>>>>>>>selector >>>>>>>>>>>>>too. >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> That will get you one reading, then you need to exclude it and >>>>>>>>>>>>>ask >>>>>>>>>>>>> again. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>>set >>>>>>>>>>>>> with a particular cache, then you need to always use >>>>>>>>>>>>>individual >>>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>>versions >>>>>>>>>>>>> of an object. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I am very confused. For your example, if I want to get all >>>>>>>>>>>>>today's >>>>>>>>>>>>> sensor data, I just do (Any..Last second of last day)(First >>>>>>>>>>>>>second >>>>>>>>>>>>>of >>>>>>>>>>>>> tomorrow..Any). That's 18 bytes.
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>>>>>>>> >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> If you talk sometimes to A and sometimes to B, you very >>>>>>>>>>>>>easily >>>>>>>>>>>>> could miss content objects you want to discover unless you >>>>>>>>>>>>>avoid >>>>>>>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Could you explain why the missing content object situation >>>>>>>>>>>>>happens? >>>>>>>>>>>>>Also, >>>>>>>>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>>>>>>>> excludes; >>>>>>>>>>>>> converting from explicit excludes to a ranged exclude is always >>>>>>>>>>>>> possible. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>>set >>>>>>>>>>>>> with a particular cache, then you need to always use >>>>>>>>>>>>>individual >>>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>>versions >>>>>>>>>>>>> of an object. For something like a sensor reading that is >>>>>>>>>>>>>updated, >>>>>>>>>>>>> say, once per second you will have 86,400 of them per day. >>>>>>>>>>>>>If >>>>>>>>>>>>>each >>>>>>>>>>>>> exclusion is a timestamp (say 8 bytes), that's 691,200 bytes >>>>>>>>>>>>>of >>>>>>>>>>>>> exclusions (plus encoding overhead) per day. >>>>>>>>>>>>> >>>>>>>>>>>>> yes, maybe using a more deterministic version number than a >>>>>>>>>>>>> timestamp makes sense here, but it's just an example of >>>>>>>>>>>>>needing >>>>>>>>>>>>>a >>>>>>>>>>>>>lot >>>>>>>>>>>>> of exclusions. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> You exclude through 100 then issue a new interest.
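The two figures quoted above (18 bytes for the range form, 691,200 bytes of per-version excludes per day) can be sanity-checked. A minimal sketch, assuming 8-byte timestamp components and 1-byte Any markers, and ignoring TLV type/length overhead:

```python
# Per-version exclusion: one 8-byte timestamp per sensor reading.
readings_per_day = 24 * 60 * 60          # once per second -> 86,400 readings
timestamp_bytes = 8
per_version = readings_per_day * timestamp_bytes
print(per_version)                        # 691200 bytes of exclusions per day

# Range exclusion for "all of today":
# (Any..last second of yesterday)(first second of tomorrow..Any)
# Two 8-byte timestamps plus two 1-byte Any markers.
range_exclude = 2 * timestamp_bytes + 2 * 1
print(range_exclude)                      # 18 bytes
```

The arithmetic confirms both numbers in the thread; the argument is that the compact range form is only safe against a single consistent set, while per-version excludes are the fallback when answers can come from different caches.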
This goes >>>>>>>>>>>>>to >>>>>>>>>>>>> cache B >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I feel this case is invalid because cache A will also get the >>>>>>>>>>>>> interest, and cache A will return v101 if it exists. Like you >>>>>>>>>>>>>said, >>>>>>>>>>>>> if >>>>>>>>>>>>> this goes to cache B only, it means that cache A dies. How do >>>>>>>>>>>>>you >>>>>>>>>>>>> know >>>>>>>>>>>>> that v101 even exists? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I guess this depends on what the forwarding strategy is. If >>>>>>>>>>>>>the >>>>>>>>>>>>> forwarder will always send each interest to all replicas, >>>>>>>>>>>>>then >>>>>>>>>>>>>yes, >>>>>>>>>>>>> modulo packet loss, you would discover v101 on cache A. If >>>>>>>>>>>>>the >>>>>>>>>>>>> forwarder is just doing "best path" and can round-robin >>>>>>>>>>>>>between >>>>>>>>>>>>>cache >>>>>>>>>>>>> A and cache B, then your application could miss v101. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> c,d In general I agree that LPM performance is related to the >>>>>>>>>>>>>number >>>>>>>>>>>>> of components. In my own thread-safe LPM implementation, I >>>>>>>>>>>>>used >>>>>>>>>>>>>only >>>>>>>>>>>>> one RWMutex for the whole tree. I don't know whether adding a >>>>>>>>>>>>>lock >>>>>>>>>>>>>for >>>>>>>>>>>>> every node will be faster or not because of lock overhead. >>>>>>>>>>>>> >>>>>>>>>>>>> However, we should compare (exact match + discovery protocol) >>>>>>>>>>>>>vs >>>>>>>>>>>>> (ndn >>>>>>>>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, we should compare them. And we need to publish the ccnx >>>>>>>>>>>>>1.0 >>>>>>>>>>>>> specs for doing the exact match discovery. So, as I said, >>>>>>>>>>>>>I'm >>>>>>>>>>>>>not >>>>>>>>>>>>> ready to claim it's better yet because we have not done that.
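The single-RWMutex LPM tree mentioned above can be sketched as follows. This is a hypothetical illustration, not the poster's actual implementation; Python's standard library has no reader-writer lock, so a plain mutex stands in for the RWMutex:

```python
import threading

class LpmTree:
    """Longest-prefix match over name components, guarded by one lock."""
    def __init__(self):
        self.root = {}             # nested dicts: component -> child node
        self.values = {}           # tuple(components) -> stored value
        self.lock = threading.Lock()

    def insert(self, components, value):
        with self.lock:            # a single lock for the whole tree
            node = self.root
            for c in components:
                node = node.setdefault(c, {})
            self.values[tuple(components)] = value

    def lpm(self, components):
        with self.lock:
            node, best = self.root, None
            for i, c in enumerate(components):
                if c not in node:  # cost grows with the number of components
                    break
                node = node[c]
                prefix = tuple(components[:i + 1])
                if prefix in self.values:
                    best = self.values[prefix]
            return best

fib = LpmTree()
fib.insert(["mail"], "face-1")
fib.insert(["mail", "inbox"], "face-2")
print(fib.lpm(["mail", "inbox", "148"]))   # face-2 (longest matching prefix)
```

The sketch makes the trade-off in the message visible: one coarse lock is simple, while per-node locks would allow more concurrency at the price of per-node locking overhead.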
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, >>>>>>>>>>>>>wrote: >>>>>>>>>>>>> I would point out that using LPM on content object to >>>>>>>>>>>>>Interest >>>>>>>>>>>>> matching to do discovery has its own set of problems. >>>>>>>>>>>>>Discovery >>>>>>>>>>>>> involves more than just "latest version" discovery too. >>>>>>>>>>>>> >>>>>>>>>>>>> This is probably getting off-topic from the original post >>>>>>>>>>>>>about >>>>>>>>>>>>> naming conventions. >>>>>>>>>>>>> >>>>>>>>>>>>> a. If Interests can be forwarded multiple directions and two >>>>>>>>>>>>> different caches are responding, the exclusion set you build >>>>>>>>>>>>>up >>>>>>>>>>>>> talking with cache A will be invalid for cache B. If you >>>>>>>>>>>>>talk >>>>>>>>>>>>> sometimes to A and sometimes to B, you very easily could miss >>>>>>>>>>>>> content objects you want to discover unless you avoid all >>>>>>>>>>>>>range >>>>>>>>>>>>> exclusions and only exclude explicit versions. That will >>>>>>>>>>>>>lead >>>>>>>>>>>>>to >>>>>>>>>>>>> very large interest packets. In ccnx 1.0, we believe that an >>>>>>>>>>>>> explicit discovery protocol that allows conversations about >>>>>>>>>>>>> consistent sets is better. >>>>>>>>>>>>> >>>>>>>>>>>>> b. Yes, if you just want the "latest version" discovery that >>>>>>>>>>>>> should be transitive between caches, but imagine this. You >>>>>>>>>>>>>send >>>>>>>>>>>>> Interest #1 to cache A which returns version 100. You >>>>>>>>>>>>>exclude >>>>>>>>>>>>> through 100 then issue a new interest. This goes to cache B >>>>>>>>>>>>>who >>>>>>>>>>>>> only has version 99, so the interest times out or is NACK'd. >>>>>>>>>>>>>So >>>>>>>>>>>>> you think you have it! But, cache A already has version 101, >>>>>>>>>>>>>you >>>>>>>>>>>>> just don't know.
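The cache A / cache B race in point (b) is easy to simulate. A minimal sketch, assuming a best-path forwarder that sends each probe to only one cache:

```python
# Two caches hold different version sets for the same name prefix.
cache_a = {99, 100, 101}
cache_b = {99}

def ask(cache, exclude_through):
    """Return the newest version above the excluded range, or None (timeout/NACK)."""
    candidates = [v for v in cache if v > exclude_through]
    return max(candidates) if candidates else None

# Interest #1 went to cache A and returned version 100 (before 101 arrived,
# or because A answered from an older entry). The consumer now excludes
# through 100. Interest #2 round-robins to cache B, which only has 99:
print(ask(cache_b, exclude_through=100))   # None -> consumer concludes 100 is latest

# But cache A already holds 101; the consumer just never asked it again.
print(ask(cache_a, exclude_through=100))   # 101
```

The simulation shows the non-determinism being argued: without a consistent set to converse about, "latest version" depends on which cache the forwarder happens to pick.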
If you cannot have a conversation around >>>>>>>>>>>>> consistent sets, it seems like even doing latest version >>>>>>>>>>>>>discovery >>>>>>>>>>>>> is difficult with selector based discovery. From what I saw >>>>>>>>>>>>>in >>>>>>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to the >>>>>>>>>>>>> authoritative source because you can never believe an >>>>>>>>>>>>>intermediate >>>>>>>>>>>>> cache that there's not something more recent. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm sure you've walked through cases (a) and (b) in ndn, I'd >>>>>>>>>>>>>be >>>>>>>>>>>>> interested in seeing your analysis. Case (a) is that a node >>>>>>>>>>>>>can >>>>>>>>>>>>> correctly discover every version of a name prefix, and (b) is >>>>>>>>>>>>>that >>>>>>>>>>>>> a node can correctly discover the latest version. We have >>>>>>>>>>>>>not >>>>>>>>>>>>> formally compared (or yet published) our discovery protocols >>>>>>>>>>>>>(we >>>>>>>>>>>>> have three, 2 for content, 1 for device) compared to selector >>>>>>>>>>>>>based >>>>>>>>>>>>> discovery, so I cannot yet claim they are better, but they do >>>>>>>>>>>>>not >>>>>>>>>>>>> have the non-determinism sketched above. >>>>>>>>>>>>> >>>>>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups >>>>>>>>>>>>>you >>>>>>>>>>>>> must do in the PIT to match a content object. If you have a >>>>>>>>>>>>>name >>>>>>>>>>>>> tree or a threaded hash table, those don't all need to be >>>>>>>>>>>>>hash >>>>>>>>>>>>> lookups, but you need to walk up the name tree for every >>>>>>>>>>>>>prefix >>>>>>>>>>>>>of >>>>>>>>>>>>> the content object name and evaluate the selector predicate. >>>>>>>>>>>>> Content Based Networking (CBN) had some methods to >>>>>>>>>>>>>create >>>>>>>>>>>>>data >>>>>>>>>>>>> structures based on predicates, maybe those would be better.
>>>>>>>>>>>>>But >>>>>>>>>>>>> in any case, you will potentially need to retrieve many PIT >>>>>>>>>>>>>entries >>>>>>>>>>>>> if there is Interest traffic for many prefixes of a root. >>>>>>>>>>>>>Even >>>>>>>>>>>>>on >>>>>>>>>>>>> an Intel system, you'll likely miss cache lines, so you'll >>>>>>>>>>>>>have a >>>>>>>>>>>>> lot of NUMA access for each one. In CCNx 1.0, even a naive >>>>>>>>>>>>> implementation only requires at most 3 lookups (one by name, >>>>>>>>>>>>>one >>>>>>>>>>>>>by >>>>>>>>>>>>> name + keyid, one by name + content object hash), and one can >>>>>>>>>>>>>do >>>>>>>>>>>>> other things to optimize lookup for an extra write. >>>>>>>>>>>>> >>>>>>>>>>>>> d. In (c) above, if you have a threaded name tree or are just >>>>>>>>>>>>> walking parent pointers, I suspect you'll need locking of the >>>>>>>>>>>>> ancestors in a multi-threaded system ("threaded" here meaning >>>>>>>>>>>>>LWP) >>>>>>>>>>>>> and that will be expensive. It would be interesting to see >>>>>>>>>>>>>what >>>>>>>>>>>>>a >>>>>>>>>>>>> cache consistent multi-threaded name tree looks like. >>>>>>>>>>>>> >>>>>>>>>>>>> Marc >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I had thought about these questions, but I want to know your >>>>>>>>>>>>>idea >>>>>>>>>>>>> besides typed component: >>>>>>>>>>>>> 1. LPM allows "data discovery". How will exact match do >>>>>>>>>>>>>similar >>>>>>>>>>>>> things? >>>>>>>>>>>>> 2. will removing selectors improve performance? How do we use >>>>>>>>>>>>> other >>>>>>>>>>>>> faster techniques to replace selectors? >>>>>>>>>>>>> 3. fixed byte length and type. I agree more that type can be >>>>>>>>>>>>>fixed >>>>>>>>>>>>> byte, but 2 bytes for length might not be enough for the future.
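The "at most 3 lookups" claim above can be illustrated with plain exact-match tables. This is a hypothetical sketch, not the actual CCNx 1.0 data structures: a PIT keyed three ways, probed from most to least restrictive:

```python
def match_content(pit, name, keyid, obj_hash):
    # A content object can satisfy a PIT entry stored under any of three keys:
    # name + content object hash, name + keyid, or name alone. Three exact
    # probes bound the work, with no walk over every prefix of the name.
    for key in (("name+hash", name, obj_hash),
                ("name+keyid", name, keyid),
                ("name", name)):
        if key in pit:
            return pit[key]
    return None

pit = {
    ("name", "/mail/inbox"): "face-7",
    ("name+keyid", "/mail/inbox", "key-42"): "face-9",
}
print(match_content(pit, "/mail/inbox", "key-42", "deadbeef"))  # face-9
print(match_content(pit, "/mail/inbox", None, None))            # face-7
```

Contrast with the LPM case in point (c), where the number of PIT probes grows with the number of name components and each prefix may carry a selector predicate to evaluate.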
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can >>>>>>>>>>>>> envision we need to do, and with a few simple conventions on >>>>>>>>>>>>> how the registry of types is managed. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Could you share it with us? >>>>>>>>>>>>> >>>>>>>>>>>>> Sure. Here's a strawman. >>>>>>>>>>>>> >>>>>>>>>>>>> The type space is 16 bits, so you have 65,536 types. >>>>>>>>>>>>> >>>>>>>>>>>>> The type space is currently shared with the types used for >>>>>>>>>>>>>the >>>>>>>>>>>>> entire protocol, which gives us two options: >>>>>>>>>>>>> (1) we reserve a range for name component types. Given the >>>>>>>>>>>>> likelihood there will be at least as much and probably more >>>>>>>>>>>>>need >>>>>>>>>>>>> for component types than protocol extensions, we could reserve >>>>>>>>>>>>>1/2 >>>>>>>>>>>>> of the type space, giving us 32K types for name components. >>>>>>>>>>>>> (2) since there is no parsing ambiguity between name >>>>>>>>>>>>>components >>>>>>>>>>>>> and other fields of the protocol (since they are sub-types of >>>>>>>>>>>>>the >>>>>>>>>>>>> name type) we could reuse numbers and thereby have an entire >>>>>>>>>>>>>65K >>>>>>>>>>>>> name component types. >>>>>>>>>>>>> >>>>>>>>>>>>> We divide the type space into regions, and manage it with a >>>>>>>>>>>>> registry. If we ever get to the point of creating an IETF >>>>>>>>>>>>> standard, IANA has 25 years of experience running registries >>>>>>>>>>>>>and >>>>>>>>>>>>> there are well-understood rule sets for different kinds of >>>>>>>>>>>>> registries (open, requires a written spec, requires standards >>>>>>>>>>>>> approval).
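The strawman above is concrete enough to sketch. The region boundaries below are hypothetical placeholders (the strawman fixes only the region sizes, not their positions in the 16-bit space):

```python
GENERIC = 0x0001                            # the one "default" generic-name type
BASE_SPEC = range(0x0002, 0x0002 + 1024)    # globally understood (chunk#, version#, ...)
RESERVED = range(0x0402, 0x0402 + 1024)     # held back for unanticipated uses
# everything else up to 0xFFFF: application assignment

def region(t):
    """Classify a 16-bit name component type code into its registry region."""
    assert 0 <= t <= 0xFFFF, "type space is 16 bits: 65,536 values"
    if t == GENERIC:
        return "generic"
    if t in BASE_SPEC:
        return "base-spec"
    if t in RESERVED:
        return "reserved"
    return "application"

print(region(0x0001))   # generic
print(region(0x0003))   # base-spec
print(region(0x9000))   # application
```

Under this partition, applications still get the vast majority of the space (over 63K codes), which is the "make sense?" claim being put forward.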
>>>>>>>>>>>>> >>>>>>>>>>>>> - We allocate one "default" name component type for "generic >>>>>>>>>>>>> name", which would be used on name prefixes and other common >>>>>>>>>>>>> cases where there are no special semantics on the name >>>>>>>>>>>>>component. >>>>>>>>>>>>> - We allocate a range of name component types, say 1024, to >>>>>>>>>>>>> globally understood types that are part of the base or >>>>>>>>>>>>>extension >>>>>>>>>>>>> NDN specifications (e.g. chunk#, version#, etc.) >>>>>>>>>>>>> - We reserve some portion of the space for unanticipated uses >>>>>>>>>>>>> (say another 1024 types) >>>>>>>>>>>>> - We give the rest of the space to application assignment. >>>>>>>>>>>>> >>>>>>>>>>>>> Make sense? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>>>>>>> performance flaws in the design >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> we could design for performance, >>>>>>>>>>>>> >>>>>>>>>>>>> That's not what people are advocating. We are advocating that >>>>>>>>>>>>>we >>>>>>>>>>>>> *not* design for known bad performance and hope serendipity >>>>>>>>>>>>>or >>>>>>>>>>>>> Moore's Law will come to the rescue. >>>>>>>>>>>>> >>>>>>>>>>>>> but I think there will be a turning >>>>>>>>>>>>> point when the slower design starts to become "fast enough". >>>>>>>>>>>>> >>>>>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so >>>>>>>>>>>>> things that don't get faster while others do tend to get >>>>>>>>>>>>>dropped >>>>>>>>>>>>> or not used because they impose a performance penalty >>>>>>>>>>>>>relative >>>>>>>>>>>>>to >>>>>>>>>>>>> the things that go faster. There is also the "low-end" >>>>>>>>>>>>>phenomenon >>>>>>>>>>>>> where improvements in technology get applied to lowering cost >>>>>>>>>>>>> rather than improving performance. For those environments bad >>>>>>>>>>>>> performance just never gets better.
>>>>>>>>>>>>> >>>>>>>>>>>>> Do you >>>>>>>>>>>>> think there will be some design of ndn that will *never* have >>>>>>>>>>>>> performance improvement? >>>>>>>>>>>>> >>>>>>>>>>>>> I suspect LPM on data will always be slow (relative to the >>>>>>>>>>>>>other >>>>>>>>>>>>> functions). >>>>>>>>>>>>> I suspect exclusions will always be slow because they will >>>>>>>>>>>>> require extra memory references. >>>>>>>>>>>>> >>>>>>>>>>>>> However, I of course don't claim clairvoyance, so this is >>>>>>>>>>>>>just >>>>>>>>>>>>> speculation based on 35+ years of seeing performance improve >>>>>>>>>>>>>by 4 >>>>>>>>>>>>> orders of magnitude and still having to worry about counting >>>>>>>>>>>>> cycles and memory references. >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> We should not look at a certain chip nowadays and want ndn to >>>>>>>>>>>>> perform >>>>>>>>>>>>> well on it. It should be the other way around: once ndn apps >>>>>>>>>>>>> become >>>>>>>>>>>>> popular, a better chip will be designed for ndn. >>>>>>>>>>>>> >>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in >>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from >>>>>>>>>>>>> performance flaws in the design: >>>>>>>>>>>>> a) clock rates are not getting (much) faster >>>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive >>>>>>>>>>>>> c) data structures that require locks to manipulate >>>>>>>>>>>>> successfully will be relatively more expensive, even with >>>>>>>>>>>>> near-zero lock contention. >>>>>>>>>>>>> >>>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in >>>>>>>>>>>>> its design. We just forgot those because the design elements >>>>>>>>>>>>> that depended on those mistakes have fallen into disuse. The >>>>>>>>>>>>> poster children for this are: >>>>>>>>>>>>> 1.
>>>>>>>>>>>>> IP options. Nobody can use them because they are too slow
>>>>>>>>>>>>> on modern forwarding hardware, so they can't be reliably
>>>>>>>>>>>>> used anywhere
>>>>>>>>>>>>> 2. the UDP checksum, which was a bad design when it was
>>>>>>>>>>>>> specified and is now a giant PITA that still causes major
>>>>>>>>>>>>> pain to work around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm afraid students today are being taught that the
>>>>>>>>>>>>> designers of IP were flawless, as opposed to very good
>>>>>>>>>>>>> scientists and engineers who got most of it right.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I feel the discussion today and yesterday has been
>>>>>>>>>>>>> off-topic. Now I see that there are 3 approaches:
>>>>>>>>>>>>> 1. we should not define a naming convention at all
>>>>>>>>>>>>> 2. typed component: use tlv type space and add a handful of
>>>>>>>>>>>>> types
>>>>>>>>>>>>> 3. marked component: introduce only one more type and add
>>>>>>>>>>>>> additional marker space
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know how to make #2 flexible enough to do the things I
>>>>>>>>>>>>> can envision we need to do, and with a few simple
>>>>>>>>>>>>> conventions on how the registry of types is managed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is just as powerful in practice as either throwing up
>>>>>>>>>>>>> our hands and letting applications design their own
>>>>>>>>>>>>> mutually incompatible schemes or trying to make naming
>>>>>>>>>>>>> conventions with markers in a way that is fast to
>>>>>>>>>>>>> generate/parse and also resilient against aliasing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, everybody thinks that the current utf8 marker naming
>>>>>>>>>>>>> convention needs to be revised.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names
>>>>>>>>>>>>> to fit in (the magnitude of) 96 bytes?
>>>>>>>>>>>>> What length are names usually in current NDN experiments?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I guess wide deployment could make for even longer names.
>>>>>>>>>>>>> Related: Many URLs I encounter nowadays easily don't fit
>>>>>>>>>>>>> within two 80-column text lines, and NDN will have to carry
>>>>>>>>>>>>> more information than URLs, as far as I see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> In fact, the index in a separate TLV will be slower on some
>>>>>>>>>>>>> architectures, like the ezChip NP4. The NP4 can hold the
>>>>>>>>>>>>> first 96 frame bytes in memory, then any subsequent memory
>>>>>>>>>>>>> is accessed only as two adjacent 32-byte blocks (there can
>>>>>>>>>>>>> be at most 5 blocks available at any one time). If you need
>>>>>>>>>>>>> to switch between arrays, it would be very expensive. If
>>>>>>>>>>>>> you have to read past the name to get to the 2nd array,
>>>>>>>>>>>>> then read it, then back up to get to the name, it will be
>>>>>>>>>>>>> pretty expensive too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does this make that much difference?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to parse the first 5 components, one way to do
>>>>>>>>>>>>> it is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes
>>>>>>>>>>>>> from the start offset of the beginning of the name.
>>>>>>>>>>>>> OR
>>>>>>>>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How much speed are you getting from one to the other? You
>>>>>>>>>>>>> seem to imply that the first one is faster. I don't think
>>>>>>>>>>>>> this is the case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the first one you'll probably have to get the cache line
>>>>>>>>>>>>> for the index, then all the required cache lines for the
>>>>>>>>>>>>> first 5 components. For the second, you'll have to get all
>>>>>>>>>>>>> the cache lines for the first 5 components. Given an
>>>>>>>>>>>>> assumption that a cache miss is way more expensive than
>>>>>>>>>>>>> evaluating a number and computing an addition, you might
>>>>>>>>>>>>> find that the performance of the index is actually slower
>>>>>>>>>>>>> than the performance of the direct access.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Granted, there is a case where you don't access the name at
>>>>>>>>>>>>> all, for example, if you just get the offsets and then send
>>>>>>>>>>>>> the offsets as parameters to another
>>>>>>>>>>>>> processor/GPU/NPU/etc. In this case you may see a gain IF
>>>>>>>>>>>>> there are more cache line misses in reading the name than
>>>>>>>>>>>>> in reading the index. So, if the regular part of the name
>>>>>>>>>>>>> that you're parsing is bigger than the cache line (64
>>>>>>>>>>>>> bytes?) and the name is to be processed by a different
>>>>>>>>>>>>> processor, then you might see some performance gain in
>>>>>>>>>>>>> using the index, but in all other circumstances I bet this
>>>>>>>>>>>>> is not the case. I may be wrong, haven't actually tested
>>>>>>>>>>>>> it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is all to say, I don't think we should be designing
>>>>>>>>>>>>> the protocol with only one architecture in mind. (The
>>>>>>>>>>>>> architecture of sending the name to a different processor
>>>>>>>>>>>>> than the index.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you have numbers that show that the index is faster, I
>>>>>>>>>>>>> would like to see under what conditions and architectural
>>>>>>>>>>>>> assumptions.
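The two parsing strategies being compared above can be sketched concretely. This is a toy illustration assuming a simplified wire format of 1-byte type, 1-byte length, then value bytes per name component; real NDN/CCNx TLV uses variable-width type and length fields.

```python
# Toy TLV name: each component is [type:1][length:1][value:length] bytes.

def nth_component_sequential(buf: bytes, n: int) -> bytes:
    """Walk the name TLV by TLV: (find size + move) n times,
    then read the nth component."""
    off = 0
    for _ in range(n):
        off += 2 + buf[off + 1]          # skip type, length, and value
    length = buf[off + 1]
    return buf[off + 2: off + 2 + length]

def nth_component_indexed(buf: bytes, index: list, n: int) -> bytes:
    """Jump straight to the nth component via a precomputed offset index."""
    off = index[n]
    length = buf[off + 1]
    return buf[off + 2: off + 2 + length]
```

The index saves the n-1 skips of the sequential walk, but, as the discussion notes, fetching the index itself can cost an extra cache-line or memory-block access, so which version wins depends on the architecture.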
>>>>>>>>>>>>> Nacho
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I may have misinterpreted your description so feel free to
>>>>>>>>>>>>> correct me if I'm wrong.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Nacho (Ignacio) Solis
>>>>>>>>>>>>> Protocol Architect
>>>>>>>>>>>>> Principal Scientist
>>>>>>>>>>>>> Palo Alto Research Center (PARC)
>>>>>>>>>>>>> +1(650)812-4458
>>>>>>>>>>>>> Ignacio.Solis at parc.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed, each component's offset must be encoded using a
>>>>>>>>>>>>> fixed amount of bytes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> i.e.,
>>>>>>>>>>>>> Type = Offsets
>>>>>>>>>>>>> Length = 10 Bytes
>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your
>>>>>>>>>>>>> name is too long.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if you do not need the entire hierarchical structure
>>>>>>>>>>>>> (suppose you only want the first x components) you can
>>>>>>>>>>>>> directly have it using the offsets. With the Nested TLV
>>>>>>>>>>>>> structure you have to iteratively parse the first x-1
>>>>>>>>>>>>> components. With the offset structure you can directly
>>>>>>>>>>>>> access the first x components.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't get it. What you described only works if the
>>>>>>>>>>>>> "offset" is encoded in fixed bytes. With varNum, you will
>>>>>>>>>>>>> still need to parse x-1 offsets to get to the x offset.
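The fixed-width "Offsets" TLV described above can be sketched as follows. The numeric type code is a made-up placeholder, and 1-byte offsets are assumed per the example; the point is that fixed-width offsets make the i-th entry directly addressable, which variable-length (varNum) offsets do not.

```python
# Sketch of the fixed-width Offsets TLV from the example:
#   Type = Offsets, Length = number of offsets, Value = one byte each.
OFFSETS_TYPE = 0xF0   # hypothetical type code, not from any spec

def encode_offsets(offsets: list) -> bytes:
    """Encode component offsets with a fixed 1 byte per offset."""
    return bytes([OFFSETS_TYPE, len(offsets)]) + bytes(offsets)

def decode_offset(tlv: bytes, i: int) -> int:
    """Because every offset is exactly 1 byte, the i-th offset sits at
    a known position; no preceding offsets need to be parsed."""
    return tlv[2 + i]
```

An "Offset_2byte" variant, as suggested in the thread, would simply use two bytes per entry (and `2 + 2*i` as the position) for names longer than 255 bytes.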
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ah, thanks - that's helpful. I thought you were saying "I
>>>>>>>>>>>>> like the existing NDN UTF8 'convention'." I'm still not
>>>>>>>>>>>>> sure I understand what you _do_ prefer, though. it sounds
>>>>>>>>>>>>> like you're describing an entirely different scheme where
>>>>>>>>>>>>> the info that describes the name-components is ...
>>>>>>>>>>>>> someplace other than _in_ the name-components. is that
>>>>>>>>>>>>> correct? when you say "field separator", what do you mean
>>>>>>>>>>>>> (since that's not a "TL" from a TLV)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Correct.
>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the
>>>>>>>>>>>>> name hierarchy with offsets in the name, and other TLV(s)
>>>>>>>>>>>>> indicate the offset to use in order to retrieve special
>>>>>>>>>>>>> components.
>>>>>>>>>>>>> As for the field separator, it is something like "/".
>>>>>>>>>>>>> Aliasing is avoided as you do not rely on field separators
>>>>>>>>>>>>> to parse the name; you use the "offset TLV" to do that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So now, it may be an aesthetic question, but:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if you do not need the entire hierarchical structure
>>>>>>>>>>>>> (suppose you only want the first x components) you can
>>>>>>>>>>>>> directly have it using the offsets. With the Nested TLV
>>>>>>>>>>>>> structure you have to iteratively parse the first x-1
>>>>>>>>>>>>> components.
>>>>>>>>>>>>> With the offset structure you can directly access the
>>>>>>>>>>>>> first x components.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The why is simple:
>>>>>>>>>>>>>
>>>>>>>>>>>>> You use a lot of "generic component type" and very few
>>>>>>>>>>>>> "specific component type". You are imposing types for every
>>>>>>>>>>>>> component in order to handle a few exceptions
>>>>>>>>>>>>> (segmentation, etc.). You create a rule (specify the
>>>>>>>>>>>>> component's type) to handle exceptions!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would prefer not to have typed components. Instead I
>>>>>>>>>>>>> would prefer to have the name as a simple sequence of bytes
>>>>>>>>>>>>> with a field separator. Then, outside the name, if you have
>>>>>>>>>>>>> some components that could be used at the network layer
>>>>>>>>>>>>> (e.g. a TLV field), you simply need something that
>>>>>>>>>>>>> indicates which is the offset allowing you to retrieve the
>>>>>>>>>>>>> version, segment, etc. in the name...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we agree on the small number of "component types".
>>>>>>>>>>>>> However, if you have a small number of types, you will end
>>>>>>>>>>>>> up with names containing many generic component types and
>>>>>>>>>>>>> few specific component types.
>>>>>>>>>>>>> Due to the fact that the component type specification is an
>>>>>>>>>>>>> exception in the name, I would prefer something that
>>>>>>>>>>>>> specifies the component's type only when needed (something
>>>>>>>>>>>>> like the UTF8 conventions, but ones that applications MUST
>>>>>>>>>>>>> use).
>>>>>>>>>>>>>
>>>>>>>>>>>>> so ... I can't quite follow that. the thread has had some
>>>>>>>>>>>>> explanation about why the UTF8 requirement has problems
>>>>>>>>>>>>> (with aliasing, e.g.) and there's been email trying to
>>>>>>>>>>>>> explain that applications don't have to use types if they
>>>>>>>>>>>>> don't need to. your email sounds like "I prefer the UTF8
>>>>>>>>>>>>> convention", but it doesn't say why you have that
>>>>>>>>>>>>> preference in the face of the points about the problems.
>>>>>>>>>>>>> can you say why it is that you express a preference for the
>>>>>>>>>>>>> "convention" with problems?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Ndn-interest mailing list
>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From Ignacio.Solis at parc.com Sat Sep 27 21:27:29 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Sun, 28 Sep 2014 04:27:29 +0000
Subject: [Ndn-interest] any comments on naming convention?
In-Reply-To: 
References: <34691EA2-C408-4DBB-962D-621D7C6B40C8@parc.com>
 <4EC4EA30-713A-4CB0-8A90-2781194A3E78@memphis.edu>
Message-ID: 

On 9/28/14, 12:56 AM, "Tai-Lin Chu" wrote:

>Thanks for the detail.
>I see yet another problem: how do you know whether this delivered
>packet is encapsulated or not? You have to check twice in the worst
>case: you did not check in the outer packet, and then read again from
>the data portion. (although some markings in the packet could solve
>this)

First, there is no double check. CCN does not check outer packets for
signatures in the fast path (the forwarding path). Nodes running the
Selector Protocol can check the inner packet if they wish. (Assuming
you're talking about checking signatures.)

You know that the packet is a Selector Protocol packet because the PIT
said so. The PIT knows because it can see it from the name component
"selector_protocol", which, in the case of CCN, could be labeled using
a name segment type. However, you could imagine having an actual
content type as well. As a matter of fact, for all we know this could
be a regular CCN manifest with embedded objects.

>Just some thoughts: this protocol actually makes an ndn node harder to
>build, because now it has to know whether there are some ccnx nodes in
>the network that try to encapsulate. A ccnx node actually does not
>understand selectors but creates problems for ndn nodes to solve. :(
>But thanks for this co-existing world proposal.

I'm not sure how the ccnx node creates problems for the other nodes;
feel free to expand (if you're not tired of the thread).
I'm also not sure how it makes ndn nodes harder. What it does is make
NDN a protocol running on top of CCN (namely the Selector Protocol).
The fact that this is possible gives us confidence that the CCN
protocol is a good subset of features to act as the common
communication framework.

To be fair, CCN can also run on top of NDN. We could just use exact
matching in NDN and not use any selectors. As a matter of fact, our
software stack can support this sort of arrangement (running on top of
NFD, for example). We don't perceive any loss of functionality (from an
application perspective) but don't like to make nodes do more work than
needed.

I find it funny that there's a strong discussion of saving 1 or 2 bytes
by having flexible TLVs but you still force nodes to process things
like selectors and are willing to carry large payloads in the names.

Nacho

>On Sat, Sep 27, 2014 at 3:09 PM, Tai-Lin Chu wrote:
>>> I'm not sure what you mean by trust to the cache. NDN has no trust
>>> in the cache and no way to trust that a selector match is the
>>> correct match.
>>
>> ndn does not allow a cache to publish new data under another
>> provider's prefix, i.e., a table of content, but your discovery
>> protocol is doing this.
>>
>> On Sat, Sep 27, 2014 at 2:44 PM, wrote:
>>> On 9/27/14, 10:19 PM, "Tai-Lin Chu" wrote:
>>>
>>>>The concern is that the table of content is not confirmed by the
>>>>original provider; the cache server's data is "trusted with some
>>>>other chains". This trust model works but brings restrictions. It
>>>>basically requires building another trust model on the "cache
>>>>server"; otherwise, nothing discovered can be trusted, which also
>>>>means that you discover nothing.
>>>
>>> I'm not sure what you mean by trust to the cache. NDN has no trust
>>> in the cache and no way to trust that a selector match is the
>>> correct match.
>>>
>>> As I know, a cache can have
>>>
>>> /foo/1
>>> /foo/2
>>> /foo/3
>>>
>>> It could reply with /foo/2 and not give you /foo/3 as the "latest".
>>> You have no way to trust that a cache will give you anything
>>> specific. You can't really require this because you can't require
>>> cache nodes to have a specific cache replacement policy, so as far
>>> as you know, the cache could have dropped /foo/3 from the cache.
>>>
>>> As a matter of fact, unless you require signature verification at
>>> cache nodes (CCN requires this), you don't even have that. From
>>> what I've been told, it's optional for nodes to check for
>>> signatures. So, at any point in the network, you never know if
>>> previous nodes have verified the signature.
>>>
>>> So, I'm not sure what kind of "trust model" you refer to. Is there
>>> some trust model that this Selector Protocol breaks at the nodes
>>> that run the Selector Protocol? If so, could you please explain it.
>>>
>>>>Another critical point is that those cache servers are not
>>>>hierarchical, so we can only apply flat signing (one guy signs them
>>>>all.) This looks very problematic. A quick fix is that you just
>>>>impose the name hierarchy, but it is cumbersome too.
>>>
>>> Nobody really cares about the signature of the reply. You care
>>> about what's encapsulated inside, which, in fact, does authenticate
>>> to the selector request. Every node running the Selector Protocol
>>> can check this reply and this signature.
>>>
>>>>Here is our discussion so far:
>>>>exact matching -> ... needs discovery protocol (because it is not
>>>>lpm) -> discovery needs table of content -> restrictive trust model
>>>>My argument is that this restrictive trust model logically
>>>>discourages exact matching.
>>>
>>> I'm not sure what to make of this.
>>>
>>> Every system needs a discovery protocol. NDN is doing it via
>>> selector matching at the forwarder. CCN does it at a layer above
>>> that. We don't believe you should force nodes to let their caches
>>> be discoverable and to run the computation needed for this.
>>>
>>> There is no restrictive trust model. In CCN we don't do anything of
>>> what I've described because we don't do the Selector Protocol. The
>>> Selector Protocol I've just described is meant to give you the same
>>> semantics as NDN using exact matching. This includes the security
>>> model. Just because the "layer underneath" (aka CCN) does not do
>>> the same security model doesn't mean that the protocol doesn't
>>> deliver it to you.
>>>
>>> It seems to me that you'd be hard pressed to find a feature
>>> difference between NDN selectors and the CCN nodes running the
>>> Selector Protocol I described.
>>>
>>> Let me go over it once again:
>>>
>>> Network:
>>>
>>> A - - - B - - C - - D
>>> E - F - +
>>>
>>> A, B, D, E and F are running the Selector Protocol. C is not.
>>>
>>> D is serving content for /foo
>>> B has a copy of /foo/100, signed by D
>>>
>>> Node A wants something that starts with /foo/ but has a next
>>> component greater than 50
>>>
>>> A issues an interest:
>>> Interest: name = /foo/hash(Sel(>50))
>>> Payload = Sel(>50)
>>>
>>> The interest arrives at B. B notices that it's a Selector Protocol
>>> based interest.
>>> It interprets the payload, looks at the cache and finds /foo/100 as
>>> a match.
>>> It generates a reply.
>>>
>>> Data:
>>> Name = /foo/hash(Sel(>50))
>>> Payload = ( Data: Name = /foo/100, Signature = Node D, Payload =
>>> data)
>>> Signature = Node B
>>>
>>> That data is sent to node A.
>>>
>>> A is running the Selector Protocol.
>>> It notices that this reply is a Selector Protocol reply.
>>> It decapsulates the Payload. It extracts /foo/100.
>>> It checks that /foo/100 is signed by Node D.
>>>
>>> A's interest is satisfied.
>>>
>>> A issues a new interest (for something newer, it wasn't satisfied
>>> with 100).
>>>
>>> A issues an interest:
>>> Interest: name = /foo/hash(Sel(>100))
>>> Payload = Sel(>100)
>>>
>>> Sends interest to B.
>>>
>>> B knows it's a Selector Protocol interest.
>>> Parses the payload for the selectors. It looks at the cache, finds
>>> no match.
>>>
>>> B sends the interest to C
>>>
>>> C doesn't understand the Selector Protocol. It just does exact
>>> matching. Finds no match.
>>> It forwards the interest to node D.
>>>
>>> D is running the Selector Protocol.
>>> D looks at the data to see what's the latest one. It's 200.
>>> D creates an encapsulated reply.
>>>
>>> Data:
>>> Name = /foo/hash(Sel(>100))
>>> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload =
>>> data)
>>> Signature = Node D
>>>
>>> Forwards the data to C.
>>>
>>> C doesn't know the Selector Protocol. It matches the Data to the
>>> Interest based on exact match of the name /foo/hash(Sel(>100)). It
>>> may or may not cache the data.
>>>
>>> C forwards the data to B.
>>>
>>> B is running the Selector Protocol. It matches the data to the
>>> interest based on the PIT. It then proceeds to check that the
>>> selectors actually match. It checks that /foo/200 is greater than
>>> /foo/100. The check passes. It decides to keep a copy of /foo/200
>>> in its cache.
>>>
>>> Node B forwards the data to Node A, which receives it. Node A is
>>> running the Selector Protocol. It decapsulates the data, checks the
>>> authenticity and hands it to the app.
>>>
>>> Node E wants some of the /foo data, but only with the right
>>> signature.
>>>
>>> Node E issues an interest:
>>> Interest: name = /foo/hash(Sel(Key=NodeD))
>>> Payload = Sel(Key=NodeD)
>>>
>>> Sends it to Node F.
>>>
>>> F receives the interest. It knows it's a Selector Protocol
>>> interest.
>>> Parses payload, looks in cache but finds no match.
>>>
>>> F forwards the interest to node B.
>>>
>>> B receives the interest. It knows it's a Selector Protocol
>>> interest.
>>> Parses payload, looks in the cache and finds a match (namely
>>> /foo/200).
>>> /foo/200 checks out since it is signed by Node D.
>>>
>>> Node B creates a reply by encapsulating /foo/200:
>>> Name = /foo/hash(Sel(Key=NodeD))
>>> Payload = ( Data: Name = /foo/200, Signature = Node D, Payload =
>>> data)
>>> Signature = Node B
>>>
>>> It sends the data to F.
>>>
>>> Node F is running the Selector Protocol. It sees the reply. It
>>> decapsulates the object inside (/foo/200). It knows that this PIT
>>> entry has selectors and requires that the signature come from Node
>>> D. It checks that the signature of /foo/200 is from node D. It is.
>>> This is a valid reply to the interest, so it forwards the data
>>> along to node E and consumes the interest. Node F keeps a copy of
>>> the /foo/200 object.
>>>
>>> Node E receives the object. Matches it to the PIT. Decapsulates the
>>> data (since E is running the Selector Protocol), matches it to the
>>> selectors and once checked sends it to the application.
>>>
>>> Done.
>>>
>>> In this scenario, most nodes were running the Selector Protocol.
>>> But it's possible for some nodes not to run it. Those nodes would
>>> only do exact matching (like node C). In this example, Node C kept
>>> a copy of the packet /foo/hash(Sel(>100)) (which encapsulated
>>> /foo/200); it could use this as a reply to another interest with
>>> the same name, but it wouldn't be able to use this to answer a
>>> selector of /foo/hash(Sel(>150)) since that would require selector
>>> parsing. That request would just be forwarded.
>>>
>>> To summarize, nodes running the Selector Protocol behave like NDN
>>> nodes. The rest of the nodes can do regular CCN with exact
>>> matching.
>>>
>>> Again, we are not advocating for this discovery protocol; we are
>>> just saying that you could implement the selector functionality on
>>> top of exact matching. Those nodes that wanted to run the protocol
>>> would be able to do so, and those that did not want to run the
>>> protocol would not be required to do so.
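The encapsulation exchange walked through above can be sketched in a few functions. This is a toy model of the described Selector Protocol, not the CCNx wire format: the `Sel(...)` predicate strings, the dictionary message layout, and the truncated digest are all illustrative assumptions.

```python
import hashlib

def selector_interest(prefix: str, selector: str) -> dict:
    """Name the interest /<prefix>/hash(selector), as in the
    /foo/hash(Sel(>50)) example, so plain exact-match nodes can
    still match replies (and cache them) by name alone."""
    digest = hashlib.sha256(selector.encode()).hexdigest()[:10]
    return {"name": f"{prefix}/{digest}", "payload": selector}

def encapsulate_reply(interest: dict, inner: dict, signer: str) -> dict:
    """A selector-aware node (B or D above) wraps the producer-signed
    object in a reply whose name exactly matches the interest."""
    return {"name": interest["name"], "payload": inner, "signature": signer}

def decapsulate(reply: dict, producer: str) -> dict:
    """Consumers ignore the outer (cache) signature and verify the
    *inner* object against the expected producer, e.g. Node D."""
    inner = reply["payload"]
    if inner["signature"] != producer:
        raise ValueError("inner object not signed by expected producer")
    return inner
```

A node like C, which does not run the protocol, only ever sees the outer name and can still satisfy repeats of the exact same question from its cache.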
>>>
>>> Nacho
>>>
>>>>On Sat, Sep 27, 2014 at 12:40 PM, wrote:
>>>>> On 9/27/14, 8:40 PM, "Tai-Lin Chu" wrote:
>>>>>
>>>>>>> /mail/inbox/selector_matching/
>>>>>>
>>>>>>So is this implicit?
>>>>>
>>>>> No. This is an explicit hash of the interest payload.
>>>>>
>>>>> So, an interest could look like:
>>>>>
>>>>> Interest:
>>>>> name = /mail/inbox/selector_matching/1234567890
>>>>> payload = "user=nacho"
>>>>>
>>>>> where hash("user=nacho") = 1234567890
>>>>>
>>>>>>BTW, I read all your replies. I think the discovery protocol
>>>>>>(send out table of content) has to reach the original provider;
>>>>>>otherwise there will be some issues in the trust model. At least
>>>>>>the cached table of content has to be confirmed with the original
>>>>>>provider either by key delegation or by another confirmation
>>>>>>protocol. Besides this, LGTM.
>>>>>
>>>>> The trust model is just slightly different.
>>>>>
>>>>> You could have something like:
>>>>>
>>>>> Interest:
>>>>> name = /mail/inbox/selector_matching/1234567890
>>>>> payload = "user=nacho,publisher=mail_server_key"
>>>>>
>>>>> In this case, the reply would come signed by some random cache,
>>>>> but the encapsulated object would be signed by mail_server_key.
>>>>> So, any node that understood the Selector Protocol could
>>>>> decapsulate the reply and check the signature.
>>>>>
>>>>> Nodes that do not understand the Selector Protocol would not be
>>>>> able to check the signature of the encapsulated answer.
>>>>>
>>>>> This to me is not a problem. Base nodes (the ones not running the
>>>>> Selector Protocol) would not be checking signatures anyway, at
>>>>> least not in the fast path. This is an expensive operation that
>>>>> requires the node to get the key, etc. Nodes that run the
>>>>> Selector Protocol can check signatures if they wish (and can get
>>>>> their hands on a key).
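The hash-of-payload naming in the /mail/inbox example above implies a simple consistency check any selector-aware node can perform: the last name component must equal the hash of the interest payload. A minimal sketch, assuming a real digest in place of the schematic 1234567890:

```python
import hashlib

def payload_digest(payload: str) -> str:
    """Digest used as the final name component (truncated SHA-256 is
    an illustrative choice, not a spec requirement)."""
    return hashlib.sha256(payload.encode()).hexdigest()[:10]

def name_matches_payload(name: str, payload: str) -> bool:
    """Check that an interest like
    /mail/inbox/selector_matching/<digest> actually carries the
    payload it claims, before evaluating the predicate."""
    claimed = name.rsplit("/", 1)[-1]
    return claimed == payload_digest(payload)
```

Because the digest is in the name, two consumers asking the same question produce the same name, so a cache can answer the second request by plain exact match without re-evaluating the predicate.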
>>>>>
>>>>> Nacho
>>>>>
>>>>>>On Sat, Sep 27, 2014 at 1:10 AM, wrote:
>>>>>>> On 9/26/14, 10:50 PM, "Lan Wang (lanwang)" wrote:
>>>>>>>
>>>>>>>>On Sep 26, 2014, at 2:46 AM, Ignacio.Solis at parc.com wrote:
>>>>>>>>> On 9/25/14, 9:53 PM, "Lan Wang (lanwang)" wrote:
>>>>>>>>>
>>>>>>>>>> How can a cache respond to
>>>>>>>>>> /mail/inbox/selector_matching/<hash of payload> with a table
>>>>>>>>>> of content? This name prefix is owned by the mail server.
>>>>>>>>>> Also, the reply really depends on what is in the cache at
>>>>>>>>>> the moment, so the same name would correspond to different
>>>>>>>>>> data.
>>>>>>>>>
>>>>>>>>> A - Yes, the same name would correspond to different data.
>>>>>>>>> This is true given that the data has changed. NDN (and CCN)
>>>>>>>>> has no architectural requirement that a name maps to the same
>>>>>>>>> piece of data (obviously not talking about self-certifying
>>>>>>>>> hash-based names).
>>>>>>>>
>>>>>>>>There is a difference. A complete NDN name including the
>>>>>>>>implicit digest uniquely identifies a piece of data.
>>>>>>>
>>>>>>> That's the same thing for CCN with a ContentObjectHash.
>>>>>>>
>>>>>>>>But here the same complete name may map to different data (I
>>>>>>>>suppose you don't have an implicit digest in an effort to do
>>>>>>>>exact matching).
>>>>>>>
>>>>>>> We do, it's called ContentObjectHash, but it's not considered
>>>>>>> part of the name, it's considered a matching restriction.
>>>>>>>
>>>>>>>>In other words, in your proposal, the same name
>>>>>>>>/mail/inbox/selector_matching/hash1 may map to two or more
>>>>>>>>different data packets. But in NDN, two Data packets may share
>>>>>>>>a name prefix, but definitely not the implicit digest.
>>>>>>>>And at least it is my understanding that the application
>>>>>>>>design should make sure that the same producer doesn't produce
>>>>>>>>different Data packets with the same name prefix before the
>>>>>>>>implicit digest.
>>>>>>>
>>>>>>> This is an application design issue. The network cannot enforce
>>>>>>> this. Applications will be able to name various data objects
>>>>>>> with the same name. After all, applications don't really
>>>>>>> control the implicit digest.
>>>>>>>
>>>>>>>>It is possible in attack scenarios for different producers to
>>>>>>>>generate Data packets with the same name prefix before the
>>>>>>>>implicit digest, but still not the same implicit digest.
>>>>>>>
>>>>>>> Why is this an attack scenario? Isn't it true that if I name my
>>>>>>> local printer /printer, that name can exist in the network at
>>>>>>> different locations from different publishers?
>>>>>>>
>>>>>>> Just to clarify, in the examples provided we weren't using
>>>>>>> implicit hashes anywhere. IF we were using implicit hashes (as
>>>>>>> in, we knew what the implicit hash was), then selectors are
>>>>>>> useless. If you know the implicit hash, then you don't need
>>>>>>> selectors.
>>>>>>>
>>>>>>> In the case of CCN, we use names without explicit hashes for
>>>>>>> most of our initial traffic (discovery, manifests, dynamically
>>>>>>> generated data, etc.), but after that, we use implicit digests
>>>>>>> (the ContentObjectHash restriction) for practically all of the
>>>>>>> other traffic.
>>>>>>>
>>>>>>> Nacho
>>>>>>>
>>>>>>>>> B - Yes, you can consider the name prefix to be "owned" by
>>>>>>>>> the server, but the answer is actually something that the
>>>>>>>>> cache is choosing. The cache is choosing from the set of data
>>>>>>>>> that it has. The data that it encapsulates _is_ signed by the
>>>>>>>>> producer.
Anybody that can decapsulate the >>>>>>>>>data >>>>>>>>>can >>>>>>>>> verify that this is the case. >>>>>>>>> >>>>>>>>> Nacho >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Sep 25, 2014, at 2:17 AM, Marc.Mosko at parc.com wrote: >>>>>>>>>> >>>>>>>>>>> My beating on ?discover all? is exactly because of this. Let?s >>>>>>>>>>>define >>>>>>>>>>> discovery service. If the service is just ?discover latest? >>>>>>>>>>> (left/right), can we not simplify the current approach? If the >>>>>>>>>>>service >>>>>>>>>>> includes more than ?latest?, then is the current approach the >>>>>>>>>>>right >>>>>>>>>>> approach? >>>>>>>>>>> >>>>>>>>>>> Sync has its place and is the right solution for somethings. >>>>>>>>>>>However, >>>>>>>>>>> it should not be a a bandage over discovery. Discovery should >>>>>>>>>>>be >>>>>>>>>>>its >>>>>>>>>>> own valid and useful service. >>>>>>>>>>> >>>>>>>>>>> I agree that the exclusion approach can work, and work >>>>>>>>>>>relatively >>>>>>>>>>>well, >>>>>>>>>>> for finding the rightmost/leftmost child. I believe this is >>>>>>>>>>>because >>>>>>>>>>> that operation is transitive through caches. So, within >>>>>>>>>>>whatever >>>>>>>>>>> timeout an application is willing to wait to find the >>>>>>>>>>>?latest?, it >>>>>>>>>>>can >>>>>>>>>>> keep asking and asking. >>>>>>>>>>> >>>>>>>>>>> I do think it would be best to actually try to ask an >>>>>>>>>>>authoritative >>>>>>>>>>> source first (i.e. a non-cached value), and if that fails then >>>>>>>>>>>probe >>>>>>>>>>> caches, but experimentation may show what works well. This is >>>>>>>>>>>based >>>>>>>>>>>on >>>>>>>>>>> my belief that in the real world in broad use, the namespace >>>>>>>>>>>will >>>>>>>>>>>become >>>>>>>>>>> pretty polluted and probing will result in a lot of junk, but >>>>>>>>>>>that?s >>>>>>>>>>> future prognosticating. >>>>>>>>>>> >>>>>>>>>>> Also, in the exact match vs. 
continuation match of content >>>>>>>>>>>object >>>>>>>>>>>to >>>>>>>>>>> interest, it is pretty easy to encode that ?selector? request >>>>>>>>>>>in a >>>>>>>>>>>name >>>>>>>>>>> component (i.e. ?exclude_before=(t=version, l=2, v=279) & >>>>>>>>>>>sort=right?) >>>>>>>>>>> and any participating cache can respond with a link (or >>>>>>>>>>>encapsulate) a >>>>>>>>>>> response in an exact match system. >>>>>>>>>>> >>>>>>>>>>> In the CCNx 1.0 spec, one could also encode this a different >>>>>>>>>>>way. >>>>>>>>>>>One >>>>>>>>>>> could use a name like ?/mail/inbox/selector_matching/>>>>>>>>>>payload>? >>>>>>>>>>> and in the payload include "exclude_before=(t=version, l=2, >>>>>>>>>>>v=279) & >>>>>>>>>>> sort=right?. This means that any cache that could process the >>>>>>>>>>>? >>>>>>>>>>> selector_matching? function could look at the interest payload >>>>>>>>>>>and >>>>>>>>>>> evaluate the predicate there. The predicate could become large >>>>>>>>>>>and >>>>>>>>>>>not >>>>>>>>>>> pollute the PIT with all the computation state. Including >>>>>>>>>>>?>>>>>>>>>>of >>>>>>>>>>> payload>? in the name means that one could get a cached >>>>>>>>>>>response >>>>>>>>>>>if >>>>>>>>>>> someone else had asked the same exact question (subject to the >>>>>>>>>>>content >>>>>>>>>>> object?s cache lifetime) and it also servers to multiplex >>>>>>>>>>>different >>>>>>>>>>> payloads for the same function (selector_matching). >>>>>>>>>>> >>>>>>>>>>> Marc >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sep 25, 2014, at 8:18 AM, Burke, Jeff >>>>>>>>>>> >>>>>>>>>>>wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> http://irl.cs.ucla.edu/~zhenkai/papers/chronosync.pdf >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>https://www.ccnx.org/releases/ccnx-0.7.0rc1/doc/technical/Synch >>>>>>>>>>>>ron >>>>>>>>>>>>iz >>>>>>>>>>>>at >>>>>>>>>>>>io >>>>>>>>>>>> nPr >>>>>>>>>>>> otocol.html >>>>>>>>>>>> >>>>>>>>>>>> J. 
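Marc's `selector_matching` idea above (predicate in the Interest payload, digest of the payload in the name so identical queries share a cache entry) can be sketched in a few lines. The name layout, hash choice, and predicate syntax below are assumptions for illustration only, not the CCNx 1.0 wire format:

```python
# Hypothetical sketch of the "selector_matching" naming idea described
# above: the selector predicate travels in the Interest payload, and the
# name carries a digest of that payload so two consumers asking the same
# question produce the same name and can share a cached response.
import hashlib

def selector_matching_name(prefix, predicate):
    """Build <prefix>/selector_matching/<hash-of-payload> name components."""
    payload = predicate.encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()  # hash choice is an assumption
    return prefix + ["selector_matching", digest], payload

name, payload = selector_matching_name(
    ["mail", "inbox"],
    "exclude_before=(t=version, l=2, v=279) & sort=right")
# The PIT and cache key on the full name; the (possibly large) predicate
# stays in the payload and does not bloat PIT state.
```

A cache that understands the `selector_matching` function evaluates the payload predicate against its content store; any other cache can still do plain exact-match on the digest component.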
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 9/24/14, 11:16 PM, "Tai-Lin Chu" >>>>>>>>>>>>wrote: >>>>>>>>>>>> >>>>>>>>>>>>> However, I cannot see whether we can achieve "best-effort >>>>>>>>>>>>>*all*-value" >>>>>>>>>>>>> efficiently. >>>>>>>>>>>>> There are still interesting topics on >>>>>>>>>>>>> 1. how do we express the discovery query? >>>>>>>>>>>>> 2. is selector "discovery-complete"? i. e. can we express any >>>>>>>>>>>>> discovery query with current selector? >>>>>>>>>>>>> 3. if so, can we re-express current selector in a more >>>>>>>>>>>>>efficient >>>>>>>>>>>>>way? >>>>>>>>>>>>> >>>>>>>>>>>>> I personally see a named data as a set, which can then be >>>>>>>>>>>>>categorized >>>>>>>>>>>>> into "ordered set", and "unordered set". >>>>>>>>>>>>> some questions that any discovery expression must solve: >>>>>>>>>>>>> 1. is this a nil set or not? nil set means that this name is >>>>>>>>>>>>>the >>>>>>>>>>>>>leaf >>>>>>>>>>>>> 2. set contains member X? >>>>>>>>>>>>> 3. is set ordered or not >>>>>>>>>>>>> 4. (ordered) first, prev, next, last >>>>>>>>>>>>> 5. if we enforce component ordering, answer question 4. >>>>>>>>>>>>> 6. recursively answer all questions above on any set member >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Sep 24, 2014 at 10:45 PM, Burke, Jeff >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> From: >>>>>>>>>>>>>> Date: Wed, 24 Sep 2014 16:25:53 +0000 >>>>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>>>> Cc: , , >>>>>>>>>>>>>> >>>>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming >>>>>>>>>>>>>>convention? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think Tai-Lin?s example was just fine to talk about >>>>>>>>>>>>>>discovery. >>>>>>>>>>>>>> /blah/blah/value, how do you discover all the ?value?s? >>>>>>>>>>>>>>Discovery >>>>>>>>>>>>>> shouldn?t >>>>>>>>>>>>>> care if its email messages or temperature readings or world >>>>>>>>>>>>>>cup >>>>>>>>>>>>>> photos. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is true if discovery means "finding everything" - in >>>>>>>>>>>>>>which >>>>>>>>>>>>>>case, >>>>>>>>>>>>>> as you >>>>>>>>>>>>>> point out, sync-style approaches may be best. But I am not >>>>>>>>>>>>>>sure >>>>>>>>>>>>>>that >>>>>>>>>>>>>> this >>>>>>>>>>>>>> definition is complete. The most pressing example that I >>>>>>>>>>>>>>can >>>>>>>>>>>>>>think >>>>>>>>>>>>>> of >>>>>>>>>>>>>> is >>>>>>>>>>>>>> best-effort latest-value, in which the consumer's goal is to >>>>>>>>>>>>>>get >>>>>>>>>>>>>>the >>>>>>>>>>>>>> latest >>>>>>>>>>>>>> copy the network can deliver at the moment, and may not care >>>>>>>>>>>>>>about >>>>>>>>>>>>>> previous >>>>>>>>>>>>>> values or (if freshness is used well) potential later >>>>>>>>>>>>>>versions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Another case that seems to work well is video seeking. >>>>>>>>>>>>>>Let's >>>>>>>>>>>>>>say I >>>>>>>>>>>>>> want to >>>>>>>>>>>>>> enable random access to a video by timecode. The publisher >>>>>>>>>>>>>>can >>>>>>>>>>>>>> provide a >>>>>>>>>>>>>> time-code based discovery namespace that's queried using an >>>>>>>>>>>>>>Interest >>>>>>>>>>>>>> that >>>>>>>>>>>>>> essentially says "give me the closest keyframe to >>>>>>>>>>>>>>00:37:03:12", >>>>>>>>>>>>>>which >>>>>>>>>>>>>> returns an interest that, via the name, provides the exact >>>>>>>>>>>>>>timecode >>>>>>>>>>>>>> of >>>>>>>>>>>>>> the >>>>>>>>>>>>>> keyframe in question and a link to a segment-based namespace >>>>>>>>>>>>>>for >>>>>>>>>>>>>> efficient >>>>>>>>>>>>>> exact match playout. In two roundtrips and in a very >>>>>>>>>>>>>>lightweight >>>>>>>>>>>>>> way, >>>>>>>>>>>>>> the >>>>>>>>>>>>>> consumer has random access capability. If the NDN is the >>>>>>>>>>>>>>moral >>>>>>>>>>>>>> equivalent >>>>>>>>>>>>>> of IP, then I am not sure we should be afraid of roundtrips >>>>>>>>>>>>>>that >>>>>>>>>>>>>> provide >>>>>>>>>>>>>> this kind of functionality, just as they are used in TCP. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I described one set of problems using the exclusion >>>>>>>>>>>>>>approach, >>>>>>>>>>>>>>and >>>>>>>>>>>>>> that >>>>>>>>>>>>>> an >>>>>>>>>>>>>> NDN paper on device discovery described a similar problem, >>>>>>>>>>>>>>though >>>>>>>>>>>>>> they >>>>>>>>>>>>>> did >>>>>>>>>>>>>> not go into the details of splitting interests, etc. That >>>>>>>>>>>>>>all >>>>>>>>>>>>>>was >>>>>>>>>>>>>> simple >>>>>>>>>>>>>> enough to see from the example. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Another question is how does one do the discovery with exact >>>>>>>>>>>>>>match >>>>>>>>>>>>>> names, >>>>>>>>>>>>>> which is also conflating things. You could do a different >>>>>>>>>>>>>>discovery >>>>>>>>>>>>>> with >>>>>>>>>>>>>> continuation names too, just not the exclude method. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As I alluded to, one needs a way to talk with a specific >>>>>>>>>>>>>>cache >>>>>>>>>>>>>>about >>>>>>>>>>>>>> its >>>>>>>>>>>>>> ?table of contents? for a prefix so one can get a consistent >>>>>>>>>>>>>>set >>>>>>>>>>>>>>of >>>>>>>>>>>>>> results >>>>>>>>>>>>>> without all the round-trips of exclusions. Actually >>>>>>>>>>>>>>downloading >>>>>>>>>>>>>>the >>>>>>>>>>>>>> ?headers? of the messages would be the same bytes, more or >>>>>>>>>>>>>>less. >>>>>>>>>>>>>>In >>>>>>>>>>>>>> a >>>>>>>>>>>>>> way, >>>>>>>>>>>>>> this is a little like name enumeration from a ccnx 0.x repo, >>>>>>>>>>>>>>but >>>>>>>>>>>>>>that >>>>>>>>>>>>>> protocol has its own set of problems and I?m not suggesting >>>>>>>>>>>>>>to >>>>>>>>>>>>>>use >>>>>>>>>>>>>> that >>>>>>>>>>>>>> directly. >>>>>>>>>>>>>> >>>>>>>>>>>>>> One approach is to encode a request in a name component and >>>>>>>>>>>>>>a >>>>>>>>>>>>>> participating >>>>>>>>>>>>>> cache can reply. It replies in such a way that one could >>>>>>>>>>>>>>continue >>>>>>>>>>>>>> talking >>>>>>>>>>>>>> with that cache to get its TOC. 
One would then issue >>>>>>>>>>>>>>another >>>>>>>>>>>>>> interest >>>>>>>>>>>>>> with >>>>>>>>>>>>>> a request for not-that-cache. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm curious how the TOC approach works in a multi-publisher >>>>>>>>>>>>>>scenario? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Another approach is to try to ask the authoritative source >>>>>>>>>>>>>>for >>>>>>>>>>>>>>the >>>>>>>>>>>>>> ?current? >>>>>>>>>>>>>> manifest name, i.e. /mail/inbox/current/, which could >>>>>>>>>>>>>>return >>>>>>>>>>>>>> the >>>>>>>>>>>>>> manifest or a link to the manifest. Then fetching the >>>>>>>>>>>>>>actual >>>>>>>>>>>>>> manifest >>>>>>>>>>>>>> from >>>>>>>>>>>>>> the link could come from caches because you how have a >>>>>>>>>>>>>>consistent >>>>>>>>>>>>>> set of >>>>>>>>>>>>>> names to ask for. If you cannot talk with an authoritative >>>>>>>>>>>>>>source, >>>>>>>>>>>>>> you >>>>>>>>>>>>>> could try again without the nonce and see if there?s a >>>>>>>>>>>>>>cached >>>>>>>>>>>>>>copy >>>>>>>>>>>>>> of a >>>>>>>>>>>>>> recent version around. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Marc >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 24, 2014, at 5:46 PM, Burke, Jeff >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/24/14, 8:20 AM, "Ignacio.Solis at parc.com" >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/24/14, 4:27 AM, "Tai-Lin Chu" >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> For example, I see a pattern /mail/inbox/148. I, a human >>>>>>>>>>>>>>being, >>>>>>>>>>>>>>see a >>>>>>>>>>>>>> pattern with static (/mail/inbox) and variable (148) >>>>>>>>>>>>>>components; >>>>>>>>>>>>>>with >>>>>>>>>>>>>> proper naming convention, computers can also detect this >>>>>>>>>>>>>>pattern >>>>>>>>>>>>>> easily. Now I want to look for all mails in my inbox. I can >>>>>>>>>>>>>>generate >>>>>>>>>>>>>> a >>>>>>>>>>>>>> list of /mail/inbox/. 
These are my guesses, and with >>>>>>>>>>>>>> selectors >>>>>>>>>>>>>> I can further refine my guesses. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this is a very bad example (or at least a very bad >>>>>>>>>>>>>> application >>>>>>>>>>>>>> design). You have an app (a mail server / inbox) and you >>>>>>>>>>>>>>want >>>>>>>>>>>>>>it >>>>>>>>>>>>>>to >>>>>>>>>>>>>> list >>>>>>>>>>>>>> your emails? An email list is an application data >>>>>>>>>>>>>>structure. >>>>>>>>>>>>>>I >>>>>>>>>>>>>> don?t >>>>>>>>>>>>>> think you should use the network structure to reflect this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think Tai-Lin is trying to sketch a small example, not >>>>>>>>>>>>>>propose >>>>>>>>>>>>>>a >>>>>>>>>>>>>> full-scale approach to email. (Maybe I am misunderstanding.) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Another way to look at it is that if the network >>>>>>>>>>>>>>architecture >>>>>>>>>>>>>>is >>>>>>>>>>>>>> providing >>>>>>>>>>>>>> the equivalent of distributed storage to the application, >>>>>>>>>>>>>>perhaps >>>>>>>>>>>>>>the >>>>>>>>>>>>>> application data structure could be adapted to match the >>>>>>>>>>>>>>affordances >>>>>>>>>>>>>> of >>>>>>>>>>>>>> the network. Then it would not be so bad that the two >>>>>>>>>>>>>>structures >>>>>>>>>>>>>> were >>>>>>>>>>>>>> aligned. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I?ll give you an example, how do you delete emails from your >>>>>>>>>>>>>>inbox? >>>>>>>>>>>>>> If >>>>>>>>>>>>>> an >>>>>>>>>>>>>> email was cached in the network it can never be deleted from >>>>>>>>>>>>>>your >>>>>>>>>>>>>> inbox? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is conflating two issues - what you are pointing out is >>>>>>>>>>>>>>that >>>>>>>>>>>>>>the >>>>>>>>>>>>>> data >>>>>>>>>>>>>> structure of a linear list doesn't handle common email >>>>>>>>>>>>>>management >>>>>>>>>>>>>> operations well. 
Again, I'm not sure if that's what he was >>>>>>>>>>>>>>getting >>>>>>>>>>>>>> at >>>>>>>>>>>>>> here. But deletion is not the issue - the availability of a >>>>>>>>>>>>>>data >>>>>>>>>>>>>> object >>>>>>>>>>>>>> on the network does not necessarily mean it's valid from the >>>>>>>>>>>>>> perspective >>>>>>>>>>>>>> of the application. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Or moved to another mailbox? Do you rely on the emails >>>>>>>>>>>>>>expiring? >>>>>>>>>>>>>> >>>>>>>>>>>>>> This problem is true for most (any?) situations where you >>>>>>>>>>>>>>use >>>>>>>>>>>>>>network >>>>>>>>>>>>>> name >>>>>>>>>>>>>> structure to directly reflect the application data >>>>>>>>>>>>>>structure. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Not sure I understand how you make the leap from the >>>>>>>>>>>>>>example to >>>>>>>>>>>>>>the >>>>>>>>>>>>>> general statement. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jeff >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nacho >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Sep 23, 2014 at 2:34 AM, >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, yes I think those would all be good things. >>>>>>>>>>>>>> >>>>>>>>>>>>>> One thing to keep in mind, especially with things like time >>>>>>>>>>>>>>series >>>>>>>>>>>>>> sensor >>>>>>>>>>>>>> data, is that people see a pattern and infer a way of doing >>>>>>>>>>>>>>it. >>>>>>>>>>>>>> That?s >>>>>>>>>>>>>> easy >>>>>>>>>>>>>> for a human :) But in Discovery, one should assume that one >>>>>>>>>>>>>>does >>>>>>>>>>>>>>not >>>>>>>>>>>>>> know >>>>>>>>>>>>>> of patterns in the data beyond what the protocols used to >>>>>>>>>>>>>>publish >>>>>>>>>>>>>>the >>>>>>>>>>>>>> data >>>>>>>>>>>>>> explicitly require. That said, I think some of the things >>>>>>>>>>>>>>you >>>>>>>>>>>>>>listed >>>>>>>>>>>>>> are >>>>>>>>>>>>>> good places to start: sensor data, web content, climate >>>>>>>>>>>>>>data or >>>>>>>>>>>>>> genome >>>>>>>>>>>>>> data. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> We also need to state what the forwarding strategies are and >>>>>>>>>>>>>>what >>>>>>>>>>>>>>the >>>>>>>>>>>>>> cache >>>>>>>>>>>>>> behavior is. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I outlined some of the points that I think are important in >>>>>>>>>>>>>>that >>>>>>>>>>>>>> other >>>>>>>>>>>>>> posting. While ?discover latest? is useful, ?discover all? >>>>>>>>>>>>>>is >>>>>>>>>>>>>>also >>>>>>>>>>>>>> important, and that one gets complicated fast. So points >>>>>>>>>>>>>>like >>>>>>>>>>>>>> separating >>>>>>>>>>>>>> discovery from retrieval and working with large data sets >>>>>>>>>>>>>>have >>>>>>>>>>>>>>been >>>>>>>>>>>>>> important in shaping our thinking. That all said, I?d be >>>>>>>>>>>>>>happy >>>>>>>>>>>>>> starting >>>>>>>>>>>>>> from 0 and working through the Discovery service definition >>>>>>>>>>>>>>from >>>>>>>>>>>>>> scratch >>>>>>>>>>>>>> along with data set use cases. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Marc >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 23, 2014, at 12:36 AM, Burke, Jeff >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Marc, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks ? yes, I saw that as well. I was just trying to get >>>>>>>>>>>>>>one >>>>>>>>>>>>>>step >>>>>>>>>>>>>> more >>>>>>>>>>>>>> specific, which was to see if we could identify a few >>>>>>>>>>>>>>specific >>>>>>>>>>>>>>use >>>>>>>>>>>>>> cases >>>>>>>>>>>>>> around which to have the conversation. (e.g., time series >>>>>>>>>>>>>>sensor >>>>>>>>>>>>>> data >>>>>>>>>>>>>> and >>>>>>>>>>>>>> web content retrieval for "get latest"; climate data for >>>>>>>>>>>>>>huge >>>>>>>>>>>>>>data >>>>>>>>>>>>>> sets; >>>>>>>>>>>>>> local data in a vehicular network; etc.) What have you been >>>>>>>>>>>>>>looking >>>>>>>>>>>>>> at >>>>>>>>>>>>>> that's driving considerations of discovery? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Jeff >>>>>>>>>>>>>> >>>>>>>>>>>>>> From: >>>>>>>>>>>>>> Date: Mon, 22 Sep 2014 22:29:43 +0000 >>>>>>>>>>>>>> To: Jeff Burke >>>>>>>>>>>>>> Cc: , >>>>>>>>>>>>>> Subject: Re: [Ndn-interest] any comments on naming >>>>>>>>>>>>>>convention? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jeff, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Take a look at my posting (that Felix fixed) in a new >>>>>>>>>>>>>>thread on >>>>>>>>>>>>>> Discovery. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>http://www.lists.cs.ucla.edu/pipermail/ndn-interest/2014-Sept >>>>>>>>>>>>>>emb >>>>>>>>>>>>>>er >>>>>>>>>>>>>>/0 >>>>>>>>>>>>>>00 >>>>>>>>>>>>>> 20 >>>>>>>>>>>>>> 0 >>>>>>>>>>>>>> .html >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think it would be very productive to talk about what >>>>>>>>>>>>>>Discovery >>>>>>>>>>>>>> should >>>>>>>>>>>>>> do, >>>>>>>>>>>>>> and not focus on the how. It is sometimes easy to get >>>>>>>>>>>>>>caught >>>>>>>>>>>>>>up >>>>>>>>>>>>>>in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> how, >>>>>>>>>>>>>> which I think is a less important topic than the what at >>>>>>>>>>>>>>this >>>>>>>>>>>>>>stage. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Marc >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 22, 2014, at 11:04 PM, Burke, Jeff >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Marc, >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you can't talk about your protocols, perhaps we can >>>>>>>>>>>>>>discuss >>>>>>>>>>>>>>this >>>>>>>>>>>>>> based >>>>>>>>>>>>>> on use cases. What are the use cases you are using to >>>>>>>>>>>>>>evaluate >>>>>>>>>>>>>> discovery? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Jeff >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/21/14, 11:23 AM, "Marc.Mosko at parc.com" >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> No matter what the expressiveness of the predicates if the >>>>>>>>>>>>>>forwarder >>>>>>>>>>>>>> can >>>>>>>>>>>>>> send interests different ways you don't have a consistent >>>>>>>>>>>>>>underlying >>>>>>>>>>>>>> set >>>>>>>>>>>>>> to talk about so you would always need non-range exclusions >>>>>>>>>>>>>>to >>>>>>>>>>>>>> discover >>>>>>>>>>>>>> every version. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Range exclusions only work I believe if you get an >>>>>>>>>>>>>>authoritative >>>>>>>>>>>>>> answer. >>>>>>>>>>>>>> If different content pieces are scattered between different >>>>>>>>>>>>>>caches >>>>>>>>>>>>>>I >>>>>>>>>>>>>> don't see how range exclusions would work to discover every >>>>>>>>>>>>>>version. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm sorry to be pointing out problems without offering >>>>>>>>>>>>>>solutions >>>>>>>>>>>>>>but >>>>>>>>>>>>>> we're not ready to publish our discovery protocols. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 21, 2014, at 8:50, "Tai-Lin Chu" >>>>>>>>>>>>>> >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> I see. Can you briefly describe how ccnx discovery protocol >>>>>>>>>>>>>>solves >>>>>>>>>>>>>> the >>>>>>>>>>>>>> all problems that you mentioned (not just exclude)? a doc >>>>>>>>>>>>>>will >>>>>>>>>>>>>>be >>>>>>>>>>>>>> better. >>>>>>>>>>>>>> >>>>>>>>>>>>>> My unserious conjecture( :) ) : exclude is equal to [not]. I >>>>>>>>>>>>>>will >>>>>>>>>>>>>> soon >>>>>>>>>>>>>> expect [and] and [or], so boolean algebra is fully >>>>>>>>>>>>>>supported. >>>>>>>>>>>>>> Regular >>>>>>>>>>>>>> language or context free language might become part of >>>>>>>>>>>>>>selector >>>>>>>>>>>>>>too. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Sep 20, 2014 at 11:25 PM, >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> That will get you one reading then you need to exclude it >>>>>>>>>>>>>>and >>>>>>>>>>>>>>ask >>>>>>>>>>>>>> again. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sent from my telephone >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 21, 2014, at 8:22, "Tai-Lin Chu" >>>>>>>>>>>>>> >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>>>set >>>>>>>>>>>>>> with a particular cache, then you need to always use >>>>>>>>>>>>>>individual >>>>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>>>versions >>>>>>>>>>>>>> of an object. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am very confused. For your example, if I want to get all >>>>>>>>>>>>>>today's >>>>>>>>>>>>>> sensor data, I just do (Any..Last second of last day)(First >>>>>>>>>>>>>>second >>>>>>>>>>>>>>of >>>>>>>>>>>>>> tomorrow..Any). That's 18 bytes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1]http://named-data.net/doc/ndn-tlv/interest.html#exclude >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Sep 20, 2014 at 10:55 PM, >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sep 21, 2014, at 1:47 AM, Tai-Lin Chu >>>>>>>>>>>>>> >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you talk sometimes to A and sometimes to B, you very >>>>>>>>>>>>>>easily >>>>>>>>>>>>>> could miss content objects you want to discovery unless you >>>>>>>>>>>>>>avoid >>>>>>>>>>>>>> all range exclusions and only exclude explicit versions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Could you explain why missing content object situation >>>>>>>>>>>>>>happens? >>>>>>>>>>>>>>also >>>>>>>>>>>>>> range exclusion is just a shorter notation for many explicit >>>>>>>>>>>>>> exclude; >>>>>>>>>>>>>> converting from explicit excludes to ranged exclude is >>>>>>>>>>>>>>always >>>>>>>>>>>>>> possible. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, my point was that if you cannot talk about a consistent >>>>>>>>>>>>>>set >>>>>>>>>>>>>> with a particular cache, then you need to always use >>>>>>>>>>>>>>individual >>>>>>>>>>>>>> excludes not range excludes if you want to discover all the >>>>>>>>>>>>>>versions >>>>>>>>>>>>>> of an object. For something like a sensor reading that is >>>>>>>>>>>>>>updated, >>>>>>>>>>>>>> say, once per second you will have 86,400 of them per day. >>>>>>>>>>>>>>If >>>>>>>>>>>>>>each >>>>>>>>>>>>>> exclusion is a timestamp (say 8 bytes), that?s 691,200 >>>>>>>>>>>>>>bytes of >>>>>>>>>>>>>> exclusions (plus encoding overhead) per day. >>>>>>>>>>>>>> >>>>>>>>>>>>>> yes, maybe using a more deterministic version number than a >>>>>>>>>>>>>> timestamp makes sense here, but its just an example of >>>>>>>>>>>>>>needing >>>>>>>>>>>>>>a >>>>>>>>>>>>>>lot >>>>>>>>>>>>>> of exclusions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> You exclude through 100 then issue a new interest. This >>>>>>>>>>>>>>goes >>>>>>>>>>>>>>to >>>>>>>>>>>>>> cache B >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I feel this case is invalid because cache A will also get >>>>>>>>>>>>>>the >>>>>>>>>>>>>> interest, and cache A will return v101 if it exists. Like >>>>>>>>>>>>>>you >>>>>>>>>>>>>>said, >>>>>>>>>>>>>> if >>>>>>>>>>>>>> this goes to cache B only, it means that cache A dies. How >>>>>>>>>>>>>>do >>>>>>>>>>>>>>you >>>>>>>>>>>>>> know >>>>>>>>>>>>>> that v101 even exist? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess this depends on what the forwarding strategy is. If >>>>>>>>>>>>>>the >>>>>>>>>>>>>> forwarder will always send each interest to all replicas, >>>>>>>>>>>>>>then >>>>>>>>>>>>>>yes, >>>>>>>>>>>>>> modulo packet loss, you would discover v101 on cache A. If >>>>>>>>>>>>>>the >>>>>>>>>>>>>> forwarder is just doing ?best path? 
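The arithmetic behind Marc's 691,200-byte figure above, ignoring TLV encoding overhead as the post does:

```python
# Back-of-the-envelope check of the exclusion-size argument: a sensor
# publishing once per second, with each reading excluded individually
# by an 8-byte timestamp component.
READINGS_PER_DAY = 24 * 60 * 60   # 86,400 readings
TIMESTAMP_BYTES = 8               # size of one explicit exclusion

exclusion_bytes = READINGS_PER_DAY * TIMESTAMP_BYTES
print(exclusion_bytes)  # 691200 bytes of raw exclusions per day
```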
and can round-robin >>>>>>>>>>>>>>between >>>>>>>>>>>>>>cache >>>>>>>>>>>>>> A and cache B, then your application could miss v101. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> c,d In general I agree that LPM performance is related to >>>>>>>>>>>>>>the >>>>>>>>>>>>>>number >>>>>>>>>>>>>> of components. In my own thread-safe LMP implementation, I >>>>>>>>>>>>>>used >>>>>>>>>>>>>>only >>>>>>>>>>>>>> one RWMutex for the whole tree. I don't know whether adding >>>>>>>>>>>>>>lock >>>>>>>>>>>>>>for >>>>>>>>>>>>>> every node will be faster or not because of lock overhead. >>>>>>>>>>>>>> >>>>>>>>>>>>>> However, we should compare (exact match + discovery >>>>>>>>>>>>>>protocol) >>>>>>>>>>>>>>vs >>>>>>>>>>>>>> (ndn >>>>>>>>>>>>>> lpm). Comparing performance of exact match to lpm is unfair. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, we should compare them. And we need to publish the >>>>>>>>>>>>>>ccnx >>>>>>>>>>>>>>1.0 >>>>>>>>>>>>>> specs for doing the exact match discovery. So, as I said, >>>>>>>>>>>>>>I?m >>>>>>>>>>>>>>not >>>>>>>>>>>>>> ready to claim its better yet because we have not done that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Sep 20, 2014 at 2:38 PM, >>>>>>>>>>>>>>wrote: >>>>>>>>>>>>>> I would point out that using LPM on content object to >>>>>>>>>>>>>>Interest >>>>>>>>>>>>>> matching to do discovery has its own set of problems. >>>>>>>>>>>>>>Discovery >>>>>>>>>>>>>> involves more than just ?latest version? discovery too. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is probably getting off-topic from the original post >>>>>>>>>>>>>>about >>>>>>>>>>>>>> naming conventions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> a. If Interests can be forwarded multiple directions and >>>>>>>>>>>>>>two >>>>>>>>>>>>>> different caches are responding, the exclusion set you >>>>>>>>>>>>>>build up >>>>>>>>>>>>>> talking with cache A will be invalid for cache B. 
If you >>>>>>>>>>>>>>talk >>>>>>>>>>>>>> sometimes to A and sometimes to B, you very easily could >>>>>>>>>>>>>>miss >>>>>>>>>>>>>> content objects you want to discovery unless you avoid all >>>>>>>>>>>>>>range >>>>>>>>>>>>>> exclusions and only exclude explicit versions. That will >>>>>>>>>>>>>>lead >>>>>>>>>>>>>>to >>>>>>>>>>>>>> very large interest packets. In ccnx 1.0, we believe that >>>>>>>>>>>>>>an >>>>>>>>>>>>>> explicit discovery protocol that allows conversations about >>>>>>>>>>>>>> consistent sets is better. >>>>>>>>>>>>>> >>>>>>>>>>>>>> b. Yes, if you just want the ?latest version? discovery that >>>>>>>>>>>>>> should be transitive between caches, but imagine this. You >>>>>>>>>>>>>>send >>>>>>>>>>>>>> Interest #1 to cache A which returns version 100. You >>>>>>>>>>>>>>exclude >>>>>>>>>>>>>> through 100 then issue a new interest. This goes to cache B >>>>>>>>>>>>>>who >>>>>>>>>>>>>> only has version 99, so the interest times out or is NACK?d. >>>>>>>>>>>>>>So >>>>>>>>>>>>>> you think you have it! But, cache A already has version >>>>>>>>>>>>>>101, >>>>>>>>>>>>>>you >>>>>>>>>>>>>> just don?t know. If you cannot have a conversation around >>>>>>>>>>>>>> consistent sets, it seems like even doing latest version >>>>>>>>>>>>>>discovery >>>>>>>>>>>>>> is difficult with selector based discovery. From what I >>>>>>>>>>>>>>saw in >>>>>>>>>>>>>> ccnx 0.x, one ended up getting an Interest all the way to >>>>>>>>>>>>>>the >>>>>>>>>>>>>> authoritative source because you can never believe an >>>>>>>>>>>>>>intermediate >>>>>>>>>>>>>> cache that there?s not something more recent. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I?m sure you?ve walked through cases (a) and (b) in ndn, >>>>>>>>>>>>>>I?d be >>>>>>>>>>>>>> interest in seeing your analysis. Case (a) is that a node >>>>>>>>>>>>>>can >>>>>>>>>>>>>> correctly discover every version of a name prefix, and (b) >>>>>>>>>>>>>>is >>>>>>>>>>>>>>that >>>>>>>>>>>>>> a node can correctly discover the latest version. 
We have not
>>>>>>>>>>>>>> formally compared (or yet published) our discovery protocols
>>>>>>>>>>>>>> (we have three: 2 for content, 1 for device) against selector-based
>>>>>>>>>>>>>> discovery, so I cannot yet claim they are better, but they do not
>>>>>>>>>>>>>> have the non-determinism sketched above.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> c. Using LPM, there is a non-deterministic number of lookups you
>>>>>>>>>>>>>> must do in the PIT to match a content object. If you have a name
>>>>>>>>>>>>>> tree or a threaded hash table, those don't all need to be hash
>>>>>>>>>>>>>> lookups, but you need to walk up the name tree for every prefix of
>>>>>>>>>>>>>> the content object name and evaluate the selector predicate.
>>>>>>>>>>>>>> Content Based Networking (CBN) had some methods to create data
>>>>>>>>>>>>>> structures based on predicates; maybe those would be better. But
>>>>>>>>>>>>>> in any case, you will potentially need to retrieve many PIT entries
>>>>>>>>>>>>>> if there is Interest traffic for many prefixes of a root. Even on
>>>>>>>>>>>>>> an Intel system, you'll likely miss cache lines, so you'll have a
>>>>>>>>>>>>>> lot of NUMA accesses for each one. In CCNx 1.0, even a naive
>>>>>>>>>>>>>> implementation requires at most 3 lookups (one by name, one by
>>>>>>>>>>>>>> name + keyid, one by name + content object hash), and one can do
>>>>>>>>>>>>>> other things to optimize lookup for an extra write.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> d.
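A minimal sketch of the "at most 3 lookups" exact-match PIT described above; the data structure and key layout here are assumptions for illustration, not the CCNx implementation:

```python
# Hypothetical exact-match PIT: a content object is matched by name, by
# name + KeyId, and by name + object hash. Each is a single hash-table
# probe, regardless of how many components the name has, instead of
# walking every prefix of the name and evaluating selector predicates.
class ExactMatchPIT:
    def __init__(self):
        self.table = {}  # (name, keyid, obj_hash) -> list of pending faces

    def add_interest(self, name, keyid=None, obj_hash=None, face=0):
        self.table.setdefault((name, keyid, obj_hash), []).append(face)

    def match(self, name, keyid, obj_hash):
        faces = []
        # At most three probes per arriving content object:
        for key in ((name, None, None),
                    (name, keyid, None),
                    (name, None, obj_hash)):
            faces += self.table.pop(key, [])
        return faces

pit = ExactMatchPIT()
pit.add_interest("/mail/inbox/148", face=1)
pit.add_interest("/mail/inbox/148", keyid="k1", face=2)
print(pit.match("/mail/inbox/148", "k1", "h9"))  # [1, 2]
```

Compare this with the LPM case in point (c): there the number of PIT probes grows with the name length, and each prefix visited may require evaluating a selector predicate.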
>>>>>>>>>>>>>> In (c) above, if you have a threaded name tree or are just
>>>>>>>>>>>>>> walking parent pointers, I suspect you'll need locking of the
>>>>>>>>>>>>>> ancestors in a multi-threaded system ("threaded" here meaning LWP)
>>>>>>>>>>>>>> and that will be expensive. It would be interesting to see what a
>>>>>>>>>>>>>> cache-consistent multi-threaded name tree looks like.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 20, 2014, at 8:15 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had thought about these questions, but I want to know your idea
>>>>>>>>>>>>>> besides typed components:
>>>>>>>>>>>>>> 1. LPM allows "data discovery". How will exact match do similar
>>>>>>>>>>>>>> things?
>>>>>>>>>>>>>> 2. Will removing selectors improve performance? What other, faster
>>>>>>>>>>>>>> technique do we use to replace selectors?
>>>>>>>>>>>>>> 3. Fixed byte length and type. I agree more that the type can be a
>>>>>>>>>>>>>> fixed byte, but 2 bytes for length might not be enough for the
>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Sep 20, 2014 at 5:36 AM, Dave Oran (oran) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 18, 2014, at 9:09 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know how to make #2 flexible enough to do what things I can
>>>>>>>>>>>>>> envision we need to do, and with a few simple conventions on how
>>>>>>>>>>>>>> the registry of types is managed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you share it with us?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sure. Here's a strawman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The type space is 16 bits, so you have 65,536 types.
>>>>>>>>>>>>>> The type space is currently shared with the types used for the
>>>>>>>>>>>>>> entire protocol, which gives us two options:
>>>>>>>>>>>>>> (1) We reserve a range for name component types. Given the
>>>>>>>>>>>>>> likelihood there will be at least as much and probably more need
>>>>>>>>>>>>>> for component types than protocol extensions, we could reserve
>>>>>>>>>>>>>> 1/2 of the type space, giving us 32K types for name components.
>>>>>>>>>>>>>> (2) Since there is no parsing ambiguity between name components
>>>>>>>>>>>>>> and other fields of the protocol (since they are sub-types of
>>>>>>>>>>>>>> the name type), we could reuse numbers and thereby have the
>>>>>>>>>>>>>> entire 65K for name component types.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We divide the type space into regions and manage it with a
>>>>>>>>>>>>>> registry. If we ever get to the point of creating an IETF
>>>>>>>>>>>>>> standard, IANA has 25 years of experience running registries,
>>>>>>>>>>>>>> and there are well-understood rule sets for different kinds of
>>>>>>>>>>>>>> registries (open, requires a written spec, requires standards
>>>>>>>>>>>>>> approval).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - We allocate one "default" name component type for "generic
>>>>>>>>>>>>>> name", which would be used in name prefixes and other common
>>>>>>>>>>>>>> cases where there are no special semantics on the name component.
>>>>>>>>>>>>>> - We allocate a range of name component types, say 1024, to
>>>>>>>>>>>>>> globally understood types that are part of the base or extension
>>>>>>>>>>>>>> NDN specifications (e.g. chunk#, version#, etc.).
>>>>>>>>>>>>>> - We reserve some portion of the space for unanticipated uses
>>>>>>>>>>>>>> (say another 1024 types).
>>>>>>>>>>>>>> - We give the rest of the space to application assignment.
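As a rough illustration of the strawman partition above, here is a sketch in Python. All numeric boundaries and region names are hypothetical choices for illustration, not part of any specification:

```python
# Hypothetical partition of a 16-bit name-component type space,
# following the strawman above: one generic type, ~1024 base-spec
# types, ~1024 reserved types, and the rest for applications.
TYPE_SPACE_SIZE = 1 << 16                          # 65,536 possible types

GENERIC_NAME = 0x0000                              # the one "default" type
BASE_SPEC_RANGE = range(0x0001, 0x0001 + 1024)     # chunk#, version#, etc.
RESERVED_RANGE = range(0x0401, 0x0401 + 1024)      # unanticipated uses
# everything above RESERVED_RANGE: application assignment

def region(t: int) -> str:
    """Classify a 16-bit component type into its registry region."""
    if not 0 <= t < TYPE_SPACE_SIZE:
        raise ValueError("type must fit in 16 bits")
    if t == GENERIC_NAME:
        return "generic"
    if t in BASE_SPEC_RANGE:
        return "base-spec"
    if t in RESERVED_RANGE:
        return "reserved"
    return "application"
```

The point of such a partition is only that a forwarder or library can classify a component type with a couple of comparisons; the concrete boundaries would come out of the registry process.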
>>>>>>>>>>>>>> Make sense?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in
>>>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from
>>>>>>>>>>>>>> performance flaws in the design
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we could design for performance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's not what people are advocating. We are advocating that we
>>>>>>>>>>>>>> *not* design for known bad performance and hope serendipity or
>>>>>>>>>>>>>> Moore's Law will come to the rescue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> but I think there will be a turning point when the slower design
>>>>>>>>>>>>>> starts to become "fast enough".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps, perhaps not. Relative performance is what matters, so
>>>>>>>>>>>>>> things that don't get faster while others do tend to get dropped
>>>>>>>>>>>>>> or not used, because they impose a performance penalty relative
>>>>>>>>>>>>>> to the things that go faster. There is also the "low-end"
>>>>>>>>>>>>>> phenomenon where improvements in technology get applied to
>>>>>>>>>>>>>> lowering cost rather than improving performance. For those
>>>>>>>>>>>>>> environments bad performance just never gets better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you think there will be some design of NDN that will *never*
>>>>>>>>>>>>>> have performance improvement?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I suspect LPM on data will always be slow (relative to the other
>>>>>>>>>>>>>> functions). I suspect exclusions will always be slow because
>>>>>>>>>>>>>> they will require extra memory references.
>>>>>>>>>>>>>> However, I of course don't claim clairvoyance, so this is just
>>>>>>>>>>>>>> speculation based on 35+ years of seeing performance improve by
>>>>>>>>>>>>>> 4 orders of magnitude and still having to worry about counting
>>>>>>>>>>>>>> cycles and memory references...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 5:20 PM, Dave Oran (oran) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 18, 2014, at 7:41 PM, Tai-Lin Chu wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should not look at a certain chip nowadays and want NDN to
>>>>>>>>>>>>>> perform well on it. It should be the other way around: once NDN
>>>>>>>>>>>>>> apps become popular, a better chip will be designed for NDN.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While I'm sympathetic to that view, there are three ways in
>>>>>>>>>>>>>> which Moore's law or hardware tricks will not save us from
>>>>>>>>>>>>>> performance flaws in the design:
>>>>>>>>>>>>>> a) clock rates are not getting (much) faster
>>>>>>>>>>>>>> b) memory accesses are getting (relatively) more expensive
>>>>>>>>>>>>>> c) data structures that require locks to manipulate successfully
>>>>>>>>>>>>>> will be relatively more expensive, even with near-zero lock
>>>>>>>>>>>>>> contention.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The fact is, IP *did* have some serious performance flaws in its
>>>>>>>>>>>>>> design. We just forgot those because the design elements that
>>>>>>>>>>>>>> depended on those mistakes have fallen into disuse. The poster
>>>>>>>>>>>>>> children for this are:
>>>>>>>>>>>>>> 1. IP options. Nobody can use them because they are too slow on
>>>>>>>>>>>>>> modern forwarding hardware, so they can't be reliably used
>>>>>>>>>>>>>> anywhere.
>>>>>>>>>>>>>> 2.
the UDP checksum, which was a bad design when it was
>>>>>>>>>>>>>> specified and is now a giant PITA that still causes major pain
>>>>>>>>>>>>>> in working around.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm afraid students today are being taught that the designers of
>>>>>>>>>>>>>> IP were flawless, as opposed to very good scientists and
>>>>>>>>>>>>>> engineers who got most of it right.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I feel the discussion today and yesterday has been off-topic.
>>>>>>>>>>>>>> Now I see that there are 3 approaches:
>>>>>>>>>>>>>> 1. We should not define a naming convention at all.
>>>>>>>>>>>>>> 2. Typed component: use the TLV type space and add a handful of
>>>>>>>>>>>>>> types.
>>>>>>>>>>>>>> 3. Marked component: introduce only one more type and add an
>>>>>>>>>>>>>> additional marker space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know how to make #2 flexible enough to do the things I can
>>>>>>>>>>>>>> envision we need to do, and with a few simple conventions on
>>>>>>>>>>>>>> how the registry of types is managed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is just as powerful in practice as either throwing up our
>>>>>>>>>>>>>> hands and letting applications design their own mutually
>>>>>>>>>>>>>> incompatible schemes, or trying to make naming conventions with
>>>>>>>>>>>>>> markers in a way that is fast to generate/parse and also
>>>>>>>>>>>>>> resilient against aliasing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, everybody thinks that the current UTF8 marker naming
>>>>>>>>>>>>>> convention needs to be revised.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 18, 2014 at 3:27 PM, Felix Rabe wrote:
>>>>>>>>>>>>>> Would that chip be suitable, i.e. can we expect most names to
>>>>>>>>>>>>>> fit in (the magnitude of) 96 bytes? What length are names
>>>>>>>>>>>>>> usually in current NDN experiments?
>>>>>>>>>>>>>> I guess wide deployment could make for even longer names.
>>>>>>>>>>>>>> Related: many URLs I encounter nowadays easily don't fit within
>>>>>>>>>>>>>> two 80-column text lines, and NDN will have to carry more
>>>>>>>>>>>>>> information than URLs, as far as I see.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18/Sep/14 23:15, Marc.Mosko at parc.com wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In fact, the index in a separate TLV will be slower on some
>>>>>>>>>>>>>> architectures, like the ezChip NP4. The NP4 can hold the first
>>>>>>>>>>>>>> 96 frame bytes in memory; then any subsequent memory is accessed
>>>>>>>>>>>>>> only as two adjacent 32-byte blocks (there can be at most 5
>>>>>>>>>>>>>> blocks available at any one time). If you need to switch between
>>>>>>>>>>>>>> arrays, it would be very expensive. If you have to read past the
>>>>>>>>>>>>>> name to get to the 2nd array, then read it, then back up to get
>>>>>>>>>>>>>> to the name, it will be pretty expensive too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Marc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 18, 2014, at 2:02 PM, wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does this make that much difference?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you want to parse the first 5 components, one way to do it
>>>>>>>>>>>>>> is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Read the index, find entry 5, then read in that many bytes from
>>>>>>>>>>>>>> the start offset of the beginning of the name.
>>>>>>>>>>>>>> OR
>>>>>>>>>>>>>> Start reading the name, (find size + move) 5 times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How much speed are you getting from one to the other? You seem
>>>>>>>>>>>>>> to imply that the first one is faster. I don't think this is the
>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>> In the first one you'll probably have to get the cache line for
>>>>>>>>>>>>>> the index, then all the required cache lines for the first 5
>>>>>>>>>>>>>> components. For the second, you'll have to get all the cache
>>>>>>>>>>>>>> lines for the first 5 components. Given the assumption that a
>>>>>>>>>>>>>> cache miss is way more expensive than evaluating a number and
>>>>>>>>>>>>>> computing an addition, you might find that the performance of
>>>>>>>>>>>>>> the index is actually slower than the performance of the direct
>>>>>>>>>>>>>> access.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Granted, there is a case where you don't access the name at all,
>>>>>>>>>>>>>> for example, if you just get the offsets and then send the
>>>>>>>>>>>>>> offsets as parameters to another processor/GPU/NPU/etc. In this
>>>>>>>>>>>>>> case you may see a gain IF there are more cache line misses in
>>>>>>>>>>>>>> reading the name than in reading the index. So, if the regular
>>>>>>>>>>>>>> part of the name that you're parsing is bigger than the cache
>>>>>>>>>>>>>> line (64 bytes?) and the name is to be processed by a different
>>>>>>>>>>>>>> processor, then you might see some performance gain in using the
>>>>>>>>>>>>>> index, but in all other circumstances I bet this is not the
>>>>>>>>>>>>>> case. I may be wrong, haven't actually tested it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is all to say, I don't think we should be designing the
>>>>>>>>>>>>>> protocol with only one architecture in mind. (The architecture
>>>>>>>>>>>>>> of sending the name to a different processor than the index.)
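The two parsing strategies being compared can be sketched as follows, assuming (purely for illustration, this is not the NDN or CCNx wire encoding) a flat layout where each component is a 1-byte length followed by that many bytes, and an index whose entry i is the offset just past component i:

```python
def parse_by_walking(name: bytes, x: int) -> list:
    """Sequential parse: (find size + move) x times."""
    comps, pos = [], 0
    for _ in range(x):
        length = name[pos]                        # 1-byte length (illustrative)
        comps.append(name[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return comps

def parse_by_index(name: bytes, index: list, x: int) -> list:
    """Index-based parse: index[i] is the offset just past component i,
    so component boundaries come from the index, not from walking TLs."""
    comps, start = [], 0
    for i in range(x):
        comps.append(name[start + 1 : index[i]])  # skip the length byte
        start = index[i]
    return comps
```

Both return the same components; the argument above is purely about how many cache lines each touches, which this sketch of course does not capture.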
>>>>>>>>>>>>>> If you have numbers that show that the index is faster, I would
>>>>>>>>>>>>>> like to see under what conditions and architectural assumptions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nacho
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I may have misinterpreted your description, so feel free to
>>>>>>>>>>>>>> correct me if I'm wrong.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Nacho (Ignacio) Solis
>>>>>>>>>>>>>> Protocol Architect
>>>>>>>>>>>>>> Principal Scientist
>>>>>>>>>>>>>> Palo Alto Research Center (PARC)
>>>>>>>>>>>>>> +1(650)812-4458
>>>>>>>>>>>>>> Ignacio.Solis at parc.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/18/14, 12:54 AM, "Massimo Gallo" wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Indeed, each component's offset must be encoded using a fixed
>>>>>>>>>>>>>> amount of bytes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i.e.,
>>>>>>>>>>>>>> Type = Offsets
>>>>>>>>>>>>>> Length = 10 Bytes
>>>>>>>>>>>>>> Value = Offset1(1byte), Offset2(1byte), ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You may also imagine having an "Offset_2byte" type if your name
>>>>>>>>>>>>>> is too long.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18/09/2014 09:27, Tai-Lin Chu wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you do not need the entire hierarchical structure (suppose
>>>>>>>>>>>>>> you only want the first x components) you can directly have it
>>>>>>>>>>>>>> using the offsets. With the nested TLV structure you have to
>>>>>>>>>>>>>> iteratively parse the first x-1 components. With the offset
>>>>>>>>>>>>>> structure you can directly access the first x components.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't get it. What you described only works if the "offset" is
>>>>>>>>>>>>>> encoded in fixed bytes.
With varNum, you will still need to parse x-1 offsets
>>>>>>>>>>>>>> to get to the xth offset.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 17, 2014 at 11:57 PM, Massimo Gallo wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 17/09/2014 14:56, Mark Stapp wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ah, thanks - that's helpful. I thought you were saying "I like
>>>>>>>>>>>>>> the existing NDN UTF8 'convention'." I'm still not sure I
>>>>>>>>>>>>>> understand what you _do_ prefer, though. It sounds like you're
>>>>>>>>>>>>>> describing an entirely different scheme where the info that
>>>>>>>>>>>>>> describes the name-components is ... someplace other than _in_
>>>>>>>>>>>>>> the name-components. Is that correct? When you say "field
>>>>>>>>>>>>>> separator", what do you mean (since that's not a "TL" from a
>>>>>>>>>>>>>> TLV)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Correct.
>>>>>>>>>>>>>> In particular, with our name encoding, a TLV indicates the name
>>>>>>>>>>>>>> hierarchy with offsets in the name, and other TLV(s) indicate
>>>>>>>>>>>>>> the offset to use in order to retrieve special components.
>>>>>>>>>>>>>> As for the field separator, it is something like "/". Aliasing
>>>>>>>>>>>>>> is avoided as you do not rely on field separators to parse the
>>>>>>>>>>>>>> name; you use the "offset TLV" to do that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So now, it may be an aesthetic question, but:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if you do not need the entire hierarchical structure (suppose
>>>>>>>>>>>>>> you only want the first x components) you can directly have it
>>>>>>>>>>>>>> using the offsets.
>>>>>>>>>>>>>> With the nested TLV structure you have to iteratively parse the
>>>>>>>>>>>>>> first x-1 components. With the offset structure you can directly
>>>>>>>>>>>>>> access the first x components.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Mark
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/17/14 6:02 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The why is simple:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You use a lot of "generic component type" and very few "specific
>>>>>>>>>>>>>> component type". You are imposing types for every component in
>>>>>>>>>>>>>> order to handle a few exceptions (segmentation, etc.). You
>>>>>>>>>>>>>> create a rule (specify the component's type) to handle
>>>>>>>>>>>>>> exceptions!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would prefer not to have typed components. Instead I would
>>>>>>>>>>>>>> prefer to have the name as a simple sequence of bytes with a
>>>>>>>>>>>>>> field separator. Then, outside the name, if you have some
>>>>>>>>>>>>>> components that could be used at the network layer (e.g. a TLV
>>>>>>>>>>>>>> field), you simply need something that indicates which is the
>>>>>>>>>>>>>> offset allowing you to retrieve the version, segment, etc. in
>>>>>>>>>>>>>> the name...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 16/09/2014 20:33, Mark Stapp wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/16/14 10:29 AM, Massimo Gallo wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we agree on the small number of "component types".
>>>>>>>>>>>>>> However, if you have a small number of types, you will end up
>>>>>>>>>>>>>> with names containing many generic component types and few
>>>>>>>>>>>>>> specific component types. Due to the fact that the component
>>>>>>>>>>>>>> type specification is an exception in the name, I would prefer
>>>>>>>>>>>>>> something that specifies the component's type only when needed
>>>>>>>>>>>>>> (something like the UTF8 conventions, but that applications MUST
>>>>>>>>>>>>>> use).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So ... I can't quite follow that. The thread has had some
>>>>>>>>>>>>>> explanation about why the UTF8 requirement has problems (with
>>>>>>>>>>>>>> aliasing, e.g.) and there's been email trying to explain that
>>>>>>>>>>>>>> applications don't have to use types if they don't need to. Your
>>>>>>>>>>>>>> email sounds like "I prefer the UTF8 convention", but it doesn't
>>>>>>>>>>>>>> say why you have that preference in the face of the points about
>>>>>>>>>>>>>> the problems. Can you say why it is that you express a
>>>>>>>>>>>>>> preference for the "convention" with problems?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Ndn-interest mailing list
>>>>>>>>>>>>>> Ndn-interest at lists.cs.ucla.edu
>>>>>>>>>>>>>> http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest
From felix at rabe.io Mon Sep 29 12:56:49 2014
From: felix at rabe.io (Felix Rabe)
Date: Mon, 29 Sep 2014 21:56:49 +0200
Subject: [Ndn-interest] Multiple names for lookup-by-content?
Message-ID: <5429B981.4040106@rabe.io>

I've just thought of something - sorry if this is a duplicate, I can't possibly completely follow what has been discussed before, so feel free to point me to earlier discussions (even if just 4 days old) of the same idea:

The idea of a lookup by content via its hash intrigues me. I've heard of a suggestion of including this hash as a special field, so routing can happen either by name (if content with such a hash is unknown) or by content hash.

I think of a content hash as yet another name. Why not include ... both? "/canonical/path/to/a/file" and "/hash-of-file"?

Their order could indicate precedence, so the first name would be matched wherever a router only looks at one name (I'm thinking of performance here), whereas multiple names could be supported and matched in special situations (like a distributed database that uses NDN as its transport), but would be optional to match.

- Felix

From shijunxiao at email.arizona.edu Mon Sep 29 14:46:22 2014
From: shijunxiao at email.arizona.edu (Junxiao Shi)
Date: Mon, 29 Sep 2014 14:46:22 -0700
Subject: [Ndn-interest] Multiple names for lookup-by-content?
In-Reply-To: <5429B981.4040106@rabe.io>
References: <5429B981.4040106@rabe.io>
Message-ID: 

Hi Felix

I had a similar idea: Content-Addressable NDN Repository <http://www.slideshare.net/yoursunny/carepo-final>. It gives each Data packet two Names: one hierarchical Name, and one hash Name. The hash is computed over the payload (Content); it's different from the implicit digest component, which is computed over the whole Data packet. My project was implemented as an NDNR repository, because I wanted to avoid changing the forwarding daemon at that time.
But it's also possible to have it in the forwarding pipelines and the ContentStore.

In my project, Data retrieval by hash Name is limited to the local area network only. The main reason that prevents this from being used in a wide-area network is routing scalability. Interests with hierarchical Names can be forwarded along routes installed by a routing protocol, but it's impractical to install routes for every hash Name, because there are too many of them.

Yours, Junxiao
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Ignacio.Solis at parc.com Mon Sep 29 15:46:29 2014
From: Ignacio.Solis at parc.com (Ignacio.Solis at parc.com)
Date: Mon, 29 Sep 2014 22:46:29 +0000
Subject: [Ndn-interest] Multiple names for lookup-by-content?
In-Reply-To: <5429B981.4040106@rabe.io>
References: <5429B981.4040106@rabe.io>
Message-ID: 

CCN 1.0 has a separate field in the interest for match-by-hash. It's called the ContentObjectHash. An interest that contains that field is said to have a ContentObjectHash restriction.

Forwarding of the interest happens on the regular name. Matching of the content object happens on the hash. Given an interest with a Name and a Hash, the system considers it a match to an object that either:
a- has the same Name and same Hash, OR
b- has no/empty Name and the same Hash.

This latter mode effectively is a hash of the content. The ContentObjectHash is a type of self-certified name. For CCN the hash is not used for forwarding, since we believe flat routing is too expensive.

Finally, at the last CCNxCon I presented a matching system with order of preference based on labels (which included hashes of content). You can find a video of my presentation at http://www.ccnx.org/events/ccnxcon-2013/ (video link = http://www.ccnx.org/video/CCNxCon2013/CCN-9-06-2013-pt3.mp4 , my presentation is at 21:50).
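The matching rule described above can be sketched as a simple predicate. This is a simplification that assumes the hash is SHA-256 over the payload; the field names, types, and hash scope here are illustrative, not the actual CCNx wire format:

```python
import hashlib

def matches(interest_name: str, interest_hash: bytes,
            obj_name: str, obj_payload: bytes) -> bool:
    """Sketch of a ContentObjectHash-restricted match, per the two
    cases above: the hash must agree, and the object must either
    carry the same Name or carry no/empty Name. (Illustrative only;
    hash scope and encoding differ in the real protocol.)"""
    obj_hash = hashlib.sha256(obj_payload).digest()
    if obj_hash != interest_hash:
        return False                      # hash restriction not satisfied
    return obj_name == interest_name or not obj_name
```

Case (b), the empty-Name match, is what makes the restriction act as a pure content hash: any object with the right payload digest satisfies it regardless of name.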
Nacho

--
Nacho (Ignacio) Solis
Protocol Architect
Principal Scientist
Palo Alto Research Center (PARC)
+1(650)812-4458
Ignacio.Solis at parc.com

On 9/29/14, 12:56 PM, "Felix Rabe" wrote:

>I've just thought of something - sorry if this is a duplicate, I can't
>possibly completely follow what has been discussed before, so feel free
>to point me to earlier discussions (even if just 4 days old) of the same
>idea:
>
>The idea of a lookup by content via its hash intrigues me. I've heard of
>a suggestion of including this hash as a special field, so routing can
>happen either by name (if content with such a hash is unknown) or by
>content hash.
>
>I think of a content hash as yet another name. Why not include ... both?
>"/canonical/path/to/a/file" and "/hash-of-file"?
>
>Their order could indicate precedence, so the first name would be
>matched wherever a router only looks at one name (I'm thinking of
>performance here), whereas multiple names could be supported and matched
>in special situations (like a distributed database that uses NDN as its
>transport), but are optional to match.
>
>- Felix
>_______________________________________________
>Ndn-interest mailing list
>Ndn-interest at lists.cs.ucla.edu
>http://www.lists.cs.ucla.edu/mailman/listinfo/ndn-interest