[Ndn-interest] Proposed deprecation: byte offset segmenting

Thu Jan 31 17:20:10 PST 2019

Dear folks

> For on-line publishers, it could be used like the HTTP byte offset to fetch
> whatever byte range one is interested in.  Though I think that is mostly used
> to stripe requests between different replicas.  Maybe combining that with
> routing hints would be interesting?  Though one could do pretty much the
> same thing with segment numbers in that case.
>

Yes, a pair of byte offset segment markers/components is equivalent to the
HTTP Range request header in this case.
However, when I translated NFS into NDN, I chose to chunk every file at
4096 octets and use segment numbers in Data names, in order to obtain
caching benefits. See https://repository.arizona.edu/handle/10150/625652
section A.3.
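
To illustrate why fixed-size segments still serve arbitrary byte ranges
(and keep Data names cache-friendly), here is a minimal sketch, assuming a
fixed 4096-octet segment size. The function names are hypothetical, and
the shorter final segment of a file is not handled.

```python
SEGMENT_SIZE = 4096  # fixed chunk size, as in the NFS-over-NDN translation


def segments_for_range(first_byte, last_byte, segment_size=SEGMENT_SIZE):
    """Segment numbers covering the inclusive byte range [first_byte, last_byte],
    i.e. what an HTTP "Range: bytes=first_byte-last_byte" request would need."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return range(first_byte // segment_size, last_byte // segment_size + 1)


def range_slices(first_byte, last_byte, segment_size=SEGMENT_SIZE):
    """For each covering segment, return (segment number, start, end) so that
    payload[start:end] is the part of that segment inside the requested range."""
    slices = []
    for seg in segments_for_range(first_byte, last_byte, segment_size):
        seg_start = seg * segment_size
        start = max(first_byte, seg_start) - seg_start
        end = min(last_byte + 1, seg_start + segment_size) - seg_start
        slices.append((seg, start, end))
    return slices


if __name__ == "__main__":
    print(range_slices(5000, 9000))  # [(1, 904, 4096), (2, 0, 809)]
```

Because every client maps its range onto the same segment numbers, two
overlapping range requests share cached Data packets instead of producing
distinct byte-offset names.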

> A use case I can think of is segmenting content at non-uniform chunk sizes,
> e.g. boundaries determined by Rabin fingerprints. I had a related project,
> although it does not directly use byte offset segmenting:
> Content-Addressable NDN Repository https://github.com/yoursunny/carepo
> Retrieval of byte offset segmented content would require either
> stop-and-wait, or a manifest listing all segment boundaries to enable
> pipelining.
>

Davide asked me in person why I couldn't use segment numbers in this use
case. The reason I need byte offsets in names is to enable chunking in
parallel.
Rabin fingerprinting works by computing a (cheap) digest over a 31-byte
sliding window of the file content, and declaring a chunk boundary if the
digest ends with several zero bits. For example, requiring 12 zero bits
would make the average chunk size 4096 octets. Since this process is
probabilistic, the number of chunks from a given input is unknown before
performing this computation.
If I want to divide a large file among multiple machines and let each
machine handle a portion of the file, no machine other than the first one
would know which segment number it should use for the Data packets it
generates.
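
The sketch below illustrates why byte offsets allow parallel chunking. It
uses a Rabin-Karp style rolling polynomial hash as a stand-in for a true
Rabin fingerprint (which is computed over polynomials in GF(2)); the base,
modulus, and function names are assumptions, and real chunkers usually
also enforce minimum and maximum chunk sizes. Each worker reads the 31
bytes just before its region, reads past its region end only to find where
its last chunk stops, and names every chunk by its starting byte offset
without knowing how many chunks precede it.

```python
import os

WINDOW = 31                       # sliding-window width in bytes
MASK = (1 << 12) - 1              # 12 low zero bits -> ~4096-octet average chunks
BASE = 257                        # hypothetical polynomial base
MOD = (1 << 61) - 1               # hypothetical prime modulus (2^61 - 1)
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def cut_points(data, lo, hi):
    """Yield each cut position i in [lo, hi) whose preceding WINDOW bytes
    hash to a value with 12 low zero bits. A chunk ends just before i."""
    start, stop = max(lo, WINDOW), min(hi, len(data))
    if start >= stop:
        return
    h = 0
    for b in data[start - WINDOW:start]:  # the WINDOW bytes just before start
        h = (h * BASE + b) % MOD
    for i in range(start, stop):
        if h & MASK == 0:
            yield i
        # Slide the window by one byte: drop data[i - WINDOW], append data[i].
        h = ((h - data[i - WINDOW] * TOP) * BASE + data[i]) % MOD


def chunk_region(data, lo, hi):
    """Return (byte offset, payload) for every chunk that *starts* in [lo, hi).
    The last chunk may extend past hi, up to the next cut point or end of file."""
    starts = ([0] if lo == 0 else []) + list(cut_points(data, lo, hi))
    tail = next(cut_points(data, hi, len(data)), len(data))
    ends = starts[1:] + [tail]
    return [(s, data[s:e]) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    data = os.urandom(200_000)
    n = len(data)
    edges = [0, n // 4, n // 2, 3 * n // 4, n]  # four hypothetical "machines"
    parallel = [c for lo, hi in zip(edges, edges[1:])
                for c in chunk_region(data, lo, hi)]
    assert parallel == chunk_region(data, 0, n)  # same chunks as one pass
    print(len(parallel), "chunks; first offsets:",
          [off for off, _ in parallel[:3]])
```

The workers agree on chunk boundaries because each cut depends only on the
31 bytes before it, and they agree on names because a byte offset is known
locally; a consecutive segment number would require counting every chunk
produced by the earlier workers.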

Davide further suggested allowing non-consecutive segment numbers. This
solves the parallel chunking problem, but increases the manifest size.
When using byte offsets in names, the manifest only needs to contain each
segment's byte offset and SHA256 digest; the SHA256 digest is for
eliminating duplicate segments.
When using non-consecutive segment numbers in names, the manifest also
needs to contain the segment numbers. For a download to proceed
efficiently, the byte offsets are still necessary, so that the downloader
can seek to the correct position in the local file before writing a chunk.
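
A minimal sketch of the byte-offset manifest described above and a
downloader that uses it. ManifestEntry, segment_name, the "/off=" name
syntax, and the fetch callback are hypothetical illustrations, not an NDN
library API. With non-consecutive segment numbers, each entry would need
an additional segment-number field, while the offset field would still be
needed for the seek().

```python
import hashlib
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    offset: int    # byte offset of the segment; also used in its Data name
    digest: bytes  # SHA-256 of the segment payload, used to skip duplicates


def build_manifest(chunks):
    """chunks: iterable of (offset, payload) pairs, e.g. from a chunker."""
    return [ManifestEntry(off, hashlib.sha256(p).digest()) for off, p in chunks]


def segment_name(prefix, offset):
    # Hypothetical textual name, not the NDN naming-convention encoding.
    return f"{prefix}/off={offset}"


def download(manifest, prefix, fetch, out):
    """Fetch each distinct segment once and write it at its byte offset.
    Returns the number of segments actually fetched."""
    by_digest = {}
    for entry in manifest:
        payload = by_digest.get(entry.digest)
        if payload is None:  # first time this content is seen: fetch it
            payload = fetch(segment_name(prefix, entry.offset))
            if hashlib.sha256(payload).digest() != entry.digest:
                raise ValueError(f"digest mismatch at offset {entry.offset}")
            by_digest[entry.digest] = payload
        out.seek(entry.offset)  # the byte offset says where to write
        out.write(payload)
    return len(by_digest)


if __name__ == "__main__":
    chunks = [(0, b"hello "), (6, b"world "), (12, b"hello ")]
    store = {segment_name("/example/file", off): p for off, p in chunks}
    out = io.BytesIO()
    fetched = download(build_manifest(chunks), "/example/file",
                       store.__getitem__, out)
    print(out.getvalue(), fetched)  # b'hello world hello ' 2
```

Keeping duplicate payloads in memory is only for brevity; a real
downloader could instead copy the bytes it already wrote to the local file.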

Yours, Junxiao