<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">This was John DeHart’s performance profiling results back in April. It was NFD 0.1 tested on ONL.<div><br></div><div>Beichuan<br><div><br><div>Begin forwarded message:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>From: </b></span><span style="font-family:'Helvetica';">John DeHart <<a href="mailto:jdd@seas.wustl.edu">jdd@seas.wustl.edu</a>><br></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>Subject: </b></span><span style="font-family:'Helvetica';"><b>Re: NFD Performance testing</b><br></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>Date: </b></span><span style="font-family:'Helvetica';">April 11, 2014 at 8:28:36 AM MST<br></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>To: </b></span><span style="font-family:'Helvetica';">Alex Afanasyev <<a href="mailto:alexander.afanasyev@ucla.edu">alexander.afanasyev@ucla.edu</a>><br></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;"><span style="font-family:'Helvetica'; color:rgba(0, 0, 0, 1.0);"><b>Cc: </b></span><span style="font-family:'Helvetica';">Beichuan Zhang <<a href="mailto:bzhang@cs.arizona.edu">bzhang@cs.arizona.edu</a>>, Junxiao Shi <<a href="mailto:shijunxiao@email.arizona.edu">shijunxiao@email.arizona.edu</a>>, Patrick Crowley <<a href="mailto:pcrowley@wustl.edu">pcrowley@wustl.edu</a>>, Haowei Yuan <<a href="mailto:hyuan@wustl.edu">hyuan@wustl.edu</a>>, "Ben Abraham, Hila" <<a href="mailto:hila@wustl.edu">hila@wustl.edu</a>><br></span></div><br><div><br>Alex,<br><br>I just re-ran my tests for forwarding interests and content for unique names<br>of short, medium and long lengths.<br>Looks like performance improved between 8% and 16% depending on length.<br>Before:<br> short: 13700 Interests/sec<br> medium: 9500 Interests/sec<br> long: 4300 Interests/sec<br><br>Now:<br> short: 14900 Interests/sec<br> medium: 10500 Interests/sec<br> long: 5000 Interests/sec<br><br>John<br><br>On 4/10/14 8:08 PM, Alex Afanasyev wrote:<br><blockquote type="cite">John, we just merged a commit that optimizes hash computation. When you have time, can you try to rerun some of the evaluations and check if we have better numbers (I hope).<br><br>Commit f52dac7f03ac9ba996769cf620badeeb147b6d43 (current master) includes CS fix as well, so you can just get latest code from master branch of github or gerrit.<br><br>Thanks!<br><br>---<br>Alex<br><br>On Apr 10, 2014, at 12:37 PM, John DeHart <<a href="mailto:jdd@seas.wustl.edu">jdd@seas.wustl.edu</a>> wrote:<br><br><blockquote type="cite">Beichuan,<br><br>Yes, I have the full gprof output available. I just didn't want to email that<br>to everyone. There are some processing tools I have not tried out yet<br>that might make things easier to read but for now, here is a sample:<br><br><a href="http://www.arl.wustl.edu/~jdd/NDN_GPROF_RESULTS/">http://www.arl.wustl.edu/~jdd/NDN_GPROF_RESULTS/</a><br><br>There you should see this file: gprof.out.no_content_long_names.txt<br><br>If you search in there for "Call graph" you will get to the hierarchical view.<br><br>If that is useful, I can put the rest of my files there also.<br><br>John<br><br><br><br><br>On 4/10/14 2:26 PM, Beichuan Zhang wrote:<br><blockquote type="cite">Hi John,<br><br>Thanks a lot! These are very informative and useful.<br><br>To correlate these numbers with the code better, is it possible to get a hierarchical view of the functions and time? That'll make our analysis much easier. (Even the current flat form is already very useful.)<br><br>Looking forward to the CS test results too.<br><br>Beichuan<br><br><br>On Apr 10, 2014, at 12:08 PM, John DeHart <jdd@seas.wustl.edu> wrote:<br><br><blockquote type="cite">All:<br><br>Here is another set of data. This set takes content out of the picture.<br>I don't run any content servers in these tests so the Interests are never fulfilled.<br>I again vary the length of the names used to see that impact.<br><br> Approximate rate nfd was able to handle:<br> short: 37500 Interests/sec<br> medium: 22400 Interests/sec<br> long: 7100 Interests/sec<br><br>For the short name case I find it interesting that nfd::Cs::find() still consumes 1.67% of<br>our processing time. Remember there is no content ever returned so the CS should<br>always be empty. Not that we want to optimize anything for the case of an empty<br>CS, but that seems kind of high for a case where it doesn't have to do any searching.<br><br>Again lots of usage for Block::Block(). For my amusement, for the long name case,<br>I gathered all the usage of all the different Block::Block() signatures at the end of this<br>note and tried to match them up to see which ones get used the most. Not sure it tells<br>us anything but since I'm not familiar with the code I was curious.<br><br>ndn::Block::Block(ndn::Block const&) on average gets invoked 970 per Interest<br>for the long names.<br><br>I've asked Jeff Burke's group for some actual names they use so I can see<br>if my names are of reasonable lengths.<br><br>Next I plan to do some tests where I load the CS and then bombard it with<br>Interests that will always match something stored.<br><br>Also, I should note that I am NOT seeing any signs of memory growth.<br><br>John<br><br><br>--------------------------------------------------------------------------------<br><br>These tests are for an applied load of unique interests with no content returned.<br>This should put a load on the PIT with no load on the CS.<br><br>Two sets of tests were run.<br>1. Optimized nfd<br>2. Profiled nfd<br><br>1. Optimized nfd tests<br> For these tests nfd was built with the standard default compilation options, defaultFlags = ['-O2', '-g', '-Wall']<br><br> The following tests were run with 128 client all routing through one central nfd router. There are no<br> servers to provide content for the supplied interests. There are hosts running nfd that the interests are routed to.<br><br> 16 client hosts running 8 ndn-traffic processes and one nfd.<br> 16 server hosts running one nfd.<br> 1 router host running 1 nfd as the central router.<br><br> Three test cases for name length:<br> short: /example/000<br> medium: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000<br> long: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000<br><br> 128 different base names: end of name ranged from 000 to 127.<br> A sequence number is appended by ndn-traffic to each Interest to force every name to be unique.<br><br> Applied load was approximately 42000 interests/sec.<br><br> Approximate rate nfd was able to handle:<br> short: 37500 Interests/sec<br> medium: 22400 Interests/sec<br> long: 7100 Interests/sec<br><br>2. Profiled nfd test<br> In order to generate gprof output, nfd is built with profile enabled, defaultFlags = ['-O2', '-pg', '-g', '-Wall'].<br> This obviously slows nfd down and the performance is not nearly what the optimized case shows. But what we<br> are interested in here is what gprof can tell us about which functions are consuming time.<br><br> The following tests were run with 128 client/server pairs all routing through one central nfd router.<br><br> Tests were run for 2,000,000 Interests received by nfd. Counter in pit code added to trigger call<br> to exit() so gmon.out could be generated.<br><br> 16 client hosts running 8 ndn-traffic process and 1 nfd.<br> 16 server hosts running 8 ndn-traffic-server process and 1 nfd.<br> 1 router host running 1 nfd as the central router.<br><br> Three test cases for name length:<br> short: /example/000<br> medium: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000<br> long: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000<br><br> 128 different base names: end of name ranged from 000 to 127.<br> A sequence number is appended by ndn-traffic to each Interest to force every name to be unique.<br><br> Applied load was approximately 25500 interests/sec.<br> In the short, medium and long test case, the central router nfd was not able to keep up with<br> the applied load.<br><br> The gprof data shown below is from the Flat profile given by gprof. I'm only showing the top consumers<br> that consume at least 1% of the cpu time used.<br><br>short:<br> % cumulative self self total<br>time seconds seconds calls s/call s/call name<br>19.76 5.68 5.68 741559024 0.00 0.00 boost::detail::shared_count::~shared_count()<br>13.24 9.49 3.81 150091648 0.00 0.00 ndn::Block::Block(ndn::Block const&)<br> 7.39 11.61 2.13 156984996 0.00 0.00 ndn::Block::~Block()<br> 4.21 12.82 1.21 1914454 0.00 0.00 nfd::NameTree::eraseEntryIfEmpty(boost::shared_ptr<nfd::name_tree::Entry>)<br> 4.18 14.02 1.20 8001344 0.00 0.00 nfd::NameTree::insert(ndn::Name const&)<br> 3.93 15.15 1.13 9919137 0.00 0.00 ndn::Name::toUri() const<br> 2.99 16.01 0.86 2000193 0.00 0.00 ndn::Scheduler::schedulePeriodicEvent(boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::function<void ()> const&)<br> 1.81 16.53 0.52 1916756 0.00 0.00 nfd::NameTree::findExactMatch(ndn::Name const&) const<br> 1.79 17.05 0.52 5914452 0.00 0.00 nfd::NameTree::findLongestPrefixMatch(boost::shared_ptr<nfd::name_tree::Entry>, boost::function<bool (nfd::name_tree::Entry const&)> const&) const<br> 1.67 17.53 0.48 1999999 0.00 0.00 nfd::Cs::find(ndn::Interest const&) const<br> 1.67 18.01 0.48 1 0.48 27.91 boost::asio::detail::task_io_service::run(boost::system::error_code&)<br> 1.46 18.43 0.42 2000129 0.00 0.00 nfd::NameTree::lookup(ndn::Name const&)<br> 1.04 18.73 0.30 13828583 0.00 0.00 boost::detail::function::functor_manager<boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::Forwarder, boost::shared_ptr<nfd::pit::Entry> >, boost::_bi::list2<boost::_bi::value<nfd::Forwarder*>, boost::_bi::value<boost::shared_ptr<nfd::pit::Entry> > > > >::manage(boost::detail::function::function_buffer const&, boost::detail::function::function_buffer&, boost::detail::function::functor_manager_operation_type)<br><br><br>medium:<br> % cumulative self self total<br>time seconds seconds calls s/call s/call name<br>20.50 8.60 8.60 1266365703 0.00 0.00 boost::detail::shared_count::~shared_count()<br>18.09 16.19 7.59 348085329 0.00 0.00 ndn::Block::Block(ndn::Block const&)<br> 7.08 19.16 2.97 351410680 0.00 0.00 ndn::Block::~Block()<br> 6.42 21.86 2.70 20001146 0.00 0.00 nfd::NameTree::insert(ndn::Name const&)<br> 5.72 24.26 2.40 21952166 0.00 0.00 ndn::Name::toUri() const<br> 3.13 25.57 1.32 21951129 0.00 0.00 nfd::name_tree::hashName(ndn::Name const&)<br> 2.86 26.77 1.20 1947681 0.00 0.00 nfd::NameTree::eraseEntryIfEmpty(boost::shared_ptr<nfd::name_tree::Entry>)<br> 2.29 27.73 0.96 2000129 0.00 0.00 nfd::NameTree::lookup(ndn::Name const&)<br> 2.24 28.67 0.94 2000193 0.00 0.00 ndn::Scheduler::schedulePeriodicEvent(boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::function<void ()> const&)<br> 1.99 29.51 0.84 5947679 0.00 0.00 nfd::NameTree::findLongestPrefixMatch(boost::shared_ptr<nfd::name_tree::Entry>, boost::function<bool (nfd::name_tree::Entry const&)> const&) const<br> 1.50 30.14 0.63 1949983 0.00 0.00 nfd::NameTree::findExactMatch(ndn::Name const&) const<br> 1.49 30.76 0.63 78018430 0.00 0.00 std::vector<ndn::Block, std::allocator<ndn::Block> >::_M_insert_aux(__gnu_cxx::__normal_iterator<ndn::Block*, std::vector<ndn::Block, std::allocator<ndn::Block> > >, ndn::Block const&)<br><br><br>long:<br> % cumulative self self total<br>time seconds seconds calls ms/call ms/call name<br>25.38 26.82 26.82 1940055896 0.00 0.00 ndn::Block::Block(ndn::Block const&)<br>22.66 50.75 23.94 4280104948 0.00 0.00 boost::detail::shared_count::~shared_count()<br> 8.42 59.65 8.90 57984812 0.00 0.00 ndn::Name::toUri() const<br> 6.59 66.61 6.97 56000552 0.00 0.00 nfd::NameTree::insert(ndn::Name const&)<br> 6.56 73.54 6.93 57983775 0.00 0.00 nfd::name_tree::hashName(ndn::Name const&)<br> 6.48 80.39 6.85 1941813258 0.00 0.00 ndn::Block::~Block()<br> 3.27 83.84 3.45 2000129 0.00 0.04 nfd::NameTree::lookup(ndn::Name const&)<br> 2.40 86.37 2.54 282015065 0.00 0.00 std::vector<ndn::Block, std::allocator<ndn::Block> >::_M_insert_aux(__gnu_cxx::__normal_iterator<ndn::Block*, std::vector<ndn::Block, std::allocator<ndn::Block> > >, ndn::Block const&)<br> 2.06 88.55 2.18 5980919 0.00 0.00 nfd::NameTree::findLongestPrefixMatch(boost::shared_ptr<nfd::name_tree::Entry>, boost::function<bool (nfd::name_tree::Entry const&)> const&) const<br> 1.45 90.08 1.53 564030243 0.00 0.00 __tcf_1<br> 1.00 91.14 1.06 1983223 0.00 0.00 nfd::NameTree::findExactMatch(ndn::Name const&) const<br><br>Block::Block Analysis:<br>Here are all the gprof listings for Block::Block from the flat profile for the long name case above.<br>The numbers in parens (#) are my addition to try to match this with the actual code.<br><br>25.38 26.82 26.82 1940055896 0.00 0.00 (1) ndn::Block::Block(ndn::Block const&)<br> 0.37 96.51 0.40 64016585 0.00 0.00 (2) ndn::Block::Block(boost::shared_ptr<ndn::Buffer const> const&, unsigned int, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&)<br> 0.05 102.92 0.06 62072947 0.00 0.00 (3) ndn::Block::Block(unsigned int)<br> 0.02 104.68 0.02 16006475 0.00 0.00 (4) ndn::Block::Block()<br> 0.02 105.14 0.02 (5) ndn::Block::Block(std::istream&)<br> 0.00 105.64 0.00 2457 0.00 0.00 (6) ndn::Block::Block(boost::shared_ptr<ndn::Buffer const> const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, bool)<br> 0.00 105.64 0.00 1744 0.00 0.00 (7) ndn::Block::Block(unsigned int, boost::shared_ptr<ndn::Buffer const> const&)<br> 0.00 105.64 0.00 1196 0.00 0.00 (8) ndn::Block::Block(boost::shared_ptr<ndn::Buffer const> const&)<br> 0.00 105.64 0.00 517 0.00 0.00 (9) ndn::Block::Block(unsigned int, ndn::Block const&)<br> 0.00 105.64 0.00 1 0.00 20.00 (10) ndn::Block::Block(unsigned char const*, unsigned long)<br><br>And here are all the constructors' signatures from block.cpp:<br>ndn-cpp-dev/src/encoding$ grep "Block::Block" block.cpp<br>(4) Block::Block()<br>(1) Block::Block(const EncodingBuffer& buffer)<br>(2) Block::Block(const ConstBufferPtr& wire,<br> uint32_t type,<br> const Buffer::const_iterator& begin, const Buffer::const_iterator& end,<br> const Buffer::const_iterator& valueBegin, const Buffer::const_iterator& valueEnd)<br>(8) Block::Block(const ConstBufferPtr& buffer)<br>(6) Block::Block(const ConstBufferPtr& buffer,<br> const Buffer::const_iterator& begin, const Buffer::const_iterator& end,<br> bool verifyLength/* = true*/)<br>(5) Block::Block(std::istream& is)<br>(10) Block::Block(const uint8_t* buffer, size_t maxlength)<br> Block::Block(const void* bufferX, size_t maxlength)<br>(3) Block::Block(uint32_t type)<br>(7) Block::Block(uint32_t type, const ConstBufferPtr& value)<br>(9) Block::Block(uint32_t type, const Block& value)<br><br><br><br></blockquote></blockquote></blockquote></blockquote><br></div></blockquote></div><br></div></body></html>