[Nfd-dev] Fwd: NFD Performance testing
Beichuan Zhang
bzhang at cs.arizona.edu
Tue Aug 5 13:54:55 PDT 2014
This was John DeHart’s performance profiling results back in April. It was NFD 0.1 tested on ONL.
Beichuan
Begin forwarded message:
> From: John DeHart <jdd at seas.wustl.edu>
> Subject: Re: NFD Performance testing
> Date: April 11, 2014 at 8:28:36 AM MST
> To: Alex Afanasyev <alexander.afanasyev at ucla.edu>
> Cc: Beichuan Zhang <bzhang at cs.arizona.edu>, Junxiao Shi <shijunxiao at email.arizona.edu>, Patrick Crowley <pcrowley at wustl.edu>, Haowei Yuan <hyuan at wustl.edu>, "Ben Abraham, Hila" <hila at wustl.edu>
>
>
> Alex,
>
> I just re-ran my tests for forwarding interests and content for unique names
> of short, medium and long lengths.
> Looks like performance improved between 8% and 16% depending on length.
> Before:
> short: 13700 Interests/sec
> medium: 9500 Interests/sec
> long: 4300 Interests/sec
>
> Now:
> short: 14900 Interests/sec
> medium: 10500 Interests/sec
> long: 5000 Interests/sec
>
> John
>
> On 4/10/14 8:08 PM, Alex Afanasyev wrote:
>> John, we just merged a commit that optimizes hash computation. When you have time, can you try to rerun some of the evaluations and check if we have better numbers (I hope).
>>
>> Commit f52dac7f03ac9ba996769cf620badeeb147b6d43 (current master) includes CS fix as well, so you can just get latest code from master branch of github or gerrit.
>>
>> Thanks!
>>
>> ---
>> Alex
>>
>> On Apr 10, 2014, at 12:37 PM, John DeHart <jdd at seas.wustl.edu> wrote:
>>
>>> Beichuan,
>>>
>>> Yes, I have the full gprof output available. I just didn't want to email that
>>> to everyone. There are some processing tools I have not tried out yet
>>> that might make things easier to read but for now, here is a sample:
>>>
>>> http://www.arl.wustl.edu/~jdd/NDN_GPROF_RESULTS/
>>>
>>> There you should see this file: gprof.out.no_content_long_names.txt
>>>
>>> If you search in there for "Call graph" you will get to the hierarchical view.
>>>
>>> If that is useful, I can put the rest of my files there also.
>>>
>>> John
>>>
>>>
>>>
>>>
>>> On 4/10/14 2:26 PM, Beichuan Zhang wrote:
>>>> Hi John,
>>>>
>>>> Thanks a lot! These are very informative and useful.
>>>>
>>>> To correlate these numbers with the code better, is it possible to get a hierarchical view of the functions and time? That'll make our analysis much easier. (Even the current flat form is already very useful.)
>>>>
>>>> Looking forward to the CS test results too.
>>>>
>>>> Beichuan
>>>>
>>>>
>>>> On Apr 10, 2014, at 12:08 PM, John DeHart <jdd at seas.wustl.edu> wrote:
>>>>
>>>>> All:
>>>>>
>>>>> Here is another set of data. This set takes content out of the picture.
>>>>> I don't run any content servers in these tests so the Interests are never fulfilled.
>>>>> I again vary the length of the names used to see that impact.
>>>>>
>>>>> Approximate rate nfd was able to handle:
>>>>> short: 37500 Interests/sec
>>>>> medium: 22400 Interests/sec
>>>>> long: 7100 Interests/sec
>>>>>
>>>>> For the short name case I find it interesting that nfd::Cs::find() still consumes 1.67% of
>>>>> our processing time. Remember there is no content ever returned so the CS should
>>>>> always be empty. Not that we want to optimize anything for the case of an empty
>>>>> CS, but that seems kind of high for a case where it doesn't have to do any searching.
>>>>>
>>>>> Again lots of usage for Block::Block(). For my amusement, for the long name case,
>>>>> I gathered all the usage of all the different Block::Block() signatures at the end of this
>>>>> note and tried to match them up to see which ones get used the most. Not sure it tells
>>>>> us anything but since I'm not familiar with the code I was curious.
>>>>>
>>>>> ndn::Block::Block(ndn::Block const&) on average gets invoked 970 per Interest
>>>>> for the long names.
>>>>>
>>>>> I've asked Jeff Burke's group for some actual names they use so I can see
>>>>> if my names are of reasonable lengths.
>>>>>
>>>>> Next I plan to do some tests where I load the CS and then bombard it with
>>>>> Interests that will always match something stored.
>>>>>
>>>>> Also, I should note that I am NOT seeing any signs of memory growth.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------------
>>>>>
>>>>> These tests are for an applied load of unique interests with no content returned.
>>>>> This should put a load on the PIT with no load on the CS.
>>>>>
>>>>> Two sets of tests were run.
>>>>> 1. Optimized nfd
>>>>> 2. Profiled nfd
>>>>>
>>>>> 1. Optimized nfd tests
>>>>> For these tests nfd was built with the standard default compilation options, defaultFlags = ['-O2', '-g', '-Wall']
>>>>>
>>>>> The following tests were run with 128 client all routing through one central nfd router. There are no
>>>>> servers to provide content for the supplied interests. There are hosts running nfd that the interests are routed to.
>>>>>
>>>>> 16 client hosts running 8 ndn-traffic processes and one nfd.
>>>>> 16 server hosts running one nfd.
>>>>> 1 router host running 1 nfd as the central router.
>>>>>
>>>>> Three test cases for name length:
>>>>> short: /example/000
>>>>> medium: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000
>>>>> long: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000
>>>>>
>>>>> 128 different base names: end of name ranged from 000 to 127.
>>>>> A sequence number is appended by ndn-traffic to each Interest to force every name to be unique.
>>>>>
>>>>> Applied load was approximately 42000 interests/sec.
>>>>>
>>>>> Approximate rate nfd was able to handle:
>>>>> short: 37500 Interests/sec
>>>>> medium: 22400 Interests/sec
>>>>> long: 7100 Interests/sec
>>>>>
>>>>> 2. Profiled nfd test
>>>>> In order to generate gprof output, nfd is built with profile enabled, defaultFlags = ['-O2', '-pg', '-g', '-Wall'].
>>>>> This obviously slows nfd down and the performance is not nearly what the optimized case shows. But what we
>>>>> are interested in here is what gprof can tell us about which functions are consuming time.
>>>>>
>>>>> The following tests were run with 128 client/server pairs all routing through one central nfd router.
>>>>>
>>>>> Tests were run for 2,000,000 Interests received by nfd. Counter in pit code added to trigger call
>>>>> to exit() so gmon.out could be generated.
>>>>>
>>>>> 16 client hosts running 8 ndn-traffic process and 1 nfd.
>>>>> 16 server hosts running 8 ndn-traffic-server process and 1 nfd.
>>>>> 1 router host running 1 nfd as the central router.
>>>>>
>>>>> Three test cases for name length:
>>>>> short: /example/000
>>>>> medium: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000
>>>>> long: /example/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/ABCDE/FGHIJ/KLMNO/PQRST/UVWXY/Z/000
>>>>>
>>>>> 128 different base names: end of name ranged from 000 to 127.
>>>>> A sequence number is appended by ndn-traffic to each Interest to force every name to be unique.
>>>>>
>>>>> Applied load was approximately 25500 interests/sec.
>>>>> In the short, medium and long test case, the central router nfd was not able to keep up with
>>>>> the applied load.
>>>>>
>>>>> The gprof data shown below is from the Flat profile given by gprof. I'm only showing the top consumers
>>>>> that consume at least 1% of the cpu time used.
>>>>>
>>>>> short:
>>>>> % cumulative self self total
>>>>> time seconds seconds calls s/call s/call name
>>>>> 19.76 5.68 5.68 741559024 0.00 0.00 boost::detail::shared_count::~shared_count()
>>>>> 13.24 9.49 3.81 150091648 0.00 0.00 ndn::Block::Block(ndn::Block const&)
>>>>> 7.39 11.61 2.13 156984996 0.00 0.00 ndn::Block::~Block()
>>>>> 4.21 12.82 1.21 1914454 0.00 0.00 nfd::NameTree::eraseEntryIfEmpty(boost::shared_ptr<nfd::name_tree::Entry>)
>>>>> 4.18 14.02 1.20 8001344 0.00 0.00 nfd::NameTree::insert(ndn::Name const&)
>>>>> 3.93 15.15 1.13 9919137 0.00 0.00 ndn::Name::toUri() const
>>>>> 2.99 16.01 0.86 2000193 0.00 0.00 ndn::Scheduler::schedulePeriodicEvent(boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::function<void ()> const&)
>>>>> 1.81 16.53 0.52 1916756 0.00 0.00 nfd::NameTree::findExactMatch(ndn::Name const&) const
>>>>> 1.79 17.05 0.52 5914452 0.00 0.00 nfd::NameTree::findLongestPrefixMatch(boost::shared_ptr<nfd::name_tree::Entry>, boost::function<bool (nfd::name_tree::Entry const&)> const&) const
>>>>> 1.67 17.53 0.48 1999999 0.00 0.00 nfd::Cs::find(ndn::Interest const&) const
>>>>> 1.67 18.01 0.48 1 0.48 27.91 boost::asio::detail::task_io_service::run(boost::system::error_code&)
>>>>> 1.46 18.43 0.42 2000129 0.00 0.00 nfd::NameTree::lookup(ndn::Name const&)
>>>>> 1.04 18.73 0.30 13828583 0.00 0.00 boost::detail::function::functor_manager<boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::Forwarder, boost::shared_ptr<nfd::pit::Entry> >, boost::_bi::list2<boost::_bi::value<nfd::Forwarder*>, boost::_bi::value<boost::shared_ptr<nfd::pit::Entry> > > > >::manage(boost::detail::function::function_buffer const&, boost::detail::function::function_buffer&, boost::detail::function::functor_manager_operation_type)
>>>>>
>>>>>
>>>>> medium:
>>>>> % cumulative self self total
>>>>> time seconds seconds calls s/call s/call name
>>>>> 20.50 8.60 8.60 1266365703 0.00 0.00 boost::detail::shared_count::~shared_count()
>>>>> 18.09 16.19 7.59 348085329 0.00 0.00 ndn::Block::Block(ndn::Block const&)
>>>>> 7.08 19.16 2.97 351410680 0.00 0.00 ndn::Block::~Block()
>>>>> 6.42 21.86 2.70 20001146 0.00 0.00 nfd::NameTree::insert(ndn::Name const&)
>>>>> 5.72 24.26 2.40 21952166 0.00 0.00 ndn::Name::toUri() const
>>>>> 3.13 25.57 1.32 21951129 0.00 0.00 nfd::name_tree::hashName(ndn::Name const&)
>>>>> 2.86 26.77 1.20 1947681 0.00 0.00 nfd::NameTree::eraseEntryIfEmpty(boost::shared_ptr<nfd::name_tree::Entry>)
>>>>> 2.29 27.73 0.96 2000129 0.00 0.00 nfd::NameTree::lookup(ndn::Name const&)
>>>>> 2.24 28.67 0.94 2000193 0.00 0.00 ndn::Scheduler::schedulePeriodicEvent(boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&, boost::function<void ()> const&)
>>>>> 1.99 29.51 0.84 5947679 0.00 0.00 nfd::NameTree::findLongestPrefixMatch(boost::shared_ptr<nfd::name_tree::Entry>, boost::function<bool (nfd::name_tree::Entry const&)> const&) const
>>>>> 1.50 30.14 0.63 1949983 0.00 0.00 nfd::NameTree::findExactMatch(ndn::Name const&) const
>>>>> 1.49 30.76 0.63 78018430 0.00 0.00 std::vector<ndn::Block, std::allocator<ndn::Block> >::_M_insert_aux(__gnu_cxx::__normal_iterator<ndn::Block*, std::vector<ndn::Block, std::allocator<ndn::Block> > >, ndn::Block const&)
>>>>>
>>>>>
>>>>> long:
>>>>> % cumulative self self total
>>>>> time seconds seconds calls ms/call ms/call name
>>>>> 25.38 26.82 26.82 1940055896 0.00 0.00 ndn::Block::Block(ndn::Block const&)
>>>>> 22.66 50.75 23.94 4280104948 0.00 0.00 boost::detail::shared_count::~shared_count()
>>>>> 8.42 59.65 8.90 57984812 0.00 0.00 ndn::Name::toUri() const
>>>>> 6.59 66.61 6.97 56000552 0.00 0.00 nfd::NameTree::insert(ndn::Name const&)
>>>>> 6.56 73.54 6.93 57983775 0.00 0.00 nfd::name_tree::hashName(ndn::Name const&)
>>>>> 6.48 80.39 6.85 1941813258 0.00 0.00 ndn::Block::~Block()
>>>>> 3.27 83.84 3.45 2000129 0.00 0.04 nfd::NameTree::lookup(ndn::Name const&)
>>>>> 2.40 86.37 2.54 282015065 0.00 0.00 std::vector<ndn::Block, std::allocator<ndn::Block> >::_M_insert_aux(__gnu_cxx::__normal_iterator<ndn::Block*, std::vector<ndn::Block, std::allocator<ndn::Block> > >, ndn::Block const&)
>>>>> 2.06 88.55 2.18 5980919 0.00 0.00 nfd::NameTree::findLongestPrefixMatch(boost::shared_ptr<nfd::name_tree::Entry>, boost::function<bool (nfd::name_tree::Entry const&)> const&) const
>>>>> 1.45 90.08 1.53 564030243 0.00 0.00 __tcf_1
>>>>> 1.00 91.14 1.06 1983223 0.00 0.00 nfd::NameTree::findExactMatch(ndn::Name const&) const
>>>>>
>>>>> Block::Block Analysis:
>>>>> Here are all the gprof listings for Block::Block from the flat profile for the long name case above.
>>>>> The numbers in parens (#) are my addition to try to match this with the actual code.
>>>>>
>>>>> 25.38 26.82 26.82 1940055896 0.00 0.00 (1) ndn::Block::Block(ndn::Block const&)
>>>>> 0.37 96.51 0.40 64016585 0.00 0.00 (2) ndn::Block::Block(boost::shared_ptr<ndn::Buffer const> const&, unsigned int, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&)
>>>>> 0.05 102.92 0.06 62072947 0.00 0.00 (3) ndn::Block::Block(unsigned int)
>>>>> 0.02 104.68 0.02 16006475 0.00 0.00 (4) ndn::Block::Block()
>>>>> 0.02 105.14 0.02 (5) ndn::Block::Block(std::istream&)
>>>>> 0.00 105.64 0.00 2457 0.00 0.00 (6) ndn::Block::Block(boost::shared_ptr<ndn::Buffer const> const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > > const&, bool)
>>>>> 0.00 105.64 0.00 1744 0.00 0.00 (7) ndn::Block::Block(unsigned int, boost::shared_ptr<ndn::Buffer const> const&)
>>>>> 0.00 105.64 0.00 1196 0.00 0.00 (8) ndn::Block::Block(boost::shared_ptr<ndn::Buffer const> const&)
>>>>> 0.00 105.64 0.00 517 0.00 0.00 (9) ndn::Block::Block(unsigned int, ndn::Block const&)
>>>>> 0.00 105.64 0.00 1 0.00 20.00 (10) ndn::Block::Block(unsigned char const*, unsigned long)
>>>>>
>>>>> And here are all the constructors' signatures from block.cpp:
>>>>> ndn-cpp-dev/src/encoding$ grep "Block::Block" block.cpp
>>>>> (4) Block::Block()
>>>>> (1) Block::Block(const EncodingBuffer& buffer)
>>>>> (2) Block::Block(const ConstBufferPtr& wire,
>>>>> uint32_t type,
>>>>> const Buffer::const_iterator& begin, const Buffer::const_iterator& end,
>>>>> const Buffer::const_iterator& valueBegin, const Buffer::const_iterator& valueEnd)
>>>>> (8) Block::Block(const ConstBufferPtr& buffer)
>>>>> (6) Block::Block(const ConstBufferPtr& buffer,
>>>>> const Buffer::const_iterator& begin, const Buffer::const_iterator& end,
>>>>> bool verifyLength/* = true*/)
>>>>> (5) Block::Block(std::istream& is)
>>>>> (10) Block::Block(const uint8_t* buffer, size_t maxlength)
>>>>> Block::Block(const void* bufferX, size_t maxlength)
>>>>> (3) Block::Block(uint32_t type)
>>>>> (7) Block::Block(uint32_t type, const ConstBufferPtr& value)
>>>>> (9) Block::Block(uint32_t type, const Block& value)
>>>>>
>>>>>
>>>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.lists.cs.ucla.edu/pipermail/nfd-dev/attachments/20140805/64998f98/attachment.html>
More information about the Nfd-dev
mailing list