[ndnSIM] Simulation terminated with signal SIGSEGV - possible bug

Thiago Teixeira tteixeira at umass.edu
Mon May 21 07:45:30 PDT 2018


Hi John,

Please find my answers inline below (they were marked in red in the original HTML message). As Junxiao pointed out, we are trying to run our scenario with more memory (32 GB).
We will let you know how it goes.

Thanks,
Thiago


From: John Baugh [mailto:jpbaugh at umich.edu]
Sent: Saturday, May 19, 2018 5:38 AM
To: Thiago Teixeira <tteixeira at umass.edu>
Cc: Junxiao Shi <shijunxiao at email.arizona.edu>; ndnsim <ndnsim at lists.cs.ucla.edu>; Rajvardhan <rdeshmukh at umass.edu>
Subject: Re: [ndnSIM] Simulation terminated with signal SIGSEGV - possible bug

Thiago,

I keep coming back to this when I get free time, and it's still a bit perplexing.  Is the ndn-debug-scenario.cc that you have at https://gist.github.com/thiteixeira/8c4fd1deb884b548d1f071ebb1bee043#file-valgrind-leak-check-full-txt up to date?

Yes, that is the code we are using to debug this issue, because it has only a few modifications from the ndn-simple-wifi scenario.

Valgrind points to line 195 as the culprit, but that line only sets a prefix (producerHelper.SetPrefix(ndnPrefix);).

Some things we can at least look into:


  1.  What version of the C standard library is installed on your system? (ldd --version)
$ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu10) 2.23

  2.  When you said "ndnSIM 2.4 didn't work for us", did you mean you couldn't install ndnSIM 2.4, or you did install it and it still doesn't work?
We were able to install ndnSIM 2.4 by cloning ndnSIM 2.5, checking out the ndnSIM 2.4 tag, and running "git submodule update --init". However, the code in that Gist only runs for a few seconds (Raj has more memory, and his simulation ran a bit longer).

  3.  Deep in the belly of ns-3 (https://www.nsnam.org/doxygen/simple-ref-count_8h_source.html#l00105), there appears to be an m_count variable, but it's an unsigned 32-bit integer, not 64-bit. So I'm wondering whether the number of references it's tracking exceeds 4,294,967,295.
  4.  Alternatively, the problem most clearly (unclearly?) says:

      Address 0x8 is not stack'd, malloc'd or (recently) free'd

and

      ==1647== Process terminating with default action of signal 11 (SIGSEGV)
      ==1647==  Access not within mapped region at address 0x8


So I'm wondering where this 0x8 address is being assigned or accessed. 0x8 doesn't seem like a likely address for an allocation, so a pointer is being set to the value 8 somewhere. I don't see you doing that anywhere, so I'm wondering whether there's a buffer overflow somewhere.


Thanks,

John

On Wed, May 16, 2018 at 5:10 PM, Thiago Teixeira <tteixeira at umass.edu> wrote:
Hi John,

I added more memory when I upgraded to the latest Valgrind version. This test was run with 4 GB of memory on the VM. My colleague also ran the same test on a VM with 8 GB, with the same result.

Is there any other test that we can run? (ndnSIM 2.4 didn’t work for us)

Thanks,
Thiago


From: John Baugh [mailto:jpbaugh at umich.edu]
Sent: Wednesday, May 16, 2018 3:53 PM
To: Thiago Teixeira <tteixeira at umass.edu>
Cc: Junxiao Shi <shijunxiao at email.arizona.edu>; ndnsim <ndnsim at lists.cs.ucla.edu>
Subject: Re: [ndnSIM] Simulation terminated with signal SIGSEGV - possible bug

Thiago,

Based on the Valgrind output, my best estimate is this: your simulation is using almost 900 MB of memory, and your system only has 2.0 GB. I think that by the time it reaches roughly 2,000 seconds, you are simply running out of memory, which leads to the SIGSEGV. You'll probably need to add memory to this system or use a more powerful one. Under HEAP SUMMARY, it says

  in use at exit: 897,122,422 bytes in 7,147,817 blocks

==1647==   total heap usage: 40,377,629,822 allocs, 40,370,482,008 frees, 4,349,761,010,258 bytes allocated

I assume with the OS, other apps running, and the simulator, it's just too much for the device you're on.

Thanks!

John

On Wed, May 16, 2018 at 10:18 AM, Thiago Teixeira <tteixeira at umass.edu> wrote:
Hi John,

The Valgrind (version 3.13) run has finished. Please see the attached output. I also posted it at https://umass.box.com/s/s9qql8vo1kjy170qm8ijxcm6181ioip4 for future reference.

Please let me know if there's something else I can do.

Best,
Thiago


From: John Baugh [mailto:jpbaugh at umich.edu]
Sent: Monday, May 7, 2018 4:02 AM
To: Thiago Teixeira <tteixeira at umass.edu>
Cc: Junxiao Shi <shijunxiao at email.arizona.edu>; ndnsim <ndnsim at lists.cs.ucla.edu>
Subject: Re: [ndnSIM] Simulation terminated with signal SIGSEGV - possible bug

Thiago,

I am sorry to ask you to do this, but could you upgrade Valgrind and run it again, if possible? I've been looking into your issue, and I found a few reports saying that Valgrind 3.11.0 doesn't recognize an instruction used inside random_device::_M_getval(), so in my estimation Valgrind is exiting before it finds the actual problem. _M_getval() is called several levels deep from some of the ndnSIM code, and because Valgrind can't handle it, Valgrind terminates without giving much useful information.

Thanks,

John



On Sun, May 6, 2018 at 12:58 PM, Thiago Teixeira <tteixeira at umass.edu> wrote:
Hi John,

Please find the Valgrind output below. I ran Valgrind with both --leak-check=yes and --leak-check=full:

https://gist.github.com/thiteixeira/8c4fd1deb884b548d1f071ebb1bee043#file-valgrind-leak-check-full-txt

https://gist.github.com/thiteixeira/8c4fd1deb884b548d1f071ebb1bee043#file-valgrind-leak-check-yes-txt


Thanks,
TT

From: John Baugh [mailto:jpbaugh at umich.edu]
Sent: Sunday, May 6, 2018 8:40 AM
To: Thiago Teixeira <tteixeira at umass.edu>
Cc: Junxiao Shi <shijunxiao at email.arizona.edu>; ndnsim <ndnsim at lists.cs.ucla.edu>
Subject: Re: [ndnSIM] Simulation terminated with signal SIGSEGV - possible bug

Looks like a heap corruption of some sort.

Can we try Valgrind?

This may help: http://www.lists.cs.ucla.edu/pipermail/ndnsim/2017-July/003991.html

Thanks

John

On Sun, May 6, 2018, 7:35 AM Thiago Teixeira <tteixeira at umass.edu> wrote:
Hi,

Thanks for your answers. I ran the simulation again and posted the full GDB backtrace (bt full) on the Gist (https://gist.github.com/thiteixeira/8c4fd1deb884b548d1f071ebb1bee043#file-gbd_bt_full_output-txt).

@John, I will try ndnSIM 2.4, thanks.

Best,
Thiago

From: John Baugh [mailto:jpbaugh at umich.edu]
Sent: Saturday, May 5, 2018 6:16 PM
To: Thiago Teixeira <tteixeira at umass.edu>
Cc: ndnsim at lists.cs.ucla.edu
Subject: Re: [ndnSIM] Simulation terminated with signal SIGSEGV - possible bug

Thiago,

I was able to run your scenario in ndnSIM 2.4, so I suspect this is either a bug introduced in the simulator after 2.4 or something not correctly configured in your environment.

I can't reproduce the error, so as Dr. Shi suggested, it would be useful to see your GDB and/or Valgrind output.

Thanks!

John

On Sat, May 5, 2018 at 9:23 AM, Thiago Teixeira <tteixeira at umass.edu> wrote:
Hi all,

We have a scenario where nodes have two interfaces, one wireless and one wired. Nodes are positioned in a grid, and the producer is located at the center of the grid.
We run this scenario for 4,000 seconds, but the simulation ends at 2,099 seconds with the following error:
    Command ['/home/vagrant/ndnSIM/ns-3/build/scratch/ndn-debug-scenario'] terminated with signal SIGSEGV. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").

Running with GDB didn’t offer any other insights. Here are the steps to reproduce the issue:

#### Expected behavior
Simulations run until the time specified by Simulator::Stop(Seconds(4000.0));

#### Actual behavior
Simulations crash (see error message below) before Simulator::Stop(Seconds(4000.0));

`Command ['/home/vagrant/ndnSIM/ns-3/build/scratch/ndn-debug-scenario'] terminated with signal SIGSEGV. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").`

#### Code to reproduce the problem
See Gist:
https://gist.github.com/thiteixeira/8c4fd1deb884b548d1f071ebb1bee043

#### ndnSIM version
ndnSIM-2.5-2-ge674a01

#### Operating system and version
Ubuntu 16.04.4 LTS
memory size: 2000MiB
1 cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz

#### Other relevant information
Compiled ndnSIM in optimized mode:
./waf configure -d optimized

Increasing the number of nodes makes the simulation crash at an earlier time.

_______________________________________________
ndnSIM mailing list
ndnSIM at lists.cs.ucla.edu
http://www.lists.cs.ucla.edu/mailman/listinfo/ndnsim



