
Zone Crashing Issue


KingMort
06-04-2009, 09:45 PM
Having major zone crashing issues...

This is the only info I can get from the core dump:


Program terminated with signal 11, Segmentation fault.
#0 0x00002ba1d815e012 in __gnu_cxx::__exchange_and_add ()
from /usr/lib/libstdc++.so.6


Any ideas??

gaeorn
06-05-2009, 01:31 AM
I just started working on this tonight. I'm pretty sure it's a SoF problem with a 64bit server. I just obtained SoF tonight and immediately ran into this on my 64bit testing server. I'll let you know when I know more.

KLS
06-05-2009, 02:51 AM
SoF uses some stringstreams instead of pure char buffers, which might be a place to look. I can't think of anything else that differs between it and Titanium.

gaeorn
06-05-2009, 03:13 AM
It's segfaulting at the setup of the stringstream in SoF.cpp line 2069. I only know that's the exact point it fails because I set a breakpoint before it and was only able to step to that point.

(gdb) bt
#0 0x00000033b10b7712 in __gnu_cxx::__exchange_and_add (__mem=0x0, __val=-1) at atomicity.cc:41
#1 0x00000033b105ca88 in std::locale::operator= (this=0x7fff0bcad634, __other=@0x7fff0bcad450)
at /usr/src/debug/gcc-4.1.2-20070925/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/locale_classes.h:513
#2 0x00000033b105b8b2 in std::ios_base::_M_init (this=<value optimized out>) at ../../../../libstdc++-v3/src/ios_locale.cc:48
#3 0x00000033b106ea49 in std::basic_ios<char, std::char_traits<char> >::init (this=0x0, __sb=0xffffffff)
at /usr/src/debug/gcc-4.1.2-20070925/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_ios.tcc:142
#4 0x000000000068ef26 in SoF::SerializeItem (inst=0x2fe8a10, slot_id_in=0, length=0x7fff0bcadaa4, depth=0 '\0') at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/sstream:525
#5 0x00000000006901e9 in SoF::Strategy::Encode_OP_CharInventory (p=<value optimized out>, dest=0x2abca80, ack_req=true) at ../common/patches/SoF.cpp:974
#6 0x0000000000690b9e in StructStrategy::Encode (this=0x33b12efd00, p=0x0, dest=0xffffffff, ack_req=false) at ../common/StructStrategy.cpp:22
#7 0x000000000067a64b in EQStreamProxy::FastQueuePacket (this=<value optimized out>, p=0xffffffff, ack_req=false) at ../common/EQStreamProxy.cpp:36
#8 0x000000000067a50a in EQStreamProxy::QueuePacket (this=0x2fde100, p=<value optimized out>, ack_req=true) at ../common/EQStreamProxy.cpp:30
#9 0x00000000004a5547 in Client::QueuePacket (this=<value optimized out>, app=0xffffffff, ack_req=136, required_state=Mob::CLIENT_CONNECTING, filter=2972646656) at client.cpp:667
#10 0x00000000004b70f5 in Client::BulkSendInventoryItems (this=0x2fea700) at client_process.cpp:783
#11 0x0000000000589e36 in Client::FinishConnState2 (this=0x2fea700, dbaw=<value optimized out>) at client_packet.cpp:7596
#12 0x000000000058a0ac in Client::DBAWComplete (this=0x0, workpt_b1=<value optimized out>, dbaw=0x33a2c10288) at client_packet.cpp:6997
#13 0x0000000000504ff4 in DispatchFinishedDBAsync (dbaw=0x2fd9100) at zonedbasync.cpp:44
#14 0x00000000004c0ba9 in main (argc=<value optimized out>, argv=<value optimized out>) at net.cpp:536
(gdb)


Unfortunately, I really don't know C++ all that well. I know C well enough, but I kinda have to muddle through the C++ stuff (lots of google searches, lol).

I'll keep trying but I really don't have a clue why it is failing at this time.

KingMort
06-05-2009, 11:39 AM
Come on VZ / TZ hook us up with the answers already!!!

They must have run into this stuff recently also moving to a 64 bit system...

King

gaeorn
06-05-2009, 05:04 PM
Temporary workaround:

I built 32bit binaries on a 32bit arch machine. I then copied them to my 64bit machine and ran them and zone did not crash when using SoF.

Continuing investigation into the problem:

I have had some experts in C++ look into this and as best they can tell, this is either a library or compiler bug under 64bit. If you look at this line in the backtrace:

#3 0x00000033b106ea49 in std::basic_ios<char, std::char_traits<char> >::init (this=0x0, __sb=0xffffffff)

notice the 'this=0x0' which means the reference got lost somewhere.

For a test, I am updating my compiler and libraries. I'll rebuild after that is done and see if the problem is still present.

gaeorn
06-06-2009, 01:48 AM
Updated the gcc packages to 4.3.2 and now I get:

#0 0x00007f3a033dddd2 in std::locale::operator= (this=0x7fff0b6a5988, __other=@0x7fff0b6a57b0)
at /usr/src/debug/gcc-4.3.2-20081105/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/ext/atomicity.h:51
#1 0x00007f3a033dc932 in std::ios_base::_M_init (this=<value optimized out>) at ../../../../libstdc++-v3/src/ios_locale.cc:48
#2 0x00007f3a033f0289 in std::basic_ios<char, std::char_traits<char> >::init (this=0x7fff0b6a5988, __sb=0x7fff0b6a57b0)
at /usr/src/debug/gcc-4.3.2-20081105/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_ios.tcc:133
#3 0x0000000000693967 in SoF::SerializeItem (inst=0x1866a50, slot_id_in=0, length=0x7fff0b6a5df4, depth=0 '\0') at /usr/lib/gcc/x86_64-redhat-linux/4.3.2/../../../../include/c++/4.3.2/istream:587
#4 0x0000000000694cf7 in SoF::Strategy::Encode_OP_CharInventory (p=<value optimized out>, dest=0x1868940, ack_req=true) at ../common/patches/SoF.cpp:974
#5 0x0000000000694fec in StructStrategy::Encode (this=0x7f3a03673f00, p=0x7fff0b6a5988, dest=0x7fff0b6a57b0, ack_req=136) at ../common/StructStrategy.cpp:22
#6 0x000000000067efb7 in EQStreamProxy::FastQueuePacket (this=<value optimized out>, p=0x7fff0b6a57b0, ack_req=false) at ../common/EQStreamProxy.cpp:36
#7 0x000000000067ee78 in EQStreamProxy::QueuePacket (this=0x185bec0, p=<value optimized out>, ack_req=136) at ../common/EQStreamProxy.cpp:30
#8 0x00000000004a9a8f in Client::QueuePacket (this=<value optimized out>, app=0x7fff0b6a57b0, ack_req=136, required_state=Mob::CLIENT_CONNECTING, filter=57097984) at client.cpp:667
#9 0x00000000004b81a9 in Client::BulkSendInventoryItems (this=0x186b3d0) at client_process.cpp:783
#10 0x000000000058baf0 in Client::FinishConnState2 (this=0x186b3d0, dbaw=<value optimized out>) at client_packet.cpp:7602
#11 0x0000000000596b2d in Client::DBAWComplete (this=0x7fff0b6a5988, workpt_b1=<value optimized out>, dbaw=0x33a2c10288) at client_packet.cpp:7003
#12 0x000000000050bd67 in DispatchFinishedDBAsync (dbaw=0x18576d0) at zonedbasync.cpp:44
#13 0x00000000004c3fe7 in main (argc=<value optimized out>, argv=<value optimized out>) at net.cpp:536


Similar, but at least now I don't see the null 'this' that showed up in frame #3 of the earlier backtrace. Going to run under valgrind to see if anything funky is going on.

gaeorn
06-06-2009, 04:39 AM
Ok, it's a gcc bug with optimization. If you remove the -O option from CFLAGS when building common/patches/SoF.o, it should work. Everything else can still use the -O flag. I'm going to narrow down the specific optimization that is the cause and then I will create patches for the makefiles.

gaeorn
06-06-2009, 05:27 AM
Here is the patch I'm using to resolve this problem:

Index: world/makefile
===================================================================
--- world/makefile (revision 27)
+++ world/makefile (revision 28)
@@ -21,6 +21,13 @@
MYSQL_FLAGS=$(shell mysql_config --cflags)
MYSQL_LIB=$(shell mysql_config --libs)

+SOFCOPTS=$(WFLAGS) -g -pthread -pipe -I../common/SocketLib \
+ -fauto-inc-dec -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse \
+ -fguess-branch-probability -fif-conversion2 -fif-conversion -finline-small-functions \
+ -fipa-pure-const -fipa-reference -fmerge-constants -fsplit-wide-types -ftree-ccp \
+ -ftree-ch -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre \
+ -ftree-sra -ftree-ter -funit-at-a-time -fomit-frame-pointer \
+ -DFX -D_GNU_SOURCE -DINVERSEXY -DWORLD $(DFLAGS) $(MYSQL_FLAGS) $(PERL_FLAGS)
COPTS=$(WFLAGS) -g -O -pthread -pipe -I../common/SocketLib \
-DFX -D_GNU_SOURCE -DINVERSEXY -DWORLD $(DFLAGS) $(MYSQL_FLAGS) $(PERL_FLAGS)
LINKOPTS=$(COPTS) -rdynamic -L. -lstdc++ -lm -lz -ldl \
@@ -33,6 +40,9 @@

include makefile.common

+../common/patches/SoF.o: ../common/patches/SoF.cpp
+ $(CC) $(NOLINK) $(SOFCOPTS) $< $(OUT)$@
+
.depend depend:
for f in $(SF); \
do \
Index: zone/makefile
===================================================================
--- zone/makefile (revision 27)
+++ zone/makefile (revision 28)
@@ -18,6 +18,13 @@
PERL_LIB=$(shell perl -MExtUtils::Embed -e ldopts)
DFLAGS+=-DEMBPERL -DEMBPERL_PLUGIN -DHAS_UNION_SEMUN
WFLAGS=-fpermissive -Wall -Wuninitialized -Wwrite-strings -Wcast-qual -Wno-deprecated -Wcomment -Wcast-align
+SOFCOPTS=$(WFLAGS) -g \
+ -fauto-inc-dec -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse \
+ -fguess-branch-probability -fif-conversion2 -fif-conversion -finline-small-functions \
+ -fipa-pure-const -fipa-reference -fmerge-constants -fsplit-wide-types -ftree-ccp \
+ -ftree-ch -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre \
+ -ftree-sra -ftree-ter -funit-at-a-time -fomit-frame-pointer \
+ -pthread -pipe -D_GNU_SOURCE -DINVERSEXY -DFX -DZONE $(DFLAGS) $(MYSQL_FLAGS) $(PERL_FLAGS)
COPTS=$(WFLAGS) -O -g -pthread -pipe -D_GNU_SOURCE -DINVERSEXY -DFX -DZONE $(DFLAGS) $(MYSQL_FLAGS) $(PERL_FLAGS)
LINKOPTS=$(COPTS) -rdynamic -L. -lstdc++ -ldl $(MYSQL_LIB) $(PERL_LIB)

@@ -27,6 +34,9 @@

include makefile.common

+../common/patches/SoF.o: ../common/patches/SoF.cpp
+ $(CC) $(NOLINK) $(SOFCOPTS) $< $(OUT)$@
+
.depend depend:
for f in $(SF); \
do \


All of the sub-optimization flags of -O did cause the bug with common/patches/SoF.cpp so I included them all when building that one object file. Everything else still builds with -O.

gaeorn
06-06-2009, 01:44 PM
Odd I can't edit the last post. Anyway, I meant to say all the optimization flags that are set by -O did NOT cause the bug so I set them all instead of -O and it works fine on my system.

I did notice an old, closed bug on this issue in the gcc bug tracker. I'll be opening a new bug about it so hopefully it'll be fixed in the future.