Core Dump BackTrace + More


arkaria
11-09-2003, 06:20 AM
I'm running DR5 (cvs as of Nov 8 9:47 PST)


Core was generated by `./zone . eqemu.1amos.com 7999 localhost'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libstdc++.so.5...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libmysqlclient.so.10...done.
Loaded symbols for /usr/lib/libmysqlclient.so.10
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /home/shadow/emu/libEMuShareMem.so...done.
Loaded symbols for /home/shadow/emu/libEMuShareMem.so
#0 EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817c480, sender=0x41e2dd18, iArrgoRange=65, iAssistRange=65)
at MobAI.cpp:300
300 LogFile->write(EQEMuLog::Debug, "Check aggro for %s assisting %s, target %s.", sender->GetName(), mob->GetName(), mobTarget->GetName());
(gdb) bt
#0 EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817c480, sender=0x41e2dd18, iArrgoRange=65, iAssistRange=65)
at MobAI.cpp:300
#1 0x080df837 in Mob::AI_Process() (this=0x41e2dd18) at mob.h:504
#2 0x080a6ec4 in NPC::Process() (this=0x41e2dd18) at npc.cpp:499
#3 0x0807e4a6 in EntityList::Process() (this=0xbfffe950) at entity.cpp:1253
#4 0x080a9550 in main (argc=5, argv=0x1) at net.cpp:294
#5 0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)

arkaria
11-09-2003, 09:21 AM
I went out for about 4 hours and came back to this in my console:

Error in LoadVariables query 'SELECT varname, value, unix_timestamp() FROM variables where unix_timestamp(ts) >= 0' #2006: MySQL server has gone away
Error in LoadZoneNames query 'SELECT MAX(zoneidnumber) FROM zone' #2006: MySQL server has gone away
Error in GetItemsCount query 'SELECT MAX(id),count(*) FROM items' #2006: MySQL server has gone away
Error in GetNPCTypesCount query 'SELECT MAX(id), count(*) FROM npc_types' #2006: MySQL server has gone away
Error in LoadVariables query 'SELECT varname, value, unix_timestamp() FROM variables where unix_timestamp(ts) >= 0' #2006: MySQL server has gone away
Error in LoadZoneNames query 'SELECT MAX(zoneidnumber) FROM zone' #2006: MySQL server has gone away
Error in GetItemsCount query 'SELECT MAX(id),count(*) FROM items' #2006: MySQL server has gone away
Error in GetNPCTypesCount query 'SELECT MAX(id), count(*) FROM npc_types' #2006: MySQL server has gone away


There are also 84 mysqld processes running, and 4 of my 9 zone processes have crashed.

*edit*
Every 5 minutes or so, 5 more mysqld processes are spawned.

Edgar1898
11-09-2003, 10:52 AM
Maybe the Linux emu doesn't clean up after its db queries, but on Windows I don't have that problem. You might want to check the MySQL server; it might be messed up. I'm running 4.0.13 on a Linux machine and I haven't had any problems at all.

arkaria
11-09-2003, 11:04 AM
I'm running MySQL v3.23

As soon as I kill all the zone processes and world, all the extra mysql processes end.

Before I updated to DR5, I never saw any errors like these. No extra mysqld processes either.

If you suggest I upgrade to MySQL 4.x, I'd be happy to give that a try.

Edgar1898
11-09-2003, 11:06 AM
Not sure if that would help or not. Maybe one of the Linux gurus like DM or TC can find the cause of it, if it's truly not cleaning up after itself.

Trumpcard
11-09-2003, 11:47 AM
I haven't had any problems with my zone servers, but I'm running the 4.x branch also... The connections should be the same whether it's Windows or Linux; there's nothing OS-specific in the db calls that I'm aware of...

arkaria
11-09-2003, 01:25 PM
Well, I've compiled and installed MySQL 4.0.16 and grabbed the new cvs that was just pushed.

Let's hope it all works :)

*edit*

Seems to be working so far.

arkaria
11-09-2003, 04:43 PM
Well it would seem that I'm having the same troubles.

Here are some of the errors I'm getting.

Error in LoadVariables query 'SELECT varname, value, unix_timestamp() FROM variables where unix_timestamp(ts) >= 0' #2006: MySQL server has gone away
Error in LoadZoneNames query 'SELECT MAX(zoneidnumber) FROM zone' #2006: MySQL server has gone away
Error in GetItemsCount query 'SELECT MAX(id),count(*) FROM items' #2006: MySQL server has gone away
Error in GetNPCTypesCount query 'SELECT MAX(id), count(*) FROM npc_types' #2006: MySQL server has gone away
Error in LoadVariables query 'SELECT varname, value, unix_timestamp() FROM variables where unix_timestamp(ts) >= 0' #2006: MySQL server has gone away
Error in LoadZoneNames query 'SELECT MAX(zoneidnumber) FROM zone' #2006: MySQL server has gone away
Error in GetItemsCount query 'SELECT MAX(id),count(*) FROM items' #2006: MySQL server has gone away
Error in GetNPCTypesCount query 'SELECT MAX(id), count(*) FROM npc_types' #2006: MySQL server has gone away
Error in LoadVariables query 'SELECT varname, value, unix_timestamp() FROM variables where unix_timestamp(ts) >= 0' #2006: MySQL server has gone away
Error in LoadZoneNames query 'SELECT MAX(zoneidnumber) FROM zone' #2006: MySQL server has gone away
Error in GetItemsCount query 'SELECT MAX(id),count(*) FROM items' #2006: MySQL server has gone away
Error in GetNPCTypesCount query 'SELECT MAX(id), count(*) FROM npc_types' #2006: MySQL server has gone away

and a whole pile of this one:

eqns.Open failed

arkaria
11-09-2003, 04:44 PM
More core dumps:

Loaded symbols for /home/shadow/emu/libEMuShareMem.so
#0 EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817bcc0, sender=0x832dc88, iArrgoRange=65, iAssistRange=65)
at MobAI.cpp:300
300 LogFile->write(EQEMuLog::Debug, "Check aggro for %s assisting %s, target %s.", sender->GetName(), mob->GetName(), mobTarget->GetName());
(gdb) bt
#0 EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817bcc0, sender=0x832dc88, iArrgoRange=65, iAssistRange=65)
at MobAI.cpp:300
#1 0x080dfbd3 in Mob::AI_Process() (this=0x832dc88) at mob.h:504
#2 0x080a7234 in NPC::Process() (this=0x832dc88) at npc.cpp:499
#3 0x0807e51a in EntityList::Process() (this=0x817bcd0) at entity.cpp:1253
#4 0x080a98ec in main (argc=5, argv=0x1) at net.cpp:294
#5 0x42015574 in __libc_start_main () from /lib/tls/libc.so.6


And another:

#0 EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817bcc0, sender=0x82b53c0, iArrgoRange=65, iAssistRange=65)
at MobAI.cpp:300
300 LogFile->write(EQEMuLog::Debug, "Check aggro for %s assisting %s, target %s.", sender->GetName(), mob->GetName(), mobTarget->GetName());
(gdb) bt
#0 EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817bcc0, sender=0x82b53c0, iArrgoRange=65, iAssistRange=65)
at MobAI.cpp:300
#1 0x080dfbd3 in Mob::AI_Process() (this=0x82b53c0) at mob.h:504
#2 0x080a7234 in NPC::Process() (this=0x82b53c0) at npc.cpp:499
#3 0x0807e51a in EntityList::Process() (this=0x817bcd0) at entity.cpp:1253
#4 0x080a98ec in main (argc=5, argv=0x1) at net.cpp:294
#5 0x42015574 in __libc_start_main () from /lib/tls/libc.so.6

arkaria
11-10-2003, 03:39 PM
Well, if anyone uses my server, the zone processes seem to keep crashing. I'm getting the same BackTrace as the ones posted above, so I won't post them again.

It seems that the zone processes are restarting themselves (which explains the extra mysqld processes). I know they are restarting because when I start them up they are all nice and in order in my process list (ps -aux).

But after some crashes they are out of order and they have gone from:
./zone . eqemu.1amos.com 7996 localhost

to:
./zone . eqemu.1amos.com 7996 127.0.0.1

Am I the only one seeing this error or anything like it?

Trumpcard
11-10-2003, 11:16 PM
It shouldn't matter though, localhost is just an alias for 127.0.0.1... It's all loopback.

I haven't been having any problems with my zone servers crashing, but there have been a lot of changes in AIArrgo (yes, a typo), so a crash wouldn't surprise me.

The funny thing about that is that it looks like it's crashing in the logfile write.

Try commenting out line 300 in MobAI.cpp so it doesn't write to the logfile every time an assist is identified... Technically all those debug writes should be in an ifdef anyway...

krich
11-11-2003, 04:52 AM
Guys,

I'm experiencing the same thing.

I suspected that this debug statement was crashing the zone by referencing a null pointer (either sender, mob, or mobTarget). I put a statement in front of that debug statement that prints out the pointer values of sender, mob, and mobTarget (major log spam). Like this:

LogFile->write(EQEMuLog::Debug, "Check aggro for %d assisting %d, target %d.", sender, mob, mobTarget);

When the zone crashes, mobTarget shows a value of 0. Like this:
eqemu_debug_zone.log:16325 [11.10. - 23:51:16] Check aggro for 136602560 assisting 136762384, target 0.
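
A small aside on the extra debug line: it prints pointers with %d, which happened to work here but is not the conversion meant for pointers. A minimal sketch of the same message using %p is below, assuming a printf-style formatter. The function name formatAggroPointers is hypothetical and not part of the EQEmu source, and the exact text %p produces for a null pointer varies by platform.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the same message as the extra debug line,
// but uses %p, the printf conversion intended for pointer values.
// Parameter names mirror the post; nothing here is EQEmu API.
std::string formatAggroPointers(const void* sender, const void* mob, const void* mobTarget) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "Check aggro for %p assisting %p, target %p.",
                  sender, mob, mobTarget);
    return std::string(buf);
}
```

A null mobTarget still stands out in the log this way, and the format stays correct on platforms where a pointer is wider than an int.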

Now, this is very strange because the previous IF statement checks to make sure mobTarget is not null, so in order to get into that block of code mobTarget cannot be null. Here's the full block of code:

if (mobTarget
    && (fv <= FACTION_AMIABLE
#ifdef GUILDWARS
        || guildwars.GetCurrentGuildFaction(mob,sender) >= GW_KINDLY
#endif
    )
    && (mob->IsNPC() && mob->IsEngaged()) || (mob->IsClient() && mob->CastToClient()->AutoAttackEnabled()) // Clients do not use IsEngaged!!
    && dist <= iAssistRange
    && (mob->GetINT() <= 100 || mobTarget->GetLevelCon(sender->GetLevel()) != CON_GREEN)
    && dist <= (iAssistRange * 2)
) {
    // Had an if statement to check if it wasn't a GM but theres no reason, we check that above
    // Also had an interactive npc check but I believe these are no longer used, if required can be put above
    // Assist friend
    LogFile->write(EQEMuLog::Debug, "Check aggro for %d assisting %d, target %d.", sender, mob, mobTarget);
    LogFile->write(EQEMuLog::Debug, "Check aggro for %s assisting %s, target %s.", sender->GetName(), mob->GetName(), mobTarget->GetName());
    return mobTarget;
}
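
One possible explanation, assuming the condition is exactly as quoted: in C++, && binds tighter than ||, so the (mob->IsClient() && ...) alternative is not grouped with the NPC check. The whole condition then reads as "(mobTarget && faction && NPC engaged) || (client attacking && range checks ...)", and the second half never tests mobTarget. That would be consistent with a null mobTarget reaching both the GetName() call on line 300 and the GetLevelCon() call inside the condition. The sketch below reduces this to plain booleans; the Check struct and both function names are hypothetical and only stand in for the real calls.

```cpp
#include <cassert>

// Hypothetical stand-in for the values the quoted condition inspects.
struct Check {
    bool hasTarget;     // mobTarget != 0
    bool factionOk;     // fv <= FACTION_AMIABLE
    bool npcEngaged;    // mob->IsNPC() && mob->IsEngaged()
    bool clientAttack;  // mob->IsClient() && ...->AutoAttackEnabled()
    bool inRange;       // dist <= iAssistRange
};

// Same shape as the quoted condition. Because && binds tighter than ||,
// this parses as:
//   (hasTarget && factionOk && npcEngaged) || (clientAttack && inRange)
bool asQuoted(const Check& c) {
    return c.hasTarget && c.factionOk && (c.npcEngaged) || (c.clientAttack) && c.inRange;
}

// One extra pair of parentheses groups the NPC/client alternatives,
// so the target check guards both of them.
bool grouped(const Check& c) {
    return c.hasTarget && c.factionOk && (c.npcEngaged || c.clientAttack) && c.inRange;
}

// Self-check: a client with auto-attack on, in range, and no target.
void demoPrecedence() {
    Check c = {false, true, false, true, true};
    assert(asQuoted(c));   // the quoted shape lets a null target through
    assert(!grouped(c));   // the grouped shape rejects it
}
```

If this reading is right, wrapping the NPC and client alternatives in one set of parentheses in the real condition would keep a null mobTarget out of the block. This is a sketch of the idea, not the fix that went into cvs.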

Hope this helps a bit. I'm still investigating a fix, but that's likely the cause.

Regards,

krich

krich
11-11-2003, 04:55 AM
Arkaria,

If you are on Linux, make sure you turn off the holdzones variable. I've seen it take down perfectly working zones just as you are describing (i.e. no core dump). I haven't seen it work cleanly on a Linux system yet. Alas...someday Linux will catch up to Windows... 8)

Regards,

krich

Trumpcard
11-11-2003, 05:38 AM
When I get in this evening I'll wrap all the debugs in an #if DEBUG >= 11 check. It doesn't solve the problem, but it will keep the code out unless someone really wants it. It might just be a synchronization issue where mobTarget gets deallocated before the log write. Pretty odd, but it can happen, I suppose...

Hopefully the return of mobTarget won't cause a problem when that happens...
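
A minimal sketch of what compile-time gating of the debug writes could look like, assuming a numeric DEBUG level as mentioned in the post. The names AGGRO_DEBUG and debugLog are hypothetical, and the real code writes through LogFile->write rather than stderr.

```cpp
#include <cstdarg>
#include <cstdio>

// Assumption: DEBUG is a numeric verbosity level set at build time
// (for example with -DDEBUG=11). Default to 0 when it is not set.
#ifndef DEBUG
#define DEBUG 0
#endif

// Hypothetical stand-in for the real logger: printf-style write to stderr.
inline int debugLog(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = std::vfprintf(stderr, fmt, args);
    va_end(args);
    return written;
}

#if DEBUG >= 11
// Verbose builds: forward to the logger.
#define AGGRO_DEBUG(...) debugLog(__VA_ARGS__)
#else
// Normal builds: the call and its arguments are removed by the preprocessor,
// so expressions such as mobTarget->GetName() are never evaluated.
#define AGGRO_DEBUG(...) ((void)0)
#endif
```

With the default level, a line such as AGGRO_DEBUG("target %s\n", mobTarget->GetName()); compiles to nothing, which is why this hides the crash in the log write without addressing the null mobTarget itself.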

arkaria
11-11-2003, 06:35 AM
I've commented out line 300 and have the server up and running. All I need is for a bunch of people to log in to see if the zones start coming down again.

I also turned off holdzones.

/crosses fingers

Edgar1898
11-11-2003, 09:04 AM
DM's fix is on public cvs now.

arkaria
11-11-2003, 11:34 AM
Sweet! I'll check it out now, because one of my zones crashed:
#0 Mob::GetLevelCon(unsigned char) (this=0x0, iOtherLevel=5 '\005') at mob.h:303
303 inline const int8& GetLevel() { return level; }
(gdb) bt
#0 Mob::GetLevelCon(unsigned char) (this=0x0, iOtherLevel=5 '\005') at mob.h:303
#1 0x080de4a0 in EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817bcc0, sender=0x82696f8, iArrgoRange=65,
iAssistRange=65) at mob.h:303
#2 0x080dfad7 in Mob::AI_Process() (this=0x82696f8) at mob.h:504
#3 0x080a7190 in NPC::Process() (this=0x82696f8) at npc.cpp:499
#4 0x0807e556 in EntityList::Process() (this=0x817bcd0) at entity.cpp:1253
#5 0x080a9848 in main (argc=5, argv=0x1) at net.cpp:294
#6 0x42015574 in __libc_start_main () from /lib/tls/libc.so.6

Same old, same old, it would seem.

Thanks for the fast CVS update. I'll give it a try now.

arkaria
11-12-2003, 01:05 PM
#0 Mob::GetLevelCon(unsigned char) (this=0x0, iOtherLevel=43 '+') at mob.h:303
303 inline const int8& GetLevel() { return level; }
(gdb) bt
#0 Mob::GetLevelCon(unsigned char) (this=0x0, iOtherLevel=43 '+') at mob.h:303
#1 0x080de4ac in EntityList::AICheckCloseArrgo(Mob*, float, float) (this=0x817bcc0, sender=0x82830b8, iArrgoRange=65,
iAssistRange=65) at mob.h:303
#2 0x080dfb2f in Mob::AI_Process() (this=0x82830b8) at mob.h:504
#3 0x080a7190 in NPC::Process() (this=0x82830b8) at npc.cpp:499
#4 0x0807e556 in EntityList::Process() (this=0x817bcd0) at entity.cpp:1253
#5 0x080a9848 in main (argc=5, argv=0x1) at net.cpp:294
#6 0x42015574 in __libc_start_main () from /lib/tls/libc.so.6


I got a pile of zones that crashed with the same BackTrace as I have posted previously. The above BT is different. This is running off CVS from yesterday afternoon.