View Full Version : zone crashing


ItChyEQ
10-18-2011, 10:24 PM
Hi all,

I have an issue where a zone I am in keeps crashing, and a fellow player caught this bit of information. Can anyone help?


Re: PoA crashes
« Reply #83 on: October 15, 2011, 05:43:34 PM »
Came across something interesting today. I was logging scenarios to try and narrow something down with these crashes, air specifically. For all the characters, a zone crash resulted in the log file ending seemingly at random. But my warrior logged this very final line after one of the crashes:

[Sat Oct 15 16:18:15 2011] ERROR: String not found. (2088400502)

2,088,400,502 error code number
2,147,483,647 maximum value of a signed 32-bit integer

The two are on the same order of magnitude, with a difference of less than 60 million.

My notes show the zone crashed when my 7th The Warlord was at around 50%. My rogue is my lead DPS and on average does 25 million damage per Warlord. With the 11 other classes there, the error code number aligns almost perfectly with a variable saved after the last kill before the final NPC that crashed it (total damage minus the current Warlord). Based on average DPS, the number predicts almost perfectly when that value would overflow and when the zone would crash (total damage -> overflow).
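For reference, the arithmetic behind that comparison can be sketched in C++. This only illustrates the headroom calculation, assuming the logged number really is a signed 32-bit running total, which nothing in the thread confirms. The names kLoggedValue, headroom and would_overflow_int32 are hypothetical and do not come from the EQEmu source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Value from the logged line: "ERROR: String not found. (2088400502)".
constexpr std::int64_t kLoggedValue = 2088400502LL;

// Largest value a signed 32-bit integer can hold (2,147,483,647).
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();

// Headroom left before a signed 32-bit counter holding kLoggedValue overflows.
constexpr std::int64_t headroom() { return kInt32Max - kLoggedValue; }

// Hypothetical helper: would adding a non-negative amount to a 32-bit running
// total exceed INT32_MAX? The sum is done in 64 bits so the check itself is safe.
inline bool would_overflow_int32(std::int32_t total, std::int32_t amount) {
    assert(amount >= 0);  // only additions are considered in this sketch
    return static_cast<std::int64_t>(total) + amount > kInt32Max;
}
```

Under that assumption, headroom() is 59,083,145, so roughly 59 million more points of damage would push such a counter past its limit.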

lerxst2112
10-18-2011, 11:17 PM
The information is likely to be anecdotal and not useful in tracking down the issue. In order to track down the issue and fix it the server owner would need to add debugging info to their zone process and either catch the crash in the debugger or in a minidump. Information on how to do both has been mentioned here many times.

I suspect that zone may be running out of memory due to a memory leak that is pretty easy to cause. It's a small leak, but given where it is and how easy it is to trigger, it could be the problem.


This code leaks 4k every time a message is blocked due to filters:

void Client::Message(uint32 type, const char* message, ...) {
    va_list argptr;
    char *buffer = new char[4096]; // allocated before the filter checks

    // Each early return below exits without freeing buffer, leaking 4096 bytes.
    if (GetFilter(FilterSpellDamage) == FilterHide && type == MT_NonMelee)
        return;
    if (GetFilter(FilterMeleeCrits) == FilterHide && type == MT_CritMelee) //98 is self...
        return;
    if (GetFilter(FilterSpellCrits) == FilterHide && type == MT_SpellCrits)
        return;

    va_start(argptr, message);
    vsnprintf(buffer, 4096, message, argptr);
    va_end(argptr);

    size_t len = strlen(buffer);

    //client doesn't like our packet all the time unless
    //we make it really big, then it seems to not care that
    //our header is malformed.
    //len = 4096 - sizeof(SpecialMesg_Struct);

    uint32 len_packet = sizeof(SpecialMesg_Struct)+len;
    EQApplicationPacket* app = new EQApplicationPacket(OP_SpecialMesg, len_packet);
    SpecialMesg_Struct* sm=(SpecialMesg_Struct*)app->pBuffer;
    sm->header[0] = 0x00; // Header used for #emote style messages..
    sm->header[1] = 0x00; // Play around with these to see other types
    sm->header[2] = 0x00;
    sm->msg_type = type;
    memcpy(sm->message, buffer, len+1);

    FastQueuePacket(&app);

    safe_delete_array(buffer); // only reached when no filter blocked the message
}


This code does not leak:

void Client::Message(uint32 type, const char* message, ...) {
    // Filter checks run first, before anything is allocated.
    if (GetFilter(FilterSpellDamage) == FilterHide && type == MT_NonMelee)
        return;
    if (GetFilter(FilterMeleeCrits) == FilterHide && type == MT_CritMelee) //98 is self...
        return;
    if (GetFilter(FilterSpellCrits) == FilterHide && type == MT_SpellCrits)
        return;

    va_list argptr;
    char buffer[4096]; // stack buffer, released automatically on return
    va_start(argptr, message);
    vsnprintf(buffer, sizeof(buffer), message, argptr);
    va_end(argptr);

    size_t len = strlen(buffer);

    //client doesn't like our packet all the time unless
    //we make it really big, then it seems to not care that
    //our header is malformed.
    //len = 4096 - sizeof(SpecialMesg_Struct);

    uint32 len_packet = sizeof(SpecialMesg_Struct)+len;
    EQApplicationPacket* app = new EQApplicationPacket(OP_SpecialMesg, len_packet);
    SpecialMesg_Struct* sm=(SpecialMesg_Struct*)app->pBuffer;
    sm->header[0] = 0x00; // Header used for #emote style messages..
    sm->header[1] = 0x00; // Play around with these to see other types
    sm->header[2] = 0x00;
    sm->msg_type = type;
    memcpy(sm->message, buffer, len+1);

    FastQueuePacket(&app);
}


My advice is to make sure the above-mentioned filters are off. The server owner could also try replacing the first bit of code with the second and see if it helps.
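The difference between the two versions can be reproduced outside the server with a small standalone sketch. Everything in it is hypothetical: acquire_buffer and release_buffer stand in for new[] and delete[] and just count outstanding buffers, and the three *_message functions only mimic the shape of Client::Message. The third variant (a std::unique_ptr with a custom deleter) is an alternative that is not taken from the thread.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Stand-in for new[]/delete[]: hands out one static 4096-byte buffer and counts
// acquisitions that have not been released. Hypothetical, for illustration only.
static char g_storage[4096];
static int g_outstanding = 0;

char* acquire_buffer() { ++g_outstanding; return g_storage; }
void release_buffer(char*) { assert(g_outstanding > 0); --g_outstanding; }
int outstanding_buffers() { return g_outstanding; }

// Leaky shape: the buffer is acquired before the filter check.
bool leaky_message(bool filtered, const char* text) {
    char* buffer = acquire_buffer();
    if (filtered)
        return false;  // early return skips release_buffer(): one buffer leaked
    std::snprintf(buffer, sizeof(g_storage), "%s", text);
    release_buffer(buffer);
    return true;
}

// Fixed shape from the second listing: check first, then use a stack buffer.
bool stack_message(bool filtered, const char* text) {
    if (filtered)
        return false;  // nothing has been acquired yet
    char buffer[4096];
    std::snprintf(buffer, sizeof(buffer), "%s", text);
    return true;
}

// Alternative: let an owning smart pointer release the buffer on every path.
struct BufferReleaser {
    void operator()(char* p) const { release_buffer(p); }
};

bool owned_message(bool filtered, const char* text) {
    std::unique_ptr<char, BufferReleaser> buffer(acquire_buffer());
    if (filtered)
        return false;  // the unique_ptr destructor still releases the buffer
    std::snprintf(buffer.get(), sizeof(g_storage), "%s", text);
    return true;
}
```

Calling leaky_message(true, ...) repeatedly makes outstanding_buffers() climb by one per call, while the other two variants leave it unchanged whether or not the message is filtered.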

trevius
10-19-2011, 04:42 AM
Very nice catch, Lerxst2112! I think you may have just solved one of the main EQEmu memory leaks. I know Lillu was telling me that he noticed memory leaks in their zones while in combat, but no leaks while just standing around. I am sure there are still other leaks, but that should be a pretty big one. I can imagine seeing that leak grow quickly in a raid scenario.

lerxst2112
10-19-2011, 05:10 AM
There are a few others, but that one is the worst because it is so easy to trigger and it is called so often. 4k isn't a lot by itself, but it would add up over time, and it's likely that fragmentation and the sheer number of allocations would cause issues long before it hit the 2 gig limit.
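To put "it would add up" into numbers, here is a rough back-of-the-envelope sketch. It assumes "4k" means the 4096-byte buffer from the code above and that "the 2 gig limit" means 2 GiB of address space for a 32-bit process. The message rate passed in is a made-up input, not a measurement from any server.

```cpp
#include <cassert>
#include <cstdint>

// Size of one leaked buffer, matching new char[4096] in the leaking code.
constexpr std::uint64_t kLeakBytes = 4096;

// Assumed ceiling: 2 GiB of address space for a 32-bit process.
constexpr std::uint64_t kLimitBytes = 2ULL * 1024 * 1024 * 1024;

// Number of blocked messages needed to leak the whole assumed ceiling.
constexpr std::uint64_t leaks_to_limit() { return kLimitBytes / kLeakBytes; }

// Seconds until the ceiling is reached at a given (hypothetical) rate of
// filter-blocked messages per second in one zone process.
inline double seconds_to_limit(double blocked_per_second) {
    assert(blocked_per_second > 0.0);
    return static_cast<double>(leaks_to_limit()) / blocked_per_second;
}
```

That works out to 524,288 blocked messages. At a purely hypothetical 100 blocked messages per second, seconds_to_limit(100.0) comes to a bit under an hour and a half, and as noted above, fragmentation would likely cause trouble well before that point.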

Akkadius is working on getting that one and some of the others committed. I'll keep passing them along as I'm able to test them.

Lillu
10-19-2011, 10:56 AM
Lerxst, this is an awesome catch. I believe this was the source of our biggest leaks (player and even bot custom message filters too). We updated the code and I already see a huge difference in memory consumption. Can't tell you how happy Vaion and I were seeing this fix. Thank you!! <3

ItChyEQ
10-20-2011, 12:53 AM
Thank you! I have sent this information to the server admin. I really appreciate the help :)

thepoetwarrior
10-20-2011, 12:29 PM
Is this update live yet? I can't wait to update the source code and have the users test it out.

trevius
10-20-2011, 08:28 PM
Yeah, KLS committed it in Rev 2033 the other day:

http://code.google.com/p/projecteqemu/source/list

thepoetwarrior
10-20-2011, 11:44 PM
Can't wait to try it out this weekend. Hopefully this will stop the zone-name.exe in the process list from growing from like 20 MB to about 1+ GB? Our server has 48 GB of RAM so that's not much of an issue, but the crashing is, so it would be awesome if this fixes that.

Lillu
10-21-2011, 03:04 AM
It will fix that, we had the same issue. You'll run at around 7GB of RAM now with 100 dynamic zones and 300+ players. Just an amazing fix.

thepoetwarrior
10-21-2011, 09:08 PM
Awesome, can't wait! Updating source code now. By the way, here is what my process list looks like right now, after 15 hours of uptime and 300+ online. If it improves after the update, I'll post new results for everyone to see. If this works, it truly will be one of the best fixes ever for EQEmu! Thanks Lillu for your update as well! :)

http://i51.tinypic.com/xbb636.png

thepoetwarrior
10-22-2011, 11:57 AM
Update:

Here is another screenshot AFTER updating the source code. Again, there are 300+ players online.

Notice any difference how much RAM each zone is taking up vs the previous post?

Best fix ever!

http://i55.tinypic.com/x0y7vs.png

thepoetwarrior
10-23-2011, 01:15 AM
Users are complaining about the melee crits not filtering and spamming their window, but they are also very happy about having no more crashes. They've tried very hard to crash a zone with 50-man raids, and were not able to.

Akkadius
10-23-2011, 01:47 AM
This may also be because of some spell crashes that have been around for quite a while. The few things that I patched in brought 10 crashes an hour on The Hidden Forest down to barely 1.

One had to do with spells that fire off on fade or on wear-off (TryFadeEffect), ultimately triggering a weapon proc that crashes the zone if an invalid spell ID passes through.

Another one is closely related to TryFadeEffect, but has to do with ExecWeaponProc: when a weapon procs, the validity of the spell itself is not checked in a few areas of the code. I have implemented validity checking within the function itself. It will let the player know that the spell is invalid and also log it to spells logging, if you have it enabled, so that you can fix it.
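The kind of guard described here can be sketched roughly as follows. This is not the committed patch: SpellTable, is_valid_spell, spell_name and exec_weapon_proc are hypothetical stand-ins, and the validity rules (non-zero ID, inside the table, non-empty name) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical spell table: index is the spell ID, value is the spell name.
struct SpellTable {
    std::vector<std::string> names;
};

// Assumed validity rules: non-zero ID, inside the table, and a non-empty name.
bool is_valid_spell(const SpellTable& table, std::uint32_t id) {
    return id != 0 && id < table.names.size() && !table.names[id].empty();
}

// Lookup that is only legal for IDs that passed the validity check.
const std::string& spell_name(const SpellTable& table, std::uint32_t id) {
    assert(is_valid_spell(table, id));
    return table.names[id];
}

struct ProcResult {
    bool cast;            // true if the proc went ahead
    std::string message;  // text for the player and the spell log
};

// Validate inside the proc function itself, so every caller is protected
// instead of relying on each call site to check the ID first.
ProcResult exec_weapon_proc(const SpellTable& table, std::uint32_t spell_id) {
    if (!is_valid_spell(table, spell_id))
        return {false, "Invalid spell proc: " + std::to_string(spell_id)};
    return {true, "Proc: " + spell_name(table, spell_id)};
}
```

Putting the check inside the function means a bad ID coming from a fade effect or any other caller produces a message rather than an out-of-range lookup.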

That said, fixing the memory leak will ultimately provide far more stability and save TONS of memory; it is a huge fix.

Lillu reported having used anywhere between 40-60GB of RAM after 24 hours, and is now using maybe 8GB in a similar time frame. For context, THF boots with 100 zone servers and starts at approximately 6.5GB.

All around, things should be much more stable right now. If you are still getting crashes, I'd advise reporting the dumps back to the forums so that they can be investigated.

As far as melee crits not filtering, I'm not sure offhand, but I'm sure the trade-off is well worth it for now until it is investigated.

Thanks,
~Akka

Lillu
10-23-2011, 02:10 AM
Akkadius is right, it's the best trade-off ever. Hell, I would trade in all our wood elf chicks for that fix! :rolleyes:

Again, crashes and memory leaks are 99% gone. It's night and day. Thanks for the awesome fix.

sorvani
10-23-2011, 03:22 AM
Melee crits not filtering would likely be my fault. I've been fixing various other filters that were not working. I did not think I touched that one, but I guess I did. I'll go poke at the code and see what I did. I made a lot of message changes in rev 2011 and 2012.
edit: hmm, I did add in some changes to use string IDs, but it is still an MT_CritMelee.
edit 2: Here is the change. I do not know why it would cause it not to filter unless there is a problem with MessageClose_StringID or something, but I used the same code when I changed pet flurry and enrage and those work.
===================================================================
--- C:/SVN Files/eqemu/trunk/EQEmuServer/zone/attack.cpp (revision 2011)
+++ C:/SVN Files/eqemu/trunk/EQEmuServer/zone/attack.cpp (revision 2012)
@@ -3866,7 +3866,7 @@
if (MakeRandomInt(0, 99) < critChance) {
critMod += GetCritDmgMob(skill) * 2; // To account for base crit mod being 200 not 100
damage = (damage * critMod) / 100;
- entity_list.MessageClose(this, false, 200, MT_CritMelee, "%s scores a critical hit!(%d)", GetCleanName(), damage);
+ entity_list.MessageClose_StringID(this, false, 200, MT_CritMelee, CRITICAL_HIT, GetCleanName(), itoa(damage));
}
}
}

lerxst2112
10-23-2011, 04:13 AM
If you look at the top of Client::Message it appears there may be some extra filtering needed that EntityList::QueueCloseClients doesn't do.

sorvani
10-23-2011, 04:28 AM
See, the way I read that was that it was not filtering to another window properly, which would have nothing to do with the filter being set to completely hide it.

Heading home now so I'll be able to tinker with it tomorrow afternoon/evening sometime.

thepoetwarrior
10-23-2011, 09:26 AM
Yes, the trade-off is worth it. I'm sure the users will figure out how to deal with the crit melee spam (put it in another window), because not crashing is just so awesome! Thanks again!

sorvani
10-23-2011, 11:08 AM
OK, just did a quick test to confirm the issue. Criticals do filter correctly to another window; it is only the filtering to not show at all that fails.

Looking closer at the code in EntityList::QueueCloseClients, it is attempting to use the filters, so there must be a problem in that logic someplace.
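One way to keep the two delivery paths from drifting apart is to route both through a single hide check. The sketch below is only an illustration of that idea, not the r2042 change: FilterState, MsgType, ClientFilters, is_hidden, send_direct and send_to_nearby are all hypothetical names, loosely modeled on the three checks at the top of Client::Message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class FilterState { Show, Hide };
enum class MsgType { Other, NonMelee, CritMelee, SpellCrits };

// Hypothetical per-client filter settings.
struct ClientFilters {
    FilterState spell_damage = FilterState::Show;
    FilterState melee_crits = FilterState::Show;
    FilterState spell_crits = FilterState::Show;
};

// Single source of truth for "this client has hidden this message type".
bool is_hidden(const ClientFilters& f, MsgType type) {
    switch (type) {
        case MsgType::NonMelee:   return f.spell_damage == FilterState::Hide;
        case MsgType::CritMelee:  return f.melee_crits == FilterState::Hide;
        case MsgType::SpellCrits: return f.spell_crits == FilterState::Hide;
        default:                  return false;
    }
}

// Direct path: returns true if the message would be sent to this client.
bool send_direct(const ClientFilters& f, MsgType type) {
    return !is_hidden(f, type);
}

// Broadcast path: returns how many nearby clients would receive the message.
std::size_t send_to_nearby(const std::vector<ClientFilters>& clients, MsgType type) {
    std::size_t delivered = 0;
    for (const ClientFilters& f : clients) {
        if (!is_hidden(f, type))  // same check as the direct path
            ++delivered;
    }
    assert(delivered <= clients.size());
    return delivered;
}
```

With two nearby clients, one of which hides melee crits, send_to_nearby reports a single delivery for MsgType::CritMelee and two for any other type.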

sorvani
10-23-2011, 01:14 PM
Should be fixed in r2042.

thepoetwarrior
10-24-2011, 11:39 PM
Awesome! Downloading and compiling the new source code now. Best stuff ever for EQEmu!