EQEmulator Forums


ItChyEQ 10-18-2011 10:24 PM

zone crashing
 
Hi all,

I have an issue where a zone I am in keeps crashing, and a fellow player caught this bit of information. Can anyone help?

Quote:

Re: PoA crashes
« Reply #83 on: October 15, 2011, 05:43:34 PM »
Came across something interesting today. I was logging scenarios to try and narrow something down with these crashes, air specifically. For all the characters, a zone crash resulted in the log file ending seemingly at random. But my warrior logged this very final line after one of the crashes:

[Sat Oct 15 16:18:15 2011] ERROR: String not found. (2088400502)

2,088,400,502 error code number
2,147,483,647 max range of a signed integer

The two numbers are of the same order of magnitude, and the difference between them is less than 60 million.

My notes show the zone crashed when my 7th The Warlord was around 50%. My rogue is my lead dps and on average does 25 million damage per warlord. With the 11 other classes there, the error code number aligns almost perfectly with a variable saved after the last kill before the final npc that crashed it. (Total damage - current warlord) Based on average dps, the number predicts almost perfectly when that value would overflow and when the zone would crash. (Total damage -> overflow)
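For reference, the arithmetic behind that comparison can be checked with a few lines of C++. This is only a sketch: the idea that the logged number is a running damage total held in a signed 32-bit integer is the quoted player's guess, not something confirmed from the server code, and `headroom` and `steps_before_overflow` are made-up helper names.

```cpp
#include <cstdint>
#include <limits>

// Largest value a signed 32-bit integer can hold: 2,147,483,647.
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();

// Distance from `value` to the signed 32-bit limit (hypothetical helper).
constexpr std::int64_t headroom(std::int64_t value) {
    return kInt32Max - value;
}

// Whole additions of `step` that still fit below the limit (hypothetical helper).
constexpr std::int64_t steps_before_overflow(std::int64_t value, std::int64_t step) {
    return headroom(value) / step;
}

// The number from the log line is 59,083,145 short of the limit.
static_assert(headroom(2088400502LL) == 59083145LL, "headroom mismatch");
```

With the rogue's 25 million per warlord as the step, `steps_before_overflow(2088400502LL, 25000000LL)` is 2, so under that guess the total would be a little over two rogue-sized kills away from wrapping.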

lerxst2112 10-18-2011 11:17 PM

The information is likely to be anecdotal and not useful in tracking down the issue. In order to track down the issue and fix it the server owner would need to add debugging info to their zone process and either catch the crash in the debugger or in a minidump. Information on how to do both has been mentioned here many times.

I suspect that zone may be running out of memory due to a memory leak that is pretty easy to trigger. It's a small leak, but given where it is and how easy it is to cause, it could be the problem.


This code leaks 4k every time a message is blocked due to filters:
Code:

void Client::Message(uint32 type, const char* message, ...) {
        va_list argptr;
        char *buffer = new char[4096];

        if (GetFilter(FilterSpellDamage) == FilterHide && type == MT_NonMelee)
                return;
        if (GetFilter(FilterMeleeCrits) == FilterHide && type == MT_CritMelee) //98 is self...
                return;
        if (GetFilter(FilterSpellCrits) == FilterHide && type == MT_SpellCrits)
                return;

        va_start(argptr, message);
        vsnprintf(buffer, 4096, message, argptr);
        va_end(argptr);

        size_t len = strlen(buffer);

        //client dosent like our packet all the time unless
        //we make it really big, then it seems to not care that
        //our header is malformed.
        //len = 4096 - sizeof(SpecialMesg_Struct);

        uint32 len_packet = sizeof(SpecialMesg_Struct)+len;
        EQApplicationPacket* app = new EQApplicationPacket(OP_SpecialMesg, len_packet);
        SpecialMesg_Struct* sm=(SpecialMesg_Struct*)app->pBuffer;
        sm->header[0] = 0x00; // Header used for #emote style messages..
        sm->header[1] = 0x00; // Play around with these to see other types
        sm->header[2] = 0x00;
        sm->msg_type = type;
        memcpy(sm->message, buffer, len+1);

        FastQueuePacket(&app);

        safe_delete_array(buffer);
}

This code does not leak:
Code:

void Client::Message(uint32 type, const char* message, ...) {
        if (GetFilter(FilterSpellDamage) == FilterHide && type == MT_NonMelee)
                return;
        if (GetFilter(FilterMeleeCrits) == FilterHide && type == MT_CritMelee) //98 is self...
                return;
        if (GetFilter(FilterSpellCrits) == FilterHide && type == MT_SpellCrits)
                return;

        va_list argptr;
        char buffer[4096];
        va_start(argptr, message);
        vsnprintf(buffer, sizeof(buffer), message, argptr);
        va_end(argptr);

        size_t len = strlen(buffer);

        //client dosent like our packet all the time unless
        //we make it really big, then it seems to not care that
        //our header is malformed.
        //len = 4096 - sizeof(SpecialMesg_Struct);

        uint32 len_packet = sizeof(SpecialMesg_Struct)+len;
        EQApplicationPacket* app = new EQApplicationPacket(OP_SpecialMesg, len_packet);
        SpecialMesg_Struct* sm=(SpecialMesg_Struct*)app->pBuffer;
        sm->header[0] = 0x00; // Header used for #emote style messages..
        sm->header[1] = 0x00; // Play around with these to see other types
        sm->header[2] = 0x00;
        sm->msg_type = type;
        memcpy(sm->message, buffer, len+1);

        FastQueuePacket(&app);
}

My advice is to make sure the above-mentioned filters are off. The server owner could also try replacing the first bit of code with the second to see if it helps.
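If a heap buffer were wanted for some reason, another way to close this kind of leak is to let a smart pointer own the allocation so that every return path frees it. The sketch below is a standalone illustration of that idea, not the change made in the EQEmu source: `is_filtered` and `format_message` are hypothetical stand-ins for the filter checks and for `Client::Message`.

```cpp
#include <cstdarg>
#include <cstdio>
#include <memory>
#include <string>

// Hypothetical stand-in for the GetFilter() checks: type 1 is "hidden".
bool is_filtered(int type) {
    return type == 1;
}

// Formats a message unless its type is filtered out.
// The buffer is owned by std::unique_ptr, so it is released on every
// return path, including any early return added later.
std::string format_message(int type, const char* fmt, ...) {
    if (is_filtered(type))
        return std::string();  // nothing allocated yet, nothing to leak

    std::unique_ptr<char[]> buffer(new char[4096]);

    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buffer.get(), 4096, fmt, args);
    va_end(args);

    return std::string(buffer.get());
}
```

For example, `format_message(0, "hit for %d", 5)` returns the formatted text, while `format_message(1, "hit for %d", 5)` returns an empty string without ever allocating.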

trevius 10-19-2011 04:42 AM

Very nice catch, Lerxst2112! I think you may have just solved one of the main EQEmu memory leaks. I know Lillu was telling me that he noticed memory leaks in their zones while in combat, but no leaks while just standing around. I am sure there are still other leaks, but that should be a pretty big one. I can imagine seeing that leak grow quickly in a raid scenario.

lerxst2112 10-19-2011 05:10 AM

There are a few others, but that one is the worst because it is so easy to trigger and it is called so often. 4k isn't a lot by itself, but it would add up over time, and it's likely that fragmentation and allocation count would cause issues long before it hit the 2 gig limit.
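To put rough numbers on "add up over time", here is a small sketch of the arithmetic. The 4096-byte leak per blocked message comes from the code above; the helper names and any message rate plugged in are assumptions for illustration only.

```cpp
#include <cstdint>

// Bytes leaked each time the original Client::Message returned early.
constexpr std::uint64_t kLeakPerBlockedMessage = 4096;

// 2 GiB, the limit mentioned above.
constexpr std::uint64_t kTwoGiB = 2ULL * 1024 * 1024 * 1024;

// Number of blocked messages needed to leak `bytes` (hypothetical helper).
constexpr std::uint64_t blocked_messages_to_leak(std::uint64_t bytes) {
    return bytes / kLeakPerBlockedMessage;
}

// Whole seconds needed to leak `bytes` at an assumed, constant rate of
// blocked messages per second (hypothetical helper).
constexpr std::uint64_t seconds_to_leak(std::uint64_t bytes,
                                        std::uint64_t blocked_per_second) {
    return blocked_messages_to_leak(bytes) / blocked_per_second;
}

// 524,288 blocked messages are enough to leak 2 GiB.
static_assert(blocked_messages_to_leak(kTwoGiB) == 524288, "unexpected count");
```

At an assumed 100 blocked messages per second across a raid, `seconds_to_leak(kTwoGiB, 100)` comes to 5242 seconds, or under an hour and a half.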

Akkadius is working on getting that one and some of the others committed. I'll keep passing them along as I'm able to test them.

Lillu 10-19-2011 10:56 AM

Lerxst, this is an awesome catch. I believe this was the source of our biggest leaks (player and even bot custom message filters too). We updated the code and I already see a huge difference in memory consumption. Can't tell you how happy Vaion and I were seeing this fix. Thank you!! <3

ItChyEQ 10-20-2011 12:53 AM

Thank you! I have sent this information to the server admin. I really appreciate the help :)

thepoetwarrior 10-20-2011 12:29 PM

Is this update live yet? I can't wait to update the source code and have the users test it out.

trevius 10-20-2011 08:28 PM

Yeah, KLS committed it in Rev 2033 the other day:

http://code.google.com/p/projecteqemu/source/list

thepoetwarrior 10-20-2011 11:44 PM

Can't wait to try it out this weekend. Hopefully this will stop zone-name.exe in the process list from growing from around 20 MB to 1+ GB? Our server has 48 GB of RAM, so that's not much of an issue, but the crashing is, and it would be awesome if this fixes that.

Lillu 10-21-2011 03:04 AM

It will fix that, we had the same issue. You'll run at around 7GB of RAM now with 100 dyn zones and 300+ players. Just an amazing fix.

thepoetwarrior 10-21-2011 09:08 PM

Awesome, can't wait! Updating source code now. By the way, here is what my process list looks like right now, after 15 hours of uptime and 300+ online. If it improves after the update, I'll post new results for everyone to see. If this works, it truly will be one of the best fixes ever for eqemu! Thanks Lillu for your update as well! :)

http://i51.tinypic.com/xbb636.png

thepoetwarrior 10-22-2011 11:57 AM

Update:

Here is another screenshot AFTER updating the source code. Again, there are 300+ players online.

Notice any difference how much RAM each zone is taking up vs the previous post?

Best fix ever!

http://i55.tinypic.com/x0y7vs.png

thepoetwarrior 10-23-2011 01:15 AM

Users are complaining about the melee crits not filtering and spamming their window, but they are also very happy that there are no more crashes. They've tried very hard with 50-man raids to crash a zone and were not able to.

Akkadius 10-23-2011 01:47 AM

Quote:

Originally Posted by thepoetwarrior (Post 204262)
Users are complaining about the melee crits not filtering and spamming their window, but they are also very happy that there are no more crashes. They've tried very hard with 50-man raids to crash a zone and were not able to.

This may also be because of some spell crashes that have been around for quite a while. The few things that I patched in brought The Hidden Forest down from 10 crashes an hour to barely 1.

One had to do with spells that fire on fade or on wear-off (TryFadeEffect), which ultimately trigger a weapon proc that crashes the zone if an invalid spell id passes through.

Another one is closely related to TryFadeEffect but has to do with ExecWeaponProc: when a weapon procs, the validity of the spell itself is not checked in a few areas of the code. I have implemented validity checking within the function itself. It lets the player know that the spell is invalid and also logs it to spells logging, if you have that enabled, so that you can fix it.
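As a rough illustration of that kind of guard, here is a standalone sketch of checking a spell id before using it. None of this is the actual patch: the spell table, `is_valid_spell`, and this `exec_weapon_proc` signature are hypothetical, and the real functions in the zone source take different arguments.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical spell table; index 0 is reserved to mean "no spell".
static const std::vector<std::string> kSpellNames = {
    "", "Example Proc A", "Example Proc B"
};

// True only if the id points at a real entry in the table.
bool is_valid_spell(long spell_id) {
    return spell_id > 0 &&
           static_cast<std::size_t>(spell_id) < kSpellNames.size();
}

// Fires a proc only when the spell id is valid. On a bad id it fills in
// a message for the player, logs it, and returns instead of indexing
// past the end of the table.
bool exec_weapon_proc(long spell_id, std::string& feedback) {
    if (!is_valid_spell(spell_id)) {
        feedback = "Invalid spell id " + std::to_string(spell_id) +
                   "; proc skipped";
        std::fprintf(stderr, "[spells] %s\n", feedback.c_str());
        return false;
    }
    feedback = "Proc fired: " + kSpellNames[static_cast<std::size_t>(spell_id)];
    return true;
}
```

Putting the check inside the function, rather than at each call site, means a caller that forgets to validate the id still cannot take the zone down.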

That said, fixing the memory leak is a huge fix: it will ultimately provide far more stability and save TONS of memory.

Lillu reported having used anywhere between 40-60GB of RAM after 24 hours, and is now using maybe 8GB in a similar time frame. For comparison, THF boots with 100 zoneservers and initially starts at approx 6.5GB.

All around, things should be much more stable right now. If you are still getting crashes, I'd advise posting the dumps to the forums so that they can be investigated.

As far as melee crits not filtering, I'm not sure offhand, but I'm sure the trade-off is well worth it for now until it is investigated.

Thanks,
~Akka

Lillu 10-23-2011 02:10 AM

Akkadius is right, it's the best trade-off ever. Hell, I would trade in all our wood elf chix for that fix! :rolleyes:

Again, crashes and memory leaks are 99% gone. It's night and day. Thanks for the awesome fix.


All times are GMT -4.