EQEmulator Forums

Fix for the Windows Server lag issue (https://www.eqemulator.org/forums/showthread.php?t=42762)

peterigz 01-02-2020 01:13 PM

Fix for the Windows Server lag issue
 
Hello, regarding the lag issues on some Windows platforms (http://www.eqemulator.org/forums/showthread.php?t=42311): it looks like, for some reason, Windows Server trips over itself when LibUV is run in a while loop with UV_RUN_NOWAIT, and is much happier with UV_RUN_DEFAULT. Here's the commit: https://github.com/peterigz/Server/c...d9354b846bddb0

I'm currently running this on The Hidden Forest and all lag issues have now been resolved. Before the change, 18 in zone would lag heavily for us and drop clients; so far we've had 50+ in zone with no problem. Hopefully we can stress test with more soon, but either way it's way better than it was.

I'll give it some more time in case any other issues arise, then I can do a pull request. I can't check Linux; maybe someone can build on that platform to see if all is still OK there. I can't see any reason why it wouldn't be.
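
To make the difference between the two loop styles concrete, here is a toy model in plain C++. It does not call libuv at all, and every name in it (`ToyLoop`, `run_nowait`, `run_blocking`, `drive_polling`) is hypothetical. It only contrasts a caller-owned "poll, then sleep" loop with a loop that blocks internally until work arrives. It is a sketch of the general idea, not the code from the commit and not a diagnosis of the Windows lag.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Toy stand-in for an event loop. This is NOT libuv; all names are hypothetical.
class ToyLoop {
public:
    // Queue a callback and wake run_blocking() if it is waiting.
    void post(std::function<void()> cb) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(cb));
        }
        wake_.notify_one();
    }

    // Ask the loop to finish once everything queued so far has run.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
    }

    bool stop_requested() {
        std::lock_guard<std::mutex> lock(mutex_);
        return stopping_;
    }

    // Polling style: run whatever is queued right now and return at once.
    int run_nowait() {
        std::deque<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(queue_);
        }
        for (auto& cb : ready) cb();
        return static_cast<int>(ready.size());
    }

    // Blocking style: wait inside the loop until work arrives or stop() is called.
    int run_blocking() {
        int ran = 0;
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return ran;  // woken by stop() with nothing left
            std::function<void()> cb = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();  // never hold the lock while running user code
            cb();
            ++ran;
            lock.lock();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> queue_;
    bool stopping_ = false;
};

// Caller-owned polling loop: poll, then nap, until a stop has been requested.
int drive_polling(ToyLoop& loop, std::chrono::milliseconds nap) {
    assert(nap.count() >= 0);
    int ran = 0;
    for (;;) {
        const bool last_pass = loop.stop_requested();  // read before polling
        ran += loop.run_nowait();
        if (last_pass) return ran;
        std::this_thread::sleep_for(nap);
    }
}
```

In the polling version, a callback posted just after a poll waits for the rest of the nap before it runs, and how long that nap really lasts is up to the operating system's timer. In the blocking version there is no nap to tune: `post()` wakes the loop directly.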

demonstar55 01-02-2020 02:15 PM

The libuv documentation for UV_RUN_NOWAIT says:

"Poll for i/o once but don’t block if there are no pending callbacks. Returns zero if done (no active handles or requests left), or non-zero if more callbacks are expected (meaning you should run the event loop again sometime in the future)."

We might just need to actually check the return value and run the event loop sooner.
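
A rough sketch of what "check the return value and run the loop again sooner" could look like, written against a toy loop rather than libuv. `PollLoop`, `run_nowait`, `DriveStats` and `drive` are hypothetical names; `run_nowait` copies only the return convention from the quoted documentation (zero when nothing is left, non-zero when more callbacks are expected). The sleep is passed in as a callback so the behaviour can be checked without real timing.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model, NOT libuv: it imitates only the documented return convention.
struct PollLoop {
    std::deque<std::function<void()>> pending;

    // Run at most one pending callback without blocking.
    // Returns 0 when nothing is left, non-zero when more callbacks remain.
    int run_nowait() {
        if (!pending.empty()) {
            std::function<void()> cb = std::move(pending.front());
            pending.pop_front();
            cb();
        }
        return pending.empty() ? 0 : 1;
    }
};

struct DriveStats {
    int polls = 0;   // how many times the loop was polled
    int sleeps = 0;  // how many times the driver chose to sleep
};

// Poll `iterations` times. Sleep only when the previous poll reported that
// nothing is left; otherwise go straight back to polling.
DriveStats drive(PollLoop& loop, int iterations,
                 const std::function<void()>& idle_sleep) {
    DriveStats stats;
    for (int i = 0; i < iterations; ++i) {
        ++stats.polls;
        if (loop.run_nowait() != 0) continue;  // more expected: poll again now
        idle_sleep();
        ++stats.sleeps;
    }
    return stats;
}
```

One caveat, going only by the quoted wording: a server that keeps sockets or timers open always has active handles, so the real return value would probably stay non-zero and this pattern on its own could spin without ever sleeping.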

Huppy 01-02-2020 02:18 PM

I'm assuming that for anyone using older code (a few months old or more), where zone/main.cpp did not exist, this diff would be applied to zone/net.cpp instead? Would that work?

peterigz 01-02-2020 04:02 PM

I think you'd probably be OK to do that, Huppy. You just have to be careful to merge in only the relevant changes; I'm not sure how much has changed in net/main.cpp since your version.

Quote:

We might just need to actually check the return value and run the event loop sooner.

I think letting LibUV handle the main loop simplifies things. Those sleeps are gone now, but I didn't notice any increase in CPU usage; if anything, it might be slightly lower for us now.

chrsschb 01-03-2020 12:53 AM

Anyone else tested this yet?

FievelMousey 01-03-2020 02:58 PM

Nope, but it would be cool if it works and gets pushed to the main source soon.

Akkadius 01-03-2020 08:37 PM

Hey Peter!

Thanks for looking into this; it is always appreciated when people take the initiative to improve the codebase for the community.

The UV library definitely polls the underlying system differently on each platform, so it would make sense that this could cause, to some degree, the symptoms we've been seeing on the Windows platform.

The reason the sleeps were in place is that KLS was still going through refactoring at the time; they can probably be removed.

As for testing, I'd like to see other populated Windows servers give this a try, and then test it on a branch via PEQ. I see no reason not to at least run the loop in the default mode, which is what we should be doing anyway; for the reasons mentioned above, KLS left it as is for the time being.

I know you mentioned that it seems to have worked after your changes, but we have also had confirmed fixes on Windows servers with over 100 people in a zone, and then eventually someone runs into the symptoms again. So, to be sure this is the issue we were looking into, I'd just want a few others to validate it.

I think most servers with 30+ players (not counting p99) are on Linux at the moment, so I'm not sure who could test this, but either way they can report their findings here and we can go from there.

Nice find!

FievelMousey 01-03-2020 08:59 PM

I'm not one who likes to add changes into files myself. If there were a branch that had this fix in it, I'd mess with it.

chrsschb 01-03-2020 09:35 PM

I have a Windows server, so if I get time this weekend I'll give it a run.

Edit: Pushed changes to server this morning. It will probably be tonight before enough people are online to see results.

chrsschb 01-05-2020 01:05 AM

30+ in zone, plus pets and 24 bots. No lag. As Peter found, it actually seems to be running a little better.


All times are GMT -4.