Support::Windows Servers Support forum for Windows EQEMu users.

#16
02-16-2019, 12:16 AM
ptarp
Fire Beetle
Join Date: Jan 2010
Location: Idaho
Posts: 27

Yes, this is different. The server is running on an i5 and CPU usage stays low, around 6%. The issue for me seems to be that entity_list.Process() is taking too long: by the time it gets through the whole list of clients, the first one is starving for packets. Each added client increases the ms reading reported by EQ (press F11 to show it in the top-left corner).
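
To illustrate the starvation effect described above, here is a toy model of a single-threaded loop that services clients one after another. It is not EQEmu code: `first_client_turn_times`, `turn_gap_ms` and the per-client cost are hypothetical, and the only point is the arithmetic. If every client costs a fixed amount of time per pass, the wait between two turns of the same client grows linearly with the number of clients.

```cpp
#include <cassert>
#include <vector>

// Toy model, not EQEmu code: a single-threaded loop visits every client once
// per pass. Returns the simulated time (ms) at which client 0 gets each turn.
std::vector<double> first_client_turn_times(int clients, double per_client_ms, int passes) {
    assert(clients > 0 && per_client_ms >= 0.0 && passes >= 0);
    std::vector<double> turns;
    double now_ms = 0.0;
    for (int pass = 0; pass < passes; ++pass) {
        for (int c = 0; c < clients; ++c) {
            if (c == 0) turns.push_back(now_ms);  // client 0 is serviced here
            now_ms += per_client_ms;              // simulated work for this client
        }
    }
    return turns;
}

// Gap between two consecutive turns of the same client.
double turn_gap_ms(int clients, double per_client_ms) {
    std::vector<double> t = first_client_turn_times(clients, per_client_ms, 2);
    return t[1] - t[0];
}
```

With made-up inputs, `turn_gap_ms(10, 2.0)` gives 20 ms and `turn_gap_ms(25, 2.0)` gives 50 ms, so the first client waits longer for its next turn with every client added.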

This same thing will affect all operating systems. Win 10 seems better than Windows Server, but it is still not working well.

I enabled MySQL logging to disk. Logging was on a secondary SSD, with the MySQL data files on drive C: and the server folder on D:. That's how I saw how many times per second MySQL was being accessed. Some of it may not take that much time, but all together it's a DoS bomb for the drive, even an SSD like mine. Turn it on and look at a zone with more than 24 or 25 clients in it, and you'll see what I mean. Look at the times for the first client going through Client::Save and compare them to the last.
Since I'm logging on a separate hard drive, performance doesn't change when I turn logging on/off.
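
One way to make that first-versus-last comparison directly is to time each call in the process itself. The helper below is a minimal sketch using only the C++ standard library; `time_call_ms` is a hypothetical name, and the `save_client` call in the usage comment is a stand-in, not an EQEmu function.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `op` once and returns how long it took, in milliseconds.
// steady_clock is monotonic, so the measured duration is never negative.
double time_call_ms(const std::function<void()>& op) {
    assert(static_cast<bool>(op));  // an empty std::function cannot be called
    auto start = std::chrono::steady_clock::now();
    op();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical usage (save_client is an assumed stand-in for the real call):
//   double first_ms = time_call_ms([&] { save_client(clients.front()); });
//   double last_ms  = time_call_ms([&] { save_client(clients.back()); });
```

Logging `first_ms` and `last_ms` side by side would show whether the save itself gets slower down the list or only starts later.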

I recommend you think about dealing with this before you worry about re-send logic.
Correct the issues I'm talking about, and your re-send issues may go away.
Hope this helps.

#17
02-16-2019, 01:27 PM
Akkadius
Administrator
Join Date: Feb 2009
Location: MN
Posts: 2,071

Again, these are completely unrelated.

Just because you saw a bunch of disk activity and a bunch of queries in a file doesn't mean that it's the reason for the lag. If you have an improperly tuned MySQL server, along with something enabled that is pegging it, that is another thing, and I'm happy to help diagnose that with you.

I want you to contrast all of what you observed with PEQ's disk activity:

http://peq.akkadius.com:19999/#menu_...late;help=true

PEQ has over 800 players right now and stays at maybe around 1MB/s of writes, if that, with occasional bursts. IO operations stay very low, even for 800 players.

Client::Save is a very light operation; there's maybe a handful of INSERTs or REPLACE INTOs, all of which take under 10ms. We could use fewer Client::Save calls in general, but it really isn't the problem here.

You don't need to turn on the MySQL general log when you can see exactly what a zone process is doing by enabling MySQL logging at the process level. Even if you pipe the general log to another drive, it is still overhead for the MySQL process.

https://github.com/EQEmu/Server/wiki...-System#gm-say

In the `logsys_categories` table you can shut off any category you are piping to file.

Back to the Network Issue

We know exactly what's going on with the network issue because we've taken CPU snapshot profiles during the problem. It's just not a quick "fix", and we typically choose to go through a very careful staged approach before reintroducing this into mainline because of the complex factors involved.

The reason we've seen this far less on PEQ is that PEQ has an OC'ed 5GHz processor, DDR4 memory and NVMe datacenter SSDs. When a zone process goes into resend storm logic, it can keep up with the very aggressive resend logic just enough until the client either disconnects because of its own terrible connection or recovers.

There is still a breaking point with our hardware, however; it just takes a lot more to get there. If we had over 100 toons in a zone on PEQ and something produced enough resend logic (like raid combat spam burning), it would trip the same inflection point that most folks are seeing on their Windows nodes at 20-40 people in a zone with 2.6GHz-ish processors and whatever else they're using on their boxes. Even with over 100 toons it is still very rare to see it, just because of the very tight hardware being used.
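
The breaking point described above can be pictured with a toy budget model. This is not EQEmu's netcode: the function names, the tick budget and every cost below are invented for illustration. The only idea taken from the post is that resends add work per client, and that faster hardware (a lower cost per unit of work) moves the breaking point to a higher client count without removing it.

```cpp
#include <cassert>

// Toy model, not EQEmu's netcode. All parameters are made-up inputs.
// Work per tick for one client = base cost + cost of its pending resends.
double tick_work_ms(int clients, double base_ms, int resends_per_client, double resend_ms) {
    assert(clients >= 0 && base_ms >= 0.0 && resends_per_client >= 0 && resend_ms >= 0.0);
    return clients * (base_ms + resends_per_client * resend_ms);
}

// Largest client count whose per-tick work still fits inside the tick budget.
int max_clients_within_budget(double budget_ms, double base_ms,
                              int resends_per_client, double resend_ms) {
    assert(budget_ms >= 0.0);
    // A zero per-client cost would never exhaust the budget, so reject it.
    assert(base_ms + resends_per_client * resend_ms > 0.0);
    int clients = 0;
    while (tick_work_ms(clients + 1, base_ms, resends_per_client, resend_ms) <= budget_ms) {
        ++clients;
    }
    return clients;
}
```

With an invented 50 ms budget, a "slow" box at 1.0 ms base cost and two resends of 0.75 ms each tops out at 20 clients, while a "fast" box at 0.25 ms and 0.125 ms reaches 100. The inputs were picked to echo the ranges in the post, not measured.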

Regardless, you shouldn't need the above specs to run a server; that is not the point at all. The point is why we haven't run into this issue until now: most of our code QA goes through PEQ, and our hardware has been masking the problem. Before we released the new netcode overhaul to mainline, we went through several iterations of issues and drastically improved our overall netcode utilization, which I am still super stoked about to this day. We just have this one issue plaguing people, and we will have it resolved soon, so stay tuned for updates.

#18
02-16-2019, 09:30 PM
Drakiyth
Dragon
Join Date: Apr 2012
Posts: 549

Akkadius,

I just want to say that the Varlyndria players and I really appreciate everything you and the main EQ devs are doing to fix this lag issue. I can only imagine the frustration it brings. One thing I have done for my hub zone is create public instances that players can travel to. This helps relieve congestion if lag starts occurring in the non-instanced zone. I encourage any server owner to do the same while this issue remains.
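
As a rough sketch of the idea of spreading players across public instances, the helper below picks the least-populated instance that is still under a cap. It is hypothetical and not part of EQEmu: `pick_instance`, the population list and the cap are all assumptions, and how a real server creates instances or moves players into them is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of EQEmu: given the current population of each
// public instance of a hub zone, return the index of the least-populated
// instance that is still under `cap`, or -1 if every instance is full.
int pick_instance(const std::vector<int>& populations, int cap) {
    assert(cap > 0);
    int best = -1;
    for (std::size_t i = 0; i < populations.size(); ++i) {
        if (populations[i] >= cap) continue;  // instance is full, skip it
        if (best == -1 || populations[i] < populations[best]) {
            best = static_cast<int>(i);       // emptier instance found
        }
    }
    return best;
}
```

A return value of -1 would be the signal to open another instance; ties go to the earliest instance in the list.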

Here is to a quick recovery so we can all once again enjoy a solid number of players in the same zone with no issues.

#19
02-25-2019, 07:53 PM
eldarian
Fire Beetle
Join Date: May 2017
Posts: 25

Has there been any new progress on this issue? It's very frustrating that a processor commonly used for hosting is causing this much turmoil.

#20
02-25-2019, 08:33 PM
Akkadius
Administrator
Join Date: Feb 2009
Location: MN
Posts: 2,071

The update is that we've had it on PEQ and we're making additional tweaks that go live tomorrow. This takes time to test until we feel it's ready to go back into mainline.

#21
02-25-2019, 08:35 PM
eldarian
Fire Beetle
Join Date: May 2017
Posts: 25

I know AEQ would be very happy to be your test server for this fix; the community has communicated as much to me. Feel free to reach out to me and we can do what we need to do to test it in operation.

#22
03-02-2019, 06:50 PM
Akkadius
Administrator
Join Date: Feb 2009
Location: MN
Posts: 2,071

We pushed changes last night that have been tested on PEQ for over a week with 800+ toons with no issues. They were also tested on Legacy of Norrath before they shut down.

https://ci.appveyor.com/api/projects...86-no-bots.zip

Give that a whirl

#23
03-02-2019, 09:48 PM
Drakiyth
Dragon
Join Date: Apr 2012
Posts: 549

I plan to add this tomorrow morning to Varlyndria. We all thank you for this fix.

#24
03-03-2019, 11:16 AM
eldarian
Fire Beetle
Join Date: May 2017
Posts: 25

Let me know if this fix worked for you in any areas.

#25
03-03-2019, 05:08 PM
Drakiyth
Dragon
Join Date: Apr 2012
Posts: 549

I added the source code from Akkadius' link above to Varlyndria early this morning and then ran a stress test on the server, which didn't go so well. I even tried the pull method to get the latest unstable source in the folder. The stress test in Nexus started bugging out at 18+ players, when the spike came back. It does appear to be better than it was before, but not what I was expecting. Varlyndria is currently on an AWS T3 Large Windows system. It has held over 118 clients online plus pets just fine, as long as they are spread across different zones/instances of the high-traffic hubs with under 11 in each (on average). Now the limit seems to be 16-18 or so, but the lag does come back full force and spikes the zone out badly, eventually crashing it or forcing me to shut it down.

I've heard from a source on my Discord that developers running Linux are having more luck with it. I've been running Windows since I started with EQEmu and I've never seen an issue like this before, aside from not having enough connection speed to handle the player population.

At this point, I am undecided about whether to test if a stronger connection than T3 Large would produce better results with the change.



Any professional advice that can be given on the situation would be helpful.

#26
03-04-2019, 09:38 AM
ptarp
Fire Beetle
Join Date: Jan 2010
Location: Idaho
Posts: 27

There just appears to be something in the Windows compile that's making it "hiccup". I'm wondering if I have to switch to Linux.

#27
03-04-2019, 11:02 AM
Maze_EQ
Demi-God
Join Date: Mar 2012
Posts: 1,106

Quote:
Originally Posted by Akkadius View Post
We pushed changes last night that have been tested on PEQ for over a week with 800+ toons with no issues. Also tested on Legacy of Norrath before they shut down

https://ci.appveyor.com/api/projects...86-no-bots.zip

Give that a whirl
It worked on our dev build with 80 clients in the same zone.

Our dev environment previously couldn't handle 20+.
__________________
"No, thanks, man. I don't want you fucking up my life, too."

Skype:
Comerian1

#28
03-04-2019, 12:36 PM
ptarp
Fire Beetle
Join Date: Jan 2010
Location: Idaho
Posts: 27

You're running Windows?

#29
03-04-2019, 02:44 PM
Akkadius
Administrator
Join Date: Feb 2009
Location: MN
Posts: 2,071

So, we could be dealing with a few factors here. While the resend issue was a very valid issue that we took care of, I have a hunch that something specific to Windows is having an influence here.

I have another question for you guys: where have you been getting your binaries?

Have you been compiling them yourselves? In the past few months we switched our main source of Windows binary updates from our CI system, and I just want to rule out a bad or poorly performing library or compilation setting.

At the end of the day, you should be able to run on either Windows or Linux. We'll get it figured out.

#30
03-04-2019, 04:01 PM
Maze_EQ
Demi-God
Join Date: Mar 2012
Posts: 1,106

I built these myself.

I'll see if I can repro with your installer.

All times are GMT -4.

Everquest is a registered trademark of Daybreak Game Company LLC.
EQEmulator is not associated or affiliated in any way with Daybreak Game Company LLC.
Except where otherwise noted, this site is licensed under a Creative Commons License.