Thread: lag problems
  #44  
Old 03-08-2019, 06:00 PM
Akkadius
Administrator
 
Join Date: Feb 2009
Location: MN
Posts: 2,071

Quote:
Originally Posted by Rekka
From what I can tell from just looking at the code, the locking isn't at the database layer per se; it's at the MySQL connection per zone in zonedb.cpp. There is only one connection per zone, at least from what I see.

Fsync is a delay or latency issue on the DB when dealing with transactions. Every single query in save is a transaction (all 13+ of them). You can have a system that can do 500 transactions per sec but can do 100,000 inserts per sec if you bulk up statements.

A small latency can have a massive impact when locks are involved.

If there is latency at the DB, it queues up on the zone depending on how many people are in each zone. The fewer people in the zone, the less this impacts them.

Lowering the latency by limiting the fsync on the transaction call can ease the pressure on the lock on the connection which prevents stalling of the character saved. Or that is at least the idea.

**Note** this lock prevents all queries in the zone, not just during saves.

Also note the removal of the table scanning of the pet tables (adding indexes) helps to lower latency of the call as well

It can be easy to confuse work with latency/locks. You can have a slow system doing no work.

Honestly I would like to do more work on this and know it's a stop gap, but figured doing 13x fewer transactions per save was a win when someone in this thread noted that commenting out some of the saves improved their latency. (Mine improved 2-3x.)

I know it can be many issues and I may be barking up the wrong tree, but this is simply another option that does have a very clear improvement in performance around the zone locks.

Side note: a single MySQL connection for a process is generally a less than ideal situation. It is too much of a blockage area for network IO. Locks should be held for nano/microseconds, not milliseconds. Possibly make separate connections for reads/writes depending on how the threading is set up on the zone process. (Note I have not really looked at the threading model of the zone yet, so this may be moot and may be my misunderstanding.)

Note: on a phone, so sorry for formatting/bad grammar. Very small window.

Rekka, thank you so much for spending the time to analyze things and trying to help. We always appreciate folks who take the initiative to contribute to the project.

There are quite a few things I want to highlight about this, though, and contrast with what the real problems are here.

On the Performance Standpoint

EQEmu is not database heavy at all. Once upon a time we did all kinds of stupid things, but 100's of hours have been poured into reducing our performance bottlenecks across the board in CPU, I/O and even network. To illustrate, here is PEQ: http://peq.akkadius.com:19999/#menu_...late;help=true

With PEQ at 800-1000 toons on the daily, we barely break 100 IOPS, with barely 1MB/s in writes and minimal spikes here and there. That is virtually nothing.

Also, with your benchmark (I assume this was you on the PR), the current stock code on some middle-of-the-line server hardware produces the following timings:

HTML Code:
[Debug] ZoneDatabase::SaveCharacterData 1, done... Took 0.000107 seconds
[Debug] ZoneDatabase::SaveCharacterData 1, done... Took 0.000091 seconds

This is a sub-1ms operation that is not going to hurt even when it is synchronous.

(These timings will depend entirely on your hardware and MySQL server configuration, of course.)

These operations also happen so infrequently that it is not even going to matter.
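
To put "very infrequently" in numbers, here is a rough back-of-the-envelope sketch in Python. Only the 0.000107 second figure comes from the timings above; the player count and the save interval are hypothetical values picked for illustration, not EQEmu defaults.

```python
# Back-of-the-envelope check: what share of a zone's wall-clock time goes to
# synchronous character saves? Only SAVE_SECONDS comes from the log above;
# the player count and save interval are hypothetical illustration values.

SAVE_SECONDS = 0.000107  # slower of the two SaveCharacterData timings quoted above


def save_time_share(players_in_zone: int, save_interval_seconds: float,
                    seconds_per_save: float = SAVE_SECONDS) -> float:
    """Return the fraction of wall-clock time the zone spends blocked in saves."""
    if players_in_zone < 0 or save_interval_seconds <= 0:
        raise ValueError("need players_in_zone >= 0 and save_interval_seconds > 0")
    saves_per_second = players_in_zone / save_interval_seconds
    return saves_per_second * seconds_per_save


if __name__ == "__main__":
    # Hypothetical load: 100 players in one zone, each saved once every 60 seconds.
    share = save_time_share(players_in_zone=100, save_interval_seconds=60.0)
    print(f"{share:.6%} of zone time spent in saves")
```

With those made-up inputs the zone would spend roughly 0.018% of its time in saves. Plug in your own timings and player counts to see where your server lands.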

There are many, many factors that play into the overall performance of a server. Since the server is essentially an infinite loop, anything within that loop (network, I/O, overly CPU-intensive operations, etc.) can influence how much time the CPU spends working instead of idling. Hardware is of influence, your MySQL server configuration is of influence and, most of all, software is of influence.
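
One way to picture the loop budget is the sketch below. It assumes a fixed tick length, which is a simplification and not necessarily how the zone process is structured, and every cost in it is a hypothetical number rather than a measurement.

```python
# Sketch of a fixed-tick server loop's time budget. All numbers are
# hypothetical; the point is only that every kind of work shares one loop.

def idle_fraction(tick_budget_seconds: float, work_seconds_per_tick: dict) -> float:
    """Fraction of each tick left idle after all work in the loop is done.

    A result of 0.0 means the loop is saturated and ticks are running late.
    """
    if tick_budget_seconds <= 0:
        raise ValueError("tick_budget_seconds must be positive")
    busy = sum(work_seconds_per_tick.values())
    return max(0.0, 1.0 - busy / tick_budget_seconds)


if __name__ == "__main__":
    # Hypothetical per-tick costs, in seconds, for a loop ticking every 10 ms.
    work = {"network": 0.002, "database": 0.0001, "pathing": 0.003}
    print(f"idle: {idle_fraction(0.010, work):.1%}")
```

In this made-up breakdown the database is the smallest slice of the tick, so shaving it further would barely move the idle fraction, while a spike in any other slice eats into the same budget.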

EQEmu used to be way, way, way more resource intensive and we've come a long way, to the point where that is not even an issue anymore. We have one outstanding bug that is isolated to the networking layer and that made its way through because we never saw it on PEQ during our normal QA routines.

We are currently working on the code to measure application layer network stats so folks can give us metric dumps to work from and we can put together a proper fix. We've seen what happens during the network anomaly in a CPU profile, and on its own a profile is not going to show much beyond where the process is spending most of its time.

We folks at EQEmu definitely have jobs that have been a deterrent to resolving said bug, but we will have it on lockdown soon enough as we know exactly what we need to do. The only thing in our way is time as a resource.

We are not grasping at straws to fix this, folks, so please just be patient, as this is just not a quick fix with our schedules.

Quote:
Originally Posted by ptarp
As another test.. Use the same binaries, everything else.. but in the /Maps/nav directory, create a subdirectory. Something like /Maps/nav/removed

Move all of the files from the /nav directory into the new subdirectory. Then run the server again.. Lag goes away for me.

NOTE: I'm working with highly customized server code and don't have the latest update.

As a second test, I turned off .mmf file loading and left .nav files in the /nav directory. Either solution worked for me.

Nav may "help" lag because you have fewer position updates being sent around from mobs not pathing or pathing less frequently, along with fewer CPU-intensive path calculations. Again, I will defer to my statements above: folks, just be patient and we'll have a fix when we have the time.