Speed optimizations?
I saw it mentioned on GW2 that server lag was mainly due to CPU load rather than too small a network pipe, so I spent some time looking through the server code for places to speed it up. Without a profiler it's really hard to tell where the problem is, so I decided to simply poke around. My understanding is that there's a thread for each client connected to a zone server, and one or more threads that handle communication to and from the socket port.
I could be way off base, but one thing that really bothered me was all the memory allocations and deallocations. I had a real hard time finding a way to avoid them, but one possibility might be in the EQNetworkPacket class. It seems that the data area is allocated to match the number of bytes to be sent, but I'm wondering if it might be better to define it as a static array of 512 bytes (the maximum) instead. That sacrifices some memory, but the data area would be allocated along with the class, and maybe it would save some time. Another possibility (though you probably won't like this one) would be to make APPLAYER a struct instead of a class and preallocate the data area at the same time as the struct. You'd have to change all of its methods to procedures, but it would knock out half of the memory allocations and deallocations... WC |
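A rough sketch of the fixed-buffer idea from the post above. This is not the real EQNetworkPacket class - the name, fields, and 512-byte limit are taken from the post, everything else is illustrative - but it shows the trade: the buffer lives inside the object, so there is no separate heap allocation per packet, at the cost of always reserving the maximum size.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical packet class with an embedded fixed-size buffer instead of a
// per-packet new[]/delete[] of a buffer sized to the payload.
class FixedPacket {
public:
    static const size_t kMaxSize = 512;   // protocol maximum from the post

    // Copies the payload into the embedded buffer; no heap allocation.
    bool SetData(const uint8_t* src, size_t len) {
        if (len > kMaxSize)
            return false;                 // caller must fragment oversized payloads
        std::memcpy(data_, src, len);
        size_ = len;
        return true;
    }

    size_t size() const { return size_; }
    const uint8_t* data() const { return data_; }

private:
    uint8_t data_[kMaxSize];              // allocated along with the object
    size_t size_ = 0;
};
```

The downside is that every packet object occupies 512 bytes even for tiny payloads, which matters if many packets are queued at once.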
Optimizing stuff like this is fun. If I were a betting man, I'd put money on the line-of-sight calculations as the guilty party for large CPU loads. That area could probably use optimization, but ultimately there comes a point where the code can't be optimized further and the load should be handled by > 1 server.
Maybe the people who run guildwars can set up a profiler of some sort to determine what causes the load. The quickest and easiest profiling is to see which thread is getting the most attention. There's only a limited number of threads in the app, and all of the line-of-sight calcs are done on one thread. For APPLAYER in particular, my guess would be to make the object smarter about bearing the cost of its memory allocation. Most instances of the object end up being copied several times before making it to the network pipe. Doodman had a good idea along these lines a while back, maybe he could share it here :) |
Another thing I ran across was code somewhere that did a linear search through some sort of door list to find one with a certain ID. A binary search would be far more efficient (O(log n) vs. O(n)).
WC |
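For illustration, a minimal sketch of the binary-search suggestion, assuming the door list were a vector kept sorted by ID (the `Door` struct and field names here are made up; the real list may look quite different - and, as pointed out further down, it is not currently sorted at all):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical door record; only the id field matters for the lookup.
struct Door {
    int id;
};

// Binary search over a vector sorted by id: O(log n) comparisons instead of
// the O(n) linear walk. Returns nullptr if no door has the given id.
const Door* FindDoor(const std::vector<Door>& doors, int id) {
    auto it = std::lower_bound(doors.begin(), doors.end(), id,
        [](const Door& d, int key) { return d.id < key; });
    if (it != doors.end() && it->id == id)
        return &*it;
    return nullptr;
}
```

The catch is that the list must be kept sorted on every insert, so this only pays off if lookups vastly outnumber additions.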
We reduced a lot of the search problems when we split the list up into the multiple components... Every search through the entity list for ANYTHING used to run through the entire entity list, so it was insane..
I haven't done any profiling recently, but I'm sure the CheckCloseArgo function and its descendant function calls are still at the top of the list. I put a very low-cost distance-calc filter at the top of that function that cuts out 95% of the extra calls, but it could probably still use some work. More efficient structures would be nice in some places too, rather than a list. We talked about red-black BSTs at one time, which would make searches much more efficient.. |
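The low-cost distance filter mentioned above is likely something along these lines (a guess at the technique, not the actual CheckCloseArgo code): compare squared distance against a squared radius, so the common "too far away" case never pays for a sqrt().

```cpp
#include <cassert>

// Cheap pre-filter: squared-distance comparison avoids sqrt() entirely.
// 2D for brevity; the same idea extends to 3D with one more term.
inline bool WithinRadius2D(float x1, float y1, float x2, float y2,
                           float radius) {
    float dx = x1 - x2;
    float dy = y1 - y2;
    return dx * dx + dy * dy <= radius * radius;
}
```

Anything that fails this test can be skipped before the expensive aggro logic runs, which is how a one-line check can cut out the bulk of the calls.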
You can't do a binary search on an unsorted list or a non-ordinal list. Ours is currently the former. You also run into the problem of what to sort by. Searching by name is certainly the most expensive, so we could sort by name and speed that up the most. However, if we look things up by EntID or ID more often, it wouldn't help in the end. It's a sticky situation. APPLAYER is already smart about its memory and allocates it itself. You cannot easily get around the news and deletes of APPLAYERs; even if you statically allocate them, there will still be news and deletes in basically the same places (think C99 and compliant compilers).
|
Here is a piece of code that might help.
Basically, it pre-computes the neighborhood of every mob in the zone every x seconds. This is quite fast, and --linear--. In Entity::AggroCheckClose, instead of walking all of mob_list and checking for neighbors, it uses that cache. Below are the two snippets. I used DoorProcess to update the neighborhood, as it is called in most zones (if not all) whatever the players in it. Thus, every 5 secs, an update of the terrain (a static singleton of the Terrain class) is performed. After it has been called, terrain hopefully has a list (hashtable) of all mobs within a given range, for each mob. I know this is not the right place to do it, but for the moment I didn't want to get too intrusive and add pieces of code in net.cpp or wherever. Code:
--- ../../cvs/eqemu/eqemu/Source/zone/entity.cpp 2004-02-02 13:55:07.000000000 +0000
When checking for an AggroRadius match, get the vector of mobs that were matched as neighbors, and use this list instead of the full mob_list to check for aggro. Code:
--- ../../cvs/eqemu/eqemu/Source/zone/MobAI.cpp 2004-02-02 13:54:59.000000000 +0000
http://perso.wanadoo.fr/afou/khalzed...ighbors.tar.gz
You need to add -DKD and -DKDSTAT to your COPTS in the zone makefile; tune verbosity by editing the VERBOSE define in Terrain.cpp. Typical output: Code:
... Code:
In EntityList::DoorProcess, time is 5.0.
Drawbacks:
* You must select a big-enough value for the cache range checking. On a 'typical' DB: Code:
mysql> select aggroradius, count(id) from npc_types group by aggroradius;
* Use a better timer. Once every second should be fine.
* Nothing ensures that the first call to the cache is valid, so the first checks may abort. It does not seem to have any impact, as processing goes on normally after the cache has been set.
* The code was written from scratch, or translated from Java. It has memory leaks (at the very least the old hashtable is not freed before an update, but most probably more). However, the interfaces are small, so one could easily substitute the Hashtable and Vector classes for more reliable ones. As it stands, the code gets SIGSEGV after a few minutes. =(
* Some comments are out of date and irrelevant :shock:
Besides these (little) unpleasant effects, the results are good. Very good. |
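Since the tarball link is dead-truncated, here is a self-contained sketch of the caching idea the post describes, with illustrative names (the real code hangs off Terrain/DoorProcess and uses its own Hashtable/Vector classes): every few seconds, bucket all mobs into a coarse grid whose cell size matches the cache range; then a mob's neighbor candidates are just the mobs in its own and adjacent cells, instead of the whole mob_list.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Mob { int id; float x, y; };

// Periodically rebuilt grid of mobs; lookups return only nearby candidates.
class NeighborCache {
public:
    explicit NeighborCache(float cellSize) : cell_(cellSize) {}

    // Call every few seconds (the post uses 5 s via DoorProcess).
    // One linear pass over all mobs.
    void Rebuild(const std::vector<Mob>& mobs) {
        grid_.clear();
        for (const Mob& m : mobs)
            grid_[CellOf(m.x, m.y)].push_back(m);
    }

    // Candidate neighbors: mobs in the 3x3 block of cells around (x, y).
    // Callers still apply the exact radius test to these few candidates.
    std::vector<Mob> Near(float x, float y) const {
        std::vector<Mob> out;
        std::pair<int, int> c = CellOf(x, y);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = grid_.find({c.first + dx, c.second + dy});
                if (it != grid_.end())
                    out.insert(out.end(), it->second.begin(), it->second.end());
            }
        return out;
    }

private:
    std::pair<int, int> CellOf(float x, float y) const {
        return { (int)std::floor(x / cell_), (int)std::floor(y / cell_) };
    }

    float cell_;
    std::map<std::pair<int, int>, std::vector<Mob>> grid_;
};
```

The cell size must be at least the largest aggro radius in the DB (the "big-enough value" drawback above), otherwise neighbors in the next-but-one cell are missed.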
Worked on it a bit more. The link from the previous post is updated.
Some memory leaks are fixed, and a few changes made to the support code. Most zones work fine (as far as tested; only qeynos crashes, after 4-5 mins, always on a pet...). I'd be interested in some feedback, if any. |
I have never run EQEMU under a profiler. That being said, I still thought I'd mention a couple of things. Ancillary data structures of pointers to entities can be used to satisfy the requirement of multiple sort criteria. Adding non-standard containers (with the possible exception of a hashtable) to the code is probably not wise - especially if they are not proven or have unusual semantics. The caching suggestion is interesting and I'd love to see it under a profiler. I'm wondering if it would lead to a natural extension using some sort of bounding volumes. Once you delineate the caching "terrain" for a given entity, it seems like a natural extension to use that as a bounding volume to simplify in/out calculations. If one could assume square bounding boxes, the range calculations would be so fast as to probably eliminate the need for caching (although it would open up some interesting new strategies - like sneaking up on a mob's diagonal).
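A quick sketch of why the square-bounding-box test is so cheap (names illustrative): the in-range check is just two subtracts, two fabs() calls, and two comparisons - no multiply and no sqrt - which is also why a mob's "diagonal" is a blind spot relative to a circular radius.

```cpp
#include <cassert>
#include <cmath>

// Point-in-square test for an axis-aligned square of half-side `halfSide`
// centered at (cx, cy). No multiplication, no square root.
inline bool InSquare(float cx, float cy, float halfSide, float px, float py) {
    return std::fabs(px - cx) <= halfSide &&
           std::fabs(py - cy) <= halfSide;
}
```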
|
As for the profiler, here is an excerpt of profiler output (gprof):
Code:
Flat profile: http://perso.wanadoo.fr/afou/khalzed...qzone.gmon.out http://perso.wanadoo.fr/afou/khalzed...dzone.gmon.out or all in a pack: http://perso.wanadoo.fr/afou/khalzed...e-files.tar.gz These are short runs, with one player if any. If you don't bother to read them: caching as it is now cuts CPU load by a rough 50%, depending on the zone (AICheckCloseArrgo cumulatively accounts for 60-70% of the non-cached version, divided by 10 or so in the cached version). I'm not sure about the oddity of a hash or vector class instead of inline manipulation. I don't know about bounding-box extensions. It's like making a recursive function non-recursive, or vice versa: it's always possible, but... I'll work on it afterwards. |
Quote:
Maybe I was unnecessarily confusing by relating this to your plans. It just seemed to me that you'd already have to deal with bounding volumes in order to generate your caches. EDIT: ack! what am I thinking? More like 3 multiplies, 3 subtracts, an add, and a comparison (for 2D, and the adds/subtracts are the expensive ops for floats). |
Quote:
In the Terrain code, it is an exact-distance calc. However, there is little overhead, because: - mobs in the corner (pass the square test, fail the exact test) are not checked against again later in AICheckCloseArrgo, so time is saved in the end - even though caching uses CPU (about 3% in very heavy zones), it wipes out a lot of tests afterwards. |
Quote:
|
The reason for the CPU usage problem is more so the memory leaks: when RAM usage gets to the point where the machine starts using the hard drive as RAM (pagefile), that's when the 'shit hits the fan'. Other than that, it's more the fact that I am currently running 15 STATIC zones (not bootable) or more per computer (this changes tonight, as I'm setting up the new computers). And the computers are outdated (P3 733, Athlon 600); the 'good' computers are XP 1700 and XP 2000. If you put enough on them, they obviously can't handle it. But compared to what EQEMu was a while ago, we are doing much better IMO.
|
Quote:
I can run up to 10 zones like kaladima, kaladimb, nexus - i.e., up to 50 NPCs per zone - averaging 10M memory and 1 to 5% CPU (that's what top says). But when it comes to more populated zones, e.g. around 150 mobs, it goes up to 40-50% CPU, even more with pets spawning. I could not run any zone with more than 200 mobs before this cache, due to CPU at first glance. The load, as far as AICheckCloseArrgo is concerned, is N*N; thus for 150 mobs it accounts for 70% of the zone load. Anyway, with the cache, I could run 500+ mobs in a zone on that very machine ;) Memory then is about 40-50M, growing up slowly. |
Even so, the idea of bounding boxes is growing on me. It is looking like a really elegant solution to the line of sight problem. In the simplest case, you attach a vector of rectangles to each mob. One rectangle would be the mob's aggro area and the others would be facets cut away by occluding intercepts like walls. Determining line-of-sight then becomes a simple matter of taking the difference of our rectangles and testing for inclusion of a point. Extend to 3d as necessary. The math is simple and the bounding areas/volumes can be precomputed and stored in the db since mobs always spawn in the same location AFAIK. Contrast this to the existing method (or at least the method I saw suggested by Wiz), which says that if you want to find out if you can see from a to b you should take delta-sized steps in the direction of b and check at each step to see if you're standing on a wall (I doubt there is an acceptable delta in terms of both performance and wall-thickness).
The prime need for line of sight calculations that I'm aware of right now stems from the fact that radial distances are the only way to specify aggro distance. I.e... in a dungeon, you can't specify that a monster in a corner should only attack critters in the same room. Even without using multiple rectangles per mob, moving to a rectangular bounding area would allow more intelligent specification of aggro ranges (IMHO). |
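A hedged sketch of the rectangle-difference idea from the post above (all names illustrative; the real boxes would be precomputed per spawn point and stored in the DB as suggested): a mob's effective aggro area is one base rectangle minus a set of cut-away rectangles for occluded facets, and a target is aggroable iff it lies in the base box and in none of the cutouts.

```cpp
#include <cassert>
#include <vector>

// Axis-aligned rectangle with x1 < x2 and y1 < y2. Extend to 3D as needed.
struct Rect { float x1, y1, x2, y2; };

inline bool Contains(const Rect& r, float x, float y) {
    return x >= r.x1 && x <= r.x2 && y >= r.y1 && y <= r.y2;
}

// In the base aggro box, and in none of the occluded cutouts.
bool InAggroArea(const Rect& base, const std::vector<Rect>& cutouts,
                 float x, float y) {
    if (!Contains(base, x, y))
        return false;
    for (const Rect& c : cutouts)
        if (Contains(c, x, y))
            return false;
    return true;
}
```

Compare this with the step-along-the-ray method: instead of N delta-sized wall checks per query, it is a handful of comparisons against precomputed boxes.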
This sounds interesting.
Quote:
Generally speaking, this is a kind of BSP you refer to, clipped by the Aggro/Assist/whatever radius. Such BSPs exist in the client's zone files. It seems also that the map package in zone (will|might|could|ought to) be used to describe, and thus maybe partition, space. It uses *.map files, but I spent very little time looking at what it does, so maybe I'm mistaken. BTW, I found something out about caching: you must keep it up to date ;). I came across the issue of tracing despawns, and, yes, after a despawn, using the reference from the cache doesn't end up too well... Still some work to do on that topic. |
Quote:
I just downloaded and examined some of your code, and it is the funniest thing I've seen in a while. The "hashtable" in particular is the funniest data structure I've ever seen. You state in the code that it should have O(n) inserts and O(log n) searches; that's the first clue that something is seriously wrong. A proper hash table should have O(1) inserts, O(1) deletes, and O(1) lookups (assuming good hash functions). What you have is really a sorted vector. The really funny part, though, is when you realize that it isn't even a good sorted vector: you walk the vector to insert in sorted order instead of using the binary search function you've already created. The search function is also recursive, which is a little ugly. And all of these numbers are biased by using comparisons as the atomic operation; if you look at overall efficiency, the story gets considerably sadder. A good knowledge of data structures is the most dramatic difference between a good programmer and a poor one. In my experience, 80% of all programming tasks are solved by setting up appropriate data structures and then manipulating them (classes are data structures in this view). I commend you for trying to write your own, since that can be an excellent way to learn, but I still felt compelled to rip on you a bit for such a funny implementation. |
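For reference, the fix being suggested in the post above, sketched with the standard library (an int key stands in for whatever the real structure stores): a sorted vector should reuse its binary search to find the insertion point, rather than walking the whole vector.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert into a sorted vector: std::lower_bound finds the insertion point in
// O(log n) comparisons. (The element shift is still O(n), which is why this
// is a sorted vector, not a real hash table.)
void SortedInsert(std::vector<int>& v, int key) {
    v.insert(std::lower_bound(v.begin(), v.end(), key), key);
}

// Lookup reuses the same ordering: O(log n) comparisons.
bool SortedContains(const std::vector<int>& v, int key) {
    return std::binary_search(v.begin(), v.end(), key);
}
```

A true hash table (O(1) expected insert/delete/lookup) is a different structure entirely; a sorted vector like this is still a perfectly reasonable choice when lookups dominate and the container is rebuilt rarely.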
I won't take offense. Still, it saves over half the CPU....
The structures were built from scratch, and what is called a hashtable is really a keyed set - think of searching a 'flat' balanced binary tree. Whatever the names or code involved, setting up proper data structures felt like a waste of time to me at that point. I would encourage anyone willing to provide new structures to the project. It could even have used linked lists, as this -crappy- piece of code's efficiency does not come from its data structures. Well, good and efficient DS would help. |