EQEmulator Forums

EQEmulator Forums (https://www.eqemulator.org/forums/index.php)
-   Archive::Development (https://www.eqemulator.org/forums/forumdisplay.php?f=621)
-   -   Speed optimizations? (https://www.eqemulator.org/forums/showthread.php?t=11504)

Windcatcher 01-22-2004 05:09 PM

Speed optimizations?
 
I saw it said on GW2 that server lag was mainly due to CPU load rather than too small a network pipe, so I spent some time looking at the server code to look for places to speed up the code. Without a profiler it's really hard to tell where the problem is, so I decided to simply poke around. My understanding is that there's a thread for each client that's connected to a zone server, and one or more threads that handle communication to and from the socket port.

I could be way off base, but one thing that really bothered me was all the memory allocations and deallocations. Though I had a real hard time finding a way to avoid this, one possibility might be in the EQNetworkPacket class. It seems that the data area is allocated to match the amount of bytes to be sent, but I'm wondering if it might be better to define it as a static array of 512 bytes (the maximum) instead. It sacrifices some memory, but that way the data area will be allocated along with the class and maybe it will save some time.

Another possibility (though you probably won't like this one) would be to make APPLAYER a struct instead of a class and preallocate the data area at the same time as the struct. You'd have to change all of its methods to procedures, but it would knock out half of the memory allocations and deallocations...

WC

Merth 01-25-2004 03:03 PM

Optimizing stuff like this is fun. If I were a betting man, I'd put money on the line-of-sight calculations as the guilty party for large CPU loads. That area could probably use optimization, but ultimately there comes a point where the code can't be optimized further and the load should be handled by > 1 server.

Maybe the people who run guildwars can set up a profiler of some sort to determine what causes the load. The quickest and easiest profiling is to see which thread is getting the most attention. There's only a limited number of threads in the app, and all of the line-of-sight calcs are done on one thread.

For APPLAYER in particular, my guess would be to make the object smarter and bear the memory alloc. Most instances of the object end up being copied several times before making it to the network pipe. Doodman had a good idea along these lines a while back, maybe he could share it here :)

Windcatcher 01-25-2004 03:05 PM

Another thing I ran across was code somewhere that did a linear search through some sort of door list to find one with a certain ID. A binary search would be infinitely more efficient (log(n) vs. n).

WC

Trumpcard 01-25-2004 11:44 PM

We reduced alot of the search problems when we split the list up into the multiple componets... Every search through the entity list for ANYTHING would run through the entire entity list, so it was insane..

I havent done any profiling recently, but i'm sure the CheckCloseArgo function and its descendant function calls are still the top of the list. I put a very low cost distance calc. filter at the top of that function that cuts out 95% of the extra calls though, but it could probably still use some work.

More efficent structures would be nice too in some places rather than a list. We talked about red-black BST's at one time which would make searches much more efficient..

kathgar 01-26-2004 02:56 AM

You can't do a binary search on an unsorted list or a non ordinal list. Ours is currently the former. You also run into the problem of what to sort by? It's certainly the most expensive to search by name, so we could sort by name and speed that up the most. However, if we call by EntID or ID more, it wouldn't help in the in. It's a sticky stituation. APPLAYER is already smart about it's memory and allocates it itself. You cannot easily get around the news and deletes of applayers, even if you statically allocate them they are still going to be news and deletes in basically the same place (think C99 and compliant compilers).

smogo 02-05-2004 12:13 AM

here is a piece of code that might help.

Basically, pre-computes neighborhood for all mobs in zone every x seconds. This is quite fast, and --linear--
in Entity::AggroCheckClose, instead of parsing all mob_list and checking for neighbors, use that cache

Below are two snippets



i used DoorProcess to update neighborhood, as this is called in most zones (if not all) whatever the players in it. Thus, every 5 secs, an update of the terrain (a static singleton of Terrain class) if performed. After this has been called, terrain hopefully has a list (hastable) of all mobs within a given range, for each mob. i know this is not the right place to do it, but for the moment i didn't want to get too intrusive and add pieces of code in net.cpp or wherever.

Code:

--- ../../cvs/eqemu/eqemu/Source/zone/entity.cpp  2004-02-02 13:55:07.000000000 +0000
+++ entity.cpp  2004-02-05 11:18:55.000000000 +0000
@@ -52,6 +52,10 @@
 #define strcasecmp _stricmp
 #endif

+#ifdef KD
+#include "../extra/Vector.h"
+#include "../extra/Terrain.h"
+#endif
 extern Zone* zone;
 extern volatile bool ZoneLoaded;
 extern WorldServer worldserver;
@@ -216,6 +220,24 @@
    else
      iterator.Advance();
  }
+#ifdef KD
+ printf("in EntityList::DoorProcess, time is %4.1f: \n",Timer::GetCurrentTime()*0.001f);
+ LinkedListIterator<Mob*> miter(mob_list);
+ Vector *vmob=new Vector();
+ miter.Reset();
+ while(miter.MoreElements()) {
+  //printf("Yet another mob in the list...\n");
+  vmob->add(miter.GetData());
+  miter.Advance();
+ }
+ printf("got %d entities from mob_list\n",vmob->size());
+ Mob* mobs[vmob->size()];
+ vmob->copyInto(mobs);
+ // aggregate on a max 200.0 distance
+ // this should be dynamically set to the max aggro range of all mobs
+ terrain.update(vmob->size(),mobs,200.0f);
+ delete vmob;
+#endif
  if(count==0)

in MobAI
when checking for a AgrroRadius match, get the vector of mobs that were matched as neighbors. Use this list insted if the full mob_list to check for aggro.
Code:

--- ../../cvs/eqemu/eqemu/Source/zone/MobAI.cpp 2004-02-02 13:54:59.000000000 +0000
+++ MobAI.cpp 2004-02-05 10:41:31.000000000 +0000
@@ -20,6 +20,11 @@
  extern SPDat_Spell_Struct spells[SPDAT_RECORDS];
 #endif

+#ifdef KD
+#include "../extra/LL.h"
+#include "../extra/Terrain.h"
+#endif
+
 extern EntityList entity_list;
 extern Database database;
 extern Zone *zone;
@@ -222,7 +227,34 @@
 Mob* EntityList::AICheckCloseArrgo(Mob* sender, float iArrgoRange, float iAssistRange) {
  if (!sender || !sender->IsNPC())
    return 0;
+#ifdef KD
+ Vector *ngh=terrain.neighbors(sender->GetID());
+ if(ngh==NULL){
+  //printf("Mob::AICheckCloseArrgo , %d has no neighbors\n",sender->GetID());
+  return 0;
+ }
+ int ncount= ngh->size();
+ //printf("Mob::AICheckCloseArrgo , got %d neighbors for %d ",ncount, sender->GetID());
+ Mob* neighbors[ncount];
+ ngh->copyInto(neighbors);
+ //for(int i=0;i<ncount;i++) {
+ //  printf("%d:[%4.1f,%4.1f]", ((Mob*)ngh->get(i))->GetID(),
+ //                          ((Mob*)ngh->get(i))->GetX(), ((Mob*)ngh->get(i))->GetY());
+ //}
+ KDIterator<Mob*> iterator(neighbors,ncount,FORWARD);
+#ifdef KDSTAT
+ printf("%d :",sender->GetID());
+ iterator.Reset();
+ while(iterator.MoreElements()) {
+  Mob* mob = iterator.GetData();
+  printf("<%d>",mob->GetID());
+  iterator.Advance();
+ }
+ printf("\n"); fflush(stdout);
+#endif //KDSTAT
+#else
  LinkedListIterator<Mob*> iterator(mob_list);
+#endif
  iterator.Reset();
  float dist;
  //float distZ;

the other code (Terrain and Hashtable and Vector ... )are a available here
http://perso.wanadoo.fr/afou/khalzed...ighbors.tar.gz
you need to add -DKD and -DKDSTAT to you COPTS in the zone makefile
tune verbosity by editing the VERBOSE define in Terrain.cpp

Tpical output
Code:

...
Request for neighborhood before initializing
Request for neighborhood before initializing
...

Code:

in EntityList::DoorProcess, time is  5.0:
got 55 entities from mob_list
updating terrain, 55 mobs
enter collide with 55 elements
 Terrain::collide for distance 100.0 found :
  54 groups of average size 9.9 (max group : 18 + self),  and 1 lonely
 linear parse : 532 instead of 3025 (17.6%)
leavin collide
nieghbours set

llnear : parse checks wil be made using cache, and thus estimates the amount of cpu used (17,6% means you cut down cpu for check of 82%)

Drawbacks :
* you must select a big-enough value for the cache range checking. on a 'typical' DB :
Code:

mysql> select aggroradius, count(id) from npc_types group by aggroradius;
+-------------+-----------+
| aggroradius | count(id) |
+-------------+-----------+
|          0 |        63 |
|          45 |      152 |
|          65 |    26669 |
|          85 |        37 |
|        200 |    44854 |
+-------------+-----------+

set distance to 200.0f + 20.f (estimated distance moved between two cache updates). The lower, the smaller the groups, thus the less CPU used. But some zones might only have 65.0 agrroradius mobs.

* Use a better timer. Once every second should be fine.

* Nothing ensures that the first call to cache is valid. Thus the first checkings may abort. It does not seem to have any impact, as process goes up normally after the cache has been set.

* the code was written from scratch, or translated from java. It has memory leaks (at least freeing the old hashtable before update, but most probably more). However interfaces are small, so one could easily subst Hashtable and Vector classes for more reliable ones.
As of this code, it gets Sig segv after a few minutes. =(

* some comments are out of date, and irrelevant :shock:


Besides this (little) unpleansant effect, results are good. Very good.

smogo 02-05-2004 01:50 PM

worked on it a bit more. The link from the previous post is updated

Some memory leaks fixed, and a few changes to the support code. Most zones work fine (as far as tested, only qeynos crashes, after 4-5 mins, always on a pet...).

'd be interested for some feedback if any.

Eglin 02-05-2004 08:31 PM

I have never run EQEMU under a profiler. That being said, I still thought I'd mention a couple of things. Ancillary data structures of pointers to entities can be used to satisfy the requirement of multiple sort criteria. Adding non-standard containers (with the possible exception of hashtable) to the code is probably not wise - espescially if they are not proven or have unusual semantics. The caching suggestion is interesting and I'd love to see it under a profiler. I'm wondering if it would lead to natural extension using some sort of bounding volumes. Once you deliniate the caching "terrain" for a given entity, it seems like a natural extension to use that as a bounding volume to simplify in/out calculations. If one could assume square bounding boxes, the range calculations would be so fast as to probably eliminate the need to be cached (although it would open up some interesting new strategies - like sneaking up on a mob's diagonal).

smogo 02-05-2004 11:48 PM

as for the profiler, here is an exeirpt of profiler output (gprof)

Code:

Flat profile:

Each sample counts as 0.01 seconds.
  %  cumulative  self              self    total         
 time  seconds  seconds    calls  s/call  s/call  name   
 64.00  1237.48  1237.48 14718573    0.00    0.00  EntityList::AICheckCloseArrgo(Mob*, float, float)
 11.85  1466.65  229.17 3970091636    0.00    0.00  LinkedListIterator<Mob*>::Advance()
  5.73  1577.41  110.76 3985303690    0.00    0.00  LinkedListIterator<Mob*>::MoreElements()
  2.92  1633.95    56.54 1261006787    0.00    0.00  Timer::Check(bool)
  2.21  1676.74    42.79 128348838    0.00    0.00  Mob::AI_Process()
  1.84  1712.26    35.51 128348838    0.00    0.00  NPC::Process()
  1.43  1739.93    27.68      344    0.08    0.08  LinkedListIterator<Mob*>::RemoveCurrent(bool)
  1.38  1766.52    26.59 3970146525    0.00    0.00  LinkedListIterator<Mob*>::GetData()
  0.92  1784.30    17.78 65021129    0.00    0.00  NPC::GetFactionCon(Mob*)
  0.90  1801.78    17.48 128348838    0.00    0.00  Mob::SpellProcess()
  0.76  1816.52    14.73 65021473    0.00    0.00  Mob::IsInvisible(Mob*)
...

you can get the full file (incluing call graph) and more examples at :
http://perso.wanadoo.fr/afou/khalzed...qzone.gmon.out
http://perso.wanadoo.fr/afou/khalzed...qzone.gmon.out
http://perso.wanadoo.fr/afou/khalzed...qzone.gmon.out
http://perso.wanadoo.fr/afou/khalzed...dzone.gmon.out
http://perso.wanadoo.fr/afou/khalzed...dzone.gmon.out

or all in a pack :
http://perso.wanadoo.fr/afou/khalzed...e-files.tar.gz

These are short runs, with one player if any.

If you don't bother to read, caching as it is now gives cpu load a rough 50% down, depending on the zone (AICheckCloseArrgo cumulated accounts for between 60-70% of non-cached version, divided by 10 or so in the cached version).

'm not sure about the oddity of a hash or vector class, instead of inline manipulation

'don't know about bounding boxes extensions. It's like making a function recursive to non recursive, or vice-versa. It's always possible, but .. i'll work in it afterwards

Eglin 02-06-2004 12:40 AM

Quote:

Originally Posted by smogo
'don't know about bounding boxes extensions. It's like making a function recursive to non recursive, or vice-versa. It's always possible, but .. i'll work in it afterwards

Don't really follow you there. Maybe I misinterpreted your strategy, but I thought it consisted of caching a list of all entities within some bounding area for each mob. When thinking in those terms it occurred to me that it would probably make sense to just bound mobs with a square and check for inclusion within that square instead of checking for radial distance. You give up a little accuracy, but I don't think it would really matter in this case. Checking to see if a point lies within a square requires two subtractions and two comparisons. Proper vector distance calculations for radial inclusion usually require 3 multiplies an addition a subtraction and a comparison. I haven't looked at the code for this lately, but it does still do a vector distance calc, doesn't it?

Maybe I was unnecessarily confusing by relating this to your plans. It just seemed to me that you'd already have to deal with bounding volumes in order to generate your caches.

EDIT: ack! what am I thinking? More like 3 multiplies, 3 subtracts, an add, and a comparison (for 2d, and the add/subtract are the expensive ops for floats).

smogo 02-06-2004 01:17 AM

Quote:

Checking to see if a point lies within a square requires two subtractions and two comparisons. Proper vector distance calculations for radial inclusion usually require 3 multiplies an addition a subtraction and a comparison.I haven't looked at the code for this lately, but it does still do a vector distance calc, doesn't it?
the actual code does box-checking in AICheckCloseArrgo as far as i know, thus abs(x1-x2)<dist && abs(y1-y2) <dist.

In the Terrain code, it is exact-distance calc. However this is little overhead, as :
- those in the corner (pass square test, fail exact test) are not checked against lately in AICheckCloseArrgo), thus time saved in the end
- event though caching uses cpu (about 3% in very heavy zones), it wipes-out a lot of tests afterwards.

Eglin 02-06-2004 01:29 AM

Quote:

Originally Posted by smogo
Quote:

Checking to see if a point lies within a square requires two subtractions and two comparisons. Proper vector distance calculations for radial inclusion usually require 3 multiplies an addition a subtraction and a comparison.I haven't looked at the code for this lately, but it does still do a vector distance calc, doesn't it?
the actual code does box-checking in AICheckCloseArrgo as far as i know, thus abs(x1-x2)<dist && abs(y1-y2) <dist.

In the Terrain code, it is exact-distance calc. However this is little overhead, as :
- those in the corner (pass square test, fail exact test) are not checked against lately in AICheckCloseArrgo), thus time saved in the end
- event though caching uses cpu (about 3% in very heavy zones), it wipes-out a lot of tests afterwards.

Yeah, I remember when Trump added that box filter. I think it _adds_ to computation time instead of reducing it. My suggestion was that we rely solely on it, instead of using it as merely a filter. Maybe I'm out of my element, but bounding volumes seem like a very forward-looking solution. They are extremely fast and map nodes can be mapped onto them in a fairly straight-forward manner (Kay-Kajiya bounding trees, maybe?). i.e... you can just lop off portions of your bounding square to reflect walls which nets you line-of-sight calculations for free.

DeletedUser 02-06-2004 07:21 AM

The reason for the CPU usage problem is moreso due to memory leaks, when the ram usage becomes to the point where it starts using the hard drive as ram (pagefile) thats when the 'shit hits the fan'. Other than that its more of the fact that I am currently running 15 STATIC zones (not bootable) per computer or more (This changes tonight as im setting up the new computers). And the computers are outdated (p3 733, athlon 600), the 'good' computers are xp 1700 and xp 2000. But if you put enough on them they can't handle it obviousely. But compared to what EQEMu was a while ago we are doing much better IMO.

smogo 02-06-2004 07:44 AM

Quote:

the computers are outdated (p3 733, athlon 600)
Almost, mine is AMD K6-2, 475Mhz. ;)

I can run up to 10 zones, like kaladima, kaladimb, nexus, i.e up to 50 npcs only per zone, average 10M memory and 1 to 5 % cpu (that's what top says).

But when it comes to more populated zones, eg around 150 mobs, it goes up to 40-50% cpu, even more wih the pets spawning

i could not run any zone more than 200 mobs before this cache, due to cpu at first glance.

Load is, as far as AICheckCloseArrgo is concerned, N*N. thus for 150 mobs it accounts 70% of zone load.

Anyway, with the cache, i could run 500+ mobs in a zone on that very machine ;) Memory then is about 40-50M, growing up slowly.

Eglin 02-06-2004 08:18 AM

Even so, the idea of bounding boxes is growing on me. It is looking like a really elegant solution to the line of sight problem. In the simplest case, you attach a vector of rectangles to each mob. One rectangle would be the mob's aggro area and the others would be facets cut away by occluding intercepts like walls. Determining line-of-sight then becomes a simple matter of taking the difference of our rectangles and testing for inclusion of a point. Extend to 3d as necessary. The math is simple and the bounding areas/volumes can be precomputed and stored in the db since mobs always spawn in the same location AFAIK. Contrast this to the existing method (or at least the method I saw suggested by Wiz), which says that if you want to find out if you can see from a to b you should take delta-sized steps in the direction of b and check at each step to see if you're standing on a wall (I doubt there is an acceptable delta in terms of both performance and wall-thickness).

The prime need for line of sight calculations that I'm aware of right now stems from the fact that radial distances are the only way to specify aggro distance. I.e... in a dungeon, you can't specify that a monster in a corner should only attack critters in the same room. Even without using multiple rectangles per mob, moving to a rectangular bounding area would allow more intelligent specification of aggro ranges (IMHO).


All times are GMT -4. The time now is 08:47 AM.

Powered by vBulletin®, Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.