View Full Version : Better use of entity lists?
Trumpcard
11-21-2003, 01:20 AM
Currently, the way eqemu is designed, we create one big entity list for the entire zone and put everything into it...
Doors, spawns, zone objects, etc...
We search through this entire list every time we need to do something in there, whether it be mob, door, or object related, so we execute a LOT of iterator.Reset(), then while (iterator.MoreElements())
This is obviously a performance monster..
Anyone have any ideas for better ways to handle this? We've talked about a few , but nothing has jumped out yet.
Separating some things out, say doors into their own list, would help a bit, but I'm not sure of the full ramifications of doing that at the moment. I'm wondering if there might be a better container we could implement for better performance, or a technique we could use for 'indexing' our list, so we only have to search through what we need.
Anyone with any ideas, feel free to post. While doing combat performance testing, the main CPU hogs are the distance checks, which we're working on, and the maintenance on the iterator, i.e. running through it so many times..
If anyone wants to do an experiment (which I've wanted to do for a while but haven't had time, due to family life!), create 2 test programs: one using an out-of-the-box STL list/vector/deque/etc., create a list of a few million elements and do some timing on it to see how long it takes to perform various operations, then do the same exact thing using eqemu's linked list implementation.
Numbers from that will give us a good idea if the performance of our linked list is close to that of the STL one, or if we could make serious gains by switching over to something else..
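A rough sketch of what such a timing harness might look like, assuming a modern compiler's &lt;chrono&gt; (names like run_benchmark are made up here; the eqemu LinkedListIterator version would be dropped in alongside for an apples-to-apples comparison):

```cpp
#include <chrono>
#include <cstdio>
#include <list>
#include <vector>

// Walk a container of n elements the way the zone loop does:
// one full pass per "pulse", touching every element.
template <typename Container>
long long walk_sum(const Container& c)
{
    long long sum = 0;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
        sum += *it;
    return sum;
}

// Time one full walk in microseconds.
template <typename Container>
long long time_walk_us(const Container& c)
{
    using namespace std::chrono;
    steady_clock::time_point t0 = steady_clock::now();
    volatile long long s = walk_sum(c); // volatile: keep the walk from being optimized away
    (void)s;
    steady_clock::time_point t1 = steady_clock::now();
    return (long long)duration_cast<microseconds>(t1 - t0).count();
}

void run_benchmark(int n)
{
    std::list<int> l;
    std::vector<int> v;
    for (int i = 0; i < n; ++i) { l.push_back(i); v.push_back(i); }
    std::printf("list:   %lld us\n", time_walk_us(l));
    std::printf("vector: %lld us\n", time_walk_us(v));
}
```

Calling run_benchmark(1000000) from a main() would give the per-container numbers to compare against the custom linked list.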
Here's a good link for anyone that is interested..
http://www.tantalon.com/pete/cppopt/appendix.htm#AppendixA
-----------------------------------------------
                1.11    0.86    25523/25523         Mob::AI_Process() [4]
[5]    87.5     1.11    0.86    25523               EntityList::AICheckCloseArrgo(Mob*, float, float) [5]
                0.51    0.00 18274468/18274884      Mob::Dist(Mob const&) [6]
                0.07    0.00 75803310/81363398      LinkedListIterator<Entity*>::GetData() [9]
                0.06    0.00 20928860/25042202      LinkedListIterator<Entity*>::Advance() [11]
                0.05    0.00 20954383/25075944      LinkedListIterator<Entity*>::MoreElements() [12]
                0.05    0.00 18274468/20299491      Entity::IsClient() [13]
                0.04    0.00 18299991/18300531      Entity::IsCorpse() [15]
                0.03    0.00 18299991/19729699      Mob::IsMob() [18]
                0.02    0.00 18274468/19098228      Entity::CastToMob() [29]
                0.01    0.00    25523/32922         LinkedListIterator<Entity*>::Reset() [50]
                0.00    0.00  2628869/2929732       Entity::IsMob() [128]
                0.00    0.00    96894/116226        NPC::GetFactionCon(Mob*) [135]
                0.00    0.00    96894/96894         Mob::InZone() [136]
                0.00    0.00    25523/27627         NPC::IsNPC() [139]
-----------------------------------------------
Here's where a lot of the work is being spent. Dist checks are the big hitter, followed by time spent in the various iterator functions. The reason the iterators are so bulky is just the sheer number of times we run through them.. Notice the numbers
( 18274468/18274884 )
This was just starting a static zone, not logging in.. It's a count of the number of times the method was called... Amazing how huge this number is for something that runs for less than 30 seconds.. We could significantly increase performance if we cut down the number of calls, reduce the size of the list, and look at the efficiency of the container....
Merth
11-21-2003, 02:42 AM
Looking at the numbers, I don't think the linked list operations are hurting you at all.
First of all, could we clarify what the columns mean? I would guess the first column is a % of cpu time spent doing that function. I would also guess the second column is the amount of time spent in that function.
Second, if those are correct, let's do a summation of all linked list operations, and compare this to distance checking functions:
Distance Checking (AICheckCloseArrgo + Dist)
88.01%
0.86s
Linked List Operations (Includes IsXXXXX())
0.33%
0.00s
0.00 seconds is pretty fast!
Yeah, there are optimizations to be made with linked lists. However, I doubt the speed gained would offset the loss in code maintainability.
Trumpcard
11-21-2003, 03:10 AM
Actually, this is a better list to look at.. Raw gprof info is hard to sort through..
Here's the flat profile..
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
49.33      1.11      1.11    25523     0.04     0.08  EntityList::AICheckCloseArrgo(Mob*, float, float)
22.67      1.62      0.51 18274884     0.00     0.00  Mob::Dist(Mob const&)
 3.56      1.70      0.08 81363398     0.00     0.00  LinkedListIterator<Entity*>::GetData()
 3.33      1.77      0.07 25042202     0.00     0.00  LinkedListIterator<Entity*>::Advance()
 2.89      1.84      0.07 25075944     0.00     0.00  LinkedListIterator<Entity*>::MoreElements()
 2.67      1.90      0.06 20299491     0.00     0.00  Entity::IsClient()
 1.78      1.94      0.04 18300531     0.00     0.00  Entity::IsCorpse()
 1.56      1.98      0.04 19729699     0.00     0.00  Mob::IsMob()
 1.11      2.00      0.03 19098228     0.00     0.00  Entity::CastToMob()
 1.11      2.02      0.03                             sqrtf
These are total execution time metrics for each individual function.. Normally flat views aren't very useful, but in our case they are.. All the expensive operations come from the same method, AICheckCloseArrgo.
Now, what I posted before was a different view.
Here are the column names
index % time self children called name
and the descriptions..
 % time      the percentage of the total running time of the
             program used by this function.
 cumulative  a running sum of the number of seconds accounted
 seconds     for by this function and those listed above it.
 self        the number of seconds accounted for by this
 seconds     function alone.  This is the major sort for this listing.
 calls       the number of times this function was invoked, if
             this function is profiled, else blank.
 self        the average number of milliseconds spent in this
 ms/call     function per call, if this function is profiled, else blank.
 total       the average number of milliseconds spent in this
 ms/call     function and its descendents per call, if this
             function is profiled, else blank.
 name        the name of the function.  This is the minor sort for this listing.
What I pasted in the last message was basically a performance call stack for AICheck...
So, the 1.11 indicates the # of seconds, not a percentage, and .51 (Dist), .07 (iterator), .06 (iterator), etc. should all together add up to 1.11. So you're not adding percentages, you should be adding seconds... Suffice to say, the iterator operations add up.
AICheckClose took 1.11 seconds, and its call tree involved
Dist (.51), roughly half, then the linked list ops (.07, .06, .05),
~.2 altogether.. Still heavy.. Then all the calls to IsClient, IsCorpse, and IsMob add up to the rest of AICheck's total time..
Each time we run through the iterator, we incur the costs of iterator.Advance(), GetData(), etc., and additionally the cost of running the Is checks.
AICheck makes up 50% of the CPU cycles in zone, and this is just breaking down what's heaviest in it.
You can look at the entire profile
http://denial.dyndns.org/hma/eqemu/zone.profout
Obviously, the dist check is the big hitter... But behind it comes the time spent in iterator operations.
One thing is we're using a doubly linked list, even though we only do forward operations from what I can tell. We could probably reduce this to a singly linked list to speed things up, unless we can identify cases where we want to go forward and then backwards.
Scorpious2k
11-21-2003, 04:32 AM
One idea might be to implement a lookaside methodology.
Check to see if this search is for the same item as the last; if so, return the same result without searching. This would help if there are frequent calls for the same value.
You could expand this by keeping a list of the last X values and their results, and searching it first. For frequent calls with the same values, this would reduce searches, especially of long lists. If the same values aren't searched often, though, it would hurt performance a little, depending on the size of the recent-results list.
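A minimal sketch of the one-slot lookaside idea (the names and the array-based list here are illustrative, not eqemu's actual structures):

```cpp
#include <cstddef>

struct Entity { int id; };

// One-slot lookaside: before scanning, check whether the last search
// asked for the same id; if so, hand back the cached result without
// iterating. The cache must be invalidated on insert/delete.
class EntityLookaside {
public:
    EntityLookaside() : last_id_(-1), last_hit_(NULL) {}

    Entity* Find(int id, Entity** list, std::size_t count)
    {
        if (id == last_id_)                 // same query as last time: no search
            return last_hit_;
        Entity* hit = NULL;
        for (std::size_t i = 0; i < count; ++i)  // fall back to the linear scan
            if (list[i]->id == id) { hit = list[i]; break; }
        last_id_ = id;                      // remember this query and its result
        last_hit_ = hit;
        return hit;
    }

    void Invalidate() { last_id_ = -1; last_hit_ = NULL; } // call on spawn/despawn

private:
    int     last_id_;
    Entity* last_hit_;
};
```

The multi-entry variant would replace the single slot with a small array of (id, result) pairs scanned before the main list.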
Trumpcard
11-21-2003, 06:36 AM
Ah ha! I know how to cut out most of the dist calls: a lightweight short-circuited axis check. If the magnitude of the delta on any one axis is greater than the aggro dist, there's no point in calling the dist function....
if ( (abs(x1 - x2) > AGGRODISTANCE) || (abs(y1 - y2) > AGGRODISTANCE) || (abs(z1 - z2) > AGGRODISTANCE) )
    Advance();   // out of range on at least one axis, skip it
else
    // continue on to the full dist check....
If one x is 100 and the other is 0, there is no way the distance formula is going to come out less than 100, so there's no point in calling that function...
Nice thing about short-circuited || operations is they only evaluate until they find a true case, so if delta x > AGGRODISTANCE, it drops out and doesn't perform the other 2 calculations and comparisons...
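As a standalone sketch of that precheck (names are hypothetical; AGGRODISTANCE becomes a parameter here):

```cpp
#include <cmath>

// Short-circuited axis check: if any single axis delta already exceeds
// the aggro range, the true 3D distance must exceed it too, so the
// sqrt-based distance call can be skipped entirely. A "true" result only
// means the expensive Dist() check is worth running, not that the mob
// is actually in range.
bool MaybeInAggroRange(float x1, float y1, float z1,
                       float x2, float y2, float z2, float range)
{
    if (std::fabs(x1 - x2) > range) return false; // || short-circuits here
    if (std::fabs(y1 - y2) > range) return false;
    if (std::fabs(z1 - z2) > range) return false;
    return true; // only now call the full distance formula
}
```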
Trumpcard
11-21-2003, 07:06 AM
Eureka !
Success !
Added this little gem into AICheckCloseArrgo, along with the other information:
if (   (abs(mob->GetX() - sender->GetX()) > iArrgoRange)
    || (abs(mob->GetY() - sender->GetY()) > iArrgoRange)
    || (abs(mob->GetZ() - sender->GetZ()) > iArrgoRange)
    || (mob->IsInvisible(sender))
    || (mob->IsClient() && (!mob->CastToClient()->Connected()
        || mob->CastToClient()->IsLD()
        || mob->CastToClient()->IsBecomeNPC()
        || mob->CastToClient()->GetGM())))
and heres the new profile results...
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
47.01      0.55      0.55    15823     0.03     0.06  EntityList::AICheckCloseArrgo(Mob*, float, float)
 7.69      0.64      0.09 16233344     0.00     0.00  LinkedListIterator<Entity*>::MoreElements()
 5.98      0.71      0.07 12582403     0.00     0.00  Mob::IsMob()
 5.56      0.78      0.07 16210547     0.00     0.00  LinkedListIterator<Entity*>::Advance()
 5.13      0.83      0.06 51483587     0.00     0.00  LinkedListIterator<Entity*>::GetData()
 5.13      0.90      0.06 11345291     0.00     0.00  Entity::IsCorpse()
 5.13      0.95      0.06                             sqrtf
 2.56      0.98      0.03      717     0.04     0.07  Parser::DelChatAndItemVars(unsigned int)
 1.71      1.00      0.02    32982     0.00     0.03  Mob::AI_Process()
Here's how often Dist is now being called, in comparison..
0.00 1.17 0.00 40869 0.00 0.00 itoa(int)
0.00 1.17 0.00 37034 0.00 0.00 Mob::Dist(Mob const&)
0.00 1.17 0.00 36766 0.00 0.00 Mob::InZone()
doing my little victory dance....
Now the iterators are the number one pain !
rmm99
11-21-2003, 07:16 AM
Nice work Trumpcard, but humor me please; I'm still poring over the code you all worked so hard on, trying to figure out how you all work the magic. With that change you made, doesn't that mean that as soon as you run out of aggro range of a mob you may already have aggro'd, it will not get a turn to a) summon you if it has that ability, or b) continue to chase post-aggro? Once you cross the aggro range barrier, the AI in effect turns off all hostility for that mob.
NICE change in the numbers though (to this untrained eye anyways). Installing a compiler so I can actually compile it myself, test your "little gem", and see how this helps in real game situations when the zone is going nuts with lots of mobs.
Trumpcard
11-21-2003, 07:35 AM
Honestly, can't answer that at the moment.. Figured this out at work, where I can't test; I can only telnet into my server and start the zone, can't run the client. But that sounds like a valid concern.. Might need to check whether the mob has a target before short-circuiting based on the distance.. Good thought though, can't say until I try it.
Compile it in and see what it does...
Merth
11-21-2003, 07:41 AM
Good job, Trumpcard. Now, how about we tackle Mob::Dist()?
I see in the code we have variants that approximate the distance using a less CPU-intensive calculation. Have we tried messing around with these as substitutes for the function calls that consume the most CPU?
As a thought experiment, how would you do this in a professional environment? This distance function ignores the node map for a zone. For example, what if the mob is on the other side of a wall? Iterating over a node map would introduce a pathfinding algorithm that would definitely increase CPU cycles. How would you fine tune this algorithm in this case?
Trumpcard
11-21-2003, 07:58 AM
Regarding going out of aggro range, I don't think it will matter; it's no different in that regard than the if condition that's done AFTER the dist calculation, so it shouldn't change how it works..
The variants I see are just ones that don't take the square root into account; they simply square and add the delta x and y components..
They are only less CPU intensive because they don't actually calc the distance, but the distance^2.
One way to get around the sqrt function is to square the range and compare it against the squared distance, but we'd need to compare the cost of a sqrt vs. a square, as it's trading one operation for another.
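For illustration, the squared comparison removes the sqrt entirely at the cost of one extra multiply (a sketch with made-up names, not the actual Mob::Dist code):

```cpp
// Instead of sqrt(dx*dx + dy*dy + dz*dz) <= range, compare the squared
// distance against range*range. Same boolean result, no sqrt at all.
bool WithinRangeNoSqrt(float dx, float dy, float dz, float range)
{
    float dist_sq = dx * dx + dy * dy + dz * dz;
    return dist_sq <= range * range; // one multiply in place of a sqrt
}
```

This only works for threshold tests; anything that needs the actual distance value (e.g. for display) still has to pay for the sqrt.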
There's really no point in taking a wall into account in a distance calc before checking for the existence of a line-of-sight connection between the nodes; if that failed, there would be no point in going further..
If we represent a wall as a line, it's relatively simple to determine whether we intersect another line on our trip from point a to point b. The best bet would be an IsVisible() check that determines LOS based on the node map... I don't have a lot of experience using them, but I don't think they'd be too difficult to figure out...
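A rough sketch of the wall-as-line idea in 2D (illustrative only; a standard orientation-sign test, ignoring collinear and touching cases):

```cpp
// Treat a wall as a segment c-d and test whether the segment a-b from the
// mob to the player crosses it. The segments cross when each segment's
// endpoints lie on opposite sides of the other segment's line.
struct Pt { float x, y; };

// Signed area of triangle (o, a, b): sign gives which side b is on.
static float cross(Pt o, Pt a, Pt b)
{
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

bool SegmentsCross(Pt a, Pt b, Pt c, Pt d)
{
    float d1 = cross(c, d, a);
    float d2 = cross(c, d, b);
    float d3 = cross(a, b, c);
    float d4 = cross(a, b, d);
    return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
}

// LOS exists when the mob->player segment crosses no wall segment in the
// zone's wall list; z and real geometry would complicate this in practice.
```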
Merth
11-21-2003, 09:21 AM
I've been going through the networking code and making small linked list optimizations. It won't affect the bottom line much, but you may want to do the same for the code you are dealing with.
Old code:
LinkedListIterator<EQNetworkConnection*> iterator(*list);
iterator.Reset();
while (iterator.MoreElements()) {
if (iterator.GetData()->IsFree() && (!iterator.GetData()->CheckNetActive())) {
iterator.RemoveCurrent();
}
else {
iterator.GetData()->Process(sock);
iterator.Advance();
}
}
Optimized code:
LinkedListIterator<EQNetworkConnection*> iterator(*list);
iterator.Reset();
while (iterator.MoreElements()) {
EQNetworkConnection* eqnc_data = iterator.GetData();
if (eqnc_data->IsFree() && (!eqnc_data->CheckNetActive())) {
iterator.RemoveCurrent();
}
else {
eqnc_data->Process(sock);
iterator.Advance();
}
}
*EDIT* I also noticed this sort of behavior with the CastToXXXX() type functions. You may also want to cache that call, as it is sometimes invoked repeatedly, unnecessarily.
kai_shadowbane
11-21-2003, 09:49 AM
The only thing I can't see about the wall issue is the factor of mobs aggroing by *hearing* you instead of seeing you (thereby negating the line-of-sight theory). Wouldn't a wall check therefore still be needed, or is that not factored into this equation?
And there's the idea of using a wall to break line of sight after you have aggro'd something and run (out of sight, behind a wall), so it wouldn't still aggro around a corner or through a wall when it shouldn't.
Trumpcard
11-21-2003, 10:19 AM
Who said anything about hearing?
We're talking about walls hiding aggro, or influencing checks.. Let's get sight down before we wander into smell, taste, or sound...
We're also not talking about handling already-aggro'd mobs; we're talking about handling the initial checks for aggro.
kai_shadowbane
11-21-2003, 10:33 AM
An idea, though dunno how you would do it.
Right now it looks like you're checking aggro on everything.
Could you check aggro radius only on players and moving mobs instead, after the initial startup or respawn check?
Like, keep a list of what's in range of each moving mob or player; that way you would only have to reference what's in range, and you could handle fewer things at once.
Say it's a small zone (for example) with 45 mobs, and the mob aggro ranges are 100, 45, and 30. When the player enters the area, he checks for mobs in range 100, 45, 30, and so on, until he gets in range of a mob. Then it runs the aggro check to see if that specific mob would attack.
That way, instead of referencing all the mobs, you'd only have to reference those that are moving, plus the players. (If a mob isn't moving, it's not going to come into range of anything unless that thing comes to it.)
Trumpcard
11-21-2003, 10:53 AM
If you don't calculate the distance, how do you know he's in range?
You have to calculate the distance to know that.. When you move towards him, you don't know he's in range until you recalculate how far away the mob now is...
Merth: You have a good point, we dereference WAY too much... I put that change in for ya ;-)
Armanthuz
11-24-2003, 08:50 AM
Why not divide the zone into subgrids when you load it up? It would have to be sorted once upon load (based on location coords). Then, depending upon the location of the person, you could start the search with a subgrid of localized data. If not found there, sort the nearest subgrids, until finally searching the farthest ones.
In theory you could kill 2 birds with one stone and use subgrids to further reduce dist check time, by reducing the possible sets of data you have to compare to.
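A sketch of the subgrid idea (cell size, bounds, and all names here are illustrative, not eqemu's): bucket entities into fixed-size cells at load time, then an aggro check only scans the cell an entity is in plus its neighbours.

```cpp
#include <vector>

struct Ent { float x, y; int id; };

class ZoneGrid {
public:
    // cell: world units per cell; cols x rows cells cover the zone.
    ZoneGrid(float cell, int cols, int rows)
        : cell_(cell), cols_(cols), rows_(rows), cells_(cols * rows) {}

    void Add(const Ent& e) { cells_[CellIndex(e.x, e.y)].push_back(e); }

    // Collect candidates from the 3x3 block of cells around (x, y); only
    // these few need the per-entity distance check, not the whole zone.
    std::vector<Ent> Nearby(float x, float y) const
    {
        std::vector<Ent> out;
        int cx = Clamp((int)(x / cell_), cols_);
        int cy = Clamp((int)(y / cell_), rows_);
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = cx + dx, ny = cy + dy;
                if (nx < 0 || ny < 0 || nx >= cols_ || ny >= rows_) continue;
                const std::vector<Ent>& c = cells_[ny * cols_ + nx];
                out.insert(out.end(), c.begin(), c.end());
            }
        return out;
    }

private:
    static int Clamp(int v, int hi) { return v < 0 ? 0 : (v >= hi ? hi - 1 : v); }
    int CellIndex(float x, float y) const
    { return Clamp((int)(y / cell_), rows_) * cols_ + Clamp((int)(x / cell_), cols_); }

    float cell_;
    int cols_, rows_;
    std::vector< std::vector<Ent> > cells_;
};
```

The catch is that moving entities have to be re-bucketed when they cross a cell boundary, which adds bookkeeping to the movement code.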
Valtin
11-24-2003, 10:51 AM
Regarding the original proposition of faster access to the EntityList - I've not looked at the code for too long, but it seems there is a lot of commitment in terms of coding/time/learning invested in the existing structure; it would be painful to discard that.
Another problem is that the number of elements of a particular type varies enormously by zone. For example, in freeport the number of door elements is quite high; in lavastorm, not so many. So dividing into separate doors, items, etc. wouldn't be so effective for many zones. Short lists are as wasteful as long ones in time terms.
One way of handling even breaking up of long lists, where you have a quickly accessible int-based ID, is just to use a hashtable structure over the linked lists. So the lists still remain; we just have an array of n EntityLists, then use (ID MOD n) to allocate an object with a given ID to the appropriate list. This gives a fairly even distribution of elements to lists without caring too much about their type, the hashing operation is very quick so it shouldn't introduce excessive overhead, and in larger zones n can be set to a greater number.
S'cuse my C++ being rusty, I've been playing java only for the last few years but just to give the rough idea.
class EntityFarm
{
private:
    static const int n = 20;
    EntityList* lists[n];
public:
    EntityFarm() { }
    ~EntityFarm() { }
    int16 Hash(int16 id) { return id % n; }
    Mob* GetMob(int16 id) { return lists[Hash(id)]->GetMob(id); }
    ....
The syntax may be off in there somewhere, but you get the idea. The initial cost of deciding on the right list to use *should* be paid off by the fairly even allocation of entities to lists, and it also gives another tuning layer without adding too much complexity (by changing n). The EntityFarm is just a wrapper on the EntityList, so most if not all of the methods can be inline.
From your timings it doesn't look as though this aspect is *that* time critical, though. Just offering this up as an idea for consideration in case you do need to address the list problem.
Eglin
11-24-2003, 06:39 PM
"Regarding the original proposition of faster access to the EntityList - I've not looked at the code for too long but it seems there is a lot of commitment in terms of coding/time/learning invested in the existing structure, it would be painful to discard that."
I'm not convinced that there is a definite need to optimize yet (which explains why I haven't implemented/suggested code changes). OTOH, changing list<> to set<> wouldn't be so very difficult. The container is already built into the language, and the semantics are almost identical. The base entity would only need to implement some sort of arbitrary comparison operation.
"Short lists are as wasteful as long ones in time terms."
*brushes off his math books* Actually, for element searches like the ones Trump mentioned at the beginning of the thread, your statement is exactly the opposite of the truth. Searching an unsorted linked list is an O(n) operation: the cost goes up in step with the number of elements. Using an STL set (which is implemented with a red-black binary search tree) would reduce search time to O(log n). As you can see, the difference for small n is negligible. So, to make a long story short, short lists are not as wasteful as long ones.
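To illustrate the difference (a sketch using ints in place of entities; a real entity would supply an operator< on its id):

```cpp
#include <algorithm>
#include <list>
#include <set>

// Finding one element by value: an O(n) walk in a list...
bool in_list(const std::list<int>& l, int v)
{
    return std::find(l.begin(), l.end(), v) != l.end(); // linear scan
}

// ...versus an O(log n) tree descent in a std::set.
bool in_set(const std::set<int>& s, int v)
{
    return s.find(v) != s.end(); // red-black tree search
}
```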
"One way of handling even breaking up of long lists where you have a quickly accessible int based ID is just to use a hashtable structure over the linked lists."
I fully agree. Hashtables are sexy. An item search in a hash table drops down to O(1) time - it doesn't matter how many items you store. The bad news is that ANSI C++ doesn't yet include a hash-table container (although proposals are currently being considered). Most compilers still ship some form of hash map, but I find them hard to recommend for cross-platform solutions.
From your timings it doesn't look as though this aspect is *that* time critical though. Just offering this up as an idea for consideration in case you do need to address the list problem.
I agree. My time is better spent adding/improving functionality at this point.
Valtin
11-24-2003, 09:58 PM
Agree with your point about lists being linear, so shorter ones being quicker to search than long ones, Eglin. I was meaning that the pre-search code for list allocation becomes more significant in the total time, i.e. pre-search list allocation time makes things slower, especially when you have a large number of shorter lists. Conversely, the pre-search time tends to be absorbed more into a lower number of 'longer list' searches.
That's the point I was trying to make, that a balance is needed - the ultimate step is to have n lists of length 1, where the id is a one-to-one allocator for the list holding the single element. I didn't express myself at all well though. (and still am not sure that I have LOL)
Not a big issue really; I agree with you basically, just didn't want you thinking I didn't know it takes longer to search a linked list of 1000 things than 10 things. :wink:
I'm sorta confused as to why a hashtable, or any other structure that isn't a standard container at this point, should be a problem. It's nice if there is a standard implementation of some structure, but there is nothing wrong with creating your own. As long as you do it in a disciplined way, the cross-platform element isn't an issue.
That said, I absolutely agree your time is better spent elsewhere - from experience it's only worth messing with something this central if you have an explicit problem. Just noticed that nobody had addressed Trumpcard's original question, so thought I'd throw some thoughts in for what it's worth :wink:
Anyway, back to working through the code for me, so I can discuss the issues a bit more intelligently.
Trumpcard
11-25-2003, 12:32 AM
The type of collection you use really needs to be defined by the bulk of the operations you are going to be performing...
Are you searching, adding/removing elements at the front/back of the collection, etc.?
Our list does all of those pretty regularly..
Consider the performance impact of how we do it now: every time you make a combat hit, the iterator has to loop through the entire entity list (let's say 20 clients, 400 mobs, 200 doors, 100 objects), find all the clients (performing an IsClient check on each entity), check their distance from you, and if they're within a certain range, spit out a message to your client to inform you of the hit...
As you can imagine, this is highly unoptimized.. Will a new collection help? Only for searches (look at GetMob), really...
I think probably the simplest and best solution is to just split the lists up (entity_list for mobs, client_list, etc.) and change the calls to use the right one. That way you're drastically reducing the number of objects you have to compare against to find the one you want.
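A sketch of what the split might look like (the types here are stand-ins, not eqemu's real classes): a combat-hit broadcast then walks only clients, 20 iterations instead of 720 in the 20/400/200/100 example above.

```cpp
#include <list>

struct Client { int id; };
struct NPC    { int id; };
struct Door   { int id; };

// One list per entity kind instead of one bag of everything.
struct ZoneLists {
    std::list<Client*> clients;
    std::list<NPC*>    npcs;
    std::list<Door*>   doors;
};

// A broadcast touches only the client list; no IsClient() filtering needed.
int CountMessageTargets(const ZoneLists& z)
{
    int n = 0;
    for (std::list<Client*>::const_iterator it = z.clients.begin();
         it != z.clients.end(); ++it)
        ++n; // a real broadcast would range-check and send the hit message here
    return n;
}
```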
Since we regularly insert/delete (spawn, mob dies, etc.), the efficiency boost of a sorted structure like a hash map is questionable (will the speedup in searches offset the decrease in efficiency of inserts/deletes? a sorted structure has to be reordered every time something changes).
Ultimately, probably not. A hashmap or binary tree is probably the best option, and splitting the lists out, even if we stuck to linked lists, would improve performance immensely... That in itself would probably give us the most bang for the buck (the biggest hike in performance for the least amount of effort).
I like an RB binary tree personally, but it might be a pain in the ass to implement...
Scorpious2k
11-25-2003, 01:29 AM
"I think probably the simplest and best solution is to just split the lists up, entity_list for mob, client_list, etc, and the calls changed to use the right one"
What if you use a pseudo-index... when the list is loaded, save a pointer to the start of each type of data contained at that point (a pointer to the start of the mobs, clients, doors, etc.) and a count of how many there are.
You could then start searching at that point in the list, for a count equal to the number of entries of that type. If this is possible, it seems like it would be the smallest amount of change to the program, giving the fastest method to implement and the least amount of potential bugs introduced by the change.
Valtin
11-25-2003, 03:52 AM
Hmm, makes sense to split them out if the functionality is so different, Trumpcard. I see your point about having to decide what every entity is before performing an operation, rather than just addressing the clients, as per your example. I agree that just by not doing that every round you'll drastically increase performance.
Sorted structures will only help if you are searching for a specific item; if you are walking the whole structure anyway (for distance checks) there is no advantage, only overhead. If the lists are unsorted, though, Scorpious2k's indexing won't work, as the doors, say, could be at any point in the list rather than starting at n and extending to n+x. If they are ordered, it's an excellent solution though.
Would there be an advantage in an ordered representation for the data that is more likely to be looked up individually, such as items or doors, while leaving the data which needs walking, such as clients and mobs, unordered? I could see advantages in an ordered representation for doors, as they are pretty static, so the sorting overhead would be negligible. If so, that would be a good argument for breaking them out as you say.
Merth
11-25-2003, 04:10 AM
A couple of things to keep in mind when splitting up the entity list...
There is a thread in the application that acts as the "game pulse". This means all of the logic for the application is done repeatedly, with a 1/60-th sec pause between each pulse.
1. While optimizing things like attack can be beneficial, optimizing operations that happen each pulse is more critical. Look in net.cpp to see what happens on each pulse. Since this main thread is the CPU-hog for the app, this is the first place to look to cut down on CPU usage.
2. One critical bit of functionality inside the pulse is where each entity in the entity_list is told to save itself. If you split up the entity_list, make sure to keep this in mind. Preserving polymorphism here would be best.
vBulletin® v3.8.11, Copyright ©2000-2025, vBulletin Solutions Inc.