If you're running linux and would like to help!


Trumpcard
11-18-2003, 03:37 AM
I've been doing testing on EQEMU using several linux tools such as LeakTracer and gprof to try and locate memory problems and bottlenecks in the code.

My problem is that I typically can't generate much load on my machines (no more than 1 client).

To assist, you can install and build LeakTracer on your machine, run at least one zone using it, and get us some test data that covers use cases of several users in the zone doing different things. LeakTracer is nice in that you can preload the .so into your lib path, and all you really need is a zone executable compiled with -g so the leak-analyze script can use gdb to find the location of the problem.

LeakTracer is available at http://www.andreasen.org/LeakTracer/


Additionally, something else you can do at the same time is compile zone with -pg to enable performance statistics, then run the resulting gmon.out file through gprof and dump the results to a txt file. This will show where the code is spending all its time in execution.

I have attempted running zone through Valgrind, but so far have been unsuccessful getting it to work. Valgrind significantly slows code execution, and this causes too much delay in zone and the client believes the zone is unavailable.

If anyone runs a 24x7 linux server that gets decent load that would like to help, your assistance would be appreciated.

My tests have uncovered several issues we have been able to correct, but having someone test on a larger scale basis will give us more complete code coverage, and possibly broader results.

Thanks!

arkaria
11-18-2003, 04:23 AM
I run a 24/7 server but I don't get the impression it gets much traffic.

Would it be worth me running LeakTracer? If so I'd be happy to.

a_Guest03
11-18-2003, 04:52 AM
In January, I'm starting a web resale business, and I'll be buying hosting space and bandwidth. I don't think I'll be using 100% of the bandwidth (100GB up/down, or as low as 20GB up/down). I'll see what I can do about hosting strange processes like world before I sign the little contract.

I can choose a linux or a windows server and I'm going linux.

Trumpcard
11-18-2003, 04:54 AM
Every little bit helps! Doesn't need a ton, as long as we get a few people running in there. What I want to see is RANDOM functionality executed. When I test, I test specific cases, so the cases are locked to what I'm working on.

By letting random people run around and execute, you get a better model of 'real world usage', hence the result set will be much more rounded.

arkaria
11-18-2003, 05:31 AM
Should I be adding -g to the makefile, or just at the command line ("make -g")?

a_Guest03
11-18-2003, 05:45 AM
Trump, what kind of server are we looking at here? Do I need 256M+ RAM for sure, or 1G, 512? I think speedwise, I will be lucky to get a 1.8 hosted...

I'm a poor college kid :P But I'll do what I can to host some.

Trumpcard
11-18-2003, 05:54 AM
Doesn't matter, the code should perform at the same relative level regardless of the speed/memory.

You can test on one zone only, you just need to make sure all the folks zone into the same zone to perform the testing.

You add -g into your FLAGS in the makefile, not to the make commandline

arkaria
11-18-2003, 06:24 AM
Ok zone is recompiled with -g and I'm running zone via LeakCheck. Should I run every zone with LeakCheck or just one?

Trumpcard
11-18-2003, 07:05 AM
I would run just one with LeakTracer.

The best way to do it is to name one zone zone.leak or something separate from the other zones, then run it using the start script. I created my own start script from the leak.sh one provided with the program, with the emsharemem.so lib appended onto the end along with the zone I wanted to start, and ran it in a separate window so you can control-C or control-Z out of the zone when you've completed the run.

This should produce a leak.out file.

The problem with running it on all zones is that each one will overwrite the file. The only way around that is to create a separate directory for each running zone.

Experimentation will tell you what works best for you

arkaria
11-18-2003, 08:44 AM
I ran all 10 zones with LeakTracer just to see what happens. I now have a nice big "leak.out" file (40k).

Here is what "./leak-analyze zone leak.out" gives me.

http://www.1amos.com/leak.txt

bobzub
11-18-2003, 08:50 AM
I would like to help. I run fish-wolf. I will try to get this stuff
set up when I get some time here soon.

Trumpcard
11-18-2003, 09:22 AM
Ark,

I'm going to push out new CVS code this evening, can you rerun the tests using builds with it?

No mem. leaks in this txt, just lots of allocation scheme differences. I'll check against our code base tonight and see if any of these are ones I haven't corrected yet.

Also, it won't pick up .h files that are outside of the current directory. You can get around that by copying the .h files from common into the executing directory (if you notice, servertalk.h is one that's listed as MIA).

Thanks!

arkaria
11-18-2003, 09:34 AM
Yeah I'll do that for sure. Just post here or something when CVS has been updated and I'll recompile.

I can copy all the .h files into my bin folder if that will help or make a difference

loderunner
11-18-2003, 09:51 AM
Was the source for the released version 0.5.0 binaries ever pushed to the public CVS? If it was I'm doing something wrong. I've been keeping up to date with CVS, but I've not gotten anything new since the 11/11/03 changelog. Just curious.

The CVS push you mentioned this evening, Trump, will that be dev or public? Just a bit anxious for some updated code to test/play with.

If you need additional leak testing, I normally run 5 zones on my little private server and I can have a few mates put a load on them, just let me know.

Thanks in advance. : )

arkaria
11-18-2003, 09:53 AM
The source for the 5.0 release was not pushed to CVS as of yesterday, but there is a zip of the source available alongside the bins.

more leak info

www.1amos.com/leak2.txt

Trumpcard
11-18-2003, 10:41 AM
I've already corrected all the ones your reports are displaying, that was done this weekend.

New code is being pushed out now, should be available in a few hours. Sourceforge can take time to recognize the updates.

Please test with this updated version. Also, I still believe running multiple leak zones will continually overwrite the leak.out file, so you might be eliminating information by running them all concurrently in the same directory. It is definitely not appending, or there would be multiple occurrences of each alloc scheme mismatch.

bobzub
11-18-2003, 11:54 AM
Sorry for being late. I hope I got things together properly, I had
to step away for a bit while this ran. Please let me know if this
helps at all. I asked people to zone in and out of gfaydark, which
is the zone I ran this on. Sorry if it's a bit large.

http://www.ubzub.com/fish-wolf/leak.out

-bobzub

Trumpcard
11-18-2003, 12:29 PM
That needs to be run through leak-analyze before the results will help....

leak-analyze will use gdb to isolate the code.

I could try, but if my binary doesn't match yours, the source lines wouldn't match up, so run it through there and post the results..

arkaria
11-18-2003, 12:32 PM
bobzub,
have a look at my first post of a leak.txt, I gave the command you need to use there. There is more info on analysing in the README.

bobzub
11-18-2003, 12:43 PM
grrr figures I forget something, lemme get that done.

-bobzub

bobzub
11-18-2003, 12:49 PM
Apologies, here is resulting file:

http://www.ubzub.com/fish-wolf/leak.txt

hopefully I got this one right.

-bobzub

Trumpcard
11-18-2003, 10:18 PM
Odd, it's listing the line numbers and where they're from, but not listing the source lines...

This is obviously the older code base, pull down CVS today and test again with the updated version if you don't mind. Good tests would be having mixtures of spell casters and melees fighting, that way we'd be exercising a lot of the code in spells.cpp as well.

Thanks!

This is one I haven't seen... I'm guessing the waypoint list isn't being cleaned up properly.. Haven't really messed with that code much, but I'll look into it.. How many zones was this, and how long was it running for?


#-- Leak: counted 1150x / total Size: 23000
0x80e588a is in Mob::AssignWaypoints(unsigned short) (MobAI.cpp:1274).
in MobAI.cpp



This one has been FIXED, we caught that one the other day in testing..


#-- Leak: counted 2x / total Size: 45404
0x80df355 is in Corpse::MakeLootRequestPackets(Client*, APPLAYER const*) (PlayerCorpse.cpp:527).
in PlayerCorpse.cpp


Think we got all of these..


#-- Leak: counted 330x / total Size: 6600
0x80b1dd1 is in Client::Damage(Mob*, int, unsigned short, unsigned char, bool, signed char, bool) (attack.cpp:932).
in attack.cpp

#-- Leak: counted 330x / total Size: 7590
0x80b1df7 is in Client::Damage(Mob*, int, unsigned short, unsigned char, bool, signed char, bool) (EQNetwork.h:69).
in EQNetwork.h

#-- Leak: counted 330x / total Size: 7590
0x80b1e36 is in Client::Damage(Mob*, int, unsigned short, unsigned char, bool, signed char, bool) (attack.cpp:933).
in attack.cpp

#-- Leak: counted 419x / total Size: 8380
0x80b31a5 is in NPC::Damage(Mob*, int, unsigned short, unsigned char, bool, signed char, bool) (attack.cpp:1354).
1353 in attack.cpp

#-- Leak: counted 419x / total Size: 9637
0x80b35a5 is in NPC::Damage(Mob*, int, unsigned short, unsigned char, bool, signed char, bool) (EQNetwork.h:69).
in EQNetwork.h


The ones that show up in EQNetwork.h are from the APPLAYER constructor, they should be getting cleaned up when the ~ destructor is called..

-------------------------------------------------------------------------

Not too sure about these..


#-- Leak: counted 1538x / total Size: 12304
0x80fd4c7 is in Parser::LoadScript(int, char const*) (atomicity.h:50).
49 __atomic_add (volatile _Atomic_word* __mem, int __val)
50 {

#-- Leak: counted 1538x / total Size: 12304
0x80fd4f6 is in Parser::LoadScript(int, char const*) (parser.cpp:984).
in parser.cpp


Not sure how this code works but..


EventList* event1 = new EventList;
Events * NewEventList = new Events;
while (file && !file.eof())
{
    getline(file,line);
    string::iterator iterator = line.begin();
    while (*iterator)
    {
        if (iterator[0] == '/' && iterator[1] == '/') break;
        if (!ignore && *iterator == '/' && iterator[1] == '*') { ignore++; iterator++; iterator++; }
        if (*iterator == '*' && iterator[1] == '/') { ignore--; iterator++; iterator++; }
        if (!ignore && (strchr(charIn,*iterator) || quote || paren))
            buffer+=*iterator;
        if (!ignore)
        {
            if (*iterator == '{')
            {
                bracket++;
                if (bracket == 1)
                {
                    event1 = new EventList;
                    NewEventList->npcid = npcid;
                    buffer.replace(buffer.length()-1,buffer.length(),"");
                    event1->event = buffer;
                    buffer="";
                }
            }




I see event1 is being new'd twice, not sure why it needs to be new'd again if bracket==1. I think this is most likely a typo, why create a new one over the same variable name? I'm pretty sure this is going to discard the memory address of the 1st one and leak it, but I'm not 100% sure.

Also, there are no corresponding deletes to the list new's, they probably need to be added to the destructor also.
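
A minimal C++ sketch of the suspected problem, assuming nothing else keeps a copy of the first pointer (the excerpt does not show that either way). EventList here is a hypothetical stand-in that counts live instances, not the real parser struct, and leaky_pattern / fixed_pattern are made-up names for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's EventList; counts live instances.
struct EventList {
    static int live;
    std::string event;
    EventList() { ++live; }
    ~EventList() { --live; }
};
int EventList::live = 0;

// Same shape as the excerpt: allocate up front, then allocate again into
// the same pointer each time a block opens. Returns how many objects were
// left unreachable once the stored ones are freed.
int leaky_pattern(int blocks) {
    int before = EventList::live;
    std::vector<EventList*> stored;
    EventList* event1 = new EventList;      // first allocation
    EventList* first = event1;              // saved only so this demo can clean up
    for (int i = 0; i < blocks; ++i) {
        event1 = new EventList;             // overwrites the only pointer to the old object
        stored.push_back(event1);
    }
    for (std::size_t i = 0; i < stored.size(); ++i) delete stored[i];
    int leaked = EventList::live - before;  // objects nobody in the "real" code could free
    delete first;                           // real code has no such copy, so it would leak
    return leaked;
}

// One possible fix: do not allocate until a block actually opens.
int fixed_pattern(int blocks) {
    int before = EventList::live;
    std::vector<EventList*> stored;
    EventList* event1 = nullptr;
    for (int i = 0; i < blocks; ++i) {
        event1 = new EventList;
        stored.push_back(event1);
    }
    for (std::size_t i = 0; i < stored.size(); ++i) delete stored[i];
    return EventList::live - before;
}

// Small self-check showing the difference between the two patterns.
inline void self_check() {
    assert(leaky_pattern(3) == 1);
    assert(fixed_pattern(3) == 0);
}
```

If the sketch matches what the parser does, the up-front allocation is the one that gets lost, once per call, regardless of how many blocks the script contains.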

kathgar
11-19-2003, 01:43 AM
All of those death/damage ones should be fixed, loot one is fixed.
With the parser ones.. I'm not sure parser ever cleans up after itself. It's a lot of code to go through, but I got the impression that even if that zone is unbooted things will still be loaded.. obviously a 'bad thing'(tm).

Thank you all for your help.

arkaria
11-19-2003, 03:34 AM
Has public CVS been updated? Looking at the mod dates on the files there are only 3 that have changed in the past 7 days:
zone.leak
splintreport.txt
splint.sh

Trumpcard
11-19-2003, 03:49 AM
Yes, I pushed it out both last night and this morning. I just removed zone.leak and the splint stuff, that was an accidental checkin, along with world.386 and world.pent.

I think that LE has started updating CVS at the same time he makes changes to the dev only CVS, and that might be causing conflicts in my jobs..

I'll take a look tonight, I might have to rewrite the CVS jobs... They were originally written with the idea that I'd be the only one putting updates in CVS , so the method I used was very simple and primitive, and probably prone to problems when other people put changes directly into CVS.

arkaria
11-19-2003, 04:06 AM
I think I'm missing something here.

So CVS has been updated? If so why are all the files dated Nov 11? And the .leak file is also still there.

Trumpcard
11-19-2003, 05:23 AM
If you delete your source directories, and pull everything down fresh, it won't be there...

assuming you use anonymous 'cvs checkout' to pull the code down as opposed to that junky web interface.

bobzub
11-19-2003, 05:27 AM
Lets see...

I ran the leakcheck script on a single instance of zone, which I
compiled with the -g flag. For that zone, I ran a static area,
which was gfaydark. I asked people on the server to zone in
and out, but I have no idea what they did since I asked via
console. At that point I had to leave, and I was away for maybe
an hour. So it ran for about an hour.

I will grab CVS from today and compile it. If all goes well
I will run leakcheck on a zone again, and try to get people
into the zone to help out.

Thanks,

-bobzub

bobzub
11-19-2003, 05:33 AM
Sorry to ask about this, but I just did a fresh pull from cvs after
moving my eqemu dir to another location. I was looking in
common/version.h, and notice DR5 still in it. Is this correct?


-bobzub

arkaria
11-19-2003, 05:37 AM
Yeah same thing here. I always pull a fresh copy from CVS via the command line and all the files are from Nov 11th. Public CVS has not been updated since Pre 5.0 release.

Trumpcard
11-19-2003, 06:51 AM
Please read over what I typed before


I think that LE has started updating CVS at the same time he makes changes to the dev only CVS, and that might be causing conflicts in my jobs..

I'll take a look tonight, I might have to rewrite the CVS jobs... They were originally written with the idea that I'd be the only one putting updates in CVS , so the method I used was very simple and primitive, and probably prone to problems when other people put changes directly into CVS.

So in a nutshell, no.. CVS ISN'T UPDATED YET..

arkaria
11-19-2003, 08:13 PM
I just compiled the new code (file dates of Nov 19) with the -g option and ran one of my zone processes with LeakCheck. The first thing it did was create a 54MB leak.out file in the first 20 seconds, then it settled down.

http://www.1amos.com/leak3.txt

Trumpcard
11-20-2003, 02:16 AM
No leaks there, just some dealloc hits..

To get really good data, the zone needs to run for quite a while and have various people doing random things in there. This ensures we are getting good code coverage execution....

Here's a short lesson in enterprise coding... skip this if you're not interested..
-----------------------------------------------------------------------------------
For those that are new to code coverage, here's a lesson on it, what it means and why it's important...

In enterprise development (real world coding), you have what are called test cases or use cases that are intended to ensure that your code is executing as designed. Take, say, a course of behavior for a client using an application:

1) Client logs in.
2) Client checks his inventory and rearranges several items
3) Client goes to a merchant and buys something
4) Client logs out..

This series of steps indicates a specific 'test case'. Only certain parts of the code were executed, hence you can only verify that certain pieces are working. Combat could be completely shot, but you will never know until it is tested. This requires you to have a combat test case. And combat itself could be broken down into melee and nonmelee (spell casting) test cases, which could be further broken down. To ensure good code coverage and testing, you need extensive test cases that adequately test all the different pieces of the code...

This is where semi randomized testing comes in. Randomized testing doesn't cover true test cases, but it does give you a good idea of where the bugs are because of the rather random behavior of the users, and is often the best way to catch 'off the path' errors.

Using this approach, allowing the server to run for a long time and having multiple people randomly exercise pieces of the code that you don't normally test yourself gives you a good idea of what's working and what's not. In the case here, this semi random use will help us to determine problems in pieces of the code that we don't normally test ourselves.

That's why bug reports are so important to eqemu. Without standardized regression tests and base test cases, it falls upon the users to find problems that we ourselves don't catch.

One of the problems with standard test cases is they are usually 'golden path' test cases, which means someone tested what happens when someone does something right. But what if someone does something wrong, did you test for that? Someone accidentally does something they didn't mean to, like target themselves instead of a mob for a spell.. Results could be unpredictable if someone hasn't taken that into account in the code! That's one of the reasons that defaults in switch statements are so important, you always need catch-all rules for when things don't behave as you expect they should......
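
As a rough illustration of the catch-all point, here is a small C++ sketch. TargetType and resolve_cast are hypothetical names made up for this example and are not taken from the eqemu source:

```cpp
#include <cassert>
#include <string>

// Hypothetical target kinds, for illustration only.
enum TargetType { TargetNone = 0, TargetSelf = 1, TargetMob = 2 };

// Decide what to do with a spell that is meant to land on a mob.
// The default branch is the catch-all: any value the code did not
// anticipate is refused instead of falling through silently.
std::string resolve_cast(int target_type) {
    switch (target_type) {
        case TargetMob:
            return "cast on mob";                        // golden path
        case TargetSelf:
            return "refused: spell needs a mob target";  // known off-path case
        default:
            return "refused: unexpected target type";    // everything else
    }
}

// Exercises the golden path, the known mistake, and an unplanned value.
inline void self_check() {
    assert(resolve_cast(TargetMob) == "cast on mob");
    assert(resolve_cast(TargetSelf) == "refused: spell needs a mob target");
    assert(resolve_cast(99) == "refused: unexpected target type");
}
```

A golden-path test would only ever hit the first case. The other two branches are the sort of thing random play tends to reach.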

Trumpcard
11-20-2003, 02:38 AM
There's one to fix..


if (RunQuery(query, MakeAnyLenString(&query, "SELECT loottable_id, lootdrop_id, multiplier, probability FROM loottable_entries WHERE loottable_id=%i", tmpid), errbuf, &result2)) {
    safe_delete_array(query);
    tmpLT = (LootTable_Struct*) new uchar[sizeof(LootTable_Struct) + (sizeof(LootTableEntries_Struct) * mysql_num_rows(result2))];
    memset(tmpLT, 0, sizeof(LootTable_Struct) + (sizeof(LootTableEntries_Struct) * mysql_num_rows(result2)));
    tmpLT->NumEntries = mysql_num_rows(result2);
    tmpLT->mincash = tmpmincash;
    tmpLT->maxcash = tmpmaxcash;
    tmpLT->avgcoin = tmpavgcoin;
    i=0;
    while ((row = mysql_fetch_row(result2))) {
        if (i >= tmpLT->NumEntries) {
            mysql_free_result(result);
            mysql_free_result(result2);
            cerr << "Error in Database::DBLoadLoot, i >= NumEntries" << endl;
            return false;
        }
        tmpLT->Entries[i].lootdrop_id = atoi(row[1]);
        tmpLT->Entries[i].multiplier = atoi(row[2]);
        tmpLT->Entries[i].probability = atoi(row[3]);
        i++;
    }
    if (!EMuShareMemDLL.Loot.cbAddLootTable(tmpid, tmpLT)) {
        mysql_free_result(result);
        mysql_free_result(result2);
        safe_delete(tmpLT);
        cout << "Error in Database::DBLoadLoot: !cbAddLootTable(" << tmpid << ")" << endl;
        return false;
    }
    safe_delete(tmpLT);
    mysql_free_result(result2);


The problem is here..

tmpLT = (LootTable_Struct*) new uchar[sizeof(LootTable_Struct) + (sizeof(LootTableEntries_Struct) * mysql_num_rows(result2))];

and it's dealloc'd with

safe_delete(tmpLT);


That delete should be a safe_delete_array since it's new'd as uchar[x]
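
A hedged sketch of the rule, assuming C++11: TableHeader, TableEntry, alloc_table and free_table are simplified, hypothetical stand-ins and not the real LootTable_Struct code. The sketch casts back to uchar* before delete[], since the array form of delete expects the pointer type to match what was new'd:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char uchar;

// Hypothetical, simplified structs (not the real eqemu definitions).
struct TableHeader {
    unsigned int NumEntries;
    unsigned int mincash;
    unsigned int maxcash;
};
struct TableEntry {
    int lootdrop_id;
    int multiplier;
    int probability;
};

// Allocate one zeroed byte buffer large enough for a header plus `rows`
// entries, the same way the excerpt sizes its buffer.
TableHeader* alloc_table(std::size_t rows) {
    std::size_t bytes = sizeof(TableHeader) + sizeof(TableEntry) * rows;
    uchar* raw = new uchar[bytes];           // array form of new
    std::memset(raw, 0, bytes);
    TableHeader* table = reinterpret_cast<TableHeader*>(raw);
    table->NumEntries = static_cast<unsigned int>(rows);
    return table;
}

// Release the buffer with the array form of delete, on the original type.
void free_table(TableHeader*& table) {
    uchar* raw = reinterpret_cast<uchar*>(table);
    delete[] raw;                            // matches new uchar[...]
    table = nullptr;
}

// Allocates, checks the header fields, and releases again.
inline void self_check() {
    TableHeader* t = alloc_table(4);
    assert(t->NumEntries == 4);
    assert(t->mincash == 0);
    free_table(t);
    assert(t == nullptr);
}
```

Using plain delete on memory that came from new[] is undefined behavior, which is presumably why the tool reports it as an allocation scheme mismatch rather than a leak.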



The one in Seperator confuses me.. This is the destructor

~Seperator() {
    for (int i=0; i<=maxargnum; i++)
        safe_delete(arg[i]);
    safe_delete_array(arg);
    safe_delete_array(argplus);
}


If you have an array of new'd objects, do you need to dealloc each member of the array, and then the array itself?
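
For what it's worth, the general C++ rule is that every new needs one matching delete and every new[] one matching delete[], so an array of separately allocated members means freeing each member first and then the array. A minimal sketch, assuming each element was itself allocated with new[] (the excerpt does not show how arg[i] is allocated). ArgList and kArgLen are hypothetical names, not the real Seperator class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kArgLen = 32;  // fixed slot size, for illustration only

// Hypothetical holder for an array of separately allocated C strings.
class ArgList {
public:
    explicit ArgList(std::size_t count)
        : count_(count), arg_(new char*[count]) {   // array of pointers
        for (std::size_t i = 0; i < count_; ++i) {
            arg_[i] = new char[kArgLen];            // each element is its own array
            arg_[i][0] = '\0';
        }
    }

    ~ArgList() {
        for (std::size_t i = 0; i < count_; ++i)
            delete[] arg_[i];   // element came from new char[...], so delete[]
        delete[] arg_;          // then the outer array, also delete[]
    }

    ArgList(const ArgList&) = delete;               // avoid double frees on copy
    ArgList& operator=(const ArgList&) = delete;

    // Copy text into slot i, truncating to the slot size.
    void set(std::size_t i, const char* text) {
        assert(i < count_);
        std::strncpy(arg_[i], text, kArgLen - 1);
        arg_[i][kArgLen - 1] = '\0';
    }
    const char* get(std::size_t i) const {
        assert(i < count_);
        return arg_[i];
    }
    std::size_t size() const { return count_; }

private:
    std::size_t count_;
    char** arg_;
};
```

If the members in Seperator are allocated as arrays like this, the per-member safe_delete would be another new[]/delete mismatch, which might be what the tool is flagging.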

arkaria
11-20-2003, 03:31 AM
I've had at least one player who spends a lot of time on my server report this each time he tried to zone:

[thu Nov 20 07:33:02 2003] Error: Asyncronous save of your character failed.

He would get disconnected and this was in the log file.

He was unable to get out of the zone he was in (arena). After trying to zone and getting disconnected he would log back in and not have zoned. So he was stuck in the zone he happened to be in before I upgraded the server.

*** Edit***

Were there maybe changes to the DB structure? I have to admit I didn't check.

Trumpcard
11-20-2003, 04:26 AM
Yes, db changes... Look in your zone standard out and you should see the failure, and look at your zone.log

Chrysm
11-20-2003, 04:39 AM
Speaking of testing code, I read a while back about the program gcov put out by the gnu team.

Do you think this might help you?

Here is the url
http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html


8.1 Introduction to gcov
gcov is a test coverage program. Use it in concert with GNU CC to analyze your programs to help create more efficient, faster running code. You can use gcov as a profiling tool to help discover where your optimization efforts will best affect your code. You can also use gcov along with the other profiling tool, gprof, to assess which parts of your code use the greatest amount of computing time.

Profiling tools help you analyze your code's performance. Using a profiler such as gcov or gprof, you can find out some basic performance statistics, such as:


how often each line of code executes

what lines of code are actually executed

how much computing time each section of code uses
Once you know these things about how your code works when compiled, you can look at each module to see which modules should be optimized. gcov helps you determine where to work on optimization.

Software developers also use coverage testing in concert with testsuites, to make sure software is actually good enough for a release. Testsuites can verify that a program works as expected; a coverage program tests to see how much of the program is exercised by the testsuite. Developers can then determine what kinds of test cases need to be added to the testsuites to create both better testing and a better final product.

You should compile your code without optimization if you plan to use gcov because the optimization, by combining some lines of code into one function, may not give you as much information as you need to look for `hot spots' where the code is using a great deal of computer time. Likewise, because gcov accumulates statistics by line (at the lowest resolution), it works best with a programming style that places only one statement on each line. If you use complicated macros that expand to loops or to other control structures, the statistics are less helpful--they only report on the line where the macro call appears. If your complex macros behave like functions, you can replace them with inline functions to solve this problem.

gcov creates a logfile called `sourcefile.gcov' which indicates how many times each line of a source file `sourcefile.c' has executed. You can use these logfiles along with gprof to aid in fine-tuning the performance of your programs. gprof gives timing information you can use along with the information you get from gcov.

gcov works only on code compiled with GNU CC. It is not compatible with any other profiling or test coverage mechanism.

arkaria
11-20-2003, 06:11 AM
Ok, I had a look through db.sql and could not find the change in that character_ table, then I checked the db.sql mod date and it's still from the 11th. What are the properties of the added column?

Trumpcard
11-20-2003, 06:22 AM
I've used gcov several times.. It will slow the heck out of the program (you have to compile with instrumentation options for your code), but it tells you that you've hit the code you're wanting to test..

I've thought about it, but unless eqemu has a standardized release test team, and test cases, I really don't think it would help us a whole lot.

Ark, don't know, the changes weren't mine, but I'm guessing your issue is related to AA changes. Without the error message though, that's only a guess... I think there was a 'time last on' field added in the code, but you might want to ask LE.

arkaria
11-20-2003, 06:24 AM
Yeah it's the timelaston column I'm missing. Should be an int I think.