If you're running linux and would like to help!
I've been doing testing on EQEMU using several linux tools such as LeakTracer and gprof to try and locate memory problems and bottlenecks in the code.
My problem is typically that I can't generate much load on my machines (no more than 1 client). To assist, you can install and build LeakTracer on your machine, run at least one zone under it, and get us some test data that covers several users in the zone doing different things.

LeakTracer is nice because you can preload the .so into your lib path, and all you really need is a zone executable compiled with -g so the leak-analyze script can use gdb to find the location of the problem. LeakTracer is available at http://www.andreasen.org/LeakTracer/

Something else you can do at the same time is compile the zone with -pg to enable performance statistics, then run the resulting .out file through gprof and dump the results to a txt file. This will show where the code is spending its time in execution.

I have attempted running zone through Valgrind, but so far have been unsuccessful getting it to work. Valgrind significantly slows code execution, and this causes so much delay in zone that the client believes the zone is unavailable.

If anyone runs a 24x7 linux server that gets decent load and would like to help, your assistance would be appreciated. My tests have uncovered several issues we have been able to correct, but having someone test on a larger scale will give us more complete code coverage, and possibly broader results. Thanks! |
I run a 24/7 server but I don't get the impression it gets much traffic.
Would it be worth me running LeakTracer? If so I'd be happy to. |
In January, I'm starting a web resale business, and I'll be buying hosting space and bandwidth. I don't think I'll be using 100% of the bandwidth (100GB up/down, or as low as 20GB up/down). I'll see what I can do about hosting strange processes like world before I sign the little contract.
I can choose a linux or a windows server and I'm going linux. |
Every little bit helps! Doesn't need a ton, as long as we get a few people running in there. What I want to see is RANDOM functionality executed. When I test, I test specific cases, so the cases are locked to what I'm working on.
By letting random people run around and execute, you get a better model of 'real world usage', hence the result set will be much more rounded. |
Should I be adding -g to the makefile, or just at the command line as "make -g"?
|
Trump, what kind of server are we looking at here? Do I need 256M+ RAM for sure, or 1G, 512? I think speedwise, I will be lucky to get a 1.8 hosted...
I'm a poor college kid :P But I'll do what I can to host some. |
Doesn't matter, the code should perform at the same relative level regardless of the speed/memory.
You can test on one zone only, you just need to make sure all the folks zone into the same zone to perform the testing. You add -g into your FLAGS in the makefile, not to the make command line. |
Ok zone is recompiled with -g and I'm running zone via LeakCheck. Should I run every zone with LeakCheck or just one?
|
I would run just one with LeakTracer.
The best way to do it is to name one zone zone.leak or something separate from the other zones, then run it using the start script. I created my own start script from the leak.sh one provided with the program, with the emsharemem.so lib appended onto the end along with the zone I wanted to start, and ran it in a separate window so you can control-C or control-Z out of the zone when you've completed the run. This should produce a leak.out file. The problem with running it on all zones is that each one will overwrite the file; the only way around that is to create a separate directory for each running zone. Experimentation will tell you what works best for you. |
I ran all 10 zones with LeakTracer just to see what happens. I now have a nice big "leak.out" file (40k).
Here is what "./leak-analyze zone leak.out" gives me. http://www.1amos.com/leak.txt |
I would like to help. I run fish-wolf. I will try to get this stuff
set up when I get some time here soon. |
Ark,
I'm going to push out new CVS code this evening; can you rerun the tests using builds with it? No memory leaks in this txt, just lots of allocation scheme differences. I'll check against our code base tonight and see if any of these are ones I haven't corrected yet. Also, it won't pick up .h files that are outside of the current directory; you can get around that by copying the common .h files into the executing directory (if you notice, servertalk.h is one that's listed as MIA). Thanks! |
Yeah I'll do that for sure. Just post here or something when CVS has been updated and I'll recompile.
I can copy all the .h files into my bin folder if that will help or make a difference |
CVS, public or dev?
Was the source for the released version 0.5.0 binaries ever pushed to the public CVS? If it was I'm doing something wrong. I've been keeping up to date with CVS, but I've not gotten anything new since the 11/11/03 changelog. Just curious.
The CVS push you mentioned this evening, Trump, will that be dev or public? Just a bit anxious for some updated code to test/play with. If you need additional leak testing, I normally run 5 zones on my little private server and I can have a few mates put a load on them, just let me know. Thanks in advance. : ) |
The source for the 5.0 release was not pushed to CVS as of yesterday, but there is a zip of the source available alongside the bins.
more leak info www.1amos.com/leak2.txt |
I've already corrected all the ones your reports are displaying; that was done this weekend.
New code is being pushed out now and should be available in a few hours. Sourceforge can take time to recognize the updates. Please test with this updated version. Also, I still believe running multiple leak zones will continually overwrite the leak.out file, so you might be eliminating information by running them all concurrently in the same directory. It is definitely not appending, or there would be multiple occurrences of each alloc scheme mismatch. |
Sorry for being late. I hope I got things together properly, I had
to step away for a bit while this ran. Please let me know if this helps at all. I asked people to zone in and out of gfaydark, which is the zone I ran this on. Sorry if it's a bit large. http://www.ubzub.com/fish-wolf/leak.out -bobzub |
That needs to be run through leak-analyze before the results will help....
leak-analyze uses gdb to isolate the code. I could try, but if my binary doesn't match yours the source lines wouldn't match up, so run it through there and post the results. |
bobzub
have a look at my first post of a leak.txt I gave the command you need to use. There is more info on analysing in the README |
grrr figures I forget something, lemme get that done.
-bobzub |
Apologies, here is resulting file:
http://www.ubzub.com/fish-wolf/leak.txt hopefully I got this one right. -bobzub |
Odd, it's listing the line numbers and where they're from, but not listing the source lines...
This is obviously the older code base; pull down CVS today and test again with the updated version if you don't mind. Good tests would be having mixtures of spell casters and melees fighting, that way we'd be exercising a lot of the code in spells.cpp as well. Thanks!

This is one I haven't seen... I'm guessing the waypoint list isn't being cleaned up properly. Haven't really messed with that code much, but I'll look into it. How many zones was this, and how long was it running for?
Code:
#-- Leak: counted 1150x / total Size: 23000
This one has been FIXED, we caught that one the other day in testing.
Code:
#-- Leak: counted 2x / total Size: 45404
Code:
#-- Leak: counted 330x / total Size: 6600
-------------------------------------------------------------------------
Not too sure about these...
Code:
#-- Leak: counted 1538x / total Size: 12304
Code:
EventList* event1 = new EventList;
Also, there are no corresponding deletes for the list news; they probably need to be added to the destructor as well. |
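For anyone following along, here is a minimal sketch of what "add the deletes to the destructor" could look like. It is not the actual parser code: `EventOwner`, `AddEvent` and `LiveAfterScope` are made-up names, and `EventList` here is a stand-in that only counts live objects so the cleanup can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real EventList; it only counts live objects
// so the cleanup can be verified.
struct EventList {
    static int live;
    EventList()  { ++live; }
    ~EventList() { --live; }
};
int EventList::live = 0;

// Hypothetical owner: every EventList it allocates with new is remembered
// and released with a matching delete in the destructor.
class EventOwner {
public:
    EventOwner() {}
    ~EventOwner() {
        for (std::size_t i = 0; i < events_.size(); ++i)
            delete events_[i];          // one delete per new
        events_.clear();
    }
    EventList* AddEvent() {
        EventList* event1 = new EventList;
        events_.push_back(event1);      // keep the pointer so it can be freed
        return event1;
    }
    std::size_t Count() const { return events_.size(); }
private:
    EventOwner(const EventOwner&);            // declared, not defined: no copies
    EventOwner& operator=(const EventOwner&);
    std::vector<EventList*> events_;
};

// Returns how many EventList objects are still alive after an owner
// holding n of them has gone out of scope.
int LiveAfterScope(int n) {
    {
        EventOwner owner;
        for (int i = 0; i < n; ++i)
            owner.AddEvent();
        assert(owner.Count() == static_cast<std::size_t>(n));
    }
    return EventList::live;
}
```

If the destructor loop is removed, `LiveAfterScope(5)` returns 5 instead of 0, which is the same kind of pattern a leak tool reports as "counted Nx".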
All of those death/damage ones should be fixed, loot one is fixed.
With the parser ones... I'm not sure the parser ever cleans up after itself. It's a lot of code to go through, but I got the impression that even if that zone is unbooted things will still be loaded... obviously a 'bad thing'(tm). Thank you all for your help. |
Has public CVS been updated? Looking at the mod dates on the files there are only 3 that have changed in the past 7 days:
zone.leak splintreport.txt splint.sh |
Yes, I pushed it out both last night and this morning. I just removed zone.leak and the splint stuff, that was an accidental checkin, along with world.386 and world.pent.
I think that LE has started updating CVS at the same time he makes changes to the dev only CVS, and that might be causing conflicts in my jobs.. I'll take a look tonight, I might have to rewrite the CVS jobs... They were originally written with the idea that I'd be the only one putting updates in CVS , so the method I used was very simple and primitive, and probably prone to problems when other people put changes directly into CVS. |
I think I'm missing something here.
So CVS has been updated? If so why are all the files dated Nov 11? And the .leak file is also still there. |
If you delete your source directories and pull everything down fresh, it won't be there...
assuming you use anonymous 'cvs checkout' to pull the code down as opposed to that junky web interface. |
Lets see...
I ran the leakcheck script on a single instance of zone, which I compiled with the -g flag. For that zone, I ran a static area, which was gfaydark. I asked people on the server to zone in and out, but I have no idea what they did since I asked via console. At that point I had to leave, and I was away for maybe an hour. So it ran for about an hour. I will grab CVS from today and compile it. If all goes well I will run leakcheck on a zone again, and try to get people into the zone to help out. Thanks, -bobzub |
Sorry to ask about this, but I just did a fresh pull from cvs after
moving my eqemu dir to another location. I was looking in common/version.h, and notice DR5 still in it. Is this correct? -bobzub |
Yeah same thing here. I always pull a fresh copy from CVS via the command line and all the files are from Nov 11th. Public CVS has not been updated since Pre 5.0 release.
|
Please read over what I typed before
Quote:
|
I just compiled the new (file dates of nov 19) with the -g option and ran one of my zone processes with LeakCheck and the first thing it did was create a 54MB leak.out file in the first 20 seconds
then settled down. http://www.1amos.com/leak3.txt |
No leaks there, just some dealloc hits.
To get really good data, the zone needs to run for quite a while and have various people doing random things in there. This ensures we are getting good code coverage.

Here's a short lesson in enterprise coding... skip this if you're not interested.
-----------------------------------------------------------------------------------
For those that are new to code coverage, here's what it means and why it's important.

In enterprise development (real world coding), you have what are called test cases or use cases that are intended to ensure that your code is executing as designed. Say a course of behavior for a client using an application:
1) Client logs in.
2) Client checks his inventory and rearranges several items.
3) Client goes to a merchant and buys something.
4) Client logs out.

This series of steps is one specific 'test case'. Only certain parts of the code were executed, hence you can only verify that those pieces are working. Combat could be completely shot, but you will never know until it is tested. This requires you to have a combat test case. And combat itself could be broken down into melee and non-melee (spell casting) test cases, which could be further broken down. Good code coverage means you need extensive test cases to adequately test all the different pieces of the code.

This is where semi-randomized testing comes in. Randomized testing doesn't cover true test cases, but it does give you a good idea of where the bugs are because of the rather random behavior of the users, and is often the best way to catch 'off the path' errors. Using this approach, letting the server run for a long time and having multiple people randomly exercise pieces of the code that you don't normally test yourself gives you a good idea of what's working and what's not. In the case here, this semi-random use will help us find problems in pieces of the code that we don't normally test ourselves.

That's why bug reports are so important to eqemu. Without standardized regression tests and base test cases, it falls upon the users to find problems that we ourselves don't catch. One of the problems with standard test cases is that they are usually 'golden path' test cases, which means testing what happens when someone does something right. But what if someone does something wrong, did you test for that? Someone accidentally does something they didn't mean to, like targeting themselves instead of a mob for a spell. Results could be unpredictable if someone hasn't taken that into account in the code! That's one of the reasons that defaults in switch statements are so important: you always need catch-all rules for when things don't behave as you expect they should. |
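The point about defaults in switch statements can be sketched roughly like this. None of it comes from the eqemu source: `TargetKind` and `CastDetrimentalSpell` are hypothetical names, and the return strings are only there to make the branches visible.

```cpp
#include <cassert>
#include <string>

// Hypothetical target kinds; not taken from the eqemu source.
enum TargetKind { TARGET_NONE = 0, TARGET_SELF = 1, TARGET_MOB = 2, TARGET_PLAYER = 3 };

// Decides what happens when a detrimental spell is cast on a target.
// The default branch is the catch-all: any value the code was not written
// for is rejected instead of producing unpredictable results.
std::string CastDetrimentalSpell(int target_kind) {
    switch (target_kind) {
        case TARGET_MOB:
        case TARGET_PLAYER:
            // The 'golden path': a valid target.
            return "cast";
        case TARGET_SELF:
            // The 'off the path' case: the caster targeted themselves.
            return "rejected: cannot target self";
        case TARGET_NONE:
            return "rejected: no target";
        default:
            // Anything unexpected, e.g. a corrupt or new value.
            return "rejected: unknown target kind";
    }
}
```

A golden path test would only ever call this with `TARGET_MOB`; the self-target and unknown-value branches are the ones that random users tend to hit first.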
There's one to fix..
Code:
if (RunQuery(query, MakeAnyLenString(&query, "SELECT loottable_id, lootdrop_id, multiplier, probability FROM loottable_entries WHERE loottable_id=%i", tmpid), errbuf, &result2)) {
    tmpLT = (LootTable_Struct*) new uchar[sizeof(LootTable_Struct) + (sizeof(LootTableEntries_Struct) * mysql_num_rows(result2))];
and it's dealloc'd with safe_delete(tmpLT); That delete should be a safe_delete_array since it's new'd as uchar[x].

The one in Seperator confuses me. This is the destructor:
Code:
~Seperator() {
    for (int i=0; i<=maxargnum; i++)
        safe_delete(arg[i]);
    safe_delete_array(arg);
    safe_delete_array(argplus);
}
If you have an array of new'd objects, do you need to dealloc each member of the array, and then the array itself? |
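On the question above: in standard C++, if each element of the array was itself allocated, then yes, each element is freed first and the array last, and the form of the delete has to match the form of the new (`new` with `delete`, `new[]` with `delete[]`). A minimal sketch follows. It assumes the elements were allocated with `new char[]`, which the snippet above does not show, and `MakeArgs`, `FreeArgs`, `Header`, `MakeHeaderBuffer` and `FreeHeaderBuffer` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: builds an array of `count` C strings, each a copy of
// `text`. Two levels of allocation: one new[] for the pointer array and one
// new[] per element.
char** MakeArgs(std::size_t count, const char* text) {
    char** arg = new char*[count];
    for (std::size_t i = 0; i < count; ++i) {
        arg[i] = new char[std::strlen(text) + 1];
        std::strcpy(arg[i], text);
    }
    return arg;
}

// Frees what MakeArgs built: each element first, then the array itself.
// Everything came from new[], so everything is released with delete[].
void FreeArgs(char** arg, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        delete[] arg[i];
    delete[] arg;
}

// Hypothetical fixed-size header followed by a variable number of bytes,
// allocated as one raw byte buffer like the loot table snippet.
struct Header { int id; int entry_count; };

Header* MakeHeaderBuffer(std::size_t extra_bytes) {
    unsigned char* raw = new unsigned char[sizeof(Header) + extra_bytes];
    std::memset(raw, 0, sizeof(Header) + extra_bytes);
    return reinterpret_cast<Header*>(raw);
}

// The buffer was new'd as unsigned char[], so it is released as an array of
// unsigned char, not with a plain delete on the struct pointer.
void FreeHeaderBuffer(Header* h) {
    delete[] reinterpret_cast<unsigned char*>(h);
}
```

If the elements of `arg` in Seperator turn out to be allocated with plain `new`, the per-element `safe_delete` would be the matching form; if they come from `new char[]`, a tool like LeakTracer would flag it as another alloc scheme mismatch.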
I've had at least one player who spends a lot of time on my server report this each time he tried to zone:
[thu Nov 20 07:33:02 2003] Error: Asyncronous save of your character failed.
He would get disconnected and this was in the log file. He was unable to get out of the zone he was in (arena). After trying to zone and getting disconnected he would log back in and not have zoned, so he was stuck in the zone he happened to be in before I upgraded the server.
*** Edit *** Were there maybe changes to the DB structure? I have to admit I didn't check. |
Yes, db changes... Look in your zone standard out and you should see the failure, and look at your zone.log
|
Speaking of testing code, I read a while back about the program gcov put out by the gnu team.
Do you think this might help you? Here is the url http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html Quote:
|
Ok I had a look through db.sql and could not find the change in that character_ table. Then I checked the db.sql mod date and it's still from the 11th. What are the properties of the added column?
|
I've used gcov several times. It will slow the heck out of the program (you have to compile your code with instrumentation options), but it tells you whether you've hit the code you're wanting to test.
I've thought about it, but unless eqemu has a standardized release test team and test cases, I really don't think it would help us a whole lot. Ark, don't know, the changes weren't mine, but I'm guessing your issue is related to the AA changes. Without the error message, though, that's only a guess... I think there was a 'time last on' field added in the code, but you might want to ask LE. |
Yeah it's the timelaston column I'm missing. Should be an int I think.
|