Log in

View Full Version : Calculation Speed vs Efficiency


Uleat
10-17-2012, 10:20 PM
I'm trying to tweak a code segment I'm working on and can't find a reference as to what method is 'more proper' to use.

Which method is more efficient in terms of clock cycles?

x += y;
x /= 10;

or

x = (x + y) / 10;

Thanks!

lerxst2112
10-17-2012, 11:07 PM
Short Answer: Worrying about micro-optimizations like this is almost certainly pointless.

Slightly longer answer: Without knowing the type of x & y this question cannot be answered. There's a big difference between them being integers and being a large complicated class with overloaded operators.

If you really care, learn how to have the compiler output the assembly code and compare the two. Of course, then you'd need to look up the cost of each instruction for every processor the code might run on as well as consider any potential stalls, etc.

Suffice it to say that for integers they likely generate identical or almost identical code. For anything more complex where the generated code might be different the only real way to know which is better is to profile both, possibly under several different conditions, and use that measurement to determine which is better.

Uleat
10-18-2012, 12:00 AM
Wow..I'd expect no less an answer! Thanks again!


One last question (this is open):

Can anyone recommend a good book or offline material for beginner to intermediate c++? (price range $30~$40)

I think it's time that I update my library...

trevius
10-18-2012, 02:29 AM
You would probably find this thread useful:

http://www.eqemulator.org/forums/showthread.php?t=32765

KLS
10-18-2012, 02:55 AM
lerxst's short answer is going to be right 99% of the time. Just cause I needed a break from hammering out these templates I compiled them both for you and generated the assembly.

Optimizations off MSVC10

; x += y;
; x /= 10;
mov edx, DWORD PTR _x$[ebp]
add edx, DWORD PTR _y$[ebp]
mov DWORD PTR _x$[ebp], edx
mov eax, DWORD PTR _x$[ebp]
cdq
mov ecx, 10
idiv ecx
mov DWORD PTR _x$[ebp], eax

; x = (x + y) / 10;
mov eax, DWORD PTR _x$[ebp]
add eax, DWORD PTR _y$[ebp]
cdq
mov ecx, 10
idiv ecx
mov DWORD PTR _x$[ebp], eax


Optimizations on MSVC10(/O2)

; x += y;
; x /= 10;
mov eax, 1717986919
lea ecx, DWORD PTR [edx+edi]
imul ecx
sar edx, 2
mov eax, edx
shr eax, 31
add eax, edx

; x = (x + y) / 10;
mov eax, 1717986919
lea ecx, DWORD PTR [edx+edi]
imul ecx
sar edx, 2
mov eax, edx
shr eax, 31
add eax, edx


With optimizations on they both generate the same code, with them off it's slightly different.

lerxst2112
10-18-2012, 04:04 AM
For extra credit, figure out why the optimized code actually works since there's no actual division in it. :)

I added a bunch of book recommendations to the other thread rather than put them here.

KLS
10-18-2012, 06:33 PM
That extra credit is going to give someone a brain aneurysm.

Uleat
10-18-2012, 06:48 PM
Thanks for running that K! It is interesting to see what goes on behind the scenes. And thanks for that list Lerxst!

Oh god! It's been 30 years since I played with Assembly language..and that was on a Motorola 6809...


Edit: And thanks Trev for pointing me back over there. I remember seeing that thread when it first came up, but
I hadn't kept up with it. It really grew!

lerxst2112
10-18-2012, 07:47 PM
That extra credit is going to give someone a brain aneurysm.

Heh, it's one of the neater tricks used to avoid division by constants. It also proves the guys that write the optimizers are way smarter than me. :)
http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

I remember back in the old days before optimizers were scary good my boss would compile code to asm and then go through and hand optimize some simple things like xor ax,ax instead of mov ax,0 because it saved 1 cycle. Probably pointless, but it seemed to make him happy. He'd also freak if you used division instead of converting to a right shift or two where possible.

Nowadays that kind of stuff is boring compared to the lengths people will go to in order to avoid a branch on a console.

More reading if your brain doesn't already hurt: http://graphics.stanford.edu/~seander/bithacks.html

Uleat
10-20-2012, 07:05 PM
Obviously, there was a motive behind my asking about speed versus efficiency. At a minimum, I think there should be a
bit of an increase in certain parts of the code below due to the fact that the registers aren't being reloaded with all
of the previous variables each time a new denomination is required.

I guess I'm gonna finally have to sit down and learn how to use the debug and disassembler features. I've skated long
enough and I need to know how to do this.


I was reading up on those links that you provided Lerxst and that is some really interesting stuff.

So, are both the Win and Linux projects set up to be optimized for stuff like this automatically? I saw that whole program
optimization was off in the Win project, but I believe that the local was on (can't remember at the moment...)


I'd really like to get the division and mod operations as efficient as possible here (and in anything else that I submit.)

<For Example:>

[client.h]
#define CONVERSION_RATE 10

[client.cpp]
bool Client::TakeMoneyFromPP(uint64 copper, bool updateclient) {

if (!HasMoney(copper)) { return false; }

sint64 cur_copper = (static_cast<sint64>(m_pp.copper) - copper);
if (cur_copper < 0) {
copper = abs64(cur_copper);
cur_copper = ((CONVERSION_RATE - (copper % CONVERSION_RATE)) % CONVERSION_RATE);
copper = ((copper + cur_copper) / CONVERSION_RATE);
}
else { copper = 0; }
m_pp.copper = cur_copper;

sint64 cur_silver = (static_cast<sint64>(m_pp.silver) - copper);
if (cur_silver < 0) {
copper = abs64(cur_silver);
cur_silver = ((CONVERSION_RATE - (copper % CONVERSION_RATE)) % CONVERSION_RATE);
copper = ((copper + cur_silver) / CONVERSION_RATE);
}
else { copper = 0; }
m_pp.silver = cur_silver;

sint64 cur_gold = (static_cast<sint64>(m_pp.gold) - copper);
if (cur_gold < 0) {
copper = abs64(cur_gold);
cur_gold = ((CONVERSION_RATE - (copper % CONVERSION_RATE)) % CONVERSION_RATE);
copper = ((copper + cur_gold) / CONVERSION_RATE);
}
else { copper = 0; }
m_pp.gold = cur_gold;

sint64 cur_platinum = (static_cast<sint64>(m_pp.platinum) - copper);
if (cur_platinum < 0) {
#if (EQDEBUG>=5)
LogFile->write(EQEMuLog::Debug, "Client::TakeMoneyFromPP() %s's transaction resulted in a deficit of %i platinum",
GetName(), cur_platinum);
#endif
cur_platinum = 0;
}
m_pp.platinum = cur_platinum;

if (updateclient) { SendMoneyUpdate(); }

RecalcWeight();
Save();
return true;
}


I think that I'm using static_cast appropriately since I don't want to have an overflow issue with <m_pp.denomination> when I
subtract a <uint64> value from a <sint32>..especially when I know that the value of 'copper' WILL possibly exceed the maximum
negative (or minimum) <sint32> value.

I think that I also ran across something about '(unsigned)' being a free conversion. Would it be better to just use that instead
of invoking an abs64() call? Or would that return the wrong value on conversion?


I'm sure you gurus out there will have no problem following what I did here, but for anyone else wondering, basically, here is
what I did:

-> assign a temp variable [cur_denomination] to equal player currency [m_pp.denomination] minus debt [copper]

-> check to see if [cur_denom] is negative, which indicates there's not enough [cur_denom] to pay all the debt
[case:true]
-> reassign debt to [copper] <absolute/positive value>
-> "borrow" '10' and find result to clear current debt place value (second % C_R is for the case of '0'..don't want '10')
-> [cur_denom] added back to debt clears the place value..dividing by C_R converts to next denomination

[case:false]
-> set [copper] to zero '0' <clear all debt> .. [cur_denom] will be '0' or greater

-> reassign [m_pp.denom] to equal [cur_denom]

-> perform this test on each increasing denomination until all debt is payed off (if debt was zeroed, 'conversion' is pointless)
-> special check/log event in platinum in the case that something went awry


I can see why the person who coded the current implementation of this used multiplication. Using two division type operations as
much as I did in this can lead to much slower performance in non-optimized code..even if there are more calculations the other way...
(I think one of those links said as many as 41 cycles for a single, true-division operation.)

Anyways..it's open to suggestions.

Edit: I will look at other methods to avoid 'division' when I finish getting my system back up

lerxst2112
10-20-2012, 08:19 PM
You're still way overthinking this. It doesn't happen often enough that it really needs to be optimized, and all of the rest of the code in that function, optimized or not, takes a tiny fraction of the time of the Save() call at the end.

This code is also really complicated for no particular reason. Imagine you had two helper functions, one that converted plat, gold, silver, copper into just copper, and the other does the reverse, converting copper into plat, gold, silver, copper. Now the function is simple, convert the player's money into copper, subtract some, then convert it back.

The absolute value of -1 is 1, whereas (uint32)-1 is 4,294,967,295. Obviously very different things.

Uleat
10-20-2012, 09:08 PM
My head was having a hard time wrapping around what the unsigned conversion resulted in when I was writing that..shoulda rethought it before including that...

You're right, I was only focusing on the internal code and not the time of the external call, so I guess this really became an exercise in futility on my part. (It
reflects my life at the moment, so no loss of expectations there.)


I've been thinking of porting my numeric mapping program (think primes) to c++ from vb. Since it uses a ton of division and sqr() operations, this thread has
caused me to rethink how I might approach it, so thank you guys for that!

I'll leave this alone since it doesn't provide any real benefit..just change... I'll repackage that '#peekinv money' submission sometime in the future.