
Potential speed increase


blmille2
07-15-2010, 02:41 PM
Hello, guys.
I was thinking about the following:
stat += 100 * stat * modifier / 100

Now, I know why it's being used -- it's integer multiplication and division, which is way faster than floating point calculations.

Here's my idea:
stat += ((stat << 7) * modifier) >> 7

With this, you have only one integer multiplication instead of two multiplications and one division.

I used to TA an assembly course at UIUC and we learned that integer shifts can be done in one clock cycle and integer multiplication can take quite a few clock cycles.

When I did verification simulations at AMD, I also verified that integer math (except for addition/subtraction) takes multiple clock cycles, and AMD had the fastest ALU at the time.

However, shifts have always been able to be done in one clock cycle, guaranteed.

I know that 100 is a nice number to multiply by, but since its only use is to do a somewhat accurate integer percent calculation, I recommend shifting left and then right by 7 instead, which is multiplying by 128 and dividing by 128, only loads faster.

In the end, you're multiplying the whole equation by 1.

What is your opinion on this?

Thanks!

blmille2
07-15-2010, 03:10 PM
Never mind, I'm an idiot--I misread the lines.

It's not a matter of 100 * (stat * modifier) / 100
It's more a matter of stat += stat * modifier / 100

Therefore, I'm stupid and the above won't produce the same results...
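
To make the mismatch concrete, here is a minimal C++ sketch. The function names and the sample values are made up for illustration and are not taken from the server code. It assumes non-negative stats small enough that the shifted product does not overflow an int.

```cpp
// Intended formula from the corrected reading: add "modifier" percent of stat.
constexpr int apply_percent(int stat, int modifier) {
    return stat + stat * modifier / 100;
}

// The shift idea: << 7 and >> 7 cancel each other out, so what gets added
// is simply stat * modifier, with no division by 100 anywhere.
constexpr int apply_shifted(int stat, int modifier) {
    return stat + (((stat << 7) * modifier) >> 7);
}

// A stat of 200 with a 10 percent modifier should become 220.
static_assert(apply_percent(200, 10) == 220, "percent form adds 20");
// The shift form adds 200 * 10 = 2000 instead, which is 100 times too much.
static_assert(apply_shifted(200, 10) == 2200, "shift form adds 2000");
```

So the two shifts only ever multiply the expression by 1. They cannot stand in for the division by 100.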

Sakkath
08-17-2010, 07:32 AM
Hey blmille2,
I know you've discounted the idea because the functionality is different, but here's my 2 cents:
The general rule is that 80% of the processing time is spent in 20% of the code - depending on the application it might be even more concentrated. When optimising, it's best to focus on the bottlenecks that will yield the best result.
Sometimes you can make code faster by using more efficient instructions, although the biggest optimisations generally come from changing the program flow - for example caching values, reducing IO, breaking out of loops early etc.

Had your solution been correct, it would probably have had only a very minimal impact, and on the downside it makes the code less readable, as it's not as obvious what is being done.
I don't mean to discourage you at all - there is certainly a lot that can be optimised that would make the servers perform much better.
For instance, the Process_AI() loops are HUGE functions where the code checks over and over which class it's processing. They'd be much more efficient if they used a series of derived classes, i.e. Necro::Process_AI, Wizard::Process_AI etc.
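
A rough sketch of what that could look like is below. Only the names Necro::Process_AI and Wizard::Process_AI come from the suggestion above. The AIActor base class, the process_all helper and the returned strings are hypothetical placeholders, since the real class layout isn't shown in this thread.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class: each derived class supplies its own AI routine.
class AIActor {
public:
    virtual ~AIActor() = default;
    virtual std::string Process_AI() = 0;
};

class Necro : public AIActor {
public:
    // Placeholder body; real necro logic would live here.
    std::string Process_AI() override { return "necro: pet and dot logic"; }
};

class Wizard : public AIActor {
public:
    // Placeholder body; real wizard logic would live here.
    std::string Process_AI() override { return "wizard: nuke logic"; }
};

// Hypothetical driver: one virtual call per actor replaces the repeated
// "which class is this?" checks inside a single huge function.
std::vector<std::string> process_all(
        const std::vector<std::unique_ptr<AIActor>>& actors) {
    std::vector<std::string> log;
    for (const auto& actor : actors) {
        log.push_back(actor->Process_AI());
    }
    return log;
}
```

The class check then happens once, when the object is created, and each Process_AI only contains the code for its own class.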