Hi Tobias,
Thanks for playing. I just think its more fun to discuss the shelves I
built my self rather than ones picked up from IKEA, although both have
their place.
There are differences. The one you present is definitely marginally faster
than the other two (on QPC2, but barely discernibly so on SMSQmulator). But
it doesnt test for zero inputs so N x 0 takes as long to calculate as N x
M. My version is the slowest of the three, except for when at least one
input is a small number (I only tested 1 x N) when it is as fast or faster.
This one by Jonathan Oakley is also interesting. He really does it the hard
way, bit by bit, with shifts and additions. I guess the microcode versions
must be doing something like this:
Code: Select all
*
* 32-bit integer arithmetic v0.01 © Mar 1988 J.R.Oakley
QJUMP
*
* Compatible with QDOS, Minerva and SMSQ/E
*
section code
xdef usmul
*+++
* Multiply two 32-bit unsigned numbers, giving a 64-bit result
*
* Registers:
* Entry Exit
* D0 multiplier result, LSLW
* D1 multiplicand result, MSLW
*---
usmul
move.l d0,d3 ; copy multiplier
move.l d1,d4 ; and multiplicand
moveq #0,d5 ; upper LW starts as zero
moveq #0,d0 ; result is
moveq #0,d1 ; also zero
mullp
lsr.l #1,d3 ; get a bit
bcc.s noadd ; there wasn't one
add.l d4,d0 ; add to LSLW
addx.l d5,d1 ; and with carry to MSLW
noadd
tst.l d3 ; end of multiplier?
beq.s mulexit ; yes
add.l d4,d4 ; no, double...
addx.l d5,d5 ; ...the multiplicand
bra.s mullp ; next bit
mulexit
rts
*
end
Unfortunately, although the most compact it is also the slowest of the
lot, except where the multiplier is very small. It too is dependents on its
inputs. You can find the full, signed version in dev8_util_cv_muldiv_asm in
the SMSQ/E sources.
I guess theres a little fat in all these routines that could be trimmed if
desired or special circumstances permitted. Different platforms also give
different results, and the same algorithms on different CPUs could also
make a huge difference.
Reading that Multiplication article on Wikipedia, I see further
efficiencies could be achieved, but the added complexity doesnt make much
sense for the relatively small numbers we're operating with here and may
not even kick in for numbers with less than a few hundred decimal digits.
Is any of this useful? Possibly, but thats not necessarily the point. One
thing I learnt from our FUZZBUZZ excercise way back when, was that when you
think youve reached the limit of compactness or efficiency, someone always
comes up with something to blow your socks off!