Nice one! Thanks!NormanDunbar wrote: Sun Apr 02, 2023 3:00 pm Hi Per,
when I said "Spooky!" I was referring to the algorithm used. I need to have a read through it and try to understand it better.
EDIT: I found this: https://en.wikipedia.org/wiki/Multiplic ... iplication on Wikipedia.
Cheers,
Norm.
Multiplication
Multiplication
Per
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
Re: Multiplication
I dont want to hog this thread so Im creating a new one. Dont know if its going to go anywhere.. Doesnt seem to be a lot of interest in this sort of thing here.
I did a comparison with your own routine (ezine vol 11). On QPC2 I couldnt detect any measurable difference in performance. On SMSQmulator (different engine!) yours was slightly faster (like 1-2%) for large numbers. Predictably, the performance of my version depended on the inputs, while yours was consistent whatever the input.
Ive attached the two versions with harnesses below, so interested parties can see for themselves. Obviously, I stripped yours down to the bare essentials for comparison's sake, and I supplied my 64 bit version.
The harnesses just attach the routines to S*BASIC functions for ease of access. They include a quick and dirty routine to convert the outputs to a 16 byte hex string so I could check that the results were correct.
Simply LRESPR the two toolkits and run the commands in a timed loop. PRINT SMULT3264$(x, y) tests your version, PRINT MULU3264$(x, y) for mine. About 10 million iterations are required, on my systems at least, to get a meaningful measurement. This should be reduced by some orders of magnitude on QL-ish hardware.
Note: When I use "yours" and "mine" Im not trying to invoke a pissing contest
Im simply interested in comparing two very different algorithms! Other people enjoy bungee jumping, or collecting old computers!
I did a comparison with your own routine (ezine vol 11). On QPC2 I couldnt detect any measurable difference in performance. On SMSQmulator (different engine!) yours was slightly faster (like 1-2%) for large numbers. Predictably, the performance of my version depended on the inputs, while yours was consistent whatever the input.
Ive attached the two versions with harnesses below, so interested parties can see for themselves. Obviously, I stripped yours down to the bare essentials for comparison's sake, and I supplied my 64 bit version.
The harnesses just attach the routines to S*BASIC functions for ease of access. They include a quick and dirty routine to convert the outputs to a 16 byte hex string so I could check that the results were correct.
Simply LRESPR the two toolkits and run the commands in a timed loop. PRINT SMULT3264$(x, y) tests your version, PRINT MULU3264$(x, y) for mine. About 10 million iterations are required, on my systems at least, to get a meaningful measurement. This should be reduced by some orders of magnitude on QL-ish hardware.
Note: When I use "yours" and "mine" Im not trying to invoke a pissing contest

- Attachments
-
- MUL.zip
- Multiplication algorithms
- (6.82 KiB) Downloaded 48 times
Per
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
- NormanDunbar
- Forum Moderator
- Posts: 2462
- Joined: Tue Dec 14, 2010 9:04 am
- Location: Buckie, Scotland
- Contact:
Re: Multiplication
Thanks Per.
I knew you weren't up for a pissing contest. I like to see different ways of doing the same thing. I appreciate your input -- do you mind if I write it up for the next (exciting) issue of the irregular Assembly Language eMagazine. Full credit will be given of course.
I've studied your code, and come up with a signed version that does 32*32 bits into 64 bits. I have yet to test it though. Here's my first attempt -- I haven't tried assembling yet though! I've adjusted the inputs and outputs to match what I wrote up in the magazine, that way, I remain slightly less confused!
Thanks again for your efforts, if no-one else appreciates it, I do!
Cheers,
Norm. (Code below)
I knew you weren't up for a pissing contest. I like to see different ways of doing the same thing. I appreciate your input -- do you mind if I write it up for the next (exciting) issue of the irregular Assembly Language eMagazine. Full credit will be given of course.
I've studied your code, and come up with a signed version that does 32*32 bits into 64 bits. I have yet to test it though. Here's my first attempt -- I haven't tried assembling yet though! I've adjusted the inputs and outputs to match what I wrote up in the magazine, that way, I remain slightly less confused!

Thanks again for your efforts, if no-one else appreciates it, I do!
Cheers,
Norm. (Code below)
Code: Select all
;--------------------------------------------------------------
;
; Multiply two 32 bit signed integers using the "Peasant
; Multiplication" method.
;
; See Wikipedia: <https://en.wikipedia.org/wiki/
; Multiplication_algorithm#Russian_peasant_multiplication>
;
; and PJW's post on QL Forum at <https://theqlforum.com/
; viewtopic.php?p=50874#p50874> fo details.
;
; This method of multiplication is (or at least was) used by
; sheep and camel herders in the desserts of Arabia. The only
; skill you need to work this is to be able to count!
;
; You dig two parallel columns of small holes in the ground.
; The number of holes required will depend on the size of of
; your calculation. It is not fixed; you can always add more
; if you need.
;
; In the topmost hole of the left column you put a number of
; beads -- or stones or sheep or camel droppings, depending on
; what's available -- corresponding to the number of, say,
; sheep you wish to sell;
;
; In the topmost right hole the number of beads corresponding
; to the price per head.
;
; Then, operating on the second row of the right column, for
; every two beads in the hole above, you put one bead in the
; hole, rounded down to the nearest whole number.
;
; Continue until there is only one bead left.
;
; In the left column you double the number of beads for each
; hole down the column. Each row on the left corresponds to
; the row on the right, so stop when you get to the row where
; the right column contains only one bead.
;
; Now, starting from the topmost row, take all the the beads
; from the left column, where the corresponding right column
; contains an ODD number of beads, and put them in the
; accumulator.
;
; The contents of the accumulator now, magically, contains the
; number of beads that correspond to the total amount of
; rials, shekels or whatever, for the sale; ie multiplication.
;
; The code below emulates that algorithm.
;
;
; Entry Return
; D0.L = First number D0.L smashed.
; D1.L = Second number D1.L smashed.
; D2.L D2:D3 = result, high word in D3.
; D3.L
; D6.B = Sign indicator D6.B = preserved.
;--------------------------------------------------------------
****** *****
****** T O B E T E S T E D F U L L Y *****
****** *****
;--------------------------------------------------------------
; Code for signed multiply of two 32 bit numbers in D0 and D1.
; Does some jiggery pokery (!) before calling unsigned "mult32"
; and on return, adjusts things to undo the jiggery pokery!
;
; Negative * Negative = Positive. )
; Positive * Positive = Positive. ) Same sign = +ve result.
; Negative * Positive = Negative.
;
; D6.B is used to hold a flag to indicate whether the result
; has to be sign adjusted. If D6.B is positive, then both input
; values had the same sign, so the result will be positive. If
; D6.B is negative, the signs were different and the result
; will need to be negative.
;--------------------------------------------------------------
smult32
moveq #0,d2 ; Clear accumulator low word.
move.l d2,d3 ; Clear accumulator high word.
move.l d6,-(a7) ; Save sign flag.
moveq #1,d6 ; Assume both have same sign
;--------------------------------------------------------------
; Test D0.L if zero, we are done here, the result is already
; zero.
;--------------------------------------------------------------
testD0Zero
tst.l d0 ; Anything to do?
beq.s testSign ; Nope, result is 0.
;--------------------------------------------------------------
; Test sign of D0. If negative set a flag in D6 and make D0
; positive. Negate D6.B to indicate one negative value so far.
;--------------------------------------------------------------
testD0Sign
bpl.s testD1Zero ; D0 is positive.
neg.b d6 ; D0 is negative, set flag.
neg.l d0 ; And make D0 positive.
;--------------------------------------------------------------
; Test D1.L if zero, we are done here, the result is already
; zero.
;--------------------------------------------------------------
testD1Zero
tst.l d1 ; Check other number.
beq.s testSign ; Nothing to do, result is 0.
;--------------------------------------------------------------
; Test sign of D1. If negative set a flag in D6.B and make D0
; positive. Negate D6. At this point D6.B will be +ve if both
; values had the same sign, and -ve if both were different.
;--------------------------------------------------------------
testD1Sign
bpl.s testSwap ; D1 is positive.
neg.b d6 ; D1 is negative, set flag.
neg.l d1 ; MAke D1 positive.
;--------------------------------------------------------------
; According to Wikipedia:
; <https://en.wikipedia.org/wiki/Multiplication_algorithm#
; Russian_peasant_multiplication>
; not strictly necessary, it works either way around. But PJW
; wroite it this way.
;--------------------------------------------------------------
testSwap
cmp.l d0,d1 ; Smallest is the multiplier.
bcs.s doMult32 ; Already in order.
exg d0,d1 ; Swap numbers over.
bra.s doMult32 ; Ready to multiply.
;--------------------------------------------------------------
; This is where we do all of the halving and doubling.
;--------------------------------------------------------------
loop
lsl.l #1,d1 ; Double multiplicand.
bvs.s testSign ; Oops. Overflow!
lsr.l #1,d0 ; Halve multiplier.
beq.s testSign ; No more to do.
;--------------------------------------------------------------
; If the halved value in D0.L is odd, we need to add it to the
; result in D3:D2
;--------------------------------------------------------------
doMult32
btst #0,d0 ; If even..
beq.s loop ; ..skip addition, else..
add.l d1,d2 ; ..add into the low word
addx.l #0,d3 ; .. and the higm with carry.
bvc.s loop ; Loop unless D3 overflowed.
;--------------------------------------------------------------
; Adjustments to the sign of the result required? If D6.B is
; positive, then result is fine. If negative, we need to negate
; the result.
;--------------------------------------------------------------
testSign
tst.b d6 ; +ve = same sign,
; -ve = different.
bpl.s done ; Same sign, all done.
;--------------------------------------------------------------
; To negate a 64 bit value, NEG the low word and NEGX the high
; word.
;--------------------------------------------------------------
negateResult
neg.l d2 ; Negate result, low word.
negx.l d3 ; And the high word
;--------------------------------------------------------------
; D3:D2 contains the 64 bit result. D3 holds the high 32 bits
; and D2 the low 32 bits.
;--------------------------------------------------------------
done
move.l (a7)+,d6 ; Restore sign flag register.
rts
Why do they put lightning conductors on churches?
Author of Arduino Software Internals
Author of Arduino Interrupts
No longer on Twitter, find me on https://mastodon.scot/@NormanDunbar.
Author of Arduino Software Internals
Author of Arduino Interrupts
No longer on Twitter, find me on https://mastodon.scot/@NormanDunbar.
Re: Multiplication
Thank you Per. Bit out of my league with this, so when I have a bit more spare time I may study it more closely to see if I can understand it.
--
All things QL - https://dilwyn.theqlforum.com
All things QL - https://dilwyn.theqlforum.com
Re: Multiplication
Norm, I was going to do a signed version next, but that would just have been for duty, not fun, so Im glad you did it 
I dont know if this is the right place to discuss the nitty-gritty. We could take it offline if you (or public sentiment would) prefer..
Looks good, but didnt you miss a step?
My version says:
Multiplicand could run to a double long word, no?
I dont know if this would work, but I wanted to try this optimisation:
I havent yet studied your algorithm properly (in my case that means heuristically
), but I was wondering how you dealt the sign in the event that the unsigned multiplication caused bit 63 of the accumulator to be set? Shouldnt that case cause an overflow error?
Just a note on
Consider the (extreme) case of 1 x 2^30. The code would have to do a whole bunch of unnecessary loops before it figured out there was no there there. Swap the two numbers around and it is solved in one go. I think (completely unscientifically) that that optimisation is probably worth it..
Yes, feel free to use it any way you like. Perhaps you might like to remove my description of the algorithm (it was terribly badly explained!) and use the Wikipedia description instead. Or the one I appended to the end of 64 bit version..

I dont know if this is the right place to discuss the nitty-gritty. We could take it offline if you (or public sentiment would) prefer..
Looks good, but didnt you miss a step?
Code: Select all
loop
lsl.l #1,d1 ; Double multiplicand.
bvs.s testSign ; Oops. Overflow!
Code: Select all
loop
lsl.l #1,d0
lsl.l #1,d1 double multiplicand
bcc.s skip
addq.l #1,d0 carry
skip
I dont know if this would work, but I wanted to try this optimisation:
Code: Select all
loop
lsl.l #1,d0
addx.l d1,d1 double multiplicand
; bvs.s done oops! Overflow (unlikely. Impossible?)

Just a note on
Code: Select all
testSwap
cmp.l d0,d1 ; Smallest is the multiplier.
bcs.s doMult32 ; Already in order.
exg d0,d1 ; Swap numbers over.
bra.s doMult32 ; Ready to multiply.
Yes, feel free to use it any way you like. Perhaps you might like to remove my description of the algorithm (it was terribly badly explained!) and use the Wikipedia description instead. Or the one I appended to the end of 64 bit version..
Per
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
Re: Multiplication
No worries, Dilwyn. You do more than your fair share of other good stuffdilwyn wrote: Sun Apr 02, 2023 5:44 pm Thank you Per. Bit out of my league with this, so when I have a bit more spare time I may study it more closely to see if I can understand it.

Per
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
Re: Multiplication
I'm using this for unsigned 32 x 32 -->64 multiplication:
Boringly straightforward
This is written for the Q68 - that has blazingly fast multiplication, so I thought I could just as well use it. Long multiclipation is used in my programs that use 16:16 fixed-point math. I have a similar division routine somewhere.
Code: Select all
; multiplies d0:32 with d1:32 into d0:d1:64 (high-order bits in d0)
umult64:
move.l d2,-(a7) ; save d2 (scratch reg)
move.w d0,d2 ; d0.low to d2
mulu d1,d2 ; d2 = d0.low * d1.low
move.l d2,-(a7) ; result to stack
move.l d1,d2 ;
swap d2 ; d1.hi to d2
move.w d2,-(a7) ; save it
mulu d0,d2 ; d2.l = d0 * d1.hi
swap d0
mulu d0,d1
mulu (a7)+,d0
add.l d2,d1
moveq #0,d2
addx.w d2,d2
swap d2
swap d1
move.w d1,d2
clr.w d1
add.l (a7)+,d1
addx.l d2,d0
move.l (a7)+,d2
rts

ʎɐqǝ ɯoɹɟ ǝq oʇ ƃuᴉoƃ ʇou sᴉ pɹɐoqʎǝʞ ʇxǝu ʎɯ 'ɹɐǝp ɥO
Re: Multiplication
Hi Tobias,
Thanks for playing. I just think its more fun to discuss the shelves I
built my self rather than ones picked up from IKEA, although both have
their place.
There are differences. The one you present is definitely marginally faster
than the other two (on QPC2, but barely discernibly so on SMSQmulator). But
it doesnt test for zero inputs so N x 0 takes as long to calculate as N x
M. My version is the slowest of the three, except for when at least one
input is a small number (I only tested 1 x N) when it is as fast or faster.
This one by Jonathan Oakley is also interesting. He really does it the hard
way, bit by bit, with shifts and additions. I guess the microcode versions
must be doing something like this:
Unfortunately, although the most compact it is also the slowest of the
lot, except where the multiplier is very small. It too is dependents on its
inputs. You can find the full, signed version in dev8_util_cv_muldiv_asm in
the SMSQ/E sources.
I guess theres a little fat in all these routines that could be trimmed if
desired or special circumstances permitted. Different platforms also give
different results, and the same algorithms on different CPUs could also
make a huge difference.
Reading that Multiplication article on Wikipedia, I see further
efficiencies could be achieved, but the added complexity doesnt make much
sense for the relatively small numbers we're operating with here and may
not even kick in for numbers with less than a few hundred decimal digits.
Is any of this useful? Possibly, but thats not necessarily the point. One
thing I learnt from our FUZZBUZZ excercise way back when, was that when you
think youve reached the limit of compactness or efficiency, someone always
comes up with something to blow your socks off!
Thanks for playing. I just think its more fun to discuss the shelves I
built my self rather than ones picked up from IKEA, although both have
their place.
There are differences. The one you present is definitely marginally faster
than the other two (on QPC2, but barely discernibly so on SMSQmulator). But
it doesnt test for zero inputs so N x 0 takes as long to calculate as N x
M. My version is the slowest of the three, except for when at least one
input is a small number (I only tested 1 x N) when it is as fast or faster.
This one by Jonathan Oakley is also interesting. He really does it the hard
way, bit by bit, with shifts and additions. I guess the microcode versions
must be doing something like this:
Code: Select all
*
* 32-bit integer arithmetic v0.01 © Mar 1988 J.R.Oakley
QJUMP
*
* Compatible with QDOS, Minerva and SMSQ/E
*
section code
xdef usmul
*+++
* Multiply two 32-bit unsigned numbers, giving a 64-bit result
*
* Registers:
* Entry Exit
* D0 multiplier result, LSLW
* D1 multiplicand result, MSLW
*---
usmul
move.l d0,d3 ; copy multiplier
move.l d1,d4 ; and multiplicand
moveq #0,d5 ; upper LW starts as zero
moveq #0,d0 ; result is
moveq #0,d1 ; also zero
mullp
lsr.l #1,d3 ; get a bit
bcc.s noadd ; there wasn't one
add.l d4,d0 ; add to LSLW
addx.l d5,d1 ; and with carry to MSLW
noadd
tst.l d3 ; end of multiplier?
beq.s mulexit ; yes
add.l d4,d4 ; no, double...
addx.l d5,d5 ; ...the multiplicand
bra.s mullp ; next bit
mulexit
rts
*
end
lot, except where the multiplier is very small. It too is dependents on its
inputs. You can find the full, signed version in dev8_util_cv_muldiv_asm in
the SMSQ/E sources.
I guess theres a little fat in all these routines that could be trimmed if
desired or special circumstances permitted. Different platforms also give
different results, and the same algorithms on different CPUs could also
make a huge difference.
Reading that Multiplication article on Wikipedia, I see further
efficiencies could be achieved, but the added complexity doesnt make much
sense for the relatively small numbers we're operating with here and may
not even kick in for numbers with less than a few hundred decimal digits.
Is any of this useful? Possibly, but thats not necessarily the point. One
thing I learnt from our FUZZBUZZ excercise way back when, was that when you
think youve reached the limit of compactness or efficiency, someone always
comes up with something to blow your socks off!
Per
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
I love long walks, especially when they are taken by people who annoy me.
- Fred Allen
- NormanDunbar
- Forum Moderator
- Posts: 2462
- Joined: Tue Dec 14, 2010 9:04 am
- Location: Buckie, Scotland
- Contact:
Re: Multiplication
Hi Per,
Basically, if the result overflows, I'm sitting here looking surprised! The biggest signed positive number is $7FFFFFFF and multiplying that by itself gives $3FFFFFFF:00000001 which has two zero bits at the high end, so in theory, it should never overflow. For unsigned it's $FFFFFFFF times itself, resulting in $FFFFFFFE:00000001 I think! There's no need to worry about overflow.
If one of the numbers is -1 or $FFFFFFFF then the (positive) result will be $7FFFFFFE:80000001 which is then sign adjusted to $80000001:7FFFFFFF so should still be ok.
Disclaimer, I'm no mathematician so the above might require a lot of correcting!
Cheers,
Norm.
Not as far as I'm aware. I'm multiplying D0 by D1, or vice versa, D0 gets halved and D1 gets doubled. The result is in D3:D2 with the high 32 bits in D3. Time, and testing, will tell!pjw wrote:Looks good, but didn't you miss a step?
pjw wrote:I was wondering how you dealt the sign in the event that the unsigned multiplication caused bit 63 of the accumulator to be set? Shouldn't that case cause an overflow error?
Code: Select all
doMult32
btst #0,d0 ; If even..
beq.s loop ; ..skip addition, else..
add.l d1,d2 ; ..add into the low word
addx.l #0,d3 ; .. and the high with carry.
bvc.s loop ; Loop unless D3 overflowed.
If one of the numbers is -1 or $FFFFFFFF then the (positive) result will be $7FFFFFFE:80000001 which is then sign adjusted to $80000001:7FFFFFFF so should still be ok.
Disclaimer, I'm no mathematician so the above might require a lot of correcting!
Thank you very much, and yes, I'll rephrase the description.pjw wrote:Yes, feel free to use it any way you like. Perhaps you might like to remove my description of the algorithm (it was terribly badly explained!) and use the Wikipedia description instead. Or the one I appended to the end of 64 bit version.
Cheers,
Norm.
Why do they put lightning conductors on churches?
Author of Arduino Software Internals
Author of Arduino Interrupts
No longer on Twitter, find me on https://mastodon.scot/@NormanDunbar.
Author of Arduino Software Internals
Author of Arduino Interrupts
No longer on Twitter, find me on https://mastodon.scot/@NormanDunbar.
Re: Multiplication
was just wondering how these routines compare against the maths co processors (I know this is/was not offered directly on the old QL ) in terms of speed and code efficiency (space used)