What division algorithm should be used for dividing small integers in hardware?

Question

I need to multiply an integer ranging from 0-1023 by 1023 and divide the result by a number ranging from 1-1023 in hardware (Verilog/FPGA implementation). The multiplication is straightforward since I can probably get away with just shifting 10 bits (and if needed I'll subtract an extra 1023). The division is a little interesting though. Area/power aren't really critical to me (I'm in an FPGA so the resources are already there). Latency (within reason) isn't a big deal so long as I can pipeline the design. There are obviously several choices with different trade-offs, but I'm wondering if there's an "obvious" or "no-brainer" algorithm for a situation like this. Given the limited range of operands and the abundance of resources that I have (BRAM etc.), I'm wondering if there isn't something obvious to do.
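For reference, an exact multiply by 1023 can be written as a 10-bit shift minus the operand itself, since 1023 = 1024 - 1. The snippet below is only a software model of that identity (the function name is made up for illustration), not RTL:

```python
def times_1023(x: int) -> int:
    """Model of x * 1023 as shift-and-subtract: (x << 10) - x."""
    if not 0 <= x <= 1023:
        raise ValueError("x is assumed to be a 10-bit unsigned value")
    return (x << 10) - x


# The largest product must fit in the 20-bit numerator discussed in the thread.
largest = times_1023(1023)
print(largest, largest.bit_length())
```

In hardware this is one subtractor with the shift done by wiring, so no multiplier is needed for this step.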

If you have enough resources, maybe build a LUT (lookup table) for it.
– saeedn, Commented Aug 7, 2013 at 19:10

EML · Accepted Answer · 2013-08-09 07:39:04Z

2

If you can pre-compute everything, and you've got a spare 20x20 multiplier, and some way to store your pre-computed number, then go for Morgan's suggestion. You need to precompute a 20-bit multiplicand (10b quotient, 10b remainder), and multiply by your first 10b number, and take the bottom 30b of the 40b result.
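A rough software model of this precomputed-coefficient idea might look like the following. It assumes the stored 20-bit value is 1023/d rounded to 10 integer and 10 fractional bits; the names `COEFF_LUT`, `make_coeff`, and `scale` are invented for the sketch. Because the coefficient is rounded, the integer part can differ from the exact quotient in the last place.

```python
COEFF_FRAC_BITS = 10  # assumption: coefficient has 10 integer + 10 fractional bits


def make_coeff(d: int) -> int:
    """round((1023 / d) * 2**10), using integer arithmetic only."""
    scaled = 1023 << COEFF_FRAC_BITS
    return (2 * scaled + d) // (2 * d)


# One 20-bit entry per divisor 1..1023; index 0 is unused padding.
COEFF_LUT = [0] + [make_coeff(d) for d in range(1, 1024)]


def scale(x: int, d: int):
    """Return (integer part, 10-bit fraction) of approximately x * 1023 / d."""
    product = x * COEFF_LUT[d]  # 10b x 20b product, fits in 30 bits
    fraction_mask = (1 << COEFF_FRAC_BITS) - 1
    return product >> COEFF_FRAC_BITS, product & fraction_mask


integer_part, fraction = scale(512, 100)
print(integer_part, fraction / 2**COEFF_FRAC_BITS)
```

An exhaustive sweep over all 1024 x 1023 operand pairs is cheap in software, so the actual worst-case error of whichever coefficient width is chosen can be measured before committing it to a block RAM.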

Otherwise, the no-brainer is non-restoring division, since you say that latency isn't important (there's lots of material on the web, most of it incomprehensible). You have a 20-bit numerator (the result of your multiplication by 1023) and a 10-bit denominator. This gives a 20b quotient and a 10b remainder (i.e. 20 bits for the integer part of the answer and 10 bits for the fractional part, giving a 30b answer).

The actual hardware is pretty trivial: an 11b adder/subtractor, a 31b shift register, and a 10b or 11b register to store the divisor. You also need a small FSM to control it (2b). You have to do a compare, add or subtract, and shift in every clock cycle, and you get the answer out in 21 cycles. I think. :)
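The add-or-subtract-then-shift loop can be sketched in software as below. This is a behavioural model of textbook non-restoring division for an unsigned 20-bit numerator, one loop iteration per quotient bit; it produces only the integer quotient and remainder and does not model the register widths, FSM, or exact cycle count mentioned above. The function name and `width` parameter are illustrative.

```python
def nonrestoring_divide(numerator: int, divisor: int, width: int = 20):
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= numerator < (1 << width):
        raise ValueError("numerator does not fit in the given width")

    remainder = 0  # signed partial remainder
    quotient = 0
    for i in reversed(range(width)):
        next_bit = (numerator >> i) & 1
        shifted = 2 * remainder + next_bit  # shift in the next numerator bit
        # Subtract if the previous partial remainder was non-negative,
        # otherwise add; a negative result is NOT restored.
        if remainder >= 0:
            remainder = shifted - divisor
        else:
            remainder = shifted + divisor
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)

    # Single correction step at the end if the remainder finished negative.
    if remainder < 0:
        remainder += divisor
    return quotient, remainder


print(nonrestoring_divide(512 * 1023, 100))
```

Fractional result bits could be obtained from the same loop by appending zero bits to the numerator and running correspondingly more iterations.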

edited Aug 9, 2013 at 7:39

answered Aug 8, 2013 at 8:29

EML


1 Comment

Doov Over a year ago

Thanks for the reply @EML. I think I'm going to go with Morgan's approach (and of course with your detail). I can indeed precompute everything; I wrote a MATLAB script to get the coefficients and make the LUT. BTW num2fixpt is a nice function that I didn't know existed until I wrote my own version. I'm actually on a Spartan-3A DSP, which has several multipliers built in. That said, I think the latency of the multiplication is nicer than that of the division. I'd give you the answer vote, but Morgan was first :)

Morgan · Accepted Answer · 2013-08-08 10:19:52Z

1

If you can work with fixed-point precision rather than integers, it may be possible to change:

divide the result by a number ranging from 1-1023

to multiplication by a number ranging from 1 down to 1/1023, i.e. precompute the divide and store the result as the coefficient for the multiply.
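As a sketch of what such a stored coefficient could look like, the model below keeps 1/d as an unsigned fixed-point number, rounded up, with 30 fractional bits. That width is an assumption chosen here (20 numerator bits plus 10 divisor bits) so that the truncated product equals the integer quotient for every 20-bit numerator; it is wider than the 20-bit coefficient discussed elsewhere in this thread, and a narrower one trades exactness for multiplier width. The names are illustrative.

```python
FRAC_BITS = 30  # assumption: 20 numerator bits + 10 divisor bits


def reciprocal(d: int, frac_bits: int = FRAC_BITS) -> int:
    """Fixed-point 1/d, rounded up: ceil(2**frac_bits / d)."""
    return -(-(1 << frac_bits) // d)


# Precomputed table for divisors 1..1023; index 0 is unused padding.
RECIP_LUT = [0] + [reciprocal(d) for d in range(1, 1024)]


def divide_by_reciprocal(n: int, d: int) -> int:
    """Integer quotient of a 20-bit n by d using one multiply and a shift."""
    return (n * RECIP_LUT[d]) >> FRAC_BITS


n = 512 * 1023
print(divide_by_reciprocal(n, 100), n // 100)
```

Rounding the reciprocal up rather than to nearest is what keeps the truncated product from ever landing one below the true quotient.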

edited Aug 8, 2013 at 10:19

answered Aug 7, 2013 at 20:36

Morgan


2 Comments

Doov Over a year ago

I think I might do something similar. The total operation is (integer ranging from 0-1023) * 1023 / (integer ranging from 1-1023). Given the second half, I could just reduce this to a fixed-point multiplication of (integer ranging from 0-1023) * (fixed-point number ranging from 1-1023). The latter half of the equation can be precomputed, getting rid of the division altogether. Thoughts?

Morgan Over a year ago

@Doov that is how I would approach it.
