# Euclidean Division

## Truncated, Floored, and Euclidean Division

Haskell provides two kinds of integral division: `quotRem`

and `divMod`

. It’s common for beginners to ask what the difference is and when you might prefer one over the other. If you came here just trying to Google the difference between `quotRem`

and `divMod`

, here it is: `quotRem`

always rounds the quotient toward zero (upward if it’s less then zero and downward if it’s greater than zero) and is more efficient than `divMod`

, but `divMod`

rounds the quotient toward negative infinity (always downward).

I’ve been writing a library for binary fixed-point arithmetic, and when implementing the `(/)`

operation I was forced to consider the latter question. I already knew that `quotRem`

is more efficient than `divMod`

, which made me really hope that `quotRem`

was an appropriate choice. I also already knew the difference between the two. Regardless, I had no idea whether `div`

was a better choice then `quot`

for my purposes, so I asked and Googled around.

First, I discovered cdsmith’s blog post “Learning Number Theory and Haskell: The Division Algorithm.” He cites Gareth and Mary Jones’ *Elementary Number Theory* for the “division algorithm,” which goes as follows:

∀ *a**b* ∈ ℤ : *b* ≠ 0 ⇒ ∃ *q**r* ∈ ℤ : *a* = *q**b* + *r* ∧ 0 ≤ *r* < ∣*b*∣

He then goes on to demonstrate that *neither* `quotRem`

nor `divMod`

satisfy this property. `rem`

always gives the remainder the same sign as the dividend, and `mod`

always gives the remainder the same sign as the divisor. That means that if the dividend is negative then `quotRem`

gives the incorrect result, and if the divisor is negative then `divMod`

gives the incorrect result. Both functions satisfy one part of the above theorem: *a* = *q**b* + *r*. It’s just that they don’t necessarily choose *q* and *r* such that 0 ≤ *r* < ∣*b*∣.

How do other languages handle this? According to the Wikipedia article on the modulo operation, there is very little consistency among programming languages, although most of them follow the rules either of `quotRem`

or `divMod`

, sometimes providing both as in the case of Haskell. According to the article, `quotRem`

is classified as truncated division, and `divMod`

is classified as floored division. A function that satisfies the division algorithm in its entirety is called Euclidean division. The Wikipedia article cites two interesting papers. One is Raymond T. Boute’s “The Euclidean definition of the functions div and mod,” which analyzes various definitions for quotient and remainder, highlighting areas where some are more appropriate than others. You can probably tell from the title that Boute heavily favors Euclidean division. I won’t go into any depth here, but if you’re curious, it’s an interesting paper and pretty easy to read. The other paper is Daan Leijen’s “Division and Modulus for Computer Scientists,” which among other things includes a simple implementation for Euclidean division and a proof of its correctness.

## Deriving a More Efficient Euclidean Division Function

The implementation given by Leijen looks less efficient than it could be, but we can derive a more efficient version of it. Here is the implementation proven correct in the paper, which I have translated directly into Haskell:

```
euclidean :: Integral a => a -> a -> (a, a)
a `euclidean` b =
case a `quotRem` b of
(q, r) -> (q - i, r + i*b)
where i = if r >= 0
then 0
else ( if b > 0
then 1
else (-1)
)
```

The first time I went through this, I ended up substituting a couple subexpressions with `signum`

and `abs`

, but it turns out that those are not primitive operations in GHC, so they weren’t actually any faster. So, despite the fact that right off the bat we could substitute `if b > 0 then 1 else (-1)`

with `signum`

(since we know that `b /= 0`

anyway), let’s resist that temptation.

First, let’s inline `i`

:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> ( q - if r >= 0 then 0 else (if b > 0 then 1 else (-1))
, r + (if r >= 0 then 0 else (if b > 0 then 1 else (-1)))*b
)
```

Applying an operation to the result of an if expression is the same as applying it in both the branches of the if expression. We can use this knowledge to move the subtraction from `q`

into the first if statement and the addition to `r`

and multiplication with `b`

into the second.

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> ( if r >= 0 then q - 0 else q - if b > 0 then 1 else (-1)
, if r >= 0 then r + 0*b else r + (if b > 0 then 1 else (-1))*b
)
```

We can simplify parts where we subtract zero and multiply by zero:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> ( if r >= 0 then q else q - if b > 0 then 1 else (-1)
, if r >= 0 then r else r + (if b > 0 then 1 else (-1))*b
)
```

Let’s reformat this to use the outer (,) in prefix notation, in order to make another change more obvious:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> (,) (if r >= 0 then q else q - if b > 0 then 1 else (-1))
(if r >= 0 then r else r + (if b > 0 then 1 else (-1))*b)
```

Now we can put the (,) into the branches of the first if statement:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> (if r >= 0 then (,) q else (,) (q - if b > 0 then 1 else (-1)))
(if r >= 0 then r else r + (if b > 0 then 1 else (-1))*b)
```

Next, we can treat the entire first if expression as a function and move it to the branches of the next if statement:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (if r >= 0 then (,) q else (,) (q - if b > 0 then 1 else (-1))) r
else (if r >= 0 then (,) q else (,) (q - if b > 0 then 1 else (-1)))
(r + (if b > 0 then 1 else (-1))*b)
```

Once we are inside either of the branches of the outer if expression, the value of `r >= 0`

is known already, so we can simplify some of the inner if expressions by picking the appropriate branch:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (,) q r
else (,) (q - if b > 0 then 1 else (-1))
(r + (if b > 0 then 1 else (-1))*b)
```

Now let’s push the `\x -> q - x`

and `\x -> r + x*b`

operations into the inner if expressions:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (,) q r
else (,) (if b > 0 then q - 1 else q + 1)
(if b > 0 then r + b else r - b)
```

Playing the same game with `(,)`

that we did earlier, we’ll push the `(,)`

into the first if expression:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (,) q r
else (if b > 0 then (,) (q - 1) else (,) (q + 1))
(if b > 0 then r + b else r - b)
```

Then push the first if expression into the branches of the second:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (,) q r
else if b > 0
then (if b > 0 then (,) (q - 1) else (,) (q + 1)) (r + b)
else (if b > 0 then (,) (q - 1) else (,) (q + 1)) (r - b)
```

Then, since we know the result of `b > 0`

in each major branch, pick the correct inner branch:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (,) q r
else if b > 0
then (,) (q - 1) (r + b)
else (,) (q + 1) (r - b)
```

Now we can make the `(,)`

s infix again:

```
a `euclidean` b =
case a `quotRem` b of
(q, r) -> if r >= 0
then (q, r)
else if b > 0
then (q - 1, r + b)
else (q + 1, r - b)
```

Noticing that the first branch is the same as the result of the case match, we can just reuse that result instead of constructing a new tuple:

```
a `euclidean` b =
case a `quotRem` b of
qr@(q, r) -> if r >= 0
then qr
else if b > 0
then (q - 1, r + b)
else (q + 1, r - b)
```

Now let’s just tidy it up a little:

```
a `euclidean` b =
case a `quotRem` b of
qr@(q, r) | r >= 0 -> qr
| b > 0 -> (q-1, r+b)
| otherwise -> (q+1, r-b)
```

It’s not quite as pretty as the implementation I had using `signum`

and `abs`

, but at least it’s a bit tighter.

## Benchmarking Euclidean Division

So, how efficient is it? Of course it will not be as cheap as `quotRem`

alone, but it may be faster than `divMod`

. Here’s the default definition of `divMod`

, rewritten to fit the style of the definition I wrote for `euclidean`

:

```
a `divMod` b =
case a `quotRem` b of
qr@(q, r) | (r > 0 && b < 0) || (r < 0 && b > 0) -> (q-1, r+b)
| otherwise -> qr
```

Well, `divMod`

*looks* a little more complicated, but it’s close enough that I can’t say just from looking at it whether that means anything, though. I’d look at the core, but even that doesn’t tell us as much as benchmarks.

So, I wrote a simple benchmark using the vector and criterion packages. It pregenerates two random vectors containing `Int`

s and two random vectors containing `Word`

s, making sure that the second of each pair contains no zero elements; the vectors were generated with 100,000 elements each. For the benchmarks, it zips `divMod`

, `quotRem`

, and `euclidean`

over each vector pair. I built the executable on x86_64 Linux using GHC 7.0.2 with `-O`

and ran it with `-s 1000 +RTS -H`

. I was surprised by the results:

```
benchmarking Int/divMod
mean: 39.97829 ms, lb 39.80771 ms, ub 40.14524 ms, ci 0.950
std dev: 2.727351 ms, lb 2.663118 ms, ub 2.795931 ms, ci 0.950
benchmarking Int/quotRem
mean: 36.32100 ms, lb 36.15058 ms, ub 36.49007 ms, ci 0.950
std dev: 2.735966 ms, lb 2.680588 ms, ub 2.816686 ms, ci 0.950
benchmarking Int/euclidean
mean: 35.98929 ms, lb 35.84691 ms, ub 36.13137 ms, ci 0.950
std dev: 2.315703 ms, lb 2.284974 ms, ub 2.364711 ms, ci 0.950
benchmarking Word/divMod
mean: 34.19098 ms, lb 34.00258 ms, ub 34.38643 ms, ci 0.950
std dev: 3.095565 ms, lb 3.019508 ms, ub 3.177908 ms, ci 0.950
benchmarking Word/quotRem
mean: 33.49599 ms, lb 33.31483 ms, ub 33.68272 ms, ci 0.950
std dev: 2.965439 ms, lb 2.883252 ms, ub 3.065997 ms, ci 0.950
benchmarking Word/euclidean
mean: 32.24446 ms, lb 32.09422 ms, ub 32.39927 ms, ci 0.950
std dev: 2.469073 ms, lb 2.402462 ms, ub 2.548382 ms, ci 0.950
```

I was afraid these results were wrong because for a while they fluctuated wildly, but I tracked the issue down to my laptop’s thermal management system; having changed a few settings, the results are now extremely consistent.

So what the heck is going on? It’s understandable that `euclidean`

could be faster than `divMod`

, but being faster than `quotRem`

seems impossible, considering that `quotRem`

is a part of its implementation! I can’t even fathom why this would happen. I tried to compare the core output of these functions, but I can’t figure out a way to get ghc-core to dump anything useful about `divMod`

or `quotRem`

, so instead I’ll just go poking around in base’s source to find the exact implementations used for `Int`

and `Word`

so I can compare those with the core output for `euclidean`

.

Looking in `GHC.Num`

, we find the implementations of `divMod`

and `quotRem`

for `Int`

:

```
quotRemInt :: Int -> Int -> (Int, Int)
quotRemInt a@(I# _) b@(I# _) = (a `quotInt` b, a `remInt` b)
divModInt :: Int -> Int -> (Int, Int)
divModInt x@(I# _) y@(I# _) = (x `divInt` y, x `modInt` y)
```

I find it interesting that they would be defined this way. It seems to me that it would make more sense for `quotInt`

and `remInt`

to share some work. Let’s look at their definitions in `GHC.Base`

; maybe GHC can optimize `quotRemInt`

by inlining them:

```
{-# INLINE quotInt #-}
{-# INLINE remInt #-}
quotInt, remInt :: Int -> Int -> Int
(I# x) `quotInt` (I# y) = I# (x `quotInt#` y)
(I# x) `remInt` (I# y) = I# (x `remInt#` y)
```

It seems they are just primops, so if GHC is doing anything fancy, it’s at a lower level than core. Is the same true of `divInt`

and `modInt`

?

```
divInt, modInt :: Int -> Int -> Int
(I# x) `divInt` (I# y) = I# (x `divInt#` y)
(I# x) `modInt` (I# y) = I# (x `modInt#` y)
```

Well, I didn’t see any `INLINE`

pragmas for them, but their unboxed versions are not primops, so at least they’re easy to look at:

```
divInt# :: Int# -> Int# -> Int#
x# `divInt#` y#
| (x# ># 0#) && (y# <# 0#) = ((x# -# 1#) `quotInt#` y#) -# 1#
| (x# <# 0#) && (y# ># 0#) = ((x# +# 1#) `quotInt#` y#) -# 1#
| otherwise = x# `quotInt#` y#
modInt# :: Int# -> Int# -> Int#
x# `modInt#` y#
| (x# ># 0#) && (y# <# 0#) ||
(x# <# 0#) && (y# ># 0#) = if r# /=# 0# then r# +# y# else 0#
| otherwise = r#
where
!r# = x# `remInt#` y#
```

These appear to perform essentially the same conditional branches, so it might be pretty easy for GHC to merge these operations into one if it happens to decide that it’s willing to perform some common subexpression elimination (which it’s capable of doing, but only does so rarely).

These definitions seem to imply that `euclidean`

might have an advantage over `divMod`

and `quotRem`

in the `Int`

case since it is specialized for this case. Perhaps it would be penalized by only computing the quotient or the remainder instead of both by composing with `fst`

or `snd`

respectively. Here are the benchmark results for that (also doing the same with `Word`

):

```
benchmarking Int/div
mean: 16.28924 ms, lb 16.22815 ms, ub 16.35012 ms, ci 0.950
std dev: 983.7511 us, lb 955.7222 us, ub 1.022347 ms, ci 0.950
benchmarking Int/quot
mean: 15.25413 ms, lb 15.18767 ms, ub 15.32052 ms, ci 0.950
std dev: 1.074821 ms, lb 1.043818 ms, ub 1.121362 ms, ci 0.950
benchmarking Int/euclideanQuot
mean: 21.14791 ms, lb 21.03288 ms, ub 21.26672 ms, ci 0.950
std dev: 1.885141 ms, lb 1.808788 ms, ub 2.046627 ms, ci 0.950
benchmarking Int/mod
mean: 17.38129 ms, lb 17.32322 ms, ub 17.44116 ms, ci 0.950
std dev: 952.2632 us, lb 917.6375 us, ub 1.001158 ms, ci 0.950
benchmarking Int/rem
mean: 15.33842 ms, lb 15.27551 ms, ub 15.40213 ms, ci 0.950
std dev: 1.020424 ms, lb 992.9151 us, ub 1.064465 ms, ci 0.950
benchmarking Int/euclideanRem
mean: 19.37936 ms, lb 19.25804 ms, ub 19.50230 ms, ci 0.950
std dev: 1.963617 ms, lb 1.910733 ms, ub 2.038009 ms, ci 0.950
benchmarking Word/div
mean: 15.57217 ms, lb 15.50029 ms, ub 15.64809 ms, ci 0.950
std dev: 1.190803 ms, lb 1.136332 ms, ub 1.250223 ms, ci 0.950
benchmarking Word/quot
mean: 15.05417 ms, lb 14.99211 ms, ub 15.12062 ms, ci 0.950
std dev: 1.030228 ms, lb 988.1618 us, ub 1.077946 ms, ci 0.950
benchmarking Word/euclideanQuot
mean: 18.37285 ms, lb 18.26637 ms, ub 18.48592 ms, ci 0.950
std dev: 1.774987 ms, lb 1.682966 ms, ub 1.908387 ms, ci 0.950
benchmarking Word/mod
mean: 15.25315 ms, lb 15.19449 ms, ub 15.31481 ms, ci 0.950
std dev: 968.6585 us, lb 935.6003 us, ub 1.006045 ms, ci 0.950
benchmarking Word/rem
mean: 15.06085 ms, lb 15.00381 ms, ub 15.11985 ms, ci 0.950
std dev: 936.9703 us, lb 905.6734 us, ub 976.3766 us, ci 0.950
benchmarking Word/euclideanRem
mean: 16.12448 ms, lb 16.05018 ms, ub 16.20500 ms, ci 0.950
std dev: 1.245996 ms, lb 1.177754 ms, ub 1.354666 ms, ci 0.950
```

This seems fairly consistent with my prediction; this was enough of a penalty to bring `euclidean`

into dead last for efficiency. This also verifies that GHC isn’t doing anything special with `divMod`

or `quotRem`

as far as combining `div`

with `mod`

or `quot`

with `rem`

; these operations are each taking roughly half the time that their combined counterparts are taking. I had heard that `quotRem`

was itself a primop; I wonder if that was never true in the first place or if it has simply changed since that was stated (in which case, I wonder why). All this still doesn’t explain why `euclidean`

manages to beat `quotRem`

, but at least it was an enlightening bit of exploration. The only remaining avenue I can think of is to explore the implementation of GHC’s primops and also trace them through some of GHC’s optimization passes, but that is more trouble than I feel like going through right now. Maybe a knowledgable commenter can more easily explain the true reason for this.

I think these numbers make me less interested in how it’s specialized for `Word`

; I expect it’s fairly predictable, anyway.

## Specializing for Only the Quotient or Remainder

Let’s see what happens when we try manually specializing `euclidean`

for just the quotient or just the remainder.

### Quotient

Let’s start with the quotient first; I’ve taken the liberty to go ahead and naively drop the remainder from the results:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | r >= 0 -> q
| b > 0 -> q-1
| otherwise -> q+1
```

We’ve already seen that `quot`

and `rem`

in isolation are faster than `quotRem`

combined, so if we can arrange to only use `quot`

then we might be better off. That means we need to find a way to get rid of the dependency on `r`

in the first guard. We know that if `r`

is not zero then it has the same sign as `a`

. We also know that `quotRem`

is acceptable when `a == 0`

. This means we can split the first guard into two; the first one being `a >= 0`

and the second being `r == 0`

:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | a >= 0 -> q
| r == 0 -> q
| b > 0 -> q-1
| otherwise -> q+1
```

Now let’s go ahead and remove the guard bodies’ dependence on the result of `quotRem`

by using `quot`

in the bodies themselves:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | a >= 0 -> a `quot` b
| r == 0 -> a `quot` b
| b > 0 -> (a `quot` b) - 1
| otherwise -> (a `quot` b) + 1
```

We had simplified the guard that uses `r`

, but we still haven’t removed the `r`

entirely. In order to do that, we will have to modify the last two cases such that the case that `r == 0`

is handled appropriately. Let’s look at the `b > 0`

case first. We want to write it such that when `r == 0`

it is equivalent to `q`

, but otherwise it is equivalent to `q - 1`

. This can be achieved by shifting the input to `quot`

in the original guard body by one; when `r == 0`

, this will shift the output by one, but otherwise it will be the same:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | a >= 0 -> a `quot` b
| r == 0 -> a `quot` b
| b > 0 -> ((a + 1) `quot` b) - 1
| otherwise -> (a `quot` b) + 1
```

The `b > 0`

case is now independent of the `r == 0`

case, so let’s move it above the latter to make that clear:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | a >= 0 -> a `quot` b
| b > 0 -> ((a + 1) `quot` b) - 1
| r == 0 -> a `quot` b
| otherwise -> (a `quot` b) + 1
```

All that’s left is the `otherwise`

case. At this point, we know `a < 0`

and `b < 0`

. To make this account for the `r == 0`

case, we want to simply shift `a`

the exact same way:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | a >= 0 -> a `quot` b
| b > 0 -> ((a + 1) `quot` b) - 1
| r == 0 -> a `quot` b
| otherwise -> ((a + 1) `quot` b) + 1
```

Now we can drop the `r == 0`

case entirely:

```
a `eucQuot` b =
case a `quotRem` b of
(q, r) | a >= 0 -> a `quot` b
| b > 0 -> ((a + 1) `quot` b) - 1
| otherwise -> ((a + 1) `quot` b) + 1
```

And the case expression is entirely unnecessary now:

```
a `eucQuot` b
| a >= 0 = a `quot` b
| b > 0 = ((a + 1) `quot` b) - 1
| otherwise = ((a + 1) `quot` b) + 1
```

Promisingly, aside from working with boxed instead of unboxed values, this looks even simpler than the implementation of `divInt#`

!

### Remainder

Let’s compute the remainder, now. We’ll start with the original `euclidean`

function, but where the guard bodies only give the remainder instead of a pair:

```
a `eucRem` b =
case a `quotRem` b of
(_, r) | r >= 0 -> r
| b > 0 -> r + b
| otherwise -> r - b
```

We can see that the quotient from `quotRem`

is never used. This gives us an opportunity to replace the use of `quotRem`

with `rem`

:

```
a `eucRem` b =
case a `rem` b of
r | r >= 0 -> r
| b > 0 -> r + b
| otherwise -> r - b
```

And we’re done already! Again, aside from the boxed values, this looks a lot simpler than `modInt#`

to me. The gains from this are likely to be smaller than in the case of `eucQuot`

since it’s easier for GHC to specialize `euclidean`

to something like this in the first place.

## Benchmarking Euclidean Quotient and Remainder

Let’s see how these stack up. I’m using the same benchmark setup as earlier, just using different operations:

```
benchmarking Int/eucQuot
mean: 16.11784 ms, lb 16.05454 ms, ub 16.18123 ms, ci 0.950
std dev: 1.023141 ms, lb 993.6663 us, ub 1.057743 ms, ci 0.950
benchmarking Int/eucRem
mean: 18.20244 ms, lb 18.14797 ms, ub 18.25797 ms, ci 0.950
std dev: 891.3623 us, lb 856.9801 us, ub 969.7699 us, ci 0.950
benchmarking Word/eucQuot
mean: 15.22592 ms, lb 15.16014 ms, ub 15.29432 ms, ci 0.950
std dev: 1.084259 ms, lb 1.042233 ms, ub 1.133132 ms, ci 0.950
benchmarking Word/eucRem
mean: 15.22092 ms, lb 15.16380 ms, ub 15.27993 ms, ci 0.950
std dev: 941.7308 us, lb 907.7359 us, ub 1.001557 ms, ci 0.950
```

It appears to be quite a substantial improvement, especially for the Euclidean quotient. `eucQuot`

is faster than `div`

, as I hoped, and slower than `quot`

, as I expected. For `Int`

, `eucRem`

is slower than both `mod`

and `rem`

, though I doubt it is insurmountable; I won’t bother with it right now. For `Word`

, at least, it’s still in its proper place between `mod`

and `rem`

.

## Conclusions

`eucQuot`

is faster than `div`

, at least for the types I tested it on, and assuming the arguments for its well-behavedness are convincing enough, I think I know what I’ll be using for my fixed-point library. Oddly, the combined `euclidean`

operation is benchmarking faster even than `quotRem`

, but I suspect that a more complex benchmark might give the speed advantage back to `quotRem`

.

Posted on April 28, 2011, in Computer Arithmetic and tagged benchmarks, division, divMod, euclidean division, floored division, functional, ghc, haskell, program derivation, quotRem, truncated division. Bookmark the permalink. 1 Comment.

Pingback: Fixed-Point Arithmetic « Declarative Stuff