Haskell, Python speed comparison (Heap // priority-queue)

Question

I have a problem to solve. Details of it basically irrelevant & I have two valid solutions for it: in python and Haskell.

Python code:

import heapq

_, volume, *ppl = map(int, open("input.txt").read().split())
ppl = [-x for x in ppl]
heapq.heapify(ppl)
for i in range(volume):
    current = (- heapq.heappop(ppl)) // 10
    heapq.heappush(ppl, -current)

print(sum(-x for x in ppl), file=open("output.txt", 'w'))

Haskell code:

import Data.List hiding (insert, singleton)

type Rank = Int

data Heap a = Tip | Node Rank Int (Heap Int) (Heap Int)

rank Tip = 0
rank (Node r _ _ _) = r

fromList :: [Int] -> Heap Int
fromList [] = Tip
fromList (x:xs) = foldl' (flip insert) (singleton x) xs

makeHeap :: Int -> Heap Int -> Heap Int -> Heap Int
makeHeap x a b = if rank a >= rank b then Node (rank b + 1) x a b
                                     else Node (rank a + 1) x b a

empty :: Heap a
empty = Tip

singleton :: Int -> Heap Int
singleton x = Node 1 x Tip Tip

insert :: Int -> Heap Int -> Heap Int
insert x = merge (singleton x)

merge :: Heap Int -> Heap Int -> Heap Int
merge l Tip = l
merge Tip r = r
merge h1@(Node _ x l1 r1) h2@(Node _ y l2 r2) =
  if x > y then makeHeap x l1 (merge r1 h2)
            else makeHeap y l2 (merge h1 r2)

-- | O(1).
peek :: Heap Int -> Int
peek Tip = 0
peek (Node _ x _ _) = x

-- | O(1), but evaluating the second element of the tuple has same complexity
-- of `merge`.
extract :: Heap Int -> (Int, Heap Int)
extract (Node _ x a b) = (x, merge a b)

toSum :: Heap Int -> Int
toSum Tip            = 0
toSum (Node _ x a b) = x + toSum a + toSum b 

solve :: Int -> Heap Int -> Int
solve 0 heap = toSum heap
solve _ (Node _ 0 _ _) = 0
solve sips ppl = solve (sips - 1) (insert (val `div` 10) h)
    where (val, h) = extract ppl
    


main :: IO()
main = do
    line <- readFile "input.txt"
    let _:volume:ppl = map(read :: String -> Int) $ words line
    let heap = fromList ppl
    writeFile "output.txt" $ show $ solve volume heap

The question borrowed is in the following slice of the results table:

N of test	Verdict	Time (sec) python	Time (sec) Haskell
19	OK	0.0312	0.0156
20	OK	0.0624	0.172
21	OK	0.078	0.733
22	OK	0.234	1.34
23	OK	0.218	1.34

Tests itself is unavailable to me.

So... Seems like my Haskell code for some reason is significantly slower with high amount of values (~10^5). Could someone explain me why and how to fix it?

probably it's because of your so-called heap implementation in Haskell is not how a normal heap is implemented and therefore not being very efficient? Have you tried with hackage.haskell.org/package/heap-1.0.4/docs/Data-Heap.html ? Or is this homework? if so, please say so. — mb21
– mb21, Commented Oct 26, 2024 at 14:53
I cannot "try" it in the test system. If my implementation is inefficient (what's obviously the case) I want to know why. — mesenev
– mesenev, Commented Oct 26, 2024 at 15:05
Why? It seems that your makeHeap and merge are both O(n). Have a look at en.wikipedia.org/wiki/Heap_(data_structure)#Implementation In Haskell, hackage.haskell.org/package/vec is the normal go-to lib if you need an array (which is not just a linked list). — mb21
– mb21, Commented Oct 26, 2024 at 15:40
extract should be O(lg n), and it's not obvious that merge is O(lg n) as it needs to be. — chepner
– chepner, Commented Oct 26, 2024 at 16:22
You might find RosettaCode.org interesting and useful: rosettacode.org/wiki/Priority_queue#Haskell — pjs
– pjs, Commented Oct 26, 2024 at 17:42

K. A. Buhr · Accepted Answer · 2024-10-27 00:26:36Z

2

Most likely, the issue is with reading the input file rather than your algorithm.

When I benchmark reading a million integers and summing them with Python:

ppl = map(int, open("random.dat").read().split())
print(sum(ppl))

I get a baseline time of about 100ms.

If I compare that with a version patterned after your Haskell code:

main :: IO()
main = do
    line <- readFile "random.dat"
    let ppl = map (read :: String -> Int) $ words line
    print $ sum ppl

That takes about 950ms. Oof!

Switching from this String implementation to a lazy ByteString:

import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BL

main :: IO()
main = do
    line <- BL.readFile "random.dat"
    let ppl = [n | w <- BL.words line, let Just (n, _) = BL.readInt w]
    print $ sum ppl

gets it down to 43ms.

So, give that a shot.

answered Oct 27, 2024 at 0:26

K. A. Buhr

52.7k3 gold badges55 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

no comment Over a year ago

They said 10^5, though, not a million. If that takes 95 ms, that's only a small part of the 1.34 seconds.

mesenev Over a year ago

link: 19 -- 0 sec | 20 -- 0.062 sec | 21 -- 0.172 sec | 22 -- 0.702 sec | 23 -- 0.718 sec | Very interesting! thanks! Still 3x slower then python though :)

K. A. Buhr Over a year ago

You may find that adding strictness with some well placed exclamation marks helps. Writing data Heap a = Tip | Node !Rank !Int !(Heap Int) !(Heap Int) gave a significant improvement in some of my testing.

mesenev Over a year ago

@K.A.Buhr another 50 ms win! Thank you! 22: 0.686, 23: 0.655.

Collectives™ on Stack Overflow

Haskell, Python speed comparison (Heap // priority-queue)

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related