1

I am attempting to replicate the hashing the Excel does when a sheet is password-protected in Python, but am not matching even when testing on dummy inputs. From the xml file, I am seeing this:

sheetProtection algorithmName="SHA-512"
hashValue="Ua4h+FTPQI0+aCSbQ1Ya9fDYsddMzCfAypD1u1TBGmNONIy6sRfJBLDoMhOfbCv0i5Q2t1JOm4okjSvC1CsJYw==" 
saltValue="Furur6jnDIFaQBhHQBXzFA==" 
spinCount="100000"

To replicate this, I coded the following in Python:

import hashlib
import base64

hash_value = "Ua4h+FTPQI0+aCSbQ1Ya9fDYsddMzCfAypD1u1TBGmNONIy6sRfJBLDoMhOfbCv0i5Q2t1JOm4okjSvC1CsJYw=="
salt_value = "Furur6jnDIFaQBhHQBXzFA=="

password = "password"

pdata = password.encode('utf-8')
sdata = base64.b64decode(salt_value)
hash_iter = (sdata + pdata)

for i in range(100000):
    hash_iter = hashlib.sha512(hash_iter).digest()

print(base64.b64encode(hash_iter).decode())

which returns the following result:

9o4313eeh/ym8+GHSHW4iyh1usvNVD1DflzET5WgG9QKutn0loM24Op7/McAGr4D5H10W+DuQCD8Tj8Cn7uDOg==

What am I doing wrong here? I have tried switching between prefix/suffix for the salt, and it does not get me Excel's final hash. I have also tried different encodings for the plain text password into binary, but that doesn't seem to be the issue. I have a suspicion it might have something to do with how the hash is iterated, but I am not sure what I'm doing wrong.

5
  • The algorithms is certainly wrong.. What are you trying to do in the first place? There are a lot of Python packages that read or write Excel sheets with protection. The algorithm used is wrong - SHA512 isn't used for password hashing, not even with a salt.Applications use algorithms like PBKDF2 with an HMAC hash function that in turn uses eg SHA-512. There are packages that implement PKDBF2 too Commented May 28 at 6:51
  • From the document protection spec I see that the 0-based iteration is appended to each iteration result, Hn = H(Hn-1 + iterator) Commented May 28 at 7:04
  • @PanagiotisKanavos, I am simply trying to replicate what Excel is doing to encrypt passwords for protecting an individual sheet. I'm not sure on the algorithm -- the XML code states it's SHA-512 and I think because it's an individual sheet, rather than the entire file, it may just be simpler. I am happy to be wrong! It's just that support like this lead me to believe it may just be SHA-512. Commented May 28 at 7:38
  • @PanagiotisKanavos thanks for the pointer on the iterator appendation! I will try that. Commented May 28 at 7:42
  • I don't know the specific APIs you are using, but I've had similar problems in the past that turned out to be simply the fact that different tools present the same binary value in different ways, for example in big-endian versus little-endian format. Commented May 28 at 8:41

1 Answer 1

2

From the link provided in Panagiotis Kanavos' comment:

Let H() be an implementation of the hashing algorithm specified by AlgorithmName, iterator be an unsigned 32-bit integer, Hn be the hash data of the nth iteration, and a plus sign (+) represent concatenation. The initial password hash is generated as follows.

H0 = H(salt + password)

The hash is then iterated using the following approach.

Hn = H(Hn-1 + iterator)

where iterator is initially set to 0 and is incremented monotonically on each iteration until SpinCount iterations have been performed. The value of iterator on the last iteration MUST be one less than SpinCount. The final hash is then Hfinal = HSpinCount-1.

and a little trial and error, it appears that the Excel password hash algorithm can be reproduced in Python with the following:

def excel_password_hash_sha512(password: str, salt: bytes, iteration_count: int) -> str:
    password_bytes = password.encode('utf-16le')
    h0 = hashlib.sha512(salt + password_bytes).digest()
    h_i = h0
    for iterator in range(iteration_count):
        h_i = hashlib.sha512(h_i + iterator.to_bytes(4, 'little', signed=False)).digest()
    return base64.b64encode(h_i).decode()

The non-obvious parts that had to be deduced were:

  • The encoding to apply to the password string to convert it to bytes: UTF-16LE, and
  • The endianness to be used to convert the iterator integer to bytes: little
Sign up to request clarification or add additional context in comments.

1 Comment

You've done it!! Thank you for your help, very interesting what the hangups were.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.