libreoffice / asking for cell password / brute force

libreoffice / asking for cell password / brute force

  • Written by
    Walter Doekes
  • Published on

While we were editing a provider-supplied Excel document using LibreOffice, at seemingly random times, it would show a popup asking us for a password to a cell. This popup would only go away if we set a new (non-blank) password on it. Annoying!

Apparently, it has to do with Sheet and Cell protection whereby an editing user is disallowed to edit certain cells/rows/sheets in a document.

Having certain cells marked read-only, sure. But protected using a password? That doesn’t make sense. We’re editing the document, it’s ours now.

So, you can disable sheet protection using the Tools menu, where there is a Protect sheet checkbox. Disabling it should give us full access. Unfortunately, in this case the correct non-blank password is needed.

As you may already know, an xlsx file, is just a PK zipped archive with XML and other content in it. Let’s see if we can find a password in there:

$ cd $(mktemp -d)
$ unzip ~/Documents/annoying_document.xlsx
Archive:  /home/walter/Documents/annoying_document.xlsx
  inflating: xl/workbook.xml
  inflating: xl/styles.xml
...
$ grep password . -r
./xl/worksheets/sheet1.xml:...
...</sheetData><sheetProtection sheet="true" password="ca5b"
  formatColumns="false" formatRows="false"
  insertRows="false"/><mergeCells count="7">...

Okay, there’s the password: ca5b. Except it’s not a password. It’s a (very short) hash.

We can simply remove the entire <sheetProtection/> tag and zip it up again. (Easy fix.)

Or, we could go for the hard fix, and try to brute force a password. This also helps if you have multiple documents with the same “protection”.

OpenOffice.org’s documentation has this to say about the password hash:

The PASSWORD record contains the hash value of the password used to protect the sheet.

[…]

The length of the password is restricted to 15 characters.

ALGORITHM Get_Password_Hash( password )
1) hash <-0 ; char_index <-char_count <-character count of password
2) char_index <-char_index - 1
3) char <-character from password with index char_index
   {0 is leftmost character}
4) hash <-hash XOR char
5) rotate the lower 15 bits of hash left by 1 bit
6) IF char_index > 0 THEN JUMP 2)
7) RETURN hash XOR char_count XOR CE4BH

In Python, that might look like this:

def f(s):  # takes a string
    h = 0                                   # step 1
    for idx in range(len(s) - 1, -1, -1):   # step 1 and 2 and 6
        h ^= ord(s[idx])                    # step 3 and 4
        h <<= 1                             # step 5 (left shift)
        if h & 0x8000:                      # step 5 (check high bit)
            h = (h & 0x7fff) | 1            # step 5 (truncate + rotate)
    return (h ^ len(s)) ^ 0xce4b            # step 7

And running it on the string abcdefghij yields FEF1 (in hex):

>>> hex(f('abcdefghij'))
'0xfef1'

>>> assert f('zzyw') == f('BBAb')
>>> assert f('zzyw') == f('pqpp')
>>> assert f('zzyw') != f('pqpr')

If we want to brute force it, we’ll optimize away the 0xce4b and the ord() calls first:

def f_tuple(t):  # takes a tuple of ints
    h = 0                                   # step 1
    for idx in range(len(t) - 1, -1, -1):   # step 1 and 2 and 6
        h ^= t[idx]                         # step 3 and 4
        h <<= 1                             # step 5 (left shift)
        if h & 0x8000:                      # step 5 (check high bit)
            h = (h & 0x7fff) | 1            # Step 5 (truncate + rotate)
    return h ^ len(t)                       # step 7
>>> hex(f_tuple([ord(i) for i in 'abcdefghij']) ^ 0xce4b)
'0xfef1'
>>> import timeit
>>> abcdefghij_tuple = tuple(ord(i) for i in 'abcdefghij')
>>> result_tuple = 0xfef1 ^ 0xce4b
>>> timeit.timeit(lambda: f('abcdefghij') == 0xfef1)
1.3861091136932373
>>> timeit.timeit(lambda: f_tuple(abcdefghij_tuple) == result_tuple)
1.2009429931640625

And we can use the nice generator pattern provided by Python:

def generator():
    "Yields 'A', 'B', 'C', .., 'Z', 'AA', 'AB', etc. as tuples of ints"
    valid_tokens = tuple([ord(i) for i in (
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        'abcdefghijklmnopqrstuvwxyz'
        '0123456789'
        # You're free to add more tokens, but it may have an adverse effect
        # on the search. YMMV.
        # '!@#$%^&*()' ',./<>?;:[]{}-_=+|'
        )])
    max_token = len(valid_tokens)
    # "The length of the password is restricted to 15 characters."
    for length in range(1, 16):  # length 1 through length 15
        state = length * [0]  # index to valid_tokens
        while True:
            pass_ = tuple(valid_tokens[i] for i in state)
            yield pass_  # as tuple of ints, because f_tuple is faster than f
            # Increase last element of state by one:
            for tailidx in range(length - 1, -1, -1):  # edit last index first
                state[tailidx] += 1
                if state[tailidx] != max_token:
                    break
                state[tailidx] = 0
            else:
                # We did not break out of the loop, so all of state was
                # max_key. Try the next length.
                break

So.. that keeps repeating forever, until all combinations have been tried:

>>> g = iter(generator())
>>> [''.join(chr(ch) for ch in next(g)) for i in range(10)]
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

.. and after 10,000 iterations, it’s at:

>>> [''.join(chr(ch) for ch in next(g)) for i in range(10000)][-10:]
['BkS', 'BkT', 'BkU', 'BkV', 'BkW', 'BkX', 'BkY', 'BkZ', 'Bka', 'Bkb']

Wrapping it all up, a function to find a valid password for a specified hash:

def find_password_for(wanted):
    expected = wanted ^ 0xce4b
    for t in generator():
        if f_tuple(t) == expected:
            pwd = ''.join(chr(ch) for ch in t)
            print('found password for 0x{:x}: {} [{!r}]'.format(
                wanted, pwd, pwd))
            return
>>> find_password_for(0xca5b)
found password for 0xca5b: BBAy ['BBAy']

.. or, if we want to know what it’s doing, we can add a bit of SIGALRM magic:

def find_password_for(wanted):
    # vvvvv-- print status
    import signal
    def print_status(*args):
        print('(generator is at {!r})'.format(''.join(chr(ch) for ch in t)))
        signal.alarm(1)
    signal.signal(signal.SIGALRM, print_status)
    signal.alarm(1)
    try:
        # ^^^^^-- print status
        expected = wanted ^ 0xce4b
        for t in generator():
            if f_tuple(t) == expected:
                pwd = ''.join(chr(ch) for ch in t)
                print('found password for 0x{:x}: {} [{!r}]'.format(
                    wanted, pwd, pwd))
                return
    # vvvvv-- print status
    finally:
        signal.alarm(0)  # stop alarm

Now you can see what’s going on (and how slow it is):

>>> find_password_for(f('ABBBO'))
(generator is at 'BuvU')
(generator is at 'EaWh')
(generator is at 'HGAG')
... 18 lines elided ...
(generator is at '5fau')
(generator is at '8Exw')
(generator is at 'AAmr9')
found password for 0xc014: ABBBO ['ABBBO']

So, that was a sort-of interesting side track. It’s noteworthy that it’s still quite slow. I stopped the iterations before finding a collision for 0xfef1. Removing the <sheetProtection/> is generally your best bet.

Maybe this could be an interesting toy project to try out Rust-lang on…


Back to overview Newer post: excel / generate sheet password collision Older post: docker unprivileged user / becoming root