Comment on page

# Decompress String

In this exercise, you're going to decompress a compressed string.
Your input is a compressed string of the format`number[string]`and the decompressed output form should be the`string`written`number`times. For example:
Example:
input: 3[abc]4[ab]c
output: asabcabcabcababababc

## Other rules

• Number can have more than one digit. For example,`10[a]`is allowed, and just means aaaaaaaaaa
• One repetition can occur inside another. For example,`2[3[a]b]`decompresses into`aaabaaab`
• Characters allowed as input include digits, small English letters and brackets`[]`.
• Digits are only to represent amount of repetitions.
• Letters are just letters.
• Brackets are only part of syntax of writing repeated substring.
• Input is always valid, so no need to check its validity.
The Idea: The first thing that needs to be done is splitting the input strings into tokens we can recognize. This will simplify things a lot later. The tokens we are interested in are brackets, numbers, and characters. Next we step into an algorithm that will match brackets. We can simulate this using a stack. On the instance we find a closing ], the top of the stack will reveal the opening brace. We can use the top of the index to extract two details: the string, and the multiplier for the string, of which will appear to the right and left of the opening delimiter respectfully.
Then come the most crucial part of the algorithm: 1) If we find a previous instance in the stack that is + 1 from its current length, we do this: `tree_level[len(s)] += multiplier * (my_str + tree_level[len(s) + 1])`, otherwise we do this: `tree_level[len(s)] += (multiplier * my_str)`. The first identifies when we have a nested case: `a[2b[3c]]`, for example. so that `ccc` from stack 2 becomes `bbccc` in stack 1. The point is that `a` could have access to `bbcc` by looking at stack 0 + 1, which becomes `abbccc`. Eventually the root node is reached, and we can return 0th index of `tree_level`.
In the second case, if there isnt a child (+1 stack index), then we append to the string on the same level. This would be equivalent to the non-nested case. For example, `3[abc]3[d]`, where both `d` and `abc` point to the same stack level (0).
Complexity:
import re
import collections
def extract_components(tokens, open_i):
my_str = tokens[open_i + 1]
multiplier = tokens[open_i - 1]
return int(multiplier), my_str
def uncompress_str(compr):
if not compr:
return ""
# delimit by '[' or ']' or by numbers, then by default the characters will be remain and be delimited as well
# then filter out spaces to ensure proper / expected indexing in our algorithm
tokens = re.split('(\[|\]|[0-9]+)', compr)
tokens = [token for token in tokens if token != '']
tree_level = collections.defaultdict(str)
s = []
for i, token in enumerate(tokens):
if token == '[':
s.append(i)
elif token == ']':
multiplier, my_str = extract_components(tokens, s.pop())
if my_str.isnumeric(): # checks for instance of empty string
my_str = ''
if len(s) + 1 in tree_level:
tree_level[len(s)] += multiplier * (my_str + tree_level[len(s) + 1])
else:
tree_level[len(s)] += (multiplier * my_str)
return tree_level
Testing
t = '2[abc]2[aaa3[ok2[zz]3[qq2[d]]]]'
print(uncompress_str(t)) # 'abcabcaaaokzzzzqqddqqddqqddokzzzzqqddqqddqqddokzzzzqqddqqddqqddaaaokzzzzqqddqqddqqddokzzzzqqddqqddqqddokzzzzqqddqqddqqdd'
t = '2[abc]3[b]0[nope]2[aaa3[ok]]'
print(uncompress_str(t)) # 'abcabcbbbaaaokokokaaaokokok'
t = '2[3[a]1[b]]'
print(uncompress_str(t)) # 'aaabaaab'
t = '3[b2[c10[d]]]'
print(uncompress_str(t)) # 'bcddddddddddcddddddddddbcddddddddddcddddddddddbcddddddddddcdddddddddd'
t = '3[abc4[ab]]'
print(uncompress_str(t)) # 'abcabababababcabababababcabababab'
t = '3[abc]4[ab]1[c]'
print(uncompress_str(t)) # 'abcabcabcababababc'
t = '10[a]'
print(uncompress_str(t)) # 'aaaaaaaaaa'
t = '1[b2[c3[d2[abcd]]]]'
print(uncompress_str(t)) # 'bcdabcdabcddabcdabcddabcdabcdcdabcdabcddabcdabcddabcdabcd'