First off, as said in the comments, a C-style block comment pairs the first `/*` with the first `*/` that follows it, meaning you cannot nest comments:

    /* This comment is /* a nested */ comment */

should leave ` comment */` behind once the comment is removed.
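To make the first-match rule concrete, here is a minimal sketch that locates the delimiters with `str.find`. The variable names are only illustrative and are not part of the reviewed code.

```python
text = "/* This comment is /* a nested */ comment */"

# The comment opens at the first "/*" ...
start = text.find("/*")
# ... and closes at the first "*/" found after that opener,
# regardless of any other "/*" seen in between.
end = text.find("*/", start + 2)

comment = text[start:end + 2]
leftover = text[end + 2:]

print(repr(comment))   # '/* This comment is /* a nested */'
print(repr(leftover))  # ' comment */'
```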
It is also more natural, in Python, to iterate over the elements of a collection rather than over their indices. This lets you write `for character in content:`. And if you truly need the indices, you can use `enumerate`.
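As a small illustration of the difference (the sample string and the `slash_positions` name are assumptions made up for this sketch):

```python
content = "a/b"

# Index-based loop: works, but every access goes through content[index].
for index in range(len(content)):
    print(index, content[index])

# Direct iteration: the loop variable is the character itself.
for character in content:
    print(character)

# enumerate yields (index, element) pairs when the position is really needed.
slash_positions = [
    index for index, character in enumerate(content) if character == '/'
]
print(slash_positions)  # [1]
```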
You can also use a flag to remember that the previous character may indicate the beginning or the end of a comment, without having to look at the character before or after the current one:

    def remove_comments(content):
        block_comment = False
        line_comment = False
        probably_a_comment = False
        result = []
        for character in content:
            # Only flag the first '/': a second one must fall through
            # so that '//' can be recognised as a line comment below.
            if (not probably_a_comment and not line_comment
                    and not block_comment and character == '/'):
                probably_a_comment = True
                continue

            if block_comment and character == '*':
                probably_a_comment = True
                continue

            if line_comment and character == '\n':
                line_comment = False
                result.append('\n')
            elif block_comment and probably_a_comment and character == '/':
                block_comment = False
            elif not line_comment and not block_comment:
                if probably_a_comment:
                    if character == '/':
                        line_comment = True
                    elif character == '*':
                        block_comment = True
                    else:
                        result.append('/')  # Append the / we skipped when flagging that it was probably a comment starting
                        result.append(character)
                else:
                    result.append(character)
            probably_a_comment = False
        return ''.join(result)
You can also simplify the memory management a bit by using a generator instead of appending to a list:

    def remove_comments(content):
        def gen_content():
            block_comment = False
            line_comment = False
            probably_a_comment = False
            for character in content:
                # Only flag the first '/': a second one must fall through
                # so that '//' can be recognised as a line comment below.
                if (not probably_a_comment and not line_comment
                        and not block_comment and character == '/'):
                    probably_a_comment = True
                    continue

                if block_comment and character == '*':
                    probably_a_comment = True
                    continue

                if line_comment and character == '\n':
                    line_comment = False
                    yield '\n'
                elif block_comment and probably_a_comment and character == '/':
                    block_comment = False
                elif not line_comment and not block_comment:
                    if probably_a_comment:
                        if character == '/':
                            line_comment = True
                        elif character == '*':
                            block_comment = True
                        else:
                            yield '/'  # The / we skipped was not a comment after all
                            yield character
                    else:
                        yield character
                probably_a_comment = False
        return ''.join(gen_content())
If you want to go crazy, you can also use a state machine approach to simplify the code: no more boolean flags and far fewer comparisons on average. Each state is a function that takes the current character and returns the next state along with the text to emit:

    def source_code(char):
        if char == '/':
            return comment_begin, ''
        return source_code, char

    def comment_begin(char):
        if char == '/':
            return inline_comment, ''
        if char == '*':
            return block_comment, ''
        # Not a comment after all: emit the / we held back
        return source_code, '/' + char

    def inline_comment(char):
        if char == '\n':
            return source_code, char
        return inline_comment, ''

    def block_comment(char):
        if char == '*':
            return end_block_comment, ''
        return block_comment, ''

    def end_block_comment(char):
        if char == '/':
            return source_code, ''
        if char == '*':
            # Stay here so that a run such as **/ still closes the comment
            return end_block_comment, ''
        return block_comment, ''

    def remove_comments(content):
        def gen_content():
            parser = source_code
            for character in content:
                parser, text = parser(character)
                yield text
        return ''.join(gen_content())
But, all in all, this is far too complicated for the task at hand. You can get the job done using a simple regular expression:

    import re

    # re.DOTALL lets . match newlines, so block comments may span several lines
    COMMENTS = re.compile(r'''
        (//[^\n]*(?:\n|$))    # Everything between // and the end of the line/file
        |                     # or
        (/\*.*?\*/)           # Everything between /* and */
    ''', re.VERBOSE | re.DOTALL)

    def remove_comments(content):
        return COMMENTS.sub('\n', content)
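The `.*?` in the block-comment pattern is a lazy quantifier, which is what makes the regex stop at the first `*/`. The following sketch contrasts it with the greedy `.*` on a made-up one-line input; the sample text and variable names are assumptions for illustration only.

```python
import re

text = "a /* one */ b /* two */ c"

# Greedy: .* runs to the LAST */ on the line, swallowing the code in between.
greedy = re.sub(r'/\*.*\*/', '', text)

# Lazy: .*? stops at the FIRST */, so each comment is removed separately.
lazy = re.sub(r'/\*.*?\*/', '', text)

print(repr(greedy))  # 'a  c'
print(repr(lazy))    # 'a  b  c'
```

With the greedy form, the code between two comments on the same line would silently disappear, which is why the lazy form is the one to use here.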