How to determine whether a line of Python has a comment, and split the line into `[code, comment]`

I'm wondering how to

  1. determine (True / False) whether a line of Python code has a comment
  2. split the line into code, comment

Something such as:

loc_1 = "print('hello') # this is a comment"

is very straightforward, but something such as:

loc_2 = 'for char in "(*#& eht # ": # pylint: disable=one,two # something'

is not so straightforward. I don't know how to go about this in general, such that I can do

f(loc_2)
# returns
[
    # code
    'for char in "(*#& eht # ":',
    # comment
    ' # pylint: disable=one,two # something'
]

From comments: "You tagged this with libcst. Have you already tried using that library to give you an AST?"

I have tried to use this and failed with it, e.g.:

From comments: "Are you parsing just single lines, the source code for a function or a class, or parsing whole modules?"

I'm parsing single lines - that was my intention at least. I wanted to be able to iterate over the lines in a file. I have a pre-existing process which already iterates over the lines of a Python file, but I wanted to extend it to consider the comments, which motivated this question.

From comments: "What is your eventual goal with the parse results? Just to get the source code without comments?"

No - I want the source code and the comment as given in the example.

JonSG answered:

I think you can get most of the way to an answer using io.StringIO() and tokenize.generate_tokens(). Applied to a string containing one line of Python, generate_tokens() yields TokenInfo tuples that you can inspect for a comment token (type tokenize.COMMENT). Because the tokenizer understands string literals, a # inside a string is not reported as a comment.
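To see what the tokenizer reports for a line, a small sketch like the following can help. The helper name token_summary is made up for illustration; it only uses tokenize.generate_tokens() and the tokenize.tok_name mapping.

```python
import io
import tokenize

def token_summary(line):
    """Return (token type name, token text) pairs for one line of source."""
    # generate_tokens() wants a readline-style callable that returns str.
    reader = io.StringIO(line).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(reader)]

# The first '#' sits inside a string literal, the second starts a comment.
for name, text in token_summary("x = '#' # note"):
    print(name, repr(text))
```

In the printed pairs, the quoted '#' appears as part of a STRING token and only the trailing "# note" appears as a COMMENT token, which is what makes this approach more reliable than searching the line for the first # character.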

Your f() function might look a bit like this, but please give it a more meaningful name than f():

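import io
import tokenize

def separate_code_from_comments(text):
    """Split one line of Python source into [code, comment]."""
    # generate_tokens() expects a readline-style callable that returns str.
    reader = io.StringIO(text).readline
    # Lazily scan the token stream and stop at the first COMMENT token.
    comment_tokens = (t for t in tokenize.generate_tokens(reader) if t.type == tokenize.COMMENT)
    comment = next(comment_tokens, None)
    if comment is None:
        return [text, '']
    # comment.start is a (row, column) pair; the column is where '#' begins.
    column = comment.start[1]
    return [text[:column], text[column:]]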

print(separate_code_from_comments("print('hello') # this is a comment"))
print(separate_code_from_comments("# this is a comment"))
print(separate_code_from_comments("loc_2 = for char in \"(*#& eht #\": # pylint: disable=one,two # something"))
print(separate_code_from_comments("loc_2 = for char in \"(*#& eht #\": "))

This should print out:

["print('hello') ", '# this is a comment']
['', '# this is a comment']
['loc_2 = for char in "(*#& eht #": ', '# pylint: disable=one,two # something']
['loc_2 = for char in "(*#& eht #": ', '']
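The question also asked for a True / False check, and its expected output keeps the whitespace in front of the # with the comment rather than with the code. A possible variant along those lines is sketched below. The names split_code_and_comment and has_comment are hypothetical, and treating a line the tokenizer cannot finish (for example, an unclosed bracket with no comment) as "no comment" is an assumption, not something the question specifies.

```python
import io
import tokenize

def split_code_and_comment(line):
    """Return [code, comment]; the comment keeps the whitespace before '#'."""
    reader = io.StringIO(line).readline
    try:
        for tok in tokenize.generate_tokens(reader):
            if tok.type == tokenize.COMMENT:
                # tok.start is (row, column); column is where '#' begins.
                code = line[:tok.start[1]].rstrip()
                # Everything after the stripped code, including the
                # whitespace in front of '#', goes to the comment part.
                return [code, line[len(code):]]
    except (tokenize.TokenError, SyntaxError):
        # Assumption: a line the tokenizer cannot finish is reported
        # as having no comment.
        pass
    return [line, '']

def has_comment(line):
    """True if the line contains a real comment (not a '#' in a string)."""
    return split_code_and_comment(line)[1] != ''

loc_2 = 'for char in "(*#& eht # ": # pylint: disable=one,two # something'
print(has_comment(loc_2))
print(split_code_and_comment(loc_2))
```

If the lines come from iterating over a file, each one usually ends with a newline, which would end up in the comment part here; stripping it first with line.rstrip("\n") is one way to avoid that.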