2

Looking to gain understanding in Python'stokenize module. I am interested in calling the tokenize.tokenize method on a given python source file (like the one below) and get its tokenized output with the 5-tuple as mentioned in the docs.

# Python source file
import os

class Test():
    """
    This class holds latitude, longitude, depth and magnitude data.
    """

    def __init__(self, latitude, longitude, depth, magnitude):
        self.latitude = latitude
        self.longitude = longitude
        self.depth = depth
        self.magnitude = magnitude

    def __str__(self):
        # -1 is for detection of missing data
        depth = self.depth
        if depth == -1:
            depth = 'unknown'

        magnitude = self.magnitude
        if magnitude == -1:
            depth = 'unknown'

        return "M{0}, {1} km, lat {2}\N{DEGREE SIGN} lon {3}\N{DEGREE SIGN}".format(magnitude, depth, self.latitude, self.longitude)

Unfortunately, the example in the docs is not clear enough given my inexperience in Python to make it work. Also, I could not find any related useful example code online.

Any simple workable code examples will be greatly appreciated. Also, if you know of useful online material with examples/explanations of the tokenize module and its methods that would be great.

1 Answer 1

1

tokenize.tokenize is a generator, and it will yield multiple 5-tuples corresponding to each token in the source.

with open('/path/to/src.py', 'rb') as f:
    for five_tuple in tokenize.tokenize(f.readline):
        print(five_tuple.type)
        print(five_tuple.string)
        print(five_tuple.start)
        print(five_tuple.end)
        print(five_tuple.line)
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for your answer. However, I am not getting the name of the tokens, e.g. STRING, OP etc. . Am I missing something?
They are the first item in the tuple, but they are an int, not a string. You'll have to compare them to the global statics in the tokenize module to if five_tuple.type == tokenize.OP
I see, and how can I compare against more structural tokens (e.g. for an if statement and other keywords)??
You're going to have to do some work yourself. Just print out all the token tuples and see what the types are for if statements. The token tuples include the code string and the entire line the token appears on.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.