Parse Text with Python

Question

I have data like the example below in a text file. I would like to search through the file and return everything between "SpecialStuff" and the next ";", as shown in the example output. I'm pretty new to Python, so any tips are greatly appreciated. Would something like .split() work?

Example Data:

stuff:
    1
    1
    1
    23

];

otherstuff:
    do something
    23
    4
    1

];

SpecialStuff
    select
        numbers
        ,othernumbers
        words
;

MoreOtherStuff
randomstuff
@#123


Example Output:

select
        numbers
        ,othernumbers
        words

cosinepenguin · Accepted Answer · 2017-06-20 15:13:18Z


You can try this:

file = open("filename.txt", "r")  # Open the original file for reading
output = open("result.txt", "w")  # Open a new file to write to
seenSpecialStuff = 0  # Tracks whether the 'SpecialStuff' line has been seen
for line in file:
    if ";" in line:
        seenSpecialStuff = 0  # Reset the tracker when a semicolon is seen
    if seenSpecialStuff == 1:
        output.write(line)  # Write the line while the tracker is active
    if "SpecialStuff" in line:
        seenSpecialStuff = 1  # Set the tracker when SpecialStuff is seen
file.close()
output.close()  # Close the files so the output is flushed to disk

This creates a file named result.txt that contains the lines exactly as they appear in the input, indentation included:

    select
        numbers
        ,othernumbers
        words

This code can be improved! Since this is likely a homework assignment, you'll probably want to do more research into how to make it more efficient. Hopefully it's a useful starting point for you!

Cheers!

EDIT

If you want the code to match only the exact line "SpecialStuff" (instead of any line containing "SpecialStuff"), you can make the "if" statements more specific:

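file = open("filename.txt", "r")
output = open("result.txt", "w")
seenSpecialStuff = 0
for line in file:
    if line.replace("\n", "") == ";":  # The line is exactly ";"
        seenSpecialStuff = 0
    if seenSpecialStuff == 1:
        output.write(line)
    if line.replace("\n", "") == "SpecialStuff":  # The line is exactly "SpecialStuff"
        seenSpecialStuff = 1
file.close()
output.close()  # Close the files so the output is flushed to disk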
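
To see why the exact comparison matters, here is a minimal sketch that runs both checks over the same in-memory lines. The sample lines and the helper name collect are made up for illustration (they are not from the question's file); "abcSpecialStuffpdq" stands in for the look-alike strings raised in the comments.

```python
# Hypothetical sample lines, not the asker's real data.
lines = [
    "abcSpecialStuffpdq\n",
    "    not wanted\n",
    ";\n",
    "SpecialStuff\n",
    "    select\n",
    "        numbers\n",
    ";\n",
]


def collect(lines, exact):
    """Return the lines between a 'SpecialStuff' marker and the next ';'.

    exact=False uses the substring checks from the first version,
    exact=True uses the whole-line checks from the edited version.
    """
    collected = []
    seen = False
    for line in lines:
        text = line.replace("\n", "")
        is_end = (text == ";") if exact else (";" in line)
        is_start = (text == "SpecialStuff") if exact else ("SpecialStuff" in line)
        if is_end:
            seen = False
        if seen:
            collected.append(line)
        if is_start:
            seen = True
    return collected


loose = collect(lines, exact=False)
strict = collect(lines, exact=True)
print(loose)   # also picks up "    not wanted\n"
print(strict)  # only the lines after the exact "SpecialStuff" line
```

The substring version treats "abcSpecialStuffpdq" as a start marker, so it also collects the line after it. The exact version skips it. Note that comparing after replace("\n", "") still fails if the marker line has trailing spaces or Windows line endings; line.strip() would be more forgiving there.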

edited Jun 20, 2017 at 15:13

answered Jun 19, 2017 at 18:23

cosinepenguin


3 Comments

user3476463 Over a year ago

Thank you, this is really close to what I was looking for. The only problem is that some parts of the code contain strings like "abcSpecialStuffpdq", so it's grabbing everything that follows those as well. How could I change the code so it only grabs what follows the exact string "SpecialStuff"?

cosinepenguin Over a year ago

You can try making the "if" statement something like if line.replace("\n", "") == "SpecialStuff":, which would make it so that only the line that has exactly SpecialStuff in it would trigger making the tracker "1"! That can be done for the other lines too, if you want it to only find specific occurrences!

cosinepenguin Over a year ago

I edited the answer to reflect that! If you needed to later also grab the information contained in "abcSpecialStuffpdq" you would have to add a separate "if" statement so that the code would recognize it.

inspectorG4dget · Accepted Answer · 2017-06-19 18:25:16Z


with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:  # open the input and output files
    wanted = False  # do we want the current line in the output?
    for line in infile:
        if line.strip() == "SpecialStuff":  # marks the beginning of a wanted block
            wanted = True
            continue
        if line.strip() == ";" and wanted:  # marks the end of a wanted block
            wanted = False
            continue

        if wanted:
            outfile.write(line)
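
If you want to try this without touching real files, one option is to wrap the same loop in a function and feed it in-memory text. This is only a sketch: the function name extract_block, its start and end parameters, and the shortened sample text are my own assumptions for illustration.

```python
import io


def extract_block(infile, outfile, start="SpecialStuff", end=";"):
    """Copy the lines between a `start` line and the next `end` line.

    Hypothetical helper: same state-flag loop as above, but it works on
    any pair of file-like objects.
    """
    wanted = False
    for line in infile:
        stripped = line.strip()
        if stripped == start:  # beginning of a wanted block
            wanted = True
            continue
        if stripped == end and wanted:  # end of a wanted block
            wanted = False
            continue
        if wanted:
            outfile.write(line)


# Shortened version of the example data from the question.
sample = io.StringIO(
    "stuff:\n"
    "    1\n"
    "];\n"
    "\n"
    "SpecialStuff\n"
    "    select\n"
    "        numbers\n"
    "        ,othernumbers\n"
    "        words\n"
    ";\n"
    "\n"
    "MoreOtherStuff\n"
)
result = io.StringIO()
extract_block(sample, result)
print(result.getvalue(), end="")
```

Because the comparison is against the whole stripped line, "];" does not end a block and a line like "abcSpecialStuffpdq" does not start one.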

answered Jun 19, 2017 at 18:25

inspectorG4dget


zwer · Accepted Answer · 2017-06-19 18:33:16Z

Don't use str.split() for that - str.find() is more than enough:

parsed = None
with open("example.dat", "r") as f:
    data = f.read()  # load the file into memory for convenience
    start_index = data.find("SpecialStuff")  # find the beginning of your block
    if start_index != -1:
        end_index = data.find(";", start_index)  # find the end of the block
        if end_index != -1:
            # grab everything in between, skipping past the marker itself
            parsed = data[start_index + len("SpecialStuff"):end_index]
if parsed is None:
    print("`SpecialStuff` Block not found")
else:
    print(parsed)

Keep in mind that this captures everything between those two markers, including newlines and other whitespace. You can additionally call parsed.strip() to remove leading and trailing whitespace if you don't want it.
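
As a rough illustration of what strip() changes, here is the same str.find() approach applied to a short in-memory string. The helper name between, its parameters, and the sample text are assumptions made for this sketch.

```python
def between(data, marker="SpecialStuff", terminator=";"):
    """Return the text between `marker` and the next `terminator`, or None.

    Hypothetical helper built on str.find(), which returns -1 when the
    substring is not found.
    """
    start_index = data.find(marker)
    if start_index == -1:
        return None
    content_start = start_index + len(marker)
    end_index = data.find(terminator, content_start)
    if end_index == -1:
        return None
    return data[content_start:end_index]


# Shortened sample text, modeled on the question's data.
data = (
    "otherstuff:\n"
    "    23\n"
    "];\n"
    "\n"
    "SpecialStuff\n"
    "    select\n"
    "        numbers\n"
    ";\n"
)

raw = between(data)
print(repr(raw))          # keeps the surrounding newlines and indentation
print(repr(raw.strip()))  # leading and trailing whitespace removed
missing = between("no marker here")
print(missing)
```

strip() only trims the two ends of the captured text, so the indentation on the first line disappears while the indentation of the inner lines stays. Also note that str.find() matches substrings, so a string like "abcSpecialStuffpdq" appearing earlier in the file would be found first.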
