As can be seen in the code, I create two output files: one for the output after splitting, and a second for the actual output after removing duplicate lines. How can I make only one output file? Sorry if this sounds stupid, I'm a beginner.

import sys
txt = sys.argv[1] 

lines_seen = set() # holds lines already seen
outfile = open("out.txt", "w")
actualout = open("output.txt", "w")

for line in open(txt, "r"):
    line = line.split("?", 1)[0]
    outfile.write(line+"\n")
outfile.close()

for line in open("out.txt", "r"):
    if line not in lines_seen: # not a duplicate
        actualout.write(line)
        lines_seen.add(line)

actualout.close()

2 Answers

1

You can add the lines from the input file directly to the set. Since a set cannot contain duplicates, you don't even need to check for them. Try this:

import sys
txt = sys.argv[1] 

lines_seen = set() # holds lines already seen
actualout = open("output.txt", "w")

for line in open(txt, "r"):
    # strip the newline first, so lines without "?" don't end up with two
    line = line.rstrip("\n").split("?", 1)[0]
    lines_seen.add(line + "\n")

for line in lines_seen:
    actualout.write(line)

actualout.close()
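
If the order of the lines matters, keep in mind that iterating over a set does not guarantee it, so output.txt may come out shuffled. A minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dicts keep insertion order); the helper name `unique_prefixes` and the sample lines are made up for illustration:

```python
def unique_prefixes(lines, sep="?"):
    """Return the text before the first `sep` of each line, deduplicated, in first-seen order."""
    # strip the trailing newline, then keep only the part before the separator
    prefixes = (line.rstrip("\n").split(sep, 1)[0] for line in lines)
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(prefixes))


sample = ["a.com/x?id=1\n", "b.com/y\n", "a.com/x?id=2\n"]
print(unique_prefixes(sample))  # ['a.com/x', 'b.com/y']
```

You could then write the result with `actualout.write("\n".join(result) + "\n")`.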

1

In the first step, iterate over every line in the file, split each line on your delimiter, and store the results in a list. After that, iterate over the list and write each line that has not been seen yet to your output file.

import sys
txt = sys.argv[1] 

lines_seen = set() # holds lines already seen
actualout = open("output.txt", "w")

# keep the text before the first "?" of each line (newline stripped first)
data = [line.rstrip("\n").split("?", 1)[0] for line in open(txt, "r")]
for line in data:
    if line not in lines_seen: # not a duplicate
        actualout.write(line + "\n")
        lines_seen.add(line)

actualout.close()
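
The two steps can also be folded into a single pass, so no intermediate list or file is needed. This is only a sketch under the same assumptions as the question (split on the first "?", write to one output file); the function name `dedupe_file` is hypothetical:

```python
def dedupe_file(src_path, dst_path, sep="?"):
    """Write the part of each input line before `sep` to dst_path, skipping duplicates."""
    seen = set()  # prefixes already written
    # `with` closes both files even if an error occurs
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            prefix = line.rstrip("\n").split(sep, 1)[0]
            if prefix not in seen:  # not a duplicate
                seen.add(prefix)
                dst.write(prefix + "\n")
```

In the original script this would be called as `dedupe_file(sys.argv[1], "output.txt")`. Because each line is written as soon as it is first seen, the output keeps the input order.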

1 Comment

It's showing an error on the `data = ...` line, where you tried to close the list.
