1

I have a text file with the contents shown below. I want to split it into multiple files (1.txt, 2.txt, 3.txt, ...), and each new output file should look like the desired outputs below. The code I tried doesn't split the input file properly. How can I split the input file into multiple files?

My code:

#!/usr/bin/python

with open("input.txt", "r") as f:
    a1=[]
    a2=[]
    a3=[]
    for line in f:
        if not line.strip() or line.startswith('A') or line.startswith('$$'): continue
        row = line.split()
        a1.append(str(row[0]))
        a2.append(float(row[1]))
        a3.append(float(row[2]))
f = open('1.txt','a')
f = open('2.txt','a')
f = open('3.txt','a')
f.write(str(a1)) 
f.close()

Input file:

A
x
k
..
$$

A
z
m
..
$$

A
B
l
..
$$

Desired output 1.txt

A
x
k
..
$$

Desired output 2.txt

A
z
m
..
$$

Desired output 3.txt

A
B
l
..
$$
4
  • I would use re.findall() for this... Commented Mar 10, 2016 at 12:29
  • Is $$ the delimiter? Commented Mar 10, 2016 at 12:32
  • @pzp, I don't think it's the delimiter since it's included in the desired output. Rather, the extra line-break would be the delimiter here. Commented Mar 10, 2016 at 12:33
  • $$ is a line in the input file. It should be written in each output file. Commented Mar 10, 2016 at 17:20
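Following the comments above (the empty line, not $$, separates the blocks, and $$ must stay in each output file), here is a minimal sketch assuming the blocks are separated by one or more lines that contain only whitespace. The function names split_on_blank_lines and write_blocks are hypothetical, chosen for illustration.

```python
import os
import re


def split_on_blank_lines(text):
    """Return the blocks of text that are separated by blank lines."""
    # A separator is a newline, optional whitespace (possibly more newlines), then a newline
    blocks = re.split(r'\n\s*\n', text.strip())
    return [block for block in blocks if block.strip()]


def write_blocks(blocks, directory='.'):
    """Write block N to N.txt (1.txt, 2.txt, ...), keeping the trailing $$ line."""
    for number, block in enumerate(blocks, start=1):
        path = os.path.join(directory, '%d.txt' % number)
        with open(path, 'w') as out:
            out.write(block + '\n')
```

Usage, assuming the input is in input.txt: `write_blocks(split_on_blank_lines(open('input.txt').read()))`. Reading the whole file at once is fine for small inputs; a line-by-line approach scales better for large ones.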

6 Answers

3

Read your input file, write the buffered lines to an output file each time you find a "$$" line, and increment the output-file counter. Code:

with open("input.txt", "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip():  # skip the empty lines
            buff.append(line)
        if line.strip() == "$$":
            # write the finished block to i.txt; "with" closes the file
            with open('%d.txt' % i, 'w') as output:
                output.write(''.join(buff))
            i += 1
            buff = []  # reset the buffer

EDIT: using ''.join() on a list should be efficient too, see https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation


1 Comment

@maazza your code gives this error: Traceback (most recent call last): File "split.py", line 8, in <module> buff.append(line) AttributeError: 'str' object has no attribute 'append'
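The traceback says buff was a str when append was called, whereas the answer initializes it as a list (buff = []). A minimal illustration, assuming the commenter's copy had buff bound to a string (for example buff = ''); the helper name collect is hypothetical:

```python
def collect(lines, buff):
    """Append each line to buff; this only works if buff is a list."""
    for line in lines:
        buff.append(line)
    return buff


# A list has .append, so this works
print(collect(['A\n', '$$\n'], []))  # prints ['A\n', '$$\n']

# A str has no .append, which reproduces the reported error
try:
    collect(['A\n'], '')
except AttributeError as err:
    print(err)  # prints 'str' object has no attribute 'append'
```

Initializing buff = [] before the loop (and resetting it to [] after each block) avoids the error.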
1

Try the re.findall() function:

import re

with open('input.txt', 'r') as f:
    data = f.read()

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

# write found[0] to 1.txt, found[1] to 2.txt, ...
for i, block in enumerate(found, start=1):
    with open(str(i) + '.txt', 'w') as out:
        out.write(block)

Minimalistic approach for the first 3 occurrences:

import re

found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)

[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found[:3], start=1)]

Some explanations:

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

finds all occurrences matching the specified RegEx and puts them into a list called found
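To see what the pattern captures, here is a quick check on the question's sample input written as a string (the variable name sample is only for illustration). With re.S, . also matches newlines, so the lazy .*? runs from A to the first \n$$; the surrounding \n* only swallow the blank lines between blocks, and findall returns just the parenthesized group. re.M has no effect on this particular pattern, because it contains no ^ or $ anchors (the $ signs are escaped).

```python
import re

# The question's input file, as a string
sample = "A\nx\nk\n..\n$$\n\nA\nz\nm\n..\n$$\n\nA\nB\nl\n..\n$$\n"

# re.S lets '.' match newlines; the lazy .*? stops at the first "\n$$"
found = re.findall(r'\n*(A.*?\n\$\$)\n*', sample, re.M | re.S)

for block in found:
    print(repr(block))
# 'A\nx\nk\n..\n$$'
# 'A\nz\nm\n..\n$$'
# 'A\nB\nl\n..\n$$'
```

Note that each captured block ends at $$ without a trailing newline, so the written files do too.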

[open(str(i)+'.txt', 'w').write(f) for i, f in enumerate(found, start=1)]

iterates (using a list comprehension) over the elements of the found list together with their position (enumerate counting from 1) and, for each element, creates a text file named after that position (1.txt, 2.txt, ...) and writes the element (occurrence) to it.

Another version, without regular expressions:

blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'

with open('input.txt', 'r') as f:
    fn = 1               # number of the next output file
    data = []            # lines of the current block
    write_block = False  # True while inside an A ... $$ block
    for line in f:
        if fn > blocks_to_read:
            break
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
            data = []
            fn += 1

PS: I personally don't like this version; I would use the one with the RegEx.

12 Comments

The regular expression you used is too restrictive. We can't say for sure that that format will hold for all inputs.
@ChuckLoganLim, I think the OP might have '\n's in the text blocks
@MaxU Your code works properly. Can you please explain what each line does? Also, could you write a version without the re.findall() function? I'd rather not use re.findall() :)
@MaxU And how can I change your code to get only three output files? Is that possible? Thanks.
@erhan, I've updated my re.findall() answer so that it writes only the first 3 blocks, and I'll add another version without RegEx a bit later...
0

Open 1.txt for writing at the beginning. Write each line to the current output file. Additionally, if line.strip() == '$$', close the old file and open a new one for writing.
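A sketch of the approach described above, with one adjustment that is an assumption rather than part of the answer: the next file is opened lazily, only when there is a line to write, so no empty file is left behind after the last $$. Empty separator lines are skipped. The function name split_on_marker and its out_dir and marker parameters are hypothetical.

```python
import os


def split_on_marker(in_path, out_dir='.', marker='$$'):
    """Copy the lines of in_path into 1.txt, 2.txt, ... in out_dir.

    A new file is started after every line equal to marker.
    Returns the number of files written.
    """
    count = 0
    out = None
    with open(in_path) as f:
        for line in f:
            if not line.strip():
                continue  # drop the empty lines between blocks
            if out is None:
                # open the next output file only when there is something to write
                count += 1
                out = open(os.path.join(out_dir, '%d.txt' % count), 'w')
            out.write(line)
            if line.strip() == marker:
                out.close()  # the marker line ends the current block
                out = None
    if out is not None:
        out.close()  # the input ended without a final marker
    return count
```

For example, split_on_marker('input.txt') on the question's input should produce 1.txt, 2.txt and 3.txt, each ending with its $$ line.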


0

The blocks are divided by empty lines. Try this:

import sys

i = 1
o = open("{}.txt".format(i), "w")  # the first output file is 1.txt
for line in sys.stdin:
    if len(line.strip()) == 0:
        # an empty line ends the current block: start the next file
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()  # close the last output file


0

It looks to me like the condition you should be checking for is a line that contains just the newline (\n) character. When you encounter such a line, write the contents parsed so far, close the file, and open another one for writing.
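The check described above can also be expressed with itertools.groupby, which is a swapped-in technique rather than what this answer wrote: consecutive lines are grouped by whether they are blank, and every non-blank group is one block. The function names blocks_between_blank_lines and split_file_on_blank_lines are hypothetical. Using not line.strip() instead of line == '\n' is an assumption that also treats \r\n and whitespace-only lines as separators.

```python
from itertools import groupby


def blocks_between_blank_lines(lines):
    """Yield each run of non-blank lines as one block (a list of lines)."""
    # groupby splits the input into alternating runs of blank and non-blank lines
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield list(group)


def split_file_on_blank_lines(in_path):
    """Write block N of in_path to N.txt and return the number of files."""
    count = 0
    with open(in_path) as f:
        for count, block in enumerate(blocks_between_blank_lines(f), start=1):
            with open('%d.txt' % count, 'w') as out:
                out.writelines(block)
    return count
```

Because groupby merges consecutive blank lines into one run, several empty lines in a row still count as a single separator and do not create empty files.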

2 Comments

Each line in the input will end with a newline character
Sorry, I wasn't explicit enough. I meant a line that contains just the newline character.
0

A very easy way, if you want to split it into 2 files, for example:

with open("myInputFile.txt",'r') as file:
    lines = file.readlines()

with open("OutputFile1.txt",'w') as file:
    for line in lines[:int(len(lines)/2)]:
        file.write(line)

with open("OutputFile2.txt",'w') as file:
    for line in lines[int(len(lines)/2):]:
        file.write(line)

Making that dynamic (any number of output files) would be:

with open("inputFile.txt", 'r') as file:
    lines = file.readlines()

batch = 10                  # number of output files
size = len(lines) // batch  # lines per file (rounded down)
for i in range(1, batch + 1):
    start = (i - 1) * size
    # the last file also takes the leftover lines, so nothing is dropped
    end = len(lines) if i == batch else start + size
    with open("splitText_" + str(i) + ".txt", 'w') as file:
        for line in lines[start:end]:
            file.write(line)

1 Comment

The best solution so far
