How can I split a text file into multiple text files using python?

Question

I have a text file with the contents shown below. I want to split this file into multiple files (1.txt, 2.txt, 3.txt, ...). Each output file should look like the desired outputs shown below. The code I tried doesn't split the input file properly. How can I split the input file into multiple files?

My code:

#!/usr/bin/python

with open("input.txt", "r") as f:
    a1=[]
    a2=[]
    a3=[]
    for line in f:
        if not line.strip() or line.startswith('A') or line.startswith('$$'): continue
        row = line.split()
        a1.append(str(row[0]))
        a2.append(float(row[1]))
        a3.append(float(row[2]))
f = open('1.txt','a')
f = open('2.txt','a')
f = open('3.txt','a')
f.write(str(a1)) 
f.close()

Input file:

A
x
k
..
$$

A
z
m
..
$$

A
B
l
..
$$

Desired output 1.txt

A
x
k
..
$$

Desired output 2.txt

A
z
m
..
$$

Desired output 3.txt

A
B
l
..
$$

@pzp, I don't think it's the delimiter since it's included in the desired output. Rather, the extra line break would be the delimiter here.
– Chuck, Commented Mar 10, 2016 at 12:33

$$ is a line in the input file. It should be written in each output file.
– erhan, Commented Mar 10, 2016 at 17:20

maazza · Accepted Answer · 2016-03-10 17:39:06Z

3

Read your input file, write a new output file each time you find a "$$" line, and increment the output-file counter. Code:

with open("input.txt", "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip():  # skip the empty lines
            buff.append(line)
        if line.strip() == "$$":
            # a "$$" line ends the block: write the buffer to the next numbered file
            with open('%d.txt' % i, 'w') as output:
                output.write(''.join(buff))
            i += 1
            buff = []  # buffer reset

EDIT: Collecting the lines in a list and joining them once with ''.join() should be efficient too; see https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation

edited Mar 10, 2016 at 17:39

answered Mar 10, 2016 at 12:53


1 Comment

erhan Over a year ago

@maazza your code gives this error:

Traceback (most recent call last):
  File "split.py", line 8, in <module>
    buff.append(line)
AttributeError: 'str' object has no attribute 'append'

MaxU - stand with Ukraine · Accepted Answer · 2016-03-10 18:52:03Z

1

Try the re.findall() function:

import re

with open('input.txt', 'r') as f:
    data = f.read()

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

[open(str(i)+'.txt', 'w').write(found[i-1]) for i in range(1, len(found)+1)]

Minimal approach for the first 3 occurrences:

import re

found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)

# enumerate() numbers the blocks; found.index(f) would reuse a number if two blocks were identical
for i, block in enumerate(found[:3], start=1):
    with open(str(i) + '.txt', 'w') as fout:
        fout.write(block)

Some explanations:

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

finds all occurrences matching the specified regular expression and puts them into a list called found.

for i, block in enumerate(found[:3], start=1):

iterates over the first three elements of the found list. For each element it creates a text file named after the element's position (1.txt, 2.txt, ...) and writes that element (the matched block) to that file.
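To see what the pattern actually captures, here is a small sketch that runs it on an in-memory copy of the question's sample input instead of a file. The names BLOCK_RE and sample are made up for this illustration; the pattern and flags are the ones from the answer.

```python
import re

# Same pattern and flags as in the answer above; BLOCK_RE is a hypothetical name.
BLOCK_RE = re.compile(r'\n*(A.*?\n\$\$)\n*', re.M | re.S)

# In-memory copy of the sample input from the question.
sample = "A\nx\nk\n..\n$$\n\nA\nz\nm\n..\n$$\n\nA\nB\nl\n..\n$$\n"

# findall() returns only the text of the capture group for each match.
found = BLOCK_RE.findall(sample)

for number, block in enumerate(found, start=1):
    print("%d.txt would contain %r" % (number, block))
```

Note that each captured block ends at "$$" without a trailing newline, because the newlines after "$$" sit outside the capture group. Append '\n' when writing if the output files should end with one.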

Another version, without regular expressions:

blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'

with open('35916503.txt', 'r') as f:
    fn = 1
    data = []
    write_block = False
    for line in f:
        if fn > blocks_to_read:
            break 
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
                data = []
            fn += 1

PS: I personally don't like this version and would use the one with the regular expression.

edited Mar 10, 2016 at 18:52

answered Mar 10, 2016 at 12:43


12 Comments

Chuck Over a year ago

The regular expression you used is too restrictive. We can't say for sure that that format will hold for all inputs.

MaxU - stand with Ukraine Over a year ago

@ChuckLoganLim, i think OP might have '\n's in text-blocks

erhan Over a year ago

@MaxU Your code works properly. Can you please explain what each line does? And I think you can write another code without re.findall() function for the same aim? I wouldn't like to use re.findall() function :)

erhan Over a year ago

@MaxU And how can you arrange your code to get only three output files? Is that possible? Thanks.

MaxU - stand with Ukraine Over a year ago

@erhan, I've updated my "re.findall()" answer so that it writes only the first 3 blocks, and I will add another version without regular expressions a bit later...


Antti Haapala · Accepted Answer · 2016-03-10 12:33:57Z

0

Open 1.txt at the beginning for writing. Write each line to the current output file. Additionally, if line.strip() == '$$', close the old file and open a new one for writing.
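A minimal sketch of that streaming approach follows. The function name split_on_marker and its out_dir parameter are assumptions made for this example. It also deviates slightly from the description: the next file is opened only when the next non-blank line arrives, so no empty file is left behind after the last "$$", and the blank separator lines are skipped to match the desired output.

```python
import os

def split_on_marker(input_path, out_dir=".", marker="$$"):
    """Copy each block ending in `marker` to 1.txt, 2.txt, ... in out_dir.

    Returns the number of files written.
    """
    count = 0
    out = None  # currently open output file, if any
    with open(input_path, "r") as src:
        for line in src:
            if out is None:
                if not line.strip():
                    continue  # skip blank separator lines between blocks
                count += 1
                out = open(os.path.join(out_dir, "%d.txt" % count), "w")
            out.write(line)
            if line.strip() == marker:
                # the marker line closes the current block
                out.close()
                out = None
    if out is not None:
        out.close()  # input ended without a closing marker
    return count
```

Calling split_on_marker("input.txt") writes the files into the current directory. Only one line is held in memory at a time, which may matter for large inputs.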

answered Mar 10, 2016 at 12:33


user5547025 · Accepted Answer · 2016-03-10 12:34:54Z

0

The blocks are divided by empty lines. Try this:

import sys

lines = sys.stdin.readlines()
i = 1
o = open("{}.txt".format(i), "w")
for line in lines:
    if len(line.strip()) == 0:
        # an empty line ends the current block: move on to the next file
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()

answered Mar 10, 2016 at 12:34


Chuck · Accepted Answer · 2016-03-10 12:36:30Z

0

It looks to me like the condition you should be checking for is a line that contains just the newline (\n) character. When you encounter such a line, write the contents parsed so far to a file, close that file, and open another one for writing.
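One way this could look, as a sketch rather than a definitive implementation: collect lines in a buffer and flush the buffer to a new numbered file whenever a blank line is reached. The function name split_on_blank_lines and the out_dir parameter are hypothetical. Whitespace-only lines are treated as blank as well, and runs of consecutive blank lines count as a single separator.

```python
import os

def split_on_blank_lines(input_path, out_dir="."):
    """Write each blank-line-separated block to 1.txt, 2.txt, ... in out_dir.

    Returns the number of files written.
    """
    blocks = []
    current = []  # lines of the block being collected
    with open(input_path, "r") as src:
        for line in src:
            if line.strip():
                current.append(line)
            elif current:
                # a blank line ends the block collected so far
                blocks.append("".join(current))
                current = []
    if current:
        blocks.append("".join(current))  # last block has no trailing blank line

    for number, block in enumerate(blocks, start=1):
        with open(os.path.join(out_dir, "%d.txt" % number), "w") as dst:
            dst.write(block)
    return len(blocks)
```

This version does not depend on the "A" and "$$" lines at all, so it keeps working if the block contents change, but it breaks if a block itself contains an empty line.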

edited Mar 10, 2016 at 12:36

answered Mar 10, 2016 at 12:32


2 Comments

gmuraleekrishna Over a year ago

Each character in the input will have a carriage return

Chuck Over a year ago

Sorry, I wasn't explicit enough. I meant a line that contains just the carriage return character.

Enrique Benito Casado · Accepted Answer · 2021-10-20 07:11:35Z

0

A very easy way, if you want to split it into 2 files for example:

with open("myInputFile.txt",'r') as file:
    lines = file.readlines()

with open("OutputFile1.txt",'w') as file:
    for line in lines[:int(len(lines)/2)]:
        file.write(line)

with open("OutputFile2.txt",'w') as file:
    for line in lines[int(len(lines)/2):]:
        file.write(line)

Making that dynamic would be:

with open("inputFile.txt",'r') as file:
    lines = file.readlines()

Batch = 10
increase = len(lines) // Batch  # lines per output file
start = 0
for i in range(1, Batch + 1):
    # the last file also takes the remainder when len(lines) is not divisible by Batch
    end = len(lines) if i == Batch else start + increase
    with open("splitText_" + str(i) + ".txt",'w') as file:
        for line in lines[start:end]:
            file.write(line)

    start = end

edited Oct 20, 2021 at 7:11

answered Oct 19, 2021 at 16:00


1 Comment

Cristian C Over a year ago

The best solution so far
