I'm working on text file processing using Python. I've got a text file (ctl_Files.txt) which has the following content/ or similar to this:
------------------------
Changeset: 143
User: Sarfaraz
Date: Tuesday, April 05, 2011 5:34:54 PM
Comment:
Initial add, all objects.
Items:
add $/Systems/DB/Expences/Loader
add $/Systems/DB/Expences/Loader/AAA.txt
add $/Systems/DB/Expences/Loader/BBB.txt
add $/Systems/DB/Expences/Loader/CCC.txt
Check-in Notes:
Code Reviewer:
Performance Reviewer:
Reviewer:
Security Reviewer:
------------------------
Changeset: 145
User: Sarfaraz
Date: Thursday, April 07, 2011 5:34:54 PM
Comment:
edited objects.
Items:
edit $/Systems/DB/Expences/Loader
edit $/Systems/DB/Expences/Loader/AAA.txt
edit $/Systems/DB/Expences/Loader/AAB.txt
Check-in Notes:
Code Reviewer:
Performance Reviewer:
Reviewer:
Security Reviewer:
------------------------
Changeset: 147
User: Sarfaraz
Date: Wednesday, April 06, 2011 5:34:54 PM
Comment:
Initial add, all objects.
Items:
delete, source rename $/Systems/DB/Expences/Loader/AAA.txt;X34892
rename $/Systems/DB/Expences/Loader/AAC.txt.
Check-in Notes:
Code Reviewer:
Performance Reviewer:
Reviewer:
Security Reviewer:
------------------------
To process this file I wrote the following code:
#Tags - used for spliting the information
tag1 = 'Changeset:'
tag2 = 'User:'
tag3 = 'Date:'
tag4 = 'Comment:'
tag5 = 'Items:'
tag6 = 'Check-in Notes:'
#opening and reading the input file
#In path to input file use '\' as escape character
with open ("C:\\Users\\md_sarfaraz\\Desktop\\ctl_Files.txt", "r") as myfile:
val=myfile.read().replace('\n', ' ')
#counting the occurence of any one of the above tag
#As count will be same for all the tags
occurence = val.count(tag1)
#initializing row variable
row=""
#passing the count - occurence to the loop
for count in range(1, occurence+1):
row += ( (val.split(tag1)[count].split(tag2)[0]).strip() + '|' \
+ (val.split(tag2)[count].split(tag3)[0]).strip() + '|' \
+ (val.split(tag3)[count].split(tag4)[0]).strip() + '|' \
+ (val.split(tag4)[count].split(tag5)[0]).strip() + '|' \
+ (val.split(tag5)[count].split(tag6)[0]).strip() + '\n')
#opening and writing the output file
#In path to output file use '\' as escape character
file = open("C:\\Users\\md_sarfaraz\\Desktop\\processed_ctl_Files.txt", "w+")
file.write(row)
file.close()
and got the following result/File (processed_ctl_Files.txt):
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader add $/Systems/DB/Expences/Loader/AAA.txt add $/Systems/DB/Expences/Loader/BBB.txt add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader edit $/Systems/DB/Expences/Loader/AAA.txt edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Rascal/Expences/AAA.txt;X34892 rename $/Systems/DB/Rascal/Expences/AAC.txt.
But, I want the result like this:
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader
add $/Systems/DB/Expences/Loader/AAA.txt
add $/Systems/DB/Expences/Loader/BBB.txt
add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader
edit $/Systems/DB/Expences/Loader/AAA.txt
edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Rascal/Expences/AAA.txt;X34892
rename $/Systems/DB/Rascal/Expences/AAC.txt.
or it would be great if we can get results like this :
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/AAA.txt
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/BBB.txt
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader/AAA.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Rascal/Expences/AAA.txt;X34892
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|rename $/Systems/DB/Rascal/Expences/AAC.txt.
Let me know how I can do this. Also, I'm very new to Python so please ignore if I've written some lousy or redundant code. And help me to improve this.