How to get values from similar strings in Python?

Question

Suppose I have the following strings, taken from a file that contains many similar lines:

Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
...

How could I print the first number (that is, the fourth field of each string) using regular expressions? And how could I print the first four fields of a particular line (e.g. "Andorra la Vella" "ad" "Andorra la Vella" 20430) if the fourth field is above 10000?

Have a look at the csv module. You won't need regex and will be able to address the use cases you mention.
– rickhg12hs, Commented Nov 17, 2013 at 16:11

unutbu · Accepted Answer · 2013-11-17 16:07:04Z

5

I think it would be easier to use the csv module in this case:

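import csv

filename = 'file.txt'  # placeholder path; point this at your data file

# In Python 3 the csv module expects a text-mode file opened with newline=''
with open(filename, newline='') as f:
    for row in csv.reader(f, delimiter='|'):
        num = float(row[3])  # fourth field, converted to a number
        if num > 10000:
            print(row[:4])  # first four fields of the matching row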
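
As a self-contained illustration of the csv approach, here is a minimal sketch that reads the sample lines from the question out of an in-memory buffer instead of a file. The names `SAMPLE`, `large_places`, and `threshold` are made up for this example, and it assumes Python 3 and that the fourth field is always an integer.

```python
import csv
import io

# Sample data from the question, held in memory so the sketch runs without a file.
SAMPLE = (
    "Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|\n"
    "Canillo|ad|Canillo|3292|42.57|1.6|\n"
    "Encamp|ad|Encamp|11224|42.54|1.57|\n"
    "La Massana|ad|La Massana|7211|42.55|1.51|\n"
)


def large_places(lines, threshold=10000):
    """Return the first four fields of each row whose fourth field exceeds threshold."""
    selected = []
    for row in csv.reader(lines, delimiter='|'):
        if len(row) < 4:
            continue  # skip blank or short rows
        if int(row[3]) > threshold:
            selected.append(row[:4])
    return selected


rows = large_places(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

Because every line ends with a `|`, `csv.reader` yields an empty string as a seventh field; slicing with `row[:4]` sidesteps it.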

Inbar Rose · Accepted Answer · 2013-11-17 16:08:00Z

2

You don't need regex.

s = """
Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|
Canillo|ad|Canillo|3292|42.57|1.6|
Encamp|ad|Encamp|11224|42.54|1.57|
La Massana|ad|La Massana|7211|42.55|1.51|
"""

for line in s.splitlines():  # pretend we are reading from a file
    if not line:
        continue  # skip empty lines

    groups = line.split('|')  # split the line into its fields
    if int(groups[3]) > 10000:  # is the 4th field above 10000?
        print(groups[:4])  # print the first 4 fields
    else:
        print(groups[3])  # print only the 4th field

>>> 
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
3292
['Encamp', 'ad', 'Encamp', '11224']
7211

Pi Marillion · Accepted Answer · 2013-11-17 16:19:05Z

1

Using regular expressions:

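import re

# Capture each field without its trailing '|' so that int() can parse the fourth one
pattern = re.compile(r'(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|')
with open('file.txt') as f:
    results = [pattern.match(line).groups() for line in f]
# filter just the rows with fourth column > 10000
results = [result for result in results if int(result[3]) > 10000]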

Using split:

results = [line.split('|')[0:-1] for line in open('file.txt')]
# filter just the rows with fourth column > 10000
results = [result for result in results if int(result[3]) > 10000]
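
Since the question asks specifically for the first number via regular expressions, here is one possible sketch that captures only the fourth field. The names `FOURTH_FIELD` and `fourth_field` are invented for this example, and it assumes the fourth field consists of digits only.

```python
import re

# Skip three "anything but a pipe, then a pipe" groups; the digits that
# follow are captured as the fourth field.
FOURTH_FIELD = re.compile(r'^(?:[^|]*\|){3}(\d+)\|')

lines = [
    'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|',
    'Canillo|ad|Canillo|3292|42.57|1.6|',
    'Encamp|ad|Encamp|11224|42.54|1.57|',
    'La Massana|ad|La Massana|7211|42.55|1.51|',
]


def fourth_field(line):
    """Return the fourth field as an int, or None if the line does not match."""
    match = FOURTH_FIELD.match(line)
    return int(match.group(1)) if match else None


numbers = [fourth_field(line) for line in lines]
print(numbers)
```

Matching `[^|]*` rather than `.*?` keeps each group from running across a delimiter, so the pattern fails cleanly on lines with fewer than four fields.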

Ashwini Chaudhary · Accepted Answer · 2013-11-17 16:07:13Z

0

You don't need regex here; you can use str.split and str.rstrip:

>>> s = 'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|'
>>> spl = s.rstrip('|\n').split('|')
>>> spl
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430', '42.51', '1.51']
>>> if int(spl[3]) > 10000:
...     print(spl[:4])
...
['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']

Demo:

with open('filename') as f:
    for line in f:
        data = line.rstrip('|\n').split('|')
        if int(data[3]) > 10000:
            print(data[:4])

Output:

['Andorra la Vella', 'ad', 'Andorra la Vella', '20430']
['Encamp', 'ad', 'Encamp', '11224']
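
The split-based loops above raise an exception on a blank line or a non-numeric fourth field. If the real file might contain such lines, one way to guard against them is sketched below. The helper name `parse_population` and the malformed sample lines are assumptions added for illustration; they are not part of the original data.

```python
def parse_population(line):
    """Return (fields, population) for a well-formed line, or None otherwise."""
    fields = line.rstrip('|\n').split('|')
    try:
        return fields, int(fields[3])
    except (IndexError, ValueError):
        return None  # blank line, too few fields, or non-numeric fourth field


lines = [
    'Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51|\n',
    '\n',                                     # hypothetical blank line
    'Canillo|ad|Canillo|n/a|42.57|1.6|\n',    # hypothetical non-numeric field
    'Encamp|ad|Encamp|11224|42.54|1.57|\n',
]

kept = []
for line in lines:
    parsed = parse_population(line)
    if parsed is not None and parsed[1] > 10000:
        kept.append(parsed[0][:4])
print(kept)
```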
