if statement inside for loop python save to CSV

Question

I have two spreadsheets of data. I am trying to check each row of spreadsheet a against the values in spreadsheet b, and then copy a value (the Tag) from the matching row of b into a.

Here is the example data:

a.csv:

IDNumber   Title
1          Vauxhall Astra Model H 92-93
2          VW Golf MK2 GTI 90-91
3          BMW 1 Series 89-93

b.csv:

Manufacturer  Model      Type     Year                        Tag
VW            Golf       MK2      1990|1991|1993              1000
VW            Golf       MK2 GTI  1990|1991|1993              1001
VW            Golf       MK2      1896|1897|1898|1899         1002
Vauxhall      Astra      Model H  1991|1992|1993|1994         1003
BMW           2 Series            2000|2001|2002              1004
BMW           1 Series            1889|1890|1891|1892|1893    1005

Result I am trying to achieve c.csv:

IDNumber   Title                           Tag
1          Vauxhall Astra Model H 92-93    1003
2          VW Golf MK2 GTI 90-91           1001
3          BMW 1 Series 89-93              1005

My Code:

import pandas as pd
import re

acsv = pd.read_csv('a.csv', sep=",")
bcsv = pd.read_csv('b.csv', sep=",")

for index, row in acsv.iterrows():
  title = row['Title']

  for i, r in bcsv.iterrows():
    if r['Model'] in title:
      type = r['type']
      if bool(re.search(rf'\b{type} \b', title)):
        year = r['Year']
        yearSearch = "|".join([x[2:] for x in year.split("|")])
        if bool(re.search(rf'\b(?:{yearSearch})\b.*?\b(?:{yearSearch})\b', ebayTitle)):
          tag = r['Tag']
          acsv['tag'][index] = tag

acsv.to_csv(fileinString, sep=",", index=False)

Currently it returns a few items, but not correctly. If I print the information inside the last if statement, it shows up correctly on the screen, but it is not being stored correctly.

I have put all the indices in place so you can see exactly how it runs. I also attempted to build an online version to see whether it works; I couldn't get that running, but it may help in answering the question: https://ideone.com/otV6AS

Can you explain how that is achieved? Is using iterrows() even necessary for this? Also, please provide the data in a more convenient format.
– AMC, Commented Apr 15, 2020 at 19:02
The dictionaries in your ideone example are not analogous to a dataframe. You should use a list of dictionaries, not a dictionary of lists.
– Barmar, Commented Apr 15, 2020 at 19:16
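
To illustrate the comment about dictionaries: iterating a dataframe row by row yields one record per row, so the closest plain-Python stand-in is a list of dictionaries. The ideone code is not reproduced here, so the `columns` dictionary below is a hypothetical stand-in for it, filled with the a.csv sample data. A minimal sketch of the conversion:

```python
# Hypothetical "dictionary of lists" layout: one list per column.
columns = {
    'IDNumber': [1, 2, 3],
    'Title': [
        'Vauxhall Astra Model H 92-93',
        'VW Golf MK2 GTI 90-91',
        'BMW 1 Series 89-93',
    ],
}

# Convert to a "list of dictionaries": one dict per row, which is the
# shape a row-by-row loop over a dataframe would hand you.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

for row in rows:
    # Each row can now be indexed by column name, like row['Title'] in iterrows().
    print(row['IDNumber'], row['Title'])
```

With this layout, the nested loops from the question can be tried in an online runner without pandas.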

MachineLearner · Accepted Answer · 2020-04-15 20:38:03Z

Not the most elegant or efficient solution, but it should work.

import re
import pandas as pd

df1 = pd.DataFrame({
    'IDNumber': [1, 2, 3],
    'Title': ['Vauxhall Astra Model H 92-93', 'VW Golf MK2 GTI 90-91', 'BMW 1 Series 89-93']})

df2 = pd.DataFrame({
    'Manufacturer': ['VW', 'VW', 'VW', 'Vauxhall', 'BMW', 'BMW'],
    'Model': ['Golf', 'Golf', 'Golf', 'Astra', '2 Series', '1 Series'],
    'Type': ['MK2', 'MK2 GTI', 'MK2', 'Model H', '', ''],
    'Year': [
        '1990|1991|1993',
        '1990|1991|1993',
        '1896|1897|1898|1899',
        '1991|1992|1993|1994',
        '2000|2001|2002',
        '1889|1890|1891|1892|1893'],
    'Tag': [1000, 1001, 1002, 1003, 1004, 1005]})

# split title of df1 into string and year tag min and year tag max
regular_expression = re.compile(r'\d\d-\d\d')

df1['title_string'] = df1['Title'].apply(lambda x: x.replace(regular_expression.search(x)[0], '').strip())
df1['year_tag_min'] = df1['Title'].apply(lambda x: regular_expression.search(x)[0].split('-')[0])
df1['year_tag_max'] = df1['Title'].apply(lambda x: regular_expression.search(x)[0].split('-')[1])

# add zero column for Tags
df1['Tag'] = 0

# add min and max year to df2
df2['year_min'] = df2['Year'].str.slice(start=2, stop=4, step=1)
df2['year_max'] = df2['Year'].str.slice(start=-2, step=1)

# add title_string column to df2
df2['title_string'] = df2['Manufacturer'] + ' ' + df2['Model'] + ' ' + df2['Type']

for df1_row in range(0, df1.shape[0]):
    # get values from df1
    current_title_string = df1.iloc[df1_row, 2]
    current_year_tag_min = df1.iloc[df1_row, 3]
    current_year_tag_max = df1.iloc[df1_row, 4]
    # loop on values from df2
    for df2_row in range(0, df2.shape[0]):
        # check if titles match
        match_title = df2.iloc[df2_row, -1].strip() == current_title_string.strip()
        # check if year interval from year_tag_min - year_tag_max lies in allowed interval
        # (two-digit strings are compared, so this assumes all years share a century)
        match_year = current_year_tag_min >= df2.iloc[df2_row, -3] and current_year_tag_max <= df2.iloc[df2_row, -2]
        if match_title and match_year:
            df1.iloc[df1_row, -1] = df2.iloc[df2_row, -4]
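
On the question of whether explicit row loops are needed at all: one possible alternative is to join the two frames on the title string and then filter on the year range. The sketch below is not part of the original answer. It applies the same matching rule as the loop above (exact title match, two-digit years compared as strings between the first and last entry of Year), and the names `parts`, `left`, `right`, `merged`, `in_range`, `result` and `csv_text` are introduced here for illustration only.

```python
import pandas as pd

# Same sample data as above.
df1 = pd.DataFrame({
    'IDNumber': [1, 2, 3],
    'Title': ['Vauxhall Astra Model H 92-93', 'VW Golf MK2 GTI 90-91', 'BMW 1 Series 89-93']})

df2 = pd.DataFrame({
    'Manufacturer': ['VW', 'VW', 'VW', 'Vauxhall', 'BMW', 'BMW'],
    'Model': ['Golf', 'Golf', 'Golf', 'Astra', '2 Series', '1 Series'],
    'Type': ['MK2', 'MK2 GTI', 'MK2', 'Model H', '', ''],
    'Year': [
        '1990|1991|1993',
        '1990|1991|1993',
        '1896|1897|1898|1899',
        '1991|1992|1993|1994',
        '2000|2001|2002',
        '1889|1890|1891|1892|1893'],
    'Tag': [1000, 1001, 1002, 1003, 1004, 1005]})

# Split each title into a name part and two-digit start/end years.
# Assumes every title ends in "NN-NN", as in the sample data.
parts = df1['Title'].str.extract(r'^(?P<name>.*?)\s*(?P<lo>\d\d)-(?P<hi>\d\d)$')
left = pd.concat([df1, parts], axis=1)

# Build the comparable name and the two-digit year bounds for each row of df2.
right = pd.DataFrame({
    'name': (df2['Manufacturer'] + ' ' + df2['Model'] + ' ' + df2['Type']).str.strip(),
    'year_min': df2['Year'].str.slice(2, 4),
    'year_max': df2['Year'].str.slice(-2),
    'Tag': df2['Tag'],
})

# Join on the name, then keep only rows whose year range fits.
merged = left.merge(right, on='name', how='inner')
in_range = (merged['lo'] >= merged['year_min']) & (merged['hi'] <= merged['year_max'])
result = merged.loc[in_range, ['IDNumber', 'Title', 'Tag']].reset_index(drop=True)

# With no path, to_csv returns the CSV text; pass a path such as 'c.csv' to write a file.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Like the loop version, this treats Year as a min-max interval rather than checking each listed year, and it only finds titles that match a Manufacturer + Model + Type combination exactly.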
