I have two spreadsheets of data. I am trying to check each row of spreadsheet a against the values in spreadsheet b and, where a row matches, copy a value (the Tag) from spreadsheet b into spreadsheet a.

Here is the example data:

a.CSV:

IDNumber   Title
1          Vauxhall Astra Model H 92-93
2          VW Golf MK2 GTI 90-91
3          BMW 1 Series 89-93

b.CSV:

Manufacturer  Model      Type     Year                        Tag
VW            Golf       MK2      1990|1991|1993              1000
VW            Golf       MK2 GTI  1990|1991|1993              1001
VW            Golf       MK2      1896|1897|1898|1899         1002
Vauxhall      Astra      Model H  1991|1992|1993|1994         1003
BMW           2 Series            2000|2001|2002              1004
BMW           1 Series            1889|1890|1891|1892|1893    1005

The result I am trying to achieve, c.csv:

IDNumber   Title                           Tag
1          Vauxhall Astra Model H 92-93    1003
2          VW Golf MK2 GTI 90-91           1001
3          BMW 1 Series 89-93              1005

My Code:

import pandas as pd
import re

acsv = pd.read_csv('a.csv', sep=",")
bcsv = pd.read_csv('b.csv', sep=",")

for index, row in acsv.iterrows():
  title = row['Title']

  for i, r in bcsv.iterrows():
    if r['Model'] in title:
      model_type = r['Type']
      if bool(re.search(rf'\b{model_type} \b', title)):
        year = r['Year']
        yearSearch = "|".join([x[2:] for x in year.split("|")])
        if bool(re.search(rf'\b(?:{yearSearch})\b.*?\b(?:{yearSearch})\b', title)):
          tag = r['Tag']
          acsv['tag'][index] = tag

acsv.to_csv('c.csv', sep=",", index=False)

Currently it returns a few items, but not the correct ones. If I print the information inside the last if statement, it is shown correctly on screen, but it is not being stored correctly.
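A note on the storing step: the assignment `acsv['tag'][index] = tag` is chained indexing, which first selects the column and then assigns into that intermediate result. Depending on the pandas version, this can trigger a `SettingWithCopyWarning` or leave the original frame unchanged, and it raises a `KeyError` if the `tag` column does not exist yet. A minimal sketch of a single-step write with `DataFrame.at` follows; the `tags` dictionary is a hypothetical stand-in for the matching logic, and the column name `Tag` is an assumption taken from the desired c.csv.

```python
import pandas as pd

acsv = pd.DataFrame({
    'IDNumber': [2, 3],
    'Title': ['VW Golf MK2 GTI 90-91', 'BMW 1 Series 89-93']})

# Create the target column first so every row has a slot to write into.
acsv['Tag'] = 0

# Hypothetical stand-in for the real matching logic against b.csv.
tags = {'VW Golf MK2 GTI 90-91': 1001, 'BMW 1 Series 89-93': 1005}

for index, row in acsv.iterrows():
    tag = tags.get(row['Title'])
    if tag is not None:
        # One indexing operation on the frame itself, not on a selected column.
        acsv.at[index, 'Tag'] = tag

print(acsv)
```

Writing through `acsv.at[index, 'Tag']` (or `acsv.loc[index, 'Tag']`) addresses the row and the column in one call, so the value lands in the original frame.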

I have put all the indices in place so you can see exactly how it runs. I also tried to build an online version to check whether it works; I could not get it running, but it may help in answering the question: https://ideone.com/otV6AS

  • Can you explain how that is achieved? Is using iterrows() even necessary for this? Also, please provide the data in a more convenient format. Commented Apr 15, 2020 at 19:02
  • The dictionaries in your ideone example are not analogous to a dataframe. You should use a list of dictionaries, not a dictionary of lists. Commented Apr 15, 2020 at 19:16
  • Or just create a real df. Commented Apr 15, 2020 at 19:17
  • Have you checked my answer? Commented Apr 16, 2020 at 9:47

1 Answer

This is not the most elegant or efficient solution, but it should work.

import re
import pandas as pd

df1 = pd.DataFrame({
    'IDNumber': [1, 2, 3],
    'Title': ['Vauxhall Astra Model H 92-93', 'VW Golf MK2 GTI 90-91', 'BMW 1 Series 89-93']})

df2 = pd.DataFrame({
    'Manufacturer': ['VW', 'VW', 'VW', 'Vauxhall', 'BMW', 'BMW'],
    'Model': ['Golf', 'Golf', 'Golf', 'Astra', '2 Series', '1 Series'],
    'Type': ['MK2', 'MK2 GTI', 'MK2', 'Model H', '', ''],
    'Year': [
        '1990|1991|1993',
        '1990|1991|1993',
        '1896|1897|1898|1899',
        '1991|1992|1993|1994',
        '2000|2001|2002',
        '1889|1890|1891|1892|1893'],
    'Tag': [1000, 1001, 1002, 1003, 1004, 1005]})

# split title of df1 into string and year tag min and year tag max
regular_expression = re.compile(r'\d\d-\d\d')

df1['title_string'] = df1['Title'].apply(lambda x: x.replace(regular_expression.search(x)[0], '').strip())
df1['year_tag_min'] = df1['Title'].apply(lambda x: regular_expression.search(x)[0].split('-')[0])
df1['year_tag_max'] = df1['Title'].apply(lambda x: regular_expression.search(x)[0].split('-')[1])

# add zero column for Tags
df1['Tag'] = 0

# add min and max year to df2
df2['year_min'] = df2['Year'].str.slice(start=2, stop=4, step=1)
df2['year_max'] = df2['Year'].str.slice(start=-2, step=1)

# add title_string column to df2
df2['title_string'] = df2['Manufacturer'] + ' ' + df2['Model'] + ' ' + df2['Type']

for df1_row in df1.index:
    # get values from df1
    current_title_string = df1.at[df1_row, 'title_string']
    current_year_tag_min = df1.at[df1_row, 'year_tag_min']
    current_year_tag_max = df1.at[df1_row, 'year_tag_max']
    # loop over the rows of df2
    for df2_row in df2.index:
        # check if titles match
        match_title = df2.at[df2_row, 'title_string'].strip() == current_title_string.strip()
        # check if year interval from year_tag_min - year_tag_max lies in allowed interval
        # (two-digit years are compared as strings)
        match_year = (current_year_tag_min >= df2.at[df2_row, 'year_min']
                      and current_year_tag_max <= df2.at[df2_row, 'year_max'])
        if match_title and match_year:
            df1.at[df1_row, 'Tag'] = df2.at[df2_row, 'Tag']
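
As a variation on the loops above, the same lookup can be sketched as a small helper function. This is only a sketch under stated assumptions: `find_tag` and `YEAR_RANGE` are hypothetical names, and instead of the interval comparison it checks that both two-digit years from the title appear in the row's `Year` list, which mirrors the regex in the question.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'IDNumber': [1, 2, 3],
    'Title': ['Vauxhall Astra Model H 92-93', 'VW Golf MK2 GTI 90-91', 'BMW 1 Series 89-93']})

df2 = pd.DataFrame({
    'Manufacturer': ['VW', 'VW', 'VW', 'Vauxhall', 'BMW', 'BMW'],
    'Model': ['Golf', 'Golf', 'Golf', 'Astra', '2 Series', '1 Series'],
    'Type': ['MK2', 'MK2 GTI', 'MK2', 'Model H', '', ''],
    'Year': ['1990|1991|1993', '1990|1991|1993', '1896|1897|1898|1899',
             '1991|1992|1993|1994', '2000|2001|2002', '1889|1890|1891|1892|1893'],
    'Tag': [1000, 1001, 1002, 1003, 1004, 1005]})

# Two-digit year range such as "92-93" (hypothetical helper pattern).
YEAR_RANGE = re.compile(r'\b(\d\d)-(\d\d)\b')


def find_tag(title, lookup):
    """Return the Tag of the first row in `lookup` that matches `title`, else None."""
    found = YEAR_RANGE.search(title)
    if found is None:
        return None
    # Everything before the year range is the vehicle name.
    name = title[:found.start()].strip()
    first, last = found.groups()
    for row in lookup.itertuples(index=False):
        # Skip empty parts so a blank Type does not leave a trailing space.
        row_name = ' '.join(part for part in (row.Manufacturer, row.Model, row.Type) if part)
        # Last two digits of every allowed year, e.g. {'90', '91', '93'}.
        allowed = {year[-2:] for year in row.Year.split('|')}
        if row_name == name and first in allowed and last in allowed:
            return row.Tag
    return None


df1['Tag'] = df1['Title'].apply(lambda title: find_tag(title, df2))
print(df1)
```

If b.csv is loaded with `pd.read_csv`, empty `Type` cells are read as NaN by default, so something like `df2['Type'] = df2['Type'].fillna('')` would be needed before calling the helper.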