I'm currently working on a function that detects whether a row is a duplicate based on multiple conditions (square meters, images and price). It works fine until it finds a duplicate: it removes the row from the DataFrame, and then my for loop is disturbed. This produces IndexError: single positional indexer is out-of-bounds.
def image_duplicate(df):
    # Detecting duplicates based on the publications' images, m2 and price.
    for index1 in df.index:
        for index2 in df.index:
            if index1 == index2:
                continue
            print('index1: {} \t index2: {}'.format(index1, index2))
            img1 = Image.open(requests.get(df['img_url'].iloc[index1], stream=True).raw).resize((213, 160))
            img2 = Image.open(requests.get(df['img_url'].iloc[index2], stream=True).raw).resize((213, 160))
            img1 = np.array(img1).astype(float)
            img2 = np.array(img2).astype(float)
            ssim_result = ssim(img1, img2, multichannel=True)
            ssim_result_percentage = (1+ssim_result)/2
            if ssim_result_percentage > 0.80 and df['m2'].iloc[index1] == df['m2'].iloc[index2] \
                    and df['Price'].iloc[index1] == df['Price'].iloc[index2]:
                df.drop(df.iloc[index2], inplace=True).reindex()

image_duplicate(full_df)
What would be a good solution to this issue?
Expected output: Remove One Bedroom row [2] from the DataFrame.

Use the apply method or vector operations. Here you are changing the DataFrame while looping through it, which can't work. Please provide an example of your data and the expected output.

Consider writing a function similar(img1, img2) -> True/False. Then you can apply it more easily.
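
One way to avoid mutating the DataFrame while iterating over it is to collect the labels of duplicate rows first and drop them in a single call after the loop. The following is a minimal sketch of that idea, not a definitive implementation. The names drop_image_duplicates, is_similar, exact_cols and img_col are hypothetical, the demo rows are made up for illustration, and the sketch assumes the index labels are unique. The is_similar argument plays the role of the similar(img1, img2) -> True/False predicate suggested above; here it is a stub that compares URLs so the example runs without downloading anything.

```python
import pandas as pd


def drop_image_duplicates(df, is_similar, exact_cols=("m2", "Price"), img_col="img_url"):
    """Return a new DataFrame without rows that duplicate an earlier row.

    Hypothetical helper: `is_similar(a, b)` is any predicate returning
    True/False for two values of `img_col`. Assumes unique index labels.
    """
    to_drop = set()
    labels = list(df.index)  # snapshot of the labels; df itself is never modified
    for pos, keep in enumerate(labels):
        if keep in to_drop:
            continue  # already marked as a duplicate of an earlier row
        for other in labels[pos + 1:]:  # each pair is compared only once
            if other in to_drop:
                continue
            # Cheap exact comparisons first, so the costly image check
            # only runs for rows that already agree on m2 and Price.
            same_values = all(df.at[keep, c] == df.at[other, c] for c in exact_cols)
            if same_values and is_similar(df.at[keep, img_col], df.at[other, img_col]):
                to_drop.add(other)
    # Drop by label, once, after all comparisons are done.
    return df.drop(index=list(to_drop))


def same_url(url1, url2):
    # Stand-in for a real image comparison (assumption for this demo only).
    return url1 == url2


# Made-up demo data; the real DataFrame would hold actual listings.
demo = pd.DataFrame({
    "Title": ["Studio", "One Bedroom", "One Bedroom", "Loft"],
    "m2": [30, 45, 45, 45],
    "Price": [500, 700, 700, 700],
    "img_url": ["a.jpg", "b.jpg", "b.jpg", "c.jpg"],
})

result = drop_image_duplicates(demo, same_url)
print(result.index.tolist())  # [0, 1, 3]
```

In the real case, is_similar would download both images and compare their SSIM score against the 0.80 threshold. Accessing rows with label-based lookups (df.at here, or .loc) also matters: df.index yields labels, while .iloc expects positions, and the two stop agreeing as soon as a row has been removed.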