I'm developing a program that scrapes the web for certain data and feeds it back to the database. The problem is that I don't want duplicate entries of the same data as soon as the crawlers run for a second time. If some attributes changed, but the majority of the data is still the same, I'd like to update the DB entry rather than simply adding a new one. I know how to do this in code, but I was wondering if this could be done better.
The way the update works right now:
    // Runs several checks, in order, to see whether the event already exists.
    // Each check returns the id of the matching event, or -1 if there is no match.
    // The first match is updated; if nothing matches, a new event is inserted.
    public static void check_event(Event event)
    {
        int id = check_exact_event(event); // Same title, location and time
        if (id > 0)
        {
            update_event(event, id);
            Logger.log("EventID #" + id + " found using exact comparison");
            return;
        }

        id = check_similar_event_titles(event); // Different (but similar) title
        if (id > 0)
        {
            update_event(event, id);
            Logger.log("EventID #" + id + " found using similar title comparison");
            return;
        }

        id = check_exact_image(event); // Exact same image
        if (id > 0)
        {
            update_event(event, id);
            Logger.log("EventID #" + id + " found using image comparison");
            return;
        }

        // No match: insert a new event
        create_new_event(event);
    }
This works, but it's not very pleasing to the eye. What's the best way to go about this?
Follow Java naming conventions (update_event -> updateEvent, check_event -> checkEvent, etc.). And it's not that bad as code.
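If the three near-identical blocks are what bothers you, one option is to register the checks in an ordered list and loop over them. The sketch below is only an illustration under assumptions: `EventMatcher`, `add` and `findExisting` are names made up for this example, the `Event` class is a stand-in for yours, and the `log` list stands in for your `Logger.log`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for the Event class from the question.
class Event {
    final String title;

    Event(String title) {
        this.title = title;
    }
}

// Hypothetical helper: holds the checks and runs them in registration order.
class EventMatcher {
    // LinkedHashMap keeps insertion order, so the cheapest or strictest
    // check can be registered first. Key = label used in the log message.
    private final Map<String, ToIntFunction<Event>> checks = new LinkedHashMap<>();

    // Stands in for Logger.log in the question.
    final List<String> log = new ArrayList<>();

    EventMatcher add(String label, ToIntFunction<Event> check) {
        checks.put(label, check);
        return this;
    }

    // Returns the id from the first check that matches, or -1 if none does.
    int findExisting(Event event) {
        for (Map.Entry<String, ToIntFunction<Event>> entry : checks.entrySet()) {
            int id = entry.getValue().applyAsInt(event);
            if (id > 0) {
                log.add("EventID #" + id + " found using " + entry.getKey());
                return id;
            }
        }
        return -1;
    }
}
```

With your existing methods registered as `.add("exact comparison", e -> check_exact_event(e))` and so on, `check_event` shrinks to a single call to `findExisting`, followed by `update_event(event, id)` when the id is positive and `create_new_event(event)` otherwise. Adding a fourth check then means adding one line rather than another copy of the if block.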