
I'm currently facing a problem: I give each thread a reference to a set, and I want to be able to replace that set's contents with the result of a (mocked) database call. So far I have:

import threading
import time
from typing import Callable

from loguru import logger


class MonitorProduct:

    def __init__(self, term: str, is_alive: Callable[[str], bool]) -> None:
        self.is_alive = is_alive
        self.term = term

    def do_request(self) -> None:
        time.sleep(.1)
        while True:
            logger.info(f'Checking {self.term}')
            if not self.is_alive(self.term):
                logger.info(f'Deleting term from monitoring: "{self.term}"')
                return

            time.sleep(5)


# mocked database
def database_terms() -> set[str]:
    return {
        'hello world',
        'python 3',
        'world',
        'wth',
    }


def database_terms_2() -> set[str]:
    return {
        'what am I doing wrong',
    }


def main() -> None:
    terms: set[str] = set()

    while True:
        db_terms = database_terms()
        diff = db_terms - terms
        terms.symmetric_difference_update(db_terms)

        for url in diff:
            logger.info(f'Starting URL: {url}')
            threading.Thread(
                target=MonitorProduct(url, terms.__contains__).do_request
            ).start()

        time.sleep(2)

        # ----------------------------------------------- #

        db_terms = database_terms_2() 
        diff = db_terms - terms
        terms.symmetric_difference_update(db_terms) # <--- terms should now contain only `what am I doing wrong`

        # Start the new URLS
        for url in diff:
            logger.info(f'Starting URL 2: {url}')
            threading.Thread(
                target=MonitorProduct(url, terms.__contains__).do_request
            ).start()

        time.sleep(10)


if __name__ == '__main__':
    main()

Here is what I expect: the first DB call should start a thread for each of these terms:

{
  'hello world',
  'python 3',
  'world',
  'wth',
}

As you can see, each thread also receives terms.__contains__ as its is_alive callback.

On the second DB call, that set should replace the contents of terms with

{
  'what am I doing wrong',
}

which should make the four running threads exit because of this check:

def do_request(self) -> None:
    time.sleep(.1)
    while True:
        logger.info(f'Checking {self.term}')
        if not self.is_alive(self.term):
            logger.info(f'Deleting term from monitoring: "{self.term}"')
            return

        time.sleep(5)

However, we cannot replace terms by doing

terms = ...

because that creates a new set and binds it to the name terms, while the threads still hold a reference to the old set.

My question: how can I replace the old contents of terms with the newest set without binding a new set?
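A quick sketch of the behaviour described above (the set contents are illustrative): a bound __contains__ remembers the set object it was taken from, not the variable name. Rebinding the name therefore has no effect on it, while mutating the object in place does.

```python
# A bound method remembers the object it came from, not the variable name.
terms = {"hello world", "python 3"}
is_alive = terms.__contains__             # what each MonitorProduct receives

# Rebinding: `terms` now names a brand-new set; is_alive still checks the old one.
terms = {"what am I doing wrong"}
print(is_alive("hello world"))            # True (the old set is unchanged)
print(is_alive("what am I doing wrong"))  # False

# Mutating in place: the same object is updated, so is_alive sees the change.
shared = {"hello world", "python 3"}
is_alive = shared.__contains__
shared.clear()
shared.update({"what am I doing wrong"})
print(is_alive("hello world"))            # False
print(is_alive("what am I doing wrong"))  # True
```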

  • mutate the set? Commented Jul 28, 2022 at 21:45
  • @juanpa.arrivillaga I'm sorry, I don't understand what you mean? Commented Jul 28, 2022 at 21:48
  • update the set object. e.g. myset.clear(); myset.update(new_values) Commented Jul 28, 2022 at 21:49
  • @juanpa.arrivillaga but if I do that, there is a chance a thread will hit the if not self.is_alive(self.term): check at the same moment, which is not what I want: threads that are supposed to keep running would exit instead, and new ones would then be spawned for the same terms. Commented Jul 28, 2022 at 21:50
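The race described in the last comment can be shown without any threads. In this sketch (the terms are made up for illustration), a snapshot taken between clear() and update() is empty, so a monitor checking a term that is in both the old and the new result, such as "python 3", would see it as gone and exit.

```python
terms = {"hello world", "python 3"}
db_terms = {"python 3", "new term"}       # "python 3" should stay monitored

terms.clear()
# A monitor thread checking right now would see an empty set.
snapshot_after_clear = set(terms)
print("python 3" in snapshot_after_clear)  # False: that thread would exit
terms.update(db_terms)
print(terms == db_terms)                   # True: only the final state is right
```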

Answer

You're almost there, but

diff = db_terms - terms
terms ^= diff  # symmetric_difference_update()

isn't enough. Since diff contains nothing that is already in terms, this only adds the new values, so it's the same as

terms |= diff  # update()

or even

terms |= db_terms  # update()

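(And one of these options should be clearer to the reader than the symmetric difference, because you're not using the symmetric difference to remove anything. Note also that the question's code actually calls terms.symmetric_difference_update(db_terms), not diff; that is worse, because every term present in both sets is removed, so a term that is still in the database gets dropped.)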
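A small check of this reasoning, with illustrative terms: because diff = db_terms - terms never contains an element already in terms, the symmetric difference with diff can only add elements. The last lines show the question's actual call, where a term present in both sets ("python 3") is removed even though it is still in the database.

```python
old = {"hello world", "python 3"}
db_terms = {"python 3", "what am I doing wrong"}
diff = db_terms - old                 # {"what am I doing wrong"}

a = set(old); a ^= diff               # symmetric_difference_update(diff)
b = set(old); b |= diff               # update(diff)
c = set(old); c |= db_terms           # update(db_terms)
print(a == b == c)                    # True: all three only add the new term
print("hello world" in a)             # True: nothing was removed

# The question's code passes db_terms instead of diff:
d = set(old); d ^= db_terms
print(sorted(d))                      # ['hello world', 'what am I doing wrong']
```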

To remove the old values, you want to also do

terms &= db_terms  # intersection_update()
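Put together, the update is two in-place steps on the same set object. Augmented assignment on a set calls __iand__ / __ior__, which modify the set and return that same object, so unlike terms = ..., the name stays bound to the set the threads already hold. A minimal sketch using the question's two mocked results as literals:

```python
terms = {"hello world", "python 3", "world", "wth"}
is_alive = terms.__contains__          # as handed to each MonitorProduct

db_terms = {"what am I doing wrong"}   # second mocked database call
diff = db_terms - terms                # terms that need a new thread
terms &= db_terms                      # intersection_update(): drop stale terms
terms |= db_terms                      # update(): add the new terms

print(terms)                           # {'what am I doing wrong'}
print(is_alive("hello world"))         # False: that monitor will exit
print(diff)                            # {'what am I doing wrong'}
```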

You said you're concerned about race conditions with intermediate values of the set. If you want to modify the set from more than one thread, you should guard it with a mutex lock (threading.RLock). But if you only modify it from one thread and call __contains__ from the others, you can avoid a lock in CPython as long as every intermediate state of the set is acceptable to the readers: after terms &= db_terms the set contains only terms that are still in the database, and after terms |= db_terms it contains all of them.
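For the multi-writer case, one possible shape is sketched below. The LockedTerms class is hypothetical, not part of any library: every read and write takes the same threading.RLock, and replace() performs the &= / |= pair as one critical section, so no reader can observe the state between the two steps.

```python
import threading
from typing import Iterable


class LockedTerms:
    """Hypothetical thread-safe wrapper around a set of terms."""

    def __init__(self) -> None:
        self._terms: set[str] = set()
        self._lock = threading.RLock()

    def __contains__(self, term: object) -> bool:
        with self._lock:
            return term in self._terms

    def replace(self, new_terms: Iterable[str]) -> set[str]:
        """Make the set equal to new_terms; return the terms that were added."""
        new = set(new_terms)
        with self._lock:
            added = new - self._terms
            self._terms &= new   # drop terms that are no longer wanted
            self._terms |= new   # add the new ones
            return added


terms = LockedTerms()
print(sorted(terms.replace({"hello world", "wth"})))  # ['hello world', 'wth']
print(sorted(terms.replace({"wth", "python 3"})))     # ['python 3']
print("hello world" in terms)                         # False
```

A MonitorProduct would still receive terms.__contains__ exactly as before; only the code that owns the set calls replace().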


Comments

Oh, that is cool! I appreciate it. What I did was call terms.clear() and then terms |= db_terms to replace the whole set. Is that a bad thing to do?
@ProtractorNewbie After both lines the result is the same, but as you mentioned in a comment, terms is empty between the two lines, so there's a chance of a race condition if you don't use locking. If you replace .clear() with terms &= db_terms, it removes only the terms that are not in db_terms.
Oh, in that case using &= db_terms might be a better solution for me :) I'll give that a try!
So, if I understand correctly, the right way would be:

db_terms = database_terms()
diff = db_terms - terms
terms &= db_terms
terms |= db_terms

Is that correct?
Yes, that should work.
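A runnable end-to-end sketch of the recipe confirmed above, using plain threading instead of loguru and a very short polling interval so it finishes quickly. The monitor and sync functions are hypothetical, simplified stand-ins for MonitorProduct.do_request and the body of main().

```python
import threading
import time
from typing import Callable


def monitor(term: str, is_alive: Callable[[str], bool],
            stopped: list[str], poll: float = 0.01) -> None:
    # Simplified stand-in for MonitorProduct.do_request.
    while is_alive(term):
        time.sleep(poll)
    stopped.append(term)  # record that this monitor exited


def sync(terms: set[str], db_terms: set[str],
         threads: dict[str, threading.Thread], stopped: list[str]) -> set[str]:
    diff = db_terms - terms   # terms that need a new monitor thread
    terms &= db_terms         # in place: drop terms no longer in the database
    terms |= db_terms         # in place: add the new terms
    for term in diff:
        t = threading.Thread(target=monitor,
                             args=(term, terms.__contains__, stopped),
                             daemon=True)
        threads[term] = t
        t.start()
    return diff


terms: set[str] = set()
threads: dict[str, threading.Thread] = {}
stopped: list[str] = []

sync(terms, {"hello world", "python 3", "world", "wth"}, threads, stopped)
sync(terms, {"what am I doing wrong"}, threads, stopped)
for term in ("hello world", "python 3", "world", "wth"):
    threads[term].join()  # returns once the monitor sees its term is gone
print(sorted(stopped))    # ['hello world', 'python 3', 'world', 'wth']
print(terms)              # {'what am I doing wrong'}

# Stop the last monitor the same way: empty the shared set in place.
sync(terms, set(), threads, stopped)
threads["what am I doing wrong"].join()
```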
