I want to write unit tests for a class with several methods for data transformation.

High level:

class my_class:

    def __init__(self, file):
        # read data out of .yml config file
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = config["value2"]

    def get_and_transform(self):
        data_dict = self.get_data()
        transformed_data = self.transform_data(data_dict)

        return transformed_data

    def get_data(self):
        data_dict = request_based_on_value1(self.value1)
        return data_dict

    def transform_data(self, data_dict):
        trnsf = transform1(data_dict, self.value2)

        return trnsf

Here, I have several questions. The main thing to test is my_class.transform_data(). It takes a dict as input, reads it into a pandas DataFrame, and applies some transformations.

In my understanding, I need several fixtures d1, d2, d3, ... (different values for data_dict) that represent the test case inputs for my_class.transform_data(). As I want to make sure that the output is as expected, I would also define my expected outputs:

o1 # expected output for transform_data(d1)
o2, o3, ... # respectively

Several questions to that:

  1. Is this approach correct?
  2. How and where would I specify d1, d2, ... and o1, o2, ...? I could either do that in the test_my_class.py file or store d1_sample.pkl, ... in the tests/ folder. In both cases I would choose a minimal example for each d and o.
  3. As the transformation in transform_data also depends on the attribute self.value2, how would I pass in different values for value2 without creating an instance of my_class?

In general, it is also not fully clear to me whether I should test on an "object" level or on a "method" level. Above, I described a "method" approach (because I am mainly interested in the results of transform_data). The alternative would be to provide different .yml files and thus create different test instances of my_class.

def yml1():
    config = read_in_yml1()
    return config

# and so on for different configurations.

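Then, for the test:

@pytest.mark.parametrize("test_input, expected", [(yml1, ???), (yml2, ???)])
def test_my_class(test_input, expected):
    test_class = my_class(test_input)

    # open question: which data_dict should be passed to transform_data()?
    assert test_class.transform_data(???) == expected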

However, as the input to my_class.transform_data() does not depend (directly) on the content of yml1, but rather on the response of my_class.get_data(), that seems to make little sense. How would I test for different input values of data_dict?

What is the correct way to write unit tests in this scenario?

1 Answer

I'm no expert at all but I find your problem interesting, so I'm gonna try to post an answer that is careful and constructive at the same time:

  1. Your approach of verifying that transform_data behaves as expected with regard to pairs of inputs and outputs seems valid to me. Unit tests verify that the smallest components ("units") in your source code behave as expected, and I would say that the behavior of your three methods is different enough to make these methods units.
  2. If you are not sure whether to declare pairs of input dictionaries and expected outputs inside your test file or in external .pkl files, I guess that these input/output pairs are either large (in size) or many (in number):
    • In the first case, you could declare a default data_dict input dictionary and a default expected output as two fixtures that you can later monkeypatch inside a test function. You can parametrize the test function with different values that correspond to elements in the data_dict dictionary, and then monkeypatch the elements of the default data_dict fixture with these parametrized values.
    • In the second case, I think it would be advisable to reduce the number of test cases to a few relevant ones, and try to keep their input/output specification inside the test file.
  3. You could pass in different values for value2 by (again) monkeypatching the value2 attribute of an instance of my_class. You would however need to create such an instance at least once (either inside or outside your test function).
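To make points 2 and 3 a bit more concrete, here is a minimal sketch of how the default fixtures and the monkeypatching could fit together. It is only an illustration under assumptions: my_class is reduced to a stand-in whose constructor takes value2 directly (no .yml file), and transform_data uses a made-up transformation (scaling each value by value2) instead of the real pandas logic.

```python
import pytest


class my_class:
    # Simplified stand-in for the class in the question: value2 is passed in
    # directly instead of being read from a .yml file (assumption).
    def __init__(self, value2):
        self.value2 = value2

    def transform_data(self, data_dict):
        # Hypothetical transformation: scale every value by value2.
        return {key: value * self.value2 for key, value in data_dict.items()}


@pytest.fixture
def instance():
    # One default instance; tests override its attributes as needed.
    return my_class(value2=1)


@pytest.fixture
def data_dict():
    # Default input; individual test cases override single entries.
    return {"a": 1, "b": 2}


@pytest.mark.parametrize(
    "key, value, value2, expected",
    [
        ("a", 1, 1, {"a": 1, "b": 2}),  # defaults unchanged
        ("a", 5, 1, {"a": 5, "b": 2}),  # one input element changed
        ("b", 0, 3, {"a": 3, "b": 0}),  # input element and value2 changed
    ],
)
def test_transform_data(instance, data_dict, monkeypatch, key, value, value2, expected):
    # Both patches are undone automatically when the test finishes.
    monkeypatch.setitem(data_dict, key, value)
    monkeypatch.setattr(instance, "value2", value2)
    assert instance.transform_data(data_dict) == expected
```

Each tuple in the parametrize list becomes its own test case, so a failing input/output pair shows up individually in the pytest report.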

Like I was writing above, testing on a "method-level" rather than an "object-level" seems valid to me also for object-oriented software. Providing different .yml files and creating different instances of your class (so testing on an "object-level" like you said) looks to me like a first step into doing integration tests. I might be wrong on this last point, but hopefully someone can improve my answer on this or other points :)
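For comparison, an "object-level" test along those lines might look roughly like the sketch below. Again, this rests on assumptions: read_data_from_yml_config is replaced by a hypothetical parser that only understands flat "key: value" lines, value2 is assumed to be numeric, the transformation is made up, and get_data is patched with a canned response so that no real request is sent.

```python
import pytest


def read_data_from_yml_config(file):
    # Hypothetical stand-in for a YAML loader: flat "key: value" lines only.
    config = {}
    with open(file) as handle:
        for line in handle:
            if ":" in line:
                key, value = line.split(":", 1)
                config[key.strip()] = value.strip()
    return config


class my_class:
    def __init__(self, file):
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = int(config["value2"])  # assumed to be numeric

    def get_and_transform(self):
        return self.transform_data(self.get_data())

    def get_data(self):
        # The real method performs a request; tests replace it.
        raise RuntimeError("no network access in tests")

    def transform_data(self, data_dict):
        # Hypothetical transformation: scale every value by value2.
        return {key: value * self.value2 for key, value in data_dict.items()}


@pytest.mark.parametrize(
    "config_text, expected",
    [
        ("value1: endpoint_a\nvalue2: 2\n", {"a": 2, "b": 4}),
        ("value1: endpoint_b\nvalue2: 10\n", {"a": 10, "b": 20}),
    ],
)
def test_get_and_transform(tmp_path, monkeypatch, config_text, expected):
    # Write a throwaway config file into pytest's per-test temp directory.
    config_file = tmp_path / "config.yml"
    config_file.write_text(config_text)

    # Replace the request with a canned response for every instance.
    monkeypatch.setattr(my_class, "get_data", lambda self: {"a": 1, "b": 2})

    assert my_class(config_file).get_and_transform() == expected
```

Because this test goes through __init__, the config file, and get_and_transform at once, a failure does not tell you which of those parts broke, which is why I would keep the method-level tests as well.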
