I want to write unit tests for a class with several methods for data transformation.
High level:
    class my_class:
        def __init__(self, file):
            # read data out of .yml config file
            config = read_data_from_yml_config(file)
            self.value1 = config["value1"]
            self.value2 = config["value2"]

        def get_and_transform(self):
            data_dict = self.get_data()
            transformed_data = self.transform_data(data_dict)
            return transformed_data

        def get_data(self):
            data_dict = request_based_on_value1(self.value1)
            return data_dict

        def transform_data(self, data_dict):
            trnsf = transform1(data_dict, self.value2)
            return trnsf
Here, I have several questions. The main thing to test here is my_class.transform_data().
It takes a dict as input, reads it into a pandas DataFrame and does some transformations.
In my understanding, I need several fixtures d1, d2, d3, ... (as different values for data_dict) which represent the different test case inputs for my_class.transform_data(). As I want to make sure that the output is as expected, I would also define my expected output:
o1 # expected output for transform_data(d1)
o2, o3, ... # respectively
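A minimal sketch of what I mean by input/expected pairs. Here transform1 is a hypothetical stand-in (it simply scales every value by value2), because the real transformation is not shown; the names CASES and check_case are also made up for illustration. With real DataFrames, pandas.testing.assert_frame_equal would presumably replace the plain == comparison.

```python
# Hypothetical stand-in for the real transform1: scales every value by value2.
def transform1(data_dict, value2):
    return {key: val * value2 for key, val in data_dict.items()}


# Each case is (input dict d_i, value2, expected output o_i).
CASES = [
    ({"a": 1, "b": 2}, 2, {"a": 2, "b": 4}),  # d1 -> o1
    ({}, 3, {}),                              # d2 -> o2 (empty input)
    ({"a": -1}, 0, {"a": 0}),                 # d3 -> o3
]


def check_case(data_dict, value2, expected):
    # Compare the actual output against the expected output for one case.
    assert transform1(data_dict, value2) == expected


# Run every case; an AssertionError points at the first failing pair.
for case in CASES:
    check_case(*case)
```

The same CASES list could be handed to @pytest.mark.parametrize("data_dict, value2, expected", CASES) so that each pair shows up as its own test.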
Several questions about that:
- Is this approach correct?
- How and where would I specify d1, d2, ... and o1, o2, ...? I could either do that in the test_my_class.py file or store d1_sample.pkl, ... in the tests/ folder. Here I would choose a minimal example for both d and o.
- As the transformation in transform_data also depends on the attribute self.value2, how would I pass in different values for value2 without creating an instance of my_class?
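One option I have been considering for the value2 question, as a rough sketch only: allocate the object without running __init__, so that no .yml file is needed, and set the attributes by hand. The helper make_instance is a made-up name, and read_data_from_yml_config and transform1 are hypothetical stand-ins so that the snippet runs on its own. I am not sure whether bypassing __init__ like this is considered good practice.

```python
# Hypothetical stand-ins so the sketch is self-contained.
def read_data_from_yml_config(file):
    raise RuntimeError("the real config file should not be read in this test")


def transform1(data_dict, value2):
    # Stand-in transformation: adds value2 to every value.
    return {key: val + value2 for key, val in data_dict.items()}


class my_class:
    def __init__(self, file):
        config = read_data_from_yml_config(file)
        self.value1 = config["value1"]
        self.value2 = config["value2"]

    def transform_data(self, data_dict):
        return transform1(data_dict, self.value2)


def make_instance(value2, value1=None):
    # Create the object without calling __init__, then set attributes directly.
    obj = my_class.__new__(my_class)
    obj.value1 = value1
    obj.value2 = value2
    return obj


# Different value2 settings, no .yml file involved.
for value2, expected in [(0, {"a": 1}), (10, {"a": 11})]:
    assert make_instance(value2).transform_data({"a": 1}) == expected
```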
In general, it is also not fully clear to me whether I should test on an "object" level or on a "method" level. Above, I described a "method" approach (because I am mainly interested in the results of transform_data). The alternative would be to provide different .yml files and thus create different test instances of my_class.
    def yml1():
        config = read_in_yml1()
        return config

    # and so on for different configurations.
Then, for the test:
    @pytest.mark.parametrize("test_input, expected", [(yml1, ???), (yml2, ???)])
    def test_my_class(test_input, expected):
        test_class = my_class(test_input)
        assert test_class.transform_data == expected
However, as the input to my_class.transform_data() does not depend (directly) on the content of yml1, but rather on the response of my_class.get_data(), that seems to make little sense. How would I test for different input values of data_dict?
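One direction I could imagine, as a rough sketch: replace get_data with a fake via unittest.mock.patch.object, so that get_and_transform receives whatever data_dict the test supplies and no real request is made. The class below is a simplified, hypothetical version of mine (its constructor takes the values directly instead of a .yml file), transform1 and request_based_on_value1 are stand-ins, and run_with_fake_data is a made-up helper name.

```python
from unittest.mock import patch


# Hypothetical stand-ins so the sketch is self-contained.
def request_based_on_value1(value1):
    raise RuntimeError("no real request should be made in a unit test")


def transform1(data_dict, value2):
    # Stand-in transformation: scales every value by value2.
    return {key: val * value2 for key, val in data_dict.items()}


class my_class:
    # Simplified constructor (assumption): takes values directly, no .yml file.
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def get_and_transform(self):
        return self.transform_data(self.get_data())

    def get_data(self):
        return request_based_on_value1(self.value1)

    def transform_data(self, data_dict):
        return transform1(data_dict, self.value2)


def run_with_fake_data(data_dict, value2):
    obj = my_class(value1="unused", value2=value2)
    # Swap get_data for a mock that returns the supplied dict.
    with patch.object(obj, "get_data", return_value=data_dict) as fake_get:
        result = obj.get_and_transform()
    # The fake should have been called exactly once, with no arguments.
    fake_get.assert_called_once_with()
    return result


assert run_with_fake_data({"a": 2}, 3) == {"a": 6}
assert run_with_fake_data({}, 3) == {}
```

This would exercise the object-level path (get_and_transform) while still controlling data_dict per test case, but I do not know whether that is preferable to calling transform_data directly.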
What is the correct way to write unit tests in this scenario?