I don't have any formal training in programming, but I routinely come across this question when I am writing classes and running the individual methods of a class in sequence. Which is better: saving results as attributes on the instance, or returning them and passing them as inputs to subsequent method calls? For example, here is a class where the variables are returned and used as inputs:
import pandas as pd

class ProcessData:
    def __init__(self):
        pass

    def get_data(self, path):
        # Read the CSV and hand the DataFrame back to the caller.
        data = pd.read_csv(f"{path}/data.csv")
        return data

    def clean_data(self, data):
        # Note: these calls modify the passed-in DataFrame in place.
        data.set_index("timestamp", inplace=True)
        data.drop_duplicates(inplace=True)
        return data

def main():
    processor = ProcessData()
    temp = processor.get_data("path/to/data")
    processed_data = processor.clean_data(temp)
And here is an example where the results are saved to, and later read from, an attribute on the instance (self.data):
import pandas as pd

class ProcessData:
    def __init__(self):
        self.data = None

    def get_data(self, path):
        # Read the CSV and store the DataFrame on the instance.
        data = pd.read_csv(f"{path}/data.csv")
        self.data = data

    def clean_data(self):
        # Relies on get_data() having been called first.
        self.data.set_index("timestamp", inplace=True)
        self.data.drop_duplicates(inplace=True)

def main():
    processor = ProcessData()
    processor.get_data("path/to/data")
    processor.clean_data()
I have a suspicion that the latter method is better, but I could also see instances where the former might have its advantages. I am sure the answer to my question is "it depends", but I am curious in general, what are the best practices?
In your second example, get_data must be called before clean_data, and nothing in the method signatures tells the caller that. In your first example, clean_data takes data as an argument, so the dependencies are obvious just from the function signatures.

The second example also misses a trick: you should be passing the path to the __init__() method, which reads the file and sets the self.data member. Then you can have multiple other methods that process the data as required.
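As a rough sketch of that last suggestion (one possible shape, not the only one), the constructor can take the path and load the file, so self.data is populated as soon as the object exists and clean_data no longer depends on an earlier call. The class and method names are reused from the question; the choice to drop duplicates before setting the index, and to return the cleaned frame, are my own assumptions rather than anything the question requires.

```python
import pandas as pd


class ProcessData:
    def __init__(self, path):
        # Load once at construction, so self.data is never None afterwards.
        self.data = pd.read_csv(f"{path}/data.csv")

    def clean_data(self):
        # Assumption: drop duplicate rows first, then index by timestamp.
        # drop_duplicates() compares column values only and ignores the index,
        # so doing it before set_index keeps the timestamp in the comparison.
        self.data = self.data.drop_duplicates().set_index("timestamp")
        # Returning the frame is optional; it just makes chaining convenient.
        return self.data


def main():
    # "path/to/data" is the placeholder path from the question.
    processor = ProcessData("path/to/data")
    processor.clean_data()
```

With this layout there is no way to call clean_data on an object that has not loaded its data yet, which removes the hidden ordering requirement of the second example.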