2

I don't have any formal training in programming, but I routinely come across this question when I am making classes and running individual methods of that class in sequence. What is better: save results as class variables or return them and use them as inputs to subsequent method calls. For example, here is a class where the the variables are returned and used as inputs:

class ProcessData:

 def __init__(self):
  pass

 def get_data(self,path):
  data = pd.read_csv(f"{path}/data.csv"}
  return data

 def clean_data(self, data)
  data.set_index("timestamp", inplace=True)
  data.drop_duplicates(inplace=True)
  return data

def main():
 processor = ProcessData()
 temp = processor.get_data("path/to/data")
 processed_data = processor.clean_data(temp)

And here is an example where the results are saved/used to update the class variable:

class ProcessData:

 def __init__(self):
  self.data = None

 def get_data(self,path):
  data = pd.read_csv(f"{path}/data.csv"}
  self.data = data

 def clean_data(self)
  self.data.set_index("timestamp", inplace=True)
  self.data.drop_duplicates(inplace=True)
  
def main():
 processor = ProcessData()
 processor.get_data("path/to/data")
 processor.clean_data()

I have a suspicion that the latter method is better, but I could also see instances where the former might have its advantages. I am sure the answer to my question is "it depends", but I am curious in general, what are the best practices?

4
  • I'd go with the latter unless you have really good reasons for needing the former. The latter makes the code much easier to understand and maintain because you can easily track the "flow" of data and see when and where those values are expected to change. There are some cases where you may need to save the result of a function for performance reasons - in those cases you can always return the result from the function and save it to a member variable at the call site. Commented Aug 29, 2022 at 15:20
  • 1
    You are correct, it depends. I use both, but I lean towards passing arguments. It makes the methods more pure, like functions, and easier to reason about. Commented Aug 29, 2022 at 15:21
  • 5
    Each piece of state is one more thing to keep track of and one more thing that can produce a bug. In general, the less state each part of the program has to take into account, the better. Passing state between functions by saving them as instance variables also tends to introduce non-obvious dependencies. In your second example, the caller of the class needs to know that get_data must be called before clean_data. In your first example, clean_data takes data as an argument, so the dependencies are obvious just from the function signatures. Commented Aug 29, 2022 at 15:21
  • 1
    Your first example doesn't need to be a class and could be implemented with plain functions. But there you have to keep track of the data. The second example misses a trick where you should be passing the path to the __init__() method to read and set the self.data member. Then you can have multiple other methods that process the data as required. Commented Aug 29, 2022 at 15:24

1 Answer 1

0

Sketch the class based on usage, then create it

Instead of inventing classes to make your high level coding easier, tap your heels together and write the high-level code as if the classes already existed. Then create the classes with the methods and behavior that exactly fits what you need.

PEP AS AN EXAMPLE

If you look at several peps, you'll notice that the rationale or motivation is given before the details. The rationale and motivation shows how the new Python feature is going to solve a problem and how it is going to be used sometimes with code examples.

Example from PEP 289 – Generator Expressions:

Generator expressions are especially useful with functions like sum(), min(), and max() that reduce an iterable input to a single value:

max(len(line)  for line in file  if line.strip())

Generator expressions also address some examples of functionals coded with lambda:

reduce(lambda s, a: s + a.myattr, data, 0)
reduce(lambda s, a: s + a[3], data, 0)

These simplify to:

sum(a.myattr for a in data)
sum(a[3] for a in data)

My methodology given above is the same as describing the motivation and rationale for a class in terms of use. Because you are writing the code that is actually going to use it first.

Sign up to request clarification or add additional context in comments.

4 Comments

Wow that's the fastest vote responses up or down I have ever received on stackoverflow. Either people feel my answer is completely wrong or too vague to be helpful. Constructive feedback would be helpful. Thanks.
I think I see what you mean. So your answer is essentially an "it depends" because I should be creating classes, functions, etc. after "sketching" out the structure based on what the particulate project calls for -- is that correct?
@FruityFritz that's exactly what I mean.
@FruityFritz I updated the heading to use your terminology. It more closely represents what I'm describing.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.