
I have a specific case of this problem, but it would be nice to know whether this is possible for any function.

So I want to find the position of a substring in a string. In Python there is a find method that does exactly what is needed:

string.find(s, sub[, start[, end]])

Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.
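The signature quoted above is the Python 2 `string` module function. In Python 3 the same behaviour is available as the `str.find` method. A small illustration of the start/end and negative-index rules, using an arbitrary example string:

```python
s = "abracadabra"  # "abra" occurs at indices 0 and 7

print(s.find("abra"))         # 0: lowest index in the whole string
print(s.find("abra", 1))      # 7: search starts at index 1
print(s.find("abra", 1, 10))  # -1: the match at 7 would need s[7:11], not wholly inside s[1:10]
print(s.find("abra", -4))     # 7: -4 is read like a slice index, so the search covers s[7:]
print(s.find("xyz"))          # -1: not found
```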

Amazing, but the problem is that finding a big substring in a big string can take anywhere from O(n*m) to O(n) time (which is a huge difference), depending on the algorithm. The documentation says nothing about the time complexity or the underlying algorithm.
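To see where the O(n*m) bound comes from, here is a textbook brute-force search (an illustration only, not how CPython implements `find`). The function name `naive_find` is hypothetical; it also counts character comparisons so the worst case is visible. With a haystack of n `a`s and a needle of (m - 1) `a`s followed by `b`, every one of the n - m + 1 alignments costs m comparisons.

```python
def naive_find(s: str, sub: str) -> tuple[int, int]:
    """Brute-force search: try every alignment, compare left to right.

    Returns (lowest index or -1, number of character comparisons).
    Worst case is m * (n - m + 1) comparisons, i.e. O(n*m).
    """
    n, m = len(s), len(sub)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if s[i + j] != sub[j]:
                break  # mismatch: give up on this alignment
            j += 1
        if j == m:
            return i, comparisons  # whole needle matched at alignment i
    return -1, comparisons


index, comparisons = naive_find("a" * 1000, "a" * 9 + "b")
print(index, comparisons)  # -1 9910  (10 comparisons at each of 991 alignments)
```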

I see a few ways to resolve this:

  • benchmark
  • go to source code and try to understand it
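For the benchmark route, a minimal sketch using the standard `timeit` module might look like the following. The input shape (a run of `a`s searched for `a...ab`) is an assumption: it is a classic bad case for naive search, but not necessarily the worst case for CPython's algorithm. `time_find` is a hypothetical helper name. If the time roughly doubles each time n doubles, that suggests linear growth in n for a fixed needle, but timings are machine-dependent and a benchmark cannot prove a worst-case bound.

```python
import timeit


def time_find(n: int, m: int, repeat: int = 5) -> float:
    """Best-of-`repeat` time in seconds for one str.find call.

    Haystack: n 'a' characters. Needle: (m - 1) 'a' characters then 'b',
    so there is never a match and every position must be considered.
    """
    haystack = "a" * n
    needle = "a" * (m - 1) + "b"
    timer = timeit.Timer(lambda: haystack.find(needle))
    return min(timer.repeat(repeat=repeat, number=1))


if __name__ == "__main__":
    m = 1000
    previous = None
    for n in (100_000, 200_000, 400_000, 800_000):
        t = time_find(n, m)
        growth = "" if previous is None else f"  (x{t / previous:.2f} vs previous n)"
        print(f"n={n:>7}  t={t:.6f}s{growth}")
        previous = t
```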

Neither sounds easy (I hope there is an easier way). So how can I find the complexity of a built-in function?

  • You may want to take a look at the big_o module. Commented Oct 25, 2014 at 11:33
  • @AlexThornton Thank you, Alex, but this basically falls into my benchmark category. Not only does it take a long time to estimate, it is sometimes really hard to get right (for some probabilistic algorithms, or when it comes to finding the edge cases). It just seems strange that this information is not in the documentation. Commented Oct 25, 2014 at 11:35
  • It is strange. There is a 'TimeComplexity' page on the wiki that lists it for a few operations, but not for methods like string.find(). Commented Oct 25, 2014 at 11:37

1 Answer


You say, "go to source code and try to understand it," but it might be easier than you think. Once you get to the actual implementation code, in Objects/stringlib/fastsearch.h, you find:

/* fast search/count implementation, based on a mix between boyer-
   moore and horspool, with a few more bells and whistles on the top.
   for some more background, see: http://effbot.org/zone/stringlib.htm */

The URL referenced there has a good discussion of the algorithm and its complexity.
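For reference, here is a sketch of textbook Boyer-Moore-Horspool, one of the two ideas the comment names. It is not a transcription of `fastsearch.h`, which adds its own refinements (and the exact code there has changed across CPython versions). The name `horspool_find` is hypothetical. Horspool compares each window right to left and, on failure, shifts by an amount looked up from the text character under the last needle position. Its worst case is still O(n*m), but on typical inputs the large shifts mean it often examines far fewer than n characters.

```python
def horspool_find(s: str, sub: str) -> int:
    """Textbook Boyer-Moore-Horspool: lowest index of sub in s, or -1."""
    n, m = len(s), len(sub)
    if m == 0:
        return 0  # same convention as str.find
    # Bad-character table: for each char in sub[:-1], the distance from its
    # last occurrence to the final position of sub. Other chars shift by m.
    shift = {}
    for k in range(m - 1):
        shift[sub[k]] = m - 1 - k
    i = 0
    while i <= n - m:
        # Compare the current window right to left.
        j = m - 1
        while j >= 0 and s[i + j] == sub[j]:
            j -= 1
        if j < 0:
            return i  # full match at alignment i
        # Shift according to the text char aligned with the last needle char.
        i += shift.get(s[i + m - 1], m)
    return -1


print(horspool_find("abracadabra", "cad"))  # 4
```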


2 Comments

Sounds good enough. I am just wondering: what is the point of not including this information in the documentation?
This would be good information to include in the documentation, you are right. You could try filing a bug about it at bugs.python.org. One issue is that the stdlib may be implemented differently by different Python implementations, and they may not want to make a promise about complexity for a function like this.
