11

I need to calculate distances between every pair of points in an array and only want to do that once per pair. Is what I've come up with efficient enough or is there a better way? Here's an example, along with a visual to explain what I'm trying to obtain:

diagram of code purpose

e.g., first get segments A-B, A-C, A-D; then B-C, B-D; and finally, C-D. In other words, we want A-B in our new array, but not B-A since it would be a duplication.

var pointsArray = new Point[4];

pointsArray[0] = new Point(0, 0);
pointsArray[1] = new Point(10, 0);
pointsArray[2] = new Point(10, 10);
pointsArray[3] = new Point(0, 10);

// using (n * (n-1)) / 2 to determine array size
int distArraySize = (pointsArray.Length*(pointsArray.Length - 1))/2;

var distanceArray = new double[distArraySize];

int distanceArrayIndex = 0;

// Loop through points and get distances, never using same point pair twice
for (int currentPointIndex = 0; currentPointIndex < pointsArray.Length - 1; currentPointIndex++)
{
    for (int otherPointIndex = currentPointIndex + 1;
            otherPointIndex < pointsArray.Length;
            otherPointIndex++)
    {
        double xDistance = pointsArray[otherPointIndex].X - pointsArray[currentPointIndex].X;
        double yDistance = pointsArray[otherPointIndex].Y - pointsArray[currentPointIndex].Y;

        double distance = Math.Sqrt(Math.Pow(xDistance, 2) + Math.Pow(yDistance, 2));

        // Add distance to distanceArray
        distanceArray[distanceArrayIndex] = distance;

        distanceArrayIndex++;
    }
} 

Since this will be used with many thousands of points, I'm thinking a precisely dimensioned array would be more efficient than using any sort of IEnumerable.

4
  • 4
    This seems good. It's both efficient and will work. Did you mean to post this in Code Review instead? codereview.stackexchange.com Commented May 14, 2012 at 12:23
  • @yamen I wasn't aware of that option. Is there a way I can move this question over there? Thanks! Commented May 14, 2012 at 12:29
  • My feeling is that this is the best way; assuming all points are unique, then logically the best way to generate all the combinations from the set of points is to iterate once over the whole set, and then within each iteration iterate over the rest from that point on. Ergo you will never generate the combinations A,B and B,A. That said, this assumes that you absolutely need to store the distances and really can't simply rely on calculating them ad-hoc. But then that's outside the scope of your question Commented May 14, 2012 at 12:32
  • @AndrasZoltan Yes, the points in real world use will be unique. It's still a bit up in the air whether we will store the distances for further calculation or just keep ones that fall within a certain range. In the latter case, I'd probably just add the distances to a List<double>. Commented May 14, 2012 at 12:35

4 Answers 4

3

If you have n points, the set of all pairs of points contains n * (n-1) / 2 elements. That's the number of operations you are doing. The only change I would do is using Parallel.ForEach() to do the operations in parallel.

Something like this (needs debugging)

        int distArraySize = (pointsArray.Length * (pointsArray.Length - 1)) / 2;

        var distanceArray = new double[distArraySize];

        int numPoints = pointsArray.Length;

        Parallel.ForEach<int>(Enumerable.Range(0, numPoints - 2),
            currentPointIndex =>
            {
                Parallel.ForEach<int>(Enumerable.Range(currentPointIndex + 1, numPoints - 2),
                    otherPointIndex =>
                    {
                        double xDistance = pointsArray[otherPointIndex].X - pointsArray[currentPointIndex].X;
                        double yDistance = pointsArray[otherPointIndex].Y - pointsArray[currentPointIndex].Y;
                        double distance = Math.Sqrt(xDistance * xDistance + yDistance * yDistance);
                        int distanceArrayIndex = currentPointIndex * numPoints - (currentPointIndex * (currentPointIndex + 1) / 2) + otherPointIndex - 1;
                        distanceArray[distanceArrayIndex] = distance;
                    });
            });
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the suggestion. I'll try to implement that on my own, but any example related to my example would of course be appreciated.
Note: changing this code from a simple for loop into a Parallel.ForEach can be justified when there is a big number of points and performance is crucial. Otherwise it is just unnecessary complexity. Also, this code needs lock(distanceArray) before assigning an element to the array to avoid threading issues.
0

Looks good to me, but don't you have a bug?

Each of the inner iterations will overwrite the previous one almost completely, except for its first position. Won't it?

That is, in distanceArray[otherPointIndex] otherPointIndex gets values from currentPointIndex + 1 to pointsArray.Length - 1.
In your example, this will range on [0-3] instead of [0-6].

1 Comment

Yes, I did have a bug, but I don't think it was that. Remember, I'm ranging on 0-3 (the points) to get the six segments. Your question did make me realize that I was messing up the distance array. It needed its own incrementer. Thanks.
0

I've had to perform operations like this in the past, and I think your immediate reaction to high number crunch operations is "there must be a faster or more efficient way to do this". The only other even remotely workable solution I can think of would be to hash the pair and place this hash in a HashSet, then check the HashSet before doing the distance calculation. However, this will likely ultimately work out worse for performance.

You're solution is good. As j0aqu1n points out, you're probably going to have to crunch the numbers one way or another, and in this case you aren't ever performing the same calculation twice.

It will be interesting to see if there are any other solutions to this.

Comments

0

I think, it's a bit faster to use xDistance*xDistance instead of Math.Pow(xDistance, 2). Apart from this, if you really always need to calculate all distances, there is not much room for improvement. If, OTOH, you sometimes don't need to calculate all, you could calculate the distances lazily when needed.

1 Comment

I'll give that a try with big datasets where the effect will be apparent. I'll bet you're correct. Funny how I literally translated the x^2 to Math.Pow instead of just doing as you suggested.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.