Given two arrays, find the permutations that give closest distance between two arrays












22














Let's say I have two arrays of the same length n, named A and B.



These two arrays contain real values.
We define the distance between two arrays as the mean squared distance.



dist(A,B) = sqrt( sum((A - B)2) )



I want to find the permutation of A that gives the min distance to B.
The naive method is to try every permutation of A and record the min distance. However, this method is of complexity O(n!).



Is there an algorithm of complexity less than O(n!)?










share|improve this question
























  • For what purpose is this. Do you need the best solution or just a good enough solution?
    – Yair Halberstadt
    yesterday










  • good question though
    – Aditya Gupta
    yesterday










  • Can you eleborate a bit with an example problem and the answer you expect?
    – Aldert
    yesterday










  • I am not certain if I fully get the issue you are describing, sounds like you want to try all the possible permutations of a vector of coordinates to use Pythagoras and find which combination provides the shortest distance. So you are basically building a tree of all possible combinations, for this you would need to consider what is it you are after, degree of precision or if it needs to be exact, how many results you want (can be many), speed [of the algorithm]? Having this in mind what you could do to speed up such process is to trim your tree; again how exact does it need to be?
    – Ordiel
    yesterday






  • 1




    Why did you not approve Ivo Merchiers' answer? I'm just curious.
    – גלעד ברקן
    yesterday
















22














Let's say I have two arrays of the same length n, named A and B.



These two arrays contain real values.
We define the distance between two arrays as the mean squared distance.



dist(A,B) = sqrt( sum((A - B)2) )



I want to find the permutation of A that gives the min distance to B.
The naive method is to try every permutation of A and record the min distance. However, this method is of complexity O(n!).



Is there an algorithm of complexity less than O(n!)?










share|improve this question
























  • For what purpose is this. Do you need the best solution or just a good enough solution?
    – Yair Halberstadt
    yesterday










  • good question though
    – Aditya Gupta
    yesterday










  • Can you eleborate a bit with an example problem and the answer you expect?
    – Aldert
    yesterday










  • I am not certain if I fully get the issue you are describing, sounds like you want to try all the possible permutations of a vector of coordinates to use Pythagoras and find which combination provides the shortest distance. So you are basically building a tree of all possible combinations, for this you would need to consider what is it you are after, degree of precision or if it needs to be exact, how many results you want (can be many), speed [of the algorithm]? Having this in mind what you could do to speed up such process is to trim your tree; again how exact does it need to be?
    – Ordiel
    yesterday






  • 1




    Why did you not approve Ivo Merchiers' answer? I'm just curious.
    – גלעד ברקן
    yesterday














22












22








22


5





Let's say I have two arrays of the same length n, named A and B.



These two arrays contain real values.
We define the distance between two arrays as the mean squared distance.



dist(A,B) = sqrt( sum((A - B)2) )



I want to find the permutation of A that gives the min distance to B.
The naive method is to try every permutation of A and record the min distance. However, this method is of complexity O(n!).



Is there an algorithm of complexity less than O(n!)?










share|improve this question















Let's say I have two arrays of the same length n, named A and B.



These two arrays contain real values.
We define the distance between two arrays as the mean squared distance.



dist(A,B) = sqrt( sum((A - B)2) )



I want to find the permutation of A that gives the min distance to B.
The naive method is to try every permutation of A and record the min distance. However, this method is of complexity O(n!).



Is there an algorithm of complexity less than O(n!)?







algorithm






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited yesterday









Grijesh Chauhan

45.1k1495161




45.1k1495161










asked yesterday









Wang Duo

19719




19719












  • For what purpose is this. Do you need the best solution or just a good enough solution?
    – Yair Halberstadt
    yesterday










  • good question though
    – Aditya Gupta
    yesterday










  • Can you eleborate a bit with an example problem and the answer you expect?
    – Aldert
    yesterday










  • I am not certain if I fully get the issue you are describing, sounds like you want to try all the possible permutations of a vector of coordinates to use Pythagoras and find which combination provides the shortest distance. So you are basically building a tree of all possible combinations, for this you would need to consider what is it you are after, degree of precision or if it needs to be exact, how many results you want (can be many), speed [of the algorithm]? Having this in mind what you could do to speed up such process is to trim your tree; again how exact does it need to be?
    – Ordiel
    yesterday






  • 1




    Why did you not approve Ivo Merchiers' answer? I'm just curious.
    – גלעד ברקן
    yesterday


















  • For what purpose is this. Do you need the best solution or just a good enough solution?
    – Yair Halberstadt
    yesterday










  • good question though
    – Aditya Gupta
    yesterday










  • Can you eleborate a bit with an example problem and the answer you expect?
    – Aldert
    yesterday










  • I am not certain if I fully get the issue you are describing, sounds like you want to try all the possible permutations of a vector of coordinates to use Pythagoras and find which combination provides the shortest distance. So you are basically building a tree of all possible combinations, for this you would need to consider what is it you are after, degree of precision or if it needs to be exact, how many results you want (can be many), speed [of the algorithm]? Having this in mind what you could do to speed up such process is to trim your tree; again how exact does it need to be?
    – Ordiel
    yesterday






  • 1




    Why did you not approve Ivo Merchiers' answer? I'm just curious.
    – גלעד ברקן
    yesterday
















For what purpose is this. Do you need the best solution or just a good enough solution?
– Yair Halberstadt
yesterday




For what purpose is this. Do you need the best solution or just a good enough solution?
– Yair Halberstadt
yesterday












good question though
– Aditya Gupta
yesterday




good question though
– Aditya Gupta
yesterday












Can you eleborate a bit with an example problem and the answer you expect?
– Aldert
yesterday




Can you eleborate a bit with an example problem and the answer you expect?
– Aldert
yesterday












I am not certain if I fully get the issue you are describing, sounds like you want to try all the possible permutations of a vector of coordinates to use Pythagoras and find which combination provides the shortest distance. So you are basically building a tree of all possible combinations, for this you would need to consider what is it you are after, degree of precision or if it needs to be exact, how many results you want (can be many), speed [of the algorithm]? Having this in mind what you could do to speed up such process is to trim your tree; again how exact does it need to be?
– Ordiel
yesterday




I am not certain if I fully get the issue you are describing, sounds like you want to try all the possible permutations of a vector of coordinates to use Pythagoras and find which combination provides the shortest distance. So you are basically building a tree of all possible combinations, for this you would need to consider what is it you are after, degree of precision or if it needs to be exact, how many results you want (can be many), speed [of the algorithm]? Having this in mind what you could do to speed up such process is to trim your tree; again how exact does it need to be?
– Ordiel
yesterday




1




1




Why did you not approve Ivo Merchiers' answer? I'm just curious.
– גלעד ברקן
yesterday




Why did you not approve Ivo Merchiers' answer? I'm just curious.
– גלעד ברקן
yesterday












4 Answers
4






active

oldest

votes


















14














The problem you describe is equivalent to the Minimum Cost Perfect Matching Problem which can be solved efficiently (and exactly) using The Hungarian Algorithm. In the Minimum Cost Perfect Matching Problem you have an input weighted bipartite graph where the two sets have the same size n, and each edge has a non-negative cost. The goal is to find a perfect matching of minimum cost.



In your case, the bipartite graph is a biclique. That is, every vertex in one set is connected to every vertex in the other set, and the cost of edge (i,j) is (A[i] - B[i])^2 (where i corresponds to index i in array A and j corresponds to index j in array B).



EDIT:
This is not the best solution for the problem. Ivo Merchiers came up with a better solution both in terms of efficiency and simplicity. The reason I'm not deleting my answer is because my suggested solution is valuable for distance measures to which Ivo's solution does not apply (as his approach works by exploiting a property of the Euclidean distance).






share|improve this answer























  • Correct but inefficient for the problem. See Ivo's solution.
    – Dave
    yesterday










  • @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
    – snakile
    yesterday



















20














You can just sort both A and B. In that case, the Euclidean distance is minimal.



If B has to remain fixed, then you just need to invert the permutation needed to sort B and apply that to the sorted version of A.



This solution does assume that you want to find just a permutation and not the most simple permutation (since sorting and unsorting through permutations will not be incredibly efficient).





Proof:
Let S,T be our pair of arrays.
We can assume S to be sorted without loss of generality, since all that
matters is the mapping between the two sets of elements.



Let T be the permutation that minimizes the distance between the two arrays,
and let d be that distance.



Suppose that T is not sorted. Then there exist elements i,j s.t. T_i > T_j



S_i + k1 = S_j
T_i = T_j + k2
where k1,k2 > 0


Let x be the total distance of all elements except i and j.



d = x + (S_i - T_i)^2 + ((S_i + k1) - (T_i - k2))^2


If we swap the order of T_i and T_j, then our new distance is:



d' = x + (S_i - (T_i - k2))^2 + ((S_i + k1) - T_i)^2


Thus:
d - d' = 2 * k1 * k2, which contradicts our assumption that T is the permutation that minimizes the distance, so the permutation that does so must be sorted.



Sorting the two arrays can be done in O(n log n) using a variety of methods.






share|improve this answer



















  • 3




    If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
    – Ivo Merchiers
    yesterday






  • 3




    It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
    – Yair Halberstadt
    yesterday






  • 1




    If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
    – Ray
    yesterday






  • 2




    Yair's idea works. I've submitted an edit with a proof along those lines.
    – Ray
    yesterday








  • 2




    Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
    – yar
    yesterday





















6














You can just sort A and B and match up the corresponding elements.



Imagine that there are two elements of A, Ai and Aj, corresponding to Bi and Bj.



The error contribution from these matches is:



(Ai-Bi)^2 + (Aj-Bj)^2



= Ai^2 + Bi^2 + Aj^2 + Bj^2 - 2(AiBi + AjBj)



Is it better to swap the matches, or to keep them the way they are?



Well, the difference in the error if we swap them is:



2(AiBi + AjBj) - 2(AiBj + AjBi)



~ AiBi - AiBj + AjBj - AjBi



= Ai(Bi - Bj) - Aj(Bi - Bj)



= (Ai - Aj)(Bi - Bj)



So, if the As and Bs are in the same order, then this product is positive and the error will go up if you swap them. If they are not in the same order, then this product is negative an the error will go down if you swap them.



If you repeatedly swap any pairs that are out of order until there are no such pairs, your error will keep going down, and you'll end up with with the nth largest A matched up with the nth largest B all the way through the array.



Just sorting them and matching them up is therefor optimal, and of course it's way faster than the Hungarian algorithm.






share|improve this answer





























    4














    Construct a bipartite graph from the vectors. Find minimum weight perfect matching in this graph.



    How to construct the graph.




    1. Let A, B be two parts of the graph. Each with n nodes.

    2. Connect i in A to j in B with an edge of weight abs(A[i] - B[j]).


    I believe this can be done in O(n^2).



    See http://www.cse.iitd.ernet.in/~naveen/courses/CSL851/lec4.pdf





    If each number in A has only one closest number in B then you can do this in O(n log n). This probably might be the case since you have the real numbers.



    How?




    1. Sort A O(n log n)

    2. Binary search for the closest number for each number in B. O(n log n).


    If the numbers are coming from the real world and have even a little bit of randomness to them, then the differences between each pair of the numbers is probably unique. You can verify if this is the case by running experiments on the input vectors. Then the problem becomes easy to solve yay!






    share|improve this answer



















    • 1




      Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
      – Wang Duo
      yesterday











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54041397%2fgiven-two-arrays-find-the-permutations-that-give-closest-distance-between-two-a%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    14














    The problem you describe is equivalent to the Minimum Cost Perfect Matching Problem which can be solved efficiently (and exactly) using The Hungarian Algorithm. In the Minimum Cost Perfect Matching Problem you have an input weighted bipartite graph where the two sets have the same size n, and each edge has a non-negative cost. The goal is to find a perfect matching of minimum cost.



    In your case, the bipartite graph is a biclique. That is, every vertex in one set is connected to every vertex in the other set, and the cost of edge (i,j) is (A[i] - B[i])^2 (where i corresponds to index i in array A and j corresponds to index j in array B).



    EDIT:
    This is not the best solution for the problem. Ivo Merchiers came up with a better solution both in terms of efficiency and simplicity. The reason I'm not deleting my answer is because my suggested solution is valuable for distance measures to which Ivo's solution does not apply (as his approach works by exploiting a property of the Euclidean distance).






    share|improve this answer























    • Correct but inefficient for the problem. See Ivo's solution.
      – Dave
      yesterday










    • @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
      – snakile
      yesterday
















    14














    The problem you describe is equivalent to the Minimum Cost Perfect Matching Problem which can be solved efficiently (and exactly) using The Hungarian Algorithm. In the Minimum Cost Perfect Matching Problem you have an input weighted bipartite graph where the two sets have the same size n, and each edge has a non-negative cost. The goal is to find a perfect matching of minimum cost.



    In your case, the bipartite graph is a biclique. That is, every vertex in one set is connected to every vertex in the other set, and the cost of edge (i,j) is (A[i] - B[i])^2 (where i corresponds to index i in array A and j corresponds to index j in array B).



    EDIT:
    This is not the best solution for the problem. Ivo Merchiers came up with a better solution both in terms of efficiency and simplicity. The reason I'm not deleting my answer is because my suggested solution is valuable for distance measures to which Ivo's solution does not apply (as his approach works by exploiting a property of the Euclidean distance).






    share|improve this answer























    • Correct but inefficient for the problem. See Ivo's solution.
      – Dave
      yesterday










    • @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
      – snakile
      yesterday














    14












    14








    14






    The problem you describe is equivalent to the Minimum Cost Perfect Matching Problem which can be solved efficiently (and exactly) using The Hungarian Algorithm. In the Minimum Cost Perfect Matching Problem you have an input weighted bipartite graph where the two sets have the same size n, and each edge has a non-negative cost. The goal is to find a perfect matching of minimum cost.



    In your case, the bipartite graph is a biclique. That is, every vertex in one set is connected to every vertex in the other set, and the cost of edge (i,j) is (A[i] - B[i])^2 (where i corresponds to index i in array A and j corresponds to index j in array B).



    EDIT:
    This is not the best solution for the problem. Ivo Merchiers came up with a better solution both in terms of efficiency and simplicity. The reason I'm not deleting my answer is because my suggested solution is valuable for distance measures to which Ivo's solution does not apply (as his approach works by exploiting a property of the Euclidean distance).






    share|improve this answer














    The problem you describe is equivalent to the Minimum Cost Perfect Matching Problem which can be solved efficiently (and exactly) using The Hungarian Algorithm. In the Minimum Cost Perfect Matching Problem you have an input weighted bipartite graph where the two sets have the same size n, and each edge has a non-negative cost. The goal is to find a perfect matching of minimum cost.



    In your case, the bipartite graph is a biclique. That is, every vertex in one set is connected to every vertex in the other set, and the cost of edge (i,j) is (A[i] - B[i])^2 (where i corresponds to index i in array A and j corresponds to index j in array B).



    EDIT:
    This is not the best solution for the problem. Ivo Merchiers came up with a better solution both in terms of efficiency and simplicity. The reason I'm not deleting my answer is because my suggested solution is valuable for distance measures to which Ivo's solution does not apply (as his approach works by exploiting a property of the Euclidean distance).







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 20 hours ago

























    answered yesterday









    snakile

    24.1k49138208




    24.1k49138208












    • Correct but inefficient for the problem. See Ivo's solution.
      – Dave
      yesterday










    • @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
      – snakile
      yesterday


















    • Correct but inefficient for the problem. See Ivo's solution.
      – Dave
      yesterday










    • @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
      – snakile
      yesterday
















    Correct but inefficient for the problem. See Ivo's solution.
    – Dave
    yesterday




    Correct but inefficient for the problem. See Ivo's solution.
    – Dave
    yesterday












    @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
    – snakile
    yesterday




    @Dave, thanks for the comment. I've added a reference to Ivo's better solution.
    – snakile
    yesterday













    20














    You can just sort both A and B. In that case, the Euclidean distance is minimal.



    If B has to remain fixed, then you just need to invert the permutation needed to sort B and apply that to the sorted version of A.



    This solution does assume that you want to find just a permutation and not the most simple permutation (since sorting and unsorting through permutations will not be incredibly efficient).





    Proof:
    Let S,T be our pair of arrays.
    We can assume S to be sorted without loss of generality, since all that
    matters is the mapping between the two sets of elements.



    Let T be the permutation that minimizes the distance between the two arrays,
    and let d be that distance.



    Suppose that T is not sorted. Then there exist elements i,j s.t. T_i > T_j



    S_i + k1 = S_j
    T_i = T_j + k2
    where k1,k2 > 0


    Let x be the total distance of all elements except i and j.



    d = x + (S_i - T_i)^2 + ((S_i + k1) - (T_i - k2))^2


    If we swap the order of T_i and T_j, then our new distance is:



    d' = x + (S_i - (T_i - k2))^2 + ((S_i + k1) - T_i)^2


    Thus:
    d - d' = 2 * k1 * k2, which contradicts our assumption that T is the permutation that minimizes the distance, so the permutation that does so must be sorted.



    Sorting the two arrays can be done in O(n log n) using a variety of methods.






    share|improve this answer



















    • 3




      If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
      – Ivo Merchiers
      yesterday






    • 3




      It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
      – Yair Halberstadt
      yesterday






    • 1




      If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
      – Ray
      yesterday






    • 2




      Yair's idea works. I've submitted an edit with a proof along those lines.
      – Ray
      yesterday








    • 2




      Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
      – yar
      yesterday


















    20














    You can just sort both A and B. In that case, the Euclidean distance is minimal.



    If B has to remain fixed, then you just need to invert the permutation needed to sort B and apply that to the sorted version of A.



    This solution does assume that you want to find just a permutation and not the most simple permutation (since sorting and unsorting through permutations will not be incredibly efficient).





    Proof:
    Let S,T be our pair of arrays.
    We can assume S to be sorted without loss of generality, since all that
    matters is the mapping between the two sets of elements.



    Let T be the permutation that minimizes the distance between the two arrays,
    and let d be that distance.



    Suppose that T is not sorted. Then there exist elements i,j s.t. T_i > T_j



    S_i + k1 = S_j
    T_i = T_j + k2
    where k1,k2 > 0


    Let x be the total distance of all elements except i and j.



    d = x + (S_i - T_i)^2 + ((S_i + k1) - (T_i - k2))^2


    If we swap the order of T_i and T_j, then our new distance is:



    d' = x + (S_i - (T_i - k2))^2 + ((S_i + k1) - T_i)^2


    Thus:
    d - d' = 2 * k1 * k2, which contradicts our assumption that T is the permutation that minimizes the distance, so the permutation that does so must be sorted.



    Sorting the two arrays can be done in O(n log n) using a variety of methods.






    share|improve this answer



















    • 3




      If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
      – Ivo Merchiers
      yesterday






    • 3




      It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
      – Yair Halberstadt
      yesterday






    • 1




      If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
      – Ray
      yesterday






    • 2




      Yair's idea works. I've submitted an edit with a proof along those lines.
      – Ray
      yesterday








    • 2




      Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
      – yar
      yesterday
















    20












    20








    20






    You can just sort both A and B. In that case, the Euclidean distance is minimal.



    If B has to remain fixed, then you just need to invert the permutation needed to sort B and apply that to the sorted version of A.



    This solution does assume that you want to find just a permutation and not the most simple permutation (since sorting and unsorting through permutations will not be incredibly efficient).





    Proof:
    Let S,T be our pair of arrays.
    We can assume S to be sorted without loss of generality, since all that
    matters is the mapping between the two sets of elements.



    Let T be the permutation that minimizes the distance between the two arrays,
    and let d be that distance.



    Suppose that T is not sorted. Then there exist elements i,j s.t. T_i > T_j



    S_i + k1 = S_j
    T_i = T_j + k2
    where k1,k2 > 0


    Let x be the total distance of all elements except i and j.



    d = x + (S_i - T_i)^2 + ((S_i + k1) - (T_i - k2))^2


    If we swap the order of T_i and T_j, then our new distance is:



    d' = x + (S_i - (T_i - k2))^2 + ((S_i + k1) - T_i)^2


    Thus:
    d - d' = 2 * k1 * k2, which contradicts our assumption that T is the permutation that minimizes the distance, so the permutation that does so must be sorted.



    Sorting the two arrays can be done in O(n log n) using a variety of methods.






    share|improve this answer














    You can just sort both A and B. In that case, the Euclidean distance is minimal.



    If B has to remain fixed, then you just need to invert the permutation needed to sort B and apply that to the sorted version of A.



    This solution does assume that you want to find just a permutation and not the most simple permutation (since sorting and unsorting through permutations will not be incredibly efficient).





    Proof:
    Let S,T be our pair of arrays.
    We can assume S to be sorted without loss of generality, since all that
    matters is the mapping between the two sets of elements.



    Let T be the permutation that minimizes the distance between the two arrays,
    and let d be that distance.



    Suppose that T is not sorted. Then there exist elements i,j s.t. T_i > T_j



    S_i + k1 = S_j
    T_i = T_j + k2
    where k1,k2 > 0


    Let x be the total distance of all elements except i and j.



    d = x + (S_i - T_i)^2 + ((S_i + k1) - (T_i - k2))^2


    If we swap the order of T_i and T_j, then our new distance is:



    d' = x + (S_i - (T_i - k2))^2 + ((S_i + k1) - T_i)^2


    Thus:
    d - d' = 2 * k1 * k2, which contradicts our assumption that T is the permutation that minimizes the distance, so the permutation that does so must be sorted.



    Sorting the two arrays can be done in O(n log n) using a variety of methods.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited yesterday









    Ray

    6861121




    6861121










    answered yesterday









    Ivo Merchiers

    38311




    38311








    • 3




      If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
      – Ivo Merchiers
      yesterday






    • 3




      It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
      – Yair Halberstadt
      yesterday






    • 1




      If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
      – Ray
      yesterday






    • 2




      Yair's idea works. I've submitted an edit with a proof along those lines.
      – Ray
      yesterday








    • 2




      Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
      – yar
      yesterday
















    • 3




      If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
      – Ivo Merchiers
      yesterday






    • 3




      It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
      – Yair Halberstadt
      yesterday






    • 1




      If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
      – Ray
      yesterday






    • 2




      Yair's idea works. I've submitted an edit with a proof along those lines.
      – Ray
      yesterday








    • 2




      Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
      – yar
      yesterday










    3




    3




    If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
    – Ivo Merchiers
    yesterday




    If this approach is wrong, I'd be glad to hear a counter-example, since I'm not sure what the issue with it is and I'm glad to learn something new.
    – Ivo Merchiers
    yesterday




    3




    3




    It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
    – Yair Halberstadt
    yesterday




    It is mathematically sound. I don't have time to write up a proof, but it's along the lines of: If the best solution is not sorted, find two indices x, y such that A[x] < A[y] and B[x] > B[y]. Swap the two in A, and you have a better solution.
    – Yair Halberstadt
    yesterday




    1




    1




    If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
    – Ray
    yesterday




    If this can be proven correct, it's superior to the Hungarian Algorithm in the accepted answer; that one's O(n^3).
    – Ray
    yesterday




    2




    2




    Yair's idea works. I've submitted an edit with a proof along those lines.
    – Ray
    yesterday






    Yair's idea works. I've submitted an edit with a proof along those lines.
    – Ray
    yesterday






    2




    2




    Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
    – yar
    yesterday






    Please note, that this is true for the Euclidean distance, but it's not necessarily true for an other distance measure. Just take sqrt(A[i] - B[i]) as a distance measure and take the elements [1 4] , [4 5]. Sorting yields error sqrt(3) + sqrt(1) which is bigger than sqrt(4) + sqrt(0) (which is the error of the ideal permutation).
    – yar
    yesterday













    6














    You can just sort A and B and match up the corresponding elements.



    Imagine that there are two elements of A, Ai and Aj, corresponding to Bi and Bj.



    The error contribution from these matches is:



    (Ai-Bi)^2 + (Aj-Bj)^2



    = Ai^2 + Bi^2 + Aj^2 + Bj^2 - 2(AiBi + AjBj)



    Is it better to swap the matches, or to keep them the way they are?



    Well, the difference in the error if we swap them is:



    2(AiBi + AjBj) - 2(AiBj + AjBi)



    ~ AiBi - AiBj + AjBj - AjBi



    = Ai(Bi - Bj) - Aj(Bi - Bj)



    = (Ai - Aj)(Bi - Bj)



    So, if the As and Bs are in the same order, then this product is positive and the error will go up if you swap them. If they are not in the same order, then this product is negative an the error will go down if you swap them.



    If you repeatedly swap any pairs that are out of order until there are no such pairs, your error will keep going down, and you'll end up with with the nth largest A matched up with the nth largest B all the way through the array.



    Just sorting them and matching them up is therefor optimal, and of course it's way faster than the Hungarian algorithm.






    share|improve this answer


























      6














      You can just sort A and B and match up the corresponding elements.



      Imagine that there are two elements of A, Ai and Aj, corresponding to Bi and Bj.



      The error contribution from these matches is:



      (Ai-Bi)^2 + (Aj-Bj)^2



      = Ai^2 + Bi^2 + Aj^2 + Bj^2 - 2(AiBi + AjBj)



      Is it better to swap the matches, or to keep them the way they are?



      Well, the difference in the error if we swap them is:



      2(AiBi + AjBj) - 2(AiBj + AjBi)



      ~ AiBi - AiBj + AjBj - AjBi



      = Ai(Bi - Bj) - Aj(Bi - Bj)



      = (Ai - Aj)(Bi - Bj)



      So, if the As and Bs are in the same order, then this product is positive and the error will go up if you swap them. If they are not in the same order, then this product is negative an the error will go down if you swap them.



      If you repeatedly swap any pairs that are out of order until there are no such pairs, your error will keep going down, and you'll end up with with the nth largest A matched up with the nth largest B all the way through the array.



      Just sorting them and matching them up is therefor optimal, and of course it's way faster than the Hungarian algorithm.






      share|improve this answer
























        6












        6








        6






        You can just sort A and B and match up the corresponding elements.



        Imagine that there are two elements of A, Ai and Aj, corresponding to Bi and Bj.



        The error contribution from these matches is:



        (Ai-Bi)^2 + (Aj-Bj)^2



        = Ai^2 + Bi^2 + Aj^2 + Bj^2 - 2(AiBi + AjBj)



        Is it better to swap the matches, or to keep them the way they are?



        Well, the difference in the error if we swap them is:



        2(AiBi + AjBj) - 2(AiBj + AjBi)



        ~ AiBi - AiBj + AjBj - AjBi



        = Ai(Bi - Bj) - Aj(Bi - Bj)



        = (Ai - Aj)(Bi - Bj)



        So, if the As and Bs are in the same order, then this product is positive and the error will go up if you swap them. If they are not in the same order, then this product is negative an the error will go down if you swap them.



        If you repeatedly swap any pairs that are out of order until there are no such pairs, your error will keep going down, and you'll end up with with the nth largest A matched up with the nth largest B all the way through the array.



        Just sorting them and matching them up is therefor optimal, and of course it's way faster than the Hungarian algorithm.






        share|improve this answer












        You can just sort A and B and match up the corresponding elements.



        Imagine that there are two elements of A, Ai and Aj, corresponding to Bi and Bj.



        The error contribution from these matches is:



        (Ai-Bi)^2 + (Aj-Bj)^2



        = Ai^2 + Bi^2 + Aj^2 + Bj^2 - 2(AiBi + AjBj)



        Is it better to swap the matches, or to keep them the way they are?



        Well, the difference in the error if we swap them is:



        2(AiBi + AjBj) - 2(AiBj + AjBi)



        ~ AiBi - AiBj + AjBj - AjBi



        = Ai(Bi - Bj) - Aj(Bi - Bj)



        = (Ai - Aj)(Bi - Bj)



        So, if the As and Bs are in the same order, then this product is positive and the error will go up if you swap them. If they are not in the same order, then this product is negative an the error will go down if you swap them.



        If you repeatedly swap any pairs that are out of order until there are no such pairs, your error will keep going down, and you'll end up with with the nth largest A matched up with the nth largest B all the way through the array.



        Just sorting them and matching them up is therefor optimal, and of course it's way faster than the Hungarian algorithm.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered yesterday









        Matt Timmermans

        18.9k11532




        18.9k11532























            4














            Construct a bipartite graph from the vectors. Find minimum weight perfect matching in this graph.



            How to construct the graph.




            1. Let A, B be two parts of the graph. Each with n nodes.

            2. Connect i in A to j in B with an edge of weight abs(A[i] - B[j]).


            I believe this can be done in O(n^2).



            See http://www.cse.iitd.ernet.in/~naveen/courses/CSL851/lec4.pdf





            If each number in A has only one closest number in B then you can do this in O(n log n). This probably might be the case since you have the real numbers.



            How?




            1. Sort A O(n log n)

            2. Binary search for the closest number for each number in B. O(n log n).


            If the numbers are coming from the real world and have even a little bit of randomness to them, then the differences between each pair of the numbers is probably unique. You can verify if this is the case by running experiments on the input vectors. Then the problem becomes easy to solve yay!






            share|improve this answer



















            • 1




              Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
              – Wang Duo
              yesterday
















            4














            Construct a bipartite graph from the vectors. Find minimum weight perfect matching in this graph.



            How to construct the graph.




            1. Let A, B be two parts of the graph. Each with n nodes.

            2. Connect i in A to j in B with an edge of weight abs(A[i] - B[j]).


            I believe this can be done in O(n^2).



            See http://www.cse.iitd.ernet.in/~naveen/courses/CSL851/lec4.pdf





            If each number in A has only one closest number in B then you can do this in O(n log n). This probably might be the case since you have the real numbers.



            How?




            1. Sort A O(n log n)

            2. Binary search for the closest number for each number in B. O(n log n).


            If the numbers are coming from the real world and have even a little bit of randomness to them, then the differences between each pair of the numbers is probably unique. You can verify if this is the case by running experiments on the input vectors. Then the problem becomes easy to solve yay!






            share|improve this answer



















            • 1




              Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
              – Wang Duo
              yesterday














            4












            4








            4






            Construct a bipartite graph from the vectors. Find minimum weight perfect matching in this graph.



            How to construct the graph.




            1. Let A, B be two parts of the graph. Each with n nodes.

            2. Connect i in A to j in B with an edge of weight abs(A[i] - B[j]).


            I believe this can be done in O(n^2).



            See http://www.cse.iitd.ernet.in/~naveen/courses/CSL851/lec4.pdf





            If each number in A has only one closest number in B then you can do this in O(n log n). This probably might be the case since you have the real numbers.



            How?




            1. Sort A O(n log n)

            2. Binary search for the closest number for each number in B. O(n log n).


            If the numbers are coming from the real world and have even a little bit of randomness to them, then the differences between each pair of the numbers is probably unique. You can verify if this is the case by running experiments on the input vectors. Then the problem becomes easy to solve yay!






            share|improve this answer














            Construct a bipartite graph from the vectors. Find minimum weight perfect matching in this graph.



            How to construct the graph.




            1. Let A, B be two parts of the graph. Each with n nodes.

            2. Connect i in A to j in B with an edge of weight abs(A[i] - B[j]).


            I believe this can be done in O(n^2).



            See http://www.cse.iitd.ernet.in/~naveen/courses/CSL851/lec4.pdf





            If each number in A has only one closest number in B then you can do this in O(n log n). This probably might be the case since you have the real numbers.



            How?




            1. Sort A O(n log n)

            2. Binary search for the closest number for each number in B. O(n log n).


            If the numbers are coming from the real world and have even a little bit of randomness to them, then the differences between each pair of the numbers is probably unique. You can verify if this is the case by running experiments on the input vectors. Then the problem becomes easy to solve yay!







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited yesterday

























            answered yesterday









            Pratik Deoghare

            17.4k2583135




            17.4k2583135








            • 1




              Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
              – Wang Duo
              yesterday














            • 1




              Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
              – Wang Duo
              yesterday








            1




            1




            Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
            – Wang Duo
            yesterday




            Thanks, yours answer is correct! But @snakile replied earlier with good references so I decided to accept his.
            – Wang Duo
            yesterday


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54041397%2fgiven-two-arrays-find-the-permutations-that-give-closest-distance-between-two-a%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to make a Squid Proxy server?

            Is this a new Fibonacci Identity?

            19世紀