• magic_lobster_party@kbin.run
    5 months ago

    So in your code you do the following for each permutation:

    for (int i = 0; i < n; i++) {

    You’re iterating through the entire list for each permutation, which yields an O(n x n!) time complexity. My idea was an attempt to avoid that extra factor n.

    I’m not sure how std implements permutations, but the way I want them is:

    1 2 3 4 5

    1 2 3 5 4

    1 2 4 3 5

    1 2 4 5 3

    1 2 5 3 4

    1 2 5 4 3

    1 3 2 4 5

    etc.
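For reference, the order listed above is lexicographic order, which is what `std::next_permutation` produces when it starts from a sorted range. A minimal sketch to check that (the helper name `first_permutations` is made up for illustration and is not from either implementation in this thread):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: collects the first `count` permutations of 1..n
// in the order std::next_permutation generates them (lexicographic).
std::vector<std::vector<int>> first_permutations(int n, std::size_t count) {
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 1);  // 1, 2, ..., n is the first permutation
    std::vector<std::vector<int>> out;
    if (count == 0) return out;
    do {
        out.push_back(perm);
    } while (out.size() < count &&
             std::next_permutation(perm.begin(), perm.end()));
    return out;
}
```

Calling `first_permutations(5, 7)` should give the seven rows listed above, ending with 1 3 2 4 5.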

    Note that the last 2 numbers change every iteration, the third-last number changes every 2 iterations, and the fourth-last number changes every 2 x 3 iterations. The first number in this example changes every 2 x 3 x 4 iterations.

    This tells us how often each hash needs to be updated. For example, we don’t need to recalculate the hash for 1 2 3 between the first and second iteration.

    The first hash will be updated 5 times. Second hash 5 x 4 times. Third 5 x 4 x 3 times. Fourth 5 x 4 x 3 x 2 times. Fifth 5 x 4 x 3 x 2 x 1 times.

    So the time complexity should be proportional to the number of times we need to calculate the hash function, which is O(n + n (n - 1) + n (n - 1) (n - 2) + … + n!) = O(n!).
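The per-position counts can be checked by brute force. The sketch below is only a verification aid under my own assumptions (the name `count_prefix_updates` is hypothetical): it walks the permutations in lexicographic order, finds the first index that differs from the previous permutation, and charges one hash update to that index and every index to its right. Note that the comparison loop makes this checker itself O(n x n!), so it says nothing about the speed of the real algorithm, only about the counts.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical checker: counts[k] is how many times the hash of the prefix
// ending at index k would have to be (re)computed over all permutations
// of 1..n, including the very first computation.
std::vector<long long> count_prefix_updates(int n) {
    std::vector<int> perm(n), prev;
    std::iota(perm.begin(), perm.end(), 1);
    std::vector<long long> counts(n, 0);
    bool first = true;
    do {
        std::size_t start = 0;
        if (!first) {
            // First index where this permutation differs from the previous one.
            while (start < perm.size() && perm[start] == prev[start]) ++start;
        }
        // That index and everything to its right needs a fresh hash.
        for (std::size_t k = start; k < perm.size(); ++k) ++counts[k];
        prev = perm;
        first = false;
    } while (std::next_permutation(perm.begin(), perm.end()));
    return counts;
}
```

For n = 5 this should give 5, 20, 60, 120, 120, matching the counts above, for a total of 325 hash calculations.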

    EDIT: on second thought, I’m not sure this is a legal simplification. It might actually be O(n x n!), since the sum has n terms and the terms keep growing. But in that case, shouldn’t all permutation algorithms be O(n x n!)?

    EDIT 2: found this link https://stackoverflow.com/a/39126141

    The time complexity can be simplified to O(2.71828 x n!), which makes it O(n!), so it’s a legal simplification! (My reasoning was wrong, but I arrived at the correct conclusion.)
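Spelled out, the simplification works because the k-th term n (n - 1) … (n - k + 1) equals n! / (n - k)!, and substituting j = n - k turns the sum into a partial sum of the series for e:

$$
\sum_{k=1}^{n} \frac{n!}{(n-k)!}
  \;=\; n! \sum_{j=0}^{n-1} \frac{1}{j!}
  \;<\; n! \sum_{j=0}^{\infty} \frac{1}{j!}
  \;=\; e \cdot n!
$$

For n = 5 the left side is 5 + 20 + 60 + 120 + 120 = 325, against e x 5! ≈ 326.19.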

    END EDIT.

    We do the same for the second list (for each permutation), which makes it O((n!)^2).

    Finally we compute the Hamming distance, but this is done between constant-length hashes, so it’s going to be constant time, O(1), in this context.
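To illustrate why the Hamming distance is constant time here: for fixed-width hashes it is just an XOR followed by a bit count. A minimal sketch, assuming 64-bit hashes (the width and the name `hamming_distance` are my assumptions, not taken from the implementations discussed):

```cpp
#include <bitset>
#include <cstdint>

// Hamming distance between two 64-bit hashes: the number of bit positions
// in which they differ. The cost does not depend on the list length n.
int hamming_distance(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}
```

`std::bitset::count` keeps this portable back to C++11; on C++20 `std::popcount` from `<bit>` would do the same job.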

    Maybe I can try my own implementation once I have access to a proper computer.

    • MinekPo1 [She/Her]@lemmygrad.ml
      5 months ago

      you forgot about updating the hashes of the items that come after the modified ones, so while it could be slightly faster than O((n!×n)²), it’s not by much, as my data shows.

      in other words, every time you update the first hash you also need to update all the hashes after it, and so on

      so the complexity is O(n×n + n×(n-1)×(n-1) + … + n!×1), though I don’t know how to simplify that
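Taking the sum exactly as written, it can be simplified the same way as a plain sum of falling factorials. The k-th term is n! / (n - k)! times (n - k + 1); substituting j = n - k gives:

$$
\sum_{k=1}^{n} \frac{n!}{(n-k)!}\,(n-k+1)
  \;=\; n! \sum_{j=0}^{n-1} \frac{j+1}{j!}
  \;<\; n! \left( \sum_{j=1}^{\infty} \frac{1}{(j-1)!} + \sum_{j=0}^{\infty} \frac{1}{j!} \right)
  \;=\; 2e \cdot n!
$$

So even under this cost model the sum stays within a constant factor of n!, rather than picking up an extra factor of n.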

      • magic_lobster_party@kbin.run
        5 months ago

        My implementation: https://pastebin.com/3PskMZqz

        Results at bottom of file.

        I’m taking into account that when I update a hash, all the hashes to the right of it should also be updated.

        Number of hashes is about 2.71828 x n! as predicted. The time seems to be proportional to n! as well (n = 12 is about 12 times slower than n = 11, which in turn is about 11 times slower than n = 10).

        Interestingly this program turned out to be a fun and inefficient way of calculating the digits of e.
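For readers who cannot open the pastebin, here is a rough sketch of the general idea. It is not the linked implementation; it is a sketch under my own assumptions: a swap-based recursion (so the visiting order is not lexicographic), one sdbm-style mixing step per item, and made-up names (`HashStats`, `sdbm_step`, `visit`, `count_hash_calls`). Each recursion node extends the hash of the current prefix by one item, so a prefix hash is computed once and shared by every permutation below it.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

struct HashStats {
    std::uint64_t hash_calls = 0;    // how many times the hash step ran
    std::uint64_t permutations = 0;  // how many full permutations were reached
    std::uint64_t checksum = 0;      // XOR of all full hashes, keeps the work observable
};

// One step of the sdbm hash, applied to an integer item instead of a character.
inline std::uint64_t sdbm_step(std::uint64_t h, std::uint64_t item) {
    return item + (h << 6) + (h << 16) - h;
}

// Fixes items[depth] to each remaining candidate in turn and recurses,
// passing the hash of the prefix chosen so far down the call chain.
void visit(std::vector<int>& items, std::size_t depth,
           std::uint64_t prefix_hash, HashStats& stats) {
    if (depth == items.size()) {
        ++stats.permutations;
        stats.checksum ^= prefix_hash;
        return;
    }
    for (std::size_t i = depth; i < items.size(); ++i) {
        std::swap(items[depth], items[i]);
        ++stats.hash_calls;  // one hash step per recursion node
        visit(items, depth + 1,
              sdbm_step(prefix_hash, static_cast<std::uint64_t>(items[depth])),
              stats);
        std::swap(items[depth], items[i]);  // restore before trying the next candidate
    }
}

HashStats count_hash_calls(int n) {
    std::vector<int> items(n);
    std::iota(items.begin(), items.end(), 1);
    HashStats stats;
    visit(items, 0, 0, stats);
    return stats;
}
```

Dividing `hash_calls` by `permutations` for growing n should approach e from below, which is the effect described above; for n = 5 the counts are 325 and 120.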

        • MinekPo1 [She/Her]@lemmygrad.ml
          5 months ago

          Agh I made a mistake in my code:

          if (recalc || numbers[i] != (hashstate[i] & 0xffffffff)) {
          	hashstate[i] = hasher.hash(((uint64_t)p << 32) | numbers[i]);
          }
          

          Since I decided to pack the hashes and previous number values into a single array and then forgot to actually properly format the values, the hash counts generated by my code were nonsense. Not sure why I did that honestly.
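One way the packing could look, as a sketch only: the snippet above reads the previous number from the low 32 bits, so this keeps the number there and assumes a hash truncated to 32 bits in the high half. The hash width and the helper names (`pack_state`, `packed_number`, `packed_hash`) are assumptions for illustration, not the actual fix.

```cpp
#include <cstdint>

// Stores a 32-bit hash in the high half and the number it was computed
// from in the low half of a single 64-bit slot.
inline std::uint64_t pack_state(std::uint32_t hash, std::uint32_t number) {
    return (static_cast<std::uint64_t>(hash) << 32) | number;
}

// Recovers the previously seen number (low 32 bits).
inline std::uint32_t packed_number(std::uint64_t state) {
    return static_cast<std::uint32_t>(state & 0xffffffffu);
}

// Recovers the stored hash (high 32 bits).
inline std::uint32_t packed_hash(std::uint64_t state) {
    return static_cast<std::uint32_t>(state >> 32);
}
```

With this layout, a comparison like `numbers[i] != packed_number(hashstate[i])` compares against the stored number rather than against raw hash bits.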

          Also, my data analysis was trash, since even with the correct data, which as you noted is in a linear correlation with n!, my reasoning suggests that it’s growing faster than it is.

          Here is a plot of the incorrect ratios compared to the correct ones, which is the proper analysis and also clearly shows something is wrong.

          [Image: Desmos graph showing two data sets, one growing linearly (labeled "incorrect") and one converging to e (labeled "#hashes")]

          Anyway, and this is totally unrelated to me losing an internet argument and not coping well with that, I optimized my solution a lot, and it turns out it’s actually faster to only perform the check you are doing once or twice and narrow it down from there. The checks I’m doing are for the last two elements and the midpoint (though I tried moving that about with seemingly no effect ???), with the end check going to a branch without a loop. I’m not exactly sure why, despite the hour or two I spent profiling, though my guess is that it has something to do with caching?

          Also FYI I compared performance with -O3 and after modifying your implementation to use sdbm and to actually use the previous hash instead of the previous value (plus misc changes, see patch).