Counting all substrings with exactly k distinct characters












3














Problem Statement : Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that has exactly k distinct characters. Example:



Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}



I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?



According to me complexity should be O(n*k)



public static void main(String args)
{
String s = "abacuusttlvbnc";
int k=3;
char sArr = s.toCharArray();
int strLen = sArr.length;
Set<Character> set = new LinkedHashSet<Character>();
int l=0;
int r=0;

while(l<=strLen-k){ // will run (arrayLength - k) times
for(int i=0;i<k;i++){ // will run k times for every while iteration
set.add(sArr[l]);
l++;
}
if(set.size()==k){
System.out.println("substring : " + set);
}
set.clear(); // O(k) for every while iteration
r++;
l= r;
}
}
}


Output :



substring : [b, a, c]
substring : [a, c, u]
substring : [u, s, t]
substring : [t, l, v]
substring : [l, v, b]
substring : [v, b, n]
substring : [b, n, c]









share|improve this question
























  • I'd say O(n*k) is a reasonable representation. Did you try using subString rather than putting in the little loop? Depending on subString's implementation, it may reduce the complexity of your program to O(n)
    – Maybe_Factor
    Aug 5 '16 at 0:38












  • subString sounds good but every time subString returns a new String that would add to the space complexity of the program. Even the StringBuilder subString method returns a new String. What do you think?
    – underdog
    Aug 5 '16 at 15:33












  • If memory is an issue then using subString might not be a good idea. I hadn't considered the creation of new Strings... that could make any time savings moot anyway.
    – Maybe_Factor
    Aug 8 '16 at 1:51
















3














Problem Statement : Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that has exactly k distinct characters. Example:



Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}



I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?



According to me complexity should be O(n*k)



public static void main(String args)
{
String s = "abacuusttlvbnc";
int k=3;
char sArr = s.toCharArray();
int strLen = sArr.length;
Set<Character> set = new LinkedHashSet<Character>();
int l=0;
int r=0;

while(l<=strLen-k){ // will run (arrayLength - k) times
for(int i=0;i<k;i++){ // will run k times for every while iteration
set.add(sArr[l]);
l++;
}
if(set.size()==k){
System.out.println("substring : " + set);
}
set.clear(); // O(k) for every while iteration
r++;
l= r;
}
}
}


Output :



substring : [b, a, c]
substring : [a, c, u]
substring : [u, s, t]
substring : [t, l, v]
substring : [l, v, b]
substring : [v, b, n]
substring : [b, n, c]









share|improve this question
























  • I'd say O(n*k) is a reasonable representation. Did you try using subString rather than putting in the little loop? Depending on subString's implementation, it may reduce the complexity of your program to O(n)
    – Maybe_Factor
    Aug 5 '16 at 0:38












  • subString sounds good but every time subString returns a new String that would add to the space complexity of the program. Even the StringBuilder subString method returns a new String. What do you think?
    – underdog
    Aug 5 '16 at 15:33












  • If memory is an issue then using subString might not be a good idea. I hadn't considered the creation of new Strings... that could make any time savings moot anyway.
    – Maybe_Factor
    Aug 8 '16 at 1:51














3












3








3







Problem Statement : Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that has exactly k distinct characters. Example:



Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}



I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?



According to me complexity should be O(n*k)



public static void main(String args)
{
String s = "abacuusttlvbnc";
int k=3;
char sArr = s.toCharArray();
int strLen = sArr.length;
Set<Character> set = new LinkedHashSet<Character>();
int l=0;
int r=0;

while(l<=strLen-k){ // will run (arrayLength - k) times
for(int i=0;i<k;i++){ // will run k times for every while iteration
set.add(sArr[l]);
l++;
}
if(set.size()==k){
System.out.println("substring : " + set);
}
set.clear(); // O(k) for every while iteration
r++;
l= r;
}
}
}


Output :



substring : [b, a, c]
substring : [a, c, u]
substring : [u, s, t]
substring : [t, l, v]
substring : [l, v, b]
substring : [v, b, n]
substring : [b, n, c]









share|improve this question















Problem Statement : Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that has exactly k distinct characters. Example:



Input: abc, k = 2 Output: 2 Possible substrings are {"ab", "bc"}



I have written the solution with a two pointer approach. I am not sure how to calculate the time complexity of the program?



According to me complexity should be O(n*k)



public static void main(String args)
{
String s = "abacuusttlvbnc";
int k=3;
char sArr = s.toCharArray();
int strLen = sArr.length;
Set<Character> set = new LinkedHashSet<Character>();
int l=0;
int r=0;

while(l<=strLen-k){ // will run (arrayLength - k) times
for(int i=0;i<k;i++){ // will run k times for every while iteration
set.add(sArr[l]);
l++;
}
if(set.size()==k){
System.out.println("substring : " + set);
}
set.clear(); // O(k) for every while iteration
r++;
l= r;
}
}
}


Output :



substring : [b, a, c]
substring : [a, c, u]
substring : [u, s, t]
substring : [t, l, v]
substring : [l, v, b]
substring : [v, b, n]
substring : [b, n, c]






java algorithm strings complexity






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Aug 4 '16 at 22:22









200_success

128k15152414




128k15152414










asked Aug 4 '16 at 17:00









underdogunderdog

11815




11815












  • I'd say O(n*k) is a reasonable representation. Did you try using subString rather than putting in the little loop? Depending on subString's implementation, it may reduce the complexity of your program to O(n)
    – Maybe_Factor
    Aug 5 '16 at 0:38












  • subString sounds good but every time subString returns a new String that would add to the space complexity of the program. Even the StringBuilder subString method returns a new String. What do you think?
    – underdog
    Aug 5 '16 at 15:33












  • If memory is an issue then using subString might not be a good idea. I hadn't considered the creation of new Strings... that could make any time savings moot anyway.
    – Maybe_Factor
    Aug 8 '16 at 1:51


















  • I'd say O(n*k) is a reasonable representation. Did you try using subString rather than putting in the little loop? Depending on subString's implementation, it may reduce the complexity of your program to O(n)
    – Maybe_Factor
    Aug 5 '16 at 0:38












  • subString sounds good but every time subString returns a new String that would add to the space complexity of the program. Even the StringBuilder subString method returns a new String. What do you think?
    – underdog
    Aug 5 '16 at 15:33












  • If memory is an issue then using subString might not be a good idea. I hadn't considered the creation of new Strings... that could make any time savings moot anyway.
    – Maybe_Factor
    Aug 8 '16 at 1:51
















I'd say O(n*k) is a reasonable representation. Did you try using subString rather than putting in the little loop? Depending on subString's implementation, it may reduce the complexity of your program to O(n)
– Maybe_Factor
Aug 5 '16 at 0:38






I'd say O(n*k) is a reasonable representation. Did you try using subString rather than putting in the little loop? Depending on subString's implementation, it may reduce the complexity of your program to O(n)
– Maybe_Factor
Aug 5 '16 at 0:38














subString sounds good but every time subString returns a new String that would add to the space complexity of the program. Even the StringBuilder subString method returns a new String. What do you think?
– underdog
Aug 5 '16 at 15:33






subString sounds good but every time subString returns a new String that would add to the space complexity of the program. Even the StringBuilder subString method returns a new String. What do you think?
– underdog
Aug 5 '16 at 15:33














If memory is an issue then using subString might not be a good idea. I hadn't considered the creation of new Strings... that could make any time savings moot anyway.
– Maybe_Factor
Aug 8 '16 at 1:51




If memory is an issue then using subString might not be a good idea. I hadn't considered the creation of new Strings... that could make any time savings moot anyway.
– Maybe_Factor
Aug 8 '16 at 1:51










1 Answer
1






active

oldest

votes


















4














Consider



public static void findSubstringsWithKDistinctCharacters(String s, int k) {
char letters = s.toCharArray();

for (int i = 0, n = letters.length - k; i <= n; i++) {
Set<Character> uniques = new LinkedHashSet<Character>();

for (int j = i, m = i + k; j < m; j++) {
uniques.add(letters[j]);
}

if (uniques.size() == k) {
System.out.println("substring : " + uniques);
}
}
}


This is a simpler version of what you wrote. It gets rid of your r variable entirely, as it is unnecessary.



I also changed the names of sArr and set to things that I find more descriptive.



I moved the code into its own method so that it can be called multiple times.



There are two reasons to move the declaration of uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck.



Bug



Unfortunately, both this version and your original do not match the linked problem statement:




Input: aba, k = 2

Output: 3

Possible substrings are {"ab", "ba", "aba"}



Input: aa, k = 1

Output: 3

Possible substrings are {"a", "a", "aa"}




They only find two solutions for each of these, as they stop counting once there are k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring.



Complexity



Calling the time complexity $mathcal{O}(ncdot k)$ is reasonable. It's slightly more accurate to say that it is $mathcal{O}(ncdot k - k^2)$, but $k$ is never larger than $n$ so it's reasonable to view this as $mathcal{O}(ncdot k)$. Note that if you fix the algorithm, it would be $mathcal{O}(n^2)$, as the worst case is an input of all the same character. That gives substrings up to length $n$ where this is limited to length $k$.






share|improve this answer





















    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f137842%2fcounting-all-substrings-with-exactly-k-distinct-characters%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4














    Consider



    public static void findSubstringsWithKDistinctCharacters(String s, int k) {
    char letters = s.toCharArray();

    for (int i = 0, n = letters.length - k; i <= n; i++) {
    Set<Character> uniques = new LinkedHashSet<Character>();

    for (int j = i, m = i + k; j < m; j++) {
    uniques.add(letters[j]);
    }

    if (uniques.size() == k) {
    System.out.println("substring : " + uniques);
    }
    }
    }


    This is a simpler version of what you wrote. It gets rid of your r variable entirely, as it is unnecessary.



    I also changed the names of sArr and set to things that I find more descriptive.



    I moved the code into its own method so that it can be called multiple times.



    There are two reasons to move the declaration of uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck.



    Bug



    Unfortunately, both this version and your original do not match the linked problem statement:




    Input: aba, k = 2

    Output: 3

    Possible substrings are {"ab", "ba", "aba"}



    Input: aa, k = 1

    Output: 3

    Possible substrings are {"a", "a", "aa"}




    They only find two solutions for each of these, as they stop counting once there are k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring.



    Complexity



    Calling the time complexity $mathcal{O}(ncdot k)$ is reasonable. It's slightly more accurate to say that it is $mathcal{O}(ncdot k - k^2)$, but $k$ is never larger than $n$ so it's reasonable to view this as $mathcal{O}(ncdot k)$. Note that if you fix the algorithm, it would be $mathcal{O}(n^2)$, as the worst case is an input of all the same character. That gives substrings up to length $n$ where this is limited to length $k$.






    share|improve this answer


























      4














      Consider



      public static void findSubstringsWithKDistinctCharacters(String s, int k) {
      char letters = s.toCharArray();

      for (int i = 0, n = letters.length - k; i <= n; i++) {
      Set<Character> uniques = new LinkedHashSet<Character>();

      for (int j = i, m = i + k; j < m; j++) {
      uniques.add(letters[j]);
      }

      if (uniques.size() == k) {
      System.out.println("substring : " + uniques);
      }
      }
      }


      This is a simpler version of what you wrote. It gets rid of your r variable entirely, as it is unnecessary.



      I also changed the names of sArr and set to things that I find more descriptive.



      I moved the code into its own method so that it can be called multiple times.



      There are two reasons to move the declaration of uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck.



      Bug



      Unfortunately, both this version and your original do not match the linked problem statement:




      Input: aba, k = 2

      Output: 3

      Possible substrings are {"ab", "ba", "aba"}



      Input: aa, k = 1

      Output: 3

      Possible substrings are {"a", "a", "aa"}




      They only find two solutions for each of these, as they stop counting once there are k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring.



      Complexity



      Calling the time complexity $mathcal{O}(ncdot k)$ is reasonable. It's slightly more accurate to say that it is $mathcal{O}(ncdot k - k^2)$, but $k$ is never larger than $n$ so it's reasonable to view this as $mathcal{O}(ncdot k)$. Note that if you fix the algorithm, it would be $mathcal{O}(n^2)$, as the worst case is an input of all the same character. That gives substrings up to length $n$ where this is limited to length $k$.






      share|improve this answer
























        4












        4








        4






        Consider



        public static void findSubstringsWithKDistinctCharacters(String s, int k) {
        char letters = s.toCharArray();

        for (int i = 0, n = letters.length - k; i <= n; i++) {
        Set<Character> uniques = new LinkedHashSet<Character>();

        for (int j = i, m = i + k; j < m; j++) {
        uniques.add(letters[j]);
        }

        if (uniques.size() == k) {
        System.out.println("substring : " + uniques);
        }
        }
        }


        This is a simpler version of what you wrote. It gets rid of your r variable entirely, as it is unnecessary.



        I also changed the names of sArr and set to things that I find more descriptive.



        I moved the code into its own method so that it can be called multiple times.



        There are two reasons to move the declaration of uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck.



        Bug



        Unfortunately, both this version and your original do not match the linked problem statement:




        Input: aba, k = 2

        Output: 3

        Possible substrings are {"ab", "ba", "aba"}



        Input: aa, k = 1

        Output: 3

        Possible substrings are {"a", "a", "aa"}




        They only find two solutions for each of these, as they stop counting once there are k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring.



        Complexity



        Calling the time complexity $mathcal{O}(ncdot k)$ is reasonable. It's slightly more accurate to say that it is $mathcal{O}(ncdot k - k^2)$, but $k$ is never larger than $n$ so it's reasonable to view this as $mathcal{O}(ncdot k)$. Note that if you fix the algorithm, it would be $mathcal{O}(n^2)$, as the worst case is an input of all the same character. That gives substrings up to length $n$ where this is limited to length $k$.






        share|improve this answer












        Consider



        public static void findSubstringsWithKDistinctCharacters(String s, int k) {
        char letters = s.toCharArray();

        for (int i = 0, n = letters.length - k; i <= n; i++) {
        Set<Character> uniques = new LinkedHashSet<Character>();

        for (int j = i, m = i + k; j < m; j++) {
        uniques.add(letters[j]);
        }

        if (uniques.size() == k) {
        System.out.println("substring : " + uniques);
        }
        }
        }


        This is a simpler version of what you wrote. It gets rid of your r variable entirely, as it is unnecessary.



        I also changed the names of sArr and set to things that I find more descriptive.



        I moved the code into its own method so that it can be called multiple times.



        There are two reasons to move the declaration of uniques into the loop. One, this is less code. Two, if you changed this code to produce a list of results rather than print the results, the other version doesn't work. You'd have multiple copies of whatever the last set was rather than unique copies. I'd only use the clear version if I knew that performance of this method was a bottleneck.



        Bug



        Unfortunately, both this version and your original do not match the linked problem statement:




        Input: aba, k = 2

        Output: 3

        Possible substrings are {"ab", "ba", "aba"}



        Input: aa, k = 1

        Output: 3

        Possible substrings are {"a", "a", "aa"}




        They only find two solutions for each of these, as they stop counting once there are k characters (distinct or not) in the substring. They should keep going until they've verified that the next character isn't a duplicate of a character already in the substring.



        Complexity



        Calling the time complexity $mathcal{O}(ncdot k)$ is reasonable. It's slightly more accurate to say that it is $mathcal{O}(ncdot k - k^2)$, but $k$ is never larger than $n$ so it's reasonable to view this as $mathcal{O}(ncdot k)$. Note that if you fix the algorithm, it would be $mathcal{O}(n^2)$, as the worst case is an input of all the same character. That gives substrings up to length $n$ where this is limited to length $k$.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Aug 5 '16 at 2:57









        mdfst13mdfst13

        17.4k52156




        17.4k52156






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f137842%2fcounting-all-substrings-with-exactly-k-distinct-characters%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to make a Squid Proxy server?

            第一次世界大戦

            Touch on Surface Book