Count number of pairs of strings which on joining contain all vowels












2












$begingroup$


Problem



I have $N$ words. For each valid $i$, word $i$ is described by a string $D_i$ containing only lowercase vowels, i.e. characters 'a', 'e', 'i', 'o', 'u'.



We check the number of strings $D_i$ and $D_j$ which, when concatenated into a new string $M$, contain all 5 vowels. What is the total number of (unordered) pairs of words such that when concatenated, they contain all the vowels?



Input




  • The first line of the input contains a single integer $T$ denoting the number of test cases. The description of $T$ test cases follows.

  • The first line of each test case contains a single integer $N$.


  • $N$ lines follow. For each valid $i$, the $i^{th}$ of these lines contains a single string $D_i$.


Output



For each test case, print a single line containing one integer - the number of concatenated words that contain all vowels.



Test Case



3
aaooaoaooa
uiieieiieieuuu
aeioooeeiiaiei


Output



2


Constraints




  • $1≤T≤1,000$

  • $1≤N≤10^5$


  • $1≤|D_i|≤1,000$ for each valid $i$

  • the sum of all |$D_i$| over all test cases does not exceed $3⋅10^7$


I have written some code but I want to optimize it:



#include<bits/stdc++.h>
using namespace std;

int main()
{
int t;
cin>>t;
while(t--)
{
int n;
cin>>n;
vector<string> v;
long long con_s[n];
for(int i=0;i<n;i++)
{
string s;
cin>>s;
con_s[i]=0;
for(int j=0;j<s.length();j++)
{
con_s[i] = con_s[i] | (1<<(s[j]-'a'));
}
}
long long complete = 0;
complete = (1<<('a'-'a')) | (1<<('e'-'a')) | (1<<('i'-'a')) | (1<<('o'-'a')) | (1<<('u'-'a'));
// cout<<complete;
int count = 0;
for(int i=0;i<n-1;i++)
{
for(int j=i+1;j<n;j++)
{
if((con_s[i] | con_s[j])==complete)count++;
}
}
cout<<count<<"n";
}
return 0;
}


Can you please suggest a more optimized solution...










share|improve this question









New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$












  • $begingroup$
    Welcome! As it's written, your code is a little difficult to understand, particularly with the bit shifting. My suggestion is to use more informative variable names. Short one-letter variable names are good for loops and writing quick code, but they make it very difficult for others (even yourself) to review your code later on. I'd also recommend moving some of the code out of main and into functions.
    $endgroup$
    – AleksandrH
    12 hours ago












  • $begingroup$
    Is the "T" variable the number of tests, or something else? I don't see it mentioned in the description of the problem.
    $endgroup$
    – Austin Hastings
    9 hours ago










  • $begingroup$
    @AustinHastings yes, T is the number of testcases. Can you help me solve this ?
    $endgroup$
    – AnonymousJoe
    9 hours ago






  • 1




    $begingroup$
    @AleksandrH Please put all suggestions for improvements in answers, not comments.
    $endgroup$
    – 200_success
    7 hours ago
















2












$begingroup$


Problem



I have $N$ words. For each valid $i$, word $i$ is described by a string $D_i$ containing only lowercase vowels, i.e. characters 'a', 'e', 'i', 'o', 'u'.



We check the number of strings $D_i$ and $D_j$ which, when concatenated into a new string $M$, contain all 5 vowels. What is the total number of (unordered) pairs of words such that when concatenated, they contain all the vowels?



Input




  • The first line of the input contains a single integer $T$ denoting the number of test cases. The description of $T$ test cases follows.

  • The first line of each test case contains a single integer $N$.


  • $N$ lines follow. For each valid $i$, the $i^{th}$ of these lines contains a single string $D_i$.


Output



For each test case, print a single line containing one integer - the number of concatenated words that contain all vowels.



Test Case



3
aaooaoaooa
uiieieiieieuuu
aeioooeeiiaiei


Output



2


Constraints




  • $1≤T≤1,000$

  • $1≤N≤10^5$


  • $1≤|D_i|≤1,000$ for each valid $i$

  • the sum of all |$D_i$| over all test cases does not exceed $3⋅10^7$


I have written some code but I want to optimize it:



#include<bits/stdc++.h>
using namespace std;

int main()
{
int t;
cin>>t;
while(t--)
{
int n;
cin>>n;
vector<string> v;
long long con_s[n];
for(int i=0;i<n;i++)
{
string s;
cin>>s;
con_s[i]=0;
for(int j=0;j<s.length();j++)
{
con_s[i] = con_s[i] | (1<<(s[j]-'a'));
}
}
long long complete = 0;
complete = (1<<('a'-'a')) | (1<<('e'-'a')) | (1<<('i'-'a')) | (1<<('o'-'a')) | (1<<('u'-'a'));
// cout<<complete;
int count = 0;
for(int i=0;i<n-1;i++)
{
for(int j=i+1;j<n;j++)
{
if((con_s[i] | con_s[j])==complete)count++;
}
}
cout<<count<<"n";
}
return 0;
}


Can you please suggest a more optimized solution...










share|improve this question









New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$












  • $begingroup$
    Welcome! As it's written, your code is a little difficult to understand, particularly with the bit shifting. My suggestion is to use more informative variable names. Short one-letter variable names are good for loops and writing quick code, but they make it very difficult for others (even yourself) to review your code later on. I'd also recommend moving some of the code out of main and into functions.
    $endgroup$
    – AleksandrH
    12 hours ago












  • $begingroup$
    Is the "T" variable the number of tests, or something else? I don't see it mentioned in the description of the problem.
    $endgroup$
    – Austin Hastings
    9 hours ago










  • $begingroup$
    @AustinHastings yes, T is the number of testcases. Can you help me solve this ?
    $endgroup$
    – AnonymousJoe
    9 hours ago






  • 1




    $begingroup$
    @AleksandrH Please put all suggestions for improvements in answers, not comments.
    $endgroup$
    – 200_success
    7 hours ago














2












2








2





$begingroup$


Problem



I have $N$ words. For each valid $i$, word $i$ is described by a string $D_i$ containing only lowercase vowels, i.e. characters 'a', 'e', 'i', 'o', 'u'.



We check the number of strings $D_i$ and $D_j$ which, when concatenated into a new string $M$, contain all 5 vowels. What is the total number of (unordered) pairs of words such that when concatenated, they contain all the vowels?



Input




  • The first line of the input contains a single integer $T$ denoting the number of test cases. The description of $T$ test cases follows.

  • The first line of each test case contains a single integer $N$.


  • $N$ lines follow. For each valid $i$, the $i^{th}$ of these lines contains a single string $D_i$.


Output



For each test case, print a single line containing one integer - the number of concatenated words that contain all vowels.



Test Case



3
aaooaoaooa
uiieieiieieuuu
aeioooeeiiaiei


Output



2


Constraints




  • $1≤T≤1,000$

  • $1≤N≤10^5$


  • $1≤|D_i|≤1,000$ for each valid $i$

  • the sum of all |$D_i$| over all test cases does not exceed $3⋅10^7$


I have written some code but I want to optimize it:



#include<bits/stdc++.h>
using namespace std;

int main()
{
int t;
cin>>t;
while(t--)
{
int n;
cin>>n;
vector<string> v;
long long con_s[n];
for(int i=0;i<n;i++)
{
string s;
cin>>s;
con_s[i]=0;
for(int j=0;j<s.length();j++)
{
con_s[i] = con_s[i] | (1<<(s[j]-'a'));
}
}
long long complete = 0;
complete = (1<<('a'-'a')) | (1<<('e'-'a')) | (1<<('i'-'a')) | (1<<('o'-'a')) | (1<<('u'-'a'));
// cout<<complete;
int count = 0;
for(int i=0;i<n-1;i++)
{
for(int j=i+1;j<n;j++)
{
if((con_s[i] | con_s[j])==complete)count++;
}
}
cout<<count<<"n";
}
return 0;
}


Can you please suggest a more optimized solution...










share|improve this question









New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$




Problem



I have $N$ words. For each valid $i$, word $i$ is described by a string $D_i$ containing only lowercase vowels, i.e. characters 'a', 'e', 'i', 'o', 'u'.



We check the number of strings $D_i$ and $D_j$ which, when concatenated into a new string $M$, contain all 5 vowels. What is the total number of (unordered) pairs of words such that when concatenated, they contain all the vowels?



Input




  • The first line of the input contains a single integer $T$ denoting the number of test cases. The description of $T$ test cases follows.

  • The first line of each test case contains a single integer $N$.


  • $N$ lines follow. For each valid $i$, the $i^{th}$ of these lines contains a single string $D_i$.


Output



For each test case, print a single line containing one integer - the number of concatenated words that contain all vowels.



Test Case



3
aaooaoaooa
uiieieiieieuuu
aeioooeeiiaiei


Output



2


Constraints




  • $1≤T≤1,000$

  • $1≤N≤10^5$


  • $1≤|D_i|≤1,000$ for each valid $i$

  • the sum of all |$D_i$| over all test cases does not exceed $3⋅10^7$


I have written some code but I want to optimize it:



#include<bits/stdc++.h>
using namespace std;

int main()
{
int t;
cin>>t;
while(t--)
{
int n;
cin>>n;
vector<string> v;
long long con_s[n];
for(int i=0;i<n;i++)
{
string s;
cin>>s;
con_s[i]=0;
for(int j=0;j<s.length();j++)
{
con_s[i] = con_s[i] | (1<<(s[j]-'a'));
}
}
long long complete = 0;
complete = (1<<('a'-'a')) | (1<<('e'-'a')) | (1<<('i'-'a')) | (1<<('o'-'a')) | (1<<('u'-'a'));
// cout<<complete;
int count = 0;
for(int i=0;i<n-1;i++)
{
for(int j=i+1;j<n;j++)
{
if((con_s[i] | con_s[j])==complete)count++;
}
}
cout<<count<<"n";
}
return 0;
}


Can you please suggest a more optimized solution...







c++






share|improve this question









New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 3 hours ago









1201ProgramAlarm

3,2532923




3,2532923






New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 14 hours ago









AnonymousJoeAnonymousJoe

112




112




New contributor




AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






AnonymousJoe is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • $begingroup$
    Welcome! As it's written, your code is a little difficult to understand, particularly with the bit shifting. My suggestion is to use more informative variable names. Short one-letter variable names are good for loops and writing quick code, but they make it very difficult for others (even yourself) to review your code later on. I'd also recommend moving some of the code out of main and into functions.
    $endgroup$
    – AleksandrH
    12 hours ago












  • $begingroup$
    Is the "T" variable the number of tests, or something else? I don't see it mentioned in the description of the problem.
    $endgroup$
    – Austin Hastings
    9 hours ago










  • $begingroup$
    @AustinHastings yes, T is the number of testcases. Can you help me solve this ?
    $endgroup$
    – AnonymousJoe
    9 hours ago






  • 1




    $begingroup$
    @AleksandrH Please put all suggestions for improvements in answers, not comments.
    $endgroup$
    – 200_success
    7 hours ago


















  • $begingroup$
    Welcome! As it's written, your code is a little difficult to understand, particularly with the bit shifting. My suggestion is to use more informative variable names. Short one-letter variable names are good for loops and writing quick code, but they make it very difficult for others (even yourself) to review your code later on. I'd also recommend moving some of the code out of main and into functions.
    $endgroup$
    – AleksandrH
    12 hours ago












  • $begingroup$
    Is the "T" variable the number of tests, or something else? I don't see it mentioned in the description of the problem.
    $endgroup$
    – Austin Hastings
    9 hours ago










  • $begingroup$
    @AustinHastings yes, T is the number of testcases. Can you help me solve this ?
    $endgroup$
    – AnonymousJoe
    9 hours ago






  • 1




    $begingroup$
    @AleksandrH Please put all suggestions for improvements in answers, not comments.
    $endgroup$
    – 200_success
    7 hours ago
















$begingroup$
Welcome! As it's written, your code is a little difficult to understand, particularly with the bit shifting. My suggestion is to use more informative variable names. Short one-letter variable names are good for loops and writing quick code, but they make it very difficult for others (even yourself) to review your code later on. I'd also recommend moving some of the code out of main and into functions.
$endgroup$
– AleksandrH
12 hours ago






$begingroup$
Welcome! As it's written, your code is a little difficult to understand, particularly with the bit shifting. My suggestion is to use more informative variable names. Short one-letter variable names are good for loops and writing quick code, but they make it very difficult for others (even yourself) to review your code later on. I'd also recommend moving some of the code out of main and into functions.
$endgroup$
– AleksandrH
12 hours ago














$begingroup$
Is the "T" variable the number of tests, or something else? I don't see it mentioned in the description of the problem.
$endgroup$
– Austin Hastings
9 hours ago




$begingroup$
Is the "T" variable the number of tests, or something else? I don't see it mentioned in the description of the problem.
$endgroup$
– Austin Hastings
9 hours ago












$begingroup$
@AustinHastings yes, T is the number of testcases. Can you help me solve this ?
$endgroup$
– AnonymousJoe
9 hours ago




$begingroup$
@AustinHastings yes, T is the number of testcases. Can you help me solve this ?
$endgroup$
– AnonymousJoe
9 hours ago




1




1




$begingroup$
@AleksandrH Please put all suggestions for improvements in answers, not comments.
$endgroup$
– 200_success
7 hours ago




$begingroup$
@AleksandrH Please put all suggestions for improvements in answers, not comments.
$endgroup$
– 200_success
7 hours ago










2 Answers
2






active

oldest

votes


















1












$begingroup$

I'm no C++ guru, but some things jump out at me:





  1. Your coding style is very dense, which makes it hard to read and review. I'd suggest that you switch to a style that doesn't make you pay for each time you add a space. Code like this:



    cin>>t;
    while(t--)


    Just isn't as easy to read as code like this:



    cin >> t;
    while (t--)


    The little things make a difference! And even if you're in some kind of "shortest code" context, I think you should still write it long, and then compress it once you have it working.



  2. The first for loop in your while loop is concerned with reading and parsing the input strings. I believe you should break that into a separate function. What's more, I think you should make a slight change to your data storage as @Deduplicator suggested: instead of just using the bits located at (1 << (vowel - 'a')), at some point you should compress the bits down into the range 1<<0 .. 1<<4.


  3. I believe that the description says "(unordered) pairs" and means that if two strings, A and B, can be concatenated to meet the requirements, they only count once, as in set{A, B} and not twice, as in pair(A,B), pair(B,A). So, as @Deduplicator suggested, if you "bin" your words - that is, categorize them according to the trait "which vowels are present", then you can represent a bin with just an integer (how many words are in the bin). So you would then cross-match every bin with every other bin to determine whether the binned words can successfully pair, and add an appropriate number to the count.



With that in mind, a successful solution would look something like this:



for each input word:
scan the word for vowels, recording which vowels were found
classify the found vowels into a small integer with bits 0..4 set
use the small integer as the "bin" number for that word
increment the count for that bin#


This should be $O(n cdot s)$ on the number of words, n, and the length of the words, s, in time. The only optimization would be to break out of the scanning loop if you match every single vowel, since further scanning gains you nothing. The code will require storage for the input, but compiles everything down to a single bin number, so the storage will be from $O(1)$ to $O(s)$ depending on how you implement the code. The bin numbers are members of a fixed set, so their counts will be $O(2^5)$ regardless, which simplifies to $O(1)$.



Now since matched pairs of words only count one time, you can loop "upwards" when generating combinations:



for each bin 0 .. 2^5 - 1
for each higher bin (looping upwards here):
if the two bins match
the count of word pairs is # in bin-1 * # in bin-2
add the count to the total count of matchable pairs


This should be $O(n^2)$ on the number of bins (not words!), which is a constant. You can't optimize much, except that bins with counts of zero won't contribute anything so can be skipped.



At this point you have your answer and just need to print it out.






share|improve this answer









$endgroup$













  • $begingroup$
    It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
    $endgroup$
    – Deduplicator
    8 hours ago





















1












$begingroup$

Your Code:




  1. Don't use <bits/stdc++.h>, it's unportable and inefficient. See "Why should I not #include <bits/stdc++.h>?" for the details.


  2. Don't use using namespace std;, that namespace just isn't designed for it. See "Why is “using namespace std” considered bad practice?" for more background.


  3. Invest in some better names. Being too parsimonious there hurts, even though brevity is a virtue.



  4. Format your code consistentently.




    • Sometimes you have space around binary operators (other than comma), sometimes you don't.

    • Also add a space after #include, for, if, and while.

    • You don't have to use a block for single statements, but please don't put them on the same line, at the least separate them with a space.



  5. C++ does not have VLAs. Just use a std::vector instead.


  6. While in general, keeping the scope of a variable minimal is a good idea, there are exceptions. One of them is efficiency. Always spinning up a new std::string means being unable to re-use the buffer. Not that you really need the whole string at once, anyway.


  7. One reason you should minimise a variables scope, is that it allows you to initialise it to the proper value, instead of leaving it uninitialised or, horrors of horrors, adding a spurious dummy-initialisation.

    That also allows you to avoid writing the type (Almost Always auto), and making it const or even constexpr.


  8. There are compound-assignment-operators for most binary operators. Like a |= b for a = a | b. Using them leads to shorter, more readable code.


  9. When you want to output a single character, why not use a character-literal? It's potentially even slightly more efficient.


  10. return 0; is implicit for main(). Take it or leave it, but be aware.



The Algorithm:



Your algorithm uses $O(#characters+#words^2)$ time and $O(max_word_length + #words)$ space.



An optimal algorithm only needs $O(#characters+2^{#vowels})$ time and $O(1)$ space:




  1. For every word:


    1. Set the bits for all the vowels contained.

    2. Increment the count on the indicated bin.



  2. For every vowel:


    1. Iterate the bins containing words with that vowel.

    2. Add the count to the respective bin without that vowel.



  3. For all bins:
    Multiply the count in the bin with the count in the complementary bin, and add that.

  4. Subtract twice the count in the bin for words containing all vowels. Those account for all self-pairings.

  5. The answer is half the calculated number, as we counted double.


Exercise for the attentive reader: Save half the multiplications this algorithm uses.






share|improve this answer











$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    AnonymousJoe is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f214644%2fcount-number-of-pairs-of-strings-which-on-joining-contain-all-vowels%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1












    $begingroup$

    I'm no C++ guru, but some things jump out at me:





    1. Your coding style is very dense, which makes it hard to read and review. I'd suggest that you switch to a style that doesn't make you pay for each time you add a space. Code like this:



      cin>>t;
      while(t--)


      Just isn't as easy to read as code like this:



      cin >> t;
      while (t--)


      The little things make a difference! And even if you're in some kind of "shortest code" context, I think you should still write it long, and then compress it once you have it working.



    2. The first for loop in your while loop is concerned with reading and parsing the input strings. I believe you should break that into a separate function. What's more, I think you should make a slight change to your data storage as @Deduplicator suggested: instead of just using the bits located at (1 << (vowel - 'a')), at some point you should compress the bits down into the range 1<<0 .. 1<<4.


    3. I believe that the description says "(unordered) pairs" and means that if two strings, A and B, can be concatenated to meet the requirements, they only count once, as in set{A, B} and not twice, as in pair(A,B), pair(B,A). So, as @Deduplicator suggested, if you "bin" your words - that is, categorize them according to the trait "which vowels are present", then you can represent a bin with just an integer (how many words are in the bin). So you would then cross-match every bin with every other bin to determine whether the binned words can successfully pair, and add an appropriate number to the count.



    With that in mind, a successful solution would look something like this:



    for each input word:
    scan the word for vowels, recording which vowels were found
    classify the found vowels into a small integer with bits 0..4 set
    use the small integer as the "bin" number for that word
    increment the count for that bin#


    This should be $O(n cdot s)$ on the number of words, n, and the length of the words, s, in time. The only optimization would be to break out of the scanning loop if you match every single vowel, since further scanning gains you nothing. The code will require storage for the input, but compiles everything down to a single bin number, so the storage will be from $O(1)$ to $O(s)$ depending on how you implement the code. The bin numbers are members of a fixed set, so their counts will be $O(2^5)$ regardless, which simplifies to $O(1)$.



    Now since matched pairs of words only count one time, you can loop "upwards" when generating combinations:



    for each bin 0 .. 2^5 - 1
    for each higher bin (looping upwards here):
    if the two bins match
    the count of word pairs is # in bin-1 * # in bin-2
    add the count to the total count of matchable pairs


    This should be $O(n^2)$ on the number of bins (not words!), which is a constant. You can't optimize much, except that bins with counts of zero won't contribute anything so can be skipped.



    At this point you have your answer and just need to print it out.






    share|improve this answer









    $endgroup$













    • $begingroup$
      It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
      $endgroup$
      – Deduplicator
      8 hours ago


















    1












    $begingroup$

    I'm no C++ guru, but some things jump out at me:





    1. Your coding style is very dense, which makes it hard to read and review. I'd suggest that you switch to a style that doesn't make you pay for each time you add a space. Code like this:



      cin>>t;
      while(t--)


      Just isn't as easy to read as code like this:



      cin >> t;
      while (t--)


      The little things make a difference! And even if you're in some kind of "shortest code" context, I think you should still write it long, and then compress it once you have it working.



    2. The first for loop in your while loop is concerned with reading and parsing the input strings. I believe you should break that into a separate function. What's more, I think you should make a slight change to your data storage as @Deduplicator suggested: instead of just using the bits located at (1 << (vowel - 'a')), at some point you should compress the bits down into the range 1<<0 .. 1<<4.


    3. I believe that the description says "(unordered) pairs" and means that if two strings, A and B, can be concatenated to meet the requirements, they only count once, as in set{A, B} and not twice, as in pair(A,B), pair(B,A). So, as @Deduplicator suggested, if you "bin" your words - that is, categorize them according to the trait "which vowels are present", then you can represent a bin with just an integer (how many words are in the bin). So you would then cross-match every bin with every other bin to determine whether the binned words can successfully pair, and add an appropriate number to the count.



    With that in mind, a successful solution would look something like this:



    for each input word:
    scan the word for vowels, recording which vowels were found
    classify the found vowels into a small integer with bits 0..4 set
    use the small integer as the "bin" number for that word
    increment the count for that bin#


    This should be $O(n cdot s)$ on the number of words, n, and the length of the words, s, in time. The only optimization would be to break out of the scanning loop if you match every single vowel, since further scanning gains you nothing. The code will require storage for the input, but compiles everything down to a single bin number, so the storage will be from $O(1)$ to $O(s)$ depending on how you implement the code. The bin numbers are members of a fixed set, so their counts will be $O(2^5)$ regardless, which simplifies to $O(1)$.



    Now since matched pairs of words only count one time, you can loop "upwards" when generating combinations:



    for each bin 0 .. 2^5 - 1
    for each higher bin (looping upwards here):
    if the two bins match
    the count of word pairs is # in bin-1 * # in bin-2
    add the count to the total count of matchable pairs


    This should be $O(n^2)$ on the number of bins (not words!), which is a constant. You can't optimize much, except that bins with counts of zero won't contribute anything so can be skipped.



    At this point you have your answer and just need to print it out.






    share|improve this answer









    $endgroup$













    • $begingroup$
      It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
      $endgroup$
      – Deduplicator
      8 hours ago
















    1












    1








    1





    $begingroup$

    I'm no C++ guru, but some things jump out at me:





    1. Your coding style is very dense, which makes it hard to read and review. I'd suggest that you switch to a style that doesn't make you pay for each time you add a space. Code like this:



      cin>>t;
      while(t--)


      Just isn't as easy to read as code like this:



      cin >> t;
      while (t--)


      The little things make a difference! And even if you're in some kind of "shortest code" context, I think you should still write it long, and then compress it once you have it working.



    2. The first for loop in your while loop is concerned with reading and parsing the input strings. I believe you should break that into a separate function. What's more, I think you should make a slight change to your data storage as @Deduplicator suggested: instead of just using the bits located at (1 << (vowel - 'a')), at some point you should compress the bits down into the range 1<<0 .. 1<<4.


    3. I believe that the description says "(unordered) pairs" and means that if two strings, A and B, can be concatenated to meet the requirements, they only count once, as in set{A, B} and not twice, as in pair(A,B), pair(B,A). So, as @Deduplicator suggested, if you "bin" your words - that is, categorize them according to the trait "which vowels are present", then you can represent a bin with just an integer (how many words are in the bin). So you would then cross-match every bin with every other bin to determine whether the binned words can successfully pair, and add an appropriate number to the count.



    With that in mind, a successful solution would look something like this:



    for each input word:
    scan the word for vowels, recording which vowels were found
    classify the found vowels into a small integer with bits 0..4 set
    use the small integer as the "bin" number for that word
    increment the count for that bin#


    This should be $O(n cdot s)$ on the number of words, n, and the length of the words, s, in time. The only optimization would be to break out of the scanning loop if you match every single vowel, since further scanning gains you nothing. The code will require storage for the input, but compiles everything down to a single bin number, so the storage will be from $O(1)$ to $O(s)$ depending on how you implement the code. The bin numbers are members of a fixed set, so their counts will be $O(2^5)$ regardless, which simplifies to $O(1)$.



    Now since matched pairs of words only count one time, you can loop "upwards" when generating combinations:



    for each bin 0 .. 2^5 - 1
    for each higher bin (looping upwards here):
    if the two bins match
    the count of word pairs is # in bin-1 * # in bin-2
    add the count to the total count of matchable pairs


    This should be $O(n^2)$ on the number of bins (not words!), which is a constant. You can't optimize much, except that bins with counts of zero won't contribute anything so can be skipped.



    At this point you have your answer and just need to print it out.






    share|improve this answer









    $endgroup$



    I'm no C++ guru, but some things jump out at me:





    1. Your coding style is very dense, which makes it hard to read and review. I'd suggest that you switch to a style that doesn't make you pay for each time you add a space. Code like this:



      cin>>t;
      while(t--)


      Just isn't as easy to read as code like this:



      cin >> t;
      while (t--)


      The little things make a difference! And even if you're in some kind of "shortest code" context, I think you should still write it long, and then compress it once you have it working.



    2. The first for loop in your while loop is concerned with reading and parsing the input strings. I believe you should break that into a separate function. What's more, I think you should make a slight change to your data storage as @Deduplicator suggested: instead of just using the bits located at (1 << (vowel - 'a')), at some point you should compress the bits down into the range 1<<0 .. 1<<4.


    3. I believe that the description says "(unordered) pairs" and means that if two strings, A and B, can be concatenated to meet the requirements, they only count once, as in set{A, B} and not twice, as in pair(A,B), pair(B,A). So, as @Deduplicator suggested, if you "bin" your words - that is, categorize them according to the trait "which vowels are present", then you can represent a bin with just an integer (how many words are in the bin). So you would then cross-match every bin with every other bin to determine whether the binned words can successfully pair, and add an appropriate number to the count.



    With that in mind, a successful solution would look something like this:



    for each input word:
    scan the word for vowels, recording which vowels were found
    classify the found vowels into a small integer with bits 0..4 set
    use the small integer as the "bin" number for that word
    increment the count for that bin#


    This should be $O(n cdot s)$ on the number of words, n, and the length of the words, s, in time. The only optimization would be to break out of the scanning loop if you match every single vowel, since further scanning gains you nothing. The code will require storage for the input, but compiles everything down to a single bin number, so the storage will be from $O(1)$ to $O(s)$ depending on how you implement the code. The bin numbers are members of a fixed set, so their counts will be $O(2^5)$ regardless, which simplifies to $O(1)$.



    Now since matched pairs of words only count one time, you can loop "upwards" when generating combinations:



    for each bin 0 .. 2^5 - 1
    for each higher bin (looping upwards here):
    if the two bins match
    the count of word pairs is # in bin-1 * # in bin-2
    add the count to the total count of matchable pairs


    This should be $O(n^2)$ on the number of bins (not words!), which is a constant. You can't optimize much, except that bins with counts of zero won't contribute anything so can be skipped.



    At this point you have your answer and just need to print it out.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered 9 hours ago









    Austin HastingsAustin Hastings

    6,9671233




    6,9671233












    • $begingroup$
      It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
      $endgroup$
      – Deduplicator
      8 hours ago




















    • $begingroup$
      It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
      $endgroup$
      – Deduplicator
      8 hours ago


















    $begingroup$
    It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
    $endgroup$
    – Deduplicator
    8 hours ago






    $begingroup$
    It is simple to optimise to runtime O(characters+bins). And you are right, size is O(1).
    $endgroup$
    – Deduplicator
    8 hours ago















    1












    $begingroup$

    Your Code:




    1. Don't use <bits/stdc++.h>, it's unportable and inefficient. See "Why should I not #include <bits/stdc++.h>?" for the details.


    2. Don't use using namespace std;, that namespace just isn't designed for it. See "Why is “using namespace std” considered bad practice?" for more background.


    3. Invest in some better names. Being too parsimonious there hurts, even though brevity is a virtue.



    4. Format your code consistentently.




      • Sometimes you have space around binary operators (other than comma), sometimes you don't.

      • Also add a space after #include, for, if, and while.

      • You don't have to use a block for single statements, but please don't put them on the same line, at the least separate them with a space.



    5. C++ does not have VLAs. Just use a std::vector instead.


    6. While in general, keeping the scope of a variable minimal is a good idea, there are exceptions. One of them is efficiency. Always spinning up a new std::string means being unable to re-use the buffer. Not that you really need the whole string at once, anyway.


    7. One reason you should minimise a variables scope, is that it allows you to initialise it to the proper value, instead of leaving it uninitialised or, horrors of horrors, adding a spurious dummy-initialisation.

      That also allows you to avoid writing the type (Almost Always auto), and making it const or even constexpr.


    8. There are compound-assignment-operators for most binary operators. Like a |= b for a = a | b. Using them leads to shorter, more readable code.


    9. When you want to output a single character, why not use a character-literal? It's potentially even slightly more efficient.


    10. return 0; is implicit for main(). Take it or leave it, but be aware.



    The Algorithm:



    Your algorithm uses $O(#characters+#words^2)$ time and $O(max_word_length + #words)$ space.



    An optimal algorithm only needs $O(#characters+2^{#vowels})$ time and $O(1)$ space:




    1. For every word:


      1. Set the bits for all the vowels contained.

      2. Increment the count on the indicated bin.



    2. For every vowel:


      1. Iterate the bins containing words with that vowel.

      2. Add the count to the respective bin without that vowel.



    3. For all bins:
      Multiply the count in the bin with the count in the complementary bin, and add that.

    4. Subtract twice the count in the bin for words containing all vowels. Those account for all self-pairings.

    5. The answer is half the calculated number, as we counted double.


    Exercise for the attentive reader: Save half the multiplications this algorithm uses.






    share|improve this answer











    $endgroup$


















      1












      $begingroup$

      Your Code:




      1. Don't use <bits/stdc++.h>, it's unportable and inefficient. See "Why should I not #include <bits/stdc++.h>?" for the details.


      2. Don't use using namespace std;, that namespace just isn't designed for it. See "Why is “using namespace std” considered bad practice?" for more background.


      3. Invest in some better names. Being too parsimonious there hurts, even though brevity is a virtue.



      4. Format your code consistentently.




        • Sometimes you have space around binary operators (other than comma), sometimes you don't.

        • Also add a space after #include, for, if, and while.

        • You don't have to use a block for single statements, but please don't put them on the same line, at the least separate them with a space.



      5. C++ does not have VLAs. Just use a std::vector instead.


      6. While in general, keeping the scope of a variable minimal is a good idea, there are exceptions. One of them is efficiency. Always spinning up a new std::string means being unable to re-use the buffer. Not that you really need the whole string at once, anyway.


      7. One reason you should minimise a variables scope, is that it allows you to initialise it to the proper value, instead of leaving it uninitialised or, horrors of horrors, adding a spurious dummy-initialisation.

        That also allows you to avoid writing the type (Almost Always auto), and making it const or even constexpr.


      8. There are compound-assignment-operators for most binary operators. Like a |= b for a = a | b. Using them leads to shorter, more readable code.


      9. When you want to output a single character, why not use a character-literal? It's potentially even slightly more efficient.


      10. return 0; is implicit for main(). Take it or leave it, but be aware.



      The Algorithm:



      Your algorithm uses $O(#characters+#words^2)$ time and $O(max_word_length + #words)$ space.



      An optimal algorithm only needs $O(#characters+2^{#vowels})$ time and $O(1)$ space:




      1. For every word:


        1. Set the bits for all the vowels contained.

        2. Increment the count on the indicated bin.



      2. For every vowel:


        1. Iterate the bins containing words with that vowel.

        2. Add the count to the respective bin without that vowel.



      3. For all bins:
        Multiply the count in the bin with the count in the complementary bin, and add that.

      4. Subtract twice the count in the bin for words containing all vowels. Those account for all self-pairings.

      5. The answer is half the calculated number, as we counted double.


      Exercise for the attentive reader: Save half the multiplications this algorithm uses.






      share|improve this answer











      $endgroup$
















        1












        1








        1





        $begingroup$

        Your Code:




        1. Don't use <bits/stdc++.h>, it's unportable and inefficient. See "Why should I not #include <bits/stdc++.h>?" for the details.


        2. Don't use using namespace std;, that namespace just isn't designed for it. See "Why is “using namespace std” considered bad practice?" for more background.


        3. Invest in some better names. Being too parsimonious there hurts, even though brevity is a virtue.



        4. Format your code consistentently.




          • Sometimes you have space around binary operators (other than comma), sometimes you don't.

          • Also add a space after #include, for, if, and while.

          • You don't have to use a block for single statements, but please don't put them on the same line, at the least separate them with a space.



        5. C++ does not have VLAs. Just use a std::vector instead.


        6. While in general, keeping the scope of a variable minimal is a good idea, there are exceptions. One of them is efficiency. Always spinning up a new std::string means being unable to re-use the buffer. Not that you really need the whole string at once, anyway.


        7. One reason you should minimise a variables scope, is that it allows you to initialise it to the proper value, instead of leaving it uninitialised or, horrors of horrors, adding a spurious dummy-initialisation.

          That also allows you to avoid writing the type (Almost Always auto), and making it const or even constexpr.


        8. There are compound-assignment-operators for most binary operators. Like a |= b for a = a | b. Using them leads to shorter, more readable code.


        9. When you want to output a single character, why not use a character-literal? It's potentially even slightly more efficient.


        10. return 0; is implicit for main(). Take it or leave it, but be aware.



        The Algorithm:



        Your algorithm uses $O(#characters+#words^2)$ time and $O(max_word_length + #words)$ space.



        An optimal algorithm only needs $O(#characters+2^{#vowels})$ time and $O(1)$ space:




        1. For every word:


          1. Set the bits for all the vowels contained.

          2. Increment the count on the indicated bin.



        2. For every vowel:


          1. Iterate the bins containing words with that vowel.

          2. Add the count to the respective bin without that vowel.



        3. For all bins:
          Multiply the count in the bin with the count in the complementary bin, and add that.

        4. Subtract twice the count in the bin for words containing all vowels. Those account for all self-pairings.

        5. The answer is half the calculated number, as we counted double.


        Exercise for the attentive reader: Save half the multiplications this algorithm uses.






        share|improve this answer











        $endgroup$



        Your Code:




        1. Don't use <bits/stdc++.h>, it's unportable and inefficient. See "Why should I not #include <bits/stdc++.h>?" for the details.


        2. Don't use using namespace std;, that namespace just isn't designed for it. See "Why is “using namespace std” considered bad practice?" for more background.


        3. Invest in some better names. Being too parsimonious there hurts, even though brevity is a virtue.



        4. Format your code consistentently.




          • Sometimes you have space around binary operators (other than comma), sometimes you don't.

          • Also add a space after #include, for, if, and while.

          • You don't have to use a block for single statements, but please don't put them on the same line, at the least separate them with a space.



        5. C++ does not have VLAs. Just use a std::vector instead.


        6. While in general, keeping the scope of a variable minimal is a good idea, there are exceptions. One of them is efficiency. Always spinning up a new std::string means being unable to re-use the buffer. Not that you really need the whole string at once, anyway.


        7. One reason you should minimise a variables scope, is that it allows you to initialise it to the proper value, instead of leaving it uninitialised or, horrors of horrors, adding a spurious dummy-initialisation.

          That also allows you to avoid writing the type (Almost Always auto), and making it const or even constexpr.


        8. There are compound-assignment-operators for most binary operators. Like a |= b for a = a | b. Using them leads to shorter, more readable code.


        9. When you want to output a single character, why not use a character-literal? It's potentially even slightly more efficient.


        10. return 0; is implicit for main(). Take it or leave it, but be aware.



        The Algorithm:



        Your algorithm uses $O(#characters+#words^2)$ time and $O(max_word_length + #words)$ space.



        An optimal algorithm only needs $O(#characters+2^{#vowels})$ time and $O(1)$ space:




        1. For every word:


          1. Set the bits for all the vowels contained.

          2. Increment the count on the indicated bin.



        2. For every vowel:


          1. Iterate the bins containing words with that vowel.

          2. Add the count to the respective bin without that vowel.



        3. For all bins:
          Multiply the count in the bin with the count in the complementary bin, and add that.

        4. Subtract twice the count in the bin for words containing all vowels. Those account for all self-pairings.

        5. The answer is half the calculated number, as we counted double.


        Exercise for the attentive reader: Save half the multiplications this algorithm uses.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 3 hours ago

























        answered 8 hours ago









        DeduplicatorDeduplicator

        11.5k1850




        11.5k1850






















            AnonymousJoe is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            AnonymousJoe is a new contributor. Be nice, and check out our Code of Conduct.













            AnonymousJoe is a new contributor. Be nice, and check out our Code of Conduct.












            AnonymousJoe is a new contributor. Be nice, and check out our Code of Conduct.
















            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f214644%2fcount-number-of-pairs-of-strings-which-on-joining-contain-all-vowels%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to make a Squid Proxy server?

            Is this a new Fibonacci Identity?

            19世紀