Determine which sentences contain the set of phrases

I am trying to solve this HackerRank challenge, which asks me to write a function that prints the indexes of all sentences that contain all of the query words. For example, if the inputs to the function are:



textQueries(sentences, queries)
textQueries(['jim likes mary','kate likes tom', 'tom does not like jim'],['jim tom', 'likes'])


Then the function will print



2
0 1


If no sentence contains the query words, -1 will be printed.



The constraints for the inputs are:




  • The length of the sentences and queries lists will not exceed 10000.

  • The number of words in any sentence or query is from 1 to 10 words.

  • Each word has at most 11 characters.

  • No word appears in more than 10 sentences.

  • Each word consists of uppercase and lowercase English letters only.


My attempt so far is to create a hashmap that maps each query word to a set containing the indexes of the sentences that contain that word, then intersect those sets with each other to get the final set. My code runs fine, but it is slow for large test cases. Is there any way to optimize it?



def textQueries(sentences, queries):
    word_map = {}
    for query in queries:
        query_words = query.split()
        final_set = set()
        set_modified = False
        for query_word in query_words:
            if query_word in word_map:
                if len(final_set) == 0 and not set_modified:
                    final_set = word_map[query_word]
                    set_modified = True
                    continue
                elif len(final_set) == 0 and set_modified:
                    break
                else:
                    final_set = final_set.intersection(word_map[query_word])
                    continue
            else:
                sentences_contain_word = []
                for index, sentence in enumerate(sentences):
                    words = sentence.split()
                    if query_word in words:
                        sentences_contain_word.append(index)
                        if len(sentences_contain_word) >= 10:
                            break
                word_map[query_word] = set(sentences_contain_word)
                if len(final_set) == 0 and not set_modified:
                    final_set = word_map[query_word]
                    set_modified = True
                elif len(final_set) == 0 and set_modified:
                    break
                else:
                    final_set = final_set.intersection(word_map[query_word])
        if len(final_set) == 0:
            print(-1)
        else:
            for index in sorted(final_set):
                print(str(index), end=' ')
            print()

python performance programming-challenge

• Welcome to Code Review! Your example prints 2 followed by 0 1 if I execute the code. Could you verify that your code and the example output are correct? – Alex, 3 hours ago

• Yeah, it should be 2 and 0 1, sorry for the confusion. – Tien Dinh, 3 hours ago

2 Answers

Your code is fairly readable, but I find it quite dense. Here are some general things you could do to improve it:




  1. Use vertical white space to separate different areas of the code.


  2. Provide a docblock at the top outlining the algorithm you are using.


  3. Use more than one function to break your code into smaller chunks.



With that advice out of the way, let's look at your performance with an eye to the worst possible case:



def textQueries(sentences, queries):
    for query in queries:                        # Worst case: 10,000 times
        ...
        for query_word in query_words:           # Worst case: 10 times
            if query_word in word_map:
                final_set.intersection           # Worst case: O(10,000) in C
            else:
                for ... in enumerate(sentences): # Worst case: 10,000 times
                    if ... else:
                        final_set.intersection   # Worst case: O(10,000) in C


According to me, your code runs worst case in $O(Q \times W \times S)$ where $Q$ is the number of queries, $W$ is the number of words in the query, and $S$ is the number of sentences. (The number of words in the sentences is hidden behind a split, but it's in C so we'll let it go by.)



There's no question you need to split the sentences and split the queries into separate words, somehow. But you are nesting your loops when the "best" way to loop is to do them one after the other. If you can convert from this:



for q in Q:
    for s in S:
        pass


to something like this:



for s in S:
    pass

for q in Q:
    pass


You will have converted from $O(S \times Q)$ to $O(S + Q)$, which is considerably faster.



Your current approach is to define a set of sentences that contain a particular word. You cache these sets, in case the same words appear again. You intersect the sets for each word in the query, producing a final set of all the sentences that contain all the query words.



This is pretty much guaranteed to fail based on the instructions you quoted:




No word appears in more than 10 sentences.




In other words, the probability of a "cache hit" is 10/10,000, or 0.1%. Any time you spend maintaining the cache is wasted time.



Suppose you just go through the sentences one time, and build the big dictionary of sets:



import collections

sentence_words = collections.defaultdict(set)
for i, s in enumerate(sentences):
    for w in s.split():
        sentence_words[w].add(i)  # key by word (not by sentence) so the lookups below work


Then you can go through the query words and do what you were doing before, without any caching:



import functools, operator

for q in queries:
    matches = functools.reduce(operator.and_, (sentence_words[w] for w in q.split()))
    matches = [-1] if not matches else sorted(matches)


This will seem very similar to what you were doing, but the difference is that these two loops are run one after the other, providing something like $O(100,000 + 100,000)$ performance instead of $O(100,000 \times 10,000)$. It's that last operator that makes all the difference.
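
To tie the two snippets together, here is a minimal, self-contained sketch of the indexed approach described above. The function name text_queries_indexed and the demonstration call at the end are mine, not part of the original answer; the output convention (space-separated indices, or -1 when nothing matches) is taken from the question.

import collections
import functools
import operator

def text_queries_indexed(sentences, queries):
    # Pass 1: index every sentence once, mapping each word to the set of
    # indices of the sentences that contain it.
    word_to_sentences = collections.defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            word_to_sentences[word].add(i)

    # Pass 2: for each query, intersect the index sets of its words.
    for query in queries:
        matches = functools.reduce(
            operator.and_,
            (word_to_sentences.get(word, set()) for word in query.split()))
        print(' '.join(map(str, sorted(matches))) if matches else -1)

text_queries_indexed(
    ['jim likes mary', 'kate likes tom', 'tom does not like jim'],
    ['jim tom', 'likes'])
# Prints:
# 2
# 0 1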






– Austin Hastings

    I'm not sure how much there is to be gained performance-wise, since the problem has an inherent theoretical complexity limit (quite nicely elaborated in this answer) that not even the most well-crafted code can overcome. Additionally, Python usually is not terribly fast when you have a lot of (nested) loops.



    Python IDEs like Spyder come with a built-in profiler tool (see the docs), which allows you to identify performance bottlenecks. Having a quick run with your small example code showed me that on my machine most of the time is actually spent in printing the values.

    Feel free to give it a go with more realistic inputs on your machine.
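
    If you are not using an IDE, the standard library's cProfile module gives a similar breakdown. A quick sketch, assuming textQueries is defined in the same module and simply reusing the question's example inputs:

    import cProfile

    sentences = ['jim likes mary', 'kate likes tom', 'tom does not like jim']
    queries = ['jim tom', 'likes']

    # Sort the per-function statistics by cumulative time to see where the time goes.
    cProfile.run('textQueries(sentences, queries)', sort='cumulative')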





    Nevertheless, I think your code could easily gain quite a bit on the style and clarity side.



    What do I mean by this? Well, let me elaborate:



    What (I think) your code wants to do:




    1. Build a dictionary/hash map where each query word is a key that maps to a collection of indices of all sentences where the query word occurs.

    2. Use this dictionary to determine the indices of all sentences containing all the query's words

    3. Print those indices to the command line


    Printing



    Let's start with an easy one, #3. Your code was



    if len(final_set) == 0:
        print(-1)
    else:
        for index in sorted(final_set):
            print(str(index), end=' ')
        print()


    which does what you want to do. But can we do better?
    I would like to propose a slight improvement.



    if final_set:
        print(' '.join(str(i) for i in sorted(final_set)))
    else:
        print(-1)


    So what is going on here?

    First, if final_set: quite obviously replaces if len(final_set) == 0:.

    How? In Python, empty "iterables" like lists, tuples, strings or, in your case, sets evaluate to False, so -1 is printed in the console, just as before.

    Next, ' '.join(str(i) for i in sorted(final_set)) uses a generator expression together with the built-in join function to generate the final indices output.
    This does basically the same as your original code and is just a little shorter; as a bonus, you get the newline for free.



    Building the hash map



    To get to point #1, most of the heavy lifting is done in the else branch of the major if statement in your code.

    So far so straightforward, not much to gain here. In my opinion, words = sentence.split() should not be necessary. Python's in is capable of finding the word in the full string, too.

    One could argue that it would increase clarity and maintainability to move this functionality to the beginning of the function (or into a separate function altogether) to better separate subtasks #1 and #2.



    Using the hash map



    Last but not least, #2.

    Your code in the if branch gets a little bit involved here.

    If you build your word_map before processing the queries, and if I'm not mistaken, this can be simplified quite a bit to



    final_set = set()
    for word in query.split():
        if not final_set:
            final_set = word_map.get(word, set())
        else:
            final_set = final_set.intersection(word_map.get(word, set()))
        if not final_set:
            break


    I have talked about using the "truthiness/falsiness" of iterables before, so the use here should not surprise you. word_map.get(word, set()) is just a way of saying "get me the value of key word, and if there's nothing, return set()".
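
    A tiny illustration of that default behaviour (the values here are made up just for the example):

    word_map = {'jim': {0, 2}}
    print(word_map.get('jim', set()))   # {0, 2} -- existing key returns its stored value
    print(word_map.get('kate', set()))  # set()  -- missing key falls back to the default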



    Final code



    My final code was



    def textQueriesRevised(sentences, queries):
        # 1. build the hash map
        word_map = {}
        for query in queries:
            for query_word in query.split():
                for i, sentence in enumerate(sentences):
                    if query_word in sentence:
                        word_map.setdefault(query_word, []).append(i)
        for query in queries:
            # 2. look up in the hash map
            final_set = set()
            for word in query.split():
                if not final_set:
                    final_set = set(word_map.get(word, []))
                else:
                    final_set = final_set.intersection(word_map.get(word, []))
                if not final_set:
                    break
            # 3. print the indices
            if final_set:
                print(' '.join(str(i) for i in sorted(final_set)))
                # or
                # print(*sorted(final_set), sep=' ')
            else:
                print(-1)


    You will find subtle differences to some of the snippets posted above, but these mainly stem from me not wanting to work with sets while populating word_map (no particular reason for this, apart from that it's getting late here).
    Performance-wise this should be comparable to your original solution, at least in the cases where I tested it.
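
    For reference, calling the revised function with the question's sample inputs should reproduce the expected output (this is just the example from the question, not an additional test case):

    textQueriesRevised(
        ['jim likes mary', 'kate likes tom', 'tom does not like jim'],
        ['jim tom', 'likes'])
    # 2
    # 0 1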






    – Alex
