grep: memory exhausted
I was doing a very simple search:
grep -R Milledgeville ~/Documents
And after some time this error appeared:
grep: memory exhausted
How can I avoid this?
I have 10GB of RAM on my system and few applications running, so I am really surprised a simple grep runs out of memory. ~/Documents is about 100GB and contains all kinds of files.
grep -RI might not have this problem, but I want to search in binary files too.
grep memory performance
asked Sep 10 '13 at 8:55 by Nicolas Raoul, last edited Sep 10 '13 at 13:28
4 Answers
Two potential problems:
grep -R (except for the modified GNU grep found on OS/X 10.8 and above) follows symlinks, so even if there's only 100GB of files in ~/Documents, there might still be a symlink to / for instance, and you'll end up scanning the whole file system, including files like /dev/zero. Use grep -r with newer GNU grep, or use the standard syntax:
find ~/Documents -type f -exec grep Milledgeville /dev/null {} +
(however, note that the exit status won't reflect whether the pattern was matched or not).
grep finds the lines that match the pattern. For that, it has to load one line at a time into memory. GNU grep, as opposed to many other grep implementations, doesn't have a limit on the size of the lines it reads and supports searching in binary files. So, if you've got a file with a very big line (that is, with two newline characters very far apart), bigger than the available memory, it will fail.
That would typically happen with a sparse file. You can reproduce it with:
truncate -s200G some-file
grep foo some-file
That one is difficult to work around. You could do it as (still with GNU grep):
find ~/Documents -type f -exec sh -c 'for i do
  tr -s "\0" "\n" < "$i" | grep --label="$i" -He "$0"
  done' Milledgeville {} +
That converts sequences of NUL characters into one newline character prior to feeding the input to grep. That covers the cases where the problem is due to sparse files.
You could optimise it by doing it only for large files:
find ~/Documents -type f \( -size -100M -exec \
  grep -He Milledgeville {} + -o -exec sh -c 'for i do
  tr -s "\0" "\n" < "$i" | grep --label="$i" -He "$0"
  done' Milledgeville {} + \)
If the files are not sparse and you have a version of GNU grep prior to 2.6, you can use the --mmap option. The lines will be mmapped in memory as opposed to copied there, which means the system can always reclaim the memory by paging out the pages to the file. That option was removed in GNU grep 2.6.
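For completeness, a sketch of what that legacy invocation looks like (only meaningful on a GNU grep build older than 2.6, where the option still exists):
grep --mmap -r Milledgeville ~/Documents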
Actually, GNU grep doesn't care about reading in one line; it reads a large portion of the file into a single buffer. "Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES." source: lists.freebsd.org/pipermail/freebsd-current/2010-August/…
– Godric Seer
Sep 10 '13 at 12:32
@GodricSeer, it may still read a large portion of the file into a single buffer, but if it hasn't found the string in there and hasn't found a newline character either, my bet is that it keeps that single buffer in memory and reads the next buffer in, as it will have to display it if a match is found. So, the problem is still the same. In practice, a grep on a 200GB sparse file does fail with OOM.
– Stéphane Chazelas
Sep 10 '13 at 12:44
@GodricSeer, well no. If lines are all small, grep can discard the buffers it has processed so far. You can grep the output of yes indefinitely without using more than a few kilobytes of memory. The problem is the size of the lines.
– Stéphane Chazelas
Sep 10 '13 at 12:51
The GNU grep --null-data option may also be useful here. It forces the use of NUL instead of newline as an input line terminator.
– iruvar
Sep 16 '13 at 15:27
@1_CR, good point, though that also sets the output line terminator to NUL.
– Stéphane Chazelas
Sep 16 '13 at 15:35
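For illustration, a minimal sketch of that suggestion (assuming GNU grep; -z is the short form of --null-data, -a treats binary data as text, and the trailing tr undoes the NUL output terminators mentioned in the reply above):
grep -az Milledgeville some-file | tr '\0' '\n'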
I usually do
find ~/Documents | xargs grep -ne 'expression'
I tried a bunch of methods, and found this to be the fastest. Note that this doesn't handle files with spaces in the file name very well. If you know this is the case and have a GNU version of grep, you can use:
find ~/Documents -print0 | xargs -0 grep -ne 'expression'
If not you can use:
find ~/Documents -exec grep -ne 'expression' "{}" \;
Which will exec a grep for every file.
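As a middle ground (a sketch added here, not part of the original answer), POSIX find can batch file names itself with the + terminator, which avoids both a grep process per file and the word-splitting problems of the unquoted pipeline above:
find ~/Documents -type f -exec grep -ne 'expression' /dev/null {} +
The extra /dev/null argument simply makes grep print file names even when a batch happens to contain a single file.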
This will break on files with spaces.
– Chris Down
Sep 10 '13 at 11:04
Hmm, that is true.
– Kotte
Sep 10 '13 at 11:08
You can get around that with find -print0 | xargs -0 grep -ne 'expression'
– Drav Sloan
Sep 10 '13 at 11:09
@ChrisDown Rather a non-portable solution than a broken portable solution.
– reto
Sep 10 '13 at 16:41
@ChrisDown Most major unices have adopted find -print0 and xargs -0 by now: all three BSDs, MINIX 3, Solaris 11, …
– Gilles
Sep 10 '13 at 21:21
I can think of a few ways to get around this:
Instead of grepping all files at once, do one file at a time. Example:
find /Documents -type f -exec grep -H Milledgeville "{}" \;
If you only need to know which files contain the word, use grep -l instead. Since grep will then stop searching after the first hit, it won't have to keep reading any huge files.
If you do want the actual text as well, you could string two separate greps along:
for file in $( grep -Rl Milledgeville /Documents ); do grep -H Milledgeville "$file"; done
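A variant of that two-pass idea (a sketch added here, not from the original answer), using GNU grep's -Z together with xargs -0 so that file names containing spaces or newlines survive the hand-off between the two greps:
grep -RlZ Milledgeville /Documents | xargs -0 grep -H Milledgeville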
The last example is not valid syntax -- you'd need to perform a command substitution (and you shouldn't do that, since grep outputs using a delimiter that is legal in file names). You also need to quote $file.
– Chris Down
Sep 10 '13 at 11:05
The latter example suffers from the issue of file names having newlines or whitespace in them (it will cause for to process the file as two arguments).
– Drav Sloan
Sep 10 '13 at 11:12
@DravSloan Your edit, while an improvement, still breaks on legal file names.
– Chris Down
Sep 10 '13 at 11:19
Yeah, I left it in because it was part of her answer; I just tried to improve it so it would run (for the cases where there are no spaces/newlines etc. in file names).
– Drav Sloan
Sep 10 '13 at 11:34
Corrections of his -> her, my apologies Jenny :/
– Drav Sloan
Sep 10 '13 at 11:38
I'm grepping a 6TB disk to search for lost data, and got the "memory exhausted" error. This should work for other files too.
The solution we came up with was to read the disk in chunks by using dd, and grepping the chunks. This is the code (big-grep.sh):
#problem: grep gives "memory exhausted" error on 6TB disks
#solution: read it in parts
FILE=$1
MATCH=$2
#TODO this is still incomplete, need to get some way to read the size of the file and how many times to read it (so that modulo is 0)
BYTES=732565323
SIZE=6001175126016
COUNT=8192
#BYTES=$(expr 4 \* 1024)
#COUNT=$(expr $SIZE / $BYTES)
#TODO didn't get the variable to work for some reason
#for I in {1..$COUNT}; do
for I in {0..8192}; do
dd bs=$BYTES skip=$I count=1 if=$FILE status=none | buffer | grep -UF -a --context 6 "$MATCH"
done
Unless you read overlapping chunks, you would possibly miss matches on the chunk boundaries. The overlap must be at least as big as the string that you are expecting to match.
– Kusalananda
Jan 28 at 19:59
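Since the comment above points out the chunk-boundary problem, here is a rough sketch of how overlapping reads could look (an illustration added here, not part of the original script; the variable names and chunk size are made up, and GNU dd, stat and grep extensions are assumed):
#!/bin/sh
# Read FILE in CHUNK-sized pieces, starting each read OVERLAP bytes early so a
# match that straddles a chunk boundary is still seen whole by one grep.
FILE=$1
MATCH=$2
CHUNK=$((512 * 1024 * 1024))   # 512 MiB per chunk
OVERLAP=${#MATCH}              # overlap must be at least the match length
SIZE=$(stat -c %s "$FILE")     # GNU stat; use blockdev --getsize64 for a block device
OFFSET=0
while [ "$OFFSET" -lt "$SIZE" ]; do
    START=$((OFFSET - OVERLAP))
    [ "$START" -lt 0 ] && START=0
    dd if="$FILE" iflag=skip_bytes,count_bytes bs=1M \
       skip="$START" count=$((CHUNK + OVERLAP)) status=none |
        grep -aUF -H --label="offset $START" "$MATCH"
    OFFSET=$((OFFSET + CHUNK))
done
Matches that fall entirely inside an overlap region may be printed twice, which is usually harmless when hunting for lost data.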