Will we ever “find” files whose names are changed by “find”? Why not?


























While answering an older question, it struck me that find, in the following example, could potentially process files multiple times:



find dir -type f -name '*.txt' \
    -exec sh -c 'mv "$1" "${1%.txt}_hello.txt"' sh {} ';'


or the more efficient



find dir -type f -name '*.txt' \
    -exec sh -c 'for n; do mv "$n" "${n%.txt}_hello.txt"; done' sh {} +


The command finds .txt files and changes their filename suffix from .txt to _hello.txt.
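For illustration, a run might look like this (a sketch; the directory contents are made up):

$ ls dir
a.txt  b.txt
$ find dir -type f -name '*.txt' -exec sh -c 'mv "$1" "${1%.txt}_hello.txt"' sh {} ';'
$ ls dir
a_hello.txt  b_hello.txt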



While doing so, the directories will start accumulating new files whose names match the *.txt pattern, namely these _hello.txt files.



Question: Why are they not actually processed by find? In my experience they aren't, and we don't want them to be either, as that would introduce a sort of infinite loop. This is also the case with mv replaced by cp, by the way.



The POSIX standard says:




If a file is removed from or added to the directory hierarchy being searched it is unspecified whether or not find includes that file in its search.




Since it's unspecified whether new files will be included, maybe a safer approach would be



find dir -type d -exec sh -c '
    for n in "$1"/*.txt; do
        test -f "$n" && mv "$n" "${n%.txt}_hello.txt"
    done' sh {} ';'


Here, we look not for files but for directories, and the for loop in the embedded sh script expands its glob once, before the first iteration, so we don't have the same potential issue.
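To see that single-expansion property in isolation (a minimal sketch, independent of find; the demo directory is made up):

$ mkdir demo && touch demo/a.txt demo/b.txt
$ ( cd demo && for n in *.txt; do mv "$n" "${n%.txt}_hello.txt"; done )
$ ls demo
a_hello.txt  b_hello.txt

The glob *.txt expands to a.txt b.txt before the first mv runs, so the freshly created *_hello.txt files are never revisited by the loop.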



The GNU find manual does not explicitly say anything about this and neither does the OpenBSD find manual.










Tag: find

asked Feb 13 '18 at 18:56 by Kusalananda (last edited Jul 3 '18 at 9:15)












  • "it is unspecified whether or not" – I wonder why the authors of the find utility are not concerned about such tricky behavior
    – RomanPerekhrest
    Feb 13 '18 at 19:07






  • readdir has essentially the same specification. I would expect it to be potentially filesystem-specific, even (loading one block of directory entries at a time is pretty reasonable).
    – Michael Homer
    Feb 13 '18 at 19:12










  • @RomanPerekhrest, it's unspecified in the POSIX specification. That doesn't mean the authors of the find utility aren't concerned with the behavior. It means that how to handle that case is left to the authors of any given implementation of the find utility, rather than being specified. (If that seems unclear, I recommend you fully clear up the words "specification" and "implementation" as they apply to software.)
    – Wildcard
    Feb 13 '18 at 21:51












  • @Wildcard, the words "specification" and "implementation" are clear. A quote from the question: "The GNU find manual does not explicitly say anything about this and neither does the OpenBSD find manual." So, that's bad ... and I shouldn't be compelled to like that
    – RomanPerekhrest
    Feb 13 '18 at 22:01










  • @RomanPerekhrest, ah, I see. Your first comment quoted the POSIX spec, so I wasn't sure. I wouldn't say it's bad, though (that the find devs don't mention it). As described in ilkkachu's answer, that's filesystem-level behavior, not even specified for the readdir() call. There are going to be race conditions no matter what.
    – Wildcard
    Feb 13 '18 at 22:04




















1 Answer
































Can find find files that were created while it was walking the directory?



In brief: Yes, but it depends on the implementation. It's probably best to write the conditions so that already processed files are ignored.



As mentioned, POSIX makes no guarantees either way, just as it makes none for the underlying readdir() library call:




If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.






I tested find on my Debian system (GNU find, Debian package version 4.6.0+git+20161106-2): strace showed that it read the full directory before doing anything.
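That observation can be reproduced along these lines (a sketch, assuming Linux, where strace is available and directory reads appear as getdents/getdents64 system calls):

$ strace -e trace=getdents,getdents64 find dir -type f > /dev/null

All the getdents calls for a directory should then appear before find prints or acts on any of that directory's entries.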



Browsing the source code a bit further suggests that GNU find uses parts of gnulib to read directories, and there's this in gnulib/lib/fts.c (gl/lib/fts.c in the find tarball):



/* If possible (see max_entries, below), read no more than this many directory
   entries at a time.  Without this limit (i.e., when using non-NULL
   fts_compar), processing a directory with 4,000,000 entries requires ~1GiB
   of memory, and handling 64M entries would require 16GiB of memory.  */
#ifndef FTS_MAX_READDIR_ENTRIES
# define FTS_MAX_READDIR_ENTRIES 100000
#endif
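Since the macro is wrapped in #ifndef, the limit can also be overridden at build time rather than by editing the source (a sketch, assuming a findutils source tree with the usual autotools build):

$ ./configure CPPFLAGS=-DFTS_MAX_READDIR_ENTRIES=100
$ make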


I changed that limit to 100, and did



mkdir test; cd test; touch {0000..2999}.foo
find . -type f -exec sh -c 'mv "$1" "${1%.foo}.barbarbarbarbarbarbarbar"' sh {} \; -print


resulting in such hilarious results as this file, which got renamed five times:



1046.barbarbarbarbarbarbarbar.barbarbarbarbarbarbarbar.barbarbarbarbarbarbarbar.barbarbarbarbarbarbarbar.barbarbarbarbarbarbarbar




Obviously, a very large directory (more than 100,000 entries) would be needed to trigger that effect on a default build of GNU find, but a trivial readdir + process loop without caching would be even more vulnerable.



In theory, if the OS always placed renamed files last in the order in which readdir() returns them, a simple implementation like that could even fall into an endless loop.



On Linux, readdir() in the C library is implemented on top of the getdents() system call, which returns multiple directory entries in one go. This means that later calls to readdir() might return files that were already removed, but for very small directories you'd effectively get a snapshot of the starting state. I don't know about other systems.



In the above test, I renamed to a longer file name on purpose, to prevent the file name from being overwritten in place. Even so, the same test with a same-length rename also produced double and triple renames. If and how this matters would of course depend on the filesystem internals.



Considering all this, it's probably prudent to avoid the whole issue by making the find expression not match the files that were already processed. That is, to add -name "*.foo" in my example or ! -name "*_hello.txt" to the command in the question.
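Applied to the command from the question, that guard would look like this (same rename logic as before; only the ! -name test is added):

find dir -type f -name '*.txt' ! -name '*_hello.txt' \
    -exec sh -c 'for n; do mv "$n" "${n%.txt}_hello.txt"; done' sh {} +

Note that this also skips any files that already ended in _hello.txt before the run, which may or may not be what you want.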






answered Feb 13 '18 at 21:01 by ilkkachu























  • This would seem to indicate that default GNU find indeed would have issues with directories holding more than 100K files (or maybe entries of any type?), and that my precaution is not as silly as I first thought (well, having 100K files in a directory is in itself a bit silly). I will look for similar code in my native OpenBSD find as soon as I get a chance.
    – Kusalananda
    Feb 13 '18 at 21:06












  • Interesting. Better be more careful with find regexps in the future.
    – Rui F Ribeiro
    Feb 13 '18 at 21:21










  • Actually, I wonder if the alternative approach would work on such a directory... expanding a glob to more than 100K pathnames? That would have its own issues.
    – Kusalananda
    Feb 13 '18 at 22:22










