Copying Hundreds of Thousands of Files from One Remote Directory to Another Remote Directory

I have a remote directory (SSH enabled) at 11.11.11.11:/path/from (an Ubuntu machine) that holds millions of tiny .txt files. Even a simple ls, or opening the /path/from/ directory in WinSCP, is impossible because there are so many files.



My goal is to find a few thousand files that match a specific name pattern and copy them to another remote location (e.g. 22.22.22.22:/path/to).



Here is what I've tried so far:



scp --exec=`find /path/from -name 'A*random*' -mtime +0 -mtime -10` user@22.22.22.22:/path/to


But it takes a very long time; as I said, the /path/from/ directory contains literally millions of files.



Do you have a suggestion to make this faster, perhaps using rsync? How should I do it? And how can I limit the find result to a certain number, say 1000? So far I only know how to limit it by the last-modified date with -mtime.

ssh rsync scp sftp

asked Feb 8 at 12:24 by xcode, edited Feb 8 at 13:07 by terdon

  • Have you tried using the --include and --exclude options in rsync? e.g. rsync -nrv --include="*/" --include="A*random*" --exclude="*" /path/from/

    – JStrahl
    Feb 8 at 12:55
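
Building on that comment, a minimal untested sketch of a full command is given below. One detail to keep in mind: rsync cannot copy directly between two remote hosts in a single call, so the usual approach is to run it on the source machine (11.11.11.11 in the question) and push to the destination. The --prune-empty-dirs flag is an extra suggestion to avoid recreating thousands of empty directories, and -n keeps it a dry run until the listed files look right.

# Untested sketch, run on the source machine, since rsync cannot copy
# between two remote hosts in one call.
#   --include="*/"        lets the filter descend into directories
#   --include="A*random*" selects the wanted files
#   --exclude="*"         skips everything else
#   --prune-empty-dirs    avoids creating empty directories at the destination
# Drop -n (dry run) once the listed files look right.
rsync -nrv --prune-empty-dirs --include="*/" --include="A*random*" --exclude="*" \
    /path/from/ user@22.22.22.22:/path/to/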


















3 Answers
Maybe you can use find in combination with cpio to create a stream from your many files on one machine and extract the files with cpio on the other machine.



ssh user@source "cd sourcedir && find ./ -xdev -name 'A*random*' -print | cpio -o -Bav -H crc" | ssh user@target "cd destinationdir && cpio -i -vumd"


This (untested) solution is based on https://www.netroby.com/view/3602. There you will find some explanation of the arguments for find and cpio.
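
If only the first 1000 or so matches are needed, as the question asks, a variation of the same pipeline can cap the list with head before cpio packs it. This is an untested sketch using the placeholder addresses and paths from the question:

# Untested sketch: head -n 1000 caps the file list produced by find before
# cpio archives it and the second ssh unpacks it on the destination.
ssh user@11.11.11.11 \
    "cd /path/from && find . -xdev -name 'A*random*' -print | head -n 1000 | cpio -o -Bav -H crc" \
    | ssh user@22.22.22.22 "cd /path/to && cpio -i -vumd"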






answered Feb 8 at 12:46 by Bodo, edited Feb 11 at 14:20

  • How do I limit the result to, let's say, the first 1000 files found by find?

    – xcode
    Feb 8 at 14:22











  • @xcode: Why do you want to do this? Do you want to continue with the next 1000 files later? You could use something like find ./ -xdev -name 'A*random*' -print | head -1000..., but if you want to continue later I suggest saving find's output to a file first, then using split (or head and tail) and a loop. The results of repeated find calls may be inconsistent if files are created or removed in between.

    – Bodo
    Feb 8 at 14:31













  • This doesn't work; it always gives me cpio: Too many arguments and cpio: premature end of archive errors.

    – xcode
    Feb 10 at 3:11











  • @xcode I fixed a typo. The first cpio command should be cpio -o -Bav -H crc (not ... -o Bav ...)

    – Bodo
    Feb 11 at 14:21



































Try locate



It seems find is too slow for this application.




  • There is a faster tool for finding files, locate. It uses a database that must be updated for locate to find the newest files.



  • updatedb creates or updates a database used by locate. If the database already exists, its data is reused to avoid rereading directories that have not changed.



    This update process is very fast compared to a full find, and once the database has been updated, locate will find all matching files much faster than find.




Usage





  • Create the database (and update it on subsequent runs):



    sudo updatedb



  • Find the relevant files. locate offers several useful options, though not nearly as many as find; you should be able to design a pattern that fits your purpose.



    I suggest two command lines, which you can modify and later combine with scp or rsync (a sketch of such a combination follows after this list).



    You can limit the number of files with --limit



    If you search only in /path/from/ itself and not in its sub-directories:



    locate --regex --limit 1000 '/path/from/A.*random.*'


    If you search only in the sub-directories of /path/from/, not in /path/from/ itself:



    locate --regex --limit 1000 '/path/from/.*/A.*random.*'


    See man locate for more details.
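
A hedged sketch of the "combine with scp or rsync" step mentioned above, run on the source machine: locate builds the list, sed strips the /path/from/ prefix so the paths become relative, and rsync --files-from copies exactly those files to the destination (untested; file names containing newlines would need the NUL-separated variants of these tools).

# Untested sketch: copy the first 1000 matches reported by locate.
locate --regex --limit 1000 '/path/from/A.*random.*' \
    | sed 's|^/path/from/||' \
    | rsync -av --files-from=- /path/from/ user@22.22.22.22:/path/to/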




General comments





  • Maybe you should change how these files are written and stored, for example using several sub-directories so that no single directory holds too many files, such as one sub-directory per date (2019-02-12, 2019-02-13, ...),



    or, even better, the way many photo managers store picture files (a sketch of sorting existing files this way follows after this list):




    • one level of subdirectories for each year

    • the next level of subdirectories for each month of the year

    • the final level of subdirectories for each day of the month, where the files are stored.



  • Maybe you can also remove some files (for example when they are getting too old).
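
An untested sketch of the restructuring idea above: sort the existing flat directory into year/month/day sub-directories based on each file's modification time (GNU find and date assumed, as on Ubuntu).

# Untested sketch: move each .txt file into a YYYY/MM/DD sub-directory
# derived from its modification time.
cd /path/from
find . -maxdepth 1 -type f -name '*.txt' -print0 |
while IFS= read -r -d '' f; do
    d=$(date -r "$f" +%Y/%m/%d)    # e.g. 2019/02/12
    mkdir -p "$d" && mv -- "$f" "$d/"
done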







answered Feb 12 at 9:00 by sudodus, edited Feb 12 at 10:51

1. tar, zip, or otherwise compress everything under the folder into one source.tar file, e.g. tar -cf source.tar /sourcedirectory; this file will be large, since it now contains all 100,000+ files in one archive (a streaming variant that skips the intermediate file is sketched below).
2. Transfer this one file however you like.
3. Once at the destination, run tar -xf source.tar (or unzip/uncompress as appropriate) to get back the original folder structure containing the 100,000+ files.
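
Since the source is reached over SSH anyway, the same idea can skip the intermediate source.tar file by streaming the archive over the connection. This is an untested sketch that also applies the name filter from the question (GNU tar assumed for --null and -T -):

# Untested sketch: find selects the matching files on the source host, tar
# streams them to stdout, and the second ssh unpacks them on the destination.
ssh user@11.11.11.11 \
    "cd /path/from && find . -name 'A*random*' -print0 | tar --null -cf - -T -" \
    | ssh user@22.22.22.22 "tar -xf - -C /path/to"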






answered Feb 12 at 15:00 by ron