Copying Hundreds of Thousands of Files from One Remote Directory to Another
I have a remote directory (SSH enabled) on 11.11.11.11:/path/from (an Ubuntu machine) that contains millions of tiny .txt files. Running even a simple ls command, or opening the /path/from/ directory in WinSCP, is impossible because there are so many files.

My goal is to find a few thousand files that match a specific name pattern and copy them to another remote location (e.g. 22.22.22.22:/path/to).

Here is what I've tried so far:

scp --exec=`find /path/from -name 'A*random*' -mtime +0 -mtime -10` user@22.22.22.22:/path/to

But it takes a very long time because, as I said, the /path/from/ directory contains literally millions of files.

Do you have a suggestion to make it faster? Using rsync? How should I do it? And how can I limit the find result to a certain number, say 1000? I only know how to limit it by last-modified date with -mtime.

ssh rsync scp sftp
asked Feb 8 at 12:24 – xcode (edited Feb 8 at 13:07 by terdon♦)
Have you tried using the include and exclude options in rsync? e.g. rsync -nrv --include="/" --include="A*random*" --exclude="*" /path/from/
– JStrahl Feb 8 at 12:55
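Expanding that comment into a full transfer, a rough sketch (untested; "user", the paths, and the assumption that the matching files sit directly in /path/from/ are placeholders taken from the question):

# dry run first (-n) to check the file list, then drop -n to actually copy
rsync -nrv --include='A*random*' --exclude='*' /path/from/ user@22.22.22.22:/path/to/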
3 Answers
Maybe you can use find in combination with cpio to create a stream from your many files on one machine and extract the files with cpio on the other machine:

ssh user@source "cd sourcedir && find ./ -xdev -name 'A*random*' -print | cpio -o -Bav -H crc" | ssh user@target "cd destinationdir && cpio -i -vumd"

This (untested) solution is based on https://www.netroby.com/view/3602, where you will find some explanation of the arguments for find and cpio.

answered Feb 8 at 12:46 (edited Feb 11 at 14:20) – Bodo
How do I limit it to, let's say, the first 1000 files found using find?
– xcode Feb 8 at 14:22

@xcode: Why do you want to do this? Do you want to continue with the next 1000 files later? You could use something like find ./ -xdev -name 'A*random*' -print | head -1000 ... but if you want to continue later, I suggest saving find's output to a file first, then using split (or head and tail) and a loop. The result of repeated find calls may be inconsistent if files are created or removed in between.
– Bodo Feb 8 at 14:31

This doesn't work; it always gives me cpio: Too many arguments and cpio: premature end of archive errors.
– xcode Feb 10 at 3:11

@xcode I fixed a typo. The first cpio command should be cpio -o -Bav -H crc (not ... -o Bav ...).
– Bodo Feb 11 at 14:21
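Combining this answer with the head suggestion from the comments, a rough sketch (untested; host addresses, user names and paths are placeholders from the question) that copies only the first 1000 matches:

ssh user@11.11.11.11 "cd /path/from && find . -xdev -name 'A*random*' -mtime +0 -mtime -10 -print | head -n 1000 | cpio -o -Bav -H crc" | ssh user@22.22.22.22 "cd /path/to && cpio -i -vumd"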
Try locate

It seems find is too slow for this application. There is a faster tool to find files, locate. It uses a database that must be updated for locate to find the newest files.

updatedb creates or updates a database used by locate. If the database already exists, its data is reused to avoid rereading directories that have not changed.

This update process is very fast compared to find, and once the database is updated, locate will find all files (and it is much faster than find).

Usage

Create the database (and update it on later runs):

sudo updatedb

Find the relevant files. locate provides several useful options, but not as many as find. You might be able to design a useful pattern for your purpose. I suggest two command lines, which you may modify and later combine with scp or rsync (one way to do that is sketched below). You can limit the number of files with --limit.

If you search only in /path/from/ and not in its sub-directories:

locate --regex --limit 1000 '/path/from/A.*random.*'

If you search not in /path/from/ itself but in its sub-directories:

locate --regex --limit 1000 '/path/from/.*/A.*random.*'

See man locate for more details.
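One way to feed the locate matches to rsync, as a rough sketch (untested; it assumes the matches sit directly under /path/from/, that this runs on the source machine, and that "user" has SSH access to 22.22.22.22):

# strip the /path/from/ prefix so the list is relative to the rsync source
locate --regex --limit 1000 '/path/from/A.*random.*' | sed 's|^/path/from/||' > /tmp/filelist.txt
rsync -av --files-from=/tmp/filelist.txt /path/from/ user@22.22.22.22:/path/to/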
General comments

Maybe you should modify how these files are written and stored, for example with several sub-directories so that there are not too many files in each directory: for example, one sub-directory for each date (2019-02-12, 2019-02-13, ...), or, even better, the way many photo managers store picture files:

- one level of subdirectories for each year
- the next level of subdirectories for each month of the year
- the final level of subdirectories for each day of the month, where the files are stored.

Maybe you can also remove some files (for example when they are getting too old).

answered Feb 12 at 9:00 (edited Feb 12 at 10:51) – sudodus
- tar, zip, or compress everything under the folder into one source.tar file; you can do this quickly via tar -cf source.tar /sourcedirectory. The archive will be large, since it now contains all 100,000+ files in one file.
- transfer this one file however you like (a streaming variant is sketched below).
- once at the destination, run tar -xf source.tar (or unzip/uncompress appropriately) to get back the original folder structure containing the 100,000+ files.

answered Feb 12 at 15:00 – ron
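If there is no need to keep source.tar around, a rough sketch of a streaming variant (untested; "user" and the paths are placeholders, and it assumes SSH access from the source machine to 22.22.22.22):

tar -C /path/from -cf - . | ssh user@22.22.22.22 "tar -C /path/to -xf -"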