Copying Hundreds of Thousands of Files from One Remote Directory to Another
I have a remote directory (SSH enabled) at 11.11.11.11:/path/from (an Ubuntu machine) that contains millions of tiny .txt files. Running a simple ls, or even opening the /path/from/ directory in WinSCP, is impossible because there are so many files.
My goal is to find a few thousand files that match a specific name pattern and copy them to another remote location (e.g. 22.22.22.22:/path/to).
Here is what I've tried so far:
scp --exec=`find /path/from -name 'A*random*' -mtime +0 -mtime -10` user@22.22.22.22:/path/to
But it takes a very long time; as I said, the /path/from/ directory contains literally millions of files.
Do you have any suggestions to make it faster? Should I use rsync, and if so, how? Also, how can I limit the find results to a certain number, say 1000? So far I only know how to limit them by last-modified date with -mtime.
Tags: ssh, rsync, scp, sftp
Have you tried using the include and exclude options in rsync? e.g. rsync -nrv --include="/" --include="A*random*" --exclude="*" /path/from/
– JStrahl
Feb 8 at 12:55
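For reference, a fuller version of that rsync approach might look like the sketch below (untested, and the pattern is assumed from the question; run it on the source host, since rsync cannot copy directly between two remote hosts). Drop the -n (dry-run) flag once the output looks right:

# Recurse into all directories, pick only names matching A*random*, skip everything else;
# --prune-empty-dirs avoids recreating directories that end up containing no matches
rsync -avn --prune-empty-dirs --include='*/' --include='A*random*' --exclude='*' \
    /path/from/ user@22.22.22.22:/path/to/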
3 Answers
Maybe you can use find in combination with cpio to create a stream from your many files on one machine and extract the files with cpio on the other machine.
ssh user@source "cd sourcedir && find ./ -xdev -name 'A*random*' -print | cpio -o -Bav -H crc" | ssh user@target "cd destinationdir && cpio -i -vumd"
This (untested) solution is based on https://www.netroby.com/view/3602. There you will find some explanation of the arguments for find and cpio.
How can I limit it to, let's say, the first 1000 files found using find?
– xcode
Feb 8 at 14:22
@xcode: Why do you want to do this? Do you want to continue with the next 1000 files later? You could use something like find ./ -xdev -name 'A*random*' -print | head -1000 ... but if you want to continue later, I suggest saving find's output to a file first, then using split (or head and tail) and a loop. The result of repeated find calls may be inconsistent if files are created or removed in between.
– Bodo
Feb 8 at 14:31
This doesn't work; it always gives me cpio: Too many arguments and cpio: premature end of archive errors.
– xcode
Feb 10 at 3:11
@xcode I fixed a typo. The first cpio command should be cpio -o -Bav -H crc (not ... -o Bav ...).
– Bodo
Feb 11 at 14:21
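A rough sketch of the batching approach Bodo describes above, i.e. saving find's output once and pushing it through cpio in chunks of 1000 (untested; the temporary file names are placeholders):

# Build the file list once on the source host and split it into chunks of 1000 names
ssh user@source "cd sourcedir && find ./ -xdev -name 'A*random*' -print > /tmp/filelist && split -l 1000 /tmp/filelist /tmp/chunk."
# Stream each chunk through cpio to the target host
for chunk in $(ssh user@source 'ls /tmp/chunk.*'); do
    ssh user@source "cd sourcedir && cpio -o -Bav -H crc < $chunk" | ssh user@target "cd destinationdir && cpio -i -vumd"
done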
Try locate
It seems find is too slow for this application.
There is a faster tool to find files, locate. It uses a database, which must be updated for locate to find the newest files.
updatedb creates or updates the database used by locate. If the database already exists, its data is reused to avoid rereading directories that have not changed.
This update process is very fast compared to find, and once the database is updated, locate will find all files (and it is much faster than find).
Usage
Create the database (and update it on later runs):
sudo updatedb
Find the relevant files.
locate provides several useful options, but not as many as find. You might be able to design a useful pattern for your purpose.
I suggest two command lines that you may modify and later combine with scp or rsync.
You can limit the number of files with --limit.
If you search only in /path/from/ and not in sub-directories:
locate --regex --limit 1000 '/path/from/A.*random.*'
If you search not in /path/from/ itself but in its sub-directories:
locate --regex --limit 1000 '/path/from/.*/A.*random.*'
See man locate for more details.
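As a sketch of the "combine with scp or rsync" step (untested; the temporary file name is a placeholder and it assumes you run this on the source host), you could feed the matches to rsync via --files-from:

# Collect up to 1000 matches, stored as paths relative to /path/from/
locate --regex --limit 1000 '/path/from/A.*random.*' | sed 's|^/path/from/||' > /tmp/matches.txt
# Copy exactly those files to the other host
rsync -av --files-from=/tmp/matches.txt /path/from/ user@22.22.22.22:/path/to/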
General comments
Maybe you should modify how these files are written and stored, for example with several sub-directories so that there are not too many files in each directory, for example one sub-directory for each date (2019-02-12, 2019-02-13, ...),
or even better, the way many photo managers store picture files (a small sketch follows the list below):
- one level of subdirectories for each year
- the next level of subdirectories for each month of the year
- the final level of subdirectories for each day of the month, where the files are stored.
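As a small illustration of that layout (the file name is hypothetical, just to show the idea):

# One directory per year/month/day, so no single directory grows huge
d=$(date +%Y/%m/%d)              # e.g. 2019/02/12
mkdir -p "/path/from/$d"
mv A_random_file.txt "/path/from/$d/"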
Maybe you can also remove some files (for example when they are getting too old).
- tar, zip, or compress everything under the folder into one source.tar file; you can do this quickly via tar -cf source.tar /sourcedirectory. The archive will be large, since it now holds all 100,000+ files in a single file (a streaming variant that skips the intermediate file is sketched after this list).
- transfer this one file however you like
- once at the destination, run tar -xf source.tar (or unzip/uncompress as appropriate) to restore the original folder structure containing 100,000+ files.
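A sketch of the same idea done as a single stream over ssh, so no intermediate archive has to be written on either host (untested; it assumes you run it from your own machine and that tar is available on both hosts):

# Pack on the source host, stream through this machine, unpack on the destination host
ssh user@11.11.11.11 "tar -cf - -C /path/from ." | ssh user@22.22.22.22 "tar -xf - -C /path/to"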