Copying Hundreds of Thousands of Files from One Remote Directory to Another Remote Directory

I have a remote directory (SSH enabled) at 11.11.11.11:/path/from (an Ubuntu machine) that contains millions of tiny .txt files. Even a simple ls, or opening the /path/from/ directory in WinSCP, is impossible because there are so many files.



My goal is to find a few thousand files that match a specific name pattern and copy them to another remote location (e.g. 22.22.22.22:/path/to).



Here is what I've tried so far:



scp --exec=`find /path/from -name 'A*random*' -mtime +0 -mtime -10` user@22.22.22.22:/path/to


But it takes a very long time; as I said, the /path/from/ directory contains literally millions of files.



Do you have a suggestion to make this faster? Using rsync? How should I do it? And how can I limit the find results to a certain number, say 1000? I only know how to limit them by last-modified date, with -mtime.

ssh rsync scp sftp

asked Feb 8 at 12:24 by xcode, edited Feb 8 at 13:07 by terdon
  • Have you tried using the include and exclude options in rsync? E.g. rsync -nrv --include="/" --include="A*random*" --exclude="*" /path/from/ – JStrahl, Feb 8 at 12:55
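
Building on that comment, here is a possible full form, run on the source machine since rsync cannot copy between two remote hosts in a single invocation. This is an untested sketch that reuses the host, paths and pattern from the question; the list file name is arbitrary.

    # include/exclude variant: drop -n (dry run) and add a real destination
    rsync -rv --include='A*random*' --exclude='*' /path/from/ user@22.22.22.22:/path/to/

    # explicit-list variant, also capping the transfer at 1000 files:
    # build the list with find | head, then hand it to rsync --files-from
    # (paths in the list are relative to the "." source argument)
    cd /path/from \
      && find . -name 'A*random*' -mtime +0 -mtime -10 | head -n 1000 > /tmp/filelist.txt \
      && rsync -av --files-from=/tmp/filelist.txt . user@22.22.22.22:/path/to/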


















3 Answers

Maybe you can use find in combination with cpio to create a stream from your many files on one machine and extract the files with cpio on the other machine.



ssh user@source "cd sourcedir && find ./ -xdev -name 'A*random*' -print | cpio -o -Bav -H crc" | ssh user@target "cd destinationdir && cpio -i -vumd"


This (untested) solution is based on https://www.netroby.com/view/3602. There you will find some explanation of the arguments for find and cpio.

answered Feb 8 at 12:46 by Bodo, edited Feb 11 at 14:20
  • How do I limit it to, let's say, the first 1000 files found using find? – xcode, Feb 8 at 14:22











  • @xcode: Why do you want to do this? Do you want to continue with the next 1000 files later? You could use something like find ./ -xdev -name 'A*random*' -print | head -1000... but if you want to continue later, I suggest saving find's output to a file first, then using split (or head and tail) and a loop. The results of repeated find calls may be inconsistent if files are created or removed in between. – Bodo, Feb 8 at 14:31













  • This doesn't work; it always gives me cpio: Too many arguments and cpio: premature end of archive errors. – xcode, Feb 10 at 3:11











  • @xcode I fixed a typo. The first cpio command should be cpio -o -Bav -H crc (not ... -o Bav ...). – Bodo, Feb 11 at 14:21
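
Following up on the comment thread, one possible (equally untested) way to cap the transfer at the first 1000 matches while keeping the cpio pipeline; as in the answer, it runs from a machine that can reach both hosts, and the hosts and directories below are the question's placeholders:

    # limit the selection with head before feeding the names to cpio
    ssh user@11.11.11.11 "cd /path/from && find ./ -xdev -name 'A*random*' -print | head -n 1000 | cpio -o -Bav -H crc" \
      | ssh user@22.22.22.22 "cd /path/to && cpio -i -vumd"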





















Try locate



It seems find is too slow for this application.




  • There is a faster tool to find files, locate. It uses a database that must be updated for locate to find the newest files.



  • updatedb creates or updates a database used by locate. If the database already exists, its data is reused to avoid rereading directories that have not changed.



    This update process is very fast compared to a full find scan, and once the database is updated, locate will find all the files much faster than find.




Usage





  • Create the database (and on later runs, update it):



    sudo updatedb



  • Find the relevant files. locate provides several useful options, though not as many as find; you might be able to design a useful pattern for your purpose.



    I suggest two command lines that you may modify and later combine with scp or rsync (see the sketch after this list).



    You can limit the number of files with --limit



    If you search only in /path/from/ and not in sub-directories



    locate --regex --limit 1000 '/path/from/A.*random.*'


    If you search not in /path/from/ itself but in its sub-directories



    locate --regex --limit 1000 '/path/from/.*/A.*random.*'


    See man locate for more details.
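
As a minimal sketch of such a combination, assuming it is run on the source machine and reusing the question's hosts and paths (untested; the list file name is arbitrary). The sed strips the /path/from/ prefix because rsync's --files-from expects paths relative to the source argument:

    locate --regex --limit 1000 '/path/from/A.*random.*' | sed 's|^/path/from/||' > /tmp/filelist.txt
    rsync -av --files-from=/tmp/filelist.txt /path/from/ user@22.22.22.22:/path/to/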




General comments





  • Maybe you should change how these files are written and stored, using several sub-directories so that no single directory holds too many files, for example one sub-directory per date (2019-02-12, 2019-02-13, ...; see the sketch after this list),



    or, even better, the way many photo managers store picture files:




    • one level of subdirectories for each year

    • the next level of subdirectories for each month of the year

    • the final level of subdirectories for each day of the month, where the files are stored.



  • Maybe you can also remove some files (for example when they are getting too old).
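
A hypothetical sketch of the per-date layout suggested in the first point, assuming a bash shell and GNU date (for date -r) on the machine that stores the files; the path and the .txt pattern are taken from the question:

    cd /path/from
    find . -maxdepth 1 -type f -name '*.txt' -print0 |
      while IFS= read -r -d '' f; do
        d=$(date -r "$f" +%F)           # file's modification date, e.g. 2019-02-12
        mkdir -p "$d" && mv "$f" "$d/"  # create the day directory and move the file into it
      done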







answered Feb 12 at 9:00 by sudodus, edited Feb 12 at 10:51

    1. tar, zip, or otherwise pack everything under the folder into one source.tar file; this can be done quickly with tar -cf source.tar /sourcedirectory. The resulting archive will be large, since it now contains all 100,000+ files in one file.

    2. Transfer this one file however you like.

    3. Once at the destination, run tar -xf source.tar (or unzip/uncompress as appropriate) to restore the original folder structure containing the 100,000+ files.
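
If only the files matching the pattern are needed, the same idea can be streamed over ssh without writing an intermediate archive. The following is an untested sketch, assumed to run on the source machine with GNU tar on both ends (for --null and --files-from); the host, paths and find expression are taken from the question:

    # pack the selected files into a tar stream and unpack it on the target
    cd /path/from \
      && find . -name 'A*random*' -mtime +0 -mtime -10 -print0 \
           | tar --null --files-from=- -cf - \
           | ssh user@22.22.22.22 'cd /path/to && tar -xf -'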






answered Feb 12 at 15:00 by ron