Improving the performance of a script using find and exec [on hold]
I have a script that iterates through a given directory tree and automatically compresses every subdirectory that does not contain at least one file accessed within the last 30 days. Now I am wondering whether I could improve performance by using find together with its -exec action. I tried something, but it is not working. Do you have any suggestions?



#!/bin/bash
# find all the directories
dirs=`find . -type d`
# iterate over every directory
for dir in $dirs
do
    n="totar"
    # find all files in the directory accessed within the last 30 days
    files=`find $dir -type f -atime -30`
    for file in $files
    do
        n="keepasis"
    done
    if [ $n == "totar" ]; then
        tar -zcvf $dir.tgz $dir
        rm -r $dir
    fi
done


My idea was to replace the inner for loop with something like:



find $dir -type f -atime -30 -exec n="keepasis" {} ;

Tags: scripting, find, date, performance, compression

asked Jan 10 at 18:30 by Horbaje; edited Jan 10 at 19:16 by Rui F Ribeiro

put on hold as too broad by Kusalananda, Mr Shunz, Shadur, Romeo Ninov, RalfFriedl

  • You realise that this would try to archive the topmost directory and delete it (with all subdirectories) if it happens to contain a file (anywhere below) that has been recently accessed? Is this what you want? What does your directory structure look like?

    – Kusalananda, Jan 10 at 20:47

  • The idea is that I compress the directories that I have not been using for more than 30 days. I used rm to delete the now-duplicate directory once I compressed it. I realize, though, that this could happen to my topmost directory if I never used a file inside, which I don't want. Thanks for the heads up!

    – Horbaje, Jan 10 at 21:08

1 Answer

If you set a variable in find's -exec action, the assignment will not be visible in your script. -exec does not run shell code: find tries to execute a program literally named n="keepasis", which does not exist (and the unescaped ; is consumed by the shell before find even sees it, so find also complains about a missing -exec terminator).
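
A quick way to see the variable problem, assuming a bash session (the sh -c form at least gives find a real command to run, but the assignment happens in a child process and dies with it):

n="totar"
find . -type f -atime -30 -exec sh -c 'n="keepasis"' \;
echo "$n"    # still prints "totar"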



The fact that find has found a file and printed its name is sufficient to decide that you don't want to archive the directory. So you don't need the for file in $files loop; instead, check that $files is not empty.



If your find command supports the -quit action you can use this to stop after the first match. (see How to stop the find command after first match?)



Instead of putting the output of the first find into a variable and using a for loop with word splitting, it is better to read find's output line by line.



#!/bin/bash
# find all the directories
# -mindepth 1 prevents "find" from printing "."
find . -mindepth 1 -type d | while read -r dir
do
    # a subdirectory might no longer exist if a parent has been archived before
    if [ -d "$dir" ]
    then
        # search for any recently accessed file in the directory;
        # quoting "$dir" protects names that contain spaces
        newfilefound=`find "$dir" -type f -atime -30 -print -quit`

        if [ -z "$newfilefound" ]
        then
            tar -zcvf "$dir.tgz" "$dir"
            rm -r "$dir"
        fi
    fi
done


If you are using bash, you can improve the first find to correctly handle more directory names with special characters: find . -type d -print0 | while IFS= read -r -d '' dir; do...
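
Spelled out, that variant might look like this (a sketch that assumes bash, since read -d '' is not POSIX):

#!/bin/bash
# Null-delimited directory names survive spaces, quotes and even newlines.
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' dir
do
    if [ -d "$dir" ]
    then
        newfilefound=`find "$dir" -type f -atime -30 -print -quit`
        if [ -z "$newfilefound" ]
        then
            tar -zcvf "$dir.tgz" "$dir" && rm -r "$dir"
        fi
    fi
done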



There is still a performance issue:

If a directory contains a new file somewhere in a subdirectory, you don't remove it. Later, the loop will still hand find every subdirectory on the path down to that file, so the same new file is scanned several times.



The only solution that comes to my mind is to use two find commands, some post-processing, and one fgrep:




  1. Let one find print the names of all new files; process the output by removing the file names, printing all the parent directories as separate lines, removing duplicates, and writing the list to a file NEWDIRS.

  2. With a second find, print all directory names to a second file ALLDIRS.

  3. Use fgrep to find all lines from ALLDIRS that don't match a line in NEWDIRS (a sketch follows below).
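
This might be sketched as follows (an illustration only: it assumes GNU find, GNU grep's treatment of an empty pattern file, and that no path name contains a newline; NEWDIRS and ALLDIRS are the scratch files named above):

#!/bin/bash
# Sketch: list directories containing no file accessed in the last 30 days.

# 1. For every recently accessed file, print every ancestor directory,
#    then deduplicate the list into NEWDIRS.
find . -mindepth 1 -type f -atime -30 | while IFS= read -r f
do
    d=`dirname "$f"`
    while [ "$d" != "." ]
    do
        printf '%s\n' "$d"
        d=`dirname "$d"`
    done
done | sort -u > NEWDIRS

# 2. Print all directory names into ALLDIRS.
find . -mindepth 1 -type d | sort > ALLDIRS

# 3. Keep the directories that appear in ALLDIRS but not in NEWDIRS
#    (grep -F is the modern spelling of fgrep; -x matches whole lines,
#    -v inverts the match, -f reads the patterns from NEWDIRS).
grep -F -x -v -f NEWDIRS ALLDIRS

Note that the output still lists subdirectories of other listed directories, so when archiving you still need the [ -d "$dir" ] existence check from the loop above.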


You should check that the tar command was successful before removing the directory.
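
The simplest form of that check chains the two commands, so rm runs only if tar exits with success:

tar -zcvf "$dir.tgz" "$dir" && rm -r "$dir"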

– answered Jan 10 at 19:31 by Bodo (a new contributor); edited Jan 11 at 10:21

  • Thank you, Bodo. I really appreciate your answer and the great explanation!

    – Horbaje, Jan 10 at 20:35

  • This, like the code in the question, would try to archive the current directory if any file anywhere below it has been recently accessed.

    – Kusalananda, Jan 10 at 20:50

  • @Kusalananda: Maybe there is a bug I don't currently see. My understanding is that -atime -30 -print will print a file name when the file has an access time of less than 30 days ago, and if [ -z "$newfilefound" ] will execute the archiving commands if the output of find is empty, i.e. if no file matching the condition was found. Is there something wrong? Please explain.

    – Bodo, Jan 11 at 10:11

  • I got my logic back to front. The code would try to archive and delete the top-most directory if no recently accessed file was found anywhere below it. The name of the archive in that case would be ..tgz.

    – Kusalananda, Jan 11 at 10:13

  • @Kusalananda I will add -mindepth 1 to the first find to prevent it from printing ".".

    – Bodo, Jan 11 at 10:19