Creating file lists in the terminal and joining two files in a Python 3 script
I have a nested directory called 'dir'. I write the list of files from all subdirectories to a CSV file with the following command in a Linux terminal.
dir$ find . -type f -printf '%f\n' > old_names.csv
I am using some detox code to change the filenames, and then I make a new list using
dir$ find . -type f -printf '%f\n' > new_names.csv
I would like to join these two lists together and make a new list with two columns, something like this;
To do that, I read both CSV files into pandas data frames and join them on the index as follows in a Python 3 script
import os
import pandas as pd

# somepath is the directory that holds the two CSV files
df_old = pd.read_csv(os.path.join(somepath, 'old_names.csv'))
df_new = pd.read_csv(os.path.join(somepath, 'new_names.csv'))
df_names = df_new.join(df_old)  # joins on the row index
The problem is that I am getting wrong file pairs, something like this;
When I open new_names.csv, I see that the file list is written in a different order than the old_names list, so joining on the index results in wrong pairs. How can I solve this problem?
linux python3
edited Jan 21 at 17:13 by Tomasz
asked Jan 21 at 16:55 by kutlus
There's no particular reason that the two find runs must produce output in the same order, so the whole exercise may be flawed, but if you're in a situation where they do, do you need to be using Python to join them or would paste suffice?
– Michael Homer
Jan 21 at 19:11
Hi, thanks! I have many folders like that, so I wanted to join them in a for loop over directories in Python.
– kutlus
Jan 21 at 19:15
There's just no reason that this would work except under pretty controlled conditions (specific filesystems in use, possibly control of other simultaneous operations on the system). detox will tell you what changes it's making and you'd be better off to use that information instead, I think, rather than trying to reverse-engineer it.
– Michael Homer
Jan 21 at 19:18
1 Answer
The find command just outputs in the order the filesystem gives its directory entries in, without any sorting or processing. Depending on the filesystem you're using and other factors, renaming even a single file could change the iteration order, but changing all of them is quite likely to do so. Without a tightly-controlled environment there's no particular reason that two find runs should give the same order like that.
For example, many modern filesystems store names in a hash table, and iterate in the order entries appear there. A tiny filename change may be much earlier or later in the table than the original, or even cause total re-hashing of the entire directory so that everything moves. There's no realistic way to put the pieces back together in that case.
It's possible that sorting the filenames might help, if they each have a unique unchanged prefix, but that's the only realistic sort of post-processing you could do and carry on with two separate files from two find runs. I don't recommend even trying that.
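If a reproducible listing is wanted regardless of filesystem order, the names can be sorted explicitly. The following is only a minimal Python sketch; the helper name sorted_basenames is an assumption made for illustration. It makes each listing deterministic, but it does not by itself make old and new names line up after a rename, which is why the sorting approach stays unreliable.

```python
import os


def sorted_basenames(root):
    """Return the basenames of all non-directory entries under root, sorted.

    os.walk, like find, yields entries in whatever order the filesystem
    returns them, so the result is sorted explicitly here.
    """
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return sorted(names)
```

Two such sorted lists only pair up correctly if every renamed file keeps its position in the sort order, which a character-replacing rename does not guarantee.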
However, detox does have a -v option that prints out the changes it is making (and -n to print out what it would do). You could use that to produce your CSV file, or directly from Python using subprocess.run.
detox -v ... | sed -e 's/ -> /,/' > names.csv
would produce a CSV file at least as well as one of your find runs, with the old and new names automatically matched up. For the basenames (like %f did) you'll need to postprocess, which you can do in Python if necessary, or in the shell.
edited Jan 21 at 21:00
answered Jan 21 at 19:41 by Michael Homer
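The basename post-processing mentioned in the answer could be done in Python along the following lines. This is a sketch and not part of the original answer: the function name parse_rename_log is hypothetical, and it assumes each change is reported on one line as "old -> new", which is the separator the sed command above relies on.

```python
import os


def parse_rename_log(text, sep=" -> "):
    """Turn lines of the form 'old/path -> new/path' into (old, new) basename pairs.

    Lines without the separator are skipped. A filename that itself
    contains ' -> ' would be split in the wrong place.
    """
    pairs = []
    for line in text.splitlines():
        if sep not in line:
            continue
        old, new = line.split(sep, 1)
        pairs.append((os.path.basename(old), os.path.basename(new)))
    return pairs
```

The input text could come from a saved log file, or from the stdout attribute of the result of subprocess.run called with capture_output=True and text=True; the detox arguments themselves are left open here, as in the answer.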
Thank you Michael, this was helpful. I am convinced that I can't get around this. The detox code is not the detox package, just some functions I defined to replace some characters with others, which I called detox functions. Now I have decided to convert the first list into a data frame, go through each row by index, and apply the detox functions to each row to create the new name list.
– kutlus
Jan 21 at 21:58
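The plan in this comment, deriving each new name from its old name so the pairing is never lost, could look roughly like the following. The function clean_name is a hypothetical stand-in for the poster's own replacement functions, which are not shown in the thread, and the standard csv module is used in place of pandas only to keep the sketch dependency-free.

```python
import csv


def clean_name(name):
    """Hypothetical stand-in for the custom 'detox' functions:
    replace spaces with underscores."""
    return name.replace(" ", "_")


def pair_names(old_names, rename=clean_name):
    """Return (old, new) pairs, computing each new name from its old name."""
    return [(old, rename(old)) for old in old_names]


def write_pairs(pairs, handle):
    """Write the pairs to an open text handle as a two-column CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["old_name", "new_name"])
    writer.writerows(pairs)
```

Because each new name is computed from the old name in the same step, the order in which the filesystem lists the files no longer matters.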