Create file lists in the terminal and join the two files in a Python 3 script

I have a directory called 'dir' with nested subdirectories. I write the names of the files from all subdirectories to a CSV file with the following command in a Linux terminal:

dir$ find . -type f -printf '%f\n' > old_names.csv

I then use some detox code to change the filenames, and I make a new list with:

dir$ find . -type f -printf '%f\n' > new_names.csv

I would like to join these two lists and make a new list with two columns, something like this:

[image in the original post]

To do that, I read both CSV files into pandas data frames and join them on the index in a Python 3 script:

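 import os

 import pandas as pd

 # somepath is the directory that holds the two CSV files
 # header=None keeps the first filename from being read as a column header
 df_old = pd.read_csv(os.path.join(somepath, 'old_names.csv'), header=None, names=['old_name'])
 df_new = pd.read_csv(os.path.join(somepath, 'new_names.csv'), header=None, names=['new_name'])

 # join on the row index: row i of new_names is paired with row i of old_names
 df_names = df_new.join(df_old)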

The problem is that I get wrong file pairs, something like this:

[image in the original post]

When I open new_names.csv, I see that the file list is written in a different order than in old_names.csv, so joining on the index results in wrong pairs. How can I solve this problem?

asked Jan 21 at 16:55 by kutlus, edited Jan 21 at 17:13 by Tomasz

There's no particular reason that the two finds must produce output in the same order, so the whole exercise may be flawed, but if you're in a situation where they do, do you need to be using Python to join them or would paste suffice?

– Michael Homer
Jan 21 at 19:11

Hi, thanks! I have many folders like that, so I wanted to join them in a for loop over the directories in Python.

– kutlus
Jan 21 at 19:15

There's just no reason that this would work except under pretty controlled conditions (specific filesystems in use, possibly control of other simultaneous operations on the system). detox will tell you what changes it's making and you'd be better off to use that information instead, I think, rather than trying to reverse-engineer it.

– Michael Homer
Jan 21 at 19:18

Tags: linux, python3


1 Answer

The find command just outputs in the order the filesystem gives its directory entries in, without any sorting or processing. Depending on the filesystem you're using and other factors, renaming even a single file could change the iteration order, but changing all of them is quite likely to do so. Without a tightly-controlled environment there's no particular reason that two finds should give the same order like that.

For example, many modern filesystems store names in a hash table, and iterate in the order entries appear there. A tiny filename change may be much earlier or later in the table than the original, or even cause total re-hashing of the entire directory so that everything moves. There's no realistic way to put the pieces back together in that case.

It's possible that sorting the filenames might help, if they each have a unique unchanged prefix, but that's the only realistic sort of post-processing you could do and carry on with two separate files from two find runs. I don't recommend even trying that.
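For illustration only, the sorting idea could be sketched in Python roughly as follows. `os.walk`, like `find`, yields entries in whatever order the filesystem reports them, so the sort has to be explicit. The function name `list_basenames` is hypothetical, and the sketch only makes each listing reproducible; it does not make the old and new lists line up unless the renaming happens to preserve the sort order, which is exactly the fragile assumption described above.

```python
import os


def list_basenames(root, sort=True):
    """Collect the basenames of regular files under root.

    Roughly what `find root -type f -printf '%f\\n'` prints, with an
    optional explicit sort because the walk order is filesystem-dependent.
    """
    names = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Keep regular files only and skip symlinks, like -type f does.
            if os.path.isfile(path) and not os.path.islink(path):
                names.append(name)
    return sorted(names) if sort else names
```

Calling `list_basenames(".")` before and after the rename gives two sorted lists, but pairing them by position is still only safe if every name keeps a unique, unchanged prefix.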

However, detox does have a -v option that prints out the changes it is making (and -n to print out what it would do). You could use that output to produce your CSV file, either in the shell or directly from Python using subprocess.run.

detox -v ... | sed -e 's/ -> /,/' > names.csv

would produce a CSV file at least as well as one of your finds, with the old and new names automatically matched up. For the basenames (like %f did) you'll need to postprocess, which you can do in Python if necessary, or in the shell.
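A minimal sketch of the Python route, including the basename postprocessing, might look like this. It assumes the verbose output has one `old -> new` line per rename, as the sed expression above implies; the names `SEPARATOR`, `parse_rename_lines` and `run_and_parse` are hypothetical, and the command to run is left to the caller.

```python
import os
import subprocess

# Assumption: each rename is reported as "old/path -> new/path".
SEPARATOR = " -> "


def parse_rename_lines(lines):
    """Turn 'old -> new' lines into (old_basename, new_basename) pairs."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if SEPARATOR not in line:
            continue  # not a rename line, ignore it
        # Split on the first separator only, as the sed expression does.
        old, new = line.split(SEPARATOR, 1)
        pairs.append((os.path.basename(old), os.path.basename(new)))
    return pairs


def run_and_parse(command):
    """Run a command given as an argument list and parse its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_rename_lines(result.stdout.splitlines())
```

Because the old and new names come from the same output line, they stay matched regardless of the order in which the filesystem lists the files. A filename that itself contains ` -> ` would be split in the wrong place, so this is only a starting point.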

answered Jan 21 at 19:41 by Michael Homer, edited Jan 21 at 21:00

Thank you Michael, this was helpful. I am convinced that I can't get around this. The detox code is not the detox package, just some functions I defined to replace some characters with others and called detox functions. I have now decided to convert the first list into a data frame, go through each row by index, and apply the detox functions to each row to create the new name list.

– kutlus
Jan 21 at 21:58
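The approach described in this comment, deriving each new name directly from its old name so the pair is never separated, could be sketched as below using only the standard library. `clean_name` is a hypothetical stand-in for the asker's own detox functions, which are not shown in the question, and the column names are assumptions too.

```python
import csv


def clean_name(name):
    """Hypothetical replacement for the asker's own 'detox' functions."""
    return name.replace(" ", "_").replace("&", "and")


def pair_names(old_names, rename=clean_name):
    """Build (old, new) pairs by applying the rename function to each old name."""
    return [(old, rename(old)) for old in old_names]


def write_pairs(pairs, handle):
    """Write the pairs as a two-column CSV to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(["old_name", "new_name"])
    writer.writerows(pairs)
```

With pandas, the same idea would amount to mapping the function over the column of old names and storing the result as a second column, so no join between two separately generated lists is needed.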
