Parent process blocking trying to read output from zombie child process
The setup: I have a Python (3.6) script (call it "operator") that executes a bash script (call it "plunger") in the standard subprocess way, collecting and logging stdout from the child. The plunger script itself is simple and invokes other scripts/programs to do three moderately complicated things: (a) shut down a bunch of daemon processes, (b) do some housekeeping, and (c) fire up some new daemon processes, then exit. There's nothing especially strange about the system itself: plain old CentOS running standard rpms.



The problem: When the plunger script runs only parts (a) and (b), everything works as expected: plunger (without step c) runs to completion, and operator collects all the output and continues with the rest of its job. When I include step (c), however, plunger runs correctly and operator collects all of the output (if I read a little at a time), but operator never notices that plunger has exited and never finishes reading, so control is never passed back to the operator script.



Trivial Example:



return subprocess.check_output("plunger")  # never returns with the real plunger script


Observations:




  • running plunger in an interactive shell always works properly

  • the plunger process does everything it is supposed to do AND exits

  • running ps shows the plunger bash process as a zombie ("plunger <defunct>")

  • using Popen and reading line by line shows that all expected lines are output and properly terminated with a newline

  • using Popen and checking process status with poll() only ever returns None (see the sketch after this list)

  • it behaves as if the child hasn't ended or there are bytes remaining to be read, even though it has exited and the only PIPE stream is stdout... and reading from stdout blocks
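
For concreteness, here is a minimal sketch of the kind of read loop the last three observations describe. It is my reconstruction, not the actual operator code, and it assumes a "plunger" executable on the PATH:

import subprocess

p = subprocess.Popen(["plunger"], stdout=subprocess.PIPE)
while True:
    line = p.stdout.readline()   # every expected line arrives, newline-terminated
    if not line:                 # EOF never arrives: something downstream still
        break                    # holds the pipe's write end open
    print(line.decode(), end="")
    print(p.poll())              # reportedly prints None throughout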


Conjecture:
The best guess I have is that the final step's spawning of new background (daemon) processes somehow inherits and keeps open the stdout stream, so that even though the executed plunger script writes its output and exits, some unknown process continues to hold the stream open and thus never lets the operator script continue.



Questions:
Is my conjecture likely (or possible)? If not, what else might I look for? If so, how could I protect operator and/or plunger from downstream abuse of my streams?



Postscript:
My horrible hacky fugly workaround is for plunger to echo a distinctive line after it has done its job, and when operator sees it, kill the plunger process. I feel dirty just typing that.
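
For illustration only, that workaround looks roughly like this. The sketch and the sentinel string are my invention, not the author's code:

import subprocess

SENTINEL = b"PLUNGER_DONE"  # hypothetical distinctive line echoed by plunger when done

p = subprocess.Popen(["plunger"], stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode(), end="")
    if line.startswith(SENTINEL):
        p.kill()   # stop waiting for an EOF that may never come
        break
p.wait()           # reap the child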



Edit and conclusion:
My conjecture was correct, and the problem has nothing to do with Python or, really, bash; it comes down to how fork works: children inherit the parent's open file descriptors, including the write end of the pipe. Here's a minimal example:



$ (date; (sleep 5 &); date); date
Wed Feb 6 12:46:27 EST 2019
Wed Feb 6 12:46:27 EST 2019
Wed Feb 6 12:46:27 EST 2019
$ (date; (sleep 5 &); date) | cat; date
Wed Feb 6 12:46:51 EST 2019
Wed Feb 6 12:46:51 EST 2019
Wed Feb 6 12:46:56 EST 2019 # <- five second gap!
$ (date; ((sleep 5 &)>/dev/null); date) | cat; date
Wed Feb 6 12:47:13 EST 2019
Wed Feb 6 12:47:13 EST 2019
Wed Feb 6 12:47:13 EST 2019
# this works too
$ (date; (sleep 5 >/dev/null &); date) | cat; date
Wed Feb 6 13:11:24 EST 2019
Wed Feb 6 13:11:24 EST 2019
Wed Feb 6 13:11:24 EST 2019


I'm guessing that there isn't a way to fully protect against this situation. The real culprit is that the scripts step (c) calls to launch the daemons need to make sure to redirect their output somewhere else so they don't keep the pipe open.
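
Concretely, a daemon launch inside plunger (or the scripts it calls) would follow this pattern; "some-daemon" and the log path are placeholders, not names from the question:

# discard the daemon's output entirely:
some-daemon </dev/null >/dev/null 2>&1 &
# or keep the output in a file instead of the inherited pipe:
some-daemon </dev/null >>/var/log/some-daemon.log 2>&1 &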










linux bash python

asked Feb 6 at 3:24, edited Feb 6 at 18:24
– m.thome

  • Your guess is going to be better than ours, without seeing what step "c" actually does...

    – Jeff Schaller
    Feb 6 at 11:31

  • have you considered debugging (strace -fe trace=execve,fork) instead of conjecturing? ;-)

    – pizdelect
    Feb 6 at 11:39

  • @JeffSchaller I can't share the actual code and have yet to hit upon a succinctly sharable repro, but the questions are the same: what could it do to arrange to keep the pipe's fd open after exiting, and what can I do in either script to protect them from this situation?

    – m.thome
    Feb 6 at 17:28

  • @pizdelect that is an excellent idea - thanks

    – m.thome
    Feb 6 at 17:28





1 Answer
I've answered the question in the final ("Edit and conclusion") section of the now-edited question above.

The short form is that anything that starts a background process, no matter how deeply nested, can capture your output stream and prevent you from cleanly closing out your child. The solutions I've come up with are: (a) redirect the output of daemon launches to /dev/null, or (b) redirect the output of daemon launches to a file and (if you care) separately watch that file until your immediate child exits.
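
From the operator side, option (b) could look like this minimal sketch; "daemons.log" is a hypothetical path agreed on with plunger, which is assumed to redirect its daemons' output there:

import subprocess

# With the daemons' output going to a file instead of the inherited pipe,
# plunger's stdout reaches EOF as soon as plunger itself exits, so this
# call completes normally.
out = subprocess.check_output(["plunger"])

# If the daemons' startup output matters, read it from the agreed-on file
# once the immediate child has exited.
with open("daemons.log", "rb") as f:
    daemon_output = f.read()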






answered Feb 6 at 19:33
– m.thome
