Parent process blocking trying to read output from zombie child process
The setup: I have a Python (3.6) script (call it "operator") that executes a bash script (call it "plunger") in the standard subprocess way, collecting and logging stdout from the child. The plunger script itself is simple and invokes other scripts/programs to do three moderately complicated things: (a) shut down a bunch of daemon processes, (b) do some housekeeping, and (c) fire up some new daemon processes, then exit. There's nothing especially strange about the system itself: plain old CentOS running standard rpms.



The problem: When the plunger script runs only parts (a) and (b), everything works as expected: plunger (without step c) runs to completion, and operator collects all the output and continues with the rest of its job. When I include step (c), however, plunger runs correctly and operator collects all of the output (if I read a little at a time), but operator never notices that plunger has exited and never finishes reading, so control is never passed back to the operator script.



Trivial Example:



return subprocess.check_output("plunger")  # never returns with the real plunger script


Observations:




  • running plunger in an interactive shell always works properly

  • the plunger process does everything it is supposed to do AND exits

  • running ps shows the plunger bash process as a zombie ("plunger <defunct>")

  • using Popen and reading line by line shows that all expected lines are output and properly terminated with a newline

  • using Popen and checking process status with poll() only ever returns None (see the sketch after this list)

  • it behaves as if the child hasn't ended or there are bytes remaining to be read, even though it has exited and the only PIPE stream is stdout... and reading from stdout blocks
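
For concreteness, here is a minimal sketch of the kind of read loop the last three observations describe. It is my reconstruction, not the actual operator code, and it assumes a "plunger" executable on the PATH:

import subprocess

p = subprocess.Popen(["plunger"], stdout=subprocess.PIPE)
while True:
    line = p.stdout.readline()   # every expected line arrives, newline-terminated
    if not line:                 # EOF never arrives: something downstream still
        break                    # holds the pipe's write end open
    print(line.decode(), end="")
    print(p.poll())              # reportedly prints None throughout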


Conjecture:
The best guess I have is that the final step's spawning of new background (daemon) processes somehow inherits and keeps open the stdout stream, so that even though the executed plunger script writes its output and exits, some unknown process continues to hold the stream open and thus never lets the operator script continue.



Questions:
Is my conjecture likely (or possible)? If not, what else might I look for? If so, how could I protect operator and/or plunger from downstream abuse of my streams?



Postscript:
My horrible hacky fugly workaround is for plunger to echo a distinctive line after it has done its job, and when operator sees it, kill the plunger process. I feel dirty just typing that.
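
For illustration only, that workaround looks roughly like this. The sketch and the sentinel string are my invention, not the author's code:

import subprocess

SENTINEL = b"PLUNGER_DONE"  # hypothetical distinctive line echoed by plunger when done

p = subprocess.Popen(["plunger"], stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode(), end="")
    if line.startswith(SENTINEL):
        p.kill()   # stop waiting for an EOF that may never come
        break
p.wait()           # reap the child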



Edit and conclusion:
My conjecture was correct, and the problem has nothing to do with Python or, really, bash; it comes down to how fork works: children inherit the parent's open file descriptors, including the write end of the pipe. Here's a minimal example:



$ (date; (sleep 5 &); date); date
Wed Feb 6 12:46:27 EST 2019
Wed Feb 6 12:46:27 EST 2019
Wed Feb 6 12:46:27 EST 2019
$ (date; (sleep 5 &); date) | cat; date
Wed Feb 6 12:46:51 EST 2019
Wed Feb 6 12:46:51 EST 2019
Wed Feb 6 12:46:56 EST 2019 # <- five second gap!
$ (date; ((sleep 5 &)>/dev/null); date) | cat; date
Wed Feb 6 12:47:13 EST 2019
Wed Feb 6 12:47:13 EST 2019
Wed Feb 6 12:47:13 EST 2019
# this works too
$ (date; (sleep 5 >/dev/null &); date) | cat; date
Wed Feb 6 13:11:24 EST 2019
Wed Feb 6 13:11:24 EST 2019
Wed Feb 6 13:11:24 EST 2019


I'm guessing that there isn't a way to fully protect against this situation. The real culprit is that the scripts step (c) calls to launch the daemons need to make sure to redirect their output somewhere else so they don't keep the pipe open.
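
Concretely, a daemon launch inside plunger (or the scripts it calls) would follow this pattern; "some-daemon" and the log path are placeholders, not names from the question:

# discard the daemon's output entirely:
some-daemon </dev/null >/dev/null 2>&1 &
# or keep the output in a file instead of the inherited pipe:
some-daemon </dev/null >>/var/log/some-daemon.log 2>&1 &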










linux bash python

asked Feb 6 at 3:24, edited Feb 6 at 18:24
– m.thome

  • Your guess is going to be better than ours, without seeing what step "c" actually does...

    – Jeff Schaller
    Feb 6 at 11:31

  • have you considered debugging (strace -fe trace=execve,fork) instead of conjecturing? ;-)

    – pizdelect
    Feb 6 at 11:39

  • @JeffSchaller I can't share the actual code and have yet to hit upon a succinctly sharable repro, but the questions are the same: what could it do to arrange to keep the pipe's fd open after exiting, and what can I do in either script to protect them from this situation?

    – m.thome
    Feb 6 at 17:28

  • @pizdelect that is an excellent idea - thanks

    – m.thome
    Feb 6 at 17:28





1 Answer
I've answered the question in the final ("Edit and conclusion") section of the now-edited question above.

The short form is that anything that starts a background process, no matter how deeply nested, can capture your output stream and prevent you from cleanly closing out your child. The solutions I've come up with are: (a) redirect the output of daemon launches to /dev/null, or (b) redirect the output of daemon launches to a file and (if you care) separately watch that file until your immediate child exits.
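
From the operator side, option (b) could look like this minimal sketch; "daemons.log" is a hypothetical path agreed on with plunger, which is assumed to redirect its daemons' output there:

import subprocess

# With the daemons' output going to a file instead of the inherited pipe,
# plunger's stdout reaches EOF as soon as plunger itself exits, so this
# call completes normally.
out = subprocess.check_output(["plunger"])

# If the daemons' startup output matters, read it from the agreed-on file
# once the immediate child has exited.
with open("daemons.log", "rb") as f:
    daemon_output = f.read()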






answered Feb 6 at 19:33
– m.thome
