BASH script to monitor subprocess and throttle it for CPU temperature control

I need to run CPU-intensive tasks on a very old machine with overheating issues. The script below will monitor temperature and pause the jobs when it gets too high, continuing them when it's back to normal.

The actual commands run are, of course, not included, since they are irrelevant to the question.

I am looking for hidden traps I may have set in my code (listed at the bottom), and for other things I have done incorrectly. Aside from special characters in the commands and arguments that are run, which are created by hand so I can control that risk, what traps or gotchas have I unknowingly built into the code? What ways are there to make this more error-proof, or better in other ways?

For the timing function I know I could have used the

time { command ...; command ...; }

construct, but I was more interested in the time spent by the machine (and previously, by me in the chair) than in the CPU time involved.
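For illustration, wall-clock time can also be measured inside bash without `date` or `bc`. The following is only a minimal sketch, assuming bash; `fmt_hms` and `time_step` are hypothetical helper names that do not appear in the script below. It uses bash's `SECONDS` variable and plain arithmetic, so the hour field does not wrap at 24 hours the way `date -u -d @N +%T` does.

```shell
#!/bin/bash
# Hypothetical helper: format a whole number of seconds as HH:MM:SS.
# The hour field is not reduced modulo 24, so 25 hours prints as 25:00:00.
fmt_hms() {
    local total=$1
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# SECONDS counts wall-clock seconds since the shell started (or since it
# was last assigned), so a step can be timed without calling date or bc.
time_step() {
    local begin=$SECONDS
    "$@"
    echo " ******* Processing time: $(fmt_hms $(( SECONDS - begin )))"
}
```

A step would then be run as `time_step my_long_running_command arg1 arg2`. Note that this form runs the command in the foreground, so it only shows the timing part, not the temperature monitoring.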

The Script:

The code comments should explain what it does, as well as why I did some of it the way I did.

#!/bin/bash

# Build my time reporting function
function report {
    # Get the current time, do the math, report the results.
    end_time=$(date +%s);

    # The time used for the last run process
    proc_time=$(echo "$end_time"-"$start_time" | bc);
    echo " ******* Processing time: $(date -u -d @${proc_time} +%T)";

    # The cumulative time for all processes so far
    run_time=$(echo "$end_time"-"$launch_time" | bc);
    echo " ******* Running time: $(date -u -d @${run_time} +%T)";
}

# The high and low temperatures to monitor for. Processing is paused
# once the high temp is reached, and will not resume again until the
# low temp is reached.

# My system recovers to 60°C reasonably quick (idle is around 45°C)
temp_lo=60;

# My system dies at about 115°C - since 100°C is normal, suggests my
# sensors are not accurate, but I work with what I have.
# 20°C margin allows for delay in the detection of the high temp, and
# delay in the process pausing, while still keeping temp under danger
# zone. Also allows for when Core0 is rising faster than Core1. They
# seem to take turns being the leader, but seldom more than 5-10°C
# difference.
temp_hi=95;

# The routine to read the CPU temp with lm sensors. Could be coded
# inline in the watch_child function, but that means placing it in
# three places, and if the grep/sed needs adjusting, then I have to
# remember to change _all_ three, and not make any typos. This cuts
# my chance of errors to a third.
function get_temp {
    # the grep and/or sed may need changing for other sensor output
    # on different systems
    sensors | grep 'Core1' | sed -e 's/.*: \+\([+-][0-9.]\+\)°C.*$/\1/'
}

# Routine to monitor the CPU temp, pausing the processing as needed
# to remain in the 'safe' range for processor temperature.
function watch_child {
    # argument should be the PID of the backgrounded process
    childd=$1;
    # pre-load the CPU temp
    temp=$(get_temp);
    # As long as the backgrounded process is still running
    while [ -e /proc/$childd ]; do
        # Monitor the process, for still running, and the temp, still
        # safe
        while [ -e /proc/$childd ] && [ $(echo "$temp < $temp_hi" | bc) = 1 ]; do
            # wait a spell
            sleep 5;
            # re-load the temp for a re-check
            temp=$(get_temp);
        done
        # If the process is still running, then it was over-temp that
        # caused the while loop to end
        if [ -e /proc/$childd ]; then
            # Tell the process to take a break
            kill -SIGSTOP "$childd";
        fi
        # Drops through here if the process has ended, otherwise,
        # monitor the temp for a restart
        while [ -e /proc/$childd ] && [ $(echo "$temp > $temp_lo" | bc) = 1 ]; do
            # wait a spell
            sleep 5;
            # re-load the temp for a re-check
            temp=$(get_temp);
        done
        # Drop through here if the process has ended.
        if [ -e /proc/$childd ]; then
            # Otherwise, tell the process that the break is over.
            kill -SIGCONT "$childd";
        fi
    done
    # Only get this far once the process has ended.
    # In the rare case of the process never waking up, the outer while
    # loop will run infinitely!
    # Human monitoring still required!
}

# Start the timer for cumulative run time reports
launch_time=$(date +%s);

echo "********* The step to perform.";
# Start the timer for this process
start_time=$(date +%s);
# Launch the dangerous process in the background
my_long_running_command arg1 arg2 &
# Capture its PID
child=$!;
# Block, with temp throttling, until this process is done
watch_child $child;
report;

echo "********* The next step to perform.";
# Start the timer for the next process
start_time=$(date +%s);
# Launch the dangerous process in the background
another_long_running_command arg1 arg2 &
# Capture its PID
child=$!;
# Block, with temp throttling, until this process is done
watch_child $child;
report;

asked Dec 29 '16 at 5:59

Gypsy Spellweaver


migrated from unix.stackexchange.com Jan 11 '17 at 10:04

This question came from our site for users of Linux, FreeBSD and other Un*x-like operating systems.

Would fit well at Code Review as well.
– phk
Dec 29 '16 at 12:49

@phk Since cross-posting is out, how do I move this there?
– Gypsy Spellweaver
Jan 10 '17 at 9:12

Actually, I am not entirely sure. Either delete it here and post it there, or flag this thread for moderator attention. If you don't find any info on this in the help center, on Code Review Meta, or on Meta Stack Exchange, then ask at Code Review Meta.
– phk
Jan 10 '17 at 10:23


1 Answer

Although unrelated to the code, I'll mention that it is unusual for a CPU to overheat, especially a dual-core CPU, except at very high ambient temperatures. I'd suggest removing the heat sink and re-applying thermal paste. Any number of YouTube videos provide step-by-step instructions.

Moving on to the code:

terminal semicolons aren't needed

configuration should go at the top

kill -0 PID is a portable alternative to -e /proc/$pid

bash builtins let and [[ x -gt y ]] can replace bc for these purposes

[[ .. ]] is a builtin alternative to [ .. ]

date +%s can be replaced by builtin printf

gawk can extract the temperature more flexibly than grep+sed

your time/run/report pattern can be factored into a function

the monitoring loop can be simplified by moving sleep to the end

no real harm in monitoring more aggressively, since the loop is not going to use a lot of CPU

can save a couple of forks by reading temp directly from /sys
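To illustrate the awk point from the list above, here is a minimal sketch that picks the hottest core from `sensors`-style text instead of reading a single core. The sample line format (`Core 0:  +57.0°C  (high = ...)`) is an assumption about what lm-sensors prints on a coretemp system, and `max_core_temp` is a hypothetical name. It uses only plain awk features rather than anything gawk-specific, and it reads standard input so it can be tried on canned text first.

```shell
#!/bin/bash
# Hypothetical helper: read sensors-style text on stdin and print the
# highest core temperature as a whole number of degrees.
max_core_temp() {
    awk '
        /^Core ?[0-9]+:/ {
            line = $0
            sub(/^[^:]*:[ \t]*/, "", line)   # drop the "Core N:" label
            t = line + 0                     # awk keeps the leading number only
            if (!seen || t > max) { max = t; seen = 1 }
        }
        END {
            if (!seen) exit 1                # no Core lines found: report failure
            printf "%d\n", max               # truncate to whole degrees
        }
    '
}
```

It would be used as `temp=$(sensors | max_core_temp)`. Printing a whole number keeps the result usable in integer comparisons such as `[[ $temp -ge $temp_hi ]]`.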

Putting it all together:

#!/bin/bash
temp_lo=60
temp_hi=95

temp_label=$( grep -l ^Core /sys/bus/platform/devices/coretemp.*/hwmon/hwmon*/temp*_label | head -1 )
temp_source=${temp_label%_label}_input
# a function rather than an alias: bash does not expand aliases in
# non-interactive scripts unless expand_aliases is set
function now { printf '%(%s)T\n' -1; }

function watch_child {
    childd=$1
    while kill -0 $childd; do
        temp=$(( $(<$temp_source) / 1000 ))
        [[ $temp -ge $temp_hi ]] && kill -SIGSTOP $childd
        [[ $temp -le $temp_lo ]] && kill -SIGCONT $childd
        sleep 1
    done
}

function elapsed {
    echo " ******* $1 time: $(date -u -d @$(( ${3:-$(now)}-$2 )) +%T)"
}

function monitor {
    launch_time=${launch_time:-$(now)}
    start_time=$(now)
    echo "********* $1"
    shift
    "$@" &
    watch_child $!
    elapsed Processing $start_time
    elapsed Running $launch_time
}

monitor "The step to perform." my_long_running_command arg1 arg2
monitor "The next step to perform." another_long_running_command arg1 arg2
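One thing a polling loop like this does not report is whether the command succeeded. The following is a minimal sketch, assuming the job is a direct child of the script: once `kill -0` reports the process gone, `wait` still returns its saved exit status. `run_watched` is a hypothetical name, and the temperature check is left out to keep the example self-contained.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command in the background, poll until it
# exits, then return the command's own exit status.
run_watched() {
    "$@" &
    local pid=$!
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    while kill -0 "$pid" 2>/dev/null; do
        sleep 0.2    # the temperature check would go here
    done
    # bash remembers the status of a finished background child
    wait "$pid"
}
```

With this, `run_watched my_long_running_command arg1 arg2 || echo "step failed"` would let a later step be skipped when an earlier one fails. The `2>/dev/null` only hides the "No such process" message that `kill -0` prints once the job has ended.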

edited 22 mins ago

answered 2 hours ago

Oh My Goodness


Thanks for the review. Agreed on the root cause, but all remedies failed. It was a 12+ yr old core with a hard life. Semicolons are a personal style and convenience. kill is portable, yet [[ .. ]] isn't as much so. gawk over grep+sed is a great call, reducing CPU load as well (I think). Refactoring time/run/report is a good one too. Not so sure about the increased aggressiveness: the objective is to not only know when the core is cool, but also allow it to cool as fast as possible.
– Gypsy Spellweaver
1 hour ago

One issue: as long as the temp is not between the threshold values, additional, unneeded kill commands will be issued. -SIGSTOP will be reissued every second until the core is below temp_hi. Once the temp goes below temp_lo, -SIGCONT will be reissued every second. Would not a single kill per threshold crossing be better at conserving CPU resources?
– Gypsy Spellweaver
1 hour ago

There are really no CPU resources used: kill is a builtin that invokes a single syscall. On my system, ten million invocations is 2.5s of CPU time, or ~250ns each. Compare the sensors command line, also run once per loop, at ~8ms each, or 32000 times longer. [[ ]] is "portable" to any other bash and definitely more efficient than forking bc.
– Oh My Goodness
49 mins ago

To give an idea of the cost of forks (bc and [ .. ]) I ran both versions of watch_child with sleeps disabled and the same gawk-based get_temp. Based on loops executed per 5 seconds, the modified version is about 50% faster.
– Oh My Goodness
35 mins ago

edit: you can cut the use of sensors/gawk altogether; see edits to my code
– Oh My Goodness
21 mins ago
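On the question raised in these comments about sending one kill per threshold crossing: a minimal sketch that tracks whether the job is currently paused and only acts when that state changes. `next_action`, `read_temp`, and the `running`/`paused` state names are hypothetical, and temperatures are assumed to be whole degrees.

```shell
#!/bin/bash
temp_lo=60
temp_hi=95

# Print the signal to send, if any, for the current state and temperature.
#   $1 = current state: "running" or "paused"
#   $2 = temperature in whole degrees C
next_action() {
    local state=$1 temp=$2
    if [ "$state" = running ] && [ "$temp" -ge "$temp_hi" ]; then
        echo STOP
    elif [ "$state" = paused ] && [ "$temp" -le "$temp_lo" ]; then
        echo CONT
    else
        echo NONE
    fi
}

# Sketch of a watch loop built on it; read_temp is a hypothetical
# function that prints the current temperature.
watch_child() {
    local pid=$1 state=running action
    while kill -0 "$pid" 2>/dev/null; do
        action=$(next_action "$state" "$(read_temp)")
        case $action in
            STOP) kill -STOP "$pid" && state=paused ;;
            CONT) kill -CONT "$pid" && state=running ;;
        esac
        sleep 1
    done
}
```

The command substitutions here fork on every pass, so this is about making the stop/continue behaviour explicit rather than saving CPU; as the reply above notes, the repeated kill calls are cheap in any case.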

|
show 2 more comments

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f152320%2fbash-script-to-monitor-subprocess-and-throttle-it-for-cpu-temperature-control%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Moving on to the code:

terminal semicolons aren't needed

configuration should go at the top

kill -0 PID is a portable alternative to -e /proc/$pid

bash builtins let and [[ x -gt y ]] can replace bc for these purposes

[[ .. ]] is a builtin alternative to [ .. ]

date +%s can be replaced by builtin printf

gawk can extract the temperature more flexibly than grep+sed

your time/run/report pattern can be factored into a function

the monitoring loop can be simplified by moving sleep to the end

no real harm in monitoring more aggressively, since the loop is not going to use a lot of CPU

can save a couple of forks by reading temp directly from /sys

Putting it all together:

#!/bin/bash

temp_lo=60

temp_hi=95



temp_label=$( grep -l ^Core /sys/bus/platform/devices/coretemp.*/hwmon/hwmon*/temp*_label  |head -1 )

temp_source=${temp_label%_label}_input

alias now="printf '%(%s)Tn' -1"



function watch_child {

    childd=$1

    while kill -0 $childd; do

        temp=$(( $(<$temp_source) / 1000 ))

        [[ $temp -ge $temp_hi ]] && kill -SIGSTOP $childd

        [[ $temp -le $temp_lo ]] && kill -SIGCONT $childd

        sleep 1

    done

}



function elapsed {

    echo " ******* $1 time: $(date -u -d @$(( ${3:-$(now)}-$2 )) +%T)"

}



function monitor {

    launch_time=${launch_time:-$(now)}

    start_time=$(now)

    echo "********* $1"

    shift

    "$@" &

    watch_child $!

    elapsed Processing $start_time

    elapsed Running $launch_time

}



monitor "The step to perform." my_long_running_command arg1 arg2 

monitor "The next step to perform." another_long_running_command arg1 arg2

edited 22 mins ago

answered 2 hours ago

Oh My Goodness

49017

$begingroup$
Thanks for the review. Agreed on the root cause, but all remedies failed. It was a 12+ yr old core with a hard life. Semicolons are a personal style and convenience. kill is portable, yet [[ .. ]] isn't as much so. gawd over grep+sed is a great call, reducing CPU load as well (I think). Refactoring time/run/report is a good one too. Not so sure about the increased aggressiveness, the objective is to not only know when the core is cool, but also allow it to cool as fast as possible.
$endgroup$
– Gypsy Spellweaver
1 hour ago

$begingroup$
One issue: as long as the temp is not between the threshold values, additional, unneeded, kill commands will be issued. -SIGSTOP will be repeatedly issued every second, the core is below temp_high. Once the temp goes below temp_low, -SIGCONT will be reissued every second. Would not only one kill per threshold crossing be better at conserving CPU resources?
$endgroup$
– Gypsy Spellweaver
1 hour ago

$begingroup$
There are really no CPU resources used: kill is a builtin that invokes a single syscall. On my system, ten million invocations is 2.5s of CPU time, or ~250ns each. Compare the sensors command line, also run once per loop, at ~8ms each, or 32000 times longer. [[ ]] is "portable" to any other bash and definitely more efficient than forking bc.
$endgroup$
– Oh My Goodness
49 mins ago

$begingroup$
To give an idea of the cost of forks (bc and [ .. ]) I ran both versions of watch_child with sleeps disabled and the same gawk-based get_temp. Based on loops executed per 5 seconds, the modified version is about 50% faster.
$endgroup$
– Oh My Goodness
35 mins ago

edit: you can cut the use of sensors/gawk altogether; see edits to my code
– Oh My Goodness
21 mins ago
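
For anyone who still prefers one kill per threshold crossing, as asked in the comments, here is a minimal sketch of how the loop could remember whether the child is already paused. It is not part of the answer's code: next_action, watch_child_once and the paused flag are hypothetical names made up for this illustration, and the second argument of watch_child_once is assumed to be a file that reports the temperature in millidegrees, as the /sys source in the answer does.

```bash
#!/bin/bash
# Sketch only: send one signal per threshold crossing instead of one per pass.
# temp_lo and temp_hi mirror the answer's configuration; every other name
# here is hypothetical and introduced just for this example.

temp_lo=60
temp_hi=95

# Print the signal to send (STOP or CONT), or "none" if nothing should change.
# $1 = current temperature in degrees C, $2 = 1 if the child is paused, else 0.
next_action() {
    local temp=$1 paused=$2
    if (( temp >= temp_hi && paused == 0 )); then
        echo STOP
    elif (( temp <= temp_lo && paused == 1 )); then
        echo CONT
    else
        echo none
    fi
}

# Variant of watch_child that tracks the paused state between passes.
# $1 = child PID, $2 = file holding the temperature in millidegrees (assumed).
watch_child_once() {
    local child=$1 temp_source=$2 paused=0 temp
    while kill -0 "$child" 2>/dev/null; do
        temp=$(( $(<"$temp_source") / 1000 ))
        case $(next_action "$temp" "$paused") in
            STOP) kill -STOP "$child" && paused=1 ;;
            CONT) kill -CONT "$child" && paused=0 ;;
        esac
        sleep 1
    done
}
```

Inside monitor this would be called as watch_child_once $! "$temp_source". Given the timings quoted above, the saving is probably negligible; the main gain is that the paused state becomes explicit, which could also be handy for logging each pause and resume.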


migrated from unix.stackexchange.com Jan 11 '17 at 10:04