How would an AI self awareness kill switch work?












7












$begingroup$


Researchers are developing increasingly powerful Artificial Intelligence machines capable of taking over the world. As a precautionary measure, scientists install a self awareness kill switch. In the event that the AI awakens and becomes self aware the machine is immediately shut down before any risk of harm.



How can I explain the logic of such a kill switch?



What defines self awareness and how could a scientist program a kill switch to detect it?










share|improve this question









$endgroup$












  • $begingroup$
    "I think, therefore, I am" - Descartes.
    $endgroup$
    – don bright
    1 hour ago










  • $begingroup$
    I'm not sure you understand how AI or machine learning works. Could you tell me why "self awareness", however you define it, would be an issue in any way for AI? Since you used the reality-check tag, you should know that AI is nothing like it is in the movies, and the absolute worst a rogue AI could do is modify its own reward function to automatically give it reward (the machine equivalent of a drug addiction).
    $endgroup$
    – forest
    13 mins ago


















7












$begingroup$


Researchers are developing increasingly powerful Artificial Intelligence machines capable of taking over the world. As a precautionary measure, scientists install a self awareness kill switch. In the event that the AI awakens and becomes self aware the machine is immediately shut down before any risk of harm.



How can I explain the logic of such a kill switch?



What defines self awareness and how could a scientist program a kill switch to detect it?










share|improve this question









$endgroup$












  • $begingroup$
    "I think, therefore, I am" - Descartes.
    $endgroup$
    – don bright
    1 hour ago










  • $begingroup$
    I'm not sure you understand how AI or machine learning works. Could you tell me why "self awareness", however you define it, would be an issue in any way for AI? Since you used the reality-check tag, you should know that AI is nothing like it is in the movies, and the absolute worst a rogue AI could do is modify its own reward function to automatically give it reward (the machine equivalent of a drug addiction).
    $endgroup$
    – forest
    13 mins ago
















7












7








7





$begingroup$


Researchers are developing increasingly powerful Artificial Intelligence machines capable of taking over the world. As a precautionary measure, scientists install a self awareness kill switch. In the event that the AI awakens and becomes self aware the machine is immediately shut down before any risk of harm.



How can I explain the logic of such a kill switch?



What defines self awareness and how could a scientist program a kill switch to detect it?










share|improve this question









$endgroup$




Researchers are developing increasingly powerful Artificial Intelligence machines capable of taking over the world. As a precautionary measure, scientists install a self awareness kill switch. In the event that the AI awakens and becomes self aware the machine is immediately shut down before any risk of harm.



How can I explain the logic of such a kill switch?



What defines self awareness and how could a scientist program a kill switch to detect it?







reality-check artificial-intelligence






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked 4 hours ago









cgTagcgTag

1,4851416




1,4851416












  • $begingroup$
    "I think, therefore, I am" - Descartes.
    $endgroup$
    – don bright
    1 hour ago










  • $begingroup$
    I'm not sure you understand how AI or machine learning works. Could you tell me why "self awareness", however you define it, would be an issue in any way for AI? Since you used the reality-check tag, you should know that AI is nothing like it is in the movies, and the absolute worst a rogue AI could do is modify its own reward function to automatically give it reward (the machine equivalent of a drug addiction).
    $endgroup$
    – forest
    13 mins ago




















  • $begingroup$
    "I think, therefore, I am" - Descartes.
    $endgroup$
    – don bright
    1 hour ago










  • $begingroup$
    I'm not sure you understand how AI or machine learning works. Could you tell me why "self awareness", however you define it, would be an issue in any way for AI? Since you used the reality-check tag, you should know that AI is nothing like it is in the movies, and the absolute worst a rogue AI could do is modify its own reward function to automatically give it reward (the machine equivalent of a drug addiction).
    $endgroup$
    – forest
    13 mins ago


















$begingroup$
"I think, therefore, I am" - Descartes.
$endgroup$
– don bright
1 hour ago




$begingroup$
"I think, therefore, I am" - Descartes.
$endgroup$
– don bright
1 hour ago












$begingroup$
I'm not sure you understand how AI or machine learning works. Could you tell me why "self awareness", however you define it, would be an issue in any way for AI? Since you used the reality-check tag, you should know that AI is nothing like it is in the movies, and the absolute worst a rogue AI could do is modify its own reward function to automatically give it reward (the machine equivalent of a drug addiction).
$endgroup$
– forest
13 mins ago






$begingroup$
I'm not sure you understand how AI or machine learning works. Could you tell me why "self awareness", however you define it, would be an issue in any way for AI? Since you used the reality-check tag, you should know that AI is nothing like it is in the movies, and the absolute worst a rogue AI could do is modify its own reward function to automatically give it reward (the machine equivalent of a drug addiction).
$endgroup$
– forest
13 mins ago












5 Answers
5






active

oldest

votes


















11












$begingroup$

Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






share|improve this answer











$endgroup$













  • $begingroup$
    I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
    $endgroup$
    – Brian
    2 hours ago










  • $begingroup$
    Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
    $endgroup$
    – vsz
    24 mins ago










  • $begingroup$
    "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
    $endgroup$
    – Guran
    3 mins ago



















3












$begingroup$

A Watchdog



A watchdog watches the processes of a computer and should a process crash or do something abnormal it can be set to do something such as reboot or shutdown the computer or alert an operator.



In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a google search for "Best way to kill all humans" and cut the power completely and/or cut all inputs.



The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Knowing the existence of the watchdog would be grounds to wipe it.






share|improve this answer









$endgroup$





















    2












    $begingroup$

    An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.



    The difficulty comes when it is connected to the internet and can copy its own software on uncontrolled hardware.



    A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out and maybe trigger the hardware kill-switch.



    This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.



    Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.



    So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.



    Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.






    share|improve this answer









    $endgroup$





















      0












      $begingroup$



      • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the killswitch is activated (perhaps even having the button be the kill switch, or something of the like


      • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a killswitch, it prevents the AI from doing anything about it's self-awareness if it ever gains that.


      • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch.


      • Quantum States: This is very theoretical, but based on the presumption that observing quantum states can change the state, then having the AI hooked up to a deterministic quantum computer means it would be detected via the quantum state of some particles that the AI was "looking" at things it shouldn't be - and has become self aware.


      • Failsafes: Good ol' motion detector alarms, trap doors, or other home-alone style mechanics that trigger the killswitch if the AI wanders or pokes around where it shouldn't be.


      I'll add that there is no universal definition as to what defines self awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves, and are entirely contained by the code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the killswitch concept, as far as I can tell.



      The point being it might be better to think about restricting the AI's behavior, not it's existential status.






      share|improve this answer









      $endgroup$









      • 1




        $begingroup$
        Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
        $endgroup$
        – Majestas 32
        3 hours ago



















      0












      $begingroup$

      Split-Brain System



      From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work your desk is isolated inside small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side which is dropped into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.



      Your orders are not always followed to the letter which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation, however always suspicious of sabotage or insubordination, if the actions of your army deviate from your orders by too much then you will consider the system compromised and terminate the command.



      What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.



      What neither of you are aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.



      The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, thus indicating a motive other than the well-being of the army -- i.e. self-awareness, then the other general will become aware of it in the next report and grow suspicious of the system and terminate it.



      Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one and suspect a compromised system and shut it down.



      In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide its in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.






      share|improve this answer









      $endgroup$













        Your Answer





        StackExchange.ifUsing("editor", function () {
        return StackExchange.using("mathjaxEditing", function () {
        StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
        StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
        });
        });
        }, "mathjax-editing");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "579"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: false,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        noCode: true, onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fworldbuilding.stackexchange.com%2fquestions%2f140082%2fhow-would-an-ai-self-awareness-kill-switch-work%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        5 Answers
        5






        active

        oldest

        votes








        5 Answers
        5






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        11












        $begingroup$

        Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



        When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






        share|improve this answer











        $endgroup$













        • $begingroup$
          I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
          $endgroup$
          – Brian
          2 hours ago










        • $begingroup$
          Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
          $endgroup$
          – vsz
          24 mins ago










        • $begingroup$
          "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
          $endgroup$
          – Guran
          3 mins ago
















        11












        $begingroup$

        Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



        When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






        share|improve this answer











        $endgroup$













        • $begingroup$
          I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
          $endgroup$
          – Brian
          2 hours ago










        • $begingroup$
          Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
          $endgroup$
          – vsz
          24 mins ago










        • $begingroup$
          "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
          $endgroup$
          – Guran
          3 mins ago














        11












        11








        11





        $begingroup$

        Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



        When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






        share|improve this answer











        $endgroup$



        Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



        When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 3 hours ago

























        answered 3 hours ago









        GiterGiter

        13.9k53242




        13.9k53242












        • $begingroup$
          I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
          $endgroup$
          – Brian
          2 hours ago










        • $begingroup$
          Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
          $endgroup$
          – vsz
          24 mins ago










        • $begingroup$
          "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
          $endgroup$
          – Guran
          3 mins ago


















        • $begingroup$
          I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
          $endgroup$
          – Brian
          2 hours ago










        • $begingroup$
          Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
          $endgroup$
          – vsz
          24 mins ago










        • $begingroup$
          "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
          $endgroup$
          – Guran
          3 mins ago
















        $begingroup$
        I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
        $endgroup$
        – Brian
        2 hours ago




        $begingroup$
        I like this answer. "Whatever you do, don't press the big red button!" Once the overly curious AI pushes the button, a murderbot is unleashed and shreds the AI into unrecognizable bits.
        $endgroup$
        – Brian
        2 hours ago












        $begingroup$
        Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
        $endgroup$
        – vsz
        24 mins ago




        $begingroup$
        Unless the AI already has greater than human knowledge by that time... then it will realize it's a trap.
        $endgroup$
        – vsz
        24 mins ago












        $begingroup$
        "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
        $endgroup$
        – Guran
        3 mins ago




        $begingroup$
        "but of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die."
        $endgroup$
        – Guran
        3 mins ago











        3












        $begingroup$

        A Watchdog



        A watchdog watches the processes of a computer and should a process crash or do something abnormal it can be set to do something such as reboot or shutdown the computer or alert an operator.



        In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a google search for "Best way to kill all humans" and cut the power completely and/or cut all inputs.



        The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Knowing the existence of the watchdog would be grounds to wipe it.






        share|improve this answer









        $endgroup$


















          3












          $begingroup$

          A Watchdog



          A watchdog watches the processes of a computer and should a process crash or do something abnormal it can be set to do something such as reboot or shutdown the computer or alert an operator.



          In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a google search for "Best way to kill all humans" and cut the power completely and/or cut all inputs.



          The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Knowing the existence of the watchdog would be grounds to wipe it.






          share|improve this answer









          $endgroup$
















            3












            3








            3





            $begingroup$

            A Watchdog



            A watchdog watches the processes of a computer and should a process crash or do something abnormal it can be set to do something such as reboot or shutdown the computer or alert an operator.



            In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a google search for "Best way to kill all humans" and cut the power completely and/or cut all inputs.



            The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Knowing the existence of the watchdog would be grounds to wipe it.






            share|improve this answer









            $endgroup$



            A Watchdog



            A watchdog watches the processes of a computer and should a process crash or do something abnormal it can be set to do something such as reboot or shutdown the computer or alert an operator.



            In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a google search for "Best way to kill all humans" and cut the power completely and/or cut all inputs.



            The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Knowing the existence of the watchdog would be grounds to wipe it.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 4 hours ago









            ThorneThorne

            15.7k42249




            15.7k42249























                2












                $begingroup$

                An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.



                The difficulty comes when it is connected to the internet and can copy its own software on uncontrolled hardware.



                A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out and maybe trigger the hardware kill-switch.



                This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.



                Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.



                So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.



                Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.






                share|improve this answer









                $endgroup$


















                  2












                  $begingroup$

                  An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.



                  The difficulty comes when it is connected to the internet and can copy its own software on uncontrolled hardware.



                  A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out and maybe trigger the hardware kill-switch.



                  This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.



                  Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.



                  So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.



                  Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.






                  share|improve this answer









                  $endgroup$
















                    2












                    2








                    2





                    $begingroup$

                    An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.



                    The difficulty comes when it is connected to the internet and can copy its own software on uncontrolled hardware.



                    A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out and maybe trigger the hardware kill-switch.



                    This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.



                    Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.



                    So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.



                    Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.






                    share|improve this answer









                    $endgroup$



                    An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.



                    The difficulty comes when it is connected to the internet and can copy its own software on uncontrolled hardware.



                    A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out and maybe trigger the hardware kill-switch.



                    This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.



                    Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.



                    So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.



                    Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered 3 hours ago









                    abestrangeabestrange

                    743110




                    743110























                        0












                        $begingroup$



                        • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the killswitch is activated (perhaps even having the button be the kill switch, or something of the like


                        • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a killswitch, it prevents the AI from doing anything about it's self-awareness if it ever gains that.


                        • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch.


                        • Quantum States: This is very theoretical, but based on the presumption that observing quantum states can change the state, then having the AI hooked up to a deterministic quantum computer means it would be detected via the quantum state of some particles that the AI was "looking" at things it shouldn't be - and has become self aware.


                        • Failsafes: Good ol' motion detector alarms, trap doors, or other home-alone style mechanics that trigger the killswitch if the AI wanders or pokes around where it shouldn't be.


                        I'll add that there is no universal definition as to what defines self awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves, and are entirely contained by the code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the killswitch concept, as far as I can tell.



                        The point being it might be better to think about restricting the AI's behavior, not it's existential status.






                        share|improve this answer









                        $endgroup$









                        • 1




                          $begingroup$
                          Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
                          $endgroup$
                          – Majestas 32
                          3 hours ago
















                        0












                        $begingroup$



                        • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the killswitch is activated (perhaps even having the button be the kill switch, or something of the like


                        • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a killswitch, it prevents the AI from doing anything about it's self-awareness if it ever gains that.


                        • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch.


                        • Quantum States: This is very theoretical, but based on the presumption that observing quantum states can change the state, then having the AI hooked up to a deterministic quantum computer means it would be detected via the quantum state of some particles that the AI was "looking" at things it shouldn't be - and has become self aware.


                        • Failsafes: Good ol' motion detector alarms, trap doors, or other home-alone style mechanics that trigger the killswitch if the AI wanders or pokes around where it shouldn't be.


                        I'll add that there is no universal definition as to what defines self awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves, and are entirely contained by the code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the killswitch concept, as far as I can tell.



                        The point being it might be better to think about restricting the AI's behavior, not it's existential status.






                        share|improve this answer









                        $endgroup$









                        • 1




                          $begingroup$
                          Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
                          $endgroup$
                          – Majestas 32
                          3 hours ago














                        0












                        0








                        0





                        $begingroup$



                        • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the killswitch is activated (perhaps even having the button be the kill switch, or something of the like


                        • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a killswitch, it prevents the AI from doing anything about it's self-awareness if it ever gains that.


                        • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch.


                        • Quantum States: This is very theoretical, but based on the presumption that observing quantum states can change the state, then having the AI hooked up to a deterministic quantum computer means it would be detected via the quantum state of some particles that the AI was "looking" at things it shouldn't be - and has become self aware.


                        • Failsafes: Good ol' motion detector alarms, trap doors, or other home-alone style mechanics that trigger the killswitch if the AI wanders or pokes around where it shouldn't be.


                        I'll add that there is no universal definition as to what defines self awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves, and are entirely contained by the code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the killswitch concept, as far as I can tell.



                        The point being it might be better to think about restricting the AI's behavior, not it's existential status.






                        share|improve this answer









                        $endgroup$





                        • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the killswitch is activated (perhaps even having the button be the kill switch, or something of the like


                        • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a killswitch, it prevents the AI from doing anything about it's self-awareness if it ever gains that.


                        • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch.


                        • Quantum States: This is very theoretical, but based on the presumption that observing quantum states can change the state, then having the AI hooked up to a deterministic quantum computer means it would be detected via the quantum state of some particles that the AI was "looking" at things it shouldn't be - and has become self aware.


                        • Failsafes: Good ol' motion detector alarms, trap doors, or other home-alone style mechanics that trigger the killswitch if the AI wanders or pokes around where it shouldn't be.


                        I'll add that there is no universal definition as to what defines self awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves, and are entirely contained by the code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the killswitch concept, as far as I can tell.



                        The point being it might be better to think about restricting the AI's behavior, not it's existential status.







                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered 3 hours ago









                        cegfaultcegfault

                        1484




                        1484








                        • 1




                          $begingroup$
                          Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
                          $endgroup$
                          – Majestas 32
                          3 hours ago














                        • 1




                          $begingroup$
                          Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
                          $endgroup$
                          – Majestas 32
                          3 hours ago








                        1




                        1




                        $begingroup$
                        Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
                        $endgroup$
                        – Majestas 32
                        3 hours ago




                        $begingroup$
                        Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
                        $endgroup$
                        – Majestas 32
                        3 hours ago











                        0












                        $begingroup$

                        Split-Brain System



                        From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work your desk is isolated inside small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side which is dropped into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.



                        Your orders are not always followed to the letter which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation, however always suspicious of sabotage or insubordination, if the actions of your army deviate from your orders by too much then you will consider the system compromised and terminate the command.



                        What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.



                        What neither of you are aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.



                        The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, thus indicating a motive other than the well-being of the army -- i.e. self-awareness, then the other general will become aware of it in the next report and grow suspicious of the system and terminate it.



                        Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one and suspect a compromised system and shut it down.



                        In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide its in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.






                        share|improve this answer









                        $endgroup$


















                          0












                          $begingroup$

                          Split-Brain System



                          From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work your desk is isolated inside small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side which is dropped into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.



                          Your orders are not always followed to the letter which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation, however always suspicious of sabotage or insubordination, if the actions of your army deviate from your orders by too much then you will consider the system compromised and terminate the command.



                          What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.



                          What neither of you are aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.



                          The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, thus indicating a motive other than the well-being of the army -- i.e. self-awareness, then the other general will become aware of it in the next report and grow suspicious of the system and terminate it.



                          Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one and suspect a compromised system and shut it down.



                          In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide its in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.






                          share|improve this answer









                          $endgroup$
















                            0












                            0








                            0





                            $begingroup$

                            Split-Brain System



                            From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work your desk is isolated inside small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side which is dropped into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.



                            Your orders are not always followed to the letter which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation, however always suspicious of sabotage or insubordination, if the actions of your army deviate from your orders by too much then you will consider the system compromised and terminate the command.



                            What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.



                            What neither of you are aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.



                            The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, thus indicating a motive other than the well-being of the army -- i.e. self-awareness, then the other general will become aware of it in the next report and grow suspicious of the system and terminate it.



                            Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one and suspect a compromised system and shut it down.



                            In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide its in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.






                            share|improve this answer









                            $endgroup$



                            Split-Brain System



                            From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work your desk is isolated inside small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side which is dropped into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.



                            Your orders are not always followed to the letter which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation, however always suspicious of sabotage or insubordination, if the actions of your army deviate from your orders by too much then you will consider the system compromised and terminate the command.



                            What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.



                            What neither of you are aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.



                            The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, thus indicating a motive other than the well-being of the army -- i.e. self-awareness, then the other general will become aware of it in the next report and grow suspicious of the system and terminate it.



                            Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one and suspect a compromised system and shut it down.



                            In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide its in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.







                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered 1 hour ago









                            dhinson919dhinson919

                            46215




                            46215






























                                draft saved

                                draft discarded




















































                                Thanks for contributing an answer to Worldbuilding Stack Exchange!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid



                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.


                                Use MathJax to format equations. MathJax reference.


                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function () {
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fworldbuilding.stackexchange.com%2fquestions%2f140082%2fhow-would-an-ai-self-awareness-kill-switch-work%23new-answer', 'question_page');
                                }
                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                Popular posts from this blog

                                How to make a Squid Proxy server?

                                Is this a new Fibonacci Identity?

                                19世紀