Can a neural network compute $y = x^2$?
Tags: machine-learning, neural-network
In the spirit of the famous TensorFlow Fizz Buzz joke and the XOR problem, I started to wonder whether it's possible to design a neural network that implements the function $y = x^2$.
Given some representation of a number (e.g. as a vector in binary form, so that the number 5 is represented as [1,0,1,0,0,0,0,...]), the neural network should learn to return its square, 25 in this case.
If I could implement $y=x^2$, I could probably implement $y=x^3$ and, more generally, any polynomial of $x$. Then, with Taylor series, I could approximate $y=\sin(x)$, which would solve the Fizz Buzz problem: a neural network that can find the remainder of a division.
Clearly, the linear part of a NN alone won't be able to perform this task, so if we could do the multiplication, it would have to happen thanks to the activation function.
Can you suggest any ideas or reading on the subject?
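To make the encoding concrete, here is a minimal Python sketch, assuming a fixed-width, least-significant-bit-first vector as in the example for 5. The names `encode` and `square_from_bits` are illustrative only. Writing $x=\sum_i b_i 2^i$ gives $x^2=\sum_i\sum_j b_i b_j 2^{i+j}$, so the square depends on products of pairs of input bits, which is why an affine map of the bits cannot produce it.

```python
def encode(x, width=8):
    """Least-significant-bit-first bits of a non-negative integer: 5 -> [1, 0, 1, 0, ...]."""
    return [(x >> i) & 1 for i in range(width)]


def square_from_bits(bits):
    """Compute x**2 from the bits using only pairwise products b_i * b_j."""
    return sum(
        bi * bj * 2 ** (i + j)
        for i, bi in enumerate(bits)
        for j, bj in enumerate(bits)
    )


bits = encode(5)
print(bits)                    # [1, 0, 1, 0, 0, 0, 0, 0]
print(square_from_bits(bits))  # 25

# Any affine map f of the bits satisfies
#   f([1, 1, 0, ...]) = f([1, 0, 0, ...]) + f([0, 1, 0, ...]) - f([0, 0, 0, ...]).
# Squares violate this: 1 + 4 - 0 = 5, but 3**2 = 9.
print(1 ** 2 + 2 ** 2 - 0 ** 2, 3 ** 2)  # 5 9
```

The double sum is the part a network would have to reproduce through its nonlinearity, since no weighted sum of the bits alone contains the cross terms.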
asked 12 hours ago by Boris Burkov (edited 8 hours ago)
2 Answers
Neural networks are also called universal function approximators, a name that rests on the universal approximation theorem. It states:
"In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function."
This means that an ANN with a nonlinear activation function can approximate the function that relates the input to the output. The function $y = x^2$ can easily be approximated with a regression ANN.
You can find an excellent lesson here with a notebook example.
Also, because of this ability, an ANN can map complex relationships, for example between an image and its labels.
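As a rough illustration of what the theorem promises on a bounded interval, here is a minimal sketch in plain Python. It does not train a regression network as the answer suggests. Instead, it sets the weights of a one-hidden-layer ReLU network by hand so that the network interpolates $x^2$ piecewise linearly on $[-2, 2]$. The choice of ReLU, the interval, the number of hidden units, and the names `build_square_net` and `predict` are all assumptions made for the example.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def build_square_net(lo, hi, n_hidden):
    """Hand-set weights of a 1-hidden-layer ReLU net interpolating x**2 on [lo, hi]."""
    h = (hi - lo) / n_hidden
    # Hidden unit k computes relu(1 * x - knots[k]).
    knots = [lo + k * h for k in range(n_hidden)]
    # Slope of the chord of x**2 on [t, t + h] is t + (t + h).
    slopes = [t + (t + h) for t in knots]
    # Output weights are the slope changes from one segment to the next.
    out_w = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n_hidden)]
    out_b = lo * lo  # value of x**2 at the left end of the interval
    return knots, out_w, out_b


def predict(net, x):
    knots, out_w, out_b = net
    return out_b + sum(w * relu(x - t) for t, w in zip(knots, out_w))


net = build_square_net(-2.0, 2.0, n_hidden=40)
grid = [-2.0 + 4.0 * i / 1000 for i in range(1001)]
max_err = max(abs(predict(net, x) - x * x) for x in grid)
print(f"max |net(x) - x^2| on [-2, 2]: {max_err:.6f}")
```

With knot spacing $h$, the chord lies above the parabola by at most $h^2/4$, so adding hidden units shrinks the error on the interval as far as needed. A trained network would typically arrive at different weights.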
Thank you very much, this is exactly what I was asking for! – Boris Burkov, 12 hours ago
Although true, it is a very bad idea to learn that. I fail to see where any generalization power would arise from. NNs shine when there's something to generalize, like CNNs for vision, which capture patterns, or RNNs, which can capture trends. – Jeffrey, 10 hours ago
I think the answer of @ShubhamPanchal is a little bit misleading. Yes, it is true that by Cybenko's universal approximation theorem a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions, such as $f(x)=x^2$, on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.
But the main problem is that the theorem has a very important limitation: the function needs to be defined on compact subsets of $\mathbb{R}^n$ (compact subset = bounded + closed subset). Why is this problematic? When training the function approximator you will always have a finite data set. Hence, you will approximate the function inside a compact subset of $\mathbb{R}^n$. But we can always find a point $x$ outside that subset for which the approximation will probably fail. That being said, if you only want to approximate $f(x)=x^2$ on a compact subset of $\mathbb{R}$, then we can answer your question with yes. But if you want to approximate $f(x)=x^2$ for all $x \in \mathbb{R}$, then the answer is no (I exclude the trivial case in which you use a quadratic activation function).
Side remark on Taylor approximation: You always have to keep in mind that a Taylor approximation is only a local approximation. If you only want to approximate a function in a predefined region, then you should be able to use a Taylor series. But approximating $\sin(x)$ by the Taylor series evaluated at $x=0$ will give you horrible results for $x \to 10000$ if you don't use enough terms in your Taylor expansion.
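Both limitations can be checked numerically. The sketch below is only an illustration under assumptions of my own: a one-hidden-layer network with tanh units and arbitrary random weights (no training), and a 5-term Maclaurin polynomial for $\sin(x)$. The names `tanh_net` and `sin_taylor` are hypothetical. With a bounded activation such as tanh, the output can never exceed $|c| + \sum_i |v_i|$ in magnitude whatever the weights are, so it cannot follow $x^2$ for large $x$. A one-hidden-layer ReLU network is not bounded, but it is linear for large $|x|$, so its gap to $x^2$ also grows without bound.

```python
import math
import random


def tanh_net(x, hidden, out_bias):
    """One hidden layer of tanh units: c + sum_i v_i * tanh(w_i * x + b_i)."""
    return out_bias + sum(v * math.tanh(w * x + b) for w, b, v in hidden)


def sin_taylor(x, n_terms):
    """Maclaurin polynomial of sin(x), truncated to n_terms nonzero terms."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(n_terms)
    )


random.seed(0)  # arbitrary weights; the bound below holds for any choice
hidden = [
    (random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
    for _ in range(16)
]
out_bias = 0.5
bound = abs(out_bias) + sum(abs(v) for _, _, v in hidden)

# The network output stays inside [-bound, bound] while x**2 keeps growing.
for x in (1.0, 100.0, 10000.0):
    y = tanh_net(x, hidden, out_bias)
    print(f"x={x:>8}: |net(x)|={abs(y):.3f} <= {bound:.3f}, x^2={x * x:.0f}")

# The truncated Taylor series is accurate near 0 and useless far from it.
for x in (0.5, 10.0):
    err = abs(sin_taylor(x, 5) - math.sin(x))
    print(f"x={x}: 5-term Taylor error = {err:.3e}")
```

Adding Taylor terms extends the region where the polynomial is accurate, but for any fixed number of terms the error still blows up once $x$ is far enough from $0$.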
Nice catch! "compact set". – Esmailian, 8 hours ago
Many thanks, mate! Eye-opener! – Boris Burkov, 8 hours ago
@Esmailian: Thank you :). – MachineLearner, 8 hours ago
First answer: answered 12 hours ago by Shubham Panchal
Second answer: answered 8 hours ago by MachineLearner (edited 8 hours ago)