Linear regression minimising MAD in sklearn
The standard sklearn linear regression class finds an approximate linear relationship between variate and covariates that minimises the mean squared error (MSE). Specifically, let $N$ be the number of observations and, for simplicity, ignore the intercept. Let $y_j$ be the variate value of the $j$-th observation and $x_{1,j}, \dots, x_{n,j}$ be the values of the $n$ covariates of the $j$-th observation. The linear relationship is of the form
$$ y = \beta_1 x_1 + \dots + \beta_n x_n,$$
where the coefficients $\beta_1, \dots, \beta_n$ are given by
$$\beta_1, \dots, \beta_n = \underset{\tilde\beta_1, \dots, \tilde\beta_n}{\mathrm{argmin}} \left( \sum_{j = 1}^N \left( y_j - \tilde\beta_1 x_{1,j} - \dots - \tilde\beta_n x_{n,j}\right)^2 \right).$$
I now wish to find the coefficients that minimise the mean absolute deviation (MAD) instead of the mean squared error. Namely, I want the coefficients given by
$$\beta_1, \dots, \beta_n = \underset{\tilde\beta_1, \dots, \tilde\beta_n}{\mathrm{argmin}} \left( \sum_{j = 1}^N \left| y_j - \tilde\beta_1 x_{1,j} - \dots - \tilde\beta_n x_{n,j}\right| \right).$$
I understand that, in sharp contrast to the MSE case, the lack of differentiability of the absolute value function at $0$ means there is no analytic solution in the MAD case. But the latter is still a convex optimisation problem and, according to this answer, it can easily be solved by means of linear programming.
Is it possible to implement this linear regression in sklearn? What about using other statistics toolkits?
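(For concreteness, here is a minimal sketch of the linear-programming reformulation mentioned above, using `scipy.optimize.linprog`. The helper name `lad_fit` and the synthetic data are illustrative, not part of any library: one introduces a slack variable $t_j \ge |y_j - x_j^\top \beta|$ per observation and minimises $\sum_j t_j$.)

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least-absolute-deviations fit via linear programming.

    Variables are (beta, t): minimise sum_j t_j subject to
    -t_j <= y_j - X[j] @ beta <= t_j, with beta unbounded and t >= 0.
    """
    N, n = X.shape
    # Objective: 0 for the beta coefficients, 1 for each slack t_j.
    c = np.concatenate([np.zeros(n), np.ones(N)])
    # Encode  X beta - t <= y  and  -X beta - t <= -y.
    A_ub = np.block([[X, -np.eye(N)], [-X, -np.eye(N)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * n + [(0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# Noise-free toy data: the LAD optimum recovers the coefficients exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0])
beta = lad_fit(X, y)
```

With exact (noise-free) data the optimal objective is zero, so `beta` matches the true coefficients; with noisy data the fit is the median-type robust fit asked about above.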
Tags: regression, multiple-regression, scikit-learn
I just nominated this for reopening. Yes, the question is about how to perform a task in sklearn or Python in general. But it needs statistical expertise to understand or answer, which is explicitly on-topic.
– Stephan Kolassa
16 hours ago
@StephanKolassa I agree with you - the question should be reopened.
– James Phillips
16 hours ago
asked 19 hours ago by Giovanni De Gaetano (new contributor); edited 17 hours ago by Stephan Kolassa
1 Answer
The expected absolute deviation is minimized by the median of the distribution (Hanley, 2001, The American Statistician). Therefore, you are looking for a model that yields the conditional median instead of the conditional mean.
This is a special case of quantile regression, specifically for the 50% quantile. Roger Koenker is the main guru for quantile regression; see in particular his eponymous book.
There are ways to do quantile regression in Python; this tutorial may be helpful. If you are open to using R, you can use the quantreg package.
In Python it is available via statsmodels: statsmodels.org/dev/generated/…
– Tim♦
18 hours ago
Thanks! It is an easy way to look at the problem indeed...
– Giovanni De Gaetano
18 hours ago
answered 18 hours ago by Stephan Kolassa