APEI Generic Hardware Error
Over the past week my server (running Debian Jessie) has rebooted twice. In the syslog I see this before each reboot, and at no other points:
Aug 15 13:32:58 hoshimiya kernel: [296512.005355] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
Aug 15 13:32:58 hoshimiya kernel: [296512.005360] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Aug 15 13:32:58 hoshimiya kernel: [296512.005361] {1}[Hardware Error]: event severity: corrected
Aug 15 13:32:58 hoshimiya kernel: [296512.005362] {1}[Hardware Error]: Error 0, type: corrected
Aug 15 13:32:58 hoshimiya kernel: [296512.005363] {1}[Hardware Error]: fru_text: CorrectedErr
Aug 15 13:32:58 hoshimiya kernel: [296512.005364] {1}[Hardware Error]: section_type: memory error
Aug 15 13:32:58 hoshimiya kernel: [296512.005365] [Firmware Warn]: error section length is too small
Some googling leads me to believe that this is to do with my ECC RAM detecting and recovering from an error. Is this correct? If it's recovering, why does the system reboot? I'd like to prevent the system from rebooting, if at all possible.
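To confirm that these records really only show up right before the reboots, it can help to tally them mechanically. Below is a minimal sketch that counts APEI hardware error records by severity and section type. The regular expression is written against the log lines quoted above; the function name `summarize_hardware_errors` is made up for illustration, and other kernels or syslog daemons may format these lines differently.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the ones quoted in the question, e.g.
#   Aug 15 13:32:58 host kernel: [296512.005361] {1}[Hardware Error]: event severity: corrected
HW_ERROR = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] \{(?P<seq>\d+)\}\[Hardware Error\]: (?P<msg>.*)$"
)


def summarize_hardware_errors(lines):
    """Count hardware error sections by (event severity, section type)."""
    summary = Counter()
    severity = "unknown"
    for line in lines:
        match = HW_ERROR.search(line)
        if not match:
            continue  # not a "[Hardware Error]" record line
        msg = match.group("msg").strip()
        if msg.startswith("event severity:"):
            # Remember the severity until the next event header appears.
            severity = msg.split(":", 1)[1].strip()
        elif msg.startswith("section_type:"):
            section = msg.split(":", 1)[1].strip()
            summary[(severity, section)] += 1
    return summary


# Two of the lines from the question, used as sample input.
sample = [
    "Aug 15 13:32:58 hoshimiya kernel: [296512.005361] {1}[Hardware Error]: event severity: corrected",
    "Aug 15 13:32:58 hoshimiya kernel: [296512.005364] {1}[Hardware Error]: section_type: memory error",
]
for (sev, section), count in summarize_hardware_errors(sample).items():
    print(f"{count} x {sev} {section}")
```

To run it over a real log, pass an open file instead of `sample`, for example `summarize_hardware_errors(open("/var/log/syslog", errors="replace"))`; the path is an assumption and depends on how syslog is configured.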
hardware
edited Apr 27 '16 at 17:58
Anthon
asked Aug 15 '14 at 19:04
moujik
2 Answers
It looks like your RAM is failing, or at least producing errors that are being corrected. Depending on their severity, these errors may be affecting the system's ability to function, so that it has to reboot afterwards.
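One way to check whether memory errors are accumulating between reboots is to read the kernel's EDAC counters. The sketch below assumes an EDAC driver is loaded for your memory controller, in which case counters usually appear as `ce_count` (corrected) and `ue_count` (uncorrected) under `/sys/devices/system/edac/mc/mc*/`. On your machine the errors are reported through APEI, so these files may be absent or stay at zero; the helper name `read_edac_counts` is only for illustration.

```python
from pathlib import Path

# Usual sysfs location of EDAC memory controller data (assumes EDAC is in use).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} for each mc* directory."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or a different sysfs layout
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable; skip this controller
        counts[mc.name] = (ce, ue)
    return counts


for name, (ce, ue) in read_edac_counts().items():
    print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing on one controller would support the failing-module theory; a non-zero uncorrected count would be a more plausible reason for a reboot than a corrected error.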
Judging from this thread, the last line of your log, the one about the error section length being too small, is likely the culprit.
excerpt - [PATCH 1/1] efi: cper: Support different length of Error Section
Some fields might be added to the Error Section in the newer UEFI
spec. For example, the fields 'Reserved', 'Rank Number', 'Card Handle'
and 'Module Handle' are added to the Memory Error Section started from
UEFI spec 2.3. Unfortunately, there will have the following warning
message if the memory corrected error is detected and the field
'revision' in struct acpi_generic_data is less then 0x203 (UEFI spec
2.3):
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]: Error 0, type: corrected
{1}[Hardware Error]: section_type: memory error
[Firmware Warn]: error section length is too small
This behavior causes this corrected error cannot be displayed
correctly. To solve the issue, this patch supports different length of
the Error Section for different UEFI spec version.
And, this patch employs a pre-defined structure to clean up the
duplicated codes in function cper_estatus_print_section.
With applying this patch, the memory corrected error could be
displayed correctly after injecting the error.
Tested on v3.14-rc5 with Grantley platform and Intel RAStool.
So it would seem a patch for that particular error is in the works and might be available in a newer version of the kernel.
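To make the excerpt more concrete, here is a rough sketch of the length check it describes. This is not the kernel's code: the function names are hypothetical, and the section sizes (73 bytes before UEFI 2.3, 80 bytes from 2.3 on) are my own assumption about the memory error section layouts rather than something stated in the patch description. Only the `0x203` revision threshold comes from the excerpt.

```python
# Assumed sizes of the memory error section in bytes (not from the patch text).
MEM_ERR_LEN_OLD = 73  # layout before UEFI 2.3, without the added fields
MEM_ERR_LEN_NEW = 80  # layout from UEFI 2.3 on, with rank/handle fields

UEFI_2_3 = 0x203  # revision threshold quoted in the patch description


def expected_mem_err_len(revision):
    """Pick the section length that matches the reported spec revision."""
    return MEM_ERR_LEN_NEW if revision >= UEFI_2_3 else MEM_ERR_LEN_OLD


def section_too_small(revision, error_data_length, revision_aware=True):
    """Return True when the record would trigger the 'too small' warning.

    With revision_aware=False the check always demands the newer, longer
    layout, which is the behaviour the excerpt complains about.
    """
    if revision_aware:
        required = expected_mem_err_len(revision)
    else:
        required = MEM_ERR_LEN_NEW
    return error_data_length < required


# Firmware reporting an older revision with the shorter section:
print(section_too_small(0x202, 73, revision_aware=False))  # True
print(section_too_small(0x202, 73, revision_aware=True))   # False
```

If this reading is right, the warning says more about how the record was decoded than about the memory error itself, which was still reported as corrected.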
answered Aug 21 '14 at 13:54
slm
FYI, I had what appeared to be a very similar issue.
As it turned out, the solution was to take the memory out and reseat it, after which everything was back to normal.
answered Dec 5 '17 at 21:02
Darren Harrison