non-printable characters in man pages and logs



























I am seeing non-printable characters in a lot of system contexts, like man pages and logs. They show up as a highlighted question mark. For example, in one man page I have the following values:



E2 9F A8 ...email address... E2 9F A9


and the author name



F r C3 A9 d C3 A9 r i c k


where the numbers are hex values. I could not find these values in Unicode, so I am not sure what they are. When I cat them as escape sequences I get the following:



M-b M-^ M-^_ M-(  ... email address ... M-b M-^_ M-)


How can I get the output translated correctly? Note that my locale is set as:



LANG=en_US.UTF-8


The $TERM is "linux" and I am using a virtual console on the machine itself.



































  • This one seems to come up a lot. unix.stackexchange.com/questions/106421 , unix.stackexchange.com/questions/112018 , unix.stackexchange.com/questions/61293 …

    – JdeBP
    Jan 28 at 0:43











  • Those are U+27E8 and U+27E9 MATHEMATICAL LEFT/RIGHT ANGLE BRACKET and the name "Frédérick".

    – Michael Homer
    Jan 28 at 0:43






  • You probably need to say what your terminal emulator is and/or where else you're seeing the misrendering.

    – Michael Homer
    Jan 28 at 0:46











  • @MichaelHomer the $TERM is "linux"

    – Tyler Durden
    Jan 28 at 0:50











  • Do you mean that you're in the textual VT, or in something that emulates it?

    – Michael Homer
    Jan 28 at 0:52
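The identification given in the comments can be checked by decoding the hex bytes from the question as UTF-8. The following is a minimal sketch, assuming `iconv` and `od` are installed; the helper name `utf8_codepoints` is made up for illustration.

```shell
# Print the Unicode code point encoded by a run of UTF-8 bytes.
# Arguments are hex byte values such as "E2 9F A8" (taken from the question).
# utf8_codepoints is a hypothetical helper name, not a standard tool.
utf8_codepoints() {
    for byte in "$@"; do
        # POSIX printf only understands octal escapes, so convert hex to octal first.
        printf "\\$(printf '%03o' "0x$byte")"
    done | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n'
    echo
}

utf8_codepoints E2 9F A8   # prints 27e8 (U+27E8)
utf8_codepoints E2 9F A9   # prints 27e9 (U+27E9)
utf8_codepoints C3 A9      # prints 00e9 (U+00E9, é)
```

So the byte sequences are valid UTF-8; each one is a single character rather than three or two separate ones, which is why looking up the individual byte values in a Unicode table finds nothing useful.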
















arch-linux man














edited Jan 28 at 1:01









Jeff Schaller











asked Jan 28 at 0:27









Tyler Durden
























1 Answer














Basically, while your commenters are not explicitly stating it, your man page is being formatted with UTF-8 characters. Depending on your setup, this may be fixed in one of two ways:





  1. Assuming that you want American English in your man pages,



    LC_CTYPE=C man whatever you wanted to look up




should suppress that behavior. Judging by your bio, you might like that option. If LC_ALL is set in your environment, it takes precedence over LC_CTYPE, so set LC_ALL=C instead. Like any environment variable, you can also export it in your shell so that it applies to all of your commands.



Another locale that would disable this is en_US (without the .UTF-8 suffix). If you don't want American English, there are plenty of other locales; many of them come in a .UTF-8 variant and presumably don't use UTF-8 without that suffix.
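To see which setting actually decides the character encoding, it may help to spell out the precedence among the locale variables. This is a rough sketch of the POSIX lookup order (LC_ALL, then LC_CTYPE, then LANG); the function name `effective_ctype` is invented for this example, and `locale` itself remains the authoritative way to inspect a real session.

```shell
# Report the value that decides the character-type locale, following the
# POSIX precedence: LC_ALL first, then LC_CTYPE, then LANG, else "C".
# effective_ctype is a hypothetical helper, not a standard command.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        echo "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo "C"
    fi
}

# Subshells keep these assignments from leaking into the current session.
( LC_ALL=; LC_CTYPE=C; LANG=en_US.UTF-8; effective_ctype )   # prints C
( LC_ALL=en_US.UTF-8; LC_CTYPE=C; effective_ctype )          # prints en_US.UTF-8
```

The second call shows why LC_CTYPE=C appears to do nothing on a system where LC_ALL is already exported.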




  2. If your console font happens to have a Unicode mapping for the characters in question, you may be able to just type -r in your man pager, and your pager will show the right character. That said, this setting is not without risk: it allows any terminal control sequences in the document you're viewing to take effect. You should be safe doing it while viewing man pages, but if you are viewing files from untrustworthy sources or files with random data in them, it could lock up your console or worse. (Some terminal escape sequences allow a file you're displaying on the console to put characters in the keyboard buffer, so that they take effect as if you had typed them as soon as the keyboard buffer is read. I don't know whether the Linux virtual console supports any of these.)
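Rather than typing -r inside the pager each time, the same option can be passed when the pager starts. This is only a sketch, assuming the pager is `less` and that `man` honours the MANPAGER environment variable (as man-db does); the wrapper name `man_raw` is made up here, and the risk described above applies equally.

```shell
# Hypothetical wrapper: view a man page with less started in raw mode (-r),
# so control characters and multi-byte sequences are passed through unchanged.
# Assumes man reads the MANPAGER environment variable.
man_raw() {
    MANPAGER='less -r' man "$@"
}
```

Calling `man_raw ls`, for example, would open the ls page with raw output enabled for that one invocation only, leaving the normal `man` behavior untouched.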


I think this is something people notice more due to Linux distributions having switched to defaulting to UTF-8 turned on rather than off. Meanwhile, the Linux console still only supports 512 characters in a font, and that only if half of the 16 colors are disabled. And in other Unicode teething issues, X window fonts only support 64k characters, while Unicode has a lot more than 64k characters.
































  • "is becoming" and "switching" are possibly not the right tense, "having switched" and "became" being more appropriate. The tooling that M. Dickey talks about in xyr (somewhat more forward-thinking) answer at unix.stackexchange.com/a/284065/5132 has been around for over 20 years. Debian's kbd package started switching the KVTs into Unicode mode for UTF-8 locales back in 2006. Indeed, even the Q&A on this WWW site where this has already been covered is 5 years old, and several of the other similar ones are older.

    – JdeBP
    Jan 28 at 12:59











  • You're right. I was tempted to use that tense, but didn't want to sound like one of those people who said this was an old problem. One of the results of Linux being so stable and bloat-resistant is people can get by with old systems for a lot longer. I upgraded both my Linux desktop and laptop in 2018... from computers from 2005 and 2008. Due to network changes in moves, my old systems had been off-network for about five years before that, so hadn't been updated. I thought the OP might have experienced something similar. But it's more likely that it's just using the VT after so many years.

    – Ed Grimm
    Jan 28 at 15:33












edited Jan 28 at 15:20

























answered Jan 28 at 3:39









Ed Grimm













