Trying to upload specific characters in Python 3 using Windows Powershell












2















I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:



Here is languages.txt:



Afrikaans
አማርኛ
Аҧсшәа
العربية
Aragonés
Arpetan
Azərbaycanca
Bamanankan
বাংলা
Bân-lâm-gú
Беларуская
Български
Boarisch
Bosanski
Буряад
Català
Чӑвашла
Čeština
Cymraeg
Dansk
Deutsch
Eesti
Ελληνικά
Español
Esperanto
فارسی
Français
Frysk
Gaelg
Gàidhlig
Galego
한국어
Հայերեն
हिन्दी
Hrvatski
Ido
Interlingua
Italiano
עברית
ಕನ್ನಡ
Kapampangan
ქართული
Қазақша
Kreyòl ayisyen
Latgaļu
Latina
Latviešu
Lëtzebuergesch
Lietuvių
Magyar
Македонски
Malti
मराठी
მარგალური
مازِرونی
Bahasa Melayu
Монгол
Nederlands
नेपाल भाषा
日本語
Norsk bokmål
Nouormand
Occitan
Oʻzbekcha/ўзбекча
ਪੰਜਾਬੀ
پنجابی
پښتو
Plattdüütsch
Polski
Português
Română
Romani
Русский
Seeltersk
Shqip
Simple English
Slovenčina
کوردیی ناوەندی
Српски / srpski
Suomi
Svenska
Tagalog
தமிழ்
ภาษาไทย
Taqbaylit
Татарча/tatarça
తెలుగు
Тоҷикӣ
Türkçe
Українська
اردو
Tiếng Việt
Võro
文言
吴语
ייִדיש
中文


Then, here is the code I used:



import sys
script, input_encoding, error = sys.argv


def main(language_file, encoding, errors):
line = language_file.readline()

if line:
print_line(line, encoding, errors)
return main(language_file, encoding, errors)


def print_line(line, encoding, errors):
next_lang = line.strip()
raw_bytes = next_lang.encode(encoding, errors=errors)
cooked_string = raw_bytes.decode(encoding, errors=errors)

print(raw_bytes, "<===>", cooked_string)


languages = open("languages.txt", encoding="utf-8")

main(languages, input_encoding, error)


Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):



Output 1



Output 2



I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?










share|improve this question













migrated from superuser.com Jan 27 at 9:55


This question came from our site for computer enthusiasts and power users.



















  • Is languages.txt encoded as a UTF-8 file?

    – JakeGould
    Jan 19 at 4:29











  • Yes. languages.txt is encoded as a UTF-8 file.

    – CuriousEinstein
    Jan 19 at 13:00






  • 1





    It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

    – zvone
    Jan 27 at 10:08
















2















I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:



Here is languages.txt:



Afrikaans
አማርኛ
Аҧсшәа
العربية
Aragonés
Arpetan
Azərbaycanca
Bamanankan
বাংলা
Bân-lâm-gú
Беларуская
Български
Boarisch
Bosanski
Буряад
Català
Чӑвашла
Čeština
Cymraeg
Dansk
Deutsch
Eesti
Ελληνικά
Español
Esperanto
فارسی
Français
Frysk
Gaelg
Gàidhlig
Galego
한국어
Հայերեն
हिन्दी
Hrvatski
Ido
Interlingua
Italiano
עברית
ಕನ್ನಡ
Kapampangan
ქართული
Қазақша
Kreyòl ayisyen
Latgaļu
Latina
Latviešu
Lëtzebuergesch
Lietuvių
Magyar
Македонски
Malti
मराठी
მარგალური
مازِرونی
Bahasa Melayu
Монгол
Nederlands
नेपाल भाषा
日本語
Norsk bokmål
Nouormand
Occitan
Oʻzbekcha/ўзбекча
ਪੰਜਾਬੀ
پنجابی
پښتو
Plattdüütsch
Polski
Português
Română
Romani
Русский
Seeltersk
Shqip
Simple English
Slovenčina
کوردیی ناوەندی
Српски / srpski
Suomi
Svenska
Tagalog
தமிழ்
ภาษาไทย
Taqbaylit
Татарча/tatarça
తెలుగు
Тоҷикӣ
Türkçe
Українська
اردو
Tiếng Việt
Võro
文言
吴语
ייִדיש
中文


Then, here is the code I used:



import sys
script, input_encoding, error = sys.argv


def main(language_file, encoding, errors):
line = language_file.readline()

if line:
print_line(line, encoding, errors)
return main(language_file, encoding, errors)


def print_line(line, encoding, errors):
next_lang = line.strip()
raw_bytes = next_lang.encode(encoding, errors=errors)
cooked_string = raw_bytes.decode(encoding, errors=errors)

print(raw_bytes, "<===>", cooked_string)


languages = open("languages.txt", encoding="utf-8")

main(languages, input_encoding, error)


Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):



Output 1



Output 2



I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?










share|improve this question













migrated from superuser.com Jan 27 at 9:55


This question came from our site for computer enthusiasts and power users.



















  • Is languages.txt encoded as a UTF-8 file?

    – JakeGould
    Jan 19 at 4:29











  • Yes. languages.txt is encoded as a UTF-8 file.

    – CuriousEinstein
    Jan 19 at 13:00






  • 1





    It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

    – zvone
    Jan 27 at 10:08














2












2








2








I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:



Here is languages.txt:



Afrikaans
አማርኛ
Аҧсшәа
العربية
Aragonés
Arpetan
Azərbaycanca
Bamanankan
বাংলা
Bân-lâm-gú
Беларуская
Български
Boarisch
Bosanski
Буряад
Català
Чӑвашла
Čeština
Cymraeg
Dansk
Deutsch
Eesti
Ελληνικά
Español
Esperanto
فارسی
Français
Frysk
Gaelg
Gàidhlig
Galego
한국어
Հայերեն
हिन्दी
Hrvatski
Ido
Interlingua
Italiano
עברית
ಕನ್ನಡ
Kapampangan
ქართული
Қазақша
Kreyòl ayisyen
Latgaļu
Latina
Latviešu
Lëtzebuergesch
Lietuvių
Magyar
Македонски
Malti
मराठी
მარგალური
مازِرونی
Bahasa Melayu
Монгол
Nederlands
नेपाल भाषा
日本語
Norsk bokmål
Nouormand
Occitan
Oʻzbekcha/ўзбекча
ਪੰਜਾਬੀ
پنجابی
پښتو
Plattdüütsch
Polski
Português
Română
Romani
Русский
Seeltersk
Shqip
Simple English
Slovenčina
کوردیی ناوەندی
Српски / srpski
Suomi
Svenska
Tagalog
தமிழ்
ภาษาไทย
Taqbaylit
Татарча/tatarça
తెలుగు
Тоҷикӣ
Türkçe
Українська
اردو
Tiếng Việt
Võro
文言
吴语
ייִדיש
中文


Then, here is the code I used:



import sys
script, input_encoding, error = sys.argv


def main(language_file, encoding, errors):
line = language_file.readline()

if line:
print_line(line, encoding, errors)
return main(language_file, encoding, errors)


def print_line(line, encoding, errors):
next_lang = line.strip()
raw_bytes = next_lang.encode(encoding, errors=errors)
cooked_string = raw_bytes.decode(encoding, errors=errors)

print(raw_bytes, "<===>", cooked_string)


languages = open("languages.txt", encoding="utf-8")

main(languages, input_encoding, error)


Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):



Output 1



Output 2



I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?










share|improve this question














I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:



Here is languages.txt:



Afrikaans
አማርኛ
Аҧсшәа
العربية
Aragonés
Arpetan
Azərbaycanca
Bamanankan
বাংলা
Bân-lâm-gú
Беларуская
Български
Boarisch
Bosanski
Буряад
Català
Чӑвашла
Čeština
Cymraeg
Dansk
Deutsch
Eesti
Ελληνικά
Español
Esperanto
فارسی
Français
Frysk
Gaelg
Gàidhlig
Galego
한국어
Հայերեն
हिन्दी
Hrvatski
Ido
Interlingua
Italiano
עברית
ಕನ್ನಡ
Kapampangan
ქართული
Қазақша
Kreyòl ayisyen
Latgaļu
Latina
Latviešu
Lëtzebuergesch
Lietuvių
Magyar
Македонски
Malti
मराठी
მარგალური
مازِرونی
Bahasa Melayu
Монгол
Nederlands
नेपाल भाषा
日本語
Norsk bokmål
Nouormand
Occitan
Oʻzbekcha/ўзбекча
ਪੰਜਾਬੀ
پنجابی
پښتو
Plattdüütsch
Polski
Português
Română
Romani
Русский
Seeltersk
Shqip
Simple English
Slovenčina
کوردیی ناوەندی
Српски / srpski
Suomi
Svenska
Tagalog
தமிழ்
ภาษาไทย
Taqbaylit
Татарча/tatarça
తెలుగు
Тоҷикӣ
Türkçe
Українська
اردو
Tiếng Việt
Võro
文言
吴语
ייִדיש
中文


Then, here is the code I used:



import sys
script, input_encoding, error = sys.argv


def main(language_file, encoding, errors):
line = language_file.readline()

if line:
print_line(line, encoding, errors)
return main(language_file, encoding, errors)


def print_line(line, encoding, errors):
next_lang = line.strip()
raw_bytes = next_lang.encode(encoding, errors=errors)
cooked_string = raw_bytes.decode(encoding, errors=errors)

print(raw_bytes, "<===>", cooked_string)


languages = open("languages.txt", encoding="utf-8")

main(languages, input_encoding, error)


Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):



Output 1



Output 2



I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?







powershell python






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Jan 19 at 1:43









CuriousEinsteinCuriousEinstein

134




134




migrated from superuser.com Jan 27 at 9:55


This question came from our site for computer enthusiasts and power users.









migrated from superuser.com Jan 27 at 9:55


This question came from our site for computer enthusiasts and power users.















  • Is languages.txt encoded as a UTF-8 file?

    – JakeGould
    Jan 19 at 4:29











  • Yes. languages.txt is encoded as a UTF-8 file.

    – CuriousEinstein
    Jan 19 at 13:00






  • 1





    It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

    – zvone
    Jan 27 at 10:08



















  • Is languages.txt encoded as a UTF-8 file?

    – JakeGould
    Jan 19 at 4:29











  • Yes. languages.txt is encoded as a UTF-8 file.

    – CuriousEinstein
    Jan 19 at 13:00






  • 1





    It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

    – zvone
    Jan 27 at 10:08

















Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29





Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29













Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00





Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00




1




1





It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08





It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08












1 Answer
1






active

oldest

votes


















0














The first string which fails is አማርኛ. The first character, is in unicode 12A0 (see here). In UTF-8, that is b'xe1x8axa0'. So, that part is obviously fine. The file really is UTF-8.



Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.



The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.



If it is just for play, you should not worry about it. Consider it working correctly.





On the other hand, I would suggest changing some things in your code:




  1. You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.


for line in lines:
print_line(line, encoding, errors)



  1. You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.


with open("languages.txt", 'rb') as f:
raw_file_contents = f.read()





share|improve this answer

























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54386894%2ftrying-to-upload-specific-characters-in-python-3-using-windows-powershell%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    The first string which fails is አማርኛ. The first character, is in unicode 12A0 (see here). In UTF-8, that is b'xe1x8axa0'. So, that part is obviously fine. The file really is UTF-8.



    Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.



    The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.



    If it is just for play, you should not worry about it. Consider it working correctly.





    On the other hand, I would suggest changing some things in your code:




    1. You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.


    for line in lines:
    print_line(line, encoding, errors)



    1. You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.


    with open("languages.txt", 'rb') as f:
    raw_file_contents = f.read()





    share|improve this answer






























      0














      The first string which fails is አማርኛ. The first character, is in unicode 12A0 (see here). In UTF-8, that is b'xe1x8axa0'. So, that part is obviously fine. The file really is UTF-8.



      Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.



      The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.



      If it is just for play, you should not worry about it. Consider it working correctly.





      On the other hand, I would suggest changing some things in your code:




      1. You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.


      for line in lines:
      print_line(line, encoding, errors)



      1. You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.


      with open("languages.txt", 'rb') as f:
      raw_file_contents = f.read()





      share|improve this answer




























        0












        0








        0







        The first string which fails is አማርኛ. The first character, is in unicode 12A0 (see here). In UTF-8, that is b'xe1x8axa0'. So, that part is obviously fine. The file really is UTF-8.



        Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.



        The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.



        If it is just for play, you should not worry about it. Consider it working correctly.





        On the other hand, I would suggest changing some things in your code:




        1. You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.


        for line in lines:
        print_line(line, encoding, errors)



        1. You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.


        with open("languages.txt", 'rb') as f:
        raw_file_contents = f.read()





        share|improve this answer















        The first string which fails is አማርኛ. The first character, is in unicode 12A0 (see here). In UTF-8, that is b'xe1x8axa0'. So, that part is obviously fine. The file really is UTF-8.



        Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.



        The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.



        If it is just for play, you should not worry about it. Consider it working correctly.





        On the other hand, I would suggest changing some things in your code:




        1. You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.


        for line in lines:
        print_line(line, encoding, errors)



        1. You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.


        with open("languages.txt", 'rb') as f:
        raw_file_contents = f.read()






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Jan 30 at 17:57

























        answered Jan 27 at 10:33









        zvonezvone

        9,60412346




        9,60412346






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54386894%2ftrying-to-upload-specific-characters-in-python-3-using-windows-powershell%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to make a Squid Proxy server?

            Is this a new Fibonacci Identity?

            19世紀