Trying to upload specific characters in Python 3 using Windows Powershell

I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:

Here is languages.txt:

Afrikaans

አማርኛ

Аҧсшәа

العربية

Aragonés

Arpetan

Azərbaycanca

Bamanankan

বাংলা

Bân-lâm-gú

Беларуская

Български

Boarisch

Bosanski

Буряад

Català

Чӑвашла

Čeština

Cymraeg

Dansk

Deutsch

Eesti

Ελληνικά

Español

Esperanto

فارسی

Français

Frysk

Gaelg

Gàidhlig

Galego

한국어

Հայերեն

हिन्दी

Hrvatski

Ido

Interlingua

Italiano

עברית

ಕನ್ನಡ

Kapampangan

ქართული

Қазақша

Kreyòl ayisyen

Latgaļu

Latina

Latviešu

Lëtzebuergesch

Lietuvių

Magyar

Македонски

Malti

मराठी

მარგალური

مازِرونی

Bahasa Melayu

Монгол

Nederlands

नेपाल भाषा

日本語

Norsk bokmål

Nouormand

Occitan

Oʻzbekcha/ўзбекча

ਪੰਜਾਬੀ

پنجابی

پښتو

Plattdüütsch

Polski

Português

Română

Romani

Русский

Seeltersk

Shqip

Simple English

Slovenčina

کوردیی ناوەندی

Српски / srpski

Suomi

Svenska

Tagalog

தமிழ்

ภาษาไทย

Taqbaylit

Татарча/tatarça

తెలుగు

Тоҷикӣ

Türkçe

Українська

اردو

Tiếng Việt

Võro

文言

吴语

ייִדיש

中文

Then, here is the code I used:

import sys

script, input_encoding, error = sys.argv





def main(language_file, encoding, errors):

    line = language_file.readline()



    if line:

        print_line(line, encoding, errors)

        return main(language_file, encoding, errors)





def print_line(line, encoding, errors):

    next_lang = line.strip()

    raw_bytes = next_lang.encode(encoding, errors=errors)

    cooked_string = raw_bytes.decode(encoding, errors=errors)



    print(raw_bytes, "<===>", cooked_string)





languages = open("languages.txt", encoding="utf-8")



main(languages, input_encoding, error)

Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):

Output 1

Output 2

I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?

asked Jan 19 at 1:43

CuriousEinstein

134

migrated from superuser.com Jan 27 at 9:55

This question came from our site for computer enthusiasts and power users.

Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29

Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00

1

It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08

add a comment |

I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:

Here is languages.txt:

Afrikaans

አማርኛ

Аҧсшәа

العربية

Aragonés

Arpetan

Azərbaycanca

Bamanankan

বাংলা

Bân-lâm-gú

Беларуская

Български

Boarisch

Bosanski

Буряад

Català

Чӑвашла

Čeština

Cymraeg

Dansk

Deutsch

Eesti

Ελληνικά

Español

Esperanto

فارسی

Français

Frysk

Gaelg

Gàidhlig

Galego

한국어

Հայերեն

हिन्दी

Hrvatski

Ido

Interlingua

Italiano

עברית

ಕನ್ನಡ

Kapampangan

ქართული

Қазақша

Kreyòl ayisyen

Latgaļu

Latina

Latviešu

Lëtzebuergesch

Lietuvių

Magyar

Македонски

Malti

मराठी

მარგალური

مازِرونی

Bahasa Melayu

Монгол

Nederlands

नेपाल भाषा

日本語

Norsk bokmål

Nouormand

Occitan

Oʻzbekcha/ўзбекча

ਪੰਜਾਬੀ

پنجابی

پښتو

Plattdüütsch

Polski

Português

Română

Romani

Русский

Seeltersk

Shqip

Simple English

Slovenčina

کوردیی ناوەندی

Српски / srpski

Suomi

Svenska

Tagalog

தமிழ்

ภาษาไทย

Taqbaylit

Татарча/tatarça

తెలుగు

Тоҷикӣ

Türkçe

Українська

اردو

Tiếng Việt

Võro

文言

吴语

ייִדיש

中文

Then, here is the code I used:

import sys

script, input_encoding, error = sys.argv





def main(language_file, encoding, errors):

    line = language_file.readline()



    if line:

        print_line(line, encoding, errors)

        return main(language_file, encoding, errors)





def print_line(line, encoding, errors):

    next_lang = line.strip()

    raw_bytes = next_lang.encode(encoding, errors=errors)

    cooked_string = raw_bytes.decode(encoding, errors=errors)



    print(raw_bytes, "<===>", cooked_string)





languages = open("languages.txt", encoding="utf-8")



main(languages, input_encoding, error)

Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):

Output 1

Output 2

I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?

asked Jan 19 at 1:43

CuriousEinstein

134

migrated from superuser.com Jan 27 at 9:55

This question came from our site for computer enthusiasts and power users.

Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29

Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00

1

It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08

add a comment |

I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:

Here is languages.txt:

Afrikaans

አማርኛ

Аҧсшәа

العربية

Aragonés

Arpetan

Azərbaycanca

Bamanankan

বাংলা

Bân-lâm-gú

Беларуская

Български

Boarisch

Bosanski

Буряад

Català

Чӑвашла

Čeština

Cymraeg

Dansk

Deutsch

Eesti

Ελληνικά

Español

Esperanto

فارسی

Français

Frysk

Gaelg

Gàidhlig

Galego

한국어

Հայերեն

हिन्दी

Hrvatski

Ido

Interlingua

Italiano

עברית

ಕನ್ನಡ

Kapampangan

ქართული

Қазақша

Kreyòl ayisyen

Latgaļu

Latina

Latviešu

Lëtzebuergesch

Lietuvių

Magyar

Македонски

Malti

मराठी

მარგალური

مازِرونی

Bahasa Melayu

Монгол

Nederlands

नेपाल भाषा

日本語

Norsk bokmål

Nouormand

Occitan

Oʻzbekcha/ўзбекча

ਪੰਜਾਬੀ

پنجابی

پښتو

Plattdüütsch

Polski

Português

Română

Romani

Русский

Seeltersk

Shqip

Simple English

Slovenčina

کوردیی ناوەندی

Српски / srpski

Suomi

Svenska

Tagalog

தமிழ்

ภาษาไทย

Taqbaylit

Татарча/tatarça

తెలుగు

Тоҷикӣ

Türkçe

Українська

اردو

Tiếng Việt

Võro

文言

吴语

ייִדיש

中文

Then, here is the code I used:

import sys

script, input_encoding, error = sys.argv





def main(language_file, encoding, errors):

    line = language_file.readline()



    if line:

        print_line(line, encoding, errors)

        return main(language_file, encoding, errors)





def print_line(line, encoding, errors):

    next_lang = line.strip()

    raw_bytes = next_lang.encode(encoding, errors=errors)

    cooked_string = raw_bytes.decode(encoding, errors=errors)



    print(raw_bytes, "<===>", cooked_string)





languages = open("languages.txt", encoding="utf-8")



main(languages, input_encoding, error)

Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):

Output 1

Output 2

I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?

asked Jan 19 at 1:43

CuriousEinstein

134

I'm running this code in Windows Powershell and it includes this file called languages.txt where I'm trying to convert between bytes to strings:

Here is languages.txt:

Afrikaans

አማርኛ

Аҧсшәа

العربية

Aragonés

Arpetan

Azərbaycanca

Bamanankan

বাংলা

Bân-lâm-gú

Беларуская

Български

Boarisch

Bosanski

Буряад

Català

Чӑвашла

Čeština

Cymraeg

Dansk

Deutsch

Eesti

Ελληνικά

Español

Esperanto

فارسی

Français

Frysk

Gaelg

Gàidhlig

Galego

한국어

Հայերեն

हिन्दी

Hrvatski

Ido

Interlingua

Italiano

עברית

ಕನ್ನಡ

Kapampangan

ქართული

Қазақша

Kreyòl ayisyen

Latgaļu

Latina

Latviešu

Lëtzebuergesch

Lietuvių

Magyar

Македонски

Malti

मराठी

მარგალური

مازِرونی

Bahasa Melayu

Монгол

Nederlands

नेपाल भाषा

日本語

Norsk bokmål

Nouormand

Occitan

Oʻzbekcha/ўзбекча

ਪੰਜਾਬੀ

پنجابی

پښتو

Plattdüütsch

Polski

Português

Română

Romani

Русский

Seeltersk

Shqip

Simple English

Slovenčina

کوردیی ناوەندی

Српски / srpski

Suomi

Svenska

Tagalog

தமிழ்

ภาษาไทย

Taqbaylit

Татарча/tatarça

తెలుగు

Тоҷикӣ

Türkçe

Українська

اردو

Tiếng Việt

Võro

文言

吴语

ייִדיש

中文

Then, here is the code I used:

import sys

script, input_encoding, error = sys.argv





def main(language_file, encoding, errors):

    line = language_file.readline()



    if line:

        print_line(line, encoding, errors)

        return main(language_file, encoding, errors)





def print_line(line, encoding, errors):

    next_lang = line.strip()

    raw_bytes = next_lang.encode(encoding, errors=errors)

    cooked_string = raw_bytes.decode(encoding, errors=errors)



    print(raw_bytes, "<===>", cooked_string)





languages = open("languages.txt", encoding="utf-8")



main(languages, input_encoding, error)

Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):

Output 1

Output 2

I don't know why it doesn't upload the characters and shows question blocks instead. Can anyone help me?

powershell python

asked Jan 19 at 1:43

CuriousEinstein

134

asked Jan 19 at 1:43

CuriousEinstein

134

asked Jan 19 at 1:43

CuriousEinstein

134

asked Jan 19 at 1:43

CuriousEinstein

134

asked Jan 19 at 1:43

CuriousEinstein

134

migrated from superuser.com Jan 27 at 9:55

This question came from our site for computer enthusiasts and power users.

migrated from superuser.com Jan 27 at 9:55

This question came from our site for computer enthusiasts and power users.

Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29

Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00

1

It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08

add a comment |

Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29

Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00

1

It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08

Is languages.txt encoded as a UTF-8 file?

– JakeGould
Jan 19 at 4:29

Yes. languages.txt is encoded as a UTF-8 file.

– CuriousEinstein
Jan 19 at 13:00

It seems that the powershell is unable to print the characters. Perhaps it uses a font which does not support them.

– zvone
Jan 27 at 10:08

add a comment |

1 Answer
1

active

oldest

votes

The first string which fails is አማርኛ. The first character, አ is in unicode 12A0 (see here). In UTF-8, that is b'xe1x8axa0'. So, that part is obviously fine. The file really is UTF-8.

Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.

The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.

If it is just for play, you should not worry about it. Consider it working correctly.

On the other hand, I would suggest changing some things in your code:

You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.

for line in lines:

    print_line(line, encoding, errors)

You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.

with open("languages.txt", 'rb') as f:

    raw_file_contents = f.read()

edited Jan 30 at 17:57

answered Jan 27 at 10:33

zvone

9,60412346

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54386894%2ftrying-to-upload-specific-characters-in-python-3-using-windows-powershell%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.

The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.

If it is just for play, you should not worry about it. Consider it working correctly.

On the other hand, I would suggest changing some things in your code:

You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.

for line in lines:

    print_line(line, encoding, errors)

You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.

with open("languages.txt", 'rb') as f:

    raw_file_contents = f.read()

edited Jan 30 at 17:57

answered Jan 27 at 10:33

zvone

9,60412346

add a comment |

Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.

The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.

If it is just for play, you should not worry about it. Consider it working correctly.

On the other hand, I would suggest changing some things in your code:

You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.

for line in lines:

    print_line(line, encoding, errors)

You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.

with open("languages.txt", 'rb') as f:

    raw_file_contents = f.read()

edited Jan 30 at 17:57

answered Jan 27 at 10:33

zvone

9,60412346

add a comment |

Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.

The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.

If it is just for play, you should not worry about it. Consider it working correctly.

On the other hand, I would suggest changing some things in your code:

You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.

for line in lines:

    print_line(line, encoding, errors)

You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.

with open("languages.txt", 'rb') as f:

    raw_file_contents = f.read()

edited Jan 30 at 17:57

answered Jan 27 at 10:33

zvone

9,60412346

Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.

The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.

If it is just for play, you should not worry about it. Consider it working correctly.

On the other hand, I would suggest changing some things in your code:

You are running main recursively for each line. There is absolutely no need for that and it would run into recursion depth limit on a longer file. User a for loop instead.

for line in lines:

    print_line(line, encoding, errors)

You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode, then you encode it back into row_bytes and then encode again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.

with open("languages.txt", 'rb') as f:

    raw_file_contents = f.read()

edited Jan 30 at 17:57

answered Jan 27 at 10:33

zvone

9,60412346

edited Jan 30 at 17:57

answered Jan 27 at 10:33

zvone

9,60412346

answered Jan 27 at 10:33

zvone

9,60412346

answered Jan 27 at 10:33

zvone

9,60412346

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ytdyklly