How to save a web resource while decoding the URL-encoded characters in its name?
I would like to download the file at this URL, "http://pti.regione.sicilia.it/portal/page/portal/PIR_PORTALE/PIR_LaStrutturaRegionale/PIR_AssessoratoEconomia/PIR_DipBilancioTesoro/PIR_Areetematiche/PIR_ServizioStatistica/PIR_1839271.4501140784/PIR_idatidellaSicilia/spesa%20del%20settore%20sanit%E0.csv", and save it under its source name, but with the URL-encoded characters in that name decoded.
The name in the URL is spesa%20del%20settore%20sanit%E0.csv, and I would like to convert it programmatically to spesa del settore sanità.csv. In the source name, %20 is a space and %E0 is à.
How can I do this name conversion?
I could build a search-and-replace list starting from this table, but I imagine there is a utility or a library that could do it for me.
However, I cannot find a proper way to do it using just wget or curl.
Thank you
filenames curl wget url
asked Feb 1 at 11:04
aborruso
2 Answers
If your Wget is built with IRI support, then it will handle this case automatically. Take a look at the output of wget --version. Mine shows this:
GNU Wget 1.20.1.7-5dce-dirty built on linux-gnu.
-cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
+ntlm +opie +psl +ssl/gnutls
The important part for you here is the +iri. Most distributions should compile it with IRI enabled by default.
EDIT:
It seems that the server in this case sends the filename encoded in Latin-1, while the default assumption is always UTF-8. Ideally, the server should send a Content-Disposition header to state this. It can, however, be handled by Wget if you pass it the --remote-encoding=latin1 option.
edited Feb 1 at 16:45
answered Feb 1 at 12:57
darnir
Hi darnir, I have -cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls +ntlm +opie +psl +ssl/gnutls, so I should have +iri support. But if I run wget myURL I get ‘spesa del settore sanit\340.csv’ saved [859/859] and not ‘spesa del settore sanità.csv’ saved [859/859]
– aborruso
Feb 1 at 13:29
Ah yes. As user @JdeBP mentioned, the name is not valid UTF-8, which causes this problem, and the server doesn't send a header telling the client to interpret it as Latin-1 instead.
– darnir
Feb 1 at 16:38
Hi darnir, but how do I read the server's reply headers? If I run curl HEAD -I myURL, I get no encoding-related information back. Thank you
– aborruso
Feb 1 at 17:12
You don't need to. With Wget you can use -S to print the server response, but as mentioned in my answer, Wget would deal with it automatically if the Content-Disposition header were sent with the name and encoding.
– darnir
Feb 1 at 18:16
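To make the point in the comments concrete (the name is Latin-1, not valid UTF-8), here is a small sketch that prints the raw bytes of à in each encoding. It is an illustration only, not part of the answer; the variable names are hypothetical, and it assumes a POSIX shell with od and tr available.

```shell
#!/bin/sh
# The character à written as raw bytes, using octal escapes so the
# script itself stays pure ASCII.
# Latin-1 (ISO 8859-1): a single byte, 0xE0.
latin1_bytes=$(printf '\340' | od -An -tx1 | tr -d ' \n')
# UTF-8: two bytes, 0xC3 0xA0.
utf8_bytes=$(printf '\303\240' | od -An -tx1 | tr -d ' \n')

echo "Latin-1 bytes: $latin1_bytes (percent-encoded as %E0)"
echo "UTF-8 bytes:   $utf8_bytes (percent-encoded as %C3%A0)"
```

A URL whose name had been encoded from UTF-8 would therefore contain sanit%C3%A0.csv. The lone %E0 in the question's URL is what tells us the server used Latin-1.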
More generally than just wget:
The unvis tool does this, with the -h option to specify percent encoding. (Note that the OpenBSD and macOS versions of the tool do not have this option.)
Note that your percent-encoded name is not in UTF-8.
% printf '%s' 'spesa%20del%20settore%20sanit%E0.csv' | unvis -h | hexdump -C
00000000  73 70 65 73 61 20 64 65  6c 20 73 65 74 74 6f 72  |spesa del settor|
00000010  65 20 73 61 6e 69 74 e0  2e 63 73 76              |e sanit..csv|
0000001c
% printf '%s\n' 'spesa%20del%20settore%20sanit%E0.csv' | unvis -h | iconv -f latin1
spesa del settore sanità.csv
%
Further reading
unvis. FreeBSD General Commands Manual. 2010-11-27.
unvis. OpenBSD General Commands Manual. 2013-08-12.
answered Feb 1 at 13:56
JdeBP
Thank you very much @JdeBP. I cannot find a way to install unvis. It is not available on my Debian through apt-get.
– aborruso
Feb 1 at 17:18
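Where unvis is not packaged, as the comment above reports for Debian, the percent-decoding step can also be done in plain shell. The following is a minimal sketch and not part of either answer. The function name urldecode and the example.org URL are hypothetical. It assumes a POSIX shell with printf and iconv available, and it assumes the name is Latin-1, as the answers conclude.

```shell
#!/bin/sh
# urldecode: hypothetical helper, not part of wget, curl, or unvis.
# Writes its argument to stdout with every %XX sequence replaced by the
# raw byte it encodes; all other characters pass through unchanged.
urldecode() {
    s=$1
    while [ -n "$s" ]; do
        case $s in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                rest=${s#???}        # everything after the %XX
                hex=${s%"$rest"}     # the %XX itself
                hex=${hex#%}         # just the two hex digits
                # Convert hex to octal, then emit the byte via %b's \0ddd escape.
                printf '%b' "\\0$(printf '%o' "0x$hex")"
                ;;
            *)
                rest=${s#?}                  # everything after the first character
                printf '%s' "${s%"$rest"}"   # the first character, literally
                ;;
        esac
        s=$rest
    done
}

# Hypothetical URL for illustration; only the last path component matters.
url='http://example.org/data/spesa%20del%20settore%20sanit%E0.csv'
encoded_name=${url##*/}

# The decoded bytes are Latin-1 here, so convert them for a UTF-8 system.
decoded_name=$(urldecode "$encoded_name" | iconv -f latin1 -t utf-8)
printf '%s\n' "$decoded_name"
```

The decoded name could then be used as the output file, for example with curl -o "$decoded_name" "$url" or wget -O "$decoded_name" "$url". The loop processes one character at a time, which is slow for long strings but keeps the sketch free of shell-specific extensions.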