How to save a web resource, decoding the URL-encoded characters in its name?

I would like to save the file at this URL "http://pti.regione.sicilia.it/portal/page/portal/PIR_PORTALE/PIR_LaStrutturaRegionale/PIR_AssessoratoEconomia/PIR_DipBilancioTesoro/PIR_Areetematiche/PIR_ServizioStatistica/PIR_1839271.4501140784/PIR_idatidellaSicilia/spesa%20del%20settore%20sanit%E0.csv" under its source name, but with the URL-encoded (percent-encoded) characters in that name decoded.



The name in the URL is spesa%20del%20settore%20sanit%E0.csv, and I would like to convert it programmatically to spesa del settore sanità.csv. In the source name, %20 is a space and %E0 is à.
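As a quick illustration of the byte values involved (a sketch, assuming bash, iconv and od are available and the script itself is saved as UTF-8): in Latin-1 (ISO-8859-1) à is the single byte 0xE0, while UTF-8 encodes it as the two bytes 0xC3 0xA0. A bare %E0 in a URL therefore stands for a Latin-1 byte, not a UTF-8 character.

```bash
#!/usr/bin/env bash
# Show the bytes behind "à" in two encodings.
# Assumes this file is saved as UTF-8 and that iconv and od are installed.

# UTF-8: two bytes (c3 a0)
printf 'à' | od -An -tx1

# Latin-1 / ISO-8859-1: one byte (e0), the byte that %E0 in the URL denotes
printf 'à' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
```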



How can I do this name conversion?



I could build a search-and-replace list from this table, but I imagine there is a utility or library that could do it for me.
However, I cannot find a proper way to do it with plain wget or curl.



Thank you
      Tags: filenames, curl, wget, url
      asked Feb 1 at 11:04 by aborruso

          2 Answers
          If your Wget is built with IRI support, then it will handle this case automatically.



          Take a look at your wget --version. Mine shows this:



          GNU Wget 1.20.1.7-5dce-dirty built on linux-gnu.

          -cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
          +ntlm +opie +psl +ssl/gnutls


          The important part for you here is the +iri. Most distributions should compile it with IRI enabled by default.



          EDIT:
          It seems the server in this case encodes the file name in Latin-1, while Wget's default assumption is UTF-8. Ideally, the server should send a Content-Disposition header stating this. However, Wget can handle it if you pass it the --remote-encoding=latin1 option.
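A sketch of the invocation the EDIT describes, wrapped in a hypothetical helper named fetch_latin1_named. It assumes a Wget built with +iri; whether the saved name comes out as sanità.csv may also depend on the local encoding Wget uses (your locale, or --local-encoding), so treat it as something to try rather than a guaranteed fix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask Wget (built with +iri) to treat the percent-encoded
# file name in the URL as Latin-1 rather than UTF-8.
fetch_latin1_named() {
    local url=$1
    wget --remote-encoding=latin1 "$url"
}

# Usage (needs network access):
#   fetch_latin1_named 'http://pti.regione.sicilia.it/portal/page/portal/PIR_PORTALE/PIR_LaStrutturaRegionale/PIR_AssessoratoEconomia/PIR_DipBilancioTesoro/PIR_Areetematiche/PIR_ServizioStatistica/PIR_1839271.4501140784/PIR_idatidellaSicilia/spesa%20del%20settore%20sanit%E0.csv'
```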
          answered Feb 1 at 12:57 by darnir (edited Feb 1 at 16:45)

          • Hi darnir, I have -cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls +ntlm +opie +psl +ssl/gnutls, so I should have +iri support. But if I run wget myURL, I get ‘spesa del settore sanit\340.csv’ saved [859/859] instead of ‘spesa del settore sanità.csv’ saved [859/859].

            – aborruso
            Feb 1 at 13:29






          • Aah yes. As user @JdeBP mentioned, the name is not valid UTF-8, which causes this problem, and the server doesn't send a header telling the client to interpret it as Latin-1 instead.

            – darnir
            Feb 1 at 16:38











          • Hi darnir, but how can I read the server's reply headers? If I run curl -I myURL, I get nothing back about the encoding. Thank you

            – aborruso
            Feb 1 at 17:12






          • You don't need to. With Wget you can use -S to print the server's reply but, as mentioned in my answer, Wget would deal with this automatically if the server sent a Content-Disposition header giving the name and its encoding.

            – darnir
            Feb 1 at 18:16
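To check what the server actually sends, both tools can print the response headers. A sketch, using a hypothetical helper named show_headers. One caveat worth checking against your Wget manual: Wget only acts on a Content-Disposition header when the --content-disposition option (off by default) is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the HTTP response headers for a URL.
show_headers() {
    local url=$1
    # curl: -s hides the progress meter, -I sends a HEAD request and prints only the headers.
    curl -sI "$url"
}

# Usage (needs network access); look for Content-Disposition / Content-Type:
#   show_headers "$url" | grep -iE '^(content-disposition|content-type):'
# Wget equivalent: -S prints the server response (on stderr), --spider skips the download:
#   wget -S --spider "$url"
```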
          More generally than just wget:



          The unvis tool does this; its -h option selects percent (URL) decoding. (Note that the OpenBSD and macOS versions of the tool lack this option.)



          Notice that your percent-encoded name is not in UTF-8.




          % printf '%s' 'spesa%20del%20settore%20sanit%E0.csv' | unvis -h | hexdump -C
          00000000 73 70 65 73 61 20 64 65 6c 20 73 65 74 74 6f 72 |spesa del settor|
          00000010 65 20 73 61 6e 69 74 e0 2e 63 73 76 |e sanit..csv|
          0000001c
          % printf '%s\n' 'spesa%20del%20settore%20sanit%E0.csv' | unvis -h | iconv -f latin1
          spesa del settore sanità.csv
          %


          Further reading





          • unvis. FreeBSD General Commands Manual. 2010-11-27.


          • unvis. OpenBSD General Commands Manual. 2013-08-12.
          answered Feb 1 at 13:56 by JdeBP

          • Thank you very much, @JdeBP. I cannot find a way to install unvis; it isn't available through apt-get on my Debian system.

            – aborruso
            Feb 1 at 17:18
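If unvis is not packaged for your system, the same decoding can be approximated in bash alone. A minimal sketch, assuming bash (its printf %b understands \xHH escapes, which some other shells' printf does not), sed and iconv, and assuming, as JdeBP showed, that the decoded bytes are Latin-1. The function names urldecode_latin1 and fetch_decoded are hypothetical. Incidentally, the 340 in the name Wget produced is simply 0xE0 written in octal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-decode a string, then convert the resulting
# bytes from Latin-1 to UTF-8. Assumes bash, sed and iconv.
urldecode_latin1() {
    local encoded=$1 escaped
    # Turn each %XX into a \xXX escape that printf %b understands.
    escaped=$(printf '%s' "$encoded" | sed 's/%\([0-9A-Fa-f][0-9A-Fa-f]\)/\\x\1/g')
    # Expand the escapes into raw bytes, then re-encode them as UTF-8.
    printf '%b' "$escaped" | iconv -f ISO-8859-1 -t UTF-8
}

# Hypothetical helper: download a URL and save it under its decoded name.
fetch_decoded() {
    local url=$1 name
    name=$(urldecode_latin1 "${url##*/}")   # last path segment, decoded
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -o: output file
    curl -fsS -o "$name" "$url"
}
```

For example, urldecode_latin1 'spesa%20del%20settore%20sanit%E0.csv' prints spesa del settore sanità.csv. If a name turns out to be UTF-8 rather than Latin-1, drop the iconv step. A % not followed by two hex digits is left untouched, but a literal backslash in the encoded name would be misread as an escape.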