Removing duplicate entries and replacing it with comma | Bash













I have a file that contains IP addresses and port numbers, one pair per line, in this format:

ipaddress:port

1.1.1.1:21
1.1.1.1:22
2.2.2.2:443
3.3.3.3:80
3.3.3.3:443

I need the result in the following format, with all ports for an IP address joined by commas on a single line:

ipaddress:port,port

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443


      text-processing awk sed







edited Feb 1 at 15:44 by Jeff Schaller
asked Feb 1 at 13:33 by user334662

          5 Answers

Assuming there are no trailing spaces on the lines in the input file:

$ awk -F ':' 'BEGIN { OFS=FS } $1 in ports { ports[$1] = ports[$1] "," $2; next } { ports[$1] = $2 } END { for (ip in ports) print ip, ports[ip] }' file
3.3.3.3:80,443
1.1.1.1:21,22
2.2.2.2:443

The awk script,

BEGIN       { OFS=FS }
$1 in ports { ports[$1] = ports[$1] "," $2; next }
            { ports[$1] = $2 }
END         { for (ip in ports) print ip, ports[ip] }

first sets the output field separator to the same value as the input field separator, a : character (given on the command line with -F ':'). For each line, it then tests whether the first field (the IP address) is already a key in the ports array. If it is, the port number (the second field) is appended to that array entry, with a comma as delimiter. If it is not, the entry is simply set to the port number for that IP address.

At the end, all stored IP addresses are printed with their collected port numbers.
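Because awk's for (ip in ports) loop visits keys in an unspecified order, the output above does not follow the input order. If first-seen order matters, one option is to record each new IP in a second, numerically indexed array. The following is a minimal sketch along those lines, not part of the original answer; the function name merge_ports and the order array are illustrative assumptions.

```shell
# Hypothetical helper: reads "ip:port" lines on stdin and prints one
# "ip:port,port,..." line per IP, in the order the IPs first appeared.
merge_ports() {
    awk -F ':' '
        # First time this IP is seen: remember its position and its port.
        !($1 in ports) { order[++n] = $1; ports[$1] = $2; next }
        # IP already known: append the port to its list.
        { ports[$1] = ports[$1] "," $2 }
        # Print in first-seen order instead of hash order.
        END { for (i = 1; i <= n; i++) print order[i] ":" ports[order[i]] }
    '
}

printf '%s\n' 1.1.1.1:21 1.1.1.1:22 2.2.2.2:443 3.3.3.3:80 3.3.3.3:443 | merge_ports
# 1.1.1.1:21,22
# 2.2.2.2:443
# 3.3.3.3:80,443
```

Unlike the sed approach further down, this does not require lines for the same IP to be adjacent, at the cost of holding every IP in memory until the end.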







answered Feb 1 at 13:48 by Kusalananda

• Thank you so much, it worked :) – user334662, Feb 1 at 13:53




















With GNU Datamash:

datamash -t: -s groupby 1 collapse 2 < file

If your data are already sorted, you can omit the -s.
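For comparison, when lines with the same IP are already adjacent, the grouping can be done in a single streaming pass that only remembers the current group. The sketch below shows that idea in awk; it is not a description of how datamash is implemented, and collapse_sorted is a hypothetical name. The sort call is only there to make the demo input grouped, and it also orders the ports numerically within each IP.

```shell
# Hypothetical helper: expects "ip:port" lines on stdin with equal IPs
# on adjacent lines, and prints "ip:port,port,..." per group.
collapse_sorted() {
    awk -F ':' '
        # Same key as the previous line: extend the current group.
        NR > 1 && $1 == key { list = list "," $2; next }
        # Key changed: flush the finished group.
        NR > 1 { print key ":" list }
        # Start a new group with this line.
        { key = $1; list = $2 }
        # Flush the last group, if there was any input at all.
        END { if (NR > 0) print key ":" list }
    '
}

printf '%s\n' 3.3.3.3:443 1.1.1.1:22 3.3.3.3:80 1.1.1.1:21 2.2.2.2:443 |
    sort -t: -k1,1 -k2,2n | collapse_sorted
# 1.1.1.1:21,22
# 2.2.2.2:443
# 3.3.3.3:80,443
```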





Or using an anonymous array inside a hash in Perl:

$ perl -F: -lane '
push @{ $h{$F[0]} }, $F[1]
}{
for $k (sort keys %h) {print "$k:", join ",", @{ $h{$k}} }
' file
1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443






answered Feb 1 at 13:53 by steeldriver, edited Feb 1 at 14:03

• Thanks, this also worked for me :) – user334662, Feb 1 at 13:56




















Using Miller (http://johnkerl.org/miller/doc):

mlr --nidx --fs ':' nest --implode --values --across-records --nested-fs "," -f 2 input

This gives:

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443






answered Feb 1 at 14:33 by aborruso


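A loop over the unique IP addresses also works. For each address, it extracts the matching ports and joins them with commas:

for i in $(awk -F ':' '{print $1}' filename | sort -u); do printf '%s %s\n' "$i" "$(awk -F ':' -v i="$i" '$1 == i {print $2}' filename | paste -sd, -)"; done

Output:

1.1.1.1 21,22
2.2.2.2 443
3.3.3.3 80,443

Note that this prints a space rather than a colon between the address and its ports; change the printf format to '%s:%s\n' to match the requested output.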






answered Feb 1 at 18:40 by Praveen Kumar BS


You can do this with sed. We keep two lines in the pattern space at any time and look for a change in the IP address. As long as the next line has the same IP, we strip the IP from the second line and join its port to the first line with a comma. Otherwise an IP change has been detected: we print the first line only, remove it from the pattern space, then read the next line into the pattern space and repeat the same checks. This relies on lines for the same IP being adjacent, as they are in the sample input.

$ sed -e '
:loop
$!N
s/^\(\([^:]*:\).*[^[:space:]]\).*\n\2/\1,/
tloop
P;D
' input-file.txt

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443

$ perl -lne '
my($ip, $port) = /(\H+):(\H+)/;
push @seen, $ip if ! exists $h{$ip};
push @{$h{$ip}}, $port;}{
print $_, ":", join ",", @{$h{$_}} for @seen;
' input-file.txt

With Perl we can do the same by means of a hash that has the IPs as its keys and array refs holding the ports as its values. The regex captures with \H+ (one or more characters that are not horizontal whitespace), so trailing blanks are not included. The array @seen records the IPs in the order they were first seen.
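The two-line sed merge only joins neighbouring lines, so an input where the same IP appears in separate places would produce more than one line for that IP. If grouping is not guaranteed, one option is to sort by IP first. The following is a sketch under that assumption; merge_adjacent is a hypothetical wrapper name, and the extra -k2,2n key (ports in numeric order within each IP) is a choice made here, not something the original answer specifies.

```shell
# Hypothetical wrapper around the two-line sed merge: joins the ports of
# adjacent "ip:port" lines that share the same IP.
merge_adjacent() {
    sed -e ':loop
$!N
s/^\(\([^:]*:\).*[^[:space:]]\).*\n\2/\1,/
tloop
P;D'
}

# Group equal IPs together before merging.
printf '%s\n' 3.3.3.3:80 1.1.1.1:21 3.3.3.3:443 1.1.1.1:22 |
    sort -t: -k1,1 -k2,2n | merge_adjacent
# 1.1.1.1:21,22
# 3.3.3.3:80,443
```

Sorting changes the order of the IPs in the output, so this fits cases where grouped output matters more than preserving the original line order.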







answered Feb 2 at 6:01 by Rakesh Sharma, edited Feb 2 at 7:33
