Removing duplicate entries and replacing it with comma


I have a file that contains IP addresses and port numbers, one pair per line, in this format:

ipaddress:port

1.1.1.1:21
1.1.1.1:22
2.2.2.2:443
3.3.3.3:80
3.3.3.3:443

I need the result in the format below, with all ports for the same IP address joined by commas:

ipaddress:port,port

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443

edited Feb 1 at 15:44

Jeff Schaller


asked Feb 1 at 13:33

user334662

text-processing awk sed


5 Answers

Assuming there are no trailing spaces on the lines in the input file:

$ awk -F ':' 'BEGIN { OFS=FS } $1 in ports { ports[$1] = ports[$1] "," $2; next } { ports[$1] = $2 } END { for (ip in ports) print ip, ports[ip] }' file

3.3.3.3:80,443
1.1.1.1:21,22
2.2.2.2:443

The awk script,

BEGIN       { OFS=FS }
$1 in ports { ports[$1] = ports[$1] "," $2; next }
            { ports[$1] = $2 }
END         { for (ip in ports) print ip, ports[ip] }

first sets the output field separator to the input field separator, a : character (given on the command line with -F ':'). For each line, it then tests whether the first field (the IP address) is already a key in the ports array. If it is, the port number (the second field) is appended to that array entry with a comma as the delimiter. If it is not, the entry is simply set to the port number for that IP address.

At the end, all stored IP addresses are printed with their collected port numbers.
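The order in which awk's for (ip in ports) loop visits keys is unspecified, which is why the sample output above does not follow the input order. If input order matters, one possible variation is sketched below. It is not part of the answer above: the function name group_ports and the extra handling of trailing blanks, blank lines, and repeated ip:port pairs are assumptions added for illustration.

```shell
#!/bin/sh
# group_ports: hypothetical helper name (an assumption, not from the answer).
# Reads "ip:port" lines from the named files, or from standard input, and
# prints "ip:port,port,..." with the IPs in the order they first appeared.
group_ports() {
    awk -F ':' '
        { sub(/[ \t\r]+$/, "") }        # trim trailing blanks/CR; awk re-splits the fields
        NF < 2 { next }                 # skip blank or malformed lines
        ($1, $2) in seen { next }       # skip an ip:port pair that was already recorded
        {
            seen[$1, $2] = 1
            if (!($1 in ports)) {       # first time this IP is seen
                order[++n] = $1         # remember its position
                ports[$1] = $2
            } else {
                ports[$1] = ports[$1] "," $2
            }
        }
        END { for (i = 1; i <= n; i++) print order[i] ":" ports[order[i]] }
    ' "$@"
}
```

It can be called as group_ports file, or used at the end of a pipeline. The auxiliary order array is what fixes the output order; the ports array is used exactly as in the script above.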

answered Feb 1 at 13:48

Kusalananda

Thank you soo much it worked :)

– user334662, Feb 1 at 13:53

With GNU Datamash:

datamash -t: -s groupby 1 collapse 2 < file

If your data are already sorted, you can omit the -s.

Or using an anonymous array inside a hash in Perl:

$ perl -F: -lne '
    push @{ $h{$F[0]} }, $F[1]
    }{
    for $k (sort keys %h) {print "$k:", join ",", @{ $h{$k}} }
' file

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443
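Perl's sort keys %h compares the keys as strings, so an address such as 10.0.0.1 would be listed before 9.0.0.1. If numeric ordering of IPv4 addresses is wanted, the grouped output can be passed through sort with one numeric key per octet. This is a sketch only; the function name sort_by_ip is an assumption made for illustration.

```shell
#!/bin/sh
# sort_by_ip: hypothetical helper name (an assumption, not from the answer).
# Orders "ip:ports" lines by the numeric value of each IPv4 octet instead of
# by plain string comparison. The fourth key reads the leading digits of
# "octet:ports", so the port list does not affect the ordering.
sort_by_ip() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n "$@"
}
```

For example, the Perl command above could be followed by | sort_by_ip.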

edited Feb 1 at 14:03

answered Feb 1 at 13:53

steeldriver

Thanks This also worked for me :)

– user334662, Feb 1 at 13:56

Using Miller (http://johnkerl.org/miller/doc):

mlr --nidx --fs ':' nest --implode --values --across-records --nested-fs "," -f 2 input

This gives you:

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443

answered Feb 1 at 14:33

aborruso

The following loop, which runs awk once for each distinct IP address, also works:

for ip in $(awk -F ":" '{print $1}' file | sort -u); do printf '%s %s\n' "$ip" "$(awk -F ":" -v ip="$ip" '$1 == ip {print $2}' file | paste -sd, -)"; done

Output:

1.1.1.1 21,22
2.2.2.2 443
3.3.3.3 80,443

answered Feb 1 at 18:40

Praveen Kumar BS

You can do this with sed. We keep two lines in the pattern space at any time and watch for a change in the IP address. As long as the next line has the same IP, we strip the IP from the second line and join its port to the first line with a comma. Otherwise an IP change has been detected: we print the first line only, delete it from the pattern space, then go back, read the next line into the pattern space, and repeat the same checks. This relies on lines with the same IP being adjacent in the input.

$ sed -e '
    :loop
       $!N
       s/^\(\([^:]*:\).*[^[:space:]]\).*\n\2/\1,/
    t loop
    P;D
' input-file.txt

1.1.1.1:21,22
2.2.2.2:443
3.3.3.3:80,443

$ perl -lne '
    my($ip, $port) = /(\H+):(\H+)/;
    push @seen, $ip if ! exists $h{$ip};
    push @{$h{$ip}}, $port;}{
    print $_,  ":", join ",", @{$h{$_}} for @seen;
' input-file.txt

With Perl we can do the same with a hash that has the IPs as its keys and array references holding the ports as its values. The \H+ patterns match only non-blank characters, so trailing blanks are not captured. The array @seen records the IPs in the order they were first seen.
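The sed script compares each line with the one before it, so it only groups ports when equal IPs are adjacent. For unsorted input, the same previous-line comparison can be sketched with sort and awk, as below. This is an illustration rather than part of the answer: the function name group_sorted is an assumption, and sorting the ports numerically within each IP is a choice made here.

```shell
#!/bin/sh
# group_sorted: hypothetical helper name (an assumption, not from the answer).
# Sorts by IP, then by port number, so that equal IPs become adjacent, and
# joins the ports while keeping only the current group in memory.
group_sorted() {
    LC_ALL=C sort -t : -k 1,1 -k 2,2n "$@" |
    awk -F ':' '
        NF < 2 || $1 == "" { next }            # skip blank or malformed lines
        $1 != prev {                           # IP changed: flush the finished group
            if (prev != "") print prev ":" list
            prev = $1; list = $2; next
        }
        { list = list "," $2 }                 # same IP as before: append the port
        END { if (prev != "") print prev ":" list }
    '
}
```

Unlike the hash-based versions, this holds one IP and its port list at a time, at the cost of a sort pass and of losing the original input order.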

edited Feb 2 at 7:33

answered Feb 2 at 6:01

Rakesh Sharma
