Convert a DataFrame into Adjacency/Weights Matrix in R

I have a DataFrame, df.

n is a column denoting the number of groups in the x column.
x is a column containing the comma-separated groups.

df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"))

> df
  n       x
1 2    a, b
2 3 a, c, d
3 2    c, d
4 2    d, b

I would like to convert this DataFrame into a weights matrix where the row and column names are the unique values of the groups in `df$x`, and each element is the number of times the corresponding pair of groups appears together in a row of `df$x`.

The output should look like this:

m <- matrix(c(0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 2, 1, 1, 2, 0), nrow = 4, ncol = 4)
rownames(m) <- letters[1:4]; colnames(m) <- letters[1:4]

> m
  a b c d
a 0 1 1 1
b 1 0 0 1
c 1 0 0 2
d 1 1 2 0

asked 6 hours ago

Rich Pauloo


Your question is unclear. I can't see c in df; it only has n and x.

– YOLO
6 hours ago

c is one of the x values. It's a frequency table of how often different letters appear in the same line in x.

– RAB
5 hours ago

Do you mean df$x instead of df$c in the bolded part of the question?

– mikoontz
4 hours ago

r matrix adjacency-matrix


3 Answers

Here's a very rough and probably pretty inefficient solution using tidyverse for wrangling and combinat to generate permutations.

library(tidyverse)
library(combinat)

df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"))

df %>%
    ## Parse entries in x into distinct elements
    mutate(split = map(x, str_split, pattern = ', '),
           flat = flatten(split)) %>%
    ## Construct 2-element subsets of each set of elements
    mutate(combn = map(flat, combn, 2, simplify = FALSE)) %>%
    unnest(combn) %>%
    ## Construct permutations of the 2-element subsets
    mutate(perm = map(combn, permn)) %>%
    unnest(perm) %>%
    ## Parse the permutations into row and column indices
    mutate(row = map_chr(perm, 1),
           col = map_chr(perm, 2)) %>%
    count(row, col) %>%
    ## Long to wide representation
    spread(key = col, value = nn, fill = 0) %>%
    ## Coerce to matrix
    column_to_rownames(var = 'row') %>%
    as.matrix()

answered 5 hours ago

Dan Hicks
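
The idea behind the pipeline above (enumerate every 2-element subset of each row, then count each pair in both orders) can also be sketched in base R without extra packages. This is only an illustration under stated assumptions, not the answer author's code: the names `groups`, `pairs`, `both`, `lev` and `w` are made up for the example, and it assumes every row of `x` lists at least two groups, separated by exactly ", ".

```r
# Example data from the question (x kept as character)
df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"),
                 stringsAsFactors = FALSE)

# Split each row into its groups
groups <- strsplit(df$x, ", ", fixed = TRUE)

# All 2-element subsets of every row, one pair per column
pairs <- do.call(cbind, lapply(groups, combn, m = 2))

# Append every pair in reverse order so (row, col) and (col, row) are both counted
both <- cbind(pairs, pairs[2:1, , drop = FALSE])

# Cross-tabulate; fixing the factor levels keeps never-seen pairs as zeros
lev <- sort(unique(unlist(groups)))
tab <- table(factor(both[1, ], levels = lev),
             factor(both[2, ], levels = lev))

# Plain matrix with the group names as row and column names
w <- matrix(as.vector(tab), nrow = length(lev), dimnames = list(lev, lev))
w
```

Because both orderings of each pair are tabulated, the result is symmetric by construction and the diagonal stays at zero as long as no group is repeated within a row.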


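Using base R, you could do something like the following. Each row is sorted before pairing so that a pair written as "d, b" is counted under "b, d":

a = strsplit(as.character(df$x), ', ')
b = sort(unique(unlist(a)))
# all pairs per row; sorting makes every pair match one of the factor levels below
d = unlist(lapply(a, function(v) combn(sort(v), 2, toString)))
e = data.frame(table(factor(d, c(paste(b, b, sep = ','), combn(b, 2, toString)))))
f = read.table(text = do.call(paste, c(sep = ',', e)), sep = ',', strip.white = TRUE)
g = xtabs(V3 ~ V1 + V2, f)
# mirror the upper triangle into the lower triangle
g[lower.tri(g)] = t(g)[lower.tri(g)]
g
   V2
V1  a b c d
  a 0 1 1 1
  b 1 0 0 1
  c 1 0 0 2
  d 1 1 2 0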

answered 4 hours ago

Onyambu
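
A shorter route to the same co-occurrence counts, offered as a sketch rather than as this answer's method, is to build a row-by-group incidence matrix and multiply it by its own transpose. The names `row_id`, `inc` and `w` are illustrative, and the sketch assumes that no group is listed twice within a single row (otherwise the products would over-count).

```r
# Example data from the question (x kept as character)
df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"),
                 stringsAsFactors = FALSE)

groups <- strsplit(df$x, ", ", fixed = TRUE)

# Incidence matrix: inc[i, g] is 1 when row i of df contains group g
row_id <- rep(seq_along(groups), lengths(groups))
inc <- unclass(table(row_id, unlist(groups)))

# crossprod(inc) computes t(inc) %*% inc, so entry (g, h) counts the rows
# that contain both g and h
w <- crossprod(inc)

# The diagonal holds each group's own frequency; the question wants zeros there
diag(w) <- 0
dimnames(w) <- list(colnames(inc), colnames(inc))
w
```

The matrix product does the pair counting in one step, which tends to scale better than enumerating pairs explicitly when rows contain many groups.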


Here is another possible approach using data.table:

library(data.table)
setDT(df)  #convert the data.frame from the question to a data.table

#generate the combis
combis <- df[, transpose(combn(sort(strsplit(as.character(x), ", ")[[1L]]), 2L, simplify=FALSE)),
    by=1L:df[,.N]]

#create new rows for identical letters within a pair or any other missing combi
withDiag <- combis[CJ(c(V1,V2), c(V1,V2), unique=TRUE), on=.(V1, V2)]

#duplicate the above for lower triangular part of the matrix
withLowerTri <- rbindlist(list(withDiag, withDiag[,.(df, V2, V1)]))

#pivot to get weights matrix
outDT <- dcast(withLowerTri, V1 ~ V2, function(x) sum(!is.na(x)), value.var="df")

outDT output:

   V1 a b c d
1:  a 0 1 1 1
2:  b 1 0 0 1
3:  c 1 0 0 2
4:  d 1 1 2 0

If matrix output is desired, then

mat <- as.matrix(outDT[, -1L])
rownames(mat) <- unlist(outDT[,1L])

edited 1 hour ago

answered 1 hour ago

chinsoon12
