Group files into average size of the largest file

I have 6 files and would like to split them into 2 or 3 groups of roughly equal total size.



file1.log 50G
file2.log 40G
file3.log 20G
file4.log 10G
file5.log 30G
file6.log 70G


file6.log, at 70G, is the biggest file, and I would like to group the rest of the files relative to that size.



The output should look like this:




  1. Grouping into 1 group should contain all the files (parallel 1).

  2. Grouping into 2 groups (parallel 2) should look like this:


Output 1



file4.log 10G
file5.log 30G
file6.log 70G


Output 2



file1.log 50G
file2.log 40G
file3.log 20G


Notice that the two groups have equal totals (110G each).



Grouping into 3 groups (parallel 3) should look like this:



Output 1



file6.log 70G


Output 2



file1.log 50G
file3.log 20G


Output 3



file2.log 40G
file4.log 10G
file5.log 30G


It does not have to be the exact average; just divide the files into groups whose totals are as close to the average as possible. (The six files total 220G, so two groups aim for about 110G each, and three groups for about 73G each.)



Thanks!!

shell-script awk sed






asked Nov 27 '18 at 6:04 by Kwa Arboncana
edited Nov 27 '18 at 6:12 by Inian

  • What have you tried? – Peschke, Nov 27 '18 at 7:09

  • Here is what I have tried. – Kwa Arboncana, Nov 27 '18 at 13:39

  • I don't see any code or research in your question. You may get better answers if you show what you have tried, rather than asking us to write code for you. – Peschke, Nov 27 '18 at 16:22

2 Answers

#!/usr/bin/env zsh

# To also include hidden filenames:
#setopt GLOB_DOTS

# Load the zstat builtin
zmodload -F zsh/stat b:zstat

# Get the regular files in the current directory,
# ordered by size (largest first)
files=( ./*(.OL) )

# Precalculate the filesizes
typeset -A filesizes
for file in "${files[@]}"; do
    filesizes[$file]=$( zstat +size "$file" )
done

# The maximum size of a bin is the size of the largest file
maxsize=${filesizes[${files[1]}]}

binsizes=()
typeset -A filebins
for file in "${files[@]}"; do
    filesize=${filesizes[$file]}

    bin=1  # try fitting into first bin first
    ok=0   # haven't yet found a bin for this file
    for binsize in "${binsizes[@]}"; do
        if (( filesize + binsize <= maxsize )); then
            # File fits in this bin,
            # update bin size and place file in bin
            binsizes[$bin]=$(( filesize + binsize ))
            filebins[$file]=$bin
            ok=1  # now we're good
            break
        fi
        # Try next bin
        bin=$(( bin + 1 ))
    done

    if [ "$ok" -eq 0 ]; then
        # Wasn't able to fit file in existing bin,
        # create new bin
        binsizes+=( "$filesize" )
        filebins[$file]=${#binsizes[@]}
    fi
done

# Do final output
printf 'Bin max size = %d\n' "$maxsize"
for file in "${files[@]}"; do
    printf '%d: %s (file size=%d / bin size=%d)\n' "${filebins[$file]}" "$file" \
        "${filesizes[$file]}" "${binsizes[$filebins[$file]]}"
done | sort -n


The above zsh shell script bins all of the files in the current directory, with a maximum bin size based strictly on the size of the largest file. It implements a first-fit algorithm with the files ordered by decreasing size, which is the "FFD" algorithm described in the "Bin packing problem" Wikipedia article. The "MFFD" algorithm is non-trivial to implement in zsh in fewer than 200 or so lines of code, so I won't post it here.



Testing:



$ ls -l
total 450816
-rw-r--r-- 1 kk wheel 10485760 Jan 19 23:53 file-10.log
-rw-r--r-- 1 kk wheel 20971520 Jan 19 23:53 file-20.log
-rw-r--r-- 1 kk wheel 31457280 Jan 19 23:53 file-30.log
-rw-r--r-- 1 kk wheel 41943040 Jan 19 23:53 file-40.log
-rw-r--r-- 1 kk wheel 52428800 Jan 19 23:53 file-50.log
-rw-r--r-- 1 kk wheel 73400320 Jan 19 23:53 file-70.log




$ zsh ../script.sh
Bin max size = 73400320
1: ./file-70.log (file size=73400320 / bin size=73400320)
2: ./file-20.log (file size=20971520 / bin size=73400320)
2: ./file-50.log (file size=52428800 / bin size=73400320)
3: ./file-30.log (file size=31457280 / bin size=73400320)
3: ./file-40.log (file size=41943040 / bin size=73400320)
4: ./file-10.log (file size=10485760 / bin size=10485760)


The number at the start of each line above corresponds to the bin number assigned to the file.
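
To turn these bin assignments into per-group file lists like the ones the question asks for, here is a minimal follow-up sketch (assuming the files and filebins arrays from the script above are still in scope; the group-N.txt output names are made up for illustration):

# Hypothetical follow-up: write each bin's members to its own list file.
# Note this appends, so remove any old group-*.txt first.
for file in "${files[@]}"; do
    print -r -- "$file" >> "group-${filebins[$file]}.txt"
done

Each group-N.txt then contains exactly the filenames assigned to bin N, one per line, ready to be handed to a per-group parallel job.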






answered Jan 19 at 23:50, edited yesterday – Kusalananda

    This seems to be pretty much equivalent to the Bin Packing problem.



The Bin Packing problem is NP-hard, so there is no known shortcut: brute force (trying all the options in some sensible order that excludes silly attempts, like adding more files to an already oversized group) is the way to go.



    For six files, the brute force approach should be simple enough to do by hand; just list all the possible groupings, count how they split the file usage, and choose the one that gives you the smallest maximum group size.
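
For completeness, here is a minimal zsh sketch of that brute-force search (not part of the answer above; the sizes are hard-coded in gigabytes from the question, k is the number of groups, and the try helper is a made-up name). It recursively assigns each file to every one of the k groups and keeps the assignment whose largest group total is smallest:

#!/usr/bin/env zsh
# Hypothetical brute-force sketch: try every assignment of the six
# files to k groups; 3^6 = 729 combinations, trivial at this scale.
typeset -A size
size=( file1.log 50 file2.log 40 file3.log 20
       file4.log 10 file5.log 30 file6.log 70 )
files=( ${(k)size} )
k=3
best=10000000    # smallest maximum group total found so far
best_assign=""
assign=""

try() {
    local idx=$1; shift
    local -a totals; totals=( "$@" )
    if (( idx > ${#files} )); then
        # All files placed: keep this assignment if its largest
        # group total beats the best one seen so far.
        local m=0 t
        for t in "${totals[@]}"; do (( t > m )) && m=$t; done
        (( m < best )) && { best=$m; best_assign=$assign; }
        return
    fi
    local g
    for (( g = 1; g <= k; g++ )); do
        local -a newtotals; newtotals=( "${totals[@]}" )
        (( newtotals[g] += ${size[${files[idx]}]} ))
        assign+="${files[idx]}:$g "
        try $(( idx + 1 )) "${newtotals[@]}"
        assign=${assign%"${files[idx]}:$g "}   # undo before trying next group
    done
}

# Start the search with k empty groups (all totals zero)
typeset -a zero
for (( g = 1; g <= k; g++ )); do zero+=( 0 ); done
try 1 "${zero[@]}"

print "smallest possible maximum group size: ${best}G"
print "one optimal assignment: $best_assign"

For these six sizes and k=3 the optimal maximum group total is 80G (for example 70G+10G, 50G+30G, 40G+20G), so the split shown in the question is already as balanced as possible.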






answered Nov 27 '18 at 8:24 – Bass