Why are true and false so large?
After finding out that several common commands (such as read) are actually Bash builtins (and that when I run them at the prompt I'm actually running a two-line shell script which just forwards to the builtin), I wanted to see whether the same is true for true and false.
Well, they are definitely binaries.
sh-4.2$ which true
/usr/bin/true
sh-4.2$ which false
/usr/bin/false
sh-4.2$ file /usr/bin/true
/usr/bin/true: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=2697339d3c1923506e10af65aa3120b12295277e, stripped
sh-4.2$ file /usr/bin/false
/usr/bin/false: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=b160fa513fcc13537d7293f05e40444fe5843640, stripped
sh-4.2$
However, what I found most surprising was their size. I expected them to be only a few bytes each, as true is basically just exit 0 and false is exit 1.
sh-4.2$ true
sh-4.2$ echo $?
0
sh-4.2$ false
sh-4.2$ echo $?
1
sh-4.2$
Instead, I found to my surprise that both files are over 28KB in size.
sh-4.2$ stat /usr/bin/true
File: '/usr/bin/true'
Size: 28920 Blocks: 64 IO Block: 4096 regular file
Device: fd2ch/64812d Inode: 530320 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-01-25 19:46:32.703463708 +0000
Modify: 2016-06-30 09:44:27.000000000 +0100
Change: 2017-12-22 09:43:17.447563336 +0000
Birth: -
sh-4.2$ stat /usr/bin/false
File: '/usr/bin/false'
Size: 28920 Blocks: 64 IO Block: 4096 regular file
Device: fd2ch/64812d Inode: 530697 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-01-25 20:06:27.210764704 +0000
Modify: 2016-06-30 09:44:27.000000000 +0100
Change: 2017-12-22 09:43:18.148561245 +0000
Birth: -
sh-4.2$
So my question is: Why are they so big? What's in the executable other than the return code?
PS: I am using RHEL 7.4
linux reverse-engineering
9
You should use command -V true, not which. It will output "true is a shell builtin" for bash.
– meuh
Jan 25 '18 at 20:53
32
true and false are builtins in every modern shell, but systems also include external program versions of them, because they are part of the standard system, so that programs invoking commands directly (bypassing the shell) can use them. which ignores builtins and looks up external commands only, which is why it only showed you the external ones. Try type -a true and type -a false instead.
– mtraceur
Jan 25 '18 at 22:15
73
It's ironic that you write such a long question to say "Why are true and false 29kb each? What's in the executable other than the return code?"
– David Richerby
Jan 25 '18 at 23:51
6
Some early versions of unix just had an empty file for true since that was a valid sh program that would return exit code 0. I really wish I could find an article I read years ago about the history of the true utility from an empty file to the monstrosity it is today, but all I could find is this: trillian.mit.edu/~jc/humor/ATT_Copyright_true.html
– Philip
Jan 26 '18 at 4:16
9
Obligatory - the smallest implementation of false: muppetlabs.com/~breadbox/software/tiny/teensy.html
– d33tah
Jan 26 '18 at 14:36
edited Feb 3 '18 at 12:16 by Rui F Ribeiro
asked Jan 25 '18 at 20:14 by Kidburla
4 Answers
Introductory notes:
In the past, /bin/true and /bin/false were actually shell scripts.
For instance, on a PDP-11 running Version 7 Unix:
$ ls -la /bin/true /bin/false
-rwxr-xr-x 1 bin 7 Jun 8 1979 /bin/false
-rwxr-xr-x 1 bin 0 Jun 8 1979 /bin/true
$
$ cat /bin/false
exit 1
$
$ cat /bin/true
$
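As a rough illustration of why those two files were enough (a sketch, not the historical sources): an empty file is a valid sh script that exits with status 0, and a 7-byte file containing exit 1 exits with status 1. The scratch directory and variable names below are made up for the example.

```shell
# Recreate the V7-style scripts in a scratch directory (hypothetical paths).
tmpdir=$(mktemp -d)
: > "$tmpdir/true"                   # empty file, 0 bytes
printf 'exit 1\n' > "$tmpdir/false"  # 7 bytes, like the listing above

# Run them through sh explicitly so no execute bit is needed.
true_status=0
sh "$tmpdir/true" || true_status=$?
false_status=0
sh "$tmpdir/false" || false_status=$?
false_size=$(wc -c < "$tmpdir/false" | tr -d ' ')

echo "true: exit $true_status, false: exit $false_status ($false_size bytes)"
rm -rf "$tmpdir"
```

Running the files through sh rather than executing them directly keeps the sketch independent of how the temporary filesystem is mounted.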
Nowadays, at least in bash, the true and false commands are implemented as shell built-in commands. Thus no executable binary files are invoked by default, both when using the false and true directives on the bash command line and inside shell scripts.
From the bash source, builtins/mkbuiltins.c:
char *posix_builtins[] =
{
  "alias", "bg", "cd", "command", "false", "fc", "fg", "getopts", "jobs",
  "kill", "newgrp", "pwd", "read", "true", "umask", "unalias", "wait",
(char *)NULL
};
Also, per @meuh's comment:
$ command -V true false
true is a shell builtin
false is a shell builtin
So it can be said with a high degree of certainty that the true and false executable files exist mainly to be called from other programs.
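A small sketch of the difference, assuming a POSIX shell and an env utility on the PATH: a bare true is resolved by the shell itself, whereas env is not a shell, so it searches PATH and executes the external file, much as any other program calling exec would.

```shell
# Ask the shell how it resolves the bare name (exact wording varies by shell).
command -V true

# env has no builtins, so it has to exec the external binary found on PATH.
ext_true=0
env true || ext_true=$?
ext_false=0
env false || ext_false=$?
echo "external true exited $ext_true, external false exited $ext_false"
```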
From now on, the answer will focus on the /bin/true binary from the coreutils package in Debian 9 / 64 bits (/usr/bin/true on RedHat). RedHat and Debian both use the coreutils package; I analysed the compiled version from Debian because I had it closer at hand.
As can be seen in the source file false.c, /bin/false is compiled from (almost) the same source code as /bin/true, just returning EXIT_FAILURE (1) instead, so this answer applies to both binaries.
#define EXIT_STATUS EXIT_FAILURE
#include "true.c"
This is also confirmed by both executables having the same size:
$ ls -l /bin/true /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22 2017 /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22 2017 /bin/true
Alas, the direct answer to the question "why are true and false so large?" could be: because there are no longer such pressing reasons to care about their top performance. They are not essential to bash performance, as they are no longer used by bash (scripting).
Similar comments apply to their size: 26KB is insignificant for the kind of hardware we have nowadays. Space is not at a premium for the typical server/desktop anymore, and distributions using coreutils do not even bother to share one binary between false and true; it is simply deployed twice.
Focusing, however, on the real spirit of the question: why does something that should be so simple and small get so large?
Looking at how the sections of /bin/true are actually distributed, the main code+data amounts to roughly 3KB out of a 26KB binary, or about 12% of the size of /bin/true.
The true utility has indeed gained more cruft code over the years, most notably the standard support for --version and --help.
However, that is not the (only) main justification for it being so big. Rather, while it is dynamically linked (using shared libs), it also has part of a generic library commonly used by coreutils binaries linked in as a static library. The metadata for building an ELF executable file also amounts to a significant part of the binary, as it is a relatively small file by today's standards.
The rest of the answer explains how we worked out the composition of the /bin/true executable binary file and how we arrived at that conclusion.
As @Maks says, the binary was compiled from C; as per my comment, it is also confirmed that it comes from coreutils. We are pointing directly to the author(s) git, https://github.com/wertarbyte/coreutils/blob/master/src/true.c, instead of the GNU git as @Maks does (same sources, different repositories; this repository was selected because it has the full source of the coreutils libraries).
We can see the various building blocks of the /bin/true binary here (Debian 9 - 64 bits, from coreutils):
$ file /bin/true
/bin/true: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=9ae82394864538fa7b23b7f87b259ea2a20889c4, stripped
$ size /bin/true
   text    data     bss     dec     hex filename
  24583    1160     416   26159    662f true
Of those:
- text (usually code) is around 24KB
- data (initialised variables, mostly strings) are around 1KB
- bss (uninitialized data) 0.5KB
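The dec and hex columns are simply the sum of the other three, which is easy to check with shell arithmetic (numbers copied from the size output above; the variable names are only for illustration):

```shell
text=24583
data=1160
bss=416
total=$((text + data + bss))
printf 'dec=%d hex=%x\n' "$total" "$total"
```

Note that this total counts bss, which takes up memory at run time but no space in the file on disk.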
Of the 24KB, around 1KB is for fixing up the 58 external functions.
That still leaves roughly 23KB for the rest of the code. We will show below that the actual main file, main()+usage() code, is around 1KB compiled, and explain what the other 22KB are used for.
Drilling further down into the binary with readelf -S true, we can see that while the binary is 26159 bytes, the actual compiled code is 13017 bytes, and the rest is assorted data/initialisation code.
However, true.c is not the whole story, and 13KB seems pretty excessive if it were only that file; we can see functions called in main() that are not listed among the external functions seen in the ELF with objdump -T true; functions that are present at:
- https://github.com/coreutils/gnulib/blob/master/lib/progname.c
- https://github.com/coreutils/gnulib/blob/master/lib/closeout.c
- https://github.com/coreutils/gnulib/blob/master/lib/version-etc.c
Those extra functions, called in main() but not linked externally, are:
- set_program_name()
- close_stdout()
- version_etc()
So my first suspicion was partly correct: whilst the binary is using dynamic libraries, the /bin/true binary is big *because it has some static libraries included with it* (but that is not the only cause).
Compiling C code is not usually so inefficient as to leave that much space unaccounted for, hence my initial suspicion that something was amiss.
The extra space, almost 90% of the size of the binary, is indeed extra libraries/ELF metadata.
Using Hopper to disassemble/decompile the binary to understand where the functions are, it can be seen that the compiled binary code of the true.c/usage() function is actually 833 bytes, and that of the true.c/main() function is 225 bytes, which together come to 1058 bytes, or roughly 1KB. The logic for the version functions, which is buried in the static libraries, is around 1KB.
The actual compiled main()+usage()+version()+strings+vars are only using up around 3KB to 3.5KB.
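To put that figure in proportion, here is a quick sketch of the arithmetic, taking 26159 bytes (the dec column of size) as the whole and treating 3KB and 3.5KB as 3072 and 3584 bytes. The helper name is hypothetical.

```shell
binary=26159            # "dec" column from size /bin/true
percent_of_binary() {
  # Integer percentage of the binary, rounded to the nearest whole number.
  echo $(( ($1 * 100 + binary / 2) / binary ))
}
low=$(percent_of_binary 3072)    # 3KB
high=$(percent_of_binary 3584)   # 3.5KB
echo "own code and data: ${low}% to ${high}% of the binary"
```

That leaves well over 85% of the binary to the statically linked library code and ELF metadata.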
It is indeed ironic that such small and humble utilities have become bigger in size for the reasons explained above.
Related question: Understanding what a Linux binary is doing
true.c main() with the offending function calls:
int
main (int argc, char **argv)
{
  /* Recognize --help or --version only if it's the only command-line
     argument.  */
  if (argc == 2)
    {
      initialize_main (&argc, &argv);
      set_program_name (argv[0]);           /* <----------- */
      setlocale (LC_ALL, "");
      bindtextdomain (PACKAGE, LOCALEDIR);
      textdomain (PACKAGE);
      atexit (close_stdout);                /* <----- */
      if (STREQ (argv[1], "--help"))
        usage (EXIT_STATUS);
      if (STREQ (argv[1], "--version"))
        version_etc (stdout, PROGRAM_NAME, PACKAGE_NAME, Version, AUTHORS,  /* <------ */
                     (char *) NULL);
    }
  exit (EXIT_STATUS);
}
The decimal size of the various sections of the binary:
$ size -A -t true
true :
section size addr
.interp 28 568
.note.ABI-tag 32 596
.note.gnu.build-id 36 628
.gnu.hash 60 664
.dynsym 1416 728
.dynstr 676 2144
.gnu.version 118 2820
.gnu.version_r 96 2944
.rela.dyn 624 3040
.rela.plt 1104 3664
.init 23 4768
.plt 752 4800
.plt.got 8 5552
.text 13017 5568
.fini 9 18588
.rodata 3104 18624
.eh_frame_hdr 572 21728
.eh_frame 2908 22304
.init_array 8 2125160
.fini_array 8 2125168
.jcr 8 2125176
.data.rel.ro 88 2125184
.dynamic 480 2125272
.got 48 2125752
.got.plt 392 2125824
.data 128 2126240
.bss 416 2126368
.gnu_debuglink 52 0
Total 26211
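The per-section listing also shows where the three columns of the plain size output come from. The sketch below assumes the usual Berkeley-format rule (allocated read-only sections count as text, allocated writable sections with contents as data, NOBITS as bss) and takes the section flags from the readelf -S output further down. The function name and the third column are additions for illustration.

```shell
berkeley_totals() {
  # Columns: section name, size in bytes, readelf flags ("-" = none).
  awk '
    $1 == ".bss"   { bss   += $2; next }   # NOBITS: no space in the file
    index($3, "W") { data  += $2; next }   # allocated and writable
    index($3, "A") { text  += $2; next }   # allocated, read-only
                   { other += $2 }         # not loaded at run time
    END { printf "text=%d data=%d bss=%d unloaded=%d\n", text, data, bss, other }
  ' <<'EOF'
.interp 28 A
.note.ABI-tag 32 A
.note.gnu.build-id 36 A
.gnu.hash 60 A
.dynsym 1416 A
.dynstr 676 A
.gnu.version 118 A
.gnu.version_r 96 A
.rela.dyn 624 A
.rela.plt 1104 AI
.init 23 AX
.plt 752 AX
.plt.got 8 AX
.text 13017 AX
.fini 9 AX
.rodata 3104 A
.eh_frame_hdr 572 A
.eh_frame 2908 A
.init_array 8 WA
.fini_array 8 WA
.jcr 8 WA
.data.rel.ro 88 WA
.dynamic 480 WA
.got 48 WA
.got.plt 392 WA
.data 128 WA
.bss 416 WA
.gnu_debuglink 52 -
EOF
}
berkeley_totals
```

The result matches the 24583 / 1160 / 416 split reported by size, and makes it visible that the "text" figure lumps together every read-only section (symbol tables, relocations, strings, unwind tables), of which the .text section proper is only 13017 bytes.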
Output of readelf -S true
$ readelf -S true
There are 30 section headers, starting at offset 0x7368:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000000254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000000274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000298 00000298
000000000000003c 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000000002d8 000002d8
0000000000000588 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 0000000000000860 00000860
00000000000002a4 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000000b04 00000b04
0000000000000076 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000000b80 00000b80
0000000000000060 0000000000000000 A 6 1 8
[ 9] .rela.dyn RELA 0000000000000be0 00000be0
0000000000000270 0000000000000018 A 5 0 8
[10] .rela.plt RELA 0000000000000e50 00000e50
0000000000000450 0000000000000018 AI 5 25 8
[11] .init PROGBITS 00000000000012a0 000012a0
0000000000000017 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000000012c0 000012c0
00000000000002f0 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 00000000000015b0 000015b0
0000000000000008 0000000000000000 AX 0 0 8
[14] .text PROGBITS 00000000000015c0 000015c0
00000000000032d9 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 000000000000489c 0000489c
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 00000000000048c0 000048c0
0000000000000c20 0000000000000000 A 0 0 32
[17] .eh_frame_hdr PROGBITS 00000000000054e0 000054e0
000000000000023c 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 0000000000005720 00005720
0000000000000b5c 0000000000000000 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000206d68 00006d68
0000000000000008 0000000000000008 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000206d70 00006d70
0000000000000008 0000000000000008 WA 0 0 8
[21] .jcr PROGBITS 0000000000206d78 00006d78
0000000000000008 0000000000000000 WA 0 0 8
[22] .data.rel.ro PROGBITS 0000000000206d80 00006d80
0000000000000058 0000000000000000 WA 0 0 32
[23] .dynamic DYNAMIC 0000000000206dd8 00006dd8
00000000000001e0 0000000000000010 WA 6 0 8
[24] .got PROGBITS 0000000000206fb8 00006fb8
0000000000000030 0000000000000008 WA 0 0 8
[25] .got.plt PROGBITS 0000000000207000 00007000
0000000000000188 0000000000000008 WA 0 0 8
[26] .data PROGBITS 00000000002071a0 000071a0
0000000000000080 0000000000000000 WA 0 0 32
[27] .bss NOBITS 0000000000207220 00007220
00000000000001a0 0000000000000000 WA 0 0 32
[28] .gnu_debuglink PROGBITS 0000000000000000 00007220
0000000000000034 0000000000000000 0 0 1
[29] .shstrtab STRTAB 0000000000000000 00007254
000000000000010f 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
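readelf prints sizes in hexadecimal, so a little shell arithmetic helps relate this listing to the decimal one. Dividing Size by EntSize also gives the number of entries in the table-like sections. The values are copied from the listing above; the variable names are made up for the example.

```shell
# Size and EntSize values copied from the readelf -S listing above.
text_size=$((0x32d9))               # .text
dynsym_entries=$((0x588 / 0x18))    # .dynsym: 24-byte Elf64_Sym entries
relaplt_entries=$((0x450 / 0x18))   # .rela.plt: 24-byte Elf64_Rela entries
plt_slots=$((0x2f0 / 0x10))         # .plt: 16-byte stubs

echo ".text is $text_size bytes"
echo ".dynsym holds $dynsym_entries entries (the first one is the reserved null symbol)"
echo ".rela.plt holds $relaplt_entries relocations, .plt holds $plt_slots slots"
```

Leaving out the reserved null entry, the .dynsym count agrees with the 58 symbols that objdump -T lists below.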
Output of objdump -T true (external functions dynamically linked at run-time):
$ objdump -T true
true: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __uflow
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 getenv
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 free
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 abort
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __errno_location
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strncmp
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 _exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __fpending
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 textdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fclose
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 bindtextdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 dcgettext
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __ctype_get_mb_cur_max
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strlen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.4 __stack_chk_fail
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbrtowc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strrchr
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 lseek
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memset
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fscanf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 close
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memcmp
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fputs_unlocked
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 calloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strcmp
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.14 memcpy
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fileno
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 malloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fflush
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 nl_langinfo
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 ungetc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __freading
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 realloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fdopen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 setlocale
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __printf_chk
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 error
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 open
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fseeko
0000000000000000 w D *UND* 0000000000000000 _Jv_RegisterClasses
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_atexit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fwrite
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __fprintf_chk
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbsinit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 iswprint
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3 __ctype_b_loc
0000000000207228 g DO .bss 0000000000000008 GLIBC_2.2.5 stdout
0000000000207220 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname
0000000000207230 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_name
0000000000207230 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname_full
0000000000207220 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_short_name
0000000000207240 g DO .bss 0000000000000008 GLIBC_2.2.5 stderr
5
Having done some programming recently with a 64kB+2kB microcontroller, 28kB doesn't seem all that small..
– Barleyman
Jan 26 '18 at 16:49
1
@Barleyman you have OpenWRT, yocto, uClinux, uclib, busybox, microcoreutils, and other solutions for that kind of environments. Edited the post with your concern.
– Rui F Ribeiro
Jan 27 '18 at 9:09
3
@Barleyman: If you were optimizing for binary executable size, you can implement true or false with a 45-byte x86 ELF executable, packing the executable code (4 x86 instructions) inside the ELF program header (without support for any command-line options!). A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. (Or slightly larger if you want to avoid depending on Linux ELF loader implementation details :P)
– Peter Cordes
Jan 28 '18 at 23:39
3
Not really, no. Yocto for example can be crammed into less than a megabyte which is heaps and bounds above 64kB.. In this kind of device you may use RTOS of some kind with rudimentary process / memory management but even those can easily become too heavy. I wrote a simple cooperative multithreading system and used the built in memory protection to protect code from being overwritten. All told the firmware consumes some 55kB right now so not too much room there for additional overhead. Those ginormous 2kB look up tables..
– Barleyman
Jan 29 '18 at 0:09
2
@PeterCordes for sure but you need couple of magnitudes of more resources before Linux becomes viable. For what it's worth, C++ doesn't really work in that environment either. Well, not the standard libraries anyways. Iostream is right out at around 200kB etc.
– Barleyman
Jan 29 '18 at 0:19
The implementation probably comes from GNU coreutils. These binaries are compiled from C; no particular effort has been made to make them smaller than they are by default.
You could try to compile a trivial implementation of true yourself, and you'll notice it's already a few KB in size. For example, on my system:
$ echo 'int main() { return 0; }' | gcc -xc - -o true
$ wc -c true
8136 true
Of course, your binaries are even bigger. That's because they also support command line arguments. Try running /usr/bin/true --help or /usr/bin/true --version.
In addition to the string data, the binary includes logic to parse command line flags, etc. That adds up to about 20 KB of code, apparently.
For reference, you can find the source code here: http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/true.c
2
FYI I was complaining about these coreutils implementations on their bug tracker, but no chance to get it fixed lists.gnu.org/archive/html/bug-coreutils/2016-03/msg00040.html
– rudimeier
Jan 25 '18 at 21:43
6
It is not the logic for arguments, C is not that inefficient... it is inline libraries/housekeeping tasks. Have a look at my answer for the gory details.
– Rui F Ribeiro
Jan 25 '18 at 21:59
7
This is misleading because it suggests that compiled machine code (from C or otherwise) is what takes the huge amount of space - the actual size overhead has more to do with massive amounts of standard C library/runtime boilerplate that gets inlined by the compiler in order to interoperate with the C library (glibc, unless you've heard that your system uses something else, probably), and, to a lesser extent, ELF headers/metadata (a lot of which are not strictly necessary, but deemed worthwhile enough to include in default builds).
– mtraceur
Jan 25 '18 at 22:10
2
The actual main()+usage()+strings on both functions are around 2KB, not 20KB.
– Rui F Ribeiro
Jan 25 '18 at 23:11
2
@JdeBP logic for --version/version functions 1KB, --usage/--help 833 bytes, main() 225 bytes and the whole static data of the binary is 1KB
– Rui F Ribeiro
Jan 26 '18 at 9:04
Stripping them down to core functionality and writing in assembler yields far smaller binaries.
The original true/false binaries are written in C, which by its nature pulls in various library and symbol references. If you run readelf -a /bin/true this is quite noticeable.
The result is 352 bytes for a stripped static ELF executable (with room to save a couple of bytes by optimizing the asm for code size).
$ more true.asm false.asm
::::::::::::::
true.asm
::::::::::::::
global _start
_start:
mov ebx,0
mov eax,1 ; SYS_exit from asm/unistd_32.h
int 0x80 ; The 32-bit ABI is supported in 64-bit code, in kernels compiled with IA-32 emulation
::::::::::::::
false.asm
::::::::::::::
global _start
_start:
mov ebx,1
mov eax,1
int 0x80
$ nasm -f elf64 true.asm && ld -s -o true true.o # -s means strip
$ nasm -f elf64 false.asm && ld -s -o false false.o
$ ll true false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 true
$ ./true ; echo $?
0
$ ./false ; echo $?
1
$
Or, with a bit of a nasty/ingenious approach (kudos to stalkr), create your own ELF headers, getting it down to 127 bytes (previously 132). We're entering Code Golf territory here.
$ cat true2.asm
BITS 64
org 0x400000 ; _start is at 0x400080 as usual, but the ELF headers come first
ehdr: ; Elf64_Ehdr
db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 0x3e ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf64_Phdr
dd 1 ; p_type
dd 5 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
xor edi,edi ; int status = 0
; or mov dil,1 for false: high bytes are ignored.
lea eax, [rdi+60] ; rax = 60 = SYS_exit, using a 3-byte instruction: base+disp8 addressing mode
syscall ; native 64-bit system call, works without CONFIG_IA32_EMULATION
; less-golfed version:
; mov edi, 1 ; for false
; mov eax,252 ; SYS_exit_group from asm/unistd_64.h
; syscall
filesize equ $ - $$ ; used earlier in some ELF header fields
$ nasm -f bin -o true2 true2.asm
$ ll true2
-rw-r--r-- 1 peter peter 127 Jan 28 20:08 true2
$ chmod +x true2 ; ./true2 ; echo $?
0
$
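The 127-byte figure can be cross-checked by adding up the pieces of the file. This is only a sketch: it assumes the standard ELF64 structure sizes (64 bytes for Elf64_Ehdr, 56 bytes for Elf64_Phdr) and the x86-64 instruction lengths noted in the comments, and the variable names are made up for illustration.

```shell
# Size budget for the hand-built ELF file (all values in bytes).
ehdr=64                      # sizeof(Elf64_Ehdr)
phdr=56                      # sizeof(Elf64_Phdr), one program header
code_true=$(( 2 + 3 + 2 ))   # xor edi,edi / lea eax,[rdi+60] / syscall
code_false=$(( 3 + 3 + 2 ))  # mov dil,1 needs a REX prefix, so 3 bytes

size_true=$(( ehdr + phdr + code_true ))
size_false=$(( ehdr + phdr + code_false ))
echo "true:  $size_true bytes"
echo "false: $size_false bytes"
```

Under those assumptions the headers alone take 120 bytes, so nearly all of the file is ELF bookkeeping and only 7 or 8 bytes are actual code.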
2
Comments are not for extended discussion; this conversation has been moved to chat.
– terdon♦
Jan 28 '18 at 16:27
2
Also see this excellent write-up: muppetlabs.com/~breadbox/software/tiny/teensy.html
– mic_e
Jan 28 '18 at 21:38
3
You're using the int 0x80 32-bit ABI in a 64-bit executable, which is unusual but supported. Using syscall wouldn't save you anything. The high bytes of ebx are ignored, so you could use 2-byte mov bl,1. Or of course xor ebx,ebx for zero. Linux inits integer registers to zero, so you could just inc eax to get 1 = __NR_exit (i386 ABI).
– Peter Cordes
Jan 28 '18 at 23:48
1
I updated the code on your golfed example to use the 64-bit ABI, and golfed it down to 127 bytes for true. (I don't see an easy way to manage less than 128 bytes for false, though, other than using the 32-bit ABI or taking advantage of the fact that Linux zeros registers on process startup, so mov al,252 (2 bytes) works. push imm8 / pop rdi would also work instead of lea for setting edi=1, but we still can't beat the 32-bit ABI where we could mov bl,1 without a REX prefix.)
– Peter Cordes
Jan 29 '18 at 0:17
l $(which true false)
-rwxr-xr-x 1 root root 27280 Mär 2 2017 /bin/false
-rwxr-xr-x 1 root root 27280 Mär 2 2017 /bin/true
Pretty big on my Ubuntu 16.04 too. Exactly the same size? What makes them so big?
strings $(which true)
(excerpt:)
Usage: %s [ignored command line arguments]
or: %s OPTION
Exit with a status code indicating success.
--help display this help and exit
--version output version information and exit
NOTE: your shell may have its own version of %s, which usually supersedes
the version described here. Please refer to your shell's documentation
for details about the options it supports.
http://www.gnu.org/software/coreutils/
Report %s translation bugs to <http://translationproject.org/team/>
Full documentation at: <%s%s>
or available locally via: info '(coreutils) %s%s'
Ah, there is help for true and false, so let's try it:
true --help
true --version
#
Nothing. Ah, there was this other line:
NOTE: your shell may have its own version of %s, which usually supersedes
the version described here.
So on my system, it's /bin/true, not /usr/bin/true
/bin/true --version
true (GNU coreutils) 8.25
Copyright © 2016 Free Software Foundation, Inc.
Lizenz GPLv3+: GNU GPL Version 3 oder höher <http://gnu.org/licenses/gpl.html>
Dies ist freie Software: Sie können sie ändern und weitergeben.
Es gibt keinerlei Garantien, soweit wie es das Gesetz erlaubt.
Geschrieben von Jim Meyering.
LANG=C /bin/true --version
true (GNU coreutils) 8.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Jim Meyering.
So there is help text, version information, and binding to a library for internationalization. This explains much of the size, and most of the time the shell uses its own optimized builtin anyway.
Including static libraries, and half of the size of the binary is ELF metadata. See my answer.
– Rui F Ribeiro
Feb 18 '18 at 19:13
Introductory notes:
In the past, /bin/true and /bin/false were actually shell scripts.
For instance, on a PDP-11 running Unix Version 7:
$ ls -la /bin/true /bin/false
-rwxr-xr-x 1 bin 7 Jun 8 1979 /bin/false
-rwxr-xr-x 1 bin 0 Jun 8 1979 /bin/true
$
$ cat /bin/false
exit 1
$
$ cat /bin/true
$
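Those two scripts are easy to recreate. The sketch below writes them into a scratch directory (the directory and variable names are made up for illustration) and runs them with sh; an empty script executes no commands, so its exit status is 0.

```shell
# Recreate the V7-style scripts in a temporary directory.
dir=$(mktemp -d)
: > "$dir/true"                    # empty file: no commands at all
printf 'exit 1\n' > "$dir/false"   # "exit 1" plus a newline

# Run each script and record its exit status.
true_status=0;  sh "$dir/true"  || true_status=$?
false_status=0; sh "$dir/false" || false_status=$?
false_bytes=$(( $(wc -c < "$dir/false") ))
rm -r "$dir"

echo "true exits with $true_status, false exits with $false_status"
echo "false script is $false_bytes bytes"
```

The byte count of the recreated false script matches the 7 bytes shown in the 1979 listing.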
Nowadays, at least in bash, the true and false commands are implemented as shell built-in commands. Thus no executable binary is invoked by default when true and false are used on the bash command line or inside shell scripts.
From the bash source, builtins/mkbuiltins.c:
char *posix_builtins =
{
"alias", "bg", "cd", "command", "false", "fc", "fg", "getopts", "jobs",
"kill", "newgrp", "pwd", "read", "true", "umask", "unalias", "wait",
(char *)NULL
};
Also per @meuh comments:
$ command -V true false
true is a shell builtin
false is a shell builtin
So it can be said with a high degree of certainty that the true and false executable files exist mainly to be called from other programs.
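One way to see both implementations side by side is sketched below, assuming that env and external true/false executables are available on PATH (the variable names are illustrative). command -v shows how the shell itself resolves the name, while env, being a separate program, cannot use shell builtins and has to execute the on-disk binaries.

```shell
# How does the current shell resolve the name? In bash this prints just
# "true" (no path) when the name resolves to a builtin.
command -v true

# env is a separate program, so these run the on-disk executables.
env_true=0;  env true  || env_true=$?
env_false=0; env false || env_false=$?
echo "env true -> $env_true, env false -> $env_false"
```

Any program that spawns true or false without going through a shell is in the same position as env here.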
From now on, this answer focuses on the /bin/true binary from the coreutils package in 64-bit Debian 9. (On RedHat it is /usr/bin/true. RedHat and Debian both use the coreutils package; I analysed the compiled version from the latter because I had it more at hand.)
As can be seen in the source file false.c, /bin/false is compiled from (almost) the same source code as /bin/true, just returning EXIT_FAILURE (1) instead, so this answer applies to both binaries.
#define EXIT_STATUS EXIT_FAILURE
#include "true.c"
This is also confirmed by both executables having the same size:
$ ls -l /bin/true /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22 2017 /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22 2017 /bin/true
Alas, the direct answer to the question of why true and false are so large could be: because there are no longer such pressing reasons to care about their top performance. They are not essential to bash performance, since bash no longer uses them (for scripting).
Similar comments apply to their size: 26KB is insignificant for the kind of hardware we have nowadays. Space is no longer at a premium for the typical server/desktop, and distributions using coreutils do not even bother to share a single binary between false and true; it is simply deployed twice.
Focusing, however, on the real spirit of the question: why does something that should be so simple and small get so large?
The real distribution of the sections of /bin/true is as these charts show; the main code+data amounts to roughly 3KB out of a 26KB binary, which is about 12% of the size of /bin/true.
The true utility has indeed accumulated more cruft over the years, most notably the standard support for --version and --help.
However, that is not the (only) main reason for it being so big. Rather, while it is dynamically linked (using shared libs), it also has part of a generic library commonly used by coreutils binaries linked in as a static library. The metadata needed to build an ELF executable file also accounts for a significant part of the binary, since it is a relatively small file by today's standards.
The rest of the answer explains how we built the charts detailing the composition of the /bin/true executable binary file and how we arrived at that conclusion.
As @Maks says, the binary was compiled from C; as per my comment, it is also confirmed that it comes from coreutils. We are pointing directly to the author's git repository, https://github.com/wertarbyte/coreutils/blob/master/src/true.c, instead of the GNU git as @Maks did (same sources, different repositories; this repository was selected because it has the full source of the coreutils libraries).
We can see the various building blocks of the /bin/true binary here (64-bit Debian 9, from coreutils):
$ file /bin/true
/bin/true: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=9ae82394864538fa7b23b7f87b259ea2a20889c4, stripped
$ size /bin/true
text data bss dec hex filename
24583 1160 416 26159 662f true
Of those:
- text (usually code) is around 24KB
- data (initialised variables, mostly strings) is around 1KB
- bss (uninitialized data) is around 0.5KB
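The dec and hex columns printed by size are simply the sum of the other three columns, which can be checked with shell arithmetic (values copied from the output above; the variable names are illustrative):

```shell
# Columns from `size /bin/true`, in bytes.
text=24583
data=1160
bss=416

# dec/hex are the plain sum of the three.
dec=$(( text + data + bss ))
printf 'dec=%d hex=%x\n' "$dec" "$dec"
```

Note that .bss appears with type NOBITS in the readelf listing: it is reserved in memory at load time but occupies no bytes in the file itself.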
Of the 24KB, around 1KB is for fixing up the 58 dynamic symbols (external functions and objects).
That still leaves roughly 23KB for the rest of the code. We will show below that the actual main file, i.e. the main()+usage() code, is around 1KB compiled, and explain what the other 22KB are used for.
Drilling further down into the binary with readelf -S true, we can see that while the binary is 26159 bytes, the actual compiled code (.text) is 13017 bytes, and the rest is assorted data/initialisation code.
However, true.c is not the whole story, and 13KB seems quite excessive if it were only that file; we can see functions called in main() that are not listed among the external functions seen in the ELF with objdump -T true. These functions are present at:
- https://github.com/coreutils/gnulib/blob/master/lib/progname.c
- https://github.com/coreutils/gnulib/blob/master/lib/closeout.c
- https://github.com/coreutils/gnulib/blob/master/lib/version-etc.c
Those extra functions, called in main() but not linked externally, are:
- set_program_name()
- close_stdout()
- version_etc()
So my first suspicion was partly correct: whilst the binary uses dynamic libraries, the /bin/true binary is big *because it has some static libraries included with it* (but that is not the only cause).
Compiled C code is not usually so inefficient as to leave that much space unaccounted for, hence my initial suspicion that something was amiss.
The extra space, almost 90% of the size of the binary, is indeed extra libraries/ELF metadata.
Using Hopper to disassemble/decompile the binary and see where the functions are, it can be seen that the compiled code of the true.c usage() function is actually 833 bytes and that of the true.c main() function is 225 bytes, which together is 1058 bytes, or roughly 1KB. The logic for the version functions, which is buried in the static libraries, is around 1KB.
The actual compiled main()+usage()+version()+strings+vars use up only around 3KB to 3.5KB.
It is indeed ironic that such small and humble utilities have become bigger in size for the reasons explained above.
Related question: Understanding what a Linux binary is doing
true.c main() with the offending function calls:
int
main (int argc, char **argv)
{
  /* Recognize --help or --version only if it's the only command-line
     argument.  */
  if (argc == 2)
    {
      initialize_main (&argc, &argv);
      set_program_name (argv[0]);   /* <-- statically linked (gnulib) */
      setlocale (LC_ALL, "");
      bindtextdomain (PACKAGE, LOCALEDIR);
      textdomain (PACKAGE);
      atexit (close_stdout);        /* <-- statically linked (gnulib) */
      if (STREQ (argv[1], "--help"))
        usage (EXIT_STATUS);
      if (STREQ (argv[1], "--version"))
        version_etc (stdout, PROGRAM_NAME, PACKAGE_NAME, Version, AUTHORS,
                     (char *) NULL);  /* <-- statically linked (gnulib) */
    }
  exit (EXIT_STATUS);
}
The decimal size of the various sections of the binary:
$ size -A -t true
true :
section size addr
.interp 28 568
.note.ABI-tag 32 596
.note.gnu.build-id 36 628
.gnu.hash 60 664
.dynsym 1416 728
.dynstr 676 2144
.gnu.version 118 2820
.gnu.version_r 96 2944
.rela.dyn 624 3040
.rela.plt 1104 3664
.init 23 4768
.plt 752 4800
.plt.got 8 5552
.text 13017 5568
.fini 9 18588
.rodata 3104 18624
.eh_frame_hdr 572 21728
.eh_frame 2908 22304
.init_array 8 2125160
.fini_array 8 2125168
.jcr 8 2125176
.data.rel.ro 88 2125184
.dynamic 480 2125272
.got 48 2125752
.got.plt 392 2125824
.data 128 2126240
.bss 416 2126368
.gnu_debuglink 52 0
Total 26211
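To put the table in proportion, each section's size can be expressed as a share of the 26211-byte total. The helper below (pct is a name made up for this sketch) rounds to a whole percentage with plain shell integer arithmetic, using values copied from the table above.

```shell
total=26211    # "Total" line of `size -A -t true`

# Print a size as a rounded whole-number percentage of the total.
pct() { echo $(( ($1 * 100 + total / 2) / total )); }

echo ".text:              $(pct 13017)%"
echo ".rodata:            $(pct 3104)%"
echo ".eh_frame(+hdr):    $(pct $((572 + 2908)))%"
echo ".dynsym + .dynstr:  $(pct $((1416 + 676)))%"
echo ".rela.dyn + .plt:   $(pct $((624 + 1104)))%"
```

With these numbers .text comes out at about 50% of the total, and unwinding information, dynamic symbol tables and relocations together account for well over a quarter.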
Output of readelf -S true
$ readelf -S true
There are 30 section headers, starting at offset 0x7368:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000000254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000000274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000298 00000298
000000000000003c 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000000002d8 000002d8
0000000000000588 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 0000000000000860 00000860
00000000000002a4 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000000b04 00000b04
0000000000000076 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000000b80 00000b80
0000000000000060 0000000000000000 A 6 1 8
[ 9] .rela.dyn RELA 0000000000000be0 00000be0
0000000000000270 0000000000000018 A 5 0 8
[10] .rela.plt RELA 0000000000000e50 00000e50
0000000000000450 0000000000000018 AI 5 25 8
[11] .init PROGBITS 00000000000012a0 000012a0
0000000000000017 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000000012c0 000012c0
00000000000002f0 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 00000000000015b0 000015b0
0000000000000008 0000000000000000 AX 0 0 8
[14] .text PROGBITS 00000000000015c0 000015c0
00000000000032d9 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 000000000000489c 0000489c
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 00000000000048c0 000048c0
0000000000000c20 0000000000000000 A 0 0 32
[17] .eh_frame_hdr PROGBITS 00000000000054e0 000054e0
000000000000023c 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 0000000000005720 00005720
0000000000000b5c 0000000000000000 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000206d68 00006d68
0000000000000008 0000000000000008 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000206d70 00006d70
0000000000000008 0000000000000008 WA 0 0 8
[21] .jcr PROGBITS 0000000000206d78 00006d78
0000000000000008 0000000000000000 WA 0 0 8
[22] .data.rel.ro PROGBITS 0000000000206d80 00006d80
0000000000000058 0000000000000000 WA 0 0 32
[23] .dynamic DYNAMIC 0000000000206dd8 00006dd8
00000000000001e0 0000000000000010 WA 6 0 8
[24] .got PROGBITS 0000000000206fb8 00006fb8
0000000000000030 0000000000000008 WA 0 0 8
[25] .got.plt PROGBITS 0000000000207000 00007000
0000000000000188 0000000000000008 WA 0 0 8
[26] .data PROGBITS 00000000002071a0 000071a0
0000000000000080 0000000000000000 WA 0 0 32
[27] .bss NOBITS 0000000000207220 00007220
00000000000001a0 0000000000000000 WA 0 0 32
[28] .gnu_debuglink PROGBITS 0000000000000000 00007220
0000000000000034 0000000000000000 0 0 1
[29] .shstrtab STRTAB 0000000000000000 00007254
000000000000010f 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
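The hexadecimal Size and EntSize columns can be turned into byte counts and entry counts with shell arithmetic. The sketch below uses values copied from the listing above; the variable names are illustrative only.

```shell
# Size and EntSize values copied from the readelf -S output (hex).
text_bytes=$(( 0x32d9 ))              # .text
dynsym_entries=$(( 0x588 / 0x18 ))    # .dynsym: 24-byte symbol entries
relaplt_entries=$(( 0x450 / 0x18 ))   # .rela.plt: 24-byte relocations
plt_slots=$(( 0x2f0 / 0x10 ))         # .plt: 16-byte stubs

echo ".text is $text_bytes bytes"
echo ".dynsym holds $dynsym_entries entries"
echo ".rela.plt holds $relaplt_entries entries, .plt has $plt_slots slots"
```

The .text value agrees with the 13017 bytes reported by size -A. The .dynsym count minus the reserved null entry at index 0 is consistent with the 58 symbols that objdump -T lists, and the .plt slot count is consistent with one reserved header slot plus one stub per .rela.plt relocation.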
Output of objdump -T true
(external functions dynamically linked on run-time)
$ objdump -T true
true: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __uflow
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 getenv
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 free
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 abort
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __errno_location
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strncmp
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 _exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __fpending
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 textdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fclose
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 bindtextdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 dcgettext
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __ctype_get_mb_cur_max
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strlen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.4 __stack_chk_fail
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbrtowc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strrchr
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 lseek
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memset
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fscanf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 close
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memcmp
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fputs_unlocked
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 calloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strcmp
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.14 memcpy
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fileno
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 malloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fflush
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 nl_langinfo
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 ungetc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __freading
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 realloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fdopen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 setlocale
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __printf_chk
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 error
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 open
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fseeko
0000000000000000 w D *UND* 0000000000000000 _Jv_RegisterClasses
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_atexit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fwrite
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __fprintf_chk
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbsinit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 iswprint
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3 __ctype_b_loc
0000000000207228 g DO .bss 0000000000000008 GLIBC_2.2.5 stdout
0000000000207220 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname
0000000000207230 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_name
0000000000207230 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname_full
0000000000207220 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_short_name
0000000000207240 g DO .bss 0000000000000008 GLIBC_2.2.5 stderr
5
Having done some programming recently with a 64kB+2kB microcontroller, 28kB doesn't seem all that small..
– Barleyman
Jan 26 '18 at 16:49
1
@Barleyman you have OpenWRT, yocto, uClinux, uclib, busybox, microcoreutils, and other solutions for those kinds of environments. Edited the post with your concern.
– Rui F Ribeiro
Jan 27 '18 at 9:09
3
@Barleyman: If you were optimizing for binary executable size, you could implement true or false with a 45-byte x86 ELF executable, packing the executable code (4 x86 instructions) inside the ELF program header (without support for any command-line options!). See A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. (Or slightly larger if you want to avoid depending on Linux ELF loader implementation details :P)
– Peter Cordes
Jan 28 '18 at 23:39
3
Not really, no. Yocto for example can be crammed into less than a megabyte, which is leaps and bounds above 64kB. In this kind of device you may use an RTOS of some kind with rudimentary process / memory management, but even those can easily become too heavy. I wrote a simple cooperative multithreading system and used the built-in memory protection to protect code from being overwritten. All told, the firmware consumes some 55kB right now, so there is not much room for additional overhead. Those ginormous 2kB lookup tables..
– Barleyman
Jan 29 '18 at 0:09
2
@PeterCordes for sure, but you need a couple of orders of magnitude more resources before Linux becomes viable. For what it's worth, C++ doesn't really work in that environment either. Well, not the standard libraries anyway. Iostream is right out at around 200kB etc.
– Barleyman
Jan 29 '18 at 0:19
[25] .got.plt PROGBITS 0000000000207000 00007000
0000000000000188 0000000000000008 WA 0 0 8
[26] .data PROGBITS 00000000002071a0 000071a0
0000000000000080 0000000000000000 WA 0 0 32
[27] .bss NOBITS 0000000000207220 00007220
00000000000001a0 0000000000000000 WA 0 0 32
[28] .gnu_debuglink PROGBITS 0000000000000000 00007220
0000000000000034 0000000000000000 0 0 1
[29] .shstrtab STRTAB 0000000000000000 00007254
000000000000010f 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
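The `readelf -S` listing gives sizes in hexadecimal, so it helps to convert them before comparing with the decimal table from `size`. Below is a minimal sketch that assumes the two-line-per-entry layout shown above for a 64-bit binary; the function names are hypothetical.

```python
import re

# First line of an entry: "[14] .text PROGBITS <16-digit addr> <8-digit offset>"
ENTRY = re.compile(
    r"\[\s*(\d+)\]\s+(\S+)\s+(\S+)\s+([0-9a-f]{16})\s+([0-9a-f]{8})\s*$")


def parse_readelf_sections(text):
    """Return a list of dicts with name, type, offset and size (as ints)."""
    sections = []
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        match = ENTRY.search(line)
        if not match:
            continue  # column headers, the key to flags, the nameless entry 0
        # The second line of an entry starts with the size in hex.
        fields = lines[i + 1].split()
        if not fields:
            continue
        sections.append({
            "name": match.group(2),
            "type": match.group(3),
            "offset": int(match.group(5), 16),
            "size": int(fields[0], 16),
        })
    return sections


def file_bytes(section):
    """NOBITS sections such as .bss take up no space in the file itself."""
    return 0 if section["type"] == "NOBITS" else section["size"]


def end_of_section_headers(table_offset, count, entry_size=64):
    """End offset of the section header table (64-byte entries in ELF64)."""
    return table_offset + count * entry_size
```

For example, the `.text` size `0x32d9` converts to the 13017 bytes reported by `size`, and `end_of_section_headers(0x7368, 30)` gives 31464, which is what `ls -l` should report for the file if nothing follows the section header table. In other words, the 30 section headers alone account for 1920 bytes.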
Output of objdump -T true
(external functions dynamically linked at run time)
$ objdump -T true
true: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __uflow
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 getenv
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 free
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 abort
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __errno_location
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strncmp
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 _exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __fpending
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 textdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fclose
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 bindtextdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 dcgettext
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __ctype_get_mb_cur_max
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strlen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.4 __stack_chk_fail
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbrtowc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strrchr
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 lseek
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memset
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fscanf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 close
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memcmp
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fputs_unlocked
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 calloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strcmp
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.14 memcpy
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fileno
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 malloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fflush
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 nl_langinfo
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 ungetc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __freading
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 realloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fdopen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 setlocale
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __printf_chk
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 error
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 open
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fseeko
0000000000000000 w D *UND* 0000000000000000 _Jv_RegisterClasses
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_atexit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fwrite
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __fprintf_chk
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbsinit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 iswprint
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3 __ctype_b_loc
0000000000207228 g DO .bss 0000000000000008 GLIBC_2.2.5 stdout
0000000000207220 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname
0000000000207230 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_name
0000000000207230 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname_full
0000000000207220 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_short_name
0000000000207240 g DO .bss 0000000000000008 GLIBC_2.2.5 stderr
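The dynamic symbol table above mixes undefined functions, weak references and a few data objects. A rough way to tally them is sketched below; it assumes the usual `objdump -T` row layout (address, flag characters, section, size, optional version, name), and the function name is my own.

```python
from collections import Counter


def classify_dynamic_symbols(text):
    """Tally `objdump -T` rows by kind, e.g. 'undefined function'."""
    counts = Counter()
    for line in text.splitlines():
        tokens = line.split()
        # Symbol rows start with a 16-digit hex address on 64-bit targets.
        if len(tokens) < 4 or len(tokens[0]) != 16:
            continue
        try:
            int(tokens[0], 16)
        except ValueError:
            continue
        # The section is the first token that looks like ".bss" or "*UND*".
        section_index = next(
            (i for i, tok in enumerate(tokens[1:], start=1)
             if tok.startswith((".", "*"))), None)
        if section_index is None:
            continue
        # Everything between the address and the section is flag characters.
        flags = "".join(tokens[1:section_index])
        undefined = tokens[section_index] == "*UND*"
        if "F" in flags:
            kind = "function"
        elif "O" in flags:
            kind = "object"
        else:
            kind = "untyped"
        counts[("undefined " if undefined else "defined ") + kind] += 1
    return counts
```

By my count, running this over the listing above yields 48 undefined functions, 4 untyped weak references (`__gmon_start__`, `_Jv_RegisterClasses` and the two `_ITM_*` symbols) and 6 data objects in `.bss`, 58 entries in all. That matches the `.dynsym` size of 1416 bytes: 59 entries of 24 bytes each, one of which is the reserved null entry.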
5
Having done some programming recently with a 64kB+2kB microcontroller, 28kB doesn't seem all that small..
– Barleyman
Jan 26 '18 at 16:49
1
@Barleyman you have OpenWRT, yocto, uClinux, uclib, busybox, microcoreutils, and other solutions for that kind of environments. Edited the post with your concern.
– Rui F Ribeiro
Jan 27 '18 at 9:09
3
@Barleyman: If you were optimizing for binary executable size, you can implement true or false with a 45-byte x86 ELF executable, packing the executable code (4 x86 instructions) inside the ELF program header (without support for any command-line options!). A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. (Or slightly larger if you want to avoid depending on Linux ELF loader implementation details :P)
– Peter Cordes
Jan 28 '18 at 23:39
3
Not really, no. Yocto for example can be crammed into less than a megabyte which is heaps and bounds above 64kB.. In this kind of device you may use RTOS of some kind with rudimentary process / memory management but even those can easily become too heavy. I wrote a simple cooperative multithreading system and used the built in memory protection to protect code from being overwritten. All told the firmware consumes some 55kB right now so not too much room there for additional overhead. Those ginormous 2kB look up tables..
– Barleyman
Jan 29 '18 at 0:09
2
@PeterCordes for sure but you need couple of magnitudes of more resources before Linux becomes viable. For what it's worth, C++ doesn't really work in that environment either. Well, not the standard libraries anyways. Iostream is right out at around 200kB etc.
– Barleyman
Jan 29 '18 at 0:19
Introductory notes:
In the past, /bin/true
and /bin/false
in the shell were actually scripts.
For instance, in a PDP/11 Unix System 7:
$ ls -la /bin/true /bin/false
-rwxr-xr-x 1 bin 7 Jun 8 1979 /bin/false
-rwxr-xr-x 1 bin 0 Jun 8 1979 /bin/true
$
$ cat /bin/false
exit 1
$
$ cat /bin/true
$
Nowadays, at least in bash
, the true
and false
commands are implemented as shell built-in commands. Thus no executable binary files are invoked by default, both when using the false
and true
directives in the bash
command line and inside shell scripts.
From the bash
source, builtins/mkbuiltins.c
:
char *posix_builtins =
{
"alias", "bg", "cd", "command", "**false**", "fc", "fg", "getopts", "jobs",
"kill", "newgrp", "pwd", "read", "**true**", "umask", "unalias", "wait",
(char *)NULL
};
Also per @meuh comments:
$ command -V true false
true is a shell builtin
false is a shell builtin
So it can be said with a high degree of certainty the true
and false
executable files exist mainly for being called from other programs.
From now on, the answer will focus on the /bin/true
binary from the coreutils
package in Debian 9 / 64 bits. (/usr/bin/true
running RedHat. RedHat and Debian use both the coreutils
package, analysed the compiled version of the latter having it more at hand).
As it can be seen in the source file false.c
, /bin/false
is compiled with (almost) the same source code as /bin/true
, just returning EXIT_FAILURE (1) instead, so this answer can be applied for both binaries.
#define EXIT_STATUS EXIT_FAILURE
#include "true.c"
As it also can be confirmed by both executables having the same size:
$ ls -l /bin/true /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22 2017 /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22 2017 /bin/true
Alas, the direct question to the answer why are true and false so large?
could be, because there are not anymore so pressing reasons to care about their top performance. They are not essential to bash
performance, not being used anymore by bash
(scripting).
Similar comments apply to their size, 26KB for the kind of hardware we have nowadays is insignificant. Space is not at premium for the typical server/desktop anymore, and they do not even bother anymore to use the same binary for false
and true
, as it is just deployed twice in distributions using coreutils
.
Focusing, however, in the real spirit of the question, why something that should be so simple and small, gets so large?
The real distribution of the sections of /bin/true
is as these charts shows; the main code+data amounts to roughly 3KB out of a 26KB binary, which amounts to 12% of the size of /bin/true
.
The true
utility got indeed more cruft code over the years, most notably the standard support for --version
and --help
.
However, that it is not the (only) main justification for it being so big, but rather, while being dynamically linked (using shared libs), also having part of a generic library commonly used by coreutils
binaries linked as a static library. The metada for building an elf
executable file also amounts for a significant part of the binary, being it a relatively small file by today´s standards.
The rest of the answer is for explaining how we got to build the following charts detailing the composition of the /bin/true
executable binary file and how we arrived to that conclusion.
As @Maks says, the binary was compiled from C; as per my comment also, it is also confirmed it is from coreutils. We are pointing directly to the author(s) git https://github.com/wertarbyte/coreutils/blob/master/src/true.c, instead of the gnu git as @Maks (same sources, different repositories - this repository was selected as it has the full source of the coreutils
libraries)
We can see the various building blocks of the /bin/true
binary here (Debian 9 - 64 bits from coreutils
):
$ file /bin/true
/bin/true: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=9ae82394864538fa7b23b7f87b259ea2a20889c4, stripped
$ size /bin/true
text data bss dec hex filename
24583 1160 416 26159 662f true
Of those:
- text (usually code) is around 24KB
- data (initialised variables, mostly strings) are around 1KB
- bss (uninitialized data) 0.5KB
Of the 24KB, around 1KB is for fixing up the 58 external functions.
That still leaves around roughly 23KB for rest of the code. We will show down bellow that the actual main file - main()+usage() code is around 1KB compiled, and explain what the other 22KB are used for.
Drilling further down the binary with readelf -S true
, we can see that while the binary is 26159 bytes, the actual compiled code is 13017 bytes, and the rest is assorted data/initialisation code.
However, true.c
is not the whole story and 13KB seems pretty much excessive if it were only that file; we can see functions called in main()
that are not listed in the external functions seen in the elf with objdump -T true
; functions that are present at:
https://github.com/coreutils/gnulib/blob/master/lib/progname.c- https://github.com/coreutils/gnulib/blob/master/lib/closeout.c
- https://github.com/coreutils/gnulib/blob/master/lib/version-etc.c
Those extra functions not linked externally in main()
are:
- set_program_name()
- close_stdout()
- version_etc()
So my first suspicion was partly correct, whilst the library is using dynamic libraries, the /bin/true
binary is big *because it has some static libraries included with it* (but that is not the only cause).
Compiling C code is not usually that inefficient for having such space unaccounted for, hence my initial suspicion something was amiss.
The extra space, almost 90% of the size of the binary, is indeed extra libraries/elf metadata.
While using Hopper for disassembling/decompiling the binary to understand where functions are, it can be seen the compiled binary code of true.c/usage() function is actually 833 bytes, and of the true.c/main() function is 225 bytes, which is roughly slightly less than 1KB. The logic for version functions, which is buried in the static libraries, is around 1KB.
The actual compiled main()+usage()+version()+strings+vars are only using up around 3KB to 3.5KB.
It is indeed ironic, such small and humble utilities have became bigger in size for the reasons explained above.
related question: Understanding what a Linux binary is doing
true.c
main() with the offending function calls:
int
main (int argc, char **argv)
{
/* Recognize --help or --version only if it's the only command-line
argument. */
if (argc == 2)
{
initialize_main (&argc, &argv);
set_program_name (argv[0]); <-----------
setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
atexit (close_stdout); <-----
if (STREQ (argv[1], "--help"))
usage (EXIT_STATUS);
if (STREQ (argv[1], "--version"))
version_etc (stdout, PROGRAM_NAME, PACKAGE_NAME, Version, AUTHORS, <------
(char *) NULL);
}
exit (EXIT_STATUS);
}
The decimal size of the various sections of the binary:
$ size -A -t true
true :
section size addr
.interp 28 568
.note.ABI-tag 32 596
.note.gnu.build-id 36 628
.gnu.hash 60 664
.dynsym 1416 728
.dynstr 676 2144
.gnu.version 118 2820
.gnu.version_r 96 2944
.rela.dyn 624 3040
.rela.plt 1104 3664
.init 23 4768
.plt 752 4800
.plt.got 8 5552
.text 13017 5568
.fini 9 18588
.rodata 3104 18624
.eh_frame_hdr 572 21728
.eh_frame 2908 22304
.init_array 8 2125160
.fini_array 8 2125168
.jcr 8 2125176
.data.rel.ro 88 2125184
.dynamic 480 2125272
.got 48 2125752
.got.plt 392 2125824
.data 128 2126240
.bss 416 2126368
.gnu_debuglink 52 0
Total 26211
Output of readelf -S true
$ readelf -S true
There are 30 section headers, starting at offset 0x7368:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000000254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000000274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000298 00000298
000000000000003c 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000000002d8 000002d8
0000000000000588 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 0000000000000860 00000860
00000000000002a4 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000000b04 00000b04
0000000000000076 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000000b80 00000b80
0000000000000060 0000000000000000 A 6 1 8
[ 9] .rela.dyn RELA 0000000000000be0 00000be0
0000000000000270 0000000000000018 A 5 0 8
[10] .rela.plt RELA 0000000000000e50 00000e50
0000000000000450 0000000000000018 AI 5 25 8
[11] .init PROGBITS 00000000000012a0 000012a0
0000000000000017 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000000012c0 000012c0
00000000000002f0 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 00000000000015b0 000015b0
0000000000000008 0000000000000000 AX 0 0 8
[14] .text PROGBITS 00000000000015c0 000015c0
00000000000032d9 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 000000000000489c 0000489c
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 00000000000048c0 000048c0
0000000000000c20 0000000000000000 A 0 0 32
[17] .eh_frame_hdr PROGBITS 00000000000054e0 000054e0
000000000000023c 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 0000000000005720 00005720
0000000000000b5c 0000000000000000 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000206d68 00006d68
0000000000000008 0000000000000008 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000206d70 00006d70
0000000000000008 0000000000000008 WA 0 0 8
[21] .jcr PROGBITS 0000000000206d78 00006d78
0000000000000008 0000000000000000 WA 0 0 8
[22] .data.rel.ro PROGBITS 0000000000206d80 00006d80
0000000000000058 0000000000000000 WA 0 0 32
[23] .dynamic DYNAMIC 0000000000206dd8 00006dd8
00000000000001e0 0000000000000010 WA 6 0 8
[24] .got PROGBITS 0000000000206fb8 00006fb8
0000000000000030 0000000000000008 WA 0 0 8
[25] .got.plt PROGBITS 0000000000207000 00007000
0000000000000188 0000000000000008 WA 0 0 8
[26] .data PROGBITS 00000000002071a0 000071a0
0000000000000080 0000000000000000 WA 0 0 32
[27] .bss NOBITS 0000000000207220 00007220
00000000000001a0 0000000000000000 WA 0 0 32
[28] .gnu_debuglink PROGBITS 0000000000000000 00007220
0000000000000034 0000000000000000 0 0 1
[29] .shstrtab STRTAB 0000000000000000 00007254
000000000000010f 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
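readelf prints sizes and offsets in hex, so a quick way to compare them with the decimal listing is shell arithmetic. The sketch below also estimates the file size, under the assumption that the section header table is the last thing in the file (usual for GNU ld output, but not guaranteed) and that each entry is 64 bytes, the size of an Elf64_Shdr. The variable names are only illustrative.

```shell
# Hex values copied from the readelf -S output above.
text_size=$(( 0x32d9 ))      # Size of .text
dynsym_size=$(( 0x588 ))     # Size of .dynsym
shoff=$(( 0x7368 ))          # "starting at offset 0x7368"
shnum=30                     # "There are 30 section headers"
shentsize=64                 # sizeof(Elf64_Shdr)

# Assumption: nothing follows the section header table in the file.
file_size=$(( shoff + shnum * shentsize ))

echo ".text=$text_size .dynsym=$dynsym_size estimated file size=$file_size"
```

The two section sizes match the decimal table. The estimate applies to the binary examined in this answer; it differs from the 28920 bytes reported in the question, which suggests a different build.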
Output of objdump -T true
(external functions dynamically linked on run-time)
$ objdump -T true
true: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __uflow
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 getenv
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 free
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 abort
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __errno_location
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strncmp
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 _exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __fpending
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 textdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fclose
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 bindtextdomain
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 dcgettext
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __ctype_get_mb_cur_max
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strlen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.4 __stack_chk_fail
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbrtowc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strrchr
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 lseek
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memset
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fscanf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 close
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memcmp
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fputs_unlocked
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 calloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 strcmp
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.14 memcpy
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fileno
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 malloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fflush
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 nl_langinfo
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 ungetc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __freading
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 realloc
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fdopen
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 setlocale
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __printf_chk
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 error
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 open
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fseeko
0000000000000000 w D *UND* 0000000000000000 _Jv_RegisterClasses
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_atexit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 exit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fwrite
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.4 __fprintf_chk
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mbsinit
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 iswprint
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3 __ctype_b_loc
0000000000207228 g DO .bss 0000000000000008 GLIBC_2.2.5 stdout
0000000000207220 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname
0000000000207230 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_name
0000000000207230 g DO .bss 0000000000000008 GLIBC_2.2.5 __progname_full
0000000000207220 w DO .bss 0000000000000008 GLIBC_2.2.5 program_invocation_short_name
0000000000207240 g DO .bss 0000000000000008 GLIBC_2.2.5 stderr
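The number of dynamic symbols and PLT entries can be cross-checked against the section table by dividing each Size by its EntSize. A rough sketch, with the hex values taken from the readelf -S output and illustrative variable names:

```shell
# Size / EntSize, both in hex as printed by readelf -S.
dynsym_entries=$(( 0x588 / 0x18 ))   # .dynsym, includes the reserved null symbol at index 0
plt_relocs=$(( 0x450 / 0x18 ))       # .rela.plt, one relocation per function called through the PLT
plt_slots=$(( 0x2f0 / 0x10 ))        # .plt, including the resolver stub in the first slot

echo "dynsym=$dynsym_entries rela.plt=$plt_relocs plt=$plt_slots"
```

59 symbol table entries minus the null entry gives the 58 symbols that objdump -T lists, and the 46 PLT relocations correspond to the libc functions the binary calls through the PLT.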
edited Feb 7 at 19:12
answered Jan 25 '18 at 20:50
Rui F Ribeiro
5
Having done some programming recently with a 64kB+2kB microcontroller, 28kB doesn't seem all that small..
– Barleyman
Jan 26 '18 at 16:49
1
@Barleyman you have OpenWRT, Yocto, uClinux, uClibc, BusyBox, microcoreutils, and other solutions for that kind of environment. Edited the post with your concern.
– Rui F Ribeiro
Jan 27 '18 at 9:09
3
@Barleyman: If you were optimizing for binary executable size, you can implement true or false with a 45-byte x86 ELF executable, packing the executable code (4 x86 instructions) inside the ELF program header (without support for any command-line options!). See A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. (Or slightly larger if you want to avoid depending on Linux ELF loader implementation details :P)
– Peter Cordes
Jan 28 '18 at 23:39
3
Not really, no. Yocto for example can be crammed into less than a megabyte, which is leaps and bounds above 64kB. In this kind of device you may use an RTOS of some kind with rudimentary process / memory management, but even those can easily become too heavy. I wrote a simple cooperative multithreading system and used the built-in memory protection to protect code from being overwritten. All told the firmware consumes some 55kB right now, so not too much room there for additional overhead. Those ginormous 2kB lookup tables..
– Barleyman
Jan 29 '18 at 0:09
2
@PeterCordes for sure, but you need a couple of orders of magnitude more resources before Linux becomes viable. For what it's worth, C++ doesn't really work in that environment either. Well, not the standard libraries anyway. Iostream is right out at around 200kB etc.
– Barleyman
Jan 29 '18 at 0:19
The implementation probably comes from GNU coreutils. These binaries are compiled from C, and no particular effort has been made to make them smaller than they are by default.
You could try compiling the trivial implementation of true yourself, and you'll notice it's already a few KB in size. For example, on my system:
$ echo 'int main() { return 0; }' | gcc -xc - -o true
$ wc -c true
8136 true
Of course, your binaries are even bigger. That's because they also support command-line arguments. Try running /usr/bin/true --help or /usr/bin/true --version.
In addition to the string data, the binary includes logic to parse command-line flags, etc. That adds up to about 20 KB of code, apparently.
For reference, you can find the source code here: http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/true.c
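One thing to watch when trying the options: in bash and most other shells a bare true is the shell builtin, which ignores its arguments, so --help and --version only do something when the binary is called by its full path. A small sketch of the difference (the /usr/bin/true path is the one from the question and may differ on other systems; builtin_out is just an illustrative name):

```shell
# A bare `true` runs the shell builtin, which accepts and ignores any arguments.
builtin_out=$(true --version)
echo "builtin printed ${#builtin_out} characters"

# The external binary is reached through its path; GNU coreutils answers --version.
if [ -x /usr/bin/true ]; then
    /usr/bin/true --version | head -n 1
fi
```

On a system with GNU coreutils the second command should print a version banner, while the builtin prints nothing at all.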
answered Jan 25 '18 at 20:29
Maks Verver
2
FYI I was complaining about these coreutils implementations on their bug tracker, but no chance to get it fixed lists.gnu.org/archive/html/bug-coreutils/2016-03/msg00040.html
– rudimeier
Jan 25 '18 at 21:43
6
It is not the logic for arguments; C is not that inefficient. It is the inlined libraries and housekeeping tasks. Have a look at my answer for the gory details.
– Rui F Ribeiro
Jan 25 '18 at 21:59
7
This is misleading because it suggests that compiled machine code (from C or otherwise) is what takes the huge amount of space - the actual size overhead has more to do with massive amounts of standard C library/runtime boilerplate that gets inlined by the compiler in order to interoperate with the C library (glibc, unless you've heard that your system uses something else, probably), and, to a lesser extent, ELF headers/metadata (a lot of which are not strictly necessary, but deemed worthwhile enough to include in default builds).
– mtraceur
Jan 25 '18 at 22:10
2
The actual main()+usage()+strings on both functions are around 2KB, not 20KB.
– Rui F Ribeiro
Jan 25 '18 at 23:11
2
@JdeBP logic for the --version/version functions is 1KB, --usage/--help 833 bytes, main() 225 bytes, and the whole static data of the binary is 1KB
– Rui F Ribeiro
Jan 26 '18 at 9:04
Stripping them down to core functionality and writing in assembler yields far smaller binaries.
The original true/false binaries are written in C, which by its nature pulls in various library + symbol references. If you run readelf -a /bin/true this is quite noticeable.
352 bytes for a stripped ELF static executable (with room to save a couple bytes by optimizing the asm for code-size).
$ more true.asm false.asm
::::::::::::::
true.asm
::::::::::::::
global _start
_start:
mov ebx,0
mov eax,1 ; SYS_exit from asm/unistd_32.h
int 0x80 ; The 32-bit ABI is supported in 64-bit code, in kernels compiled with IA-32 emulation
::::::::::::::
false.asm
::::::::::::::
global _start
_start:
mov ebx,1
mov eax,1
int 0x80
$ nasm -f elf64 true.asm && ld -s -o true true.o # -s means strip
$ nasm -f elf64 false.asm && ld -s -o false false.o
$ ll true false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 true
$ ./true ; echo $?
0
$ ./false ; echo $?
1
$
Or, with a bit of a nasty/ingenious approach (kudos to stalkr), create your own ELF headers, getting it down to 127 bytes (an earlier version was 132). We're entering Code Golf territory here.
$ cat true2.asm
BITS 64
org 0x400000 ; load address; the 120 bytes of ELF headers come first, so _start lands at 0x400078
ehdr: ; Elf64_Ehdr
db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 0x3e ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf64_Phdr
dd 1 ; p_type
dd 5 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
xor edi,edi ; int status = 0
; or mov dil,1 for false: high bytes are ignored.
lea eax, [rdi+60] ; rax = 60 = SYS_exit, using a 3-byte instruction: base+disp8 addressing mode
syscall ; native 64-bit system call, works without CONFIG_IA32_EMULATION
; less-golfed version:
; mov edi, 1 ; for false
; mov eax,231 ; SYS_exit_group from asm/unistd_64.h
; syscall
filesize equ $ - $$ ; used earlier in some ELF header fields
$ nasm -f bin -o true2 true2.asm
$ ll true2
-rw-r--r-- 1 peter peter 127 Jan 28 20:08 true2
$ chmod +x true2 ; ./true2 ; echo $?
0
$
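If nasm is not at hand, the same 127-byte image can be written out directly with printf. The sketch below is a hand-assembled transcription of the true2.asm listing above, not something taken from the original answer: the octal escapes, the entry address 0x400078 (0x400000 plus 64 bytes of ELF header and 56 bytes of program header) and the output name true3 are all worked out or chosen here, so treat them as assumptions to double-check.

```shell
out=true3   # hypothetical output file name
{
  # Elf64_Ehdr (64 bytes)
  printf '\177ELF\002\001\001\000'            # e_ident: magic, 64-bit, little-endian, version 1, ABI 0
  printf '\000\000\000\000\000\000\000\000'   # e_ident padding
  printf '\002\000\076\000\001\000\000\000'   # e_type=2, e_machine=0x3e, e_version=1
  printf '\170\000\100\000\000\000\000\000'   # e_entry = 0x400078 (_start)
  printf '\100\000\000\000\000\000\000\000'   # e_phoff = 64
  printf '\000\000\000\000\000\000\000\000'   # e_shoff = 0
  printf '\000\000\000\000\100\000\070\000'   # e_flags=0, e_ehsize=64, e_phentsize=56
  printf '\001\000\000\000\000\000\000\000'   # e_phnum=1, e_shentsize=0, e_shnum=0, e_shstrndx=0
  # Elf64_Phdr (56 bytes)
  printf '\001\000\000\000\005\000\000\000'   # p_type=1 (PT_LOAD), p_flags=5 (read+execute)
  printf '\000\000\000\000\000\000\000\000'   # p_offset = 0
  printf '\000\000\100\000\000\000\000\000'   # p_vaddr = 0x400000
  printf '\000\000\100\000\000\000\000\000'   # p_paddr = 0x400000
  printf '\177\000\000\000\000\000\000\000'   # p_filesz = 127
  printf '\177\000\000\000\000\000\000\000'   # p_memsz = 127
  printf '\000\020\000\000\000\000\000\000'   # p_align = 0x1000
  # Code (7 bytes): xor edi,edi ; lea eax,[rdi+60] ; syscall
  printf '\061\377\215\107\074\017\005'
} > "$out"

size=$(( $(wc -c < "$out") ))   # arithmetic expansion strips any padding wc adds
echo "$out is $size bytes"
```

On an x86-64 Linux machine, chmod +x true3 ; ./true3 ; echo $? should then behave like the nasm-built true2. The script itself only writes the file and reports its size, so it can be run anywhere.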
2
Comments are not for extended discussion; this conversation has been moved to chat.
– terdon♦
Jan 28 '18 at 16:27
2
Also see this excellent write-up: muppetlabs.com/~breadbox/software/tiny/teensy.html
– mic_e
Jan 28 '18 at 21:38
3
You're using the int 0x80 32-bit ABI in a 64-bit executable, which is unusual but supported. Using syscall wouldn't save you anything. The high bytes of ebx are ignored, so you could use 2-byte mov bl,1. Or of course xor ebx,ebx for zero. Linux inits integer registers to zero, so you could just inc eax to get 1 = __NR_exit (i386 ABI).
– Peter Cordes
Jan 28 '18 at 23:48
1
I updated the code on your golfed example to use the 64-bit ABI, and golfed it down to 127 bytes for true. (I don't see an easy way to manage less than 128 bytes for false, though, other than using the 32-bit ABI or taking advantage of the fact that Linux zeros registers on process startup, so mov al,231 (2 bytes) works. push imm8 / pop rdi would also work instead of lea for setting edi=1, but we still can't beat the 32-bit ABI where we could mov bl,1 without a REX prefix.)
– Peter Cordes
Jan 29 '18 at 0:17
add a comment |
Stripping them down to core functionality and writing in assembler yields far smaller binaries.
Original true/false binaries are written in C, which by its nature pulls in various library + symbol references. If you run readelf -a /bin/true
this is quite noticeable.
352 bytes for a stripped ELF static executable (with room to save a couple bytes by optimizing the asm for code-size).
$ more true.asm false.asm
::::::::::::::
true.asm
::::::::::::::
global _start
_start:
mov ebx,0
mov eax,1 ; SYS_exit from asm/unistd_32.h
int 0x80 ; The 32-bit ABI is supported in 64-bit code, in kernels compiled with IA-32 emulation
::::::::::::::
false.asm
::::::::::::::
global _start
_start:
mov ebx,1
mov eax,1
int 0x80
$ nasm -f elf64 true.asm && ld -s -o true true.o # -s means strip
$ nasm -f elf64 false.asm && ld -s -o false false.o
$ ll true false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 true
$ ./true ; echo $?
0
$ ./false ; echo $?
1
$
Or, with a bit of a nasty/ingenious approach (kudos to stalkr), create your own ELF headers, getting it down to 132 127 bytes. We're entering Code Golf territory here.
$ cat true2.asm
BITS 64
org 0x400000 ; _start is at 0x400080 as usual, but the ELF headers come first
ehdr: ; Elf64_Ehdr
db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 0x3e ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf64_Phdr
dd 1 ; p_type
dd 5 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
xor edi,edi ; int status = 0
; or mov dil,1 for false: high bytes are ignored.
lea eax, [rdi+60] ; rax = 60 = SYS_exit, using a 3-byte instruction: base+disp8 addressing mode
syscall ; native 64-bit system call, works without CONFIG_IA32_EMULATION
; less-golfed version:
; mov edi, 1 ; for false
; mov eax,252 ; SYS_exit_group from asm/unistd_64.h
; syscall
filesize equ $ - $$ ; used earlier in some ELF header fields
$ nasm -f bin -o true2 true2.asm
$ ll true2
-rw-r--r-- 1 peter peter 127 Jan 28 20:08 true2
$ chmod +x true2 ; ./true2 ; echo $?
0
$
2
Comments are not for extended discussion; this conversation has been moved to chat.
– terdon♦
Jan 28 '18 at 16:27
2
Also see this excellent write-up: muppetlabs.com/~breadbox/software/tiny/teensy.html
– mic_e
Jan 28 '18 at 21:38
3
You're using theint 0x80
32-bit ABI in a 64-bit executable, which is unusual but supported. Usingsyscall
wouldn't save you anything. The high bytes ofebx
are ignored, so you could use 2-bytemov bl,1
. Or of coursexor ebx,ebx
for zero. Linux inits integer registers to zero, so you could justinc eax
to get 1 = __NR_exit (i386 ABI).
– Peter Cordes
Jan 28 '18 at 23:48
1
I updated the code on your golfed example to use the 64-bit ABI, and golf it down to 127 bytes fortrue
. (I don't see an easy way to manage less than 128 bytes forfalse
, though, other than using the 32-bit ABI or taking advantage of the fact that Linux zeros registers on process startup, somov al,252
(2 bytes) works.push imm8
/pop rdi
would also work instead oflea
for settingedi=1
, but we still can't beat the 32-bit ABI where we couldmov bl,1
without a REX prefix.
– Peter Cordes
Jan 29 '18 at 0:17
add a comment |
Stripping them down to core functionality and writing in assembler yields far smaller binaries.
Original true/false binaries are written in C, which by its nature pulls in various library + symbol references. If you run readelf -a /bin/true
this is quite noticeable.
352 bytes for a stripped ELF static executable (with room to save a couple bytes by optimizing the asm for code-size).
$ more true.asm false.asm
::::::::::::::
true.asm
::::::::::::::
global _start
_start:
mov ebx,0
mov eax,1 ; SYS_exit from asm/unistd_32.h
int 0x80 ; The 32-bit ABI is supported in 64-bit code, in kernels compiled with IA-32 emulation
::::::::::::::
false.asm
::::::::::::::
global _start
_start:
mov ebx,1
mov eax,1
int 0x80
$ nasm -f elf64 true.asm && ld -s -o true true.o # -s means strip
$ nasm -f elf64 false.asm && ld -s -o false false.o
$ ll true false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 true
$ ./true ; echo $?
0
$ ./false ; echo $?
1
$
Or, with a bit of a nasty/ingenious approach (kudos to stalkr), create your own ELF headers, getting it down to 132 127 bytes. We're entering Code Golf territory here.
$ cat true2.asm
BITS 64
org 0x400000 ; _start is at 0x400080 as usual, but the ELF headers come first
ehdr: ; Elf64_Ehdr
db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 0x3e ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf64_Phdr
dd 1 ; p_type
dd 5 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
xor edi,edi ; int status = 0
; or mov dil,1 for false: high bytes are ignored.
lea eax, [rdi+60] ; rax = 60 = SYS_exit, using a 3-byte instruction: base+disp8 addressing mode
syscall ; native 64-bit system call, works without CONFIG_IA32_EMULATION
; less-golfed version:
; mov edi, 1 ; for false
; mov eax,231 ; SYS_exit_group from asm/unistd_64.h (252 is the i386 number)
; syscall
filesize equ $ - $$ ; used earlier in some ELF header fields
$ nasm -f bin -o true2 true2.asm
$ ll true2
-rw-r--r-- 1 peter peter 127 Jan 28 20:08 true2
$ chmod +x true2 ; ./true2 ; echo $?
0
$
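The 127-byte figure can be checked by hand. The following is a small sketch of the arithmetic in shell, assuming the standard Elf64_Ehdr and Elf64_Phdr field widths and the usual x86-64 encodings of the three instructions (the variable names are just illustrative):

```shell
# Elf64_Ehdr: e_ident (16), e_type (2), e_machine (2), e_version (4),
# e_entry (8), e_phoff (8), e_shoff (8), e_flags (4), then six 2-byte fields
ehdr=$((16 + 2 + 2 + 4 + 8 + 8 + 8 + 4 + 6 * 2))

# Elf64_Phdr: p_type (4), p_flags (4), then six 8-byte fields
phdr=$((4 + 4 + 6 * 8))

headers=$((ehdr + phdr))

# true:  xor edi,edi (2 bytes) + lea eax,[rdi+60] (3 bytes) + syscall (2 bytes)
size_true=$((headers + 2 + 3 + 2))

# false: mov dil,1 needs a REX prefix, so it takes 3 bytes instead of 2
size_false=$((headers + 3 + 3 + 2))

echo "headers: $headers bytes"
echo "true:    $size_true bytes"
echo "false:   $size_false bytes"
```

So nearly all of the file is the two mandatory headers; the code itself is 7 bytes for true and 8 for false.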
edited Feb 2 '18 at 17:21
Rui F Ribeiro
answered Jan 25 '18 at 21:05
steve
2
Comments are not for extended discussion; this conversation has been moved to chat.
– terdon♦
Jan 28 '18 at 16:27
2
Also see this excellent write-up: muppetlabs.com/~breadbox/software/tiny/teensy.html
– mic_e
Jan 28 '18 at 21:38
3
You're using the int 0x80 32-bit ABI in a 64-bit executable, which is unusual but supported. Using syscall wouldn't save you anything. The high bytes of ebx are ignored, so you could use 2-byte mov bl,1. Or of course xor ebx,ebx for zero. Linux inits integer registers to zero, so you could just inc eax to get 1 = __NR_exit (i386 ABI).
– Peter Cordes
Jan 28 '18 at 23:48
1
I updated the code on your golfed example to use the 64-bit ABI, and golf it down to 127 bytes for true. (I don't see an easy way to manage less than 128 bytes for false, though, other than using the 32-bit ABI or taking advantage of the fact that Linux zeros registers on process startup, so mov al,252 (2 bytes) works. push imm8 / pop rdi would also work instead of lea for setting edi=1, but we still can't beat the 32-bit ABI where we could mov bl,1 without a REX prefix.
– Peter Cordes
Jan 29 '18 at 0:17
l $(which true false)
-rwxr-xr-x 1 root root 27280 Mär 2 2017 /bin/false
-rwxr-xr-x 1 root root 27280 Mär 2 2017 /bin/true
Pretty big on my Ubuntu 16.04 too. Exactly the same size? What makes them so big?
strings $(which true)
(excerpt:)
Usage: %s [ignored command line arguments]
or: %s OPTION
Exit with a status code indicating success.
--help display this help and exit
--version output version information and exit
NOTE: your shell may have its own version of %s, which usually supersedes
the version described here. Please refer to your shell's documentation
for details about the options it supports.
http://www.gnu.org/software/coreutils/
Report %s translation bugs to <http://translationproject.org/team/>
Full documentation at: <%s%s>
or available locally via: info '(coreutils) %s%s'
Ah, there is help for true and false, so let's try it:
true --help
true --version
#
Nothing. Ah, there was this other line:
NOTE: your shell may have its own version of %s, which usually supersedes
the version described here.
So on my system, it's /bin/true, not /usr/bin/true
/bin/true --version
true (GNU coreutils) 8.25
Copyright © 2016 Free Software Foundation, Inc.
Lizenz GPLv3+: GNU GPL Version 3 oder höher <http://gnu.org/licenses/gpl.html>
Dies ist freie Software: Sie können sie ändern und weitergeben.
Es gibt keinerlei Garantien, soweit wie es das Gesetz erlaubt.
Geschrieben von Jim Meyering.
LANG=C /bin/true --version
true (GNU coreutils) 8.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Jim Meyering.
So there is help text, there is version information, and there is linkage to a library for internationalization. That explains much of the size, and most of the time the shell uses its own optimized builtin anyway.
That includes static libraries, and half of the binary's size is ELF metadata. See my answer.
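The builtin-versus-external behaviour described above can be reproduced with a short sketch. It assumes a shell in which true is a builtin (bash and dash both are) and that env and an external true are on the PATH:

```shell
# Ask the shell how it resolves the name; bash and dash report a builtin.
kind=$(command -V true)
echo "$kind"

# The builtin ignores its arguments, so --help prints nothing and still succeeds.
help_output=$(true --help 2>&1)
help_status=$?
echo "builtin --help printed ${#help_output} characters, exit status $help_status"

# Going through env bypasses the builtin and runs the external program instead.
env true
external_status=$?
echo "external exit status: $external_status"
```

With GNU coreutils installed, env true --version would print the version text shown above, because env always executes the external binary.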
– Rui F Ribeiro
Feb 18 '18 at 19:13
answered Feb 14 '18 at 13:22
user unknown
9
You should use command -V true, not which. It will output: true is a shell builtin for bash.
– meuh
Jan 25 '18 at 20:53
32
true and false are builtins in every modern shell, but the system also includes external program versions of them because they are part of the standard system, so that programs invoking commands directly (bypassing the shell) can use them. which ignores builtins and looks up external commands only, which is why it only showed you the external ones. Try type -a true and type -a false instead.
– mtraceur
Jan 25 '18 at 22:15
73
It's ironic that you write such a long question to say "Why are true and false 29kb each? What's in the executable other than the return code?"
– David Richerby
Jan 25 '18 at 23:51
6
Some early versions of unix just had an empty file for true since that was a valid sh program that would return exit code 0. I really wish I could find an article I read years ago about the history of the true utility from an empty file to the monstrosity it is today, but all I could find is this: trillian.mit.edu/~jc/humor/ATT_Copyright_true.html
– Philip
Jan 26 '18 at 4:16
9
Obligatory - the smallest implementation of false: muppetlabs.com/~breadbox/software/tiny/teensy.html
– d33tah
Jan 26 '18 at 14:36