Is there a character to use as the `delim` of `read`, so that `read` reads an entire file at once?
In the bash manual, about the read builtin command:
-d delim
The first character of delim is used to terminate the input line, rather than newline.
Is it possible to specify a character as the delim of read, so that it never matches (unless it can match EOF, but is EOF even a character?) and read always reads an entire file at once?
Thanks.
bash read
A file does not end with an End of File character.
– steeldriver, Dec 29 '18 at 16:24
Can you motivate the question with a use-case? Why does a file's contents need to end up in a variable?
– Jeff Schaller, Dec 29 '18 at 19:49
asked Dec 29 '18 at 15:54 – Tim
3 Answers
Since bash can't store NUL bytes in its variables anyway, you can always do:
IFS= read -rd '' var < file
which will store the content of the file up to the first NUL byte, or the end of the file if the file has no NUL bytes (text files, by the POSIX definition at least, don't contain NUL bytes).
Another option is to store the content of the file as an array of its lines (including the line delimiter, if any):
readarray array < file
You can then join them with:
IFS=; var="${array[*]}"
If the input contains NUL bytes, everything past the first occurrence on each line will be lost.
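A quick way to sanity-check the readarray approach (the file f and its contents are made up for the demo):

```shell
# Slurp a file as an array of lines, then join the lines back together.
# "f" is a throwaway demo file created here, not from the original answer.
f=$(mktemp)
printf 'one\ntwo\nthree\n' > "$f"

readarray array < "$f"   # each element keeps its trailing newline
IFS=                     # empty IFS so "${array[*]}" joins with nothing in between
var="${array[*]}"

printf '%s' "$var"       # byte-for-byte the original content
rm -f "$f"
```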
In POSIX sh syntax, you can do:
var=$(cat < file; echo .); var=${var%.}
We add a . (which we remove afterwards) to work around the fact that command substitution strips all trailing newline characters.
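A minimal sketch of the sentinel trick, on a made-up file whose content ends in several newlines:

```shell
# The appended "." protects trailing newlines from command substitution,
# then gets stripped.  "f" is a throwaway demo file.
f=$(mktemp)
printf 'data\n\n\n' > "$f"      # 7 bytes, ending in three newlines

var=$(cat < "$f"; echo .); var=${var%.}

echo "${#var}"                  # 7: the trailing newlines survived
rm -f "$f"
```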
If the file contains NUL bytes, the behaviour will vary between implementations. zsh is the only shell that will preserve them (it's also the only shell that can store NUL bytes in its variables). bash and a few other shells just remove them, while some others choke on them and discard everything past the first NUL occurrence.
You could also store the content of the file in some encoded form, like:
var=$(uuencode -m - < file)
And get it back with:
printf '%s\n' "$var" | uudecode
Or with NULs encoded as \0000 so as to be able to use it in arguments to printf %b in bash (assuming you're not using locales where the charset is BIG5, GB18030, GBK, or BIG5-HKSCS):
var=; while true; do
  if IFS= read -rd '' rec; then
    var+=${rec//\\/\\\\}\\0000
  else
    var+=${rec//\\/\\\\}
    break
  fi
done < file
And then:
printf %b "$var"
to get it back.
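A round-trip sketch of that encoding loop, on a made-up file containing a NUL byte and a literal backslash:

```shell
# Encode a file with NUL bytes into $var using the loop from the answer,
# then reproduce the file exactly with printf %b.  "f" is a demo file.
f=$(mktemp)
printf 'a\0b\\c\n' > "$f"       # bytes: a NUL b \ c newline

var=; while true; do
  if IFS= read -rd '' rec; then
    var+=${rec//\\/\\\\}\\0000  # double backslashes, encode the NUL as \0000
  else
    var+=${rec//\\/\\\\}
    break
  fi
done < "$f"

printf %b "$var" | cmp -s - "$f" && result=identical
echo "$result"
rm -f "$f"
```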
If the first read reads a file up to the first NUL, will the second read be able to read past NUL?
– Tim, 2 days ago
@Tim, yes, IFS= read -rd '' is often used in a loop to read NUL-delimited records in zsh or bash. You'll find many examples on unix.SE.
– Stéphane Chazelas, 2 days ago
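The loop pattern mentioned in that comment, sketched with printf standing in for the usual find -print0 producer:

```shell
# Read NUL-delimited records in a loop; read succeeds once per record
# and fails only when the input is exhausted.
records=()
while IFS= read -rd '' rec; do
  records+=("$rec")
done < <(printf 'first\0second\0third\0')

printf 'record: %s\n' "${records[@]}"
```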
The answer is generally "no", simply because, as a general rule, there is no actual character in a file that conclusively marks the end of a file.
You are probably well advised to try a different approach, such as one of those suggested here: https://stackoverflow.com/questions/10984432/how-to-read-the-file-content-into-a-variable-in-one-go. Use of:
IFS="" contents=$(<file)
is particularly elegant; it causes Bash to read the contents of file into the variable contents, except for NUL bytes, which Bash variables can't hold (due to Bash's internal use of C-style, NUL-terminated strings). IFS="" sets the internal field separator to empty so as to disable word splitting (and hence to avoid the removal of newlines).
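A minimal check of the $(<file) slurp (f is a demo file): interior newlines survive, while command substitution strips the trailing one.

```shell
# Slurp a small demo file into a variable with $(<file).
f=$(mktemp)
printf 'hello\nworld\n' > "$f"

IFS="" contents=$(<"$f")

printf '%s\n' "$contents"   # two lines; the file's final newline was stripped
rm -f "$f"
```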
Note: since (for lack of reputation points) I can't comment on the answer suggesting the use of read with the -N option, I note here that that answer is, by definition, not guaranteed to work as it stands, because the file size is unknown in advance.
That however removes all trailing newline characters. In bash, that also removes all NUL bytes.
– Stéphane Chazelas, Dec 29 '18 at 19:52
@StéphaneChazelas Thanks for pointing that out; I adapted the answer to fix the issue as much as possible.
– ozzy, Dec 29 '18 at 20:33
I don't get your point; should I add another two 0s to 40000000 because maybe someone is going to read a 4G file into a bash variable, with a 128-byte read buffer? ;-)
– pizdelect, Dec 29 '18 at 20:57
@pizdelect Honestly, I didn't know about the 128-byte read buffer. Still, I'd be inclined to code something more like fs=$(du -b file | cut -f1); read -rN${fs} contents <file, to make the code seem a tad less arbitrary : )
– ozzy, Dec 29 '18 at 21:18
@ozzy fwiw, that would better use wc -c instead of du .. | cut, and set LC_CTYPE=C, since the -N is the length in characters, not in bytes. All in all, I'm sorry that you didn't like my answer; I had tried to keep to the actual question and correct some misconceptions about the EOF "character", not second-guess the OP (why use read? why read a whole file in a bash variable? why use that much bash in the 1st place ;-))
– pizdelect, Dec 30 '18 at 10:51
In bash, use the -N (number of characters) option.
read -rN 40000000 foo
Omit the -r
option if you really want backslashes to escape characters in the file.
from help read:
-N nchars return only after reading exactly NCHARS characters, unless
          EOF is encountered or read times out, ignoring any delimiter
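A small sketch of the slurp with -N (f is a demo file): a count larger than the file reads everything, newlines included, and read returns non-zero because EOF arrived before NCHARS.

```shell
# Slurp a whole small file with read -N and an oversized character count.
f=$(mktemp)
printf 'foo\nbar\n' > "$f"

read -rN 1000000 contents < "$f" || true   # failure status on EOF is expected

echo "${#contents}"   # 8: every byte of the file, trailing newline included
rm -f "$f"
```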
EOF is not a character, but a status: a read (the system call, not the shell builtin) has returned a zero length. But getchar() and other functions will conveniently return EOF, which is an integer with a value (-1) that cannot conflict with any valid character from any charset. Hence the confusion, compounded by the fact that some old operating systems really did use an EOF marker (usually ^Z) because they were only keeping track of whole blocks in the filesystem metadata.
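The status-not-character point shows up directly in the builtin: here read hits end of input before any NUL delimiter, so it reports failure, yet the bytes it did read still land in the variable.

```shell
# read's exit status signals EOF; there is no EOF byte in the data.
out=$(printf 'abc' | { if IFS= read -rd '' v; then echo "delim:$v"; else echo "eof:$v"; fi; })
echo "$out"   # eof:abc
```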
Curiously, read -N0 seems to do a "slow slurp": it will read the whole file just the same, but doing a system call for each character. I'm not sure this is an intended feature ;-)
strace -fe trace=read ./bash -c 'echo yes | read -N0'
...
[pid 8032] read(0, "y", 1) = 1
[pid 8032] read(0, "e", 1) = 1
[pid 8032] read(0, "s", 1) = 1
[pid 8032] read(0, "\n", 1) = 1
[pid 8032] read(0, "", 1) = 0
Notice that the buffer that bash's read builtin uses is only 128 bytes, so you shouldn't read large files with it. Also, if your file is heavily UTF-8, you should use LC_CTYPE=C read ...; otherwise bash will alternate reads of 128 bytes with byte-by-byte reads, making it even slower.
I'm not sure read -N0 even makes any sense.
– ilkkachu, Dec 29 '18 at 17:23
Note that Bash can't store NUL bytes in variables, and read drops them, so you can't get an exact copy of a file that contains them.
– ilkkachu, Dec 29 '18 at 17:33
@StéphaneChazelas Oh no, it's not that bad ;-) It will try to read the 40000000 in 128-byte sized chunks (strace it or look at lbuf in lib/sh/zread.c).
– pizdelect, Dec 29 '18 at 20:00
@ilkkachu it's not at all obvious why truncating the file at the first NUL byte is preferable to stripping NUL bytes. If you do the latter on a UTF-16 file, you will get something readable.
– pizdelect, Dec 29 '18 at 20:17
@StéphaneChazelas OK for the dumb character conversion (it will read 128 bytes in one shot, then another 128 one by one). But no, bash's read builtin will read 128-byte chunks even on seekable files.
– pizdelect, Dec 29 '18 at 20:25
answered Dec 29 '18 at 19:51, edited 2 days ago – Stéphane Chazelas
answered Dec 29 '18 at 16:23, edited Dec 29 '18 at 20:30 – ozzy (new contributor)
In bash
, use the -N
(number of characters) option.
read -rN 40000000 foo
Omit the -r
option if you really want backslashes to escape characters in the file.
from help read
:
-N nchars return only after reading exactly NCHARS characters, unless
EOF is encountered or read times out, ignoring any delimiter
EOF
is not a character, but a status: a read
(the system call, not the shell builtin) has returned a zero length. But getchar()
and other functions will conveniently return EOF
which is an integer with a value (-1) that cannot conflict with any valid character from any charset. Thence the confusion, compounded by the fact that some old operating systems really did use an EOF marker (usually ^Z
) because they were only keeping track of whole blocks in the filesystem metadata.
Curiously, read -N0
seems to does a "slow slurp" (it will read the whole file just the same, but doing a system call for each character). I'm not sure this is an intended feature ;-)
strace -fe trace=read ./bash -c 'echo yes | read -N0'
...
[pid 8032] read(0, "y", 1) = 1
[pid 8032] read(0, "e", 1) = 1
[pid 8032] read(0, "s", 1) = 1
[pid 8032] read(0, "n", 1) = 1
[pid 8032] read(0, "", 1) = 0
Notice that the buffer that bash
's read
builtin is using is only 128 bytes, so you shouldn't read large files with it. Also, if your file is heavily utf-8, you should use LC_CTYPE=C read ...
; otherwise bash
will alternate reads of 128 bytes with byte-by-byte reads, making it even slower.
I'm not sure read -N0 even makes any sense. – ilkkachu, Dec 29 '18 at 17:23
Note that Bash can't store NUL bytes in variables, and read drops them, so you can't get an exact copy of a file that contains them. – ilkkachu, Dec 29 '18 at 17:33
@StéphaneChazelas Oh no, it's not that bad ;-) It will try to read the 40000000 in 128-byte chunks (strace it, or look at lbuf in lib/sh/zread.c). – pizdelect, Dec 29 '18 at 20:00
@ilkkachu It's not at all obvious why truncating the file at the first NUL byte is preferable to stripping NUL bytes. If you do the latter on a UTF-16 file, you will get something readable. – pizdelect, Dec 29 '18 at 20:17
@StéphaneChazelas OK for the dumb character conversion (it will read 128 bytes in one shot, then another 128 one by one). But no, bash's read builtin will read 128-byte chunks even on seekable files. – pizdelect, Dec 29 '18 at 20:25
edited Dec 29 '18 at 20:37
answered Dec 29 '18 at 16:20 by pizdelect