Is there a character to use as `delim` of `read`, so that `read` reads an entire file at once?


























The bash manual says this about the read builtin command:

-d delim The first character of delim is used to terminate the input line,
rather than newline.

Is it possible to specify a delim character for read that never matches (unless it can match EOF; is EOF a character?), so that read always reads an entire file at once?

Thanks.










  • 1




    A file does not end with an End of File character
    – steeldriver
    Dec 29 '18 at 16:24






  • 1




    Can you motivate the question with a use-case? Why does a file's contents need to end up in a variable?
    – Jeff Schaller
    Dec 29 '18 at 19:49
















bash read






asked Dec 29 '18 at 15:54 by Tim

3 Answers














Since bash can't store NUL bytes in its variables anyway, you can always do:



IFS= read -rd '' var < file


which will store the content of the file up to the first NUL byte, or up to the end of the file if the file has no NUL bytes (text files, by the POSIX definition at least, don't contain NUL bytes).
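A minimal sketch of the above, assuming bash and a scratch file created with mktemp (the file and its sample text are made up for illustration). One detail worth knowing: read returns a non-zero status here, because it reaches end of file before it ever sees a NUL delimiter, yet the variable is still filled in.

```shell
# Hypothetical sample file; any text file without NUL bytes would do.
file=$(mktemp)
printf 'first line\nsecond line\n\n' > "$file"

# -d '' makes NUL the delimiter, so read stops only at a NUL byte or at EOF.
# IFS= keeps leading and trailing whitespace, including the trailing newlines.
status=0
IFS= read -rd '' var < "$file" || status=$?

# Compare what ended up in the variable with the file itself.
printf '%s' "$var" | cmp -s - "$file" && echo "content identical to file"
echo "read exit status: $status"   # non-zero: EOF came before any delimiter
rm -f "$file"
```

In a script running under set -e, the non-zero status needs to be handled explicitly, as the `|| status=$?` above does.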



Another option is to store the content of the file as the array of its lines (including the line delimiter if any):



readarray array < file


You can then join them with:



IFS=; var="${array[*]}"


If the input contains NUL bytes, everything past the first occurrence on each line will be lost.
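The readarray approach could look roughly like this, assuming bash 4 or later and a made-up scratch file. Saving and restoring IFS around the join is an addition of this sketch, so that the empty IFS does not leak into the rest of a script.

```shell
file=$(mktemp)
printf 'one\ntwo\nthree' > "$file"   # note: the last line has no newline

# Without -t, readarray keeps each line's trailing newline in the element.
readarray array < "$file"

# "${array[*]}" joins on the first character of IFS; with an empty IFS the
# elements are concatenated with nothing in between.
old_ifs=$IFS
IFS=
var="${array[*]}"
IFS=$old_ifs

echo "lines read: ${#array[@]}"
printf '%s' "$var" | cmp -s - "$file" && echo "joined content identical to file"
rm -f "$file"
```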



In POSIX sh syntax, you can do:



var=$(cat < file; echo .); var=${var%.}


We add a . which we remove afterwards to work around the fact that command substitution strips all trailing newline characters.
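To see the difference the sentinel makes, here is a small sketch with a made-up scratch file that ends in three newlines (mktemp is assumed to be available; it is not part of POSIX, though widely installed):

```shell
file=$(mktemp)
printf 'text\n\n\n' > "$file"      # 7 bytes: "text" plus three newlines

plain=$(cat < "$file")             # trailing newlines are stripped
var=$(cat < "$file"; echo .)       # the "." sentinel shields them
var=${var%.}                       # drop the sentinel again

echo "without sentinel: ${#plain} characters"
echo "with sentinel:    ${#var} characters"
rm -f "$file"
```

The first variable holds only the four characters of "text", while the second holds all seven characters of the file.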



If the file contains NUL bytes, the behaviour will vary between implementations. zsh is the only shell that will preserve them (it's also the only shell that can store NUL bytes in its variables). bash and a few other shells just remove them, while some others choke on them and discard everything past the first NUL occurrence.



You could also store the content of the file in some encoded form like:



var=$(uuencode -m - < file)


And get it back with:



printf '%s\n' "$var" | uudecode
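The same encode-then-decode idea can be sketched with base64 instead of uuencode. This is a substitution made for illustration, not what the answer uses, and it assumes a base64 utility that accepts -d for decoding (as GNU coreutils does). The scratch files are made up.

```shell
file=$(mktemp)
decoded=$(mktemp)
printf 'a\0b\n\n' > "$file"        # 5 bytes, including a NUL and two newlines

# The encoded form is plain text, so it is safe to keep in a variable.
var=$(base64 < "$file")

# Decode it again and compare byte for byte with the original.
printf '%s\n' "$var" | base64 -d > "$decoded"
same=no
cmp -s "$file" "$decoded" && same=yes
echo "round trip preserved every byte: $same"
rm -f "$file" "$decoded"
```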


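Or with NULs encoded as \0000 so as to be able to use it in arguments to printf %b in bash (assuming you're not using locales where the charset is BIG5, GB18030, GBK, BIG5-HKSCS):



var=; while true; do
  if IFS= read -rd '' rec; then
    # a NUL-terminated record: double its backslashes, then append an escaped NUL
    var+=${rec//\\/\\\\}\\0000
  else
    # the final chunk (no NUL after it): double its backslashes and stop
    var+=${rec//\\/\\\\}
    break
  fi
done < file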


And then:



printf %b "$var"


to get it back.






  • If the first read reads a file up to the first NUL, will the second read be able to read past NUL?
    – Tim
    2 days ago










  • @Tim, yes, IFS= read -rd '' is often used in a loop to read NUL-delimited records in zsh or bash. You'll find many examples on unix.SE
    – Stéphane Chazelas
    2 days ago
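A short sketch of the loop mentioned in that comment, assuming bash (process substitution is used to supply made-up sample input). Each call to read picks up where the previous one stopped, so successive calls walk through the NUL-delimited records one by one.

```shell
# Sample input: three NUL-terminated records, one of them containing a newline.
records=()
while IFS= read -rd '' rec; do
  records+=("$rec")
done < <(printf 'alpha\0two\nlines\0gamma\0')

echo "records read: ${#records[@]}"
printf '<%s>\n' "${records[@]}"
```

If the input does not end with a NUL, the last read fails while still filling rec, so a loop written this way would skip that final unterminated chunk.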

































The answer is generally "no", simply because - as a general rule - there is no actual character in a file that conclusively marks the end of a file.



You are probably well advised to try a different approach, such as one of those suggested here: https://stackoverflow.com/questions/10984432/how-to-read-the-file-content-into-a-variable-in-one-go. Use of:



IFS="" contents=$(<file)


is particularly elegant; it causes Bash to read the contents of file into the variable contents, except for NUL bytes, which Bash variables can't hold (due to Bash's internal use of C-style, NUL-terminated strings), and trailing newline characters, which command substitution strips. IFS="" sets the internal field separator to empty; it is not actually needed here, because word splitting is not applied to the right-hand side of a variable assignment.
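As a small illustration of what $(<file) keeps and what it drops, here is a sketch with a made-up scratch file that ends in two newlines. Command substitution strips trailing newlines whether or not IFS is set.

```shell
file=$(mktemp)
printf 'data\n\n' > "$file"        # 6 bytes on disk

contents=$(<"$file")               # bash shorthand for $(cat "$file")
size=$(wc -c < "$file")            # byte count of the file

# $((size)) normalises any padding that wc may print around the number.
echo "bytes in file:          $((size))"
echo "characters in variable: ${#contents}"
rm -f "$file"
```

The variable ends up two characters shorter than the file, which is the loss the sentinel trick from the POSIX sh answer is meant to avoid.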



Note: Since (for lack of reputation points) I can't comment on the answer suggesting the use of read with the -N option, I note here that that answer is - by definition - not guaranteed to work as it stands, because the filesize is unknown in advance.






  • That however removes all trailing newline characters. In bash, that also removes all NUL bytes.
    – Stéphane Chazelas
    Dec 29 '18 at 19:52










  • @StéphaneChazelas Thanks for pointing that out; I adapted the answer to fix the issue as much as possible.
    – ozzy
    Dec 29 '18 at 20:33










  • I don't get your point; should I add another 2 0s to 40000000 because maybe someone is going to read a 4G file into a bash variable, with a 128 bytes read buffer? ;-)
    – pizdelect
    Dec 29 '18 at 20:57










  • @pizdelect Honestly, I didn't know about the 128 bytes read buffer. Still, I'd be inclined to code something more like fs=$(du -b file | cut -f1); read -rN${fs} contents <file, to make the code seem a tad less arbitrary : )
    – ozzy
    Dec 29 '18 at 21:18












  • @ozzy fwiw, that would better use wc -c instead of du .. | cut, and set LC_CTYPE=C, since the -N is the length in characters, not in bytes. All in all, I'm sorry that you didn't like my answer; I had tried to keep to the actual question and correct some misconceptions about the EOF "character", not second guess the OP (why use read? why read a whole file in a bash variable? why use that much bash in the 1st place ;-))
    – pizdelect
    Dec 30 '18 at 10:51



































In bash, use the -N (number of characters) option.



read -rN 40000000 foo


Omit the -r option if you really want backslashes to escape characters in the file.



from help read:




-N nchars return only after reading exactly NCHARS characters, unless
EOF is encountered or read times out, ignoring any delimiter



EOF is not a character, but a status: a read (the system call, not the shell builtin) has returned a zero length. But getchar() and other functions will conveniently return EOF which is an integer with a value (-1) that cannot conflict with any valid character from any charset. Thence the confusion, compounded by the fact that some old operating systems really did use an EOF marker (usually ^Z) because they were only keeping track of whole blocks in the filesystem metadata.
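The point that EOF is a status rather than a byte in the file can be observed from the shell as well. A sketch with a made-up three-byte scratch file:

```shell
file=$(mktemp)
printf 'abc' > "$file"             # three bytes: no newline, no end marker

od -An -c "$file"                  # dumps the bytes a, b and c; nothing else is stored

# read gets all three bytes, then hits end of file before finding a newline.
# It still assigns what it got, but reports the condition via its exit status.
status=0
IFS= read -r line < "$file" || status=$?
echo "line=$line status=$status"
rm -f "$file"
```

The variable receives "abc" and the status is non-zero: end of file is signalled out of band, not by any character that a delimiter could match.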



Curiously, read -N0 seems to do a "slow slurp" (it will read the whole file just the same, but doing a system call for each character). I'm not sure this is an intended feature ;-)



strace -fe trace=read ./bash -c 'echo yes | read -N0'
...
[pid 8032] read(0, "y", 1) = 1
[pid 8032] read(0, "e", 1) = 1
[pid 8032] read(0, "s", 1) = 1
[pid 8032] read(0, "\n", 1) = 1
[pid 8032] read(0, "", 1) = 0


Notice that the buffer that bash's read builtin is using is only 128 bytes, so you shouldn't read large files with it. Also, if your file is heavily utf-8, you should use LC_CTYPE=C read ...; otherwise bash will alternate reads of 128 bytes with byte-by-byte reads, making it even slower.
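Instead of a large hard-coded count, the byte size of the file can be passed to -N, as suggested in the comments on this page. A sketch under a few assumptions: bash 4 or later, a made-up ASCII scratch file, a file that does not change between the two commands, and no NUL bytes in it (bash drops those, so read would then come up short and report failure).

```shell
file=$(mktemp)
printf 'a\nb\n\n' > "$file"        # 5 bytes, ASCII only

size=$(wc -c < "$file")            # size in bytes

# In the C locale one character is one byte, so the byte count can be used
# as the character count that -N expects. -N ignores delimiters and does not
# split on IFS, so the trailing newlines are kept.
LC_ALL=C read -rN "$((size))" var < "$file"

echo "requested: $((size)), got: ${#var}"
rm -f "$file"
```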






  • I'm not sure read -N0 even makes any sense.
    – ilkkachu
    Dec 29 '18 at 17:23










  • note that Bash can't store NUL bytes in variables, and read drops them so you can't get an exact copy of a file that contains them.
    – ilkkachu
    Dec 29 '18 at 17:33












  • @StéphaneChazelas Oh no, it's not that bad ;-) It will try to read the 40000000 in 128 bytes sized chunks (strace it or look at lbuf in lib/sh/zread.c).
    – pizdelect
    Dec 29 '18 at 20:00










  • @ilkkachu it's not at all obvious why truncating the file at the first NUL bytes is preferable to stripping NUL bytes. If you do the latter on an UTF-16 file, you will get something readable.
    – pizdelect
    Dec 29 '18 at 20:17










  • @StéphaneChazelas OK for the dumb character conversion (it will read 128 bytes in one shot, then another 128 one by one). But no, bash's read builtin will read 128 bytes chunks even on seekable files.
    – pizdelect
    Dec 29 '18 at 20:25













answered Dec 29 '18 at 19:51 by Stéphane Chazelas (edited 2 days ago)

answered Dec 29 '18 at 16:23 by ozzy (edited Dec 29 '18 at 20:30)

1893




1893




New contributor




ozzy is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





ozzy is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






ozzy is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • That however removes all trailing newline characters. In bash, that also removes all NUL bytes.
    – Stéphane Chazelas
    Dec 29 '18 at 19:52










  • @StéphaneChazelas Thanks for pointing that out; I adapted the answer to fix the issue as much as possible.
    – ozzy
    Dec 29 '18 at 20:33










  • I don't get your point; should I add another 2 0s to 40000000 because maybe someone is going to read a 4G file into a bash variable, with a 128 bytes read buffer? ;-)
    – pizdelect
    Dec 29 '18 at 20:57










  • @pizdelect Honestly, I didn't know about the 128 bytes read buffer. Still, I'd be inclined to code something more like fs=$(du -b file | cut -f1); read -rN${fs} contents <file, to make the code seem a tad less arbitrary : )
    – ozzy
    Dec 29 '18 at 21:18












  • @ozzy fwiw, that would better use wc -c instead of du .. | cut, and set LC_CTYPE=C, since the -N is the length in characters, not in bytes. All in all, I'm sorry that you didn't like my answer; I had tried to keep to the actual question and correct some misconceptions about the EOF "character", not second guess the OP (why use read? why read a whole file in a bash variable? why use that much bash in the 1st place ;-))
    – pizdelect
    Dec 30 '18 at 10:51

































































In bash, use the -N (number of characters) option.



read -rN 40000000 foo


Omit the -r option if you really want backslashes to escape characters in the file.
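As a minimal sketch of the command above (the scratch file and variable names are assumptions made for the example, and -N is a bash extension, not POSIX), the following slurps a small file and compares the result with command substitution, which strips trailing newlines. Because the file is shorter than the requested count, read hits EOF and returns a nonzero status, which matters in scripts that run under set -e.

```shell
#!/usr/bin/env bash
# Sketch: slurp a whole file with read -N and compare with $(cat file).
tmp=$(mktemp)                          # hypothetical scratch file
printf 'a\\b\nline two\n\n' > "$tmp"   # contains a backslash, ends with two newlines

read -rN 40000000 foo < "$tmp"
status=$?                              # nonzero: EOF came before 40000000 characters

subst=$(cat "$tmp")                    # command substitution strips trailing newlines

printf 'read -N kept %d characters, $(cat) kept %d\n' "${#foo}" "${#subst}"
rm -f "$tmp"
```

With -r the backslash in the file is stored literally; the nonzero status here only signals that EOF was reached, not that the read failed.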



From help read:




-N nchars return only after reading exactly NCHARS characters, unless
EOF is encountered or read times out, ignoring any delimiter



EOF is not a character but a status: a read (the system call, not the shell builtin) has returned a length of zero. However, getchar() and similar functions conveniently return EOF, an integer with a value (typically -1) that cannot collide with any valid character in any charset. Hence the confusion, compounded by the fact that some old operating systems really did use an EOF marker (usually ^Z), because their filesystem metadata only kept track of whole blocks.
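To illustrate that EOF is a status rather than a byte in the stream, here is a small sketch (the scratch file and variable names are assumptions for the example). With -d '' the delimiter is the NUL byte, which ordinary text does not contain, so read consumes everything and learns about the end of input only because the underlying read returned nothing more; it reports that through its exit status.

```shell
#!/usr/bin/env bash
# Sketch: the delimiter never matches, so read stops only at end of input.
tmp=$(mktemp)                       # hypothetical scratch file
printf 'first\nsecond\n' > "$tmp"

# IFS= keeps leading/trailing whitespace; -d '' makes NUL the delimiter.
IFS= read -r -d '' data < "$tmp"
eof_status=$?                       # nonzero: input ended, no delimiter was found

printf 'status=%d, %d characters stored\n' "$eof_status" "${#data}"
rm -f "$tmp"
```

This is also the closest thing to the "delimiter that never matches" the question asks about, as long as the file contains no NUL bytes.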



Curiously, read -N0 seems to do a "slow slurp": it reads the whole file just the same, but with one system call per character. I'm not sure this is an intended feature ;-)



strace -fe trace=read ./bash -c 'echo yes | read -N0'
...
[pid 8032] read(0, "y", 1) = 1
[pid 8032] read(0, "e", 1) = 1
[pid 8032] read(0, "s", 1) = 1
[pid 8032] read(0, "\n", 1) = 1
[pid 8032] read(0, "", 1) = 0


Note that the buffer used by bash's read builtin is only 128 bytes, so you shouldn't read large files with it. Also, if your file contains a lot of UTF-8 multibyte text, you should use LC_CTYPE=C read ...; otherwise bash will alternate 128-byte reads with byte-by-byte reads, making it even slower.





























  • I'm not sure read -N0 even makes any sense.
    – ilkkachu
    Dec 29 '18 at 17:23










  • note that Bash can't store NUL bytes in variables, and read drops them so you can't get an exact copy of a file that contains them.
    – ilkkachu
    Dec 29 '18 at 17:33
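A quick way to check the point about NUL bytes yourself (a sketch only; the scratch file and variable names are assumptions of the example):

```shell
#!/usr/bin/env bash
# Sketch: NUL bytes in the input never reach the variable.
tmp=$(mktemp)                  # hypothetical scratch file
printf 'a\0b\0c' > "$tmp"      # five bytes, two of them NUL

read -rN 100 copy < "$tmp"

printf 'file has %d bytes, variable has %d characters\n' \
  "$(( $(wc -c < "$tmp") ))" "${#copy}"
rm -f "$tmp"
```

The difference between the two numbers is the NUL bytes that read dropped, so the variable is not an exact copy of such a file.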












  • @StéphaneChazelas Oh no, it's not that bad ;-) It will try to read the 40000000 in 128 bytes sized chunks (strace it or look at lbuf in lib/sh/zread.c).
    – pizdelect
    Dec 29 '18 at 20:00










  • @ilkkachu it's not at all obvious why truncating the file at the first NUL byte is preferable to stripping NUL bytes. If you do the latter on a UTF-16 file, you will get something readable.
    – pizdelect
    Dec 29 '18 at 20:17










  • @StéphaneChazelas OK for the dumb character conversion (it will read 128 bytes in one shot, then another 128 one by one). But no, bash's read builtin will read 128 bytes chunks even on seekable files.
    – pizdelect
    Dec 29 '18 at 20:25










































edited Dec 29 '18 at 20:37

























answered Dec 29 '18 at 16:20









pizdelect
































