Stack data from multiple delimited files into one, with variable columns
I'm new to scripting and got stuck with a file-merge issue in Unix. I was looking for some direction, stumbled upon this site, and saw many great posts and replies, but couldn't find a solution to my issue. I'd greatly appreciate any help.
I have three csv files: Apex_10_Latest.csv, Apex_20_Latest.csv, Apex_30_Latest.csv. The number of columns varies between these 3 files. Typically the latest file, going by the numbering, may have some new columns appended at the end. So I want to take the latest header and stack the data from all 3 files into a new file Apex.csv. When stacking the data from an older file, which may have fewer columns than the latest file, I want the missing data to be populated as null, with the appropriate delimiters.
Also, this has to be repeated for multiple sets of files (3 each), all in the same folder:
- Apex_10_Latest.csv,Apex_20_Latest.csv,Apex_30_Latest.csv - merged into Apex.csv
- Code_10_Latest.csv,Code_20_Latest.csv,Code_30_Latest.csv - merged into Code.csv
- Trans_10_Latest.csv,Trans_20_Latest.csv,Trans_30_Latest.csv - merged into Trans.csv
Following is the format of the source files and expected target file...
SOURCE FILES:
Apex_30_Latest.csv:
A,B,C,D
1,2,3,4
2,3,4,5
3,4,5,6
Apex_20_Latest.csv:
A,B,C
4,5,6
5,6,7
6,7,8
Apex_10_Latest.csv:
A,B
7,8
8,9
9,10
EXPECTED TARGET FILE:
- Apex.csv
A,B,C,D
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,,
5,6,7,,
6,7,8,,
7,8,,,
8,9,,,
9,10,,,
Thanks...
text-processing files awk csv merge
asked Oct 23 '15 at 2:26 by wamshi (edited Oct 23 '15 at 4:43 by Wildcard)
Welcome to stackexchange! Great first question. I believe the commands cut and paste can be configured to do what you want, but you may need to use a more fully featured text processing tool such as awk. – Wildcard Oct 23 '15 at 2:34
A question: Do the headers always start the same? Or might you have A,B,D in one file? Or C,D,E? In other words, is column 3 in one file the same as column 3 in every other file that has 3 columns? (And for every other number in place of 3?)
– Wildcard Oct 23 '15 at 5:37
4 Answers
cat $(ls -1 Apex_*_Latest.csv | sort -nr -k2 -t'_') | awk -F',' '
    NR == 1 { nfm = NF }          # first line comes from the widest file
    {
        for (i = 1; i <= nfm; i++)
            printf "%s,", $i      # an absent field prints as just ","
        print ""
    }' > Apex.csv
You can reverse-sort the filenames on the second _-delimited field (30, 20, 10, ...) and cat the files, so that the lines with the highest number of columns come first.
Then with awk you can get the highest number of columns NF from the first line (if (NR==1){nfm=NF}).
A for loop then runs the column number i from 1 up to nfm, printing the value of field i followed by ','. If there is no value for the i-th field (which happens when a file has fewer columns than the latest one), it prints just the ,.
answered Oct 23 '15 at 9:02 by 7171u
You can remove everything on the first line up to awk and insert $(ls -r Apex_*_latest.csv) on the last line between the last single quote ' and the >. That would eliminate cat and sort, use less code, and be easier to understand. – RobertL Oct 26 '15 at 3:24
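(Editorial aside, not part of the original thread: spelled out, that suggestion would look roughly like the sketch below. Note that for these names ls -r, a reverse lexical sort, happens to match the numeric reverse sort on the second field.)
awk -F',' '
    NR == 1 { nfm = NF }          # widest file is read first
    {
        for (i = 1; i <= nfm; i++)
            printf "%s,", $i      # missing fields print as a bare comma
        print ""
    }' $(ls -r Apex_*_Latest.csv) > Apex.csv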
I think the previous answer is the best; I'm just showing a different approach, since I haven't used awk in years, not since perl and python became big. I think awk is fine; it's just that a mixture of shell, sed, python and/or perl has suited my work better.
However, in this case I think anyone can see that the awk solution is more succinct and easier to read. Come to think of it, I think I've heard awk referred to as the command-line spreadsheet, or something like that. :-)
Based on the original post, I've chosen to let the ls command sort the filenames by file modification time, as opposed to relying on the filename format. Six of one, half a dozen of the other.
So, just for comparison, here's an efficient, portable, modular(?!), pure-shell version of the solution:
#!/bin/sh

# Print only the commas of the first line of the given file(s);
# the number of commas encodes the number of columns.
get_commas() {
    sed 's/[^,]//g; 1q' "$@"
}

# Emit the commas that must be appended to each line of $2 so that
# it ends up with as many fields as a line with $1 commas.
get_extra_commas() {
    local maxcommas="$1"
    local file="$2"
    local new_commas=$(get_commas "$file")
    local extra_commas=""
    while [ "${new_commas}${extra_commas}" != "${maxcommas}" ]
    do
        extra_commas=",$extra_commas"
    done
    echo "$extra_commas"
}

unset header
# Newest file first, so its header defines the full column set.
ls -t Apex*.csv |
while IFS= read -r filename
do
    if [ -z "$header" ]
    then
        header="$(sed 1q "$filename")"
        commas=$(echo "$header" | get_commas)
        echo "$header"
    fi
    extra_commas=$(get_extra_commas $commas "$filename")
    # Drop the header line and pad every remaining line.
    sed "1d; s/$/$extra_commas/" "$filename"
done
answered Oct 24 '15 at 22:25 by RobertL
The analogies that have stuck with me from a variety of sources are: sed is the assembly language of text processing; awk is the command line spreadsheet tool; perl is the omnipotent god of text processing. :) – Wildcard Oct 27 '15 at 6:19
More like omnipotent demon! :-) – RobertL Oct 27 '15 at 6:45
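(Editorial aside, not part of the thread: the get_commas helper deletes every non-comma character from the first line and then quits, so its output is one comma per delimiter. Against the sample Apex_30_Latest.csv, whose header is A,B,C,D, it would print:)
$ sed 's/[^,]//g; 1q' Apex_30_Latest.csv
,,,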
Here is an answer implemented in Miller:
$ cat rect.mlr
for (k,v in $*) {
@fields[k] = v; # retain already-seen field names
}
for (k,v in @fields) {
if (isabsent($[k])) {
$[k] = "";
}
}
$ mlr --csvlite put -f rect.mlr Apex_30_Latest.csv Apex_20_Latest.csv Apex_10_Latest.csv
A,B,C,D
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,
5,6,7,
6,7,8,
7,8,,
8,9,,
9,10,,
Since Miller handles named columns intrinsically, header-line management becomes simpler.
answered Nov 12 '16 at 2:16 by John Kerl
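(Editorial note, not part of the answer: put -f runs the DSL once per record, and @fields is an out-of-stream variable that persists across records, so column names seen in an earlier, wider file are filled into the later, narrower records. That is why the widest file has to be listed first on the command line; listed narrowest-first, the early records would already have been written out before the extra columns were seen.)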
With Miller (http://johnkerl.org/miller/doc/) this is, as usual, very easy:
mlr --csv unsparsify Apex_*_Latest.csv
gives you
A,B,C,D
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,
5,6,7,
6,7,8,
7,8,,
8,9,,
9,10,,
answered 2 days ago by aborruso
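(Editorial addition, not part of the thread: the question also asks for the merge to be repeated for every set of files, i.e. Apex, Code and Trans. A minimal POSIX-shell sketch combining a prefix loop with the unsparsify approach, assuming every group matches PREFIX_NN_Latest.csv; ls -r lists each group newest-first so the row order matches the expected output:)
#!/bin/sh
# For each distinct prefix (Apex, Code, Trans, ...), merge its
# files into PREFIX.csv, newest (highest number) first.
for f in *_*_Latest.csv; do
    printf '%s\n' "${f%%_*}"      # strip everything from the first "_"
done | sort -u | while IFS= read -r prefix; do
    mlr --csv unsparsify $(ls -r "${prefix}"_*_Latest.csv) > "${prefix}.csv"
done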