LINUX Shell script to convert rows to multiple columns

490 views Asked by At

Shell script to convert rows to multiple columns

input CSV file:

Driver Id,Driver Name,Measure Names,Measure Values
123,XYZ,Total Offers,10
123,XYZ,Driver Reject,0
123,XYZ,Driver Accept ,4
123,XYZ,Expired Offers,3
123,XYZ,Total Bookings,6
123,XYZ,Rider Cancels,2
123,XYZ,Driver Cancels,0
123,XYZ,Rider No-Show,0
123,XYZ,Completed Rides,4
124,PQR,Total Offers,2
124,PQR,Driver Reject,0
124,PQR,Driver Accept ,1
124,PQR,Expired Offers,1
124,PQR,Total Bookings,1
124,PQR,Rider Cancels,0
124,PQR,Driver Cancels,0
124,PQR,Rider No-Show,0
124,PQR,Completed Rides,1

Output Required:

Driver Id,Driver Name,Total Offers,Driver Reject,Driver Accept,Expired Offers,Total Bookings,Rider Cancels,Driver Cancels,Rider No-Show,Completed Rides

123,XYZ,10,0,4,3,6,2,0,0,4
124,PQR,2,0,1,1,1,0,0,0,1

I tried with awk but it gives incorrect result.

awk -F\, '
    BEGIN{
        P["Total Offers"]="%s;%s;%s;;;;;;;;;\n"
       P["Driver Reject"]="%s;%s;;%s;;;;;;;;\n"
       P["Driver Accept"]="%s;%s;;;%s;;;;;;;\n"
      P["Expired Offers"]="%s;%s;;;;%s;;;;;;\n"
      P["Total Bookings"]="%s;%s;;;;;%s;;;;;\n"     
       P["Rider Cancels"]="%s;%s;;;;;;%s;;;;\n"
      P["Driver Cancels"]="%s;%s;;;;;;;%s;;;\n"     
       P["Rider No-Show"]="%s;%s;;;;;;;;%s;;\n"     
     P["Completed Rides"]="%s;%s;;;;;;;;;%s;\n" 
        }                         
    FNR==1{
        print "Driver Id,Driver Name,Total Offers,Driver Reject,Driver Accept,Expired Offers,Total Bookings,Rider Cancels,Driver Cancels,Rider No-Show,Completed Rides"
        next
        }
    {
        printf(P[$3],$1,$2,$4)
        }
    ' sample1.csv

could somebody please assist me or show me any other method to implement this. Thanks in Advance

3

There are 3 answers

0
RavinderSingh13 On

Considering your Input_file is same as shown sample and if you don't care of about output sequence should be as input then following may help you in same.

awk -F, 'FNR>1{a[$1,$2]=a[$1,$2]?a[$1,$2] FS $NF:$NF} END{for(i in a){print i FS a[i]}}' SUBSEP=","   Input_file
4
Akshay Hegde On

Below one takes care of order of output as well as missing value, if there's any

awk '
     BEGIN{
       FS=OFS=SUBSEP=","; 
     }
     FNR==1{
        printf("%s%s%s",$1,OFS,$2);
        next
     }
     {
      if(!(($1,$2) in tmp)){
        usr[++u] = $1 OFS $2
        tmp[$1,$2]
      }
      if(!($3 in tmp)){
        names[++n] = $3;
        tmp[$3]
        printf("%s%s",OFS,$3)
      }
       arr[$1,$2,$3] = $4
     }
     END{
       print ""
       for(u=1; u in usr; u++){
           printf("%s", usr[u]);
           for(n=1; n in names; n++){
               indexkey = usr[u] SUBSEP names[n]
               printf("%s%s",OFS, (indexkey in arr) ? arr[indexkey]:"")
           }
           print ""
       }
     }
    ' infile

Explanation:

  • FS=OFS=SUBSEP=","; - Set field separator, output field separator and built-in variable subsep to comma, in current program atleast atleast OFS and SUBSEP should be same, because I used it access array indexkey = usr[u] SUBSEP names[n], so if you got any other input field separator (say pipe) then make FS="|"; OFS=SUBSEP=","

  • FNR==1{ printf("%s%s%s",$1,OFS,$2); next } If first line, then print first 2 fields and go to next line

  • if(!(($1,$2) in tmp)){ usr[++u] = $1 OFS $2 tmp[$1,$2] } Since you want ordered output, contiguous (in order) array (usr) is used in this program. tmp is array, where as index being $1 and $2, usr is array, where index being variable u, value being $1 and $2, if(!(($1,$2) in tmp)) takes care of if doesn't exist before.

  • if(!($3 in tmp)){ names[++n] = $3; tmp[$3] printf("%s%s",OFS,$3) } Similarly like above, names array is contiguous, value being $3

  • arr[$1,$2,$3] = $4 array arr key being 3 fields, $1,$2,$3 and value being $4

  • Finally in END block loop through usr and names array, build indexkey and print array value, if indexkey exists in array arr

Input :

$ cat infile
Driver Id,Driver Name,Measure Names,Measure Values
123,XYZ,Total Offers,10
123,XYZ,Driver Reject,0
123,XYZ,Driver Accept ,4
123,XYZ,Expired Offers,3
123,XYZ,Total Bookings,6
123,XYZ,Rider Cancels,2
123,XYZ,Driver Cancels,0
123,XYZ,Rider No-Show,0
123,XYZ,Completed Rides,4
124,PQR,Total Offers,2
124,PQR,Driver Reject,0
124,PQR,Driver Accept ,1
124,PQR,Expired Offers,1
124,PQR,Total Bookings,1
124,PQR,Rider Cancels,0
124,PQR,Driver Cancels,0
124,PQR,Rider No-Show,0
124,PQR,Completed Rides,1

Output:

$ awk '
     BEGIN{
       FS=OFS=SUBSEP=","; 
     }
     FNR==1{
        printf("%s%s%s",$1,OFS,$2);
        next
     }
     {
      if(!(($1,$2) in tmp)){
        usr[++u] = $1 OFS $2
        tmp[$1,$2]
      }
      if(!($3 in tmp)){
        names[++n] = $3;
        tmp[$3]
        printf("%s%s",OFS,$3)
      }
       arr[$1,$2,$3] = $4
     }
     END{
       print ""
       for(u=1; u in usr; u++){
           printf("%s", usr[u]);
           for(n=1; n in names; n++){
               indexkey = usr[u] SUBSEP names[n]
               printf("%s%s",OFS, (indexkey in arr) ? arr[indexkey]:"")
           }
           print ""
       }
     }
    ' infile
Driver Id,Driver Name,Total Offers,Driver Reject,Driver Accept ,Expired Offers,Total Bookings,Rider Cancels,Driver Cancels,Rider No-Show,Completed Rides
123,XYZ,10,0,4,3,6,2,0,0,4
124,PQR,2,0,1,1,1,0,0,0,1
0
karakfa On

if the rows are not ordered in the required fields you have to use an associative array.

$ awk -F, -v cols='Total Offers,Driver Reject,Driver Accept ,Expired Offers,Total Bookings,Rider Cancels,Driver Cancels,Rider No-Show,Completed Rides' '
   BEGIN {n=split(cols,f)} 
   NR>1  {k=$1 FS $2; keys[k]; a[k,$3]=$4} 
   END   {for(k in keys) 
             {printf "%s", k; 
              for(i=1;i<=n;i++) printf "%s%d", FS,+a[k,f[i]]; 
              print ""}}' file

124,PQR,2,0,1,1,1,0,0,0,1
123,XYZ,10,0,4,3,6,2,0,0,4

this will take care if any of the measure rows are missing

ps. Note that "Driver Accept " has a trailing space, which I kept.