How to divide lines of a file into other files

I'm not much of a programmer myself, but I developed a shell script that reads a positional file and, based on a single letter at position 16, copies the whole line to another file.

Example:

INPUT FILE
003402841000011A10CNPJ08963394000195
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
003402841000011A10CNPJ08963394000195
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007

OUTPUT FILE A
003402841000011A10CNPJ08963394000195
003402841000011A10CNPJ08963394000195

OUTPUT FILE B
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007
003402841000041B20CNPJ08963394000195 16012020XX5313720087903007

The code I currently have:

#!/usr/bin/env bash

ARQ_IN="$1"
DIR_OUT="C:/Users/etc/etc/"

# Read the input line by line (the || part also handles a last line without a newline).
while IFS= read -r line || [[ -n "$line" ]]
do
    # The character at position 16 selects the output file.
    SUBSTRING=$(echo "$line" | cut -c16)

    if [ "$SUBSTRING" = "A" ]
    then
        echo "$line" >> "${DIR_OUT}arqA.txt"
    elif [ "$SUBSTRING" = "B" ]
    then
        echo "$line" >> "${DIR_OUT}arqB.txt"
    elif [ "$SUBSTRING" = "K" ]
    then
        echo "$line" >> "${DIR_OUT}arqK.txt"
    elif [ "$SUBSTRING" = "1" ]
    then
        echo "$line" >> "${DIR_OUT}arq1.txt"
    fi
done < "$ARQ_IN"

Although this code works, it isn't as fast as I need: the INPUT FILE has around 400k records.

Can someone help me write new code or improve this one?

Submitted December 02nd 2020 by Admin

Answers

This is a job for awk. Could you please try the following? I haven't tested it with a huge dataset, but it should definitely be faster than the OP's current approach. To put an absolute path before the output file name, we can pass a shell variable into an awk variable and use it when building the outputFile variable.

awk '
{
  close(outputFile)
  outputFile=("output_file_"substr($0,16,1))
  print >> (outputFile)
}
' Input_file

To save the files under a complete folder path, use the following:

DIR_OUT="/tmp/test/"
awk -v folder="${DIR_OUT}" '
{
  close(outputFile)
  outputFile=(folder"arq"substr($0,16,1)".txt")
  print >> (outputFile)
}
' Input_file
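
The commands above write a file for every character that shows up at position 16, while the script in the question only handles A, B, K and 1. If only those four should be written, a variation along these lines might work. It is only a sketch and has not been run against the 400k-record file. The function name split_by_pos16 and its two arguments are made up for illustration. Because at most four files are ever open, the close() call is left out here.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original answer):
#   $1 = input file, $2 = output folder including the trailing slash.
split_by_pos16() {
    local input_file="$1"
    local out_dir="$2"

    awk -v folder="$out_dir" '
    {
        # Character at position 16 decides the output file.
        key = substr($0, 16, 1)

        # Assumption: only the four record types from the question are
        # wanted; every other line is skipped.
        if (key == "A" || key == "B" || key == "K" || key == "1")
            print >> (folder "arq" key ".txt")
    }
    ' "$input_file"
}
```

It could then be called as split_by_pos16 Input_file "$DIR_OUT". Like the original script, it appends, so files left over from an earlier run would need to be removed first.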

Admin | 10 months ago


