Delete a single line if a specific string is found


I'm fairly new to IIB. I want to delete a single line from a TXT file if it contains a specific word, for example the word USA, as shown below. I read the file as a BLOB and then convert it to a string. Should I do this with a Compute node or a JavaCompute node? Thanks in advance.

Example:

Before

Hello my name 
is Malcom and I live
in the USA

After

Hello my name 
is Malcom and I live

Current flow: FileInput -> Compute -> JavaCompute -> FileOutput

FileInput: reads data from a specific folder

Compute: replaces specific strings with a mask

CREATE PROCEDURE getBLOBMessage() BEGIN
    DECLARE fullBLOB CHARACTER;
    SET fullBLOB = CAST(OutputRoot.BLOB.BLOB AS CHAR CCSID 1208 ENCODING 815);
    SET OutputLocalEnvironment.msg = fullBLOB;
END;

CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
    SET msg = REPLACE(msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
    SET msg = REPLACE(msg, '111.115.11.112', 'XXX.XXX.XX.XXX');
    SET msg = REPLACE(msg, '111.112.11.112', 'XXX.XXX.XX.XXX');
    SET msg = REPLACE(msg, '111.111.111.116', 'XXX.XXX.XXX.XXX');
    SET msg = REPLACE(msg, '172.16.18.72', 'XXX.XX.XX.XX');
    SET msg = REPLACE(msg, 'b1111111110', 'XXXXXXXXXXX');
    SET msg = REPLACE(msg, '11111111101', 'XXXXXXXXXXX');
    SET msg = REPLACE(msg, '11111111111', 'XXXXXXXXXXX');
    SET msg = REPLACE(msg, 'B1111111111', 'XXXXXXXXXXX');
    SET msg = REPLACE(msg, 'Q1111111', 'XXXXXXXX');
    SET msg = REPLACE(msg, '11111111111N', 'XXXXXXXXXXXX');
    SET OutputRoot.BLOB.BLOB = CAST(msg AS BLOB CCSID 1208 ENCODING 815);
END;

JavaCompute: For removing line maybe?

FileOutput: To generate the output txt file

Answer by Daniel Steinmann (best answer)

If you use the record detection feature of the FileInput node, each delimited record (here, each line) is propagated through the flow as a separate message, so your requirement can be met in ESQL alone.

FileInput node:

  • Records and Elements: Record detection = Delimited
  • Connect the End of Data terminal to the Finish File terminal of the FileOutput node

Compute node:

CREATE COMPUTE MODULE Thaqif_Compute

    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        SET OutputRoot = InputRoot;
        DECLARE line CHARACTER CAST(OutputRoot.BLOB.BLOB AS CHAR
                                    CCSID InputProperties.CodedCharSetId
                                    ENCODING InputProperties.Encoding);
        IF CONTAINS(line, 'USA') THEN
            -- Returning FALSE propagates nothing, so this record is dropped
            RETURN FALSE;
        ELSE
            CALL maskMessage(line);
            SET OutputRoot.BLOB.BLOB = CAST(line AS BLOB 
                                            CCSID InputProperties.CodedCharSetId
                                            ENCODING InputProperties.Encoding);
            RETURN TRUE;
        END IF;
    END;

    CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
        SET msg = REPLACE (msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
        -- Other patterns removed for brevity
        SET msg = REPLACE (msg, 'Q1111111', 'XXXXXXXX');
    END;

END MODULE;
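
One caveat: CONTAINS is a plain substring test, so a record containing 'USAGE' would be dropped as well. If only the whole word should match, the check could be written with a regular expression in a JavaCompute node instead. The following is a minimal plain-Java sketch of that check, not wired to the IIB API; the class name WordMatch and the helper containsWord are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class WordMatch {
    // Hypothetical helper: true only when `word` appears as a whole word in `line`.
    static boolean containsWord(String line, String word) {
        // Pattern.quote keeps any regex metacharacters in `word` literal;
        // \b anchors the match at word boundaries on both sides.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("in the USA", "USA"));   // whole word
        System.out.println(containsWord("USAGE report", "USA")); // substring only
    }
}
```

Here the first call matches and the second does not, because 'USA' inside 'USAGE' is not followed by a word boundary.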

FileOutput node:

  • Records and Elements: Record definition = Record is delimited data

Example input:

Hello my name 
is Malcom and I live
in the USA
where 431.111.55.113 is masked
but Q2222222 is still ok

Resulting output:

Hello my name 
is Malcom and I live
where XXX.XXX.XX.XXX is masked
but Q2222222 is still ok
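
If you would rather keep reading the whole file as one BLOB and do the removal in the JavaCompute node, the same result can be produced by splitting the string into lines yourself. The following is a minimal plain-Java sketch of that logic under a few assumptions: lines are separated by "\n", only two of the mask patterns are shown, and the names LineFilter, filterAndMask and mask are hypothetical. It does not use the IIB Java API; getting the BLOB out of the message tree and back is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class LineFilter {
    // Hypothetical helper: drops every line containing `word` and masks the rest.
    static String filterAndMask(String content, String word) {
        List<String> kept = new ArrayList<>();
        // The -1 limit keeps trailing empty strings, so a final newline survives.
        for (String line : content.split("\n", -1)) {
            if (line.contains(word)) {
                continue; // same effect as RETURN FALSE in the Compute node
            }
            kept.add(mask(line));
        }
        return String.join("\n", kept);
    }

    // Two of the patterns from maskMessage; the others would be added the same way.
    static String mask(String line) {
        return line.replace("431.111.55.113", "XXX.XXX.XX.XXX")
                   .replace("Q1111111", "XXXXXXXX");
    }

    public static void main(String[] args) {
        String input = "Hello my name \nis Malcom and I live\nin the USA\n"
                     + "where 431.111.55.113 is masked\nbut Q2222222 is still ok";
        System.out.println(filterAndMask(input, "USA"));
    }
}
```

Run against the example input above, this prints the same four lines as the resulting output. The record detection approach is still simpler for large files, since the flow never has to hold the whole file in memory at once.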