
I have a simple use case: removing any blank row found in a CSV file. How can I achieve this using NiFi?

My CSV file is shown in the attached screenshot, which marks the row that needs to be removed.

I want to remove the first blank row in the CSV, just above the headers, using NiFi. Any suggestions are much appreciated. Thank you!

  • What does this have to do with regex? Commented Jan 4, 2019 at 0:19
  • @emsimpson92 In NiFi you can use regex or any other way to delete the blank row. I want to know ways to achieve my use case. Commented Jan 4, 2019 at 0:41
  • Please edit your question and provide the text of your flow file. Is only the first line empty? Commented Jan 4, 2019 at 5:49
  • @daggett I want to remove the first row of the CSV. Please see the attached screenshot. Commented Jan 4, 2019 at 18:30

1 Answer

You can use a ReplaceText processor that replaces \A\n|\n*\s*(?=\n) with '' (an empty replacement value). The search regex looks for either of the following:

  • \A\n - the beginning of the content immediately followed by a newline, OR
  • \n*\s*(?=\n) - zero or more newlines, followed by zero or more whitespace characters, followed by a newline (the final newline is matched by a lookahead, so it is not consumed or replaced)
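To see what the two alternatives do, the same pattern can be tried outside NiFi. The following is a minimal sketch in plain Java, assuming the processor applies the search value as a Java regular expression over the entire flow file content (\A only anchors at the very start of the text). The class name BlankLineRemover and the method removeBlankLines are hypothetical names used only for this illustration, and the sample input assumes Unix (\n) line endings.

```java
import java.util.regex.Pattern;

class BlankLineRemover {
    // The search value from the answer, escaped as a Java string literal.
    private static final Pattern BLANK_LINES =
            Pattern.compile("\\A\\n|\\n*\\s*(?=\\n)");

    // Hypothetical helper: replaces every match with an empty string,
    // mirroring an empty replacement value.
    static String removeBlankLines(String content) {
        return BLANK_LINES.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample content with a blank first row above the headers.
        String csv = "\nheader1,header2,header3\nA1,A2,A3\nB1,B2,B3\nC1,C2,C3";
        System.out.print(removeBlankLines(csv));
    }
}
```

With this input the leading newline is removed by the first alternative, and the header and data rows are left untouched. The second alternative also collapses blank or whitespace-only rows that appear between data rows.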

Update

I'm not sure why this was downvoted or why it did not work for some users; I just created a template and it worked exactly as described.

Overview of NiFi flow

Configuration of GenerateFlowFile processor

Configuration of ReplaceText processor

2019-01-08 12:25:27,642 INFO [Timer-Driven Process Thread-2] o.a.n.processors.standard.LogAttribute LogAttribute[id=2f22d047-0168-1000-47b0-9ec963e65367] logging for flow file StandardFlowFileRecord[uuid=6c9cc388-19c8-4b98-9970-6a6e3979e4ee,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1546979126561-1, container=default, section=1], offset=152, length=50],offset=0,name=6c9cc388-19c8-4b98-9970-6a6e3979e4ee,size=50]
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
    Value: 'Tue Jan 08 12:25:27 PST 2019'
Key: 'lineageStartDate'
    Value: 'Tue Jan 08 12:25:27 PST 2019'
Key: 'fileSize'
    Value: '50'
FlowFile Attribute Map Content
Key: 'filename'
    Value: '6c9cc388-19c8-4b98-9970-6a6e3979e4ee'
Key: 'path'
    Value: './'
Key: 'uuid'
    Value: '6c9cc388-19c8-4b98-9970-6a6e3979e4ee'
--------------------------------------------------
header1,header2,header3
A1,A2,A3
B1,B2,B3
C1,C2,C3

4 Comments

  • It went to failure. Can you provide more detail? For example, what should the replacement strategy be?
  • I've provided a template, screenshots of the configuration and flow statistics, and log output verifying that this works.
  • If this causes a failure for you, there may be other characters on your "blank" line. Please provide the CSV file in plaintext format, not as a screenshot in Excel.
  • Thanks for the detailed explanation.
