
I have a database of addresses where acronyms have been separated with spaces. I want to remove these spaces, so I turned to trusty regular expressions. However, I am struggling to apply a secondary function to the regexp match '\&'. I have checked the forums and docs and just cannot get this to work. Example data I have is as follows:

  • 'A V C Welding' should be 'AVC Welding'
  • 'H S B C' should be 'HSBC'
  • etc.

I have the following regexp:

trim(regexp_replace(organisation || ' ', '(([A-Z]\s){1}){2,}', replace('\&',' ',''), 'g'))

The replace('\&',' ','') has no effect at all; I just get the same string back. I have tried other functions, e.g. lower('\&'), and none of them works as expected. Concatenation with || does work, however. I have also tried casting '\&' to text and tried replace('' || '\&' || '',' ',''), still with no joy.

Any advice would be much appreciated. I am sure the solution is something very simple, but I just cannot see where to go next!

  • Hi Vivek, thanks for your reply. As per above, I am expecting to be able to convert 'A V C Welding' to 'AVC Welding', 'H S B C' to 'HSBC', etc. It also needs to work for multiple acronyms, so 'P D James & H S Wilson' would need to become 'PD James & HS Wilson'. Any advice you can provide would be much appreciated. Commented Nov 20, 2015 at 10:12
  • So, to confirm: select trim(regexp_replace('A V C Welding', '(([A-Z]\s){1}){2,}', replace('\&',' ',''), 'g')); returns A V C Welding, whereas I am expecting AVC Welding. Commented Nov 20, 2015 at 10:15
  • What exactly do you want to do with replace('\&', ' ', '')? It's a certain no-op this way. Commented Nov 20, 2015 at 11:29
  • Hi Patrick, and thanks for the reply. The logic I am following is: 'find all acronyms in a string and, for each of these, remove any whitespace'. As '\&' contains the matching text for the regexp, I was expecting to be able to just remove the spaces from it and get what I need, e.g. turn 'A V C Welding' into 'AVC Welding'. If I use a fixed string instead of a function, it works, e.g. select trim(regexp_replace('A V C Welding', '(([A-Z]\s){1}){2,}', 'XXX ', 'g')); returns 'XXX Welding', so I don't think I am that far off. Commented Nov 20, 2015 at 11:33

1 Answer


What you are trying to do with \& will never work. The \& in the replacement string inserts the entire matched text, so you cannot post-process it from within the replacement; you need a solution that works on the individual letters.
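To see why the inner call is a no-op, it helps to run the pieces on their own. PostgreSQL evaluates function arguments before calling the function, so replace() only ever sees the two-character literal \& and never the matched text. This sketch assumes standard_conforming_strings is on (the default since PostgreSQL 9.1), so that '\&' is a literal backslash followed by an ampersand:

```sql
-- The inner replace() runs first, on the literal string \&, which contains
-- no spaces, so it comes back unchanged. lower() likewise returns \&.
SELECT replace('\&', ' ', '') AS after_replace,   -- \&
       lower('\&')            AS after_lower;     -- \&

-- regexp_replace() therefore receives plain '\&' as its replacement, which
-- re-inserts each match verbatim and leaves the input as it was.
SELECT regexp_replace('A V C Welding', '(([A-Z]\s){1}){2,}', '\&', 'g');
-- A V C Welding
```

This is also why concatenation appears to work: 'XXX' || '\&' evaluates to XXX\&, which still contains the \& back-reference for regexp_replace() to expand.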

What you need is to replace the pattern CAPITAL-space with just CAPITAL, but only when it is followed by another capital that is not the start of a longer word. In other words, you need a negative lookahead, and when the pattern matches, you replace it with only the first captured group (the capital letter):

select regexp_replace('A V C Welding', '([A-Z]){1}(\s){1}(?![A-Z][a-z])', '\1', 'g');
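Running the same expression over all of the sample strings from the question confirms that it also handles several acronyms in one value. This sketch drops the {1} quantifiers and the second capture group from the pattern above; both are redundant, since only \1 is used in the replacement, so the behaviour is unchanged:

```sql
-- Collapse "capital + whitespace" unless the next token looks like a
-- capitalised word (a capital followed by a lower-case letter).
SELECT input,
       regexp_replace(input, '([A-Z])\s(?![A-Z][a-z])', '\1', 'g') AS collapsed
FROM (VALUES ('A V C Welding'),
             ('H S B C'),
             ('P D James & H S Wilson')) AS samples(input);
-- A V C Welding          | AVC Welding
-- H S B C                | HSBC
-- P D James & H S Wilson | PD James & HS Wilson
```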

You can replace the negative lookahead with something broader if needed (for example, to also cover words that do not start with a capital letter, numbers, punctuation, etc.).
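As one illustration of a stricter variant (not part of the original answer): the pattern above also glues a capital at the end of an all-caps word onto a following initial. In 'SMITH A B', the H before the space is followed by A and a space, which is not "capital + lower-case", so it collapses as well. PostgreSQL's constraint escapes \m (start of a word) and \M (end of a word) can limit the collapse to single-letter words followed by another single-letter word. This swaps the negative lookahead for a positive one and is only a sketch; check it against your real data:

```sql
-- Stricter sketch: collapse "X " only when X starts a word (\m), so it is a
-- single letter given the following \s, and the next token is a single
-- capital that ends a word (\M inside the positive lookahead).
SELECT input,
       regexp_replace(input, '\m([A-Z])\s(?=[A-Z]\M)', '\1', 'g') AS collapsed
FROM (VALUES ('A V C Welding'),
             ('P D James & H S Wilson'),
             ('SMITH A B')) AS samples(input);
```

With this version 'SMITH A B' should keep SMITH intact and only join the two initials, while the sample strings from the question give the same results as before.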


1 Comment

Thank you so much @Patrick, that is exactly what I was looking for! I have not used negative lookahead before; I will be sure to take it into account in the future. Thanks again, you have made my day!
