Tuesday, February 4, 2014

ORCL - REGEXP_REPLACE

I really like the LISTAGG function, but it cannot be used with DISTINCT.

Sometimes I want to remove the duplicate values returned by LISTAGG, and I don't like to use a subquery for that.

Finally I found a great solution using a regular expression! It is really cool, and I thought it might be useful to someone :))


select '1223-1223-1223-1345' t1,
       rtrim(REGEXP_REPLACE('1223-1223-1223-1345', '([^-]*)(-\1)+($|-)', '\1\3'), '-') t2
from dual;

1223-1223-1223-1345 becomes 1223-1345, but make sure the list is ordered by the item, because the pattern only collapses duplicates that sit right next to each other!
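
To see why the ordering matters, here is a small sketch using made-up literal strings (not taken from any real table). It runs the same pattern over a list whose duplicates are adjacent and over one where another value sits in between. Since the pattern only looks for an item that is immediately repeated, the second string should come back unchanged.

```sql
-- Illustrative literals only; '-' is the delimiter, as in the query above.
select regexp_replace('1223-1223-1345', '([^-]*)(-\1)+($|-)', '\1\3') as adjacent_dups,   -- 1223-1345
       regexp_replace('1223-1345-1223', '([^-]*)(-\1)+($|-)', '\1\3') as separated_dups   -- 1223-1345-1223 (unchanged)
from dual;
```

This is why the LISTAGG call needs an ORDER BY on the same expression that is being aggregated.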

Here is an example from my own work (a fragment of a larger query):

(select regexp_replace(listagg(a.short_name, ',') within group (order by a.short_name), '([^,]*)(,\1)+($|,)', '\1\3')
   from gift g, allocation a
  where g.gift_donor_id = l.gift_donor_id   -- l is presumably an alias from the enclosing query, which is not shown here
    and g.gift_associated_allocation = a.allocation_code
    and a.alloc_school = 'LS') distinct_allocs
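
Because the fragment above depends on tables that are not shown, here is a minimal self-contained sketch of the same idea. The sample_allocs data set and its columns (donor_id, short_name) are invented for illustration and stand in for the real gift and allocation tables.

```sql
-- Hypothetical sample data; names and values are assumptions, not the real schema.
with sample_allocs as (
  select 1 as donor_id, 'Arts' as short_name from dual union all
  select 1, 'Arts'    from dual union all
  select 1, 'Science' from dual union all
  select 2, 'Law'     from dual
)
select donor_id,
       regexp_replace(
         listagg(short_name, ',') within group (order by short_name),  -- sorted, so duplicates are adjacent
         '([^,]*)(,\1)+($|,)',
         '\1\3') as distinct_allocs
from sample_allocs
group by donor_id
order by donor_id;
-- donor_id 1: Arts,Science
-- donor_id 2: Law
```

On Oracle 19c and later, LISTAGG(DISTINCT ...) is also available and may be simpler; the regular expression approach remains useful on older versions.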

The explanation follows, courtesy of Srinivasan - bean farmer:

http://srinisqlwork.blogspot.com/2016/12/sql-remove-duplicates-in-listagg-result.html

Understanding the solution with example:
select
REGEXP_REPLACE('English,English,English,Hindi,Hindi,Kannada,Kannada,Kannada,Tamil,Tamil,Telugu,Telugu', '([^,]*)(,\1)+($|,)', '\1\3')
from dual;
1. Understanding regExMatch expression: '([^,]*)(,\1)+($|,)'
There are three groups in the above expression:
group 1: ([^,]*) : match zero or more characters up to (but not including) the next comma
match Result: English
group 2: (,\1)+ : \1 stands for the first group, which matched English, so it becomes (,English)+, meaning match one or more occurrences of ',English'.
match Result: ,English,English
group 3: ($|,) : $ stands for 'end of string' and | stands for 'or', so it matches either the end of the string or a comma.
match Result: ,    (this is the comma that follows the group 2 match: ,English,English)
2. Understanding regExReplace expression '\1\3':
\1 and \3 refer to groups 1 and 3 of the regExMatch expression, so the whole match is replaced by the first item followed by the delimiter (if any) that ended the match.
replace Result: English,
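
One caveat is worth checking before relying on this pattern: group 1 is not anchored to the start of an item, so a match can begin in the middle of one. If an item ends with the full text of the item that follows it, that tail is treated as a duplicate. The sketch below uses invented values ('ALS' and 'LS') to show this, together with a commonly suggested variant, '([^,]+)(,\1)*(,|$)', which consumes every item along with its delimiter so that each match starts at an item boundary. Treat the variant as something to verify against your own data rather than a guaranteed fix.

```sql
-- Illustrative literals only.
select regexp_replace('ALS,LS',   '([^,]*)(,\1)+($|,)', '\1\3') as original_pattern,    -- ALS (the item 'LS' is wrongly dropped)
       regexp_replace('ALS,LS',   '([^,]+)(,\1)*(,|$)', '\1\3') as whole_item_pattern,  -- ALS,LS
       regexp_replace('LS,LS,LS', '([^,]+)(,\1)*(,|$)', '\1\3') as real_duplicates      -- LS
from dual;
```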

REGEXP_REPLACE

Syntax
REGEXP_REPLACE(source_char, pattern [, replace_string [, position [, occurrence [, match_parameter ]]]])

Purpose
REGEXP_REPLACE extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string. The string returned is in the same character set as source_char. The function returns VARCHAR2 if the first argument is not a LOB and returns CLOB if the first argument is a LOB.
This function complies with the POSIX regular expression standard and the Unicode Regular Expression Guidelines. For more information, please refer to Appendix C, "Oracle Regular Expression Support".
  • source_char is a character expression that serves as the search value. It is commonly a character column and can be of any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB or NCLOB.
  • pattern is the regular expression. It is usually a text literal and can be of any of the datatypes CHAR, VARCHAR2, NCHAR, or NVARCHAR2. It can contain up to 512 bytes. If the datatype of pattern is different from the datatype of source_char, Oracle Database converts pattern to the datatype of source_char. For a listing of the operators you can specify in pattern, please refer to Appendix C, "Oracle Regular Expression Support".
  • replace_string can be of any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. If replace_string is a CLOB or NCLOB, then Oracle truncates replace_string to 32K. The replace_string can contain up to 500 backreferences to subexpressions in the form \n, where n is a number from 1 to 9. To include a literal backslash in replace_string, precede it with the escape character, which is also a backslash (\\). For more information on backreference expressions, please refer to the notes to "Oracle Regular Expression Support", Table C-1.
  • position is a positive integer indicating the character of source_char where Oracle should begin the search. The default is 1, meaning that Oracle begins the search at the first character of source_char.
  • occurrence is a nonnegative integer indicating the occurrence of the replace operation:
    • If you specify 0, then Oracle replaces all occurrences of the match.
    • If you specify a positive integer n, then Oracle replaces the nth occurrence.
  • match_parameter is a text literal that lets you change the default matching behavior of the function. This argument affects only the matching process and has no effect on replace_string. You can specify one or more of the following values for match_parameter:
    • 'i' specifies case-insensitive matching.
    • 'c' specifies case-sensitive matching.
    • 'n' allows the period (.), which is the match-any-character character, to match the newline character. If you omit this parameter, the period does not match the newline character.
    • 'm' treats the source string as multiple lines. Oracle interprets ^ and $ as the start and end, respectively, of any line anywhere in the source string, rather than only at the start or end of the entire source string. If you omit this parameter, Oracle treats the source string as a single line.
    • 'x' ignores whitespace characters. By default, whitespace characters match themselves.
    If you specify multiple contradictory values, Oracle uses the last value. For example, if you specify 'ic', then Oracle uses case-sensitive matching. If you specify a character other than those shown above, then Oracle returns an error.
    If you omit match_parameter, then:
    • The default case sensitivity is determined by the value of the NLS_SORT parameter.
    • A period (.) does not match the newline character.
    • The source string is treated as a single line.
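
The optional arguments described above can be sketched with a few throwaway literals (the strings and column aliases are invented for illustration). The expected values in the comments follow from the parameter descriptions: occurrence 0 replaces every match, a positive occurrence replaces only that match, position sets where the search starts, and 'i' turns off case sensitivity.

```sql
select regexp_replace('a-b-c-d', '-', '+')                    as all_occurrences,  -- a+b+c+d
       regexp_replace('a-b-c-d', '-', '+', 1, 2)              as second_only,      -- a-b+c-d
       regexp_replace('a-b-c-d', '-', '+', 5, 1)              as from_position_5,  -- a-b-c+d
       regexp_replace('Cat cat CAT', 'cat', 'dog', 1, 0, 'i') as ignore_case       -- dog dog dog
from dual;
```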
