Sometimes I want to remove the duplicate values returned by the LISTAGG function, and I would rather not use a subquery or a similar workaround.
I finally found a great solution using a regular expression! It is really cool, and I thought it might be useful to someone :))
select '1223-1223-1223-1345' t1, rtrim( REGEXP_REPLACE('1223-1223-1223-1345', '([^-]*)(-\1)+($|-)', '\1\3'), '-') t2
from dual;
1223-1223-1223-1345 becomes 1223-1345, but make sure the list is ordered by the item, because the pattern only collapses duplicates that sit next to each other!
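A small sketch of the ordering requirement, using made-up literals (in practice the list would come from LISTAGG ... WITHIN GROUP (ORDER BY ...)); the column aliases are only for illustration:

```sql
-- Illustrative literals only; the same pattern as above is applied to both lists.
select regexp_replace('1223-1345-1223', '([^-]*)(-\1)+($|-)', '\1\3') unsorted_list, -- 1223-1345-1223 (no adjacent repeats, nothing removed)
       regexp_replace('1223-1223-1345', '([^-]*)(-\1)+($|-)', '\1\3') sorted_list    -- 1223-1345 (the adjacent repeat collapses)
from dual;
```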
Here is an example from my own work, written as a scalar subquery in the select list of a larger query:
(select regexp_replace(listagg(a.short_name, ',') within group (order by a.short_name), '([^,]*)(,\1)+($|,)', '\1\3')
from gift g, allocation a -- l below is an alias from the enclosing query, which is not shown here
where g.gift_donor_id = l.gift_donor_id and g.gift_associated_allocation = a.allocation_code and a.alloc_school = 'LS') distinct_allocs
The explanation follows, courtesy of Srinivasan - bean farmer
http://srinisqlwork.blogspot.com/2016/12/sql-remove-duplicates-in-listagg-result.html
Understanding the solution with an example:
select
REGEXP_REPLACE( 'English,English,English,Hindi,Hindi,Kannada,Kannada,Kannada,Tamil,Tamil,Telugu,Telugu','([^,]*)(,\1)+($|,)' ,'\1\3')
from dual
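The same pattern applied to an actual LISTAGG result, as a self-contained sketch. The langs rows, the lang column, and the distinct_langs alias are made up for illustration, and LISTAGG assumes Oracle 11g Release 2 or later:

```sql
-- Hypothetical inline data standing in for a real table.
with langs as (
  select 'English' lang from dual union all
  select 'Hindi'   from dual union all
  select 'English' from dual union all
  select 'Tamil'   from dual union all
  select 'Hindi'   from dual
)
-- ORDER BY puts equal values next to each other, so the pattern can collapse them.
select regexp_replace(listagg(lang, ',') within group (order by lang),
                      '([^,]*)(,\1)+($|,)', '\1\3') distinct_langs -- English,Hindi,Tamil
from langs;
```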
1. Understanding the regExMatch expression: '([^,]*)(,\1)+($|,)'
There are three groups in the above expression.
group 1: ([^,]*) : matches zero or more characters up to the next comma
match Result: English
group 2: (,\1)+ : \1 stands for whatever the first group matched, which is English here, so it becomes (,English)+, meaning match one or more occurrences of ',English'.
match Result: ,English,English
group 3: ($|,) : $ stands for 'end of string' and | stands for 'or', so it matches either the end of the string or a comma.
match Result: , (this is the comma that follows the group 2 match ,English,English)
2. Understanding the regExReplace expression: '\1\3'
\1 and \3 refer to groups 1 and 3 of the regExMatch expression, so each match is replaced by the item itself plus the delimiter that followed it, and the repeats captured by group 2 are dropped.
replace Result: English,
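One caveat follows from how group 1 is written: nothing anchors it to the start of an item, so it can match only the tail of one. The sketch below uses made-up values to show the effect, together with a commonly used variant of the pattern that consumes the list item by item. Treat the variant as a suggestion, not as part of the original solution:

```sql
-- Made-up values: 'b' is the tail of 'ab', but the two items are not duplicates.
select regexp_replace('ab,b', '([^,]*)(,\1)+($|,)', '\1\3') original_pattern, -- ab   (the distinct item 'b' is lost)
       regexp_replace('ab,b', '([^,]+)(,\1)*(,|$)', '\1\3') variant_pattern   -- ab,b (both items are kept)
from dual;
```

The variant matches every item together with its trailing delimiter, so each match has to start at an item boundary; adjacent duplicates such as 'a,a,b' still collapse to 'a,b'.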
REGEXP_REPLACE
REGEXP_REPLACE extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string. The string returned is in the same character set as source_char. The function returns VARCHAR2 if the first argument is not a LOB and returns CLOB if the first argument is a LOB.
This function complies with the POSIX regular expression standard and the Unicode Regular Expression Guidelines. For more information, please refer to Appendix C, "Oracle Regular Expression Support".
source_char is a character expression that serves as the search value. It is commonly a character column and can be of any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern is the regular expression. It is usually a text literal and can be of any of the datatypes CHAR, VARCHAR2, NCHAR, or NVARCHAR2. It can contain up to 512 bytes. If the datatype of pattern is different from the datatype of source_char, Oracle Database converts pattern to the datatype of source_char. For a listing of the operators you can specify in pattern, please refer to Appendix C, "Oracle Regular Expression Support".
replace_string can be of any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. If replace_string is a CLOB or NCLOB, then Oracle truncates replace_string to 32K. The replace_string can contain up to 500 backreferences to subexpressions in the form \n, where n is a number from 1 to 9. If n is the backslash character in replace_string, then you must precede it with the escape character (\\). For more information on backreference expressions, please refer to the notes to "Oracle Regular Expression Support", Table C-1.
position is a positive integer indicating the character of source_char where Oracle should begin the search. The default is 1, meaning that Oracle begins the search at the first character of source_char.
occurrence is a nonnegative integer indicating the occurrence of the replace operation:
- If you specify 0, then Oracle replaces all occurrences of the match.
- If you specify a positive integer n, then Oracle replaces the nth occurrence.
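The position, occurrence, and backreference rules above can be sketched with a few literal calls. The strings and column aliases are made up for illustration:

```sql
select regexp_replace('a-b-c-d', '-', '+')       all_occurrences, -- a+b+c+d (occurrence defaults to 0: replace every match)
       regexp_replace('a-b-c-d', '-', '+', 1, 2) second_only,     -- a-b+c-d (only the 2nd match is replaced)
       regexp_replace('a-b-c-d', '-', '+', 4, 0) from_position_4, -- a-b+c+d (the search starts at the 4th character)
       regexp_replace('John Smith', '([^ ]+) ([^ ]+)', '\2, \1') swapped -- Smith, John (backreferences in replace_string)
from dual;
```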
match_parameter is a text literal that lets you change the default matching behavior of the function. This argument affects only the matching process and has no effect on replace_string. You can specify one or more of the following values for match_parameter:
- 'i' specifies case-insensitive matching.
- 'c' specifies case-sensitive matching.
- 'n' allows the period (.), which is the match-any-character character, to match the newline character. If you omit this parameter, the period does not match the newline character.
- 'm' treats the source string as multiple lines. Oracle interprets ^ and $ as the start and end, respectively, of any line anywhere in the source string, rather than only at the start or end of the entire source string. If you omit this parameter, Oracle treats the source string as a single line.
- 'x' ignores whitespace characters. By default, whitespace characters match themselves.
If you specify multiple contradictory values, Oracle uses the last value. For example, if you specify 'ic', then Oracle uses case-sensitive matching. If you specify a character other than those shown above, then Oracle returns an error.
If you omit match_parameter, then:
- The default case sensitivity is determined by the value of the NLS_SORT parameter.
- A period (.) does not match the newline character.
- The source string is treated as a single line.
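A short sketch of the case-sensitivity flags described above. The sample string and column aliases are made up, and position 1 and occurrence 0 are passed explicitly only so that match_parameter can be supplied:

```sql
select regexp_replace('Oracle oracle ORACLE', 'oracle', 'X', 1, 0, 'c')  case_sensitive,   -- Oracle X ORACLE
       regexp_replace('Oracle oracle ORACLE', 'oracle', 'X', 1, 0, 'i')  case_insensitive, -- X X X
       regexp_replace('Oracle oracle ORACLE', 'oracle', 'X', 1, 0, 'ic') last_flag_wins    -- Oracle X ORACLE ('c' comes last, so it wins)
from dual;
```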