For example, I have duplicate records that differ only in a column called Comments. I want to keep records that share the same business line, item name, etc., but when those fields match and only the Comments values differ, I would like to remove the extra records.
I currently have this calculated column:
IsDuplicate = IF( CALCULATE( COUNTROWS('2023'), FILTER( ALLEXCEPT('2023','2023'[Business Line],'2023'[PD Rating],'2023'[Item Name]), '2023'[Comments] <> EARLIER('2023'[Comments]) ) ) > 0, TRUE, FALSE )
This works exactly as I need and returns "TRUE" for the duplicates, but I cannot just remove all of the records returning "TRUE" because this removes the records entirely rather than keeping one and removing just the duplicate.
1 Answer
What about this:
Duplicate = IF(CALCULATE(COUNT(TABLE[DUPLICATED_COLUMN]), ALLEXCEPT(TABLE, TABLE[DUPLICATED_COLUMN])) > 1 && TABLE[Date] > CALCULATE(MIN(TABLE[Date]), ALLEXCEPT(TABLE, TABLE[DUPLICATED_COLUMN])), "True", "False")
Let me explain. The first part:
CALCULATE(COUNT(TABLE[DUPLICATED_COLUMN]), ALLEXCEPT(TABLE, TABLE[DUPLICATED_COLUMN])) > 1
counts the number of occurrences for each value of the column with duplicates. If the value is not duplicated, then the count will return 1, so we want the values greater than 1.
The second part:
&& TABLE[Date] > CALCULATE(MIN(TABLE[Date]), ALLEXCEPT(TABLE, TABLE[DUPLICATED_COLUMN]))
is the filter for the duplicated value you want to keep. In my example, I'm keeping the row where the Date is the minimum. You can change it to any filter you want.
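As a rough sketch of swapping in a different "keep" rule, here is how the same idea might look with the column names from the question. The question does not mention a Date column, so this version assumes the row to keep is the one whose Comments value sorts first within each group. That tie-breaker is my assumption, not something the asker specified. It also assumes Comments is never blank.

```dax
IsDuplicate =
// Smallest Comments value among rows that share the same
// Business Line, PD Rating and Item Name as the current row.
VAR FirstComment =
    CALCULATE(
        MIN('2023'[Comments]),
        ALLEXCEPT('2023', '2023'[Business Line], '2023'[PD Rating], '2023'[Item Name])
    )
RETURN
    // TRUE for every row in the group except the one(s) holding the first comment.
    // Rows that are identical in all columns, Comments included, are all kept.
    '2023'[Comments] <> FirstComment
```

With this rule the separate count check is not needed: a group with a single row always matches its own minimum, so it is never flagged.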
Finally, if both conditions are met, the formula returns "True"; otherwise it returns "False". Now you can create a new table like this:
RemovedDuplicates = CALCULATETABLE(TABLE, FILTER(TABLE, TABLE[Duplicate] = "False"))
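If you prefer a shorter form, the same calculated table can probably be written with FILTER alone, since a calculated table is evaluated without any outer filter context. This is a sketch using the same placeholder names as above (TABLE and Duplicate), so replace them with your real table and column names.

```dax
RemovedDuplicates =
// Keep only the rows that the Duplicate column did not flag.
// "False" is compared as text because the column above returns strings.
FILTER(TABLE, TABLE[Duplicate] = "False")
```

If the flag column returns real booleans instead of text, the condition would be `TABLE[Duplicate] = FALSE()` rather than a string comparison.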