How to find and remove duplicates in Baserow

olgatrykush · January 24, 2024, 9:59am

Whether it’s an import error or human error, duplicate data makes your databases less useful. Here are four options for identifying duplicates in Baserow:

Option 1: Manual sorting and removal
Option 2: Identify duplicates using Zapier
Option 3: Create new rows using Make
Option 4: Remove existing duplicates with n8n

Let’s dive in!

marcus · January 24, 2024, 11:49am

Thank you for the tip.
That is good, but I still believe, as I requested in the past, that detecting duplicates right in Baserow, without needing another piece of software or service, would be much appreciated.
I think this could be possible to implement, especially if there is a mechanism already to see unique and non-unique field content.

For example, my use case is pretty simple:
I have a table of 10k+ rows where the primary field is just a text type: people’s full names. I need to find duplicates (the same full name appearing two or more times in the table), review them visually, and remove them manually. Sometimes the same full name doesn’t mean it is the same person (there can be two or more different people with the same name, and these I need to keep in the table).
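
For anyone with the same use case, here is a minimal sketch of the "find, then review by hand" step done outside Baserow. It assumes the rows have already been exported or fetched as a list of dictionaries; the field name "Full name", the `find_duplicate_groups` helper, and the sample rows are all hypothetical. It only lists the groups and deletes nothing, so namesakes who are different people can still be kept.

```python
from collections import defaultdict


def find_duplicate_groups(rows, field="Full name"):
    """Return {normalized name: [rows]} for names that occur more than once."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(field) or ""
        # Collapse repeated whitespace and ignore case so near-identical
        # spellings land in the same group.
        key = " ".join(str(value).split()).casefold()
        if key:
            groups[key].append(row)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical sample data standing in for an exported table.
sample_rows = [
    {"id": 1, "Full name": "Ada Lovelace"},
    {"id": 2, "Full name": "Alan Turing"},
    {"id": 3, "Full name": "ada  lovelace "},
]

for name, matches in find_duplicate_groups(sample_rows).items():
    print(name, [row["id"] for row in matches])
```

The normalization is deliberately loose (case and spacing only); the final decision on which rows to remove stays manual.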

naamval · January 24, 2024, 1:28pm

I can’t say anything about the first three suggestions, but the n8n instructions will not work.

There is no need to use the ‘split in batches’ node, but even if you did use it, this is not the way to do it. Without actually creating a loop back to the node, it will only process the first batch and then stop executing. By the way, the ‘split in batches’ node also looks different (and works more intuitively) in the latest n8n versions.
The ‘item lists’ node does not exist any longer in the latest versions of n8n. It has been replaced by separate nodes. In this case, you would use the ‘remove duplicates’ node instead.
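
As a rough illustration of what a "remove duplicates" step does conceptually, here is a short sketch that keeps the first item per key and collects the rest. This is not n8n's actual implementation; the `split_first_and_repeats` helper, the `Name` field, and the sample items are hypothetical.

```python
def split_first_and_repeats(items, key_field):
    """Keep the first item seen for each key; collect later repeats separately."""
    seen = set()
    kept, repeats = [], []
    for item in items:
        key = item.get(key_field)
        if key in seen:
            repeats.append(item)  # candidate for deletion
        else:
            seen.add(key)
            kept.append(item)
    return kept, repeats


# Hypothetical items, as they might come out of a "get all rows" step.
items = [
    {"id": 10, "Name": "Acme"},
    {"id": 11, "Name": "Globex"},
    {"id": 12, "Name": "Acme"},
]
kept, repeats = split_first_and_repeats(items, "Name")
print([item["id"] for item in kept])
print([item["id"] for item in repeats])
```

All items are handled in a single pass here, which mirrors the point above that no batching loop is needed for this step.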

olgatrykush · January 24, 2024, 4:38pm

Thank you for your feedback, @naamval. Our content lead has made the necessary changes to the blog post.

olgatrykush · January 24, 2024, 4:39pm

Hey @marcus, sure thing, that’s on our list! While we don’t have these features yet, we thought it might be useful to share a tutorial with some workarounds.

marcus · January 24, 2024, 7:37pm

OK, that would be really great, the sooner the better, because doing it manually is soooo exhausting and very error-prone. This is especially true when I need to tag selected rows (people) with a specific status: there is a problem later (e.g. when filtering) if one of them is tagged differently from the other…

artoflogic · May 11, 2024, 9:48am

I would like to see a modified countif function that allows me to check how many times a certain entry in one column is already there in Baserow.

e.g. like in Excel =COUNTIF(A:A,A2)
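
Until something like that exists as a formula, the same per-row count can be sketched outside Baserow. This is a minimal example, assuming the column has been exported as a plain list of values; `count_occurrences` and the sample column are hypothetical names for illustration.

```python
from collections import Counter


def count_occurrences(values):
    """For each value, return how many times it appears in the whole column."""
    counts = Counter(values)
    return [counts[value] for value in values]


# Hypothetical column values.
column = ["Ada Lovelace", "Alan Turing", "Ada Lovelace"]
for value, count in zip(column, count_occurrences(column)):
    print(value, count)
```

Any row with a count above 1 has at least one duplicate elsewhere in the column, which is the same signal the Excel formula gives.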