Getting distinct records with Prisma and PostgreSQL

January 15, 2024

PostgreSQL allows us to filter a query’s results and ensure we don’t get duplicate rows. This can be helpful when your table has many rows where the data in the columns is the same. In this article, we explore two ways PostgreSQL helps us solve this problem. We also learn how to do that with Prisma and configure it to use the native database features instead of filtering the duplicates in memory.

The DISTINCT keyword#

Let’s say we have the following model created with Prisma.

schema.prisma#

model Address {
  id      Int    @id @default(autoincrement())
  street  String
  city    String
  country String
  user    User?
}
 
model User {
  id        Int       @id @default(autoincrement())
  email     String    @unique
  name      String
  password  String
  address   Address?  @relation(fields: [addressId], references: [id])
  addressId Int?      @unique
}

Let’s say that we have the following addresses in our table:

street

city

country

350 5th Ave

New York

United States

290 Bremner Blvd

Toronto

Canada

Westminster

London

United Kingdom

Great Russell St

London

United Kingdom

1600 Pennsylvania Ave NW

Washington, D.C.

United States

1000 5th Ave

New York

United States

Hill Rise

Richmond

United Kingdom

East Main Street

Richmond

United States

First, let’s create a query to get a list of all city names from our database.

SELECT city FROM "Address";

city

New York

Toronto

London

Washington, D.C.

New York

Richmond

We can modify the above query using the DISTINCT keyword to get a list of unique cities.

SELECT DISTINCT city FROM "Address";

city

New York

Washington, D.C.

London

Toronto

Richmond

Using multiple columns#

When we look closer at the list of addresses, we can see a city called Richmond, both in the UK and the United States. Fortunately, we can use the DISTINCT keyword to get a unique combination of the city and country.

SELECT DISTINCT city, country FROM "Address";

city

country

New York

United States

Toronto

Canada

London

United Kingdom

Washington, D.C.

United States

Richmond

United Kingdom

Richmond

United States

The DISTINCT ON keyword#

With PostgreSQL, we can access another handy tool called the DISTINCT ON. It is similar to DISTINCT but allows us to retain other columns from the row.

SELECT DISTINCT ON (city) * FROM "Address";

street

city

country

Westminster

London

United Kingdom

1000 5th Ave

New York

United States

East Main Street

Richmond

United States

290 Bremner Blvd

Toronto

Canada

1600 Pennsylvania Ave NW

Washington, D.C.

United States

Above, we selected distinct cities, but we also see ids, countries, and streets.

The catch is that DISTINCT ON selected one row per each distinct value, but it is unpredictable. In the case of New York, it removed 350 5th Ave (the Empire State Building) and selected 1000 5th Ave (The Metropolitan Museum of Art).

The DISTINCT ON keyword makes the most sense when combined with the ORDER BY clause. If we order the results before applying the DISTINCT ON filter, we can be sure that PostgreSQL will always choose the first row based on the order we specified. Thanks to that, we can have predictable results.

Let’s use DISTINCT ON to find the first user who signed up with an address from a particular country.

First, we need to join the User and Address table.

SELECT "User".id AS userId, "Address".country AS country
FROM "User"
JOIN "Address" ON "User"."addressId" = "Address".id

Let’s make sure each country is unique using the DISTINCT ON clause. By ordering the rows by the user ID, we ensure we get the users with the lowest ID possible. Thanks to that, we get the first users who signed up from a particular country.

SELECT DISTINCT ON(country) "User".id AS "firstRegisteredUserId", "Address".country AS country
FROM "User"
JOIN "Address" ON "User"."addressId" = "Address".id
ORDER BY "Address".country, "firstRegisteredUserId"

firstRegisteredUserId

country

Canada

United Kingdom

United States

Finding distinct values using Prisma#

Prisma allows us to filter duplicate rows when using the findMany query.

const distinctCities = await this.prismaService.address.findMany({
  distinct: ['city'],
  select: {
    city: true
  }
})

We can take it further and replicate the example we wrote before using the DISTINCT ON keyword.

const countriesWithFirstUser = await this.prismaService.address.findMany({
  include: {
    user: true,
  },
  distinct: ['country'],
  orderBy: {
    user: {
      id: 'asc',
    },
  },
});

How it works under the hood#

Let’s configure Prisma to log all SQL queries to the console.

prisma.service.ts#

import { Injectable, OnModuleInit } from "@nestjs/common"
import { PrismaClient, Prisma } from "@prisma/client"
@Injectable()
export class PrismaService
  extends PrismaClient<Prisma.PrismaClientOptions, Prisma.LogLevel>
  implements OnModuleInit
{
  constructor() {
    super({
      log: [
        {
          emit: "event",
          level: "query",
        },
      ],
    })
  }
  async onModuleInit() {
    await this.$connect()
    this.$on("query", (event) => {
      console.log(`Query: ${event.query}`)
    })
  }
}

Let’s run our findMany query that finds all distinct cities and inspect the SQL query that Prisma makes under the hood:

SELECT "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1

Unfortunately, there is a catch. By default, Prisma makes separate SELECT queries and processes the data in memory when we use distinct. This can result in a performance that is not as good as a native SQL query that uses the DISTINCT keyword. Since Prisma 5.7.0, we can affect that by using a preview feature.

schema.prisma#

generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["nativeDistinct"]
}

When we add the nativeDistinct preview feature and run npx prisma generate, Prisma starts using the DISTINCT ON clause.

SELECT DISTINCT ON ("public"."Address"."city") "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1

Unfortunately, it only works with unordered queries right now. If we run our findMany query that finds the first users who signed up from a particular country, Prisma still parses the data in memory instead of using DISTINCT ON:

SELECT "public"."Address"."id", "public"."Address"."street", "public"."Address"."city", "public"."Address"."country" FROM "public"."Address" LEFT JOIN "public"."User" AS "orderby_1" ON ("orderby_1"."addressId") = ("public"."Address"."id") WHERE 1=1 ORDER BY "orderby_1"."id" ASC OFFSET $1;
SELECT "public"."User"."id", "public"."User"."email", "public"."User"."name", "public"."User"."password", "public"."User"."addressId" FROM "public"."User" WHERE "public"."User"."addressId" IN ($1,$2,$3) OFFSET $4;

Hopefully, Prisma will improve its distinct implementation soon and use the native DISTINCT ON in more cases. We can track the progress of this issue on GitHub.

Summary#

PostgreSQL offers efficient methods to filter out duplicate rows in queries, which helps handle large datasets with repetitive data. In this article, we explored the DISTINCT and DISTINCT ON keywords in PostgreSQL and compared them using various examples.

We also learned how to filter out duplicate values through Prisma. Moreover, we’ve looked under the hood and learned that Prisma filters out the data in memory by default. Finally, we’ve learned how to change the default behavior and configure Prisma to rely more on the native features built into PostgreSQL.