Getting distinct records with Prisma and PostgreSQL
PostgreSQL allows us to filter a query’s results and ensure we don’t get duplicate rows. This can be helpful when your table has many rows where the data in…
January 15, 2024
PostgreSQL allows us to filter a query’s results and ensure we don’t get duplicate rows. This can be helpful when your table has many rows where the data in the columns is the same. In this article, we explore two ways PostgreSQL helps us solve this problem. We also learn how to do that with Prisma and configure it to use the native database features instead of filtering the duplicates in memory.
The DISTINCT keyword#
Let’s say we have the following model created with Prisma.
schema.prisma#
model Address {
id Int @id @default(autoincrement())
street String
city String
country String
user User?
}
model User {
id Int @id @default(autoincrement())
email String @unique
name String
password String
address Address? @relation(fields: [addressId], references: [id])
addressId Int? @unique
}Let’s say that we have the following addresses in our table:
id
street
city
country
1
350 5th Ave
New York
United States
2
290 Bremner Blvd
Toronto
Canada
3
Westminster
London
United Kingdom
4
Great Russell St
London
United Kingdom
6
1600 Pennsylvania Ave NW
Washington, D.C.
United States
7
1000 5th Ave
New York
United States
8
Hill Rise
Richmond
United Kingdom
9
East Main Street
Richmond
United States
First, let’s create a query to get a list of all city names from our database.
SELECT city FROM "Address";city
New York
Toronto
London
London
Washington, D.C.
New York
Richmond
Richmond
We can modify the above query using the DISTINCT keyword to get a list of unique cities.
SELECT DISTINCT city FROM "Address";city
New York
Washington, D.C.
London
Toronto
Richmond
Using multiple columns#
When we look closer at the list of addresses, we can see a city called Richmond, both in the UK and the United States. Fortunately, we can use the DISTINCT keyword to get a unique combination of the city and country.
SELECT DISTINCT city, country FROM "Address";city
country
New York
United States
Toronto
Canada
London
United Kingdom
Washington, D.C.
United States
Richmond
United Kingdom
Richmond
United States
The DISTINCT ON keyword#
With PostgreSQL, we can access another handy tool called the DISTINCT ON. It is similar to DISTINCT but allows us to retain other columns from the row.
SELECT DISTINCT ON (city) * FROM "Address";id
street
city
country
3
Westminster
London
United Kingdom
7
1000 5th Ave
New York
United States
9
East Main Street
Richmond
United States
2
290 Bremner Blvd
Toronto
Canada
6
1600 Pennsylvania Ave NW
Washington, D.C.
United States
Above, we selected distinct cities, but we also see ids, countries, and streets.
The catch is that DISTINCT ON selected one row per each distinct value, but it is unpredictable. In the case of New York, it removed 350 5th Ave (the Empire State Building) and selected 1000 5th Ave (The Metropolitan Museum of Art).
The DISTINCT ON keyword makes the most sense when combined with the ORDER BY clause. If we order the results before applying the DISTINCT ON filter, we can be sure that PostgreSQL will always choose the first row based on the order we specified. Thanks to that, we can have predictable results.
Let’s use DISTINCT ON to find the first user who signed up with an address from a particular country.
First, we need to join the User and Address table.
SELECT "User".id AS userId, "Address".country AS country
FROM "User"
JOIN "Address" ON "User"."addressId" = "Address".idLet’s make sure each country is unique using the DISTINCT ON clause. By ordering the rows by the user ID, we ensure we get the users with the lowest ID possible. Thanks to that, we get the first users who signed up from a particular country.
SELECT DISTINCT ON(country) "User".id AS "firstRegisteredUserId", "Address".country AS country
FROM "User"
JOIN "Address" ON "User"."addressId" = "Address".id
ORDER BY "Address".country, "firstRegisteredUserId"firstRegisteredUserId
country
3
Canada
4
United Kingdom
2
United States
Finding distinct values using Prisma#
Prisma allows us to filter duplicate rows when using the findMany query.
const distinctCities = await this.prismaService.address.findMany({
distinct: ['city'],
select: {
city: true
}
})We can take it further and replicate the example we wrote before using the DISTINCT ON keyword.
const countriesWithFirstUser = await this.prismaService.address.findMany({
include: {
user: true,
},
distinct: ['country'],
orderBy: {
user: {
id: 'asc',
},
},
});How it works under the hood#
Let’s configure Prisma to log all SQL queries to the console.
prisma.service.ts#
import { Injectable, OnModuleInit } from "@nestjs/common"
import { PrismaClient, Prisma } from "@prisma/client"
@Injectable()
export class PrismaService
extends PrismaClient<Prisma.PrismaClientOptions, Prisma.LogLevel>
implements OnModuleInit
{
constructor() {
super({
log: [
{
emit: "event",
level: "query",
},
],
})
}
async onModuleInit() {
await this.$connect()
this.$on("query", (event) => {
console.log(`Query: ${event.query}`)
})
}
}Let’s run our findMany query that finds all distinct cities and inspect the SQL query that Prisma makes under the hood:
SELECT "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1Unfortunately, there is a catch. By default, Prisma makes separate SELECT queries and processes the data in memory when we use distinct. This can result in a performance that is not as good as a native SQL query that uses the DISTINCT keyword. Since Prisma 5.7.0, we can affect that by using a preview feature.
schema.prisma#
generator client {
provider = "prisma-client-js"
previewFeatures = ["nativeDistinct"]
}When we add the nativeDistinct preview feature and run npx prisma generate, Prisma starts using the DISTINCT ON clause.
SELECT DISTINCT ON ("public"."Address"."city") "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1Unfortunately, it only works with unordered queries right now. If we run our findMany query that finds the first users who signed up from a particular country, Prisma still parses the data in memory instead of using DISTINCT ON:
SELECT "public"."Address"."id", "public"."Address"."street", "public"."Address"."city", "public"."Address"."country" FROM "public"."Address" LEFT JOIN "public"."User" AS "orderby_1" ON ("orderby_1"."addressId") = ("public"."Address"."id") WHERE 1=1 ORDER BY "orderby_1"."id" ASC OFFSET $1;
SELECT "public"."User"."id", "public"."User"."email", "public"."User"."name", "public"."User"."password", "public"."User"."addressId" FROM "public"."User" WHERE "public"."User"."addressId" IN ($1,$2,$3) OFFSET $4;Hopefully, Prisma will improve its distinct implementation soon and use the native DISTINCT ON in more cases. We can track the progress of this issue on GitHub.
Summary#
PostgreSQL offers efficient methods to filter out duplicate rows in queries, which helps handle large datasets with repetitive data. In this article, we explored the DISTINCT and DISTINCT ON keywords in PostgreSQL and compared them using various examples.
We also learned how to filter out duplicate values through Prisma. Moreover, we’ve looked under the hood and learned that Prisma filters out the data in memory by default. Finally, we’ve learned how to change the default behavior and configure Prisma to rely more on the native features built into PostgreSQL.