10 min read

Offset and keyset pagination with raw SQL queries

So far, we have returned the full content of our tables. However, as our database grows, this might prove not to be the best approach in terms of performance.…

October 3, 2022

So far, we have returned the full content of our tables. However, as our database grows, this might prove not to be the best approach in terms of performance. A popular solution is to serve the data in chunks by presenting multiple pages or implementing infinite scrolling. In this article, we implement its back-end aspect using NestJS and PostgreSQL. We also compare various approaches to achieving it and point out their advantages and disadvantages.

You can find the code from this article in this repository.

Offset and limit#

Let’s start by investigating this simple query:

SELECT id, title FROM posts

It returns all of the records from the posts table.

A significant thing to acknowledge is that the order of the above rows is not guaranteed. However, when implementing pagination, we depend on the order of rows to be predictable. Therefore, we should use the ORDER BY clause.

SELECT id, title FROM posts
ORDER BY id ASC

To start paginating our data, we need to limit the number of rows in our query. To do that, we need the LIMIT statement.

SELECT id, title FROM posts
ORDER BY id ASC
LIMIT 10

Thanks to the above, we now get the first ten items instead all of them. This allows us to present the user with the first page of results.

To serve the second page of the data, we need to specify the starting point of our query. We can use the OFFSET keyword to specify how many rows we want to skip.

SELECT id, title FROM posts
ORDER BY id ASC
OFFSET 10
LIMIT 10

Above, we omit the first ten posts and get ten posts in the results. In our case, it gives us entities with ids from 11 to 20. This is where the order of our data plays a significant role. We can easily modify it by changing the ORDER BY clause, but keeping some order is important.

Counting the number of rows#

It is a common approach to display the number of data pages to the user. For example, if we have fifty rows and display ten per page, we have five data pages.

To do the above, we need to know the number of rows of data in our table. To do that, we can use the COUNT keyword.

SELECT COUNT(*) AS all_posts_count FROM posts

We can use the COUNT keyword while selecting data from some columns. When doing that, we need to specify the section of the data we are counting by partitioning it. For example, we can count the number of posts by a certain author.

SELECT author_id, COUNT(*) OVER (PARTITION BY author_id) AS author_posts_count FROM posts

To present the results in a readable way, let’s only display one row per author using the DISTINCT keyword.

SELECT DISTINCT author_id, COUNT(*) OVER (PARTITION BY author_id) AS author_posts_count FROM posts

Above, we can see that the author with id 3 wrote two posts, and the author with id 2 wrote forty posts.

In our case, we want to count the total number of posts. However, even though that’s the case, we still need to use the OVER clause.

SELECT id, title, COUNT(*) OVER() AS total_posts_count FROM posts
ORDER BY id ASC
OFFSET 10
LIMIT 10

Grouping and partitioning data with the OVER() function is a good topic for a separate article.

The whole idea is to count the number of rows and fetch their details in the same transaction to keep the integrity of the data. When we run a single query, PostgreSQL wraps it in a transaction out of the box.

We can define a transaction separately if we want to count the posts in a separate SELECT statement.

BEGIN;
  SELECT id, title FROM posts
  ORDER BY id ASC
  OFFSET 10
  LIMIT 10;
  
  SELECT COUNT(*) AS total_posts_count FROM posts;
COMMIT;

It is also important to notice that PostgreSQL returns the result of COUNT as big int. The maximum value of a regular integer is 2³¹⁻¹ (2,147,483,647), and for a big integer, it is 2⁶³⁻¹ (9,223,372,036,854,775,807).

Unfortunately, JavaScript does not know how to parse big integers to JSON out of the box.

const data = {
  value: BigInt(10)
}
 
JSON.stringify(data);

Uncaught TypeError: Do not know how to serialize a BigInt

If we don’t expect our table to hold more than 2,147,483,647 elements, we can cast the result of COUNT(*) to a regular integer.

SELECT COUNT(*) OVER()::int AS total_posts_count FROM posts

Implementing offset pagination with NestJS#

When implementing the offset pagination with NestJS, we expect the user to provide the offset and limit as query parameters. To handle that, we can create a designated class.

paginationParams.ts#
import { IsNumber, Min, IsOptional } from "class-validator"
import { Type } from "class-transformer"
class PaginationParams {
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  @Min(0)
  offset?: number
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  @Min(1)
  limit?: number
}
export default PaginationParams

We then use it in our controller.

posts.controller.ts#
import { ClassSerializerInterceptor, Controller, Get, Query, UseInterceptors } from "@nestjs/common"
import { PostsService } from "./posts.service"
import GetPostsByAuthorQuery from "./getPostsByAuthorQuery"
import PaginationParams from "../utils/paginationParams"
@Controller("posts")
@UseInterceptors(ClassSerializerInterceptor)
export default class PostsController {
  constructor(private readonly postsService: PostsService) {}
  @Get()
  getPosts(
    @Query() { authorId }: GetPostsByAuthorQuery,
    @Query() { offset, limit }: PaginationParams,
  ) {
    return this.postsService.getPosts(authorId, offset, limit)
  } // ...
}

The last step is to implement the logic in our PostsRepository class.

posts.repository.ts#
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostModel from "./post.model"
@Injectable()
class PostsRepository {
  constructor(private readonly databaseService: DatabaseService) {}
  async get(offset = 0, limit: number | null = null) {
    const databaseResponse = await this.databaseService.runQuery(
      `
      SELECT id, title, COUNT(*) OVER()::int AS total_posts_count FROM posts
      ORDER BY id ASC
      OFFSET $1
      LIMIT $2
    `,
      [offset, limit],
    )
    const items = databaseResponse.rows.map((databaseRow) => new PostModel(databaseRow))
    const count = databaseResponse.rows[0]?.total_posts_count || 0
    return {
      items,
      count,
    }
  } // ...
}
export default PostsRepository

A significant thing above is that we provide default values for offset and limit:

  • providing for offset means that we don’t intend to skip any rows,
  • by setting the limit to null, we state that we don’t want to limit the results.

Doing all of the above, we end up with fully functional offset pagination.

Disadvantages#

The offset and limit approach to pagination is widely used. Unfortunately, it has some significant disadvantages.

The most important caveat is that the database needs to compute all of the rows skipped by the OFFSET keyword. This can take a toll on the performance:

  • first, the database sorts all of the rows as specified in the ORDER BY clause,
  • then, PostgreSQL drops the number of rows specified in the OFFSET.

Aside from the above issue, we can run into a problem with consistency:

  1. the first user fetches page number one with posts,
  2. the second user creates a new post that ends up on page number one,
  3. the first user fetches the second page.

Unfortunately, the above operations cause the first user to see the last element of the first page again on the second page. Besides that, the user missed the element added to the first page.

Advantages#

The offset approach is very common and straightforward to implement. It is also very easy to change the column we use for sorting, including multiple columns. It makes it an acceptable solution in many cases, especially if the offset is not expected to be big and the data inconsistencies are acceptable.

Keyset pagination#

We can take another approach to pagination by filtering out the data we’ve already seen using the WHERE keyword instead of OFFSET. First, let’s run the following query:

SELECT id, title FROM posts
ORDER BY id ASC
LIMIT 10

In the results, we can see that the last post has an id of 10. We can now use this knowledge to request posts with the id bigger than 10.

SELECT id, title FROM posts
WHERE id > 10
ORDER BY id ASC
LIMIT 10

To get the next page of results, we need to inspect the above results and notice that the id of the last row is 20. We can use that to modify our WHERE clause.

SELECT id, title FROM posts
WHERE id > 20
ORDER BY id ASC
LIMIT 10

Unfortunately, this exposes the most significant disadvantages of the keyset pagination. To get a chunk of data, we need to know the id of the last element of the previous chunk. This makes traversing more than one page at once impossible.

To change the column by which we order our elements, we need to modify both ORDER BY and WHERE clauses.

Counting the number of rows#

It is crucial to notice that using the WHERE clause affects the rows counted with COUNT(*). To deal with this issue, we need to count the rows separately. We can create an explicit transaction or use a Common Table Expression query using the WITH statement.

WITH selected_posts AS (
  SELECT id, title FROM posts
  WHERE id > 10
  ORDER BY id ASC
  LIMIT 10
),
total_posts_count_response AS (
  SELECT COUNT(*) AS total_posts_count FROM posts
)
SELECT * FROM selected_posts, total_posts_count_response

Implementing keyset pagination with NestJS#

First, let’s modify our PaginationParams class to accept an additional query parameter.

paginationParams.ts#
import { IsNumber, Min, IsOptional } from "class-validator"
import { Type } from "class-transformer"
class PaginationParams {
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  @Min(0)
  offset?: number
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  @Min(1)
  limit?: number
  @IsOptional()
  @Type(() => Number)
  @IsNumber()
  @Min(1)
  idsToSkip?: number
}
export default PaginationParams

We also need to modify our PostsRepository to handle the additional parameter.

posts.repository.ts#
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostModel from "./post.model"
@Injectable()
class PostsRepository {
  constructor(private readonly databaseService: DatabaseService) {}
  async get(offset = 0, limit: number | null = null, idsToSkip = 0) {
    const databaseResponse = await this.databaseService.runQuery(
      `
      WITH selected_posts AS (
        SELECT * FROM posts
        WHERE id > $3
        ORDER BY id ASC
        OFFSET $1
        LIMIT $2
      ),
      total_posts_count_response AS (
        SELECT COUNT(*)::int AS total_posts_count FROM posts
      )
      SELECT * FROM selected_posts, total_posts_count_response
    `,
      [offset, limit, idsToSkip],
    )
    const items = databaseResponse.rows.map((databaseRow) => new PostModel(databaseRow))
    const count = databaseResponse.rows[0]?.total_posts_count || 0
    return {
      items,
      count,
    }
  } // ...
}
export default PostsRepository

Disadvantages#

The most apparent disadvantage of the keyset pagination is that the users need to know the id of the row they want to start with. However, we could overcome that by mixing the offset-based pagination with the keyset pagination.

Additionally, the column used in the WHERE clause should have an index for an additional performance boost. Fortunately, PostgreSQL creates an index for every primary key out of the box. Therefore, the keyset pagination should perform well when using ids.

Also, ordering the results by text fields might not be straightforward if we want to use natural sorting. If you want to know more, check out this answer on StackOverflow.

Advantages#

The keyset pagination can be a significant performance improvement over the offset-based approach. It also solves the data inconsistency issue we can experience with offset pagination. The user adding or removing elements between fetching chunks of data does not cause elements to be duplicated or skipped.

Summary#

In this article, we’ve gone through two different approaches to pagination with PostgreSQL. After pointing out their advantages and disadvantages, we can conclude that each can be a reasonable solution. The keyset pagination is more restrictive but can provide a performance boost. Fortunately, we can mix different ways of paginating the data, and combining the offset and keyset pagination can cover a wide variety of cases.

Offset and keyset pagination with raw SQL queries | NestJS.io