Generating statistics using aggregate functions in raw SQL
So far, we’ve been mostly writing SQL queries that either store or retrieve the data from the database. Besides that, we can rely on PostgreSQL to process the…
October 10, 2022
So far, we’ve been mostly writing SQL queries that either store or retrieve the data from the database. Besides that, we can rely on PostgreSQL to process the data and get the computed results. By doing that, we can learn more about the rows in our tables. In this article, we look into how we can use aggregate functions to generate statistics about our data.
For the code from this article check out this repository.
The purpose of aggregate functions#
The job of an aggregate function is to compute a single result from multiple input rows. One of the most popular aggregate functions is count(). When used with an asterisk, it measures the total number of rows in the table.
SELECT count(*) FROM usersWhen we provide the count function with a column name, it counts the number of rows with a non-NULL value for that column.
SELECT count(address_id) AS number_of_users_with_address FROM usersThe count() function comes in handy when implementing pagination.
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostModel from "./post.model"
@Injectable()
class PostsRepository {
constructor(private readonly databaseService: DatabaseService) {}
async get(offset = 0, limit: number | null = null, idsToSkip = 0) {
const databaseResponse = await this.databaseService.runQuery(
`
WITH selected_posts AS (
SELECT * FROM posts
WHERE id > $3
ORDER BY id ASC
OFFSET $1
LIMIT $2
),
total_posts_count_response AS (
SELECT COUNT(*)::int AS total_posts_count FROM posts
)
SELECT * FROM selected_posts, total_posts_count_response
`,
[offset, limit, idsToSkip],
)
const items = databaseResponse.rows.map((databaseRow) => new PostModel(databaseRow))
const count = databaseResponse.rows[0]?.total_posts_count || 0
return {
items,
count,
}
} // ...
}
export default PostsRepositoryGrouping data in the table#
Aggregate functions work great when we perform them on groups of data
SELECT author_id, count(*) FROM posts
GROUP BY author_idWhen we do the above, PostgreSQL divides the data into groups and runs the aggregate function on each group individually.
We could make our query even more helpful and order our results. By doing that, we can ensure the authors with the highest number of posts are at the top of the list.
SELECT author_id, count(*) AS posts_count FROM posts
GROUP BY author_id
ORDER BY posts_count DESCLet’s create a model that can hold the above data.
postAuthorStatistics.model.ts#
export interface PostAuthorStatisticsModelData {
author_id: number
posts_count: number
}
class PostAuthorStatisticsModel {
authorId: number
postsCount: number
constructor(postAuthorStatisticsData: PostAuthorStatisticsModelData) {
this.authorId = postAuthorStatisticsData.author_id
this.postsCount = postAuthorStatisticsData.posts_count
}
}
export default PostAuthorStatisticsModelLet’s create a separate statistics repository to prevent our PostsRepository class from getting too big.
postsStatistics.repository.ts#
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostAuthorStatisticsModel from "./postAuthorStatistics.model"
@Injectable()
class PostsStatisticsRepository {
constructor(private readonly databaseService: DatabaseService) {}
async getPostsAuthorStatistics() {
const databaseResponse = await this.databaseService.runQuery(
`
SELECT author_id, count(*)::int AS posts_count FROM posts
GROUP BY author_id
ORDER BY posts_count DESC
`,
[],
)
return databaseResponse.rows.map((databaseRow) => new PostAuthorStatisticsModel(databaseRow))
}
}
export default PostsStatisticsRepositoryWe also need to point to our new repository in the PostsService class.
posts.service.ts#
import { Injectable } from "@nestjs/common"
import PostsRepository from "./posts.repository"
import PostsStatisticsRepository from "./postsStatistics.repository"
@Injectable()
export class PostsService {
constructor(
private readonly postsRepository: PostsRepository,
private readonly postsStatisticsRepository: PostsStatisticsRepository,
) {}
getPostAuthorStatistics() {
return this.postsStatisticsRepository.getPostsAuthorStatistics()
} // ...
}The last step is to use it in the controller.
posts.controller.ts#
import { ClassSerializerInterceptor, Controller, Get, Query, UseInterceptors } from "@nestjs/common"
import { PostsService } from "./posts.service"
import GetPostsByAuthorQuery from "./getPostsByAuthorQuery"
import PaginationParams from "../utils/paginationParams"
@Controller("posts")
@UseInterceptors(ClassSerializerInterceptor)
export default class PostsController {
constructor(private readonly postsService: PostsService) {}
@Get()
getPosts(
@Query() { authorId }: GetPostsByAuthorQuery,
@Query() { offset, limit, idsToSkip }: PaginationParams,
) {
return this.postsService.getPosts(authorId, offset, limit, idsToSkip)
}
@Get("statistics")
getStatistics() {
return this.postsService.getPostAuthorStatistics()
} // ...
}Other aggregate functions#
There are more aggregate functions besides count(). Let’s go through them.
max and min#
Using the max() function, we can find the largest value of the selected column. Respectively, the min() function returns the smallest value of the column.
Since we don’t have any numerical columns in our posts, let’s pair the above functions with length(). This way, we can get the longest and shortest posts of a particular author.
The
length()function returns the length of a string.
postsStatistics.repository.ts#
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostAuthorStatisticsModel from "./postAuthorStatistics.model"
@Injectable()
class PostsStatisticsRepository {
constructor(private readonly databaseService: DatabaseService) {}
async getPostsAuthorStatistics() {
const databaseResponse = await this.databaseService.runQuery(
`
SELECT
author_id,
count(*)::int AS posts_count,
max(length(post_content)) AS longest_post_length,
min(length(post_content)) AS shortest_post_length
FROM posts
GROUP BY author_id
ORDER BY posts_count DESC
`,
[],
)
return databaseResponse.rows.map((databaseRow) => new PostAuthorStatisticsModel(databaseRow))
}
}
export default PostsStatisticsRepositoryRunning
max()ormin()on a text column returns a string based on the alphabetical order.
sum#
With the sum() function, we can return a total sum of a particular column. Since it only works with numerical values, we also need the length() function.
Since the
sum()function also returns the value in the bigint format, we transform it to a regular integer. We can do it because we don’t expect values bigger than 2³¹⁻¹.
postsStatistics.repository.ts#
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostAuthorStatisticsModel from "./postAuthorStatistics.model"
@Injectable()
class PostsStatisticsRepository {
constructor(private readonly databaseService: DatabaseService) {}
async getPostsAuthorStatistics() {
const databaseResponse = await this.databaseService.runQuery(
`
SELECT
author_id,
count(*)::int AS posts_count,
max(length(post_content)) AS longest_post_length,
min(length(post_content)) AS shortest_post_length,
sum(length(post_content))::int AS all_posts_content_sum
FROM posts
GROUP BY author_id
ORDER BY posts_count DESC
`,
[],
)
return databaseResponse.rows.map((databaseRow) => new PostAuthorStatisticsModel(databaseRow))
}
}
export default PostsStatisticsRepositoryavg#
The avg() function calculates the average of the values in a group. Let’s combine it with the length() function to calculate the average length of all posts of a particular author.
postsStatistics.repository.ts#
import { Injectable } from "@nestjs/common"
import DatabaseService from "../database/database.service"
import PostAuthorStatisticsModel from "./postAuthorStatistics.model"
@Injectable()
class PostsStatisticsRepository {
constructor(private readonly databaseService: DatabaseService) {}
async getPostsAuthorStatistics() {
const databaseResponse = await this.databaseService.runQuery(
`
SELECT
author_id,
count(*)::int AS posts_count,
max(length(post_content)) AS longest_post_length,
min(length(post_content)) AS shortest_post_length,
sum(length(post_content))::int AS all_posts_content_sum,
avg(length(post_content))::real AS average_post_content_length
FROM posts
GROUP BY author_id
ORDER BY posts_count DESC
`,
[],
)
return databaseResponse.rows.map((databaseRow) => new PostAuthorStatisticsModel(databaseRow))
}
}
export default PostsStatisticsRepositoryA significant thing about the avg() function is that it returns the data in the numeric type. It can store many digits and is very useful when exactness is crucial. Parsing this data type to JSON converts it to a string by default. Since we don’t need many digits after the decimal, we convert it to the real data type.
Aggregating data from more than one table#
So far, we’ve been grouping and aggregating data in one table. However, a typical case might be when we want to aggregate data using more than one table. In this case, we need to use the JOIN keyword.
SELECT addresses.country, COUNT(*) AS number_of_users FROM users
LEFT JOIN addresses ON users.address_id = addresses.id
GROUP BY addresses.countryFiltering using aggregate functions and grouping#
So far, to filter the results from the database, we’ve been using the WHERE keyword.
SELECT * FROM posts
WHERE id = 1When we want to filter using aggregate functions and grouping, we need to look at the execution order of SQL clauses.
FROMWHEREGROUP BYHAVINGDISTINCTSELECTORDER BYLIMIT
Since PostgreSQL executes the WHERE clause before GROUP BY, we can’t use it with aggregate functions.
SELECT author_id, count(*) as posts_count FROM posts
WHERE count(*) > 100
GROUP BY author_idERROR: aggregate functions are not allowed in WHERE
LINE 2: WHERE count(*) > 100
Instead, we need to use the HAVING keyword.
SELECT author_id, count(*) AS posts_count FROM posts
GROUP BY author_id
HAVING count(*) > 100Using aliases#
An important caveat is that we can’t use column aliases with the HAVING keyword. So, for example, the following code wouldn’t work:
SELECT author_id, count(*) AS posts_count FROM posts
GROUP BY author_id
HAVING posts_count > 100Instead, we need to use the count() function twice and rely on PostgreSQL to optimize it.
Summary#
In this article, we’ve gone through how to use aggregate functions together with grouping. When doing so, we’ve implemented an endpoint that returns statistics about a particular table. We also wrote an example that uses an aggregate function and grouping when joining two tables. Finally, we’ve also learned how to filter our data using aggregate functions and why we can’t do that with the WHERE keyword.
There is still more to cover when writing raw SQL with NestJS, so stay tuned!