Copy From

Jorge putting a bunch of sneakers into a closet all at once.

Problem

In backend applications built with Vapor, when we need to insert large volumes of data into the database (initial migrations, catalog imports, bulk loading from external APIs), calling .save() in a loop issues one INSERT, and one transaction, per record.

This results in:

  • High latency: each insert pays its own connection and transaction overhead.
  • Poor throughput: doesn’t leverage the database engine’s bulk insertion capabilities.
  • Timeout risk: slow operations that may fail in environments with time constraints.

For mass import scenarios (thousands or millions of records), we need a bulk insert strategy that leverages native PostgreSQL capabilities.
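For context, this is the naive pattern the problem describes. It is a sketch only; it assumes the RegionModel Fluent model used later in this post and an array of already-decoded records:

```swift
import Fluent

// Illustrative anti-pattern: each save() round-trips to the database,
// so N records cost N INSERT statements and N transactions.
func importRegionsSlowly(_ regions: [RegionModel], on db: any Database) async throws {
    for region in regions {
        try await region.save(on: db) // one INSERT per record
    }
}
```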


Solution

We extend Database with a function that executes PostgreSQL’s COPY command, allowing us to import CSV files directly from the file system into the table schema.

extension Database {
    func importCSV(
        _ model: any Model.Type,
        file: PathEnum
    ) async throws {
        // COPY is raw SQL, so we need the underlying SQLKit database.
        // Fluent databases backed by SQL (e.g. PostgreSQL) also conform to SQLDatabase.
        guard let sql = self as? SQLDatabase else {
            throw Abort(.internalServerError, reason: "COPY requires an SQL database")
        }

        let query = SQLQueryString(
            """
            COPY "\(unsafeRaw: model.space ?? "public")"."\(unsafeRaw: model.schema)"
            FROM '\(unsafeRaw: file.rawValue)'
            WITH (FORMAT csv, HEADER true, DELIMITER ',',
                  QUOTE '"', ESCAPE '"', NULL '')
            """
        )

        try await sql.raw(query).run()
    }
}

Key points:

  • Uses PostgreSQL’s COPY, the fastest method for bulk insert from files.
  • The model.space property supports custom schemas (defaults to “public”).
  • model.schema automatically gets the table name from the Fluent model.
  • Standard CSV configuration: headers, delimiters, and null value handling.
  • Uses unsafeRaw for direct interpolation in the SQL query — safe here only because the schema, table name, and file path come from code, never from user input.
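For reference, a CSV that matches these options might look like the following (the columns are hypothetical; they must match the target table's column order):

```csv
id,code,name
1,NA,North America
2,EU,Europe
3,SA,
```

With HEADER true the first row is skipped, and with NULL '' the empty name in the last row is imported as NULL.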

Result

private func importRegions() async throws {
    try await db.importCSV(
        RegionModel.self,
        file: .LocationFile("regions", .csv)
    )
}
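One natural place to call this helper is a seeding migration. This is a sketch, assuming the regions table already exists and that the PathEnum case resolves to a path visible to the PostgreSQL server:

```swift
import Fluent

// Hypothetical seeding migration built on the importCSV extension above.
struct SeedRegions: AsyncMigration {
    func prepare(on database: any Database) async throws {
        try await database.importCSV(
            RegionModel.self,
            file: .LocationFile("regions", .csv)
        )
    }

    func revert(on database: any Database) async throws {
        // Undo the seed by clearing the table.
        try await RegionModel.query(on: database).delete()
    }
}
```

Note that COPY … FROM reads the file on the database server's file system, so the path must exist there, not on the application server.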

Benefits of this approach:

🚀 Extreme performance: COPY is typically 10-100x faster than individual inserts.

📦 Atomic transaction: the entire import happens in a single operation, guaranteeing consistency.

💾 Resource efficiency: minimizes memory usage and database connections.

🔧 Native integration: leverages optimized PostgreSQL engine capabilities.

📊 Scalability: allows importing millions of records without significant degradation.


Notes

This implementation uses COPY … FROM with file system files. There is currently an open issue in Vapor to implement native support for COPY … FROM STDIN, which would allow performing bulk inserts directly from memory without intermediate files.

I’m actively monitoring this issue to integrate this functionality when available, which will provide an even more flexible and efficient API for mass import operations.

If you also need the reverse operation — exporting data from PostgreSQL to CSV files — I covered that in Copy To.

Keep coding, keep running 🏃‍♂️