
How can I store a large dataset result by chunks to a csv file in nodejs?

Ask Time: 2021-09-22T05:20:37         Author: fpelaezt


I have a MySQL table with about 10 million records, and I would like to export those records to a CSV file using Node.js.

I know I can run a single query to get all the records, store the result in a JSON-like variable, and send it to a CSV file using a library like fast-csv in conjunction with createWriteStream. Writing the result to the file is doable with a stream. What I want to avoid is holding 10 million records in memory (suppose the records have a lot of columns).
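For reference, a minimal sketch of that all-in-memory approach (assuming rows is already an array of plain objects, one key per column) would look roughly like this:

const fs = require('fs')
const fastcsv = require('fast-csv')

// Sketch: format every row at once and pipe the CSV output into a file
const writeAllRows = (rows, filename) =>
    fastcsv
        .write(rows, { headers: true })
        .pipe(fs.createWriteStream(filename))
        .on('finish', () => console.log(`wrote ${rows.length} rows to ${filename}`))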

What I would like to do instead is query only a subset of the result (for example 20k rows), write that chunk to the file, then query the next subset (the next 20k rows), append it to the same file, and repeat until the process finishes. The problem is that I don't know how to hold back the next iteration until the current write completes. According to my debugging, several write operations run at the same time because of Node.js's asynchronous nature, leaving me with a file where some lines are mixed (multiple results on the same line) and the records are out of order.
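For illustration, I imagine something like the sketch below could serialize the writes, but I'm not sure it's the right approach. The helper name writeChunk and its parameters are just placeholders; the idea is to wrap the write stream's 'finish' event in a Promise so the loop can await each chunk before querying the next one:

const fs = require('fs')
const fastcsv = require('fast-csv')

// Sketch only: resolve once this chunk has been flushed to disk
const writeChunk = (filename, rows, firstChunk) =>
    new Promise((resolve, reject) => {
        const ws = fs.createWriteStream(filename, { flags: firstChunk ? 'w' : 'a' })
        fastcsv
            .write(rows, {
                headers: true,                // discover column names from the object keys
                writeHeaders: firstChunk,     // only emit the header row on the first chunk
                includeEndRowDelimiter: true, // so the next appended chunk starts on a new line
            })
            .pipe(ws)
            .on('finish', resolve)
            .on('error', reject)
    })

// Inside an async function, each iteration could then do:
//   await writeChunk(filename, records, offset === 0)
// before querying the next 20k rows.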

I know the total execution time suffers with this approach, but in this case I prefer a controlled process over high RAM consumption.

For the database queries I'm using Sequelize with MySQL, but the idea is the same regardless of the query method.

This is my code so far:

const fs = require('fs')
const fastcsv = require('fast-csv')

// Store file function receives:
// (String) filename
// (Boolean) headers: true on the first iteration so the column names are written
// (JSON document) jsonData: the information to store in the file
// (Boolean) append: false on the first iteration so a new file is created
const storeFile = (filename, headers, jsonData, append) => {
    const flags = append === true ? 'a' : 'w'
    const ws = fs.createWriteStream(filename, { flags })
    fastcsv
        .write(jsonData, { headers, rowDelimiter: '\r\n' })
        .pipe(ws)
        .on('finish', () => {
            logger.info(`file=${filename} created/updated successfully`)
        })
}

// main (this code runs inside an async function, so await is allowed)
let filename = 'test.csv'
let offset = 0
let append = false
let headers = true
const limit = 20000
const totalIterations = Math.ceil(10000000 / limit)

for (let i = 0; i < totalIterations; i += 1) {
    // eslint-disable-next-line no-await-in-loop
    const records = await Record.findAll({
        offset,
        limit,
        raw: true,
    })
    storeFile(filename, headers, records, append)
    headers = false
    append = true
    offset += limit // offset is incremented to get the next subset
}

Author: fpelaezt. Reproduced under the CC BY-SA 4.0 license with a link to the original source.
Link to original question: https://stackoverflow.com/questions/69275677/how-can-i-store-a-large-dataset-result-by-chunks-to-a-csv-file-in-nodejs