Running out of memory writing to a file in NodeJS

I’m processing a very large amount of data, manipulating it, and storing it in a file. I iterate over the dataset, then I want to store it all in a JSON file.

My initial method, using fs to store everything in an object and then dump it all at once, didn’t work: I was running out of memory and it became extremely slow.

I’m now using fs.createWriteStream, but as far as I can tell it’s still storing it all in memory.

I want the data to be written object by object to the file, unless someone can recommend a better way of doing it.

Part of my code:

      // Top of the file
      var wstream = fs.createWriteStream('mydata.json');
      ...
    
      // In a loop
      let JSONtoWrite = {}
      JSONtoWrite[entry.word] = wordData
    
      wstream.write(JSON.stringify(JSONtoWrite))
    
      ...
      // Outside my loop (when memory is probably maxed out)
      wstream.end()
    

I think I’m using streams wrong. Can someone tell me how to write all this data to a file without running out of memory? Every example I find online relates to reading a stream in, but because of the calculations I’m doing on the data, I can’t use a readable stream. I need to add to this file sequentially.

2 Solutions

The problem is that you’re not waiting for the data to be flushed to the filesystem, but instead keep throwing new data at the stream synchronously in a tight loop.

Here’s a piece of pseudocode that should work for you:

        // Top of the file
        const fs = require('fs');
        const wstream = fs.createWriteStream('mydata.json');
        // I'm not sure how you're getting the data; let's say you have it all in an object
        const entry = {};
        const words = Object.keys(entry);

        function writeCB(index) {
           if (index >= words.length) {
               wstream.end();
               return;
           }

           const JSONtoWrite = {};
           JSONtoWrite[words[index]] = entry[words[index]];
           // the callback passed to write() fires once this chunk has been handled,
           // so the next write only starts after the previous one is done
           wstream.write(JSON.stringify(JSONtoWrite), writeCB.bind(null, index + 1));
        }

        writeCB(0); // kick off the chain of writes
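
As a side note (not part of the original answer): if your data is produced in an ordinary loop, the usual way to avoid buffering everything is to honour backpressure yourself, i.e. check write()’s return value and wait for the 'drain' event before writing more. A minimal sketch, where `entries` and `processEntry` are placeholders for whatever produces your data:

        const fs = require('fs');
        const wstream = fs.createWriteStream('mydata.json');

        async function writeAll(entries) {
          for (const entry of entries) {
            const chunk = JSON.stringify(processEntry(entry)); // processEntry is a placeholder
            // write() returns false when the internal buffer is full;
            // wait for 'drain' before pushing more data into the stream
            if (!wstream.write(chunk + '\n')) {
              await new Promise(resolve => wstream.once('drain', resolve));
            }
          }
          wstream.end();
        }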
    

You should wrap your data source in a readable stream too. I don’t know what your source is, but you have to make sure it does not load all your data into memory.

For example, assuming your data set comes from another file where JSON objects are separated by end-of-line characters, you could create a Readable stream as follows:

        const Readable = require('stream').Readable;

        class JSONReader extends Readable {
          constructor(options = {}) {
            super(options);
            this._source = options.source; // the source stream
            this._buffer = '';
            this._ended = false;
            // read whenever the source is ready, and finish once it has ended
            this._source.on('readable', () => this.read());
            this._source.on('end', () => { this._ended = true; this.read(); });
          }

          _read(size) {
            if (this._buffer.length === 0) {
              const chunk = this._source.read(); // read more from source when buffer is empty
              if (chunk !== null) {
                this._buffer += chunk;
              } else if (this._ended) {
                this.push(null); // source exhausted: signal end of stream
                return;
              }
            }
            const lineIndex = this._buffer.indexOf('\n'); // find end of line
            if (lineIndex !== -1) { // we have an end of line and therefore a new object
              const line = this._buffer.slice(0, lineIndex); // the characters making up the object
              if (line) {
                const result = JSON.parse(line);
                this._buffer = this._buffer.slice(lineIndex + 1);
                this.push(JSON.stringify(result) + '\n'); // push to the internal read queue
              } else {
                this._buffer = this._buffer.slice(1); // skip empty lines
              }
            }
          }
        }
    

Now you can use it like this:

        const fs = require('fs');
        const source = fs.createReadStream('mySourceFile');
        const reader = new JSONReader({source});
        const target = fs.createWriteStream('myTargetFile');
        reader.pipe(target);
    

Then you’ll have a better memory flow:

[Figure: synchronous vs. stream memory management]

Please note that the picture and the above example are taken from the excellent Node.js in Practice book.
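
As an aside (not from the original answers): if your source file really is one JSON object per line, Node’s built-in readline module gives you the same line-by-line behaviour without writing a custom Readable. A minimal sketch, reusing the file names from the example above:

        const fs = require('fs');
        const readline = require('readline');

        const rl = readline.createInterface({
          input: fs.createReadStream('mySourceFile'),
          crlfDelay: Infinity // treat \r\n as a single line break
        });
        const target = fs.createWriteStream('myTargetFile');

        rl.on('line', line => {
          // only one line is held in memory at a time
          target.write(JSON.stringify(JSON.parse(line)) + '\n');
        });
        rl.on('close', () => target.end());

This sketch doesn’t handle backpressure on the write side; for very fast sources you’d combine it with the write()/'drain' pattern sketched earlier.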