Capturing Captions from Youtube Videos

There are many tutorials on how to download the caption files created by Youtube if you own the video, but I was unable to find a way to download the captions of videos I don’t own. There’s probably an easy way to do it, but since I couldn’t find it, creating a JavaScript function to do it via the console took less effort.

Here’s the code (I did it three ways depending on how you like to code your JS)

Constructor method:

function CaptionCollector () {
  var that = this;
  this.captions = '';
  var nowShowing = '';

  this.collect = function(){
    try {
      var currentCaption = document.getElementsByClassName("captions-text")[0].innerText;
    } catch (e) {
      var currentCaption = null;
    }

    if(currentCaption && nowShowing != currentCaption) {
      nowShowing = currentCaption;
      that.captions += ' ' + nowShowing;
    }

    setTimeout(that.collect, 300);
  }
}

var foo = new CaptionCollector();
foo.collect();

Print the caption with foo.captions. Of course you can use anything instead of “foo”.

Here’s a version using JS object notation:

var captionCollector = {
    captions : '',
    nowShowing: '',

    collect : function(){
      try {
        var currentCaption = document.getElementsByClassName("captions-text")[0].innerText;
      } catch (e) {
        var currentCaption = null;
      }
    if(currentCaption && this.nowShowing != currentCaption) {
        this.nowShowing = currentCaption;
        captionCollector.captions += ' ' + captionCollector.nowShowing;
     }
    setTimeout(captionCollector.collect, 300);
  }
}

captionCollector.collect();

With this version, you print the console with captionCollector.captions

Or you can use the anonymous function method with a single global variable:

(function(){
    ___captions = '';
    var ___nowShowing = '';

    function getCaption() {
        try {
          var currentCaption = document.getElementsByClassName("captions-text")[0].innerText;
        } catch (e) {
          var currentCaption = null;
        }

        if(currentCaption && ___nowShowing != currentCaption) {
          ___nowShowing = currentCaption;
          ___captions += ' ' + ___nowShowing;
        }
        setTimeout(getCaption, 300);
    }

    getCaption();
})();

With this version, you print the console with the global variable ____captions

Because it uses the classname of the caption box Youtube uses for videos, this only works on Youtube. Alter the classname for other video services.

You do have to play the whole video to capture all the captions. With settings, you can play the video at twice the speed.

Clear the console when the video ends. Print the transcript to the console. Select all. Copy. You’re good to go.

Published by

Estelle Weyl

My name is Estelle Weyl. I am a consulting web developer, am writing some books with O'Reilly, run frontend workshops, and speak about web development, performance, and other fun stuff all over the world. If you have any recommendations on topics for me to hit, please let me know via comments. If you want