Lyrinė Tālrunis: “Telephone” with Lyric Translations

Last week I hacked together a mashup of HP’s IdolOnDemand‘s free Speech Recognition API and Google’s fee-for-service Translation API to create a Lyric Translator. Of course, I had to make the site responsive using VW units for text, and Flexbox for the layout. I also used a datalist to provide an optional list of usable media files. Then, tonight, I wrote a blog post explaining these components.

Enjoy!

The app

Lyrinė Tālrunis, or “Lyric Telephone”, does two things:

  1. It captures the text from an audio file using the free Speech Recognition API from HP IdolOnDemand,
  2. It then translates the captured text into from the original language into French, German, Spanish, Vietnamese, Russian and then back to the original language, enabling you to create new lyrics for — or a more entertaining interpretation of — your favorite songs.

The name is basically Lyrics, as in lyrics from songs, but you can use any media file, and ‘Telephone’ in reference to the game of telephone you learned in pre-K, where when a story gets passed to too many people (or languages) the original text gets morphed into something else.

As deployed, the app provides for two preloaded options of Rudolph the Red-nosed Reindeer and Frosty the Snowman, but you can include a link to any .wav or .mp3 file you find on line.

I’ll be expanding the application to allow for file uploads, and hope to implement uploading directly from your device’s microphone with:

<input type="file" accept="audio/*;capture=microphone">

For right now, simply find a song online, or even a video file, and include the full URL in the input box.

HP’s IdolOnDemand Speech Recognition API

IDOL OnDemand’s Speech Recognition API creates a transcript of the text in an audio or video file. It supports seven languages – including both US an UK English (yes, it will produce “colour” instead of “color” if you ask it to).

You do first need to register with IdolOnDemand to get an API Key. Then it’s a simple AJAX call.

The values you need create the request URL for the speech recognitio API include:

  • The encoded URI to your .wav or other audio of video file.
  • Your API Key
  • The language your file is in (defaults to en-US)

Here is how I created my URL:

  var apikey = YOUR_HP_API_KEY;
      url = document.getElementById('url').value,
      language = document.getElementById('lang').value, 
      query = 'https://api.idolondemand.com/1/api/sync/recognizespeech/v1' + '?';       
  query  += "&url=" + encodeURIComponent(url);
  query  += "&apikey=" + apikey;
  query  += "&language=" + ( language || "en-US");

Where url is the id of the input where the user enters the full path to the media file. The input has 3 default options for you to choose from, but you can enter any text you wish. This is explained in the <datalist> / list attribute section below.

The ‘lang’ is the ID of the <select> drop down that list the langage of the media file.

<select name="lang" id="lang">
  <option value="en-US">English (US)</option>
  <option value="en-GB">English (UK)</option>
  <option value="de-DE">German</option>
  <option value="es-ES">Español</option>
  <option value="fr-FR">Français</option>
  <option value="it-IT">Italiano</option>
</select>

The speech recognition API works for all the above languages and Chinese. Surprisingly, I can actually read and understand all of the languages above (don’t ask).  As I can’t read Chinese and wouldn’t be able to debug it, I didn’t include it in this app.  If you want to include Chinese as an option, please feel free to fork the app repo and add Chinese back in, but please use your own API key.

IDOL OnDemand’s Speech Recognition API uses the language code and the country subcode, so use the long form like “en-US” and not “en”. Don’t know what I am talking about (or have insomnia)? Read up on language tags.

Depending on the file size, your file can take a while to process, so make sure to let your user know something is happening, and make sure to handle errors in case it times out. For better user experience, when the button gets clicked, calling the API, the content of the button changes to a rotating pipe in the hopes of making a quick and easy spinner. (Check out the CSS file if you want to learn the animation, as I am not covering it here). The animation stops and the button returns to the original text when the text extraction of the media file is returned from the Speech API.

 var app = {
     ...
    
     init : function () {
	     // add eventHandler to button
        document.getElementById('doThis').addEventListener('click', function(){
          app.submitToHP();
          app.changeButton();
        });
      },

      // get the words from the original media file
      // the `data` object contains default values & the `apikey`
      submitToHP : function () {
        data.url = document.getElementById(
url').value || data.url;
        var query = data.request_url + '?';
            query += "&url=" + encodeURIComponent(data.url);
            query += "&apikey=" + data.apikey;
            query += "&language=" + document.getElementById('lang').value || "en-US");

        var request = $.ajax(query, function(e) {
                // successfully sent - no actions
              })
            .done(function(e) {
                // response received - handle it
                app.acceptResponse(e.document[0].content);
              })
            .fail(function(e) {
                // error - handle it
                app.acceptResponse('Oops, something went wrong.');
              })
            .always(function(e) {
                // finished - stop button animation
                app.revertButton();
            });
      },
      ... // continues

The reqest returns a JSON object:

{
  "document": [
    {
      "content": "the media speech is here"
    }
  ]
}

So we grab that content with:

e.document[0].content

where e is the response.

The other functions included, but not described include:

  • app.changeButton() — changes button to a spinner
  • app.revertButton() — resets the button to original behavior
  • app.acceptResponse(e.document[0].content) — writes text to the page, and initiates translations, which are done via the Goole Translate API

Google Translate API

I tried finding a good, free, intuitive, easy to use, translation API, but came up empty handed. Sorry Microsoft, you have too many steps, and I couldn’t just “dive right in.” I do, however, have the Yandex translation API on my list of things to look up. It looks like it might be a good free alternative to Google’s fee-for-service translation API.

Time is money, and I was already familiar with the Google Translate API, which is the main reason I chose it. Again, fork me repo to try something different.

The request URL for Google Translate API looks something like this:

var request_URL = "https://www.googleapis.com/language/translate/v2?key=" +  
  YOUR_GOOGLE_API_KEY +
  "&source=" + from +
  "&target=" + to +
  "&q=" + encodeURIComponent(text);

Where you use your own Google API Key, the ‘from’ is the original language, the ‘to’ is the language you want to translate to, and the text you encode with encodeURIComponent(text) is the return value from the Speech Recognition API, what we captured as e.document[0].content above.

As we are translating from the original language to French, German, Spanish, Vietnamese, Russian and then back again to the original language, we are calling the Google API 6 times. The important information is how to create the link for Google’s REST API. You can take a look at the source code to see how I iterate thru the various languages and call the AJAX call for each translation.

CSS3 Values

The page is full responsive. On larger devices, the font is larger. This is done without @media queries. I know. I know. Media queries are all the rage. But they’re not always necessary. CSS3 provides responsive features that enable the creation of responsive content without having to define where your design splits. The browser can do it for you.

In this case, it’s the VW, or viewport width unit, that makes the site naturally responsive. The VW unit is relative to the viewport width: the viewport width is 100vw. If you don’t know all your length units, my 4 year old post needs some updating.

  h1 {
    line-height: 30vh;
    font-size: 8vw;
  }

The above CSS snippet reads “the line height should be 30% of the viewport height. The font size should be 8% of the viewport width”

As the viewport narrows, the font-size will shrink. Yes, it will get illegible if the viewport is too narrow, but no phone is narrow enough to make that illegible. That size is relatively huge, and will even be legible on new watches. And the height shrinks, so does the line height, meaning the h1 will never be taller than 30% of the height of the viewport.

I used VW and VH for the height of the blue header and containers for the translation content: even when empty, the articles will be 40% of the document height.

The content in the button and the input also grow and shrink as the viewport grows or shrinks. vh and vw are very well supported, though vmax, the maximum value of the viewport width and height, and vmin, the lesser of those 2 values, is not fully supported.

CSS Flexible Box Layout Module

CSS Layout is fun! No. Seriously. It is. Just use flexbox.

body {
  display: flex;
  flex-direction: column;
  flex: 1 0 300px;
}
header, article {
  flex: 3;
  min-height: 40vh;
}
footer, main {
  flex: 1; 
  max-height: 10vh;
}
main, article {
  display: flex;
  justify-content: space-around;
  align-items: center;
}
section {
  flex: 1;
}
@media screen and (max-width: 500px) {
  main, article {
    flex-wrap: wrap;
  }
}

Above is just part of the CSS. I’ve posted the CSS relevant to the flexible layout of the document. You’ll note that the layout has four vertical sections: the header, main section with the buttons, the articles where the translations go, and the footer.

The body CSS code block above makes the body a flex container, and the header, main, article and footer all flex items. The flex-direction is column, so they’re one on top of another instead of row, which would put the parts side by side.

The main and article are not only flex items, but they, in turn, are also flex containers. The default flex-direction is row, so the children of the main and the children of the article will be laid out side by side within their parent. We have two flex items within the article (the original text result from the Speech API and the final result processed thru 5 translations.) These will be side by side, and will always be of equal height. They will not wrap by default, but if the width of the screen is 500px wide or less, the row of content can wrap, which means the translation can land below the text capture, and the input can land below the button, which can fall below the language selection.

This project is to show a very simple example of flex layout, and is not meant as a full flexbox tutorial. To learn more about flexbox, I have an open source flexbox tutorial you can play with. There you can see that justify-content: space-around; means that the extra space around the items will be evenly distributed around each item, and all the other flexbox, including several not included in this demonstration.

Datalist element and list attribute

Take a look at the input on the page. You’ll note, in modern browsers, there is a little arrow on the right, which if clicked, shows an autocomplete. This effect is achieved using the HTML5 list attribute along with the <datalist> element and that element’s nested <options>.

<input type="url" list="urls" id="url" placeholder=" URL of .wav file">
    
<datalist id="urls">
  <option value="http://estelle.github.io/audiotranslator/data/frosty.wav">Frosty</option>
  <option value="http://estelle.github.io/audiotranslator/data/rudolph.wav">Rudolph</option>
  <option value="http://estelle.github.io/audiotranslator/data/tet.wav">Test File</option>
</datalist>

In this example, we have an <input> of type url, with a list attribute. The value of the list attribute is the value of the ID of the datalist element. This associates the urls datalist with the input. If a browser doesn’t support datalist, it will simply not show the datalist. Totally progressive enhancement: the form control is still usable in Netscape 4.7

If the browser does support datalist, a drop down menu of the options will show when the input has focus. It will only show the values that are still potentionally valid. If you enter ‘h’, all the values will still show. ANy other character, and the options will no longer be valide options, and they will no longer be displayed. I have a tutorial on web forms, which demonstrates the inclusion of datalist on text, url, email, number, color, range, date and time input types.

Future of the app

Here are some ideas I have for expanding the application. Please fork and do it for me :D

  • File upload option
  • Drag and Drop from desktop to upload file
  • Audio Capture directly from your device
  • Inclusion of original <audio> for your listening pleasure

Published by

Estelle Weyl

My name is Estelle Weyl. I am a consulting web developer, am writing some books with O'Reilly, run frontend workshops, and speak about web development, performance, and other fun stuff all over the world. If you have any recommendations on topics for me to hit, please let me know via comments. If you want