This is a Labs feature! We couldn't wait to get your feedback on it, so we are providing early access even though there is still room for improvement. Don't hesitate to share your thoughts about this feature in the NetX Ideas Portal.

NetX's integration with Google's Speech-to-Text API automatically converts the speech in video and audio files to text, generating a VTT file that is fully indexed for search. The text is broken into segments, each covering a set number of seconds of the video or audio file; the segment length is set via a NetX property. The VTT file serves as a textual transcription and, in the case of video files, may be used to provide closed captioning with your NetX video content.

NetX's speech-to-text feature relies on Google's API for text generation and accuracy. Results may vary, and will depend on the audio quality and the clarity of the speech in the file itself. For this reason, VTT files generated from certain content (such as music or lower-quality audio files) may not transcribe the audio as expected, including punctuation such as sentence breaks. Shorter segment lengths will often transcribe with greater accuracy than longer ones.
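To make the idea of fixed-length segments concrete, here is a minimal Python sketch of how cue timings in a VTT file could be laid out for a given segment length. It is an illustration only: the function names are hypothetical, the 15-second default mirrors the documented default of external.google.speech.segmentLength, and NetX's actual segmentation logic may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_cues(duration: float, segment_length: float = 15) -> list[tuple[str, str]]:
    """Return (start, end) timestamp pairs covering `duration` seconds.

    Hypothetical helper: it only shows how a segment length maps to cue
    boundaries; it is not NetX's implementation.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    cues = []
    start = 0.0
    while start < duration:
        # The last segment is shortened so it never runs past the end of the file.
        end = min(start + segment_length, duration)
        cues.append((format_timestamp(start), format_timestamp(end)))
        start = end
    return cues


# A 40-second clip with the default segment length yields three cues.
for start, end in segment_cues(40):
    print(f"{start} --> {end}")
```

Under these assumptions, a smaller segment length produces more, shorter cues, which is also why less caption text appears on screen at a time.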

Setup requirements

System requirements

  • NetX version 8.12 and later
  • FFmpeg installed and configured on your desired NetX instance. FFmpeg is already installed for SaaS customers; if you are on-premise, you can access installation information here.

Google credentials

To use Google's Speech-to-Text feature, you must create an API key with your Google account. Because this key will be linked to a specific Google account, we recommend creating and using a company account rather than tying your NetX instance to a personal account.

  1. From the developer's console, create a new project.
  2. In the Dashboard, under Getting Started, click Explore and enable APIs, then click Enable APIs and Services at the top of the page. Select Cloud Speech-to-Text API and enable it.

    Note

    Be sure to choose Cloud Speech-to-Text API, not Cloud Text-to-Speech API. Creating your credentials before enabling this service will result in an API key that will not work with the speech-to-text feature.

  3. Next, select the Credentials icon along the left-hand sidebar. This will open your Credentials page; choose Create credentials > API key.

  4. This will generate an API key. Use the copy icon to copy your key to your clipboard. This is the key you will use to link NetX with your Google account.
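Before entering the key in NetX, it can be useful to confirm that the key was created after the Cloud Speech-to-Text API was enabled. The sketch below is one possible sanity check, not part of NetX: it builds a request for Google's public v1 speech:recognize REST endpoint with the key passed as a query parameter. The function names, the en-US language code, and the 16 kHz LINEAR16 test audio are all assumptions made for illustration.

```python
import base64
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(api_key: str, audio_bytes: bytes,
                            language_code: str = "en-US",
                            sample_rate_hertz: int = 16000) -> tuple[str, bytes]:
    """Build the URL and JSON body for a synchronous recognize call.

    Assumes raw 16-bit PCM (LINEAR16) audio; adjust the config for other formats.
    """
    url = f"{RECOGNIZE_URL}?{urlencode({'key': api_key})}"
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # The REST API expects the audio payload as base64-encoded text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    return url, json.dumps(body).encode("utf-8")


def check_key(api_key: str, audio_bytes: bytes) -> int:
    """Send the request and return the HTTP status code (network access required)."""
    url, body = build_recognize_request(api_key, audio_bytes)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses (for example 400 or 403) are returned instead of raised.
        return err.code
```

For example, check_key("YOUR_API_KEY", b"\x00\x00" * 16000) sends one second of silence. A 200 status would suggest the key is accepted, while a 4xx status would typically point to a problem with the key or with the API not being enabled for the project.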

Implementation

Once you have generated your Google API key and entered it into the corresponding NetX property, you are ready to set up the AutoTask criteria that will trigger Google's speech-to-text job. You may configure your AutoTask based on standard AutoTask criteria; below you will find examples of simple tasks that generate VTT files for video or audio files on every applicable import into NetX.

VTT files are generated on import, but may not be available immediately, even if your asset is fully uploaded. To determine the status of your file's speech-to-text extraction, check the Jobs queue in the Systems area of your instance. It shows whether the process has completed or, if it is still running, approximately what percentage is done.

Audio

This AutoTask will generate VTT files for all audio assets that are imported into your NetX instance. Note that the action value is set to import, while fileFormatFamily is set to audio.

<task id="speech" name="Speech To Text - Audio">
  <matchCriteria type="and">
    <criteria type="action" value="import"/>
    <criteria type="attribute" name="fileFormatFamily" value="audio"/>
  </matchCriteria>
  <autoTaskJob className="com.netxposure.products.imageportal.autotask2.impl.GoogleSpeechToTextJob"/>
</task>

Video

This AutoTask will generate VTT files for all video assets that are imported into your NetX instance. Note that the action value is set to import, while fileFormatFamily is set to video.

<task id="speech" name="Speech To Text - Video">
  <matchCriteria type="and">
    <criteria type="action" value="import"/>
    <criteria type="attribute" name="fileFormatFamily" value="video"/>
  </matchCriteria>
  <autoTaskJob className="com.netxposure.products.imageportal.autotask2.impl.GoogleSpeechToTextJob"/>
</task>

Closed captioning with your VTT file

Generated VTT files will appear in the views tab of the asset's asset detail page. For audio files, they act simply as downloadable transcriptions in VTT format. For videos, however, they can also be used as closed captions for the video in question. As long as your VTT view is titled previewVTT, this process should be automatic; simply toggle the cc icon to show or hide subtitles in the video preview.
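If you need plain text rather than timed cues from a downloaded transcription, the cue text can be pulled out of the VTT file with a few lines of code. The following Python sketch assumes a simple WebVTT layout (a header, then blank-line-separated cues with a timing line containing "-->"); the function name and the sample content are hypothetical, and it is not a full WebVTT parser.

```python
def vtt_to_transcript(vtt_text: str) -> str:
    """Join the text of every cue in a WebVTT document into one string.

    Lines are treated as cue text only when they follow a timing line
    (one containing "-->") and precede the next blank line. The header,
    cue identifiers, and NOTE blocks are therefore skipped.
    """
    pieces = []
    in_cue = False
    for raw_line in vtt_text.splitlines():
        line = raw_line.strip()
        if not line:
            in_cue = False  # a blank line ends the current cue
            continue
        if "-->" in line:
            in_cue = True  # timing line: the following lines are cue text
            continue
        if in_cue:
            pieces.append(line)
    return " ".join(pieces)


# Hypothetical sample content, for illustration only.
sample = """WEBVTT

00:00:00.000 --> 00:00:15.000
Hello and welcome.

00:00:15.000 --> 00:00:30.000
Today we cover
search indexing.
"""

print(vtt_to_transcript(sample))
```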

Advanced settings

Property | Values | Description | Requires Restart?
external.google.speech.api.key | API key | This is where you will input your Google API key. This should be a string of random numbers and letters generated by Google. | yes
search.index.asset.contents.includeSubtitles | true / false | Determines whether or not your generated VTT file will be indexed for content searches. | yes
external.google.speech.segmentLength | Number | This property will determine how many seconds at a time Google will gather speech data, which will also determine how much closed captioning text is presented at a time. The default value for this property is 15. | yes