Speech To Text

The Speech-to-text service

eResearch Services has developed a speech-to-text service that automatically transcribes audio data. The service is safe and secure, and complies with Australian Government, Griffith, and ARC funding requirements.

What is the speech-to-text service?

The speech-to-text service uses the Microsoft Azure transcription service to transcribe your audio data. Machine learning models automate this process, and Microsoft's wealth of training data helps them produce highly accurate transcriptions.

Unlike other transcription services available, the speech-to-text service is secure. That is, your data will not be shared with third parties, mined for information, or stored in an off-shore location.

Cost

The speech-to-text transcription service hosted at Griffith is free for all Griffith staff and students.

How do I get access?

Griffith University members are able to log in directly without approval or account setup.
The service is available at:

Speech-to-Text Transcription Service @Griffith

You can log in using your Griffith credentials.

Once you have logged in, a help file is available on the front page. To open it, click the circled i (for information) at the top right of the page.

If you experience any issues with this service, please fill in this form.

Note that any uploaded audio files and associated transcriptions older than 30 days will be automatically removed from the system. The system is not designed as a data repository, so once the transcription of your uploaded audio is complete, download the transcription and store your audio and transcription files in either Research Space or Research Drive. Check out available Research Storage options.

Access to the application is only available to Griffith University members. If you would like to give access to someone external to Griffith, you will need to fill out this form to get them a Griffith visitor account.

Speech-to-text examples

Examples of speech-to-text transcriptions can be found here. These examples come straight from the speech-to-text transcription service, with no alterations.

Audio quality has the greatest effect on transcription accuracy. Below are some tips to improve your audio recording, and thereby your transcription.

Resources

Useful booklet regarding audio-visual recording in human research.

Improving the accuracy of transcriptions

General

You should consider the following tips in preparation for recording an interview for transcription with Griffith's speech-to-text service.

1. Please speak as clearly as possible, and avoid speaking too fast.

2. Try not to speak over each other. The transcription service will not be able to differentiate the speakers and may merge everything into the same sentence.

3. Pause between sentences. The AI uses gaps in speech to place punctuation.

4. When conducting phone interviews, please use the phone's internal recorder rather than a separate recorder that is external to the phone. Using an external recorder for phone interviews will significantly decrease transcription accuracy, especially for the person being interviewed.

Recording Devices

There are several options available to record a phone interview or virtual meeting.

Record from a Griffith Desktop Phone

Griffith desktop phones can record calls. You will need to dial the extension 59788 during the call. This creates a voicemail that can be downloaded later from the self-care portal. For more information, please see the user guide.

Important: by default, the maximum recording time on Griffith desktop phones is 5 minutes. Please contact 55555 to extend this time limit.

Microsoft Teams

A comprehensive video about recording interviews in Teams can be found here.

Jabber

To record a Jabber call, you will need to use screen recorder or audio recorder software. If you capture video, you will then need to extract the audio from it. The following are the recommended programs for video capture:

On Windows 10, use the built-in program "Game Bar".

On macOS Mojave or later, use the built-in program "Screenshot toolbar".

There are also downloadable tools, such as Audacity and OBS, that can be used to capture recordings through your computer.

You can then extract the audio from the video file. “VLC media player” is the recommended software and comes installed on all Griffith computers. This is shown in the following images.

You can also use the command line tool "ffmpeg" to extract audio from video, and to change audio formats.
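
As a rough illustration of the ffmpeg route, the sketch below builds and runs an ffmpeg command from Python. It assumes ffmpeg is installed and on your PATH. The function names and the file names "interview.mp4" and "interview.wav" are hypothetical and are not part of the Griffith service. The flags used are standard ffmpeg options: "-i" names the input and "-vn" drops the video stream. ffmpeg infers the output audio format from the output file extension, so the same command also covers changing audio formats.

```python
import shutil
import subprocess
from typing import List


def build_extract_command(video_path: str, audio_path: str) -> List[str]:
    """Return an ffmpeg argument list that keeps only the audio of a video."""
    return [
        "ffmpeg",
        "-i", video_path,  # input video file
        "-vn",             # drop the video stream, keep the audio
        audio_path,        # output file; format is inferred from the extension
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_extract_command(video_path, audio_path), check=True)


# Example call (hypothetical file names):
# extract_audio("interview.mp4", "interview.wav")
```

Building the argument list separately from running it makes it easy to print and check the command before any file is touched.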

Things to consider when running the speech-to-text transcription

Speech Diarization - Diarization will attempt to transcribe the speech of each individual separately, for up to two people. To enable diarization, go into your speech-to-text web portal and tick the “Diarization” checkbox in the advanced settings. If you run diarization, you will need to convert the audio to mono if it was captured in stereo. VLC media player is ideal for converting stereo recordings to mono. Conversion from stereo to mono is also possible using ffmpeg. Check out this How-To guide.
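
The stereo-to-mono step can be sketched in Python as follows. This is a minimal sketch and not the content of the How-To guide. It assumes the recording is a WAV file and that ffmpeg is on your PATH. The function names and file names are hypothetical. Python's standard "wave" module reads the channel count, and the standard ffmpeg option "-ac 1" sets the output to a single audio channel.

```python
import subprocess
import wave
from typing import List


def channel_count(wav_path: str) -> int:
    """Return the number of channels in a WAV file (1 = mono, 2 = stereo)."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getnchannels()


def build_mono_command(source_path: str, mono_path: str) -> List[str]:
    """Return an ffmpeg argument list that downmixes the audio to one channel."""
    return ["ffmpeg", "-i", source_path, "-ac", "1", mono_path]


def ensure_mono(source_path: str, mono_path: str) -> str:
    """Return a path to a mono file, converting only when it is needed."""
    if channel_count(source_path) == 1:
        return source_path  # already mono, nothing to convert
    subprocess.run(build_mono_command(source_path, mono_path), check=True)
    return mono_path


# Example call (hypothetical file names):
# ensure_mono("interview.wav", "interview_mono.wav")
```

Checking the channel count first avoids re-encoding a recording that is already mono.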

AI Model - Choose the latest AI model for your transcription. Models are named by year, month, and day, so please pick the model with the most recent date. For example, 20201019 was the latest when this document was written.
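
To show how the date-based naming works, the sketch below picks the most recent model from a list of names. It assumes every name is exactly a YYYYMMDD date, as in the 20201019 example. The other names in the list are made up for illustration and are not real models. The portal itself only requires you to pick the model by eye.

```python
from datetime import datetime
from typing import Iterable


def latest_model(model_names: Iterable[str]) -> str:
    """Return the model name carrying the most recent YYYYMMDD date."""
    # strptime raises ValueError for any name that is not a valid date.
    return max(model_names, key=lambda name: datetime.strptime(name, "%Y%m%d"))


# Hypothetical list; only 20201019 comes from the text above.
models = ["20190630", "20201019", "20200115"]
print(latest_model(models))  # prints 20201019
```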
