Introduction
At Daily, we love seeing what developers build, in whatever way works best for them. You can get started with just a couple of lines of code using Daily Prebuilt, our embeddable video call UI, or build a totally custom experience with Daily’s call object. Our goal is to support everyone’s use case and skill level by making browser-based video calls easy to integrate and customize.
As part of making the call experience better for everyone, we have introduced the ability to add live captions to Daily domains with our startTranscription() instance method, in partnership with Deepgram.
This highly requested feature has many uses, including:
- Wider accessibility options within calls
- An “instant replay” (e.g., “What did they just say?”)
- An easy way to generate meeting notes
This tutorial focuses on pairing transcription with Daily Prebuilt. We already have a thorough tutorial on adding transcription to a custom Daily call, which you should definitely check out! Today, we’ll walk through a demo app that adds transcription alongside a Daily Prebuilt call.
If you like spoilers and want to see what we’re building today, you can jump straight into the prebuilt-transcription code or try the live demo.
Note: You will have to connect your Daily and Deepgram accounts, as outlined in the demo’s README and in this tutorial, to fully experience the live demo.
What’s the plan?
In this tutorial, we will cover:
- Getting set up with Daily and Deepgram
- Embedding Daily Prebuilt in a Next.js app
- Creating a transcription component
- Adding buttons to start and stop transcription methods
- Loading the live captions
- Optimizing the app for large amounts of text
- Downloading the transcript
Prerequisites
Because transcription is a shared service between Daily and Deepgram, there is a bit of setup involved to get both services working together. The good news is that you can set it and forget it, because it’s a one-time step!
Note: Daily does not charge for transcription services and Deepgram offers a free $150 credit upon sign-up, so there is no cost associated with this tutorial.
To walk through this tutorial, you will first need to create a Daily account and a Deepgram account. Once you have an account and are logged into the Daily Dashboard, you can create a new Daily room there or with our REST API.
To set up transcription, you will have to set the enable_transcription property on your Daily domain. To update your domain settings, you’ll need your Daily API key, available in the Daily dashboard, and your Deepgram API key.
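Here is a rough sketch of that one-time update. The TypeScript below builds the request for Daily’s domain configuration endpoint. It assumes two things: the endpoint is POST https://api.daily.co/v1/, and enable_transcription takes a value of the form deepgram:<DEEPGRAM_API_KEY>, as Daily’s documentation described at the time of writing. The helper names are hypothetical, so check the current REST API reference before relying on this. Because the request uses secret keys, run it from a one-off script or a server, never from the browser.

```typescript
// Hypothetical helper: builds the request that sets enable_transcription on a Daily domain.
// Assumption: the domain config endpoint is POST https://api.daily.co/v1/ and the
// enable_transcription value has the form "deepgram:<DEEPGRAM_API_KEY>".
interface DomainConfigRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildEnableTranscriptionRequest(
  dailyApiKey: string,
  deepgramApiKey: string
): DomainConfigRequest {
  return {
    url: "https://api.daily.co/v1/",
    method: "POST",
    headers: {
      Authorization: `Bearer ${dailyApiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      properties: { enable_transcription: `deepgram:${deepgramApiKey}` },
    }),
  };
}

// Minimal fetch-like signature, so the sender can be swapped out in tests.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>;

// Sends the update once; the setting then persists on the domain.
async function enableTranscription(
  send: Fetcher,
  dailyApiKey: string,
  deepgramApiKey: string
): Promise<void> {
  const { url, ...init } = buildEnableTranscriptionRequest(dailyApiKey, deepgramApiKey);
  const res = await send(url, init);
  if (!res.ok) {
    throw new Error(`Updating the domain config failed with HTTP ${res.status}`);
  }
}
```

Node 18 and later ship a global fetch, so a one-off call could look like await enableTranscription(fetch, process.env.DAILY_API_KEY!, process.env.DEEPGRAM_API_KEY!). The environment variable names are placeholders.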
Setting up the demo
Head on over to the prebuilt-transcription GitHub repository and fork the repo to follow along with the rest of this post.
After forking and navigating to the prebuilt-transcription folder, install the dependencies:
And run the dev server:
Then open http://localhost:3000 in your browser.
This demo is based on the Next.js React framework, starting from the create-next-app template builder. This tutorial also uses TypeScript. If you are new to TypeScript, no worries! Because it’s built on top of JavaScript, it looks very similar, with a few additional features and some extra syntax.
Staging the [room]
The most interesting part of our codebase lives in pages/[domain]/[room].tsx, so let’s start there. And yes, those brackets are part of the folder and file names. They let Next.js create routes dynamically from the URL.
When the page loads, we want to build and start the call right away, using the parameters we retrieve from the page URL. To do this, we create a startCall function with useCallback:
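The demo’s exact startCall isn’t reproduced here, but its core can be sketched as plain functions. The sketch makes three assumptions. First, the room URL follows the https://<domain>.daily.co/<room> pattern. Second, the call frame exposes daily-js’s join({ url }). Third, the route parameters come from Next.js’s router.query, whose values can be a string, an array of strings, or undefined. CallJoiner, firstParam, and buildRoomUrl are hypothetical names. In the page itself, the startCall body would be wrapped in useCallback.

```typescript
// Minimal slice of the Daily call frame used here (assumption: mirrors daily-js join()).
interface CallJoiner {
  join(options: { url: string }): Promise<unknown>;
}

// Next.js router.query values may be a string, an array of strings, or undefined.
function firstParam(value: string | string[] | undefined): string {
  if (Array.isArray(value)) return value[0] ?? "";
  return value ?? "";
}

// Hypothetical helper: builds the room URL from the [domain] and [room] route params.
function buildRoomUrl(domain: string, room: string): string {
  if (!domain || !room) {
    throw new Error("Both the domain and room route parameters are required");
  }
  return `https://${domain}.daily.co/${room}`;
}

// The join step. In [room].tsx, this body would sit inside
// useCallback(() => { ... }, [domain, room]) and receive the wrapped call frame.
async function startCall(callFrame: CallJoiner, domain: string, room: string): Promise<void> {
  await callFrame.join({ url: buildRoomUrl(domain, room) });
}
```

For example: startCall(frame, firstParam(router.query.domain), firstParam(router.query.room)).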
The startCall function then runs when the page loads, via a React useEffect. This creates and joins a call built from the URL parameters.
Incorporating the Daily Prebuilt iframe
Now that the app framework is up and running, let’s add Daily Prebuilt. Since Daily Prebuilt is an embeddable video call UI, Daily has already done most of the video-related work for you. That means this part will be short.
From the aforementioned [room].tsx page, we load the <CallFrame> component. The full component looks like this:
The few lines of styling imported from ../styles/CallFrame.module.css allow the Daily Prebuilt iframe to take up most of the screen:
Without this styling for the <iframe> and its container, Daily Prebuilt defaults to taking up only a small amount of space on the page. You can change the styling however you like, and Daily Prebuilt will fit within those constraints.
Note: We are using wrap() to add Daily Prebuilt to an existing <iframe>, but you could also use the createFrame() method to make a new <iframe>, style that frame, and add it to the page.
Transcription component
Now that we have Daily Prebuilt loaded on the page, let’s start implementing transcription by adding a component to store our buttons and transcript.
From our [room].tsx, we reference the Transcription component and pass it some managed state:
- callFrame: the Daily call frame object, which allows the component to start and stop transcription
- newMsg: each new transcribed message, so the component can show the text in the transcript window
- owner: a boolean that tells the component whether the current user is a room owner
- isTranscribing: a boolean that tells the component whether Daily is currently transcribing
In our Transcription component (defined in components/Transcription.tsx), we have a button that starts or stops transcription, depending on whether transcription is currently active according to Daily (we’ll come back to that in a second):
If the meeting participant is not an owner, the button is disabled, and a message explains that only meeting room owners can start transcription.
This button utilizes these two simple functions:
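Here is a minimal sketch of that button logic. It assumes the call object exposes daily-js’s startTranscription() and stopTranscription() methods. The TranscriptionControls interface and the transcriptionButtonState and toggleTranscription names are hypothetical, and the label and hint strings are illustrative.

```typescript
// Only the two call-object methods this component needs.
interface TranscriptionControls {
  startTranscription(): void;
  stopTranscription(): void;
}

interface ButtonState {
  label: string;
  disabled: boolean;
  hint: string | null; // explanation shown to non-owners
}

// Derives what the button should show from the owner and isTranscribing props.
function transcriptionButtonState(owner: boolean, isTranscribing: boolean): ButtonState {
  return {
    label: isTranscribing ? "Stop transcription" : "Start transcription",
    disabled: !owner,
    hint: owner ? null : "Only meeting room owners can start transcription.",
  };
}

// Click handler: does nothing for non-owners or before the call frame exists.
function toggleTranscription(
  callFrame: TranscriptionControls | null,
  owner: boolean,
  isTranscribing: boolean
): void {
  if (!callFrame || !owner) return;
  if (isTranscribing) {
    callFrame.stopTranscription();
  } else {
    callFrame.startTranscription();
  }
}
```

The handler only asks Daily to start or stop. It never flips isTranscribing itself. That value is updated when Daily confirms the change through events, which keeps the button in sync with Daily’s actual state.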
How do these functions know whether transcription is happening? For that, we jump back to [room].tsx. Earlier in the post, we looked at the basic structure of the startCall function. In our demo, this function also has a few lines dedicated to Daily event listeners. We listen for a few Daily-emitted events that help shape the video call experience, two of which are transcription-started and transcription-stopped.
When those events are emitted, we update React state to set isTranscribing to the correct boolean value.
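Those two listeners can be sketched as follows. The sketch assumes the call object’s on(event, handler) and off(event, handler) methods from daily-js. DailyEventSource and listenForTranscriptionState are hypothetical names. Returning a cleanup function lets a useEffect remove the listeners when the page unmounts.

```typescript
type CallEventHandler = () => void;

// Minimal event-emitter slice of the Daily call object (assumption).
interface DailyEventSource {
  on(event: string, handler: CallEventHandler): unknown;
  off(event: string, handler: CallEventHandler): unknown;
}

// Keeps isTranscribing in sync with Daily and returns a function that removes the listeners.
function listenForTranscriptionState(
  callFrame: DailyEventSource,
  setIsTranscribing: (value: boolean) => void
): () => void {
  const onStarted = () => setIsTranscribing(true);
  const onStopped = () => setIsTranscribing(false);
  callFrame.on("transcription-started", onStarted);
  callFrame.on("transcription-stopped", onStopped);
  return () => {
    callFrame.off("transcription-started", onStarted);
    callFrame.off("transcription-stopped", onStopped);
  };
}
```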
Note: You can also use our new Daily React Hooks library to more quickly connect your React-based app with Daily’s JavaScript API!
Adding transcription
Now that we can start and stop transcription, we need to add the transcripts to the page. Our transcripts come in from Daily via an “app-message” event, so we need another event listener within our startCall function. It checks whether each “app-message” came from the ID “transcription” and whether it is a final result rather than an interim one (that’s what data.is_final indicates). When we have a final message, we save it as an object with the author’s username, the transcribed text, and a timestamp.
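Here is that check as a standalone function. It assumes the app-message event carries fromId and a data payload with is_final, text, user_name, and timestamp fields, which matches Daily’s transcription message format at the time of writing. TranscriptMessage and toTranscriptMessage are hypothetical names, and the fallback values are illustrative.

```typescript
interface TranscriptMessage {
  author: string;
  text: string;
  timestamp: string;
}

// The app-message fields this sketch reads (assumption, see above).
interface AppMessageEvent {
  fromId: string;
  data?: {
    is_final?: boolean;
    text?: string;
    user_name?: string;
    timestamp?: string;
  };
}

// Returns a message for final transcription results, or null for anything else
// (interim results and ordinary app messages sent by participants).
function toTranscriptMessage(event: AppMessageEvent): TranscriptMessage | null {
  const { fromId, data } = event;
  if (fromId !== "transcription" || !data || !data.is_final) {
    return null;
  }
  return {
    author: data.user_name ?? "Unknown",
    text: data.text ?? "",
    timestamp: data.timestamp ?? new Date().toISOString(),
  };
}
```

In the page, the listener could then be something like callFrame.on("app-message", (ev) => { const msg = toTranscriptMessage(ev); if (msg) setMessages((prev) => [...prev, msg]); }). The functional state update avoids dropping a message when several arrive close together.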
We need some React state to hold the messages, so we declare a const with useState, initialized as an empty array that will hold incoming message objects.
This is essentially all that needs to be done to get transcription on the page. You can loop through this array of messages and add them to the screen, or you can add each new message to the screen as it arrives. However, there’s one extra step worth taking to optimize your app for all of these messages, and we’ll see how that works in the next section.
Note: Transcript messages are ephemeral. Each user only has the messages they received while in the room. If you refresh the page, you’ll lose the transcript. Similarly, new participants only see transcripts of conversation that takes place after they join, not a history.
Optimize your window
Seeing transcripts appear on the screen is super fun, but it can quickly slow down the browser as more and more DOM elements are added to the page. Below, we’ll cover not just how we add transcript messages to our page, but also how to do it in a way that is efficient and not overwhelming to anyone’s browser.
To help with this, we add two dependencies to our app: react-window and react-virtualized-auto-sizer. react-window renders only the rows currently visible in the transcript window, and react-virtualized-auto-sizer measures the available space so the list can fill it. Instead of rendering the entire array of message objects as HTML, the DOM only holds the small part of the data set visible in the window. This virtualization technique prevents the poor performance caused by a browser tab holding too many DOM nodes. Users can still scroll up to see previous messages, which are rendered as needed.
We declare consts for the transcript list and its rows, both initialized as empty objects.
We add new messages received from the parent [room] page to an array. We also have a small function that scrolls the list to the bottom (most recent) message every time a new message is received.
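Here is a sketch of that append-and-scroll step. It assumes the list ref points at a react-window list instance, which provides scrollToItem(index, align). The ScrollableList interface and the appendMessage and scrollToLatest helpers are hypothetical.

```typescript
interface TranscriptMessage {
  author: string;
  text: string;
  timestamp: string;
}

// The one method this sketch uses from a react-window list instance.
interface ScrollableList {
  scrollToItem(index: number, align?: "auto" | "smart" | "center" | "end" | "start"): void;
}

// Returns a new array rather than mutating, so React registers the state change.
function appendMessage(
  messages: TranscriptMessage[],
  newMsg: TranscriptMessage
): TranscriptMessage[] {
  return [...messages, newMsg];
}

// Scrolls so the newest row sits at the bottom of the window. Call this after the
// list has re-rendered with the new row (for example, in a useEffect keyed on messages).
function scrollToLatest(list: ScrollableList | null, messageCount: number): void {
  if (list && messageCount > 0) {
    list.scrollToItem(messageCount - 1, "end");
  }
}
```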
For each row, we call a formatting function. It lays out the row as “Message Author: Message Text” on the left and the timestamp, trimmed to local time only, on the right (positioned with CSS using the handy-dandy float: right).
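The row formatting can be sketched as a pure function. formatRow is a hypothetical name. The exact output of toLocaleTimeString() depends on the viewer’s locale and time zone, and the right-hand placement is left to CSS.

```typescript
interface TranscriptMessage {
  author: string;
  text: string;
  timestamp: string;
}

interface FormattedRow {
  left: string; // "Message Author: Message Text"
  right: string; // local time only, e.g. "3:04:05 PM" in an en-US locale
}

function formatRow(msg: TranscriptMessage): FormattedRow {
  const time = new Date(msg.timestamp);
  return {
    left: `${msg.author}: ${msg.text}`,
    // Show nothing rather than "Invalid Date" if the timestamp can't be parsed.
    right: Number.isNaN(time.getTime()) ? "" : time.toLocaleTimeString(),
  };
}
```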
Our rendered transcript block then looks like this, with each loaded row wrapped in the react-window List and react-virtualized-auto-sizer AutoSizer elements.
Download
The transcripts collected in this app are not available after the call concludes, so downloading them is helpful if you want to use them later.
To do that, we need to prepare a text file with all of the transcript, not just the part currently virtualized on the screen.
We have already seen that we use React state to collect and set messages. To prepare a plain-text file containing the transcript, we add a transcriptFile state initialized as an empty string.
Next, let’s set up a useEffect to format the transcript in a way that works best for reviewing later. Unlike the live transcript, where the timestamp sits on the right and shows local time only, this version includes the full date and time for every message.
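As a sketch, the text for the downloadable file could be built like this. It assumes each message carries a timestamp string that Date can parse. buildTranscriptFile is a hypothetical name. Using toLocaleString() for the full date and time is one reasonable choice, not necessarily the demo’s exact format.

```typescript
interface TranscriptMessage {
  author: string;
  text: string;
  timestamp: string;
}

// One line per message: full local date and time, then "Author: text".
// Falls back to the raw timestamp string if it can't be parsed as a date.
function buildTranscriptFile(messages: TranscriptMessage[]): string {
  return messages
    .map((m) => {
      const when = new Date(m.timestamp);
      const stamp = Number.isNaN(when.getTime()) ? m.timestamp : when.toLocaleString();
      return `${stamp} ${m.author}: ${m.text}`;
    })
    .join("\n");
}
```

Inside the component, the effect could then be useEffect(() => { setTranscriptFile(buildTranscriptFile(messages)); }, [messages]);.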
The download link then grabs the most recent full transcript and, by default, saves it as a file called transcript.txt, although the user can change the name when saving.
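One way to produce that link is a data: URL plus the anchor’s download attribute, which browsers use as the suggested file name. transcriptHref and downloadLinkProps are hypothetical helpers. The demo may build its link differently, for example with a Blob and URL.createObjectURL, which also avoids very long URLs for long transcripts.

```typescript
// Encodes the transcript as a plain-text data URL usable as an <a href>.
function transcriptHref(transcript: string): string {
  return `data:text/plain;charset=utf-8,${encodeURIComponent(transcript)}`;
}

// Attribute values for the download link; "transcript.txt" is only the suggested name.
function downloadLinkProps(transcript: string): { href: string; download: string } {
  return { href: transcriptHref(transcript), download: "transcript.txt" };
}
```

In JSX, that could be <a {...downloadLinkProps(transcriptFile)}>Download transcript</a>.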
Conclusion
And there you have it! Using Daily Prebuilt and our new Transcription API with Deepgram, it’s not much work to add a live transcript to your meetings. Building on this demo, you can easily add different styles, including to the Daily Prebuilt window itself by customizing it with your own color themes.
We would love to see what you’ve built using Daily. Reach out to us anytime at help@daily.co!