WebVTT is a format for displaying timed text tracks (e.g. subtitles) with the {{HTMLElement("track")}} element. The primary purpose of WebVTT files is to add subtitles to a {{HTMLElement("video")}}.
WebVTT is a text-based format. A WebVTT file must be encoded in UTF-8. Wherever you can use spaces, you can also use tabs.
The MIME type of WebVTT is text/vtt.
WebVTT Body
The structure of a WebVTT file requires two things and has four optional components.
- An optional byte order mark (BOM)
- The string WEBVTT
- An optional text header to the right of WEBVTT
  - There must be at least one space after WEBVTT
  - You might use this to add a description to the file
  - You may use anything except newlines or the string "-->"
- A blank line, which is equivalent to two consecutive newlines.
- Zero or more cues or comments.
- Zero or more blank lines.
Example 1 - Simplest possible WEBVTT file
WEBVTT
Example 2 - Very simple WebVTT file
WEBVTT - This file has no cues.
Example 3 - Common WebVTT example
WEBVTT - This file has cues.

14
00:01:14.815 --> 00:01:18.114
- What?
- Where are we now?

15
00:01:18.171 --> 00:01:20.991
- This is big bat country.

16
00:01:21.058 --> 00:01:23.868
- [ Bats Screeching ]
- They won't get in your hair. They're after the bugs.
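The header rules listed above can be checked mechanically. The following Python sketch is one possible reading of those rules, not a conforming parser; the function name `has_valid_header` is made up for this illustration.

```python
def has_valid_header(text: str) -> bool:
    """Return True if the first line looks like a WebVTT header line."""
    # Drop the optional byte order mark (BOM).
    if text.startswith("\ufeff"):
        text = text[1:]
    # Only the first line is relevant to the header check.
    first_line = text.split("\n", 1)[0].rstrip("\r")
    if first_line == "WEBVTT":
        return True
    # An optional text header must be separated from WEBVTT by a space
    # or tab, and must not contain the string "-->".
    if first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        return "-->" not in first_line
    return False


print(has_valid_header("WEBVTT - This file has no cues.\n"))
```

The sketch only inspects the first line; it does not verify the blank line or the cues that follow.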
WebVTT Comment
Comments are an optional component that can be used to add information to a WebVTT file. Comments are intended for those reading the file and are not seen by users. A comment may contain newlines, but it cannot contain a blank line, which is equivalent to two consecutive newlines. A blank line signifies the end of a comment.
A comment cannot contain the string "-->", the ampersand character (&), or the less-than sign (<). Instead use the escape sequence "&amp;" for ampersand and "&lt;" for less-than. It is also recommended that you use the greater-than escape sequence "&gt;" instead of the greater-than character (>) to avoid confusion with tags.
A comment consists of three parts:
- The string NOTE
- A space or a newline
- Zero or more characters other than those noted above
Example 4 - Simple comment
NOTE This is a comment
Example 5 - Multi-line comment
NOTE
Another comment that is spanning
more than one line.

NOTE You can also make a comment
across more than one line this way.
Example 6 - Common comment usage
WEBVTT - Translation of that film I like

NOTE
This translation was done by Kyle so that
some friends can watch it with their parents.

1
00:02:15.000 --> 00:02:20.000
- Ta en kopp varmt te.
- Det är inte varmt.

2
00:02:20.000 --> 00:02:25.000
- Har en kopp te.
- Det smakar som te.

NOTE This last line may not translate well.

3
00:02:25.000 --> 00:02:30.000
-Ta en kopp.
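Because a blank line ends both comments and cues, a file can be split into blocks on blank lines and the comment blocks picked out by their leading NOTE. The Python sketch below illustrates this under simplifying assumptions; the helper names `split_blocks` and `is_comment` are hypothetical.

```python
import re


def split_blocks(text: str) -> list[str]:
    """Split WebVTT text into blocks separated by one or more blank lines."""
    # Normalize line endings so that only "\n" has to be handled.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return [b for b in re.split(r"\n{2,}", normalized.strip("\n")) if b]


def is_comment(block: str) -> bool:
    """A comment is the string NOTE followed by a space, tab, or newline."""
    return block == "NOTE" or block.startswith(("NOTE ", "NOTE\t", "NOTE\n"))


sample = "WEBVTT\n\nNOTE This is a comment\n\n00:01.000 --> 00:02.000\nHello\n"
for block in split_blocks(sample):
    print(is_comment(block), repr(block))
```

Note that the first block returned is the header line, so a caller would normally skip it before looking at cues and comments.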
WebVTT Cues
A cue is a single subtitle block that has a single start time, end time, and textual payload. Example 6 consists of the header, a blank line, and then five cues or comments separated by blank lines. A cue consists of five components:
- An optional cue identifier followed by a newline
- Cue timings
- Optional cue settings with at least one space before the first and between each setting
- One or more newlines
- The cue payload text
Example 7 - Example of a cue
1 - Title Crawl
00:00:05.000 --> 00:00:10.000 line:0 position:20% size:60% align:start
Some time ago in a place rather distant....
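To make the five components concrete, the following Python sketch takes one cue block apart. It is a simplified illustration, assuming the block has already been separated from its neighbours and uses "\n" line endings; the function name `split_cue` and the dictionary keys are hypothetical.

```python
def split_cue(block: str) -> dict:
    """Split one cue block into identifier, timings, settings, and payload."""
    lines = block.split("\n")
    identifier = None
    # A first line without "-->" is the optional cue identifier.
    if "-->" not in lines[0]:
        identifier = lines.pop(0)
    if not lines or "-->" not in lines[0]:
        raise ValueError("cue block has no timing line")
    start, rest = lines.pop(0).split("-->", 1)
    parts = rest.split()
    if not parts:
        raise ValueError("cue timing has no end time")
    return {
        "id": identifier,
        "start": start.strip(),
        "end": parts[0],
        "settings": parts[1:],       # zero or more name:value strings
        "payload": "\n".join(lines),  # everything after the timing line
    }


cue = split_cue(
    "1 - Title Crawl\n"
    "00:00:05.000 --> 00:00:10.000 line:0 position:20% size:60% align:start\n"
    "Some time ago in a place rather distant...."
)
print(cue["id"], cue["settings"])
```

The timestamps and settings are returned as plain strings here; validating them is a separate step.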
Cue Identifier
The identifier is a name that identifies the cue. It can be used to reference the cue from a script. It must not contain a newline and cannot contain the string "-->". It must end with a single newline. Identifiers do not have to be unique, although it is common to number them (e.g. 1, 2, 3, ...).
Example 8 - Cue identifier from Example 7
1 - Title Crawl
Example 9 - Common usage of identifiers
WEBVTT

1
00:00:22.230 --> 00:00:24.606
This is the first subtitle.

2
00:00:30.739 --> 00:00:34.074
This is the second.

3
00:00:34.159 --> 00:00:35.743
Third
Cue Timings
A cue timing indicates when the cue is shown. It has a start and end time which are represented by timestamps. The end time must be greater than the start time, and the start time must be greater than or equal to all previous start times. Cues may have overlapping timings.
If the WebVTT file is being used for chapters ({{HTMLElement("track")}} {{htmlattrxref("kind")}} is chapters) then the file cannot have overlapping timings.
Each cue timing contains five components:
- Timestamp for start time
- At least one space
- The string "-->"
- At least one space
- Timestamp for end time
  - Which must be greater than the start time
The timestamps must be in one of two formats:
- mm:ss.ttt
- hh:mm:ss.ttt
Where the components are defined as follows:
- hh is hours
  - Must be at least two digits
  - Hours can be greater than two digits (e.g. 9999:00:00.000)
- mm is minutes
  - Must be between 00 and 59 inclusive
- ss is seconds
  - Must be between 00 and 59 inclusive
- ttt is milliseconds
  - Must be between 000 and 999 inclusive
Example 10 - Basic cue timing examples
00:22.230 --> 00:24.606
00:30.739 --> 00:00:34.074
00:00:34.159 --> 00:35.743
00:00:35.827 --> 00:00:40.122
Example 11 - Overlapping cue timing examples
00:00:00.000 --> 00:00:10.000
00:00:05.000 --> 00:01:00.000
00:00:30.000 --> 00:00:50.000
Example 12 - Non-overlapping cue timing examples
00:00:00.000 --> 00:00:10.000
00:00:10.000 --> 00:01:00.581
00:01:00.581 --> 00:02:00.100
00:02:01.000 --> 00:02:02.000
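The timestamp rules translate fairly directly into a regular expression. The Python sketch below converts a timestamp to milliseconds and checks that a cue's end time is greater than its start time. It is an illustration of the rules in this section rather than a full parser, and the names `parse_timestamp` and `parse_timing` are hypothetical.

```python
import re

# hh is optional and at least two digits; mm and ss are 00-59; ttt is three digits.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_timestamp(value: str) -> int:
    """Convert a WebVTT timestamp to milliseconds, or raise ValueError."""
    match = TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid WebVTT timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_timing(line: str) -> tuple[int, int]:
    """Parse 'start --> end'; anything after the end time is ignored."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[1] != "-->":
        raise ValueError(f"invalid cue timing: {line!r}")
    start, end = parse_timestamp(tokens[0]), parse_timestamp(tokens[2])
    if end <= start:
        raise ValueError("end time must be greater than start time")
    return start, end


print(parse_timing("00:30.739 --> 00:00:34.074"))  # (30739, 34074)
```

Working in integer milliseconds avoids floating-point rounding when timings are compared, for example when checking a chapters file for overlaps.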
Cue Settings
Cue settings are optional components used to position the cue payload text over the video. This includes whether the text is displayed horizontally or vertically. There can be zero or more of them, and they can be used in any order so long as each setting is used no more than once.
The cue settings are added to the right of the cue timings. There must be one or more spaces between the cue timing and the first setting and between each setting. A setting's name and value are separated by a colon. The settings are case-sensitive, so use lowercase as shown. There are five cue settings:
- vertical
  - Indicates that the text will be displayed vertically rather than horizontally, such as in some Asian languages.

Table 1 - vertical values | |
---|---|
vertical:rl | writing direction is right to left |
vertical:lr | writing direction is left to right |

- line
  - Specifies where text appears vertically. If vertical is set, line specifies where text appears horizontally.
  - Value can be a line number
    - The line height is the height of the first line of the cue as it appears on the video
    - Positive numbers indicate top down
    - Negative numbers indicate bottom up
  - Or value can be a percentage
    - Must be an integer (i.e. no decimals) between 0 and 100 inclusive
    - Must be followed by a percent sign (%)

Table 2 - line examples | | | |
---|---|---|---|
 | vertical omitted | vertical:rl | vertical:lr |
line:0 | top | right | left |
line:-1 | bottom | left | right |
line:0% | top | right | left |
line:100% | bottom | left | right |

- position
  - Specifies where the text will appear horizontally. If vertical is set, position specifies where the text will appear vertically.
  - Value is a percentage
    - Must be an integer (no decimals) between 0 and 100 inclusive
    - Must be followed by a percent sign (%)

Table 3 - position examples | | | |
---|---|---|---|
 | vertical omitted | vertical:rl | vertical:lr |
position:0% | left | top | top |
position:100% | right | bottom | bottom |

- size
  - Specifies the width of the text area. If vertical is set, size specifies the height of the text area.
  - Value is a percentage
    - Must be an integer (i.e. no decimals) between 0 and 100 inclusive
    - Must be followed by a percent sign (%)

Table 4 - size examples | | | |
---|---|---|---|
 | vertical omitted | vertical:rl | vertical:lr |
size:100% | full width | full height | full height |
size:50% | half width | half height | half height |

- align
  - Specifies the alignment of the text. Text is aligned within the space given by the size cue setting if it is set.

Table 5 - align values | | | |
---|---|---|---|
 | vertical omitted | vertical:rl | vertical:lr |
align:start | left | top | top |
align:middle | centred horizontally | centred vertically | centred vertically |
align:end | right | bottom | bottom |
Example 13 - Cue setting examples
The first line demonstrates no settings. The second line might be used to overlay text on a sign or label. The third line might be used for a title. The last line might be used for an Asian language.
00:00:05.000 --> 00:00:10.000
00:00:05.000 --> 00:00:10.000 line:63% position:72% align:start
00:00:05.000 --> 00:00:10.000 line:0 position:20% size:60% align:start
00:00:05.000 --> 00:00:10.000 vertical:rl line:-1 align:end
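The rules for the five settings can be expressed as a small validator. The Python sketch below accepts only the names and values described in this section (including the align values from Table 5) and rejects duplicates. It is an illustration, not a complete implementation, and the names `SETTING_PATTERNS` and `parse_settings` are hypothetical.

```python
import re

# One pattern per setting, following the value rules described above.
SETTING_PATTERNS = {
    "vertical": re.compile(r"rl|lr"),
    "line": re.compile(r"-?\d+|\d+%"),   # line number or percentage
    "position": re.compile(r"\d+%"),
    "size": re.compile(r"\d+%"),
    "align": re.compile(r"start|middle|end"),
}


def parse_settings(text: str) -> dict[str, str]:
    """Parse the settings part of a cue timing line into a dictionary."""
    settings: dict[str, str] = {}
    for item in text.split():
        name, colon, value = item.partition(":")
        pattern = SETTING_PATTERNS.get(name)  # names are case-sensitive
        if not colon or pattern is None or not pattern.fullmatch(value):
            raise ValueError(f"invalid cue setting: {item!r}")
        if value.endswith("%") and int(value[:-1]) > 100:
            raise ValueError(f"percentage out of range: {item!r}")
        if name in settings:
            raise ValueError(f"setting used more than once: {name!r}")
        settings[name] = value
    return settings


print(parse_settings("line:0 position:20% size:60% align:start"))
```

Values are kept as strings so that a line number such as `-1` and a percentage such as `63%` remain distinguishable to the caller.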
Cue Payload
The payload is where the main information or content is located. In normal usage the payload contains the subtitles to be displayed. The payload text may contain newlines but it cannot contain a blank line, which is equivalent to two consecutive newlines. A blank line signifies the end of a cue.
A cue text payload cannot contain the string "-->", the ampersand character (&), or the less-than sign (<). Instead use the escape sequence "&amp;" for ampersand and "&lt;" for less-than. It is also recommended that you use the greater-than escape sequence "&gt;" instead of the greater-than character (>) to avoid confusion with tags. If you are using the WebVTT file for metadata these restrictions do not apply.
In addition to the three escape sequences mentioned above, there are three others. All six are listed in the table below.
Table 6 - Escape sequences | | |
---|---|---|
Name | Character | Escape Sequence |
Ampersand | & | &amp; |
Less-than | < | &lt; |
Greater-than | > | &gt; |
Left-to-right mark | | &lrm; |
Right-to-left mark | | &rlm; |
Non-breaking space | | &nbsp; |
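When cue text is generated by a program, the escaping can be applied automatically. The Python sketch below escapes the three reserved characters and reverses all six escape sequences; the function names `escape_cue_text` and `unescape_cue_text` are hypothetical.

```python
import re

# Escape sequence -> character, for the six sequences in the table above.
ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lrm;": "\u200e",   # left-to-right mark
    "&rlm;": "\u200f",   # right-to-left mark
    "&nbsp;": "\u00a0",  # non-breaking space
}


def escape_cue_text(text: str) -> str:
    """Escape &, < and > so plain text is safe to use as a cue payload."""
    # Ampersand goes first so the other escapes are not escaped twice.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def unescape_cue_text(text: str) -> str:
    """Replace each escape sequence with its character in a single pass."""
    return re.sub(r"&(?:amp|lt|gt|lrm|rlm|nbsp);",
                  lambda m: ESCAPES[m.group(0)], text)


print(escape_cue_text("Tom & Jerry <3 --> now"))
```

Because every greater-than character is escaped, the output can never contain the string "-->".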
Cue Payload Text Tags
There are a number of tags, such as <b>, that can be used. However, if the WebVTT file is used in a {{HTMLElement("track")}} element where the attribute {{htmlattrxref("kind")}} is chapters then you cannot use tags.
- Timestamp tag
  - The timestamp must be greater than the cue's start timestamp, greater than any previous timestamp in the cue payload, and less than the cue's end timestamp. The active text is the text between the timestamp and the next timestamp, or the end of the payload if there is no later timestamp. Any text before the active text in the payload is previous text. Any text beyond the active text is future text. This enables karaoke-style captions.
Example 12 - Karaoke style text
1
00:16.500 --> 00:18.500
When the moon <00:17.500>hits your eye

1
00:00:18.500 --> 00:00:20.500
Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie

1
00:00:20.500 --> 00:00:21.500
That's <00:00:21.000>amore
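To see how a karaoke payload breaks down, the Python sketch below splits a payload at its timestamp tags and pairs each piece of text with the timestamp that activates it. It assumes the payload contains no other tags, and the names `TIMESTAMP_TAG` and `split_karaoke` are hypothetical.

```python
import re
from typing import Optional

# A timestamp tag is a timestamp wrapped in angle brackets.
TIMESTAMP_TAG = re.compile(r"<((?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3})>")


def split_karaoke(payload: str) -> list[tuple[Optional[str], str]]:
    """Return (timestamp, text) pairs; the first pair has no timestamp."""
    # re.split with one capturing group yields text and timestamps alternately:
    # [text, timestamp, text, timestamp, text, ...]
    parts = TIMESTAMP_TAG.split(payload)
    segments: list[tuple[Optional[str], str]] = [(None, parts[0])]
    for index in range(1, len(parts), 2):
        segments.append((parts[index], parts[index + 1]))
    return segments


for stamp, text in split_karaoke("Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie"):
    print(stamp, repr(text))
```

The text paired with `None` is shown from the cue's start time; each later piece becomes the active text at its timestamp.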
The following tags are the HTML tags allowed in a cue and require opening and closing tags (e.g. <b>text</b>).
- Class tag (<c></c>)
  - Style the contained text using a CSS class.
Example 14 - Class tag
<c.classname>text</c>
- Italics tag (<i></i>)
  - Italicize the contained text.
Example 15 - Italics tag
<i>text</i>
- Bold tag (<b></b>)
  - Bold the contained text.
Example 16 - Bold tag
<b>text</b>
- Underline tag (<u></u>)
  - Underline the contained text.
Example 17 - Underline tag
<u>text</u>
- Ruby tag (<ruby></ruby>)
  - Used with ruby text tags to display ruby characters (i.e. small annotative characters above other characters).
Example 18 - Ruby tag
<ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>
- Ruby text tag (<rt></rt>)
  - Used with ruby tags to display ruby characters (i.e. small annotative characters above other characters).
Example 19 - Ruby text tag
<ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>
- Voice tag (<v></v>)
  - Similar to class tag, also used to style the contained text using CSS.
Example 20 - Voice tag
<v Bob>text</v>
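A tool that needs only the plain text of a cue, for example to build a search index, can remove these tags. The Python sketch below strips the seven tags listed above and leaves everything else, including timestamp tags and escape sequences, untouched. It is a rough illustration based on a regular expression, and the names `TAG` and `strip_tags` are hypothetical.

```python
import re

# Opening or closing c, i, b, u, ruby, rt, or v tag, optionally followed by
# a class (".classname") or an annotation (" Bob") before the closing ">".
TAG = re.compile(r"</?(?:ruby|rt|c|i|b|u|v)(?:[.\s][^>]*)?>")


def strip_tags(payload: str) -> str:
    """Remove the cue payload text tags and keep the text they contain."""
    return TAG.sub("", payload)


print(strip_tags("<v Bob>Hello <b>there</b></v>"))  # Hello there
```

Ruby annotations are kept inline by this sketch, so a caller that wants only the base text would need to drop the content of <rt> tags separately.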
Specifications
Specification | Status | Comment |
---|---|---|
{{SpecName("WebVTT")}} | {{Spec2("WebVTT")}} | Initial definition |
Compatibility
{{CompatibilityTable}}
Feature | Chrome | Firefox (Gecko) | Internet Explorer | Opera | Safari |
---|---|---|---|---|---|
Basic support | 18 | 24[1] | 10 | 15.0 | 7 |
Feature | Android | Firefox Mobile (Gecko) | Chrome for Mobile | Opera Mobile | Safari Mobile |
---|---|---|---|---|---|
Basic support | 4.4 | {{CompatNo}} | 35.0 | 21.0 | 7 |
[1] This feature is disabled by default in Gecko. See the Mozilla wiki for more information.