Documented the ability to force the language when generating subtitles
parent c7f6207cc6
commit f634fd4b1e

 README.md | 27
@@ -230,13 +230,20 @@ Values in `generate_subtitles` are objects with the following structure:
 
 * `type`: `string`
 The transcription engine to be used for generating the subtitles. Currently
-only supports the value "whisper".
+only supports the value "whisper". This key is mandatory.
 
 * `source`: `string`
 The name of one of this presentation's video streams, which will be used to
 generate the subtitle track. Should preferrably use a stream with a camera
 feed for best synchronization results. The stream may either be an already
-existing one or one created by this job.
+existing one or one created by this job. This key is mandatory.
 
+* `language`: `string`
+The language in which to generate subtitles. If omitted, the language will
+be inferred by the subtitling engine. Should mainly be used to force the
+correct language when the automatic detection fails.
+The language string should be a two-letter ISO 639-1 language code, such as
+`sv` or `en`.
+
 Here is an example of valid `subtitles` and `generate_subtitles` sections:
 ```json
@@ -246,19 +253,21 @@ Here is an example of valid `subtitles` and `generate_subtitles` sections:
         "Svenska": null
     },
     "generate_subtitles": {
-        "Generated": {
+        "English (generated)": {
             "type": "whisper",
-            "source": "camera"
+            "source": "camera",
+            "language": "en"
         }
     }
 }
 ```
 
 This example would save the provided WEBVTT file for the "English" track,
-generate subtitles for the "Generated" track based on the given source and
-delete the "Svenska" track. Note that the source "camera" must exist (either
-in this job specification or in the package already on disk in case of an
-update), and `upload_dir` must be provided in the job specification in order
-to be able to resolve the path to the English subtitle track.
+generate subtitles for the "English (generated)" track based on the given
+source and delete the "Svenska" track. Note that the source "camera" must
+exist (either in this job specification or in the package already on disk in
+case of an update), and `upload_dir` must be provided in the job specification
+in order to be able to resolve the path to the English subtitle track.
 
+
 ### Job Sources
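The rules stated in the README text above (mandatory `type` and `source`, optional two-letter `language`) can be sketched as a small check. This is an illustration only: `validate_generate_spec` is a hypothetical name and is not taken from the project, and the pattern test only checks the shape of the language string, not that it is an assigned ISO 639-1 code.

```python
import re


def validate_generate_spec(spec):
    """Return a list of problems found in one `generate_subtitles` value.

    Hypothetical helper for illustration; not part of the project.
    """
    problems = []

    # `type` is mandatory and "whisper" is the only documented value.
    if spec.get("type") != "whisper":
        problems.append("type must be 'whisper'")

    # `source` is mandatory and names one of the video streams.
    source = spec.get("source")
    if not isinstance(source, str) or not source:
        problems.append("source must be a non-empty string")

    # `language` is optional. When present, expect two lowercase letters,
    # which is an assumed approximation of "two-letter ISO 639-1 code".
    language = spec.get("language")
    if language is not None:
        if not isinstance(language, str) or not re.fullmatch(r"[a-z]{2}", language):
            problems.append("language must be a two-letter code such as 'sv' or 'en'")

    return problems
```

Under these assumptions, the entry from the README example (`type` "whisper", `source` "camera", `language` "en") yields an empty list, and so does the same entry with `language` omitted.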
@@ -39,8 +39,10 @@ def _do_whisper_transcribe(inpath,
                                    word_timestamps=True)
         end = time.time()
         if language is None:
-            read_language = result['language']
+            out_language = result['language']
             logger.info(f"Detected language '{read_language}' in {inpath}.")
+        else:
+            out_language = language
         vttWriter = whisper.utils.WriteVTT(str(outpath.parent))
         vttWriter.always_include_hours = True
     except Exception as e:
@@ -55,7 +57,7 @@ def _do_whisper_transcribe(inpath,
     elapsed = time.strftime('%H:%M:%S', time.gmtime(end - start))
     logger.info('Finished whisper transcription job '
                 f'for {inpath} in {elapsed}.')
-    return outpath
+    return (outpath, out_language)
 
 
 @Handler.register
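The language handling in `_do_whisper_transcribe` can be read in isolation as the sketch below. Two things are assumptions here: `resolve_output_language` is a hypothetical helper that does not exist in the project, and `result` stands in for the dictionary returned by the transcription call, of which only the `'language'` key is used. Note that the `logger.info` line shown in the hunk still interpolates `read_language`, which appears to be no longer assigned once the assignment targets `out_language`; the sketch logs `out_language` instead.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_output_language(language, result, inpath):
    """Pick the language to report for a finished transcription.

    Hypothetical helper mirroring the branch in the diff; `language` is the
    forced language or None, `result` is a dict with a 'language' key.
    """
    if language is None:
        # Nothing was forced, so fall back to what the engine detected.
        out_language = result['language']
        logger.info(f"Detected language '{out_language}' in {inpath}.")
    else:
        # A forced language takes precedence over detection.
        out_language = language
    return out_language
```

With a detected language of `'sv'`, calling this with `language=None` returns `'sv'`, while calling it with `language='en'` returns `'en'`. Since the function in the diff now returns the tuple `(outpath, out_language)`, any caller that previously used the bare return value would need to unpack both elements.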