# DSV play daemon v2 - this time it's threaded

This application's job is to monitor for presentation uploads,
process them and send updates about processing progress to a callback URL.

## Python dependencies

This application is written for python3. The libraries required are
specified in `requirements.txt`.

## Setup

Ensure the `ffmpeg` and `ffprobe` binaries are available on the system.

Create a virtualenv for use by the application and install the
dependencies into it.

Create `config.ini` and populate it according to the instructions in
`config.ini.example`.

The provided example systemd unit (`play-daemon.service.example`) expects the
application to be installed under `/opt/play-daemon`, with a venv in the `env`
subdirectory and run as the `play-daemon` user. Change the unit file as
required if any of these do not match the actual installation.

The running user must have a writable home directory.

Also take note of the `TRANSFORMERS_CACHE` environment variable that needs to
be set to a directory writable by the running user.

The unit file (with any necessary modifications made) should be copied to
`/etc/systemd/system` and activated as normal.

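
The first requirement can be checked before starting the daemon by looking
the binaries up on `PATH`. The following is a minimal sketch and not part of
the application; the helper name `missing_binaries` is hypothetical.

```python
import shutil


def missing_binaries(names=("ffmpeg", "ffprobe"), which=shutil.which):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing binaries:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are available")
```

The `which` parameter exists only so the lookup can be replaced in tests.
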
## Workflow overview

A job is submitted by first uploading any required files to a unique
subdirectory of `uploaddir` as set in `config.ini`, followed by
submitting a job specification via the REST API.

Once the specification has been submitted, the job is passed through the
following steps:

* Job pickup: The job is detected by the daemon and the job type validated.

* Preprocessing: If the job passed validation, it is preprocessed to unify
  all incoming formats into the format documented here.

* Distribution: The distributor checks which handlers should be applied to
  this job and distributes it appropriately. Each relevant handler validates
  the job before it is submitted.

* Handlers: Each handler does its specific processing and notifies the
  collector once done.

* Collection: The collector waits for each handler to finish, applying each
  handler's changes as they report that they are finished.

* Notification: Each successfully applied handler and any errors generate
  a notification. Each notification contains information about further
  pending handlers. If the list of pending handlers is empty, the job as a
  whole is finished and no further notifications will be sent.

An error at any step of the process will abort the entire job with a
notification about the error.

## Queue files

Queue files are created by the REST API and shouldn't normally require manual
creation, but are documented here for completeness. A queue file contains a
JSON object with the following keys:

* `type`: `string`
  The type of job this is. See [Job types](#job-types) below for details.

* `data`: `JSON`
  The actual job specification as documented under
  [Job specification structure](#job-specification-structure).

* `recorder`: `string`
  Unique to the cattura job type. Specifies which recorder the job was
  submitted from.

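
For illustration, the JSON object of a queue file could be assembled as
below. This is a sketch only: the helper name `make_queue_item` is
hypothetical, the job data is a placeholder, and where queue files are stored
is not covered here.

```python
import json


def make_queue_item(job_type, data, recorder=None):
    """Build the JSON object stored in a queue file."""
    item = {"type": job_type, "data": data}
    # `recorder` is only meaningful for the cattura job type.
    if recorder is not None:
        if job_type != "cattura":
            raise ValueError("recorder is unique to the cattura job type")
        item["recorder"] = recorder
    return item


if __name__ == "__main__":
    spec = {"pkg_id": "some_package_id", "tags": ["example"]}
    print(json.dumps(make_queue_item("default", spec), indent=2))
```
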
## Job types

The daemon recognizes jobs of three different types, which are passed on from
the REST API:

* `default`
  The standard job type for uploads. The expected incoming format matches
  what is documented here.

* `cattura`
  Jobs coming from Cattura recorders. The incoming format is completely
  determined by the recorder, which is then reformatted to the documented
  format by the cattura preprocessor.

* `mediasite`
  Mediasite import jobs. These largely conform to this documentation but
  provide links to source files instead of uploaded files. The mediasite
  preprocessor downloads linked media and reformats the job to the
  documented format.

## Job specification structure

A job specification is a JSON object with fields specifying what changes to
make to a package (or what to store in a new package).

All fields are technically optional, but the different Handlers require
certain combinations of keys in order to accept a job as valid.

An omitted key that does not cause an error will leave the package data
for that key unchanged.

Valid top-level keys and their expected values are:

* `pkg_id`: `string`
  The ID of an existing package to modify. If omitted, a new package will be
  created by this job.

* `upload_dir`: `string`
  The directory where any files relevant to this job have been uploaded. Must
  be a subdirectory of `uploaddir` as specified in `config.ini`.

* `notification_url`: `string`
  The remote endpoint where notifications about this job should be sent via
  HTTP POST request. Must not require any authentication. If omitted, the
  default endpoint configured in `config.ini` is used instead.
  For testing purposes this can also be a local directory on the server,
  which exists and is writable by the user the daemon is running as.

* `title`: `JSON`
  The title in the supported languages. Its format is:
  ```json
  {
    "sv": "Titel på svenska",
    "en": "Title in english"
  }
  ```

* `description`: `string`
  A description for the presentation. No i18n support as of yet.

* `created`: `int`
  A unix timestamp representing the time and date of recording.

* `presenters`: `list of strings`
  A list of presenters. A presenter may be represented by an SU username
  or a real name.

* `courses`: `list of JSON`
  A list of courses this presentation is associated with. A course is
  represented by a JSON object with two keys - `designation` and `semester`.

* `tags`: `list of strings`
  A list of arbitrary strings.

* `thumb`: `string`
  The path to a file that will be used as the thumbnail for this presentation.
  Relative to `upload_dir`. If an empty string is passed, a new thumbnail
  will be generated based on stored metadata and any changes
  passed in this job.

* `subtitles`: `JSON`
  A JSON object representing the subtitles to be acted upon. Its format is
  documented in detail under the heading [Subtitles](#subtitles).

* `generate_subtitles`: `JSON`
  A JSON object representing subtitle tracks to be generated. Its format is
  documented in detail under the heading [Subtitles](#subtitles).

* `sources`: `JSON`
  A JSON object representing the sources to be acted upon. Its format is
  documented in detail under the heading [Job Sources](#job-sources).

* `slides`: `string`
  The path to a file representing a slideshow of images to be transcoded to
  video. Relative to `upload_dir`.
  Details under the heading [Slides](#slides).

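
The `upload_dir` requirement can be checked on the client side before a job
is submitted. The sketch below is an assumption about how such a check might
look, not the daemon's own validation; `is_valid_upload_dir` is a hypothetical
helper and the paths are placeholders.

```python
from pathlib import PurePosixPath


def is_valid_upload_dir(upload_dir, uploaddir):
    """Return True if upload_dir lies strictly below uploaddir.

    The check is purely lexical: paths are not resolved on disk, so any
    '..' component is rejected outright rather than followed.
    """
    candidate = PurePosixPath(upload_dir)
    base = PurePosixPath(uploaddir)
    if ".." in candidate.parts:
        return False
    # A path is never among its own parents, so base itself is rejected.
    return base in candidate.parents


if __name__ == "__main__":
    print(is_valid_upload_dir("/configured-uldir/myjobfiles", "/configured-uldir"))
```
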
### Subtitles

There are two top-level keys that deal with subtitles: `subtitles` and
`generate_subtitles`. The `subtitles` object is a simple key-value map,
mapping subtitle track names to subtitle files to be stored. The
`generate_subtitles` object maps subtitle track names to generation tasks.
Keys must be unique across these two maps.

If the value for a given key in `subtitles` is `null`, that track is deleted
from the presentation. Non-null values must be files located
under `upload_dir`.

Any subtitle tracks that exist in the presentation but are omitted in the job
specification are left unmodified.

Values in `generate_subtitles` are objects with the following structure:

* `type`: `string`
  The transcription engine to be used for generating the subtitles. Currently
  only supports the value "whisper". This key is mandatory.

* `source`: `string`
  The name of one of this presentation's video streams, which will be used to
  generate the subtitle track. Should preferably use a stream with a camera
  feed for best synchronization results. The stream may either be an already
  existing one or one created by this job. This key is mandatory.

* `language`: `string`
  The language in which to generate subtitles. If omitted, the language will
  be inferred by the subtitling engine. Should mainly be used to force the
  correct language when the automatic detection fails.
  The language string should be a two-letter ISO 639-1 language code, such as
  `sv` or `en`.

Here is an example of valid `subtitles` and `generate_subtitles` sections:
```json
{
  "subtitles": {
    "English": "path/to/subs.vtt",
    "Svenska": null
  },
  "generate_subtitles": {
    "English (generated)": {
      "type": "whisper",
      "source": "camera",
      "language": "en"
    }
  }
}
```

This example would save the provided WebVTT file for the "English" track,
generate subtitles for the "English (generated)" track based on the given
source and delete the "Svenska" track. Note that the source "camera" must
exist (either in this job specification or in the package already on disk in
case of an update), and `upload_dir` must be provided in the job specification
in order to be able to resolve the path to the English subtitle track.

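
The rules in this section can be expressed as a small pre-submission check.
This is a sketch under stated assumptions, not the validation the Handlers
perform: `check_subtitle_sections` is a hypothetical helper, and it does not
verify that the named source stream or the subtitle files actually exist.

```python
def check_subtitle_sections(spec):
    """Return a list of problems found in the subtitle-related keys."""
    subtitles = spec.get("subtitles", {})
    generate = spec.get("generate_subtitles", {})
    problems = []

    # Track names must be unique across the two maps.
    for name in sorted(set(subtitles) & set(generate)):
        problems.append(f"track {name!r} appears in both maps")

    # Non-null subtitle files are resolved against upload_dir.
    has_files = any(value is not None for value in subtitles.values())
    if has_files and "upload_dir" not in spec:
        problems.append("upload_dir is required for subtitle files")

    for name, task in generate.items():
        for key in ("type", "source"):
            if key not in task:
                problems.append(f"track {name!r} lacks mandatory key {key!r}")
        # "whisper" is the only engine value documented at present.
        if "type" in task and task["type"] != "whisper":
            problems.append(f"track {name!r} has unsupported type")
    return problems
```

An empty list means none of the checked rules were violated.
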
### Job Sources

The sources object consists of a number of keys, each corresponding to a named
source. Each source value is itself a JSON object with a number of keys.

These are the valid keys for a source object:

* `video`: `string`
  The path to a video file to be used for this source.
  Relative to `upload_dir`.

* `poster`: `string`
  The path to an image to be used as placeholder for this video stream while
  the player is loading. Relative to `upload_dir`. An empty value will cause
  a new poster to be generated based on either the passed value of `video`
  or the already stored video as appropriate.

* `playAudio`: `bool`
  Whether this stream should play its audio track on playback. This should
  only have a `true` value for one stream per presentation. If omitted on
  stream creation, this will default to `false`.

* `enabled`: `bool`
  Whether this stream will be displayed in the player. At least one stream
  must be enabled. If omitted on stream creation, this will default to `true`.

A `sources` object would look like this:
```json
{
  "asourcename": {
    "video": "some/path",
    "poster": "some/other/path",
    "playAudio": true,
    "enabled": true
  },
  "anothersource": {...},
  ...
}
```

The source name "slides" is reserved for streams generated from slideshow
images (See [Slides](#slides)).

As with the top-level keys, keys in slides and stream objects may be omitted
if their stored data is to be left unmodified.

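
Since `playAudio` should be `true` for only one stream, it can be useful to
list the streams that request audio before submitting a job. The helper below
is a hypothetical sketch; it only looks at the flags present in the job itself
and knows nothing about streams already stored in the package.

```python
def audio_sources(sources):
    """Return the names of sources whose playAudio flag is true."""
    return [
        name
        for name, source in sources.items()
        # A null value marks a stream for deletion and carries no flags.
        if source is not None and source.get("playAudio", False)
    ]


if __name__ == "__main__":
    job_sources = {
        "main": {"video": "videos/myvideo.mp4", "playAudio": True},
        "second": {"video": "myothervideo.mp4", "playAudio": False},
    }
    names = audio_sources(job_sources)
    if len(names) > 1:
        print("More than one stream plays audio:", names)
```
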
### Slides

The slides key is used to create video from image-based slideshows. The value
should be the relative path to a file formatted according to the requirements
of the FFmpeg `concat` demuxer, which is documented
[here](https://ffmpeg.org/ffmpeg-formats.html#concat).

This is a basic example of a demux file:
```ffconcat
ffconcat version 1.0
file 'path/to/slide1.png'
duration 4500ms

file 'path/to/slide2.jpg'
# The default unit is seconds:
duration 3

# The final image must be specified again with no duration:
file 'path/to/slide2.jpg'
```

All paths must be relative to `upload_dir` and pass the requirements imposed
by the `safe` keyword as documented for `concat`.

The slide stream will be added to the package's `sources` object under the
name "slides". If one already existed under that name it will be replaced.

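
A demux file like the one above can be generated from a list of slides. The
following is a minimal sketch; `build_ffconcat` is a hypothetical helper, and
it assumes that durations are given in seconds and that paths contain no
single quotes or other characters that would need escaping.

```python
def build_ffconcat(slides):
    """Render (path, seconds) pairs as the text of a concat demux file."""
    if not slides:
        raise ValueError("at least one slide is required")
    lines = ["ffconcat version 1.0"]
    for path, seconds in slides:
        lines.append(f"file '{path}'")
        lines.append(f"duration {seconds}")
    # The final image is listed again with no duration.
    lines.append(f"file '{slides[-1][0]}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_ffconcat([("path/to/slide1.png", 4.5),
                          ("path/to/slide2.jpg", 3)]), end="")
```
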
### Deletion

In general, passing a falsy value will delete the data for the relevant key.
Empty string and list values will simply overwrite any stored data with the
empty values. There are some keys where this isn't the case; they
are documented above.

In order to delete a stream, pass `null` as the value of the
appropriate stream name key:
```json
{
  "pkg_id": "some_package_id",
  "sources": {
    "sourcename": null
  }
}
```

Deletions are not special and can be combined with any other desired updates
to a package in a single job specification, barring contradictory job
specifications such as adding a slide stream while also deleting it.

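
When job specifications are built in Python, `None` serializes to JSON
`null`, so a stream deletion can be produced as sketched here. The helper
name `stream_deletion_job` is hypothetical.

```python
import json


def stream_deletion_job(pkg_id, stream_names):
    """Build a job specification that deletes the named streams."""
    return {
        "pkg_id": pkg_id,
        # None becomes null when the specification is serialized.
        "sources": {name: None for name in stream_names},
    }


if __name__ == "__main__":
    job = stream_deletion_job("some_package_id", ["sourcename"])
    print(json.dumps(job, indent=2))
```
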
### Examples

This is a job specification that sets every top-level key except
`notification_url`:

```json
{
  "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "upload_dir": "/configured-uldir/myjobfiles",
  "title": {
    "en": "My Presentation",
    "sv": "Min presentation"
  },
  "description": "A description of the contents",
  "created": 1665151669,
  "presenters": ["A Person", "ausername"],
  "courses": [
    {
      "designation": "IDSV",
      "semester": "vt21"
    },
    {
      "designation": "PROG1",
      "semester": "vt21"
    }
  ],
  "tags": ["programming", "python"],
  "thumb": "mythumb.jpg",
  "subtitles": {
    "English": "en.vtt",
    "Swedish": "swedishsubs.vtt"
  },
  "generate_subtitles": {
    "Generated": {
      "type": "whisper",
      "source": "main"
    }
  },
  "sources": {
    "main": {
      "video": "videos/myvideo.mp4",
      "poster": "aposter.jpg",
      "playAudio": true,
      "enabled": true
    },
    "second": {
      "video": "myothervideo.mp4",
      "poster": "anotherposter.jpg",
      "playAudio": false,
      "enabled": false
    }
  },
  "slides": {
    "demux_file": "mydemux.txt",
    "poster": "slides/myfavorite.png"
  }
}
```

This job specification creates a new package, letting the daemon generate
thumbnail, subtitles and posters:

```json
{
  "upload_dir": "/configured-uldir/myjobfiles",
  "title": {
    "en": "My Presentation",
    "sv": "Min presentation"
  },
  "description": "A description of the contents",
  "created": 1665151669,
  "presenters": ["A Person", "ausername"],
  "courses": [
    {
      "designation": "IDSV",
      "semester": "vt21"
    },
    {
      "designation": "PROG1",
      "semester": "vt21"
    }
  ],
  "tags": ["programming", "python"],
  "thumb": "",
  "generate_subtitles": {
    "Generated": {
      "type": "whisper",
      "source": "main"
    }
  },
  "sources": {
    "main": {
      "video": "videos/myvideo.mp4",
      "poster": "",
      "playAudio": true
    },
    "second": {
      "video": "myothervideo.mp4",
      "poster": "",
      "playAudio": false
    }
  },
  "slides": {
    "demux_file": "mydemux.txt",
    "poster": ""
  }
}
```

An update job making some changes that don't involve uploads:

```json
{
  "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "title": {
    "en": "My Presentation",
    "sv": "Min presentation"
  },
  "description": "A description of the contents",
  "created": 1665151669,
  "presenters": ["A Person", "ausername"],
  "courses": [],
  "subtitles": {
    "Generated": null
  }
}
```

## Notifications

The daemon will return two types of notifications: `success` and `error`. Both
are JSON objects.

Keys that are common between the two types:

* `type`: `"success"` or `"error"`
  Used to identify the type of notification.

* `origin`: `string`
  The name of the step in the pipeline that sent the notification. `success`
  notifications will always originate from a Handler, but errors can come
  from other sources. Mostly useful for debugging.

* `jobid`: `string`
  The ID of the job this notification is about. This will be the same ID that
  was returned by the REST API when submitting the job specification.

Keys unique to `success` notifications:

* `package`: `JSON`
  The complete package metadata as stored on disk when the notification was
  sent. The most recent notification received for a job will reflect the
  current state of the package as stored on the backend.
  See [Package structure](#package-structure) for details.

* `pending`: `list of strings`
  Lists all pending Handlers for this job. If this list is empty, the job is
  completely finished.

Keys unique to `error` notifications:

* `message`: `string`
  An error message elaborating on what went wrong with the job.

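
A receiver of these notifications might interpret them along the following
lines. This is a sketch using only the keys documented above;
`summarize_notification` and the state names it returns are hypothetical, and
how the HTTP POST body reaches this function is left out.

```python
import json


def summarize_notification(body):
    """Parse a notification body and describe the job's state.

    Returns a (jobid, state, detail) tuple where state is one of
    "error", "in_progress" or "finished".
    """
    note = json.loads(body)
    if note["type"] == "error":
        # An error aborts the entire job.
        return note["jobid"], "error", note["message"]
    pending = note["pending"]
    # An empty pending list means the job is completely finished.
    state = "in_progress" if pending else "finished"
    return note["jobid"], state, pending
```

For a `success` notification the third element is the list of pending
handlers; for an `error` notification it is the error message.
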
## Package structure

This section documents the format used for packages in notifications. Each
package is a JSON object with the following keys:

* `pkg_id`: `string`
  The ID of the package.

* `contents`: `JSON`
  The metadata as a JSON object.

The keys in the `contents` object are:

* `title`: `JSON`
  The title in the supported languages. Its format is:
  ```json
  {
    "sv": "Titel på svenska",
    "en": "Title in english"
  }
  ```

* `description`: `string`
  A description for the presentation. No i18n support as of yet.

* `created`: `int`
  A unix timestamp representing the time and date of recording.

* `duration`: `int` or `float`
  The duration of the presentation in seconds.

* `presenters`: `list of strings`
  A list of presenters. A presenter may be represented by an SU username
  or a real name.

* `courses`: `list of strings`
  A list of courses this presentation is associated with. A course is
  represented by its course code.

* `tags`: `list of strings`
  A list of arbitrary strings.

* `thumb`: `string`
  The relative URL to the thumbnail for this presentation.

* `subtitles`: `string`
  The relative URL to this presentation's subtitles.

* `sources`: `JSON`
  A JSON object representing playable video sources.

A sources object maps source names to source definitions:

```json
{
  "source1": {...},
  "source2": {...}
}
```

The format of a source definition is documented under the heading
[Package sources](#package-sources).

### Package sources

A source definition is a JSON object with the following keys:

* `poster`: `string`
  The relative URL to an image shown in the player while the stream loads.

* `video`: `JSON`
  A JSON object mapping resolutions to relative video URLs:
  ```json
  {"720": "video-720-variant.mp4",
   "1080": "video-1080-variant.mp4"}
  ```

* `playAudio`: `bool`
  A boolean value denoting whether to play this stream's audio track. This will
  only be set to `true` for one source in a given package.

* `enabled`: `bool`
  Whether this stream will be displayed in the player. At least one stream
  will be enabled.

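
A consumer of the `video` map might choose a variant as sketched below. The
helper `pick_video` is hypothetical, and the sketch assumes that every key is
a decimal string such as "720", as in the example above.

```python
def pick_video(video, max_height):
    """Return the URL of the largest variant not exceeding max_height.

    Falls back to the smallest variant if all of them are larger.
    """
    heights = sorted(int(key) for key in video)
    fitting = [height for height in heights if height <= max_height]
    chosen = fitting[-1] if fitting else heights[0]
    return video[str(chosen)]


if __name__ == "__main__":
    variants = {"720": "video-720-variant.mp4",
                "1080": "video-1080-variant.mp4"}
    print(pick_video(variants, 900))
```
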