# DSV play daemon
This application's job is to monitor for presentation uploads,
process them and send metadata to a configured remote URL once
processing is complete.
## Python dependencies
This application is written for Python 3. The required libraries are
listed in `requirements.txt`.
## Workflow
This is an overview of the packaging workflow:
1. Queue pickup:
Queue files are picked up from the directory configured by
'queue' in config.ini.
1. Origin-specific handling:
Create a consistent package from the different incoming formats.
1. Working directory:
Create a temporary working directory and add it to the package.
1. Subtitles pickup:
If a subtitle file was passed, it is copied into the working directory
and the package updated accordingly.
1. Thumb pickup/generation:
If a thumbnail image was passed, it is copied to the working directory.
Otherwise a thumb is generated based on the package metadata and placed in
the working directory. The package is updated accordingly.
1. Transcoding and poster generation:
All transcoding actions are handled here. Incoming video streams
are transcoded to the expected resolutions, slide streams are
converted to video and any missing poster images are created. The
resulting files are placed in the working directory and
the package updated.
1. Update integration:
If the package is an update job, integrates the existing files with
the updates as appropriate. Otherwise this step is skipped.
1. Stash originals:
If the log level is DEBUG or higher, places all the original files
in a subdirectory of the working directory.
1. Platform notification:
Send the package as a notification to the configured endpoint
('notify_url' in config.ini). This notification package will
use the format specified below.
1. Finalize processing:
Saves the notification package to the working directory and moves
the working directory to the final storage destination ('storage'
in config.ini). Deletes the queue file and incoming files.
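
The workflow above can be read as a fixed sequence of steps that each take the package and return an updated one. The following is a minimal sketch of that shape only. `run_pipeline`, `create_workdir`, `skip_unless_update`, the `/tmp/processing` path and the `integrated` key are hypothetical names chosen for illustration and are not taken from the daemon's source.

```python
from typing import Callable

Package = dict
Step = Callable[[Package], Package]


def run_pipeline(package: Package, steps: list[Step]) -> Package:
    """Pass the package through each step in order and return the result."""
    for step in steps:
        package = step(package)
    return package


# Hypothetical stand-ins for two of the steps described above.
def create_workdir(package: Package) -> Package:
    # Assumption: the working directory name is derived from the job ID.
    return {**package, "workbase": f"/tmp/processing/{package['id']}"}


def skip_unless_update(package: Package) -> Package:
    # Update integration only applies to jobs whose origin is 'update';
    # every other package passes through unchanged.
    if package.get("origin") != "update":
        return package
    return {**package, "integrated": True}  # 'integrated' is illustrative only


result = run_pipeline({"id": "abc", "origin": "manual"},
                      [create_workdir, skip_unless_update])
```
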
## Package formats
The package format is generally similar across incoming data, processing
and notification, but there are differences, especially semantic ones.
### Incoming
Incoming formats vary between origins, but the basic format is as follows,
except for cattura which has its own format.
Unless noted otherwise, all *keys* are mandatory. A lot of the *values* may be
empty strings though - what that means in a specific case is documented below.
```javascript
{
// The title of the presentation in swedish and english.
"title": {
"sv": "",
"en": ""
},
// A description of the presentation. No localization support
// for the time being.
"description": "",
// The creation time of the presentation as a unix timestamp.
"created": 0,
// The duration of the presentation in seconds, possibly fractional.
"duration": 0,
// A possibly empty list of presenters, identified by SU username.
"presenters": [],
// A possibly empty list of courses, identified by course code.
"courses": [],
// A possibly empty list of tags. Tags are arbitrary strings.
"tags": [],
// The path to the presentation thumbnail. May be empty, in which case
// it will be auto-generated.
"thumb": "",
// The possibly empty path to a subtitle file in .vtt format.
"subtitles": "",
// The list of video sources to be processed. The format of a single
// source is documented below.
"sources": [source1, ...]
}
```
This is the generic format of an incoming source object:
```javascript
{
// The name of the stream. Must be unique within the presentation.
"name": "",
// The path to the video file associated with this stream.
// Must not be empty.
"video": "",
// The path to a poster for this stream. May be empty, in which case
// one will be auto-generated.
"poster": "",
// A boolean value indicating whether to play audio from this stream.
// Only one stream per presentation should have this set to true.
"playAudio": bool
}
```
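
As a rough illustration of the rules above (mandatory keys, unique stream names, non-empty `video`, at most one stream with `playAudio` set), an incoming job could be checked as follows. `validate_incoming` and the wording of its messages are hypothetical; this is a sketch and not the daemon's actual validation.

```python
MANDATORY_KEYS = {
    "title", "description", "created", "duration", "presenters",
    "courses", "tags", "thumb", "subtitles", "sources",
}
SOURCE_KEYS = {"name", "video", "poster", "playAudio"}


def validate_incoming(job: dict) -> list[str]:
    """Return a list of problems; an empty list means the job looks valid."""
    problems = [f"missing key: {key}"
                for key in sorted(MANDATORY_KEYS - set(job))]
    sources = job.get("sources", [])
    seen_names = set()
    for index, source in enumerate(sources):
        for key in sorted(SOURCE_KEYS - set(source)):
            problems.append(f"source {index}: missing key: {key}")
        # The key may be present but must not hold an empty path.
        if "video" in source and not source["video"]:
            problems.append(f"source {index}: 'video' must not be empty")
        name = source.get("name")
        if name in seen_names:
            problems.append(f"source {index}: duplicate name {name!r}")
        seen_names.add(name)
    if sum(1 for s in sources if s.get("playAudio") is True) > 1:
        problems.append("more than one source has playAudio set to true")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report everything wrong with a queue file in a single pass.
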
#### Mediasite
File paths in a mediasite job must be absolute URLs to the source files'
locations on the mediasite server. The files will be downloaded to local
storage in the packaging step.
A mediasite import job will contain an 'id' key as a top-level key in
the job object.
```javascript
{
...
// This is the ID used by the platform to keep track of the job.
// It is moved to the 'notification_id' field in the packaging step
// in order to make room for the ID generated by the receiving API.
"id": "",
...
}
```
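
The ID handling described in the comment above amounts to renaming one key during packaging. A minimal sketch, with `move_platform_id` as a hypothetical helper name:

```python
def move_platform_id(job: dict) -> dict:
    """Return a copy of the job with the platform's 'id' stored as
    'notification_id', leaving 'id' free for the ID assigned later
    by the receiving API."""
    packaged = dict(job)  # shallow copy, the incoming job is left untouched
    packaged["notification_id"] = packaged.pop("id")
    return packaged
```
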
Mediasite jobs can contain a slide stream that needs to be converted to
video. This is a top-level key in the job object.
```javascript
{
...
// A list of slide objects to be merged into a video stream by the
// transcoding step. The field will be removed entirely after packaging
// and replaced with a special source object.
"slides": [
{
// The web URL to a slide to be downloaded by the packager and
// placed in a demux file for later transcoding to video.
"url": "",
// The amount of time that this slide is to be shown,
// in milliseconds.
"duration": 0
},
// Further slide objects.
...
],
...
}
```
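
The README does not say which format the demux file uses. Assuming it follows the ffmpeg concat demuxer syntax (`file` and `duration` directives, with durations in seconds), the slide list could be turned into a demux file roughly as below. `build_demux` is a hypothetical helper, and the slide images are assumed to have been downloaded to local paths already.

```python
def build_demux(slides: list[tuple[str, int]]) -> str:
    """Build demux file contents from (local_path, duration_ms) pairs.

    Assumes ffmpeg concat demuxer syntax. Paths containing single
    quotes would need escaping, which this sketch does not handle.
    """
    lines = []
    for path, duration_ms in slides:
        lines.append(f"file '{path}'")
        # Slide durations arrive in milliseconds; the concat demuxer
        # expects seconds.
        lines.append(f"duration {duration_ms / 1000:.3f}")
    return "\n".join(lines) + "\n"


demux = build_demux([("slide_0.jpg", 1500), ("slide_1.jpg", 20000)])
```

The millisecond-to-second conversion is the one detail taken directly from the format above; everything else depends on the assumed demux syntax.
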
#### Manual
Manual uploads must specify a 'base' directory where all associated incoming
files are stored. This path should be located under the 'incoming' directory
configured in `config.ini`. All other paths in the job definition must be
relative to this path.
```javascript
{
...
// The base directory for this job's uploaded files. All other paths in the
// job must be specified relative to this one.
"base": "",
...
}
```
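
To make the path rules concrete, the sketch below resolves a job-relative path against 'base' and rejects anything that falls outside the configured 'incoming' directory. `resolve_job_path` and the example directories are hypothetical; `PurePosixPath` is used so that the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def resolve_job_path(incoming: str, base: str, relative: str) -> PurePosixPath:
    """Resolve a job-relative path against the job's 'base' directory."""
    incoming_dir = PurePosixPath(incoming)
    base_dir = PurePosixPath(base)
    # 'base' should be located under the configured 'incoming' directory.
    if incoming_dir not in base_dir.parents:
        raise ValueError(f"{base} is not located under {incoming}")
    rel = PurePosixPath(relative)
    # All other paths in the job must be relative to 'base'.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"{relative} must be a path relative to 'base'")
    return base_dir / rel


video = resolve_job_path("/srv/incoming", "/srv/incoming/job1", "video/main.mp4")
```

Rejecting `..` components is an extra precaution in this sketch and is not something the format itself specifies.
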
#### Update
Update jobs are special in that they must act on an existing presentation.
The existing presentation to be acted upon is specified in the top-level 'id'
field in the incoming job definition.
Update jobs also require a 'base' field the same way as the 'manual' job type.
Other paths in the job definition must be relative to this path, again just as
in the 'manual' job type.
If sources are to be deleted, they should be listed in the top-level field
'delete' as below. The field is mandatory and should be an empty list if no
streams are to be deleted. If a source is listed for deletion, it must not
occur in the 'sources' element.
```javascript
{
...
// The ID of the existing presentation to be acted upon. The contents of
// this field will be moved into the 'orig_id' field in the packaging step,
// in order to make room for the ID generated by the receiving API.
"id": "",
// Temporary ID used on the notification receiving end to match
// the job to existing metadata.
"notification_id": "",
// The base directory for this job's uploaded files. May be empty if no
// files were uploaded.
// All other paths in the job must be specified relative to this one.
"base": "",
// A possibly empty list of streams to be deleted. The streams are
// identified by their 'name' attribute.
"delete": [],
...
}
```
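
The constraints on 'delete' (mandatory, and disjoint from the names in 'sources') could be checked along these lines. `check_delete_list` is a hypothetical helper and the message wording is illustrative.

```python
def check_delete_list(job: dict) -> list[str]:
    """Return problems with the 'delete' field of an update job."""
    if "delete" not in job:
        return ["missing mandatory field 'delete'"]
    # 'sources' is optional in update jobs, so default to no sources.
    source_names = {source.get("name") for source in job.get("sources", [])}
    return [
        f"stream {name!r} is listed in both 'delete' and 'sources'"
        for name in job["delete"]
        if name in source_names
    ]
```
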
Some fields that are normally mandatory are optional when updating. Please
note that all fields *not* mentioned here are still mandatory.
The presence of an optional top-level field will cause the update process to
act on that field as specified below.
```javascript
{
...
// Replace the existing thumb with the one specified here. If the field is
// present with an empty value, the thumbnail will be re-generated using the
// automatic generation function.
"thumb": "",
// Replace the existing subtitle file with the one specified here. If the
// field is present with an empty value, the subtitles will be deleted.
"subtitles": "",
// If present, act on the sources specified here. See below for details
// on individual sources.
"sources": [source1, ...],
...
}
```
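
The important distinction here is between a field that is absent (leave the existing value alone) and one that is present but empty (regenerate or delete). A minimal sketch of that decision, where `plan_top_level_update` and the action labels are hypothetical:

```python
def plan_top_level_update(job: dict) -> dict[str, str]:
    """Map each optional field present in an update job to an action."""
    plan = {}
    if "thumb" in job:
        # Empty value: re-generate the thumbnail automatically.
        plan["thumb"] = "replace" if job["thumb"] else "regenerate"
    if "subtitles" in job:
        # Empty value: delete the existing subtitles.
        plan["subtitles"] = "replace" if job["subtitles"] else "delete"
    if "sources" in job:
        plan["sources"] = "update"
    return plan
```
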
A source to be updated works as follows.
```javascript
{
...
// The name of the stream to be acted upon. If it does not match an existing
// stream name, a new stream with the given name will be added
// to the presentation.
"name": "",
// If present, replace the current poster for this source with the one
// specified here. If the value is blank, a new poster will be generated.
"poster": "",
// If present, replace the current video stream for this source with the
// one specified here. This field may not have an empty value.
"video": "",
...
}
```
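
The per-source rules can be sketched as a merge keyed on the stream name. `apply_source_update` is a hypothetical helper; it assumes that a blank `poster` is simply stored as an empty string so that a later step generates a new one.

```python
def apply_source_update(existing: list[dict], update: dict) -> list[dict]:
    """Return a new source list with 'update' applied by stream name."""
    if "video" in update and not update["video"]:
        raise ValueError("'video' may not have an empty value")
    result = [dict(source) for source in existing]  # do not mutate the input
    for source in result:
        if source["name"] == update["name"]:
            # Only the keys present in the update are replaced.
            source.update(update)
            return result
    # No stream with this name exists, so it is added as a new stream.
    result.append(dict(update))
    return result
```

Because only keys present in the update are copied over, an update that names a stream and supplies just a `poster` leaves that stream's `video` untouched.
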
#### Cattura
The format of incoming cattura packages is completely determined by the
cattura recorder software and hence isn't documented here.
Since this is the package type that is used for capturing lectures in the
lecture halls, this package type will read booking data from daisy in order
to populate the fields `title`, `presenters`, `courses` and `tags`.
### Processing
This is the format of the package going through the pipeline.
Some fields get created partway through the pipeline and others only
exist for certain origins.
Unless otherwise noted, a field can be expected to always exist after
the origin-specific handling step (which will create the initial package).
Relative paths are relative to the 'base' attribute until they have been
processed in the transcoding step. After that the path is relative to the
'workbase' directory.
```javascript
{
// The unique ID of the job, assigned by the receiving API. Unless the
// origin is 'update', this will be the permanent ID of the presentation.
// It is used to guarantee unique working directories etc.
"id": "",
// The origin of the job; currently one of:
// 'manual', 'mediasite', 'cattura', 'update'
"origin": "",
// This field only exists for the 'update' origin.
// The ID of the existing presentation to be updated.
"orig_id": "",
// This field only exists for the 'manual' and 'mediasite' origins.
// Temporary ID used on the notification receiving end to match
// the job to existing metadata.
"notification_id": "",
// The absolute path to the directory where incoming job files
// are located. Should be a direct subdirectory of the directory
// configured by 'incoming' in config.ini.
"base": "",
// The absolute path to the working directory used during processing.
// Will be a direct subdirectory of the directory configured by
// 'processing' in config.ini.
"workbase": "",
// The presentation creation time as a unix timestamp. Taken from
// incoming metadata, in no way related to creation time of the
// job on the server.
"creation": 0,
// The duration of the presentation as a number of seconds. Decimals OK.
"duration": 0,
// The presentation title.
"title": {
"en": "",
"sv": ""
},
// A possibly empty list of presenters involved in the presentation.
"presenters": [],
// A possibly empty list of course codes the presentation should
// be associated with.
"courses": [],
// The relative path to the thumbnail file to be shown when browsing
// presentations. May be empty before the transcoding step, in which
// case the transcoder will create it and populate the field.
"thumb": "",
// A possibly empty list of tags. Tags are arbitrary strings.
"tags": [],
// A possibly empty relative path to a subtitle file in VTT format.
"subtitles": "",
// A list of at least one source. The format of a source object is
// documented below.
"sources": [source1, ...],
}
```
There are two valid source object formats - video and slide objects. Slide
objects will be converted to video objects in the transcoding step.
A video object:
```javascript
{
// A name for this source. Must be unique within the presentation.
"name": "",
// The possibly empty relative path to the poster image to be shown for
// this source before playback has started. If it is empty, the
// transcode step will create a poster image and populate this field.
"poster": "",
// A boolean indicating whether to play audio from this source.
// Exactly one source in a package should have this set to true.
"playAudio": bool,
// The relative path to the source video file. This will be
// updated in the transcode step to the format specified in
// the [Notification](#Notification) section.
"video": "",
}
```
A slides object:
```javascript
{
// The absolute path to the demux file to be used to produce the video.
"demux_file": "",
// The absolute path to the poster file to be shown before playback
// has started.
"poster": "",
// A boolean indicating whether to play audio from this source.
// Always false, since a slideshow has no audio.
"play_audio": false,
}
```
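
Since slide objects are converted to video objects in the transcoding step, that step has to tell the two formats apart. One possible way, assuming the presence of the `demux_file` key is what distinguishes them (`source_kind` is a hypothetical helper):

```python
def source_kind(source: dict) -> str:
    """Classify a processing-stage source object as 'slides' or 'video'."""
    if "demux_file" in source:
        return "slides"
    if "video" in source:
        return "video"
    raise ValueError("source has neither 'demux_file' nor 'video'")
```

Checking the keys avoids adding a separate type field, at the cost of relying on the two formats never sharing those keys.
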
### Notification
This is the format of the package when it is sent as a notification.
All filenames are given as paths relative to the package root.
```javascript
{
// The ID of the presentation.
"id": "",
// The origin of the notification; currently one of:
// cattura, mediasite, manual, update
"origin": "",
// Only exists for the 'manual', 'mediasite' and 'update' origins. Used
// on the receiving end to match the notification to existing metadata.
"notification_id": "",
// Creation time of the presentation, as a unix timestamp.
"creation": 0,
// The title of the presentation.
"title": {
"en": "",
"sv": ""
},
// A possibly empty list of presenters involved in the presentation.
"presenters": [],
// A possibly empty list of course codes the presentation should
// be associated with.
"courses": [],
// The length of the presentation in seconds. May include decimals.
"duration": 0,
// A possibly empty list of tags. Tags are arbitrary strings.
"tags": [],
// The thumbnail image to be shown when browsing presentations.
"thumb": "",
// The filename of the subtitles file. If there is no subtitles file,
// an empty string is passed.
"subtitles": "",
// A list of one or more source objects. See the source format below.
"sources": [source1, ...],
}
```
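
How the notification is transmitted is not specified here. The sketch below assumes it is sent as a JSON body in an HTTP POST to 'notify_url', and that 'notify_url' lives in a `config.ini` section named `[daemon]`. The section name, the HTTP method, the content type and `build_notification` itself are all assumptions made for illustration.

```python
import configparser
import json
import urllib.request


def build_notification(config_text: str, package: dict) -> urllib.request.Request:
    """Build (but do not send) a request carrying the notification package."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Assumption: 'notify_url' is stored in a section named [daemon].
    url = config["daemon"]["notify_url"]
    body = json.dumps(package).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",  # assumed HTTP method
    )


request = build_notification(
    "[daemon]\nnotify_url = https://play.example.org/notify\n",
    {"id": "abc", "origin": "manual"},
)
# Sending would then be: urllib.request.urlopen(request)
```
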
This is the format for each source in the list of sources:
```javascript
{
// A name for this source. Must be unique within the presentation.
"name": "main",
// The filename of the poster image to be shown for this source
// before playback has started.
"poster": "",
// A boolean indicating whether to play audio from this source.
// Exactly one source in a package should have this set to true.
"playAudio": bool,
// A dict listing the video filename for each resolution variant
// of this source. The key should be the variant's vertical
// resolution as a string, e.g. '720'.
// All sources in a presentation will have the same set of resolutions.
"video": {
resolution: "",
...
},
}
```
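
Two of the invariants stated above lend themselves to a quick check before a notification is sent: exactly one source plays audio, and all sources share the same set of resolution keys. A sketch, with `check_notification_sources` as a hypothetical helper:

```python
def check_notification_sources(sources: list[dict]) -> list[str]:
    """Return problems found in a notification's list of sources."""
    problems = []
    audio_count = sum(1 for source in sources if source["playAudio"] is True)
    if audio_count != 1:
        problems.append(
            f"expected exactly one source with playAudio, found {audio_count}")
    # Iterating a dict yields its keys, i.e. the resolution strings.
    resolution_sets = {frozenset(source["video"]) for source in sources}
    if len(resolution_sets) > 1:
        problems.append("sources do not share the same set of resolutions")
    for source in sources:
        for key in source["video"]:
            # Keys are vertical resolutions given as strings, e.g. '720'.
            if not (isinstance(key, str) and key.isdigit()):
                problems.append(
                    f"source {source['name']!r}: bad resolution key {key!r}")
    return problems
```

An empty return value means both invariants hold and every resolution key is a string of digits.
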