DSV play daemon v2 - this time it's threaded
This application's job is to monitor for presentation uploads, process them and send updates about processing progress to a callback URL.
Python dependencies
This application is written for python3. The libraries required are specified in requirements.txt.
Setup
Ensure the ffmpeg and ffprobe binaries are available on the system.
Create a virtualenv for use by the application and install the dependencies into it.
Create config.ini and populate it according to the instructions in config.ini.example.
The provided example systemd unit (play-daemon.service.example) expects the application to be installed under /opt/play-daemon, with a venv in the env subdirectory and run as the play-daemon user. Change the unit file as required if any of these do not match the actual installation.
The running user must have a writable home directory.
Also take note of the TRANSFORMERS_CACHE environment variable that needs to be set to a directory writable by the running user.
The unit file (with any necessary modifications made) should be copied to /etc/systemd/system and activated as normal.
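The prerequisites above can be checked before starting the daemon. The following is a minimal sketch and not part of the application: the function name check_prerequisites and its cache_dir parameter are made up for illustration, and it only covers the ffmpeg/ffprobe binaries and the TRANSFORMERS_CACHE directory.

```python
import os
import shutil


def check_prerequisites(cache_dir=None):
    """Return a list of human-readable problems; an empty list means OK.

    cache_dir is a hypothetical override for testing; by default the
    TRANSFORMERS_CACHE environment variable is consulted.
    """
    problems = []
    # Both binaries must be resolvable on PATH.
    for binary in ("ffmpeg", "ffprobe"):
        if shutil.which(binary) is None:
            problems.append(f"{binary} not found on PATH")
    cache_dir = cache_dir or os.environ.get("TRANSFORMERS_CACHE")
    if not cache_dir:
        problems.append("TRANSFORMERS_CACHE is not set")
    elif not (os.path.isdir(cache_dir) and os.access(cache_dir, os.W_OK)):
        problems.append(f"{cache_dir} is not a writable directory")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Run it as the same user the daemon will run as, since the writability check depends on the calling user.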
Workflow overview
A job is submitted by first uploading any required files to a unique subdirectory of uploaddir as set in config.ini, followed by submitting a job specification via the REST API.
Once the specification has been submitted, the job is passed through the following steps:
- Job pickup: The job is detected by the daemon and the job type validated.
- Preprocessing: If the job passed validation, it is preprocessed to unify all incoming formats into the format documented here.
- Distribution: The distributor checks which handlers should be applied to this job and distributes it appropriately. Each relevant handler validates the job before it is submitted.
- Handlers: Each handler does its specific processing and notifies the collector once done.
- Collection: The collector waits for each handler to finish, applying each handler's changes as they report that they are finished.
- Notification: Each successfully applied handler and any errors generate a notification. Each notification contains information about further pending handlers. If the list of pending handlers is empty, the job as a whole is finished and no further notifications will be sent.
An error at any step of the process will abort the entire job with a notification about the error.
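The collection and notification steps can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not the daemon's actual collector: the function name collect and the handler names in the usage note are hypothetical, and the package key that real success notifications carry is left out for brevity.

```python
def collect(job_id, handlers, results):
    """Yield one notification per finished handler.

    handlers: names of the handlers selected for this job (hypothetical).
    results:  iterable of (handler_name, error_or_None) in completion order.
    """
    pending = list(handlers)
    for name, error in results:
        if error is not None:
            # An error at any step aborts the entire job.
            yield {"type": "error", "origin": name,
                   "jobid": job_id, "message": error}
            return
        pending.remove(name)
        # Each success notification lists the handlers still pending.
        yield {"type": "success", "origin": name,
               "jobid": job_id, "pending": list(pending)}
```

For example, `list(collect("job-1", ["metadata", "transcode"], [("metadata", None), ("transcode", None)]))` produces two success notifications, the last of which has an empty pending list and therefore marks the job as finished.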
Queue files
Queue files are created by the REST API and shouldn't normally require manual creation, but are documented here for completeness. A queue file contains a JSON object with the following keys:
- type: string
  The type of job this is. See Job types below for details.
- data: JSON
  The actual job specification as documented under Job specification structure.
- recorder: string
  Unique to the cattura job type. Specifies which recorder the job was submitted from.
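As an illustration of the queue file layout, the sketch below builds the JSON object described above. The helper name make_queue_item is hypothetical, the set of type values is taken from the Job types section, and rejecting a recorder on non-cattura jobs is an assumption made for this example rather than documented daemon behaviour.

```python
import json

# The three job types documented under Job types.
JOB_TYPES = {"default", "cattura", "mediasite"}


def make_queue_item(job_type, data, recorder=None):
    """Build the JSON-serializable object stored in a queue file."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    item = {"type": job_type, "data": data}
    if recorder is not None:
        # Assumption: the recorder key is only meaningful for cattura jobs.
        if job_type != "cattura":
            raise ValueError("recorder is only valid for cattura jobs")
        item["recorder"] = recorder
    return item


if __name__ == "__main__":
    spec = {"upload_dir": "/configured-uldir/myjobfiles"}
    print(json.dumps(make_queue_item("default", spec), indent=2))
```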
Job types
The daemon recognizes jobs of three different types, which are passed on from the REST API:
- default
  The standard job type for uploads. The expected incoming format matches what is documented here.
- cattura
  Jobs coming from Cattura recorders. The incoming format is completely determined by the recorder, which is then reformatted to the documented format by the cattura preprocessor.
- mediasite
  Mediasite import jobs. These largely conform to this documentation but provide links to source files instead of uploaded files. The mediasite preprocessor downloads linked media and reformats the job to the documented format.
Job specification structure
A job specification is a JSON object with fields specifying what changes to make to a package (or what to store in a new package).
All fields are technically optional, but the different Handlers require certain combinations of keys in order to accept a job as valid.
An omitted key that does not cause an error will leave the package data for that key unchanged.
Valid top-level keys and their expected values are:
- pkg_id: string
  The ID of an existing package to modify. If omitted, a new package will be created by this job.
- upload_dir: string
  The directory where any files relevant to this job have been uploaded. Must be a subdirectory of uploaddir as specified in config.ini.
- notification_url: string
  The remote endpoint where notifications about this job should be sent via HTTP POST request. Must not require any authentication. If omitted, the default endpoint configured in config.ini is used instead.
  For testing purposes this can also be a local directory on the server, which exists and is writable by the user the daemon is running as.
- title: JSON
  The title in the supported languages. Its format is: { "sv": "Titel på svenska", "en": "Title in english" }
- description: string
  A description for the presentation. No i18n support as of yet.
- created: int
  A unix timestamp representing the time and date of recording.
- presenters: list of strings
  A list of presenters. A presenter may be represented by an SU username or a real name.
- courses: list of JSON
  A list of courses this presentation is associated with. A course is represented by a JSON object with two keys: designation and semester.
- tags: list of strings
  A list of arbitrary strings.
- thumb: string
  The path to a file that will be used as the thumbnail for this presentation. Relative to upload_dir. If an empty string is passed, a new thumbnail will be generated based on stored metadata and any changes passed in this job.
- subtitles: JSON
  A JSON object representing the subtitles to be acted upon. Its format is documented in detail under the heading Subtitles.
- generate_subtitles: JSON
  A JSON object representing subtitle tracks to be generated. Its format is documented in detail under the heading Subtitles.
- sources: JSON
  A JSON object representing the sources to be acted upon. Its format is documented in detail under the heading Job Sources.
- slides: string
  The path to a file representing a slideshow of images to be transcoded to video. Relative to upload_dir. Details under the heading Slides.
Subtitles
There are two top-level keys that deal with subtitles: subtitles and generate_subtitles. The subtitles object is a simple key-value map, mapping subtitle track names to subtitle files to be stored. The generate_subtitles object maps subtitle track names to generation tasks.
Keys must be unique across these two maps.
If the value for a given key in subtitles is null, that track is deleted from the presentation. Non-null values must be files located under upload_dir.
Any subtitle tracks that exist in the presentation but are omitted in the job specification are left unmodified.
Values in generate_subtitles are objects with the following structure:
- type: string
  The transcription engine to be used for generating the subtitles. Currently only supports the value "whisper". This key is mandatory.
- source: string
  The name of one of this presentation's video streams, which will be used to generate the subtitle track. Should preferably use a stream with a camera feed for best synchronization results. The stream may either be an already existing one or one created by this job. This key is mandatory.
- language: string
  The language in which to generate subtitles. If omitted, the language will be inferred by the subtitling engine. Should mainly be used to force the correct language when the automatic detection fails.
  The language string should be a two-letter ISO 639-1 language code, such as sv or en.
Here is an example of valid subtitles and generate_subtitles sections:
{
    "subtitles": {
        "English": "path/to/subs.vtt",
        "Svenska": null
    },
    "generate_subtitles": {
        "English (generated)": {
            "type": "whisper",
            "source": "camera",
            "language": "en"
        }
    }
}
This example would save the provided WEBVTT file for the "English" track, generate subtitles for the "English (generated)" track based on the given source and delete the "Svenska" track. Note that the source "camera" must exist (either in this job specification or in the package already on disk in case of an update), and upload_dir must be provided in the job specification in order to be able to resolve the path to the English subtitle track.
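The rules in this section lend themselves to a quick client-side check before a job is submitted. The sketch below is an illustration only: check_subtitles is a hypothetical helper, it does not verify that the named source stream or the subtitle files actually exist, and the daemon's own validation may differ.

```python
def check_subtitles(spec):
    """Return a list of problems found in the subtitle-related keys."""
    problems = []
    subtitles = spec.get("subtitles", {})
    generate = spec.get("generate_subtitles", {})

    # Track names must be unique across the two maps.
    for name in sorted(subtitles.keys() & generate.keys()):
        problems.append(f"track {name!r} appears in both maps")

    # Non-null values are files under upload_dir, so it must be present.
    has_files = any(value is not None for value in subtitles.values())
    if has_files and "upload_dir" not in spec:
        problems.append("upload_dir is required to resolve subtitle files")

    for name, task in generate.items():
        if task.get("type") != "whisper":
            problems.append(f"track {name!r}: type must be 'whisper'")
        if not task.get("source"):
            problems.append(f"track {name!r}: source is mandatory")
    return problems
```

Applied to the example above with an upload_dir added, the function returns an empty list; using the track name "English" in both maps would be reported as a conflict.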
Job Sources
The sources object consists of a number of keys, each corresponding to a named source. Each source value is itself a JSON object with a number of keys.
These are the valid keys for a source object:
- video: string
  The path to a video file to be used for this source. Relative to upload_dir.
- poster: string
  The path to an image to be used as placeholder for this video stream while the player is loading. Relative to upload_dir. An empty value will cause a new poster to be generated based on either the passed value of video or the already stored video as appropriate.
- playAudio: bool
  Whether this stream should play its audio track on playback. This should only have a true value for one stream per presentation. If omitted on stream creation, this will default to false.
- enabled: bool
  Whether this stream will be displayed in the player. At least one stream must be enabled. If omitted on stream creation, this will default to true.
A sources object would look like this:
{
    "asourcename": {
        "video": "some/path",
        "poster": "some/other/path",
        "playAudio": false,
        "enabled": true
    },
    "anothersource": {...},
    ...
}
The source name "slides" is reserved for streams generated from slideshow images (See Slides).
As with the top-level keys, keys in slides and stream objects may be omitted if their stored data is to be left unmodified.
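The two constraints on playAudio and enabled can be checked with a few lines of Python. This is a hedged sketch, not the daemon's validation code: check_sources is a hypothetical name, and the function expects the sources as they would look after the job has been applied, with the documented creation defaults standing in for missing keys.

```python
def check_sources(sources):
    """Check a fully resolved sources object against the documented rules.

    sources: mapping of source names to source objects.
    Returns a list of problems; an empty list means no rule was violated.
    """
    problems = []
    # playAudio defaults to false; only one stream should have it set.
    with_audio = sorted(name for name, src in sources.items()
                        if src.get("playAudio", False))
    if len(with_audio) > 1:
        problems.append("playAudio is true for several streams: "
                        + ", ".join(with_audio))
    # enabled defaults to true; at least one stream must be enabled.
    if not any(src.get("enabled", True) for src in sources.values()):
        problems.append("no stream is enabled")
    return problems
```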
Slides
The slides key is used to create video from image-based slideshows. The value should be the relative path to a file formatted according to the requirements of the FFmpeg concat demuxer, which is documented in the FFmpeg formats documentation.
This is a basic example of a demux file:
ffconcat version 1.0
file 'path/to/slide1.png'
duration 4500ms
file 'path/to/slide2.jpg'
# The default unit is seconds:
duration 3
# The final image must be specified again with no duration:
file 'path/to/slide2.jpg'
All paths must be relative to upload_dir and pass the requirements imposed by the safe keyword as documented for concat.
The slide stream will be added to the package's sources object under the name "slides". If one already existed under that name it will be replaced.
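A demux file like the one above can be generated from a list of slides. The following sketch is an illustration under stated assumptions: build_ffconcat and is_safe are hypothetical helpers, and the path check only approximates the rules of the concat demuxer's safe option (relative paths whose components use letters, digits, period, underscore and hyphen, with no leading period).

```python
import re

# Approximation of a "safe" path component for the concat demuxer.
SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]*")


def is_safe(path):
    """Return True for a relative path made of safe components only."""
    return bool(path) and all(
        SAFE_COMPONENT.fullmatch(part) for part in path.split("/"))


def build_ffconcat(slides):
    """Build demux file contents from (relative_path, seconds) tuples."""
    if not slides:
        raise ValueError("at least one slide is required")
    lines = ["ffconcat version 1.0"]
    for path, seconds in slides:
        if not is_safe(path):
            raise ValueError(f"unsafe path: {path!r}")
        lines.append(f"file '{path}'")
        lines.append(f"duration {seconds}")
    # The final image must be specified again with no duration.
    lines.append(f"file '{slides[-1][0]}'")
    return "\n".join(lines) + "\n"
```

Durations are written in seconds, the default unit. The returned string would be written to a file inside upload_dir, and that file's relative path passed as the slides value.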
Deletion
In general, passing a falsy value will delete the data for the relevant key. Empty string and list values will simply overwrite any stored data with the empty values. The keys where this isn't the case are documented above.
In order to delete a stream, pass null as the value of the appropriate stream name key:
{
    "pkg_id": "some_package_id",
    "sources": {
        "sourcename": null
    }
}
Deletions are not special and can be combined with any other desired updates to a package in a single job specification, barring contradictory job specifications such as adding a slide stream while also deleting it.
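The combined effect of omitted keys, updates and null deletions on the sources object can be modelled as a simple merge. This is a simplified sketch at the job-specification level, not the daemon's implementation: apply_sources is a hypothetical name, and transcoding, poster generation and the stored package format are ignored.

```python
import copy

# Documented defaults for keys omitted when a stream is created.
CREATION_DEFAULTS = {"playAudio": False, "enabled": True}


def apply_sources(stored, update):
    """Return a new sources mapping with the update applied.

    A null (None) value deletes the stream, keys omitted from an
    existing stream are left unmodified, and new streams receive the
    creation defaults for omitted keys.
    """
    result = copy.deepcopy(stored)
    for name, changes in update.items():
        if changes is None:
            result.pop(name, None)          # deletion
        elif name in result:
            result[name].update(changes)    # partial update
        else:
            result[name] = {**CREATION_DEFAULTS, **changes}  # creation
    return result
```

Because deletions are handled in the same loop as updates, a single call can delete one stream while modifying or adding others, mirroring how a single job specification may combine them.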
Examples
This is a job specification that has all keys and values:
{
    "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "upload_dir": "/configured-uldir/myjobfiles",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [
        {
            "designation": "IDSV",
            "semester": "vt21"
        },
        {
            "designation": "PROG1",
            "semester": "vt21"
        }
    ],
    "tags": ["programming", "python"],
    "thumb": "mythumb.jpg",
    "subtitles": {
        "English": "en.vtt",
        "Swedish": "swedishsubs.vtt"
    },
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "main"
        }
    },
    "sources": {
        "main": {
            "video": "videos/myvideo.mp4",
            "poster": "aposter.jpg",
            "playAudio": true,
            "enabled": true
        },
        "second": {
            "video": "myothervideo.mp4",
            "poster": "anotherposter.jpg",
            "playAudio": false,
            "enabled": false
        }
    },
    "slides": {
        "demux_file": "mydemux.txt",
        "poster": "slides/myfavorite.png"
    }
}
This job specification creates a new package, letting the daemon generate thumbnail, subtitles and posters:
{
    "upload_dir": "/configured-uldir/myjobfiles",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [
        {
            "designation": "IDSV",
            "semester": "vt21"
        },
        {
            "designation": "PROG1",
            "semester": "vt21"
        }
    ],
    "tags": ["programming", "python"],
    "thumb": "",
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "main"
        }
    },
    "sources": {
        "main": {
            "video": "videos/myvideo.mp4",
            "poster": "",
            "playAudio": true
        },
        "second": {
            "video": "myothervideo.mp4",
            "poster": "",
            "playAudio": false
        }
    },
    "slides": {
        "demux_file": "mydemux.txt",
        "poster": ""
    }
}
An update job making some changes that don't involve uploads:
{
    "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [],
    "subtitles": {
        "Generated": null
    }
}
Notifications
The daemon will return two types of notifications: success and error. Both are JSON objects.
Keys that are common between the two types:
- type: "success" or "error"
  Used to identify the type of notification.
- origin: string
  The name of the step in the pipeline that sent the notification. success notifications will always originate from a Handler, but errors can come from other sources. Mostly useful for debugging.
- jobid: string
  The ID of the job this notification is about. This will be the same ID that was returned by the REST API when submitting the job specification.
Keys unique to success notifications:
- package: JSON
  The complete package metadata as stored on disk when the notification was sent. The most recent notification received for a job will reflect the current state of the package as stored on the backend. See Package structure for details.
- pending: list of strings
  Lists all pending Handlers for this job. If this list is empty, the job is completely finished.
Keys unique to error notifications:
- message: string
  An error message elaborating on what went wrong with the job.
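A receiver of these notifications mainly needs to tell three situations apart: an error, an intermediate success, and the final success with an empty pending list. The sketch below shows one way to do that; describe_notification is a hypothetical helper, and how the JSON body arrives (for example via an HTTP POST handler) is left to the receiving application.

```python
def describe_notification(note):
    """Summarize a decoded notification object as a status line."""
    kind = note.get("type")
    job = note.get("jobid")
    origin = note.get("origin")
    if kind == "error":
        # Errors abort the job; message explains what went wrong.
        return f"job {job} failed in {origin}: {note.get('message')}"
    if kind == "success":
        pending = note.get("pending", [])
        if pending:
            waiting = ", ".join(pending)
            return f"job {job}: {origin} done, waiting for {waiting}"
        # An empty pending list means the whole job is finished.
        return f"job {job} finished"
    raise ValueError(f"unknown notification type: {kind!r}")
```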
Package structure
This section documents the format used for packages in notifications. Each package is a JSON object with the following keys:
- pkg_id: string
  The ID of the package.
- contents: JSON
  The metadata as a JSON object.
The keys in the contents object are:
- title: JSON
  The title in the supported languages. Its format is: { "sv": "Titel på svenska", "en": "Title in english" }
- description: string
  A description for the presentation. No i18n support as of yet.
- created: int
  A unix timestamp representing the time and date of recording.
- duration: int or float
  The duration of the presentation in seconds.
- presenters: list of strings
  A list of presenters. A presenter may be represented by an SU username or a real name.
- courses: list of strings
  A list of courses this presentation is associated with. A course is represented by its course code.
- tags: list of strings
  A list of arbitrary strings.
- thumb: string
  The relative URL to the thumbnail for this presentation.
- subtitles: string
  The relative URL to this presentation's subtitles.
- sources: JSON
  A JSON object representing playable video sources.
A sources object maps source names to source definitions:
{
    "source1": {...},
    "source2": {...}
}
The format of a source definition is documented under the heading Package sources.
Package sources
A source definition is a JSON object with the following keys:
- poster: string
  The relative URL to an image shown in the player while the stream loads.
- video: JSON
  A JSON object mapping resolutions to relative video URLs: {"720": "video-720-variant.mp4", "1080": "video-1080-variant.mp4"}
- playAudio: bool
  A boolean value denoting whether to play this stream's audio track. This will only be set to true for one source in a given package.
- enabled: bool
  Whether this stream will be displayed in the player. At least one stream will be enabled.
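A consumer of the package format typically has to pick one of the video variants of a source. The sketch below selects the highest available resolution, optionally capped; best_video and max_height are hypothetical names, and the code assumes that the resolution keys are numeric strings as in the example above.

```python
def best_video(source, max_height=None):
    """Return the URL of the highest-resolution variant of a source.

    source:     a source definition as documented above.
    max_height: optional upper bound on the resolution key.
    Returns None if no variant fits within the bound.
    """
    # Resolution keys are assumed to be numeric strings such as "720".
    variants = {int(res): url for res, url in source["video"].items()}
    candidates = [res for res in variants
                  if max_height is None or res <= max_height]
    if not candidates:
        return None
    return variants[max(candidates)]
```

Converting the keys to integers matters here: compared as strings, "720" would sort above "1080".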