# DSV play daemon v2 - this time it's threaded

This application's job is to monitor for presentation uploads, process them and send updates about processing progress to a callback URL.

## Python dependencies

This application is written for python3. The libraries required are specified in `requirements.txt`.

## Setup

Ensure the `ffmpeg` and `ffprobe` binaries are available on the system.

Create a virtualenv for use by the application and install the dependencies into it.

Create `config.ini` and populate it according to the instructions in `config.ini.example`.

The provided example systemd unit (`play-daemon.service.example`) expects the application to be installed under `/opt/play-daemon`, with a venv in the `env` subdirectory and run as the `play-daemon` user. Change the unit file as required if any of these do not match the actual installation. The running user must have a writable home directory. Also take note of the `TRANSFORMERS_CACHE` environment variable that needs to be set to a directory writable by the running user.

The unit file (with any necessary modifications made) should be copied to `/etc/systemd/system` and activated as normal.

## Workflow overview

A job is submitted by first uploading any required files to a unique subdirectory of `uploaddir` as set in `config.ini`, followed by submitting a job specification via the REST API.

Once the specification has been submitted, the job is passed through the following steps:

* Job pickup: The job is detected by the daemon and the job type validated.
* Preprocessing: If the job passed validation, it is preprocessed to unify all incoming formats into the format documented here.
* Distribution: The distributor checks which handlers should be applied to this job and distributes it appropriately. Each relevant handler validates the job before it is submitted.
* Handlers: Each handler does its specific processing and notifies the collector once done.
* Collection: The collector waits for each handler to finish, applying each handler's changes as they report that they are finished.
* Notification: Each successfully applied handler and any errors generate a notification. Each notification contains information about further pending handlers. If the list of pending handlers is empty, the job as a whole is finished and no further notifications will be sent.

An error at any step of the process will abort the entire job with a notification about the error.

## Queue files

Queue files are created by the REST API and shouldn't normally require manual creation, but are documented here for completeness.

A queue file contains a JSON object with the following keys:

* `type`: `string`
  The type of job this is. See [Job types](#job-types) below for details.
* `data`: `JSON`
  The actual job specification as documented under [Job specification structure](#job-specification-structure).
* `recorder`: `string`
  Unique to the cattura job type. Specifies which recorder the job was submitted from.

## Job types

The daemon recognizes jobs of three different types, which are passed on from the REST API:

* `default`
  The standard job type for uploads. The expected incoming format matches what is documented here.
* `cattura`
  Jobs coming from Cattura recorders. The incoming format is completely determined by the recorder, which is then reformatted to the documented format by the cattura preprocessor.
* `mediasite`
  Mediasite import jobs. These largely conform to this documentation but provide links to source files instead of uploaded files. The mediasite preprocessor downloads linked media and reformats the job to the documented format.

## Job specification structure

A job specification is a JSON object with fields specifying what changes to make to a package (or what to store in a new package). All fields are technically optional, but the different Handlers require certain combinations of keys in order to accept a job as valid.
An omitted key that does not cause an error will leave the package data for that key unchanged.

Valid top-level keys and their expected values are:

* `pkg_id`: `string`
  The ID of an existing package to modify. If omitted, a new package will be created by this job.
* `upload_dir`: `string`
  The directory where any files relevant to this job have been uploaded. Must be a subdirectory of `uploaddir` as specified in `config.ini`.
* `notification_url`: `string`
  The remote endpoint where notifications about this job should be sent via HTTP POST request. Must not require any authentication. If omitted, the default endpoint configured in `config.ini` is used instead. For testing purposes this can also be a local directory on the server, which must exist and be writable by the user the daemon is running as.
* `title`: `JSON`
  The title in the supported languages. Its format is:
  ```json
  {
      "sv": "Titel på svenska",
      "en": "Title in english"
  }
  ```
* `description`: `string`
  A description for the presentation. No i18n support as of yet.
* `created`: `int`
  A unix timestamp representing the time and date of recording.
* `presenters`: `list of strings`
  A list of presenters. A presenter may be represented by an SU username or a real name.
* `courses`: `list of JSON`
  A list of courses this presentation is associated with. A course is represented by a JSON object with two keys: `designation` and `semester`.
* `tags`: `list of strings`
  A list of arbitrary strings.
* `thumb`: `string`
  The path to a file that will be used as the thumbnail for this presentation. Relative to `upload_dir`. If an empty string is passed, a new thumbnail will be generated based on stored metadata and any changes passed in this job.
* `subtitles`: `JSON`
  A JSON object representing the subtitles to be acted upon. Its format is documented in detail under the heading [Subtitles](#subtitles).
* `generate_subtitles`: `JSON`
  A JSON object representing subtitle tracks to be generated.
  Its format is documented in detail under the heading [Subtitles](#subtitles).
* `sources`: `JSON`
  A JSON object representing the sources to be acted upon. Its format is documented in detail under the heading [Job Sources](#job-sources).
* `slides`: `string`
  The path to a file representing a slideshow of images to be transcoded to video. Relative to `upload_dir`. Details under the heading [Slides](#slides).

### Subtitles

There are two top-level keys that deal with subtitles: `subtitles` and `generate_subtitles`. The `subtitles` object is a simple key-value map, mapping subtitle track names to subtitle files to be stored. The `generate_subtitles` object maps subtitle track names to generation tasks. Keys must be unique across these two maps.

If the value for a given key in `subtitles` is `null`, that track is deleted from the presentation. Non-null values must be files located under `upload_dir`. Any subtitle tracks that exist in the presentation but are omitted in the job specification are left unmodified.

Values in `generate_subtitles` are objects with the following structure:

* `type`: `string`
  The transcription engine to be used for generating the subtitles. Currently only supports the value "whisper". This key is mandatory.
* `source`: `string`
  The name of one of this presentation's video streams, which will be used to generate the subtitle track. Should preferably be a stream with a camera feed for best synchronization results. The stream may either be an already existing one or one created by this job. This key is mandatory.
* `language`: `string`
  The language in which to generate subtitles. If omitted, the language will be inferred by the subtitling engine. Should mainly be used to force the correct language when the automatic detection fails. The language string should be a two-letter ISO 639-1 language code, such as `sv` or `en`.
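
The subtitle rules above can be sketched as a small client-side check to run before submitting a job. This is only an illustration and not the daemon's own validation code; the function name `check_subtitle_keys` and the wording of the messages are hypothetical.

```python
def check_subtitle_keys(jobspec):
    """Return a list of problems found in the subtitle parts of a job spec.

    Hypothetical helper: it mirrors the documented rules, nothing more.
    """
    subtitles = jobspec.get("subtitles", {})
    generate = jobspec.get("generate_subtitles", {})
    problems = []

    # Track names must be unique across the two maps.
    for name in sorted(set(subtitles) & set(generate)):
        problems.append(f"track '{name}' appears in both maps")

    # Non-null values are files under upload_dir, so it has to be given.
    has_files = any(value is not None for value in subtitles.values())
    if has_files and "upload_dir" not in jobspec:
        problems.append("upload_dir is required when subtitle files are passed")

    for name, task in generate.items():
        # "whisper" is the only engine the documentation lists.
        if task.get("type") != "whisper":
            problems.append(f"track '{name}': type must be 'whisper'")
        if not task.get("source"):
            problems.append(f"track '{name}': source is mandatory")

    return problems
```

An empty list means the subtitle keys follow the documented rules; the daemon's handlers may still reject the job for other reasons, such as a `source` that does not exist in the package.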

Here is an example of valid `subtitles` and `generate_subtitles` sections:

```json
{
    "subtitles": {
        "English": "path/to/subs.vtt",
        "Svenska": null
    },
    "generate_subtitles": {
        "English (generated)": {
            "type": "whisper",
            "source": "camera",
            "language": "en"
        }
    }
}
```

This example would save the provided WebVTT file for the "English" track, generate subtitles for the "English (generated)" track based on the given source and delete the "Svenska" track. Note that the source "camera" must exist (either in this job specification or in the package already on disk in case of an update), and `upload_dir` must be provided in the job specification in order to be able to resolve the path to the English subtitle track.

### Job Sources

The sources object consists of a number of keys, each corresponding to a named source. Each source value is itself a JSON object with a number of keys.

These are the valid keys for a source object:

* `video`: `string`
  The path to a video file to be used for this source. Relative to `upload_dir`.
* `poster`: `string`
  The path to an image to be used as placeholder for this video stream while the player is loading. Relative to `upload_dir`. An empty value will cause a new poster to be generated based on either the passed value of `video` or the already stored video as appropriate.
* `playAudio`: `bool`
  Whether this stream should play its audio track on playback. This should only have a `true` value for one stream per presentation. If omitted on stream creation, this will default to `false`.
* `enabled`: `bool`
  Whether this stream will be displayed in the player. At least one stream must be enabled. If omitted on stream creation, this will default to `true`.

A `sources` object would look like this:

```json
{
    "asourcename": {
        "video": "some/path",
        "poster": "some/other/path",
        "playAudio": someBool,
        "enabled": someBool
    },
    "anothersource": {...},
    ...
}
```

The source name "slides" is reserved for streams generated from slideshow images (see [Slides](#slides)).

As with the top-level keys, keys in slides and stream objects may be omitted if their stored data is to be left unmodified.

### Slides

The slides key is used to create video from image-based slideshows. The value should be the relative path to a file formatted according to the requirements of the FFmpeg `concat` demuxer, which is documented [here](https://ffmpeg.org/ffmpeg-formats.html#concat).

This is a basic example of a demux file:

```ffconcat
ffconcat version 1.0
file 'path/to/slide1.png'
duration 4500ms
file 'path/to/slide2.jpg'
# The default unit is seconds:
duration 3
# The final image must be specified again with no duration:
file 'path/to/slide2.jpg'
```

All paths must be relative to `upload_dir` and pass the requirements imposed by the `safe` keyword as documented for `concat`.

The slide stream will be added to the package's `sources` object under the name "slides". If one already existed under that name it will be replaced.

### Deletion

In general, passing a falsy value will delete the data for the relevant key. Empty string and list values will simply overwrite any stored data with the empty values. There are some keys where this isn't the case; they are documented above.

In order to delete a stream, pass `null` as the value of the appropriate stream name key:

```json
{
    "pkg_id": "some_package_id",
    "sources": {
        "sourcename": null
    }
}
```

Deletions are not special and can be combined with any other desired updates to a package in a single job specification, barring contradictory job specifications such as adding a slide stream while also deleting it.
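
As a rough illustration of the deletion rules, the sketch below lists which streams and subtitle tracks a job specification would delete and flags the contradictory case mentioned above. The function name `planned_deletions` and the shape of its return value are hypothetical; the daemon itself may detect conflicts differently.

```python
def planned_deletions(jobspec):
    """Summarize the null-valued (deleted) entries in a job specification.

    Hypothetical client-side helper based on the documented rules only.
    """
    sources = jobspec.get("sources") or {}
    subtitles = jobspec.get("subtitles") or {}

    # A null value under a stream or track name means "delete this entry".
    deleted_sources = [name for name, value in sources.items() if value is None]
    deleted_tracks = [name for name, value in subtitles.items() if value is None]

    conflicts = []
    # The "slides" stream is reserved for the stream generated from the
    # top-level `slides` key, so creating and deleting it at once conflicts.
    if jobspec.get("slides") and "slides" in deleted_sources:
        conflicts.append("slides")

    return {
        "sources": deleted_sources,
        "subtitles": deleted_tracks,
        "conflicts": conflicts,
    }
```

A non-empty `conflicts` list suggests the specification should be split into two jobs or corrected before submission.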

### Examples

This is a job specification that has all keys and values:

```json
{
    "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "upload_dir": "/configured-uldir/myjobfiles",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [
        {"designation": "IDSV", "semester": "vt21"},
        {"designation": "PROG1", "semester": "vt21"}
    ],
    "tags": ["programming", "python"],
    "thumb": "mythumb.jpg",
    "subtitles": {
        "English": "en.vtt",
        "Swedish": "swedishsubs.vtt"
    },
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "main"
        }
    },
    "sources": {
        "main": {
            "video": "videos/myvideo.mp4",
            "poster": "aposter.jpg",
            "playAudio": true,
            "enabled": true
        },
        "second": {
            "video": "myothervideo.mp4",
            "poster": "anotherposter.jpg",
            "playAudio": false,
            "enabled": false
        }
    },
    "slides": {
        "demux_file": "mydemux.txt",
        "poster": "slides/myfavorite.png"
    }
}
```

This job specification creates a new package, letting the daemon generate thumbnail, subtitles and posters:

```json
{
    "upload_dir": "/configured-uldir/myjobfiles",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [
        {"designation": "IDSV", "semester": "vt21"},
        {"designation": "PROG1", "semester": "vt21"}
    ],
    "tags": ["programming", "python"],
    "thumb": "",
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "main"
        }
    },
    "sources": {
        "main": {
            "video": "videos/myvideo.mp4",
            "poster": "",
            "playAudio": true
        },
        "second": {
            "video": "myothervideo.mp4",
            "poster": "",
            "playAudio": false
        }
    },
    "slides": {
        "demux_file": "mydemux.txt",
        "poster": ""
    }
}
```

An update job making some changes that don't involve uploads:

```json
{
    "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [],
    "subtitles": {
        "Generated": null
    }
}
```

## Notifications

The daemon will return two types of notifications: `success` and `error`. Both are JSON objects.

Keys that are common between the two types:

* `type`: `"success"` or `"error"`
  Used to identify the type of notification.
* `origin`: `string`
  The name of the step in the pipeline that sent the notification. `success` notifications will always originate from a Handler, but errors can come from other sources. Mostly useful for debugging.
* `jobid`: `string`
  The ID of the job this notification is about. This will be the same ID that was returned by the REST API when submitting the job specification.

Keys unique to `success` notifications:

* `package`: `JSON`
  The complete package metadata as stored on disk when the notification was sent. The most recent notification received for a job will reflect the current state of the package as stored on the backend. See [Package structure](#package-structure) for details.
* `pending`: `list of strings`
  Lists all pending Handlers for this job. If this list is empty, the job is completely finished.

Keys unique to `error` notifications:

* `message`: `string`
  An error message elaborating on what went wrong with the job.

## Package structure

This section documents the format used for packages in notifications. Each package is a JSON object with the following keys:

* `pkg_id`: `string`
  The ID of the package.
* `contents`: `JSON`
  The metadata as a JSON object.

The keys in the `contents` object are:

* `title`: `JSON`
  The title in the supported languages. Its format is:
  ```json
  {
      "sv": "Titel på svenska",
      "en": "Title in english"
  }
  ```
* `description`: `string`
  A description for the presentation. No i18n support as of yet.
* `created`: `int`
  A unix timestamp representing the time and date of recording.
* `duration`: `int` or `float`
  The duration of the presentation in seconds.
* `presenters`: `list of strings`
  A list of presenters. A presenter may be represented by an SU username or a real name.
* `courses`: `list of strings`
  A list of courses this presentation is associated with. A course is represented by its course code.
* `tags`: `list of strings`
  A list of arbitrary strings.
* `thumb`: `string`
  The relative URL to the thumbnail for this presentation.
* `subtitles`: `string`
  The relative URL to this presentation's subtitles.
* `sources`: `JSON`
  A JSON object representing playable video sources.

A sources object maps source names to source definitions:

```json
{
    "source1": {...},
    "source2": {...}
}
```

The format of a source definition is documented under the heading [Package sources](#package-sources).

### Package sources

A source definition is a JSON object with the following keys:

* `poster`: `string`
  The relative URL to an image shown in the player while the stream loads.
* `video`: `JSON`
  A JSON object mapping resolutions to relative video URLs:
  ```json
  {"720": "video-720-variant.mp4", "1080": "video-1080-variant.mp4"}
  ```
* `playAudio`: `bool`
  A boolean value denoting whether to play this stream's audio track. This will only be set to `true` for one source in a given package.
* `enabled`: `bool`
  Whether this stream will be displayed in the player. At least one stream will be enabled.
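
On the receiving side, a notification and the package it carries could be interpreted roughly as follows. This is a minimal sketch, not part of the daemon: the function names are hypothetical, the notification is assumed to be already parsed from JSON into a `dict`, and the resolution keys are assumed to be numeric strings as in the example above.

```python
def highest_resolution(video_map):
    """Return the URL for the largest resolution key (assumed numeric strings)."""
    return video_map[max(video_map, key=int)]


def summarize_notification(note):
    """Reduce a parsed notification to the facts a callback usually needs."""
    summary = {
        "jobid": note["jobid"],
        "finished": False,
        "error": None,
        "videos": {},
    }

    if note["type"] == "error":
        # An error aborts the entire job, so nothing more will arrive.
        summary["finished"] = True
        summary["error"] = note["message"]
        return summary

    # A success notification with no pending handlers ends the job.
    summary["finished"] = not note["pending"]

    sources = note["package"]["contents"].get("sources", {})
    for name, source in sources.items():
        # Only enabled streams are displayed in the player.
        if source.get("enabled") and source.get("video"):
            summary["videos"][name] = highest_resolution(source["video"])

    return summary
```

Because each success notification contains the complete package metadata, a receiver can simply keep the summary from the most recent notification for a given `jobid`.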