DSV play daemon v2 - this time it's threaded

This application's job is to monitor for presentation uploads, process them and send updates about processing progress to a callback URL.

Python dependencies

This application is written for python3. The libraries required are specified in requirements.txt.

Setup

Create config.ini and populate it according to the instructions in config.ini.example.

The provided example systemd unit (play-daemon.service.example) should be updated to reflect the path to the application and the desired user to run as, then installed to /etc/systemd/system and activated as normal.

Job specification structure

A job specification is a JSON object with fields specifying what changes to make to a package (or what to store in a new package).

All fields are technically optional, but the different Handlers require certain combinations of keys in order to accept a job as valid.

An omitted key that does not cause an error will leave the package data for that key unchanged.

Valid top-level keys and their expected values are:

  • pkg_id: string
    The ID of an existing package to modify. If omitted, a new package will be created by this job.

  • upload_dir: string
    The directory where any files relevant to this job have been uploaded. Must be a subdirectory of uploaddir as specified in config.ini.

  • title: JSON
    The title in the supported languages. Its format is:

    {
      "sv": "Titel på svenska",
      "en": "Title in english"
    }
    
  • description: string
    A description for the presentation. No i18n support as of yet.

  • created: int
    A unix timestamp representing the time and date of recording.

  • duration: int or float
    The duration of the presentation in seconds.

  • presenters: list of strings
    A list of presenters. A presenter may be represented by an SU username or a real name.

  • courses: list of JSON
    A list of courses this presentation is associated with. A course is represented by a JSON object with two keys - designation and semester.

  • tags: list of strings
    A list of arbitrary strings.

  • thumb: string
    The path to a file that will be used as the thumbnail for this presentation. Relative to upload_dir. If an empty string is passed, a new thumbnail will be generated based on stored metadata and any changes passed in this job.

  • subtitles: JSON
    A JSON object representing the subtitles to be acted upon. Its format is documented in detail under the heading Subtitles.

  • generate_subtitles: JSON
    A JSON object representing subtitle tracks to be generated. Its format is documented in detail under the heading Subtitles.

  • sources: JSON
    A JSON object representing the sources to be acted upon. Its format is documented in detail under the heading Job Sources.

  • slides: string
    The path to a file representing a slideshow of images to be transcoded to video. Relative to upload_dir. Details under the heading Slides.
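
The requirement that upload_dir be a subdirectory of the configured uploaddir can be checked before a job is submitted. The following is a minimal sketch of such a check, not the daemon's own validation code; the function name and the idea of passing the configured root as an argument are assumptions made for illustration.

```python
from pathlib import Path


def is_within_upload_root(upload_dir: str, upload_root: str) -> bool:
    """Return True if upload_dir is a proper subdirectory of upload_root.

    Hypothetical helper: upload_root stands in for the uploaddir value
    from config.ini. Both paths are resolved first so that '..'
    components cannot be used to escape the configured root.
    """
    root = Path(upload_root).resolve()
    candidate = Path(upload_dir).resolve()
    # Path.parents lists every ancestor of the candidate directory.
    return root in candidate.parents
```

Note that the root itself is rejected by this sketch, since the text above asks for a subdirectory.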

Subtitles

There are two top-level keys that deal with subtitles: subtitles and generate_subtitles. The subtitles object is a simple key-value map, mapping subtitle track names to subtitle files to be stored. The generate_subtitles object maps subtitle track names to generation tasks. Keys must be unique across these two maps.

If the value for a given key in subtitles is null, that track is deleted from the presentation. Non-null values must be files located under upload_dir.

Any subtitle tracks that exist in the presentation but are omitted in the job specification are left unmodified.

Values in generate_subtitles are objects with the following structure:

  • type: string
    The transcription engine to be used for generating the subtitles. Currently only supports the value "whisper".

  • source: string
    The name of one of this presentation's video streams, which will be used to generate the subtitle track. Preferably, use a stream with a camera feed for best synchronization results. The stream may be either an existing one or one created by this job.

Here is an example of valid subtitles and generate_subtitles sections:

{
    "subtitles": {
        "English": "path/to/subs.vtt",
        "Svenska": null
    },
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "camera"
        }
    }
}

This example would save the provided WebVTT file for the "English" track, generate subtitles for the "Generated" track based on the given source and delete the "Svenska" track. Note that the source "camera" must exist (either in this job specification or in the package already on disk in case of an update), and upload_dir must be provided in the job specification in order to be able to resolve the path to the English subtitle track.
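
The rules above (unique track names across the two maps, null meaning deletion) can be sketched as a small client-side check. This is an illustration under assumptions, not the daemon's implementation; the function name and the shape of the returned dictionary are made up for this example.

```python
def plan_subtitle_actions(job: dict) -> dict:
    """Sort the subtitle tracks of a job specification by intended action.

    Hypothetical helper: returns which tracks would be stored, deleted
    or generated, and raises ValueError if a track name appears in both
    'subtitles' and 'generate_subtitles'.
    """
    subtitles = job.get("subtitles", {})
    generate = job.get("generate_subtitles", {})

    # Keys must be unique across the two maps.
    clashes = sorted(set(subtitles) & set(generate))
    if clashes:
        raise ValueError(f"track names used in both maps: {clashes}")

    return {
        # Non-null values are files under upload_dir to be stored.
        "store": {name: path for name, path in subtitles.items() if path is not None},
        # A null value deletes the track.
        "delete": [name for name, path in subtitles.items() if path is None],
        "generate": dict(generate),
    }
```

Applied to the example above, "English" ends up under store, "Svenska" under delete and "Generated" under generate.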

Job Sources

The sources object consists of a number of keys, each corresponding to a named source. Each source value is itself a JSON object with a number of keys.

These are the valid keys for a source object:

  • video: string
    The path to a video file to be used for this source. Relative to upload_dir.

  • poster: string
    The path to an image to be used as placeholder for this video stream while the player is loading. Relative to upload_dir. An empty value will cause a new poster to be generated based on either the passed value of video or the already stored video as appropriate.

  • playAudio: bool
    Whether this stream should play its audio track on playback. This should only have a true value for one stream per presentation. If omitted on stream creation, this will default to false.

A sources object would look like this:

{
  "asourcename": {
    "video": "some/path",
    "poster": "some/other/path",
    "playAudio": someBool
  },
  "anothersource": {...},
  ...
}

The source name "slides" is reserved for streams generated from slideshow images (See Slides).

As with the top-level keys, keys in slides and stream objects may be omitted if their stored data is to be left unmodified.
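
Two of the constraints above lend themselves to a quick check before submission: playAudio should be true for only one stream, and the name "slides" is reserved. The sketch below is an assumption-laden illustration rather than the daemon's validation logic; the function name is hypothetical, and it only looks at the job specification, not at sources already stored in the package.

```python
def check_sources(sources: dict) -> list:
    """Return a list of human-readable problems found in a sources object.

    Hypothetical helper. Sources whose value is None are treated as
    deletions and are not checked.
    """
    problems = []
    live = {name: src for name, src in sources.items() if src is not None}

    # playAudio should be true for at most one stream.
    with_audio = sorted(
        name for name, src in live.items() if src.get("playAudio") is True
    )
    if len(with_audio) > 1:
        problems.append(f"playAudio is true for more than one source: {with_audio}")

    # "slides" is reserved for streams generated from slideshow images.
    if "slides" in live:
        problems.append('the source name "slides" is reserved for slide streams')

    return problems
```

An empty list means that neither of the two problems was found.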

Slides

The slides key is used to create video from image-based slideshows. The value should be the relative path to a file formatted according to the requirements of the FFmpeg concat demuxer, which is documented in the FFmpeg formats documentation.

This is a basic example of a demux file:

ffconcat version 1.0
file 'path/to/slide1.png'
duration 4500ms

file 'path/to/slide2.jpg'
# The default unit is seconds:
duration 3

# The final image must be specified again with no duration:
file 'path/to/slide2.jpg'

All paths must be relative to upload_dir and pass the requirements imposed by the safe keyword as documented for concat.

The slide stream will be added to the package's sources object under the name "slides". If one already existed under that name it will be replaced.
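
A demux file like the one above can be generated from a list of slides and durations. The following is a minimal sketch assuming durations are given in seconds; the function name is hypothetical, and the sketch does not escape special characters in paths.

```python
def build_ffconcat(slides: list) -> str:
    """Build the text of a concat demux file.

    Hypothetical helper. 'slides' is a list of (relative_path, seconds)
    tuples. Paths are written as-is, so a path containing a single
    quote would need escaping that this sketch does not perform.
    """
    if not slides:
        raise ValueError("at least one slide is required")

    lines = ["ffconcat version 1.0"]
    for path, seconds in slides:
        lines.append(f"file '{path}'")
        # The default unit for duration is seconds.
        lines.append(f"duration {seconds}")

    # The final image must be specified again with no duration.
    last_path = slides[-1][0]
    lines.append(f"file '{last_path}'")
    return "\n".join(lines) + "\n"
```

For example, build_ffconcat([("path/to/slide1.png", 4.5), ("path/to/slide2.jpg", 3)]) produces a file equivalent to the example above, without the comments.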

Deletion

In general, passing a falsy value will delete the data for the relevant key. Empty string and list values will simply overwrite any stored data with the empty values. The keys where this isn't the case are documented above.

In order to delete a stream, pass null as the value of the appropriate stream name key:

{
  "pkg_id": "some_package_id",
  "sources": {
    "sourcename": null
  }
}

Deletions are not special and can be combined with any other desired updates to a package in a single job specification, barring contradictory job specifications such as adding a slide stream while also deleting it.
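
The combination of deletions and updates for sources can be modelled as a simple merge. This is a toy model under stated assumptions: it treats stored sources as plain dictionaries with the same keys as the job specification and ignores the transcoding and poster generation that the daemon performs. The function name is hypothetical.

```python
def apply_source_updates(stored: dict, updates: dict) -> dict:
    """Return a new sources mapping with the updates applied.

    Hypothetical helper illustrating the update rules only:
    - a value of None deletes the named source,
    - an object overwrites only the keys it contains,
    - sources not mentioned in 'updates' are left unmodified.
    """
    # Copy so that the stored data is not modified in place.
    result = {name: dict(src) for name, src in stored.items()}
    for name, change in updates.items():
        if change is None:
            result.pop(name, None)
        else:
            result.setdefault(name, {}).update(change)
    return result
```

A single call can therefore delete one stream and change a setting on another, mirroring how one job specification can combine both.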

Examples

This is a job specification that has all keys and values:

{
    "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "upload_dir": "/configured-uldir/myjobfiles",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "duration": 500.5,
    "presenters": ["A Person", "ausername"],
    "courses": ["IDSV", "PROG1"],
    "tags": ["programming", "python"],
    "thumb": "mythumb.jpg",
    "subtitles": {
        "English": "en.vtt",
        "Swedish": "swedishsubs.vtt"
    },
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "main"
        }
    },
    "sources": {
        "main": {
            "video": "videos/myvideo.mp4",
            "poster": "aposter.jpg",
            "playAudio": true
        },
        "second": {
            "video": "myothervideo.mp4",
            "poster": "anotherposter.jpg",
            "playAudio": false
        }
    },
    "slides": {
        "demux_file": "mydemux.txt",
        "poster": "slides/myfavorite.png"
    }
}

This job specification creates a new package, letting the daemon generate thumbnail, subtitles and posters:

{
    "upload_dir": "/configured-uldir/myjobfiles",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "duration": 500.5,
    "presenters": ["A Person", "ausername"],
    "courses": ["IDSV", "PROG1"],
    "tags": ["programming", "python"],
    "thumb": "",
    "generate_subtitles": {
        "Generated": {
            "type": "whisper",
            "source": "main"
        }
    },
    "sources": {
        "main": {
            "video": "videos/myvideo.mp4",
            "poster": "",
            "playAudio": true
        },
        "second": {
            "video": "myothervideo.mp4",
            "poster": "",
            "playAudio": false
        }
    },
    "slides": {
        "demux_file": "mydemux.txt",
        "poster": ""
    }
}

An update job making some changes that don't involve uploads:

{
    "pkg_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "title": {
        "en": "My Presentation",
        "sv": "Min presentation"
    },
    "description": "A description of the contents",
    "created": 1665151669,
    "presenters": ["A Person", "ausername"],
    "courses": [],
    "subtitles": {
        "Generated": null
    }
}

Notifications

The daemon will return two types of notifications: success and error. Both are JSON objects.

Keys that are common between the two types:

  • type: "success" or "error"
    Used to identify the type of notification.

  • origin: string
    The name of the step in the pipeline that sent the notification. success notifications will always originate from a Handler, but errors can come from other sources. Mostly useful for debugging.

  • jobid: string
    The ID of the job this notification is about. This will be the same ID that was returned by the REST API when submitting the job specification.

Keys unique to success notifications:

  • package: JSON
    The complete package metadata as stored on disk when the notification was sent. The most recent notification received for a job will reflect the current state of the package as stored on the backend. See Package structure for details.

  • pending: list of strings
    Lists all pending Handlers for this job. If this list is empty, the job is completely finished.

Keys unique to error notifications:

  • message: string
    An error message elaborating on what went wrong with the job.
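
A receiver at the callback URL might interpret these notifications along the following lines. This is a sketch assuming the notification arrives as a JSON string; the function name and the wording of the summaries are made up for this example.

```python
import json


def summarize_notification(raw: str) -> str:
    """Turn a notification into a one-line status summary.

    Hypothetical helper for a callback receiver. Uses only the keys
    documented above: type, origin, jobid, pending and message.
    """
    note = json.loads(raw)
    jobid = note["jobid"]

    if note["type"] == "error":
        return f"job {jobid} failed in {note['origin']}: {note['message']}"

    if note["type"] == "success":
        pending = note["pending"]
        if pending:
            # Handlers are still working; more notifications will follow.
            return f"job {jobid} in progress, waiting for: {', '.join(pending)}"
        # An empty pending list means the job is completely finished.
        return f"job {jobid} finished"

    raise ValueError(f"unknown notification type: {note['type']!r}")
```

The package key of a success notification is ignored here; a real receiver would typically store it as the current state of the package.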

Package structure

This section documents the format used for packages in notifications. Each package is a JSON object with the following keys:

  • pkg_id: string
    The ID of the package.

  • contents: JSON
    The metadata as a JSON object.

The keys in the contents object are:

  • title: JSON
    The title in the supported languages. Its format is:

    {
      "sv": "Titel på svenska",
      "en": "Title in english"
    }
    
  • description: string
    A description for the presentation. No i18n support as of yet.

  • created: int
    A unix timestamp representing the time and date of recording.

  • duration: int or float
    The duration of the presentation in seconds.

  • presenters: list of strings
    A list of presenters. A presenter may be represented by an SU username or a real name.

  • courses: list of strings
    A list of courses this presentation is associated with. A course is represented by its course code.

  • tags: list of strings
    A list of arbitrary strings.

  • thumb: string
    The relative URL to the thumbnail for this presentation.

  • subtitles: string
    The relative URL to this presentation's subtitles.

  • sources: JSON
    A JSON object representing playable video sources.

A sources object maps source names to source definitions:

{
    "source1": {...},
    "source2": {...}
}

The format of a source definition is documented under the heading Package sources.

Package sources

A source definition is a JSON object with the following keys:

  • poster: string
    The relative URL to an image shown in the player while the stream loads.

  • video: JSON
    A JSON object mapping resolutions to relative video URLs:

    {"720": "video-720-variant.mp4",
     "1080": "video-1080-variant.mp4"}
    
  • playAudio: bool
    A boolean value denoting whether to play this stream's audio track. This will only be set to true for one source in a given package.
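
A consumer of a source definition usually needs to choose one of the video variants. The sketch below picks the highest resolution, optionally capped at a maximum height; the function name, the cap parameter and the fallback to the smallest variant are assumptions for illustration.

```python
from typing import Optional


def pick_video(source: dict, max_height: Optional[int] = None) -> str:
    """Return the relative URL of the most suitable video variant.

    Hypothetical helper. Resolution keys are strings such as "720", so
    they are converted to integers before being compared.
    """
    variants = {int(res): url for res, url in source["video"].items()}

    candidates = [res for res in variants if max_height is None or res <= max_height]
    if not candidates:
        # Nothing fits under the cap: fall back to the smallest variant.
        candidates = [min(variants)]

    return variants[max(candidates)]
```

Comparing the keys as integers matters here, since comparing them as strings would rank "720" above "1080".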

Job submission

A job is submitted by first uploading any required files followed by submitting the job specification via the REST API.

Once the specification has been submitted, the daemon will pick it up and start applying the requested changes. The daemon will send notifications as the job works its way through the pipeline, with information about what steps are remaining. Once the list of remaining steps is empty, the job is done and no further notifications will be sent.

An error at any step of the process will abort the entire job with a notification about the error.

Pipeline structure

The following is an outline of how a job passes through the pipeline.

A failure at any step aborts the job and sends an error message to the originator via the Notifier. Failed jobs are not removed automatically, so they will be picked up again once the daemon is restarted.

  1. The QueueReader picks up a queue item from the queue directory.
  2. The job type is read from the queue item and, upon successful validation, the item is submitted to the appropriate Preprocessor.
  3. The Preprocessor builds a job in the common job format from the queue item and passes the result on to the Distributor.
  4. The Distributor submits the job to each Handler that wants it. It also notifies the Collector about which Handlers the job has been submitted to.
  5. Each Handler does its processing, notifying the Collector once finished.
  6. The Collector applies the Handler's changes to the package on disk, and passes the resulting package state to the Notifier.
  7. The Notifier notifies the originator with a message containing the current state of the package on disk and any further pending Handlers for the job.
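
The bookkeeping implied by steps 4 to 7, where the Collector learns which Handlers received a job and reports the ones still pending, can be modelled with a small class. This is a toy model and not the daemon's Collector; the class name, method names and the Handler names in the usage note are all hypothetical.

```python
class PendingTracker:
    """Track which Handlers are still working on each job (toy model)."""

    def __init__(self):
        # Maps a job ID to the set of Handler names that have not finished.
        self._pending = {}

    def register(self, jobid: str, handlers: list) -> None:
        """Record which Handlers a job was submitted to (step 4)."""
        self._pending[jobid] = set(handlers)

    def finished(self, jobid: str, handler: str) -> list:
        """Mark one Handler as done and return those still pending (steps 5 to 7)."""
        remaining = self._pending[jobid]
        remaining.discard(handler)
        if not remaining:
            # Nothing left to wait for: the job is complete.
            del self._pending[jobid]
        return sorted(remaining)
```

After register("job1", ["HandlerA", "HandlerB"]), each call to finished returns the list that would go into the pending key of the next success notification, ending with an empty list.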