Important

Not production-ready. More work is needed; the current version is an MVP demo intended for testing.

Seshat Audio Transcription App

Seshat was the ancient Egyptian goddess of writing, knowledge, and wisdom. She played a key role in Egyptian mythology as a scribe and keeper of records, closely associated with Thoth, the god of wisdom and writing. While Thoth was often considered her counterpart or consort, Seshat had her own distinct identity and responsibilities.

Seshat Audio Transcription App is a web application that allows users to upload audio files, transcribe them using Whisper AI, and manage their files. Users can monitor the status of their uploaded files and perform bulk operations like downloading or deleting multiple files.


Features

  • User Authentication:

    • Register, login, and logout functionalities.
    • Secure access to user-specific files and operations.
  • File Upload:

    • Upload audio files with an option to specify the file's language.
    • File metadata is stored, including upload time, file name, and language.
  • File Processing:

    • Transcription powered by AI tools (e.g., Whisper AI).
    • Job status monitoring: PENDING, PROCESSING, COMPLETED, and FAILED.
  • File Management:

    • Bulk download (returns a ZIP file) and delete functionalities.
    • Displays the status of uploaded files.
  • Responsive UI:

    • Built with Thymeleaf, Bootstrap, and JavaScript for a seamless user experience.

Technologies Used

  • Backend:

    • Spring Boot (Java 17)
    • Spring Security
    • Spring Data JPA
    • Hibernate
  • Database:

    • MariaDB (Dockerized using Docker Compose)
  • Frontend:

    • Thymeleaf
    • Bootstrap
    • JavaScript
  • Other Tools:

    • Whisper AI for transcription
    • Apache Commons IO for file management
    • BlockingQueue for job queuing

How to Run

Prerequisites

  • Java 17 or later
  • Docker and Docker Compose
  • Maven

Steps to Run

  1. Clone the Repository:

    git clone https://github.com/your-repo/seshat-app.git
    cd seshat-app
    
    
  2. Create necessary directories:

    mkdir -p /seshat/uploads
    mkdir -p /seshat/outputs
    
    
  3. Set ownership of the directories:

    • Important: Replace $USER with the username that runs the application.
    sudo chown -R $USER:$USER /seshat
    
    

File Upload Workflow

  • User Uploads File:

    • Users upload audio files through the upload form.
    • The form includes a language selection dropdown.
  • Store File and Metadata:

    • The uploaded file is stored in the user's directory.
    • Metadata such as the file name, upload time, and selected language is saved to the database.
  • Queue for Processing:

    • Uploaded files are added to a job queue with a PENDING status.
    • The transcription process begins when a job is picked from the queue.
  • Transcription and Updates:

    • The system calls Whisper AI to transcribe the audio.
    • The job status is set to PROCESSING while transcription runs, then to COMPLETED or FAILED depending on the outcome.
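
The queueing steps above can be sketched in plain Java. This is a minimal illustration, not the app's actual code: the class `TranscriptionQueueSketch`, the in-memory `Job` type, and the `transcriber` function (a stand-in for the Whisper call) are all hypothetical names, and the real app stores job status in the database rather than in memory.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class TranscriptionQueueSketch {

    /** The four job states listed in this README. */
    enum JobStatus { PENDING, PROCESSING, COMPLETED, FAILED }

    /** Hypothetical in-memory job; the real app persists status in FileMetadata. */
    static final class Job {
        final String filePath;
        final String language;
        volatile JobStatus status = JobStatus.PENDING;
        volatile String transcript;

        Job(String filePath, String language) {
            this.filePath = filePath;
            this.language = language;
        }
    }

    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();
    // Assumption: stands in for the Whisper invocation and returns the transcript text.
    private final Function<Job, String> transcriber;

    TranscriptionQueueSketch(Function<Job, String> transcriber) {
        this.transcriber = transcriber;
    }

    /** Enqueues a new upload; every job starts as PENDING. */
    Job submit(String filePath, String language) {
        Job job = new Job(filePath, language);
        queue.add(job);
        return job;
    }

    /** Takes one job (blocking until one is available) and runs it to an end state. */
    Job processNext() {
        Job job;
        try {
            job = queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
        job.status = JobStatus.PROCESSING;
        try {
            job.transcript = transcriber.apply(job);
            job.status = JobStatus.COMPLETED;
        } catch (RuntimeException e) {
            job.status = JobStatus.FAILED;
        }
        return job;
    }
}
```

In this sketch a single worker thread would call `processNext()` in a loop, so jobs are transcribed one at a time in upload order.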

Bulk Operations

Bulk Download

  • Users can select multiple files for download.
  • The system creates a ZIP file containing the selected files and sends it to the user.
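
As a rough sketch of how such a ZIP could be assembled with the standard `java.util.zip` API: the class name `ZipBundleSketch` and the method `writeZip` are hypothetical and not taken from the app's code. In a controller, the `OutputStream` would typically be the HTTP response stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipBundleSketch {

    /** Writes each file as a top-level entry of a ZIP archive on the given stream. */
    static void writeZip(List<Path> files, OutputStream out) throws IOException {
        // Closing the ZipOutputStream finishes the archive and closes 'out' as well.
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Entry name is the bare file name, so directory structure is not preserved.
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

Note that `putNextEntry` throws a `ZipException` when two entries share a name, so a real implementation would need to disambiguate files with identical names before adding them.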

Bulk Delete

  • Users can select multiple files and delete them in one operation.
  • Deleted files are removed from both the database and storage.
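
The storage half of a bulk delete might look like the sketch below. It is an illustration under assumptions, not the app's implementation: `BulkDeleteSketch` and `deleteAll` are hypothetical names, and the check that each path stays under the storage root is an added safeguard that the README does not describe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class BulkDeleteSketch {

    /**
     * Deletes the given files if they sit under root and returns the ones actually
     * removed, so the caller can then delete the matching database rows.
     */
    static List<Path> deleteAll(Path root, List<Path> files) throws IOException {
        Path base = root.toAbsolutePath().normalize();
        List<Path> deleted = new ArrayList<>();
        for (Path file : files) {
            Path target = file.toAbsolutePath().normalize();
            // Skip anything that resolves outside the storage root.
            if (target.startsWith(base) && Files.deleteIfExists(target)) {
                deleted.add(target);
            }
        }
        return deleted;
    }
}
```

Returning the list of removed paths lets the caller delete only the database rows whose files are really gone, which keeps storage and metadata in step if a file was already missing.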

Database Schema

FileMetadata Table

Column        Type           Description
id            BIGINT         Unique identifier
file_name     VARCHAR(255)   Original file name
file_path     TEXT           Physical file location
language      VARCHAR(10)    Language of the audio file
job_status    VARCHAR(20)    Processing status
user_id       BIGINT         Associated user ID
uploaded_at   DATETIME       File upload timestamp

Development Notes

Endpoints Overview

HTTP Method   Endpoint              Description
POST          /files/upload         Uploads a file
POST          /files/download-zip   Bulk download as a ZIP
POST          /files/bulk-delete    Bulk delete files
GET           /files/manage         File management page

Configuration

  • application.properties:
    • spring.servlet.multipart.max-file-size=5GB
    • spring.servlet.multipart.max-request-size=5GB
    • app.upload-root=/seshat/uploads
    • app.output-root=/seshat/outputs

Future Enhancements

  • Single sign-on:
    • Implement OAuth2 for single sign-on.
  • Prevent file naming conflicts:
    • Uploading a file with the same name again currently overwrites the previous output.
    • Different versions of the same file should be able to coexist.
    • One option is to store files under a UUID and map each UUID back to the original file name.
  • Video Transcription:
    • Extend the app to support video files if and when Whisper AI supports them.
    • Extract audio from video files and transcribe the audio content.
  • Real-time progress tracking for transcription jobs:
    • Users can see the progress of their transcription jobs without refreshing the page.
  • Multifile upload:
    • Allow users to upload multiple files at once.
  • Selectable transcription output format:
    • Allow users to select the output format of the transcription (e.g., plain text, JSON, SRT).
  • Preserve the upload date on generated files:
    • Files generated by the transcription should carry the same date as the original upload.
  • Save the metadata of the input file:
    • Save the metadata of the input file in the database so that each generated file can be traced back to its input file.
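
For the file naming conflict item above, the UUID idea could be sketched as follows. This is only one possible approach; `StoredNameSketch` and `storedName` are hypothetical names, and the mapping from the generated name back to the original file name would still need to be saved alongside the file metadata.

```java
import java.util.UUID;

final class StoredNameSketch {

    /** Builds a unique storage name while keeping the original file extension. */
    static String storedName(String originalName) {
        int dot = originalName.lastIndexOf('.');
        // No extension is kept for names without a dot or names that start with one.
        String extension = dot > 0 ? originalName.substring(dot) : "";
        return UUID.randomUUID() + extension;
    }
}
```

With this scheme, two uploads of `interview.mp3` receive different storage names, so the second upload no longer overwrites the output of the first.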

Contributors

  • First demo version:
    • Nikolaus Athanassiadis