Important
Not production ready. More work is needed. This works as an MVP demo for testing.
Seshat Audio Transcription App
Seshat was the ancient Egyptian goddess of writing, knowledge, and wisdom. She played a key role in Egyptian mythology as a scribe and keeper of records, closely associated with Thoth, the god of wisdom and writing. While Thoth was often considered her counterpart or consort, Seshat had her own distinct identity and responsibilities.
Seshat Audio Transcription App is a web application that allows users to upload audio files, transcribe them using Whisper AI, and manage their files. Users can monitor the status of their uploaded files and perform bulk operations like downloading or deleting multiple files.
Features
- User Authentication:
  - Register, login, and logout functionalities.
  - Secure access to user-specific files and operations.
- File Upload:
  - Upload audio files with an option to specify the file's language.
  - File metadata is stored, including upload time, file name, and language.
- File Processing:
  - Transcription powered by AI tools (e.g., Whisper AI).
  - Job status monitoring: PENDING, PROCESSING, COMPLETED, and FAILED.
- File Management:
  - Bulk download (returns a ZIP file) and delete functionalities.
  - Displays the status of uploaded files.
- Responsive UI:
  - Built with Thymeleaf, Bootstrap, and JavaScript for a seamless user experience.
Technologies Used
- Backend:
  - Spring Boot (Java 17)
  - Spring Security
  - Spring Data JPA
  - Hibernate
- Database:
  - MariaDB (Dockerized using Docker Compose)
- Frontend:
  - Thymeleaf
  - Bootstrap
  - JavaScript
- Other Tools:
  - Whisper AI for transcription
  - Apache Commons IO for file management
  - BlockingQueue for job queuing
How to Run
Prerequisites
- Java 17 or later
- Docker and Docker Compose
- Maven
Steps to Run
- Clone the Repository:
    git clone https://github.com/your-repo/seshat-app.git
    cd seshat-app
- Create necessary directories:
    mkdir -p /seshat/uploads
    mkdir -p /seshat/outputs
- Set ownership of the directories:
  - Important: Replace $USER with the username that runs the application.
    sudo chown -R $USER:$USER /seshat
File Upload Workflow
- User Uploads File:
  - Users upload audio files through the upload form.
  - The form includes a language selection dropdown.
- Store File and Metadata:
  - The uploaded file is stored in the user's directory.
  - Metadata such as the file name, upload time, and selected language is saved to the database.
- Queue for Processing:
  - Uploaded files are added to a job queue with a PENDING status.
  - The transcription process begins when a job is picked from the queue.
- Transcription and Updates:
  - The system calls Whisper AI to transcribe the audio.
  - The job status is updated to PROCESSING, COMPLETED, or FAILED based on the outcome.
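The queueing and status steps above can be sketched with a plain BlockingQueue. This is only a minimal sketch under assumptions: the names `JobStatus`, `TranscriptionJob`, `Transcriber`, and `JobQueueSketch` are hypothetical, and the actual call to Whisper AI is hidden behind the `Transcriber` stand-in rather than shown.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Status values as named in this README.
enum JobStatus { PENDING, PROCESSING, COMPLETED, FAILED }

// Hypothetical job object; the real application may model this differently.
class TranscriptionJob {
    final String filePath;
    final String language;
    volatile JobStatus status = JobStatus.PENDING;

    TranscriptionJob(String filePath, String language) {
        this.filePath = filePath;
        this.language = language;
    }
}

// Hypothetical stand-in for the code that invokes Whisper AI.
interface Transcriber {
    void transcribe(TranscriptionJob job) throws Exception;
}

class JobQueueSketch {
    private final BlockingQueue<TranscriptionJob> queue = new LinkedBlockingQueue<>();
    private final Transcriber transcriber;

    JobQueueSketch(Transcriber transcriber) {
        this.transcriber = transcriber;
    }

    // Called once the upload is stored: the job enters the queue as PENDING.
    void submit(TranscriptionJob job) {
        job.status = JobStatus.PENDING;
        queue.add(job);
    }

    // Picks one job if available and runs it. Returns false when the queue is empty.
    // A long-running worker thread would more likely block on queue.take() in a loop.
    boolean processNext() {
        TranscriptionJob job = queue.poll();
        if (job == null) {
            return false;
        }
        job.status = JobStatus.PROCESSING;
        try {
            transcriber.transcribe(job);
            job.status = JobStatus.COMPLETED;
        } catch (Exception e) {
            job.status = JobStatus.FAILED;
        }
        return true;
    }
}
```

In this sketch any exception thrown by the transcriber marks the job as FAILED; how the real application distinguishes failure cases is not documented here.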
Bulk Operations
Bulk Download
- Users can select multiple files for download.
- The system creates a ZIP file containing the selected files and sends it to the user.
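Building the ZIP can be sketched with the JDK's `java.util.zip` classes. This is an illustrative sketch, not the application's actual code: the class `ZipSketch` and the method `writeZip` are hypothetical names, and using the bare file name as the entry name is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipSketch {
    // Writes each selected file into one ZIP stream.
    // Assumption: the entry name is the file name without its directory.
    static void writeZip(List<Path> files, OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // streams the file content into the current entry
                zip.closeEntry();
            }
        }
    }
}
```

Note that two selected files with the same name would produce a duplicate entry, which `ZipOutputStream` rejects with an exception, so a real implementation would need a naming strategy for that case. Closing the ZIP stream also closes the underlying output stream.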
Bulk Delete
- Users can select multiple files and delete them in one operation.
- Deleted files are removed from both the database and storage.
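The two-sided removal described above (storage and database) can be sketched as follows. This is a sketch under stated assumptions: `BulkDeleteSketch` is a hypothetical class, and an in-memory map stands in for the FileMetadata table so the example runs without a database.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BulkDeleteSketch {
    // Stand-in for the FileMetadata table: record id -> stored file path.
    private final Map<Long, Path> records = new HashMap<>();

    void register(long id, Path storedFile) {
        records.put(id, storedFile);
    }

    boolean hasRecord(long id) {
        return records.containsKey(id);
    }

    // Deletes each selected file from storage, then drops its record.
    // Unknown ids are skipped. Returns the number of records removed.
    int deleteAll(List<Long> ids) throws IOException {
        int removed = 0;
        for (Long id : ids) {
            Path storedFile = records.get(id);
            if (storedFile == null) {
                continue;
            }
            Files.deleteIfExists(storedFile); // no error if the file is already gone
            records.remove(id);
            removed++;
        }
        return removed;
    }
}
```

Deleting the file before the record means a failed file deletion leaves the record in place, so the file can still be found and retried; whether the application uses this order is an assumption.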
Database Schema
FileMetadata Table
Column | Type | Description |
---|---|---|
id | BIGINT | Unique identifier |
file_name | VARCHAR(255) | Original file name |
file_path | TEXT | Physical file location |
language | VARCHAR(10) | Language of the audio file |
job_status | VARCHAR(20) | Processing status |
user_id | BIGINT | Associated user ID |
uploaded_at | DATETIME | File upload timestamp |
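For orientation, the columns above can be mirrored in a plain Java record. This is a sketch only: `FileMetadataRow` is a hypothetical name, the actual application presumably uses a JPA entity (not shown here), and the length checks simply restate the VARCHAR limits from the table.

```java
import java.time.LocalDateTime;

// Status values as named in this README; each fits within VARCHAR(20).
enum JobStatus { PENDING, PROCESSING, COMPLETED, FAILED }

// Hypothetical in-memory mirror of one FileMetadata row.
record FileMetadataRow(
        long id,                  // id BIGINT
        String fileName,          // file_name VARCHAR(255)
        String filePath,          // file_path TEXT
        String language,          // language VARCHAR(10)
        JobStatus jobStatus,      // job_status VARCHAR(20)
        long userId,              // user_id BIGINT
        LocalDateTime uploadedAt  // uploaded_at DATETIME
) {
    // Compact constructor: reject values that would not fit the column sizes.
    FileMetadataRow {
        if (fileName.length() > 255) {
            throw new IllegalArgumentException("file_name exceeds 255 characters");
        }
        if (language.length() > 10) {
            throw new IllegalArgumentException("language exceeds 10 characters");
        }
    }
}
```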
Development Notes
Endpoints Overview
HTTP Method | Endpoint | Description |
---|---|---|
POST | /files/upload | Uploads a file |
POST | /files/download-zip | Bulk download as a ZIP |
POST | /files/bulk-delete | Bulk delete files |
GET | /files/manage | File management page |
Configuration
- application.properties:
  - spring.servlet.multipart.max-file-size=5GB
  - spring.servlet.multipart.max-request-size=5GB
  - app.upload-root=/seshat/uploads
  - app.output-root=/seshat/outputs
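To illustrate how the two `app.*` roots might be consumed, here is a small sketch using only the JDK. It is not the application's code: `ConfigSketch` is a hypothetical class, and the per-user layout `<upload root>/<user name>` is an assumption, since this README does not document the directory structure. In the Spring Boot application these values would more likely be injected, for example with `@Value`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import java.util.Properties;

class ConfigSketch {
    // Parses properties text in the same key=value format as application.properties.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Assumed layout: <app.upload-root>/<user name>.
    static Path userUploadDir(Properties props, String userName) {
        return Path.of(props.getProperty("app.upload-root")).resolve(userName);
    }
}
```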
Future Enhancements
- Single sign-on:
  - Implement OAuth2 for single sign-on.
- Prevent file naming conflicts:
  - If a user uploads the same file again, the new output overwrites the previous data.
  - It would be better to allow different versions of the same file(s) to exist.
  - We could implement a UUID approach and make sure each UUID maps correctly to the original file name.
- Video Transcription:
  - Extend the app to support video files if and when Whisper AI supports them.
  - Extract audio from video files and transcribe the audio content.
- Real-time progress tracking for transcription jobs:
  - Users can see the progress of their transcription jobs without refreshing the page.
- Multi-file upload:
  - Allow users to upload multiple files at once.
- Ability to select transcription output by format:
  - Allow users to select the output format of the transcription (e.g., plain text, JSON, SRT).
- Generated transcription files should carry the same date as the upload:
  - The date of each generated file should match the upload date of its input file.
- Save the metadata of the input file:
  - Save the metadata of the input file in the database, so we can infer which generated file corresponds to which input file.
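The UUID approach mentioned under "Prevent file naming conflicts" could look roughly like this. It is a sketch of one possible design, not a committed plan: `StoredNameSketch` and `storedName` are hypothetical names, and keeping the original extension on the stored name is an assumption.

```java
import java.util.UUID;

class StoredNameSketch {
    // Builds a unique stored name while keeping the original extension, if any.
    // The original file name would still be saved in file_name for display.
    static String storedName(String originalName) {
        int dot = originalName.lastIndexOf('.');
        String extension = dot > 0 ? originalName.substring(dot) : "";
        return UUID.randomUUID() + extension;
    }
}
```

With this scheme, uploading `talk.mp3` twice would yield two different stored names, so the second upload no longer overwrites the first.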
Contributors
- First demo version:
  - Nikolaus Athanassiadis