DMC/seshat

Nico Athanassiadis 2ecdbf8d72 Merge pull request 'Removed manual check of email' (#35) from manual-email-check into develop

Reviewed-on: #35

2025-03-31 13:38:37 +02:00

.gitattributes

Initial commit

2024-12-02 10:18:41 +01:00

.gitignore

Added explicit whisper properties, so the application is easier to configure.

2025-01-14 12:46:08 +01:00

compose.yaml

SSO Oauth 2 - and other improvements.

2025-01-09 07:46:37 +01:00

mvnw

Initial commit

2024-12-02 10:18:41 +01:00

mvnw.cmd

Initial commit

2024-12-02 10:18:41 +01:00

pom.xml

Removed webjars

2025-01-31 09:37:28 +01:00

README.md

Update README.md

2025-03-07 10:13:20 +01:00

README.md

Seshat Audio Transcription App

Seshat was the ancient Egyptian goddess of writing, knowledge, and wisdom. She played a key role in Egyptian mythology as a scribe and keeper of records, closely associated with Thoth, the god of wisdom and writing. While Thoth was often considered her counterpart or consort, Seshat had her own distinct identity and responsibilities.

Seshat Audio Transcription App is a web application that allows users to upload audio files, transcribe them using Whisper AI, and manage their files. Users can monitor the status of their uploaded files and perform bulk operations like downloading or deleting multiple files.

Features

User Authentication:
- Register, login, and logout functionalities.
- Secure access to user-specific files and operations.
File Upload:
- Upload audio files with an option to specify the file's language.
- File metadata is stored, including upload time, file name, and language.
File Processing:
- Transcription powered by AI tools (e.g., Whisper AI).
- Job status monitoring: PENDING, PROCESSING, COMPLETED, and FAILED.
File Management:
- Bulk download (returns a ZIP file) and delete functionalities.
- Displays the status of uploaded files.
Responsive UI:
- Built with Thymeleaf, Bootstrap, and JavaScript for a seamless user experience.

Technologies Used

Backend:
- Spring Boot (Java 17)
- Spring Security
- Spring Data JPA
- Hibernate
Database:
- MariaDB (Dockerized using Docker Compose)
Frontend:
- Thymeleaf
- Bootstrap
- JavaScript
Other Tools:
- Whisper AI for transcription
- Apache Commons IO for file management
- BlockingQueue for job queuing

How to Run

Prerequisites

Java 17 or later
Docker and Docker Compose
Maven

Steps to Run

Clone the Repository:

git clone https://github.com/your-repo/seshat-app.git
cd seshat-app

Create necessary directories:

sudo mkdir -p /seshat/uploads
sudo mkdir -p /seshat/outputs

Set ownership of the directories:
- Important: Replace $USER with the username that runs the application.
```
sudo chown -R $USER:$USER /seshat
```

File Upload Workflow

User Uploads File:
- Users upload audio files through the upload form.
- The form includes a language selection dropdown.
Store File and Metadata:
- The uploaded file is stored in the user's directory.
- Metadata such as the file name, upload time, and selected language is saved to the database.
Queue for Processing:
- Uploaded files are added to a job queue with a PENDING status.
- The transcription process begins when a job is picked from the queue.
Transcription and Updates:
- The system calls Whisper AI to transcribe the audio.
- The job status is set to PROCESSING when transcription starts, then to COMPLETED or FAILED depending on the outcome.
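
The queueing and status transitions described above can be sketched in plain Java. This is a minimal illustration, not the application's actual code: `TranscriptionJob`, `TranscriptionWorker`, and the `transcriber` function (a stand-in for the Whisper call) are hypothetical names, and the status is held in memory here, whereas the app stores it in the database.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Job states named in this README.
enum JobStatus { PENDING, PROCESSING, COMPLETED, FAILED }

// Hypothetical in-memory job; the real app persists the status in the database.
class TranscriptionJob {
    final String fileName;
    final String language;
    volatile JobStatus status = JobStatus.PENDING;

    TranscriptionJob(String fileName, String language) {
        this.fileName = fileName;
        this.language = language;
    }
}

class TranscriptionWorker {
    private final BlockingQueue<TranscriptionJob> queue = new LinkedBlockingQueue<>();
    // Stand-in for the Whisper call: returns the transcript or throws on failure.
    private final Function<TranscriptionJob, String> transcriber;

    TranscriptionWorker(Function<TranscriptionJob, String> transcriber) {
        this.transcriber = transcriber;
    }

    // Called after an upload: the job enters the queue as PENDING.
    void submit(TranscriptionJob job) {
        job.status = JobStatus.PENDING;
        queue.add(job);
    }

    // Blocks until a job is available, then runs it and records the outcome.
    // Returns false only if the waiting thread was interrupted.
    boolean processNext() {
        TranscriptionJob job;
        try {
            job = queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        job.status = JobStatus.PROCESSING;
        try {
            transcriber.apply(job);
            job.status = JobStatus.COMPLETED;
        } catch (RuntimeException e) {
            job.status = JobStatus.FAILED;
        }
        return true;
    }
}
```

In a running service, `processNext()` would typically be called in a loop on a dedicated worker thread, so uploads return immediately while transcription proceeds in the background.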

Bulk Operations

Bulk Download

Users can select multiple files for download.
The system creates a ZIP file containing the selected files and sends it to the user.
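
As a rough sketch of how such a ZIP could be assembled, the following uses only `java.util.zip` from the JDK. `ZipBundler` and `writeZip` are hypothetical names, and the sketch assumes the selected files have distinct file names (`ZipOutputStream` rejects duplicate entry names).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipBundler {
    // Streams each file into a ZIP written to 'out', one entry per file,
    // using the file name (without its directory) as the entry name.
    static void writeZip(List<Path> files, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Path file : files) {
            zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
            Files.copy(file, zip);
            zip.closeEntry();
        }
        // Writes the ZIP central directory without closing the caller's stream.
        zip.finish();
    }
}
```

Writing straight to the target stream, rather than building the archive in memory first, keeps memory use flat even when the selected files are large.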

Bulk Delete

Users can select multiple files and delete them in one operation.
Deleted files are removed from both the database and storage.
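
A minimal sketch of removing files from both places is shown below. The `records` map (id to stored path) is a hypothetical stand-in for the database, and `BulkDeleter` is an illustrative name, not a class from this repository. The sketch removes the record only after the file has been deleted, so a failed file deletion does not leave an orphaned file without a record.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BulkDeleter {
    // Deletes the files for the given ids and drops their records.
    // Returns the ids whose files could not be deleted.
    static List<Long> deleteAll(List<Long> ids, Map<Long, Path> records) {
        List<Long> failed = new ArrayList<>();
        for (Long id : ids) {
            Path path = records.get(id);
            if (path == null) {
                continue; // unknown id: nothing to delete
            }
            try {
                Files.deleteIfExists(path); // storage first
                records.remove(id);         // then the record
            } catch (IOException e) {
                failed.add(id);
            }
        }
        return failed;
    }
}
```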

Database Schema

FileMetadata Table

| Column | Type | Description |
| --- | --- | --- |
| `id` | `BIGINT` | Unique identifier |
| `file_name` | `VARCHAR(255)` | Original file name |
| `file_path` | `TEXT` | Physical file location |
| `language` | `VARCHAR(10)` | Language of the audio file |
| `job_status` | `VARCHAR(20)` | Processing status |
| `user_id` | `BIGINT` | Associated user ID |
| `uploaded_at` | `DATETIME` | File upload timestamp |
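
For orientation, the columns above could map onto a Java type along these lines. This is a plain Java 17 record, not the project's JPA entity: `FileMetadataRow` and its nested `Status` enum are hypothetical names, and the length checks simply mirror the `VARCHAR` sizes in the table. Whether columns are nullable is not stated in the schema, so null values are allowed here.

```java
import java.time.LocalDateTime;

// One row of the FileMetadata table, as an immutable value.
record FileMetadataRow(
        long id,
        String fileName,          // file_name, VARCHAR(255)
        String filePath,          // file_path, TEXT
        String language,          // language, VARCHAR(10)
        Status jobStatus,         // job_status, VARCHAR(20)
        long userId,              // user_id
        LocalDateTime uploadedAt  // uploaded_at, DATETIME
) {
    // Job states named in this README.
    enum Status { PENDING, PROCESSING, COMPLETED, FAILED }

    // Rejects values that would not fit the column sizes listed above.
    FileMetadataRow {
        if (fileName != null && fileName.length() > 255) {
            throw new IllegalArgumentException("file_name exceeds 255 characters");
        }
        if (language != null && language.length() > 10) {
            throw new IllegalArgumentException("language exceeds 10 characters");
        }
    }
}
```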

Development Notes

Endpoints Overview

| HTTP Method | Endpoint | Description |
| --- | --- | --- |
| `POST` | `/files/upload` | Uploads a file |
| `POST` | `/files/download-zip` | Bulk download as a ZIP |
| `POST` | `/files/bulk-delete` | Bulk delete files |
| `GET` | `/files/manage` | File management page |

Configuration

**application.properties**:
- spring.servlet.multipart.max-file-size=5GB
- spring.servlet.multipart.max-request-size=5GB
- app.upload-root=/seshat/uploads
- app.output-root=/seshat/outputs

Future Enhancements

Single sign on:
- Implement OAuth2 for single sign-on.
Prevent file naming conflicts:
- Currently, if a user uploads the same file again, the new output overwrites the previous data.
- It is better to allow different versions of the same file(s) to exist.
- We could implement a UUID approach and make sure to map correctly to the file name.
Video Transcription:
- Extend the app to support video files if and when Whisper AI supports them.
- Extract audio from video files and transcribe the audio content.
Real-time progress tracking for transcription jobs:
- Users can see the progress of their transcription jobs without refreshing the page.
Multifile upload:
- Allow users to upload multiple files at once.
Ability to select transcription output by format:
- Allow users to select the output format of the transcription (e.g., plain text, JSON, SRT).
Consistent dates on generated files:
- Files generated by the transcription should carry the same date as the upload.
Save the metadata of the input file:
- Save the metadata of the input file in the database, so we can infer which generated file corresponds to which input file.
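
The UUID approach mentioned under "Prevent file naming conflicts" could look roughly like the sketch below. It is one possible design, not the implemented one: `StorageNames` and `storedPath` are hypothetical names, and the per-user layout `<upload-root>/<user id>/<uuid><extension>` is an assumption. The original file name would still be kept in the `file_name` column, so the stored name can be mapped back to it.

```java
import java.nio.file.Path;
import java.util.UUID;

class StorageNames {
    // Builds <uploadRoot>/<userId>/<uuid><ext>, so that two uploads of the
    // same original file name never collide on disk.
    static Path storedPath(Path uploadRoot, long userId, String originalName, UUID id) {
        return uploadRoot
                .resolve(Long.toString(userId))
                .resolve(id + safeExtension(originalName));
    }

    // Keeps a short alphanumeric extension such as ".mp3"; anything else
    // (no extension, path separators, odd characters) yields an empty string.
    static String safeExtension(String name) {
        int dot = name.lastIndexOf('.');
        if (dot <= 0) {
            return "";
        }
        String ext = name.substring(dot);
        return ext.matches("\\.[A-Za-z0-9]{1,10}") ? ext : "";
    }
}
```

A caller would pass `UUID.randomUUID()` for each upload and store the resulting path in `file_path`.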

Contributors

First demo version:
- Nikolaus Athanassiadis
