mingyang91 committed on
Commit
a94c72f
1 Parent(s): 5df3ca5
README.MD ADDED
@@ -0,0 +1,86 @@
+ # Polyhedron
+
+ Polyhedron is a voice chat application that provides real-time transcription and translation so that training can cross language barriers.
+
+ ## Overview
+ The app lets a trainer conduct lessons in their native language while trainees receive the instructions translated into their own languages.
+
+ ## Key features:
+
+ - Real-time voice transcription of the trainer's speech using Amazon Transcribe
+ - Translation of the speech into each trainee's language using Amazon Translate
+ - Translated text displayed to trainees in real time
+ - Transcription visible to the trainer, who can repeat unclear sections
+ - Support for training in multilingual organizations
+
+ Polyhedron uses WebSockets to stream audio and text between clients. The frontend is built with React and Vite.
+
+ The backend is written in Rust using the Poem web framework with WebSocket support. It interfaces with AWS services for transcription, translation, and text-to-speech.
+
+ Configuration, such as AWS credentials and models, is specified in `config.yaml`.
+
+ ## Getting Started
+ To run Polyhedron locally:
+
+ - Clone the repository
+ - Run `cargo run`
+ - Open http://localhost:8080 in the browser
+
+ ## Architecture
+
+ ![Completed Architecture](./docs/HR%20Training-Completed.drawio.svg)
+
+ Polyhedron uses a broadcast model to share transcription, translation, and speech synthesis work between clients.
+
+ - A single transcription is generated for the speaker and shared with all language clients.
+
+ - The transcript is translated once per language and shared with clients of that language.
+
+ - Speech is synthesized once per voice and shared with clients selecting that voice.
+
+ This architecture minimizes redundant work and cost:
+
+ - Automatic speech recognition (ASR) runs only once for the speaker, and the result is broadcast.
+
+ - Translation runs once per language from the shared transcript, and the result is broadcast.
+
+ - Text-to-speech (TTS) synthesis runs once per voice, and the result is broadcast.
+
+ By sharing the intermediate outputs, the system avoids duplicating work across clients, so it can serve many users efficiently and cost-effectively.
+
+ The components communicate using WebSockets and channels to distribute the shared outputs.
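The fan-out described above can be sketched with standard-library channels. This is a minimal illustration, not the project's actual code: `Hub`, `subscribe`, `publish`, and `translate_stub` are hypothetical names, and the stub only tags the text where a real backend would call a translation service. The point it shows is that translation work scales with the number of languages, not the number of listeners.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a call to a translation service.
fn translate_stub(text: &str, lang: &str) -> String {
    format!("[{}] {}", lang, text)
}

/// Fans one transcript out to listeners grouped by target language.
#[derive(Default)]
struct Hub {
    /// Senders for every listener, keyed by language code.
    listeners: HashMap<String, Vec<Sender<String>>>,
    /// Counts translation calls so the sharing is observable.
    translate_calls: usize,
}

impl Hub {
    /// Registers a listener for `lang` and returns its receiving end.
    fn subscribe(&mut self, lang: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.listeners.entry(lang.to_string()).or_default().push(tx);
        rx
    }

    /// Translates the transcript once per language, then sends a copy of
    /// that single translation to every listener of the language.
    fn publish(&mut self, transcript: &str) {
        for (lang, senders) in &mut self.listeners {
            let translated = translate_stub(transcript, lang);
            self.translate_calls += 1;
            // Drop listeners whose receiving end has been closed.
            senders.retain(|tx| tx.send(translated.clone()).is_ok());
        }
    }
}
```

With two listeners subscribed to `"fr"` and one to `"de"`, a single `publish` call delivers a message to all three while invoking the translation stub only twice.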
+
+ ![Simple Architecture](./docs/HR%20Training-Simple.drawio.svg)
+
+ The system architecture with a single listener can be summarized as:
+
+ - Speaker voice input ->
+ - ASR Transcription (English) ->
+ - Translation to Listener language ->
+ - TTS Synthesis in Listener language ->
+ - Voice output in Listener language
+
+ The speaker's voice is transcribed to text using ASR in the speaker's language (e.g. English).
+
+ The transcript is then translated to the listener's language.
+
+ Text-to-speech synthesis converts the translated text into voice audio in the listener's language.
+
+ This synthesized audio is played back to the listener.
+
+ The architecture forms a linear pipeline from speaker voice input to listener voice output, with transcription, translation, and synthesis steps in between.
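The linear pipeline above can be expressed as three stages composed in order. The sketch below is an assumption-laden illustration: the `Stages` trait, `run_pipeline`, and the `Stub` implementation are hypothetical names that do not come from the repository, and the stub stages only shuffle bytes and strings where a real backend would call ASR, translation, and TTS services.

```rust
/// The three pipeline stages. Names and signatures are illustrative only.
trait Stages {
    /// Speech recognition: speaker audio to text.
    fn transcribe(&self, audio: &[u8]) -> String;
    /// Translation: transcript to the listener's language.
    fn translate(&self, text: &str, target_lang: &str) -> String;
    /// Speech synthesis: translated text to audio.
    fn synthesize(&self, text: &str) -> Vec<u8>;
}

/// Runs speaker audio through ASR, translation, and TTS in that order.
fn run_pipeline<S: Stages>(stages: &S, audio: &[u8], target_lang: &str) -> Vec<u8> {
    let transcript = stages.transcribe(audio);
    let translated = stages.translate(&transcript, target_lang);
    stages.synthesize(&translated)
}

/// Stub stages that make the data flow visible without any network calls.
struct Stub;

impl Stages for Stub {
    fn transcribe(&self, audio: &[u8]) -> String {
        // Pretend the audio bytes are already UTF-8 text.
        String::from_utf8_lossy(audio).into_owned()
    }

    fn translate(&self, text: &str, target_lang: &str) -> String {
        // Tag the text with the target language instead of translating it.
        format!("{}:{}", target_lang, text)
    }

    fn synthesize(&self, text: &str) -> Vec<u8> {
        // Return the text bytes in place of synthesized audio.
        text.as_bytes().to_vec()
    }
}
```

Keeping the stages behind a trait means each one can be swapped for a service-backed implementation without changing `run_pipeline`, which is one way to keep the stage ordering in a single place.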
+
+ ## Directory Structure
+
+ - `src/`: Main Rust backend source code
+   - `main.rs`: Entry point and server definition
+   - `config.rs`: Configuration loading
+   - `lesson.rs`: Lesson management and audio streaming
+   - `whisper.rs`: Whisper ASR integration
+   - `group.rs`: Group management
+ - `static/`: Frontend JavaScript and assets
+   - `index.html`: Main HTML page
+   - `index.js`: React frontend code
+   - `recorderWorkletProcessor.js`: Audio recorder worklet processor
+ - `models/`: Whisper speech recognition models
+ - `config.yaml`: Server configuration
+ - `Cargo.toml`: Rust crate dependencies
+
+ ## Contributing
+ Contributions are welcome! Please open an issue or PR.
docs/HR Training-Completed.drawio.svg ADDED
docs/HR Training-Simple.drawio.svg ADDED