The solution uses speech recognition to transcribe audio recordings of lab experiments into text. The transcribed text is then processed, structured according to a JSON schema, and written into an ELN entry in NOMAD. An example of a speech-generated entry in a NOMAD Oasis can be seen here. This allows for efficient, hands-free documentation of lab experiments.
- **Speech Recognition**: The `speech_to_instance.py` script uses the `speech_recognition` and `whisper` libraries to transcribe audio into text. The audio is captured using a microphone, and the transcription is done in real time (see the sketch after this list).
- **Text Processing and Structuring**: The transcribed text is processed and structured according to the NOMAD schema defined in `nomad_schema.archive.yaml`. The `create_solution_entry` function is used to create structured data entries for NOMAD.
- **ELN Entry**: The structured data is written into an ELN as a JSON file. This is done in the `main` function.
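For illustration, here is a minimal sketch of the transcription step, assuming the `speech_recognition` and `whisper` packages (plus a working microphone backend such as PyAudio) are available. It is not the project's exact code; `speech_to_instance.py` may capture and buffer audio differently.

```python
# Minimal sketch of the speech-to-text step (illustrative, not the project's exact code).
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    print("Speak now...")
    audio = recognizer.listen(source)            # record until a pause is detected

# SpeechRecognition can delegate to a local Whisper model for offline transcription.
text = recognizer.recognize_whisper(audio, model="base", language="english")
print("Transcript:", text)
```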
To run the script, use the following command:

```bash
python speech_to_instance.py
```
The script will start recording audio and transcribing it into text. It will then process the text and write structured data entries into an ELN.
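As a rough illustration of the last step, the structured data could be wrapped in NOMAD's archive JSON layout and written to disk. The file name, the `m_def` schema reference, and the field names below are assumptions made for this example, not the project's actual output.

```python
# Hedged sketch: wrap structured data in a minimal NOMAD archive and save it as JSON.
# The schema reference and example fields are hypothetical placeholders.
import json

def write_eln_entry(structured: dict, filename: str = "solution.archive.json") -> None:
    """Write a structured dict as a *.archive.json file that NOMAD can parse as an ELN entry."""
    archive = {
        "data": {
            # hypothetical reference to the custom schema uploaded alongside the entry
            "m_def": "../upload/raw/nomad_schema.archive.yaml#/definitions/section_definitions/0",
            **structured,
        }
    }
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(archive, f, indent=2)

write_eln_entry({"name": "Perovskite precursor", "solvent": "DMF"})
```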
An example notebook demonstrating the conversion of the text extracted from the audio into a structured JSON schema is available in `text_to_instance.ipynb`.
The project depends on several Python libraries, including `speech_recognition`, `whisper`, `langchain`, `pygame`, `gtts`, and `pydub`. It is recommended to create a virtual environment first. The dependencies can then be installed using the requirements file:

```bash
pip install -r requirements.txt
```
Additionally, make sure you have ffmpeg installed. On Windows, we recommend using the Chocolatey package manager to install ffmpeg.
This solution provides a hands-free and efficient way to document lab experiments and write structured data entries into an ELN. It is particularly useful for labs where manual documentation is cumbersome or impractical, for example when scientists need both hands inside a glovebox during an experiment. Using a local LLM instance is important because it keeps the scientists' intellectual property private. In this implementation, we used the Llama3:70b model served via Ollama, which preserves privacy while still offering an efficient solution for text processing and structuring (via function calling).
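As an illustration of how such local structuring could look, the sketch below prompts a local Ollama model through LangChain and parses the reply as JSON. The model tag, prompt wording, and output keys are assumptions for this example and do not reproduce the project's exact chain or schema.

```python
# Hedged sketch of LLM-based structuring with LangChain and a local Ollama model.
# The schema fields (name, solvent, solutes) are illustrative, not the actual NOMAD schema.
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

llm = ChatOllama(model="llama3:70b", temperature=0)  # local model, no data leaves the machine

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Extract the solution preparation described in the transcript as JSON with the keys "
     "'name', 'solvent', and 'solutes' (a list of objects with 'chemical' and 'amount')."),
    ("human", "{transcript}"),
])

chain = prompt | llm | JsonOutputParser()

structured = chain.invoke(
    {"transcript": "I dissolved 10 milligrams of lead iodide in 1 millilitre of DMF."}
)
print(structured)
```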