datasources.audio_to_text.audio_to_text

Audio Upload to Text

This data source acts similarly to a Preset. However, it needs SearchMedia's validate_query and after_create methods to run, so simply chaining that processor does not work: Presets essentially run only the process and after_process methods of their processors and skip those two datasource-only methods.

"""
Audio Upload to Text

This data source acts similar to a Preset, but because it needs SearchMedia's validate_query and after_create methods
to run, chaining that processor does not work (Presets essentially only run the process and after_process methods
of their processors and skip those two datasource only methods).
"""

from datasources.media_import.import_media import SearchMedia
from processors.machine_learning.whisper_speech_to_text import AudioToText


class AudioUploadToText(SearchMedia):
    type = "upload-audio-to-text-search"  # job ID
    category = "Search"  # category
    title = "Convert speech to text"  # title displayed in UI
    description = "Upload your own audio and use OpenAI's Whisper model to create transcripts"  # description displayed in UI

    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        #TODO: False here does not appear to actually remove the datasource from the "Create dataset" page so technically
        # this method is not necessary; if we can adjust that behavior, it ought to function as intended

        # Ensure the Whisper model is available
        return AudioToText.is_compatible_with(module=module, config=config)

    @classmethod
    def get_options(cls, *args, **kwargs):
        # We need both sets of options for this datasource
        media_options = SearchMedia.get_options(*args, **kwargs)
        whisper_options = AudioToText.get_options(*args, **kwargs)
        media_options.update(whisper_options)

        #TODO: there are some odd formatting issues if we use those derived options
        # The intro help text is not displayed correct (does not wrap)
        # Advanced Settings uses []() links which do not work on the "Create dataset" page, so we adjust

        media_options["intro"]["help"] = ("Upload audio files here to convert speech to text. "
                        "4CAT will use OpenAI's Whisper model to create transcripts."
                        "\n\nFor information on using advanced settings: [Command Line Arguments (CLI)](https://github.com/openai/whisper/blob/248b6cb124225dd263bb9bd32d060b6517e067f8/whisper/transcribe.py#LL374C3-L374C3)")
        media_options["advanced"]["help"] = "Advanced Settings"

        return media_options

    @staticmethod
    def validate_query(query, request, config):
        # We need SearchMedia's validate_query to upload the media
        media_query = SearchMedia.validate_query(query, request, config)

        # Here's the real trick: act like a preset and add another processor to the pipeline
        media_query["next"] = [{"type": "audio-to-text",
                         "parameters": query.copy()}]
        return media_query

class AudioUploadToText(datasources.media_import.import_media.SearchMedia):

Abstract processor class

A processor takes a finished dataset as input and processes its result in some way, producing another dataset as output. The input is thus a file, and (usually) so is the output. In other words, the result of one processor can be used as input for another (though whether and when this is useful is another question).

To determine whether a processor can process a given dataset, you can define an is_compatible_with(FourcatModule module=None, config=None) -> bool class method, which takes a dataset as argument and returns a bool indicating whether this processor is considered compatible with that dataset. For example:

@classmethod
def is_compatible_with(cls, module=None, config=None):
    return module.type == "linguistic-features"
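
As a rough illustration of how such a check might be used, the stand-alone sketch below filters a list of candidate modules. FakeModule and ExampleProcessor are hypothetical names invented for this example and are not part of 4CAT. The sketch also guards against the default module=None, which the one-line example above would not handle.

```python
class FakeModule:
    """Hypothetical stand-in for a dataset or module; only `type` is modelled."""

    def __init__(self, type):
        self.type = type


class ExampleProcessor:
    """Hypothetical processor, not a real 4CAT class."""

    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        # Guard against the default module=None before reading its attributes
        return module is not None and module.type == "linguistic-features"


# Two fake modules plus None, to exercise the guard
modules = [FakeModule("linguistic-features"), FakeModule("image-downloader"), None]
compatible = [m for m in modules if ExampleProcessor.is_compatible_with(module=m)]
```

Only the first fake module passes the check; the second has a different type, and None is rejected by the guard instead of raising an AttributeError.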
type = 'upload-audio-to-text-search'
category = 'Search'
title = 'Convert speech to text'
description = "Upload your own audio and use OpenAI's Whisper model to create transcripts"
@classmethod
def is_compatible_with(cls, module=None, config=None):
@classmethod
def get_options(cls, *args, **kwargs):

Get processor options

This method by default returns the class's "options" attribute, or an empty dictionary. It can be redefined by processors that need more fine-grained options, e.g. in cases where the availability of options is partially determined by the parent dataset's parameters.

Parameters
  • config:
  • DataSet parent_dataset: An object representing the dataset that the processor would be run on
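
The override in this data source merges two option dictionaries and then adjusts two help texts. A minimal sketch of that pattern follows. The dictionaries, their keys other than "intro" and "advanced", and all option values are made up for illustration; the real options come from SearchMedia and AudioToText. The deep copy is an extra precaution added in this sketch and does not appear in the source.

```python
import copy

# Hypothetical option sets; the real ones come from SearchMedia and AudioToText
MEDIA_OPTIONS = {
    "intro": {"type": "info", "help": "Upload media files."},
    "data_upload": {"type": "file", "help": "Files"},
}
WHISPER_OPTIONS = {
    "model": {"type": "choice", "help": "Whisper model", "default": "small"},
    "advanced": {"type": "json", "help": "See [the docs]() for details"},
}


def combined_options():
    # Deep-copy so overriding nested help texts does not mutate the originals
    options = copy.deepcopy(MEDIA_OPTIONS)
    # On a key clash, dict.update keeps the value from the second dictionary
    options.update(copy.deepcopy(WHISPER_OPTIONS))
    # Replace help texts that would not render well in the upload form
    options["intro"]["help"] = "Upload audio files here to convert speech to text."
    options["advanced"]["help"] = "Advanced Settings"
    return options


opts = combined_options()
```

Whether the real get_options methods return fresh dictionaries on every call is not stated here. If they returned shared objects, assigning to nested keys without copying would also change the parent's options.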
@staticmethod
def validate_query(query, request, config):

Step 1: Validate query and files

Confirms that the uploaded files exist and that the media type is valid.

Parameters
  • dict query: Query parameters, from client-side.
  • request: Flask request
  • ConfigManager|None config: Configuration reader (context-aware)
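
The override in this data source calls the parent's validate_query and then appends a follow-up processor under the "next" key. The sketch below reproduces that pattern in isolation. BaseUpload and UploadThenTranscribe are hypothetical stand-ins, and the parent's "validation" is a placeholder rather than SearchMedia's actual logic; only the "next" entry mirrors the source.

```python
class BaseUpload:
    """Hypothetical stand-in for SearchMedia; returns a cleaned-up query."""

    @staticmethod
    def validate_query(query, request, config):
        # Placeholder validation: keep only the key the upload step understands
        return {"media_type": query.get("media_type", "audio")}


class UploadThenTranscribe(BaseUpload):
    """Hypothetical subclass that chains a processor, as a preset would."""

    @staticmethod
    def validate_query(query, request, config):
        validated = BaseUpload.validate_query(query, request, config)
        # Queue a follow-up processor, passing it a copy of the raw query
        validated["next"] = [{"type": "audio-to-text", "parameters": query.copy()}]
        return validated


raw_query = {"media_type": "audio", "model": "small"}
result = UploadThenTranscribe.validate_query(raw_query, request=None, config=None)
```

Passing query.copy() rather than the query itself means later changes to the original dictionary do not alter the parameters queued for the follow-up processor. The copy is shallow, so nested values would still be shared.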