
datasources.audio_to_text.audio_to_text

Audio Upload to Text

This data source acts similarly to a Preset, but because it needs SearchMedia's validate_query and after_create methods to run, chaining that processor does not work (Presets essentially only run the process and after_process methods of their processors and skip those two datasource-only methods).

"""
Audio Upload to Text

This data source acts similar to a Preset, but because it needs SearchMedia's validate_query and after_create methods
to run, chaining that processor does not work (Presets essentially only run the process and after_process methods
of their processors and skip those two datasource only methods).
"""

from datasources.media_import.import_media import SearchMedia
from processors.machine_learning.whisper_speech_to_text import AudioToText


class AudioUploadToText(SearchMedia):
    type = "upload-audio-to-text-search"  # job ID
    category = "Search"  # category
    title = "Convert speech to text"  # title displayed in UI
    description = "Upload your own audio and use OpenAI's Whisper model to create transcripts"  # description displayed in UI

    @classmethod
    def is_compatible_with(cls, module=None, user=None):
        # TODO: False here does not appear to actually remove the datasource from the "Create dataset" page so technically
        # this method is not necessary; if we can adjust that behavior, it ought to function as intended

        # Ensure the Whisper model is available
        return AudioToText.is_compatible_with(module=module, user=user)

    @classmethod
    def get_options(cls, parent_dataset=None, user=None):
        # We need both sets of options for this datasource
        media_options = SearchMedia.get_options(parent_dataset=parent_dataset, user=user)
        whisper_options = AudioToText.get_options(parent_dataset=parent_dataset, user=user)
        media_options.update(whisper_options)

        # TODO: there are some odd formatting issues if we use those derived options
        # The intro help text is not displayed correct (does not wrap)
        # Advanced Settings uses []() links which do not work on the "Create dataset" page, so we adjust

        media_options["intro"]["help"] = ("Upload audio files here to convert speech to text. "
                                          "4CAT will use OpenAI's Whisper model to create transcripts."
                                          "\n\nFor information on using advanced settings: [Command Line Arguments (CLI)](https://github.com/openai/whisper/blob/248b6cb124225dd263bb9bd32d060b6517e067f8/whisper/transcribe.py#LL374C3-L374C3)")
        media_options["advanced"]["help"] = "Advanced Settings"

        return media_options

    @staticmethod
    def validate_query(query, request, user):
        # We need SearchMedia's validate_query to upload the media
        media_query = SearchMedia.validate_query(query, request, user)

        # Here's the real trick: act like a preset and add another processor to the pipeline
        media_query["next"] = [{"type": "audio-to-text",
                                "parameters": query.copy()}]
        return media_query
class AudioUploadToText(datasources.media_import.import_media.SearchMedia):

Abstract processor class

A processor takes a finished dataset as input and processes its result in some way, producing another dataset as output. The input is thus a file, and (usually) the output as well. In other words, the result of a processor can be used as input for another processor (though whether and when this is useful is another question).

To determine whether a processor can process a given dataset, you can define an is_compatible_with(module=None, user=None) -> bool class method, which takes a dataset as argument and returns a bool indicating whether this processor is considered compatible with that dataset. For example:


@classmethod
def is_compatible_with(cls, module=None, user=None):
    return module.type == "linguistic-features"
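The check above can be exercised in isolation with a stand-in module object; FakeModule below is purely illustrative and not a 4CAT class:

```python
class FakeModule:
    """Stand-in for a 4CAT dataset/module object; only the type attribute matters here."""
    type = "linguistic-features"

def is_compatible_with(module=None, user=None):
    # compatible only with datasets produced by the linguistic-features processor
    return module is not None and module.type == "linguistic-features"
```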

type = 'upload-audio-to-text-search'
category = 'Search'
title = 'Convert speech to text'
description = "Upload your own audio and use OpenAI's Whisper model to create transcripts"
@classmethod
def is_compatible_with(cls, module=None, user=None):
@classmethod
def get_options(cls, parent_dataset=None, user=None):

Get processor options

This method by default returns the class's "options" attribute, or an empty dictionary. It can be redefined by processors that need more fine-grained options, e.g. in cases where the availability of options is partially determined by the parent dataset's parameters.

Parameters
  • DataSet parent_dataset: An object representing the dataset that the processor would be run on
  • User user: Flask user the options will be displayed for, in case they are requested for display in the 4CAT web interface. This can be used to show some options only to privileged users.
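The get_options override shown earlier merges two option dicts with dict.update, so the Whisper options override any SearchMedia options that share a key. A minimal standalone sketch of that merge behavior (the option names below are illustrative, not real 4CAT options):

```python
def merge_options(media_options, whisper_options):
    """Merge two 4CAT-style option dicts; later entries win on key collisions."""
    merged = dict(media_options)    # copy so the originals stay untouched
    merged.update(whisper_options)  # whisper settings override duplicate keys
    return merged

media = {"intro": {"type": "info", "help": "Upload files"},
         "frames": {"type": "toggle", "default": False}}
whisper = {"frames": {"type": "toggle", "default": True},
           "model": {"type": "choice", "default": "base"}}

merged = merge_options(media, whisper)
# "frames" now carries the whisper default; "intro" and "model" both survive
```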
@staticmethod
def validate_query(query, request, user):

Step 1: Validate query and files

Confirms that the uploaded files exist and that the media type is valid.

Parameters
  • dict query: Query parameters, from client-side.
  • request: Flask request
  • User user: User object of user who has submitted the query
Returns

Safe query parameters
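The "next" trick in validate_query can be sketched in isolation: the validated query gains a list of follow-up processors, which is how presets chain work onto a freshly created dataset. The helper and parameter names below are illustrative, not part of the 4CAT API:

```python
def chain_processor(validated_query, processor_type, parameters):
    """Append a follow-up processor to a validated query, preset-style."""
    query = dict(validated_query)  # don't mutate the caller's dict
    query["next"] = [{"type": processor_type, "parameters": dict(parameters)}]
    return query

safe_query = {"frontend-confirm": False, "files": ["recording.mp3"]}
chained = chain_processor(safe_query, "audio-to-text", {"model": "base"})
```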