datasources.audio_to_text.audio_to_text
Audio Upload to Text
This data source acts much like a Preset. However, it needs SearchMedia's validate_query and after_create methods to run, so chaining that processor does not work: Presets essentially run only the process and after_process methods of their processors and skip those two datasource-only methods.
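The difference described above can be illustrated with a toy model. This is a minimal sketch, not 4CAT's actual worker code: `ToyDataSource`, `run_as_datasource` and `run_inside_preset` are hypothetical names, and the order in which the four hooks are called for a data source is an assumption.

```python
class ToyDataSource:
    """Hypothetical stand-in that records which hooks were called."""

    def __init__(self):
        self.calls = []

    def validate_query(self):
        self.calls.append("validate_query")

    def after_create(self):
        self.calls.append("after_create")

    def process(self):
        self.calls.append("process")

    def after_process(self):
        self.calls.append("after_process")


def run_as_datasource(source):
    # Assumed order: the two datasource-only hooks run before processing
    source.validate_query()
    source.after_create()
    source.process()
    source.after_process()
    return source.calls


def run_inside_preset(source):
    # Per the description above, a preset only runs these two methods
    source.process()
    source.after_process()
    return source.calls


datasource_calls = run_as_datasource(ToyDataSource())
preset_calls = run_inside_preset(ToyDataSource())

# Hooks that never run when the class is merely chained inside a preset
skipped = [call for call in datasource_calls if call not in preset_calls]
print(skipped)
```

The two skipped hooks are exactly the ones the upload needs, which is why this module subclasses SearchMedia instead of listing it as a step in a preset.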
"""
Audio Upload to Text

This data source acts similarly to a Preset, but because it needs SearchMedia's validate_query and after_create methods
to run, chaining that processor does not work (Presets essentially only run the process and after_process methods
of their processors and skip those two datasource-only methods).
"""

from datasources.media_import.import_media import SearchMedia
from processors.machine_learning.whisper_speech_to_text import AudioToText


class AudioUploadToText(SearchMedia):
    type = "upload-audio-to-text-search"  # job ID
    category = "Search"  # category
    title = "Convert speech to text"  # title displayed in UI
    description = "Upload your own audio and use OpenAI's Whisper model to create transcripts"  # description displayed in UI

    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        # TODO: False here does not appear to actually remove the datasource from the "Create dataset" page so technically
        # this method is not necessary; if we can adjust that behavior, it ought to function as intended

        # Ensure the Whisper model is available
        return AudioToText.is_compatible_with(module=module, config=config)

    @classmethod
    def get_options(cls, *args, **kwargs):
        # We need both sets of options for this datasource
        media_options = SearchMedia.get_options(*args, **kwargs)
        whisper_options = AudioToText.get_options(*args, **kwargs)
        media_options.update(whisper_options)

        # TODO: there are some odd formatting issues if we use those derived options
        # The intro help text is not displayed correctly (does not wrap)
        # Advanced Settings uses []() links which do not work on the "Create dataset" page, so we adjust

        media_options["intro"]["help"] = ("Upload audio files here to convert speech to text. "
                                          "4CAT will use OpenAI's Whisper model to create transcripts."
                                          "\n\nFor information on using advanced settings: [Command Line Arguments (CLI)](https://github.com/openai/whisper/blob/248b6cb124225dd263bb9bd32d060b6517e067f8/whisper/transcribe.py#LL374C3-L374C3)")
        media_options["advanced"]["help"] = "Advanced Settings"

        return media_options

    @staticmethod
    def validate_query(query, request, config):
        # We need SearchMedia's validate_query to upload the media
        media_query = SearchMedia.validate_query(query, request, config)

        # Here's the real trick: act like a preset and add another processor to the pipeline
        media_query["next"] = [{"type": "audio-to-text",
                                "parameters": query.copy()}]
        return media_query
class AudioUploadToText(SearchMedia):
    type = "upload-audio-to-text-search"  # job ID
    category = "Search"  # category
    title = "Convert speech to text"  # title displayed in UI
    description = "Upload your own audio and use OpenAI's Whisper model to create transcripts"  # description displayed in UI
Abstract processor class
A processor takes a finished dataset as input and processes its result in some way, producing another dataset as output. The input is thus a file, and (usually) so is the output. In other words, the result of one processor can be used as input for another processor (though whether and when this is useful is another question).
To determine whether a processor can process a given dataset, you can define an is_compatible_with(FourcatModule module=None, config=None) -> bool class method, which takes a dataset as argument and returns a bool that determines whether this processor is considered compatible with that dataset. For example:
@classmethod
def is_compatible_with(cls, module=None, config=None):
    return module.type == "linguistic-features"
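To show how such a check might be consumed, the following sketch filters a list of candidate processors for a given dataset. Everything here is illustrative: `FeatureCounter`, `AnyDatasetProcessor`, `compatible_processors` and the `"image-downloader"` type string are hypothetical, and `SimpleNamespace` objects stand in for real dataset objects. How 4CAT itself collects compatible processors is not described in this section.

```python
from types import SimpleNamespace


class FeatureCounter:
    """Hypothetical processor that only accepts linguistic-feature datasets."""

    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        # Guard against module=None so the check itself never raises
        return module is not None and module.type == "linguistic-features"


class AnyDatasetProcessor:
    """Hypothetical processor that accepts any dataset."""

    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        return module is not None


def compatible_processors(processors, module, config=None):
    """Return the names of all processors that accept the given module."""
    return [
        processor.__name__
        for processor in processors
        if processor.is_compatible_with(module=module, config=config)
    ]


# Stand-ins for datasets; only the `type` attribute matters for this sketch
features = SimpleNamespace(type="linguistic-features")
images = SimpleNamespace(type="image-downloader")
candidates = [FeatureCounter, AnyDatasetProcessor]

print(compatible_processors(candidates, features))
print(compatible_processors(candidates, images))
```

Note the `module is not None` guard: the example in the docstring above would raise an AttributeError if called with the default `module=None`.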
    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        # TODO: False here does not appear to actually remove the datasource from the "Create dataset" page so technically
        # this method is not necessary; if we can adjust that behavior, it ought to function as intended

        # Ensure the Whisper model is available
        return AudioToText.is_compatible_with(module=module, config=config)
    @classmethod
    def get_options(cls, *args, **kwargs):
        # We need both sets of options for this datasource
        media_options = SearchMedia.get_options(*args, **kwargs)
        whisper_options = AudioToText.get_options(*args, **kwargs)
        media_options.update(whisper_options)

        # TODO: there are some odd formatting issues if we use those derived options
        # The intro help text is not displayed correctly (does not wrap)
        # Advanced Settings uses []() links which do not work on the "Create dataset" page, so we adjust

        media_options["intro"]["help"] = ("Upload audio files here to convert speech to text. "
                                          "4CAT will use OpenAI's Whisper model to create transcripts."
                                          "\n\nFor information on using advanced settings: [Command Line Arguments (CLI)](https://github.com/openai/whisper/blob/248b6cb124225dd263bb9bd32d060b6517e067f8/whisper/transcribe.py#LL374C3-L374C3)")
        media_options["advanced"]["help"] = "Advanced Settings"

        return media_options
Get processor options
This method by default returns the class's "options" attribute, or an empty dictionary. It can be redefined by processors that need more fine-grained options, e.g. in cases where the availability of options is partially determined by the parent dataset's parameters.
Parameters
- config:
- DataSet parent_dataset: An object representing the dataset that the processor would be run on
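The override in this module combines two option dictionaries and then rewrites some help texts. The sketch below shows that pattern on made-up data: the option keys and their `type`/`help` contents are hypothetical, and `merge_options` is not a 4CAT function. It also copies the dictionaries first, on the assumption that `get_options` may return the class-level "options" attribute itself, in which case in-place edits would leak into the parent classes. Whether 4CAT returns copies is not stated here.

```python
import copy


def merge_options(base_options, extra_options):
    """Combine two option dicts without mutating either input.

    Like dict.update, entries in extra_options replace entries in
    base_options that share a key; nested dicts are not merged.
    """
    merged = copy.deepcopy(base_options)
    merged.update(copy.deepcopy(extra_options))
    return merged


# Hypothetical option sets standing in for the two parents' options
media_options = {
    "intro": {"type": "info", "help": "Upload media files."},
    "data_upload": {"type": "file", "help": "Files"},
}
whisper_options = {
    "intro": {"type": "info", "help": "Transcribe audio with Whisper."},
    "advanced": {"type": "string", "help": "[Advanced settings](https://example.com)"},
}

merged = merge_options(media_options, whisper_options)

# Adjust help texts on the merged copy only
merged["intro"]["help"] = "Upload audio files here to convert speech to text."
merged["advanced"]["help"] = "Advanced Settings"

print(sorted(merged))
print(whisper_options["intro"]["help"])
```

Because the merge works on deep copies, the second print still shows the original Whisper help text; with a plain `update` on shared dictionaries, that text would have been overwritten as well.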
    @staticmethod
    def validate_query(query, request, config):
        # We need SearchMedia's validate_query to upload the media
        media_query = SearchMedia.validate_query(query, request, config)

        # Here's the real trick: act like a preset and add another processor to the pipeline
        media_query["next"] = [{"type": "audio-to-text",
                                "parameters": query.copy()}]
        return media_query
Step 1: Validate query and files
Confirms that the uploaded files exist and that the media type is valid.
Parameters
- dict query: Query parameters, from client-side.
- request: Flask request
- ConfigManager|None config: Configuration reader (context-aware)
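The way this module extends the parent's validation can be sketched in isolation. The classes below are toy stand-ins, not 4CAT code: `ToyMediaSource`, `ToyAudioToText` and the `media_type` and `model` parameters are hypothetical, and the parent check is reduced to a single rule. Only the `"next"` entry with its `"type"` and `"parameters"` keys mirrors the source listing above.

```python
class ToyMediaSource:
    """Hypothetical parent whose validation accepts or rejects the query."""

    @staticmethod
    def validate_query(query, request=None, config=None):
        if not query.get("media_type"):
            raise ValueError("No media type given")
        return {"media_type": query["media_type"]}


class ToyAudioToText(ToyMediaSource):
    """Hypothetical child that reuses the parent check and queues a follow-up step."""

    @staticmethod
    def validate_query(query, request=None, config=None):
        # Let the parent validate first; it raises if the query is invalid
        validated = ToyMediaSource.validate_query(query, request, config)

        # Queue a follow-up processor, passing it a copy of the raw parameters
        validated["next"] = [{"type": "audio-to-text", "parameters": query.copy()}]
        return validated


raw_query = {"media_type": "audio", "model": "small"}
validated = ToyAudioToText.validate_query(raw_query)

# Later changes to the raw query do not affect the queued parameters
raw_query["model"] = "large"
print(validated["next"][0]["parameters"]["model"])  # prints "small"
```

Passing `query.copy()` gives the follow-up step its own snapshot of the parameters. The copy is shallow, so nested mutable values would still be shared.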
Inherited Members
- backend.lib.worker.BasicWorker
- BasicWorker
- INTERRUPT_NONE
- INTERRUPT_RETRY
- INTERRUPT_CANCEL
- queue
- log
- manager
- interrupted
- modules
- init_time
- name
- run
- clean_up
- request_interrupt
- is_4cat_class
- datasources.media_import.import_media.SearchMedia
- extension
- is_local
- is_static
- max_workers
- disallowed_characters
- accepted_file_types
- after_create
- process
- get_safe_filename
- backend.lib.processor.BasicProcessor
- db
- job
- dataset
- owner
- source_dataset
- source_file
- config
- is_running_in_preset
- filepath
- work
- after_process
- remove_files
- abort
- iterate_proxied_requests
- push_proxied_request
- flush_proxied_requests
- iterate_archive_contents
- unpack_archive_contents
- extract_archived_file_by_name
- write_csv_items_and_finish
- write_archive_and_finish
- create_standalone
- save_annotations
- map_item_method_available
- get_mapped_item
- is_filter
- get_status
- is_top_dataset
- is_from_collector
- get_extension
- is_rankable
- exclude_followup_processors
- is_4cat_processor