text module

summary module

class summary.SummaryDetector(subdict: dict = {}, model_type: str = 'base', analysis_type: str = 'summary_and_questions', list_of_questions: list[str] | None = None, summary_model=None, summary_vis_processors=None, summary_vqa_model=None, summary_vqa_vis_processors=None, summary_vqa_txt_processors=None, summary_vqa_model_new=None, summary_vqa_vis_processors_new=None, summary_vqa_txt_processors_new=None, device_type: str | None = None)

Bases: AnalysisMethod

all_allowed_model_types = ['base', 'large', 'vqa', 'blip2_t5_pretrain_flant5xxl', 'blip2_t5_pretrain_flant5xl', 'blip2_t5_caption_coco_flant5xl', 'blip2_opt_pretrain_opt2.7b', 'blip2_opt_pretrain_opt6.7b', 'blip2_opt_caption_coco_opt2.7b', 'blip2_opt_caption_coco_opt6.7b']
allowed_analysis_types = ['summary', 'questions', 'summary_and_questions']
allowed_model_types = ['base', 'large', 'vqa']
allowed_new_model_types = ['blip2_t5_pretrain_flant5xxl', 'blip2_t5_pretrain_flant5xl', 'blip2_t5_caption_coco_flant5xl', 'blip2_opt_pretrain_opt2.7b', 'blip2_opt_pretrain_opt6.7b', 'blip2_opt_caption_coco_opt2.7b', 'blip2_opt_caption_coco_opt6.7b']
analyse_image(subdict: dict | None = None, analysis_type: str | None = None, list_of_questions: list[str] | None = None, consequential_questions: bool = False)

Analyse image with blip_caption model.

Parameters:
  • analysis_type (str) – type of the analysis.

  • subdict (dict) – dictionary with the images to be analysed.

  • list_of_questions (list[str]) – list of questions.

  • consequential_questions (bool) – whether to ask consequential questions. Works only for new BLIP2 models.

Returns:

self.subdict (dict) – dictionary with analysis results.
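
A minimal usage sketch (the ammico import path and the "filename" key in the input dictionary are assumptions about the surrounding package, not part of this signature):

    from ammico import summary

    # One entry per image; the detector is assumed to read the image from the "filename" key.
    entry = {"filename": "data/post_01.png"}

    detector = summary.SummaryDetector(
        subdict=entry,
        model_type="base",
        analysis_type="summary",
    )
    entry = detector.analyse_image()
    print(entry)  # the input dictionary extended with the generated captions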

analyse_questions(list_of_questions: list[str], consequential_questions: bool = False) dict

Generate answers to free-form questions about the image, written in natural language.

Parameters:
  • list_of_questions (list[str]) – list of questions.

  • consequential_questions (bool) – whether to ask consequential questions. Works only for new BLIP2 models.

Returns:

self.subdict (dict) – dictionary with answers to questions.
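
A sketch of question answering; the vqa model type, the import path and the "filename" key are assumptions, and the questions are only illustrative:

    from ammico import summary

    detector = summary.SummaryDetector(
        subdict={"filename": "data/post_01.png"},
        model_type="vqa",
        analysis_type="questions",
    )
    answers = detector.analyse_questions(
        list_of_questions=["How many people are in the picture?", "Are there flags visible?"]
    )
    print(answers)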

analyse_summary(nondeterministic_summaries: bool = True)

Create one deterministic caption and three non-deterministic captions for the image.

Parameters:

nondeterministic_summaries (bool) – whether to additionally create three non-deterministic captions.

Returns:

self.subdict (dict) – dictionary with analysis results.
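
The method can also be called directly on a configured detector, for example (same assumptions as above about the import path and the "filename" key):

    from ammico import summary

    detector = summary.SummaryDetector(
        subdict={"filename": "data/post_01.png"},
        model_type="base",
        analysis_type="summary",
    )
    captions = detector.analyse_summary(nondeterministic_summaries=True)
    print(captions)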

check_model()

Check model type and return appropriate model and preprocessors.

Returns:
  • model (nn.Module) – model.

  • vis_processors (dict) – visual preprocessor.

  • txt_processors (dict) – text preprocessor.

  • model_old (bool) – whether model is old or new.

load_model(model_type: str)

Load blip_caption model and preprocessors for visual inputs from lavis.models.

Parameters:

model_type (str) – type of the model.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_model_base()

Load base_coco blip_caption model and preprocessors for visual inputs from lavis.models.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_model_base_blip2_opt_caption_coco_opt67b()

Load BLIP2 model with caption_coco_opt6.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_base_blip2_opt_pretrain_opt67b()

Load BLIP2 model with pretrain_opt6.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_opt_caption_coco_opt27b()

Load BLIP2 model with caption_coco_opt2.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_opt_pretrain_opt27b()

Load BLIP2 model with pretrain_opt2.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_caption_coco_flant5xl()

Load BLIP2 model with caption_coco_flant5xl architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_pretrain_flant5xl()

Load BLIP2 model with FLAN-T5 XL architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_pretrain_flant5xxl()

Load BLIP2 model with FLAN-T5 XXL architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_large()

Load large_coco blip_caption model and preprocessors for visual inputs from lavis.models.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_new_model(model_type: str)

Load new BLIP2 models.

Parameters:

model_type (str) – type of the model.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.
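
Loading a BLIP2 model is expensive, so a common pattern is to load it once and reuse it for many images. The sketch below assumes that the constructor loads the requested model (via load_new_model) when none is passed in, and that analyse_image accepts a per-image dictionary as in its signature above:

    from ammico import summary

    entries = [{"filename": "data/post_01.png"}, {"filename": "data/post_02.png"}]  # paths illustrative

    # The BLIP2 model is loaded once when the detector is constructed ...
    detector = summary.SummaryDetector(
        subdict=entries[0],
        model_type="blip2_t5_caption_coco_flant5xl",
        analysis_type="summary",
    )

    # ... and reused for every image.
    for entry in entries:
        entry.update(detector.analyse_image(subdict=entry))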

load_vqa_model()

Load blip_vqa model and preprocessors for visual and text inputs from lavis.models.

Returns:
  • summary_vqa_model (torch.nn.Module) – model.

  • summary_vqa_vis_processors (dict) – preprocessors for visual inputs.

  • summary_vqa_txt_processors (dict) – preprocessors for text inputs.

multimodal search module

faces module

class faces.EmotionDetector(subdict: dict, emotion_threshold: float = 50.0, race_threshold: float = 50.0, gender_threshold: float = 50.0, accept_disclosure: str = 'DISCLOSURE_AMMICO')

Bases: AnalysisMethod

analyse_image() dict

Performs facial expression analysis on the image.

Returns:

dict – The updated subdict dictionary with analysis results.
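
A minimal sketch; the import path and the "filename" key are assumptions, and the ethical disclosure (see faces.ethical_disclosure below) has to be accepted before faces are analysed:

    from ammico import faces

    entry = {"filename": "data/post_01.png"}
    detector = faces.EmotionDetector(
        subdict=entry,
        emotion_threshold=50.0,
        race_threshold=50.0,
        gender_threshold=50.0,
    )
    entry = detector.analyse_image()
    print(entry)  # face, emotion and mask results above the given thresholds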

analyze_single_face(face: ndarray) dict

Analyzes the features of a single face in the image.

Parameters:

face (np.ndarray) – The face image array.

Returns:

dict – The analysis results for the face.

clean_subdict(result: dict) dict

Cleans the subdict dictionary by converting results into appropriate formats.

Parameters:

result (dict) – The analysis results.

Returns:

dict – The updated subdict dictionary.

facial_expression_analysis() dict

Performs facial expression analysis on the image.

Returns:

dict – The updated subdict dictionary with analysis results.

set_keys() dict

Sets the initial parameters for the analysis.

Returns:

dict – The dictionary with initial parameter values.

wears_mask(face: ndarray) bool

Determines whether a face wears a mask.

Parameters:

face (np.ndarray) – The face image array.

Returns:

bool – True if the face wears a mask, False otherwise.

faces.ethical_disclosure(accept_disclosure: str = 'DISCLOSURE_AMMICO')

Asks the user to accept the ethical disclosure.

Parameters:

accept_disclosure (str) – The name of the disclosure variable (default: “DISCLOSURE_AMMICO”).

color_analysis module

class colors.ColorDetector(subdict: dict, delta_e_method: str = 'CIE 1976')

Bases: AnalysisMethod

analyse_image()

Uses the colorgram library to extract the n most common colors from the images. One problem is that the most common colors are extracted before being categorized, so for small values of n it can happen that the ten most common colors are all shades of grey while other colors are present in the image but ignored. For this reason, n_colors=100 was chosen as the default.

The colors are then matched to the closest color in the CSS3 color list using the delta-e metric and merged into one data frame. The colors can be reduced to a smaller list using the get_color_table function. These colors are: "red", "green", "blue", "yellow", "cyan", "orange", "purple", "pink", "brown", "grey", "white", "black".

Returns:

dict – Dictionary with color names as keys and percentage of color in image as values.
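
A minimal sketch (import path and "filename" key assumed):

    from ammico import colors

    entry = {"filename": "data/post_01.png"}
    detector = colors.ColorDetector(subdict=entry, delta_e_method="CIE 1976")
    entry = detector.analyse_image()
    print(entry)  # color names mapped to their share of the image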

rgb2name(c, merge_color: bool = True, delta_e_method: str = 'CIE 1976') str

Take an rgb color as input and return the closest color name from the CSS3 color list.

Parameters:
  • c (Union[List,tuple]) – RGB value.

  • merge_color (bool, Optional) – Whether the color name should be reduced to the merged color list, defaults to True.

  • delta_e_method (str, Optional) – The delta-e method used for the color matching, defaults to "CIE 1976".

Returns:

str – Closest matching color name.
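
For example, a single RGB value can be resolved with and without merging (a sketch under the same assumptions as above):

    from ammico import colors

    detector = colors.ColorDetector(subdict={"filename": "data/post_01.png"})
    print(detector.rgb2name((255, 0, 0)))                     # merged name, e.g. "red"
    print(detector.rgb2name((255, 0, 0), merge_color=False))  # closest CSS3 name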

set_keys() dict

cropposts module

cropposts.compute_crop_corner(matches: DMatch, kp1: ndarray, kp2: ndarray, region: int = 30, h_margin: int = 0, v_margin: int = 5, min_match: int = 6) Tuple[int, int] | None

Estimate the position on the image from where to crop.

Parameters:
  • matches (cv2.DMatch) – The matched objects on the image.

  • kp1 (np.ndarray) – Key points of the matches for the reference image.

  • kp2 (np.ndarray) – Key points of the matches for the social media posts.

  • region (int, optional) – Area to consider around the keypoints. Defaults to 30.

  • h_margin (int, optional) – Horizontal margin to subtract from the minimum horizontal position. Defaults to 0.

  • v_margin (int, optional) – Vertical margin to subtract from the minimum vertical position. Defaults to 5.

  • min_match (int, optional) – Minimum number of matches required. Defaults to 6.

Returns:

tuple, optional – Tuple of vertical and horizontal crop corner coordinates.

cropposts.crop_image_from_post(view: ndarray, final_h: int) ndarray

Crop the image part from the social media post.

Parameters:
  • view (np.ndarray) – The image to be cropped.

  • final_h (int) – The horizontal position up to which the image should be cropped.

Returns:

np.ndarray – The cropped image part.

cropposts.crop_media_posts(files, ref_files, save_crop_dir, plt_match=False, plt_crop=False, plt_image=False) None

Crop social media posts so that comments beyond the first comment/post are cut off.

Parameters:
  • files (list) – List of all the files to be cropped.

  • ref_files (list) – List of all the reference images that mark the regions below which the posts should be cropped.

  • save_crop_dir (str) – Directory where to write the cropped social media posts to.

  • plt_match (bool, optional) – Display the matched areas on the social media post. Defaults to False.

  • plt_crop (bool, optional) – Display the cropped text part of the social media post. Defaults to False.

  • plt_image (bool, optional) – Display the image part of the social media post. Defaults to False.
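
A usage sketch; the directory layout is illustrative and the reference images are small screenshots of the region below which the posts should be cut:

    import glob

    from ammico import cropposts

    files = glob.glob("data/posts/*.png")      # posts to crop
    ref_files = glob.glob("data/refs/*.png")   # reference snippets marking the crop region
    cropposts.crop_media_posts(files, ref_files, save_crop_dir="data/cropped")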

cropposts.crop_posts_from_refs(ref_views: List, view: ndarray, plt_match: bool = False, plt_crop: bool = False, plt_image: bool = False) ndarray

Crop the social media post comments from the image.

Parameters:
  • ref_views (list) – List of all the reference images (as numpy arrays) that mark the regions below which the posts should be cropped.

  • view (np.ndarray) – The image to crop.

Returns:

np.ndarray – The cropped social media post.

cropposts.crop_posts_image(ref_view: List, view: ndarray) None | Tuple[ndarray, int, int, int]

Crop the social media post to exclude additional comments. Sometimes also crops the image part of the post - this is put back in later.

Parameters:
  • ref_view (list) – List of all the reference images (as numpy arrays) that mark the regions below which the posts should be cropped.

  • view (np.ndarray) – The image to crop.

Returns:

np.ndarray – The cropped social media post.

cropposts.draw_matches(matches: List, img1: ndarray, img2: ndarray, kp1: List[KeyPoint], kp2: List[KeyPoint]) None

Visualize the matches from SIFT.

Parameters:
  • matches (list[cv2.DMatch]) – List of cv2.DMatch matches on the image.

  • img1 (np.ndarray) – The reference image.

  • img2 (np.ndarray) – The social media post.

  • kp1 (list[cv2.KeyPoint]) – List of keypoints from the first image.

  • kp2 (list[cv2.KeyPoint]) – List of keypoints from the second image.

cropposts.kp_from_matches(matches, kp1: ndarray, kp2: ndarray) Tuple[Tuple, Tuple]

Extract the match indices from the keypoints.

Parameters:
  • kp1 (np.ndarray) – Key points of the matches for the reference image.

  • kp2 (np.ndarray) – Key points of the matches for the social media post.

Returns:
  • tuple – Index of the descriptor in the list of train descriptors.

  • tuple – Index of the descriptor in the list of query descriptors.

cropposts.matching_points(img1: ndarray, img2: ndarray) Tuple[DMatch, List[KeyPoint], List[KeyPoint]]

Computes keypoint matches using the SIFT algorithm between two images.

Parameters:
  • img1 (np.ndarray) – The reference image.

  • img2 (np.ndarray) – The social media post.

Returns:
  • cv2.DMatch – List of filtered keypoint matches.

  • cv2.KeyPoint – List of keypoints from the first image.

  • cv2.KeyPoint – List of keypoints from the second image.
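
A sketch of inspecting the SIFT matches between a reference snippet and a post (paths illustrative; reading the images with OpenCV is an assumption about the expected array format):

    import cv2

    from ammico import cropposts

    ref = cv2.imread("data/refs/banner.png")
    post = cv2.imread("data/posts/post_01.png")
    matches, kp1, kp2 = cropposts.matching_points(ref, post)
    cropposts.draw_matches(matches, ref, post, kp1, kp2)  # visualize the filtered matches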

cropposts.paste_image_and_comment(crop_post: ndarray, crop_view: ndarray) ndarray

Paste the image part and the text part together without the unnecessary comments.

Parameters:
  • crop_post (np.ndarray) – The cropped image part of the social media post.

  • crop_view (np.ndarray) – The cropped text part of the social media post.

Returns:

np.ndarray – The image and text part of the social media post in one image.

utils module

display module