text module

class text.PostprocessText(mydict: dict | None = None, use_csv: bool = False, csv_path: str | None = None, analyze_text: str = 'text_english')

Bases: object

analyse_topic(return_topics: int = 3) tuple

Performs topic analysis using BERTopic.

Parameters:

return_topics (int, optional) – Number of topics to return. Defaults to 3.

Returns:

tuple – A tuple containing the topic model, topic dataframe, and most frequent topics.

get_text_df(analyze_text: str) list

Extracts text from the provided dataframe.

Parameters:

analyze_text (str) – Column name for the text field to analyze.

Returns:

list – A list of text extracted from the dataframe.

get_text_dict(analyze_text: str) list

Extracts text from the provided dictionary.

Parameters:

analyze_text (str) – Key for the text field to analyze.

Returns:

list – A list of text extracted from the dictionary.
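
A minimal usage sketch for PostprocessText, assuming the module is importable as ammico.text (the import path is an assumption) and that the English text is stored under the "text_english" key of each entry:

    from ammico import text  # import path assumed

    # Placeholder input: in practice this is the nested dictionary produced by the
    # text analysis, with the English text stored under the "text_english" key.
    # Topic modelling only gives meaningful results for a much larger corpus.
    mydict = {
        "img01": {"text_english": "Vaccination campaign starts in the capital."},
        "img02": {"text_english": "New climate policy announced by the government."},
    }

    pp = text.PostprocessText(mydict=mydict, analyze_text="text_english")
    topic_model, topic_df, most_frequent_topics = pp.analyse_topic(return_topics=3)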

class text.TextDetector(subdict: dict, analyse_text: bool = False, model_names: list | None = None, revision_numbers: list | None = None)

Bases: AnalysisMethod

analyse_image() dict

Perform text extraction and analysis of the text.

Returns:

dict – The updated dictionary with text analysis results.

clean_text()

Clean the text by removing unrecognized words and any numbers.

get_text_from_image()

Detect text on the image using Google Cloud Vision API.

remove_linebreaks()

Remove linebreaks from original and translated text.

set_keys() dict

Set the default keys for text analysis.

Returns:

dict – The dictionary with default text keys.

text_ner()

Perform named entity recognition on the text using the Transformers pipeline.

text_sentiment_transformers()

Perform text classification for sentiment using the Transformers pipeline.

text_summary()

Generate a summary of the text using the Transformers pipeline.

translate_text()

Translate the detected text to English using the Translator object.
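
A hedged sketch of running TextDetector on a single image entry; it assumes that Google Cloud Vision credentials are configured (required by get_text_from_image), that the image path is stored under a "filename" key, and that the module is importable as ammico.text:

    from ammico import text  # import path assumed

    subdict = {"filename": "images/post_01.png"}  # hypothetical path and key name

    detector = text.TextDetector(subdict, analyse_text=True)
    subdict = detector.analyse_image()  # OCR, translation, cleaning, sentiment, NER, summary
    print(subdict.get("text_english"))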

summary module

class summary.SummaryDetector(subdict: dict = {}, model_type: str = 'base', analysis_type: str = 'summary_and_questions', list_of_questions: list[str] | None = None, summary_model=None, summary_vis_processors=None, summary_vqa_model=None, summary_vqa_vis_processors=None, summary_vqa_txt_processors=None, summary_vqa_model_new=None, summary_vqa_vis_processors_new=None, summary_vqa_txt_processors_new=None, device_type: str | None = None)

Bases: AnalysisMethod

all_allowed_model_types = ['base', 'large', 'vqa', 'blip2_t5_pretrain_flant5xxl', 'blip2_t5_pretrain_flant5xl', 'blip2_t5_caption_coco_flant5xl', 'blip2_opt_pretrain_opt2.7b', 'blip2_opt_pretrain_opt6.7b', 'blip2_opt_caption_coco_opt2.7b', 'blip2_opt_caption_coco_opt6.7b']
allowed_analysis_types = ['summary', 'questions', 'summary_and_questions']
allowed_model_types = ['base', 'large', 'vqa']
allowed_new_model_types = ['blip2_t5_pretrain_flant5xxl', 'blip2_t5_pretrain_flant5xl', 'blip2_t5_caption_coco_flant5xl', 'blip2_opt_pretrain_opt2.7b', 'blip2_opt_pretrain_opt6.7b', 'blip2_opt_caption_coco_opt2.7b', 'blip2_opt_caption_coco_opt6.7b']
analyse_image(subdict: dict | None = None, analysis_type: str | None = None, list_of_questions: list[str] | None = None, consequential_questions: bool = False)

Analyse image with blip_caption model.

Parameters:
  • analysis_type (str) – type of the analysis.

  • subdict (dict) – dictionary with pictures to be analysed.

  • list_of_questions (list[str]) – list of questions.

  • consequential_questions (bool) – whether to ask consequential questions. Works only for new BLIP2 models.

Returns:

self.subdict (dict) – dictionary with analysis results.
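
A minimal sketch of a combined summary and question run on one image entry, assuming the module is importable as ammico.summary and that the image path is stored under a "filename" key (both assumptions):

    from ammico import summary  # import path assumed

    subdict = {"filename": "images/post_01.png"}  # hypothetical path and key name

    detector = summary.SummaryDetector(
        subdict,
        model_type="base",
        analysis_type="summary_and_questions",
        list_of_questions=["How many people are in the picture?"],
    )
    subdict = detector.analyse_image()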

analyse_questions(list_of_questions: list[str], consequential_questions: bool = False) dict

Generate answers to free-form questions about the image, written in natural language.

Parameters:
  • list_of_questions (list[str]) – list of questions.

  • consequential_questions (bool) – whether to ask consequential questions. Works only for new BLIP2 models.

Returns:

self.subdict (dict) – dictionary with answers to questions.

analyse_summary(nondeterministic_summaries: bool = True)

Create one constant and three non-deterministic captions for the image.

Parameters:

nondeterministic_summaries (bool) – whether to create three non-deterministic captions.

Returns:

self.subdict (dict) – dictionary with analysis results.

check_model()

Check model type and return appropriate model and preprocessors.

Returns:
  • model (nn.Module) – model.

  • vis_processors (dict) – visual preprocessor.

  • txt_processors (dict) – text preprocessor.

  • model_old (bool) – whether the model is an old or a new model type.

load_model(model_type: str)

Load blip_caption model and preprocessors for visual inputs from lavis.models.

Parameters:

model_type (str) – type of the model.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_model_base()

Load base_coco blip_caption model and preprocessors for visual inputs from lavis.models.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_model_base_blip2_opt_caption_coco_opt67b()

Load BLIP2 model with caption_coco_opt6.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_base_blip2_opt_pretrain_opt67b()

Load BLIP2 model with pretrain_opt6.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_opt_caption_coco_opt27b()

Load BLIP2 model with caption_coco_opt2.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_opt_pretrain_opt27b()

Load BLIP2 model with pretrain_opt2.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_caption_coco_flant5xl()

Load BLIP2 model with caption_coco_flant5xl architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_pretrain_flant5xl()

Load BLIP2 model with FLAN-T5 XL architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_pretrain_flant5xxl()

Load BLIP2 model with FLAN-T5 XXL architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_large()

Load large_coco blip_caption model and preprocessors for visual inputs from lavis.models.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_new_model(model_type: str)

Load new BLIP2 models.

Parameters:

model_type (str) – type of the model.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_vqa_model()

Load blip_vqa model and preprocessors for visual and text inputs from lavis.models.

Returns:
  • summary_vqa_model (torch.nn.Module) – model.

  • summary_vqa_vis_processors (dict) – preprocessors for visual inputs.

  • summary_vqa_txt_processors (dict) – preprocessors for text inputs.

faces module

class faces.EmotionDetector(subdict: dict, emotion_threshold: float = 50.0, race_threshold: float = 50.0)

Bases: AnalysisMethod

analyse_image() dict

Performs facial expression analysis on the image.

Returns:

dict – The updated subdict dictionary with analysis results.
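
A minimal sketch for EmotionDetector, assuming the module is importable as ammico.faces and that the image path is stored under a "filename" key (both assumptions):

    from ammico import faces  # import path assumed

    subdict = {"filename": "images/portrait_01.png"}  # hypothetical path and key name

    detector = faces.EmotionDetector(subdict, emotion_threshold=50.0, race_threshold=50.0)
    subdict = detector.analyse_image()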

analyze_single_face(face: ndarray) dict

Analyzes the features of a single face.

Parameters:

face (np.ndarray) – The face image array.

Returns:

dict – The analysis results for the face.

clean_subdict(result: dict) dict

Cleans the subdict dictionary by converting results into appropriate formats.

Parameters:

result (dict) – The analysis results.

Returns:

dict – The updated subdict dictionary.

facial_expression_analysis() dict

Performs facial expression analysis on the image.

Returns:

dict – The updated subdict dictionary with analysis results.

set_keys() dict

Sets the initial parameters for the analysis.

Returns:

dict – The dictionary with initial parameter values.

wears_mask(face: ndarray) bool

Determines whether a face wears a mask.

Parameters:

face (np.ndarray) – The face image array.

Returns:

bool – True if the face wears a mask, False otherwise.

color_analysis module

class colors.ColorDetector(subdict: dict, delta_e_method: str = 'CIE 1976')

Bases: AnalysisMethod

analyse_image()

Uses the colorgram library to extract the n most common colors from the images. One caveat is that the most common colors are extracted before being categorized, so for small values of n it can happen that the ten most common colors are all shades of grey while other colors are present but ignored. For this reason, n_colors=100 was chosen as the default.

The colors are then matched to the closest color in the CSS3 color list using the delta-e metric. They are then merged into one data frame. The colors can be reduced to a smaller list of colors using the get_color_table function. These colors are: “red”, “green”, “blue”, “yellow”, “cyan”, “orange”, “purple”, “pink”, “brown”, “grey”, “white”, “black”.

Returns:

dict – Dictionary with color names as keys and percentage of color in image as values.
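
A minimal sketch for ColorDetector, assuming the module is importable as ammico.colors and that the image path is stored under a "filename" key (both assumptions):

    from ammico import colors  # import path assumed

    subdict = {"filename": "images/post_01.png"}  # hypothetical path and key name

    detector = colors.ColorDetector(subdict, delta_e_method="CIE 1976")
    color_shares = detector.analyse_image()  # e.g. {"red": 10.5, "green": 2.3, ...} (percentages)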

rgb2name(c, merge_color: bool = True, delta_e_method: str = 'CIE 1976') str

Take an rgb color as input and return the closest color name from the CSS3 color list.

Parameters:
  • c (Union[List,tuple]) – RGB value.

  • merge_color (bool, Optional) – Whether color name should be reduced, defaults to True.

  • delta_e_method (str, Optional) – The delta-e method to use for computing the color distance, defaults to ‘CIE 1976’.

Returns:

str – Closest matching color name.

set_keys() dict

cropposts module

cropposts.compute_crop_corner(matches: DMatch, kp1: ndarray, kp2: ndarray, region: int = 30, h_margin: int = 0, v_margin: int = 5, min_match: int = 6) Tuple[int, int] | None

Estimate the position on the image from where to crop.

Parameters:
  • matches (cv2.DMatch) – The matched objects on the image.

  • kp1 (np.ndarray) – Key points of the matches for the reference image.

  • kp2 (np.ndarray) – Key points of the matches for the social media posts.

  • region (int, optional) – Area to consider around the keypoints. Defaults to 30.

  • h_margin (int, optional) – Horizontal margin to subtract from the minimum horizontal position. Defaults to 0.

  • v_margin (int, optional) – Vertical margin to subtract from the minimum vertical position. Defaults to 5.

  • min_match (int, optional) – Minimum number of matches required. Defaults to 6.

Returns:

tuple, optional – Tuple of vertical and horizontal crop corner coordinates.

cropposts.crop_image_from_post(view: ndarray, final_h: int) ndarray

Crop the image part from the social media post.

Parameters:
  • view (np.ndarray) – The image to be cropped.

  • final_h (int) – The horizontal position up to which the image should be cropped.

Returns:

np.ndarray – The cropped image part.

cropposts.crop_media_posts(files, ref_files, save_crop_dir, plt_match=False, plt_crop=False, plt_image=False) None

Crop social media posts so that comments beyond the first comment/post are cut off.

Parameters:
  • files (list) – List of all the files to be cropped.

  • ref_files (list) – List of all the reference images that signify below which regions should be cropped.

  • save_crop_dir (str) – Directory where to write the cropped social media posts to.

  • plt_match (Bool, optional) – Display the matched areas on the social media post. Defaults to False.

  • plt_crop (Bool, optional) – Display the cropped text part of the social media post. Defaults to False.

  • plt_image (Bool, optional) – Display the image part of the social media post. Defaults to False.
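
A sketch of the typical cropping workflow, assuming the reference snippets (screenshots of the region below which posts should be cropped) live in a separate folder and that the module is importable as ammico.cropposts (all paths are placeholders):

    import glob

    from ammico import cropposts  # import path assumed

    files = glob.glob("posts/*.png")      # social media screenshots to crop (placeholder path)
    ref_files = glob.glob("refs/*.png")   # reference images marking the crop region (placeholder path)

    cropposts.crop_media_posts(files, ref_files, save_crop_dir="cropped/")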

cropposts.crop_posts_from_refs(ref_views: List, view: ndarray, plt_match: bool = False, plt_crop: bool = False, plt_image: bool = False) ndarray

Crop the social media post comments from the image.

Parameters:
  • ref_views (list) – List of all the reference images (as numpy arrays) that signify below which regions should be cropped.

  • view (np.ndarray) – The image to crop.

Returns:

np.ndarray – The cropped social media post.

cropposts.crop_posts_image(ref_view: List, view: ndarray) None | Tuple[ndarray, int, int, int]

Crop the social media post to exclude additional comments. Sometimes also crops the image part of the post - this is put back in later.

Parameters:
  • ref_view (list) – List of all the reference images (as numpy arrays) that signify below which regions should be cropped.

  • view (np.ndarray) – The image to crop.

Returns:

tuple, optional – The cropped social media post and the associated crop coordinates, or None if no crop could be determined.

cropposts.draw_matches(matches: List, img1: ndarray, img2: ndarray, kp1: List[KeyPoint], kp2: List[KeyPoint]) None

Visualize the matches from SIFT.

Parameters:
  • matches (list[cv2.DMatch]) – List of keypoint matches on the image.

  • img1 (np.ndarray) – The reference image.

  • img2 (np.ndarray) – The social media post.

  • kp1 (list[cv2.KeyPoint]) – List of keypoints from the first image.

  • kp2 (list[cv2.KeyPoint]) – List of keypoints from the second image.

cropposts.kp_from_matches(matches, kp1: ndarray, kp2: ndarray) Tuple[Tuple, Tuple]

Extract the match indices from the keypoints.

Parameters:
  • kp1 (np.ndarray) – Key points of the matches.

  • kp2 (np.ndarray) – Key points of the matches.

Returns:
  • tuple – Index of the descriptor in the list of train descriptors.

  • tuple – Index of the descriptor in the list of query descriptors.

cropposts.matching_points(img1: ndarray, img2: ndarray) Tuple[DMatch, List[KeyPoint], List[KeyPoint]]

Computes keypoint matches using the SIFT algorithm between two images.

Parameters:
  • img1 (np.ndarray) – The reference image.

  • img2 (np.ndarray) – The social media post.

Returns:
  • cv2.DMatch – List of filtered keypoint matches.

  • cv2.KeyPoint – List of keypoints from the first image.

  • cv2.KeyPoint – List of keypoints from the second image.
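
The lower-level helpers can be chained as in this sketch; loading the images with cv2.imread is an assumption about the expected array format, and the paths are placeholders:

    import cv2

    from ammico import cropposts  # import path assumed

    ref = cv2.imread("refs/banner.png")      # reference image (placeholder path)
    post = cv2.imread("posts/post_01.png")   # social media post (placeholder path)

    matches, kp1, kp2 = cropposts.matching_points(ref, post)
    corner = cropposts.compute_crop_corner(matches, kp1, kp2)  # (vertical, horizontal) tuple or None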

cropposts.paste_image_and_comment(crop_post: ndarray, crop_view: ndarray) ndarray

Paste the image part and the text part together without the unnecessary comments.

Parameters:
  • crop_post (np.ndarray) – The cropped image part of the social media post.

  • crop_view (np.ndarray) – The cropped text part of the social media post.

Returns:

np.ndarray – The image and text part of the social media post in one image.

utils module

class utils.AnalysisMethod(subdict: dict)

Bases: object

Base class to be inherited by all analysis methods.

analyse_image()
set_keys()
class utils.DownloadResource(**kwargs)

Bases: object

A remote resource that needs on-demand downloading.

We use this as a wrapper to the pooch library. The wrapper registers each data file and allows prefetching through the CLI entry point ammico_prefetch_models.

get()
resources = []
utils.ammico_prefetch_models()

Prefetch all the download resources.

utils.append_data_to_dict(mydict: dict) dict

Append entries from nested dictionaries to keys in a global dict.

utils.check_for_missing_keys(mydict: dict) dict

Check the nested dictionary for any missing keys in the subdicts.

Parameters:

mydict (dict) – The nested dictionary with keys to check.

Returns:

dict – The dictionary with the missing keys appended.

utils.dump_df(mydict: dict) DataFrame

Utility to dump the dictionary into a dataframe.

utils.find_files(path: str | None = None, pattern=['png', 'jpg', 'jpeg', 'gif', 'webp', 'avif', 'tiff'], recursive: bool = True, limit=20, random_seed: int | None = None) dict

Find image files on the file system.

Parameters:
  • path (str, optional) – The base directory where we are looking for the images. Defaults to None, which uses the ammico data directory if set or the current working directory otherwise.

  • pattern (str|list, optional) – The naming pattern that the filename should match. Use either ‘.ext’ or just ‘ext’. Defaults to [“png”, “jpg”, “jpeg”, “gif”, “webp”, “avif”, “tiff”]. Can be used to allow other patterns or to only include specific prefixes or suffixes.

  • recursive (bool, optional) – Whether to recurse into subdirectories. Default is set to True.

  • limit (int/list, optional) – The maximum number of images to be found. Provide a list or tuple of length 2 to batch the images. Defaults to 20. To return all images, set to None or -1.

  • random_seed (int, optional) – The random seed to use for shuffling the images. If None is provided, the data will not be shuffled. Defaults to None.

Returns:

dict – A nested dictionary with file ids and all filenames including the path.
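
A minimal sketch for find_files, assuming the module is importable as ammico.utils; the data directory is a placeholder:

    from ammico import utils  # import path assumed

    mydict = utils.find_files(
        path="data/images",          # placeholder directory
        pattern=["png", "jpg"],
        limit=20,
        random_seed=42,
    )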

utils.get_color_table()
utils.get_dataframe(mydict: dict) DataFrame
utils.initialize_dict(filelist: list) dict

Initialize the nested dictionary for all the found images.

Parameters:

filelist (list) – The list of files to be analyzed, including their paths.

Returns:

dict – The nested dictionary with all image ids and their paths.
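
A sketch of how the utilities fit together, assuming the module is importable as ammico.utils; the file paths are placeholders and the detector step stands in for any of the analysis methods above:

    from ammico import utils  # import path assumed

    mydict = utils.initialize_dict(["images/post_01.png", "images/post_02.png"])  # placeholder paths
    # ... run one or more detectors on each entry of mydict here ...
    outdict = utils.append_data_to_dict(mydict)   # collect nested entries into one global dict
    df = utils.dump_df(outdict)                   # convert to a pandas DataFrame
    df.to_csv("analysis_results.csv")             # hypothetical output file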

utils.is_interactive()

Check if we are running in an interactive environment.

utils.iterable(arg)

display module

class display.AnalysisExplorer(mydict: dict)

Bases: object

run_server(port: int = 8050) None

Run the Dash server to start the analysis explorer.

Parameters:

port (int, optional) – The port number to run the server on (default: 8050).
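
A minimal sketch for starting the explorer, assuming the module is importable as ammico.display and that mydict is the nested dictionary produced by utils.find_files or initialize_dict:

    from ammico import display  # import path assumed

    explorer = display.AnalysisExplorer(mydict)
    explorer.run_server(port=8050)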

update_picture(img_path: str)

Callback function to update the displayed image.

Parameters:

img_path (str) – The path of the selected image.

Returns:

Union[PIL.PngImagePlugin, None] – The image object to be displayed, or None if the image path is None.