text module

summary module

class summary.SummaryDetector(subdict: dict = {}, model_type: str = 'base', analysis_type: str = 'summary_and_questions', list_of_questions: list[str] | None = None, summary_model=None, summary_vis_processors=None, summary_vqa_model=None, summary_vqa_vis_processors=None, summary_vqa_txt_processors=None, summary_vqa_model_new=None, summary_vqa_vis_processors_new=None, summary_vqa_txt_processors_new=None, device_type: str | None = None)

Bases: AnalysisMethod

all_allowed_model_types = ['base', 'large', 'vqa', 'blip2_t5_pretrain_flant5xxl', 'blip2_t5_pretrain_flant5xl', 'blip2_t5_caption_coco_flant5xl', 'blip2_opt_pretrain_opt2.7b', 'blip2_opt_pretrain_opt6.7b', 'blip2_opt_caption_coco_opt2.7b', 'blip2_opt_caption_coco_opt6.7b']
allowed_analysis_types = ['summary', 'questions', 'summary_and_questions']
allowed_model_types = ['base', 'large', 'vqa']
allowed_new_model_types = ['blip2_t5_pretrain_flant5xxl', 'blip2_t5_pretrain_flant5xl', 'blip2_t5_caption_coco_flant5xl', 'blip2_opt_pretrain_opt2.7b', 'blip2_opt_pretrain_opt6.7b', 'blip2_opt_caption_coco_opt2.7b', 'blip2_opt_caption_coco_opt6.7b']
analyse_image(subdict: dict | None = None, analysis_type: str | None = None, list_of_questions: list[str] | None = None, consequential_questions: bool = False)

Analyse image with blip_caption model.

Parameters:
  • analysis_type (str) – type of the analysis.

  • subdict (dict) – dictionary with the images to be analysed.

  • list_of_questions (list[str]) – list of questions.

  • consequential_questions (bool) – whether to ask consequential questions. Works only for new BLIP2 models.

Returns:

self.subdict (dict) – dictionary with analysis results.
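
A minimal usage sketch (the ammico import path and the "filename" key in the input dictionary are assumptions about the surrounding package, not part of this signature):

    from ammico import summary

    # One entry per image; the detector is assumed to read the image from the "filename" key.
    entry = {"filename": "data/post_01.png"}

    detector = summary.SummaryDetector(
        subdict=entry,
        model_type="base",
        analysis_type="summary",
    )
    entry = detector.analyse_image()
    print(entry)  # the input dictionary extended with the generated captions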

analyse_questions(list_of_questions: list[str], consequential_questions: bool = False) dict

Generate answers to free-form questions about the image, written in natural language.

Parameters:
  • list_of_questions (list[str]) – list of questions.

  • consequential_questions (bool) – whether to ask consequential questions. Works only for new BLIP2 models.

Returns:

self.subdict (dict) – dictionary with answers to questions.
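
A sketch of question answering; the vqa model type, the import path and the "filename" key are assumptions, and the questions are only illustrative:

    from ammico import summary

    detector = summary.SummaryDetector(
        subdict={"filename": "data/post_01.png"},
        model_type="vqa",
        analysis_type="questions",
    )
    answers = detector.analyse_questions(
        list_of_questions=["How many people are in the picture?", "Are there flags visible?"]
    )
    print(answers)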

analyse_summary(nondeterministic_summaries: bool = True)

Create one deterministic caption and three non-deterministic captions for the image.

Parameters:

nondeterministic_summaries (bool) – whether to additionally create three non-deterministic captions.

Returns:

self.subdict (dict) – dictionary with analysis results.
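
The method can also be called directly on a configured detector, for example (same assumptions as above about the import path and the "filename" key):

    from ammico import summary

    detector = summary.SummaryDetector(
        subdict={"filename": "data/post_01.png"},
        model_type="base",
        analysis_type="summary",
    )
    captions = detector.analyse_summary(nondeterministic_summaries=True)
    print(captions)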

check_model()

Check model type and return appropriate model and preprocessors.

Returns:
  • model (nn.Module) – model.

  • vis_processors (dict) – visual preprocessor.

  • txt_processors (dict) – text preprocessor.

  • model_old (bool) – whether model is old or new.

load_model(model_type: str)

Load blip_caption model and preprocessors for visual inputs from lavis.models.

Parameters:

model_type (str) – type of the model.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_model_base()

Load base_coco blip_caption model and preprocessors for visual inputs from lavis.models.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_model_base_blip2_opt_caption_coco_opt67b()

Load BLIP2 model with caption_coco_opt6.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_base_blip2_opt_pretrain_opt67b()

Load BLIP2 model with pretrain_opt6.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_opt_caption_coco_opt27b()

Load BLIP2 model with caption_coco_opt2.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_opt_pretrain_opt27b()

Load BLIP2 model with pretrain_opt2.7b architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_caption_coco_flant5xl()

Load BLIP2 model with caption_coco_flant5xl architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_pretrain_flant5xl()

Load BLIP2 model with FLAN-T5 XL architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_blip2_t5_pretrain_flant5xxl()

Load BLIP2 model with FLAN-T5 XXL architecture.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.

load_model_large()

Load large_coco blip_caption model and preprocessors for visual inputs from lavis.models.

Returns:
  • summary_model (torch.nn.Module) – model.

  • summary_vis_processors (dict) – preprocessors for visual inputs.

load_new_model(model_type: str)

Load new BLIP2 models.

Parameters:

model_type (str) – type of the model.

Returns:
  • model (torch.nn.Module) – model.

  • vis_processors (dict) – preprocessors for visual inputs.

  • txt_processors (dict) – preprocessors for text inputs.
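
Loading a BLIP2 model is expensive, so a common pattern is to load it once and reuse it for many images. The sketch below assumes that the constructor loads the requested model (via load_new_model) when none is passed in, and that analyse_image accepts a per-image dictionary as in its signature above:

    from ammico import summary

    entries = [{"filename": "data/post_01.png"}, {"filename": "data/post_02.png"}]  # paths illustrative

    # The BLIP2 model is loaded once when the detector is constructed ...
    detector = summary.SummaryDetector(
        subdict=entries[0],
        model_type="blip2_t5_caption_coco_flant5xl",
        analysis_type="summary",
    )

    # ... and reused for every image.
    for entry in entries:
        entry.update(detector.analyse_image(subdict=entry))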

load_vqa_model()

Load blip_vqa model and preprocessors for visual and text inputs from lavis.models.

Returns:
  • summary_vqa_model (torch.nn.Module) – model.

  • summary_vqa_vis_processors (dict) – preprocessors for visual inputs.

  • summary_vqa_txt_processors (dict) – preprocessors for text inputs.

multimodal search module

faces module

class faces.EmotionDetector(subdict: dict, emotion_threshold: float = 50.0, race_threshold: float = 50.0, gender_threshold: float = 50.0, accept_disclosure: str = 'DISCLOSURE_AMMICO')

Bases: AnalysisMethod

analyse_image() dict

Performs facial expression analysis on the image.

Returns:

dict – The updated subdict dictionary with analysis results.
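
A minimal sketch; the import path and the "filename" key are assumptions, and the ethical disclosure (see faces.ethical_disclosure below) has to be accepted before faces are analysed:

    from ammico import faces

    entry = {"filename": "data/post_01.png"}
    detector = faces.EmotionDetector(
        subdict=entry,
        emotion_threshold=50.0,
        race_threshold=50.0,
        gender_threshold=50.0,
    )
    entry = detector.analyse_image()
    print(entry)  # face, emotion and mask results above the given thresholds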

analyze_single_face(face: ndarray) dict

Analyzes the features of a single face in the image.

Parameters:

face (np.ndarray) – The face image array.

Returns:

dict – The analysis results for the face.

clean_subdict(result: dict) dict

Cleans the subdict dictionary by converting results into appropriate formats.

Parameters:

result (dict) – The analysis results.

Returns:

dict – The updated subdict dictionary.

facial_expression_analysis() dict

Performs facial expression analysis on the image.

Returns:

dict – The updated subdict dictionary with analysis results.

set_keys() dict

Sets the initial parameters for the analysis.

Returns:

dict – The dictionary with initial parameter values.

wears_mask(face: ndarray) bool

Determines whether a face wears a mask.

Parameters:

face (np.ndarray) – The face image array.

Returns:

bool – True if the face wears a mask, False otherwise.

faces.ethical_disclosure(accept_disclosure: str = 'DISCLOSURE_AMMICO')

Asks the user to accept the ethical disclosure.

Parameters:

accept_disclosure (str) – The name of the disclosure variable (default: “DISCLOSURE_AMMICO”).

color_analysis module

class colors.ColorDetector(subdict: dict, delta_e_method: str = 'CIE 1976')

Bases: AnalysisMethod

analyse_image()

Uses the colorgram library to extract the n most common colors from the images. One problem is that the most common colors are extracted before being categorized, so for small values of n it can happen that the ten most common colors are all shades of grey while other colors are present in the image but ignored. For this reason, n_colors=100 was chosen as the default.

The colors are then matched to the closest color in the CSS3 color list using the delta-e metric and merged into one data frame. The colors can be reduced to a smaller list using the get_color_table function. These colors are: "red", "green", "blue", "yellow", "cyan", "orange", "purple", "pink", "brown", "grey", "white", "black".

Returns:

dict – Dictionary with color names as keys and percentage of color in image as values.
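
A minimal sketch (import path and "filename" key assumed):

    from ammico import colors

    entry = {"filename": "data/post_01.png"}
    detector = colors.ColorDetector(subdict=entry, delta_e_method="CIE 1976")
    entry = detector.analyse_image()
    print(entry)  # color names mapped to their share of the image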

rgb2name(c, merge_color: bool = True, delta_e_method: str = 'CIE 1976') str

Take an rgb color as input and return the closest color name from the CSS3 color list.

Parameters:
  • c (Union[List,tuple]) – RGB value.

  • merge_color (bool, Optional) – Whether the color name should be reduced to the merged color list, defaults to True.

  • delta_e_method (str, Optional) – The delta-e method used for the color matching, defaults to "CIE 1976".

Returns:

str – Closest matching color name.
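
For example, a single RGB value can be resolved with and without merging (a sketch under the same assumptions as above):

    from ammico import colors

    detector = colors.ColorDetector(subdict={"filename": "data/post_01.png"})
    print(detector.rgb2name((255, 0, 0)))                     # merged name, e.g. "red"
    print(detector.rgb2name((255, 0, 0), merge_color=False))  # closest CSS3 name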

set_keys() dict

cropposts module

cropposts.compute_crop_corner(matches: DMatch, kp1: ndarray, kp2: ndarray, region: int = 30, h_margin: int = 0, v_margin: int = 5, min_match: int = 6) Tuple[int, int] | None

Estimate the position on the image from where to crop.

Parameters:
  • matches (cv2.DMatch) – The matched objects on the image.

  • kp1 (np.ndarray) – Key points of the matches for the reference image.

  • kp2 (np.ndarray) – Key points of the matches for the social media posts.

  • region (int, optional) – Area to consider around the keypoints. Defaults to 30.

  • h_margin (int, optional) – Horizontal margin to subtract from the minimum horizontal position. Defaults to 0.

  • v_margin (int, optional) – Vertical margin to subtract from the minimum vertical position. Defaults to 5.

  • min_match (int, optional) – Minimum number of matches required. Defaults to 6.

Returns:

tuple, optional – Tuple of vertical and horizontal crop corner coordinates.

cropposts.crop_image_from_post(view: ndarray, final_h: int) ndarray

Crop the image part from the social media post.

Parameters:
  • view (np.ndarray) – The image to be cropped.

  • final_h (int) – The horizontal position up to which the image should be cropped.

Returns:

np.ndarray – The cropped image part.

cropposts.crop_media_posts(files, ref_files, save_crop_dir, plt_match=False, plt_crop=False, plt_image=False) None

Crop social media posts so that comments beyond the first comment/post are cut off.

Parameters:
  • files (list) – List of all the files to be cropped.

  • ref_files (list) – List of all the reference images that mark the regions below which the posts should be cropped.

  • save_crop_dir (str) – Directory where to write the cropped social media posts to.

  • plt_match (bool, optional) – Display the matched areas on the social media post. Defaults to False.

  • plt_crop (bool, optional) – Display the cropped text part of the social media post. Defaults to False.

  • plt_image (bool, optional) – Display the image part of the social media post. Defaults to False.
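
A usage sketch; the directory layout is illustrative and the reference images are small screenshots of the region below which the posts should be cut:

    import glob

    from ammico import cropposts

    files = glob.glob("data/posts/*.png")      # posts to crop
    ref_files = glob.glob("data/refs/*.png")   # reference snippets marking the crop region
    cropposts.crop_media_posts(files, ref_files, save_crop_dir="data/cropped")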

cropposts.crop_posts_from_refs(ref_views: List, view: ndarray, plt_match: bool = False, plt_crop: bool = False, plt_image: bool = False) ndarray

Crop the social media post comments from the image.

Parameters:
  • ref_views (list) – List of all the reference images (as numpy arrays) that mark the regions below which the posts should be cropped.

  • view (np.ndarray) – The image to crop.

Returns:

np.ndarray – The cropped social media post.

cropposts.crop_posts_image(ref_view: List, view: ndarray) None | Tuple[ndarray, int, int, int]

Crop the social media post to exclude additional comments. Sometimes also crops the image part of the post - this is put back in later.

Parameters:
  • ref_view (list) – List of all the reference images (as numpy arrays) that mark the regions below which the posts should be cropped.

  • view (np.ndarray) – The image to crop.

Returns:

np.ndarray – The cropped social media post.

cropposts.draw_matches(matches: List, img1: ndarray, img2: ndarray, kp1: List[KeyPoint], kp2: List[KeyPoint]) None

Visualize the matches from SIFT.

Parameters:
  • matches (list[cv2.DMatch]) – List of cv2.DMatch matches on the image.

  • img1 (np.ndarray) – The reference image.

  • img2 (np.ndarray) – The social media post.

  • kp1 (list[cv2.KeyPoint]) – List of keypoints from the first image.

  • kp2 (list[cv2.KeyPoint]) – List of keypoints from the second image.

cropposts.kp_from_matches(matches, kp1: ndarray, kp2: ndarray) Tuple[Tuple, Tuple]

Extract the match indices from the keypoints.

Parameters:
  • kp1 (np.ndarray) – Key points of the matches for the reference image.

  • kp2 (np.ndarray) – Key points of the matches for the social media post.

Returns:
  • tuple – Index of the descriptor in the list of train descriptors.

  • tuple – Index of the descriptor in the list of query descriptors.

cropposts.matching_points(img1: ndarray, img2: ndarray) Tuple[DMatch, List[KeyPoint], List[KeyPoint]]

Computes keypoint matches using the SIFT algorithm between two images.

Parameters:
  • img1 (np.ndarray) – The reference image.

  • img2 (np.ndarray) – The social media post.

Returns:
  • cv2.DMatch – List of filtered keypoint matches.

  • cv2.KeyPoint – List of keypoints from the first image.

  • cv2.KeyPoint – List of keypoints from the second image.
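
A sketch of inspecting the SIFT matches between a reference snippet and a post (paths illustrative; reading the images with OpenCV is an assumption about the expected array format):

    import cv2

    from ammico import cropposts

    ref = cv2.imread("data/refs/banner.png")
    post = cv2.imread("data/posts/post_01.png")
    matches, kp1, kp2 = cropposts.matching_points(ref, post)
    cropposts.draw_matches(matches, ref, post, kp1, kp2)  # visualize the filtered matches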

cropposts.paste_image_and_comment(crop_post: ndarray, crop_view: ndarray) ndarray

Paste the image part and the text part together without the unnecessary comments.

Parameters:
  • crop_post (np.ndarray) – The cropped image part of the social media post.

  • crop_view (np.ndarray) – The cropped text part of the social media post.

Returns:

np.ndarray – The image and text part of the social media post in one image.

utils module

display module