AMMICO - AI-based Media and Misinformation Content Analysis Tool

License: MIT

This package extracts data from images such as social media posts that contain an image part and a text part. The analysis can generate a very large number of features, depending on the user input. See our paper for a more in-depth description.

This project is currently under development!

AMMICO takes pre-processed image files, such as social media posts with comments, and processes them to collect the following information:

1. Text extraction from the images
   1. Language detection
   2. Translation into English or other languages
   3. Cleaning of the text, spell-check
   4. Sentiment analysis
   5. Named entity recognition
   6. Topic analysis
2. Content extraction from the images
   1. Textual summary of the image content (“image caption”) that can be analyzed further using the above tools
   2. Feature extraction from the images: the user supplies a text or image query, and images are matched to that query
   3. Question answering about image content
3. Content extraction from the videos
   1. Textual summary of the video content that can be analyzed further
   2. Question answering about video content
4. Performing person and face recognition in images
   1. Face mask detection
   2. Probabilistic detection of age, gender and race
   3. Emotion recognition
5. Color analysis
   1. Analyse the hue and percentage of each color in the image
6. Multimodal analysis
   1. Find best matches for image content or image similarity
7. Cropping images to remove comments from posts

Installation

The AMMICO package can be installed using pip:

pip install ammico

This will install the package and its dependencies locally. If you get errors when running some of the modules after installation, please follow the instructions in the FAQ.

Usage

The main demonstration notebook can be found in the notebooks folder and also on Google Colab.

There are further sample notebooks in the notebooks folder for the more experimental features:

Topic analysis: Use the notebook get-text-from-image.ipynb to analyse the topics of the extracted text.
You can run this notebook on Google Colab.
Place the data files and Google Cloud Vision API key in your Google Drive to access the data.
To crop social media posts, use the cropposts.ipynb notebook. You can also run this notebook on Google Colab.

Features

Text extraction

The text is extracted from the images using google-cloud-vision. For this, you need an API key. Set up your Google account following the instructions on the Google Vision AI website or as described here. You then need to export the location of the API key file as an environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="location of your .json"
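
Before running the text extraction, it can help to confirm that the variable is visible to Python and points to a readable JSON file. The following is a minimal sketch using only the standard library; the helper name `check_credentials` is hypothetical and not part of AMMICO or google-cloud-vision.

```python
import json
import os
from pathlib import Path


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Return the key file path if the variable points to a readable JSON file.

    Hypothetical helper for illustration; it only inspects the environment
    and the file, it does not contact any Google service.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    with path.open(encoding="utf-8") as handle:
        # Raises json.JSONDecodeError if the file is not valid JSON.
        json.load(handle)
    return path
```

Calling `check_credentials()` at the top of a notebook surfaces a missing or mistyped path early, rather than at the first API request.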

The extracted text is then stored under the text key (which becomes a column when exporting to CSV).

Googletrans is used to recognize the language automatically and translate the text into English. The detected language and the translated text are then stored under the text_language and text_english keys (columns when exporting to CSV).
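
To illustrate how result keys map to CSV columns, here is a small sketch using Python's standard csv module. It assumes the results are held as one dictionary per image, keyed by file name; that layout, the file names, the sample values, and the helper `results_to_csv` are all hypothetical and only the key names text, text_language and text_english come from the description above.

```python
import csv
import io

# Hypothetical per-image results; file names and values are made up.
results = {
    "post_001.png": {
        "text": "Bonjour le monde",
        "text_language": "fr",
        "text_english": "Hello world",
    },
    "post_002.png": {
        "text": "Hello",
        "text_language": "en",
        "text_english": "Hello",
    },
}


def results_to_csv(results: dict) -> str:
    """Write one row per image, with each result key as a column."""
    columns = ["filename", "text", "text_language", "text_english"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for filename, fields in results.items():
        row = {"filename": filename}
        # Missing keys become empty cells instead of raising an error.
        row.update({key: fields.get(key, "") for key in columns[1:]})
        writer.writerow(row)
    return buffer.getvalue()
```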

To analyse the text further, set the analyse_text keyword to True. The text is then processed using spaCy (tokenization, part-of-speech tagging, lemmatization, …). The English text is cleaned of numbers and unrecognized words (text_clean), the spelling of the English text is corrected (text_english_correct), and sentiment and subjectivity analyses are carried out (polarity, subjectivity). The latter two steps are carried out using TextBlob. For more information on the sentiment analysis using TextBlob see here.

The Hugging Face transformers library is used to perform another sentiment analysis, a text summary, and named entity recognition, using the transformers pipeline.

Content extraction

The image and video content (“caption”) is now extracted using the Qwen2.5-VL model. Qwen2.5-VL is a multimodal large language model capable of understanding and generating content from both images and videos. With its help, AMMICO supports tasks such as image/video summarization and image/video visual question answering, where the model answers users’ questions about the context of a media file.

Emotion recognition

Emotion recognition is carried out using the deepface and retinaface libraries. These libraries detect the presence of faces and provide a probabilistic assessment of their age, gender, race, and emotion based on several state-of-the-art models. The tool also detects whether the person is wearing a face mask; if so, no further detection is carried out, as the mask affects the assessment accuracy. Because the detection of gender, race and age is carried out in simplistic categories (e.g., for gender, using only “male” and “female”), and because of the ethical implications of such assessments, users can only access this part of the tool if they agree with an ethical disclosure statement (see FAQ). Moreover, once users accept the disclosure, they can set their own detection confidence thresholds.
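
The mask check and the user-defined confidence thresholds described above can be pictured with the following sketch. It is not AMMICO's implementation: the function `report_face`, the shape of the inputs, the result keys, and the default threshold of 50 are assumptions chosen for illustration.

```python
from typing import Dict, Optional, Tuple


def report_face(
    wears_mask: bool,
    predictions: Dict[str, Tuple[str, float]],
    thresholds: Dict[str, float],
    default_threshold: float = 50.0,  # hypothetical default, in percent
) -> Dict[str, Optional[str]]:
    """Keep a predicted label only if its confidence reaches the threshold.

    predictions maps an attribute name to (label, confidence in percent).
    """
    report: Dict[str, Optional[str]] = {
        "wears_face_mask": "Yes" if wears_mask else "No"
    }
    if wears_mask:
        # A mask affects accuracy, so no further attributes are reported.
        return report
    for attribute, (label, confidence) in predictions.items():
        minimum = thresholds.get(attribute, default_threshold)
        report[attribute] = label if confidence >= minimum else None
    return report
```

With a threshold of 50 for every attribute, a prediction such as ("happy", 91.0) is kept, while one with a confidence of 40.0 is reported as None.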

Color/hue detection

Color detection is carried out using colorgram.py, with colour providing the distance metric. The colors can be classified into the main named colors/hues in the English language, namely red, green, blue, yellow, cyan, orange, purple, pink, brown, grey, white, and black.
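
The idea of assigning each extracted color to its nearest named color can be sketched as follows. This example deliberately swaps in a plain Euclidean distance in RGB space so that it runs with the standard library alone; it does not use the distance metric from colour, and the reference RGB values (taken here from the CSS named colors) as well as the function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Assumed reference values (CSS named colors) for the twelve hues.
NAMED_COLORS: Dict[str, RGB] = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "cyan": (0, 255, 255),
    "orange": (255, 165, 0),
    "purple": (128, 0, 128),
    "pink": (255, 192, 203),
    "brown": (165, 42, 42),
    "grey": (128, 128, 128),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_named_color(rgb: RGB) -> str:
    """Return the named color with the smallest Euclidean RGB distance."""
    return min(NAMED_COLORS, key=lambda name: math.dist(rgb, NAMED_COLORS[name]))


def named_color_shares(dominant: List[Tuple[RGB, float]]) -> Dict[str, float]:
    """Sum the image proportion of each extracted color per named color.

    dominant is a list of (rgb, proportion) pairs, for example the dominant
    colors of an image together with the fraction of pixels they cover.
    """
    shares = {name: 0.0 for name in NAMED_COLORS}
    for rgb, proportion in dominant:
        shares[nearest_named_color(rgb)] += proportion
    return shares
```

A perceptual metric generally matches human color naming better than raw RGB distance, which is a reason to prefer a dedicated color-difference function in practice.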

Cropping of posts

Social media posts can automatically be cropped to remove further comments on the page and restrict the textual content to the first comment only.