
Upload An Image And Have It Described For You: How To Try MiniGPT-4 Right Now

Released: 5/27/2023

Author: Aaron Di Blasi, Publisher, Top Tech Tidbits

I want to open by noting that this demo can be painfully slow, depending on how many requests are currently in the queue and how many system resources are available at the time.

That said, it should work for everyone. It may simply take longer for some, depending on your location and the time of day.

If you need several images described, I recommend opening the demo in separate browser tabs and submitting one request per tab so that they can be processed at the same time. This is usually much faster than waiting in the queue for each one in turn.

A Quick Bit of Background:

As of this writing, ChatGPT is still text-in, text-out only. GPT-4 was announced with the ability to accept images as input and generate text about them, but OpenAI has yet to release this feature to the public.

This is where MiniGPT-4 comes in.

This open source project gives us a preview of what image processing in GPT-4 might be like, and, for the most part, it's pretty accurate.

That is not to say it doesn't make mistakes. It does, and I point out a few of them below. But I think you will agree that it already describes images in far greater detail than most people currently do, or ever would, for that matter. That is what makes it exciting right now, especially for people with limited or no vision.

MiniGPT-4 is an open source project that was posted on GitHub to demonstrate vision-language capabilities in an AI system.

Examples of what it can do include generating descriptions of images, writing stories based on images, or even creating websites from handwritten drawings.

Despite what the name implies, MiniGPT-4 is not officially connected to OpenAI or GPT-4.

It was created by a group of Ph.D. students at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

It's also based on a different large language model (LLM) called Vicuna, which itself was built on the open-source Large Language Model Meta AI (LLaMA). Vicuna is not quite as powerful as ChatGPT, but in an evaluation graded by GPT-4 itself, its creators report that it reaches more than 90% of ChatGPT's quality.

MiniGPT-4 is currently just a demo and is still in its first version. For now, it can be accessed for free via the group's official website.

How To Try It Out:

Step 1: Open a web browser and navigate to this address: https://huggingface.co/spaces/Vision-CAIR/minigpt4.

Hugging Face is a technology company based in New York City that is widely known for its work in natural language processing (NLP) and artificial intelligence (AI), particularly for developing and providing open-source tools for the AI community. Its platform lets researchers and developers share and collaborate on AI models, and access and build upon existing ones.

Step 2: Search for "Drop Image Here" or "Click to Upload" and click inside this area. This opens your system's file selection dialog. Navigate to the photo you want described, select it and click "Open."

Step 3: Search for and click the button labeled "Upload and Start Chat." It is located just below the "Drop Image Here" or "Click to Upload" area you used in Step 2.

Step 4: MiniGPT-4 will now take some time to upload your image. Progress is shown inside the "Drop Image Here" or "Click to Upload" area.

Step 5: Once the upload has completed, you can ask MiniGPT-4 to describe the image. Search for "User," type the following into the field just beneath it and press Enter: "Describe this image."

Step 6: MiniGPT-4 will then take some additional time to process your request. No progress indicator is shown, but three dots appear just above the "User" field while it is "thinking."

Step 7: Voilà! MiniGPT-4 will display its description of the image.

Description Quality:

How good that description is, I'll let you decide. To help, below are three descriptions of the same image: its original ALT text, MiniGPT-4's description and a human-created long description (written by me), so that you can compare them for yourself.

Below is the image. I have deliberately removed its ALT text for the purposes of this demonstration. Please forgive me.

Original ALT Text: When I first found this test image on the web, its ALT text read, "Man on a roof."

MiniGPT-4 Description: This image shows a man dressed in a viking costume standing on the roof of a house. He is holding an axe in one hand and a shield in the other. The sky is clear and blue, and there are trees in the background. The man is wearing a helmet, chain mail armor, and a long beard. He looks determined and ready for battle. The image is well composed, with the man standing in the center of the frame and the axe and shield providing a sense of movement and action. The overall mood of the image is one of adventure and bravery.

Human Created Long Description: This image shows a man dressed in a viking costume standing on the roof of a house. He is holding an axe in one hand and his other hand is against the roof. The sky is clear and blue. There are no trees in the background. The man is wearing a Viking helmet with two horns sticking out of either side, a cloth toga and a long beard. He looks determined and ready for battle. The image is well composed, with the man standing in the center of the frame and the axe providing a sense of movement and action. The overall mood of the image is one of adventure and bravery.

So? What do you think? Good enough for most purposes? I think so too. And the good news is that MiniGPT-4 is just a baby beta, an alpha really, of all that is to come from multimodal generative AI.

Exciting times.

Cheers,

Aaron Di Blasi, PMP

Publisher (2020-Present)
Top Tech Tidbits
The Week's News in Access Technology
https://toptechtidbits.com 🌐

Publisher (2022-Present)
Access Information News
The Week's News in Access Information
https://accessinformationnews.com 🌐

Sr. Project Management Professional (2006 - Present)
Mind Vault Solutions, Ltd.
Innovative ideas. Solutions that perform.
https://mvsltd.com 🌐

Certified:

Digital Marketing Associate
Meta Certified (2022 - Present) ✔

Social Marketing Professional
Hootsuite Certified (2020 - Present) ✔

Email Marketing Professional
Constant Contact Certified (2019 - Present) ✔

Specializing in:

Digital Strategy and Content Marketing ⚙
Social Media Advertising ⚙
Online Fundraising ⚙
ADA, WCAG and Section 508 Compliance ⚙
