I wanted AI to read OMR (optical mark recognition) cards for my work and was looking for a model I could download and run locally. I found many models trained for OCR but none for OMR. My best candidate was Meta Llama 3.2 11B Vision, since it takes a prompt together with an image. However, it did not work out of the box. The model would have needed additional training for the job, and I did not have the resources for that. So the project was not a success, but I wanted to leave a record of what I did.

First, I downloaded the model from www.llama.com. The site suggests installing the Llama CLI first; after that you can download Llama from either Meta or Hugging Face. https://github.com/meta-llama/llama-models

%llama model list

The command above lists the models available for download.

After downloading the required models, you can install llama-stack to start a local service on your machine. The service can be started with either Docker or Conda. When I tried both, I got a lot of warnings about libraries, so I switched to an alternative, Ollama (https://ollama.com/).

Ollama makes it possible to run many models on your local machine. Llama 3.2 11B Vision was not supported by the current release of Ollama at the time, so I had to download the prerelease version, Ollama 0.4.0-rc5. You will need at least version 0.4.0 for Llama Vision.

I was not successful with OMR, but the model gave me the response below when I tested it with a math image.

%ollama run x/llama3.2-vision:latest "Can you describe this image?: /path/math.png"

Added image '/path/math.png'
The image depicts a triangle with two angles labeled and one side labeled. The angle at the top of the triangle is 20 degrees, and the angle on the right side of the triangle is 60 degrees. The side opposite the 20-degree angle is labeled "A", the side opposite the 60-degree angle is labeled "B", and the hypotenuse (the side opposite the right angle) is labeled "C".
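
If you want to send many images from a script instead of typing each command, the same kind of request can probably be made through Ollama's local HTTP API. This is only a minimal sketch, not something I ran for this project. It assumes Ollama is listening on its default address (http://localhost:11434) and that the model tag is the same one used in the command above. The function names build_payload and describe_image are my own, made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for one non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # The API takes images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a stream of chunks.
        "stream": False,
    }


def describe_image(image_path: str, model: str = "x/llama3.2-vision:latest") -> str:
    """Send one image to the local model and return its text response."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, "Can you describe this image?", f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling describe_image("/path/math.png") should return a description similar to the one above, as long as the Ollama service is running and the model has already been pulled.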

The description looks good enough to serve as alt text for web page accessibility. Sometimes the model gave me “!!!!!!!!!!!!!!!!!!!!!!!” as the response, but I am not sure what that meant.
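
I do not know the cause of the "!!!" output, but a script can at least notice it and try the image again. The check below is a rough sketch of that idea. The function name looks_degenerate and the length threshold of 10 are my own assumptions, not anything from Ollama or Llama.

```python
def looks_degenerate(text: str, min_length: int = 10) -> bool:
    """Return True when the response is one character repeated many times.

    min_length is an arbitrary threshold chosen for this sketch so that
    short but valid answers such as "!!!" are not flagged.
    """
    stripped = text.strip()
    return len(stripped) >= min_length and len(set(stripped)) == 1
```

A caller could use this to skip or retry a response before storing it as alt text. It only detects the symptom and does not explain or fix it.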