
Fine tuning a model on the local machine

Mike Saunders · 5th June 2026

As part of the index cards project, we feed images of index cards to a vision language model to extract structured metadata: headings, corrections, manuscript numbers, folios, descriptions and so on.

An example of an index card scan

The trouble with these cards is the versos. The blank backs, along with the covers, are all scanned and sent to the model, and because the model can sometimes make out faint text, we get a lot of false positives.

An example of a blank index card verso

An example of an index card cover

Daniel had already trained a model to try to reduce these false positives, but I found that a lot were still getting through, so I did the training again. I'll skip over the technical details for now and go straight to the headlines:

  • In the sample run, 20% were previously false positives. Now 0% are.
  • Every part of this pipeline has permissive licensing, so it can be open sourced and shared on Hugging Face.
  • The fine-tuning was done entirely in house on the Framework AI machine (parts of the previous training used Hugging Face jobs).
  • Energy consumption was about 0.5 kWh. It was pretty windy yesterday: the Carbon Intensity API shows that South Scotland's grid was running on about 88% wind energy, so the carbon cost of fine-tuning the model was close to zero.

A screenshot of a pie chart showing 88% wind, 3% solar and 3% nuclear energy supply in the South Scotland region
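The carbon arithmetic behind the last bullet is simply energy used multiplied by grid carbon intensity (in gCO2/kWh). Here is a minimal sketch of that calculation, plus one way the intensity figure might be read from a Carbon Intensity API response. The response shape is an assumption based on our reading of the API's regional endpoints, so check the field names against its documentation. The 40 gCO2/kWh value is made up for illustration and is not the reading from this training run.

```python
def intensity_from_regional_response(payload: dict) -> float:
    """Read the forecast carbon intensity (gCO2/kWh) from a regional API response.

    ASSUMPTION: the payload looks like the Carbon Intensity API's regional
    endpoints, i.e. {"data": [{"shortname": ..., "data": [{"intensity": {"forecast": ...}}]}]}.
    """
    region = payload["data"][0]
    return float(region["data"][0]["intensity"]["forecast"])


def training_emissions_g(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Estimated grams of CO2 for a run: energy (kWh) x grid intensity (gCO2/kWh)."""
    if energy_kwh < 0 or intensity_g_per_kwh < 0:
        raise ValueError("energy and intensity must be non-negative")
    return energy_kwh * intensity_g_per_kwh


# Hypothetical response: the forecast value is invented for illustration only.
sample = {
    "data": [{
        "shortname": "South Scotland",
        "data": [{
            "intensity": {"forecast": 40, "index": "low"},
            "generationmix": [{"fuel": "wind", "perc": 88.0}],
        }],
    }]
}

grams = training_emissions_g(0.5, intensity_from_regional_response(sample))
print(f"{grams:.1f} g CO2")  # 20.0 g CO2
```

The API reports intensity per half-hour window. For a longer run, it would be more accurate to multiply each window's energy by that window's intensity and sum the results, rather than use a single figure.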

Until now we have used the Framework machine for inference (running existing pre-trained models to generate metadata outputs), but this is the first time it has been used to fine-tune a tool specific to our scans and requirements.

This isn't a flashy model, and it literally just draws boxes around index cards, but I think it's kind of an exciting milestone.

An example of a successful green bounding box around an index card
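This post doesn't say exactly how the box model plugs into the pipeline. One plausible use, sketched here purely as an assumption, is as a gate in front of the vision language model. A scan would only go on to metadata extraction if the detector finds a confident, reasonably sized card box, so blank versos and covers drop out before they can produce false positives. The `Box` type, the thresholds, the page IDs and the single shared page size are all hypothetical. A real detector's output would need converting into this shape.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detector output: corner coordinates in pixels plus a confidence score."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    def area(self) -> float:
        # Clamp at zero so a malformed box never contributes negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def page_has_card(
    boxes: list[Box],
    page_w: int,
    page_h: int,
    min_score: float = 0.5,       # hypothetical confidence cut-off
    min_area_frac: float = 0.05,  # hypothetical: ignore tiny specks
) -> bool:
    """Return True if at least one confident, reasonably sized card box was found."""
    page_area = page_w * page_h
    return any(
        b.score >= min_score and b.area() / page_area >= min_area_frac
        for b in boxes
    )


def pages_to_send(detections: dict[str, list[Box]], page_w: int, page_h: int) -> list[str]:
    """Keep only the page IDs whose scans contain a detected card."""
    return [
        page_id
        for page_id, boxes in detections.items()
        if page_has_card(boxes, page_w, page_h)
    ]


# Hypothetical detections for three scans of the same size.
detections = {
    "card_001_recto": [Box(100, 80, 1900, 1300, 0.97)],  # clear card
    "card_001_verso": [],                                 # blank back: no box
    "cover": [Box(10, 10, 60, 40, 0.30)],                 # weak, tiny box
}
print(pages_to_send(detections, page_w=2000, page_h=1400))  # ['card_001_recto']
```

Gating on the detector keeps the expensive vision language model call for pages that actually contain a card. The trade-off is that any card the detector misses never reaches metadata extraction, so the thresholds would need checking against a labelled sample.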