The AI landscape is shifting. Open-source multimodal models like Llama 3.2 are proving that smaller, strategically fine-tuned models can match or exceed the performance of expensive closed-source solutions. With Impulse AI, you can harness Llama 3.2 11B Vision’s multimodal capabilities on your own data, building custom AI models that understand both text and images and are entirely yours. This is a comprehensive guide to fine-tuning the pre-trained Llama 3.2 11B Vision Instruct model using the Impulse SDK or the Web App. Training on Impulse is simple, just a two-step process: prepare and upload your dataset, then submit a fine-tuning job. Impulse AI orchestrates everything else for you.

Key Takeaways

  • Prepare your dataset and upload it to the Impulse platform.
  • Submit a fine-tuning job with custom training parameters using the SDK or the Web App.
  • Download and evaluate the fine-tuned model.

Prerequisites

Set your Impulse API key as an environment variable; the SDK examples below read it from IMPSDK_API_KEY.

export IMPSDK_API_KEY=your_api_key

Dataset Preparation

Dataset preparation is the most crucial step for fine-tuning on the Impulse AI platform; getting it right largely determines the quality of your results. For a comprehensive guide on supported data formats and preparation methods, refer to the dataset guide.
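As a rough illustration only (the dataset guide is authoritative for the exact schema Impulse expects), multimodal fine-tuning data is commonly stored as JSONL, with each record pairing an image reference with a conversation. A hypothetical record, written from Python, might look like this:
import json

# Hypothetical record layout for one vision fine-tuning example;
# the field names here are illustrative, not the required Impulse schema.
record = {
    "image": "images/1.jpg",
    "conversations": [
        {"role": "user", "content": "Which number should be written in place of the question mark?"},
        {"role": "assistant", "content": "<expected answer>"},
    ],
}

# Append the record as one JSON line in the training file
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")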

Fine Tune

The fine-tuning parameters are flexible, allowing you to specify options like batch size, learning rate, number of epochs, seed, and shuffle. We support LoRA, QLoRA, and full fine-tuning.

Method 1: Fine Tune via Impulse SDK
import os
import asyncio
from impulse.api_sdk.sdk import ImpulseSDK
from impulse.api_sdk.models import (
    FineTuningJobCreate, FineTuningJobParameters
)

async def main():
    async with ImpulseSDK(os.environ.get("IMPSDK_API_KEY")) as client:
        job = await client.fine_tuning.create_fine_tuning_job(FineTuningJobCreate(
            base_model_name="llm_llama3_2_vision_11b",
            dataset_name="<dataset-name>",
            name="<job-name>",
            type="<fine-tune mode>",
            parameters=FineTuningJobParameters(
                batch_size=2,
                shuffle=True,
                num_epochs=1,
                lr=2e-5,
                seed=42
            )
        ))
        print(f"Fine-tuning job started: {job}")

asyncio.run(main())
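The SDK is fully asynchronous, which is why the example wraps the call in an async main() driven by asyncio.run; the async with block also closes the client cleanly once the job has been submitted. Keep the printed job details handy for monitoring, covered below.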
Method 2: Fine Tune via Web App
  1. Log in to the Impulse Dashboard.
  2. Navigate to the Fine-Tuning tab in the left panel.
  3. Click on “Create Job”.
Sit back & relax while we finish training and provide you with the fine-tuned model parameters. 😃

Monitoring Jobs

Job status can be retrieved in the following ways.
Method 1: Impulse SDK
import os
import asyncio
from impulse.api_sdk.sdk import ImpulseSDK

async def main():
    async with ImpulseSDK(os.environ.get("IMPSDK_API_KEY")) as client:
        # List all fine-tuning jobs along with their current status
        jobs = await client.fine_tuning.list_fine_tuning_jobs()
        print("Fine-tuning jobs:", jobs)

asyncio.run(main())
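If you prefer to block until training finishes, you can poll list_fine_tuning_jobs in a loop. The sketch below is a minimal example built only on the call shown above; the name and status fields, and the status values it checks, are assumptions, so confirm the exact schema in the SDK models.
import os
import asyncio
from impulse.api_sdk.sdk import ImpulseSDK

async def wait_for_job(job_name, poll_seconds=60):
    async with ImpulseSDK(os.environ.get("IMPSDK_API_KEY")) as client:
        while True:
            jobs = await client.fine_tuning.list_fine_tuning_jobs()
            # NOTE: `name` and `status` (and the status values below) are
            # assumptions; check the SDK models for the exact field names.
            job = next((j for j in jobs if j.name == job_name), None)
            if job is not None and job.status not in ("queued", "running"):
                return job
            await asyncio.sleep(poll_seconds)

print(asyncio.run(wait_for_job("<job-name>")))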
Method 2: Web App
Job status is visible under the Fine-Tuning section on the Impulse Dashboard.

Post Training

Fine-tuned model weights are available for download from the Fine-Tuning page of the Impulse Dashboard. Note: in-house evaluation and inference capabilities are coming soon on Impulse AI; our team is currently building them.

Quick Guide to Inference

Inference on downloaded weights can be performed with the Hugging Face Transformers library. The sample script below demonstrates how to run inference locally or on a hosted machine once the model weights are available on that machine.
import sys
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# Usage:
# python predict.py "Which number should be written in place of the question mark?" 1.jpg

query = sys.argv[1]
image_path = sys.argv[2] if len(sys.argv) > 2 else None

# Load the fine-tuned Llama 3.2 Vision 11B model and its processor
model_path = "<path_to_your_finetuned_model>"
processor = AutoProcessor.from_pretrained(model_path)
model = MllamaForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Generate predictions from the model
def generate_answer(prompt, img_path=None):
    content = [{"type": "text", "text": prompt}]
    image = None
    if img_path:
        image = Image.open(img_path)
        # The chat template places the <|image|> token ahead of the text
        content.insert(0, {"type": "image"})
    messages = [{"role": "user", "content": content}]
    input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(
        images=image, text=input_text, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    return processor.decode(outputs[0], skip_special_tokens=True)

# Print the model output
print(generate_answer(query, image_path))
Sample inference for a model fine-tuned on the MathVision dataset (refer to the dataset guide):
python predict.py "Which number should be written in place of the question mark?" 1.jpg

Conclusion

Using the Impulse SDK, you can quickly fine-tune open-source multimodal models like Llama 3.2 Vision 11B for specific downstream tasks, creating faster, more accurate models at a fraction of the cost of closed-source alternatives. The flexibility of Impulse AI’s fine-tuning API lets you customize the entire process, from dataset management to model deployment. For more details, check out our full documentation or explore the PyPI repo to get started.