Side projects / Poltergeist

Poltergeist is a hackathon concept for testing vision-language models in high-stakes document workflows. Instead of purely synthetic noise, it focuses on production-like document damage: stains, creases, blur, compression artifacts, and occlusion.
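The perturbation idea can be sketched in a few lines. This is an illustrative, stdlib-only mock operating on a tiny grayscale image (a 2-D list of 0-255 pixel values); the function names and approach are assumptions, not Poltergeist's actual implementation.

```python
def occlude(img, top, left, h, w, value=0):
    """Simulate a sticker or stain covering a rectangular patch."""
    out = [row[:] for row in img]
    for r in range(top, min(top + h, len(out))):
        for c in range(left, min(left + w, len(out[0]))):
            out[r][c] = value
    return out

def box_blur(img):
    """Crude 3x3 box blur standing in for motion or defocus blur."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(vals) // len(vals)
    return out

def quantize(img, levels=4):
    """Coarse intensity quantization, a stand-in for heavy compression."""
    step = 256 // levels
    return [[(p // step) * step for p in row] for row in img]
```

In a real pipeline these transforms would be chained and parameterized per scenario, then applied to scanned documents before sending them to the model under test.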

The product story is intentionally simple for cross-functional teams: set up an experiment, run realistic perturbations, inspect failure patterns, and feed those cases back into model improvement. The screens below walk through that end-to-end loop.

Open the live demo (best viewed on desktop). I designed and built this concept in a focused 10-hour sprint.

Landing screen
The landing page sets context quickly: why vision models break in the real world and why realistic adversarial testing matters. The call-to-action and hero preview help non-ML stakeholders understand the product at a glance.

Poltergeist landing page with product value proposition and hero dashboard preview

Projects list
This is the workspace overview where teams launch new runs and revisit previous experiments. Status tags make progression visible from baseline benchmarking to fine-tuning, so everyone can track where each model iteration stands.

Poltergeist projects list screen with project cards and status badges

New project setup
Setup captures only the inputs needed to define a useful run: project identity, model, dataset, task type, and duration. The structure keeps setup fast while still making experimental scope explicit.

Poltergeist new project form with model, dataset, task, and duration fields

Test suite selection
Scenario categories are grouped by failure mode, including environment changes, sensor degradation, semantic manipulations, and document-specific damage. This organization encourages deliberate test design rather than one-click black-box evaluation.
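The grouped-suite structure can be modeled as a registry mapping failure modes to scenarios. Category and scenario names below are invented for illustration; Poltergeist's real taxonomy may differ.

```python
# Illustrative taxonomy: failure mode -> adversarial scenarios.
TEST_SUITES = {
    "environment": ["low_light", "glare", "shadow_cast"],
    "sensor": ["motion_blur", "defocus", "jpeg_artifacts"],
    "semantic": ["swapped_digits", "forged_signature"],
    "document_damage": ["coffee_stain", "crease", "torn_corner"],
}

def plan_run(categories):
    """Expand selected categories into a concrete scenario list,
    failing loudly on unknown categories to keep test design deliberate."""
    unknown = [c for c in categories if c not in TEST_SUITES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return [s for c in categories for s in TEST_SUITES[c]]
```

Making the selection explicit is what distinguishes deliberate test design from a one-click black-box evaluation: the team chooses which failure modes a run probes.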

Poltergeist test suite screen showing grouped adversarial scenario categories

Run in progress
During execution, the UI keeps the team oriented with progress context and estimated timing. The state is intentionally calm and informative, so users understand that the system is actively generating and evaluating scenarios.
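The estimated-timing element could be as simple as a linear extrapolation from scenarios already completed; this is an assumed sketch, not the product's actual scheduler.

```python
def estimate_remaining(elapsed_s, completed, total):
    """Naive linear ETA: assume each remaining scenario takes roughly
    the mean time of those already finished."""
    if completed == 0:
        return None  # no signal yet; the UI would show an indeterminate state
    per_item = elapsed_s / completed
    return per_item * (total - completed)
```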

Poltergeist loading state while test scenarios are running

Run summary
The summary surfaces key outcomes first: baseline vs. attacked performance, degradation signals, and ranked attack categories. The failed-scenario gallery then turns those metrics into concrete examples the team can use for triage and retraining decisions.
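Ranking attack categories by degradation reduces to comparing per-category accuracy against the baseline. A minimal sketch, assuming accuracy is the shared metric (the dashboard's actual metrics are not specified here):

```python
def rank_degradation(baseline_acc, attacked):
    """Rank attack categories by accuracy drop relative to baseline.

    `attacked` maps category name -> accuracy under that attack.
    Returns (category, drop) pairs, worst first.
    """
    drops = {cat: baseline_acc - acc for cat, acc in attacked.items()}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting by drop rather than raw attacked accuracy keeps the ranking meaningful even when the baseline itself is imperfect.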

Poltergeist run summary dashboard with key metrics and failed scenario thumbnails

Scenario preview
The detail view supports human-in-the-loop review for each case: attack type, prompt, expected response, and model output are shown together. This makes validation actionable and helps teams separate meaningful vulnerabilities from noisy or invalid samples.
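The per-case review data could be captured in a small record that pre-flags likely failures while leaving the final judgment to a human. Field names and the exact-match heuristic are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ScenarioReview:
    """One human-in-the-loop review item (illustrative field names)."""
    attack_type: str
    prompt: str
    expected: str
    model_output: str

    def auto_flag(self):
        """Pre-flag mismatches for the reviewer; a human still decides
        whether a flag is a real vulnerability or an invalid sample."""
        matched = (self.model_output.strip().lower()
                   == self.expected.strip().lower())
        return "passed" if matched else "needs_review"
```

Showing attack type, prompt, expected response, and model output side by side is what lets the reviewer separate meaningful vulnerabilities from noisy samples quickly.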

Poltergeist scenario preview screen with image, prompt, answer, and validation action

Sparsh Paliwal · 2026