Post-Launch Capability Expansion of Vision-Language Models via Prompting for On-Orbit Spacecraft Inspection

Spaceborne inspection systems typically deploy perception models before launch, after which updating model weights or expanding a fixed label set becomes operationally impractical. Supervised models can be integrated pre-flight, but adding new semantic capabilities in orbit requires retraining and re-uploading parameters. We investigate whether prompt-driven vision--language models can enable post-launch semantic expansion, allowing new spacecraft components to be specified via natural-language prompts without modifying onboard weights. We evaluate zero-shot instance segmentation of spacecraft components under a strictly frozen, single-pass inference protocol on a test set of 129 images of previously unseen satellites. Under fixed global thresholds and no post-processing, SAM3 achieves 0.385 mAP@0.5 and 0.267 mAP@0.5{:}0.95. Performance is strongly scale-dependent: large structural elements such as spacecraft bodies (0.639 AP@0.5) and solar arrays (0.598 AP@0.5) localize reliably, while smaller appendages such as antennas (0.221 AP@0.5) and thrusters (0.081 AP@0.5) remain difficult. Prompt formulation also influences performance: structured prompts that incorporate spatial and geometric descriptors yield up to 82% improvement over short category-name prompts. The model operates within the memory and compute envelope of contemporary embedded GPUs. These results suggest that prompt-driven grounding can provide a practical mechanism for post-launch semantic extension of dominant spacecraft structures, while highlighting the limitations of zero-shot localization for fine-scale components under orbital domain shift.
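
To make the two reported metrics concrete, the following is a minimal sketch of how a single predicted mask is scored against a ground-truth mask at different IoU thresholds. It assumes the common COCO-style convention (thresholds 0.50 to 0.95 in steps of 0.05); the abstract does not state which evaluation code was used, and the function names and toy masks here are hypothetical illustrations, not the authors' pipeline.

```python
def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks (equally sized nested lists of 0/1)."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """Thresholds at which this prediction would count as a true positive."""
    iou = mask_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Toy example: the prediction misses one column of the ground-truth region.
gt = [[1, 1, 1, 1],
      [1, 1, 1, 1],
      [0, 0, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]

print(mask_iou(pred, gt))            # 6 overlapping pixels / 8 in the union
print(matched_thresholds(pred, gt))  # counts as a hit only at the looser thresholds
```

A prediction like this one counts toward AP@0.5 but fails the stricter thresholds, which is one reason a metric averaged over 0.5{:}0.95 is typically lower than the same metric at 0.5 alone.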