A VFM for Earth Observation

Remote sensing is a well-known application of computer vision. Over the years, many research labs and companies have developed models for various use cases of activity monitoring and earth observation. As in many fields of computer vision, supervised training is still the most common paradigm.

This, however, has become a limiting factor in the development of new use cases. Although collecting data is fairly simple once the satellites are operating in space, labelling the ever-increasing amount of required data has become infeasible. Worse, this inefficient process has to be repeated for every new use case.

Given that vast amounts of unlabelled data are available, it makes sense to learn from the language domain and train a foundation model for earth observation. This model can then be adapted efficiently with small amounts of data and compute for specific use cases.

To spark such a workflow (for users who do not have access to data themselves), Synativ has released a foundation model for earth observation. In this blog, we demonstrate how it can speed up the development of a flood detection model.

We would like to emphasize that, beyond the flood detection use case itself, this blog serves as an example of the broader use of industry-specific foundation models. We are working on releasing more starting points for other industries and can also help you to create your proprietary starting point.

VFMs for earth observation

Visual foundation models (VFMs) such as SAM and DINOv2 are trained on vast amounts of natural images. The concepts learned from that data do not translate well to satellite imagery, as the viewpoints, and consequently the semantic features, are significantly different. Fine-tuning these models for earth observation applications might lead to marginal benefits but is not expected to drastically speed up development times.

Nevertheless, the training principles of those VFMs can easily be applied to satellite imagery. A foundation model can be trained on vast amounts of unlabelled data to learn generic concepts within this view. This model then functions as a good starting point for developing earth observation applications.

Below, we compare the attention maps of the last block of DINO and the earth observation VFM. It is clear that the latter is better at attending to relevant semantic features and will therefore require less use-case-specific data for fine-tuning.

DINO

Earth Observation VFM

Fine-tuning for flood detection using Synativ's SDK

As an example of a downstream task, we have trained a model for flood detection on the Sen1Floods11 dataset. This dataset contains 252 train, 89 validation, and 90 test images. We have gradually reduced the number of images “available” for training by random subsampling, to mimic a low-data regime and investigate the decrease in performance.
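
The random subsampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiments: the function name `subsample`, the list of fractions, and the seed are hypothetical, and image ids are stand-in integers rather than real Sen1Floods11 file names. Only the training-set size of 252 comes from the text.

```python
import random


def subsample(train_ids, fraction, seed=0):
    """Return a random subset holding `fraction` of the training ids.

    Hypothetical helper: a fixed seed makes the subset reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(train_ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(train_ids, k))


# 252 training images, as stated in the text; ids are placeholders.
train_ids = list(range(252))

# Assumed fractions; the post does not list the exact ones it used.
fractions = [1.0, 0.5, 0.25, 0.1, 0.05]

subsets = {f: subsample(train_ids, f) for f in fractions}
for f, ids in subsets.items():
    print(f"{f:.0%} of training data: {len(ids)} images")
```

The validation and test splits stay untouched in such a setup, so that every data point is evaluated on the same images.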

From the graph below, it is clear that the performance of the fine-tuned foundation model remains stable and only decreases when we scale below 10% of the available training data. This indicates that the foundation model has indeed learned general concepts for earth observation and only requires a small amount of labelled data to learn the task of interest. We have also visualised an input image together with the corresponding ground truth and prediction of the model (trained on 100% of the data) to provide an idea of the task.

input

ground truth

prediction

Some remarks regarding the experimental results:

We have kept the same compute budget for all data points, i.e. models trained on less data have been trained for more epochs.
We have trained three models for every data point and averaged the performance; models at one data point were generally within ~1% of each other.
Flood detection is a binary task and the images in the training set are close in distribution to those of the test set, so that few-shot learning is possible.
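
The first two remarks can be made concrete with a small sketch. It assumes that "same compute budget" means the same total number of training samples seen, which is one plausible reading and not confirmed by the text. The helper names, the baseline of 50 epochs, and the example scores are placeholders for illustration, not reported values.

```python
from statistics import mean


def epochs_for_subset(base_epochs, full_size, subset_size):
    """Epochs needed so the subset run sees as many samples as the full-data run."""
    total_samples = base_epochs * full_size
    return max(1, round(total_samples / subset_size))


def summarize_runs(scores):
    """Mean score and max-min spread across repeated runs at one data point."""
    return mean(scores), max(scores) - min(scores)


BASE_EPOCHS = 50  # hypothetical epoch count for the 100% run
FULL_SIZE = 252   # training images, as stated in the text

for subset_size in (252, 126, 25):
    n_epochs = epochs_for_subset(BASE_EPOCHS, FULL_SIZE, subset_size)
    print(f"{subset_size} images: {n_epochs} epochs")

# Placeholder scores for three runs at one data point (not real results).
avg, spread = summarize_runs([0.80, 0.81, 0.79])
print(f"mean={avg:.3f}, spread={spread:.3f}")
```

Reporting the spread next to the mean is what allows a statement such as "within ~1% of each other" to be checked for each data point.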

The flood detection task was chosen as an example. Comparable performance characteristics can be expected for other downstream tasks, as long as the data has a similar distribution to the data that the foundation model was trained on.

Availability

The earth observation foundation model is now available for fine-tuning via our SDK. An easy-to-follow tutorial can be found in our docs. You will need an API key to interact with Synativ’s APIs, which we will provide upon request.

In the background, we are working hard to release more industry-specific foundation models as starting points soon. If you have collected a lot of unlabelled data and want to train a proprietary foundation model for your specific domain (it does not necessarily have to be a geospatial application), let us know and we can help.

Request Access
