Linear probing of foundation models in machine learning. Notes by James Le and Vishnu Rachakonda.


Apr 11, 2024 · Much less machine learning knowledge is required to adapt a foundation model than to train one from scratch, so modern generative AI teams are often made up of domain experts and product managers, not necessarily data scientists or machine learning specialists. Dec 2, 2024 · We evaluate TITAN on diverse clinical tasks and find that TITAN outperforms both ROI and slide foundation models across machine learning settings such as linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. Apr 4, 2023 · Linear probing definitely gives you a fair amount of signal. Linear mode connectivity and Git Re-Basin: Colin Burns' unsupervised linear probing method works even for semantic features like 'truth'. You can merge together different models finetuned from the same initialization. You can do a moving average over model checkpoints, and this is better! May 17, 2024 · Linear probing is a technique used in hash tables to handle collisions. We'll talk about fine-tuning, Transformers, large language models, prompt engineering, other applications of large models, and vision- and text-based models. Jun 27, 2025 · As rich sources of history, maps provide crucial insights into historical changes, yet their diverse visual representations and limited annotated data pose significant challenges for automated processing. Sep 20, 2025 · [Linear Probing] Deep learning: linear layers. Sep 17, 2025 · We demonstrate that combining low-rank adaptation with linear probing of foundation models yields exceptional segmentation performance while maintaining parameter efficiency. Apr 17, 2024 · Understanding the "depth" and breadth of a model's perceptual capabilities is crucial, particularly in applications involving spatial recognition and 3D object interaction.
Probing by linear classifiers: this tutorial showcases how to use linear classifiers to interpret the representations encoded in different layers of a deep neural network. But large models are much more than that. We propose a simple yet effective approach for few-shot segmentation of historical maps, leveraging the rich semantic embeddings of large vision foundation models combined with parameter-efficient fine-tuning. This holds true for both in-distribution (ID) and out-of-distribution (OOD) data. We show greedy learning of low-rank latent codes. Yet, we still observe a large gap between distilled and supervised 3D representations when directly measuring their quality by linear probing: they achieve 45.7% mIoU on the validation set of nuScenes [39] for a MinkUNet [14] with cylindrical voxels [84]. But with good mathematical guarantees: Chernoff bounds ⇒ chaining, linear probing, cuckoo hashing. What are probing classifiers? Probing classifiers are a set of techniques used to analyze the internal representations learned by machine learning models. Related to finetuning in the field of training foundation models is linear probing. Overall, the GPT papers are a valuable resource for understanding the transformer model and its applications in natural language processing.
May 5, 2024 · MOMENT: A Family of Open Time-series Foundation Models. Contents: Abstract; Introduction; Related Works; Methodology (The Time Series Pile, Model Architecture, Pre-training using MTM, Fine-tuning on Downstream Tasks); Experimental Setup and Results; Design choices; RQ1: Effectiveness; RQ2: Interpretability; RQ3: Properties. Abstract: MOMENT is a family of open-source foundation models for general-purpose time-series analysis. Nov 20, 2025 · The learned representations of our model have been applied to standard CXR interpretation tasks, achieving state-of-the-art performance with linear probing. Our extensive ablation studies validate this approach as both computationally lightweight and highly effective for historical document analysis. Our method uses linear classifiers, referred to as "probes", where a probe can only use the hidden units of a given intermediate layer as discriminating features. Oct 24, 2025 · Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup, resulting in suboptimal performance, in particular under linear probing. Aug 17, 2023 · Using extracted images and related labels from pathology-related tweets, a model is trained to associate tissue images and text, and it approaches state-of-the-art performance on clinically relevant tasks. Abstract—Driven by recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and effort have been devoted to building foundation models (FMs) for medical images. Figure 1: (a) Existing pathology foundation model (PFM) pipelines typically rely on linear probing over the global class token, discarding fine-grained local cues from patch-level embeddings and thus losing critical cellular information.
Linear probing, often applied to the final layer of pre-trained models, is limited by its inability to model complex relationships in data. Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing. In this work, we present our scalable training pipeline for large pathology imaging data and a comprehensive analysis. We evaluated our pretrained model using linear probing, by freezing its feature extractor and training a simple linear classifier on top of the learned representations. This is distinct from training a model from scratch using the downstream task dataset exclusively. This helps us better understand the roles and dynamics of the intermediate layers. Abstract—Based on the success of large-scale visual foundation models like CLIP in various downstream tasks, this paper initially attempts to explore their impact on Long-Tailed Semi-Supervised Learning (LTSSL) by employing the foundation model with three strategies: Linear Probing (LP), Lightweight Fine-Tuning (LFT), and Full Fine-Tuning (FFT). Abstract: The two-stage fine-tuning (FT) method, linear probing (LP) then fine-tuning (LP-FT), outperforms linear probing and FT alone. Jun 6, 2025 · Here we introduce PanDerm, a multimodal dermatology foundation model pretrained through self-supervised learning on over 2 million real-world skin disease images from 11 clinical institutions. We introduce MOMENT, a family of open-source foundation models for general-purpose time-series analysis.
However, the non-interpretable, black-box nature of this approach remains a concern. Dec 30, 2024 · Ease of Transfer Learning: pretrained models can be easily fine-tuned or adapted using techniques like linear probing, making them versatile for a variety of use cases. Pre-training large models on time-series data is challenging due to (1) the absence of a large and cohesive public time-series repository, and (2) diverse time-series characteristics which make multi-dataset training onerous. This pre-training-then-fine-tuning paradigm has become standard practice in deep learning. Drawing inspiration from prior work, we build on top of the transformer architecture, which takes disjoint time-series sub-sequences (or patches) as input. One common adaptation strategy is known as "linear probing", where a simple linear model is trained to map a foundation model's representation to logits used for classification. This random feature is understood to have no useful information for predicting the target Y. Jul 22, 2024 · Figure 1: A high-level overview of a cross-view human activity recognition framework featuring pretrained frozen Foundation Models (FM) with linear probing and a temporal fusion mechanism. May 27, 2024 · The two-stage fine-tuning (FT) method, linear probing (LP) then fine-tuning (LP-FT), outperforms linear probing and FT alone. However, adapting these models to various downstream tasks remains challenging, particularly when faced with datasets from different sources and acquisition conditions, as well as limited data availability. Zero-shot: apply to new tasks without any training examples for those specific tasks. Linear probe: train a linear model on the features. Fine-tune: adjust the entire network to perform better on the target task. We previously saw two examples of foundation models suitable for fine-tuning: ImageNet-pretrained models for vision and BERT for language. Jun 14, 2024 · The application of foundation models to active learning (AL) remains underexplored.
Besides being great candidates for establishing well-posed problems, the idea of probing foundation models with synthetic conditional Gaussians is also motivated by the longstanding practice of Gaussian modeling in signal processing [287], data mining [291], machine learning [407, 837, 1051], and other engineering fields. Foundation model? "We introduce the term foundation models to fill a void in describing the paradigm shift we are witnessing. Existing terms (e.g., pretrained model, self-supervised model) partially capture the technical dimension of these models, but fail to capture the significance of the paradigm shift in an accessible manner for those beyond machine learning." Nov 22, 2021 · Moreover, Florence demonstrates outstanding performance in many types of transfer learning: fully sampled fine-tuning, linear probing, few-shot transfer, and zero-shot transfer for novel images and objects. Principle: after training, to evaluate how good the model is, the last layer is replaced with a linear layer. By probing a pre-trained model's internal representations, researchers and data scientists can assess what those representations encode. Jan 28, 2025 · This paper proposes a new federated learning method called FedLP+FT. In this paper, we take a step further and analyze implicit rank regularization in autoencoders. This holds true for both in-distribution (ID) and out-of-distribution (OOD) data. Although probing is typically not used as a standalone approach, our preliminary experiment found that a vanilla probing baseline worked surprisingly well. Probes in the above sense are supervised models whose inputs are frozen parameters of the model we are probing.
In our work, we demonstrate that when lightly finetuning multiple runs from a single foundation model, the choice of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. The idea is to introduce a random feature to the dataset and train a machine learning model. Where we're going: Theorem: using 2-independent hash functions, we can prove an O(√n) expected cost of lookups with linear probing, and there's a matching adversarial lower bound. Oct 1, 2021 · Many scientific fields now use machine-learning tools to assist with complex classification tasks. However, linear probing with frozen features from the backbone limits the application of feature-embedding and diversity-based sample selection. Using probes, machine learning researchers gained a better understanding of the differences between models and between the various layers of a single model. National Institute of Standards and Technology (NIST): Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. However, it is unclear why we should require the encoding of 3D properties to be linear. Masked pre-training is a widely-used self-supervised learning task where a model learns to accurately reconstruct masked portions of its input. Apply machine learning strategies to varied scenarios, expanding your problem-solving toolkit.
2 Linear Classifier Probes. Linear probes (LP) are classifiers (such as multi-layer perceptrons, MLPs) that contribute to deep learning explainability efforts by providing insights into how a model processes information internally [2]. Apr 1, 2017 · Alain and Bengio (2016) first introduced the idea of using linear classifier probes for features at every model layer, and Kim et al. (2019) further developed new probing tasks. Linear probing is a method used in machine learning to improve how models handle new tasks. Linear probing consists of fitting a logistic regression model using representations extracted from frozen foundation models [48]. When a model is first trained on a large amount of data, it learns many useful features. Foundation models are very large models trained on very large datasets that can be used for multiple downstream tasks. We use linear classifiers, which we refer to as "probes", trained entirely independently of the model itself. Technical Report, NIST AI 100-2e2025 (2025). We present stable-pretraining, a modular, extensible, and performance-optimized library built on top of PyTorch, Lightning, and Hugging Face. Linear Probing: in linear probing, the weights in the encoder of the foundation model are frozen, and a linear classifier is added to its output layer.
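The layer-probing recipe above can be sketched in a few lines. This is a toy illustration, not code from any of the cited papers: two fixed random ReLU layers stand in for a trained network's intermediate layers, and the label is a synthetic property assumed for the demo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)  # the (synthetic) property we probe for

# Fixed random weights play the role of a trained network's layers.
W1 = rng.normal(size=(10, 10))
W2 = rng.normal(size=(10, 10))
h1 = np.maximum(X @ W1, 0)   # layer-1 activations
h2 = np.maximum(h1 @ W2, 0)  # layer-2 activations

# Train an independent linear probe on each layer's frozen activations.
accuracies = {}
for name, feats in [("input", X), ("layer1", h1), ("layer2", h2)]:
    probe = LogisticRegression(max_iter=1000).fit(feats, y)
    accuracies[name] = probe.score(feats, y)
```

Comparing the per-layer accuracies indicates where in the network the property becomes (or stops being) linearly decodable; the probes never influence the probed model.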
However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect downstream adaptation. Nov 1, 2025 · We evaluate TITAN on diverse clinical tasks and find that it outperforms both ROI and slide foundation models across machine learning settings, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. Dec 10, 2024 · The two-stage fine-tuning (FT) method, linear probing (LP) then fine-tuning (LP-FT), outperforms linear probing and FT alone. While their simplicity has benefits, it also makes linear probes highly reliant on the expressivity of the foundation models they are trained with. Apr 5, 2023 · Two standard approaches to using these foundation models are linear probing and fine-tuning. One key reason for LP-FT's success is the preservation of pre-trained features, achieved by obtaining a near-optimal linear head during LP. Initially, linear probing (LP) optimizes only the linear head of the model, after which fine-tuning (FT) updates the entire model, including the feature extractor and the linear head. This course equips you with the foundation to thrive as a machine learning enthusiast, data-driven professional, or someone ready to explore the dynamic possibilities of machine learning. Abstract: Advancements in foundation models (FMs) have led to a paradigm shift in machine learning. Linear probing freezes the foundation model and trains a head on top. Furthermore, the model may represent such properties at different, or multiple, locations within the network. This is hard to distinguish from simply fitting a supervised model as usual, with a particular choice for featurization.
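The LP-FT procedure described above can be sketched with a tiny NumPy network. Everything here is a hypothetical stand-in: a fixed random ReLU layer plays the "pretrained encoder", plain gradient descent on the logistic loss plays the training loop, and the two stages differ only in whether the encoder weights are updated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W_enc = rng.normal(size=(8, 8)) * 0.5    # "pretrained" encoder weights (toy)
w_head, b_head = np.zeros(8), 0.0        # linear head

def forward(X, W_enc, w_head, b_head):
    h = np.maximum(X @ W_enc, 0.0)                       # encoder features
    p = 1.0 / (1.0 + np.exp(-(h @ w_head + b_head)))     # sigmoid probabilities
    return h, p

def step(X, y, W_enc, w_head, b_head, lr, update_encoder):
    h, p = forward(X, W_enc, w_head, b_head)
    err = (p - y) / len(y)                # gradient of mean logistic loss wrt logits
    if update_encoder:                    # FT stage only: backprop into the encoder
        dh = np.outer(err, w_head) * (h > 0)
        W_enc = W_enc - lr * (X.T @ dh)
    w_head = w_head - lr * (h.T @ err)
    b_head = b_head - lr * err.sum()
    return W_enc, w_head, b_head

# Stage 1: linear probing (LP) -- encoder frozen, only the head is trained.
for _ in range(300):
    W_enc, w_head, b_head = step(X, y, W_enc, w_head, b_head, 0.5, update_encoder=False)
# Stage 2: fine-tuning (FT) -- starts from the probed head, updates everything.
for _ in range(300):
    W_enc, w_head, b_head = step(X, y, W_enc, w_head, b_head, 0.1, update_encoder=True)

_, p = forward(X, W_enc, w_head, b_head)
accuracy = float(((p > 0.5) == (y > 0.5)).mean())
```

Starting FT from a near-optimal head (rather than a random one) is exactly the mechanism the snippets above credit for LP-FT's feature preservation: early FT gradients into the encoder stay small.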
Oct 14, 2024 · An alternative approach, probing, represents a model by passing a set of learned inputs (probes) through the model and training a predictor on top of the corresponding outputs. We present REVE (Representation for EEG with Versatile Embeddings), a pretrained model explicitly designed to generalize across diverse EEG signals. Pretrained model representations are commonly evaluated with linear probes. Dec 16, 2024 · When a model makes a correct prediction on a task it has been trained on (known as a 'downstream task'), probing classifiers can be used to identify whether the model actually contains the relevant information or knowledge required to make that prediction, or whether it is just making a lucky guess. Simple tabulation ("uniting theory and practice"): simple and fast enough for practice. We study fine-grained activity understanding and cross-view generalization of different image- and video-based FMs and implement different techniques for linking temporal frame-level information. Linear probing helps in applying these learned features to a new task without losing the information stored during the initial training. Abstract: This paper introduces Kolmogorov-Arnold Networks (KAN) as an enhancement to the traditional linear probing method in transfer learning. Furthermore, probes can be used to identify the specific components of the model that contain this information. Sep 30, 2023 · The Probe method is a highly intuitive approach to feature selection. In simpler terms, it tries to draw a straight line that best fits the data points. In neuroscience, automatic classifiers may be useful.
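The Probe method for feature selection mentioned above (inject a random feature, train, then compare importances) can be sketched as follows. The data is synthetic and the random-forest importance measure is just one assumed choice of model:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
informative = rng.normal(size=(n, 2))            # two genuinely predictive features
y = (informative[:, 0] + informative[:, 1] > 0).astype(int)

# The "probe": a random feature known to carry no information about y.
probe = rng.normal(size=(n, 1))
X = np.hstack([informative, probe])

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = model.feature_importances_

# Any real feature whose importance falls below the probe's is a removal candidate.
keep = [i for i in range(2) if importances[i] > importances[2]]
```

The probe's importance acts as an empirical noise floor; repeating with several probes and averaging makes the threshold more robust.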
Sep 19, 2024 · Discussion and Opinion: linear probing and non-linear probing are great ways to identify whether certain properties are linearly separable in feature space, and they are good indicators that this information could be used for future token prediction. A recent study titled "Probing the 3D Awareness of Visual Foundation Models" offers significant insights into how well current visual foundation models perceive and interpret three-dimensional space from two-dimensional inputs. Apr 23, 2025 · The experiments aimed to identify the optimal combination of visual-semantic foundation model and margin-based metric learning loss for learning discriminative image embeddings, enabling universal instance-level image retrieval. Sep 19, 2022 · Lecture by Sergey Karayev. Oct 25, 2024 · This guide explores how adding a simple linear classifier to intermediate layers can reveal the encoded information and features critical for various tasks. Your students can learn about the importance of pre-training, transfer learning, and hyperparameter tuning, as well as gain insights into the latest state-of-the-art techniques for language modeling. One common adaptation strategy is known as "linear probing", where a simple linear model is trained to map a foundation model's representation to logits used for classification. What is Linear Regression? Linear Regression is a supervised machine learning algorithm used for predicting a continuous target variable based on one or more input features. It assumes a linear relationship between the input features and the target variable. Linear probing is useful for semantic tasks, since linear separability of classes is a desired and expected property. Oct 13, 2025 · All of these properties are critical for our vision foundation model to serve general-purpose vision tasks.
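The linear regression definition above can be made concrete: with one input feature, ordinary least squares finds the best-fitting line directly. A minimal NumPy example on noiseless synthetic data:

```python
import numpy as np

# Fit y = w*x + b by ordinary least squares on a tiny synthetic dataset.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0  # noiseless, so the fit should recover w = 2, b = 1 exactly

A = np.column_stack([x, np.ones_like(x)])  # design matrix with an intercept column
(w, b), residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
```

The same least-squares machinery is what a linear probe uses for regression targets; for classification targets the squared loss is swapped for the logistic loss.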
When a collision occurs (i.e., when two keys hash to the same index), linear probing searches for the next available slot in the hash table by incrementing the index until an empty slot is found. Learn about the construction, utilization, and insights gained from linear probes, alongside their limitations and challenges. Jun 14, 2024 · We demonstrate that PEAL improves transfer learning performance and efficiency with foundation models, as compared to linear probing. Mar 26, 2024 · Understanding Linear Regression: From Mathematics to Machine Learning Model. Have you ever wondered how AI systems predict future trends or values based on historical data? Jan 22, 2024 · In-context learning (ICL) is a new paradigm for natural language processing that utilizes Generative Pre-trained Transformer (GPT)-like models. Jan 16, 2025 · The reinforcement learning (RL) algorithm updates the SFT foundation model with the predicted reward. Sep 26, 2022 · Notes on paper reading, zero-shot experiments (direct inference), and linear-probe experiments (freezing CLIP to extract features and training only the classification layer). Although MOTOR also provides open weights, we could not evaluate its transportability due to insufficient information on lab measurement units in the original pre-training datasets. To validate that our ensemble strategy is non-trivial, we build baselines for comparison: (1) Ens-LP, short for the ensemble of linear probing, where we apply linear probing to each pre-trained model and take the average of their output probabilities for prediction; (2) Ens-LP†. Dec 11, 2022 · Surprisingly, even without any ground-truth labels, transductive linear probing with self-supervised graph contrastive pretraining can outperform state-of-the-art fully supervised meta-learning-based methods under the same protocol. E.g., a linear model on top (called linear probing): our self-supervised learning example. Analyzing Linear Probing: when looking at k-independent hash functions, the analysis of linear probing gets significantly more complex.
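The hash-table variant of linear probing described above can be sketched as a minimal open-addressing table (fixed capacity, no resizing, illustration only):

```python
class LinearProbingHashTable:
    """Open-addressing hash table using linear probing for collision resolution."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds a (key, value) pair or None

    def _index(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        i = self._index(key)
        # On a collision, step to the next slot until we find the key or a hole.
        for _ in range(self.capacity):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % self.capacity
        raise RuntimeError("hash table is full")

    def get(self, key):
        i = self._index(key)
        # Follow the same probe sequence; an empty slot means the key is absent.
        for _ in range(self.capacity):
            if self.slots[i] is None:
                raise KeyError(key)
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.capacity
        raise KeyError(key)
```

Deletion is deliberately omitted: with open addressing it needs tombstone markers, otherwise removing an entry breaks the probe chains that pass through its slot.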
Our method exhibits effective 3D reconstruction capability. The basic idea is simple: a classifier is trained to predict some linguistic property from a model's representations, and this approach has been used to examine a wide variety of models and properties. Feb 1, 2023 · Keywords: machine learning, unsupervised learning, reinforcement learning, computer vision. TL;DR: Our paper proposes linear reward probing as an efficient method to evaluate the quality of pretrained representations in the RL setting, and demonstrates its positive correlation with downstream RL performance. Traditional Machine Learning vs Foundation Models. The method adopts a two-stage strategy: in the first stage, the linear head of the model is trained using linear probing; in the second stage, fine-tuning updates the entire model following the traditional federated learning approach. Jan 19, 2025 · Changes to pre-trained features are minimized. However, linear probing with frozen features from the backbone limits the application of feature-embedding and diversity-based sample selection. To address this, we propose substituting the linear probing layer with KAN, which leverages spline-based representations. In machine learning, a linear classifier makes a classification decision for each object based on a linear combination of its features. Potential harms of foundation models include generating offensive or untruthful content and enabling the spread of misinformation. These classifiers aim to understand how a model processes and encodes different aspects of input data, such as syntax, semantics, and other linguistic features. 2 Linear Probing: Linear probing consists of fitting a logistic regression model using representations extracted from frozen foundation models [48].
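That definition translates almost directly into scikit-learn. The embeddings below are random stand-ins; in practice they would come from a forward pass through a frozen encoder such as CLIP or DINOv2:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-in for embeddings extracted once from a frozen foundation model.
n, d = 200, 32
embeddings = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
labels = (embeddings @ w_true > 0).astype(int)  # linearly separable by construction

# The linear probe: a logistic regression head; the "encoder" is never updated.
probe = LogisticRegression(max_iter=1000).fit(embeddings, labels)
accuracy = probe.score(embeddings, labels)
```

Because only the head is trained, the embeddings can be computed once, cached, and reused, which is what makes linear probing so much cheaper than fine-tuning for comparing pretraining choices.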
The linear classifier is trained using the labeled ECG dataset to evaluate the quality of the learned representations without updating the weights in the encoder of the foundation model. This common approach is (i) less computationally expensive and data-hungry than further fine-tuning, and (ii) enables evaluating pretraining design choices on downstream performance. Oct 3, 2024 · We previously discussed freezing our model and using just some trainable heads, e.g., a linear model on top (called linear probing). Recently, numerous SSL algorithms have been proposed to address this challenge by automatically generating pseudo-labels for unlabeled samples using the model. Sep 13, 2024 · This paper introduces Kolmogorov-Arnold Networks (KAN) as an enhancement to the traditional linear probing method in transfer learning. At least some of the information that we identify is likely to be stored in the probe model. Deep linear networks trained with gradient descent yield low-rank solutions, as is typically studied in matrix factorization. Sep 28, 2024 · Nevertheless, this suggests that for some problems, linear probing of a foundation model may be a better alternative than training a model from scratch with a small dataset. Purpose: this self-supervised model evaluation method is a way to test the performance of a pretrained model, also known as linear probing evaluation. Mar 12, 2025 · With the popularity of foundation models, recent years have witnessed a paradigm shift in deep learning from task-centric model design to task-agnostic representation learning and task-specific fine-tuning. After training the ML model, extract the feature importances. However, despite the widespread use of foundation models, their application to active learning (AL) remains underexplored. A recent work [4], which is more closely related to this research, investigates the use of vision foundation models in an active learning context through linear probing.
However, applying ICL in real cases does not scale with the number of samples and lacks robustness to different prompts. Abstract: Foundation models are usually pre-trained on large-scale datasets and then adapted to different downstream tasks through tuning. The rich, expressive feature representations from these pre-trained, large-scale FMs are leveraged for multiple downstream tasks, usually via lightweight fine-tuning of a shallow fully-connected network following the representation. Oct 5, 2016 · Neural network models have a reputation for being black boxes. Motivated by the efficacy of the test-time linear probe in assessing representation quality, we aim to design a linear probing classifier in training to measure the discrimination of a neural network, and further leverage the probing signal to empower representation learning. 1) Linear probing identifies linearly separable opposing concepts during early pre-training; 2) steering vectors are developed to enhance LLMs' trustworthiness; 3) probing LLMs with mutual information reveals two periods of pre-training that can be utilized to enhance trustworthiness. Sep 12, 2024 · This paper introduces Kolmogorov-Arnold Networks (KAN) as an enhancement to the traditional linear probing method in transfer learning. Dec 1, 2023 · This family of approaches includes, among others, linear probing [23], where only a linear layer stacked on top of pre-trained features is updated, or adapters [2, 17, 27], which are trainable, compact feed-forward networks that are inserted between the layers of a fixed pre-trained model. A simpler definition is to say that a linear classifier is one whose decision boundaries are linear.
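The adapter idea mentioned above can be sketched in NumPy. The shapes and the zero-initialized up-projection are common conventions assumed here for illustration, not taken from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 16, 4   # bottleneck width is the adapter's main size knob

# A hypothetical adapter: down-project, nonlinearity, up-project, residual add.
# Only W_down/W_up would be trained; the surrounding pretrained layers stay frozen.
W_down = rng.normal(size=(d_model, d_bottleneck)) * 0.1
W_up = np.zeros((d_bottleneck, d_model))  # zero init -> adapter starts as identity

def adapter(h):
    return h + np.maximum(h @ W_down, 0.0) @ W_up

h = rng.normal(size=(8, d_model))          # hidden states from a frozen layer
out = adapter(h)
```

At initialization the adapter is an exact identity, so inserting it cannot hurt the frozen model, and it adds only 2 * d_model * d_bottleneck parameters (128 here) versus d_model * d_model (256) for a full layer.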
Finetuning: fine-tuning refers to a process in machine learning where a pre-trained model is further trained on a specific dataset to adapt its parameters to a downstream task characterized by a relevant domain. Introduction: semi-supervised learning (SSL) has emerged as a prominent learning paradigm of machine learning, which aims to train models using a combination of a large amount of unlabeled data and a limited number of labeled samples. Published September 19, 2022. This method has been extensively analyzed and enhanced [50, 46, 16, 26]. We propose to monitor the features at every layer of a model and measure how suitable they are for classification. Various methods can be used to adapt foundation models, including linear probing, fine-tuning, lightweight fine-tuning, prefix tuning, prompt tuning, zero-shot prompting, and in-context learning. Masked Representation Learning. This paper was accepted at the workshop on Overparameterization: Pitfalls and Opportunities at the ICML 2021 conference. We empirically show the effectiveness of PEAL for both uncertainty-based and diversity-based sample selection methods, with extensive experiments on large-scale image-classification datasets utilizing DINOv2 [14] as the foundation backbone. (b) Supervised-learning-based methods use loss functions constructed from human feedback. Aug 15, 2024 · Data scarcity is a major limiting factor for applying modern machine learning techniques to clinical tasks. May 1, 2025 · In computational pathology, several foundation models have recently been developed, demonstrating enhanced learning capability for analyzing pathology images. Moreover, these probes cannot affect the training phase of a model, and they are generally added after training. This approach uses prompts that include in-context demonstrations to generate the corresponding output for a new query input.
To address this, we propose substituting the linear probing layer with KAN, which leverages spline-based representations. First linear probing (LP), then fine-tuning (FT): FT starts from the optimized linear layer (classifier). 3 days ago · Abstract: Foundation models and self-supervised learning (SSL) have become central to modern AI, yet research in this area remains hindered by complex codebases, redundant re-implementations, and the heavy engineering burden of scaling experiments.