Reference for ultralytics/nn/text_model.py
Note
This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/text_model.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!
ultralytics.nn.text_model.TextModel
Bases: Module
Abstract base class for text encoding models.
This class defines the interface for text encoding models used in vision-language tasks. Subclasses must implement the tokenize and encode_text methods.
Methods:
Name | Description |
---|---|
tokenize | Convert input texts to tokens. |
encode_text | Encode tokenized texts into feature vectors. |
encode_text (abstractmethod)
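The interface contract described above (subclasses must provide both tokenize and encode_text) can be illustrated with a minimal, dependency-free sketch. The names TextEncoderInterface and ToyEncoder are hypothetical stand-ins, not Ultralytics classes. The real TextModel derives from Module, whereas this sketch uses abc.ABC so that it runs without PyTorch.

```python
from abc import ABC, abstractmethod


class TextEncoderInterface(ABC):
    """Hypothetical stand-in for the TextModel interface (not the real class)."""

    @abstractmethod
    def tokenize(self, texts):
        """Convert input texts to tokens."""

    @abstractmethod
    def encode_text(self, tokens):
        """Encode tokenized texts into feature vectors."""


class ToyEncoder(TextEncoderInterface):
    """Toy subclass: whitespace tokens, one length-based feature per token."""

    def tokenize(self, texts):
        # Lowercase each text and split it on whitespace.
        return [text.lower().split() for text in texts]

    def encode_text(self, tokens):
        # Map every token to a single float (its length); purely illustrative.
        return [[float(len(word)) for word in words] for words in tokens]


encoder = ToyEncoder()
tokens = encoder.tokenize(["A red car"])
features = encoder.encode_text(tokens)
```

Because the sketch uses abc.ABC, instantiating TextEncoderInterface directly raises TypeError until both methods are implemented.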
ultralytics.nn.text_model.CLIP
Bases: TextModel
OpenAI CLIP text encoder implementation.
This class implements the TextModel interface using OpenAI's CLIP model for text encoding.
Attributes:
Name | Type | Description |
---|---|---|
model | CLIP | The loaded CLIP model. |
device | device | Device where the model is loaded. |
Methods:
Name | Description |
---|---|
tokenize | Convert input texts to CLIP tokens. |
encode_text | Encode tokenized texts into normalized feature vectors. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | str | Model size identifier (e.g., 'ViT-B/32'). | required |
device | device | Device to load the model on. | required |
encode_text
Encode tokenized texts into normalized feature vectors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | Tensor | Tokenized text inputs. | required |
dtype | dtype | Data type for output features. | float32 |
Returns:
Type | Description |
---|---|
Tensor | Normalized text feature vectors. |
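As a usage sketch of the CLIP class documented above, the helper below chains the constructor, tokenize, and encode_text. The helper name encode_prompts is hypothetical. It assumes that ultralytics and torch are installed, that the weights for the chosen size can be loaded, and that tokenize accepts a list of strings; none of these are confirmed by this page.

```python
def encode_prompts(prompts, size="ViT-B/32", device_name="cpu"):
    """Return normalized text features for a list of prompts (hypothetical helper)."""
    # Imports live inside the function so that defining it needs no heavy dependencies.
    import torch
    from ultralytics.nn.text_model import CLIP

    device = torch.device(device_name)
    model = CLIP(size, device)  # size and device are both required parameters
    tokens = model.tokenize(prompts)  # assumption: a list of strings is accepted
    return model.encode_text(tokens, dtype=torch.float32)
```

Calling encode_prompts(["a photo of a cat"]) should then return one feature vector per prompt, subject to the assumptions above.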
ultralytics.nn.text_model.MobileCLIP
Bases: TextModel
Apple MobileCLIP text encoder implementation.
This class implements the TextModel interface using Apple's MobileCLIP model for efficient text encoding.
Attributes:
Name | Type | Description |
---|---|---|
model | MobileCLIP | The loaded MobileCLIP model. |
tokenizer | callable | Tokenizer function for processing text inputs. |
device | device | Device where the model is loaded. |
config_size_map | dict | Mapping from size identifiers to model configuration names. |
Methods:
Name | Description |
---|---|
tokenize | Convert input texts to MobileCLIP tokens. |
encode_text | Encode tokenized texts into normalized feature vectors. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | str | Model size identifier (e.g., 's0', 's1', 's2', 'b', 'blt'). | required |
device | device | Device to load the model on. | required |
encode_text
Encode tokenized texts into normalized feature vectors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | Tensor | Tokenized text inputs. | required |
dtype | dtype | Data type for output features. | float32 |
Returns:
Type | Description |
---|---|
Tensor | Normalized text feature vectors. |
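To make "normalized feature vectors" concrete, here is a small pure-Python sketch. It assumes that normalization means scaling each vector to unit L2 (Euclidean) length, which this page does not state explicitly. The function names are illustrative and are not part of the library.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity reduces to the dot product."""
    return sum(x * y for x, y in zip(a, b))


unit = l2_normalize([3.0, 4.0])
same_direction = cosine_similarity(unit, l2_normalize([6.0, 8.0]))
```

Under that assumption, comparing two encoded texts needs only a dot product, which is why unit-length features are convenient for similarity scoring.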
ultralytics.nn.text_model.build_text_model
Build a text encoding model based on the specified variant.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
variant | str | Model variant in format "base:size" (e.g., "clip:ViT-B/32" or "mobileclip:s0"). | required |
device | device | Device to load the model on. | None |
Returns:
Type | Description |
---|---|
TextModel | Instantiated text encoding model. |
Raises:
Type | Description |
---|---|
AssertionError | If the specified variant is not supported. |
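The "base:size" convention and the documented AssertionError can be sketched with a small parser. This is a hypothetical helper (parse_variant), not the library's implementation. It only mirrors the documented behavior: two supported bases and an assertion failure for anything else.

```python
SUPPORTED_BASES = ("clip", "mobileclip")  # the two bases named in the examples above


def parse_variant(variant):
    """Split a 'base:size' string; hypothetical helper, not part of ultralytics."""
    # partition splits on the first colon only, so the size may contain any characters.
    base, sep, size = variant.partition(":")
    assert sep and size, f"expected 'base:size', got {variant!r}"
    assert base in SUPPORTED_BASES, f"unsupported variant base: {base!r}"
    return base, size


clip_variant = parse_variant("clip:ViT-B/32")
mobile_variant = parse_variant("mobileclip:s0")
```

A dispatcher built on this would then instantiate CLIP or MobileCLIP with the parsed size and the requested device.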