TextSimilarity usecases
-
class
previsionio.text_similarity.
DescriptionsColumnConfig
(content_column, id_column)¶ Bases:
previsionio.usecase_config.UsecaseConfig
Description Column configuration for starting a usecase: this object defines the role of specific columns in the dataset.
Parameters:
-
class
previsionio.text_similarity.
ModelEmbedding
¶ Bases:
enum.Enum
Embedding models for Text Similarity
-
TFIDF
= 'tf_idf'¶ Term Frequency - Inverse Document Frequency
-
Transformer
= 'transformer'¶ Transformer
-
TransformerFineTuned
= 'transformer_fine_tuned'¶ Fine-tuned Transformer
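The members above behave like ordinary Python enum members. The sketch below is a local stand-in that mirrors the three documented values with the standard `enum` module; it does not import `previsionio`, and the class defined here is an illustrative copy rather than the library class.

```python
from enum import Enum


class ModelEmbedding(Enum):
    """Local stand-in mirroring the documented members (not the library class)."""
    TFIDF = 'tf_idf'
    Transformer = 'transformer'
    TransformerFineTuned = 'transformer_fine_tuned'


# Look up a member from its string value.
embedding = ModelEmbedding('transformer_fine_tuned')

# Each member exposes its Python name and its underlying string value.
member_name = embedding.name
member_value = embedding.value

# Collect every accepted string value, in definition order.
accepted_values = [member.value for member in ModelEmbedding]
```

Passing a string that is not one of the listed values to the enum constructor raises `ValueError`, which makes the enum a convenient way to validate a configuration value early.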
-
-
class
previsionio.text_similarity.
ModelsParameters
(model_embedding: previsionio.text_similarity.ModelEmbedding = <ModelEmbedding.TFIDF: 'tf_idf'>, preprocessing: previsionio.text_similarity.Preprocessing = <previsionio.text_similarity.Preprocessing object>, models: List[previsionio.text_similarity.TextSimilarityModels] = [<TextSimilarityModels.BruteForce: 'brute_force'>])¶ Bases:
previsionio.usecase_config.UsecaseConfig
Training configuration that holds the relevant data for a usecase description: the desired feature engineering, the selected models, the training speed…
Parameters: - preprocessing (Preprocessing, optional) –
Dictionary of the text preprocessing steps to apply (only used with the “tf_idf” embedding model):
- word_stemming: defaults to “yes”
- ignore_stop_word: defaults to “auto”, in which case the choice depends on whether the text descriptions contain full sentences
- ignore_punctuation: defaults to “no”
- model_embedding (ModelEmbedding, optional) – Name of the embedding model to be used (among: “tf_idf”, “transformer”, “transformer_fine_tuned”).
- models (list(TextSimilarityModels), optional) – Names of the searching models to be used (among: “brute_force”, “cluster_pruning”, “ivfopq”, “hkm”, “lsh”).
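To make the default combination of a “tf_idf” embedding and a “brute_force” search concrete, here is a minimal pure-Python sketch of the general technique. It is not the platform's implementation: the function names, the smoothed IDF formula, and the whitespace tokenizer are assumptions chosen for illustration, and word stemming and stop-word handling are left out.

```python
import math
import string
from collections import Counter


def tokenize(text, ignore_punctuation=True):
    """Lowercase and split on whitespace, optionally stripping punctuation."""
    if ignore_punctuation:
        text = text.translate(str.maketrans('', '', string.punctuation))
    return text.lower().split()


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the corpus are dropped."""
    counts = Counter(term for term in tokens if term in idf)
    total = sum(counts.values())
    return {term: (count / total) * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, descriptions, top_k=2, ignore_punctuation=True):
    """Score the query against every description and return the top_k ids."""
    ids = list(descriptions)
    token_lists = [tokenize(descriptions[i], ignore_punctuation) for i in ids]
    idf = build_idf(token_lists)
    doc_vectors = [vectorize(tokens, idf) for tokens in token_lists]
    query_vector = vectorize(tokenize(query, ignore_punctuation), idf)
    ranked = sorted(zip(ids, doc_vectors),
                    key=lambda pair: cosine(query_vector, pair[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical descriptions keyed by id, standing in for a descriptions dataset.
descriptions = {
    'a': 'Red cotton t-shirt, short sleeves.',
    'b': 'Blue denim jeans with slim fit.',
    'c': 'Red wool sweater, long sleeves.',
}
best_match = brute_force_search('red sweater', descriptions, top_k=1)
```

Brute-force search compares the query with every description, so it is exact but scales linearly with the number of descriptions; the other listed search models trade some accuracy for speed on large datasets.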
-
class
previsionio.text_similarity.
QueriesColumnConfig
(queries_dataset_content_column, queries_dataset_matching_id_description_column, queries_dataset_id_column=None)¶ Bases:
previsionio.usecase_config.UsecaseConfig
Queries column configuration for starting a usecase: this object defines the role of specific columns in the queries dataset.
Parameters:
-
class
previsionio.text_similarity.
TextSimilarity
(**usecase_info)¶ Bases:
previsionio.usecase_version.BaseUsecaseVersion
A text similarity usecase version
-
model_class
¶
-
new_version
(description: str = None, dataset: previsionio.dataset.Dataset = None, description_column_config: previsionio.text_similarity.DescriptionsColumnConfig = None, metric: previsionio.metrics.TextSimilarity = None, top_k: int = None, lang: previsionio.text_similarity.TextSimilarityLang = <TextSimilarityLang.Auto: 'auto'>, queries_dataset: previsionio.dataset.Dataset = None, queries_column_config: Optional[previsionio.text_similarity.QueriesColumnConfig] = None, models_parameters: previsionio.text_similarity.ListModelsParameters = None, **kwargs) → previsionio.text_similarity.TextSimilarity¶ Start a text similarity usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.
Parameters: - description (str, optional) – Additional description of the version
- dataset (Dataset, DatasetImages, optional) – Reference to the dataset object to use as the training dataset
- description_column_config (DescriptionsColumnConfig, optional) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
- metric (metrics.TextSimilarity, optional) – Specific metric to use for the usecase (default: None)
- holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (TrainingConfig, optional) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns: Newly created text similarity usecase version object (new version)
Return type: previsionio.text_similarity.TextSimilarity
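The "copied, then overridden" behaviour described for new_version can be pictured with a small sketch. This is an illustration of the idea only, not the library's code: the helper name `merge_version_config` is hypothetical, and treating `None` as "keep the current value" is an assumption based on the `None` defaults in the signature.

```python
def merge_version_config(current_config, **overrides):
    """Copy the current version's settings, then apply explicitly given overrides."""
    new_config = dict(current_config)  # shallow copy; the current version is untouched
    for name, value in overrides.items():
        if value is not None:  # assumption: None means "keep the current value"
            new_config[name] = value
    return new_config


# Hypothetical settings of the current version.
current = {'top_k': 10, 'lang': 'auto', 'metric': None}

# Only top_k is overridden; lang is passed as None and therefore kept.
updated = merge_version_config(current, top_k=5, lang=None)
```

Under these assumptions, a call that passes only the arguments to change produces a new version that otherwise matches the current one.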
-
-
class
previsionio.text_similarity.
TextSimilarityModels
¶ Bases:
enum.Enum
Similarity search models for Text Similarity
-
BruteForce
= 'brute_force'¶ Brute force search
-
ClusterPruning
= 'cluster_pruning'¶ Cluster Pruning
-
HKM
= 'hkm'¶ Hierarchical K-Means
-
IVFOPQ
= 'ivfopq'¶ InVerted File system and Optimized Product Quantization
-
LSH
= 'lsh'¶ Locality Sensitive Hashing
-