TextSimilarity experiments
-
class
previsionio.text_similarity.
DescriptionsColumnConfig
(content_column, id_column)¶ Bases:
previsionio.experiment_config.ExperimentConfig
Description Column configuration for starting an experiment: this object defines the role of specific columns in the dataset.
Parameters:
-
class
previsionio.text_similarity.
ModelEmbedding
¶ Bases:
enum.Enum
Embedding models for Text Similarity
-
TFIDF
= 'tf_idf'¶ Term Frequency - Inverse Document Frequency
-
Transformer
= 'transformer'¶ Transformer
-
TransformerFineTuned
= 'transformer_fine_tuned'¶ Fine-tuned Transformer
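Since ModelEmbedding derives from enum.Enum, a member can be looked up from its string value. The sketch below uses a local stand-in enum that only mirrors the three values listed above, so that it runs without the SDK installed; with the SDK, the class would presumably be imported from previsionio.text_similarity instead.

```python
from enum import Enum


class ModelEmbedding(Enum):
    """Local stand-in mirroring the documented ModelEmbedding values."""
    TFIDF = 'tf_idf'
    Transformer = 'transformer'
    TransformerFineTuned = 'transformer_fine_tuned'


# Look a member up by its string value (standard enum.Enum behaviour).
embedding = ModelEmbedding('transformer')
print(embedding.name, embedding.value)

# Collect every accepted string value, in declaration order.
accepted_values = [member.value for member in ModelEmbedding]
```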
-
-
class
previsionio.text_similarity.
ModelsParameters
(model_embedding: previsionio.text_similarity.ModelEmbedding = <ModelEmbedding.TFIDF: 'tf_idf'>, preprocessing: previsionio.text_similarity.Preprocessing = <previsionio.text_similarity.Preprocessing object>, models: List[previsionio.text_similarity.TextSimilarityModels] = [<TextSimilarityModels.BruteForce: 'brute_force'>])¶ Bases:
previsionio.experiment_config.ExperimentConfig
Training configuration that holds the relevant data for an experiment description: the wanted feature engineering, the selected models, the training speed…
Parameters: - preprocessing (Preprocessing, optional) – Dictionary of the text preprocessing steps to apply (only for the “tf_idf” embedding model):
  - word_stemming: defaults to “yes”
  - ignore_stop_word: defaults to “auto”; the choice is made depending on whether the text descriptions contain full sentences or not
  - ignore_punctuation: defaults to “no”
- model_embedding (ModelEmbedding, optional) – Name of the embedding model to be used (among: “tf_idf”, “transformer”, “transformer_fine_tuned”).
- models (list(TextSimilarityModels), optional) – Names of the searching models to be used (among: “brute_force”, “cluster_pruning”, “ivf_opq”, “hkm”, “lsh”).
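The documented defaults can be summarised with a small sketch. The classes below (PreprocessingSketch, ModelsParametersSketch and the two enums) are hypothetical local stand-ins that only mirror the defaults and values stated in this section; they are not the SDK classes and say nothing about how the SDK stores or sends this configuration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ModelEmbedding(Enum):
    TFIDF = 'tf_idf'
    Transformer = 'transformer'
    TransformerFineTuned = 'transformer_fine_tuned'


class TextSimilarityModels(Enum):
    BruteForce = 'brute_force'
    ClusterPruning = 'cluster_pruning'
    HKM = 'hkm'
    IVFOPQ = 'ivf_opq'
    LSH = 'lsh'


@dataclass
class PreprocessingSketch:
    # Hypothetical stand-in; the defaults follow the documented values.
    word_stemming: str = 'yes'
    ignore_stop_word: str = 'auto'
    ignore_punctuation: str = 'no'


@dataclass
class ModelsParametersSketch:
    # Hypothetical stand-in mirroring the documented signature defaults.
    model_embedding: ModelEmbedding = ModelEmbedding.TFIDF
    preprocessing: PreprocessingSketch = field(default_factory=PreprocessingSketch)
    models: List[TextSimilarityModels] = field(
        default_factory=lambda: [TextSimilarityModels.BruteForce])


# Defaults: tf_idf embedding, default preprocessing, brute-force search only.
default_params = ModelsParametersSketch()

# Override the embedding and request two searching models.
custom_params = ModelsParametersSketch(
    model_embedding=ModelEmbedding.Transformer,
    models=[TextSimilarityModels.BruteForce, TextSimilarityModels.LSH],
)
```

Note that, per the parameter description, the preprocessing options only matter when the embedding model is “tf_idf”.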
-
class
previsionio.text_similarity.
QueriesColumnConfig
(queries_dataset_content_column, queries_dataset_matching_id_description_column, queries_dataset_id_column=None)¶ Bases:
previsionio.experiment_config.ExperimentConfig
Queries column configuration for starting an experiment: this object defines the role of specific columns in the queries dataset.
Parameters:
-
class
previsionio.text_similarity.
TextSimilarity
(**experiment_version_info)¶ Bases:
previsionio.experiment_version.BaseExperimentVersion
A text similarity experiment version
-
best_model
¶ Get the model with the best predictive performance over all models (including Blend models), where the best performance corresponds to a minimal loss.
Returns: Model with the best performance in the experiment, or None if no model matched the search filter.
Return type: (Model, None)
-
dataset
¶ Get the Dataset object corresponding to the training dataset of this experiment version.
Returns: Associated training dataset Return type: Dataset
-
delete
()¶ Delete an experiment version from the current [client] workspace.
Raises: - PrevisionException – If the experiment version does not exist
- requests.exceptions.ConnectionError – Error processing the request
-
delete_prediction
(prediction_id: str)¶ Delete one prediction of the current experiment from the current [client] workspace.
Parameters: prediction_id (str) – Unique id of the prediction to delete Returns: Deletion process results Return type: dict
-
delete_predictions
()¶ Delete all predictions of the current experiment from the current [client] workspace.
Returns: Deletion process results Return type: dict
-
done
¶ Get a flag indicating whether or not the experiment is currently done.
Returns: done status Return type: bool
-
fastest_model
¶ Get the model that predicts with the lowest response time.
Returns: Model object – corresponding to the fastest model
-
classmethod
from_id
(_id: str) → previsionio.text_similarity.TextSimilarity¶ Get a text-similarity experiment version from the platform by its unique id.
Parameters: _id (str) – Unique id of the experiment version to retrieve Returns: Fetched experiment version Return type: TextSimilarity
Raises: PrevisionException
– Any error while fetching data from the platform or parsing result
-
get_holdout_predictions
(full: bool = False)¶ Retrieve the list of holdout predictions for the current experiment from the client workspace (with the full prediction objects if necessary).
Parameters: full (bool) – If true, return full holdout prediction objects (else only metadata)
-
get_predictions
(full: bool = False)¶ Retrieve the list of predictions for the current experiment from the client workspace (with the full prediction objects if necessary).
Parameters: full (bool) – If true, return full prediction objects (else only metadata)
-
model_class
¶
-
models
¶ Get the list of models generated for the current experiment version. Only the models that are done training are retrieved.
Returns: List of models found by the platform for the experiment Return type: list( Model
)
-
new_version
(dataset: previsionio.dataset.Dataset = None, description_column_config: previsionio.text_similarity.DescriptionsColumnConfig = None, metric: previsionio.metrics.TextSimilarity = None, top_k: int = None, lang: previsionio.text_similarity.TextSimilarityLang = None, queries_dataset: previsionio.dataset.Dataset = None, queries_column_config: Optional[previsionio.text_similarity.QueriesColumnConfig] = None, models_parameters: previsionio.text_similarity.ListModelsParameters = None, description: str = None) → previsionio.text_similarity.TextSimilarity¶ Start a new text-similarity experiment version training from this version (on the platform). The training parameters are copied from the current version and then overridden for those provided.
Parameters: - dataset (Dataset) – Reference to the dataset object to use as the training dataset
- description_column_config (DescriptionsColumnConfig) – Description column configuration (see the documentation of the DescriptionsColumnConfig resource for more details on each possible column type)
- metric (metrics.TextSimilarity, optional) – Specific metric to use for the experiment
- top_k (int, optional) – top_k
- lang (TextSimilarityLang, optional) – Language of the training dataset
- queries_dataset (Dataset, optional) – Reference to a dataset object to use as a queries dataset
- queries_column_config (QueriesColumnConfig) – Queries column configuration (see the documentation of the QueriesColumnConfig resource for more details on each possible column type)
- models_parameters (ListModelsParameters) – Specific training configuration (see the documentation of the ListModelsParameters resource for more details on all the parameters)
- description (str, optional) – The description of this experiment version (default: None)
Returns: Newly created text-similarity experiment version object (new version)
Return type: TextSimilarity
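The "copied from the current version and then overridden" behaviour can be pictured as a dictionary merge. The helper below, merge_version_params, is a hypothetical illustration and not part of the SDK; it assumes that an argument left at None means "keep the value of the current version", which the None defaults in the signature suggest but this page does not state explicitly. The parameter values are placeholders.

```python
def merge_version_params(current: dict, overrides: dict) -> dict:
    """Return a copy of `current` in which every override that is not None wins."""
    merged = dict(current)
    merged.update({key: value for key, value in overrides.items()
                   if value is not None})
    return merged


# Placeholder parameters standing in for the current version's configuration.
current_params = {'top_k': 10, 'lang': 'en', 'description': 'first version'}

# Only top_k is provided; the other arguments stay at None and are inherited.
new_params = merge_version_params(
    current_params,
    {'top_k': 5, 'lang': None, 'description': None},
)
```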
-
print_info
()¶ Print all info on the experiment.
-
queries_dataset
¶ Get the Dataset object corresponding to the queries dataset of this experiment version.
Returns: Associated queries dataset Return type: Dataset
-
running
¶ Get a flag indicating whether or not the experiment is currently running.
Returns: Running status Return type: bool
-
score
¶ Get the current score of the experiment (i.e. the score of the model that is currently considered the best performance-wise for this experiment).
Returns: Experiment score (or infinity if not available). Return type: float
-
status
¶ Get a flag indicating whether or not the experiment is currently running.
Returns: Running status Return type: bool
-
stop
()¶ Stop an experiment (stopping all nodes currently in progress).
-
train_dataset
¶ Get the Dataset object corresponding to the training dataset of the experiment.
Returns: Associated training dataset Return type: Dataset
-
update_status
()¶ Get an update on the status of a resource.
Parameters: specific_url (str, optional) – Specific (already parametrized) URL to fetch the resource from (otherwise the URL is built from the resource type and unique _id)
Returns: Updated status info Return type: dict
-
wait_until
(condition, raise_on_error: bool = True, timeout: float = 3600.0)¶ Wait until condition is fulfilled, then break.
Parameters: - condition (func: (BaseExperimentVersion) -> bool) – Function to use to check the break condition
- raise_on_error (bool, optional) – If true then the function will stop on error, otherwise it will continue waiting (default: True)
- timeout (float, optional) – Maximal amount of time to wait before forcing exit
Example:
experiment.wait_until(lambda experimentv: len(experimentv.models) > 3)
Raises: PrevisionException – If the resource could not be fetched or there was a timeout.
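The general shape of such a wait can be sketched as a polling loop with a deadline. Everything below is a hypothetical stand-in: FakeExperimentVersion, refresh, poll_interval and wait_until_sketch are illustrative names, the timeout is taken to be in seconds, and the built-in TimeoutError replaces PrevisionException. It is not the SDK implementation and ignores raise_on_error.

```python
import time


class FakeExperimentVersion:
    """Hypothetical stand-in that reports one more finished model per poll."""

    def __init__(self):
        self.models = []

    def refresh(self):
        # Pretend the platform finished training one more model.
        self.models.append('model_%d' % len(self.models))


def wait_until_sketch(resource, condition, timeout=3600.0, poll_interval=0.01):
    """Poll `condition(resource)` until it holds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resource.refresh()
        if condition(resource):
            return
        time.sleep(poll_interval)
    raise TimeoutError('condition not fulfilled within %s seconds' % timeout)


experiment_version = FakeExperimentVersion()
wait_until_sketch(experiment_version, lambda ev: len(ev.models) > 3, timeout=5.0)
```

The condition receives the experiment version itself, which is why the documented example can test len(experimentv.models) directly.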
-
-
class
previsionio.text_similarity.
TextSimilarityModels
¶ Bases:
enum.Enum
Similarity search models for Text Similarity
-
BruteForce
= 'brute_force'¶ Brute force search
-
ClusterPruning
= 'cluster_pruning'¶ Cluster Pruning
-
HKM
= 'hkm'¶ Hierarchical K-Means
-
IVFOPQ
= 'ivf_opq'¶ InVerted File system and Optimized Product Quantization
-
LSH
= 'lsh'¶ Locality Sensitive Hashing
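To give an intuition for what pairing the “tf_idf” embedding with the “brute_force” searching model means, here is a conceptual sketch in plain Python: every description is turned into a TF-IDF vector, and a query is compared against all of them with cosine similarity. The function names, whitespace tokenisation and smoothed idf formula are assumptions chosen for illustration; this is not the platform's implementation, and the other searching models exist precisely to avoid this exhaustive comparison.

```python
import math
from collections import Counter


def fit_tfidf(texts):
    """Return ({term: weight} per text, idf table) from raw counts and a smoothed log idf."""
    tokenized = [text.lower().split() for text in texts]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + len(texts)) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    vectors = [{term: count * idf[term] for term, count in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_top_k(query, descriptions, top_k=2):
    """Score the query against every description and return the top_k indices."""
    vectors, idf = fit_tfidf(descriptions)
    counts = Counter(query.lower().split())
    # Terms never seen in the descriptions carry no weight.
    query_vec = {term: count * idf[term] for term, count in counts.items()
                 if term in idf}
    scored = [(cosine(query_vec, vec), index) for index, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:top_k]]


descriptions = ['red cotton shirt', 'blue denim jeans', 'red leather shoes']
best_matches = brute_force_top_k('red shirt', descriptions, top_k=2)
```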
-