Welcome to prevision-python’s documentation!

Prevision.io is an automated SaaS machine learning platform that enables you to create, deploy and monitor powerful predictive models and business applications.

This documentation focuses on how to use Prevision.io’s Python SDK for direct use in your data science scripts.

To take a quick peek at the available features, look at the Getting started guide.

If you’d rather examine the Python API directly, here is the direct API Reference.

Compatibility between Prevision.io’s Python SDK and the Prevision platform is as follows:

Compatibility matrix

[Matrix of Prevision Python SDK versions 10.10 through 10.24 and 11.0 against Prevision platform versions 10.10 through 10.24 and 11.0; refer to the online documentation for the exact supported pairings.]

Getting started

The following document is a step-by-step usage example of the Prevision.io Python SDK. The full documentation of the software is available here.

Pre-requisites

You need to have an account at cloud.prevision.io or on an on-premise version installed in your company. Contact us or your IT manager for more information.

You will be working on a specific “instance”. This instance corresponds to the subdomain at the beginning of the URL of your Prevision.io address: https://<your instance>.prevision.io.

Get the package

pip install previsionio

Set up your client

Prevision.io’s SDK client uses a specific master token to authenticate with the instance’s server and allows you to perform various requests. To get your master token, log in to the online interface of your instance, navigate to the admin page and copy the token.

You can either set the token and the instance name as environment variables, by specifying PREVISION_URL and PREVISION_MASTER_TOKEN, or at the beginning of your script:

import previsionio as pio

# The client is initialized with your master token and the url of the prevision.io server
# (or local installation, if applicable)
url = "https://<your instance>.prevision.io"
token = "<your token>"
pio.client.init_client(url, token)

# You can manage the verbosity (only output warnings and errors by default)
pio.verbose(
    False,           # whether to activate info logging
    debug=False,     # whether to activate detailed debug logging
    event_log=False, # whether to activate detailed event managers debug logging
)

# You can manage the duration you wish to wait for an asynchronous response
pio.config.default_timeout = 3600

# You can manage the number of retries for each call to the Prevision.io API
pio.config.request_retries = 6

# You can manage the delay between retries for each call to the Prevision.io API
pio.config.request_retry_time = 10
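As an alternative to calling init_client in every script, the two environment variables named above can be set once before the client is used. A minimal sketch (the values are placeholders to replace with your own instance URL and master token):

```python
import os

# PREVISION_URL and PREVISION_MASTER_TOKEN are the variable names given above;
# the values here are placeholders for your own instance and master token.
os.environ["PREVISION_URL"] = "https://<your instance>.prevision.io"
os.environ["PREVISION_MASTER_TOKEN"] = "<your token>"
```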

Create a project

First things first, to upload data or train a usecase, you need to create a project.

# create project
project = pio.Project.new(name="project_name",
                          description="project description")

Data

To train a usecase, you need to gather some training data. This data must be uploaded to your instance using either a data source, a file path or a pandas.DataFrame.

Managing datasources & connectors

Datasources and connectors are Prevision.io’s way of keeping a link to a source of data and taking snapshots when needed. The available data sources are:

  • SQL
  • FTP
  • SFTP
  • S3
  • GCP

Connectors hold the credentials to connect to the distant data sources. Then you can specify the exact resource to extract from a data source (be it the path to the file to load, the name of the database table to parse, …).

Creating a connector

To create a connector, use the appropriate method of the Project class. For example, to create a connector to an SQL database, use create_sql_connector() and pass in your credentials:

connector = project.create_sql_connector('my_sql_connector',
                                         'https://myserver.com',
                                         port=3306,
                                         username='username',
                                         password='password')

For more information on all the available connectors, check out the Project full documentation.

Creating a data source

After you’ve created a connector, you need a datasource to actually refer to and fetch a resource in the distant data source. To create a datasource, you need to link the matching connector and supply the relevant info, depending on the connector type:

datasource = project.create_datasource(connector,
                                       'my_sql_datasource',
                                       database='my_db',
                                       table='table1')

For more details on the creation of a datasource, check out the full documentation of the create_datasource method of Project.

You can then create datasets from this datasource as explained in Uploading Data.

Listing available connectors and data sources

Connectors and datasources already registered on your workspace can be listed using the list_connectors() and list_datasource() methods of the Project class:

connectors = project.list_connectors()
for connector in connectors:
    print(connector.name)

datasources = project.list_datasource()
for datasource in datasources:
    print(datasource.name)

Uploading Data

You can upload data from three different sources: a path to a local (csv, zip) file, a pandas.DataFrame, or a created data source.

import pandas as pd

# Upload tabular data from a CSV file
data_path = 'path/to/your/data.csv'
dataset = project.create_dataset(name='helloworld', file_name=data_path)

# or use a pandas DataFrame
dataframe = pd.read_csv(data_path)
dataset = project.create_dataset(name='helloworld', dataframe=dataframe)

# or use a created data source
datasource = pio.DataSource.from_id('my_datasource_id')
dataset = project.create_dataset(name='helloworld', datasource=datasource)

# Upload an image folder
image_folder_path = 'path/to/your/image_data.zip'
image_folder = project.create_image_folder(name='helloworld', file_name=image_folder_path)

This will automatically upload the data as a new dataset in your workspace. If you go to the online interface, you will see this new dataset in the list of datasets (in the “Data” tab).

Listing available datasets

To get a list of all the datasets currently available in your workspace, use the list_datasets() method:

# List tabular datasets
datasets = project.list_datasets()
for dataset in datasets:
    print(dataset.name)

# List image folders
image_folders = project.list_image_folders()
for folder in image_folders:
    print(folder.name)

Downloading data from your workspace

If you created or uploaded a dataset in your workspace and want to grab it locally, simply use the Dataset.download method:

out_path = dataset.download(download_path="your/local/path")

Regression/Classification/Multi-classification usecases

Configuring the dataset

To start a usecase, you need to specify the dataset to be used and its configuration (target column, weight column, id column, …). For full documentation, check the API reference of ColumnConfig in Usecase configuration.

column_config = pio.ColumnConfig(target_column='TARGET', id_column='ID')

Configuring the training parameters

If you want, you can also specify some training parameters, such as which models are used, which transformations are applied, and how the models are optimized. For full documentation, check the API reference of TrainingConfig in Usecase configuration.

training_config = pio.TrainingConfig(
    advanced_models=[pio.AdvancedModel.LinReg],
    normal_models=[pio.NormalModel.LinReg],
    simple_models=[pio.SimpleModel.DecisionTree],
    features=[pio.Feature.Counts],
    profile=pio.Profile.Quick,
)

Starting training

You can now create a new usecase based on:

  • a usecase name
  • a dataset
  • a column config
  • (optional) a metric type
  • (optional) a training config
  • (optional) a holdout dataset (dataset only used for evaluation)
usecase_version = project.fit_classification(
    name='helloworld_classif',
    dataset=dataset,
    column_config=column_config,
    metric=pio.metrics.Classification.AUC,
    training_config=training_config,
    holdout_dataset=None,
)

If you want to use image data for your usecase, you need to provide the API with both the tabular dataset and the image folder:

usecase_version = project.fit_image_classification(
    name='helloworld_images_classif',
    dataset=(dataset, image_folder),
    column_config=column_config,
    metric=pio.metrics.Classification.AUC,
    training_config=training_config,
    holdout_dataset=None,
)

For an exhaustive list of the available metrics, check the API reference Metrics.

Making predictions

To make predictions from a dataset and a usecase, you need to wait until at least one model is trained. This can be achieved in the following way:

# block until there is at least 1 model trained
usecase_version.wait_until(lambda usecasev: len(usecasev.models) > 0)

# check out the usecase status and other info
usecase_version.print_info()
print('Current (best model) score:', usecase_version.score)

Note

The wait_until method takes a function that takes the usecase as an argument, and can therefore access any info relative to the usecase.
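As an illustration of this pattern (a generic sketch, not the SDK’s actual implementation), a condition-based wait boils down to polling the refreshed object until the predicate returns True or a timeout expires:

```python
import time

def wait_until(refresh, condition, poll_interval=1.0, timeout=60.0):
    """Poll refresh() until condition(state) is True, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = refresh()
        if condition(state):
            return state
        time.sleep(poll_interval)
    raise TimeoutError("condition not met before timeout")

# Toy usage: a fake usecase that gains its first model on the third poll
calls = {"n": 0}
def fake_refresh():
    calls["n"] += 1
    return {"models": ["model_1"] if calls["n"] >= 3 else []}

state = wait_until(fake_refresh, lambda uc: len(uc["models"]) > 0,
                   poll_interval=0.01)
```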

Then you have two options:

  1. you can predict from a dataset of your workspace, which returns a previsionio.ValidationPrediction object. It allows you to keep on working even if the prediction isn’t complete
  2. you can predict from a pd.DataFrame, which returns a pd.DataFrame once the prediction is complete
# predict from a dataset of your workspace
validation_prediction = usecase_version.predict_from_dataset(test_dataset)
# get the result as a pandas.DataFrame
prediction_df = validation_prediction.get_result()

# predict from a pandas.DataFrame
prediction_df = usecase_version.predict(test_dataframe)

Time Series usecases

A time series usecase is very similar to a regression usecase. The main differences lie in the dataset configuration and the specification of a time window.

Configuring the dataset

Here you need to specify which column in the dataset defines the time steps. You can also specify the group_columns (columns defining a unique time series) as well as the apriori_columns (columns containing information known in advance):

column_config = pio.ColumnConfig(
    target_column='Sales',
    id_column='ID',
    time_column='Date',
    group_columns=['Store', 'Product'],
    apriori_columns=['is_holiday'],
)

Configuring the training parameters

The training config is the same as for a regression usecase (detailed in Configuring the training parameters).

Starting training

You can now create a new usecase based on:

  • a usecase name
  • a dataset
  • a column config
  • a time window
  • (optional) a metric type
  • (optional) a training config

In particular the time_window parameter defines the period in the past that you have for each prediction, and the period in the future that you want to predict:

# Define your time window:
# example here: a two-week span of history (days -28 to -14) used to predict the next week
time_window = pio.TimeWindow(
    derivation_start=-28,
    derivation_end=-14,
    forecast_start=1,
    forecast_end=7,
)

usecase_version = project.fit_timeseries_regression(
    name='helloworld_time_series',
    dataset=dataset,
    time_window=time_window,
    column_config=column_config,
    metric=pio.metrics.Regression.RMSE,
    training_config=training_config,
    holdout_dataset=None,
)

For full documentation, check the API reference for Time Series usecases.
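To make the window semantics concrete, here is a plain-Python sketch (an illustration assuming inclusive bounds, not SDK code) of the day offsets covered by the example above: the derivation window looks at days 28 to 14 before each prediction point, and the forecast window covers the 7 following days:

```python
def window_offsets(derivation_start, derivation_end, forecast_start, forecast_end):
    """Day offsets, relative to the prediction point, covered by each window."""
    past = list(range(derivation_start, derivation_end + 1))    # history used
    future = list(range(forecast_start, forecast_end + 1))      # days predicted
    return past, future

past, future = window_offsets(-28, -14, 1, 7)
# past runs from day -28 to day -14; future covers days 1 through 7
```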

Making predictions

The prediction workflow is the same as for a classic usecase (detailed in Making predictions).

Text Similarity usecases

A Text Similarity usecase matches the most similar texts between a dataset containing descriptions (can be seen as a catalog) and a dataset containing queries. It first converts texts to numerical vectors (text embeddings) and then performs a similarity search to retrieve the most similar documents to a query.
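The idea can be sketched in a few lines of plain Python (a toy bag-of-words illustration, not the platform’s implementation): each text becomes a vector, and the best matches for a query are the descriptions whose vectors have the highest cosine similarity to the query vector:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

descriptions = ["red cotton shirt", "blue denim jeans", "red leather shoes"]
query = "red shirt"

vectors = [embed(d) for d in descriptions]
q = embed(query)
# Brute-force search: rank every description against the query
ranked = sorted(range(len(descriptions)),
                key=lambda i: cosine(q, vectors[i]), reverse=True)
best_match = descriptions[ranked[0]]
```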

Configuring the datasets

To start a usecase you need to specify the datasets to be used and their configuration. Note that a DescriptionsDataset is required while a QueriesDataset is optional during training (used for scoring).

# Required: configuration of the DescriptionsDataset
description_column_config = pio.TextSimilarity.DescriptionsColumnConfig(
    content_column='text_descriptions',
    id_column='ID',
)

# Optional: configuration of the QueriesDataset
queries_column_config = pio.TextSimilarity.QueriesColumnConfig(
    content_column='text_queries',
    id_column='ID',
)

For full documentation, check the API reference of DescriptionsColumnConfig and QueriesColumnConfig.

Configuring the training parameters

If you want, you can also specify some training parameters, such as which embedding models, search models and preprocessing steps are used. Here you need to specify one configuration per embedding model you want to use:

# Using TF-IDF as embedding model
models_parameters_1 = pio.ModelsParameters(
    model_embedding=pio.ModelEmbedding.TFIDF,
    preprocessing=pio.Preprocessing(),
    models=[pio.TextSimilarityModels.BruteForce, pio.TextSimilarityModels.ClusterPruning],
)

# Using Transformer as embedding model
models_parameters_2 = pio.ModelsParameters(
    model_embedding=pio.ModelEmbedding.Transformer,
    preprocessing=pio.Preprocessing(),
    models=[pio.TextSimilarityModels.BruteForce, pio.TextSimilarityModels.IVFOPQ],
)

# Using fine-tuned Transformer as embedding model
models_parameters_3 = pio.ModelsParameters(
    model_embedding=pio.ModelEmbedding.TransformerFineTuned,
    preprocessing=pio.Preprocessing(),
    models=[pio.TextSimilarityModels.BruteForce, pio.TextSimilarityModels.IVFOPQ],
)

# Gather everything
models_parameters = [models_parameters_1, models_parameters_2, models_parameters_3]
models_parameters = pio.ListModelsParameters(models_parameters=models_parameters)

For full documentation, check the API reference of ModelsParameters.

Note

If you want the default configuration of text similarity models, simply use:

models_parameters = pio.ListModelsParameters()

Starting the training

You can then create a new text similarity usecase based on:

  • a usecase name
  • a dataset
  • a description column config
  • (optional) a queries dataset
  • (optional) a queries column config
  • (optional) a metric type
  • (optional) the number of top k results you want per query
  • (optional) a language
  • (optional) a models parameters list
usecase_version = project.fit_text_similarity(
    name='helloworld_text_similarity',
    dataset=dataset,
    description_column_config=description_column_config,
    metric=pio.metrics.TextSimilarity.accuracy_at_k,
    top_k=10,
    queries_dataset=queries_dataset,
    queries_column_config=queries_column_config,
    models_parameters=models_parameters,
)

For full documentation, check the API reference of previsionio.metrics.TextSimilarity.

Making predictions

The prediction workflow is very similar to a classic usecase (detailed in Making predictions).

The only differences are the optional parameters top_k and queries_dataset_matching_id_description_column, which are specific to this usecase type.

For full documentation, check the API reference of the TextSimilarityModel prediction methods.

Deployed usecases

Prevision.io’s SDK allows you to deploy a usecase’s models. Deployed models are made available for unit and bulk prediction through APIs. You can then follow the usage of a model and the evolution of its input feature distributions.

You first need to deploy a main model (and a challenger model) from an existing usecase:

# retrieve the best model of your usecase
uc_best_model = usecase_version.best_model

# deploy the usecase model
usecase_deployment = project.create_usecase_deployment(
    'my_deployed_usecase',
    main_model=uc_best_model,
    challenger_model=None,
)

Now you can make bulk predictions from your deployed model(s):

# make predictions
deployment_prediction = usecase_deployment.predict_from_dataset(test_dataset)

# retrieve prediction from main model
prediction_df = deployment_prediction.get_result()

# retrieve prediction from challenger model (if any)
prediction_df = deployment_prediction.get_challenger_result()

For full documentation, check the API reference for Usecase Deployment.

You can also make unitary predictions from the main model:

# create an api key for your model
usecase_deployment.create_api_key()

# retrieve the last client id and client secret
creds = usecase_deployment.get_api_keys()[-1]

# initialize the deployed model with its url, your client id and client secret
model = pio.DeployedModel(
    prevision_app_url=usecase_deployment.url,
    client_id=creds['client_id'],
    client_secret=creds['client_secret'],
)

# make a prediction
prediction, confidence, explain = model.predict(
    predict_data={'feature1': 0, 'feature2': 42},
    use_confidence=True,
    explain=True,
)

For full documentation, check the API reference for Deployed model.

Exporters

Once you have trained a model and made predictions from it, you might want to export your results to a remote filesystem or database. To do so, you will need a registered connector on your project (described in section Creating a connector).

Creating an exporter

The first step is to create an exporter in your project:

exporter = project.create_exporter(
    connector=connector,
    name='my_exporter',
    path='remote/file/path.csv',
    write_mode=pio.ExporterWriteMode.timestamp,
)

For full documentation, check the API reference for Exporter.

Exporting

Once your exporter is operational you can export your datasets or predictions:

# export a dataset stored in your project
export = exporter.export_dataset(
    dataset=dataset,
    wait_for_export=False,
)

# export a prediction stored in your project
export = exporter.export_prediction(
    prediction=deployment_prediction,
    wait_for_export=False,
)

For full documentation, check the API reference for Export.

Additional util methods

Retrieving a use case

Since a use case can be somewhat long to train, it can be useful to separate the training, monitoring and prediction phases.

To do that, we need to be able to recreate a usecase version object in Python from its id:

usecase_version = pio.Supervised.from_id('<a usecase id>')
# The retrieved usecase_version has all the same methods as one
# created directly from a file or a dataframe
usecase_version.print_info()

Stopping and deleting

Once you’re satisfied with the model performance, don’t want to wait for the complete training process to be over, or need to free up some resources to start a new training, you can simply stop the usecase version:

usecase_version.stop()

You’ll still be able to make predictions and get info, but the performance won’t improve anymore. Note: there’s no difference in state between a stopped usecase and a usecase that has finished its training.

You can decide to completely delete the usecase:

uc = pio.Usecase.from_id(usecase_version.usecase_id)
uc.delete()

However, be careful: in that case, all details about the usecase will be removed, and you won’t be able to make predictions from it anymore.

API Reference

This section gathers all the available classes, functions and tools offered by Prevision.io’s Python SDK.

Base API Resource

All resource objects you will be using in Prevision.io’s Python SDK inherit from this base parent class.

In the SDK, a resource is an object that can be fetched from the platform, used in your code, updated, deleted… previsionio.usecase.BaseUsecase, previsionio.dataset.Dataset and previsionio.model.Model are all resources.
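As a generic sketch of the pattern these resources share (an illustration, not the SDK source), each resource carries a unique _id, can be re-fetched from it, and can be deleted:

```python
class Resource:
    """Minimal sketch of the fetch-by-id pattern shared by SDK resources."""
    _registry = {}  # stands in for the platform's remote storage

    def __init__(self, _id, **params):
        self._id = _id
        self.params = params
        Resource._registry[_id] = self

    @classmethod
    def from_id(cls, _id):
        # The real SDK performs an API call here instead of a dict lookup
        return cls._registry[_id]

    def delete(self):
        Resource._registry.pop(self._id, None)

dataset = Resource("60a1b2c3", name="my_dataset")
fetched = Resource.from_id("60a1b2c3")
```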

class previsionio.api_resource.ApiResource(**params)

Base parent class for all SDK resource objects.

delete()

Delete a resource from the actual [client] workspace.

Raises:PrevisionException – Any error while deleting data from the platform
update_status(specific_url: str = None) → Dict

Get an update on the status of a resource.

Parameters:specific_url (str, optional) – Specific (already parametrized) url to fetch the resource from (otherwise the url is built from the resource type and unique _id)
Returns:Updated status info
Return type:dict
class previsionio.api_resource.ApiResourceType

All the different resource types and matching API endpoints.

Client

Prevision.io’s SDK client uses a specific master token to authenticate with the instance’s server and allows you to perform various requests. To get your master token, log in to the online interface, navigate to the admin page and copy the token.

You can either set the token and the instance name as environment variables, by specifying PREVISION_URL and PREVISION_MASTER_TOKEN, or at the beginning of your script:

import previsionio as pio

# We initialize the client with our master token and the url of the prevision.io server
# (or local installation, if applicable)
url = "https://<your instance>.prevision.io"
token = "<your token>"
pio.client.init_client(url, token)
class previsionio.prevision_client.Client

Client class to interact with the Prevision.io platform and manage authentication.

init_client(prevision_url: str, token: str)

Init the client (and check that the connection is valid).

Parameters:
  • prevision_url (str) – URL of the Prevision.io platform. Should be of the form https://<instance_name>.prevision.io, or a custom IP address if working on-premise.
  • token (str) –

    Your Prevision.io master token. Can be retrieved on /dashboard/infos on the web interface and is passed to the client through:

    client.init_client(prevision_url, token)
    
request(endpoint: str, method, files: Dict = None, data: Dict = None, format: Dict = None, allow_redirects: bool = True, content_type: str = None, check_response: bool = True, message_prefix: str = None, **requests_kwargs) → requests.models.Response

Make a request on the desired endpoint with the specified method & data.

Requires initialization.

Parameters:
  • endpoint (str) – API endpoint (e.g. /usecases, /prediction/file)
  • method (requests.{get,post,delete}) – requests method
  • files (dict) – files dict
  • data (dict) – request body data (e.g. for a single predict)
  • content_type (str) – force request content-type
  • allow_redirects (bool) – passed to requests method
  • check_response (bool) – whether to handle errors or not
  • message_prefix (str) – prefix message in error logs
Returns:

request response

Raises:

Exception – Error if url/token not configured

Project

class previsionio.project.Project(_id: str, name: str, description: str = None, color: previsionio.project.ProjectColor = None, created_by: str = None, admins=[], contributors=[], viewers=[], pipelines_count: int = 0, usecases_count: int = 0, dataset_count: int = 0, **kwargs)

Bases: previsionio.api_resource.ApiResource, previsionio.api_resource.UniqueResourceMixin

A Project

Parameters:
  • _id (str) – Unique id of the project
  • name (str) – Name of the project
  • description (str, optional) – Description of the project
  • color (ProjectColor, optional) – Color of the project
create_dataset(name: str, datasource: previsionio.datasource.DataSource = None, file_name: str = None, dataframe: pandas.core.frame.DataFrame = None, **kwargs)

Register a new dataset in the workspace for further processing. You need to provide either a datasource, a file name or a dataframe (only one can be specified).

Note

To start a new use case on a dataset, it has to be already registered in your workspace.

Parameters:
  • name (str) – Registration name for the dataset
  • datasource (DataSource, optional) – A DataSource object used to import a remote dataset (if you want to import a specific dataset from an existent database, you need a datasource connector (Connector object) designed to point to the related data source)
  • file_name (str, optional) – Path to a file to upload as dataset
  • dataframe (pd.DataFrame, optional) – A pandas dataframe containing the data to upload
Raises:
  • Exception – If more than one of the keyword arguments datasource, file_name, dataframe was specified
  • PrevisionException – Error while creating the dataset on the platform
Returns:

The registered dataset object in the current workspace.

Return type:

Dataset

create_datasource(connector: previsionio.connector.Connector, name: str, path: str = None, database: str = None, table: str = None, bucket: str = None, request: str = None, gCloud: previsionio.connector.GCloud = None)

Create a new datasource object on the platform.

Parameters:
  • connector (Connector) – Reference to the associated connector (the resource to go through to get a data snapshot)
  • name (str) – Name of the datasource
  • path (str, optional) – Path to the file to fetch via the connector
  • database (str, optional) – Name of the database to fetch data from via the connector
  • table (str, optional) – Name of the table to fetch data from via the connector
  • bucket (str, optional) – Name of the bucket to fetch data from via the connector
  • gCloud (GCloud, optional) – Type of google cloud service
  • request (str, optional) – Direct SQL request to use with the connector to fetch data
Returns:

The registered datasource object in the current project

Return type:

DataSource

Raises:
  • PrevisionException – Any error while uploading data to the platform or parsing the result
  • Exception – For any other unknown error
create_exporter(connector: previsionio.connector.Connector, name: str, description: str = None, path: str = None, bucket: str = None, database: str = None, table: str = None, g_cloud: previsionio.connector.GCloud = None, write_mode: previsionio.exporter.ExporterWriteMode = <ExporterWriteMode.safe: 'safe'>)

Create a new exporter object on the platform.

Parameters:
  • connector (Connector) – Reference to the associated connector (the resource to go through to get a data snapshot)
  • name (str) – Name of the exporter
  • description (str, optional) – Description of the exporter
  • bucket (str, optional) – Bucket of the file to write on via the exporter
  • path (str, optional) – Path to the file to write on via the exporter
  • database (str, optional) – Name of the database to write on via the exporter
  • table (str, optional) – Name of the table to write on via the exporter
  • g_cloud (GCloud, optional) – Type of google cloud service
  • write_mode (ExporterWriteMode, optional) – Write mode
Returns:

The registered exporter object in the current project

Return type:

Exporter

Raises:
  • PrevisionException – Any error while uploading data to the platform or parsing the result
  • Exception – For any other unknown error
create_ftp_connector(name: str, host: str, port: int = 21, username: str = '', password: str = '')

A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).

Parameters:
  • name (str) – Name of the connector
  • host (str) – Url of the connector
  • port (int) – Port of the connector
  • username (str, optional) – Username used to connect to the remote data source
  • password (str, optional) – Password used to connect to the remote data source
Returns:

The registered connector object in the current project.

Return type:

FTPConnector

create_gcp_connector(name: str = '', host: str = '', port=None, username: str = '', password: str = '', googleCredentials: str = '')

A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).

Parameters:
  • name (str) – Name of the connector
  • googleCredentials (str) – google credentials
Returns:

The registered connector object in the current project.

Return type:

GCPConnector

create_image_folder(name: str, file_name: str)

Register a new image dataset in the workspace for further processing (in the image folders group).

Note

To start a new use case on an image dataset, it has to be already registered in your workspace.

Parameters:
  • name (str) – Registration name for the dataset
  • file_name (str) – Path to the zip file to upload as image dataset
Raises:

PrevisionException – Error while creating the dataset on the platform

Returns:

The registered dataset object in the current workspace.

Return type:

DatasetImages

create_s3_connector(name: str, host: str = '', port: int = None, username: str = '', password: str = '')

A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).

Parameters:
  • name (str) – Name of the connector
  • host (str) – Url of the connector
  • port (int) – Port of the connector
  • username (str, optional) – Username used to connect to the remote data source
  • password (str, optional) – Password used to connect to the remote data source
Returns:

The registered connector object in the current project.

Return type:

S3Connector

create_sftp_connector(name: str, host: str, port: int = 23, username: str = '', password: str = '')

A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).

Parameters:
  • name (str) – Name of the connector
  • host (str) – Url of the connector
  • port (int) – Port of the connector
  • username (str, optional) – Username used to connect to the remote data source
  • password (str, optional) – Password used to connect to the remote data source
Returns:

The registered connector object in the current project.

Return type:

SFTPConnector

create_sql_connector(name: str, host: str, port: int = 3306, username: str = '', password: str = '')

A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).

Parameters:
  • name (str) – Name of the connector
  • host (str) – Url of the connector
  • port (int) – Port of the connector
  • username (str, optional) – Username used to connect to the remote data source
  • password (str, optional) – Password used to connect to the remote data source
Returns:

The registered connector object in the current project.

Return type:

SQLConnector

delete()

Delete a project from the actual [client] workspace.

Raises:
  • PrevisionException – If the project does not exist
  • requests.exceptions.ConnectionError – Error processing the request
fit_classification(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Classification = <Classification.AUC: 'auc'>, holdout_dataset=None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)

Start a tabular classification usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset) – Reference to the dataset object to use as the training dataset
  • column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • metric (metrics.Classification, optional) – Specific metric to use for the usecase (default: None)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created Classification usecase version object

Return type:

supervised.Classification
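
A minimal training sketch, assuming a tabular dataset has already been uploaded to the project and that 'churn' is the name of its target column (both the project id and the column name are placeholders):

```python
import previsionio as pio

project = pio.Project.from_id('<project id>')
dataset = project.list_datasets()[0]  # pick an already-uploaded dataset

# Declare which column holds the target to predict
column_config = pio.ColumnConfig(target_column='churn')

usecase_version = project.fit_classification(
    'churn_classification',
    dataset,
    column_config,
    metric=pio.metrics.Classification.AUC,
)
```

Training runs asynchronously on the platform; the returned usecase version object can then be polled for its models.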

fit_image_classification(name: str, dataset: Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages], column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Classification = <Classification.AUC: 'auc'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)

Start an image classification usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset, DatasetImages) – Reference to the dataset objects to use as training datasets
  • column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • metric (metrics.Classification, optional) – Specific metric to use for the usecase (default: metrics.Classification.AUC)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created ClassificationImages usecase version object

Return type:

supervised.ClassificationImages

fit_image_multiclassification(name: str, dataset: Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages], column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.MultiClassification = <MultiClassification.log_loss: 'log_loss'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs) → previsionio.supervised.Supervised

Start an image multiclassification usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset, DatasetImages) – Reference to the dataset objects to use as training datasets
  • column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • metric (metrics.MultiClassification, optional) – Specific metric to use for the usecase (default: metrics.MultiClassification.log_loss)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created MultiClassificationImages usecase version object

Return type:

supervised.MultiClassificationImages

fit_image_regression(name: str, dataset: Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages], column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Regression = <Regression.RMSE: 'rmse'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)

Start an image regression usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset, DatasetImages) – Reference to the dataset objects to use as training datasets
  • column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: metrics.Regression.RMSE)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created RegressionImages usecase version object

Return type:

supervised.RegressionImages

fit_multiclassification(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.MultiClassification = <MultiClassification.log_loss: 'log_loss'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)

Start a tabular multiclassification usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset) – Reference to the dataset object to use as training dataset
  • column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • metric (metrics.MultiClassification, optional) – Specific metric to use for the usecase (default: metrics.MultiClassification.log_loss)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created MultiClassification usecase version object

Return type:

supervised.MultiClassification

fit_regression(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Regression = <Regression.RMSE: 'rmse'>, holdout_dataset=None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)

Start a tabular regression usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset) – Reference to the dataset object to use as training dataset
  • column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: metrics.Regression.RMSE)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created Regression usecase version object

Return type:

supervised.Regression

fit_text_similarity(name: str, dataset: previsionio.dataset.Dataset, description_column_config: previsionio.text_similarity.DescriptionsColumnConfig, metric: previsionio.metrics.TextSimilarity = <TextSimilarity.accuracy_at_k: 'accuracy_at_k'>, top_k: int = 10, lang: previsionio.text_similarity.TextSimilarityLang = <TextSimilarityLang.Auto: 'auto'>, queries_dataset: previsionio.dataset.Dataset = None, queries_column_config: previsionio.text_similarity.QueriesColumnConfig = None, models_parameters: previsionio.text_similarity.ListModelsParameters = <previsionio.text_similarity.ListModelsParameters object>)

Start a text similarity usecase training with a specific training configuration.

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset) – Reference to the dataset object to use as training dataset
  • description_column_config (DescriptionsColumnConfig) – Description column configuration (see the documentation of the DescriptionsColumnConfig resource for more details on each possible column type)
  • metric (metrics.TextSimilarity, optional) – Specific metric to use for the usecase (default: accuracy_at_k)
  • top_k (int, optional) – Number of most similar items to consider when computing the metric (default: 10)
  • queries_dataset (Dataset, optional) – Reference to a dataset object to use as a queries dataset (default: None)
  • queries_column_config (QueriesColumnConfig) – Queries column configuration (see the documentation of the QueriesColumnConfig resource for more details on each possible column type)
  • models_parameters (ListModelsParameters) – Specific training configuration (see the documentation of the ListModelsParameters resource for more details on all the parameters)
Returns:

Newly created TextSimilarity usecase version object

Return type:

previsionio.text_similarity.TextSimilarity
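
A minimal text similarity sketch, assuming a catalog dataset has been uploaded whose 'item_id' and 'item_desc' columns (placeholder names) hold the item identifier and its textual description:

```python
import previsionio as pio

project = pio.Project.from_id('<project id>')
dataset = project.list_datasets()[0]  # catalog of item descriptions

# Declare which columns hold the item id and its textual description
description_config = pio.DescriptionsColumnConfig(
    content_column='item_desc',
    id_column='item_id',
)

usecase_version = project.fit_text_similarity(
    'item_search',
    dataset,
    description_config,
    metric=pio.metrics.TextSimilarity.accuracy_at_k,
    top_k=10,
)
```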

fit_timeseries_regression(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, time_window: previsionio.timeseries.TimeWindow, metric: previsionio.metrics.Regression = <Regression.RMSE: 'rmse'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>) → previsionio.timeseries.TimeSeries

Start a timeseries regression usecase version training

Parameters:
  • name (str) – Name of the usecase to create
  • dataset (Dataset) – Reference to the dataset object to use as training dataset
  • column_config (ColumnConfig) – Column configuration for the usecase version (see the documentation of the ColumnConfig resource for more details on each possible column type)
  • time_window (TimeWindow) – Time configuration (see the documentation of the TimeWindow resource for more details)
  • metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: metrics.Regression.RMSE)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created TimeSeries usecase version object

Return type:

TimeSeries
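
A timeseries regression sketch, assuming an uploaded dataset with a 'sales' target and a 'date' time column (placeholder names); the TimeWindow bounds below are illustrative values:

```python
import previsionio as pio

project = pio.Project.from_id('<project id>')
dataset = project.list_datasets()[0]

column_config = pio.ColumnConfig(target_column='sales', time_column='date')

# Use the 90th to 30th past time steps to derive features,
# and forecast from 1 to 10 steps ahead
time_window = pio.TimeWindow(
    derivation_start=-90,
    derivation_end=-30,
    forecast_start=1,
    forecast_end=10,
)

usecase_version = project.fit_timeseries_regression(
    'sales_forecast',
    dataset,
    column_config,
    time_window,
    metric=pio.metrics.Regression.RMSE,
)
```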

classmethod from_id(_id: str)

Get a project from the instance by its unique id.

Parameters:_id (str) – Unique id of the resource to retrieve
Returns:The fetched project
Return type:Project
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
info() → Dict

Get information about the current project.

Returns:
Information about the project, as a dict with the following entries: "_id", "name", "description", "color", "created_by", "admins", "contributors", "viewers", "pipelines_count", "usecases_count", "dataset_count", "users"
Return type:Dict
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
classmethod list(all: bool = False)

List all the available projects in the current active [client] workspace.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched project objects
Return type:list(Project)
list_connectors(all: bool = True)

List all the available connectors in the current active project.

Warning

Contrary to the parent list() function, this method returns actual Connector objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched connector objects
Return type:list(Connector)
list_datasets(all: bool = True)

List all the available datasets in the current active project.

Warning

Contrary to the parent list() function, this method returns actual Dataset objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched dataset objects
Return type:list(Dataset)
list_datasource(all: bool = False)

List all the available datasources in the current active project.

Warning

Contrary to the parent list() function, this method returns actual DataSource objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched datasource objects
Return type:list(DataSource)
list_exporter(all: bool = False)

List all the available exporters in the current active project.

Warning

Contrary to the parent list() function, this method returns actual Exporter objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched exporter objects
Return type:list(Exporter)
list_image_folders(all: bool = True)

List all the available image folder datasets in the current active project.

Warning

Contrary to the parent list() function, this method returns actual DatasetImages objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched dataset image objects
Return type:list(DatasetImages)
list_usecase_deployments(all: bool = True)

List all the available usecase deployments in the current project.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched usecase deployment objects
Return type:list(UsecaseDeployment)
list_usecases(all: bool = True)

List all the available usecases in the current project.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched usecase objects
Return type:list(Usecase)
classmethod new(name: str, description: str = None, color: previsionio.project.ProjectColor = None) → previsionio.project.Project

Create a new project on the platform.

Parameters:
  • name (str) – Name of the project
  • description (str, optional) – Description of the project
  • color (ProjectColor, optional) – Color of the project
Returns:

The registered project object in the current workspace

Return type:

Project

Raises:
  • PrevisionException – Any error while uploading data to the platform or parsing the result
  • Exception – For any other unknown error
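
Creating a project and fetching it again by id can be sketched as follows (instance URL, token and project name are placeholders):

```python
import previsionio as pio

pio.client.init_client('https://<your instance>.prevision.io', '<your master token>')

# Create a project, then retrieve it again by its unique id
project = pio.Project.new('my_project', description='SDK demo project')
same_project = pio.Project.from_id(project._id)
```
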
users()

Get the list of users of the current project.

Returns:Information about the project users
Return type:dict
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
class previsionio.project.ProjectColor

Bases: enum.Enum

An enumeration.

Connector

In all the specific connectors, the parameters for the new() method are the same as the ones in the Connector._new().

class previsionio.connector.Connector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.api_resource.ApiResource, previsionio.api_resource.UniqueResourceMixin

A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).

Parameters:
  • _id (str) – Unique reference of the connector on the platform
  • name (str) – Name of the connector
  • host (str) – URL of the connector
  • port (int) – Port of the connector
  • conn_type (str) – Type of the connector, among “FTP”, “SFTP”, “SQL”, “S3”, “GCP”
  • username (str, optional) – Username to use to connect to the remote data source
  • password (str, optional) – Password to use to connect to the remote data source
classmethod _new(project_id: str, name: str, host: str, port: Optional[int], conn_type: str, username: str = None, password: str = None, googleCredentials: str = None)

Create a new connector object on the platform.

Parameters:
  • project_id (str) – Unique id of the project in which to create the connector
  • name (str) – Name of the connector
  • host (str) – URL of the connector
  • port (int) – Port of the connector
  • conn_type (str) – Type of the connector, among “FTP”, “SFTP”, “SQL”, “S3”, “GCP”
  • username (str, optional) – Username to use to connect to the remote data source
  • password (str, optional) – Password to use to connect to the remote data source
Returns:

Newly created connector object

Return type:

Connector

delete()

Delete a connector from the actual [client] workspace.

Raises:
  • PrevisionException – If the connector does not exist
  • requests.exceptions.ConnectionError – Error processing the request
classmethod list(project_id: str, all: bool = False)

List all the available connectors in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual Connector objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched connector objects
Return type:list(Connector)
test()

Test a connector already uploaded on the platform.

Returns:Test results
Return type:dict
class previsionio.connector.DataFileBaseConnector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.Connector

A specific type of connector to interact with a database client (containing files).

list_files()

List all available files for the client.

Returns:Files information
Return type:dict
class previsionio.connector.DataTableBaseConnector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.Connector

A specific type of connector to interact with a database client (containing databases and tables).

list_databases()

List all available databases for the client.

Returns:Databases information
Return type:dict
list_tables(database: str)

List all available tables in a specific database for the client.

Parameters:database (str) – Name of the database to find tables for
Returns:Tables information
Return type:dict
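
Browsing a remote database through a table-based connector can be sketched as follows — this assumes the first connector of the project is a SQL connector and that 'my_database' exists on the remote server:

```python
import previsionio as pio

project = pio.Project.from_id('<project id>')
sql_connector = project.list_connectors()[0]  # assume this is a SQLConnector

# Explore the remote source through the connector
databases = sql_connector.list_databases()
tables = sql_connector.list_tables('my_database')
```
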
class previsionio.connector.FTPConnector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.DataFileBaseConnector

A specific type of connector to interact with a FTP client (containing files).

class previsionio.connector.GCPConnector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.Connector

A specific type of connector to interact with a GCP database client (containing databases and tables or buckets).

class previsionio.connector.GCloud

Bases: enum.Enum

Google services.

big_query = 'BigQuery'

Google BigQuery

storage = 'Storage'

Google Storage

class previsionio.connector.S3Connector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.Connector

A specific type of connector to interact with an Amazon S3 client (containing buckets with files).

class previsionio.connector.SFTPConnector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.DataFileBaseConnector

A specific type of connector to interact with a secured FTP client (containing files).

class previsionio.connector.SQLConnector(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)

Bases: previsionio.connector.DataTableBaseConnector

A specific type of connector to interact with a SQL database client (containing databases and tables).

DataSource

class previsionio.datasource.DataSource(_id, connector_id: str, name: str, path: str = None, database: str = None, table: str = None, request: str = None, gCloud: previsionio.connector.GCloud = None, **kwargs)

Bases: previsionio.api_resource.ApiResource, previsionio.api_resource.UniqueResourceMixin

A datasource to access a distant data pool and create or fetch data easily. This resource is linked to a Connector resource that represents the connection to the distant data source.

Parameters:
  • _id (str) – Unique id of the datasource
  • connector_id (str) – Reference to the associated connector (the resource to go through to get a data snapshot)
  • name (str) – Name of the datasource
  • path (str, optional) – Path to the file to fetch via the connector
  • database (str, optional) – Name of the database to fetch data from via the connector
  • table (str, optional) – Name of the table to fetch data from via the connector
  • bucket (str, optional) – Bucket of the file to fetch via the connector
  • request (str, optional) – Direct SQL request to use with the connector to fetch data
  • gCloud (GCloud, optional) – Type of google cloud service
delete()

Delete a datasource from the actual [client] workspace.

Raises:
  • PrevisionException – If the datasource does not exist
  • requests.exceptions.ConnectionError – Error processing the request
classmethod from_id(_id: str)

Get a datasource from the instance by its unique id.

Parameters:_id (str) – Unique id of the resource to retrieve
Returns:The fetched datasource
Return type:DataSource
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
classmethod list(project_id: str, all: bool = False)

List all the available datasources in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual DataSource objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched datasource objects
Return type:list(DataSource)

Dataset

class previsionio.dataset.Dataset(_id: str, name: str, datasource: previsionio.datasource.DataSource = None, _data: pandas.core.frame.DataFrame = None, describe_state: Dict = None, drift_state=None, embeddings_state=None, separator=', ', **kwargs)

Bases: previsionio.api_resource.ApiResource

Dataset objects represent data resources that will be explored by Prevision.io platform.

In order to launch an auto ml process (see BaseUsecase class), we need to have the matching dataset stored in the related workspace.

Within the platform they are stored in tabular form and are derived:

  • from files (CSV, ZIP)
  • or from a Data Source at a given time (snapshot)
data

Load in memory the data content of the current dataset into a pandas DataFrame.

Returns:Dataframe for the data object
Return type:pd.DataFrame
Raises:PrevisionException – Any error while fetching or parsing the data
delete()

Delete a dataset from the actual [client] workspace.

Raises:
  • PrevisionException – If the dataset does not exist
  • requests.exceptions.ConnectionError – Error processing the request
download(download_path: str = None)

Download the dataset from the platform locally.

Parameters:download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used)
Returns:Path the data was downloaded to
Return type:str
Raises:PrevisionException – If dataset does not exist or if there was another error fetching or parsing data
get_embedding() → Dict

Gets the embeddings analysis of the dataset from the actual [client] workspace

Raises:
  • PrevisionException – DatasetNotFoundError
  • requests.exceptions.ConnectionError – request error
classmethod list(project_id: str, all: bool = True)

List all the available datasets in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual Dataset objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched dataset objects
Return type:list(Dataset)
start_embedding()

Starts the embeddings analysis of the dataset from the actual [client] workspace

Raises:
  • PrevisionException – DatasetNotFoundError
  • requests.exceptions.ConnectionError – request error
to_pandas() → pandas.core.frame.DataFrame

Load in memory the data content of the current dataset into a pandas DataFrame.

Returns:Dataframe for the data object
Return type:pd.DataFrame
Raises:PrevisionException – Any error while fetching or parsing the data
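
Pulling a dataset's content locally can be sketched as follows (the project id is a placeholder, and the first uploaded dataset is picked arbitrarily):

```python
import previsionio as pio

project = pio.Project.from_id('<project id>')
dataset = project.list_datasets()[0]

# Work on the dataset content locally as a pandas DataFrame
df = dataset.to_pandas()
print(df.shape)

# Or download the raw file to the current working directory
path = dataset.download()
```
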
update_status()

Get an update on the status of a resource.

Parameters:specific_url (str, optional) – Specific (already parametrized) url to fetch the resource from (otherwise the url is built from the resource type and unique _id)
Returns:Updated status info
Return type:dict
class previsionio.dataset.DatasetImages(_id: str, name: str, project_id: str, copy_state, **kwargs)

Bases: previsionio.api_resource.ApiResource

DatasetImages objects represent image data resources that will be used by Prevision.io’s platform.

In order to launch an auto ml process (see BaseUsecase class), we need to have the matching dataset stored in the related workspace.

Within the platform, image folder datasets are stored as ZIP files and are copied from ZIP files.

delete()

Delete a DatasetImages from the actual [client] workspace.

Raises:
  • PrevisionException – If the dataset images does not exist
  • requests.exceptions.ConnectionError – Error processing the request
download(download_path: str = None)

Download the dataset from the platform locally.

Parameters:download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used)
Returns:Path the data was downloaded to
Return type:str
Raises:PrevisionException – If dataset does not exist or if there was another error fetching or parsing data
classmethod list(project_id: str, all: bool = True)

List all the available image folder datasets in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual DatasetImages objects rather than plain dictionaries with the corresponding data.

Parameters:all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:Fetched dataset image objects
Return type:list(DatasetImages)

Metrics

The metric of a usecase is the function used to assess the performance of its models. The metrics you can choose from depend on the type of usecase you are training.

class previsionio.metrics.Classification

Metrics for classification projects. Available metrics include: auc, log_loss, error_rate_binary

AUC = 'auc'

Area Under ROC Curve

AUCPR = 'aucpr'

Area under the precision-recall curve

F05 = 'F05'

F05 Score

F1 = 'F1'

Balanced F-score

F2 = 'F2'

F2 Score

F3 = 'F3'

F3 Score

F4 = 'F4'

F4 Score

Lift01 = 'lift_at_0.1'

lift at ratio 0.1

Lift02 = 'lift_at_0.2'

lift at ratio 0.2

Lift03 = 'lift_at_0.3'

lift at ratio 0.3

Lift04 = 'lift_at_0.4'

lift at ratio 0.4

Lift05 = 'lift_at_0.5'

lift at ratio 0.5

Lift06 = 'lift_at_0.6'

lift at ratio 0.6

Lift07 = 'lift_at_0.7'

lift at ratio 0.7

Lift08 = 'lift_at_0.8'

lift at ratio 0.8

Lift09 = 'lift_at_0.9'

lift at ratio 0.9

MCC = 'mcc'

Matthews correlation coefficient

accuracy = 'accuracy'

Accuracy

error_rate = 'error_rate_binary'

Error rate

gini = 'gini'

Gini score

log_loss = 'log_loss'

Logarithmic Loss

class previsionio.metrics.MultiClassification

Metrics for multiclassification projects

AUC = 'auc'

Area Under ROC Curve

MAP10 = 'map_at_10'

qmean average precision @10

MAP3 = 'map_at_3'

qmean average precision @3

MAP5 = 'map_at_5'

qmean average precision @5

accuracy = 'accuracy'

accuracy

error_rate = 'error_rate_multi'

Multi-class Error rate

log_loss = 'log_loss'

Logarithmic Loss

macroF1 = 'macroF1'

balanced F-score

qkappa = 'qkappa'

quadratic weighted kappa

class previsionio.metrics.Regression

Metrics for regression projects. Available metrics include: rmse, mape, rmsle, mse, mae

MAE = 'mae'

Mean Absolute Error

MAPE = 'mape'

Mean Absolute Percentage Error

MER = 'mer'

Median Absolute Error

MSE = 'mse'

Mean Squared Error

R2 = 'R2'

R2 score (coefficient of determination)

RMSE = 'rmse'

Root Mean Squared Error

RMSLE = 'rmsle'

Root Mean Squared Logarithmic Error

RMSPE = 'rmspe'

Root Mean Squared Percentage Error

SMAPE = 'smape'

Symmetric Mean Absolute Percentage Error

class previsionio.metrics.TextSimilarity

Metrics for text similarity projects

accuracy_at_k = 'accuracy_at_k'

Accuracy at K

mrr_at_k = 'mrr_at_k'

Mean Reciprocal Rank at K

Usecase

class previsionio.usecase.Usecase(**usecase_info)

Bases: previsionio.api_resource.ApiResource

A Usecase

Parameters:
  • _id (str) – Unique id of the usecase
  • name (str) – Name of the usecase
delete()

Delete a usecase from the actual [client] workspace.

Raises:
  • PrevisionException – If the usecase does not exist
  • requests.exceptions.ConnectionError – Error processing the request
classmethod from_id(_id: str) → previsionio.usecase.Usecase

Get a usecase from the platform by its unique id.

Parameters:_id (str) – Unique id of the usecase version to retrieve
Returns:Fetched usecase
Return type:Usecase
Raises:PrevisionException – Any error while fetching data from the platform or parsing result
latest_version

Get the latest version of this use case.

Returns:latest UsecaseVersion in this Usecase
Return type:(previsionio.text_similarity.TextSimilarity | Supervised | TimeSeries)
classmethod list(project_id: str, all: bool = True) → List[previsionio.usecase.Usecase]

List all the available usecases in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual Usecase objects rather than plain dictionaries with the corresponding data.

Parameters:
  • project_id (str) – project id
  • all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results.
Returns:

Fetched usecase objects

Return type:

list(Usecase)

usecase_version_class

Get the type of UsecaseVersion class used by this Usecase

Returns:Type of UsecaseVersion
Return type:(previsionio.text_similarity.TextSimilarity | Supervised | TimeSeries)
versions

Get the list of all versions for the current use case.

Returns:List of the usecase versions (as JSON metadata)
Return type:list(previsionio.text_similarity.TextSimilarity | Supervised | TimeSeries)
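
Navigating from a project to a usecase and its latest version can be sketched as follows (the project id is a placeholder, and the first usecase is picked arbitrarily):

```python
import previsionio as pio

project = pio.Project.from_id('<project id>')
usecase = project.list_usecases()[0]

# Inspect the versions of the usecase and grab the most recent one
all_versions = usecase.versions
latest = usecase.latest_version
```
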
previsionio.usecase.get_usecase_version_class(training_type: previsionio.usecase_config.TypeProblem, data_type: previsionio.usecase_config.DataType) → Union[Type[previsionio.text_similarity.TextSimilarity], Type[previsionio.supervised.Supervised], Type[previsionio.timeseries.TimeSeries]]

Get the type of UsecaseVersion class used by this Usecase

Returns:Type of UsecaseVersion
Return type:(previsionio.text_similarity.TextSimilarity | Supervised | TimeSeries)

Usecases Version

Prevision.io’s Python SDK enables you to very easily run usecases of different types: regression, (binary) classification, multiclassification or timeseries.

All these classes inherit from the base previsionio.usecase_version.BaseUsecaseVersion class; tabular usecases additionally go through the previsionio.supervised.Supervised class.

When starting a usecase, you also need to specify a training configuration.

Take a look at the specific documentation pages for a more in-depth explanation of each layer and of the usecase configuration options:

Base usecase version

class previsionio.usecase_version.BaseUsecaseVersion(**usecase_info)

Bases: previsionio.api_resource.ApiResource

Base parent class for all usecases objects.

advanced_models_list

Get the list of selected advanced models in the usecase.

Returns:Names of the advanced models selected for the usecase
Return type:list(AdvancedModel)
best_model

Get the model with the best predictive performance over all models (including Blend models), where the best performance corresponds to a minimal loss.

Returns:Model with the best performance in the usecase, or None if no model matched the search filter.
Return type:(Model, None)
delete()

Delete a usecase version from the actual [client] workspace.

Raises:
  • PrevisionException – If the usecase version does not exist
  • requests.exceptions.ConnectionError – Error processing the request
delete_prediction(prediction_id: str)

Delete a prediction in the list for the current usecase from the actual [client] workspace.

Parameters:prediction_id (str) – Unique id of the prediction to delete
Returns:Deletion process results
Return type:dict
delete_predictions()

Delete all predictions in the list for the current usecase from the actual [client] workspace.

Returns:Deletion process results
Return type:dict
done

Get a flag indicating whether or not the usecase is currently done.

Returns:done status
Return type:bool
fastest_model

Returns the model that predicts with the lowest response time

Returns:Model object – corresponding to the fastest model
get_holdout_predictions(full: bool = False)

Retrieve the list of holdout predictions for the current usecase from the client workspace (with the full prediction objects if required).

Parameters:full (boolean) – If True, return full holdout prediction objects (else only their metadata)

get_predictions(full: bool = False)

Retrieve the list of predictions for the current usecase from the client workspace (with the full prediction objects if required).

Parameters:full (boolean) – If True, return full prediction objects (else only their metadata)

models

Get the list of models generated for the current use case. Only the models that are done training are retrieved.

Returns:List of models found by the platform for the usecase
Return type:list(Model)
normal_models_list

Get the list of selected normal models in the usecase.

Returns:Names of the normal models selected for the usecase
Return type:list(NormalModel)
running

Get a flag indicating whether or not the usecase is currently running.

Returns:Running status
Return type:bool
schema

Get the data schema of the usecase.

Returns:Usecase schema
Return type:dict
score

Get the current score of the usecase (i.e. the score of the model that is currently considered the best performance-wise for this usecase).

Returns:Usecase score (or infinity if not available).
Return type:float
simple_models_list

Get the list of selected simple models in the usecase.

Returns:Names of the simple models selected for the usecase
Return type:list(SimpleModel)
status

Get the status of the usecase version.

Returns:Usecase status
Return type:str
stop()

Stop a usecase (stopping all nodes currently in progress).

train_dataset

Get the Dataset object corresponding to the training dataset of the usecase.

Returns:Associated training dataset
Return type:Dataset
update_status()

Get an update on the status of a resource.

Parameters:specific_url (str, optional) – Specific (already parametrized) url to fetch the resource from (otherwise the url is built from the resource type and unique _id)
Returns:Updated status info
Return type:dict
wait_until(condition, raise_on_error: bool = True, timeout: float = 3600.0)

Wait until condition is fulfilled, then break.

Parameters:
  • condition (func: (BaseUsecaseVersion) -> bool) – Function used to check the break condition
  • raise_on_error (bool, optional) – If true then the function will stop on error, otherwise it will continue waiting (default: True)
  • timeout (float, optional) – Maximal amount of time to wait before forcing exit

Example:

usecase.wait_until(lambda usecasev: len(usecasev.models) > 3)
Raises:PrevisionException – If the resource could not be fetched or there was a timeout.
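The condition argument is any callable that takes the usecase version and returns a bool. As an illustration, here is the predicate from the example above checked against a minimal stand-in object (the SDK passes the real BaseUsecaseVersion instance):

```python
# Break condition: more than 3 models have been trained
condition = lambda usecase_version: len(usecase_version.models) > 3

# Stand-in with a `models` attribute, for illustration only
class FakeVersion:
    def __init__(self, n_models):
        self.models = [object()] * n_models

print(condition(FakeVersion(2)))  # False: wait_until would keep polling
print(condition(FakeVersion(5)))  # True: wait_until would return
```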
class previsionio.usecase_version.ClassicUsecaseVersion(**usecase_info)

Bases: previsionio.usecase_version.BaseUsecaseVersion

correlation_matrix

Get the correlation matrix of the features (those constitute the dataset on which the usecase was trained).

Returns:Correlation matrix as a pandas dataframe
Return type:pd.DataFrame
drop_list

Get the list of drop columns in the usecase.

Returns:Names of the columns dropped from the dataset
Return type:list(str)
dropped_features

Get dropped features

Returns:Dropped features
Return type:dict
feature_list

Get the list of selected feature engineering modules in the usecase.

Returns:Names of the feature engineering modules selected for the usecase
Return type:list(str)
features

Get the general description of the usecase’s features, such as:

  • feature types distribution
  • feature information list
  • list of dropped features
Returns:General features information
Return type:dict
features_stats

Get the general description of the usecase’s features, such as:

  • feature types distribution
  • feature information list
  • list of dropped features
Returns:General features information
Return type:dict
get_cv() → pandas.core.frame.DataFrame

Get the cross validation dataset from the best model of the usecase.

Returns:Cross validation dataset
Return type:pd.DataFrame
get_feature_info(feature_name: str) → Dict

Return some information about the given feature, such as:

  • name: the name of the feature as it was given in the feature_name parameter

  • type: linear, categorical, ordinal…

  • stats: some basic statistics such as number of missing values, (non missing) values count, plus additional information depending on the feature type:

    • for a linear feature: min, max, mean, std and median
    • for a categorical/textual feature: modalities/words frequencies, list of the most frequent tokens
  • role: whether or not the feature is a target/fold/weight or id feature (and for time series usecases, whether or not it is a group/apriori feature - check the Prevision.io’s timeseries documentation)

  • importance_value: scores reflecting the importance of the given feature

Parameters:feature_name (str) – Name of the feature to get information about

Warning

The feature_name is case-sensitive, so “age” and “Age” are different features!
Returns:

Dictionary containing the feature information

Return type:

dict

Raises:

PrevisionException – If the given feature name does not match any feature

predict(df, confidence=False, prediction_dataset_name=None) → pandas.core.frame.DataFrame

Get the predictions for a dataset stored in the current active [client] workspace using the best model of the usecase with a Scikit-learn style blocking prediction mode.

Warning

For large dataframes and complex (blend) models, this can be slow (up to 1-2 hours). Prefer using this for simple models and small dataframes, or use option use_best_single = True.

Parameters:
  • df (pd.DataFrame) – pandas DataFrame containing the test data
  • confidence (bool, optional) – Whether to predict with confidence values (default: False)
Returns:

Prediction results dataframe

Return type:

pd.DataFrame
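A minimal sketch of a blocking prediction call, assuming a trained Supervised usecase version (placeholder id) and a local CSV file whose columns match the training schema:

```python
import pandas as pd
import previsionio as pio

usecase_version = pio.Supervised.from_id('your-usecase-version-id')  # placeholder id

test_df = pd.read_csv('test.csv')  # hypothetical local test data
# Blocks until the platform returns the predictions
predictions = usecase_version.predict(test_df, confidence=True)
```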

predict_from_dataset(dataset, confidence=False, dataset_folder=None) → pandas.core.frame.DataFrame

Get the predictions for a dataset stored in the current active [client] workspace using the best model of the usecase.

Parameters:
  • dataset (Dataset) – Reference to the dataset object to make predictions for
  • confidence (bool, optional) – Whether to predict with confidence values (default: False)
  • dataset_folder (Dataset) – Matching folder dataset for the predictions, if necessary
Returns:

The registered prediction object in the current workspace

Return type:

previsionio.prediction.ValidationPrediction

predict_single(data, confidence=False, explain=False)

Get a prediction on a single instance using the best model of the usecase.

Parameters:
  • data (dict) – Features names and values (without the target feature) – missing feature keys will be replaced by NaNs
  • confidence (bool, optional) – Whether to predict with confidence values (default: False)
  • explain (bool) – Whether to explain prediction (default: False)
Returns:

Dictionary containing the prediction.

Note

The format of the predictions dictionary depends on the problem type (regression, classification…)

Return type:

dict
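A sketch of a single-instance prediction (the usecase id, feature names and values are all illustrative):

```python
import previsionio as pio

usecase_version = pio.Supervised.from_id('your-usecase-version-id')  # placeholder id
# Missing feature keys are replaced by NaNs by the platform
prediction = usecase_version.predict_single(
    {'age': 42, 'city': 'Paris'},
    confidence=True,
    explain=True,
)
```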

print_info()

Print all info on the usecase.

Supervised usecases

class previsionio.supervised.Supervised(**usecase_info)

Bases: previsionio.usecase_version.ClassicUsecaseVersion

A supervised usecase version, for tabular data

classmethod from_id(_id: str) → previsionio.supervised.Supervised

Get a supervised usecase from the platform by its unique id.

Parameters:_id (str) – Unique id of the usecase version to retrieve
Returns:

Fetched usecase

Return type:

Supervised

Raises:

PrevisionException – Invalid problem type or any error while fetching data from the platform or parsing result

new_version(description: str = None, dataset: Union[previsionio.dataset.Dataset, Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages]] = None, column_config: previsionio.usecase_config.ColumnConfig = None, metric: enum.Enum = None, holdout_dataset: previsionio.dataset.Dataset = None, training_config: previsionio.usecase_config.TrainingConfig = None, **fit_params) → previsionio.supervised.Supervised

Start a supervised usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.

Parameters:
  • description (str, optional) – additional description of the version
  • dataset (Dataset, DatasetImages, optional) – Reference to the dataset object to use as the training dataset
  • column_config (ColumnConfig, optional) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column types)
  • metric (metrics.Enum, optional) – Specific metric to use for the usecase (default: None)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created supervised usecase object (new version)

Return type:

Supervised
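A sketch of creating a new version that reuses the current configuration (every parameter left to None is copied from the current version; the id is a placeholder):

```python
import previsionio as pio

usecase_version = pio.Supervised.from_id('your-usecase-version-id')  # placeholder id
# Only the description is overridden; dataset, metric and configs are copied
new_version = usecase_version.new_version(
    description='retrained with the same configuration',
)
```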

Time Series usecases

class previsionio.timeseries.TimeSeries(**usecase_info)

Bases: previsionio.usecase_version.ClassicUsecaseVersion

A supervised usecase version, for timeseries data

metric_type

alias of previsionio.metrics.Regression

model_class

alias of previsionio.model.RegressionModel

new_version(description: str = None, dataset: previsionio.dataset.Dataset = None, column_config: previsionio.usecase_config.ColumnConfig = None, time_window: previsionio.timeseries.TimeWindow = None, metric: previsionio.metrics.Regression = None, holdout_dataset: previsionio.dataset.Dataset = None, training_config: previsionio.usecase_config.TrainingConfig = <previsionio.usecase_config.TrainingConfig object>)

Start a time series usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.

Parameters:
  • description (str, optional) – additional description of the version
  • dataset (Dataset, DatasetImages, optional) – Reference to the dataset object to use as the training dataset
  • column_config (ColumnConfig, optional) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column types)
  • time_window (TimeWindow, optional) – A time window object representing either feature derivation window periods or forecast window periods
  • metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: None)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig, optional) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created time series usecase version object (new version)

Return type:

TimeSeries

class previsionio.timeseries.TimeWindow(derivation_start: int, derivation_end: int, forecast_start: int, forecast_end: int)

Bases: previsionio.usecase_config.UsecaseConfig

A time window object for representing either feature derivation window periods or forecast window periods.

Parameters:
  • derivation_start (int) – Start of the derivation window (must be < 0)
  • derivation_end (int) – End of the derivation window (must be < 0)
  • forecast_start (int) – Start of the forecast window (must be > 0)
  • forecast_end (int) – End of the forecast window (must be > 0)
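As a configuration sketch, a window that derives features from the 90 to 30 previous time steps and forecasts 1 to 7 steps ahead (the exact step semantics depend on your dataset's time granularity):

```python
from previsionio.timeseries import TimeWindow

# Derivation bounds must be < 0, forecast bounds must be > 0
time_window = TimeWindow(
    derivation_start=-90,
    derivation_end=-30,
    forecast_start=1,
    forecast_end=7,
)
```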
exception previsionio.timeseries.TimeWindowException

Bases: Exception

TextSimilarity usecases

class previsionio.text_similarity.DescriptionsColumnConfig(content_column, id_column)

Bases: previsionio.usecase_config.UsecaseConfig

Description Column configuration for starting a usecase: this object defines the role of specific columns in the dataset.

Parameters:
  • content_column (str, required) – Name of the column containing the text descriptions in the description dataset.
  • id_column (str, optional) – Name of the id column in the description dataset.
class previsionio.text_similarity.ModelEmbedding

Bases: enum.Enum

Embedding models for Text Similarity

TFIDF = 'tf_idf'

Term Frequency - Inverse Document Frequency

Transformer = 'transformer'

Transformer

TransformerFineTuned = 'transformer_fine_tuned'

Fine-tuned Transformer

class previsionio.text_similarity.ModelsParameters(model_embedding: previsionio.text_similarity.ModelEmbedding = <ModelEmbedding.TFIDF: 'tf_idf'>, preprocessing: previsionio.text_similarity.Preprocessing = <previsionio.text_similarity.Preprocessing object>, models: List[previsionio.text_similarity.TextSimilarityModels] = [<TextSimilarityModels.BruteForce: 'brute_force'>])

Bases: previsionio.usecase_config.UsecaseConfig

Training configuration that holds the text similarity modeling parameters: the embedding model, its preprocessing and the selected search models.

Parameters:
  • preprocessing (Preprocessing, optional) –

    Dictionary of the text preprocessings to be applied (only for “tf_idf” embedding model),

    • word_stemming: default to “yes”
    • ignore_stop_word: default to “auto”, choice will be made depending on if the text descriptions contain full sentences or not
    • ignore_punctuation: default to “no”.
  • model_embedding (ModelEmbedding, optional) – Name of the embedding model to be used (among: “tf_idf”, “transformer”, “transformer_fine_tuned”).
  • models (list(TextSimilarityModels), optional) – Names of the searching models to be used (among: “brute_force”, “cluster_pruning”, “ivfopq”, “hkm”, “lsh”).
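A configuration sketch combining a TF-IDF embedding with two search models (the Preprocessing defaults described above are assumed to be acceptable):

```python
from previsionio.text_similarity import (
    ModelEmbedding,
    ModelsParameters,
    Preprocessing,
    TextSimilarityModels,
)

models_parameters = ModelsParameters(
    model_embedding=ModelEmbedding.TFIDF,
    preprocessing=Preprocessing(),  # default word_stemming / ignore_stop_word / ignore_punctuation
    models=[TextSimilarityModels.BruteForce, TextSimilarityModels.ClusterPruning],
)
```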
class previsionio.text_similarity.QueriesColumnConfig(queries_dataset_content_column, queries_dataset_matching_id_description_column, queries_dataset_id_column=None)

Bases: previsionio.usecase_config.UsecaseConfig

Queries column configuration for starting a usecase: this object defines the role of specific columns in the queries dataset.

Parameters:
  • queries_dataset_content_column (str, required) – Name of the column containing the text queries in the queries dataset.
  • queries_dataset_matching_id_description_column (str, required) – Name of the column matching each query to the id of its corresponding description.
  • queries_dataset_id_column (str, optional) – Name of the id column in the queries dataset.
class previsionio.text_similarity.TextSimilarity(**usecase_info)

Bases: previsionio.usecase_version.BaseUsecaseVersion

A text similarity usecase version

model_class

alias of previsionio.model.TextSimilarityModel

new_version(description: str = None, dataset: previsionio.dataset.Dataset = None, description_column_config: previsionio.text_similarity.DescriptionsColumnConfig = None, metric: previsionio.metrics.TextSimilarity = None, top_k: int = None, lang: previsionio.text_similarity.TextSimilarityLang = <TextSimilarityLang.Auto: 'auto'>, queries_dataset: previsionio.dataset.Dataset = None, queries_column_config: Optional[previsionio.text_similarity.QueriesColumnConfig] = None, models_parameters: previsionio.text_similarity.ListModelsParameters = None, **kwargs) → previsionio.text_similarity.TextSimilarity

Start a text similarity usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.

Parameters:
  • description (str, optional) – additional description of the version
  • dataset (Dataset, DatasetImages, optional) – Reference to the dataset object to use as the training dataset
  • description_column_config (DescriptionsColumnConfig, optional) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column types)
  • metric (metrics.TextSimilarity, optional) – Specific metric to use for the usecase (default: None)
  • holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
  • training_config (TrainingConfig, optional) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns:

Newly created text similarity usecase version object (new version)

Return type:

TextSimilarity

class previsionio.text_similarity.TextSimilarityLang

Bases: enum.Enum

An enumeration.

class previsionio.text_similarity.TextSimilarityModels

Bases: enum.Enum

Similarity search models for Text Similarity

BruteForce = 'brute_force'

Brute force search

ClusterPruning = 'cluster_pruning'

Cluster Pruning

HKM = 'hkm'

Hierarchical K-Means

IVFOPQ = 'ivfopq'

InVerted File system and Optimized Product Quantization

LSH = 'lsh'

Locality Sensitive Hashing

Usecase configuration

class previsionio.usecase_config.ColumnConfig(target_column: Optional[str] = None, filename_column: Optional[str] = None, id_column: Optional[str] = None, fold_column: Optional[str] = None, weight_column: Optional[str] = None, time_column: Optional[str] = None, group_columns: Optional[List[str]] = None, apriori_columns: Optional[List[str]] = None, drop_list: Optional[List[str]] = None)

Column configuration for starting a usecase: this object defines the role of specific columns in the dataset (and optionally the list of columns to drop).

Parameters:
  • target_column (str, optional) – Name of the target column in the dataset
  • id_column (str, optional) – Name of the id column in the dataset that does not have any signal and will be ignored for computation
  • fold_column (str, optional) – Name of the fold column used that should be used to compute the various folds in the dataset
  • weight_column (str, optional) – Name of the weight column used to assign non-equal importance weights to the various rows in the dataset
  • filename_column (str, optional) – Name of the filename column in the dataset for an image-based usecase
  • time_column (str, optional) – Name of the time column in the dataset for a timeseries usecase
  • group_columns (list(str), optional) – Names of the columns in the dataset that define a unique time serie for a timeseries usecase
  • apriori_columns (list(str), optional) – Names of the columns that are known a priori in the dataset for a timeseries usecase
  • drop_list (list(str), optional) – Names of all the columns that should be dropped from the dataset while training the usecase
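A configuration sketch for a tabular usecase (all column names here are illustrative):

```python
from previsionio.usecase_config import ColumnConfig

column_config = ColumnConfig(
    target_column='churn',           # what the models learn to predict
    id_column='customer_id',         # carries no signal, ignored for training
    weight_column='sample_weight',   # per-row importance weights
    drop_list=['free_text_notes'],   # columns excluded from training
)
```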
class previsionio.usecase_config.TrainingConfig(profile: previsionio.usecase_config.Profile = <Profile.Quick: 'quick'>, advanced_models: List[previsionio.usecase_config.AdvancedModel] = [<AdvancedModel.XGBoost: 'XGB'>, <AdvancedModel.LinReg: 'LR'>], normal_models: List[previsionio.usecase_config.NormalModel] = [<NormalModel.XGBoost: 'XGB'>, <NormalModel.LinReg: 'LR'>], simple_models: List[previsionio.usecase_config.SimpleModel] = [], features: List[previsionio.usecase_config.Feature] = [<Feature.Frequency: 'freq'>, <Feature.TargetEncoding: 'tenc'>, <Feature.Counts: 'Counter'>], with_blend: bool = False, feature_time_seconds: int = 3600, feature_number_kept: Optional[int] = None)

Training configuration that holds the relevant data for a usecase description: the wanted feature engineering, the selected models, the training speed…

Parameters:
  • profile (Profile) –

    Type of training profile to use:

    • “quick”: this profile runs very fast but has lower performance (it is recommended for early trials)
    • “advanced”: this profile runs slower but has increased performance (it is usually used for optimization steps at the end of your project)
    • “normal”: an in-between profile, useful to investigate an interesting result
  • advanced_models (list(AdvancedModel), optional) – Names of the advanced models to use in the usecase (among: “LR”, “RF”, “ET”, “XGB”, “LGB”, “CB” and “NN”). The advanced models will be hyperparametrized, resulting in a more accurate modelization at the cost of a longer training time.
  • normal_models (list(NormalModel), optional) – Names of the (normal) models to use in the usecase (among: “LR”, “RF”, “ET”, “XGB”, “LGB”, “CB”, ‘NB’ and “NN”). The normal models only use default parameters.
  • simple_models (list(SimpleModel), optional) – Names of the (simple) models to use in the usecase (among: “LR” and “DT”). These models are easy to interpret and fast to train but only offer a limited modelization complexity.
  • features (list(Feature), optional) – Names of the feature engineering modules to use (among: “Counter”, “Date”, “freq”, “text_tfidf”, “text_word2vec”, “text_embedding”, “tenc”, “ee”, “poly”, “pca” and “kmean”)
  • with_blend (bool, optional) – If true, Prevision.io’s pipeline will add “blend” models at the end of the training by cherry-picking already trained models and fine-tuning hyperparameters (usually gives even better performance)
  • feature_time_seconds (int, optional) – feature selection takes at most feature_time_seconds seconds
  • feature_number_kept (int, optional) – a feature selection algorithm is launched to keep at most feature_number_kept features
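A configuration sketch using only enum members that appear in the signature defaults above:

```python
from previsionio.usecase_config import (
    AdvancedModel,
    Feature,
    Profile,
    TrainingConfig,
)

training_config = TrainingConfig(
    profile=Profile.Quick,                                   # fast, lower-performance profile
    advanced_models=[AdvancedModel.XGBoost, AdvancedModel.LinReg],
    features=[Feature.Frequency, Feature.TargetEncoding],
    with_blend=False,                                        # skip blend models to save time
)
```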
class previsionio.usecase_config.TypeProblem

Type of supervised problems available with Prevision.io.

Classification = 'classification'

Prediction using classification approach, for when the output variable is a category

MultiClassification = 'multiclassification'

Prediction using classification approach, for when the output variable has strictly more than 2 categories

ObjectDetection = 'object-detection'

Detection of pattern in images

Regression = 'regression'

Prediction using regression approach, for when the output variable is a real or continuous value

TextSimilarity = 'text-similarity'

Ranking of texts by keywords

class previsionio.usecase_config.UsecaseState

Possible state of a Usecase in Prevision.io

Done = 'done'

The usecase finished properly

Failed = 'failed'

The usecase finished with an error

Pending = 'pending'

The usecase is waiting for hardware resources

Running = 'running'

The usecase is still running

class previsionio.usecase_config.YesOrNo

An enumeration.

class previsionio.usecase_config.YesOrNoOrAuto

An enumeration.

Model

class previsionio.model.Model(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)

Bases: previsionio.api_resource.ApiResource

A Model object is generated by the Prevision AutoML platform when you launch a usecase. All models generated by Prevision.io are deployable in our Store.

With this Model class, you can select the model with the optimal hyperparameters that responds to your business requirements, then deploy it as a real-time/batch endpoint that can be used in a web service.

Parameters:
  • _id (str) – Unique id of the model
  • usecase_version_id (str) – Unique id of the usecase version of the model
  • name (str, optional) – Name of the model (default: None)
classmethod from_id(_id: str) → Union[previsionio.model.RegressionModel, previsionio.model.ClassificationModel, previsionio.model.MultiClassificationModel, previsionio.model.TextSimilarityModel]

Get a model from the platform by its unique id.

Parameters:_id (str) – Unique id of the model to retrieve
Returns:

Fetched model

Return type:

(RegressionModel | ClassificationModel | MultiClassificationModel | TextSimilarityModel)

Raises:

PrevisionException – Any error while fetching data from the platform or parsing result

hyperparameters

Return the hyperparameters of a model.

Returns:Hyperparameters of the model
Return type:dict
predict(df: pandas.core.frame.DataFrame, confidence: bool = False, prediction_dataset_name: str = None) → pandas.core.frame.DataFrame

Make a prediction in a Scikit-learn blocking style.

Warning

For large dataframes and complex (blend) models, this can be slow (up to 1-2 hours). Prefer using this for simple models and small dataframes or use option use_best_single = True.

Parameters:
  • df (pd.DataFrame) – A pandas dataframe containing the testing data
  • confidence (bool, optional) – Whether to predict with confidence estimator (default: False)
Returns:

Prediction results dataframe

Return type:

pd.DataFrame

predict_from_dataset(dataset: previsionio.dataset.Dataset, confidence: bool = False, dataset_folder: previsionio.dataset.Dataset = None) → previsionio.prediction.ValidationPrediction

Make a prediction for a dataset stored in the current active [client] workspace (using the current SDK dataset object).

Parameters:
  • dataset (Dataset) – Dataset resource to make a prediction for
  • confidence (bool, optional) – Whether to predict with confidence values (default: False)
  • dataset_folder (Dataset, None) – Matching folder dataset resource for the prediction, if necessary
Returns:

The registered prediction object in the current workspace

Return type:

ValidationPrediction

class previsionio.model.ClassicModel(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)

Bases: previsionio.model.Model

chart()

Return chart analysis information for a model.

Returns:Chart analysis results
Return type:dict
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
cross_validation

Get model’s cross validation dataframe.

Returns:Cross-validation dataframe
Return type:pd.DataFrame
feature_importance

Return a dataframe of feature importances for the given model features, with their corresponding scores (sorted by descending feature importance scores).

Returns:Dataframe of feature importances
Return type:pd.DataFrame
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
predict_single(data: Dict, confidence: bool = False, explain: bool = False)

Make a prediction for a single instance. Use the predict_from_dataset() or predict() methods to predict multiple instances at the same time (it is faster).

Parameters:
  • data (dict) – Features names and values (without the target feature) – missing feature keys will be replaced by NaNs
  • confidence (bool, optional) – Whether to predict with confidence values (default: False)
  • explain (bool, optional) – Whether to explain prediction (default: False)

Note

You can set both confidence and explain to true.

Returns:Dictionary containing the prediction result

Note

The prediction format depends on the problem type (regression, classification, etc…)

Return type:dict
class previsionio.model.ClassificationModel(_id, usecase_version_id, **other_params)

Bases: previsionio.model.ClassicModel

A model object for a (binary) classification usecase, i.e. a usecase where the target is categorical with exactly 2 modalities.

Parameters:
  • _id (str) – Unique id of the model
  • uc_id (str) – Unique id of the usecase of the model
  • uc_version (str, int) – Version of the usecase of the model (either an integer for a specific version, or “last”)
  • name (str, optional) – Name of the model (default: None)
get_dynamic_performances(threshold: float = 0.5)

Get model performance for the given threshold.

Parameters:threshold (float, optional) – Threshold to check the model’s performance for (default: 0.5)
Returns:Model classification performance dict with the following keys:
  • confusion_matrix
  • accuracy
  • precision
  • recall
  • f1_score
Return type:dict
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
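A sketch of inspecting the precision/recall trade-off at a non-default threshold (the model id is a placeholder, and Model is assumed to be importable at the package level):

```python
import previsionio as pio

model = pio.Model.from_id('your-model-id')  # placeholder id of a classification model
performances = model.get_dynamic_performances(threshold=0.3)
print(performances['precision'], performances['recall'], performances['f1_score'])
```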
optimal_threshold

Get the value of threshold probability that optimizes the F1 Score.

Returns:Optimal value of the threshold (None if it is not a classification problem)
Return type:float
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
class previsionio.model.MultiClassificationModel(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)

Bases: previsionio.model.ClassicModel

A model object for a multi-classification usecase, i.e. a usecase where the target is categorical with strictly more than 2 modalities.

Parameters:
  • _id (str) – Unique id of the model
  • uc_id (str) – Unique id of the usecase of the model
  • uc_version (str, int) – Version of the usecase of the model (either an integer for a specific version, or “last”)
  • name (str, optional) – Name of the model (default: None)
class previsionio.model.RegressionModel(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)

Bases: previsionio.model.ClassicModel

A model object for a regression usecase, i.e. a usecase where the target is numerical.

Parameters:
  • _id (str) – Unique id of the model
  • uc_id (str) – Unique id of the usecase of the model
  • uc_version (str, int) – Version of the usecase of the model (either an integer for a specific version, or “last”)
  • name (str, optional) – Name of the model (default: None)
class previsionio.model.TextSimilarityModel(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)

Bases: previsionio.model.Model

predict(df: pandas.core.frame.DataFrame, queries_dataset_content_column: str, top_k: int = 10, queries_dataset_matching_id_description_column: str = None, prediction_dataset_name: str = None) → Optional[pandas.core.frame.DataFrame]

Make a prediction for a dataset stored in the current active [client] workspace (using the current SDK dataset object).

Parameters:
  • dataset (Dataset) – Dataset resource to make a prediction for
  • queries_dataset_content_column (str) – Content queries column name
  • top_k (integer) – Number of nearest descriptions to return (default: 10)
  • queries_dataset_matching_id_description_column (str) – Matching id description column name
Returns:

Prediction results dataframe

Return type:

pd.DataFrame

predict_from_dataset(queries_dataset: previsionio.dataset.Dataset, queries_dataset_content_column: str, top_k: int = 10, queries_dataset_matching_id_description_column: str = None) → previsionio.prediction.ValidationPrediction

Make a prediction for a dataset stored in the current active [client] workspace (using the current SDK dataset object).

Parameters:
  • queries_dataset (Dataset) – Dataset resource of queries to make a prediction for
  • queries_dataset_content_column (str) – Content queries column name
  • top_k (int, optional) – Number of nearest descriptions to return per query (default: 10)
  • queries_dataset_matching_id_description_column (str, optional) – Matching id description column name
Returns:

The registered prediction object in the current workspace

Return type:

ValidationPrediction
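A minimal sketch of a text-similarity batch prediction, assuming `model` is a TextSimilarityModel you already retrieved, that `Dataset.from_id` is available in your SDK version, and that the dataset id and column names below are placeholders for your own resources:

```python
import previsionio as pio

# Hypothetical: a queries dataset already registered in the workspace
queries = pio.Dataset.from_id("<your queries dataset id>")

# Ask for the 5 nearest descriptions for each query
prediction = model.predict_from_dataset(
    queries,
    queries_dataset_content_column="description",
    top_k=5,
    queries_dataset_matching_id_description_column="item_id",
)

# Resolve the registered ValidationPrediction to a dataframe
df = prediction.get_result()
```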

Usecase Deployment

class previsionio.usecase_deployment.UsecaseDeployment(_id: str, name: str, usecase_id, current_version, versions, deploy_state, access_type, project_id, training_type, models, url=None, **kwargs)

UsecaseDeployment objects represent usecase deployment resources on the Prevision.io platform.

create_api_key()

Create an API key for the usecase deployment from the current [client] workspace.

Raises:
  • PrevisionException – If the usecase deployment does not exist
  • requests.exceptions.ConnectionError – Error processing the request
delete()

Delete a usecase deployment from the current [client] workspace.

Raises:
  • PrevisionException – If the usecase deployment does not exist
  • requests.exceptions.ConnectionError – Error processing the request
classmethod from_id(_id: str)

Get a deployed usecase from the platform by its unique id.

Parameters:_id (str) – Unique id of the usecase deployment to retrieve
Returns:Fetched deployed usecase
Return type:UsecaseDeployment
Raises:PrevisionException – Any error while fetching data from the platform or parsing result
get_api_keys()

Fetch the API keys (client id and client secret) of the usecase deployment from the current [client] workspace.

Raises:
  • PrevisionException – If the usecase deployment does not exist
  • requests.exceptions.ConnectionError – Error processing the request
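The two API-key helpers pair naturally with the DeployedModel client described further down in this document. A hedged sketch (the deployment id is a placeholder):

```python
import previsionio as pio

# Fetch an existing deployment and read its credentials
deployment = pio.UsecaseDeployment.from_id("<your deployment id>")
deployment.create_api_key()       # mint a new client id / client secret pair
keys = deployment.get_api_keys()  # fetch the existing pairs
```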
classmethod list(project_id: str, all: bool = True) → List[previsionio.usecase_deployment.UsecaseDeployment]

List all the available usecase deployments in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual UsecaseDeployment objects rather than plain dictionaries with the corresponding data.

Parameters:
  • project_id (str) – project id
  • all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Else, the query will only return the first page of results.
Returns:

Fetched usecase deployment objects

Return type:

list(UsecaseDeployment)

list_predictions() → List[previsionio.prediction.DeploymentPrediction]

List all the available predictions in the current active [client] workspace.

Returns:Fetched deployed predictions objects
Return type:list(DeploymentPrediction)
new_version(name: str, main_model, challenger_model=None)

Create a new usecase deployment version.

Parameters:
  • name (str) – Usecase deployment name
  • main_model – Main model
  • challenger_model (optional) – Challenger model (the main and challenger models must belong to the same usecase)
Returns:

The registered usecase deployment object in the current project

Return type:

UsecaseDeployment

Raises:
  • PrevisionException – Any error while creating usecase deployment to the platform or parsing the result
  • Exception – For any other unknown error
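A sketch of rolling out a new version with a challenger, assuming `best_model` and `candidate_model` are Model objects from the same usecase and the deployment id and name are placeholders:

```python
import previsionio as pio

deployment = pio.UsecaseDeployment.from_id("<your deployment id>")
new_deployment = deployment.new_version(
    "churn-deployment-v2",             # hypothetical name
    main_model=best_model,
    challenger_model=candidate_model,  # must come from the same usecase as the main model
)
```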
predict_from_dataset(dataset: previsionio.dataset.Dataset) → previsionio.prediction.DeploymentPrediction

Make a prediction for a dataset stored in the current active [client] workspace (using the current SDK dataset object).

Parameters:dataset (Dataset) – Dataset resource to make a prediction for
Returns:The registered prediction object in the current workspace
Return type:DeploymentPrediction
wait_until(condition, timeout: float = 3600.0)

Wait until condition is fulfilled, then break.

Parameters:
  • condition (func: (UsecaseDeployment) -> bool) – Function used to check the break condition
  • timeout (float, optional) – Maximal amount of time to wait before forcing exit

Example:

deployment.wait_until(lambda d: len(d.models) > 3)
Raises:PrevisionException – If the resource could not be fetched or there was a timeout.

Deployed model

Prevision.io’s SDK allows you to make predictions from a model deployed with the Prevision.io platform.

import previsionio as pio

# Initialize the deployed model object from the url of the model, your client id and client secret for this model, and your credentials
model = pio.DeployedModel(prevision_app_url, client_id, client_secret)

# Make a prediction
prediction, confidence, explain = model.predict(
    predict_data={'feature1': 1, 'feature2': 2},
    use_confidence=True,
    explain=True,
)
class previsionio.deployed_model.DeployedModel(prevision_app_url: str, client_id: str, client_secret: str, prevision_token_url: str = None)

DeployedModel class to interact with a deployed model.

Parameters:
  • prevision_app_url (str) – URL of the deployed model application
  • client_id (str) – Client id of the deployed model
  • client_secret (str) – Client secret of the deployed model
  • prevision_token_url (str, optional) – URL of the authentication token endpoint (default: None)

predict(predict_data: Dict, use_confidence: bool = False, explain: bool = False)

Get a prediction on a single instance from the deployed model.

Parameters:
  • predict_data (dict) – Input data for the prediction
  • use_confidence (bool, optional) – Whether to predict with confidence values (default: False)
  • explain (bool, optional) – Whether to explain the prediction (default: False)
Returns:

Tuple containing the prediction value, the confidence and the explanation. For a regression problem type, the confidence is a list; for a multiclassification problem type, the prediction value is a string.

Return type:

tuple(float, float, dict)

request(endpoint, method, files=None, data=None, allow_redirects=True, content_type=None, check_response=True, message_prefix=None, **requests_kwargs)

Make a request on the desired endpoint with the specified method & data.

Requires initialization.

Parameters:
  • endpoint (str) – API endpoint (e.g. /usecases, /prediction/file)
  • method (requests.{get,post,delete}) – requests method
  • files (dict) – Files dict
  • data (dict) – Input data (for a single predict)
  • content_type (str) – Force the request content-type
  • allow_redirects (bool) – Passed to the requests method
Returns:

request response

Raises:

Exception – Error if url/token not configured

Prediction

class previsionio.prediction.DeploymentPrediction(_id: str, project_id: str, deployment_id: str, state='running', main_model_id=None, challenger_model_id=None, **kwargs)

Bases: previsionio.api_resource.ApiResource

A prediction object that represents a deployed usecase bulk prediction resource which can be explored on the Prevision.io platform.

classmethod from_id(_id: str) → previsionio.prediction.DeploymentPrediction

Get a prediction from the platform by its unique id.

Parameters:_id (str) – Unique id of the deployed usecase bulk prediction to retrieve
Returns:The fetched prediction
Return type:DeploymentPrediction
Raises:PrevisionException – Any error while fetching data from the platform or parsing result
get_challenger_result()

Get the prediction result of the challenger model.

Returns:Prediction results dataframe
Return type:pd.DataFrame
Raises:PrevisionException – Any error while fetching data from the platform or parsing result
get_result()

Get the prediction result of the main model.

Returns:Prediction results dataframe
Return type:pd.DataFrame
Raises:PrevisionException – Any error while fetching data from the platform or parsing result
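With both result getters, a main-versus-challenger comparison can be sketched as follows (the prediction id and the output column name are hypothetical):

```python
import previsionio as pio

prediction = pio.DeploymentPrediction.from_id("<your prediction id>")
main_df = prediction.get_result()
challenger_df = prediction.get_challenger_result()

# Rows where the two models disagree (column name depends on your usecase)
disagreements = main_df[main_df["prediction"] != challenger_df["prediction"]]
```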
class previsionio.prediction.ValidationPrediction(_id: str, usecase_id: str, usecase_version_id: str, project_id: str, state='running', model_id=None, model_name=None, dataset_id=None, download_available=False, score=None, duration=None, predictions_count=None, **kwargs)

Bases: previsionio.api_resource.ApiResource

A prediction object that represents a usecase bulk prediction resource which can be explored on the Prevision.io platform.

delete()

Delete a prediction from the platform by its unique id.

Raises:
  • PrevisionException – If the prediction does not exist
  • requests.exceptions.ConnectionError – Error processing the request
classmethod from_id(_id: str) → previsionio.prediction.ValidationPrediction

Get a prediction from the platform by its unique id.

Parameters:_id (str) – Unique id of the usecase bulk prediction to retrieve
Returns:The fetched prediction
Return type:ValidationPrediction
Raises:PrevisionException – Any error while fetching data from the platform or parsing result
get_result()

Get the prediction result.

Returns:Prediction results dataframe
Return type:pd.DataFrame
Raises:PrevisionException – Any error while fetching data from the platform or parsing result

Exporter

class previsionio.exporter.Exporter(_id, connector_id: str, name: str, description: str = None, path: str = None, bucket: str = None, database: str = None, table: str = None, g_cloud: previsionio.connector.GCloud = None, write_mode: previsionio.exporter.ExporterWriteMode = <ExporterWriteMode.safe: 'safe'>, **kwargs)

Bases: previsionio.api_resource.ApiResource, previsionio.api_resource.UniqueResourceMixin

An exporter to access a distant data pool and upload data easily. This resource is linked to a Connector resource that represents the connection to the distant data source.

Parameters:
  • _id (str) – Unique id of the exporter
  • connector_id (str) – Reference to the associated connector (the resource to go through to get a data snapshot)
  • name (str) – Name of the exporter
  • description (str, optional) – Description of the exporter
  • path (str, optional) – Path of the file to write to via the exporter
  • bucket (str, optional) – Bucket of the file to write to via the exporter
  • database (str, optional) – Name of the database to write to via the exporter
  • table (str, optional) – Name of the table to write to via the exporter
  • g_cloud (GCloud, optional) – Type of google cloud service
  • write_mode (ExporterWriteMode, optional) – Write mode
classmethod _new(project_id: str, connector: previsionio.connector.Connector, name: str, description: str = None, path: str = None, bucket: str = None, database: str = None, table: str = None, g_cloud: previsionio.connector.GCloud = None, write_mode: previsionio.exporter.ExporterWriteMode = <ExporterWriteMode.safe: 'safe'>)

Create a new exporter object on the platform.

Parameters:
  • project_id (str) – Unique project id on which to create the exporter
  • connector (Connector) – Reference to the associated connector (the resource to go through to get a data snapshot)
  • name (str) – Name of the exporter
  • description (str, optional) – Description of the exporter
  • path (str, optional) – Path of the file to write to via the exporter
  • bucket (str, optional) – Bucket of the file to write to via the exporter
  • database (str, optional) – Name of the database to write to via the exporter
  • table (str, optional) – Name of the table to write to via the exporter
  • g_cloud (GCloud, optional) – Type of google cloud service
  • write_mode (ExporterWriteMode, optional) – Write mode
Returns:

The registered exporter object in the current workspace

Return type:

Exporter

Raises:
  • PrevisionException – Any error while uploading data to the platform or parsing the result
  • Exception – For any other unknown error
delete()

Delete an exporter from the current [client] workspace.

Raises:
  • PrevisionException – If the exporter does not exist
  • requests.exceptions.ConnectionError – Error processing the request
export_dataset(dataset: previsionio.dataset.Dataset, wait_for_export: bool = False)

Upload a Dataset from the current active project using the exporter.

Parameters:
  • dataset (Dataset) – dataset to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export

export_file(file_path: str, encoding: str = None, separator: str = None, decimal: str = None, thousands: str = None, wait_for_export: bool = False, **kwargs)

Upload a CSV file using the exporter.

Parameters:
  • file_path (str) – Path of the file to upload
  • encoding (str, optional) – Encoding of the file to upload
  • separator (str, optional) – Separator of the file to upload
  • decimal (str, optional) – Decimal separator of the file to upload
  • thousands (str, optional) – Thousands separator of the file to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export
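A sketch of pushing a local CSV through an exporter (the exporter id and file path are placeholders):

```python
import previsionio as pio

exporter = pio.Exporter.from_id("<your exporter id>")
export = exporter.export_file(
    "data/scores.csv",
    encoding="utf-8",
    separator=",",
    wait_for_export=True,  # block until the export finishes
)
```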

export_prediction(prediction: Union[previsionio.prediction.DeploymentPrediction, previsionio.prediction.ValidationPrediction], wait_for_export: bool = False)

Upload a DeploymentPrediction or a ValidationPrediction from the current active project using the exporter.

Parameters:
  • prediction (DeploymentPrediction | ValidationPrediction) – Prediction to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export

classmethod from_id(_id: str)

Get an exporter from the instance by its unique id.

Parameters:_id (str) – Unique id of the exporter to retrieve
Returns:The fetched exporter
Return type:Exporter
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
classmethod list(project_id: str, all: bool = False)

List all the available exporters in the current active [client] workspace.

Parameters:
  • project_id (str) – Unique id of the project in which to list the exporters
  • all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Else, the query will only return the first page of results.
Returns:The fetched exporter objects
Return type:list(Exporter)
list_exports()

List all the available exports given the exporter id.

Returns:The fetched export objects
Return type:list(Export)
class previsionio.exporter.ExporterWriteMode

Bases: enum.Enum

Write mode for exporters.

append = 'append'

Append to existing table.

replace = 'replace'

Replace existing file/table.

safe = 'safe'

Fail if file already exists.

timestamp = 'timestamp'

Append timestamp to the output filename.

Export

class previsionio.export.Export(_id, exporter_id: str, dataset_id: str = None, prediction_id: str = None, status: str = None, **kwargs)

Bases: previsionio.api_resource.ApiResource, previsionio.api_resource.UniqueResourceMixin

An export resource, representing a single data export performed through an exporter.

Parameters:
  • _id (str) – Unique id of the export
  • exporter_id (str) – Unique exporter id on which to create the export
  • dataset_id (str, optional) – Unique dataset id on which to create the export
  • prediction_id (str, optional) – Unique prediction id on which to create the export
classmethod _new(exporter_id: str, prediction: Union[previsionio.prediction.DeploymentPrediction, previsionio.prediction.ValidationPrediction] = None, dataset: previsionio.dataset.Dataset = None, file_path: str = None, encoding: str = None, separator: str = None, decimal: str = None, thousands: str = None, wait_for_export: bool = False, origin: str = None, pipeline_scheduled_run_id: str = None)

Create a new export object on the platform.

Parameters:
  • exporter_id (str) – Unique exporter id on which to create the export
  • prediction (DeploymentPrediction | ValidationPrediction, optional) – Prediction to upload
  • dataset (Dataset, optional) – Dataset to upload
  • file_path (str, optional) – Path of the file to upload
  • encoding (str, optional) – Encoding of the file to upload
  • separator (str, optional) – Separator of the file to upload
  • decimal (str, optional) – Decimal separator of the file to upload
  • thousands (str, optional) – Thousands separator of the file to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export

Raises:
  • PrevisionException – Any error while uploading data to the platform or parsing the result
  • Exception – For any other unknown error
classmethod export_dataset(exporter_id, dataset: previsionio.dataset.Dataset, wait_for_export: bool = False)

Upload a Dataset from the current active project using an exporter.

Parameters:
  • exporter_id (str) – Unique exporter id on which to create the export
  • dataset (Dataset) – Dataset to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export

classmethod export_file(exporter_id: str, file_path: str, encoding: str = None, separator: str = None, decimal: str = None, thousands: str = None, wait_for_export: bool = False, **kwargs)

Upload a CSV file using an exporter.

Parameters:
  • exporter_id (str) – Unique exporter id on which to create the export
  • file_path (str) – Path of the file to upload
  • encoding (str, optional) – Encoding of the file to upload
  • separator (str, optional) – Separator of the file to upload
  • decimal (str, optional) – Decimal separator of the file to upload
  • thousands (str, optional) – Thousands separator of the file to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export

classmethod export_prediction(exporter_id, prediction: Union[previsionio.prediction.DeploymentPrediction, previsionio.prediction.ValidationPrediction], wait_for_export: bool = False)

Upload a DeploymentPrediction or a ValidationPrediction from the current active project using an exporter.

Parameters:
  • exporter_id (str) – Unique exporter id on which to create the export
  • prediction (DeploymentPrediction | ValidationPrediction) – Prediction to upload
  • wait_for_export (bool, optional) – Whether to wait until the export is complete or not
Returns:

The registered export object

Return type:

Export

classmethod from_id(_id: str)

Get an export from the instance by its unique id.

Parameters:_id (str) – Unique id of the export to retrieve
Returns:The fetched export
Return type:Export
Raises:PrevisionException – Any error while fetching data from the platform or parsing the result
classmethod list(exporter_id: str)

List all the available exports given an exporter id.

Parameters:exporter_id (str) – Unique id of the exporter
Returns:The fetched export objects
Return type:list(Export)
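To check on past exports, the class-level list() can be sketched as follows (the exporter id is a placeholder; the attributes read below come from the Export constructor):

```python
import previsionio as pio

for export in pio.Export.list("<your exporter id>"):
    print(export._id, export.status)
```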