Welcome to prevision-python’s documentation!¶
Prevision.io is an automated SaaS machine learning platform that enables you to create and deploy powerful predictive models and business applications in one click.
This documentation focuses on how to use Prevision.io’s Python SDK for a direct usage in your data science scripts.
To take a quick peek at the available features, look at the Getting started guide.
For more in-depth explanations, dive into the extended documentation with Tutorials & Samples.
If you’d rather examine the Python API directly, here is the direct API Reference.
Compatibility between Prevision.io’s Python SDK and the Prevision Platform works as follows:
 | Prevision 10.10 | Prevision 10.11 | Prevision 10.12 | Prevision 10.13 | Prevision 10.14 | Prevision 10.15 | Prevision 10.16 | Prevision 10.17 | Prevision 10.18 | Prevision 10.19 | Prevision 10.20 | Prevision 10.21 | Prevision 10.22 | Prevision 10.23 | Prevision 10.24 | Prevision 11.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Prevision Python SDK 10.10 | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.11 | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.12 | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.13 | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.14 | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.15 | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.16 | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.17 | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.18 | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Prevision Python SDK 10.19 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ |
Prevision Python SDK 10.20 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ |
Prevision Python SDK 10.21 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ |
Prevision Python SDK 10.22 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ |
Prevision Python SDK 10.23 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ |
Prevision Python SDK 10.24 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✘ |
Prevision Python SDK 11.0 | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✓ |
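When scripting against several environments, the compatibility rules above can be encoded as plain data. A minimal sketch (only two rows of the table transcribed by hand, purely for illustration; this is not a mapping shipped with the SDK):

```python
# two rows of the compatibility table above, transcribed by hand
COMPAT = {
    "10.10": {"10.10", "10.11", "10.12"},  # SDK 10.10 row
    "11.0": {"11.0"},                      # SDK 11.0 row
}

def is_compatible(sdk_version: str, platform_version: str) -> bool:
    """Return True if the given SDK version supports the given platform version."""
    return platform_version in COMPAT.get(sdk_version, set())
```

Extending the dictionary to the full table is mechanical.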
Getting started¶
Pre-requisites¶
You need to have an account at cloud.prevision.io or an on-premise version installed in your company. Contact us or your IT manager for more information.
You will be working on a specific “instance”. This instance corresponds to the subdomain at the beginning of the URL of your prevision.io address: https://<your instance>.prevision.io.
Get the package¶
- clone the git repo:
git clone https://github.com/previsionio/prevision-python.git
- install as a Python package:
cd prevision-python
python setup.py install
Setup your client¶
Prevision.io’s SDK client uses a specific master token to authenticate with the instance’s server and allow you to perform various requests. To get your master token, log in the online interface on your instance, navigate to the admin page and copy the token.
You can either set the token and the instance name as environment variables, by specifying PREVISION_URL and PREVISION_MASTER_TOKEN, or set them at the beginning of your script:
import previsionio as pio
# We initialize the client with our master token and the url of the prevision.io server
# (or local installation, if applicable)
url = """https://<your instance>.prevision.io"""
token = """<your token>"""
pio.client.init_client(url, token)
A small example¶
Create a project¶
First things first, to upload data or train a usecase, you need to create a project.
# create project
project = pio.Project.new(name="project_name",
description="project description")
Getting some data¶
To train a usecase, you need to gather some training data. This data can be passed as a pandas DataFrame or as a string representing a path to a file.
import pandas as pd

# load some data from a CSV file
data_path = 'data/titanic.csv'
dataset = project.create_dataset(name='helloworld', file_name=data_path)
# or use a pandas DataFrame
dataframe = pd.read_csv(data_path)
dataset = project.create_dataset(name='helloworld', dataframe=dataframe)
This will automatically read the given data and upload it as a new dataset on your Prevision.io’s instance. If you go to the online interface, you will see this new dataset in the list of datasets (in the “Data” tab).
You can also load in your script a dataset that has already been uploaded on the platform:
# by unique id
dataset = pio.Dataset.from_id('5ebaad70a7271000e7b28ea0')
Note
If you want to list all of the available datasets on your instance, simply use:
datasets = project.list_datasets()
Configuring a usecase¶
If you want, you can also specify some training parameters, such as which models are used, which transformations are applied, and how the models are optimized.
uc_config = pio.TrainingConfig(advanced_models=[pio.AdvancedModel.LinReg],
normal_models=[pio.NormalModel.LinReg],
simple_models=[pio.SimpleModel.DecisionTree],
features=[pio.Feature.Counts],
profile=pio.Profile.Quick)
For full details on the training config and training parameters, see the training config documentation.
Starting training¶
You can then create a new usecase based on:
- a usecase name
- a dataset
- a column config
- (optional) a metric type
- (optional) a training config
usecase_version = project.fit_classification('helloworld_classif',
dataset,
metric=pio.metrics.Classification.AUC,
training_config=uc_config)
Note
For more complex usecase setups (for example with an image dataset), refer to the Starting a usecase guide.
Configuring a text similarity usecase¶
If you want, you can also specify some training parameters, such as which models are used, which embedding and preprocessing are applied.
models_parameters_1 = pio.ModelsParameters(pio.ModelEmbedding.TFIDF,
pio.Preprocessing(),
[pio.TextSimilarityModels.BruteForce, pio.TextSimilarityModels.ClusterPruning])
models_parameters_2 = pio.ModelsParameters(pio.ModelEmbedding.Transformer,
{},
[pio.TextSimilarityModels.BruteForce])
models_parameters_3 = pio.ModelsParameters(pio.ModelEmbedding.TransformerFineTuned,
{},
[pio.TextSimilarityModels.BruteForce])
models_parameters = [models_parameters_1, models_parameters_2, models_parameters_3]
models_parameters = pio.ListModelsParameters(models_parameters=models_parameters)
Note
If you want the default configuration of text similarity models, simply use:
models_parameters = pio.ListModelsParameters()
Starting text similarity training¶
You can then create a new text similarity usecase based on:
- a usecase name
- a dataset
- a description column config
- (optional) a queries dataset
- (optional) a queries column config
- (optional) a metric type
- (optional) a top k
- (optional) a language
- (optional) a models parameters list
usecase_version = project.fit_text_similarity('helloworld_text_similarity',
dataset,
description_column_config,
metric=pio.metrics.TextSimilarity.accuracy_at_k,
top_k=10,
models_parameters=models_parameters)
Monitoring training¶
You can retrieve at any moment the number of models trained so far and the current error score, as well as some additional info.
>>> usecase_version.score
0.0585
>>> usecase_version.print_info()
scores_cv: 0.0585
You can also wait until a certain condition is reached, such as a number of models or a certain score:
# will block until there is at least 1 model
usecase_version.wait_until(lambda usecasev: len(usecasev.models) > 0)
# will block until error is lower than 0.3 (warning: it may never reach it and wait forever)
usecase_version.wait_until(lambda usecasev: usecasev.score < .3)
The wait_until method takes a function that receives the usecase as an argument, and can therefore access any info relative to the usecase.
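Under the hood, this kind of helper is just a polling loop. Here is a standalone sketch of the idea (illustrative names and timings, not the SDK’s actual implementation):

```python
import time

def wait_until(get_state, condition, poll_interval=0.01, timeout=5.0):
    """Poll `get_state` until `condition(state)` is true, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if condition(state):
            return state
        time.sleep(poll_interval)
    raise TimeoutError("condition was not met before the timeout")

# usage sketch: a fake "training" whose model count grows at each poll
counter = {"models": 0}

def fake_state():
    counter["models"] += 1
    return counter

result = wait_until(fake_state, lambda s: s["models"] > 3)
```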
Making predictions¶
Once we have at least a model, we can start making predictions. We don’t need to wait until the complete training process is done, and we’ll always have access to the best model trained so far.
# we have some test data here:
data_path = 'data/titanic_test.csv'
test_dataset = project.create_dataset(name='helloworld_test', file_name=data_path)
preds = usecase_version.predict_from_dataset(test_dataset)
# scikit-learn style:
df = pd.read_csv(data_path)
preds = usecase_version.predict(df)
- For text similarity, you can create a new prediction based on:
- a queries dataset
- a query column name
- (optional) a top k
- (optional) a description id column name
# we have some test data here:
data_path = 'data/queries_test.csv'
test_dataset = project.create_dataset(name='helloworld_test', file_name=data_path)
preds = usecase_version.predict_from_dataset(test_dataset,
'query',
top_k=10,
queries_dataset_matching_id_description_column='true_item_id')
Additional util methods¶
Retrieving a use case¶
Since a use case can be somewhat long to train, it can be useful to separate the training, monitoring and prediction phases.
To do that, we need to be able to recreate a usecase version object in Python from its unique id:
usecase_version = pio.Supervised.from_id('<a usecase id>')
# usecase_version now has all the same methods as a usecase_version created directly from a file or a dataframe
>>> usecase_version.print_info()
scores_cv: 0.0585
state: running
Stopping and deleting¶
Once you’re satisfied with the model performance, don’t want to wait for the complete training process to be over, or need to free up some resources to start a new training, you can simply stop the usecase version:
usecase_version.stop()
You’ll still be able to make predictions and get info, but the performance won’t improve anymore. Note: there’s no difference in state between a stopped usecase and a usecase that has completed its training.
You can decide to completely delete the usecase:
uc = usecase_version.usecase
uc.delete()
However, be careful because, in that case, any detail about the usecase will be removed, and you won’t be able to make predictions anymore.
Tutorials & Samples¶
Tutorials¶
Configuring the client¶
To connect Prevision.io’s Python SDK to your instance, you need to have your master token. This token can be found either:
by going to the online web interface, in the “Administration & API key” page
by calling:
client.init_client_with_login(prevision_url, email, password)
Then, to use your client credentials, you have 2 options:
- set the credentials as environment variables so that they are automatically reloaded when you run your scripts: set PREVISION_URL to your instance url (i.e. something of the form: https://<instance_name>.prevision.io) and PREVISION_MASTER_TOKEN to the master token you just retrieved
- set the credentials at the beginning of your script, using the init_client() method:
import previsionio as pio
url = """https://<your instance>.prevision.io"""
token = """<your token>"""
pio.client.init_client(url, token)
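For the environment-variable option, the two variables can simply be exported in your shell before running the script (the variable names are the ones documented above; the values are placeholders to fill in):

```shell
# set the two variables the SDK looks for
export PREVISION_URL="https://<your instance>.prevision.io"
export PREVISION_MASTER_TOKEN="<your token>"
```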
For a full description of Prevision.io’s client API, check out the API Reference.
Loading & fetching datasets¶
Loading up data¶
To use a dataset in a Prevision.io’s usecase, you need to upload it on the platform. This can be done on the online platform, in the “Data” page, or through the SDK.
When using the SDK, you can reference a file path directly, or use a pre-read pandas dataframe, to easily create a new Prevision.io dataset on the platform:
import pandas as pd

# create a project
project = pio.Project.new(name="project_name",
description="project description")
# load some data from a CSV file
data_path = 'helloworld.csv'
dataset = project.create_dataset(name='helloworld', file_name=data_path)
# or use a pandas DataFrame
dataframe = pd.read_csv(data_path)
dataset = project.create_dataset(name='helloworld', dataframe=dataframe)
If you have a datasource you want to take a snapshot of to create a dataset (see Managing datasources & connectors), use the SDK resource object in your arguments:
datasource = pio.DataSource.from_id('my_datasource_id')
dataset = project.create_dataset(name='helloworld', datasource=datasource)
Listing available datasets¶
To get a list of all the datasets currently available on the platform (in your workspace), use the list_datasets()
method:
datasets = project.list_datasets()
for dataset in datasets:
print(dataset.name)
Fetching data from the platform¶
If you already uploaded a dataset on the platform and want to grab it locally to perform some preprocessing,
or a train/test split, simply use the from_id() SDK method:
dataset = pio.Dataset.from_id('5ebaad70a7271000e7b28ea0')
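As an illustration of such local preprocessing, a simple train/test split with pandas could look like this (toy data and an 80/20 ratio, purely illustrative; how you obtain a local DataFrame from the fetched dataset depends on your workflow):

```python
import pandas as pd

# toy data standing in for a dataset fetched from the platform
df = pd.DataFrame({"feature": range(10), "TARGET": [0, 1] * 5})

# 80/20 split; each part can then be re-uploaded with create_dataset
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)
```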
Starting a usecase¶
To run a use case and train models on your training subset, you need to define some configuration parameters, and then simply use the SDK’s BaseUsecase-derived methods to have the platform automatically take care of starting everything for you.
Configuring the use case columns¶
In order for the platform to know what your training target is, or whether you have some specific id columns that should not be taken into account during computation, you need to specify some “column configuration” for your use case.
These columns are bundled together into a ColumnConfig instance; there are 5 interesting parameters:
- target_column: the name of the target column in the dataset
- id_column (optional): the name of an id column that has no value for the model (it doesn’t have any true signal) but is just a handy list of references, for example; this column should thus be ignored during training (but it will eventually be rematched to the prediction sample to give you back the full data)
- fold_column (optional): if you want to perform a custom stratification to improve the quality of your predictions (which is sometimes better than regular cross-validation), you can pass a specific column name to use as reference; if none is provided, a random stratification will be used and will try to force the same distribution of the target between folds
- weight_column (optional): sometimes, a numerical column does not contain an actual feature but rather an indication of how important each row is; if that is the case, you can pass the name of this column as weight_column (the higher the weight, the more important the row; by default, all rows are considered to be of equal importance); note that if this is provided, the optimised metric will become weighted
- drop_list (optional): you can pass a list of column names that you wish to exclude from the training (they will simply be ignored)
There are additional columns required in case of a timeseries or image-based usecase: take a look at the ColumnConfig API reference for more details.
Here is an example of a very basic column configuration instance:
column_config = pio.ColumnConfig(target_column='TARGET', id_column='ID')
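If you want to build a custom fold_column yourself before uploading, a minimal pandas sketch could look like this (toy data; the grouping rule ID % 5 is purely illustrative):

```python
import pandas as pd

# toy data: rows sharing an ID must end up in the same fold
df = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 4], "TARGET": [0, 1, 0, 1, 0, 1]})
df["fold"] = df["ID"] % 5  # assign each group of rows to one of 5 folds

# every row with the same ID now carries the same fold number, and the
# "fold" column name can be passed as fold_column in a ColumnConfig
```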
Configuring the training profile¶
You can also fine-tune your use case options by configuring a training profile. This ensemble of variables will decide several things for your use case: what models are tested out, what metric is used, the desired types of feature engineering…
The function offers you a range of options to choose from, among which some that are used quite often:
- models: the list of “full” models you want to add to your training pipeline, chosen among “LR”, “RF”, “ET”, “XGB”, “LGB” and “NN”
- simple_models: the list of “simple” models you want to add to your training pipeline, chosen among “LR” and “DT”
- fe_selected_list: the list of feature engineering blocks to integrate in the pipeline (these will be applied on your dataset during training to extract relevant information and transform it in the best possible way for the models fit step)
- profile: this Prevision.io-specific setting is a way of choosing a global run mode that determines both training time and performance. You may choose between 3 profiles:
  - the “quick” profile runs very fast but has a lower performance (it is recommended for early trials)
  - the “advanced” profile runs slower but has increased performance (it is usually used for optimization steps at the end of your project)
  - the “normal” profile is something in-between to help you investigate an interesting result
- with_blend: if you turn this setting on, you allow Prevision.io’s training pipeline to append additional “blend” models at the end; these are based on some cherry-picked already-trained models and undergo further optimization to usually get even better results
A common “quick-test” training config could be:
training_config = pio.TrainingConfig(advanced_models=[pio.AdvancedModel.LinReg],
normal_models=[pio.NormalModel.LinReg],
simple_models=[pio.SimpleModel.DecisionTree],
features=[pio.Feature.Counts],
profile=pio.Profile.Quick)
Starting the use case!¶
To create the usecase and start your training session, you need to create a project and then call the appropriate fit function.
For a full list of the usecase fit functions, check out the API Reference.
You also need to provide the API with the dataset you want to use (for a tabular usecase) or the CSV reference dataset and ZIP image dataset (for an image usecase).
The following example shows how to start a regression on a simple tabular dataset:
uc = project.fit_regression('helloworld reg',
dataset,
column_config=column_config,
training_config=training_config)
If you are running an image usecase, you need to pass the two datasets as a tuple. The following example shows how to start a regression on an image dataset (where the CSV reference dataset is a Dataset instance and the ZIP image dataset is a DatasetImages instance):
uc = project.fit_image_regression('helloworld images reg',
(dataset_csv, dataset_zip),
column_config=column_config,
training_config=training_config)
When you start your usecase, you can either let the SDK pick a default metric according to your usecase type, or you can choose one yourself from the list of available Metrics.
Managing datasources & connectors¶
Datasources and connectors are Prevision.io’s way of keeping a link to a source of data and taking snapshots when needed. The distant data source can be an FTP server, a database, an Amazon bucket…
Connectors hold the credentials to connect to the distant data source, and datasources specify the exact resource to fetch from it (be it the path to the file to load, the name of the database table to parse…).
For more info on all the options of connectors and datasources, check out the API Reference.
Listing available connectors and datasources¶
Connectors and datasources already registered on the platform can be listed using the list_connectors() and list_datasource() methods of the project class:
connectors = project.list_connectors()
for connector in connectors:
print(connector.name)
datasources = project.list_datasource()
for datasource in datasources:
print(datasource.name)
Creating a connector¶
To create a connector, use the appropriate method of the project class. For example, to create a connector to an SQL database, use the create_sql_connector() method and pass in your credentials:
connector = project.create_sql_connector('my_sql_connector',
'https://myserver.com',
port=3306,
username='username',
password='password')
Creating a datasource¶
After you’ve created a connector, you need to use a datasource to actually refer to and fetch a resource in the distant data source. To create a datasource, you need to link the matching connector and to supply the relevant info, depending on the connector type.
datasource = project.create_datasource(connector,
'my_sql_datasource',
database='my_db',
table='table1')
You can then create datasets from this datasource as explained in the guide on Loading & fetching datasets.
Samples¶
Getting started¶
This piece of code shows how to:
- initialize a connection to your instance and authenticate with your token
- load some data
- start a usecase
- get info about the usecase and its model
- make some predictions
import previsionio as pio
import pandas as pd
# CLIENT INITIALIZATION -----------------------------------------
url = """https://<your instance>.prevision.io"""
token = """<your token>"""
pio.client.init_client(url, token)
# CREATE PROJECT --------------------------------------------------
project = pio.Project.new(name="project_name",
description="project description")
# DATA LOADING --------------------------------------------------
# load data from a CSV
dataframe = pd.read_csv('helloworld_train.csv')
# upload it to the platform
dataset = project.create_dataset(name='helloworld_train', dataframe=dataframe)
# USECASE TRAINING ----------------------------------------------
# setup usecase
uc_config = pio.TrainingConfig(advanced_models=[pio.AdvancedModel.LinReg],
normal_models=[pio.NormalModel.LinReg],
simple_models=[pio.SimpleModel.DecisionTree],
features=[pio.Feature.Counts],
profile=pio.Profile.Quick)
# run training
usecase_version = project.fit_classification('helloworld_classif',
dataset,
metric=pio.metrics.Classification.AUC,
training_config=uc_config)
# (block until there is at least 1 model trained)
usecase_version.wait_until(lambda usecase: len(usecase.models) > 0)
# check out the usecase status and other info
usecase_version.print_info()
print('Current (best model) score:', usecase_version.score)
# PREDICTIONS ---------------------------------------------------
# load up test data
test_datapath = 'helloworld_test.csv'
test_dataset = project.create_dataset(name='helloworld_test', file_name=test_datapath)
preds = usecase_version.predict_from_dataset(test_dataset)
df = pd.read_csv(test_datapath)
preds = usecase_version.predict(df)
Setting logging¶
Prevision.io’s SDK can provide with more detailed information if you change the logging level. By default, it will only output warnings and errors.
To change the logging level, use the pio.verbose() method:
previsionio.verbose(v, debug: bool = False, event_log: bool = False)¶
Set the SDK level of verbosity.
Parameters: - v (bool) – whether to activate verbose (info-level) logging
- debug (bool, optional) – whether to activate debug logging
- event_log (bool, optional) – whether to activate event logging
For example:
import previsionio as pio
# CHANGE LOGGING LEVEL ------------------------------------------
pio.verbose(True, debug=True) # (add event_log=True
# for events logging)
# CLIENT INITIALIZATION -----------------------------------------
url = """https://<your instance>.prevision.io"""
token = """<your token>"""
pio.client.init_client(url, token)
# TESTING LOGS --------------------------------------------------
# fetching a dataset from the platform
dataset = pio.Dataset.from_id('dataset_id')
# fetching a usecase from the platform
usecase = pio.Usecase.from_id('usecase_id')
# fetching a usecase version from the platform
usecase_version = pio.Supervised.from_id('usecase_version_id')
usecase_version = pio.Classification.from_id('usecase_version_id')
# fetching a model from the platform
model = pio.Model.from_id('helloworld classif')
API Reference¶
This section gathers all the available classes, functions and tools offered by Prevision.io’s Python SDK.
Client¶
Prevision.io’s SDK client uses a specific master token to authenticate with the instance’s server and allow you to perform various requests. To get your master token, log in the online interface, navigate to the admin page and copy the token.
You can either set the token and the instance name as environment variables, by specifying PREVISION_URL and PREVISION_MASTER_TOKEN, or set them at the beginning of your script:
import previsionio as pio
# We initialize the client with our master token and the url of the prevision.io server
# (or local installation, if applicable)
url = """https://<your instance>.prevision.io"""
token = """<your token>"""
pio.client.init_client(url, token)
class previsionio.prevision_client.Client¶
Client class to interact with the Prevision.io platform and manage authentication.
init_client(prevision_url: str, token: str)¶
Init the client (and check that the connection is valid).
Parameters: - prevision_url (str) – URL of the Prevision.io platform. Should be https://cloud.prevision.io if you’re in the cloud, or a custom IP address if installed on-premise.
- token (str) –
Your Prevision.io master token. Can be retrieved on /dashboard/infos on the web interface or obtained programmatically through:
client.init_client_with_login(prevision_url, email, password)
request(endpoint: str, method, files: Dict = None, data: Dict = None, allow_redirects: bool = True, content_type: str = None, no_retries: bool = False, **requests_kwargs) → requests.models.Response¶
Make a request on the desired endpoint with the specified method & data. Requires initialization.
Parameters: - endpoint (str) – api endpoint (e.g. /usecases, /prediction/file)
- method (requests.{get,post,delete}) – requests method
- files (dict) – files dict
- data (dict) – for single predict
- content_type (str) – force request content-type
- allow_redirects (bool) – passed to requests method
- no_retries (bool) – force request to run the first time, or exit directly
Returns: request response
Raises: Exception – Error if url/token not configured
Project¶
class previsionio.project.Project(_id: str, name: str, description: str = None, color: previsionio.project.ProjectColor = None, created_by: str = None, admins=[], contributors=[], viewers=[], pipelines_count: int = 0, usecases_count: int = 0, dataset_count: int = 0, **kwargs)¶
Bases: previsionio.api_resource.ApiResource, previsionio.api_resource.UniqueResourceMixin
A Project
Parameters: - _id (str) – Unique id of the project
- name (str) – Name of the project
- description (str, optional) – Description of the project
- color (ProjectColor, optional) – Color of the project
create_dataset(name: str, datasource: previsionio.datasource.DataSource = None, file_name: str = None, dataframe: pandas.core.frame.DataFrame = None)¶
Register a new dataset in the workspace for further processing. You need to provide either a datasource, a file name or a dataframe (only one can be specified).
Note
To start a new use case on a dataset, it has to be already registered in your workspace.
Parameters: - name (str) – Registration name for the dataset
- datasource (DataSource, optional) – A DataSource object used to import a remote dataset (if you want to import a specific dataset from an existent database, you need a datasource connector (Connector object) designed to point to the related data source)
- file_name (str, optional) – Path to a file to upload as dataset
- dataframe (pd.DataFrame, optional) – A pandas dataframe containing the data to upload
Raises: Exception – If more than one of the keyword arguments datasource, file_name, dataframe was specified
PrevisionException – Error while creating the dataset on the platform
Returns: The registered dataset object in the current workspace.
Return type: Dataset
create_datasource
(connector: previsionio.connector.Connector, name: str, path: str = None, database: str = None, table: str = None, bucket: str = None, request: str = None, gCloud: str = None)¶ Create a new datasource object on the platform.
Parameters: - connector (Connector) – Reference to the associated connector (the resource to go through to get a data snapshot)
- name (str) – Name of the datasource
- path (str, optional) – Path to the file to fetch via the connector
- database (str, optional) – Name of the database to fetch data from via the connector
- table (str, optional) – Name of the table to fetch data from via the connector
- bucket (str, optional) – Name of the bucket to fetch data from via the connector
- gCloud (str, optional) – gCloud
- request (str, optional) – Direct SQL request to use with the connector to fetch data
Returns: The registered datasource object in the current project
Return type: DataSource
Raises: PrevisionException – Any error while uploading data to the platform or parsing the result
Exception – For any other unknown error
create_ftp_connector(name: str, host: str, port: int = 21, username: str = '', password: str = '')¶
A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).
Returns: The registered connector object in the current project.
Return type: Connector
create_gcp_connector(name: str = '', host: str = '', port=None, username: str = '', password: str = '', googleCredentials: str = '')¶
A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).
Returns: The registered connector object in the current project.
Return type: Connector
create_hive_connector(name: str, host: str, port: int = 10000, username: str = '', password: str = '')¶
A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).
Returns: The registered connector object in the current project.
Return type: Connector
create_image_folder(name: str, file_name: str)¶
Register a new image dataset in the workspace for further processing (in the image folders group).
Note
To start a new use case on an image dataset, it has to be already registered in your workspace.
Raises: PrevisionException – Error while creating the dataset on the platform
Returns: The registered dataset object in the current workspace.
Return type: DatasetImages
create_s3_connector(name: str, host: str = '', port: int = None, username: str = '', password: str = '')¶
A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).
Returns: The registered connector object in the current project.
Return type: Connector
create_sftp_connector(name: str, host: str, port: int = 23, username: str = '', password: str = '')¶
A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).
Returns: The registered connector object in the current project.
Return type: Connector
create_sql_connector(name: str, host: str, port: int = 3306, username: str = '', password: str = '')¶
A connector to interact with a distant source of data (and easily get data snapshots using an associated DataSource resource).
Returns: The registered connector object in the current project.
Return type: Connector
delete() → requests.models.Response¶
Delete a project from the current workspace.
Raises: PrevisionException – If the project does not exist
requests.exceptions.ConnectionError – Error processing the request
fit_classification(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Classification = <Classification.AUC: 'auc'>, holdout_dataset=None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)¶
Start a tabular classification usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (Dataset) – Reference to the dataset object to use as the training dataset
- column_config (ColumnConfig) – Column configuration for the usecase (see the documentation of the ColumnConfig resource for more details on each possible column type)
- metric (str, optional) – Specific metric to use for the usecase (default: None)
- holdout_dataset (Dataset, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (TrainingConfig) – Specific training configuration (see the documentation of the TrainingConfig resource for more details on all the parameters)
Returns: Newly created Classification usecase version object
Return type: supervised.Classification
fit_image_classification(name: str, dataset: Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages], column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Classification = <Classification.AUC: 'auc'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)¶
Start an image classification usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (Tuple[
Dataset
, DatasetImages
]) – Reference to the pair of dataset objects (tabular data and image folder) to use as the training dataset
- column_config (
ColumnConfig
) – Column configuration for the usecase (see the documentation of the ColumnConfig
resource for more details on each possible column type)
- metric (metrics.Classification, optional) – Specific metric to use for the usecase (default: metrics.Classification.AUC)
- holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (
TrainingConfig
) – Specific training configuration (see the documentation of the TrainingConfig
resource for more details on all the parameters)
Returns: Newly created ClassificationImages usecase version object
Return type: supervised.ClassificationImages
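The image variants differ only in that the dataset argument is a (Dataset, DatasetImages) pair. A hedged sketch, with the configuration objects built by the caller:

```python
def train_image_classification(project, tabular_dataset, image_folder, column_config):
    """Sketch: start an image classification usecase version.

    `tabular_dataset` is a previsionio.dataset.Dataset (labels and image
    references), `image_folder` a previsionio.dataset.DatasetImages (the ZIP
    of images), both already uploaded to the project.
    """
    # The dataset argument is the (Dataset, DatasetImages) pair from the signature above.
    return project.fit_image_classification(
        name='image-classifier',  # hypothetical usecase name
        dataset=(tabular_dataset, image_folder),
        column_config=column_config,
    )
```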
-
fit_image_multiclassification
(name: str, dataset: Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages], column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.MultiClassification = <MultiClassification.log_loss: 'log_loss'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs) → previsionio.supervised.Supervised¶ Start an image multiclassification usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (Tuple[
Dataset
, DatasetImages
]) – Reference to the pair of dataset objects (tabular data and image folder) to use as the training dataset
- column_config (
ColumnConfig
) – Column configuration for the usecase (see the documentation of the ColumnConfig
resource for more details on each possible column type)
- metric (metrics.MultiClassification, optional) – Specific metric to use for the usecase (default: metrics.MultiClassification.log_loss)
- holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (
TrainingConfig
) – Specific training configuration (see the documentation of the TrainingConfig
resource for more details on all the parameters)
Returns: Newly created MultiClassificationImages usecase version object
Return type: supervised.MultiClassificationImages
-
fit_image_regression
(name: str, dataset: Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages], column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Regression = <Regression.RMSE: 'rmse'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)¶ Start an image regression usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (Tuple[
Dataset
, DatasetImages
]) – Reference to the pair of dataset objects (tabular data and image folder) to use as the training dataset
- column_config (
ColumnConfig
) – Column configuration for the usecase (see the documentation of the ColumnConfig
resource for more details on each possible column type)
- metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: metrics.Regression.RMSE)
- holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (
TrainingConfig
) – Specific training configuration (see the documentation of the TrainingConfig
resource for more details on all the parameters)
Returns: Newly created RegressionImages usecase version object
Return type: supervised.RegressionImages
-
fit_multiclassification
(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.MultiClassification = <MultiClassification.log_loss: 'log_loss'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)¶ Start a tabular multiclassification usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (
Dataset
) – Reference to the dataset object to use as the training dataset
- column_config (
ColumnConfig
) – Column configuration for the usecase (see the documentation of the ColumnConfig
resource for more details on each possible column type)
- metric (metrics.MultiClassification, optional) – Specific metric to use for the usecase (default: metrics.MultiClassification.log_loss)
- holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (
TrainingConfig
) – Specific training configuration (see the documentation of the TrainingConfig
resource for more details on all the parameters)
Returns: Newly created MultiClassification usecase version object
Return type: supervised.MultiClassification
-
fit_regression
(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, metric: previsionio.metrics.Regression = <Regression.RMSE: 'rmse'>, holdout_dataset=None, training_config=<previsionio.usecase_config.TrainingConfig object>, **kwargs)¶ Start a tabular regression usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (
Dataset
) – Reference to the dataset object to use as the training dataset
- column_config (
ColumnConfig
) – Column configuration for the usecase (see the documentation of the ColumnConfig
resource for more details on each possible column type)
- metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: metrics.Regression.RMSE)
- holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (
TrainingConfig
) – Specific training configuration (see the documentation of the TrainingConfig
resource for more details on all the parameters)
Returns: Newly created Regression usecase version object
Return type: supervised.Regression
-
fit_text_similarity
(name: str, dataset: previsionio.dataset.Dataset, description_column_config: previsionio.text_similarity.DescriptionsColumnConfig, metric: previsionio.metrics.TextSimilarity = <TextSimilarity.accuracy_at_k: 'accuracy_at_k'>, top_k: int = 10, lang: str = 'auto', queries_dataset: previsionio.dataset.Dataset = None, queries_column_config: previsionio.text_similarity.QueriesColumnConfig = None, models_parameters: previsionio.text_similarity.ListModelsParameters = <previsionio.text_similarity.ListModelsParameters object>)¶ Start a text similarity usecase training with a specific training configuration.
Parameters: - name (str) – Name of the usecase to create
- dataset (
Dataset
) – Reference to the dataset object to use as the training dataset
- description_column_config (
DescriptionsColumnConfig
) – Description column configuration (see the documentation of the DescriptionsColumnConfig
resource for more details on each possible column type)
- metric (metrics.TextSimilarity, optional) – Specific metric to use for the usecase (default: accuracy_at_k)
- top_k (int, optional) – Number of top results to consider when computing the accuracy_at_k metric (default: 10)
- lang (str, optional) – Language of the texts (default: 'auto')
- queries_dataset (
Dataset
, optional) – Reference to a dataset object to use as a queries dataset (default: None)
- queries_column_config (
QueriesColumnConfig
, optional) – Queries column configuration (see the documentation of the QueriesColumnConfig
resource for more details on each possible column type)
- models_parameters (
ListModelsParameters
) – Specific training configuration (see the documentation of the ListModelsParameters
resource for more details on all the parameters)
Returns: Newly created TextSimilarity usecase version object
Return type:
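A sketch of a text similarity training call, assuming the DescriptionsColumnConfig (and optional QueriesColumnConfig) objects are built by the caller; the usecase name is hypothetical:

```python
def train_text_similarity(project, dataset, description_column_config,
                          queries_dataset=None, queries_column_config=None):
    """Sketch: start a text similarity usecase version with default model parameters.

    `project` is a previsionio.project.Project; the column configuration
    objects come from previsionio.text_similarity.
    """
    return project.fit_text_similarity(
        name='faq-similarity',  # hypothetical usecase name
        dataset=dataset,
        description_column_config=description_column_config,
        top_k=10,               # the k used by the accuracy_at_k metric
        queries_dataset=queries_dataset,
        queries_column_config=queries_column_config,
    )
```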
-
fit_timeseries_regression
(name: str, dataset: previsionio.dataset.Dataset, column_config: previsionio.usecase_config.ColumnConfig, time_window: previsionio.timeseries.TimeWindow, metric: previsionio.metrics.Regression = <Regression.RMSE: 'rmse'>, holdout_dataset: previsionio.dataset.Dataset = None, training_config=<previsionio.usecase_config.TrainingConfig object>) → previsionio.timeseries.TimeSeries¶ Start a timeseries regression usecase version training
Parameters: - name (str) – Name of the usecase to create
- dataset (
Dataset
) – Reference to the dataset object to use as the training dataset
- column_config (
ColumnConfig
) – Column configuration for the usecase version (see the documentation of the ColumnConfig
resource for more details on each possible column type)
- time_window (
TimeWindow
) – Time configuration (see the documentation of the TimeWindow
resource for more details)
- metric (metrics.Regression, optional) – Specific metric to use for the usecase (default: metrics.Regression.RMSE)
- holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default: None)
- training_config (
TrainingConfig
) – Specific training configuration (see the documentation of the TrainingConfig
resource for more details on all the parameters)
Returns: Newly created TimeSeries usecase version object
Return type:
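A sketch of a timeseries training call; the TimeWindow and ColumnConfig objects are assumed to be built by the caller, since their constructors are documented on their own resource pages:

```python
def train_sales_forecast(project, dataset, column_config, time_window):
    """Sketch: start a timeseries regression usecase version.

    `time_window` is a previsionio.timeseries.TimeWindow describing the
    derivation and forecast windows; `column_config` must identify the
    time column in the dataset.
    """
    return project.fit_timeseries_regression(
        name='sales-forecast',  # hypothetical usecase name
        dataset=dataset,
        column_config=column_config,
        time_window=time_window,
    )
```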
-
classmethod
from_id
(_id: str)¶ Get a project from the instance by its unique id.
Parameters: _id (str) – Unique id of the project to retrieve Returns: The fetched project Return type: Project
Raises: PrevisionException
– Any error while fetching data from the platform or parsing the result
-
info
() → Dict¶ Get information about the current project.
Returns: - Information about the Project with these entries:
- "_id", "name", "description", "color", "created_by", "admins", "contributors", "viewers", "pipelines_count", "usecases_count", "dataset_count", "users"
Return type: Dict Raises: PrevisionException
– Any error while fetching data from the platform or parsing the result
-
classmethod
list
(all: bool = False)¶ List all the available projects in the current active [client] workspace.
Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched project objects Return type: list( Project
)
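Since list() returns Project objects, a by-name lookup can be sketched on top of it (assuming fetched projects expose a name attribute):

```python
def find_project_by_name(name):
    """Sketch: look up a project in the workspace by name via Project.list()."""
    # Deferred import so the sketch stays importable without the SDK installed.
    from previsionio.project import Project

    # all=True pages through the whole workspace instead of only the first page.
    for project in Project.list(all=True):
        if project.name == name:
            return project
    return None
```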
-
list_connectors
(all: bool = True)¶ List all the available connectors in the current active project.
Warning
Contrary to the parent
list()
function, this method returns actual Connector
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched connector objects Return type: list( Connector
)
-
list_datasets
(all: bool = True)¶ List all the available datasets in the current active project.
Warning
Contrary to the parent
list()
function, this method returns actual Dataset
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched dataset objects Return type: list( Dataset
)
-
list_datasource
(all: bool = False)¶ List all the available datasources in the current active project.
Warning
Contrary to the parent
list()
function, this method returns actual DataSource
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched datasource objects Return type: list( DataSource
)
-
list_image_folders
(all: bool = True)¶ List all the available dataset images in the current active project.
Warning
Contrary to the parent
list()
function, this method returns actual DatasetImages
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched dataset image objects Return type: list( DatasetImages
)
-
list_usecases
(all: bool = True)¶ List all the available usecases in the current project.
Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched usecase objects Return type: list( Usecase
)
-
classmethod
new
(name: str, description: str = None, color: previsionio.project.ProjectColor = None) → previsionio.project.Project¶ Create a new project on the platform.
Parameters: Returns: The registered project object in the current workspace
Return type: Raises: PrevisionException
– Any error while uploading data to the platform or parsing the result
Exception
– For any other unknown error
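Combining new() with list(), a get-or-create helper can be sketched (again assuming fetched Project objects expose a name attribute):

```python
def get_or_create_project(name, description=None):
    """Sketch: create a project, or return the existing one with the same name."""
    # Deferred import so the sketch stays importable without the SDK installed.
    from previsionio.project import Project

    for project in Project.list(all=True):
        if project.name == name:
            return project
    # No match in the workspace: register a new project on the platform.
    return Project.new(name=name, description=description)
```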
Usecase¶
-
class
previsionio.usecase.
Usecase
(**usecase_info)¶ Bases:
previsionio.api_resource.ApiResource
A Usecase
Parameters: -
delete
()¶ Delete a usecase from the actual [client] workspace.
Returns: Deletion process results Return type: dict
-
classmethod
from_id
(_id: str) → previsionio.usecase.Usecase¶ Get a usecase from the platform by its unique id.
Parameters: _id (str) – Unique id of the usecase to retrieve Returns: Fetched usecase Return type: Usecase
Raises: PrevisionException
– Any error while fetching data from the platform or parsing result
-
classmethod
list
(project_id: str, all: bool = True) → List[previsionio.usecase.Usecase]¶ List all the available usecases in the current active [client] workspace.
Warning
Contrary to the parent
list()
function, this method returns actual Usecase
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched usecase objects Return type: list( Usecase
)
-
Base API Resource¶
All resource objects you will be using in Prevision.io’s Python SDK inherit from this base parent class.
In the SDK, a resource is an object that can be fetched from the platform,
used in your code, updated, deleted… previsionio.usecase.BaseUsecase
,
previsionio.dataset.Dataset
and previsionio.model.Model
are
all resources.
-
class
previsionio.api_resource.
ApiResource
(**params)¶ Base parent class for all SDK resource objects.
-
delete
()¶ Delete a resource from the actual [client] workspace.
Raises: PrevisionException
– Any error while deleting data from the platform
-
update_status
(specific_url: str = None) → Dict¶ Get an update on the status of a resource.
Parameters: specific_url (str, optional) – Specific (already parametrized) url to fetch the resource from (otherwise the url is built from the resource type and unique _id
)Returns: Updated status info Return type: dict
-
-
class
previsionio.api_resource.
ApiResourceType
¶ All the different resource types and matching API endpoints.
Dataset¶
-
class
previsionio.dataset.
Dataset
(_id: str, name: str, datasource: previsionio.datasource.DataSource = None, _data: pandas.core.frame.DataFrame = None, describe_state: Dict = None, drift_state=None, embeddings_state=None, separator=', ', **kwargs)¶ Bases:
previsionio.api_resource.ApiResource
Dataset objects represent data resources that will be explored by Prevision.io platform.
In order to launch an auto ml process (see
BaseUsecase
class), we need to have the matching dataset stored in the related workspace. Within the platform they are stored in tabular form and are derived:
- from files (CSV, ZIP)
- or from a Data Source at a given time (snapshot)
-
data
¶ Load in memory the data content of the current dataset into a pandas DataFrame.
Returns: Dataframe for the data object Return type: pd.DataFrame
Raises: PrevisionException
– Any error while fetching or parsing the data
-
delete
()¶ Delete a dataset from the actual [client] workspace.
Raises: PrevisionException
– If the dataset does not exist
requests.exceptions.ConnectionError
– Error processing the request
-
download
(download_path: str = None)¶ Download the dataset from the platform locally.
Parameters: download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used) Returns: Path the data was downloaded to Return type: str Raises: PrevisionException
– If dataset does not exist or if there was another error fetching or parsing data
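Two thin wrappers sketching the typical read paths of a Dataset (in-memory load versus local file download):

```python
def dataset_to_dataframe(dataset):
    """Sketch: load a previsionio Dataset's content into memory.

    Per the reference above, the `data` property and `to_pandas()` both
    return the content as a pandas DataFrame.
    """
    return dataset.to_pandas()


def save_dataset_locally(dataset, download_path=None):
    """Sketch: download the dataset file from the platform.

    Returns the local path the data was written to; with no download_path,
    the current working directory is used.
    """
    return dataset.download(download_path=download_path)
```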
-
get_embedding
() → Dict¶ Get the embeddings analysis of the dataset from the actual [client] workspace.
Raises: PrevisionException
– If the dataset does not exist
requests.exceptions.ConnectionError
– Error processing the request
-
classmethod
list
(project_id, all: bool = True)¶ List all the available datasets in the current active [client] workspace.
Warning
Contrary to the parent
list()
function, this method returns actual Dataset
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched dataset objects Return type: list( Dataset
)
-
start_embedding
()¶ Start the embeddings analysis of the dataset from the actual [client] workspace.
Raises: PrevisionException
– If the dataset does not exist
requests.exceptions.ConnectionError
– Error processing the request
-
to_pandas
() → pandas.core.frame.DataFrame¶ Load in memory the data content of the current dataset into a pandas DataFrame.
Returns: Dataframe for the data object Return type: pd.DataFrame
Raises: PrevisionException
– Any error while fetching or parsing the data
-
class
previsionio.dataset.
DatasetImages
(_id: str, name: str, project_id: str, copy_state, **kwargs)¶ Bases:
previsionio.api_resource.ApiResource
DatasetImages objects represent image data resources that will be used by Prevision.io’s platform.
In order to launch an auto ml process (see
BaseUsecase
class), we need to have the matching dataset stored in the related workspace. Within the platform, image folder datasets are stored as ZIP files and are created by copying from uploaded ZIP files.
-
delete
()¶ Delete a DatasetImages from the actual [client] workspace.
Raises: PrevisionException
– If the dataset images do not exist
requests.exceptions.ConnectionError
– Error processing the request
-
download
(download_path: str = None)¶ Download the dataset from the platform locally.
Parameters: download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used) Returns: Path the data was downloaded to Return type: str Raises: PrevisionException
– If dataset does not exist or if there was another error fetching or parsing data
-
classmethod
list
(project_id: str, all: bool = True)¶ List all the available dataset images in the current active [client] workspace.
Warning
Contrary to the parent
list()
function, this method returns actual DatasetImages
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched dataset image objects Return type: list( DatasetImages
)
-
DataSource¶
-
class
previsionio.datasource.
DataSource
(_id, connector_id: str, name: str, path: str = None, database: str = None, table: str = None, request: str = None, gCloud=None, **kwargs)¶ Bases:
previsionio.api_resource.ApiResource
,previsionio.api_resource.UniqueResourceMixin
A datasource to access a distant data pool and create or fetch data easily. This resource is linked to a
Connector
resource that represents the connection to the distant data source.Parameters: - _id (str) – Unique id of the datasource
- connector (
Connector
) – Reference to the associated connector (the resource to go through to get a data snapshot) - name (str) – Name of the datasource
- path (str, optional) – Path to the file to fetch via the connector
- database (str, optional) – Name of the database to fetch data from via the connector
- table (str, optional) – Name of the table to fetch data from via the connector
- request (str, optional) – Direct SQL request to use with the connector to fetch data
-
classmethod
from_id
(_id: str)¶ Get a datasource from the instance by its unique id.
Parameters: _id (str) – Unique id of the resource to retrieve Returns: The fetched datasource Return type: DataSource
Raises: PrevisionException
– Any error while fetching data from the platform or parsing the result
-
classmethod
list
(project_id: str, all: bool = False)¶ List all the available datasources in the current active [client] workspace.
Warning
Contrary to the parent
list()
function, this method returns actual DataSource
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched datasource objects Return type: list( DataSource
)
Connector¶
In all the specific connectors, the parameters for the new() method are the same as the ones in Connector._new().
-
class
previsionio.connector.
Connector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.api_resource.ApiResource
,previsionio.api_resource.UniqueResourceMixin
A connector to interact with a distant source of data (and easily get data snapshots using an associated
DataSource
resource).Parameters: - _id (str) – Unique reference of the connector on the platform
- name (str) – Name of the connector
- host (str) – Url of the connector
- port (int) – Port of the connector
- conn_type (str) – Type of the connector, among “FTP”, “SFTP”, “SQL”, “S3”, “HIVE”, “HBASE”, “GCP”
- username (str, optional) – Username to use to connect to the remote data source
- password (str, optional) – Password to use to connect to the remote data source
-
classmethod
_new
(project_id: str, name: str, host: str, port: Optional[int], conn_type: str, username: str = None, password: str = None, googleCredentials: str = None)¶ Create a new connector object on the platform.
Parameters: - name (str) – Name of the connector
- host (str) – Url of the connector
- port (int) – Port of the connector
- conn_type (str) – Type of the connector, among “FTP”, “SFTP”, “SQL”, “S3”, “HIVE”, “HBASE” or “GCP”
- username (str, optional) – Username to use to connect to the remote data source
- password (str, optional) – Password to use to connect to the remote data source
Returns: Newly created connector object
Return type:
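A sketch of creating a connector through one of the specific subclasses; per the note at the top of this section, the new() parameters mirror Connector._new(), and we assume conn_type is implied by the class and therefore omitted (port 3306 is a hypothetical MySQL default):

```python
def create_mysql_connector(project_id, name, host, port=3306,
                           username='', password=''):
    """Sketch: create a SQL connector via the specific SQLConnector class.

    Assumption: SQLConnector.new() takes the same parameters as
    Connector._new() minus conn_type, which the class implies.
    """
    # Deferred import so the sketch stays importable without the SDK installed.
    from previsionio.connector import SQLConnector

    return SQLConnector.new(project_id, name, host, port,
                            username=username, password=password)
```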
-
classmethod
list
(project_id: str, all: bool = False)¶ List all the available connectors in the current active [client] workspace.
Warning
Contrary to the parent
list()
function, this method returns actual Connector
objects rather than plain dictionaries with the corresponding data. Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query will only return the first page of results. Returns: Fetched connector objects Return type: list( Connector
)
-
class
previsionio.connector.
DataFileBaseConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.Connector
A specific type of connector to interact with a database client (containing files).
-
class
previsionio.connector.
DataTableBaseConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.Connector
A specific type of connector to interact with a database client (containing databases and tables).
-
class
previsionio.connector.
FTPConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.DataFileBaseConnector
A specific type of connector to interact with a FTP client (containing files).
-
class
previsionio.connector.
GCPConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.Connector
A specific type of connector to interact with a GCP database client (containing databases and tables or buckets).
-
class
previsionio.connector.
HiveConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.DataTableBaseConnector
A specific type of connector to interact with a Hive database client (containing databases and tables).
-
class
previsionio.connector.
S3Connector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.Connector
A specific type of connector to interact with an Amazon S3 client (containing buckets with files).
-
class
previsionio.connector.
SFTPConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.DataFileBaseConnector
A specific type of connector to interact with a secured FTP client (containing files).
-
class
previsionio.connector.
SQLConnector
(_id: str, name: str, host: str = None, port: int = None, type: str = None, username: str = '', password: str = '', googleCredentials: str = None, **kwargs)¶ Bases:
previsionio.connector.DataTableBaseConnector
A specific type of connector to interact with a SQL database client (containing databases and tables).
Usecases Version¶
Prevision.io’s Python SDK enables you to very easily run usecases of different types: regression, (binary) classification, multiclassification or timeseries.
All these classes inherit from the base previsionio.usecase.BaseUsecaseVersion
class,
and then from the previsionio.supervised.Supervised
class.
When starting a usecase, you also need to specify a training configuration.
Take a look at the specific documentation pages for a more in-depth explanation of each layer and of the usecase configuration options:
Base usecase version¶
-
class
previsionio.usecase_version.
BaseUsecaseVersion
(**usecase_info)¶ Bases:
previsionio.api_resource.ApiResource
Base parent class for all usecases objects.
-
best_model
¶ Get the model with the best predictive performance over all models (including Blend models), where the best performance corresponds to a minimal loss.
Returns: Model with the best performance in the usecase, or None
if no model matched the search filter.Return type: ( Model
, None)
-
delete_prediction
(prediction_id: str)¶ Delete a prediction in the list for the current usecase from the actual [client] workspace.
Parameters: prediction_id (str) – Unique id of the prediction to delete Returns: Deletion process results Return type: dict
-
delete_predictions
()¶ Delete all predictions in the list for the current usecase from the actual [client] workspace.
Returns: Deletion process results Return type: dict
-
done
¶ Get a flag indicating whether or not the usecase is currently done.
Returns: done status Return type: bool
-
fastest_model
¶ Get the model that predicts with the lowest response time.
Returns: Model object corresponding to the fastest model
-
get_holdout_predictions
(full: bool = False)¶ Retrieve the list of holdout predictions for the current usecase from the client workspace (with the full prediction objects if necessary).
Parameters: full (boolean, optional) – If true, return full holdout prediction objects (else only metadata)
-
get_predictions
(full: bool = False)¶ Retrieve the list of predictions for the current usecase from the client workspace (with the full prediction objects if necessary).
Parameters: full (boolean, optional) – If true, return full prediction objects (else only metadata)
-
lite_models_list
¶ Get the list of selected lite models in the usecase.
Returns: Names of the lite models selected for the usecase Return type: list(str)
-
models
¶ Get the list of models generated for the current use case. Only the models that are done training are retrieved.
Returns: List of models found by the platform for the usecase Return type: list( Model
)
-
normal_models_list
¶ Get the list of selected normal models in the usecase.
Returns: Names of the normal models selected for the usecase Return type: list(str)
-
running
¶ Get a flag indicating whether or not the usecase is currently running.
Returns: Running status Return type: bool
-
score
¶ Get the current score of the usecase (i.e. the score of the model that is currently considered the best performance-wise for this usecase).
Returns: Usecase score (or infinity if not available). Return type: float
-
simple_models_list
¶ Get the list of selected simple models in the usecase.
Returns: Names of the simple models selected for the usecase Return type: list(str)
-
status
¶ Get a flag indicating whether or not the usecase is currently running.
Returns: Running status Return type: bool
-
stop
()¶ Stop a usecase (stopping all nodes currently in progress).
-
train_dataset
¶ Get the
Dataset
object corresponding to the training dataset of the usecase.Returns: Associated training dataset Return type: Dataset
-
update_status
()¶ Get an update on the status of a resource.
Parameters: specific_url (str, optional) – Specific (already parametrized) url to fetch the resource from (otherwise the url is built from the resource type and unique _id
)Returns: Updated status info Return type: dict
-
usecase
¶ Get the usecase to which the current usecase version belongs.
Returns: Fetched usecase Return type: Usecase
Raises: PrevisionException
– Any error while fetching data from the platform or parsing result
-
wait_until
(condition, raise_on_error: bool = True, timeout: float = 3600.0)¶ Wait until condition is fulfilled, then break.
Parameters: - condition (func: (
BaseUsecaseVersion
) -> bool) – Function to use to check the break condition - raise_on_error (bool, optional) – If true, the function will stop on error,
otherwise it will continue waiting (default:
True
) - timeout (float, optional) – Maximal amount of time to wait before forcing exit
Example:
usecase.wait_until(lambda usecasev: len(usecasev.models) > 3)
Raises: PrevisionException
– If the resource could not be fetched or there was a timeout.
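The condition argument is any callable taking the usecase version and returning a bool, so the lambda in the example above can equally be a named, unit-testable function:

```python
def has_enough_models(usecase_version, minimum=4):
    """Break condition for wait_until: True once at least `minimum` models are trained."""
    return len(usecase_version.models) >= minimum

# Usage (sketch, against a live platform):
#   usecase_version.wait_until(has_enough_models, timeout=3600.0)
```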
-
-
class
previsionio.usecase_version.
ClassicUsecaseVersion
(**usecase_info)¶ Bases:
previsionio.usecase_version.BaseUsecaseVersion
-
correlation_matrix
¶ Get the correlation matrix of the features (those constitute the dataset on which the usecase was trained).
Returns: Correlation matrix as a pandas
dataframeReturn type: pd.DataFrame
-
drop_list
¶ Get the list of drop columns in the usecase.
Returns: Names of the columns dropped from the dataset Return type: list(str)
-
fe_selected_list
¶ Get the list of selected feature engineering modules in the usecase.
Returns: Names of the feature engineering modules selected for the usecase Return type: list(str)
-
features
¶ Get the general description of the usecase’s features, such as:
- feature types distribution
- feature information list
- list of dropped features
Returns: General features information Return type: dict
-
features_stats
¶ Get the general description of the usecase’s features, such as:
- feature types distribution
- feature information list
- list of dropped features
Returns: General features information Return type: dict
-
get_cv
() → pandas.core.frame.DataFrame¶ Get the cross validation dataset from the best model of the usecase.
Returns: Cross validation dataset Return type: pd.DataFrame
-
get_feature_info
(feature_name: str) → Dict¶ Return some information about the given feature, such as:
name: the name of the feature as it was given in the feature_name parameter
type: linear, categorical, ordinal…
stats: some basic statistics such as number of missing values, (non missing) values count, plus additional information depending on the feature type:
- for a linear feature: min, max, mean, std and median
- for a categorical/textual feature: modalities/words frequencies, list of the most frequent tokens
role: whether or not the feature is a target/fold/weight or id feature (and for time series usecases, whether or not it is a group/apriori feature - check the Prevision.io’s timeseries documentation)
importance_value: scores reflecting the importance of the given feature
Parameters: - feature_name (str) – Name of the feature to get information about
Warning: The feature_name is case-sensitive, so “age” and “Age” are different features!
Returns: Dictionary containing the feature information
Return type: dict Raises: PrevisionException
– If the given feature name does not match any feature
-
predict
(df, confidence=False, prediction_dataset_name=None) → pandas.core.frame.DataFrame¶ Get the predictions for a dataset stored in the current active [client] workspace using the best model of the usecase with a Scikit-learn style blocking prediction mode.
Warning
For large dataframes and complex (blend) models, this can be slow (up to 1-2 hours). Prefer using this for simple models and small dataframes, or use option
use_best_single = True
.Parameters: - df (
pd.DataFrame
) – pandas
DataFrame containing the test data - confidence (bool, optional) – Whether to predict with confidence values
(default:
False
)
Returns: Prediction data (as a pandas dataframe) and prediction job ID.
Return type: pd.DataFrame
-
predict_from_dataset
(dataset, confidence=False, dataset_folder=None) → pandas.core.frame.DataFrame¶ Get the predictions for a dataset stored in the current active [client] workspace using the best model of the usecase.
Parameters: Returns: Predictions as a
pandas
dataframeReturn type: pd.DataFrame
-
predict_single
(data, confidence=False, explain=False)¶ Get a prediction on a single instance using the best model of the usecase.
Parameters: Returns: Dictionary containing the prediction.
Note
The format of the predictions dictionary depends on the problem type (regression, classification…)
Return type:
-
print_info
()¶ Print all info on the usecase.
-
Supervised usecases¶
-
class
previsionio.supervised.
Supervised
(**usecase_info)¶ Bases:
previsionio.usecase_version.ClassicUsecaseVersion
A supervised usecase.
-
classmethod
from_id
(_id: str) → previsionio.supervised.Supervised¶ Get a supervised usecase from the platform by its unique id.
Parameters: Returns: Fetched usecase
Return type: Raises: PrevisionException
– Invalid problem type or any error while fetching data from the platform or parsing result
-
new_version
(description: str = None, dataset: Union[previsionio.dataset.Dataset, Tuple[previsionio.dataset.Dataset, previsionio.dataset.DatasetImages]] = None, column_config: previsionio.usecase_config.ColumnConfig = None, metric: enum.Enum = None, holdout_dataset: previsionio.dataset.Dataset = None, training_config: previsionio.usecase_config.TrainingConfig = None, **fit_params)¶ Start a supervised usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.
Parameters: - description (str, optional) – additional description of the version
- dataset (
Dataset
,DatasetImages
, optional) – Reference to the dataset object to use as training dataset - column_config (
ColumnConfig
, optional) – Column configuration for the usecase (see the documentation of theColumnConfig
resource for more details on each possible column types) - metric (metrics.Enum, optional) – Specific metric to use for the usecase (default:
None
) - holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default:None
) - training_config (
TrainingConfig
) – Specific training configuration (see the documentation of theTrainingConfig
resource for more details on all the parameters)
Returns: Newly created supervised usecase object (new version)
Return type:
-
-
class
previsionio.timeseries.
TimeSeries
(**usecase_info)¶ Bases:
previsionio.usecase_version.ClassicUsecaseVersion
A TimeSeries usecase.
-
metric_type
¶ alias of
previsionio.metrics.Regression
-
model_class
¶ alias of
previsionio.model.RegressionModel
-
new_version
(description: str = None, dataset: previsionio.dataset.Dataset = None, column_config: previsionio.usecase_config.ColumnConfig = None, time_window: previsionio.timeseries.TimeWindow = None, metric: previsionio.metrics.Regression = None, holdout_dataset: previsionio.dataset.Dataset = None, training_config: previsionio.usecase_config.TrainingConfig = <previsionio.usecase_config.TrainingConfig object>)¶ Start a time series usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.
Parameters: - description (str, optional) – additional description of the version
- dataset (
Dataset
,DatasetImages
, optional) – Reference to the dataset object to use as training dataset - column_config (
ColumnConfig
, optional) – Column configuration for the usecase (see the documentation of theColumnConfig
resource for more details on each possible column type) - time_window (TimeWindow, optional) – A time window object representing either feature derivation window periods or forecast window periods
- metric (metrics.Regression, optional) – Specific metric to use for the usecase (default:
None
) - holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default:None
) - training_config (
TrainingConfig
, optional) – Specific training configuration (see the documentation of theTrainingConfig
resource for more details on all the parameters)
Returns: Newly created time series usecase version object (new version)
Return type:
-
-
class
previsionio.timeseries.
TimeWindow
(derivation_start: int, derivation_end: int, forecast_start: int, forecast_end: int)¶ Bases:
previsionio.usecase_config.UsecaseConfig
A time window object for representing either feature derivation window periods or forecast window periods
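As an illustration of how the four offsets relate, here is a local stand-in with simple ordering checks; the assumption that derivation offsets lie in the past (negative) and forecast offsets in the future (positive) is ours, and the SDK's actual validation rules may differ:

```python
class TimeWindowSketch:
    """Illustrative stand-in for previsionio.timeseries.TimeWindow.

    Assumes the feature derivation window lies in the past (negative
    offsets) and the forecast window in the future (positive offsets).
    """
    def __init__(self, derivation_start, derivation_end, forecast_start, forecast_end):
        if not derivation_start < derivation_end:
            raise ValueError("derivation_start must be before derivation_end")
        if not forecast_start < forecast_end:
            raise ValueError("forecast_start must be before forecast_end")
        if derivation_end > 0 or forecast_start < 0:
            raise ValueError("derivation window must be in the past, forecast window in the future")
        self.derivation_start = derivation_start
        self.derivation_end = derivation_end
        self.forecast_start = forecast_start
        self.forecast_end = forecast_end

# e.g. derive features from the last 90 to 30 steps, forecast steps 1 to 10 ahead
window = TimeWindowSketch(-90, -30, 1, 10)
```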
TextSimilarity usecases¶
-
class
previsionio.text_similarity.
DescriptionsColumnConfig
(content_column, id_column)¶ Bases:
previsionio.usecase_config.UsecaseConfig
Description Column configuration for starting a usecase: this object defines the role of specific columns in the dataset.
Parameters:
-
class
previsionio.text_similarity.
ModelsParameters
(model_embedding='tf_idf', preprocessing=<previsionio.text_similarity.Preprocessing object>, models=['brute_force'])¶ Bases:
previsionio.usecase_config.UsecaseConfig
Training configuration that holds the relevant data for a usecase description: the wanted feature engineering, the selected models, the training speed…
Args:
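With the default tf_idf embedding and the 'brute_force' search model, retrieval amounts to scoring a query against every description and keeping the top k. A toy, dependency-free sketch of that idea (the bag-of-words embedding and scoring below are simplifications, not the SDK's implementation):

```python
import math
from collections import Counter

def embed(text):
    # toy bag-of-words "embedding" (the SDK defaults to tf_idf)
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def brute_force_top_k(query, descriptions, k=2):
    """Score the query against every description and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in descriptions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]
```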
-
class
previsionio.text_similarity.
QueriesColumnConfig
(queries_dataset_content_column, queries_dataset_matching_id_description_column, queries_dataset_id_column=None)¶ Bases:
previsionio.usecase_config.UsecaseConfig
Queries column configuration for starting a usecase: this object defines the role of specific columns in the queries dataset.
Parameters:
-
class
previsionio.text_similarity.
TextSimilarity
(**usecase_info)¶ Bases:
previsionio.usecase_version.BaseUsecaseVersion
-
new_version
(description: str = None, dataset: previsionio.dataset.Dataset = None, description_column_config: previsionio.text_similarity.DescriptionsColumnConfig = None, metric: previsionio.metrics.TextSimilarity = None, top_k: int = None, lang: str = 'auto', queries_dataset: previsionio.dataset.Dataset = None, queries_column_config: Optional[previsionio.text_similarity.QueriesColumnConfig] = None, models_parameters: previsionio.text_similarity.ListModelsParameters = None, **kwargs) → previsionio.text_similarity.TextSimilarity¶ Start a text similarity usecase training to create a new version of the usecase (on the platform): the training configs are copied from the current version and then overridden for the given parameters.
Parameters: - description (str, optional) – additional description of the version
- dataset (
Dataset
,DatasetImages
, optional) – Reference to the dataset object to use as training dataset - description_column_config (
DescriptionsColumnConfig
, optional) – Column configuration for the usecase (see the documentation of theColumnConfig
resource for more details on each possible column types) - metric (metrics.TextSimilarity, optional) – Specific metric to use for the usecase (default:
None
) - holdout_dataset (
Dataset
, optional) – Reference to a dataset object to use as a holdout dataset (default:None
) - training_config (
TrainingConfig
, optional) – Specific training configuration (see the documentation of theTrainingConfig
resource for more details on all the parameters)
Returns: Newly created text similarity usecase version object (new version)
Return type:
-
Usecase configuration¶
-
class
previsionio.usecase_config.
AdvancedModel
¶ Types of advanced models that can be trained with Prevision.io. The
Full
member is a shortcut to get all available models at once. To just drop a single model from a list of models, use:AdvancedModel.drop(AdvancedModel.xxx)
-
CatBoost
= 'CB'¶ CatBoost
-
ExtraTrees
= 'ET'¶ ExtraTrees
-
Full
= ['LGB', 'XGB', 'NN', 'ET', 'LR', 'RF', 'CB']¶ Evaluate all models
-
LightGBM
= 'LGB'¶ LightGBM
-
LinReg
= 'LR'¶ Linear Regression
-
NeuralNet
= 'NN'¶ NeuralNet
-
RandomForest
= 'RF'¶ Random Forest
-
XGBoost
= 'XGB'¶ XGBoost
-
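The Full member and the drop helper boil down to simple list manipulation. A standalone sketch of the pattern (local constants and function for illustration, not the SDK's enum):

```python
# Illustrative stand-ins for the SDK's advanced-model codes
FULL_ADVANCED = ['LGB', 'XGB', 'NN', 'ET', 'LR', 'RF', 'CB']

def drop(models, code):
    """Return a copy of `models` without `code` (loosely mirrors AdvancedModel.drop)."""
    return [m for m in models if m != code]

# e.g. evaluate everything except neural nets
models = drop(FULL_ADVANCED, 'NN')
```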
-
class
previsionio.usecase_config.
ColumnConfig
(target_column: Optional[str] = None, filename_column: Optional[str] = None, id_column: Optional[str] = None, fold_column: Optional[str] = None, weight_column: Optional[str] = None, time_column: Optional[str] = None, group_columns: Optional[str] = None, apriori_columns: Optional[str] = None, drop_list: Optional[List[str]] = None)¶ Column configuration for starting a usecase: this object defines the role of specific columns in the dataset (and optionally the list of columns to drop).
Parameters: - target_column (str, optional) – Name of the target column in the dataset
- id_column (str, optional) – Name of the id column in the dataset that does not have any signal and will be ignored for computation
- fold_column (str, optional) – Name of the fold column used that should be used to compute the various folds in the dataset
- weight_column (str, optional) – Name of the weight column used to assign non-equal importance weights to the various rows in the dataset
- filename_column (str, optional) – Name of the filename column in the dataset for an image-based usecase
- time_column (str, optional) – Name of the time column in the dataset for a timeseries usecase
- group_columns (str, optional) – Name of the column used to identify the different groups/series in the dataset for a timeseries usecase
- apriori_columns (str, optional) – Name of the column holding a priori features (values known in advance over the forecast period) for a timeseries usecase
- drop_list (list(str), optional) – Names of all the columns that should be dropped from the dataset while training the usecase
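Conceptually, this configuration tells the platform how to carve up the training dataset: one column is the target, id and dropped columns are discarded, the rest are features. A local pandas sketch of that carving (column names are illustrative):

```python
import pandas as pd

def split_by_roles(df, target_column, id_column=None, drop_list=None):
    """Separate the target from the feature columns, discarding id/drop columns."""
    drop = set(drop_list or [])
    if id_column:
        drop.add(id_column)  # the id column carries no signal
    y = df[target_column]
    X = df.drop(columns=[target_column, *drop])
    return X, y

df = pd.DataFrame({
    "row_id": [1, 2, 3],
    "age":    [22, 35, 47],
    "noise":  [0, 0, 0],
    "churn":  [0, 1, 0],
})
X, y = split_by_roles(df, target_column="churn", id_column="row_id", drop_list=["noise"])
```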
-
class
previsionio.usecase_config.
DataType
¶ Type of data available with Prevision.io.
-
Images
= 'images'¶ Catalogue of images
-
Tabular
= 'tabular'¶ Data arranged in a table
-
TimeSeries
= 'timeseries'¶ Data points indexed in time order
-
-
class
previsionio.usecase_config.
Feature
¶ Types of feature engineering that can be applied to a dataset with Prevision.io. The
Full
member is a shortcut to get all available feature engineering modules at once. To just drop a feature engineering module from a list of modules, use:Feature.drop(Feature.xxx)
-
Counts
= 'Counter'¶ Value type counting
-
DateTime
= 'Date'¶ Date transformation
-
Frequency
= 'freq'¶ Frequency encoding
-
Full
= ['Counter', 'Date', 'freq', 'text_tfidf', 'text_word2vec', 'text_embedding', 'tenc', 'poly', 'pca', 'kmean']¶ Full feature engineering
-
KMeans
= 'kmean'¶ K-Means clustering
-
PCA
= 'pca'¶ Principal component analysis
-
PolynomialFeatures
= 'poly'¶ Polynomial feature
-
TargetEncoding
= 'tenc'¶ Target encoding
-
TextEmbedding
= 'text_embedding'¶ Sentence embedding
-
TextTfidf
= 'text_tfidf'¶ Statistical analysis
-
TextWord2vect
= 'text_word2vec'¶ Word embedding
-
-
class
previsionio.usecase_config.
NormalModel
¶ Types of normal models that can be trained with Prevision.io. The
Full
member is a shortcut to get all available models at once. To just drop a single model from a list of models, use:NormalModel.drop(NormalModel.xxx)
-
CatBoost
= 'CB'¶ CatBoost
-
ExtraTrees
= 'ET'¶ ExtraTrees
-
Full
= ['LGB', 'XGB', 'NN', 'ET', 'LR', 'RF', 'NBC', 'CB']¶ Evaluate all models
-
LightGBM
= 'LGB'¶ LightGBM
-
LinReg
= 'LR'¶ Linear Regression
-
NaiveBayesClassifier
= 'NBC'¶ Naive Bayes Classifier
-
NeuralNet
= 'NN'¶ NeuralNet
-
RandomForest
= 'RF'¶ Random Forest
-
XGBoost
= 'XGB'¶ XGBoost
-
-
class
previsionio.usecase_config.
ParamList
¶ A list of params to be passed to a usecase.
-
class
previsionio.usecase_config.
Profile
¶ Training profile type.
-
Advanced
= 'advanced'¶ Slowest profile, for maximal optimization
-
Normal
= 'normal'¶ Normal profile, best balance
-
Quick
= 'quick'¶ Quickest profile, lowest predictive performance
-
-
class
previsionio.usecase_config.
SimpleModel
¶ Types of simple models that can be trained with Prevision.io. The
Full
member is a shortcut to get all available simple models at once. To just drop a single model from a list of simple models, use:SimpleModel.drop(SimpleModel.xxx)
-
DecisionTree
= 'DT'¶ DecisionTree
-
Full
= ['DT', 'LR']¶ Evaluate all simple models
-
LinReg
= 'LR'¶ Linear Regression
-
-
class
previsionio.usecase_config.
TrainingConfig
(profile='quick', advanced_models=['XGB', 'LR'], normal_models=['XGB', 'LR'], simple_models=[], features=['freq', 'tenc', 'Counter'], with_blend=False, fe_selected_list=[])¶ Training configuration that holds the relevant data for a usecase description: the wanted feature engineering, the selected models, the training speed…
Parameters: - profile (str) –
Type of training profile to use:
- ”quick”: this profile runs very fast but has a lower performance (it is recommended for early trials)
- ”advanced”: this profile runs slower but has increased performance (it is usually for optimization steps at the end of your project)
- the “normal” profile is something in-between to help you investigate an interesting result
- advanced_models (list(str), optional) – Names of the (advanced) models to use in the usecase (among: “LR”, “RF”, “ET”, “XGB”, “LGB”, “CB” and “NN”)
- normal_models (list(str), optional) – Names of the (normal) models to use in the usecase (among: “LR”, “RF”, “ET”, “XGB”, “LGB”, “CB”, ‘NB’ and “NN”)
- simple_models (list(str), optional) – Names of the (simple) models to use in the usecase (among: “LR” and “DT”)
- features (list(str), optional) – Names of the feature engineering modules to use (among: “Counter”, “Date”, “freq”, “text_tfidf”, “text_word2vec”, “text_embedding”, “tenc”, “ee”, “poly”, “pca” and “kmean”)
- with_blend (bool, optional) – If true, Prevision.io’s pipeline will add “blend” models at the end of the training by cherry-picking already trained models and fine-tuning hyperparameters (usually gives even better performance)
- fe_selected_list (list(str), optional) – Override for the features list, to restrict it to only this list
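A standalone sketch of a configuration object with the same defaults as above (a local dataclass used for illustration, not the SDK class):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrainingConfigSketch:
    """Local stand-in for previsionio.usecase_config.TrainingConfig."""
    profile: str = "quick"
    advanced_models: List[str] = field(default_factory=lambda: ["XGB", "LR"])
    normal_models: List[str] = field(default_factory=lambda: ["XGB", "LR"])
    simple_models: List[str] = field(default_factory=list)
    features: List[str] = field(default_factory=lambda: ["freq", "tenc", "Counter"])
    with_blend: bool = False

# a slower, more thorough run with blended models at the end
cfg = TrainingConfigSketch(profile="advanced", with_blend=True)
```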
-
class
previsionio.usecase_config.
TypeProblem
¶ Type of supervised problems available with Prevision.io.
-
Classification
= 'classification'¶ Prediction using classification approach, for when the output variable is a category
-
MultiClassification
= 'multiclassification'¶ Prediction using classification approach, for when the output variable has many categories
-
ObjectDetection
= 'object-detection'¶ Detection of pattern in images
-
Regression
= 'regression'¶ Prediction using a regression approach, for when the output variable is a real or continuous value
-
TextSimilarity
= 'text-similarity'¶ Ranking of texts by keywords
-
Model¶
-
class
previsionio.model.
Model
(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)¶ Bases:
previsionio.api_resource.ApiResource
A Model object is generated by the Prevision AutoML platform when you launch a usecase. All models generated by Prevision.io are deployable in our Store.
With this Model class, you can select the model with the optimal hyperparameters that responds to your business requirements, then deploy it as a real-time/batch endpoint that can be used for a web service.
Parameters: -
deploy
() → previsionio.deployed_model.DeployedModel¶ (Not Implemented yet) Deploy the model as a REST API app.
Keyword Arguments: app_type (enum) – Type of application to deploy; can be 'model', 'notebook', 'shiny', 'dash' or 'node' Returns: Path of the deployed application Return type: str
-
classmethod
from_id
(_id: str) → Union[previsionio.model.RegressionModel, previsionio.model.ClassificationModel, previsionio.model.MultiClassificationModel, previsionio.model.TextSimilarityModel]¶ Get a model from the platform by its unique id.
Parameters: Returns: Fetched model
Return type: Raises: PrevisionException
– Any error while fetching data from the platform or parsing result
-
hyperparameters
¶ Return the hyperparameters of a model.
Returns: Hyperparameters of the model Return type: dict
-
predict
(df: pandas.core.frame.DataFrame, confidence: bool = False, prediction_dataset_name: str = None) → pandas.core.frame.DataFrame¶ Make a prediction in a Scikit-learn blocking style.
Warning
For large dataframes and complex (blend) models, this can be slow (up to 1-2 hours). Prefer using this for simple models and small dataframes or use option
use_best_single = True
.Parameters: - df (
pd.DataFrame
) – Apandas
dataframe containing the testing data - confidence (bool, optional) – Whether to predict with confidence estimator (default:
False
)
Returns: Prediction results dataframe
Return type: pd.DataFrame
-
predict_from_dataset
(dataset: previsionio.dataset.Dataset, confidence: bool = False, dataset_folder: previsionio.dataset.Dataset = None) → Optional[pandas.core.frame.DataFrame]¶ Make a prediction for a dataset stored in the current active [client] workspace (using the current SDK dataset object).
Parameters: Returns: Prediction results dataframe
Return type: pd.DataFrame
-
-
class
previsionio.model.
ClassicModel
(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)¶ Bases:
previsionio.model.Model
-
chart
()¶ Return chart analysis information for a model.
Returns: Chart analysis results Return type: dict Raises: PrevisionException
– Any error while fetching data from the platform or parsing the result
-
cross_validation
¶ Get model’s cross validation dataframe.
Returns: Cross-validation dataframe Return type: pd.DataFrame
-
feature_importance
¶ Return a dataframe of feature importances for the given model features, with their corresponding scores (sorted by descending feature importance scores).
Returns: Dataframe of feature importances Return type: pd.DataFrame
Raises: PrevisionException
– Any error while fetching data from the platform or parsing the result
-
predict_single
(data: Dict, confidence: bool = False, explain: bool = False)¶ Make a prediction for a single instance. Use
predict_from_dataset_name()
or predict methods to predict multiple instances at the same time (it’s faster).Parameters: Note
You can set both
confidence
andexplain
to true.Returns: Dictionary containing the prediction result Note
The prediction format depends on the problem type (regression, classification, etc…)
Return type: dict
-
-
class
previsionio.model.
ClassificationModel
(_id, usecase_version_id, **other_params)¶ Bases:
previsionio.model.ClassicModel
A model object for a (binary) classification usecase, i.e. a usecase where the target is categorical with exactly 2 modalities.
Parameters: -
get_dynamic_performances
(threshold: float = 0.5)¶ Get model performance for the given threshold.
Parameters: threshold (float, optional) – Threshold to check the model’s performance for (default: 0.5) Returns: Model classification performance dict with the following keys: confusion_matrix
accuracy
precision
recall
f1_score
Return type: dict Raises: PrevisionException
– Any error while fetching data from the platform or parsing the result
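All of these keys can be derived from the confusion matrix at the chosen threshold. A self-contained sketch of that derivation (a local helper, not the SDK's implementation):

```python
def dynamic_performances(y_true, y_score, threshold=0.5):
    """Derive threshold-dependent classification metrics from raw scores."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        pred = 1 if score >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion_matrix": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }
```

Raising the threshold trades recall for precision; sweeping it is how you pick the operating point for a deployed classifier.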
-
-
class
previsionio.model.
MultiClassificationModel
(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)¶ Bases:
previsionio.model.ClassicModel
A model object for a multi-classification usecase, i.e. a usecase where the target is categorical with strictly more than 2 modalities.
Parameters:
-
class
previsionio.model.
RegressionModel
(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)¶ Bases:
previsionio.model.ClassicModel
A model object for a regression usecase, i.e. a usecase where the target is numerical.
Parameters:
-
class
previsionio.model.
TextSimilarityModel
(_id: str, usecase_version_id: str, project_id: str, model_name: str = None, deployable: bool = False, **other_params)¶ Bases:
previsionio.model.Model
-
predict_from_dataset
(queries_dataset: previsionio.dataset.Dataset, queries_dataset_content_column: str, top_k: int = 10, queries_dataset_matching_id_description_column: str = None) → Optional[pandas.core.frame.DataFrame]¶ Make a prediction for a dataset stored in the current active [client] workspace (using the current SDK dataset object).
Parameters: Returns: Prediction results dataframe
Return type: pd.DataFrame
-
Metrics¶
The metric of a usecase is the function used to assess the performance of its models. The metrics you can choose from depend on the type of usecase you are training.
-
class
previsionio.metrics.
Classification
¶ Metrics for classification projects. Available metrics in Prevision:
auc, log_loss, error_rate_binary-
AUC
= 'auc'¶ Area Under ROC Curve
-
AUCPR
= 'aucpr'¶ Precision-Recall Area Under the Curve score
-
F05
= 'F05'¶ F05 Score
-
F1
= 'F1'¶ Balanced F-score
-
F2
= 'F2'¶ F2 Score
-
F3
= 'F3'¶ F3 Score
-
F4
= 'F4'¶ F4 Score
-
Lift01
= 'lift_at_0.1'¶ lift at ratio 0.1
-
Lift02
= 'lift_at_0.2'¶ lift at ratio 0.2
-
Lift03
= 'lift_at_0.3'¶ lift at ratio 0.3
-
Lift04
= 'lift_at_0.4'¶ lift at ratio 0.4
-
Lift05
= 'lift_at_0.5'¶ lift at ratio 0.5
-
Lift06
= 'lift_at_0.6'¶ lift at ratio 0.6
-
Lift07
= 'lift_at_0.7'¶ lift at ratio 0.7
-
Lift08
= 'lift_at_0.8'¶ lift at ratio 0.8
-
Lift09
= 'lift_at_0.9'¶ lift at ratio 0.9
-
MCC
= 'mcc'¶ Matthews correlation coefficient
-
accuracy
= 'accuracy'¶ Accuracy
-
error_rate
= 'error_rate_binary'¶ Error rate
-
gini
= 'gini'¶ Gini score
-
log_loss
= 'log_loss'¶ Logarithmic Loss
-
-
class
previsionio.metrics.
Clustering
¶ Metrics for clustering projects
-
calinski_harabaz
= 'calinski_harabaz'¶ Clustering calinski_harabaz metric
-
silhouette
= 'silhouette'¶ Clustering silhouette metric
-
-
class
previsionio.metrics.
MultiClassification
¶ Metrics for multiclassification projects
-
AUC
= 'auc'¶ Area Under ROC Curve
-
MAP10
= 'map_at_10'¶ qmean average precision @10
-
MAP3
= 'map_at_3'¶ qmean average precision @3
-
MAP5
= 'map_at_5'¶ qmean average precision @5
-
accuracy
= 'accuracy'¶ accuracy
-
error_rate
= 'error_rate_multi'¶ Multi-class Error rate
-
log_loss
= 'log_loss'¶ Logarithmic Loss
-
macroF1
= 'macroF1'¶ balanced F-score
-
qkappa
= 'qkappa'¶ quadratic weighted kappa
-
-
class
previsionio.metrics.
Regression
¶ Metrics for regression projects. Available metrics in Prevision:
rmse, mape, rmsle, mse, mae-
MAE
= 'mae'¶ Mean Absolute Error
-
MAPE
= 'mape'¶ Mean Absolute Percentage Error
-
MER
= 'mer'¶ Median Absolute Error
-
MSE
= 'mse'¶ Mean Squared Error
-
R2
= 'R2'¶ R2 score (coefficient of determination)
-
RMSE
= 'rmse'¶ Root Mean Squared Error
-
RMSLE
= 'rmsle'¶ Root Mean Squared Logarithmic Error
-
RMSPE
= 'rmspe'¶ Root Mean Squared Percentage Error
-
SMAPE
= 'smape'¶ Symmetric Mean Absolute Percentage Error
-
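Most of these metrics are short formulas. A standalone sketch of a few of them (local helpers, not SDK functions):

```python
def rmse(y_true, y_pred):
    # Root Mean Squared Error
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def mae(y_true, y_pred):
    # Mean Absolute Error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error; undefined when a true value is 0
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

RMSE penalises large errors more heavily than MAE, while MAPE expresses the error relative to the magnitude of the true values.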
Deployed model¶
Prevision.io’s SDK allows you to make predictions from a model deployed on the Prevision.io platform.
import previsionio as pio
# Initialize the deployed model object from the url of the model, your client id and client secret for this model, and your credentials
model = pio.DeployedModel(prevision_app_url, client_id, client_secret)
# Make a prediction
prediction, confidence, explain = model.predict(
predict_data={'feature1': 1, 'feature2': 2},
use_confidence=True,
explain=True,
)
-
class
previsionio.deployed_model.
DeployedModel
(prevision_app_url: str, client_id: str, client_secret: str, prevision_token_url: str = None)¶ DeployedModel class to interact with a deployed model.
-
predict
(predict_data: Dict, use_confidence: bool = False, explain: bool = False)¶ Get a prediction on a single instance using the deployed model.
Parameters: Returns: Tuple containing the prediction value, the confidence and the explanation. For a regression problem, the confidence is a list; for a multiclassification problem, the prediction value is a string.
Return type:
-
request
(endpoint, method, files=None, data=None, allow_redirects=True, content_type=None, no_retries=False, **requests_kwargs)¶ Make a request on the desired endpoint with the specified method & data.
Requires initialization.
Parameters: - endpoint (str) – API endpoint (e.g. /usecases, /prediction/file)
- method (requests.{get,post,delete}) – requests method
- files (dict) – files dict
- data (dict) – for single predict
- content_type (str) – force request content-type
- allow_redirects (bool) – passed to requests method
- no_retries (bool) – force request to run the first time, or exit directly
Returns: request response
Raises: Exception
– Error if url/token not configured
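The no_retries flag suggests a retry policy around the HTTP call. A network-free sketch of such a wrapper around an injected send callable (the policy shown is illustrative, not the SDK's actual behaviour):

```python
def request_with_retries(send, max_retries=3, no_retries=False):
    """Call `send()` and retry on exception unless `no_retries` is set."""
    attempts = 1 if no_retries else max_retries
    last_error = None
    for _ in range(attempts):
        try:
            return send()
        except Exception as err:  # a real client would only retry transient errors
            last_error = err
    raise last_error
```

With `no_retries=True` the call runs exactly once and any failure surfaces immediately, matching the "force request to run the first time, or exit directly" behaviour described above.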
-