Dataset

class previsionio.dataset.Dataset(_id, name, datasource=None, _data=None, **kwargs)

Bases: previsionio.api_resource.ApiResource

Dataset objects represent data resources that will be explored by the Prevision.io platform.

To launch an AutoML process (see the BaseUsecase class), the matching dataset must be stored in the related workspace.

Within the platform, datasets are stored in tabular form and are derived:

  • from files (CSV, ZIP)
  • or from a Data Source at a given time (snapshot)
data

Load in memory the data content of the current dataset into a pandas DataFrame.

Returns: DataFrame for the data object
Return type: pd.DataFrame
Raises: PrevisionException – Any error while fetching or parsing the data
delete()

Delete a dataset from the current [client] workspace.

Raises:
  • PrevisionException – If the dataset does not exist
  • requests.exceptions.ConnectionError – Error processing the request
classmethod download(dataset_name=None, download_path=None)

Download the dataset from the platform locally.

Parameters:
  • dataset_name (str) – Name of the dataset to download
  • download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used)
Returns:

Path the data was downloaded to

Return type:

str

Raises:

PrevisionException – If dataset does not exist or if there was another error fetching or parsing data
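
As a minimal sketch of how download() might be wrapped, the helper below creates the target directory before delegating to the documented classmethod. The helper name download_to and the dataset_cls parameter are hypothetical; dataset_cls is expected to be previsionio.dataset.Dataset and is passed in so the sketch carries no hard dependency on the SDK. Whether the SDK creates missing directories itself is not stated here, so the sketch does it defensively.

```python
import os


def download_to(dataset_cls, dataset_name, target_dir):
    """Download ``dataset_name`` into ``target_dir`` and return the local path.

    ``dataset_cls`` is assumed to expose the documented classmethod
    ``download(dataset_name=None, download_path=None)``.
    """
    # Assumption: creating the directory up front is harmless even if the
    # SDK would also create it.
    os.makedirs(target_dir, exist_ok=True)
    return dataset_cls.download(dataset_name=dataset_name, download_path=target_dir)
```

With the SDK installed and a client configured, a call would look like download_to(Dataset, "sales", "./data"), where "sales" is a placeholder dataset name.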

classmethod get_by_name(name=None, version='last')

Get an already registered dataset from the platform (using its registration name).

Parameters:
  • name (str) – Name of the dataset to retrieve
  • version (int, str, optional) – Specific version of the dataset (can be an int, or ‘last’ - default - to get the latest version of the dataset)
Raises:
  • AttributeError – If name is not given
  • PrevisionException – If dataset does not exist or if there was another error fetching or parsing data
Returns:

Fetched dataset

Return type:

Dataset
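
The version rule described above (an int, or 'last') can be checked before calling the platform. The following is only a sketch: fetch_dataset and dataset_cls are hypothetical names, and dataset_cls stands in for previsionio.dataset.Dataset so that the example stays self-contained.

```python
def fetch_dataset(dataset_cls, name, version="last"):
    """Validate the arguments, then delegate to ``dataset_cls.get_by_name``."""
    if not name:
        # Mirrors the documented AttributeError when no name is given.
        raise AttributeError("name must be given")
    # bool is a subclass of int, so reject it explicitly.
    is_int = isinstance(version, int) and not isinstance(version, bool)
    if not (is_int or version == "last"):
        raise ValueError("version must be an int or 'last'")
    return dataset_cls.get_by_name(name=name, version=version)
```

Failing early on a malformed version avoids a round trip to the platform; a version that is well formed but out of range is still reported by the SDK itself.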

get_embedding()

Get the embeddings analysis of the dataset from the current [client] workspace.

Raises:
  • PrevisionException – If the dataset is not found (DatasetNotFoundError)
  • requests.exceptions.ConnectionError – Error processing the request
classmethod getid_from_name(name=None, version='last')

Return the dataset id corresponding to a given name.

Parameters:
  • name (str) – Name of the dataset
  • version (int, str, optional) – Specific version of the dataset (can be an int, or ‘last’ - default - to get the latest version of the dataset)
Raises:

PrevisionException – If dataset does not exist, version number is out of range or there is another error fetching or parsing data

classmethod list(all=<built-in function all>)

List all the available datasets in the current active [client] workspace.

Warning

Contrary to the parent list() function, this method returns actual Dataset objects rather than plain dictionaries with the corresponding data.

Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query returns only the first page of results.
Returns: Fetched dataset objects
Return type: list(Dataset)
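
Because list() returns Dataset objects rather than dictionaries, their attributes can be read directly. The sketch below collects dataset names; dataset_names and dataset_cls are hypothetical names, and it assumes each Dataset exposes the name passed to its constructor as a name attribute.

```python
def dataset_names(dataset_cls, load_all=True):
    """Return the sorted names of the datasets visible in the workspace."""
    # all=True asks the SDK to walk every page of the paginated API;
    # all=False returns only the first page.
    datasets = dataset_cls.list(all=load_all)
    # Assumption: each Dataset object has a ``name`` attribute.
    return sorted(ds.name for ds in datasets)
```
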
classmethod new(name, datasource=None, file_name=None, dataframe=None)

Register a new dataset in the workspace for further processing. You need to provide either a datasource, a file name or a dataframe (only one can be specified).

Note

To start a new use case on a dataset, it must already be registered in your workspace.

Parameters:
  • name (str) – Registration name for the dataset
  • datasource (DataSource, optional) – A DataSource object used to import a remote dataset. To import a specific dataset from an existing database, you need a datasource connector (Connector object) that points to the related data source.
  • file_name (str, optional) – Path to a file to upload as dataset
  • dataframe (pd.DataFrame, optional) – A pandas dataframe containing the data to upload
Raises:
  • Exception – If more than one of the keyword arguments datasource, file_name, dataframe was specified
  • PrevisionException – Error while creating the dataset on the platform
Returns:

The registered dataset object in the current workspace.

Return type:

Dataset
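
The "only one can be specified" rule can be illustrated with a small wrapper that checks the three sources before calling new(). This is a sketch under stated assumptions: register_dataset and dataset_cls are hypothetical names, dataset_cls stands in for previsionio.dataset.Dataset, and the sketch raises ValueError, whereas the SDK is documented to raise a plain Exception.

```python
def register_dataset(dataset_cls, name, datasource=None, file_name=None, dataframe=None):
    """Call ``dataset_cls.new`` only when exactly one data source is given."""
    sources = {
        "datasource": datasource,
        "file_name": file_name,
        "dataframe": dataframe,
    }
    # Compare against None rather than truthiness: the truth value of a
    # pandas DataFrame is ambiguous and raises an error.
    given = {key: value for key, value in sources.items() if value is not None}
    if len(given) != 1:
        raise ValueError(
            "provide exactly one of datasource, file_name or dataframe "
            "(got %d)" % len(given)
        )
    return dataset_cls.new(name, **given)
```

A call such as register_dataset(Dataset, "sales", file_name="sales.csv") would then forward only the file_name argument; both names in that call are placeholders.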

start_embedding()

Start the embeddings analysis of the dataset from the current [client] workspace.

Raises:
  • PrevisionException – If the dataset is not found (DatasetNotFoundError)
  • requests.exceptions.ConnectionError – Error processing the request
to_pandas() → pandas.core.frame.DataFrame

Load in memory the data content of the current dataset into a pandas DataFrame.

Returns: DataFrame for the data object
Return type: pd.DataFrame
Raises: PrevisionException – Any error while fetching or parsing the data
class previsionio.dataset.DatasetImages(_id, name, datasource=None, _data=None, **kwargs)

Bases: previsionio.dataset.Dataset

DatasetImages objects represent image data resources that will be used by Prevision.io’s platform.

To launch an AutoML process (see the BaseUsecase class), the matching dataset must be stored in the related workspace.

Within the platform, image folder datasets are stored as ZIP files and are copied from the ZIP files they were created from.

classmethod new(name, file_name)

Register a new image dataset in the workspace for further processing (in the image folders group).

Note

To start a new use case on a dataset, it must already be registered in your workspace.

Parameters:
  • name (str) – Registration name for the dataset
  • file_name (str) – Path to the zip file to upload as image dataset
Raises:

PrevisionException – Error while creating the dataset on the platform

Returns:

The registered dataset object in the current workspace.

Return type:

Dataset

to_pandas() → pandas.core.frame.DataFrame

Invalid method for a DatasetImages object.

Raises: ValueError – Folder datasets cannot be converted to a pandas dataframe
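
Code that handles tabular and image datasets alike may want to tolerate this ValueError. A minimal sketch, with the hypothetical helper name to_pandas_or_none, assuming only the documented to_pandas() behaviour of both classes:

```python
def to_pandas_or_none(dataset):
    """Return the dataset as a DataFrame, or None for image folder datasets."""
    try:
        return dataset.to_pandas()
    except ValueError:
        # Documented behaviour of DatasetImages.to_pandas(): folder
        # datasets cannot be converted to a pandas DataFrame.
        return None
```

Other errors, such as a PrevisionException raised while fetching the data, are deliberately left to propagate.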