Dataset

class previsionio.dataset.Dataset(_id: str, name: str, datasource: previsionio.datasource.DataSource = None, _data: pandas.core.frame.DataFrame = None, describe_state: Dict = None, drift_state=None, embeddings_state=None, separator=', ', **kwargs)

Bases: previsionio.api_resource.ApiResource

Dataset objects represent data resources that will be explored by the Prevision.io platform.

In order to launch an AutoML process (see the BaseExperiment class), the matching dataset must be stored in the related workspace.

Within the platform, datasets are stored in tabular form and are derived:

  • from files (CSV, ZIP)
  • or from a Data Source at a given time (snapshot)
data

Load in memory the data content of the current dataset into a pandas DataFrame.

Returns: DataFrame for the data object
Return type: pd.DataFrame
Raises: PrevisionException – Any error while fetching or parsing the data
delete()

Delete a dataset from the current [client] workspace.

Raises:
  • PrevisionException – If the dataset does not exist
  • requests.exceptions.ConnectionError – Error processing the request
download(download_path: str = None)

Download the dataset from the platform locally.

Parameters: download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used)
Returns: Path the data was downloaded to
Return type: str
Raises: PrevisionException – If the dataset does not exist or if there was another error fetching or parsing data
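
As a rough illustration of how download() might be used, the sketch below wraps it in a hypothetical helper, `download_to`, that creates the target directory first. The helper name is an assumption made for illustration and is not part of the SDK; it relies only on the `download(download_path=...)` signature and `str` return value documented above. Whether the SDK creates missing directories on its own is not stated here, so the helper does it explicitly.

```python
import os


def download_to(dataset, target_dir):
    """Download `dataset` into `target_dir` and return the resulting path.

    `dataset` is assumed to be a previsionio Dataset (or any object exposing
    the documented `download(download_path=...)` method). This helper is
    hypothetical and not part of the SDK.
    """
    # Create the target directory up front; harmless if it already exists.
    os.makedirs(target_dir, exist_ok=True)
    # Per the documentation above, download() returns the path it wrote to.
    return dataset.download(download_path=target_dir)
```

With a real Dataset instance `ds`, this would be called as `path = download_to(ds, "data/raw")`. Any PrevisionException raised by download() propagates to the caller unchanged.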
get_embedding() → Dict

Gets the embeddings analysis of the dataset from the current [client] workspace.

Raises:
  • PrevisionException – DatasetNotFoundError
  • requests.exceptions.ConnectionError – request error
classmethod list(project_id: str, all: bool = True)

List all the available datasets in the currently active [client] workspace.

Warning

Unlike the parent list() function, this method returns actual Dataset objects rather than plain dictionaries with the corresponding data.

Parameters:
  • project_id (str) – Unique reference of the project id on the platform
  • all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query returns only the first page of results.
Returns: Fetched dataset objects
Return type: list(Dataset)
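
Because list() returns Dataset objects rather than dictionaries, their attributes can be used directly. The sketch below shows one way to look up a dataset by name. The helper `find_dataset_by_name` is hypothetical, and it assumes each returned object exposes the `name` given to its constructor as an attribute. The class is passed in as an argument so the snippet does not depend on a configured client.

```python
def find_dataset_by_name(dataset_cls, project_id, name):
    """Return the first listed dataset whose `name` matches, or None.

    `dataset_cls` is expected to be previsionio.dataset.Dataset (or any class
    with the documented `list(project_id, all=...)` classmethod). This helper
    is hypothetical and not part of the SDK.
    """
    # all=True asks the SDK to walk every page of the paginated API,
    # so the search is not limited to the first page of results.
    for dataset in dataset_cls.list(project_id, all=True):
        if dataset.name == name:
            return dataset
    return None
```

A call might look like `find_dataset_by_name(Dataset, project_id, "train")`, where `project_id` is the platform's reference for the project and `"train"` is a placeholder name.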

start_embedding()

Starts the embeddings analysis of the dataset from the current [client] workspace.

Raises:
  • PrevisionException – DatasetNotFoundError
  • requests.exceptions.ConnectionError – request error
to_pandas() → pandas.core.frame.DataFrame

Load in memory the data content of the current dataset into a pandas DataFrame.

Returns: DataFrame for the data object
Return type: pd.DataFrame
Raises: PrevisionException – Any error while fetching or parsing the data
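
Since to_pandas() loads the whole dataset into memory, a quick size check is often the first thing done with the result. The following is a minimal sketch under that assumption. The helper `summarize_dataset` is hypothetical, and it uses only the standard `shape` and `columns` attributes of a pandas DataFrame.

```python
def summarize_dataset(dataset):
    """Return the row count and column names of a dataset.

    `dataset` is assumed to be a previsionio Dataset (or any object exposing
    the documented `to_pandas()` method). This helper is hypothetical and not
    part of the SDK.
    """
    # to_pandas() fetches the full content as a pandas DataFrame.
    df = dataset.to_pandas()
    # shape is (n_rows, n_columns); columns holds the column labels.
    return {"rows": df.shape[0], "columns": [str(c) for c in df.columns]}
```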
update_status()

Get an update on the status of a resource.

Parameters: specific_url (str, optional) – Specific (already parametrized) URL to fetch the resource from (otherwise the URL is built from the resource type and unique _id)
Returns: Updated status info
Return type: dict
class previsionio.dataset.DatasetImages(_id: str, name: str, project_id: str, copy_state, **kwargs)

Bases: previsionio.api_resource.ApiResource

DatasetImages objects represent image data resources that will be used by Prevision.io’s platform.

In order to launch an AutoML process (see the BaseExperiment class), the matching dataset must be stored in the related workspace.

Within the platform, image folder datasets are stored as ZIP files and are copied from ZIP files.

delete()

Delete a DatasetImages object from the current [client] workspace.

Raises:
  • PrevisionException – If the DatasetImages object does not exist
  • requests.exceptions.ConnectionError – Error processing the request
download(download_path: str = None)

Download the dataset from the platform locally.

Parameters: download_path (str, optional) – Target local directory path (if none is provided, the current working directory is used)
Returns: Path the data was downloaded to
Return type: str
Raises: PrevisionException – If the dataset does not exist or if there was another error fetching or parsing data
classmethod list(project_id: str, all: bool = True)

List all the available image datasets in the currently active [client] workspace.

Warning

Unlike the parent list() function, this method returns actual DatasetImages objects rather than plain dictionaries with the corresponding data.

Parameters:
  • project_id (str) – Unique reference of the project id on the platform
  • all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query returns only the first page of results.
Returns: Fetched DatasetImages objects
Return type: list(DatasetImages)