Dataset
-
class
previsionio.dataset.
Dataset
(_id, name, datasource=None, _data=None, **kwargs) Bases:
previsionio.api_resource.ApiResource
Dataset objects represent data resources that will be explored by the Prevision.io platform.
In order to launch an AutoML process (see
BaseUsecase
class), the matching dataset must be stored in the related workspace.
Within the platform, datasets are stored in tabular form and are derived:
- from files (CSV, ZIP)
- or from a Data Source at a given time (snapshot)
-
data
Load the data content of the current dataset into memory as a pandas DataFrame.
Returns: Dataframe for the data object
Return type: pd.DataFrame
Raises: PrevisionException
– Any error while fetching or parsing the data
-
delete
() Delete a dataset from the current [client] workspace.
Raises: PrevisionException
– If the dataset does not exist
requests.exceptions.ConnectionError
– Error processing the request
-
classmethod
download
(dataset_name=None, download_path=None) Download the dataset from the platform to the local machine.
Returns: Path the data was downloaded to
Raises: PrevisionException
– If the dataset does not exist or if there was another error fetching or parsing the data
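A minimal sketch of calling download with a prepared target location. The helper name download_dataset is hypothetical, the class is passed in as dataset_cls (expected to be previsionio.dataset.Dataset) so the sketch can be exercised without a platform connection, and treating download_path as a local directory is an assumption that the description above does not confirm.

```python
import os


def download_dataset(dataset_cls, dataset_name, download_dir):
    """Download a registered dataset and return the path reported by the SDK.

    dataset_cls is expected to be previsionio.dataset.Dataset.
    """
    # Assumption: download_path names a local directory; create it if missing.
    os.makedirs(download_dir, exist_ok=True)
    # Keyword names follow the documented signature of Dataset.download.
    return dataset_cls.download(dataset_name=dataset_name,
                                download_path=download_dir)
```

With the SDK installed and a client configured, the call would look like download_dataset(Dataset, "my_dataset", "./downloads"), where both the dataset name and the directory are placeholders.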
-
classmethod
get_by_name
(name=None, version='last') Get an already registered dataset from the platform (using its registration name).
Raises: AttributeError
– If the dataset name is not given
PrevisionException
– If the dataset does not exist or if there was another error fetching or parsing the data
Returns: Fetched dataset
-
get_embedding
() Get the embeddings analysis of the dataset from the current [client] workspace.
Raises: PrevisionException
– DatasetNotFoundError
requests.exceptions.ConnectionError
– Request error
-
classmethod
getid_from_name
(name=None, version='last') Return the dataset id corresponding to a given name.
Raises: PrevisionException
– If the dataset does not exist, the version number is out of range, or there is another error fetching or parsing the data
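The two name-based lookups above can be combined as in the following sketch. The helper resolve_dataset is hypothetical, the class is injected as dataset_cls (expected to be previsionio.dataset.Dataset), and the early ValueError for an empty name is a choice made for this sketch rather than documented SDK behaviour.

```python
def resolve_dataset(dataset_cls, name, version="last"):
    """Return (dataset_id, dataset) for a registered dataset name.

    dataset_cls is expected to be previsionio.dataset.Dataset.
    """
    # Sketch-level guard: fail early instead of relying on the SDK's own error.
    if not name:
        raise ValueError("a dataset name is required")
    # Both classmethods take the same (name, version) arguments.
    dataset_id = dataset_cls.getid_from_name(name=name, version=version)
    dataset = dataset_cls.get_by_name(name=name, version=version)
    return dataset_id, dataset
```

Both calls may raise PrevisionException, for example when the dataset does not exist, so callers that expect missing datasets should handle that exception around the helper.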
-
classmethod
list
(all=<built-in function all>) List all the available datasets in the current active [client] workspace.
Warning
Contrary to the parent
list()
function, this method returns actual
Dataset
objects rather than plain dictionaries with the corresponding data.
Parameters: all (boolean, optional) – Whether to force the SDK to load all items of the given type (by calling the paginated API several times). Otherwise, the query only returns the first page of results.
Returns: Fetched dataset objects
Return type: list(Dataset)
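Because list returns Dataset objects, the result can be filtered directly. The sketch below uses a hypothetical helper find_datasets_by_name, takes the class as dataset_cls (expected to be previsionio.dataset.Dataset), and assumes each returned object exposes the name it was constructed with as a name attribute.

```python
def find_datasets_by_name(dataset_cls, name):
    """Return every listed dataset whose registration name equals `name`.

    dataset_cls is expected to be previsionio.dataset.Dataset.
    """
    # all=True asks the SDK to walk every page instead of only the first one.
    datasets = dataset_cls.list(all=True)
    # Assumption: each Dataset exposes its registration name as `.name`.
    return [d for d in datasets if getattr(d, "name", None) == name]
```

Several entries may come back for one name, since the name-based methods accept a version argument.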
-
classmethod
new
(name, datasource=None, file_name=None, dataframe=None) Register a new dataset in the workspace for further processing. You need to provide either a datasource, a file name or a dataframe (only one can be specified).
Note
To start a new use case on a dataset, it has to be already registered in your workspace.
Parameters:
- name (str) – Registration name for the dataset
- datasource (DataSource, optional) – A DataSource object used to import a remote dataset. To import a specific dataset from an existing database, you need a datasource connector (Connector object) designed to point to the related data source.
- file_name (str, optional) – Path to a file to upload as a dataset
- dataframe (pd.DataFrame, optional) – A pandas dataframe containing the data to upload
Raises: Exception
– If more than one of the keyword arguments datasource, file_name, dataframe was specified
PrevisionException
– Error while creating the dataset on the platform
Returns: The registered dataset object in the current workspace.
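The "only one source" rule can be checked before calling the platform, as in this sketch. The helper register_dataset is hypothetical, the class is injected as dataset_cls (expected to be previsionio.dataset.Dataset), and raising ValueError when zero or several sources are given is a choice of this sketch; the SDK itself is only documented to raise Exception when more than one is specified.

```python
def register_dataset(dataset_cls, name, datasource=None, file_name=None,
                     dataframe=None):
    """Register a dataset from exactly one of the three supported sources.

    dataset_cls is expected to be previsionio.dataset.Dataset.
    """
    sources = {"datasource": datasource, "file_name": file_name,
               "dataframe": dataframe}
    # Compare against None: the truth value of a pandas DataFrame is ambiguous.
    provided = [key for key, value in sources.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "provide exactly one of datasource, file_name, dataframe; got %r"
            % (provided,))
    key = provided[0]
    # Forward only the chosen source, using the documented keyword name.
    return dataset_cls.new(name, **{key: sources[key]})
```

For instance, register_dataset(Dataset, "sales", file_name="sales.csv") would forward to Dataset.new("sales", file_name="sales.csv"); the file name here is a placeholder.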
-
start_embedding
() Start the embeddings analysis of the dataset from the current [client] workspace.
Raises: PrevisionException
– DatasetNotFoundError
requests.exceptions.ConnectionError
– request error
-
to_pandas
() → pandas.core.frame.DataFrame Load the data content of the current dataset into memory as a pandas DataFrame.
Returns: Dataframe for the data object
Return type: pd.DataFrame
Raises: PrevisionException
– Any error while fetching or parsing the data
-
class
previsionio.dataset.
DatasetImages
(_id, name, datasource=None, _data=None, **kwargs) Bases:
previsionio.dataset.Dataset
DatasetImages objects represent image data resources that will be used by Prevision.io’s platform.
In order to launch an AutoML process (see
BaseUsecase
class), the matching dataset must be stored in the related workspace.
Within the platform, image folder datasets are stored as ZIP files and are copied from ZIP files.
-
classmethod
new
(name, file_name) Register a new image dataset in the workspace for further processing (in the image folders group).
Note
To start a new use case on a dataset, it has to be already registered in your workspace.
Raises: PrevisionException
– Error while creating the dataset on the platform
Returns: The registered dataset object in the current workspace.
-
to_pandas
() → pandas.core.frame.DataFrame Invalid method for a
DatasetImages
object.
Raises: ValueError
– Folder datasets cannot be converted to a pandas dataframe
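Code that handles both tabular and image datasets can guard the conversion, as in the sketch below. The helper load_dataframe is hypothetical, and it relies only on the documented behaviour that to_pandas raises ValueError for a DatasetImages object.

```python
def load_dataframe(dataset):
    """Return dataset.to_pandas(), or None when the dataset is a folder dataset."""
    try:
        return dataset.to_pandas()
    except ValueError:
        # DatasetImages.to_pandas() is documented to raise ValueError.
        # Caveat: any other ValueError raised during loading is also swallowed.
        return None
```

Where that caveat matters, an explicit isinstance(dataset, DatasetImages) check before the call is a stricter alternative.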
-
classmethod