Putting Together a Dataset

A Dataset object requires a Publication, a Model and the file storing the relevant data.

Browsing datasets

Before creating a new Dataset, it is worth checking whether it already exists. The get_datasets() method returns a list of all datasets in the system. Since we already have a Publication and a Model to hand, we can narrow the search to datasets that contain both objects:

datasets = gwl.get_datasets(publication=publication, model=model)

If the returned list is empty, our dataset does not exist yet, and we can move on to the creation step.
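The emptiness check can be wrapped in a small helper. This is a minimal sketch, not part of the GWLandscape API: the function name find_existing_dataset is hypothetical, and it assumes that get_datasets() returns an ordinary Python list, as the text above suggests.

```python
def find_existing_dataset(gwl, publication, model):
    """Return the first dataset matching both objects, or None if there is none.

    `gwl` is the GWLandscape client used throughout this guide.
    This helper is illustrative only and is not part of the library.
    """
    # Same filtered search as above; assumed to return a (possibly empty) list.
    datasets = gwl.get_datasets(publication=publication, model=model)
    # An empty list is falsy, so this covers the "does not exist yet" case.
    return datasets[0] if datasets else None
```

A return value of None then signals that it is safe to go ahead and create the dataset.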

Creating a dataset

To create a new dataset on the GWLandscape service, we can use the create_dataset() method:

from pathlib import Path

dataset = gwl.create_dataset(
    publication=publication,
    model=model,
    datafile=Path('/path/to/datafile')
)

Note

A datafile must be either a single HDF5 file, or a tarfile containing exactly one HDF5 file (though other files may also be included alongside it).
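Because uploads can be slow, it may be worth checking a file against this rule locally before calling create_dataset(). The sketch below uses only the Python standard library. It assumes that HDF5 files can be recognised by their file extension, which is a guess; the service may apply a different test. The name looks_like_valid_datafile and the suffix list are hypothetical.

```python
import tarfile
from pathlib import Path

# Assumption: HDF5 files are identified by suffix. The service's real rule may differ.
HDF5_SUFFIXES = {".h5", ".hdf5", ".hdf", ".he5"}


def looks_like_valid_datafile(datafile):
    """Return True if `datafile` is a single HDF5 file, or a tarfile
    holding exactly one HDF5 file (other members are allowed)."""
    path = Path(datafile)
    if not path.is_file():
        return False
    # Case 1: a bare HDF5 file.
    if path.suffix.lower() in HDF5_SUFFIXES:
        return True
    # Case 2: a tar archive (compressed or not) with exactly one HDF5 member.
    if tarfile.is_tarfile(str(path)):
        with tarfile.open(str(path)) as archive:
            hdf5_members = [
                member for member in archive.getmembers()
                if member.isfile()
                and Path(member.name).suffix.lower() in HDF5_SUFFIXES
            ]
        return len(hdf5_members) == 1
    return False
```

This is only a local sanity check on file names; it does not inspect the file contents and does not replace whatever validation the service performs on upload.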

Updating and deleting datasets

Uploading large data files can take significant time, so it is useful to be able to change the publication or model associated with a dataset without uploading the file again. We can do this with Dataset.update():

new_publication = gwl.get_publication(title="On the formation history of galactic double neutron stars")
dataset.update(publication=new_publication)

Only the publication and model can be updated with this method. If the data file must be updated, we should instead remove the old dataset with Dataset.delete(), and then create a new dataset with the correct file.
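The delete-then-recreate step might be sketched as follows. The helper name replace_datafile is hypothetical and not part of the library; it relies only on Dataset.delete() and create_dataset() as described in this guide, and it asks for the publication and model explicitly rather than assuming how they can be read back from the old dataset.

```python
from pathlib import Path


def replace_datafile(gwl, dataset, publication, model, new_datafile):
    """Swap the data file of a dataset by deleting it and creating a new one.

    Illustrative helper only. `gwl` is the GWLandscape client, `dataset`
    is the existing Dataset whose file must be replaced.
    """
    new_datafile = Path(new_datafile)
    # Check the replacement file first, so the old dataset is not
    # deleted when there is nothing valid to upload in its place.
    if not new_datafile.is_file():
        raise FileNotFoundError(f"No such data file: {new_datafile}")
    dataset.delete()
    return gwl.create_dataset(
        publication=publication,
        model=model,
        datafile=new_datafile,
    )
```

Checking for the new file before calling delete() is a deliberate choice here: the deletion cannot be undone from this code, so the cheap local check runs first.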