buddi.dataset.buddi3_dataset

Functions

get_dataset([input_tuple_order, ...])

Create a TensorFlow dataset from numpy arrays for training the BuDDI model.

get_supervised_dataset(X_known_prop, ...)

Create a TensorFlow dataset from numpy arrays, tailored to the buddi3 model. Intended to be built from pseudobulk data whose expression profiles have associated ground-truth cell-type proportions.

get_unsupervised_dataset(X_unknown_prop, ...)

Create a TensorFlow dataset from numpy arrays, tailored to the buddi3 model.

:param X_unknown_prop: numpy array of normalized expression, of shape (n, num_genes)
:param label_unknown_prop: numpy array of one-hot encoded sample labels, of shape (n, num_unique_samples)
:param samp_type_unknown_prop: numpy array of one-hot encoded sample type (sequencing technology) labels, of shape (n, num_unique_sample_types). For this dataset it is usually a 2D array of shape (n, 2) in which one column is all 1s (encoding bulk) and the other is all 0s (encoding single cell).
:return: TensorFlow dataset
:rtype: tf.data.Dataset
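The documented input shapes for get_unsupervised_dataset can be illustrated with a minimal numpy sketch. It only builds arrays of the described shapes and does not call the function itself, since the full signature is elided above. The `one_hot` helper, the toy sizes, the random expression values, and the choice of column 0 as the bulk column are all assumptions made for illustration.

```python
import numpy as np


def one_hot(indices, num_classes):
    """Hypothetical helper: float32 one-hot matrix of shape (len(indices), num_classes)."""
    indices = np.asarray(indices)
    return np.eye(num_classes, dtype=np.float32)[indices]


rng = np.random.default_rng(0)
n, num_genes, num_unique_samples = 6, 10, 3  # toy sizes, chosen arbitrarily

# Stand-in for normalized expression; real values would come from bulk samples.
X_unknown_prop = rng.random((n, num_genes), dtype=np.float32)

# Each row is assigned to one of the unique samples, then one-hot encoded.
sample_ids = np.array([0, 0, 1, 1, 2, 2])
label_unknown_prop = one_hot(sample_ids, num_unique_samples)

# Assumption: column 0 encodes bulk and column 1 encodes single cell,
# so every row is [1, 0]. The source does not say which column is which.
samp_type_unknown_prop = one_hot(np.zeros(n, dtype=int), 2)

print(X_unknown_prop.shape, label_unknown_prop.shape, samp_type_unknown_prop.shape)
```

All three arrays share the leading dimension n, which is what lets them be combined row by row into a single dataset.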