image_classifier_3d.data_loader package

Submodules

image_classifier_3d.data_loader.universal_loader module

class image_classifier_3d.data_loader.universal_loader.adaptive_loader(filenames: List, test_flag=False)

Bases: torch.utils.data.dataset.Dataset

Adaptive DataLoader:

The adaptive data loader collects images of different sizes into mini-batches; no padding is applied. Random flip and rotation are applied for training, testing, and evaluation alike.

All training data should be saved in a folder with filenames of the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., using uuid). Images are only loaded when they are used in a training iteration; only class labels are pre-loaded, no images are pre-loaded (ideal for large datasets). During inference, only preprocessed images saved as .npy files are currently supported; more flexible data loading will be added in the future.

filenames: List

a list of filenames for all data. Every filename has the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., using uuid).

test_flag: bool

whether the loader is used as a test_dataloader; default is False. When True, the filename is returned with each batch.
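A minimal usage sketch (the folder path is a placeholder, and the items returned per index are assumed to be the image and its class label, plus the filename when test_flag=True). Because images keep their original sizes, batch_size=1 is the simplest way to batch without a custom collate function:

```python
from glob import glob

from torch.utils.data import DataLoader

from image_classifier_3d.data_loader.universal_loader import adaptive_loader

# filenames follow the X_CELLID.npy convention described above
train_files = sorted(glob("/path/to/train_data/*.npy"))  # placeholder path

train_set = adaptive_loader(train_files, test_flag=False)

# images are not padded to a common size, so batch_size=1 avoids needing a
# custom collate_fn that can handle mixed shapes
train_loader = DataLoader(train_set, batch_size=1, shuffle=True, num_workers=4)
```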

class image_classifier_3d.data_loader.universal_loader.adaptive_padding_loader(filenames: Union[List[str], str], out_shape: List = [64, 128, 128], flag: str = 'train', building_wrapper_path: str = 'image_classifier_3d.data_loader.utils', building_func_name: str = 'build_one_cell')

Bases: torch.utils.data.dataset.Dataset

Adaptive padding DataLoader:

The adaptive padding data loader pads all images to the same size, defined by “out_shape”, when constructing the data loader. During training, random flip and rotation are applied; no augmentation is used for testing or evaluation. In addition, images are only loaded when they are used in a training iteration; only class labels are pre-loaded, no images are pre-loaded (ideal for large datasets).

filenames: Union[List[str], str]

This can be a single filename (only csv files are supported) or a list of filenames for all data. In the latter case, every filename has the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., using uuid).

out_shape: List

the size to which all input images are padded. If an image is larger than out_shape, it is first resized down to fit within out_shape and then padded to out_shape.

flag: str

“flag” is a key parameter that determines how data loading works in different scenarios: “train” | “val” | “test_csv” | “test_folder”.

When flag == “train”:

All data should be saved in a folder with filenames in the format X_CELLID.npy (see detail above). Random flip and random rotation in the XY plane are used for data augmentation.

When flag == “val”:

All data should be saved in a folder with filenames in the format X_CELLID.npy (see detail above). No data augmentation.

When flag == “test_csv”:

Filenames should be the path to a csv file with a record of all cells. The csv file should contain at least three columns: “CellId”, “crop_raw” and “crop_seg”. The last two are the read paths for the raw image and the segmentation. “crop_raw” assumes a 4D image tiff file (multi-channel z-stack, channel order: 0 = dna, 1 = mem; other channels are not used). “crop_seg” assumes a 4D image tiff file (multi-channel z-stack, channel order: 0 = dna segmentation, 1 = cell segmentation; other channels are not used). If a file named “for_mito_prediction.npy” exists in the same folder as “crop_raw”, it will be loaded directly and used as input to your model. Otherwise, building_wrapper_path and building_func_name are used to load a function that defines how to prepare the input data from crop_raw and crop_seg. For example, you can have a file “C:/projects/demo/preprocessing.py” with a function called “my_preprocessing” defined in the script. Then, building_wrapper_path = “C:/projects/demo/preprocessing.py” and building_func_name = “my_preprocessing” (a sketch of such a function is shown after the parameter descriptions below).

When flag == “test_folder”:

All data should be saved in a folder with filenames in the format X_CELLID.npy (see detail above). No data augmentation.
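A construction sketch for two common flags; the paths are placeholders and out_shape simply restates the default:

```python
from glob import glob

from image_classifier_3d.data_loader.universal_loader import adaptive_padding_loader

# "train": a folder of X_CELLID.npy files, padded to out_shape, with augmentation
train_set = adaptive_padding_loader(
    filenames=sorted(glob("/path/to/train_data/*.npy")),  # placeholder path
    out_shape=[64, 128, 128],
    flag="train",
)

# "test_csv": a csv manifest with CellId / crop_raw / crop_seg columns
test_set = adaptive_padding_loader(
    filenames="/path/to/cells.csv",  # placeholder path
    out_shape=[64, 128, 128],
    flag="test_csv",
)
```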

building_wrapper_path: str

where to load the wrapper for building one cell (see above when flag == “test_csv”)

building_func_name: str

the function to load for building one cell (see above when flag == “test_csv”)
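A sketch of a user-supplied building function, saved as the hypothetical script “C:/projects/demo/preprocessing.py” from the example above. The signature and the masking/normalization steps are assumptions intended only to mirror the default builder, image_classifier_3d.data_loader.utils.build_one_cell, which takes the 4D (CZYX) raw image and segmentation mask and returns a 4D (CZYX) array:

```python
# hypothetical contents of C:/projects/demo/preprocessing.py
import numpy as np


def my_preprocessing(crop_raw: np.ndarray, crop_seg: np.ndarray) -> np.ndarray:
    """Build a single-cell input tensor from the raw image and segmentation.

    Assumed signature: 4D (CZYX) raw image and 4D (CZYX) segmentation in,
    4D (CZYX) float array out, ready to be fed into the network.
    """
    dna = crop_raw[0].astype(np.float32)   # channel 0 = dna
    mem = crop_raw[1].astype(np.float32)   # channel 1 = mem
    cell_mask = crop_seg[1] > 0            # channel 1 = cell segmentation

    channels = []
    for channel in (dna, mem):
        # keep only the voxels inside the cell and rescale to [0, 1]
        channel = channel * cell_mask
        value_range = channel.max() - channel.min()
        if value_range > 0:
            channel = (channel - channel.min()) / value_range
        channels.append(channel)
    return np.stack(channels, axis=0)
```

The loader is then pointed at this function with building_wrapper_path = “C:/projects/demo/preprocessing.py” and building_func_name = “my_preprocessing”.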

class image_classifier_3d.data_loader.universal_loader.basic_loader(filenames: List)

Bases: torch.utils.data.dataset.Dataset

Basic DataLoader:

Only supports problems with no more than 10 classes. All files are in .npy format rather than raw image formats. During training, images are only loaded when they are used in a training iteration; only class labels are pre-loaded, no images are pre-loaded (ideal for large datasets). During inference, the basic data loader currently only takes preprocessed images as .npy files; more flexible data loading will be added in the future.

filenames: List

a list of filenames for all data. Every filename has the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., using uuid).
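A construction sketch (placeholder path), showing how the leading digit of each X_CELLID.npy filename encodes the class label that the loader pre-loads:

```python
from glob import glob
from pathlib import Path

from image_classifier_3d.data_loader.universal_loader import basic_loader

files = sorted(glob("/path/to/preprocessed_cells/*.npy"))  # placeholder path
for f in files[:3]:
    name = Path(f).name               # e.g. "3_8f2a9c1e.npy"
    print(name, "-> class", name[0])  # the leading digit is the class label

dataset = basic_loader(files)
```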

image_classifier_3d.data_loader.utils module

image_classifier_3d.data_loader.utils.build_one_cell(crop_raw: numpy.ndarray, crop_seg: numpy.ndarray, down_ratio: float = 0.5) → numpy.ndarray

Prepare the input tensor for the single-cell mitotic classifier.

crop_raw: np.ndarray

4D array (CZYX), multi-channel 3D image, with the first channel as the DNA image and the second channel as the cell membrane image. The image is assumed to have isotropic dimensions (i.e., XYZ have the same resolution).

crop_seg: np.ndarray

4D array (CZYX), multi-channel 3D image of the segmentation mask, assuming the first channel is the DNA segmentation and the second channel is the cell segmentation. The XYZ size should be the same as crop_raw.

down_ratio: float

how much downsampling is applied to the image. Default is 0.5, which means the image size is reduced by half.

Returns a 4D array (CZYX) ready to be fed into the neural network.
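A call sketch with dummy arrays; the shapes and the crude mask are placeholders, used only to illustrate the expected CZYX layout and channel order:

```python
import numpy as np

from image_classifier_3d.data_loader.utils import build_one_cell

# dummy 2-channel z-stacks (CZYX): raw channel 0 = DNA, 1 = membrane;
# seg channel 0 = DNA segmentation, 1 = cell segmentation
crop_raw = np.random.randint(0, 65535, size=(2, 64, 128, 128)).astype(np.uint16)
crop_seg = np.zeros((2, 64, 128, 128), dtype=np.uint8)
crop_seg[:, 16:48, 32:96, 32:96] = 1  # crude DNA + cell masks

# downsample by half and build the network-ready 4D (CZYX) input
cell_tensor = build_one_cell(crop_raw, crop_seg, down_ratio=0.5)
print(cell_tensor.shape)
```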

Module contents