image_classifier_3d.data_loader package¶
Submodules¶
image_classifier_3d.data_loader.universal_loader module¶
class image_classifier_3d.data_loader.universal_loader.adaptive_loader(filenames: List, test_flag=False)[source]¶
Bases: torch.utils.data.dataset.Dataset
Adaptive DataLoader:
Adaptive data loader collects images of different sizes into mini-batches; no padding is applied. Random flips and rotations are applied for training, testing, and evaluation alike.
All training data should be saved in a folder with filenames of the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., generated with uuid). Images are only loaded when they are used in a training iteration; only class labels are pre-loaded, no images are pre-loaded (ideal for large datasets). During inference, currently only preprocessed images saved as .npy files are supported. This will be improved for more flexible data loading.
- filenames: List
a list of filenames for all data. Every filename has the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., generated with uuid).
- test_flag: bool
whether this loader is used as a test dataloader; default is False. When testing, the filename is returned with each batch.
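A minimal usage sketch (the filenames below are hypothetical placeholders following the X_CELLID.npy convention; the exact batch contents depend on the loader's implementation):

    from torch.utils.data import DataLoader
    from image_classifier_3d.data_loader.universal_loader import adaptive_loader

    # Hypothetical preprocessed single-cell images; the leading digit encodes the class label.
    filenames = ["0_8f3a.npy", "1_27bc.npy", "2_91de.npy"]

    train_set = adaptive_loader(filenames, test_flag=False)

    # Images keep their original sizes (no padding), so batch_size=1 is the safe default
    # unless a custom collate_fn is supplied to handle mixed shapes.
    train_loader = DataLoader(train_set, batch_size=1, shuffle=True)

    for batch in train_loader:
        ...  # each batch is expected to hold an image tensor and its class label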
class image_classifier_3d.data_loader.universal_loader.adaptive_padding_loader(filenames: Union[List[str], str], out_shape: List = [64, 128, 128], flag: str = 'train', building_wrapper_path: str = 'image_classifier_3d.data_loader.utils', building_func_name: str = 'build_one_cell')[source]¶
Bases: torch.utils.data.dataset.Dataset
Adaptive padding DataLoader:
Adaptive padding data loader pads all images to the same size, defined by “out_shape”, when constructing the data loader. During training, random flips and rotations are applied; no augmentation is used for testing or evaluation. In addition, images are only loaded when they are used in a training iteration; only class labels are pre-loaded, no images are pre-loaded (ideal for large datasets).
- filenames: Union[List[str], str]
This can be a single filename (only csv files are supported) or a list of filenames for all data. In the latter case, every filename has the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., generated with uuid).
- out_shape: List
the size to which all input images will be padded. If an image is larger than out_shape, it is resized down to fit within out_shape and then padded to out_shape.
- flag: str
“flag” is a key parameter for determining how data loading works in different scenarios: “train” | “val” | “test_csv” | “test_folder”.
When flag == “train” :
All data should be saved in a folder with filenames in the format X_CELLID.npy (see detail above). Random flip and random rotation in XY plane are used for data augmentation.
when flag == “val”:
All data should be saved in a folder with filenames in the format X_CELLID.npy (see detail above). No data augmentation.
when flag == “test_csv”:
Filenames should be the path to a csv file with a record of all cells. The csv file should contain at least three columns: “CellId”, “crop_raw” and “crop_seg”. The last two are the read paths for the raw image and the segmentation. “crop_raw” assumes a 4D tiff image (multi-channel z-stack, channel order: 0 = dna, 1 = mem; other channels are not used). “crop_seg” assumes a 4D tiff image (multi-channel z-stack, channel order: 0 = dna segmentation, 1 = cell segmentation; other channels are not used). If a file named “for_mito_prediction.npy” exists in the same folder as “crop_raw”, it is loaded directly and used as input to your model. Otherwise, building_wrapper_path and building_func_name are used to load a function that defines how to prepare the input data from crop_raw and crop_seg. For example, you can have a file “C:/projects/demo/preprocessing.py” with a function called “my_preprocessing” defined in the script. Then, building_wrapper_path = “C:/projects/demo/preprocessing.py” and building_func_name = “my_preprocessing”. See the usage sketch after this parameter list.
when flag == “test_folder”:
All data should be saved in a folder with filenames in the format X_CELLID.npy (see detail above). No data augmentation.
- building_wrapper_path: str
where to load the wrapper for building one cell (see above when flag == “test_csv”)
- building_func_name: str
the function to load for building one cell (see above when flag == “test_csv”)
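A sketch of constructing the two most common modes (the file paths, csv name, and preprocessing function below are hypothetical; keyword names follow the signature above):

    from torch.utils.data import DataLoader
    from image_classifier_3d.data_loader.universal_loader import adaptive_padding_loader

    # Training: a list of X_CELLID.npy files, each padded/resized to out_shape.
    train_set = adaptive_padding_loader(
        filenames=["0_8f3a.npy", "1_27bc.npy"],
        out_shape=[64, 128, 128],
        flag="train",
    )
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)

    # Inference from a csv with columns CellId, crop_raw, crop_seg.
    test_set = adaptive_padding_loader(
        filenames="cells_to_predict.csv",
        out_shape=[64, 128, 128],
        flag="test_csv",
        building_wrapper_path="C:/projects/demo/preprocessing.py",
        building_func_name="my_preprocessing",
    )
    test_loader = DataLoader(test_set, batch_size=4, shuffle=False)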
class image_classifier_3d.data_loader.universal_loader.basic_loader(filenames: List)[source]¶
Bases: torch.utils.data.dataset.Dataset
Basic DataLoader:
Only supports problems with no more than 10 classes. All files are preprocessed .npy arrays rather than raw images. During training, images are only loaded when they are used in a training iteration; only class labels are pre-loaded, no images are pre-loaded (ideal for large datasets). During inference, the basic data loader currently only takes preprocessed images saved as .npy files. This will be improved for more flexible data loading.
- filenames: List
a list of filenames for all data. Every filename has the format X_CELLID.npy, where X can be any integer from 0 to num_class-1 (assuming num_class <= 10) and CELLID is a unique name for the cell (e.g., generated with uuid).
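A short illustration of the expected filename convention (the CELLIDs are hypothetical):

    from image_classifier_3d.data_loader.universal_loader import basic_loader

    # "3_c4d1.npy" -> class label 3; labels are parsed from the filename prefix,
    # so only labels are held in memory and each image is loaded on demand.
    dataset = basic_loader(["0_8f3a.npy", "1_27bc.npy", "3_c4d1.npy"])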
image_classifier_3d.data_loader.utils module¶
image_classifier_3d.data_loader.utils.build_one_cell(crop_raw: numpy.ndarray, crop_seg: numpy.ndarray, down_ratio: float = 0.5) → numpy.ndarray[source]¶
Prepare the input tensor for the single-cell mitotic classifier.
- crop_raw: np.ndarray
4D array (CZYX), multi-channel 3D image, with the first channel as the DNA image and the second channel as the cell membrane image. The image is assumed to have isotropic dimensions (i.e., XYZ have the same resolution).
- crop_seg: np.ndarray
4D array (CZYX), multi-channel 3D image of segmentation masks, assuming the first channel is the DNA segmentation and the second channel is the cell segmentation. The XYZ size should be the same as crop_raw.
- down_ratio: float
how much downsampling is applied to the image. Default is 0.5, which means the image size is reduced by half.
Returns: a 4D array (CZYX) ready to be fed into the neural network.
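A minimal sketch of calling build_one_cell on in-memory arrays (the array shapes are illustrative; a real workflow would read crop_raw and crop_seg from the 4D tiff files described above):

    import numpy as np
    from image_classifier_3d.data_loader.utils import build_one_cell

    # Illustrative 2-channel z-stacks: channel 0 = DNA, channel 1 = membrane / cell mask.
    crop_raw = np.random.rand(2, 48, 96, 96).astype(np.float32)  # CZYX raw intensities
    crop_seg = np.zeros((2, 48, 96, 96), dtype=np.uint8)         # CZYX segmentation masks
    crop_seg[:, 10:40, 20:80, 20:80] = 1

    # Downsample by half (the default) and build the network-ready input tensor.
    cell_input = build_one_cell(crop_raw, crop_seg, down_ratio=0.5)
    print(cell_input.shape)  # a 4D CZYX array, per the return description above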