serotiny.dataframe.readers module

serotiny.dataframe.readers.filter_columns(columns_to_filter: Sequence[str], regex: str | None = None, startswith: str | None = None, endswith: str | None = None, contains: str | None = None, excludes: str | None = None) → Sequence[str]

Filter a list of columns using either a regex pattern or a combination of substring queries. If regex is supplied, it takes precedence and the remaining arguments are ignored. Otherwise, the logical AND of the supplied filters is applied: only the columns that satisfy all of the supplied conditions are returned.

Parameters:
  • columns_to_filter (Sequence[str]) – List of columns to filter

  • regex (Optional[str] = None) – A string containing a regular expression to be matched

  • startswith (Optional[str] = None) – A substring the matching columns must start with

  • endswith (Optional[str] = None) – A substring the matching columns must end with

  • contains (Optional[str] = None) – A substring the matching columns must contain

  • excludes (Optional[str] = None) – A substring the matching columns must not contain
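The precedence and AND semantics described above can be illustrated with a small standalone sketch. This is not serotiny's implementation: the name filter_columns_sketch is hypothetical, and anchoring the regex at the start of the column name with re.match is an assumption.

```python
import re
from typing import List, Optional, Sequence


def filter_columns_sketch(
    columns: Sequence[str],
    regex: Optional[str] = None,
    startswith: Optional[str] = None,
    endswith: Optional[str] = None,
    contains: Optional[str] = None,
    excludes: Optional[str] = None,
) -> List[str]:
    # Hypothetical re-implementation of the documented behaviour.
    # A regex, when given, takes precedence; the other filters are ignored.
    # Assumption: the pattern is matched from the start of the name (re.match).
    if regex is not None:
        return [col for col in columns if re.match(regex, col)]

    # Otherwise keep a column only if it passes every supplied filter (logical AND).
    kept = []
    for col in columns:
        if startswith is not None and not col.startswith(startswith):
            continue
        if endswith is not None and not col.endswith(endswith):
            continue
        if contains is not None and contains not in col:
            continue
        if excludes is not None and excludes in col:
            continue
        kept.append(col)
    return kept


columns = ["feat_area", "feat_volume", "meta_id", "feat_area_raw"]

# Both conditions must hold: starts with "feat_" AND does not contain "raw".
by_affixes = filter_columns_sketch(columns, startswith="feat_", excludes="raw")

# The regex wins, so the endswith argument has no effect here.
by_regex = filter_columns_sketch(columns, regex=r"feat_.*", endswith="_id")
```

In this sketch, by_affixes keeps only the two unsuffixed feat_ columns, while by_regex keeps all three feat_ columns because endswith is ignored once regex is set.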

serotiny.dataframe.readers.read_csv(path, include_columns=None)

Read a dataframe stored in a .csv file, optionally including only the columns given by include_columns.

Parameters:
  • path (Union[Path, UPath, str]) – Path to the .csv file

  • include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe.

Returns:

dataframe

Return type:

pd.DataFrame
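One way to picture how a list of column names and/or regex expressions might be resolved against a file's header is the standard-library sketch below. The helper resolve_include_columns is hypothetical, and treating each entry as either an exact name or a full-match regular expression is an assumption, not serotiny's documented matching rule.

```python
import csv
import io
import re
from typing import List, Optional, Sequence


def resolve_include_columns(
    available: Sequence[str], include_columns: Optional[Sequence[str]]
) -> List[str]:
    # Hypothetical helper, not part of serotiny.
    # None means "keep every column".
    if include_columns is None:
        return list(available)

    selected: List[str] = []
    for entry in include_columns:
        for col in available:
            # Assumption: an entry selects a column if it equals the name
            # exactly or matches the whole name as a regular expression.
            if (col == entry or re.fullmatch(entry, col)) and col not in selected:
                selected.append(col)
    return selected


# A small in-memory CSV stands in for a file on disk.
csv_text = "cell_id,feat_area,feat_volume,notes\n1,10.5,3.2,ok\n2,8.1,2.7,check\n"
reader = csv.DictReader(io.StringIO(csv_text))

# Mix a plain column name with a regex that selects every feat_ column.
wanted = resolve_include_columns(reader.fieldnames, ["cell_id", r"feat_.*"])

# Keep only the selected columns in each row.
rows = [{col: row[col] for col in wanted} for row in reader]
```

Here the notes column is dropped because no entry in the list matches it.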

serotiny.dataframe.readers.read_dataframe(dataframe: Path | UPath | str | DataFrame, required_columns: Sequence[str] | None = None, include_columns: Sequence[str] | None = None) → DataFrame

Load a dataframe from a .csv or .parquet file, or check that a given pd.DataFrame contains the required columns.

Parameters:
  • dataframe (Union[Path, UPath, str, pd.DataFrame]) – Either the path to the dataframe to be loaded, or a pd.DataFrame. Supported file types are .csv and .parquet

  • required_columns (Optional[Sequence[str]] = None) – List of columns that the dataframe must contain. If any of them are missing, a ValueError is raised

  • include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe. If required_columns is not None, those get appended to include_columns (without duplication).

Returns:

dataframe

Return type:

pd.DataFrame
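The interaction between required_columns and include_columns can be sketched with plain lists. Both helpers below are hypothetical and only mirror the behaviour described above. Returning None (load everything) when include_columns is None is an assumption, as is the wording of the error message.

```python
from typing import List, Optional, Sequence


def merge_column_requests(
    include_columns: Optional[Sequence[str]],
    required_columns: Optional[Sequence[str]],
) -> Optional[List[str]]:
    # Hypothetical helper, not part of serotiny.
    # Assumption: with no include_columns, nothing is filtered out.
    if include_columns is None:
        return None
    merged = list(include_columns)
    # Append required columns, skipping any that are already requested.
    for col in required_columns or []:
        if col not in merged:
            merged.append(col)
    return merged


def check_required(
    available: Sequence[str], required_columns: Optional[Sequence[str]]
) -> None:
    # Hypothetical helper: raise ValueError if a required column is absent.
    missing = [col for col in required_columns or [] if col not in available]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# "cell_id" is already requested, so only "split" is appended.
merged = merge_column_requests(["feat_area", "cell_id"], ["cell_id", "split"])

# A dataframe lacking "split" fails the required-columns check.
try:
    check_required(["cell_id", "feat_area"], ["cell_id", "split"])
    error_message = None
except ValueError as err:
    error_message = str(err)
```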

serotiny.dataframe.readers.read_h5ad(path, include_columns=None, backed=None)

Read an AnnData object stored in a .h5ad file.

Return type:

AnnData

serotiny.dataframe.readers.read_parquet(path, include_columns=None)

Read a dataframe stored in a .parquet file, optionally including only the columns given by include_columns.

Parameters:
  • path (Union[Path, UPath, str]) – Path to the .parquet file

  • include_columns (Optional[Sequence[str]] = None) – List of column names and/or regex expressions, used to only include the desired columns in the resulting dataframe.

Returns:

dataframe

Return type:

pd.DataFrame