serotiny.dataframe.transforms.filter module

serotiny.dataframe.transforms.filter.filter_columns(input: DataFrame | Dict | Sequence[str], columns: Sequence[str] | None = None, startswith: str | None = None, endswith: str | None = None, contains: str | None = None, excludes: str | None = None, regex: str | None = None)

Select columns in a dataset, using different filtering options. See serotiny.dataframe.transforms.filter_columns for more details.

Parameters:
  • input (Union[pd.DataFrame, Sequence[str]]) – The input to operate on. If it is a pandas DataFrame, the result is a DataFrame containing only the columns that match the filters. If it is a list of strings, the result is a list containing only the strings that match the filters

  • columns (Optional[Sequence[str]] = None) – Explicit list of columns to include. If it is supplied, the remaining filters are ignored

  • startswith (Optional[str] = None) – A substring the matching columns must start with

  • endswith (Optional[str] = None) – A substring the matching columns must end with

  • contains (Optional[str] = None) – A substring the matching columns must contain

  • excludes (Optional[str] = None) – A substring the matching columns must not contain

  • regex (Optional[str] = None) – A string containing a regular expression to be matched
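
The filtering options above can be illustrated for the list-of-strings case with a small pure-Python sketch. This is not serotiny's implementation: the function name filter_names_sketch is hypothetical, and it assumes that the filters are combined with a logical AND, that regex is applied with re.search, and that an explicit columns list is returned as given.

```python
import re
from typing import List, Optional, Sequence


def filter_names_sketch(
    names: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    startswith: Optional[str] = None,
    endswith: Optional[str] = None,
    contains: Optional[str] = None,
    excludes: Optional[str] = None,
    regex: Optional[str] = None,
) -> List[str]:
    """Hypothetical stand-in for the list-of-strings case of filter_columns."""
    # An explicit list wins; the remaining filters are ignored.
    # Assumption: the list is returned as given, without checking `names`.
    if columns is not None:
        return list(columns)

    # Assumption: the pattern may match anywhere in the name (re.search).
    pattern = re.compile(regex) if regex is not None else None

    kept = []
    for name in names:
        # Assumption: every supplied filter must pass (logical AND).
        if startswith is not None and not name.startswith(startswith):
            continue
        if endswith is not None and not name.endswith(endswith):
            continue
        if contains is not None and contains not in name:
            continue
        if excludes is not None and excludes in name:
            continue
        if pattern is not None and pattern.search(name) is None:
            continue
        kept.append(name)
    return kept


# Made-up column names, for illustration only.
names = ["dna_mean", "dna_std", "membrane_mean", "cell_id"]

mean_cols = filter_names_sketch(names, endswith="_mean")
dna_no_std = filter_names_sketch(names, startswith="dna", excludes="std")
explicit = filter_names_sketch(names, columns=["cell_id"], startswith="dna")
by_regex = filter_names_sketch(names, regex=r"^(dna|cell)_")
```

For a DataFrame input, the same name selection would presumably be followed by indexing the frame with the kept column names.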

serotiny.dataframe.transforms.filter.filter_rows(dataframe: DataFrame, column: str, values: Sequence, exclude: bool = False)

Filter a dataframe, keeping only the rows where a given column’s value is contained in a list of values.

Parameters:
  • dataframe (pd.DataFrame) – Input dataframe

  • column (str) – The column to be used for filtering

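  • values (Sequence) – List of values to filter for

  • exclude (bool = False) – Not described in the docstring; the name suggests that, when True, the filter is inverted and rows whose value is in values are dropped instead of kept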
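
A minimal sketch of the row filter described above, written with plain pandas rather than serotiny. The function name filter_rows_sketch is hypothetical, and the behavior of exclude (inverting the membership test) is an assumption based on the parameter name, not something the docstring states.

```python
import pandas as pd


def filter_rows_sketch(dataframe, column, values, exclude=False):
    """Hypothetical stand-in for filter_rows, built on Series.isin."""
    # True for rows whose value in `column` is one of `values`.
    mask = dataframe[column].isin(values)
    # Assumption: exclude=True inverts the selection.
    if exclude:
        mask = ~mask
    return dataframe.loc[mask]


# Made-up data, for illustration only.
df = pd.DataFrame(
    {
        "split": ["train", "test", "valid", "train"],
        "cell_id": [1, 2, 3, 4],
    }
)

kept = filter_rows_sketch(df, "split", ["train", "valid"])
dropped = filter_rows_sketch(df, "split", ["train", "valid"], exclude=True)
```

Under these assumptions, kept holds the rows with cell_id 1, 3 and 4, and dropped holds only the row with cell_id 2.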