roboto.domain.topics.parquet.table_transforms

Module Contents

roboto.domain.topics.parquet.table_transforms.compute_time_filter_mask(timestamps, start_time=None, end_time=None)

Compute a boolean mask indicating which rows fall within the specified time range. Returns None if no time filtering is needed (both start_time and end_time are None).

Parameters:
  • timestamps (pyarrow.Array)

  • start_time (Optional[int])

  • end_time (Optional[int])

Return type:

Optional[pyarrow.BooleanArray]

roboto.domain.topics.parquet.table_transforms.extract_timestamp_field(schema, timestamp_message_path)

Aggregate timestamp info into a helper utility for handling time-based data operations.

Parameters:
  • schema

  • timestamp_message_path

Return type:

roboto.domain.topics.parquet.timestamp.Timestamp

roboto.domain.topics.parquet.table_transforms.extract_timestamps(table, timestamp)

Extract timestamps in nanoseconds since Unix epoch from the table’s timestamp column.

Parameters:
  • table

  • timestamp

Return type:

pyarrow.Int64Array

roboto.domain.topics.parquet.table_transforms.should_read_row_group(row_group_metadata, timestamp, start_time=None, end_time=None)

Determine whether a Parquet row group contains data within the requested time range. Used to short-circuit reads: column chunks are not requested from a row group that is not relevant.

Parameters:
  • row_group_metadata

  • timestamp

  • start_time

  • end_time

Return type:

bool
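
The row-group check can be thought of as an interval-overlap test against the minimum and maximum timestamps recorded for the row group (in pyarrow, such bounds are available from a column chunk's statistics). The sketch below is a pure-Python illustration under stated assumptions, not the library's implementation: sketch_should_read is a hypothetical name, it takes the bounds directly rather than a metadata object, it treats start_time as inclusive and end_time as exclusive, and it reads the row group whenever statistics are missing.

```python
from typing import Optional


def sketch_should_read(
    min_ts: Optional[int],
    max_ts: Optional[int],
    start_time: Optional[int] = None,
    end_time: Optional[int] = None,
) -> bool:
    """Hypothetical overlap test between a row group's bounds and a time range."""
    # Assumption: without statistics the row group cannot be ruled out.
    if min_ts is None or max_ts is None:
        return True
    # Every row is older than the requested start (start assumed inclusive).
    if start_time is not None and max_ts < start_time:
        return False
    # Every row is at or past the requested end (end assumed exclusive).
    if end_time is not None and min_ts >= end_time:
        return False
    return True


# Hypothetical (min, max) timestamp bounds for three row groups.
row_group_bounds = [(0, 99), (100, 199), (200, 299)]
to_read = [
    index
    for index, (lo, hi) in enumerate(row_group_bounds)
    if sketch_should_read(lo, hi, start_time=150, end_time=200)
]
```

Only the second row group overlaps the requested range here, so the other two can be skipped without fetching any of their column chunks.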