roboto.domain.topics.parquet.table_transforms
Module Contents
- roboto.domain.topics.parquet.table_transforms.compute_time_filter_mask(timestamps, start_time=None, end_time=None)
Compute a boolean mask indicating which rows fall within the specified time range. Returns None if no time filtering is needed (both start_time and end_time are None).
- Parameters:
timestamps (pyarrow.Array)
start_time (Optional[int])
end_time (Optional[int])
- Return type:
Optional[pyarrow.BooleanArray]
- roboto.domain.topics.parquet.table_transforms.extract_timestamp_field(schema, timestamp_message_path)
Aggregate timestamp info into a helper utility for handling time-based data operations.
- Parameters:
schema (pyarrow.Schema)
timestamp_message_path (roboto.domain.topics.record.MessagePathRecord)
- Return type:
- roboto.domain.topics.parquet.table_transforms.extract_timestamps(table, timestamp)
Extract timestamps in nanoseconds since Unix epoch from the table’s timestamp column.
- Parameters:
table (pyarrow.Table)
timestamp (roboto.domain.topics.parquet.timestamp.Timestamp)
- Return type:
pyarrow.Int64Array
- roboto.domain.topics.parquet.table_transforms.should_read_row_group(row_group_metadata, timestamp, start_time=None, end_time=None)
Determine whether a Parquet row group contains data within the requested time range. Used to skip requesting column chunks from a row group that is not relevant to the query.
- Parameters:
row_group_metadata (pyarrow.parquet.RowGroupMetaData)
timestamp (roboto.domain.topics.parquet.timestamp.Timestamp)
start_time (Optional[int])
end_time (Optional[int])
- Return type:
bool
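Row-group pruning of this kind usually reduces to an interval-overlap test between the requested range and the minimum and maximum timestamps recorded in the row group's column statistics. The sketch below shows that test in isolation. The function name row_group_may_match is hypothetical, the half-open range (start_time inclusive, end_time exclusive) is an assumption, and reading a row group whose statistics are missing is a conservative choice made here, not documented behavior.

```python
from typing import Optional


def row_group_may_match(
    min_ts: Optional[int],
    max_ts: Optional[int],
    start_time: Optional[int] = None,
    end_time: Optional[int] = None,
) -> bool:
    """Hypothetical overlap check on a row group's timestamp statistics."""
    # Without statistics the row group cannot be ruled out, so read it.
    if min_ts is None or max_ts is None:
        return True
    # Every row is earlier than the requested start.
    if start_time is not None and max_ts < start_time:
        return False
    # Every row is at or after the requested (exclusive) end.
    if end_time is not None and min_ts >= end_time:
        return False
    return True


# A row group covering timestamps 1000..2000 (nanoseconds).
overlapping = row_group_may_match(1000, 2000, start_time=1500, end_time=3000)
too_early = row_group_may_match(1000, 2000, start_time=2500)
too_late = row_group_may_match(1000, 2000, end_time=1000)
```

With pyarrow, the minimum and maximum values would typically come from the statistics attached to the timestamp column chunk in the row group's metadata.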