alphagenome.data.track_data.TrackData
- class alphagenome.data.track_data.TrackData(values, metadata, resolution=1, interval=None, uns=None)
Container for storing track values and metadata.
TrackData stores multiple genomic tracks at the same resolution, stacked into an ND matrix of shape (positional_bins, num_tracks). It also contains metadata as a pandas DataFrame with num_tracks rows.
The metadata DataFrame has two required columns:
name: The name of the track.
strand: The strand of the track (‘+’, ‘-’, or ‘.’).
Other columns are optional.
Valid shapes of TrackData.values are:
[num_tracks]
[positional_bins, num_tracks]
[positional_bins, positional_bins, num_tracks]
…
TrackData can store both model predictions and raw data. It can optionally hold information about the genome.Interval from which the data were derived and .uns for storing additional unstructured data.
In addition to being a container, TrackData provides functionality for common aggregation and slicing operations.
- values
A numpy array of floats or integers representing the track values. Positional axes have the same length. Example valid shapes are: [num_tracks], [positional_bins, num_tracks], and [positional_bins, positional_bins, num_tracks].
- metadata
A pandas DataFrame containing metadata for each track. The DataFrame must have at least two columns: ‘name’ and ‘strand’.
- resolution
The resolution of the track data in base pairs.
- interval
An optional Interval object representing the genomic region.
- uns
An optional dictionary to store additional unstructured data.
- Raises:
ValueError – If the number of tracks in values does not match the number of rows in metadata, or if metadata contains duplicate (name, strand) pairs, or if the positional axes have different lengths, or if the interval width does not match the expected width.
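The four ValueError conditions above can be illustrated with a small standalone check. This is a sketch only, not the library's code: `check_track_inputs` is a hypothetical helper, metadata is modelled as a list of dicts rather than a DataFrame, and the "expected width" is assumed to be positional_bins * resolution.

```python
import numpy as np


def check_track_inputs(values, metadata_rows, resolution=1, interval_width=None):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    values = np.asarray(values)
    # The trailing axis is the track axis; everything before it is positional.
    num_tracks = values.shape[-1]
    if num_tracks != len(metadata_rows):
        raise ValueError("number of tracks does not match number of metadata rows")
    pairs = [(row["name"], row["strand"]) for row in metadata_rows]
    if len(set(pairs)) != len(pairs):
        raise ValueError("duplicate (name, strand) pairs in metadata")
    positional = values.shape[:-1]
    if len(set(positional)) > 1:
        raise ValueError("positional axes have different lengths")
    # Assumption: expected width = positional_bins * resolution.
    if interval_width is not None and positional:
        if interval_width != positional[0] * resolution:
            raise ValueError("interval width does not match expected width")
    return True


values = np.zeros((4, 2), dtype=np.float32)
metadata_rows = [
    {"name": "track_a", "strand": "+"},
    {"name": "track_a", "strand": "-"},  # same name, different strand: allowed
]
ok = check_track_inputs(values, metadata_rows, resolution=128, interval_width=512)
```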
Attributes
- TrackData.names
Returns an array of track names (not necessarily unique).
- TrackData.num_tracks
Returns the number of tracks.
- TrackData.ontology_terms
Returns a list of ontology terms (if available).
- TrackData.positional_axes
Returns a list of the positional axes.
- TrackData.strands
Returns an array of track strands.
- TrackData.width
Returns the interval width covered by the tracks.
- TrackData.values: Union[Float32[ndarray, '*positional_bins num_tracks'], Int32[ndarray, '*positional_bins num_tracks'], Bool[ndarray, '*positional_bins num_tracks']]
Methods
Returns the bin index for a relative position.
Changes the resolution of the track data.
Returns a deep copy of the …
Downsamples the track data to a lower resolution using sum pooling.
Filters tracks to the negative DNA strand.
Filters tracks to the non-negative DNA strands (positive and unstranded).
Filters tracks to the non-positive DNA strands (negative and unstranded).
Filters tracks to the positive DNA strand.
Filters tracks to stranded tracks (excluding unstranded).
Filters tracks to unstranded tracks.
Filters tracks by a boolean mask.
Splits tracks into groups based on a metadata column.
Pads the track data along positional axes.
Resizes the track data by cropping or padding with a fixed center.
Reverse complements the track data and interval if present.
Selects tracks by numerical index.
Selects tracks by name.
Slices the track data using a genome.Interval.
Slices the track data along the positional axes.
Upsamples the track data to a higher resolution by repeating existing values.
- TrackData.change_resolution(resolution, aggregation_type=AggregationType.SUM)
Changes the resolution of the track data.
- TrackData.downsample(resolution, aggregation_type=AggregationType.SUM)
Downsamples the track data to a lower resolution using sum pooling.
- Parameters:
resolution (int) – The desired resolution in base pairs.
aggregation_type (AggregationType (default: <AggregationType.SUM: 'sum'>)) – The aggregation method to use for pooling the values.
- Return type:
TrackData
- Returns:
A new TrackData object with downsampled values.
- Raises:
ValueError – If resolution is not greater than the current resolution or not divisible by the current resolution.
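As an illustration of sum pooling on a [positional_bins, num_tracks] array, here is a minimal NumPy sketch. `sum_pool` is a hypothetical function, not part of the library; the extra check that the bin count divides evenly is an assumption added to keep the reshape valid.

```python
import numpy as np


def sum_pool(values, current_resolution, new_resolution):
    """Sum adjacent bins so each output bin covers new_resolution base pairs."""
    if new_resolution <= current_resolution or new_resolution % current_resolution:
        raise ValueError("new resolution must be a larger multiple of the current one")
    factor = new_resolution // current_resolution
    num_bins, num_tracks = values.shape
    # Assumption: the number of bins must split evenly into groups of `factor`.
    if num_bins % factor:
        raise ValueError("number of bins is not divisible by the pooling factor")
    # Group consecutive bins along a new axis, then sum over that axis.
    return values.reshape(num_bins // factor, factor, num_tracks).sum(axis=1)


values = np.arange(8, dtype=np.float32).reshape(4, 2)
pooled = sum_pool(values, current_resolution=1, new_resolution=2)
# pooled is [[2, 4], [10, 12]]: rows 0+1 and rows 2+3 of the input.
```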
- TrackData.filter_to_negative_strand()
Filters tracks to the negative DNA strand.
- Return type:
TrackData
- TrackData.filter_to_nonnegative_strand()
Filters tracks to the non-negative DNA strands (positive and unstranded).
- Return type:
TrackData
- TrackData.filter_to_nonpositive_strand()
Filters tracks to the non-positive DNA strands (negative and unstranded).
- Return type:
TrackData
- TrackData.filter_to_positive_strand()
Filters tracks to the positive DNA strand.
- Return type:
TrackData
- TrackData.filter_to_stranded()
Filters tracks to stranded tracks (excluding unstranded).
- Return type:
TrackData
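The strand filters above amount to boolean masks over the strand column applied to the track axis. The sketch below shows that idea with plain NumPy arrays; the dictionary keys are illustrative labels, not library names.

```python
import numpy as np

# Four tracks with strands '+', '-', '.', '+' and three positional bins.
strands = np.array(["+", "-", ".", "+"])
values = np.arange(12).reshape(3, 4)

masks = {
    "positive": strands == "+",
    "negative": strands == "-",
    "nonnegative": strands != "-",  # positive and unstranded
    "nonpositive": strands != "+",  # negative and unstranded
    "stranded": strands != ".",     # excludes unstranded
}
# Index the trailing (track) axis with each mask; positional axes are untouched.
filtered = {key: values[..., mask] for key, mask in masks.items()}
```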
- TrackData.groupby(column)
Splits tracks into groups based on a metadata column.
This method splits the tracks in the TrackData object into separate TrackData objects based on the unique values in the specified metadata column. It returns a dictionary where the keys are the unique values in the column, and the values are new TrackData objects containing the tracks corresponding to each key.
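The grouping described above can be pictured as one boolean mask per unique column value. The following sketch uses a plain list as a stand-in for a metadata column and bare arrays in place of TrackData objects; the column contents are made up for illustration, and the key order the library produces is not stated here.

```python
import numpy as np

values = np.arange(8).reshape(2, 4)  # [positional_bins, num_tracks]
column = ["liver", "brain", "liver", "brain"]  # hypothetical metadata column

groups = {}
for key in dict.fromkeys(column):  # unique values in first-seen order
    mask = np.array([item == key for item in column])
    # Keep only the tracks whose metadata value equals this key.
    groups[key] = values[:, mask]
```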
- TrackData.pad(start_pad, end_pad)
Pads the track data along positional axes.
- Parameters:
- Return type:
TrackData
- Returns:
A new TrackData object with padded values.
- Raises:
ValueError – If start_pad or end_pad is not divisible by the resolution.
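Because the pads must be divisible by the resolution, they are presumably given in base pairs and converted to whole bins. A minimal sketch of that conversion with numpy.pad follows; `pad_bins` is hypothetical, and the fill value of 0 is an assumption since the fill value is not documented here.

```python
import numpy as np


def pad_bins(values, start_pad, end_pad, resolution, pad_value=0):
    """Pad positional axes by start_pad/end_pad base pairs (assumed units)."""
    if start_pad % resolution or end_pad % resolution:
        raise ValueError("start_pad and end_pad must be divisible by the resolution")
    before, after = start_pad // resolution, end_pad // resolution
    # Pad every positional axis; leave the trailing track axis untouched.
    pad_width = [(before, after)] * (values.ndim - 1) + [(0, 0)]
    return np.pad(values, pad_width, constant_values=pad_value)


values = np.ones((4, 2), dtype=np.float32)
padded = pad_bins(values, start_pad=256, end_pad=128, resolution=128)
# Two bins are added at the start and one at the end, giving shape (7, 2).
```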
- TrackData.resize(width)
Resizes the track data by cropping or padding with a fixed center.
- Parameters:
width (int) – The desired width in base pairs.
- Return type:
TrackData
- Returns:
A new TrackData object with resized values.
- Raises:
ValueError – If width is not divisible by the resolution.
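Center-fixed resizing can be sketched for a single positional axis as follows. `resize_centered` is a hypothetical function; the zero fill value and the way an odd difference is split between the two ends are assumptions, not documented behavior.

```python
import numpy as np


def resize_centered(values, width, resolution, pad_value=0):
    """Crop or pad a [positional_bins, num_tracks] array around its center."""
    if width % resolution:
        raise ValueError("width must be divisible by the resolution")
    target = width // resolution
    current = values.shape[0]
    if target <= current:
        # Crop: drop an equal number of bins from both ends.
        start = (current - target) // 2
        return values[start:start + target]
    # Pad: split the extra bins between both ends.
    extra = target - current
    before = extra // 2
    return np.pad(values, [(before, extra - before), (0, 0)],
                  constant_values=pad_value)


values = np.arange(6, dtype=np.float32).reshape(6, 1)
cropped = resize_centered(values, width=4, resolution=1)
grown = resize_centered(values, width=8, resolution=1)
```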
- TrackData.reverse_complement()
Reverse complements the track data and interval if present.
- Return type:
TrackData
- Returns:
A new TrackData object with reverse complemented tracks.
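One plausible reading of reverse complementing track data is to reverse the positional axis and swap the '+' and '-' strand labels. The sketch below shows only that reading; whether the library also reorders tracks or handles additional positional axes differently is not stated here.

```python
import numpy as np

values = np.array([[1, 10], [2, 20], [3, 30]])  # [positional_bins, num_tracks]
strands = ["+", "-"]

# Assumption: '+' and '-' swap, unstranded '.' stays as it is.
swap = {"+": "-", "-": "+", ".": "."}
rc_values = values[::-1]                 # reverse the positional axis
rc_strands = [swap[s] for s in strands]  # relabel each track's strand
```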
- TrackData.slice_by_interval(interval, match_resolution=False)
Slices the track data using a genome.Interval.
- Parameters:
- Return type:
TrackData
- Returns:
A new TrackData object sliced to the interval.
- Raises:
ValueError – If .interval is not specified or if the specified interval is not fully contained within the current interval.
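Slicing by interval can be thought of as translating genomic coordinates into bin indices relative to the start of the stored interval. In this sketch intervals are reduced to plain integer start/end pairs (chromosome checks are omitted), `slice_to_interval` is hypothetical, and the `match_resolution` option is not modelled.

```python
import numpy as np


def slice_to_interval(values, data_start, data_end, start, end, resolution=1):
    """Slice bins covering [start, end) out of data covering [data_start, data_end)."""
    if not (data_start <= start and end <= data_end):
        raise ValueError("interval is not fully contained in the current interval")
    # Convert genomic coordinates to bin indices relative to the data start.
    # Assumption: coordinates are aligned to the resolution.
    first = (start - data_start) // resolution
    last = (end - data_start) // resolution
    return values[first:last]


values = np.arange(10).reshape(10, 1)
sub = slice_to_interval(values, data_start=1000, data_end=1010, start=1002, end=1006)
```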
- TrackData.slice_by_positions(start, end)
Slices the track data along the positional axes.
The slicing follows Python slicing conventions (0 indexed, and includes elements up to end-1).
- Parameters:
- Return type:
TrackData
- Returns:
A new TrackData object with the sliced values.
- Raises:
ValueError – If (end - start) is greater than the width, or if (end - start) is not divisible by the resolution.
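The two error conditions and the Python-style slicing can be sketched as below. `slice_positions` is a hypothetical function operating on a single positional axis, with width taken as positional_bins * resolution (an assumption).

```python
import numpy as np


def slice_positions(values, start, end, resolution):
    """Slice by base-pair positions, following Python's half-open convention."""
    width = values.shape[0] * resolution  # assumed definition of width
    if end - start > width or (end - start) % resolution:
        raise ValueError("invalid slice length for this width and resolution")
    # Convert base-pair positions to bin indices.
    return values[start // resolution:end // resolution]


values = np.arange(8).reshape(8, 1)  # 8 bins at 2 bp resolution, width 16
sub = slice_positions(values, start=4, end=10, resolution=2)
```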
- TrackData.upsample(resolution, aggregation_type=AggregationType.SUM)
Upsamples the track data to a higher resolution by repeating existing values.
- Parameters:
resolution (int) – The desired resolution in base pairs.
aggregation_type (AggregationType (default: <AggregationType.SUM: 'sum'>)) – The aggregation method to use for pooling the values.
- Return type:
TrackData
- Returns:
A new TrackData object with upsampled values.
- Raises:
ValueError – If resolution is not lower than the current resolution or does not evenly divide the current resolution.
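Upsampling by repetition can be sketched with numpy.repeat along the positional axis. `repeat_bins` is hypothetical; it only repeats values, and whether the library additionally rescales them for a given aggregation_type is not stated here.

```python
import numpy as np


def repeat_bins(values, current_resolution, new_resolution):
    """Repeat each bin so every output bin covers new_resolution base pairs."""
    if new_resolution >= current_resolution or current_resolution % new_resolution:
        raise ValueError("new resolution must evenly divide the current one")
    factor = current_resolution // new_resolution
    # Each input bin becomes `factor` identical consecutive output bins.
    return np.repeat(values, factor, axis=0)


values = np.array([[1.0, 2.0], [3.0, 4.0]])
up = repeat_bins(values, current_resolution=4, new_resolution=2)
```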