4. The Tracks File Format
This section covers track file formats such as WIG/bigWig and bedGraph.
The bigWig file format
The bigWig format is for display of dense, continuous data that will be displayed as a graph. BigWig files are created initially from WIG type files, using the UCSC program wigToBigWig. Alternatively, bigWig files can be created from bedGraph files, using the UCSC program bedGraphToBigWig. In either case, the resulting bigWig files are in an indexed binary format. The main advantage of the bigWig files is that only the portions of the files needed to display a particular region are transferred, so for large data sets bigWig is considerably faster than regular WIG files.
-- Broad Institute
bigWig tracks can be visualized in genome browsers such as the Integrative Genomics Viewer (IGV) from the Broad Institute, and in web-based genome browsers such as VALIS.
The bigWig layout is described in the libBigWig header file https://raw.githubusercontent.com/dpryan79/libBigWig/master/bigWig.h. A bigWig file is self-indexed: a reader can jump directly to the data blocks that contain the requested track data by following the block offsets stored in the index, which is itself located from the header at the beginning of the file.
In brief, a bigWig file has four parts:
Header section
Chromosome List section
Data section (not read until the data is actually accessed)
Index section
Together, these sections hold everything needed to randomly access a bigWig file.
bigWig Header
As described in bigWig.h, the header of a bigWig file contains the following fields:
    typedef struct {
        uint16_t version;            /* The version information of the file. */
        uint16_t nLevels;            /* The number of "zoom" levels. */
        uint64_t ctOffset;           /* The offset to the on-disk chromosome tree list. */
        uint64_t dataOffset;         /* The on-disk offset to the first block of data. */
        uint64_t indexOffset;        /* The on-disk offset to the data index. */
        uint16_t fieldCount;         /* Total number of fields. */
        uint16_t definedFieldCount;  /* Number of fixed-format BED fields. */
        uint64_t sqlOffset;          /* The on-disk offset to an SQL string. This is unused. */
        uint64_t summaryOffset;      /* If there's a summary, this is the offset to it on the disk. */
        uint32_t bufSize;            /* The compression buffer size (if the data is compressed). */
        uint64_t extensionOffset;    /* Unused */
        bwZoomHdr_t *zoomHdrs;       /* Pointers to the header for each zoom level. */
        /* Total summary */
        uint64_t nBasesCovered;      /* The total bases covered in the file. */
        double minVal;               /* The minimum value in the file. */
        double maxVal;               /* The maximum value in the file. */
        double sumData;              /* The sum of all values in the file. */
        double sumSquared;           /* The sum of the squared values in the file. */
    } bigWigHdr_t;

The first 64 bytes of the file (offsets 0x00 to 0x3F) are reserved for the fixed part of this header. The first four bytes hold the file's magic number, so version begins at offset 0x4. The on-disk fields sit at the following offsets:
Field                Offset
version              0x4
nLevels              0x6
ctOffset             0x8
dataOffset           0x10
indexOffset          0x18
fieldCount           0x20
definedFieldCount    0x22
sqlOffset            0x24
summaryOffset        0x2c
bufSize              0x34
extensionOffset      0x38
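As a rough illustration of the offsets listed above, the fixed header can be decoded from the first 64 bytes of a file. This is a sketch, not libBigWig's code: `FixedHeader`, `read_le`, and `parse_fixed_header` are hypothetical names, and it assumes a little-endian file whose first four bytes hold the bigWig magic number 0x888FFC26.

```c
#include <stddef.h>
#include <stdint.h>

#define BIGWIG_MAGIC 0x888FFC26u  /* assumed little-endian bigWig magic */
#define FIXED_HEADER_SIZE 64

/* Hypothetical struct: only the fields stored in the fixed 64-byte header. */
typedef struct {
    uint32_t magic;
    uint16_t version;
    uint16_t nLevels;
    uint64_t ctOffset;
    uint64_t dataOffset;
    uint64_t indexOffset;
    uint16_t fieldCount;
    uint16_t definedFieldCount;
    uint64_t sqlOffset;
    uint64_t summaryOffset;
    uint32_t bufSize;
    uint64_t extensionOffset;
} FixedHeader;

/* Read an nbytes-wide little-endian unsigned integer (nbytes <= 8). */
static uint64_t read_le(const unsigned char *p, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = nbytes; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/* Returns 0 on success, -1 if the buffer is too short or the magic differs. */
static int parse_fixed_header(const unsigned char *buf, size_t len, FixedHeader *h) {
    if (len < FIXED_HEADER_SIZE)
        return -1;
    h->magic = (uint32_t)read_le(buf + 0x00, 4);
    if (h->magic != BIGWIG_MAGIC)
        return -1;
    h->version           = (uint16_t)read_le(buf + 0x04, 2);
    h->nLevels           = (uint16_t)read_le(buf + 0x06, 2);
    h->ctOffset          = read_le(buf + 0x08, 8);
    h->dataOffset        = read_le(buf + 0x10, 8);
    h->indexOffset       = read_le(buf + 0x18, 8);
    h->fieldCount        = (uint16_t)read_le(buf + 0x20, 2);
    h->definedFieldCount = (uint16_t)read_le(buf + 0x22, 2);
    h->sqlOffset         = read_le(buf + 0x24, 8);
    h->summaryOffset     = read_le(buf + 0x2c, 8);
    h->bufSize           = (uint32_t)read_le(buf + 0x34, 4);
    h->extensionOffset   = read_le(buf + 0x38, 8);
    return 0;
}
```

Reading byte by byte rather than casting the buffer to a struct avoids padding and host-endianness surprises, at the cost of a few extra lines.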
The zoom headers follow these header fields. A bigWig file has multiple "zoom" levels, each with its own header.
The zoom header structure holds arrays of dataOffset and indexOffset values, one entry per zoom level.
Each zoom level therefore has a header and an index that points to an R-tree, which in turn points to data blocks. Each R-tree node holds the index entries for that data. For more information, see the bigWig index section.
The file summary information comes after the zoom headers. It includes nBasesCovered, minVal, maxVal, sumData, and sumSquared.
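The zoom headers and the file summary can be decoded in the same style. The sketch below rests on assumptions taken from UCSC's published description of the format rather than from this page: zoom headers start right after the 64-byte fixed header, each one is 24 bytes (4-byte reduction level, 4 reserved bytes, 8-byte dataOffset, 8-byte indexOffset), and the summary is a 40-byte record at summaryOffset. `ZoomLevel`, `TotalSummary`, and the function names are hypothetical, and the file is assumed to be little-endian with IEEE 754 doubles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIXED_HEADER_SIZE 64
#define ZOOM_HEADER_SIZE 24   /* assumed on-disk size of one zoom header */
#define SUMMARY_SIZE 40       /* assumed on-disk size of the total summary */

typedef struct {
    uint32_t reductionLevel;  /* bases summarized per zoom record */
    uint64_t dataOffset;      /* offset of this level's data */
    uint64_t indexOffset;     /* offset of this level's R-tree index */
} ZoomLevel;

typedef struct {
    uint64_t nBasesCovered;
    double minVal, maxVal, sumData, sumSquared;
} TotalSummary;

static uint64_t read_le(const unsigned char *p, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = nbytes; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/* Reinterpret 8 little-endian bytes as a double. */
static double read_le_double(const unsigned char *p) {
    uint64_t bits = read_le(p, 8);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Decode zoom header number `level` (0-based). Returns 0 or -1. */
static int parse_zoom_level(const unsigned char *buf, size_t len,
                            unsigned level, ZoomLevel *z) {
    size_t off = FIXED_HEADER_SIZE + (size_t)level * ZOOM_HEADER_SIZE;
    if (len < off + ZOOM_HEADER_SIZE)
        return -1;
    z->reductionLevel = (uint32_t)read_le(buf + off, 4);
    /* bytes off+4 .. off+7 are reserved */
    z->dataOffset  = read_le(buf + off + 8, 8);
    z->indexOffset = read_le(buf + off + 16, 8);
    return 0;
}

/* Decode the total summary found at summaryOffset. Returns 0 or -1. */
static int parse_summary(const unsigned char *buf, size_t len,
                         uint64_t summaryOffset, TotalSummary *s) {
    if (summaryOffset > len || len - summaryOffset < SUMMARY_SIZE)
        return -1;
    const unsigned char *p = buf + summaryOffset;
    s->nBasesCovered = read_le(p, 8);
    s->minVal     = read_le_double(p + 8);
    s->maxVal     = read_le_double(p + 16);
    s->sumData    = read_le_double(p + 24);
    s->sumSquared = read_le_double(p + 32);
    return 0;
}
```

A caller would typically loop `parse_zoom_level` from 0 to nLevels - 1, using the nLevels and summaryOffset values read from the fixed header.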
bigWig chromosome list
The offset of the chromosome list section from the start of the file is given by the ctOffset field of the bigWig header. The chromosome list section begins with some basic summary fields: itemsPerBlock, keySize, ValueSize, and itemCount.
The chromosome list structure contains an array of null-terminated chromosome name strings. In a bigWig file, the chromosome names are stored sequentially and are preceded by 2 bytes of flags: a 1-byte isLeaf flag and 1 byte of padding. The isLeaf flag is now deprecated in the bigWig file format.
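To make the layout more concrete, here is a minimal sketch that reads the names out of one block of the chromosome list. It assumes a layout this page does not spell out: the block starts with the isLeaf byte and the padding byte, followed by a 2-byte little-endian item count, and each item is a keySize-byte name padded with zeros, followed by a 4-byte chromosome id and a 4-byte chromosome length. `ChromEntry`, `MAX_NAME`, and `parse_chrom_leaf` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME 64  /* hypothetical upper bound on keySize + 1 */

typedef struct {
    char name[MAX_NAME];  /* null-terminated chromosome name */
    uint32_t id;          /* chromosome index used by the data index */
    uint32_t length;      /* chromosome length in bases */
} ChromEntry;

static uint64_t read_le(const unsigned char *p, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = nbytes; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/* Parse one leaf block. Returns the number of entries read, or -1 on error. */
static int parse_chrom_leaf(const unsigned char *node, size_t len,
                            uint32_t keySize, ChromEntry *out, size_t maxOut) {
    if (len < 4 || keySize == 0 || keySize >= MAX_NAME)
        return -1;
    if (node[0] != 1)  /* isLeaf flag; node[1] is padding */
        return -1;
    size_t count = (size_t)read_le(node + 2, 2);
    size_t itemSize = (size_t)keySize + 8;  /* name + id + length */
    if (count > maxOut || (len - 4) / itemSize < count)
        return -1;
    for (size_t i = 0; i < count; i++) {
        const unsigned char *item = node + 4 + i * itemSize;
        memcpy(out[i].name, item, keySize);
        out[i].name[keySize] = '\0';  /* names that fill keySize lack a terminator */
        out[i].id     = (uint32_t)read_le(item + keySize, 4);
        out[i].length = (uint32_t)read_le(item + keySize + 4, 4);
    }
    return (int)count;
}
```

The keySize argument would come from the summary fields at the start of the chromosome list section.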
bigWig index
The index is represented by the bwRTree and bwRTree_Node data structures.
The bwRTree data structure is an R-tree, a structure commonly used to index multi-dimensional information such as geographical coordinates. The bwRTree arranges genome coordinates hierarchically, which provides the information needed to access any part of the bigWig file at random.
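The hierarchical lookup can be sketched as a recursive overlap search. The code below is an in-memory illustration, not the on-disk format or libBigWig's implementation: `RNode`, `REntry`, and `find_blocks` are hypothetical, and it assumes each entry covers a half-open range from (chrIdxStart, baseStart) to (chrIdxEnd, baseEnd), compared first by chromosome index and then by base position.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct RNode RNode;

/* One entry of a node: a bounding range plus either a data-block offset
 * (in a leaf) or a pointer to a child node (in an internal node). */
typedef struct {
    uint32_t chrIdxStart, baseStart;
    uint32_t chrIdxEnd, baseEnd;
    uint64_t dataOffset;   /* used when the owning node is a leaf */
    const RNode *child;    /* used when the owning node is internal */
} REntry;

struct RNode {
    int isLeaf;
    size_t nEntries;
    const REntry *entries;
};

/* Lexicographic "less than" on (chromosome index, base) pairs. */
static int key_less(uint32_t c1, uint32_t b1, uint32_t c2, uint32_t b2) {
    return c1 < c2 || (c1 == c2 && b1 < b2);
}

/* True if the entry's range overlaps the query [start, end) on chromosome chr. */
static int entry_overlaps(const REntry *e, uint32_t chr,
                          uint32_t start, uint32_t end) {
    return key_less(e->chrIdxStart, e->baseStart, chr, end) &&
           key_less(chr, start, e->chrIdxEnd, e->baseEnd);
}

/* Collect offsets of data blocks overlapping the query, descending only into
 * subtrees whose bounding range overlaps it. Returns how many offsets were
 * written to out (at most maxOut). */
static size_t find_blocks(const RNode *n, uint32_t chr, uint32_t start,
                          uint32_t end, uint64_t *out, size_t maxOut) {
    size_t found = 0;
    for (size_t i = 0; i < n->nEntries; i++) {
        const REntry *e = &n->entries[i];
        if (!entry_overlaps(e, chr, start, end))
            continue;
        if (n->isLeaf) {
            if (found < maxOut)
                out[found++] = e->dataOffset;
        } else {
            found += find_blocks(e->child, chr, start, end,
                                 out + found, maxOut - found);
        }
    }
    return found;
}
```

Because subtrees that do not overlap the query are skipped entirely, only a small part of the index has to be visited to locate the data blocks for one region.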

