PCI Subsystem Rework for Xorg v.next Proposal
This wiki entry details a proposal for reworking the PCI handling code in Xorg for the 7.1 release.
Background
It is fairly common knowledge in the X.org developer community that the PCI handling code in Xorg 7.0 (and earlier) is a big, ugly mess. The code is complex, understood by very few developers, and does things that only the kernel should do. In fact, most of the existing code originates from a time before most kernels implemented the required functionality. Since that time, kernels have greatly expanded the functionality provided to user space for probing and accessing PCI devices.
The PCI bus has also changed (e.g., multiple PCI domains, complex PCI-PCI bridges, AGP, and PCI-Express), so the code that probes and accesses PCI devices has had to change with it. Unfortunately, these changes tend to be platform specific. Certain features, such as multiple domains, are only supported by X.org on certain platforms. The kernels on those platforms, by contrast, tend to support these features universally.
Rather than duplicating the efforts of kernel developers, X.org needs to use the interfaces provided by the kernel as much as possible. It is currently unclear whether X.org still needs to support any platforms with kernels that do not export this functionality to user mode.
Required Functionality
There are seven broad pieces of functionality that X.org needs to work with devices on the PCI bus. These are:
- Getting a list of all devices matching some criteria. The criteria are typically either the device class (e.g., find all devices of the class "display" or "multimedia") or the device vendor.
- Reading the device's expansion ROM.
- Accessing the device's IO ports.
- Accessing the device's memory regions (BARs).
- Reading the device's capabilities (e.g., determining whether the device is AGP).
- Power management.
- VgaArbiter. Ultimately this should be in the kernel, but it isn't currently. Devices that decode legacy VGA IO and MEM need to be identified. In addition, their IO/MEM enable/disable bits need to be toggled (along with the VGA forwarding enables on any bridges on the path to the devices) to prevent multiple devices from decoding legacy accesses. Drivers need to disable legacy decoding completely on their hardware, which most modern cards can do, and must be able to inform the arbiter of that fact so that the card is taken out of the picture. This must be done carefully, since bad things will happen if the card generates an interrupt while the arbiter has disabled MEM decoding on it. The arbiter needs either to forbid cards from using interrupts while they are set to decode legacy space (and thus can be disabled at any time), or to provide a driver callback for disabling IRQ emission on a card when the arbiter disables it. IO and MEM can't be treated separately since there is only one VGA forward bit on PCI-to-PCI bridges.
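To make the arbiter's register-level work more concrete, the following sketch shows the bits involved: the I/O and memory enables in a device's PCI command register and the VGA Enable bit in a bridge's Bridge Control register. The bit positions are taken from the PCI specification; the macro and function names are hypothetical and not part of any existing interface. Both decode enables are switched as a pair, matching the single forwarding bit on bridges.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the PCI specification; the macro names are illustrative. */
#define CMD_IO_ENABLE      0x0001u  /* command register (offset 0x04), bit 0 */
#define CMD_MEM_ENABLE     0x0002u  /* command register (offset 0x04), bit 1 */
#define BRIDGE_VGA_ENABLE  0x0008u  /* bridge control (offset 0x3E), bit 3 */

/* Return the command register value with IO and MEM decoding switched on
 * or off.  The two bits are changed together because a bridge has only
 * one VGA forwarding bit covering both. */
uint16_t vga_set_device_decode(uint16_t command, int enable)
{
    const uint16_t both = CMD_IO_ENABLE | CMD_MEM_ENABLE;

    assert(enable == 0 || enable == 1);
    return enable ? (uint16_t)(command | both)
                  : (uint16_t)(command & ~both);
}

/* Return the bridge control value with VGA forwarding switched on or off. */
uint16_t vga_set_bridge_forward(uint16_t bridge_control, int enable)
{
    assert(enable == 0 || enable == 1);
    return enable ? (uint16_t)(bridge_control | BRIDGE_VGA_ENABLE)
                  : (uint16_t)(bridge_control & ~BRIDGE_VGA_ENABLE);
}
```

An arbiter built on these helpers would read the current register value, compute the new one, and write it back for the device and for each bridge on the path to it. Clearing the MEM enable stops all memory decoding on the card, which is why the interrupt concern applies.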
In the best case scenario, nearly all of this functionality is trivially provided by Linux's sysfs interface. A certain amount of this functionality is also provided by the libpci.a library in the pciutils package. The missing functionality and license issues (libpci.a is GPL) prevent use of libpci.a from being a viable option.
Proposed Implementation
The current proposal is to implement a new library that implements the required functionality in a generic way. The existing code for accessing PCI devices would then be removed, in its entirety, from the X-server and the drivers. At that time the X-server and drivers would be ported to the new interface. The remainder of this section describes the interface to the proposed new library.
The interface consists of a set of initialization / cleanup routines and a single primary data type. The overall structure is intentionally similar to that of libpci.a, but there are some significant differences. The initialization / cleanup routines are roughly analogous to pci_access. pci_device is roughly analogous to pci_dev.
Initialization / Cleanup
Access to the PCI system is obtained by calling pci_system_init. This function returns either zero on success or an errno value on failure. It initializes global data that is private to the library.
When access to the PCI system is no longer needed, pci_system_cleanup is called. This destroys all of the internal data used by the library and all of the structures created by the library for the application. That is, all pci_device, pci_device_iterator, pci_agp_info, etc. are destroyed by calling pci_system_cleanup.
Device Iteration
Lists of PCI devices are obtained by creating a pci_device_iterator structure. This structure is created by calling either pci_slot_match_iterator_create or pci_id_match_iterator_create.
    struct pci_slot_match {
        /*
         * Device slot matching controls
         *
         * Control the search based on the domain, bus, slot, and function of
         * the device. Setting any of these fields to PCI_MATCH_ANY will cause
         * the field to not be used in the comparison.
         */
        uint32_t domain;
        uint32_t bus;
        uint32_t dev;
        uint32_t func;

        intptr_t match_data;
    };

    struct pci_device_iterator *pci_slot_match_iterator_create(
        const struct pci_slot_match *match);

    struct pci_id_match {
        /*
         * Device / vendor matching controls
         *
         * Control the search based on the device, vendor, subdevice, or subvendor
         * IDs. Setting any of these fields to PCI_MATCH_ANY will cause the
         * field to not be used in the comparison.
         */
        uint32_t vendor_id;
        uint32_t device_id;
        uint32_t subvendor_id;
        uint32_t subdevice_id;

        /*
         * Device class matching controls
         */
        uint32_t device_class;
        uint32_t device_class_mask;

        intptr_t match_data;
    };

    struct pci_device_iterator *pci_id_match_iterator_create(
        const struct pci_id_match *match);
This allows devices to be iterated either by bus location, by vendor, by class, or by function. These interfaces roughly match similar interfaces available within the Linux kernel.
If the match parameter to either function is NULL, all devices will be matched.
Devices are iterated by calling pci_device_next with the pci_device_iterator. After the last device has been returned, the next call to pci_device_next will return NULL.
    struct pci_device *pci_device_next(struct pci_device_iterator *iter);
When an iterator will not be used any further, it must be destroyed using pci_iterator_destroy.
    void pci_iterator_destroy(struct pci_device_iterator *iter);
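The create / next / destroy pattern can be sketched end to end as follows. So that the example compiles without the library, the three functions are given small stand-in bodies that walk a made-up device table; those bodies, the iterator layout, the trimmed structures, and the count_display_devices helper are all hypothetical. The value of PCI_MATCH_ANY and the class encoding (base class in bits 23:16, so "display" is 0x03 in that byte) are assumptions, since the proposal does not spell them out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed value; the proposal names PCI_MATCH_ANY but does not define it. */
#define PCI_MATCH_ANY (~0U)

/* Trimmed copies of the proposed types: only the fields used here. */
struct pci_device {
    uint16_t vendor_id;
    uint16_t device_id;
    uint32_t device_class;
};

struct pci_id_match {
    uint32_t vendor_id;
    uint32_t device_id;
    uint32_t device_class;
    uint32_t device_class_mask;
};

/* Hypothetical layout; the real iterator would be private to the library. */
struct pci_device_iterator {
    struct pci_id_match match;
    int match_all;
    size_t next;
};

/* Made-up sample data standing in for what a backend reads from the kernel. */
static struct pci_device fake_bus[] = {
    { 0x8086, 0x1237, 0x060000 },  /* a bridge-class device */
    { 0x102b, 0x0525, 0x030000 },  /* a display-class device */
    { 0x10ec, 0x8139, 0x020000 },  /* a network-class device */
};

struct pci_device_iterator *
pci_id_match_iterator_create(const struct pci_id_match *match)
{
    struct pci_device_iterator *iter = calloc(1, sizeof(*iter));

    if (iter == NULL)
        return NULL;
    if (match != NULL)
        iter->match = *match;
    else
        iter->match_all = 1;  /* NULL match: every device matches */
    return iter;
}

static int field_matches(uint32_t want, uint32_t have)
{
    return want == PCI_MATCH_ANY || want == have;
}

struct pci_device *pci_device_next(struct pci_device_iterator *iter)
{
    assert(iter != NULL);
    while (iter->next < sizeof(fake_bus) / sizeof(fake_bus[0])) {
        struct pci_device *d = &fake_bus[iter->next++];
        const struct pci_id_match *m = &iter->match;

        if (iter->match_all ||
            (field_matches(m->vendor_id, d->vendor_id) &&
             field_matches(m->device_id, d->device_id) &&
             (d->device_class & m->device_class_mask) == m->device_class))
            return d;
    }
    return NULL;  /* no more devices */
}

void pci_iterator_destroy(struct pci_device_iterator *iter)
{
    free(iter);
}

/* Caller-side pattern: count every display-class device. */
size_t count_display_devices(void)
{
    const struct pci_id_match display = {
        PCI_MATCH_ANY, PCI_MATCH_ANY, 0x030000, 0xff0000
    };
    struct pci_device_iterator *iter = pci_id_match_iterator_create(&display);
    size_t count = 0;

    if (iter == NULL)
        return 0;
    while (pci_device_next(iter) != NULL)
        count++;
    pci_iterator_destroy(iter);
    return count;
}
```

In a real program the loop would sit between a successful pci_system_init call and the final pci_system_cleanup, and the loop body would inspect each returned pci_device rather than just counting.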
pci_device
The pci_device structure contains all of the expected fields and is very similar to libpci.a's pci_dev structure. Some fields that are important to X (e.g., subvendor_id) have been added, and some fields that are unnecessary (e.g., rom_base_addr) have been removed.
    struct pci_mem_region {
        void * memory;
        pciaddr_t bus_addr;
        pciaddr_t base_addr;
        pciaddr_t size;
    };

    struct pci_device {
        uint16_t domain;
        uint8_t bus;
        uint8_t dev;
        uint8_t func;

        uint16_t vendor_id;
        uint16_t device_id;
        uint16_t subvendor_id;
        uint16_t subdevice_id;

        uint32_t device_class;

        struct pci_mem_region regions[6];
        pciaddr_t rom_size;

        int irq;

        void * user_data;
    };
Once a pointer to a device has been obtained, memory regions of the device can be mapped via pci_device_map_region. Mapped regions can be unmapped with pci_device_unmap_region. Once a region is mapped, it can be accessed via the pci_mem_region::memory pointer.
    int pci_device_map_region( struct pci_device * dev, unsigned region,
                               int write_enable );

    int pci_device_unmap_region( struct pci_device * dev, unsigned region );
ISSUE: Should special routines for reading / writing MMIO regions ala xf86WriteMmio8 be added?
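If such routines were added, they could be thin wrappers around volatile accesses to the mapped pointer. The sketch below shows one possible shape. The pci_mmio_* names are hypothetical, and a real implementation would likely also need platform-specific memory barriers and byte-order handling, both omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; `base` would be a mapped pci_mem_region::memory.
 * The volatile qualifier keeps the compiler from caching or eliding the
 * access, which matters for device registers. */
void pci_mmio_write8(void *base, size_t offset, uint8_t val)
{
    assert(base != NULL);
    *(volatile uint8_t *)((char *)base + offset) = val;
}

uint8_t pci_mmio_read8(void *base, size_t offset)
{
    assert(base != NULL);
    return *(volatile uint8_t *)((char *)base + offset);
}

void pci_mmio_write32(void *base, size_t offset, uint32_t val)
{
    assert(base != NULL && offset % 4 == 0);  /* require natural alignment */
    *(volatile uint32_t *)((char *)base + offset) = val;
}

uint32_t pci_mmio_read32(void *base, size_t offset)
{
    assert(base != NULL && offset % 4 == 0);
    return *(volatile uint32_t *)((char *)base + offset);
}
```

Keeping the accessors in the library would give one place to add barriers or byte swapping per platform, at the cost of a function call (or an inline header) on every register access.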
The device's expansion ROM is treated specially. Rather than mapping the ROM and reading it, a special function, pci_device_read_rom, is provided. The supplied buffer must be at least pci_device::rom_size bytes.
    int pci_device_read_rom( struct pci_device * dev, void * buffer );
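A caller would typically allocate rom_size bytes, read the ROM, and then sanity-check the image before using it. PCI expansion ROM images begin with the signature bytes 0x55 0xAA, and the helper below checks for them. The helper name is hypothetical, and the call into the proposed API appears only in a comment so that the sketch compiles without the library. The comment also assumes that pci_device_read_rom returns zero on success, which the proposal does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does `rom` start with the PCI expansion ROM
 * signature (0x55, 0xAA)?  `size` is the number of valid bytes. */
int rom_has_signature(const uint8_t *rom, size_t size)
{
    assert(rom != NULL);
    return size >= 2 && rom[0] == 0x55 && rom[1] == 0xaa;
}

/* Intended use with the proposed API (not compiled here; assumes a
 * zero return means success):
 *
 *     uint8_t *buf = malloc(dev->rom_size);
 *     if (buf != NULL && pci_device_read_rom(dev, buf) == 0 &&
 *         rom_has_signature(buf, dev->rom_size)) {
 *         ... use the ROM image ...
 *     }
 *     free(buf);
 */
```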
Device configuration and capability data can be accessed via traditional, libpci.a-style read and write routines.
    int pci_device_cfg_read_u8    ( struct pci_device * dev, unsigned offset, uint8_t * val );
    int pci_device_cfg_read_u16   ( struct pci_device * dev, unsigned offset, uint16_t * val );
    int pci_device_cfg_read_u32   ( struct pci_device * dev, unsigned offset, uint32_t * val );
    int pci_device_cfg_read_block ( struct pci_device * dev, unsigned offset, void * val,
                                    unsigned length );

    int pci_device_cfg_write_u8   ( struct pci_device * dev, unsigned offset, uint8_t val );
    int pci_device_cfg_write_u16  ( struct pci_device * dev, unsigned offset, uint16_t val );
    int pci_device_cfg_write_u32  ( struct pci_device * dev, unsigned offset, uint32_t val );
    int pci_device_cfg_write_block( struct pci_device * dev, unsigned offset, const void * val,
                                    unsigned length );
In addition, specific routines and data types exist for common capabilities that are important to X. The pci_device_get_agp_info function parses the device's configuration header and returns a fully populated pci_agp_info structure. If the device does not have an AGP capability entry, NULL is returned.
    struct pci_agp_info {
        unsigned config_offset;

        uint8_t major_version;
        uint8_t minor_version;

        uint8_t rates;

        uint8_t fast_writes:1;
        uint8_t addr64:1;
        uint8_t htrans:1;
        uint8_t gart64:1;
        uint8_t coherent:1;
        uint8_t sideband:1;
        uint8_t isochronus:1;

        uint8_t async_req_size;
        uint8_t calibration_cycle_timing;
        uint8_t max_requests;
    };

    const struct pci_agp_info * pci_device_get_agp_info( struct pci_device * dev );
In the future, similar routines may be added for other common device capabilities (e.g., power management, PCI-Express, etc.).
Status
Core X-server Status
The core X-server portion of the PCI-rework is, essentially, finished. It lives in the pci-rework branch.
libpciaccess Status
OS | Status | Point of Contact |
Linux | Working with sysfs | idr |
FreeBSD | Working (7.x+) | anholt |
NetBSD | ? | - |
OpenBSD | Working | herrb |
Solaris | Working | edward.shu@sun.com |
AIX | ? | - |
Driver Status
Driver | Status | Point of Contact |
apm | Not ported | - |
ark | Not ported | - |
ast | Not ported | - |
ati/ati | Bugzilla | fufutos |
ati/atimisc | Bugzilla | fufutos |
ati/r128 | Not ported | - |
ati/radeon | Not ported | - |
chips | Not ported | - |
cirrus | Not ported | - |
cyrix | Not ported | - |
dummy | Not ported | - |
fbdev | Trunk | idr |
glide | Not ported | - |
glint | Not ported | - |
i128 | Not ported | - |
i740 | Not ported | - |
impact | Not ported | - |
imstt | Not ported | - |
intel | Not ported | - |
mga | Trunk | idr |
neomagic | Not ported | - |
newport | Not ported | - |
nsc | Not ported | - |
nv | Not ported | - |
rendition | Trunk | idr |
s3 | Not ported | - |
s3virge | Not ported | - |
savage | pci-rework branch | idr |
siliconmotion | Not ported | - |
sis | Not ported | - |
sisusb | Not ported | - |
sunbw2 | Not ported | - |
suncg14 | Not ported | - |
suncg3 | Not ported | - |
suncg6 | Not ported | - |
sunffb | Not ported | - |
sunleo | Not ported | - |
suntcx | Not ported | - |
tdfx | In progress | idr |
tga | Not ported | - |
trident | Not ported | - |
tseng | Not ported | - |
v4l | Not ported | - |
vesa | Trunk | idr |
vga | Not ported | - |
via | Not ported | - |
vmware | Not ported | - |
voodoo | Not ported | - |
wsfb | Not ported | - |
xgi | Not ported | - |
xgixp | Not ported | - |
Status key:
- Trunk: Driver is ported to the new interface, and the port resides in the driver's trunk.
- pci-rework branch: Driver is ported to the new interface, and the port resides in the driver's pci-rework branch.
- In progress: The driver is being ported.
- Not ported: Work has not yet begun on porting the driver to the new interface.