The Sun Host to PCI Bridge (HPB) is a PCI-based I/O subsystem
that connects the Ultra Port Architecture (UPA) system bus to the PCI buses. The HPB includes:
- Two physically separate PCI bus segments, each with full
master and target support
- Two separate streaming caches, one for each bus segment,
for accelerating PCI DMA activity
- An I/O Memory Management Unit (IOMMU) for mapping the Direct
Memory Access (DMA) addresses for both buses
- An interrupt dispatch unit for delivering interrupt
requests to the CPU module
This is a block diagram of a Sun Host to PCI Bridge (HPB).
Only the functional blocks pertaining to PCI are shown.
A summary of the characteristics of the HPB
is presented in the following table.
| Characteristics | Bus A | Bus B |
| Data Transfer Width | 64-bit | 64-bit |
| Clock Frequency | 33/66 MHz capable | 33 MHz capable |
| Burst Size | 64 bytes | 64 bytes |
| Number of Read Buffers | One 64-byte for DMA | One 64-byte for DMA |
| Number of Write Buffers | Two 64-byte for DMA; One 64-byte for PIO | Two 64-byte for DMA; One 64-byte for PIO |
| Dual Address Cycles | Bypass DMA only | Bypass DMA only |
| Fast Back-to-Back Device | In target mode only | In target mode only |
| Byte Twisting | Yes for DMA | Yes for DMA |
| Interrupt Latency | 6 cycles inside the IDU | 6 cycles inside the IDU |
| Cache Line Size | 64 bytes | 64 bytes |
| Number of Cache Lines | 16 | 16 |
| Disconnected on Cache Line | Target mode (DMA) only | Target mode (DMA) only |
| Configuration Mechanism (per section 3.7.4 of PCI 2.1 specification) | Configuration Mechanism #2 | Configuration Mechanism #2 |
| Configuration Space | 256 bytes, starting at physical address 1FE.0101.0000 | 256 bytes, starting at physical address 1FE.0100.0000 |
| I/O Space | 8 Kbytes. At physical address 1FE.0200.0000 | 8 Kbytes. At physical address 1FE.0201.0000 |
| Memory Space | 2 Gbytes. At physical address 1FF.0000.0000 | 2 Gbytes. At physical address 1FF.8000.0000 |
| Configuration Cycles | Master mode only | Master mode only |
| Special Cycle | Master mode only | Master mode only |
| Arbitrary Byte Enables | Consistent DMA only | Consistent DMA only |
| Peer-to-Peer DMA | On a single segment | On a single segment |
| Interrupt | 4 interrupt lines shared among PCI devices | 4 interrupt lines shared among PCI devices |
| IOMMU Page Size | 8 Kbytes and 64 Kbytes. Only 8 Kbytes page size used in the STC | 8 Kbytes and 64 Kbytes. Only 8 Kbytes page size used in the STC |
| DVMA Addressing Space (set by PCI nexus driver) | 64 Mbytes in Solaris 2.5.1. May change in later release of Solaris | 64 Mbytes in Solaris 2.5.1. May change in later release of Solaris |
| PIO Read Size | 1, 2, 4, 8, 16, 64 bytes for memory cycles; 1, 2, and 4 bytes for I/O or configuration cycles | 1, 2, 4, 8, 16, 64 bytes for memory cycles; 1, 2, and 4 bytes for I/O or configuration cycles |
| PIO Write Size | 0-16 arbitrary byte enables and 64-byte aligned for memory cycles; 0-4 arbitrary byte enables for I/O or configuration cycles | 0-16 arbitrary byte enables and 64-byte aligned for memory cycles; 0-4 arbitrary byte enables for I/O or configuration cycles |
| Cache-line Wrap Addressing Mode | Not Supported | Not Supported |
| Local (on-PCI) Cache | Not Supported | Not Supported |
| Exclusive Access to Main Memory | LOCK# signal not connected | LOCK# signal not connected |
| Address/Data Stepping | Not Supported | Not Supported |
| DOS Compatibility Hole | Not Supported | Not Supported |
| External Arbiter | Not Supported | Not Supported |
| Subtractive Decode | Not Supported | Not Supported |
The PCI Bus Module (PBM) implements a complete PCI master and
target interface. The HPB contains two nearly identical copies of
this module. Each module implements all of the required host
bridge functions specified in the PCI Revision 2.1
specification, except the interrupt logic. The interrupt function
is implemented in the Interrupt Dispatch Unit (see "Interrupt Dispatch Unit"
for details).
Each PBM also handles the
big-to-little-endian byte twisting required for correct operation
of Programmed I/O (PIO) and Direct Memory Access (DMA) data
paths. The byte at address 0 on the big-endian side is directly
wired to the byte at address 0 on the little-endian side (for
example, bits 63:56 map to bits 7:0, bits 55:48 map to bits 15:8,
and so forth). Because byte lanes at the same address are
connected, DMA of byte streams works correctly without further
intervention.
For PIO accesses larger than a byte, byte
twisting is not sufficient. For example,
if the 32-bit value 0x12345678 is written to a 32-bit register on
a PCI device, the PCI device sees the value 0x78563412 instead.
The SPARC-V9 Architecture Manual defines special support for
little-endian access to correct this discrepancy. By either
marking the page containing the PCI register as little-endian in
the processor's MMU, or by using one of the little-endian Address
Space Identifiers (ASIs),
the CPU alters its ordering of the bytes so that the PCI device
correctly sees 0x12345678. Solaris 2.5 software provides a set of
DDI functions to address the endian issue. Writing Portable
DDI-Compliant PCI Drivers (a Sun white paper) provides a list of
DDI functions for writing endian-neutral device drivers.
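The discrepancy described above can be reproduced in a small user-space model. The following is a minimal sketch, not driver code: the helper names (`store_be32`, `store_le32`, `device_sees32`, `byte_twist_self_check`) are hypothetical and are not part of the DDI. It assumes only what the text states, namely that the bridge keeps each byte at the same address and that the PCI device assembles its register in little-endian order.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian CPU store: most significant byte at the lowest address. */
static void store_be32(uint8_t bytes[4], uint32_t v)
{
    bytes[0] = (uint8_t)(v >> 24);
    bytes[1] = (uint8_t)(v >> 16);
    bytes[2] = (uint8_t)(v >> 8);
    bytes[3] = (uint8_t)v;
}

/* Little-endian store, as produced through a little-endian page or ASI. */
static void store_le32(uint8_t bytes[4], uint32_t v)
{
    bytes[0] = (uint8_t)v;
    bytes[1] = (uint8_t)(v >> 8);
    bytes[2] = (uint8_t)(v >> 16);
    bytes[3] = (uint8_t)(v >> 24);
}

/*
 * The PCI device assembles its register little-endian (byte 0 is the
 * least significant). Byte twisting keeps every byte at its address,
 * so nothing is reordered between the store and this read.
 */
static uint32_t device_sees32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Mirrors the example in the text; returns 1 when both cases hold. */
static int byte_twist_self_check(void)
{
    uint8_t lanes[4];

    store_be32(lanes, 0x12345678u);          /* ordinary big-endian store */
    assert(device_sees32(lanes) == 0x78563412u);

    store_le32(lanes, 0x12345678u);          /* little-endian access */
    assert(device_sees32(lanes) == 0x12345678u);
    return 1;
}
```

In a real driver this reordering would normally be left to the DDI access functions rather than coded by hand.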
As a target on the PCI bus, the PBM only
responds to PCI memory space commands. All other PCI commands are
ignored. The following table describes the PCI commands that the
PBM is able to generate as a master, as well as how it responds
as a target.
The PBM supports the following PCI features:
- 64-bit bus extension (as a target only for DMA)
- 64-bit addressing (Dual Address Cycle) for IOMMU bypass
- Required adapter and host-bridge configuration registers
- Fast back-to-back cycles as a target
- Arbitrary byte enables (consistent mode only)
- Ability to generate memory, I/O, and configuration read
and write cycles as a master
- Ability to generate special cycles as a master
- Ability to respond to memory space accesses as a target
- Peer-to-peer DMA on a single segment
The PBM does not support the following PCI features:
- Exclusive access to main memory (LOCK# signal not connected)
- Peer-to-peer DMA between different PCI bus segments
- Local (on-PCI) cache support
- External arbiter
- Cache-line wrap addressing mode
- Fast back-to-back cycles as PIO master
- Address/data stepping
- Subtractive decode
- Any DOS compatibility feature
The Streaming Cache (STC) is used to accelerate PCI DMA to and
from system memory. For DMA reads, the STC speculatively
prefetches 64-byte cache lines. For DMA writes, the STC buffers
up 64-byte lines before sending them to the UPA bus. The STC also acts
as a local cache for PCI read accesses to the same block. There
are two separate STC blocks in the HPB, one associated with each
PBM. Each STC contains storage for sixteen 64-byte lines, which
are allocated during DMA on a least-recently-used basis.
The STC resides outside of the coherent
memory domain; therefore, it must rely on software to maintain
data correctness to the PCI devices. The STC includes registers
to invalidate and to flush the STC entries for maintaining memory
coherency. Device drivers call ddi_dma_sync(9F) to write to the
STC registers after the DMA transfer is completed.
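The consequence of the STC sitting outside the coherent domain can be illustrated with a toy, single-line model. This is a sketch under stated assumptions: the type `stc_line_t` and the functions `stc_dma_read` and `stc_invalidate` are hypothetical stand-ins and do not describe the actual STC register interface. The invalidate step plays the role of what a driver's synchronization call would trigger.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STC_LINE_SIZE 64

/* One cached line; the real STC holds sixteen of these per bus. */
typedef struct {
    int     valid;
    uint8_t data[STC_LINE_SIZE];
} stc_line_t;

/*
 * DMA read through the model cache: fill the line from memory on a
 * miss, otherwise return the cached copy without looking at memory.
 */
static uint8_t stc_dma_read(stc_line_t *line,
                            const uint8_t mem[STC_LINE_SIZE],
                            unsigned off)
{
    assert(off < STC_LINE_SIZE);
    if (!line->valid) {
        memcpy(line->data, mem, STC_LINE_SIZE);
        line->valid = 1;
    }
    return line->data[off];
}

/* Hypothetical stand-in for the invalidate a sync would perform. */
static void stc_invalidate(stc_line_t *line)
{
    line->valid = 0;
}
```

In this model, a device read issued after the CPU has changed memory keeps returning the old byte until `stc_invalidate` is called, which is why software must synchronize explicitly.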
In the Sun Ultra architecture, interrupts to a processor are
sent as packets on the UPA bus. The Interrupt Dispatch Unit (IDU)
is the main resource for generating such packets. The IDU accepts
interrupt requests from the UPA, PCI buses, graphics slots, and
internal HPB sources, and dispatches interrupt packets to the
UPA.
The PCI devices on the slots of each PCI
bus share the four PCI interrupt lines (INTA#, INTB#, INTC#,
INTD#). An interrupt concentrator encodes the interrupt sources
onto a 6-bit value, the Interrupt Number Offset (INO).
The IDU includes the INO in the packet to the UPA.
Before the IDU sends any PCI-related
interrupt to the UPA, it checks with the appropriate PBM to see
if it has any posted consistent DMA write data (see "Consistent vs. Streaming DMA"
for details). If so, the IDU waits until the PBM indicates that
the write data has been sent to the UPA.
The IOMMU performs virtual-to-physical address translation
during DMA. It maps a 32-bit PCI virtual address to a 41-bit system
bus physical address. The IOMMU consists of a 16-entry
Translation Lookaside Buffer (TLB), which caches recently used
translations, and a Translation Storage Buffer (TSB), which
is a software-managed data structure. The number of TSB entries
is software configurable and is typically set to 8K, which gives
a total DMA space of 64 Mbytes (8K * IOMMU_PAGESIZE,
see "DVMA Resources and IOMMU Translations" for details).
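The sizing arithmetic and the table lookup can be sketched as follows. This is an illustrative model only: the entry layout `tsb_entry_t`, the function names, and the `dvma_base` parameter (the start of the DVMA window) are assumptions made for the example, since the text does not give the real TSB entry format. The 16-entry TLB is omitted; the sketch shows only the tablewalk result for 8-Kbyte pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IOMMU_PAGESIZE  8192u   /* 8-Kbyte page, per the text */
#define IOMMU_PAGESHIFT 13      /* log2(IOMMU_PAGESIZE) */
#define TSB_ENTRIES     8192u   /* "typically set to 8K" */

/* Hypothetical, simplified TSB entry (real layout not given here). */
typedef struct {
    int      valid;
    uint64_t phys_page;         /* physical page number */
} tsb_entry_t;

/* Total DMA space covered by the TSB: entries * page size. */
static uint64_t dvma_space_bytes(uint32_t entries, uint32_t pagesize)
{
    return (uint64_t)entries * pagesize;
}

/*
 * Translate a 32-bit PCI virtual address to a physical address.
 * Returns 0 on success, -1 when the address has no valid mapping.
 */
static int iommu_translate(const tsb_entry_t *tsb, uint32_t entries,
                           uint32_t dvma_base, uint32_t va, uint64_t *pa)
{
    uint32_t off, idx;

    assert(tsb != NULL && pa != NULL);
    if (va < dvma_base)
        return -1;
    off = va - dvma_base;
    idx = off >> IOMMU_PAGESHIFT;           /* one TSB entry per page */
    if (idx >= entries || !tsb[idx].valid)
        return -1;
    *pa = (tsb[idx].phys_page << IOMMU_PAGESHIFT)
        | (off & (IOMMU_PAGESIZE - 1u));    /* keep the page offset */
    return 0;
}
```

With the typical values, `dvma_space_bytes(TSB_ENTRIES, IOMMU_PAGESIZE)` evaluates to 8192 * 8192 bytes, which is the 64 Mbytes quoted in the text.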
The DMA merge buffer is used for servicing DMA writes of less
than 64 bytes (subline writes). This is required on Sun4u, where
memory is accessible only in cached 64-byte quantities. The merge buffer
contains a 64-byte buffer for storing the partial line. There are
valid bits for each byte in the buffer, so it is able to handle
completely arbitrary byte enables on a consistent write from a
PCI device.
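The per-byte valid bits can be modeled with a 64-bit mask, one bit per byte of the line. The sketch below is a software analogy, not a description of the hardware: the names `merge_buf_t`, `merge_buf_write`, and `merge_buf_apply` are hypothetical, and the final step assumes that the partial line is combined with the existing 64-byte memory line, which the text implies but does not spell out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MERGE_LINE_SIZE 64

typedef struct {
    uint8_t  data[MERGE_LINE_SIZE];
    uint64_t valid;     /* bit i set => data[i] holds DMA write data */
} merge_buf_t;

static void merge_buf_reset(merge_buf_t *mb)
{
    memset(mb, 0, sizeof *mb);
}

/* Record one byte of a subline write at offset off within the line. */
static void merge_buf_write(merge_buf_t *mb, unsigned off, uint8_t byte)
{
    assert(off < MERGE_LINE_SIZE);
    mb->data[off] = byte;
    mb->valid |= (uint64_t)1 << off;
}

/*
 * Merge the buffered bytes into a full 64-byte line (assumed to have
 * been read from memory): only bytes marked valid are overwritten.
 */
static void merge_buf_apply(const merge_buf_t *mb,
                            uint8_t line[MERGE_LINE_SIZE])
{
    unsigned i;

    for (i = 0; i < MERGE_LINE_SIZE; i++) {
        if (mb->valid & ((uint64_t)1 << i))
            line[i] = mb->data[i];
    }
}
```

Because validity is tracked per byte, any pattern of enabled bytes, contiguous or not, merges correctly in this model.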
Each PBM implements all of the required host bridge functions
for PCI, and also acts as the central resource for arbitration,
reset, and system error (SERR#) monitoring. The PBM handles the
timing of PIO requests to the PCI bus, which includes target
disconnects, retries, and various error conditions during the
PIO.
The PBM is capable of generating aligned
PIO reads of 1, 2, 4, 8, 16, and 64 bytes to memory space and 1-,
2-, and 4-byte accesses to I/O and configuration space. It is
also capable of generating PIO writes with arbitrary byte enables
with 0-16 bytes transferred to memory space, and 0-4 bytes
transferred to I/O or configuration space. In addition, 64-byte
PIO writes with all bytes enabled can be generated to memory
space.
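The PIO read sizes listed above can be captured in a small validity check. This is a sketch for illustration: the enum `pio_space_t` and the function `pio_read_ok` are hypothetical names, and "aligned" is assumed here to mean natural alignment (the address is a multiple of the transfer size).

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    PIO_SPACE_MEMORY,
    PIO_SPACE_IO,
    PIO_SPACE_CONFIG
} pio_space_t;

/*
 * Returns 1 if a PIO read of size bytes at addr is one of the sizes
 * the text lists for the given space and is naturally aligned
 * (an assumption of this sketch); returns 0 otherwise.
 */
static int pio_read_ok(pio_space_t space, uint64_t addr, unsigned size)
{
    int size_ok;

    assert(space == PIO_SPACE_MEMORY || space == PIO_SPACE_IO ||
           space == PIO_SPACE_CONFIG);

    switch (size) {
    case 1: case 2: case 4:
        size_ok = 1;                            /* valid in every space */
        break;
    case 8: case 16: case 64:
        size_ok = (space == PIO_SPACE_MEMORY);  /* memory space only */
        break;
    default:
        size_ok = 0;
        break;
    }
    /* size_ok is 0 when size is 0, so the modulo is never reached. */
    return size_ok && (addr % size == 0);
}
```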
PIO write data is posted to the 64-byte
write buffer to be dispatched by the PBM, which handles target
retries and disconnects transparently to other control blocks.
PIO read data is loaded directly from the PCI bus to the HPB's
internal bus in 64-bit quantities.
As a target, the PBM only responds to PCI memory space
commands (MR, MW, MRL, MRM,
MWI) and Dual Address Cycle. All other PCI commands
are ignored. Typically, the transaction address of a memory
command is treated as a virtual address, and translated to a
physical address by the IOMMU. These transactions are referred to
as DMA transactions. The PBM supports both consistent and
streaming mapped DMA.
DMA writes are posted to dual 64-byte
buffers. One buffer can drain to the HPB's internal data bus
while the other is filling from the PCI bus. Byte enables are
stored with the data. DMA reads are single buffered in the PBM to
insulate the internal data bus from disconnects and pauses due to
master wait states, master latency time-out, slower bus speed,
and narrower bus width.
When a DMA burst transfer attempts to go
past a cache line (64 bytes) boundary, the HPB generates a
disconnect. This should cause the master device to attempt the
transaction again beginning at the address of the next
untransferred data.
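The effect of the disconnect on a long burst can be worked through with a short calculation. The sketch below is a model of the behavior described above, with hypothetical function names; it assumes the master resumes at the next untransferred byte after each disconnect and ignores retries and wait states.

```c
#include <assert.h>
#include <stdint.h>

#define HPB_CACHE_LINE 64u

/*
 * Number of bytes accepted starting at addr before the bridge
 * disconnects at the next 64-byte boundary, capped at remaining.
 */
static uint32_t bytes_until_disconnect(uint32_t addr, uint32_t remaining)
{
    uint32_t to_boundary = HPB_CACHE_LINE - (addr % HPB_CACHE_LINE);

    return remaining < to_boundary ? remaining : to_boundary;
}

/*
 * Number of PCI transactions a master needs to move len bytes
 * starting at addr, restarting after every disconnect.
 */
static unsigned count_transactions(uint32_t addr, uint32_t len)
{
    unsigned n = 0;

    while (len > 0) {
        uint32_t chunk = bytes_until_disconnect(addr, len);

        assert(chunk > 0);      /* guarantees forward progress */
        addr += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

For example, a 100-byte transfer starting at address 0x1030 is split into pieces of 16, 64, and 20 bytes, so `count_transactions(0x1030, 100)` yields 3. A transfer that starts on a 64-byte boundary and fits in one line completes in a single transaction.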
Under certain conditions, the PBM issues a
retry command for an incoming PCI transaction. These conditions
include:
- The PBM requests the IOMMU to do a TSB tablewalk to get
the mapping for this transaction.
- The STC indicates that it is initiating a request to get
the desired read data.
- Due to congestion, there are not enough buffers to accept
a transaction.
Arbitrary byte enables are supported for consistent DMA
transactions. In streaming mode, all data must be contiguous
bytes within a single PCI transaction. Gaps between the end
address of one PCI transaction and the start address of the next
are allowed in streaming DMA transactions.