A BUS ARCHITECTURE FOR SYSTEM-ON-CHIP DESIGNS

Bill Cordan

PALMCHIP CORPORATION
Colorado Design Center
Loveland, CO

Abstract
This paper presents the issues confronted when integrating system-on-chip (SOC) designs and offers a solution through a detailed description of the CoreFrame® architecture. CoreFrame will dramatically reduce system design and verification efforts while enhancing the reusability and customizability of system-on-chip product developments. The CoreFrame on-chip bus architecture is defined, along with examples that illustrate how a design-friendly bus standard enables the mix and match of reusable cores without sacrificing performance.
 

Introduction

System-on-chip developments pose many challenges that are complicated by the need to accelerate the design process in order to hit ever shorter market windows. One strategy to reduce design time, and hence meet time-to-market requirements, is the use of reusable cores, or Intellectual Property (IP). IP brings the base functionality to a design, freeing up resources for ground-up design of value-added functions or embedded firmware that differentiate one’s product in the marketplace. To use system IP effectively, a design methodology that includes hardware/software co-simulation techniques and a versatile, design-friendly on-chip bus system is required. Existing bus standards have typically migrated from earlier embedded-system board applications, and a look at them leads to the conclusion that they have had to make unacceptable compromises in terms of performance, efficiency or synthesis friendliness. At PALMCHIP, we started with a clean-sheet approach and designed a bus system called CoreFrame®, which has been used successfully to create a variety of system-on-chip designs in a relatively short time.

On-Chip Bus Design Issues

Most existing buses, such as PCI and ISA, were designed as system-level buses to connect discrete devices on a PCB substrate. At the board level, a key issue is minimizing the number of bus signals because pin and signal count translates directly into package and PCB costs. A large number of device pins increases package footprint and reduces component density on the board. System-level buses must also support add-in cards and PCB backplanes, where connector size and cost are directly related to signal count. This is why traditional system-level buses use shared tri-state signaling and, in the case of PCI, multiplex address and data on the same signals.

With system-on-chip, signal routing consumes silicon area but does not affect the size or cost of packages, PCBs or connectors. In addition, the capabilities and limitations of today’s logic synthesis tools directly impact design time and performance and must be taken into account. Achieving the lowest possible routing overhead is of little value if design time balloons and the market window is missed. Synthesis tools find it difficult to deal with shared tri-state signals that have several drivers and receivers connected to the same trace. Static timing analysis is awkward, and often the only way to verify timing is to use a circuit-level simulator such as SPICE. All of this takes time and effort without adding real value in terms of device functionality or features. Bus loading also limits theoretical performance, and the verification problems associated with bus loading can lead to a conservative design whose performance falls short of the inherent technology capabilities.

The on-chip world also has a significantly different set of design constraints and tradeoffs compared with the board-level environment. A bus designed for use on PCBs will not provide the most efficient on-chip solution. What is needed is a completely new bus architecture optimized for systems-on-chip. Key issues are performance, design time reduction, ease-of-use, power consumption, and silicon efficiency.
 

The CoreFrame SOC Architecture

In any processor-driven design, a number of peripheral devices are needed. These include timers, DMA engines, interrupt controllers, and memory controllers. In many cost-sensitive applications, a shared memory structure is utilized to reduce memory component costs. An architecture is needed that addresses the memory needs of all devices without severely degrading the performance of any single device.

PALMCHIP developed a system-on-chip bus architecture called CoreFrame from the ground up to meet the unique requirements of SOC-based designs. The CoreFrame architecture differs significantly from other on-chip buses. By using point-to-point signals instead of shared tri-stated lines, it delivers higher performance while simultaneously reducing design and verification effort. The CoreFrame architecture was developed with several concerns in mind:
 

  • It must be foundry, processor and technology independent
  • It must be easily synthesizable
  • It must be centered around shared memory
  • It must be flexible
  • It must be modular
  • It must not sacrifice performance
  • It must not add cost to a design


To address these concerns, the CoreFrame architecture includes:
 

  • 400 MB/s bandwidth at 100 MHz (bus speed is scalable to technology and design requirements)
  • Support for 32-, 16- and 8-bit peripherals
  • Unidirectional buses only
  • Positive-edge clocking only
  • A central, shared memory controller
  • Single clock cycle data transfers
  • Zero wait state register accesses
  • Separate peripheral I/O and DMA buses
  • Simple protocol for reduced gate count
  • Low-capacitive loading for high-frequency operation
  • Hidden arbitration for DMA bus masters
  • Application-specific memory map and peripherals
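
The headline bandwidth figure in the list above follows from simple arithmetic. The sketch below assumes a 32-bit (4-byte) data path, which the paper does not state explicitly but which is consistent with 400 MB/s at 100 MHz and single clock cycle data transfers. The function name and parameters are illustrative only.

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, clock_mhz: float,
                            cycles_per_transfer: int = 1) -> float:
    """Peak bandwidth in MB/s (1 MB = 10**6 bytes) of a synchronous bus."""
    if cycles_per_transfer < 1:
        raise ValueError("cycles_per_transfer must be at least 1")
    bytes_per_transfer = bus_width_bits / 8
    # A clock of N MHz completes N / cycles_per_transfer transfers per microsecond,
    # and bytes per microsecond is the same quantity as MB/s.
    transfers_per_us = clock_mhz / cycles_per_transfer
    return bytes_per_transfer * transfers_per_us


if __name__ == "__main__":
    # Assumed 32-bit data path at the 100 MHz clock quoted in the list above.
    print(peak_bandwidth_mb_per_s(32, 100.0))
```

Under the same assumption, a faster clock scales the result linearly, which matches the remark that bus speed is scalable to technology and design requirements.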


Figure 1 CoreFrame SOC Architecture


Perhaps the most distinctive feature of CoreFrame is the separation of I/O and memory transfers onto different buses. The PalmBus provides the I/O backplane and allows the processor to configure and control peripheral blocks, while the MBus provides a direct memory access (DMA) connection from peripherals to shared main memory, allowing peripherals to transfer data directly without processor intervention. Fig. 1 illustrates an example CoreFrame system.

The architecture centers on the PalmBus and the MBus. The PalmBus is designed for low-speed accesses from the CPU core. The MBus is designed for high-speed accesses to external memory from the CPU core or peripheral blocks. CoreFrame’s primary components include:

  • PalmBus
  • MBus
  • CPU Core (RISC, DSP, …)
  • Internal memory blocks
  • PalmBus Interface Controller
  • DMA Channels
  • Memory Access Controller (MAC)
  • Arbiter
  • Peripheral Blocks

A. PalmBus

The PalmBus is the interface for communications between the CPU and peripheral blocks and is not used to access memory. It is a master-slave interface with a single master: the CPU core, acting through its PalmBus Interface Controller. Its timings are synchronous with the CPU core, and it operates at a clock rate that is equal to or twice that of the CPU core. The MAC, arbiter and channels may also have ties to the PalmBus for setup, configuration, and reading block status.

The PalmBus Interface Controller translates a CPU’s pipelined timings to non-pipelined timings. It generates the clock for the CPU as a divide-by-two of its own clock. Its responsibilities include timing translation, block address decode and wait generation.

PalmBus protocol and signaling are designed for the simple memory-mapped register control that is common in ASIC designs. The common tasks of writing and reading registers can be accomplished with a small number of logic gates and minimal verification time. Because all signals are launched and captured by rising edges of the bus clock and are not bi-directional, synthesis and static timing analysis are straightforward tasks. PalmBus peripherals can be operated at clock frequencies different from that of the PalmBus controller through the use of the wait signal. This simplifies peripheral design and integration by isolating clock domains. PalmBus is also designed with low power consumption in mind; special provisions ease the integration of peripherals that, though synchronous, use latches for lower power consumption.
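
The controller's block address decode and wait generation can be illustrated with a small behavioral model. This is a sketch only: the class names, the address map and the way cycles are counted are hypothetical and are not taken from the PalmBus specification.

```python
from dataclasses import dataclass, field


@dataclass
class Peripheral:
    """A register block; wait_cycles stands in for a slower peripheral clock."""
    name: str
    base: int
    size: int
    wait_cycles: int = 0  # 0 models a zero wait state register access
    regs: dict = field(default_factory=dict)


class PalmBusControllerModel:
    """Hypothetical single-master controller: decodes the block, counts waits."""

    def __init__(self, peripherals):
        self.peripherals = list(peripherals)

    def _decode(self, addr):
        # Block address decode: find the one peripheral that owns this address.
        for p in self.peripherals:
            if p.base <= addr < p.base + p.size:
                return p
        raise ValueError(f"no block decoded for address {addr:#x}")

    def write(self, addr, value):
        """Write a 32-bit register; returns the bus cycles consumed."""
        p = self._decode(addr)
        p.regs[addr - p.base] = value & 0xFFFFFFFF
        return 1 + p.wait_cycles

    def read(self, addr):
        """Read a register; returns (value, bus cycles consumed)."""
        p = self._decode(addr)
        return p.regs.get(addr - p.base, 0), 1 + p.wait_cycles


if __name__ == "__main__":
    bus = PalmBusControllerModel([
        Peripheral("timer", 0x1000, 0x100),
        Peripheral("uart", 0x2000, 0x100, wait_cycles=2),
    ])
    print(bus.write(0x1004, 0x1234), bus.read(0x1004))
    print(bus.write(0x2000, 0x41), bus.read(0x2000))
```

In this model a peripheral that needs extra time simply reports more wait cycles, which mirrors how the wait signal lets a block run in its own clock domain without changing the controller.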
 

B. MBus

The MBus is the interface for communicating between the Memory Access Controller (MAC) and the memory channels. The MBus is an arbitrated initiator-target interface with only one target, the MAC. Each initiator (master) arbitrates for the MAC, and once a transfer is granted, the MAC acts as the bus master, controlling all data flow. The MBus is synchronous to the MAC. It is not meant for peer-to-peer communications.

The MBus protocol is optimized both for ASIC-type implementations and for data transfers to and from memory devices. Control signals that are commonly needed for DMA-type transfers are central to the protocol, eliminating the need for bus protocol state machines. MBus utilizes hidden arbitration to further simplify its protocol; however, recognizing that ASICs have a wide range of system requirements, the arbitration scheme is application-specific. Because memory devices vary significantly in their protocols and access latencies, the MBus is designed to be adaptive, allowing the MAC to control the bus as it sees fit for the memory device being accessed. This allows optimizations to be made in the MAC to maximize throughput and minimize latency, or for cost-sensitive applications to minimize design size.

DMA Channels interface between peripheral blocks and the MBus. A peripheral block interfaces to a channel only if it accesses shared memory. If the peripheral block is asynchronous to the MAC, a buffer (FIFO) is implemented, where the block side of the buffer is synchronous to the block clock and the MBus side of the buffer is synchronous to the MAC.

The Arbiter is generally application-specific. It takes a request from each of the channels and responds with a grant. It can be embedded into the MAC, as is the case with PALMCHIP’s Configurable Memory Controller core.
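
Because the arbitration scheme is application-specific, the paper does not prescribe one. As a minimal sketch of the request and grant exchange, the model below assumes a round-robin policy, which is only one of many possible choices; the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants one requesting channel per decision, rotating the priority."""

    def __init__(self, num_channels: int):
        if num_channels < 1:
            raise ValueError("need at least one channel")
        self.num_channels = num_channels
        self._last = num_channels - 1  # so that channel 0 has first priority

    def grant(self, requests):
        """requests: one bool per channel. Returns the granted index or None."""
        if len(requests) != self.num_channels:
            raise ValueError("one request flag per channel is required")
        # Search starts just after the most recently granted channel.
        for offset in range(1, self.num_channels + 1):
            ch = (self._last + offset) % self.num_channels
            if requests[ch]:
                self._last = ch
                return ch
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(3)
    for reqs in ([True, True, False], [True, True, False], [False, False, True]):
        print(reqs, "->", arb.grant(reqs))
```

A fixed-priority or weighted scheme could replace the search loop without changing the request and grant interface, which is the property that lets the arbiter be tailored per application.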

C. CPU Subsystem

CoreFrame is a CPU-independent architecture. The CPU core may be provided by the foundry as a hard core (e.g. ARM, MIPS, Lexra, PowerPC…) or by the core vendor as a configurable soft core (e.g. ARM7TDMI-S, ARC, Lexra-RTL…). The CPU subsystem sits at the architecture’s heart and may contain local memory for its own use on its native CPU bus. The subsystem links to CoreFrame in two ways: through the PalmBus Controller, to reach slow peripherals or the configuration/status registers of high-speed peripherals, and through the MBus Cache, for the CPU’s instructions or data.

D. Topology

CoreFrame is wired together using a star-shaped topology as shown in Figures 2 and 3. Broadcast signals, driven by the PalmBus Controller and Memory Access Controller, are connected to all their respective peripherals, while signals specific to each peripheral are point-to-point. The bus does not carry application-specific signals from peripheral to peripheral. This is because in real systems most peripherals exchange only control or status information with one another and do not need to exchange data directly with their peers. Data is instead communicated through shared main memory using either programmed I/O or DMA. CoreFrame exploits this fact to simplify the bus architecture and avoid tri-state signals.
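
The wiring rule can be sketched as a netlist generator in which every net has exactly one driver and every net has the controller at one end. The signal names, peripheral names and data layout below are hypothetical and serve only to illustrate the broadcast and point-to-point distinction.

```python
def build_star_nets(controller, peripherals, broadcast, point_to_point):
    """Return {net_name: (driver, [loads])} for a star topology.

    broadcast: signal names driven by the controller to every peripheral.
    point_to_point: {signal: "to_periph" or "to_ctrl"}, one net per peripheral.
    """
    nets = {}
    for sig in broadcast:
        # One driver (the controller), one load per peripheral.
        nets[sig] = (controller, list(peripherals))
    for sig, direction in point_to_point.items():
        if direction not in ("to_periph", "to_ctrl"):
            raise ValueError(f"unknown direction {direction!r}")
        for p in peripherals:
            if direction == "to_periph":
                nets[f"{sig}_{p}"] = (controller, [p])
            else:
                nets[f"{sig}_{p}"] = (p, [controller])
    return nets


def has_peer_to_peer(nets, controller):
    """True if any net connects peripherals without involving the controller."""
    return any(driver != controller and controller not in loads
               for driver, loads in nets.values())


if __name__ == "__main__":
    nets = build_star_nets(
        "palmbus_ctrl", ["timer", "uart"],
        broadcast=["addr", "wdata"],
        point_to_point={"sel": "to_periph", "rdata": "to_ctrl"},
    )
    for name, (driver, loads) in sorted(nets.items()):
        print(f"{name}: {driver} -> {loads}")
    print("peer-to-peer nets:", has_peer_to_peer(nets, "palmbus_ctrl"))
```

Since each net stores a single driver, no tri-state resolution is ever needed in this representation, which is the structural property the star topology is meant to guarantee.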
 



Figure 2 MBus Topology

The MAC has a control port on the PalmBus to allow the processor to configure and control its attributes. The same holds true for many DMA ports on the MBus: their PalmBus ports allow the processor to configure or control their operation.



Figure 3 PalmBus Topology

Exclusive use of point-to-point and broadcast signaling brings other benefits in addition to the tool-related issues already described. Bus utilization efficiency is increased because there is no need for turn-around cycles. Load capacitances are lower because each signal has only a single driver and, in the case of point-to-point signals, only a single load. Broadcast signals can easily be re-driven with no extra control logic. Power consumption is reduced because bus holders that oppose signal transitions are eliminated. As a result, the buses can be run at higher speed, with greater efficiency and lower power.
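
The effect of turn-around cycles on utilization can be made concrete with a short calculation. The cycle counts used here are hypothetical and are not measurements from any CoreFrame design; the function name is illustrative.

```python
def bus_utilization(data_cycles: int, turnaround_cycles: int) -> float:
    """Fraction of bus cycles that carry data."""
    if data_cycles < 0 or turnaround_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    total = data_cycles + turnaround_cycles
    if total == 0:
        raise ValueError("at least one cycle is required")
    return data_cycles / total


if __name__ == "__main__":
    # Hypothetical shared tri-state bus: 1 turn-around cycle per 4-cycle burst.
    print(bus_utilization(4, 1))
    # Unidirectional signaling: no turn-around cycles at all.
    print(bus_utilization(4, 0))
```

The shorter the bursts, the larger the share of cycles a turn-around penalty would consume, so the benefit of unidirectional signaling grows for single-word transfers.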
 

CoreFrame HDD Controller

The CoreFrame system-on-chip architecture provides a versatile and flexible means not only to facilitate SOC designs but also to create plug-and-play variations of those designs. Fig. 4 illustrates the architecture being used as a single-chip hard disk controller containing all the system components needed for the HDD application: a CPU subsystem using ARM’s ARM7TDMI core and local SRAM, an ATA-33 host interface, data buffer, MAC, servo interface, formatter, Reed-Solomon ECC and serial interface. A UART has been added for system monitoring, self-test, and debug.
 


Figure 4 GreenLite HDD Controller

Variations may be accomplished by replacing the ATA interface with an IEEE 1394 or PC Card interface, upgrading the servo logic, or replacing the UART with an embedded software trace module. The example shown here has been implemented in silicon as PALMCHIP’s GreenLite™ HDD Controller and has seen at least five variations, each of which required relatively little in the way of engineering resources due to the design friendliness and plug-and-play qualities of CoreFrame.

To meet the requirements of a large variety of system-on-chip applications, PALMCHIP offers a complete SOC microcontroller reference design known as Mango™. Mango is a fully functional SOC microcontroller built from CoreFrame, and it allows easy migration to other SOC applications through the mixing and matching of reusable IP blocks along with any customer-specific logic. Figure 5 shows a block diagram of the Mango.
 


Figure 5 Mango Microcontroller


Conclusion

An on-chip bus architecture will make or break the concept of system-on-chip design, depending on whether it provides an effective vehicle for the mix-and-match insertion of custom IP without sacrificing either performance or verification time. CoreFrame is a silicon-proven on-chip bus architecture that has significant advantages compared with other system interconnect schemes. Its definition is optimized for ASIC implementations. Its shared-memory architecture is optimized for devices with high-bandwidth data streams requiring extensive DMA. This covers a wide range of applications such as mass storage, networking, printer controllers, and mobile communications. CoreFrame is designed to be synthesis friendly and provides plug-and-play connectivity to reduce SOC design time. Palmchip has developed a portfolio of peripherals for CoreFrame and is offering the architecture as a solution to the SOC integration issues outlined in this paper. A more detailed specification can be requested from our web site, www.palmchip.com, as can hardware/software solutions ranging from basic building blocks to full system-on-chip reference designs.

References:
Bill Cordan, "An Efficient Bus Architecture for System-on-Chip Design," 1999 IEEE Custom Integrated Circuits Conference.



