

ORION

A HIGH PERFORMANCE PERSONAL COMPUTER

# A NEW CONCEPT IN PERSONAL COMPUTING

ORION is High Level Hardware's new 32-bit word length supermicrocomputer intended particularly for the scientific community. Designed primarily as a member of the new generation of high performance personal computers, the ORION machine will be of particular value to users whose work involves a large amount of computation such as in the fields of mathematical modelling, artificial intelligence and symbolic algebra. With its combination of high speed and architectural flexibility it will also be of interest to computer scientists as a powerful research tool for the development of the next generation of computer architecture.

Over the last twenty years there have been many developments in computer hardware technology at the component level but these have not been matched by corresponding advances in computer architecture. The ORION hardware attempts to redress this imbalance. ORION's architecture is designed to yield efficient and reliable implementations of high level languages, a feature previously restricted to expensive mainframe computers.





The ORION hardware is complemented by UNIX†, a portable operating system that brings with it the benefits of mature design, a variety of high level languages and an extensive selection of tools for program development and text processing. Unusually for a new machine, the adoption of UNIX means that ORION users gain immediate access to a wealth of existing software and become part of the large and rapidly growing international community of UNIX users.

<sup>†</sup> UNIX is a Trademark of Bell Laboratories.

## HARDWARE

The ORION computer is a compact unit mounted in a standard 19 inch rack built into a desk; the rack also houses disk and tape drives.

The standard microcode implements a stack oriented machine. Rather than forcing all data through the bottleneck of a small number of high speed registers, a working stack is maintained in cache memory that is used both for expression evaluation and for storage of local variables. Measurements on ORION have shown that for programs written in a high level procedural language, the majority of instructions need only access this working stack, giving fast overall operation. Operational speed is further enhanced by adopting a compact representation for the most frequently used instructions. Instruction latency is minimised by arranging for up to eight instructions to be prefetched from main memory into an opcode cache in the CPU and by overlapping instruction decoding and execution.
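The stack oriented model can be illustrated with a minimal sketch in Python. The opcode names (`LOAD`, `STORE`, `MUL`, `ADD`) and the slot numbering are hypothetical and chosen only for illustration; they are not ORION's actual instruction set. Local variables occupy the bottom of a single working stack and expression temporaries are pushed above them.

```python
# Hypothetical opcodes for illustration; not ORION's actual instruction set.
def run(program, local_values):
    """Run (opcode, operand) pairs; return the final values of the locals."""
    # Locals sit at the bottom of the working stack; temporaries go on top.
    stack = list(local_values)
    n_locals = len(local_values)
    for opcode, operand in program:
        if opcode == "LOAD":        # push a copy of local slot `operand`
            stack.append(stack[operand])
        elif opcode == "STORE":     # pop the top of stack into a local slot
            stack[operand] = stack.pop()
        elif opcode == "MUL":       # replace the top two entries by their product
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs * rhs)
        elif opcode == "ADD":       # replace the top two entries by their sum
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack[:n_locals]


# A := B * C, with A, B and C held in local slots 0, 1 and 2.
A, B, C = 0, 1, 2
program = [("LOAD", B), ("LOAD", C), ("MUL", None), ("STORE", A)]
print(run(program, [0, 6, 7]))  # [42, 6, 7]
```

Every instruction in this example touches only the working stack, which is the property the measurements above rely on.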

#### Central processing unit

The 32-bit ALU is built around Am2901C bit slice microprocessors. To these has been added a byte manipulation unit which can perform the shifting, rotating and masking operations required for handling eight and sixteen bit data. Additional logic is provided to support both signed and unsigned two's complement comparisons in a single operation, multiple precision arithmetic and floating point normalization. Most operations can be performed in 150 ns. The cycle time is variable from 125 ns to 200 ns under micro-program control so that timing can be optimised.

The micro-sequencer directs the flow of control through the microprogram. It can perform branches, loops and subroutine calls, most of which may be conditional on any of several CPU status conditions. The main component is the Am2910, an LSI device designed specifically for this purpose.

The instruction decoder is the portion of the CPU responsible for decoding machine level instructions (as opposed to micro-instructions). This is done using map tables held in fast parity checked RAM which map one byte opcodes onto micro-instruction addresses. Control is transferred to these addresses using a special sequencer operation which is performed in parallel with other CPU functions; instruction decoding is thus normally fully overlapped with instruction execution.

An escape mechanism is provided to allow the instruction set to be expanded beyond the 256 entries selected by a one byte opcode. A further mechanism exists to switch between several sets of dispatch tables, allowing the machine to support multiple instruction sets concurrently. Using this mechanism a different instruction set can be selected each time a context switch occurs. The mechanism is also used to implement privileged instructions, dynamic profiling (for performance monitoring) and multiple CPU modes (e.g. User and Kernel).
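The decoding scheme described above can be sketched as a table lookup. This is an illustrative model only: the escape opcode value `0xFF`, the class and method names, and the micro-instruction addresses are assumptions, not details of the ORION hardware. The limit of 16 instruction sets is taken from the specification table.

```python
ESCAPE = 0xFF   # hypothetical escape opcode value
MAX_SETS = 16   # the specification lists up to 16 independent instruction sets


class InstructionDecoder:
    """Maps one byte opcodes onto micro-instruction addresses."""

    def __init__(self):
        self.primary = {}    # set id -> {opcode byte: micro-instruction address}
        self.extended = {}   # set id -> {byte following ESCAPE: micro-address}
        self.current = 0     # currently selected instruction set

    def load_set(self, set_id, primary, extended=None):
        if not 0 <= set_id < MAX_SETS:
            raise ValueError("instruction set id out of range")
        self.primary[set_id] = dict(primary)
        self.extended[set_id] = dict(extended or {})

    def select(self, set_id):
        """Switch dispatch tables, for example on a context switch."""
        if set_id not in self.primary:
            raise KeyError(f"instruction set {set_id} not loaded")
        self.current = set_id

    def decode(self, code, pc):
        """Return (micro_address, next_pc) for the instruction at `pc`."""
        opcode = code[pc]
        if opcode == ESCAPE:
            # The byte after the escape selects an entry in the extended table.
            return self.extended[self.current][code[pc + 1]], pc + 2
        return self.primary[self.current][opcode], pc + 1


decoder = InstructionDecoder()
decoder.load_set(0, {0x01: 0x100}, extended={0x01: 0x800})
decoder.load_set(1, {0x01: 0x240})
code = bytes([0x01, ESCAPE, 0x01])
```

With these tables, the same opcode byte `0x01` dispatches to a different micro-instruction address depending on which set is selected.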

The cache memory is totally independent of main memory. Its principal function is to hold the top of an evaluation stack for a procedure oriented language. The cache has a two cycle latency after which it can deliver one word per cycle and is divided into a number of pages each of 512 32-bit words with parity protection. The pages are grouped in pairs with an architectural maximum of 16 pairs, the current hardware having two such pairs. The second member of each pair is used typically as additional fast registers and scratch storage without affecting the stack page. The lower nine bits of the CPU register which addresses the cache are implemented with counters which allow increment and decrement operations (push and pop) as well as random access. By exploiting the structure of modern high level programming languages and keeping the current stack frame in the cache memory, a high hit rate is obtained using very simple hardware.
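The cache figures can be checked with a short sketch. The capacity arithmetic follows directly from the page size and pairing given above; the `CachePointer` class is a hypothetical model of the nine-bit counter, and its wrap-around within a page is an assumption rather than documented behaviour.

```python
PAGE_WORDS = 512    # 32-bit words per cache page
WORD_BYTES = 4
OFFSET_MASK = PAGE_WORDS - 1   # the lower nine bits of the cache address


def cache_bytes(pairs):
    """Total cache capacity in bytes for a given number of page pairs."""
    return pairs * 2 * PAGE_WORDS * WORD_BYTES


class CachePointer:
    """Hypothetical model of the cache address register."""

    def __init__(self, page=0, offset=0):
        self.page = page
        self.offset = offset & OFFSET_MASK

    def increment(self):
        # Wrap-around within the page is an assumption of this sketch.
        self.offset = (self.offset + 1) & OFFSET_MASK

    def decrement(self):
        self.offset = (self.offset - 1) & OFFSET_MASK

    def address(self):
        """Word address within the cache (random access view)."""
        return self.page * PAGE_WORDS + self.offset


print(cache_bytes(2) // 1024)    # 8  (Kbytes, current hardware)
print(cache_bytes(16) // 1024)   # 64 (Kbytes, architectural maximum)
```

The two results agree with the 8 Kbyte and 64 Kbyte cache sizes quoted in the specification table.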





#### Control store

The control store is built using high speed static RAM devices. This is normally loaded at bootstrap time, allowing the machine to be fully user microprogrammable. The control store cycle time is 125 ns, equal to the fastest CPU cycle. The architecture allows for up to 32 Kwords (64 bit wordlength) of control store, although current memory technology limits this to 8 Kwords. The standard configuration has 4 Kwords on a single circuit board. Parity checking is provided. To achieve the required speed at reasonable cost, a two-level pipeline is employed around the control store.

#### Main memory

Main memory is organised as 32-bit words with two-way interleaving, allowing 64 bits of data to be fetched or stored in one operation. In normal operation main memory is accessed via a virtual memory management unit.

Each main memory module contains 0.5 Mbytes of storage with parity protection, constructed using 64K dynamic MOS RAMs. Random access cycle time is 500 ns per 32-bit word but multi-word transfers, for example to and from the cache, yield an effective cycle time of 250 ns per 32-bit word (16 Mbytes per second). The memory modules decode 26-bit physical word addresses and within this limit total memory capacity is restricted only by the number of available system bus slots; depending on the I/O configuration of the system, up to 10 Mbytes of physical memory can be installed.
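The figures in this paragraph are related by simple arithmetic, sketched below. The function names are illustrative. Note that the bandwidth is expressed in decimal Mbytes per second while capacities use binary Mbytes, which is an assumption about how the quoted figures were rounded.

```python
WORD_BYTES = 4   # one 32-bit word


def bandwidth_mbytes_per_s(cycle_ns):
    """Transfer rate in (decimal) Mbytes per second for one word per cycle."""
    return WORD_BYTES * 1000 / cycle_ns


def physical_space_mbytes(word_address_bits):
    """Addressable memory in (binary) Mbytes for a given word address width."""
    return (2 ** word_address_bits) * WORD_BYTES // (1024 * 1024)


def modules_needed(total_mbytes, module_kbytes=512):
    """Number of 0.5 Mbyte modules required for a given memory size."""
    return total_mbytes * 1024 // module_kbytes


print(bandwidth_mbytes_per_s(250))   # 16.0 (multi-word transfers)
print(bandwidth_mbytes_per_s(500))   # 8.0  (random access)
print(physical_space_mbytes(26))     # 256
print(modules_needed(10))            # 20
```

A fully populated 10 Mbyte system therefore occupies 20 module slots, well inside the 256 Mbyte limit set by the 26-bit word address.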

Logical to physical address translation is carried out using a set of address translation tables. Each process has access to three independently extensible regions of memory, used typically for program, heap and stack. A fourth region is normally reserved for the operating system. The tables also contain a set of rights bits for each memory page giving full protection and supporting the implementation of demand paged virtual memory. The translation tables are cached in the CPU resulting, in most cases, in an overhead of only one microinstruction when performing address translation. The page size, which is fixed by the hardware, is 4 Kbytes. Each logical region can be up to 256 Mbytes.
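The translation step can be sketched as follows. Four regions of 256 Mbytes with 4 Kbyte pages imply a 30-bit byte address that splits into a 2-bit region, a 16-bit page number and a 12-bit offset; this exact field layout is an assumption, as are the rights bit names `READ` and `WRITE`, the exception names and the dictionary standing in for the translation tables.

```python
PAGE_BITS = 12      # 4 Kbyte pages
REGION_BITS = 28    # 256 Mbyte regions

READ, WRITE = 1, 2  # hypothetical rights bits


class PageFault(Exception):
    """No translation exists; a demand paging system would load the page."""


class ProtectionError(Exception):
    """The page's rights bits do not permit the requested access."""


def split(vaddr):
    """Split a virtual byte address into (region, page, offset)."""
    region = vaddr >> REGION_BITS
    page = (vaddr >> PAGE_BITS) & ((1 << (REGION_BITS - PAGE_BITS)) - 1)
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    return region, page, offset


def translate(tables, vaddr, access):
    """Return the physical byte address for `vaddr`, checking rights bits."""
    region, page, offset = split(vaddr)
    entry = tables.get((region, page))
    if entry is None:
        raise PageFault(hex(vaddr))
    frame, rights = entry
    if not rights & access:
        raise ProtectionError(hex(vaddr))
    return (frame << PAGE_BITS) | offset


# Region 2, page 3 is mapped read-only onto physical page frame 5.
tables = {(2, 3): (5, READ)}
vaddr = (2 << REGION_BITS) | (3 << PAGE_BITS) | 0x10
print(hex(translate(tables, vaddr, READ)))  # 0x5010
```

A write to the same address raises `ProtectionError`, and an access to an unmapped page raises `PageFault`, which is the hook a demand paged system would use to bring the page in.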

#### I/O subsystems

The ORION I/O subsystems include a number of attached microcomputers to perform low level tasks such as running diagnostics and managing terminals and disks.

The diagnostic microcomputer is embedded within the CPU. Its functions include running a system confidence check when power is first applied, bootstrapping the CPU, and taking control should an unrecoverable control store parity error be detected. It can also be used to load new microcode dynamically whilst the machine is running. An RS-232C interface is provided to which a terminal can be attached. Extensive diagnostics can then be run in conjunction with special microcode to perform fault analysis in the event of a system failure. Problems can usually be isolated to one or two integrated circuits.

One or more intelligent I/O channels control peripheral activity. Each of these includes a full function microcomputer which performs control functions and housekeeping. Data transfers to and from peripheral devices take place via a direct memory access (DMA) path, itself constructed using bit-slice microprocessors. This allows the full performance of the ORION memory system and of the peripheral device to be exploited, with the microcomputer able to take corrective action on soft I/O errors. Software on ORION communicates with the microcomputer using a high level message passing protocol.

A variety of Winchester technology disk subsystems is available, with capacities ranging from 40 to 160 Mbytes. For backup or archiving, a ¼ inch cartridge tape drive capable of storing up to 23 Mbytes per cartridge is used. One or more eight inch 1.2 Mbyte floppy disk drives and an industry standard ½ inch reel to reel magnetic tape drive can also be supplied as options. Please refer to the price list for further details.

#### Upgrade path

The ORION hardware has been designed to take advantage of new developments in semiconductor memory technology as soon as they become available. In particular the main memory modules will accept 256K by 1-bit dynamic memory devices with a change of three jumpers, and the control store, cache memory and virtual address translation buffer will accept 64K bit or larger static devices. These new devices are expected to become available at reasonable cost within a year and will allow up to a fourfold increase in the maximum storage capacity of each of the memory systems. High resolution bit-mapped graphics and local area networking are under development.

#### Performance

The execution speed of C programs on ORION approximates to that on the VAX 11/750 super minicomputer (Digital Equipment Co.).

A local integer variable can be loaded or stored in as little as 600 ns. An ordinary branch instruction takes about 1.5 $\mu$s including refilling the instruction cache. The high level statement A := B \* C is performed in approximately 7 $\mu$s (32-bit integer operands) or 35 $\mu$s (double precision floating point operands). A high level procedure call and return together take less than 3 $\mu$s; note that this includes the time to create a new stack frame.

## SOFTWARE

Although UNIX is a time sharing operating system it works well in a single user environment on a high performance processor. High Level Hardware have ported UNIX 4.1 bsd to ORION. This system, which was developed by the University of California at Berkeley, is derived directly from Bell Laboratories' UNIX 32V for the VAX 11/780, itself derived from UNIX version 7. 4.1 bsd is currently the only version of UNIX to include virtual memory management and is the primary system used at approximately 1000 VAX sites outside Bell Labs.

The nucleus of the UNIX operating system is memory resident and includes the code that supports system calls and maintains the file system. The file system is organised around a tree-structured directory hierarchy, with a similar method of access for devices, directories and data files. This structure makes it relatively easy for programs to perform input/output operations in a device-independent manner, buffering taking place invisibly within the operating system when required.

A distinguishing feature of UNIX is its powerful 'shell' command language interpreter. Using the shell the user can readily create multiple processes that can run sequentially or in parallel and which may themselves create offspring processes. The processes can communicate with each other by means of asynchronous software interrupts (signals) and message buffers (pipes). The shell also supports redirection of program input/output. These features assist the user to break down the solution of his problem into a number of smaller parts, each of these smaller programs being run sequentially or concurrently. The ORION user has a choice of the standard Bourne shell or the Berkeley enhanced C shell.





The large number of UNIX utilities (tools) allows many jobs to be done by using a combination of existing tools without the need to 'descend' to a high level language. Among the available tools are: ed, a context-oriented text editor, and sed, a stream editor, both with a simple and regular command structure; vi, a screen editor; nroff, a comprehensive text formatter, and spelling and other utilities concerned with word processing; sort, a generalised sorting utility, and join and uniq which can be used with sort to manipulate relational data bases; grep, which efficiently searches a file for a pattern; diff, which compares text files; awk, a pattern scanning and processing language which is virtually a C interpreter; compiler writing aids such as the lexical analyser lex and the parser generator yacc; and many more.

The principal language available under UNIX is C, a general purpose systems implementation language that supports structured programming. The High Level Hardware C compiler is totally compatible with the UNIX standard C language. Particular effort has been expended to produce useful error diagnostics, an area greatly neglected in the past; the compiler includes many of the features of the lint C program verifier. A large number of other languages and utilities is also available, including BCPL and Fortran 77.

For specialized applications a range of microcode development tools is available. This includes a reconfigurable micro-assembler, a micro-linker and a micro-librarian. Together they allow a modular approach to be used in micro-programming, and easy access to be gained to standard library routines. Since the ORION CPU supports multiple concurrent instruction sets, custom microcode can be developed, loaded, tested and used, entirely within the UNIX environment.

The ORION environment is attractive for implementing new languages. In particular, the task of writing a new code generator is relatively straightforward since the compiler writer is able to define an appropriate instruction set and the ORION assembler can easily be reconfigured to recognise the new instructions.





## SPECIFICATION AT A GLANCE

| PROCESSOR                           |                                                                                                                           |
|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| Type                                | 32 bit CPU<br>LSI bit slice technology<br>User microprogrammable                                                          |
| Writable Control Store              | 64 bits by up to 32 Kwords                                                                                                |
| Granularity                         | 4 and 16 Kwords                                                                                                           |
| Cycle Time                          | 125, 150, 175 or 200 ns, instruction dependent                                                                            |
| Floating point                      | IEEE 32 and 64 bit formats                                                                                                |
| Decoding tables                     | Up to 16 independent instruction sets<br>Pipelined instruction decoding                                                   |
| Cache memory                        | 8 Kbytes expandable to 64 Kbytes, divided into 2 Kbyte segments<br>125 ns cycle time<br>Stack or random access addressing |
| MEMORY                              |                                                                                                                           |
| Instantaneous virtual address space | 1024 Mbytes                                                                                                               |
| Physical address space              | 256 Mbytes                                                                                                                |
| Page size                           | 4 Kbytes                                                                                                                  |
| Translation buffer cache            | 2048 entries                                                                                                              |
| Technology                          | MOS 150 ns, 64K or 256K dynamic RAM                                                                                       |
| Organisation                        | 64 bits data plus 2 bits parity                                                                                           |
| Physical                            | Up to 40 Mbytes                                                                                                           |
| Granularity                         | 512 and 2048 Kbytes                                                                                                       |
| Access length                       | 4 and 8 bytes                                                                                                             |
| Memory access bandwidth             | 16 Mbytes/sec sustained                                                                                                   |
| Random access cycle time            | 500 ns (32 bits), 625 ns (64 bits)                                                                                        |

| SYSTEM INTERCONNECT |                                               |
|---------------------|-----------------------------------------------|
| Type                | Synchronous, supporting multiple DMA channels |
| Data path           | 32 bits data, 1 bit parity                    |
| Clock               | 8 MHz                                         |
| Bandwidth           | 32 Mbytes/sec sustained                       |

| I/O  |                                                                                                                                                                     |
|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Type | Multiple intelligent microprocessor based peripheral controllers<br>Microprogrammed bit slice DMA channels<br>All data transfers via DMA to minimise CPU overhead |

HIGH LEVEL HARDWARE

High Level Hardware Ltd.
PO Box 170, Windmill Road, Oxford OX3 7BN England
Telephone Oxford (0865) 750494