'584 revisited, please note final quotation:
posted on Feb 15, 2009 12:32PM
584 Claim 29 Rejection Notes (Part 2)
My Summary
32-bit instructions that are fetched one instruction at a time. “Instruction pieces” are NOT instructions. “Instruction pieces” probably refers to the 6 fields of each instruction, which can be used at different places in the pipeline for the processing of each 32-bit instruction.
Very different from PTSC processor architecture
Mainframe computer (not a microprocessor) with a large instruction buffer that holds many instructions to be executed in the pipeline (assembly line). Multiple instructions are loaded from memory into the instruction buffer, and a short loop can be held in the Instruction Buffer.
How can one compare a microprocessor architecture with this very old mainframe architecture? Is this a bad joke?
http://www.research.ibm.com/journal/...
Hardly a Microprocessor
http://www.columbia.edu/acis/history...
Very old mainframe (hardly a microprocessor) with 36-bit instructions, which are “fetched in pairs.” I’m almost certain that the two instructions of a pair are not actually fetched simultaneously, since that would require a 72-bit bus. They are “sequentially transferred to the Control Unit.”
Is this another bad joke?
http://ed-thelen.org/comp-hist/GE-63...
Machine instructions have the following general format:
Bits:  0-17 | 18-26   | 27 | 28 | 29 | 30-35
Field: y    | op code | 0  | i  | 0  | tag
Where:
y: the address field; also used in some cases to augment the Op Code, as in shift operations, where it specifies the number of shifts
Op Code: the operation code, usually stated in the form of a 3-digit octal number
i: interrupt inhibit bit
Tag: the tag field, generally used to control the address modification
0: the two bit positions 27 and 29 have no function at this time; however, they must be zero for compatibility with other 600-line Processors.
The three repeat instructions, Repeat, Repeat Double, and Repeat Link (RPT, RPD, and RPL) use a different instruction format. (See pages II-125, II-127, and II-129.)
Indirect words have the same general format as the instruction words; however, the fields are used in a somewhat different way. (See page II-26 and following.)
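To make the field layout above concrete, here is a minimal Python sketch that packs and unpacks the general instruction format. It assumes bit 0 is the most significant bit of the 36-bit word (the usual GE-600 numbering, but an assumption here), and the names `encode`, `decode`, `field` and `GE635Instruction` are hypothetical. It does not handle the repeat (RPT, RPD, RPL) format or indirect words.

```python
from dataclasses import dataclass

WORD_BITS = 36


@dataclass(frozen=True)
class GE635Instruction:
    y: int        # bits 0-17: address field (18 bits)
    op_code: int  # bits 18-26: operation code (9 bits = 3 octal digits)
    i: int        # bit 28: interrupt inhibit bit
    tag: int      # bits 30-35: tag field for address modification (6 bits)


def field(word: int, first: int, last: int) -> int:
    """Return bits first..last, numbering bit 0 as the MSB (assumption)."""
    width = last - first + 1
    shift = (WORD_BITS - 1) - last
    return (word >> shift) & ((1 << width) - 1)


def decode(word: int) -> GE635Instruction:
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("not a 36-bit word")
    # Bits 27 and 29 have no function but must be zero.
    if field(word, 27, 27) or field(word, 29, 29):
        raise ValueError("bits 27 and 29 must be zero")
    return GE635Instruction(
        y=field(word, 0, 17),
        op_code=field(word, 18, 26),
        i=field(word, 28, 28),
        tag=field(word, 30, 35),
    )


def encode(y: int, op_code: int, i: int, tag: int) -> int:
    if not (0 <= y < 1 << 18 and 0 <= op_code < 1 << 9
            and i in (0, 1) and 0 <= tag < 1 << 6):
        raise ValueError("field out of range")
    # Each shift is 35 minus the field's last bit; bits 27 and 29 stay zero.
    return (y << 18) | (op_code << 9) | (i << 7) | tag


word = encode(y=0o1234, op_code=0o235, i=0, tag=0o10)
print(oct(word), decode(word))
```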
Instruction words are fetched in pairs and sequentially transferred to the Control Unit of the Processor where the instructions are directed to the primary and secondary instruction registers of the instruction decoder. If required, address modification is then performed using the first of the two instructions. As soon as this is accomplished, the operand specified by the first instruction is requested from memory while the Control Unit concurrently performs any address modification required by the second of the instruction pair.
When the operand called for by the first instruction is obtained, the Control Unit transfers the operand to the Operations Unit, thus initiating the specified operation to be carried out. While this operation is being carried out by the Operations Unit, the operand specified by the second instruction is requested by the Control Unit. As soon as the second operand is received and the Operations Unit has finished with the first operand, the Control Unit signals the Operations Unit to carry out the second operation. Finally, while the second operation is being carried out, the next instruction pair is requested from memory.
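The overlap described in these two paragraphs can be written out as an ordered list of steps. This Python sketch is a simplified model: it assumes every instruction needs address modification and an operand, and `pair_schedule` is a hypothetical name. Each string lists the actions that the text says happen at the same time, separated by `|`.

```python
def pair_schedule(pairs: list[tuple[str, str]]) -> list[str]:
    """Ordered steps for executing instruction pairs with the described overlap."""
    steps: list[str] = []
    for n, (first, second) in enumerate(pairs):
        if n == 0:
            # Only the very first pair is fetched with nothing else in flight.
            steps.append(f"fetch pair ({first}, {second})")
        steps.append(f"modify address of {first}")
        steps.append(f"request operand of {first} | modify address of {second}")
        steps.append(f"execute {first} | request operand of {second}")
        if n + 1 < len(pairs):
            nxt_first, nxt_second = pairs[n + 1]
            # The next pair is requested while the second operation runs.
            steps.append(f"execute {second} | fetch pair ({nxt_first}, {nxt_second})")
        else:
            steps.append(f"execute {second}")
    return steps


for step in pair_schedule([("A1", "A2"), ("B1", "B2")]):
    print(step)
```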
Clipper Chip
My Summary: 4 bytes of instruction stream are fed at a time from memory to the Instruction Buffer (8 bytes long). Since some instructions are 2 bytes, sometimes 2 instructions will be fed at a time into the Instruction Buffer. No branching is permitted within the Instruction Buffer (so no short loops). Instructions are fed from the Instruction Buffer into the pipeline of the Instruction Control Unit (ICU), where one instruction at a time is in each stage of the ICU.
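A rough Python model of the point about 2-byte instructions: if the instruction stream arrives 4 bytes at a time, some fetches complete two instructions and others only one. The function name and the instruction lengths in the example are hypothetical, and the model ignores the 8-byte buffer's capacity limit and branches.

```python
from itertools import accumulate


def instructions_completed_per_fetch(lengths: list[int], fetch_bytes: int = 4) -> list[int]:
    """For each fetch of `fetch_bytes`, count the instructions whose last byte arrives in it."""
    ends = list(accumulate(lengths))  # byte offset just past each instruction
    total = ends[-1] if ends else 0
    counts = []
    for start in range(0, total, fetch_bytes):
        stop = start + fetch_bytes
        counts.append(sum(1 for end in ends if start < end <= stop))
    return counts


# A mix of 2-byte and 4-byte instructions (lengths chosen for illustration).
print(instructions_completed_per_fetch([2, 2, 4, 2, 4, 2]))  # [2, 1, 1, 2]
```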
This is an interesting microprocessor design, but it is quite different from the PTSC architecture, where up to 4 instructions are read into the Instruction Register (IR) at a time and micro loops (within the IR) are possible. The CAMMUs are actually separate chips (not part of the processor chip).
http://www.eecs.berkeley.edu/Pubs/Te...
Paragraphs 7.1 (Instruction Bus Interface) and 7.6 (Instruction Control Unit) of the reference were included here as images.
IBM ROMP
My Summary:
Instructions are either 2 bytes or 4 bytes. A 4-byte word is fetched at a time into the Pre-fetch Buffer (16-byte instruction queue), so sometimes 2 instructions are fetched simultaneously into the Pre-fetch Buffer. Short loops are possible within the Pre-fetch buffer without re-fetching the instructions.
This is not the same as fetching up to 4 instructions simultaneously into the 32-bit Instruction Register of the PTSC processor and permitting micro loops within the Instruction Register.
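To illustrate the contrast, here is a Python sketch of a 32-bit Instruction Register holding four instructions. It assumes 8-bit instructions (four per 32-bit word) with the first instruction in the most significant byte; both are assumptions, as are the function names. The point is only the bookkeeping: a micro loop that stays inside the IR re-executes its slots without another memory fetch.

```python
def unpack_ir(ir_word: int) -> list[int]:
    """Split a 32-bit IR into four 8-bit instruction slots (MSB first: assumption)."""
    return [(ir_word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def micro_loop(ir_word: int, iterations: int) -> tuple[list[int], int]:
    """Execute all four IR slots `iterations` times; return (executed slots, memory fetches)."""
    slots = unpack_ir(ir_word)  # one memory fetch fills the IR
    executed: list[int] = []
    for _ in range(iterations):
        executed.extend(slots)  # re-executed from the IR, nothing refetched
    return executed, 1


executed, fetches = micro_loop(0x11223344, iterations=3)
print(len(executed), fetches)  # 12 1
```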
http://www.research.ibm.com/journal/...
From Reference
instructions are fetched ahead a word at a time into a 16-byte instruction queue (called a pre-fetch buffer on ROMP)
The short loop illustrated is executed entirely from the ROMP instruction prefetch buffer,
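The ROMP behaviour quoted above can be reduced to a simple fetch count. This Python sketch assumes the loop body is word-aligned, that a loop no longer than the 16-byte prefetch buffer is fetched once and then runs from the buffer, and that a longer loop is refetched on every iteration. These are simplifications, and the function name is hypothetical.

```python
import math


def loop_word_fetches(loop_bytes: int, iterations: int,
                      buffer_bytes: int = 16, word_bytes: int = 4) -> int:
    """Number of 4-byte word fetches needed to run a loop `iterations` times."""
    words_per_pass = math.ceil(loop_bytes / word_bytes)
    if loop_bytes <= buffer_bytes:
        return words_per_pass           # fetched once, then executed from the buffer
    return words_per_pass * iterations  # too big: fetched again every pass


print(loop_word_fetches(12, 10), loop_word_fetches(20, 10))  # 3 50
```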
Intel i860
My Summary:
All instructions are 32-bit. Two instructions are fetched at a time into the Instruction Cache (4K bytes) because of the 64-bit bus. Pipelined architecture, where the processing stages of successive instructions overlap. Simultaneous processing of one integer instruction and one floating-point instruction is possible.
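A small Python sketch of the 64-bit bus point: each bus transfer carries two 32-bit instructions. Which half holds the lower-addressed instruction is an assumption here (the low half), and the helper names are hypothetical; the sketch does not model the i860's rules for pairing an integer and a floating-point instruction.

```python
def pack_transfers(instructions: list[int]) -> list[int]:
    """Pack 32-bit instructions into 64-bit bus transfers, two per transfer."""
    transfers = []
    for k in range(0, len(instructions), 2):
        low = instructions[k] & 0xFFFFFFFF
        high = instructions[k + 1] & 0xFFFFFFFF if k + 1 < len(instructions) else 0
        transfers.append((high << 32) | low)  # lower address in the low half (assumption)
    return transfers


def unpack_transfer(transfer: int) -> tuple[int, int]:
    """Recover (first, second) 32-bit instructions from one 64-bit transfer."""
    return transfer & 0xFFFFFFFF, (transfer >> 32) & 0xFFFFFFFF


transfers = pack_transfers([0x11111111, 0x22222222, 0x33333333])
print(len(transfers), [hex(x) for x in unpack_transfer(transfers[0])])
```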
Intel i960
My Summary:
All instructions are 32-bit. Up to 4 instructions can be fetched in a “burst” into the Instruction Cache. The memory bus is actually 32 bits wide, so the 4 instructions are actually fetched sequentially from memory and put into the Instruction Cache. The Instruction Cache described here holds 128 instructions (512 bytes). The short loops described refer to looping within this “large” instruction cache, NOT like the PTSC microloop.
http://support.intel.com/design/i960...
The 80960MC gets optimal use of its memory bus bandwidth because the bus is tuned for use with the on-chip instruction cache: instruction cache line size matches the maximum burst size for instruction fetches. The 80960MC automatically fetches four words in a burst and stores them directly in the cache.
1.1.5 Instruction Cache
To further reduce memory accesses, the 80960MC includes a 512-byte on-chip instruction cache. The instruction cache is based on the concept of locality of reference; most programs are typically not executed in a steady stream but consist of many branches, loops and procedure calls that lead to jumping back and forth in the same small section of code. Thus, by maintaining a block of instructions in cache, the number of memory references required to read instructions into the processor is greatly reduced.
To load the instruction cache, instructions are fetched in 16-byte blocks; up to four instructions can be fetched at one time. An efficient prefetch algorithm increases the probability that an instruction is already in the cache when it is needed.
Code for small loops often fits entirely within the cache, leading to an increase in processing speed since further memory references might not be necessary until the program exits the loop. Similarly, when calling short procedures, the code for the calling procedure is likely to remain in the cache so it is there on the procedure’s return.
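The locality argument in section 1.1.5 can be checked with a toy cache model in Python. It uses the quoted 512-byte cache and 16-byte (four-instruction) blocks, but the direct-mapped organisation and the function name are assumptions for illustration, not the 80960MC's documented design. It counts how many four-word bursts a stream of instruction addresses needs.

```python
def count_bursts(addresses: list[int], cache_bytes: int = 512, block_bytes: int = 16) -> int:
    """Count 16-byte burst fills for a stream of instruction addresses (direct-mapped: assumption)."""
    n_lines = cache_bytes // block_bytes  # 512 / 16 = 32 lines
    lines = [-1] * n_lines                # -1 marks an empty line
    bursts = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % n_lines
        if lines[index] != block:         # miss: fetch the whole block in one burst
            lines[index] = block
            bursts += 1
    return bursts


small_loop = [0x1000 + 4 * k for k in range(10)]  # 10 instructions, 40 bytes
big_loop = list(range(0, 1024, 4))                # 256 instructions, twice the cache
print(count_bursts(small_loop * 100), count_bursts(big_loop * 2))  # 3 128
```

The small loop costs three bursts on its first pass and then runs from the cache; the loop that is twice the cache size misses on every block, every pass.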
1.1.11 High Bandwidth Local Bus
The 80960MC CPU resides on a high-bandwidth address/data bus known as the local bus (L-Bus). The L-Bus provides a direct communication path between the processor and the memory and I/O subsystem interfaces. The processor uses the L-Bus to fetch instructions, manipulate memory and respond to interrupts. L-Bus features include:
• 32-bit multiplexed address/data path
• Four-word burst capability which allows transfers from 1 to 16 bytes at a time
http://agoracom.com/ir/patriot/forum...
When I first commented about my pessimism on overcoming the Claim 29 rejection for the 584 patent, my comment was based upon reading the rejection notice and very quickly scanning a couple of the patents referenced.
After looking more carefully at the 4 patents referenced in the rejection notice, I am not as pessimistic. In my opinion, the 4 referenced patents describe significantly different architectures and are really not similar to the 584 processor architecture. I think that TPL should be able to discredit these 4 references.
I have not looked at the other 7 references, since they are not so easily retrieved. I wonder whether they are equally "irrelevant" to the 584.
Anyway, it seems to me that whoever did the USPTO review is not very technically competent if he cannot see how different the 4 referenced patents are from the 584 architecture.
Below are my notes:
584 Claim 29 Rejection Notes:
My Summary: This is a totally different “animal” for parallel processing, with many restrictions on the instruction groups to allow for such parallel processing.
From Patent:
Claim 1:
A method of concurrently executing n instructions in parallel, where n is an integer >1, on different sets of data, said method comprising the steps of:
compiling said instructions into groups of n different instructions for storage in a memory device, with no more than a fixed number of data and instruction fetches, and a fixed number of store operations being permitted in a group, and any branch instruction always being the last instruction in the group;
storing one group at a time of said instructions in a first storage device;
storing different sets of data in a second storage device;
retrieving said one group of instructions from said first storage devices; and
executing in parallel each of the retrieved n instructions in said one group concurrently in n different execution devices, with each of said n instructions utilizing a different set of the stored data in said second storage device for purposes of instruction execution.
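The group restrictions in Claim 1 amount to a check that the compiler has to apply to each group. This is a minimal Python sketch; the `Instr` fields, the limit parameters and the sample instructions are hypothetical, because the claim says only that the numbers of fetches and stores are "fixed." It allows one to n instructions per group, following the disclosure text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    fetches: int = 0         # data/instruction fetches the instruction performs (hypothetical)
    stores: int = 0          # store operations it performs (hypothetical)
    is_branch: bool = False


def is_valid_group(group: list[Instr], n: int, max_fetches: int, max_stores: int) -> bool:
    """Check a compiled group against Claim 1 style restrictions."""
    if not 1 <= len(group) <= n:
        return False
    if sum(ins.fetches for ins in group) > max_fetches:
        return False
    if sum(ins.stores for ins in group) > max_stores:
        return False
    # Any branch instruction must be the last instruction in the group.
    return not any(ins.is_branch for ins in group[:-1])


load = Instr("load", fetches=1)
add = Instr("add")
store = Instr("store", stores=1)
branch = Instr("branch", is_branch=True)
print(is_valid_group([load, add, store, branch], n=4, max_fetches=2, max_stores=1))  # True
print(is_valid_group([branch, load, add, store], n=4, max_fetches=2, max_stores=1))  # False
```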
DESCRIPTION
1. Technical Field
The invention is in the field of computing machines, and in particular is directed to instruction execution sequences. Specifically, the computing machine according to the present invention is directed to the concurrent execution of multiple instructions of a given type during a single machine cycle with each instruction utilizing a different set of data.
DISCLOSURE OF THE INVENTION
A computing machine which utilizes instructions that have been compiled into groups. A group consists of from one to n instructions, where n is an integer determined by the particular implementation of the machine. The integer four is utilized for purposes of the following description, however, it is to be understood that the group length can be less than four or greater than four. For a group of four, the group of instructions must conform to the following rules:
Sachs (4,933,835)
My Summary: Emphasis on Dual MMU’s (instruction cache MMU and data cache MMU).
Appears to be a processor with 16-bit instructions (“parcels”?). Since the cache delivers a 32-bit word, 2 instructions are retrieved from cache at a time. They are placed into “open” holding parcel registers in a pipeline and are eventually executed from the 16-bit instruction register, which holds one instruction.
The other references in the rejection allude to retrieving instructions from main memory into cache memory, NOT into the instruction register as “instruction groups.”
From Patent:
Instruction interface 1310 of processor 110 includes a multi-stage instruction bus 1311 which provides means for storing, in seriatim, a plurality of instruction parcels, one per stage. A cache advance signal ISEND is sent by the instruction interface as it has free space. This signals instruction cache-MMU 120 to provide an additional 32-bit word containing two 16-bit instruction parcels via instruction bus 121. This multi-stage instruction buffer increases the average instruction throughput rate.
Prefetch 2 instructions into holding registers.
In FIG. 3, prefetch buffer 1311 is shown in detail, comprising the four prefetch buffer register stages IH, IL, IA and IC. The IH register stage holds a 16-bit instruction parcel in register 1312 plus an additional bit of control information in register 1313, IHD, which bit is set to indicate whether IH currently contains a parcel. Each of the register stages is similarly equipped to contain an instruction parcel and an associated control bit. Buffer advance logic circuit 1314 administers the parcel and control bit contents of the four register stages. In response to the parcel advance control signal PADV from instruction decoder 103, buffer advance logic circuit 1314 gates the next available instruction parcel into instruction register 102 through multiplexor 1315, and marks empty the control bit associated with the register stage from which the parcel was obtained. In response to the control bits of the four stages, circuit 1314 advances the parcels to fill empty register stages. As space becomes available for new instruction parcels from the instruction cache-MMU, cache advance logic circuit 1316 responds to the control bits to issue the ISEND signal on instruction bus 121. Instruction cache-MMU responds with a 32-bit word containing two parcels. The high order parcel is received in IH, and the low order parcel in IL through multiplexor 1319.
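The four-stage prefetch buffer described above can be approximated in Python as a queue of 16-bit parcels. This is a simplified sketch: it treats the IH, IL, IA and IC stages as one four-entry FIFO rather than modelling each stage and its control bit, it assumes the high-order parcel comes first in program order, and it raises ISEND only when two stages are free, since each cache response carries two parcels. Class and method names are hypothetical.

```python
from collections import deque


class ParcelPrefetchBuffer:
    STAGES = 4  # IH, IL, IA, IC

    def __init__(self) -> None:
        self.parcels = deque()  # oldest parcel at the left

    def isend(self) -> bool:
        """Request another 32-bit word only when two stages are empty."""
        return self.STAGES - len(self.parcels) >= 2

    def receive_word(self, word: int) -> None:
        """Accept a 32-bit word from the instruction cache-MMU: two parcels."""
        if not self.isend():
            raise RuntimeError("no room for two parcels")
        self.parcels.append((word >> 16) & 0xFFFF)  # high-order parcel (assumed first)
        self.parcels.append(word & 0xFFFF)          # low-order parcel

    def padv(self) -> int:
        """On PADV, gate the next available parcel into the instruction register."""
        if not self.parcels:
            raise RuntimeError("buffer empty")
        return self.parcels.popleft()


buf = ParcelPrefetchBuffer()
buf.receive_word(0x11112222)
buf.receive_word(0x33334444)
print(buf.isend(), hex(buf.padv()), buf.isend(), hex(buf.padv()), buf.isend())
```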
An example of different operations by the two types of cache-MMUs is in instruction prefetch. When enabled, the instruction cache-MMU will prefetch the four instruction words that follow the current instruction address. This reduces system response time for strings of sequential instructions. Prefetch strategy is not as successful in reducing access delays for data, so no prefetch is done in the data cache-MMU.
Oklobdzija (4,714,994) 1985
My Summary:
Patent for an Instruction Prefetch Buffer (IPB). Instructions (“macro instructions”) can be 16-bit or 32-bit and come from the RAM into the IPB. 32 bits are retrieved from RAM simultaneously, so up to 2 instructions are fetched from RAM into the IPB at a time. The 8-bit opcodes of these “macro instructions” are used to access the appropriate 32-bit microinstruction (this is a microprogrammed processor architecture), which is then loaded into the Instruction Register (IR).
This is really not the same as the PTSC architecture, which can load up to 4 instructions directly into the IR. The IPB is an instruction cache for the “macro instructions,” and these “macro instructions” are not loaded directly into the IR.
From Patent:
Explaining in greater detail, in a preferred embodiment, the IPB array 10 is connected to a 32-bit wide processor bus 12 over which instructions are supplied to the IPB array 10 from a random access memory (RAM), not shown. In the drawing, the thirty-two lines of the bus 12 are indicated with a slash and the numeral "32" adjacent, and this convention is used throughout the various figures. Instructions are written into the IPB array 10 under the control of the IPB control 20 by means of a 4-bit write pointer, and instructions are read out of the IPB array 10 also under the control of the IPB control 20 by means of 5-bit read pointer. The 32 bits which are read out of the IPB array 10 include 8 bits of operation (OP) code, the remaining 24 bits comprising branch instruction (BI) and jump instruction (JI) fields. The 8-bit OP code is supplied to a microinstruction read only memory (ROM) 301. This 8-bit OP code serves as an address to select a 32-bit microinstruction which is transmitted to the CPU. This is conventional and well understood in the art. The 8-bit OP code is also supplied to an instruction length decoder 303, the output of which is supplied to both the CPU and the IPB control 20. As will be explained in more detail, the IPB array 10 is capable of handling variable length instructions, 16-bits or 32-bits in the preferred embodiment. Thus, it is necessary to provide an instruction length decoder 303 to recognize the length of the instruction which has been read out of the IPB array 10. Also, the 24-bit portion of the 32 bits read out of the IPB array 10 which comprises the BI and JI fields is supplied to both the CPU and the IPB control 20. The 32-bit instruction address from the instruction address register (IAR) 305 is supplied to both the CPU and the IPB control 20. Thus, the IPB control 20 receives as inputs 24 bits from the 32 bits read out of the IPB array 10, the decoded instruction length and the instruction address.
From this information, the IPB control 20 generates the 4-bit write and the 5-bit read pointers.
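A Python sketch of the IPB read/write pointer scheme. The 16-word array follows from the 4-bit write pointer; treating the 5-bit read pointer as addressing 32 halfwords (so a 32-bit read can start on any 16-bit boundary) is my inference, not something the quoted text states. Taking the OP code from the top 8 bits is also an assumption, the length decoder is a hypothetical function passed in by the caller, and the sketch does no full/empty checking.

```python
from typing import Callable


class InstructionPrefetchBuffer:
    WORDS = 16               # 4-bit write pointer -> 16 words of 32 bits
    HALFWORDS = 2 * WORDS    # 5-bit read pointer -> 32 halfword positions (inferred)

    def __init__(self, length_of_opcode: Callable[[int], int]) -> None:
        self.halfwords = [0] * self.HALFWORDS
        self.write_ptr = 0   # counts in 32-bit words
        self.read_ptr = 0    # counts in 16-bit halfwords
        self.length_of_opcode = length_of_opcode  # hypothetical: opcode -> 16 or 32 bits

    def write(self, word: int) -> None:
        """Store one 32-bit word arriving from the processor bus."""
        base = 2 * self.write_ptr
        self.halfwords[base] = (word >> 16) & 0xFFFF
        self.halfwords[base + 1] = word & 0xFFFF
        self.write_ptr = (self.write_ptr + 1) % self.WORDS

    def read(self) -> tuple[int, int, int]:
        """Read 32 bits at the read pointer; return (8-bit OP code, 32 bits read, length)."""
        hi = self.halfwords[self.read_ptr]
        lo = self.halfwords[(self.read_ptr + 1) % self.HALFWORDS]
        bits = (hi << 16) | lo
        opcode = bits >> 24  # would address the microinstruction ROM
        length = self.length_of_opcode(opcode)
        self.read_ptr = (self.read_ptr + length // 16) % self.HALFWORDS
        return opcode, bits, length


ipb = InstructionPrefetchBuffer(lambda op: 16 if op < 0x80 else 32)
ipb.write(0x12349ABC)
ipb.write(0xDEF05678)
for _ in range(3):
    opcode, bits, length = ipb.read()
    print(f"opcode={opcode:#04x} bits={bits:#010x} length={length}")
```

The second read starts on a halfword boundary and spans the two stored words, which is the kind of variable-length handling the patent describes.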
Zolnowsky (4,566,063) 1983
My Summary: Very similar to the Motorola MC68000 architecture. This is a pipelined, microprogrammed architecture. Multiple instructions (macroinstructions) are NOT loaded into the Instruction Register (IR) simultaneously. The reference to a microloop in the rejection is very flawed. This patent refers to looping within instructions in the pipeline (where pipeline contents can be modified in certain cases to take advantage of very short loops), not looping within instructions inside the IR (like PTSC’s processor).
http://agoracom.com/ir/patriot/forum...
With a fresh claim 29 on the '584, I humbly suggest that this negates the previous Markman construction and the following becomes even more important:
According to Leckrone, all ARM core families (ARM7, ARM9, ARM9E, ARM10E, ARM11) and the ARM Cortex microprocessor core family do infringe upon the US '584 as well as US 5,440,749 in the MMP Portfolio. He warned that all manufacturers of end user products using infringing ARM processors - with the exception of the 20 global manufacturers who have already purchased MMP Portfolio licenses - are infringers of technology protected by the MMP Portfolio. "The longer infringers wait to purchase an MMP license, the more they can expect to pay because our program is designed to reward first-movers in various industry categories."
Be well