Re: VNS PORTFOLIO follow-up
posted on Dec 14, 2009 12:30AM
It seems Mr. Moore has redefined the way computers (networks) and microchips (individual multi-core computers) communicate with each other, eliminating the need to communicate through a common bus and thus eliminating, or at least reducing, the reliance on a clock as the timing mechanism for instructions and data. Our in-house engineers need to look at this. Please note paragraph [0088] at the bottom of the page.
---------------------------------------------------------------------------
[0005]In the art of computing, processing speed is a much desired quality, and the quest to create faster computers and processors is ongoing. However, it is generally acknowledged in the industry that the limits for increasing the speed in microprocessors are rapidly being approached, at least using presently known technology. Therefore, there is an increasing interest in the use of multiple processors to increase overall computer speed by sharing computer tasks among the processors.
[0006]The use of multiple processors tends to create a need for communication between the processors. Indeed, there may well be a great deal of communication between the processors, such that a significant portion of time is spent in transferring instructions and data there between. Where the amount of such communication is significant, each additional instruction that must be executed in order to accomplish it places an incremental delay in the process which, cumulatively, can be very significant. The conventional method for communicating instructions or data from one computer to another involves first storing the data or instruction in the receiving computer and then, subsequently, calling it for execution (in the case of an instruction) or for operation thereon (in the case of data).
[0007]It would be useful to reduce the number of steps required to transmit, receive, and then use information, in the form of data or instructions, between computers. However, to the inventor's knowledge no prior art system has streamlined the above described process in a significant manner.
[0008]Also, in the prior art it is known that it is necessary to "get the attention" of a computer from time to time. That is, sometimes even though a computer may be busy with one task, another time sensitive task requirement can occur that may necessitate temporarily diverting the computer away from the first task. Examples include, but are not limited to, instances where a user input device is used to provide input to the computer. In such cases, the computer might need to temporarily acknowledge the input and/or react in accordance with the input. Then, the computer will either continue what it was doing before the input or else change what it was doing based upon the input. Although an external input is used as an example here, the same situation occurs when there is a potential conflict for attention between internal aspects of the computer, as well.
[0009]When receiving data and change in status from I/O ports there have been two methods available in the prior art. One has been to "poll" the port, which involves reading the status of the port at regular intervals to determine whether any data has been received or a change of status has occurred. However, polling the port consumes considerable time and resources which could usually be better used doing other things. A better alternative has often been the use of "interrupts". When using interrupts, a processor can go about performing its assigned task and then, when an I/O Port/Device needs attention as indicated by the fact that a byte has been received or status has changed, it sends an Interrupt Request (IRQ) to the processor. Once the processor receives an Interrupt Request, it finishes its current instruction, places a few things on the stack, and executes the appropriate Interrupt Service Routine (ISR) which can remove the byte from the port and place it in a buffer. Once the ISR has finished, the processor returns to where it left off. Using this method, the processor doesn't have to waste time, looking to see if the I/O Device is in need of attention, but rather the device will only service the interrupt when it needs attention. However, the use of interrupts, itself, is far less than desirable in many cases, since there can be a great deal of overhead associated with the use of interrupts. For example, each time an interrupt occurs, a computer may have to temporarily store certain data relating to the task it was previously trying to accomplish, then load data pertaining to the interrupt, and then reload the data necessary for the prior task once the interrupt is handled. Interrupts disturb time-sensitive processing. Essentially they make timing unpredictable. Obviously, it would be desirable to reduce or eliminate all of this time and resource consuming overhead. However, no prior art method has been developed which has alleviated the need for interrupts.
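(Poster's sketch, not part of the application: a toy Python model of the trade-off described in [0009]. The function names, the tick-based timing, and the `context_cost` figure of 3 are all my own assumptions, chosen only to show that polling pays on every tick while interrupts pay a save/restore cost per event.)

```python
# Toy comparison of polling and interrupt-style handling.
# Everything here is a hypothetical model, not taken from the patent text.

def run_polling(arrivals, total_ticks):
    """Check the port on every tick; return (bytes_handled, status_checks)."""
    pending = set(arrivals)
    handled = 0
    checks = 0
    for tick in range(total_ticks):
        checks += 1              # every tick costs one status read
        if tick in pending:
            handled += 1         # data was waiting, so take it
    return handled, checks


def run_interrupts(arrivals, total_ticks, context_cost=3):
    """Run a handler only on arrival; return (bytes_handled, overhead_ticks)."""
    handled = 0
    overhead = 0
    for tick in sorted(arrivals):
        if tick < total_ticks:
            overhead += context_cost   # assumed cost: save state, run ISR, restore
            handled += 1
    return handled, overhead
```

With two bytes arriving in ten ticks, `run_polling([3, 7], 10)` spends ten status checks, while `run_interrupts([3, 7], 10)` spends six ticks of assumed context-switch overhead. Neither cost is zero, which is the complaint the paragraph is making.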
[0010]Conventional parallel computing usually ties a number of computers to a common data path or bus. In such an arrangement individual computers are each assigned an address. In a Beowulf cluster for example individual PC's are connected to an Ethernet by TCP/IP protocol and given an address or URL. When data or instructions are conveyed to an individual computer they are placed in a packet addressed to that computer.
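(Poster's sketch, not part of the application: the common-bus arrangement in [0010] can be pictured as every computer seeing the packet and only the addressed one accepting it. The `broadcast` function, the packet fields, and the addresses below are hypothetical.)

```python
# Hypothetical model of addressed delivery over a shared bus.

def broadcast(bus_nodes, packet):
    """Offer a packet to every node on the bus; only the addressed node keeps it."""
    for address, inbox in bus_nodes.items():
        if address == packet["dest"]:      # address match decides acceptance
            inbox.append(packet["payload"])
    return bus_nodes


nodes = {"10.0.0.1": [], "10.0.0.2": []}
broadcast(nodes, {"dest": "10.0.0.2", "payload": "job"})
```

After the call, only the inbox for `10.0.0.2` holds the payload. The array described later in the application has no such shared bus or per-computer address, which is why it needs a different delivery scheme.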
[0011]Direct connection of a plurality of computers, for example by separate, single-drop buses to adjacent, neighboring computers, without a common bus over which to address the computers individually, and asynchronous operation, rather than synchronously clocked operation of a computer system, are also known in the art, as described, for example in Moore et al. (U.S. Pat. App. Pub. No. 2007/0250682 A1). Asynchronous circuits can have a speed advantage, as sequential events can proceed at their actual pace rather than in a predetermined number of clock cycles; further, asynchronous circuits can require fewer transistors to implement, and need less operating power, as only the active circuits are operating at a given moment; and still further, distribution of a single clock is not required, thus saving layout area on a microchip, which can be advantageous in single-chip and embedded system applications. A related problem is how to efficiently transfer data and instructions to individual computers in such a computer. This problem is more difficult due to the architecture of this type of computer not including separately addressable computers.
SUMMARY
[0012]Briefly, an embodiment of the present invention is a computer having its own memory such that it is capable of independent computational functions. In one embodiment of the invention a plurality of the computers, also known as nodes, cores, or processors, are arranged in an array. In another embodiment each of the computers of the array is directly connected to adjacent, neighboring computers, without a common bus over which to address the computers directly. In yet another embodiment, the array is disposed on a single microchip. In order to accomplish tasks cooperatively, the computers must pass data and/or instructions from one to another. Since all of the computers working simultaneously will typically provide much more computational power than is required by most tasks, and since whatever algorithm or method that is used to distribute the task among the several computers will almost certainly result in an uneven distribution of assignments, it is anticipated that at least some, and perhaps most, of the computers may not be actively participating in the accomplishment of the task at any given time. Therefore, it would be desirable to find a way for under-used computers to be available to assist their busier neighbors by "lending" either computational resources, memory, or both. In order that such a relationship be efficient and useful it would further be desirable that communications and interaction between neighboring computers be as quick and efficient as possible. Therefore, the present invention provides a means and method for a computer to execute instructions and/or act on data provided directly from another computer, rather than having to receive and then store the data and/or instructions prior to such action. It will be noted that this invention will also be useful for instructions that will act as an intermediary to cause a computer to "pass on" instructions or data from one other computer to yet another computer.
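(Poster's sketch, not part of the application: a toy Python stack machine contrasting the conventional "store, then execute" path with executing words directly as they arrive from a neighbor, as [0012] describes. The two-instruction machine and the step counting are my own assumptions for illustration.)

```python
# Hypothetical toy machine: an int is pushed, the word "add" sums the top two.

def execute(word, stack):
    """Apply one toy instruction to the stack."""
    if word == "add":
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)
    else:
        stack.append(word)


def store_then_execute(port_words):
    """Conventional path: copy every word into local RAM, then run from RAM."""
    ram, stack, steps = [], [], 0
    for word in port_words:      # phase 1: receive and store
        ram.append(word)
        steps += 1
    for word in ram:             # phase 2: fetch and execute
        execute(word, stack)
        steps += 1
    return stack, steps


def execute_from_port(port_words):
    """Direct path: execute each word as it arrives, with no intermediate store."""
    stack, steps = [], 0
    for word in port_words:
        execute(word, stack)
        steps += 1
    return stack, steps
```

For the stream `[2, 3, "add"]` both paths leave `[5]` on the stack, but the store-first path counts six steps against three for direct execution in this model.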
[0013]Still yet another aspect of the desired embodiment is that, data and instructions can be efficiently loaded and executed into individual computers and/or transferred between such computers. This can be accomplished without recourse to a common bus even when each computer is only directly connected to a limited number of neighbors.
[0014]The invention includes a stream loader process, sometimes also referred to as a port loader, for loading programs using port execution. This process can be used to send a stream of compiled object code to various nodes of a multicore processor by using the processor's port execution facility. The stream will enter through an I/O node, and then be sent through ports to other nodes. By use of this facility, programs can be sent to the RAM of any node or combination of nodes, and also the stacks and registers of nodes can be initialized so that the programs sent to the RAM do not have to contain initialization code. By suitable manipulation of instructions the stream may be sent to multiple nodes simultaneously, allowing branching and other complex stream shapes.
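(Poster's sketch, not part of the application: one way to picture the stream loader in [0014] when nodes have no addresses. I am assuming a simple chain of neighbors in which each node keeps the first remaining segment of the stream and forwards the rest; the actual stream format is not given in the text quoted here, and `stream_load` is a hypothetical name.)

```python
# Hypothetical model: position in the stream, not an address, picks the destination.

def stream_load(path, segments):
    """Deliver code segments along a chain of neighboring nodes.

    Each node on the path keeps the first remaining segment as its RAM
    image and passes everything after it to the next neighbor.
    """
    ram = {}
    remaining = list(segments)
    for node in path:
        if not remaining:
            break                        # stream exhausted before the path ended
        ram[node] = remaining.pop(0)     # keep own segment, forward the rest
    return ram
```

Calling `stream_load(["io", "n1", "n2"], [["a"], ["b"], ["c"]])` gives each node on the path its own segment. Branching streams, and the stack and register initialization the paragraph mentions, are left out of this sketch.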
[0015]These and other objects and advantages of the present invention will become clear to those skilled in the art in view of the description of modes of carrying out the invention, and the industrial applicability thereof, as described herein and as illustrated in the several figures of the drawing. The objects and advantages listed are not an exhaustive list of all possible advantages of the invention. Moreover, it will be possible to practice the invention even where one or more of the intended objects and/or advantages might be absent or not required in the application.
[0016]Further, those skilled in the art will recognize that various embodiments of the present invention may achieve one or more, but not necessarily all, of the described objects and/or advantages. Accordingly, the objects and/or advantages described herein are not essential elements of the present invention, and should not be construed as limitations.
This has been found by the inventor to provide important advantages. For example, since a clock signal does not have to be distributed throughout the computer array 10, a great deal of power is saved. Furthermore, not having to distribute a clock signal eliminates many timing problems that could limit the size of the array 10 or cause other known difficulties. Also, the fact that the individual computers operate asynchronously saves a great deal of power, since each computer will use essentially no power when it is not executing instructions, since there is no clock running therein.
[0088]Similarly, while the present invention has been described primarily herein in relation to communications between computers 12 in an array 10 on a single die 14, the same principles and methods can be used, or modified for use, to accomplish other inter-device communications, such as communications between a computer 12 and its dedicated memory or between a computer 12 in an array 10 and an external device.
Read more: http://www.faqs.org/patents/app/20090300334#ixzz0ZdRue4US