How to scale the optical interconnect using Co-Packaged Optics (CPO)
Posted on Aug 31, 2024 04:56PM
Optics are becoming increasingly important in both the front-end and back-end networks of AI clusters. As communication bandwidth increases, CPO paves the path forward for optical links in these clusters.
GPU clusters used to train large language models require a high degree of parallel processing, generating renewed interest in optical interconnect scale and innovation. Today’s AI clusters contain tens of thousands of GPUs, and companies hope to build clusters of over one million GPUs before the end of the decade. Because a cluster spans many rows of server racks, optical interconnects that enable GPU-to-GPU communication through a switch fabric are growing rapidly in volume. But even with the recent growth in scale-out cluster volume for back-end networks, most GPU bandwidth is still routed locally through electrical links. Future migrations to larger scale-up domains (e.g., hundreds of GPUs) could increase the optical bandwidth requirement per GPU by another order of magnitude. Broadcom has invested in co-packaged optics to scale the optical interconnect into the age of AI. In this post I will focus on two areas of scale: integration and manufacturing.
Integration is at the heart of CPO, and has to be done right.
The first image below visually summarizes our approach to high-density photonic integration for CPO. Charlie Kawwas, Broadcom’s Semiconductor Solutions Group President, compares the 128 discrete 400G optical modules required to fully populate a 51.2 Tbps switch with Broadcom’s equivalent Bailly 51.2T CPO solution. All 128 optical modules (denoted by the blue tabs) surrounding the table collapse into eight 6.4 Tbps optical engines co-packaged on a common substrate with a 51.2 Tbps Tomahawk® 5 switch ASIC.
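The module-count arithmetic above can be checked with a quick back-of-envelope sketch (figures taken from this post; the script itself is just illustrative):

```python
# Back-of-envelope check: discrete pluggables vs. co-packaged optical
# engines needed to fully populate a 51.2 Tbps Tomahawk 5 switch.
SWITCH_TBPS = 51.2

# Traditional approach: 400 Gbps pluggable transceivers.
pluggable_gbps = 400
modules = SWITCH_TBPS * 1000 / pluggable_gbps
print(f"{modules:.0f} x 400G pluggable modules")   # 128 modules

# CPO approach: 6.4 Tbps Bailly optical engines on the switch substrate.
engine_tbps = 6.4
engines = SWITCH_TBPS / engine_tbps
print(f"{engines:.0f} x 6.4T co-packaged engines")  # 8 engines
```

The same total switch bandwidth is reached either way; the difference is 128 separately assembled modules versus eight engines sharing a package with the ASIC.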
The second image offers a close-up of our TH5-Bailly 51.2T CPO device. Each 6.4 Tbps Bailly optical engine contains hundreds of photonic components, delivering an order-of-magnitude increase in integration density over the silicon photonics used in traditional pluggable transceivers. As a point of reference, we can deliver 6.4 Tbps in just double the silicon area of our 400 Gbps photonic integrated circuit (PIC). That is an 8x improvement in silicon area efficiency, with positive implications for cost, power, and shoreline bandwidth density. CPO offers a compelling way to match growing GPU I/O bandwidth requirements with equivalent beachfront optical interconnect density.
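The 8x area-efficiency figure follows directly from the numbers quoted above, as a short sketch shows (areas are in arbitrary units, since the post gives only the 2x ratio):

```python
# Silicon area efficiency: a 6.4 Tbps Bailly PIC fits in roughly twice
# the silicon area of the earlier 400 Gbps PIC (ratios from the post).
old_gbps, old_area = 400, 1.0     # 400 Gbps PIC, unit area
new_gbps, new_area = 6400, 2.0    # 6.4 Tbps PIC, double the area

old_density = old_gbps / old_area  # 400 Gbps per unit area
new_density = new_gbps / new_area  # 3200 Gbps per unit area
print(new_density / old_density)   # 8.0 -> the 8x improvement
```

In other words, 16x the bandwidth in 2x the area yields the 8x gain in bandwidth per unit of silicon.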
One of the key challenges to solve as we take CPO mainstream is the ability to manufacture at scale. Innovations in manufacturing become as important as the technical innovations that integrate high-bandwidth optics with switches.
To meet the evolving interconnect demands highlighted above, Broadcom has also invested in CPO manufacturing automation at scale. Traditional pluggable transceivers suffer from unpredictable quality and reliability because of the manual assembly and test processes prevalent throughout the industry. At Broadcom, we have emphasized selecting best-in-class silicon manufacturing processes where available; where standard processes and tools do not exist, we build our own automation. Please watch the following video, which gives a sneak peek into our end-to-end automated manufacturing process. From PIC and EIC fabrication to wafer test, chip-to-wafer bonding, optical component attach, and CPO assembly and test, we are working to minimize the sources of variation introduced by manual handling. We hope this video demonstrates Broadcom’s commitment to building and shipping CPO in high volume, not merely testing the market to see if the technology sticks.
https://www.broadcom.com/blog/how-to-scale-the-optical-interconnect-using-co-packaged-optics-cpo