Jotunn - Generative AI Platform

Overview

The "Memory Wall" was first conceived as a theory by Wulf and McKee in 1994. It posited that the development of the processing unit (CPU) far outpaced that of the memory. As a result the rate at which the data can be transferred to and from memory will force the processor to wait until the data is available for the processing.

In traditional architectures the problem has been mitigated by a hierarchical memory structure built around multiple levels of cache staging the data to minimize the amount of traffic to the main memory or to the external memory.

The recently introduced Generative AI (for example, ChatGPT, DALL-E, Diffusion, etc.) dramatically expanded the amount of parameters necessary for performing the task at hand. For example, GPT-3.5 requires 175 billion parameters and GPT-4, launched in April 2023, supposedly requires almost 2 trillion parameters. All of these parameters needs to be accessed during inference or training, posing a problem as traditional systems are not designed to handle such vast amount of data without resorting to the traditional hierarchical memory model. Unfortunately the more levels needed to be traversed to read or store the data the longer time it takes. As a result the processing elements will be forced to wait longer and longer for data to process, lengthening the latency and dropping the implementation efficiency.

Recent findings show that the efficiency running GPT-4, the most recent GPT algorithm, drops to around 3%. That is, the very expensive hardware designed to run these algorithms sits idle 97% of the time!

The flip-side is that the amount of hardware required to reach reasonable compute numbers will be staggering. In July 2023, EE Times reported that Inflection is planning to use 22,000 Nvidia H100 GPUs in their supercomputer, an investment of ~$800M. Assuming an average power consumption of 500Watts per H100, the total power draw would be an astounding 11 MWh!

Based on a fundamental new architecture, Jotunn allows data to be fed to the processing units 100% of the time regardless of the number of compute elements. Algorithm efficiencies, even for the large models like GPT-4, will exceed 50%. Jotunn4 will significantly outperform anything that is currently on the market!

Please sign in to view full IP description :

业务合作

成为合作伙伴

广告发布

访问我们的广告选项

诚邀广告发布

添加产品

供应商免费录入产品信息

公布产品信息

点击此处了解更多关于D&R的隐私政策

本网站的任何部分未经Design&Reuse许可，
不得复制，重发，转载或以其他方式使用。