
Most of the worries about an AI bubble involve investments in businesses that built their large language models and other forms of generative AI on the concept of the transformer, an innovative type of neural network that eight years ago laid the foundations for the current boom.
But behind the scenes, artificial-intelligence researchers are pushing into new approaches that could pack an even bigger payoff.
One early-stage startup developing a transformer alternative, Palo Alto, Calif.-based Pathway, plans to announce Monday that its “Dragon Hatchling” architecture now runs on Nvidia AI infrastructure and Amazon Web Services’ cloud and AI tech stack.
The company has shipped the Dragon Hatchling architecture but doesn’t plan to release commercial models trained on it until next year. Once that happens, its Nvidia and AWS compatibility means companies would be able to put those models into production “the next day,” Pathway said.
Dragon Hatchling imbues AI with memory that large language models can’t match, according to Pathway, theoretically enabling a new class of continuously learning, adaptive AI systems. The company also casts its approach as a potentially faster way to get to artificial general intelligence, which some people describe as similar to human-level cognitive ability.
The company isn’t alone in this quest. It regards the large and well-established Anthropic as its biggest obstacle. It faces other challenges, too, such as convincing potential users who have just learned one set of AI vocabulary and skills to adopt something new.
Regardless of whether Pathway fulfills its ambitions, it will at least get a chance to make its case to the market. Its arrival also reinforces the intense scientific effort driving AI forward, even as big deals, big valuations and big personalities command the attention.
‘Equations of reasoning’
“This is just fun, right?” said Zuzanna Stamirowska, co-founder and chief executive officer at Pathway, when I met with her and another member of the team at Wall Street Journal headquarters in November. She was enthusing about Pathway’s approach, likening it to scientists’ discovery of thermodynamics, which accelerated the Industrial Revolution by shifting society from simply building engines to understanding the laws of heat and energy that govern them.
Pathway has identified what Stamirowska calls equations of reasoning, fundamental mathematical axioms that explain how intelligence emerges from smaller, local interactions in the brain, she said. That means the company can explain how and why intelligence works, rather than merely observing that it does, something researchers have struggled to do with transformer-based models.
That also helps Pathway address a typical limit of large language models, their difficulty building on previous interactions, by strengthening or weakening synapses over time according to their use, said Stamirowska, who holds a Ph.D. in complex systems and has published research on emergent behavior in dynamic networks. She has also received France’s i-Lab innovation prize and was named one of “100 geniuses whose innovation will change the world” by the magazine Le Point.
“Memory is key to intelligence and efficient reasoning,” Stamirowska said.

In the transformer, short-term memory and long-term memory are organized in an incompatible manner, with no clear way to transfer from short-term memory to long-term memory, according to Stamirowska. “It is not just a technicality. It is a foundational obstacle,” she said.
Pathway’s architecture organizes short-term memory very differently from the transformer, with an update mechanism that resembles what is found in the brain and, crucially, with the same storage pattern as long-term memory, according to Stamirowska. “This opens the door to lifelong learning with transfer from short- to long-term memory, and moving smoothly to longer reasoning,” she said.
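Pathway hasn’t published the implementation details behind those claims, so here is a minimal, hypothetical sketch in Python of the contrast she is drawing. It sets a transformer-style context window, which discards anything older than the window, against a Hebbian-style rule in which connection weights strengthen with use and decay otherwise, so a single matrix holds both short- and long-term memory. The names here (attend, hebbian_update, recall) are invented for illustration, not drawn from Pathway’s code.

```python
# A minimal, hypothetical sketch (not Pathway's published method) contrasting
# a transformer-style rolling context window with a Hebbian-style update rule,
# where weights strengthen with use and the same matrix serves as both the
# short- and long-term store. All names here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
DIM, WINDOW = 8, 4

context = []  # transformer-style short-term memory: a rolling buffer

def attend(query, buffer):
    """Toy attention over whatever still fits in the context window."""
    if not buffer:
        return np.zeros(DIM)
    keys = np.stack(buffer)
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

# Hebbian-style memory: one weight matrix, updated on every interaction.
# Connections that fire together strengthen; unused ones slowly decay.
W = np.zeros((DIM, DIM))
LEARNING_RATE, DECAY = 0.1, 0.01

def hebbian_update(pre, post):
    global W
    W = (1 - DECAY) * W + LEARNING_RATE * np.outer(post, pre)

def recall(cue):
    """Retrieval is a single pass through the learned weights."""
    return W @ cue

# Simulate a long stream of interactions.
for _ in range(100):
    x = rng.normal(size=DIM)
    context.append(x)
    context[:] = context[-WINDOW:]  # older activations simply fall away
    hebbian_update(x, x)            # but every event reshapes the weights

cue = rng.normal(size=DIM)
short_term = attend(cue, context)   # sees only the last 4 events
long_term = recall(cue)             # carries a decaying trace of all 100
print(len(context), np.linalg.norm(long_term))
```

In this toy setup the context buffer forgets everything older than four steps, while the weight matrix retains a compressed, decaying trace of the whole stream, the kind of short- to long-term transfer Stamirowska describes.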
The company was founded in 2020 by Stamirowska, Chief Operating Officer Claire Nouet, Chief Scientific Officer Adrian Kosowski and Chief Technology Officer and Google Brain veteran Jan Chorowski. The 26-member team includes eight Ph.D.s, including Kosowski, a theoretical computer scientist, mathematician and quantum physicist who received his doctorate at age 20.
The company said it has raised more than $20 million, including more than $16.2 million in venture funding and about $3.8 million in non-dilutive research and development grants. Backers include Lukasz Kaiser, one of eight Google researchers who kicked off the transformer era in 2017 with the paper “Attention Is All You Need,” and early-stage investor TQ Ventures. The company declined to disclose its valuation.
Stamirowska said the name of Pathway’s architecture is inspired by the dragons in Terry Pratchett’s novel “The Color of Magic,” which appear more frequently as the characters think about them. “For now, we have presented to the world an architecture, hence it’s a hatchling,” she said.
Accelerating innovation
The company expects the architecture to have broad applications in solving problems in business, finance and beyond.
Pathway Chief Commercial Officer Victor Szczerba distinguishes between “commodity” AI tasks such as approving a customer discount and more demanding projects such as end-of-quarter financial planning. “This process spans eight weeks, involves coordination across 10 departments, and requires maintaining context over a long period,” Szczerba said. “Pathway’s architecture is designed to handle this complexity by remembering sequences and consequences over time, rather than resetting with every interaction.”
The technology could also help in managing complex supply-chain variability. For example, a steel manufacturer faced with a sudden shortage of tungsten could apply Pathway’s framework, which can learn from limited amounts of private data without exposing that data to the world. Other potential applications lie in areas such as fusion research, space exploration and the optimization of global trading networks, according to Stamirowska.
In all of these examples, the key is a need for real innovation. To come up with a new spaceship design, an AI model can’t just ingest lots of data on other spaceships and learn from it. The task requires a model capable of generalizing, or learning to reason, rather than pattern matching.
“We will speed up innovation cycles dramatically,” Stamirowska said. “The problem with current transformer-based models is that they need a lot of data, and they don’t generalize outside…of what they have seen.”
While a bubble may have formed around the concept of the large language model, that need not be true of AI itself, according to Martín Farach-Colton, chair of the Computer Science and Engineering Department at the NYU Tandon School of Engineering. LLMs face limits in three areas: explaining how they arrive at their answers; generalizing beyond the data they are trained on; and multimodality, the ability to process text, images, video and spatial reasoning simultaneously. Considerable effort is going into addressing these shortcomings, however, especially the third.
Pathway’s architecture could help address the first two problems, according to Farach-Colton, who said he has known some Pathway team members professionally but has no financial or commercial ties to the company.
“The market may be overvaluing the current iteration of the technology [LLMs] while potentially underestimating or misunderstanding the necessity of the next architectural leap,” he said.
Write to Steven Rosenbush at steven.rosenbush@wsj.com
