Current Issues with Gen AI
I’ve recently started working in the Gen AI space and am really enjoying the optimism, though at times I find the hype a little overwhelming. I have attended several Google Developer Network events, AWS events, startup events, and a host of other conferences around Asia. What surprised me was how polarizing AI is among well-known speakers: some call it the end of the world, while others see it as a productivity boon that will usher in an age of prosperity and abundance.
I took a closer look at today’s large language models (LLMs) and noted some key facts and issues that organizations and their engineering teams often overlook when discussing the much-touted productivity gains:
LLMs are, at their core, tools that predict the next token, and they require significant manual effort to engineer into usable applications; there is no superhuman intelligence underneath. For example, embedding an LLM in a customer support workflow often requires custom logic for context retention, handling out-of-scope queries, and escalation paths.
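To make that concrete, here is a minimal sketch of the kind of wrapper logic I mean. Everything in it is hypothetical (the topic list, the `SupportSession` class, the stubbed model call); the point is that context retention, scope checks, and escalation all live outside the model.

```python
from dataclasses import dataclass, field

# Hypothetical scope list: real deployments derive this from intent classifiers.
IN_SCOPE_TOPICS = {"billing", "shipping", "returns"}

@dataclass
class SupportSession:
    """Keeps the conversation context that a bare, stateless LLM call forgets."""
    history: list = field(default_factory=list)

    def handle(self, topic: str, message: str, llm=None) -> str:
        # Out-of-scope queries are escalated instead of being sent to the model.
        if topic not in IN_SCOPE_TOPICS:
            return "ESCALATE: routing to a human agent"
        self.history.append(message)      # manual context retention
        prompt = "\n".join(self.history)  # the model only sees what we pass it
        # `llm` would be a real completion call in production; stubbed here.
        return llm(prompt) if llm else f"LLM reply to: {message}"
```

None of this logic comes for free with the model API; every team ends up building some version of it by hand.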
LLMs are generally embedded into very data-intensive and sensitive applications. These applications need careful tuning with a huge amount of clean and relevant data, which is always a challenge to procure and maintain. A key challenge I encountered was integrating sensitive healthcare data into an AI chatbot, where compliance with HIPAA and GDPR required extensive anonymization and redaction processes.
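The anonymization step above can be sketched as a redaction pass that runs before any text reaches the model or a training set. This is a toy illustration with two made-up patterns; real HIPAA/GDPR pipelines use NER models and audited rule sets, not a pair of regexes.

```python
import re

# Hypothetical minimal pattern set for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected identifiers with typed placeholders so the raw
    values never reach the LLM or its logs."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders (rather than blanking the text) keep the redacted transcript useful for debugging and audits.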
AI applications need a lot of GPUs, which are very costly to purchase and run. Few companies can afford their own, so most survive on APIs provided by cloud providers or OpenAI. At Google, I’ve seen small startups rely heavily on Vertex AI for GPU access, often limiting their scale as API costs rise. This creates an ecosystem where only well-funded companies can afford to innovate at scale.
The recurring infrastructure (servers) cost of running Gen AI applications is often more than the users’ subscription fees. Even when companies want to subsidize Gen AI feature development, it often ends up being unaffordable for users. For instance, a Gen AI-powered analytics platform I worked with spent 70% of its revenue on infrastructure costs, forcing them to redesign their pricing and architecture.
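The unit economics are easy to sketch. The numbers below are hypothetical (not from the platform I mentioned), but they show how per-request inference cost can outrun a flat subscription fee.

```python
def monthly_margin(subscribers: int, fee: float,
                   requests_per_user: int, cost_per_request: float) -> float:
    """Revenue minus inference spend; ignores staff, storage, and support."""
    revenue = subscribers * fee
    infra = subscribers * requests_per_user * cost_per_request
    return revenue - infra

# Hypothetical plan: $10/month, 300 LLM requests per user at $0.04 each
# means $12 of inference per $10 of revenue, i.e. a loss on every user.
```

Usage-based pricing or aggressive caching is often the only way out of that arithmetic.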
Despite being around for several years, LLMs still critically suffer from hallucinations and accuracy issues. In one case, an LLM incorrectly summarized a financial report, leading to misinformation being propagated in decision-making.
Any specialized Gen AI use case needs grounding with business data, usually powered by vector or embedding search over data stored somewhere (preferably a database). Vector search operations are more compute-intensive than traditional keyword search. For example, I worked on a LangChain integration with AlloyDB that used Postgres vector stores for embeddings. The relevance gains were significant, but optimizing for latency required careful batching and indexing strategies.
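The cost difference is visible even in a toy brute-force implementation: every query must score every stored vector, which is exactly why vector stores add approximate-nearest-neighbor indexes (e.g. IVFFlat or HNSW in pgvector) on top. This sketch uses plain Python lists as stand-in embeddings.

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list, docs: dict, k: int = 2) -> list:
    """Brute-force search: O(documents) similarity computations per query,
    versus an O(1)-ish hash probe for an exact keyword lookup."""
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

Batching queries and pushing the scan into an indexed database (as the pgvector-backed stores do) is what makes this viable at production scale.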
Agentic workflows are slow and unreliable, and failures often stay buried in the back-and-forth between agents. Developers feel like fixing one last issue will get the tool working end-to-end, yet new users routinely break the entire flow. In a recent demo, I showcased an agent-based system for automating customer onboarding. While the agents performed well in controlled scenarios, unexpected user inputs frequently caused breakdowns, necessitating fallback mechanisms.
As of today, AI tools need AI engineers to function. AutoML solutions are bridging the gap for non-technical users, but deploying robust, scalable AI systems still demands expertise in data pipelines, model optimization, and monitoring.
While these challenges are significant, they also represent opportunities for innovation and collaboration in the developer community. By addressing these issues—through better tools, cost optimization, and improved reliability—we can unlock AI’s full potential to drive meaningful change.