TinyAgent: Function Calling at the Edge
The ability of Large Language Models (LLMs) to execute commands through plain language has paved the way for agentic systems that can complete user queries by orchestrating the right tools. Recent advancements in multi-modal models like GPT-4o and Gemini-1.5 have further expanded the capabilities of AI agents. However, the large size and computational requirements of these models often necessitate cloud inference, posing challenges for widespread adoption.
Uploading data to the cloud raises privacy concerns, constant cloud connectivity cannot be guaranteed in many deployment settings, and round trips to a remote server add latency that hurts response times. To address these challenges, deploying LLMs locally at the edge has emerged as a viable alternative.
Can Smaller Language Models Recreate Emergent Abilities?
This raises a question: can smaller language models emulate the emergent abilities of larger models, such as function calling, while shrinking the computational footprint enough for edge deployment? Our research explores this possibility by training small models on specialized, high-quality data, avoiding the need to memorize generic world knowledge.
Developing Small Language Models for Complex Reasoning
Our study focuses on developing Small Language Models (SLMs) capable of the complex reasoning needed for agentic workloads, while remaining suitable for secure and private edge deployment. By curating specialized data and fine-tuning on it, we aim to enhance the function calling capabilities of these models.
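To make the data curation step concrete, the sketch below shows what a single curated function-calling training sample might look like: a natural-language query paired with the tool-call plan the model should learn to emit. The tool names, argument schema, and serialization are illustrative assumptions, not TinyAgent's actual format.

```python
# Hypothetical curated training sample: a user query paired with the
# sequence of tool calls that fulfills it. Tool names are illustrative.
training_sample = {
    "query": "Email the quarterly report to Alice and add a reminder for Friday.",
    "tools": [
        {"name": "compose_email",
         "args": {"to": "alice@example.com", "subject": "Quarterly report"}},
        {"name": "create_reminder",
         "args": {"title": "Follow up on report", "date": "Friday"}},
    ],
}

def render_target(sample):
    """Serialize the tool-call plan into the text the model is trained to emit."""
    return "\n".join(f'{t["name"]}({t["args"]})' for t in sample["tools"])

print(render_target(training_sample))
```

Fine-tuning on many such (query, plan) pairs teaches the model the narrow skill of mapping requests to tool invocations, rather than broad world knowledge.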
Enhancing Model Performance
Through curated data and fine-tuning, we substantially improved the function calling performance of off-the-shelf SLMs. The resulting TinyAgent models outperform larger general-purpose models on function calling tasks, achieving high success rates.
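One simple way to score function calling is an exact-match success rate: a prediction counts only if the model emits the right tools with the right arguments in the right order. This is a minimal sketch of such a metric, not the evaluation protocol the study actually used.

```python
def plan_success(predicted, reference):
    """A plan succeeds only if tool names, arguments, and order all match."""
    return predicted == reference

def success_rate(predictions, references):
    """Fraction of queries whose predicted plan exactly matches the reference."""
    hits = sum(plan_success(p, r) for p, r in zip(predictions, references))
    return hits / len(references)

# Toy example: the second prediction calls the wrong tool argument.
refs  = [["open_app('Mail')"], ["search('weather')", "read_result()"]]
preds = [["open_app('Mail')"], ["search('news')",    "read_result()"]]
print(success_rate(preds, refs))  # 0.5
```

Stricter or looser variants (e.g. ignoring call order, or checking only tool names) trade off how much partial credit a plan receives.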
Efficient Deployment at the Edge
To enable efficient deployment at the edge, we apply quantization to reduce the model's memory footprint and inference latency. The quantized models show significant improvements in resource consumption without sacrificing function calling performance.
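The core idea of weight quantization can be sketched in a few lines: map float32 weights to int8 with a per-tensor scale, cutting storage by 4x at the cost of a small rounding error. This is a generic symmetric-quantization illustration, not the specific scheme used for the TinyAgent models.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes / q.nbytes)  # 4.0 -- int8 storage is 4x smaller than float32
max_error = np.max(np.abs(w - dequantize(q, scale)))
```

In practice, production schemes quantize per-channel or per-group and may keep activations in higher precision, which narrows the accuracy gap further.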
Conclusion
The development of TinyAgent showcases the potential of small language models to power agentic systems that process user queries. Through a systematic approach of data curation, model fine-tuning, and optimization for edge deployment, we have demonstrated the feasibility of deploying efficient and accurate models locally.
Acknowledgements
We extend our gratitude to Apple and Microsoft for their support in this project. Special thanks to our collaborators and sponsors for their insights and contributions. The conclusions drawn in this study are independent of the sponsors’ views.