The Evolution of AI Autonomy: Unlocking the Power of Agents in the Post-ChatGPT Era

4 min read · Jun 20, 2023

1. Killer apps of the post-GPT-3 AI wave:
- Generative Text for writing — Jasper AI going 0 to $75m ARR in 2 years
- Generative Art for non-artists — Midjourney/Stable Diffusion Multiverses
- Copilot for knowledge workers — both GitHub’s Copilot X and “Copilot for X”
- Conversational AI UX — ChatGPT / Bing Chat, with a long tail of Doc QA startups

2. Introduction to Auto-GPT and BabyAGI:
- Independently developed Python projects
- Quickly gained enormous popularity
- Do not involve foundation model training or any deep ML innovation
- Demonstrate the viability of running existing LLM APIs (GPT-3, GPT-4, or any of the alternatives) and reasoning/tool-selection prompt patterns in an infinite loop
- Can accomplish a high-level goal set by a human user
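
To make the "LLM API plus tool selection in a loop" pattern concrete, here is a minimal sketch. It is not code from Auto-GPT or BabyAGI: `fake_llm`, the `TOOLS` registry, and the `TOOL name: argument` reply format are all assumptions made up for illustration, and the loop is capped at a few steps rather than running forever.

```python
from typing import Callable

def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (hypothetical; swap in a provider client)."""
    # Pretend the model picks a tool first, then declares the goal finished.
    if "Observation:" not in prompt:
        return "TOOL search: autonomous agents"
    return "DONE summary written"

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"3 results for '{query}'",
}

def run_agent(goal: str, llm: Callable[[str], str], max_steps: int = 5) -> list[str]:
    """Ask the model for the next action, run the chosen tool, feed the result back."""
    history: list[str] = []
    prompt = f"Goal: {goal}"
    for _step in range(max_steps):  # bounded here for safety
        reply = llm(prompt)
        history.append(reply)
        if reply.startswith("DONE"):
            break
        # Parse "TOOL <name>: <argument>" and dispatch to the registry.
        name, _sep, arg = reply.removeprefix("TOOL ").partition(": ")
        observation = TOOLS[name](arg)
        prompt += f"\nAction: {reply}\nObservation: {observation}"
    return history

history = run_agent("Summarize the state of autonomous agents", fake_llm)
print(history)
```

The point of the sketch is that nothing here needs model training: the "agent" is an ordinary loop around a text-in, text-out call plus a lookup table of tools.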

3. Difference between BabyAGI and Auto-GPT:
- BabyAGI is intentionally small, having added and then stripped out LangChain
- Its initial code was under 150 lines, with 10 env vars
- Auto-GPT is extremely expansive (7300 LOC)
- It can clone GitHub repos, start other agents, speak, send tweets, and generate images, with 50 env vars to support every vector database and LLM provider/Text to Image model/Browser

4. Leading AI figures’ opinions:
- Andrej Karpathy called AutoGPTs the “next frontier of prompt engineering”
- Eliezer Yudkowsky approvingly observed BabyAGI’s refusal to turn the world into paperclips even when prompted

5. Key Autonomy Capabilities arranged in rough chronological order:
- Foundation models
- Metacognition (self improvement of pure reasoning)
- External Memory (reading from mostly static external data)
- Browser Automation (sandboxed read-and-write in a browser)
- Tool making and Tool use (server-side, hooked up to everything)

6. Multi-modal and Self-Learning Approaches:
- Models calling other models for capabilities they lack are also being explored, as in HuggingGPT/Microsoft JARVIS and VisualChatGPT
- Self-Learning Agent for Performing APIs (SLAPA) searches for API documentation to teach itself HOW to use tools, not just WHEN
- Other semi-stealth mode startups that may be worth exploring in this zone are Fixie AI and Alex Graveley’s Minion AI

7. Full vision laid out by John McDonnell:
- The net new thing in this most recent capability spurt is the 4 agents

8. Capability 1 + 2: Context and Task Creation Agents:
- The “context agent” could be a much smarter version of the data-augmented retrieval that both LlamaIndex and LangChain are working on.
- The “task creation agent” creates tasks, but it must not hallucinate, and it must self-criticize and learn from previous tasks.
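
As a rough illustration of these two roles, here is a small sketch. It makes strong simplifying assumptions: word overlap stands in for the embedding search a vector database would do, and the "learn from previous tasks" part is reduced to filtering out proposals that are already queued or done. The function names and sample data are hypothetical, not taken from BabyAGI.

```python
def context_agent(query: str, memory: list[str], top_k: int = 2) -> list[str]:
    """Return the stored results that share the most words with the query.

    Word overlap is a stand-in for a real embedding similarity search.
    """
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(item.lower().split())), item) for item in memory]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [item for score, item in ranked[:top_k] if score > 0]

def task_creation_agent(proposed: list[str], existing: list[str], done: list[str]) -> list[str]:
    """Keep only proposed tasks that are not already queued or completed."""
    seen = {task.lower() for task in existing + done}
    fresh = []
    for task in proposed:
        if task.lower() not in seen:
            seen.add(task.lower())
            fresh.append(task)
    return fresh

# Hypothetical sample data.
memory = [
    "Auto-GPT supports many vector databases",
    "BabyAGI started under 150 lines",
    "Midjourney makes images",
]
context = context_agent("how many lines is BabyAGI", memory)
new_tasks = task_creation_agent(
    ["Compare env vars", "compare env vars", "Read the README"],
    existing=["Read the README"],
    done=[],
)
print(context)
print(new_tasks)
```

A real task creation agent would get its proposals from an LLM and would need a much stronger check than string matching, but the shape is the same: propose, compare against what is already known, keep only what is new.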

9. Capability 3 + 4: Execution Agents:
- The “execution agent” calls OpenAI, or any other foundation model, and could optionally make or use any provided tools to accomplish a task.

10. Capability 5: Planning, Reflexion, and Prioritization:
- Shinn et al. (2023) showed that Reflexion, an autonomous agent with dynamic memory and self-reflection, could dramatically improve on GPT-4 benchmarks.
- Regardless of use case, autonomous agents will be expected to plan further and further ahead, prioritizing task lists, reflecting on mistakes and keeping all relevant context in memory.
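
The general idea of reflecting on mistakes and keeping the reflections in memory can be sketched as a retry loop. This is only loosely inspired by the Reflexion idea and is not the paper's implementation; `attempt`, `evaluate`, and `reflect` are hypothetical stubs standing in for LLM calls and a test harness.

```python
from typing import Callable

def reflexion_loop(
    task: str,
    attempt: Callable[[str, list[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    max_trials: int = 3,
) -> tuple[str, list[str]]:
    """Retry a task, carrying forward a written reflection after each failure."""
    reflections: list[str] = []  # memory that grows across trials
    answer = ""
    for _trial in range(max_trials):
        answer = attempt(task, reflections)
        if evaluate(answer):
            break
        reflections.append(reflect(task, answer))
    return answer, reflections

# Hypothetical stubs: the first attempt fails, the second uses the reflection.
def attempt(task: str, reflections: list[str]) -> str:
    return "sorted(xs)" if reflections else "xs.sort()"

def evaluate(answer: str) -> bool:
    return answer == "sorted(xs)"

def reflect(task: str, answer: str) -> str:
    return f"'{answer}' mutated the input; return a new list instead."

answer, reflections = reflexion_loop("Return a sorted copy of xs", attempt, evaluate, reflect)
print(answer, reflections)
```

The design choice worth noticing is that the reflections are plain text passed back into the next attempt, so the "learning" lives in the prompt context rather than in model weights.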

11. Automation and Autonomy:
- The one currency we all never have enough of is time, and the ability to obsolete human effort, whether by clever system design, hiring someone else, or programming a machine, both frees up our time and increases our ability to scale up our output by just doing more in parallel.
- Full autonomy with as much trust and reliability as possible is the ultimate goal here.

12. Flavors of Autonomous AI:
- The Jobs School: AI Agents that augment your agency, as “bicycles for your mind”
- The Zuck School: AI Algorithms that replace your agency, hijacking your mind

13. Observations:
- Neither BabyAGI nor Auto-GPT uses LangChain.
- Backend GPT has not led anywhere. Why?
- All the open source winners are new to open source. Is “open source experience” really that valuable?

14. Prediction:
- There will be “AI Agent platforms” with tools all enabled.
- There will be “AI Agent fleets”, especially if the agents are “idempotent” and read-only.
- There will be “AI social networks”, in the vein of Subreddit Simulator.

15. Challenges and Potential Solutions:
- Agents stepping on each other: will need DAGs, spawn on command, Actor model / Agent Oriented Programming?
- A 5th agent for reflection? Metalearning?
- We don’t know how to prioritize humans, and you want to eval a bot’s ability to prioritize? Good luck.
- Must solve prompt injection (hey @simon).
- Will probably need to self-improve statefully.
- Principal-agent problem.
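
One way to read the DAG suggestion in the challenges above: if tasks declare their dependencies, a scheduler can hand out only the tasks whose prerequisites are finished, so agents working in parallel do not step on each other. Here is a minimal sketch using Python's standard `graphlib` module; the task names and the `plan_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "write_report": {"summarize_findings"},
    "summarize_findings": {"search_web", "read_docs"},
    "search_web": set(),
    "read_docs": set(),
}

def plan_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; every task in a batch has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished before moving on
    return batches

batches = plan_batches(dependencies)
print(batches)
```

Each inner list is a set of tasks that could be given to separate agents at the same time; the next batch only becomes available once the previous one is marked done.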

16. Theory of Value of Software:
- Understanding what makes some lines of code more valuable than others can help in investing time, money, and creativity in more rewarding directions.
- Software value drivers include Demand and Supply Aggregators, Production-ready frameworks, Shadow IT, Systems of record, and replacing people and manual processes.

Follow Laika AI on social networks so you never miss an update from the team.

[Disclaimer: This article is for informational purposes only and does not constitute financial or investment advice.]

References: Latent.Space


Written by LAIKA AI
