NovaSystem Autogen Ollama Local LLM Bot: Final Technical Requirements Document (TRD)
1. Project Overview
The NovaSystem Autogen Ollama Local LLM Bot is a terminal-based chat system that leverages local Large Language Models via Ollama, orchestrated by AutoGen. The system must capture each user turn (input), the internal agent chain (steps), and the final bot output in a structured JSON file.
1.1 Objectives
- Local LLM Execution: Run inference without relying on external APIs.
- Agent Orchestration: AutoGen manages sub-agents (Planner, Executor, Memory).
- Metadata-Rich Logging: Store user inputs, LLM responses, chain steps, resource usage, timestamps, etc., in JSON.
- Docker Deployment: Provide a containerized environment that simplifies setup and ensures consistent execution.
2. System Scope
- Command-Line Interface (CLI):
  - The user interacts via terminal input.
  - Multi-turn conversation support: the user inputs a query, the system responds, and each turn is logged.
- AutoGen Integration:
  - Multiple sub-agents coordinate to interpret requests, call the LLM, and handle data retrieval or transformations.
  - Agents communicate in an agent chain; each step is captured for debugging and auditing.
- Local LLM (Ollama):
  - The system calls Ollama's Python library (`ollama`) to handle prompt completion or advanced tasks (embedding, structured outputs).
  - No external calls to remote LLMs; everything is done locally.
- JSON Logging:
  - Each turn is appended to a single JSON file per session.
  - Detailed metadata includes CPU and memory usage, timestamps, chain-of-thought steps, model details (e.g., `model_version`, `temperature`), and any relevant environment info (a resource-capture sketch follows this list).
- Docker Integration:
  - A Dockerfile (and optionally a docker-compose file) ensures a reproducible environment for Python, AutoGen, and Ollama.
  - The container can store logs on a mounted volume so data persists after container shutdown.
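For the resource-usage portion of the JSON Logging metadata above, a small helper built on `psutil` (also anticipated in Section 4's dependency list) is one option. The sketch below is illustrative; the field names are assumptions, not mandated by this TRD.

```python
import psutil

def capture_resource_usage() -> dict:
    """Snapshot CPU and memory figures for a turn's metadata (illustrative field names)."""
    process = psutil.Process()
    return {
        "cpu_percent": psutil.cpu_percent(interval=None),        # system-wide CPU utilization
        "memory_mb": process.memory_info().rss / (1024 * 1024),  # resident memory of this process
    }
```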
3. Detailed Requirements
3.1 Functional Requirements
- User I/O
  - CLI Prompt: `>` for user entry; read until newline.
  - Exit Mechanism: If the user types `exit` or `quit`, close the session gracefully.
  - Error Feedback: If the user input is invalid or a fatal error occurs, display a concise error and continue or terminate safely.
- Agent Chain
  - Planner Agent: Interprets user intent, possibly breaks down tasks.
  - Executor Agent: Calls Ollama to generate responses or handle queries.
  - Memory Agent (Optional): Maintains conversation context.
  - Chain-of-Thought: Each sub-agent invocation logs `agent_name`, `input`, `output`, `timestamp`, and `elapsed_time_ms`.
- Local LLM Calls
  - Ollama must be running locally (or accessible) with the desired model pre-pulled.
  - Implementation can use `ollama.chat` or `ollama.generate` (including streaming if needed); a minimal call sketch appears after this list.
  - The system retrieves the text output, returning it to the user in the final response.
- Logging
  - JSON File per Session: Named `session_<uuid>_<timestamp>.json`.
  - Structure: each record is appended to an array (an illustrative record appears after this list).
  - Synchronous Writes: For simplicity and reliability, each turn is appended right after generation.
  - Closing Bracket: On user exit, write `]` to finalize the JSON array.
- Session Management
  - A session begins upon program startup, generating a `session_id`.
  - The conversation loop continues until the user exits.
  - Upon exit, the log file is closed.
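For the Local LLM Calls item above, a minimal completion via the `ollama` Python client could look like the following sketch. The model name is only an example, and the exact response-access pattern may vary slightly between client versions.

```python
import ollama

def generate_reply(user_text: str, model: str = "llama3.2") -> str:
    """Single-turn completion against a locally pulled Ollama model."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": user_text}],
    )
    # Dict-style access is supported by current client versions; adjust if the
    # installed version exposes the response object differently.
    return response["message"]["content"]
```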
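For the Logging item, an appended turn record might look roughly like the example below, including the chain-of-thought steps from the Agent Chain item. Fields beyond those explicitly required (such as `turn_id`, `resource_usage`, and the exact timestamp format) are illustrative assumptions.

```json
{
  "turn_id": "c1a2b3d4-0000-0000-0000-000000000000",
  "session_id": "7e9f0a1b-0000-0000-0000-000000000000",
  "role": "assistant",
  "timestamp": "2025-01-01T12:00:01Z",
  "content": "Here is the summary you asked for ...",
  "model_name": "llama3.2",
  "model_version": "latest",
  "temperature": 0.7,
  "resource_usage": { "cpu_percent": 12.5, "memory_mb": 842.3 },
  "chain": [
    {
      "agent_name": "Planner",
      "input": "Summarize the pasted article.",
      "output": "Plan: send a summarization prompt to the Executor.",
      "timestamp": "2025-01-01T12:00:00Z",
      "elapsed_time_ms": 180
    },
    {
      "agent_name": "Executor",
      "input": "Summarize: <article text>",
      "output": "Here is the summary you asked for ...",
      "timestamp": "2025-01-01T12:00:01Z",
      "elapsed_time_ms": 1650
    }
  ]
}
```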
3.2 Non-Functional Requirements
- Performance
  - Provide near-instant responses for short queries (subject to local LLM constraints).
  - Log-writing overhead should not degrade the user experience. For heavy usage, consider asynchronous logs or log rotation (one possible shape is sketched after this list).
- Scalability
  - Primarily for single-user CLI sessions.
  - Future expansions can introduce concurrency (with additional process management) or a server-based interface (e.g., a REST API).
- Security & Privacy
  - Logs store chain-of-thought, which can reveal internal reasoning and user content. Ensure sensitive data is handled appropriately.
  - Keep logs in a protected or private area on disk, especially if the conversation data is confidential.
- Portability
  - Docker-based environment: minimal friction for setting up Python, `ollama`, `autogen`, etc.
  - The Docker image can run on any system supporting Docker (Linux, macOS, Windows with WSL2).
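One possible shape for the asynchronous-logging option mentioned under Performance is a queue fed by the conversation loop and drained by a background thread. This is a sketch under that assumption, not a required design; the class name and framing are hypothetical.

```python
import json
import queue
import threading

class AsyncLogWriter:
    """Drains queued turn records into an already-open JSON array file."""

    def __init__(self, log_file):
        self._log_file = log_file          # file opened by the CLI driver, after writing "["
        self._queue = queue.Queue()
        self._first = True
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, record: dict) -> None:
        self._queue.put(record)            # called from the conversation loop; returns immediately

    def close(self) -> None:
        self._queue.put(None)              # sentinel: flush remaining records and stop
        self._thread.join()

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:
                break
            prefix = "" if self._first else ",\n"
            self._first = False
            self._log_file.write(prefix + json.dumps(record, indent=2))
            self._log_file.flush()
```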
4. Technical Architecture
```
┌───────────────────────────────────┐
│           User Terminal           │
└───────────────────────────────────┘
                 │ (1) user input
                 ▼
┌───────────────────────────────────┐
│            CLI Driver             │
│     (session & logging mgmt)      │
└───────────────────────────────────┘
                 │
                 ▼
┌───────────────────────────────────┐
│  AutoGen: Core Bot Orchestrator   │
│   - Planner, Executor, Memory     │
│   - Calls Ollama                  │
└───────────────────────────────────┘
                 │
                 ▼
┌───────────────────────────────────┐
│     Ollama Local LLM (Python)     │
│      - e.g. "ollama-lora-7b"      │
└───────────────────────────────────┘
                 │
                 ▼
┌───────────────────────────────────┐
│             JSON Log              │
│ (session_<uuid>_<timestamp>.json) │
│  - Turn-based records             │
│  - Agent chain steps              │
└───────────────────────────────────┘
```
- CLI Driver
  - Maintains a loop for capturing user messages.
  - Triggers the agent orchestration with each user input.
  - Writes the user turn and final assistant turn to JSON.
  - On exit, finalizes the JSON array.
- Agent Orchestrator (AutoGen)
  - Sub-agents parse and plan user requests.
  - The Executor sub-agent calls `ollama.chat` or `ollama.generate`.
  - Returns a chain-of-thought (list of steps) plus the final output text.
- JSON Log
  - The system appends each user/assistant message as a JSON object.
  - Fields include timestamps, chain steps, resource usage, etc.
  - Ensures full traceability of conversation flow.
- Docker Environment (an illustrative Dockerfile follows)
  - The Dockerfile installs Python 3.9+, `ollama`, `autogen`, and related libraries (`psutil` if needed).
  - `docker run` or `docker compose up` spins up the environment for immediate usage.
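A minimal Dockerfile for this environment might look like the sketch below. The Python version, paths, and package names (in particular whether the AutoGen dependency installs as `autogen` or `pyautogen`) are assumptions; the Ollama server itself is expected to run on the host or in a sibling container.

```dockerfile
# Illustrative sketch; versions, package names, and paths are assumptions.
FROM python:3.11-slim

WORKDIR /app

# Ollama Python client, AutoGen, and psutil for resource metrics
# (the AutoGen package may be published as pyautogen depending on version).
RUN pip install --no-cache-dir ollama autogen psutil

COPY . .

# Session logs are written here; mount a volume so they survive container shutdown.
VOLUME ["/app/logs"]

# Point the client at a reachable Ollama server, e.g. the host or a compose service.
ENV OLLAMA_HOST=http://host.docker.internal:11434

CMD ["python", "main.py"]
```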
5. Implementation Roadmap
- Setup
  - Pull or build a Docker image containing `ollama` and `autogen`.
  - (Optional) Pre-pull specific LLM models (`ollama pull llama3.2`).
- Code Structure (a skeletal `main.py` sketch appears after this roadmap)
  - `main.py`:
    - Creates a log file (`session_<uuid>_<timestamp>.json`) and writes `[` to start.
    - Enters a CLI loop.
    - On each user input, calls a function (e.g., `handle_turn`) that orchestrates sub-agents plus logging.
    - On exit, writes `]` and closes the file.
  - `core_bot.py`:
    - Contains `run_agent_chain(user_text, session_state)` for the Planner + Executor flow.
    - Wraps `ollama.chat` calls for local LLM usage.
  - `logging_utils.py`:
    - Manages writing JSON records (with turn-level metadata and chain steps).
- Testing
  - Unit Tests:
    - Validate JSON logging (structure, file correctness).
    - Mock agent calls to confirm chain-of-thought capture.
  - Integration Tests:
    - Full conversation with sample prompts.
    - Check that the Docker container starts, the model is accessible, and logs are properly generated.
- Deployment & Maintenance
  - Docker:
    - Provide a `Dockerfile` and an optional `docker-compose.yml` (an illustrative compose file appears after this roadmap).
    - Document volumes or paths for logs.
  - Version Control:
    - Tag code versions and link them to Docker image versions.
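As a tie-together for the Code Structure item above, a skeletal `main.py` might look like the sketch below. `handle_turn` and `run_agent_chain` come from the roadmap; `append_record` is a hypothetical `logging_utils` helper that writes one object into the open JSON array (returning whether the next record still needs a leading comma), and all bodies are illustrative rather than a reference implementation.

```python
import uuid
from datetime import datetime, timezone

from core_bot import run_agent_chain      # per the roadmap: Planner + Executor flow
from logging_utils import append_record   # hypothetical: writes one record, returns updated "first" flag


def handle_turn(user_text, session_state, log_file, first):
    """Run the agent chain for one user input and log both sides of the turn."""
    chain_steps, final_text = run_agent_chain(user_text, session_state)
    now = datetime.now(timezone.utc).isoformat()
    first = append_record(log_file, {"role": "user", "content": user_text, "timestamp": now}, first)
    first = append_record(
        log_file,
        {"role": "assistant", "content": final_text, "chain": chain_steps, "timestamp": now},
        first,
    )
    return final_text, first


def main():
    session_id = str(uuid.uuid4())
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    with open(f"session_{session_id}_{stamp}.json", "w", encoding="utf-8") as log_file:
        log_file.write("[\n")                      # open the JSON array
        session_state, first = {"session_id": session_id, "history": []}, True
        while True:
            user_text = input("> ").strip()
            if user_text.lower() in {"exit", "quit"}:
                break
            reply, first = handle_turn(user_text, session_state, log_file, first)
            print(reply)
        log_file.write("\n]\n")                    # close the JSON array on exit


if __name__ == "__main__":
    main()
```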
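For the optional `docker-compose.yml`, one plausible layout pairs the bot with the official Ollama image and mounts a host directory for logs. Service names, image tags, ports, and paths here are assumptions, not requirements.

```yaml
# Illustrative compose file; names, tags, and paths are assumptions.
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-models:/root/.ollama        # persist pulled models across restarts
  bot:
    build: .
    depends_on:
      - ollama
    environment:
      - OLLAMA_HOST=http://ollama:11434    # point the Python client at the ollama service
    volumes:
      - ./logs:/app/logs                   # session JSON files persist on the host
    stdin_open: true                       # keep the CLI interactive
    tty: true

volumes:
  ollama-models:
```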
6. Potential Extensions
- Multi-User or Server Mode
  - Convert the CLI to a web server (Flask/FastAPI).
- Async Execution
  - Use Python async capabilities or queue-based concurrency for high-throughput usage.
- Advanced Tools
  - Integrate other local scripts or APIs the Executor can call.
- Memory Persistence
  - Store conversation context across sessions in a local database or file-based memory.
- Model Management
  - Programmatically handle model pulling, versioning, or switching to different LLMs.
7. Acceptance Criteria
- Functionality
  - The user can run `docker run ...` or `docker compose up`.
  - Type a prompt, see a coherent response, and view the entire conversation in JSON.
- Logging Completeness
  - Each turn has a unique ID, timestamp, role, content, and chain steps.
  - Metadata includes resource usage (CPU, MEM), `model_name`, `model_version`, and `session_id`.
- Robustness
  - No crashes on typical usage.
  - Graceful exit, properly closing the JSON array.
  - Error messages if Docker or Ollama is misconfigured.
- Documentation
  - README with instructions (setup, usage, environment variables).
  - Clear references for accessing logs and verifying local LLM usage.
Conclusion
These final sections outline precisely how the NovaSystem Autogen Ollama Local LLM Bot should be architected, deployed, and tested. By adhering to this Technical Requirements Document, the system will deliver a well-structured, traceable, and maintainable solution, harnessing local LLM capabilities while capturing every step of user-bot interaction.