
Scaling AI with Confidence: Our Real-World Experience with Orq.ai

4 April 2025

We’ve had the pleasure of working with the Orq.ai platform on some of our recent AI projects. Our experience has been highly positive, and in this article we’ll share how Orq has helped us accelerate our AI product development lifecycle.

Orq.ai is an all-in-one solution for managing the lifecycle of LLM apps. It simplifies tasks like testing AI models, fine-tuning responses, and rolling out updates without requiring deep technical expertise. With Orq, we delivered business value faster and more efficiently than before.

Model Garden

The backbone of Orq.ai is the ‘Model Garden’, offering a selection of proprietary and open-source LLMs from providers like OpenAI, Azure, AWS, and more. You can also add your own Azure-hosted models. A 25-50% surcharge applies for pre-populated models, but for development, the immediate access is worth it.

Prompt Construction

The Prompt Snippet and Prompt modules allow prompt components to be structured as variables and assembled in multiple ways. These prompts can then be tested in the Playground or Experiment modules or deployed to production environments.
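The idea of assembling prompts from reusable, variable-based components can be sketched as follows. This is a conceptual illustration only; the snippet names and helper function are hypothetical, not the Orq API.

```python
# Reusable snippet library, similar in spirit to Orq's Prompt Snippets.
SNIPPETS = {
    "tone": "Answer in a friendly, concise tone.",
    "format": "Respond in valid JSON with keys 'answer' and 'sources'.",
}

def build_prompt(template: str, **variables: str) -> str:
    """Fill a prompt template from reusable snippets and ad-hoc variables."""
    return template.format(**SNIPPETS, **variables)

prompt = build_prompt(
    "{tone}\n{format}\nQuestion: {question}",
    question="What does the Model Garden offer?",
)
```

Structuring prompts this way means a tone or format change is made once and propagates to every prompt that uses the snippet.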

Playground vs Experiment

The Playground enables one-off prompt tests with tunable parameters—ideal for quick experimentation. For scale, the Experiment module lets you test multiple setups and Evaluators across various models with results visualized in a heatmap.
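In essence, the Experiment module automates something like the following grid evaluation: score every prompt/model combination, then render the scores as a heatmap. The `run_model` and `score` functions below are stand-ins for a real LLM call and Evaluator, not Orq functions.

```python
def run_model(model: str, prompt: str) -> str:
    # Placeholder for a real LLM call through a provider API.
    return f"{model} response to: {prompt}"

def score(response: str) -> float:
    # Placeholder evaluator; a real one might use Ragas or LLM-as-a-judge.
    return min(len(response) / 100, 1.0)

models = ["gpt-4o", "claude-3-5-sonnet", "llama-3-70b"]
prompts = ["Summarize the report.", "List three key risks."]

# One cell per (prompt, model) pair, like the Experiment heatmap.
grid = {(p, m): score(run_model(m, p)) for p in prompts for m in models}
```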

RAG and Function Calling

Orq’s RAG (Knowledge Base) module lets you upload documents, clean up retrieved chunks, and load them into a vector database. However, it currently lacks custom retrieval logic and web scraping capabilities.

The Orq API makes it possible to build an external RAG system and feed the results into the Orq pipeline.
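A minimal sketch of that pattern: retrieve context with your own logic, then hand it to an Orq deployment as a prompt variable. The `retrieve` function, deployment key, and payload shape are assumptions for illustration; consult Orq's API documentation for the actual invocation schema.

```python
def retrieve(query: str) -> list[str]:
    # Stand-in for a custom retriever: your own vector DB, web scraper, etc.
    return ["Chunk about pricing.", "Chunk about model support."]

def build_invocation(query: str) -> dict:
    """Assemble the payload an Orq deployment invocation might expect."""
    context = "\n\n".join(retrieve(query))
    return {
        "key": "my-rag-deployment",  # hypothetical deployment key
        "inputs": {"question": query, "context": context},
    }

payload = build_invocation("How is Orq priced?")
# The payload would then be POSTed to the Orq deployments API.
```

This keeps retrieval fully under your control while Orq handles prompt versioning, routing, and logging downstream.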

Evaluators

Evaluators, such as Ragas and LLM-as-a-judge tools, are powerful for quality control. These can be integrated into pipelines, act as guardrails, or run batch evaluations. Custom evaluators are also supported.

Evaluator results are shown as scores in a heatmap, and responses can be added to new datasets directly from the UI.
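A custom evaluator of the kind described above is, at its core, a function that maps a response to a score. The sketch below is illustrative; the signature is not Orq's evaluator contract.

```python
def citation_evaluator(response: str, required_phrases: list[str]) -> float:
    """Score = fraction of required phrases present in the response (0..1)."""
    if not required_phrases:
        return 1.0
    hits = sum(1 for p in required_phrases if p.lower() in response.lower())
    return hits / len(required_phrases)

score = citation_evaluator(
    "According to the Q3 report, revenue grew 12%.",
    ["Q3 report", "revenue"],
)
```

Such a function could run as a guardrail on live traffic or in batch over a dataset, with the resulting scores feeding the heatmap view.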

Tracing

Orq recently introduced ‘tracing’ to track LLM step-by-step decision-making. This feature enhances prompt and model tuning through deeper observability.

Team Collaboration with Orq.ai

A typical AI workflow in Orq:

  1. Data Scientist selects outliers and creates a dataset.
  2. Subject Matter Expert reviews and updates the dataset.
  3. Prompt Engineer iterates on prompts.
  4. Once verified, prompts are added to Deployment.

This seamless flow eliminates data transformation issues between tools and roles.

Deployment

Deployments manage version control, prompt flows, and routing. Logs and evaluations help monitor output accuracy. Logged responses can be sent to datasets for further analysis with one click.

Optimizing LLM Pipelines with Orq.ai

Workflow overview:

  1. Logs are reviewed for outliers.
  2. Responses are curated into datasets.
  3. Prompt Engineer experiments with new prompts.
  4. Data Scientist runs evaluators.
  5. Accepted configurations are deployed.

Our Use Cases

We used Orq for prompt engineering, building conversational agents, and public content evaluation. Our key takeaways:

Pros

  • Streamlined Collaboration: Shared platform for all roles.
  • Scalable Architecture: Works with open-source and proprietary LLMs. Easy API integration.
  • Reliable Evaluation Tools: Heatmaps and evaluators ensured high-quality results.

Cons

  • Cost at Scale: Model markup is noticeable in production.
  • RAG Limitations: No custom retrieval or web scraping.
  • UX for Non-Tech Users: Less accessible for subject matter experts and non-developers.

Conclusion

Orq.ai’s modular structure accelerates collaboration and AI development. Despite some limitations, it proved invaluable in speeding up our workflows and enhancing quality. We look forward to continued collaboration with Orq to shape features that fit real-world needs.

Wouter Sligter – Principal AI Consultant @ KODIFY

Wouter designs and develops advanced conversational and generative AI solutions. He’s worked in AI for over 7 years.

Comment from Sohrab Hosseini, CEO of Orq.ai:

  • “Correct, web scraping for RAG will be added in the next few weeks.”
  • “The article is factually sound. We take your feedback very seriously and will work hard to improve the issues you found.”