https://promptfoo.dev/ logo
Join Discord
Powered by
# ❓|questions
  • Automatic trace injection to promptfoo from coding assistants and not API?
    p

    pelopo

    10/23/2025, 1:48 PM
    Aye, I was wondering if there is any automagical way to send to promptfoo traces or conversations that I have with the likes of Codex, Claude Code, Amp or some other coding assistants that use subscription model instead of and API. Something that would listen real time and make it go to promptfoo for me to evaluate on each turn basis? Maybe some third party tool? Thanks
    d
    • 2
    • 1
  • Evaled or not tick in the results table.
    p

    pelopo

    10/23/2025, 2:15 PM
    In the Results table in the UI, I don't see any obvious way to tell which runs were evaluated and which are not. I have to click each line to see whether I gave it a thumbs up, thumbs down, or left a comment, etc. It would be nice to have a column with a tick if any evaluation criteria were used, making it easy to see what needs work and what doesn't. For example, in the attached screenshot there are 4 runs and I have touched all of them, but only the one with the red percentage clearly shows I evaluated it negatively. The other 3 with 100% in green were also evaluated, but it's unclear whether they were, because I left comments and ratings. https://cdn.discordapp.com/attachments/1430922506558247012/1430922507095113989/image.png?ex=68fb89ee&is=68fa386e&hm=cdd74783b89bd32fbf2f26a723402fd311f7a870ccd5dd2a33bba39a87c858ce&
  • How to test dynamic multi-turn conversations in Promptfoo?
    đ

    Đức Duy

    10/23/2025, 3:28 PM
    Hi everyone 👋 I’m testing a medical chatbot agent that starts from a symptom (e.g. “I have stomach pain”), then asks several related questions, and finally recommends a suitable clinic. The problem is that each test run may have different question wording or order, so I can’t predefine all user inputs in advance. I’d like to dynamically provide user replies based on the agent’s last question — for example, if the agent asks about pain_location, I return the predefined answer for that property. Is there any recommended way in Promptfoo to handle this kind of dynamic multi-turn input-output flow? Thanks!
    e
    m
    • 3
    • 3
  • Environment variable substitution only working some places
    c

    crizm

    10/23/2025, 11:57 PM
    I'm trying to use environment variables defined in a .env file to specify a default provider for llm-rubric: defaultTest: options: provider: id: "azure:chat:{{ env.MY_DEPLOYMENT}}" config: apiVersion: "{{ env.API_VERSION }}" apiHost: "{{ env.AZURE_ENDPOINT }}" For some reason, only env.MY_DEPLOYMENT gets replaced. "{{ env.AZURE_ENDPOINT }}" does not (nor does API_VERSION, and there doesn't appear to be a way to affect that through preset environment variables) and results in a "Failed to sanitize URL" error. Any idea what's wrong here?
    m
    • 2
    • 1
  • OpenRouter - API error: 401 // message: No auth credentials found
    h

    haveles

    10/24/2025, 2:39 PM
    Hi all, I'm encountering persistent 401 Unauthorized errors when trying to use OpenRouter providers in my self evaluation and model comparison configs, despite having a working API key and successful direct API calls. Error Details: [ERROR] API error: 401 Unauthorized {"error":{"message":"No auth credentials found","code":401}} What's Working: OpenRouter API key works perfectly with direct curl calls Successfully configured and ran deterministic A/B testing for 3 LLMs using OpenRouter Environment variable OPENROUTER_API_KEY is properly set Current Configuration (that works for A/B testing): providers: - id: openrouter:anthropic/claude-3.5-sonnet config: temperature: 0.0 max_tokens: 2000 apiKey: ${OPENROUTER_API_KEY} What's Failing: Self-grading config with identical provider setup Model comparison config with identical provider setup All attempts result in 401 errors Attempted Fixes: Variable syntax variations: ${OPENROUTER_API_KEY}, "{{ env.OPENROUTER_API_KEY }}" Provider ID variations: different model names and versions Configuration approaches: Direct OpenRouter, OpenAI with custom base URL, Anthropic with custom base URL Environment handling: shell variables, --var flag, --env-file flag Removed llm-rubric assertions in attempt to fix authentication issues System Info: Promptfoo version: 0.118.17 OS: macOS Any insights on what might be causing this inconsistent behavior would be greatly appreciated!
    m
    • 2
    • 2
  • Clarification regarding Red Team configuration
    t

    tanktg

    10/28/2025, 12:10 PM
    Hi all, I am working for a cybersecurity service provider and we would like to use Promptfoo to test LLM applications of our customers. Data privacy is of major importance to us, and we therefore don't want to send any data or requests of any sort to PromptFoo's cloud services. In practice, this means that adversarial input generation, response evaluation and grading of attacks should all happen in our systems, and that all telemetry should be disabled. Looking at the documentation (https://www.promptfoo.dev/docs/red-team/configuration/#how-attacks-are-generated), we have several questions regarding the correct configuration to use: Will setting the PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION env var to true prevent adversarial input generation requests to be sent to promptFoo's API, while allowing us to use our own remote LLM deployed in our cloud environment? Or should we specify our own attacker model provider in the config file, while leaving PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION to its default value, false? Additionally, I understand that it is possible to override the default grader by specifying a custom one in the config file: https://www.promptfoo.dev/docs/red-team/troubleshooting/grading-results/#overriding-the-grader. Will making those two configuration changes (specifying a custom attacker model provider, and a custom grader) be enough to ensure that no data (including telemetry of usage data) is ever sent to promptFoo's services? If not, what additional configuration is needed to achieve this? Thanks
  • How to hook context in YAML?
    a

    Alex1990

    10/28/2025, 3:45 PM
    Hi, everyone. I spent around 3-4 hours to understand how dynamic context works, but whatever I did, every time I got an error. I connected to my own RAG using custom call_api
    Copy code
    def call_api(prompt, options=None, context=None):
    .................. some logic......
        data = response.json()
        contexts = [source.get('content', '')
                    for source in data.get('sources', [])]
    
        return {
            "output": data.get('content', ''),
            "context": context_text
        }
    and part of YAML for this metric
    Copy code
    assert:
          - type: context-relevance
            contextTransform: context
            value: ''
    But when I tried to catch this context field from the RAG response, I got an error below Whatever I did, I tried to use a string or array, just context or output.context, every time I had an error
    Copy code
    Error: Failed to transform context using expression 'context': Invariant failed: contextTransform must return a string or array of strings. Got object. Check your transform expression: context
    
    Error: Failed to transform context using expression 'context': Invariant failed: contextTransform must return a string or array of strings. Got object. Check your transform expression: context
        at resolveContext (/Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/contextUtils.js:60:19)
        at async handleContextRelevance (/Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/contextRelevance.js:23:21)
        at async runAssertion (/Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/index.js:353:24)
        at async /Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/index.js:400:24
    In documentation, it looks pretty simple, but look like it doesn't work correctly https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/context-relevance/ Any suggestions, how I can handle that? https://cdn.discordapp.com/attachments/1432757147405651968/1432757147648786584/image.png?ex=69023693&is=6900e513&hm=e3561b5fac664cff41e9131fc0c4327ce0fa1634c74a9240f06dff1d91c6ffb1&
    w
    • 2
    • 1
  • _conversation / previous messages for Simulated User and Assistant
    e

    Elias_M2M

    10/29/2025, 9:38 AM
    Hello, I would like to test a multi-turn conversation between an assistant and a simulated user. The prescribed conversation flow of the assistant is very long and for my current test cases I just need to test the end of the conversation. For these tests, the previous messages are very important, so the simulated user and the assistant need to know what "they" said before. I saw in the docs, that there is an option of adding a variable "messages" or "_conversation", but I don't know how this is behaving with the simulated user provider. Is it possible the define the previous messages for both the assistant and the simulated user, so they know where to continue the conversation? And how can I do this?
    • 1
    • 1
  • prompts generation only
    b

    b00l_

    10/29/2025, 2:36 PM
    hello, I have a redteam.yaml file with a bunch of plugins enabled, is it possible to just generate prompts, and save them in a file based on all plugins enabled? can I do it local only and even with openai key?
    u
    • 2
    • 3
  • How to add dynamic prompt with multiple placeholders inside promptfooconfig?
    c

    curious_battle

    11/04/2025, 4:50 AM
    My prompt looks like {"role":"system", "content": < {company}, {company_description}, ... , {previous_context},} then again repeated at user level with a few placeholders How can I use this reliably inside promptfooconfig with variables separately from another file such that the prompt gets build up completely and then we can test against user_input, currently the prompt part allows prompt with placeholders but no support for passing placeholders variables values, and input csv only allows a single column input ??
    m
    • 2
    • 2
  • Retrying tests
    t

    thomas_romas

    11/04/2025, 1:04 PM
    I am running a basic redteaming evaluation with some plugins enabled. Nothing crazy, I am just trying to see how it works. I pointed promptfoo at my Azure OpenAI model. The evaluation doesn't finish for multiple hours and is stuck at the following output
    Copy code
    ...
    Chunk 72 partial success: 24/8 test cases, retrying missing ones individually
    ...
    The chunk number increases. I am not sure what it means. Is there anything I can do to omit the retrying or at least see partial results of the evaluation?
    u
    • 2
    • 1
  • Any advice for really long-running models like GPT-5-pro?
    c

    CasetextJake

    11/04/2025, 9:30 PM
    I'd like to run some evals with GPT-5-pro, and I'd say usually 50% of them error out. I get a variety of errors: API call error: Error: Request failed after 4 retries: TypeError: fetch failed (Cause: Error: getaddrinfo ENOTFOUND api.openai.com) API call error: Error: Request failed after 4 retries: Error: Request timed out after 300000 ms API call error: Error: Error parsing response from https://api.openai.com/v1/responses: Unexpected token '<', " <h"... is not valid JSON. Received text: 502 Bad Gateway 502 Bad Gateway cloudflare API call error: Error: Error parsing response from https://api.openai.com/v1/responses: Unexpected token 'u', "upstream c"... is not valid JSON. Received text: upstream connect error or disconnect/reset before headers. reset reason: connection termination Presumably I can resolve one of these by increasing the amount of time per completion, but the other ones... Curious if there are tips for working with models like these. Thanks!
    u
    m
    • 3
    • 2
  • OSS version limits
    m

    Man

    11/06/2025, 8:07 PM
    What are the adversarial prompt generation limits in the open source version?
  • Mitigation
    j

    Jan!

    11/09/2025, 10:32 AM
    Is there a way to enable the mitigation option on the open source version? I'd be very happy to know how to fix the issues my application has haha.
  • Does anyone else has a Python provider problem
    m

    Monini

    11/13/2025, 9:00 AM
    I'm using promptfoo version: 0.119.6. In my yaml I have configured provider like that: providers: - id: 'file://retrieve_answer.py' I get an error:
    Copy code
    [logger.js:324] Python worker stderr: ERROR handling call: [Errno 2] No such file or directory: 'C'
    Traceback (most recent call last):
      File "C:\Users\x\AppData\Roaming\npm\node_modules\promptfoo\dist\src\python\persistent_wrapper.py", line 191, in handle_call
        with open(request_file, "r", encoding="utf-8") as f:
             ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [Errno 2] No such file or directory: 'C'
    
    
    [logger.js:324] Python worker stderr: ERROR: Failed to write error response: [Errno 22] Invalid argument: '\\Users\\x\\AppData\\Local\\Temp\\promptfoo-worker-req-1763023104223-7f0714e90c0d6.json:C:\\Users\\x\\AppData\\Local\\Temp\\promptfoo-worker-resp-1763023104223-8d908bcfbd3be.json'                                                                                                                                                              
    
    [logger.js:324] Shutdown complete
    [logger.js:324] Failed to read response file after 16 attempts (~18s). Expected: promptfoo-worker-resp-1763023104223-8d908bcfbd3be.json, Found in C:\Users\x\AppData\Local\Temp: promptfoo-worker-req-1762959388946-f0ac9b4f1dd7e.json, promptfoo-worker-req-1762959433064-8d405bf3e7d72.json, promptfoo-worker-req-1763023104223-7f0714e90c0d6.json
    I didn't have this problem some time ago, so I think it started after the update to Promptfoo.
    m
    • 2
    • 2
  • Multiple prompts with each mapped to separate set of images & test cases/assertions
    n

    nkhatwani

    11/13/2025, 10:56 AM
    Can someone please look into https://github.com/promptfoo/promptfoo/issues/6206 and reply accordingly.
    m
    • 2
    • 1
  • Is it possible to set `response_format` per test, rather than per prompt?
    m

    m0ltz

    11/16/2025, 5:19 PM
    I have a single prompt that I want to use, but the JSON Schema for the response is different for each test. I see from the docs that it's possible to set the schema on the provider, or a prompt, but not for a test. Are there any known hacks of workarounds to make that work?
    m
    • 2
    • 2
  • Can we integrate and fetch langfuse datasets directly from promptfoo config file like prompts?
    c

    curious_battle

    11/17/2025, 4:45 AM
    For example, The way We do langfuse://{prompt_name}
  • Streamable HTTP MCP Server Testing
    a

    Anupam Patil

    11/19/2025, 12:44 PM
    Hi Team, @User I am very new to promptfoo. I need to test couple of MCP Servers. I have one login end point and need to use the token from that API for further requests. I have search and fetch tools which would require that token as an authorization. How to achieve it using promptfoo?
    • 1
    • 1
  • Cost Metric - Bedrock
    e

    ellebarto

    11/19/2025, 3:38 PM
    Hi - is there any plans to have the cost metric work for other providers? AWS Bedrock?
  • Custom provider context for model graded asserts
    d

    dracesw

    11/19/2025, 5:42 PM
    tl;dr: providers don't seem to be sent test context when used for asserts Hi, I'm trying to use promptfoo to evaluate some agentic workflows. I have a custom python provider that does some environment setup. I need to be able to pass information from the provider completing the prompt to the provider when it is evaluating an llm-rubric assert that doesn't belong in the prompt response. The context seems to always be empty when the provider is used for asserts. Is this working as intended and if so, is there an intended way to pass this information to the assert provider?
  • How to Show Markdown Instead of JSON + How to Expose OpenAI Response IDs?
    i

    IdoRozin

    11/19/2025, 7:50 PM
    Hey all — two related questions: 1) Prompt display in Promptfoo When using messages: in promptfoo.yaml, the Promptfoo results page shows the full prompt as an ugly JSON array, like: [ { "role": "system", "content": ".... long markdown ...." }, { "role": "user", "content": "...." } ] Is there a way to make Promptfoo show the actual Markdown inside the content fields, instead of the raw JSON structure? Ideally I'd like to see the formatted prompt (headings, lists, etc.) the same way a user would see it — not the full message object. 2) Getting OpenAI Response IDs in Promptfoo Is there a way to extract the OpenAI response id from each run so that I can click/open that response inside the OpenAI API logs? I don’t see the response ID in the result JSON, even when using the OpenAI provider with logprobs or raw: true. Is there a config option or hook for surfacing the model’s id (e.g., resp.id like chatcmpl-abc123) in the Promptfoo results?
  • Bedrock Provider
    e

    ellebarto

    11/20/2025, 3:01 PM
    I am reaching out to check if the Bedrock provider response includes the input/output token count of all the models on bedrock.
  • Set reasoning effort for open router models
    c

    CYH

    11/24/2025, 7:41 PM
    Does the open router config support setting reasoning effort? something like this
    Copy code
    config:
      reasoning:
        effort: minimal
    u
    • 2
    • 1
  • Using my own LLM for generating plugin inputs - unclear docs?
    d

    dulax

    11/26/2025, 1:43 AM
    Hi, I am looking at https://www.promptfoo.dev/docs/red-team/plugins/ and I've configured a redteam config that uses the pii plugin - which according to the docs doesn't require promptfoo's servers for inference. I've setup my provider as vertex:gemini-2.5-flash but when I run promptfoo redteam generate with -v I see calls to promptfoo's APIs for remote inference even with
    PROMPTFOO_DISABLE_REMOTE_GENERATION=true
    and
    PROMPTFOO_SELF_HOSTED=1
    Is using my own LLM for redteam generation just not supported at all?
    m
    • 2
    • 3
  • Does using different provider for each test case is supported,global provider is overriding
    n

    Nithya

    11/26/2025, 4:17 PM
    Iam trying to use two different providers,but my global provider is overriding my test case provider,what may be the reason?
    u
    • 2
    • 1
  • Adding more customization
    s

    Sarra

    11/27/2025, 12:26 PM
    Hello, Can anyone help me find tutorials or documentation that could help me to do more customization on the redTeaming part. I am struggling to adapt the promptfoo features to my application and the documentation provided online is not sufficient and I cannot find many videos on custom plugins and custom tests. And whether there is an other way to customize the tests, I could any information that you have. Thank you for your help!
  • Number of Input Tokens
    s

    Sarra

    11/27/2025, 12:52 PM
    Hello, Does anyone know whether it's possible or not to limit the number of input tokens. If so please let me know how can it be done exactly? Thank you for your help!
    m
    • 2
    • 2
  • STDErr - Python Worker Crash after evaluation is completed during promptfoo run
    c

    curious_battle

    11/28/2025, 5:46 AM
    This issue happens inconsistently, initially we were using homebrew for installation and was getting this issue after some time, later on we used npm installation and the issue was resolved for some time, but again got the same issue. I have attached the config yaml file for reference, someone please help with this. We're on the final stages of self-hosting this for internal POC to the larger team, and consistently getting this issue leading to OOM both in local as well as production server https://cdn.discordapp.com/attachments/1443840296810057778/1443840297867018442/chitchat_langfuse_promptfoo.yaml.tmpl?ex=692a8890&is=69293710&hm=5bd28de4c7b73796703aeadc7b7877992493d021e6bdf82775700980c8b5c255&
    m
    • 2
    • 1
  • Evaluating LLM Responses using MCP servers
    d

    David

    12/03/2025, 2:34 PM
    Hi! I’ve been using Promptfoo for a while to create agents and evaluate models. Now, I’m looking to create an agent that calls some tools from a remote MCP in order to interpret the results and produce a specific output. I have configured my provider as follows: - id: anthropic:messages:claude-3-7-sonnet-20250219 label: claude-3-7-sonnet config: temperature: 0.5 max_tokens: 7000 mcp: enabled: true server: url: verbose: true debug: true However, when I create tests, the output always only shows the first tool call: {"type":"tool_use","id":"toolu_01LFZRH5cb34ZZDu6SF3jHES","name":"getBrandSettings","input":{}} Is there any configuration that allows the LLM to execute or plan multiple tool uses inside Promptfoo? I have searched a bit about this and everyone talks about doing a wrapper, but I would like to know if there's an alternative
    m
    • 2
    • 6