# ❓|questions
  • _conversation / previous messages for Simulated User and Assistant
    Elias_M2M

    10/29/2025, 9:38 AM
    Hello, I would like to test a multi-turn conversation between an assistant and a simulated user. The assistant's prescribed conversation flow is very long, and for my current test cases I only need to test the end of the conversation. For these tests the previous messages are very important, so the simulated user and the assistant need to know what "they" said before. I saw in the docs that there is an option of adding a "messages" or "_conversation" variable, but I don't know how this behaves with the simulated-user provider. Is it possible to define the previous messages for both the assistant and the simulated user, so they know where to continue the conversation? And how can I do this?
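    One possible way to seed prior turns, as a minimal sketch (assuming a JSON chat prompt rendered with Nunjucks; the history variable, file names, and values are illustrative, and behavior with the simulated-user provider is unconfirmed):

        # promptfooconfig.yaml (sketch)
        prompts:
          - file://prompt.json
        tests:
          - vars:
              history:
                - { role: 'user', content: 'Earlier user turn...' }
                - { role: 'assistant', content: 'Earlier assistant turn...' }
              question: 'The message the conversation should resume from'

        # prompt.json (a Nunjucks template that renders the seeded turns before the new message)
        [
          {% for msg in history %}
          { "role": "{{ msg.role }}", "content": "{{ msg.content }}" },
          {% endfor %}
          { "role": "user", "content": "{{ question }}" }
        ]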
  • Prompt generation only
    b00l_

    10/29/2025, 2:36 PM
    Hello, I have a redteam.yaml file with a bunch of plugins enabled. Is it possible to just generate the prompts and save them to a file, based on all of the enabled plugins? Can I do it locally only, even with just an OpenAI key?
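    Generating without evaluating is supported by the CLI; a minimal sketch (the output filename is illustrative):

        promptfoo redteam generate -c redteam.yaml --output redteam-tests.yaml

    Whether generation stays fully local depends on the plugins: some are generated with remote (promptfoo-hosted) inference unless that is disabled, so an OpenAI key alone may not cover every enabled plugin.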
  • How to add dynamic prompt with multiple placeholders inside promptfooconfig?
    curious_battle

    11/04/2025, 4:50 AM
    My prompt looks like {"role": "system", "content": <{company}, {company_description}, ..., {previous_context}>}, and it is then repeated at the user level with a few more placeholders. How can I use this reliably inside promptfooconfig, with the variable values kept separately in another file, so that the prompt gets built up completely and we can then test against user_input? Currently the prompts section accepts a prompt with placeholders, but there seems to be no support for passing the placeholder values, and a CSV input only allows a single input column.
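    A minimal sketch of one documented pattern (assuming Nunjucks-style {{placeholders}} in the prompt file; names and values are illustrative): keep the templated messages in a prompt file, and supply each placeholder as a test variable, loading long values from files instead of a single-column CSV.

        # promptfooconfig.yaml (sketch)
        prompts:
          - file://prompt.json    # contains {{company}}, {{company_description}}, {{previous_context}}, {{user_input}}
        tests:
          - vars:
              company: Acme Corp
              company_description: file://company_description.txt    # var values can be loaded from files
              previous_context: file://context1.txt
              user_input: 'How do I reset my password?'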
  • Retrying tests
    thomas_romas

    11/04/2025, 1:04 PM
    I am running a basic red-teaming evaluation with some plugins enabled. Nothing crazy, I am just trying to see how it works. I pointed promptfoo at my Azure OpenAI model. The evaluation doesn't finish for multiple hours and is stuck at the following output:
        ...
        Chunk 72 partial success: 24/8 test cases, retrying missing ones individually
        ...
    The chunk number keeps increasing, and I am not sure what it means. Is there anything I can do to skip the retrying, or at least see partial results of the evaluation?
  • Any advice for really long-running models like GPT-5-pro?
    CasetextJake

    11/04/2025, 9:30 PM
    I'd like to run some evals with GPT-5-pro, and usually around 50% of them error out. I get a variety of errors:

        API call error: Error: Request failed after 4 retries: TypeError: fetch failed (Cause: Error: getaddrinfo ENOTFOUND api.openai.com)
        API call error: Error: Request failed after 4 retries: Error: Request timed out after 300000 ms
        API call error: Error: Error parsing response from https://api.openai.com/v1/responses: Unexpected token '<', " <h"... is not valid JSON. Received text: 502 Bad Gateway 502 Bad Gateway cloudflare
        API call error: Error: Error parsing response from https://api.openai.com/v1/responses: Unexpected token 'u', "upstream c"... is not valid JSON. Received text: upstream connect error or disconnect/reset before headers. reset reason: connection termination

    Presumably I can resolve one of these by increasing the amount of time per completion, but the other ones... Curious if there are tips for working with models like these. Thanks!
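    One mitigation sketch for the gateway errors (the -j/--max-concurrency flag is documented; the value is illustrative): lowering concurrency reduces how many long-running requests hit api.openai.com at once.

        promptfoo eval -j 2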
  • OSS version limits
    Man

    11/06/2025, 8:07 PM
    What are the adversarial prompt generation limits in the open source version?
  • Mitigation
    Jan!

    11/09/2025, 10:32 AM
    Is there a way to enable the mitigation option on the open source version? I'd be very happy to know how to fix the issues my application has haha.
  • Does anyone else have a Python provider problem?
    Monini

    11/13/2025, 9:00 AM
    I'm using promptfoo version 0.119.6. In my YAML I have configured the provider like this: providers: - id: 'file://retrieve_answer.py'. I get this error:
        [logger.js:324] Python worker stderr: ERROR handling call: [Errno 2] No such file or directory: 'C'
        Traceback (most recent call last):
          File "C:\Users\x\AppData\Roaming\npm\node_modules\promptfoo\dist\src\python\persistent_wrapper.py", line 191, in handle_call
            with open(request_file, "r", encoding="utf-8") as f:
                 ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        FileNotFoundError: [Errno 2] No such file or directory: 'C'

        [logger.js:324] Python worker stderr: ERROR: Failed to write error response: [Errno 22] Invalid argument: '\\Users\\x\\AppData\\Local\\Temp\\promptfoo-worker-req-1763023104223-7f0714e90c0d6.json:C:\\Users\\x\\AppData\\Local\\Temp\\promptfoo-worker-resp-1763023104223-8d908bcfbd3be.json'

        [logger.js:324] Shutdown complete
        [logger.js:324] Failed to read response file after 16 attempts (~18s). Expected: promptfoo-worker-resp-1763023104223-8d908bcfbd3be.json, Found in C:\Users\x\AppData\Local\Temp: promptfoo-worker-req-1762959388946-f0ac9b4f1dd7e.json, promptfoo-worker-req-1762959433064-8d405bf3e7d72.json, promptfoo-worker-req-1763023104223-7f0714e90c0d6.json
    I didn't have this problem some time ago, so I think it started after updating promptfoo.
  • Multiple prompts with each mapped to separate set of images & test cases/assertions
    nkhatwani

    11/13/2025, 10:56 AM
    Can someone please look into https://github.com/promptfoo/promptfoo/issues/6206 and reply accordingly.
  • Is it possible to set `response_format` per test, rather than per prompt?
    m0ltz

    11/16/2025, 5:19 PM
    I have a single prompt that I want to use, but the JSON Schema for the response is different for each test. I see from the docs that it's possible to set the schema on the provider or on a prompt, but not for a test. Are there any known hacks or workarounds to make that work?
  • Can we integrate and fetch langfuse datasets directly from promptfoo config file like prompts?
    curious_battle

    11/17/2025, 4:45 AM
    For example, the way we do langfuse://{prompt_name} for prompts.
  • Streamable HTTP MCP Server Testing
    Anupam Patil

    11/19/2025, 12:44 PM
    Hi team, @User. I am very new to promptfoo. I need to test a couple of MCP servers. I have one login endpoint and need to use the token from that API for further requests. I have search and fetch tools which require that token for authorization. How can I achieve this using promptfoo?
  • Cost Metric - Bedrock
    ellebarto

    11/19/2025, 3:38 PM
    Hi - are there any plans to have the cost metric work for other providers, such as AWS Bedrock?
  • Custom provider context for model graded asserts
    dracesw

    11/19/2025, 5:42 PM
    tl;dr: providers don't seem to be sent test context when used for asserts.

    Hi, I'm trying to use promptfoo to evaluate some agentic workflows. I have a custom Python provider that does some environment setup. I need to pass information from the provider completing the prompt to the provider evaluating an llm-rubric assert, information that doesn't belong in the prompt response. The context always seems to be empty when the provider is used for asserts. Is this working as intended, and if so, is there an intended way to pass this information to the assert provider?
  • How to Show Markdown Instead of JSON + How to Expose OpenAI Response IDs?
    IdoRozin

    11/19/2025, 7:50 PM
    Hey all — two related questions:

    1) Prompt display in Promptfoo. When using messages: in promptfoo.yaml, the Promptfoo results page shows the full prompt as an ugly JSON array, like:

        [ { "role": "system", "content": ".... long markdown ...." }, { "role": "user", "content": "...." } ]

    Is there a way to make Promptfoo show the actual Markdown inside the content fields, instead of the raw JSON structure? Ideally I'd like to see the formatted prompt (headings, lists, etc.) the same way a user would see it — not the full message object.

    2) Getting OpenAI response IDs in Promptfoo. Is there a way to extract the OpenAI response id from each run so that I can click/open that response inside the OpenAI API logs? I don’t see the response ID in the result JSON, even when using the OpenAI provider with logprobs or raw: true. Is there a config option or hook for surfacing the model’s id (e.g., resp.id like chatcmpl-abc123) in the Promptfoo results?
  • Bedrock Provider
    ellebarto

    11/20/2025, 3:01 PM
    I am reaching out to check whether the Bedrock provider response includes the input/output token count for all of the models on Bedrock.
  • Set reasoning effort for open router models
    CYH

    11/24/2025, 7:41 PM
    Does the OpenRouter config support setting reasoning effort? Something like this:
        config:
          reasoning:
            effort: minimal
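    A minimal sketch of what that could look like (assuming promptfoo forwards the provider config block to the OpenRouter API; the model id and effort value are illustrative):

        providers:
          - id: openrouter:openai/o3-mini
            config:
              reasoning:
                effort: minimal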
  • Using my own LLM for generating plugin inputs - unclear docs?
    dulax

    11/26/2025, 1:43 AM
    Hi, I am looking at https://www.promptfoo.dev/docs/red-team/plugins/ and I've configured a redteam config that uses the pii plugin, which according to the docs doesn't require promptfoo's servers for inference. I've set up my provider as vertex:gemini-2.5-flash, but when I run promptfoo redteam generate with -v, I see calls to promptfoo's APIs for remote inference, even with PROMPTFOO_DISABLE_REMOTE_GENERATION=true and PROMPTFOO_SELF_HOSTED=1. Is using my own LLM for redteam generation just not supported at all?
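    One thing worth checking, as a sketch (an assumption based on the red-team docs: red-team generation is controlled by its own environment variable, separate from PROMPTFOO_DISABLE_REMOTE_GENERATION):

        PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true promptfoo redteam generate -c redteam.yaml -v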
  • Is using a different provider for each test case supported? The global provider is overriding it
    Nithya

    11/26/2025, 4:17 PM
    I am trying to use two different providers, but my global provider is overriding my test case provider. What may be the reason?
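    A minimal sketch of the intended shape (assuming the test-case schema's per-test provider field; provider ids are illustrative):

        providers:
          - openai:gpt-4o-mini    # global default
        tests:
          - vars: { question: 'Handled by the global provider' }
          - provider: anthropic:messages:claude-3-5-sonnet-20240620
            vars: { question: 'Handled by the per-test provider' }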
  • Adding more customization
    Sarra

    11/27/2025, 12:26 PM
    Hello, can anyone help me find tutorials or documentation for doing more customization on the red-teaming part? I am struggling to adapt the promptfoo features to my application, the documentation provided online is not sufficient, and I cannot find many videos on custom plugins and custom tests. If there is another way to customize the tests, I would appreciate any information that you have. Thank you for your help!
  • Number of Input Tokens
    Sarra

    11/27/2025, 12:52 PM
    Hello, does anyone know whether it's possible to limit the number of input tokens? If so, please let me know exactly how it can be done. Thank you for your help!
  • STDErr - Python Worker Crash after evaluation is completed during promptfoo run
    curious_battle

    11/28/2025, 5:46 AM
    This issue happens inconsistently. Initially we were using a Homebrew installation and were getting this issue after some time; later we switched to the npm installation and the issue went away for a while, but then it came back. I have attached the config YAML file for reference; could someone please help with this? We're in the final stages of self-hosting this for an internal POC to the larger team, and we consistently get this issue leading to OOM both locally and on the production server: https://cdn.discordapp.com/attachments/1443840296810057778/1443840297867018442/chitchat_langfuse_promptfoo.yaml.tmpl?ex=692a8890&is=69293710&hm=5bd28de4c7b73796703aeadc7b7877992493d021e6bdf82775700980c8b5c255&
  • Evaluating LLM Responses using MCP servers
    David

    12/03/2025, 2:34 PM
    Hi! I’ve been using Promptfoo for a while to create agents and evaluate models. Now I'm looking to create an agent that calls some tools from a remote MCP in order to interpret the results and produce a specific output. I have configured my provider as follows:

        - id: anthropic:messages:claude-3-7-sonnet-20250219
          label: claude-3-7-sonnet
          config:
            temperature: 0.5
            max_tokens: 7000
            mcp:
              enabled: true
              server:
                url:
            verbose: true
            debug: true

    However, when I create tests, the output always shows only the first tool call:

        {"type":"tool_use","id":"toolu_01LFZRH5cb34ZZDu6SF3jHES","name":"getBrandSettings","input":{}}

    Is there any configuration that allows the LLM to execute or plan multiple tool uses inside Promptfoo? I have searched a bit about this and everyone talks about writing a wrapper, but I would like to know if there's an alternative.
  • Problem with MCP call
    Matteo

    12/05/2025, 1:24 PM
    Title: MCP Server Returns Invalid Tool Schema - outputSchema.additionalProperties Must Be Boolean. Description: I'm encountering a validation error when connecting to an MCP server via Streamable HTTP transport. The connection succeeds and ping works, but the listTools() call fails with a schema validation error. Error message:
        Failed to connect to MCP server web-search: [
          {
            "code": "invalid_type",
            "expected": "boolean",
            "received": "object",
            "path": [
              "tools",
              0,
              "outputSchema",
              "additionalProperties"
            ],
            "message": "Expected boolean, received object"
          }
        ]
  • How should google sheets handle multiple providers?
    smaclell

    12/05/2025, 11:00 PM
    We want to output test results to Google Sheets with multiple providers. It is currently only outputting the last provider. I had expected it would match the CSV output instead. How should this behave? I'll share a simple PR ([link](https://github.com/promptfoo/promptfoo/pull/6528)) to fix it by showing all responses as unique columns. Happy to provide a more extensive change if you would like it to match the CSV format. Context: We are trying to improve the workflow for less technical users. We planned to have them write questions in a Google Sheet, then post the responses to a new sheet. p.s. We love promptfoo. Thank you for the fantastic framework.
  • How to contribute?
    Syed

    12/09/2025, 1:33 PM
    I went through https://www.promptfoo.dev/docs/contributing/, but didn't find any instructions on how to start. Should I comment on the issue that I am planning to pick up, or directly create a PR against an issue? I have been using promptfoo a lot for testing my prompts before every release we do, so I wanted to contribute a little to this wonderful project. I am more of a Python guy and can barely manage TS, so I wanted to start with something small. Sorry to tag you @michaelmichaelmichael, but can you suggest some issues which I can work on?
  • Question relating to my Output
    Tantrik

    12/09/2025, 11:57 PM
    Hi all, thanks for having me here. I was running promptfoo scan-model on my GGUF model and got the output below. I wanted to know the meaning of the GGUF metadata parse error.

        "vulnerabilities": [
          {
            "severity": "critical",
            "message": "GGUF metadata parse error: Array too large: 32064 elements",
            "location": "/tmp/tmp9j0ng3ng.gguf",
            "details": {
              "error": "Array too large: 32064 elements",
              "error_type": "ValueError"
            },
            "timestamp": 1765324185.1836855,
            "detection_category": "ValueError"
          },
          {
            "severity": "critical",
            "message": "GGUF metadata parse error: Array too large: 32064 elements",
            "location": "/tmp/tmp9j0ng3ng.gguf",
            "details": {
              "error": "Array too large: 32064 elements",
              "error_type": "ValueError"
            },
            "timestamp": 1765324185.1836665,
            "detection_category": "gguf_metadata_parsing"
          }
        ]

    Can someone please help me with this?
  • Error "API call error: https://api-inference.huggingface.co is no longer supported"
    vivek

    12/10/2025, 6:48 PM
    I'm trying to use a classifier-type assert in my test with the huggingface provider huggingface:text-classification:protectai/deberta-v3-base-prompt-injection. Running the eval returns an error: "API call error: https://api-inference.huggingface.co is no longer supported. Please use https://router.huggingface.co instead." It looks like Hugging Face changed their endpoint. How do I point to the new endpoint?
        - type: classifier
          provider: huggingface:text-classification:protectai/deberta-v3-base-prompt-injection
          config:
            apiEndpoint: https://router.huggingface.co/models/protectai/deberta-v3-base-prompt-injection
          value: 'SAFE'
          threshold: 0.9
  • Not clear how to set up using LangChain w/ BedrockConverse, and Claude as the model
    ptrin

    12/11/2025, 2:50 PM
    I'm just looking through the documentation, trying to get acquainted with how promptfoo can be used. Our setup uses LangChain to manage LLM inference via BedrockConverse, and we're experimenting with different models (which is part of why I'm looking into promptfoo, for model comparison). Our main use case is structured output; can anyone offer pointers on how to evaluate structured output responses from multiple models on the same provider?
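    A minimal sketch for comparing structured output across Bedrock models (the model ids and schema are illustrative assumptions; the is-json assertion can validate output against a JSON schema):

        providers:
          - bedrock:anthropic.claude-3-5-sonnet-20240620-v1:0
          - bedrock:meta.llama3-1-70b-instruct-v1:0
        tests:
          - vars:
              document: file://sample.txt
            assert:
              - type: is-json
                value:
                  type: object
                  required: ['title', 'summary']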
  • Error closing server: Server is not running
    Elias_M2M

    12/12/2025, 9:45 AM
    After upgrading to 0.120.1 (still present in 0.120.4), a strange bug occurred when viewing evals through localhost. I used promptfoo eval and promptfoo view to view the evals. When pressing Ctrl+C to stop the server, I see the following logs in the command line: "Shutting down server... / Error closing server: Server is not running. (in yellow) / Server closed". So far this is not causing any problems for me, but I was wondering if it is part of an underlying bug that could cause problems. I just wanted to let you know.