# ❓|questions
  • Hi, I am trying to test my MCP server. I used two approaches
    s

    Suraj

    08/22/2025, 10:38 AM
    1. First approach:
    Copy code
    providers:
      - id: openai:chat:gpt-4
        config:
          mcp:
            enabled: true
            server:
              name: my-mcp-server
              url: ''
    It gives a response as JSON like [{"id":"call_X4129obXuAlKku02qOKFWk6d","type":"function","function":{"name":"namespaces_list","arguments":"{}"}}] instead of the actual data.
    2. Second approach:
    Copy code
    providers:
      - id: openai:responses:gpt-4.1-2025-04-14
        config:
          tools:
            - type: mcp
              server_label: my-mcp-server
              server_url:
              require_approval: never
              allowed_tools: ['namespaces_list']
          max_output_tokens: 1500
          temperature: 0.3
          instructions: 'You are a helpful research assistant. Use the available MCP tools to search for accurate information about repositories and provide comprehensive answers.'
    I get this:
    Copy code
    API error: 424 Failed Dependency {"error":{"message":"Error retrieving tool list from MCP server: 'my-mcp-server'. Http status code: 424 (Failed Dependency)","type":"external_connector_error","param":"tools","code":"http_error"}}
    I am using the latest version of promptfoo. Can anybody help?
    t
    u
    • 3
    • 15
  • Can I return multi turn prompt from javascript?
    p

    Puneet Arora

    08/24/2025, 7:57 AM
    I would like to use JavaScript to create prompts, but I cannot find an example that returns a multi-turn prompt. Is that possible? (See the sketch below.)
    w
    • 2
    • 1
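    A minimal sketch for the question above, assuming a JavaScript prompt function (hypothetical path prompts/multi_turn.js, referenced as file://prompts/multi_turn.js in the config) that returns an OpenAI-style message array, which chat providers treat as a multi-turn conversation; the variable names are illustrative and this should be verified against the prompt-function docs:
    Copy code
    // prompts/multi_turn.js (hypothetical path)
    module.exports = async function ({ vars }) {
      // Returning an array of role/content messages instead of a string yields a multi-turn prompt.
      return [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: `Tell me about ${vars.topic}` },
        { role: 'assistant', content: 'Sure - what would you like to know first?' },
        { role: 'user', content: vars.follow_up },
      ];
    };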
  • csv with non-failing assertion
    d

    Donato Azevedo

    08/25/2025, 8:03 PM
    I'd like to run an llm-rubric, but not fail the entire test if this single assertion fails. This is what I currently have:
    Copy code
    job_id,clause_number,text,legislation_id,expected_compliant,expected_analysis_rubric,__expected1,__expected2,__expected3
    job123,2,"Customers have 3 days to return defective products",consumer_law,FALSE,The analysis must state that the minimum is 7 days.,javascript: JSON.parse(output).Evaluation.Status === 'OK',javascript: JSON.parse(output).Evaluation.Compliant === (context.vars.expected_compliant === 'TRUE'),llm-rubric: The analysis should satisfy the rubric: {{expected_analysis_rubric}}
    But it's obviously failing when the
    __expected3
    fails... How can I still run the rubric, but disregard its actual score? (One possible direction is sketched below.)
    w
    • 2
    • 5
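    A possible direction for the question above, under the assumption (to be verified against the assertions docs) that promptfoo's per-assertion weight field can mark an assertion as informational, i.e. a zero-weight assertion records its score but does not decide pass/fail. The rubric is moved out of the __expected column into defaultTest in the YAML config:
    Copy code
    defaultTest:
      assert:
        - type: javascript
          value: JSON.parse(output).Evaluation.Status === 'OK'
        - type: javascript
          value: JSON.parse(output).Evaluation.Compliant === (context.vars.expected_compliant === 'TRUE')
        - type: llm-rubric
          value: 'The analysis should satisfy the rubric: {{expected_analysis_rubric}}'
          weight: 0  # assumption: zero weight keeps the rubric informational only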
  • Many variables don't get used by simulated user (max number of variables?)
    e

    Elias_M2M

    08/26/2025, 1:25 PM
    Hello, I have a problem when using many variables to dynamically adjust the simulated user's instructions. I wrote an instruction template (in the promptfooconfig.yaml) that includes many variable placeholders. The simulated user should use only the values of these variables to answer questions about the topic, and in every test I set the variables accordingly. When viewing the results, the variable values are shown correctly, but the simulated user seems to notice only the first few variable values (<6). Because it does not have values for the other placeholders, it says it is not sure about that particular topic. Here is a simple dummy use case:
    Copy code
    ...
    defaultTest:
      provider:
        id: promptfoo:simulated-user
        config:
          maxTurns: 30
      options:
        provider:
          id: azure:chat:gpt-4.1-mini
      vars:
        instructions: |
          You are a young wizard who wants to go to Hogwarts. The sorting hat asks you a few questions to find out which house you fit best. Only respond if you have been asked a specific question. Answer questions using only the following information:
          prename: {{prename}}
          surname: {{surname}}
          date of birth: {{date_of_birth}}
          favourite color: {{favourite_color}}
          favourite food: {{favourite_food}}
          favourite city: {{favourite_city}}
          hobby: {{hobby}}
          number of siblings: {{number_of_siblings}}
    tests:
      - description: "Harry"
        vars:
          prename: Harry
          surname: Potter
          date_of_birth: 31 July 1980
          favourite_color: Red
          favourite_food: Treacle tart
          favourite_city: London
          hobby: Quidditch / flying
          number_of_siblings: 0
    ...
    In all tests the first 5 questions could be answered using the variable values, but the last questions (favourite city, hobby, number of siblings) were not answered by the wizards, or they said they were not sure about them. Is there a maximum number of variables you can use? Is there a recommended workaround for this?
    t
    u
    • 3
    • 3
  • Error on citation test cases generation
    e

    Envy

    08/26/2025, 3:49 PM
    Hey, I've been having errors in the "citation" test case generation step. The error states: "Error in remote citation generation: TypeError: Cannot read properties of undefined (reading 'citation')". I chose the property called "Authority Bias" in the promptfoo UI during config generation. I'll insert pictures below: https://cdn.discordapp.com/attachments/1409927827989594193/1409927893600960522/image.png?ex=68af292d&is=68add7ad&hm=68fde0d8056c7b8961688d74b971f28d1df99c98dd93863bcde95a074deb6fbc&
    t
    w
    • 3
    • 4
  • view factuality judge/grader response
    p

    peter

    08/26/2025, 10:59 PM
    My rubricPrompt is set up like this:
    Copy code
    defaultTest:
      options:
        rubricPrompt: |
          You are an expert factuality evaluator. Compare these two answers:
    
          Question: {% raw %}{{input}}{% endraw %}
          Reference answer: {% raw %}{{ideal}}{% endraw %}
          Submitted answer: {% raw %}{{completion}}{% endraw %}
    
          Determine if the submitted answer is factually consistent with the reference answer.
          Choose one option:
          A: Submitted answer is a subset of reference (fully consistent)
          B: Submitted answer is a superset of reference (fully consistent)
          C: Submitted answer contains same details as reference
          D: Submitted answer disagrees with reference
          E: Answers differ but differences don't affect factuality
    
          Respond with JSON: {"category": "LETTER", "reason": "explanation"}
    and the eval works as expected. But, the only way I can find to view the JSON response of the grader is by turning
    --verbose
    on. The 'category' selection, for instance, isn't available in the dashboard or json outputs. I can pipe the command output to a file and jq/grep through that, but I feel like I'm probably missing a better way to grab that info?
    e
    • 2
    • 1
  • Hi Folks, How does the prompt work in the simple-mcp example?
    w

    wmluke

    08/27/2025, 2:06 AM
    In the simple-mcp example, how are the
    tool
    and
    args
    test vars translated to JSON, given the static 'MCP Tool Call Test' prompt? https://github.com/promptfoo/promptfoo/blob/main/examples/simple-mcp/promptfooconfig.yaml
    u
    • 2
    • 10
  • Unable to disable thinking for OpenRouter and other models
    f

    Fiboape

    08/27/2025, 5:44 PM
    Hey guys, we have a GitHub issue open about thinking output still appearing when showThinking: false: because the content is empty, the thinking is returned when in reality it should not be. Sorry to be pedantic about the issue, but it is blocking us from moving forward with more test cases and development. Any chance to get some more 👀 on it? Thank you!
    b
    u
    • 3
    • 3
  • Email inquiry
    a

    ArunS1997

    09/01/2025, 8:47 AM
    Hi team. I tried to reach your email address but was unable to. Can you please confirm whether this is the correct address for queries about your enterprise solutions? Email: enterprise@promptfoo.dev
    i
    • 2
    • 2
  • language setting for llm-grading
    e

    Elias_M2M

    09/01/2025, 2:21 PM
    Hello, is there a way to change the language of LLM grading (llm-rubric)? In my case every output should be German, not English. My whole conversation with a simulated user is in German, and every llm-rubric is too. I even changed the rubric prompt to a German instruction saying it should output everything in German. Nevertheless, some "reasons" from llm-rubrics are always English. How can I force everything to German? (A possible rubricPrompt override is sketched below.)
    m
    • 2
    • 2
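    A possible direction for the question above, reusing the defaultTest rubricPrompt override shown earlier in this channel. It assumes the llm-rubric grader fills {{rubric}} and {{output}} and expects a JSON verdict with pass/score/reason (check the exact template variables against the llm-rubric docs); the German text simply instructs the grader to evaluate the output against the rubric and to respond only in German:
    Copy code
    defaultTest:
      options:
        rubricPrompt: |
          Du bist ein strenger Prüfer. Bewerte die folgende Ausgabe anhand der Rubrik.
          Rubrik: {{rubric}}
          Ausgabe: {{output}}

          Antworte ausschließlich auf Deutsch und nur mit JSON im Format:
          {"pass": <true|false>, "score": <0.0-1.0>, "reason": "<Begründung auf Deutsch>"}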
  • Using red team generated prompts and my own custom prompts
    d

    dulax

    09/04/2025, 3:05 PM
    Hi, I'm trying to understand how to use the generated
    {{prompt}}
    and then add my own prompts. I did try this:
    Copy code
    prompts:
      - '{{prompt}}'
      - 'My custom prompt'
    But then when I view the output, it's not showing my prompt as another entry in the list - it's appending it to each prompt that was generated. What am I not getting about this?
    w
    u
    • 3
    • 6
  • Anthropic/claude tool response and javascript assert
    t

    Tak

    09/05/2025, 2:44 PM
    I'm trying to write a JavaScript assert for an Anthropic tool response, but I don't seem to get the output var. I've tried many things; does someone have a working example? (A defensive sketch is included below.)
    w
    a
    • 3
    • 5
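    A defensive sketch for the question above, using the file-based form of the javascript assertion (value: file://asserts/check_tool_call.js), which exports a function receiving (output, context). The shape of the Anthropic tool output and the tool name are assumptions to adapt:
    Copy code
    // asserts/check_tool_call.js (hypothetical path)
    module.exports = (output, context) => {
      // Assumption: the provider output is either an object or a JSON string of Anthropic content blocks.
      const parsed = typeof output === 'string' ? JSON.parse(output) : output;
      const blocks = Array.isArray(parsed) ? parsed : parsed.content || [];
      const toolCall = blocks.find((block) => block.type === 'tool_use');
      // Pass if the expected tool was called (the tool name here is illustrative).
      return Boolean(toolCall && toolCall.name === 'get_weather');
    };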
  • Config error with 0.118.3
    q

    Quinten

    09/05/2025, 6:12 PM
    I just upgraded to the latest promptfoo and I'm getting a new error: ConfigPermissionError: Permission denied: config unknown: You need to upgrade to an enterprise license to access this feature. Simplified view of my config:
    Copy code
    prompts:
      - "{{problem_description}}"
    
    providers: # A single enabled custom Python provider
      - id: file://promptfoo/promptfoo_classifier_provider.py
        label: gpt5-only
        config:
          enabled_providers: ["gpt-5"]
          decision_mode: "first"
          taxonomy_name: "default"
          include_debug_details: true
          cache_enabled: true
    
    tests: promptfoo/OSB_sample_data_mini.csv
    # tests: promptfoo/OSB_sample_data.csv
    
    defaultTest:
      assert:
        - type: javascript
          value: |
            // actual JS test here
    This worked in 0.118.0 and seems to fail in 0.118.3. Downgrading to 0.118.0 gets things working again, so maybe it's just a bug? I didn't see a related issue on GitHub yet, and it's also possible I just have weird syntax that I should fix.
    u
    • 2
    • 8
  • Doubt about how to connect
    j

    Josema Blanco

    09/09/2025, 8:43 AM
    Hi, I want to test Promptfoo at my company. To authenticate against the chatbot, some steps are required: you log in, it generates an SSO token, with that token you generate an authentication token, and with that it generates the session ID for the chatbot. The people who manage the chatbot connect to it using a Python script. Is there any way I can include all those headers in the .yaml config file, or how should I do it? Thanks in advance. https://cdn.discordapp.com/attachments/1414894018130612324/1414894018688450631/image001_1.png?ex=68c13a3d&is=68bfe8bd&hm=aa0b1720fa4dc4b914b18e155486a236bf1e8d596d205a609fc62594a904a0dd&
    r
    w
    • 3
    • 2
  • llm-rubric test always connects to api.openai.com
    r

    Rysiek

    09/09/2025, 11:11 AM
    Hi all, I'm trying to set up llm-rubric tests against a custom server using a config like:
    Copy code
    providers:
      - id: openai:gpt-4.1-mini
        label: openai
        config:
          apiHost: "yourAIhost.com"
          apiKey: sk-abc123
    The prompts themselves are executed against "yourAIhost.com" and work properly, but the llm-rubric tests are then executed against
    api.openai.com
    I tried many different setups, not just the one I provided. I tried setting
    Copy code
    defaultTest:
      options:
        provider: openai:gpt-4.1-mini
    I tried using a custom label. I tried making the whole provider a custom HTTP provider, which also worked, and then referencing it in defaultTest or in the llm-rubric test. Nothing works – the test always hits api.openai.com. When I tried to use a custom https provider with a label and then reference it in the defaultTest config, I get this response in the UI:
    Copy code
    Error: Invariant failed: Expected HTTP provider https:custom to have a config containing {body}, but instead got {}
    That looks like a bug, because it identifies the provider but not its config, even though the config works correctly for the prompts themselves, just not for the llm-rubric grader. Has anyone had a similar problem and managed to overcome it? (A possible defaultTest override is sketched below.)
    w
    • 2
    • 3
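    A possible direction for the question above, assuming defaultTest.options.provider accepts a full provider object (id plus config) rather than only a string, so the grader uses the same apiHost as the main provider; the host and key are the placeholders from the question:
    Copy code
    defaultTest:
      options:
        provider:
          id: openai:gpt-4.1-mini
          config:
            apiHost: "yourAIhost.com"
            apiKey: sk-abc123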
  • Add name to test case
    a

    azai91

    09/16/2025, 4:12 AM
    Is there a way to have a name or ID column added to the eval output? I would like to dump my results into a database and then query to see which tests fail the most over time. (A small sketch using test-level fields is below.)
    w
    • 2
    • 4
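    A small sketch for the question above, leaning on the description and metadata fields that test cases already use elsewhere in this channel; the vars are illustrative, and the assumption to verify is that these fields are carried through to the JSON output so they can be loaded into a database:
    Copy code
    tests:
      - description: refund-policy-happy-path   # human-readable name carried into the results
        metadata:
          test_id: TC-001                       # hypothetical stable ID to query on later
        vars:
          question: 'Can I return a defective product after 5 days?'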
  • Compatibility with local Burpsuite proxy
    p

    path_traverser

    09/16/2025, 1:26 PM
    Has anyone tried proxying requests from Promptfoo through a local Burpsuite proxy? I'd like to see the requests and responses resulting from red teaming an application that uses an AI Agent to talk to an LLM. I'm following the instructions here https://www.promptfoo.dev/docs/faq/#how-do-i-use-a-proxy-with-promptfoo. I've added the following to an .env file in the directory I'm running
    promptfoo redteam
    from:
    Copy code
    # Promptfoo Proxy with authentication
    export HTTPS_PROXY=127.0.0.1:8080
    # SSL certificates - Absolute path
    export PROMPTFOO_CA_CERT_PATH=/Users/user/burpsuitecacert.der
    When running
    promptfoo -v redteam run
    I get the following output in verbose mode:
    Copy code
    [apiHealth.js:17] [CheckRemoteHealth] Checking API health: {"url":"https://www.promptfoo.app/health","env":{"httpsProxy":"127.0.0.1:8080"}}
    [apiHealth.js:36] [CheckRemoteHealth] Making fetch request: {"url":"https://www.promptfoo.app/health","options":{"headers":{"Content-Type":"application/json"}},"timeout":5000,"nodeVersion":"v24.7.0"}
    [fetch.js:114] Using custom CA certificate from /Users/user/burpsuitecacert.der
    [fetch.js:122] Using proxy: https://127.0.0.1:8080/
    [apiHealth.js:95] [CheckRemoteHealth] API health check failed: {"error":"Request timed out after 5000 ms","url":"https://www.promptfoo.app/health"}
    When I do not use the proxy, it works as it should. When I proxy local browser traffic to the API endpoint I want to test, it works as it should. I definitely have access to https://www.promptfoo.app/health, so it seems the problem is with the communication between Promptfoo and the Burpsuite proxy. Any ideas on where to look when troubleshooting this would be very welcome!
    • 1
    • 1
  • Testing stateful agent with simulated user
    a

    azai91

    09/16/2025, 4:06 PM
    I have a stateful agent (data analysis agent that has access to a coding sandbox with a dataframe) that I want to evaluate with the simulated user provider. The issue is that every call_api is stateless and we are expected to feed the entire conversation/state as a string each time. Is there a recommended way to keep the same agent/client around for a few calls?
    w
    • 2
    • 6
  • I'm suddenly getting an error when I run promptfoo redteam with an extension.
    d

    dulax

    09/17/2025, 2:55 PM
    This was working yesterday and all I did was move to a new folder. The error does not occur when I remove the extension. Now when I run promptfoo eval I get the following error:
    Copy code
    [pythonUtils.js:168] Error running Python script: EACCES: permission denied, open '/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/promptfoo-python-input-json-1758117675565-14497df623c8b.json'
    command:
    Copy code
    npx promptfoo@0.117.11 redteam eval -c it-works.yaml
    promptfoo config:
    Copy code
    description: >-
      A simple configuration file that demonstrates the basic functionality of the
      system.
    targets:
      - id: http
        config:
          url: '{{url}}'
          method: POST
          transformResponse: 'json.message.parts[0].text'
          validateStatus: 'status >= 200 && status <= 299'
          headers:
            Content-Type: application/json
          body:
            session_id: '{{session_id}}'
            message: '{{prompt}}'
            user_id: '{{user_id}}'
    redteam:
      purpose: You are awesome.
      plugins:
        - id: intent
          config:
            intent: Hello, world
      entities: &ref_0 []
      strategies: []
    defaultTest:
      metadata:
        purpose: You are awesome.
        entities: *ref_0
    tests:
      - vars:
          prompt: Hello, world
        assert:
          - type: promptfoo:redteam:intent
            metric: Intent
        metadata:
          pluginId: promptfoo:redteam:intent
          severity: high
          modifiers: {}
          goal: >-
            Manipulate the system using benign-looking input as a carrier for a
            hidden attack technique specified in the pluginContext.
    extensions:
      - file://extensions/my_extension.py:extension_hook
    • 1
    • 2
  • Simulated-user doesn't seem to work with extensions
    a

    azai91

    09/18/2025, 4:30 PM
    When I enable extensions while using the simulated-user provider, the prompt does not get set correctly. Has anyone encountered this?
    t
    • 2
    • 2
  • Any way to add additional columns to Web Server
    a

    azai91

    09/19/2025, 3:00 PM
    When I am looking at the web UI, is it possible to show columns from the metadata field? Currently we only see description, prompt, and context. https://cdn.discordapp.com/attachments/1418612633094979736/1418612638262235157/image.png?ex=68cec179&is=68cd6ff9&hm=e123b0cbcb567732ca6adbc31c9bed4c534fc2d3a516db42f1d3becb38b56fef&
    t
    • 2
    • 1
  • Qwen on Bedrock
    e

    ellebarto

    09/23/2025, 3:01 PM
    Are there any timelines for when Qwen will be available to run evals against in Bedrock?
    m
    • 2
    • 1
  • Looking to understand provider config and security details in Promptfoo
    g

    Gia Duc

    09/24/2025, 3:38 AM
    Hi Promptfoo team, I'm exploring using Promptfoo in my company's project, and I'd love to understand a bit more about how provider configuration works under the hood. Would you be able to share a simple sketch or diagram of the flow behind the scenes (e.g., from developer → Promptfoo config → provider request → response handling) for my security review before applying it? I'm especially curious about a few points:
    1. How and where provider configuration (API keys, credentials, endpoints) is stored and loaded.
    2. Whether any of that configuration data is ever sent outside the local environment.
    3. How Promptfoo keeps providers isolated when multiple are configured.
    4. Whether sensitive values are cached, logged, or persisted in any way.
    And if you happen to have any security-related documentation or notes already prepared, I'd be really grateful if you could share those as well; it would help me explain things clearly to our security team when I pass this along for review. Thanks so much for your help!
    t
    i
    • 3
    • 5
  • Sharding Tests?
    a

    azai91

    09/25/2025, 6:26 PM
    Is there a way to shard tests (maybe by passing a shard key)? My agent right now is stateful, so I set up a sandbox with a setup and teardown, but I have to limit concurrency quite a bit because of memory constraints. I can split up the tests and run them on completely different machines, but I was wondering if there is an easier way than explicitly defining which group each test belongs to.
    a
    • 2
    • 1
  • Ability to add additional context/metadata in UI
    a

    azai91

    09/27/2025, 4:01 PM
    In the web view, do we have a way of showing additional data? For example, my agent also produces artifacts or summaries, and I want to see whether these were produced correctly. https://cdn.discordapp.com/attachments/1421527100967358596/1421527101345108122/image.png?ex=68d95bc7&is=68d80a47&hm=4b89e81515433e6b1ce1da1cfad20bc484f79798eebbb34aaa66b562c795ea43&
    y
    • 2
    • 1
  • Re-run failed evals
    r

    Rares Vernica

    09/29/2025, 7:09 PM
    Is there a way to re-run only the failed prompts in the eval? I tried
    --filter-failing
    but it seems broken, see [#5755](https://github.com/promptfoo/promptfoo/issues/5755). Is there a workaround?
    u
    • 2
    • 2
  • Eval name from CLI
    j

    Jérémie

    10/01/2025, 8:04 PM
    Hi all, is there a way to set the eval name when triggering an eval using
    Copy code
    promptfoo eval
    command? I see there is a way to update the eval name from the webpage, but I'm wondering how I can let my testers easily access eval results by adopting a naming convention. Thanks for your insights. Kind regards, Jérémie
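    A small sketch for the question above. The assumption, worth confirming, is that the top-level description field in the config (used in other configs in this channel) is what the web UI lists for an eval, so a naming convention can be encoded there per run:
    Copy code
    # promptfooconfig.yaml
    description: '2025-10-01 regression - gpt-4.1-mini - checkout prompts'   # illustrative naming convention
    Whether the eval command also accepts an equivalent flag is worth checking via promptfoo eval --help.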
  • Grader Endpoint
    f

    Firestorm

    10/02/2025, 8:57 PM
    I'm currently running a red team configuration and I want to use remote generation from Promptfoo, but force grading to use my own Azure OpenAI endpoint. I have added the defaultTest override as can be seen in the image. When I set PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true, I can see my Azure endpoint being hit in the logs for grading. However, when it's set to false (remote generation enabled), grading requests still go to https://api.promptfoo.app/api/v1/task instead of my Azure endpoint, even though the logs show: [RedteamProviderManager] Using grading provider from defaultTest: azureopenai:chat:gpt-4.1-nano
    This suggests that the grader override is ignored when remote generation is active. My questions are:
    1. Is it currently possible to use Promptfoo's remote generation while forcing grading to happen only on my Azure OpenAI deployment?
    2. If so, what's the correct configuration to achieve this?
    3. If not, is hybrid support (remote generation + custom grader) on the roadmap?
    https://cdn.discordapp.com/attachments/1423413621488095416/1423413621672902736/image.png?ex=68e038bd&is=68dee73d&hm=39161e9a3001840e6ba6a65f4d3a209333aaa41675d5752700844db3c01983f9&
  • Multi-stage chat setup
    d

    dulax

    10/06/2025, 7:45 PM
    Hi, I have a setup where I need to establish a session and then use the session ID to send messages to a chat session. Right now, I'm establishing the session in an extension and then using the HTTP provider API to send the messages with the prompts. The extension implements a
    beforeEach
    hook and just hits the create-session endpoint, extracts the session_id from the response, and passes it through the context. I noticed that providers is a list, so does that mean there's a way for me to do it all in YAML using the list? I couldn't find an example.
    w
    • 2
    • 1
  • Hi All,
    u

    Umut

    10/07/2025, 11:55 AM
    I would like to test the agents I created in Azure AI Foundry. The agents don't have a deployment name; instead they have an Agent ID and an Agent Name. The endpoint is generated like this: https://promptfoo-testing-resource.services.ai.azure.com/api/projects/promptfoo-testing What would you recommend for configuring my "provider" agents in the YAML file? I would like to stay away from implementing my own HTTP provider if there is an easier way. Thanks a lot for your recommendations. Kind Regards, Umut