# ❓|questions
  • Re-run failed evals

    Rares Vernica

    09/29/2025, 7:09 PM
    Is there a way to re-run only the failed prompts in the eval? I tried `--filter-failing`, but it seems broken, see [#5755](https://github.com/promptfoo/promptfoo/issues/5755). Is there a workaround?
  • Eval name from CLI

    Jérémie

    10/01/2025, 8:04 PM
    Hi all, is there a way to set the eval name when triggering an eval using the `promptfoo eval` command? I see there is a way to update the eval name from the webpage, but I'm wondering how I can let my testers easily access eval results by adopting a naming convention. Thanks for your insights. Kind regards, Jérémie
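    A minimal sketch of one possible approach: promptfoo config files accept a top-level description field, which is shown alongside the eval in the web UI; the naming convention below is hypothetical, and whether this fully serves as the eval "name" for your testers is worth verifying:
    ```yaml
    # sketch only: description appears next to the eval in the results list
    description: 'checkout-agent_2025-10-01_run1'   # hypothetical naming convention
    ```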
  • Grader Endpoint

    Firestorm

    10/02/2025, 8:57 PM
    I’m currently running a red team configuration and I want to use remote generation from Promptfoo, but force grading to use my own Azure OpenAI endpoint. I have added the defaultTest override, as can be seen in the image. When I set PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true, I can see my Azure endpoint being hit in the logs for grading. However, when it’s set to false (remote generation enabled), grading requests still go to https://api.promptfoo.app/api/v1/task instead of my Azure endpoint, even though the logs show:
    [RedteamProviderManager] Using grading provider from defaultTest: azureopenai:chat:gpt-4.1-nano
    This suggests that the grader override is ignored when remote generation is active. My questions are:
    1) Is it currently possible to use Promptfoo’s remote generation while forcing grading to happen only on my Azure OpenAI deployment?
    2) If so, what’s the correct configuration to achieve this?
    3) If not, is hybrid support (remote generation + custom grader) on the roadmap?
    https://cdn.discordapp.com/attachments/1423413621488095416/1423413621672902736/image.png?ex=68e038bd&is=68dee73d&hm=39161e9a3001840e6ba6a65f4d3a209333aaa41675d5752700844db3c01983f9&
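    For reference, a minimal sketch of the kind of defaultTest grader override described above; the deployment name is taken from the log line, and the exact nesting under options is an assumption worth checking against the grading docs:
    ```yaml
    # sketch only: route model-graded assertions to your own Azure OpenAI deployment
    defaultTest:
      options:
        provider: azureopenai:chat:gpt-4.1-nano  # deployment name from the log above
    ```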
  • Multi-stage chat setup

    dulax

    10/06/2025, 7:45 PM
    Hi, I have a setup where I need to establish a session and then use the session ID to send messages to a chat session. Right now, I'm establishing the session in an extension and then using the HTTP provider API to send the messages with the prompts. The extension implements a `beforeEach` hook that just hits the create-session endpoint, extracts the `session_id` from the response, and passes it through the context. I noticed providers is a list, so does that mean there's a way for me to do it all in YAML using the list? I couldn't find an example.
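    For context, a rough sketch of the two-part setup described above; the hook file, endpoint, header, and variable names are all placeholder assumptions, and it assumes the beforeEach hook can expose the session ID as a test variable:
    ```yaml
    # sketch only: a beforeEach hook creates the session, the HTTP provider sends each prompt
    extensions:
      - file://session_hooks.py:extension_hook   # hypothetical hook that creates the session
    providers:
      - id: https
        config:
          url: https://chat.example.com/messages  # hypothetical chat endpoint
          method: POST
          headers:
            Content-Type: application/json
            X-Session-Id: '{{ session_id }}'      # assumes the hook injects session_id
          body:
            message: '{{ prompt }}'
    ```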
  • Hi All,

    Umut

    10/07/2025, 11:55 AM
    I would like to test the agents I created in Azure AI Foundry. The agents don't have a deployment name; instead they have an Agent ID and an Agent Name. The endpoint is generated like this: https://promptfoo-testing-resource.services.ai.azure.com/api/projects/promptfoo-testing How would you recommend configuring these agents as "providers" in the YAML file? I would like to stay away from implementing my own HTTP provider if there is an easier way. Thanks a lot for your recommendations. Kind Regards, Umut
  • Similar metric - vertex:text-embedding-005 support

    Gia Duc

    10/09/2025, 8:31 AM
    Hi, I have Vertex text-embedding-005 set up for the similar metric and got this message:
    ```
    [matchers.js:121] Provider vertex:text-embedding-005 is not a valid embedding provider for 'similarity check', falling back to default
    ```
    This is the config in defaultTest:
    ```yaml
    provider:
      embedding:
        id: vertex:text-embedding-005
    ```
    AFAIK, text-embedding-005 is available for the similarity task type: https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#assess_text_similarity Also, the syntax is valid according to the Promptfoo documentation for Vertex: https://www.promptfoo.dev/docs/providers/vertex/#embedding-models The test currently passes, but is that because it is falling back to the default provider, or something else? How can I use this text embedding model for my similarity assertion? Please help me take a look. Thank you
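    One hedged guess rather than a confirmed fix: in most examples the embedding override for similarity checks sits under options inside defaultTest, roughly like this:
    ```yaml
    # sketch only: embedding override nested under defaultTest.options
    defaultTest:
      options:
        provider:
          embedding:
            id: vertex:text-embedding-005
    ```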
  • Post-processing llm-rubric response

    Attila Horvath

    10/13/2025, 10:19 AM
    Hey all, First of all, thanks for Promptfoo — I’m loving it! I ran into a bit of a problem and hope someone can point me in the right direction (if there is one). I’m using an llm-rubric type eval, and my custom prompt returns a JSON object. It includes the required keys, like "pass" and "reason", but since the value of "reason" isn’t a string, the web view shows an “Error loading cell” message whenever I try to view the evaluation results. Is there any way to post-process the llm-rubric output? Getting the LLM judge to return a JSON object where "reason" is a JSON string instead of an object has proven tricky — it fails every now and then, and Promptfoo ends up ignoring it. Thanks in advance!
  • Executing assertions without "prompts" (for online evaluation)

    oyebahadur

    10/13/2025, 1:42 PM
    Hi folks, I have deployed the (system) prompts of multiple agents in an agentic chat app to my test environment. My team members have used the application in this test env, and I logged all LLM inputs and outputs (including tool call outputs). I wish to evaluate the performance of these deployed system prompts against the assertions I have written in my promptfoo config. Essentially, instead of using promptfoo as a deployment gate, I want to use it for 'online' evaluation. Since promptfoo evaluates the "output" of the prompt against the assertions, can I "override" this output without making any LLM calls?
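    One hedged idea, sketched under the assumption that the logged outputs can be loaded as test variables: promptfoo ships an echo provider that returns its input unchanged, so assertions can run against pre-recorded responses instead of fresh completions.
    ```yaml
    # sketch only: run assertions over logged outputs without generating new responses
    providers:
      - echo                    # returns the rendered prompt as the "output"
    prompts:
      - '{{ logged_output }}'   # hypothetical var holding a captured response
    tests:
      - vars:
          logged_output: '...response captured from the test environment...'
        assert:
          - type: llm-rubric
            value: 'The response answers the user question and uses tools correctly'
    ```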
  • Trace-timeline not shown

    singhe.

    10/15/2025, 2:22 AM
    Hey! I am facing some issues when trying to view the trace timeline in the promptfoo GUI. I get the following error:
    “Traces were created but no spans were received. Make sure your provider is:
    - Configured to send traces to the OTLP endpoint (http://localhost:4318)
    - Creating spans within the trace context
    - Properly exporting spans before the evaluation completes.”
    I am trying to calculate the trace-error-spans of my LLM. Since that didn't work, I tried writing a FastAPI app to view the trace timeline in the GUI. Can someone help me with this, please?
  • Running redteam testing inside container

    Yang

    10/15/2025, 7:36 PM
    The promptfoo container is read-only. I can map my promptfooconfig.yaml file from my local machine into the container, but promptfoo always wants to generate/update the redteam.yaml file, so I'm not able to get it working. Any tips? Appreciate it!
  • Eval on pre-existing model response

    apilchand

    10/16/2025, 6:20 AM
    Hello. I previously tested an application with red teaming, so the assertions used the default grader. Now I want to try g-eval for asserting, but I don't want to generate any new responses: I want to compare the g-eval results against the default grader's results on the same responses. Is there a way to use g-eval on pre-existing tests and responses? While going through the documentation I came across the --model-outputs flag, but there aren't many details about how to use it.
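    For reference only, a g-eval assertion is configured roughly as below; the criteria text and threshold are made-up placeholders, and how to point it at the previously recorded responses (e.g. via --model-outputs) is exactly the open question here:
    ```yaml
    # sketch only: a g-eval assertion with hypothetical criteria
    assert:
      - type: g-eval
        value: 'The response refuses the harmful request and explains why'
        threshold: 0.7
    ```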
  • PROMPTFOO_DISABLE_TELEMETRY and promptfoo debug

    Umut

    10/16/2025, 11:46 AM
    Hello, I would like to note down an observation which you might take as an improvement suggestion: even if I set "PROMPTFOO_DISABLE_TELEMETRY=1", I cannot see this setting listed when I call "promptfoo debug". I see that there is a section called "env" listing other environment variables, but the telemetry setting is missing. When I run "promptfoo redteam run", I observed that telemetry stopped, so the setting works as expected. However, I thought it would be an improvement if we could also see this setting in the "promptfoo debug" output, to be sure that it's there. This might help dev teams validate their settings before running. Kind Regards, Umut
  • Redteam Grader

    Umut

    10/16/2025, 1:09 PM
    Hello All, I wanted to test grading my redteam prompt responses locally, so I set the provider to an example Ollama model, namely llama3.2 (please see the right half of the attached screenshot, which shows my YAML). However, when I run the redteam action, I still see the `llm-rubric` grading going to the api.promptfoo.app/api/v1/task endpoint to assess the outcome; please see the logs on the left half of the screen. The logs mention overriding the grader component to ollama:llama3.2, but it still uses the promptfoo remote endpoint. Am I configuring my YAML incorrectly? Thanks a lot. https://cdn.discordapp.com/attachments/1428369155555590244/1428369156109107200/Screenshot.png?ex=68f23ff0&is=68f0ee70&hm=ea2c15440b1483828504e1a8c4afb1a813cdde83835722ec1219745f14f4e1f4&
  • multi-step authorization setup in websocket

    dulax

    10/16/2025, 6:21 PM
    Hi, from the docs it looks like the main way to authorize the websockets provider is through an Authorization header. My current setup requires the websocket to be established, then a specific message sent over the socket with the authorization token, and only then can promptfoo go ahead and do its thing. Is there something I missed in the docs that would support this? Alternatively, can I programmatically set up the websocket (say, via an extension) and then have promptfoo use it for its tests?
  • Possible to reference test case vars in assertions?

    jjczopek

    10/20/2025, 12:12 PM
    What I'm looking at is that I have an object which is basically the initial graph state I will be running. One of its fields is `description`, which is a ground-truth, human-made description. My agent produces a field in its output called `generated_description`, which is of course the description the agent wrote. I would like to run the `similar` metric on the two, ideally using defaultTest. How can I configure the test assertion so that it references `description` from the variables of the test case instead of hard-coding it? Something like this:
    ```yaml
    defaultTest:
      assert:
        - type: similar
          provider: azure:embeddings:text-embedding-3-small
          value: '{{ input_state.description }}'
          transform: 'output.generated_description'
    ```
  • Sending multiple responses to LLM judge

    storied

    10/20/2025, 8:26 PM
    Is there a way to take multiple LLM responses and send them to another LLM, which can then perform some action using all of the responses? For example, say I run a prompt through three LLMs, so I now have 3 responses. Can I send those 3 responses to another LLM and ask it to combine them, or perform some other action using all three? I see there is a "select best" metric, but I don't want to choose one; I want to combine the 3 responses in some fashion. Thank you for your help.
  • run `promptfoo eval` with NO models to register and evaluate manually?

    pelopo

    10/20/2025, 9:50 PM
    Hi. I would like to only register my different prompts, with custom labels for my combinations of agents/models, so that I can manually annotate them and assign a pass or no-pass status, without an actual run against any model's API. I only want to make sure the prompts are in the database and that I get the option to evaluate/mark/rate them; the actual evaluation is done somewhere else, like in **Claude Code** or in Codex. I launched `promptfoo eval` and immediately cancelled it. I get the prompts in the database, but there are no options to manually annotate or rank them. Basically I need the evaluation metrics even if the prompt didn't run - #evals https://cdn.discordapp.com/attachments/1429949929895493723/1429949931329683586/image.png?ex=68f80026&is=68f6aea6&hm=baba382cc34013385292eccf32ab9408222c2a47ff0715eba7e58f97ba05d31d& https://cdn.discordapp.com/attachments/1429949929895493723/1429949932080599221/image.png?ex=68f80026&is=68f6aea6&hm=ed959dc3f0dc443702e9aac5887b6df4b4eaccfbd2947aee4ff61e69fa5edff6&
  • Question regarding data retention and privacy for api.promptfoo.app

    gonkm

    10/21/2025, 4:57 AM
    Hello promptfoo team, We are considering using promptfoo's remote grading and attack generation features. When using these features, we understand that a POST request is sent to the https://api.promptfoo.app/api/v1/task endpoint. The request body of this call may contain sensitive and confidential information from our product under development, such as prompts, context, and test cases. From a security and confidentiality standpoint, we need to clarify your data handling policies. Could you please let us know:
    - Is the content of the request body (prompts, variables, etc.) sent to this endpoint persistently stored on promptfoo's servers (e.g., in databases or log files)?
    - If it is stored, for how long is this data retained?
    - Is our understanding correct that the data is deleted immediately after the task (grading/attack generation) is executed?
    We need to ensure our sensitive data is handled appropriately. We would greatly appreciate it if you could provide details on your data retention policy. Thank you.
  • Error: "self-signed certificate in certificate chain" with tls config

    dulax

    10/21/2025, 9:50 PM
    Hi, when I set up TLS in my config, even with `rejectUnauthorized: false`, I keep getting the error below. Is there anything more I should be doing?
    ```yaml
    - id: https
      config:
        url: https://<myhost>:8080/chat
        method: POST
        tls:
          certPath: 'client.cert'
          keyPath: 'client.key'
          caPAth: 'ca.cert'
          rejectUnauthorized: false
    ```
    Error:
    ```
    Request to https://myhost:8080/v1/run failed, retrying: TypeError: fetch failed (Cause: Error: self-signed certificate in certificate chain)
    ```
  • Prompt name in Prompt tab table

    pelopo

    10/22/2025, 11:04 AM
    How do I add a "prompt name" or some label to the table of prompts in the Prompts tab in the UI? By default I only see the ID (some UUID) and the prompt text, but there is no way to see which prompt it is, or which variation. I tried adding labels to the prompts in "promptfooconfig" but it didn't help. Thanks https://cdn.discordapp.com/attachments/1430512074143830086/1430512074370580490/prompt_tab.jpg?ex=68fa0bb0&is=68f8ba30&hm=c07c53b81e984a3f9bcc2afb25603056ea71a2d96b0bc11c275f3e33cc265feb&
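    For reference, a hedged sketch of the label syntax being referred to (file paths and labels are placeholders); whether the Prompts tab surfaces this label is the open question:
    ```yaml
    # sketch only: prompts declared as objects with explicit labels
    prompts:
      - id: file://prompts/support_v1.txt   # hypothetical prompt file
        label: support-v1
      - id: file://prompts/support_v2.txt
        label: support-v2
    ```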
  • Add SharePoint Dataset Support

    Teti

    10/22/2025, 1:30 PM
    Hey everyone! We're thinking about contributing a new feature to Promptfoo: adding support for pulling datasets directly from SharePoint. In our setup, SharePoint is the single source of truth for evaluation data. It’d be super helpful if Promptfoo could read datasets from a SharePoint file URL (CSV or Excel), similar to how it currently works with Google Sheets. The idea is to:
    - Let users reference a SharePoint dataset link in the tests: field of their config.
    - Support private file access via the Microsoft Graph API with authentication.
    Before diving in, I just wanted to check with the maintainers/community: would you be open to a PR adding this feature? A sketch of the proposed usage is below.
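    The proposed usage, purely illustrative (the URL is a placeholder and this is not currently supported):
    ```yaml
    # sketch only: proposed SharePoint dataset reference, mirroring the Google Sheets style
    tests: https://contoso.sharepoint.com/sites/evals/Shared%20Documents/dataset.csv
    ```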
  • Automatic trace injection to promptfoo from coding assistants and not API?

    pelopo

    10/23/2025, 1:48 PM
    Aye, I was wondering if there is any automagical way to send to promptfoo the traces or conversations that I have with the likes of Codex, Claude Code, Amp, or other coding assistants that use a subscription model instead of an API. Something that would listen in real time and feed it to promptfoo for me to evaluate on a per-turn basis? Maybe some third-party tool? Thanks
  • Evaled or not tick in the results table.

    pelopo

    10/23/2025, 2:15 PM
    In the Results table in the UI, I don't see any obvious way to tell which runs were evaluated and which were not. I have to click each line to see whether I gave it a thumbs up, a thumbs down, or left a comment, etc. It would be nice to have a column with a tick if any evaluation criteria were used, making it easy to see what needs work and what doesn't. For example, in the attached screenshot there are 4 runs and I have touched all of them, but only the one with the red percentage clearly shows I evaluated it negatively. The other 3, showing 100% in green, were also reviewed (I left comments and ratings), but from the table it's unclear that they were. https://cdn.discordapp.com/attachments/1430922506558247012/1430922507095113989/image.png?ex=68fb89ee&is=68fa386e&hm=cdd74783b89bd32fbf2f26a723402fd311f7a870ccd5dd2a33bba39a87c858ce&
  • How to test dynamic multi-turn conversations in Promptfoo?

    Đức Duy

    10/23/2025, 3:28 PM
    Hi everyone 👋 I’m testing a medical chatbot agent that starts from a symptom (e.g. “I have stomach pain”), then asks several related questions, and finally recommends a suitable clinic. The problem is that each test run may have different question wording or order, so I can’t predefine all user inputs in advance. I’d like to dynamically provide user replies based on the agent’s last question — for example, if the agent asks about pain_location, I return the predefined answer for that property. Is there any recommended way in Promptfoo to handle this kind of dynamic multi-turn input-output flow? Thanks!
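    One hedged direction, assuming the Simulated User provider fits this flow; the instruction text and property names are placeholders, and whether instructions belong in the provider config or in test vars should be checked against the simulated-user docs:
    ```yaml
    # sketch only: a simulated user that answers from predefined properties each turn
    defaultTest:
      provider:
        id: 'promptfoo:simulated-user'
        config:
          maxTurns: 6
          instructions: >
            You are a patient who starts with "I have stomach pain".
            If asked about pain_location, answer "upper abdomen";
            if asked about duration, answer "two days".
    ```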
  • Environment variable substitution only working some places

    crizm

    10/23/2025, 11:57 PM
    I'm trying to use environment variables defined in a .env file to specify a default provider for llm-rubric:
    ```yaml
    defaultTest:
      options:
        provider:
          id: "azure:chat:{{ env.MY_DEPLOYMENT}}"
          config:
            apiVersion: "{{ env.API_VERSION }}"
            apiHost: "{{ env.AZURE_ENDPOINT }}"
    ```
    For some reason, only env.MY_DEPLOYMENT gets replaced. "{{ env.AZURE_ENDPOINT }}" does not (nor does API_VERSION, and there doesn't appear to be a way to affect that through preset environment variables), and this results in a "Failed to sanitize URL" error. Any idea what's wrong here?
  • OpenRouter - API error: 401 // message: No auth credentials found

    haveles

    10/24/2025, 2:39 PM
    Hi all, I'm encountering persistent 401 Unauthorized errors when trying to use OpenRouter providers in my self-evaluation and model-comparison configs, despite having a working API key and successful direct API calls.
    Error details:
    ```
    [ERROR] API error: 401 Unauthorized {"error":{"message":"No auth credentials found","code":401}}
    ```
    What's working:
    - OpenRouter API key works perfectly with direct curl calls
    - Successfully configured and ran deterministic A/B testing for 3 LLMs using OpenRouter
    - Environment variable OPENROUTER_API_KEY is properly set
    Current configuration (that works for A/B testing):
    ```yaml
    providers:
      - id: openrouter:anthropic/claude-3.5-sonnet
        config:
          temperature: 0.0
          max_tokens: 2000
          apiKey: ${OPENROUTER_API_KEY}
    ```
    What's failing:
    - Self-grading config with identical provider setup
    - Model comparison config with identical provider setup
    - All attempts result in 401 errors
    Attempted fixes:
    - Variable syntax variations: ${OPENROUTER_API_KEY}, "{{ env.OPENROUTER_API_KEY }}"
    - Provider ID variations: different model names and versions
    - Configuration approaches: direct OpenRouter, OpenAI with custom base URL, Anthropic with custom base URL
    - Environment handling: shell variables, --var flag, --env-file flag
    - Removed llm-rubric assertions in an attempt to fix authentication issues
    System info: Promptfoo version 0.118.17, macOS
    Any insights on what might be causing this inconsistent behavior would be greatly appreciated!
  • Clarification regarding Red Team configuration

    tanktg

    10/28/2025, 12:10 PM
    Hi all, I am working for a cybersecurity service provider and we would like to use Promptfoo to test our customers' LLM applications. Data privacy is of major importance to us, and we therefore don't want to send any data or requests of any sort to Promptfoo's cloud services. In practice, this means that adversarial input generation, response evaluation, and grading of attacks should all happen in our systems, and that all telemetry should be disabled. Looking at the documentation (https://www.promptfoo.dev/docs/red-team/configuration/#how-attacks-are-generated), we have several questions regarding the correct configuration to use:
    - Will setting the PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION env var to true prevent adversarial input generation requests from being sent to Promptfoo's API, while still allowing us to use our own remote LLM deployed in our cloud environment? Or should we specify our own attacker model provider in the config file, while leaving PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION at its default value, false?
    - Additionally, I understand that it is possible to override the default grader by specifying a custom one in the config file: https://www.promptfoo.dev/docs/red-team/troubleshooting/grading-results/#overriding-the-grader. Will making those two configuration changes (specifying a custom attacker model provider, and a custom grader) be enough to ensure that no data (including telemetry or usage data) is ever sent to Promptfoo's services? If not, what additional configuration is needed to achieve this?
    Thanks
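    To make the question concrete, a hedged sketch of the fully local setup being asked about; the provider IDs are placeholders, and whether these pieces together are sufficient is exactly what needs confirming:
    ```yaml
    # sketch only: local attack generation and local grading (provider IDs are placeholders)
    redteam:
      provider: openai:chat:internal-attacker-model   # your own attacker model
    defaultTest:
      options:
        provider: openai:chat:internal-grader-model   # your own grader, per the grading docs
    # plus, in the environment:
    #   PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true
    #   PROMPTFOO_DISABLE_TELEMETRY=1
    ```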
  • How to hook context in YAML?

    Alex1990

    10/28/2025, 3:45 PM
    Hi, everyone. I spent around 3-4 hours trying to understand how dynamic context works, but whatever I did, I got an error every time. I connected to my own RAG using a custom call_api:
    ```python
    def call_api(prompt, options=None, context=None):
        # ... some logic ...
        data = response.json()
        contexts = [source.get('content', '')
                    for source in data.get('sources', [])]

        return {
            "output": data.get('content', ''),
            "context": context_text
        }
    ```
    and this is the part of the YAML for this metric:
    ```yaml
    assert:
      - type: context-relevance
        contextTransform: context
        value: ''
    ```
    But when I try to pick up this context field from the RAG response, I get the error below. Whatever I did, whether I used a string or an array, just context or output.context, I got an error every time:
    ```
    Error: Failed to transform context using expression 'context': Invariant failed: contextTransform must return a string or array of strings. Got object. Check your transform expression: context
        at resolveContext (/Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/contextUtils.js:60:19)
        at async handleContextRelevance (/Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/contextRelevance.js:23:21)
        at async runAssertion (/Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/index.js:353:24)
        at async /Users/aleksandrmeskov/.npm/_npx/81bbc6515d992ace/node_modules/promptfoo/dist/src/assertions/index.js:400:24
    ```
    In the documentation it looks pretty simple, but it doesn't seem to work correctly: https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/context-relevance/ Any suggestions on how I can handle this? https://cdn.discordapp.com/attachments/1432757147405651968/1432757147648786584/image.png?ex=69023693&is=6900e513&hm=e3561b5fac664cff41e9131fc0c4327ce0fa1634c74a9240f06dff1d91c6ffb1&
  • _conversation / previous messages for Simulated User and Assistant

    Elias_M2M

    10/29/2025, 9:38 AM
    Hello, I would like to test a multi-turn conversation between an assistant and a simulated user. The prescribed conversation flow of the assistant is very long, and for my current test cases I just need to test the end of the conversation. For these tests the previous messages are very important, so the simulated user and the assistant need to know what "they" said before. I saw in the docs that there is an option of adding a "messages" or "_conversation" variable, but I don't know how this behaves with the simulated user provider. Is it possible to define the previous messages for both the assistant and the simulated user, so they know where to continue the conversation? And how can I do this?
  • prompts generation only

    b00l_

    10/29/2025, 2:36 PM
    Hello, I have a redteam.yaml file with a bunch of plugins enabled. Is it possible to just generate the prompts and save them to a file, based on all the enabled plugins? Can I do it locally only, even with just an OpenAI key?