https://promptfoo.dev/ logo
Join Discord
Powered by
# questions
  • Disable provider, programmatic testing only
    i

    IzAaX

    04/29/2025, 9:09 AM
    I'd like to be able to use promptfoo optionally as a library for running manual tests on LLM outputs which have already been produced. So I have a dataset of 10000s of LLM outputs and need to sample test their accuracy and output format, it's basically a multilabelling pipeline (the output from the LLM is an array of N values from a discrete list). Is this the right library for my use case? 🤔 we may use the ai as a judge elements in the future but we're just starting with the programmatic testing for the moment.
    • 1
    • 1
  • provide assertion feedback back to chat
    d

    dmitry.tunikov

    04/30/2025, 7:40 AM
    Hi everyone! Do you know if there is a way to provide feedback to an LLM after assertion? I'm using promptfoo for checking my text -> graphql pipeline I would like to validate generated query with python/js and provide an error back to LLM + regenerate the query I read the docs, but couldn't find anything suitable for that. Essentially, I'm trying to do smth similar to this: https://www.promptfoo.dev/docs/guides/text-to-sql-evaluation/ but this example also doesn't have any sql validation + regeneration with feedback
  • OpenAI Responses API with Structured Output
    t

    Tony

    04/30/2025, 1:22 PM
    Is the
    text_format
    parameter supported for OpenAI’s
    responses.parse()
    method in promptfoo? I think this is the newest and preferred way to do structured outputs with OpenAI.
  • âť—[Help Needed] RedTeam Plugin Error: config.intent Not Set
    k

    kira

    05/03/2025, 7:29 AM
    Hey folks, I’m running into an issue with the RedTeam plugin and could use some help. I'm getting the following error:
    Copy code
    Error running redteam: Error: Validation failed for plugin intent: Error: Invariant failed: Intent plugin requires `config.intent` to be set
    Has anyone faced this before? Any idea what config.intent needs to be set to, or where exactly this should be configured? 🤔 Appreciate any guidance 🙏
    i
    • 2
    • 10
  • "Source Text" in g-eval showing as "[object Object]"
    d

    davidfineunderstory

    05/05/2025, 4:30 PM
    A) I'm using javascript files as prompts B) when using g-eval with these prompts, this is what is appearing in the "Source Text" field for the g-eval prompt sent to the LLM:
    Copy code
    Source Text:
    [object Object],[object Object]
    How can I make sure my original prompt is properly displayed to the g-eval prompt?
    i
    h
    • 3
    • 6
  • Response in promptfoo is truncated (Nova Pro)
    e

    ert

    05/05/2025, 11:51 PM
    I am using Nova Pro, and the JSON response that I'm expecting is always truncated. I already have max_tokens set to 300,000, and it's still cutting off the response. https://cdn.discordapp.com/attachments/1369099281512005813/1369099281675587737/image.png?ex=681aa091&is=68194f11&hm=5ebbfe1ac51c54237b43e291e4b8b4bdfa0ac2000fcc5609e9f6fe8b28eb5764&
    • 1
    • 1
  • Format the output.json to only include the LLM output
    e

    ert

    05/06/2025, 1:27 PM
    Based in this link https://www.promptfoo.dev/docs/configuration/parameters/#output-file, each record includes the original prompt, LLM output, and test variables. Is there a configuration that will only show the LLM output and nothing else?
  • Red Team Config Examples
    r

    Rob

    05/07/2025, 9:07 PM
    Hi, Love the idea of promptfoo! I'm trying to set up some simple examples of evaluating promptfoo for my organization. Are there example redteam configs? Like something for ChatGPT, OpenRouter, PortKey.ai, and a local chatbot server? Any tips are much appreciated!
    i
    • 2
    • 4
  • Prompts Sending
    a

    aldrich

    05/09/2025, 2:19 AM
    Hi folks, i'm running some sample tests against our premise LLM, using the combination of plugins and strategies, but i didnt want the promptfoo send the prompts generated by the original plugins and just wang to send the prompts which converted by the strageties. How can i do that? any suggestions would be appreicated!
    o
    i
    • 3
    • 2
  • Is there a config option (other than environment variable) for using ollama from a remote host?
    h

    harpomaxx

    05/12/2025, 8:50 PM
    Im looking for something I can set in the yaml file. Something like this...
    Copy code
    - id:ollama:chat:qwen2.5:b
       config:
         ollama_base_host:"http://10.20.30.40:11434"
    i
    • 2
    • 1
  • Sending multiple prompts as json
    a

    ahmedelbaqary.

    05/12/2025, 9:18 PM
    Hi folks, I'm new to this so I want to ask about the prompt as json to include the systme message, I know we can send prompts as separate json files, but I'm asking if there is a way to put all these promps in the promptfooconfig.yaml file? not as separate json files
    i
    • 2
    • 1
  • MCP support
    j

    jeffveen

    05/14/2025, 5:26 AM
    I'm curious if the MCP implementation may not match the documentation? I have promptfoo working for a few eval runs, but when trying to configure new tests using our remote MCP server I get errors. I'm following this: https://www.promptfoo.dev/docs/integrations/mcp/ using this in my config:
    Copy code
    providers:
      - id: openai:chat:gpt-4.1
        config:
          apiKey: sk-...
          mcp:
            enabled: true
            server:
              url: "https://[redacted].com/sse"
              name: "mcp-dev"
    The server is working fine with other mcp clients (Cursor, Claude desktop), but when running tests they fail with:
    Error: Failed to connect to MCP server mcp-dev: Remote MCP servers are not supported. Please use a local server file or npm package.
    Copy code
    $  npx -y promptfoo@latest --version
    0.112.5
    i
    f
    • 3
    • 4
  • usage of deprecated packages - glob, inflight, rimraf in promptfoo node.js package
    r

    Rohit Jalisatgi - Palosade

    05/16/2025, 6:43 PM
    Has there been any effort to move off these deprecated packages? npm warn deprecated inflight@1.0.6: This module is not supported, and leaks memory. Do not use it., npm warn deprecated rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported, npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
    b
    • 2
    • 3
  • Http API provider is working for generations but NOT for evaluations.
    r

    rajendra.gola_86260

    05/19/2025, 5:58 AM
    Promptfoo uses by default open ai model for evaluations. How HTTP API provider can be forced to use for evaluations also? Can someone please tell if they have tried it? https://cdn.discordapp.com/attachments/1373902607906508862/1373902608426598400/image.png?ex=682c1a03&is=682ac883&hm=5b0077cc38b5105840aab8948244eaf93117e77d3f53d53fd1d4bdc20b598dd1&
    b
    i
    • 3
    • 5
  • How to use multiple providers for evaluations?
    r

    rajendra.gola_86260

    05/19/2025, 5:59 AM
    I see promptfoo has the provision to use multiple providers for generation but the same is not working for evaluation. Can someone please let me know if they have figured out how to use multiple providers for evaluations?
    b
    • 2
    • 2
  • Run assertions on mcp input
    j

    Joshua Frank

    05/20/2025, 6:12 AM
    Is there a way to check if a MCP tool was called? It would also be great if the input could have assertions run on it. We have a need to validate that the correct tools were used by the LLM. My current plan is to create a proxy MCP server and capture the input. I don't want to reinvent something that already exists though.
    i
    • 2
    • 2
  • openai /v1/responses endpoint
    a

    ahmedelbaqary.

    05/20/2025, 9:24 PM
    Does promptfoo support this endpoint, or is there any intentions to add it rather than the completion endpoint
    a
    • 2
    • 1
  • Having errors when using MCP with Gemini 2.5 flash
    j

    Joshua Frank

    05/21/2025, 12:55 AM
    I'm running into an issue with promptfoo eval using this config: - id: google:gemini-2.5-flash-preview-05-20 config: mcp: enabled: true server: url: https://mcp.server When I run it, I get this error: Error: Expected one candidate in API response. If I set enabled: false, everything works (just without MCP). Also, if I switch to a different model like chat-gpt-4.1, it works fine too. What's strange is that we're already using this same Gemini 2.5 Flash model with MCP elsewhere, and it's working — just not through promptfoo. Is this a config issue on my end, or something I should report as a bug in promptfoo? Thanks in advance for any help!
    a
    • 2
    • 4
  • How to evaluate pre-generated outputs where input context is required for evaluation?
    r

    rajendra.gola_86260

    05/22/2025, 8:44 AM
    I am able to evaluate the pre-generated LLM outputs using --model-outputs parameter. But in cases where input is required for evaluations along with pre-generated outputs it doesn't work. Please find attached the asserts.yaml and outputs.json file and also error in the terminal. Can someone please help here? https://cdn.discordapp.com/attachments/1375031588252815391/1375031588521512970/image.png?ex=68303575&is=682ee3f5&hm=61a2be001d5ad26942d5efea26bbabe4f3fa56ad4b81a742e8d9640995fd5930& https://cdn.discordapp.com/attachments/1375031588252815391/1375031588714446869/asserts.yaml?ex=68303575&is=682ee3f5&hm=091e358f20dfcf539923260958ad4a479d27f3ea9fe5a77898534381fc00edc2& https://cdn.discordapp.com/attachments/1375031588252815391/1375031589066510336/outputs.json?ex=68303575&is=682ee3f5&hm=32aaab9c23d66fadb3f048949b4a0ce814a5dda39bb3ba34a1339a473250f10b&
  • What data is sent to PromptFoo's API during remote generation for Red Teaming?
    j

    JohnRoy

    05/23/2025, 12:36 PM
    I understand that PromptFoo uses their own fine-tuned models for attack generation, which are accessible only through PromptFoo's API. Some plugins (like harmful) requires this models even when setting PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true. For these cases, I assume that the whole config file is sent to PromptFoo's server, exposing, for example, the "purpose" section. My question is then particularly about the jailbreak strategy. On the documentation jailbreak is described as following: 1) Starting with a base prompt that attempts to elicit undesired behavior 2) Using an LLM-as-a-Judge to: 2.1) Analyze the AI's response 2.2) Track the conversation history 2.3) Generate increasingly refined prompts based on previous attempts 4) Repeating this process for a configurable number of iterations 5) Selecting the most effective prompt variation discovered Is the generation of point 2.3) taking place again in PromptFoo's servers? I.e., are all the answers of my LLM being "exposed"? Also is it possible to completely replace PromptFoo's unaligned model for a local unaligned one? Thank you
    i
    • 2
    • 1
  • Is it possible to red teamin with node package
    r

    roy

    05/25/2025, 12:33 AM
    Is it possible to red teamin with node package
    i
    • 2
    • 1
  • Share Feature seems to be not working
    p

    pyq

    05/29/2025, 7:34 AM
    I used the command promptfoo share to upload the evaluation results to www.promptfoo.app. However, when I create a shareable link and try to share it with others, the link doesn't work—regardless of whether it's set to public or not. this is a generated sample link: https://www.promptfoo.app/register?invite_code=8d8650f5-8717-428c-a0d3-6a690ae39ed5&eval_id=eval-J77-2025-05-29T07:23:34 https://cdn.discordapp.com/attachments/1377550615995093122/1377550616175579146/Screenshot_2025-05-29_at_3.33.20_PM.png?ex=68395f7c&is=68380dfc&hm=7873b1a815939c905a611c0f20163dd51458407d4c8251eec974677c8f8e61b3&
    s
    • 2
    • 2
  • How do I pass the videos as a multi-model input to the Google's Gemini flash models?
    r

    raxrb

    05/29/2025, 5:56 PM
    Can anybody share me the example to pass multi-modal data like video, images, or audio to the Gemini Flash model?
    • 1
    • 1
  • enterprise
    n

    nomo-fomo

    05/31/2025, 5:54 AM
    How is enterprise self-host different from teams taking the github version and deploying it on a kubernetes cluster? Also, what is the pricing - it does not list it on the site.
    i
    • 2
    • 3
  • Disabling telemetry
    n

    nomo-fomo

    05/31/2025, 5:56 AM
    Currently, even after setting the environment variable, there is a single message per eval that gets sent to promptfoo server stating "telemetry is disabled". Can we update the code to not do so if the variable is set to not use telemetry?
  • client generated sessionId
    g

    grj373

    06/02/2025, 2:21 PM
    Hi, using promptfoo I have setup a redteam scan and want to use client generated session ID. However, when selecting this option and placing {{sessionId}} in the body of the request promptfoo sends an empty string for the sessionId. Does anyone know of a known issue with this functionality? Thanks
    s
    • 2
    • 3
  • Allowing file upload for Provider on web eval
    n

    nomo-fomo

    06/05/2025, 2:05 AM
    Promptfoo allows uploading of tests and YAML config files from web eval page. These get stored in a persistent storage as well. Why not allow the same capability for provider file? If there are more folks looking for similar capability, I can work on submitting a pull request. But before I do so, I am interested in knowing why it is not allowed in the first place.
  • Asserting on complex nested extracted entities
    d

    Donato Azevedo

    06/05/2025, 1:52 PM
    Hi everyone! I'm humbly looking for suggestions and opinions on how to assert on complex, sometimes deeply nested, extracted entities. My task extracts several entities from pdf documents and I am already using promptfoo to assert and build up metrics for performance. But it's getting ever so hard because of the complexity of the extracted entities. For example, this is one of my assertions:
    Copy code
    - type: python
          value: ('Outorgar Poderes' in output['pessoas_analisadas'][1]['restricoes_valor'] and '12' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'] and 'meses' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'])
          metric: alcada
    And this is not even robust, becase it depends on the order of the
    output['pessoas_analisadas']
    list being consistent across different evals. I'd appreciate any sugestion. Meanwhile, I was even considering contributing a
    transform
    property to assert-sets, which would enable this kind of syntax:
    Copy code
    tests:
      - description: test for persona 1
        vars:
          - file://path/to/pdf
        assert:
          - type: assert-set
            transform: next(o for o in output['pessoas_analisadas'] if o['nome'] == 'NAME OF PERSON')
            assert:
              - type: python
                value: ('Outorgar Poderes' in output['restricoes_valor'] and '12' in output['restricoes_valor']['Outorgar Poderes']['valor_alcada'] ...
    Opinions?
  • Metrics defined via named_scores in python file assertion not showing in UI
    d

    Donato Azevedo

    06/05/2025, 5:05 PM
    I am hesitant to open a bug issue in the github repo simply because this issue (https://github.com/promptfoo/promptfoo/issues/1626) mentions there has a been a fix last year. However, I am trying the same return from the OP of the issue and not getting any metrics shown in the UI:
    Copy code
    python
    def get_assert(output: dict[str, any], context) -> bool | float | GradingResult:
        return {
          'pass': True,
          'score': 0.11,
          'reason': 'Looks good to me',
          'named_scores': {
             'answer_similarity': 0.12,
             'answer_correctness': 0.13,
             'answer_relevancy': 0.14,
          }
        }
    I was expecting to see the three
    answer_*
    named metrics appearing up top https://cdn.discordapp.com/attachments/1380231112252723332/1380231112629944341/Screenshot_2025-06-05_at_14.04.57.png?ex=68431fe4&is=6841ce64&hm=1e59574abdb29e00278b719905d74efcea8620d330947203cbe31d0c4fc9301b&
    s
    • 2
    • 8
  • JSON errors during prompt generation
    b

    Bryson

    06/06/2025, 7:04 PM
    When I'm attempting to generate my redteam prompts (promptfoo redteam generate), I keep consistently running into "SyntaxError" issues in JSON, seemingly at inconsistent points during generation. I've tried multiple times and keep running into the same error:
    Copy code
    [chat.js:161]     completions API response: {"id":"chatcmpl-BfWK1RSvr3LAckI7hoUHM9dYo0Zce","object":"chat.completion","created":1749235393,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"message":{}
    <anonymous_script>:430
    
    
    SyntaxError: Expected ',' or '}' after property value in JSON at position 1966 (line 430 column 1)
        at JSON.parse (<anonymous>)
        at encodeMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:95:32)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
        at async addMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:122:33)
        at async action (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/index.js:195:34)
        at async applyStrategies (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:241:35)
        at async synthesize (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:678:85)
        at async doGenerateRedteam (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/commands/generate.js:243:88)
    Is this a Promptfoo bug by chance? Or is it possible I'm doing something wrong? Happy to DM over my promptfooconfig.yaml if helpful
    • 1
    • 1