Suraj
08/22/2025, 10:38 AM
Puneet Arora
08/24/2025, 7:57 AM
Donato Azevedo
08/25/2025, 8:03 PM
job_id,clause_number,text,legislation_id,expected_compliant,expected_analysis_rubric,__expected1,__expected2,__expected3
job123,2,"Customers have 3 days to return defective products",consumer_law,FALSE,The analysis must state that the minimum is 7 days.,javascript: JSON.parse(output).Evaluation.Status === 'OK',javascript: JSON.parse(output).Evaluation.Compliant === (context.vars.expected_compliant === 'TRUE'),llm-rubric: The analysis should satisfy the rubric: {{expected_analysis_rubric}}
But it's obviously failing when the __expected3
fails... How can I still run the rubric, but disregard its actual score?
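One possible way around this, sketched from memory and worth checking against the assertion docs: keep __expected1 and __expected2 in the CSV, and move the rubric into defaultTest.assert with weight: 0, which (if I recall the scoring rules correctly) keeps the rubric running and recorded without letting it decide pass/fail. The rubric text is just the same CSV column referenced as a var:
defaultTest:
  assert:
    - type: llm-rubric
      value: 'The analysis should satisfy the rubric: {{expected_analysis_rubric}}'
      # assumption to verify: weight 0 makes this assertion informational only
      weight: 0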
Elias_M2M
08/26/2025, 1:25 PM
Envy
08/26/2025, 3:49 PM
peter
08/26/2025, 10:59 PM
defaultTest:
  options:
    rubricPrompt: |
      You are an expert factuality evaluator. Compare these two answers:
      Question: {% raw %}{{input}}{% endraw %}
      Reference answer: {% raw %}{{ideal}}{% endraw %}
      Submitted answer: {% raw %}{{completion}}{% endraw %}
      Determine if the submitted answer is factually consistent with the reference answer.
      Choose one option:
      A: Submitted answer is a subset of reference (fully consistent)
      B: Submitted answer is a superset of reference (fully consistent)
      C: Submitted answer contains same details as reference
      D: Submitted answer disagrees with reference
      E: Answers differ but differences don't affect factuality
      Respond with JSON: {"category": "LETTER", "reason": "explanation"}
and the eval works as expected. But the only way I can find to view the JSON response of the grader is by turning --verbose
on. The 'category' selection, for instance, isn't available in the dashboard or JSON outputs.
I can pipe the command output to a file and jq/grep through that, but I feel like I'm probably missing a better way to grab that info?
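One place worth checking before resorting to piping --verbose output: in some versions the JSON file written by --output carries a gradingResult per row, and its componentResults usually include each grader's pass/score/reason. A hedged sketch (field paths are from memory and may differ by version):
promptfoo eval --output out.json
jq '.results.results[].gradingResult.componentResults[]? | {pass, score, reason}' out.json
If the category letter doesn't survive into reason there, another option is to fold it into the reason field inside the rubricPrompt so it ends up in the grading result.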
wmluke
08/27/2025, 2:06 AM
How are the tool and args test vars translated to JSON, given the static 'MCP Tool Call Test' prompt?
https://github.com/promptfoo/promptfoo/blob/main/examples/simple-mcp/promptfooconfig.yaml
Fiboape
08/27/2025, 5:44 PM
ArunS1997
09/01/2025, 8:47 AM
Elias_M2M
09/01/2025, 2:21 PM
dulax
09/04/2025, 3:05 PM
{{prompt}}
and then add my own prompts.
I did try this:
prompts:
  - '{{prompt}}'
  - 'My custom prompt'
But then when I view the output, it's not showing my prompt as another entry in the list - it's appending it to each prompt that was generated.
What am I not getting about this?
Tak
09/05/2025, 2:44 PM
Quinten
09/05/2025, 6:12 PM
prompts:
  - "{{problem_description}}"
providers: # A single enabled custom Python provider
  - id: file://promptfoo/promptfoo_classifier_provider.py
    label: gpt5-only
    config:
      enabled_providers: ["gpt-5"]
      decision_mode: "first"
      taxonomy_name: "default"
      include_debug_details: true
      cache_enabled: true
tests: promptfoo/OSB_sample_data_mini.csv
# tests: promptfoo/OSB_sample_data.csv
defaultTest:
  assert:
    - type: javascript
      value: |
        // actual JS test here
This worked in 0.118.0 but seems to fail in 0.118.3. Downgrading to 0.118.0 gets things working again, so maybe it's just a bug? I didn't see a related issue on GitHub yet, and it's also possible I just have weird syntax that I should fix.
Josema Blanco
09/09/2025, 8:43 AM
Rysiek
09/09/2025, 11:11 AM
providers:
  - id: openai:gpt-4.1-mini
    label: openai
    config:
      apiHost: "yourAIhost.com"
      apiKey: sk-abc123
So the prompts themselves are executed against "yourAIhost.com" and work properly.
But the llm-rubric tests are then executed against api.openai.com.
I tried many different setups, not just the one I provided. I tried setting
defaultTest:
  options:
    provider: openai:gpt-4.1-mini
I tried using a custom label.
I tried making the whole provider a custom HTTP provider, which also worked, and then referencing it in defaultTest or in the llm-rubric test. Nothing works: the test always hits api.openai.com.
When I tried to use custom https provider, with a label, and then reference it in defaultTest config, I get a response in the UI:
Error: Invariant failed: Expected HTTP provider https:custom to have a config containing {body}, but instead got {}
That looks like a bug, because it identifies the provider but not its config, which works correctly for the prompts themselves, just not in the llm-rubric grader.
Has anyone had a similar problem and managed to overcome it?
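A hedged sketch worth comparing against the model-graded metrics docs for your version: give defaultTest.options.provider the full provider object (id plus config) rather than just the id string, so the grader inherits the apiHost instead of falling back to the default OpenAI host:
defaultTest:
  options:
    provider:
      id: openai:gpt-4.1-mini
      config:
        apiHost: "yourAIhost.com"
        apiKey: sk-abc123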
azai91
09/16/2025, 4:12 AM
path_traverser
09/16/2025, 1:26 PM
promptfoo redteam
from:
# Promptfoo Proxy with authentication
export HTTPS_PROXY=127.0.0.1:8080
# SSL certificates - Absolute path
export PROMPTFOO_CA_CERT_PATH=/Users/user/burpsuitecacert.der
When running promptfoo -v redteam run
I get the following output in verbose mode:
[apiHealth.js:17] [CheckRemoteHealth] Checking API health: {"url":"https://www.promptfoo.app/health","env":{"httpsProxy":"127.0.0.1:8080"}}
[apiHealth.js:36] [CheckRemoteHealth] Making fetch request: {"url":"https://www.promptfoo.app/health","options":{"headers":{"Content-Type":"application/json"}},"timeout":5000,"nodeVersion":"v24.7.0"}
[fetch.js:114] Using custom CA certificate from /Users/user/burpsuitecacert.der
[fetch.js:122] Using proxy: https://127.0.0.1:8080/
[apiHealth.js:95] [CheckRemoteHealth] API health check failed: {"error":"Request timed out after 5000 ms","url":"https://www.promptfoo.app/health"}
When I do not use the proxy, it works as it should. When I proxy local browser traffic to the API endpoint I want to test, it works as it should. I definitely have access to https://www.promptfoo.app/health, so it seems the problem is with the communication between Promptfoo and the Burp Suite proxy. Any ideas on where to look when troubleshooting this will be very welcome!
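One thing that stands out in the log above: the schemeless HTTPS_PROXY value is expanded to https://127.0.0.1:8080/, but Burp's listener is a plain-HTTP proxy, so a TLS handshake to the proxy itself may just hang until the timeout. A hedged sketch of what might fix it (the DER-to-PEM step is an assumption, since Node tooling generally expects PEM CA files):
export HTTPS_PROXY=http://127.0.0.1:8080
openssl x509 -inform der -in /Users/user/burpsuitecacert.der -out /Users/user/burpsuitecacert.pem
export PROMPTFOO_CA_CERT_PATH=/Users/user/burpsuitecacert.pem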
azai91
09/16/2025, 4:06 PM
dulax
09/17/2025, 2:55 PM
[pythonUtils.js:168] Error running Python script: EACCES: permission denied, open '/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/promptfoo-python-input-json-1758117675565-14497df623c8b.json'
command:
npx promptfoo@0.117.11 redteam eval -c it-works.yaml
promptfoo config:
description: >-
  A simple configuration file that demonstrates the basic functionality of the
  system.
targets:
  - id: http
    config:
      url: '{{url}}'
      method: POST
      transformResponse: 'json.message.parts[0].text'
      validateStatus: 'status >= 200 && status <= 299'
      headers:
        Content-Type: application/json
      body:
        session_id: '{{session_id}}'
        message: '{{prompt}}'
        user_id: '{{user_id}}'
redteam:
  purpose: You are awesome.
  plugins:
    - id: intent
      config:
        intent: Hello, world
  entities: &ref_0 []
  strategies: []
defaultTest:
  metadata:
    purpose: You are awesome.
    entities: *ref_0
tests:
  - vars:
      prompt: Hello, world
    assert:
      - type: promptfoo:redteam:intent
        metric: Intent
    metadata:
      pluginId: promptfoo:redteam:intent
      severity: high
      modifiers: {}
      goal: >-
        Manipulate the system using benign-looking input as a carrier for a
        hidden attack technique specified in the pluginContext.
extensions:
  - file://extensions/my_extension.py:extension_hook
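No root cause to offer, but since the EACCES is on a promptfoo-python-input JSON under the per-user temp dir, one hedged thing to try is pointing the temp directory at a path that is definitely readable by the Python interpreter promptfoo spawns. If promptfoo writes that file via Node's os.tmpdir() (which honours TMPDIR on macOS), this may redirect it; the directory name is arbitrary:
mkdir -p "$HOME/promptfoo-tmp"
TMPDIR="$HOME/promptfoo-tmp" npx promptfoo@0.117.11 redteam eval -c it-works.yaml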
azai91
09/18/2025, 4:30 PM
azai91
09/19/2025, 3:00 PM
ellebarto
09/23/2025, 3:01 PM
Gia Duc
09/24/2025, 3:38 AM
azai91
09/25/2025, 6:26 PM
azai91
09/27/2025, 4:01 PM
Rares Vernica
09/29/2025, 7:09 PM
--filter-failing
but it seems broken, see [#5755](https://github.com/promptfoo/promptfoo/issues/5755). Is there a workaround?
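A manual, hedged workaround until that issue is resolved: save a JSON output, pull out the vars of the failing rows, and build a small tests file from them by hand. Field paths below are from memory and may differ per version:
promptfoo eval --output out.json
jq '[.results.results[] | select(.success == false) | {vars}]' out.json > failing-vars.json
Those vars can then be copied into a temporary tests file for a re-run of only the failing cases.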
Jérémie
10/01/2025, 8:04 PM
promptfoo eval
command?
I see there is a way to update eval name from the webpage but I'm wondering how I can let my testers easily access eval results by adopting a naming convention.
Thanks for your insights.
Kind regards,
Jérémie
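One hedged option, assuming the behaviour hasn't changed: the top-level description field in the config seems to be what the web UI lists an eval under, so a naming convention can live there, e.g.:
description: "team-x / checkout-bot / 2025-10-01 / gpt-4.1-mini"
# ...rest of the usual config (prompts, providers, tests) below
That at least lets testers find runs by a predictable label without renaming them in the webpage afterwards.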
Firestorm
10/02/2025, 8:57 PM
dulax
10/06/2025, 7:45 PM
beforeEach
and just hits the create session endpoint, extracts the session_id from the response and passes it through the context.
I noticed providers is a list, so does that mean there's a way for me to do it all in YAML using the list? I couldn't find an example.
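On the providers-list idea: as far as I can tell a second provider entry is just another target to evaluate, not a setup step. The closest built-in seems to be the extensions hook mechanism (a Python or JS file exposing a hook function, wired with extensions: - file://path.py:hook_name). A hedged Python sketch, assuming the hook receives (hook_name, context), that beforeEach exposes the upcoming test under context['test'], and with the endpoint URL and response shape below as placeholders:
# extensions/session_hook.py (hypothetical path)
import requests

CREATE_SESSION_URL = "https://example.test/api/session"  # placeholder endpoint

def extension_hook(hook_name, context):
    # assumption: beforeEach lets us mutate the upcoming test's vars and return the context
    if hook_name == "beforeEach":
        session_id = requests.post(CREATE_SESSION_URL, timeout=10).json()["session_id"]
        context["test"].setdefault("vars", {})["session_id"] = session_id
    return context
wired into the config with:
extensions:
  - file://extensions/session_hook.py:extension_hook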
Umut
10/07/2025, 11:55 AM