Garrett Weaver
07/28/2025, 5:33 PM
daft.exceptions.DaftCoreException: Not Yet Implemented: Window functions are currently only supported on the native runner.
A small test with the new engine on seems to work, but I want to make sure there aren't any caveats.

Everett Kleven
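A hedged note on making sure the native runner is actually selected (this assumes Daft's DAFT_RUNNER environment variable, which recent releases read at import time; the script name below is a placeholder):

```shell
# Force Daft's native (Swordfish) runner, where window functions are
# implemented, before the Python process imports daft.
export DAFT_RUNNER=native
python my_window_script.py  # placeholder script name
```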
07/28/2025, 9:41 PM

Yufan
07/29/2025, 7:30 AM
AggregateFnV2 interface to define an efficient aggregation UDF

Amir Shukayev
07/31/2025, 5:10 PM

Piqi Chen
07/31/2025, 11:59 PM

Garrett Weaver
08/01/2025, 4:53 PM

Giridhar Pathak
08/06/2025, 9:43 PM

Sky Yin
08/09/2025, 3:54 PM

Kesav Kolla
08/14/2025, 5:10 AM

Michele Tasca
08/24/2025, 4:24 PM
Are there “first” and “last” aggregation strategies for window functions? Are there plans to support them?
I commented in this git issue, but I'm also asking here in case I missed something.
(Btw, I'm evaluating different frameworks for a new project of ours, and it's amazing how many things “just work” in Daft. Too bad the lack of first or last is a deal breaker for us.)

can cai
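As a reference point for anyone sketching a workaround (e.g. via a UDF), here is a small pure-Python model of the “first”/“last” window-aggregation semantics being asked about; all names here are illustrative, not Daft APIs:

```python
from collections import defaultdict

def window_first_last(rows, partition_key, order_key, value_key):
    """Every row receives the first/last value of its partition,
    taken in order_key order (the usual first/last window semantics)."""
    # Group rows into partitions.
    parts = defaultdict(list)
    for row in rows:
        parts[row[partition_key]].append(row)
    # Pre-sort each partition once by the ordering column.
    ordered = {k: sorted(v, key=lambda r: r[order_key]) for k, v in parts.items()}
    return [
        {**row,
         "first": ordered[row[partition_key]][0][value_key],
         "last": ordered[row[partition_key]][-1][value_key]}
        for row in rows
    ]

rows = [
    {"k": "a", "t": 2, "v": 20},
    {"k": "a", "t": 1, "v": 10},
    {"k": "b", "t": 1, "v": 30},
]
out = window_first_last(rows, "k", "t", "v")
# every "a" row sees first=10 (at t=1) and last=20 (at t=2)
```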
08/26/2025, 10:10 AM

Garrett Weaver
08/27/2025, 5:54 AM

Kesav Kolla
08/27/2025, 11:26 AM

Garrett Weaver
08/27/2025, 6:18 PM
daft.func vs daft.udf? I would guess that if the underlying Python code is not taking advantage of any vectorization, but is maybe just a list comprehension [my_func(x) for x in some_series], then just use daft.func?

Garrett Weaver
08/28/2025, 4:21 PM

VOID 001
08/29/2025, 3:55 AM
df = daft.from_pydict({
    "json": [
        '{"a": 1, "b": 2}',
        '{"a": 3, "b": 4}',
    ],
})
df = daft.sql("SELECT json.* FROM df")
df.collect()
Amir Shukayev
08/29/2025, 4:01 AM
Is concat lazy? Like:

df = reduce(
    lambda df1, df2: df1.concat(df2),
    [
        df_provider[i].get_daft_df()
        for i in range(num_dfs)
    ],
)

Is there any way to lazily combine a set of dfs, in any order?

Sky Yin
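For what it's worth, Daft DataFrames build a lazy logical plan, so a reduce over concat should only grow the plan, deferring all work until collect(). A toy model of that shape (illustrative only, not Daft's actual implementation):

```python
from functools import reduce

class LazyFrame:
    """Toy stand-in for a lazy dataframe: concat only records a plan node."""
    def __init__(self, rows=None, children=None):
        self.rows = rows          # leaf: actual data
        self.children = children or []  # internal node: concat inputs
    def concat(self, other):
        # No data moves here; we only build a bigger logical plan.
        return LazyFrame(children=[self, other])
    def collect(self):
        # Work happens only at collect(), walking the recorded plan.
        if self.rows is not None:
            return list(self.rows)
        return [r for child in self.children for r in child.collect()]

dfs = [LazyFrame(rows=[i]) for i in range(4)]
combined = reduce(lambda a, b: a.concat(b), dfs)
# Nothing has been materialized yet; collect() flattens the plan in order.
assert combined.collect() == [0, 1, 2, 3]
```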
08/29/2025, 10:31 PM

Garrett Weaver
09/04/2025, 8:41 PM
get_next_partition is running there.

Desmond Cheong
09/04/2025, 11:58 PM

VOID 001
09/05/2025, 5:56 AM

Peer Schendel
09/07/2025, 9:10 AM
import os
from datetime import datetime  # needed for the expiration print below
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-03-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# Upload a file with a purpose of "batch"
file = client.files.create(
    file=open("test.jsonl", "rb"),
    purpose="batch",
    # Optional: a number between 1209600 and 2592000 seconds (14-30 days)
    extra_body={"expires_after": {"seconds": 1209600, "anchor": "created_at"}},
)
print(file.model_dump_json(indent=2))
print(f"File expiration: {datetime.fromtimestamp(file.expires_at) if file.expires_at is not None else 'Not set'}")
file_id = file.id
Edmondo Porcu
09/07/2025, 4:17 PM

ChanChan Mao
09/08/2025, 5:29 PM

ChanChan Mao
09/09/2025, 6:23 PM

Kyle
09/11/2025, 5:04 AM

Edmondo Porcu
09/12/2025, 6:36 PM

Rakesh Jain
09/12/2025, 10:15 PM

Kyle
09/15/2025, 6:22 AM

吕威
09/16/2025, 7:22 AM
from typing import List

import cv2
import numpy as np
from daft import DataType, Series, udf

@udf(
    return_dtype=DataType.list(
        DataType.struct(
            {
                "class": DataType.string(),
                "score": DataType.float64(),
                "cropped_img": DataType.image(),
                "bbox": DataType.list(DataType.int64()),
            }
        )
    ),
    num_gpus=1,
    batch_size=16,
)
class YOLOWorldOnnxObjDetect:
    def __init__(
        self,
        model_path: str,
        device: str = "cuda:0",
        confidence: float = 0.25,
    ):
        self.confidence = confidence
        # init model (elided); must set self.yolo
        pass

    def __call__(self, images_2d_col: Series) -> List[List[dict]]:
        images: List[np.ndarray] = images_2d_col.to_pylist()
        results = self.yolo.predict(source=images, conf=self.confidence)
        objs = []
        for r in results:
            img_result = []
            orig_img = r.orig_img
            for box in r.boxes:
                x1, y1, x2, y2 = box.xyxy[0].cpu().numpy().astype(int)
                # clamp the box to the image bounds
                x1, y1 = max(0, x1), max(0, y1)
                x2, y2 = min(orig_img.shape[1], x2), min(orig_img.shape[0], y2)
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                cls = int(box.cls[0])
                img_result.append(
                    {
                        "class": self.yolo.names[cls],
                        "score": float(box.conf[0]),
                        "cropped_img": {
                            "cropimg": cv2.cvtColor(
                                orig_img[y1:y2, x1:x2], cv2.COLOR_BGR2RGB
                            ),
                        },
                        "bbox": [x1, y1, x2, y2],
                    }
                )
            objs.append(img_result)
        return objs
The cropped_img has to be returned as a dict; if I return the np.ndarray directly, it raises: Could not convert array(..., dtype=uint8) with type numpy.ndarray: was expecting tuple of (key, value) pair. Why?