This is the full-precision (FP32) reference export. For production use, prefer the INT8 or INT4 variants, which are 2–4× smaller with negligible quality loss.
Export method
Exported with torch.onnx.export(dynamo=True) (PyTorch 2.9, opset 20).
The dynamo exporter traces at the FX-graph / symbolic level rather than via eager execution. This means all internal tensor shapes (including the Qwen3 causal attention mask) carry symbolic batch and sequence dimensions throughout. The legacy (non-dynamo) torch.onnx.export path produced a static batch=1 inside the causal-mask BitAnd node, which broke inference for batch > 1.
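As a rough illustration of this kind of export, here is a minimal sketch, not the exact script used for this model. The helper names `dynamic_axes_spec` and `export_with_dynamo`, and the assumption that the model's forward takes keyword inputs such as `input_ids` and `attention_mask`, are hypothetical. The example inputs should probably use a batch size and sequence length greater than 1, since torch.export tends to specialize dimensions of size 0 or 1.

```python
def dynamic_axes_spec(input_names):
    """Map each input name to its symbolic axes: 0 -> batch, 1 -> sequence."""
    return {name: {0: "batch", 1: "sequence"} for name in input_names}


def export_with_dynamo(model, example_kwargs, path, opset=20):
    """Export `model` to ONNX with symbolic batch and sequence dimensions.

    `model` is any torch.nn.Module whose forward accepts the keyword
    arguments in `example_kwargs` (hypothetical; adapt to the real model).
    """
    import torch  # imported lazily so the helper above works without PyTorch

    # One shared Dim object per label, so every input agrees on batch/sequence.
    dims = {
        "batch": torch.export.Dim("batch"),
        "sequence": torch.export.Dim("sequence"),
    }
    spec = {
        name: {axis: dims[label] for axis, label in axes.items()}
        for name, axes in dynamic_axes_spec(example_kwargs).items()
    }
    return torch.onnx.export(
        model,
        (),                     # no positional inputs in this sketch
        path,
        kwargs=example_kwargs,  # e.g. {"input_ids": ..., "attention_mask": ...}
        dynamo=True,
        opset_version=opset,
        dynamic_shapes=spec,
    )
```

Sharing one `Dim` object per label tells the exporter that every input has the same batch and sequence size, rather than treating each input's axes as independent symbols.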
Dynamic batch verified: batch = 1, 2, 4, 8 all produce correct output shapes.
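A check of this kind can be reproduced with a short script. The sketch below is one possible approach, not the verification code used for this export. The helper names are hypothetical, and the input names `input_ids` and `attention_mask` are assumptions that may not match the exported graph (inspect `session.get_inputs()` to confirm).

```python
def check_dynamic_batch(run_fn, batch_sizes=(1, 2, 4, 8), seq_len=16):
    """Call run_fn(batch, seq_len) per batch size and check the leading output dim.

    run_fn must return an object with a `.shape` attribute (e.g. a NumPy array).
    Returns {batch: output_shape}; raises AssertionError on a mismatch.
    """
    shapes = {}
    for batch in batch_sizes:
        shape = tuple(run_fn(batch, seq_len).shape)
        assert shape[0] == batch, f"batch={batch}: got output shape {shape}"
        shapes[batch] = shape
    return shapes


def make_onnxruntime_runner(model_path):
    """Build a run_fn backed by ONNX Runtime (input names are assumed)."""
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    def run_fn(batch, seq_len):
        feeds = {
            "input_ids": np.ones((batch, seq_len), dtype=np.int64),
            "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
        }
        return session.run(None, feeds)[0]  # first graph output

    return run_fn
```

With the model file on disk, `check_dynamic_batch(make_onnxruntime_runner("model.onnx"))` would run the four batch sizes listed above, where `model.onnx` is a placeholder path. This only checks the leading output dimension; comparing values against the PyTorch model would be a separate step.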