Model Insights

gpt-4o-2024-05-13

Details

Developer: OpenAI
License: NA (private model)
Model parameters: NA (private model)
Supported context length: 128k
Price per prompt token: $5 / million tokens
Price per response token: $15 / million tokens
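The listed rates make per-request cost easy to estimate. Below is a minimal sketch of that arithmetic; the 10k-token prompt is an arbitrary illustrative size, and the 360-token response matches the average response length reported in the performance summary further down.

```python
# Rates listed above: $5 per million prompt tokens, $15 per million
# response tokens (gpt-4o-2024-05-13).
PROMPT_RATE = 5.00 / 1_000_000    # USD per prompt token
RESPONSE_RATE = 15.00 / 1_000_000  # USD per response token

def request_cost(prompt_tokens: int, response_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return prompt_tokens * PROMPT_RATE + response_tokens * RESPONSE_RATE

# e.g. a 10,000-token RAG prompt with a 360-token response:
print(round(request_cost(10_000, 360), 4))  # → 0.0554
```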

Model Performance Across Task-Types

Chainpoll Score

Short Context: 0.96
Medium Context: 1.00
Long Context: 0.99
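For context, a ChainPoll-style score is produced by polling a chain-of-thought judge several times and taking the fraction of polls that find the response grounded in the context. The sketch below is an assumption-laden illustration of that aggregation only: `toy_judge` is a hypothetical stand-in, not the actual judge used for these scores.

```python
from typing import Callable

def chainpoll_score(
    judge: Callable[[str, str], bool],
    context: str,
    response: str,
    n_polls: int = 5,
) -> float:
    """ChainPoll-style score: poll the judge n times and return the
    fraction of polls that find the response grounded in the context."""
    votes = [judge(context, response) for _ in range(n_polls)]
    return sum(votes) / n_polls

# Toy judge (assumption: a real judge would be an LLM prompted to
# reason step by step before voting "supported" / "not supported").
def toy_judge(context: str, response: str) -> bool:
    return all(word in context for word in response.split())

print(chainpoll_score(toy_judge,
                      "paris is the capital of france",
                      "paris is the capital"))  # → 1.0
```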

Model Insights Across Task-Types

Digging deeper, here’s a look at how gpt-4o-2024-05-13 performed across specific datasets.

Short Context RAG

Medium Context RAG

This heatmap indicates the model's success in recalling information at different locations in the context. Green signifies success, while red indicates failure.

[Heatmap: gpt-4o-2024-05-13, medium-context recall]

Long Context RAG

This heatmap indicates the model's success in recalling information at different locations in the context. Green signifies success, while red indicates failure.

[Heatmap: gpt-4o-2024-05-13, long-context recall]
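Heatmaps like these are typically produced with a "needle in a haystack" procedure: a known fact is inserted at varying relative depths into filler contexts of varying lengths, the model is asked to recall it, and each (length, depth) cell is marked pass or fail. The sketch below assumes a hypothetical `ask_model` call and a toy model; it illustrates the grid construction only, not the exact benchmark used here.

```python
def insert_needle(filler_tokens: list, needle: str, depth: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    pos = int(len(filler_tokens) * depth)
    return " ".join(filler_tokens[:pos] + [needle] + filler_tokens[pos:])

def recall_grid(ask_model, needle, answer, filler_tokens, lengths, depths):
    """One pass/fail cell per (context length, needle depth) pair."""
    grid = {}
    for n in lengths:
        for d in depths:
            ctx = insert_needle(filler_tokens[:n], needle, d)
            grid[(n, d)] = answer in ask_model(ctx)
    return grid

# Toy "model" that echoes its context, so it recalls everything in it:
toy = lambda ctx: ctx
grid = recall_grid(toy, "The code word is 42.", "42",
                   ["lorem"] * 1000, [100, 500, 1000], [0.0, 0.5, 1.0])
print(all(grid.values()))  # → True
```

Plotting each cell green (pass) or red (fail) over the (length, depth) grid yields the heatmap shown above.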

Performance Summary

Short context RAG
  Task insight: The model demonstrates exceptional reasoning and comprehension skills, excelling at short-context RAG. It outperforms other models in mathematical proficiency, as evidenced by its strong performance on the DROP and ConvFinQA benchmarks.
  Cost insight: It is a great model but is nearly 2x costlier than Claude 3.5 Sonnet. If cost is a concern, it is better to try Gemini 1.5 Pro or Llama 3 70B.
  Datasets (context adherence / avg response length):
    DROP: 0.97 / 360
    Hotpot: 0.93 / 360
    MS MARCO: 0.95 / 360
    ConvFinQA: 1.00 / 360

Medium context RAG
  Task insight: Flawless performance, making it suitable for any context length up to 25,000 tokens.
  Cost insight: Great performance, but we recommend the roughly 50x cheaper Gemini Flash.
  Dataset (context adherence / avg response length):
    Medium context RAG: 1.00 / 360

Long context RAG
  Task insight: Near-flawless performance, making it suitable for any context length up to 100,000 tokens.
  Cost insight: Great performance, but we recommend either the roughly 1.5x cheaper Claude 3.5 Sonnet or the 50x cheaper Gemini Flash.
  Dataset (context adherence / avg response length):
    Long context RAG: 0.99 / 360
