Evaluating GPT-4o: Performance Beyond Standard Benchmarks