• MoonlightFox@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    6 hours ago

    I have been pretty impressed by Gemini 2.0 Flash.

    Its slightly worse than the very best on the benchmarks I have seen, but is pretty much instant and incredibly cheap. Maybe a loss leader?

    Anyways, which model of the commercial ones do you consider to be good?

    • brucethemoose@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      6 hours ago

      benchmarks

      Benchmarks are so gamed, even Chatbot Arena is kinda iffy. TBH you have to test them with your prompts yourself.

      Honestly I am getting incredible/creative responses from Deepseek R1, the hype is real, though its frequently overloaded. Tencent’s API is a bit under-rated. If llama 3.3 70B is smart enough for you, Cerebras API is super fast.

      Qwen Max is… not bad? The reasoning models kinda spoiled me, but I think they have more reasoning releases coming.

      MiniMax is ok for long context, but I still tend to lean on Gemini for this.

      I dunno about Claude these days, as its just so expensive. I haven’t touched OpenAI in a long time.

      Oh, and sometimes “weird” finetunes you can find on OpenRouter or whatever will serve niches much better than “big” API models.

      EDIT:

      Locally, I used to hop around, but now I pretty much always run a Qwen 32B finetune. Either coder, Arcee Distill, FuseAI, R1, EVA-Gutenberg, or Openbuddy, usually.

      • MoonlightFox@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        6 hours ago

        So there is not any trustworthy benchmarks I can currently use to evaluate? That in combination with my personal anecdotes is how I have been evaluating them.

        I was pretty impressed with Deepseek R1. I used their app, but not for anything sensitive.

        I don’t like that OpenAI defaults to a model I can’t pick. I have to select it each time, even when I use a special URL it will change after the first request

        I am having a hard time deciding which models to use besides a random mix between o3-mini-high, o1, Sonnet 3.5 and Gemini 2 Flash

        • brucethemoose@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          4 hours ago

          Heh, only obscure ones that they can’t game, and only if they fit your use case. One example is the ones in EQ bench: https://eqbench.com/

          …And again, the best mix of models depends on your use case.

          I can suggest using something like Open Web UI with APIs instead of native apps. It gives you a lot more control, more powerful tooling to work with, and the ability to easily select and switch between models.