Whoops, yeah, should have linked the blog.
I didn’t want to link the individual models because I’m not sure hybrid or pure transformers is better?
Gemini 1.5 used to be the best long context model around, by far.
Gemini Flash Thinking from earlier this year was very good for its speed/price, but it regressed a ton.
Gemini 1.5 Pro is literally better than the new 2.0 Pro in some of my tests, especially long-context ones. I dunno what happened there, but yes, they probably overtuned it or something.
For local LLMs, this is an issue because it breaks your prompt cache and slows things down, without a specific tiny model to “categorize” text… which few have really worked on.
I don’t think the corporate APIs or UIs even do this. You are not wrong, but it’s just not done for some reason.
It could be that the trainers don’t realize it’s an issue. For instance, “0.5-0.7” is the recommended range for Deepseek R1, but I find much lower or slightly higher is far better, depending on the category and other sampling parameters.
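If you want to find your own range instead of trusting the model card, a quick sweep works. A minimal sketch against a local OpenAI-compatible endpoint; the URL, model name, and prompts are placeholders for whatever you actually run:

```python
# Sweep temperature per question category on a local OpenAI-compatible server
# (e.g. llama.cpp's llama-server) and eyeball the outputs. Endpoint and model
# names are placeholders.
import requests

URL = "http://localhost:8080/v1/chat/completions"
PROMPTS = {
    "coding": "Write a Python function that merges two sorted lists.",
    "creative": "Describe a thunderstorm from a cat's point of view.",
}

for category, prompt in PROMPTS.items():
    for temp in (0.0, 0.2, 0.5, 0.7, 0.8):  # brackets the "0.5-0.7" advice
        r = requests.post(URL, json={
            "model": "deepseek-r1-distill-32b",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temp,
            "max_tokens": 256,
        })
        text = r.json()["choices"][0]["message"]["content"]
        print(f"[{category} @ T={temp}]\n{text[:200]}\n")
```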
Lemmy is understandably sympathetic to self-hosted AI, but I get chewed out or even banned literally anywhere else.
In one fandom (the Avatar fandom), there used to be enthusiasm for a “community enhancement” of the original show since the official DVD/Blu-ray looks awful. Years later in a new thread, I don’t even mention the word “AI,” just the idea of restoration, and I got bombed and threadlocked for the mere tangential implication.
Temperature isn’t even “creativity” per se; it’s more a band-aid to patch looping and dryness in long responses.
Lower temperature is much better with modern sampling algorithms, e.g., MinP, DRY, maybe dynamic temperature like Mirostat and such. Ideally, structured output, too. Unfortunately, corporate APIs usually don’t offer this.
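For reference, here’s roughly what that looks like against a recent llama.cpp server’s native /completion endpoint. The field names match recent llama-server builds (older builds just ignore the DRY fields), and the values are only illustrative:

```python
# Low-temperature sampling with MinP + DRY on llama.cpp's /completion endpoint.
# Field names match recent llama-server builds; values are illustrative.
import requests

payload = {
    "prompt": "Summarize the following article:\n...",
    "n_predict": 512,
    "temperature": 0.3,      # low temp; let the samplers handle diversity
    "min_p": 0.05,           # drop tokens below 5% of the top token's probability
    "dry_multiplier": 0.8,   # DRY: penalize verbatim repetition
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    "cache_prompt": True,    # reuse the prompt/KV cache between requests
    # "mirostat": 2, "mirostat_tau": 5.0,  # alternative: dynamic-entropy sampling
}
r = requests.post("http://localhost:8080/completion", json=payload)
print(r.json()["content"])
```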
It can be mitigated by finetuning against looping/repetition/slop, but most models do the opposite: they’re massively overtuned on their own output, which “inbreeds” the model.
And yes, domain-specific queries are best. Basically, the user needs separate prompt boxes for coding, summaries, creative suggestions, and such, each with its own tuned settings (and ideally tuned models), something like the sketch below. You are right, this is a much better idea than offering a temperature knob to the user, but… most UIs don’t even do this for some reason?
What I am getting at is that this is not a problem companies seem interested in solving. They want to treat users as idiots without the attention span to even categorize their question.
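To sketch what I mean: a toy preset table a UI could ship, where each category gets its own sampler settings. The numbers are just my own starting points, not vendor recommendations, and the field names assume a llama.cpp-style backend:

```python
# Toy per-category sampler presets: the kind of thing a chat UI could expose
# as separate prompt boxes instead of a single global temperature knob.
# Values are personal starting points, not vendor recommendations.
PRESETS = {
    "coding":   {"temperature": 0.1, "min_p": 0.1},
    "summary":  {"temperature": 0.3, "min_p": 0.05},
    "creative": {"temperature": 0.8, "min_p": 0.05, "dry_multiplier": 0.8},
}

def build_request(category: str, prompt: str) -> dict:
    """Merge the category's tuned settings into a completion request."""
    return {"prompt": prompt, "n_predict": 512, **PRESETS[category]}

print(build_request("summary", "Summarize this thread: ..."))
```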
Zonos just came out, seems sick:
There are also some “native” TTS LLMs like GLM 9B, which “capture” more information in the output than pure text input.
What temperature and sampling settings? Which models?
I’ve noticed that the AI giants seem to be encouraging “AI ignorance,” as they just want you to use their stupid subscription app without questioning it, instead of understanding how the tools work under the hood. They also default to bad, cheap models.
I find my local thinking models (FuseAI, Arcee, or Deepseek 32B 5bpw at the moment) are quite good at summarization at a low temperature, which is not what these UIs default to, and I get to use better sampling algorithms than any of the corporate APIs. Same with “affordable” flagship API models (like base Deepseek, not R1). But small Gemini/OpenAI API models are crap, especially with default sampling, and Gemini 2.0 in particular seems to have regressed.
My point is that LLMs as locally hosted tools you understand the mechanics/limitations of are neat, but how corporations present them as magic cloud oracles is like everything wrong with tech enshittification and crypto-bro type hype in one package.
Hey, uh, if he wants another state…
What about Puerto Rico?
I’d love to hear the excuse against that vs. Greenland and Canada.
Holding a country that doesn’t want to be held, that no one else wants you to occupy, is very different than winning it, as the US and other countries have repeatedly learned.
And the unpopularity at home would make Vietnam look like nothing.
No, it would be a disaster for the US. A very different one than Ukraine, but maybe even worse.
Ironically, Fallout had nothing like the propaganda tools we have today. Hitler would have drooled over the grip Twitter, TikTok, Meta, Google, and a few influencers have on the collective psyche.
Private investors are (usually, and theoretically) more “long-term” motivated than the public markets. Day traders and rotating board members love quarterly boosts even if it implodes the company, but with private equity, passing a bag of shit to someone else isn’t so easy, and desires aren’t so fickle.
Hence I suspect you’re right.
AFAIK Apple/Play Store do not allow this. They already got into a fight with Epic about it.
Even if that were true, absolutely not.
My friend… WTF.
DEI, of course.
That’s an understatement. It won’t even fit well in 8xA100; you need an EPYC server to run it in CPU RAM, very slowly.
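For scale (assuming this is about Deepseek R1’s 671B parameters, which is what’s being discussed; KV cache and activations come on top of the weights):

```python
# Back-of-the-envelope weight memory for a 671B-parameter model
# (assuming Deepseek R1; KV cache and activations are extra).
params = 671e9
gib = 1024**3
for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("~Q4 (4.5 bpw)", 4.5 / 8)]:
    print(f"{name:14s} {params * bytes_per_param / gib:7.0f} GiB")
print(f"{'8x A100 80GB':14s} {8 * 80e9 / gib:7.0f} GiB")  # even FP8 won't fit
```

A ~4-bit quant technically squeezes the weights into 640GB, but long context eats the rest of the budget, hence the EPYC/CPU-RAM route.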
You joke, but the pro version won’t be far behind. The Pro 4090 (aka the RTX 6000 Ada) was already $7,000 MSRP, and the Pro 5090 is rumored to have far more VRAM.
You know what I meant; by “no one” I mean “a large majority of users.”
The bigger problem is AI “ignorance,” and it’s not just Facebook. I’ve reported more than one Lemmy post that a user naively sourced from ChatGPT or Gemini and took as fact.
No one understands how LLMs work, not even on a basic level. Can’t blame them, seeing how they’re shoved down everyone’s throats as opaque products, or straight up social experiments like Facebook.
…Are we all screwed? Is the future a trippy information wasteland? All this seems to be getting worse and worse, and everyone in charge is pouring gasoline on it.
Ken Martin won: https://www.axios.com/2025/02/01/ken-martin-dnc-chair-2024
Is this bad?
Benchmarks are so gamed, even Chatbot Arena is kinda iffy. TBH you have to test them with your prompts yourself.
Honestly, I am getting incredible/creative responses from Deepseek R1; the hype is real, though it’s frequently overloaded. Tencent’s API is a bit underrated. If Llama 3.3 70B is smart enough for you, the Cerebras API is super fast.
Qwen Max is… not bad? The reasoning models kinda spoiled me, but I think they have more reasoning releases coming.
MiniMax is ok for long context, but I still tend to lean on Gemini for this.
I dunno about Claude these days, as it’s just so expensive. I haven’t touched OpenAI in a long time.
Oh, and sometimes “weird” finetunes you can find on OpenRouter or whatever will serve niches much better than “big” API models.
EDIT:
Locally, I used to hop around, but now I pretty much always run a Qwen 32B finetune. Either coder, Arcee Distill, FuseAI, R1, EVA-Gutenberg, or Openbuddy, usually.