Whoops, yeah, should have linked the blog.
I didn’t want to link the individual models because I’m not sure hybrid or pure transformers is better?
Gemini 1.5 used to be the best long context model around, by far.
Gemini Flash Thinking from earlier this year was very good for its speed/price, but it regressed a ton.
Gemini 1.5 Pro is literally better than the new 2.0 Pro in some of my tests, especially long-context ones. I dunno what happened there, but yes, they probably overtuned it or something.
For local LLMs, this is an issue because it breaks your prompt cache and slows things down, without a specific tiny model to “categorize” text… which few have really worked on.
I don’t think the corporate APIs or UIs even do this. You are not wrong, but it’s just not done for some reason.
It could be that the trainers don’t realize it’s an issue. For instance, “0.5-0.7” is the recommended range for Deepseek R1, but I find much lower or slightly higher is far better, depending on the category and other sampling parameters.
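If you want to find your own range instead of trusting the model card, a quick sweep works. A minimal sketch against a local OpenAI-compatible endpoint; the URL, model name, and prompts are placeholders for whatever you actually run:

```python
# Sweep temperature per question category on a local OpenAI-compatible server
# (e.g. llama.cpp's llama-server) and eyeball the outputs. Endpoint and model
# names are placeholders.
import requests

URL = "http://localhost:8080/v1/chat/completions"
PROMPTS = {
    "coding": "Write a Python function that merges two sorted lists.",
    "creative": "Describe a thunderstorm from a cat's point of view.",
}

for category, prompt in PROMPTS.items():
    for temp in (0.0, 0.2, 0.5, 0.7, 0.8):  # brackets the "0.5-0.7" advice
        r = requests.post(URL, json={
            "model": "deepseek-r1-distill-32b",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temp,
            "max_tokens": 256,
        })
        text = r.json()["choices"][0]["message"]["content"]
        print(f"[{category} @ T={temp}]\n{text[:200]}\n")
```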
Lemmy is understandably sympathetic to self-hosted AI, but I get chewed out or even banned literally anywhere else.
In one fandom (the Avatar fandom), there used to be enthusiasm for a “community enhancement” of the original show since the official DVD/Blu-ray looks awful. Years later in a new thread, I don’t even mention the word “AI,” just the idea of restoration, and I got bombed and threadlocked for the mere tangential implication.
Temperature isn’t even “creativity” per se; it’s more a band-aid to patch looping and dryness in long responses.
Lower temperature is much better with modern sampling algorithms, e.g., MinP, DRY, maybe dynamic temperature like Mirostat and such. Ideally, structured output, too. Unfortunately, corporate APIs usually don’t offer this.
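For reference, here’s roughly what that looks like against a recent llama.cpp server’s native /completion endpoint. The field names match recent llama-server builds (older builds just ignore the DRY fields), and the values are only illustrative:

```python
# Low-temperature sampling with MinP + DRY on llama.cpp's /completion endpoint.
# Field names match recent llama-server builds; values are illustrative.
import requests

payload = {
    "prompt": "Summarize the following article:\n...",
    "n_predict": 512,
    "temperature": 0.3,      # low temp; let the samplers handle diversity
    "min_p": 0.05,           # drop tokens below 5% of the top token's probability
    "dry_multiplier": 0.8,   # DRY: penalize verbatim repetition
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    "cache_prompt": True,    # reuse the prompt/KV cache between requests
    # "mirostat": 2, "mirostat_tau": 5.0,  # alternative: dynamic-entropy sampling
}
r = requests.post("http://localhost:8080/completion", json=payload)
print(r.json()["content"])
```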
It can be mitigated by finetuning against looping/repetition/slop, but most models do the opposite: they’re massively overtuned on their own output, which “inbreeds” the model.
And yes, domain-specific queries are best. Basically, the user needs separate prompt boxes for coding, summaries, creative suggestions, and such, each with its own tuned settings (and ideally tuned models), something like the sketch below. You are right, this is a much better idea than offering a temperature knob to the user, but… most UIs don’t even do this for some reason?
What I am getting at is that this is not a problem companies seem interested in solving. They want to treat users as idiots without the attention span to even categorize their question.
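To sketch what I mean: a toy preset table a UI could ship, where each category gets its own sampler settings. The numbers are just my own starting points, not vendor recommendations, and the field names assume a llama.cpp-style backend:

```python
# Toy per-category sampler presets: the kind of thing a chat UI could expose
# as separate prompt boxes instead of a single global temperature knob.
# Values are personal starting points, not vendor recommendations.
PRESETS = {
    "coding":   {"temperature": 0.1, "min_p": 0.1},
    "summary":  {"temperature": 0.3, "min_p": 0.05},
    "creative": {"temperature": 0.8, "min_p": 0.05, "dry_multiplier": 0.8},
}

def build_request(category: str, prompt: str) -> dict:
    """Merge the category's tuned settings into a completion request."""
    return {"prompt": prompt, "n_predict": 512, **PRESETS[category]}

print(build_request("summary", "Summarize this thread: ..."))
```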
Zonos just came out, seems sick:
There are also some “native” TTS LLMs like GLM 9B, which “capture” more information in the output than pure text input.
What temperature and sampling settings? Which models?
I’ve noticed that the AI giants seem to be encouraging “AI ignorance,” as they just want you to use their stupid subscription app without questioning it, instead of understanding how the tools work under the hood. They also default to bad, cheap models.
I find my local thinking models (FuseAI, Arcee, or Deepseek 32B 5bpw at the moment) are quite good at summarization at a low temperature, which is not what these UIs default to, and I get to use better sampling algorithms than any of the corporate APIs. Same with “affordable” flagship API models (like base Deepseek, not R1). But small Gemini/OpenAI API models are crap, especially with default sampling, and Gemini 2.0 in particular seems to have regressed.
My point is that LLMs as locally hosted tools you understand the mechanics/limitations of are neat, but how corporations present them as magic cloud oracles is like everything wrong with tech enshittification and crypto-bro type hype in one package.
Hey, uh, if he wants another state…
What about Puerto Rico?
I’d love to hear the excuse against that vs. Greenland and Canada.
Holding a country that doesn’t want to be held, that no one else wants you to occupy, is very different than winning it, as the US and other countries have repeatedly learned.
And the unpopularity at home would make Vietnam look like nothing.
No, it would be a disaster for the US. A very different one than Ukraine, but maybe even worse.
Ironically, Fallout had nothing like the propaganda tools we have today. Hitler would have drooled over the grip Twitter, TikTok, Meta, Google, and a few influencers have on the collective psyche.
Private investors are (usually, and theoretically) more “long-term” motivated than the public markets. Day traders and rotating board members love quarterly boosts even if it implodes the company, but with private equity, passing a bag of shit to someone else isn’t so easy, and desires aren’t so fickle.
Hence I suspect you’re right.
AFAIK Apple/Play Store do not allow this. They already got into a fight with Epic about it.
Even if that were true, absolutely not.
My friend… WTF.
DEI, of course.
That’s an understatement. It won’t even fit well in 8xA100; you need an EPYC server to run it in CPU RAM, very slowly.
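For scale (assuming this is about Deepseek R1’s 671B parameters, which is what’s being discussed; KV cache and activations come on top of the weights):

```python
# Back-of-the-envelope weight memory for a 671B-parameter model
# (assuming Deepseek R1; KV cache and activations are extra).
params = 671e9
gib = 1024**3
for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("~Q4 (4.5 bpw)", 4.5 / 8)]:
    print(f"{name:14s} {params * bytes_per_param / gib:7.0f} GiB")
print(f"{'8x A100 80GB':14s} {8 * 80e9 / gib:7.0f} GiB")  # even FP8 won't fit
```

A ~4-bit quant technically squeezes the weights into 640GB, but long context eats the rest of the budget, hence the EPYC/CPU-RAM route.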
You joke, but the pro version won’t be far behind. The Pro 4090 (aka the RTX 6000 Ada) was already $7,000 MSRP, and the Pro 5090 is rumored to have far more VRAM.
You know what I meant; by “no one” I mean “a large majority of users.”
The bigger problem is AI “ignorance,” and it’s not just Facebook. I’ve reported more than one Lemmy post that a user naively sourced from ChatGPT or Gemini and took as fact.
No one understands how LLMs work, not even on a basic level. Can’t blame them, seeing how they’re shoved down everyone’s throats as opaque products, or straight up social experiments like Facebook.
…Are we all screwed? Is the future a trippy information wasteland? All this seems to be getting worse and worse, and everyone in charge is pouring gasoline on it.
Ken Martin won: https://www.axios.com/2025/02/01/ken-martin-dnc-chair-2024
Is this bad?
Benchmarks are so gamed, even Chatbot Arena is kinda iffy. TBH you have to test them with your prompts yourself.
Honestly, I am getting incredible/creative responses from Deepseek R1; the hype is real, though it’s frequently overloaded. Tencent’s API is a bit underrated. If Llama 3.3 70B is smart enough for you, the Cerebras API is super fast.
Qwen Max is… not bad? The reasoning models kinda spoiled me, but I think they have more reasoning releases coming.
MiniMax is ok for long context, but I still tend to lean on Gemini for this.
I dunno about Claude these days, as it’s just so expensive. I haven’t touched OpenAI in a long time.
Oh, and sometimes “weird” finetunes you can find on OpenRouter or whatever will serve niches much better than “big” API models.
EDIT:
Locally, I used to hop around, but now I pretty much always run a Qwen 32B finetune. Either coder, Arcee Distill, FuseAI, R1, EVA-Gutenberg, or Openbuddy, usually.