• Zess@lemmy.world
    English
    arrow-up
    1
    ·
    6 days ago

    You asked a stupid question and got a stupid response, seems fine to me.

  • Grabthar@lemmy.world
    English
    arrow-up
    1
    ·
    7 days ago

    Doc: That’s an interesting name, Mr…

    Fletch: Babar.

    Doc: Is that with one B or two?

    Fletch: One. B-A-B-A-R.

    Doc: That’s two.

    Fletch: Yeah, but not right next to each other, that’s what I thought you meant.

    Doc: Isn’t there a children’s book about an elephant named Babar?

    Fletch: Ha, ha, ha. I wouldn’t know. I don’t have any.

    Doc: No children?

    Fletch: No elephant books.

  • humorlessrepost@lemmy.world
    English
    arrow-up
    0
    ·
    edit-2
    7 days ago

    Works fine for me in o3-mini-high:

    Counting letters in “strawberry”

    Alright, I’m checking: the word “strawberry” is spelled S T R A W B E R R Y. Let me count the letters: S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). There are three R’s: in positions 3, 8, and 9. So, the answer is 3. Even if we ignore case, the count still holds. Therefore, there are 3 r’s in “strawberry.”

  • eggymachus@sh.itjust.works
    English
    arrow-up
    1
    ·
    8 days ago

    A guy is driving around the back woods of Montana and he sees a sign in front of a broken down shanty-style house: ‘Talking Dog For Sale.’

    He rings the bell and the owner appears and tells him the dog is in the backyard.

    The guy goes into the backyard and sees a nice looking Labrador Retriever sitting there.

    “You talk?” he asks.

    “Yep” the Lab replies.

    After the guy recovers from the shock of hearing a dog talk, he says, “So, what’s your story?”

    The Lab looks up and says, “Well, I discovered that I could talk when I was pretty young. I wanted to help the government, so I told the CIA. In no time at all they had me jetting from country to country, sitting in rooms with spies and world leaders, because no one figured a dog would be eavesdropping, I was one of their most valuable spies for eight years running… but the jetting around really tired me out, and I knew I wasn’t getting any younger so I decided to settle down. I signed up for a job at the airport to do some undercover security, wandering near suspicious characters and listening in. I uncovered some incredible dealings and was awarded a batch of medals. I got married, had a mess of puppies, and now I’m just retired.”

    The guy is amazed. He goes back in and asks the owner what he wants for the dog.

    “Ten dollars” the guy says.

    “Ten dollars? This dog is amazing! Why on Earth are you selling him so cheap?”

    “Because he’s a liar. He’s never been out of the yard.”

  • ClusterBomb@lemmy.blahaj.zone
    English
    arrow-up
    0
    ·
    7 days ago

    “My hammer is not well suited to cut vegetables” 🤷

    There is so much to say about AI, can we move on from “it can’t count letters and do math” ?

    • Strykker@programming.dev
      English
      arrow-up
      0
      ·
      7 days ago

      But the problem is more “my do-it-all tool randomly fails at arbitrary tasks in an unpredictable fashion”, which makes it hard to trust as a tool in any circumstances.

      • superglue@lemmy.dbzer0.com
        English
        arrow-up
        1
        ·
        4 days ago

        You’re not supposed to just trust it. You’re supposed to test the solution it gives you. Yes, that makes it not useful for some things. But it’s still immensely useful for other applications, and a lot of the time it gives you a really great jumping-off point for solving whatever your problem is.

    • ReallyActuallyFrankenstein@lemmynsfw.com
      English
      arrow-up
      0
      arrow-down
      1
      ·
      7 days ago

      I get that it’s usually just a dunk on AI, but it is also still a valid demonstration that AI has pretty severe and unpredictable gaps in functionality, in addition to failing to properly indicate confidence (or lack thereof).

      People who understand that it’s a glorified autocomplete will know how to disregard or prompt around some of these gaps, but this remains a litmus test because it succinctly shows you cannot trust an LLM response even in many “easy” cases.

  • whotookkarl@lemmy.world
    English
    arrow-up
    0
    ·
    8 days ago

    I’ve already had more than one conversation where people quote AI as if it were a source, like quoting Google as a source. When I show them how it can sometimes lie and explain that it’s not a primary source for anything, I just get that blank stare, like I have two heads.

  • gerryflap@feddit.nl
    English
    arrow-up
    0
    ·
    edit-2
    8 days ago

    These models don’t get single characters but rather tokens representing multiple characters. While I also don’t like the “AI” hype, this image is also very one-dimensional hate and misrepresents the usefulness of these models by picking one adversarial example.
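To make the token point concrete, here is a minimal sketch. The split into "str", "aw", "berry" and the integer IDs are made up for illustration; a real tokenizer may cut the word differently and uses its own vocabulary. The idea is only that the model is handed the IDs, while the letter count lives in the text those IDs stand for.

```python
# Hypothetical split and IDs, for illustration only. A real tokenizer may
# split "strawberry" differently and uses its own vocabulary.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}
PIECES = ["str", "aw", "berry"]


def encode(pieces):
    """Map each text piece to its (made-up) integer ID."""
    return [TOY_VOCAB[piece] for piece in pieces]


def letter_counts_per_piece(pieces, letter):
    """Count a letter inside each piece; this needs the raw text, not the IDs."""
    return {piece: piece.count(letter) for piece in pieces}


token_ids = encode(PIECES)                     # what the model receives
counts = letter_counts_per_piece(PIECES, "r")  # what the question is about
total_r = sum(counts.values())
```

Nothing in `token_ids` says how many r's each piece contains; that information is only visible in `counts`, which is computed from the characters.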

    Today ChatGPT saved me a fuckton of time by linking me to the exact issue on gitlab that discussed the issue I was having (full system freezes using Bottles installed with flatpak on Arch). This was the URL it came up with after explaining the problem and giving it the first error I found in dmesg: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/110

    This issue is one day old. When I looked this shit up myself I found exactly nothing useful on either DDG or Google. After this ChatGPT also provided me with the information that the LTS kernel exists and how to install it. Obviously I verified that stuff before using it, because these LLMs have their limits. Now my system works again, and figuring this out myself would’ve cost me hours because I had no idea what broke. Was it flatpak, Nvidia, the kernel, Wayland, Bottles, some random shit I changed in a config file 2 years ago? Well thanks to ChatGPT I know.

    They’re tools, and they can provide new insights that can be very useful. Just don’t expect them to always tell the truth, or to actually be human-like.

    • lennivelkant@discuss.tchncs.de
      English
      arrow-up
      0
      ·
      7 days ago

      Just don’t expect them to always tell the truth, or to actually be human-like

      I think the point of the post is to call out exactly that: people preaching AI as replacing humans

      • desktop_user@lemmy.blahaj.zone
        English
        arrow-up
        1
        ·
        7 days ago

        It can, in the same way a loom did, just for more language-y tasks. A multimodal system might be better at answering that type of question: it would first detect that this is a question of fact, then recognize that running a bucket sort algorithm on the word “strawberry” answers it better than its questionably obtained correlations.
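A rough sketch of that routing idea, under loud assumptions: the regular expression is a hypothetical stand-in for "detect that this is a question of fact", and `Counter` only does the per-letter bucketing step (the counting half of a bucket/counting sort, not a full sort). Anything the pattern does not recognize returns `None`, meaning "hand it to the language model instead".

```python
import re
from collections import Counter

# Hypothetical pattern for questions like: how many r's are in "strawberry"?
LETTER_QUESTION = re.compile(
    r"how many ([a-z])s? (?:are |is )?(?:there )?in (?:the word )?([a-z]+)",
    re.IGNORECASE,
)


def letter_buckets(word):
    """Drop every letter of the word into its own bucket and count each bucket."""
    return Counter(ch for ch in word.lower() if ch.isalpha())


def answer_letter_question(question):
    """Return the exact count for a letter-counting question, else None."""
    # Strip quotes, apostrophes and question marks so "r's" becomes "rs".
    cleaned = re.sub("[\"'“”‘’?]", "", question)
    match = LETTER_QUESTION.search(cleaned)
    if match is None:
        return None  # not a letter-counting question; defer to the model
    letter, word = match.group(1).lower(), match.group(2)
    return letter_buckets(word)[letter]
```

For example, `answer_letter_question('How many R\'s are in "strawberry"?')` goes down the deterministic path, while a question like "Why is the sky blue?" falls through to `None`.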

  • HoofHearted@lemmy.world
    English
    arrow-up
    0
    ·
    8 days ago

    The terrifying thing is everyone criticising the LLM as being poor, however it excelled at the task.

    The question asked was how many R’s are in “strawbery”, and it answered: 2.

    It also detected the typo and offered the correct spelling.

    What’s the issue I’m missing?

    • Fubarberry@sopuli.xyz
      English
      arrow-up
      0
      ·
      8 days ago

      There’s also a “r” in the first half of the word, “straw”, so it was completely skipping over that r and just focusing on the r’s in the word “berry”

      • catloaf@lemm.ee
        English
        arrow-up
        1
        ·
        8 days ago

        It wasn’t focusing on anything. It was generating text per its training data. There’s no logical thought process whatsoever.

    • Tywèle [she|her]@lemmy.dbzer0.com
      English
      arrow-up
      0
      ·
      8 days ago

      The issue that you are missing is that the AI answered that there is 1 ‘r’ in ‘strawbery’ even though there are 2 ‘r’s in the misspelled word. And the AI corrected the user with the correct spelling of the word ‘strawberry’ only to tell the user that there are 2 ‘r’s in that word even though there are 3.

      • TomAwsm@lemmy.world
        English
        arrow-up
        1
        ·
        8 days ago

        Sure, but for what purpose would you ever ask about the total number of a specific letter in a word? This isn’t the gotcha that so many think it is. The LLM answers like it does because it makes perfect sense for someone to ask if a word is spelled with a single or double “r”.

  • VintageGenious@sh.itjust.works
    English
    arrow-up
    0
    arrow-down
    1
    ·
    8 days ago

    Because you’re using it wrong. It’s good for generative text and chains of thought, not symbolic calculations including math or linguistics.

    • Grandwolf319@sh.itjust.works
      English
      arrow-up
      2
      ·
      8 days ago

      Because you’re using it wrong.

      No, I think you mean to say it’s because you’re using it for the wrong use case.

      Well this tool has been marketed as if it would handle such use cases.

      I don’t think I’ve actually seen any AI marketing that was honest about what it can do.

      I personally think image recognition is the best use case as it pretty much does what it promises.

      • slaacaa@lemmy.world
        English
        arrow-up
        2
        ·
        7 days ago

        I have it write emails in German for me. I moved there not too long ago, and it works wonders for getting doctor’s appointments, car service, etc. I also have it explain the text, so I’m learning the language.

        I also use it as an alternative to internet search, which is now terrible. It’s not going to help you find something super location-specific, but I can ask it to tell me something about a game/movie without spoilers, or list Metacritic scores in a table, etc.

        It also works great in summarizing long texts.

        An LLM is a tool; what matters is how you use it. It is stupid, it doesn’t think, and it’s mostly hype to call it AI. But it definitely has its benefits.

      • scarabic@lemmy.world
        English
        arrow-up
        1
        ·
        6 days ago

        We have one that indexes all the wikis and GDocs and such at my work and it’s incredibly useful for answering questions like “who’s in charge of project 123?” or “what’s the latest update from team XYZ?”

        I even asked it to write my weekly update for MY team once and it did a fairly good job. The one thing I thought it had hallucinated turned out to be something I just hadn’t heard yet. So it was literally ahead of me at my own job.

        I get really tired of all the automatic hate over stupid bullshit like this OP. These tools have their uses. It’s very popular to shit on them. So congratulations for whatever agreeable comments your post gets. Anyway.

      • chiisana@lemmy.chiisana.net
        English
        arrow-up
        1
        ·
        8 days ago

        Ask it for a second opinion on medical conditions.

        Sounds insane, but they are leaps and bounds better than blindly Googling and self-diagnosing every condition under the sun when the symptoms only vaguely match.

        Once the LLM helps you narrow in on a couple of possible conditions based on the symptoms, then you can dig deeper into those specific ones, learn more about them, and have a slightly more informed conversation with your medical practitioner.

        They’re not a replacement for your actual doctor, but they can help you learn and have better discussions with your actual doctor.

        • Wogi@lemmy.world
          English
          arrow-up
          0
          arrow-down
          1
          ·
          8 days ago

          So can WebMD. We didn’t need AI for that. Googling symptoms is a great way to just be dehydrated and suddenly think you’re in kidney failure.

          • chiisana@lemmy.chiisana.net
            English
            arrow-up
            1
            ·
            8 days ago

            We didn’t stop trying to make faster, safer and more fuel efficient cars after Model T, even though it can get us from place A to place B just fine. We didn’t stop pushing for digital access to published content, even though we have physical libraries. Just because something satisfies a use case doesn’t mean we should stop advancing technology.

            • Wogi@lemmy.world
              English
              arrow-up
              0
              ·
              8 days ago

              We also didn’t make the Model T suggest replacing the engine when the oil light comes on. Cars, as it happens, aren’t that great at self-diagnosis, despite that technology being far simpler and further along than generative models are. I don’t trust the model to tell me what temperature to bake a cake at, and I’m sure as hell not going to trust it with medical information. Googling symptoms was risky at best before. It’s a horror show now.

            • snooggums@lemmy.world
              English
              arrow-up
              0
              arrow-down
              1
              ·
              8 days ago

              AI is slower and less efficient than the older search algorithms and is less accurate.

      • chaosCruiser@futurology.today
        English
        arrow-up
        0
        ·
        edit-2
        8 days ago

        Here’s a bit of code that’s supposed to do stuff. I got this error message. Any ideas what could cause this error and how to fix it? Also, add this new feature to the code.

        Works reasonably well as long as you have some idea how to write the code yourself. GPT can do it in a few seconds, debugging it would take like 5-10 minutes, but that’s still faster than my best. Besides, GPT is also fairly fluent in many functions I have never used before. My approach would be clunky and convoluted, while the code generated by GPT is a lot shorter.

        If you’re well familiar with the code you’re working on, GPT code will be convoluted by comparison. If so, you can ask GPT for the rough alpha version, and you can do the debugging and refining in a few minutes.

        • Windex007@lemmy.world
          English
          arrow-up
          0
          arrow-down
          1
          ·
          8 days ago

          That makes sense as long as you’re not writing code that needs to know how to do something as complex as …checks original post… count.

          • TimeSquirrel@kbin.melroy.org
            arrow-up
            1
            ·
            8 days ago

            It can do that just fine, because it has seen enough examples of working code. It can’t directly count correctly, sure, but it can write “i++;”, incrementing a variable by one in a loop and returning the result. The computer running the generated program is going to be doing the counting.
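For illustration, the kind of generated program described here could look like the sketch below. The function name `count_letter` is made up for this example; the increment inside the loop plays the role of "i++", and the machine running it does the actual counting.

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of a single letter in a word."""
    count = 0
    for ch in word:
        if ch.lower() == letter.lower():
            count += 1  # Python's spelling of "i++"
    return count


# The two words from the original post.
r_in_strawberry = count_letter("strawberry", "r")
r_in_strawbery = count_letter("strawbery", "r")
```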

      • L3s@lemmy.worldM
        English
        arrow-up
        0
        ·
        edit-2
        8 days ago

        Writing customer/company-wide emails is a good example. “Make this sound better: we’re aware of the outage at Site A, we are working as quick as possible to get things back online”

        Dumbing down technical information is another: “word this so a non-technical person can understand: our DHCP scope filled up and there were no more addresses available for Site A, which caused the temporary outage for some users”

        Another is feeding it an article and asking for a summary, https://hackingne.ws/ does that for its Bsky posts.

        Coding is another good example, “write me a Python script that moves all files in /mydir to /newdir”

        Asking for it to summarize a theory or protocol, “explain to me why RIP was replaced with RIPv2, and what problems people have had since with RIPv2”
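As an idea of what the file-moving prompt above might come back with, here is a minimal sketch. It is not output from any particular model; the function name `move_files` and the choices to skip subdirectories and never overwrite existing files are assumptions made for this example.

```python
import shutil
from pathlib import Path


def move_files(src_dir, dst_dir):
    """Move every regular file from src_dir into dst_dir.

    Subdirectories are left alone, and files whose names already exist in
    dst_dir are skipped rather than overwritten (an assumption of this sketch).
    Returns the names of the files that were moved.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(src.iterdir()):
        target = dst / entry.name
        if entry.is_file() and not target.exists():
            shutil.move(str(entry), str(target))
            moved.append(entry.name)
    return moved


# Usage with the paths from the prompt:
#     move_files("/mydir", "/newdir")
```

Even for a script this small, it is worth running it on a scratch directory first, which is the same "verify before trusting" step people in this thread keep coming back to.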

        • Corngood@lemmy.ml
          English
          arrow-up
          0
          ·
          8 days ago

          Make this sound better: we’re aware of the outage at Site A, we are working as quick as possible to get things back online

          How does this work in practice? I suspect you’re just going to get an email that takes longer for everyone to read, and doesn’t give any more information (or worse, gives incorrect information). Your prompt seems like what you should be sending in the email.

          If the model (or context?) was good enough to actually add useful, accurate information, then maybe that would be different.

          I think we’ll get to the point really quickly where a nice concise message like in your prompt will be appreciated more than the bloated, normalised version, which people will find insulting.

          • locuester@lemmy.zip
            English
            arrow-up
            1
            ·
            8 hours ago

            Yes, people are using it as the least efficient communication protocol ever.

            One side asks an LLM to expand a summary into a fluff filled email, and the other side asks an LLM to reduce the long email to a summary.

        • snooggums@lemmy.world
          English
          arrow-up
          0
          arrow-down
          1
          ·
          edit-2
          8 days ago

          The dumbed down text is basically as long as the prompt. Plus you have to double check it to make sure it didn’t have outrage instead of outage just like if you wrote it yourself.

          How do you know the answer on why RIP was replaced with RIPv2 is accurate and not just a load of bullshit like putting glue on pizza?

          Are you really saving time?

            • snooggums@lemmy.world
              English
              arrow-up
              0
              arrow-down
              1
              ·
              edit-2
              8 days ago

              If the amount of time it takes to create the prompt is the same as it would have taken to write the dumbed down text, then the only time you saved was not learning how to write dumbed down text. Plus you need to know what dumbed down text should look like to know if the output is dumbed down but still accurate.

        • lurch (he/him)@sh.itjust.works
          English
          arrow-up
          0
          arrow-down
          2
          ·
          8 days ago

          It’s not good for summaries. It often gets important bits wrong, like embedded instructions that can’t be summarized.

          • L3s@lemmy.worldM
            English
            arrow-up
            1
            ·
            edit-2
            8 days ago

            My experience has been very different, I do have to sometimes add to what it summarized though. The Bsky account mentioned is a good example, most of the posts are very well summarized, but every now and then there will be one that isn’t as accurate.

    • Prandom_returns@lemm.ee
      English
      arrow-up
      1
      arrow-down
      1
      ·
      7 days ago

      So for something you can’t objectively evaluate? Looking at Apple’s garbage generator, LLMs aren’t even good at summarising.

      • Balder@lemmy.world
        English
        arrow-up
        2
        ·
        edit-2
        20 hours ago

        For reference:

        AI chatbots unable to accurately summarise news, BBC finds

        the BBC asked ChatGPT, Copilot, Gemini and Perplexity to summarise 100 news stories and rated each answer. […] It found 51% of all AI answers to questions about the news were judged to have significant issues of some form. […] 19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates.

        It makes me remember I basically stopped using LLMs for any summarization after this exact thing happened to me. I realized that without reading the text, I wouldn’t be able to know whether the output has all the relevant info or if it has some made-up info.

  • Grandwolf319@sh.itjust.works
    English
    arrow-up
    0
    arrow-down
    1
    ·
    edit-2
    8 days ago

    There is an alternative reality out there where LLMs were never marketed as AI and were instead marketed as random generators.

    In that world, tech savvy people would embrace this tech instead of having to constantly educate people that it is in fact not intelligence.