• voodooattack@lemmy.world
    2 days ago

    Almost all clients do some random sampling after softmax using temperature. I’m confused why someone who knows about kv caching would not know about temperature.

    I know what temperature is. Modifying the probability distribution is still not randomness, because even the random sampling is PRNG-based.

    The issue you’re not spotting is that it’s still deterministic: a binary system cannot source entropy without external assistance or access to qubits. That’s why even OS kernels have to do a warm-up at boot and read every analogue signal source they can reach, and why PRNGs still exist to begin with.
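
To make the determinism point concrete, here is a minimal sketch of temperature sampling driven by a seeded PRNG. The logits, the four-token vocabulary, and the function names `softmax_with_temperature` and `sample_tokens` are illustrative assumptions, not taken from any particular client.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, temperature, seed, n=10):
    """Draw n token indices using a PRNG seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the output
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


# Hypothetical logits for a toy four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

run_a = sample_tokens(logits, temperature=0.8, seed=42)
run_b = sample_tokens(logits, temperature=0.8, seed=42)

print(run_a)
print(run_a == run_b)  # True: same seed and same distribution give the same draws
```

Temperature only reshapes the distribution. Which token is picked is decided by the generator's state, and that state follows entirely from the seed.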

    Also shared kv cache while plausible is not standard in open source as of a year or so ago,

    Shared KV-cache is an economic necessity for big providers; otherwise 1M context windows wouldn’t be a thing.

    so i’m curious what you are basing this off of. Did I miss a research paper?

    Empirical testing, 20 years of experience coding and tinkering with simulators, and Chaos Theory basics. The papers are out there, you just gotta cross some domains to see it.

    • kersplomp@piefed.blahaj.zone
      2 days ago

      I see, thanks for clarifying. If you’re arguing that a PRNG is not random, then you’re likely confusing non-technical readers. Additionally, whether it’s pseudorandom or actually random is an implementation detail, since /dev/random takes in actual random signals like network packets.

      If it uses a seeded PRNG, it’s repeatable, but repeatability does not imply predictability, which is what a non-technical reader might assume. Remember, most people on here are non-technical.
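
A minimal sketch of the distinction, assuming Python's standard `random` module: `random.Random(seed)` is a seeded PRNG, while `random.SystemRandom` draws from the operating system's entropy source and accepts no meaningful seed. The helper names `seeded_draws` and `os_entropy_draws` are made up for illustration.

```python
import random


def seeded_draws(seed, n=5):
    """Seeded PRNG: the same seed always reproduces the same sequence."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(n)]


def os_entropy_draws(n=5):
    """OS-backed generator: uses os.urandom under the hood and cannot be replayed."""
    rng = random.SystemRandom()
    return [rng.randrange(1000) for _ in range(n)]


first = seeded_draws(1234)
second = seeded_draws(1234)

print(first == second)     # True: repeatable given the seed
print(os_entropy_draws())  # varies from run to run
```

Someone who does not know the seed cannot replay the first sequence either, which is the sense in which repeatable is not the same as predictable.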

      Re: the KV cache thing, I don’t think that’s correct, but I don’t have the energy to prove it, sorry. Shared KV cache sounds like a security nightmare, but YMMV.