• Natanael@slrpnk.net · 2 years ago

    They’ll probably isolate the models from each other, but yeah, if they want to train shared models from private data then that could happen.

      • Natanael@slrpnk.net · 2 years ago

        Bard is the name of the service; they can create account-specific models trained on your user data which aren’t shared with other accounts (as extensions of the base model built on public data). I’ve already read about companies doing this to avoid cross-contamination. Pretty sure Google is aware of this.

        • JohnDClay@sh.itjust.works · 2 years ago

          But I don’t know if Google cares enough about privacy to bother training individual models to avoid cross-contamination. Each model takes years’ worth of supercomputer time, so the fewer they’d need to train, the less it would cost.

          • Natanael@slrpnk.net · 2 years ago

            Extending existing models (retraining) doesn’t take years; it can be done in far less time.

            • JohnDClay@sh.itjust.works · 2 years ago

              Hmm, I thought one of the problems with LLMs was that their knowledge is pretty much baked in during the training process. Maybe that was only with respect to removing information?

              • Natanael@slrpnk.net · 2 years ago

                Yeah, it’s hard to remove data already trained into a model. But you can retrain an existing model to add capabilities, so if you copy one built on public data multiple times and then retrain each copy with a different set of private data, you can save a lot of work.
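
The copy-then-fine-tune idea can be sketched with a toy linear model (a minimal NumPy illustration, not how Bard or any Google system actually works; the model, data, and training loop here are all made up):

```python
import numpy as np

def sgd_train(w, X, y, lr=0.1, epochs=200):
    """Plain gradient descent on a linear model y ≈ X @ w."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

# 1. Train a base model once on "public" data (the expensive step).
X_pub = rng.normal(size=(100, 3))
y_pub = X_pub @ w_true
w_base = sgd_train(np.zeros(3), X_pub, y_pub)

# 2. For each account, copy the base weights and fine-tune on that
#    account's private data only. Copies never see each other's data.
per_user_models = {}
for user in ["alice", "bob"]:
    X_priv = rng.normal(size=(10, 3))
    y_priv = X_priv @ w_true + rng.normal(scale=0.1, size=10)
    # Fewer epochs suffice because we start near the base optimum.
    per_user_models[user] = sgd_train(w_base.copy(), X_priv, y_priv, epochs=20)
```

Each per-user model diverges slightly from the shared base, but the costly public-data training happens only once, which is the savings the comment describes.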