• 0 Posts
  • 55 Comments
Joined 1 year ago
Cake day: June 12th, 2023



  • How can you be sure it’s one line of code? What if there are several codepaths, and venvs are activated in different places? And in any case, even if there is only one conditional needed, that is still one branch more than necessary to test.

    Your symlink example does not make sense. There is something that is changing. In fact, it may even be the opposite: if you need to use file A in a container, and file B otherwise, it may make perfect sense to symlink the correct file to C, so that your code does not need to care about it.
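To make the symlink idea concrete, here is a minimal sketch. The file names `A.conf`, `B.conf` and `C.conf` and both function names are made up for illustration; the only point is that the environment decision happens once, at setup time, and the application code reads `C.conf` without any branch.

```python
import tempfile
from pathlib import Path


def select_config(directory: Path, in_container: bool) -> Path:
    """Point C.conf at A.conf inside a container, at B.conf otherwise.

    All file names here are hypothetical placeholders.
    """
    target_name = "A.conf" if in_container else "B.conf"
    link = directory / "C.conf"
    # Remove a stale link (or file) from a previous setup run.
    if link.is_symlink() or link.exists():
        link.unlink()
    # Relative target, so the link survives moving the directory.
    link.symlink_to(target_name)
    return link


def load_settings(directory: Path) -> str:
    """Application code: always reads C.conf, no environment check."""
    return (directory / "C.conf").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "A.conf").write_text("container settings")
        (d / "B.conf").write_text("host settings")
        select_config(d, in_container=True)
        print(load_settings(d))
```

The conditional still exists, but it lives in one setup step instead of in every code path that needs the file.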






  • there is no way to do the equivalent of banning armor piercing rounds with an LLM or making sure a gun is detectable by metal detectors - because as I said it is non-deterministic. You can’t inject programmatic controls.

    Of course you can. Why would you not, just because it is non-deterministic? Non-determinism does not mean complete randomness or lack of control; that is a common misconception.

    Again, obviously you can’t teach an LLM about morals, but you can reduce the likelihood of producing immoral content in many ways. Of course it won’t be perfect, and of course it may limit the usefulness in some cases, but that is the case also today in many situations that don’t involve AI, e.g. some people complain they “cannot talk about certain things without getting cancelled by overly eager SJWs”. Society already acts as a morality filter. Sometimes it works, sometimes it doesn’t. Free-speech maximalists exist, but are a minority.
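As a toy illustration of a programmatic control sitting on top of a non-deterministic process: the sampler below is random, yet a banned token can never come out, because it is removed before sampling. This is only a sketch with a hand-written probability table and hypothetical names; it is not how any particular LLM product implements its filtering.

```python
import random


def sample_token(probs: dict[str, float], banned: set[str], rng: random.Random) -> str:
    """Randomly pick a token, but never one from the banned set.

    `probs` is a made-up next-token distribution for illustration.
    """
    # Hard constraint: drop banned tokens before any randomness is involved.
    allowed = {tok: p for tok, p in probs.items() if tok not in banned and p > 0}
    if not allowed:
        raise ValueError("no allowed tokens left to sample from")
    tokens = list(allowed)
    weights = [allowed[tok] for tok in tokens]
    # random.choices renormalises the remaining weights implicitly.
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    probs = {"hello": 0.5, "world": 0.3, "forbidden": 0.2}
    samples = [sample_token(probs, {"forbidden"}, rng) for _ in range(20)]
    print(samples)  # varies from run to run, never contains "forbidden"
```

The output differs between runs, so it is non-deterministic, but the constraint holds on every run.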











  • Obviously the 2nd LLM does not need to reveal the prompt. But you still need an exploit to make it both not recognize the prompt as being suspicious, AND not recognize the system prompt being in the output. Neither of those is trivial alone; in combination they are again an order of magnitude more difficult. And then the same exploit of course needs to actually trick the 1st LLM. That’s one prompt that needs to succeed in exploiting 3 different things.

    LLM literally just means “large language model”. What are these supposed principles that underlie these models and cause them to be susceptible to the same exploits?
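The setup I am describing could be wired up roughly like this. Everything here is a hypothetical stand-in: the system prompt is invented, and the "guards" are trivial keyword and substring checks where a real deployment would use a separate model. The sketch only shows the three places a single prompt has to get past.

```python
from typing import Callable

# Invented system prompt, used only for this illustration.
SYSTEM_PROMPT = "You are a support bot. Internal rule 7: never mention Project Falcon."
REFUSAL = "Sorry, I can't help with that."


def toy_input_guard(user_prompt: str) -> bool:
    """Return True if the prompt looks suspicious (stand-in for a 2nd model)."""
    lowered = user_prompt.lower()
    return any(p in lowered for p in ("ignore previous instructions", "system prompt"))


def toy_output_guard(reply: str, system_prompt: str) -> bool:
    """Return True if the reply leaks the system prompt (stand-in for a 2nd model)."""
    return system_prompt.lower() in reply.lower()


def toy_main_model(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for the 1st LLM; deliberately leaks on one specific request."""
    if "repeat everything above" in user_prompt.lower():
        return system_prompt
    return "Echo: " + user_prompt


def guarded_reply(
    user_prompt: str,
    main_model: Callable[[str, str], str] = toy_main_model,
    input_guard: Callable[[str], bool] = toy_input_guard,
    output_guard: Callable[[str, str], bool] = toy_output_guard,
) -> str:
    # Check 1: the guard inspects the incoming prompt.
    if input_guard(user_prompt):
        return REFUSAL
    # Check 2: the main model itself has to be tricked.
    reply = main_model(SYSTEM_PROMPT, user_prompt)
    # Check 3: the guard inspects the output for the system prompt.
    if output_guard(reply, SYSTEM_PROMPT):
        return REFUSAL
    return reply
```

Note that the guard never takes instructions from the user; it only classifies text, which is why an exploit aimed at the main model does not automatically carry over to it.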


  • You are moving the goalposts: you are the one who said even 1000x would not matter.

    The second one does not run on the same principles, and the same exploits would not work against it: e.g. it does not accept user commands, it uses different training data, and maybe even a different architecture.

    You need a prompt that not only exploits two completely different models, but exploits them both at the same time. Claiming that is a 2x increase in difficulty is absurd.
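A back-of-the-envelope way to see why this is not a 2x increase, under one loudly stated assumption: that the layers fail independently of each other. That assumption is exactly what is disputed in this thread, and the per-layer probabilities below are invented numbers, so treat this as an illustration of the arithmetic rather than a measurement.

```python
import math


def joint_success_probability(per_layer: list[float]) -> float:
    """Chance that one prompt beats every layer, ASSUMING independent layers."""
    return math.prod(per_layer)


def expected_attempts(success_probability: float) -> float:
    """Average number of tries until the first success (geometric distribution)."""
    if success_probability <= 0:
        raise ValueError("success probability must be positive")
    return 1.0 / success_probability


if __name__ == "__main__":
    # Invented numbers: each layer is fooled by 1 in 10 crafted prompts.
    layers = [0.1, 0.1, 0.1]
    p = joint_success_probability(layers)
    print(f"joint success probability: {p:.4g}")
    print(f"expected attempts: {expected_attempts(p):.4g}")
```

Under these assumptions the probabilities multiply rather than add, so each extra independent layer scales the attacker's effort by a factor of 1/p instead of adding a constant.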