So, Ente used an AI security audit startup to review their new crypto layer.
We first came across winfunc when they audited our server code on their own initiative. That report was noticeably better: it was clearer (the findings read like they were written for an engineer), had fewer false positives, and felt directly actionable ("here's what's wrong, here's where it is, here's what to do").
We met the founders. They’re sharp, young, and building something that already punches well above its weight.
And when it came time to review ente-core, we reached out to them.
I'm sure that LLMs can be efficient and effective at finding vulnerabilities, and this is actually one of the few genuinely useful applications for them. BUT trusting an LLM to spot everything sounds like asking for trouble to me; just look at the recent LiteLLM/Delve debacle.
Yes. Why not? The latest models are pretty smart. On the other hand, even a human audit does not necessarily guarantee quality. Moreover, AI-assisted auditing is thousands of times cheaper and can help identify vulnerabilities before a human audit even begins.
LLMs are limited both by what they've already seen (the training set) and by what they're given at inference time (prompts/input).
I suspect the question you want to ask is: is the expert (the "human in the loop," as @bitsondatadev puts it) who is responsible for reviewing the LLM's output actually putting in the requisite work to evaluate it for correctness and completeness?
SoTA LLMs from frontier labs are really, really good at code. I wouldn't be surprised if they are decent at code-focused cryptography audits, too. I can tell you from experience that LLMs in the hands of an expert will produce vastly different, and potentially far more useful, results than they will in the hands of a "script kiddie." In short, it all depends on just who is wielding the sword prompt.
All that said, rage-bait marketing (ex) is also a thing these days…
I mean, I feel like security audits need to cover the whole spectrum, from human-only pen testing all the way out to LLM-agent testing. The reality is that generative AI does a great job of generating ideas that humans might not otherwise have come up with, while humans have a model of the world beyond the training data and the dimensions an LLM trains and generates on. Skipping any of these variations (human-only, hybrid, or agentic) means missing whole classes of attacks. Arguing that one is right or wrong misses the point, IMO.