For the SelfCheckGPT paper, which was basically this method, it was very sample dependent, continuing to see improvement up to 20 samples (their limit), but especially up to around 6 iterations…
I’ve seen it double down, when instructed a facet of the answer was incorrect and to revise, several times I’d get “sorry for the incorrect information”, followed by exact same mistake.
You can’t continue with it in context or it ruins the entire methodology. You are reintroducing those tokens when you show it back to the model, and the models are terrible at self-correcting when instructed that it is incorrect, so the step is quite meritless anyways.
You need to run parallel queries and identify shared vs non-shared data points.
It really depends on the specific use case in terms of the full pipeline, but it works really well. Even with just around 5 samples and intermediate summarization steps it pretty much shuts down completely errant hallucinations. The only class of hallucinations it doesn’t do great with are the ones resulting from biases in the relationship between the query and the training data, but there’s other solutions for things like that.
And yes, it definitely does mean inadvertently eliminating false negatives, which is why a balance has to be struck in terms of design choices.
How many times are you running it?
For the SelfCheckGPT paper, which was basically this method, it was very sample dependent, continuing to see improvement up to 20 samples (their limit), but especially up to around 6 iterations…
You can’t continue with it in context or it ruins the entire methodology. You are reintroducing those tokens when you show it back to the model, and the models are terrible at self-correcting when instructed that it is incorrect, so the step is quite meritless anyways.
You need to run parallel queries and identify shared vs non-shared data points.
It really depends on the specific use case in terms of the full pipeline, but it works really well. Even with just around 5 samples and intermediate summarization steps it pretty much shuts down completely errant hallucinations. The only class of hallucinations it doesn’t do great with are the ones resulting from biases in the relationship between the query and the training data, but there’s other solutions for things like that.
And yes, it definitely does mean inadvertently eliminating false negatives, which is why a balance has to be struck in terms of design choices.