You may have heard that the US government has a bit of a mess on its hands after House Speaker Mike Johnson worked out a somewhat carefully crafted compromise continuing resolution funding plan to …
(B) make reasonable efforts to identify and remove any known identical copies of such depiction
(5) “visual depiction” includes undeveloped film and videotape, data stored on computer disk or by electronic means which is capable of conversion into a visual image, and data which is capable of conversion into a visual image that has been transmitted by any means, whether or not stored in a permanent format;
Depends on how identical “identical” has to be.
The SHA hash of a file is easy to calculate, but pretty much useless for detecting similar images: change a single bit and the hash changes completely.
To detect similar content you need perceptual hashes, which are no longer that easy to calculate.
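To make the single-bit point concrete, here is a minimal sketch using only Python's standard `hashlib`. The `original` bytes are a made-up stand-in for an image file, and `flip_bit` and `differing_bits` are hypothetical helpers written for this illustration, not part of any library.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Assumption: arbitrary bytes standing in for the contents of an image file.
original = bytes(range(256)) * 4
altered = flip_bit(original, 0)

h1 = sha256_hex(original)
h2 = sha256_hex(altered)
print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 digest bits differ")
```

The two digests share no useful structure, so comparing them only tells you whether the files are byte-for-byte the same.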
Yes, but does the law require removing similar content, or just removing that image?
Reading the act itself, the law requires a site to:
So under the plain wording, similar images aren’t covered, only identical ones, and you only have to make a “reasonable effort” to remove them; you don’t actually have to succeed. Nothing here indicates that perceptual hashing is required to meet this standard.
And all of that aside, even perceptual hashes are not that burdensome to generate.
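As a rough idea of the work involved, below is a sketch of one of the simplest perceptual hashes, an average hash, in plain Python. It assumes the image has already been decoded to a grayscale grid of 0-255 values at least 8×8 in size; `average_hash` and the test images are made up for this illustration, and real libraries handle decoding, resizing, and more robust variants.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of 0-255 values.

    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    # Downscale to size x size by averaging each block of source pixels.
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            small.append(sum(block) / len(block))
    mean = sum(small) / len(small)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in small:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

# Made-up 32x32 test image: a left-to-right brightness gradient.
gradient = [[x * 8 for x in range(32)] for _ in range(32)]
# The same image, slightly brightened (clamped to the 0-255 range).
brighter = [[min(255, v + 10) for v in row] for row in gradient]

print(f"{average_hash(gradient):016x}")  # → 0f0f0f0f0f0f0f0f
print(f"{average_hash(brighter):016x}")  # → 0f0f0f0f0f0f0f0f
```

Every pixel value changed in the brightened copy, yet the 64-bit hash is the same, which is the behavior a byte-level hash cannot give you.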
The problem lies in what is a “depiction”:
section 2256(5) of title 18
https://uscode.house.gov/view.xhtml?req=(title%3A18+section%3A2256+edition%3Aprelim)+OR+(granuleid%3AUSC-prelim-title18-section2256)&f=treesort&num=0&edition=prelim
via: section 1309 of the Consolidated Appropriations Act, 2022 (15 U.S.C. 6851).
https://uscode.house.gov/view.xhtml?req=(title%3A15+section%3A6851+edition%3Aprelim)
via: the definitions section of the act
https://www.congress.gov/bill/118th-congress/senate-bill/4569/text#idE946FA7637914C2F88ACBEDF472397DB
As written, even cropped, rotated, blurred, or otherwise processed files of that “depiction”, and even the values learned by a neural network (which are capable of conversion into a visual image), would fall under the “identical” part.
Since perceptual hashing exists, there are open source libraries to run it, and even Beehaw runs an AI-based image filter, the “reasonable effort” is arguably to use all of those tools as a bare minimum, even if they sometimes (or always) fail to remove every instance of a depiction.
But ultimately, deciding whether a service has applied all “reasonable efforts” to remove “identical copies” of a “depiction” will fall on the shoulders of a judge… and even starting down that road can bankrupt most sites.
Why “no longer”?
Because of the “perceptual” part.
A normal hash, such as SHA, has the property that it produces a wildly different hash for even the tiniest change in the file.
Perceptual hashing flips that requirement on its head, and therefore makes finding a suitable hash function much harder.
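A small sketch of that flipped requirement, assuming a difference hash (dHash) over an already-decoded grayscale grid whose sides are at least 9×8 pixels. All names here (`shrink`, `difference_hash`, `hamming`, `raw_sha256`) and the test images are made up for illustration; matching is done by Hamming distance rather than equality.

```python
import hashlib

def shrink(pixels, out_w, out_h):
    """Downscale a grayscale image (rows of 0-255 ints) by block averaging."""
    h, w = len(pixels), len(pixels[0])
    rows = []
    for r in range(out_h):
        ys = range(r * h // out_h, (r + 1) * h // out_h)
        row = []
        for c in range(out_w):
            xs = range(c * w // out_w, (c + 1) * w // out_w)
            total = sum(pixels[y][x] for y in ys for x in xs)
            row.append(total / (len(ys) * len(xs)))
        rows.append(row)
    return rows

def difference_hash(pixels, size=8):
    """One bit per horizontally adjacent pair: 1 if brightness increases."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left < right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def raw_sha256(pixels):
    """SHA-256 over the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

# Made-up 72x64 image: brightness rises toward the middle, then falls.
tent = [[255 - 7 * abs(x - 36) for x in range(72)] for _ in range(64)]
# Same image with a single pixel nudged by one brightness level.
tweaked = [row[:] for row in tent]
tweaked[10][10] += 1
# A visually different image: the photographic negative.
negative = [[255 - v for v in row] for row in tent]

print(raw_sha256(tent) == raw_sha256(tweaked))                    # → False
print(hamming(difference_hash(tent), difference_hash(tweaked)))   # → 0
print(hamming(difference_hash(tent), difference_hash(negative)))  # → 64
```

The hard part in practice is not computing such a hash but choosing one, plus a distance threshold, that tolerates crops, rotations, and re-encodes without flagging unrelated images; this toy version would not survive a rotation.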