**jonny** @jonny@social.coop · Jan 25, 2022, 23:36

More fun publisher surveillance:
Elsevier embeds a hash in the PDF metadata that is *unique for each time a PDF is downloaded*. This is a diff between the metadata of two downloads of the same paper. Combined with access timestamps, they can uniquely identify the source of any shared PDF.

[Image: diff of the PDF metadata from two downloads of the same paper]

**jonny** @jonny@social.coop · Jan 26, 2022, 00:03

You can see for yourself using exiftool.
To remove all of the top-level metadata, you can use exiftool and qpdf:

exiftool -all:all= <path.pdf> -o <output1.pdf>
qpdf --linearize <output1.pdf> <output2.pdf>

To remove *all* metadata, you can use dangerzone or mat2
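
To check whether two downloads of the same paper differ in their metadata, something like the following rough sketch could work. It assumes exiftool is on your PATH and uses its `-json` output; `read_metadata`, `diff_metadata`, and the list of ignored filesystem fields are illustrative names and choices, not anything from the posts above.

```python
import json
import subprocess

# Fields exiftool derives from the filesystem rather than the PDF itself.
# This list is an assumption; extend it if other file-level fields show up.
FILESYSTEM_FIELDS = {
    "SourceFile", "FileName", "Directory", "FileSize", "FilePermissions",
    "FileModifyDate", "FileAccessDate", "FileInodeChangeDate",
}


def read_metadata(pdf_path: str) -> dict:
    """Return exiftool's metadata for one file as a dict (needs exiftool installed)."""
    result = subprocess.run(
        ["exiftool", "-json", pdf_path],
        capture_output=True, text=True, check=True,
    )
    # exiftool -json prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]


def diff_metadata(a: dict, b: dict, ignore=FILESYSTEM_FIELDS) -> dict:
    """Map each differing field to its (value_in_a, value_in_b) pair."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Usage would be roughly `diff_metadata(read_metadata("copy1.pdf"), read_metadata("copy2.pdf"))`, where the two paths are placeholders for separately downloaded copies. Any field left in the result differs between the copies and is a candidate per-download identifier, though an empty result does not prove the files are free of trackers.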

**fuchsiaaaaaaaaaaaaaaaaa** @f0x@pixie.town · Jan 26, 2022, 00:45

@jonny a word of caution: while removing EXIF is good, knowing publishers, there are a bunch of other ways they could embed such trackers directly in the file, in spots less human- or machine-readable than EXIF. So be careful.

**marius851000** @marius851000@framapiaf.org · Jan 26, 2022, 10:09

@f0x
I suspect it would be a good idea to compare two PDFs from two different sources. If the hashes match, it's all good. If they don't, strip the EXIF. If they still don't match... find the difference somehow.
@jonny
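
The comparison described above could be sketched like this, assuming two copies of the same paper obtained from different sources. The function names are made up for illustration; the SHA-256 check covers the "do the hashes match" step and `differing_ranges` is one crude way to "find the difference somehow".

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Hex SHA-256 digest of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_file(path_a, path_b) -> bool:
    """True if both files are byte-for-byte identical (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


def differing_ranges(path_a, path_b, max_ranges=20):
    """Return up to max_ranges (start, end) byte ranges, end exclusive, that differ."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    common = min(len(a), len(b))
    ranges = []
    start = None
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Extra trailing bytes present in only one of the files.
        ranges.append((common, max(len(a), len(b))))
    return ranges[:max_ranges]
```

Run `same_file` first, then again on the copies after stripping metadata; if they still differ, `differing_ranges` shows where to look. This compares bytes at the same offsets, so if one file has bytes inserted partway through, everything after that point will be reported as different, and compressed streams inside the PDF will not be readable without decoding them first.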
