Digital Verifiability: What You Need to Know
tldr: Very few digital structures are actually verifiable. However, the cost of faking data varies widely with the data type and the verification method, which makes some data formats more suitable for verification than others. As a rule of thumb, richer data is more expensive to fake. Public-private key cryptography provides some useful tools, but on its own it is insufficient to ensure verifiability.
Verifiability of digital data and the Paris Agreement
Historically, public funds for climate action have attracted various fraudsters, making donors extremely skeptical. It is also in the nature of international climate collaboration that the people closest to the action have the most to gain by lying about the size of the impact. Thus most climate funds and donors prefer to pay only for verified impacts. However, the more scientifically literate donors also understand that the non-existence of an invisible trace gas (the actual “impact” of a climate action) can only be monitored and verified by proxy. The common understanding is therefore that verification focuses on the monitorable actions instead.
In my classic example, the real-world action that requires verification is an energy-efficient stove actually being used to prepare meals in a school.
Making those real-world actions verifiable in a digitally native way is crucial in today’s world, where almost all data is processed and stored electronically. Even more critically, only digital verification methods can scale without a proportional increase in verification cost, which would otherwise limit how fast climate action can be scaled up. However, with the constant threat of cyber attacks and fraud, it’s important to understand what verifiability means in the digital world.
The Core Problem
At its core, the problem with digital verifiability is the deep uncertainty of digital systems. Even when a system is considered “secure,” it may only be a matter of time until a vulnerability is discovered and exploited. The recommendations at the end of this article can therefore only be a snapshot of the state of verifiability at the time of publication.
However, all digital data, whether it’s a photo, a video, or a series of numbers, is just binary: a long series of 1101011011101... This means that digital verifiability largely boils down to mathematics. Fortunately, the underlying mathematical properties have remained solid for a long time. There is one mathematical threshold concept without which discussions of digital verifiability are not viable: the asymmetric cost of computation. If you do not know what those words mean, please read the footnote at the end of the article before continuing.
Verifiability of Data Types
Not all data formats are equally verifiable. Some data formats are very hard to fake, while others are almost impossible to verify. Here are three categories of data formats based on their verifiability:
Mathematically Verifiable Data
These are data formats that no known mathematical trick can break, even with the most powerful computers. Examples include an RSA signature confirming the upload timestamp of a file, and a cryptographic hash of the file, which confirms that the data has not changed since its upload. The drawback of this type of data is that it requires reliable private key management, an eldritch art mastered only by a few elite hackers.
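To make the first category concrete, here is a minimal Python sketch of a hash bound to an upload timestamp and signed with RSA. It uses textbook RSA with a toy key built from two publicly known Mersenne primes, so it illustrates the math only and must not be deployed; the function names `record_digest`, `sign` and `verify` and the timestamp format are hypothetical. Real systems would use a vetted library, randomly generated keys of 2048 bits or more, and a padding scheme such as RSA-PSS. Note that the genuinely hard part named above, keeping the private exponent `D` secret, is exactly what this sketch ignores.

```python
import hashlib

# Toy textbook-RSA key built from two well-known Mersenne primes (2**61 - 1, 2**89 - 1).
# Publicly known primes make this key worthless in practice; real systems use random
# primes (2048+ bit moduli) and a padding scheme such as RSA-PSS from a vetted library.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (modular inverse, Python 3.8+)


def record_digest(file_bytes: bytes, timestamp: str) -> int:
    """Bind the file's SHA-256 hash to an upload timestamp, as an integer below N."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    message = f"{file_hash}|{timestamp}".encode()
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(file_bytes: bytes, timestamp: str) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(record_digest(file_bytes, timestamp), D, N)


def verify(file_bytes: bytes, timestamp: str, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature cheaply."""
    return pow(signature, E, N) == record_digest(file_bytes, timestamp)


video = b"...raw bytes of an uploaded video..."
sig = sign(video, "2023-05-01T10:00:00Z")
print(verify(video, "2023-05-01T10:00:00Z", sig))          # True
print(verify(video + b"x", "2023-05-01T10:00:00Z", sig))   # False: file changed
print(verify(video, "2023-04-01T10:00:00Z", sig))          # False: timestamp changed
```

Verification needs only the public values `N` and `E`, so anyone can check a record, while producing a valid signature for an altered file or timestamp would require `D`.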
Somewhat Verifiable Data
These data formats are in an ongoing arms race against fake-data generation technologies, but they currently have the lead. It’s generally cheaper to be truthful with this type of data than to lie, and the difference in effort grows with the resolution of the data. Examples include audiovisual data such as videos and satellite images. The underlying math has an asymmetric cost of computation, but the asymmetry is not as extreme as for RSA signatures. Put simply, it is much harder to create a fake video than to detect a less-than-perfect fake, and the higher the resolution, the stronger this effect.
More often than not, these types of data are easier for a human to verify than for a computer, analogous to the “I’m not a robot” CAPTCHA problem.
Unverifiable Data
These data formats are easy to manipulate, and the manipulated data is indistinguishable from real data. It’s usually cheaper to fake this type of data than to collect the real data. Examples include text, purely numeric data, data with predictable patterns (e.g. solar energy production, which can be predicted from weather data) and GPS data.
Verifiability in the digital world therefore requires an architecture that relies on the second category (somewhat verifiable data) and thus makes it prohibitively expensive to create plausible fakes.
On Sensors
Sensors are pieces of hardware that record the physical response of a detector and encode it digitally. The most common sensors are the microphones, cameras, gyroscopes and GPS sensors embedded in every smart phone. A monitoring system that relies exclusively on smart phones can thus scale rapidly over large areas.
Custom-made sensors (e.g. thermometers inside a stove) may sound appealing enough to outweigh any privacy concerns, but their verifiability is complicated, especially if their technical specifications, including the source code of all drivers and the logic circuits of their electronics, are proprietary. Unless the sensor is an actual “secure hardware environment”, the data generated by a thermometer is purely numeric data and thus falls into the “unverifiable” category.
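To illustrate why a thermometer reading on its own counts as unverifiable, the following Python sketch fabricates a full day of plausible-looking stove temperatures in a few lines. All numbers (ambient and peak temperature, cooking duration, time constants, noise level) and the names `fake_stove_log` and `_session_temp` are invented for illustration; the point is only how little effort such a fake costs compared with actually cooking.

```python
import math
import random

AMBIENT_C = 22.0   # assumed kitchen temperature
PEAK_C = 250.0     # assumed stove-body temperature during cooking
COOK_MIN = 60      # assumed duration of one cooking session, in minutes
HEAT_TAU = 10.0    # assumed heat-up time constant, in minutes
COOL_TAU = 30.0    # assumed cool-down time constant, in minutes


def _session_temp(dt):
    """Temperature dt minutes after a cooking session started (ambient before it)."""
    if dt < 0:
        return AMBIENT_C
    heated = min(dt, COOK_MIN)
    temp = AMBIENT_C + (PEAK_C - AMBIENT_C) * (1 - math.exp(-heated / HEAT_TAU))
    if dt > COOK_MIN:
        # exponential cool-down from the temperature reached at the end of cooking
        temp = AMBIENT_C + (temp - AMBIENT_C) * math.exp(-(dt - COOK_MIN) / COOL_TAU)
    return temp


def fake_stove_log(meal_starts_min, total_min=24 * 60, seed=0):
    """One fabricated (minute, °C) reading per minute, with sensor-like Gaussian noise."""
    rng = random.Random(seed)
    log = []
    for t in range(total_min):
        temp = max((_session_temp(t - start) for start in meal_starts_min), default=AMBIENT_C)
        log.append((t, round(temp + rng.gauss(0, 0.5), 1)))
    return log


day = fake_stove_log([7 * 60, 12 * 60])   # breakfast at 07:00, lunch at 12:00
print(day[7 * 60 + 30], day[12 * 60 + 45], day[20 * 60])
```

Nothing in the resulting numbers distinguishes them from a real sensor feed, which is why such data can only play a supporting role.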
Tools and Recommendations
Given the extremely low cost and wide availability of smart phone cameras, there is simply no excuse for NOT requiring video evidence of a climate action.
While it is entirely possible to lie with a camera through framing, i.e. deliberately excluding parts of the real situation from the image, this can be counteracted by enforcing a “script” on video recordings. Such requirements also dramatically increase the cost of creating entirely fake videos.
One requirement we use is a 360-degree camera rotation, or an uncut camera movement from a panoramic outdoor view into a room.
Another requirement that greatly increases the cost of faking a recording is a random element in the script that is revealed only shortly before the recording. This trick limits the time available to fake the video to the period between the reveal of the random element and the (verifiable) upload date.
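The random script element can be sketched as follows, assuming a server that issues the challenge and records when it was revealed. The gesture list, the four-digit code, the 30-minute window and the function names are hypothetical choices for illustration. The upload time is assumed to come from a verifiable (e.g. signed) timestamp, and a reviewer still has to confirm that the gesture and code actually appear in the video; the code only checks the timing.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Hypothetical script elements; any set of easily filmed actions would do.
GESTURES = ["raise your left hand", "show three fingers", "hold up a cup", "point at the door"]
MAX_WINDOW = timedelta(minutes=30)   # assumed upper bound between reveal and upload


def issue_challenge(now=None):
    """Pick an unpredictable script element and record when it was revealed."""
    revealed_at = now if now is not None else datetime.now(timezone.utc)
    return {
        "gesture": secrets.choice(GESTURES),
        "code": f"{secrets.randbelow(10_000):04d}",   # four digits to say or show on camera
        "revealed_at": revealed_at,
    }


def within_window(challenge, uploaded_at, max_window=MAX_WINDOW):
    """Accept only uploads that arrive after the reveal and within max_window of it."""
    elapsed = uploaded_at - challenge["revealed_at"]
    return timedelta(0) <= elapsed <= max_window


reveal = datetime(2023, 5, 1, 10, 0, tzinfo=timezone.utc)
challenge = issue_challenge(now=reveal)
print(challenge["gesture"], challenge["code"])
print(within_window(challenge, reveal + timedelta(minutes=12)))   # True
print(within_window(challenge, reveal + timedelta(hours=2)))      # False: too much time to fake
print(within_window(challenge, reveal - timedelta(minutes=5)))    # False: uploaded before the reveal
```

The shorter the window, the less time a forger has; the trade-off is that honest recorders with slow connections also need enough time to upload.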
Further, the videos can be accompanied by additional weaker data allowing for triangulation:
- GPS data can be combined with satellite images to locate the structures visible in the video (e.g. buildings, trees, hills); we use the 360-degree shots for this purpose.
- Gyroscope data recorded while filming should match the camera movement, which adds to the difficulty of creating a fake.
- Last but not least, verification can be supported by social data. For example, you can require recordings of interviews with local community leaders and keep reputation scores of the local change agents recording the data.
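Two of the triangulation checks above lend themselves to simple automation. The Python sketch below computes the great-circle (haversine) distance between the reported GPS fix and a site location, for example one identified from satellite images, and integrates gyroscope yaw-rate samples to check that the camera really turned a full circle. The thresholds (100 m, 350 degrees), the sample format `(seconds, rad/s)` and all function names are assumptions. Comparing the gyroscope trace with the motion actually visible in the video, and the social checks, are not covered.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def total_rotation_deg(samples):
    """Integrate (time_s, yaw_rate_rad_per_s) samples with the trapezoidal rule.

    Assumes the yaw rate has already been expressed around the vertical axis.
    """
    total_rad = 0.0
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        total_rad += 0.5 * (w0 + w1) * (t1 - t0)
    return abs(math.degrees(total_rad))


def triangulation_checks(gps_fix, site, gyro_samples, max_distance_m=100.0, min_rotation_deg=350.0):
    """Return the individual results so a reviewer can see which check failed."""
    distance = haversine_m(*gps_fix, *site)
    rotation = total_rotation_deg(gyro_samples)
    return {
        "distance_m": round(distance, 1),
        "near_site": distance <= max_distance_m,
        "rotation_deg": round(rotation, 1),
        "full_turn": rotation >= min_rotation_deg,
    }


# Usage: a steady 10-second pan sampled every 0.1 s, filmed roughly 16 m from the site.
pan = [(i * 0.1, 2 * math.pi / 10) for i in range(101)]
print(triangulation_checks((0.0001, 0.0001), (0.0, 0.0), pan))
```

Since GPS data is itself unverifiable, a passing distance check only supports the video evidence; it never replaces it.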
We have a prototype for this approach up and running with our new venture Climate Gains.
Footnote on Asymmetric Cost of Computation: The concept of asymmetric cost of computation refers to the idea that certain mathematical operations are much more difficult or resource-intensive to perform in one direction than in the opposite direction. This property is leveraged in various digital security systems, including public-private key cryptography. In these systems, it is computationally easy to encrypt a message, but difficult to decrypt it without the proper key; in RSA, for example, this rests on the fact that multiplying two large primes is easy, while recovering them from their product (factoring) is extremely hard. The idea of asymmetric cost is critical to understanding how digital security systems function and provides a framework for evaluating the feasibility of various verification methods.
Here is a basic explanation as a video (https://www.youtube.com/watch?v=d_FU9tZIo10) or as text (https://maths.straylight.co.uk/archives/108). If you want to experience asymmetric calculation cost with your own brain, try this exercise. “Prime factorization” means finding which prime numbers multiply together to make the original number.
Calculate the prime factorization for 8507.
Now multiply 181 and 47.
Which way was easier?
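For readers who want to check their answer, here is the same exercise in Python. Trial division is used because it mirrors what you do by hand; it is far from the best factoring algorithm, but even the best known classical algorithms are infeasible for the 2048-bit (roughly 600-digit) numbers used in modern RSA keys, while multiplying the factors stays instant. The function name `trial_division` is just an illustrative choice.

```python
def trial_division(n):
    """Factor n by testing candidate divisors in order; also count the divisions tried."""
    factors, tried, d = [], 0, 2
    while d * d <= n:
        tried += 1
        if n % d == 0:
            factors.append(d)   # smallest remaining divisor is always prime
            n //= d
        else:
            d += 1
    if n > 1:
        factors.append(n)       # whatever remains is itself prime
    return factors, tried


print(trial_division(8507))   # ([47, 181], 46): 46 trial divisions
print(181 * 47)               # 8507: a single multiplication
```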