The fragility of encryption

2022-02-132022-02-16 PNRJ computer science, mathematical models encryption, Millennium Prize, P=NP, quantum computing, quantum encryption, quantum mechanics, SAT, security

Feb 13 JDN 2459620

I said in last week’s post that most of the world’s online security rests upon public-key encryption. It’s how we do our shopping, our banking, and paying our taxes.

Yet public-key encryption has an Achilles’ Heel. It relies entirely on the assumption that, even knowing someone’s public key, you can’t possibly figure out what their private key is. Yet obviously the two must be deeply connected: In order for my private key to decrypt all messages that are encrypted using my public key, they must, in a deep sense, contain the same information. There must be a mathematical operation that will translate from one to the other—and that mathematical operation must be invertible.

What we have been relying on to keep public-key encryption secure is the notion of a one-way function: A function that is easy to compute, but hard to invert. A typical example is multiplying two numbers: Multiplication is a basic computing operation that is extremely fast, even for numbers with thousands of digits; but factoring a number into its prime factors is far more difficult, and currently cannot be done in any reasonable amount of time for numbers that are more than a hundred digits long.

“Easy” and “hard” in what sense? The usual criterion is in polynomial time.

Say you have an input that is n bits long—i.e. n digits, when expressed as a binary number, all 0s and 1s. A function that can be computed in time proportional to n is linear time; if it can only be done in time proportional to n², that is quadratic time; n³ would be cubic time. All of these are examples of polynomial time.

But if instead the time required were 2ⁿ, that would be exponential time. 3ⁿand 1.5ⁿ would also be exponential time.

This is significant because of how much faster exponential functions grow relative to polynomial functions, for large values of n. For example, let’s compare n³with2ⁿ. When n=3, the polynomial is actually larger: n³=27 but 2ⁿ=8. At n=10 they are nearly equal: n³=1000 but 2ⁿ=1024. But by n=20, n³is only 8000 while 2ⁿis over 1 million. At n=100, n³is a manageable (for a modern computer) 1 million, while 2ⁿis a staggering 10³⁰; that’s a million trillion trillion.

You may see that there is already something a bit fishy about this: There are lots of different ways to be polynomial and lots of different ways to be exponential. Linear time n is clearly fast, and for many types of problems it seems unlikely one could do any better. But is n¹⁰⁰ time really all that fast? It’s still polynomial. It doesn’t take a large exponential base to make for very fast growth—2 doesn’t seem that big, after all, and when dealing with binary digits it shows up quite naturally. But while 2ⁿ grows very fast even for reasonably-sized n, 1.0000001ⁿ grows slower than most polynomials—even linear!—for quite a long range before eventually becoming very fast growth when n is in the hundreds of millions. Yet it is still exponential.

So, why do we use these categories? Well, computer scientists and mathematicians have discovered that many types of problems that seem different can in fact be translated into one another, so that solving one would solve the other. For instance, you can easily convert between the Boolean satisfiability problem and the subset-sum problem or the travelling salesman problem. These conversions always take time that is a polynomial in n(usually somewhere between linear and quadratic, as it turns out). This has allowed to build complexity classes, classes of problem such that any problem can be converted to any other in polynomial time or better.

Problems that can be solved in polynomial timeare in class P, for polynomial.

Problems that can be checked—but not necessarily solved—in polynomial time are in class NP, which actually stands for “non-deterministic polynomial” (not a great name, to be honest). Given a problem in NP, you may not be able to come up with a valid answer in polynomial time. But if someone gave you an answer, you could tell in polynomial time whether or not that answer was valid.

Boolean satisfiability (often abbreviated SAT) is the paradigmatic NP problem: Given a Boolean formula like (A OR B OR C) AND (¬A OR D OR E) AND (¬D OR ¬C OR B) and so on, it isn’t a simple task to determine if there’s some assignment of the variables A, B, C, D, E that makes it all true. But if someone handed you such an assignment, say (¬A, B, ¬C, D, E), you could easily check that it does in fact satisfy the expression. It turns out that in fact SAT is what’s called NP-complete: Any NP problem can be converted into SAT in polynomial time.

This is important because in order to be useful as an encryption system, we need our one-way function to be in class P (otherwise, we couldn’t compute it quickly). Yet, by definition, this means its inverse must be in class NP.

Thus, simply because it is easy to multiply two numbers, I know for sure that factoring numbers must be in NP: All I have to do to verify that a factorization is correct is multiply the numbers. Since the way to get a public key from a private key is (essentially) to multiply two numbers, this means that getting a private key from a public key is equivalent to factorization—which means it must be in NP.

This would be fine if we knew some problems in NP that could never, ever be solved in polynomial time. We could just pick one of those and make it the basis of our encryption system. Yet in fact, we do not know any such problems—indeed, we are not even certain they exist.

One of the biggest unsolved problems in mathematics is P versus NP, which asks the seemingly-simple question: “Are P and NP really different classes?” It certainly seems like they are—there are problems like multiplying numbers, or even finding out whether a number is prime, that are clearly in P, and there are other problems, like SAT, that are definitely in NP but seem to not be in P. But in fact no one has ever been able to prove that P ≠ NP. Despite decades of attempts, no one has managed it.

To be clear, no one has managed to prove that P = NP, either. (Doing either one would win you a Clay Millennium Pri ze.) But since the conventional wisdom among most mathematicians is that P ≠ NP (99% of experts polled in 2019 agreed), I actually think this possibility has not been as thoroughly considered.

Vague heuristic arguments are often advanced for why P ≠ NP, such as this one by Scott Aaronson: “If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found.”

That really doesn’t follow at all. Doing something in polynomial time is not the same thing as doing it instantly.

Say for instance someone finds an algorithm to solve SAT in n⁶ time. Such an algorithm would conclusively prove P = NP. n⁶; that’s a polynomial, all right. But it’s a big polynomial. The time required to check a SAT solution is linear in the number of terms in the Boolean formula—just check each one, see if it works. But if it turns out we could generate such a solution in time proportional to the sixth power of the number of terms, that would still mean it’s a lot easier to check than it is to solve. A lot easier.

I guess if your notion of a “fundamental gap” rests upon the polynomial/exponential distinction, you could say that’s not “fundamental”. But this is a weird notion to say the least. If n = 1 million can be checked in 1 million processor cycles (that is, milliseconds, or with some overhead, seconds), but only solved in 10³⁶ processor cycles (that is, over a million trillion years), that sounds like a pretty big difference to me.

Even an n²algorithm wouldn’t show there’s no difference. The difference between n and n², is, well, a factor of n. So finding the answer could still take far longer than verifying it. This would be worrisome for encryption, however: Even a million times as long isn’t really that great actually. It means that if something would work in a few seconds for an ordinary computer (the timescale we want for our online shopping and banking), then, say, the Russian government with a supercomputer a thousand times better could spend half an hour on it. That’s… a problem. I guess if breaking our encryption was only feasible for superpower national intelligence agencies, it wouldn’t be a complete disaster. (Indeed, many people suspect that the NSA and FSB have already broken most of our encryption, and I wouldn’t be surprised to learn that’s true.)

But what I really want to say here is that since it may be true that P=NP—we don’t know it isn’t, even if most people strongly suspect as much—we should be trying to find methods of encryption that would remain secure even if that turns out to be the case. (There’s another reason as well: Quantum computers are known to be able to factor numbers in polynomial time—though it may be awhile before they get good enough to do so usefully.)

We do know two such methods, as a matter of fact. There is quantum encryption, which, like most things quantum, is very esoteric and hard to explain. (Maybe I’ll get to that in another post.) It also requires sophisticated, expensive hardware that most people are unlikely to be able to get.

And then there is onetime pad encryption, which is shockingly easy to explain and can be implemented on any home computer.

The problem with substitution ciphers is that you can look for patterns. You can do this because the key ultimately contains only so much information, based on how long it is. If the key contains 100 bits and the message contains 10,000 bits, at some point you’re going to have to repeat some kind of pattern—even if it’s a very complex, sophisticated one like the Enigma machine.

Well, what if the key were as long as the message? What if a 10,000 bit message used a 10,000 bit key? Then you could substitute every single letter for a different symbol each time. What if, on its first occurrence, E is D, but then it’s Q, and then it’s T—and each of these was generated randomly and independently each time? Then it can’t be broken by searching for patterns—because there are no patterns to be found.

Mathematically, it would look like this: Take each bit of the plaintext, and randomly generate another bit for the key. Add the key bit to the plaintext bit (technically you want to use bitwise XOR, but that’s basically adding), and you’ve got the ciphertext bit. At the other end, subtracting out each key bit will give back each plaintext bit. Provided you can generate random numbers efficiently, this will be fast to encrypt and decrypt—but literally impossible to break without the key.

Indeed, onetime-pad encryption is so secure that it is a proven mathematical theorem that there is no way to break it. Even if you had such staggering computing power that you could try every possible key, you wouldn’t even know when you got the right one—because every possible message can be generated from a given ciphertext, using some key. Even if you knew some parts of the message already, you would have no way to figure out any of the rest—because there are no patterns linking the two.

The downside is that you need to somehow send the keys. As I said in last week’s post, if you have a safe way to send the key, why can’t you send the message that way? Well, there is still an advantage, actually, and that’s speed.

If there is a slow, secure way to send information (e.g. deliver it physically by armed courier), and a fast, insecure way (e.g. send it over the Internet), then you can send the keys in advance by the slow, safe way and then send ciphertexts later the fast, risky way. Indeed, this kind of courier-based onetime-pad encryption is how the “r ed phone” (really a fax line) linking the White House to the Kremlin works.

Now, for online banking, we’re not going to be able to use couriers. But here’s something we could do. When you open a bank account, the bank could give you a, say, 128 GB flash drive of onetime-pad keys for you to use in your online banking. You plug that into your computer every time you want to log in, and it grabs the next part of key each time (there are some tricky technical details with synchronizing this that could, in practice, create some risk—but, done right, the risk would be small). If you are sending 10 megabytes of encrypted data each time (and that’s surely enough to encode a bank statement, though they might want to use a format other than PDF), you’ll get over 10,000 uses out of that flash drive. If you’ve been sending a lot of data and your key starts to run low, you can physically show up at the bank branch and get a new one.

Similarly, you could have onetime-pad keys on flash drives (more literal flash keys)given to you by the US government for tax filing, and another from each of your credit card issuers. For online purchases, the sellers would probably need to have their own onetime-pad keys set up with the banks and credit card companies, so that you send the info to VISA encrypted one way and they send it to the seller encrypted another way. Businesses with large sales volume would go through keys very quickly—but then, they can afford to keep buying new flash drives. Since each transaction should only take a few kilobytes, the cost of additional onetime-pad should be small compared to the cost of packing, shipping, and the items themselves. For larger purchases, business could even get in the habit of sending you a free flash key with each purchase so that future purchases are easier.

This would render paywalls very difficult to implement, but good riddance. Cryptocurrency would die, but even better riddance.It would be most inconvenient to deal with things like, well, writing a blog like this; needing to get a physical key from WordPress sounds like quite a hassle. People might actually just tolerate having their blogs hacked on occasion, because… who is going to hack your blog, and who really cares if your blog gets hacked?

Yes, this system is awkward and inconvenient compared to our current system. But unlike our current system, it is provably secure. Right now, it may seem like a remote possibility that someone would find an algorithm to prove P=NP and break encryption. But it could definitely happen, and if it did happen, it could happen quite suddenly. It would be far better to prepare for the worst than be unprepared when it’s too late.

This attack on the postal service must not stand

2020-08-232020-08-16 PNRJ current events, public policy blockchain, COVID-19, Donald Trump, election, FiveThirtyEight, Joe Biden, politics, security, Trump, US Postal Service, USPS, voting

Aug 23 JDN 2459085

Trump has done so many unprecedented and terrible things that we can become numbed by it all, unable to process each new offense because we are already overwhelmed by the others. Perhaps this is a kind of strategy on his part: Keep doing so many outrageous things that we lose our capacity to be outraged. Already it is fair to say that at least half of the 160,000 (and counting) Americans killed by COVID-19 would still be alive if a better President had been in office.

But the attack on the US Postal Service deserves particular attention, because the disruption of mail-in voting during a pandemic could radically alter the results of the election. Indeed, Trump has all but said that this was his goal in defunding the post office.

Trump has long hated the postal service (perhaps because it is a clear example of federal government doing things well and helping people), but his full-scale war upon it started with the appointment of Louis DeJoy as Postmaster General, whose main qualifications appear to be that he has given millions of dollars to Republican campaigns and hates everything the post office stands for. I am quite certain that if there were a Director of Henhouse Affairs, Trump would appoint the Fantastic Mr. Fox.

The White House chief of staff claims that there have been no mail sorting machines decommissioned aside from those that were normally scheduled for replacement. Yet it’s easy to find a number of different sources claiming that there have been far more machines shut down than usual. Postal workers have also spoken out about other kinds of restructuring in the postal system that claim to be about “reducing costs” but seem to be systematically impairing the speed and reliability of service.

Trump claims that mail-in voting is insecure, which has a kernel of truth: Mail-in voting certainly doesn’t have the ironclad security against fraud that in-person voting has. (Unlike in-person voter fraud, mail-in voter fraud actually exists.) But not only is his concern obviously overblown, the USPS has even taken measures to upgrade their security using blockchain encryption. Bitcoin has always been a stupid idea (though a very lucrative one for anyone who bought in early), but blockchain does have some major advantages for voting security, because it is one of the few ways to make a remote system that is simultaneously secure and anonymous. Indeed, I think blockchain encryption (combined with more standard SSL encryption that most web pages already use) might well be a way to implement full-scale online voting—though surely not in time for this election.

The US Postal Service is the most popular federal agency in the United States, followed by the CDC, the Census Bureau, and the Department of Health and Human Services, all of which deservedly have strong bipartisan majority support among voters. It may surprise you to learn that the Department of Homeland Security, the IRS, and the Department of Justice also have strong majority support—though with substantial partisan differences. The most divisive federal agency is ICE, which is beloved by Republicans but hated by Democrats.

Some 91% of Americans approve of the USPS—and why shouldn’t they? It is objectively rated one of the best postal systems in the world—and if anything this isn’t even fair, because most of the other top-rated postal services, particularly Switzerland, the Netherlands, and Singapore, have far smaller areas to cover than the US does. If we restrict ourselves to countries of at least 10 million people and territory of at least 100,000 square kilometers, there are only four postal services rated higher than the US: Japan, Germany, France, and Poland. If we restrict to countries of at least 100 million people, only Japan remains.

Thus, attacking the postal service is clearly not a winning proposition if your goal is to advance the interests of your constituents or even gain more votes. But during a pandemic, mail-in voting is likely to be—and well should be—a very large proportion of all votes. Sabotaging the mail system is a highly effective way to make it much harder to vote in general. And that seems to very much be Trump’s intention.

It is a general pattern that when voting gets harder, Republicans become more likely to win. Liberal voters are more likely to be young adults, poor people, or people of color, all of whom generally have a harder time making it to the polls. This may be less true in this election in particular, because against Trump in particular people who are highly educated and live in cities have been far more likely to vote against Trump—and these are groups of people with particularly high voter turnout. Empirical estimates of how a switch to mail-in voting will affect the election results have been highly ambiguous.

Indeed, perhaps this makes the Republican vote suppression campaign even more sinister: Perhaps they have moved beyond simply trying to tilt the scales in elections and are now willing to actively suppress democracy itself. It sounds radical, if not outright crazy, to assert such a thing—but many of the things that Trump and his Republican lackeys have done would have sounded crazy to me just a few years ago. I can’t believe I’m saying this, but I honestly don’t know that Trump will concede defeat when he loses the election—he may refuse to accept the election results and try to stay in office via some sort of coup d’etat. Why do I think this could happen? Because he said so himself on national television. Vladimir Putin must be so embarrassed; his protege doesn’t even know how to be subtle about his authoritarianism.

FiveThirtyEight is currently giving Biden a 72% chance of victory, which is about 27% too low for my taste. That isn’t much better than the margin Hillary Clinton had four years ago. We can only hope that Trump attacking the most popular agency in our federal government will tilt those odds a little further.

What you can do to protect against credit card fraud

2017-06-182017-06-11 PNRJ financial advice, financial fraud, public policy credit card, fraud, FTC, money, password, safety, security, theft

JDN 2457923

This is the second post in my ongoing series on financial fraud, but it’s also some useful personal financial advice. One of the most common forms of fraud, which I have experienced, and most Americans will experience at some point in their lives, is credit card fraud. The US leads the world in credit card fraud, accounting for 47% of all money stolen by this means. In most countries credit card fraud is declining, but not here.

The good news is that there are several things you can do to reduce both the probability of being victimized and the harm you will suffer if you are. I am of course not the first to make such recommendations; similar lists have been made by the Wall Street Journal, Consumer Reports, and even the FTC itself.

1. The first and simplest is to use fewer credit cards.

It is a good idea to have at least one credit card, because you can build a credit history this way which will help you get larger loans such as car loans and home loans later. The best thing to do is to use it for regular purchases and then pay it off as quickly as you can. The higher the interest rate, the more imperative it is to pay it quickly.

More credit cards means that you have more to keep track of, and more that can be stolen; it also generally means that you have larger total credit limits, which is a mixed blessing at best. You have more liquidity that way, to buy things you need; but you also have more temptation to buy things you don’t actually need, and more risk of losing a great deal should any of your cards be stolen.

2. Buy fewer things online, and always from reputable merchants.

This is one I certainly preach more than I practice; I probably buy as much online now as I do in person. It’s hard to beat the combination of higher convenience, wider selection, and lower prices. But buying online is the most likely way to have your credit card stolen (and it is certainly how mine was stolen a few years ago).

The US is unusual among developed countries because we still mainly use magnetic-strip cards, whereas most countries have switched to the EMV system of chip-based cards that provide more security. But this security measure is really quite overrated; it can’t protect against “card not present” fraud, which is by far the most common. Unless and until you can somehow link up the encrypted chips to your laptop in order to use them to pay online, the chips will do little to protect against fraud.

3. Monitor your bank and credit card statements regularly.

This is something you should be doing anyway. Online statements are available from just about every major bank and credit union, and you can check them at any time, any day. Watching these online statements will help you keep track of your spending, manage your budget, and, yes, protect against fraud, because the sooner you see and report a suspicious transaction the more likely you are to recover the money.

4. Use secure passwords, don’t re-use passwords, and use a secure password manager.

Most people still use remarkably insecure passwords for their online accounts. Hacking your online accounts —especially your online retail accounts, like Amazon—typically means being able to steal your credit cards. As we move into the cyberpunk future, personal security will increasingly be coextensive with online security, and until we find something better, that means good passwords.

Passwords should be long, complicated, and not easily tied to anything about you. To remember them, I highly recommend the following technique: Write a sentence of several words, and then convert the words of that sentence into letters and numbers. For example (obviously don’t use this particular example; the whole point is for passwords to be unique), the sentence “Passwords should be long, complicated, and not easily tied to anything about you.” could become the password “Psblcanet2aau”.

Human long-term memory is encoded in something very much like narrative, so you can make a password much more memorable by making it tell a story. (Literally a story if you like: “Once upon a time, in a land far away, there were seven dwarves who lived in a forest.” could form the password “1uatialfatw7dwliaf”.) If you used the whole words, it would be far too long to fit in most password systems; but by condensing it into letters, you keep it memorable while allowing it to fit. The first letters of English words are not quite random—some letters are much more common than others, for example—but as long as the password is long enough this doesn’t make it substantially easier to guess.

If you have any doubts about the security of your password, do the following: Generate a new password by the same method you used to generate that one, and then try the new password—not the old password—in an entropy checking utility such as https://howsecureismypassword.net/. The utility will tell you approximately how long it would take to guess your password by guessing random characters using current technology. This is really an upper limit—computers will get faster, and by knowing things about you, hackers can improve upon random guessing substantially—but a good password should at least be in the thousands or millions of years, while a very bad password (like the word “password” itself) can literally be in the nanoseconds. (Actually if you play around you can generate passwords that can take far longer, even “12 tredecillion years” and the like, but they are generally too long to actually use.) The reason not to use your actual password is that there is a chance, however remote, that it could be intercepted while you were doing the check. But by checking the method, you can ensure that you are generating passwords in an effective way.

After you’ve generated all these passwords, how do you remember them all? It’s unreasonable to expect you to keep them all in your head. Instead, you can just keep a few of the most important ones in your head, including a master password that you then use for a password manager like LastPass or Keeper. Password managers are frequently rated by sites like PC Mag, CNET, Consumer Affairs, and CSO. Get one that is free and top-rated; there’s no reason to pay when the free ones are just as good, and no excuse for getting any less than the best when the best ones are free.

The idea of a password manager makes some people uncomfortable—aren’t you handing your passwords over to someone else?—so let me explain it a little. You aren’t actually handing over your passwords, first of all; a reputable password manager will actually encrypt your passwords locally, and then only transmit encrypted versions of them to the site that operates the password manager. This means that no one—not the company, not even you—can access those passwords without knowing the master password, so definitely make sure you remember that master password.

In theory, it would be better to just remember different 27-character alphanumeric passwords for each site you use online. This is indisputable. Encryption isn’t perfect, and theoretically someone might be able to recover your passwords even from Keeper or LastPass. But that is astronomically unlikely, and what’s far more likely is that if you don’t use a password manager, you will forget your passwords, or re-use them and get them stolen, or else make them too simple and allow them to be guessed. A password manager allows you to maintain dozens of distinct, very complex passwords, and even update them regularly, all while remembering only one or a few. In practice, this is what provides the best security.

5. Above all, report any suspicious activity immediately.

This one I cannot emphasize enough. If you do nothing else, do this. If you ever have any reason to suspect that your credit card might have been compromised, call your bank immediately. Get them to cancel the card, send you a new one, and check any recent transactions.

Do this if you lose your wallet. Do it if you see something weird on your online statement. Do it if you bought something from an online retailer that seemed a little sketchy. Do it if you just have a weird hunch and something doesn’t feel right. The cost of doing this is a minor inconvenience; the benefit could be thousands of dollars.

If you do report a stolen card, in most cases you won’t be held liable for a penny—the credit card company will have to cover any losses. But if you don’t, you could end up making payments on interest on a balance that a thief ran up on your behalf.

If we all do this, credit card fraud could become a thing of the past. Now, about those interest rates…

How we sold our privacy piecemeal

2017-04-022017-03-26 PNRJ cognitive science, heuristics and biases, market failure, public policy, Uncategorized ACLU, Big Data, data, data broker, Facebook, Google, just-noticeable difference, preferences, privacy, security, TIME

Apr 2, JDN 2457846

The US Senate just narrowly voted to remove restrictions on the sale of user information by Internet Service Providers. Right now, your ISP can basically sell your information to whomever they like without even telling you. The new rule that the Senate struck down would have required them to at least make you sign a form with some fine print on it, which you probably would sign without reading it. So in practical terms maybe it makes no difference.

…or does it? Maybe that’s really the mistake we’ve been making all along.

In cognitive science we have a concept called the just-noticeable difference (JND); it is basically what it sounds like. If you have two stimuli—two colors, say, or sounds of two different pitches—that differ by an amount smaller than the JND, people will not notice it. But if they differ by more than the JND, people will notice. (In practice it’s a bit more complicated than that, as different people have different JND thresholds and even within a person they can vary from case to case based on attention or other factors. But there’s usually a relatively narrow range of JND values, such that anything below that is noticed by no one and anything above that is noticed by almost everyone.)

The JND seems like an intuitively obvious concept—of course you can’t tell the difference between a color of 432.78 nanometers and 432.79 nanometers!—but it actually has profound implications. In particular it undermines the possibility of having truly transitive preferences. If you prefer some colors to others—which most of us do—but you have a nonzero JND in color wavelengths—as we all do—then I can do the following: Find one color you like (for concreteness, say you like blue of 475 nm), and another color you don’t (say green of 510 nm). Let you choose between the blue you like and another blue, 475.01 nm. Will you prefer one to the other? Of course not, the difference is within your JND. So now compare 475.01 nm and 475.02 nm; which do you prefer? Again, you’re indifferent. And I can go on and on this way a few thousand times, until finally I get to 510 nanometers, the green you didn’t like. I have just found a chain of your preferences that is intransitive; you said A = B = C = D… all the way down the line to X = Y = Z… but then at the end you said A > Z. Your preferences aren’t transitive, and therefore aren’t well-defined rational preferences. And you could do the same to me, so neither are mine.

Part of the reason we’ve so willingly given up our privacy in the last generation or so is our paranoid fear of terrorism, which no doubt triggers deep instincts about tribal warfare. Depressingly, the plurality of Americans think that our government has not gone far enough in its obvious overreaches of the Constitution in the name of defending us from a threat that has killed fewer Americans in my lifetime than die from car accidents each month.

But that doesn’t explain why we—and I do mean we, for I am as guilty as most—have so willingly sold our relationships to Facebook and our schedules to Google. Google isn’t promising to save me from the threat of foreign fanatics; they’re merely offering me a more convenient way to plan my activities. Why, then, am I so cavalier about entrusting them with so much personal data?

Well, I didn’t start by giving them my whole life. I created an email account, which I used on occasion. I tried out their calendar app and used it to remind myself when my classes were. And so on, and so forth, until now Google knows almost as much about me as I know about myself.

At each step, it didn’t feel like I was doing anything of significance; perhaps indeed it was below my JND. Each bit of information I was giving didn’t seem important, and perhaps it wasn’t. But all together, our combined information allows Google to make enormous amounts of money without charging most of its users a cent.

The process goes something like this. Imagine someone offering you a penny in exchange for telling them how many times you made left turns last week. You’d probably take it, right? Who cares how many left turns you made last week? But then they offer another penny in exchange for telling them how many miles you drove on Tuesday. And another penny for telling them the average speed you drive during the afternoon. This process continues hundreds of times, until they’ve finally given you say $5.00—and they know exactly where you live, where you work, and where most of your friends live, because all that information was encoded in the list of driving patterns you gave them, piece by piece.

Consider instead how you’d react if someone had offered, “Tell me where you live and work and I’ll give you $5.00.” You’d be pretty suspicious, wouldn’t you? What are they going to do with that information? And $5.00 really isn’t very much money. Maybe there’s a price at which you’d part with that information to a random suspicious stranger—but it’s probably at least $50 or even more like $500, not $5.00. But by asking it in 500 different questions for a penny each, they can obtain that information from you at a bargain price.

If you work out how much money Facebook and Google make from each user, it’s actually pitiful. Facebook has been increasing their revenue lately, but it’s still less than $20 per user per year. The stranger asks, “Tell me who all your friends are, where you live, where you were born, where you work, and what your political views are, and I’ll give you $20.” Do you take that deal? Apparently, we do. Polls find that most Americans are willing to exchange privacy for valuable services, often quite cheaply.

Of course, there isn’t actually an alternative social network that doesn’t sell data and instead just charges a subscription fee. I don’t think this is a fundamentally unfeasible business model, but it hasn’t succeeded so far, and it will have an uphill battle for two reasons.

The first is the obvious one: It would have to compete with Facebook and Google, who already have the enormous advantage of a built-in user base of hundreds of millions of people.

The second one is what this post is about: The social network based on conventional economics rather than selling people’s privacy can’t take advantage of the JND.

I suppose they could try—charge $0.01 per month at first, then after awhile raise it to $0.02, $0.03 and so on until they’re charging $2.00 per month and actually making a profit—but that would be much harder to pull off, and it would provide the least revenue when it is needed most, at the early phase when the up-front costs of establishing a network are highest. Moreover, people would still feel that; it’s a good feature of our monetary system that you can’t break money into small enough denominations to really consistently hide under the JND. But information can be broken down into very tiny pieces indeed. Much of the revenue earned by these corporate giants is actually based upon indexing the keywords of the text we write; we literally sell off our privacy word by word.

What should we do about this? Honestly, I’m not sure. Facebook and Google do in fact provide valuable services, without which we would be worse off. I would be willing to pay them their $20 per year, if I could ensure that they’d stop selling my secrets to advertisers. But as long as their current business model keeps working, they have little incentive to change. There is in fact a huge industry of data brokering, corporations you’ve probably never heard of that make their revenue entirely from selling your secrets.

In a rare moment of actual journalism, TIME ran an article about a year ago arguing that we need new government policy to protect us from this kind of predation of our privacy. But they had little to offer in the way of concrete proposals.

The ACLU does better: They have specific proposals for regulations that should be made to protect our information from the most harmful prying eyes. But as we can see, the current administration has no particular interest in pursuing such policies—if anything they seem to do the opposite.

	Mark James on Marriage and matching
	The most dangerous i… on Why would AI kill us?
	lorentjd on The stochastic superstar …
	lorentjd on The stochastic superstar …
	PNRJ on The stochastic superstar …

Human Economics

Cognitive development economics: Understanding the mind in order to feed the world

security

The fragility of encryption

This attack on the postal service must not stand

What you can do to protect against credit card fraud

How we sold our privacy piecemeal