Classic McEliece and the NIST search for post-quantum crypto

I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to understand it better.

Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

About 5 years ago, the US National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds; mid-2020 saw round 3 and the announcement of the finalists.

One submission caught my eye some time ago: Classic McEliece. Out of the four finalists, it's the only one that is not lattice-based [wikipedia link].

For Public Key Encryption and Key Encapsulation Mechanisms, Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

Robert McEliece at his retirement in 2007

Tiny side-track: you may wonder where the McEliece name comes from. From mathematician Robert McEliece (1942-2019), who developed his cryptosystem in 1978. So it's not just named after him, he designed it himself. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn't get used at the time. He did plenty of other cool things, too. From his Caltech obituary:

He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

Back to lattices: there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as in business and politics. Given how long it takes for crypto schemes to get widely adopted, the business and political aspects are quite relevant, particularly since cyber security is such a hot topic.

Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to this finalists' table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST's expectations of future developments and breakthroughs are what is likely to decide the winner. NIST is not looking for shiny, they are looking for very, very solid in all possible ways.

Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

We can see that Classic McEliece's key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a disadvantage, however I think the small ciphertext more than offsets it.

As we're nearing the end of the NIST process, in my opinion, fast encryption/decryption and small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.

The trouble with group labels. Because.

So Australia’s long-term accepted refugee detainees on Manus and Nauru will not be able to migrate to the US, if they come from a country that’s on Trump’s list.  So anyone from a particular set of countries is classified as bad.  Because Muslim.

Fundamentally of course, these fellow humans should not be detained at all. They have been accepted as genuine refugees by the Australian immigration department.  Their only “crime”, which is not actually a crime by either Australian or international law, was to arrive by boat.  They are now, as a group, abused as deterrent marketing material.  Anyone coming by boat to Australia is classified as bad.  Because boat (although stats actually indicate that it might correlate better with skin colour).

My grandfather, a surgeon at a Berlin hospital, lost his job on the same day that Hitler was elected.  Since not even decrees move that quickly, and the exclusion of Jews from particular professions happened gradually over the 1930s, we can only assume that the hospital management contained some overzealous individuals, who took initiatives that they thought would be looked on favourably by their new superiors. Because Jew.

The Berlin events are a neat example of how a hostile atmosphere enables certain behaviour – of course, if you’d asked the superiors, they didn’t order any such thing and it had nothing to do with them. Sound familiar?

Via jiggly paths, my family came to the UK.  Initially interned.  Because German.

My grandfather, who as I mentioned was an accomplished surgeon, had to re-do all his medical education in Britain – during that period he was separated from his wife and two young daughters (Scotland vs the south of England – a long way in the 1930s).  Because [the qualifications and experience] not British.  Possibly also Because German.

Hassles with running a practice, and being allowed a car.  Because German.

Then allowed those things.  Because Useful.  He was still a German Jew though….

I mentioned this because other refugees at the time would have had the same rules applied, but my grandfather had the advantage of his profession and thus being regarded as useful.  Others would not have had that benefit.  This means that people were being judged “worthy” merely based on their immediate usefulness to the local politics of the day, not for being a fellow human being in need, or any other such consideration.

Group Labels and Value Judgements

Any time we classify some group as more or less worthy than another group or set of groups, trouble will follow – both directly, and indirectly.  Every single time.  And the trouble will affect everybody, it’s not selective to some group(s).  Also every time.  Historically verifiable.

Populists simplify, for political gain.  Weak leaders pander to extreme elements, in the hope that it will keep them in power.

The method of defining groups doesn’t matter, nor does one need to add specific “instructions”. The “Blue Eye experiment” proved this. The nasties are initiated merely through a “simple” value judgement of one group vs another.  That’s all that’s required, and the bad consequences are all implied and in a way predetermined.  Trouble will follow, and it’s not going to end well.

Identity

It's fine to identify as a member of a particular group, or more likely multiple groups, as many factors overlap.  That is part of our identity.  But that is not the same as passing a judgement on the relative value of one of those groups vs another group.

  • Brown vs blue eyes
  • Muslim vs Christian vs Jew
  • White vs black “race”
  • One football club vs another

It really doesn't matter.  Many of these groupings are entirely arbitrary.  The concept of “race” has no scientific basis, we humans are verifiably a single species.

You can make up any arbitrary classification.  It won’t make a difference.  It doesn’t matter.  The outcomes will be the same.

Given what we know about these dynamics, anyone making such value judgements is culpable.  If they’re in a leadership position, I’d suggest that any utterances in that realm indicate either incompetence or criminal intent. Don’t do that.

Don’t accept it.  Don’t ignore it.  Don’t pander to it.  Don’t vote for it.

Speak up and out against it.  For everybody’s sake.  Because I assure you, every single example in history shows that it comes back to bite everyone.  So even if you don’t really feel a connection with people you don’t know, it’ll come and bite you and yours, too.

It’s rearing its ugly head, again, and if we ignore it it will bite us. Badly. Just like any previous time in history.  Guaranteed.

Maximising available dynamic memory on Arduino

I like programming in small spaces, it makes one think efficiently and not be sloppy.

Example: I was looking at an embedded device the other day, and found a complete Tomcat server and X desktop running on a Raspberry Pi type environment.  That kind of stuff makes me shudder; it just looks wrong, and to me it shows that whoever put that together (it was a company product, not just a hobby project) is not really “thinking” embedded.  Yes, we have more powerful embedded CPUs available now, and more memory, but all of that costs power.  So really: code efficiency = power efficiency!

Back to Arduino: a standard Arduino Uno type board (Atmel ATmega328P) has 32K of program storage space, and 2K of dynamic memory (SRAM).  The latter is used for runtime variables and the stack, so you want to be sure you always have enough spare there, particularly when using libraries that need a bit of working space.

A bit of background… while in “big” CPUs the program and variable memory space is generally shared, it's often separated in microcontrollers (the AVR uses a Harvard-style architecture).  This makes sense considering how they are used: the program code is flashed once and doesn't change, while the variable space needs to be written at runtime.

The OneWire library, used to interface with (for instance) Maxim 1-Wire sensors, uses a little lookup table to do its CRC (checksum) calculation.  In this case it's optimising for time, as a table-lookup CRC is much faster than bit-by-bit calculation for each byte.  It does mitigate the memory cost of that table by using the PROGMEM modifier when declaring the static array.  It can be optimised further still, see my post on reducing a CRC lookup table from 256 entries to 32 entries (I submitted a patch and pull request).  What PROGMEM does is tell the compiler to put the array in program storage rather than the dynamic variable space.

Arduino uses the standard GNU C++ compiler to cross-compile for the Atmel chips (that form the core of most Arduino boards), and this compiler “normally” just puts any variable (even if declared const) with the rest of the variables.  That way of organising is perfectly sensible, except on these microcontrollers.  So this is why we need to give the compiler a hint that certain variables can be placed in program memory instead!
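
As a minimal sketch of what that hint looks like in practice (the table name and values here are made up purely for illustration):

#include <avr/pgmspace.h>  // PROGMEM and pgm_read_byte() live here on AVR

// Example-only table: stored in flash (program storage) instead of SRAM.
static const uint8_t demo_table[] PROGMEM = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

uint8_t demo_lookup(uint8_t i)
{
    // Flash is a separate address space on AVR, so it can't be read like a
    // normal array; pgm_read_byte() fetches the value from program memory.
    return pgm_read_byte(demo_table + (i & 0x07));
}

The read has to go through pgm_read_byte() (or one of the other pgm_read_* helpers), which is exactly what the OneWire CRC code shown further below does.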

Now consider this piece of code:

Serial.println("Hello, world!");

Where does the “Hello, world!” string get stored by default?  In the variable space!

I hadn't really thought about that, until I ran short of dynamic memory in the controller of my hot water system.  The issue actually manifested itself by causing incorrect temperature readings, and sometimes crashes.  When looking at it more closely, it became clear that the poor thing was running out of memory and overwriting other stuff, or otherwise just getting stuck.  The controller used is a Freetronics EtherTen, which is basically an Arduino Uno with an Ethernet breakout integrated on the same board.  Using Ethernet with a Uno gets really tight, but it can be very beneficial: the controller is powered through 802.3af PoE, and communicates using UDP packets.

I knew that I wasn't actually using that many variables, but was aware that the Ethernet library does require a fair bit of space (unfortunately the “how much” is not documented, so I was just running on an “as much as possible” basis).  Yet after compiling I was using in the range of 1K of the dynamic variable space, which just looked like too much.  So that's when I started hunting, thought it over in terms of how I know compilers think, and then found the little note near the bottom of the PROGMEM page, explaining that you can use the F() macro on static strings to also place them in program storage. Eureka!

Serial.println(F("Hello, world!"));

When you compile an Arduino sketch, right at the end you get to see how much memory is used. Below is the output on a tiny sketch that only prints the hello world as normal:

Sketch uses 1374 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 202 bytes (9%) of dynamic memory, leaving 1846 bytes for local variables. Maximum is 2048 bytes.

If you use the F() modifier for this example, you get:

Sketch uses 1398 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 188 bytes (9%) of dynamic memory, leaving 1860 bytes for local variables. Maximum is 2048 bytes.

The string is 13 bytes long plus its ‘\0’ terminator, and indeed we see that those 14 bytes have shifted out of dynamic memory into program storage (program storage grows slightly more than 14 bytes, because the F() variant of println pulls in a little extra code). Great score, because we know that that string can't be modified anyway, so we have no reason to have it in dynamic variable space.  It's really a constant.

Armed with that new wisdom I went through my code and changed the serial monitoring strings to use the F() modifier.  That way I “recovered” well over 200 bytes of dynamic variable space, which should provide the Ethernet library with plenty of room!

Now, you may wonder: “why doesn't he just use an #ifdef to remove the serial debugging code?”  I could do that, but then I'd have no way to easily debug the system on this hardware platform, as it wouldn't fit in memory with the debugging code enabled (unless the Ethernet library is not active).  So that wouldn't be a winner.  This approach works, and now with the extra knowledge of F() for string constants I have enough space to have the controller do everything it needs to.
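
For completeness, the two approaches can also be combined; here is a hypothetical sketch (DEBUG_PRINTLN is my own name, not something from the actual controller code) where the monitoring strings live in flash thanks to F(), and could still be compiled out entirely if that ever became necessary:

#define DEBUG 1  // set to 0 to compile the debug output away entirely

#if DEBUG
#define DEBUG_PRINTLN(s) Serial.println(F(s))
#else
#define DEBUG_PRINTLN(s) do {} while (0)
#endif

void setup()
{
    Serial.begin(9600);
    DEBUG_PRINTLN("controller starting");  // string stored in program memory
}

void loop()
{
}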

Finally, while I did an OSDC conference talk on this solar hot water controller (video on this and the PV system in my home) some years ago already, I realised I hadn't actually published the code.  Done now: arjenlentz/HomeResourceMonitor on GitHub.  It contains some hardcoded stuff (mostly #define macros) for our family situation, but the main story is that the automatic control of the boost saves us a lot of money.  And of course, the ability to see what goes on (feedback) is an important factor for gaining understanding, which in turn leads to adjusting behaviour.

Using expired credit cards

Using expired credit/debit cards… surely you can’t do that?  Actually, yes you can. This is how it goes.

First, what can be observed (verified):

On a vendor site that allows you to save your card (hopefully via a token with the payment gateway provider, so it doesn't actually store your card), you enter the card number and expiry date for your then still valid card.  This is necessary because otherwise the site is likely to reject your input.  Makes sense.

Some time later your card expires, but the vendor is still quite happy to keep using the card-on-file for recurring payments.  The payment gateway apparently doesn’t mind, and our banks apparently don’t mind.  I have observed this effect with Suncorp and Bank of Queensland, please let me know if you’ve observed this with other banks.

From this point, let’s play devil’s advocate.

  • What if someone got hold of your card number+expiry date?
    • Well, sites tend to reject dates-in-the-past on input. Excellent.
  • What if that someone just does +4 on the year and then enters it?  Renewed cards tend to have the same number, just with an updated expiry 4 years into the future (the exact number of years may differ between banks).
    • Payment gateway should reject the card, because even though the card+expiry is “ok”, the CVV (Card Verification Value, the magic number on the back of the card) would be different!  Nice theory, but…
      • I've noted that some sites don't ask for the CVV, and thus we must conclude that at least some payment gateways don't require it.  Eek!
        I noticed that the payment gateway for one of these was Westpac.

So what are the underlying issues:

  • Banks let through payments on expired cards.
    • Probably done for client convenience (otherwise you’d be required to update lots of places).
  • Banks issue new cards with the same card number but just an updated year (even the month tends to be the same).
    • Possibly convenience again, but if you need to update your details anyway with some vendor, you might as well update a few more numbers.  I don’t see a valid reason to do this (please comment if you think of something).
  • Some payment gateways don’t require CVV to let through a payment.
    • This is inexcusable and means that the above two habits result in a serious fraud vector.  Payment gateways, credit card companies and banks should not allow this at all, yet somehow it goes through the gateway -> credit card company path without getting rejected.

Security tends to involve multiple layers.  This makes sense, as any one layer can be compromised.  When a security aspect is procedurally compromised, such as not regarding an expired card as expired, or not requiring the oh-so-important CVV number for online payments, it's the vendor side itself undoing its own security.  If that happens with a few layers, as in the above scenario, security is fatally impacted.  A serious failing.

I have little doubt that people have been using this fraud vector for some time, as it's unlikely that I'm the first one to spot it.  In many scenarios, credit card companies tend to essentially weigh security risks against convenience, and refund those affected.  This is what happens with abuse of the PayWave system, and while I don't really like it, I understand why they do it.  But I also think we have to draw the line somewhere.  Not requiring CVV numbers for online transactions is definitely beyond that line.  Possibly renewing cards with the same number is, too.  And as it's the combination of these factors that causes the problem, addressing any one of them could plug the hole – addressing more than one would be great.

Calculating CRC with a tiny (32 entry) lookup-table

I happened to notice that the Arduino OneWire library uses a 256-entry lookup table for its CRC calculations.  I did some research on this topic in 1992-1993, while working on Bulletin Board Systems, FidoNet code and file transfer protocols.  These days memory is not at a premium on most computers, but on Arduino and other microcontroller environments it definitely is, and I happen to know that table-lookup CRC can be done using two 16-entry tables!  So I've dug up my documentation and code from the time, and applied it to the CRC-8 calculation for the Maxim (Dallas Semiconductor) OneWire bus protocol.  I think this provides a neat trade-off between code size and speed.

License

For any of the below code, apply the following license (2-clause “simplified” BSD license), which should suffice for any use.  If you do require another license, just ask.

CRC lookup tables, generator and use implementation by Arjen Lentz <arjen (at) lentz (dot) com (dot) au>

Copyright (C) 1992-2017 Arjen Lentz

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The Code

The below is a drop-in replacement for the relevant chunk of code in OneWire.cpp; the two 16-entry tables I mentioned above are combined into a single 32-entry array.

Copyright (c) 2007, Jim Studt  (original old version - many contributors since)

The latest version of this library may be found at:
  http://www.pjrc.com/teensy/td_libs_OneWire.html

OneWire has been maintained by Paul Stoffregen (paul@pjrc.com) since
January 2010.
[...]

// The 1-Wire CRC scheme is described in Maxim Application Note 27:
// "Understanding and Using Cyclic Redundancy Checks with Maxim iButton Products"
//

#if ONEWIRE_CRC8_TABLE
// Dow-CRC using polynomial X^8 + X^5 + X^4 + X^0
static const uint8_t PROGMEM dscrc_table[] = {
    0x00, 0x5E, 0xBC, 0xE2, 0x61, 0x3F, 0xDD, 0x83,
    0xC2, 0x9C, 0x7E, 0x20, 0xA3, 0xFD, 0x1F, 0x41,
    0x00, 0x9D, 0x23, 0xBE, 0x46, 0xDB, 0x65, 0xF8,
    0x8C, 0x11, 0xAF, 0x32, 0xCA, 0x57, 0xE9, 0x74
};

//
// Compute a Dallas Semiconductor 8 bit CRC. These show up in the ROM
// and the registers.
uint8_t OneWire::crc8(const uint8_t *addr, uint8_t len)
{
    uint8_t crc = 0;

    while (len--) {
        crc = *addr++ ^ crc;  // just re-using crc as intermediate
        crc = pgm_read_byte(dscrc_table + (crc & 0x0f)) ^
              pgm_read_byte(dscrc_table + 16 + ((crc >> 4) & 0x0f));
    }
    return crc;
}
#else
[...]

Generating CRC tables

CRC tables are usually just presented as magic output, but the feedback terms simply represent the results of eight shift/xor operations for all combinations of data and CRC register values.  The table generator routine just uses the old-style method of shifting each bit.

Below is the rough code. For the Dow-CRC, the specified polynomial is X^8 + X^5 + X^4 + X^0. The first term tells us that it's an 8-bit CRC. We work the remaining terms into the designated bits, but from right to left, so we end up with binary 1000 1100 which is 0x8C in hex. This is the POLY value in the code.

// entries 0..15: CRC feedback for the low nibble of a data byte
for (i = 0; i <= 15; i++) {
    crc = i;

    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[i] = crc;
}

// entries 16..31: CRC feedback for the high nibble of a data byte
for (i = 0; i <= 15; i++) {
    crc = i << 4;

    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[16 + i] = crc;
}
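
To sanity-check the result, here is a minimal stand-alone generator sketch (plain C/C++; the names simply follow the fragment above, and the output formatting is my own choice). Running it should reproduce the dscrc_table values shown earlier:

#include <stdio.h>
#include <stdint.h>

#define POLY 0x8C  /* X^8 + X^5 + X^4 + X^0 in reversed (right-shift) form */

int main(void)
{
    uint8_t crctab[32];
    uint8_t crc;
    int i, j;

    /* entries 0..15: low nibble of a data byte */
    for (i = 0; i <= 15; i++) {
        crc = i;
        for (j = 8; j > 0; j--)
            crc = (crc & 1) ? (crc >> 1) ^ POLY : crc >> 1;
        crctab[i] = crc;
    }

    /* entries 16..31: high nibble of a data byte */
    for (i = 0; i <= 15; i++) {
        crc = i << 4;
        for (j = 8; j > 0; j--)
            crc = (crc & 1) ? (crc >> 1) ^ POLY : crc >> 1;
        crctab[16 + i] = crc;
    }

    for (i = 0; i < 32; i++)
        printf("0x%02X%s", crctab[i], ((i + 1) % 8) ? ", " : ",\n");
    return 0;
}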

If you want to generate a 256-entry table for the same:

for (i = 0; i <= 255; i++) {
    crc = i;
    for (j = 8; j > 0; j--) {
        if (crc & 1)
            crc = (crc >> 1) ^ POLY;
        else
            crc >>= 1;
    }
    crctab[i] = crc;
}
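
For comparison, the classic way to use such a 256-entry table is a single lookup per data byte, roughly like this (crc8_fulltable is just a name I picked for illustration; in the OneWire library the table additionally lives in PROGMEM and is read via pgm_read_byte):

#include <stdint.h>

/* crctab holds the 256-entry table filled by the generator loop above */
static uint8_t crctab[256];

uint8_t crc8_fulltable(const uint8_t *addr, uint8_t len)
{
    uint8_t crc = 0;

    while (len--)
        crc = crctab[crc ^ *addr++];  /* one lookup per data byte */
    return crc;
}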

Yes, this trickery basically works for any CRC (any number of bits), you just need to put in the correct polynomial.  However, some CRCs are “upside down”, and thus require a bit of adjustment along the way – the CRC-16 as used by XMODEM and ZMODEM worked that way, but I won't bother you with that old stuff here as it's not really used any more. Some CRCs also use different initialisation values, post-processing and check approaches. If you happen to need info on that, drop me a line, as the info contained in this blog post is just a part of the docu I have here.

CRC – Polynomial Division

To calculate a CRC (Cyclic Redundancy Check) of a block of data, the data bits are considered to be the coefficients of a polynomial. This message (data) polynomial is first multiplied by the highest term in the polynomial (X^8, X^16 or X^32), then divided by the generator polynomial using modulo-two arithmetic. The remainder left after the division is the desired CRC.

CRCs are usually expressed as a polynomial expression such as:

X^16 + X^12 + X^5 + 1

The generator polynomial number is determined by setting bits corresponding to the power terms in the polynomial equation in an (unsigned) integer. We do this backwards and put the highest-order term in the lowest-order bit. The highest term is implied (X^16 here just means it's a 16-bit CRC), the LSB is the X^15 term (0 here), and the X^0 term (shown as + 1) results in the MSB being 1.
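
Working that through for the polynomial above gives the usual reversed constant (written out as a comment block; the POLY16 name is just for illustration):

/*
 * X^16 + X^12 + X^5 + 1, reversed (the X^16 term is implied):
 *   X^0  term -> bit 15 -> 0x8000
 *   X^5  term -> bit 10 -> 0x0400
 *   X^12 term -> bit 3  -> 0x0008
 *                          ------
 *                          0x8408
 */
#define POLY16 0x8408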

Note that the usual hardware shift register implementation shifts bits into the lowest-order term. In our implementation, that means shifting towards the right. Why do we do it this way? Because the calculated CRC must be transmitted across a serial link from highest- to lowest-order term. UARTs transmit characters in order from LSB to MSB. By storing the CRC this way, we hand it to the UART in the order low-byte to high-byte; the UART sends each low-bit to high-bit; and the result is transmission bit by bit from highest- to lowest-order term without requiring any bit shuffling.

Credits

Stuff found on BBSes: numerous files with information and implementations.
Comments in sources on CRC-32 calculation by Gary S. Brown.
All other people hereby credited anonymously, because unfortunately the documents and sources encountered did not contain the name of a specific person or organisation. But, each source provided a piece of the puzzle!

Sources / Docs

My original source code and full docu are also available: crc_agl.zip, as part of my archive of Lentz Software-Development's work in the 1990s.