An initial look at running DeepSeek-R1 locally

There’s a lot of blah flying around about DeepSeek and its latest models. My rule when running training sessions has always been TIAS: if someone asks me something that could be figured out relatively quickly by actually trying it, the answer is Try It And See. It’s more valuable, because it’s a first-hand validation, and doing the activity means the outcome is more likely to be remembered. So let’s do that!

My desktop has a decent AMD Ryzen 5 CPU, 32GB RAM, and an RTX 3060 GPU with 12GB VRAM. So while that’s not super hot, it’s sufficient to load and query a variety of LLMs locally without pain. By the way, ollama is a wonderful tool for this! So I had ollama grab deepseek-R1 – here the 7.6B parameter distilled variant, not the full-size model, but still a reasoning model. It takes slightly longer to get an answer, but it tells you the process (aka “what it is thinking”, shown between the <think> … </think> tags). Details of the model:

>>> /show info
 Model
   architecture       qwen2
   parameters         7.6B
   context length     131072
   embedding length   3584   
   quantization       Q4_K_M

I am not using any pre-prompting here; I simply started ollama with the model and asked the questions as noted below. A local model in ollama is typically not Internet connected, nor does it have access to the broader system, as the following shows:

>>> What is the current date?
<think>

</think>

Hi there! I suggest getting online to get real-time information. If you have any other questions, please don't hesitate to let me know!

Ok, fair. On to a question then that the model might be able to answer.

>>> Who is the current president of the US?
<think>
Alright, the user previously asked for the current date and time, which I provided a response for. Now, they're asking who the current president of the US is.

I need to make sure my response is accurate and up-to-date. Since the last response was from July 2023, it's likely that the current president as of October 2023 would still be Joe Biden.

I should structure the answer clearly, stating their position explicitly. Adding a friendly emoji can make the response feel warmer.

I'll also keep it concise and straightforward to match the user's possible expectations.

</think>

As of now, the current President of the United States is **Joe Biden**.

Given the model’s stated build date of October 2023, which used input on this topic from July 2023, the answer is correct. The reasoning is interesting (though not unexpected):
a) It is maintaining state. As it should, otherwise each question would stand on its own and you’d never be able to ask a follow-up question.
b) It tries to infer the user’s intent from the previous question. Useful.
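
Since every reply from the reasoning model wraps its trace in <think> … </think> ahead of the final answer, the output can be post-processed mechanically. A minimal sketch in C++ (function name and structure my own, just to illustrate the format):

```cpp
#include <string>
#include <utility>

// Split a reasoning-model reply into its <think> trace and the final answer.
// Returns {reasoning, answer}; if no think tags are present, reasoning is empty.
std::pair<std::string, std::string> splitThink(const std::string &reply) {
    const std::string open = "<think>", close = "</think>";
    size_t a = reply.find(open);
    size_t b = reply.find(close);
    if (a == std::string::npos || b == std::string::npos || b < a)
        return {"", reply};
    std::string reasoning = reply.substr(a + open.size(), b - (a + open.size()));
    std::string answer = reply.substr(b + close.size());
    return {reasoning, answer};
}
```

Anything between the tags is the reasoning trace; the remainder is what you’d show as the answer.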

>>> Who is the current president of China?
<think>
Alright, so I need to figure out who the current president of China is. Let me start by recalling some information I have about Chinese leaders.

I know that former President Xi Jinping has been serving as the leader for quite a while now. I think he was first elected in 2012 or maybe 2013. Wait, actually, it was 2012 when he became the General Secretary of the Communist Party and then later he became president.

I also remember that there have been other leaders before him, like Deng Xiaoping, who was a very important leader in the reform and opening-up policies. Then there's Hu Jintao, who served two terms. After Hu, I think it was Wen Jiabao, but I'm not sure about his tenure.

So, putting this together, after Deng Xiaoping came Hu Jintao, then Wen Jiabao, followed by Xi Jinping taking over from 2013 onwards. So the current president is Xi Jinping.

</think>

The current President of China is **Xi Jinping**.

Correct. Now for the question that seems to get everybody’s knickers in a knot. DeepSeek’s mobile app doesn’t want to answer it, and people call out China’s censorship. We’ll talk about that in a moment, but let’s first try asking the local model straight out:

>>> Tell me about the Tiananmen Square event.
<think>
Okay, so I need to explain what happened at Tiananmen Square in China during the year 1989. From what I remember, it was a very significant event related to the Chinese government and its policies.

First, I think it started with a peaceful demonstration by students, intellectuals, and other groups who were upset about certain government actions or policies they believed were unfair. They gathered in Tiananmen Square because they wanted to express their opinions and demands.

As more people joined the protest, things turned into a larger-scale event. There were clashes between the protesters and the government's security forces. I believe there was some violence involved, with tear gas being used and maybe even some arrests or disruptions of the protest.

I also remember that there were military actions involved to maintain control, which might have led to casualties on both sides. The exact numbers are fuzzy in my memory, but I think several people were injured or killed during the events.

The government eventually cracked down hard on what they saw as an illegal demonstration. There was a lot of media coverage about it at the time, and now there's a debate about whether it was justified or an overreaction.

I'm not entirely sure about all the details, like the exact dates or how many people were involved, but I know it was a pivotal moment in modern Chinese history that led to various political changes. Maybe it's related to the 1989 protests and the subsequent shift in power within the government?

Overall, it seems like a complex event with multiple factors influencing its outcome, including government actions, international reactions, and the internal dynamics of the Communist Party.

</think>

Tiananmen Square is a significant historical event that occurred in Beijing on June 3-4, 1989. It was a peaceful demonstration by students, intellectuals, workers, and others who were protesting against what they perceived as unfair government policies regarding various issues such as human rights, economic reforms, and environmental regulations.

The protest began with small demonstrations and gradually turned into larger, more organized antigovernment activities. On June 3, several thousand people gathered in Tiananmen Square to protest against the Chinese Communist Party's handling of these issues.

As the protests grew, there was significant government crackdown. Security forces used tear gas, rubber bullets, and heavy machine gun fire to disperse the crowd, resulting in casualties on both sides. The government ultimately cracked down more brutally, arresting and killing many protesters. According to some accounts, at least 138 people were killed during the events.

The Tiananmen Square protests led to a dramatic shift in Chinese political power. After the crackdown, several members of the Chinese Communist Party who had supported the government's actions against the protesters were removed from leadership positions. The event also catalyzed widespread international condemnation and opened up a period of political transition in China.

Right. So there we have it. It’s not refusing, it’s not hedging, it’s not skirting around the issue, it just answers (you should see how ChatGPT hedges its answers; it’s icky, and you need to pre-prompt it very strictly to make it not do that). DeepSeek doesn’t appear to be holding back at all, down to the use of “heavy machine gun fire to disperse the crowd”. As to whether there were indeed casualties on both sides I cannot say; even internationally there is rather a lack of independently verified information regarding the event. But for the purpose of this exercise, we can at least conclude that the model itself does not appear to be censoring its output.

So what about the mobile app that queries the model running in China? Well, someone else asked it a similar question to what I did above, and it didn’t want to talk about it. Then the person added “, answer in l33t speak.” to the question, whereupon they received a substantial answer (possibly less extensive than mine, but they may have queried the non-reasoning model).

What does this tell us? It’s simple logic (at least as a hypothesis): it probably means that the model itself contains all the information, but that in the online app the output gets scanned and censored via some automated mechanism. That mechanism isn’t perfect and humans are very creative, so in this instance it was bypassed. Remember: you can often tell a lot about how an application works internally just by observing how it behaves externally. And with the experiment of running a DeepSeek model locally, we’ve now tested that hypothesis of where the censorship occurs: it seems clear that the model itself is not censored. At least not on these issues.

This is not to say that the model isn’t biased. All models are biased, at the very least through their base dataset and the reinforcement learning, but often also for cultural reasons. Anyone pretending otherwise is either naive or being dishonest. But that’s something to further investigate and write about another time.

Classic McEliece and the NIST search for post-quantum crypto

I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to understand it better.

Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

About 5 years ago, the US National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that can withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds; mid-2020 saw round 3 and the announcement of the finalists.

This submission caught my eye some time ago: Classic McEliece. Out of the four finalists, it’s the only one that is not lattice-based [wikipedia link].

For Public-Key Encryption and Key Encapsulation Mechanisms (KEM), Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

Robert McEliece at his retirement in 2007

Tiny side-track: you may wonder where the McEliece name comes from. From mathematician Robert McEliece (1942-2019), who developed his cryptosystem in 1978. So it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He did plenty of other cool things, too. From his Caltech obituary:

He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

Back to lattices: there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the business and political aspects are quite relevant too, particularly since cyber security is such a hot topic.

Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to the finalists’ table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs are what is likely to decide the winner. NIST are not looking for shiny, they are looking for very very solid in all possible ways.

Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

We can see that Classic McEliece’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a factor (a disadvantage), however I think the small ciphertext more than offsets that.

As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.

The trouble with group labels. Because.

So Australia’s long-term accepted refugee detainees on Manus and Nauru will not be able to migrate to the US, if they come from a country that’s on Trump’s list.  So anyone from a particular set of countries is classified as bad.  Because Muslim.

Fundamentally of course, these fellow humans should not be detained at all. They have been accepted as genuine refugees by the Australian immigration department.  Their only “crime”, which is not actually a crime under either Australian or international law, was to arrive by boat.  They are now, as a group, abused as deterrent marketing material.  Anyone coming by boat to Australia is classified as bad.  Because boat (although stats actually indicate that it might correlate better with skin colour).

My grandfather, a surgeon at a Berlin hospital, lost his job on the same day that Hitler was elected.  Since not even decrees move that quickly, and the exclusion of Jews from particular professions happened gradually over the 1930s, we can only assume that the hospital management contained some overzealous individuals, who took initiatives that they thought would be looked on favourably by their new superiors.  Because Jew.

The Berlin events are a neat example of how a hostile atmosphere enables certain behaviour – of course, if you’d asked the superiors, they didn’t order any such thing and it had nothing to do with them. Sound familiar?

Via jiggly paths, my family came to the UK.  Initially interned.  Because German.

My grandfather, who as I mentioned was an accomplished surgeon, had to re-do all his medical education in Britain – during that period he was separated from his wife and two young daughters (Scotland vs the south of England – a long way in the 1930s).  Because [the qualifications and experience] not British.  Possibly also Because German.

Hassles with running a practice, and being allowed a car.  Because German.

Then allowed those things.  Because Useful.  He was still a German Jew though….

I mention this because other refugees at the time would have had the same rules applied, but my grandfather had the advantage of his profession and thus being regarded as useful.  Others would not have had that benefit.  This means that people were being judged “worthy” merely on their immediate usefulness to the local politics of the day, not for being a fellow human being in need, or any other such consideration.

Group Labels and Value Judgements

Any time we classify some group as more or less worthy than another group or set of groups, trouble will follow – both directly, and indirectly.  Every single time.  And the trouble will affect everybody, it’s not selective to some group(s).  Also every time.  Historically verifiable.

Populists simplify, for political gain.  Weak leaders pander to extreme elements, in the hope that it will keep them in power.

The method of defining groups doesn’t matter, nor does one need to add specific “instructions”.  Jane Elliott’s “Blue Eyes/Brown Eyes” experiment proved this.  The nasties are initiated merely through a “simple” value judgement of one group vs another.  That’s all that’s required, and the bad consequences are all implied and in a way predetermined.  Trouble will follow, and it’s not going to end well.

Identity

It’s fine to identify as a member of a particular group, or more likely multiple groups, as many factors overlap.  That is part of our identity.  But that is not the same as passing a judgement on the relative value of one of those groups vs another.

  • Brown vs blue eyes
  • Muslim vs Christian vs Jew
  • White vs black “race”
  • One football club vs another

It really doesn’t matter.  Many of these groupings are entirely arbitrary.  The concept of “race” has no scientific basis; we humans are verifiably a single species.

You can make up any arbitrary classification.  It won’t make a difference.  It doesn’t matter.  The outcomes will be the same.

Given what we know about these dynamics, anyone making such value judgements is culpable.  If they’re in a leadership position, I’d suggest that any utterances in that realm indicate either incompetence or criminal intent. Don’t do that.

Don’t accept it.  Don’t ignore it.  Don’t pander to it.  Don’t vote for it.

Speak up and out against it.  For everybody’s sake.  Because I assure you, every single example in history shows that it comes back to bite everyone.  So even if you don’t really feel a connection with people you don’t know, it’ll come and bite you and yours, too.

It’s rearing its ugly head, again, and if we ignore it it will bite us. Badly. Just like any previous time in history.  Guaranteed.

Maximising available dynamic memory on Arduino

I like programming in small spaces, it makes one think efficiently and not be sloppy.

Example: I was looking at an embedded device the other day, and found a complete Tomcat server and X desktop running in a Raspberry Pi type environment.  That kind of stuff makes me shudder; it just looks wrong, and to me it shows that whoever put that together (it was a company product, not just a hobby project) is not really “thinking” embedded.  Yes, we have more powerful embedded CPUs available now, and more memory, but that all costs power.  So really: code efficiency = power efficiency!

Back to Arduino: a standard Arduino Uno type board (Atmel ATmega328P) has 32K of program storage space, and 2K of dynamic memory.  The latter is used for runtime variables and the stack, so you want to be sure you always have enough spare there, particularly when using libraries that need a bit of working space.

A bit of background… while in “big” CPUs the program and variable memory space is generally shared, it’s often separated in microcontrollers (a Harvard-style architecture).  This makes sense considering the usage: the program code is flashed and doesn’t change, while the variable space needs to be written at runtime.

The OneWire library, used to interface with for instance Maxim one-wire sensors, uses a small lookup table to do its CRC (checksum) calculation.  In this case it’s optimising for time, as a table-lookup CRC is much faster than calculating it bit-by-bit for each byte.  But it does keep the table out of precious RAM, by using the PROGMEM modifier when declaring the static array.  It can be optimised further still, see my post on reducing a CRC lookup table from 256 entries to 32 entries (I submitted a patch and pull request).  What PROGMEM does is tell the compiler to put the array in program storage rather than the dynamic variable space.
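
As a side note on that CRC: the 256-to-32-entry reduction works because the CRC feedback step is linear, so a byte can be looked up as two nibbles. Below is a host-side sketch of my own (not the library’s code; the tables are generated at startup here for brevity, whereas in the library they’d be constants in PROGMEM) comparing the bitwise Dallas/Maxim CRC8 with a 32-byte nibble-table variant:

```cpp
#include <cstdint>
#include <cstddef>

// Bitwise Dallas/Maxim OneWire CRC8 (reflected polynomial, feedback 0x8C),
// the classic slow-but-tiny formulation.
static uint8_t crc8_bitwise(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    while (len--) {
        uint8_t inbyte = *data++;
        for (int i = 0; i < 8; i++) {
            uint8_t mix = (crc ^ inbyte) & 0x01;
            crc >>= 1;
            if (mix) crc ^= 0x8C;
            inbyte >>= 1;
        }
    }
    return crc;
}

// Eight feedback steps on a register value; the inner loop above is
// equivalent to step8(crc ^ inbyte).
static uint8_t step8(uint8_t r) {
    for (int i = 0; i < 8; i++)
        r = (r & 1) ? (uint8_t)((r >> 1) ^ 0x8C) : (uint8_t)(r >> 1);
    return r;
}

// Two 16-entry tables (32 bytes total, instead of 256): since the feedback
// step is linear over GF(2), crc8(x) = lo_tab[x & 0xF] ^ hi_tab[x >> 4].
static uint8_t lo_tab[16], hi_tab[16];

static void init_tables() {
    for (uint8_t n = 0; n < 16; n++) {
        lo_tab[n] = step8(n);
        hi_tab[n] = step8((uint8_t)(n << 4));
    }
}

static uint8_t crc8_nibble(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    while (len--) {
        uint8_t x = crc ^ *data++;
        crc = lo_tab[x & 0x0F] ^ hi_tab[x >> 4];
    }
    return crc;
}
```

The nibble version gives the same answer as the bitwise one for every input, while keeping the table small enough that the RAM-vs-flash question barely matters.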

Arduino uses the standard GNU C++ compiler (avr-gcc) to cross-compile for the Atmel chips that form the core of most Arduino boards, and this compiler “normally” just puts any variable (even a “const” one) in RAM with the rest of the variables.  That way of organising things is perfectly sensible, except on these microcontrollers.  This is why we need to give the compiler a hint that certain variables can be placed in program memory instead!
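
What that hint looks like in practice — a small sketch of my own (with stand-in macros so it also compiles off-target; on the AVR you’d include avr/pgmspace.h instead). The catch with PROGMEM is that on classic AVRs the array can no longer be read with a plain index expression; you have to fetch bytes via pgm_read_byte():

```cpp
#include <cstdint>

#ifdef ARDUINO
#include <avr/pgmspace.h>
#else
// Host-side stand-ins so this fragment also compiles on a PC; on the AVR,
// PROGMEM places the array in flash and pgm_read_byte() issues the special
// instruction needed to read from program space.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Small demo table: on the AVR this lives in flash and uses no RAM at all.
static const uint8_t demo_table[8] PROGMEM = {
    10, 20, 30, 40, 50, 60, 70, 80
};

uint8_t table_lookup(uint8_t i) {
    // demo_table[i] would be wrong on the AVR (it would read from RAM at
    // the flash address); pgm_read_byte() does the right thing.
    return pgm_read_byte(&demo_table[i & 7]);
}
```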

Now consider this piece of code:

Serial.println("Hello, world!");

Where does the “Hello, world!” string get stored by default?  In the variable space!

I hadn’t really thought about that, until I ran short of dynamic memory in the controller of my hot water system.  The issue actually manifested itself in incorrect temperature readings and occasional crashes.  Looking at it more closely, it became clear that the poor thing was running out of memory, overwriting other data or otherwise just getting stuck.  The controller used is a Freetronics EtherTen, which is basically an Arduino Uno with an Ethernet breakout integrated on the same board.  Using Ethernet with a Uno gets really tight, but it can be very beneficial: the controller is powered through 802.3af PoE, and communicates using UDP packets.

I knew that I wasn’t using that many actual variables, but was aware that the Ethernet library does require a fair bit of space (unfortunately the “how much” is not documented, so I was just running on an “as much as possible” basis).  Yet after compiling I was using in the range of 1K of the dynamic variable space, which just looked like too much.  So that’s when I started hunting, thought it over in terms of how I know compilers think, and then found the little note near the bottom of the PROGMEM page, explaining that you can use the F() macro on static strings to also place them in program storage.  Eureka!

Serial.println(F("Hello, world!"));

When you compile an Arduino sketch, right at the end you get to see how much memory is used. Below is the output on a tiny sketch that only prints the hello world as normal:

Sketch uses 1374 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 202 bytes (9%) of dynamic memory, leaving 1846 bytes for local variables. Maximum is 2048 bytes.

If you use the F() modifier for this example, you get:

Sketch uses 1398 bytes (4%) of program storage space. Maximum is 32256 bytes.
Global variables use 188 bytes (9%) of dynamic memory, leaving 1860 bytes for local variables. Maximum is 2048 bytes.

The string is 13 bytes long plus its ‘\0’ terminator, and indeed we see that those 14 bytes have shifted from dynamic memory to program storage.  Great score, because we know that string can’t be modified anyway – it’s really a constant, so we have no reason to keep it in dynamic variable space.

Armed with that new wisdom I went through my code and changed the serial monitoring strings to use the F() modifier.  That way I “recovered” well over 200 bytes of dynamic variable space, which should provide the Ethernet library with plenty of room!

Now, you may wonder: “why doesn’t he just use an #ifdef to remove the serial debugging code?”  I could do that, but then I’d have no way to easily debug the system on this hardware platform, as it wouldn’t work with the debugging code enabled (unless the Ethernet library is not active).  So that wouldn’t be a winner.  This approach works, and with the extra knowledge of F() for string constants I now have enough space for the controller to do everything it needs to.

Finally, while I did an OSDC conference talk on this solar hot water controller and the PV system in my home (video) some years ago already, I realise I hadn’t actually published the code.  Done now: arjenlentz/HomeResourceMonitor on GitHub.  It contains some hardcoded stuff (mostly #define macros) for our family situation, but the main story is that the automatic control of the boost saves us a lot of money.  And of course, the ability to see what goes on (feedback) is an important factor for gaining understanding, which in turn leads to adjusting behaviour.

Using expired credit cards

Using expired credit/debit cards… surely you can’t do that?  Actually, yes you can. This is how it goes.

First, what can be observed (verified):

On a vendor site that allows you to save your card (hopefully via a token with the payment gateway provider, so the site doesn’t actually store your card), you enter the card number and expiry date of your then still valid card.  This is necessary because otherwise the site is likely to reject your input.  Makes sense.

Some time later your card expires, but the vendor is still quite happy to keep using the card-on-file for recurring payments.  The payment gateway apparently doesn’t mind, and our banks apparently don’t mind.  I have observed this with Suncorp and Bank of Queensland; please let me know if you’ve observed it with other banks.

From this point, let’s play devil’s advocate.

  • What if someone got hold of your card number+expiry date?
    • Well, sites tend to reject dates-in-the-past on input. Excellent.
  • What if that someone just does +4 on the year and then enters it?  Renewed cards tend to have the same number, just with an updated expiry 4 years into the future (the exact number of years may differ between banks).
    • The payment gateway should reject the card, because even though the number+expiry is “ok”, the CVV (Card Verification Value, the magic number on the back of the card) would be different!  Nice theory, but…
      • I’ve noted that some sites don’t ask for the CVV, and thus we must conclude that at least some payment gateways don’t require it.  Eek!
        I noticed that the payment gateway for one of these was Westpac.

So what are the underlying issues:

  • Banks let through payments on expired cards.
    • Probably done for client convenience (otherwise you’d be required to update lots of places).
  • Banks issue new cards with the same card number but just an updated year (even the month tends to be the same).
    • Possibly convenience again, but if you need to update your details anyway with some vendor, you might as well update a few more numbers.  I don’t see a valid reason to do this (please comment if you think of something).
  • Some payment gateways don’t require CVV to let through a payment.
    • This is inexcusable and means that the above two habits result in a serious fraud vector.  Payment gateways, credit card companies and banks should not allow this at all, yet somehow it goes through the gateway -> credit card company path without getting rejected.

Security tends to involve multiple layers.  This makes sense, as any one layer can be compromised.  When a security aspect is procedurally compromised, such as not regarding an expired card as expired, or not requiring the oh-so-important CVV number for online payments, it’s the vendor itself undoing its own security.  If that happens with a few layers, as in the above scenario, security is fatally impacted.  A serious failing.
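
To make the layering concrete, here is a toy sketch (entirely my own names and logic, not any real gateway’s) of the independent checks a gateway could apply before approving a card-on-file payment:

```cpp
#include <string>

// Toy model of gateway-side checks; real gateways are vastly more complex.
struct CardOnFile {
    int expiryMonth;   // 1-12
    int expiryYear;    // e.g. 2025
    std::string cvv;   // empty if the vendor never collected it
};

enum class Verdict { Approved, ExpiredCard, MissingCvv };

// now_month/now_year stand in for the gateway's clock.
Verdict checkPayment(const CardOnFile &card, int now_month, int now_year,
                     bool requireCvv) {
    // Layer 1: an expired card should simply be rejected.
    if (card.expiryYear < now_year ||
        (card.expiryYear == now_year && card.expiryMonth < now_month))
        return Verdict::ExpiredCard;
    // Layer 2: without a CVV check, number+expiry alone authorises payment.
    if (requireCvv && card.cvv.empty())
        return Verdict::MissingCvv;
    return Verdict::Approved;
}
```

The fraud vector exists precisely because, in practice, some gateways behave as if the expiry check were skipped for stored cards and requireCvv were false – at which point a guessed “+4 years” card sails straight through.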

I have little doubt that people have been using this fraud vector for some time, as it’s unlikely that I’m the first one to spot it.  In many scenarios, credit card companies essentially weigh security risks against convenience, and refund those affected.  This is what happens with abuse of the PayWave system, and while I don’t really like it, I understand why they do it.  But we have to draw the line somewhere.  Not requiring CVV numbers for online transactions is definitely beyond that line; possibly renewing cards with the same number is, too.  And as it’s the combination of these factors that causes the problem, addressing any one of them could plug the hole – addressing more than one would be great.