diff --git a/README.md b/README.md
index 71d484eb81327e8389c8d48e7727e5ab6f4e6fed..b84e35b13a6cfe51a853562a0dad172d802448df 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,15 @@
 # ch32v003fun
 
-An open source development "environment" for the [CH32V003](http://www.wch-ic.com/products/CH32V003.html) with gcc-riscv64 that can be used in Windows, Linux and/or WSL.  The CH32V003 is 10-cent part with a RISC-V EC core that runs at 48MHz, has 16kB of flash and 2kB of RAM and a bunch of peripherals.  It also comes in SOP-8, QFN-20 and SOIC packages.  You can get the datasheet [here](http://www.wch-ic.com/downloads/CH32V003DS0_PDF.html).
+An open source development environment (tooling, headers, examples) for the [CH32V003](http://www.wch-ic.com/products/CH32V003.html) with gcc-riscv64 that can be used on Windows (native), Linux and/or WSL.  The CH32V003 is a 10-cent part with a RISC-V EC core that runs at 48MHz, has 16kB of flash and 2kB of RAM, and a bunch of peripherals.  It also comes in SOP-8, QFN-20 and SOIC packages.  You can get the datasheet [here](http://www.wch-ic.com/downloads/CH32V003DS0_PDF.html).
 
-The goal of this project is to develop the tooling and environment for efficient use of the CH32V003.  This means making it possible to have basic projects that are compact and require no proprietary tooling like their [MounRiver Studio(MRS)](http://www.wch-ic.com/products/www.mounriver.com/).
-
-The existing EVT is massive.  Just to boot the chip at all, it requires ~2kB of support functions and has to do things like software-divides and use a ton of space at startup to use their HAL.  This project specifically avoids the HAL and makes it so you can just use the [TRM](http://www.wch-ic.com/downloads/CH32V003RM_PDF.html).
-
-In contrast, blinky is only 500 bytes with ch32v003fun, boots faster, and significantly simpler overall.
-
-As it currently stands it is still designed to use the WCH-Link to do the SDIO programming.  Though I would like to ALSO support an open source programmer.
+The goal of this project is to develop the tooling and environment for efficient use of the CH32V003: avoid complicated HALs and unleash the hardware! The existing EVT (WCH's official example package) is massive, and its development environment is weighty.  This project specifically avoids the HAL and makes it so you can just use the [TRM](http://www.wch-ic.com/downloads/CH32V003RM_PDF.html). In contrast to the EVT, blinky is only 500 bytes with ch32v003fun, boots faster, and is significantly simpler overall.
 
 ch32v003fun contains:
 1. Examples using ch32v003fun, but not as many as using the HAL.
 2. "minichlink" which uses the WCH CH-Link with libusb, for cross-platform use.
+   * An STM32F042 programmer, the NHC-Link042.
+   * An ESP32S2 programmer, the [esp32s2-funprog](https://github.com/cnlohr/esp32s2-cookbook/tree/master/ch32v003programmer).
+   * The official WCH-LinkE programmer.
 3. An extra copy of libgcc so you can use unusual risc-v build chains, located in the `misc/libgcc.a`.
 4. A folder named "ch32v003fun" containing a single self-contained source file and header file for compling apps for the ch32v003.
 5. On some systems ability to "printf" back through
@@ -46,33 +43,48 @@ You can just try out the `debugprintf` project, or call `SetupDebugPrintf();` an
 
 On WSL or Debian based OSes `apt-get install build-essential libnewlib-dev gcc-riscv64-unknown-elf libusb-1.0-0-dev libudev-dev`
 
+On Arch/Manjaro, run `sudo pacman -S base-devel libusb`, then install `riscv64-unknown-elf-gcc`, `riscv64-unknown-elf-binutils` and `riscv64-unknown-elf-newlib` from the AUR (they take a long time to compile).
+
 On Windows, download and install (to system) this copy of GCC10. https://gnutoolchains.com/risc-v/
 
 On macOS install the RISC-V toolchain with homebrew following the instructions at https://github.com/riscv-software-src/homebrew-riscv
 
-You can use the pre-compiled minichlink or 
+You can use the pre-compiled minichlink, or go to the `minichlink` directory and build it with `make`.
 
-## Running
+## Building and Flashing
 
 ```
 cd examples/blink
-make
+make flash
 ```
 
-In Linux this will "just work" using the `minichlink`.   In Windows if you want to use minichlink, you will need to use Zadig to install WinUSB to the WCH-Link interface 0.
+Note that plain `make` flashes too, because each example's Makefile now lists `all : flash` as its first (default) target.
+
+On Linux this will "just work" (TM) using `minichlink`.
+On Windows, if you want to use minichlink, you will need to use Zadig to install WinUSB to the WCH-Link interface 0.
+The generated `.hex` file is also compatible with the official WCH flashing tool (WCH-LinkUtility).
 
-In Windows, you can use this or you can use the WCH-LinkUtility to flash the built hex file.
 
 ## ESP32S2 Programming
 
-## WCH-Link
+## WCH-Link (E)
 
 It enumerates as 2 interfaces.
 0. the programming interface.  I can't get anything except the propreitary interface to work.
-1. the usb serial port built in.
+1. the built-in USB serial port. For printf/debugging, connect the CH32V003's UART TX (D5) to the WCH-Link's RX and its RX (D6) to the WCH-Link's TX; the default speed is 115200 baud. Both connections are optional, so wire up only what you need.
 
 If you want to mess with the programming code in Windows, you will have to install WinUSB to the interface 0.  Then you can uninstall it in Device Manager under USB Devices.
 
+On Linux, find the serial port with `ls -l /dev/ttyUSB* /dev/ttyACM*` and connect to it with `screen /dev/ttyACM0 115200` (substituting the device you found).
+Adding your user to the right group removes the need for `sudo` to access the serial port:
+* Debian-based:
+  `sudo usermod -a -G dialout $USER`
+* Arch-based:
+  `sudo usermod -a -G uucp $USER`
+
+You'll need to log out and back in for the change to take effect.
+
+
 ## WCH-Link Hardware access in WSL
 To use the WCH-Link in WSL, it is required to "attach" the USB hardware on the Windows side to WSL.  This is achieved using a tool called usbipd.
 
@@ -80,18 +92,21 @@ To use the WCH-Link in WSL, it is required to "attach" the USB hardware on the W
 2. Install the WSL side client:
     * For Debian: 
         `sudo apt-get install usbip hwdata usbutils`
+    * For Arch-based:
+        `sudo pacman -S usbip hwdata usbutils`
     * For Ubuntu (not tested):
 ```
         sudo apt install linux-tools-5.4.0-77-generic linux-tools-virtual hwdata usbutils
         sudo update-alternatives --install /usr/local/bin/usbip usbip `ls /usr/lib/linux-tools/*/usbip | tail -n1` 20
 ```
+
 3. Plug in the WCH-Link to USB
 4. Run Powershell as admin and use the `usbipd list` command to list all connected devices
 5. Find the this device: `1a86:8010  WCH-Link (Interface 0)` and note the busid it is attached to
 6. In powershell, use the command `usbipd wsl attach --busid=<BUSID>` to attach the device at the busid from previous step
 7. You will hear the windows sound for the USB device being removed (and silently attached to WSL instead)
 8. In WSL, you will now be able to run `lsusb` and see that the SCH-Link is attached
-9. For unknown reasons, you must run make under root access in order to connect to the programmer with minichlink.  Recommend running `sudo make` when building and programming projects using WSL
+9. For unknown reasons, you must run make as root to connect to the programmer with minichlink.  We recommend running `sudo make flash` when building and programming projects under WSL.
 Feel free to solve this issue and figure out a way to give the user hardware access to WCH-Link and modify these instructions.
 
 ## minichlink
@@ -123,4 +138,3 @@ You can open a github ticket or join my Discord in the #ch32v003fun channel. htt
  * http://www.wch-ic.com/downloads/QingKeV2_Processor_Manual_PDF.html Processor Manual
  * http://www.wch-ic.com/downloads/CH32V003RM_PDF.html Technical Reference Manual
  * http://www.wch-ic.com/downloads/CH32V003DS0_PDF.html Datasheet
-
diff --git a/examples/GPIO/Makefile b/examples/GPIO/Makefile
index c7ee56af8f0e1a752e2d4d563b88c50cd0424887..399d16eba5d704e10e1ce2a4c0455daf5b06477c 100644
--- a/examples/GPIO/Makefile
+++ b/examples/GPIO/Makefile
@@ -1,3 +1,5 @@
+all : flash
+
 TARGET:=GPIO
 
 CFLAGS+=-DTINYVECTOR
@@ -5,7 +7,6 @@ ADDITIONAL_C_FILES+=wiring.c
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/GPIO_analogRead/Makefile b/examples/GPIO_analogRead/Makefile
index ff995e67083d41ad07499576da97746201271b1f..e47191ea6a2a9efea0eb37aa371f0c276ec5f817 100644
--- a/examples/GPIO_analogRead/Makefile
+++ b/examples/GPIO_analogRead/Makefile
@@ -1,3 +1,5 @@
+all : flash
+
 TARGET:=GPIO_analogRead
 
 CFLAGS+=-DTINYVECTOR
@@ -5,7 +7,6 @@ ADDITIONAL_C_FILES+=wiring.c
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/MCOtest/Makefile b/examples/MCOtest/Makefile
index 2cdcdbfd17e0c0c2fe66ea47c189d5ed61aff536..20c15bfcfb253f54a0d9aee1564603d1993d430b 100644
--- a/examples/MCOtest/Makefile
+++ b/examples/MCOtest/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=MCOtest
 
 CFLAGS+=-DTINYVECTOR -DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/adc_dma_opamp/Makefile b/examples/adc_dma_opamp/Makefile
index 7c4d8f1ba2aad23a3c593486ba15b183ec542028..55bdd4e2feca1e51cda9e4d03f6eb7ce97b0efce 100644
--- a/examples/adc_dma_opamp/Makefile
+++ b/examples/adc_dma_opamp/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=adc_dma_opamp
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/adc_polled/Makefile b/examples/adc_polled/Makefile
index 11afe2ea9535ca5cdd34755e263a2653f54fdc83..42f80910cb00c3b46f4fb603c6d1fbbf959e68bd 100644
--- a/examples/adc_polled/Makefile
+++ b/examples/adc_polled/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=adc_polled
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/blink/Makefile b/examples/blink/Makefile
index 63f7ff0d6546009ea3592fbe7ffc93a099c4ec6b..a2563455e0ff37588df9aa893aa69417c6f25411 100644
--- a/examples/blink/Makefile
+++ b/examples/blink/Makefile
@@ -1,8 +1,9 @@
+all : flash
+
 TARGET:=blink
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/bootload/Makefile b/examples/bootload/Makefile
index 47ff5e054142a9735a36d0aac879ec149f0ecb28..dc85199fa0bbfef6e06393db5dc3213810bc88e5 100644
--- a/examples/bootload/Makefile
+++ b/examples/bootload/Makefile
@@ -1,8 +1,9 @@
+all : flash
+
 TARGET:=bootload
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/debugprintfdemo/Makefile b/examples/debugprintfdemo/Makefile
index 0e3fda68083465e8e241cbc9f15295257bb69104..ce4ca1ce6425a2df098ab8d60d50c5127dd81f1c 100644
--- a/examples/debugprintfdemo/Makefile
+++ b/examples/debugprintfdemo/Makefile
@@ -1,8 +1,9 @@
+all : flash
+
 TARGET:=debugprintfdemo
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/direct_gpio/Makefile b/examples/direct_gpio/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..154479d8fadb1096860652d1b5467bca6a385ec8
--- /dev/null
+++ b/examples/direct_gpio/Makefile
@@ -0,0 +1,10 @@
+all : flash
+
+TARGET:=direct_gpio
+
+include ../../ch32v003fun/ch32v003fun.mk
+
+flash : cv_flash
+clean : cv_clean
+
+
diff --git a/examples/direct_gpio/direct_gpio.c b/examples/direct_gpio/direct_gpio.c
new file mode 100644
index 0000000000000000000000000000000000000000..47d56f9f021f8486939bd7195201274c8f2c8cf7
--- /dev/null
+++ b/examples/direct_gpio/direct_gpio.c
@@ -0,0 +1,52 @@
+// Could be defined here, or in the processor defines.
+#define SYSTEM_CORE_CLOCK 48000000
+
+#include "ch32v003fun.h"
+#include <stdio.h>
+
+#define APB_CLOCK SYSTEM_CORE_CLOCK
+
+uint32_t count;
+
+int main()
+{
+	SystemInit48HSI();
+
+	// Enable GPIOs
+	RCC->APB2PCENR |= RCC_APB2Periph_GPIOC;
+
+	// GPIO C1 Push-Pull
+	GPIOC->CFGLR &= ~(0xf<<(4*1));
+	GPIOC->CFGLR |= (GPIO_Speed_10MHz | GPIO_CNF_OUT_PP)<<(4*1);
+	// GPIO C2 Push-Pull
+	GPIOC->CFGLR &= ~(0xf<<(4*2));
+	GPIOC->CFGLR |= (GPIO_Speed_10MHz | GPIO_CNF_OUT_PP)<<(4*2);
+	// GPIO C4 Push-Pull
+	GPIOC->CFGLR &= ~(0xf<<(4*4));
+	GPIOC->CFGLR |= (GPIO_Speed_10MHz | GPIO_CNF_OUT_PP)<<(4*4);
+
+	while(1)
+	{
+		// Use low bits of BSHR to SET output
+		GPIOC->BSHR = 1<<(1);      // SET GPIO C1
+		GPIOC->BSHR = 1<<(2);      // SET GPIO C2
+
+		// Modify the OUTDR register directly to SET output
+		GPIOC->OUTDR |= 1<<(4);    // SET GPIO C4
+		Delay_Ms( 950 );
+
+
+		// Use upper bits of BSHR to RESET output
+		GPIOC->BSHR = (1<<(16+1)); // RESET GPIO C1
+
+		// Use BCR to RESET output
+		GPIOC->BCR = (1<<(2));     // RESET GPIO C2
+
+		// Modify the OUTDR register directly to CLEAR output
+		GPIOC->OUTDR &= ~(1<<(4)); // CLEAR GPIO C4
+
+		Delay_Ms( 50 );
+		count++;
+	}
+}
+
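The register arithmetic in `direct_gpio.c` can be checked on a desktop machine with a small model. This is a host-side sketch, not code for the chip: the helper names are hypothetical, the mode value `0x1` assumes `GPIO_Speed_10MHz` is 1 and `GPIO_CNF_OUT_PP` is 0 as in `ch32v003fun.h`, and the BSHR model assumes that, as on STM32-style ports, the low half of the register sets pins and the high half resets them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: GPIO_Speed_10MHz (0x1) | GPIO_CNF_OUT_PP (0x0) from ch32v003fun.h. */
#define MODE_OUT_PP_10MHZ 0x1u

/* Each pin owns a 4-bit field in CFGLR: clear the field, then write the mode.
 * Mirrors: CFGLR &= ~(0xf<<(4*pin)); CFGLR |= mode<<(4*pin); */
static uint32_t cfglr_set_pin(uint32_t cfglr, unsigned pin, uint32_t mode)
{
    assert(pin < 8);                      /* CFGLR covers pins 0..7 */
    cfglr &= ~(0xfu << (4 * pin));
    cfglr |= (mode & 0xfu) << (4 * pin);
    return cfglr;
}

/* Model of a BSHR write: bits 0..15 set pins, bits 16..31 reset them.
 * Assumption: if both bits are written for one pin, the set bit wins. */
static uint16_t model_bshr(uint16_t outdr, uint32_t bshr)
{
    uint16_t set = (uint16_t)(bshr & 0xffffu);
    uint16_t reset = (uint16_t)(bshr >> 16);
    return (uint16_t)((outdr & ~reset) | set);
}

/* Model of a BCR write: each 1 bit clears the matching output pin. */
static uint16_t model_bcr(uint16_t outdr, uint32_t bcr)
{
    return (uint16_t)(outdr & ~(bcr & 0xffffu));
}
```

Starting from a CFGLR value of `0x44444444` (assumed to be the reset value, every pin a floating input), configuring C1, C2 and C4 as `main()` does yields `0x44414114`, and walking OUTDR through one loop iteration brings it back to 0.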
diff --git a/examples/external_crystal/Makefile b/examples/external_crystal/Makefile
index a4a4b8dde6be39adbdc72873667d1d37f51ec1f6..8c6cf756acbf90ab9ffd76a19d057bc3b57ef7aa 100644
--- a/examples/external_crystal/Makefile
+++ b/examples/external_crystal/Makefile
@@ -1,8 +1,9 @@
+all : flash
+
 TARGET:=external_crystal
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/i2c_oled/Makefile b/examples/i2c_oled/Makefile
index e2e933a29f60c3cc30163258029f222918fd6224..b9b2d1a02788c3f9d2bd9736a8e4a6004cf9253b 100644
--- a/examples/i2c_oled/Makefile
+++ b/examples/i2c_oled/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=i2c_oled
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/optionbytes/Makefile b/examples/optionbytes/Makefile
index fca3e40bdb68a5e920ab6ca6e26faeda7daa5e84..5e49195c82302282314686ee3f33112907876366 100644
--- a/examples/optionbytes/Makefile
+++ b/examples/optionbytes/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=optionbytes
 
 CFLAGS+=-DTINYVECTOR
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/run_from_ram/Makefile b/examples/run_from_ram/Makefile
index 55db45e79ddf58b06a766e9c54a9d77f9fd4dd56..4743de508184371cd9862e198165dc427a57cb95 100644
--- a/examples/run_from_ram/Makefile
+++ b/examples/run_from_ram/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=run_from_ram
 
 CFLAGS+=-DTINYVECTOR
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/sandbox/Makefile b/examples/sandbox/Makefile
index 13888334a4b45f7b0dcba5290227eced951d50bf..b3f8b71aad1aab9cde12d9a8a36662cac06fc31c 100644
--- a/examples/sandbox/Makefile
+++ b/examples/sandbox/Makefile
@@ -1,8 +1,9 @@
+all : flash
+
 TARGET:=sandbox
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/self_modify_code/Makefile b/examples/self_modify_code/Makefile
index 80c711c36e615c7c2618b51784f02e7c312ec70c..fb872cfbebd7fdc2e789ab79a02182f49e6962ae 100644
--- a/examples/self_modify_code/Makefile
+++ b/examples/self_modify_code/Makefile
@@ -1,8 +1,9 @@
+all : flash
+
 TARGET:=self_modify_code
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/spi_dac/Makefile b/examples/spi_dac/Makefile
index 98e7abf89391863e9082a989e53c584224db3ee5..c7ef196154ca50823da2985083a574352b921fab 100644
--- a/examples/spi_dac/Makefile
+++ b/examples/spi_dac/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=spi_dac
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/systick_irq/Makefile b/examples/systick_irq/Makefile
index e72ba9807b0ce6f59d5e2bf6d9e69ca7472d7108..42b59e6d7559688ded602721055949c6259dfaf4 100644
--- a/examples/systick_irq/Makefile
+++ b/examples/systick_irq/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=systick_irq
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/systick_irq_millis/Makefile b/examples/systick_irq_millis/Makefile
index e2050f99840757d86a3defbca9effb5c2db38d24..641fc4cd43b6b16d9107b38143bb71cc0001d478 100644
--- a/examples/systick_irq_millis/Makefile
+++ b/examples/systick_irq_millis/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=systick_irq_millis
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/tim1_pwm/Makefile b/examples/tim1_pwm/Makefile
index eb2260d357364820cf4de8fcdb21fbdede7645c9..8ece937bcbdd7344cba03fbe07fdf62863e999ff 100644
--- a/examples/tim1_pwm/Makefile
+++ b/examples/tim1_pwm/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=tim1_pwm
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/uartdemo/Makefile b/examples/uartdemo/Makefile
index d77dc2153005732a22f8adb9b3656c65ff9a01fd..0d5fb5db5e9774041496f06fc54a5a8660cb1887 100644
--- a/examples/uartdemo/Makefile
+++ b/examples/uartdemo/Makefile
@@ -1,10 +1,11 @@
+all : flash
+
 TARGET:=uartdemo
 
 CFLAGS+=-DSTDOUT_UART
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
 
diff --git a/examples/ws2812bdemo/Makefile b/examples/ws2812bdemo/Makefile
index 3c823287b90bca972df8988b6aebd343baaa16b5..46bcf178bca2ee077686a2a3fc6c833eecc9343e 100644
--- a/examples/ws2812bdemo/Makefile
+++ b/examples/ws2812bdemo/Makefile
@@ -1,7 +1,8 @@
+all : flash
+
 TARGET:= ws2812bdemo
 
 include ../../ch32v003fun/ch32v003fun.mk
 
-all : flash
 flash : cv_flash
 clean : cv_clean
diff --git a/examples/ws2812bdemo/color_utilities.h b/examples/ws2812bdemo/color_utilities.h
index 3b6f1b74bac1e269eff0a6668f6712e483a5038f..84bf2532ba36d4caa931577a0bc0749cb7ace682 100644
--- a/examples/ws2812bdemo/color_utilities.h
+++ b/examples/ws2812bdemo/color_utilities.h
@@ -136,6 +136,64 @@ static const unsigned char sintable[] = {
 	0x26, 0x28, 0x2a, 0x2d, 0x2f, 0x31, 0x34, 0x36, 0x39, 0x3c, 0x3e, 0x41, 0x44, 0x47, 0x49, 0x4c, 
 	0x4f, 0x52, 0x55, 0x58, 0x5b, 0x5e, 0x61, 0x64, 0x67, 0x6a, 0x6d, 0x70, 0x73, 0x76, 0x79, 0x7d, };
 
+static inline uint32_t FastMultiply( uint32_t big_num, uint32_t small_num ) __attribute__((section(".data")));
+static inline uint32_t FastMultiply( uint32_t big_num, uint32_t small_num )
+{
+	// The CH32V003 is an EC core, so it has no hardware multiply. GCC's
+	// software multiply (__mulsi3) is slow, so I wrote this.
+	//
+	// This basically does this:
+	//	return small_num * big_num;
+	//
+	// Note: This does NOT check for zero up front. It still produces the
+	// correct result, but the loop body executes once even when
+	// small_num is zero.
+	//
+	// Additionally note: instead of the if( multiplicand & 1 ), you can do:
+	//  ret += shifted_big & -(multiplicand & 1);
+	//
+	// BUT! Shockingly! That is slower than an extra branch! The CH32V003
+	//  can branch unbelievably fast.
+	//
+	// This is functionally equivalent and much faster.
+	//
+	// Perf numbers, with small_num set to 180V.
+	//  No multiply:         21.3% CPU Usage
+	//  Assembly below:      42.4% CPU Usage  (1608 bytes for whole program)
+	//  C version:           41.4% CPU Usage  (1600 bytes for whole program)
+	//  Using GCC (__mulsi3) 65.4% CPU Usage  (1652 bytes for whole program)
+	//
+	// The multiply can be done manually:
+	uint32_t ret = 0;
+	uint32_t multiplicand = small_num;
+	uint32_t shifted_big = big_num;
+	do
+	{
+		if( multiplicand & 1 )
+			ret += shifted_big;
+		shifted_big<<=1;
+		multiplicand>>=1;
+	} while( multiplicand );
+	return ret;
+
+	// Which is equivalent to the following assembly (if you were curious)
+/*
+	uint32_t ret = 0;
+	asm volatile( "\n\
+		.option   rvc;\n\
+	1:	andi t0, %[small], 1\n\
+		beqz t0, 2f\n\
+		add %[ret], %[ret], %[big]\n\
+	2:	srli %[small], %[small], 1\n\
+		slli %[big], %[big], 1\n\
+		bnez %[small], 1b\n\
+	" :
+		[ret]"=&r"(ret), [big]"+&r"(big_num), [small]"+&r"(small_num) : :
+		"t0" );
+	return ret;
+*/
+}
+
 static uint32_t TweenHexColors( uint32_t hexa, uint32_t hexb, int tween )
 {
 	if( tween <= 0 ) return hexa;
@@ -148,9 +206,9 @@ static uint32_t TweenHexColors( uint32_t hexa, uint32_t hexb, int tween )
 	int32_t hbb = hexb & 0xff;
 	int32_t hbr = (hexb>>8) & 0xff;
 	int32_t hbg = (hexb>>16) & 0xff;
-	int32_t b = (hab * aamt + hbb * bamt + 128) >> 8;
-	int32_t r = (har * aamt + hbr * bamt + 128) >> 8;
-	int32_t g = (hag * aamt + hbg * bamt + 128) >> 8;
+	int32_t b = (FastMultiply( hab, aamt ) + FastMultiply( hbb, bamt ) + 128) >> 8;
+	int32_t r = (FastMultiply( har, aamt ) + FastMultiply( hbr, bamt ) + 128) >> 8;
+	int32_t g = (FastMultiply( hag, aamt ) + FastMultiply( hbg, bamt ) + 128) >> 8;
 	return b | (r<<8) | (g<<16);
 }
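The claim that the shift-and-add loop in `FastMultiply` matches `small_num * big_num`, wrapping modulo 2^32 like any `uint32_t` multiply, can be checked on a host machine. The sketch below is a desktop-only copy of the C loop under a hypothetical name, without the `.data` section attribute. The exhaustive range it checks (8-bit channel values times weights up to 256) is an assumption chosen to cover colour tweening, since the weights `aamt`/`bamt` are computed outside the lines shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side copy of the FastMultiply loop (hypothetical name, no RAM placement). */
static uint32_t shift_add_multiply(uint32_t big_num, uint32_t small_num)
{
    uint32_t ret = 0;
    do {
        if (small_num & 1u)
            ret += big_num;       /* add the shifted big_num for each set bit */
        big_num <<= 1;            /* next bit position is worth twice as much */
        small_num >>= 1;
    } while (small_num);          /* body runs once even when small_num == 0 */
    return ret;
}

/* Compare against the native multiply for 8-bit values and weights 0..256.
 * Returns 1 if every product matches, 0 otherwise. */
static int check_against_native(void)
{
    for (uint32_t a = 0; a <= 255; a++)
        for (uint32_t w = 0; w <= 256; w++)
            if (shift_add_multiply(a, w) != a * w)
                return 0;
    return 1;
}
```

The loop costs one iteration per bit of `small_num` up to its highest set bit, which is why it pays off when the second operand is a small weight: the branch is cheap on this core, and the loop avoids the call into libgcc's `__mulsi3`.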