From base 10 to any base and back again

26 Sep 2019
Have a comment or question? Please drop me an email or tweet me @ronnieholm.

Following the algorithm outlined for turning a 128 bit sequential GUID into non-sequential 21 digits, we transform it into C#. As the algorithm operates on 160 bit numbers, but the largest intrinsic type, unsigned long, is 64 bits only, we use the arbitrary size BigInteger type. It has support for common operations we'd expect on intrinsic types and System.Math:

using System;
using System.Collections.Generic;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

namespace Bugfree {
    public static class ConvertionHelpers {
        public const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";

        public static string GuidToAccountNumber(Guid guid) {
            var a = ComputeSha1(guid.ToByteArray());
            var b = new BigInteger(a, isUnsigned: true);
            var c = b >> 52;
            var d = IntegerToBase(c, Alphabet.Length);
            return DigitsToEncoding(d, Alphabet);
        }

        public static byte[] ComputeSha1(byte[] bytes) {
            using var sha1 = new SHA1Managed();
            return sha1.ComputeHash(bytes);
        }

        // Code below goes here.
    }
}

Conversion from base 10 to base 36, any base, is surprisingly straightforward. But rather than the encoded number, IntegerToBase returns an array of indices into an alphabet, any alphabet, to divoce value from visual representation. For a base > 10, the position of a single digit in the alphabet may be > 10, a distinction seldomly made in base 10. For later testing, the inverse of any base to base 10 is similarly included:

public static int[] IntegerToBase(BigInteger number, int base_) {
    if (number.Sign != 1)
        throw new ArgumentException($"{nameof(number)} must be positive");
    if (base_ < 2)
        throw new ArgumentException($"{nameof(base_)} must be at least two");

    var alphabetPositions = new List();
    while (number > 0) {
        var quotient = BigInteger.DivRem(number, base_, out var digit);
        alphabetPositions.Add((int)digit);
        number = quotient;
    }
    alphabetPositions.Reverse();
    return alphabetPositions.ToArray();
}

public static BigInteger BaseToInteger(int[] digits, int base_) {
    if (base_ < 2)
        throw new ArgumentException($"{nameof(base_)} must be at least two");

    BigInteger value;
    var j = 0;
    for (var i = digits.Length - 1; i >= 0; i--)
        value += digits[i] * BigInteger.Pow(base_, j++);
    return value;
}

public static string DigitsToEncoding(int[] digits, string alphabet) {
    var sb = new StringBuilder();
    foreach (var d in digits)
        sb.Append(alphabet[d]);
    return sb.ToString();
}

Conversion from base 10 to any base is carried out by repeatedly dividing by base. Running through the divisions, with 123 and base 10, number, quotient, and digit become (123,12,3), (12,1,2), (1,0,1). Then positions are reversed to arrive at indices 1, 2, and 3. For 12 and base 2, the number, quotient, and digit become (12,6,0), (6,3,0), (3,1,1), (1,0,1) for 1, 1, 0, and 0 into the alphabet. For 254 in base 16 we get (254,15,14), (15,0,15) for indices 15 and 14 into the alphabet, yielding 0xfe.

Regular unit tests

As for a couple of examples to show convertions back and forth:

using System;
using Xunit;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Linq;
using System.Collections;
using System.Collections.Generic;
using static Bugfree.ConvertionHelpers;

namespace Bugfree.UnitTests {
    public class ConvertionHelpersTests {
        [Fact]
        public void SequentialGuidToNumberTest() {
            var guid = Guid.Parse("63b0e1db-cd2f-4265-b4f1-eb4b436b6adf");
            var number = GuidToAccountNumber(guid);
            Assert.Equal(21, number.Length);
            Assert.Equal("o7wt7mrm8rj4t0j24twi3", number);
        }

        [Theory]
        [InlineData(3470241, new[] { 3, 4, 15, 3, 10, 1 }, "34f3a1")]
        public void Base10ToHex(int base10, int[] hexDigits, string hexEncoded) {
            var digits = IntegerToBase(new BigInteger(base10), 16);
            var encoded = DigitsToEncoding(digits, ShortAlphabet);
            Assert.Equal(hexDigits, digits);
            Assert.Equal(hexEncoded, encoded);
        }

        [Fact]
        public void BaseToIntegerTest() {
            var n = BaseToInteger(new[] { 1, 2, 3 }, 10);
            Assert.Equal(new BigInteger(123), n);
        }

        // Code below goes here.
    }
}

Property based tests

To supplement the example based tests, we can take advantage of the inverse property of conversions. With \(\textrm{encode}_b\) being base 10 to base \(b\) and \(\textrm{decode}_b\) being base \(b\) to base 10, the following inverse property must always hold: $$n = \textrm{decode}_b(\textrm{encode}_b(n))$$ To the tune of John Hughes' don't write tests, generate them, for arbitrary generated values of \(b\) and \(n\), assertions must hold. A more sophisticated approach would use FsCheck, but for many cases, a simple generator is good enough. From Choosing properties for property-based testing, this property is an example of there and back again.

[Theory]
[ClassData(typeof(RandomNumberBase))]
public void InverseProperty(BigInteger number, int base_) {
    var n = BaseToInteger(IntegerToBase(number, base_), base_);
    Assert.Equal(n, number);
}

public class RandomNumberBase : IEnumerable {
    public IEnumerator GetEnumerator() {
        var rng = new Random();
        for (var i = 0; i < 100; i++) {
            var number = rng.Next(0, int.MaxValue);
            var base_ = rng.Next(2, Math.Min(number, 1000));
            yield return new object[] { new BigInteger(number), base_ };
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

Summary