Category: zig

  • Zig Msgpack

    Zig Msgpack

    MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it’s faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.

    It’s fast, and typical users include neovim and redis! In neovim, it is used as the remote RPC protocol.

    MessagePack spec : Github

    Zig Msgpack : Github

    Theory

    The protocol is laid out as follows: every value starts with a one-byte header (marker) that indicates the type of the data that follows. If the type is not fixed-length, the header is followed by a few bytes giving the length, and then by the data itself.

    Here is a very simple schematic for reading a simple type:

    The types supported:

    Nil, Bool, Int, Float, Str, Bin, Array, Map, Ext (Predefined timestamp types)
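
    The header-byte scheme described above can be sketched in a few lines. This is only an illustration, not code from zig-msgpack: the names Kind, classify and readStr are hypothetical, the marker values are taken from the MessagePack spec, and readStr only handles the two shortest string forms (fixstr and str 8).

    ```zig
    const std = @import("std");

    // Hypothetical names for illustration; not part of zig-msgpack.
    const Kind = enum { nil, bool, int, float, str, bin, array, map, ext, unused };

    // Map a header byte to the family of types it introduces.
    fn classify(header: u8) Kind {
        return switch (header) {
            0x00...0x7f, 0xe0...0xff => .int, // positive / negative fixint
            0x80...0x8f, 0xde, 0xdf => .map, // fixmap, map 16/32
            0x90...0x9f, 0xdc, 0xdd => .array, // fixarray, array 16/32
            0xa0...0xbf, 0xd9...0xdb => .str, // fixstr, str 8/16/32
            0xc0 => .nil,
            0xc1 => .unused, // never used by the format
            0xc2, 0xc3 => .bool, // false, true
            0xc4...0xc6 => .bin, // bin 8/16/32
            0xc7...0xc9, 0xd4...0xd8 => .ext, // ext 8/16/32, fixext 1..16
            0xca, 0xcb => .float, // float 32/64
            0xcc...0xd3 => .int, // uint 8..64, int 8..64
        };
    }

    // Read a short string: header, optional length byte, then the data.
    // Returns null if the input is not a complete fixstr or str 8 value.
    fn readStr(data: []const u8) ?[]const u8 {
        if (data.len == 0) return null;
        const header = data[0];
        if (header >= 0xa0 and header <= 0xbf) {
            // fixstr: the length lives in the low 5 bits of the header
            const len: usize = header & 0x1f;
            if (data.len < 1 + len) return null;
            return data[1 .. 1 + len];
        }
        if (header == 0xd9) {
            // str 8: one extra byte carries the length
            if (data.len < 2) return null;
            const len: usize = data[1];
            if (data.len < 2 + len) return null;
            return data[2 .. 2 + len];
        }
        return null;
    }

    test "header byte decides how the rest is read" {
        try std.testing.expectEqual(Kind.int, classify(0x2a));
        try std.testing.expectEqual(Kind.bool, classify(0xc3));
        const encoded = [_]u8{ 0xa2, 'h', 'i' };
        try std.testing.expectEqualStrings("hi", readStr(&encoded).?);
    }
    ```

    Note how the fixstr case needs only the header plus the string bytes, which is the "one extra byte" mentioned in the introduction.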

    Usage

    Follow the GitHub README.md to add this package to your project!

    zig-msgpack provides a generic function called Pack, which builds the type used for reading and writing.

    Just like this:

    const bufferType = std.io.FixedBufferStream([]u8);
    
    const pack = msgpack.Pack(
        *bufferType,
        *bufferType,
        bufferType.WriteError,
        bufferType.ReadError,
        bufferType.write,
        bufferType.read,
    );

    msgpack.Pack returns a type, which we call pack here.

    We use FixedBufferStream as both the write type and the read type. Pack accepts 6 parameters:

    fn Pack(
        comptime WriteContext: type,
        comptime ReadContext: type,
        comptime WriteError: type,
        comptime ReadError: type,
        comptime writeFn: fn (context: WriteContext, bytes: []const u8) WriteError!usize,
        comptime readFn: fn (context: ReadContext, arr: []u8) ReadError!usize,
    ) type

    It looks very similar to std.io.GenericWriter and std.io.GenericReader; the design of zig-msgpack references them here.

    The resulting type pack contains methods for reading and writing the MessagePack types. We can call pack.init(write_context, read_context) to get an instance.
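
    To make the "context plus function" pattern concrete, here is a minimal sketch of a generic type built the same way as the Pack signature, but with only the write half. ByteSink and Counter are hypothetical names invented for this example and are not part of zig-msgpack or std; the sketch assumes Zig 0.11 or newer (two-argument @memcpy).

    ```zig
    const std = @import("std");

    // Hypothetical miniature of Pack: a type parameterised over a context,
    // an error set, and the function that performs the actual write.
    fn ByteSink(
        comptime Context: type,
        comptime WriteError: type,
        comptime writeFn: fn (context: Context, bytes: []const u8) WriteError!usize,
    ) type {
        return struct {
            context: Context,

            const Self = @This();

            pub fn init(context: Context) Self {
                return .{ .context = context };
            }

            // A MessagePack bool is a single header byte: 0xc2 (false) or 0xc3 (true).
            pub fn writeBool(self: Self, value: bool) WriteError!void {
                const byte = [_]u8{if (value) 0xc3 else 0xc2};
                _ = try writeFn(self.context, &byte);
            }
        };
    }

    // A toy write context: a small fixed buffer with a fill counter.
    const Counter = struct {
        buf: [8]u8 = undefined,
        len: usize = 0,

        const Error = error{NoSpaceLeft};

        fn write(self: *Counter, bytes: []const u8) Error!usize {
            if (self.len + bytes.len > self.buf.len) return error.NoSpaceLeft;
            @memcpy(self.buf[self.len..][0..bytes.len], bytes);
            self.len += bytes.len;
            return bytes.len;
        }
    };

    test "ByteSink writes through the supplied function" {
        var c = Counter{};
        const Sink = ByteSink(*Counter, Counter.Error, Counter.write);
        const sink = Sink.init(&c);
        try sink.writeBool(false);
        try sink.writeBool(true);
        try std.testing.expectEqualSlices(u8, &[_]u8{ 0xc2, 0xc3 }, c.buf[0..c.len]);
    }
    ```

    Because the context type, the error set and the function are all comptime parameters, any writer-like object can be plugged in without an interface or a virtual call.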

    Since Zig has no dedicated string type (a string is just a []const u8), a separate type Str is defined:

    pub const Str = struct {
        str: []const u8,
        pub fn value(self: Str) []const u8 {
            return self.str;
        }
    };
    
    /// this is for encode str in struct
    pub fn wrapStr(str: []const u8) Str {
        return Str{ .str = str };
    }

    For the MessagePack Bin type:

    pub const Bin = struct {
        bin: []u8,
        pub fn value(self: Bin) []u8 {
            return self.bin;
        }
    };
    
    /// this is wrapping for bin
    pub fn wrapBin(bin: []u8) Bin {
        return Bin{ .bin = bin };
    }
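
    Why two wrapper types for what is, in Zig, the same kind of byte slice? On the wire, Str and Bin use different headers. The following sketch encodes the same bytes both ways. It is not code from zig-msgpack: encodeFixStr and encodeBin8 are hypothetical helpers, the marker values come from the MessagePack spec, and Zig 0.11 or newer is assumed.

    ```zig
    const std = @import("std");

    const EncodeError = error{ TooLong, NoSpaceLeft };

    // fixstr: header is 0xa0 | len (len <= 31), followed by the bytes.
    fn encodeFixStr(out: []u8, s: []const u8) EncodeError![]u8 {
        if (s.len > 31) return error.TooLong;
        if (out.len < 1 + s.len) return error.NoSpaceLeft;
        out[0] = 0xa0 | @as(u8, @intCast(s.len));
        @memcpy(out[1..][0..s.len], s);
        return out[0 .. 1 + s.len];
    }

    // bin 8: header 0xc4, one length byte (len <= 255), followed by the bytes.
    fn encodeBin8(out: []u8, b: []const u8) EncodeError![]u8 {
        if (b.len > 255) return error.TooLong;
        if (out.len < 2 + b.len) return error.NoSpaceLeft;
        out[0] = 0xc4;
        out[1] = @intCast(b.len);
        @memcpy(out[2..][0..b.len], b);
        return out[0 .. 2 + b.len];
    }

    test "same bytes, different wire format" {
        var buf: [16]u8 = undefined;
        const as_str = try encodeFixStr(&buf, "hi");
        try std.testing.expectEqualSlices(u8, &[_]u8{ 0xa2, 'h', 'i' }, as_str);
        const as_bin = try encodeBin8(&buf, "hi");
        try std.testing.expectEqualSlices(u8, &[_]u8{ 0xc4, 2, 'h', 'i' }, as_bin);
    }
    ```

    Without a wrapper type, a library receiving a plain byte slice could not tell which of the two encodings the caller wants.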

    For the MessagePack Ext type:

    pub const EXT = struct {
        type: i8,
        data: []u8,
    };
    
    /// t is type, data is data
    pub fn wrapEXT(t: i8, data: []u8) EXT {
        return EXT{
            .type = t,
            .data = data,
        };
    }

    zig-msgpack provides multiple ways to write and read, and it type-checks the parameters strictly, so that no errors other than read failures occur at runtime.

    For example, we write two bool values and read them back:

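    // assumed to run inside a test block, so the testing allocator is available
    const allocator = std.testing.allocator;
    
    var arr: [0xffff_f]u8 = std.mem.zeroes([0xffff_f]u8);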
    var write_buffer = std.io.fixedBufferStream(&arr);
    var read_buffer = std.io.fixedBufferStream(&arr);
    var p = pack.init(
        &write_buffer,
        &read_buffer,
    );
    
    const test_val_1 = false;
    const test_val_2 = true;
    
    try p.write(.{ .bool = test_val_1 });
    try p.write(.{ .bool = test_val_2 });
    
    var val_1 = try p.read(allocator);
    defer val_1.free(allocator);
    
    var val_2 = try p.read(allocator);
    defer val_2.free(allocator);
    
    try std.testing.expect(val_1.bool == test_val_1);
    try std.testing.expect(val_2.bool == test_val_2);

    Overall, the read result is converted into the Payload type, from which we obtain the value. This allows us to read data whose structure is not known in advance.

    For more examples, check out the unit tests of zig-msgpack.

  • Assembly in Zig

    Recently I have wanted to write a kernel in Zig, and for that we naturally need assembly.

    When a computer boots, we need assembly to set things up so that we can enter protected mode.

    Calling assembly from Zig

    Separate File

    We just need to declare the assembly function with the keyword extern.

    As an example, we will use Zig to call assembly that prints the familiar “Hello World!”

    First, Zig has its own support for assembly, but for now only AT&T syntax works well; the more popular Intel syntax is poorly supported.

    Zig currently uses LLVM to parse assembly, and it may get its own assembler in the future.

    Now we configure build.zig for assembly:

    You only need to look at the part marked with /////////////////

    const std = @import("std");
    
    // Although this function looks imperative, note that its job is to
    // declaratively construct a build graph that will be executed by an external
    // runner.
    pub fn build(b: *std.Build) void {
        // Standard target options allows the person running `zig build` to choose
        // what target to build for. Here we do not override the defaults, which
        // means any target is allowed, and the default is native. Other options
        // for restricting supported target set are available.
        const target = b.standardTargetOptions(.{});
    
        // Standard optimization options allow the person running `zig build` to select
        // between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall. Here we do not
        // set a preferred release mode, allowing the user to decide how to optimize.
        const optimize = b.standardOptimizeOption(.{});
    
        const exe = b.addExecutable(.{
            .name = "zig-prac",
            // In this case the main source file is merely a path, however, in more
            // complicated build scripts, this could be a generated file.
            .root_source_file = .{ .path = "src/main.zig" },
            .target = target,
            .optimize = optimize,
        });
    
        /////////////////
        // note this line: it is all that is needed to add the assembly file to the build
        exe.addAssemblyFile("./src/hello.s");
        ////////////////
    
        // This declares intent for the executable to be installed into the
        // standard location when the user invokes the "install" step (the default
        // step when running `zig build`).
        exe.install();
    
        // This *creates* a RunStep in the build graph, to be executed when another
        // step is evaluated that depends on it. The next line below will establish
        // such a dependency.
        const run_cmd = exe.run();
    
        // By making the run step depend on the install step, it will be run from the
        // installation directory rather than directly from within the cache directory.
        // This is not necessary, however, if the application depends on other installed
        // files, this ensures they will be present and in the expected location.
        run_cmd.step.dependOn(b.getInstallStep());
    
        // This allows the user to pass arguments to the application in the build
        // command itself, like this: `zig build run -- arg1 arg2 etc`
        if (b.args) |args| {
            run_cmd.addArgs(args);
        }
    
        // This creates a build step. It will be visible in the `zig build --help` menu,
        // and can be selected like this: `zig build run`
        // This will evaluate the `run` step rather than the default, which is "install".
        const run_step = b.step("run", "Run the app");
        run_step.dependOn(&run_cmd.step);
    
        // Creates a step for unit testing.
        const exe_tests = b.addTest(.{
            .root_source_file = .{ .path = "src/main.zig" },
            .target = target,
            .optimize = optimize,
        });
    
        // Similar to creating the run step earlier, this exposes a `test` step to
        // the `zig build --help` menu, providing a way for the user to request
        // running the unit tests.
        const test_step = b.step("test", "Run unit tests");
        test_step.dependOn(&exe_tests.step);
    }

    Then, we write the main.zig:

    const std = @import("std");
    
    extern fn hello_world(?[*:0]const u8) void;
    
    const msg: [:0]const u8 = "Hello World!\n";
    
    pub fn main() void {
        hello_world(msg.ptr);
    }

    The hello.s:

    Notice: the file name must match *.s!

    .globl hello_world
    # global function, expose the hello_world
    .type hello_world, @function
    # tell compiler, we define a function
    .section .text
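    hello_world:
      push %rbx
      # rbx is callee-saved in the x86-64 System V ABI, so preserve it
      mov $4, %eax
      # 4 is sys_write in the 32-bit int $0x80 interface
      mov $1, %ebx
      # file descriptor 1 is stdout
      mov %edi, %ecx
      # the first parameter arrives in rdi; int $0x80 only sees its low 32 bits (edi)
      # you can learn more in the x86-64 ABI document
      mov $0xd, %edx
      # the length of string
      int $0x80
      # system call
      pop %rbx
      ret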

    We can also write another version of main.zig:

    Notice the builtin @ptrToInt, which casts a pointer to an integer (usize). Since Zig 0.11 it is named @intFromPtr.

    const std = @import("std");
    
    extern fn hello_world(usize) void;
    
    const msg = "Hello World!\n";
    
    pub fn main() void {
        hello_world(@ptrToInt(msg));
    }

    Global Assembly

    When an assembly expression occurs in a container level comptime block, this is global assembly.

    This kind of assembly has different rules than inline assembly. First, volatile is not valid because all global assembly is unconditionally included. Second, there are no inputs, outputs, or clobbers. All global assembly is concatenated verbatim into one long string and assembled together. There are no template substitution rules regarding % as there are in inline assembly expressions.

    const std = @import("std");
    
    comptime {
        asm (
            \\.globl hello_world
            \\.type hello_world, @function
            \\.section .text
            \\hello_world:
            \\  push %rbx
            \\  mov $4, %eax
            \\  mov $1, %ebx
            \\  mov %edi, %ecx
            \\  mov $0xd, %edx
            \\  int $0x80
            \\  pop %rbx
            \\  ret
        );
    }
    
    // extern fn hello_world(?[*:0]const u8) void;
    extern fn hello_world(usize) void;
    
    // const msg: [:0]const u8 = "Hello World!\n";
    const msg = "Hello World!\n";
    
    pub fn main() void {
        // hello_world(msg.ptr);
        hello_world(@ptrToInt(msg));
    }

    Summary

    Then just run zig build run, and you will see “Hello World!” on your screen!

    Calling Zig from assembly

    To be completed.

    Inline assembly in zig

    pub fn syscall1(number: usize, arg1: usize) usize {
        return asm volatile ("syscall"
            : [ret_reference] "={rax}" (-> usize),
            : [number_reference] "{rax}" (number),
              [arg1_reference] "{rdi}" (arg1),
            : "rcx", "r11"
        );
    }

    In this code, syscall1 is a wrapper function around the assembly.

    Inline assembly is an expression which returns a value, the asm keyword begins the expression.

    volatile is an optional modifier that tells Zig this inline assembly expression has side-effects. Without volatile, Zig is allowed to delete the inline assembly code if the result is unused.

    syscall is the assembly instruction.

    After the first colon is the output section. ret_reference is the symbolic name of the output, and "={rax}" is the output constraint string; in this example it means “the result value of this inline assembly expression is whatever is in $rax”. What follows in parentheses, (-> usize), is either a value binding, or -> and then a type. The type is the result type of the inline assembly expression. If it is a value binding, then the %[ret_reference] syntax would be used to refer to the register bound to the value.

    After the second colon is the input section. number_reference and arg1_reference are the symbolic names of the inputs; we could use them in the asm string and they would refer to the operands. Here the registers rax and rdi are loaded with the values of number and arg1.

    After the third colon is the list of clobbers. These declare a set of registers whose values will not be preserved by the execution of this assembly code. These do not include output or input registers. The special clobber value of “memory” means that the assembly writes to arbitrary undeclared memory locations, not only the memory pointed to by a declared indirect output. In this example we list $rcx and $r11 because it is known that the kernel syscall does not preserve these registers.

    Notice: for now, inline asm in Zig is limited. Constraints do work, but only a limited set of them, so avoid them unless you really need them.
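
    As a usage sketch, the same pattern can be extended to three arguments and used to print a string without any assembly file. This is an illustration under stated assumptions, not part of the original post: it targets x86-64 Linux only, syscall3 and SYS_write are names chosen here, and it assumes a Zig version from 0.11 up to 0.14 (where @intFromPtr exists and clobbers are still written as strings).

    ```zig
    const std = @import("std");

    // Same shape as syscall1 above, extended to three arguments.
    // On x86-64 Linux the arguments are passed in rdi, rsi and rdx.
    pub fn syscall3(number: usize, arg1: usize, arg2: usize, arg3: usize) usize {
        return asm volatile ("syscall"
            : [ret] "={rax}" (-> usize),
            : [number] "{rax}" (number),
              [arg1] "{rdi}" (arg1),
              [arg2] "{rsi}" (arg2),
              [arg3] "{rdx}" (arg3),
            : "rcx", "r11", "memory"
        );
    }

    const SYS_write = 1; // write(2) in the 64-bit syscall table
    const msg = "Hello World!\n";

    pub fn main() void {
        // write(1, msg, msg.len); rax returns the number of bytes written
        const written = syscall3(SYS_write, 1, @intFromPtr(msg), msg.len);
        std.debug.assert(written == msg.len);
    }
    ```

    Unlike the int $0x80 version earlier, the syscall instruction uses the 64-bit interface, so the full pointer is passed and the syscall numbers differ (write is 1 instead of 4).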

    Reference

    Zig Assembly