Back to the stupid search function

What I intended to do was to write a file search function in as few characters of C code as possible. I don't think I succeeded, but it was a start.

This time I have the same code in somewhat optimized NASM assembly. Optimized as in using as few instructions as possible while keeping the source readable. It is not optimized for speed; for that, I would guess reading big chunks into memory would be the way to go. (And some other magic I surely don't know anything about.)

This is for me a programming lesson. Maybe I will take a look at it again in a few years and see if I have improved 🙂

Compiled on OSX with nasm-2.12.02.

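section .data
s db "%s: %i: %i: %s",10,0   ; file: start offset: end offset: pattern
o db "usage: ",10,0
section .bss
section .text

global _main
extern _open
extern _strlen
extern _read
extern _printf
extern _close

; note: r12-r15 are callee-saved in the System V ABI and are not preserved here
_main:
push rbp
mov rbp,rsp
sub rsp,48
cmp edi,3                    ; expect argv[1] (file) and argv[2] (pattern)
jne .help
mov rax,qword [rsi+8]
mov qword [rbp-16],rax       ; file name
mov rax,qword [rsi+16]
mov qword [rbp-24],rax       ; pattern
mov rdi,qword [rbp-16]
mov esi,0                    ; O_RDONLY
call _open
mov dword [rbp-28],eax       ; file descriptor
mov rdi,qword [rbp-24]
call _strlen
mov qword [rbp-36],rax       ; pattern length
mov r12,0                    ; pattern bytes matched so far
mov r13,0                    ; bytes read so far
mov r14,qword [rbp-24]
lea r15,[rbp-37]             ; one-byte read buffer

.read:
mov edx,1
mov edi,dword [rbp-28]
mov rsi,r15
call _read
inc r13
cmp eax,0
jle .end                     ; stop at end of file or on a read error
movzx eax,byte [rbp-37]
movsxd rcx,r12d
lea r12d,[rcx+1]             ; assume the byte matches
cmp al, byte [r14+rcx]
mov eax,0                    ; mov leaves the flags from cmp intact
cmovne r12d,eax              ; mismatch: start over
cmp r12,qword [rbp-36]
jne .read
mov r12,0                    ; full match: reset and report it
mov ecx,r13d                 ; end offset
mov edx,r13d
mov rax,qword [rbp-36]
sub edx,eax                  ; start offset
lea rdi,[rel s]
mov rsi,qword [rbp-16]
mov r8,qword [rbp-24]
xor eax,eax                  ; variadic call: no vector registers used
call _printf
jmp .read

.help:
lea rdi,[rel o]
xor eax,eax
call _printf
jmp .done                    ; no file was opened, so skip the close

.end:
mov edi,dword [rbp-28]
call _close

.done:
xor eax,eax                  ; exit status 0
add rsp,48
pop rbp
ret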

The distinctive sound of Atari 2600.

53 54 53 31 3c c8 02 03 00 01 00 02 00 40 1c 00
01 1f 0f 00 02 00 00 00 00 04 01 14 0f 00 05 00
00 00 00 06 01 14 0f 00 07 00 00 00 00 08 01 15
0f 00 0a 00 00 00 00 0c 01 14 0f 00 0e 00 00 00
00 10 01 14 0f 00 12 00 00 00 00 14 01 13 0f 00
16 00 00 00 00 18 01 1d 0f 00 20 01 1f 0f 00 22
00 00 00 00 24 01 14 0f 00 26 01 15 0f 00 28 01
14 0f 00 2a 01 15 0f 00 2c 01 14 0f 00 2e 01 15
0f 00 30 01 14 0f 00 32 00 00 00 00 34 01 13 0f
00 36 00 00 00 00 38 01 1d 0f 00 01 40 01 00 00
00 0f 00 02 40 02 00 0c 0f 0f 00 01 00 00 00 00

This is actually the music data, on a cartridge,
with its player running on 128 bytes of RAM.

Its byte values are the header, the refresh rate,
tempo in bpm, number of pages and patterns, page
index, pattern number, length, number of used rows,
and the row data of each pattern: number, waveform,
frequency, volume and trigger.

This is the Stella Tracker Song format,
used in Lowres, Solskogen 2014, Oldskool Demo Compo.
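
To make the field list concrete, here is a minimal C sketch that reads the first few values from the dump above. The layout is my assumption, not a specification: a 4-byte header followed by one byte each for refresh rate, tempo, page count and pattern count. The names `sts_header` and `sts_parse_header` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 4 header bytes, then one byte per field. */
struct sts_header {
    char    magic[5];     /* 4 header bytes plus a terminating NUL */
    uint8_t refresh_hz;   /* refresh rate */
    uint8_t tempo_bpm;    /* tempo in bpm */
    uint8_t pages;        /* number of pages */
    uint8_t patterns;     /* number of patterns */
};

/* Returns 0 on success, -1 if the buffer is too short for the assumed layout. */
int sts_parse_header(const uint8_t *data, size_t len, struct sts_header *out)
{
    assert(data != NULL && out != NULL);
    if (len < 8)
        return -1;
    memcpy(out->magic, data, 4);
    out->magic[4] = '\0';
    out->refresh_hz = data[4];
    out->tempo_bpm  = data[5];
    out->pages      = data[6];
    out->patterns   = data[7];
    return 0;
}
```

The first four bytes of the dump, 53 54 53 31, are the ASCII characters "STS1". If the assumed layout holds, the next bytes would read as a refresh rate of 60, a tempo of 200 bpm, 2 pages and 3 patterns.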

If you want to check it out, go to YouTube.


LIFL, short for Linux Filesystem Logger, is a rewritten version of loggedfs, a FUSE-based daemon that logs filesystem activity.


LIFL logs every file system call made inside a given directory path, in detail.

To describe the project, I will summarise the functionality of the program in a list:

  • Fully configurable with On/Off switches for performance.
  • Remote logging with MySQL.
  • Logging of lstat, access, readlink, readdir, mknod, mkdir, unlink, rmdir, symlink, rename, link, chmod, chown, truncate, utimens, open, read, write, statfs, fallocate, setxattr, getxattr, listxattr and removexattr.
  • SQL provides flexibility to represent the log data.
  • Logs system error messages.
  • Monitoring of write calls, logging a copy of the write buffer, with options to filter on the command, effective user id and size of the write calls.
  • Logging of time, hostname, user id, group id, username and groupname.
  • Logging of TTY, login time and remote host.
  • Logging command, arguments, process id, parent process command and parent process id.
  • Logging of file, path, file protection, file owner and group.

    This project needs extensive testing. I think I will set up a honeypot to see if the program behaves as expected, and to see if I am able to find some valuable results.

    (If so I will come back with another post about that.)

    See Github page.

    Tomboy data extractor

    When I changed platform from Linux to Mac, some of my Linux software was unavailable. The most important app I lost was Tomboy Notes. It had all the things I needed to write down. After looking for a working version on OSX, I decided I had to export. Both the Linux and the Windows ports were lacking working functionality to export to a more readable format. I tried the different plugins for a way to export, but I could not get them working. I tried to export the data with libXML, but libXML did not recognize the data as valid XML.

    Ok. I needed the data and I wanted cleartext files, so I made my own Tomboy XML data extractor.

    This utility takes data from the Tomboy application data folder and writes it out as cleartext files. If there are multiple revisions of a Tomboy note, only the newest is stored.
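
The core of such an extraction is dropping the markup and keeping the text. The following is only a minimal sketch of that idea, not the code of the utility itself: the function name `strip_tags` is hypothetical, it does not decode XML entities such as `&amp;`, and it does not handle revisions.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst while skipping everything between '<' and '>'.
   Writes at most dst_size - 1 characters plus a terminating NUL.
   Returns the number of characters written (excluding the NUL). */
size_t strip_tags(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);
    size_t n = 0;
    int in_tag = 0;
    for (; *src != '\0' && n + 1 < dst_size; src++) {
        if (*src == '<')
            in_tag = 1;                 /* a tag starts */
        else if (*src == '>' && in_tag)
            in_tag = 0;                 /* the tag ends */
        else if (!in_tag)
            dst[n++] = *src;            /* plain text: keep it */
    }
    dst[n] = '\0';
    return n;
}
```

Given the input `<note-content>Hello <bold>world</bold></note-content>`, the sketch leaves `Hello world` in the output buffer.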

    This process of trying to get the data into cleartext was annoying. I guess others might have had the same problem, so I wrote this post and made the utility public.

    You can download the source code here.

    Practical laziness


    I made a mess of my photos: about 9000 originals, stored as many as three times each in the Pictures folder on my Mac.
    So I started to write my own code for finding duplicate files.
    This code inspects the files' contents rather than matching on filename and size. It ran unbelievably fast: 9000 photos scanned and compared in only a few seconds.
    Job done. 🙂
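
Comparing by content can be sketched as below. This is a minimal illustration, not the code from the download: the function name `files_identical` is hypothetical, and it only compares one pair of files. A tool handling thousands of files would presumably group candidates first (by size or by a hash, for example) instead of comparing every pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);
    FILE *a = fopen(path_a, "rb");
    if (a == NULL)
        return -1;
    FILE *b = fopen(path_b, "rb");
    if (b == NULL) {
        fclose(a);
        return -1;
    }

    char buf_a[4096], buf_b[4096];
    int same = 1;
    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, a);
        size_t nb = fread(buf_b, 1, sizeof buf_b, b);
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            same = 0;                   /* different length or bytes */
            break;
        }
        if (na == 0)
            break;                      /* both files are exhausted */
    }
    fclose(a);
    fclose(b);
    return same;
}
```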

    Compilation should be straightforward.
    You will find the files here.

    Be careful with filenames, platforms, and distribution.

    When a Subversion client does not filter characters in filenames properly, it may result in:

    I. The client stops working while committing, resulting in a broken checkout of the repository. (And a half-done commit leaves you without an overview of the files.)

    II. Files that cannot be distributed are written to the repository, resulting in a broken repository. (And it is hard to purge files from a Subversion repository.)

    III. A broken repository, in turn, results in broken client checkouts. (And you have to clean up the client too.)

    If you have gone through all these awful steps, you don't want to do them again.

    If you need to clean up the filenames, I wrote a small program that takes care of all the restricted filename characters and names on Windows, OSX and Linux.

    Please feel free to grab it; it is under the GNU GPL.
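
To give an idea of what "restricted" means here, the following is a minimal sketch and not the program itself. It replaces the characters Windows rejects in file names and checks for the reserved Windows device names. The function names are hypothetical, and the rules are not complete (trailing dots and spaces, for instance, are not handled).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Replaces control characters and < > : " / \ | ? * with '_', in place.
   '/' is also the path separator on Linux and OSX. */
void sanitize_filename(char *name)
{
    assert(name != NULL);
    for (char *p = name; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c < 0x20 || strchr("<>:\"/\\|?*", c) != NULL)
            *p = '_';
    }
}

/* Returns 1 if the part before the first '.' is a reserved Windows
   device name (case-insensitive), otherwise 0. */
int is_reserved_windows_name(const char *name)
{
    static const char *const reserved[] = {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };
    assert(name != NULL);
    size_t base = strcspn(name, ".");   /* length before the first dot */
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (strlen(reserved[i]) != base)
            continue;
        size_t k = 0;
        while (k < base && toupper((unsigned char)name[k]) == reserved[i][k])
            k++;
        if (k == base)
            return 1;
    }
    return 0;
}
```

With this sketch, `a:b?.txt` becomes `a_b_.txt`, and `nul.txt` is flagged as reserved while `console.txt` is not.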


    Datastorm revisited

    In light of the demolition of the venue where Datastorm was held, and the end of Datastorm itself, here is a gallery from 2011.

    Optimal file search function?

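    off_t search_file(int f,off_t *o,char *m){
        int h=0,l=strlen(m);
        ssize_t b=1;
        char c;
        /* seek to *o, then read byte by byte until m is matched or read fails */
        for(lseek(f,*o,SEEK_SET);h<l&&(b=read(f,&c,1))>0;(*o)++)
            if(c==m[h])h++;else h=0;
        return h==l?*o-1:-1; /* offset of the last matched character, or -1 */
    }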

    The function takes a file descriptor f, repositions it to the offset o, and returns the offset of the last character of the first occurrence of the string m.
    I think it is a quite fast and minimal implementation, free to grab if you need one.
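
One caveat with a single counter that resets to zero on a mismatch: it can miss a match that starts inside a failed partial match, for example "ab" in "aab". A way around that is the Knuth-Morris-Pratt failure table. The sketch below is not the function above; it is a separate illustration that works on a memory buffer to keep it short, and the name `kmp_find_end` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns the index of the last byte of the first occurrence of pat in
   text[0..n), or -1 if there is none, pat is empty, or malloc fails. */
long kmp_find_end(const char *text, size_t n, const char *pat)
{
    assert(text != NULL && pat != NULL);
    size_t m = strlen(pat);
    if (m == 0)
        return -1;
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[i]: length of the longest proper prefix of pat[0..i]
       that is also a suffix of it. */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    long result = -1;
    for (size_t i = 0, h = 0; i < n; i++) {
        while (h > 0 && text[i] != pat[h])
            h = fail[h - 1];            /* fall back instead of restarting */
        if (text[i] == pat[h])
            h++;
        if (h == m) {
            result = (long)i;
            break;
        }
    }
    free(fail);
    return result;
}
```

Because the state is just the counter `h` and the table, the same loop can be fed one byte at a time from `read`, so it fits the streaming style of the original function.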

    UTF-8 and standard C library

    Take a look at this:

    binf@home:~/codesnippets/C$ cat utf8.c
    #include <stdio.h>
    #include <string.h>
    int main(void)
    {
        printf("%s : %i\n","a",(int)strlen("a"));
        printf("%s : %i\n","æ",(int)strlen("æ"));
        return 0;
    }

    And compile:

    binf@home:~/codesnippets/C$ gcc -o utf8 utf8.c

    Output :

    binf@home:~/codesnippets/C$ ./utf8
     a : 1
     æ : 2

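    Yes, this is Unicode at work: strlen counts bytes, not characters, and in UTF-8 "æ" is encoded as two bytes. If you are used to working in an ISO-8859-1 environment, keep in mind that much of the standard C library and many system calls were designed with single-byte encodings such as ASCII in mind.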

    One example straight to the point; on Linux, the dirent structure is defined as follows:

    struct dirent {
                   ino_t          d_ino;
                   off_t          d_off;
                   unsigned short d_reclen;
                   unsigned char  d_type;
                   char           d_name[256]; /* filename */
    };

    In a UTF-8 environment, with its variable-width encoding, each character uses one to four bytes of the 256 bytes the system reserves for a file name. With the two-byte character from my example, you are limited to about half of that: roughly 127 characters once the terminating null byte is accounted for.
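
If you need the number of characters rather than bytes, one common trick is to count only the bytes that are not UTF-8 continuation bytes. A minimal sketch, assuming the input is valid UTF-8; the name `utf8_strlen` is hypothetical and not part of the standard library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a UTF-8 string by skipping continuation bytes,
   which have the bit pattern 10xxxxxx. Assumes s is valid UTF-8. */
size_t utf8_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the two bytes of "æ" (0xC3 0xA6), strlen reports 2 while this function reports 1. Note that it counts code points, which is not always the same as what a reader perceives as one character (combining marks, for example).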

    binf@home:~/codesnippets/C$ uname -rvm
    3.8.0-35-generic #52~precise1-Ubuntu SMP Thu Jan 30 17:24:40 UTC 2014 x86_64