1

I'm writing a custom compression algorithm in C that reads in ascii characters, removes the 1st bit from each of them (because it will always be 0), and then sticks it in a new file. It makes the input 7/8 the original size. Here's the compression:

#include <stdio.h>

int main()
{
  int i = 1;
  int c;
  unsigned short value = 0;

  while((c = getchar()) != EOF)
  {
    value = (c << i) | value;
    if(i != 1) putchar(value >> 8);
    value = value << 8;
    i++;
    if(i == 9) i = 1;
  }
  if(i != 1) putchar(value >> 8);
}

and here's the decompression:

#include <stdio.h>

int main() {

  int i = 1;
  int c;
  unsigned char value = 0;

  while((c = getchar()) != EOF) {
    value = (c >> i) | value;
    putchar(value);

    value = (c << (8-i)) | 0;
    value = value >> 1;

    if(++i == 8) {
      putchar(value);
      i = 1;
    }
  }
}

If I compress something like "ororororor" (without the quotes), and then decompress it, then the output is "orororor.r", where the "." is 7F in hex. However, if I give it "ororororrr" then it outputs "ororororrr" which is correct. It only fails with certain inputs, but I can't find a pattern to when it messes up.

Sorry that this is not in functions. The way I've been using it is in UNIX with these commands:

echo -n your input here > data
gcc compress.c
./a.out < data > inp
gcc decompress.c
./a.out < inp > out
hexdump -C out

2 Answers 2

1

A problem is surely that you do not zero value when decompressing.

This has no effect (the extra bits get rotated out) until you get at the end of file.

Try:

 if(++i == 8) {
     putchar(value);
     i = 1;
     value = 0; // Clean up
 }

Test case (modified the above program to only zero value if there was a command line argument):

  echo "xxxxxxxRxx" | ./comp | ./decomp OK
  xxxxxxxRxx
  echo "xxxxxxxRxx" | ./comp | ./decomp
  xxxxxxxRzx
Sign up to request clarification or add additional context in comments.

1 Comment

That fixed it! I thought I was zeroing it with the value = c << 8-i | 0 because that ORs it with 0, but I guess that wasn't clearing it correctly for the 8th character.
1

Are you accounting for the situations where the input won't fall on even 8-bit boundaries? Kinda like the problem base 64 encoding has when it does the same kind of thing....

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.