I have several large files (3-6 GB) of '1' and '0' characters in ASCII, and I would like to convert them to plain binary files. Newlines are not important and should be discarded.

test.bin below is 568 bytes as text; I would like a 560-bit (70-byte) file.

0111000110000000101000100000100100011111010010101000001001010000111000
1001100011010100001101110000100010000010000000000001011000010011111100
0100001000010000010000010111011101011111000111111000111001100010100011
0011101000100001111111000001111110111111101101100000011000010101100001
0000000110110001000000000001000011110100000101101000001000010001010011
1101101111010101011110001110000010011001100101101101000111111101110101
1000001100101101010111110111110101100000000011001000100000000011001110
0101101001110010011110000100101001001111010011100100001001111111100110
...

I've found several solutions going the other way, converting a binary file into ASCII but not the other way.

Ideally I'm looking for a simple Linux / bash solution, but I could live with a Python solution.

=================== Edit ==================

To make this less confusing, consider converting a file made of any two ASCII characters into a binary file.

test_XY_encoded.txt

XYYYXXXYYXXXXXXXYXYXXXYXXXXXYXXYXXXYYYYYXYXXYXYXYXXXXXYXXYXYXXXXYYYXXX
YXXYYXXXYYXYXYXXXXYYXYYYXXXXYXXXYXXXXXYXXXXXXXXXXXXYXYYXXXXYXXYYYYYYXX
XYXXXXYXXXXYXXXXXYXXXXXYXYYYXYYYXYXYYYYYXXXYYYYYYXXXYYYXXYYXXXYXYXXXYY
XXYYYXYXXXYXXXXYYYYYYYXXXXXYYYYYYXYYYYYYYXYYXYYXXXXXXYYXXXXYXYXYYXXXXY
XXXXXXXYYXYYXXXYXXXXXXXXXXXYXXXXYYYYXYXXXXXYXYYXYXXXXXYXXXXYXXXYXYXXYY
YYXYYXYYYYXYXYXYXYYYYXXXYYYXXXXXYXXYYXXYYXXYXYYXYYXYXXXYYYYYYYXYYYXYXY
YXXXXXYYXXYXYYXYXYXYYYYYXYYYYYXYXYYXXXXXXXXXYYXXYXXXYXXXXXXXXXYYXXYYYX
XYXYYXYXXYYYXXYXXYYYYXXXXYXXYXYXXYXXYYYYXYXXYYYXXYXXXXYXXYYYYYYYYXXYYX

Where X represents the binary 0 and Y represents the binary 1.

  • Related? ASCII binary tools? Commented Jan 3, 2017 at 22:42
  • I found 'xxd -b test.bin' but the output is the binary strings for "1" and "0", "00110001" and "00110000" respectively. Commented Jan 3, 2017 at 22:44
  • How would you get 560 bits from 568 bytes? Am I missing something? 1 byte = 8 bits, 568*8 = 4544! Commented Jan 3, 2017 at 23:03
  • The file contains 560 binary characters of information, it's just encoded as ASCII which is why it's 568 bytes. I'd like to convert those 560 characters to bits as my final output. Commented Jan 3, 2017 at 23:06
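
The size arithmetic discussed in these comments can be checked with a short Python sketch. The helper name packed_size is hypothetical, and the sample is a made-up miniature input (two 16-digit lines), not the real test.bin.

```python
import math

def packed_size(text: str) -> tuple:
    """Return (ascii_bytes, bit_count, packed_bytes) for a '0'/'1' text."""
    ascii_bytes = len(text.encode("ascii"))        # size on disk as text
    bit_count = sum(text.count(c) for c in "01")   # digits that carry data
    packed_bytes = math.ceil(bit_count / 8)        # eight digits per output byte
    return ascii_bytes, bit_count, packed_bytes

# Hypothetical miniature input: two 16-digit lines, each ending in a newline.
sample = "0111000110000000\n1010001000001001\n"
print(packed_size(sample))  # (34, 32, 4)
```

By the same arithmetic, and assuming test.bin is 8 lines of 70 digits that each end in a newline, the text is 568 bytes, carries 560 bits, and packs into 70 bytes.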

3 Answers

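How about this bash command?

cat test.bin | tr -d '\n' | perl -pe '$_=pack"B*",$_' > true_binary.txt

tr deletes all newline characters, and perl's pack with the B* template turns each group of eight '0'/'1' characters into one byte, most significant bit first. The -l switch is deliberately left out, since it would append a newline byte to the output.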
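
For readers who prefer Python, the effect of packing a bit string most significant bit first can be sketched as below. This is an illustration, not the answer's method: the function name pack_bits is hypothetical, and right-padding a short final group with zeros is an assumption made for the sketch.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        group = bits[i:i + 8].ljust(8, "0")   # assumption: pad a short tail with zeros
        value = 0
        for ch in group:
            value = (value << 1) | (ch == "1")  # shift in one bit at a time
        out.append(value)
    return bytes(out)

# The first 16 digits of test.bin from the question.
print(pack_bits("0111000110000000"))  # b'q\x80'
```

The explicit shift loop is slower than a built-in conversion, but it makes the bit order visible: the first digit of each group lands in the highest bit of the byte.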

4 Comments

Same difference; the output file is 4481 bytes, like the other solutions. All of them encode the character "1" as '00110001' and not as the single bit '1'. Imagine the file were all 'A' and 'B' characters instead, where A = binary 1 and B = binary 0. That's the output file I'm looking for.
Got it! This was close but it was writing the output as ASCII, to write the output as binary directly I used: ' cat test.bin | tr -d '[\n]' | perl -lpe '$_=pack"B*",$_' > true_binary.txt '
Great! I misunderstood the requirements. Edited my answer to the correct command.
@gniourf_gniourf point taken -- edited the command again, those brackets shouldn't be there.

I don't know if this would solve the question, but how about this:

with open('ascii.txt', 'r') as file_ascii, open('binary.txt', 'wb') as file_bin:
    file_bin.write(bytes(''.join(file_ascii.read().split()), 'utf-8'))

Or, to overwrite the file:

with open('ascii.txt', 'r') as f:
    binary = bytes(''.join(f.read().split()), 'utf-8')

with open('ascii.txt', 'wb') as f:
    f.write(binary)

Short, but should do the trick.

5 Comments

This is close but just like 'xxd -b' the output is still 560 bytes not 560 bits.
You may even open two different files within a single with statement.
@MoinuddinQuadri - Updated, I didn't know that. Thanks!
Also, since you just want to rewrite a file, you do not have to open it twice. Open the file in r+ mode, which means read + write.
The OP may adapt it, this is just for better readability. But thanks for your hint!
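
As the comments point out, the snippets above still write one byte per digit. A minimal Python sketch that packs eight digits into each output byte, reading in chunks so that a 3-6 GB input need not fit in memory, might look like this. The function name convert and the chunk size are hypothetical, and right-padding a trailing group shorter than eight digits with zeros is an assumption.

```python
import io

# Every byte value except ASCII '0' and '1', for use with bytes.translate.
NOT_DIGITS = bytes(set(range(256)) - set(b"01"))

def convert(src, dst, chunk_size=1 << 20):
    """Stream '0'/'1' text from binary file object src as packed bytes to dst."""
    carry = b""                                   # digits left over from the last chunk
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digits = carry + chunk.translate(None, NOT_DIGITS)  # drop newlines etc.
        usable = len(digits) - len(digits) % 8    # largest multiple of eight
        if usable:
            dst.write(int(digits[:usable], 2).to_bytes(usable // 8, "big"))
        carry = digits[usable:]
    if carry:                                     # assumption: zero-pad the tail
        dst.write(int(carry.ljust(8, b"0"), 2).to_bytes(1, "big"))

# In-memory demonstration; real files opened with 'rb' and 'wb' work the same way.
src = io.BytesIO(b"01110001\n1000\n0000101\n")
dst = io.BytesIO()
convert(src, dst)
print(dst.getvalue())  # b'q\x80\xa0'
```

Carrying the leftover digits between chunks matters because line breaks, and therefore chunk boundaries, do not fall on multiples of eight digits.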

We could build a shell-only solution.
First, we transform the 1's and 0's into a stream of 8-character lines:

$ { cat test.bin | tr -cd '01' | fold -b8; echo; }
01110001
10000000
10100010
00001001
00011111
…
…
10011110
00010010
10010011
11010011
10010000
10011111
11100110

That's 560/8 = 70 lines, which should translate to 70 bytes.
Note that not all of these bytes are ASCII characters: values above decimal 127 (hex 7f) fall outside ASCII. I am interpreting them as unsigned byte values.

Then we can read each line and convert it first to decimal with "$((2#$a))" so that the shell understands it, then to a hex escape with printf '\\x%x', so that the final printf '%b' "…" can turn that escape into a raw byte:

$ { cat infile | tr -cd '01' | fold -b8; echo; } | 
    while read a; do printf '%b' "$(printf '\\x%x' "$((2#$a))")"; done 
q��     J�P�cP�XO�!u���(Έ�큅a���OoU�f[G�X2���Ȁ3����Ӑ��

Of course, the characters printed are (most probably) an incorrect interpretation of the byte values in whatever locale the user has set. A hex dump may be more useful (depending on your needs):

$ { cat infile | tr -cd '01' | fold -b8; echo; } | 
    while read a; do printf '%b' "$(printf '\\x%x' "$((2#$a))")"; done |
        od -vAn -tx1c

  71  80  a2  09  1f  4a  82  50  e2  63  50  dc  22  08  00  58
   q 200 242  \t 037   J 202   P 342   c   P 334   "  \b  \0   X
  4f  c4  21  04  17  75  f1  f8  e6  28  ce  88  7f  07  ef  ed
   O 304   ! 004 027   u 361 370 346   ( 316 210 177  \a 357 355
  81  85  61  01  b1  00  10  f4  16  82  11  4f  6f  55  e3  82
 201 205   a 001 261  \0 020 364 026 202 021   O   o   U 343 202
  66  5b  47  f7  58  32  d5  f7  d6  00  c8  80  33  96  9c  9e
   f   [   G 367   X   2 325 367 326  \0 310 200   3 226 234 236
  12  93  d3  90  9f  e6
 022 223 323 220 237 346
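
The same three steps (binary text to number, number to hex escape, escape to raw byte) can be traced in Python as a cross-check of the od output above. This is a sketch rather than part of the shell solution; the function name trace is hypothetical, and the input is the first five folded lines shown earlier.

```python
def trace(line: str):
    """Mirror the shell steps for one 8-digit line."""
    value = int(line, 2)       # like $((2#$a)): binary text -> integer
    escape = "\\x%x" % value   # like printf '\\x%x': integer -> hex escape text
    byte = bytes([value])      # like printf '%b': the single raw byte
    return value, escape, byte

lines = ["01110001", "10000000", "10100010", "00001001", "00011111"]
data = b"".join(trace(line)[2] for line in lines)

print(trace(lines[0]))  # (113, '\\x71', b'q')
print(data.hex(" "))    # 71 80 a2 09 1f
```

The hex string agrees with the first five values in the od dump.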

Note that the same structure could be used for the file test_XY_encoded.txt:

$ { cat test_XY_encoded.txt | tr 'XY' '01' | tr -cd '01' | fold -b8; echo; } | 
    while read a; do printf '%b' "$(printf '\\x%x' "$((2#$a))")"; done | 
        od -vAn -tx1c

  71  80  a2  09  1f  4a  82  50  e2  63  50  dc  22  08  00  58
   q 200 242  \t 037   J 202   P 342   c   P 334   "  \b  \0   X
  4f  c4  21  04  17  75  f1  f8  e6  28  ce  88  7f  07  ef  ed
   O 304   ! 004 027   u 361 370 346   ( 316 210 177  \a 357 355
  81  85  61  01  b1  00  10  f4  16  82  11  4f  6f  55  e3  82
 201 205   a 001 261  \0 020 364 026 202 021   O   o   U 343 202
  66  5b  47  f7  58  32  d5  f7  d6  00  c8  80  33  96  9c  9e
   f   [   G 367   X   2 325 367 326  \0 310 200   3 226 234 236
  12  93  d3  90  9f  e6
 022 223 323 220 237 346
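
The tr 'XY' '01' step generalizes to any pair of symbols. A minimal Python sketch of that mapping follows; the function name to_bit_string and its parameters are hypothetical, and every character other than the two chosen symbols is dropped, similar in spirit to the tr -cd '01' filter in the pipeline.

```python
def to_bit_string(text: str, zero: str = "X", one: str = "Y") -> str:
    """Map two arbitrary symbols to '0' and '1', discarding everything else."""
    mapping = {zero: "0", one: "1"}
    return "".join(mapping.get(ch, "") for ch in text)

# The first 16 symbols of test_XY_encoded.txt, split over two lines.
print(to_bit_string("XYYYXXXY\nYXXXXXXX\n"))  # 0111000110000000
```

The resulting string of digits can then be packed into bytes exactly as for the original 0/1 file.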
