Fogmeister
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm Posts: 6580 Location: Getting there

I wrote a bit of code the other day to decompress some input data into the correct form.
It was for something one of my colleagues was working on, and the values being worked with ranged from roughly -1.2 to +1.4.
The input consisted of hundreds of these values, collected over a period of time (i.e. once every ten minutes for 24 hours).
I'd never seen the kind of technique they used to compress the data before.
They provided a text file with a load of ASCII characters in it. Each group of 3 was one number.
e.g. hP[
The conversion from that back into a number came with a set of instructions.
First convert each character to its ASCII code, then write each code in binary (7 bits per character) and concatenate the three into one long binary string. The first bit is the (S)ign, the next 6 are the (E)xponent and the last 14 are the (F)ractional part.
Then convert each part back to decimal and plug them into the following equation...
result = (1 - F/12345) * 2^(E-12)
Then if S = 1, the result is negative.
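A minimal Python sketch of the decoding steps above. It assumes 7 bits per character (that's what makes 1 + 6 + 14 = 21 bits fit into three characters) and uses the formula exactly as posted, so treat it as an illustration rather than the real conversion; `unpack_fields` and `decode_triplet` are hypothetical names. Instead of building a binary string, it shifts each 7-bit code into one integer and masks the fields out, which gives the same S, E and F.

```python
from typing import Tuple


def unpack_fields(chars: str) -> Tuple[int, int, int]:
    """Split a 3-character group into (S, E, F).

    Assumption: each character contributes its 7-bit ASCII code, so the
    three codes concatenate into 21 bits = 1 sign + 6 exponent + 14 fraction.
    """
    if len(chars) != 3:
        raise ValueError("expected exactly 3 characters")
    bits = 0
    for ch in chars:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        bits = (bits << 7) | code      # append this character's 7 bits
    s = (bits >> 20) & 0x1             # top bit: sign
    e = (bits >> 14) & 0x3F            # next 6 bits: exponent
    f = bits & 0x3FFF                  # low 14 bits: fractional part
    return s, e, f


def decode_triplet(chars: str) -> float:
    """Apply result = (1 - F/12345) * 2^(E-12), negated when S = 1 (formula as posted)."""
    s, e, f = unpack_fields(chars)
    result = (1 - f / 12345) * 2.0 ** (e - 12)
    return -result if s == 1 else result
```

`unpack_fields("9Xj")` gives `(0, 57, 11370)`, matching the worked example further down the thread.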
It meant you could get a range of VERY accurate numbers into a very small amount of data.
Never seen anything like that before but it made sense.
Anyone else had to do anything like that?
Thu May 10, 2012 1:36 pm

JJW009
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm Posts: 8767 Location: behind the sofa
Do you mean you write down the ASCII values in binary, in groups of 3? If so, then it's just 24-bit signed floating point in a raw binary file. The truth is, there are no letters. They only exist because you're printing the numbers as if they were ASCII. 24-bit is less common; 32-bit is "single precision" and the lowest precision you'd usually use.
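To illustrate the point that the letters are only a way of displaying bytes, here's a small Python sketch (the function name is hypothetical) that reads three bytes as one 24-bit big-endian unsigned integer. Note that the scheme in the first post, as worked through further down, packs 7 bits per character (21 bits in total), so its field boundaries wouldn't line up with a straight 24-bit read like this one.

```python
def bytes_to_uint24(raw: bytes) -> int:
    """Interpret exactly 3 bytes as one big-endian unsigned 24-bit integer."""
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    return int.from_bytes(raw, "big")


raw = b"9Xj"                  # what prints as "9Xj" is just the bytes 57, 88, 106
print(list(raw))              # [57, 88, 106]
print(bytes_to_uint24(raw))   # 3758186 == 57*65536 + 88*256 + 106
```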
_________________jonbwfc's law: "In any forum thread someone will, no matter what the subject, mention Firefly." When you're feeling too silly for x404, youRwired.net
Thu May 10, 2012 2:01 pm

finlay666
Spends far too much time on here
Joined: Thu Apr 23, 2009 9:40 pm Posts: 4876 Location: Newcastle
There was a technique that I have used before (as in at uni) that worked out as something like the following:
convert the data to some arbitrary representation (say binary)
split it into groups of 4 bits, replace common bit patterns with another number, e.g. 2 for 1111 and 3 for 0000 (say use a byte for this, compresses 4 bits to 3 bits), then re-encode to binary and include the key
But nothing similar to yours... no idea why they chose that one...
I would have gone down a simpler route: define a lower bound (in this case -1.2), then with a scale (say 1/10000) store each value as an unsigned 16-bit int (uint16), which gives you a range of -1.2 to about 5.35 with 0.0001 precision.
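A rough Python sketch of that offset-and-scale idea, assuming the -1.2 lower bound and 1/10000 step from the post and an unsigned 16-bit store; the names are hypothetical. Each value costs 2 bytes, and anything outside roughly -1.2 to 5.3535 raises an error instead of being stored.

```python
LOWER = -1.2      # lower bound of the data range
SCALE = 10_000    # steps per unit, i.e. 0.0001 precision
U16_MAX = 0xFFFF  # largest unsigned 16-bit value


def encode(x: float) -> bytes:
    """Quantise x to the nearest 0.0001 step above LOWER and pack it into 2 bytes."""
    q = round((x - LOWER) * SCALE)
    if not 0 <= q <= U16_MAX:
        raise ValueError(f"{x} is outside [{LOWER}, {LOWER + U16_MAX / SCALE}]")
    return q.to_bytes(2, "big")


def decode(packed: bytes) -> float:
    """Recover the (quantised) value from 2 big-endian bytes."""
    return LOWER + int.from_bytes(packed, "big") / SCALE
```

The trade-off against the scheme in the first post is a fixed step size over the whole range rather than precision that varies with the exponent.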
_________________Twitter
Charlie Brooker: Macs are glorified Fisher-Price activity centres for adults; computers for scaredy cats too nervous to learn how proper computers work; computers for people who earnestly believe in feng shui.
Thu May 10, 2012 2:18 pm

Fogmeister
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm Posts: 6580 Location: Getting there
Sort of... I think. You take each ASCII value and convert that to binary. So...
ASCII - 9 X j
Decimal - 57 88 106
Binary - 0111001 1011000 1101010
so...
S = 0
E = 111001 = 57
F = 10110001101010 = 11370
Then put these into the equation. It may be a common thing to do (I imagine it is), I've just never seen it myself (especially not having to write a conversion function).
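Plugging those worked values into the equation exactly as posted (S = 0, so the sign is left alone):

```latex
\[
\text{result} = \left(1 - \frac{F}{12345}\right) 2^{E-12}
= \left(1 - \frac{11370}{12345}\right) 2^{57-12}
= \frac{975}{12345}\, 2^{45}
\approx 2.78 \times 10^{12}
\]
```

That lands far outside the -1.2 to +1.4 range mentioned at the top, which fits with the later remark that the version posted here isn't the correct one.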
Thu May 10, 2012 2:49 pm

JJW009
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm Posts: 8767 Location: behind the sofa
This talk of ASCII worries me. I don't see why you don't just read the file as binary bytes and skip the first step, which could cause issues if the numbers represent non-ASCII characters, 0 being an obvious example. It really isn't a "text file" by the sounds of it.
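A minimal Python sketch of reading the data as raw bytes, assuming the file is nothing but back-to-back 3-byte groups (the function names are hypothetical). Opening with `"rb"` means no character decoding and no newline translation, so bytes like 0x00, 0x0A/0x0D or anything above 0x7F come through unchanged.

```python
from typing import List


def split_triplets(data: bytes) -> List[bytes]:
    """Cut raw file contents into consecutive 3-byte groups."""
    if len(data) % 3:
        raise ValueError("data length is not a multiple of 3 bytes")
    return [data[i:i + 3] for i in range(0, len(data), 3)]


def load_triplets(path: str) -> List[bytes]:
    """Read the whole file in binary mode and split it into 3-byte groups."""
    with open(path, "rb") as f:   # "rb": raw bytes, no text decoding
        return split_triplets(f.read())
```

Each group is a `bytes` object, so `list(group)` gives the three numbers directly (e.g. `[57, 88, 106]`) without going through characters at all.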
Thu May 10, 2012 3:19 pm

Fogmeister
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm Posts: 6580 Location: Getting there
TBH I'm not too fussed about that stage. I'd be surprised if the language we use here could actually do that. The program I wrote takes in 3 numbers (57, 88, 106) and outputs a single number, so it doesn't deal with reading the input file at all.
Thu May 10, 2012 3:33 pm

JJW009
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm Posts: 8767 Location: behind the sofa
I will now stop worrying 
Thu May 10, 2012 3:47 pm

EddArmitage
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 9:40 pm Posts: 5288 Location: ln -s /London ~
As JJ's saying, surely the ASCII is just an interpretation. At the end of the day a file is just a lump of bits, and you can make of it what you will. Why you'd display that file as ASCII I don't understand, but that's probably my (mis-)reading of the problem.
Thu May 10, 2012 3:48 pm

Fogmeister
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm Posts: 6580 Location: Getting there
You are probably right about it not being text etc... And I would also guess that your worrying is justified... However... I'll probably never find out. TBH I wasn't even supposed to be shown the exact instructions for how the conversion works (and I haven't shown the correct version here either), so I'll probably never see them again. I'm off in 2 weeks too so...
Thu May 10, 2012 3:53 pm

JJW009
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm Posts: 8767 Location: behind the sofa
I just remembered what it reminded me of!
Many years ago, when I had an 8088 laptop with no hard disk and a 3.5" floppy loaded with DOS and a C compiler, I was trying to optimise code to draw the Mandelbrot set in glorious full-colour CGA. Because I had no floating-point co-processor, I decided to use integer logic in machine code for the iterative loop, since the C floating-point emulation was very slow. I think I used a 16-bit signed integer for the mantissa and an 8-bit one for the exponent.
It was very much faster, but the rounding errors led to it producing something only vaguely resembling the expected pattern. It was an awesome pattern though!
I still have the computer and disk somewhere...
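JJ's version used a custom 16-bit mantissa / 8-bit exponent format in machine code. As a simpler stand-in (plain fixed-point, which is a swapped-in technique and not what JJ describes), here's a Python sketch of an escape-time loop that only uses integer multiplies, adds and shifts. `FRAC_BITS = 12` is an arbitrary assumption; lowering it makes rounding errors more visible in the picture.

```python
FRAC_BITS = 12                # assumed precision: 12 fractional bits
ONE = 1 << FRAC_BITS          # 1.0 in fixed point
ESCAPE = 4 << FRAC_BITS       # |z|^2 > 4 means the point has escaped


def to_fixed(v: float) -> int:
    """Convert a float to fixed point (done once per point, outside the loop)."""
    return round(v * ONE)


def mandel_iters(cx: float, cy: float, max_iter: int = 64) -> int:
    """Escape-time iteration count for c = cx + i*cy, integer arithmetic only in the loop."""
    a, b = to_fixed(cx), to_fixed(cy)
    x = y = 0
    for n in range(max_iter):
        xx = (x * x) >> FRAC_BITS              # x^2, rescaled (shift rounds towards -inf)
        yy = (y * y) >> FRAC_BITS              # y^2, rescaled
        if xx + yy > ESCAPE:
            return n
        y = ((x * y) >> (FRAC_BITS - 1)) + b   # 2xy + cy, using the old x
        x = xx - yy + a                        # x^2 - y^2 + cx
    return max_iter


# Crude text render: '#' for points that never escaped within max_iter
for row in range(-10, 11):
    print("".join("#" if mandel_iters(col / 20, row / 10) == 64 else "."
                  for col in range(-40, 11)))
```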
Thu May 10, 2012 5:09 pm

big_D
What's a life?
Joined: Thu Apr 23, 2009 8:25 pm Posts: 10691 Location: Bramsche
Yeah, we used to use lots of techniques like that. All the old modems used to do on-the-fly compression to maximise their throughput, so they were much "faster" and more efficient when sending text than when sending a JPEG or MP3, because those are already compressed. If this area interests you, have a look at V.42bis and V.44, which were the last in the line. V.44 is based on LZJH (Lempel-Ziv-Jeff-Heath) adaptive data compression, developed by Hughes Electronics http://ixbtlabs.com/articles/compressv4 ... index.html
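For a feel of how that kind of on-the-fly dictionary compression works, here's a minimal Python sketch of plain LZW (V.42bis uses a variant of LZW; this is not V.44's LZJH). Real modems also cap the dictionary size and code width, which this sketch doesn't. Repetitive input such as text shrinks as the dictionary learns its repeats, whereas already-compressed data has few repeats to exploit.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Emit one code per longest already-seen byte string, learning new strings as it goes."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out: List[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                      # keep extending the current match
        else:
            out.append(dictionary[w])   # emit the longest known prefix
            dictionary[wc] = next_code  # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary on the fly, so no key has to be transmitted."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[:1]           # code defined by this very step
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)
```

For example, the 24-byte input `b"TOBEORNOTTOBEORTOBEORNOT"` compresses to 16 codes.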
_________________ "Do you know what this is? Hmm? No, I can see you do not. You have that vacant look in your eyes, which says hold my head to your ear, you will hear the sea!" - Londo Mollari
Executive Producer No Agenda Show 246
Fri May 11, 2012 4:05 am