Compressing data for transfer 
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm
Posts: 6580
Location: Getting there
I wrote a bit of code the other day to de-compress some input data into the correct form.

It was for something one of my colleagues was working on, and the numbers being worked with ranged from roughly -1.2 to +1.4.

Anyway, the input comprised hundreds of these values that had been collected over a period of time (i.e. once every ten minutes for 24 hours).

Anyway, I've never seen the kind of technique they used to compress the data before.

They provided a text file with a load of ASCII characters in it. Each group of 3 was one number.

e.g. hP[

and the way to convert from that into the number came with a load of instructions.

First convert the letters into numbers, then into binary and then concatenate them into a long binary string. The first bit was the positive/negative (S)ign, the next 6 were the (E)xponent and the next 14 were the (F)ractional part. That is 21 bits in total, i.e. 7 bits per character.

Then you converted each part back to decimal and used them in the following equation...

result = (1 - F/12345) * 2^(E-12)

Then if S = 1 the result is negative.
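
The steps above can be sketched in Python roughly as follows. This is only an illustration: it assumes each character contributes 7 bits (3 x 7 = 21 = 1 + 6 + 14), the function name decode_triplet is made up, and the constants 12345 and 12 are simply the ones quoted in the equation above.

```python
def decode_triplet(chars: str) -> float:
    """Decode three 7-bit ASCII characters into one number.

    Assumed layout: 1 sign bit, 6 exponent bits, 14 fraction bits
    (21 bits in total, 7 per character).
    """
    if len(chars) != 3:
        raise ValueError("expected exactly 3 characters")

    # Concatenate the three 7-bit codes into one 21-bit integer.
    bits = 0
    for ch in chars:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"{ch!r} is not 7-bit ASCII")
        bits = (bits << 7) | code

    sign = (bits >> 20) & 0x1        # top bit
    exponent = (bits >> 14) & 0x3F   # next 6 bits
    fraction = bits & 0x3FFF         # low 14 bits

    # Formula as quoted in the post; the constants are the poster's.
    result = (1 - fraction / 12345) * 2 ** (exponent - 12)
    return -result if sign else result


print(decode_triplet("hP["))
```

Using shifts and masks on one integer avoids building an intermediate string of 0s and 1s, but it gives the same split as doing it by hand.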

It meant you could get a range of VERY accurate numbers into a very small amount of data.

Never seen anything like that before but it made sense.

Anyone else had to do anything like that?

_________________
Oliver Foggin - iPhone Dev

JJW009 wrote:
The count will go up until they stop counting. That's the way counting works.


Doodle Sub!
Game Of Life



Thu May 10, 2012 1:36 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm
Posts: 8767
Location: behind the sofa
Fogmeister wrote:
convert the letters into numbers, then into binary and then concatenate them into a long binary string

Do you mean write down the ASCII values in binary in groups of 3? If so, then it's just 24-bit signed floating point in a raw binary file. The truth is, there are no letters. They only exist because you're printing the numbers as if they were ASCII.

24-bit is less common. 32-bit is "single precision" and is the lowest precision you usually use.
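
To illustrate the point that the letters are only one way of printing the bytes, here is a small Python sketch. It takes the three characters from the example in the first post and shows the same data as text, as byte values and as one 24-bit integer. The big-endian byte order is an assumption for illustration only; the last two lines show what a standard 32-bit single-precision value looks like as raw bytes.

```python
import struct

raw = b"hP["  # the three "letters" from the example in the first post

# The same three bytes, viewed three different ways.
as_text = raw.decode("ascii")             # what a text editor shows
as_numbers = list(raw)                    # the byte values themselves
as_integer = int.from_bytes(raw, "big")   # one 24-bit number (big-endian assumed)

print(as_text)                # hP[
print(as_numbers)             # [104, 80, 91]
print(f"{as_integer:024b}")   # 011010000101000001011011

# A 32-bit single-precision float is likewise just 4 bytes.
single = struct.pack(">f", 1.0)
print(len(single), single.hex())   # 4 3f800000
```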

_________________
jonbwfc's law: "In any forum thread someone will, no matter what the subject, mention Firefly."

When you're feeling too silly for x404, youRwired.net


Thu May 10, 2012 2:01 pm
Spends far too much time on here
Joined: Thu Apr 23, 2009 9:40 pm
Posts: 4876
Location: Newcastle
There was a technique that I have used before (as in at uni) that worked out as something like the following

convert data to arbitrary means (say binary)

group it into groups of 4 bits, replace common bit patterns with another number, e.g. 2 for 1111 and 3 for 0000 (say use a byte for this, compresses 4bits to 3bits)
re-encode to binary + include key

But not similar to that.... no idea why they chose that...

I would have gone down a simpler route: define a lower bound (in this case -1.2), then with a scale (say 1/10000) store each value as an unsigned int16 divided by 10k. That gives you -1.2 to about 5.35 with 0.0001 precision.
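
A minimal sketch of that offset-and-scale idea in Python, assuming an unsigned 16-bit code. The names LOWER_BOUND, SCALE, encode and decode are made up for illustration.

```python
LOWER_BOUND = -1.2   # smallest value to represent (from the post)
SCALE = 10_000       # steps per unit, i.e. 0.0001 resolution
MAX_CODE = 0xFFFF    # largest unsigned 16-bit value


def encode(value: float) -> int:
    """Map a float onto an unsigned 16-bit code."""
    code = round((value - LOWER_BOUND) * SCALE)
    if not 0 <= code <= MAX_CODE:
        raise ValueError(f"{value} is outside the representable range")
    return code


def decode(code: int) -> float:
    """Map a 16-bit code back to the (approximate) original value."""
    return LOWER_BOUND + code / SCALE


code = encode(1.4)
print(code, code.to_bytes(2, "big"), decode(code))
```

Each value then takes two bytes, and the worst-case rounding error is half a step (0.00005). The trade-off against a floating-point format is that the precision is the same everywhere in the range rather than relative to the size of the number.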

_________________
Twitter
Charlie Brooker:
Macs are glorified Fisher-Price activity centres for adults; computers for scaredy cats too nervous to learn how proper computers work; computers for people who earnestly believe in feng shui.


Thu May 10, 2012 2:18 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm
Posts: 6580
Location: Getting there
JJW009 wrote:
Fogmeister wrote:
convert the letters into numbers, then into binary and then concatenate them into a long binary string

Do you mean write down the ASCII values in binary in groups of 3? If so, then it's just 24-bit signed floating point in a raw binary file. The truth is, there are no letters. They only exist because you're printing the numbers as if they were ASCII.

24-bit is less common. 32-bit is "single precision" and is the lowest precision you usually use.

Sort of... I think.

You take each ascii value and convert that to binary.

So...

ASCII:   9 X j
Decimal: 57 88 106
Binary:  0111001 1011000 1101010
so...
S = 0
E = 111001 = 57
F = 10110001101010 = 11370

Then put these into the equation.
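
The split above can be double-checked with a few lines of Python. This is just a sketch that follows the by-hand method literally: build the 21-character bit string, then slice it into 1 + 6 + 14 bits.

```python
codes = [ord(ch) for ch in "9Xj"]
bit_string = "".join(f"{c:07b}" for c in codes)  # 7 bits per character

sign_bits = bit_string[0]        # 1 bit
exponent_bits = bit_string[1:7]  # 6 bits
fraction_bits = bit_string[7:]   # 14 bits

S = int(sign_bits, 2)
E = int(exponent_bits, 2)
F = int(fraction_bits, 2)

print(codes)        # [57, 88, 106]
print(bit_string)   # 011100110110001101010
print(S, E, F)      # 0 57 11370
```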

It may be a common thing to do (I imagine it is); I've just never seen it myself (and especially never had to write a conversion function for it).



Thu May 10, 2012 2:49 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm
Posts: 8767
Location: behind the sofa
I worry about this talk of ASCII. I don't see why you don't just read the file as binary bytes and skip the first step, which could potentially cause issues if the numbers represent non-ASCII characters, 0 being an obvious example. It really isn't a "text file" by the sounds of it.
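
Reading the data as raw bytes could look something like the Python sketch below. The helper name read_triplets is made up, and an in-memory buffer stands in for a real file opened in binary mode. Note that a 0 byte comes through like any other value.

```python
import io
from typing import BinaryIO, Iterator


def read_triplets(stream: BinaryIO) -> Iterator[tuple[int, int, int]]:
    """Yield consecutive groups of three raw byte values from a binary stream."""
    while True:
        chunk = stream.read(3)
        if len(chunk) < 3:   # end of data (a trailing partial group is dropped)
            return
        yield chunk[0], chunk[1], chunk[2]


# In-memory stand-in for a file opened with open(path, "rb").
sample = io.BytesIO(b"9Xj\x00\x01\x02")
for triplet in read_triplets(sample):
    print(triplet)
```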



Thu May 10, 2012 3:19 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm
Posts: 6580
Location: Getting there
JJW009 wrote:
I worry about this talk of ASCII. I don't see why you don't just read the file as binary bytes and skip the first step, which could potentially cause issues if the numbers represent non-ASCII characters, 0 being an obvious example. It really isn't a "text file" by the sounds of it.

TBH I'm not too fussed about that stage. I'd be surprised if the language that is used here would actually be able to do that.

The program I wrote takes in 3 numbers (e.g. 57, 88, 106) and outputs a single number, so it isn't bothered about reading from the file at all.



Thu May 10, 2012 3:33 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm
Posts: 8767
Location: behind the sofa
I will now stop worrying :lol:



Thu May 10, 2012 3:47 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 9:40 pm
Posts: 5288
Location: ln -s /London ~
As JJ's saying, surely the ASCII is just an interpretation. At the end of the day a file is just a lump of bits, and you can make of it what you will. Why you'd display that file as ASCII I don't understand, but that's probably my (mis-)reading of the problem.

_________________
timark_uk wrote:
Gay sex is better than no sex

timark_uk wrote:
Edward Armitage is Awesome. Yes, that's right. Awesome with a A.


Thu May 10, 2012 3:48 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 7:35 pm
Posts: 6580
Location: Getting there
JJW009 wrote:
I will now stop worrying :lol:

:)

You are probably right about it not being text etc... And I would also guess that your worrying is correct...

However...

I'll probably never find out :D TBH I wasn't even supposed to be shown the exact instructions for how the conversion works (and I haven't shown the correct version here either), so I'll probably never see it again.

I'm off in 2 weeks too so... :)



Thu May 10, 2012 3:53 pm
I haven't seen my friends in so long
Joined: Thu Apr 23, 2009 6:58 pm
Posts: 8767
Location: behind the sofa
I just remembered what it reminded me of!

Many years ago when I had an 8088 laptop with no hard disk and a 3.5" floppy loaded with DOS and a C compiler, I was trying to optimise code to draw the Mandelbrot set in glorious full-colour CGA. Because I had no floating point co-pro I decided to use integer logic in machine code to do the iterative loop, since the C floating point emulation was very slow. I think I used a 16 bit signed integer for the mantissa and an 8 bit one for the exponent.

It was very much faster, but the rounding errors led to it producing something only vaguely resembling the expected pattern. It was an awesome pattern though!
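
For anyone curious where those rounding errors come from, here is a rough Python sketch of mantissa/exponent arithmetic done with integers only. It is not the original machine code: it merely assumes a signed 16-bit mantissa with value = mantissa * 2**exponent, and all the names are made up. The from_float and to_float helpers use ordinary floats purely to get numbers in and out for checking.

```python
import math

MANTISSA_BITS = 15  # magnitude bits available in a signed 16-bit mantissa

SoftFloat = tuple[int, int]  # (mantissa, exponent): value = mantissa * 2**exponent


def normalise(mantissa: int, exponent: int) -> SoftFloat:
    """Shift the mantissa right until it fits in 16 signed bits."""
    while abs(mantissa) >= 1 << MANTISSA_BITS:
        mantissa >>= 1   # every dropped low bit is a small rounding error
        exponent += 1
    return mantissa, exponent


def from_float(value: float) -> SoftFloat:
    """Convert a Python float to (mantissa, exponent); for checking only."""
    fraction, exponent = math.frexp(value)  # value == fraction * 2**exponent
    return int(fraction * (1 << MANTISSA_BITS)), exponent - MANTISSA_BITS


def to_float(number: SoftFloat) -> float:
    """Convert (mantissa, exponent) back to a Python float; for checking only."""
    mantissa, exponent = number
    return mantissa * 2.0 ** exponent


def multiply(a: SoftFloat, b: SoftFloat) -> SoftFloat:
    """Multiply two values using integer operations only."""
    return normalise(a[0] * b[0], a[1] + b[1])


def add(a: SoftFloat, b: SoftFloat) -> SoftFloat:
    """Add two values after aligning both to the larger exponent."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:
        ma >>= eb - ea   # low bits of the smaller operand are lost here
        ea = eb
    else:
        mb >>= ea - eb
    return normalise(ma + mb, ea)


x = from_float(1.1)
print(to_float(multiply(x, x)), 1.1 * 1.1)
```

Each multiply or add throws away a few low bits, and an iteration like the Mandelbrot loop feeds every result back in, so the small errors can compound quickly.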

I still have the computer and disk somewhere...



Thu May 10, 2012 5:09 pm
What's a life?
Joined: Thu Apr 23, 2009 8:25 pm
Posts: 10691
Location: Bramsche
Yeah, we used to use lots of techniques like that. All the old modems used to do on-the-fly compression to maximise their throughput, so they were much "faster" and more efficient when sending text than when sending a JPEG or MP3, because those are already compressed.

If this area interests you, have a look at V.42bis and V.44, which were the last in the line. V.44 is based on LZJH (Lempel-Ziv-Jeff-Heath) adaptive data compression, developed by Hughes Electronics.
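
The text-versus-already-compressed effect is easy to see with any general-purpose compressor. The Python sketch below uses zlib (DEFLATE) purely as a stand-in, since it is in the standard library; it is not the V.42bis or V.44 algorithm. Pseudo-random bytes are used as an assumed substitute for an already-compressed payload such as a JPEG or MP3.

```python
import random
import zlib

text = b"The quick brown fox jumps over the lazy dog. " * 100

# Pseudo-random bytes stand in for data that is already compressed:
# like a JPEG or MP3 payload, they have almost no redundancy left.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

for label, data in (("text", text), ("noise", noise)):
    packed = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The repetitive text should shrink to a small fraction of its size, while the noise should come out no smaller than it went in.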

http://ixbtlabs.com/articles/compressv4 ... index.html

_________________
"Do you know what this is? Hmm? No, I can see you do not. You have that vacant look in your eyes, which says hold my head to your ear, you will hear the sea!" - Londo Molari

Executive Producer No Agenda Show 246


Fri May 11, 2012 4:05 am