librelist archives

tnetstrings round2

From:
Zed A. Shaw
Date:
2011-03-20 @ 22:52
Alright, tweaked the code a bit more to support all the data types that
JSON has, and be backward compatible with existing netstrings:

http://codepad.org/gfpQLnZP

That will parse the original netstring as a "blob" which in python is
just a string.

A couple of design points:

1. It's actually easier to get the type after you load the data.  You
have to read the size, then the ':', then read the full message anyway,
so having the type ahead of time doesn't buy you much.  Having it at the
end makes it easy to recursively deal with or loop over.

2. Putting balanced separators makes the parsing code harder.  In Python
you'd have to split with a regex on all the possible chars instead of
just ':'.

3. Backward compatibility with netstrings is crucial because then people
can use this library to handle the netstrings we already use even if
they use JSON.  Looking at the code for a lot of netstrings libs out
there it's kind of bad stuff.

4. So far the other implementations people have offered up are showing
that parsing this simple text is really easy (as compared with binary
formats):

http://codepad.org/8NZTj74s
http://re-factor.blogspot.com/2011/03/typed-netstrings.html
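
The design points above (length first, split on ':', type tag at the end, plain netstrings still accepted) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from the codepad links: the name `parse` is hypothetical, and the type tags (',' bytes, '#' integer, '^' float, '!' boolean, '~' null, ']' list, '}' dict) are assumed from the tnetstrings format as it was later published, so they may differ from the paste discussed in this thread.

```python
def parse(data: bytes):
    """Parse one tnetstring from the front of data.

    Returns (value, remainder). Type tags are an assumption, see lead-in.
    """
    # Point 2: a single split on ':' is enough to find the length prefix.
    length, sep, rest = data.partition(b":")
    if not sep or not length.isdigit():
        raise ValueError("missing or malformed length prefix")
    size = int(length.decode("ascii"))
    if len(rest) < size + 1:
        raise ValueError("payload shorter than declared length")

    # Point 1: the type tag is the single byte *after* the payload.
    payload = rest[:size]
    tag = rest[size:size + 1]
    remainder = rest[size + 1:]

    if tag == b",":
        # Point 3: a classic netstring ends in ',' and comes back as raw bytes.
        return payload, remainder
    if tag == b"#":
        return int(payload.decode("ascii")), remainder
    if tag == b"^":
        return float(payload.decode("ascii")), remainder
    if tag == b"!":
        if payload not in (b"true", b"false"):
            raise ValueError("invalid boolean payload")
        return payload == b"true", remainder
    if tag == b"~":
        if payload:
            raise ValueError("null must have an empty payload")
        return None, remainder
    if tag == b"]":
        items = []
        while payload:  # loop over the nested elements
            item, payload = parse(payload)
            items.append(item)
        return items, remainder
    if tag == b"}":
        result = {}
        while payload:  # key/value pairs, each a tnetstring
            key, payload = parse(payload)
            if not payload:
                raise ValueError("dict key without a value")
            value, payload = parse(payload)
            result[key] = value
        return result, remainder
    raise ValueError("unknown type tag: %r" % tag)


print(parse(b"5:hello,"))     # (b'hello', b'')
print(parse(b"8:1:a,1:b,]"))  # ([b'a', b'b'], b'')
```

Because the tag trails the payload, the recursive cases simply re-run `parse` on the already-extracted payload until it is empty.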

Next up, generating this from the above.  I'll probably have to make an
explicit Blob() class for Python so that it round-trips.
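
For illustration, generation might look something like the sketch below. It is written for Python 3, where `bytes` and `str` are already distinct types, so `bytes` stands in for the Blob() wrapper mentioned above. The name `dump` and the type tags (',' '#' '!' '~' ']' '}') are assumptions taken from the tnetstrings format as later published, not from the code in this thread, and the sketch deliberately rejects `str` so that it never has to pick an encoding.

```python
def dump(value) -> bytes:
    """Serialize a Python value as a tnetstring (sketch; tags are assumed)."""
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):
        # bool must be checked before int, because bool is a subclass of int.
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        # Raw bytes: this is exactly a classic netstring.
        payload, tag = value, b","
    elif isinstance(value, (list, tuple)):
        payload, tag = b"".join(dump(item) for item in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(dump(k) + dump(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("cannot serialize %r" % type(value))
    # The length counts payload bytes only, not the tag.
    return str(len(payload)).encode("ascii") + b":" + payload + tag


print(dump(b"hello"))               # b'5:hello,'
print(dump({b"hello": b"world"}))   # b'16:5:hello,5:world,}'
```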

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] tnetstrings round2

From:
Isaac Force
Date:
2011-03-21 @ 09:18
On Sun, Mar 20, 2011 at 3:52 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> Alright, tweaked the code a bit more to support all the data types that
> JSON has, and be backward compatible with existing netstrings:
>
> http://codepad.org/gfpQLnZP

Ruby version based on http://codepad.org/Uj42SuMo :

http://codepad.org/qhgbXDPP

Benchmark results added at the bottom in a comment. Summary is that
it's ~3.5x faster than the pure-Ruby JSON, but ~2x slower than JRuby's
Java-based JSON.

Only tested on JRuby 1.6 and Zed's 'TESTS' sample tnetstring data.

-Isaac

Re: [mongrel2] tnetstrings round2

From:
Zed A. Shaw
Date:
2011-03-21 @ 16:37
On Mon, Mar 21, 2011 at 02:18:14AM -0700, Isaac Force wrote:
> Ruby version based on http://codepad.org/Uj42SuMo :
> 
> http://codepad.org/qhgbXDPP

Alright so it's coming out to be about 100 lines of code in any language
to implement.  Sounds like a winner for simplicity.

> Benchmark results added at the bottom in a comment. Summary is that
> it's ~3.5x faster than the pure-Ruby JSON, but ~2x slower than JRuby's
> Java-based JSON.

Is this comparing pure Ruby tnetstring to JRuby?  Can you do
JRuby-tnetstring against JRuby-json?  I'm curious if it's the jruby or
the json that's faster.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] tnetstrings round2

From:
Isaac Force
Date:
2011-03-21 @ 18:28
On Mon, Mar 21, 2011 at 9:37 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> On Mon, Mar 21, 2011 at 02:18:14AM -0700, Isaac Force wrote:
>> http://codepad.org/qhgbXDPP
>
>> Benchmark results added at the bottom in a comment. Summary is that
>> it's ~3.5x faster than the pure-Ruby JSON, but ~2x slower than JRuby's
>> Java-based JSON.
>
> Is this comparing pure Ruby tnetstring to JRuby?  Can you do
> JRuby-tnetstring against JRuby-json?  I'm curious if it's the jruby or
> the json that's faster.

All tests were done with the same JRuby VM.

When you require 'json', it includes logic that attempts to load a
binary acceleration library for the platform it's running on; in this
case that's a Java lib. Requiring 'json/pure' skips that and uses a
pure-Ruby JSON lib.

tnetstring vs. json/pure   = 3.5x faster
tnetstring vs. json(+java) = 2x slower

Pure-Ruby vs. pure-Ruby in the same VM the tnetstring code is faster
than the JSON lib that ships with Ruby.

Is that what you're asking?

-Isaac

Re: [mongrel2] tnetstrings round2

From:
Zed A. Shaw
Date:
2011-03-21 @ 22:36
On Mon, Mar 21, 2011 at 11:28:51AM -0700, Isaac Force wrote:
> tnetstring vs. json/pure   = 3.5x faster
> tnetstring vs. json(+java) = 2x slower
> 
> Pure-Ruby vs. pure-Ruby in the same VM the tnetstring code is faster
> than the JSON lib that ships with Ruby.
> 
> Is that what you're asking?

Yep, perfect.  That's actually kind of interesting because cjson under
the Python library is idiotic fast compared to tnetstrings or
simplejson.  Like 250x faster.  I was thinking the JVM might be able to
make naive tnetstrings fast, and at least be able to do json fast, but
looks like not really.

Now I'm curious what a C implementation of tnetstrings can do.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] tnetstrings round2

From:
Ryan Kelly
Date:
2011-03-22 @ 03:38
could not decode message

Re: [mongrel2] tnetstrings round2

From:
Ryan Kelly
Date:
2011-03-22 @ 05:18
On Tue, 2011-03-22 at 14:38 +1100, Ryan Kelly wrote:
> On Mon, 2011-03-21 at 15:36 -0700, Zed A. Shaw wrote:
> >
> > 
> > Yep, perfect.  That's actually kind of interesting because cjson under
> > the Python library is idiotic fast compared to tnetstrings or
> > simplejson.  Like 250x faster.
> > 
> > Now I'm curious what a C implementation of tnetstrings can do.
> 
> Attached is a start.  It's a "_tnetstring" module for python written in
> the style of the cjson module, i.e. a pure-C parsing core with hooks
> back into the python API.
> 
> On my machine, it goes head-to-head with cjson:
> 
>   $> python shootout.py
>   cjson: 0.00308704376221
>   _tnetstring 0.0030951499939

Ahem.  As Tordek points out, those are stupidly small numbers.  There
was an error in my shootout code which meant it was basically timing
nothing.  Fortunately, the results are still very similar when it
actually runs the two parsers:

  $> python shootout.py
  cjson: 1.35818314552
  _tnetstring 1.35400009155
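
One way to guard against a benchmark that times nothing is to check that the parser really produces a value before timing it. The sketch below is only an illustration of that idea and is not the shootout.py from this thread: `bench` and `SAMPLE` are hypothetical names, and the standard-library `json.loads` stands in for cjson and _tnetstring, which are not assumed to be installed.

```python
import json
import timeit

# Hypothetical sample document; any representative payload would do.
SAMPLE = json.dumps({"headers": {"host": "localhost"}, "body": "x" * 1000})


def bench(parse, data, number=1000):
    """Time `number` calls of parse(data), after checking that it does work."""
    # Sanity check: run the parser once and make sure a value comes back,
    # so the timed loop cannot silently measure an empty call.
    if parse(data) is None:
        raise RuntimeError("parser returned nothing for the sample input")
    # Time the actual parse call, not just the loop overhead.
    return timeit.timeit(lambda: parse(data), number=number)


elapsed = bench(json.loads, SAMPLE)
print("json.loads:", elapsed)
```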


  Ryan


> 
> That's the result of about 2 hours of hacking, so there's probably room
> for a fair bit of optimisation.  Currently it only parses, doesn't
> render.  It also segfaults on bad input from time to time.
> 
> Still, I think beating cjson is very doable without much work.
> 
> The parser core is written to use a struct of callback functions to
> build up the result, so it should be straightforward to adapt for use
> outside of python.  If I get a chance, I'll try to do a version using
> the ADTs from mongrel2.
> 
> 
>   Cheers,
> 
>      Ryan
> 

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] tnetstrings round2

From:
Ryan Kelly
Date:
2011-03-22 @ 06:44
could not decode message

Re: [mongrel2] tnetstrings round2

From:
Ryan Kelly
Date:
2011-03-20 @ 23:06
On Sun, 2011-03-20 at 15:52 -0700, Zed A. Shaw wrote:
> Alright, tweaked the code a bit more to support all the data types that
> JSON has, and be backward compatible with existing netstrings:
> 
> http://codepad.org/gfpQLnZP
> 
> That will parse the original netstring as a "blob" which in python is
> just a string.
>
> 
> Next up, generating this from the above.  I'll probably have to make an
> explicit Blob() class for Python so that it round-trips.

The sqlite3 bindings make you wrap strings with the built-in buffer
object to indicate bytes-vs-text.  Might be simpler than creating your
own Blob class.
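
The `buffer` built-in is a Python 2 feature. As a rough Python 3 illustration of the same bytes-vs-text split in the sqlite3 bindings, `bytes` values are stored as BLOB and `str` values as TEXT. The table and column names below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v)")  # untyped column, so no conversion is applied

# The same five characters, once as text and once as raw bytes.
conn.execute("INSERT INTO t VALUES (?)", ("hello",))
conn.execute("INSERT INTO t VALUES (?)", (b"hello",))

# typeof() reports how SQLite stored each value.
rows = conn.execute("SELECT typeof(v), v FROM t ORDER BY rowid").fetchall()
print(rows)  # [('text', 'hello'), ('blob', b'hello')]
conn.close()
```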


  Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] tnetstrings round2

From:
Ryan Kelly
Date:
2011-03-20 @ 23:04
On Sun, 2011-03-20 at 15:52 -0700, Zed A. Shaw wrote:
> Alright, tweaked the code a bit more to support all the data types that
> JSON has, and be backward compatible with existing netstrings:
> 
> http://codepad.org/gfpQLnZP
> 
> That will parse the original netstring as a "blob" which in python is
> just a string.

So what's the difference between a string and a blob?  Since this is a
wire-format, they're both coming in as bytes.  Is there an encoding or
something?

I'd rather not see a python-style "mongrel3" transition just to sort out
a strings-vs-bytes issue that we left ambiguous early in the design :-)


  Ryan


-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] tnetstrings round2

From:
Zed A. Shaw
Date:
2011-03-20 @ 23:11
On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote:
> So what's the difference between a string and a blob?  Since this is a
> wire-format, they're both coming in as bytes.  Is there an encoding or
> something?

Good point.  Maybe just use ',' and drop '"', and say it's always bytes,
with no interpretation?

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] tnetstrings round2

From:
Loic d'Anterroches
Date:
2011-03-21 @ 08:34

On 2011-03-21 00:11, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote:
>> So what's the difference between a string and a blob?  Since this is a
>> wire-format, they're both coming in as bytes.  Is there an encoding or
>> something?
> 
> Good point, maybe say just , and no " and say it's bytes always, with no
> interpretation?

Yes please. Distinguishing strings from blobs is a really big can of
worms. You will get endless problems because so many people don't even
know the concept of encoding: the length of the string in the netstring
will not match the length given by the encoding-aware len("string").

Please, no encoding dependency at the protocol level.
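
A small Python 3 illustration of that mismatch, assuming UTF-8 on the wire and ',' as the trailing tag of a plain netstring (variable names are made up for the example):

```python
text = "lo\u00efc"               # "loïc": 4 characters
encoded = text.encode("utf-8")   # 5 bytes, because "ï" takes two bytes in UTF-8

print(len(text), len(encoded))   # 4 5

# Wrong: the length prefix counts characters, not bytes.
wrong = str(len(text)).encode("ascii") + b":" + encoded + b","
# Right: the length prefix counts the bytes actually sent.
right = str(len(encoded)).encode("ascii") + b":" + encoded + b","

# A reader that trusts the wrong prefix looks for the tag one byte too early
# and finds b"c" where it expects b",".
prefix_len = len(b"4:")
print(wrong[prefix_len + 4:prefix_len + 5])  # b'c'
print(right)                                 # b'5:lo\xc3\xafc,'
```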

loïc

Re: [mongrel2] tnetstrings round2

From:
Ryan Kelly
Date:
2011-03-20 @ 23:14
On Sun, 2011-03-20 at 16:11 -0700, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote:
> > So what's the difference between a string and a blob?  Since this is a
> > wire-format, they're both coming in as bytes.  Is there an encoding or
> > something?
> 
> Good point, maybe say just , and no " and say it's bytes always, with no
> interpretation?

Or to quote the mongrel2 manual: "Sorry, Unicodians, It's All ASCII"

  Cheers,

     Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] tnetstrings round2

From:
joshua simmons
Date:
2011-03-20 @ 23:18
It's not ascii though, just make it 8 bit clean and everybody's happy.
Our wire formats may be ascii, or not, it shouldn't matter to the protocol.

On Mon, Mar 21, 2011 at 10:14 AM, Ryan Kelly <ryan@rfk.id.au> wrote:

> On Sun, 2011-03-20 at 16:11 -0700, Zed A. Shaw wrote:
> > On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote:
> > > So what's the difference between a string and a blob?  Since this is a
> > > wire-format, they're both coming in as bytes.  Is there an encoding or
> > > something?
> >
> > Good point, maybe say just , and no " and say it's bytes always, with no
> > interpretation?
>
> Or to quote the mongrel2 manual: "Sorry, Unicodians, It's All ASCII"
>
>  Cheers,
>
>     Ryan
>