[wp-hackers] Portable tokenising from the shell

Doug Stewart zamoose at gmail.com
Sat Dec 1 20:10:48 UTC 2012


Does the UCB version of grep support the flags this needs (-E and -o)?


On Sat, Dec 1, 2012 at 11:57 AM, David Anderson <david at wordshell.net> wrote:

> Hi,
>
> Some of you may remember an earlier discussion about parsing JSON output,
> which is one of the formats available from api.wordpress.org. JSON was
> the most suitable for portably parsing from a Bourne/Bash shell.
>
> This guy has implemented such a parser already:
> http://github.com/dominictarr/JSON.sh
>
> One part of the parser is this. It's the tokeniser, splitting up the JSON
> into parts:
>
>     local ESCAPE='(\\[^u[:cntrl:]]|\\u[0-9a-fA-F]{4})'
>     local CHAR='[^[:cntrl:]"\\]'
>     local STRING="\"$CHAR*($ESCAPE$CHAR*)*\""
>     local NUMBER='-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?'
>     local KEYWORD='null|false|true'
>     local SPACE='[[:space:]]+'
>     grep -E -o "$STRING|$NUMBER|$KEYWORD|$SPACE|."
>
> It's an interesting use of grep; basically it matches *everything*, but
> splits it up based on certain separators, in a certain order.
>
> However... my research shows that the "-o" switch (which causes grep to
> output only each matched portion, one per line) is not part of POSIX, but
> is nonetheless available in GNU (hence Linux and Cygwin), Free/Net/OpenBSD
> and Mac OS X - but not in Solaris (either in the grep in /usr/bin or in
> /usr/xpg4/bin).
>
> So it's not quite totally portable. My question: does anyone have
> sufficient sed or awk skills to advise me how to reproduce the above in one
> of those? As I said, it's a tokeniser, that splits the input into the
> discrete chunks indicated. I'm an awk novice. I'm trying to write code that
> assumes only POSIX, or failing that the common subset of
> GNU/BSD/Mac/Solaris. If I fail I can use various hacks (e.g. search for
> perl, use that if found, search for PHP, use that), but it'd be nice if I
> didn't have to resort to multiple code paths in that way.
>
> Many thanks,
> David
>
> --
> WordShell - WordPress fast from the CLI - www.wordshell.net
>
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>
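
If grep -o turns out to be unavailable, one possible way to reproduce the
quoted tokeniser is awk's match() function, which sets RSTART and RLENGTH
and so lets you peel off one token at a time. The following is only a
sketch, not JSON.sh's actual code. It assumes a POSIX-style awk (on Solaris
that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk) with
support for [:cntrl:] and [:space:]. The function name "tokenize" and the
environment variable TOKEN_RE are made up for illustration, and {4} is
written out longhand because some awks lack interval expressions.

```shell
#!/bin/sh
# Sketch: emulate `grep -E -o "$STRING|$NUMBER|$KEYWORD|$SPACE|."` with awk.
# "tokenize" and TOKEN_RE are hypothetical names, not part of JSON.sh.
tokenize() {
    # Note: no `local` here (it is not POSIX), so these are global variables.
    HEX='[0-9a-fA-F]'
    # Four copies of $HEX stand in for [0-9a-fA-F]{4}.
    ESCAPE='(\\[^u[:cntrl:]]|\\u'"$HEX$HEX$HEX$HEX"')'
    CHAR='[^[:cntrl:]"\\]'
    STRING="\"$CHAR*($ESCAPE$CHAR*)*\""
    NUMBER='-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?'
    KEYWORD='null|false|true'
    SPACE='[[:space:]]+'

    # Hand the pattern over via the environment: unlike -v assignments,
    # ENVIRON values are not subject to awk's string escape processing.
    TOKEN_RE="$STRING|$NUMBER|$KEYWORD|$SPACE|." awk '
        BEGIN { re = ENVIRON["TOKEN_RE"] }
        {
            rest = $0
            # The trailing "." alternative matches any single character,
            # so each match starts at the front of what is left.
            while (rest != "" && match(rest, re) && RLENGTH > 0) {
                print substr(rest, RSTART, RLENGTH)
                rest = substr(rest, RSTART + RLENGTH)
            }
        }
    '
}

# Usage: some_command_emitting_json | tokenize
```

Like grep -o, this works line by line and prints each token on its own
line, whitespace runs included, so the rest of the parser should not need
to change. I have not tried it on Solaris.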



-- 
-Doug

