x404.co.uk
http://www.x404.co.uk/forum/

SED/regex help
http://www.x404.co.uk/forum/viewtopic.php?f=3&t=5672

Author:  forquare1 [ Fri Jan 22, 2010 2:10 pm ]
Post subject:  SED/regex help

Hi all,

I have a number of files with a line like this:
Quote:
<h2 class="title" style="clear: both"><a xmlns="http://www.w3.org/1999/xhtml" id="id2826402"/>Conclusion: From Custom to Customary Law</h2></div></div><p xmlns="http://www.w3.org/1999/xhtml">We have examined the customs which regulate the ownership and control


I have a statement doing this:
Code:
# Mod <p> and <h> tags to preserve them
cat $FILE | sed 's/<p.*>/[p]/g' | sed 's/<\/p>/[\/p]/g' | sed 's/<h\([1-3]\)\(.\)*></[h\1]</g' | sed 's/<\/h\([1-3]\)\(.\)*></[\/h\1]</g' > $tmp1


It cats the file, changes the <> around the p and /p tags to [], and does the same for the h* and /h* tags. However, the statement produces this:
Quote:
[h2][p]We have examined the customs which regulate the ownership and control


The third sed expression matches everything up to the last '>' (just before the [p] tag). How do I make it match only up to the FIRST '>' it comes to? That is, I want this:

Quote:
[h2]<a xmlns="http://www.w3.org/1999/xhtml" id="id2826402"/>Conclusion: From Custom to Customary Law[/h2]</div></div>[p]We have examined the customs which regulate the ownership and control


Thanks,
Ben
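The behaviour described above can be reproduced on a shortened, made-up line. POSIX basic and extended regular expressions (which sed uses) have no lazy `.*?` quantifier, and `.*` always takes the longest match, so it runs to the last `>` it can reach. The usual idiom is a negated bracket expression, `[^>]*`, which cannot cross a `>` and therefore stops at the first one. A minimal sketch (the sample string and variable names are illustrative only):

```shell
#!/bin/sh
# Shortened, made-up stand-in for the real input line.
line='<h2 class="title"><a id="x"/>Title</h2></div><p>Body'

# Greedy: '.*' takes as much as it can, so the match runs to the LAST '>' on the line.
greedy=$(printf '%s\n' "$line" | sed 's/<h\([1-3]\).*>/[h\1]/')

# Negated class: '[^>]*' cannot cross a '>', so the match stops at the FIRST one.
first=$(printf '%s\n' "$line" | sed 's/<h\([1-3]\)[^>]*>/[h\1]/')

printf 'greedy: %s\n' "$greedy"
printf 'first:  %s\n' "$first"
```

The greedy version collapses everything up to `<p>` into `[h2]Body`, while the negated-class version rewrites only the opening tag and leaves `<a id="x"/>Title</h2></div><p>Body` intact.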

Author:  forquare1 [ Sat Jan 23, 2010 5:36 pm ]
Post subject:  Re: SED/regex help

Solved :D

In the end I split the statement into two lines when I cleaned up the script, and after some experimenting I got it to do what I wanted:
Code:
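# Mod <p> and <h> tags to preserve them
# [^>]* matches any run of characters except '>', so each match stops at the first '>'
cat "$FILE" | sed 's/<p[^>]*>/[p]/g' | sed 's/<\/p>/[\/p]/g' > "$tmp1"
cat "$tmp1" | sed 's/<h\([1-3]\)\([^>]\)*>/[h\1]/g' | sed 's/<\/h\([1-3]\)\([^>]\)*>/[\/h\1]/g' > "$tmp2"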


It was simple once I re-read my book on pattern matching; I had missed it the first few times I skimmed through it.

Now, after a few more scripts, I've got a legal, up-to-date copy of "The Cathedral and the Bazaar" by Eric Raymond. It's a good read :D
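For reference, a sketch of the same four substitutions folded into a single sed call with `-e`, applied to the sample line from the question. Every tag body is written as `[^>]*`, including the `<p>` pattern: a greedy `<p.*>` would still swallow text on a line such as `<p>a</p>`, turning the whole thing into `[p]`. The function name `preserve_tags` is hypothetical, and `<p[^>]*>` will also match tags like `<pre>`, so this is not a general HTML rewriter.

```shell
#!/bin/sh
# Hypothetical helper: the thread's substitutions in one sed invocation,
# each tag body written as [^>]* so no match can run past the first '>'.
preserve_tags() {
    sed -e 's/<p[^>]*>/[p]/g' \
        -e 's/<\/p>/[\/p]/g' \
        -e 's/<h\([1-3]\)[^>]*>/[h\1]/g' \
        -e 's/<\/h\([1-3]\)>/[\/h\1]/g'
}

# Sample line copied from the question.
sample='<h2 class="title" style="clear: both"><a xmlns="http://www.w3.org/1999/xhtml" id="id2826402"/>Conclusion: From Custom to Customary Law</h2></div></div><p xmlns="http://www.w3.org/1999/xhtml">We have examined the customs which regulate the ownership and control'

result=$(printf '%s\n' "$sample" | preserve_tags)
printf '%s\n' "$result"
```

With a real file this would be `preserve_tags < "$FILE" > "$tmp1"`, which also avoids the extra `cat` and the chain of separate sed processes.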

Author:  EddArmitage [ Tue Feb 02, 2010 4:27 pm ]
Post subject:  Re: SED/regex help

My turn:

It's been a bit of a long day and my brain's clearly missing something obvious. I have a file containing lines of input. I want to use grep to select those that end in a forward slash (ultimately I want to select everything but them, but that's a simple flag). What regexp do I need? I think I've tried everything obvious:

Code:
egrep "/$" < input


Edd

Author:  Nick [ Tue Feb 02, 2010 8:34 pm ]
Post subject:  Re: SED/regex help

Argh, my head has just imploded.

We had to write sed in C last year. Absolute hell!!!!!!!

Author:  forquare1 [ Wed Feb 03, 2010 11:51 am ]
Post subject:  Re: SED/regex help

EddArmitage wrote:
Code:
egrep "/$" < input


I'd do something like:
Code:
egrep \/$ < input

Author:  EddArmitage [ Wed Feb 03, 2010 11:57 am ]
Post subject:  Re: SED/regex help

forquare1 wrote:
EddArmitage wrote:
Code:
egrep "/$" < input

I'd do something like:
Code:
egrep \/$ < input

It worked fine in the end as it was, once the input was piped straight in from the previous stage. I swear there must be something installed on these damn CSC machines that uses hamsters as line endings!
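The pattern `/$` in the question is correct, and `grep -v` gives the inverse selection. In bash and other common shells the unquoted `\/$` suggested above also reaches grep as `/$` (the backslash is removed by the shell and a trailing `$` stays literal), so both commands search for the same thing. When a correct pattern matches piped input but not a file, hidden characters at the end of each line are a likely suspect. The sketch below assumes, without confirmation from the thread, that the "hamsters" were DOS/Windows CRLF line endings; `crlf_sample` is a hypothetical stand-in for the `input` file.

```shell
#!/bin/sh
# Assumption: the file had CRLF line endings. Each line then ends in a
# carriage return, so '/' is never the last character before '$'.

# Hypothetical stand-in for the thread's 'input' file, with CRLF endings.
crlf_sample() {
    printf 'keep/me\r\ndir/\r\nalso keep\r\n'
}

# od -c shows the hidden carriage returns as \r before each \n.
crlf_sample | od -c

# Strip the carriage returns, then keep only lines that do NOT end in '/'.
kept=$(crlf_sample | tr -d '\r' | grep -v '/$')
printf '%s\n' "$kept"
```

Against a real file the same idea is `tr -d '\r' < input | grep -v '/$'`; running `od -c input` first shows whether carriage returns are actually present.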

All times are UTC