Regex: Exclude first line with brackets - php

I am trying to strip all HTML tags, except those on the first line, using this regex:
(?ms)(?!\A)<[^>]*>
It's very close to working; unfortunately, it also strips the closing tag from the first line. The example I am working with is:
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<div id="uniqname3">
<h2 id="uniqnametitle">Title</h2>
<div class="row">
<div class="large-3 columns">Example:</div>
<div class="large-9 columns"><b>Sub example</b></div>
</div>
<div class="row">
<div class="large-3 columns">Additional</div>
The current regex removes all the other HTML tags and leaves the first line intact, except for its trailing closing div tag. It outputs the following:
<div id="uniquename">https://www.example.com?item_id=10302
Title
Example:
Sub example
Additional
If there is a better way to do this with a regex than excluding the first line, I am open to suggestions. Skipping the first line seems to be the easiest way; however, I need the closing tag to stay intact.
What am I missing in my REGEX?

You can try this
(?ms)((?<firstline>\A[^\n]*)|(<[^>]*>))
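With substitution
$firstline
(PHP's preg_replace does not support named references in the replacement string; since the named group is group 1, use $1 there instead.)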
Playground for your example - https://regex101.com/r/ASItOP/3
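For comparison, here is a minimal Python sketch of the same alternation idea (Python is used only for illustration, and the helper name strip_tags_except_first_line is hypothetical). Alternative 1 captures the whole first line, alternative 2 matches any other tag, and the replacement keeps group 1 while dropping everything else:

```python
import re

# Alternative 1: \A anchors to the very start of the string, so only the
# first line can be captured into group 1.
# Alternative 2: any other tag, which is replaced by nothing.
FIRST_LINE_OR_TAG = re.compile(r"(\A[^\n]*)|<[^>]*>")

def strip_tags_except_first_line(html):
    # Keep the captured first line; every other match becomes "".
    return FIRST_LINE_OR_TAG.sub(lambda m: m.group(1) or "", html)

sample = '<div id="a">x</div>\n<h2>Title</h2>\n<p><b>Sub</b></p>'
print(strip_tags_except_first_line(sample))
```

Lines that contain only tags are left empty, so you may still want to remove blank lines afterwards.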

UPDATE 1: I just realized it could be massively simplified:
gawk 'NR==!_ || (NF=NF)*/./' FS='<[^>]+>' OFS=
mawk 'NR==!_ || (NF=NF)*/./' FS='^(<[^>]+>)+|(<[/][^>]+>)+$' OFS=
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional

In general, you should use an HTML parser.
However, you can do:
$ cat <(head -n 1 file) <(sed 1d file | sed -E 's/<[^>]*>//g; /^$/d')
Or an awk:
$ awk 'FNR==1 {print; next}
{gsub(/<[^>]*>/,""); if ($0) print}' file
Either prints:
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional
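The head/sed pipeline above can also be sketched line by line in Python (a minimal illustration; the function name strip_tags_after_first_line is hypothetical):

```python
import re

# Any HTML tag: "<", then everything up to the next ">"
TAG = re.compile(r"<[^>]*>")

def strip_tags_after_first_line(text):
    """Keep line 1 verbatim; strip tags from the other lines and drop
    the lines that end up empty (like sed 's/<[^>]*>//g; /^$/d')."""
    first, *rest = text.split("\n")
    cleaned = (TAG.sub("", line) for line in rest)
    return "\n".join([first] + [line for line in cleaned if line])
```

Because the first line is split off before any substitution runs, its closing tag can never be touched.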

Related

How to receive values of html div tag with specific class?

I am parsing an HTML string, but I have a problem. I would like to use a regex, rather than the client-side DOM, to get the values inside the divs with the class product__info__value.
I have tried the following code:
$reg_ex = "<div[^<>]*class=\"my-class\"[^<>]*>[\s\S]*?</div>";
But it didn't really work for me.
This is the html input:
<div class="product__info__group">
<div class="product__info__name">Производитель</div>
<div class="product__info__value">Holzhof</div>
</div>
<div class="product__info__group">
<div class="product__info__name">Страна</div>
<div class="product__info__value"></div>
</div>
I need these values as an array (Производитель means "Manufacturer" and Страна means "Country"):
Производитель, Holzhof, Страна
Thank you very much for your help; I really appreciate it!
You can break it into three parts: the part before the name/value, the name/value itself (what you want), and what comes after it.
product__info__(?:name|value)"> # THE END OF THE TAG
(?:[^<]+) # ANYTHING THAT'S NOT AN OPENING ANGLE BRACKET
(?=<) # A LOOKAHEAD WITH AN OPENING ANGLE BRACKET
All put together, with \K discarding everything matched before it from the reported match, it would look like this:
preg_match_all('~product__info__(?:name|value)">\K(?:[^<]+)(?=<)~', $string, $matches);
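Python's re module does not support \K, so a minimal Python sketch of the same pattern uses a capture group instead; re.findall then returns only the captured text (info_values is a hypothetical helper name):

```python
import re

# Same three parts as above; the capture group replaces \K.
# [^<]+ needs at least one character, so empty divs are skipped.
PATTERN = re.compile(r'product__info__(?:name|value)">([^<]+)(?=<)')

def info_values(html):
    # findall returns the captured group for every match, in order
    return PATTERN.findall(html)
```

With the HTML from the question this yields the names and the non-empty value in document order.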

How to convert html into one line string?

I have some large HTML templates:
<% /*
This is basically all the markup and interface
/* %>
<div id="test" class="test-right"></div>
<div id="test" class="test-right"></div>
I need them as a one-line string, for example:
<% /*\n This is basically all the markup and interface\n*/ %>\n<div id=\"test\" class=\"test-right\"></div>\n <div id=\"test\" class=\"test-right\"></div>
How can I do that?
original_string='test this id="one" test this id="two"'
# Escape every double quote ("//" replaces all matches, not just the first)
result_string="${original_string//\"/\\\"}"
If you want to replace newlines with a literal \n, you can use awk:
awk -v ORS='\\n' 1 file
This sets the output record separator ORS to a literal \n.
The 1 is an always-true pattern, so awk performs its default action: printing the entire record.
If you are looking for a pure bash way to do this, you can run the following commands one at a time on the command line or in a script:
# Store the contents of the file into a variable
fileContents="$(<file)"
# Create a temporary file using GNU mktemp
tempfile="$(mktemp)"
# Use parameter substitution to replace every newline character with
# the empty string, and write the result to the temporary file
printf "%s\n" "${fileContents//$'\n'/}" > "$tempfile"
# Move the temporary file back over the original file
mv "$tempfile" file
But since bash is slower for this trivial task, you can use awk instead, resetting ORS from a newline to the empty string:
awk -v ORS="" 1 file
<% /* This is basically all the markup and interface/* %><div id="test" class="test-right"></div> <div id="test" class="test-right"></div>
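If you need the exact format from the question (a literal \n plus escaped double quotes), a minimal Python sketch could look like this. to_one_line is a hypothetical name, and escaping existing backslashes first is an assumption, made so that the result stays a valid quoted string:

```python
def to_one_line(template):
    # 1. Escape existing backslashes so the escapes added below stay unambiguous.
    # 2. Escape double quotes.
    # 3. Turn each newline into a literal backslash followed by "n".
    return (template.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
```

The order matters: escaping backslashes last would double the ones just added for quotes and newlines.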

How to fetch all anchor tag text from div using regexp

I have the HTML content below, from which I want to fetch all the text between anchor tags:
<div class="row mb-xlg"><div class="col-md-12">
<div class="heading heading-border heading-middle-border"><h3>Compatible Models</h3></div>
<div class="row show-grid">
<div class="col-md-4">PC19S80</div>
<div class="col-md-4">PC25580</div>
<div class="col-md-4">PC25S80</div>
<div class="col-md-4">PC27S80</div>
</div></div></div>
I have the regular expression below, which returns all the text between anchor tags:
<a[^>]*>([^<]+)<\/a>+
Tested on this website
Result -
Full match `PC25580`
Group 1. `PC25580`
Match 3
Full match `PC25S80`
Group 1. `PC25S80`
Match 4
Full match `PC27S80`
Group 1. `PC27S80`
But I want to add a condition on the "Compatible Models" heading, like this:
<h3>Compatible Models<\/h3>.*?<a[^>]*>([^<]+)<\/a>+
With this condition, it returns only the first anchor tag's text.
How can I get the text of all the anchor tags and store it in an array?
Don't use regular expressions for this. Instead, use a DOM parser:
How do you parse and process HTML/XML in PHP?
The following link contains an excellent answer on why you shouldn't use regex:
RegEx match open tags except XHTML self-contained tags
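As a minimal sketch of the parser approach, here is a version using Python's built-in html.parser module. The sample markup assumes each model is wrapped in an <a> tag (the HTML in the question no longer shows the anchors), and the class and function names are hypothetical:

```python
from html.parser import HTMLParser

class CompatibleModelsParser(HTMLParser):
    """Collects the text of each <a> that follows the
    'Compatible Models' <h3>, until another <h3> appears."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.in_a = False
        self.after_heading = False
        self.models = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a":
            self.in_a = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False
        elif tag == "a":
            self.in_a = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_h3:
            # Start (or stop) collecting depending on which heading this is
            self.after_heading = (text == "Compatible Models")
        elif self.in_a and self.after_heading:
            self.models.append(text)

def compatible_models(html):
    parser = CompatibleModelsParser()
    parser.feed(html)
    parser.close()
    return parser.models

sample = """<div class="row mb-xlg"><div class="col-md-12">
<div class="heading heading-border heading-middle-border"><h3>Compatible Models</h3></div>
<div class="row show-grid">
<div class="col-md-4"><a href="#">PC19S80</a></div>
<div class="col-md-4"><a href="#">PC25580</a></div>
</div></div></div>"""
print(compatible_models(sample))  # ['PC19S80', 'PC25580']
```

Unlike a single regex, the parser keeps state across tags, so every anchor after the heading is collected rather than only the first.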

laravel - partials are inserting empty text into HTML - " "

<div style="margin-top:-21px">
@include('partials.header')
</div>
<div style="margin-top:-21px">
@include('partials.navig')
</div>
<div style="margin-top:0px">
@include('partials.footer')
</div>
The HTML/Laravel code above inserts partials into the layout. Every time I use a partial, it inserts empty space into the HTML output, causing an ugly blank row above the partial. That's why I am using margin-top:-21px, to hide the empty row. The problem is that in Internet Explorer these blank rows are not visible, so the partials are shifted up too much. Here is the HTML output and what the empty row looks like:
I have no idea what causes this whitespace; it is not a wrong margin on the elements or anything like that. Is there any solution or explanation for this?
I found the solution:
The problem was the file encoding. Changing it from UTF-8 to UTF-8 without BOM (byte order mark) fixed it.
An alternative solution is to wrap the partial in a div with line-height:0 and set the div inside the partial back to line-height:1.
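A UTF-8 BOM is the three bytes EF BB BF at the very start of a file; when an included partial starts with it, it may end up in the page as an invisible character that shows up as an extra blank row. A minimal Python sketch to check for it and strip it (the helper names are hypothetical; most editors can also simply re-save the file without BOM):

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    # codecs.BOM_UTF8 is the byte string b'\xef\xbb\xbf'
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def has_utf8_bom(path: str) -> bool:
    # Only the first three bytes of the file are needed
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```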
This happens because you are including the partials on a new line. Try including them on the same line as the surrounding tags; that should fix your problem.
<div style="margin-top:-21px">@include('partials.header')</div>
Laravel doesn't insert blank lines. Check your partial files and make sure there are no blank lines or spaces at the beginning or the end. You should also consider placing each include directly after the opening div tag, on the same line, rather than on the next line.
For example:
<div style="margin-top:-21px">@include('partials.header')</div>
<div style="margin-top:-21px">@include('partials.navig')</div>
<div style="margin-top:0px">@include('partials.footer')</div>
If each partial contains only its file name, you get the following output:
<div style="margin-top:-21px">header</div>
<div style="margin-top:-21px">navig</div>
<div style="margin-top:0px">footer</div>
As you see - no spaces, no blank lines.
Of course, the negative-margin trick is the wrong solution; you should check your partials and switch to including them the way shown here.

preg_replace is replacing matches including content contained within

I am using preg_replace to remove HTML comment delimiters, but it removes the entire HTML comment, including its content.
echo preg_replace('/<!--(.*?)-->/','',$r->pageCont);
Where $r->pageCont is a database entry containing HTML, for example:
<div class="col-lg-12">
<p>The year is:</p>
<!-- <?php echo date('Y'); ?> -->
</div>
In the example above, I want the HTML comment delimiters stripped away, leaving only the PHP code that echoes the year. As I said, what actually happens is that the entire HTML comment is stripped away.
Can someone recommend a pattern to use? I would appreciate your input.
EDIT: updated question to reflect the code I am using.
It seems you're trying to replace the comment with the PHP code inside it. If so, use $1 as the replacement string, so that it refers to capture group 1 (the text between <!-- and -->).
echo preg_replace('/<!--(.*?)-->/', '$1', $r->pageCont);
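The same idea in Python, as a minimal sketch: r"\1" plays the role of PHP's $1, and re.DOTALL is an addition here (the equivalent of PHP's s modifier) so that comments spanning several lines are unwrapped too. unwrap_comments is a hypothetical name:

```python
import re

# (.*?) is lazy, so each match stops at the first "-->" after "<!--".
# re.DOTALL lets "." match newlines as well.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)

def unwrap_comments(html):
    # Replace each whole comment with its captured body (group 1)
    return COMMENT.sub(r"\1", html)
```

Note that the spaces just inside the comment delimiters are part of group 1, so they remain in the output.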
