Regex: Exclude first line with brackets

I am trying to strip all HTML tags except those on the first line, using this regex:
(?ms)(?!\A)<[^>]*>
It's very close to working; unfortunately, it also strips the closing tag from the first line. The example I am working with is:
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<div id="uniqname3">
<h2 id="uniqnametitle">Title</h2>
<div class="row">
<div class="large-3 columns">Example:</div>
<div class="large-9 columns"><b>Sub example</b></div>
</div>
<div class="row">
<div class="large-3 columns">Additional</div>
The current regex removes all the other HTML tags and leaves the first line alone, except for its trailing </div> closing tag, which is also removed. It outputs the following:
<div id="uniquename">https://www.example.com?item_id=10302
Title
Example:
Sub example
Additional
If there is a better way to do this than excluding the first line, I am open to suggestions. Skipping the first line seems to be the easiest way; however, I need the closing tag on that line to stay intact.
What am I missing in my regex?

You can try this:
(?ms)((?<firstline>\A[^\n]*)|(<[^>]*>))
with the substitution:
$firstline
Playground for your example: https://regex101.com/r/ASItOP/3
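The reason the original pattern fails: (?!\A) only rejects a match that starts at the very beginning of the string, so it spares the opening <div id="uniquename"> but not the </div> later on the same line. Outside PHP, the same "leave line 1 alone" rule can be expressed with a sed line address instead of a regex alternation. This is a swapped-in technique rather than the regex above, and the function name strip_tags_after_line1 is hypothetical:

```shell
# strip_tags_after_line1 [FILE...]: hypothetical helper.
# Prints line 1 unchanged, removes every <...> tag from the other lines,
# and drops lines that are left blank. Reads stdin when no FILE is given.
strip_tags_after_line1() {
    sed -E '
        1b
        s/<[^>]*>//g
        /^[[:space:]]*$/d
    ' "$@"
}
```

Usage: strip_tags_after_line1 file (or pipe the HTML in on stdin). The 1b command branches to the end of the script for line 1, so it is printed untouched and never reaches the substitution.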

UPDATE 1: I just realized it could be massively simplified:
gawk 'NR==!_ || (NF=NF)*/./' FS='<[^>]+>' OFS=
mawk 'NR==!_ || (NF=NF)*/./' FS='^(<[^>]+>)+|(<[/][^>]+>)+$' OFS=
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional
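How the golfed awk works: FS='<[^>]+>' turns every tag into a field separator, and assigning to NF (which gawk and mawk support) makes awk rebuild $0 from the fields joined by OFS, which is empty here, so the tags vanish. NR==!_ is a golfed NR==1 (the unset variable _ is 0, so !_ is 1), and multiplying by /./ suppresses lines that end up empty. A de-golfed sketch of the gawk variant, using $1 = $1, the POSIX-specified way to force the same rebuild (the helper name is hypothetical):

```shell
# strip_tags_keep_line1 [FILE...]: hypothetical helper mirroring the golfed
# gawk one-liner, written out for readability.
strip_tags_keep_line1() {
    awk -v FS='<[^>]+>' -v OFS='' '
        NR == 1 { print; next }   # first line: print exactly as read
        {
            $1 = $1               # rebuild $0 from the fields joined by OFS (""), dropping the tags
            if ($0 ~ /./) print   # skip lines that contained nothing but tags
        }' "$@"
}
```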

In general, you should use an HTML parser.
However, you can do:
$ cat <(head -n 1 file) <(sed 1d file | sed -E 's/<[^>]*>//g; /^$/d')
Or with awk:
$ awk 'FNR==1 {print; next}
{gsub(/<[^>]*>/,""); if ($0) print}' file
Either prints:
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional

Related

How to receive values of html div tag with specific class?

I am parsing an HTML string, but I have a problem. I would like to get the values inside of the divs with the class of product__info__value using regex but not with the client side DOM.
I have tried the following code:
$reg_ex = "<div[^<>]*class=\"my-class\"[^<>]*>[\s\S]*?</div>";
But it didn't really work for me.
This is the html input:
<div class="product__info__group">
<div class="product__info__name">Производитель</div>
<div class="product__info__value">Holzhof</div>
</div>
<div class="product__info__group">
<div class="product__info__name">Страна</div>
<div class="product__info__value"></div>
</div>
I need these values in array form:
Производитель, Holzhof, Страна
Thank you very much for your help!

You can break it into three parts: the part before the name/value, the actual name/value that you want, and the part after it.
product__info__(?:name|value)"> # THE END OF THE TAG
(?:[^<]+) # ANYTHING THAT'S NOT AN OPENING ANGLE BRACKET
(?=<) # A LOOKAHEAD WITH AN OPENING ANGLE BRACKET
All put together, it would look like this:
preg_match_all('~product__info__(?:name|value)">\K(?:[^<]+)(?=<)~', $string, $matches);
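In that pattern, \K drops everything matched so far from the reported match, and (?=<) checks for the next tag without consuming it, so only the text between > and < is returned. Because [^<]+ needs at least one character, the empty product__info__value div is skipped, which is why the result is Производитель, Holzhof, Страна. POSIX grep has neither \K nor lookaheads, so a shell sketch has to match the prefix and strip it afterwards. This is a swapped-in technique, and the helper name is hypothetical:

```shell
# product_info_texts [FILE...]: hypothetical helper printing the text of every
# product__info__name / product__info__value element, one per line.
product_info_texts() {
    grep -ohE 'product__info__(name|value)">[^<]+' "$@" |  # prefix + text, one match per line
        sed 's/^[^>]*>//'                                   # drop the prefix up to the first ">"
}
```

In bash 4 or later, mapfile -t values < <(product_info_texts page.html) would load the results into an array (page.html is a placeholder name).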

How to convert html into one line string?

I have some large templates as html:
<% /*
This is basically all the markup and interface
/* %>
<div id="test" class="test-right"></div>
<div id="test" class="test-right"></div>
I need this as a one-line string, for example:
<% /*\n This is basically all the markup and interface\n*/ %>\n<div id=\"test\" class=\"test-right\"></div>\n <div id=\"test\" class=\"test-right\"></div>
How can I do that?
original_string='test this id="one" test this id="two"'
string_to_replace_Suzi_with='\"'
result_string="${original_string//\"/"$string_to_replace_Suzi_with"}"

If you want to replace newlines with a literal \n, you can use awk:
awk -v ORS='\\n' 1 file
This sets the output record separator ORS to a literal \n: escape sequences in a -v assignment are processed, so '\\n' becomes a backslash followed by n.
The 1 is an always-true pattern, so awk runs its default action, which prints the entire record.
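The ORS approach joins the lines, but it leaves the double quotes unescaped and also appends \n after the last line, whereas the target string in the question escapes every " as \". A sketch that handles both, assuming the input contains no backslashes that would also need escaping (the helper name is hypothetical):

```shell
# to_one_line [FILE...]: hypothetical helper that joins all lines with a
# literal \n and escapes double quotes as \".
to_one_line() {
    awk '
        {
            gsub(/"/, "\\\\\"")                        # the string "\\\\\"" holds \\", and \\ in a replacement is one literal \
            printf "%s%s", (NR > 1 ? "\\n" : ""), $0   # literal \n between lines, none before the first
        }
        END { print "" }                               # finish with a real newline
    ' "$@"
}
```

Usage: to_one_line template.html (a placeholder file name), or pipe the template in on stdin.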

If you are looking for a pure bash way to do this, you can run the following commands one at a time on the command line or put them in a script.
# Store the contents of the file in a variable
fileContents="$(<file)"
# Create a temporary file using mktemp
tempfile="$(mktemp)"
# Use parameter expansion to replace every newline character with the
# empty string and write the result to the temporary file
printf "%s\n" "${fileContents//$'\n'/}" > "$tempfile"
# Move the temporary file over the original file
mv "$tempfile" file
But since bash is slower at this task, you can instead use awk and reset ORS from a newline to the empty string:
awk -v ORS="" 1 file
<% /* This is basically all the markup and interface/* %><div id="test" class="test-right"></div> <div id="test" class="test-right"></div>

How to fetch all anchor tag text from div using regexp

I have the HTML content below, from which I want to fetch all text between anchor tags:
<div class="row mb-xlg"><div class="col-md-12">
<div class="heading heading-border heading-middle-border"><h3>Compatible Models</h3></div>
<div class="row show-grid">
<div class="col-md-4">PC19S80</div>
<div class="col-md-4">PC25580</div>
<div class="col-md-4">PC25S80</div>
<div class="col-md-4">PC27S80</div>
</div></div></div>
I have the regular expression below, which returns all text between anchor tags:
<a[^>]*>([^<]+)<\/a>+
Tested on this website
Result -
Full match `PC25580`
Group 1. `PC25580`
Match 3
Full match `PC25S80`
Group 1. `PC25S80`
Match 4
Full match `PC27S80`
Group 1. `PC27S80`
But I want to add a condition on the "Compatible Models" heading, like:
<h3>Compatible Models<\/h3>.*?<a[^>]*>([^<]+)<\/a>+
With this condition, it returns only the first anchor tag's text.
How can I get the text of all anchor tags and store it in an array?

Don't use regular expressions for this. Instead, you should use a DOM parser:
How do you parse and process HTML/XML in PHP?
The next link contains an excellent answer explaining why you shouldn't use regex:
RegEx match open tags except XHTML self-contained tags
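As for why the attempted pattern returns only one result: once <h3>Compatible Models<\/h3>.*?<a...> has matched, the next search starts after that match, where the heading no longer appears, so it cannot match again. If a parser is not an option, a line-oriented workaround is to keep only the text from the heading onward and then pull out each anchor. This is a sketch, assuming the models really are wrapped in <a> tags as the question's regex implies (the HTML quoted above shows only the divs) and that no unrelated links follow the list; the helper name is hypothetical:

```shell
# compatible_models [FILE...]: hypothetical helper printing the text of every
# <a>...</a> on or after the "Compatible Models" heading line.
compatible_models() {
    sed -n '/<h3>Compatible Models<\/h3>/,$p' "$@" |   # keep the heading line and everything after it
        grep -oE '<a[^>]*>[^<]+</a>' |                  # each anchor on its own line
        sed -E 's/<a[^>]*>([^<]+)<\/a>/\1/'             # keep only the link text
}
```

In bash 4 or later, mapfile -t models < <(compatible_models page.html) stores the results in an array (page.html is a placeholder name).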

laravel - partials are inserting empty text into HTML - " "

<div style="margin-top:-21px">
@include('partials.header')
</div>
<div style="margin-top:-21px">
@include('partials.navig')
</div>
<div style="margin-top:0px">
@include('partials.footer')
</div>
The HTML/Laravel code above inserts partials into the layout. Every time I use a partial, it inserts empty space into the HTML output, causing an ugly blank row above the partial. That's why I am using margin-top:-21px, to hide the empty row. The problem is that in Internet Explorer those blank rows are not visible, so the partials are shifted up too much.
I have no idea what causes this white space; it is not a wrong margin on the elements or anything like that. Is there a solution or an explanation for this?

I found the solution:
The problem was the file encoding. Changing it from UTF-8 to UTF-8 without BOM fixed it.
An alternative is to wrap each partial in a div with line-height:0 and set the div inside the partial back to line-height:1.
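A UTF-8 BOM is the three bytes EF BB BF at the start of a file; when a Blade partial begins with one, those bytes end up in the page output, which is consistent with the stray blank row described above. A POSIX shell sketch for detecting and stripping it (the helper names are hypothetical):

```shell
# has_bom FILE: succeeds if FILE starts with the UTF-8 byte order mark (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -A n -t x1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: rewrite FILE without its first three bytes.
# Call it only when has_bom succeeds, or real content will be cut off.
strip_bom() {
    tail -c +4 "$1" > "$1.nobom" && mv "$1.nobom" "$1"
}
```

For example, assuming the partials live in Laravel's default resources/views/partials directory: for f in resources/views/partials/*.blade.php; do has_bom "$f" && strip_bom "$f"; done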

This is because you are including the partials on a new line. Try including them on the same line, and it should fix your problem:
<div style="margin-top:-21px">@include('partials.header')</div>

Laravel doesn't insert blank lines. You should look at your partial files and make sure there are no blank lines or spaces at the beginning or the end. You should also consider putting each @include right after the opening div tag rather than on the next line.
For example:
<div style="margin-top:-21px">@include('partials.header')</div>
<div style="margin-top:-21px">@include('partials.navig')</div>
<div style="margin-top:0px">@include('partials.footer')</div>
If each partial contains only its file name, you will get the following output:
<div style="margin-top:-21px">header</div>
<div style="margin-top:-21px">navig</div>
<div style="margin-top:0px">footer</div>
As you can see, there are no spaces and no blank lines.
Of course, the negative-margin trick is the wrong solution, so you should check your partials and also change how you include them, as shown here.

preg_replace is replacing matches including content contained within

I am using preg_replace to remove the HTML comment markers (<!-- and -->), but it replaces the entire HTML comment, contents included, with an empty string.
echo preg_replace('/<!--(.*?)-->/','',$r->pageCont);
Where $r->pageCont is a database entry containing HTML, for example:
<div class="col-lg-12">
<p>The year is:</p>
<!-- <?php echo date(Y); ?> -->
</div>
In the example above, I want the comment markers stripped away, leaving only the PHP code that echoes the year. Instead, the entire HTML comment is being removed.
Can someone recommend a pattern to use? Would appreciate your input.
EDIT: updated question to reflect the code I am using.

It seems like you're trying to replace the comment with the PHP code inside it. If so, use $1 as the replacement string so that it refers to capture group 1:
echo preg_replace('/<!--(.*?)-->/', '$1', $r->pageCont);
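With '' as the replacement, preg_replace discards the whole match, markers and contents alike; $1 writes capture group 1 (the text between <!-- and -->) back in. The same capture-and-reinsert idea in sed is sketched below. POSIX ERE has no lazy .*?, so the comment body is spelled out explicitly; the helper name is hypothetical:

```shell
# uncomment_html [FILE...]: hypothetical helper that replaces each <!-- ... -->
# with the text between the markers.
# The body ([^-]|-[^-])* stands in for the lazy .*? used in PHP: it cannot run
# past a "--", so each match stops at the first closing -->.
uncomment_html() {
    sed -E 's/<!--(([^-]|-[^-])*)-->/\1/g' "$@"
}
```

Keep in mind that echoing the result sends the text <?php echo date(Y); ?> to the browser as-is; PHP code stored in a database string is not executed by echo.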

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
