Capturing Words in HTML tag

I want to know the most efficient regular expression for capturing keywords in a piece of HTML.
Note that I am using PHP.
I have a piece of HTML code like this:
...
<li><span class="fl">
Dish</span>
<div class="oflow">
<span class="1F4446484E1FCB4FC3C21FC04AC6C21E232020211F underline">
pasta</span>
, <span class="1F4446484E1FCB4FC3C21FC04AC6C21E23202A251F underline">
rice</span>
, <span class="1F4446484E1FCB4FC3C21FC04AC6C21E2320202B1F underline">
potatoes</span>
</div>
</li>
...
I want to select the available dishes (pasta, rice and potatoes), knowing that the only word that is always the same is "Dish" and that each keyword I want to retrieve is always wrapped in its own span.
Thank you in advance.

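<?php
// strip_tags() drops the markup; explode() splits on the commas between the dishes.
// Note: the "Dish" label stays attached to the first item unless $sHtml holds only the div.oflow part.
$aDishes = array_map('trim', explode(',', strip_tags($sHtml)));
?>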
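For comparison, here is a sketch that avoids both the regex and the comma split by using PHP's built-in DOM extension (DOMDocument plus DOMXPath). It assumes the structure shown in the question: the keywords are the span elements inside the div that directly follows the span containing "Dish". The function name extractDishes is hypothetical.

```php
<?php
// Sketch: extract the dish names with PHP's built-in DOM extension instead of a regex.
// Assumption: the keywords are the <span>s inside the <div> that directly follows
// the <span> whose text is "Dish" (the only word the question says is stable).
function extractDishes(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // scraped markup is rarely valid; collect warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $spans = $xpath->query('//span[normalize-space(.)="Dish"]/following-sibling::div[1]/span');

    $dishes = [];
    foreach ($spans as $span) {
        $dishes[] = trim($span->textContent);   // drop the line breaks around each word
    }
    return $dishes;
}

// Usage: $aDishes = extractDishes($sHtml);
```

Because the query anchors on the text "Dish" rather than on the generated class names, it should keep working if those class strings change.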

Related

How to get content from Div which have other HTML tags using Regexp

I have a div that contains other HTML tags along with text.
I want to extract only the text from this div, i.e. the text inside all of its HTML tags.
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div><!-- /end .header -->
<div class="inner m-bsc">
<ul>
<li>Repair Video</li>
<li>Repair Q&A</li>
</ul>
</div>
<div>
<br>
<span class="h4">Cross Reference Information</span><br>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152, 285140, 285743, 285753, 3352470, 3363664, 3364002, 3364003, 62672, 62693, 661560, 80008, 8559748, AH1485646, EA1485646, PS1485646.
<br>
</p>
</div>
</div>
Here is my regex:
preg_match_all("/<div class=\"rpr-help m-chm\">(.*)<\/.*>/s", $urlcontent, $description);
It works fine whenever I assign just this div to the $urlcontent variable.
But when I fetch the data from a real URL (e.g. $urlcontent = "www.test.com/test.html";),
it returns the whole page's source.
How can I get the inner content of <div class="rpr-help m-chm">?
Does my regex need any correction?
Any help would be appreciated. Thanks

It's not possible to parse HTML/XHTML reliably with regex. Source:
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.
Depending on the language you use, please consider a third-party library for HTML parsing.
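The question's pattern returns most of the page because, with the s modifier, (.*) and .* are greedy, so the match runs on to the last closing tag in the document. PHP also ships a DOM extension, so a third-party library is not strictly required. A minimal sketch, assuming the div's class attribute is exactly "rpr-help m-chm" as in the question; getRepairHelpText is a hypothetical name:

```php
<?php
// Sketch: get the text inside <div class="rpr-help m-chm"> with PHP's built-in DOM
// extension. The exact class string is taken from the question.
function getRepairHelpText(string $html): ?string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="rpr-help m-chm"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;   // div not found
    }
    // textContent joins the text of every descendant node, with the tags removed.
    return trim($nodes->item(0)->textContent);
}

// Usage (URL from the question):
// $text = getRepairHelpText(file_get_contents('http://www.test.com/test.html'));
```

Unlike a regex, the parser knows where the matching </div> is, so nested divs inside the target do not cut the result short.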

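Use this function:
function GetclassContent($tagStart, $tagEnd, $content)
{
    // Keep only what follows the first occurrence of the start tag.
    $first_step = explode($tagStart, $content, 2);
    if (count($first_step) < 2) {
        return '';   // start tag not found
    }
    // Cut at the first end tag after it. Caveat: with nested <div>s this stops
    // at the first inner </div>, not at the matching one.
    $second_step = explode($tagEnd, $first_step[1], 2);
    return $second_step[0];
}
Steps to use the above function:
$website = "http://www.test.com/test.html"; // without the scheme, file_get_contents() reads a local path
$content = file_get_contents($website);
$tagStart = '<div class="rpr-help m-chm">';
$tagEnd = "</div>";   // "</div >" (with a space) does not match the </div> tags in the page
$RequiredContent = GetclassContent($tagStart, $tagEnd, $content);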

Regex contents outside of two tags

I am trying to extract contents that lie outside two sets of html tags.
The HTML is set up like so:
<div class="col-md-4 col-sm-6 col-lg-3">
<small class="text-muted pull-right">4.4</small>
<i class="custom-icon"></i>
desired content to retrieve
<span class="text-muted">some other text here</span>
</div>
I need to retrieve the content "desired content to retrieve" which lies after the </i> and before the <span class="text-muted">.
I've tried:
$custom_regex= '#</i>(.*?)<span class="text-muted">#';
$text_scan = preg_match_all( $custom_regex, $content_to_scan, $text_array );
with no success: $text_array comes back empty.
I'm not that great with regex, so maybe my expression is incorrect for what I'm after.

Wouldn't using lookarounds be better?
(?<=<\/i>)\s*(.*?)\n.*(?=<span)
Demo: https://regex101.com/r/zK2wD8/8

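If you insist on regex, try this (in PHP, leave off the g flag: preg_match_all() already returns every match, and g is rejected as an unknown modifier):
/<\/i>\s*(.*?)\n.*<span class="text-muted"/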
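The asker's original pattern fails because, without the s modifier, . does not match newlines, and the desired text sits on its own line between </i> and the span. A minimal sketch of that fix in PHP, run against the sample markup from the question; the \s* outside the group trims the surrounding whitespace:

```php
<?php
// Sketch: the question's pattern plus the "s" modifier, so "." may cross line breaks.
$content_to_scan = '<div class="col-md-4 col-sm-6 col-lg-3">
<small class="text-muted pull-right">4.4</small>
<i class="custom-icon"></i>
desired content to retrieve
<span class="text-muted">some other text here</span>
</div>';

$custom_regex = '#</i>\s*(.*?)\s*<span class="text-muted">#s';
preg_match_all($custom_regex, $content_to_scan, $text_array);

// $text_array[1] holds capture group 1 for every match:
// here $text_array[1][0] === 'desired content to retrieve'
print_r($text_array[1]);
```

If the real page is indented or uses Windows line endings, the \s* on both sides absorbs that as well.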

Php simple html dom parser find string with any character

I have this html
<div class="price-box">
<p class="old-price">
<span class="price-label">This:</span>
<span class="price" id="old-price-326">
8,69 € </span>
</p>
<p class="special-price">
<span class="price-label">This is:</span>
<span class="price" id="product-price-326">
1,99 € </span> <span style="">/ 6.87 </span>
</p>
</div>
I need to get "1,99 €", but the number in the id 'product-price-326' is generated randomly. How can I match 'product-price-*'? I tried
foreach($preke->find('span[id="product-price-[0-9]"]') as $div)
and
foreach($preke->find('span[id="product-price-"]') as $div)
but neither works.

As per my comment, here's what you need to do:
foreach($preke->find('span[id^="product-price-"]') as $div) {} // note the ^ before the =
^= means starts with.

I am not sure what $preke is, but if it's a DOM selector that supports proper attribute selectors, you can use
$preke->find('span[id^="product-price"]')
or
$preke->find('span[id*="product-price"]')
The ^= tells it to look for elements whose ID starts with "product-price", and the *= tells it to look for elements whose ID contains "product-price".

Try this, it might work:
foreach($preke->find('span[id^="product-price-"]') as $div) { /* Code */ }

Why not get it using the class?
echo $preke->find('.special-price', 0)->find('.price', 0)->plaintext;
This will get you 1,99 €.
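The same "id starts with" lookup can also be done without Simple HTML DOM, using PHP's built-in DOMXPath, where the XPath function starts-with() plays the role of the CSS ^= operator. This is a swapped-in technique, not Simple HTML DOM's API; findSpecialPrice is a hypothetical name. The meta tag is prepended because loadHTML() tends to read input without a declared charset as ISO-8859-1, which would garble the € sign.

```php
<?php
// Sketch: find the span whose id starts with "product-price-" using DOMXPath.
function findSpecialPrice(string $html): ?string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    // Declare UTF-8 so multibyte characters such as "€" survive parsing.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // starts-with() matches product-price-326, product-price-999, ...
    $nodes = $xpath->query('//span[starts-with(@id, "product-price-")]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);   // strip the surrounding whitespace
}
```

The old price is skipped because its id starts with "old-price-", not "product-price-".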

Get the content from HTML tags

I'm trying to get the content from the html tags
function get_model($html){
return preg_match('!<b>Model:</b>(.*?)<br>!i', $html, $matches) ? $matches[1] : '';
}
But it returns an empty string.
The entire html code looks like:
<div class="prodInfo">
<div class="prodOptions">
<div class="redBtn">
-
<input type="text" class="tnyTxt" value="1" name="quantity"/>
+
</div>
<br/>
<a href="/0-30cb9a-adjustable-pan-connector-p-mw555"
onclick="addToCart(139, $('.tnyTxt').val() ); return false;" class="redBtn"
id="button-cart">Add to Cart</a>
</div>
<p>
<b>Our Price: <span class="price">£5.55</span></b><br/>
<span class="grey">
(Exc. 20% VAT)<br/>
(£6.66 Inc. VAT)
</span>
</p>
<p>
<b>Model:</b> MW555<br/>
<b>Availability:</b> 2 - 3 Days</p>
</div>
I don't quite understand why. Even if I write preg_match('!<b>Model:</b>), it still returns an empty result. Could you help me, please?

Please use this PHP Simple HTML DOM Parser.
This question also has a duplicate:
How parse HTML in PHP?

I'd suggest using phpQuery for this job.
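In the markup as posted, the Model line ends in <br/>, while the question's pattern requires a literal <br>, so it cannot match there (a pattern ending in <br\s*/?> would accept both forms). As an alternative to the libraries suggested above, here is a sketch using PHP's built-in DOM extension. It keeps the question's function name and its convention of returning an empty string when nothing is found, and it assumes the value is the text node right after <b>Model:</b>:

```php
<?php
// Sketch: read the value that follows <b>Model:</b> with DOMXPath instead of a regex.
function get_model(string $html): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // silence warnings about invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // First text node right after the <b> whose text is "Model:" (here " MW555").
    $nodes = $xpath->query('//b[normalize-space(.)="Model:"]/following-sibling::text()[1]');
    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->nodeValue) : '';
}
```

Because the parser treats <br> and <br/> as the same element, this version does not care which form the page uses.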

PHP preg_replace to insert string inside an existing string

Unfortunately I really cannot get my head around regular expressions so my last resort is to ask the help of you fine people.
I have this existing code:
<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 2
Link 3
</div>
</li>
For a number of reasons, I have to use preg_replace to inject an additional piece of code:
Link 1
I think you can guess where that should go, but for the sake of clarity, my desire is for the resulting string to look like:
<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 1
Link 2
Link 3
</div>
</li>
Can anyone help me with the appropriate regular expression to achieve this?

Try this:
$html = '<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 2
Link 3
</div>
</li>';
$eleName = 'a';
$eleAttr = 'href';
$eleAttrValue = 'link2';
$addBefore = 'Link 1';
$result = regexAddBefore($html, $eleName, $eleAttr, $eleAttrValue, $addBefore);
var_dump($result);
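function regexAddBefore($subject, $eleName, $eleAttr, $eleAttrValue, $addBefore){
    // Match the whole opening tag (e.g. <a ... href="link2" ...>) and capture it as group 1.
    // preg_quote() escapes any regex metacharacters in the supplied names/values.
    $regex = "/(<\s*" . preg_quote($eleName, '/') . "[^>]*" . preg_quote($eleAttr, '/')
        . "\s*=\s*(\"|\')?\s*" . preg_quote($eleAttrValue, '/') . "\s*(\"|\')?[^>]*>)/s";
    // Emit the new markup, a line break, then the matched tag unchanged.
    $replace = $addBefore . "\r\n" . '$1';
    return preg_replace($regex, $replace, $subject);
}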

I can suggest two things (although I didn't fully understand your problem):
$newStr = preg_replace ('/<[^>]*>/', ' ', $htmlText);
This replaces every HTML tag in the string with a space. I don't know if it will be useful for you.
Another option is the strip_tags() function. Its second parameter is optional and lists the tags you want to keep.
$str = '<li id="id-21" class="listClass" data-author="newbie">
<div class="someDiv">
<span class="spanClass">Some content</span>
</div>
<div class="controls faint">
Link 2
Link 3
</div>
</li>';
echo strip_tags ($str,'<a>');
This outputs only the links plus any other text in the HTML string.
Sorry if this also doesn't help.
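Since the question requires preg_replace, here is a minimal sketch that anchors on the opening tag of the controls div instead of on an existing link. It assumes that tag appears exactly as <div class="controls faint"> and that the markup to insert (shown in the question as "Link 1") contains no $ or \ characters, which preg_replace() treats specially in the replacement. The function name insertFirstControl is hypothetical.

```php
<?php
// Sketch: insert new markup as the first line inside <div class="controls faint">.
// Assumption: $insert contains no "$" or "\" (special in preg_replace replacements).
function insertFirstControl(string $html, string $insert): string
{
    // ${1} re-emits the matched opening tag; the new markup goes on the next line.
    // The limit of 1 touches only the first such div.
    return preg_replace('#(<div class="controls faint">)#', '${1}' . "\n" . $insert, $html, 1);
}
```

Anchoring on the container rather than on Link 2 means the insert still lands in the right place if the existing links change.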
