PHP - Scraping a DIV Element from a Web Page using preg_match

I am trying to use preg_match to retrieve a single value for now (before I move on to retrieving multiple values), but I am having no luck. When I call print_r(), nothing is stored in my array.
Here is the code I have tried so far:
<?php
$content = '<div class="text-right font-90 text-default text-light last-updated vertical-offset-20">
Reported ETA Received:
<time datetime="2017-02-02 18:12">2017-02-02 18:12</time>
UTC
</div>';
preg_match('|Reported ETA Received: <time datetime=".+">(.*)</time>(.*)\(<span title=".+">(.*)<time datetime=".+">(.*)</time></span>\)|', $content, $reported_eta_received);
if ($reported_eta_received) {
$arr_parsed['reported_eta_received'] = $reported_eta_received[1];
}
?>
Required Output:
2017-02-02 18:12
The code above is not working. Any help in this regard would be appreciated. Thanks in advance.

It does not match because there is a newline between Reported ETA Received: and the <time> tag, while your pattern has only a single space there (use \s+ instead of " "). Your pattern also requires a (<span title="...">...</span>) part after </time>, which does not appear in $content.
Also, why don't you simply use:
preg_match('|<time datetime=".*?">(.*?)</time>|', $content, $reported_eta_received);
You can also use a named group, (?P<name>...), for easier access (associative vs. numeric keys: numeric indexes can shift if you add more capture groups).
preg_match('|<time datetime=".*?">(?P<name>.*?)</time>|', $content, $match);
print_r($match); // $match['name'] should be there if matched.
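A minimal sketch that combines both suggestions on the markup from the question: \s+ bridges the newline after the label, and a named group holds the value. The group name eta and the variable $pattern are only illustrative choices.

```php
<?php
$content = '<div class="text-right font-90 text-default text-light last-updated vertical-offset-20">
Reported ETA Received:
<time datetime="2017-02-02 18:12">2017-02-02 18:12</time>
UTC
</div>';

// \s+ matches the newline (or any other whitespace) between the label and the <time> tag.
// (?P<eta>...) stores the captured text under the key 'eta' (an illustrative name).
$pattern = '|Reported ETA Received:\s+<time datetime="[^"]*">(?P<eta>.*?)</time>|';

$arr_parsed = [];
if (preg_match($pattern, $content, $m) === 1) {
    $arr_parsed['reported_eta_received'] = $m['eta'];
}

print_r($arr_parsed);
```

Anchoring on the label keeps the match tied to this particular <time> element if the page contains several of them.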

Related

HTML output based on input number in PHP

I have values like so:
0.00000500
0.00003491
0.00086583
1.45304093
etc
I would like to run these through a PHP function so they become:
<span class="text-muted">0.00000</span>500
<span class="text-muted">0.0000</span>3491
<span class="text-muted">0.000</span>86583
1.45304093
What I have now is:
$input_number = str_replace('0', '<span class="text-muted">0</span>', $input_number);
$input_number = str_replace('.', '<span class="text-muted">.</span>', $input_number);
This is a bit 'aggressive' as it would replace every character instead of using the <span> once, but I guess that's OK, even if I have say 1000 numbers on a page. But the biggest problem I have is that my code would also 'mute' the last 2 digits in 0.00000500 which I don't want.

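Use preg_replace() to wrap the leading 0. and the zeros that directly follow it in a single span:
$newNumber = preg_replace('/^0\.0*/', '<span class="text-muted">$0</span>', $input_number);
Basically, /^0\.0*/ matches a leading 0. plus the run of zeros right after it, and $0 in the replacement inserts the matched text inside the span. Later zeros (such as the last two digits of 0.00000500) are not touched, and numbers that do not start with 0. (such as 1.45304093) are left unchanged. You can also replace the match with an empty string or anything else you want.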
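As a sketch of how the output asked for in the question could be produced, the example below wraps only the leading 0. and the zeros that immediately follow it. The function name muteLeadingZeros is hypothetical, and the inputs are the sample values from the question.

```php
<?php
// Hypothetical helper: wraps a leading "0." and the zeros right after it in a muted span.
function muteLeadingZeros(string $number): string
{
    // ^0\.0* matches "0." at the start plus any run of zeros that follows;
    // $0 in the replacement stands for the whole matched text.
    return preg_replace('/^0\.0*/', '<span class="text-muted">$0</span>', $number);
}

foreach (['0.00000500', '0.00003491', '0.00086583', '1.45304093'] as $n) {
    echo muteLeadingZeros($n), "\n";
}
```

Because the pattern is anchored at the start of the string, each number gets at most one span, and values such as 1.45304093 pass through unchanged.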

How to echo only the values found inside specific href attributes of a string?

[PHP] I have a variable that stores a string (the source code of a very large page). I want to echo only the parts I am interested in (dozens of them, which I need to extract for a project). They are inside the quotation marks of the href attribute of <a> tags,
but I only want to capture the values that start with the letter n (news):
[<a href="/news7044449/exclusive_news_sunday_"]
<a href="/n[ews7044449/exclusive_news_sunday_]"
That is, I think the match will have to be based on: [a href="/n]
How can I make the echo discard all the other text in the variable and show only those values?
Note that there are other href attributes whose values start with other letters, such as the letter 'p': href="/profiles... (these do not interest me).
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
Expected output: /news7044449/exclusive_news_sunday_
NOTE: the source does not have to be a variable; the text to extract from could also come from a .txt file.
Thanks.

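I believe this will help you.
<?php
// Read the page source from a file (code.html here).
$source = file_get_contents("code.html");
// Capture every href value that starts with /n.
preg_match_all('~<a href="(/n[^"]*)"[^>]*>~', $source, $results);
// $results[1] holds the captured href values.
var_export($results[1]);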

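To get just the links out of the $results array from Valdeir's answer, loop over the first capture group ($results itself is an array of arrays, so echoing its elements directly would print "Array"):
foreach ($results[1] as $r) {
    echo $r;
    // alt: to display them with an HTML break tag after each one
    // echo $r."<br>\n";
}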
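Putting the pieces together, here is a self-contained sketch that works on a string instead of a file. The $string below is a shortened, made-up stand-in for the page source in the question (the /profiles/someone link is invented to show that other hrefs are skipped), and $links is just an illustrative variable name.

```php
<?php
// Shortened stand-in for the page source; the /profiles link is invented for the demo.
$string = '<p>exclusive_news_sunday_</p><a href="/profiles/someone">x</a>'
    . '<a href="/news7044449/exclusive_news_sunday_"><img src="a.gif"></a>'
    . '<div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="b.gif"';

// Capture the href value only when it starts with /n; [^"]* stops at the closing quote.
preg_match_all('~<a\s+href="(/n[^"]*)"~', $string, $matches);

// $matches[1] is the list of captured values from the first group.
$links = $matches[1];

foreach ($links as $link) {
    echo $link, "\n";
}
```

To read the source from a .txt file instead, the string could be loaded with file_get_contents() before calling preg_match_all().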

SIMPLE HTML DOM - how to ignore nested elements?

My HTML code is as follows:
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
What I want to do is extract the 'i want this text' leaving all of the other elements behind. I've tried several iterations of the following, but none return the text I need:
$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);
Some guidance would be appreciated as the simple_html_dom section on filters is quite bare.

What about using PHP's preg_match (http://php.net/manual/en/function.preg-match.php)?
Try the following:
<?php
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;
// Only read $matches when the pattern actually matched.
if (preg_match('#class="phone".*\n(.*)#', $html, $matches)) {
    echo trim($matches[1]);
}
?>
Regex explained:
Find the text class="phone", then continue to the end of the line, matching any characters with .* (the dot does not match a newline by default). Then move to the next line with \n and capture everything on that line by enclosing .* in parentheses.
The result is stored in the array $matches. $matches[0] holds the text matched by the whole regex, while $matches[1] holds the text captured by the parentheses.
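The difference between $matches[0] and $matches[1] can be seen in a small sketch. It assumes the text to extract always sits on the line directly after the opening tag, as in the question's markup; $wanted is an illustrative variable name.

```php
<?php
$html = <<<'EOF'
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;

$wanted = null;
if (preg_match('#class="phone".*\n(.*)#', $html, $matches)) {
    // $matches[0]: everything the pattern matched, from class="phone" to the end of the next line.
    // $matches[1]: only the part captured by the parentheses, i.e. the second line.
    $wanted = trim($matches[1]);
}

var_dump($matches[0]);
var_dump($wanted);
```

This approach depends on the line layout of the markup, so it would need adjusting if the text and the opening tag were on the same line.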

ECHO Variable include newline

I have the following code:
<p><?php echo $item['desc']; ?></p>
The code pulls the following from the database:
Point 1
Point 2
Point 3
and displays it as: Point 1 Point 2 Point 3,
What do I need to do to get the new lines included? I've tried adding /n or tags to the value in the DB, but it makes no difference.

This should work
<p><?php echo nl2br($item['desc']); ?></p>
Otherwise:-
echo nl2br(str_replace(' ',"\n", $item['desc']));

The newline is included. It's just not displayed in the browser, because HTML collapses newlines and other whitespace into a single space when rendering. If you want it displayed in the browser, convert the newlines to <br> tags, e.g. using nl2br().

Maybe the newlines are being printed, but you need them to be <br> tags, so they will appear as newlines on the webpage? You can use the function nl2br() for that:
<p><?php echo nl2br($item['desc']); ?></p>
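A small sketch of what nl2br() produces, using a hard-coded array as a stand-in for the database row. The htmlspecialchars() call is an extra step that is not part of the answers here; it is included on the assumption that the stored text should be treated as plain text rather than markup.

```php
<?php
// Stand-in for a row fetched from the database.
$item = ['desc' => "Point 1\nPoint 2\nPoint 3"];

// Escape the text first (an added precaution), then insert a <br /> before each newline.
$html = '<p>' . nl2br(htmlspecialchars($item['desc'])) . '</p>';

echo $html;
```

nl2br() keeps the original newline characters and inserts the break tags in front of them, so the page source stays readable as well.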

Try this
echo str_replace(' ',"\n", $item['desc']);

\n won't work because the string you print has to be processed by the browser in html.
Use <br> tag instead.

Try this:
echo str_replace(' ',"<br>", $item['desc']);

I couldn't understand exactly what you want, but it's probably something like this:
$string = "Point 1 Point 2 Point 3";
$string = preg_replace("/(point \d+ )/i", "\$1\r\n", $string);
echo $string;
Output:
Point 1
Point 2
Point 3
Note: Use only nl2br (see other answers) if the new lines are already in the string but just not displayed.

If your text is on one line, like TEST TEST TEST, you can use the preg_replace function:
echo preg_replace("{[ ]+}","<br/>",$item['desc']);
If your string looks like TEST 1 TEST 2 ... and your goal is:
TEST 1
TEST 2
... and so on
you can use this pattern:
preg_match_all("{[\w]+[ ]?[0-9]+}",$item['desc'],$m);
foreach ($m[0] as $val) { echo $val."<br/>"; }
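The pattern-based split can be tried on the sample text from the question. This is only a sketch that assumes every item has the form "word, optional space, number"; $desc and $lines are illustrative names.

```php
<?php
// Sample text stored on a single line, as in the question.
$desc = 'Point 1 Point 2 Point 3';

// Each match is a word, an optional space, and a number, e.g. "Point 1".
preg_match_all('/\w+ ?[0-9]+/', $desc, $m);

// $m[0] holds the full matches; join them with <br/> for display in HTML.
$lines = $m[0];
echo implode("<br/>\n", $lines);
```

If the newlines are already present in the stored text, nl2br() alone is enough and no pattern matching is needed.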

Exclude a group regular expression

I searched many threads on Stack Overflow but didn't find anything relevant to my case.
Here is the source text:
<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>
I want to extract 70,89 with a regular expression.
So I tried :
<span class="red"><span>([0-9]+)(<\/span><span style="display:none">1<\/span><span>)(,[0-9]+)<\/span>
which returns an array (with preg_match_all in PHP) with 3 groups :
1/ 70
2/
</span><span style="display:none">1</span><span>
3/ ,89
I would like to exclude group 2 and merge 1 & 3.
So I also tried :
<span class="red"><span>([0-9]+)(?:<\/span><span style="display:none">1<\/span><span>)(,[0-9]+)<\/span>
but it returns :
70
,89
How can I merge the two groups?
Thanks a lot for your answers, I am going crazy searching for this regular expression! :)
Have a good day!

Just match the numbers that are wrapped with a plain <span>:
$str = '<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>';
if (preg_match_all('#<span>([,\d]+)</span>#', $str, $matches)) {
echo join('', $matches[1]);
}
// output: 70,89
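A capture group always holds one contiguous piece of the input, so the regex alone cannot merge 70 and ,89 into a single group while skipping the hidden span between them. As an alternative sketch, the pattern from the question (with the non-capturing middle part) can be kept and the two groups concatenated in PHP; $price is an illustrative variable name.

```php
<?php
$str = '<span class="red"><span>70</span><span style="display:none">1</span><span>,89</span> € TTC<br /></span>';

// Group 1 is the integer part and group 2 the decimal part;
// the hidden span between them is matched but not captured.
$pattern = '#<span class="red"><span>([0-9]+)</span><span style="display:none">1</span><span>(,[0-9]+)</span>#';

$price = null;
if (preg_match($pattern, $str, $m)) {
    // Merge the two captured groups outside the regex.
    $price = $m[1] . $m[2];
}

echo $price;
```

This variant is stricter than matching every plain <span>, because it only accepts the exact structure shown in the question.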
