Getting div data without using DOM [duplicate] - php

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Getting DIV content with Regular Expression
Let me first tell you that DOM is not an option on this one.
I simply have the html :
className">Name</div>......</div>....</div>
Now, I have created a regular expression like this:
$match_count = preg_match_all('/className\">(.*)\<\/div\>/', $page, $matches);
This seems fine to me, but for some reason it captures more data than expected: the match ends several closing divs later. How can I restrict it so that it captures only the data up to the first closing div?

$match_count = preg_match_all('/className">(.*?)<\/div>/', $page, $matches);
Use the non-greedy quantifier .*? so the match stops at the first closing div.
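To see the difference between the two quantifiers, here is a minimal sketch. The $page string is a made-up sample, since the HTML in the question is truncated.

```php
<?php
// Hypothetical sample input; the real page markup is not shown in the question.
$page = '<div class="className">Name</div><div>Other</div></div>';

// Greedy: .* runs on to the LAST </div> in the string.
preg_match_all('/className">(.*)<\/div>/', $page, $greedy);

// Non-greedy: .*? stops at the FIRST </div> after the class name.
preg_match_all('/className">(.*?)<\/div>/', $page, $lazy);

echo $greedy[1][0], "\n"; // Name</div><div>Other</div>
echo $lazy[1][0], "\n";   // Name
```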

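Use preg_match instead. It stops searching after the first match. Note that the quantifier still has to be non-greedy, otherwise that first match will itself run on to the last closing div.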

This works:
$match_count = preg_match_all('/className\">(.*)\<\/div\>/U', $page, $matches);
The U pattern modifier makes quantifiers ungreedy by default, so the pattern finds the smallest possible match, not the biggest.

Related

My regex function seems to be right but it doesn't work [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 2 years ago.
I'm trying to remove a script that contains a malware from my database.
It was injected in a lot of registers of my table.
The script starts with a <script> tag and ends with a </script> tag.
I'm using the following code to find and replace it:
$content = $post->post_content;
$new_content = preg_replace('/(<script>.+?)+(<\/script>)/i', '', $content);
I've tested it on regex101.com and it works fine, but in my code it doesn't work. Does anyone know what's wrong?
Here is my go-to regex for <script>...</script> tags with their contents:
(\<script\>)([\s\S]*?)(<\/script>)
You're not escaping some key characters and you're not capturing everything which could be in the contents of the tags.
Here is an explanation of the content capturing group:
\s matches any whitespace character
\S matches any non-whitespace character
*? matches between zero and unlimited times, as few times as possible, expanding as needed
That said, you really shouldn't do this with a regex. You should use a PHP DOM parser instead.
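If a regex has to be used anyway, a sketch along these lines may help. It assumes the injected tag can carry attributes and span several lines, which is a common reason a pattern passes in an online tester but fails on real data. The $content value is invented for illustration.

```php
<?php
// Hypothetical post content; the injected markup in the real database is not shown.
$content = "Hello<script type=\"text/javascript\">\nevil();\n</script> world";

// The pattern from the question requires a bare <script> tag and uses ".",
// which does not match newlines without the s modifier, so nothing is removed here.
$old = preg_replace('/(<script>.+?)+(<\/script>)/i', '', $content);

// <script\b[^>]*> allows attributes on the opening tag;
// [\s\S]*? matches across newlines, as few characters as possible.
$new_content = preg_replace('~<script\b[^>]*>[\s\S]*?</script>~i', '', $content);

echo $new_content, "\n"; // Hello world
```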

How to get URL value from html with PHP [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I ran into a problem with my homework: how to get URL values from HTML using PHP. I tested my code against a website, but I only need the URLs that match a specific pattern.
example : https: //video.xxxxxxx/
my code :
$regexp = "/<a\s[^>]*href=([\"\']??)([^\\1 >]*?)\\1[^>]*>(.*)<\/a>/siU";
if (preg_match_all($regexp, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        echo $match[0];
    }
}
You can try this:
<a.*?href\s*=\s*([\"\'])(.*?)\1.*?>.*?<\/a>
I've never used PHP before, so you might have to use \\1 instead of \1 (that is the case inside a double-quoted PHP string; in a single-quoted string \1 works as is).
Explanation:
It's tedious to explain every single element of this, so I'll give you a general idea. First you match the a tag, followed by any number of characters, styles, or different attributes, then followed by href=. Here, we start the capturing group 1, which contains your ' or ". Capturing group 2 contains your website's url without the quotations. Then we use \1 to refer to the type of quotation first used.
If you want the text within the a tag, for whatever reason, wrap the final .*? in parentheses so that it becomes capturing group 3.
Do note: in your loop you'll need to use $match[2] (the URL) instead of $match[0] (the whole tag).
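Put together in PHP, this could look like the sketch below. It uses a slightly tightened variant of the pattern above ([^>]* instead of .*? inside the tag, plus a third group for the link text). The $data string and its URLs are made up.

```php
<?php
// Hypothetical input; the real $data comes from the page being scraped.
$data = '<p><a class="x" href="https://example.com/a">First</a> '
      . "<a href='https://example.com/b'>Second</a></p>";

// Group 1: the opening quote, group 2: the URL, group 3: the link text.
// Single-quoted PHP string, so \1 reaches the regex engine unchanged.
$regexp = '/<a\b[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/si';

$urls = [];
if (preg_match_all($regexp, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $urls[] = $match[2]; // the URL only, not the whole tag
    }
}
print_r($urls);
```

Filtering for a specific host can then be done on $urls with an ordinary string check, which keeps the regex itself simple.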

Regex: Expression selecting more than expected [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 5 years ago.
I am using the following regex
'/\#(.*)\((.*)\)/'
I am trying to capture ONE and TWO from #ONE(TWO). This works as long as the placeholder occurs only once before the end of the line (I think).
I am quite green with regex and I really cannot understand what I am doing wrong.
What I need is to be able to get all the ONE/TWO pairs. Can you please help me?
I am working with PHP and the following function
$parsed_string = preg_replace_callback(
    // Placeholder for not previously created article
    // Pattern example: #George Ioannidis(person)
    '/\#(.*)\((.*)\)/',
    function ($matches) {
        return $this->parsePlaceholders( $matches );
    },
    $string
);
The results I am getting from https://regexr.com
The * quantifier is greedy by default. For example, the regexp (.*)a captures bdeabde from the string bdeabdea, because .* grabs as much as it can before the final a. Append ? to make * non-greedy. In your case, try the regexp /\#(.*?)\((.*?)\)/.
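A small sketch of the effect on the placeholder pattern, assuming two placeholders on the same line (the $string value is invented):

```php
<?php
// Hypothetical input with two placeholders on one line.
$string = 'See #George Ioannidis(person) and #Athens(place).';

preg_match_all('/#(.*)\((.*)\)/', $string, $greedy, PREG_SET_ORDER);
preg_match_all('/#(.*?)\((.*?)\)/', $string, $lazy, PREG_SET_ORDER);

echo count($greedy), "\n"; // 1: a single match that swallows both placeholders
echo count($lazy), "\n";   // 2: one match per placeholder
foreach ($lazy as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```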

how to write regular expression for this by php? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regex - Greedyness - matching HTML tags, content and attributes
The text I want to parse is something like this:
Dir: Vinton Heuck, Ciro Nieli
With: Eric Loomis, Bumper Robinson, Dawn Olivieri
Usually, there are one or two anchor elements after "Dir" and multiple anchor elements after "With".
What I want to do is get all values of anchor elements after "Dir" and before "With". I tried some regular expression like this:
preg_match_all("/Dir: <a href=\"\/name\/.+\/\">(.+)<\/a>/", $content, $matches);
But this only works when there's only one anchor element after "Dir". Any suggestions? Thanks!
I think you are missing a grouping construct, "()+", to match not just one link but one or two.
You would have to group your regex for finding the anchor tag, and use + for one or more.
Something like:
/Dir: (<a href=\"\/name\/.+\/\">(.+)<\/a>)+/
You'd have to edit to take into account the comma, but it will get you started.
Assuming that the line that contains "Dir:" appears only once:
preg_match_all("/(<([[:graph:]]+)[^>]*>)(.*?)(<\/\\2>)/", preg_replace("/[[:blank:]]*With:.*/","",$content), $matches);
print_r($matches[3]);
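An alternative two-step sketch: first cut out the slice between "Dir:" and "With:", then collect every anchor inside it. The markup below is an assumption, since the question shows only the rendered text; the href values are invented to fit the /name/.../ shape used in its pattern.

```php
<?php
// Hypothetical markup, not taken from the original page.
$content = 'Dir: <a href="/name/nm1/">Vinton Heuck</a>, <a href="/name/nm2/">Ciro Nieli</a>' . "\n"
         . 'With: <a href="/name/nm3/">Eric Loomis</a>, <a href="/name/nm4/">Bumper Robinson</a>';

$directors = [];
// Step 1: isolate the text between "Dir:" and "With:" (s lets . cross newlines).
if (preg_match('/Dir:(.*?)With:/s', $content, $section)) {
    // Step 2: collect the text of every anchor in that slice.
    preg_match_all('/<a\b[^>]*>(.*?)<\/a>/s', $section[1], $anchors);
    $directors = $anchors[1];
}
print_r($directors);
```

Splitting the work this way avoids having to express "one or more anchors separated by commas" in a single pattern.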

How to extract an id from facebook video link with regular expression? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Regex to get value of URL parameter?
If you have a facebook link such as
http://www.facebook.com/photo.php?v=107084586333124
What regular expression would extract the number at the end 107084586333124
edit: I use php
Use PHP's built-in functions to reliably pull query string variables out of a URL. You would use something like:
parse_str(parse_url($url, PHP_URL_QUERY), $v); // where $v is not set yet or is an empty array
$v = @$v['v']; // will now contain the value of v, or be null if not found
I'm not familiar with the PHP regex engine, but this simple regular expression should do the trick. You will find your number in the first capture group.
v=(\d+)
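In PHP this could be applied with preg_match, as in the sketch below. The [?&] prefix is an addition not present in the pattern above; it is there so that a parameter whose name merely ends in v is not matched by accident.

```php
<?php
$url = 'http://www.facebook.com/photo.php?v=107084586333124';

// [?&] ties the match to a parameter named exactly "v".
// The first capture group holds the digits; null means no such parameter.
$id = preg_match('/[?&]v=(\d+)/', $url, $m) ? $m[1] : null;

echo $id, "\n"; // 107084586333124
```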
Replace each of the following regexes with "", in this order:
.*?\?v\=
\D.*
The remaining data is your number.
