preg_match() regex pattern to retrieve 3 things from my url - php

I'm a newbie with PHP regex patterns, so I tried to make a pattern for this URL:
$turl=http://ss-3.domian.com/screenshot/50/18/screenshot_multiple/501800/501800_multiple_1_extra_large.jpg
I just want to retrieve 3 things: "3", "50/18", "501800"
So I used this code:
preg_match('#http://ss-(.*?).domain.com/screenshot/(.*?)/screenshot_multiple/(.*?)/(.*?)_multiple_1_extra_large\.jpg#',$turl,$t_url)
So I should get $matches[1]=3, $matches[2]=50/18 and $matches[3]=501800, right?

<?php
$turl = 'http://ss-3.domain.com/screenshot/50/18/screenshot_multiple/501800/501800_multiple_1_extra_large.jpg';
preg_match_all('#http://ss\-([^\.]*)\.domain.com/[^/]+/([^/]*)/([^/]*)/[^/]*/([^/]*)/([^/]*)#msi',$turl,$match);
// For testing
var_dump($match);
?>
You had a typo (domian) in the search string, and the URL wasn't in quotes. This sort of URL is likely to change, so I've made the pattern as generic as possible while still keeping the shape. If we knew more about the underlying problem, we might find a way to avoid regex altogether. Reading the function documentation on php.net is also a big help and will give you a good understanding of how these functions are used.
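If the goal is exactly the three values from the question ("3", "50/18", "501800"), a stricter pattern can capture "50/18" as a single group. This is only a sketch: it assumes the server number, the folder pair and the id are always digits, which the question does not confirm.

```php
<?php
$turl = 'http://ss-3.domain.com/screenshot/50/18/screenshot_multiple/501800/501800_multiple_1_extra_large.jpg';

// Assumption: all three pieces are numeric; loosen \d+ if that is not the case.
$pattern = '#^http://ss-(\d+)\.domain\.com/screenshot/(\d+/\d+)/screenshot_multiple/(\d+)/#';

$server = $folder = $id = null;
if (preg_match($pattern, $turl, $matches)) {
    // $matches[0] is the whole match; the captured groups start at index 1.
    list(, $server, $folder, $id) = $matches;
}

var_dump($server, $folder, $id);
```

preg_match() is enough here because only one match per URL is expected; preg_match_all() wraps every group in an extra array level.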

Related

How To Regex Search and Replace array_key_exists with isset?

What's the best way to do a regex search and replace for all instances of array_key_exists() with the more efficient isset()? Please, no Donald Knuth quotes regarding optimizations and yes, I'm aware of the differences between the two functions.
This is what I'm currently using in my Netbeans search and replace:
search for:
array_key_exists\s*\(\s*'([^']*)'\s*,([^)]*)\)
replace with:
isset($2['$1'])
It works well, changing this:
array_key_exists('my_key',$my_array)
to
isset($my_array['my_key'])
but doesn't pick up instances like this:
array_key_exists($my_key,$my_array)
Not the most elegant solution, but adding to your current regex we find both types of search criteria.
array_key_exists\s*(\s*'|$['|\S]\s*,([^)]*))
The best I could do was to run a second search and replace to cover the instances that used variables for both arguments:
array_key_exists($my_key,$my_array)
search and replace 2:
search for:
array_key_exists\s*\(\s*(\$[^,]*)\s*,([^)]*)\)
replace with:
isset($2[$1])
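The same two passes can also be run from a script instead of the Netbeans dialog. The sketch below reuses the two search patterns above unchanged; the function name replaceArrayKeyExists is hypothetical, and it operates on a source string rather than on files.

```php
<?php
// Hypothetical helper: applies both search-and-replace passes in order.
function replaceArrayKeyExists(string $source): string
{
    $rules = [
        // Pass 1: first argument is a single-quoted string literal.
        '/array_key_exists\s*\(\s*\'([^\']*)\'\s*,([^)]*)\)/' => 'isset($2[\'$1\'])',
        // Pass 2: first argument is a variable.
        '/array_key_exists\s*\(\s*(\$[^,]*)\s*,([^)]*)\)/' => 'isset($2[$1])',
    ];

    // preg_replace() returns null on a regex error; fall back to the input then.
    return preg_replace(array_keys($rules), array_values($rules), $source) ?? $source;
}

echo replaceArrayKeyExists('array_key_exists(\'my_key\',$my_array)'), "\n";
echo replaceArrayKeyExists('array_key_exists($my_key,$my_array)'), "\n";
```

As with the editor version, whitespace after the comma ends up inside the second group, so it is worth reviewing the result before committing it.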
If you need to cover a wider range of cases (for example when upgrading the PHP version) rather than just the use case above:
Didn't clean it up, but it should catch every instance I could think of.
Search:
array_key_exists\s*\(\s*([^,]*)\s*,\s*((\(\w+\))?[a-z0-9_$'"\{\}\[\]\-\>\:]*(\(\))*[a-z0-9$_\.\{\}\'\"\[\]\-\>\:]*)\)
Replace:
isset($2[$1])

PHP preg_replace();

I've got a problem with regexp function, preg_replace(), in PHP.
I want to get viewstate from html's input, but it doesn't work properly.
This code:
$viewstate = preg_replace('/^(.*)(<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value=")(.*[^"])("\s+name="__VIEWSTATE">)(.*)$/u','^\${3}$',$html);
Returns this:
%0D%0A%0D%0A%3C%21DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+XHTML+1.0+Transitional%2F%2FEN%22+%22http%3A%2F%2Fwww.w3.org%2FTR%2Fxhtml1%2FDTD%2Fxhtml1-transitional.dtd%22%3E%0D%0A%0D%0A%3Chtml+xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml%22+%3E%0D%0A%3Chead%3E%3Ctitle%3E%0D%0A%09Strava.cz%0D%0A%3C%2Ftitle%3E%3Clink+rel%3D%22shortcut+icon%22+href%3D%22..%2FGrafika%2Ffavicon.ico%22+type%3D%22image%2Fx-icon%22+%2F%3E%3Clink+rel%3D%22stylesheet%22+type%3D%22text%2Fcss%22+media%3D%22screen%22+href%3D%22..%2FStyly%2FZaklad.css%22+%2F%3E%0D%0A++++%3Cstyle+type%3D%22text%2Fcss%22%3E%0D%0A++++++++.style1%0D%0A++++++++%7B%0D%0A++++++++++++width%3A+47px%3B%0D%0A++++++++%7D%0D%0A++++++++.style2%0D%0A++++++++%7B%0D%0A++++++++++++width%3A+64px%3B%0D%0A++++++++%7D%0D%0A++++%3C%2Fstyle%3E%0D%0A%0D%0A%3Cscript+type%3D%22text%2Fjavascript%22%3E%0D%0A%0D%0A++var+_gaq+%3D+_gaq+%7C%7C+%5B%5D%3B%0D%0A++_gaq.push%28%5B
EDIT: Sorry, I left this question for a long time. Finally I used DOMDocument.
To be sure, I'd split this match into two phases:
Find the relevant input element
Get the value
Because you cannot be certain what the attributes order in the element will be.
if(preg_match('/<input[^>]+name="__VIEWSTATE"[^>]*>/i', $input, $match))
$value = preg_replace('/.*value="([^"]*)".*/i', '$1', $match[0]);
And, of course, always consider DOM and DOMXpath over regex for parsing html/xml.
You should only capture when you're planning on using the data, so most of the () groups in that pattern are unnecessary. Not a cause for failure, but I thought I'd mention it.
Instead of using [^"] to mark that you don't want that character you could use the non-greedy modifier - ?. This makes sure the pattern is matching as little as it can. Since you have name="__VIEWSTATE" following the value this should be safe.
Let's put this in practice and simplify the pattern some. This works as you want:
'/.*<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value="(.+?)"\s+name="__VIEWSTATE">.*/'
I would strongly recommend checking out an alternative to regexp for DOM operations. This makes certain your code also works if the attributes change order. Plus it's so much nicer to work with.
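A minimal sketch of the DOM route, assuming the dom extension is enabled. The helper name extractViewState is hypothetical; it looks the input up by its name attribute, so the order of the attributes no longer matters.

```php
<?php
// Hypothetical helper: returns the __VIEWSTATE value, or null if the input is absent.
function extractViewState(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="__VIEWSTATE"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->getAttribute('value');
}
```

Usage would be something like $viewstate = extractViewState($html); followed by a null check before the value is used.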
The main mistake was the use of the function preg_replace, which returns the subject - neither the matched pattern nor the replacement. Thank you for your ideas and for the recommendation of DOMDocument. m93a
http://www.php.net/manual/en/function.preg-replace.php#refsect1-function.preg-replace-returnvalues
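To illustrate the difference, here is a small sketch with a made-up three-line HTML fragment and a made-up viewstate value. preg_match() hands back the captured group, while preg_replace() with the anchored pattern from the question finds no match (without the s modifier, . does not cross newlines) and therefore returns the subject unchanged.

```php
<?php
// Hypothetical sample input; the value "dDwtMTIz" is invented for the example.
$html = "<html>\n<body>\n"
      . '<input id="__VIEWSTATE" type="hidden" value="dDwtMTIz" name="__VIEWSTATE">'
      . "\n</body>\n</html>";

// preg_match() fills $m with the captured groups.
$pattern = '/<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value="([^"]*)"\s+name="__VIEWSTATE">/';
$viewstate = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;

// The anchored pattern from the question, used with preg_replace().
$anchored = '/^(.*)(<input\s+id="__VIEWSTATE"\s+type="hidden"\s+value=")(.*[^"])("\s+name="__VIEWSTATE">)(.*)$/u';
$replaced = preg_replace($anchored, '$3', $html);

var_dump($viewstate);          // the captured value
var_dump($replaced === $html); // true: nothing matched, subject returned as-is
```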

PHP Regex pattern to match magic search keywords

Ok regex experts. I'm having a ton of trouble trying to make a regex pattern for my needs.
The goal:
Take a search query such as "good food type:post format:gallery" and parse the type or format or both from the string.
This is what I wrote, but it doesn't work unless both type and format are present and type comes before format. Ideally, either type or format could be present.
$query = "Great food type:post format:gallery";
preg_match('/(.*?(?<=\btype:)(?P<type>[a-z]*\w+))(.*?(?<=\bformat:)(?P<format>[a-z]*\w+))/', $query, $matches);
I imagine I need the returned $matches to be named as well, right?
Thanks,
I don't think you'll want to use a single regex for this. It'll be a pain to maintain and update when you add more operators like type: and format:. The regex also ends up depending on the order in which they are entered.
A simple approach might be like
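$flags = array();
$searchtokens = array();
$tokens = explode(" ", $searchString);
foreach ($tokens as $token) {
    // Tokens such as "type:post" are flags: name before the colon, value after it
    if (preg_match('~^([^:]+):(.*)$~', $token, $flagMatch)) {
        $flags[$flagMatch[1]] = $flagMatch[2];
        continue;
    }
    // Everything else is an ordinary search term
    $searchtokens[] = $token;
}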
Obvious caveat with that example is exploding straight on space so you wouldn't be able to handle "quoted terms" that should be treated as one.
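If a regex is still preferred, one order-independent option is to match each operator separately with preg_match_all() and named groups. This is a sketch only: the function name parseSearchQuery and the default operator list are assumptions, and quoted terms are still not handled.

```php
<?php
// Hypothetical helper: splits a query into plain terms and key:value flags.
function parseSearchQuery(string $query, array $operators = ['type', 'format']): array
{
    $quoted = array_map(function ($op) {
        return preg_quote($op, '/');
    }, $operators);
    $pattern = '/\b(?P<key>' . implode('|', $quoted) . '):(?P<value>\w+)/';

    // Collect every operator, whatever order it appears in.
    $flags = [];
    preg_match_all($pattern, $query, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $flags[$match['key']] = $match['value'];
    }

    // Whatever remains after removing the operators is the free-text part.
    $terms = trim(preg_replace('/\s+/', ' ', preg_replace($pattern, '', $query)));

    return ['terms' => $terms, 'flags' => $flags];
}

print_r(parseSearchQuery('Great food type:post format:gallery'));
```

Adding a new operator then only means extending the $operators array rather than rewriting the pattern.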

Using regex to get string from URL?

Regex is my bete noire, can anyone help me isolate a string from a URL?
I want to get the page name from a URL which could appear in any of the following ways from an input form:
https://www.facebook.com/PAGENAME?sk=wall&filter=2
http://www.facebook.com/PAGENAME?sk=wall&filter=2
www.facebook.com/PAGENAME
facebook.com/PAGENAME?sk=wall
... and so on.
I can't seem to find a way to isolate the string after .com/ but before ? (if present at all). Is it preg_match, replace or split?
If anyone can recommend a particularly clear and introductory regex guide they found useful, it'd be appreciated.
You can use the parse_url function and then get the last segment from the path of the url:
$parts=parse_url($url);
$path_parts=explode("/", $parts["path"]);
$page=$path_parts[count($path_parts)-1];
For learning and testing regexes I found RegExr, an online tool, very useful: http://gskinner.com/RegExr/
But as others mentioned, parsing the url with appropriate functions might be better in this case.
I think you can use this php function (parse_url) directly instead of using regex.
Use something like:
substr(parse_url('https://www.facebook.com/PAGENAME?sk=wall&filter=2', PHP_URL_PATH), 1);
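One thing to watch with the inputs listed in the question: when the scheme is missing, parse_url() treats the whole string as a path, so the host name ends up in it. The sketch below works around that by prepending a scheme first. The function name facebookPageName is hypothetical, and it takes the first path segment as the page name, which is an assumption about the URL layout.

```php
<?php
// Hypothetical helper: returns the first path segment, or null if there is none.
function facebookPageName(string $url): ?string
{
    // Without a scheme, parse_url() would report "www.facebook.com/PAGENAME" as the path.
    if (!preg_match('~^https?://~i', $url)) {
        $url = 'http://' . $url;
    }

    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // Drop empty segments produced by leading or trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    return $segments[0] ?? null;
}

echo facebookPageName('facebook.com/PAGENAME?sk=wall'), "\n";
```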

PHP regex for filtering out urls from specific domains for use in a vBulletin plug-in

I'm trying to put together a plug-in for vBulletin to filter out links to filesharing sites. But, as I'm sure you often hear, I'm a newb to php let alone regexes.
Basically, I'm trying to put together a regex and use a preg_replace to find any urls that are from these domains and replace the entire link with a message that they aren't allowed. I'd want it to find the link whether it's hyperlinked, posted as plain text, or enclosed in [CODE] bb tags.
As for regex, I would need it to find URLS with the following, I think:
1. Starts with http or an anchor tag. I believe that the URLS in [CODE] tags could be processed the same as the plain text URLS and it's fine if the replacement ends up inside the [CODE] tag afterward.
2. Could contain any number of any characters before the domain/word
3. Has the domain somewhere in the middle
4. Could contain any number of any characters after the domain
5. Ends with one of a number of extensions such as (html|htm|rar|zip|001) or in a closing anchor tag.
I have a feeling that it's numbers 2 and 4 that are tripping me up (if not much more). I found a similar question on here and tried to pick apart the code a bit (even though I didn't really understand it). I now have this which I thought might work, but it doesn't:
<?php
$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED MESSAGE';
$regex = array('!^http+([a-z0-9-]+\.)*$filterthese+([a-z0-9-]+\.)*(html|htm|rar|zip|001)$!',
'!^<a+([a-z0-9-]+\.)*$filterthese+([a-z0-9-]+\.)*</a>$!');
$this->post['message'] = preg_replace($regex, $replacement, $this->post['message']);
?>
I have a feeling that I'm way off base here, and I admit that I don't fully understand php let alone regexes. I'm open to any suggestions on how to do this better, how to just make it work, or links to RTM (though I've read up a bit and I'm going to continue).
Thanks.
You can use parse_url on the URLs and look into the hashmap it returns. That allows you to filter for domains or even finer-grained control.
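A rough sketch of that idea, once a URL has already been pulled out of the post. The function name isBlockedUrl is hypothetical, and it assumes the block list holds full domain names such as "domain1.com" rather than bare words.

```php
<?php
// Hypothetical helper: true if the URL's host is a blocked domain or one of its subdomains.
function isBlockedUrl(string $url, array $blockedDomains): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component, e.g. plain text
    }
    $host = strtolower($host);

    foreach ($blockedDomains as $domain) {
        $domain = strtolower($domain);
        // Exact match, or the host ends with ".domain" (a subdomain).
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }

    return false;
}

var_dump(isBlockedUrl('http://www.domain1.com/files/a.rar', ['domain1.com', 'domain2.net']));
```

Comparing hosts this way avoids false positives on look-alike names, since "notdomain1.com" does not end in ".domain1.com".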
I think you can avoid this overhead by using the built-in filter_var function, which is available since PHP 5.2.0.
$good_url = filter_var( filter_var( $raw_url, FILTER_SANITIZE_URL), FILTER_VALIDATE_URL);
Hmm, my first guess: you put $filterthese directly inside a single-quoted string, and single quotes don't allow variable interpolation. Also, $filterthese is an array, so it should first be joined:
$filterthese = implode("|", $filterthese);
Maybe I'm way off, because I don't know anything about vBulletin plugins and their embedded magic, but those points seem worth a check to me.
Edit: OK, on re-checking your provided source, I think the regexp line should read like this:
$regex = '!(?#
possible "a" tag [start]: )(<a[^>]+href=["\']?)?(?#
offending link: )https?://(?#
possible subdomains: )([a-z0-9-]+\.)*(?#
domains to block: )('.implode("|", $filterthese).')(?#
possible path: )(/[^ "\'>]*)?(?#
possible "a" tag [end]: )(["\']?[^>]*>)?!';
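For comparison, here is a self-contained sketch of the same idea as a function that can be tested outside vBulletin. The name filterBlockedLinks is hypothetical, and so is the assumption that the block list contains full domain names like "domain1.com". It runs two passes: whole anchor tags first, then any bare URLs that remain (including those inside [CODE] tags).

```php
<?php
// Hypothetical helper: replaces links to blocked domains with a message.
function filterBlockedLinks(string $message, array $blockedDomains, string $replacement): string
{
    $quoted = array_map(function ($domain) {
        return preg_quote($domain, '~');
    }, $blockedDomains);

    // Scheme, optional subdomains, a blocked domain, then the rest of the URL.
    // The lookahead stops "domain1.com" from matching inside "domain1.community".
    $url = 'https?://(?:[a-z0-9-]+\.)*(?:' . implode('|', $quoted) . ')(?![a-z0-9-])[^\s"\'<>\[\]]*';

    $patterns = [
        // Pass 1: a complete <a ...>...</a> element pointing at a blocked URL.
        '~<a\b[^>]*\bhref=["\']?' . $url . '[^>]*>.*?</a>~is',
        // Pass 2: plain-text URLs, which also covers URLs inside [CODE] tags.
        '~' . $url . '~i',
    ];

    // preg_replace() returns null on a regex error; keep the original message then.
    return preg_replace($patterns, $replacement, $message) ?? $message;
}

echo filterBlockedLinks('Get it at http://www.domain1.com/files/a.rar now', ['domain1.com'], '[filtered]'), "\n";
```

In the plug-in this would presumably be called on $this->post['message'], but that wiring is not shown here because it depends on vBulletin internals.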
