php regex question for matching google searchterms in url

I'm extracting search terms from Google request URLs.
I'm using
preg_match("/[q=](.*?)[&]/", $requesturl, $match);
but it fails when the 'q' parameter is the last parameter in the string.
So I need to fetch everything that comes after 'q=', but the match must stop if it finds '&'.
How do I do that?
EDIT:
I eventually landed on this for matching google request url:
/[?&]q=([^&]+)/
Because sometimes the URL has a parameter whose name ends with q, like 'aq=0'.

You need /q=([^&]+)/. The trick is to match everything except & in the query.
To build on your pattern, here is a slightly modified version that will (almost) do the trick and is the closest to what you have: /q=(.*?)(&|$)/. It moves q= out of the brackets, because inside brackets it matches either character, not both in sequence, and at the end it matches either & or the end of the string ($). There are, though, a few problems with this:
1. Sometimes you will have an extra & at the end of the match, which you don't need. To solve this you can use a lookahead: (?=&|$)
2. It introduces an extra group at the end (not necessarily bad, but avoidable); this is actually fixed by 1 as well.
So, if you want a slightly longer pattern that expands on what you have, here it is: /q=(.*?)(?=&|$)/
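A minimal sketch of the difference between the consuming group and the lookahead described above. The request URLs and variable names are made up for illustration.

```php
<?php
// Hypothetical request URLs: one with q in the middle, one with q last.
$middle = 'http://www.google.com/search?q=php+regex&hl=en';
$last   = 'http://www.google.com/search?hl=en&q=php+regex';

// Consuming group: the trailing & becomes part of the overall match.
preg_match('/q=(.*?)(&|$)/', $middle, $consuming);

// Lookahead: the & is checked but not consumed.
preg_match('/q=(.*?)(?=&|$)/', $middle, $lookahead);
preg_match('/q=(.*?)(?=&|$)/', $last, $atEnd);

echo $consuming[0], "\n"; // overall match includes the trailing &
echo $lookahead[0], "\n"; // overall match stops before the &
echo $atEnd[1], "\n";     // search terms when q is the last parameter
```

In both variants the search terms themselves are in index 1 of the match array; only the overall match in index 0 differs.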

Try this:
preg_match("/q=([^&]+)/", $requesturl, $match);
A little explaining:
[q=] will search for either q or =, but not one after another.
[&] is not needed as there is only one character. & is fine.
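The ? operator normally tells the regex to match 0 or 1 occurrences of the preceding token; directly after a quantifier, as in .*?, it instead makes that quantifier lazy (match as little as possible).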
[^&] will tell it to match any character except for &. Which means you'll get all the query string until it hits &.
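As a rough illustration of why the asker later anchored the pattern on ? or &, the sketch below assumes a made-up URL in which an aq parameter precedes q. The urldecode() call is an optional extra step to turn + back into spaces.

```php
<?php
// Hypothetical URL with a parameter that merely ends in "q" (aq=0) before the real q.
$requesturl = 'http://www.google.com/search?aq=0&q=php+regex';

// Unanchored: the first "q=" found belongs to "aq=0".
preg_match('/q=([^&]+)/', $requesturl, $loose);

// Anchored on ? or &: only a parameter named exactly "q" matches.
preg_match('/[?&]q=([^&]+)/', $requesturl, $strict);

echo $loose[1], "\n";              // value of aq, not the search terms
echo urldecode($strict[1]), "\n";  // search terms with + decoded to spaces
```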

Related

How do I extract one group from a URL using regex for use in a redirect?

I've read the Best RegEx Trick Ever and tried to wrap my head around the other answers here on Stack Exchange and just can't seem to get it right. Take these three strings:
http://www.test.com/newyork/class-schedule
http://www.test.com/location/newyork/class-schedule
http://www.test.com/location/newyork/training
I need a regex that will extract the newyork from the first string and save it for a replace later, but will NOT match any part of the other strings. Also, for obscure reasons, I can not include http://www.test.com as a condition for matching (so I can't use anything before the slash that precedes newyork). Note that in this scenario, newyork could easily be chicago, atlanta, or any other city name with no spaces or punctuation.
The only thing I've been able to figure out that isolates only newyork in the first string is the following:
/.*\.com\/(.[^\/]*)\/class-schedule/g
However, this relies on using the URL first which I can't use.
Any ideas on how to achieve this WITHOUT using the URL?
[EDIT]
To clarify what I'm looking for, I'm trying to take the results from the first string and add "location" to it, still using regex. So:
http://www.test.com/newyork/class-schedule
would become
http://www.test.com/location/newyork/class-schedule
using something like
http://www.test.com/location/$1/class-schedule

Try this: ~/(\w+)/[-a-z]+?/?(?:\?.*?)*(:?\s|$)~gm
See it working here: https://regex101.com/r/4VMazZ/3.
So it uses the end of the URL instead of the beginning and matches only the word between the second and third slash from the end. A query string may follow and it will still work.
[EDIT 1]
I had swapped two characters at the end by mistake, so it was capturing one extra group: /(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$). here: https://regex101.com/r/4VMazZ/4
If you use preg_match($pattern, $string, $matches); the result you want (newyork) will be in $matches[1]; $matches[0] contains the whole match. Note that PHP has no g modifier, so drop it from the pattern when calling preg_match.
You can see the captures in the 'MATCH INFORMATION' panel on regex101 in my example!
[EDIT 2] after your comment.
If you want to replace the whole URL, you have to match the whole URL; something like this will do in this example: .*?/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$). See it working here: https://regex101.com/r/4VMazZ/5
[EDIT 3] Add capturing of last part for replacement.
Since you want to reuse the last part, you need to add capturing parentheses: .*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$).
See it working here: https://regex101.com/r/4VMazZ/6
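A possible way to apply the EDIT 3 pattern with preg_replace, as a sketch only. The ~ delimiter and the variable names are assumptions; the host in the replacement string is copied from the question's own example.

```php
<?php
// Pattern from EDIT 3: group 1 is the city, group 2 the last path segment.
$pattern = '~.*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$)~';

$url = 'http://www.test.com/newyork/class-schedule';

// The host in the replacement is taken from the question's own example.
$redirect = preg_replace($pattern, 'http://www.test.com/location/$1/$2', $url);

echo $redirect, "\n";
```

Note that this pattern also matches a URL that already contains /location/ (the city still ends up in group 1), so a caller would have to skip those URLs before replacing.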

Could this work? See it here.
(?<=location\/|\.\w{3}\/|\.\w{2}\/)(?!location).*?(?=\/|$)
It matches everything following .xxx/ or .xx/ or location/. I don't know whether one-letter TLDs exist; if they do, you can add |\.\w\/ to the lookbehind at the start of the regex.
(?<=location\/|\.\w{3}\/|\.\w{2}\/) is a lookbehind, so it matches the following pattern only if preceded by location/ or .xxx/ or .xx/
.*? matches every character (lazy)
(?=\/|$) end match if next character is / or on line end
Note: If location is counted as part of the url, I don't think what you are asking is possible in regex, as the city name could be anywhere in string. If so, then you could have a list of cities and check what part of the url matches one of them.
EDIT: You need the multiline m flag so $ also matches end of line

Regex match for numbers,letters,space and underscore does not work in PHP

if(preg_match("^[A-Za-z0-9 _]*$^",'name')) -> 1
if(preg_match("^[A-Za-z0-9 _]*$^",'name--')) -> 1 //should be zero; works in JS but not in PHP
The first case is fine, but in the second case I want a zero and preg_match outputs 1. This regex works in JS as I want. What's wrong with the PHP code I use?

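Because ^ is used as the delimiter, the pattern has no start anchor, so the * quantifier can match zero characters of the class [A-Za-z0-9 _] immediately before the end of the input. The result should not be surprising; the regex works "as intended".
Adding a start-of-input anchor makes the match fail as expected. With ^ as the delimiter, an escaped \^ would be read as a literal caret rather than an anchor, so the anchor has to be written as \A:
preg_match("^\\A[A-Za-z0-9 _]*$^",'name--')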
And of course it might be better to use the classic / delimiter because using ^ (which, as a delimiter, will go at the start of the regex) is IMHO asking for trouble:
preg_match("/^[A-Za-z0-9 _]*$/",'name--')

You use ^ as the delimiter, which is not so good because it is also the start anchor. Thus you have no start anchor, which means that * can match exactly 0 characters, which is valid. Use a different delimiter:
~^[a-z0-9 _]*$~i
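To make the delimiter issue concrete, here is a small sketch that compares the asker's pattern with the ~ version. The variable names are illustrative.

```php
<?php
// With ^ as the delimiter, the pattern body is only [A-Za-z0-9 _]*$ (no start anchor),
// so an empty match at the end of the input succeeds.
$unanchored = preg_match('^[A-Za-z0-9 _]*$^', 'name--');

// With ~ as the delimiter, ^ is free to act as the start anchor.
$valid   = preg_match('~^[a-z0-9 _]*$~i', 'name');
$invalid = preg_match('~^[a-z0-9 _]*$~i', 'name--');

var_dump($unanchored, $valid, $invalid);
```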

Regular expression doesn't quite work

I have created a regular expression (using PHP), shown below, which must match ALL terms within the given string that contain only a-z0-9, ., _ and -.
My expression is: '~(?:\(|\s{0,},\s{0,})([a-z0-9._-]+)(?:\s{0,},\s{0,}|\))$~i'.
My target string is: ('word', word.2, a_word, another-word).
Expected terms in the results are: word.2, a_word, another-word.
I am currently getting: another-word.
My Goal
I am detecting a MySQL function in my target string, and this works fine. I then want all of the fields from within that target string. It's for my own ORM.
I suppose there could be a situation whereby further parentheses are included inside this expression.

From what I can tell, you have a list of comma-separated terms and wish to find only the ones which satisfy [a-z0-9._\-]+. If so, this should be correct (it returns the correct results for your example at least):
'~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i'
The main issues were:
$ at the end, which was anchoring the pattern to the end of the string.
When matching all, you continue from the end of the previous match. This means that if you consume a comma or closing parenthesis at the end of one match, it is no longer there to match at the beginning of the next one. I've solved this with a lookbehind (?<=...) and a lookahead (?=...).
Your backslashes should be double escaped, since the first one may be stripped by PHP when parsing the string.
EDIT: Since you said in a comment that some of the terms may be strings that contain commas you will first want to run your input through this:
$input = preg_replace('~(\'([^\']+|(?<=\\\\)\')+\'|"([^"]+|(?<=\\\\)")+")~', '"STRING"', $input);
This should replace all quoted strings with "STRING", so that the other regex can then be applied without tripping over commas inside strings.
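As an illustration of the lookaround pattern above, the following sketch runs it with preg_match_all against the target string from the question. It is written with single backslashes, which a single-quoted PHP string leaves intact.

```php
<?php
$target = "('word', word.2, a_word, another-word)";

// Lookbehind and lookahead check the separators without consuming them,
// so each comma stays available for the next match.
$pattern = '~(?<=[,(])\s*([a-z0-9._-]+)\s*(?=[,)])~i';

preg_match_all($pattern, $target, $matches);

print_r($matches[1]); // unquoted terms only
```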

Maybe using a regex is overkill. For this kind of text you can just remove the parentheses and explode the string by comma.
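A rough sketch of that approach, assuming the target string from the question. The final filter step, which drops the quoted 'word', is an addition for illustration and not part of the suggestion itself.

```php
<?php
$target = "('word', word.2, a_word, another-word)";

// Strip the surrounding parentheses, split on commas, trim whitespace.
$terms = array_map('trim', explode(',', trim($target, '()')));

// Keep only bare terms; this filter step is an addition, not part of the answer.
$fields = array_values(array_filter($terms, function ($term) {
    return preg_match('~^[a-z0-9._-]+$~i', $term) === 1;
}));

print_r($fields);
```

This would break on quoted strings that contain commas or on nested parentheses, both of which the question mentions as possibilities.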

How do I modify this regex to validate URIs with empty parameter value in querystring?

I'm using this code to validate URIs in php:
preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $uri)
However, this won't pass for URIs that end with an equals sign.
E.g. http://example.com?query=fish&offset=10 returns true, but http://example.com?query=fish&offset= doesn't.
I can't see why this should be the case from the regex as it allows all characters following the ? sign.
Any tips?
Thanks,
Chris

Why don't you use filter_var? ;)
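For reference, a minimal sketch of what that could look like with the two URIs from the question, using the built-in FILTER_VALIDATE_URL filter. The variable names are illustrative.

```php
<?php
$uris = [
    'http://example.com?query=fish&offset=10',
    'http://example.com?query=fish&offset=',
];

$valid = [];
foreach ($uris as $uri) {
    // filter_var returns the URL on success and false on failure.
    $valid[] = filter_var($uri, FILTER_VALIDATE_URL) !== false;
}

var_dump($valid);
```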

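Your regex isn't working the way you anticipate.
Your second group (.[a-z0-9-]+)* is capturing EVERYTHING past http://example, because the unescaped . matches any character, not just a literal dot. However, each repetition requires at least 2 characters, so a trailing = with nothing after it cannot be matched. Since the group is greedy, it will capture as much as it possibly can.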
Try this instead:
^http(s)?://[a-z0-9-]+\.[a-z0-9-]+(\.[a-z0-9-]+)?(/[-a-z0-9=?&/]*)?$
If need be, change the last capturing group to include any characters you might need to include in your query string or URI.

replace exact match in php

I'm new to regular expressions in PHP.
I have some data in which some of the values are stored as zero (0). What I want to do is replace them with '-'. I don't know which values will be zero, as my database table gets updated daily, which is why I have to apply the replacement to all the data.
$r_val=preg_replace('/(0)/','-',$r_val);
The code I'm using replaces every zero it finds; e.g. it even replaces the zero in 104.67, giving the output 1-4.67, which is wrong. I want only values that are exactly zero to be replaced by '-', not every zero it encounters.
Can anyone please help?
Examples of the values that $r_val holds:
10.31,
391.05,
113393,
15.31,
1000 etc.

This depends a lot on how your data is formatted inside $r_val, but a good place to start would be to try:
$r_val = preg_replace('/(?<!\.)\b0\b(?!\.)/', '-', $r_val);
Where \b is a zero-width assertion that matches at the start or end of a 'word'.
Strange as it may sound, the Perl regex documentation is actually really good for explaining the regex part of the preg_* functions, since they use PCRE, a library modelled on Perl's regex syntax.
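A small sketch of how that pattern behaves, assuming $r_val holds a comma-separated list. The question does not show the real format, so the sample string is made up.

```php
<?php
// Hypothetical comma-separated form of $r_val; the real format is not shown in the question.
$r_val = '10.31, 0, 391.05, 0.75, 1000, 0';

// \b0\b isolates a standalone 0; the lookarounds reject a 0 next to a decimal point.
$r_val = preg_replace('/(?<!\.)\b0\b(?!\.)/', '-', $r_val);

echo $r_val, "\n";
```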

Again, it would be more than helpful if you could supply an example of what the $r_val string really looks like.
Note that \b matches at word boundaries, which would also turn a string like "0.75" into "-.75". Not a desirable result, I guess.

Whilst the other answer does work, it seems overly complex to me. I think you only need to put ^ and $ on either side of the 0.
$r_val = preg_replace('/^0+$/', '&#45;', $r_val);
^ indicates the regex should match from the beginning of the string.
$ indicates the regex should match to the end of the string.
+ means match the preceding 0 one or more times.
I changed the minus sign to its HTML entity too. Paranoid, yes, but we are dealing with numbers after all, so I thought throwing a raw minus sign in there might not be the best idea.
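The anchored pattern assumes each value sits in its own string. A quick sketch with made-up values, using a plain '-' instead of the entity for readability:

```php
<?php
// Hypothetical individual values, each held in its own string.
$values = ['0', '000', '104.67', '1000'];

// Only a value made up entirely of zeroes is replaced.
$results = array_map(function ($value) {
    return preg_replace('/^0+$/', '-', $value);
}, $values);

print_r($results);
```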

Why not just do this?
if ( $r_val == 0 )
    $r_val = '-';
You do not need to use a regex for this. In fact, I'd advise against doing so for performance reasons. The operation above is approximately 20x faster than the regex solution.
Also, the PHP manual advises against using regexes for simple replacements:
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().
http://us.php.net/manual/en/function.str-replace.php
Hope that helps!
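One caveat with the loose == comparison is that non-numeric strings compared equal to 0 before PHP 8. A slightly more defensive sketch follows; dashIfZero is a hypothetical helper name, not something from the answer.

```php
<?php
// Hypothetical helper: returns '-' for a value that is numerically zero.
function dashIfZero($value)
{
    // is_numeric() guards against non-numeric strings, which compared
    // loosely equal to 0 before PHP 8.
    if (is_numeric($value) && (float) $value == 0.0) {
        return '-';
    }
    return $value;
}

foreach (['0', '0.00', '104.67', 'n/a'] as $value) {
    echo $value, ' => ', dashIfZero($value), "\n";
}
```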
