I wonder if anyone can construct a regular expression that can detect if a person searches for something like "site:cnn.com" or "site:www.globe.com.ph/". I've been having the most difficult time figuring it out. Thanks a lot in advance!
Edit: Sorry forgot to mention my script is in PHP.
OK, for input into an arbitrary text field, something as simple as the following will work:
\bsite:(\S+)
where the parentheses will capture whatever site/domain they're trying to search. It won't verify it as valid, but validating URLs/domains is complex, and there are many easily googlable regexes for doing that; for instance, there's one here.
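As a minimal sketch of how that pattern could be wired into PHP: the helper name extractSiteOperator and the i (case-insensitive) flag are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: returns the text after "site:", or null when the
// query contains no site: operator.
function extractSiteOperator(string $query): ?string
{
    // The i flag is an addition so that "Site:" and "SITE:" also match.
    if (preg_match('/\bsite:(\S+)/i', $query, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$query = 'site:www.globe.com.ph/';
$site = extractSiteOperator($query);
echo $site !== null
    ? "Domain lookup for: $site\n"
    : "Regular web search for: $query\n";
```

Because of the \b, a query such as "website:foo" is not treated as a site: search, since there is no word boundary between "b" and "s".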
What are you matching against? A referer url?
Assuming you're matching against a referer url that looks like this:
http://www.google.com/search?client=safari&rls=en-us&q=whatever+site:foo.com&ie=UTF-8&oe=UTF-8
A regex like this should do the trick:
\bsite(?:\:|%3[aA])(?:(?!(?:%20|\+|&|$)).)+
Notes:
The colon after 'site' can either be unencoded or percent encoded. Most user agents will leave it unencoded (which I believe is actually contrary to the standard), but this will handle both.
I assumed the site:... value would be right-bounded by the equivalent of a space character, end of field (&) or end of string ($).
I didn't assume whether spaces are encoded as '+' (x-www-form-urlencoded) or as %20 (percent encoding). This will handle both.
The (?:...) is a non-capturing group. (?!...) is a negative lookahead.
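A rough sketch of applying that regex to a referer URL in PHP follows. The helper name siteFromReferer and the extra pair of capturing parentheses around the site value are assumptions added for illustration.

```php
<?php
// Hypothetical helper: pulls the site: value out of a raw (still encoded)
// referer URL. The outer capturing parentheses are an addition.
function siteFromReferer(string $referer): ?string
{
    $pattern = '/\bsite(?:\:|%3[aA])((?:(?!(?:%20|\+|&|$)).)+)/';
    if (preg_match($pattern, $referer, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$referer = 'http://www.google.com/search?client=safari&rls=en-us'
    . '&q=whatever+site:foo.com&ie=UTF-8&oe=UTF-8';
var_dump(siteFromReferer($referer));
```

One caveat worth knowing: \b does not match when site is directly preceded by %20, because "0" and "s" are both word characters. Decoding the query value first (for example with urldecode) and then using the simpler text-field pattern sidesteps that.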
No, it's not for a referrer URL. My PHP script basically spits out information about a domain (e.g. backlinks, pagerank etc.) and I need that regex so it will know what the user is searching for. If the user enters something that doesn't match the regex, it does a regular web search instead.
If this is all you are trying to do, I guess I'd take the simpler approach and just do:
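$entry = $_REQUEST['q'];
// split() was removed in PHP 7; explode() with a limit of 2 keeps any later colons in the site part
$tokens = explode(':', trim($entry), 2);
if (1 < count($tokens) && strtolower($tokens[0]) == 'site') {
    $site = $tokens[1];
}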
I have the following content in a string (query from the DB), example:
$fulltext = "Thank you so much, {gallery}art-by-stephen{/gallery}. As you know I fell in love with it from the moment I saw it and I couldn’t wait to have it in my home!";
So I only want to extract what it is between the {gallery} tags, I'm doing the following but it does not work:
$regexPatternGallery= '{gallery}([^"]*){/gallery}';
preg_match($regexPatternGallery, $fulltext, $matchesGallery);
if (!empty($matchesGallery[1])) {
echo ('<p>matchesGallery: '.$matchesGallery[1].'</p>');
}
Any suggestions?
Try this:
$regexPatternGallery= '/\{gallery\}(.*)\{\/gallery\}/';
You need to escape / and { with a \ before it. And you were missing the start and end / of the pattern.
http://www.phpliveregex.com/p/fn1
Similar to Andreas's answer, but it differs in using ([^"]*?):
$regexPatternGallery= '/\{gallery\}([^"]*?)\{\/gallery\}/';
Don't forget the delimiters at the beginning and end of the regex string. PHP's preg_* functions require them, unlike the regex APIs of many other languages; / is the usual choice.
{ and } can be read as regex syntax (quantifier braces) and / is the delimiter here, so escape them with \, as in \{.
Use ? to make the quantifier non-greedy, so the match stops at the first {/gallery}. That avoids over-matching on a string like "blabla {gallery}you should only get this{/gallery} but you also got this instead.{/gallery} Rarely happens but be careful anyway".
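As a side note on delimiters, here is a small sketch showing the same pattern written with / and with # as the delimiter. The shortened sample string and the variable names are made up for illustration.

```php
<?php
$fulltext = 'Thank you so much, {gallery}art-by-stephen{/gallery}.';

// "/" as delimiter: the slash inside the pattern has to be escaped.
preg_match('/\{gallery\}([^"]*?)\{\/gallery\}/', $fulltext, $withSlash);

// "#" as delimiter: the slash inside the pattern can stay unescaped.
preg_match('#\{gallery\}([^"]*?)\{/gallery\}#', $fulltext, $withHash);

// Both patterns capture the same text between the tags.
var_dump($withSlash[1] === $withHash[1]);
```

Picking a delimiter that does not occur in the pattern tends to keep URL-heavy or tag-heavy patterns easier to read.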
Try this RegEx:
\{gallery\}(.*?)\{\/gallery\}
The problem with your RegEx was that you did not escape the / in the closing {/gallery}. You also need to escape { and }.
You should use .*? for a lazy match, otherwise if there are 2 tags in one string, it will combine them. I.e. {gallery}by-joe{/gallery} and {gallery}by-tim{/gallery} would end up as:
by-joe{/gallery} and {gallery}by-tim
However, using a lazy match, you would get 2 results:
by-joe
by-tim
Live Demo on Regex101
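The greedy versus lazy difference described above can be checked with a short sketch like this one, using preg_match_all on the two-tag example string (the variable names are arbitrary):

```php
<?php
$text = '{gallery}by-joe{/gallery} and {gallery}by-tim{/gallery}';

// Greedy: .* runs to the last {/gallery}, merging both tags into one match.
preg_match_all('/\{gallery\}(.*)\{\/gallery\}/', $text, $greedy);

// Lazy: .*? stops at the first {/gallery}, giving one match per tag.
preg_match_all('/\{gallery\}(.*?)\{\/gallery\}/', $text, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```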
I'm using this code to validate URIs in php:
preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $uri)
However, this won't pass for URIs that end with an equals sign.
e.g. http://example.com?query=fish&offset=10 returns true, http://example.com?query=fish&offset= doesn't.
I can't see why this should be the case from the regex as it allows all characters following the ? sign.
Any tips?
Thanks,
Chris
Why don't you use filter_var? ;)
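A minimal sketch of the filter_var route, assuming only http and https URIs should pass as in the original pattern. The helper name isHttpUrl is hypothetical; filter_var, FILTER_VALIDATE_URL and parse_url are standard PHP.

```php
<?php
// Hypothetical helper: accepts only http/https URLs that filter_var
// considers well formed.
function isHttpUrl(string $uri): bool
{
    if (filter_var($uri, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL accepts other schemes too, so check it explicitly.
    $scheme = strtolower((string) parse_url($uri, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(isHttpUrl('http://example.com?query=fish&offset='));
```

FILTER_VALIDATE_URL is fairly permissive about what follows the host, so it is a syntax check rather than a guarantee that the URI resolves.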
Your RegEx isn't working as you anticipate.
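Your second group (.[a-z0-9-]+)* is capturing EVERYTHING past the first host label, because the unescaped . matches any character, not just a literal dot. Each repetition needs at least 2 characters (one arbitrary character followed by at least one of [a-z0-9-]), and since it's greedy, it consumes as much as it can. A trailing = has nothing after it, so that group can't match it and the overall match fails.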
Try this instead:
^http(s)?://[a-z0-9-]+\.[a-z0-9-]+(\.[a-z0-9-]+)?([/?][-a-z0-9=?&/]*)?$
If need be, change the last capturing group to include any characters you might need to include in your query string or URI.
I need to replace certain user-entered URLs with embedded flash objects, and I'm having trouble with a regex that I'm using to match the URL. I think that's mainly because the URLs are SEO-friendly and therefore a bit more difficult to parse.
URL structure: http://www.site.com/item/item_title_that_can_include_1('_etc-32CHARACTERALPHANUMERICGUID
I need to both detect a match of a URL in that format and capture the 32CHARACTERALPHANUMERICGUID, which is always placed after the - in the URL.
something like this:
$ret = preg_replace('#http://www\.site\.com/item/([^-])-([a-zA-Z0-9]+)#','<embed>itemid=$2</embed>', $ret);
For some reason, the above does not find a match for a URL in the specified format. I'm new to regexes, so I think I'm missing something fairly obvious.
You should check out parse_url().
Examine the results - it was made for parsing URLs. You'll be able to extract the data you require from the tokens returned.
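One way the parse_url() route might look, as a sketch only: the helper name itemIdFromUrl and the sample values are invented for illustration, and it assumes the ID is whatever follows the last - in the path.

```php
<?php
// Hypothetical helper: returns the 32-character ID after the last "-" in
// the URL path, or null when the URL does not have the expected shape.
function itemIdFromUrl(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || strpos($path, '/item/') !== 0) {
        return null;
    }
    $dash = strrpos($path, '-');
    if ($dash === false) {
        return null;
    }
    $id = (string) substr($path, $dash + 1);
    return (strlen($id) === 32 && ctype_alnum($id)) ? $id : null;
}

var_dump(itemIdFromUrl(
    'http://www.site.com/item/some_item_title-0123456789abcdef0123456789ABCDEF'
));
```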
If you are regex crazy, try this...
/^http:\/\/www\.site\.com\/item\/[^-]*\-([a-zA-Z0-9]{32})$/
Your example is almost there, but...
When you use a negated character class, i.e. [^-], you still need a quantifier. I used *, meaning 0 or more.
You don't seem to use the item title, so we won't bother capturing it.
You should use beginning (^) and end ($) anchors if the string is always exactly like that.
You say the GUID is 32 chars, so we may as well explicitly state that with the {32} quantifier.
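The points above can be tried out with a short sketch like the following. The sample URL and its 32-character ID are made up, and the embed markup is copied from the question.

```php
<?php
$pattern = '/^http:\/\/www\.site\.com\/item\/[^-]*\-([a-zA-Z0-9]{32})$/';

// Made-up sample URL with a 32-character alphanumeric ID after the dash.
$url = 'http://www.site.com/item/some_item_title-0123456789abcdef0123456789ABCDEF';

if (preg_match($pattern, $url, $matches) === 1) {
    // $matches[1] holds the captured ID.
    $embed = '<embed>itemid=' . $matches[1] . '</embed>';
    echo $embed, "\n";
}
```

Because of the ^ and $ anchors, this only matches when the whole string is the URL. For URLs embedded in longer user text, the anchors would have to go.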
I have the following possible strings:
'', or '4.', or '*.4' or '4.35'
All of the above formats are valid; anything else is invalid.
Basically, ignoring whether the characters are digits or word characters, this is what I use in PHP for the validation:
else if ( !ereg('^\*|.*\..*$',$bl_objver) )
Now, I would like to add some clientside validation, so I just translate it into javascript:
var ver_reg = new RegExp("^\*|.*\..*$");
if (ver_reg.test(obj_ver) == false)
but firebug always shows some error, like: "invalid quantifier |...*$" etc..
any suggestions?
(I'm not convinced your expression is correct, but for the moment just going with what you have.)
Using the RegExp object, you need to escape the backslashes:
var ver_reg = new RegExp("^\\*|.*\\..*$");
Alternatively you can use regex literal notation:
var ver_reg = /^\*|.*\..*$/;
That answers your direct question, but...
As for the expression, well, what you definitely want to correct is the start/end anchors each applying to one side of the alternation.
i.e. you're saying <this>|<that> where <this> is ^\* and <that> is .*\..*$
What you want is ^(?:<this>|<that>)$ to ensure the start/end markers are not part of the alternatives (but using ?: since we're not capturing the group).
So /^(?:\*|.*\..*)$/ using the second example above - this fix would also need applying to the PHP version (which can use the same syntax).
I'd also question your use of . instead of \w or [^.] or similar, but without knowing what you're actually doing, I can't say for sure what makes most sense.
Hope this helps! :)
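Since the anchoring fix also applies on the PHP side, here is a hedged sketch comparing the two forms with preg_match (used here in place of ereg, which no longer exists in PHP 7 and later). The test strings are chosen for illustration.

```php
<?php
$ungrouped = '/^\*|.*\..*$/';      // ^ and $ each bind to one alternative
$grouped   = '/^(?:\*|.*\..*)$/';  // ^ and $ apply to the whole alternation

foreach (['4.', '*.4', '4.35', '*abc', 'abc'] as $version) {
    printf(
        "%-5s ungrouped=%d grouped=%d\n",
        $version,
        preg_match($ungrouped, $version),
        preg_match($grouped, $version)
    );
}
```

The string '*abc' is where the two differ: the ungrouped form accepts it because ^\* alone matches the start, while the grouped form rejects it.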
In PHP, I need to be able to figure out if a string contains a URL. If there is a URL, I need to isolate it as another separate string.
For example: "SESAC showin the Love! http://twitpic.com/1uk7fi"
I need to be able to isolate the URL in that string into a new string. At the same time the URL needs to be kept intact in the original string. Follow?
I know this is probably really simple but it's killing me.
Something like
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/?:#=_#&%~,+$]+/', $string, $matches);
$matches[0] will hold the result.
(Note: this regex is certainly not RFC compliant; it may fetch malformed (per the spec) URLs. See http://www.faqs.org/rfcs/rfc1738.html).
This doesn't account for dashes (-), so I needed to add \- to the character class:
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/\-?:#=_#&%~,+$]+/', $_POST['string'], $matches);
URLs can't contain spaces, so...
\b(?:https?|ftp)://\S+
Should match any URL-like thing in a string.
The above is the pure regex. PHP preg_* and string escaping rules apply before you can use it.
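For instance, one possible PHP form of that regex is sketched below. The ~ delimiter (to avoid escaping the slashes) and the i flag are choices made here, not part of the answer.

```php
<?php
$string = 'SESAC showin the Love! http://twitpic.com/1uk7fi';

// "~" as delimiter means the "//" needs no escaping; the i flag is an addition.
if (preg_match('~\b(?:https?|ftp)://\S+~i', $string, $matches) === 1) {
    $url = $matches[0];   // the isolated URL
    echo $url, "\n";      // $string itself is left untouched
}
```

With preg_match_all in place of preg_match, the same pattern would collect every URL-like token in the string rather than just the first.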
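$test = "SESAC showin the Love! http://twitpic.com/1uk7fi";
// strstr() returns everything from the first "http" to the end of the string,
// so this only isolates the URL when nothing follows it
$myURL = strstr($test, "http");
echo $myURL; // prints http://twitpic.com/1uk7fi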