Regular expression to convert usernames into links like Twitter does - php

On Twitter, when you write #moustafa
it is changed to <a href='user/moustafa'>#moustafa</a>
Now I want to do the same thing:
when someone writes #moustafa followed by a space, only #moustafa should be converted.

One regular expression that could be used (shamelessly stolen from the #anywhere javascript library mentioned in another answer) would be:
\B\#([a-zA-Z0-9_]{1,20})
This looks for a non-word-boundary (to prevent a#b [i.e. emails] from matching) followed by #, then between one and 20 (inclusive) characters in that character class. Of course, you could also take the anything-except-space route, as in other answers; it depends very much on which values are to be (dis)allowed in the label part of the #label.
To use the regex above in PHP, something like the following could be used to replace the matches in a string $subject.
$subject = 'Hello, #moustafa how are you today?';
echo preg_replace('/\B\#([a-zA-Z0-9_]{1,20})/', '<a href="user/$1">$0</a>', $subject);
The above outputs something like:
Hello, <a href="user/moustafa">#moustafa</a> how are you today?

You're looking for a regular expression that matches #username, where username doesn't have a space? You can use:
#[^ ]+
If you know the allowed characters in a username you can be more specific, like if they have to be alphanumeric:
#[A-Za-z0-9]+
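As a minimal sketch of how the alphanumeric pattern above could be used for the linking step, assuming the user/ link target from the question: the pattern is wrapped in / delimiters so that preg_replace accepts it, and the function name linkUsernames is only illustrative.

```php
<?php
// Hypothetical helper: wraps every #username in a link to user/<username>.
// Assumes usernames are alphanumeric only, as in the pattern above.
function linkUsernames(string $text): string
{
    // $1 is the captured username without the leading #, $0 is the whole match.
    return preg_replace('/#([A-Za-z0-9]+)/', '<a href="user/$1">$0</a>', $text);
}

echo linkUsernames('Hello, #moustafa how are you today?');
```

Because the character class excludes spaces and punctuation, the match stops at the space after the username, which is the behaviour the question asks for.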

Regular expressions in PHP are just strings that start and end with the same delimiter character. By convention this character is /
So you can use something like this as an argument to any of PHP's preg_* regular expression functions:
Not space:
"/[^ ]+/"
Alphanumeric only:
"/[A-Za-z0-9]+/"

Why not use the #anywhere javascript library that Twitter have recently released?

There are several libraries that perform this selection and linking for you. Currently I know of Java, Ruby, and PHP libraries under mzsanford's Github account: http://github.com/mzsanford/twitter-text-rb

Related

Regular expression to filter links

I am using this regular expression to filter .pdf files from a web page:
$regex='|<a.*?href="(.*pdf?)"|';
It does the job if the link is like this:
www.xyz.com/trgrrtr/ghtty.pdf
but if the links are something like this, it is unable to filter:
www.xyz.com/trgrrtr/ghtty.pdf?code=KksRHhdVXAoECBFCVFpeXBsBUgYMDQpxd3J2d3F2fDtzfnFuLiErNXNpIG5kYm16aGhpcmxoa05QV1VKUVFFUxQ%3D
What regular expression should I use to filter out this link from a webpage?
First of all, you need to escape the ? otherwise it just makes the f in front of it optional. Then you could do something like this:
$regex = '|<a.*?href="([^"]*\.pdf\?[^"]*)"|';
The use of the negated character class makes sure that you cannot leave the attribute. (.* could consume the attribute-ending " as well, and go on until " matches another double quote further down the string.)
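Note that the pattern above requires a ? after .pdf. If links both with and without a query string should match, one possible variation (a sketch, not part of the original answer) makes the query part optional. The sample markup and URLs below are placeholders modelled on the question.

```php
<?php
// Sample markup; the URLs are placeholders modelled on the question.
$html = '<a href="www.xyz.com/a.pdf">A</a> '
      . '<a href="www.xyz.com/b.pdf?code=abc%3D">B</a> '
      . '<a href="www.xyz.com/c.html">C</a>';

// (?:\?[^"]*)? makes the query string optional, so both link styles match.
// [^>]*? keeps the search for href inside a single <a ...> tag.
$regex = '|<a[^>]*?href="([^"]*\.pdf(?:\?[^"]*)?)"|i';

preg_match_all($regex, $html, $matches);
print_r($matches[1]); // the captured href values of the PDF links
```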
But I really recommend that you use a DOM parser to find the link-elements first. PHP has a built-in one and there is a very nice and convenient 3rd-party alternative.
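A rough sketch of the DOM-parser approach, assuming PHP's bundled DOM extension (DOMDocument) is the built-in parser meant here. The function name pdfLinksFromHtml and the sample URLs are illustrative only.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose path ends in .pdf.
function pdfLinksFromHtml(string $html): array
{
    $doc = new DOMDocument();
    // @ suppresses the warnings that non-well-formed real-world HTML triggers.
    @$doc->loadHTML($html);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Look only at the path, so a query string after .pdf does not matter.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf') {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<p><a href="www.xyz.com/a.pdf">A</a> '
      . '<a href="www.xyz.com/b.pdf?code=abc">B</a> '
      . '<a href="www.xyz.com/c.html">C</a></p>';
print_r(pdfLinksFromHtml($html));
```

The extension check runs on the parsed path rather than the raw attribute, so the regex problem with the trailing query string does not arise.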
The blog post An Improved Liberal, Accurate Regex Pattern for Matching URLs may help.

Regex to find function call php

I need a regex to find function calls in strings in PHP. I have tried to search here on Stack Overflow but none of the ones I've tried worked.
This pattern: ^.*([\w][\(].*[\)])
This will match: functionone(fgfg) but also functionone(fgfg) dhgfghfgh functiontwo() as one match, not 2 separate matches (as in functionone(fgfg) and functiontwo()).
I don't know how to write it but I think this is what I need.
1. Any string, followed by (
2. Any string followed by )
And then it should stop, not continue. Any regex-gurus that can help me out?
I see 5 issues with your regex:
1. If you want to match 2 functions on the same line, don't use the anchor ^; it anchors the regex to the start of the string.
2. You then don't need .* at the start; something like \w+ is probably closer (I am not sure what the spec for a function name in PHP is).
3. If there is only one entry in a character class (and it's not a negated one), you don't need the character class.
4. The quantifier between the parentheses needs to be a lazy one (followed by a ?).
So after these 4 points your regex would look something like
\w+\(.*?\)
5. Is a regex really the right tool for this job?
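A small sketch of the corrected pattern in use, with preg_match_all and the sample string from the question. The / delimiters are added because PHP's preg_* functions require them.

```php
<?php
$subject = 'functionone(fgfg) dhgfghfgh functiontwo()';

// \w+   : the function name
// \(    : a literal opening parenthesis
// .*?   : lazily, as little as possible up to ...
// \)    : ... the first closing parenthesis
preg_match_all('/\w+\(.*?\)/', $subject, $matches);

print_r($matches[0]); // two separate matches: functionone(fgfg) and functiontwo()
```

Nested parentheses, such as a call inside the argument list, would still trip up the lazy quantifier, which is one reason the last point above matters.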
Don't use regexp for this... use PHP's built-in tokenizer
A function signature is not a regular language. As such, you cannot use a regular expression to match a function signature. Your current regex will match signatures that are NOT valid function signatures.
What I would suggest you use is the PHP tokenizer.
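A minimal sketch of the tokenizer approach, using token_get_all. The helper name findFunctionCalls is hypothetical, and the logic is deliberately naive: it reports every plain identifier that is followed by an opening parenthesis, so it would also pick up names in function declarations and does not handle namespaced names.

```php
<?php
// Hypothetical helper: lists identifiers that are directly followed by "(".
function findFunctionCalls(string $code): array
{
    $tokens = token_get_all($code);
    $count  = count($tokens);
    $calls  = [];

    for ($i = 0; $i < $count; $i++) {
        // Multi-character tokens are arrays: [id, text, line].
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING) {
            continue;
        }
        // Skip any whitespace between the name and the parenthesis.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        // Single-character tokens such as "(" are plain strings.
        if ($j < $count && $tokens[$j] === '(') {
            $calls[] = $tokens[$i][1];
        }
    }
    return $calls;
}

// The input must include the opening tag, otherwise it is treated as inline HTML.
print_r(findFunctionCalls('<?php functionone($a); $b = 1; functiontwo ();'));
```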

Regular expression anchor text for a link

I am trying to pull the anchor text from a link that is formatted this way:
<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>
I want only the anchor text for the link : "i_want_this"
"variable_text" varies according to the filename so I need to ignore that.
I am using this regex:
<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>
This is matching of course the complete link.
PHP's preg_* functions use PCRE (Perl-Compatible Regular Expressions), which is pretty close to Perl's regex syntax. If you want to know a lot about regex, visit perlretut.org. Also, look into regex generators like Expresso.
For your use, know that regex quantifiers are greedy by default. That means that when you specify that you want something, followed by anything (any number of repetitions), followed by something, the "anything" part will grab as much as it can, so the match runs up to the last occurrence of that second something rather than the first.
to be more clear, what you want is this:
<a href="
any character, any number of times (regex = .* )
">
any character, any number of times (regex = .* )
</a>
Beyond that, you want to capture the second group of "any character, any number of times". You can do that using what are called capture groups (anything inside parentheses is captured as a group for reference later; referring back to such a group is called a back reference).
I would also look into named subpatterns, too - with those, you can reference your choice with a human-readable string rather than an array index. The syntax for those in PHP is (?P<name>pattern) where name is the name you want and pattern is the actual regex. I'll use that below.
So all that being said, here's the "lazy web" for your regex:
<?php
$str = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';
$regex = '/(<a href\=".*">)(?P<target>.*)(<\/a>)/';
preg_match($regex, $str, $matches);
print $matches['target']; // This should output "i_want_this"
?>
Oh, and one final thought. Depending on what you are doing exactly, you may want to look into SimpleXML instead of using regex for this. This would probably require that the tags that we see are just snippets of a larger whole, as SimpleXML requires well-formed XML (or XHTML).
I'm sure someone will probably have a more elegant solution, but I think this will do what you want done.
Where:
$subject = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';
Option 1:
$pattern1 = '/(<a href=")(.*)(">)(.*)(<\/a>)/i';
preg_match($pattern1, $subject, $matches1);
print($matches1[4]);
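Option 2 (ereg() was deprecated in PHP 5.3 and removed in PHP 7, so this only runs on older versions):
$pattern2 = '(<a href=")(.*)(">)(.*)(</a>)';
ereg($pattern2, $subject, $matches2);
print($matches2[4]);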
Do not use regex to parse HTML. Use a DOM parser. Specify the language you're using, too.
Since it's in a captured group and since you claim it's matching, you should be able to reference it through $1 or \1 depending on the language.
$blah = preg_match( $pattern, $subject, $matches );
print_r($matches);
The thing to remember is that regexes return everything you searched for if they match. You need to specify that you only care about the part you've surrounded in parentheses (the anchor text). I'm not sure what language you're using the regex in, but here's an example in Ruby:
string = '<a href="/en/browse/file/variable_text">i_want_this</a>'
data = string.match(/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/)
puts data # => outputs the whole match, '<a href="/en/browse/file/variable_text">i_want_this</a>'
If you specify what you want in parentheses, you can reference it:
string = '<a href="/en/browse/file/variable_text">i_want_this</a>'
data = string.match(/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/)[1]
puts data # => outputs 'i_want_this'
Perl will have you use $1 instead of [1] like this:
$string = '<a href="/en/browse/file/variable_text">i_want_this</a>';
$string =~ m/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/;
$data = $1;
print $data . "\n";
Hope that helps.
I'm not 100% sure if I understand what you want. This will match the content between the anchor tags. The URL must start with /en/browse/file/, but may end with anything.
#(.*?)#
I used # as a delimiter as it made it clearer. It'll also help if you put them in single quotes instead of double quotes so you don't have to escape anything at all.
If you want to limit to numbers instead, you can use:
#(.*?)#
If it should have just 5 numbers:
#(.*?)#
If it should have between 3 and 6 numbers:
#(.*?)#
If it should have more than 2 numbers:
#(.*?)#
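As a hedged sketch of what such patterns might look like in full, assuming the link markup from the question (an href starting with /en/browse/file/): the numeric file name 12345 is an invented stand-in so the digit-based variants have something to match, and "more than 2 numbers" is read here as three or more digits.

```php
<?php
// Assumed markup: the link from the question, with a numeric file name
// standing in for "variable_text" so the digit-based patterns can match.
$subject = '<h3><b>File</b> : <a href="/en/browse/file/12345">i_want_this</a></h3>';

// # is the delimiter, so the slashes in the URL need no escaping.
$patterns = [
    'anything'           => '#<a href="/en/browse/file/[^"]*">(.*?)</a>#',
    'digits only'        => '#<a href="/en/browse/file/[0-9]+">(.*?)</a>#',
    'exactly 5 digits'   => '#<a href="/en/browse/file/[0-9]{5}">(.*?)</a>#',
    '3 to 6 digits'      => '#<a href="/en/browse/file/[0-9]{3,6}">(.*?)</a>#',
    'more than 2 digits' => '#<a href="/en/browse/file/[0-9]{3,}">(.*?)</a>#',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // $m[1] holds the anchor text captured by (.*?); null means no match.
    $results[$label] = preg_match($pattern, $subject, $m) ? $m[1] : null;
}
print_r($results);
```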
This should work:
<a href="[^"]*">([^<]*)
This says: take EVERYTHING you find until you meet a "
[^"]*
Same here! Take everything with you until you meet a <
[^<]*
The parentheses around [^<]*
([^<]*)
group it, so you can collect that data in PHP! If you look in the PHP manual on preg_match you will see many fine examples there!
Good luck!
And for your concrete example:
<a href="/en/browse/file/variable_text">([^<]*)
I use
[^<]*
because in some cases
.*?
can be extremely slow! You shouldn't use that if you can use
[^<]*
You should use the tool Expresso for creating regular expressions. Pretty handy.
http://www.ultrapico.com/Expresso.htm

How to write regex to find one directory in a URL?

Here is the subject:
http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov
What I need using regex is only the bit before the last / (including that last / too)
The 937IPiztQG string may change; it will contain a-z A-Z 0-9 - _
Here's what I tried:
$code = strstr($url, '/http:\/\/www\.mysite\.com\/files\/get\/([A-Za-z0-9]+)./');
EDIT: I need to use regex because I don't actually know the URL. I have string like this...
a song
more text
oh and here goes some more blah blah
I need it to read that string and cut off filename part of the URLs.
You really don't need a regexp here. Here is a simple solution:
echo basename(dirname('http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov'));
// echoes "937IPiztQG"
Also, I'd like to quote Jamie Zawinski:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
This seems far too simple to use regex. Use something similar to strrpos to look for the last occurrence of the '/' character, and then use substr to trim the string.
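A short sketch of the strrpos/substr idea, using the URL from the question. The variable names are illustrative.

```php
<?php
$url = 'http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov';

// Position of the last "/" in the string, or false if there is none.
$lastSlash = strrpos($url, '/');

// Keep everything up to and including that slash.
$directory = $lastSlash === false ? $url : substr($url, 0, $lastSlash + 1);

echo $directory; // http://www.mysite.com/files/get/937IPiztQG/
```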
/http:\/\/www\.mysite\.com\/files\/get\/([^\/]+)\//
How about something like this? It should capture anything that's not a /, 1 or more times, before a /.
The greediness of regexp will assure this works fine ^.*/
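A minimal sketch of that greedy pattern in PHP, assuming the whole string is a single URL as in the question's example. # is used as the delimiter so the / needs no escaping.

```php
<?php
$url = 'http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov';

$prefix = null;
// .* is greedy, so it runs to the end and backtracks to the LAST "/".
if (preg_match('#^.*/#', $url, $m)) {
    $prefix = $m[0];
}

echo $prefix; // http://www.mysite.com/files/get/937IPiztQG/
```

This only works when the URL stands alone. Inside a longer text, the ^ anchor and the greedy .* would swallow everything before the last slash in the whole string.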
The strstr() function does not use a regular expression for any of its arguments, so it's the wrong function for regex replacement.
Are you thinking of preg_replace()?
But a function like basename() would be more appropriate.
Try this
$ok=preg_match('#mysite\.com/files/get/([^/]*)#i',$url,$m);
if($ok) $code=$m[1];
Then give a good read to these pages
http://www.php.net/preg_match
preg_replace
Note the use of "#" as a delimiter to avoid getting trapped into escaping too many "/"; the "i" flag, which makes the match case-insensitive (allowing more liberal spellings of the MySite.com domain name); and the $m array of captured results.

Replace Local Links, Keep External Links

I have an API call that essentially returns the HTML of a hosted wiki application page. I'm then doing some substr, str_replace and preg_replace kung-fu to format it as per my site's style guides.
I do one set of calls to format my left nav (changing a link to pageX to my wikiParse?page=pageX type of thing). I can safely do this on the left nav. In the body text, however, I cannot safely assume a link is a link to an internal page. It could very well be a link to an external resource. So I need to do a preg_replace that matches href= that is not followed by http://.
Here is my stab at it:
$result = preg_replace('href\=\"(?!http\:\/\/)','href="bla?id=',$result);
This seems to strip out the entire contents on the page. Anyone see where I slipped up? I don't think I'm too far off, just can't see where to go next.
Cheers
The preg_* functions expect Perl-Compatible Regular Expressions (PCRE). The structural difference from normal regular expressions is that the expression itself is wrapped in delimiters that separate the expression from possible modifiers. The classic delimiter is the / but PHP allows any other non-alphanumeric character except the backslash character. See also Introduction to PCRE in PHP.
So try this:
$result = preg_replace('/href="(?!http:\/\/)/', 'href="bla?id=', $result);
Here href="(?!http://) is the regular expression. But as we use / as delimiters, the occurrences of / inside the regular expression must be escaped using backslashes.
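As a variation on the same idea, a different delimiter such as # avoids the escaping altogether. The sketch below also excludes https:// links via https?, which is an added assumption and not part of the question. The sample markup is made up for illustration.

```php
<?php
// Sample markup with one internal and two external links (illustrative only).
$html = '<a href="pageX">x</a> '
      . '<a href="http://example.com">y</a> '
      . '<a href="https://example.com">z</a>';

// (?!https?://) is a negative lookahead: match href=" only when it is NOT
// followed by http:// or https://. With # as delimiter, / needs no escaping.
$result = preg_replace('#href="(?!https?://)#', 'href="bla?id=', $html);

echo $result;
```

Only the internal link is rewritten to bla?id=pageX, and both external links are left as they were.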
Your regexp is missing its starting and ending delimiters (by convention '/'):
$result = preg_replace('/href\=\"(?!http\:\/\/)/','href="bla?id=',$result);
