PHP How do I create an if statement using only preg_match? - php

I am trying to append a URL if necessary, and skip over it when not necessary. The think is, I'm learning php right now and I would like to use regular expressions as much as possible. would it be possible to make this code more concise using preg_match? Example:
<?php
$facebook_url = str_replace("facebook.org","facebook.com", trim($_REQUEST['facebook_url']));
$position = strpos($facebook_url, "facebook.com");
if ($position === false) {
$facebook_url = "http://www.facebook.com/" . $facebook_url;
}
?>
But using:
if (!preg_match("/^(http:///www.facebook.com | facebook.com)/i"), $facebook_url)) {
$facebook_url = "http://www.facebook.com/" . $facebook_url;
}
I feel like that should work the way I understand php syntax, but something isn't working right. Thank you in advance.

I don't know why you would want to use regex "as much as possible" as opposed to as much as is needed, which should be very little. In your case the original code is much faster, and you can still do it with less code:
if (stripos($facebook_url, "facebook.com") === false) {
Your regex would require a space after the .com or before the facebook in the alternation. Space matters in regex.

Related

How to get response from a url with two wildcard parameters?

I am trying to return html from a url that has two sections which need to have wildcards. I am pretty sure regex is the way to go (unless there are alternatives I am not aware of?)... but I can't seem to get it right. I don't have any knowledge of regex and was hoping someone could please help me with this.
The wildcards need to support any number of alphanumeric characters, including underscores and as many special characters as possible.
All of these are possible scenarios:
https://www.example.com/type/1062483_name
https://www.example.com/type/name_ii
https://www.example.com/type/name
I've tried using all the regex tools online but I can't seem to be able to match it. Here is what my code looks like at the moment:
$url = $baseurl . $type . $wildcard1 . $name . $wildcard2
function get_http_response_code($url) {
$headers = get_headers($url);
return substr($headers[0], 9, 3);
}
if(get_http_response_code($url) == "200"){
$html = file_get_contents($url);
}
I need (get_http_response_code($url) == "200") to return true.
Any advice on how to make this work would be amazing. Thanks!

Remove certain part of string in PHP [duplicate]

This question already has answers here:
Get domain name (not subdomain) in php
(18 answers)
Closed 10 years ago.
I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Do anyone know of a function or a way where I can remove certain parts of strings? Maybe it'll search for http, and when found it'll remove everything until after the // Then it'll search for www, if found it'll remove everything until the . Then it keeps everything until the next dot, where it removes everything behind it? Looking at it now, this might cause problems with sites as http://www.en.wikipedia.org because I'll be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
$parts = parse_url($url);
if($parts['scheme'] != 'http') {
$url = 'http://'.$url;
}
$parts2 = parse_url($url);
$host = $parts2['host'];
$remove = explode('.', $host);
$result = $remove[0];
if($result == 'www') {
$result = $remove[1];
}
return $result;
}
It's not perfect, at least considering subdomains, but I think it's possible to do something about it. Maybe add a second if statement at the end to check the length of the array. If it's bigger than two, then choose item nr1 instead of item nr0. This obviously gives me trouble related to any domain using .co.uk (because that'll be tree items long, but I don't want to return co). I'll try to work around on it a little bit, and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
You can use the parse_url function from PHP which returns exactly what you want - see
Use the parse_url method in php to get domain.com and then use replace .com with empty string.
I am a little rusty on my regular expressions but this should work.
$url='http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); //Will return en.wikipedia.org
$domain = preg_replace('\.com|\.org', '', $domain);
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for info on Regular Expression. It's a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace. It searches for a match based on your pattern and replaces the matches with your replacement.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
if (preg_match("/^http:\/\//i",$url))
preg_replace("/^http:\/\//i","",$url);
if (preg_match("/www./i",$url))
preg_replace("/www./i","",$url);
if (preg_match("/.com/i",$url))
preg_replace("/.com/i","",$url);
if (preg_match("/\/*$/",$url))
preg_replace("/\/*$/","",$url);
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get your pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get chance but basically it splits the URL into pieces by the full stop and then returns the second item which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com but for normal urls it would suffice.

PHP Regex problem!

I was creating a Syntax Highlighter in PHP but I was failed! You see when I was creating script comments (//) Syntax Highlighting (gray) , I was facing some problems. So I just created a shortened version of my Syntax Highlighting Function to show you all my problem. See whenever a PHP variable ,i.e., $example, is inserted in between the comment it doesn't get grayed as it should be according to my Syntax Highlighter. You see I'm using preg_replace() to achieve this. But the regex of it which I'm using currently doesn't seem to be right. I tried out almost everything that I know about it, but it doesn't work. See the demo code below.
Problem Demo Code
<?php
$str = '
<?php
//This is a php comment $test and resulted bad!
$text_cool++;
?>
';
$result = str_replace(array('<','>','/'),array('[',']','%%'),$str);
$result = preg_replace("/%%%%(.*?)(?=(\n))/","<span style=\"color:gray;\">$0</span>",$result);
$result = preg_replace("/(?<!\"|'|%%%%\w\s\t)[\$](?!\()(.*?)(?=(\W))/","<span style=\"color:green;\">$0</span>",$result);
$result = str_replace(array('[',']','%%'),array('<','>','/'),$result);
$resultArray = explode("\n",$result);
foreach ($resultArray as $i) {
echo $i.'</br>';
}
?>
Problem Demo Screen
So you see the result I want is that $test in the comment string of the 'Demo Screen' above should also be colored as gray!(See below.)
Can anyone help me solve this problem?
I'm Aware of highlight_string() function!
THANKS IN ADVANCE!
Reinventing the wheel?
highlight_string()
Also, this is why they have parsers, and regex (despite popular demand) should not be used as a parser.
I agree, that you should use existing, parsers. Every ide has a php parser, and many people have written more of them.
That said, I do think it is worth the mental exercise. So, you can replace:
$result = preg_replace("/(?<!\"|')[\$](?!\()(.*?)(?=(\W))/","<span style=\"color:green;\">$0</span>",$result);
with
//regular expression.:
//#([^(%%%%|\"|')]*)([\$](?!\()(.*?)(?=(\W)))#
//replacement text:
//$1<span style=\"color:green;\">$2</span>
$result = preg_replace("#([^(%%%%|\"|')]*)([\$](?!\()(.*?)(?=(\W)))#","$1<span style=\"color:green;\">$2</span>",$result);
Personally, I think your best bet is to use CSS selectors. Replace style=\"color:gray;\" with class="comment-text" and style=\"color:green;\" with class="variable-text" and this CSS should work for you:
.variable-text {
color: #00E;
}
.comment-text .comment-text.variable-text {
color: #DDD;
}
Insert don't use regex to parse irregular languages here
anyway, it looks like you've run into a prime example of why regular expressions are not suited for this kind of problem. You'd be better off looking into PHP's highlight_string functionality
Well, you don't seem to care that php already has a function like this.
But because of the structure of php code one cannot simply use a regex for this or walk into mordor (the latter being the easier).
You have to use a parser or you will fly over the cuckoo's nest soon.

Is regex the right tool to find a line of HTML?

I have a PHP script that pulls some content off of a server, but the problem is that the line on which the content is changes every day, so I can't just pull a specific line. However, the content is contained within a div that has a unique id. Is it possible (and is it the best way) for regex to search for this unique id and then pass the line of which it's on back to my script?
Example:
HTML file:
<html><head><title>Example</title></head>
<body>
<div id="Alpha"> Blah blah blah </div>
<div id="Beta"> Blah Blah Blah </div>
</body>
</html>
So let's say that I'm looking for the line with an opening div tag with an id of alpha. The code should return 3, because on the third line is the div with the id of alpha.
At the risk of providing more up-votes for Jeff who has already crossed the mountains of madness... see here
The argument rages back and forth, but... it's is a simple one-off or little used script you are writing then sure use regex, if it's more complex and needs to be reliable with little future tweaking then I'd suggest using an HTML parser. HTML is a nasty often non-regular beast to tame. Use the right tool for the job... maybe in your case it's regex, or maybe its a full blown parser.
Generally, NO. But if you are sure that the div will always be one line or there is not another div inside it, you can use it without problem. Something like /<div id=\"mydivid\">(.*?)</div>/ or something similar.
Otherwise, DOMDocument would be a more sane way.
EDIT See from your HTML example. My answer would be "YES". RegEx is a very good tool for this.
I assume that you have the HTML as a continuous text not as lines (which will be slightly different). I also assume that you want the line number more that the line content.
Here is a rought PHP code to extract it. (just to give some idea)
$HTML =
"<html><head><title>Example</title></head>
<body>
<div id=\"Alpha\"> Blah blah blah </div>
<div id=\"Beta\"> Blah Blah Blah </div>
</body>
</html>";
$ID = "Alpha";
function GetLineOfDIV($HTML, $ID) {
$RegEx_Alpha = '/\n(<div id="'.$ID.'">.*?<\/div>)\n/m';
$Index = preg_match($RegEx_Alpha, $HTML, $Match, PREG_OFFSET_CAPTURE);
$Match = $Match[1]; // Only the one in '(...)'
if ($Match == "")
return -1;
//$MatchStr = $Match[0]; Since you do not want it, so we comment it out.
$MatchOffset = $Match[1];
$StartLines = preg_split("/\n/", $HTML, -1, PREG_SPLIT_OFFSET_CAPTURE);
foreach($StartLines as $I => $StartLine) {
$LineOffset = $StartLine[1];
if ($MatchOffset <= $LineOffset)
return $I + 1;
}
return count($StartLines);
}
echo GetLineOfDIV($HTML, $ID);
I hope I give you some idea.
According to Jeff Atwood, you should never parse HTML using regex.
Since the line number is important to you here and not the actual contents of the div, I'd be inclined not to use regex at all. I'd probably explode() the string into an array and loop through that array looking for your marker. Like so:
<?php
$myContent = "[your string of html here]";
$myArray = explode("\n", $myContent);
$arraylen = count($myArray); // So you don't waste time counting the array at every loop
$lineNo = 0;
for($i = 0; $i < $arraylen; $i++)
{
$pos = strpos($myArray[$i], 'id="Alpha"');
if($pos !== false)
{
$lineNo = $i+1;
break;
}
}
?>
Disclaimer: I haven't got a php installation readily available to test this so some debugging may be required.
Hope this helps as I think it's probably just going to be a waste of time for you to implement a parsing engine just to do something so simple - especially if it's a one-off.
Edit: if the content is impotant to you at this stage too then you can use this in combination with the other answers which provide an adequate regex for the job.
Edit #2: Oh what the hey... here's my two cents:
"/<div.*?id=\"Alpha\".*?>.*?(<div.*//div>)*.*?//div>/m"
The (<div.*//div>) tells the regex engine that it may find nested div tags and to just incorporate them into the match if it finds them rather than just stopping at the first </div>. However this only solves the problem if there is only one level of nesting. If there's more, then regex is not for you sorry :(.
The /m also makes the regex engine ignore linebreaks so you don't have to dirty up your expressions with [\S\s] everywhere.
Again, sorry, I've no environment to test this in at the moment so you may need to debug.
Cheers
Iain
The fact that a unique id is involved, sounds promising, but since it will be a DIV, and not necessarily a single line of HTML, it will be difficult to construct a regular expression, and the usual objections to parsing HTML with regexes apply.
Not recommended.
Instead of RegEx, use a parser that is made especially to handle (messy) HTML. This will make your application less brittle in case the HTML changes slightly, and you don't have to hand-craft custom RegEx each time you want to pull out a new piece of data.
See this Stack Overflow page: Mature HTML Parsers for PHP
#OP since your requirement is that easy, you can just use string methods
$f = fopen("file","r");
if($f){
$s="";
while( !feof($f) ){
$i+=1;
$line = fgets($f,4096);
if (stripos($line,'<div id="Alpha">')!==FALSE){
print "line number: $i\n";
}
}
fclose($f);
}

PHP building a basic calculator system from user input

I'd like to make a script where the user can enter a sum e.g. 4^5+(56+2)/3 or any other basic maths sum (no functions etc.) how would I go about doing this? Presumably regex. Could somebody point me in the right direction - I'm guessing this isn't going to be too easy so I'd just like some advice on where to start and I'll take it from there.
Have a look at this: http://www.webcheatsheet.com/php/regular_expressions.php
It's a good intro to Regular Expressions and how to use them in PHP.
Yes, someone can (and probably will) just give you the regex you need to work this out but it helps a lot if you understand HOW your regex works. They look scary but aren't that bad really...
this is not my code, but there is a great PHP snippet that lets you use Google Calculator to do the calculations. So you can just enter your query (ie, "7+3") using regular/FOIL notation or whatever, and it will return the result.
http://www.hawkee.com/snippet/5812/
<?php
// Google calculator
function do_calculator($query){
if (!empty($query)){
$url = "http://www.google.co.uk/search?q=".urlencode($query);
$f = array("Â", "<font size=-2> </font>", " × 10", "<sup>", "</sup>");$t = array("", "", "e", "^", "");
preg_match('/<h2 class=r style="font-size:138%"><b>(.*?)<\/b><\/h2>/', file_get_contents($url), $matches);
if (!$matches['1']){
return 'Your input could not be processed..';
} else {
return str_replace($f, $t, $matches['1']);
}
} else {
return 'You must supply a query.';
}
}
?>
The easy way to do it is with eval(). If you're accepting arbitrary input from a web form and executing in on a server, though, you MUST be careful to only accept valid expressions. Use a regex for that.

Categories