PHP Remove everything after and including .html in a web address string

PHP Remove everything after and including .html in a web address string - php

I'm trying to remove everything after and including '.html' in a web address string. Current (failing) code is:
$input = 'http://example.com/somepage.html?foo=bar&baz=x';
$result = preg_replace("/(.html)[^.html]+$/i",'',$input);
Desired outcome:
value of $result is 'http://example.com/somepage'
Some other examples of $input that should lead to same value $result:
http://example.com/somepage
http://example.com/somepage.html
http://example.com/somepage.html?url=http://example.com/index.html

Your regular expresson is wrong, it would only match strings ending with <one char> "html" <one or more chars matching ., h, t, m or l>. Since preg_replace just returns the string "as-is" if there was no match, you'd be fine with matching the literal .html and ignoring anything after it:
$result = preg_replace('/\.html.*/', '', $input);

Why not use parse_url instead?

If you ever have issues with the syntax for preg_replace() then you can also use explode():
$input = explode(".html", $input);
$result = $input[0];

Related

Matching a substring (an apostrophe) in a given word using regex

I have a server application which looks up where the stress is in Russian words. The end user writes a word жажда. The server downloads a page from another server which contains the stresses indicated with apostrophes for each case/declension like this жа'жда. I need to find that word in the downloaded page.
In Russian the stress is always written after a vowel. I've been using so far a regex that is a grouping of all possible combinations (жа'жда|жажда'). Is there a more elegant solution using just a regex pattern instead of making a PHP script which creates all these combinations?
EDIT:
I have a word жажда
The downloaded page contains the string жа'жда. (notice the
apostrophe, I do not before-hand know where the apostrophe in the
word is)
I want to match the word with apostrophe (жа'жда).
P.S.: So far I have a PHP script creating the string (жа'жда|жажда') used in regex (apostrophe is only after vowels) which matches it. My goal is to get rid of this script and use just regex in case it's possible.

If I understand your question,
have these options (d'isorder|di'sorder|dis'order|diso'rder|disor'der|disord'er|disorde'r|disorder‌') and one of these is in the downloaded page and I need to find out which one it is
this may suit your needs:
<pre>
<?php
$s = "d'isorder|di'sorder|dis'order|diso'rder|disor'der|disord'er|disorde'r|disorder'|disorde'";
$s = explode("|",$s);
print_r($s);
$matches = preg_grep("#[aeiou]'#", $s);
print_r($matches);
running example: https://eval.in/207282

Uhm... Is this ok with you?
<?php
function find_stresses($word, $haystack) {
$pattern = preg_replace('/[aeiou]/', '\0\'?', $word);
$pattern = "/\b$pattern\b/";
// word = 'disorder', pattern = "diso'?rde'?r"
preg_match_all($pattern, $haystack, $matches);
return $matches[0];
}
$hay = "something diso'rder somethingelse";
find_stresses('disorder', $hay);
// => array(diso'rder)
You didn't specify if there can be more than one match, but if not, you could use preg_match instead of preg_match_all (faster). For example, in Italian language we have àncora and ancòra :P
Obviously if you use preg_match, the result would be a string instead of an array.

Based, on your code, and the requirements that no function is called and disorder is excluded. I think this is what you want. I have added a test vector.
<pre>
<?php
// test code
$downloadedPage = "
there is some disorde'r
there is some disord'er in the example
there is some di'sorder in the example
there also' is some order in the example
there is some disorder in the example
there is some dso'rder in the example
";
$word = 'disorder';
preg_match_all("#".preg_replace("#[aeiou]#", "$0'?", $word)."#iu"
, $downloadedPage
, $result
);
print_r($result);
$result = preg_grep("#'#"
, $result[0]
);
print_r($result);
// the code you need
$word = 'also';
preg_match("#".preg_replace("#[aeiou]#", "$0'?", $word)."#iu"
, $downloadedPage
, $result
);
print_r($result);
$result = preg_grep("#'#"
, $result
);
print_r($result);
Working demo: https://eval.in/207312

Preg_match not returning success

When I try to match some user input code I always get 0 as returned value.
$input = $_POST['input'];
$look = '[a-zA-Z]';
preg_match($look,$input);
For some reason I always get 0 as return value, why?

Couple of issues here in your regex:
Regex needs a delimiter so it should be `/[a-zA-Z]/'
You are probably more than single US English letter so better to use + modifier to match more than 1 letter
Not using start and end of line anchors can cause problems and you may get false positive from your preg_match call.
Combining all suggestions, you can use this regex:
$look = '/^[a-zA-Z]+$/';
OR
$look = '/^[a-z]+$/i';

<?php
$input = $_POST['input'];
$look = '/^[a-zA-Z]/';
preg_match($look,$input);
?>
See Manual

Replace a character only in one special part of a string

When I've a string:
$string = 'word1="abc.3" word2="xyz.3"';
How can I replace the point with a comma after xyz in xyz.3 and keep him after abc in abc.3?

You've provided an example but not a description of when the content should be modified and when it should be kept the same. The solution might be simply:
str_replace("xyz.", "xyz", $input);
But if you explicitly want a more explicit match, say requiring a digit after the ful stop, then:
preg_replace("/xyz\.([0-9])+/", 'xyz\${1}', $input);
(not tested)

something like (sorry i did this with javascript and didn't see the PHP tag).
var stringWithPoint = 'word1="abc.3" word2="xyz.3"';
var nopoint = stringWithPoint.replace('xyz.3', 'xyz3');
in php
$str = 'word1="abc.3" word2="xyz.3"';
echo str_replace('xyz.3', 'xyz3', $str);

You can use PHP's string functions to remove the point (.).
str_replace(".", "", $word2);

It depends what are the criteria for replace or not.
You could split string into parts (use explode or preg_split), then replace dot in some parts (eg. str_replace), next join them together (implode).

how about:
$string = 'word1="abc.3" word2="xyz.3"';
echo preg_replace('/\.([^.]+)$/', ',$1', $string);
output:
word1="abc.3" word2="xyz,3"

regular expression to remove ID=1234 from a string (PHP)

I am trying to create a regular expression to do the following (within a preg_replace)
$str = 'http://www.site.com&ID=1620';
$str = 'http://www.site.com';
How would I write a preg_replace to simply remove the &ID=1620 from the string (taking into account the ID could be variable string length
thanks in advance

You could use...
$str = preg_replace('/[?&;]ID=\d+/', '', $str);
I'm assuming this is meant to be a normal URL, hence the [?&;]. If that's the case, the & should be a ?.
If it's part of a larger list of GET params, you are probably better off using...
parse_str($str, $params);
unset($params['ID']);
$str = http_build_query($params);

I'm guessing that & is not allowed as a character in the ID attribute. In that case, you can use
$result = preg_replace('/&ID=[^&]+/', '', $subject);
or (possibly better, thanks to PaulP.R.O.):
$result = preg_replace('/[?&]ID=[^&]+/', '', $subject);
This will remove &ID= (the second version would also remove ?ID=) plus any amount of characters that follow until the next & or end of string. This approach makes sure that any following attributes will be left alone:
$str = 'http://www.site.com?spam=eggs&ID=1620&foo=bar';
will be changed into
$str = 'http://www.site.com?spam=eggs&foo=bar';

You can just use parse_url
(that is if the URL is of the form: http://something.com?id1=1&id2=2):
$url = parse_url($str);
echo "http://{$url['host]}";

Supposedly valid regular expression doesn't return any data in PHP

I am using the following code:
<?php
$stock = $_GET[s]; //returns stock ticker symbol eg GOOG or YHOO
$first = $stock[0];
$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);
$r_header = '/Prev. Week(.+?)Next Week/';
$r_date = '/\<b\>(.+?)\<\/b\>/';
preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);
echo $date[1];
?>
I've checked the regular expressions here and they appear to be valid. If I check just $url or $data they come out correctly and if I print $data and check the source the code that I'm looking for to use in the regex is in there. If you're interested in checking anything, an example of a proper URL would be http://biz.yahoo.com/research/earncal/g/goog.html
I've tried everything I could think of, including both var_dump($header) and var_dump($date), both of which return empty arrays.
I have been able to create other regular expressions that works. For instance, the following correctly returns "Earnings":
$r_header = '/Company (.+?) Calendar/';
preg_match($r_header,$data,$header);
echo $header[1];
I am going nuts trying to figure out why this isn't working. Any help would be awesome. Thanks.

Your regex doesn't allow for the line breaks in the HTML Try:
$r_header = '/Prev\. Week((?s:.*))Next Week/';
The s tells it to match the newline characters in the . (match any).

Problem is that the HTML has newlines in it, which you need to incorporate with the s regex modifier, as below
<?php
$stock = "goog";//$_GET[s]; //returns stock ticker symbol eg GOOG or YHOO
$first = $stock[0];
$url = "http://biz.yahoo.com/research/earncal/".$first."/".$stock.".html";
$data = file_get_contents($url);
$r_header = '/Prev. Week(.+?)Next Week/s';
$r_date = '/\<b\>(.+?)\<\/b\>/s';
preg_match($r_header,$data,$header);
preg_match($r_date, $header[1], $date);
var_dump($header);
?>

Dot does not match newlines by default. Use /your-regex/s
$r_header should probably be /Prev\. Week(.+?)Next Week/s
FYI: You don't need to escape < and > in a regex.

You want to add the s (PCRE_DOTALL) modifier. By default . doesn't match newline, and I see the page has them between the two parts you look for.
Side note: although they don't hurt (except readability), you don't need a backslash before < and >.

I think this is because you're applying the values to the regex as if it's plain text. However, it's HTML. For example, your regex should be modified to parse:
Prev. Week ...
Not to parse regular plain text like: "Prev. Week ...."

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP Remove everything after and including .html in a web address string - php

Why not use parse_url instead?

If you ever have issues with the syntax for preg_replace() then you can also use explode(): $input = explode(".html", $input); $result = $input[0];

Related

Matching a substring (an apostrophe) in a given word using regex

Preg_match not returning success

Replace a character only in one special part of a string

regular expression to remove ID=1234 from a string (PHP)

Supposedly valid regular expression doesn't return any data in PHP

Categories

Resources