Extracting a URL from a string in PHP

I am trying to extract a URL from a string. The format of the string will be:
some text! some numbers http://linktoimage.com/image
I found an earlier post, "Extract URLs from text in PHP",
and I think the solution mentioned there could work:
<?php
$string = "this is my friend's website http://example.com I think it is coll";
echo explode(' ',strstr($string,'http://'))[0]; //"prints" http://example.com
However, I do not understand what it actually does. Would someone mind explaining it to me?

You have this string:
this is my friend's website http://example.com I think it is coll
strstr($string,'http://') returns the part of the string starting at the first occurrence of http://:
http://example.com I think it is coll
explode(' ', ...) then splits this result at each space character, producing:
array(
0 => 'http://example.com',
1 => 'I',
2 => 'think',
3 => 'it',
4 => 'is',
5 => 'coll'
)
and finally [0] returns the first item of this array, which is:
http://example.com
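The one-liner can be unrolled into separate steps so each intermediate value is visible. This is a sketch: the names $tail, $words and $url are illustrative, and the false check is an addition not present in the original one-liner (strstr() returns false when 'http://' does not occur).

```php
<?php
$string = "this is my friend's website http://example.com I think it is coll";

// Step 1: strstr() returns the part of $string from the first 'http://' to the end,
// or false if 'http://' does not occur at all.
$tail = strstr($string, 'http://');

if ($tail === false) {
    $url = null; // no URL found; explode() would otherwise receive false
} else {
    // Step 2: explode() splits the tail on single spaces.
    $words = explode(' ', $tail);
    // Step 3: the first element is the URL (everything up to the first space).
    $url = $words[0];
}

var_dump($tail);
var_dump($url);
```

Like the original, this only looks for http:// (not https://) and assumes the URL ends at the first space.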
Further reading:
http://php.net/manual/en/function.strstr.php
http://php.net/manual/en/function.explode.php

Related

php preg_split ignore comma in specific string

I need some help. I want to ignore a comma in a specific string. It is a comma-separated (CSV) file, but one of the names contains a comma, and I need to ignore that comma.
What I have so far is:
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
The result is:
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
I want the result to be:
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)
Note that [\W,\s] is equivalent to \W, since \W matches any character that is not a letter, digit, or underscore (which already includes commas and whitespace). However, it seems you just want to split on a comma that is not followed by optional whitespace and then Inc..
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
The lookahead (?!\s*Inc\.) makes a comma match fail if the comma is followed by zero or more whitespace characters (\s*) and then the literal text Inc..
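Plugging that lookahead into preg_split() on the subject from the question gives a sketch like the following. The PREG_SPLIT_DELIM_CAPTURE flag from the question is dropped here because the new pattern has no capturing group.

```php
<?php
$subject = 'hypertext language, programming, Amazon, Inc., 100';

// Split on a comma only when it is NOT followed by optional whitespace and "Inc.".
// The lookahead is zero-width, so it checks the following text without consuming it.
$pattern = '/,(?!\s*Inc\.)/';

$result = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
var_export($result);
```

The pieces keep their leading spaces, as in the desired output; wrap the call in array_map('trim', ...) if you want them stripped.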
Going by your tutorial, if I pull the Amazon information as CSV, I get the following format, which you can parse with one of PHP's native functions. You don't need explode or a regex to handle this data; use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)

strstr returning null, because str_replace isn't good enough

I'm in this situation.
I need to replace some tags in a text, but inputs with similar tag names are forcing me to change my code.
My variables are:
A string $string with tags
An array $arr in the format { $tag => $value }
Previously my code was:
$string = str_replace(array_keys($arr), array_values($arr), $string);
But in a case like this:
'alfa' => 1,
'alfa 2' =>3,
when $string is alfa, alfa 2, the output is 1, 1 2 instead of 1, 3.
So I switched to the strstr function:
$string = strstr($arr,$string);
But strstr returns null and I see this error message:
Warning: strstr() expects parameter 1 to be string
What am I doing wrong?
Thanks in advance.
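One possible fix, not taken from the original thread: strstr($haystack, $needle) only searches for a substring and returns the rest of the haystack from that point; it never replaces anything, and its first argument must be a string, hence the warning when $arr is passed (PHP 8 throws a TypeError instead). strtr() with an array of pairs, by contrast, tries the longest matching key first at each position and does not re-scan text it has already replaced. A sketch using the data from the question:

```php
<?php
$arr = [
    'alfa'   => 1,
    'alfa 2' => 3,
];
$string = 'alfa, alfa 2';

// strtr() with an array tries the longest matching key at each position first
// and leaves already-replaced text alone, so 'alfa 2' is not broken up by 'alfa'.
$result = strtr($string, $arr);
echo $result, "\n";

// For comparison, str_replace() applies the pairs one after another,
// so every 'alfa' is replaced before 'alfa 2' is ever looked for.
echo str_replace(array_keys($arr), array_values($arr), $string), "\n";
```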

Stop regex splitting a matched url with preg_split

Given the following code:
$regex = '/(http\:\/\/|https\:\/\/)([a-z0-9-\.\/\?\=\+_]*)/i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
it returns an array such as:
array (size=4)
0 => string '...' (length=X)
1 => string 'https://' (length=8)
2 => string 'duckduckgo.com/?q=how+much+wood+could+a+wood-chuck+chuck+if+a+wood-chuck+could+chuck+wood' (length=89)
3 => string '...' (length=X)
I would prefer it if the returned array had size=3, with one single URL. Is this possible?
Sure, that can be done: just remove the extra capturing groups from your regex. Try the following code:
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
Now the resulting array will have 3 elements instead of 4.
Besides removing the extra grouping, I have also simplified your regex, since most special characters don't need to be escaped inside a character class, and using # as the delimiter means / needs no escaping either.
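A quick check of a grouped-once pattern, using the DuckDuckGo URL from the question inside a made-up note (the surrounding text in $note is an assumption). Note that / must stay inside the character class, otherwise the match stops at "duckduckgo.com".

```php
<?php
// One capturing group around the whole URL; '#' as delimiter means '/' needs no escaping,
// and '/' is kept in the character class so the URL path is captured too.
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';

// Example input (made up for this sketch) containing the URL from the question.
$note = 'See https://duckduckgo.com/?q=how+much+wood+could+a+wood-chuck+chuck+if+a+wood-chuck+could+chuck+wood for details';

$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
var_dump($text);
```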

Regex Optional Matches

I'm trying to match two types of strings using PHP's preg_match function. They could look like the following:
'_mything_to_newthing'
'_onething'
'_mything_to_newthing_and_some_stuff'
In the third one above, I only want "mything" and "newthing"; everything after the third part is just optional text the user could add. Ideally, the regex would return the following for the cases above:
'mything', 'newthing'
'onething'
'mything', 'newthing'
The patterns should match a-zA-Z0-9 if possible :-)
My regex is terrible, so any help would be appreciated!
Thanks in advance.
Assuming you're talking about _-delimited text:
$regex = '/^_([a-zA-Z0-9]+)(|_to_([a-zA-Z0-9]+).*)$/';
$string = '_mything_to_newthing_and_some_stuff';
preg_match($regex, $string, $match);
$match = array(
0 => '_mything_to_newthing_and_some_stuff',
1 => 'mything',
2 => '_to_newthing_and_some_stuff',
3 => 'newthing',
);
For anything beyond that, please provide more details and better sample input/output.
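To get exactly the output asked for ('mything', 'newthing' or just 'onething'), the captured groups can be filtered. extractThings() is a hypothetical helper name; it uses the regex above, drops the full match and the "_to_..." wrapper group, and keeps group 3 only when it matched.

```php
<?php
// Hypothetical helper: returns the wanted names, or an empty array if nothing matches.
function extractThings(string $s): array
{
    $regex = '/^_([a-zA-Z0-9]+)(|_to_([a-zA-Z0-9]+).*)$/';
    if (!preg_match($regex, $s, $m)) {
        return [];
    }
    $out = [$m[1]]; // first name is always captured
    // Group 3 is absent from $m when the empty alternative matched (e.g. '_onething').
    if (isset($m[3]) && $m[3] !== '') {
        $out[] = $m[3];
    }
    return $out;
}

print_r(extractThings('_mything_to_newthing_and_some_stuff'));
```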
Edit: You could always just use explode:
$parts = explode('_', $string);
$parts = array(
0 => '',
1 => 'mything',
2 => 'to',
3 => 'newthing',
4 => 'and',
5 => 'some',
6 => 'stuff',
);
As long as the format is consistent, it should work well...
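The explode() route can give the same result if you check for the literal "to" segment yourself. This sketch assumes the input always starts with _ and that the second name, when present, follows a "to" part; unlike the regex, it does not check that the names are alphanumeric. extractThingsExplode() is a hypothetical name.

```php
<?php
function extractThingsExplode(string $s): array
{
    $parts = explode('_', $s); // '_a_to_b' => ['', 'a', 'to', 'b']
    $out = [];
    if (isset($parts[1]) && $parts[1] !== '') {
        $out[] = $parts[1];
    }
    // Only take the next segment as the second name when it follows a literal "to".
    if (isset($parts[2], $parts[3]) && $parts[2] === 'to') {
        $out[] = $parts[3];
    }
    return $out;
}

print_r(extractThingsExplode('_mything_to_newthing_and_some_stuff'));
```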

php regex to read select form

I have a source file with a select form with some options, like this:
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
I would like to read this file using PHP and a regex, but I don't really know how. Does anybody have an idea? It would be nice to have an array with the 3-character code as the key and the longer string as the value (for example, $arr['TWO'] == '2SK8').
<?php
$options= '
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
';
preg_match_all( '#(<option value="([^"]+)">([^<]+)<\/option>)#', $options, $arr);
$result = array();
foreach ($arr[0] as $i => $value)
{
$result[$arr[2][$i]] = $arr[3][$i];
}
print_r($result);
?>
output:
Array
(
[TTO] => 1031
[187] => 187
[TWO] => 2SK8
[411] => 411
[AEL] => Abec 11
[ABE] => Abec11
[ACE] => Ace
[ADD] => Addikt
[AFF] => Affiliate
[ALI] => Alien Workshop
[ALG] => Alligator
[ALM] => Almost
)
What about something like this:
$html = <<<HTML
<option value="TTO">1031</option><option value="187">187</option>
<option value="TWO">2SK8</option><option value="411">411</option>
<option value="AEL">Abec 11</option><option value="ABE">Abec11</option>
<option value="ACE">Ace</option><option value="ADD">Addikt</option>
<option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option>
<option value="ALG">Alligator</option><option value="ALM">Almost</option>
HTML;
$matches = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
$list = array();
$num_matches = count($matches[0]);
for ($i=0 ; $i<$num_matches ; $i++) {
$list[$matches[1][$i]] = $matches[2][$i];
}
var_dump($list);
}
The output ($list) would be:
array
'TTO' => string '1031' (length=4)
187 => string '187' (length=3)
'TWO' => string '2SK8' (length=4)
411 => string '411' (length=3)
'AEL' => string 'Abec 11' (length=7)
'ABE' => string 'Abec11' (length=6)
'ACE' => string 'Ace' (length=3)
'ADD' => string 'Addikt' (length=6)
'AFF' => string 'Affiliate' (length=9)
'ALI' => string 'Alien Workshop' (length=14)
'ALG' => string 'Alligator' (length=9)
'ALM' => string 'Almost' (length=6)
A few explanations:
I'm using preg_match_all to match as many times as possible
([^"]+) means "any character that is not a double quote (which would mark the end of the value), at least once and as many times as possible" (that is the +)
([^<]+) means about the same thing, but with < instead of " as end marker
preg_match_all gives me an array with, in $matches[1], everything that matched the first set of (), and in $matches[2], what matched the second set of ()
so I need to iterate over the results to reconstruct the list that interests you :-)
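Since $matches[1] and $matches[2] line up index by index, the loop can also be replaced by array_combine(). A sketch on a shortened version of the sample options:

```php
<?php
$html = '<option value="TTO">1031</option><option value="187">187</option>'
      . '<option value="TWO">2SK8</option><option value="ALI">Alien Workshop</option>';

$list = [];
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
    // $matches[1] holds every value attribute and $matches[2] every label, in the same
    // order, so array_combine() can pair them up without an explicit loop.
    $list = array_combine($matches[1], $matches[2]);
}
print_r($list);
```

Keep in mind that numeric-looking values such as "187" become integer array keys, and a repeated value attribute would overwrite an earlier entry.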
Hope this helps -- and that you understood what it does and how, so you can help yourself, the next time ;-)
As a side note: using a regex to "parse" HTML is generally not a good idea. If you have a full HTML page, you might want to take a look at DOMDocument::loadHTML.
If you don't, and the format of the options is not well defined, it might prove useful to make the regex more permissive as a precaution (like accepting spaces here and there, accepting other attributes, ...).
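For comparison, here is a minimal sketch of the DOMDocument::loadHTML route. Wrapping the fragment in a <select> element and silencing libxml warnings with libxml_use_internal_errors() are choices made for this sketch, not requirements from the original answer.

```php
<?php
$html = '<option value="TTO">1031</option><option value="TWO">2SK8</option>'
      . '<option value="AEL">Abec 11</option>';

$doc = new DOMDocument();
// Collect libxml's parse warnings (e.g. about the missing <html>/<body>) instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML('<select>' . $html . '</select>');
libxml_clear_errors();

$list = [];
foreach ($doc->getElementsByTagName('option') as $option) {
    // value attribute => visible text of the <option>
    $list[$option->getAttribute('value')] = $option->textContent;
}
print_r($list);
```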
Try this out. Just load the file's contents into $raw_html and use this regex to collect the matches. The 3-character code from the $i-th option is $out[$i][1], and the longer string is $out[$i][2] (the regex assumes every code is exactly three characters long). You can convert that to an associative array as needed.
$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);
print_r($out);
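To turn the PREG_SET_ORDER result into the associative array the question asks for, array_column() can index column 2 (the label) by column 1 (the code). A sketch on a few of the sample options:

```php
<?php
$raw_html = '<option value="TTO">1031</option><option value="TWO">2SK8</option>'
          . '<option value="ACE">Ace</option>';

$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);

// With PREG_SET_ORDER, each $out[$i] is one match: [0] full tag, [1] code, [2] label.
// array_column() takes column 2 of every match and keys it by column 1.
$arr = array_column($out, 2, 1);
print_r($arr);
```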
