PHP Get all Urls from string - php

So I'm trying to get all the urls from a string with a script that looks like this:
$file = file_get_contents('something.txt');
function getUrls($string) {
preg_match_all('~href=("|\')(.*?)\1~', $string, $out);
print_r($out);
}
getUrls($file);
The urls contained in this document may be imperfect - i.e. "/blah/blah.asp?2". The problem is that when I run this script, I get an array that looks something like this:
Array
(
[0] => Array
(
[0] => href="#A"
[1] => href="#B"
[2] => href="#C"
)
[1] => Array
(
[0] => "
[1] => "
[2] => "
)
[2] => Array
(
[0] => #A
[1] => #B
[2] => #C
)
)
Any idea what could be going on here? I have no idea why it is returning alphabetical lists with hash signs instead of the desired urls. How can I go about just returning the urls?

The way of evil:
$file = file_get_contents('something.txt');
function displayUrls($string) {
$pattern = '~\bhref\s*+=\s*+["\']?+\K(?!#)[^\s"\'>]++~';
preg_match_all($pattern, $string, $out);
print_r($out[0]);
}
displayUrls($file);
The good way:
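$doc = new DOMDocument();
@$doc->loadHTMLFile('something.txt'); // @ suppresses warnings about malformed HTML
$links = $doc->getElementsByTagName('a');
$result = array();
foreach($links as $link) {
$href = $link->getAttribute('href');
// skip empty hrefs and fragment-only links such as "#A"
if ($href !== '' && $href[0] != '#') $result[] = $href;
}
print_r($result);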
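For trying the DOM approach without a file on disk, here is a minimal self-contained sketch. The sample markup and the `$urls` name are made up for illustration, and `libxml_use_internal_errors()` is used here as one possible alternative to the `@` operator; it is not part of the original answer.

```php
<?php
// Hypothetical sample markup; the answer above reads something.txt instead.
$html = '<p><a href="#A">top</a> <a href="/blah/blah.asp?2">page</a> '
      . '<a href="http://example.com/">site</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$urls = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');  // empty string when the attribute is missing
    if ($href !== '' && $href[0] !== '#') {
        $urls[] = $href;                  // keep everything except fragment-only links
    }
}
print_r($urls);
```

Relative links such as /blah/blah.asp?2 are returned as written; resolving them against a base URL would be a separate step.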

Related

PHP read Netlist from txt file

I have this file format of txt file generated from schematic software:
(
NETR5_2
R6,1
R5,2
)
(
NETR1_2
R4,2
R3,1
R3,2
R2,1
R2,2
R1,1
R1,2
)
I need to get this:
Array
(
[0] => Array
(
[0] => NETR5_2
[1] => R6,1
[2] => R5,2
)
[1] => Array
(
[0] => NETR1_2
[1] => R4,2
[2] => R3,1
[3] => R3,2
[4] => R2,1
[5] => R2,2
[6] => R1,1
[7] => R1,2
)
)
Here is the code I tried, but I get everything from the input string:
$file = file('tangoLista.txt');
/* GET - num of lines */
$f = fopen('tangoLista.txt', 'rb');
$lines = 0;
while (!feof($f)) {
$lines += substr_count(fread($f, 8192), "\n");
}
fclose($f);
for ($i=0;$i<=$lines;$i++) {
/* RESISTORS - check */
if (strpos($file[$i-1], '(') !== false && strpos($file[$i], 'NETR') !== false) {
/* GET - id */
for($k=0;$k<=10;$k++) {
if (strpos($file[$i+$k], ')') !== false || empty($file[$i+$k])) {
} else {
$json .= $k.' => '.$file[$i+$k];
}
}
$resistors_netlist[] = array($json);
}
}
echo '<pre>';
print_r($resistors_netlist);
echo '</pre>';
I need to read the lines between ( and ) and put them into an array. I tried checking whether a line begins with ( followed by NETR and, if so, adding it to the array, but I don't know how to get the number of items between ( and ) so that a loop can read the values and put them into the array.
Where am I making a mistake? Can the code be shorter?
Try this approach:
<?php
$f = fopen('test.txt', 'rb');
$resistors_netlist = array();
$current_index = 0;
while (!feof($f)) {
$line = trim(fgets($f));
if (empty($line)) {
continue;
}
if (strpos($line, '(') !== false) {
$resistors_netlist[$current_index] = array();
continue;
}
if (strpos($line, ')') !== false) {
$current_index++;
continue;
}
array_push($resistors_netlist[$current_index], $line);
}
fclose($f);
print_r($resistors_netlist);
This gives me:
Array
(
[0] => Array
(
[0] => NETR5_2
[1] => R6,1
[2] => R5,2
)
[1] => Array
(
[0] => NETR1_2
[1] => R4,2
[2] => R3,1
[3] => R3,2
[4] => R2,1
[5] => R2,2
[6] => R1,1
[7] => R1,2
)
)
We start $current_index at 0. When we see a (, we create a new sub-array at $resistors_netlist[$current_index]. When we see a ), we increment $current_index by 1. For any other line, we just append it to the end of $resistors_netlist[$current_index].
Try this, using preg_match_all:
$text = '(
NETR5_2
R6,1
R5,2
)
(
NETR1_2
R4,2
R3,1
R3,2
R2,1
R2,2
R1,1
R1,2
)';
$chunks = explode(")(", preg_replace('/\)\W+\(/m', ')(', $text));
$result = array();
$pattern = '{([A-z0-9,]+)}';
foreach ($chunks as $row) {
preg_match_all($pattern, $row, $matches);
$result[] = $matches[1];
}
print_r($result);
I'm not the king of regex, so you may find a better way.
The main problem is the parentheses: I don't know what sits between a closing parenthesis and the next opening one ( )????( ), so first I collapse whatever is between them (spaces, tabs, CR or LF), then I explode the string by )(.
I then loop over every chunk with foreach, match every run of A-z, 0-9 and comma, and append the array of matched values to $result, which at the end of the loop contains the desired result.
Please note:
The main pattern is based on the provided example: if the values contain characters other than A-z, 0-9 or comma, the regex fails.
Edit:
Replaced the preliminary regex pattern with `/\)\W+\(/m`
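One detail worth knowing about the character class used above: the range A-z spans ASCII codes 65 to 122, so it also covers the characters [ \ ] ^ _ and the backtick. That is why names such as NETR5_2 match even though the underscore is never listed. A small sketch of the difference (the variable names are illustrative only):

```php
<?php
// "_" (ASCII 95) lies between "Z" (90) and "a" (97), so A-z accepts it.
$loose  = preg_match('/^[A-z0-9,]+$/', 'NETR5_2');     // matches
$strict = preg_match('/^[A-Za-z0-9,]+$/', 'NETR5_2');  // no match: "_" is not in the class
$fixed  = preg_match('/^[A-Za-z0-9_,]+$/', 'NETR5_2'); // matches: "_" listed explicitly
var_dump($loose, $strict, $fixed);
```

If the class is ever tightened to A-Za-z, the underscore has to be added explicitly or the net names stop matching.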

Display all matches from preg match

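How can I display all the results instead of just the first match from preg_match?
This is the content of $show:
<a href="http://website.com/one">One</a>
<a href="http://website.com/two">Two</a>
<a href="http://website.com/three">Three</a>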
This is the PHP code:
preg_match("/<a href=\"(.+?)\">(.+?)<\/a>/", $show, $display);
$xml = "<name>".$display[2]."</name><link>".$display[1]."</link>";
echo $xml;
The output is:
<name>One</name><link>http://website.com/one</link>
But I want it to display all the results like this:
<name>One</name><link>http://website.com/one</link>
<name>Two</name><link>http://website.com/two</link>
<name>Three</name><link>http://website.com/three</link>
this is the output of print_r($display); ...
Array
(
[0] => Array
(
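[0] => <a href="http://website.com/one">One</a>
[1] => <a href="http://website.com/two">Two</a>
[2] => <a href="http://website.com/three">Three</a>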
)
[1] => Array
(
[0] => http://website.com/one
[1] => http://website.com/two
[2] => http://website.com/three
)
[2] => Array
(
[0] => One
[1] => Two
[2] => Three
)
)
You would use preg_match_all() to get all matches and then iterate through them:
preg_match_all('~<a href="(.+?)">(.+?)</a>~s', $html, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
echo "<name>".$m[2]."</name><link>".$m[1]."</link>\n";
}
But I'd recommend using DOM for this task instead.
$doc = new DOMDocument;
$doc->loadHTML($html); // load the HTML data
foreach ($doc->getElementsByTagName('a') as $link) {
echo "<name>".$link->nodeValue."</name><link>".$link->getAttribute('href')."</link>\n";
}
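The two preg_match_all() result layouts are easy to mix up, so here is a short sketch comparing them. The sample links mirror the question; `$byGroup` and `$byMatch` are names chosen for this example.

```php
<?php
$show = '<a href="http://website.com/one">One</a>
<a href="http://website.com/two">Two</a>';
$pattern = '~<a href="(.+?)">(.+?)</a>~';

// Default PREG_PATTERN_ORDER: one sub-array per capture group.
preg_match_all($pattern, $show, $byGroup);
// PREG_SET_ORDER: one sub-array per match, holding all of its groups.
preg_match_all($pattern, $show, $byMatch, PREG_SET_ORDER);

echo $byGroup[1][0], "\n"; // URL of the first link, indexed [group][match]
echo $byMatch[0][1], "\n"; // the same URL, indexed [match][group]
```

PREG_SET_ORDER tends to read more naturally in a foreach loop, while the default order is convenient when only one group is needed as a flat list.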
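You can do something like this:
$xml = '';
$show = '<a href="http://website.com/one">One</a>
<a href="http://website.com/two">Two</a>
<a href="http://website.com/three">Three</a>';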
preg_match_all("/<a href=\"(.+?)\">(.+?)<\/a>/", $show, $display);
for($i=0; $i<count($display[0]); $i++){
$xml .= "<name>".$display[2][$i]."</name><link>".$display[1][$i]."</link>";
}
echo $xml;
and this will output
<name>One</name><link>http://website.com/one</link><name>Two</name><link>http://website.com/two</link><name>Three</name><link>http://website.com/three</link>

PHP Check string contain #(any) [duplicate]

I have a string that has hash tags in it and I'm trying to pull the tags out. I think I'm pretty close, but I'm getting a multi-dimensional array with the same results:
$string = "this is #a string with #some sweet #hash tags";
preg_match_all('/(?!\b)(#\w+\b)/',$string,$matches);
print_r($matches);
which yields
Array (
[0] => Array (
[0] => "#a"
[1] => "#some"
[2] => "#hash"
)
[1] => Array (
[0] => "#a"
[1] => "#some"
[2] => "#hash"
)
)
I just want one array with each word beginning with a hash tag.
This can be done with the regex /(?<!\w)#\w+/.
That's what preg_match_all does. You always get a multidimensional array. [0] is the complete match and [1] the first capture groups result list.
Just access $matches[1] for the desired strings. (Your dump with the depicted extraneous Array ( [0] => Array ( [0] was incorrect. You get one subarray level.)
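To make the structure concrete, here is a small sketch using the string from the question. With no capture group, only index 0 exists; with a group around the word, index 1 holds the tags without the hash sign. The variable names `$plain`, `$grouped`, `$tags` and `$names` are illustrative.

```php
<?php
$string = "this is #a string with #some sweet #hash tags";

// No capture group: $plain has a single sub-array at index 0 (the full matches).
preg_match_all('/(?<!\w)#\w+/', $string, $plain);
$tags = $plain[0];

// One capture group: index 0 is still the full match, index 1 is the group.
preg_match_all('/#(\w+)/', $string, $grouped);
$names = $grouped[1];

print_r($tags);
print_r($names);
```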
I think this function will help you:
echo get_hashtags($string);
function get_hashtags($string, $str = 1) {
preg_match_all('/#(\w+)/',$string,$matches);
// $matches[1] holds the tag names without the leading "#"
if ($str) {
// return the tags as one comma-separated string
return implode(", ", $matches[1]);
}
// otherwise return the tags as an array
return $matches[1];
}
Try:
$string = "this is #a string with #some sweet #hash tags";
preg_match_all('/(?<!\w)#\S+/', $string, $matches);
print_r($matches[0]);
echo("<br><br>");
// Output: Array ( [0] => #a [1] => #some [2] => #hash )

unique pairs storing in array in php

I am stuck at a scenario where I need to save user input like this (I get these results in one go):
$string = "'a':'php,'b':'.Net' ...
'c' 'java'
'c' 'php'
'c' 'java'
'a' 'php'
'a' 'java' ";
Now I need to store all these values in a database (only unique pairs).
What I tried so far: I exploded $string by "," and stored everything in an array like
$array["a"] = "php"; // problem: a later a = java overwrites this
I don't need to check whether the pairs already exist in the database; that is already handled (all data dumped in one go gets a unique identifier).
All I need to do is get the unique pairs and dump them into the database, meaning
a = php, a = java, b = .net, c = java, c = php
The only solution I could see was, after exploding, to check each pair in the db against the new unique identifier (mysql_num_rows) and insert it only if it does not exist.
Is there an easier way?
The best way for your purpose is to create the multidimensional array
<?php
$string = "'a':'php','b':'.Net','c':'java','c':'php','c':'java','a':'php','a':'java'";
$array = array();
$temp_arr = explode(",", $string);
foreach($temp_arr as $key=>$value)
{
list($tempkey,$tempValue) = explode(':', $value);
$tempKey = trim($tempkey,"'");
$tempValue = trim($tempValue,"'");
$array[$tempKey][] = $tempValue;
}
$array = array_map('array_unique',$array);
echo "<pre>";
print_r($array);
?>
output will be
Array
(
[a] => Array
(
[0] => php
[2] => java
)
[b] => Array
(
[0] => .Net
)
[c] => Array
(
[0] => java
[1] => php
)
)
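Note that array_unique() preserves the original keys, which is why [a] in the output above has indexes 0 and 2. If consecutive indexes matter, wrapping the call in array_values() is one way to get them. A sketch with hard-coded sample data (`$pairs` and `$unique` are names made up for this example):

```php
<?php
// Sample data in the shape built by the loop above, duplicates included.
$pairs = array(
    'a' => array('php', 'php', 'java'),
    'c' => array('java', 'php', 'java'),
);

// Deduplicate each list, then reindex it from 0.
$unique = array_map(function ($values) {
    return array_values(array_unique($values));
}, $pairs);

print_r($unique);
```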
$string = "'a':'php','b':'.Net','c':'java','c':'php','c':'java','a':'php','a':'java'";
$temp = array_map(function($item) {
list($key, $value) = explode(':', $item);
return array(str_replace("'", "", $key) => str_replace("'", "", $value));
}, explode(",", $string));
$results = array();
foreach($temp as $item) {
$key = key($item);
if(!isset($results[$key]) || !in_array($item[$key], $results[$key])) {
$results[$key][] = $item[$key];
}
}
print_r($results);
Output:
Array
(
[a] => Array
(
[0] => php
[1] => java
)
[b] => Array
(
[0] => .Net
)
[c] => Array
(
[0] => java
[1] => php
)
)

Split string between less and greater than

I need to split strings like these to separate the email between the less-than and greater-than signs < >. I'm trying the regex below with preg_split, but it does not work.
"email1@domain.com" <email1@domain.com>
News <news@e.domain.com>
Some Stuff <email-noreply@somestuff.com>
The expected result would be:
Array
(
[0] => "email1@domain.com"
[1] => email1@domain.com
)
Array
(
[0] => News
[1] => news@e.domain.com
)
Array
(
[0] => Some Stuff
[1] => email-noreply@somestuff.com
)
Code that I am using now:
foreach ($emails as $email)
{
$pattern = '/<(.*?)>/';
$result = preg_split($pattern, $email);
print_r($result);
}
You may use some of the flags available for preg_split: PREG_SPLIT_DELIM_CAPTURE and PREG_SPLIT_NO_EMPTY.
$emails = array('"email1@domain.com" <email1@domain.com>', 'News <news@e.domain.com>', 'Some Stuff <email-noreply@somestuff.com>');
foreach ($emails as $email)
{
$pattern = '/<(.*?)>/';
$result = preg_split($pattern, $email, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($result);
}
This outputs what you expect:
Array
(
[0] => "email1@domain.com"
[1] => email1@domain.com
)
Array
(
[0] => News
[1] => news@e.domain.com
)
Array
(
[0] => Some Stuff
[1] => email-noreply@somestuff.com
)
Splitting on something removes the delimiter (i.e. everything the regex matches). You probably want to split on
\s*<|>
instead. Or you can use preg_match with the regex
^(.*?)\s*<([^>]+)>
and use the first and second capturing groups.
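Both suggestions can be sketched as follows. The sample address is taken from the question; using PREG_SPLIT_NO_EMPTY to drop the empty piece left after the closing > is an assumption added here, not something this answer spells out.

```php
<?php
$email = 'Some Stuff <email-noreply@somestuff.com>';

// Option 1: split on " <" (with optional whitespace) or ">".
// Without PREG_SPLIT_NO_EMPTY a trailing empty string would remain.
$parts = preg_split('/\s*<|>/', $email, -1, PREG_SPLIT_NO_EMPTY);

// Option 2: match the whole line and read the two capturing groups.
preg_match('/^(.*?)\s*<([^>]+)>/', $email, $m);

print_r($parts);
echo $m[1], ' / ', $m[2], "\n";
```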
This will do the job:
$header = '"email1@domain.com" <email1@domain.com>
News <news@e.domain.com>
Some Stuff <email-noreply@somestuff.com>';
$result = array();
preg_match_all('!(.*?)\s+<\s*(.*?)\s*>!', $header, $result);
$formatted = array();
for ($i=0; $i<count($result[0]); $i++) {
$formatted[] = array(
'name' => $result[1][$i],
'email' => $result[2][$i],
);
}
print_r($formatted);
preg_match_all("/<(.*?)>/", $string, $result_array);
print_r($result_array);
$email='"email1@domain.com" <email1@domain.com>
News <news@e.domain.com>
Some Stuff <email-noreply@somestuff.com>';
$pattern = '![^\>\<]+!';
preg_match_all($pattern, $email,$match);
print_r($match);
Output:
Array ( [0] => Array (
[0] => "email1@domain.com"
[1] => email1@domain.com
[2] => News
[3] => news@e.domain.com
[4] => Some Stuff
[5] => email-noreply@somestuff.com ) )
You can also split by <, and get rid of ">" in $result
$pattern = '/</';
$result = preg_split($pattern, $email);
$result = preg_replace("/>/", "", $result);
