I have this HTML in $string:
$string = '<p>random</p>
Test 1 (target1)
<br>
Test 2 (target1)
<br>
Test 3 (skip)
// etc
';
And I have a few terms in $array:
$array = array(
'(target1)',
'(target2)'
);
How can I search through $string to find all terms in $array and grab the content of the <a> tag that precedes it?
So I end up with the following results:
$results = array(
array(
'text' => 'Test 1',
'needle' => 'target1'
),
array(
'text' => 'Test 2',
'needle' => 'target1'
)
);
I will give you an answer using JavaScript, but PHP can do the same thing.
You can search the string for one term of the array at a time, and stop once you have reached the end of the array.
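// s is the HTML string; without the g flag, match() only returns the first occurrence
const target1Match = s.match(/<.+?>(.+?)<\/.+?> *\(target1\)/);
// target1Match[0] is the whole match, target1Match[1] is the captured link text ("Test 1")
const target1Text = target1Match ? target1Match[1] : null;
const target2Match = s.match(/<.+?>(.+?)<\/.+?> *\(target2\)/);
// target2Match is null when "(target2)" does not occur in s
const target2Text = target2Match ? target2Match[1] : null;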
Build the regex from variables for "target1" and "target2" (for example with new RegExp()) instead of hard-coding them.
To match multiple targets and a specific tag:
s.match(/<a.+?>(.+?)<\/a> *\((target1|target2)\)/);
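Below is a possible PHP translation of the combined-alternation idea. It is a sketch rather than the answer's own code. The pattern is built from the terms in $array with preg_quote(), so a single preg_match_all() pass returns the matches in document order. The sample HTML is an assumption: it wraps the link texts in <a> tags with a made-up href, because the answers expect those tags and the question's snippet lost them. trim($m[2], '()') strips the parentheses so the needle has the question's 'target1' form.

```php
<?php
// Hypothetical sample: the question's HTML with the <a> tags the answers assume
// (the href value is invented for illustration).
$string = '<p>random</p>
<a href="#">Test 1</a> (target1)
<br>
<a href="#">Test 2</a> (target1)
<br>
<a href="#">Test 3</a> (skip)
';
$array = ['(target1)', '(target2)'];

// One alternation built from all terms, each escaped for the "/" delimiter.
$alternation = implode('|', array_map(fn($t) => preg_quote($t, '/'), $array));

// Group 1: text inside the <a> tag; group 2: the term that follows it.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>\s*(' . $alternation . ')/', $string, $matches, PREG_SET_ORDER);

$results = [];
foreach ($matches as $m) {
    $results[] = ['text' => $m[1], 'needle' => trim($m[2], '()')];
}
print_r($results);
```

A single alternation scans the string once. The per-term loop in the next answer groups its results by term instead.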
Using preg_match_all():
// Assuming your HTML as $str, your terms as $terms
$results = [];
foreach ($terms as $t) {
// Get the content of the <a> tag preceding the term
preg_match_all('/<a\b[^>]*>(.*?)<\/a>\s+' . preg_quote($t, '/') . '/', $str, $matches);
//Then insert into your result array
foreach ($matches[1] as $m) {
$results[] = [
'text' => $m,
'needle' => $t
];
}
}
Output:
// echo '<pre>' . print_r($results, true) . '</pre>';
Array
(
[0] => Array
(
[text] => Test 1
[needle] => (target1)
)
[1] => Array
(
[text] => Test 2
[needle] => (target1)
)
)
See also: preg_quote()
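The parentheses in a term like '(target1)' are regex metacharacters, which is why preg_quote() matters here. The short comparison below matches the raw term and the quoted term against the same subject. The subject string 'Test 3 target1' is made up for the demonstration.

```php
<?php
$term = '(target1)';

// Unescaped, the parentheses form a capture group, so the pattern matches
// the bare word "target1" even though the subject has no parentheses.
var_dump(preg_match('/' . $term . '/', 'Test 3 target1'));   // int(1)

// preg_quote() escapes regex metacharacters (plus the "/" delimiter when it is
// passed as the second argument), so only a literal "(target1)" can match.
$quoted = preg_quote($term, '/');
var_dump($quoted);                                            // string(11) "\(target1\)"
var_dump(preg_match('/' . $quoted . '/', 'Test 3 target1')); // int(0)
```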
I'm in the JayBlanchard camp. Here's a solution that rightly uses DOMDocument and XPath with a dynamically generated query to target <a> tags that are immediately followed by text containing one of the qualifying needles.
For the sample needles, this is the generated query:
//a[following-sibling::text()[1][contains(.,'(target1)') or contains(.,'(target2)')]]
Code:
$html = '<p>random</p>
Test 1 (skip)
<br>
Test 2 (target1)
<br>
Test 3 (target1)
<br>
Test 4 (skip)
<br>
Test 5 (target2)
<br>
Test 6 (skip)
';
$needles = [
'(target1)',
'(target2)'
];
$contains = array_reduce($needles, function($carry, $needle) {
return $carry .= ($carry !== null ? ' or ' : '') . "contains(.,'$needle')";
});
$matches = [];
$dom=new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[following-sibling::text()[1][$contains]]") as $node) {
$matches[] = ["text" => $node->nodeValue, "needle" => trim($node->nextSibling->nodeValue)];
}
var_export($matches);
Output:
array (
0 =>
array (
'text' => 'Test 2',
'needle' => '(target1)',
),
1 =>
array (
'text' => 'Test 3',
'needle' => '(target1)',
),
2 =>
array (
'text' => 'Test 5',
'needle' => '(target2)',
),
)
Say I have a one-dimensional, ordered array of strings:
$arr = [
'Something else',
'This is option: one',
'This is option: two',
'This is option: 😜',
'This is second option: 2',
'This is second option: 3'
];
I would like to turn it into a two-dimensional array, with the common beginning as the key. For example:
$target = [
'Something else',
'This is option:' => [
'one',
'two',
'😜'
],
'This is second option:' => [
'2',
'3'
]
];
It sounds simple, but I have gone completely blank.
function convertArr(array $src): array {
$prevString = null;
$newArray = [];
foreach ($src as $string) {
if ($prevString) {
// stuck here
}
$prevString = $string;
}
return $newArray;
}
Pre-made fiddle: https://3v4l.org/eqGDc
How can I check if two strings start with the same words, without having to loop on each letter?
As of now I have written this overly complicated function, but I wonder if there is a simpler way:
function convertArr(array $src): array {
$prevString = null;
$newArray = [];
$size = count($src);
for ($i = 0; $i < $size; $i++) {
if (!$prevString || strpos($src[$i], $prevString) !== 0) {
if ($i == $size - 1) {
$newArray[] = $src[$i];
break;
}
$nowWords = explode(' ', $src[$i]);
$nextWords = explode(' ', $src[$i + 1]);
foreach ($nowWords as $k => $v) {
if ($v != $nextWords[$k]) {
break;
}
}
if ($k) {
$prevString = implode(' ', array_splice($nowWords, 0, $k));
$newArray[$prevString][] = implode(' ', $nowWords);
}
} else {
$newArray[$prevString][] = trim(substr($src[$i], strlen($prevString)));
}
}
return $newArray;
}
This might do the job:
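function convertArray(array $values): array
{
$newArray = [];
foreach ($values as $value) {
$pos = strpos($value, ':');
if ($pos !== false) {
// Key: everything up to and including the colon; value: the trimmed rest
$key = substr($value, 0, $pos + 1);
$newArray[$key][] = trim(substr($value, $pos + 1));
} else {
// Strings without a colon are kept as plain, numerically keyed entries
$newArray[] = $value;
}
}
return $newArray;
}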
Essentially, based on the format of your array of strings, as long as each string only has one ":" character followed by the option value, this should work well enough.
I'm sure there will be a more advanced and more fail-safe solution but this may be a start.
I haven't got a complete solution, but maybe you can use this as a starting point: the following gets you the longest common prefix of the two strings in an array of length 2:
var s=["This is option: one","This is option: two"];
var same=s.join('|').match(/(.*)(.*?)\|\1.*/)[1];
// same="This is option: "
In same you will find the longest common beginning of the two strings in array s. I achieve this with a regular expression containing a greedy and a non-greedy wildcard group, and by forcing the first group to be repeated after the "|" separator (the backreference \1).
You could apply this method to short slice()-d sub-arrays of your sorted input array and check whether the value of same stays unchanged across several of these sub-arrays. You can then perform your intended grouping on the sections that share the same value of same.
[[ Sorry, I just realized I coded this in JavaScript and you wanted PHP - but the idea is so simple you can translate that easily into PHP yourself. ]]
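Below is a rough PHP translation, offered as a sketch under the following assumptions. commonPrefix() ports the regex trick as-is and assumes neither string contains "|". Grouping on a character prefix can split words ("This is option: 😜" and "This is second option: 2" share "This is "). For that reason, groupByCommonPrefix() swaps in a word-level prefix that is compared against each string's neighbours, and it keeps the longer of the two. All function names are hypothetical.

```php
<?php
// Character-level longest common prefix, same regex trick as the JavaScript
// above (assumes neither string contains '|').
function commonPrefix(string $a, string $b): string
{
    preg_match('/(.*)(.*?)\|\1.*/su', $a . '|' . $b, $m);
    return $m[1];
}

// Longest common prefix measured in whole, space-separated words.
function commonWordPrefix(string $a, string $b): string
{
    $wa = explode(' ', $a);
    $wb = explode(' ', $b);
    $common = [];
    foreach ($wa as $i => $word) {
        if (!isset($wb[$i]) || $wb[$i] !== $word) {
            break;
        }
        $common[] = $word;
    }
    return implode(' ', $common);
}

// Groups a sorted array: each string is keyed by the longer of the word
// prefixes it shares with its previous and next neighbours.
function groupByCommonPrefix(array $src): array
{
    $src = array_values($src);
    $result = [];
    foreach ($src as $i => $value) {
        $prev = $i > 0 ? commonWordPrefix($value, $src[$i - 1]) : '';
        $next = isset($src[$i + 1]) ? commonWordPrefix($value, $src[$i + 1]) : '';
        $prefix = strlen($prev) >= strlen($next) ? $prev : $next;
        if ($prefix === '' || $prefix === $value) {
            $result[] = $value; // nothing shared: keep as a plain entry
        } else {
            $result[$prefix][] = trim(substr($value, strlen($prefix)));
        }
    }
    return $result;
}

$arr = [
    'Something else',
    'This is option: one',
    'This is option: two',
    'This is option: 😜',
    'This is second option: 2',
    'This is second option: 3',
];
var_export(groupByCommonPrefix($arr));
```

The neighbour heuristic relies on the input being sorted, as in the question. Two identical strings, or an unrelated string that happens to share a leading word, would need extra rules.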
Edit
Looking at the question and expected result again, it seems to me that what the OP really wants is to combine elements with the same part before the colon (:) into a common sub-array. This can be done with the following code:
$arr = [
'Is there anything new and',
'Something old',
'This is option: one',
'This is option: two',
'This is option: 😜',
'This is second option: 2',
'This is second option: 3',
'Abc: def',
'Abc: ghi',
'the same line',
'the same words'
];
$target = [];
foreach ($arr as $v) {
$t = array_reverse(explode(':', $v));
$target[isset($t[1]) ? trim($t[1]) : 0][] = trim($t[0]);
}
print_r($target);
output:
Array
(
[0] => Array
(
[0] => Is there anything new and
[1] => Something old
[2] => the same line
[3] => the same words
)
[This is option] => Array
(
[0] => one
[1] => two
[2] => 😜
)
[This is second option] => Array
(
[0] => 2
[1] => 3
)
[Abc] => Array
(
[0] => def
[1] => ghi
)
)
See a demo here https://rextester.com/JMB6676
I am trying to generate a string from an array. I need to concatenate each array value with a small string appended AFTER it, but it doesn't work for the last value.
$data = array (
1 => array (
'symbol' => 'salad'
),
2 => array (
'symbol' => 'wine'
),
3 => array (
'symbol' => 'beer'
)
);
$symbols = array_column($data, 'symbol');
$string_from_array = join('bar, ', $symbols);
echo($string_from_array);
// expected output: saladbar, winebar, beerbar
// output: saladbar, winebar, beer
You can achieve this in a few different ways. One is actually by using implode(): if there is at least one element, implode with the delimiter "bar, " and append "bar" afterwards. The count() check prevents printing "bar" when the $symbols array is empty.
$symbols = array_column($data, "symbol");
if (count($symbols)) {
echo implode("bar, ", $symbols)."bar";
}
Live demo at https://3v4l.org/ms5Ot
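The same count() check and trailing "bar" can be wrapped in a small reusable helper, as sketched below. suffixJoin() is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: append $suffix to every item, then join with $glue.
// implode() only places its glue between elements, so the last element's
// suffix is added separately; an empty array yields ''.
function suffixJoin(array $items, string $suffix, string $glue): string
{
    if (count($items) === 0) {
        return '';
    }
    return implode($suffix . $glue, $items) . $suffix;
}

$data = [
    1 => ['symbol' => 'salad'],
    2 => ['symbol' => 'wine'],
    3 => ['symbol' => 'beer'],
];
echo suffixJoin(array_column($data, 'symbol'), 'bar', ', '), "\n"; // saladbar, winebar, beerbar
```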
You can also achieve the desired result using array_map(), as follows:
<?php
$data = [
1 => ['symbol' => 'salad'],
2 => ['symbol' => 'wine'],
3 => ['symbol' => 'beer']
];
echo join(", ", array_map(
fn($v) => "{$v}bar",
array_column($data, 'symbol')
)
);
array_map() takes every element of the array that array_column() pulls out of $data and, with an arrow function (PHP 7.4+), appends the string "bar". The elements of the new array returned by array_map() are then joined with ", " to form the expected output, which is echoed.
As a recent comment indicated, you could eliminate array_column() and instead write the code as follows:
<?php
$data = [
1 => ['symbol' => 'salad'],
2 => ['symbol' => 'wine'],
3 => ['symbol' => 'beer']
];
echo join(", ", array_map(
fn($row) => "{$row['symbol']}bar",
$data
)
);
Note: while this second way may appear more direct, is it really? As array_map() iterates over $data, the arrow function has to dereference each row behind the scenes, namely $row['symbol'].
The join() function is an alias of implode() which
Returns a string containing a string representation of all the array
elements in the same order, with the glue string between each element.
So you need to append the last "bar" yourself:
$data = array (
1 => array (
'symbol' => 'salad'
),
2 => array (
'symbol' => 'wine'
),
3 => array (
'symbol' => 'beer'
)
);
$symbols = array_column($data, 'symbol');
$string_from_array = join('bar, ', $symbols);
if (strlen($string_from_array) > 0) {
$string_from_array .= 'bar';
}
echo $string_from_array;
You can use array_column() and implode():
$data = array (
1 => array (
'symbol' => 'salad'
),
2 => array (
'symbol' => 'wine'
),
3 => array (
'symbol' => 'beer'
)
);
$res = implode("bar, ", array_column($data, 'symbol'))."bar";
Try this:
$symbols = array_column($data, 'symbol');
$parts = [];
foreach ($symbols as $symbol) {
$parts[] = $symbol . "bar";
}
echo implode(", ", $parts);
By the way, you can't expect implode() to do this on its own: it places the glue only between the strings, and there is no "between" after the last string in your array. ;)
Another way is to use a for loop (this relies on $data being keyed 1 to count($data), as in the sample):
$res = "";
$count = count($data);
for($i = 1; $i <= $count; $i++) {
$res .= $data[$i]["symbol"] . "bar" . ($i !== $count ? ", " : "");
}
echo $res; //saladbar, winebar, beerbar
I'm trying to parse a large text using regex in PHP. I know the line formats, shown below in sprintf notation for ease of explanation.
So a line contains some known words (or parentheses). I would like to know which format matched (in the example I print the $formats array key) and extract the relevant data from the line.
I tried regex formats such as '/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/', but besides matching, I could not extract the correct data out of the lines.
$input = [
'new message from Bob [22:105:3905:534]',
'user Dylan posted a question in section General',
'new message from Mary(gold) [19504:8728:18524:78941]'
];
$formats = [
'new message from %s [%d:%d:%d:%d]', // this would actually be something like '/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/'
'user %s posted a question in section %s',
'new message from %s(%s) [%d:%d:%d:%d]',
];
foreach ($input as $line) {
foreach ($formats as $key => $format) {
$data = [];
if (preg_match($format, $line, $data)) {
echo 'format: ' . $key . ', data: ' . var_export($data, true) . "\n";
continue;
}
}
}
// should yield:
// format: 0, data: array ( 0 => 'Bob', 1 => 22, 2 => 105, 3 => 3905, 4 => 534, )
// format: 1, data: array ( 0 => 'Dylan', 1 => 'General', )
// format: 2, data: array ( 0 => 'Mary', 1 => 'gold', 2 => 19504, 3 => 8728, 4 => 18524, 5 => 78941, )
I need:
an efficient regex format for matching a line, using multiple wildcards
a way to extract the wildcards when a regex format matches a line (maybe preg_match() isn't the best PHP regex function to use in this case)
I can do this using string functions (strpos and substr), but the code looks awful.
Thanks!
Just a little adjustment to the patterns. Please see the code below.
<?php
$input = [
'new message from Bob [22:105:3905:534]',
'user Dylan posted a question in section General with space',
'new message from Mary(gold) [19504:8728:18524:78941]'
];
$formats = [
'/new message from (\w+) \[(\d+):(\d+):(\d+):(\d+)\]/',
'/user (\w+) posted a question in section ([\w ]+)/',
'/new message from (\w+)\((\w+)\) \[(\d+):(\d+):(\d+):(\d+)\]/',
];
foreach ($input as $line) {
foreach ($formats as $key => $format) {
$data = [];
if (preg_match($format, $line, $data)) {
array_shift($data);
echo 'format: ' . $key . ', data: ' . var_export($data, true) . "\n";
break; // stop checking formats once one has matched
}
}
}
// yields (var_export() output condensed to one line; the captures are strings):
// format: 0, data: array ( 0 => 'Bob', 1 => '22', 2 => '105', 3 => '3905', 4 => '534', )
// format: 1, data: array ( 0 => 'Dylan', 1 => 'General with space', )
// format: 2, data: array ( 0 => 'Mary', 1 => 'gold', 2 => '19504', 3 => '8728', 4 => '18524', 5 => '78941', )
https://3v4l.org/NBgaT
EDIT: I've added an array_shift() to get rid of the text that matched the full pattern.
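Below is a sketch that builds on this answer and adds two things that are not part of the answer above. First, the patterns are anchored with ^ and $, so a format must cover the whole line. Second, all-digit captures are cast to int with ctype_digit(), which matches the integers in the question's expected output. As before, the inner loop stops at the first matching format.

```php
<?php
$input = [
    'new message from Bob [22:105:3905:534]',
    'user Dylan posted a question in section General',
    'new message from Mary(gold) [19504:8728:18524:78941]',
];
// Anchored versions of the patterns above.
$formats = [
    '/^new message from (\w+) \[(\d+):(\d+):(\d+):(\d+)\]$/',
    '/^user (\w+) posted a question in section ([\w ]+)$/',
    '/^new message from (\w+)\((\w+)\) \[(\d+):(\d+):(\d+):(\d+)\]$/',
];

$parsed = [];
foreach ($input as $line) {
    foreach ($formats as $key => $format) {
        if (preg_match($format, $line, $data)) {
            array_shift($data); // drop the full-match entry
            // Cast captures consisting only of digits to int
            $data = array_map(fn($v) => ctype_digit($v) ? (int) $v : $v, $data);
            $parsed[] = ['format' => $key, 'data' => $data];
            break; // first matching format wins
        }
    }
}
var_export($parsed);
```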
I'm trying to find all the elements with a given tag in an HTML string and get their starting and ending positions.
Here's my sample HTML:
some content <iframe></iframe> <iframe></iframe> another content
Here's the code I have so far:
$dom = HtmlDomParser::str_get_html( $this->content );
$iframes = array();
foreach( $dom->find( 'iframe' ) as $iframe) {
$iframes[] = $iframe;
}
return array(
'hasIFrame' => count( $iframes ) > 0
);
Getting the number of elements is easy, but I'm not sure whether HtmlDomParser can get the starting and ending positions.
What I want is:
array(
'hasIFrame' => true,
'numberOfElements' => 2,
array (
0 => array (
'start' => $firstStartingElement,
'end' => $firstEndingElement
),
1 => array (
'start' => $secondStartingElement,
'end' => $secondEndingElement
)
)
)
If you have a look at the official docs (http://simplehtmldom.sourceforge.net/), you can easily find out how many elements of a type there are in your DOM:
// Find all images
foreach($html->find('img') as $element) {
echo $element->src . '<br>';
}
All you have to do is retrieve $html->find('iframe') and check its size to know whether there is at least one.
You can do something like this:
$html = "some content <iframe></iframe> <iframe></iframe> another content";
// Offsets of every opening tag (attributes allowed) and every closing tag
preg_match_all('/<iframe\b[^>]*>/i', $html, $iframesStartPositions, PREG_OFFSET_CAPTURE);
preg_match_all('/<\/iframe>/i', $html, $iframesEndPositions, PREG_OFFSET_CAPTURE);
$iframesPositions = array();
// Pair the n-th opening tag with the n-th closing tag (assumes no nested iframes)
foreach ($iframesStartPositions[0] as $key => $start) {
$iframesPositions[] = array(
'start' => $start[1],
'end' => $iframesEndPositions[0][$key][1] + strlen('</iframe>') // offset just past the closing tag
);
}
return array(
'hasIFrame' => count($iframesPositions) > 0,
'numberOfElements' => count($iframesPositions),
'positions' => $iframesPositions
);
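An alternative to pairing two separate match lists is to capture each whole element in one pattern, so each start and end come from the same match. The sketch below assumes iframes are not nested and always have a closing tag. iframePositions() is a hypothetical name, and the offsets are byte offsets.

```php
<?php
// Hypothetical helper: one pattern captures each complete <iframe ...>...</iframe>
// element, so start and end are taken from a single match. Assumes no nesting.
function iframePositions(string $html): array
{
    preg_match_all('/<iframe\b[^>]*>.*?<\/iframe>/is', $html, $m, PREG_OFFSET_CAPTURE);
    $positions = [];
    foreach ($m[0] as [$text, $offset]) {
        // 'end' is the byte offset just past the closing </iframe>
        $positions[] = ['start' => $offset, 'end' => $offset + strlen($text)];
    }
    return [
        'hasIFrame' => count($positions) > 0,
        'numberOfElements' => count($positions),
        'positions' => $positions,
    ];
}

print_r(iframePositions('some content <iframe></iframe> <iframe></iframe> another content'));
```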
I have a section in my code that uses file_get_contents() to grab the URLs from a given web page. I also have a section that fetches the title of each link in my array.
I want to end up with an array similar to this:
Array(
Google => array(
[title] => Google
[link] => http://www.google.com
)
)
But no values are saved to my array, even though I can't detect any errors:
$links = Array();
$URL = 'http://www.theqlick.com'; // change it for urls to grab
$file = file_get_contents($URL);
// grabs the urls from URL
if( strlen( $file )>0 ) {
$links[] = preg_match_all( "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/", $file, $links);
}
function Titles() {
global $links;
$str = implode('',array_map('file_get_contents',$links));
error_reporting(E_ERROR | E_PARSE);
$titles = Array();
if( strlen( $str )>0 ) {
$titles[] = preg_match_all( "/\<title\>(.*)\<\/title\>/", $str, $title );
return $title;
return $links;
}
}
$newArray = array();
$j = 0;
foreach( $links as $key => $val ){
$newArray[$key] = array( 'link' => $val, 'title' => $title[1][$j++]);
}
print_r($newArray);
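The following line does not seem to return anything. The pattern is anchored with ^ and $ and has no m modifier, so it can only match if the whole page is a single URL: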
$links[] = preg_match_all( "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/", $file, $links);
Try the following
$links = Array();
$URL = 'http://www.theqlick.com'; // change it for urls to grab
$file = file_get_contents($URL);
// grabs the urls from URL
if (strlen($file) > 0) {
// preg_match_all() fills $links itself; also assigning its return value to
// $links[] would append the match count to the array
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $file, $links);
}
var_dump($links);
Output
array
0 =>
array
0 => string 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' (length=55)
1 => string 'http://www.w3.org/1999/xhtml' (length=28)
2 => string 'http://www.theqlick.com' (length=23)
3 => string 'http://www.theqlick.com' (length=23)
1 =>
array
0 => string 'd' (length=1)
1 => string 'l' (length=1)
2 => string 'm' (length=1)
3 => string 'm' (length=1)
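Below is a DOM-based alternative to the regexes in this thread. It swaps in a different technique and is not the answer's method. It reads href attributes and the <title> element with DOMDocument and builds the title => ['title' => ..., 'link' => ...] shape from the question. All function names are hypothetical. The fetcher is passed in as a callable (for example 'file_get_contents') so the logic can be tested without network access. Only absolute http(s) links are kept; relative links would need resolving against the page URL.

```php
<?php
// Absolute http(s) hrefs of all <a> elements, de-duplicated.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Text of the first <title> element, or null if there is none.
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $title = $dom->getElementsByTagName('title')->item(0);
    return $title !== null ? trim($title->textContent) : null;
}

// $fetch is any callable returning a page's HTML, e.g. 'file_get_contents'.
function buildLinkIndex(array $links, callable $fetch): array
{
    $result = [];
    foreach ($links as $link) {
        $title = extractTitle((string) $fetch($link)) ?? $link; // fall back to the URL
        $result[$title] = ['title' => $title, 'link' => $link];
    }
    return $result;
}
```

Against the live site, usage might look like buildLinkIndex(extractLinks(file_get_contents($URL)), 'file_get_contents').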