Get element content if text exists next to it - php

I have this HTML in $string:
$string = '<p>random</p>
<a>Test 1</a> (target1)
<br>
<a>Test 2</a> (target1)
<br>
<a>Test 3</a> (skip)
// etc
';
And I have a few terms in $array:
$array = array(
'(target1)',
'(target2)'
);
How can I search through $string to find all terms in $array and grab the content of the <a> tag that precedes it?
So I end up with the following results:
$results = array(
array(
'text' => 'Test 1',
'needle' => 'target1'
),
array(
'text' => 'Test 2',
'needle' => 'target1'
)
);

I will give you an answer using JavaScript, but PHP can do the same thing.
You can search for the terms in the array one string at a time, and stop once you have reached the end of your array and no more results are found.
target1Match = s.match(/<.+?>(.+?)<\/.+?> *\(target1\)/);
// target1Match is now [Test 1 (target1), Test 1]
target1Match = target1Match[1];
target2Match = s.match(/<.+?>(.+?)<\/.+?> *\(target2\)/);
// target2Match is now [Test 2 (target2), Test 2]
target2Match = target2Match[1];
You build the regex using variables in place of "target1" and "target2".
To match multiple targets at once, and only a specific tag:
s.match(/<a.+?>(.+?)<\/a> *\((target1|target2)\)/);
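Since the question asks for PHP, the same idea can be sketched there. This is only a minimal sketch: it assumes the anchors in $string look like `<a>Test 1</a> (target1)`, the helper name `findAnchorsBeforeTerms` is hypothetical, and regex-based HTML matching only holds up for simple, predictable markup.

```php
<?php
// Hypothetical helper: returns the text of each <a> that is directly followed by one of $terms.
function findAnchorsBeforeTerms(string $html, array $terms): array
{
    // Escape every term so "(" and ")" are treated literally, then join them with "|".
    $alternation = implode('|', array_map(fn($t) => preg_quote($t, '/'), $terms));
    // Group 1: anchor text; group 2: the term that followed it.
    $pattern = '/<a\b[^>]*>(.*?)<\/a>\s*(' . $alternation . ')/';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $results = [];
    foreach ($matches as $m) {
        $results[] = ['text' => $m[1], 'needle' => $m[2]];
    }
    return $results;
}

// Assumed sample input, modelled on the question.
$string = '<p>random</p>
<a>Test 1</a> (target1)
<br>
<a>Test 2</a> (target1)
<br>
<a>Test 3</a> (skip)';

$results = findAnchorsBeforeTerms($string, ['(target1)', '(target2)']);
print_r($results);
```

PREG_SET_ORDER groups the captures per match, so the anchor text and the term that followed it stay together in one row.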

Using preg_match_all():
// Assuming your HTML as $str, your terms as $terms
$results = [];
foreach ($terms as $t) {
    // Get content of <a> tag preceding the term
    preg_match_all('/<a ?.*>(.*)<\/a>\s+' . preg_quote($t, '/') . '/', $str, $matches);
    // Then insert into your result array
    foreach ($matches[1] as $m) {
        $results[] = [
            'text' => $m,
            'needle' => $t
        ];
    }
}
Output:
// echo '<pre>' . print_r($results, true) . '</pre>';
Array
(
[0] => Array
(
[text] => Test 1
[needle] => (target1)
)
[1] => Array
(
[text] => Test 2
[needle] => (target1)
)
)
See also: preg_quote()

I'm in the JayBlanchard camp. Here's a solution that rightly uses DomDocument & Xpath with a dynamically generated query to target <a> tags that are immediately followed by text that contains one of the qualifying needles.
For the sample needles, this is the generated query:
//a[following-sibling::text()[1][contains(.,'(target1)') or contains(.,'(target2)')]]
Code: (Demo)
$html = '<p>random</p>
<a>Test 1</a> (skip)
<br>
<a>Test 2</a> (target1)
<br>
<a>Test 3</a> (target1)
<br>
<a>Test 4</a> (skip)
<br>
<a>Test 5</a> (target2)
<br>
<a>Test 6</a> (skip)
';
$needles = [
'(target1)',
'(target2)'
];
$contains = array_reduce($needles, function($carry, $needle) {
return $carry .= ($carry !== null ? ' or ' : '') . "contains(.,'$needle')";
});
$matches = [];
$dom=new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[following-sibling::text()[1][$contains]]") as $node) {
$matches[] = ["text" => $node->nodeValue, "needle" => trim($node->nextSibling->nodeValue)];
}
var_export($matches);
Output:
array (
0 =>
array (
'text' => 'Test 2',
'needle' => '(target1)',
),
1 =>
array (
'text' => 'Test 3',
'needle' => '(target1)',
),
2 =>
array (
'text' => 'Test 5',
'needle' => '(target2)',
),
)

Related

How to check if two strings have the same beginning?

Considering I have one-dimensional ordered array of strings:
$arr = [
'Something else',
'This is option: one',
'This is option: two',
'This is option: 😜',
'This is second option: 2',
'This is second option: 3'
];
I would like to turn it into two-dimensional array, having the common beginning as the key. For example:
$target = [
'Something else',
'This is option:' => [
'one',
'two',
'😜'
],
'This is second option:' => [
'2',
'3'
]
];
It sounds simple, but I have gone completely blank.
function convertArr(array $src): array {
$prevString = null;
$newArray = [];
foreach ($src as $string) {
if ($prevString) {
// stuck here
}
$prevString = $string;
}
return $newArray;
}
Pre-made fiddle: https://3v4l.org/eqGDc
How can I check if two strings start with the same words, without having to loop on each letter?
As of now I have written this overly-complicated function, but I wonder if there is a simpler way:
function convertArr(array $src): array {
$prevString = null;
$newArray = [];
$size = count($src);
for ($i = 0; $i < $size; $i++) {
if (!$prevString || strpos($src[$i], $prevString) !== 0) {
if ($i == $size - 1) {
$newArray[] = $src[$i];
break;
}
$nowWords = explode(' ', $src[$i]);
$nextWords = explode(' ', $src[$i + 1]);
foreach ($nowWords as $k => $v) {
if ($v != $nextWords[$k]) {
break;
}
}
if ($k) {
$prevString = implode(' ', array_splice($nowWords, 0, $k));
$newArray[$prevString][] = implode(' ', $nowWords);
}
} else {
$newArray[$prevString][] = trim(substr($src[$i], strlen($prevString)));
}
}
return $newArray;
}
This might do the job:
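function convertArray(array $values): array
{
    $newArray = [];
    foreach ($values as $value) {
        $pos = strpos($value, ':');
        if ($pos !== false) {
            // Text before the first ":" becomes the key, the rest the value
            $key = substr($value, 0, $pos);
            $newArray[$key][] = trim(substr($value, $pos + 1));
        } else {
            // Strings without a ":" are kept as plain entries
            $newArray[] = $value;
        }
    }
    return $newArray;
}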
Essentially, based on the format of your array of strings, as long as each string only has one ":" character followed by the option value, this should work well enough.
I'm sure there will be a more advanced and more fail-safe solution but this may be a start.
I haven't got a complete solution, but maybe you can use this as a starting point: The following gets you the longest common starting sequence for the strings in an array of length 2:
var s=["This is option: one","This is option: two"];
var same=s.join('|').match(/(.*)(.*?)\|\1.*/)[1];
// same="This is option: "
In same you will find the longest possible beginning of the two strings in array s. I achieve this by using a regular expression with a greedy and a non-greedy wildcard group and forcing the first group to be repeated.
You could apply this method on slice()-d short arrays of your original sorted input array and monitor whether same stays the same for a number of these sub-arrays. You can then perform your intended grouping operation on sections with the same same.
[[ Sorry, I just realized I coded this in JavaScript and you wanted PHP - but the idea is so simple you can translate that easily into PHP yourself. ]]
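A PHP take on the same idea, as a rough sketch. Instead of the regex trick it compares the two strings word by word, which is a swapped-in technique rather than a translation of the JavaScript above. The function name `commonWordPrefix` is made up for illustration.

```php
<?php
// Hypothetical helper: longest run of leading words shared by two strings.
function commonWordPrefix(string $a, string $b): string
{
    $wordsA = explode(' ', $a);
    $wordsB = explode(' ', $b);
    $shared = [];
    foreach ($wordsA as $i => $word) {
        // Stop at the first position where the words differ or $b runs out.
        if (!isset($wordsB[$i]) || $wordsB[$i] !== $word) {
            break;
        }
        $shared[] = $word;
    }
    return implode(' ', $shared);
}

$first = commonWordPrefix('This is option: one', 'This is option: two');
$second = commonWordPrefix('This is option: 😜', 'This is second option: 2');
echo $first, "\n";  // This is option:
echo $second, "\n"; // This is
```

Comparing whole words avoids splitting a prefix in the middle of a word, which a character-based comparison could do.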
Edit
Looking at the question and expected result again, it seems to me that what the OP really wants is to group elements that share the same text before the colon (:) into a common sub-array. This can be done with the following code:
$arr = [
'Is there anything new and',
'Something old',
'This is option: one',
'This is option: two',
'This is option: 😜',
'This is second option: 2',
'This is second option: 3',
'Abc: def',
'Abc: ghi',
'the same line',
'the same words'
];
$target = [];
foreach ($arr as $v) {
    $t = array_reverse(explode(':', $v));
    $target[isset($t[1]) ? trim($t[1]) : 0][] = trim($t[0]);
}
print_r($target);
output:
Array
(
[0] => Array
(
[0] => Is there anything new and
[1] => Something old
[2] => the same line
[3] => the same words
)
[This is option] => Array
(
[0] => one
[1] => two
[2] => 😜
)
[This is second option] => Array
(
[0] => 2
[1] => 3
)
[Abc] => Array
(
[0] => def
[1] => ghi
)
)
See a demo here https://rextester.com/JMB6676

How to join string after last array value? [duplicate]

I am trying to generate a string from an array. Need to concatenate the array values with a small string AFTER the value. It doesn't work for the last value.
$data = array (
1 => array (
'symbol' => 'salad'
),
2 => array (
'symbol' => 'wine'
),
3 => array (
'symbol' => 'beer'
)
);
$symbols = array_column($data, 'symbol');
$string_from_array = join($symbols, 'bar');
echo($string_from_array);
// expected output: saladbar, winebar, beerbar
// output: saladbar, winebar, beer
You can achieve it a few different ways. One is actually by using implode(). If there is at least one element, we can just implode by the delimiter "bar, " and append a bar after. We do the check for count() to prevent printing bar if there are no results in the $symbols array.
$symbols = array_column($data, "symbol");
if (count($symbols)) {
echo implode("bar, ", $symbols)."bar";
}
Live demo at https://3v4l.org/ms5Ot
You can also achieve the desired result using array_map(), as follows:
<?php
$data = [
1 => ['symbol' => 'salad'],
2 => ['symbol' => 'wine'],
3 => ['symbol' => 'beer']
];
echo join(", ", array_map(
fn($v) => "{$v}bar",
array_column($data, 'symbol')
)
);
array_map() takes each element of the array returned by array_column(), which pulls the 'symbol' values out of $data, and uses an arrow function to append the string "bar" to it. The elements of the resulting array are then joined with ", " to form the expected output, which is displayed.
As a recent comment indicated you could eliminate array_column() and instead write code as follows:
<?php
$data = [
1 => ['symbol' => 'salad'],
2 => ['symbol' => 'wine'],
3 => ['symbol' => 'beer']
];
echo join(", ", array_map(
fn($row) => "{$row['symbol']}bar",
$data
)
);
Note that while this second way may appear more direct, it is debatable whether it really is: as array_map() iterates over $data, the arrow function still has to dereference each row behind the scenes, namely "$row['symbol']".
The join() function is an alias of implode() which
Returns a string containing a string representation of all the array
elements in the same order, with the glue string between each element.
So you need to add the last one by yourself
$data = array (
1 => array (
'symbol' => 'salad'
),
2 => array (
'symbol' => 'wine'
),
3 => array (
'symbol' => 'beer'
)
);
$symbols = array_column($data, 'symbol');
$string_from_array = implode('bar, ', $symbols);
if (strlen($string_from_array) > 0) {
    $string_from_array .= 'bar';
}
echo $string_from_array;
You can use array_column and implode
$data = array (
1 => array (
'symbol' => 'salad'
),
2 => array (
'symbol' => 'wine'
),
3 => array (
'symbol' => 'beer'
)
);
$res = implode("bar, ", array_column($data, 'symbol'))."bar";
Try this:
$symbols = array_column($data, 'symbol');
foreach ($symbols as $symbol) {
$symbol = $symbol."bar";
echo $symbol;
}
By the way, implode() alone can't do what you expect, because it places "bar" only between the strings, and there is nothing after the last string from your array for it to sit between. ;)
Another way could be using a for loop:
$res = "";
$count = count($data);
for($i = 1; $i <= $count; $i++) {
$res .= $data[$i]["symbol"] . "bar" . ($i !== $count ? ", " : "");
}
echo $res; //saladbar, winebar, beerbar

match format and return tokens from a string

I'm trying to parse a large text using regex in PHP. I know the line formats, shown below in sprintf format for ease of explanation.
So a line contains some known words (or parenthesis). I would like to know the matched format (in the example I printed the formats array key) and extract some relevant data out of the line.
I tried regex formats such as '/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/', but besides matching, I could not extract the correct data out of the lines.
$input = [
'new message from Bob [22:105:3905:534]',
'user Dylan posted a question in section General',
'new message from Mary(gold) [19504:8728:18524:78941]'
];
$formats = [
'new message from %s [%d:%d:%d:%d]', // this would actually be something like '/(?<=new message from )(.*)(?=[)(.*)(?=:)(.*)(?=:)(.*)(?=:)(.*)(?=])/'
'user %s posted a question in section %s',
'new message from %s(%s) [%d:%d:%d:%d]',
];
foreach ($input as $line) {
foreach ($formats as $key => $format) {
$data = [];
if (preg_match($format, $line, $data)) {
echo 'format: ' . $key . ', data: ' . var_export($data, true) . "\n";
continue;
}
}
}
// should yield:
// format: 0, data: array ( 0 => 'Bob', 1 => 22, 2 => 105, 3 => 3905, 4 => 534, )
// format: 1, data: array ( 0 => 'Dylan', 1 => 'General', )
// format: 2, data: array ( 0 => 'Mary', 1 => 'gold', 2 => 19504, 3 => 8728, 4 => 18524, 5 => 78941, )
I need:
an efficient regex format for matching a line, using multiple wildcards
a way to extract the wildcards when a regex format matches a line (maybe preg_match isn't the best PHP regex function to use in this case)
I can do this using string functions (strpos and substr), but the code looks awful.
Thanks!
Just a little adjustment to the patterns. Please see the code below.
<?php
$input = [
    'new message from Bob [22:105:3905:534]',
    'user Dylan posted a question in section General with space',
    'new message from Mary(gold) [19504:8728:18524:78941]'
];
$formats = [
    '/new message from (\w+) \[(\d+):(\d+):(\d+):(\d+)\]/',
    '/user (\w+) posted a question in section ([\w ]+)/',
    '/new message from (\w+)\((\w+)\) \[(\d+):(\d+):(\d+):(\d+)\]/',
];
foreach ($input as $line) {
    foreach ($formats as $key => $format) {
        $data = [];
        if (preg_match($format, $line, $data)) {
            // Drop element 0, the text that matched the whole pattern
            array_shift($data);
            echo 'format: ' . $key . ', data: ' . var_export($data, true) . "\n";
            // Stop at the first format that matches this line
            break;
        }
    }
}
// yields (captures are strings; var_export() prints each array over several lines):
// format: 0, data: array ( 0 => 'Bob', 1 => '22', 2 => '105', 3 => '3905', 4 => '534', )
// format: 1, data: array ( 0 => 'Dylan', 1 => 'General with space', )
// format: 2, data: array ( 0 => 'Mary', 1 => 'gold', 2 => '19504', 3 => '8728', 4 => '18524', 5 => '78941', )
https://3v4l.org/NBgaT
EDIT: I've added an array_shift() to get rid of the text that matched the full pattern.
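If you would rather keep the sprintf-style formats from the question, one option is to translate them into patterns programmatically. The sketch below is only an illustration: the helper name `formatToRegex` is invented, and it assumes `%s` means a single word (`\w+`) and `%d` a run of digits.

```php
<?php
// Hypothetical helper: turn a sprintf-like format into an anchored regex.
function formatToRegex(string $format): string
{
    // Escape regex metacharacters; "%" is left alone by preg_quote().
    $quoted = preg_quote($format, '/');
    // Assumption: %s is one word, %d is one or more digits.
    $body = str_replace(['%s', '%d'], ['(\w+)', '(\d+)'], $quoted);
    return '/^' . $body . '$/';
}

$formats = [
    'new message from %s [%d:%d:%d:%d]',
    'user %s posted a question in section %s',
    'new message from %s(%s) [%d:%d:%d:%d]',
];
$line = 'new message from Mary(gold) [19504:8728:18524:78941]';

$found = null;
foreach ($formats as $key => $format) {
    if (preg_match(formatToRegex($format), $line, $data)) {
        array_shift($data); // drop the full match, keep only the captures
        $found = ['format' => $key, 'data' => $data];
        break;
    }
}
var_export($found);
```

Because the generated patterns are anchored with ^ and $, a `%s` at the end of a format will not match a multi-word value such as "General with space"; that case would need a wider character class.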

How to use PHP to find all elements in HTML and get all the positions?

I'm trying to find all the elements of a tag in HTML and get the starting and ending point.
Here's my sample HTML
some content <iframe></iframe> <iframe></iframe> another content
Here's what I have got so far for code.
$dom = HtmlDomParser::str_get_html( $this->content );
$iframes = array();
foreach( $dom->find( 'iframe' ) as $iframe) {
$iframes[] = $iframe;
}
return array(
'hasIFrame' => count( $iframes ) > 0
);
Getting the number of elements is easy but I'm not sure if HTMLDomParser can get the starting and ending position?
What I want is
array(
    'hasIFrame' => true,
    'numberOfElements' => 2,
    array (
        0 => array (
            'start' => $firstStartingElement,
            'end' => $firstEndingElement
        ),
        1 => array (
            'start' => $secondStartingElement,
            'end' => $secondEndingElement
        )
    )
)
If you have a look at the official docs (http://simplehtmldom.sourceforge.net/), you can easily find out how many elements of a given type there are in your DOM:
// Find all images
foreach($html->find('img') as $element) {
echo $element->src . '<br>';
}
All you have to do is retrieve $html->find('iframe') and check its size to know whether there is at least one.
You can do something like this:
$html = "some content <iframe></iframe> <iframe></iframe> another content";
// Offsets of every opening tag (with or without attributes) and every closing tag
preg_match_all('/<iframe\b[^>]*>/', $html, $iframesStartPositions, PREG_OFFSET_CAPTURE);
preg_match_all('/<\/iframe>/', $html, $iframesEndPositions, PREG_OFFSET_CAPTURE);
$iframesPositions = array();
foreach ($iframesStartPositions[0] as $key => $start) {
    $iframesPositions[] = array(
        'start' => $start[1],
        'end' => $iframesEndPositions[0][$key][1] + 9 // 9 is the length of the closing tag </iframe>
    );
}
return array(
    'hasIFrame' => count($iframesPositions) > 0,
    'numberOfElements' => count($iframesPositions),
    'positions' => $iframesPositions
);

file_get_contents not saving to an array

I have a section in my code that uses file_get_contents to grab the URLs from a given web page. I also have a section that scans the title of each link in my array.
I want to end up with an array similar to this:
Array(
Google => array(
[title] => Google
[link] => http://www.google.com
)
)
But no values are saved to my array, even though I can't detect any errors.
$links = Array();
$URL = 'http://www.theqlick.com'; // change it for urls to grab
$file = file_get_contents($URL);
// grabs the urls from URL
if( strlen( $file )>0 ) {
$links[] = preg_match_all( "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/", $file, $links);
}
function Titles() {
global $links;
$str = implode('',array_map('file_get_contents',$links));
error_reporting(E_ERROR | E_PARSE);
$titles = Array();
if( strlen( $str )>0 ) {
$titles[] = preg_match_all( "/\<title\>(.*)\<\/title\>/", $str, $title );
return $title;
return $links;
}
}
$newArray = array();
$j = 0;
foreach( $links as $key => $val ){
$newArray[$key] = array( 'link' => $val, 'title' => $title[1][$j++]);
}
print_r($newArray);
The following code does not seem to return anything
$links[] = preg_match_all( "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/", $file, $links);
Try the following
$links = Array();
$URL = 'http://www.theqlick.com'; // change it for urls to grab
$file = file_get_contents($URL);
// grabs the urls from URL
if (strlen($file) > 0) {
$links[] = preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $file, $links);
}
var_dump($links);
Output
array
0 =>
array
0 => string 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' (length=55)
1 => string 'http://www.w3.org/1999/xhtml' (length=28)
2 => string 'http://www.theqlick.com' (length=23)
3 => string 'http://www.theqlick.com' (length=23)
1 =>
array
0 => string 'd' (length=1)
1 => string 'l' (length=1)
2 => string 'm' (length=1)
3 => string 'm' (length=1)
2 => int 4
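The trailing `2 => int 4` in this dump appears to come from assigning the return value of preg_match_all() (the number of matches) back into $links[], after the function has already filled $links by reference. A minimal sketch that keeps only the full matches, run on an assumed inline string instead of a downloaded page (the example.com and example.org URLs are placeholders):

```php
<?php
// Assumed sample input standing in for the downloaded page.
$file = '<a href="http://www.example.com/a">x</a> and http://example.org/';

$pattern = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
// preg_match_all() returns the number of full matches and fills $matches by reference.
$count = preg_match_all($pattern, $file, $matches);

// $matches[0] holds the full matches; $matches[1] is only the last character captured by group 1.
$links = $matches[0];
var_dump($count, $links);
```

Using a separate $matches variable keeps the count and the captured group out of the list of links.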
