I am trying to create a try/catch loop that I am using for downloading HTML from another website:
foreach($intldes as $id) {
$html = HtmlDomParser::file_get_html('https://nssdc.gsfc.nasa.gov/nmc/spacecraftDisplay.do?id='.$id);
foreach($html->find('#rightcontent') as $id);
foreach($html->find('.urone p') as $element);
foreach($html->find('.urtwo') as $launchdata);
}
If the data exists, it results in the following HTML:
<p><strong>NSSDCA/COSPAR ID:</strong> 2009-038F</p>
<p>ANDE 2, the Atmospheric Neutral Density Experiment 2, is a pair of microsatellites (Castor and Pollux) launched from Cape Canaveral on STS 127 on 15 July 2009 at 22:03 UT and deployed from the payload bay of the shuttle on 30 July 2009 at 17:22 UT.</p>
<p><strong>Launch Date:</strong> 2009-07-15<br/><strong>Launch Vehicle:</strong> Shuttle<br/><strong>Launch Site:</strong> Cape Canaveral, United States<br/></p>
If the data does not exist, I get an "Undefined variable: element" error, which means that the DOM parser could not find the HTML that I want to display.
So I need something that skips the web pages that do not have the required HTML, or that returns NULL for the variable.
Basically, if the HTML I want or the variable $element does not exist, I want Guzzle to skip that webpage and not load it.
EDIT:
My full function:
public function tester() {
$intldes = DB::table('examples')->pluck('id');
foreach ($intldes as $query) {
$html = HtmlDomParser::file_get_html('https://example.com?id='.$query);
$elements = $html->find('.urone p', 0);
if (is_array($elements)) {
foreach($html->find('#rightcontent') as $rawid);
foreach($html->find('.urone p') as $rawdescription);
foreach($html->find('.urtwo') as $launchdata);
//-- Data Parser --//
//Intldes
$intldesgetter = strip_tags($rawid->first_child()->next_sibling()->next_sibling()); //Get Element and Remove Tags
$intldesformat = substr($intldesgetter, ($pos = strpos($intldesgetter, ':')) !== false ? $pos + 3 : 0); //Remove Title
$dbintldes = ltrim($intldesformat); //Remove Blank-space
//Description
$description = strip_tags($rawdescription);
$dbdescription = ltrim($description);
//Launch Data
$launchdate = $launchdata->first_child()->next_sibling()->next_sibling()->next_sibling();
$explode = explode("<br/>", $launchdate);
$newArray = array_map(function($v){
return trim(strip_tags($v));
}, $explode);
$dblaunchdate = substr($newArray[0], ($pos = strpos($newArray[0], ':')) !== false ? $pos + 3 : 0);
$dblaunchvehicle = substr($newArray[1], ($pos = strpos($newArray[1], ':')) !== false ? $pos + 3 : 0);
$dblaunchsite = substr($newArray[2], ($pos = strpos($newArray[2], ':')) !== false ? $pos + 3 : 0);
//Data Saver
DB::table('descriptions')->insert(
['intldes' => $dbintldes, 'description' => strip_tags($dbdescription), 'launch_date' => $dblaunchdate, 'launch_vehicle' => $dblaunchvehicle, 'launch_site' => $dblaunchsite]
);
echo "Success";
} else {
echo "$query does not exist";
continue;
};
}
}
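I think you are getting the error here in your code:
foreach($html->find('.urone p') as $element);
From my experience, I would recommend checking that the HTML tag is available before iterating over it in the foreach loop.
You can use either is_object() or is_array() to work around your problem. When you search for a single element (by passing an index, e.g. find('.urone p', 0)), an object is returned, or null if nothing matches, so test it with is_object(). When you search for a set of elements, an array of objects is returned; it is empty when nothing matches, so check that it is not empty rather than only that it is an array. (This is also why is_array() is never true for the find('.urone p', 0) call in your edit.)
As you are searching for a set of elements, you can use:
$elements = $html->find('.urone p');
if (is_array($elements) && count($elements) > 0) {
    // the element exists: parse it
} else {
    // nothing found: skip this page, e.g. with continue inside your loop
}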
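In a Simple HTML DOM loop, the check itself is the empty-result test above. To show the whole skip-the-page pattern end to end without the library, here is a minimal sketch that swaps in PHP's built-in DOMDocument and DOMXPath for HtmlDomParser (so it is not the question's code). findDescription() and the sample pages are hypothetical; the helper returns null when ".urone p" is missing, and the loop uses continue to move on to the next page.

```php
<?php
// Sketch: return the ".urone p" text of a page, or null when the element is
// missing, so the caller can skip that page with `continue`.
// Assumption: uses PHP's built-in DOMDocument/DOMXPath instead of Simple HTML
// DOM so it runs without extra libraries; findDescription() is hypothetical.
function findDescription(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing was downloaded
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings for sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // XPath equivalent of the CSS selector ".urone p"
    $nodes = $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' urone ')]//p"
    );
    if ($nodes === false || $nodes->length === 0) {
        return null; // the page does not have the required HTML
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up pages standing in for downloaded HTML.
$pages = [
    '2009-038F' => '<div class="urone"><p>ANDE 2 is a pair of microsatellites.</p></div>',
    'missing'   => '<div class="other"><p>No data here.</p></div>',
];

$found = [];
foreach ($pages as $id => $html) {
    $description = findDescription($html);
    if ($description === null) {
        echo "$id does not exist\n";
        continue; // skip the page instead of hitting "Undefined variable"
    }
    $found[$id] = $description;
    echo "Success: $id\n";
}
```

The same shape carries over to the HtmlDomParser loop: compute the lookup once, test it, and continue before any code touches the missing element.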
I store data from an article in a .txt file.
A txt file looks like this:
id_20201010120010 // id of article
Sport // category of article
data/uploads/image-1602324010_resized.jpg // image of article
Champions League // title of article
Nunc porttitor ut augue sit amet maximus... // content of the article
2020-10-10 12:00 // date article
John // author of article
oPXWlZp+op7B0+/v5Y9khQ== // encrypted email of author
football,soccer // tags of article
true // boolean (SHOULD BE IGNORED WHEN SEARCHING)
false // boolean (SHOULD BE IGNORED WHEN SEARCHING)
To search the articles, I use the code below:
$searchthis = strtolower('Nunc');
$searchmatches = [];
foreach($articles as $article) { // Loop through all the articles
$handle = @fopen($article, "r");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle);
if(strpos(strtolower($buffer), $searchthis) !== FALSE) { // strtolower; search word not case sensitive
$searchmatches[] = $article; // array of articles with search matches
}
}
fclose($handle);
}
}
//show results:
if(empty($searchmatches)) { // if empty array
echo 'no match found';
}
print_r($searchmatches);
This works fine! But when I search for a word like true, it finds almost all articles, because every article has the two booleans on its last lines. How can I skip the last two lines of the .txt file when searching?
One way to do this is to use file() to read the entire file into an array, then array_slice() to strip the last two elements from the array. You can then iterate through the array looking for the search value. Note that you can use stripos() for a case-insensitive search:
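$search = 'league';
$searchmatches = [];
foreach ($articles as $article) {
    $data = file($article);             // one array element per line
    if ($data === false) continue;      // skip unreadable files
    $data = array_slice($data, 0, -2);  // drop the last two lines (the booleans)
    foreach ($data as $value) {
        if (stripos($value, $search) !== false) { // case-insensitive
            $searchmatches[] = $article;
            break; // one match is enough; don't add the same article twice
        }
    }
}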
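The same idea can be packaged as a reusable function and tried against a file shaped like the example article. This is a sketch under assumptions: searchArticles() is a hypothetical name, and it joins the remaining lines so each article is searched once (a variation on the per-line loop above, not a different technique).

```php
<?php
// Sketch: search article files while ignoring their trailing lines.
// searchArticles() is a hypothetical helper; the layout follows the
// question (one field per line, two booleans at the end).
function searchArticles(array $articles, string $search, int $skipLast = 2): array
{
    $matches = [];
    foreach ($articles as $article) {
        $lines = file($article, FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // unreadable file: skip it
        }
        // Keep everything except the last $skipLast lines (the booleans).
        $searchable = array_slice($lines, 0, max(0, count($lines) - $skipLast));
        // One case-insensitive search per article, so it is added at most once.
        if (stripos(implode("\n", $searchable), $search) !== false) {
            $matches[] = $article;
        }
    }
    return $matches;
}

// Demo with a temporary file shaped like the example article.
$file = tempnam(sys_get_temp_dir(), 'article');
file_put_contents($file, implode("\n", [
    'id_20201010120010', 'Sport', 'data/uploads/image-1602324010_resized.jpg',
    'Champions League', 'Nunc porttitor ut augue sit amet maximus...',
    '2020-10-10 12:00', 'John', 'oPXWlZp+op7B0+/v5Y9khQ==', 'football,soccer',
    'true', 'false',
]));

$nunc = searchArticles([$file], 'nunc'); // matches the content line
$true = searchArticles([$file], 'true'); // only in the ignored booleans
unlink($file);
print_r($nunc);
print_r($true);
```

Searching for true now returns no articles, while nunc still finds the content line.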
To read your file, instead of using fopen(), fgets(), etc. as you would in C, just use the file() function. It reads the whole file into an array of lines; then pick the lines you want to search in.
<?php
$article = file('article-20201010120010.txt');
// Access each information of the article you need directly.
$id = $article[0];
$category = $article[1];
// etc...
// Or do it like this with the list() operator of PHP:
list($id, $category, $image, $title, $content, $date, $author, $email_encrypted, $tags, $option_1, $option_2) = $article;
// Now do the case-insensitive search in the desired fields.
$search = 'porttitor'; // or whatever typed.
if (($pos = stripos($content, $search)) !== false) {
print "Found $search at position $pos\n";
} else {
print "$search not found!\n";
}
I have been using the Yahoo Finance API to download historical stock data. As has been reported on this site, the old API was discontinued as of mid-May. There have been many posts addressing the form of the new call, e.g.:
https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=315561600&period2=1496087439&interval=1d&events=history&crumb=XXXXXXXXXXX
As well as methods for obtaining the crumb:
Yahoo Finance URL not working
But I must be misunderstanding the procedure, as I always get an error saying "Failed to open stream: HTTP request failed. HTTP/1.0 401 Unauthorized".
Below is my code. Any and all assistance is welcome. I have to admit that I am an old Fortran programmer and my coding reflects this.
Good Roads
Bill
$ticker = "AAPL";
$yahooURL="https://finance.yahoo.com/quote/" .$ticker ."/history";
$body=file_get_contents($yahooURL);
$headers=$http_response_header;
$icount = count($headers);
for($i = 0; $i < $icount; $i ++)
{
$istart = -1;
$istop = -1;
$istart = strpos($headers[$i], "Set-Cookie: B=");
$istop = strpos($headers[$i], "&b=");
if($istart > -1 && $istop > -1)
{
$Cookie = substr ( $headers[$i] ,$istart+14,$istop - ($istart + 14));
}
}
$istart = strpos($body,"CrumbStore") + 22;
$istop = strpos($body,'"', $istart);
$Crumb = substr ( $body ,$istart,$istop - $istart);
$iMonth = 1;
$iDay = 1;
$iYear = 1980;
$timestampStart = mktime(0,0,0,$iMonth,$iDay,$iYear);
$timestampEnd = time();
$url = "https://query1.finance.yahoo.com/v7/finance/download/".$ticker."?period1=".$timestampStart."&period2=".$timestampEnd."&interval=1d&events=history&crumb=".$Cookie."";
while (!copy($url, $newfile) && $iLoop < 10)
{
if($iLoop == 9) echo "Failed to download data." .$lf;
$iLoop = $iLoop + 1;
sleep(1);
}
@Craig Cocca this isn't exactly a duplicate, because the reference you gave has a solution in Python, which doesn't help much for those of us who use PHP but haven't learnt Python. I'd love to see a solution in PHP. I've examined the Yahoo page and am able to extract the crumb, but can't work out how to put it into a stream and GET call.
My latest (failed) effort is:
$headers = [
"Accept" => "*/*",
"Connection" => "Keep-Alive",
"User-Agent" => sprintf("curl/%s", curl_version()["version"])
];
// open connection to Yahoo
$context = stream_context_create([
"http" => [
"header" => (implode(array_map(function($value, $key) { return sprintf("%s: %s\r\n", $key, $value); }, $headers, array_keys($headers))))."Cookie: $Cookie",
"method" => "GET"
]
]);
$handle = @fopen("https://query1.finance.yahoo.com/v7/finance/download/{$symbol}?period1={$date_now}&period2={$date_now}&interval=1d&events=history&crumb={$Crumb}", "r", false, $context);
if ($handle === false)
{
// trigger (big, orange) error
trigger_error("Could not connect to Yahoo!", E_USER_ERROR);
exit;
}
// download first line of CSV file
$data = fgetcsv($handle);
The two dates are Unix timestamps, i.e. $date_now = strtotime($date);
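In the first snippet the URL ends with crumb=".$Cookie, so the cookie value is sent where the crumb belongs and no Cookie header is sent at all, which would be consistent with a 401. The sketch below only shows how the two pieces can be separated with http_build_query() and stream_context_create(): the crumb in the query string, the cookie as a request header. Assumptions: $cookie is the full "B=..." pair from the Set-Cookie header, $crumb is the CrumbStore value, buildYahooRequest() is a hypothetical helper, and Yahoo may have changed or closed this endpoint since.

```php
<?php
// Sketch: the crumb goes in the query string, the cookie in a request header.
// Assumptions: $cookie is the full "B=..." pair taken from the Set-Cookie
// header and $crumb the CrumbStore value, both extracted as in the question;
// buildYahooRequest() is hypothetical and the endpoint may have changed.
function buildYahooRequest(string $ticker, int $start, int $end, string $cookie, string $crumb): array
{
    $query = http_build_query([
        'period1'  => $start,
        'period2'  => $end,
        'interval' => '1d',
        'events'   => 'history',
        'crumb'    => $crumb,
    ]);
    $url = 'https://query1.finance.yahoo.com/v7/finance/download/'
         . rawurlencode($ticker) . '?' . $query;

    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Cookie: {$cookie}\r\n",
        ],
    ]);
    return [$url, $context];
}

// Placeholder cookie and crumb values, not real ones.
[$url, $context] = buildYahooRequest('AAPL', 315561600, 1496087439, 'B=abc123', 'XyZ.9');
echo $url, "\n";
// The actual download would then be, for example:
// $csv = file_get_contents($url, false, $context);
```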
I've now managed to download share price history (at least until Yahoo decides to put some other block on the data). At the moment I'm only taking the current price figures, but my download method receives historical data for the past year.
My solution uses the "simple_html_dom.php" parser which I've added to my /includes folder.
Here is the code (modified from the original version from the Harvard CS50 course which I recommend for beginners like me):
function lookup($symbol)
{
// reject symbols that start with ^
if (preg_match("/^\^/", $symbol))
{
return false;
}
// reject symbols that contain commas
if (preg_match("/,/", $symbol))
{
return false;
}
// body of price history search
$sym = $symbol;
$yahooURL='https://finance.yahoo.com/quote/'.$sym.'/history?p='.$sym;
// get stock name
$data = file_get_contents($yahooURL);
$title = preg_match('/<title[^>]*>(.*?)<\/title>/ims', $data, $matches) ? $matches[1] : null;
$title = preg_replace('/[[a-zA-Z0-9\. \| ]* \| /','',$title);
$title = preg_replace('/ Stock \- Yahoo Finance/','',$title);
$name = $title;
// get price data - use simple_html_dom.php (added to /include)
$body=file_get_html($yahooURL);
$tables = $body->find('table');
$dom = new DOMDocument();
$elements[] = null;
$dom->loadHtml($tables[1]);
$x = new DOMXpath($dom);
$i = 0;
foreach($x->query('//td') as $td){
$elements[$i] = $td -> textContent." ";
$i++;
}
$open = floatval($elements[1]);
$high = floatval($elements[2]);
$low = floatval($elements[3]);
$close = floatval($elements[5]);
$vol = str_replace( ',', '', $elements[6]);
$vol = floatval($vol);
$date = date('Y-m-d');
// return stock as an associative array
return [
"symbol" => $symbol,
"name" => $name,
"price" => $close,
"open" => $open,
"high" => $high,
"low" => $low,
"vol" => $vol,
"date" => $date
];
}
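Because lookup() reads cells by position, a page layout change silently produces wrong numbers. A sketch of the table-reading step that fails loudly instead, using only DOMDocument/DOMXPath: it assumes the columns are Date, Open, High, Low, Close, Adj Close, Volume (under that assumed layout, index 5 used for "price" above would be the adjusted close). parseHistoryRow() is hypothetical and the HTML is made up.

```php
<?php
// Sketch: read the first data row of a price-history table with DOMXPath only.
// Assumptions: the columns are Date, Open, High, Low, Close, Adj Close, Volume;
// parseHistoryRow() is a hypothetical helper and the HTML below is made up.
function parseHistoryRow(string $tableHtml): ?array
{
    if (trim($tableHtml) === '') {
        return null;
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($tableHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = (new DOMXPath($dom))->query('//tbody/tr[1]/td');
    if ($cells === false || $cells->length < 7) {
        return null; // table missing or layout changed: fail instead of guessing
    }
    $text = [];
    foreach ($cells as $td) {
        $text[] = trim($td->textContent);
    }
    // "1,234,567" -> 1234567.0
    $num = fn (string $s): float => (float) str_replace(',', '', $s);
    return [
        'date'  => $text[0],
        'open'  => $num($text[1]),
        'high'  => $num($text[2]),
        'low'   => $num($text[3]),
        'close' => $num($text[4]),
        'adj'   => $num($text[5]),
        'vol'   => $num($text[6]),
    ];
}

$html = '<table><thead><tr><th>Date</th></tr></thead><tbody>'
      . '<tr><td>Jan 02, 2020</td><td>10.00</td><td>12.50</td><td>9.75</td>'
      . '<td>11.00</td><td>10.90</td><td>1,234,567</td></tr></tbody></table>';
$row = parseHistoryRow($html);
print_r($row);
```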
I know many users have asked this type of question, but I am stuck in an odd situation.
I am trying to implement logic where multiple occurrences of a specific pattern, each with a unique identifier, are replaced with content from a database if a match is found.
My regex pattern is
'/{code#(\d+)}/'
where the (\d+) group captures the unique identifier of each occurrence of the pattern.
My PHP code is:
<?php
$text="The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}";
$newsld=preg_match_all('/{code#(\d+)}/',$text,$arr);
$data = array("first Replace","Second Replace", "Third Replace");
echo $data=str_replace($arr[0], $data, $text);
?>
This works, but it is not at all dynamic: the numbers after the # in the pattern are IDs (i.e. 1, 2 and 3), and their respective data is stored in a database.
How can I fetch the content for each ID mentioned in the pattern from the DB and replace the entire pattern with that content?
I really can't find a way to do it. Thank you in advance.
It's not that difficult if you think about it. I'll be using PDO with prepared statements. So let's set it up:
$db = new PDO( // New PDO object
'mysql:host=localhost;dbname=projectn;charset=utf8', // Important: utf8 all the way through
'username',
'password',
array(
PDO::ATTR_EMULATE_PREPARES => false, // Turn off prepare emulation
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
)
);
This is the most basic setup for our DB. Check out this thread to learn more about emulated prepared statements and this external link to get started with PDO.
We get our input from somewhere; for the sake of simplicity we'll define it here:
$text = 'The old version is {code#1}, The new version is {code#2}, The stable version {code#3}';
Now there are several ways to achieve our goal. I'll show you two:
1. Using preg_replace_callback():
$output = preg_replace_callback('/{code#(\d+)}/', function($m) use($db) {
$stmt = $db->prepare('SELECT `content` FROM `footable` WHERE `id`=?');
$stmt->execute(array($m[1]));
$row = $stmt->fetch(PDO::FETCH_ASSOC);
if($row === false){
return $m[0]; // Default value is the code we captured if there's no match in the DB
}else{
return $row['content'];
}
}, $text);
echo $output;
Note how we use the use keyword to bring $db into the scope of the anonymous function; global is evil.
The downside is that this code queries the database for every single code it encounters. The advantage is that you can set a default value in case there's no match in the database. If you don't have that many codes to replace, I would go for this solution.
2. Using preg_match_all():
if(preg_match_all('/{code#(\d+)}/', $text, $m)){
$codes = $m[1]; // For sanity/tracking purposes
$inQuery = implode(',', array_fill(0, count($codes), '?')); // Nice common trick: https://stackoverflow.com/a/10722827
$stmt = $db->prepare('SELECT `content` FROM `footable` WHERE `id` IN(' . $inQuery . ')');
$stmt->execute($codes);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
$contents = array_map(function($v){
return $v['content'];
}, $rows); // Get the content in a nice (numbered) array
$patterns = array_fill(0, count($codes), '/{code#(\d+)}/'); // Create an array of the same pattern N times (N = the amount of codes we have)
$text = preg_replace($patterns, $contents, $text, 1); // Do not forget to limit a replace to 1 (for each code)
echo $text;
}else{
echo 'no match';
}
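The problem with the code above is that it replaces the code with an empty value if there's no match in the database. Because the contents are matched to the codes by position, a missing row also shifts the remaining values up, so a code can receive another code's content (and without an ORDER BY, the rows are not guaranteed to come back in the order of the codes anyway). Example (code#2 doesn't exist in the DB):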
Input: foo {code#1}, bar {code#2}, baz {code#3}
Output: foo AAA, bar CCC, baz
Expected output: foo AAA, bar , baz CCC
The preg_replace_callback() version works as expected. Maybe you could think of a hybrid solution; I'll leave that as homework for you :)
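One possible shape for that hybrid, as a sketch: collect the unique IDs with preg_match_all(), fetch them all in one lookup, then let preg_replace_callback() put each code's own content in place and leave unknown codes untouched. replaceCodes() and the $fetch callable are hypothetical stand-ins for the database query; the commented PDO fetcher reuses the footable/content names from above and assumes an id column, which is also an assumption.

```php
<?php
// Sketch of the "hybrid": one lookup for all codes (like the IN(...) query)
// plus preg_replace_callback() so each code gets its own content and unknown
// codes are left as-is. $fetch is a hypothetical stand-in for the DB query;
// it receives the unique ids and must return an array keyed by id.
function replaceCodes(string $text, callable $fetch): string
{
    $pattern = '/{code#(\d+)}/';
    if (!preg_match_all($pattern, $text, $m)) {
        return $text; // no codes at all
    }
    $ids = array_values(array_unique(array_map('intval', $m[1])));
    $contents = $fetch($ids); // e.g. [1 => 'AAA', 3 => 'CCC']

    return preg_replace_callback($pattern, function ($match) use ($contents) {
        $id = (int) $match[1];
        // Keep the original {code#N} when the id has no row.
        return array_key_exists($id, $contents) ? $contents[$id] : $match[0];
    }, $text);
}

// With PDO the fetcher could look like this (assumed table/column names):
// $fetch = function (array $ids) use ($db) {
//     $in = implode(',', array_fill(0, count($ids), '?'));
//     $stmt = $db->prepare("SELECT `id`, `content` FROM `footable` WHERE `id` IN ($in)");
//     $stmt->execute($ids);
//     return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
// };

$fakeDb = [1 => 'AAA', 3 => 'CCC']; // made-up rows; code #2 is missing
$fetch = function (array $ids) use ($fakeDb) {
    return array_intersect_key($fakeDb, array_flip($ids));
};
echo replaceCodes('foo {code#1}, bar {code#2}, baz {code#3}', $fetch), "\n";
```

Matching by ID rather than by position avoids the shifted replacement, and the database is still queried once per text.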
Here is another way to solve the problem: as database access is the most expensive part, I would choose a design that allows you to query the database once for all the codes used.
The text you've got could be represented with various segments, that is any combination of <TEXT> and <CODE> tokens:
The old version is {code#1}, The new version is {code#2}, ...
<TEXT_____________><CODE__><TEXT_______________><CODE__><TEXT_ ...
Tokenizing your string buffer into such a sequence allows you to obtain the codes used in the document and index which segments a code relates to.
You can then fetch the replacements for each code and then replace all segments of that code with the replacement.
Let's set this up and define the input text, your pattern and the token types:
$input = <<<BUFFER
The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}
BUFFER;
$regex = '/{code#(\d+)}/';
const TOKEN_TEXT = 1;
const TOKEN_CODE = 2;
Next, the input is split into tokens. I use two arrays for that: one stores the type of each token ($tokens; text or code) and the other contains the string data ($segments). The input is copied into a buffer, and the buffer is consumed until it is empty:
$tokens = [];
$segments = [];
$buffer = $input;
while (preg_match($regex, $buffer, $matches, PREG_OFFSET_CAPTURE, 0)) {
if ($matches[0][1]) {
$tokens[] = TOKEN_TEXT;
$segments[] = substr($buffer, 0, $matches[0][1]);
}
$tokens[] = TOKEN_CODE;
$segments[] = $matches[0][0];
$buffer = substr($buffer, $matches[0][1] + strlen($matches[0][0]));
}
if (strlen($buffer)) {
$tokens[] = TOKEN_TEXT;
$segments[] = $buffer;
$buffer = "";
}
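The same tokenizing can also be sketched with preg_split() and PREG_SPLIT_DELIM_CAPTURE, an alternative to the loop above rather than the answer's own method. Because the pattern has exactly one capture group, the result alternates text and captured code number, so even indexes are TEXT segments and odd indexes are CODE segments. The replacement rows are made up.

```php
<?php
// Alternative tokenizer sketch (not the loop above): preg_split() with
// PREG_SPLIT_DELIM_CAPTURE returns text and captured code numbers
// alternately, so even indexes are TEXT segments and odd indexes are CODEs.
$input = 'The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}';
$regex = '/{code#(\d+)}/';

$segments = preg_split($regex, $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// Index which segments hold which code, like the $patterns array in the answer.
$patterns = [];
foreach ($segments as $index => $segment) {
    if ($index % 2 === 1) {
        $patterns[(int) $segment][] = $index;
        $segments[$index] = '{code#' . $segment . '}'; // default: keep the code text
    }
}

// Made-up rows standing in for the single IN (...) query; #2 has no row.
$result = [
    ['id' => 1, 'text' => 'v1'],
    ['id' => 3, 'text' => 'v3'],
];
foreach ($result as $row) {
    foreach ($patterns[$row['id']] ?? [] as $index) {
        $segments[$index] = $row['text'];
    }
}
echo implode('', $segments), "\n";
```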
At this point all of the input has been turned into tokens and segments.
This "token stream" can now be used to obtain all the codes used. Additionally, all code tokens are indexed so that, given a code number, you can tell which segments need to be replaced. The indexing is done in the $patterns array:
$patterns = [];
foreach ($tokens as $index => $token) {
if ($token !== TOKEN_CODE) {
continue;
}
preg_match($regex, $segments[$index], $matches);
$code = (int)$matches[1];
$patterns[$code][] = $index;
}
Now that all codes have been obtained from the string, a database query can be formulated to obtain the replacement values. Technically you would fire a SELECT ... FROM ... WHERE code IN (12, 44, ...) query that fetches all results at once. For this example I mock that by calculating a result array of rows:
$result = [];
foreach (array_keys($patterns) as $code) {
$result[] = [
'id' => $code,
'text' => sprintf('v%d.%d.%d%s', $code * 2 % 5 + $code % 2, 7 - 2 * $code % 5, 13 + $code, $code === 3 ? '' : '-beta'),
];
}
Then all that's left is to process the database result and replace the segments the result has codes for:
foreach ($result as $row) {
foreach ($patterns[$row['id']] as $index) {
$segments[$index] = $row['text'];
}
}
And then do the output:
echo implode("", $segments);
And that's it then. The output for this example:
The old version is v3.5.14-beta, The new version is v4.3.15-beta, The stable version is v2.6.16
The whole example in full:
<?php
/**
* Simultaneous Preg_replace operation in php and regex
*
 * @link http://stackoverflow.com/a/29474371/367456
*/
$input = <<<BUFFER
The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}
BUFFER;
$regex = '/{code#(\d+)}/';
const TOKEN_TEXT = 1;
const TOKEN_CODE = 2;
// convert the input into a stream of tokens - normal text or fields for replacement
$tokens = [];
$segments = [];
$buffer = $input;
while (preg_match($regex, $buffer, $matches, PREG_OFFSET_CAPTURE, 0)) {
if ($matches[0][1]) {
$tokens[] = TOKEN_TEXT;
$segments[] = substr($buffer, 0, $matches[0][1]);
}
$tokens[] = TOKEN_CODE;
$segments[] = $matches[0][0];
$buffer = substr($buffer, $matches[0][1] + strlen($matches[0][0]));
}
if (strlen($buffer)) {
$tokens[] = TOKEN_TEXT;
$segments[] = $buffer;
$buffer = "";
}
// index which tokens represent which codes
$patterns = [];
foreach ($tokens as $index => $token) {
if ($token !== TOKEN_CODE) {
continue;
}
preg_match($regex, $segments[$index], $matches);
$code = (int)$matches[1];
$patterns[$code][] = $index;
}
// lookup all codes in a database at once (simulated)
// SELECT id, text FROM replacements_table WHERE id IN (array_keys($patterns))
$result = [];
foreach (array_keys($patterns) as $code) {
$result[] = [
'id' => $code,
'text' => sprintf('v%d.%d.%d%s', $code * 2 % 5 + $code % 2, 7 - 2 * $code % 5, 13 + $code, $code === 3 ? '' : '-beta'),
];
}
// process the database result
foreach ($result as $row) {
foreach ($patterns[$row['id']] as $index) {
$segments[$index] = $row['text'];
}
}
// output the replacement result
echo implode("", $segments);
I'm not sure exactly what I'm doing wrong. I did check other questions, and all I could infer was that empty() only supports variables, although that didn't really change anything.
I am getting a very strange error when I run my code, and can't make head or tail of it.
Fatal error: Can't use function return value in write context in /home/shortcu1/public_html/projects/friendcodes/newUser.php on line 103
This is the main function being called (in a file called newUser.php):
function isBumping($forumid, $username, $premium){
if($premium == 'true'){
$file = file_get_contents('plist.txt'); // This is the file I'm testing on
echo 'Running code as premium<br>';
} else {
$file = file_get_contents('list.txt');
}
$forumid = $forumid.':'.$username;
$posts = explode(' ', $file);
$posts ($info, $bump) = array_filter($posts, function($item) use ($forumid, $posts){
// This will check for matching forum ID
if(strpos($item, $forumid) !== true){
$pos = strpos($item, ':Day-');
$pos = $pos + 5;
$day = (int) substr($item, $pos, 1); // Converts the stored date to a numerical value. remember 1 = monday, 7 = sunday
$today = date('N');
$bump = false;
if(($day+3) % 7 > $today){
// Old enough to re-bump
return array ($item, $bump);
} else {
// Too recent to re-bump
$bump = true;
return array ($item, $bump);
}
}
});
print_r($posts[1]);
echo '<br>';
print_r($posts[2]);
}
It is being run through the file test.php:
include('newUser.php');
isBumping(1, 'Spitfire', 'true');
The file called plist.txt is as follows:
1:Spitfire:Day-4:8JX-UKR8:8JX-UKR8:Spirit:90
1:Spitfire:Day-4:8JX-UKR8:8JX-UKR8:Spirit:90
1:Spitfire:Day-4:8JX-UKR8:8JX-UKR8:Spirit:90
1:Spitfire:Day-4:8JX-UKR8:8JX-UKR8:Spirit:90
Try changing
$posts ($info, $bump) =
to
$posts =
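array_filter() returns an array containing only the elements for which the callback returns a truthy value (without a callback, all non-empty elements). It does not transform the elements, so returning array($item, $bump) from the callback only decides whether $item is kept.
You can't use $posts($a, $b) on the left-hand side of an assignment: PHP parses $posts(...) as a function call, and a function's return value cannot be written to, which is what "Can't use function return value in write context" means.
Try assigning to a plain variable instead, like simply $posts.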
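A sketch of the filter-then-transform split: array_filter() to keep this user's entries and array_map() to turn each one into a pair, then [...] destructuring to unpack two values. The bump rule itself is left out, the second data line is made up, and if plist.txt really has one entry per line as shown, splitting on whitespace rather than explode(' ', ...) is needed, since the lines contain no spaces.

```php
<?php
// Sketch: array_filter() only keeps or drops items (the callback's return
// value is used as a boolean), so turning items into pairs needs array_map().
// The "forumid:username" prefix and "Day-N" field follow plist.txt; the
// bump rule itself is left out, and the second line is made up.
$file = "1:Spitfire:Day-4:8JX-UKR8:8JX-UKR8:Spirit:90\n2:Other:Day-1:AAA-BBB:AAA-BBB:Spirit:10";
$forumid = '1:Spitfire';

// plist.txt has one entry per line, so split on any whitespace, not just ' '.
$posts = preg_split('/\s+/', trim($file));

// 1. Keep only this user's entries (strpos() returns an int or false, never true).
$mine = array_values(array_filter($posts, function ($item) use ($forumid) {
    return strpos($item, $forumid . ':') === 0;
}));

// 2. Map each entry to [entry, day number].
$parsed = array_map(function ($item) {
    return [$item, preg_match('/:Day-(\d)/', $item, $m) ? (int) $m[1] : null];
}, $mine);

// 3. Unpacking two values uses [...] (or list()), not $posts(...).
[$info, $day] = $parsed[0];
echo $info, ' => day ', $day, "\n";
```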
$test = array('<h1>text1</h1>','<h1>text2<h1>','<h1>text3</h1><p>subtext3</p>');
In a very long text, I use preg_split to cut it into small pieces. I want to remove only the pieces that are wrapped in an h1 tag and have no hyperlink.
I want to remove every piece that looks like <h1>text1</h1> (only wrapped in h1, without a hyperlink)
and keep <h1>text2<h1> and <h1>text3</h1><p>subtext3</p>.
Use a loop to go through each array element and find each instance of the string "<". Then look at the next 3 characters. If they're "h1>", then you have the correct tag (closing tags start with "</" and can be skipped). If you ever find a "<" followed by different characters, then it's not an <h1> HTML tag and you can remove this array element.
To remove the given element from the array, you can use unset($array[$index]), and when you're done I recommend using array_values() to close any gaps in the indexes that unset() leaves (a sort would also reindex, but it reorders the elements too).
You'll want to use functions such as strpos to get the position of a string, and substr to get a subset of the given string. php.net is your friend :)
Here is an example function which works with your $test array:
<?php
$test = array('<h1>text1</h1>','<h1>text2<h1>','<h1>text3</h1><p>subtext3</p>');
function removeBadElements(&$array) {
foreach($array as $k => $v) {
// $v is a single array element
$offset = 0;
do {
$pos = strpos($v, '<', $offset);
$offset = $pos + 1;
if($pos === false) { break; }
$tag = substr($v, $pos, 3);
$next = substr($v, $pos+1, 1);
if($next == '/') { continue; }
if($tag == '<h1') { continue; }
else {
unset($array[$k]);
break;
}
} while($offset + 2 < strlen($v));
}
}
echo "\nORIG ARRAY:\n";
print_r($test);
removeBadElements($test);
echo "\n\n-------\nMODIFIED ARRAY:\n\n";
print_r($test);
?>
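Traced on $test, the function above removes the third piece and keeps the first, which is the opposite of what the question describes. If the goal is exactly "drop pieces that are only an h1 with plain text, keep everything else", a shorter sketch uses preg_grep() with PREG_GREP_INVERT. It reads "without hyperlink" as "no nested tags at all inside the h1", which is an assumption; the fourth test element is made up to show that a linked h1 is kept.

```php
<?php
// Sketch: drop pieces that are exactly one <h1>...</h1> with plain text
// inside, keep everything else. Assumption: "without hyperlink" is read as
// "no nested tags at all" inside the h1.
$test = ['<h1>text1</h1>', '<h1>text2<h1>', '<h1>text3</h1><p>subtext3</p>', '<h1><a href="/x">linked</a></h1>'];

// PREG_GREP_INVERT returns the elements that do NOT match the pattern.
$kept = array_values(preg_grep('~^<h1>[^<]*</h1>$~i', $test, PREG_GREP_INVERT));
print_r($kept);
```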