Machine Learning and PHP with php-ml

By Rodrigo "pokemaobr" Cardoso

One of the most relevant topics in recent years is machine learning. It is being discussed like never before as a way to solve problems of the most diverse types, alongside artificial intelligence, big data, computer vision and much more.

However, whenever the subject comes up, languages regarded as faster and more statistics-oriented, such as Python, Go, Julia and R, appear as the most common choices among developers of this type of solution.

But if we stop to analyze it, when we talk about machine learning we are really talking about mathematical algorithms for analyzing, cleaning, tokenizing, training on and interpreting data. Examples include linear regression, clustering algorithms and decision trees.
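To make the "it is just math" point concrete, here is a minimal sketch of one of those algorithms, simple linear regression by ordinary least squares, in plain PHP with no libraries. The function name fitLine and the data points are made up for illustration, and this is not meant to mirror how any particular library implements it.

```php
<?php

declare(strict_types=1);

/**
 * Fits y = slope * x + intercept by ordinary least squares.
 * (Hypothetical helper, written only for this illustration.)
 *
 * @param float[] $xs
 * @param float[] $ys
 * @return array{slope: float, intercept: float}
 */
function fitLine(array $xs, array $ys): array
{
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;

    $covariance = 0.0;
    $variance = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $covariance += ($xs[$i] - $meanX) * ($ys[$i] - $meanY);
        $variance += ($xs[$i] - $meanX) ** 2;
    }

    $slope = $covariance / $variance;

    return ['slope' => $slope, 'intercept' => $meanY - $slope * $meanX];
}

// Made-up points that lie exactly on y = 2x + 1.
$model = fitLine([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]);

printf("y = %.2f * x + %.2f\n", $model['slope'], $model['intercept']);
// prints "y = 2.00 * x + 1.00"
```

"Training" here is nothing more than computing two averages and two sums; larger models follow the same idea with more arithmetic.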

And yes, someone sat down and implemented several of these algorithms in PHP. That is how the php-ml project came about: a PHP library with various machine learning algorithms that you can use to train on, analyze and interpret data.

To add php-ml to your project, just create a folder and, using Composer, run the command:

composer require php-ai/php-ml

Or get the source directly from the project's GitHub repository: https://github.com/php-ai/php-ml

Among the algorithms implemented in php-ml are Apriori (association rule learning), KNearestNeighbors, Naive Bayes and SVC (classification), as well as neural network algorithms and many others.
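To give a feel for what one of these algorithms does, here is a hand-rolled sketch of the k-nearest-neighbors idea in plain PHP. It is not php-ml's code: the function knnPredict and the toy data are assumptions made for illustration only.

```php
<?php

declare(strict_types=1);

/**
 * Predicts the label of $point by majority vote among its $k nearest
 * training samples (Euclidean distance). Hypothetical helper.
 */
function knnPredict(array $samples, array $labels, array $point, int $k = 3): string
{
    // Distance from the new point to every training sample.
    $distances = [];
    foreach ($samples as $i => $sample) {
        $sum = 0.0;
        foreach ($sample as $j => $value) {
            $sum += ($value - $point[$j]) ** 2;
        }
        $distances[$i] = sqrt($sum);
    }

    // Sort nearest first (keys are preserved) and keep the k closest.
    asort($distances);
    $nearest = array_slice(array_keys($distances), 0, $k);

    // Count one vote per neighbor and return the most common label.
    $votes = [];
    foreach ($nearest as $i) {
        $votes[$labels[$i]] = ($votes[$labels[$i]] ?? 0) + 1;
    }
    arsort($votes);

    return (string) array_key_first($votes);
}

// Toy data: two small clusters labeled 'a' and 'b'.
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels  = ['a', 'a', 'a', 'b', 'b', 'b'];

echo knnPredict($samples, $labels, [3, 2]), "\n"; // prints "b"
```

The point [3, 2] sits inside the 'b' cluster, so all three of its nearest neighbors vote 'b'. A library version adds the things this sketch leaves out, such as configurable distance metrics and input validation.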

In addition to the library's repository, there is another repository with code samples that use php-ml. To get them, simply git clone the repository: https://github.com/php-ai/php-ml-examples

One example presented in this repository is language detection, shown in the code below:

<?php 

declare(strict_types=1);
namespace PhpmlExamples;

include 'vendor/autoload.php';

use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;

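// Load the CSV file; 1 is the number of feature columns (the text), followed by the label.
$dataset = new CsvDataset('data/languages.csv', 1);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();

// Flatten each one-column sample into a plain string.
$samples = [];
foreach ($dataset->getSamples() as $sample) {
    $samples[] = $sample[0];
}

// Turn each sentence into a vector of word counts (transform() changes $samples in place).
$vectorizer->fit($samples);
$vectorizer->transform($samples);

// Reweight the counts with TF-IDF so that words common to many samples weigh less.
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);

// Hold out 10% of the data for testing.
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);

// Train an SVC with an RBF kernel and a cost of 10000, then predict the test samples.
$classifier = new SVC(Kernel::RBF, 10000);
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());

// Compare the predictions with the real labels of the test set.
echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);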

In the above example, the program loads the data from the file 'data/languages.csv' and tokenizes it, turning each sentence into a numeric array. As was said previously, machine learning algorithms are mathematical algorithms, so the text has to become numbers before they can be applied to it. That is the job of the fit() and transform() methods: first the TokenCountVectorizer builds its vocabulary and counts the words in each sentence, then the TfIdfTransformer reweights those counts. After that, the data is split into a training set and a test set, the SVC classifier is trained on the training set, and its predictions for the test set are compared with the real labels. The resulting accuracy, the fraction of test samples classified correctly, is displayed in the output.
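To show what "turning text into numbers" with fit() and transform() amounts to, here is a small sketch in plain PHP that builds word-count vectors and then applies a textbook TF-IDF weighting. The sentences are made up, the variable names are hypothetical, and the idf formula used here (natural log of N / df) is an assumption; php-ml's own classes may differ in details.

```php
<?php

declare(strict_types=1);

// Three made-up sentences standing in for rows of the CSV file.
$sentences = ['the cat sleeps', 'the dog barks', 'le chat dort'];

// Tokenize: split each sentence into lowercase words.
$tokenized = array_map(
    fn (string $s): array => preg_split('/\s+/', strtolower(trim($s))),
    $sentences
);

// Vectorizer "fit": give every distinct word its own column index.
$vocabulary = [];
foreach ($tokenized as $tokens) {
    foreach ($tokens as $token) {
        if (!isset($vocabulary[$token])) {
            $vocabulary[$token] = count($vocabulary);
        }
    }
}

// Vectorizer "transform": replace each sentence with a vector of word counts.
$counts = [];
foreach ($tokenized as $tokens) {
    $row = array_fill(0, count($vocabulary), 0);
    foreach ($tokens as $token) {
        $row[$vocabulary[$token]]++;
    }
    $counts[] = $row;
}

// TF-IDF "fit": df = number of sentences containing the word, idf = ln(N / df).
$idf = [];
foreach (array_keys($counts[0]) as $column) {
    $documentFrequency = count(array_filter(array_column($counts, $column)));
    $idf[$column] = log(count($counts) / $documentFrequency);
}

// TF-IDF "transform": scale every count by the weight of its column.
$tfIdf = array_map(
    fn (array $row): array => array_map(fn (int $c, float $w): float => $c * $w, $row, $idf),
    $counts
);

print_r($counts[0]); // word counts for "the cat sleeps"
print_r($tfIdf[0]);  // the same sentence after TF-IDF weighting
```

In this toy data "the" appears in two of the three sentences, so it receives a lower weight than words such as "cat" that appear in only one. Vectors like these are what a classifier such as SVC is trained on.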

Well, we know that PHP really is not the best language for heavy processing and large amounts of numerical data. Still, it is always nice to see more and more great programmers bringing increasingly complex and modern projects to the language. Now it's time to try out this project and learn even more about machine learning.
