I've made a few small changes to the tokenization, and it seems to work:<div><br></div><div><ol style="margin:0px;padding:0px 0px 0px 5px;color:rgb(172,172,172);font-family:Consolas,Menlo,Monaco,'Lucida Console','Liberation Mono','DejaVu Sans Mono','Bitstream Vera Sans Mono',monospace,serif;font-size:12px;line-height:21px;white-space:nowrap;background-color:rgb(248,248,248);list-style:none">
<li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(68,0,136)">204c204</span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(153,17,17)">< $words<span>[</span>$i<span>]</span> = preg_split<span>(</span>"/<span>(</span><span>[</span>\s\,\.\"\:\;\«\»\-\=\+\?\!\<span>(</span>\<span>)</span>\/<span>]</span>+<span>)</span>/", $words<span>[</span>$i<span>]</span>, -<span>1</span>, PREG_SPLIT_DELIM_CAPTURE<span>)</span>; //then split it on the spaces</span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(136,136,34)">---</span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(136,136,34)"><span style="color:rgb(0,176,0)">> $words<span>[</span>$i<span>]</span> = preg_split<span>(</span>"/<span>(</span>\s+<span>)</span>/", $words<span>[</span>$i<span>]</span>, -<span>1</span>, PREG_SPLIT_DELIM_CAPTURE<span>)</span>; //then split it on the spaces</span></span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(68,0,136)">214c214</span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(153,17,17)">< preg_match<span>(</span>"/<span>[</span>^\s\,\.\"\:\;\«\»\-\=\+\?\!\<span>(</span>\<span>)</span>\/<span>]</span>+/i", $words<span>[</span>$i<span>]</span><span>[</span>$j<span>]</span>, $tmp<span>)</span>; //get the word that is in the array slot $i</span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(136,136,34)">---</span></div>
</li><li><div style="padding:0px 5px;vertical-align:top;color:rgb(0,0,0);border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);background-color:rgb(255,255,255)"><span style="color:rgb(136,136,34)"><span style="color:rgb(0,176,0)">> preg_match<span>(</span>"/<span>[</span>^\s\,\.\"\:\;\«\»\-\=\+\?\!\<span>(</span>\<span>)</span>\/<span>]</span><span>{</span><span>1</span>,<span>20</span><span>}</span>/i", $words<span>[</span>$i<span>]</span><span>[</span>$j<span>]</span>, $tmp<span>)</span>; //get the word that is in the array slot $i</span></span></div>
</li></ol></div><div><br><div>The first change improves the tokenization by taking more characters into account when splitting the text into words. The second removes the 20-character limit on the text to analyze. (Note that in the diff above the new lines appear to be the ones marked with &lt;.)</div><div><br></div><div>
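<div>To make the effect of the two changes described above easier to see, here is a minimal sketch that can be run on its own. The function names, the <code>DELIMS</code> constant and the sample text are hypothetical, and the <code>u</code> modifier is an added assumption (UTF-8 input); none of them come from the actual file.</div>

```php
<?php
// Illustrative sketch only: names and sample text are made up for this example.
// The /u modifier is an assumption (UTF-8 input); the patterns in the diff do not use it.

// Delimiter characters, written as the inside of a regex character class.
const DELIMS = '\s,.":;«»\-=+?!()\/';

// Split on whitespace only, keeping the delimiters as tokens.
function splitOnSpaces(string $text): array
{
    return preg_split('/(\s+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Split on whitespace and punctuation, keeping the delimiters as tokens.
function splitOnPunctuation(string $text): array
{
    return preg_split('/([' . DELIMS . ']+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Extract the first run of non-delimiter characters from a token.
// With $capAt20 the run is limited to 20 characters, otherwise it is unbounded.
function firstWord(string $token, bool $capAt20): ?string
{
    $quantifier = $capAt20 ? '{1,20}' : '+';
    $found = preg_match('/[^' . DELIMS . ']' . $quantifier . '/iu', $token, $m);
    return $found === 1 ? $m[0] : null;
}

$sample = 'Hola, món! (prova)';
print_r(splitOnSpaces($sample));      // punctuation stays attached to the words
print_r(splitOnPunctuation($sample)); // punctuation ends up in the delimiter tokens

$long = 'anticonstitucionalment';     // 22 letters
echo firstWord($long, true), "\n";    // capped version
echo firstWord($long, false), "\n";   // uncapped version
```

<div>One thing to keep in mind with this sketch: since <code>PREG_SPLIT_NO_EMPTY</code> is not used, a text that ends in a delimiter yields a trailing empty string, so the code that consumes the tokens has to cope with empty slots.</div><div><br></div>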
If you could check it...</div><div><br></div><div>-- <br>< Xavi Ivars ><br>< <a href="http://xavi.ivars.me" target="_blank">http://xavi.ivars.me</a> >
</div></div>