ICode9

精准搜索请尝试: 精确搜索
首页 > 编程语言> 文章详细

php-自动换行/剪切HTML字符串中的文本

2019-12-02 01:33:21  阅读:452  来源: 互联网

标签:word-wrap html-parsing html php


在这里,我想做的是:我有一个包含HTML标签的字符串,并且我想使用除HTML标签之外的自动换行功能将其剪切.

我被卡住了:

public function textWrap($string, $width)
{
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    foreach ($dom->getElementsByTagName('*') as $elem)
    {
        foreach ($elem->childNodes as $node)
        {
            if ($node->nodeType === XML_TEXT_NODE)
            {
                $text = trim($node->nodeValue);
                $length = mb_strlen($text);
                $width -= $length;
                if($width <= 0)
                { 
                    // Here, I would like to delete all next nodes
                    // and cut the current nodeValue and finally return the string 
                }
            }
        }
    }
}

我不确定目前是否以正确的方式进行操作.我希望这很清楚…

编辑:

这里举个例子.我有这句话

    <p>
        <span class="Underline"><span class="Bold">Test to be cut</span></span>
   </p><p>Some text</p>

假设我想在第6个字符处剪切它,我想返回以下内容:

<p>
    <span class="Underline"><span class="Bold">Test to</span></span>
</p>

解决方法:

正如我在评论中所写,您首先需要找到要进行剪切的文本偏移量.

首先,我设置一个包含HTML片段的DOMDocument,然后选择在DOM中表示它的正文:

$htmlFragment = <<<HTML
<p>
        <span class="Underline"><span class="Bold">Test to be cut</span></span>
   </p><p>Some text </p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($htmlFragment);
$parent = $dom->getElementsByTagName('body')->item(0);
if (!$parent)
{
    throw new Exception('Parent element not found.');
}

然后,我使用TextRange类找到需要剪切的地方,并使用TextRange实际进行剪切,并找到应该成为片段最后一个节点的DOMNode:

$range = new TextRange($parent);

// find position where to cut the HTML textual represenation
// by looking for a word or the at least matching whitespace
// with a regular expression. 
$width = 17;
$pattern = sprintf('~^.{0,%d}(?<=\S)(?=\s)|^.{0,%1$d}(?=\s)~su', $width);
$r = preg_match($pattern, $range, $matches);
if (FALSE === $r)
{
    throw new Exception('Wordcut regex failed.');
}
if (!$r)
{
    throw new Exception(sprintf('Text "%s" is not cut-able (should not happen).', $range));
}

此正则表达式查找在$range可用的文本表示形式中剪切内容的偏移量.正则表达式模式为inspired by another answer,对其进行了更详细的讨论,并进行了略微修改以适应此答案的需求.

// chop-off the textnodes to make a cut in DOM possible
$range->split($matches[0]);
$nodes = $range->getNodes();
$cutPosition = end($nodes);

由于可能没有什么要剪裁的(例如,主体将变空),我需要处理这种特殊情况.否则,如注释中所述,需要删除以下所有节点:

// obtain list of elements to remove with xpath
if (FALSE === $cutPosition)
{
    // if there is no node, delete all parent children
    $cutPosition = $parent;
    $xpath = 'child::node()';
}
else
{
    $xpath = 'following::node()';
}

其余的很简单:查询xpath,删除节点并输出结果:

// execute xpath
$xp = new DOMXPath($dom);
$remove = $xp->query($xpath, $cutPosition);
if (!$remove)
{
    throw new Exception('XPath query failed to obtain elements to remove');
}

// remove nodes
foreach($remove as $node)
{
    $node->parentNode->removeChild($node);
}

// inner HTML (PHP >= 5.3.6)
foreach($parent->childNodes as $node)
{
    echo $dom->saveHTML($node);
}

完整的代码示例为available on viper codepad incl. TextRange类.键盘有一个错误,因此结果不正确(相关:XPath query result order).实际输出如下:

<p>
        <span class="Underline"><span class="Bold">Test to</span></span></p>

因此,请确保您具有当前的libxml版本(通常是这种情况),并且最后的foreach输出使用了PHP函数saveHTML,该函数自PHP 5.3.6起就可以使用该参数.如果您没有该PHP版本,请采用How to get the xml content of a node as a string?中概述的替代方法或类似问题.

当您仔细查看我的示例代码时,您可能会注意到剪切长度非常大($width = 17;).那是因为文本前面有很多空白字符.可以通过使正则表达式在其前面删除任意数量的空格和/或通过首先修剪TextRange来进行调整.第二个选项确实需要更多功能,我写了一些快速的东西,可以在创建初始范围后使用:

...
$range = new TextRange($parent);
$trimmer = new TextRangeTrimmer($range);
$trimmer->trim();
...

这样可以删除HTML片段内左右两侧不必要的空格. TextRangeTrimmer代码如下:

class TextRangeTrimmer
{
    /**
     * @var TextRange
     */
    private $range;

    /**
     * @var array
     */
    private $charlist;

    public function __construct(TextRange $range, Array $charlist = NULL)
    {
        $this->range = $range;
        $this->setCharlist($charlist);      
    }
    /**
     * @param array $charlist list of UTF-8 encoded characters
     * @throws InvalidArgumentException
     */
    public function setCharlist(Array $charlist = NULL)
    {
         if (NULL === $charlist)
            $charlist = str_split(" \t\n\r\0\x0B")
        ;

        $list = array();

        foreach($charlist as $char)
        {
            if (!is_string($char))
            {
                throw new InvalidArgumentException('Not an Array of strings.');
            }
            if (strlen($char))
            {
                $list[] = $char; 
            }
        }

        $this->charlist = array_flip($list);
    }
    /**
     * @return array characters
     */
    public function getCharlist()
    {
        return array_keys($this->charlist);
    }
    public function trim()
    {
        if (!$this->charlist) return;
        $this->ltrim();
        $this->rtrim();
    }
    /**
     * number of consecutive charcters of $charlist from $start to $direction
     * 
     * @param array $charlist
     * @param int $start offset
     * @param int $direction 1: forward, -1: backward
     * @throws InvalidArgumentException
     */
    private function lengthOfCharacterSequence(Array $charlist, $start, $direction = 1)
    {
        $start = (int) $start;              
        $direction = max(-1, min(1, $direction));
        if (!$direction) throw new InvalidArgumentException('Direction must be 1 or -1.');

        $count = 0;
        for(;$char = $this->range->getCharacter($start), $char !== ''; $start += $direction, $count++)
            if (!isset($charlist[$char])) break;

        return $count;
    }
    public function ltrim()
    {
        $count = $this->lengthOfCharacterSequence($this->charlist, 0);

        if ($count)
        {
            $remainder = $this->range->split($count);
            foreach($this->range->getNodes() as $textNode)
            {
                $textNode->parentNode->removeChild($textNode);
            }
            $this->range->setNodes($remainder->getNodes());
        }

    }
    public function rtrim()
    {
        $count = $this->lengthOfCharacterSequence($this->charlist, -1, -1);

        if ($count)
        {
            $chop = $this->range->split(-$count);
            foreach($chop->getNodes() as $textNode)
            {
                $textNode->parentNode->removeChild($textNode);
            }
        }
    }
}

希望这会有所帮助.

标签:word-wrap,html-parsing,html,php
来源: https://codeday.me/bug/20191202/2084944.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有