Removing HTML tags associated with a class

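I'm forcing myself to learn how to script using only AppleScript, but I'm currently stuck trying to remove a particular tag that has a class. I've tried to find solid documentation and examples, but at this point they seem very limited.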

Here is my HTML:

<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>

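What I want to do is remove the tags for one specific class, so it would strip <span class="foo"> and its closing tag while keeping the text inside, resulting in: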

<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>

I know how to do this with do shell script and the Terminal, but I'd like to learn what's available in the AppleScript dictionary.

While researching, I was able to find a way to strip all HTML tags:

on removeMarkupFromText(theText)
    set tagDetected to false
    set theCleanText to ""
    repeat with a from 1 to length of theText
        set theCurrentCharacter to character a of theText
        if theCurrentCharacter is "<" then
            set tagDetected to true
        else if theCurrentCharacter is ">" then
            set tagDetected to false
        else if tagDetected is false then
            set theCleanText to theCleanText & theCurrentCharacter as string
        end if
    end repeat
    return theCleanText
end removeMarkupFromText

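But this removes all HTML tags, which isn't what I want. Searching SO, I found "Parsing HTML source code using AppleScript", which shows how to extract the text between tags, but I'm not looking to parse a file.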

I'm familiar with BBEdit's Balance Tags, called Balance in the drop-down menu, but when I run:

tell application "BBEdit"
    activate
    find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
    balance tags
end tell

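it gets greedy and grabs the entire line from the first tag to the second-to-last closing tag, including the text in between, instead of isolating itself to the first tag.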

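Researching further in the dictionary under tag, I did come across find tag, which lets me do: set spanTarget to (find tag "span" start_offset counter), then target the tag's class with |class| of attributes of tag of spanTarget and use balance tags, but I still run into the same issue as before.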

So, in pure AppleScript, how can I remove a tag associated with a class without being greedy?

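Best answer

I believe Ron's answer is a good approach, but if you don't want to use regular expressions, you can do it with the code below. After seeing Ron's answer I wasn't going to post this, but I had already written it, so I figured I'd at least give you a second option since you're trying to learn.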

on run
    set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class=\"foo\">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class=\"bar\">Pig brisket</span> jowl ham pastrami <span class=\"foo\">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>" 
    set theHTML to removeTag(theHTML, "<span class=\"foo\">", "</span>")
end run

on removeTag(theText, startTag, endTag)
    if theText contains startTag then
        -- Split on the opening tag; item 2 starts right after its first occurrence.
        set AppleScript's text item delimiters to startTag
        set tempText to text items of (theText as string)
        set AppleScript's text item delimiters to {""}

        set middleText to item 2 of tempText as string
        if middleText contains endTag then
            -- Drop only the first closing tag that follows the opening tag.
            set AppleScript's text item delimiters to endTag
            set tempText2 to text items of (middleText as string)
            set AppleScript's text item delimiters to {""}
            set newString to implode(tempText2, endTag)
            set item 2 of tempText to newString
        end if
        -- Rejoin without the first opening tag, then handle any remaining ones.
        set newString to implode(tempText, startTag)
        return removeTag(newString, startTag, endTag) -- recursive
    else
        -- Base case: no opening tag left.
        return theText
    end if
end removeTag

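on implode(parts, tag)
    -- Join the first two parts with no separator, dropping one occurrence of tag.
    set AppleScript's text item delimiters to {""}
    set newString to items 1 thru 2 of parts as string
    if (count of parts) > 2 then
        -- Rejoin the remaining parts with tag between them, as a flat list.
        set newList to {newString} & (items 3 thru -1 of parts)
        set AppleScript's text item delimiters to tag
        set newString to (newList as string)
        set AppleScript's text item delimiters to {""}
    end if
    return newString
end implode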
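
If you'd rather avoid the recursion, the same idea can be sketched as a single loop. This is only an illustrative variant, not part of the code above: the handler name removeTagIteratively and its variable names are made up for this example, and it assumes that the first closing tag after each opening tag is the one that belongs to it (so it would not cope with spans nested inside the target span).

```applescript
on removeTagIteratively(theText, startTag, endTag)
    -- Remember the caller's delimiters so they can be restored afterwards.
    set savedDelimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to startTag
    set theParts to text items of (theText as string)
    set AppleScript's text item delimiters to savedDelimiters

    -- Text before the first opening tag is kept unchanged.
    set theResult to item 1 of theParts
    repeat with i from 2 to (count of theParts)
        set thePart to item i of theParts
        -- Each later part starts right after an opening tag,
        -- so remove only the first closing tag found in it.
        set closePos to offset of endTag in thePart
        if closePos > 0 then
            set beforeClose to ""
            if closePos > 1 then set beforeClose to text 1 thru (closePos - 1) of thePart
            set afterStart to closePos + (length of endTag)
            set afterClose to ""
            if afterStart <= (length of thePart) then set afterClose to text afterStart thru -1 of thePart
            set thePart to beforeClose & afterClose
        end if
        set theResult to theResult & thePart
    end repeat
    return theResult
end removeTagIteratively
```

It would be called the same way as removeTag, for example removeTagIteratively(theHTML, "<span class=\"foo\">", "</span>"). Because it only splits on the exact opening tag text, the <span class="bar"> element is left alone.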