java – 使用jsoup从带有可变页面数据的“form”类中提取文本

首先发布在这里,所以我会尽力保持这一点.我一直在使用Jsoup从一系列网页中提取数据以引入一个优秀的应用程序.我遇到了一个页面,它根据下拉框中的用户选择动态更新数据.当我在Chrome中检查html时,我可以看到数据,但我似乎无法提取它.我可以提取它周围的所有文本元素,但动态生成的任何内容都不会出来.

我正在看的页面有下面的表格类,为包装道歉,我无法摆脱它.

<form class="variations_form cart" method="post" enctype="multipart/form-data" data-product_id="8044" data-product_variations="[{&quot;variation_id&quot;:8047,&quot;variation_is_visible&quot;:true,&quot;variation_is_active&quot;:true,&quot;is_purchasable&quot;:true,&quot;display_price&quot;:19.70,&quot;display_regular_price&quot;:19.70,&quot;attributes&quot;:{&quot;attribute_size&quot;:&quot;500g&quot;},&quot;image_src&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-475x652.png&quot;,&quot;image_link&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann.png&quot;,&quot;image_title&quot;:&quot;LABELS_500g-FOOD Vann&quot;,&quot;image_alt&quot;:&quot;&quot;,&quot;image_srcset&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/08\/LABELS_500g-FOOD-Vann.png 1063w&quot;,&quot;image_sizes&quot;:&quot;(max-width: 475px) 100vw, 475px&quot;,&quot;price_html&quot;:&quot;<span class=\&quot;price\&quot;><span class=\&quot;amount\&quot;>$19.70<\/span><\/span>&quot;,&quot;availability_html&quot;:&quot;&quot;,&quot;sku&quot;:&quot;FOOD-Vanilla-500&quot;,&quot;weight&quot;:&quot;.5 kg&quot;,&quot;dimensions&quot;:&quot;&quot;,&quot;min_qty&quot;:1,&quot;max_qty&quot;:&quot;&quot;,&quot;backorders_allowed&quot;:false,&quot;is_in_stock&quot;:true,&quot;is_downloadable&quot;:false,&quot;is_virtual&quot;:false,&quot;is_sold_individually&quot;:&quot;no&quot;,&quot;variation_description&quot;:&quot;<p>500g<\/p>\n&quot;},{&quot;variation_id&quot;:8045,&quot;variation_is_visible&quot;:true,&quot;variation_is_active&quot;:true,&quot;is_purchasable&quot;:true,&quot;display_price&quot;:13.50,&quot;display_regular_price&quot;:13.50,&quot;attributes&quot;:{&quot;attribute_size&quot;:&quot;1kg&quot;},&quot;image_src&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-475x652.png&quot;,&quot;image_link&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van.png&quot;,&quot;image_title&quot;:&quot;LABELS_1kg-FOOD Van&quot;,&quot;image_alt&quot;:&quot;&quot;,&quot;image_srcset&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_1kg-FOOD-Van.png 1063w&quot;,&quot;image_sizes&quot;:&quot;(max-width: 475px) 100vw, 475px&quot;,&quot;price_html&quot;:&quot;<span class=\&quot;price\&quot;><span class=\&quot;amount\&quot;>$13.50<\/span><\/span>&quot;,&quot;availability_html&quot;:&quot;&quot;,&quot;sku&quot;:&quot;FOOD-Vanilla-1kg&quot;,&quot;weight&quot;:&quot;1 kg&quot;,&quot;dimensions&quot;:&quot;&quot;,&quot;min_qty&quot;:1,&quot;max_qty&quot;:&quot;&quot;,&quot;backorders_allowed&quot;:false,&quot;is_in_stock&quot;:true,&quot;is_downloadable&quot;:false,&quot;is_virtual&quot;:false,&quot;is_sold_individually&quot;:&quot;no&quot;,&quot;variation_description&quot;:&quot;<p>1kg<\/p>\n&quot;},{&quot;variation_id&quot;:8046,&quot;variation_is_visible&quot;:true,&quot;variation_is_active&quot;:true,&quot;is_purchasable&quot;:true,&quot;display_price&quot;:199.95,&quot;display_regular_price&quot;:199.95,&quot;attributes&quot;:{&quot;attribute_size&quot;:&quot;3kg&quot;},&quot;image_src&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-475x652.png&quot;,&quot;image_link&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van.png&quot;,&quot;image_title&quot;:&quot;LABELS_3kg-FOOD Van&quot;,&quot;image_alt&quot;:&quot;&quot;,&quot;image_srcset&quot;:&quot;http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-746x1024.png 746w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van-475x652.png 475w, http:\/\/www.sourcewebsite.com\/wp-content\/uploads\/2014\/09\/LABELS_3kg-FOOD-Van.png 1063w&quot;,&quot;image_sizes&quot;:&quot;(max-width: 475px) 100vw, 475px&quot;,&quot;price_html&quot;:&quot;<span class=\&quot;price\&quot;><span class=\&quot;amount\&quot;>$199.95<\/span><\/span>&quot;,&quot;availability_html&quot;:&quot;&quot;,&quot;sku&quot;:&quot;FOOD-Vanilla-3kg&quot;,&quot;weight&quot;:&quot;3 kg&quot;,&quot;dimensions&quot;:&quot;&quot;,&quot;min_qty&quot;:1,&quot;max_qty&quot;:&quot;&quot;,&quot;backorders_allowed&quot;:false,&quot;is_in_stock&quot;:true,&quot;is_downloadable&quot;:false,&quot;is_virtual&quot;:false,&quot;is_sold_individually&quot;:&quot;no&quot;,&quot;variation_description&quot;:&quot;<p>3kg<\/p>\n&quot;}]">

  <table class="variations" cellspacing="0">
    <tbody>
      <tr>
        <td class="label">
          <label for="size">Size</label>
        </td>
        <td class="value">
          <select id="size" class="" name="attribute_size" data-attribute_name="attribute_size">
            <option value="">Choose an option</option>
            <option value="500g">500g</option>
            <option value="1kg" selected="selected">1kg</option>
            <option value="3kg">3kg</option>
          </select><a class="reset_variations" href="#" style="visibility: visible; display: block;">Clear selection</a>	
        </td>
      </tr>
    </tbody>
  </table>

  <div class="angelleye_buton_box_relative" style="position: relative;">

    <div class="single_variation_wrap">
      <div class="woocommerce-variation-description" style="border: 1px solid transparent;">
        <p>1kg</p>
      </div>
      <div class="single_variation"><span class="price"><span class="amount selectorgadget_selected">$13.50</span></span>
      </div>
      <div class="variations_button">
        <div class="quantity">
          <input type="number" step="1" name="quantity" value="1" title="Qty" class="input-text qty text" size="4" min="1">
        </div>
        <button type="submit" class="single_add_to_cart_button button alt">Add to basket</button>
        <input type="hidden" name="add-to-cart" value="8044">
        <input type="hidden" name="product_id" value="8044">
        <input type="hidden" name="variation_id" class="variation_id" value="8045">
      </div>
    </div>

    <div class="blockUI blockOverlay angelleyeOverlay" style="display:none;z-index: 1000; border: none; margin: 0px; padding: 0px; width: 100%; height: 100%; top: 0px; left: 0px; opacity: 0.6; cursor: default; position: absolute; background: url(http://www.sourcewebsite.com/wp-content/plugins/woocommerce/assets/images/select2-spinner.gif) 50% 50% / 16px 16px no-repeat rgb(255, 255, 255);"></div>
  </div>

</form>

我试图从下面的div中提取价格“13.50”.

<div class="single_variation"><span class="price"><span class="amount selectorgadget_selected">$13.50</span></span>
</div>

我的代码如下:

    private class ParseFoodPriceURL extends AsyncTask<String, Void, String> {

    @Override
    protected String doInBackground(String... strings) {
        StringBuffer buffer = new StringBuffer();
        try {
            Document doc = Jsoup.connect(strings[0]).get();
            Elements foodPrice = doc.select("div.single_variation_wrap > div.single_variation");
            String priceTextSelection = foodPrice.text();
            buffer.append("Price: $" + priceTextSelection);

        }
        catch (Throwable t) {
            t.printStackTrace();
        }
        return buffer.toString();
    }

最佳答案 JSoup不是浏览器,因此它不会解释和执行JavaScript.如果网站的内容是动态生成的,则无法直接使用JSoup.我想到了两个选择:

>直接识别AJAX调用并通过这些调用获取信息.通常,响应不是HTML而是JSON.所以你可能需要其他解析库.此选项很快,但您需要调查并了解网页的工作原理.
>将selenium webdriver与真实的浏览器引擎(例如phantomjs)一起使用.这将像真正的浏览器一样加载网站,但您可以访问类似于JSoup的内容.这相对容易编程,但速度慢并且使用了大量资源.如果你在android中运行,这可能太多了.无论如何,Android的正确工具似乎是Selenoid.

点赞