国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

上傳包含 HTML 頁面中的 URL 的 CSV 文件,并使用 Flask 讀取要抓取的 URL
P粉799885311
P粉799885311 2023-09-07 11:22:35
0
1
772

我目前需要制作一個基于網(wǎng)絡的系統(tǒng),可以上傳包含 URL 列表的 CSV 文件。上傳后,系統(tǒng)將逐行讀取 URL,并將用于下一步抓取。這里,抓取需要先登錄網(wǎng)站再抓取。我已經(jīng)有了登錄網(wǎng)站的源代碼。但是,問題是我想將名為“upload_page.html”的html頁面與名為“upload_csv.py”的燒瓶文件連接起來。登錄和抓取的源代碼應該放在flask文件中的哪里?

upload_page.html

<div class="upload">
            <h2>Upload a CSV file</h2>
                <form action="/upload" method="post" enctype="multipart/form-data">
                 <input type="file" name="file" accept=".csv">
                 <br>
                 <br>
                 <button type="submit">Upload</button>
                </form>
</div>

upload_csv.py

from flask import Flask, request, render_template
import pandas as pd
import requests
from bs4 import BeautifulSoup
import csv
import json
import time
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('upload_page.html')

#Code for Login to the website


@app.route('/upload', methods=['POST'])
def upload():
    # Read the uploaded file
    csv_file = request.files['file']
    # Load the CSV data into a DataFrameSS
    df = pd.read_csv(csv_file)
    final_data = []
    # Loop over the rows in the DataFrame and scrape each link
    for index, row in df.iterrows():
        link = row['Link']
        response = requests.get(link)
        soup = BeautifulSoup(response.content, 'html.parser')
        start = time.time()
        # will be used in the while loop
        initialScroll = 0
        finalScroll = 1000

        while True:
            driver.execute_script(f"window.scrollTo({initialScroll},{finalScroll})")
            # this command scrolls the window starting from the pixel value stored in the initialScroll
            # variable to the pixel value stored at the finalScroll variable
            initialScroll = finalScroll
            finalScroll += 1000

            # we will stop the script for 3 seconds so that the data can load
            time.sleep(2)
            end = time.time()
            # We will scroll for 20 seconds.
            if round(end - start) > 20:
                break

        src = driver.page_source
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        # print(soup.prettify())

        #Code to do scrape the website

    return render_template('index.html', message='Scraped all data')


if __name__ == '__main__':
    app.run(debug=True)

我的登錄和抓取代碼是否位于正確的位置?但是,編碼不起作用,在我單擊上傳按鈕后,它沒有被處理

P粉799885311
P粉799885311

全部回復(1)
P粉207969787

雷雷

最新下載
更多>
網(wǎng)站特效
網(wǎng)站源碼
網(wǎng)站素材
前端模板