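1 Background
Sometimes we want a script to simulate browser requests for automation, reading elements from the page and performing actions on it directly. Python's selenium package is a good fit for this.
Selenium is an umbrella project for a range of tools and libraries that enable and support the automation of web browsers.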
2 Steps
CentOS 7 ships with Python 2.7 by default.
Install pip
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py
Install the selenium module
pip install selenium
# Show the installed version (3.141.0 here)
pip show selenium
Install Chrome
yum install https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm -y
# Show the installed version
google-chrome --version
Install the Chrome driver
Find the ChromeDriver release whose version matches the Chrome browser installed on the system:
https://chromedriver.storage.googleapis.com/index.html
For example, on CentOS 7:
cd /usr/local/bin/
wget https://chromedriver.storage.googleapis.com/90.0.4430.24/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
rm -rf chromedriver_linux64.zip
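The version check described above can be sketched in Python. This is a minimal sketch, assuming ChromeDriver's usual convention that the first three version components (major.minor.build) should agree with Chrome's. The helper names `build_prefix`, `versions_match` and `read_version` are hypothetical, and the two sample strings are illustrative rather than captured from a real machine.

```python
import re
import subprocess


def build_prefix(version_output):
    """Return 'major.minor.build' from a --version string, or None if absent."""
    match = re.search(r"(\d+\.\d+\.\d+)\.\d+", version_output)
    return match.group(1) if match else None


def versions_match(chrome_output, driver_output):
    """True when both strings carry the same major.minor.build prefix."""
    chrome_prefix = build_prefix(chrome_output)
    return chrome_prefix is not None and chrome_prefix == build_prefix(driver_output)


def read_version(command):
    """Run e.g. ['google-chrome', '--version'] and return its output as text."""
    return subprocess.check_output(command).decode("utf-8", "replace").strip()


# Illustrative strings; on a real host they would come from read_version().
sample_chrome = "Google Chrome 90.0.4430.93"
sample_driver = "ChromeDriver 90.0.4430.24 (sample)"
print(versions_match(sample_chrome, sample_driver))
```

On the server itself, `read_version(['google-chrome', '--version'])` and `read_version(['chromedriver', '--version'])` would supply the two strings to compare.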
The Python script is as follows:
from selenium import webdriver

opt = webdriver.ChromeOptions()
# Chrome only treats arguments that start with "--" as switches
opt.add_argument('--lang=zh-CN')
opt.add_argument('--user-agent=Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; BLA-AL00 Build/HUAWEIBLA-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36')
# Run without a display, as needed on a server
opt.add_argument('--headless')
opt.add_argument('--no-sandbox')
opt.add_argument('--disable-dev-shm-usage')
opt.add_argument('--disable-gpu')
# "options" replaces the deprecated "chrome_options" keyword
driver = webdriver.Chrome(options=opt)
driver.get('https://www.baidu.com')
print(driver.current_window_handle)
print(driver.page_source)
# quit() ends the session and stops chromedriver; close() only closes the current window
driver.quit()
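If the page load raises an exception, the script above never reaches its last line and the chromedriver process can be left running. One possible way to guard against that is a try/finally wrapper. This is a minimal sketch, not the only way to structure it: the function names `chrome_arguments` and `fetch_page_source` are hypothetical, and it assumes a selenium version that accepts the `options` keyword.

```python
def chrome_arguments(headless=True, user_agent=None):
    """Build the list of Chrome command-line switches for a server without a display."""
    args = ["--no-sandbox", "--disable-dev-shm-usage", "--disable-gpu"]
    if headless:
        args.insert(0, "--headless")
    if user_agent:
        args.append("--user-agent=" + user_agent)
    return args


def fetch_page_source(url, headless=True, user_agent=None):
    """Open url in Chrome and return the rendered HTML."""
    # Imported here so chrome_arguments() can be used without selenium installed.
    from selenium import webdriver

    opt = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless, user_agent):
        opt.add_argument(arg)

    driver = webdriver.Chrome(options=opt)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs even when get() raises, so no chromedriver process is left behind.
        driver.quit()
```

A call such as `fetch_page_source('https://www.baidu.com')` would then return the same HTML that the script prints.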
3 Common methods
See the official documentation:
https://www.selenium.dev/documentation/zh-cn/getting_started/
https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.chrome.service
Commonly used methods:
# <div id="elementid">xxxxxxx</div>
element = driver.find_element_by_id('elementid')
# <div id="elementid">xxxxxxx</div>
element = driver.find_element_by_tag_name('div')
# <div name="elementname">xxxxxxx</div>
element = driver.find_element_by_name('elementname')
# <a href="http://www.google.com/search?q=cheese">linktext</a>
element = driver.find_element_by_link_text('linktext')
# <a href="http://www.google.com/search?q=cheese">linktext xxxx 111</a>
element = driver.find_element_by_partial_link_text('linktext')
# <div id="food">xxxxxxx</div>
element = driver.find_element_by_css_selector('#food')
# Event: move to the element, then press the left button in the middle of it (without releasing).
webdriver.ActionChains(driver).click_and_hold(searchBtn).perform()
# Event: move the mouse to the element, then perform a context click (right click) on it.
webdriver.ActionChains(driver).context_click(searchBtn).perform()
# Event: double click.
webdriver.ActionChains(driver).double_click(searchBtn).perform()
# Event: move the mouse to the middle of the element. The element is also scrolled into view.
webdriver.ActionChains(driver).move_to_element(gmailLink).perform()
# Event: move the mouse by the given offset from its current position (or from 0,0).
# If the coordinates fall outside the view window, the mouse ends up outside the browser window.
# Set the x and y offsets
xOffset = 100
yOffset = 100
# Move the mouse by that offset
webdriver.ActionChains(driver).move_by_offset(xOffset, yOffset).perform()
# Event: click and hold on the source element, move to the target element, then release the mouse.
webdriver.ActionChains(driver).drag_and_drop(sourceEle, targetEle).perform()
# Event: click and hold on the source element, move by the given offset, then release the mouse.
webdriver.ActionChains(driver).drag_and_drop_by_offset(sourceEle, targetEleXOffset, targetEleYOffset).perform()
# Event: release the pressed left mouse button. If a WebElement is passed, the button is released on that element.
# Use together with webdriver.ActionChains(driver).click_and_hold(sourceEle).move_to_element(targetEle).perform()
webdriver.ActionChains(driver).release().perform()
# Add a cookie
# additional keys that can be passed in are:
# 'domain' -> String,
# 'secure' -> Boolean,
# 'expiry' -> Seconds since the Epoch at which it should expire.
driver.add_cookie({'name': 'key', 'value': 'value', 'path': '/'})
# Get a cookie by name
driver.get_cookie("foo")
# Delete a cookie by name
driver.delete_cookie("CookieName")
# Delete all cookies
driver.delete_all_cookies()
# Print all cookies (the parenthesized form works on Python 2 and 3)
for cookie in driver.get_cookies():
    print("%s -> %s" % (cookie['name'], cookie['value']))
# Working with <select> elements
#<select>
# <option value=value1>Bread</option>
# <option value=value2 selected>Milk</option>
# <option value=value3>Cheese</option>
#</select>
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
select_element = driver.find_element(By.ID, 'selectElementID')
select_object = Select(select_element)
# Select an <option> based upon the <select> element's internal index
select_object.select_by_index(1)
# Select an <option> based upon its value attribute
select_object.select_by_value('value1')
# Select an <option> based upon its text
select_object.select_by_visible_text('Bread')
# Return a list[WebElement] of options that have been selected
all_selected_options = select_object.all_selected_options
# Return a WebElement referencing the first selection option found by walking down the DOM
first_selected_option = select_object.first_selected_option
# Return a list[WebElement] of options that the <select> element contains
all_available_options = select_object.options
# Deselect an <option> based upon the <select> element's internal index
select_object.deselect_by_index(1)
# Deselect an <option> based upon its value attribute
select_object.deselect_by_value('value1')
# Deselect an <option> based upon its text
select_object.deselect_by_visible_text('Bread')
# Deselect all selected <option> elements
select_object.deselect_all()
# Whether the <select> allows multiple selections
does_this_allow_multiple_selections = select_object.is_multiple
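
The cookie calls listed earlier (get_cookies and add_cookie) can be combined to carry a session from one run of a script to the next. The following is a minimal sketch under stated assumptions: `save_cookies` and `load_cookies` are hypothetical helper names, the file path is up to the caller, and `driver` is any object that offers those two methods.

```python
import json


def save_cookies(driver, path):
    """Write every cookie of the current session to a JSON file."""
    with open(path, "w") as handle:
        json.dump(driver.get_cookies(), handle)


def load_cookies(driver, path):
    """Add the cookies stored in a JSON file to the session; return how many were added."""
    with open(path) as handle:
        cookies = json.load(handle)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Before calling `load_cookies`, the browser should already be on a page of the site the cookies belong to, because a driver generally rejects cookies for a domain other than the current one.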