
Notes on using LiveCapture and InMemCapture in Pyshark

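  While using Pyshark (version 0.4.2.2) recently, I ran into quite a few limitations and some odd problems:
  1. Working around the TSharkCrashException thrown by sniff_continuously
  2. Parsing raw data into a packet (raw -> packet)
  3. InMemCapture() created with only_summaries fails when parse_packet is called more than once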


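Articles I referred to while using Pyshark

Getting started with PyShark (1): introduction
Getting started with PyShark (2): the FileCapture and LiveCapture modules
Getting started with PyShark (3): the capture object
Analyzing pcap files with Pyshark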
D.2. tshark: Terminal-based Wireshark


Working around the TSharkCrashException thrown by sniff_continuously

  The code here runs inside a thread class that inherits from QThread.

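import pyshark

# self.interface: network interface to capture on
# self.number: number of packets to capture
cap = pyshark.LiveCapture(interface=self.interface)
# Fetch up to self.number packets one at a time
for pkt in cap.sniff_continuously(packet_count=self.number):
    # When stop becomes True, leave the loop so the thread ends and capturing stops
    if self.stop:
        break
    else:
        print(pkt)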

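  Whether the sniff_continuously iterator runs to completion or the loop is left with break, an error is raised, and after several such errors the program crashes.
  This looks like a bug in Pyshark, and someone has already reported it in an issue. The comments in the sniff_continuously source point to apply_on_packets with a callback, which is invoked once per captured packet. That gives the same behavior, avoids both the error and the crash, and is slightly faster.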

import pyshark
from pyshark.capture.capture import StopCapture

cap = pyshark.LiveCapture(interface=self.interface)
cap.apply_on_packets(self.packet_captured, packet_count=self.number)
...
...
def packet_captured(self, pkt):
    if self.stop:
        # apply_on_packets -> packets_from_tshark -> try..except StopCapture:
        # Raising StopCapture ends the capture and lets the thread finish;
        # calling exit, quit, or terminate leaves it stuck in the loop
        raise StopCapture()
    else:
        print(pkt)

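  Note that the thread cannot be stopped with exit, quit, terminate, or similar methods; it has to be stopped by raising StopCapture(), the custom exception defined in Pyshark.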
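  The control flow behind this can be shown without tshark. The sketch below is a toy model, not Pyshark's code: StopCapture and apply_on_packets here are local stand-ins written for illustration, and Worker is a hypothetical class playing the role of the QThread subclass. It only shows why raising the exception from the callback ends the loop cleanly.

```python
class StopCapture(Exception):
    """Local stand-in for Pyshark's StopCapture exception."""


def apply_on_packets(packets, callback):
    """Toy capture loop: call callback once per packet and end
    quietly as soon as the callback raises StopCapture."""
    handled = 0
    try:
        for pkt in packets:
            callback(pkt)
            handled += 1
    except StopCapture:
        pass  # the loop treats this as a normal request to stop
    return handled


class Worker:
    """Hypothetical worker holding the stop flag and the callback."""

    def __init__(self):
        self.stop = False
        self.seen = []

    def packet_captured(self, pkt):
        if self.stop:
            raise StopCapture()
        self.seen.append(pkt)
        if len(self.seen) == 3:
            # Simulate the GUI asking the thread to stop.
            self.stop = True


worker = Worker()
handled = apply_on_packets(range(10), worker.packet_captured)
print(worker.seen, handled)
```

  The flag is only checked when the next packet arrives, so on a quiet interface the capture keeps waiting until one more packet is seen.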


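Parsing raw data into a packet (raw -> packet)

  This program also has a PyQt5 interface and tries to imitate Wireshark (Pyshark is a Python wrapper around tshark, and tshark is itself a component of Wireshark, so this is admittedly an odd thing to do...).
  The Wireshark main window has three panes, from top to bottom: the summary pane, the details pane, and the raw data pane. Getting the summary requires only_summaries=True; getting the raw data requires use_json=True, include_raw=True.
  Note: only_summaries cannot be True at the same time as use_json and include_raw, otherwise no packets are captured.

  My goal is to get both the detailed information and the raw data, but LiveCapture can only provide one of the two.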
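  One way to avoid mixing the incompatible flags is to keep the combinations in one place. This is a minimal sketch: capture_options and the view names are hypothetical and not part of Pyshark; only the keyword arguments themselves come from the notes above.

```python
def capture_options(view):
    """Return the keyword arguments for one pane of the UI.

    Each view gets its own set, so only_summaries is never
    combined with use_json / include_raw.
    """
    if view == "summary":
        return {"only_summaries": True}
    if view == "raw":
        return {"use_json": True, "include_raw": True}
    if view == "detail":
        return {}  # default dissection output
    raise ValueError(f"unknown view: {view!r}")


for view in ("summary", "detail", "raw"):
    print(view, capture_options(view))
```

  The returned dict would then be passed along as pyshark.LiveCapture(interface=..., **capture_options("raw")).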

# Detailed content
cap = pyshark.LiveCapture(interface='WLAN')
cap.sniff(packet_count=5)
print(cap[0])
-------------------------------
Layer ETH:
Destination: 88:35:68:24:e4:69
Address: 88:35:68:24:e4:69
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Source: 94:1d:17:58:4e:a6
Type: IPv4 (0x0800)
Address: 94:1d:17:58:4e:a6
.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Layer IP:
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
0000 00.. = Differentiated Services Codepoint: Default (0)
.... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
Total Length: 48
Identification: 0x14e5 (5349)
...
...

  With use_json=True, include_raw=True, LiveCapture returns the packet's raw data, but the information dissected by tshark is lost.

# Raw content
cap = pyshark.LiveCapture(interface='WLAN', use_json=True, include_raw=True)
cap.sniff(packet_count=5)
print(cap[0].frame_raw.value)
-------------------------------
88356824e469941d17584ea6080045000030165340008006569fc0a80066acd91fee29fb01bbe701826c000000007002faf065340000020405b401010402
-------------------------------
print(cap[0])
-------------------------------
Packet (Length: 62)
Layer ETH:
src_raw: 941d17584ea6
src_raw: 6
src_raw: 6
src_raw: 29
type: 0x00000800
src:
addr_resolved: 94:1d:17:58:4e:a6
src_resolved_raw: 941d17584ea6
src_resolved_raw: 6
src_resolved_raw: 6
src_resolved_raw: 26
addr_resolved_raw: 941d17584ea6
addr_resolved_raw: 6
addr_resolved_raw: 6
addr_resolved_raw: 26
ig_raw: 0
ig_raw: 6
ig_raw: 3
ig_raw: 65536
...
...

  The source comments show that InMemCapture can parse bytes into a packet, which gives raw -> packet, although doing this lowers runtime efficiency.

raw_to_pkt = pyshark.InMemCapture()
# raw -> bytes
frame_raw_value = cap[0].frame_raw.value
bytes_raw = bytes.fromhex(frame_raw_value)
# bytes -> packet
# parse_packet returns a packet object; to keep the code short, the return
# value is not stored here or in the later examples
raw_to_pkt.parse_packet(bytes_raw)
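  To make the raw -> bytes step concrete, here is a small sketch that uses only the standard library. It takes the frame_raw hex string printed earlier and decodes the 14-byte Ethernet header by hand. The helper format_mac is a name introduced for this example; the real dissection is still done by tshark.

```python
import struct

# Hex string copied from the frame_raw.value output shown earlier.
FRAME_HEX = (
    "88356824e469941d17584ea60800"
    "45000030165340008006569fc0a80066acd91fee"
    "29fb01bbe701826c000000007002faf065340000020405b401010402"
)


def format_mac(raw):
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


# raw (hex text) -> bytes, the same conversion used before parse_packet
frame = bytes.fromhex(FRAME_HEX)

# Ethernet header: destination MAC, source MAC, EtherType (big-endian)
dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

print(len(frame), format_mac(dst), format_mac(src), hex(ethertype))
```

  The length and the two addresses match the "Packet (Length: 62)" line and the ETH layer shown in the outputs above.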

  Note: creating or using another Capture inside a Capture's event loop is not allowed, for example:

import pyshark
from pyshark.capture.capture import StopCapture

raw_to_pkt = pyshark.InMemCapture()
cap = pyshark.LiveCapture(interface=self.interface, use_json=True, include_raw=True)
cap.apply_on_packets(self.packet_captured, packet_count=self.number)
...
...
def packet_captured(self, pkt):
    if self.stop:
        # apply_on_packets -> packets_from_tshark -> try..except StopCapture:
        # Raising StopCapture ends the capture and lets the thread finish;
        # calling exit, quit, or terminate leaves it stuck in the loop
        raise StopCapture()
    else:
        frame_raw_value = pkt.frame_raw.value
        bytes_raw = bytes.fromhex(frame_raw_value)
        # This raises RuntimeError: Cannot run the event loop while another loop is running
        raw_to_pkt.parse_packet(bytes_raw)

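  It does work from a separate thread, though: replace the raw_to_pkt.parse_packet(bytes_raw) call at the end of the callback with the following (which lowers efficiency even further..)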

import threading

# Run parse_packet in its own thread and wait for it to finish
t = threading.Thread(target=raw_to_pkt.parse_packet, args=(bytes_raw,))
t.start()
t.join()

  To get the thread's return value, an extra class is needed; see: Getting the return value of a Python thread (python获取多线程的返回值)
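  As an alternative to writing an extra Thread subclass, the standard library's concurrent.futures can hand back the return value directly. This is a swapped-in technique, not the approach from the referenced article. In the sketch below, fake_parse_packet is a stand-in for raw_to_pkt.parse_packet and parse_in_thread is a hypothetical helper name; whether this behaves the same with a real InMemCapture has not been checked here.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_parse_packet(raw):
    """Stand-in for raw_to_pkt.parse_packet; returns a dummy result."""
    return {"length": len(raw)}


def parse_in_thread(parse, raw):
    """Run parse(raw) in a worker thread and return its result.

    Equivalent to start() + join() on a Thread, except that the
    Future also carries the return value (or the exception).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(parse, raw).result()


result = parse_in_thread(fake_parse_packet, bytes.fromhex("0800"))
print(result)
```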
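  Note: the packet number and timestamp of a packet produced by InMemCapture are generated at the time InMemCapture is used, so they differ from the number and timestamp of the packet captured by LiveCapture.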


InMemCapture() created with only_summaries fails when parse_packet is called more than once

  The following raises an error:

cap = pyshark.LiveCapture(interface='WLAN',use_json=True,include_raw=True)
imc = pyshark.InMemCapture(only_summaries = True)

cap.sniff(packet_count=5)
# raw -> bytes
frame_raw_value = cap[0].frame_raw.value
bytes_raw = bytes.fromhex(frame_raw_value)
# bytes -> packet
print(imc.parse_packet(bytes_raw))
-------------------------------
1 0.0 192.168.1.222 192.168.1.233 TCP 62 14124 \xe2\x86\x92 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
-------------------------------
# Calling it a second time raises an error; the tshark process crashes
print(imc.parse_packet(bytes_raw))
-------------------------------
Traceback (most recent call last):
File "E:\Python3.6.1\lib\site-packages\pyshark\capture\inmem_capture.py", line 123, in _get_parsed_packet_from_tshark
DEFAULT_TIMEOUT)
File "E:\Python3.6.1\lib\asyncio\tasks.py", line 356, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "E:\Python3.6.1\lib\site-packages\pyshark\capture\inmem_capture.py", line 93, in parse_packet
return self.parse_packets([binary_packet])[0]
File "E:\Python3.6.1\lib\site-packages\pyshark\capture\inmem_capture.py", line 116, in parse_packets
self.eventloop.run_until_complete(self._get_parsed_packet_from_tshark(callback))
File "E:\Python3.6.1\lib\asyncio\base_events.py", line 466, in run_until_complete
return future.result()
File "E:\Python3.6.1\lib\site-packages\pyshark\capture\inmem_capture.py", line 126, in _get_parsed_packet_from_tshark
raise asyncio.TimeoutError("Timed out while waiting for tshark to parse packet. "
concurrent.futures._base.TimeoutError: Timed out while waiting for tshark to parse packet. Try rerunning with cap.set_debug() to see tshark errors. Closing tshark..

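  This is probably a bug, and I have opened an issue for it. If you really need this, a new InMemCapture object has to be created before every parse_packet call, as follows: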

for i in range(3):
    imc = pyshark.InMemCapture(only_summaries=True)
    print(imc.parse_packet(bytes_raw))
-------------------------------
1 0.0 192.168.1.222 192.168.1.233 TCP 62 14124 \xe2\x86\x92 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
1 0.0 192.168.1.222 192.168.1.233 TCP 62 14124 \xe2\x86\x92 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
1 0.0 192.168.1.222 192.168.1.233 TCP 62 14124 \xe2\x86\x92 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
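
  The workaround can be wrapped in a small helper so that callers cannot accidentally reuse a capture. This is only a sketch: parse_with_fresh_capture is a hypothetical name, the factory is passed in so the example runs without tshark, and FakeCapture is a stand-in. With Pyshark the factory would presumably be something like lambda: pyshark.InMemCapture(only_summaries=True), assuming the capture object's close() method is available in the installed version.

```python
def parse_with_fresh_capture(raw, capture_factory):
    """Create a new capture for this one call, parse, then close it."""
    capture = capture_factory()
    try:
        return capture.parse_packet(raw)
    finally:
        capture.close()  # release the capture even if parsing failed


class FakeCapture:
    """Stand-in for InMemCapture so the sketch runs on its own."""

    created = 0

    def __init__(self):
        FakeCapture.created += 1
        self.closed = False

    def parse_packet(self, raw):
        return len(raw)

    def close(self):
        self.closed = True


lengths = [parse_with_fresh_capture(b"\x00" * 62, FakeCapture) for _ in range(3)]
print(lengths, FakeCapture.created)
```

  Creating a capture per packet starts a new tshark process each time, so this trades speed for reliability, in line with the efficiency caveats above.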