DDR爱好者之家 Design By 杰米

为了训练深度学习模型,经常要整理大量的标注数据,需统一不同格式的标注数据,一般情况下习惯读取TXT格式的数据。但实际中经常遇到XML格式的标注数据,在此举例:1.读取XML标注数据;2.写入TXT文件。

XML标注数据如下

<annotation verified="no"> 
 <folder>suE</folder> 
 <filename>Drivingrecord_001</filename> 
 <path>C:\Desktop\Drivingrecord_001.jpg</path> 
 <source> 
  <database>Unknown</database> 
 </source> 
 <size> 
  <width>1920</width> 
  <height>1080</height> 
  <depth>3</depth> 
 </size> 
 <segmented>0</segmented> 
 <object> 
  <name>苏E*****-蓝-1-白,灰-大众-上海大众-桑塔纳-尚纳</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>170</leftTopx> 
   <leftTopy>704</leftTopy> 
   <rightTopx>167</rightTopx> 
   <rightTopy>729</rightTopy> 
   <rightBottomx>242</rightBottomx> 
   <rightBottomy>735</rightBottomy> 
   <leftBottomx>243</leftBottomx> 
   <leftBottomy>710</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>苏E*****-蓝-1-黄-雷克萨斯-雷克萨斯(进口)-雷克萨斯RX</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>733</leftTopx> 
   <leftTopy>721</leftTopy> 
   <rightTopx>733</rightTopx> 
   <rightTopy>759</rightTopy> 
   <rightBottomx>881</rightBottomx> 
   <rightBottomy>760</rightBottomy> 
   <leftBottomx>882</leftBottomx> 
   <leftBottomy>722</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>苏*****-蓝-1-黑-宝马-宝马(进口)-宝马7系</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>1274</leftTopx> 
 <leftTopy>657</leftTopy> 
   <rightTopx>1274</rightTopx> 
   <rightTopy>671</rightTopy> 
   <rightBottomx>1325</rightBottomx> 
   <rightBottomy>670</rightBottomy> 
   <leftBottomx>1326</leftBottomx> 
   <leftBottomy>656</leftBottomy> 
  </bndbox> 
 </object> 
 <object> 
  <name>苏*****-蓝-1-灰-标致-东风标致-标致307</name> 
  <flag>polygon</flag> 
  <pose>Unspecified</pose> 
  <truncated>0</truncated> 
  <difficult>0</difficult> 
  <bndbox> 
   <leftTopx>1609</leftTopx> 
   <leftTopy>658</leftTopy> 
   <rightTopx>1611</rightTopx> 
   <rightTopy>671</rightTopy> 
   <rightBottomx>1659</rightBottomx> 
   <rightBottomy>669</rightBottomy> 
   <leftBottomx>1657</leftBottomx> 
   <leftBottomy>656</leftBottomy> 
  </bndbox> 
 </object> 
</annotation> 

在此,我们只需要图片名filename,和每个object的坐标(四个点的坐标)

Drivingrecord_001.jpg 170 704 167 729 242 735 243 710 733 721 733 759 881 760 882 722 1274 657 1274 671 1325 670 1326 656 1609 658 1611 671 1659 669 1657 656  

利用xml.dom.*模块,文件对象模块DOM在读取XML文件时,一次读取整个文件,将其所有数据保存在一个树结构中,此时,可利用DOM的各种函数来读取目标数据。在此,利用xml.dom.minidom解析XML文件。

并将目标数据写入TXT文档。

# -*- coding: utf-8 -*- 
""" 
Created on Fri Mar 2 15:36:44 2018 
 
@author: gg 
""" 
 
import xml.dom.minidom 
import os 
 
save_dir = 'D:\plate_train'  
if not os.path.exists(save_dir): 
  os.mkdir(save_dir) 
f = open(os.path.join(save_dir, 'landmark.txt'), 'w') 
 
DOMTree = xml.dom.minidom.parse('D:\plate_train\label\Drivingrecord_001.xml') 
annotation = DOMTree.documentElement 
 
filename = annotation.getElementsByTagName("filename")[0] 
imgname = filename.childNodes[0].data+'.jpg' 
print(imgname) 
   
objects = annotation.getElementsByTagName("object") 
 
loc = [imgname] #文档保存格式:文件名 坐标 
 
for object in objects: 
  bbox = object.getElementsByTagName("bndbox")[0] 
  leftTopx = bbox.getElementsByTagName("leftTopx")[0] 
  lefttopx = leftTopx.childNodes[0].data 
  print(lefttopx) 
  leftTopy = bbox.getElementsByTagName("leftTopy")[0] 
  lefttopy = leftTopy.childNodes[0].data 
  print(lefttopy) 
  rightTopx = bbox.getElementsByTagName("rightTopx")[0] 
  righttopx = rightTopx.childNodes[0].data 
  print(righttopx) 
  rightTopy = bbox.getElementsByTagName("rightTopy")[0] 
  righttopy = rightTopy.childNodes[0].data 
  print(righttopy) 
  rightBottomx = bbox.getElementsByTagName("rightBottomx")[0] 
  rightbottomx = rightBottomx.childNodes[0].data 
  print(rightbottomx) 
  rightBottomy = bbox.getElementsByTagName("rightBottomy")[0] 
  rightbottomy = rightBottomy.childNodes[0].data 
  print(rightbottomy) 
  leftBottomx = bbox.getElementsByTagName("leftBottomx")[0] 
  leftbottomx = leftBottomx.childNodes[0].data 
  print(leftbottomx) 
  leftBottomy = bbox.getElementsByTagName("leftBottomy")[0] 
  leftbottomy = leftBottomy.childNodes[0].data  
  print(leftbottomy) 
   
  loc = loc + [lefttopx, lefttopy, righttopx, righttopy, rightbottomx, rightbottomy, leftbottomx, leftbottomy] 
   
for i in range(len(loc)): 
  f.write(str(loc[i])+' ') 
f.write('\t\n')   
f.close() 
   

以上这篇python代码xml转txt实例就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持。

DDR爱好者之家 Design By 杰米
广告合作:本站广告合作请联系QQ:858582 申请时备注:广告合作(否则不回)
免责声明:本站资源来自互联网收集,仅供用于学习和交流,请遵循相关法律法规,本站一切资源不代表本站立场,如有侵权、后门、不妥请联系本站删除!
DDR爱好者之家 Design By 杰米

P70系列延期,华为新旗舰将在下月发布

3月20日消息,近期博主@数码闲聊站 透露,原定三月份发布的华为新旗舰P70系列延期发布,预计4月份上市。

而博主@定焦数码 爆料,华为的P70系列在定位上已经超过了Mate60,成为了重要的旗舰系列之一。它肩负着重返影像领域顶尖的使命。那么这次P70会带来哪些令人惊艳的创新呢?

根据目前爆料的消息来看,华为P70系列将推出三个版本,其中P70和P70 Pro采用了三角形的摄像头模组设计,而P70 Art则采用了与上一代P60 Art相似的不规则形状设计。这样的外观是否好看见仁见智,但辨识度绝对拉满。