Cleaning Up HTML With Simple Python

Feb 15, 2011 22:48 · Python

When you learn something new, you must practice it every day before you really understand it. I started working with Python the other day, so I am forcing myself to practice by writing scripts for various things that come up. Today’s task was cleaning up one of the pages on this website.

One of the pages in the menu at the top of this page is for my ConCall Numbers. It's a listing of the Orange Business Conferencing dial-in numbers for many countries around the world, along with the participant pass code. I use it for the classes I give, as well as for meetings I need to set up. But the page has been pretty ugly for a long time.

[Screenshot: the ConCall Numbers page before cleanup]

The text had come from an email I received that listed out the numbers. I simply copied and pasted it from the email into the page HTML editor. Along with it came dozens of &nbsp; codes on every line. Fixing it just wasn’t a priority. But it turns out that fixing it was very easy with just a little bit of Python.

The result is shown here. It's still a boring list of numbers, but as you can see, it's a lot nicer to look at.

[Screenshot: the ConCall Numbers page after cleanup]

The script itself is short. It opens a file, runs three regex substitutions, and writes the result to a new file. Once I had the new file, I manually added a starting and ending <table> tag and I was done.


For those who are curious, here is the complete script. I am sure there is a better way to write this, but this was quick and easy, and it worked.

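#!/usr/bin/python -tt

import re

def ReadFile(filename):
    # Read the whole page into one string.
    with open(filename, 'r') as f:
        text = f.read()
    # Collapse each run of &nbsp; codes into a cell boundary.
    text = re.sub(r'(&nbsp;\s?)+', '</td><td>', text)
    # Turn each line break into the end of one row and the start of the next.
    text = re.sub(r'<br />', '</td></tr>\n<tr><td>', text)
    # Drop the empty first cell left when a line began with &nbsp; codes.
    text = re.sub(r'<tr><td>\s?</td><td>', '<tr><td>', text)
    return text

def main():
    text = ReadFile('bad.html')
    # The output file name is an assumption; the original name is not shown.
    with open('good.html', 'w') as f:
        f.write(text)

if __name__ == '__main__':
    main()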
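
To see what the three substitutions do without touching any files, here is a small sketch that runs them on a made-up fragment and then adds the wrapping tags I put in by hand. The clean_html name and the sample numbers are invented for illustration, and the patterns assume the pasted text used &nbsp; codes for spacing and <br /> for line breaks. The sketch also adds the first and last row tags so the markup is well formed, which goes a little beyond what I did by hand.

```python
import re

def clean_html(text):
    """Apply the three substitutions to a string (hypothetical helper)."""
    # Runs of &nbsp; codes become a boundary between two table cells.
    text = re.sub(r'(&nbsp;\s?)+', '</td><td>', text)
    # Each line break closes the current row and opens the next one.
    text = re.sub(r'<br />', '</td></tr>\n<tr><td>', text)
    # A row that started with &nbsp; codes has an empty first cell; drop it.
    text = re.sub(r'<tr><td>\s?</td><td>', '<tr><td>', text)
    return text

# Made-up sample in the style of the pasted email (not real dial-in numbers).
sample = ('France&nbsp;&nbsp; &nbsp;01 23 45 67 89<br />'
          '&nbsp;&nbsp;Germany&nbsp; &nbsp;030 1234567')

# Wrap the rows in table tags, plus the first and last row tags.
table = '<table>\n<tr><td>' + clean_html(sample) + '</td></tr>\n</table>'
print(table)
```

Each line of the sample comes out as one table row with the country in the first cell and the number in the second.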

© Copyright 2017 Technovangelist
