Some tax-time automation

I often struggle to find the right balance between automation and manual work. As it is tax time, and Chase bank only gives you 90 days of statements, I find myself every year going back through our statements to find any business expenses and do our overall financial review for the year. In the past I’ve played around with MS Money, Quicken, Mint and kept my own spreadsheets. Now, I just download the statements at the end of year and use acrobat to combine and ruby to massage the combined PDF into a spreadsheet.1

To do my analysis I need everything in a CSV format. After, getting one PDF, I end up looking at the structure of the document which looks like:

Earn points [truncated] and 1{aaa01f1184b23bc5204459599a780c2efd1a71f819cd2b338cab4b7a2f8e97d4} back per $1 spent on all other Visa Card purchases. 

Date of Transaction Merchant Name or Transaction Description $ Amount 
PAYMENTS AND OTHER CREDITS 
01/23 -865.63
AUTOMATIC PAYMENT - THANK YOU 

PURCHASES 
12/27  AMAZON MKTPLACE PMTS AMZN.COM/BILL WA  15.98
12/29  NEW JERSEY E-ZPASS 888-288-6865 NJ  25.00  
12/30  AMAZON MKTPLACE PMTS AMZN.COM/BILL WA  54.01  

0000001 FIS33339 C 2 000 Y 9 26 15/01/26 Page 1 of 2

I realize that I want all lines that have a number like MM/DD followed by some spaces and a bunch of text, followed by a decimal number and some spaces. In regular expression syntax, that looks like:

/^(\d{2}\/\d{2})\s+(.*)\s+(\d+\.\d+)\s+$/

which is literally just a way of describing to the computer where my data are.

Through using Ruby, I can easily get my expenses as CSV:

Boom. Hope this helps some of you who might otherwise be doing a lot of typing. Also, if you want to combine PDFs on the command line, you can use PDFtk thus:

pdftk file1.pdf file2.pdf cat output -

  1. The manual download takes about 10 minutes. When I get some time, I’m up for the challenge of automating this eventually with my own screen scraper and web automation using some awesome combination Ruby and Capybara. I also use PDFtk to combine PDF files. 

One thought on “Some tax-time automation

  1. this actually works better:

    require ‘csv’

    i = 0
    tot = 0

    CSV.open(“out.csv”, “wb”) do |csv|
    csv << [‘date’, ‘payee’, ‘amount’]
    File.open(‘data.txt’).each_line do |l|
    tot = tot + 1
    if l[/^(\d{2}\/\d{2})\s+(.*)\s+((?:[-,\d]+)?.\d+)\s+$/]
    #puts “‘#{$1.strip}’ and ‘#{$2.strip}’ and ‘#{$3.strip}'”
    csv << [$1.strip, $2.strip, $3.strip]
    i = i + 1
    else
    puts “not for #{l}”
    end
    end
    end

    puts ” ———- ”
    puts ” completed with #{i} of #{tot} —- or #{i/tot}”
    puts ” ———- “

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.