Have you read this PDF?

A friend of mine came to me with an interesting situation.

He was a teacher at a school where each student’s email address was their student number plus the school’s server name. If you were student number 121, your e-mail address was 121@theSchool.edu.

You get the idea.

The report cards were in PDF form, and they wanted to mail off the report cards to the students. How were they planning on doing this?

  1. Gather a group of teachers in a conference room with their computers
  2. Give them a thumb drive full of report cards as PDFs
  3. Have the teacher open the PDF file
  4. COPY the student number
  5. Create an email, using the COPIED student number (PASTE and type “@theSchool.edu”)
  6. Copy from ANOTHER text source the message, “Dear Student, yakkity yak yak. Here is your report card. Read ’em and weep. Love, the School”
  7. SEND email
  8. Rinse, lather and repeat
Wearing a mask and a cape, and armed with mad Ruby skillz, you can save the world (or at least a roomful of teachers).

This looks like a job for RUBY SUPERHERO!

Part One: Reading PDFs.

As with most tough things in Ruby, there is a GEM for that. To parse PDFs, you need to install the PDF Reader Gem, which is located at:

https://github.com/yob/pdf-reader

Using this Gem, we can read the PDF and put it into a string format and then extract the data we need.

After installing the GEM, your script will start with

#!/usr/bin/env ruby
require 'pdf-reader'

Part Two: Counting the (Report) Cards

Since we have a folder full of PDFs, we need to read the folder and get a count of the PDF files in the folder.


theCount = Dir.glob('*.pdf').count
puts "there are #{theCount} files to e-mail"

I like to toss in friendly puts messages so when things go horribly wrong, you get some indication of where it broke.

Part Three: Iterate thru the folder


Dir.glob('*.pdf') do |rb_pdf|

# lots of Ruby Magic! 

end

This nifty line says, “Read the directory, looking ONLY at files that end in .pdf, and call each element ‘rb_pdf’. As you go thru the elements one by one, do stuff to them.”

The stuff part is coming.
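Parts Two and Three can be tried out together without touching any real report cards. The sketch below is only an illustration: the `pdf_files` helper and the throwaway folder are assumptions made for the demo, not part of the original script. It counts the PDFs and then walks through them one by one.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: return the PDF paths in a folder, sorted so every
# run visits the files in the same order.
def pdf_files(folder)
  Dir.glob(File.join(folder, '*.pdf')).sort
end

# Demo in a temporary folder so the sketch runs anywhere.
Dir.mktmpdir do |folder|
  # Two fake report cards and one file that should be ignored.
  %w[a.pdf b.pdf notes.txt].each do |name|
    FileUtils.touch(File.join(folder, name))
  end

  files = pdf_files(folder)
  puts "there are #{files.count} files to e-mail"

  files.each do |rb_pdf|
    puts "would process #{File.basename(rb_pdf)}"
  end
end
```

Globbing once and reusing the list keeps the count and the loop in agreement, even if files are added to the folder while the script runs.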

Part Four: Read the file!


reader = PDF::Reader.new(rb_pdf)

This is where the GEM pdf-reader comes in. It will read the file element “rb_pdf” and create an object called “reader”.

Object “reader” has a number of elements, but we are most interested in the text of the object.


# each page has its own text, so join all the pages into one string
longString = reader.pages.map(&:text).join("\n")

Now that the ENTIRE report card is a string, we just need to find the student number in that string data so we can generate the email address.
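In pdf-reader, text is read one page at a time: the reader object has a `pages` list, and each page has a `text` method. A report card that spills onto a second page therefore needs its pages joined. Here is a minimal sketch of that step. The `full_text` helper and the stand-in structs are assumptions for illustration, so the example runs without a real PDF or the gem installed.

```ruby
# Hypothetical helper: join the text of every page into one string.
# `reader` is expected to be a PDF::Reader instance, but anything with a
# `pages` list whose items respond to `text` will work.
def full_text(reader)
  reader.pages.map(&:text).join("\n")
end

# Stand-in objects (not part of pdf-reader) so the sketch runs on its own.
FakePage   = Struct.new(:text)
FakeReader = Struct.new(:pages)

reader = FakeReader.new([
  FakePage.new('Report Card'),
  FakePage.new('Student 123456')
])

longString = full_text(reader)
puts longString
```

With the gem installed, the same helper would be called as `full_text(PDF::Reader.new(rb_pdf))`.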

But wait, how to find the number in that haystack of data?

Part Five: Oh yeah, it’s REGEX time!

Regular expressions, or regex, are a great way to find almost anything in an ocean of data, but they have a brutal learning curve and are very unforgiving to newbies. There are lots of great resources and tools online to help you with regex, and I suggest you use them.

I know the student number is the only six-digit number in the PDF, so I’ll look for that.


studentNumber = longString[/\b\d{6}\b/] 
#returns the six digit number for emailing
eMailTarget = studentNumber + '@theSchool.edu'
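
The lookup above leans on the student number being the only six-digit number in the text. If a card has no such number, `studentNumber` is nil and the `+` fails. The sketch below is one cautious way to check that assumption before mailing anything. The `email_for` helper and the `SCHOOL_DOMAIN` constant are made up for this example.

```ruby
SCHOOL_DOMAIN = 'theSchool.edu'

# Hypothetical helper: build the e-mail address from the report card text.
# Returns nil unless exactly one distinct six-digit number is found, so a
# bad card can be skipped instead of mailed to the wrong student.
def email_for(text, domain = SCHOOL_DOMAIN)
  numbers = text.scan(/\b\d{6}\b/).uniq
  return nil unless numbers.length == 1

  "#{numbers.first}@#{domain}"
end

puts email_for('Student No: 123456  Grade: A').inspect
puts email_for('No number on this card').inspect
puts email_for('IDs 123456 and 654321').inspect
```

The `\b` word boundaries keep a longer run of digits, such as a seven-digit phone number, from being read as a student number.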

Part Six: Mailing ain’t easy
So, you test the code and it all works great… but how to mail?

This was the hardest part, and your mileage may vary depending on the mail server configuration where you are.

You might need to install the mail Gem, depending on where you are.


require 'mail'  # ruby mail library. https://github.com/mikel/mail
require 'openssl' #sometimes, Outlook just makes you crazy...
#Sending via Outlook

    Mail.defaults do
      delivery_method :smtp, { 
                               :address              => 'mail.theSchool.edu',
                               :port                 => 587,
                               :domain               => 'theSchool.edu',
                               :user_name            => 'theSchool/poorTeacher',
                               :password             => 'summerVacation',
                               :authentication       => :login,
                               :enable_starttls_auto => true,
                               :openssl_verify_mode  => OpenSSL::SSL::VERIFY_NONE
                               }
 
    end
                          
    # send test message
    Mail.deliver do
 
        from    'poorTeacher@sts.theSchool.edu'
        to      eMailTarget
        subject 'Report Card'
        body    'Congratulations on getting a report card'
        add_file rb_pdf  # attach the PDF by path so its contents are read in
    end
    
    puts "mailed to " + eMailTarget  
    # end of mailer part
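
Since mail servers misbehave in their own special ways, it can help to rehearse the whole batch before sending anything for real. The sketch below assumes a hypothetical `mail_all` helper and a `deliver` callable standing in for the `Mail.deliver` step. Neither comes from the mail Gem. A failure on one card is recorded, and the loop moves on to the next.

```ruby
# Hypothetical helper: try every [address, file] pair and collect the results.
# `deliver` is any callable taking (address, file); in the real script it
# would wrap the Mail.deliver block.
def mail_all(targets, deliver)
  report = { sent: [], failed: [] }

  targets.each do |address, file|
    begin
      deliver.call(address, file)
      report[:sent] << address
      puts "mailed to #{address}"
    rescue StandardError => e
      report[:failed] << address
      puts "FAILED for #{address}: #{e.message}"
    end
  end

  report
end

# Dry run: a fake sender that rejects one address, so no mail server is needed.
fake_deliver = lambda do |address, _file|
  raise 'mailbox unavailable' if address.start_with?('000000')
end

targets = [
  ['123456@theSchool.edu', 'a.pdf'],
  ['000000@theSchool.edu', 'b.pdf']
]

report = mail_all(targets, fake_deliver)
puts "#{report[:sent].count} sent, #{report[:failed].count} failed"
```

Passing the sender in as a callable means the same loop serves both the rehearsal and the real run.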

Summary
So what have we learned?

  • If you are doing the same thing 10 or 20 times over, it means a script should be doing it.
  • You can read PDFs using a Ruby Gem.
  • Regex is wicked powerful and can be wicked hard to figure out.
  • Outlook can drive you crazy if you are trying to automate something.

Author: heinrich
