{"id":48,"date":"2019-10-02T23:24:17","date_gmt":"2019-10-03T03:24:17","guid":{"rendered":"http:\/\/iheinrich.com\/?p=48"},"modified":"2019-10-03T11:47:14","modified_gmt":"2019-10-03T15:47:14","slug":"have-you-read-this-pdf","status":"publish","type":"post","link":"https:\/\/iheinrich.com\/index.php\/2019\/10\/02\/have-you-read-this-pdf\/","title":{"rendered":"Have you read this PDF?"},"content":{"rendered":"\n<p>A friend of mine came to me with an interesting situation.<\/p>\n\n\n\n<p>He was a teacher at a school where each student\u2019s email address was their <strong>student number<\/strong> + <strong>the school server<\/strong>. If you were student number 121, your e-mail address was <em>121@theSchool.edu<\/em>.<\/p>\n\n\n\n<p>You get the idea.<\/p>\n\n\n\n<p>The report cards were in PDF form, and the school wanted to email them \nto the students. How were they planning on doing this?<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Gather a group of teachers in a conference room with their computers<\/li><li>Give them a thumb drive full of report card PDFs<\/li><li>Have each teacher open a PDF file<\/li><li>COPY the student number<\/li><li>Create an email, using the COPIED student number (PASTE and type \u201c@theSchool.edu\u201d)<\/li><li>Copy from ANOTHER text source the message, \u201cDear Student, yakkity \nyak yak. Here is your report card. Read \u2019em and weep. 
Love, the School\u201d<\/li><li>SEND email<\/li><li>Rinse, lather and repeat<\/li><\/ol>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignright\"><a href=\"https:\/\/web.archive.org\/web\/20190211074905\/http:\/\/www.iheinrich.com\/wordpress\/wp-content\/uploads\/2015\/08\/RubyMan1.jpg\"><img decoding=\"async\" src=\"https:\/\/web.archive.org\/web\/20190211074905im_\/http:\/\/www.iheinrich.com\/wordpress\/wp-content\/uploads\/2015\/08\/RubyMan1-151x300.jpg\" alt=\"\" class=\"wp-image-187\"\/><\/a><figcaption>Wearing a mask and a cape, and armed with mad Ruby skillz, you can save the world (or at least a roomful of teachers)<\/figcaption><\/figure><\/div>\n\n\n\n<p>This looks like a job for <strong>RUBY SUPERHERO!<\/strong><\/p>\n\n\n\n<p><strong>Part One: Reading PDFs.<\/strong><\/p>\n\n\n\n<p>Like most tough things in Ruby, there is a GEM for that. To parse PDFs, you need to install the PDF Reader Gem, which is located at:<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/yob\/pdf-reader\">https:\/\/github.com\/yob\/pdf-reader<\/a><\/p>\n\n\n\n<p>Using this Gem, we can read the PDF, turn it into a string, and then extract the data we need.<\/p>\n\n\n\n<p>After installing the GEM, your script will start with<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/env ruby\nrequire 'pdf-reader'<\/code><\/pre>\n\n\n\n<p><strong>Part Two: Counting the (Report) Cards<\/strong><\/p>\n\n\n\n<p>Since we have a folder full of PDFs, we need to read the folder and count the PDF files in it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n# count the PDF files in the current folder\ntheCount = Dir.glob('*.pdf').count\nputs \"there are \" + theCount.to_s + \" files to e-mail\"<\/code><\/pre>\n\n\n\n<p>I like to toss in friendly <strong>puts<\/strong> messages so when things go horribly wrong, you get some indication of where it broke.<\/p>\n\n\n\n<p><strong>Part Three: Iterate thru the folder<\/strong><\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>\nDir.glob('*.pdf') do |rb_pdf|\n\n# lots of Ruby Magic!\n\nend<\/code><\/pre>\n\n\n\n<p>This nifty line says, \u201cRead the directory, looking ONLY at files that\n end in .pdf, and call each one \u2018rb_pdf\u2019. As you go thru the files\n one by one, do stuff to them.\u201d<\/p>\n\n\n\n<p>The stuff part is coming.<\/p>\n\n\n\n<p><strong>Part Four: Read the file!<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nreader = PDF::Reader.new(rb_pdf)<\/code><\/pre>\n\n\n\n<p>This is where the GEM pdf-reader comes in. It reads the file \u201crb_pdf\u201d and creates an object called \u201creader\u201d.<\/p>\n\n\n\n<p>The \u201creader\u201d object has a number of elements, but we are most interested in the text of its pages.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nlongString = reader.pages.map { |page| page.text }.join(' ')<\/code><\/pre>\n\n\n\n<p>Now that the ENTIRE report card is a string, we just need to find the\n student number in that string data so we can generate the email \naddress.<\/p>\n\n\n\n<p>But wait, how to find the number in that haystack of data?<\/p>\n\n\n\n<p><strong>Part Five: Oh yeah, it\u2019s REGEX time!<\/strong><\/p>\n\n\n\n<p><em>Regular Expressions<\/em>, or regex, are a great way to find most \nanything in an ocean of data, but they have a brutal learning curve and \nare very unforgiving to newbies. 
There are lots of great resources and \ntools online to help you with Regex; I suggest you use them.<\/p>\n\n\n\n<p>I know the student number is the only 6-digit number in the PDF, so I\u2019ll look for that.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nstudentNumber = longString[\/\\b\\d{6}\\b\/]\n#returns the six digit number for emailing\neMailTarget = studentNumber + '@theSchool.edu'<\/code><\/pre>\n\n\n\n<p><strong>Part Six: Mailing ain\u2019t easy<\/strong><br>\nSo, you test the code and it all works great\u2026 but how to mail?<\/p>\n\n\n\n<p>This was the hardest part, and your mileage may vary depending on the mail server configuration where you are.<\/p>\n\n\n\n<p>You might need to install the mail Gem, depending on your setup.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\nrequire 'mail'  # ruby mail library. https:\/\/github.com\/mikel\/mail\nrequire 'openssl' #sometimes, Outlook just makes you crazy...\n#Sending via Outlook\n\n    Mail.defaults do\n      delivery_method :smtp, {\n                               :address              => 'mail.theSchool.edu',\n                               :port                 => 587,\n                               :domain               => 'theSchool.edu',\n                               :user_name            => 'theSchool\/poorTeacher',\n                               :password             => 'summerVacation',\n                               :authentication       => :login,\n                               :enable_starttls_auto => true,\n                               :openssl_verify_mode  => OpenSSL::SSL::VERIFY_NONE\n                               }\n\n    end\n\n    # send the report card\n    Mail.deliver do\n\n        from    'poorTeacher@sts.theSchool.edu'\n        to      eMailTarget\n        subject 'Report Card'\n        body    'Congratulations on getting a report card'\n        # passing the path as a String makes Mail read and attach the file\n        add_file rb_pdf\n    end\n\n    puts \"mailed to \" + eMailTarget\n    # end 
of mailer part<\/code><\/pre>\n\n\n\n<p><strong>Summary<\/strong><br>\nSo what have we learned?<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>If you are doing the same thing 10 or 20 times over, a script should be doing it.<\/li><li>You can read PDFs using a Ruby Gem.<\/li><li>Regex is wicked powerful and can be wicked hard to figure out.<\/li><li>Outlook can drive you crazy if you are trying to automate something.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A friend of mine came to me with an interesting situation. He was a teacher at a school where each student\u2019s email&#8230;<\/p>\n","protected":false},"author":1,"featured_media":49,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-48","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ruby","wpcat-5-id"],"_links":{"self":[{"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/posts\/48","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/comments?post=48"}],"version-history":[{"count":2,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/posts\/48\/revisions"}],"predecessor-version":[{"id":51,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/posts\/48\/revisions\/51"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/media\/49"}],"wp:attachment":[{"href":"https:
\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/media?parent=48"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/categories?post=48"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/iheinrich.com\/index.php\/wp-json\/wp\/v2\/tags?post=48"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}