I Love My Linux - A linux technology blog for penguins

open source

Interesting read...

Michael Fletcher — Fri, 08/05/2009 - 10:23

I found this blog post by Mark Shuttleworth very interesting, mainly because I know very little about the development process of open source software. It outlines some interesting points, and if you have any of your own, please comment on the original post :-)

http://www.markshuttleworth.com/archives/288

  • Mark Shuttleworth
  • open source
  • Michael Fletcher's blog

Ouch... nerds getting physical

Michael Fletcher — Thu, 05/03/2009 - 19:48

Man, this makes me worry about who I'm talking to about OSS...

http://linuxlock.blogspot.com/2009/03/tempers-flare-as-recession-creeps-...

  • freedom
  • marketing
  • open source
  • Michael Fletcher's blog
  • 1 comment

IEC follow-up

Quinn Reynolds — Tue, 20/01/2009 - 10:34

IEC have joined the less-useless world.

http://mybroadband.co.za/news/Internet/6607.html

Good for them! Now SARS needs to pull some finger and follow suit.

  • firefox
  • open source
  • Quinn Reynolds's blog

Let's hope he's listening.

Quinn Reynolds — Wed, 14/01/2009 - 10:48

Economist and all-round smart guy Dean Baker wrote to Barack Obama suggesting some cunning things he might be able to do with his stimulus package (you in the back, stop sniggering) for the US. Note particularly Point 6.

http://www.truthout.org/011209R

  • economy
  • open source
  • Quinn Reynolds's blog

Open Source Movies

Michael Fletcher — Sat, 28/06/2008 - 12:57

If you are interested in what Blender (a free, open source 3D content creation suite) can do, I would suggest that you pop on over to the following two movies:

http://orange.blender.org (Elephants Dream is the story of two strange characters exploring a capricious and seemingly infinite machine. Created in 2005/06)

http://peach.blender.org (Big Buck Bunny tells the story of a giant rabbit with a heart bigger than himself. When one sunny day three rodents rudely harass him, something snaps... and the rabbit ain't no bunny anymore! In the typical cartoon tradition he prepares the nasty rodents a comical revenge. Created in 2007/08)

And the new project (although still a while away) is http://apricot.blender.org, which will not be working on a movie but will use Blender to create a 3D game. Crystal Space will be used as the 3D engine and delivery platform, with Python for some magic scripting to glue things together.

  • blender
  • creative commons
  • open source
  • Michael Fletcher's blog

OCR Howto

Quinn Reynolds — Thu, 19/06/2008 - 12:23

Optical Character Recognition (OCR) is a process that turns a scanned picture of a text document into an actual text document, using fancy AI techniques and an assortment of other whizzbang cunningness I'm not going to talk about. If you're in any kind of career that deals with reference material, you occasionally need to resort to OCR to convert hardcopy into ones and zeros.

I suddenly required OCR today when I was faced with a table (several tables, actually, each spanning several pages) full of useful information in a printed book that was unavailable in other formats. The authors of the book wanted to charge me several hundred dollars to send me the same data on a CD, which offended me since they were basically counting on making a quick buck off me being lazy and not wanting to enter in the data manually. Fortunately for you, I am lazy, and did not want to enter the data manually, so instead of wasting a morning doing that (or spending several hundred dollars), I wasted a morning learning how to do OCR.

I'm assuming you already have an installed, working scanner and/or a scanned copy of your document in some sort of image format. In Ubuntu 8.04, you then need to install the following packages through your package manager:

tesseract-ocr + dependencies

tesseract-ocr-eng (or whatever language pack is appropriate to your source material)

imagemagick (in theory you can also use GIMP, but a few people on the webtubes have complained about it making TIFF files that tesseract can't read)

OCR is not an exact science; it's worth your while to give tesseract the best chance and spend some time cleaning up your scanned document image first. Fire it up in your favourite image editor and do the following: rotate it to get everything as horizontal as possible, crop and erase as much of the non-text graphics (borders, dust marks, embedded figures, etc.) as you can, and convert it to grayscale (or even black and white). Then re-save the image file.
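
If you'd rather script part of that clean-up, ImageMagick can probably handle the straightening and grayscale steps from the command line. What follows is only a rough sketch, not the image-editor method described above: clean_scan and DRY_RUN are made-up names for illustration, the 40% deskew threshold is just a commonly suggested starting value, and -deskew may need a newer ImageMagick than your distribution ships. Cropping and erasing stray graphics still has to be done by hand.

```shell
# Hypothetical helper (not part of ImageMagick): straighten a scan and
# convert it to grayscale. With DRY_RUN set, it only prints the command.
clean_scan() {
    in="$1"
    out="$2"
    # Build the ImageMagick command: deskew first, then drop the colour.
    set -- convert "$in" -deskew 40% -colorspace Gray "$out"
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: clean_scan YourDocument.ppm YourDocumentClean.ppm
```

Set DRY_RUN=1 before calling it to preview the command before it touches your scan, and check the result by eye either way, since automatic deskewing can get confused by pages that are mostly tables or figures.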

tesseract only reads in certain kinds of TIFF files. Save-as-TIFF in GIMP and other image editors may work fine, but the method I used was to edit the original document scan in a lossless format like PPM, and then convert it to TIFF using ImageMagick's convert command-line tool:

convert YourDocument.ppm YourDocument.tif

Replace .ppm with whatever extension your scanned document has; ImageMagick recognises just about every format.

Now you're ready to go. tesseract also runs from the command-line, like so:

tesseract YourDocument.tif YourOCR

That'll run the TIFF file of your document scan through tesseract's OCR engine and dump the output in a text file called YourOCR.txt. Obviously you can also batch many of these operations together if you have a large number of files.
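
As a minimal sketch of that batching, the shell function below loops over every .ppm scan in a directory and runs the same two commands on each. The function name ocr_batch, the DRY_RUN switch and the assumption that all your scans are .ppm files are mine, for illustration only; adjust the extension to suit.

```shell
# Hypothetical helper: convert every .ppm scan in a directory to TIFF and
# OCR it. With DRY_RUN set, it only prints the commands it would run.
ocr_batch() {
    scan_dir="${1:-.}"
    for scan in "$scan_dir"/*.ppm; do
        # Skip the unexpanded pattern when the directory has no .ppm files.
        [ -e "$scan" ] || continue
        base="${scan%.ppm}"
        if [ -n "$DRY_RUN" ]; then
            echo "convert $scan $base.tif"
            echo "tesseract $base.tif $base"
        else
            # Only run the OCR step if the TIFF conversion succeeded.
            convert "$scan" "$base.tif" && tesseract "$base.tif" "$base"
        fi
    done
}

# Example: ocr_batch ~/scans
```

Each page1.ppm ends up with a page1.tif and a page1.txt next to it. Setting DRY_RUN=1 first lets you check the file names before committing to a long run.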

  • ocr
  • open source
  • Ubuntu
  • Quinn Reynolds's blog
  • 1 comment