How to Import PDFs into WordStat for Stata


Our distributor in Australia has written this quick blog post, which we thought you might find instructive: How to Import PDFs into WordStat for Stata. You can also watch a video tutorial on our website that shows how to import PDF and other documents, such as ASCII, Rich Text Format, MS Word, HTML and WordPerfect files, using the Document Conversion Wizard, which is available in QDA Miner and in WordStat 8: Document Conversion Wizard

If you have a PDF and you want to get it into Stata for analysis using WordStat then these are the steps to follow:

  • Open Stata
  • Go to the User menu
  • Select WordStat
  • Select Document Conversion Wizard
  • Browse to find the file that you want to convert
  • Tell WordStat how you want to process the file.  It can be processed as a single document, or split into pages, paragraphs or sections.  Processing it as a whole document is the simplest option; however, if you split it by pages or paragraphs, you can analyse differences between pages or between paragraphs.
  • Click on Next
  • Then select the type of file that you want to save.  The options are QDA Miner Project (.PPJ), Stata 13 (.dta) or Stata 14/15 (.dta).
  • Save the file to a location on your C:\ drive

If you saved the file as a Stata file, go back into Stata and open it.  You will have a Stata dataset with observations and variables.  If you chose to process the file as a single document, there will be only one observation, with the full text of the document held in one of the variables.  This is why Provalis Research chose Stata to work with: a single Stata observation can hold up to 2.14 billion characters.  If you chose pages, you will have one observation for each page detected by the Document Conversion Wizard.
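
To make the link between the processing choice and the number of observations concrete, here is a small Python sketch. It is not WordStat's or Stata's code, only a model of the idea. The function name `segment`, the sample text, and the separators (a form feed for a page break, a blank line for a paragraph break) are all assumptions made for illustration; the Document Conversion Wizard may detect these boundaries differently.

```python
# Illustrative model only: NOT how WordStat works internally.
# It shows how the chosen unit determines the number of observations.

def segment(text, unit="document", section_marker="#"):
    """Split extracted text into the units that would become observations."""
    if unit == "document":
        parts = [text]
    elif unit == "page":
        # Assumption: a form feed character marks a page break.
        parts = text.split("\f")
    elif unit == "paragraph":
        # Assumption: a blank line marks a paragraph break;
        # a page break also ends a paragraph.
        parts = text.replace("\f", "\n\n").split("\n\n")
    elif unit == "section":
        # Assumption: a user-chosen character marks a section break.
        parts = text.split(section_marker)
    else:
        raise ValueError(f"unknown unit: {unit}")
    # Trim whitespace and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]


# Hypothetical two-page document with two paragraphs on each page.
sample = "Intro paragraph.\n\nMethods paragraph.\fResults paragraph.\n\nDiscussion."

for unit in ("document", "page", "paragraph"):
    print(unit, len(segment(sample, unit)))
# document 1
# page 2
# paragraph 4
```

Each element of the returned list corresponds to one observation in the saved file, which is why the same PDF can yield one row or many depending on the option selected.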

You then perform the analysis using WordStat.

Also remember that you can import multiple documents at once using the Document Conversion Wizard.  For example, if there are three studies that you want to examine, you can import all three at one time.  If you import them as documents, you will have (in this case) three cases / observations.  Again, you can split them by page, paragraph or section (as defined by a particular character in the document).
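
The multi-document case can be sketched the same way. The Python below is only an illustration of how the case count changes, not the wizard's actual behaviour. The file names, the helper `build_cases`, the column names, and the use of a form feed as the page separator are all hypothetical.

```python
# Illustrative model only: shows how many cases / observations result
# from importing several documents whole versus split by page.

def build_cases(documents, by_page=False):
    """Return one row per document, or one row per page when by_page is True."""
    rows = []
    for name, text in documents.items():
        # Assumption: a form feed character marks a page break.
        pages = text.split("\f") if by_page else [text]
        for number, page in enumerate(pages, start=1):
            rows.append({"source": name, "unit": number, "text": page.strip()})
    return rows


# Three hypothetical studies with 2, 1 and 3 pages respectively.
studies = {
    "study_a.pdf": "Page one of A.\fPage two of A.",
    "study_b.pdf": "Only page of B.",
    "study_c.pdf": "Page one of C.\fPage two of C.\fPage three of C.",
}

print(len(build_cases(studies)))                # 3 cases: one per document
print(len(build_cases(studies, by_page=True)))  # 6 cases: one per page
```

Keeping a `source` column alongside the text is a design choice in this sketch: it lets you tell which study each page came from once the documents have been split.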