国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Home Backend Development PHP Tutorial How to Extract Text from Word, Excel, and PowerPoint Files Using PHP?

How to Extract Text from Word, Excel, and PowerPoint Files Using PHP?

Nov 17, 2024 pm 07:42 PM

How to Extract Text from Word, Excel, and PowerPoint Files Using PHP?

How to Extract Text from Word and Other Office Files in PHP

Retrieving text from Microsoft Office documents, including Word (.doc and .docx), Excel (.xlsx), and PowerPoint (.pptx), is often necessary for tasks such as searching within document content.

Reading Word Documents

For .doc files, a binary file approach can be used:

class DocxConversion{
    // ...
    private function read_doc() {
        $fileHandle = fopen($this->filename, "r");
        $line = @fread($fileHandle, filesize($this->filename));   
        $lines = explode(chr(0x0D),$line);
        $outtext = "";
        foreach($lines as $thisline)
          {
            $pos = strpos($thisline, chr(0x00));
            if (($pos !== FALSE)||(strlen($thisline)==0))
              {
              } else {
                $outtext .= $thisline." ";
              }
          }
         $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/","",$outtext);
        return $outtext;
    }
    // ...
}

For .docx files, which are essentially zip files containing XML, you'll need to:

class DocxConversion{
    // ...
    private function read_docx(){
        $striped_content = '';
        $content = '';
        $zip = zip_open($this->filename);
        if (!$zip || is_numeric($zip)) return false;
        while ($zip_entry = zip_read($zip)) {
            if (zip_entry_open($zip, $zip_entry) == FALSE) continue;
            if (zip_entry_name($zip_entry) != "word/document.xml") continue;
            $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
            zip_entry_close($zip_entry);
        }// end while
        zip_close($zip);
        $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
        $content = str_replace('</w:r></w:p>', "\r\n", $content);
        $striped_content = strip_tags($content);
        return $striped_content;
    }
    // ...
}

Reading Excel Files

This can be done by extracting text from the "xl/sharedStrings.xml" file within the Excel file:

class DocxConversion{
    // ...
    function xlsx_to_text($input_file){
        $xml_filename = "xl/sharedStrings.xml"; //content file name
        $zip_handle = new ZipArchive;
        $output_text = "";
        if(true === $zip_handle->open($input_file)){
            if(($xml_index = $zip_handle->locateName($xml_filename)) !== false){
                $xml_datas = $zip_handle->getFromIndex($xml_index);
                $xml_handle = DOMDocument::loadXML($xml_datas, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
                $output_text = strip_tags($xml_handle->saveXML());
            }else{
                $output_text .="";
            }
            $zip_handle->close();
        }else{
        $output_text .="";
        }
        return $output_text;
    }
    // ...
}

Reading PowerPoint Files

To extract text from a PowerPoint presentation, open each slide (.xml) file within the zip container:

class DocxConversion{
    // ...
    function pptx_to_text($input_file){
        $zip_handle = new ZipArchive;
        $output_text = "";
        if(true === $zip_handle->open($input_file)){
            $slide_number = 1; //loop through slide files
            while(($xml_index = $zip_handle->locateName("ppt/slides/slide".$slide_number.".xml")) !== false){
                $xml_datas = $zip_handle->getFromIndex($xml_index);
                $xml_handle = DOMDocument::loadXML($xml_datas, LIBXML_NOENT | LIBXML_XINCLUDE | LIBXML_NOERROR | LIBXML_NOWARNING);
                $output_text .= strip_tags($xml_handle->saveXML());
                $slide_number++;
            }
            if($slide_number == 1){
                $output_text .="";
            }
            $zip_handle->close();
        }else{
        $output_text .="";
        }
        return $output_text;
    }
    // ...
}

Usage

To use this class for file conversion, instantiate it with the file path and call the convertToText() method:

$docObj = new DocxConversion("test.doc");
//$docObj = new DocxConversion("test.docx");
//$docObj = new DocxConversion("test.xlsx");
//$docObj = new DocxConversion("test.pptx");
echo $docText= $docObj->convertToText();

The above is the detailed content of How to Extract Text from Word, Excel, and PowerPoint Files Using PHP?. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress AI Tool

Undress images for free

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

ArtGPT

ArtGPT

AI image generator for creative art from text prompts.

Stock Market GPT

Stock Market GPT

AI powered investment research for smarter decisions

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to work with arrays in php How to work with arrays in php Aug 20, 2025 pm 07:01 PM

PHParrayshandledatacollectionsefficientlyusingindexedorassociativestructures;theyarecreatedwitharray()or[],accessedviakeys,modifiedbyassignment,iteratedwithforeach,andmanipulatedusingfunctionslikecount(),in_array(),array_key_exists(),array_push(),arr

Describe the Observer design pattern and its implementation in PHP. Describe the Observer design pattern and its implementation in PHP. Aug 15, 2025 pm 01:54 PM

TheObserverdesignpatternenablesautomaticnotificationofdependentobjectswhenasubject'sstatechanges.1)Itdefinesaone-to-manydependencybetweenobjects;2)Thesubjectmaintainsalistofobserversandnotifiesthemviaacommoninterface;3)Observersimplementanupdatemetho

How to use the $_COOKIE variable in php How to use the $_COOKIE variable in php Aug 20, 2025 pm 07:00 PM

$_COOKIEisaPHPsuperglobalforaccessingcookiessentbythebrowser;cookiesaresetusingsetcookie()beforeoutput,readvia$_COOKIE['name'],updatedbyresendingwithnewvalues,anddeletedbysettinganexpiredtimestamp,withsecuritybestpracticesincludinghttponly,secureflag

Compare and contrast PHP Traits, Abstract Classes, and Interfaces with practical use cases. Compare and contrast PHP Traits, Abstract Classes, and Interfaces with practical use cases. Aug 11, 2025 pm 11:17 PM

Useinterfacestodefinecontractsforunrelatedclasses,ensuringtheyimplementspecificmethods;2.Useabstractclassestosharecommonlogicamongrelatedclasseswhileenforcinginheritance;3.Usetraitstoreuseutilitycodeacrossunrelatedclasseswithoutinheritance,promotingD

Explain database indexing strategies (e.g., B-Tree, Full-text) for a MySQL-backed PHP application. Explain database indexing strategies (e.g., B-Tree, Full-text) for a MySQL-backed PHP application. Aug 13, 2025 pm 02:57 PM

B-TreeindexesarebestformostPHPapplications,astheysupportequalityandrangequeries,sorting,andareidealforcolumnsusedinWHERE,JOIN,orORDERBYclauses;2.Full-Textindexesshouldbeusedfornaturallanguageorbooleansearchesontextfieldslikearticlesorproductdescripti

What are public, private, and protected in php What are public, private, and protected in php Aug 24, 2025 am 03:29 AM

Public members can be accessed at will; 2. Private members can only be accessed within the class; 3. Protected members can be accessed in classes and subclasses; 4. Rational use can improve code security and maintainability.

How to get the current date and time in PHP? How to get the current date and time in PHP? Aug 31, 2025 am 01:36 AM

Usedate('Y-m-dH:i:s')withdate_default_timezone_set()togetcurrentdateandtimeinPHP,ensuringaccurateresultsbysettingthedesiredtimezonelike'America/New_York'beforecallingdate().

How to execute an UPDATE query in php How to execute an UPDATE query in php Aug 24, 2025 am 05:04 AM

Using MySQLi object-oriented method: establish a connection, preprocess UPDATE statements, bind parameters, execute and check the results, and finally close the resource. 2. Using MySQLi procedure method: connect to the database through functions, prepare statements, bind parameters, perform updates, and close the connection after processing errors. 3. Use PDO: Connect to the database through PDO, set exception mode, pre-process SQL, bind parameters, perform updates, use try-catch to handle exceptions, and finally release resources. Always use preprocessing statements to prevent SQL injection, verify user input, and close connections in time.

See all articles